{ "info": { "author": "Ian Gow", "author_email": "ian.gow@unimelb.edu.au", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "## WRDS to PG Migration\nThis software has two functions:\n- Download tables from [WRDS](https://wrds-web.wharton.upenn.edu/wrds/) and uploads to PG. \n- Upload sas file (`*.sas7dbat`) to PG.\n\nThe code will only work if you have access to WRDS and to the data in question.\n## Requirements\n#### 1. Git\nWhile not strictly necessary to use the scripts here, [Git](https://git-scm.com/downloads) likely makes it easier to download and to update.\n\nIf all Git repositories are kept in `~/git`, use the following commands to clone this repository:\n```\ncd ~/git\ngit clone https://github.com/iangow/wrds2pg.git\n```\nThis will create a copy of the repository in `~/git/wrds2pg`. Note that one can get updates to the repository by going to the directory and \"pulling\" the latest code:\n```\ncd ~/git/wrds2pg\ngit pull\n```\nAlternatively, you can fork the repository on GitHub and then clone. Cloning using the SSH URL (e.g., `git@github.com:iangow/wrds2pg.git`) is necessary for Git pulling and pushing to work well in RStudio.\n\n#### 2. Python\nThe software uses Python 3 and depends on Pandas, SQLAlchemy and Paramiko. In addition, the Python scripts generally interact with PostgreSQL using the psycopg (see [here](http://initd.org/psycopg/)) and SQLAlchemy.\n\n#### 3. A WRDS ID\nTo use public-key authentication to access WRDS, follow hints taken from [here](https://debian-administration.org/article/152/Password-less_logins_with_OpenSSH) to set up a public key. Copy that key to the WRDS server from the terminal on my computer. (Note that this code assumes you have a directory `.ssh` in your home directory. If not, log into WRDS via SSH, then type `mkdir ~/.ssh` to create this.) Here's code to create the key and send it to WRDS (for me):\n```\nssh-keygen -t rsa\ncat ~/.ssh/id_rsa.pub | ssh iangow@wrds-cloud.wharton.upenn.edu \"cat >> ~/.ssh/authorized_keys\"\n```\nUse an empty passphrase in setting up the key so that the scripts can run without user intervention.\n\n#### 4. PostgreSQL\nYou should have a PostgreSQL database to store the data. There are also some data dependencies in that some scripts assume the existence of other data in the database. Also, I assume the existence of a role `wrds` (SQL `CREATE ROLE wrds` works to add this if it is absent).\n\n#### 5. Environment variables\nI am migrating the scripts, etc., from using hard-coded values (e.g., my WRDS ID `iangow`) to using environment variales. \nEnvironment variables that I use include:\n\n- `PGDATABASE`: The name of the PostgreSQL database you use.\n- `PGUSER`: Your username on the PostgreSQL database.\n- `PGHOST`: Where the PostgreSQL database is to be found (this will be `localhost` if its on the same machine as you're running the code on)\n- `WRDS_ID`: Your [WRDS](https://wrds-web.wharton.upenn.edu/wrds/) ID.\n\nI set these environment variables in `~/.profile`:\n\n```\nexport PGHOST=\"localhost\"\nexport PGDATABASE=\"crsp\"\nexport WRDS_ID=\"iangow\"\nexport PGUSER=\"igow\"\n```\n\n```\nsource ~/.profile\n```\n\nI also set them in `~/.Rprofile`, as RStudio doesn't seem to pick up the settings in `~/.profile` in recent versions of OS X:\n\n```\nSys.setenv(PGHOST=\"localhost\")\nSys.setenv(PGDATABASE=\"crsp\")\n```\n\n## Settings\n#### 1. WRDS Settings\nSet `WRDS_ID` with `wrds_id=your_wrds_id`, otherwise the software will grep from OS environment variables. If you follow the instructions above closely, you don't need to do anything.\n\n#### 2. PG Settings\nIf you have set `PGHOST`, `PGDATABASE`, `PGUSER` as environment variables, the software can grep them. Otherwise, users are expected to specify them when using `wrds_udpate()`. Default `PGPORT` is`5432`. Again, if you follow the instructions above closely, you don't need to do anything.\n\nTwo variables `table` and `schema` are required.\n\n#### 3. Table Settings\nTo tailor tables, specify the following variables:\n\n`fix_missing`: set to `True` to fix missing values. Default value is `False`. \n\n`fix_cr`: set to `True` to fix characters. Default value is `False`.\n\n`drop`: add column names to be dropped.eg.`drop=\"id name\"` will drop column `id` and `name`.\n\n`obs`: add maxium number of observations. eg.`obs=10` will export the top 10 rows from the table.\n\n`rename`: rename columns. eg.`rename=\"fee=mngt_fee\"` rename `fee` to `mngt_fee`.\n\n`force`: set to `True` to force update. Default value is `False`.\n\n### Upload SAS File\nThe software can also upload SAS file directly to PG. You need to have local SAS in order to use this function.\n\nUse `fpath` to specify file path.\n\n### Examples\nHere are some examples.\n\n```py\nfrom wrds2pg import wrds2pg\n\n# 1. Download crsp.mcti from wrds and upload to pg as crps.mcti\n# Simplest version\nwrds2pg.wrds_update(table=\"mcti\", schema=\"crsp\")\n# Tailor table to your needs\nwrds2pg.wrds_update(table=\"mcti\", schema=\"crsp\", host=your_pghost, dbname=your_pg_database, fix_missing=True, \n\tfix_cr=True, drop=\"b30ret b30ind\", obs=10, rename=\"caldt=calendar_date\", force=True)\n\n# 2. Upload test.sas7dbat to pg as crsp.mcti\nwrds2pg.wrds_update(table=\"mcti\", schema=\"crsp\", fpath=\"your_path/test.sas7dbat\")\n```\n\n### Report Bugs\nAuthor: Ian Gow, \n\nContributor: Jingyu Zhang, \n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/iangow/wrds2pg/", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "wrds2pg", "package_url": "https://pypi.org/project/wrds2pg/", "platform": "", "project_url": "https://pypi.org/project/wrds2pg/", "project_urls": { "Homepage": "https://github.com/iangow/wrds2pg/" }, "release_url": "https://pypi.org/project/wrds2pg/1.0.1/", "requires_dist": [ "pandas", "sqlalchemy", "paramiko" ], "requires_python": ">=3", "summary": "Download wrds tables and upload to PostgreSQL, upload SAS file to PG", "version": "1.0.1" }, "last_serial": 5165842, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "730d9f40a17063518775c528f3768ac5", "sha256": "c17e9bb9d0690af991cd29467caddceb951e6fa85eca90e39b33189b96b61827" }, "downloads": -1, "filename": "wrds2pg-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "730d9f40a17063518775c528f3768ac5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 8358, "upload_time": "2019-04-19T17:25:13", "url": "https://files.pythonhosted.org/packages/15/7d/9b129d1d3c4e8e83d7cf48f5fa80768c61d95bac112f4d497d46073d704f/wrds2pg-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "40bc8ecebd06e4a04449b6d6e846941d", "sha256": "020f2c806ab050fa44d5224f55fb07b0aab465e80a8fee1c305fcf0321705614" }, "downloads": -1, "filename": "wrds2pg-1.0.tar.gz", "has_sig": false, "md5_digest": "40bc8ecebd06e4a04449b6d6e846941d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 7453, "upload_time": "2019-04-19T17:25:15", "url": "https://files.pythonhosted.org/packages/f8/b5/57250a95d680429196b3c940d5ab97aa72061508333b710a933a0fcfee15/wrds2pg-1.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "72099f0f61f07b4c6a3304171259bbfd", "sha256": "00898e4550b5e68c76b9d61df99822910dc24e346651a757c3ba1922aa876a44" }, "downloads": -1, "filename": "wrds2pg-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "72099f0f61f07b4c6a3304171259bbfd", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 8386, "upload_time": "2019-04-19T19:06:50", "url": "https://files.pythonhosted.org/packages/3c/3f/06c284f407d522a7d673cc91d504dbb699004663e5c4f689bba8e3f00708/wrds2pg-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "32535e58253ddb24c5b13863a07ce5ba", "sha256": "56aa8f40dc63c260f72c37828b0ec0d6b737bcdeba29db2aec839aafd734eb15" }, "downloads": -1, "filename": "wrds2pg-1.0.1.tar.gz", "has_sig": false, "md5_digest": "32535e58253ddb24c5b13863a07ce5ba", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 7450, "upload_time": "2019-04-19T19:06:52", "url": "https://files.pythonhosted.org/packages/65/ef/f548c8c1ddc97f75261a41e5e6e0b0397da3535e7212d90e8a862cd61ed3/wrds2pg-1.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "72099f0f61f07b4c6a3304171259bbfd", "sha256": "00898e4550b5e68c76b9d61df99822910dc24e346651a757c3ba1922aa876a44" }, "downloads": -1, "filename": "wrds2pg-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "72099f0f61f07b4c6a3304171259bbfd", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 8386, "upload_time": "2019-04-19T19:06:50", "url": "https://files.pythonhosted.org/packages/3c/3f/06c284f407d522a7d673cc91d504dbb699004663e5c4f689bba8e3f00708/wrds2pg-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "32535e58253ddb24c5b13863a07ce5ba", "sha256": "56aa8f40dc63c260f72c37828b0ec0d6b737bcdeba29db2aec839aafd734eb15" }, "downloads": -1, "filename": "wrds2pg-1.0.1.tar.gz", "has_sig": false, "md5_digest": "32535e58253ddb24c5b13863a07ce5ba", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 7450, "upload_time": "2019-04-19T19:06:52", "url": "https://files.pythonhosted.org/packages/65/ef/f548c8c1ddc97f75261a41e5e6e0b0397da3535e7212d90e8a862cd61ed3/wrds2pg-1.0.1.tar.gz" } ] }