{ "info": { "author": "Ritaotao", "author_email": "ritaotao28@gmail.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 3", "Topic :: Database", "Topic :: Education", "Topic :: Utilities" ], "description": "# pymtattl\n\n## Introduction\n\nMTA Turnstile Data: http://web.mta.info/developers/turnstile.html\n\nDownload, process, and store MTA turnstile data in a database.\n\n* `Downloader` class: automates downloading raw turnstile entry/exit data from the MTA website into text files (weekly files, cumulative counts)\n* `Cleaner` class: converts downloaded text files and writes decumulated records to a database.\n\nNote 1: the package aims to be database agnostic; it uses sqlalchemy and was tested with sqlite and postgres 10.\n\nNote 2: be careful with the date range of files appended to existing database tables; avoid adding duplicate weeks or weeks earlier than those already in the tables.\n\n## Table of Contents\n\n* [Installation](#installation)\n\n* [Requirements](#requirements)\n\n* [Download](#download)\n\n* [Clean](#clean)\n\n* [Data Issues](#data-issues)\n\n* [To-Do](#to-do)\n\n## Installation\n\n    pip install pymtattl\n\n## Requirements\n\n* Written for Python 3! 
Feel free to test and contribute using Python 2!\n* Requires bs4, pandas, sqlalchemy\n\n## Download\n\n`Downloader`: download data within a date range as weekly text files.\n\n    from pymtattl import Downloader\n\n    download = Downloader(date_range=(\"2018-01-01\", \"2018-02-01\"),\n                          main_path='./data/',\n                          verbose=10)\n    data_path = download.run()\n\n* `date_range`: *tuple*\n    - Defines the start and end dates *(testing with a small date range is recommended, as downloading all files might be slow)*\n    - Example (yyyy-mm-dd): `(\"2018-01-01\", \"2018-02-01\")`\n\n* `main_path`: *string*, default './data/'\n    - Directory in which to store downloaded data files (created if it does not exist)\n    - Every run creates a new directory `download-yyyymmddhhmmss` that holds all data files and log files\n\n* `verbose`: *int*, default 10\n    - Log and print progress every n files downloaded\n\n* `run()` returns the full path of the new `download-yyyymmddhhmmss` folder\n\n## Clean\n\n`Cleaner`: decumulate and store downloaded data files in a database. 
Please make sure the database already exists if you are not using sqlite.\n\n    from pymtattl import Cleaner\n\n    clean = Cleaner(date_range=None,\n                    input_path='./data/download-20181227160016',\n                    dbstring='postgresql://user:p@ssword@localhost:5432/mta_sample')\n    clean.run()\n\n* Creates 4 tables to save disk space, using the last readings of the previous week as the baseline for the current week\n    - `turnstile`: decumulated entry/exit counts\n        - columns: *id, device_id, timestamp, description, entry, exit*\n    - `station`: MTA station, defined by (ca, unit) pairs\n        - columns: *id, ca, unit*\n    - `device`: device location in each station\n        - columns: *id, station_id, scp*\n    - `previous`: stores the ending data from the previous week to support decumulating across weekly files\n        - columns: *id, device_id, timestamp, description, entry, exit, file_date*\n\n* `date_range`: *tuple*, default None\n    - Defines the start and end dates of the files to be added to the database\n    - Example (yyyy-mm-dd): `(\"2018-01-01\", \"2018-02-01\")`\n    - If None (default), all data files in the folder are added\n\n* `input_path`: *string*\n    - Directory of the downloaded text files to be added to the database\n\n* `dbstring`: *string*\n    - Database URL used by sqlalchemy\n    - Format: `dialect+driver://username:password@host:port/database`\n    - postgres: 'postgresql://scott:tiger@localhost/mydatabase'\n    - mysql: 'mysql://scott:tiger@localhost/foo'\n    - sqlite: 'sqlite:///foo.db'\n    - more info: https://docs.sqlalchemy.org/en/latest/core/engines.html#postgresql\n\n## Data Issues\n\n* Some known data issues are listed below. They may occur in multiple files and are largely manual to detect and remove\n\n    - In turnstile_120428.txt, one line with empty ('') exit number\n    - In turnstile_120714.txt, first few lines could not be parsed\n    - Date strings were reformatted to `mm/dd/yyyy` (03/20/2018)\n    - In turnstile_170204.txt, A025, R023, 01-03-01, 02/01/2017, entry numbers start counting backwards\n    - In turnstile_170909.txt, C020, R233, 00-00-00, 09/02/2017, status switches between REGULAR and 
RECOVER AUD\n    - In turnstile_170318.txt, PTH03, R522, 00-00-09, 03/11/2017, every second record seems to be correct, but each record in between can jump by about 80K. This seems to happen with smaller numbers as well: in turnstile_171028.txt, PTH07, R550, 00-01-06, 10/21/2017, entries increase and decrease by 8K.\n\n* Incompatible data types and formats are detected, logged, and ignored during the download process. The package provides workarounds for other value-related issues:\n\n    - Counting backwards: absolute values are used after the diff method is called\n    - Adjacent values inconsistent, but every second record correct: a second diff is called on values greater than a certain threshold (Entry > 7000, Exit > 6000)\n    - Huge values: values still above the threshold are dropped\n\n## To-Do\n\n* Batch processing of multiple data files together before the decumulate step.\n\n* Append station name to station table. (in pymtattl/utils.py)\n\n* More to come...", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Ritaotao/pymtattl", "keywords": "mta,turnstile,data,traffic", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pymtattl", "package_url": "https://pypi.org/project/pymtattl/", "platform": "", "project_url": "https://pypi.org/project/pymtattl/", "project_urls": { "Homepage": "https://github.com/Ritaotao/pymtattl" }, "release_url": "https://pypi.org/project/pymtattl/1.1.1/", "requires_dist": null, "requires_python": "", "summary": "Download and store MTA turnstile data", "version": "1.1.1" }, "last_serial": 4797325, "releases": { "0.1.2": [ { "comment_text": "", "digests": { "md5": "73817fb7ef106b7ba60a2c7427aad869", "sha256": "80e314523dec7f8e971dc12999caf785d304280de3a5594c2e4fc88dabe9f7f7" }, "downloads": -1, "filename": "pymtattl-0.1.2.tar.gz", "has_sig": false, "md5_digest": "73817fb7ef106b7ba60a2c7427aad869", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 5493, "upload_time": "2018-03-02T21:55:41", "url": "https://files.pythonhosted.org/packages/2f/9a/c862266d6972879fd4385dc2334d26afed1cd0176fe4a980b274432f620b/pymtattl-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "228f05487067873f7d3c0de7c5e259df", "sha256": "882dcd651cb0638581f7619048256ca065eacbb57ad0d0ed47241dc3e3f5e75b" }, "downloads": -1, "filename": "pymtattl-0.1.3.tar.gz", "has_sig": false, "md5_digest": "228f05487067873f7d3c0de7c5e259df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5823, "upload_time": "2018-03-03T21:15:20", "url": "https://files.pythonhosted.org/packages/3f/1a/c1ece10d2d39f2ff4f53fa2142a34d29dbc182982601ebd165fde6bf7b6a/pymtattl-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "1a9c073f0a49b9ad2b6d7b4e3eaa1d3d", "sha256": "640ad135993badff65890a2e6da4dc742530c7256290ceec2d8515ff5a200162" }, "downloads": -1, "filename": "pymtattl-0.1.4.tar.gz", "has_sig": false, "md5_digest": "1a9c073f0a49b9ad2b6d7b4e3eaa1d3d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9882, "upload_time": "2018-03-21T03:10:28", "url": "https://files.pythonhosted.org/packages/af/35/3e4e2981dcefbe93c753b7ca4de5edd6ebd4334c15d08c758e75d722c447/pymtattl-0.1.4.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "bf29e9096df29c869209a172246106fb", "sha256": "9923fa2bf6a7c8e21d9edeaed697014ddf673d730ecf2c8cca65ac854fa5e0c8" }, "downloads": -1, "filename": "pymtattl-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "bf29e9096df29c869209a172246106fb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9827, "upload_time": "2018-12-28T16:53:14", "url": "https://files.pythonhosted.org/packages/6f/2b/695a70a9c97d0885b2638610f6be7c3c00b74bc3a7e926fe775bf55b0417/pymtattl-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"4e1b9d4fd69ef9e833adc37f0f217479", "sha256": "423900981e79dd2b2a4e08f319dace13aaa1ea1dcbbbfbd7894e2c6165dacc54" }, "downloads": -1, "filename": "pymtattl-1.0.0.tar.gz", "has_sig": false, "md5_digest": "4e1b9d4fd69ef9e833adc37f0f217479", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8300, "upload_time": "2018-12-28T16:53:16", "url": "https://files.pythonhosted.org/packages/94/dd/7ff429fc8ff9b92bd8c0b862cf819f943e4ae563c568df73bee91b2c2fb2/pymtattl-1.0.0.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "0c0f5766297efc022efc675b1fcc3286", "sha256": "fca6bd15dc7e91671afc2f40ccb72f8fb59827fa9d342f0785ee4b0833851353" }, "downloads": -1, "filename": "pymtattl-1.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "0c0f5766297efc022efc675b1fcc3286", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11770, "upload_time": "2018-12-30T01:22:42", "url": "https://files.pythonhosted.org/packages/00/30/bc516f032df98566625a581473122060bcf9628c5ca925cab53333274b48/pymtattl-1.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0240cbafb0bbbe7026b0ba2f9ba650e2", "sha256": "c997b2ebb026b58759847c1c66393a1120e7c042cdf9b53acbbd27ec6f1afbbf" }, "downloads": -1, "filename": "pymtattl-1.1.0.tar.gz", "has_sig": false, "md5_digest": "0240cbafb0bbbe7026b0ba2f9ba650e2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11764, "upload_time": "2018-12-30T01:22:43", "url": "https://files.pythonhosted.org/packages/4b/09/f5f10b471d13916252a71bd9cc673e2e58ab8ede80ae866fbfc023958197/pymtattl-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "ffdebd33681157b403cd5cea7b9ff093", "sha256": "19403f310c34d78c5213ee4b52e92515ba04966ff0d31b4076c02436760d5b38" }, "downloads": -1, "filename": "pymtattl-1.1.1.tar.gz", "has_sig": false, "md5_digest": "ffdebd33681157b403cd5cea7b9ff093", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 10183, "upload_time": "2019-02-08T20:26:30", "url": "https://files.pythonhosted.org/packages/d4/a8/c60fedab05b01f1ae37de08e7369445f51b89228f96da5a9e6ae5fac2bd5/pymtattl-1.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ffdebd33681157b403cd5cea7b9ff093", "sha256": "19403f310c34d78c5213ee4b52e92515ba04966ff0d31b4076c02436760d5b38" }, "downloads": -1, "filename": "pymtattl-1.1.1.tar.gz", "has_sig": false, "md5_digest": "ffdebd33681157b403cd5cea7b9ff093", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10183, "upload_time": "2019-02-08T20:26:30", "url": "https://files.pythonhosted.org/packages/d4/a8/c60fedab05b01f1ae37de08e7369445f51b89228f96da5a9e6ae5fac2bd5/pymtattl-1.1.1.tar.gz" } ] }