{ "info": { "author": "Ben Smithgall", "author_email": "bsmithgall@codeforamerica.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "|Build Status|\n\nW-Drive Extractor (wextractor)\n==============================\n\nThe W-Drive Extractor (or the wextractor), named after the home of the\nshared list of contracts in the City of Pittsburgh, is an attempt to\nextract and standardize data from spreadsheets, .csvs, and other files\nfor a relational destination.\n\nUsing the W-Drive-Extractor\n---------------------------\n\nGetting Started\n~~~~~~~~~~~~~~~\n\nInstallation\n^^^^^^^^^^^^\n\nW-Drive Extractor is available as a pre-release package via pypi. You\ncan install via pip:\n\n::\n\n pip install wextractor --pre\n\nUsage\n^^^^^\n\nHere's a simple example of extracting data from the Pittsburgh Police\nblotter:\n\n::\n\n >>> from wextractor.extractors import CsvExtractor\n >>> extractor = CsvExtractor('http://apps.pittsburghpa.gov/police/arrest_blotter/arrest_blotter_Monday.csv')\n >>> data = extractor.extract()\n >>> print data\n >>> [{u'NEIGHBORHOOD': u'Spring Garden', u'DESCRIPTION': u'Flight to Avoid Apprehension Tri...\n\nFor a more complex example of usage, please see the bottom of this file.\n\nTODO Features:\n''''''''''''''\n\n- Add cli support\n- Change ``loader`` and ``extractor`` methods to use kwargs\n- Add better exception messaging for the ``load`` and ``extract``\n methods\n\nDeveloping W-Drive-Extractor\n----------------------------\n\nGetting Started\n~~~~~~~~~~~~~~~\n\nW-Drive Extractor has some external dependencies, which can be installed\nvia pip. It is recommended that you use a\n`virtualenv `__\nto manage these.\n\nThe W-Drive-Extractor is an object-oriented application. In order to use\nit, you must first *extract* data from its original source using an\n``Extractor``'s ``extract`` method (the ``ExcelExtractor`` is currently\nthe only supported example). Once the data is extracted, it can then be\n*loaded* back into some other datasource using a ``Loader``'s ``load``\nmethod. (only ``PostgresLoader`` has been implemented thus far). For a\nmore detailed example on how this works, check out the sample usage at\nthe bottom of this file.\n\nExtractors\n~~~~~~~~~~\n\nThe Extractor base class is an interface for implementing data\nextraction from different sources. It requires taking in a ``target``\nwhich can be a file or URL and two optional params. Headers is the title\nof the columns that will ultimately be extracted from your store, and\ndtypes is a list of native python types that each column should have.\n\nCurrent Implementations:\n''''''''''''''''''''''''\n\n- Excel (.xls, .xlsx)\n- Comma-Separated Values (.csv)\n\nTODO implementations:\n'''''''''''''''''''''\n\n- Generic Text Files (.txt)\n- Postgres\n- MS Access\n\nLoaders\n~~~~~~~\n\nThe Loader base class is an interface for implementing data loading into\nnew sources. It requires connection parameters (a python dictionary of\nconnection params) and optional schema. The goal is for a single input\nsource (spreadsheet, denormalized table, etc.) to be split into many\ntables.\n\nCurrent Implementations:\n''''''''''''''''''''''''\n\n- Postgres [with relationships and simple deduplication!]\n\nTODO Implementations:\n'''''''''''''''''''''\n\n- Simple key/value cache (Memcached/Redis)\n- Other relational data stores\n\nTests\n~~~~~\n\nTests are located in the ``test`` directory. To run the tests, run\n\n::\n\n PYTHONPATH=. nosetests test/\n\nfrom inside the root directory. For more coverage information, run\n\n::\n\n PYTHONPATH=. nosetests test/ -vs --with-coverage --cover-package=wextractor --cover-erase\n\nDetailed Sample Usage\n~~~~~~~~~~~~~~~~~~~~~\n\nBelow is an example of extracting data from Excel and loading it into a\nlocal `postgres database `__ with defined\nrelationships. NOTE: This implementation is still fragile and likely to\nbe dependent on the fact that to\\_relations is the last table in the\nlist below.\n\n::\n\n import datetime\n\n from wextractor.extractors import ExcelExtractor\n from wextractor.loaders import PostgresLoader\n\n one_sheet = ExcelExtractor(\n 'files/one sheet contract list.xlsx',\n dtypes=[\n unicode, unicode, unicode, int, unicode,\n unicode, datetime.datetime, int, unicode, unicode,\n unicode, unicode, unicode, unicode, unicode,\n unicode, unicode, unicode, unicode\n ]\n )\n data = one_sheet.extract()\n\n loader = PostgresLoader(\n {'database': 'w_drive', 'user': 'bensmithgall', 'host': 'localhost'},\n [{\n 'table_name': 'contract',\n 'to_relations': [],\n 'from_relations': ['company'],\n 'pkey': None,\n 'columns': (\n ('description', 'TEXT'),\n ('notes', 'TEXT'),\n ('contract_number', 'VARCHAR(255)'),\n ('county', 'VARCHAR(255)'),\n ('type_of_contract', 'VARCHAR(255)'),\n ('pa', 'VARCHAR(255)'),\n ('expiration', 'TIMESTAMP'),\n ('spec_number', 'VARCHAR(255)'),\n ('controller_number', 'INTEGER'),\n ('commcode', 'INTEGER')\n )\n },\n {\n 'table_name': 'company_contact',\n 'to_relations': [],\n 'from_relations': ['company'],\n 'pkey': None,\n 'columns': (\n ('contact_name', 'VARCHAR(255)'),\n ('address_1', 'VARCHAR(255)'),\n ('address_2', 'VARCHAR(255)'),\n ('phone_number', 'VARCHAR(255)'),\n ('email', 'VARCHAR(255)'),\n ('fax_number', 'VARCHAR(255)'),\n ('fin', 'VARCHAR(255)'),\n )\n },\n {\n 'table_name': 'company',\n 'to_relations': ['company_contact', 'contract'],\n 'from_relations': [],\n 'pkey': None,\n 'columns': (\n ('company', 'VARCHAR(255)'),\n ('bus_type', 'VARCHAR(255)'),\n )\n }]\n )\n\n loader.load(data, True)\n\n.. |Build Status| image:: https://travis-ci.org/codeforamerica/w-drive-extractor.svg?branch=master\n :target: https://travis-ci.org/codeforamerica/w-drive-extractor", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/codeforamerica/w-drive-extractor", "keywords": null, "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "wextractor", "package_url": "https://pypi.org/project/wextractor/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/wextractor/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/codeforamerica/w-drive-extractor" }, "release_url": "https://pypi.org/project/wextractor/0.1.dev4/", "requires_dist": null, "requires_python": null, "summary": "Extract flat data and load it as relational data", "version": "0.1.dev4" }, "last_serial": 1544053, "releases": { "0.1.dev1": [ { "comment_text": "", "digests": { "md5": "2d440233c7a21112863d6491feb3a910", "sha256": "62fe34a20d58c3228a57de6fb65a55def3eadee062492afbdd7d48c199f90e29" }, "downloads": -1, "filename": "wextractor-0.1.dev1.tar.gz", "has_sig": false, "md5_digest": "2d440233c7a21112863d6491feb3a910", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15602, "upload_time": "2015-02-20T04:37:31", "url": "https://files.pythonhosted.org/packages/5a/45/302a84abc040738c7997179a3df3af9327ede65fe2655279c4842f355b14/wextractor-0.1.dev1.tar.gz" } ], "0.1.dev2": [ { "comment_text": "", "digests": { "md5": "1ef51dddf199a6776ee0300538ead847", "sha256": "4282966d2d41cb5f6792d8bacccfcc04d814da3e2a689635ff59a004b43e7cb4" }, "downloads": -1, "filename": "wextractor-0.1.dev2.tar.gz", "has_sig": false, "md5_digest": "1ef51dddf199a6776ee0300538ead847", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16342, "upload_time": "2015-02-21T20:36:58", "url": "https://files.pythonhosted.org/packages/0b/a1/ba02a5b24f9ab50c81f0388b59a6b088bae9a5b8ab2618314332de9ab9c9/wextractor-0.1.dev2.tar.gz" } ], "0.1.dev3": [ { "comment_text": "", "digests": { "md5": "04802a3e5ca319a111fd05af9111da0a", "sha256": "b51716a3d196c6db8d7362d7665973a65156a36267f69d934b99d373f6f8484a" }, "downloads": -1, "filename": "wextractor-0.1.dev3.tar.gz", "has_sig": false, "md5_digest": "04802a3e5ca319a111fd05af9111da0a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16355, "upload_time": "2015-02-22T00:06:43", "url": "https://files.pythonhosted.org/packages/ce/c5/0caf477f0af41f259a9f2ed5bc3984a48bd7d514e08e99cba3648e37f05c/wextractor-0.1.dev3.tar.gz" } ], "0.1.dev4": [ { "comment_text": "", "digests": { "md5": "b643efc193bf91755613449c5cb4af72", "sha256": "3316fb68431e62b9d63f9d4765e69277a26f789e8f4c9437372d7a65ca8e644a" }, "downloads": -1, "filename": "wextractor-0.1.dev4.zip", "has_sig": false, "md5_digest": "b643efc193bf91755613449c5cb4af72", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26053, "upload_time": "2015-05-12T17:07:58", "url": "https://files.pythonhosted.org/packages/cd/f2/d928237136110f9c2589c6b18e2abe52d327a6f75fbcb89f5a4e088540ff/wextractor-0.1.dev4.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b643efc193bf91755613449c5cb4af72", "sha256": "3316fb68431e62b9d63f9d4765e69277a26f789e8f4c9437372d7a65ca8e644a" }, "downloads": -1, "filename": "wextractor-0.1.dev4.zip", "has_sig": false, "md5_digest": "b643efc193bf91755613449c5cb4af72", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26053, "upload_time": "2015-05-12T17:07:58", "url": "https://files.pythonhosted.org/packages/cd/f2/d928237136110f9c2589c6b18e2abe52d327a6f75fbcb89f5a4e088540ff/wextractor-0.1.dev4.zip" } ] }