{ "info": { "author": "David Read", "author_email": "david.read@hackneyworkshop.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)", "Programming Language :: Python :: 2.7" ], "description": ".. image:: https://travis-ci.org/ckan/ckanext-xloader.svg?branch=master\n :target: https://travis-ci.org/ckan/ckanext-xloader\n\n.. image:: https://img.shields.io/pypi/v/ckanext-xloader.svg\n :target: https://pypi.org/project/ckanext-xloader/\n :alt: Latest Version\n\n.. image:: https://img.shields.io/pypi/pyversions/ckanext-xloader.svg\n :target: https://pypi.org/project/ckanext-xloader/\n :alt: Supported Python versions\n\n.. image:: https://img.shields.io/pypi/status/ckanext-xloader.svg\n :target: https://pypi.org/project/ckanext-xloader/\n :alt: Development Status\n\n.. image:: https://img.shields.io/pypi/l/ckanext-xloader.svg\n :target: https://pypi.org/project/ckanext-xloader/\n :alt: License\n\n================================\nExpress Loader - ckanext-xloader\n================================\n\nLoads CSV (and similar) data into CKAN's DataStore. 
Designed as a replacement\nfor DataPusher because it offers ten times the speed and more robustness.\n\n**OpenGov Inc.** has sponsored this development, with the aim of benefitting\nopen data infrastructure worldwide.\n\n-------------------------------\nKey differences from DataPusher\n-------------------------------\n\nSpeed of loading\n----------------\n\nDataPusher - parses CSV rows, converts to detected column types, converts the\ndata to a JSON string, calls ``datastore_create`` for each batch of rows, which\nreformats the data into an INSERT statement string, which is passed to\nPostgreSQL.\n\nExpress Loader - pipes the CSV file directly into PostgreSQL using COPY.\n\nIn `tests `_, Express Loader\nis over ten times faster than DataPusher.\n\nRobustness\n----------\n\nDataPusher - one cause of failure was casting cells to a guessed type. The\ntype of a column was decided by looking at the values of only the first few\nrows. So if a column is mainly numeric or dates, but a string (like \"N/A\")\ncomes later on, the load fails at that point, leaving the resource\nhalf-loaded into DataStore.\n\nExpress Loader - loads all the cells as text, before allowing the admin to\nconvert columns to the types they want (using the Data Dictionary feature). In\nfuture it could do automatic detection and conversion.\n\nSimpler queueing tech\n----------------------\n\nDataPusher - job queue is done by ckan-service-provider, which is bespoke,\ncomplicated and stores jobs in its own database (sqlite by default).\n\nExpress Loader - job queue is done by RQ, which is simpler, is backed by\nRedis and allows access to the CKAN model. You can also debug jobs easily using\npdb. 
Job results are currently still stored in its own database, but the\nintention is to move this relatively small amount of data into CKAN's database,\nto simplify installation.\n\n(The other obvious candidate is Celery, but we don't need its heavyweight\narchitecture and its jobs are not debuggable with pdb.)\n\nSeparate web server\n-------------------\n\nDataPusher - has the complication that the queue jobs are done by a separate\n(Flask) web app, apart from CKAN. This was the design because the job requires\nintensive processing to convert every line of the data into JSON. However, it\nmeans more complicated code, as information needs to be passed between the\nservices in HTTP requests, and more for the user to set up and manage - another\napp config, another Apache config, separate log files.\n\nExpress Loader - the job runs in a worker process, in the same app as CKAN, so\nit can access the CKAN config, db and logging directly and avoids many HTTP\ncalls. This simplification makes sense because the xloader job doesn't need to\ndo much processing - mainly it is streaming the CSV file from disk into\nPostgreSQL.\n\nCaveats\n-------\n\n* All columns are loaded as 'text' type. However, an admin can use the\n resource's Data Dictionary tab (CKAN 2.7 onwards) to change these to numeric\n or timestamp and re-load the file. There is scope to do this automatically in\n future.\n\n\n------------\nRequirements\n------------\n\nWorks with CKAN 2.7.x and later.\n\nWorks with CKAN 2.3.x - 2.6.x if you install ckanext-rq.\n\n\n------------\nInstallation\n------------\n\nTo install Express Loader:\n\n1. Activate your CKAN virtual environment, for example::\n\n . /usr/lib/ckan/default/bin/activate\n\n2. Install the ckanext-xloader Python package into your virtual environment::\n\n pip install ckanext-xloader\n\n3. Install dependencies::\n\n pip install -r requirements.txt\n pip install -U requests[security]\n\n4. 
If you are using a CKAN version before 2.8.x, you need to define the\n ``populate_full_text_trigger`` in your database\n ::\n\n sudo -u postgres psql datastore_default -f full_text_function.sql\n\n If successful, it will print\n ::\n\n CREATE FUNCTION\n ALTER FUNCTION\n\n NB this assumes you used the defaults for the database name and username.\n If in doubt, check your config's ``ckan.datastore.write_url``. If you don't have\n database name ``datastore_default`` and username ``ckan_default``, then adjust\n the psql option and ``full_text_function.sql`` before running this.\n\n5. Add ``xloader`` to the ``ckan.plugins`` setting in your CKAN\n config file (by default the config file is located at\n ``/etc/ckan/default/production.ini``).\n\n You should also remove ``datapusher`` if it is in the list, to avoid them\n both trying to load resources into the DataStore.\n\n Ensure ``datastore`` is also listed, to enable CKAN DataStore.\n\n6. If it is a production server, you'll want to store job information in a more\n robust database than the default sqlite file::\n\n sudo -u postgres createdb -O ckan_default xloader_jobs -E utf-8\n\n And add this line to the config::\n\n ckanext.xloader.jobs_db.uri = postgresql://ckan_default:pass@localhost/xloader_jobs\n\n (This step can be skipped when just developing or testing.)\n\n7. Restart CKAN. For example, if you've deployed CKAN with Apache on Ubuntu::\n\n sudo service apache2 reload\n\n8. Run the worker. 
First, test it on the command line::\n\n paster --plugin=ckan jobs -c /etc/ckan/default/ckan.ini worker\n\n or if you have CKAN version 2.6.x or less (and are therefore using ckanext-rq)::\n\n paster --plugin=ckanext-rq jobs -c /etc/ckan/default/ckan.ini worker\n\n Test that it loads a CSV correctly by submitting a `CSV in the web interface `_\n or in another shell::\n\n paster --plugin=ckanext-xloader xloader submit -c /etc/ckan/default/ckan.ini\n\n Running the worker on the command line is only suitable for testing. For\n production services, see:\n\n http://docs.ckan.org/en/ckan-2.7.0/maintaining/background-tasks.html#using-supervisor\n\n If you have CKAN version 2.6.x or less, then you'll need to download\n `supervisor-ckan-worker.conf `_ and adjust the ``command`` to reference\n ckanext-rq.\n\n\n---------------\nConfig settings\n---------------\n\nConfiguration:\n\n::\n\n # The connection string for the jobs database used by Express Loader. The\n # default of an sqlite file is fine for development. For production, use a\n # PostgreSQL database.\n ckanext.xloader.jobs_db.uri = sqlite:////tmp/xloader_jobs.db\n\n # The formats that are accepted. If the value of the resource.format is\n # anything else, it won't be 'xloadered' to DataStore (and will therefore\n # only be available to users in the form of the original download/link).\n # Case insensitive.\n # (optional, defaults are listed in plugin.py - DEFAULT_FORMATS).\n ckanext.xloader.formats = csv application/csv xls application/vnd.ms-excel\n\n # The maximum size of files to load into DataStore. In bytes. Default is 1 GB.\n ckanext.xloader.max_content_length = 1000000000\n\n # The maximum time for the loading of a resource before it is aborted.\n # Give an amount in seconds. 
Default is 60 minutes.\n ckanext.xloader.job_timeout = 3600\n\n # Ignore the file hash when submitting to the DataStore. If set to True,\n # resources are always submitted (if their format matches); if set to\n # False (default), resources are only submitted if their hash has changed.\n ckanext.xloader.ignore_hash = False\n\n # When loading a file that is bigger than `max_content_length`, xloader can\n # still try to load some of the file, which is useful to display a\n # preview. Set this option to the desired number of lines/rows that it\n # loads in this case.\n # If the file type is supported (CSV, TSV), an excerpt of\n # `max_excerpt_lines` lines will be submitted, as long as the\n # `max_content_length` is not exceeded.\n # If set to 0 (default), files that exceed the `max_content_length` will\n # not be loaded into the DataStore.\n ckanext.xloader.max_excerpt_lines = 100\n\n------------------------\nDeveloper installation\n------------------------\n\nTo install Express Loader for development, activate your CKAN virtualenv and,\nfrom the directory that contains your local ckan repo, run::\n\n git clone https://github.com/ckan/ckanext-xloader.git\n cd ckanext-xloader\n python setup.py develop\n pip install -r requirements.txt\n pip install -r dev-requirements.txt\n\n\n-------------------------\nUpgrading from DataPusher\n-------------------------\n\nTo upgrade from DataPusher to Express Loader:\n\n1. Install Express Loader as above, including running the xloader worker.\n\n2. If you haven't already, change the enabled plugin in your config: on the\n ``ckan.plugins`` line, replace ``datapusher`` with ``xloader``.\n\n3. Stop DataPusher by disabling its Apache site::\n\n sudo a2dissite datapusher\n\n4. Restart CKAN::\n\n sudo service apache2 reload\n sudo service nginx reload\n\n---------------\nTroubleshooting\n---------------\n\n**KeyError: \"Action 'datastore_search' not found\"**\n\nYou need to enable the ``datastore`` plugin in your CKAN config. 
See the\n'Installation' section above for how to do this, then restart the worker.\n\n**ProgrammingError: (ProgrammingError) relation \"_table_metadata\" does not\nexist**\n\nYour DataStore permissions have not been set up - see:\n\n\n-----------------\nRunning the Tests\n-----------------\n\nThe first time, your test datastore database needs the trigger applied::\n\n sudo -u postgres psql datastore_test -f full_text_function.sql\n\nTo run the tests, do::\n\n nosetests --nologcapture --with-pylons=test.ini\n\nTo run the tests and produce a coverage report, first make sure you have\ncoverage installed in your virtualenv (``pip install coverage``), then run::\n\n nosetests --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.xloader --cover-inclusive --cover-erase --cover-tests\n\n-----------------------------------------\nReleasing a New Version of Express Loader\n-----------------------------------------\n\nExpress Loader is available on PyPI as https://pypi.org/project/ckanext-xloader.\n\nTo publish a new version to PyPI, follow these steps:\n\n1. Update the version number in the ``setup.py`` file.\n See `PEP 440 `_\n for how to choose version numbers.\n\n2. Update the CHANGELOG.\n\n3. Make sure you have the latest versions of the necessary packages::\n\n pip install --upgrade setuptools wheel twine\n\n4. Create source and binary distributions of the new version, and check them::\n\n python setup.py sdist bdist_wheel && twine check dist/*\n\n Fix any errors you get.\n\n5. Upload the distributions to PyPI::\n\n twine upload dist/*\n\n6. Commit any outstanding changes::\n\n git commit -a\n\n7. Tag the new release of the project on GitHub with the version number from\n the ``setup.py`` file. 
For example if the version number in ``setup.py`` is\n 0.0.1 then do::\n\n git tag 0.0.1\n git push --tags\n\n\n", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ckan/ckanext-xloader", "keywords": "CKAN extension datastore", "license": "AGPL", "maintainer": "", "maintainer_email": "", "name": "ckanext-xloader", "package_url": "https://pypi.org/project/ckanext-xloader/", "platform": "", "project_url": "https://pypi.org/project/ckanext-xloader/", "project_urls": { "Homepage": "https://github.com/ckan/ckanext-xloader" }, "release_url": "https://pypi.org/project/ckanext-xloader/0.4.0/", "requires_dist": null, "requires_python": "", "summary": "Express Loader - quickly load data into CKAN DataStore", "version": "0.4.0" }, "last_serial": 5430721, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": "3c5cb8760f1edf5af7a37933de8a590a", "sha256": "993b741e3d91a81ae5ff74bc38c7c5b473a7667e31baf0bcf58b7dcede59f54c" }, "downloads": -1, "filename": "ckanext-xloader-0.2.0.tar.gz", "has_sig": false, "md5_digest": "3c5cb8760f1edf5af7a37933de8a590a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 92091, "upload_time": "2017-11-10T17:06:55", "url": "https://files.pythonhosted.org/packages/8f/6b/eff2c598831795c9b28de61fafcae4278e2dce70e5922ab32fd1126c7829/ckanext-xloader-0.2.0.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "3319d8bf65856eec1793fb8a7bb3bc15", "sha256": "728908df2e324b6a931463480072cbc94840e690b9a74bbbf5286d1d3bf3e8e5" }, "downloads": -1, "filename": "ckanext-xloader-0.3.0.tar.gz", "has_sig": false, "md5_digest": "3319d8bf65856eec1793fb8a7bb3bc15", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 92191, "upload_time": "2017-11-17T14:50:48", "url": 
"https://files.pythonhosted.org/packages/47/ee/d6a9fe9758d5407487df9cb46b028c75e5f235f22fa157da6fc8c3ef71db/ckanext-xloader-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "af542faed8b77ea24aff0f815b119269", "sha256": "6edf820b563295527838f927bf36c70073f70568e8bc8e682d526efa4c3eaf4a" }, "downloads": -1, "filename": "ckanext-xloader-0.3.1.tar.gz", "has_sig": false, "md5_digest": "af542faed8b77ea24aff0f815b119269", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 94686, "upload_time": "2018-01-22T09:58:20", "url": "https://files.pythonhosted.org/packages/1c/f4/59d2de5b425af29a9ef79c95f7d0e0a3af31230cd8b3f805d0c14d15626d/ckanext-xloader-0.3.1.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "59ae0cc52840e2ad3db6f68ec4ecbefc", "sha256": "631dc8cd9f0787832c8d043602800512def7ca49653486691d25087c149f4e7d" }, "downloads": -1, "filename": "ckanext_xloader-0.4.0-py2-none-any.whl", "has_sig": false, "md5_digest": "59ae0cc52840e2ad3db6f68ec4ecbefc", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 61544, "upload_time": "2019-06-21T12:53:50", "url": "https://files.pythonhosted.org/packages/0f/ef/dc5ca0dd4f2d65c6b25038ffaf932fc22159bd853d69e99d2afa28056508/ckanext_xloader-0.4.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "757b02c968cd5e3d157276e5e9d58639", "sha256": "acdf44a3efd89208f05c56089089f6e2ad5fbe452641318c9e3175c8f2c3513e" }, "downloads": -1, "filename": "ckanext-xloader-0.4.0.tar.gz", "has_sig": false, "md5_digest": "757b02c968cd5e3d157276e5e9d58639", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47415, "upload_time": "2019-06-21T12:53:53", "url": "https://files.pythonhosted.org/packages/eb/ea/2109b80bdaa16eaab13dba7b39d4c39e9c79c122db37c24dd7d714540499/ckanext-xloader-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "59ae0cc52840e2ad3db6f68ec4ecbefc", "sha256": 
"631dc8cd9f0787832c8d043602800512def7ca49653486691d25087c149f4e7d" }, "downloads": -1, "filename": "ckanext_xloader-0.4.0-py2-none-any.whl", "has_sig": false, "md5_digest": "59ae0cc52840e2ad3db6f68ec4ecbefc", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 61544, "upload_time": "2019-06-21T12:53:50", "url": "https://files.pythonhosted.org/packages/0f/ef/dc5ca0dd4f2d65c6b25038ffaf932fc22159bd853d69e99d2afa28056508/ckanext_xloader-0.4.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "757b02c968cd5e3d157276e5e9d58639", "sha256": "acdf44a3efd89208f05c56089089f6e2ad5fbe452641318c9e3175c8f2c3513e" }, "downloads": -1, "filename": "ckanext-xloader-0.4.0.tar.gz", "has_sig": false, "md5_digest": "757b02c968cd5e3d157276e5e9d58639", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47415, "upload_time": "2019-06-21T12:53:53", "url": "https://files.pythonhosted.org/packages/eb/ea/2109b80bdaa16eaab13dba7b39d4c39e9c79c122db37c24dd7d714540499/ckanext-xloader-0.4.0.tar.gz" } ] }