{ "info": { "author": "Amsterdam Datapunt", "author_email": "datapunt@amsterdam.nl", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)", "Natural Language :: English", "Programming Language :: Python", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "Data-processing\n===============\n\n.. image:: https://img.shields.io/badge/python-3.6-blue.svg\n :target: https://www.python.org/\n\n.. image:: https://img.shields.io/badge/license-MPLv2.0-blue.svg\n :target: https://www.mozilla.org/en-US/MPL/2.0/\n\nAt the City of Amsterdam we deal with many different types of structured and unstructured data. Much of this data is of low quality and lacks the semantics needed for proper analysis.\n\nThis repository combines generic command line functions into extract, transform and load steps that produce reproducible data for analytics, dashboards and maps.\n\nFor more information about how we use these functions in our workflow, read the\n`data-pipeline guide `_.\n\nHow to use\n==========\n\nFull documentation can be found here:\n`amsterdam.github.io/data-processing `_ \n\nQuickstart:\nTo use a function in Python, import it::\n\n from datapunt_processing.extract import download_from_catalog\n\n # or\n\n from datapunt_processing.helpers.connections import objectstore_connection\n\nTo use the functions directly from the command line in your virtual environment or Docker shell, call them like this::\n \n download_from_data_amsterdam -h \n\nFor the list of command line functions, see the modules below or look directly in `setup.py `_\n\n\nGetting Started\n===============\n\n\nTo get the functions up and running:\n\n.. 
code-block:: bash\n\n pip install datapunt-processing\n\n\n\nTo develop the functions locally, follow these steps:\n\n1. Clone the repository:\n\n.. code-block:: bash\n\n git clone https://github.com/Amsterdam/data-processing.git\n cd data-processing\n\n2. Create a virtual environment on Windows\n\n.. code-block:: bash\n\n # Create and activate a virtual environment, for example with:\n python -m venv --copies --prompt data-processing .venv \n .venv\\Scripts\\activate\n\n2. Create a virtual environment on OSX\n\n.. code-block:: bash\n\n virtualenv --python=$(which python3) venv\n source venv/bin/activate \n\n3. Install the data-processing modules in editable mode\n\n.. code-block:: bash \n\n pip install -e .\n\n4. A database is required for the transform and load functions. \nYou can set up your Postgres database credentials in the config.ini file, which the functions read.\n\nIf you want to use `Docker `_, you can start a database server for your project in a new terminal. The name, port and login of the database can be changed in the docker-compose.yml. Change them in the config.ini file as well, since the functions use it to connect to that database.\n\n\n.. code-block:: bash \n\n docker-compose up -d database\n\nNotebooks\n=========\nSome of the examples are in the form of runnable Jupyter notebooks. Copies of these with all the images and output included are hosted at Anaconda Cloud. To run these notebooks on your own system, install the dev packages and start up a Jupyter notebook server:\n\n.. code-block:: bash\n\n pip install -e .\\[dev\\]\n\n jupyter notebook --NotebookApp.iopub_data_rate_limit=100000000\n\nHow to Contribute\n=================\nIf you want to contribute, please follow the `contribute guidelines `_ \n\n0. Prerequisites\n----------------\nFork this repository to your own GitHub account to add and test new functions.\n\n.. 
code-block:: bash\n\n git clone https://github.com/Amsterdam/data-processing.git\n\nInstall the docs, test and dev packages using this command:\n\n.. code-block:: bash\n\n pip install -e .[docs,test,dev]\n # or, when using zsh:\n pip install -e .\\[docs,test,dev\\]\n\nThis package is built with `setuptools `_ so that stable versions can be released on PyPI. It follows some of `these `_ guidelines for setting up a Python package.\n\n1. Add function\n---------------\nWe use command line functions as much as possible so that functions work easily in different environments and so that we are forced to write more generic functions with input variables.\n\nIf possible, convert your function into a `python-package command line script `_ using the `boilerplate_function.py `_ \n\nAdd your function to the appropriate `folder `_:\n - `extract `_\n - `transform `_\n - `load `_\n - `helpers `_\n\n\nSide note: not all functions are suitable for the command line. Machine learning preprocessing steps or general API calls, for instance, often require input parameters in the form of dicts or lists; these are not suitable and can be used as stand-alone scripts. \n\n2. Add tests\n------------\n\nAdd tests to the `test folder `_ and run:\n\n.. code-block:: bash\n\n python setup.py test\n\nto check that no other functions break. Fix any such issues as well.\n\n3. Add documentation\n--------------------\nCreate an awesome_module.rst file with `Sphinx Argparse extension `_ fields to generate the description and argument fields by reusing an `existing rst file `_. The helpers docs are generated automatically, so you can skip this step if your function is placed in helpers. \n\nAdd the rst filename to the list in `modules.rst `_ so that it appears on the main page.\n\nRegenerate the documentation to test the docs output using this command:\n\n.. code-block:: bash\n\n sphinx/make docs\n # and check that the readme is not broken:\n open docs/index.html\n\n4. 
Add a Pull Request\n---------------------\nMake a PR to add your awesome function to our processing code, where it can be reused by many other developers and data analysts.\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Amsterdam/data-processing", "keywords": "", "license": "Mozilla Public License Version 2.0", "maintainer": "", "maintainer_email": "", "name": "datapunt-processing", "package_url": "https://pypi.org/project/datapunt-processing/", "platform": "", "project_url": "https://pypi.org/project/datapunt-processing/", "project_urls": { "Homepage": "https://github.com/Amsterdam/data-processing" }, "release_url": "https://pypi.org/project/datapunt-processing/0.0.1a12/", "requires_dist": null, "requires_python": "", "summary": "Datapunt generic ETL command line scripts and functions for shell scripting in Docker.", "version": "0.0.1a12" }, "last_serial": 3965324, "releases": { "0.0.1a1": [ { "comment_text": "", "digests": { "md5": "6321a985ccc384d804f62b32767a58e9", "sha256": "50994f9eb86284af96caf81efbeedd9beb249d8d175a6323f53a2c527fc07b9e" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a1-py3-none-any.whl", "has_sig": false, "md5_digest": "6321a985ccc384d804f62b32767a58e9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 59361, "upload_time": "2018-03-20T15:25:06", "url": "https://files.pythonhosted.org/packages/cd/02/905ef6b749dd40ee08bc4728e8ea53414881693035217c56d4792d99ec03/datapunt_processing-0.0.1a1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b611d23a92707ba962dfb23a1b7cc3c8", "sha256": "473a12c556e3e6f9cb6959b14c341b0d23ed88dc9964177aadaa12a25d1143dd" }, "downloads": -1, "filename": "datapunt-processing-0.0.1a1.tar.gz", "has_sig": false, "md5_digest": "b611d23a92707ba962dfb23a1b7cc3c8", "packagetype": "sdist", "python_version": "source", "requires_python": 
null, "size": 1644974, "upload_time": "2018-03-20T15:44:21", "url": "https://files.pythonhosted.org/packages/7f/4d/3aa853906016599f75c2bc32daa1c50f60ae28337ec693db188c68c9a5cb/datapunt-processing-0.0.1a1.tar.gz" } ], "0.0.1a10": [ { "comment_text": "", "digests": { "md5": "5f73fc003fb70c6ffdb35ce7c74a53c8", "sha256": "295cbfc1fae0212aadb0ada9f2baafa743dfc9a77b1c9b36c3d4670b892f9adc" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a10.tar.gz", "has_sig": false, "md5_digest": "5f73fc003fb70c6ffdb35ce7c74a53c8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1673709, "upload_time": "2018-05-25T15:36:06", "url": "https://files.pythonhosted.org/packages/f0/d0/b2fc795c719dc98b9e7639906baf90d5c6d322903384600a8a5457101e1e/datapunt_processing-0.0.1a10.tar.gz" } ], "0.0.1a11": [ { "comment_text": "", "digests": { "md5": "c9dd2be70c32b549a134ec4e727e2a46", "sha256": "97d81238fe3b2cb4c205ef22371e3f8ea73fa5daff4a914907627a5e794d5a70" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a11.tar.gz", "has_sig": false, "md5_digest": "c9dd2be70c32b549a134ec4e727e2a46", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1679917, "upload_time": "2018-06-15T10:20:26", "url": "https://files.pythonhosted.org/packages/c9/e0/fbd4d84bda1b683dcacc7a79e70623797b82519525c56c290025f54955a0/datapunt_processing-0.0.1a11.tar.gz" } ], "0.0.1a12": [ { "comment_text": "", "digests": { "md5": "602db94b9767f237544d9e0f55e3e613", "sha256": "d1a8ff349a2f6623063922a2c543df190dab4eae0b43e911040d23e63edd7247" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a12.tar.gz", "has_sig": false, "md5_digest": "602db94b9767f237544d9e0f55e3e613", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1682459, "upload_time": "2018-06-15T15:50:19", "url": 
"https://files.pythonhosted.org/packages/a5/b2/1f1c7baf7e5ce47c29125b3dac7ff8001b61f9deab0194b7b9d469ccc61a/datapunt_processing-0.0.1a12.tar.gz" } ], "0.0.1a2": [ { "comment_text": "", "digests": { "md5": "c2c810fcbe1e5c8438f1d4c03dc19830", "sha256": "ef70420d4dc5454ed967beac817440adc79d8c901fcadca95a38129c7535560d" }, "downloads": -1, "filename": "datapunt-processing-0.0.1a2.tar.gz", "has_sig": false, "md5_digest": "c2c810fcbe1e5c8438f1d4c03dc19830", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1644947, "upload_time": "2018-03-20T15:47:55", "url": "https://files.pythonhosted.org/packages/71/da/3de4fa799ac4fd648dc299a784c3bbd57a894ad91d2b385c244b4b7846c6/datapunt-processing-0.0.1a2.tar.gz" } ], "0.0.1a3": [ { "comment_text": "", "digests": { "md5": "4a31f148afda221baf503d2c770b7fbc", "sha256": "314ab6c4391a9288334fb2bd826a4ed82f84c896df328ef8f00a674ba31603f5" }, "downloads": -1, "filename": "datapunt-processing-0.0.1a3.tar.gz", "has_sig": false, "md5_digest": "4a31f148afda221baf503d2c770b7fbc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1645048, "upload_time": "2018-03-20T15:55:43", "url": "https://files.pythonhosted.org/packages/a7/88/80d969aedabdf9954325eb570434b36e94eadbbb81da634074435a27ba20/datapunt-processing-0.0.1a3.tar.gz" } ], "0.0.1a4": [ { "comment_text": "", "digests": { "md5": "1a807cbccb1ee2d94e6c5e67d6240c6d", "sha256": "ab1398ac04ad5120572e1982f25d51f60b6d318c2f69d208668e31bcc365cb27" }, "downloads": -1, "filename": "datapunt-processing-0.0.1a4.tar.gz", "has_sig": false, "md5_digest": "1a807cbccb1ee2d94e6c5e67d6240c6d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1645686, "upload_time": "2018-03-22T15:11:14", "url": "https://files.pythonhosted.org/packages/a9/16/ba9a6eec18358625eaae3091c5068158bb31e21a5a4ed9a57ca7a7e478df/datapunt-processing-0.0.1a4.tar.gz" } ], "0.0.1a5": [ { "comment_text": "", "digests": { "md5": 
"058a518c556d13f6cb4f0635a83e331e", "sha256": "029550d5a9590588bb36b874aa6b7f9cdc9642cf46bb2579bc383251c8d84f0b" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a5.tar.gz", "has_sig": false, "md5_digest": "058a518c556d13f6cb4f0635a83e331e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1651166, "upload_time": "2018-03-23T15:45:22", "url": "https://files.pythonhosted.org/packages/d3/31/8fd89e3198c673b49e47e5e282183fb288a21104deec94994089b8976382/datapunt_processing-0.0.1a5.tar.gz" } ], "0.0.1a6": [ { "comment_text": "", "digests": { "md5": "70c6c02f2031aabf6bb8b533e3c25bb2", "sha256": "22c92034f324be49827e65b6b3240c240fae4949edea898e68a565e7c9677c54" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a6.tar.gz", "has_sig": false, "md5_digest": "70c6c02f2031aabf6bb8b533e3c25bb2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1658857, "upload_time": "2018-04-09T10:05:54", "url": "https://files.pythonhosted.org/packages/3c/41/1897c9bf376068b1b423993c05baf61319d3503aa3fc9decf6e6abb7b613/datapunt_processing-0.0.1a6.tar.gz" } ], "0.0.1a7": [ { "comment_text": "", "digests": { "md5": "d6d16004c95172588005c6f00c4a0b41", "sha256": "6fd24ab3f1e08fb65615feb6e668446c136b9125788ca716781c2b7b3d61d05e" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a7.tar.gz", "has_sig": false, "md5_digest": "d6d16004c95172588005c6f00c4a0b41", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1665604, "upload_time": "2018-04-13T13:41:23", "url": "https://files.pythonhosted.org/packages/a9/b6/4e9f1522b1f4ec89e0e8b5985eceac8191d1ecdbf679f9b414ac7c618281/datapunt_processing-0.0.1a7.tar.gz" } ], "0.0.1a8": [ { "comment_text": "", "digests": { "md5": "9db05fc08e5d2f9c99551d0a173ae3ba", "sha256": "2cc7fa01f04e28164a1bfa8260ac3f0f58be96db64570fd4bc91a6ad5f2e7195" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a8.tar.gz", "has_sig": false, "md5_digest": 
"9db05fc08e5d2f9c99551d0a173ae3ba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1667590, "upload_time": "2018-04-17T18:42:17", "url": "https://files.pythonhosted.org/packages/65/af/0b31f3423e642b3b5dc3849a93c17d7baa8d2e01d149fcd98a6958020875/datapunt_processing-0.0.1a8.tar.gz" } ], "0.0.1a9": [ { "comment_text": "", "digests": { "md5": "27d5b070d31c28f975220e98ad446585", "sha256": "b80ced85b7d86f133f4f1b1203417c9ec46b5d66371a6e430a34814074eaf6ad" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a9.tar.gz", "has_sig": false, "md5_digest": "27d5b070d31c28f975220e98ad446585", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1667608, "upload_time": "2018-04-17T18:55:17", "url": "https://files.pythonhosted.org/packages/ec/ab/0b94fa55625cbb327425cd66f75b2619c6bebb6bd396d1e0aca0b75265be/datapunt_processing-0.0.1a9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "602db94b9767f237544d9e0f55e3e613", "sha256": "d1a8ff349a2f6623063922a2c543df190dab4eae0b43e911040d23e63edd7247" }, "downloads": -1, "filename": "datapunt_processing-0.0.1a12.tar.gz", "has_sig": false, "md5_digest": "602db94b9767f237544d9e0f55e3e613", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1682459, "upload_time": "2018-06-15T15:50:19", "url": "https://files.pythonhosted.org/packages/a5/b2/1f1c7baf7e5ce47c29125b3dac7ff8001b61f9deab0194b7b9d469ccc61a/datapunt_processing-0.0.1a12.tar.gz" } ] }