{ "info": { "author": "UNKNOWN", "author_email": "UNKNOWN", "bugtrack_url": null, "classifiers": [ "Operating System :: OS Independent", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4" ], "description": "treetl: Running ETL tasks with tree-like dependencies\n=====================================================\n\nPipelines of batch jobs don't need to be linear. Sometimes there are shared intermediate transformations that can feed\nfuture steps in the process. **treetl** manages and runs collections of dependent ETL jobs by storing and registering\nthem as a `polytree `_.\n\nThis package was put together with `Spark `_ jobs in mind, so caching intermediate and\ncarrying results forward is top of mind. Due to this, one of the main benefits of **treetl** is that partial\njob results can be shared in memory.\n\nExample\n=======\n\nThe following set of jobs will all run exactly once and pass their transformed data (or some reference to it) to the\njobs dependent upon them.\n\n.. code:: python\n\n from treetl import (\n Job, JobRunner, JOB_STATUS\n )\n\n class JobA(Job):\n def transform(self, **kwargs):\n self.transformed_data = 1\n return self\n\n # JobB.transform can take a kwarg named\n # a_param that corresponds to JobA().transformed_data\n @Job.dependency(a_param=JobA)\n class JobB(Job):\n def transform(self, a_param=None, **kwargs):\n self.transformed_data = a_param + 1\n return self\n\n def load(self, **kwargs):\n # could save intermediate result self.transformed_data here\n pass\n\n @Job.dependency(some_b_param=JobB)\n class JobC(Job):\n pass\n\n @Job.dependency(input_param=JobA)\n class JobD(Job):\n def transform(self, input_param=None, **kwargs):\n self.transformed_data = input_param + 1\n return self\n\n @Job.dependency(in_one=JobB, in_two=JobD)\n class JobE(Job):\n def transform(self, in_one=None, in_two=None, **kwargs):\n # do stuff with in_one.transformed_data and in_two.transformed_data\n self.transformed_data = in_one + in_two\n\n # order submitted doesn't matter\n jobs = [ JobD(), JobC(), JobA(), JobB(), JobE() ]\n job_runner = JobRunner(jobs)\n if job_runner.run().status == JOB_STATUS.FAILED:\n # to see this section in action add the following to\n # def transform(self): raise ValueError()\n # to the definition of JobD\n print('Jobs failed')\n print('Root jobs that caused the failure : {}'.format(job_runner.failed_job_roots()))\n print('Paths to sources of failure : {}'.format(job_runner.failed_job_root_paths()))\n else:\n print('Success!')\n print('JobE transformed data: {}'.format(jobs[4].transformed_data))\n\n\nTODO\n====\n\n* Set parameters common to multiple jobs via the top level JobRunner\n* Set/pass state parameters to job methods\n* Support submitting a `JobRunner` as a job for nested job dependency graphs.\n* Run from a specific point in the tree. Allow for parents of starting point to retrieve last loaded data instead of recomputing the whole set of dependencies.\n* Ability to pass job attributes to component functions used in the decorator based definition of a job", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "UNKNOWN", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "treetl", "package_url": "https://pypi.org/project/treetl/", "platform": "any", "project_url": "https://pypi.org/project/treetl/", "project_urls": { "Download": "UNKNOWN", "Homepage": "UNKNOWN" }, "release_url": "https://pypi.org/project/treetl/1.3.0/", "requires_dist": null, "requires_python": null, "summary": "Job organizer for ETL tasks with tree like dependencies.", "version": "1.3.0" }, "last_serial": 2204868, "releases": { "1.3.0": [ { "comment_text": "", "digests": { "md5": "52659b82d384ce4785e8914a6f333950", "sha256": "59300ed8c9b9a31d7622524b7ee7925b54e89c4c0b02b31df82f50990b69d520" }, "downloads": -1, "filename": "treetl-1.3.0.tar.gz", "has_sig": false, "md5_digest": "52659b82d384ce4785e8914a6f333950", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14577, "upload_time": "2016-07-06T02:24:16", "url": "https://files.pythonhosted.org/packages/d2/38/a3c9c2c4c40c2f34c36252c6e9d04b0a8ce1452262c6c5d9a15a2f9557a8/treetl-1.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "52659b82d384ce4785e8914a6f333950", "sha256": "59300ed8c9b9a31d7622524b7ee7925b54e89c4c0b02b31df82f50990b69d520" }, "downloads": -1, "filename": "treetl-1.3.0.tar.gz", "has_sig": false, "md5_digest": "52659b82d384ce4785e8914a6f333950", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14577, "upload_time": "2016-07-06T02:24:16", "url": "https://files.pythonhosted.org/packages/d2/38/a3c9c2c4c40c2f34c36252c6e9d04b0a8ce1452262c6c5d9a15a2f9557a8/treetl-1.3.0.tar.gz" } ] }