{ "info": { "author": "Chu-Hsuan Lee", "author_email": "joseph.chuhsuanlee@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 2" ], "description": "# flyingtrain - Document\n\nUse an iterative parser to retrieve transport models and total passenger capacity from a long JSON transport list in a .txt file.\n\n## Installation\nThis project is packaged for Python 2 and can be installed with `pip`. Run this command in the terminal:\n```\npip install flyingtrain\n```\n\n## Docker (supplementary solution)\n* This project is also dockerized. [Docker](https://docs.docker.com/install/) must be installed to run this project in a container.\n* The [Dockerfile](Dockerfile) uses `python:2` as the base image.\n* The available commands are listed in the [Makefile](Makefile); alternatively, run `make help` to show the Make commands that can be used. (We will go through these in more detail later.)\n\n## Tool\nThis project uses [__ijson__](https://pypi.org/project/ijson/) as an iterative JSON parser to avoid loading the entire data file into memory.\n\n## Usage\nAfter installation, the following snippet can be used inside a virtual environment to extract the data:\n```py\nimport flyingtrain\n\ntest_file = 'test.txt' # the full path of the file\n\nflyingtrain.extract_data(test_file)\n```\nthe result\n```sh\n(flyingtrain) chuhsuan@ubuntu:~/Desktop$ python\nPython 2.7.12 (default, Nov 12 2018, 14:36:49)\n[GCC 5.4.0 20160609] on linux2\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> import flyingtrain\n>>> flyingtrain.extract_data('test.txt')\n\"planes\": 524\n\"trains\": 150\n\"cars\": 14\n\n\"distinct-cars\": 3\n\"distinct-planes\": 2\n\"distinct-trains\": 1\n```\n_Docker solution_
\nCopy the data file to the root folder, assign the file name to [test_file](main.py#L4) in `main.py`, and execute `make run`. Volume binding, as in [this line](Makefile#L10) of the Makefile, could be used to avoid copying the file, but it is not implemented here because Docker is only a supplementary solution.
\n\n_the result of the docker solution_\n```sh\nchuhsuan@ubuntu:~/git/flyingtrain$ make run\ndocker build \\\n\t-t chuhsuanlee/flyingtrain \\\n\t.\nSending build context to Docker daemon 61.44kB\nStep 1/5 : FROM python:2\n ---> 3c43a5d4034a\nStep 2/5 : WORKDIR /usr/src\n ---> Using cache\n ---> 37e4d0e02609\nStep 3/5 : COPY requirements.txt /usr/src/\n ---> Using cache\n ---> 85ae12b2a6f6\nStep 4/5 : RUN pip install -r requirements.txt\n ---> Using cache\n ---> 9d33ec10c044\nStep 5/5 : ENTRYPOINT [\"python\", \"main.py\"]\n ---> Using cache\n ---> e3d261a60154\nSuccessfully built e3d261a60154\nSuccessfully tagged chuhsuanlee/flyingtrain:latest\ndocker run \\\n\t--rm -v /etc/localtime:/etc/localtime -v /home/chuhsuan/git/flyingtrain:/usr/src \\\n\tchuhsuanlee/flyingtrain\n\"planes\": 524\n\"trains\": 150\n\"cars\": 14\n\n\"distinct-cars\": 3\n\"distinct-planes\": 2\n\"distinct-trains\": 1\n```\n\n## Benchmark\nThe following command is used in the terminal to show how much time it takes to retrieve the data\n```sh\npython -m timeit -s \"import flyingtrain\" \"flyingtrain.extract_data('test.txt')\"\n```\nthe result\n```\n1000 loops, best of 3: 684 usec per loop\n```\nwhich means it takes around 684 usec for executing once
\n\n_Docker solution_
\nAssign the file name to [test_file](benchmark.py#L4) in `benchmark.py` and execute `make runbenchmark`. Again, volume binding is not implemented here, so the file must be placed in the root folder.
\n\n_the result of the docker solution_\n```\n[0.6676740646362305, 0.6634271144866943, 0.6310489177703857]\n```\nEach value is the total time in seconds for one of 3 repeats of 1000 executions. On average, that is about 654 usec per execution.\n\n## Possible optimizations\n* First, for __benchmarking__, the built-in module `timeit` is used here. Third-party packages such as [__memory_profiler__](https://pypi.org/project/memory_profiler/) can also be used to monitor the memory consumption of a process and to do line-by-line analysis.\n* Second, as the number of records scales up and the __model sets of distinct transports__ keep growing, naively keeping a full set of the models seen can take a lot of memory and CPU. Streaming approximation algorithms such as [__HyperLogLog__](https://en.wikipedia.org/wiki/HyperLogLog) address this.\n* Last but not least, the __format of the datasets__. [__Protocol buffers__](https://developers.google.com/protocol-buffers/) and [__recordio__](http://mesos.apache.org/documentation/latest/recordio/), or even [__Cap'n Proto__](https://capnproto.org/), are worth trying. These are binary storage formats that are faster to parse and resilient to corruption. 
(recordio files are checksummed, and can skip damaged section without losing the whole file)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/chuhsuanlee/flyingtrain", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "flyingtrain", "package_url": "https://pypi.org/project/flyingtrain/", "platform": "", "project_url": "https://pypi.org/project/flyingtrain/", "project_urls": { "Homepage": "https://github.com/chuhsuanlee/flyingtrain" }, "release_url": "https://pypi.org/project/flyingtrain/0.1.3/", "requires_dist": [ "ijson (==2.3)" ], "requires_python": "", "summary": "package for bonial challenge", "version": "0.1.3" }, "last_serial": 4517202, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "194639c1fa9d24162e0c989393d83257", "sha256": "c06f37ecbec508e20723b2f35f1d608add9c696843c48199b6b17f942ac425a2" }, "downloads": -1, "filename": "flyingtrain-0.1.0-py2-none-any.whl", "has_sig": false, "md5_digest": "194639c1fa9d24162e0c989393d83257", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 2812, "upload_time": "2018-11-20T23:37:02", "url": "https://files.pythonhosted.org/packages/35/41/1c99072e5d61f9fb64c6fb5d1ac219e7da1c9a9ee146d050e4c8f6b3c7ac/flyingtrain-0.1.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fffcb23ecd837048f757a37833f8b03c", "sha256": "f4a7047dc2ec5dedfc44ba2f9c22c9bd67db0158fbbd08078aa0f355f2e85608" }, "downloads": -1, "filename": "flyingtrain-0.1.0.tar.gz", "has_sig": false, "md5_digest": "fffcb23ecd837048f757a37833f8b03c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1620, "upload_time": "2018-11-20T23:37:05", "url": "https://files.pythonhosted.org/packages/6e/24/97d60c20ecc6dcc9228c98e1c04a50973bff4e5df7e8bda4903cdf75ede6/flyingtrain-0.1.0.tar.gz" } ], "0.1.1": [ { 
"comment_text": "", "digests": { "md5": "5db45e7af72d50e236df4118e63c1ee5", "sha256": "a284434778723a646c8d2458c7a2d7ccdef94707ef9a8ceae177dfe93c44c3ce" }, "downloads": -1, "filename": "flyingtrain-0.1.1-py2-none-any.whl", "has_sig": false, "md5_digest": "5db45e7af72d50e236df4118e63c1ee5", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 2756, "upload_time": "2018-11-21T20:50:52", "url": "https://files.pythonhosted.org/packages/0c/ef/6d36c1ede8251325c5400b1049455f1490d9eed99779336c94bd186a081a/flyingtrain-0.1.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "414f77c7558c8d9fd92426431feeb48c", "sha256": "fa653a026d991255147ebb90d557cd48e3d701b0e2090ec1670988c38bb5f9aa" }, "downloads": -1, "filename": "flyingtrain-0.1.1.tar.gz", "has_sig": false, "md5_digest": "414f77c7558c8d9fd92426431feeb48c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1561, "upload_time": "2018-11-21T20:50:54", "url": "https://files.pythonhosted.org/packages/91/5e/a8a1b2e9d03ccad4cdaf0b142673425ac8833906f6aab235c37730c40848/flyingtrain-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "670b908b9052af0c31f1958bf86acc49", "sha256": "966fdd5cd42d7dc69467cb8c0260bb8e5bb1cb94a57f4866f5e4f0931977d331" }, "downloads": -1, "filename": "flyingtrain-0.1.2-py2-none-any.whl", "has_sig": false, "md5_digest": "670b908b9052af0c31f1958bf86acc49", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 4391, "upload_time": "2018-11-22T00:21:58", "url": "https://files.pythonhosted.org/packages/2f/02/7da51f56375d1ae8d0159d8a8feaae556d5dd63f666962f82d00f0716985/flyingtrain-0.1.2-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "371505bacffffb4691e860da69f83877", "sha256": "d79574c60fb740e40739ae50073c9d82a0941277429fa6bd8c9e6e7a613662bb" }, "downloads": -1, "filename": "flyingtrain-0.1.2.tar.gz", "has_sig": false, "md5_digest": 
"371505bacffffb4691e860da69f83877", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3467, "upload_time": "2018-11-22T00:22:01", "url": "https://files.pythonhosted.org/packages/9b/05/5e3dc1b6af6db05f6da73512d8331e52f5f4e690874151be4638e13c61ec/flyingtrain-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "53469e901f472fe85f82fd7562ec2892", "sha256": "8d73e1d9b0c5c1ff2834cff9488a0aee8ffea7220f13116a951bab26287de5b6" }, "downloads": -1, "filename": "flyingtrain-0.1.3-py2-none-any.whl", "has_sig": false, "md5_digest": "53469e901f472fe85f82fd7562ec2892", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 4812, "upload_time": "2018-11-22T13:57:44", "url": "https://files.pythonhosted.org/packages/fa/89/a013fdb1c704814c96ce7649ff9dfba78f760483a9756e1747a98e2ca931/flyingtrain-0.1.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2c600dc631a32c4093b3cd9013f95c28", "sha256": "a159ef0d88ff26ba9237f1af6abffcf9db9d5e0333124c8b76412dbcb16ca049" }, "downloads": -1, "filename": "flyingtrain-0.1.3.tar.gz", "has_sig": false, "md5_digest": "2c600dc631a32c4093b3cd9013f95c28", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4028, "upload_time": "2018-11-22T13:57:48", "url": "https://files.pythonhosted.org/packages/9e/a9/b0c427129e890c27998ad71aa92eb66b130c8e7b99b78de928df0f7f3962/flyingtrain-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "53469e901f472fe85f82fd7562ec2892", "sha256": "8d73e1d9b0c5c1ff2834cff9488a0aee8ffea7220f13116a951bab26287de5b6" }, "downloads": -1, "filename": "flyingtrain-0.1.3-py2-none-any.whl", "has_sig": false, "md5_digest": "53469e901f472fe85f82fd7562ec2892", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 4812, "upload_time": "2018-11-22T13:57:44", "url": 
"https://files.pythonhosted.org/packages/fa/89/a013fdb1c704814c96ce7649ff9dfba78f760483a9756e1747a98e2ca931/flyingtrain-0.1.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2c600dc631a32c4093b3cd9013f95c28", "sha256": "a159ef0d88ff26ba9237f1af6abffcf9db9d5e0333124c8b76412dbcb16ca049" }, "downloads": -1, "filename": "flyingtrain-0.1.3.tar.gz", "has_sig": false, "md5_digest": "2c600dc631a32c4093b3cd9013f95c28", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4028, "upload_time": "2018-11-22T13:57:48", "url": "https://files.pythonhosted.org/packages/9e/a9/b0c427129e890c27998ad71aa92eb66b130c8e7b99b78de928df0f7f3962/flyingtrain-0.1.3.tar.gz" } ] }