{
"info": {
"author": "Chu-Hsuan Lee",
"author_email": "joseph.chuhsuanlee@gmail.com",
"bugtrack_url": null,
"classifiers": [
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 2"
],
"description": "# flyingtrain - Document\n\nUse an iterative parser to retrieve transport models and total passenger capacity from long JSON transport list in a .txt file\n\n## Installation\nThis project is packaged with Python 2, and can be installed with `pip`. Copy-paste and run this command in the terminal:\n```\npip install flyingtrain\n```\n\n## Docker (supplementary solution)\n* This project is also dockerized. [Docker](https://docs.docker.com/install/) needs to be installed to run this project in containerization method.\n* The [Dockerfile](Dockerfile) uses \u200b`python:2`\u200b\u200b as base image.\n* There are some feasible commands as indicated in \u200b[Makefile\u200b](Makefile), or simply execute \u200b `make help`, it will show the Make commands that can be used. (We will go through more in detail later)\n\n## Tool\nThis project uses [__ijson__](https://pypi.org/project/ijson/) as an iterative JSON parser to avoid dumping the entire data file into memory\n\n## Usage\nAfter installation, the following snippet can be used inside a virtual environment to extract the data\n```py\nimport flyingtrain\n\ntest_file = 'test.txt' # the full path of the file\n\nflyingtrain.extract_data(test_file)\n```\nthe result\n```sh\n(flyingtrain) chuhsuan@ubuntu:~/Desktop$ python\nPython 2.7.12 (default, Nov 12 2018, 14:36:49)\n[GCC 5.4.0 20160609] on linux2\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> import flyingtrain\n>>> flyingtrain.extract_data('test.txt')\n\"planes\": 524\n\"trains\": 150\n\"cars\": 14\n\n\"distinct-cars\": 3\n\"distinct-planes\": 2\n\"distinct-trains\": 1\n```\n_Docker solution_
\nCopy the data file to the root folder, assign the file name to [test_file](main.py#L4) in `main.py`, and execute `make run`. Volume binding, as in [this line](Makefile#L10) of the Makefile, could be used to avoid copying the file, but it is not implemented here because Docker is only a supplementary solution.
\n\n_the result of the docker solution_\n```sh\nchuhsuan@ubuntu:~/git/flyingtrain$ make run\ndocker build \\\n\t-t chuhsuanlee/flyingtrain \\\n\t.\nSending build context to Docker daemon 61.44kB\nStep 1/5 : FROM python:2\n ---> 3c43a5d4034a\nStep 2/5 : WORKDIR /usr/src\n ---> Using cache\n ---> 37e4d0e02609\nStep 3/5 : COPY requirements.txt /usr/src/\n ---> Using cache\n ---> 85ae12b2a6f6\nStep 4/5 : RUN pip install -r requirements.txt\n ---> Using cache\n ---> 9d33ec10c044\nStep 5/5 : ENTRYPOINT [\"python\", \"main.py\"]\n ---> Using cache\n ---> e3d261a60154\nSuccessfully built e3d261a60154\nSuccessfully tagged chuhsuanlee/flyingtrain:latest\ndocker run \\\n\t--rm -v /etc/localtime:/etc/localtime -v /home/chuhsuan/git/flyingtrain:/usr/src \\\n\tchuhsuanlee/flyingtrain\n\"planes\": 524\n\"trains\": 150\n\"cars\": 14\n\n\"distinct-cars\": 3\n\"distinct-planes\": 2\n\"distinct-trains\": 1\n```\n\n## Benchmark\nThe following command is run in the terminal to measure how long it takes to retrieve the data:\n```sh\npython -m timeit -s \"import flyingtrain\" \"flyingtrain.extract_data('test.txt')\"\n```\nThe result:\n```\n1000 loops, best of 3: 684 usec per loop\n```\nwhich means a single execution takes around 684 usec.
\n\n_Docker solution_
\nAssign the file name to [test_file](benchmark.py#L4) in `benchmark.py` and execute `make runbenchmark`. Again, volume binding is not implemented here, so the file must be placed in the root folder.
\n\n_the result of the docker solution_\n```\n[0.6676740646362305, 0.6634271144866943, 0.6310489177703857]\n```\nThe execution time is measured with 3 repeats of 1000 executions each, and each value is the total time for one repeat. On average, that is about 654 usec per execution.\n\n## Possible optimizations\n* First, for __benchmarking__, the built-in module `timeit` is used here. Third-party packages such as [__memory_profiler__](https://pypi.org/project/memory_profiler/) can also be used to monitor the memory consumption of a process and to perform line-by-line analysis.\n* Second, as the number of records scales up and the __model sets of distinct transports__ keep growing, naively keeping a set of every model around can take a lot of memory and CPU. Streaming approximation algorithms such as [__HyperLogLog__](https://en.wikipedia.org/wiki/HyperLogLog) exist for this.\n* Last but not least, the __format of the datasets__. [__Protocol buffers__](https://developers.google.com/protocol-buffers/) and [__recordio__](http://mesos.apache.org/documentation/latest/recordio/), or even [__Cap'n Proto__](https://capnproto.org/), are worth trying. These are binary storage formats, which are faster to parse and resilient to corruption. (recordio files are checksummed and can skip a damaged section without losing the whole file.)\n\n\n",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/chuhsuanlee/flyingtrain",
"keywords": "",
"license": "",
"maintainer": "",
"maintainer_email": "",
"name": "flyingtrain",
"package_url": "https://pypi.org/project/flyingtrain/",
"platform": "",
"project_url": "https://pypi.org/project/flyingtrain/",
"project_urls": {
"Homepage": "https://github.com/chuhsuanlee/flyingtrain"
},
"release_url": "https://pypi.org/project/flyingtrain/0.1.3/",
"requires_dist": [
"ijson (==2.3)"
],
"requires_python": "",
"summary": "package for bonial challenge",
"version": "0.1.3"
},
"last_serial": 4517202,
"releases": {
"0.1.0": [
{
"comment_text": "",
"digests": {
"md5": "194639c1fa9d24162e0c989393d83257",
"sha256": "c06f37ecbec508e20723b2f35f1d608add9c696843c48199b6b17f942ac425a2"
},
"downloads": -1,
"filename": "flyingtrain-0.1.0-py2-none-any.whl",
"has_sig": false,
"md5_digest": "194639c1fa9d24162e0c989393d83257",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 2812,
"upload_time": "2018-11-20T23:37:02",
"url": "https://files.pythonhosted.org/packages/35/41/1c99072e5d61f9fb64c6fb5d1ac219e7da1c9a9ee146d050e4c8f6b3c7ac/flyingtrain-0.1.0-py2-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "fffcb23ecd837048f757a37833f8b03c",
"sha256": "f4a7047dc2ec5dedfc44ba2f9c22c9bd67db0158fbbd08078aa0f355f2e85608"
},
"downloads": -1,
"filename": "flyingtrain-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "fffcb23ecd837048f757a37833f8b03c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 1620,
"upload_time": "2018-11-20T23:37:05",
"url": "https://files.pythonhosted.org/packages/6e/24/97d60c20ecc6dcc9228c98e1c04a50973bff4e5df7e8bda4903cdf75ede6/flyingtrain-0.1.0.tar.gz"
}
],
"0.1.1": [
{
"comment_text": "",
"digests": {
"md5": "5db45e7af72d50e236df4118e63c1ee5",
"sha256": "a284434778723a646c8d2458c7a2d7ccdef94707ef9a8ceae177dfe93c44c3ce"
},
"downloads": -1,
"filename": "flyingtrain-0.1.1-py2-none-any.whl",
"has_sig": false,
"md5_digest": "5db45e7af72d50e236df4118e63c1ee5",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 2756,
"upload_time": "2018-11-21T20:50:52",
"url": "https://files.pythonhosted.org/packages/0c/ef/6d36c1ede8251325c5400b1049455f1490d9eed99779336c94bd186a081a/flyingtrain-0.1.1-py2-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "414f77c7558c8d9fd92426431feeb48c",
"sha256": "fa653a026d991255147ebb90d557cd48e3d701b0e2090ec1670988c38bb5f9aa"
},
"downloads": -1,
"filename": "flyingtrain-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "414f77c7558c8d9fd92426431feeb48c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 1561,
"upload_time": "2018-11-21T20:50:54",
"url": "https://files.pythonhosted.org/packages/91/5e/a8a1b2e9d03ccad4cdaf0b142673425ac8833906f6aab235c37730c40848/flyingtrain-0.1.1.tar.gz"
}
],
"0.1.2": [
{
"comment_text": "",
"digests": {
"md5": "670b908b9052af0c31f1958bf86acc49",
"sha256": "966fdd5cd42d7dc69467cb8c0260bb8e5bb1cb94a57f4866f5e4f0931977d331"
},
"downloads": -1,
"filename": "flyingtrain-0.1.2-py2-none-any.whl",
"has_sig": false,
"md5_digest": "670b908b9052af0c31f1958bf86acc49",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 4391,
"upload_time": "2018-11-22T00:21:58",
"url": "https://files.pythonhosted.org/packages/2f/02/7da51f56375d1ae8d0159d8a8feaae556d5dd63f666962f82d00f0716985/flyingtrain-0.1.2-py2-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "371505bacffffb4691e860da69f83877",
"sha256": "d79574c60fb740e40739ae50073c9d82a0941277429fa6bd8c9e6e7a613662bb"
},
"downloads": -1,
"filename": "flyingtrain-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "371505bacffffb4691e860da69f83877",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3467,
"upload_time": "2018-11-22T00:22:01",
"url": "https://files.pythonhosted.org/packages/9b/05/5e3dc1b6af6db05f6da73512d8331e52f5f4e690874151be4638e13c61ec/flyingtrain-0.1.2.tar.gz"
}
],
"0.1.3": [
{
"comment_text": "",
"digests": {
"md5": "53469e901f472fe85f82fd7562ec2892",
"sha256": "8d73e1d9b0c5c1ff2834cff9488a0aee8ffea7220f13116a951bab26287de5b6"
},
"downloads": -1,
"filename": "flyingtrain-0.1.3-py2-none-any.whl",
"has_sig": false,
"md5_digest": "53469e901f472fe85f82fd7562ec2892",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 4812,
"upload_time": "2018-11-22T13:57:44",
"url": "https://files.pythonhosted.org/packages/fa/89/a013fdb1c704814c96ce7649ff9dfba78f760483a9756e1747a98e2ca931/flyingtrain-0.1.3-py2-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "2c600dc631a32c4093b3cd9013f95c28",
"sha256": "a159ef0d88ff26ba9237f1af6abffcf9db9d5e0333124c8b76412dbcb16ca049"
},
"downloads": -1,
"filename": "flyingtrain-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "2c600dc631a32c4093b3cd9013f95c28",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4028,
"upload_time": "2018-11-22T13:57:48",
"url": "https://files.pythonhosted.org/packages/9e/a9/b0c427129e890c27998ad71aa92eb66b130c8e7b99b78de928df0f7f3962/flyingtrain-0.1.3.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "53469e901f472fe85f82fd7562ec2892",
"sha256": "8d73e1d9b0c5c1ff2834cff9488a0aee8ffea7220f13116a951bab26287de5b6"
},
"downloads": -1,
"filename": "flyingtrain-0.1.3-py2-none-any.whl",
"has_sig": false,
"md5_digest": "53469e901f472fe85f82fd7562ec2892",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 4812,
"upload_time": "2018-11-22T13:57:44",
"url": "https://files.pythonhosted.org/packages/fa/89/a013fdb1c704814c96ce7649ff9dfba78f760483a9756e1747a98e2ca931/flyingtrain-0.1.3-py2-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "2c600dc631a32c4093b3cd9013f95c28",
"sha256": "a159ef0d88ff26ba9237f1af6abffcf9db9d5e0333124c8b76412dbcb16ca049"
},
"downloads": -1,
"filename": "flyingtrain-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "2c600dc631a32c4093b3cd9013f95c28",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4028,
"upload_time": "2018-11-22T13:57:48",
"url": "https://files.pythonhosted.org/packages/9e/a9/b0c427129e890c27998ad71aa92eb66b130c8e7b99b78de928df0f7f3962/flyingtrain-0.1.3.tar.gz"
}
]
}