{ "info": { "author": "Christoph Boeddeker", "author_email": "boeddeker@nt.upb.de", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy" ], "description": "\n# lazy_dataset\n\n[![Build Status](https://travis-ci.org/fgnt/lazy_dataset.svg?branch=master)](https://travis-ci.org/fgnt/lazy_dataset)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/fgnt/lazy_dataset/blob/master/LICENSE)\n\nLazy_dataset is a helper for working with large datasets that do not fit into memory.\nIt lets you define transformations that are applied lazily\n(e.g. a mapping function that reads data from disk). The transformations are only executed when\nsomeone iterates over the dataset.\n\nSupported transformations:\n - `dataset.map(map_fn)`: Apply the function `map_fn` to each example ([builtins.map](https://docs.python.org/3/library/functions.html#map))\n - `dataset[2]`: Get the example at index `2`.\n - `dataset['example_id']`: Get the example with the example id `'example_id'`.\n - `dataset[10:20]`: Get a sub dataset that contains only the examples in the slice 10 to 20.\n - `dataset.filter(filter_fn, lazy=True)`: Drops examples where `filter_fn(example)` is false ([builtins.filter](https://docs.python.org/3/library/functions.html#filter)).\n - `dataset.concatenate(*others)`: Concatenates two or more datasets ([numpy.concatenate](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.concatenate.html))\n - `dataset.shuffle(reshuffle=False)`: Shuffles the dataset. 
When `reshuffle` is `True`, it reshuffles each time you iterate over the data.\n - `dataset.tile(reps, shuffle=False)`: Repeats the dataset `reps` times and concatenates it ([numpy.tile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html))\n - `dataset.groupby(group_fn)`: Groups examples together. In contrast to `itertools.groupby`, no prior sort is necessary, as in pandas ([itertools.groupby](https://docs.python.org/3/library/itertools.html#itertools.groupby), [pandas.DataFrame.groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html))\n - `dataset.sort(key_fn, sort_fn=sorted)`: Sorts the examples by the values `key_fn(example)` ([list.sort](https://docs.python.org/3/library/stdtypes.html#list.sort))\n - `dataset.batch(batch_size, drop_last=False)`: Batches `batch_size` examples together as a list. Usually followed by a map ([tensorflow.data.Dataset.batch](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch))\n - `dataset.random_choice()`: Get a random example ([numpy.random.choice](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html))\n - ...\n\n\n```python\n>>> from IPython.lib.pretty import pprint\n>>> import lazy_dataset\n>>> examples = {\n...     'example_id_1': {\n...         'observation': [1, 2, 3],\n...         'label': 1,\n...     },\n...     'example_id_2': {\n...         'observation': [4, 5, 6],\n...         'label': 2,\n...     },\n...     'example_id_3': {\n...         'observation': [7, 8, 9],\n...         'label': 3,\n...     },\n... }\n>>> for example_id, example in examples.items():\n...     example['example_id'] = example_id\n>>> ds = lazy_dataset.new(examples)\n>>> ds\n DictDataset(len=3)\nMapDataset(_pickle.loads)\n>>> ds.keys()\n('example_id_1', 'example_id_2', 'example_id_3')\n>>> for example in ds:\n...     
print(example)\n{'observation': [1, 2, 3], 'label': 1, 'example_id': 'example_id_1'}\n{'observation': [4, 5, 6], 'label': 2, 'example_id': 'example_id_2'}\n{'observation': [7, 8, 9], 'label': 3, 'example_id': 'example_id_3'}\n>>> def transform(example):\n...     example['label'] *= 10\n...     return example\n>>> ds = ds.map(transform)\n>>> for example in ds:\n...     print(example)\n{'observation': [1, 2, 3], 'label': 10, 'example_id': 'example_id_1'}\n{'observation': [4, 5, 6], 'label': 20, 'example_id': 'example_id_2'}\n{'observation': [7, 8, 9], 'label': 30, 'example_id': 'example_id_3'}\n>>> ds = ds.filter(lambda example: example['label'] > 15)\n>>> for example in ds:\n...     print(example)\n{'observation': [4, 5, 6], 'label': 20, 'example_id': 'example_id_2'}\n{'observation': [7, 8, 9], 'label': 30, 'example_id': 'example_id_3'}\n>>> ds['example_id_2']\n{'observation': [4, 5, 6], 'label': 20, 'example_id': 'example_id_2'}\n>>> ds\n DictDataset(len=3)\n MapDataset(_pickle.loads)\n MapDataset()\nFilterDataset(<function <lambda> at 0x7ff74efb67b8>)\n```\n\n\n## Installation\n\nIf you just want to use it, install it directly with pip:\n\n```bash\npip install lazy_dataset\n```\n\nIf you want to make changes or need the most recent version, clone the repository and install it as follows:\n\n```bash\ngit clone https://github.com/fgnt/lazy_dataset.git\ncd lazy_dataset\npip install --editable .\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/fgnt/lazy_dataset", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "lazy-dataset", "package_url": "https://pypi.org/project/lazy-dataset/", "platform": "", "project_url": "https://pypi.org/project/lazy-dataset/", "project_urls": { "Homepage": "https://github.com/fgnt/lazy_dataset" }, "release_url": "https://pypi.org/project/lazy-dataset/0.0.6/", "requires_dist": null, 
"requires_python": ">=3.6.0", "summary": "Process large datasets as if it was an iterable.", "version": "0.0.6" }, "last_serial": 5687508, "releases": { "0.0.0": [ { "comment_text": "", "digests": { "md5": "2fbed0996a8e0a75e3acb0dfcdde035d", "sha256": "4ebaccad796d907ab6f8d49e66182c637fddc13733c2ad885ad56e90e6eee308" }, "downloads": -1, "filename": "lazy_dataset-0.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "2fbed0996a8e0a75e3acb0dfcdde035d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 16568, "upload_time": "2019-03-08T15:10:22", "url": "https://files.pythonhosted.org/packages/73/74/5d22630ea7bbdf728ec3ed0e5cb657c2fce659bcfbc153f0da8645dbffe0/lazy_dataset-0.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "453f9feb8d24a834148260d15091ee55", "sha256": "b3a1d36eb843dfc758cd8c2173c2bbeb24c3bc3b523a1137df5fe2a3d1ac131e" }, "downloads": -1, "filename": "lazy_dataset-0.0.0.tar.gz", "has_sig": false, "md5_digest": "453f9feb8d24a834148260d15091ee55", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 16156, "upload_time": "2019-03-08T15:10:25", "url": "https://files.pythonhosted.org/packages/d2/db/a40e088ce20dd06ccbfca4a4f436107c136073d89ac4fcf6d6c8620d0159/lazy_dataset-0.0.0.tar.gz" } ], "0.0.1": [ { "comment_text": "", "digests": { "md5": "a99e77b4bbecd1ee4df88e6bd31ccf4a", "sha256": "58d628b90ee148a79c5b30f58c7beba57d589ca8914d915a1fdd2ea06e869e48" }, "downloads": -1, "filename": "lazy_dataset-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "a99e77b4bbecd1ee4df88e6bd31ccf4a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 16757, "upload_time": "2019-03-12T14:10:24", "url": "https://files.pythonhosted.org/packages/3c/f1/15ce34f7e9183f34ebb4e5345dfdab6831fa05a8cd1e51fe824babf9c556/lazy_dataset-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f843d97b5f2450bd543e6df7e2970578", 
"sha256": "ee2b30e9a54dc7b4c404d9f8e37f6f095d8f113c42ef9b94630e6eb30030ea23" }, "downloads": -1, "filename": "lazy_dataset-0.0.1.tar.gz", "has_sig": false, "md5_digest": "f843d97b5f2450bd543e6df7e2970578", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 16399, "upload_time": "2019-03-12T14:10:26", "url": "https://files.pythonhosted.org/packages/26/d5/63a7d254c0190281b4fa23ab5527c8d12a9c42557e57954e1a696bf58a80/lazy_dataset-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "9f7bed69862d9228be23082be5c5a648", "sha256": "84c6147fbb395094ef9aa40087fdd99f80c0f57cc6013c1e4384694f827acecb" }, "downloads": -1, "filename": "lazy_dataset-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "9f7bed69862d9228be23082be5c5a648", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 17088, "upload_time": "2019-03-29T13:46:28", "url": "https://files.pythonhosted.org/packages/9b/fd/4aeed7a9e1fe080b7d574d64a7914c4c5e6b2258d2bdd7c017cef837e221/lazy_dataset-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2146e642a3c9842e18a6632cec84f516", "sha256": "89fb47577d2419124ff55e26798f54f73b1f814eb03eb7a15d946a3848026725" }, "downloads": -1, "filename": "lazy_dataset-0.0.2.tar.gz", "has_sig": false, "md5_digest": "2146e642a3c9842e18a6632cec84f516", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 16684, "upload_time": "2019-03-29T13:46:29", "url": "https://files.pythonhosted.org/packages/92/67/3a7e6817534e3fd1253ea5938578dd1b1a58d0a0a20e221ae79bdc68c60c/lazy_dataset-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "d7373611fb662a5e966e8a1e091a755e", "sha256": "3d730eaf91b8d74e4da34f971091aef82e23533975fca15bd129e6763a4ee58f" }, "downloads": -1, "filename": "lazy_dataset-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "d7373611fb662a5e966e8a1e091a755e", "packagetype": "bdist_wheel", 
"python_version": "py3", "requires_python": ">=3.6.0", "size": 17090, "upload_time": "2019-03-29T13:51:52", "url": "https://files.pythonhosted.org/packages/77/30/cac5dc2d6545d4f3ec4ada6d4868a2296be230c7f6a358831d1a0d98983c/lazy_dataset-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "56d19b4f3e3f91bacb9a9bce0a675af1", "sha256": "810ca4cf4f419dc059c2e7976865ca4a8b95b34f01810886048981ce8b137613" }, "downloads": -1, "filename": "lazy_dataset-0.0.3.tar.gz", "has_sig": false, "md5_digest": "56d19b4f3e3f91bacb9a9bce0a675af1", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 16685, "upload_time": "2019-03-29T13:51:53", "url": "https://files.pythonhosted.org/packages/5f/72/96a46fb681185ede77fd3e7d4ff418c58948449c52b369972ee5151c240b/lazy_dataset-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "f5a32065e533a17177aee819a29f9d1e", "sha256": "e90575d6bd0d48eb53d165377eccbe74972d16476901af4567cd7d2489bc1b6d" }, "downloads": -1, "filename": "lazy_dataset-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "f5a32065e533a17177aee819a29f9d1e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 20341, "upload_time": "2019-06-21T15:14:21", "url": "https://files.pythonhosted.org/packages/b4/9b/ca230898304123acc0abe0032608c15dfd32005759fdb5a54dbd0060988a/lazy_dataset-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "433aefa5721f78dd301c45f08b66a2f7", "sha256": "8acd0675ad882684c1dfe8444db7db6c879367914c69e1455b6a0d506ef540f1" }, "downloads": -1, "filename": "lazy_dataset-0.0.4.tar.gz", "has_sig": false, "md5_digest": "433aefa5721f78dd301c45f08b66a2f7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 19929, "upload_time": "2019-06-21T15:14:22", "url": "https://files.pythonhosted.org/packages/30/21/3ef030ea8d2c24083bc17015f16b60debd0aed7a9bf7f741304f23750291/lazy_dataset-0.0.4.tar.gz" } ], 
"0.0.6": [ { "comment_text": "", "digests": { "md5": "1fcbc8ca28643d0ab1477da3d2797694", "sha256": "031179ad292e4b8c9365f08d83a4cf6a311c55c1cfbba788d57e02c4cb8a04ef" }, "downloads": -1, "filename": "lazy_dataset-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "1fcbc8ca28643d0ab1477da3d2797694", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 24802, "upload_time": "2019-08-16T12:40:29", "url": "https://files.pythonhosted.org/packages/b3/05/473c844002a498c2b34f5b7cc411ed1fae57b32c5b5b3d7ccf95c60e7bf4/lazy_dataset-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "70e8a7c4efa5357fc71fd9dc6eb2c174", "sha256": "19cf63ad843253905278a2ed4776e991709667c9366d0fda634dafb2b40e8eb6" }, "downloads": -1, "filename": "lazy_dataset-0.0.6.tar.gz", "has_sig": false, "md5_digest": "70e8a7c4efa5357fc71fd9dc6eb2c174", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 23903, "upload_time": "2019-08-16T12:40:30", "url": "https://files.pythonhosted.org/packages/36/3a/596d5f22c6b1e596cb362bf6ad8b79198faadfbfced8e5f1482f055b57de/lazy_dataset-0.0.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1fcbc8ca28643d0ab1477da3d2797694", "sha256": "031179ad292e4b8c9365f08d83a4cf6a311c55c1cfbba788d57e02c4cb8a04ef" }, "downloads": -1, "filename": "lazy_dataset-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "1fcbc8ca28643d0ab1477da3d2797694", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 24802, "upload_time": "2019-08-16T12:40:29", "url": "https://files.pythonhosted.org/packages/b3/05/473c844002a498c2b34f5b7cc411ed1fae57b32c5b5b3d7ccf95c60e7bf4/lazy_dataset-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "70e8a7c4efa5357fc71fd9dc6eb2c174", "sha256": "19cf63ad843253905278a2ed4776e991709667c9366d0fda634dafb2b40e8eb6" }, "downloads": -1, "filename": "lazy_dataset-0.0.6.tar.gz", "has_sig": false, 
"md5_digest": "70e8a7c4efa5357fc71fd9dc6eb2c174", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 23903, "upload_time": "2019-08-16T12:40:30", "url": "https://files.pythonhosted.org/packages/36/3a/596d5f22c6b1e596cb362bf6ad8b79198faadfbfced8e5f1482f055b57de/lazy_dataset-0.0.6.tar.gz" } ] }