{ "info": { "author": "Alexander Karpinsky", "author_email": "homm86@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# Django chunked iterator\n\nDjango provides a simple way to make complex queries.\nUnfortunately, the other side of the coin is\nhigh memory consumption for really huge datasets.\n\nThe thing is that when you get an object from a\n[QuerySet](https://docs.djangoproject.com/en/2.1/ref/models/querysets/),\nDjango runs a query for all objects and creates model instances\nfor *all returned rows in memory*,\neven if you only need one object at a time.\nMemory is also consumed for holding result rows\nin the database and in Python's database driver.\n\n```python\nfor e in Entry.objects.all():\n    print(e.headline)\n```\n\nThere is a way to improve this:\n[iterator](https://docs.djangoproject.com/en/2.1/ref/models/querysets/#iterator).\n\n```python\nfor e in Entry.objects.iterator():\n    print(e.headline)\n```\n\nThis way, Django constructs model instances\non the fly, only for the current iteration.\nDepending on your database and settings,\nDjango can either get all rows from the database in one query\nor use server-side cursors to get rows in chunks.\n\nIn the latter case (with server-side cursors),\nonly a limited amount of memory is consumed\nin the database and in Python's database driver.\nBut this only works with certain databases\nand without connection poolers (like [pgbouncer](https://pgbouncer.github.io)).\nThere is no way for your code to be sure that\nthe memory-efficient method is used.\n\n## Design\n\nThis chunked iterator takes a queryset and makes serial queries,\neach returning a fixed number of rows or model instances.\nThis allows iterating over a really huge number of rows\nwith fixed memory consumption on the database,\nPython's driver, and application layers.\nAs a side effect, the first portion of rows is returned much quicker,\nwhich in some cases allows processing to start in parallel.\n\nThere is only one limitation: 
the model should have a unique field\nwhich will be used for sorting and pagination.\nIn most cases, this is the primary key, but you can use other fields.\n\n## Installing\n\n```bash\n$ pip install django-chunked-iterator\n```\n\n## Using\n\n#### Iterate over `QuerySet` by chunks\n\n```python\nfrom django_chunked_iterator import batch_iterator\n\nfor entries in batch_iterator(Entry.objects.all()):\n    for e in entries:\n        print(e.headline)\n```\n\n#### Iterate over `QuerySet` by items\n\n```python\nfrom django_chunked_iterator import iterator\n\nfor e in iterator(Entry.objects.all()):\n    print(e.headline)\n```\n\n#### Limit number of returned rows\n\nWRONG!\n\n```python\nfor e in iterator(Entry.objects.all()[:10000]):\n    print(e.headline)\n# AssertionError: Cannot reorder a query once a slice has been taken.\n```\n\nRight:\n\n```python\nfor e in iterator(Entry.objects.all(), limit=10000):\n    print(e.headline)\n```\n\n#### Change batch size\n\nThe smaller the batch size, the faster the first item is returned,\nbut the larger the overhead of additional queries.\nOptimal values are from 100 to 1000.\n\n```python\nfor e in iterator(Entry.objects.all(), batch_size=150):\n    print(e.headline)\n```\n\n#### Change ordering\n\nReturns items in reverse creation order.\nThe `created` field **should** have `unique=True`\n(this is not hard when the datetime has microsecond accuracy).\n\n```python\nfor e in iterator(Entry.objects.all(), order_by='-created'):\n    print(e.headline)\n```\n\n\n## Testing\n\n```bash\n$ pip install -r ./requirements.txt\n$ ./test_project/manage.py test -v 2 --with-coverage --cover-package=django_chunked_iterator\n```\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/homm/django-chunked-iterator", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "django-chunked-iterator", "package_url": 
"https://pypi.org/project/django-chunked-iterator/", "platform": "", "project_url": "https://pypi.org/project/django-chunked-iterator/", "project_urls": { "Homepage": "https://github.com/homm/django-chunked-iterator" }, "release_url": "https://pypi.org/project/django-chunked-iterator/0.6.1/", "requires_dist": null, "requires_python": "", "summary": "Iterates Django querysets by chunks, saving memory and allows to start faster.", "version": "0.6.1" }, "last_serial": 4214082, "releases": { "0.5": [ { "comment_text": "", "digests": { "md5": "41b7513cdbf8f990e76f9808cd3bb84c", "sha256": "20f5d205be702eed63992fdcf357b92b9f11b793fb8c331571a4505d01022316" }, "downloads": -1, "filename": "django-chunked-iterator-0.5.tar.gz", "has_sig": false, "md5_digest": "41b7513cdbf8f990e76f9808cd3bb84c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1271, "upload_time": "2018-08-26T21:56:28", "url": "https://files.pythonhosted.org/packages/ef/53/47a74b5180bd2b4ead51d497c834c53d16ed596156c86fb7681f9d2eceba/django-chunked-iterator-0.5.tar.gz" } ], "0.6": [ { "comment_text": "", "digests": { "md5": "38fa03925e8258a2542c7390d5f7242c", "sha256": "279e19e020e3d30f31c0bb3f4f7bae59809b8202f7a5e6682c6136bedc1357b5" }, "downloads": -1, "filename": "django-chunked-iterator-0.6.tar.gz", "has_sig": false, "md5_digest": "38fa03925e8258a2542c7390d5f7242c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2970, "upload_time": "2018-08-27T13:22:09", "url": "https://files.pythonhosted.org/packages/b6/a7/a742fcd5eb06de1a3759c839b81b3b784c6c72ab25e66995231421acc062/django-chunked-iterator-0.6.tar.gz" } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "57ad51ca87a4852ff97c58e04ecda45b", "sha256": "89b73870f3dce2c688fdbec2eb30bacb7dedc5d138a7cb36d2f47baa58411015" }, "downloads": -1, "filename": "django-chunked-iterator-0.6.1.tar.gz", "has_sig": false, "md5_digest": "57ad51ca87a4852ff97c58e04ecda45b", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 3292, "upload_time": "2018-08-28T08:46:15", "url": "https://files.pythonhosted.org/packages/c0/74/c7091a5bcf21283c67cde1b3c7f36406355c4cc697e90699341e0fb32545/django-chunked-iterator-0.6.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "57ad51ca87a4852ff97c58e04ecda45b", "sha256": "89b73870f3dce2c688fdbec2eb30bacb7dedc5d138a7cb36d2f47baa58411015" }, "downloads": -1, "filename": "django-chunked-iterator-0.6.1.tar.gz", "has_sig": false, "md5_digest": "57ad51ca87a4852ff97c58e04ecda45b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3292, "upload_time": "2018-08-28T08:46:15", "url": "https://files.pythonhosted.org/packages/c0/74/c7091a5bcf21283c67cde1b3c7f36406355c4cc697e90699341e0fb32545/django-chunked-iterator-0.6.1.tar.gz" } ] }