{ "info": { "author": "Naftali Harris", "author_email": "naftaliharris@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.5", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.1", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "lazysorted\n==========\n\nlazysorted is a Python extension module for sorting sequences lazily. It\npresents the programmer with the abstraction that they are actually\nworking with a sorted list, when in fact the list is only physically\nsorted when the programmer requests elements from it, and even then it\nis only sorted partially, just enough to return whatever was requested.\n\nThe LazySorted object has a constructor that implements the same\ninterface as the builtin ``sorted(...)`` function, and it supports most\nof the non-mutating methods of a python list.\n\nSince the LazySorted object only sorts as much as necessary, it can be\nfaster than using the builtin ``sorted(...)`` for tasks that do not\nrequire the entire data to be sorted, like:\n\n1. Computing medians\n2. Computing `truncated\n means `_\n3. Quickly iterating through the first few sorted elements of a list\n4. Computing the deciles or quartiles of some data\n\nHow to use it\n-------------\n\nYou can use LazySorted in much the same way you use the ``sorted(...)``\nfunction and the python lists it produces:\n\n::\n\n from lazysorted import LazySorted\n from math import floor, ceil\n\n\n def median(xs):\n \"\"\"An expected linear time median function\"\"\"\n ls = LazySorted(xs)\n n = len(ls)\n if n == 0:\n raise ValueError(\"Need a non-empty iterable\")\n elif n % 2 == 1:\n return ls[n//2]\n else:\n return sum(ls[(n/2-1):(n/2+1)]) / 2.0\n\n\n def top_k(xs, k, key=None, reverse=False):\n \"\"\"Efficiently computes the top k elements of xs using the given key, or\n the bottom k if reverse=True\"\"\"\n ls = LazySorted(xs, key=key, reverse=reverse)\n return ls[0:k]\n \n\n def trimmed_mean(xs, alpha=0.05):\n \"\"\"Computes the mean of the xs from the alpha to (1-alpha) quantiles\n in expected linear time. More robust than the ordinary sample mean.\"\"\"\n if not 0 <= alpha < 0.5:\n raise ValueError(\"alpha must be in [0, 0.5)\")\n\n ls = LazySorted(xs)\n n = len(ls)\n if n == 0:\n raise ValueError(\"Need a non-empty iterable\")\n lower = int(floor(n * alpha))\n upper = int(ceil(n * (1 - alpha)))\n return sum(ls.between(lower, upper)) / (upper - lower)\n\nIn addition to the ``__len__`` and ``__getitem__`` methods demostrated\nabove, LazySorted also supports the ``__iter__``, ``__contains__``,\n``index``, and ``count`` methods, just like a regular python list:\n\n::\n\n >>> import random\n >>> from lazysorted import LazySorted\n >>> xs = list(range(1000)) + 5 * [1234]\n >>> random.shuffle(xs)\n >>> ls = LazySorted(xs)\n >>> for x in ls:\n ... print(x)\n ... if x >= 3:\n ... break\n 0\n 1\n 2\n 3\n >>> 1235 in ls\n False\n >>> ls.index(821)\n 821\n >>> ls.count(1234)\n 5\n\nAlthough the LazySorted constructor pretends to be equivalent to the\n``sorted`` function, and the LazySorted object pretends to be equivalent\nto a sorted python list, there are a few differences between them:\n\n1. LazySorted objects are immutable, while python lists are not.\n2. Sorting with the builtin ``sorted`` function is guaranteed to be\n stable, (ie, preserve the original order of elements that compare\n equal), while LazySorted sorting is not stable.\n3. The LazySorted object has a ``between(i, j)`` method, which returns a\n list of all the items whose sorted indices are in ``range(i, j)``,\n but not necessarily in order. This is useful, for example, for\n throwing away outliers when computing an alpha-trimmed mean.\n\nWhen the APIs differ between python2.x and python3.x, lazysorted\nimplements the python3.x version. So the LazySorted constructor does not\nsupport the ``cmp`` argument that was removed in python3.x, and the\nLazySorted object does not support the ``__getslice__`` method that was\nalso removed in python3.x.\n\nAll of the LazySorted methods have pretty good documentation, which can\nbe accessed through the builtin ``help(...)`` function.\n\nI've tested lazysorted and found it to work for CPython versions 2.5,\n2.6, 2.7, and 3.1, 3.2, and 3.3. I haven't tested 3.0.\n\nHow it works\n------------\n\nIn short, LazySorted works by using quicksort partitions lazily and\nkeeping track of the indices used as pivots.\n\n**`quicksort `_** sorts a list\nby picking an element of the list to be the \"pivot\", and then\npartitioning the data into the part that's greater than or equal to the\npivot and the part that's less than the pivot. These two parts are then\nrecursively sorted with quicksort,\n\n**`quickselect `_** finds the\nkth smallest element of a list by picking a pivot element and\npartitioning the data, as in quicksort. Then the algorithm recurses into\nthe larger or smaller part of the list, depending on whether k is larger\nor smaller than the index of the pivot element.\n\nThere are two key observations to make from these algorithms: First of\nall, if we are only interested in part of a sorted list, we only need to\nrecurse into the part we are interested in after doing a partition.\nSecond of all, after doing some partitions, the list is partially\nsorted, with the pivots all in their sorted order and the elements\nbetween two pivots guaranteed to be bigger than the pivot to their left\nand smaller than the pivot to their right.\n\nSo whenever some data is queried from a LazySorted object, we first look\nthrough the pivots to see which pivots delimit the data we want. Then we\npartition sublist(s) as necessary and recurse into the side(s) that our\ndata is in.\n\nThere are also some implementation details that help lazysorted to run\nquickly: First of all, pivots elements are chosen to be the median of\nthree randomly elements, which makes the partition likely to be more\nbalanced and guarantees average case O(n log n) behavior.\n\nSecond of all, for sufficiently small lists, lazysorted uses insertion\nsort instead of quicksort, which is faster on small lists. Both of these\ntricks are well-known to speed up quicksort implementations.\n\nThirdly, since it's important to find the pivots that bound an index\nquickly, lazysorted stores the pivots in a binary search tree, so that\nthese sorts of lookups occur in O(log n) expected time. The BST\nlazysorted uses is a `Treap `_,\nselected for its overall expected speed, especially in insertion and\ndeletion.\n\nlazysorted also makes a big effort to delete irrelevant pivots from the\nBST; for example, if there are three pivots at indices 5, 26, and 42,\nand both the data (between 5 and 26) and (between 26 and 42) is sorted,\nthen we can remove the irrelevant pivot 26, and just say that the data\nbetween indices 5 and 42 is sorted.\n\nInstallation\n------------\n\nlazysorted requires the python headers, (Python.h). I believe they ship\nwith OSX, but if you don't have them they can be installed on a\ndebian-like system with\n\n::\n\n $ sudo apt-get install python-dev\n\nThen you can install lazysorted with\n\n::\n\n $ sudo python setup.py install\n\nAlternatively, you can install lazysorted from pypi with\n\n::\n\n $ easy_install --user lazysorted\n\nor\n\n::\n\n $ pip install lazysorted\n\nthough you'll still need the python headers for it to build properly.\n\nTesting\n-------\n\nI've put in a fair bit of effort to test that lazysorted actually does\nwhat it's supposed to. You can test it yourself (after installing it)\nwith\n\n::\n\n $ python test.py\n\nFAQ\n---\n\n**Doesn't numpy have a median and percentile function?**\n\nYes, but it's implemented by sorting the entire array and then reading\noff the requested values, not with quickselect or another O(n) selection\nalgorithm. And LazySorted is empirically faster, as you can see from\nbenchmark.py\n\n**Isn't python3.4 going to have a statistics module with a median\nfunction?**\n\nYes, and I'm really excited about it! This is\n`PEP450 `_. Unfortunately, the\ncurrent implementation is in pure python, and computes the median by\nsorting the data and picking off the middle element.\n\n**Doesn't the standard library have a heapq module?**\n\nYes, but it lacks the full generality of this module. For example, you\ncan use it to get the k smallest elements in O(n log k) time, but not k\narbitrary contiguous elements. This module represents a different\nparadigm: you're allowed to program as if your list was sorted, and let\nthe data structure deal with the details.\n\n**How is lazysorted licensed?**\n\nlazysorted is BSD-licensed. So you can use it pretty much however you\nlike! See LICENSE for details.\n\n**What should I not use lazysorted for?**\n\n1. Applications requiring a stable sort; the quicksort partitions make\n the order of equal elements in the sorted list undefined.\n2. Applications requiring guaranteed fast worst-case performance.\n Although it's very unlikely, many operations in LazySorted run in\n worst case O(n^2) time.\n3. Applications requiring high security. The random number generator is\n insecure and seeded from system time, so an (ambitious) attacker\n could reverse engineer the random number generator and feed\n LazySorted pathological lists that make it run in O(n^2) time.\n4. Sorting entire lists: The builtin ``sorted(...)`` is *very*\n impressively designed and implemented. It also has the advantage of\n running faster than O(n log n) on lists with partial structure.\n\n**How does lazysorted work at scale?**\n\nUnfortunately, only okay. This turns out to be primarily due to the fact\nthat CPython deals with python objects by passing around pointers to\nthem, causing cache misses when the list and its elements no longer fit\nin cache. The gory details can be found in a blog post I wrote about\n`Memory Locality and Python\nObjects `_.\n\nHowever, this effect doesn't kick in until lists grow larger than about\n100K values, and even past that lazysorted remains faster than complete\nsorting.\n\nContact me!\n-----------\n\nIf you use this software and feel so inclined, I'd greatly appreciate\nhearing what you are using it for! You can hit me up on Twitter\n`@naftaliharris `_, or at my email\naddress on my `contact page `_.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "www.naftaliharris.com", "keywords": "sort,sorting,partial,lazy,list", "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "lazysorted", "package_url": "https://pypi.org/project/lazysorted/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/lazysorted/", "project_urls": { "Download": "UNKNOWN", "Homepage": "www.naftaliharris.com" }, "release_url": "https://pypi.org/project/lazysorted/0.1.1/", "requires_dist": null, "requires_python": null, "summary": "A partially and lazily sorted list data structure", "version": "0.1.1" }, "last_serial": 996664, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "2efd01988d1292ecba139fb0ce48f81a", "sha256": "fe3868e1337478d1a58904515c61ac68aff91488d131155486d987211731365f" }, "downloads": -1, "filename": "lazysorted-0.1.tar.gz", "has_sig": false, "md5_digest": "2efd01988d1292ecba139fb0ce48f81a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16435, "upload_time": "2013-11-02T18:47:27", "url": "https://files.pythonhosted.org/packages/29/59/1f2b31bc588720af5f352f77be41a0d8b6220c10316e23ec18dd14e39489/lazysorted-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "08d3bed823c432f345e46099ac59d9d9", "sha256": "5149cb9e7ba953e8a127ce7e0a701278d7d6a379e61b26a15ae9cd98277f70f2" }, "downloads": -1, "filename": "lazysorted-0.1.1.tar.gz", "has_sig": false, "md5_digest": "08d3bed823c432f345e46099ac59d9d9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16768, "upload_time": "2014-02-10T22:41:25", "url": "https://files.pythonhosted.org/packages/55/ae/e0e5bad8a56755794c6610be837ebc330bacf8ce65393e114faf110862e7/lazysorted-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "08d3bed823c432f345e46099ac59d9d9", "sha256": "5149cb9e7ba953e8a127ce7e0a701278d7d6a379e61b26a15ae9cd98277f70f2" }, "downloads": -1, "filename": "lazysorted-0.1.1.tar.gz", "has_sig": false, "md5_digest": "08d3bed823c432f345e46099ac59d9d9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16768, "upload_time": "2014-02-10T22:41:25", "url": "https://files.pythonhosted.org/packages/55/ae/e0e5bad8a56755794c6610be837ebc330bacf8ce65393e114faf110862e7/lazysorted-0.1.1.tar.gz" } ] }