{ "info": { "author": "Ronald L. Rivest", "author_email": "rivest@mit.edu", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# consistent_sampler\n\nRoutine ``sampler`` for providing 'consistent sampling' --- sampling\nthat is consistent across subsets (as explained below).\n\nHere we call the elements to be sampled \"ids\", although they may be\narbitrary python objects (strings, tuples, whatever). We assume that\nids are distinct.\n\nConsistent sampling works by associating a random \"ticket number\" with\neach id; the desired sample is found by taking the subset of the\ndesired sample size containing those elements with the smallest\nassociated random numbers.\n\nThe random ticket numbers are computed using a given \"seed\"; this seed\nmay be an arbitrary python object, typically a large integer or long\nstring.\n\nThe sampling is *consistent* since it consistently favors elements\nwith small ticket numbers; if two sets S and T have substantial\noverlap, then their samples of a given size will also have substantial\noverlap (for the same random seed).\n\nThis routine takes as input a finite collection of distinct object\nids, a random seed, and some other parameters. The sampling may be\n\"with replacement\" or \"without replacement\". One of the additional\nparameters to the routine is \"take\" -- the size of the desired sample.\n\nIt provides as output a \"sampling order\" --- an ordered list of object\nids that determine the sample. Each object id as associated with a\nrandom value (its \"ticket number\") that depends on the id and the\nseed; ids are output in order of increasing ticket number. For\nefficiency and portability, the ticket number is represented as a\ndecimal fraction 0.dddd...dddd between 0 and 1.\n\nFor sampling without replacement, the output can not be longer than\nthe input, as no id may appear in the sample more than once.\n\nFor sampling with replacement, the output may be infinite in length,\nas an id may appear in the sample an arbitrarily large (even infinite)\nnumber of times. When an id is sampled and then replaced in the set\nof ids being sampled, it is given a new random ticket number drawn\nuniformly from the set of numbers in (0, 1) larger than its previous\nticket number.\n\nThe output of ``sampler`` is always a python \ngenerator, capable of producing an infinitely long stream of ids.\n\n## Example 1.\nAs a small example of sampling without replacement:\n\n g = sampler(['A-1', 'A-2', 'A-3', 'B-1', 'B-2', 'B-3'], \n with_replacement=False, take=4, seed=314159, output='id')\n\nyields a generator g whose output can be printed:\n\n print(list(id for id in g))\n\nwhich produces:\n\n ['B-2', 'B-3', 'A-3', 'A-2']\n\n\n\n## Example 2.\nHere is an example where sampling is with replacement from a set of 6 ids,\nand the output gives a triple (tuple) for each selected id, giving\n 1. the associated random ticket number,\n 2. the selected id, and\n 3. the \"generation\" (number of times the id has been selected so far).\n\n >>> for t in sampler(['a1', 'b2', 'c3', 'd4', 'e5', 'f6'],\n with_replacement=True, seed=19283746, take=10):\n ... print(t)\n ('0.303241347', 'e5', 1)\n ('0.432145156', 'b2', 1)\n ('0.487135586', 'c3', 1)\n ('0.581779914', 'b2', 2)\n ('0.680782907', 'b2', 3)\n ('0.700258702', 'c3', 2)\n ('0.816686725', 'b2', 4)\n ('0.841870265', 'a1', 1)\n ('0.857737141', 'a1', 2)\n ('0.866227993', 'f6', 1)\n\n## Discussion\nThis routine is designed for use in election audits,\nwhere the ids being sampled are ballot ids, but this routine\nis suitable for general use. \n\nFor a similar election audit sampling method,\nsee Stark's election audit tools:\n https://www.stat.berkeley.edu/~stark/Vote/auditTools.htm\n\nConsistent sampling is not a new idea, see for example\nhttps://arxiv.org/abs/1612.01041\nand the references to consistent sampling therein.\nThe routine here may (or may not) be novel in that it extends consistent\nsampling to sampling with replacement.\n\nFor our applications, one big advantage of consistent sampling is the\nfollowing. If each county collects cast ballots separately, then they\ncan order their own ballots for sampling and interpretation\nindependently of what other counties are doing. An overall sample\norder can be constructed from the individual county sample order, by\nmerging the list of triples each produces into an overall sorted\norder.\n\n## Usage\nFurther documentation and examples are:\n 1. in the code `consistent_sampler.py`\n 2. obtainable by running the program ``demo_consistent_sampler.py``\n 3. described in the file USAGE_EXAMPLES.md\n 4. in the file ``docs/consistent_sampling_with_replacement.pdf``\n (This file will soon appear on arXiv as well.)\n\nThis code has been packaged and uploaded to PyPI. From python3 you can say\n\n from consistent_sampler import sampler\n\nand then say ``help(sampler)`` for more documentation, or run ``sampler``\nas in the above examples.\n\n## Efficiency\n\nThe bulk of the work is in computing the SHA256 hash function, which\ntakes about one microsecond per call on a typical laptop. Thus, the\nrunning time of ``sampler`` is proportional to the length if\n``id_list`` (to set up the priority queue) plus, if sampling is done\nwith replacement, the value of ``drop+take``, where the constant of\nproportionality is about 1 microsecond. \n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ron-rivest/consistent_sampler", "keywords": "", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "consistent-sampler", "package_url": "https://pypi.org/project/consistent-sampler/", "platform": "", "project_url": "https://pypi.org/project/consistent-sampler/", "project_urls": { "Homepage": "https://github.com/ron-rivest/consistent_sampler" }, "release_url": "https://pypi.org/project/consistent-sampler/1.0.10/", "requires_dist": null, "requires_python": "", "summary": "Package for consistent sampling with or without replacement.", "version": "1.0.10" }, "last_serial": 4220045, "releases": { "1.0.10": [ { "comment_text": "", "digests": { "md5": "00ab2b0f535acce6c97536368290baee", "sha256": "80cd24a9bfb8089b929915805744a9c3604c97314de492599bd3be43252fa0a4" }, "downloads": -1, "filename": "consistent_sampler-1.0.10-py3-none-any.whl", "has_sig": false, "md5_digest": "00ab2b0f535acce6c97536368290baee", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14098, "upload_time": "2018-08-29T19:13:43", "url": "https://files.pythonhosted.org/packages/c4/ab/e074ffa44ff637542a72518936296b6a82e3a34b0e98750affe95ac07b5d/consistent_sampler-1.0.10-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "86a7943449f72746b5f3c8dfaf5b55d6", "sha256": "89dd222271fdf0bb941f5eccee75be39b2ac0c0205af7b7023a12bab99d38fba" }, "downloads": -1, "filename": "consistent_sampler-1.0.10.tar.gz", "has_sig": false, "md5_digest": "86a7943449f72746b5f3c8dfaf5b55d6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13811, "upload_time": "2018-08-29T19:13:45", "url": "https://files.pythonhosted.org/packages/1d/8e/3571bb805337f61d7976567d19208006fcea6867416ccc4e95584e640de5/consistent_sampler-1.0.10.tar.gz" } ], "1.0.6": [ { "comment_text": "", "digests": { "md5": "6b37a7d4f464c48f1547d1d01e89b5b9", "sha256": "98177d03b47b6e69e94e6ed6254457f61ed320527c17628d1e37f3390c36660d" }, "downloads": -1, "filename": "consistent_sampler-1.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "6b37a7d4f464c48f1547d1d01e89b5b9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11280, "upload_time": "2018-08-21T00:31:53", "url": "https://files.pythonhosted.org/packages/da/e4/182272cc27171ec6702717d6d3809a52a5bbdee53571ad336dec9bdeb5cf/consistent_sampler-1.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1374e32760f0a9e1b1bf156772d828a8", "sha256": "35522c8ee791bad0787b23f116a0ce7b13a85343142170b6ef01807a0c5196e4" }, "downloads": -1, "filename": "consistent_sampler-1.0.6.tar.gz", "has_sig": false, "md5_digest": "1374e32760f0a9e1b1bf156772d828a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11622, "upload_time": "2018-08-21T00:31:54", "url": "https://files.pythonhosted.org/packages/f7/93/101b098d9226984f10cd30cb2a07c4c20d59ba63b98a143b94512378b7ca/consistent_sampler-1.0.6.tar.gz" } ], "1.0.7": [ { "comment_text": "", "digests": { "md5": "d43b95ca98866c43fb71065f822154b7", "sha256": "34694758245749474881fba3e9fb70a58831b7776f5b41bf47e54d55ca433a14" }, "downloads": -1, "filename": "consistent_sampler-1.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "d43b95ca98866c43fb71065f822154b7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2564, "upload_time": "2018-08-21T21:39:58", "url": "https://files.pythonhosted.org/packages/2c/2f/528836ccd79466415a6bc183813b1cf20954543c2f92914edab42d9f98cb/consistent_sampler-1.0.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d3d075b6bbff9c0be8989d0231d6d2dd", "sha256": "cdd436e7f9dffe76d1fb3bbbb51bb806a5d159f2a36ef7bb4b53604be95d0f5a" }, "downloads": -1, "filename": "consistent_sampler-1.0.7.tar.gz", "has_sig": false, "md5_digest": "d3d075b6bbff9c0be8989d0231d6d2dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2515, "upload_time": "2018-08-21T21:40:00", "url": "https://files.pythonhosted.org/packages/72/db/fe7fea48f0e24244efb2d72785e02cdd4c974be0242329c28d53f9b0e127/consistent_sampler-1.0.7.tar.gz" } ], "1.0.8": [ { "comment_text": "", "digests": { "md5": "662cf68f9306bf01f0f3c811fcaf0fba", "sha256": "d36dfebeaa342e970e27c61365a89b0d5742bfa64514477f7b42be759a31d2d5" }, "downloads": -1, "filename": "consistent_sampler-1.0.8-py3-none-any.whl", "has_sig": false, "md5_digest": "662cf68f9306bf01f0f3c811fcaf0fba", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11300, "upload_time": "2018-08-22T22:12:26", "url": "https://files.pythonhosted.org/packages/19/6c/ab54e1870c022a21f8579418a67dba2f58d2bc36c4f24b4d57285766743c/consistent_sampler-1.0.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "acf42ef266709416ebcae0500b964ea5", "sha256": "d329f030326e4b84ec75357cf4bcaefa5305e202ee7121b594a538ba63bf86cc" }, "downloads": -1, "filename": "consistent_sampler-1.0.8.tar.gz", "has_sig": false, "md5_digest": "acf42ef266709416ebcae0500b964ea5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11628, "upload_time": "2018-08-22T22:12:27", "url": "https://files.pythonhosted.org/packages/d2/57/7d8f42b359331066b1a1dd08dd46ed342f687b351b1cb19595291a662688/consistent_sampler-1.0.8.tar.gz" } ], "1.0.9": [ { "comment_text": "", "digests": { "md5": "c08a3e9b55aad339c74f10d3729f4eab", "sha256": "841fc386144aa99c88298ef8bb64eaf80afbda5e346d1b33ecbf32bb7758c54d" }, "downloads": -1, "filename": "consistent_sampler-1.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "c08a3e9b55aad339c74f10d3729f4eab", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11941, "upload_time": "2018-08-22T23:10:52", "url": "https://files.pythonhosted.org/packages/7c/8e/431c3b61e5d898c914db4634fd67f935a348afd4bb1f5f3f8347624c3671/consistent_sampler-1.0.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "04064b5011e19970da1932146e09a7e5", "sha256": "dbe9fdb0ccf44063bcbe6927dd2ca39696998d7aab68ca088061dd7359fdb828" }, "downloads": -1, "filename": "consistent_sampler-1.0.9.tar.gz", "has_sig": false, "md5_digest": "04064b5011e19970da1932146e09a7e5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12843, "upload_time": "2018-08-22T23:10:53", "url": "https://files.pythonhosted.org/packages/8b/d9/40cc15c557d788f4e2faf2a0756d82e515a9144f5eb83ed6a042b75feff7/consistent_sampler-1.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "00ab2b0f535acce6c97536368290baee", "sha256": "80cd24a9bfb8089b929915805744a9c3604c97314de492599bd3be43252fa0a4" }, "downloads": -1, "filename": "consistent_sampler-1.0.10-py3-none-any.whl", "has_sig": false, "md5_digest": "00ab2b0f535acce6c97536368290baee", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14098, "upload_time": "2018-08-29T19:13:43", "url": "https://files.pythonhosted.org/packages/c4/ab/e074ffa44ff637542a72518936296b6a82e3a34b0e98750affe95ac07b5d/consistent_sampler-1.0.10-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "86a7943449f72746b5f3c8dfaf5b55d6", "sha256": "89dd222271fdf0bb941f5eccee75be39b2ac0c0205af7b7023a12bab99d38fba" }, "downloads": -1, "filename": "consistent_sampler-1.0.10.tar.gz", "has_sig": false, "md5_digest": "86a7943449f72746b5f3c8dfaf5b55d6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13811, "upload_time": "2018-08-29T19:13:45", "url": "https://files.pythonhosted.org/packages/1d/8e/3571bb805337f61d7976567d19208006fcea6867416ccc4e95584e640de5/consistent_sampler-1.0.10.tar.gz" } ] }