{ "info": { "author": "Alan Kang", "author_email": "alankang@boxnwhis.kr", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Internet :: Log Analysis", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: System :: Logging", "Topic :: System :: Networking :: Monitoring", "Topic :: System :: Systems Administration", "Topic :: Text Processing :: Filters" ], "description": "csample: Sampling library for Python\n====================================\n\n|travismaster| |coverage|\n\n.. |travismaster| image:: https://travis-ci.org/box-and-whisker/csample.svg\n :target: http://travis-ci.org/box-and-whisker/csample\n\n.. |coverage| image:: https://img.shields.io/coveralls/box-and-whisker/csample.svg\n :target: https://coveralls.io/r/box-and-whisker/csample?branch=master\n\n``csample`` provides pseudo-random sampling methods applicable when the size\nof population is unknown:\n\n* Use hash-based sampling to fix sampling rate\n* Use reservoir sampling to fix sample size\n\nHash-based sampling\n===================\n\nHash-based sampling is a filtering method that tries to approximate random\nsampling by using a hash function as a selection criterion.\n\nFollowing list describes some features of the method:\n\n* Since there are no randomness involved at all, the same data set with the\n same sampling rate (and also with the same salt value) always yields\n exactly the same result.\n* The size of population doesn't need to be specified beforehand. It means\n that the sampling process can be applied to data stream with unknown size\n such as system logs.\n\nHere are some real and hypothetical applications:\n\n* `[RFC5475] Sampling and Filtering Techniques for IP Packet Selection `_\n is a well-known application.\n* Online streaming algorithm to select 10% of users for A/B testing.\n \"Consistent\" nature of the algorithm guarantees that any user ID selected\n once will always be selected again. There's no need to maintain a list of\n selected user IDs.\n\n``csample`` provides two sampling functions for a convenience.\n\n``sample_line()`` accepts iterable type containing strs::\n\n data = [\n 'alan',\n 'brad',\n 'cate',\n 'david',\n ]\n samples = csample.sample_line(data, 0.5)\n\n``sample_tuple()`` expects tuples instead of strs as a content of\niterable. The third argument 0 indicates a column index::\n\n data = [\n ('alan', 10, 5),\n ('brad', 53, 7),\n ('cate', 12, 6),\n ('david', 26, 5),\n ]\n samples = csample.sample_tuple(data, 0.5, 0)\n\nIn both cases, the function returns immediately with sampled iterable.\n\n\nReservoir sampling\n==================\n\nReservoir sampling is a family of randomized algorithms for randomly choosing\na sample of k items from a list S containing n items, where n is either a very\nlarge or unknown number.\n\nYou can specify random seed to perform reproducible sampling.\n\nFor more information, read `Wikipedia `_\n\n``csample`` provides single function for reservoir sampling::\n\n data = [\n 'alan',\n 'brad',\n 'cate',\n 'david',\n ]\n samples = csample.reservoir(data, 2)\n\nResulting ``samples`` contains two elements randomly choosen from given ``data``.\n\nNote that the function doesn't return a generator but list, and also won't\nfinish until it consume the entire input stream.\n\nAlso note that, by default, reservoir sampling doesn't preserve order of original\nlist which means that following assertion holds in general::\n\n population = [0, 1, 2, 3, 4, 5]\n samples = csample.reservoir(population, 3)\n assert sorted(samples) != samples\n\nTo maintain original order, provide ``keep_order=True`` parameter::\n\n population = [0, 1, 2, 3, 4, 5]\n samples = csample.reservoir(population, 3, keep_order=True)\n assert sorted(samples) == samples\n\n\nAPI documentation\n=================\n\nRead the `full API documentation. `_\n\n\nCommand-line interface\n======================\n\n``csample`` also provides command-line interface.\n\nFollowing command prints 50% sample from 100 integers::\n\n > seq 100 | csample -r 0.5\n\nTo see more options use ``--help`` command-line argument::\n\n > csample --help\n\n\nHash functions\n==============\n\nIn order to obtain fairly random/unbiased sample, it is critical to use suitable\nhash function.\n\nThere could be many criteria such as `avalanche effect `_.\nFor those who are interested, see link below:\n\n* `Empirical Evaluation of Hash Functions for Multipoint Measurements `_\n\nHash-based sampling implemented in ``csample`` currently supports `xxhash`_\nand `spooky`_.\n\n.. _xxhash: https://code.google.com/p/xxhash/\n.. _spooky: http://burtleburtle.net/bob/hash/spooky.html\n\n\nInstallation\n============\n\nInstalling csample is easy::\n\n pip install csample\n\nor download the source and run::\n\n python setup.py install", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/box-and-whisker/csample", "keywords": "hash filter sample log analysis streaming", "license": "MIT License", "maintainer": null, "maintainer_email": null, "name": "csample", "package_url": "https://pypi.org/project/csample/", "platform": "any", "project_url": "https://pypi.org/project/csample/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/box-and-whisker/csample" }, "release_url": "https://pypi.org/project/csample/0.6.2/", "requires_dist": null, "requires_python": null, "summary": "Sampling library for Python", "version": "0.6.2" }, "last_serial": 1782251, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "c58d366004fc200eff7c19b265dfb191", "sha256": "255a058d62c76951b3f5c466f647d6ef141b5cb39af2d54c8e52f6ba9a1ebea3" }, "downloads": -1, "filename": "csample-0.1.0.tar.gz", "has_sig": false, "md5_digest": "c58d366004fc200eff7c19b265dfb191", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3760, "upload_time": "2014-12-11T04:26:29", "url": "https://files.pythonhosted.org/packages/cf/ec/323336c20465537713328addb64a7b61c7cec43faf81305ee04b9e8a546e/csample-0.1.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "7d42ed32a801ddfc9d6b83a2450d573a", "sha256": "b144f7ba2824bb9ea476502e12f85d2baa63f444dd2626cfd677c1f82a842b9c" }, "downloads": -1, "filename": "csample-0.2.1.tar.gz", "has_sig": false, "md5_digest": "7d42ed32a801ddfc9d6b83a2450d573a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4347, "upload_time": "2014-12-12T21:40:09", "url": "https://files.pythonhosted.org/packages/64/e3/4e64f7e01ef292b409810c9a43fa9578f276e6b933444e634d1318d54877/csample-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "e1187465b54c92fd1b25843e16b49fe1", "sha256": "3cc084a96d70b6c58cd4675136a5603cb809421ba7eee6102b9bfef9b94da793" }, "downloads": -1, "filename": "csample-0.2.2.tar.gz", "has_sig": false, "md5_digest": "e1187465b54c92fd1b25843e16b49fe1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4377, "upload_time": "2014-12-13T00:19:13", "url": "https://files.pythonhosted.org/packages/b1/64/7bac1c7057f604ef6839f28c73c380d7f7e9bba87be85e5633c6e23c32b9/csample-0.2.2.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "66f888f34d60c84fb0cd4d8efc46cb80", "sha256": "2ba326f49c2d12e8eab08426c00b603e24beef99f09bae6be208fbc61ac02047" }, "downloads": -1, "filename": "csample-0.3.0.tar.gz", "has_sig": false, "md5_digest": "66f888f34d60c84fb0cd4d8efc46cb80", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5560, "upload_time": "2014-12-21T13:58:55", "url": "https://files.pythonhosted.org/packages/70/fb/396798211731399145ebf186ff1a9c85593edb92a80e5f18eb55c2fbacaf/csample-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "8bf052ecb9641dc9fb317de003825ba7", "sha256": "40cae828cba1dd5665e8b01f63e506dab93a3aee870084f707c0260b359de7b8" }, "downloads": -1, "filename": "csample-0.3.1.tar.gz", "has_sig": false, "md5_digest": "8bf052ecb9641dc9fb317de003825ba7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5560, "upload_time": "2014-12-21T14:43:43", "url": "https://files.pythonhosted.org/packages/c0/1c/4a025eb922fb85ff6258937fca3bf353ef93ba94ada20add67352b1cd6aa/csample-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "c1575e6837a154e597e1d4fc3f37cf16", "sha256": "820e843f94b9847a27dde150c183155d858fe6188e324fe47d53c0683d95ff03" }, "downloads": -1, "filename": "csample-0.3.2.tar.gz", "has_sig": false, "md5_digest": "c1575e6837a154e597e1d4fc3f37cf16", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5565, "upload_time": "2014-12-21T14:48:29", "url": "https://files.pythonhosted.org/packages/c6/fa/44afe995e6e22a82487d604f3911f3b653798d2167821ab7a11136ba5247/csample-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "e8d34399961ebf635799dcb37fc477f8", "sha256": "c92d007816008be6e9cf8bc84950e23090acdba43637caacf85f5bd275391d51" }, "downloads": -1, "filename": "csample-0.3.3.tar.gz", "has_sig": false, "md5_digest": "e8d34399961ebf635799dcb37fc477f8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6079, "upload_time": "2015-01-05T03:02:52", "url": "https://files.pythonhosted.org/packages/76/f2/6e6c54d4aa1b79aa364845327f02817819c269ce5316092a96727bf00666/csample-0.3.3.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "9c314fa709e03d01fcd16dc05cfbd34a", "sha256": "aac7ada2f2d44988f8f68b0ec8b67334b8480604b7d98f1ba14ed1b03244901d" }, "downloads": -1, "filename": "csample-0.4.0.tar.gz", "has_sig": false, "md5_digest": "9c314fa709e03d01fcd16dc05cfbd34a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6368, "upload_time": "2015-01-23T01:06:30", "url": "https://files.pythonhosted.org/packages/45/33/3ad9bc4ccd569062544703cacd874ab3bb5844c3c1920b32156bfc828c76/csample-0.4.0.tar.gz" } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "ab63a81b3098f7b89028fe610c1a7c04", "sha256": "1b3d6345144eb32c7adf51dc229e28b458bed30c0f0eb68ffda09b0b02ed1c14" }, "downloads": -1, "filename": "csample-0.5.0.tar.gz", "has_sig": false, "md5_digest": "ab63a81b3098f7b89028fe610c1a7c04", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6567, "upload_time": "2015-01-24T02:10:11", "url": "https://files.pythonhosted.org/packages/0b/84/595a3b84a69cef8108b26e6d93637642da41f65ae5cfba902727fa3dc04d/csample-0.5.0.tar.gz" } ], "0.6.0": [ { "comment_text": "", "digests": { "md5": "c8b318e4abf1185280d58e2547faab41", "sha256": "1a72d33af30f37fe073885807527b1ef3da06c9e8ab67d6988bbaa615dcfb54e" }, "downloads": -1, "filename": "csample-0.6.0.tar.gz", "has_sig": false, "md5_digest": "c8b318e4abf1185280d58e2547faab41", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6623, "upload_time": "2015-01-24T02:23:01", "url": "https://files.pythonhosted.org/packages/1e/b8/a55f2567685faeeaca2fddce9b02d58e3db3120181e9e9197c32f19e18a5/csample-0.6.0.tar.gz" } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "7840d8a48275452c05a737a04a4275c8", "sha256": "00dab7ad7dd267ce71fc4b2a78a32e09d1e33ed7a2d34337e77ca4a86686cfbd" }, "downloads": -1, "filename": "csample-0.6.1.tar.gz", "has_sig": false, "md5_digest": "7840d8a48275452c05a737a04a4275c8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6655, "upload_time": "2015-08-21T05:22:25", "url": "https://files.pythonhosted.org/packages/b1/41/837aa835864a9dcc5cca0980011cbca1f1740077fcddb5cd67793247c008/csample-0.6.1.tar.gz" } ], "0.6.2": [ { "comment_text": "", "digests": { "md5": "01c978b8815aa0ff1364281a65eb6d66", "sha256": "afabdc064d52b0df644ff20ecc358d16ec48595322388a7da92da7fe6efda215" }, "downloads": -1, "filename": "csample-0.6.2.tar.gz", "has_sig": false, "md5_digest": "01c978b8815aa0ff1364281a65eb6d66", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6665, "upload_time": "2015-10-23T01:15:13", "url": "https://files.pythonhosted.org/packages/3d/9e/e6344a1dd98dda51d10593896672d8dc396b79d9251644793c7b0f337ad2/csample-0.6.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "01c978b8815aa0ff1364281a65eb6d66", "sha256": "afabdc064d52b0df644ff20ecc358d16ec48595322388a7da92da7fe6efda215" }, "downloads": -1, "filename": "csample-0.6.2.tar.gz", "has_sig": false, "md5_digest": "01c978b8815aa0ff1364281a65eb6d66", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6665, "upload_time": "2015-10-23T01:15:13", "url": "https://files.pythonhosted.org/packages/3d/9e/e6344a1dd98dda51d10593896672d8dc396b79d9251644793c7b0f337ad2/csample-0.6.2.tar.gz" } ] }