{ "info": { "author": "Todd Wilson", "author_email": "todd@toddwilson.net", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Programming Language :: Python", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4" ], "description": "datagen: make sh[2] up\r\n======================\r\n\r\nDatagen helps you create sample delimited data using a simple schema format.\r\nIt runs on Python 2.6-3.4 and *particularly well* on PyPy.\r\n\r\nInstallation\r\n------------\r\n\r\n``pip install datagen``\r\n\r\nOr::\r\n\r\n $ git clone https://github.com/toddwilson/datagen.git\r\n $ cd datagen\r\n $ python setup.py install\r\n\r\nUsage\r\n-----\r\n\r\n usage: datagen [-h] [-d DELIMITER] [--with-header] -n NUM_ROWS -s SCHEMA [output]\r\n\r\n\r\n**1. Create a schema file**\r\n\r\n::\r\n\r\n $ cat > schema.txt <\r\n\r\n.. code-block:: python\r\n\r\n from random import uniform\r\n from datagen.types import register_type\r\n from datagen import main\r\n\r\n\r\n @register_type(\"price\") # the decorator sets the name of the type\r\n def price(arg): # the method must accept one argument (even if not used)\r\n return round(uniform(0, 100), 2)\r\n\r\n\r\n if __name__ == '__main__':\r\n main()\r\n\r\n\r\n\r\n\r\n::\r\n\r\n item_id int[5]\r\n price price\r\n\r\n::\r\n\r\n $ python my_datagen.py -s schema.txt -n 3\r\n 41746|7.32\r\n 4077|40.55\r\n 12814|43.82\r\n\r\n\r\nAdding Arguments to Your Types\r\n++++++++++++++++++++++++++++++\r\n\r\n\r\n\r\n.. 
code-block:: python\r\n\r\n from random import uniform\r\n from datagen.types import register_type, type_arg\r\n from datagen import main\r\n\r\n\r\n @type_arg(\"price\") # Use the same name as the type defined in register_type()\r\n def price_argument(arg): # This method is passed the contents of what's in price[]\r\n return int(arg) # This will get passed to price() when iterating\r\n\r\n\r\n @register_type(\"price\") # the decorator sets the name of the type\r\n def price(max_price): # the method must accept one argument (even if not used)\r\n return round(uniform(0, max_price), 2)\r\n\r\n\r\n if __name__ == '__main__':\r\n main()\r\n\r\n\r\n\r\n\r\n::\r\n\r\n item_id int[5]\r\n price price[10]\r\n\r\n::\r\n\r\n $ python my_datagen.py -s schema.txt -n 3\r\n 66995|5.08\r\n 5894|7.86\r\n 53659|9.26\r\n\r\n\r\nPerformance\r\n-----------\r\n\r\nIf you need datagen to write faster, use PyPy::\r\n\r\n $ time python my_datagen.py -s schema.txt -n 1000000 > test_data\r\n python my_datagen.py -s schema.txt -n 1000000 > test_data 7.87s user 0.07s system 99% cpu 7.950 total\r\n\r\n $ time pypy my_datagen.py -s schema.txt -n 1000000 > test_data\r\n pypy my_datagen.py -s schema.txt -n 1000000 > test_data 2.79s user 0.06s system 99% cpu 2.863 total", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/toddwilson/datagen/tarball/1.0.1", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/toddwilson/datagen", "keywords": "data generation,sample data,hadoop", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "datagen", "package_url": "https://pypi.org/project/datagen/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/datagen/", "project_urls": { "Download": "https://github.com/toddwilson/datagen/tarball/1.0.1", "Homepage": "https://github.com/toddwilson/datagen" }, "release_url": "https://pypi.org/project/datagen/1.0.1/", "requires_dist": null, 
"requires_python": null, "summary": "Generate delimited sample data with a simple schema.", "version": "1.0.1" }, "last_serial": 1440294, "releases": { "1.0.1": [] }, "urls": [] }