{ "info": { "author": "Nikita Savelyev", "author_email": "n.a.savelyev@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering", "Topic :: Software Development" ], "description": "Datapot\n=======\n\n|Build Status| *Open source tool for machine learning on semi-structured\ndata that creates numeric object-feature matrix from JSON. The idea of\nDatapot is to make the process of data preparation and feature\nextraction automatic, easy and effective.*\n\nUsage\n-----\n\n**Install Datapot:**\n\n.. code:: bash\n\n $ git clone https://github.com/bashalex/datapot.git\n $ cd datapot\n $ pip install .\n\nTo **create a Datapot** object simply write the following:\n\n.. code:: python\n\n >>> import datapot as dp \n >>> data = dp.DataPot()\n\nDataPot has two main methods:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- fit()\n- transform()\n\nMethod ``fit(self, data, limit)`` goes through the first N objects (N =\nlimit), passes the possible features to Transformers. Each Transformer\nevaluates if a feature from current field or a number of fields can be\ncreated. As a result a dict of features and Transformers is created.\n\nTo apply ``fit()`` to JSON file:\n\n.. 
code:: python\n\n >>> f = open('data/matches_test.jsonlines', 'r')\n >>> data.fit(f, limit=100)\n >>> data\n DataPot class instance\n - number of features without transformation: 806\n - number of new features: 315\n features to transform: \n (u'players.0.gold_t', [ComplexTransformer])\n (u'picks_bans.0.is_pick', [BoolToIntTransformer])\n (u'players.0.kills_log.0.unit', [TfidfTransformer])\n (u'players.1.xp_t', [ComplexTransformer])\n (u'picks_bans.1.is_pick', [BoolToIntTransformer])\n (u'players.1.kills_log.0.unit', [TfidfTransformer])\n ...\n\nMethod ``transform(self, data, verbose)`` generates a pandas.DataFrame\nwith the new features that were detected during the ``fit()`` call. If the\n``verbose`` parameter is true, a progress description is printed during\nfeature extraction.\n\n.. code:: python\n\n >>> df = data.transform(f, verbose=False)\n fit transformers...OK\n num of new features: 315\n\nExamples\n--------\n\nLook for `more examples `__ of using Datapot with\ndifferent datasets and more Transformer specifics.\n\nFeatures\n--------\n\nDatapot provides many ways of extracting features from JSON.\n\nData types that can be processed:\n\n- Boolean\n- Numerical array (transforms an array to its sum divided by the average\n  array length in the training set)\n- Time series (calculates descriptive statistical properties of a given\n  time series)\n- Timestamp (date, time, day of week, day of month etc.)\n- Text (bag of words tf-idf, word2vec)\n- Categorical (one-hot encoding, dimension reduction)\n\nAuthors\n-------\n\n- Alex Bash\n- Yuriy Mokriy\n- Nikita Savelyev\n- Michal Rozenwald\n- Peter Romov\n\nDatapot is a coursework project of `the Faculty of Computer\nScience `__ of `the Higher School of\nEconomics `__.\n\n.. 
|Build Status| image:: https://travis-ci.org/bashalex/datapot.svg?branch=master\n :target: https://travis-ci.org/bashalex/datapot", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/bashalex/datapot", "keywords": null, "license": "GNU v3.0", "maintainer": null, "maintainer_email": null, "name": "datapot", "package_url": "https://pypi.org/project/datapot/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/datapot/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/bashalex/datapot" }, "release_url": "https://pypi.org/project/datapot/0.1.3/", "requires_dist": null, "requires_python": null, "summary": "Library for automatic feature extraction from JSON-datasets", "version": "0.1.3" }, "last_serial": 2910738, "releases": { "0.1": [], "0.1.2": [ { "comment_text": "", "digests": { "md5": "63eed56ddbe05e994ad9d92d26a6105b", "sha256": "3eb99671d5979dc11e7c4dee6556aacc1939a585b809e2ae6483364dd6b48c09" }, "downloads": -1, "filename": "datapot-0.1.2.tar.gz", "has_sig": false, "md5_digest": "63eed56ddbe05e994ad9d92d26a6105b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15243, "upload_time": "2017-05-30T20:09:15", "url": "https://files.pythonhosted.org/packages/95/00/820591223fcf5c1d4aaebe973d38cd8d58e982cc91669d8b22a5e077b5ee/datapot-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "e92a2b2a3aa36d3650c5e98123a1bd96", "sha256": "06696991b3477e4af9ae8753621607ac9015e6361eb0dcb2ce431e06863158ba" }, "downloads": -1, "filename": "datapot-0.1.3.tar.gz", "has_sig": false, "md5_digest": "e92a2b2a3aa36d3650c5e98123a1bd96", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15247, "upload_time": "2017-05-30T20:17:19", "url": 
"https://files.pythonhosted.org/packages/3c/5b/639e1534777e934cbb4fdec8abcdfbeb5a17aee045b6d29b7207a637367d/datapot-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e92a2b2a3aa36d3650c5e98123a1bd96", "sha256": "06696991b3477e4af9ae8753621607ac9015e6361eb0dcb2ce431e06863158ba" }, "downloads": -1, "filename": "datapot-0.1.3.tar.gz", "has_sig": false, "md5_digest": "e92a2b2a3aa36d3650c5e98123a1bd96", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15247, "upload_time": "2017-05-30T20:17:19", "url": "https://files.pythonhosted.org/packages/3c/5b/639e1534777e934cbb4fdec8abcdfbeb5a17aee045b6d29b7207a637367d/datapot-0.1.3.tar.gz" } ] }