{
"info": {
"author": "Nikita Savelyev",
"author_email": "n.a.savelyev@gmail.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Programming Language :: Python :: 2",
"Programming Language :: Python :: 3",
"Topic :: Scientific/Engineering",
"Topic :: Software Development"
],
"description": "Datapot\n=======\n\n|Build Status| *Open source tool for machine learning on semi-structured\ndata that creates numeric object-feature matrix from JSON. The idea of\nDatapot is to make the process of data preparation and feature\nextraction automatic, easy and effective.*\n\nUsage\n-----\n\n**Install Datapot:**\n\n.. code:: bash\n\n $ git clone https://github.com/bashalex/datapot.git\n $ cd datapot\n $ pip install .\n\nTo **create a Datapot** object simply write the following:\n\n.. code:: python\n\n >>> import datapot as dp \n >>> data = dp.DataPot()\n\nDataPot has two main methods:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- fit()\n- transform()\n\nMethod ``fit(self, data, limit)`` goes through the first N objects (N =\nlimit), passes the possible features to Transformers. Each Transformer\nevaluates if a feature from current field or a number of fields can be\ncreated. As a result a dict of features and Transformers is created.\n\nTo apply ``fit()`` to JSON file:\n\n.. code:: python\n\n >>> f = open('data/matches_test.jsonlines', 'r')\n >>> data.fit(f, limit=100)\n >>> data\n DataPot class instance\n - number of features without transformation: 806\n - number of new features: 315\n features to transform: \n (u'players.0.gold_t', [ComplexTransformer])\n (u'picks_bans.0.is_pick', [BoolToIntTransformer])\n (u'players.0.kills_log.0.unit', [TfidfTransformer])\n (u'players.1.xp_t', [ComplexTransformer])\n (u'picks_bans.1.is_pick', [BoolToIntTransformer])\n (u'players.1.kills_log.0.unit', [TfidfTransformer])\n ...\n\nMethod ``transform(self, data, verbose)`` generates a pandas. DataFrame\nwith new features that were detected on the fit() call. If parameter\nverbose is true, progress description is printed during the feature\nextraction.\n\n.. code:: python\n\n >>> df = data.transform(f, verbose=False)\n fit transformers...OK\n num of new features: 315\n\nExamples\n--------\n\nLook for `more examples `__ of using Datapot with\ndifferent datasets and more Transformer specific.\n\nFeatures\n--------\n\nDatapot provides many ways of extracting features from JSON-s.\n\nData types that can be processed: - Boolean - Numerical array (transform\narray to their sum divided by average length of array in training set) -\nTime series (\u0441alculate descriptive statistical properties of a given\ntime series) - Timestamp (date, time, day of week, day of month etc.) -\nText (bag of words tf-idf, word2vec) - Categorial (one-hot encoding,\ndimension reduction)\n\nAuthors\n-------\n\n- Alex Bash\n- Yuriy Mokriy\n- Nikita Savelyev\n- Michal Rozenwald\n- Peter Romov\n\nDatapot is a course work project of `the Faculty of Computer\nScience `__ of `the Higher School of\nEconomics `__.\n\n.. |Build Status| image:: https://travis-ci.org/bashalex/datapot.svg?branch=master\n :target: https://travis-ci.org/bashalex/datapot",
"description_content_type": null,
"docs_url": null,
"download_url": "UNKNOWN",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/bashalex/datapot",
"keywords": null,
"license": "GNU v3.0",
"maintainer": null,
"maintainer_email": null,
"name": "datapot",
"package_url": "https://pypi.org/project/datapot/",
"platform": "UNKNOWN",
"project_url": "https://pypi.org/project/datapot/",
"project_urls": {
"Download": "UNKNOWN",
"Homepage": "https://github.com/bashalex/datapot"
},
"release_url": "https://pypi.org/project/datapot/0.1.3/",
"requires_dist": null,
"requires_python": null,
"summary": "Library for automatic feature extraction from JSON-datasets",
"version": "0.1.3"
},
"last_serial": 2910738,
"releases": {
"0.1": [],
"0.1.2": [
{
"comment_text": "",
"digests": {
"md5": "63eed56ddbe05e994ad9d92d26a6105b",
"sha256": "3eb99671d5979dc11e7c4dee6556aacc1939a585b809e2ae6483364dd6b48c09"
},
"downloads": -1,
"filename": "datapot-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "63eed56ddbe05e994ad9d92d26a6105b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 15243,
"upload_time": "2017-05-30T20:09:15",
"url": "https://files.pythonhosted.org/packages/95/00/820591223fcf5c1d4aaebe973d38cd8d58e982cc91669d8b22a5e077b5ee/datapot-0.1.2.tar.gz"
}
],
"0.1.3": [
{
"comment_text": "",
"digests": {
"md5": "e92a2b2a3aa36d3650c5e98123a1bd96",
"sha256": "06696991b3477e4af9ae8753621607ac9015e6361eb0dcb2ce431e06863158ba"
},
"downloads": -1,
"filename": "datapot-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "e92a2b2a3aa36d3650c5e98123a1bd96",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 15247,
"upload_time": "2017-05-30T20:17:19",
"url": "https://files.pythonhosted.org/packages/3c/5b/639e1534777e934cbb4fdec8abcdfbeb5a17aee045b6d29b7207a637367d/datapot-0.1.3.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "e92a2b2a3aa36d3650c5e98123a1bd96",
"sha256": "06696991b3477e4af9ae8753621607ac9015e6361eb0dcb2ce431e06863158ba"
},
"downloads": -1,
"filename": "datapot-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "e92a2b2a3aa36d3650c5e98123a1bd96",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 15247,
"upload_time": "2017-05-30T20:17:19",
"url": "https://files.pythonhosted.org/packages/3c/5b/639e1534777e934cbb4fdec8abcdfbeb5a17aee045b6d29b7207a637367d/datapot-0.1.3.tar.gz"
}
]
}