{ "info": { "author": "giantcroc", "author_email": "1204449533@qq.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# featuretoolsOnSpark\n[Featuretools](https://github.com/Featuretools/featuretools) is a Python library for automated feature engineering.\n\nThis repo is a simplified version of Featuretools that reuses its automatic feature generation framework. Instead of the cumbersome back-end architecture of Featuretools, it mainly uses [Spark DataFrame](http://spark.apache.org/docs/latest/api/python/index.html#) to make feature generation faster (a speedup of 10x or more).\n\n## Installation\nInstall with pip\n\n\tpip install featuretoolsOnSpark\n\nInstall from source\n\n\tgit clone https://github.com/giantcroc/featuretoolsOnSpark.git\n\tcd featuretoolsOnSpark\n\tpython setup.py install\n\n## Example\nBelow is an example of how to use the APIs of this repo. We use the dataset from the Kaggle competition [Home-Credit-Default-Risk](https://www.kaggle.com/c/home-credit-default-risk/data). The relationships between the tables are shown in the picture below.\n\n
\n
\n