{ "info": { "author": "Meng Pan", "author_email": "meng.pan95@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Libraries" ], "description": "## FeatureTools for Spark (featuretools4s)\n\n### 1. What's FeatureTools? \nFeatureTools is a Python library open-sourced by MIT's \nFeatureLab that aims to automate \nthe process of feature engineering in Machine Learning \napplications. \n\nPlease visit the [official website](https://docs.featuretools.com/index.html)\nfor more details about FeatureTools. \n\n*FeatureTools4S* is a Python library I wrote to scale \nFeatureTools with **Spark**, so that it can generate \nfeatures for billions of rows of data, a volume that is usually \nconsidered impossible to process on a single machine using \nthe original FeatureTools library with Pandas. \n\n*FeatureTools4S* provides **almost the same** API as the original \nFeatureTools, so users can move between FeatureTools and \nFeatureTools4S with little effort. **Hence we suggest that readers \nlearn FeatureTools first; working with FeatureTools4S should then be easy.**\n\n### 2. How to use FeatureTools4S? 
\nFirst, install *featuretools4s* with pip: \n```bash\npip3 install featuretools4s \n``` \n\nA simple example of using *featuretools4s* follows: \n```python\nimport featuretools4s as fts\nfrom pyspark.sql import SparkSession\n\nimport os\nimport pandas as pd\n\n# Windows example paths; adjust them to your own Python/Spark installation.\n# Raw strings keep the backslashes from being read as escape sequences.\nos.environ[\"SPARK_HOME\"] = r\"C:\\Python36\\Lib\\site-packages\\pyspark\"\nos.environ[\"PATH\"] = r\"C:\\Python36;\" + os.environ[\"PATH\"]\npd.set_option('display.expand_frame_repr', False)\nspark = SparkSession.builder.master(\"local[*]\").getOrCreate()\n\n# Load the two source tables as Spark DataFrames.\norder_df = spark.read.csv(\"resources/order.csv\", header=True, inferSchema=True).sort(\"sales_tax\")\ncustomer_df = spark.read.csv(\"resources/customer.csv\", header=True, inferSchema=True)\n\n# Build the entity set and link each order to its customer.\nes = fts.EntitySetSpark(id=\"test\")\nes.entity_from_dataframe(\"order\", order_df, index=\"order_num\", time_index=\"wo_timestamp\")\nes.entity_from_dataframe(\"customer\", customer_df, index=\"cust_num\")\nes.add_relationship(fts.Relationship(es[\"customer\"][\"cust_num\"], es[\"order\"][\"cust_num\"]))\n\n# Generate features for the customer entity on Spark.\nfeatures = fts.dfs(spark, entityset=es, target_entity=\"customer\", primary_col=\"cust_num\", num_partition=5)\nfeatures.show()\n```", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pan5431333/featuretools4s", "keywords": "", "license": "BSD License", "maintainer": "Meng Pan", "maintainer_email": "meng.pan95@gmail.com", "name": "featuretools4s", "package_url": "https://pypi.org/project/featuretools4s/", "platform": "all", "project_url": "https://pypi.org/project/featuretools4s/", "project_urls": { "Homepage": "https://github.com/pan5431333/featuretools4s" }, "release_url": "https://pypi.org/project/featuretools4s/0.1.5/", "requires_dist": null, "requires_python": "", "summary": "Run FeatureTools to automate Feature Engineering distributionally on Spark.", "version": "0.1.5" }, "last_serial": 4363199, "releases": { "0.1": [ { 
"comment_text": "", "digests": { "md5": "f637ab2d2cb22e16cdf090c596b71b70", "sha256": "c22f8b6aef5ca89b1c761fefd72335031a003b49310c78bf2a0c87a70113a6db" }, "downloads": -1, "filename": "featuretools4s-0.1.tar.gz", "has_sig": false, "md5_digest": "f637ab2d2cb22e16cdf090c596b71b70", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1689, "upload_time": "2018-10-10T08:08:43", "url": "https://files.pythonhosted.org/packages/52/3d/e884564ff4828bcdb9a46fe9c346bc73403ab756a9818e280488f4b7dcab/featuretools4s-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "06307e066f0b34754376226b939a9b44", "sha256": "396b32fc003cdcf7447f8c0878d02e5068e6ace30eb74e3a361073f1e12a021b" }, "downloads": -1, "filename": "featuretools4s-0.1.1.tar.gz", "has_sig": false, "md5_digest": "06307e066f0b34754376226b939a9b44", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2332, "upload_time": "2018-10-10T08:42:09", "url": "https://files.pythonhosted.org/packages/7c/f0/1546dcfef826248a1f65e2dde58234514cc93e8463f3e7dcd5ac656a0abe/featuretools4s-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "16a7ca9e3b169bffdf9c240f4672f871", "sha256": "4c63f6bb315a0627e3fb4018f2ba4fe0d2b15ca06d2a8480b8307fc6d10ac601" }, "downloads": -1, "filename": "featuretools4s-0.1.2.tar.gz", "has_sig": false, "md5_digest": "16a7ca9e3b169bffdf9c240f4672f871", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4706, "upload_time": "2018-10-10T08:56:20", "url": "https://files.pythonhosted.org/packages/8a/81/e57eb621568363578bbe8dedfb8c21ed5c51fcf8605ecb6f39201757a527/featuretools4s-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "93f3f26046cae4cac7821dde19f68372", "sha256": "695b03e830662184c19fac6299e6fb46560f3eff2ac72ba253e6afe45d9624a7" }, "downloads": -1, "filename": "featuretools4s-0.1.3.tar.gz", "has_sig": false, "md5_digest": 
"93f3f26046cae4cac7821dde19f68372", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4727, "upload_time": "2018-10-10T09:00:31", "url": "https://files.pythonhosted.org/packages/2c/80/fa4962d3ccf6b423a079c81f2fba4f6565057358894db930f6e1287e6bb2/featuretools4s-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "ab208a3469566ca0b4d22e6b8af73a39", "sha256": "893802b299d9849307e3f09d9381b202fde4896ecb1d2f04b26f2083bc17c382" }, "downloads": -1, "filename": "featuretools4s-0.1.4.tar.gz", "has_sig": false, "md5_digest": "ab208a3469566ca0b4d22e6b8af73a39", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4731, "upload_time": "2018-10-10T09:06:32", "url": "https://files.pythonhosted.org/packages/4e/2e/e15cb41ff39818601734f8ab57f3975c7c6e77677298efbf30b53aeb2a1b/featuretools4s-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "f1db433307473ff8a9e4131c61796903", "sha256": "676bffb6a7fa8d49dde81e2c3ce00fbf766a00f2fc7ace9218cc4c116a8194a5" }, "downloads": -1, "filename": "featuretools4s-0.1.5.tar.gz", "has_sig": false, "md5_digest": "f1db433307473ff8a9e4131c61796903", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4954, "upload_time": "2018-10-11T07:39:03", "url": "https://files.pythonhosted.org/packages/f8/62/7bc8a472ff1472924ba379092542c609871382280082011dda4c50d81d12/featuretools4s-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f1db433307473ff8a9e4131c61796903", "sha256": "676bffb6a7fa8d49dde81e2c3ce00fbf766a00f2fc7ace9218cc4c116a8194a5" }, "downloads": -1, "filename": "featuretools4s-0.1.5.tar.gz", "has_sig": false, "md5_digest": "f1db433307473ff8a9e4131c61796903", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4954, "upload_time": "2018-10-11T07:39:03", "url": 
"https://files.pythonhosted.org/packages/f8/62/7bc8a472ff1472924ba379092542c609871382280082011dda4c50d81d12/featuretools4s-0.1.5.tar.gz" } ] }