{ "info": { "author": "Christopher Moody", "author_email": "chrisemoody@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Flexi Hash Embeddings\n\nThis PyTorch Module hashes and sums variably-sized dictionaries of \nfeatures into a single fixed-size embedding. \nFeature keys are hashed, which\nis ideal for streaming contexts and online-learning\nsuch that we don't have to memorize a\nmapping between feature keys and indices.\nMultiple variable-length features are grouped by example\nand then summed. Feature embeddings are scaled by their\nvalues, enabling linear features rather than just one-hot\nfeatures.\n\nUses the wonderful [torch scatter](https://github.com/rusty1s/pytorch_scatter)\nlibrary and sklearn's feature hashing under the hood.\n\nSo for example:\n\n```python\n>>> from flexi_hash_embedding import FlexiHashEmbedding\n>>> X = [{'dog': 1, 'cat':2, 'elephant':4},\n {'dog': 2, 'run': 5}]\n>>> embed = FlexiHashEmbedding(dim=5)\n>>> embed(X)\ntensor([[ 1.0753, -5.3999, 2.6890, 2.4224, -2.8352],\n [ 2.9265, 5.1440, -4.1737, -12.3070, -8.2725]],\n grad_fn=)\n```\n\n## Example\nFrequently, we'll have data for a single event or user\nthat's, for example, in a JSON blob format which lends itself\nto features which may be missing, incomplete or never before seen.\nFurthermore, there may be a variable number of features defined.\nThis use case is ideal for feature hashing and for groupby summing\nof feature embeddings.\n\nIn this example we have a total of six features across our whole dataset,\nbut we compute three vectors, one for every input row:\n\n![img](https://i.imgur.com/OBkiM7T.png)\n\n\nIn the example above we have a total of six features but they're\nspread out across three clients. The first client has three active\nfeatures, the second client two features \n(and only one feature that overlaps with the first client)\nand the third client has a single feature active.\n`Flexi Hash Embeddings` returns three vectors, one for each client,\nand not six vectors even though there are six features present.\nThe first client's vector is a sum of three feature vectors \n(plus_flag, age_under, luxe_flag) \nwhile the second client's vector is a sum of just two feature vectors\n(lppish, luxe_flag)\nand the third client's vector is just a single feature.\n\n## Speed\n\nOnline feature hasing and groupby are relatively fast. \nFor a large batchsize of 4096 with on average 5 features per row\nequals 20,000 total features. This module will hash that many features\nin about 20ms on a modern MacBook Pro.\n\n## Installation\n\nInstall from PyPi do `pip install flexi-hash-embedding`\n\nInstall locally by doing `git@github.com:cemoody/flexi_hash_embedding.git`.\n\n## Testing\n\n```bash\n>>> pip install -e .\n>>> py.test\n```\n\nTo publish a new package version:\n\n```bash\npython3 setup.py sdist bdist_wheel\ntwine upload dist/*\npip install --index-url https://pypi.org/simple/ --no-deps flexi_hash_embedding\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/cemoody/flexi_hash_embedding/archive/0.0.2.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cemoody/flexi_hash_embedding", "keywords": "pytorch,scatter,groupby,embedding,hashing,variable,fixed", "license": "", "maintainer": "", "maintainer_email": "", "name": "flexi-hash-embedding", "package_url": "https://pypi.org/project/flexi-hash-embedding/", "platform": "", "project_url": "https://pypi.org/project/flexi-hash-embedding/", "project_urls": { "Download": "https://github.com/cemoody/flexi_hash_embedding/archive/0.0.2.tar.gz", "Homepage": "https://github.com/cemoody/flexi_hash_embedding" }, "release_url": "https://pypi.org/project/flexi-hash-embedding/0.0.2/", "requires_dist": [ "torch", "torch-scatter", "sklearn" ], "requires_python": "", "summary": "PyTorch Extension Library of Optimized Scatter Operations", "version": "0.0.2" }, "last_serial": 5710743, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "c412b02e2cb2e272bc116b5715a34277", "sha256": "c9138e7fc08b509609593249101b40f385dbe150d397536602001a1c712270d8" }, "downloads": -1, "filename": "flexi_hash_embedding-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "c412b02e2cb2e272bc116b5715a34277", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 4327, "upload_time": "2019-08-21T16:22:14", "url": "https://files.pythonhosted.org/packages/ad/1f/9e834e798b0998c380186edd50a6bea7c61789ef712e47e1df1578ca4be6/flexi_hash_embedding-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c84a1f943452c8d11e834dd6217bb3ab", "sha256": "be123e20e3caf229898c4f9c8aeb46a146a669a8190d4523986f52a64f95bf5b" }, "downloads": -1, "filename": "flexi_hash_embedding-0.0.1.tar.gz", "has_sig": false, "md5_digest": "c84a1f943452c8d11e834dd6217bb3ab", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3848, "upload_time": "2019-08-21T16:22:16", "url": "https://files.pythonhosted.org/packages/93/33/5bc49df11ecf5d4482a364dfe9558e90c732171b538250ae14f33451e939/flexi_hash_embedding-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "2f176a5a4cfe365d34d3c997a5db1acb", "sha256": "adf7a849d365dfd3020ac141c8242e10c747772b875963b65b36d274103bc7ce" }, "downloads": -1, "filename": "flexi_hash_embedding-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "2f176a5a4cfe365d34d3c997a5db1acb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5079, "upload_time": "2019-08-21T17:04:08", "url": "https://files.pythonhosted.org/packages/35/ef/a3cd7ab618e11cfa3e76e9f24b1e50e7b5f3fa753d114a0e6bc8c91539af/flexi_hash_embedding-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3241556f6c25501351b4d1bfafcb6daa", "sha256": "096f4f76ce3e78474e74d9fc57495f2dfa8fa53143d4a0f1ac55f00d97490e14" }, "downloads": -1, "filename": "flexi_hash_embedding-0.0.2.tar.gz", "has_sig": false, "md5_digest": "3241556f6c25501351b4d1bfafcb6daa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4785, "upload_time": "2019-08-21T17:04:09", "url": "https://files.pythonhosted.org/packages/5f/fc/46992faf2f9ca79166f3c2ad7f90372e57391b34759e1dddb2f22dc3931e/flexi_hash_embedding-0.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2f176a5a4cfe365d34d3c997a5db1acb", "sha256": "adf7a849d365dfd3020ac141c8242e10c747772b875963b65b36d274103bc7ce" }, "downloads": -1, "filename": "flexi_hash_embedding-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "2f176a5a4cfe365d34d3c997a5db1acb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5079, "upload_time": "2019-08-21T17:04:08", "url": "https://files.pythonhosted.org/packages/35/ef/a3cd7ab618e11cfa3e76e9f24b1e50e7b5f3fa753d114a0e6bc8c91539af/flexi_hash_embedding-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3241556f6c25501351b4d1bfafcb6daa", "sha256": "096f4f76ce3e78474e74d9fc57495f2dfa8fa53143d4a0f1ac55f00d97490e14" }, "downloads": -1, "filename": "flexi_hash_embedding-0.0.2.tar.gz", "has_sig": false, "md5_digest": "3241556f6c25501351b4d1bfafcb6daa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4785, "upload_time": "2019-08-21T17:04:09", "url": "https://files.pythonhosted.org/packages/5f/fc/46992faf2f9ca79166f3c2ad7f90372e57391b34759e1dddb2f22dc3931e/flexi_hash_embedding-0.0.2.tar.gz" } ] }