{ "info": { "author": "Arthur Mensch", "author_email": "arthur.mensch@m4x.org", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: Unix", "Programming Language :: C", "Programming Language :: Python", "Programming Language :: Python :: 3.5", "Topic :: Scientific/Engineering", "Topic :: Software Development" ], "description": "# MODL: Massive Online Dictionary Learning\n\n[![Travis](https://travis-ci.org/arthurmensch/modl.svg?branch=master)](https://travis-ci.org/arthurmensch/modl)\n[![Coveralls](https://coveralls.io/repos/github/arthurmensch/modl/badge.svg?branch=master)](https://coveralls.io/github/arthurmensch/modl?branch=master)\n\nThis Python package ([webpage](https://github.com/arthurmensch/modl)) implements the following two papers:\n\n>Arthur Mensch, Julien Mairal, Bertrand Thirion, Ga\u00ebl Varoquaux.\n[Stochastic Subsampling for Factorizing Huge Matrices](https://hal.archives-ouvertes.fr/hal-01431618v1). 2017.\n\n>Arthur Mensch, Julien Mairal, Bertrand Thirion, Ga\u00ebl Varoquaux.\n[Dictionary Learning for Massive Matrix Factorization](https://hal.archives-ouvertes.fr/hal-01308934v2). International Conference\n on Machine Learning, Jun 2016, New York, United States. 
2016\n\nIt performs sparse or dense matrix factorization on fully observed data or data with missing entries very efficiently, by combining random subsampling with online learning.\nIt can factorize terabyte-scale matrices with hundreds of components in the latent space in a few hours.\n\nThis package makes it possible to reproduce the\n experiments and figures from the papers.\n\nMore importantly, it provides [scikit-learn](https://github.com/scikit-learn/scikit-learn) compatible\n estimators that fully implement the proposed algorithms.\n\n## Installing from source with pip\n\nInstallation from source is simple. In a command prompt:\n\n```\ngit clone https://github.com/arthurmensch/modl.git\ncd modl\npip install -r requirements.txt\npip install .\ncd $HOME\npy.test --pyargs modl\n```\n\n*This package is only tested with Python 3.5+!*\n\n## Core code\n\nThe package essentially provides four estimators:\n\n- `DictFact`, which computes a matrix factorization from NumPy arrays\n- `fMRIDictFact`, which computes sparse spatial maps from fMRI images\n- `ImageDictFact`, which computes a patch dictionary from an image\n- `RecsysDictFact`, which predicts scores using a collaborative filtering approach\n\n\n## Examples\n\n### fMRI decomposition\n\nA fast-running example that decomposes a small resting-state fMRI dataset into a 70-component map is provided:\n\n```\npython examples/decompose_fmri.py\n```\n\nIt can be adapted to run on the 2TB HCP dataset by changing the source parameter to 'hcp' (you will need to download the data first).\n\n### Hyperspectral images\n\nA fast-running example that extracts the patches of an HD image can be run with:\n\n```\npython examples/decompose_image.py\n```\n\nIt can be adapted to run on AVIRIS data by changing the image source to 'aviris' in the file.\n\n### Recommender systems\n\nOur core algorithm can be run to perform collaborative filtering very efficiently:\n\n```\npython examples/recsys_compare.py\n```\n\nYou will need to download 
datasets beforehand:\n\n```\nmake download-movielens1m\nmake download-movielens10m\n```\n\n## Future work\n\n- Remove the `sacred` dependency\n- Release a fetcher for HCP from the S3 bucket\n- Release examples with larger datasets and benchmarks\n\n## Contributions\n\nPlease feel free to report any issue and propose improvements on GitHub.\n\n## References\n\nRelated projects:\n - [spira](https://github.com/mblondel/spira) is a Python library to perform collaborative filtering based on coordinate descent. It serves as the baseline for recsys experiments; we include its code directly for simplicity.\n - [scikit-learn](https://github.com/scikit-learn/scikit-learn) is a Python library for machine learning. It serves as the basis of this project.\n - [nilearn](https://github.com/nilearn/nilearn) is a neuroimaging library that we wrap in our fMRI-related estimators.\n\n## Author\n\nLicensed under simplified BSD.\n\nArthur Mensch, 2015 - present\n\n", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/arthurmensch/modl", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/arthurmensch/modl", "keywords": "", "license": "new BSD", "maintainer": "", "maintainer_email": "", "name": "modl", "package_url": "https://pypi.org/project/modl/", "platform": "", "project_url": "https://pypi.org/project/modl/", "project_urls": { "Download": "https://github.com/arthurmensch/modl", "Homepage": "https://github.com/arthurmensch/modl" }, "release_url": "https://pypi.org/project/modl/0.6.1.1/", "requires_dist": null, "requires_python": "", "summary": "Subsampled Online Matrix Factorization in Python", "version": "0.6.1.1" }, "last_serial": 4386062, "releases": { "0.6.1": [ { "comment_text": "", "digests": { "md5": "b5079909162065ed87a4e023391976e6", "sha256": "9e68ad7cc01c1d48b6fa34b01a2839a5ec951f438b1980a3ca501cc57644dc26" }, "downloads": -1, "filename": "modl-0.6.1.tar.gz", "has_sig": false, "md5_digest": 
"b5079909162065ed87a4e023391976e6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1119412, "upload_time": "2018-10-17T11:20:01", "url": "https://files.pythonhosted.org/packages/dd/e1/e1624f0bf7bd4ae310fed537f4ff5f111910be6883ad5e42edd1799f98a5/modl-0.6.1.tar.gz" } ], "0.6.1.1": [ { "comment_text": "", "digests": { "md5": "329fdf6d58856ece56d7bf34c33e0c6a", "sha256": "7df69946b1cb7232cd1688ed6cc5836fd1a5280a040362ae87ccd32de7e6b053" }, "downloads": -1, "filename": "modl-0.6.1.1.tar.gz", "has_sig": false, "md5_digest": "329fdf6d58856ece56d7bf34c33e0c6a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1119420, "upload_time": "2018-10-17T11:48:08", "url": "https://files.pythonhosted.org/packages/c6/1b/a80192216e6cb547c142aec6e72c2d26168d80f465f2eacd67bbde5991f4/modl-0.6.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "329fdf6d58856ece56d7bf34c33e0c6a", "sha256": "7df69946b1cb7232cd1688ed6cc5836fd1a5280a040362ae87ccd32de7e6b053" }, "downloads": -1, "filename": "modl-0.6.1.1.tar.gz", "has_sig": false, "md5_digest": "329fdf6d58856ece56d7bf34c33e0c6a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1119420, "upload_time": "2018-10-17T11:48:08", "url": "https://files.pythonhosted.org/packages/c6/1b/a80192216e6cb547c142aec6e72c2d26168d80f465f2eacd67bbde5991f4/modl-0.6.1.1.tar.gz" } ] }