{ "info": { "author": "Alejandro Mart\u00ednez", "author_email": "mara80@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.6" ], "description": "\n.. image:: https://travis-ci.org/elaeon/dama_ml.svg?branch=master\n :target: https://travis-ci.org/elaeon/dama_ml\n\n.. image:: https://api.codacy.com/project/badge/Grade/0ab998e72f4f4e31b3dc7b3c9921374a\n :target: https://www.codacy.com/app/elaeon/dama_ml?utm_source=github.com&utm_medium=referral&utm_content=elaeon/dama_ml&utm_campaign=Badge_Grade\n\n\nOverview\n=====================================\n\nDama ML is a framework for data management and is used to do data science and machine learning's pipelines, also dama-ml try to unify diverse data sources like csv, sql db, hdf5, zarr, etc, and also unify machine learning frameworks (sklearn, Keras, LigthGBM, etc) with a simplify interface.\n\nFor more detail read the docs_.\n\n.. _docs: https://elaeon.github.io/dama_ml/\n\n\nWarning\n---------------\n Although, the API is stable this work is in alpha steps and there are methods that have limited functionality or aren't implemented.\n\n\nInstallation\n=====================\n\n.. code-block:: bash\n\n git clone https://github.com/elaeon/dama_ml.git\n pip install dama_ml/\n\nor\n\n.. code-block:: bash\n\n pip install DaMa-ML\n\n\nYou can install the python dependences with pip, but we strongly recommend install the dependences with conda and conda forge.\n\n.. code-block:: bash\n\n conda config --add channels conda-forge\n conda create -n new_environment --file dama_ml/requirements.txt\n conda activate new_environment\n pip install DaMa-ML\n\n\nQuick start\n==================\n\nConfigure the data paths where all data will be saved. This can be done with help of dama_ml cli tools.\n\n.. code-block:: python\n\n $ dama-cli config --edit\n\nThis will display a nano editor where you can edit data_path, models_path, code_path, class_path and metadata_path.\n\n* data_path is where all datasets are saved.\n* models_path is where all files from your models are saved.\n* code_path is the repository of code. (In development)\n* metadata_path is where the metadata database is saved.\n\nBuilding a dataset\n\n.. code-block:: python\n\n from dama.data.ds import Data\n from dama.drivers.core import Zarr, HDF5\n import numpy as np\n\n array_0 = np.random.rand(100, 1)\n array_1 = np.random.rand(100,)\n array_2 = np.random.rand(100, 3)\n array_3 = np.random.rand(100, 6)\n array_4 = (np.random.rand(100)*100).astype(int)\n array_5 = np.random.rand(100).astype(str)\n with Data(name=name, driver=Zarr(mode=\"w\")) as data:\n data.from_data({\"x\": array_0, \"y\": array_1, \"z\": array_2, \"a\": array_3, \"b\": array_4, \"c\": array_5})\n\n\nWe can use a regression model, in this case we use RandomForestRegressor\n\n.. code-block:: python\n\n from dama.reg.extended.w_sklearn import RandomForestRegressor\n from dama.utils.model_selection import CV\n\n data.driver.mode = \"r\" # we changed mode \"w\" to \"r\" to not overwrite the data previously saved\n with data, Data(name=\"test_from_hash\", driver=HDF5(mode=\"w\")) as ds:\n cv = CV(group_data=\"x\", group_target=\"y\", train_size=.7, valid_size=.1) # cross validation class\n stc = cv.apply(data)\n ds.from_data(stc, from_ds_hash=data.hash)\n reg = RandomForestRegressor()\n model_params = dict(n_estimators=25, min_samples_split=2)\n reg.train(ds, num_steps=1, data_train_group=\"train_x\", target_train_group='train_y',\n data_test_group=\"test_x\", target_test_group='test_y', model_params=model_params,\n data_validation_group=\"validation_x\", target_validation_group=\"validation_y\")\n reg.save(name=\"test_model\", model_version=\"1\")\n\nUsing RandomForestRegressor to do predictions is like this:\n\n.. code-block:: python\n\n with RandomForestRegressor.load(model_name=\"test_model\", model_version=\"1\") as reg:\n for pred in reg.predict(data):\n prediction = pred.batch.to_ndarray()\n\n\nCLI\n==============\ndama-ml has a CLI where you can view your datasets and models.\nFor example\n\n.. code-block:: bash\n\n dama-cli datasets\n\nReturn a table of datasets previously saved.\n\n.. code-block:: python\n\n Using metadata ..../metadata/metadata.sqlite3\n Total 2 / 2\n\n hash name driver group name size num groups datetime UTC\n --------------------- -------------- -------- ------------ -------- ------------ -------------------\n sha1.3124d5f16eb0e... test_from_hash HDF5 s/n 9.12 KB 6 2019-02-27 19:39:00\n sha1.e832f56e33491... reg0 Zarr s/n 23.68 KB 6 2019-02-27 19:39:00\n\n\n.. code-block:: bash\n\n dama-cli models\n\n.. code-block:: bash\n\n Total 3 / 3\n from_ds name group_name model version score name score\n ------------------------- ---------- ------------ ------------------------------------------------- --------- --------------- ----------\n sha1.d8ff5a342d2d7229... test_model s/n dama.reg.extended.w_sklearn.RandomForestRegressor 1 mse 0.162365\n sha1.d8ff5a342d2d7229... test_model s/n dama.reg.extended.w_sklearn.RandomForestRegressor 1 msle 0.0741331\n sha1.d8ff5a342d2d7229... test_model s/n dama.reg.extended.w_sklearn.RandomForestRegressor 1 gini_normalized -0.307407\n\n\nYou can use \"--help\" for view more options.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/elaeon/dama_ml", "keywords": "data python management machine learning science", "license": "Apache", "maintainer": "", "maintainer_email": "", "name": "DaMa-ML", "package_url": "https://pypi.org/project/DaMa-ML/", "platform": "", "project_url": "https://pypi.org/project/DaMa-ML/", "project_urls": { "Homepage": "https://github.com/elaeon/dama_ml" }, "release_url": "https://pypi.org/project/DaMa-ML/1.0a1/", "requires_dist": [ "matplotlib (>=2.2)", "networkx (>=1.11)", "numpy (<=1.14.5)", "pandas (>=0.19.2)", "scikit-learn (>=0.18)", "scipy (>=0.18.1)", "tabulate (>=0.7.5)", "tensorflow (>=1.0.1)", "tqdm (>=4.11.2)", "h5py (>=2.6.0)", "xmltodict (>=0.10.2)", "keras (>=2.0.2)", "psycopg2 (>=2.7.3.2)", "dask (>=0.18.2)", "xarray (>=0.10.9)", "zarr (>=2.2.0)", "numcodecs (>=0.6.1)", "toolz (>=0.9.0)", "cytoolz (>=0.9.0.1)", "msgpack-python (>=0.5.6)", "scikit-image (>=0.14.0)", "GitPython (>=2.1.11)", "psutil (>=5.5.1)", "colorlog (>=4.0.2)", "sphinx (>=1.4.0) ; extra == 'docs'" ], "requires_python": "", "summary": "A framework for data management and is used to do data science and machine learning's pipelines", "version": "1.0a1" }, "last_serial": 4887404, "releases": { "1.0a0": [ { "comment_text": "", "digests": { "md5": "a24ffcef96809c22a87e6eebca34b767", "sha256": "83aab6f6bf6329e7589e16bc6936fd6abaaac0a42b9d0c2a942311b869b873d3" }, "downloads": -1, "filename": "DaMa_ML-1.0a0-py3-none-any.whl", "has_sig": false, "md5_digest": "a24ffcef96809c22a87e6eebca34b767", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 74325, "upload_time": "2019-02-28T03:43:19", "url": "https://files.pythonhosted.org/packages/13/88/418bb1bc08bfcd6099eeee69d976dd9d67103a0a17fd111abb7558ddd247/DaMa_ML-1.0a0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e6d8b11544895acf8b50a74c75eed469", "sha256": "2651cfbacc1442c2e803394a17013fb4890b6a4807cafbf2d015efc050f1f10d" }, "downloads": -1, "filename": "DaMa ML-1.0a0.tar.gz", "has_sig": false, "md5_digest": "e6d8b11544895acf8b50a74c75eed469", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 744790, "upload_time": "2019-02-28T03:43:23", "url": "https://files.pythonhosted.org/packages/f3/c8/1ae8fab05649e0151f7728033ac03fc4be6792f466ecceed0a5ae95748cb/DaMa%20ML-1.0a0.tar.gz" } ], "1.0a1": [ { "comment_text": "", "digests": { "md5": "67a790e6d016bb44cb2f60ccf67cea5c", "sha256": "44bdd248fd599a1066052ecefb03bad98c38a579fb634f4625565987b4f6b1d7" }, "downloads": -1, "filename": "DaMa_ML-1.0a1-py3-none-any.whl", "has_sig": false, "md5_digest": "67a790e6d016bb44cb2f60ccf67cea5c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 74731, "upload_time": "2019-03-02T02:26:49", "url": "https://files.pythonhosted.org/packages/44/c6/668bfeb58731161cc957271ef62712467dac6313eba799a63b97f6b62f28/DaMa_ML-1.0a1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "60a91f0222db942225f4141fc6c3e3b1", "sha256": "588c66919ad0a601b06333a9ae6fa8805ea55a6990f87ec2b497588cefdeec45" }, "downloads": -1, "filename": "DaMa ML-1.0a1.tar.gz", "has_sig": false, "md5_digest": "60a91f0222db942225f4141fc6c3e3b1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 698219, "upload_time": "2019-03-02T02:26:53", "url": "https://files.pythonhosted.org/packages/23/50/ae01536d28a1ef40c9642bb1f6a119e33a4681a6e78fc9723ad294d8077e/DaMa%20ML-1.0a1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "67a790e6d016bb44cb2f60ccf67cea5c", "sha256": "44bdd248fd599a1066052ecefb03bad98c38a579fb634f4625565987b4f6b1d7" }, "downloads": -1, "filename": "DaMa_ML-1.0a1-py3-none-any.whl", "has_sig": false, "md5_digest": "67a790e6d016bb44cb2f60ccf67cea5c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 74731, "upload_time": "2019-03-02T02:26:49", "url": "https://files.pythonhosted.org/packages/44/c6/668bfeb58731161cc957271ef62712467dac6313eba799a63b97f6b62f28/DaMa_ML-1.0a1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "60a91f0222db942225f4141fc6c3e3b1", "sha256": "588c66919ad0a601b06333a9ae6fa8805ea55a6990f87ec2b497588cefdeec45" }, "downloads": -1, "filename": "DaMa ML-1.0a1.tar.gz", "has_sig": false, "md5_digest": "60a91f0222db942225f4141fc6c3e3b1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 698219, "upload_time": "2019-03-02T02:26:53", "url": "https://files.pythonhosted.org/packages/23/50/ae01536d28a1ef40c9642bb1f6a119e33a4681a6e78fc9723ad294d8077e/DaMa%20ML-1.0a1.tar.gz" } ] }