{ "info": { "author": "PeopleDoc", "author_email": "stephanie_bracaloni@ultimatesoftware.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 3" ], "description": "Machine Learning Versioning Tools - MLV-tools\n=============================================\nPublic repository for versioning machine learning data.\n\nInstalling\n----------\n\nMLV-tools can be installed from **PyPI**:\n\n pip install ml-versioning-tools\n \nIt is also possible to install it directly from sources:\n \n git clone https://github.com/peopledoc/ml-versioning-tools.git\n cd ml-versioning-tools\n \n make develop \n OR\n make package \n pip install ./package/*.whl\n \nTutorial\n--------\n\nA tutorial is available to showcase how to use the tools. \nSee [MLV-tools tutorial](https://github.com/peopledoc/mlv-tools-tutorial). \n\nKeywords\n--------\n\n**Step metadata**: in this document it refers to the first code cell when it\nis used to declare metadata such as parameters, dvc inputs/outputs, etc.\n\n**Work directory**: the git top level directory of the project to version.\n(If the project does not use git, which is not recommended, use the --working-dir\n argument on each command call)\n\n\nTools\n-----\n\n**ipynb_to_python**: this command converts a given *Jupyter Notebook* to a\nparameterized and executable *Python3 script* (see specific syntax in section below)\n\n ipynb_to_python -n [notebook_path] -o [python_script_path]\n \n**gen_dvc**: this command creates a *DVC command* which calls the script generated by ipynb_to_python. \n\n gen_dvc -i [python_script] --out-py-cmd [python_command] \\\n --out-bash-cmd [dvc_command]\n \n**export_pipeline**: this command exports the pipeline corresponding to the given DVC meta file into a bash script.\nPipeline steps are called sequentially in dependency order. 
This only works for local steps.\n \n export_pipeline --dvc [DVC target meta file] -o [pipeline script]\n \n\n**ipynb_to_dvc**: this command converts a given *Jupyter Notebook* to a\nparameterized and executable *Python3 script* and a *DVC command*. It is the combination\nof **ipynb_to_python** and **gen_dvc**. It only works with a configuration file.\n\n ipynb_to_dvc -n [notebook_path]\n \n**check_script_consistency** and **check_all_scripts_consistency**: those commands ensure consistency between a Jupyter\nnotebook and its generated python script. It is possible to use them as a git hook or in the project's continuous\n integration. The consistency check ignores blank lines and comments.\n\n check_script_consistency -n [notebook_path] -s [script_path]\n \n check_all_scripts_consistency -n [notebook_directory]\n # Works only with a configuration file (provided or auto-detected)\n \nConfiguration\n-------------\n\nA configuration file can be provided, but it is not mandatory. \nIts default location is in the **working directory**, i.e. `[working_dir]/.mlvtools`. \nIt can also be a custom file provided as a command argument.\n\nThe configuration file format is JSON:\n\n {\n \"path\": {\n \t\"python_script_root_dir\": \"[path_to_the_script_directory]\",\n \t\"dvc_cmd_root_dir\": \"[path_to_the_dvc_cmd_directory]\"\n \t},\n \"ignore_keys\": [\"keywords\", \"to\", \"ignore\"],\n \"dvc_var_python_cmd_path\": \"MLV_PY_CMD_PATH_CUSTOM\",\n \"dvc_var_python_cmd_name\": \"MLV_PY_CMD_NAME_CUSTOM\",\n \"docstring_conf\": \"./docstring_conf.yml\" \n }\n\nAll given paths must be relative to the **working directory**.\n\n- *path_to_the_script_directory*: the directory where the **Python 3** script is generated by the \n**ipynb_to_script** command. 
The **Python 3** script name is based on the notebook name.\n\n ipynb_to_script -n ./data/My\\ Notebook.ipynb \n \n Generated script: `[path_to_the_script_directory]/my_notebook.py`\n \n- *path_to_the_dvc_cmd_directory*: the directory where **DVC** commands are generated by the \n**gen_dvc** command. Generated command names are based on the **Python 3** script name.\n\n gen_dvc -i ./scripts/my_notebook.py\n \n Generated commands: `[path_to_the_dvc_cmd_directory]/my_notebook_dvc`\n \n- *ignore_keys*: list of keywords used to discard a cell. Default value is *['# No effect']*.\n (See *Discard cell* section)\n \n- *dvc_var_python_cmd_path*, *dvc_var_python_cmd_name*, *dvc_var_meta_filename*: they customize the variable names which \ncan be used in the **dvc-cmd** Docstring parameter. They respectively correspond to the variables holding the python command \nfile path, the python command file name and the **DVC** default meta file name. Default values are 'MLV_PY_CMD_PATH',\n 'MLV_PY_CMD_NAME' and 'MLV_DVC_META_FILENAME'. (See DVC Command/Complex cases section for usage) \n\n- *docstring_conf*: the path to the docstring configuration used for Jinja templating (see DVC templating section). \nThis parameter is not mandatory.\n\n\nJupyter Notebook syntax\n-----------------------\n\nThe **Step metadata** cell is used to declare script parameters and **DVC** outputs and dependencies.\nThis can be done using basic Docstring syntax. This Docstring must be the first statement in this cell; only\ncomments can be written above it. 
\n\n\n### Good practices \n\nAvoid using relative paths in your Jupyter Notebook because they are relative to \nthe notebook location, which differs from the location of the generated script.\n\n\n### Python Script Parameters\n\nParameters can be declared in the **Jupyter Notebook** using basic Docstring syntax.\nThis parameter description is used to generate configurable and executable python scripts.\n\nParameters declaration in **Jupyter Notebook**:\n\n**Jupyter Notebook**: process_files.ipynb\n\n \n #:param [type]? [param_name]: [description]?\n \"\"\"\n :param str input_file: the input file\n :param output_file: the output_file\n :param rate: the learning rate\n :param int retry:\n \"\"\"\n \nGenerated **Python3 script**:\n\n [...]\n def process_file(input_file: str, output_file, rate, retry: int):\n \"\"\"\n ...\n \"\"\"\n [...]\n\nScript command line parameters:\n\n my_script.py -h\n \n usage: my_cmd [-h] --input-file INPUT_FILE --output-file OUTPUT_FILE --rate\n RATE --retry RETRY\n \n Command for script [script_name]\n \n optional arguments:\n -h, --help show this help message and exit\n --input-file INPUT_FILE\n the input file\n --output-file OUTPUT_FILE\n the output_file\n --rate RATE the learning rate\n --retry RETRY\n\nAll declared arguments are required.\n\n### DVC command\n\nA **DVC** command is a wrapper over the **dvc run** command called on a **Python 3** script generated \nwith the **ipynb_to_python** command. It is a step of a pipeline. 
\n\nIt is based on data declared in the **notebook metadata**.\n Two modes are available:\n - describe only input/output for simple cases (recommended)\n - describe full command for complex cases\n\n#### Simple cases\n\nSyntax\n \n :param str input_csv_file: Path to input file\n :param str output_csv_file: Path to output file\n [...]\n \n [:dvc-[in|out][\\s{related_param}]?:[\\s{file_path}]?]*\n [:dvc-extra: {python_other_param}]?\n \n :dvc-in: ./data/filter.csv\n :dvc-in input_csv_file: ./data/info.csv \n :dvc-out: ./data/train_set.csv \n :dvc-out output_csv_file: ./data/test_set.csv\n :dvc-extra: --mode train --rate 12\n \nThe provided **{file_path}** can be absolute or relative to the git top dir.\n\nThe **{related_param}** is a parameter of the corresponding **Python 3** script;\n its value is filled in for the python script call.\n\nThe **dvc-extra** entry declares parameters which are not dvc outputs or dependencies.\nThose parameters are provided to the call of the **Python 3** command.\n \n pushd $(git rev-parse --show-toplevel)\n \n INPUT_CSV_FILE=\"./data/info.csv\"\n OUTPUT_CSV_FILE=\"./data/test_set.csv\"\n \n dvc run \\\n -d ./data/filter.csv \\\n -d $INPUT_CSV_FILE \\\n -o ./data/train_set.csv \\\n -o $OUTPUT_CSV_FILE \\\n gen_src/python_script.py --mode train --rate 12 \\\n --input-csv-file $INPUT_CSV_FILE \\\n --output-csv-file $OUTPUT_CSV_FILE\n\n \n \n#### Complex cases\n\nSyntax\n \n :dvc-cmd: {dvc_command}\n\n :dvc-cmd: dvc run -o ./out_train.csv -o ./out_test.csv \n \"$MLV_PY_CMD_PATH -m train --out ./out_train.csv && \n $MLV_PY_CMD_PATH -m test --out ./out_test.csv\"\n \nThis syntax is used to provide the full dvc command to generate. All paths can be absolute or relative to the git top dir.\nThe variables $MLV_PY_CMD_PATH and $MLV_PY_CMD_NAME are available. 
They respectively contain the path and the name\n of the corresponding python command.\nThe variable $MLV_DVC_META_FILENAME contains the default name of the **DVC** meta file.\n \n pushd $(git rev-parse --show-toplevel)\n MLV_PY_CMD_PATH=\"gen_src/python_script.py\"\n MLV_PY_CMD_NAME=\"python_script.py\"\n \n dvc run -f $MLV_DVC_META_FILENAME -o ./out_train.csv \\\n -o ./out_test.csv \\\n \"$MLV_PY_CMD_PATH -m train --out ./out_train.csv && \\\n $MLV_PY_CMD_PATH -m test --out ./out_test.csv\" \n popd\n\n\n### DVC templating\n\nIt is possible to use Jinja2 templates in the DVC part of the Docstring. For example, they can be used to declare the \ndependencies, outputs and extra parameters of every step.\n\nExample:\n\n # Docstring in Jupyter notebook \n \"\"\"\n [...]\n :dvc-in: {{ conf.train_data_file_path }} \n :dvc-out: {{ conf.model_file_path }}\n :dvc-extra: --rate {{ conf.rate }}\n \"\"\"\n \n # Docstring configuration file (YAML format): ./dc_conf.yml\n \n train_data_file_path: ./data/trainset.csv\n model_file_path: ./data/model.pkl\n rate: 45\n \n # DVC command generation\n gen_dvc -i ./python_script.py --docstring-conf ./dc_conf.yml\n \nThe *Docstring configuration file* can be provided through the main configuration or using the **--docstring-conf**\nargument. This feature is only available for the **gen_dvc** command.\n\n\n### Discard cell\n\nSome cells in a **Jupyter Notebook** are executed only to watch intermediate results.\nIn a **Python 3** script those are statements with no effect. \nThe comment **# No effect** discards the whole content of a cell to avoid wasting \ntime running those statements. It is possible to customize the list of discard keywords, see the *Configuration* section.\n\n\nContributing\n------------\n\nWe happily welcome contributions to MLV-tools. 
Please see our [contribution](./CONTRIBUTING.md) guide for details.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/peopledoc/ml-versionning-tools", "keywords": "peopledoc,machine learning,versioning,mlvtools", "license": "", "maintainer": "", "maintainer_email": "", "name": "ml-versioning-tools", "package_url": "https://pypi.org/project/ml-versioning-tools/", "platform": "", "project_url": "https://pypi.org/project/ml-versioning-tools/", "project_urls": { "Homepage": "http://github.com/peopledoc/ml-versionning-tools" }, "release_url": "https://pypi.org/project/ml-versioning-tools/1.0.2/", "requires_dist": null, "requires_python": "", "summary": "Set of Machine Learning versioning helpers", "version": "1.0.2" }, "last_serial": 5424231, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "8159eae35f9c0d22234f369a0d7253f9", "sha256": "67811de4218f89d5578a53af9aaee9104877de3f58a3573300cc23514995a4c0" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.1.tar.gz", "has_sig": false, "md5_digest": "8159eae35f9c0d22234f369a0d7253f9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 15301, "upload_time": "2018-09-18T13:50:57", "url": "https://files.pythonhosted.org/packages/97/f5/2590cd36a2930255b0d0a9397a5a063d6f1231afc186d4b788e482ff1674/ml-versioning-tools-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "aaaf4d5eb238e72f6212bd36ba814a86", "sha256": "7e11c900c78e2ce756820286f00856d7759d6ba955e3876b4d37d6dc627150f8" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.2.tar.gz", "has_sig": false, "md5_digest": "aaaf4d5eb238e72f6212bd36ba814a86", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15243, "upload_time": "2018-09-20T15:02:40", "url": 
"https://files.pythonhosted.org/packages/f1/07/d589d6475e2aa1d728214d1fc07cb8ba65ef3f786127fb52e1c84f01418c/ml-versioning-tools-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "77770de706576fcffe068afafb42e147", "sha256": "616c71fc99ec34f11602eaf7e2133464c9e12847e5033578a68582c8b186a09a" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.3.tar.gz", "has_sig": false, "md5_digest": "77770de706576fcffe068afafb42e147", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15438, "upload_time": "2018-10-05T12:17:52", "url": "https://files.pythonhosted.org/packages/7b/22/c243d58bce95c8f02a1c70a689d0127863501504af061f8e5db0e411141a/ml-versioning-tools-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "368d2c507168e6cc0ec65d4ce095ac63", "sha256": "6e35c233efa89a8b8286fb7a06c5a168ed7de1dcd8ee49ba8f70535171d3eed6" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.4.tar.gz", "has_sig": false, "md5_digest": "368d2c507168e6cc0ec65d4ce095ac63", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15454, "upload_time": "2018-10-11T12:36:39", "url": "https://files.pythonhosted.org/packages/85/53/47da19a0adef7477f4c8a4fc9258c2f6d4a6c5d45caf22a604c62cb402f7/ml-versioning-tools-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "d7f7cc0bcf62e53e3a2c7c986bf7bfcc", "sha256": "39a05db49154f22da53c6b90bf61bdec593ba86154622bafbbf9525d6df85247" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.5.tar.gz", "has_sig": false, "md5_digest": "d7f7cc0bcf62e53e3a2c7c986bf7bfcc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19335, "upload_time": "2018-10-15T12:19:06", "url": "https://files.pythonhosted.org/packages/91/a4/fabd0b525f2eef1d01bddf6e07268b96574fbd7fee2f30316f779bb21255/ml-versioning-tools-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "c488799df3c151e1a62a99fe5722ac2e", 
"sha256": "4ac651b6358d68b2ad1f2939927e5e3c52b84621525162b7dedf0da7cac8f878" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.6.tar.gz", "has_sig": false, "md5_digest": "c488799df3c151e1a62a99fe5722ac2e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20398, "upload_time": "2018-10-16T08:25:31", "url": "https://files.pythonhosted.org/packages/45/9c/b302fd030f1b28d29b65ebf1a569796bc46ccc3e0a67655750cece68ca42/ml-versioning-tools-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "72fd0d0535f805aa9a8aa8c7b1de1325", "sha256": "272b35bec8806e931c4f8487c818d209a4eadf147023070c54e5332daf0e5200" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.7.tar.gz", "has_sig": false, "md5_digest": "72fd0d0535f805aa9a8aa8c7b1de1325", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20462, "upload_time": "2018-10-18T13:14:25", "url": "https://files.pythonhosted.org/packages/7d/8c/0b4da7db0ac37fd3716efc85062622211672b60bd4123bf57590390977db/ml-versioning-tools-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "180b7485dc20e47d7965b12c3696d378", "sha256": "7a0f3f1f70784ef6c2460be5d99380602c2d159ca5c6a005db455d8e4dcb2985" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.8.tar.gz", "has_sig": false, "md5_digest": "180b7485dc20e47d7965b12c3696d378", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21335, "upload_time": "2018-11-28T08:18:53", "url": "https://files.pythonhosted.org/packages/a3/84/94f7b59491665aece1c4e4e4111d2f8ed4c72e4008660c266cf9d8d887ad/ml-versioning-tools-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "94661c13a0fe5658823cd843ff91d78a", "sha256": "f17a11a71daf8f9abc04d081f573dd42dc802e4d2420d166de1965f1d4340341" }, "downloads": -1, "filename": "ml-versioning-tools-0.0.9.tar.gz", "has_sig": false, "md5_digest": "94661c13a0fe5658823cd843ff91d78a", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 20926, "upload_time": "2019-01-03T14:15:00", "url": "https://files.pythonhosted.org/packages/04/aa/ac2589b652b434ce441cd7b0890fedd8c560078823e7c835390a8df69fca/ml-versioning-tools-0.0.9.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "5c796441138e1361bdaa156826ba3855", "sha256": "d826315f18648bfb8f7540447f36ace9f3ef83858ae8d730c68e398efd06999e" }, "downloads": -1, "filename": "ml-versioning-tools-1.0.1.tar.gz", "has_sig": false, "md5_digest": "5c796441138e1361bdaa156826ba3855", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20200, "upload_time": "2019-03-25T09:06:55", "url": "https://files.pythonhosted.org/packages/dc/1b/022d17415792dfcccd824472ec3226577c634b76f87080bb1f3aa023ce2a/ml-versioning-tools-1.0.1.tar.gz" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "243b4593a47a0c2808269a6f07dafb88", "sha256": "f673ca007daa53dc5c9ecf4b74d37a0de49de0bd86c0def335da0442b27d22ed" }, "downloads": -1, "filename": "ml-versioning-tools-1.0.2.tar.gz", "has_sig": false, "md5_digest": "243b4593a47a0c2808269a6f07dafb88", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21829, "upload_time": "2019-06-20T07:17:17", "url": "https://files.pythonhosted.org/packages/e1/25/9c422e853d9fdb191245ecf3e9549d74ae84f68b149336a96e8ca814d78e/ml-versioning-tools-1.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "243b4593a47a0c2808269a6f07dafb88", "sha256": "f673ca007daa53dc5c9ecf4b74d37a0de49de0bd86c0def335da0442b27d22ed" }, "downloads": -1, "filename": "ml-versioning-tools-1.0.2.tar.gz", "has_sig": false, "md5_digest": "243b4593a47a0c2808269a6f07dafb88", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21829, "upload_time": "2019-06-20T07:17:17", "url": 
"https://files.pythonhosted.org/packages/e1/25/9c422e853d9fdb191245ecf3e9549d74ae84f68b149336a96e8ca814d78e/ml-versioning-tools-1.0.2.tar.gz" } ] }