{ "info": { "author": "Martin Becker", "author_email": "becker@informatik.uni-wuerzburg.de", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python" ], "description": "# mlflowhelper\n\nA set of tools for working with *mlflow* (see https://mlflow.org)\n\n## Features\n\n* managed artifact logging and **loading**\n  * **automatic** artifact logging and cleanup\n  * **no overwriting files** when running scripts in parallel\n  * **loading** artifacts\n  * **central configuration** of logging and loading behavior\n* log **all** function parameters and locals with a simple call to `mlflowhelper.log_vars()`\n\n\n## Documentation\n\n### Managed artifact logging and loading\n\n#### General functionality\n\n```python\nfrom matplotlib import pyplot as plt\nimport mlflowhelper\n\nwith mlflowhelper.start_run():\n    with mlflowhelper.managed_artifact(\"plot.png\") as artifact:\n        fig = plt.figure()\n        plt.plot([1,2,3], [1,2,3])\n        fig.savefig(artifact.get_path())\n```\nThis code snippet automatically logs the created artifact (`plot.png`).\nAt the same time, it creates the artifact in a temporary folder so that you don't have to worry about\noverwriting it when running your scripts in parallel.\nBy default, this also cleans up the artifact and the temporary folder after logging.\n\nYou can also manage artifacts on a directory level:\n```python\nfrom matplotlib import pyplot as plt\nimport mlflowhelper\n\nwith mlflowhelper.start_run():\n    with mlflowhelper.managed_artifact_dir(\"plots\") as artifact_dir:\n\n        # plot 1\n        fig = plt.figure()\n        plt.plot([1,2,3], [1,2,3])\n        fig.savefig(artifact_dir.get_path(\"plot1.png\"))\n\n        # plot 2\n        fig = plt.figure()\n        plt.plot([1,2,3], [1,2,3])\n        fig.savefig(artifact_dir.get_path(\"plot2.png\"))\n```\n\n#### Artifact loading\nYou may want to run experiments but reuse precomputed artifacts from a different run (such\nas preprocessed data, trained models, etc.).
This can be done as follows:\n```python\nimport mlflowhelper\nimport pandas as pd\n\nwith mlflowhelper.start_run():\n    mlflowhelper.set_load(run_id=\"e1363f760b1e4ab3a9e93f856f2e9341\", stages=[\"load_data\"]) # activate loading from previous run\n    with mlflowhelper.managed_artifact(\"data.csv\", stage=\"load_data\") as artifact:\n        if artifact.loaded:\n            # load artifact\n            data = pd.read_csv(artifact.get_path())\n        else:\n            # create and save artifact\n            data = pd.read_csv(\"/shared/dir/data.csv\").sample(frac=1)\n            data.to_csv(artifact.get_path())\n```\n\nThis works for directories as well:\n```python\nimport mlflowhelper\nimport pandas as pd\n\nmlflowhelper.set_load(run_id=\"e1363f760b1e4ab3a9e93f856f2e9341\", stages=[\"load_data\"]) # activate loading from previous run\nwith mlflowhelper.start_run():\n    with mlflowhelper.managed_artifact_dir(\"data\", stage=\"load_data\") as artifact_dir:\n        train_path = artifact_dir.get_path(\"train.csv\")\n        test_path = artifact_dir.get_path(\"test.csv\")\n        if artifact_dir.loaded:\n            # load artifacts\n            train = pd.read_csv(train_path)\n            test = pd.read_csv(test_path)\n        else:\n            data = pd.read_csv(\"/shared/dir/data.csv\").sample(frac=1)\n            train = data.iloc[:100,:]\n            test = data.iloc[100:,:]\n            # save artifacts\n            train.to_csv(train_path)\n            test.to_csv(test_path)\n```\n\n**Note:** The `stage` parameter must be set in `mlflowhelper.managed_artifact(_dir)` to enable loading.\n\n#### Central logging and loading behavior management\n\nLogging and loading behavior can be managed in a central way:\n```python\nimport mlflowhelper\nimport pandas as pd\n\nwith mlflowhelper.start_run():\n\n    # activate loading the stage `load_data` from previous run `e1363f760b1e4ab3a9e93f856f2e9341`\n    mlflowhelper.set_load(run_id=\"e1363f760b1e4ab3a9e93f856f2e9341\", stages=[\"load_data\"])\n\n    # deactivate logging the stage `load_data`, in this case for example because it was loaded from a previous run\n    mlflowhelper.set_skip_log(stages=[\"load_data\"])\n\n    with mlflowhelper.managed_artifact_dir(\"data\", stage=\"load_data\") as artifact_dir:\n        train_path = artifact_dir.get_path(\"train.csv\")\n        test_path = artifact_dir.get_path(\"test.csv\")\n        if artifact_dir.loaded:\n            # load artifacts\n            train = pd.read_csv(train_path)\n            test = pd.read_csv(test_path)\n        else:\n            data = pd.read_csv(\"/shared/dir/data.csv\").sample(frac=1)\n            train = data.iloc[:100,:]\n            test = data.iloc[100:,:]\n            # save artifacts\n            train.to_csv(train_path)\n            test.to_csv(test_path)\n```\n\n**Note:** For central management, the `stage` parameter must be set in `mlflowhelper.managed_artifact(_dir)`.\n\n\n### Easy parameter logging\n\n*mlflowhelper* helps you never forget to log parameters again by making it easy to log all existing variables\nusing `mlflowhelper.log_vars`.\n\n```python\nimport mlflowhelper\n\ndef main(param1, param2, param3=\"defaultvalue\", verbose=0, *args, **kwargs):\n    some_variable = \"x\"\n    with mlflowhelper.start_run(): # mlflow.start_run() is also OK here\n        mlflowhelper.log_vars(exclude=[\"verbose\"])\n\nif __name__ == '__main__':\n    main(\"a\", \"b\", something_else=6)\n```\nThis will log:\n```json\n{\n    \"param1\": \"a\",\n    \"param2\": \"b\",\n    \"param3\": \"defaultvalue\",\n    \"something_else\": 6\n}\n```\n\n\n### Other\nThere are a few more convenience functions included in `mlflowhelper`:\n\n\n## TODOs / Ideas\n- [ ] check if loading works across experiments\n- [ ] purge local artifacts (check via API which runs are marked as deleted and delete their artifacts)\n- [ ] support nested runs by creating subdirectories based on experiment and run\n- [ ] support loading from central cache instead of from runs\n- [ ] automatically log from where and what has been loaded\n- [ ] set tags for logged stages (to check for artifacts before loading them)\n- [ ] consider loading extensions:\n    - [ ] does nested loading make sense (different loads for certain nested runs)?\n    - [ ] does mixed
loading make sense (loading artifacts from different runs for different stages)?\n\n\n## Note\nThis project has been set up using PyScaffold 3.2.1. For details and usage\ninformation on PyScaffold see https://pyscaffold.org/.", "description_content_type": "text/markdown; charset=UTF-8", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mgbckr/mlflowhelper", "keywords": "", "license": "mit", "maintainer": "", "maintainer_email": "", "name": "mlflowhelper", "package_url": "https://pypi.org/project/mlflowhelper/", "platform": "any", "project_url": "https://pypi.org/project/mlflowhelper/", "project_urls": { "Documentation": "https://github.com/mgbckr/mlflowhelper", "Homepage": "https://github.com/mgbckr/mlflowhelper" }, "release_url": "https://pypi.org/project/mlflowhelper/1.0.0/", "requires_dist": null, "requires_python": ">=3.4", "summary": "A set of tools for working with mlflow (see https://mlflow.org)", "version": "1.0.0" }, "last_serial": 5643886, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "6911f7e86b1e1fbc752e3e27f692847a", "sha256": "3a24bbc52dcfde813a64df03aa89fd75dd62abbe4009ebe006ccaa7a6ac797bd" }, "downloads": -1, "filename": "mlflowhelper-1.0.0.tar.gz", "has_sig": false, "md5_digest": "6911f7e86b1e1fbc752e3e27f692847a", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 21630, "upload_time": "2019-08-07T08:23:12", "url": "https://files.pythonhosted.org/packages/ef/1d/912ae1e31924e86a564f856f0519f653d5296a2182b9c31fcda7b6bceb21/mlflowhelper-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6911f7e86b1e1fbc752e3e27f692847a", "sha256": "3a24bbc52dcfde813a64df03aa89fd75dd62abbe4009ebe006ccaa7a6ac797bd" }, "downloads": -1, "filename": "mlflowhelper-1.0.0.tar.gz", "has_sig": false, "md5_digest": "6911f7e86b1e1fbc752e3e27f692847a", "packagetype": "sdist", "python_version": "source", 
"requires_python": ">=3.4", "size": 21630, "upload_time": "2019-08-07T08:23:12", "url": "https://files.pythonhosted.org/packages/ef/1d/912ae1e31924e86a564f856f0519f653d5296a2182b9c31fcda7b6bceb21/mlflowhelper-1.0.0.tar.gz" } ] }