{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Software Development :: Libraries" ], "description": "# tf-yarn\u1d5d\n\n![tf-yarn](https://github.com/criteo/tf-yarn/blob/master/skein.png?raw=true)\n\n## Installation\n\n### Install with Pip\n\n```bash\n$ pip install tf-yarn\n```\n\n### Install from source\n\n```bash\n$ git clone https://github.com/criteo/tf-yarn\n$ cd tf-yarn\n$ pip install .\n```\n\n### Prerequisites\n\ntf-yarn only supports Python \u22653.6.\n\nMake sure to have Tensorflow working with HDFS by setting up all the environment variables as described [here](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md).\n\nYou can run the `check_hadoop_env` script to check that your setup is OK (it has been installed by tf_yarn):\n\n```\n$ check_hadoop_env\n# You should see something like\n# INFO:tf_yarn.bin.check_hadoop_env:results will be written in /home/.../shared/Dev/tf-yarn/check_hadoop_env.log\n# INFO:tf_yarn.bin.check_hadoop_env:check_env: True\n# INFO:tf_yarn.bin.check_hadoop_env:write dummy file to hdfs hdfs://root/tmp/a1df7b99-fa47-4a86-b5f3-9bc09019190f/hello_tf_yarn.txt\n# INFO:tf_yarn.bin.check_hadoop_env:check_local_hadoop_tensorflow: True\n# INFO:root:Launching remote check\n# ...\n# INFO:tf_yarn.bin.check_hadoop_env:remote_check: True\n# INFO:tf_yarn.bin.check_hadoop_env:Hadoop setup: OK\n```\n\n## Quickstart\n\nThe core abstraction in tf-yarn is called an `ExperimentFn`. 
It is\na function returning a triple of an `Estimator` and two specs:\n`TrainSpec` and `EvalSpec`.\n\nHere is a stripped-down `experiment_fn` from\n[`examples/linear_classifier_example.py`][linear_classifier_example]\nto give you an idea of how it might look:\n\n```python\nfrom tf_yarn import Experiment\n\ndef experiment_fn():\n    # ...\n    estimator = tf.estimator.LinearClassifier(...)\n    return Experiment(\n        estimator,\n        tf.estimator.TrainSpec(train_input_fn),\n        tf.estimator.EvalSpec(eval_input_fn))\n```\n\nAn experiment can be scheduled on YARN using the `run_on_yarn` function, which\ntakes three required arguments: Python environment(s), `experiment_fn`,\nand a dictionary specifying how many resources to allocate for each of the\ndistributed TensorFlow task types. The example uses the [Wine Quality][wine-quality]\ndataset from the UCI ML repository. With just under 5000 training instances available,\nthere is no need for multi-node training, meaning that a `\"chief\"` task complemented by an\n`\"evaluator\"` would manage just fine. 
Note that each task will be executed\nin its own YARN container.\n\n```python\nfrom tf_yarn import TaskSpec, run_on_yarn\nfrom tf_yarn import packaging\n\npyenv_zip_path, _ = packaging.upload_env_to_hdfs()\nrun_on_yarn(\n    pyenv_zip_path,\n    experiment_fn,\n    task_specs={\n        \"chief\": TaskSpec(memory=2 * 2**10, vcores=4),\n        \"evaluator\": TaskSpec(memory=2**10, vcores=1),\n        \"tensorboard\": TaskSpec(memory=2**10, vcores=1)\n    }\n)\n```\n\nThe final bit is to forward the `winequality.py` module to the YARN containers,\nin order for the tasks to be able to import it:\n\n```python\nrun_on_yarn(\n    ...,\n    files={\n        os.path.basename(winequality.__file__): winequality.__file__,\n    }\n)\n```\n\n[linear_classifier_example]: https://github.com/criteo/tf-yarn/blob/master/examples/linear_classifier_example.py\n[wine-quality]: https://archive.ics.uci.edu/ml/datasets/Wine+Quality\n\n## Distributed TensorFlow 101\n\nThe following is a brief summary of the core distributed TensorFlow\nconcepts relevant to [training estimators][train-and-evaluate]. Please refer\nto the [official documentation][distributed-tf] for the full version.\n\nDistributed TensorFlow operates in terms of tasks. A task has a type which\ndefines its purpose in the distributed TensorFlow cluster. ``\"worker\"`` tasks\nheaded by the `\"chief\"` worker do model training. The `\"chief\"` additionally\nhandles checkpointing, saving/restoring the model, etc. The model itself is\nstored on one or more `\"ps\"` tasks. These tasks typically do not compute\nanything. Their sole purpose is serving the variables of the model. Finally,\nthe `\"evaluator\"` task is responsible for periodically evaluating the model.\n\nAt a minimum, a cluster must have a single `\"chief\"` task. 
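To make the task-type layout above concrete, here is a sketch of the `TF_CONFIG` cluster spec through which distributed TensorFlow tasks discover each other; tf-yarn assembles an equivalent spec from the YARN container addresses, and the hostnames and ports below are purely illustrative:

```python
import json

# Illustrative cluster spec for the task types described above:
# one "chief", two "worker" tasks and one "ps" task.
# Hostnames and ports are made up for the example.
tf_config = {
    "cluster": {
        "chief": ["host0:2222"],
        "worker": ["host1:2222", "host2:2222"],
        "ps": ["host3:2222"],
    },
    # Identity of the task that reads this spec.
    "task": {"type": "worker", "index": 0},
}
print(json.dumps(tf_config))
```

Each task reads this spec from its environment to learn its own role and the addresses of its peers.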
However, it\nis a good idea to complement it with an `\"evaluator\"` to allow for running\nthe evaluation in parallel with the training.\n\n```\n+-----------+          +---------+   +----------+   +----------+\n| evaluator |    +-----+ chief:0 |   | worker:0 |   | worker:1 |\n+-----+-----+    |     +----^----+   +-----^----+   +-----^----+\n      ^          |          |              |              |\n      |          v          |              |              |\n      |    +-----+---+      |              |              |\n      |    |  model  |   +--v---+          |              |\n      +----+ exports |   | ps:0 <----------+--------------+\n           +---------+   +------+\n```\n\n[distributed-tf]: https://www.tensorflow.org/deploy/distributed\n[train-and-evaluate]: https://www.tensorflow.org/api_docs/python/tf/estimator/train_and_evaluate\n\n## Training with multiple workers\n\nMulti-worker clusters require at least one parameter server (`\"ps\"`) task\nto store the variables being updated by the `\"chief\"` and `\"worker\"` tasks. It is\ngenerally a good idea to give `\"ps\"` tasks more than one vcore to allow for concurrent I/O\nprocessing.\n\n```python\nrun_on_yarn(\n    ...,\n    task_specs={\n        \"chief\": TaskSpec(memory=2 * 2**10, vcores=4),\n        \"worker\": TaskSpec(memory=2 * 2**10, vcores=4, instances=8),\n        \"ps\": TaskSpec(memory=2 * 2**10, vcores=8),\n        \"evaluator\": TaskSpec(memory=2**10, vcores=1),\n        \"tensorboard\": TaskSpec(memory=2**10, vcores=1)\n    }\n)\n```\n\n## Configuring the Python interpreter and packages\n\ntf-yarn needs to ship an isolated virtual environment to the containers. 
\n\nYou can use the `packaging` module to generate a package on HDFS from your currently\ninstalled virtual environment (install the dependencies from `requirements.txt` first:\n`pip install -r requirements.txt`).\nThis works with both conda environments and virtual environments.\n\nBy default, the generated package is a [pex][pex] package.\n\n```python\npyenv_zip_path, env_name = packaging.upload_env_to_hdfs()\nrun_on_yarn(\n    pyenv_zip_path=pyenv_zip_path\n)\n```\n\nIf you pass `packaging.CONDA_PACKER` to `upload_env_to_hdfs`, it will use\n[conda-pack][conda-pack] to create the package instead.\n\nYou can also directly use the command line tools provided by [conda-pack][conda-pack] and [pex][pex].\n\nFor pex, you can run this command in the root directory to create the package\n(it includes all requirements from setup.py):\n\n```\npex . -o myarchive.pex\n```\n\nYou can then run tf-yarn with your generated package:\n\n```python\nrun_on_yarn(\n    pyenv_zip_path=\"myarchive.pex\"\n)\n```\n\n[conda-pack]: https://conda.github.io/conda-pack/\n[pex]: https://pex.readthedocs.io/en/stable/\n\n## Running on GPU\n\nYARN does not have first-class support for GPU resources. A common workaround is\nto use [node labels][node-labels] where CPU-only nodes are unlabelled, while\nthe GPU ones have a label. Furthermore, in this setting GPU nodes are\ntypically bound to a separate queue which is different from the default one.\n\nCurrently, tf-yarn assumes that the GPU label is ``\"gpu\"``. There are no\nassumptions on the name of the queue with GPU nodes; for the sake of\nexample we will use the name ``\"ml-gpu\"``.\n\nThe default behaviour of `run_on_yarn` is to run on CPU-only nodes. In order\nto run on the GPU ones:\n\n1. Set the `queue` argument.\n2. Set `TaskSpec.label` to `NodeLabel.GPU` for relevant task types.\n   A good rule of thumb is to run compute-heavy `\"chief\"` and `\"worker\"`\n   tasks on GPU, while keeping `\"ps\"` and `\"evaluator\"` on CPU.\n3. 
Generate two Python environments: one with Tensorflow for CPUs and one\n   with Tensorflow for GPUs. The `additional_packages` and `ignored_packages`\n   parameters of `upload_env_to_hdfs` are only supported with pex packages.\n\n```python\nfrom tf_yarn import NodeLabel\nfrom tf_yarn import packaging\n\npyenv_zip_path_cpu, _ = packaging.upload_env_to_hdfs()\npyenv_zip_path_gpu, _ = packaging.upload_env_to_hdfs(\n    additional_packages={\"tensorflow-gpu\": \"2.0.0a0\"},\n    ignored_packages={\"tensorflow\"}\n)\nrun_on_yarn(\n    {NodeLabel.CPU: pyenv_zip_path_cpu, NodeLabel.GPU: pyenv_zip_path_gpu},\n    experiment_fn,\n    task_specs={\n        \"chief\": TaskSpec(memory=2 * 2**10, vcores=4, label=NodeLabel.GPU),\n        \"evaluator\": TaskSpec(memory=2**10, vcores=1),\n        \"tensorboard\": TaskSpec(memory=2**10, vcores=1)\n    },\n    queue=\"ml-gpu\"\n)\n```\n\n[node-labels]: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html\n\n## Accessing HDFS in the presence of [federation][federation]\n\n`skein`, the library underlying `tf_yarn`, automatically acquires a delegation token\nfor ``fs.defaultFS`` on security-enabled clusters. This should be enough for most\nuse-cases. However, if your experiment needs to access data on namenodes other than\nthe default one, you have to explicitly list them in the `file_systems` argument\nto `run_on_yarn`. This instructs `skein` to acquire a delegation token for\nthese namenodes in addition to ``fs.defaultFS``:\n\n```python\nrun_on_yarn(\n    ...,\n    file_systems=[\"hdfs://preprod\"]\n)\n```\n\nDepending on the cluster configuration, you might need to point libhdfs to a\ndifferent configuration folder. For instance:\n\n```python\nrun_on_yarn(\n    ...,\n    env={\"HADOOP_CONF_DIR\": \"/etc/hadoop/conf.all\"}\n)\n```\n\n[federation]: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html\n\n## Tensorboard\n\nYou can use Tensorboard with tf-yarn.\nTensorboard is automatically spawned when using the default `task_specs`. 
It then runs as a separate container on YARN.\nIf you use custom `task_specs`, you must explicitly add a Tensorboard task to your configuration.\n\n```python\nrun_on_yarn(\n    ...,\n    task_specs={\n        \"chief\": TaskSpec(memory=2 * 2**10, vcores=4),\n        \"worker\": TaskSpec(memory=2 * 2**10, vcores=4, instances=8),\n        \"ps\": TaskSpec(memory=2 * 2**10, vcores=8),\n        \"evaluator\": TaskSpec(memory=2**10, vcores=1),\n        \"tensorboard\": TaskSpec(memory=2**10, vcores=1, instances=1, termination_timeout_seconds=30)\n    }\n)\n```\n\nBoth `instances` and `termination_timeout_seconds` are optional parameters.\n* instances: controls the number of Tensorboard instances to spawn. Defaults to 1.\n* termination_timeout_seconds: controls how many seconds each Tensorboard instance must stay alive after the end of the run. Defaults to 30 seconds.\n\nThe full access URL of each Tensorboard instance is advertised as a _url_event_ starting with \"Tensorboard is listening at...\".\nTypically, you will see it appear on the standard output of a _run_on_yarn_ call.\n\n### Environment variables\nThe following optional environment variables can be passed to the Tensorboard task:\n* TF_BOARD_MODEL_DIR: to configure a model directory. Note that the experiment model dir, if specified, has higher priority. 
Defaults: None\n* TF_BOARD_EXTRA_ARGS: appends command line arguments to the mandatory ones (--logdir and --port): defaults: None \n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/criteo/tf-yarn", "keywords": "tensorflow yarn", "license": "", "maintainer": "Criteo", "maintainer_email": "github@criteo.com", "name": "tf-yarn", "package_url": "https://pypi.org/project/tf-yarn/", "platform": "", "project_url": "https://pypi.org/project/tf-yarn/", "project_urls": { "Homepage": "https://github.com/criteo/tf-yarn" }, "release_url": "https://pypi.org/project/tf-yarn/0.4.3/", "requires_dist": [ "pex (==1.6.7)", "tensorflow (==1.12.2)", "conda-pack", "cloudpickle (==1.0.0)", "skein (==0.7.2)", "pandas" ], "requires_python": ">=3.6", "summary": "Distributed TensorFlow on a YARN cluster", "version": "0.4.3" }, "last_serial": 5519261, "releases": { "0.2.2": [ { "comment_text": "", "digests": { "md5": "0c73047a959d8d070930e39d5574c4eb", "sha256": "fb89c67c663895e83f51e87ef966ff305aa292e0c81a29ffad3a656ac05d28a8" }, "downloads": -1, "filename": "tf_yarn-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "0c73047a959d8d070930e39d5574c4eb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 29280, "upload_time": "2019-04-12T13:15:02", "url": "https://files.pythonhosted.org/packages/83/0d/ef28400d27d7bca870b09f05a64f1b3d91d2e5f3e13dd038dc3af33d6be1/tf_yarn-0.2.2-py3-none-any.whl" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "1e1dedc7b9d0d578a01357dc3260e605", "sha256": "ebdefe6341c17bb8956059236acb1a5e03489bb76cfae96cce8507bee1117e90" }, "downloads": -1, "filename": "tf_yarn-0.3.2-py3-none-any.whl", "has_sig": false, "md5_digest": "1e1dedc7b9d0d578a01357dc3260e605", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 33019, "upload_time": 
"2019-05-22T14:46:55", "url": "https://files.pythonhosted.org/packages/45/8b/43b5bd9b34504b11f1368fc036f19b9d2799b39a9c41a79d099b63147ee0/tf_yarn-0.3.2-py3-none-any.whl" } ], "0.4.2": [ { "comment_text": "", "digests": { "md5": "96614bfc17c101828c082fc90c228c6f", "sha256": "3de1b2f362710a3f428f92eafdcc68f1dde96beb078d5d9f42f0f65df8490197" }, "downloads": -1, "filename": "tf_yarn-0.4.2-py3-none-any.whl", "has_sig": false, "md5_digest": "96614bfc17c101828c082fc90c228c6f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 34750, "upload_time": "2019-07-08T16:09:03", "url": "https://files.pythonhosted.org/packages/61/96/17b4ee7a51eb01591fcd9d233aef97dfce0d2b6b0665214a83ec140b2a1d/tf_yarn-0.4.2-py3-none-any.whl" } ], "0.4.3": [ { "comment_text": "", "digests": { "md5": "ebc49efb41171897d206461a460d9207", "sha256": "ae304741eb9ebfdb649fe6527c2a666bda342a653b34091f1024a3529d7e9173" }, "downloads": -1, "filename": "tf_yarn-0.4.3-py3-none-any.whl", "has_sig": false, "md5_digest": "ebc49efb41171897d206461a460d9207", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 37030, "upload_time": "2019-07-11T17:24:33", "url": "https://files.pythonhosted.org/packages/19/1c/870c39da390788a0c791f738fe86172d58d3210535fd477f6c9952751271/tf_yarn-0.4.3-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ebc49efb41171897d206461a460d9207", "sha256": "ae304741eb9ebfdb649fe6527c2a666bda342a653b34091f1024a3529d7e9173" }, "downloads": -1, "filename": "tf_yarn-0.4.3-py3-none-any.whl", "has_sig": false, "md5_digest": "ebc49efb41171897d206461a460d9207", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 37030, "upload_time": "2019-07-11T17:24:33", "url": "https://files.pythonhosted.org/packages/19/1c/870c39da390788a0c791f738fe86172d58d3210535fd477f6c9952751271/tf_yarn-0.4.3-py3-none-any.whl" } ] }