{ "info": { "author": "Mingyu Gao", "author_email": "mgao12@stanford.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: System :: Hardware" ], "description": ".. image:: https://travis-ci.org/stanford-mast/nn_dataflow.svg?branch=master\n :target: https://travis-ci.org/stanford-mast/nn_dataflow\n.. image:: https://coveralls.io/repos/github/stanford-mast/nn_dataflow/badge.svg?branch=master\n :target: https://coveralls.io/github/stanford-mast/nn_dataflow?branch=master\n\n\nNeural Network Dataflow Scheduling\n==================================\n\nThis Python tool allows you to explore the energy-efficient dataflow scheduling\nfor neural networks (NNs), including array mapping, loop blocking and\nreordering, and (coarse-grained) parallel processing within and across layers.\n\nFor hardware, we assume an Eyeriss-style NN accelerator [Chen16]_, i.e., a 2D\narray of processing elements (PEs) with a local register file in each PE, and a\nglobal SRAM buffer shared by all PEs. We further support a tiled architecture\nwith multiple nodes that can partition and process the NN computations in\nparallel. Each node is an Eyeriss-style engine as above.\n\nIn software, we decouple the dataflow scheduling into three subproblems:\n\n- Array mapping, which deals with mapping one 2D convolution computation (one\n 2D ifmap convolves with one 2D filter to get one 2D ofmap) onto the hardware\n PE array. We support row stationary mapping [Chen16]_.\n- Loop blocking and reordering, which decides the order between all 2D\n convolutions by blocking and reordering the nested loops. We support\n exhaustive search over all blocking and reordering schemes [Yang16]_, and\n analytical bypass solvers [Gao17]_.\n- Parallel processing, which partitions the NN computations across the multiple\n tiled engines. We support both intra-layer and inter-layer parallelism. For\n intra-layer, we support batch partitioning, fmap partitioning, output\n partitioning, input partitioning, and the combination between them (hybrid)\n [Gao17]_. We also explore various dataflow optimizations including access\n forwarding and buffer sharing [Gao19]_. We use exhaustive search within each\n layer. For inter-layer, we support spatial pipelining (inter-layer\n pipelining) and temporal pipelining (time multiplexing without writing back\n intermediate data) as well as their optimized scheduling [Gao19]_. We use\n layer-wise greedy beam search across layers.\n\nSee the details in our ASPLOS'17 [Gao17]_ and ASPLOS'19 [Gao19]_ papers.\n\nIf you use this tool in your work, we kindly request that you reference our\npaper(s) below, and send us a citation of your work.\n\n- Gao et al., \"TETRIS: Scalable and Efficient Neural Network Acceleration with\n 3D Memory\", in ASPLOS, April 2017.\n\n- Gao et al., \"TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN\n Accelerators\", in ASPLOS. April 2019.\n\n\nInstall\n-------\n\n``nn_dataflow`` requires Python 2.7 and is not yet Python 3 compatible.\n\n``nn_dataflow`` can be directly used without installation if you have first\ndefined the environment variable ``PYTHONPATH`` to include the top directory path.\nSee the Usage section below for details.\n\n``nn_dataflow`` has been registered on `PyPI\n`_, so it can be installed through\n``pip`` as::\n\n > pip install nn-dataflow\n\nAnd ``pip`` will take care of all dependencies.\n\nTo only install ``nn_dataflow`` in local user install directory (without\n``sudo``), and/or to install in editable mode, at the top directory do::\n\n > pip install --user -e .\n\n\nUsage\n-----\n\nFirst, define the NN structure in ``nn_dataflow/nns``. We already defined\nseveral popular NNs for you, including AlexNet, VGG-16, GoogLeNet, ResNet-152,\netc.\n\nThen, use ``nn_dataflow/tools/nn_dataflow_search.py`` to search for the optimal\ndataflow for the NN. For detailed options, type::\n\n > python ./nn_dataflow/tools/nn_dataflow_search.py -h\n\nYou can specify NN batch size and word size, PE array dimensions, number of\ntile nodes, register file and global buffer capacity, and the energy cost of\nall components. Note that, the energy cost of array bus should be the average\nenergy of transferring the data from the buffer to one PE, *not* local neighbor\ntransfer; the unit static energy cost should be the static energy of *all*\nnodes in one clock cycle.\n\nOther options include:\n\n- ``-g``, ``--goal``: ``E``, ``D``, or ``ED``. the optimization goal, e(nergy),\n d(elay), or ED product.\n- ``--mem-type``: ``2D`` or ``3D``. With 2D memory, memory channels are only on\n the four corners of the chip; with 3D memory, memory channels are on the top\n of all tile nodes (one per each).\n- ``--bus-width``: the multicast bus bit width in the PE array for one data\n type. Set to 0 to ignore multicast overheads.\n- ``--dram-bw``: ``float`` or ``inf``. Total DRAM bandwidth for all tile nodes,\n in bytes per cycle.\n- ``--disable-bypass``: a combination of ``i``, ``o``, ``f``, whether to\n disallow global buffer bypass for ifmaps, ofmaps, and weights.\n- ``--solve-loopblocking``: whether to use analytical bypass solvers for loop\n blocking and reordering. See [Gao17]_.\n- ``--hybrid-partitioning``: whether to use hybrid partitioning in [Gao17]_.\n If not enabled, use naive partitioning, i.e., fmap partitioning for CONV\n layers, and output partitioning for FC layers.\n- ``--batch-partitioning`` and ``--ifmap-partitioning``: whether the hybrid\n partitioning also explores batch and input partitioning.\n- ``--enable-access-forwarding``: access forwarding, where the nodes fetch\n disjoint subsets of data and forward them to other nodes. See [Gao19]_.\n- ``--enable-gbuf-sharing``: buffer sharing, where the global buffer capacity is\n shared across nodes through NoC. See [Gao19]_.\n- ``--enable-save-writeback``: allow to elide the intermediate data writeback to\n memory when switching between layers if it is possible to store the entire\n data set in on-chip buffers.\n- ``--interlayer-partition``: whether to use inter-layer pipelining to\n partition resources across multiple layers and process them simultaneously.\n- ``--layer-pipeline-time-overhead``, ``--layer-pipeline-max-degree``:\n constrain the configuration space of inter-layer pipelining, by specifying\n the maximum execution time overhead, or the maximum pipelining degree.\n- ``--disable-interlayer-opt``: disable optimizations and only allow basic\n inter-layer pipelining.\n\n\nCode Structure\n--------------\n\n- ``nn_dataflow``\n - ``core``\n - Top-level dataflow exploration: ``nn_dataflow``,\n ``nn_dataflow_scheme``.\n - Layer scheduling: ``scheduling``.\n - Array mapping: ``map_strategy``.\n - Loop blocking and reordering: ``loop_blocking``,\n ``loop_blocking_scheme``, ``loop_blocking_solver``.\n - Intra-layer partitioning: ``partition``, ``partition_scheme``,\n ``buf_shr_scheme``.\n - Inter-layer pipelining: ``inter_layer_pipeline``,\n ``pipeline_segment``.\n - Network and layer: ``network``, ``layer``.\n - ``nns``: example NN definitions.\n - ``tests``: unit tests.\n - ``tools``: executables.\n\n\nVerification and Testing\n------------------------\n\nTo verify the tool against the Eyeriss result [Chen16]_, see\n``nn_dataflow/tests/dataflow_test/test_nn_dataflow.py``.\n\nTo run (unit) tests, do one of the following::\n\n > python -m unittest discover\n\n > python -m pytest\n\n > pytest\n\nTo check code coverage with ``pytest-cov`` plug-in::\n\n > pytest --cov=nn_dataflow\n\n\nCopyright & License\n-------------------\n\n``nn_dataflow`` is free software; you can redistribute it and/or modify it\nunder the terms of the `BSD License `__ as published by the Open\nSource Initiative, revised version.\n\n``nn_dataflow`` was originally written by Mingyu Gao at Stanford University,\nand per Stanford University policy, the copyright of this original code remains\nwith the Board of Trustees of Leland Stanford Junior University.\n\n\nReferences\n----------\n\n.. [Gao19] Gao, Yang, Pu, Horowitz, and Kozyrakis, `TANGRAM: Optimized\n Coarse-Grained Dataflow for Scalable NN Accelerators\n `__, in ASPLOS. April, 2019.\n\n.. [Gao17] Gao, Pu, Yang, Horowitz, and Kozyrakis, `TETRIS: Scalable and\n Efficient Neural Network Acceleration with 3D Memory\n `__, in ASPLOS. April, 2017.\n\n.. [Chen16] Chen, Emer, and Sze, `Eyeriss: A Spatial Architecture for\n Energy-Efficient Dataflow for Convolutional Neural Networks\n `__, in ISCA. June, 2016.\n\n.. [Yang16] Yang, Pu, Rister, Bhagdikar, Richardson, Kvatinsky,\n Ragan-Kelley, Pedram, and Horowitz, `A Systematic Approach to Blocking\n Convolutional Neural Networks `__, arXiv\n preprint, 2016.\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/stanford-mast/nn_dataflow", "keywords": "neural-network scheduling dataflow optimizer", "license": "BSD 3-clause", "maintainer": "", "maintainer_email": "", "name": "nn-dataflow", "package_url": "https://pypi.org/project/nn-dataflow/", "platform": "", "project_url": "https://pypi.org/project/nn-dataflow/", "project_urls": { "Homepage": "https://github.com/stanford-mast/nn_dataflow" }, "release_url": "https://pypi.org/project/nn-dataflow/2.0/", "requires_dist": [ "argparse", "coverage (>=4)", "fastcache (>=1)", "pytest (>=3)", "pytest-cov (>=2)", "pytest-xdist (>=1)", "sympy (>=1)" ], "requires_python": "", "summary": "Explore the energy-efficient dataflow scheduling for neural networks.", "version": "2.0" }, "last_serial": 4883004, "releases": { "1.5": [ { "comment_text": "", "digests": { "md5": "1597459119f829240819fb70e0a2cbe8", "sha256": "c2fb4eae8ba3bb4600c2ad3420555f7199dc50de5211517d23705ed67e94699b" }, "downloads": -1, "filename": "nn_dataflow-1.5-py2-none-any.whl", "has_sig": false, "md5_digest": "1597459119f829240819fb70e0a2cbe8", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 153486, "upload_time": "2017-08-23T20:45:00", "url": "https://files.pythonhosted.org/packages/b3/50/1842942006585ee27570429d36b20f211dd27b64a7be5de427f17b4cb86d/nn_dataflow-1.5-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ab3eb315e5aabdb90b408c7645267bd8", "sha256": "f02d237cc68eb34494414ac5b929614dcce54b524a40b9c1121a1ae39126b989" }, "downloads": -1, "filename": "nn_dataflow-1.5.tar.gz", "has_sig": false, "md5_digest": "ab3eb315e5aabdb90b408c7645267bd8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 86202, "upload_time": "2017-08-23T20:45:02", "url": "https://files.pythonhosted.org/packages/1a/ee/01f35471bb20005d7d15e28140702f7fd9d785bd74ad367344f848306f2e/nn_dataflow-1.5.tar.gz" } ], "1.6": [ { "comment_text": "", "digests": { "md5": "c7f1cd598f7586fcfb9c047ab0897af0", "sha256": "72ac07c132bcfd105e187307357407249cfed1458789668c3cb81e4a838767f5" }, "downloads": -1, "filename": "nn_dataflow-1.6-py2-none-any.whl", "has_sig": false, "md5_digest": "c7f1cd598f7586fcfb9c047ab0897af0", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 157881, "upload_time": "2019-02-01T07:50:55", "url": "https://files.pythonhosted.org/packages/2f/ee/504eab6fc9920192f1ed46d91c49f7f09a61f421fc02a7981f41ce8b2edd/nn_dataflow-1.6-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "57bde6ceda80235d5674d20a1a349002", "sha256": "7c599fb840647a74fb62e246ec524923278e3b75e4a8104db823dcc1fbfdc31e" }, "downloads": -1, "filename": "nn_dataflow-1.6.tar.gz", "has_sig": false, "md5_digest": "57bde6ceda80235d5674d20a1a349002", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 99842, "upload_time": "2019-02-01T07:50:58", "url": "https://files.pythonhosted.org/packages/37/e9/ecbc6357176d39ebac202b1c05fb7383f358fceb5b1e2a3c9b34c3f0302e/nn_dataflow-1.6.tar.gz" } ], "2.0": [ { "comment_text": "", "digests": { "md5": "c76c276f2060ba3c9e40ccc84e2cd2e6", "sha256": "77016f262f50d6c99a4d715c7f285624183e00676289580dc8325a1b07826be6" }, "downloads": -1, "filename": "nn_dataflow-2.0-py2-none-any.whl", "has_sig": false, "md5_digest": "c76c276f2060ba3c9e40ccc84e2cd2e6", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 231778, "upload_time": "2019-03-01T08:34:08", "url": "https://files.pythonhosted.org/packages/68/4a/4636d481e977d1053369e9f88f14f98b3ea700a2e17af6f4f76dd4b7157a/nn_dataflow-2.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6215991c25344183d835077151b40aeb", "sha256": "97bef3c9d1c820d77c0b7849a2131eb408d930d10501a862fa72e5859575fa60" }, "downloads": -1, "filename": "nn_dataflow-2.0.tar.gz", "has_sig": false, "md5_digest": "6215991c25344183d835077151b40aeb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 163912, "upload_time": "2019-03-01T08:34:15", "url": "https://files.pythonhosted.org/packages/46/85/845502913554e812b0039db2feba3147da44e10fda912418265b0348a348/nn_dataflow-2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c76c276f2060ba3c9e40ccc84e2cd2e6", "sha256": "77016f262f50d6c99a4d715c7f285624183e00676289580dc8325a1b07826be6" }, "downloads": -1, "filename": "nn_dataflow-2.0-py2-none-any.whl", "has_sig": false, "md5_digest": "c76c276f2060ba3c9e40ccc84e2cd2e6", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 231778, "upload_time": "2019-03-01T08:34:08", "url": "https://files.pythonhosted.org/packages/68/4a/4636d481e977d1053369e9f88f14f98b3ea700a2e17af6f4f76dd4b7157a/nn_dataflow-2.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6215991c25344183d835077151b40aeb", "sha256": "97bef3c9d1c820d77c0b7849a2131eb408d930d10501a862fa72e5859575fa60" }, "downloads": -1, "filename": "nn_dataflow-2.0.tar.gz", "has_sig": false, "md5_digest": "6215991c25344183d835077151b40aeb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 163912, "upload_time": "2019-03-01T08:34:15", "url": "https://files.pythonhosted.org/packages/46/85/845502913554e812b0039db2feba3147da44e10fda912418265b0348a348/nn_dataflow-2.0.tar.gz" } ] }