{ "info": { "author": "Christoph Boeddeker", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Software Development :: Build Tools" ], "description": "# dlp_mpi - Data-level parallelism with mpi for python\n\n
Run a serial algorithm on multiple examples:\n\n```python\n# python script.py\n\nimport time\n\nexamples = list(range(10))\nresults = []\n\nfor example in examples:\n    # Some heavy workload:\n    # CPU or IO\n    time.sleep(0.2)\n    result = example\n\n    # Remember the results\n    results.append(result)\n\n# Summarize your experiment\nprint(sum(results))\n```\n
\nUse dlp_mpi to run the loop body in parallel:\n\n```python\n# mpiexec -np 8 python script.py\n\nimport time\nimport dlp_mpi\n\nexamples = list(range(10))\nresults = []\n\nfor example in dlp_mpi.split_managed(examples):\n    # Some heavy workload:\n    # CPU or IO\n    time.sleep(0.2)\n    result = example\n\n    # Remember the results\n    results.append(result)\n\nresults = dlp_mpi.gather(results)\n\nif dlp_mpi.IS_MASTER:\n    results = [\n        result\n        for worker_results in results\n        for result in worker_results\n    ]\n\n    # Summarize your experiment\n    print(results)\n```\n
\nUse dlp_mpi to run a function in parallel:\n\n```python\n# mpiexec -np 8 python script.py\n\nimport time\nimport dlp_mpi\n\nexamples = list(range(10))\nresults = []\n\ndef work_load(example):\n    # Some heavy workload:\n    # CPU or IO\n    time.sleep(0.2)\n    result = example\n    # Return the result so that map_unordered can send it back to the master\n    return result\n\nfor result in dlp_mpi.map_unordered(work_load, examples):\n    # Remember the results\n    results.append(result)\n\nif dlp_mpi.IS_MASTER:\n    # Summarize your experiment\n    print(results)\n```\n
\nThis package uses `mpi4py` to provide utilities to parallelize algorithms that are applied to multiple examples.\n\nThe core idea is: start `N` processes and let each process work on a subset of all examples.\nThe processes can be started with `mpiexec`. Most HPC systems support MPI to scatter the workload across multiple hosts. For the exact command, look in the documentation of your HPC system and search for MPI launchers.\n\nSince each process should operate on different examples, MPI provides the variables `RANK` and `SIZE`, where `SIZE` is the number of workers and `RANK` is a unique identifier from `0` to `SIZE - 1`.\nThe easiest way to improve the execution time is to process `examples[RANK::SIZE]` on each worker.\nThis is a round robin load balancing (`dlp_mpi.split_round_robin`).\nA more advanced load balancing is `dlp_mpi.split_managed`, where one process manages the load and assigns a new task to a worker once it has finished its previous task.\n\nWhen, at the end of a program, all results should be summarized or written to a single file, communication between the processes is necessary.\nFor this purpose `dlp_mpi.gather` (`mpi4py.MPI.COMM_WORLD.gather`) can be used. This function sends all data to the root process (here, `pickle` is used for serialization).\n\nAs an alternative to splitting the data, this package also provides a `map` style parallelization (see the example at the beginning):\nThe function `dlp_mpi.map_unordered` calls `work_load` in parallel and executes the `for` body in serial.\nThe only communication between the processes is the `result` and the index to get the `i`th example, i.e., the examples themselves aren't transferred between the processes.\n
\n# Runtime\n\nWithout this package your code runs in serial.\nThe block diagrams below illustrate the execution time of the code snippets when they are run with this package.\nRegarding the colors: `examples = ...` is the setup code, so this code and the corresponding block that represents its execution time are shown in blue.\n\n![(Serial Worker)](doc/tikz_split_managed_serial.svg)\n\nThe easiest way to parallelize the workload (dark orange) is a round robin assignment of the load:\n`for example in dlp_mpi.split_round_robin(examples)`.\nThis function call is equivalent to `for example in examples[dlp_mpi.RANK::dlp_mpi.SIZE]`.\nThus, there is zero communication between the workers.\nCommunication is only necessary when some final work should be done on the results of all data (e.g. calculating average metrics).\nThis is done with the `gather` function.\nThis function returns the worker results in a list on the master process, while the worker processes get a `None` return value.\nDepending on the workload, the round robin assignment can be suboptimal.\nSee the example block diagram: Worker 1 got tasks that are relatively long, so this worker needed much more time than the others.\n\n![(Round Robin)](doc/tikz_split_managed_rr.svg)\n
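\nWritten out, the round robin variant could look like the following minimal sketch (this snippet is not one of the examples above; it only combines `dlp_mpi.split_round_robin`, `dlp_mpi.gather` and `dlp_mpi.IS_MASTER` as described in this section):\n\n```python\n# mpiexec -np 8 python script.py\n\nimport time\nimport dlp_mpi\n\nexamples = list(range(10))\nresults = []\n\n# Round robin: each worker iterates over examples[RANK::SIZE],\n# so there is no communication during the loop.\nfor example in dlp_mpi.split_round_robin(examples):\n    # Some heavy workload: CPU or IO\n    time.sleep(0.2)\n    results.append(example)\n\n# gather collects the per-worker result lists on the master process;\n# the worker processes get None back.\nresults = dlp_mpi.gather(results)\n\nif dlp_mpi.IS_MASTER:\n    # Flatten the list of per-worker lists and summarize the experiment\n    results = [r for worker_results in results for r in worker_results]\n    print(sum(results))\n```\n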
\nTo overcome the limitations of the round robin assignment, this package helps to use a manager that assigns the work to the workers.\nThis optimizes the utilization of the workers.\nOnce a worker has finished an example, it requests a new one from the manager and gets one assigned.\nNote: The communication contains only which example should be processed (i.e. the index of the example), not the example itself.\n\n![(Managed Split)](doc/tikz_split_managed_split.svg)\n\nAn alternative to splitting the iterator is to use a `map` function.\nThe function is then executed on a worker and the return value is sent back to the manager.\nBe careful that the loop body is fast enough, otherwise it can become a bottleneck.\nYou should use the loop body only for bookkeeping, not for the actual workload.\nWhen a worker sends a result to the manager, the manager sends back a new task and then enters the `for` loop body.\nWhile the manager is in the loop body, it cannot react to requests from other workers, see the block diagram:\n\n![(Managed Map)](doc/tikz_split_managed_map.svg)\n\n\n# Installation\n\nYou can install this package from PyPI:\n```bash\npip install dlp_mpi\n```\n\nTo check if the installation was successful, try the following command:\n```bash\n$ mpiexec -np 4 python -c 'import dlp_mpi; print(dlp_mpi.RANK)'\n3\n0\n1\n2\n```\nThe command should print the numbers 0, 1, 2 and 3.\nThe order is random.\nIf that command prints four zeros, something went wrong.\n\nThis can happen when no MPI is installed or the installation is broken.\nOn a Debian-based Linux you can install it with `sudo apt install libopenmpi-dev`.\nIf you do not have the rights to install something with `apt`, you can also install `mpi4py` with `conda`.\nThe above `pip install` will install `mpi4py` from PyPI.\nBe careful: the installation from `conda` may conflict with your locally installed MPI.\nEspecially in High Performance Computing (HPC) environments this can cause trouble.\n
", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/fgnt/dlp_mpi", "keywords": "mpi", "license": "", "maintainer": "", "maintainer_email": "", "name": "dlp-mpi", "package_url": "https://pypi.org/project/dlp-mpi/", "platform": "", "project_url": "https://pypi.org/project/dlp-mpi/", "project_urls": { "Homepage": "https://github.com/fgnt/dlp_mpi" }, "release_url": "https://pypi.org/project/dlp-mpi/0.0.3/", "requires_dist": [ "mpi4py" ], "requires_python": ">=3.6, <4", "summary": "Data-level parallelism with mpi in python", "version": "0.0.3", "yanked": false, "yanked_reason": null }, "last_serial": 13399620, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "5953199b28a6e3afd813fb20e328bb64", "sha256": "b1304074c08eccd2ce84882809ff81156986fb3eec3e8fe18c53429bbfae2102" }, "downloads": -1, "filename": "dlp_mpi-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5953199b28a6e3afd813fb20e328bb64", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 11695, "upload_time": "2019-10-28T10:02:12", "upload_time_iso_8601": "2019-10-28T10:02:12.603669Z", "url": "https://files.pythonhosted.org/packages/d0/e2/741b63557278113e86fef8cee4644f6541ec9dd8e1fb715a121b7aa93850/dlp_mpi-0.0.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "37438c1ec34adb23a1a15988f20bff2d", "sha256": "dd174ada88a1703e09dd78fd178a00198bf54fe3d860366e037693a5b92f1ffb" }, "downloads": -1, "filename": "dlp_mpi-0.0.1.tar.gz", "has_sig": false, "md5_digest": "37438c1ec34adb23a1a15988f20bff2d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4", "size": 14393, "upload_time": 
"2019-10-28T10:02:14", "upload_time_iso_8601": "2019-10-28T10:02:14.607715Z", "url": "https://files.pythonhosted.org/packages/5a/68/9b338d930a0cd36a4f8dcc26e0fce9c6b96be87a84841b851cbebc986712/dlp_mpi-0.0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "fb5647c8bdebfa91afecf132d64e9d28", "sha256": "6635708a758db61cea0bfa4d381ee507e2729a1710712cf365d48cb3d41f8a36" }, "downloads": -1, "filename": "dlp_mpi-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "fb5647c8bdebfa91afecf132d64e9d28", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6, <4", "size": 12022, "upload_time": "2019-10-29T08:23:45", "upload_time_iso_8601": "2019-10-29T08:23:45.850781Z", "url": "https://files.pythonhosted.org/packages/01/59/20d7b6c5c531a2007aea49664b70ebbfefc56477c594669e0a7e4162c9ee/dlp_mpi-0.0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "574b0d1cbab516305627f50f99995005", "sha256": "dd092a839dacadeee241a71f80b7ef26baedc044271728396ac63710982d0bf2" }, "downloads": -1, "filename": "dlp_mpi-0.0.2.tar.gz", "has_sig": false, "md5_digest": "574b0d1cbab516305627f50f99995005", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6, <4", "size": 14549, "upload_time": "2019-10-29T08:23:48", "upload_time_iso_8601": "2019-10-29T08:23:48.231298Z", "url": "https://files.pythonhosted.org/packages/d3/9a/d89ba58e8ffbc2b6a79fc5787e8d655c00fb279bb8807138515c7f8574d0/dlp_mpi-0.0.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "bcc9012aae0439b545ff4ec6e175e1b7", "sha256": "5bec459480c16b4f1dd35dad7c095fd95749fc9145dc8561fc4c8d5ac6d8e074" }, "downloads": -1, "filename": "dlp_mpi-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "bcc9012aae0439b545ff4ec6e175e1b7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6, <4", "size": 13170, "upload_time": "2020-08-10T13:51:33", "upload_time_iso_8601": "2020-08-10T13:51:33.533686Z", "url": "https://files.pythonhosted.org/packages/da/0d/4aa3286d97d7708b86df65259542081d348f6d06b49b6b6674fd39ac32fc/dlp_mpi-0.0.3-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "102653b84d538b1040a00ea363505464", "sha256": "459577f527dff69033d45b6e9005b316706145f850c832100c9995419ba6ac80" }, "downloads": -1, "filename": "dlp_mpi-0.0.3.tar.gz", "has_sig": false, "md5_digest": "102653b84d538b1040a00ea363505464", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6, <4", "size": 15372, "upload_time": "2020-08-10T13:51:34", "upload_time_iso_8601": "2020-08-10T13:51:34.866359Z", "url": "https://files.pythonhosted.org/packages/d2/8d/0b3f5677dc4dc34f5a109f0f38e1abc9fc50bc64674789ea4aee7324f444/dlp_mpi-0.0.3.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "bcc9012aae0439b545ff4ec6e175e1b7", "sha256": "5bec459480c16b4f1dd35dad7c095fd95749fc9145dc8561fc4c8d5ac6d8e074" }, "downloads": -1, "filename": "dlp_mpi-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "bcc9012aae0439b545ff4ec6e175e1b7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6, <4", "size": 13170, "upload_time": "2020-08-10T13:51:33", "upload_time_iso_8601": "2020-08-10T13:51:33.533686Z", "url": 
"https://files.pythonhosted.org/packages/da/0d/4aa3286d97d7708b86df65259542081d348f6d06b49b6b6674fd39ac32fc/dlp_mpi-0.0.3-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "102653b84d538b1040a00ea363505464", "sha256": "459577f527dff69033d45b6e9005b316706145f850c832100c9995419ba6ac80" }, "downloads": -1, "filename": "dlp_mpi-0.0.3.tar.gz", "has_sig": false, "md5_digest": "102653b84d538b1040a00ea363505464", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6, <4", "size": 15372, "upload_time": "2020-08-10T13:51:34", "upload_time_iso_8601": "2020-08-10T13:51:34.866359Z", "url": "https://files.pythonhosted.org/packages/d2/8d/0b3f5677dc4dc34f5a109f0f38e1abc9fc50bc64674789ea4aee7324f444/dlp_mpi-0.0.3.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }