{ "info": { "author": "Panagiotis Chatzidoukas", "author_email": "phadjido@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# torcpy: supporting task-based parallelism in Python\n\n## 1. Introduction ##\n\n**torcpy** is a platform-agnostic adaptive load balancing library that orchestrates scheduling of multiple function evaluations on both shared and distributed memory platforms.\n\nAs an open-source tasking library, **torcpy**, aims at providing a parallel computing framework that:\n- offers a unified approach for expressing and executing task-based parallelism on both shared and distributed memory platforms\n- takes advantage of MPI internally in a transparent to the user way but also allows the use of legacy MPI code at the application level\n- provides lightweight support for parallel nested loops and map functions\n- supports task stealing at all levels of parallelism\n- exports the above functionalities through a simple and single Python package\n\ntorcpy exports an interface compatible with PEP 3148. Therefore tasks (futures) can be spawned and joined with the *submit* and *wait* calls. A parallel *map* function is provided, while *spmd* allows for switching\nto the typical SPMD execution mode that is natively supported by MPI.\n\n## 2. Installation and testing\n\n### 2.1 Installation\n\nPrerequisites: `python3 >= 5` and `pip3`.\n\n```bash\ngit clone git@github.com:IBM/torc_py.git\ncd torc_py\npip3 install .\n```\n\nObservation:\nThe main requirements are *mpi4py* and *termcolor*, while *numpy*, *cma*, *h5py* and *pillow* are needed by some examples.\n\n\n### 2.2. Testing\n\nWe use the file `examples\\ex00_masterworker.py` to demonstrate the execution of the tasking library using multiple processes and threads. The task function receives as input a number `x`, sleeps for one second and then computes and returns as result the square value `x*x`. The main task spawns `ntasks` (= four) tasks that are distributed cyclically, by default, to the available workers and then calls `wait`, waiting for their completion. Finally, it prints the task results and reports the elapsed time.\n\nThe MPI processes start with the execution of `__main__` and call `torcpy.start(main)`, which initializes the tasking library and then executes the primary application task (with task function `main()`) on the process with rank 0. \n\n```python\nimport time\nimport threading\nimport torcpy\n\ndef work(x):\n time.sleep(1)\n y = x**2\n print(\"work inp={:.3f}, out={:.3f} ...on node {:d} worker {} thread {}\".format(x, y, torcpy.node_id(), torcpy.worker_id(), threading.get_ident()), flush=True)\n return y\n\ndef main():\n ntasks = 4\n sequence = range(1, ntasks + 1)\n\n t0 = torcpy.gettime()\n tasks = []\n for i in sequence:\n task = torcpy.submit(work, i)\n tasks.append(task)\n torcpy.wait()\n t1 = torcpy.gettime()\n\n for t in tasks:\n print(\"Received: {}^2={:.3f}\".format(t.input(), t.result()))\n\n print(\"Elapsed time={:.2f} s\".format(t1 - t0))\n\nif __name__ == '__main__':\n torcpy.start(main)\n```\n\n### 2.3 Execution of tests\n\n#### 2.3.1. One MPI process with one worker\n\nThis is similar to the sequential execution of the code with the main difference that the task functions are executed not immediately but in deferred mode. No MPI communication takes place and the tasks are directly inserted in the local queue. 
Upon `torcpy.wait()`, the current (primary) application task suspends its execution, the scheduling loop of the underlying worker is activated and the child tasks are fetched and executed. When the last child task completes, the primary task resumes and prints the results. Moreover, the tasking library reports how many tasks were created and executed by each MPI process.\n\n```console\n$ mpirun -n 1 python3 ex00_masterworker.py\nTORCPY: main starts\nwork inp=1.000, out=1.000 ...on node 0 worker 0 thread 4536538560\nwork inp=2.000, out=4.000 ...on node 0 worker 0 thread 4536538560\nwork inp=3.000, out=9.000 ...on node 0 worker 0 thread 4536538560\nwork inp=4.000, out=16.000 ...on node 0 worker 0 thread 4536538560\nReceived: 1^2=1.000\nReceived: 2^2=4.000\nReceived: 3^2=9.000\nReceived: 4^2=16.000\nElapsed time=4.03 s\nTORCPY: node[0]: created=4, executed=4\n```\n\n#### 2.3.2. Two MPI processes with one worker each\n\nTwo MPI processes (*nodes*) are started with ranks 0 and 1, respectively. Each process has a single worker thread, with global ids 0 and 1. The primary task runs on rank 0 and spawns the four tasks. The first and the third tasks are submitted locally to worker 0 while the second and fourth tasks are sent to worker 1.\n\n```console\n$ mpirun -n 2 python3 ex00_masterworker.py\nTORCPY: main starts\nwork inp=1.000, out=1.000 ...on node 0 worker 0 thread 4585866688\nwork inp=2.000, out=4.000 ...on node 1 worker 1 thread 4623332800\nwork inp=3.000, out=9.000 ...on node 0 worker 0 thread 4585866688\nwork inp=4.000, out=16.000 ...on node 1 worker 1 thread 4623332800\nReceived: 1^2=1.000\nReceived: 2^2=4.000\nReceived: 3^2=9.000\nReceived: 4^2=16.000\nElapsed time=2.03 s\nTORCPY: node[0]: created=4, executed=2\nTORCPY: node[1]: created=0, executed=2\n```\n\n#### 2.3.3. One MPI process with two workers\n\nThe single MPI process is initialized with two worker threads, with global ids 0 and 1. The four tasks are inserted in the local process queue, then extracted and executed by the two workers. The primary task is executed by worker 0 and is tied to it; therefore, it always continues on the same worker.\n\n```console\n$ mpirun -n 1 -env TORCPY_WORKERS=2 python3 ex00_masterworker.py\nTORCPY: main starts\nwork inp=1.000, out=1.000 ...on node 0 worker 0 thread 4607645120\nwork inp=2.000, out=4.000 ...on node 0 worker 1 thread 123145550958592\nwork inp=3.000, out=9.000 ...on node 0 worker 0 thread 4607645120\nwork inp=4.000, out=16.000 ...on node 0 worker 1 thread 123145550958592\nReceived: 1^2=1.000\nReceived: 2^2=4.000\nReceived: 3^2=9.000\nReceived: 4^2=16.000\nElapsed time=2.02 s\nTORCPY: node[0]: created=4, executed=4\n```\n\n#### 2.3.4. Two MPI processes with two workers each\n\nThere are two MPI processes with two workers each; workers 0 and 1 belong to the process with rank 0 and workers 2 and 3 to rank 1. Since task distribution is performed on a worker basis, the first and the second tasks are submitted locally to node 0 while the third and fourth tasks are sent to node 1. 
Eventually, every worker executes one task and the application runs 4x faster.\n\n```console\n$ mpirun -n 2 -env TORCPY_WORKERS=2 python3 ex00_masterworker.py\nTORCPY: main starts\nwork inp=2.000, out=4.000 ...on node 0 worker 0 thread 4560111040\nwork inp=1.000, out=1.000 ...on node 0 worker 1 thread 123145531727872\nwork inp=4.000, out=16.000 ...on node 1 worker 2 thread 4643915200\nwork inp=3.000, out=9.000 ...on node 1 worker 3 thread 123145537077248\nReceived: 1^2=1.000\nReceived: 2^2=4.000\nReceived: 3^2=9.000\nReceived: 4^2=16.000\nElapsed time=1.04 s\nTORCPY: node[0]: created=4, executed=2\nTORCPY: node[1]: created=0, executed=2\n```\n\n## 3. Examples\n\nPlease note that the `torcpy` module is imported as `torc` in the following examples.\n\n### 3.1. Submit and wait\n\nThe primary task spawns 10 tasks, distributes them cyclically to the available workers, waits for their completion and finally prints the results.\n\n```python\nimport torcpy as torc\n\n\ndef work(x):\n    return x * x\n\n\ndef main():\n    data = range(10)\n    tasks = []\n    for d in data:\n        tasks.append(torc.submit(work, d))\n    torc.wait()\n    for t in tasks:\n        print(t.result())\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n\n### 3.2. Parallel map\n\nEquivalent to the previous example, but this time using the `map` function with the default chunk size of 1.\n\n```python\nimport torcpy as torc\n\ndef work(x):\n    return x*x\n\ndef main():\n    data = range(10)\n    results = torc.map(work, data)\n    print(results)\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n\n\n### 3.3. Simple callback example\n\nFour tasks are spawned and executed by the available workers. When a task completes, it is passed as an argument to a callback task that is executed by the worker threads of the node (process) where the parent task is active.\n\n```python\nimport torcpy as torc\nimport threading\n\n\ndef cb(task):\n    arg = task.result()\n    tid = threading.get_ident()\n    print(\"thread {}: callback with arg={}\".format(tid, arg), flush=True)\n\n\ndef work(x):\n    tid = threading.get_ident()\n    y = x * x\n    print(\"thread {}: work inp={}, out={} ... on node {}\".format(tid, x, y, torc.node_id()), flush=True)\n    return y\n\n\ndef main():\n    ntasks = 4\n    sequence = range(1, ntasks+1)\n\n    t_all = []\n    for i in sequence:\n        task = torc.submit(work, i, callback=cb)\n        t_all.append(task)\n    torc.wait()\n\n    for task in t_all:\n        print(\"Received: {}^2={}\".format(task.input(), task.result()))\n\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n\n\n### 3.4. Hierarchical parallelism: recursive Fibonacci\n\nMultiple levels of parallelism are exploited in this commonly used parallelization example of recursive Fibonacci.\n\n```python\nimport torcpy as torc\n\n\ndef fib(n):\n    if n == 0:\n        result = 0\n    elif n == 1:\n        result = 1\n    else:\n        n_1 = n - 1\n        n_2 = n - 2\n        if n < 30:\n            result1 = fib(n_1)\n            result2 = fib(n_2)\n            result = result1 + result2\n        else:\n            task1 = torc.submit(fib, n_1)\n            task2 = torc.submit(fib, n_2)\n            torc.wait()\n            result = task1.result() + task2.result()\n\n    return result\n\n\ndef main():\n    n = 35\n    result = fib(n)\n\n    print(\"fib({}) = {}\".format(n, result))\n\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n
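\nThe examples above join the child tasks with a blocking `wait`. The `as_completed` call listed in Section 4.1 instead returns the finished child tasks in completion order. The following minimal sketch reworks example 3.1 under the assumption that `as_completed()`, called without arguments, returns the same future objects produced by `submit`:\n\n```python\nimport torcpy as torc\n\n\ndef work(x):\n    return x * x\n\n\ndef main():\n    for d in range(10):\n        torc.submit(work, d)\n\n    # consume the results in completion order rather than submission order\n    for t in torc.as_completed():\n        print('{} -> {}'.format(t.input(), t.result()))\n\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n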
\n### 3.5. Calling MPI SPMD code: MPI_Bcast\n\nThe global array `A` is initialized by the primary application task (`main`) on MPI process 0.\nNext, the `spmd` function triggers the execution of `bcast_task` on all MPI processes, thus switching\nto the SPMD execution model and allowing for a direct data broadcast using `Bcast`.\n\n```python\nimport numpy\nimport torcpy as torc\nfrom mpi4py import MPI\n\nN = 3\nA = numpy.zeros(N, dtype=numpy.float64)\n\n\ndef bcast_task(root):\n    global A\n    comm = MPI.COMM_WORLD\n    # Broadcast A from rank 0\n    comm.Bcast([A, MPI.DOUBLE], root=root)\n\n\ndef work():\n    global A\n    print(\"node:{} -> A={}\".format(torc.node_id(), A))\n\n\ndef main():\n    global A\n\n    # primary task initializes array A on rank 0\n    for i in range(0, N):\n        A[i] = 100*i\n\n    torc.spmd(bcast_task, torc.node_id())  # 2nd arg (root) is 0 and can be omitted\n\n    torc.spmd(work)\n\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n\n\n### 3.6. Reduction operation using callbacks\n\nIn this example, the callback function adds the task result to a global variable, thus implementing\na reduction operation. This example assumes that a single worker thread per MPI process is used and that callbacks are instantiated as tasks that are executed by worker threads.\n\n```python\nimport torcpy as torc\n\nsum_v = 0\n\n\ndef cb(task):\n    global sum_v\n    arg = task.result()\n    sum_v = sum_v + arg\n\n\ndef work(x):\n    y = x ** 2\n    return y\n\n\ndef main():\n    data = range(10)\n\n    tasks = []\n    for d in data:\n        t = torc.submit(work, d, callback=cb)\n        tasks.append(t)\n\n    torc.wait()\n    print(\"Sum=\", sum_v)\n\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n\n\n### 3.7. Late switch from SPMD to master-worker and image preprocessing\n\nThe application starts in SPMD mode, performs the common global initialization on all MPI processes (here, building the list of image files) and then switches to the master-worker mode with `torc.launch(None)`, so that the images can be preprocessed in parallel with `torc.map`.\n\n```python\nimport os\nimport sys\nimport time\nfrom PIL import Image\nimport torcpy as torc\n\nfiles = []\n\n\ndef get_files(path):\n    all_files = []\n    for dirpath, dirnames, filenames in os.walk(path):\n        for f in filenames:\n            if f.endswith('.jpg') or f.endswith('.jpeg') or f.endswith('.png'):\n                all_files.append(os.path.join(dirpath, f))\n\n    return sorted(all_files)\n\n\ndef work(i):\n    global files\n    f = files[i]\n    with Image.open(f) as im:\n        im = im.resize((32, 32))\n        # do something more here\n\n    return None\n\n\ndef main():\n    global files\n\n    # SPMD execution: torcpy and MPI initialization\n    torc.init()\n\n    # SPMD execution: common global initialization takes place here\n    if len(sys.argv) == 1:\n        if torc.node_id() == 0:\n            print(\"usage: python3 {} <path>\".format(os.path.basename(__file__)))\n        return\n\n    files = get_files(sys.argv[1])\n\n    # Switching to master-worker\n    torc.launch(None)\n\n    t0 = time.time()\n    _ = torc.map(work, range(len(files)))\n    t1 = time.time()\n    print(\"t1-t0=\", t1-t0)\n\n    torc.shutdown()\n\n\nif __name__ == \"__main__\":\n    main()\n```\n\n### 3.8. Parallel numerical optimization\n\nIn each generation of the CMA-ES optimizer, the candidate solutions returned by `ask()` are evaluated in parallel with `torc.map` and the results are passed back to `tell()`.\n\n```python\nimport cma  # python package for numerical optimization\nimport torcpy as torc\n\ndef rosenbrock(x):\n    \"\"\"Rosenbrock test objective function\"\"\"\n    n = len(x)\n    if n < 2:\n        raise ValueError('dimension must be greater than one')\n    return sum(100 * (x[i]**2 - x[i+1])**2 + (x[i] - 1)**2 for i in range(n-1))\n\ndef main():\n    es = cma.CMAEvolutionStrategy(2 * [0], 0.5, {'popsize': 640})\n    while not es.stop():\n        solutions = es.ask()\n        es.tell(solutions, torc.map(rosenbrock, solutions))\n        es.logger.add(es)  # write data to disc to be plotted\n        es.disp()\n\n    cma.plot()\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n\n\n### 3.9. 
Demonstration of work stealing\n\nAll 16 tasks are submitted by the primary task, running on node 0, to the worker thread of rank 1.\nIdle workers that find their local queue empty issue steal requests and eventually retrieve a task from the queue of rank 1. The primary task resumes when all the child tasks have completed and their results are available on rank 0.\n\n```python\nimport time\nimport torcpy as torc\n\n\ndef work(x):\n    time.sleep(1)\n    y = x*x\n    print(\"taskfun inp={}, out={} ...on node {:d}\".format(x, y, torc.node_id()), flush=True)\n    return y\n\n\ndef main():\n    nodes = torc.num_nodes()\n    if nodes < 2:\n        print(\"This example needs at least two MPI processes. Exiting...\")\n        return\n    local_workers = torc.num_local_workers()\n    if local_workers > 1:\n        print(\"This example should use one worker thread per MPI process. Exiting...\")\n        return\n\n    ntasks = 16\n    sequence = range(1, ntasks + 1)\n\n    t0 = torc.gettime()\n    t_all = []\n    for i in sequence:\n        try:\n            task = torc.submit(work, i, qid=1)\n            t_all.append(task)\n        except ValueError:\n            print(\"torc.submit: invalid argument\")\n\n    torc.enable_stealing()\n    torc.wait()\n    t1 = torc.gettime()\n    torc.disable_stealing()\n\n    for task in t_all:\n        print(\"Received: {}^2={}\".format(task.input(), task.result()))\n\n    print(\"Elapsed time={:.2f} s\".format(t1 - t0))\n\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n\n\n## 4. Application Programming Interface\n\n### 4.1. Task management routines\n\n- `submit(f, *args, qid=-1, callback=None, **kwargs)`:\nsubmits a new task that corresponds to the asynchronous execution of function `f()` with input arguments `args`. The task is submitted to the worker with global identifier `qid`. If `qid` is equal to -1, then cyclic distribution of tasks to processes is performed. The `callback` function is called on the rank that spawned the task, when the task completes and its results have been returned to that node.\n- `map(f, *seq, chunksize=1)`: executes function `f()` on a sequence (list) of arguments.\nIt returns a list with the results of all tasks. It is similar to the `map()` function of Python and other packages, allowing for straightforward usage of existing code.\n- `wait(tasks=None)`: the current task waits for its child tasks (all of them, by default) to finish. The underlying worker thread is released and can execute other tasks.\n- `as_completed(tasks=None)`: similar to `wait` but returns the finished child tasks in the order they completed their execution.\n\n### 4.2. Application setup\n\n- `start(f)`: initializes the library, launches function `f()` on the process with rank 0 as the primary application task and, when `f()` completes, shuts down the library. It is a collective function that must be called within `__main__`.\n\n```python\nif __name__ == '__main__':\n    torcpy.start(main)\n```\n\n### 4.3. Low-level application setup\n\n- `init()`: initializes the tasking library.\n- `launch(f)`: launches function `f()` as the primary application task on the MPI process with rank 0. It is a collective call that must be made by all MPI processes. If `f` is `None`, the function returns on rank 0 but activates the scheduling loop of the main worker thread on all other MPI processes; therefore, the calling function becomes the primary application task running on rank 0.\n- `shutdown()`: shuts down the tasking library.\n\n### 4.4. MPI code\n\n- `spmd(f, *args)`: executes function `f()` on all MPI processes. It allows for dynamic switching from the master-worker to the SPMD execution mode, thus allowing legacy MPI code to be used within the function.\n
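\nAs a quick illustration of how the calls above compose, the following minimal sketch runs an SPMD step on all ranks and then joins an explicit list of futures. It assumes that `wait(tasks=...)` accepts the task objects returned by `submit`, as the signature in Section 4.1 suggests:\n\n```python\nimport torcpy as torc\nfrom mpi4py import MPI\n\n\ndef report():\n    # legacy MPI code is allowed here (SPMD execution mode)\n    print('rank {} is ready'.format(MPI.COMM_WORLD.Get_rank()))\n\n\ndef square(x):\n    return x * x\n\n\ndef main():\n    torc.spmd(report)  # run report() on every MPI process\n    tasks = [torc.submit(square, i) for i in range(4)]\n    torc.wait(tasks=tasks)  # join the explicitly listed futures (assumed semantics)\n    print([t.result() for t in tasks])\n\n\nif __name__ == '__main__':\n    torc.start(main)\n```\n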
\n### 4.5. Additional calls\n\n- `enable_stealing(), disable_stealing()`: control task stealing between MPI processes.\n- `gettime()`: returns the current time in seconds (float).\n- `worker_id(), num_workers()`: return the global worker thread id and the total number of workers.\n- `node_id(), num_nodes()`: return the rank of the calling MPI process and the number of MPI processes.\n\n### 4.6. Context manager\n\n### 4.7. Environment variables\n\n- `TORCPY_WORKERS` (integer): number of worker threads used by each MPI process. Default value is 1.\n- `TORCPY_STEALING` (boolean): determines whether internode task stealing is enabled. Default value is \"False\".\n- `TORCPY_SERVER_YIELDTIME` (float): for how many seconds an idle server thread sleeps, releasing the processor. Default value is 0.01.\n- `TORCPY_WORKER_YIELDTIME` (float): for how many seconds an idle worker thread sleeps, releasing the processor. Default value is 0.01.\n\n\n## 5. Design and architecture\n\nThe library is implemented on top of MPI and multithreading and can be considered the pure Python implementation of the [*TORC*](https://github.com/phadjido/torc_lite) C/C++ runtime library [Hadjidoukas:2012], a software package for programming and running unaltered task-parallel programs on both shared and distributed memory platforms. The library supports platform-agnostic nested parallelism and automatic load balancing on large scale computing architectures. It has been used at the core of the [\u03a04U](https://github.ibm.com/cselab/pi4u) framework [Hadjidoukas:2015], allowing for HPC implementations, for both multicore and GPU clusters, of algorithms such as Transitional Markov Chain Monte Carlo (TMCMC) and Approximate Bayesian Computational Subset-simulation.\n\n*torcpy* is mainly built on top of the *mpi4py* package and the standard *threading* and *queue* modules.\nTasks are instantiated as Python dictionaries, which introduce less overhead than objects. The result of the task function is transparently stored in the task descriptor (future) on the MPI process that spawned the task. Following PEP 3148, the result can then be accessed as `task.result()`. Similarly, the input parameters can be accessed as `task.input()`.\n\nAll remote operations are performed asynchronously through a server thread. This thread is responsible for:\n- inserting incoming tasks into the local queue of the process\n- receiving the completed tasks and their results\n- serving task stealing requests\n\nThe internal architecture of *torcpy* is depicted in the following figure:\n\n![](./doc/torc_architecture.png)\n\n\n## 6. Performance evaluation\n\nTORC, the C/C++ counterpart of torcpy, has been used extensively in small and large scale HPC environments\nsuch as the Euler cluster (ETH) and the Piz Daint supercomputer (CSCS). TORC has been used to orchestrate the scheduling of function evaluations of the TMCMC method within \u03a04U on multiple cluster nodes. The TMCMC method achieved an overall parallel efficiency of more than 90% on 1024 compute nodes of Piz Daint, running hybrid MPI+GPU molecular simulation codes whose time-to-solution varies strongly between simulations with different interaction parameters.\n\n### 6.1. 
Preprocessing of image datasets\n\nA typical preprocessing stage of Deep Learning workloads is the transformation of a dataset of raw images into a single file in HDF5 format. The images are organized in subfolders, where the name of each subfolder denotes the label of the enclosed images. For each image, the file is opened and the binary data are loaded into a buffer (numpy array). This operation includes data decompression if the image is stored in JPEG format. Then, the image is resized and rescaled, and additional preprocessing filters might also be applied. Finally, the result is written to an HDF5 file that will be used in the training phase of deep learning.\n\n```python\ndef process_train_image(i, target_dim):\n    global reader, pipe\n\n    # load the image and its label\n    im, label = reader.get_train_image(i)\n\n    # apply preprocessing filters\n    im = pipe.filter(im)\n\n    # resize accordingly\n    dx = image_to_4d_tensor(im, target_dim)\n    dy = label\n\n    # return results\n    return dx, dy\n```\n\nThe parallelization of the sequential for loop is performed with the map function, using a chunk size of 32 so as to reduce the number of spawned tasks.\n\n```python\nsequence_i = range(n_train)\nsequence_t = [target_dim] * n_train\n\n# parallel map with chunksize\ntask_results = torcpy.map(process_train_image, sequence_i, sequence_t, chunksize=32)\n\n# write the results to the HDF5 dataset\ni = 0\nfor t in task_results:\n    dx, dy = t\n    dataset_x[i, :, :, :] = dx\n    dataset_y[i] = dy\n    i = i + 1\n```\n\n### 6.2. Results\n\n[Imagenet](http://www.image-net.org/) is a very large dataset of 1000 classes, each with 1300 images stored in JPEG format. We preprocess the images of 10 training classes of the Imagenet dataset. We perform our experiments on a single IBM Power8 node equipped with 20 cores, each with 8 hardware threads. The software configuration includes Python 3.5, mpi4py/3.0.1 and OpenMPI/3.1.2.\n\nWe spawn a single MPI process per core and then utilize 1, 2 and 4 workers per process. The command for running the benchmark for various numbers of processes (*NR*) and local workers (*NW*) on this platform is as follows:\n\n```console\nmpirun -n $NR -x TORCPY_WORKERS=$NW --bind-to core --map-by socket --mca btl self,tcp python3 benchmark.py\n```\n\nThe measurements include the time for spawning the parallelism, executing the preprocessing in parallel and waiting for the completion of all tasks, i.e. collecting the results back. We observe that the application exhibits good scaling and achieves ~78% efficiency when 20 processes with one worker thread each are used. The performance does not scale linearly with the number of cores because image decompression and processing stress the memory subsystem of the node. We also observe that multithreading further improves the performance, allowing for a maximum achieved speedup of 22x (20 processes, 4 threads).\n\n![](./doc/preprocessing_performance.png)\n\n\n## 7. Related work ##\n\nThere are a number of Python packages and frameworks that enable the orchestration and execution of task-based parallelism on various computing platforms. On single-node multi-core systems, Python provides two packages: [*multiprocessing*](https://docs.python.org/3/library/multiprocessing.html) and [*concurrent.futures*](https://docs.python.org/3/library/concurrent.futures.html). The [*futures* package of *mpi4py*](https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html) provides an extension of *futures* on top of the MPI programming model. 
[*DTM*](http://deap.gel.ulaval.ca/doc/0.9/api/dtm.html) is an MPI-based framework that supports task-based parallelism. However, *DTM* is obsolete and has been replaced by [*Scoop*](http://pyscoop.org), which follows a more distributed approach that does not rely on MPI. This is also the case for the [*Celery*](http://www.celeryproject.org/) and [*Dask*](http://dask.pydata.org/en/latest/) frameworks, which mainly target cloud computing environments. Finally, [*PycompSS*](https://pypi.org/project/pycompss/) is a compiler-based approach, where task functions and code parallelization are specified through annotations.\n\n\n|Framework | Clusters | Nested parallelism | MPI |\n|--------------------|----------|------------------------------|-----|\n| *multiprocessing* | No | No | No |\n| *futures* | No | No | No |\n| *mpi4py.futures* | Yes | No | No |\n| *dtm (deap 0.9.2)* | Yes | Inefficiently (threads) | Yes |\n| *scoop (0.7.1.1)* | Yes | Yes, coroutines | No |\n| *celery (4.2.0)* | Yes | No | No |\n| *dask (1.2.2)* | Yes | Inefficiently (more workers) | No |\n| *pycompss (2.4)* | Yes | Yes | No |\n| **torcpy** | Yes | Yes | Yes |\n\n\n\n## Authors and contacts\n - Panagiotis Chatzidoukas, IBM Research - Zurich, hat@zurich.ibm.com\n - Cristiano Malossi, IBM Research - Zurich, acm@zurich.ibm.com\n - Costas Bekas, IBM Research - Zurich, bek@zurich.ibm.com\n\n## Acknowledgments\n\nWe would like to thank our colleagues in the IBM OCL team: Roxana Istrate, Florian Scheidegger, Andrea Bartezzaghi and Nico Gorbach.\nThis work was supported by VIMMP (Virtual Materials Marketplace Project) (Horizon 2020, GA No 760907).\n\n## References\n\n[Hadjidoukas:2012] \"A Runtime Library for Platform-Independent Task Parallelism\", PDP 2012: 229-236, 2012.\n[[DOI]](https://doi.org/10.1109/PDP.2012.89) \n[Hadjidoukas:2015] \"\u03a04U: a high performance computing framework for Bayesian uncertainty quantification of complex models\", Journal of Computational Physics, 284:1\u201321, 2015. 
[[DOI]](https://doi.org/10.1016/j.jcp.2014.12.006)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/ibm/torc_py/", "keywords": "", "license": "Eclipse Public License v1.0", "maintainer": "", "maintainer_email": "", "name": "torcpy", "package_url": "https://pypi.org/project/torcpy/", "platform": "", "project_url": "https://pypi.org/project/torcpy/", "project_urls": { "Homepage": "http://github.com/ibm/torc_py/" }, "release_url": "https://pypi.org/project/torcpy/0.1.1/", "requires_dist": [ "mpi4py", "numpy", "termcolor", "cma", "pillow", "coloredlogs", "pytest" ], "requires_python": "", "summary": "TORC Tasking library", "version": "0.1.1", "yanked": false, "yanked_reason": null }, "last_serial": 6452118, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "71beacab57883794fd1fda69511ec164", "sha256": "a2e2b07a70191964889a5d5edae4a06cc37afeeb52bdb1422bb3849b10c08dbf" }, "downloads": -1, "filename": "torcpy-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "71beacab57883794fd1fda69511ec164", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 26619, "upload_time": "2019-10-29T14:26:08", "upload_time_iso_8601": "2019-10-29T14:26:08.447672Z", "url": "https://files.pythonhosted.org/packages/d6/f5/dd168720c81f2222a1c131d2c716c1ac8e70a3aab1f98dbb14ca61da028a/torcpy-0.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "ab9975d9231e6203488cc6afbbc45aaf", "sha256": "52785c7e7df63805b3061453e48c20e2533ace6e0dbbf45f9d72f33c3e1fb414" }, "downloads": -1, "filename": "torcpy-0.1.tar.gz", "has_sig": false, "md5_digest": "ab9975d9231e6203488cc6afbbc45aaf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31897, "upload_time": "2019-10-29T14:26:11", "upload_time_iso_8601": "2019-10-29T14:26:11.948230Z", "url": "https://files.pythonhosted.org/packages/ec/70/0b7752d5d749eea5dbdd387420996817457dc1144037283c365ce4408f57/torcpy-0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "4fd26eb69cf56a145eadfe2d2f7809e6", "sha256": "fe1af7413c278871698aa0cce99ae5dec61c62c0e70397bfd1fced4cc7c1f4c9" }, "downloads": -1, "filename": "torcpy-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4fd26eb69cf56a145eadfe2d2f7809e6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20730, "upload_time": "2020-01-14T15:04:05", "upload_time_iso_8601": "2020-01-14T15:04:05.573627Z", "url": "https://files.pythonhosted.org/packages/8d/17/bf19dbce022c3e8f7b38e664d4bcaa88e657a856839d4e914bf9fa0d9c30/torcpy-0.1.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "8cfd5ecec57d26dd2c5af1adcf1c639f", "sha256": "a672c51729ee549154b59a424916a9fc4ee17ac8430b75177ba840a08eb41767" }, "downloads": -1, "filename": "torcpy-0.1.1.tar.gz", "has_sig": false, "md5_digest": "8cfd5ecec57d26dd2c5af1adcf1c639f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32170, "upload_time": "2020-01-14T15:04:07", "upload_time_iso_8601": "2020-01-14T15:04:07.921665Z", "url": "https://files.pythonhosted.org/packages/df/23/cc779a7241ba2b7a1773ac8495f21f35fb4684fe179b3e3393ad83716e01/torcpy-0.1.1.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": 
"4fd26eb69cf56a145eadfe2d2f7809e6", "sha256": "fe1af7413c278871698aa0cce99ae5dec61c62c0e70397bfd1fced4cc7c1f4c9" }, "downloads": -1, "filename": "torcpy-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4fd26eb69cf56a145eadfe2d2f7809e6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20730, "upload_time": "2020-01-14T15:04:05", "upload_time_iso_8601": "2020-01-14T15:04:05.573627Z", "url": "https://files.pythonhosted.org/packages/8d/17/bf19dbce022c3e8f7b38e664d4bcaa88e657a856839d4e914bf9fa0d9c30/torcpy-0.1.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "8cfd5ecec57d26dd2c5af1adcf1c639f", "sha256": "a672c51729ee549154b59a424916a9fc4ee17ac8430b75177ba840a08eb41767" }, "downloads": -1, "filename": "torcpy-0.1.1.tar.gz", "has_sig": false, "md5_digest": "8cfd5ecec57d26dd2c5af1adcf1c639f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32170, "upload_time": "2020-01-14T15:04:07", "upload_time_iso_8601": "2020-01-14T15:04:07.921665Z", "url": "https://files.pythonhosted.org/packages/df/23/cc779a7241ba2b7a1773ac8495f21f35fb4684fe179b3e3393ad83716e01/torcpy-0.1.1.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }