{ "info": { "author": "Radu Jica", "author_email": "radu.jica+code@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# Baloo\n\nImplementing the [*bare necessities*](https://www.youtube.com/watch?v=08NlhjpVFsU) \nof [Pandas](https://pandas.pydata.org/) with the *lazy* evaluating\nand optimizing [Weld](https://github.com/weld-project/weld) framework.\n\n[![PyPI version](https://badge.fury.io/py/baloo.svg)](https://badge.fury.io/py/baloo)\n[![Build Status](https://travis-ci.com/radujica/baloo.svg?branch=master)](https://travis-ci.com/radujica/baloo)\n\n### Documentation [here](https://radujica.github.io/baloo/)\n\n## Install\n pip install baloo\n\nNote that currently it has only been tested on Python 3.5.2, though any Python 3 version should be fine.\n\n## Benchmarks\nBenchmark results over seeded randomized data are shown below. \nThe data consists of 4 NumPy array columns: 2 of float64, 1 of int64, and 1 of int32.\nThe first 2 plots run the following operations over 56MB and 420MB total data, respectively:\n\n df = df[(df['col1'] > 0) & (df['col2'] >= 10) & (df['col3'] < 30)] # filter \n df = df.agg(['min', 'prod', 'mean', 'std']) # 4x agg\n df['col4'] = df['col1'] * 2 + 1 - 23 # 3x op\n df['col5'] = df['col1'].apply(np.exp) # udf\n df = df.groupby(['col2', 'col4']).var() # groupby*\n df = df[['col3', 'col1']].join(df[['col3', 'col2']], on='col3', rsuffix='_r') # join*\n\n\\* Note that the groupby and join implementations are simplified in Baloo. For instance, the groupby result is not sorted\nin Baloo as it is in Pandas. 
The join implementation in Baloo currently relies on the `on` data being sorted and distinct;\nsortedness is expected to be patched soon.\n\n![benchmark results](benchmarks/benchmarks-2000.png)\n![benchmark results](benchmarks/benchmarks-15000.png)\n\nThis last graph shows the execution time of `3x op` over varying dataset sizes:\n\n![benchmark scalability](benchmarks/scalability.png)\n\nWeld is, indeed, expected to scale well due to features such as vectorization; however, the compilation time outweighs\nthe improved computation time for small datasets. Nevertheless, Baloo currently only supports a limited subset of\nPandas. More work coming soon!\n\nThe scripts used to run the benchmarks are available in the relevant folder.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/radujica/baloo", "keywords": "", "license": "BSD 3-Clause", "maintainer": "", "maintainer_email": "", "name": "baloo", "package_url": "https://pypi.org/project/baloo/", "platform": "linux", "project_url": "https://pypi.org/project/baloo/", "project_urls": { "Homepage": "https://github.com/radujica/baloo" }, "release_url": "https://pypi.org/project/baloo/0.0.5/", "requires_dist": [ "pandas", "numpy", "tabulate" ], "requires_python": ">=3.0", "summary": "Implementing the bare necessities of Pandas with the lazy evaluating and optimizing Weld framework.", "version": "0.0.5" }, "last_serial": 4691585, "releases": { "0.0.5": [ { "comment_text": "", "digests": { "md5": "a4ab982a2879577845be0ae802206ed2", "sha256": "437e2ace18984432e80505f46045e7a39ea31176e90a0bc730fb57636c53a91f" }, "downloads": -1, "filename": "baloo-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "a4ab982a2879577845be0ae802206ed2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.0", "size": 10809721, "upload_time": "2019-01-13T19:47:56", "url": 
"https://files.pythonhosted.org/packages/a9/e3/819c3deda433c443919d4bd9bc389435088f9d3997592493ce7255bf412e/baloo-0.0.5-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a4ab982a2879577845be0ae802206ed2", "sha256": "437e2ace18984432e80505f46045e7a39ea31176e90a0bc730fb57636c53a91f" }, "downloads": -1, "filename": "baloo-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "a4ab982a2879577845be0ae802206ed2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.0", "size": 10809721, "upload_time": "2019-01-13T19:47:56", "url": "https://files.pythonhosted.org/packages/a9/e3/819c3deda433c443919d4bd9bc389435088f9d3997592493ce7255bf412e/baloo-0.0.5-py3-none-any.whl" } ] }