{ "info": { "author": "Keith Goodman", "author_email": "bottle-neck@googlegroups.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Operating System :: OS Independent", "Programming Language :: C", "Programming Language :: Python", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering" ], "description": "=========\nfemto\n=========\n\nWhat's the fastest way to sum a NumPy array? I don't know---that's why I\ncreated femto.\n\n**femto**, written in C, contains several implementations of a sum\nfunction. To keep things simple the input array must be at least 2d and the\naxis of summation cannot be None. Limiting ourselves to the 1d case would\nhave be even simpler. But I am interested in both summing along an axis\nwhere the array elements are closely packed in memory (e.g. axis=-1 of a\nC contiguous array) and where they are widely spaced (axis=0). Both cases\nrequire different optimizations.\n\nMy goal is to find fast ways to implement reduction functions (sum, mean,\nstd, max, nansum, etc.) that are bound by memory I/O. I chose summation as a\ntest case because very little time is spent with arithmetic which makes it\neasier to measure improvements from things like manual loop unrolling, SIMD,\nand OpenMP.\n\nfemto, based on code from `bottleneck`_, comes with several benchmark\nsuites::\n\n >>> ss.bench_axis0()\n femto performance benchmark\n femto 0.0.1.dev0; Numpy 1.11.2\n Speed is NumPy time divided by femto time\n Score is harmonic mean of speeds\n\n (1000,1000)(1000,1000)(1000,1000)(1000,1000)\n float64 float32 int64 int32\n axis=0 axis=0 axis=0 axis=0 score\n sum00 0.34 0.22 0.63 1.19 0.40\n sum01 0.35 0.22 0.65 1.20 0.41\n sum02 0.47 0.27 0.69 1.19 0.49\n sum03 0.56 0.45 0.87 1.54 0.69\n sum04 0.81 0.45 0.56 1.55 0.68\n sum10 0.80 0.45 1.20 1.90 0.83\n sum11 1.19 1.34 1.20 1.88 1.35\n sum12 1.19 0.45 1.21 1.89 0.91\n p_sum01 1.30 0.22 2.02 1.36 0.62\n p_sum02 1.38 0.23 2.41 1.36 0.63\n p_sum03 1.90 1.13 2.73 4.28 1.99\n p_sum04 3.22 1.07 2.71 4.33 2.16\n\n >>> ss.bench_axis1()\n femto performance benchmark\n femto 0.0.1.dev0; Numpy 1.11.2\n Speed is NumPy time divided by femto time\n Score is harmonic mean of speeds\n\n (1000,1000)(1000,1000)(1000,1000)(1000,1000)\n float64 float32 int64 int32\n axis=1 axis=1 axis=1 axis=1 score\n sum00 0.48 0.44 0.84 1.32 0.63\n sum01 0.47 0.45 1.05 1.85 0.68\n sum02 1.13 1.29 1.38 3.07 1.47\n sum03 1.13 1.27 1.37 3.07 1.47\n sum04 1.12 1.29 1.36 3.11 1.47\n sum10 1.12 1.28 1.37 3.06 1.47\n sum11 1.42 3.04 1.38 3.07 1.92\n sum12 1.45 1.29 1.39 3.03 1.59\n p_sum01 3.11 3.04 4.16 6.80 3.85\n p_sum02 4.37 4.37 5.48 10.65 5.45\n p_sum03 4.20 4.35 5.52 10.48 5.37\n p_sum04 4.43 4.30 5.49 10.36 5.43\n\nI chose numpy.sum as a benchmark because it is fast and convenient. It\nshould be possible to beat NumPy's performance. That's because femto has\nan unfair advantage. We will not duplicate the `pairwise summation`_ NumPy\nuses to deal with the accumulated round-off error in floating point arrays.\n\nThe overall fastest function is the one with the highest benchmark score.\nLet's consider the case where we benchmark each function with two arrays\n(five are used by default in the benchmark) and the speeds are 0.5 (half as\nfast as NumPy) and 2.0 (twice as fast). What should the overall score be? Some\npossibilities are the mean (1.25, which is faster than NumPy), the geometric\nmean (1.0, same as NumPy), or the harmonic mean (0.8, slower). I chose the\nharmonic mean. If a NumPy program spends equal time summing the two benchmark\narrays, each 1 unit of time, then it will take 1/2 + 2 units of time with\nfemto, which is a speed of 2/2.5 = 0.8.\n\nLet's look at function call overhead by benchmarking with small input arrays::\n\n >>> ss.bench_overhead()\n femto performance benchmark\n femto 0.0.1.dev0; Numpy 1.11.2\n Speed is NumPy time divided by femto time\n Score is harmonic mean of speeds\n\n (10,10) (10,10) (100,100) (100,100)\n float64 float64 float64 float64\n axis=0 axis=1 axis=0 axis=1 score\n sum00 16.11 16.11 1.12 0.98 1.96\n sum01 13.82 13.22 1.17 1.03 2.03\n sum02 15.83 15.91 2.76 2.51 4.51\n sum03 16.94 13.38 2.81 2.40 4.42\n sum04 17.85 13.45 4.53 2.38 5.19\n sum10 16.67 18.16 2.01 2.52 3.96\n sum11 18.33 18.62 3.33 3.52 5.78\n sum12 18.29 16.14 3.27 3.01 5.30\n p_sum01 1.75 1.73 2.59 2.20 2.01\n p_sum02 1.80 1.77 2.83 2.58 2.15\n p_sum03 1.83 1.87 2.94 2.55 2.21\n p_sum04 1.90 1.85 3.30 2.46 2.25\n\nPlease help me avoid over optimizing for my particular operating system, CPU,\nand compiler. `Let me know`_ the benchmark results on your system. If you have\nideas on how to speed up the `code`_ then `share`_ them.\n\nLicense\n=======\n\nfemto is distributed under the GPL v3+. See the LICENSE file for details.\n\nRequirements\n============\n\nCurrently femto only compiles on GNU/Linux. `Please help`_ us with getting\nit to compile on OSX and Windows.\n\n- SSE3, AVX, x86intrin.h, OpenMP\n- Python 2.7, 3.4, 3.5\n- NumPy 1.11\n- gcc\n- nose\n\n.. _bottleneck: https://github.com/kwgoodman/bottleneck\n.. _code: https://github.com/kwgoodman/femto\n.. _share: https://github.com/kwgoodman/femto/issues\n.. _pairwise summation: https://en.wikipedia.org/wiki/Pairwise_summation\n.. _Let me know: https://github.com/kwgoodman/femto/issues\n.. _Please help: https://github.com/kwgoodman/femto/issues/1", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/kwgoodman/femto", "keywords": null, "license": "GNU GPLv3+", "maintainer": null, "maintainer_email": null, "name": "femto", "package_url": "https://pypi.org/project/femto/", "platform": "OS Independent", "project_url": "https://pypi.org/project/femto/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/kwgoodman/femto" }, "release_url": "https://pypi.org/project/femto/0.0.1.dev0/", "requires_dist": null, "requires_python": null, "summary": "What's the fastest way to sum a NumPy array?", "version": "0.0.1.dev0" }, "last_serial": 2606970, "releases": { "0.0.1.dev0": [] }, "urls": [] }