{
    "info": {
        "author": "Jay Marcyes",
        "author_email": "jay@marcyes.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 4 - Beta",
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3"
        ],
        "description": "Mister\n======\n\nFor all your medium data needs!\n\nMister attempts to make running a map/reduce job approachable.\n\nWhen you've got data that isn't really big and so you're not quite ready\nto distribute the data across a gazillian machines and stuff but would\nstill like an answer in a reasonable amount of time.\n\n5 minute getting started\n------------------------\n\nMister needs you to define three methods: ``prepare`` (get the data\nready to be run across multiple processes), ``map`` (actually do\nsomething with the chunks of data from ``prepare``), and ``reduce``\n(mash all the values returned from ``map`` together).\n\nThe ``reduce`` method\n~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n    prepare(self, count, *args, **kwargs)\n\nThe ``count`` is the number of processes the job will be run across, and\n``*args`` and ``**kwargs`` is whatever is passed into your child class's\n``__init__`` method. The ``prepare`` method returns **count** rows\ncontaining a tuple ``((), {})`` of the arguments that will be passed to\neach ``map`` process.\n\nThe ``map`` method\n~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n    map(self, *args, **kwargs)\n\nThe ``*args`` and ``**kwargs`` are whatever was returned from\n``prepare``. The ``map`` method returns whatever you want ``reduce`` to\nuse to merge all the data together.\n\nThe ``reduce`` method\n~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n    reduce(self, output, value)\n\nThe ``output`` is the global aggregation of all the ``value`` arguments\nthe ``reduce`` method has seen. Basically, whatever you return from one\n``reduce`` call will be passed back into the next ``reduce`` call as\n``output``. The ``value`` argument is whatever the recently finished\n``map`` call returned.\n\nBringing it all together\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nSo let's bring it all together in our ``MrHelloWorld`` job, first let's\nget the skeleton in place:\n\n.. code:: python\n\n    from mister import BaseMister\n\n\n    class MrHelloWorld(BaseMister):\n        def prepare(self, count, *args, **kwargs): pass\n        def map(self, *args, **kwargs): pass\n        def reduce(self, output, value): pass\n\nNow let's flesh out the ``prepare`` method:\n\n.. code:: python\n\n    def prepare(self, count, name):\n        # we're just going to return the number and the name we pass in \n        for x in range(count):\n            yield ([x, name], {})\n\nAnd our ``map`` method:\n\n.. code:: python\n\n    def map(self, x, name):\n        return \"Process {} says 'hello {}'\".format(x, name)\n\nFinally, our ``reduce`` method:\n\n.. code:: python\n\n    def reduce(self, output, value):\n        if output is None:\n            output = []\n        output.append(value)\n        return output\n\nRunning our job:\n\n.. code:: python\n\n    mr = MrHelloWorld(\"Alice\")\n    output = mr.run()\n    print(output)\n\nwill result in:\n\n::\n\n    [\n        \"Process 1 says 'hello Alice'\",\n        \"Process 0 says 'hello Alice'\",\n        \"Process 2 says 'hello Alice'\",\n        \"Process 3 says 'hello Alice'\",\n        \"Process 4 says 'hello Alice'\",\n        \"Process 5 says 'hello Alice'\",\n        \"Process 6 says 'hello Alice'\",\n        \"Process 7 says 'hello Alice'\",\n        \"Process 8 says 'hello Alice'\",\n        \"Process 9 says 'hello Alice'\",\n        \"Process 10 says 'hello Alice'\"\n    ]\n\nCongrats, you just ran a map/reduce job, you are now an AI and a ML\nengineer, remember me when you're famous!\n\nAnother Example\n---------------\n\nI think word counting is the traditional map/reduce example? So here it\nis:\n\n.. code:: python\n\n    import os\n    import re\n    improt math\n    from collections import Counter\n\n    from mister import BaseMister\n\n\n    class MrWordCount(BaseMister):\n        def prepare(self, count, path):\n            \"\"\"prepare segments the data for the map() method\"\"\"\n            size = os.path.getsize(path)\n            length = int(math.ceil(size / count))\n            start = 0\n            for x in range(count):\n                kwargs = {}\n                kwargs[\"path\"] = path\n                kwargs[\"start\"] = start\n                kwargs[\"length\"] = length\n                start += length\n                yield (), kwargs\n\n        def map(self, path, start, length):\n            \"\"\"all the magic happens right here\"\"\"\n            output = Counter()\n            with open(path) as fp:\n                fp.seek(start, 0)\n                words = fp.read(length)\n\n            # I don't compensate for word boundaries because example\n            for word in re.split(r\"\\s+\", words):\n                output[word] += 1\n            return output\n\n        def reduce(self, output, count):\n            \"\"\"take all the return values from map() and aggregate them to the final value\"\"\"\n            if not output:\n                output = Counter()\n            output.update(count)\n            return output\n            \n    # let's count the bible\n    path = \"./testdata/bible-kjv.txt\"\n    mr = MrWordCount(path)\n    wordcounts = mr.run()\n    print(wordcounts.most_common(10))\n\nOn my computer, the asynchronous code above runs about 3x faster than\nits syncronous equivalent below:\n\n.. code:: python\n\n    import re\n    from collections import Counter\n\n    path = \"./testdata/bible-kjv.txt\"\n\n    output = Counter()\n    with open(path) as fp:\n        words = fp.read()\n\n    for word in re.split(r\"\\s+\", words):\n        output[word] += 1\n\n    print(wordcounts.most_common(10))\n\nInstallation\n------------\n\nTo install, use Pip:\n\n::\n\n    $ pip install mister\n\nOr, to grab the latest and greatest:\n\n::\n\n    $ pip install --upgrade git+https://github.com/Jaymon/mister#egg=mister\n\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "http://github.com/Jaymon/mister",
        "keywords": "",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "mister",
        "package_url": "https://pypi.org/project/mister/",
        "platform": "",
        "project_url": "https://pypi.org/project/mister/",
        "project_urls": {
            "Homepage": "http://github.com/Jaymon/mister"
        },
        "release_url": "https://pypi.org/project/mister/0.0.2/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "Approachable map/reduce jobs",
        "version": "0.0.2"
    },
    "last_serial": 4542151,
    "releases": {
        "0.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "94f97b2606fb50d1fb1e247255a09ee1",
                    "sha256": "e97fdbac6bd8ea5e40ad1f4c752ad7898eed85c92d09a8756cf812cd9cfabf9f"
                },
                "downloads": -1,
                "filename": "mister-0.0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "94f97b2606fb50d1fb1e247255a09ee1",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 5194,
                "upload_time": "2018-11-29T08:11:32",
                "url": "https://files.pythonhosted.org/packages/a5/26/e7e4807b70581516d30046119940948792c1eb0dbe56fd129ac18ae4d36d/mister-0.0.1.tar.gz"
            }
        ],
        "0.0.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "fa3344ff1b357e75b20c1945162589fe",
                    "sha256": "5b837b952920189e6d1e189f5a4a09d6d24596623a13c59519c142fe761f5afd"
                },
                "downloads": -1,
                "filename": "mister-0.0.2.tar.gz",
                "has_sig": false,
                "md5_digest": "fa3344ff1b357e75b20c1945162589fe",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 6513,
                "upload_time": "2018-11-29T08:48:01",
                "url": "https://files.pythonhosted.org/packages/38/73/887e8787648bb26862e4c042472cb9046af0155e235fdf5c32a14438d61b/mister-0.0.2.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "fa3344ff1b357e75b20c1945162589fe",
                "sha256": "5b837b952920189e6d1e189f5a4a09d6d24596623a13c59519c142fe761f5afd"
            },
            "downloads": -1,
            "filename": "mister-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "fa3344ff1b357e75b20c1945162589fe",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6513,
            "upload_time": "2018-11-29T08:48:01",
            "url": "https://files.pythonhosted.org/packages/38/73/887e8787648bb26862e4c042472cb9046af0155e235fdf5c32a14438d61b/mister-0.0.2.tar.gz"
        }
    ]
}