{ "info": { "author": "Bryan Cole", "author_email": "bryancole.cam@googlemail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta" ], "description": "==============================\r\nInstalling and Using Sendtools\r\n==============================\r\n\r\n.. contents:: **Table of Contents**\r\n\r\n--------\r\nOverview\r\n--------\r\n\r\nSendtools is a collections of classes for efficiently consuming iterators into \r\none or more data structures. It compliments the itertools module and the other \r\nexcellent facilities python offers for iteration. Sendtools is useful when:\r\n\r\n * Your source iterator is too big to fit in memory\r\n * Your data source is I/O-bound so you don't want to make more than one pass\r\n * You want to collect data into two or more lists (or other collections)\r\n * You want to group, filter, transform or otherwise aggregate the data\r\n\r\nSuch situations occur when you're analysing query-sets from large databases or \r\ndatafiles (HDF5-files, for example).\r\n\r\nSendtools is written using Cython to produce a 100% compiled module, for maximum \r\nperformance.\r\n\r\nIdeas and further discussion on sendtools can be posted to \r\nhttp://groups.google.com/group/python-sendtools\r\n\r\n------------\r\nRequirements\r\n------------\r\n\r\nThere are no dependencies outside of python to compile and install sendtools (although\r\nyou will need a compiler obviously).\r\n\r\nIf you want to hack on the Cython code, you'll need Cython-0.12.1 or later.\r\n\r\n------------\r\nInstallation\r\n------------\r\n\r\nSendtools is installed from source using distutils in the usual way - run::\r\n\r\n #python setup.py install\r\n\r\nto install it site-wide\r\n\r\nIf you have Cython installed, you can also import the sendtools.pyx file directly\r\nusing the pyximport module (part of Cython). This is handy for development, as used\r\nin the unittest script.\r\n\r\n-----\r\nUsage\r\n-----\r\n\r\nSendtools is built on the concept of \"Consumer\" objects. These were inspired by \r\npython's generators (an early version of sendtools was implemented in python \r\nusing generators). Consumer objects can have data \"sent\" into them. Unlike \r\ngenerators, Consumers do not produce data iteratively (no ``next`` method), \r\nbut they do produce a result which can be accessed at any time using the ``.result()`` \r\nmethod.\r\n\r\nData is typically sent into a Consumer using the ``sendtools.send()`` function, \r\nwhich takes the form::\r\n\r\n output = send(source, target)\r\n\r\nwhere ``source`` is an iterator producing data. ``target`` is a Consumer object into \r\nwhich the data is sent. ``output`` is the Consumer's result, returned after the \r\nsource has been fully consumed, or the Consumer indicates it's complete (by \r\nraising StopIteration), whichever happens first. Basically, the send function \r\nis a shortcut for writing a for-loop. It's equivalent to (but faster than)::\r\n\r\n def send(source, target):\r\n try:\r\n for item in source:\r\n target.send(item)\r\n except StopIteration:\r\n pass\r\n return target.result()\r\n \r\nNote, StopIteration can be raised by ``target.send(...)`` to exit the loop (as \r\nwell as by the source), so we handle it explicitly.\r\n\r\nThe target may be list or set, representing the data structure you want to \r\ncollect the data into. These are implicitly converted to Consumer objects by \r\nthe send function. The input list (or set) is returned by the send function \r\nhaving been filled with data. \r\n\r\n``target`` can also be a (multiply nested) tuple of consumers. In this case the \r\nresult will be a tuple which matches the structure of the target tuple, \r\ncontaining the results for each consumer. In this way, data from a source \r\niterator can be collected into multiple lists in a single iteration pass.\r\n\r\nSendtools defines many aggregation consumers. These do not produce a list or \r\nother collection as their result, but a scalar value.\r\n\r\n--------\r\nExamples\r\n--------\r\n\r\nLet's start with basic usage of the ``send()`` function::\r\n\r\n >>> from sendtools import send\r\n >>> data = range(10)\r\n >>> out=[]\r\n >>> result = send(data, out)\r\n >>> result\r\n [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\r\n >>> out\r\n [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\r\n >>> result is out\r\n True\r\n\r\nThe source 'data' is copied into the target 'out' and this is returned.\r\n\r\nNow lets see how to send data into multiple targets::\r\n\r\n >>> a, (b,c) = send(data, ([], ([],[]) ) )\r\n >>> a is b; b is c; a is c\r\n False\r\n False\r\n False\r\n >>> a == b; b == c; a == c\r\n True\r\n True\r\n True\r\n\r\nThe data is copied into three different lists.\r\n\r\nData can be collected into sets as well as lists:: \r\n\r\n >>> data = [1,2,3,5,4,2,6,3,4,8,5,6,3,1,5,3,6,3,6,\"moose\",4,2]\r\n >>> send(data, set())\r\n set([1, 2, 3, 4, 5, 6, 8, 'moose'])\r\n\r\nIn fact, any MutableSequence or MutableSet (the Abstract Base Class) will do. \r\nSadly, the std-lib array.array object is not registered as a MutableSequence \r\nout-the-box, but we can do this ourselves::\r\n\r\n >>> from array import array\r\n >>> from collections import MutableSequence\r\n >>> MutableSequence.register(array)\r\n >>> data = [1,2,3,5,4,2,6,3,4,8,5,6,3,1,5,3,6,3,6,4,2]\r\n >>> target = array(\"f\") #an empty array\r\n >>> send(data, target)\r\n array('f', [1.0, 2.0, 3.0, 5.0, 4.0, 2.0, 6.0, 3.0, 4.0, 8.0, 5.0, 6.0, 3.0, \r\n 1.0, 5.0, 3.0, 6.0, 3.0, 6.0, 4.0, 2.0])\r\n\r\n\r\nAggregation\r\n-----------\r\n\r\nNow let's see some aggregation::\r\n\r\n >>> send(data, ([], (Max(), Min(), Sum(), Ave())))\r\n ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], (9, 0, 45, 4.5))\r\n\r\nAll the aggregation functions found in SQL are available: Sum, Max, Min, Ave, \r\nFirst, Last, Count.\r\n\r\nThere are a few more besides these: \r\n\r\n * All, Any - like the builtin ``all`` and ``any``, but for consumers\r\n * Select - Picks the n'th item in a sequence\r\n * Stats - Computes an incremental standard deviation, mean and count of it's input. \r\n \r\nThis last one only works with numerical input and returns a length-3 tuple as it's result.\r\n\r\nHere's a (somewhat pointless) example of Select and Stats::\r\n\r\n >>> data = [1,2,3,5,4,2,6,3,4,8,5,6,3,1,5,3,6,3,6,4,2]\r\n >>> targets = tuple([Select(i) for i in xrange(0,10,3)])\r\n >>> send(data, targets)\r\n (1, 5, 6, 8)\r\n >>> send(data, Stats())\r\n (21L, 3.9047619047619047, 1.868281614338746)\r\n\r\nObviously, a better way to pick out every 3rd item from a series from 0 to 10 \r\nwould be to use the Slice object (see below).\r\n\r\nTransformations and Filtering\r\n-----------------------------\r\n\r\nData can be filtered using Filter::\r\n\r\n >>> data = [1,2,3,5,4,2,6,3,4,8,5,6,3,1,5,3,6,3,6,4,2]\r\n >>> send(data, Filter(lambda x:x%2==0, []))\r\n [2, 4, 2, 6, 4, 8, 6, 6, 6, 4, 2]\r\n\r\nData can be transformed using Map::\r\n\r\n >>> send(data, ([], Map(lambda x:x**2, [])))\r\n ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 4, 9, 16, 25, 36, 49, 64, 81])\r\n\r\nOne important use-case is splitting a sequence of tuples or other \r\ncompound objects into multiple lists. Although this can be done with Map,\r\nthis is such a common operation, we have a dedicated Get object for this\r\npurpose. eg.::\r\n\r\n >>> tups = [(x,x**2) for x in range(10)]\r\n >>> print tups\r\n [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), \r\n (8, 64), (9, 81)]\r\n >>> a,b = send(tups, (Get(0,[]), Get(1,[])))\r\n >>> a\r\n [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\r\n >>> b\r\n [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\r\n\r\nThis works for any suitable indexing object. For example, columns from a database\r\nquery can be collected into some lists using this method. Object attributes\r\ncan also be retrieved in a similar manner using the Attr object.\r\n\r\nSequence/iterable unpacking has a further simplification, using the Unzip object. The\r\nabove example can be rewritten as\r\n\r\n >>> a,b = send(tups, Unzip([],[]))\r\n >>> a\r\n [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\r\n >>> b\r\n [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\r\n\r\nThe Unzip object takes any number of Consumers as it's arguments. Sequences or iterables\r\ncan be sent into it. There must be at least enough items in the input container \r\nas output targets, otherwise TypeError is raised. Excess input items are discarded.\r\n\r\nGrouping and Switching\r\n----------------------\r\n\r\nData can be grouped in a variety of ways. The grouping objects take a factory \r\nfunction as a keyword argument. This is called to create each group. By default, \r\na list group is created, but more complex group-types are possible: aggregates, \r\ntuples of targets or even other grouping objects. Any valid target object can \r\nbe used.\r\n\r\nHere's an example of simple grouping by number into sublists::\r\n\r\n >>> data\r\n [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]\r\n >>> send(data, GroupByN(3,[]))\r\n [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17]]\r\n\r\nNow let's use a more complex group factory for get the mean of each group,\r\nas well as the group list::\r\n\r\n >>> send(data, GroupByN(3, [], factory=lambda :([],Ave())))\r\n [([0, 1, 2], 1.0), ([3, 4, 5], 4.0), ([6, 7, 8], 7.0), ([9, 10, 11], 10.0), \r\n ([12, 13, 14], 13.0), ([15, 16, 17], 16.0)]\r\n\r\nGroups can also be created using a key-function, with the GroupByKey object::\r\n\r\n >>> data = [1,2,3,5,4,2,6,3,4,8,5,6,3,1,5,3,6,3,6,4,2]\r\n >>> send(data, GroupByKey(lambda x:x==5, []))\r\n [[1, 2, 3], [5], [4, 2, 6, 3, 4, 8], [5], [6, 3, 1], [5], [3, 6, 3, 6, 4, 2]]\r\n\r\nNote, new groups are created whenever the key-function returns a different \r\nresult to the previous item, regardless of whether that result has been used to\r\ncreate previous groups.\r\n \r\nSwitching is a very close relative to grouping. The Switch object passes it's\r\ninput to a key-function which must return an int. The input is passed to one\r\nof N outputs according to this int. I.e.\r\n\r\n >>> data = [1,2,3,5,4,2,6,3,4,8,5,6,3,1,5,3,6,3,6,4,2]\r\n >>> send(data, Switch(lambda x:int(x<5), [],[]))\r\n ([5, 6, 8, 5, 6, 5, 6, 6], [1, 2, 3, 4, 2, 3, 4, 3, 1, 3, 3, 4, 2])\r\n \r\nThe Switch object can take any number of target Consumers.\r\n\r\nIf you want to collect objects into groups according a key, without preserving\r\nthe order, you need SwitchByKey. This object outputs a dictionary of groups. \r\n\r\n >>> data = [1,2,3,5,4,2,6,3,4,8,5,6,3,1,5,3,6,3,6,4,2]\r\n >>> func = lambda item: \"low\" if item<5 else \"high\"\r\n >>> send(data, SwitchByKey(func, init={\"low\":['foo']}))\r\n {'high': [5, 6, 8, 5, 6, 5, 6, 6], \r\n 'low': ['foo', 1, 2, 3, 4, 2, 3, 4, 3, 1, 3, 3, 4, 2]}\r\n >>> send(data, SwitchByKey(func, factory=Sum))\r\n {'high': 47, 'low': 35}\r\n\r\nThe init keyword specifies a dictionary of groups with which to initialise\r\nthe object (an empty dict by default). When a new key is encountered (that does \r\nnot already exist in the dict), the factory function is called to create a new \r\ngroup for this key. \r\n\r\nSlicing\r\n-------\r\n\r\nThe Slice object works analogously to the builtin slice function, but for \r\nConsumers. Like builtin slice, it takes one to three arguments specifying the\r\nstart, stop and step values for selection::\r\n\r\n >>> data = range(20)\r\n >>> send(data, Slice(None,15,3, []))\r\n [0, 3, 6, 9, 12]\r\n >>> send(data, Slice(5,None,3, []))\r\n [5, 8, 11, 14, 17]\r\n\r\nSlice follows a similar call-signature overloading as used by built-in slice, where\r\nthe step or step and start arguments may be left out. It differs from the built-in \r\nslice object in that the stop-index is not required.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://bitbucket.org/bryancole/sendtools", "keywords": "", "license": "UNKNOWN", "maintainer": "", "maintainer_email": "", "name": "sendtools", "package_url": "https://pypi.org/project/sendtools/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/sendtools/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://bitbucket.org/bryancole/sendtools" }, "release_url": "https://pypi.org/project/sendtools/0.2.0/", "requires_dist": null, "requires_python": null, "summary": "Tools for composing consumers for iterators. A companion to itertools.", "version": "0.2.0" }, "last_serial": 799406, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "97c505531f8b411a65323810f79bc241", "sha256": "25fc4d7cd3b05b278f2556eb46cf4d2cb9ffcbdcfbf444f7f6eb6e65b3e7f4f1" }, "downloads": -1, "filename": "sendtools-0.1.0.win32-py2.6.exe", "has_sig": false, "md5_digest": "97c505531f8b411a65323810f79bc241", "packagetype": "bdist_wininst", "python_version": "2.6", "requires_python": null, "size": 231198, "upload_time": "2010-06-07T14:44:40", "url": "https://files.pythonhosted.org/packages/a6/8e/fddb9b68aa1f85a57ddde199addea85e56b9b68b2ba53f9ac8e50e59a8ed/sendtools-0.1.0.win32-py2.6.exe" }, { "comment_text": "", "digests": { "md5": "16588c6460cb22545aa2fd2e830bfeeb", "sha256": "fceebe29ba4badae328c0777ea7a448a00960627c061641b8e02faf16dd91fda" }, "downloads": -1, "filename": "sendtools-0.1.0.zip", "has_sig": false, "md5_digest": "16588c6460cb22545aa2fd2e830bfeeb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6940, "upload_time": "2010-06-07T14:44:40", "url": "https://files.pythonhosted.org/packages/c3/1b/5cba3f85356b9bb25655ef65b28a4001346cda1546a52d3f6ea1349e0c21/sendtools-0.1.0.zip" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "193c227fe8a14c39d891564cc47a0c87", "sha256": "31be6570e33f2b086acb302592ef03668c723e8d700ec0ed0e64f916b786d58c" }, "downloads": -1, "filename": "sendtools-0.1.1.tar.gz", "has_sig": false, "md5_digest": "193c227fe8a14c39d891564cc47a0c87", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9539, "upload_time": "2010-06-08T13:43:20", "url": "https://files.pythonhosted.org/packages/69/df/09a30498d182840fbb11716da61544884af5657502da13e33b9705728f7b/sendtools-0.1.1.tar.gz" }, { "comment_text": "", "digests": { "md5": "a0443de2697536328f1382d7d89f4547", "sha256": "a4e3b782cee3c2c8426b8781caebf797d8df04e52f24107c5a29bd154f2cee17" }, "downloads": -1, "filename": "sendtools-0.1.1.win32-py2.6.exe", "has_sig": false, "md5_digest": "a0443de2697536328f1382d7d89f4547", "packagetype": "bdist_wininst", "python_version": "2.6", "requires_python": null, "size": 243839, "upload_time": "2010-06-08T13:51:00", "url": "https://files.pythonhosted.org/packages/94/1f/070678e6a1f9c68f8e00329751011f5a4e0df20bb805784fce5cbdb78f5d/sendtools-0.1.1.win32-py2.6.exe" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "bd5677ef9aa4109e999ca03d908f3a3b", "sha256": "9fe2e3e431eb8bcfe1cf4e85ac89c78085e60eee17c98de81a694b186e0ddde8" }, "downloads": -1, "filename": "sendtools-0.2.0.tar.gz", "has_sig": false, "md5_digest": "bd5677ef9aa4109e999ca03d908f3a3b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11117, "upload_time": "2011-01-17T09:21:03", "url": "https://files.pythonhosted.org/packages/d1/47/a3f8ad93b6579735b5a10deecb5509dd2cd650b06426b22c0b1e13d41803/sendtools-0.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "bd5677ef9aa4109e999ca03d908f3a3b", "sha256": "9fe2e3e431eb8bcfe1cf4e85ac89c78085e60eee17c98de81a694b186e0ddde8" }, "downloads": -1, "filename": "sendtools-0.2.0.tar.gz", "has_sig": false, "md5_digest": "bd5677ef9aa4109e999ca03d908f3a3b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11117, "upload_time": "2011-01-17T09:21:03", "url": "https://files.pythonhosted.org/packages/d1/47/a3f8ad93b6579735b5a10deecb5509dd2cd650b06426b22c0b1e13d41803/sendtools-0.2.0.tar.gz" } ] }