{ "info": { "author": "Travis Bear", "author_email": "travis_bear@yahoo.com", "bugtrack_url": null, "classifiers": [], "description": "=============\nMT File Utils\n=============\n\n\nOverview\n========\n\nMT File Utils provides a simple API for common I/O tasks on text\nfiles.\n\n * **Tail**: performs like the linux 'tail' command.\n * **Reverse Seek**: Starting at the end of a file, returns all the\n lines until a specific target is found\n * **Reader**: High-performance API for reading lines from a text file.\n Most useful when you need fast concurrent access to very large data\n sets that don't fit in memory. Supports multithreaded access,\n random or sequential reads.\n\n\nCompatibility\n=============\n\nMT File Utils has been tested in cpython (2.7) and jython (2.5).\n\n\nUsage\n=====\n\nFor each example below, assume a file named 'data.txt' with one number per line::\n\n 1\n 2\n 3\n ...\n 9999\n\n\ntail\n----\n\nFunctions like the standard unix tail command. (\"follow\" or \"-f\" not supported.)\nPass in the file name, and the max number of lines you want returned::\n\n >>> from mtfileutil.reverse import tail\n >>> tail('data.txt', 5)\n ['9995', '9996', '9997', '9998', '9999']\n \n \nreverse_seek\n------------\n\nSearches backward from the end of a file for target text. Returns a list of each line\nbetween the end of the file and the target::\n\n >>> from mtfileutil.reverse import reverseSeek\n >>> reverseSeek('data.txt', '994')\n '994' found after searching back 6 lines.\n ['9994', '9995', '9996', '9997', '9998', '9999']\n\nYou can specify how far back to seek. The default limit is 3000 lines::\n\n >>> from mtfileutil.reverse import reverseSeek\n >>> reverseSeek('data.txt', 'loot')\n 'loot' not found in final 3000 lines of data.txt\n >>> reverseSeek('data.txt', '9993', max=5)\n '9993' not found in final 5 lines of data.txt\n []\n >>> reverseSeek('data.txt', '9993', max=10)\n '9993' found after searching back 7 lines.\n ['9993', '9994', '9995', '9996', '9997', '9998', '9999']\n \n \n \nread random\n-----------\nThe random reader selects a random line from a given text file\nevery time it is invoked. \n\n*CAUTION: The random nature of these reads typically defeats the \npage caching strategies used by your OS, so for large text files that\ndon't fit entirely in memory, it's easy to saturate your disk \nI/O capacity with this reader.*\n\nThe typical pattern is\n\n * Start a reader by assigning it a text file to read and a queue name to use\n * Use the reader as desired\n * Stop the reader\n \nExample::\n\n >>> from mtfileutil import reader\n >>> reader.startRandomReader('data.txt', 'rand_queue')\n Initializing random line queues[rand_queue](100)\n Initializing random line reader thread (data.txt)\n Data reader thread starting against file 'data.txt', size 48887\n Populating random queue\n >>> reader.getRandomLine('rand_queue')\n '7276'\n >>> reader.getRandomLine('rand_queue')\n '8452'\n >>> reader.getRandomLine('rand_queue')\n '640'\n >>> reader.stopRandomReader('rand_queue')\n Data reader received signal to end\n \n \n\n\n\nread sequential\n---------------\nFor large files that don't fit in memory, this is much friendlier on\nyour disk I/O because the data can be read and used in entire blocks.\nMakes a \"best effort\" to remember (bookmark) your location between reads,\nthat will typically be off by the maximum queue size. For text files with\nmany millions of lines this is probably not a big deal, but may be\na consideration when using repeatedly with smaller files.\n\nAs with the random reader, the typical pattern is to start a reader,\nuse the reader, then stop the reader. The bookmark is not saved until\nyou stop the reader, so this step is required if you want the reader to\nbookmark its location between runs::\n\n >>> from mtfileutil import reader\n >>> reader.startSequentialReader('./data.txt', 'seq_queue')\n Initializing sequential line queue[seq_queue](100)\n Initializing sequential line reader thread[seq_queue] (./data.txt)\n Data reader thread starting against file './data.txt', size 48887\n Reading data file './data.txt' from 0\n >>> reader.getNextLine('seq_queue')\n '1'\n >>> reader.getNextLine('seq_queue')\n '2'\n >>> reader.getNextLine('seq_queue')\n '3'\n >>> reader.stopSequentialReader('seq_queue')\n Data reader received signal to end\n Writing seek position 312 to temp file ./data.txt.seek\n Sequential data reader thread ending normally\n\nSubsequent invocations will remember the approximate location of the\nlast line you read. A subsequent python shell might look like this::\n\n >>> from mtfileutil import reader\n >>> reader.startSequentialReader('./data.txt', 'seq_queue')\n Initializing sequential line queue[seq_queue](100)\n Initializing sequential line reader thread[seq_queue] (./data.txt)\n Data reader thread starting against file './data.txt', size 48887\n >>> Reading data file './data.txt' from 312\n >>> reader.getNextLine('seq_queue')\n '106'\n >>> reader.getNextLine('seq_queue')\n '107'\n >>> reader.getNextLine('seq_queue')\n '108'\n >>> reader.stopSequentialReader('seq_queue')\n Data reader received signal to end\n Waiting for sequential line reader 'seq_queue' to end\n Writing seek position 732 to temp file ./data.txt.seek\n Sequential data reader thread ending normally\n\nAs you can see, the bookmarked location is about 100 lines ahead of\nthe last line read. The amount of variance depends on the size of\nthe read-ahead queue, which is sized at 100 by default.\n\nIf you want to start reading the file from the beginning each time, the\nbookmark feature can be disabled.::\n\n >>> from mtfileutil import reader\n >>> reader.startSequentialReader('./data.txt', 'seq_queue', start_at_last_read_location=False)\n Initializing sequential line queue[seq_queue](100)\n Initializing sequential line reader thread[seq_queue] (./data.txt)\n Data reader thread starting against file './data.txt', size 48887\n Reading data file './data.txt' from 0\n >>> reader.getNextLine('seq_queue')\n '1'\n\n\nthin\n----\nTakes a text file and samples every n lines.\n\n >>> from mtfileutil.reduce import thin\n >>> thin ('data.txt', 'data.reduced.txt')\n Writing every 25 lines of data.txt to data.reduced.txt\n >>> thin ('data.txt', 'data.reduced.txt', interval=100)\n Writing every 100 lines of data.txt to data.reduced.txt\n\nThis utility does not handle the various I/O exceptions that may\nbe caused by nonexistent files, insufficient permissions, etc.\n\n\nExamples\n========\n \nExample scripts are included in the package:\n\n * mtfu_read_random_example.py\n * mtfu_read_sequential_multithread_example.py\n * mtfu_read_sequential_singlethread_example.py\n * mtfu_reverse_seek_example.py\n * mtfu_tail_example.py\n\n\nSource\n======\n\nSource code and additional detail available here:\nhttps://bitbucket.org/travis_bear/file_util\n\nPatches accepted.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/mtFileUtil/", "keywords": null, "license": "LICENSE.txt", "maintainer": null, "maintainer_email": null, "name": "mtFileUtil", "package_url": "https://pypi.org/project/mtFileUtil/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/mtFileUtil/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/mtFileUtil/" }, "release_url": "https://pypi.org/project/mtFileUtil/0.6.2/", "requires_dist": null, "requires_python": null, "summary": "High-performance tools for reading data from text files.", "version": "0.6.2" }, "last_serial": 795069, "releases": { "0.6.0": [ { "comment_text": "", "digests": { "md5": "06758df67c464303c365f57c9e61331a", "sha256": "e553d3513813aabdce8461689944714e6694bea24d2706259ac5de8bf2399436" }, "downloads": -1, "filename": "mtFileUtil-0.6.0.tar.gz", "has_sig": false, "md5_digest": "06758df67c464303c365f57c9e61331a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 42306, "upload_time": "2012-10-04T22:05:59", "url": "https://files.pythonhosted.org/packages/e3/f5/f9c1c6af19835bc1540f27e660708b40f0cbff74a30667a1815fdfbe31ab/mtFileUtil-0.6.0.tar.gz" } ], "0.6.2": [ { "comment_text": "", "digests": { "md5": "b12a99e3a49a53b10da800ddda83eba5", "sha256": "2306ba72eae12690f50a26ca42a98c465fd0386c9e475a697ce5a6a20b1d1ec7" }, "downloads": -1, "filename": "mtFileUtil-0.6.2.tar.gz", "has_sig": false, "md5_digest": "b12a99e3a49a53b10da800ddda83eba5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43538, "upload_time": "2012-10-20T00:54:22", "url": "https://files.pythonhosted.org/packages/ff/6b/632a962e32018bc72853c37364fff5d10bd2e5d49dab7055aa22c2a20d45/mtFileUtil-0.6.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b12a99e3a49a53b10da800ddda83eba5", "sha256": "2306ba72eae12690f50a26ca42a98c465fd0386c9e475a697ce5a6a20b1d1ec7" }, "downloads": -1, "filename": "mtFileUtil-0.6.2.tar.gz", "has_sig": false, "md5_digest": "b12a99e3a49a53b10da800ddda83eba5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 43538, "upload_time": "2012-10-20T00:54:22", "url": "https://files.pythonhosted.org/packages/ff/6b/632a962e32018bc72853c37364fff5d10bd2e5d49dab7055aa22c2a20d45/mtFileUtil-0.6.2.tar.gz" } ] }