{
"info": {
"author": "Radim Rehurek",
"author_email": "me@radimrehurek.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Topic :: Database :: Front-Ends",
"Topic :: System :: Distributed Computing"
],
"description": "======================================================\nsmart_open \u2014 utils for streaming large files in Python\n======================================================\n\n|License|_ |Travis|_\n\n.. |License| image:: https://img.shields.io/pypi/l/smart_open.svg\n.. |Travis| image:: https://travis-ci.org/RaRe-Technologies/smart_open.svg?branch=master\n.. _Travis: https://travis-ci.org/RaRe-Technologies/smart_open\n.. _License: https://github.com/RaRe-Technologies/smart_open/blob/master/LICENSE\n\nWhat?\n=====\n\n``smart_open`` is a Python 2 & Python 3 library for **efficient streaming of very large files** from/to S3, HDFS, WebHDFS, HTTP, or local (compressed) files. It's a drop-in replacement for Python's built-in ``open()``: it can do anything ``open`` can (100% compatible, falls back to native ``open`` wherever possible), plus lots of nifty extra stuff on top.\n\n``smart_open`` is well-tested, well-documented, and has a simple, Pythonic API:\n\n.. code-block:: python\n\n >>> from smart_open import smart_open\n\n >>> # stream lines from an S3 object\n >>> for line in smart_open('s3://mybucket/mykey.txt', 'rb'):\n ... print(line.decode('utf8'))\n\n >>> # stream from/to compressed files, with transparent (de)compression:\n >>> for line in smart_open('./foo.txt.gz', encoding='utf8'):\n ... print(line)\n\n >>> # can use context managers too:\n >>> with smart_open('/home/radim/foo.txt.bz2', 'wb') as fout:\n ... fout.write(u\"some content\\n\".encode('utf8'))\n\n >>> with smart_open('s3://mybucket/mykey.txt', 'rb') as fin:\n ... for line in fin:\n ... print(line.decode('utf8'))\n ... fin.seek(0) # seek to the beginning\n ... b1000 = fin.read(1000) # read 1000 bytes\n\n >>> # stream from HDFS\n >>> for line in smart_open('hdfs://user/hadoop/my_file.txt', encoding='utf8'):\n ... print(line)\n\n >>> # stream from HTTP\n >>> for line in smart_open('http://example.com/index.html'):\n ... print(line)\n\n >>> # stream from WebHDFS\n >>> for line in smart_open('webhdfs://host:port/user/hadoop/my_file.txt'):\n ... print(line)\n\n >>> # stream content *into* S3 (write mode):\n >>> with smart_open('s3://mybucket/mykey.txt', 'wb') as fout:\n ... for line in [b'first line\\n', b'second line\\n', b'third line\\n']:\n ... fout.write(line)\n\n >>> # stream content *into* HDFS (write mode):\n >>> with smart_open('hdfs://host:port/user/hadoop/my_file.txt', 'wb') as fout:\n ... for line in [b'first line\\n', b'second line\\n', b'third line\\n']:\n ... fout.write(line)\n\n >>> # stream content *into* WebHDFS (write mode):\n >>> with smart_open('webhdfs://host:port/user/hadoop/my_file.txt', 'wb') as fout:\n ... for line in [b'first line\\n', b'second line\\n', b'third line\\n']:\n ... fout.write(line)\n\n >>> # stream using a completely custom s3 server, like s3proxy:\n >>> for line in smart_open('s3u://user:secret@host:port@mybucket/mykey.txt', 'rb'):\n ... print(line.decode('utf8'))\n\n >>> # you can also use a boto.s3.key.Key instance directly:\n >>> key = boto.connect_s3().get_bucket(\"my_bucket\").get_key(\"my_key\")\n >>> with smart_open(key, 'rb') as fin:\n ... for line in fin:\n ... print(line.decode('utf8'))\n \n >>> # Stream to Digital Ocean Spaces bucket providing credentials from boto profile\n >>> with smart_open('s3://bucket-for-experiments/file.txt', 'wb', endpoint_url='https://ams3.digitaloceanspaces.com', profile_name='digitalocean') as fout:\n ... fout.write(b'here we stand')\n\nWhy?\n----\n\nWorking with large S3 files using Amazon's default Python library, `boto `_ and `boto3 `_, is a pain. Its ``key.set_contents_from_string()`` and ``key.get_contents_as_string()`` methods only work for small files (loaded in RAM, no streaming).\nThere are nasty hidden gotchas when using ``boto``'s multipart upload functionality that is needed for large files, and a lot of boilerplate.\n\n``smart_open`` shields you from that. It builds on boto3 but offers a cleaner, Pythonic API. The result is less code for you to write and fewer bugs to make.\n\nInstallation\n------------\n::\n\n pip install smart_open\n\nOr, if you prefer to install from the `source tar.gz `_::\n\n python setup.py test # run unit tests\n python setup.py install\n\nTo run the unit tests (optional), you'll also need to install `mock `_ , `moto `_ and `responses `_ (``pip install mock moto responses``). The tests are also run automatically with `Travis CI `_ on every commit push & pull request.\n\nSupported archive types\n-----------------------\n``smart_open`` allows reading and writing gzip, bzip2 and xz files. They are transparently handled\nover HTTP, S3, and other protocols, too.\n\nS3-Specific Options\n-------------------\n\nThe S3 reader supports gzipped content transparently, as long as the key is obviously a gzipped file (e.g. ends with \".gz\").\n\nThere are a few optional keyword arguments that are useful only for S3 access.\n\nThe **host** and **profile** arguments are both passed to `boto.s3_connect()` as keyword arguments:\n\n.. code-block:: python\n\n >>> smart_open('s3://', host='s3.amazonaws.com')\n >>> smart_open('s3://', profile_name='my-profile')\n\nThe **s3_session** argument allows you to provide a custom `boto3.Session` instance for connecting to S3:\n\n.. code-block:: python\n\n >>> smart_open('s3://', s3_session=boto3.Session())\n\n\nThe **s3_upload** argument accepts a dict of any parameters accepted by `initiate_multipart_upload `_:\n\n.. code-block:: python\n\n >>> smart_open('s3://', s3_upload={ 'ServerSideEncryption': 'AES256' })\n\nSince going over all (or select) keys in an S3 bucket is a very common operation,\nthere's also an extra method ``smart_open.s3_iter_bucket()`` that does this efficiently,\n**processing the bucket keys in parallel** (using multiprocessing):\n\n.. code-block:: python\n\n >>> from smart_open import smart_open, s3_iter_bucket\n >>> # get all JSON files under \"mybucket/foo/\"\n >>> bucket = boto.connect_s3().get_bucket('mybucket')\n >>> for key, content in s3_iter_bucket(bucket, prefix='foo/', accept_key=lambda key: key.endswith('.json')):\n ... print(key, len(content))\n\nFor more info (S3 credentials in URI, minimum S3 part size...) and full method signatures, check out the API docs:\n\n.. code-block:: python\n\n >>> import smart_open\n >>> help(smart_open.smart_open_lib)\n\n\nComments, bug reports\n---------------------\n\n``smart_open`` lives on `Github `_. You can file\nissues or pull requests there. Suggestions, pull requests and improvements welcome!\n\n----------------\n\n``smart_open`` is open source software released under the `MIT license `_.\nCopyright (c) 2015-now `Radim \u0158eh\u016f\u0159ek `_.\n",
"description_content_type": "",
"docs_url": null,
"download_url": "http://pypi.python.org/pypi/smart_open",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/piskvorky/smart_open",
"keywords": "file streaming",
"license": "MIT",
"maintainer": "",
"maintainer_email": "",
"name": "srcd_smart_open",
"package_url": "https://pypi.org/project/srcd_smart_open/",
"platform": "any",
"project_url": "https://pypi.org/project/srcd_smart_open/",
"project_urls": {
"Download": "http://pypi.python.org/pypi/smart_open",
"Homepage": "https://github.com/piskvorky/smart_open"
},
"release_url": "https://pypi.org/project/srcd_smart_open/1.9.0/",
"requires_dist": null,
"requires_python": "",
"summary": "Utils for streaming large files (S3, HDFS, gzip, bz2...) - temporary source{d} fork",
"version": "1.9.0"
},
"last_serial": 4863983,
"releases": {
"1.9.0": [
{
"comment_text": "",
"digests": {
"md5": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
"sha256": "a5917ed8bad95e10c6995310313c4e1127ea7015b53e48d32698b5f5b9620eb6"
},
"downloads": -1,
"filename": "srcd_smart_open-1.9.0.tar.gz",
"has_sig": false,
"md5_digest": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 42775,
"upload_time": "2019-02-25T10:34:53",
"url": "https://files.pythonhosted.org/packages/37/0c/cadc3a0dd34fcaf5c539157be437c1cde3451b722c60dc0620d1dca8a0c0/srcd_smart_open-1.9.0.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
"sha256": "a5917ed8bad95e10c6995310313c4e1127ea7015b53e48d32698b5f5b9620eb6"
},
"downloads": -1,
"filename": "srcd_smart_open-1.9.0.tar.gz",
"has_sig": false,
"md5_digest": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 42775,
"upload_time": "2019-02-25T10:34:53",
"url": "https://files.pythonhosted.org/packages/37/0c/cadc3a0dd34fcaf5c539157be437c1cde3451b722c60dc0620d1dca8a0c0/srcd_smart_open-1.9.0.tar.gz"
}
]
}