{
    "info": {
        "author": "Radim Rehurek",
        "author_email": "me@radimrehurek.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 4 - Beta",
            "Environment :: Console",
            "Intended Audience :: Developers",
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3.5",
            "Programming Language :: Python :: 3.6",
            "Programming Language :: Python :: 3.7",
            "Topic :: Database :: Front-Ends",
            "Topic :: System :: Distributed Computing"
        ],
        "description": "======================================================\nsmart_open \u2014 utils for streaming large files in Python\n======================================================\n\n|License|_ |Travis|_\n\n.. |License| image:: https://img.shields.io/pypi/l/smart_open.svg\n.. |Travis| image:: https://travis-ci.org/RaRe-Technologies/smart_open.svg?branch=master\n.. _Travis: https://travis-ci.org/RaRe-Technologies/smart_open\n.. _License: https://github.com/RaRe-Technologies/smart_open/blob/master/LICENSE\n\nWhat?\n=====\n\n``smart_open`` is a Python 2 & Python 3 library for **efficient streaming of very large files** from/to S3, HDFS, WebHDFS, HTTP, or local (compressed) files. It's a drop-in replacement for Python's built-in ``open()``: it can do anything ``open`` can (100% compatible, falls back to native ``open`` wherever possible), plus lots of nifty extra stuff on top.\n\n``smart_open`` is well-tested, well-documented, and has a simple, Pythonic API:\n\n.. code-block:: python\n\n  >>> from smart_open import smart_open\n\n  >>> # stream lines from an S3 object\n  >>> for line in smart_open('s3://mybucket/mykey.txt', 'rb'):\n  ...    print(line.decode('utf8'))\n\n  >>> # stream from/to compressed files, with transparent (de)compression:\n  >>> for line in smart_open('./foo.txt.gz', encoding='utf8'):\n  ...    print(line)\n\n  >>> # can use context managers too:\n  >>> with smart_open('/home/radim/foo.txt.bz2', 'wb') as fout:\n  ...    fout.write(u\"some content\\n\".encode('utf8'))\n\n  >>> with smart_open('s3://mybucket/mykey.txt', 'rb') as fin:\n  ...     for line in fin:\n  ...         print(line.decode('utf8'))\n  ...     fin.seek(0)  # seek to the beginning\n  ...     b1000 = fin.read(1000)  # read 1000 bytes\n\n  >>> # stream from HDFS\n  >>> for line in smart_open('hdfs://user/hadoop/my_file.txt', encoding='utf8'):\n  ...     print(line)\n\n  >>> # stream from HTTP\n  >>> for line in smart_open('http://example.com/index.html'):\n  ...     print(line)\n\n  >>> # stream from WebHDFS\n  >>> for line in smart_open('webhdfs://host:port/user/hadoop/my_file.txt'):\n  ...     print(line)\n\n  >>> # stream content *into* S3 (write mode):\n  >>> with smart_open('s3://mybucket/mykey.txt', 'wb') as fout:\n  ...     for line in [b'first line\\n', b'second line\\n', b'third line\\n']:\n  ...          fout.write(line)\n\n  >>> # stream content *into* HDFS (write mode):\n  >>> with smart_open('hdfs://host:port/user/hadoop/my_file.txt', 'wb') as fout:\n  ...     for line in [b'first line\\n', b'second line\\n', b'third line\\n']:\n  ...          fout.write(line)\n\n  >>> # stream content *into* WebHDFS (write mode):\n  >>> with smart_open('webhdfs://host:port/user/hadoop/my_file.txt', 'wb') as fout:\n  ...     for line in [b'first line\\n', b'second line\\n', b'third line\\n']:\n  ...          fout.write(line)\n\n  >>> # stream using a completely custom s3 server, like s3proxy:\n  >>> for line in smart_open('s3u://user:secret@host:port@mybucket/mykey.txt', 'rb'):\n  ...    print(line.decode('utf8'))\n\n  >>> # you can also use a boto.s3.key.Key instance directly:\n  >>> key = boto.connect_s3().get_bucket(\"my_bucket\").get_key(\"my_key\")\n  >>> with smart_open(key, 'rb') as fin:\n  ...     for line in fin:\n  ...         print(line.decode('utf8'))\n \n  >>> # Stream to Digital Ocean Spaces bucket providing credentials from boto profile\n  >>> with smart_open('s3://bucket-for-experiments/file.txt', 'wb', endpoint_url='https://ams3.digitaloceanspaces.com', profile_name='digitalocean') as fout:\n  ...     fout.write(b'here we stand')\n\nWhy?\n----\n\nWorking with large S3 files using Amazon's default Python library, `boto <http://docs.pythonboto.org/en/latest/>`_ and `boto3 <https://boto3.readthedocs.io/en/latest/>`_, is a pain. Its ``key.set_contents_from_string()`` and ``key.get_contents_as_string()`` methods only work for small files (loaded in RAM, no streaming).\nThere are nasty hidden gotchas when using ``boto``'s multipart upload functionality that is needed for large files, and a lot of boilerplate.\n\n``smart_open`` shields you from that. It builds on boto3 but offers a cleaner, Pythonic API. The result is less code for you to write and fewer bugs to make.\n\nInstallation\n------------\n::\n\n    pip install smart_open\n\nOr, if you prefer to install from the `source tar.gz <http://pypi.python.org/pypi/smart_open>`_::\n\n    python setup.py test  # run unit tests\n    python setup.py install\n\nTo run the unit tests (optional), you'll also need to install `mock <https://pypi.python.org/pypi/mock>`_ , `moto <https://github.com/spulec/moto>`_ and `responses <https://github.com/getsentry/responses>`_ (``pip install mock moto responses``). The tests are also run automatically with `Travis CI <https://travis-ci.org/RaRe-Technologies/smart_open>`_ on every commit push & pull request.\n\nSupported archive types\n-----------------------\n``smart_open`` allows reading and writing gzip, bzip2 and xz files. They are transparently handled\nover HTTP, S3, and other protocols, too.\n\nS3-Specific Options\n-------------------\n\nThe S3 reader supports gzipped content transparently, as long as the key is obviously a gzipped file (e.g. ends with \".gz\").\n\nThere are a few optional keyword arguments that are useful only for S3 access.\n\nThe **host** and **profile** arguments are both passed to `boto.s3_connect()` as keyword arguments:\n\n.. code-block:: python\n\n  >>> smart_open('s3://', host='s3.amazonaws.com')\n  >>> smart_open('s3://', profile_name='my-profile')\n\nThe **s3_session** argument allows you to provide a custom `boto3.Session` instance for connecting to S3:\n\n.. code-block:: python\n\n  >>> smart_open('s3://', s3_session=boto3.Session())\n\n\nThe **s3_upload** argument accepts a dict of any parameters accepted by `initiate_multipart_upload <https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.ObjectSummary.initiate_multipart_upload/>`_:\n\n.. code-block:: python\n\n  >>> smart_open('s3://', s3_upload={ 'ServerSideEncryption': 'AES256' })\n\nSince going over all (or select) keys in an S3 bucket is a very common operation,\nthere's also an extra method ``smart_open.s3_iter_bucket()`` that does this efficiently,\n**processing the bucket keys in parallel** (using multiprocessing):\n\n.. code-block:: python\n\n  >>> from smart_open import smart_open, s3_iter_bucket\n  >>> # get all JSON files under \"mybucket/foo/\"\n  >>> bucket = boto.connect_s3().get_bucket('mybucket')\n  >>> for key, content in s3_iter_bucket(bucket, prefix='foo/', accept_key=lambda key: key.endswith('.json')):\n  ...     print(key, len(content))\n\nFor more info (S3 credentials in URI, minimum S3 part size...) and full method signatures, check out the API docs:\n\n.. code-block:: python\n\n  >>> import smart_open\n  >>> help(smart_open.smart_open_lib)\n\n\nComments, bug reports\n---------------------\n\n``smart_open`` lives on `Github <https://github.com/RaRe-Technologies/smart_open>`_. You can file\nissues or pull requests there. Suggestions, pull requests and improvements welcome!\n\n----------------\n\n``smart_open`` is open source software released under the `MIT license <https://github.com/piskvorky/smart_open/blob/master/LICENSE>`_.\nCopyright (c) 2015-now `Radim \u0158eh\u016f\u0159ek <https://radimrehurek.com>`_.\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "http://pypi.python.org/pypi/smart_open",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/piskvorky/smart_open",
        "keywords": "file streaming",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "srcd_smart_open",
        "package_url": "https://pypi.org/project/srcd_smart_open/",
        "platform": "any",
        "project_url": "https://pypi.org/project/srcd_smart_open/",
        "project_urls": {
            "Download": "http://pypi.python.org/pypi/smart_open",
            "Homepage": "https://github.com/piskvorky/smart_open"
        },
        "release_url": "https://pypi.org/project/srcd_smart_open/1.9.0/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "Utils for streaming large files (S3, HDFS, gzip, bz2...) - temporary source{d} fork",
        "version": "1.9.0"
    },
    "last_serial": 4863983,
    "releases": {
        "1.9.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
                    "sha256": "a5917ed8bad95e10c6995310313c4e1127ea7015b53e48d32698b5f5b9620eb6"
                },
                "downloads": -1,
                "filename": "srcd_smart_open-1.9.0.tar.gz",
                "has_sig": false,
                "md5_digest": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 42775,
                "upload_time": "2019-02-25T10:34:53",
                "url": "https://files.pythonhosted.org/packages/37/0c/cadc3a0dd34fcaf5c539157be437c1cde3451b722c60dc0620d1dca8a0c0/srcd_smart_open-1.9.0.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
                "sha256": "a5917ed8bad95e10c6995310313c4e1127ea7015b53e48d32698b5f5b9620eb6"
            },
            "downloads": -1,
            "filename": "srcd_smart_open-1.9.0.tar.gz",
            "has_sig": false,
            "md5_digest": "0b1de3c6f892d0ec2e8fbe2dc94d111a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 42775,
            "upload_time": "2019-02-25T10:34:53",
            "url": "https://files.pythonhosted.org/packages/37/0c/cadc3a0dd34fcaf5c539157be437c1cde3451b722c60dc0620d1dca8a0c0/srcd_smart_open-1.9.0.tar.gz"
        }
    ]
}