{ "info": { "author": "Center for Data Science and Public Policy", "author_email": "datascifellows@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Natural Language :: English", "Programming Language :: Python :: 3" ], "description": "\nOhio\n****\n\nOh! IO: The I/O tools that ``io`` doesn\u2019t want you to have.\n\nOhio provides the missing links between Python\u2019s built-in I/O\nprimitives, to help ensure the efficiency, clarity and elegance of\nyour code.\n\nFor higher-level examples of what Ohio can do for you, see\n`Extensions`_ and `Recipes`_.\n\n\nContents\n^^^^^^^^\n\n* `Ohio`_\n\n * `Installation`_\n\n * `Modules`_\n\n * `csvio`_\n\n * `iterio`_\n\n * `pipeio`_\n\n * `baseio`_\n\n * `Extensions`_\n\n * `Extensions for NumPy`_\n\n * `Extensions for Pandas`_\n\n * `Benchmarking`_\n\n * `Recipes`_\n\n * `dbjoin`_\n\n\nInstallation\n============\n\nOhio is a distributed library with support for Python v3. It is\navailable from `pypi.org `_:\n\n::\n\n $ pip install ohio\n\n\nModules\n=======\n\n\ncsvio\n-----\n\nFlexibly encode data to CSV format.\n\n**ohio.encode_csv(rows, *writer_args, writer=, write_header=False, **writer_kwargs)**\n\n Encode the specified iterable of ``rows`` into CSV text.\n\n Data is encoded to an in-memory ``str``, (rather than to the file\n system), via an internally-managed ``io.StringIO``, (newly\n constructed for every invocation of ``encode_csv``).\n\n For example:\n\n ::\n\n >>> data = [\n ... ('1/2/09 6:17', 'Product1', '1200', 'Mastercard', 'carolina'),\n ... ('1/2/09 4:53', 'Product1', '1200', 'Visa', 'Betina'),\n ... 
]\n\n >>> encoded_csv = encode_csv(data)\n\n >>> encoded_csv[:80]\n '1/2/09 6:17,Product1,1200,Mastercard,carolina\\r\\n1/2/09 4:53,Product1,1200,Visa,Be'\n\n >>> encoded_csv.splitlines(keepends=True)\n ['1/2/09 6:17,Product1,1200,Mastercard,carolina\\r\\n',\n '1/2/09 4:53,Product1,1200,Visa,Betina\\r\\n']\n\n By default, ``rows`` are encoded by built-in ``csv.writer``. You\n may specify an alternate ``writer``, and provide construction\n arguments:\n\n ::\n\n >>> header = ('Transaction_date', 'Product', 'Price', 'Payment_Type', 'Name')\n\n >>> data = [\n ... {'Transaction_date': '1/2/09 6:17',\n ... 'Product': 'Product1',\n ... 'Price': '1200',\n ... 'Payment_Type': 'Mastercard',\n ... 'Name': 'carolina'},\n ... {'Transaction_date': '1/2/09 4:53',\n ... 'Product': 'Product1',\n ... 'Price': '1200',\n ... 'Payment_Type': 'Visa',\n ... 'Name': 'Betina'},\n ... ]\n\n >>> encoded_csv = encode_csv(data, writer=csv.DictWriter, fieldnames=header)\n\n >>> encoded_csv.splitlines(keepends=True)\n ['1/2/09 6:17,Product1,1200,Mastercard,carolina\\r\\n',\n '1/2/09 4:53,Product1,1200,Visa,Betina\\r\\n']\n\n And, for such writers featuring the method ``writeheader``, you may\n instruct ``encode_csv`` to invoke this, prior to writing ``rows``:\n\n ::\n\n >>> encoded_csv = encode_csv(\n ... data,\n ... writer=csv.DictWriter,\n ... fieldnames=header,\n ... write_header=True,\n ... )\n\n >>> encoded_csv.splitlines(keepends=True)\n ['Transaction_date,Product,Price,Payment_Type,Name\\r\\n',\n '1/2/09 6:17,Product1,1200,Mastercard,carolina\\r\\n',\n '1/2/09 4:53,Product1,1200,Visa,Betina\\r\\n']\n\n**class ohio.CsvTextIO(rows, *writer_args, write_header=False,\nchunk_size=10, **writer_kwargs)**\n\n Readable file-like interface encoding specified data as CSV.\n\n Rows of input data are only consumed and encoded as needed, as\n ``CsvTextIO`` is read.\n\n Rather than write to the file system, an internal ``io.StringIO``\n buffer is used to store output temporarily, until it is read. 
(Also\n unlike ``ohio.encode_csv``, this buffer is reused across read/write\n cycles.)\n\n For example, we might encode the following data as CSV:\n\n ::\n\n >>> data = [\n ... ('1/2/09 6:17', 'Product1', '1200', 'Mastercard', 'carolina'),\n ... ('1/2/09 4:53', 'Product1', '1200', 'Visa', 'Betina'),\n ... ]\n\n >>> csv_buffer = CsvTextIO(data)\n\n Data may be encoded and retrieved via standard file object methods,\n such as ``read``, ``readline`` and iteration:\n\n ::\n\n >>> csv_buffer.read(15)\n '1/2/09 6:17,Pro'\n\n >>> next(csv_buffer)\n 'duct1,1200,Mastercard,carolina\\r\\n'\n\n >>> list(csv_buffer)\n ['1/2/09 4:53,Product1,1200,Visa,Betina\\r\\n']\n\n >>> csv_buffer.read()\n ''\n\n Note, in the above example, we first read 15 characters of the encoded\n CSV, then read the remainder of the line via iteration, (which\n invokes ``readline``), and then collected the remaining CSV into a\n list. Finally, we attempted to read the entirety still remaining \u2013\n which was nothing.\n\n**class ohio.CsvDictTextIO(rows, *writer_args, write_header=False,\nchunk_size=10, **writer_kwargs)**\n\n ``CsvTextIO`` which accepts row data in the form of ``dict``.\n\n Data is passed to ``csv.DictWriter``.\n\n See also: ``ohio.CsvTextIO``.\n\n**ohio.iter_csv(rows, *writer_args, write_header=False,\n**writer_kwargs)**\n\n Generate lines of encoded CSV from ``rows`` of data.\n\n See: ``ohio.CsvWriterTextIO``.\n\n**ohio.iter_dict_csv(rows, *writer_args, write_header=False,\n**writer_kwargs)**\n\n Generate lines of encoded CSV from ``rows`` of data.\n\n See: ``ohio.CsvWriterTextIO``.\n\n**class ohio.CsvWriterTextIO(*writer_args, **writer_kwargs)**\n\n csv.writer-compatible interface to iteratively encode CSV in\n memory.\n\n The writer instance may also be read, to retrieve written CSV, as\n it is written.\n\n Rather than write to the file system, an internal ``io.StringIO``\n buffer is used to store output temporarily, until it is read.\n (Unlike ``ohio.encode_csv``, this buffer is reused 
across\n read/write cycles.)\n\n Features class method ``iter_csv``: a generator to map an input\n iterable of data ``rows`` to lines of encoded CSV text.\n (``iter_csv`` differs from ``ohio.encode_csv`` in that it lazily\n generates lines of CSV, rather than eagerly encoding the entire CSV\n body.)\n\n **Note**: If you don\u2019t need to control *how* rows are written, but\n do want an iterative and/or readable interface to encoded CSV,\n consider also the more straightforward ``ohio.CsvTextIO``.\n\n For example, we may construct ``CsvWriterTextIO`` with the same\n (optional) arguments as we would ``csv.writer``, (minus the file\n descriptor):\n\n ::\n\n >>> csv_buffer = CsvWriterTextIO(dialect='excel')\n\n \u2026and write to it, via either ``writerow`` or ``writerows``:\n\n ::\n\n >>> csv_buffer.writerows([\n ... ('1/2/09 6:17', 'Product1', '1200', 'Mastercard', 'carolina'),\n ... ('1/2/09 4:53', 'Product1', '1200', 'Visa', 'Betina'),\n ... ])\n\n Written data is then available to be read, via standard file object\n methods, such as ``read``, ``readline`` and iteration:\n\n ::\n\n >>> csv_buffer.read(15)\n '1/2/09 6:17,Pro'\n\n >>> list(csv_buffer)\n ['duct1,1200,Mastercard,carolina\\r\\n',\n '1/2/09 4:53,Product1,1200,Visa,Betina\\r\\n']\n\n Note, in the above example, we first read 15 characters of the encoded\n CSV, and then collected the remaining CSV into a list, through\n iteration, (which returns its lines, via ``readline``). However,\n the first line was short by those first 15 characters.\n\n That is, reading CSV out of the ``CsvWriterTextIO`` empties that\n content from its buffer:\n\n ::\n\n >>> csv_buffer.read()\n ''\n\n We can repopulate our ``CsvWriterTextIO`` buffer by writing to it\n again:\n\n ::\n\n >>> csv_buffer.writerows([\n ... ('1/2/09 13:08', 'Product1', '1200', 'Mastercard', 'Federica e Andrea'),\n ... ('1/3/09 14:44', 'Product1', '1200', 'Visa', 'Gouya'),\n ... 
])\n\n >>> encoded_csv = csv_buffer.read()\n\n >>> encoded_csv[:80]\n '1/2/09 13:08,Product1,1200,Mastercard,Federica e Andrea\\r\\n1/3/09 14:44,Product1,1'\n\n >>> encoded_csv.splitlines(keepends=True)\n ['1/2/09 13:08,Product1,1200,Mastercard,Federica e Andrea\\r\\n',\n '1/3/09 14:44,Product1,1200,Visa,Gouya\\r\\n']\n\n Finally, class method ``iter_csv`` can do all this for us,\n generating lines of encoded CSV as we request them:\n\n ::\n\n >>> lines_csv = CsvWriterTextIO.iter_csv([\n ... ('Transaction_date', 'Product', 'Price', 'Payment_Type', 'Name'),\n ... ('1/2/09 6:17', 'Product1', '1200', 'Mastercard', 'carolina'),\n ... ('1/2/09 4:53', 'Product1', '1200', 'Visa', 'Betina'),\n ... ('1/2/09 13:08', 'Product1', '1200', 'Mastercard', 'Federica e Andrea'),\n ... ('1/3/09 14:44', 'Product1', '1200', 'Visa', 'Gouya'),\n ... ])\n\n >>> next(lines_csv)\n 'Transaction_date,Product,Price,Payment_Type,Name\\r\\n'\n\n >>> next(lines_csv)\n '1/2/09 6:17,Product1,1200,Mastercard,carolina\\r\\n'\n\n >>> list(lines_csv)\n ['1/2/09 4:53,Product1,1200,Visa,Betina\\r\\n',\n '1/2/09 13:08,Product1,1200,Mastercard,Federica e Andrea\\r\\n',\n '1/3/09 14:44,Product1,1200,Visa,Gouya\\r\\n']\n\n**class ohio.CsvDictWriterTextIO(*writer_args, **writer_kwargs)**\n\n ``CsvWriterTextIO`` which accepts row data in the form of ``dict``.\n\n Data is passed to ``csv.DictWriter``.\n\n See also: ``ohio.CsvWriterTextIO``.\n\n\niterio\n------\n\nProvide a readable file-like interface to any iterable.\n\n**class ohio.IteratorTextIO(iterable)**\n\n Readable file-like interface for iterable text streams.\n\n ``IteratorTextIO`` wraps any iterable of text for consumption like\n a file, offering methods ``readline()``, ``read([size])``, *etc.*,\n (implemented via base class ``ohio.StreamTextIOBase``).\n\n For example, given a consumer which expects to ``read()``:\n\n ::\n\n >>> def read_chunks(fdesc, chunk_size=1024):\n ... get_chunk = lambda: fdesc.read(chunk_size)\n ... 
yield from iter(get_chunk, '')\n\n \u2026And either streamed or in-memory text (*i.e.* which is not simply\n on a file system):\n\n ::\n\n >>> def all_caps(fdesc):\n ... for line in fdesc:\n ... yield line.upper()\n\n \u2026We can connect these two interfaces via ``IteratorTextIO``:\n\n ::\n\n >>> with open('/usr/share/dict/words') as fdesc:\n ... louder_words_lines = all_caps(fdesc)\n ... with IteratorTextIO(louder_words_lines) as louder_words_desc:\n ... louder_words_chunked = read_chunks(louder_words_desc)\n\n\npipeio\n------\n\nEfficiently connect ``read()`` and ``write()`` interfaces.\n\n``PipeTextIO`` provides a *readable* and iterable interface to text\nwhose producer requires a *writable* interface.\n\nIn contrast to first writing such text to memory and then consuming\nit, ``PipeTextIO`` only allows write operations as necessary to fill\nits buffer, to fulfill read operations, asynchronously. As such,\n``PipeTextIO`` consumes a stable minimum of memory, and may\nsignificantly boost speed, with a minimum of boilerplate.\n\n**ohio.pipe_text(writer_func, *args, buffer_size=None, **kwargs)**\n\n Iteratively stream output written by given function through\n readable file-like interface.\n\n Uses in-process writer thread, (which runs the given function), to\n mimic buffered text transfer, such as between the standard output\n and input of two piped processes.\n\n Calls to ``write`` are blocked until required by calls to ``read``.\n\n Note: If at all possible, use a generator! Your iterative text-\n writing function can most likely be designed as a generator, (or as\n some sort of iterator). Its output can then, far more simply and\n easily, be streamed to some input. If your input must be ``read``\n from a file-like object, see ``ohio.IteratorTextIO``. 
If your\n output must be CSV-encoded, see ``ohio.encode_csv``,\n ``ohio.CsvTextIO`` and ``ohio.CsvWriterTextIO``.\n\n ``PipeTextIO`` is suitable for situations where output *must* be\n written to a file-like object, which is made blocking to enforce\n iterativity.\n\n ``PipeTextIO`` is not \u201cseekable,\u201d but supports all other typical,\n read-write file-like features.\n\n For example, consider the following callable, (artificially)\n requiring a file-like object, to which to write:\n\n ::\n\n >>> def write_output(file_like):\n ... file_like.write(\"Hi there.\\r\\n\")\n ... print('[writer]', 'Yay I wrote one line')\n ... file_like.write(\"Cool, right?\\r\\n\")\n ... print('[writer]', 'Finally ... I wrote a second line!')\n ... file_like.write(\"All right, later :-)\\r\\n\")\n ... print('[writer]', \"Done.\")\n\n Most typically, we might *read* this content as follows, using\n either the ``PipeTextIO`` constructor or its ``pipe_text`` helper:\n\n ::\n\n >>> with PipeTextIO(write_output) as pipe:\n ... for line in pipe:\n ... ...\n\n And, this syntax is recommended. However, for the sake of example,\n consider the following:\n\n ::\n\n >>> pipe = PipeTextIO(write_output, buffer_size=1)\n\n >>> pipe.read(5)\n [writer] Yay I wrote one line\n 'Hi th'\n [writer] Finally ... I wrote a second line!\n\n >>> pipe.readline()\n 'ere.\\r\\n'\n\n >>> pipe.readline()\n 'Cool, right?\\r\\n'\n [writer] Done.\n\n >>> pipe.read()\n 'All right, later :-)\\r\\n'\n\n In the above example, ``write_output`` requires a file-like\n interface to which to write its output; (and, we presume that there\n is no alternative to this implementation \u2013 such as a generator \u2013\n that its output is large enough that we don\u2019t want to hold it in\n memory **and** that we don\u2019t need this output written to the file\n system). We are enabled to read it directly, in chunks:\n\n ..\n\n 1. Initially, nothing is written.\n\n 2. 1. 
Upon requesting to read \u2013 in this case, only the first 5\n characters \u2013 the writer is initialized, and permitted to\n write its first chunk, (which happens to be one full\n line). This is retrieved from the write buffer, and\n sufficient to satisfy the read request.\n\n 2. Having removed the first chunk from the write buffer,\n the writer is permitted to eagerly write its next chunk,\n (the second line), (but, no more than that).\n\n 3. The second read request \u2013 for the remainder of the line \u2013 is\n fully satisfied by the first chunk retrieved from the write\n buffer. No more writing takes place.\n\n 4. The third read request, for another line, retrieves the\n second chunk from the write buffer. The writer is permitted\n to write its final chunk to the write buffer.\n\n 5. The final read request returns all remaining text,\n (retrieved from the write buffer).\n\n Concretely, this is commonly useful with the PostgreSQL COPY\n command, for efficient data transfer, (and without the added\n complexity of the file system). While your database interface may\n vary, ``PipeTextIO`` enables the following syntax, for example to\n copy data into the database:\n\n ::\n\n >>> def write_csv(file_like):\n ... writer = csv.writer(file_like)\n ... ...\n\n >>> with PipeTextIO(write_csv) as pipe, \\\n ... connection.cursor() as cursor:\n ... cursor.copy_from(pipe, 'my_table', format='csv')\n\n \u2026or, to copy data out of the database:\n\n ::\n\n >>> with connection.cursor() as cursor:\n ... writer = lambda pipe: cursor.copy_to(pipe,\n ... 'my_table',\n ... format='csv')\n ...\n ... with PipeTextIO(writer) as pipe:\n ... reader = csv.reader(pipe)\n ... ...\n\n Alternatively, writer arguments may be passed to ``PipeTextIO``:\n\n ::\n\n >>> with connection.cursor() as cursor:\n ... with PipeTextIO(cursor.copy_to,\n ... args=['my_table'],\n ... kwargs={'format': 'csv'}) as pipe:\n ... reader = csv.reader(pipe)\n ... 
...\n\n (But, bear in mind, the signature of the callable passed to\n ``PipeTextIO`` must be such that its first, anonymous argument is\n the ``PipeTextIO`` instance.)\n\n Consider also the above example with the helper ``pipe_text``:\n\n ::\n\n >>> with connection.cursor() as cursor:\n ... with pipe_text(cursor.copy_to,\n ... 'my_table',\n ... format='csv') as pipe:\n ... reader = csv.reader(pipe)\n ... ...\n\n Finally, note that copying *to* the database is likely best\n performed via ``ohio.CsvTextIO``, (though copying *from* requires\n ``PipeTextIO``, as above):\n\n ::\n\n >>> with ohio.CsvTextIO(data_rows) as csv_buffer, \\\n ... connection.cursor() as cursor:\n ... cursor.copy_from(csv_buffer, 'my_table', format='csv')\n\n\nbaseio\n------\n\nLow-level primitives.\n\n**class ohio.StreamTextIOBase**\n\n Readable file-like abstract base class.\n\n Concrete classes must implement method ``__next_chunk__`` to return\n chunk(s) of the text to be read.\n\n**exception ohio.IOClosed(*args)**\n\n Exception indicating an attempted operation on a file-like object\n which has been closed.\n\n.. _extensions:\n\n\nExtensions\n----------\n\nModules integrating Ohio with the toolsets that need it.\n\n\nExtensions for NumPy\n~~~~~~~~~~~~~~~~~~~~\n\nThis module enables writing NumPy array data to database and\npopulating arrays from database via PostgreSQL ``COPY``. The operation\nis ensured, by Ohio, to be memory-efficient.\n\n**Note**: This integration is intended for NumPy, and attempts to\n``import numpy``. 
NumPy must be available (installed) in your\nenvironment.\n\n**ohio.ext.numpy.pg_copy_to_table(arr, table_name, connectable,\ncolumns=None, fmt=None)**\n\n Copy ``array`` to database table via PostgreSQL ``COPY``.\n\n ``ohio.PipeTextIO`` enables the direct, in-process \u201cpiping\u201d of\n ``array`` CSV into the \u201cstandard input\u201d of the PostgreSQL ``COPY``\n command, for quick, memory-efficient database persistence, (and\n without the needless involvement of the local file system).\n\n For example, given a SQLAlchemy ``connectable`` \u2013 either a database\n connection ``Engine`` or ``Connection`` \u2013 and a NumPy ``array``:\n\n ::\n\n >>> from sqlalchemy import create_engine\n >>> engine = create_engine('postgresql://')\n\n >>> arr = numpy.array([1.000102487, 5.982, 2.901, 103.929])\n\n We may persist this data to an existing table \u2013 *e.g.* \u201cdata\u201d:\n\n ::\n\n >>> pg_copy_to_table(arr, 'data', engine, columns=['value'])\n\n ``pg_copy_to_table`` utilizes ``numpy.savetxt`` and supports its\n ``fmt`` parameter.\n\n**ohio.ext.numpy.pg_copy_from_table(table_name, connectable, dtype,\ncolumns=None)**\n\n Construct ``array`` from database table via PostgreSQL ``COPY``.\n\n ``ohio.PipeTextIO`` enables the in-process \u201cpiping\u201d of the\n PostgreSQL ``COPY`` command into NumPy\u2019s ``fromiter``, for quick,\n memory-efficient construction of ``array`` from database, (and\n without the needless involvement of the local file system).\n\n For example, given a SQLAlchemy ``connectable`` \u2013 either a database\n connection ``Engine`` or ``Connection``:\n\n ::\n\n >>> from sqlalchemy import create_engine\n >>> engine = create_engine('postgresql://')\n\n We may construct a NumPy ``array`` from the contents of a specified\n table:\n\n ::\n\n >>> arr = pg_copy_from_table(\n ... 'data',\n ... engine,\n ... float,\n ... 
)\n\n**ohio.ext.numpy.pg_copy_from_query(query, connectable, dtype)**\n\n Construct ``array`` from database ``query`` via PostgreSQL\n ``COPY``.\n\n ``ohio.PipeTextIO`` enables the in-process \u201cpiping\u201d of the\n PostgreSQL ``COPY`` command into NumPy\u2019s ``fromiter``, for quick,\n memory-efficient construction of ``array`` from database, (and\n without the needless involvement of the local file system).\n\n For example, given a SQLAlchemy ``connectable`` \u2013 either a database\n connection ``Engine`` or ``Connection``:\n\n ::\n\n >>> from sqlalchemy import create_engine\n >>> engine = create_engine('postgresql://')\n\n We may construct a NumPy ``array`` from a given query:\n\n ::\n\n >>> arr = pg_copy_from_query(\n ... 'select value0, value1, value3 from data',\n ... engine,\n ... float,\n ... )\n\n\nExtensions for Pandas\n~~~~~~~~~~~~~~~~~~~~~\n\nThis module extends ``pandas.DataFrame`` with methods ``pg_copy_to``\nand ``pg_copy_from``.\n\nTo enable, simply import this module anywhere in your project, (most\nlikely \u2013 just once, in its root module):\n\n::\n\n >>> import ohio.ext.pandas\n\nFor example, if your project has just one module, import it there; or, in a Python\npackage:\n\n::\n\n ohio/\n __init__.py\n baseio.py\n ...\n\nimport it in the package\u2019s ``__init__.py``, to ensure that the extensions are loaded\nbefore the code that uses them is run.\n\n**Note**: These extensions are intended for Pandas, and attempt to\n``import pandas``. 
Pandas must be available (installed) in your\nenvironment.\n\n**class ohio.ext.pandas.DataFramePgCopyTo(data_frame)**\n\n ``pg_copy_to``: Copy ``DataFrame`` to database table via PostgreSQL\n ``COPY``.\n\n ``ohio.CsvTextIO`` enables the direct reading of ``DataFrame`` CSV\n into the \u201cstandard input\u201d of the PostgreSQL ``COPY`` command, for\n quick, memory-efficient database persistence, (and without the\n needless involvement of the local file system).\n\n For example, given a SQLAlchemy ``connectable`` \u2013 either a database\n connection ``Engine`` or ``Connection`` \u2013 and a Pandas\n ``DataFrame``:\n\n ::\n\n >>> from sqlalchemy import create_engine\n >>> engine = create_engine('postgresql://')\n\n >>> df = pandas.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})\n\n We may simply invoke the ``DataFrame``\u2019s Ohio extension method,\n ``pg_copy_to``:\n\n ::\n\n >>> df.pg_copy_to('users', engine)\n\n ``pg_copy_to`` supports all the same parameters as ``to_sql``,\n (excepting parameter ``method``).\n\n**ohio.ext.pandas.to_sql_method_pg_copy_to(table, conn, keys,\ndata_iter)**\n\n Write pandas data to table via stream through PostgreSQL ``COPY``.\n\n This implements a pandas ``to_sql`` \u201cmethod\u201d, utilizing\n ``ohio.CsvTextIO`` for performance stability.\n\n**ohio.ext.pandas.data_frame_pg_copy_from(sql, connectable,\nschema=None, index_col=None, parse_dates=False, columns=None,\ndtype=None, nrows=None, buffer_size=100)**\n\n ``pg_copy_from``: Construct ``DataFrame`` from database table or\n query via PostgreSQL ``COPY``.\n\n ``ohio.PipeTextIO`` enables the direct, in-process \u201cpiping\u201d of the\n PostgreSQL ``COPY`` command into Pandas ``read_csv``, for quick,\n memory-efficient construction of ``DataFrame`` from database, (and\n without the needless involvement of the local file system).\n\n For example, given a SQLAlchemy ``connectable`` \u2013 either a database\n connection ``Engine`` or ``Connection``:\n\n ::\n\n >>> from 
sqlalchemy import create_engine\n >>> engine = create_engine('postgresql://')\n\n We may simply invoke the ``DataFrame``\u2019s Ohio extension method,\n ``pg_copy_from``:\n\n ::\n\n >>> df = DataFrame.pg_copy_from('users', engine)\n\n ``pg_copy_from`` supports many of the same parameters as\n ``read_sql`` and ``read_csv``.\n\n In addition, ``pg_copy_from`` accepts the optimization parameter\n ``buffer_size``, which controls the maximum number of CSV-encoded\n results written by the database cursor to hold in memory prior to\n their being read into the ``DataFrame``. Depending on use-case,\n increasing this value may speed up the operation, at the cost of\n additional memory \u2013 and vice-versa. ``buffer_size`` defaults to\n ``100``.\n\n\nBenchmarking\n~~~~~~~~~~~~\n\nOhio extensions for pandas were benchmarked to test their speed and\nmemory-efficiency relative both to pandas built-in functionality and\nto custom implementations which do not utilize Ohio.\n\nInterfaces and syntactical niceties aside, Ohio generally features\nmemory stability. Its tools enable pipelines which may also improve\nspeed, (and which do so in standard use-cases).\n\nIn the below benchmark, Ohio extensions ``pg_copy_from`` &\n``pg_copy_to`` reduced memory consumption by 84% & 61%, and completed\nin 39% & 91% less time, relative to pandas built-ins ``read_sql`` &\n``to_sql``, (respectively).\n\nCompared to purpose-built extensions \u2013 which utilized PostgreSQL\n``COPY``, but using ``io.StringIO`` in place of ``ohio.PipeTextIO``\nand ``ohio.CsvTextIO`` \u2013 ``pg_copy_from`` & ``pg_copy_to`` also\nreduced memory consumption by 60% & 32%, respectively.\n``pg_copy_from`` & ``pg_copy_to`` also completed in 16% & 13% less\ntime than the ``io.StringIO`` versions.\n\nThe benchmarks plotted below were produced from averages and standard\ndeviations over 3 randomized trials per target. 
Input data consisted\nof 896,677 rows across 83 columns: 1 of these of type timestamp, 51\nintegers and 31 floats. The benchmarking package, ``prof``, is\npreserved in `Ohio's repository `_.\n\n.. image:: https://raw.githubusercontent.com/dssg/ohio/0.5.0/doc/img/profile-copy-from-database-to-datafram-1554345457.svg?sanitize=true\n\nohio_pg_copy_from_X\n ``pg_copy_from(buffer_size=X)``\n\n A PostgreSQL database-connected cursor writes the results of\n ``COPY`` to a ``PipeTextIO``, from which pandas constructs a\n ``DataFrame``.\n\npandas_read_sql\n ``pandas.read_sql()``\n\n Pandas constructs a ``DataFrame`` from a given database query.\n\npandas_read_sql_chunks_100\n ``pandas.read_sql(chunksize=100)``\n\n Pandas is instructed to generate ``DataFrame`` slices of the\n database query result, and these slices are concatenated into a\n single frame, with: ``pandas.concat(chunks, copy=False)``.\n\npandas_read_csv_stringio\n ``pandas.read_csv(StringIO())``\n\n A PostgreSQL database-connected cursor writes the results of\n ``COPY`` to a ``StringIO``, from which pandas constructs a\n ``DataFrame``.\n\n.. image:: https://raw.githubusercontent.com/dssg/ohio/0.5.0/doc/img/profile-copy-from-dataframe-to-databas-1555458507.svg?sanitize=true\n\nohio_pg_copy_to\n ``pg_copy_to()``\n\n ``DataFrame`` data are encoded through a ``CsvTextIO``, and read by\n a PostgreSQL database-connected cursor\u2019s ``COPY`` command.\n\npandas_to_sql\n ``pandas.DataFrame.to_sql()``\n\n Pandas inserts ``DataFrame`` data into the database row by row.\n\npandas_to_sql_multi_100\n ``pandas.DataFrame.to_sql(method='multi', chunksize=100)``\n\n Pandas inserts ``DataFrame`` data into the database in chunks of\n rows.\n\ncopy_stringio_to_db\n ``DataFrame`` data are written and encoded to a ``StringIO``, and\n then read by a PostgreSQL database-connected cursor\u2019s ``COPY``\n command.\n\n.. 
_recipes:\n\n\nRecipes\n-------\n\nStand-alone modules implementing functionality which depends upon Ohio\nprimitives.\n\n\ndbjoin\n~~~~~~\n\nJoin the \u201cCOPY\u201d results of arbitrary database queries in Python,\nwithout unnecessary memory overhead.\n\nThis is largely useful to work around databases\u2019 per-query column\nlimit.\n\n**ohio.recipe.dbjoin.pg_join_queries(queries, engine, sep=', ',\nend='\\n', copy_options=('CSV', 'HEADER'))**\n\n Join the text-encoded result streams of an arbitrary number of\n PostgreSQL database queries to work around the database\u2019s per-query\n column limit.\n\n Query results are read via PostgreSQL ``COPY``, streamed through\n ``PipeTextIO``, and joined line-by-line into a singular stream.\n\n For example, given a set of database queries whose results cannot\n be combined into a single PostgreSQL query, we might join these\n queries\u2019 results and write these results to a file-like object:\n\n ::\n\n >>> queries = [\n ... 'SELECT a, b, c FROM a_table',\n ... ...\n ... ]\n\n >>> with open('results.csv', 'w', newline='') as fdesc:\n ... for line in pg_join_queries(queries, engine):\n ... fdesc.write(line)\n\n Or, we might read these results into a single Pandas DataFrame:\n\n ::\n\n >>> csv_lines = pg_join_queries(queries, engine)\n >>> csv_buffer = ohio.IteratorTextIO(csv_lines)\n >>> df = pandas.read_csv(csv_buffer)\n\n By default, ``pg_join_queries`` requests CSV-encoded results, with\n an initial header line indicating the result columns. These\n options, which are sent directly to the PostgreSQL ``COPY``\n command, may be controlled via ``copy_options``. For example, to\n omit the CSV header:\n\n ::\n\n >>> pg_join_queries(queries, engine, copy_options=['CSV'])\n\n Or, to request PostgreSQL\u2019s tab-delimited text format via the\n syntax of PostgreSQL v9.0+:\n\n ::\n\n >>> pg_join_queries(\n ... queries,\n ... engine,\n ... sep='\\t',\n ... copy_options={'FORMAT': 'TEXT'},\n ... 
)\n\n In the above example, we\u2019ve instructed PostgreSQL to use its\n ``text`` results encoder, (and we\u2019ve omitted the instruction to\n include a header).\n\n **NOTE**: In the last example, we also explicitly specified the\n separator used in the results\u2019 encoding. This is not passed to the\n database; rather, it is necessary for ``pg_join_queries`` to\n properly join queries\u2019 results.\n\n\n", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/dssg/ohio", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "ohio", "package_url": "https://pypi.org/project/ohio/", "platform": "", "project_url": "https://pypi.org/project/ohio/", "project_urls": { "Homepage": "https://github.com/dssg/ohio" }, "release_url": "https://pypi.org/project/ohio/0.5.0/", "requires_dist": null, "requires_python": "", "summary": "I/O extras", "version": "0.5.0" }, "last_serial": 5930335, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "94ad4d116db02cbbc74b71d45a20e36e", "sha256": "504eb6a445e91c9e22d12ec213cdace5aaf4ab815e9bab46b718ff44669d50e9" }, "downloads": -1, "filename": "ohio-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "94ad4d116db02cbbc74b71d45a20e36e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5988, "upload_time": "2019-02-27T23:02:42", "url": "https://files.pythonhosted.org/packages/fe/a0/81bf2dbf53cdc1b7aa4e99756e67bee45ed720cff7341dbada46cd867d71/ohio-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b68cda77ba2d897beb0a02f4e46e8c36", "sha256": "da786fae2835a182bbcdb7c0438fcfd19267ce4d78ee7b30f584022c2eeed721" }, "downloads": -1, "filename": "ohio-0.1.0.tar.gz", "has_sig": false, "md5_digest": "b68cda77ba2d897beb0a02f4e46e8c36", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
6131, "upload_time": "2019-02-27T23:02:44", "url": "https://files.pythonhosted.org/packages/2e/38/0458e26b42ae4c65d9a194160df00c97fa2e09601dc08d0a060316e052aa/ohio-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "8d628dc9d170409886c51df823d04c19", "sha256": "163ff20f0edde1f144022616d7d3b93ef7819a5e2b09b822f77e0d7025a111ff" }, "downloads": -1, "filename": "ohio-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "8d628dc9d170409886c51df823d04c19", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6057, "upload_time": "2019-03-01T00:28:03", "url": "https://files.pythonhosted.org/packages/d7/55/3740875a0672219a4106c074f4f980718cd1a1ca3de08b26197141d8e558/ohio-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bdd4ed5b1d63cef1995ca9e60e94021a", "sha256": "a7488c08cc9292d3d7be921bcd917c3dd69dde56e496c277946f5b1db8cff896" }, "downloads": -1, "filename": "ohio-0.1.1.tar.gz", "has_sig": false, "md5_digest": "bdd4ed5b1d63cef1995ca9e60e94021a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6190, "upload_time": "2019-03-01T00:28:04", "url": "https://files.pythonhosted.org/packages/33/28/c2b45fdc50ce5f85428204957b4db6e4c19ec55f9afcf0ae7fae2ef5dd45/ohio-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "fba28173f2e3a8768bf17ae0ddd5575a", "sha256": "253517bf33daf3af2b008d4b735acbf5f806504d451d346a9b8025fc8e56c959" }, "downloads": -1, "filename": "ohio-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "fba28173f2e3a8768bf17ae0ddd5575a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5691, "upload_time": "2019-03-05T21:36:36", "url": "https://files.pythonhosted.org/packages/8e/25/51c6b6ddbf8a42d63a7cb2f938414e44ed48182206ce0f6fed4bb95d8cb2/ohio-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "41e5703fe3f10b62b59de3ef74678ffa", "sha256": 
"40bf8b674097da2f85d447a8275972fc1dd73289ddeeb932e0d710d1ec420836" }, "downloads": -1, "filename": "ohio-0.1.2.tar.gz", "has_sig": false, "md5_digest": "41e5703fe3f10b62b59de3ef74678ffa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5818, "upload_time": "2019-03-05T21:36:38", "url": "https://files.pythonhosted.org/packages/ad/7f/aec9150f7ef44f5f5f4037113516bf06643ab2cf146f469e80a77ae1afe3/ohio-0.1.2.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "c95aa65076259d1f775e2bdfb1849e4d", "sha256": "a96a1c3df9b1b60eec6ac0297b1c88f0990ae6dbdf398e6b04a8855b4b0ec462" }, "downloads": -1, "filename": "ohio-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c95aa65076259d1f775e2bdfb1849e4d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13523, "upload_time": "2019-04-03T22:26:19", "url": "https://files.pythonhosted.org/packages/3b/01/62b6fa88fe1e3208896a6b4512e1f2e82925b1d230a1d41fe87a5c1ee77f/ohio-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "74b7731a4b3062821e7546a6eccb9880", "sha256": "1209ea00dd0d1934d190bba42c9f82de8c0b7f056f49a0764336f9559b3f0655" }, "downloads": -1, "filename": "ohio-0.2.0.tar.gz", "has_sig": false, "md5_digest": "74b7731a4b3062821e7546a6eccb9880", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9172, "upload_time": "2019-04-03T22:26:20", "url": "https://files.pythonhosted.org/packages/ab/e0/febca81cbd7b994b144dbca84bf5772a5ed4cd84907c770b7ec627d64975/ohio-0.2.0.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "61121570a05618b23c6bb5e7146c2de1", "sha256": "08ea66bb442f72522f3ae5ff16d9285d264f39bec97f6625bcce4451e3af21bb" }, "downloads": -1, "filename": "ohio-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "61121570a05618b23c6bb5e7146c2de1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20910, "upload_time": 
"2019-04-11T20:53:03", "url": "https://files.pythonhosted.org/packages/c7/08/ab1eaa5d363a52198c4b74d73e91363b92055ca96f73300fee752c51ae7e/ohio-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ecf6813b3fbf5937dd29cec9d79745e2", "sha256": "d774ce07ca06827c83684a70ecee629fbc419e2f3ce04609740352f57354d2b8" }, "downloads": -1, "filename": "ohio-0.3.0.tar.gz", "has_sig": false, "md5_digest": "ecf6813b3fbf5937dd29cec9d79745e2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19278, "upload_time": "2019-04-11T20:53:04", "url": "https://files.pythonhosted.org/packages/27/d7/cb1c7ceed7479351e1e9c5815d9269e2116d046ccf9fa8aafaca5718bc44/ohio-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "4957bf8fb2282c39d77c3b8678eefe09", "sha256": "a1ad6cffa08186917e1f5e4a7cbed6d681ea432c282eb52c935638d2e0690cbc" }, "downloads": -1, "filename": "ohio-0.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4957bf8fb2282c39d77c3b8678eefe09", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20913, "upload_time": "2019-04-11T21:21:51", "url": "https://files.pythonhosted.org/packages/70/77/8d730ae6427012c34bc2f12101c211e6b5b9ba2032f24f9b90ad620a4208/ohio-0.3.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "26004f4d53955f96040ffefb2d15f280", "sha256": "6828d21934dc42efc39e1cfb68f46dc12b4e8c03090d4fccf48c9d682a3cb3c5" }, "downloads": -1, "filename": "ohio-0.3.1.tar.gz", "has_sig": false, "md5_digest": "26004f4d53955f96040ffefb2d15f280", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19280, "upload_time": "2019-04-11T21:21:53", "url": "https://files.pythonhosted.org/packages/08/86/cef466b434eab44950ba898b40efee49a5466be82b405087dd5bb61d4387/ohio-0.3.1.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "ac24278b3c2c332f8145f43d9f147b73", "sha256": 
"c0c92468987012aa21edb57b5a2f8b8de0a982e32459d361af9338a70d18e415" }, "downloads": -1, "filename": "ohio-0.4.0-py3-none-any.whl", "has_sig": false, "md5_digest": "ac24278b3c2c332f8145f43d9f147b73", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 23155, "upload_time": "2019-04-17T22:25:00", "url": "https://files.pythonhosted.org/packages/49/df/86933c69254f48892a4e646f486ce04aac346a90d76e4e7e4306eeab1a56/ohio-0.4.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "106a5db325884eff6e92d15217040e23", "sha256": "ec5095e9c6d4a5b45f761d9b2732aadf61bd30c311c2fae949807d0c6dc842d2" }, "downloads": -1, "filename": "ohio-0.4.0.tar.gz", "has_sig": false, "md5_digest": "106a5db325884eff6e92d15217040e23", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23460, "upload_time": "2019-04-17T22:25:02", "url": "https://files.pythonhosted.org/packages/96/cc/46c1960c09b84f99356725571aa545819c3fbf5e295319a949b3b2372bd9/ohio-0.4.0.tar.gz" } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "af2e95188590e728608da4bf7c41b490", "sha256": "7769c7dcaf26b8c8c588a2b6648c73b88a0dbd44833ac34e587fa772de82d83c" }, "downloads": -1, "filename": "ohio-0.5.0-py3-none-any.whl", "has_sig": false, "md5_digest": "af2e95188590e728608da4bf7c41b490", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 26435, "upload_time": "2019-10-04T22:02:42", "url": "https://files.pythonhosted.org/packages/76/fb/40cec4ddf99a15fc587925ee84a768e7416729f4c410e4bdf40c3b91843d/ohio-0.5.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8767f3935235d6910dd13a70f7670894", "sha256": "f452bd152938bf2cf26322a0bf7ebed792c414c07e56e8725a402e4df2122308" }, "downloads": -1, "filename": "ohio-0.5.0.tar.gz", "has_sig": false, "md5_digest": "8767f3935235d6910dd13a70f7670894", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34907, "upload_time": 
"2019-10-04T22:02:44", "url": "https://files.pythonhosted.org/packages/9c/26/ca21dc1e5d110c38560bcfd1da296afe992c530871ad550a03a0964a1bc0/ohio-0.5.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "af2e95188590e728608da4bf7c41b490", "sha256": "7769c7dcaf26b8c8c588a2b6648c73b88a0dbd44833ac34e587fa772de82d83c" }, "downloads": -1, "filename": "ohio-0.5.0-py3-none-any.whl", "has_sig": false, "md5_digest": "af2e95188590e728608da4bf7c41b490", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 26435, "upload_time": "2019-10-04T22:02:42", "url": "https://files.pythonhosted.org/packages/76/fb/40cec4ddf99a15fc587925ee84a768e7416729f4c410e4bdf40c3b91843d/ohio-0.5.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8767f3935235d6910dd13a70f7670894", "sha256": "f452bd152938bf2cf26322a0bf7ebed792c414c07e56e8725a402e4df2122308" }, "downloads": -1, "filename": "ohio-0.5.0.tar.gz", "has_sig": false, "md5_digest": "8767f3935235d6910dd13a70f7670894", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34907, "upload_time": "2019-10-04T22:02:44", "url": "https://files.pythonhosted.org/packages/9c/26/ca21dc1e5d110c38560bcfd1da296afe992c530871ad550a03a0964a1bc0/ohio-0.5.0.tar.gz" } ] }