{ "info": { "author": "Shon T. Urbas, Geoff Johnson", "author_email": "shon.urbas@gmail.com, geoff.johnson@handshake.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Utilities" ], "description": "==========\npg2kinesis\n==========\n\n.. image:: https://travis-ci.org/handshake/pg2kinesis.svg?branch=master\n :target: https://travis-ci.org/handshake/pg2kinesis/\n\npg2kinesis uses `logical decoding\n`_\nin Postgres 9.4 or later to capture a consistent, continuous stream of events\nfrom the database and publishes them to an `AWS Kinesis `_\nstream in a format of your choosing.\n\nIt does this without requiring any changes to your schema like triggers or\n\"shadow\" columns or tables, and has a negligible impact on database performance.\nThis is done while being extremely fault tolerant. No data loss will be incurred\non any type of underlying system failure including process crashes, network\noutages, or ec2 instance failures. However, in these situations there will\nlikely be records that are sent more than once, so your consumer should be\ndesigned with this in mind.\n\nThe fault tolerance comes from guarantees provided by the underlying\ntechnologies and from the \"2-phase commit\" style of publishing inherent to the\ndesign of the program. Changes are first peeked from the replication slot and\npublished to Kinesis. 
Once Kinesis successfully receives a batch of records, we\nadvance the `xmin `_\nof the slot, thereby telling Postgres it is safe to reclaim the space taken by\nthe WAL. As is always the case with logical replication slots, unacknowledged\ndata on the slot will consume disk on the database until it is read.\n\nThere are other utilities that do similar things, often by injecting a C library\ninto Postgres to do data transformations in place. Unfortunately these\napproaches are not suitable for managed databases like AWS' RDS where support\nfor various plugins is limited and ultimately determined by the hosting provider.\nWe specifically created pg2kinesis to make use of logical decoding on\n`Amazon's RDS for PostgreSQL `_. Amazon\nsupports logical decoding with either the `test_decoding `_\nor `wal2json `_\noutput plugins. This utility takes the output of either plugin, transforms it\nbased on a formatter you can define, and publishes it to a Kinesis stream\nin *transaction commit time order* with a guarantee that *no data will be lost*.\n\nInstallation\n------------\n\nPrerequisites\n^^^^^^^^^^^^^\n\n #. Python 2.7*, 3.3+\n #. AWS-CLI installed and configured\n #. A PostgreSQL 9.4+ server with logical replication enabled\n #. A Kinesis stream\n\nInstall\n^^^^^^^\n\n ``pip install pg2kinesis``\n\n\nTests\n-----\n\nTo run the tests, clone the repo and install the additional requirements:\n\n #. ``git clone git@github.com:handshake/pg2kinesis.git``\n #. ``cd pg2kinesis``\n #. ``pip install -r requirements.txt``\n #. 
``(cd tests && pytest)``\n\n\nUsage\n-----\n\nRun ``pg2kinesis --help`` to get a list of the latest command line options.\n\nBy default pg2kinesis attempts to connect to a local Postgres instance and\npublish to a stream named ``pg2kinesis`` using the AWS credentials in the\nenvironment the utility was invoked in.\n\nOn successful start it will query your database for the primary key definitions\nof every table in ``--pg-dbname``. This is used to identify the correct column\nin the test_decoding output to publish. If a table does not have a primary key\nits changes will **NOT** be published unless using wal2json and ``--full-change``.\n\nYou have a choice of 3 different textual formats that will be sent to the\nKinesis stream:\n\n* ``CSV``: outputs strings to Kinesis that look like::\n\n 0,CDC,,,,\n\n* ``CSVPayload``: outputs similar to the above except the 3rd column is now a\n JSON object representing the change.\n\n .. code-block::\n\n 0,CDC,{\n \"xid\": \n \"table\": <...>\n \"operation\": <...>\n \"pkey\": <...>\n }\n\n* If ``wal2json`` is being used, this can either be the primary key as above or\n the full changed row.\n\n .. 
code-block::\n\n 0,CDC,{\n \"xid\": 30355,\n \"change\": {\n \"kind\": \"insert\",\n \"columnnames\": [\"a\", \"b\"],\n \"columntypes\": [\"int4\", \"int4\"],\n \"table\": \"foo\",\n \"columnvalues\": [1, null],\n \"schema\": \"public\"\n }\n }\n\n\nShout Outs\n----------\n\npg2kinesis is based on the ideas of others including:\n\n* Logical Decoding: a new world of data exchange applications for Postgres SQL\n [(`slides `_)]\n* psycopg2 [(`main `_)] [(`repo\n `__)]\n* bottledwater-pg [(`blog `_)] [(`repo `__)]\n* wal2json [(`repo `__)]\n\n\nFuture Road Map\n---------------\n\n* Support full change output from the test_decoding plugin\n* Allow HUPing to notify the utility to regenerate the primary key cache\n* Support the above on a schedule specified via the command line, with a sensible default of once an hour.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/handshake/pg2kinesis", "keywords": "logical replication,kinesis,database", "license": "MIT", "maintainer": "Shon T. 
Urbas, Geoff Johnson", "maintainer_email": "shon.urbas@gmail.com, geoff.johnson@handshake.com", "name": "pg2kinesis", "package_url": "https://pypi.org/project/pg2kinesis/", "platform": "", "project_url": "https://pypi.org/project/pg2kinesis/", "project_urls": { "Homepage": "https://github.com/handshake/pg2kinesis" }, "release_url": "https://pypi.org/project/pg2kinesis/0.7.0/", "requires_dist": null, "requires_python": "", "summary": "A utility that enables PostgreSQL 9.4+ to logically replicate to Amazon", "version": "0.7.0" }, "last_serial": 3718114, "releases": { "0.5.0": [ { "comment_text": "", "digests": { "md5": "b73e02c5e06de342c0a3b2d4d14786a9", "sha256": "db5fd39ef301ca2d29a5db88042bba54f6d8ffb01f53d2592b1b8452c3e45647" }, "downloads": -1, "filename": "pg2kinesis-0.5.0.tar.gz", "has_sig": false, "md5_digest": "b73e02c5e06de342c0a3b2d4d14786a9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16466, "upload_time": "2018-02-04T22:48:26", "url": "https://files.pythonhosted.org/packages/e0/83/b6740dc1243b2250ae57dffd4c410f61ece44a1f8c1810608acf5e8f2125/pg2kinesis-0.5.0.tar.gz" } ], "0.6.0": [ { "comment_text": "", "digests": { "md5": "6535d66aef6c3d13e2ea7cf209bbd334", "sha256": "911fb9225a92d4702dc58b184f1fc2501496fd201ecb401bc599c45377677967" }, "downloads": -1, "filename": "pg2kinesis-0.6.0.tar.gz", "has_sig": false, "md5_digest": "6535d66aef6c3d13e2ea7cf209bbd334", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18062, "upload_time": "2018-02-15T15:23:23", "url": "https://files.pythonhosted.org/packages/3a/21/ed5fedb65154ca836b1a4b7610d7fa6d0442cd37bcb1578bc38eaead1ee3/pg2kinesis-0.6.0.tar.gz" } ], "0.7.0": [ { "comment_text": "", "digests": { "md5": "c68b7e839db62885d82381b488fec82b", "sha256": "94a5ba1ba76bfc6067880b19210d4dadb6f47ca4083aa0074c7c1a36e3b7916e" }, "downloads": -1, "filename": "pg2kinesis-0.7.0.tar.gz", "has_sig": false, "md5_digest": 
"c68b7e839db62885d82381b488fec82b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17846, "upload_time": "2018-03-29T20:32:47", "url": "https://files.pythonhosted.org/packages/e6/44/2510f517faaddbe6ea8545eb525be1392219c7c01eedcdde3ac73ade5b0d/pg2kinesis-0.7.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c68b7e839db62885d82381b488fec82b", "sha256": "94a5ba1ba76bfc6067880b19210d4dadb6f47ca4083aa0074c7c1a36e3b7916e" }, "downloads": -1, "filename": "pg2kinesis-0.7.0.tar.gz", "has_sig": false, "md5_digest": "c68b7e839db62885d82381b488fec82b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17846, "upload_time": "2018-03-29T20:32:47", "url": "https://files.pythonhosted.org/packages/e6/44/2510f517faaddbe6ea8545eb525be1392219c7c01eedcdde3ac73ade5b0d/pg2kinesis-0.7.0.tar.gz" } ] }