{ "info": { "author": "Mike Bayer", "author_email": "mike@zzzcomputing.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Programming Language :: Python", "Programming Language :: Python :: 3", "Topic :: Database :: Front-Ends" ], "description": "===================\nsqlalchemy-collectd\n===================\n\nSend statistics on `SQLAlchemy `_ connection and\ntransaction metrics used by Python applications to the\n`collectd `_ service.\n\nsqlalchemy-collectd works as a SQLAlchemy plugin invoked via the database URL,\nso can be used in any SQLAlchemy application (1.1 or greater) that accepts\narbitrary connection URLs. The plugin is loaded using setuptools\nentrypoints and no code changes to the application are required. There\nare no dependencies on database backends or drivers.\n\nsqlalchemy-collectd is oriented towards providing a unified view of\napplication-side database metrics in sprawling, many-host / many-process\nenvironments that may make use of any number of topologically complicating\ntechnologies such as database clusters, proxy servers, large numbers of\nclient applications, multi-process applications, and containers.\n\nWhat's collectd?\n================\n\ncollectd is a statistics collection daemon that is easy to run. It serves as\na collector and re-broadcaster of runtime statistics for a wide variety of\nperformance and other metrics. Once a set of stats are in collectd, they can\nbe broadcast out virtually anywhere, including to `RRD `_\ndatabases and front-ends, to metrics\nreporting applications like `Graphite `_ and `Grafana\n`_, and to `other collectd servers\n`_.\n\nArchitecture Nutshell\n=====================\n\nsqlalchemy-collectd gathers its statistics from **within** the Python\napplication itself, and delivers live metrics to a collectd service over UDP.\nTo achieve this, it's client portion is loaded within the process as a\nSQLAlchemy **engine plugin** that attaches listeners to a\n``sqlalchemy.engine.Engine`` object as well as the connection pool within it.\nA background thread running within each process periodically sends a snapshot\nof statistics out over UDP.\n\nOn the collectd side, a Python plugin listens on the same UDP port and\naggregates statistics received from any number of Python processes and hosts,\nand then delivers them to the collectd engine itself as a series of\nper-host and per-program statistics.\n\nA key goal of this architecture is to allow a Python program that uses\nmultiple processes (e.g. via Python ``multiprocessing`` or just plain\n``fork``) to report **unified** information on each engine/connection pool\nwithin each subprocess, as well as to track multiple instances of the\nsame application running from many hosts (and of course it can track\nany number of applications and hosts simultaneously). Having a full\nclient/server model allows the collectd service itself to be located not only\non the same host as the application itself, but on any other host on the\nnetwork.\n\nThe network model itself makes use of collectd's own binary protocol; while\nnot strictly necessary, this is because originally the plan was to use the\ncollectd \"network\" plugin as the receiver, however after observing limitations\nin collectd's \"aggregation\" plugin this was replaced with a full Python plugin\nthat does everything needed in a much more straightforward way.\n\nHow is this different from using database monitoring?\n=====================================================\n\nWhen you run a database like MySQL or Postgresql, there's lots of ways to see\nactivity in the database; you can list out statistics regarding connections,\ntransactions, threads / processes in use, and in most cases you can integrate\nthese server-side statistics with collectd itself to watch trends in real time.\n\nHowever, while gathering stats from the server can provide insights into\nactivity, including being able to look at the originating host as well as the\nspecific database(s) being accessed by each client, in a large scale\nenvironment it's difficult to get a unified, real-time picture for how each\nprocess on each host is making use of its database connections, especially if\nthere are additional layers of indirection between application and\ndatabases present, such as proxy servers like HAProxy, ProxySQL or PGBouncer,\nas well as when databases and/or applications are containerized and potentially\nhopping over additional network translation layers. This kind of analysis\nrequires being able to relate database connections reported by the database\nto the originating hosts and individual processes on each host.\n\nSQLAlchemy-oriented applications usually make use of process-local connection pools as\nwell, and it is valuable to be able to see how well these pools are being\nutilized, which includes being able to see how many connections are sitting\nidle vs. how often does the application need to create new connections in\norder to respond to requests. These are still things that can probably be\ngleaned from the database itself from things like connection idle time, but\nespecially when layers of indirection are in place, it's simpler to get the\nperformance metrics you care about (e.g., how well are the **applications**\nperforming) from the applications themselves, as they can give you the exact\ninformation about what they are doing without having to reverse-engineer it\nfrom database servers and network status.\n\nOf course, this assumes the applications are Python applications using\nSQLAlchemy. Which of course they should be! :)\n\nInstallation\n============\n\nTo use SQLAlchemy-collectd, you need to have:\n\n* SQLAlchemy-collectd installed as a Python library alongside SQLAlchemy\n itself, in all Python environments that run a SQLAlchemy-oriented\n application.\n\n* The collectd service itself somewhere on the network.\n\n* The collectd-python plugin, which may be delivered as a separate package\n depending on distribution\n\n* SQLAlchemy-collectd installed as a Python library alongside the collectd\n server itself, either as part of the system Python which collectd-python\n accesses by default, **or** the SQLAlchemy-collectd application can be\n extracted into any arbitrary location that can be set up as an additional\n system path with collectd.\n\nWithout using a package manager, SQLAlchemy-collectd can be installed via\npip using::\n\n\tpip install sqlalchemy-collectd\n\nConfiguration\n=============\n\nConfiguration involves both a client-side configuration as well as a server\nside configuration. Both are very simple.\n\nClient\n------\n\nSQLAlchemy applications use a database connection URL, usually loaded\nfrom a configuration system of some kind. Wherever this URL is for your\ntarget application, basically add ``?plugin=collectd`` to it (or ``&plugin=collectd``\nif other query parameters already there). Such as::\n\n\tmysql+pymysql://user:password@databasehost/dbname?charset=utf8&plugin=collectd\n\nUsing a URL as above, the sqlalchemy-collectd plugin will be pulled in where it\nwill attempt to send messages to a collectd server listening on localhost port\n25827 (note this is one higher than the default collectd network plugin port of\n25826).\n\n\nDestination Host\n^^^^^^^^^^^^^^^^\n\nTo send stats to collectd on a different host, add ``collectd_host``\n(currently ipv4 only) and optionally ``collectd_port``::\n\n\tmysql+pymysql://user:password@databasehost/dbname?charset=utf8&plugin=collectd&collectd_host=172.18.5.2&collectd_port=25827\n\nProgram Name\n^^^^^^^^^^^^\n\nAnother important configuration is the \"program name\" - this is the application\nname that sqlalchemy-collectd will report within statistics. Within a particular\n\"program name\" on a particular host, statistics are **aggregated across all\nprocesses**, regardless of parent process.\n\nBy default, the \"program name\" comes from ``sys.argv[0]``, but this is not\nalways what's desired; for example, if you're running from within mod_wsgi,\nthis will likely return ``httpd`` which is more vague that most would prefer.\nAdditionally, a single application might create connections to multiple\ndatabases for different purposes, and one might want to separate the reporting\nfor these into different sections. To set up this program name, add\n``collectd_program_name``::\n\n\tmysql+pymysql://user:password@databasehost/dbname?charset=utf8&plugin=collectd&collectd_program_name=nova_api&collectd_host=172.18.5.2\n\nWith the above URL, all Python processes that use this URL on a single host\nwill aggregate their connection use statistics under the name ``nova_api``.\n\nStartup\n^^^^^^^\n\nAfter the URL is configured, the vast majority of applications probably\nneed to be restarted for the change to take effect.\n\nThe plugin will transparently spawn a background thread for each individual process\nthat starts up which also connects to the database (don't worry, these work\nif you are using gevent, eventlet, asyncio, gunicorn, etc. threads are your\nfriend).\n\n\nTODO\n^^^^\n\nWe can add options so that stats are still grouped under parent pids, that\nis instead of using ```` as the classifier we use\n``-``, like ``nova_api-15840`` vs. ``nova_api-4573``, etc.\nOf course we can report on the raw subprocess identifiers as well but this\ndoesn't appear to be that useful.\n\n\n\nServer\n------\n\nsqlalchemy-collectd uses a Python plugin, so in your collectd.conf or in a\ncollectd.d/sqlalchemy.conf file, assuming a system-installed sqlalchemy-collectd::\n\n\tLoadPlugin python\n\t\n\t LogTraces true\n\n\t Import \"sqlalchemy_collectd.server.plugin\"\n\n\t \n\t \t# ipv4 only for the moment\n\t listen \"0.0.0.0\" 25827\n\n\t # set to \"debug\" to show messages received\n\t loglevel \"info\"\n\n\t \n\t\n\nAbove, the plugin will listen for UDP on port 25827 of the default network\ninterface. It can also be configured to listen on \"localhost\" or any\nother IP number (currently ipv4 only) on the host.\n\nCustom Module Path\n^^^^^^^^^^^^^^^^^^\n\nTo reference sqlalchemy-collectd extracted into an arbitrary file location,\nadd ``ModulePath``::\n\n\tLoadPlugin python\n\t\n\t\tModulePath \"/path/to/sqlalchemy-collectd/\"\n\t LogTraces true\n\n\t Import \"sqlalchemy_collectd.server.plugin\"\n\n\t \n\t \t# ipv4 only for the moment\n\t listen \"0.0.0.0\" 25827\n\n\t # set to \"debug\" to show messages received\n\t loglevel \"info\"\n\t \n\t\n\nFor further information about the Python plugin system see\n`collectd-python `_.\n\nThe collectd server is typically restarted for the configurational change\nto take effect.\n\nTODO\n^^^^\n\n* ipv6 support\n\n* security layer (e.g. network packet signing / encryption)\n\nStats\n=====\n\nNow that sqlalchemy-collectd is running, what stats can we see?\n\nSupposing we have the plugin turned on for the applications ``neutron``\nand ``nova``, the namespace we would see in a tool like graphana would\nlook like::\n\n\thostname\n\t\tsqlalchemy-host\n\t\t\tcount-checkedin\n\t\t\tcount-checkedout\n\t\t\tcount-connections\n\t\t\tcount-detached\n\t\t\tcount-numpools\n\t\t\tcount-numprocs\n\t\t\tderive-checkouts\n\t\t\tderive-connects\n\t\t\tderive-disconnects\n\t\t\tderive-invalidated\n\t\t\tderive-commits\n\t\t\tderive-rollbacks\n\t\t\tderive-transactions\n\n\t\tsqlalchemy-neutron\n\t\t\tcount-checkedin\n\t\t\tcount-checkedout\n\t\t\tcount-connections\n\t\t\tcount-detached\n\t\t\t... everything else\n\n\t\tsqlalchemy-nova\n\t\t\tcount-checkedin\n\t\t\tcount-checkedout\n\t\t\tcount-connections\n\t\t\tcount-detached\n\t\t\t... everything else\n\nAbove, we first see that all stats are grouped per-hostname. Within that,\nwe have a fixed *plugin instance* called \"host\", which renders as ``sqlalchemy-host``.\nThis represents aggregated statistics for the entire host, that is, statistics\nthat take into account all database connections used by all applications (that\nuse sqlalchemy-collectd) on this particular host.\n\nFollowing that, we can see there are groups for the individual ``program_name``\nwe set up, for ``nova`` and ``neutron`` we get stats aggregated for that\nname specifically.\n\nThe statistics themselves are labeled ``count-`` or ``derive-``,\nwhich correspond to pre-supplied collectd types ``count`` and ``derive`` (see\n\"collectd types\" below for why the naming is done this way). The stats labeled\n``count`` are integers representing the current count of a resource or\nactivity:\n\n* ``count-checkedin`` - current number of connections that are checked in to the\n connection pool\n\n* ``count-checkedout`` - current number of connections that are checked out from\n the connection pool, e.g. are in use by the application to talk to the\n database.\n\n* ``count-connections`` - total number of connections to the database at this moment,\n checked out, checked in, detached, or soft-invalidated.\n\n* ``count-detached`` - total number of connections that are **detached**; meaning\n they have been disconnected from the engine/pool using the ``.detach()``\n method but are still being used as a database connection.\n\n* ``count-numpools`` - the number of connection pools in use. A SQLAlchemy\n ``Engine`` features exactly one connection pool. If an application connects\n to two different database URLs in a process and creates two different\n ``Engine`` objects, then you'd have two pools. If that same application\n spawns off into ten subprocesses, then you have 20 or 22 pools in use,\n depending on how the parent uses the database also. Use ``count-numpools``\n to make sure this number is what you expect. A poorly written application\n that is spawning a brand new ``Engine`` for each request will have a\n dramatically larger number here (as well as one that is changing constantly)\n and that is an immediate red flag that the application should be fixed.\n\n* ``count-numprocs`` - the total number of Python processes, e.g. parent and\n subprocesses, that are contributing to the connection statistics in this\n group. This number will match ``count-numpools`` if you have one\n ``Engine`` per process.\n\n Both the ``count-numpools`` and ``count-numprocs`` values provide context to\n when one looks at the total connections and checkouts. If connection pools\n are configured to allow at most 20 connections max, and you have 10\n connection pools on the host, now you can have 200 connections max to your\n database.\n\nThe stats labeled ``derive`` are floating point values representing a\n**rate** of activity. sqlalchemy-collectd sends these numbers to the\ncollectd server as a total number of events occurred as of a specific\ntimestamp; collectd then compares this to the previous value to determine\nthe rate. How the rate is reported (e.g. number per second, etc.) depends\non the reporting tools being used.\n\n* ``derive-checkouts`` - rate of connections being checked out.\n\n* ``derive-connects`` - rate of new connections made to the database\n\n* ``derive-disconnects`` - rate of database connections being closed\n\n* ``derive-invalidated`` - rate of connections that are explicitly **invalidated**,\n e.g. have encountered a connectivity error which made the program invalidate\n the connection. The application may or may not have tried to connect\n again immediately depending on how it is using this feature. See the\n section on \"invalidated connections\" below for details on this.\n\n* ``derive-commits`` - (TODO: not implemented yet) rate of calls to ``transaction.commit()``. This value\n can be used to estimate TPS, e.g. transactions per second, however note that\n this is limited to SQLAlchemy-explicit transactions where the Engine-level\n begin() / commit() methods are being invoked. When using the SQLAlchemy\n ORM with the ``Session``, this rate should be tracking the rate of\n calls to ``Session.commit()``.\n\n* ``derive-rollbacks`` - (TODO: not implemented yet) rate of calls to ``transaction.rollback()``.\n\n* ``derive-transactions`` - (TODO: not implemented yet) rate of transactions overall. This should add up\n to the commit and rollback rates combined, however may be higher than that\n if the application also discards transactions and/or ``Session`` objects\n without calling ``.commit()`` or ``.rollback()``.\n\nInvalidated Connections\n-----------------------\n\nThe ``derive-invalidated`` stat records the rate of invalidations.\n\nBy invalidated, we mean the ``.invalidated()`` method on the connection\nis called, which marks this connection as no longer usable and marks it\nfor refresh on next use (soft invalidation) or more commonly closes it\nimmediately (hard invalidation). Typically, when a connection is invalidated,\nthe application is either pre-pinging the database and will try to connect\nagain, or it was in the middle of an operation when the database got\ncut off, in which case depending on how the application was designed it\nmay or may not try the operation again.\n\nInvalidation usually corresponds to a\nconnection that reported a problem in being able to communicate with the\ndatabase, and for which an error was raised. For this reason, the\n\"invalidated\" rate should be considered to be roughly an \"error\" rate -\neach count here usually corresponds to a connectivity error encountered by the\napplication to which it responded by invalidating the connection, which results\neither in immediate or eventual reconnection.\n\nFor most invalidation scenarios, the entire pool of connections is\ninvalidated at once using a \"freshness\" timestamp; any connection older than\nthis timestamp is refreshed on next use. This is to suit the case of assuming\nthat the database was probably restarted, so all connections need to be\nreconnected. These connections which have been **implicitly** invalidated\nare **not** included in this count.\n\nCollectd Types\n--------------\n\nThese funny names ``count-`` and ``derive-`` are an artifact of how\ncollectd provides *types*. collectd has a fixed list of \"types\" which it\nlists in a file called ``types.db``. The server does not accept type names\nthat are not either in this file or in a separately configured custom types file,\nas each type is accompanied by a template for what kinds of values it\ncarries. Annoyingly, collectd does not let us add these names within the\nregular .conf file, which would make it very easy for us to include\nour own custom names; it instead requires they be listed in completely separate file that must be\nexplicitly referred to by absolute path within a conf file, and then to\nmake matters worse when this option is used, we have to uncomment the location\nof the default types.db file in the central collectd.conf else it will\nno longer be able to find it. Given the choice between \"very nice names\"\nand \"no need to set up three separate config files\", we chose the latter :)\n\nconnmon mode\n============\n\nAs an added feature, the **connmon** UX has now been integrated into SQLAlchemy-collectd.\nThis is a console application that displays a \"top\"-like display of the current\nstatus of connections.\n\nUsing the configuration above, we can add a \"monitor\" line to our collectd\nserver configuration::\n\n\n LoadPlugin python\n \n LogTraces true\n\n Import \"sqlalchemy_collectd.server.plugin\"\n\n \n # ipv4 only for the moment\n listen \"0.0.0.0\" 25827\n\n # set to \"debug\" to show messages received\n loglevel \"info\"\n\n # connmon monitor port\n monitor \"localhost\" 25828\n \n \n\nWe can now run \"connmon\" on localhost port 25828::\n\n connmon --port 25828\n\nScreenshot of connmon:\n\n|connmon_screenshot|\n\n.. |connmon_screenshot| image:: connmon.png\n :width: 800", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sqlalchemy/sqlalchemy-collectd", "keywords": "SQLAlchemy collectd", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "sqlalchemy-collectd", "package_url": "https://pypi.org/project/sqlalchemy-collectd/", "platform": "", "project_url": "https://pypi.org/project/sqlalchemy-collectd/", "project_urls": { "Homepage": "https://github.com/sqlalchemy/sqlalchemy-collectd" }, "release_url": "https://pypi.org/project/sqlalchemy-collectd/0.0.5/", "requires_dist": null, "requires_python": "", "summary": "Send database connection pool stats to collectd", "version": "0.0.5" }, "last_serial": 5636614, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "843e12553ce5ea799a7661914cdb2393", "sha256": "008856257f3b798b2a2252a91ec8dcbd2b0c38f3fa0d7bfec5cadc289e8bbbf4" }, "downloads": -1, "filename": "sqlalchemy-collectd-0.0.1.tar.gz", "has_sig": false, "md5_digest": "843e12553ce5ea799a7661914cdb2393", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19207, "upload_time": "2018-02-11T14:05:43", "url": "https://files.pythonhosted.org/packages/1b/91/2a920652b17974d9d31d8c7b6826bf04ccb4220888ada47b0db761a6485c/sqlalchemy-collectd-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "2800cfad1a93ec1c0f3677c1b00caa3d", "sha256": "ab3352f125d2a167026d451fcc69eb82f8ec3afb4055e20771575d98431da254" }, "downloads": -1, "filename": "sqlalchemy-collectd-0.0.2.tar.gz", "has_sig": false, "md5_digest": "2800cfad1a93ec1c0f3677c1b00caa3d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19219, "upload_time": "2018-02-11T23:55:59", "url": "https://files.pythonhosted.org/packages/bb/51/966d7196d7310b0baa5853b9aa0737ba3f5e195cfc0bab0c46fcf9c92640/sqlalchemy-collectd-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "2c12100229a0c33a9ac465b715c0bee1", "sha256": "a311ed7e7d1323600fa6810f3756ae4a21400106340469533b73b11be68308df" }, "downloads": -1, "filename": "sqlalchemy-collectd-0.0.3.tar.gz", "has_sig": true, "md5_digest": "2c12100229a0c33a9ac465b715c0bee1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28973, "upload_time": "2018-11-28T00:40:22", "url": "https://files.pythonhosted.org/packages/88/76/bc1faa1da43663238e68b0f1a0670a8a630e8f7469c3133bd54afb977224/sqlalchemy-collectd-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "ca608ddf5aa4a7ed79a3a0ee36675cbc", "sha256": "a9969a59f03436f1dcfcad6127f59cb0333de97a245a554751c179d7e6135df9" }, "downloads": -1, "filename": "sqlalchemy-collectd-0.0.4.tar.gz", "has_sig": true, "md5_digest": "ca608ddf5aa4a7ed79a3a0ee36675cbc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 179275, "upload_time": "2019-05-29T15:06:12", "url": "https://files.pythonhosted.org/packages/4e/4b/22a131dd41a88454dffe2f55e3f04bae0c5600279ca7df66c365454c7146/sqlalchemy-collectd-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "0bc9c2057bc69518ce4ff04e18940df4", "sha256": "88a5e4b523114f284bf39ee6a52d578a4449fa6e7599c092b4bccdd1cb69ed3e" }, "downloads": -1, "filename": "sqlalchemy-collectd-0.0.5.tar.gz", "has_sig": true, "md5_digest": "0bc9c2057bc69518ce4ff04e18940df4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 181149, "upload_time": "2019-08-05T21:51:06", "url": "https://files.pythonhosted.org/packages/e3/de/785c6a192b3208954565b498d35dd80f1c9b7bf2591c476a7e8f0cb8f485/sqlalchemy-collectd-0.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0bc9c2057bc69518ce4ff04e18940df4", "sha256": "88a5e4b523114f284bf39ee6a52d578a4449fa6e7599c092b4bccdd1cb69ed3e" }, "downloads": -1, "filename": "sqlalchemy-collectd-0.0.5.tar.gz", "has_sig": true, "md5_digest": "0bc9c2057bc69518ce4ff04e18940df4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 181149, "upload_time": "2019-08-05T21:51:06", "url": "https://files.pythonhosted.org/packages/e3/de/785c6a192b3208954565b498d35dd80f1c9b7bf2591c476a7e8f0cb8f485/sqlalchemy-collectd-0.0.5.tar.gz" } ] }