{ "info": { "author": "Ralph Bean", "author_email": "rbean@redhat.com", "bugtrack_url": null, "classifiers": [], "description": "What is \"Reliability\"?\n----------------------\n\nEvery few months, a discussion comes up about *fedmsg reliability*.\nTypically, a team somewhere in the Fedora Project will be looking into how they\ncan streamline their workflows by either using data from or sending data to the\nbus. We end up having a lengthy discussion in IRC until the matter gets settled.\nBut that conversation gets lost in the ether of freenode and the *next* time a\ndifferent team has the same questions we have to have the conversation over\nagain.\n\n`fedmsg `_ is a set of python tools (one part library, one\npart framework) that we use around Fedora Infrastructure to enable system\nprocesses to publish and listen for messages. We typically call it our\n\"message bus\", but that is bad nomenclature; it doesn't accurately describe\nwhat is going on. When a process wants to publish fedmsg messages, it invokes\n``fedmsg.publish(...)`` which reads in some configuration from disk, binds a\nsocket to a port if it hasn't already, and writes the message there. When\nanother process wants to consume fedmsg messages, it invokes\n``fedmsg.tail_messages()`` (or registers itself with an already listening\n``fedmsg-hub`` daemon) which then *connects* a socket to the bound port of the\nother process to ``recv`` the message(s).\n\nA little more detail now: the call to ``fedmsg.publish`` doesn't manage the\nsocket binding and sending itself. It hands things off to zeromq which in\nturn hands things off to a number of worker threads it manages. Zeromq keeps\nan internal queue of messages it has been asked to send, which it sends as\nfast as it can. 
A key takeaway here is that it is \"fire-and-forget\": a web\napplication that has been enabled to publish fedmsg messages will ask the fedmsg\nlibrary to do so, and then walk away to finish its database transactions or do\nwhatever other work it is intended to do. fedmsg (really zeromq here) opaquely\nsends the message as soon as it has a spare moment.\n\n(As an aside, the process involves even more than that. Outgoing\nmessages are signed using cryptographic certificates. Python data types are\nserialized. Incoming messages are validated, deserialized, and checked\nagainst an authorization policy about who is allowed to sign what messages,\netc. We'll come back to this.)\n\nLet's quote `the zguide `_::\n\n Most people who speak of \"reliability\" don't really know what they mean. We\n can only define reliability in terms of failure. That is, if we can handle\n a certain set of well-defined and understood failures, then we are reliable\n with respect to those failures. No more, no less. So let's look at the\n possible causes of failure in a distributed ZeroMQ application\n\nIn practice, we do not \"drop messages\" in Fedora Infrastructure. We have run a\nnumber of experiments that demonstrate this to a degree sufficient\nto establish confidence with a number of other teams.\n\nIn theory, though, there's a race condition here. You could start your\npublishing service and it could publish a message *before* anyone connects to\nlisten for it. It would be lost. 
Again, in practice we have appropriate\ndelays and reconnect intervals (with exponential backoff) configured to make\nthis a non-issue.\n\nStill, there could be network partitions -- a listening service may suddenly\nfind itself walled off from a publishing service by dead routers, a\nmisconfigured firewall, a dead VPN, or some other catastrophe. In all such cases,\nmessages will be lost.\n\nDesigning Reliability\n---------------------\n\nThe zguide provides a good description of `how to think about and approach\nreliability for REQ-REP socket patterns\n`_. However,\nwe've settled over the last few years on using *only* the PUB-SUB socket\npattern for fedmsg. Adding another pattern (REQ-REP) alongside that smells\nwrong (it's currently so simple).\n\nThe approach here in `gilmsg `_ is to\nlayer a reliability check *on top of* the existing PUB-SUB fedmsg framework.\n\nHere's how it works broadly:\n\n- When ``gilmsg.publish(...)`` is invoked, you must declare a list of\n **required recipients**.\n- A background thread is started that listens for **ACK** messages on the whole\n bus.\n- If an **ACK** is not received from all ``recipients`` within a given timeout,\n then a ``Timeout`` exception is raised.\n\nFailure cases this addresses:\n\n- If an intended recipient is offline when the message is published, we'll\n raise an exception so that the originating process can fail and alert the\n operator.\n- If an intended recipient is online, but we're suffering from a network\n partition, the publisher will fail loudly.\n\nIt is possible for us to have *False Negatives*:\n\n- If the intended recipient is online, but cannot publish the *ACK* back due to\n a one-way network failure or a misconfiguration, the publisher will fail\n loudly *even though* the consumer did receive the message.\n\nIt is not possible for us to have *False Positives*. 
If the producer completes\nexecution, we can be sure the consumer received the message.\n\nSome Details\n------------\n\n- We are sure that the ACKs come from the ``recipients`` we declared because\n the ACKs are *cryptographically signed*.\n- We take care to distinguish between ACKs for *the message we published* and\n ACKs for *other messages we know nothing about*.\n- We start the ACK-listening socket on the producer in a thread **before** we\n publish the original message. This is necessary to avoid a race condition\n where we publish our original message but the ACK comes back before we get a\n chance to start listening for it.\n\nWhy not use gilmsg?\n-------------------\n\nfedmsg has worked very well. We have a `long list of integrated message\nproducers in Fedora Infrastructure `_.\nThe success is due in part to just how *dumb* fedmsg is. It has contributed to\nthe rise of a *loosely coupled architecture*, which was the original aim. When\nsome Fedora hacker gets an idea, they can hook onto the bus to consume messages\nwithout having to negotiate with anyone about it. They can produce messages\nwithout having to go to committee.\n\nAs an example: you can stand up the Bodhi2 web app on your local box, and it will\n\"publish to fedmsg\" without you having to start any additional services or\nanything like that. 
It will publish to fedmsg on your laptop, no one will\nbe listening, and it won't care.\n\nIn contrast, with gilmsg-enabled services:\n\n- Your producers will declare their required consumers.\n- As you add more consumers, you're going to have to patch your producers,\n leading to an ever-growing number of code changes to get new things done.\n- You cannot run or test a gilmsg-enabled service on your own box without a\n replica of the whole production environment, since it will *require* that\n other services respond to it.\n- Your ensemble of production services becomes brittle -- prone to one dead\n service bringing the others down.\n\ngilmsg-enabled services are *tightly coupled*.\n\n----\n\nA cosmetic note:\n\n- The more gilmsg-enabled producer/consumer pairs you bring online, the more\n spam you'll have on the bus. It can handle the throughput(!) but it will\n make the datagrepper logs less readable as you add more.\n\nUsing gilmsg\n------------\n\ngilmsg is API backwards-compatible with fedmsg core. So, write your script to use\nfedmsg first. 
If at some point in the future you decide that you *must* have\nthe set of guarantees that gilmsg provides, then port to gilmsg.\n\n**Publishing** with ``.publish(..)`` from Python::\n\n import gilmsg\n\n import fedmsg.config\n config = fedmsg.config.load_config()\n\n gilmsg.publish(\n topic=\"whatever\",\n msg=dict(foo=\"bar\"),\n recipients=(\n \"bodhi-bodhi-backend01.phx2.fedoraproject.org\",\n \"shell-autocloud01.phx2.fedoraproject.org\",\n ),\n ack_timeout=0.25, # 0.25 seconds\n **config)\n\n**Publishing** from the shell with ``gilmsg-logger`` with a timeout of 3 seconds::\n\n echo testing | gilmsg-logger --recipients shell-value01.phx2.fedoraproject.org --ack-timeout 3\n\nCompare the above with `publishing with fedmsg alone\n`_.\n\n----\n\n**Consuming** with ``.tail_messages(..)`` in Python::\n\n import gilmsg\n\n import fedmsg.config\n config = fedmsg.config.load_config()\n\n target = \"org.fedoraproject.prod.compose.rawhide.complete\"\n for name, ep, t, msg in gilmsg.tail_messages(topic=target, **config):\n # The ACK has already been sent at this point.\n print \"Received\", t, msg['msg_id']\n\n**Consuming** with the \"Hub-Consumer\" approach::\n\n import gilmsg\n\n class MyConsumer(gilmsg.GilmsgConsumer):\n topic = \"org.fedoraproject.prod.compose.rawhide.complete\"\n\n def consume(self, message):\n # The ACK has already been sent at this point.\n print \"Received\", message['topic'], message['msg_id']\n\nCompare the above with `consuming with fedmsg alone\n`_.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/fedora-infra/gilmsg/", "keywords": null, "license": "LGPLv2+", "maintainer": null, "maintainer_email": null, "name": "gilmsg", "package_url": "https://pypi.org/project/gilmsg/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/gilmsg/", "project_urls": { "Download": "UNKNOWN", "Homepage": 
"https://github.com/fedora-infra/gilmsg/" }, "release_url": "https://pypi.org/project/gilmsg/0.1.2/", "requires_dist": null, "requires_python": null, "summary": "A reliability layer on top of fedmsg", "version": "0.1.2" }, "last_serial": 2045584, "releases": { "0.0.1": [], "0.1.0": [ { "comment_text": "", "digests": { "md5": "e034553412a3822523d56a80b8d745e0", "sha256": "3d87bc259256ceec6f64dbd3a6c74781ae4719431fc41c895df1dbecc0b055d1" }, "downloads": -1, "filename": "gilmsg-0.1.0.tar.gz", "has_sig": true, "md5_digest": "e034553412a3822523d56a80b8d745e0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8138, "upload_time": "2015-09-05T12:49:47", "url": "https://files.pythonhosted.org/packages/20/37/3093f1007f5fa33d7fa61e626a8c658dcec5abd5b89e3db5e888d0b793e0/gilmsg-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "c13642ea36316c31c3861fb343f1bbae", "sha256": "8339eaf9734bd520446b678ca1192aae1ddaabb7ef300eed916adee29b44b82f" }, "downloads": -1, "filename": "gilmsg-0.1.1.tar.gz", "has_sig": true, "md5_digest": "c13642ea36316c31c3861fb343f1bbae", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8191, "upload_time": "2015-09-05T12:52:32", "url": "https://files.pythonhosted.org/packages/c9/5a/c613f9e13a1e21e25d3a8d9c7abe813a01ed59ef4668b925f1e2925d37b2/gilmsg-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "66a2ee339b03659b613bd92cfb083ae8", "sha256": "bc108b3b45a3c3c1ca517bdfccb9f9cf3e3f4f5d724accf70ecf82acfd2b5166" }, "downloads": -1, "filename": "gilmsg-0.1.2.tar.gz", "has_sig": true, "md5_digest": "66a2ee339b03659b613bd92cfb083ae8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20955, "upload_time": "2015-09-05T12:57:36", "url": "https://files.pythonhosted.org/packages/20/9d/c65d5cb72af1e030256de07edd4eee14427671c29a71bdd46e7c7b56d7b4/gilmsg-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { 
"md5": "66a2ee339b03659b613bd92cfb083ae8", "sha256": "bc108b3b45a3c3c1ca517bdfccb9f9cf3e3f4f5d724accf70ecf82acfd2b5166" }, "downloads": -1, "filename": "gilmsg-0.1.2.tar.gz", "has_sig": true, "md5_digest": "66a2ee339b03659b613bd92cfb083ae8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20955, "upload_time": "2015-09-05T12:57:36", "url": "https://files.pythonhosted.org/packages/20/9d/c65d5cb72af1e030256de07edd4eee14427671c29a71bdd46e7c7b56d7b4/gilmsg-0.1.2.tar.gz" } ] }