Monitoring Dispatchers
======================

A process's zc.async dispatcher [#setUp]_ can be monitored in-process via
zc.monitor plugins.  Let's imagine we have a connection over which we
can send text messages to the monitor server [#monitor_setup]_.

All monitoring is done through the ``async`` command.  Here is its
description, using the zc.monitor ``help`` command.

    >>> connection.test_input('help async\n')
    Help for async:
    <BLANKLINE>
    Monitor zc.async activity in this process.
    <BLANKLINE>
        To see a list of async tools, use 'async help'.
    <BLANKLINE>
        To learn more about an async monitor tool, use 'async help <tool name>'.
    -> CLOSE

As you can see, you use ``async help`` to get more information about each
async-specific command.

    >>> connection.test_input('async help\n')
    These are the tools available.  Usage for each tool is 
    'async <tool name> [modifiers...]'.  Learn more about each 
    tool using 'async help <tool name>'.
    <BLANKLINE>
    UUID: Get instance UUID in hex.
    help: Get help on an async monitor tool.
    job: Local information about a job as of last poll, if known.
    jobs: Show active jobs in worker threads as of the instant.
    jobstats: Statistics on historical jobs as of last poll.
    poll: Get information about a single poll, defaulting to most recent.
    polls: Get information about recent polls, defaulting to most recent.
    status: Get a mapping of general zc.async dispatcher information.
    utcnow: Return the current time in UTC, in ISO 8601 format. 
    -> CLOSE

Let's run through these quickly for an overview, and then we'll dig in just a
bit.

The ``UUID`` command returns the instance's UUID.

    >>> connection.test_input('async help UUID\n')
    Get instance UUID in hex. 
    -> CLOSE

    >>> connection.test_input('async UUID\n')
    d10f43dc-ffdf-11dc-abd4-0017f2c49bdd 
    -> CLOSE

The ``utcnow`` command returns the current time in UTC.  This is
convenient for interpreting the UTC datetimes returned by other
commands.
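
As a rough illustration of that use, here is a minimal sketch that parses
monitor timestamps and computes how long ago a poll happened.  The helper name
``parse_monitor_time`` is hypothetical and not part of zc.async, and the
fallback format without fractional seconds is an assumption; the two sample
values are copied from the output shown in this document.

```python
from datetime import datetime, timezone


def parse_monitor_time(text):
    """Parse a timestamp such as '2006-08-10T15:44:23.000211Z' as aware UTC.

    Hypothetical helper.  Accepting a value without fractional seconds is an
    assumption; this document only shows values that include microseconds.
    """
    text = text.strip()  # monitor output lines may carry trailing whitespace
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: %r" % (text,))


# Values as printed by ``async utcnow`` and as the "time" of a poll.
now = parse_monitor_time("2006-08-10T15:44:23.000211Z")
last_poll = parse_monitor_time("2006-08-10T15:44:22.000211Z")
print((now - last_poll).total_seconds())  # 1.0
```

Subtracting a poll's time from the ``utcnow`` value, rather than from the
local clock, keeps the comparison on the monitored process's own clock.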

    >>> connection.test_input('async help utcnow\n')
    Return the current time in UTC, in ISO 8601 format. 
    -> CLOSE

    >>> connection.test_input('async utcnow\n')
    2006-08-10T15:44:23.000211Z 
    -> CLOSE

The ``status`` command is the first of the "serious" monitoring
commands.  It introduces some patterns that the rest of the
commands follow.

- output is pretty-printed JSON

- durations are dicts with the keys 'days', 'hours', 'minutes', and 'seconds';
  all values are ints except 'seconds', which is a float.
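
The duration convention can be turned back into a Python ``timedelta`` with
very little code.  This is only a sketch: ``duration_to_timedelta`` is a
hypothetical helper, and treating missing keys as zero is an assumption based
on the ``status`` output in this document, which shows only "seconds".

```python
from datetime import timedelta


def duration_to_timedelta(duration):
    """Convert a monitor duration dict to a timedelta.

    Assumption: keys that are absent from the dict count as zero.
    """
    return timedelta(
        days=duration.get("days", 0),
        hours=duration.get("hours", 0),
        minutes=duration.get("minutes", 0),
        seconds=duration.get("seconds", 0.0),
    )


elapsed = duration_to_timedelta({"hours": 1, "minutes": 30, "seconds": 2.5})
print(elapsed.total_seconds())  # 5402.5
```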

    >>> connection.test_input('async help status\n')
    Get a mapping of general zc.async dispatcher information.
    <BLANKLINE>
        'status' is one of 'STUCK', 'STARTING', 'RUNNING', or 'STOPPED', where
        'STUCK' means the poll is past due. 
    -> CLOSE

    >>> connection.test_input('async status\n')
    {
        "poll interval": {
            "seconds": 5.0
        }, 
        "status": "RUNNING", 
        "time since last poll": {
            "seconds": 1.0
        }, 
        "uptime": {
            "seconds": 1.0
        }, 
        "uuid": "d10f43dc-ffdf-11dc-abd4-0017f2c49bdd"
    } 
    -> CLOSE

Here's the ``jobs`` command.  It introduces some new patterns.

- some command modifiers are available as <modifier>:<value>

- several commands have the "queue:" and "agent:" modifiers.
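
To make the ``<modifier>:<value>`` shape concrete, here is a small sketch that
formats a command line.  ``build_async_command`` is a hypothetical helper that
only builds text; it does not talk to a monitor server, and it makes no attempt
to handle values containing spaces.

```python
def build_async_command(tool, **modifiers):
    """Return an ``async`` monitor command line with name:value modifiers.

    Hypothetical helper for illustration only.  An empty value is kept
    as-is, so queue="" becomes "queue:", which names the queue ''.
    """
    parts = ["async", tool]
    for name, value in modifiers.items():
        parts.append("%s:%s" % (name, value))
    return " ".join(parts) + "\n"


print(repr(build_async_command("jobs", queue="", agent="main")))
# 'async jobs queue: agent:main\n'
```

The result matches the example given in the ``jobs`` help text, where
``queue:`` with nothing after the colon selects the queue named ''.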

    >>> connection.test_input('async help jobs\n')
    Show active jobs in worker threads as of the instant.
    <BLANKLINE>
        Usage:
    <BLANKLINE>
            jobs
            (returns active jobs as of last poll, newest to oldest)
    <BLANKLINE>
            jobs queue:<queue name>
            (jobs are filtered to those coming from the named queue)
    <BLANKLINE>
            jobs agent:<agent name>
            (jobs are filtered to those coming from agents with given name)
    <BLANKLINE>
        "queue:" and "agent:" modifiers may be combined.
    <BLANKLINE>
        Example:
    <BLANKLINE>
            async jobs queue: agent:main
            (results filtered to queue named '' and agent named 'main') 
    -> CLOSE

    >>> connection.test_input('async jobs\n')
    [] 
    -> CLOSE

The ``jobstats`` command analyzes past polls and job information to come up
with some potentially useful statistics.  It accepts the optional "queue:" and
"agent:" modifiers.  It also shows some new patterns.

- datetimes are in UTC, in ISO 8601 format.

- the "at:", "before:" and "since:" modifiers take an interval or a poll key.

- "at:" and "before:" may not be combined.
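
Intervals use the ``[nD][nH][nM][nS]`` format that the help text spells out.
The following is a minimal sketch of a parser for that format, not the parser
zc.async itself uses.  ``parse_interval`` is a hypothetical name, and requiring
upper-case letters in exactly that order is an assumption drawn from the help
text.

```python
import re
from datetime import datetime, timedelta

# Each part is optional, but the parts must appear in D, H, M, S order.
# "[1-9][0-9]*" enforces the "positive integer" wording of the help text.
_INTERVAL = re.compile(
    r"^(?:(?P<days>[1-9][0-9]*)D)?"
    r"(?:(?P<hours>[1-9][0-9]*)H)?"
    r"(?:(?P<minutes>[1-9][0-9]*)M)?"
    r"(?:(?P<seconds>[1-9][0-9]*)S)?$"
)


def parse_interval(text):
    """Turn an interval such as '1H30M' into a timedelta (hypothetical helper)."""
    match = _INTERVAL.match(text)
    if not text or match is None:
        raise ValueError("not an interval: %r" % (text,))
    parts = {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }
    return timedelta(**parts)


print(parse_interval("1H30M").total_seconds())  # 5400.0

# An interval means "that long ago", so "at:5M" refers to this cutoff.
now = datetime(2006, 8, 10, 15, 44, 23)
print(now - parse_interval("5M"))  # 2006-08-10 15:39:23
```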

    >>> connection.test_input('async help jobstats\n')
    Statistics on historical jobs as of last poll.
    <BLANKLINE>
        Usage:
    <BLANKLINE>
            jobstats
            (returns statistics on historical jobs as of last poll)
    <BLANKLINE>
            jobstats queue:<queue name>
            (statistics are filtered to those coming from the named queue)
    <BLANKLINE>
            jobstats agent:<agent name>
            (statistics are filtered to those coming from agents with given name)
    <BLANKLINE>
            jobstats at:<poll key or interval>
            (statistics are collected at or before the poll key or interval)
    <BLANKLINE>
            jobstats before:<pollkey or interval>
            (statistics are collected before the poll key or interval)
    <BLANKLINE>
            jobstats since:<pollkey or interval>
            (statistics are collected since poll key or interval, inclusive)
    <BLANKLINE>
        The modifiers "queue:", "agent:", "since:", and one of "at:" or "before:"
        may be combined.
    <BLANKLINE>
        Intervals are of the format ``[nD][nH][nM][nS]``, where "n" should
        be replaced with a positive integer, and "D," "H," "M," and "S" are
        literals standing for "days," "hours," "minutes," and "seconds."
        For instance, you might use ``5M`` for five minutes, ``20S`` for
        twenty seconds, or ``1H30M`` for an hour and a half.
    <BLANKLINE>
        Poll keys are the values shown as "key" from the ``poll`` or ``polls``
        command.
    <BLANKLINE>
        Example:
    <BLANKLINE>
            async jobstats queue: agent:main since:1H
            (results filtered to queue named '' and agent named 'main' from
             one hour ago till now) 
    -> CLOSE

    >>> connection.test_input('async jobstats\n')
    {
        "failed": 0, 
        "longest active": null, 
        "longest failed": null, 
        "longest successful": null, 
        "shortest active": null, 
        "shortest failed": null, 
        "shortest successful": null, 
        "started": 0, 
        "statistics end": "2006-08-10T15:44:22.000211Z", 
        "statistics start": "2006-08-10T15:44:22.000211Z", 
        "successful": 0, 
        "unknown": 0
    } 
    -> CLOSE

The ``poll`` command uses patterns we've seen above.

    >>> connection.test_input('async help poll\n')
    Get information about a single poll, defaulting to most recent.
    <BLANKLINE>
        Usage:
    <BLANKLINE>
            poll
            (returns most recent poll)
    <BLANKLINE>
            poll at:<poll key or interval>
            (returns poll at or before the poll key or interval)
    <BLANKLINE>
            poll before:<poll key or interval>
            (returns poll before the poll key or interval)
    <BLANKLINE>
        Intervals are of the format ``[nD][nH][nM][nS]``, where "n" should
        be replaced with a positive integer, and "D," "H," "M," and "S" are
        literals standing for "days," "hours," "minutes," and "seconds."
        For instance, you might use ``5M`` for five minutes, ``20S`` for
        twenty seconds, or ``1H30M`` for an hour and a half.
    <BLANKLINE>
        Example:
    <BLANKLINE>
            async poll at:5M
            (get the poll information at five minutes ago or before) 
    -> CLOSE

    >>> connection.test_input('async poll\n')
    {
        "key": 6420106068108777167, 
        "results": {
            "": {}
        }, 
        "time": "2006-08-10T15:44:22.000211Z"
    } 
    -> CLOSE

``polls`` does too.

    >>> connection.test_input('async help polls\n')
    Get information about recent polls, defaulting to most recent.
    <BLANKLINE>
        Usage:
    <BLANKLINE>
            polls
            (returns most recent 3 poll)
    <BLANKLINE>
            polls at:<poll key or interval>
            (returns up to 3 polls at or before the poll key or interval)
    <BLANKLINE>
            polls before:<poll key or interval>
            (returns up to 3 polls before the poll key or interval)
    <BLANKLINE>
            polls since:<poll key or interval>
            (returns polls since the poll key or interval, inclusive)
    <BLANKLINE>
            polls count <positive integer>
            (returns the given number of the most recent files)
    <BLANKLINE>
        The modifiers "since:", "count:", and one of "at:" or "before:" may
        be combined.
    <BLANKLINE>
        Intervals are of the format ``[nD][nH][nM][nS]``, where "n" should
        be replaced with a positive integer, and "D," "H," "M," and "S" are
        literals standing for "days," "hours," "minutes," and "seconds."
        For instance, you might use ``5M`` for five minutes, ``20S`` for
        twenty seconds, or ``1H30M`` for an hour and a half.
    <BLANKLINE>
        Example:
    <BLANKLINE>
            async polls since:10M before:5M
            (get the poll information from 10 to 5 minutes ago) 
    -> CLOSE

    >>> connection.test_input('async polls\n')
    [
        {
            "key": 6420106068108777167, 
            "results": {
                "": {}
            }, 
            "time": "2006-08-10T15:44:22.000211Z"
        }
    ] 
    -> CLOSE

Now that you've seen the basics, we will add some jobs and look at some of the
statistics.

    >>> def send_message():
    ...     print "imagine this sent a message to another machine"
    >>> job = queue.put(send_message)
    >>> import transaction
    >>> transaction.commit()

    >>> reactor.wait_for(job)
    imagine this sent a message to another machine

Here are the revised stats.

    >>> connection.test_input('async jobstats\n')
    {
        "failed": 0, 
        "longest active": null, 
        "longest failed": null, 
        "longest successful": [
            30, 
            "unnamed"
        ], 
        "shortest active": null, 
        "shortest failed": null, 
        "shortest successful": [
            30, 
            "unnamed"
        ], 
        "started": 1, 
        "statistics end": "2006-08-10T15:44:22.000211Z", 
        "statistics start": "2006-08-10T15:44:27.000211Z", 
        "successful": 1, 
        "unknown": 0
    } 
    -> CLOSE

[Notice that we are showing the actual oids of the jobs.  For test purposes,
this relies on implementation details of the storage, but in practice it is
currently stable, and it lets us see precisely how these stats work and how to
interpret the information.]
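
The oids in these reports are integers, while ZODB identifies objects by 8-byte
strings.  The sketch below shows the conversion between the two forms using
only ``struct``; ``pack_oid`` and ``unpack_oid`` are hypothetical stand-ins
that mirror what ``ZODB.utils.p64`` and ``ZODB.utils.u64`` do, so that the
example runs without ZODB installed.

```python
import struct


def pack_oid(number):
    """Integer -> 8-byte big-endian oid (same layout as ZODB.utils.p64)."""
    return struct.pack(">Q", number)


def unpack_oid(oid):
    """8-byte oid -> integer (same layout as ZODB.utils.u64)."""
    return struct.unpack(">Q", oid)[0]


packed = pack_oid(30)
print(packed)              # b'\x00\x00\x00\x00\x00\x00\x00\x1e'
print(unpack_oid(packed))  # 30
```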

The longest successful job (the only job!) has *unpacked* oid 30, in the
unnamed database.  We can load it using the connection and
``ZODB.utils.p64``, which packs the integer back into an oid.

    >>> import ZODB.utils
    >>> queue._p_jar.get(ZODB.utils.p64(30)) is job
    True

So what are the details about that job?  We have not used the ``job`` command
before because we didn't have a job to look at.

    >>> connection.test_input('async job 30\n') # doctest: +ELLIPSIS
    {
        "agent": "main", 
        "call": "<zc.async.job.Job (oid 30, db 'unnamed') ``zc.async.doctest_test.send_message()``>", 
        "completed": "2006-08-10T15:44:...Z", 
        "failed": false, 
        "poll id": 6420106068024891087, 
        "queue": "", 
        "quota names": [], 
        "reassigned": false, 
        "result": "None", 
        "started": "2006-08-10T15:44:...Z", 
        "thread": ...
    } 
    -> CLOSE

Here's the most recent poll.

    >>> connection.test_input('async poll\n') # doctest: +ELLIPSIS
    {
        "key": 6420106068024891087, 
        "results": {
            "": {
                "main": {
                    "active jobs": [], 
                    "error": null, 
                    "len": 0, 
                    "new jobs": [
                        [
                            30, 
                            "unnamed"
                        ]
                    ], 
                    "size": 3
                }
            }
        }, 
        "time": "2006-08-10T15:44:...Z"
    } 
    -> CLOSE

Now let's look at the dispatcher with a job in progress.

    >>> import threading
    >>> lock = threading.Lock()
    >>> lock.acquire()
    True
    >>> def wait_for_me():
    ...     lock.acquire()
    ...     print 'OK, done'
    ...
    >>> job = queue.put(wait_for_me)
    >>> transaction.commit()
    >>> reactor.wait_for(job)
    TIME OUT

    >>> _ = transaction.begin()
    >>> job.status == zc.async.interfaces.ACTIVE
    True

Now we can use the ``jobs`` command to look at the active jobs.

    >>> connection.test_input('async jobs\n')
    [
        [
            36, 
            "unnamed"
        ]
    ] 
    -> CLOSE

Here's the information for that job.

    >>> connection.test_input('async job 36\n') # doctest: +ELLIPSIS
    {
        "agent": "main", 
        "call": "<zc.async.job.Job (oid 36, db 'unnamed') ``zc.async.doctest_test.wait_for_me()``>", 
        "completed": null, 
        "failed": false, 
        "poll id": 6420106067941005007, 
        "queue": "", 
        "quota names": [], 
        "reassigned": false, 
        "result": null, 
        "started": "2006-08-10T15:44:...Z", 
        "thread": ...
    } 
    -> CLOSE

Here are the revised stats.

    >>> connection.test_input('async jobstats\n') # doctest: +ELLIPSIS
    {
        "failed": 0, 
        "longest active": [
            36, 
            "unnamed"
        ], 
        "longest failed": null, 
        "longest successful": [
            30, 
            "unnamed"
        ], 
        "shortest active": [
            36, 
            "unnamed"
        ], 
        "shortest failed": null, 
        "shortest successful": [
            30, 
            "unnamed"
        ], 
        "started": 2, 
        "statistics end": "2006-08-10T15:44:...Z", 
        "statistics start": "2006-08-10T15:44:...Z", 
        "successful": 1, 
        "unknown": 0
    } 
    -> CLOSE

Here's our poll.

    >>> connection.test_input('async poll\n') # doctest: +ELLIPSIS
    {
        "key": 6420106067941005007, 
        "results": {
            "": {
                "main": {
                    "active jobs": [], 
                    "error": null, 
                    "len": 0, 
                    "new jobs": [
                        [
                            36, 
                            "unnamed"
                        ]
                    ], 
                    "size": 3
                }
            }
        }, 
        "time": "2006-08-10T15:44:...Z"
    } 
    -> CLOSE

Let's wait again.

    >>> reactor.wait_for(job)
    TIME OUT

    >>> _ = transaction.begin()
    >>> job.status == zc.async.interfaces.ACTIVE
    True

Here's ``jobs`` and ``jobstats`` again.

    >>> connection.test_input('async jobs\n')
    [
        [
            36, 
            "unnamed"
        ]
    ] 
    -> CLOSE

    >>> connection.test_input('async jobstats\n') # doctest: +ELLIPSIS
    {
        "failed": 0, 
        "longest active": [
            36, 
            "unnamed"
        ], 
        "longest failed": null, 
        "longest successful": [
            30, 
            "unnamed"
        ], 
        "shortest active": [
            36, 
            "unnamed"
        ], 
        "shortest failed": null, 
        "shortest successful": [
            30, 
            "unnamed"
        ], 
        "started": 2, 
        "statistics end": "2006-08-10T15:44:...Z", 
        "statistics start": "2006-08-10T15:44:...Z", 
        "successful": 1, 
        "unknown": 0
    } 
    -> CLOSE

Here's the most recent poll.

    >>> connection.test_input('async poll\n') # doctest: +ELLIPSIS
    {
        "key": 6420106067857118927, 
        "results": {
            "": {
                "main": {
                    "active jobs": [
                        [
                            36, 
                            "unnamed"
                        ]
                    ], 
                    "error": null, 
                    "len": 1, 
                    "new jobs": [], 
                    "size": 3
                }
            }
        }, 
        "time": "2006-08-10T15:44:...Z"
    } 
    -> CLOSE

We do have a few polls now.  Here they are from most recent to oldest.

    >>> connection.test_input('async polls\n') # doctest: +ELLIPSIS
    [
        {
            "key": 6420106067857118927, 
            "results": {
                "": {
                    "main": {
                        "active jobs": [
                            [
                                36, 
                                "unnamed"
                            ]
                        ], 
                        "error": null, 
                        "len": 1, 
                        "new jobs": [], 
                        "size": 3
                    }
                }
            }, 
            "time": "2006-08-10T15:44:...Z"
        }, 
        {
            "key": 6420106067941005007, 
            "results": {
                "": {
                    "main": {
                        "active jobs": [], 
                        "error": null, 
                        "len": 0, 
                        "new jobs": [
                            [
                                36, 
                                "unnamed"
                            ]
                        ], 
                        "size": 3
                    }
                }
            }, 
            "time": "2006-08-10T15:44:...Z"
        }, 
        {
            "key": 6420106068024891087, 
            "results": {
                "": {
                    "main": {
                        "active jobs": [], 
                        "error": null, 
                        "len": 0, 
                        "new jobs": [
                            [
                                30, 
                                "unnamed"
                            ]
                        ], 
                        "size": 3
                    }
                }
            }, 
            "time": "2006-08-10T15:44:...Z"
        }
    ] 
    -> CLOSE

Now we'll let the job complete.

    >>> lock.release()
    >>> reactor.wait_for(job)
    OK, done

The ``job`` command also reports that the job is done.

    >>> connection.test_input('async job 36\n') # doctest: +ELLIPSIS
    {
        "agent": "main", 
        "call": "<zc.async.job.Job (oid 36, db 'unnamed') ``zc.async.doctest_test.wait_for_me()``>", 
        "completed": "2006-08-10T15:44:...Z", 
        "failed": false, 
        "poll id": 6420106067941005007, 
        "queue": "", 
        "quota names": [], 
        "reassigned": false, 
        "result": "None", 
        "started": "2006-08-10T15:44:...Z", 
        "thread": ...
    } 
    -> CLOSE

No more active jobs.

    >>> connection.test_input('async jobs\n')
    [] 
    -> CLOSE

Here are the revised stats.

    >>> connection.test_input('async jobstats\n') # doctest: +ELLIPSIS
    {
        "failed": 0, 
        "longest active": null, 
        "longest failed": null, 
        "longest successful": [
            36, 
            "unnamed"
        ], 
        "shortest active": null, 
        "shortest failed": null, 
        "shortest successful": [
            30, 
            "unnamed"
        ], 
        "started": 2, 
        "statistics end": "2006-08-10T15:44:...Z", 
        "statistics start": "2006-08-10T15:44:...Z", 
        "successful": 2, 
        "unknown": 0
    } 
    -> CLOSE

    >>> reactor.stop()

.. [#setUp] This setup code is explained in other documentation.

    >>> import ZODB.FileStorage
    >>> storage = ZODB.FileStorage.FileStorage(
    ...     'zc_async.fs', create=True)
    >>> from ZODB.DB import DB 
    >>> db = DB(storage) 
    >>> conn = db.open()
    >>> root = conn.root()

    >>> import zc.async.configure
    >>> zc.async.configure.base()

    >>> import zc.async.testing
    >>> reactor = zc.async.testing.Reactor()
    >>> reactor.start() # this monkeypatches datetime.datetime.now 

    >>> import zc.async.queue
    >>> import zc.async.interfaces
    >>> mapping = root[zc.async.interfaces.KEY] = zc.async.queue.Queues()
    >>> queue = mapping[''] = zc.async.queue.Queue()
    >>> import transaction
    >>> transaction.commit()

    >>> import zc.async.dispatcher
    >>> dispatcher = zc.async.dispatcher.Dispatcher(db, reactor)
    >>> dispatcher.activate()
    >>> reactor.time_flies(1)
    1

    >>> import zc.async.agent
    >>> agent = zc.async.agent.Agent()
    >>> queue.dispatchers[dispatcher.UUID]['main'] = agent
    >>> transaction.commit()

.. [#monitor_setup] This part actually sets up the monitoring.

    >>> import zc.ngi.testing
    >>> import zc.monitor

    >>> connection = zc.ngi.testing.TextConnection()
    >>> server = zc.monitor.Server(connection)

    >>> import zc.async.monitor
    >>> import zope.component
    >>> import zc.monitor.interfaces
    >>> zope.component.provideUtility(
    ...     zc.async.monitor.async,
    ...     zc.monitor.interfaces.IMonitorPlugin,
    ...     'async')
    >>> zope.component.provideUtility(zc.monitor.help,
    ...     zc.monitor.interfaces.IMonitorPlugin, 'help')
