{ "info": { "author": "Ismail Uddin", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Programming Language :: Python :: 3" ], "description": "![](header.png)\n\n# markovclick\n\n[![CircleCI](https://circleci.com/gh/ismailuddin/markovclick/tree/master.svg?style=svg)](https://circleci.com/gh/ismailuddin/markovclick/tree/master)\n![AUR](https://img.shields.io/aur/license/yaourt.svg)\n[![Documentation Status](https://readthedocs.org/projects/markovclick/badge/?version=latest)](https://markovclick.readthedocs.io/en/latest/?badge=latest)\n\n\nPython implementation of the R package [clickstream](https://cran.r-project.org/web/packages/clickstream/index.html) which models website clickstreams as Markov chains.\n\n---\n\n`markovclick` allows you to model clickstream data from websites as Markov chains, which can then be used to predict the next likely click on a website for a user, given their history and current state. \n\n## Requirements\n* Python 3.X\n* numpy\n* matplotlib\n* seaborn (Recommended)\n* pandas\n\n## Installation\n```\npython setup.py install\n```\nor\n\n```\npip install markovclick\n```\n\n## Tests\nTests can be run using `pytest` or `tox` command from the root directory.\n\n## Documentation\nTo build the documentation, run `make build_docs`. The documentation can then be viewed through a web server using `make serve_docs`.\n\n## Usage\n\n### Quick start\nTo start using the package without any data, `markovclick` can produce dummy data for you to experiment with:\n\n```python\nfrom markovclick import dummy\nclickstream = dummy.gen_random_clickstream(nOfStreams=100, nOfPages=12)\n```\n\n\n### Terminology\nIn the context of this package, streams refer to a series of clicks belonging to a given user. 
The maximum time difference between consecutive clicks in a stream is chosen by the user when assembling these streams, but is typically taken to be 30 minutes in the industry.\n\nPages refer to the individual clicks of the user, and thus the pages they visit. Rather than storing the entire URL of each page the user visits, it is better to encode pages using a simple code such as `PXX`, where each `X` is a digit. This strategy can also be used to group similar pages under the same code, since modelling them as separate pages is sometimes not useful and leads to an excessively large probability matrix.\n\n\n#### Building Markov chains\nTo build a Markov chain from the dummy data:\n\n```python\nfrom markovclick.models import MarkovClickstream\nm = MarkovClickstream(clickstream)\n```\n\nThe instance `m` of the `MarkovClickstream` class provides access to the class's attributes, such as the probability matrix (`m.prob_matrix`) used to model the Markov chain, and the list of unique pages (`m.pages`) featuring in the clickstream.\n\n### Visualisation \n\n#### Visualising as a heatmap\n\nThe probability matrix can be visualised as a heatmap as follows:\n\n```python\nimport seaborn as sns\nsns.heatmap(m.prob_matrix, xticklabels=m.pages, yticklabels=m.pages)\n```\n\n\n\n\n#### Visualising the Markov chain\n\nA Markov chain can be thought of as a graph of nodes and edges, with the edges representing the transitions from each state. `markovclick` provides a wrapper function around the `graphviz` package to visualise the Markov chain in this manner.\n\n```python\nfrom markovclick.viz import visualise_markov_chain\ngraph = visualise_markov_chain(m)\n```\n\nThe function `visualise_markov_chain()` returns a `Digraph` object, which can be viewed directly inside a Jupyter notebook by evaluating the returned object in a cell. It can also be written to a PDF file by calling the `render()` method on the object. 
\n\n\n\nIn the graph produced, the nodes representing the individual pages are shown in green, and up to 3 edges from each node are rendered. The first edge, drawn as a thick blue arrow, depicts the most likely transition from this page / state to the next page / state. The second edge, drawn as a thinner blue arrow, depicts the second most likely transition from this state. Finally, a third edge (light grey) depicts the transition from this page / state back to itself. This edge is only shown if neither of the two most likely transitions is already to the page itself. For all transitions, the probability is shown next to the edge (arrow).\n\n\n\n### Clickstream processing with `markovclick.preprocessing`\n\n`markovclick` provides functions to process clickstream data such as server logs, which contain unique identifiers such as cookie IDs associated with each click. This allows clicks to be aggregated into sessions, whereby clicks from the same browser (identified by the unique identifier) are grouped such that the time difference between consecutive clicks does not exceed the maximum session timeout (typically taken to be 30 minutes).\n\n#### Sessionise clickstream data\n\n##### `Sessionise`\n\nTo sessionise clickstream data, use the following code, which requires a `pandas` DataFrame object.\n\n```python\nfrom markovclick.preprocessing import Sessionise\nsessioniser = Sessionise(df, unique_id_col='cookie_id',\n                         datetime_col='timestamp', session_timeout=30)\n```\n\n##### Arguments\n\n| Argument | Type | Description |\n| ----------------- | --------- | ------------------------------------------------------------ |\n| `df` | DataFrame | `pandas` DataFrame object containing clickstream data. Must contain at least a timestamp column and a unique identifier column, such as a cookie ID. |\n| `unique_id_col` | String | Column name of unique identifier, e.g. `cookie_id` |\n| `datetime_col` | String | Column name of timestamp column. 
|\n| `session_timeout` | Integer | Maximum time in minutes after which a session is broken. |\n\n##### `Sessionise.assign_sessions()`\n\nWith a `Sessionise` object instantiated, the `assign_sessions()` function can then be called. This function supports multi-processing, enabling you to split the job across multiple processes to take advantage of a multi-core CPU.\n\n```python\nsessioniser.assign_sessions(n_jobs=2)\n```\n\n##### Arguments\n\n| Argument | Type | Description |\n| -------- | ------- | ------------------------------------------------------------ |\n| `n_jobs` | Integer | Number of processes to spawn to enable parallel processing. If set to `1`, no splitting occurs. |\n\nThe `assign_sessions()` function returns the DataFrame with an additional column storing the unique identifier for each session. Rows of the DataFrame can then be grouped using this column.\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/ismailuddin/markovclick/tarball/0.1.1", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ismailuddin/markovclick", "keywords": "markov chain data science machine learning statistics clickstream", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "markovclick", "package_url": "https://pypi.org/project/markovclick/", "platform": "", "project_url": "https://pypi.org/project/markovclick/", "project_urls": { "Download": "https://github.com/ismailuddin/markovclick/tarball/0.1.1", "Homepage": "https://github.com/ismailuddin/markovclick" }, "release_url": "https://pypi.org/project/markovclick/0.1.1/", "requires_dist": null, "requires_python": "", "summary": "Package for modelling clickstream data using Markov chains", "version": "0.1.1" }, "last_serial": 4817325, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "c02b54df5bc8e8821810f19b7c7cb188", "sha256": 
"94b5b8eb92090e3340c6d9fe6d8dc6b29978bbf6384257e87eb7517712982fb0" }, "downloads": -1, "filename": "markovclick-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c02b54df5bc8e8821810f19b7c7cb188", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10301, "upload_time": "2019-02-13T20:16:56", "url": "https://files.pythonhosted.org/packages/68/c6/e3b5c7913eba939298cdbbe543c65983bb102b9635d19c02e5365560b2b1/markovclick-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0124b0bc822ad323c34b9519c2bec3df", "sha256": "85fade84b4604efcb98636db434f9bd14071f40dd224fafb00217da928de16d4" }, "downloads": -1, "filename": "markovclick-0.1.0.tar.gz", "has_sig": false, "md5_digest": "0124b0bc822ad323c34b9519c2bec3df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10153, "upload_time": "2019-02-13T20:16:58", "url": "https://files.pythonhosted.org/packages/fd/a6/809d9043f44b247fdfbccfadd03e09583a893c1f533222371c2e44b7a9db/markovclick-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "aca44b94b8206d89dcb9815a83b1be07", "sha256": "b37a44b1cc72e02a5e62f5dbd2aa27a1995d0909220494e7cccf2eebd24f9a4f" }, "downloads": -1, "filename": "markovclick-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "aca44b94b8206d89dcb9815a83b1be07", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10359, "upload_time": "2019-02-13T20:26:24", "url": "https://files.pythonhosted.org/packages/1a/ff/8be808d320bed494f310aaed44406df1502e306724b2d19d059cf01cf363/markovclick-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f2970434256bdfa94f3a65a0844e5474", "sha256": "3f39d1c01db7240f8679e3af749e90faab31ab1c82d614a195094f7b6485ab8f" }, "downloads": -1, "filename": "markovclick-0.1.1.tar.gz", "has_sig": false, "md5_digest": "f2970434256bdfa94f3a65a0844e5474", "packagetype": "sdist", "python_version": "source", "requires_python": 
null, "size": 10301, "upload_time": "2019-02-13T20:26:25", "url": "https://files.pythonhosted.org/packages/31/8c/0ac3c3bb7f4734e23cafd601a9adf56e3fac8c5b714bf5340c2ccebb50d6/markovclick-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "aca44b94b8206d89dcb9815a83b1be07", "sha256": "b37a44b1cc72e02a5e62f5dbd2aa27a1995d0909220494e7cccf2eebd24f9a4f" }, "downloads": -1, "filename": "markovclick-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "aca44b94b8206d89dcb9815a83b1be07", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10359, "upload_time": "2019-02-13T20:26:24", "url": "https://files.pythonhosted.org/packages/1a/ff/8be808d320bed494f310aaed44406df1502e306724b2d19d059cf01cf363/markovclick-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f2970434256bdfa94f3a65a0844e5474", "sha256": "3f39d1c01db7240f8679e3af749e90faab31ab1c82d614a195094f7b6485ab8f" }, "downloads": -1, "filename": "markovclick-0.1.1.tar.gz", "has_sig": false, "md5_digest": "f2970434256bdfa94f3a65a0844e5474", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10301, "upload_time": "2019-02-13T20:26:25", "url": "https://files.pythonhosted.org/packages/31/8c/0ac3c3bb7f4734e23cafd601a9adf56e3fac8c5b714bf5340c2ccebb50d6/markovclick-0.1.1.tar.gz" } ] }