{ "info": { "author": "Samuel Li", "author_email": "sli@projreality.com", "bugtrack_url": null, "classifiers": [], "description": "WebHist\n=======\n\n.. contents:: Table of Contents\n :local:\n\nWebHist indexes a collection of saved webpages and provides an interface to search the index.\n\nWebHist can handle the following archive file types:\n\n- MAFF files generated by Mozilla Archive Format, with MHT and Faithful Save\n- HTML files generated by Save Page WE\n\nInstallation\n------------\n\nThe package is available on `PyPI <https://pypi.org/project/webhist/>`_.\n\nYou can install it with pip::\n\n $ pip install webhist\n\nUsage\n-----\n\nCreate an index of archived webpages\n\n.. code:: python\n\n i = webhist.Index(\"/path/to/index\")\n\nIndex a single file\n\n.. code:: python\n\n i.add(\"/path/to/file\")\n\nA file will not be re-indexed unless explicitly requested. Files are tracked by the path string passed to the add() function, so an absolute path and a relative path will be considered two different files.\n\nThe code below will update the file in the index\n\n.. code:: python\n\n i.add(\"/path/to/file\", update=True)\n\nAdd all files in a specified directory (note that it does not search within subdirectories)\n\n.. code:: python\n\n i.add_path(\"/path/to/directory\")\n\nAgain, you can specify :literal:`update=True` to re-index files. You can also specify :literal:`verbose=True` to print information about whether or not files were indexed\n\n.. code:: python\n\n i.add_path(\"/path/to/directory\", verbose=True)\n\nThe output will look something like::\n\n file1\n - file2 (already in index)\n - file3 (exception type: error message)\n\nIn the example output above:\n\n- file1 was indexed correctly\n- file2 was already in the index, and was not re-indexed\n- file3 raised an error and was not indexed (the Python exception type and message are shown)\n\nAfter adding files, the changes to the index need to be committed\n\n.. code:: python\n\n i.commit()\n\nYou can also cancel the changes\n\n.. 
code:: python\n\n i.cancel()\n\nOnce an index has been populated, you can run search queries against it. The syntax follows the Whoosh default query language. More information can be found in the Whoosh documentation.\n\nThe code below searches for webpage archives that contain \"webhist\" and \"installation\"\n\n.. code:: python\n\n results = i.search(\"webhist installation\")\n\nThe field searched by default is the :literal:`content` field. The following fields are indexed and searchable:\n\n- title (title of page)\n- content (content of page)\n- url (full URL of page)\n- fqdn (fully qualified domain name, e.g. packaging.python.org)\n- dn (domain name, e.g. python.org)\n- date (the date the webpage archive was saved)\n\nFor example, you can search for \"webhist\" in the titles of webpages saved from example.com\n\n.. code:: python\n\n results = i.search(\"title:webhist dn:example.com\")\n\nShell Interface\n---------------\n\nA simple shell interface to a WebHist index is provided in :literal:`examples/shell.py`. You can clone the webhist repo and run it from the repo root::\n\n $ python examples/shell.py /path/to/archive -i /path/to/index\n\nThe :literal:`-i` parameter is optional. 
The default index location is :literal:`/path/to/archive/index`.\n\nRun a search query::\n\n webhist> search title:webhist dn:example.com\n\nThe output will look something like::\n\n 0: [2010-01-02 12:30:01] Title of page\n 1: [2011-02-03 16:20:25] Another page\n 2: [2013-06-12 00:00:01] Yet another page\n\nTo open page #2 from the search results::\n\n webhist> open 2\n\nTo get more help::\n\n webhist> help\n\nTo exit the shell::\n\n webhist> exit\n\nLicense\n-------\n\nWebHist is released under the GNU Lesser General Public License, Version 3.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/projreality/webhist", "keywords": "", "license": "https://www.gnu.org/licenses/lgpl.html", "maintainer": "", "maintainer_email": "", "name": "webhist", "package_url": "https://pypi.org/project/webhist/", "platform": "", "project_url": "https://pypi.org/project/webhist/", "project_urls": { "Homepage": "https://github.com/projreality/webhist" }, "release_url": "https://pypi.org/project/webhist/1.0.0/", "requires_dist": [ "bs4", "python-dateutil", "maflib", "tld", "whoosh" ], "requires_python": "", "summary": "Saved webpage index and search", "version": "1.0.0" }, "last_serial": 4967080, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "22e2a4d04a717f272c4399de8af96a5c", "sha256": "2016ad076f202994949269d6c2f019fb3e070e98043a18615ec3fa8d0568fc6c" }, "downloads": -1, "filename": "webhist-1.0.0-py2-none-any.whl", "has_sig": false, "md5_digest": "22e2a4d04a717f272c4399de8af96a5c", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 20834, "upload_time": "2019-03-21T08:27:39", "url": "https://files.pythonhosted.org/packages/72/c2/4002385e64b13af721b3909546252f4ab7b9db827ef74929be319c3a18a9/webhist-1.0.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7f36ebdbf04b8b5bce13879ba771d38b", 
"sha256": "cb097f9ad13c9e9e43786e2a045f67c10315bb30ad18493bdbc1eb5ba42ea326" }, "downloads": -1, "filename": "webhist-1.0.0.tar.gz", "has_sig": false, "md5_digest": "7f36ebdbf04b8b5bce13879ba771d38b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4770, "upload_time": "2019-03-21T08:27:41", "url": "https://files.pythonhosted.org/packages/a5/f0/0b9ff7dd25ce9bb5fb6bab972560a6a740dfbe5cb2a7e5566a1e9e8c092f/webhist-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "22e2a4d04a717f272c4399de8af96a5c", "sha256": "2016ad076f202994949269d6c2f019fb3e070e98043a18615ec3fa8d0568fc6c" }, "downloads": -1, "filename": "webhist-1.0.0-py2-none-any.whl", "has_sig": false, "md5_digest": "22e2a4d04a717f272c4399de8af96a5c", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 20834, "upload_time": "2019-03-21T08:27:39", "url": "https://files.pythonhosted.org/packages/72/c2/4002385e64b13af721b3909546252f4ab7b9db827ef74929be319c3a18a9/webhist-1.0.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7f36ebdbf04b8b5bce13879ba771d38b", "sha256": "cb097f9ad13c9e9e43786e2a045f67c10315bb30ad18493bdbc1eb5ba42ea326" }, "downloads": -1, "filename": "webhist-1.0.0.tar.gz", "has_sig": false, "md5_digest": "7f36ebdbf04b8b5bce13879ba771d38b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4770, "upload_time": "2019-03-21T08:27:41", "url": "https://files.pythonhosted.org/packages/a5/f0/0b9ff7dd25ce9bb5fb6bab972560a6a740dfbe5cb2a7e5566a1e9e8c092f/webhist-1.0.0.tar.gz" } ] }