{ "info": { "author": "Vit Novacek, DERI, NUIG", "author_email": "vit.novacek@deri.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Environment :: Web Environment", "Intended Audience :: Education", "Intended Audience :: End Users/Desktop", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Text Processing :: Indexing", "Topic :: Text Processing :: Linguistic" ], "description": "==============================\r\n SKIMMR_GT - a Brief Overview\r\n==============================\r\n\r\n-----------------------------------------------------\r\n Contact vit.novacek@deri.org for more detailed info\r\n-----------------------------------------------------\r\n\r\n0. ABSTRACT AND TABLE OF CONTENTS\r\n=================================\r\n\r\nThis document provides basic information about SKIMMR (a tool for \r\nmachine-aided skim reading), in particular about its SKIMMR_GT package version\r\nthat focuses on general texts. \r\n\r\nThe document contains three sections:\r\n\r\n1. ABOUT - overview of the tool and its functionalities\r\n\r\n2. INSTALLATION - basic instructions on how to install SKIMMR\r\n\r\n3. USING SKIMMR - basic instruction on how to use it after installation\r\n\r\n\r\n1. ABOUT\r\n========\r\n\r\nSKIMMR is a research prototype aimed at helping users to navigate through \r\nlarge amounts of textual data efficiently. This is done by extending the \r\ntraditional paradigm of searching and browsing of text collections. SKIMMR \r\nlets the user skim texts by navigating a network of concepts and relations \r\nexplicitly or implicitly present in them. 
The concepts and their relations \r\nhave been extracted and inferred from the textual content using novel machine \r\nreading techniques that power the SKIMMR back-end.\r\n\r\nThe interconnected `skimming networks' provide a high-level overview of the \r\ndomain covered by the texts, and let the user quickly discover interesting \r\npieces of information. This process also largely reduces the burden of \r\nsieving through lots of irrelevant resources, which is often the down side \r\nof using the standard search engines. When the users identify interesting \r\ninformation within the high level overview, they can continue reading the \r\nrelated textual resources in detail. \r\n\r\n2. INSTALLATION\r\n===============\r\n\r\nProbably the easiest way of installing SKIMMR is using easy_install:\r\n\r\n\t*easy_install skimmr_gt*\r\n\r\nCheck the documentation at http://peak.telecommunity.com/DevCenter/EasyInstall\r\nfor more detailed info on easy_install and setuptools.\r\n\r\nShould you prefer to download and install the package manually, fetch the\r\nSKIMMR distribution archive file first. After unpacking it, switch to the \r\ngenerated directory and execute the following command:\r\n\r\n\t*python setup.py install*\r\n\r\nIf you want to install the package locally (for the current user only), use \r\nthe following:\r\n\r\n\t*python setup.py install --user*\r\n\r\nCheck the documentation at http://docs.python.org/2/distutils/index.html for \r\nmore detailed options.\r\n\r\n3. USING SKIMMR\r\n===============\r\n\r\nAfter downloading and installing the SKIMMR package, it can be readily used\r\nin a basic manner through the scripts provided. 
These are:\r\n\r\n- exst_gt.py - extraction of co-occurrence statements from texts\r\n\r\n- crkb_gt.py - creation of a knowledge base and its population by semantic \r\n similarity relations\r\n\r\n- ixkb_gt.py - indexing of the knowledge base for efficient querying\r\n\r\n- prep_gt.py - preparation of the sub-folder structure and resources in the \r\n working directory necessary in order to launch the SKIMMR server\r\n\r\n- srvr_gt.py - launching the SKIMMR server and UI\r\n\r\nThe scripts are located in the *bin* subdirectory of the installation package.\r\nAlternatively, you can copy them from wherever your system puts Python package\r\nbinaries (check the documentation of your local operating system and Python\r\nimplementation).\r\n\r\nThe typical way of using these scripts is summarised in the following sections.\r\nNote that there are other ways to launch the scripts - you can check the \r\ndocumentation in the script source code for details.\r\n\r\n3.1 Creating the working directory\r\n----------------------------------\r\n\r\nFirst of all, SKIMMR requires a place to store and process its data. Create a\r\ndirectory for that somewhere (let us assume it is called *skimmr* in the \r\nfollowing). Then switch to that directory and copy all the SKIMMR scripts \r\nthere. After that, run\r\n\r\n\t*python prep_gt.py*\r\n\r\nin there. This will generate two sub-directories, *data* and *text*, as well\r\nas a couple of files and directories deeper in the *data* one. You are then all\r\nset to load the texts you want to process into SKIMMR.\r\n\r\n3.2 Processing the texts\r\n------------------------\r\n\r\nCopy the text files you want to process to the *text* folder in the *skimmr*\r\ndirectory. Plain text files (in ASCII or Unicode format) are supported, with\r\nthe *.txt* extension. 
It is advisable to use meaningful and unique filenames\r\nfor the text files, as they will be used later on for assigning the provenance\r\nidentifiers to the original text data.\r\n\r\nAfter you have all the texts in place, run\r\n\r\n\t*python exst_gt.py*\r\n\r\nin the *skimmr* directory. This will chop up the texts into paragraphs and\r\nextract the co-occurrence statements from them. There is a limit imposed\r\non the number of produced statements in the exst_gt.py script, dynamically\r\ncomputed from the available memory (or set to 750,000 if the psutil package\r\nis not available on your system). You can change that when using the SKIMMR\r\nlibrary functions directly. \r\n\r\n3.3 Creating the knowledge base\r\n-------------------------------\r\n\r\nAfter generating the co-occurrence statements in the previous step, you can\r\ncreate the knowledge base from them using \r\n\r\n\t*python crkb_gt.py create*\r\n\r\nwhich will generate a couple of knowledge base persistence files in the *stre*\r\nsub-folder of the *data* directory in the *skimmr* root folder.\r\n\r\n3.4 Computing similarities\r\n--------------------------\r\n\r\nWhen the knowledge base has been generated, you can augment it by computing\r\nsemantic similarity relationships between the terms that are more frequent \r\nthan average:\r\n\r\n\t*python crkb_gt.py compsim*\r\n\r\nThis will update the knowledge base persistence files accordingly. Note that \r\nthis step may take up to several hours for larger knowledge bases!\r\n\r\n3.5 Indexing the knowledge base\r\n-------------------------------\r\n\r\nBefore you can expose the processed content via a SKIMMR web interface, you\r\nhave to index the knowledge base. 
This is done by running\r\n\r\n\t*python ixkb_gt.py*\r\n\r\nwhich will generate a couple of index files in the knowledge base persistence\r\nsub-directory.\r\n\r\n3.6 Launching and using the server\r\n----------------------------------\r\n\r\nAt last, you can launch the SKIMMR server by\r\n\r\n\t*python srvr_gt.py*\r\n\r\nThis will start the server at localhost (127.0.0.1) and port 8008. You can \r\nspecify alternative addresses and ports by running the server as\r\n\r\n\t*python srvr_gt.py ADDRESS:PORT*\r\n\r\nAlso, you can specify an alternative store to be loaded by the server (useful\r\nif you want to examine multiple stores you have previously generated):\r\n\r\n\t*python srvr_gt.py [ADDRESS:PORT] [FOLDER]*\r\n\r\nwhere FOLDER is a path to the store you want to load.\r\n\r\nAfter the SKIMMR server has been started, you can point your browser to the \r\ncorresponding address and port and start using the tool as indicated in the\r\n*About* web-page accessible from the SKIMMR interface (just follow the link \r\nat the bottom of every page in the SKIMMR web interface).", "description_content_type": null, "docs_url": null, "download_url": "https://dl.dropbox.com/u/21379226/software/skimmr_gt-0.1-a2.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://www.skimmr.org", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "skimmr_gt", "package_url": "https://pypi.org/project/skimmr_gt/", "platform": "", "project_url": "https://pypi.org/project/skimmr_gt/", "project_urls": { "Download": "https://dl.dropbox.com/u/21379226/software/skimmr_gt-0.1-a2.tar.gz", "Homepage": "http://www.skimmr.org" }, "release_url": "https://pypi.org/project/skimmr_gt/0.1-a2/", "requires_dist": null, "requires_python": null, "summary": "SKIMMR library and scripts for machine-aided skim-reading (the general-purpose package version for arbitrary texts)", "version": "0.1-a2" }, "last_serial": 803342, "releases": { "0.1-a1": [ { 
"comment_text": "", "digests": { "md5": "9841d91cc1a475860848f774fff12517", "sha256": "416a4ca24a0c2c9ec60f51c1a0a6b0548a18513c1eabc2109d9baa48fce78426" }, "downloads": -1, "filename": "skimmr_gt-0.1-a1.tar.gz", "has_sig": false, "md5_digest": "9841d91cc1a475860848f774fff12517", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 659017, "upload_time": "2012-11-28T12:59:01", "url": "https://files.pythonhosted.org/packages/15/25/b11c9816029beac4ac014665ade0005ae08a8a906cb6f457bbabb56e1b0c/skimmr_gt-0.1-a1.tar.gz" } ], "0.1-a2": [ { "comment_text": "", "digests": { "md5": "7e4043826c3b65afbe4f44da062e5778", "sha256": "eadcfe514d22a459dfee58ffb5905c6df4aa2a180e45ba9358dce8cbefd4e705" }, "downloads": -1, "filename": "skimmr_gt-0.1-a2.tar.gz", "has_sig": false, "md5_digest": "7e4043826c3b65afbe4f44da062e5778", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 659041, "upload_time": "2013-02-15T17:31:58", "url": "https://files.pythonhosted.org/packages/e5/4e/618f6f3843703ae5f03fbb44ca9a8abfd7992ac056642b96e08e3cd8b610/skimmr_gt-0.1-a2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "7e4043826c3b65afbe4f44da062e5778", "sha256": "eadcfe514d22a459dfee58ffb5905c6df4aa2a180e45ba9358dce8cbefd4e705" }, "downloads": -1, "filename": "skimmr_gt-0.1-a2.tar.gz", "has_sig": false, "md5_digest": "7e4043826c3b65afbe4f44da062e5778", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 659041, "upload_time": "2013-02-15T17:31:58", "url": "https://files.pythonhosted.org/packages/e5/4e/618f6f3843703ae5f03fbb44ca9a8abfd7992ac056642b96e08e3cd8b610/skimmr_gt-0.1-a2.tar.gz" } ] }