{ "info": { "author": "Francisco Canas", "author_email": "mailfrancisco@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Distiller\n=========\n\nDistiller provides convenient auto-extraction of document keywords\nbased on term-frequency/inverse-document-frequency (TF-IDF) and word\npositioning.\n\nDistiller handles all of the pre-processing details and produces final\nstatistical reports in JSON format.\n\n\nRequirements\n------------\n\nDistiller uses the [Natural Language Toolkit](http://www.nltk.org/).\n\nYou will need to download a couple of NLTK packages:\n\n >>> import nltk\n >>> nltk.download()\n Downloader> d\n Download which package (l=list; x=cancel)?\n Identifier> maxent_treebank_pos_tagger\n Downloader> d\n Download which package (l=list; x=cancel)?\n Identifier> stopwords\n\n\n\nInstallation\n------------\n\nInstallation using pip:\n\n $ pip install Distiller\n\n\nUsage\n-----\n\nTypical usage from within the Python interpreter:\n\n >>> from Distiller.distiller import Distiller\n >>> distiller = Distiller(data, target, options)\n\n\nArguments\n---------\n\n### data\n\nPath to a file containing the document collection in JSON format.\n\n {\n \"metadata\": {\n \"base_url\": \"The document's source URL (if any)\"\n },\n \"documents\": [\n {\n \"id\": \"The document's unique identifier (if any)\",\n \"body\": \"The entire body of the document in a single text blob.\"\n }, ...\n ]\n }\n\n### target\n\nPath where Distiller will output the following reports:\n\nkeywords: A list of words and the frequency with which they were detected as being keywords of documents.\n\nbigrams: A list of word pairs and the frequency with which they were detected as being key pairs in documents.\n\ntrigrams: A list of word triples and the frequency with which they were detected as being key triples in documents.\n\ndocmap: A mapping of document IDs to their respective keywords, n-grams, and other statistics.\n\nkeymap: A mapping of keywords to the documents they appear 
in.\n\n\n### options\n\nAn optional dictionary containing document processing arguments in this format:\n\n {\n 'normalize': True, # normalize tokens during pre-processing\n 'stem': True, # stem tokens during pre-processing\n 'lemmatize': False, # lemmatize tokens during pre-processing\n 'tfidf_cutoff': 0.001, # cutoff value for the TF-IDF score\n 'pos_list': ['NN','NNP'], # POS whitelist used to select candidates\n 'black_list': [] # tokens to exclude from candidates\n }", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/FranciscoCanas/Distiller", "keywords": null, "license": "LICENSE.txt", "maintainer": null, "maintainer_email": null, "name": "Distiller", "package_url": "https://pypi.org/project/Distiller/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/Distiller/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/FranciscoCanas/Distiller" }, "release_url": "https://pypi.org/project/Distiller/0.1.2/", "requires_dist": null, "requires_python": null, "summary": "Automatic Keyword Extraction from Document Collections", "version": "0.1.2" }, "last_serial": 1189549, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "cfa2500e1897bb8d4d5abf501ba52c11", "sha256": "76add898d294f0e86f50456cd23381ff3a5de08a06d310a3f9f8225eb5b2e705" }, "downloads": -1, "filename": "Distiller-0.1.0.tar.gz", "has_sig": false, "md5_digest": "cfa2500e1897bb8d4d5abf501ba52c11", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5813, "upload_time": "2014-06-20T13:20:29", "url": "https://files.pythonhosted.org/packages/ee/64/b68e184d54b3476c370ea8542bb34baa56d5cc55010386a92d1c91078af2/Distiller-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "0f50a535370302ab460d91dd691dd134", "sha256": 
"23e5fedc881368b3b22d36cd354916d41a9882b0d9a5e9cc7e57943bfbb67fc7" }, "downloads": -1, "filename": "Distiller-0.1.1.tar.gz", "has_sig": false, "md5_digest": "0f50a535370302ab460d91dd691dd134", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7718, "upload_time": "2014-06-20T15:17:14", "url": "https://files.pythonhosted.org/packages/52/9d/613a60864aafb1c3e952b33ae6e36ca14db0163f288a5bc76bbf33dd2eb3/Distiller-0.1.1.tar.gz" } ], "0.1.1.1": [ { "comment_text": "", "digests": { "md5": "716eda1acffa8d91ba9b0d6b44a19f0f", "sha256": "76e045d8a56d1aa6edaf15ec827534faaea7de8e8a5c589b153864e79fbe4152" }, "downloads": -1, "filename": "Distiller-0.1.1.1.tar.gz", "has_sig": false, "md5_digest": "716eda1acffa8d91ba9b0d6b44a19f0f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9559, "upload_time": "2014-06-21T16:59:10", "url": "https://files.pythonhosted.org/packages/0b/d6/239bd1ab3590e9bb7bcd3397c7b5e8293f99acd47e6086e60539318f4de5/Distiller-0.1.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "c7c04452e39f3e59ff598ae5765aae6c", "sha256": "da39278ec9a88ef5d529bbc165775c431afa04818f278960a38138b43cc9c366" }, "downloads": -1, "filename": "Distiller-0.1.2.tar.gz", "has_sig": false, "md5_digest": "c7c04452e39f3e59ff598ae5765aae6c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9065, "upload_time": "2014-08-13T18:42:27", "url": "https://files.pythonhosted.org/packages/a6/01/cf449be1cd09e817e8bb9a9dd310fb7a0c3d493ecf6a1f62f7099e574fa4/Distiller-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c7c04452e39f3e59ff598ae5765aae6c", "sha256": "da39278ec9a88ef5d529bbc165775c431afa04818f278960a38138b43cc9c366" }, "downloads": -1, "filename": "Distiller-0.1.2.tar.gz", "has_sig": false, "md5_digest": "c7c04452e39f3e59ff598ae5765aae6c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9065, 
"upload_time": "2014-08-13T18:42:27", "url": "https://files.pythonhosted.org/packages/a6/01/cf449be1cd09e817e8bb9a9dd310fb7a0c3d493ecf6a1f62f7099e574fa4/Distiller-0.1.2.tar.gz" } ] }