{ "info": { "author": "Michael Moser", "author_email": "moser.michael@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Operating System :: OS Independent", "Topic :: Text Processing :: Linguistic" ], "description": "Parses the roget thesaurus and makes it accessible through an API.\n\nThe text of the Roget thesaurus was downloaded from https://archive.org/details/rogetsthesauruso10681gut\n\nWritten by Michael Moser (c) 2015\n\n----\n\nclass RogetBuilder\n \tThe main entry point of this library; builds an instance of RogetThesaurus\n \n \tMethods defined here:\n\t\t\n\t __init__(self, verbose=0)\n\t\t\t\t\t\t\t\t\t\t\t v\n\t parse(self)\n\t\tparse the Roget thesaurus\n\t\treturns an instance of RogetThesaurus \n\t\t \n\t\tNote that the file 10681-body.txt must be in the same directory as the script roget.py\n\n\t load(self, file)\n\t\tloads an instance of roget thesaurus (if possible from pickled/serialized form)\n\t\t \n\t\tif file does not exist\n\t\t parse roget thesaurus \n\t\t store pickled form to file\n\t\telse\n\t\t load pickled form from file\n\t\treturns instance of RogetThesaurus \n\t\t \n\t\tdon't use this! 
surprisingly it takes less time to parse it from the text file.\n\t\t(even with this inefficient parser)\n\t\t \n\t\tThe reason seems to be that the pickled format is much larger than the text file;\n\t\tpickle adds the type of the class as the first element of each s-expression - \n\t\tso there is a lot of redundancy and the pickled file is much larger than the text file.\n\n----\n\nclass RogetThesaurus\n Methods defined here:\n\n\t__init__(self, rootNode=None, headWordIndex=None, senseIndex=None)\n\n\n\tsemanticSimilarity(self, seq1, seq2)\n\t computes the semantic similarity between two terms, \n\t \n\t returns the following tuple (similarity-score, common-node-in-roget-thesaurus)\n\t \n\t \n\t the similarity score:\n\t\t100 - both terms appear in the same SenseGroup node\n\t\t 90 - both terms have the same head word\n\t\t 80 - both terms appear in the same leaf category\n\t\t 0 - everything else\n\t \n\t common-node-in-roget-thesaurus: is None if the score is 0; \n\t otherwise it is the common node that the score is based on\n\n Data descriptors defined here:\n\n\theadWordIndex\n\t the index of head words - maps a head word to its node in the ontology\n\t \n\trootNode\n\t the root node of the ontology\n\n\tsenseIndex\n\t the index of word senses - maps the word sense to a list of nodes in the ontology\n\n----\n\nclass RogetNode\n RogetNode - the base class of all nodes maintained by Roget thesaurus\n \n Methods defined here:\n\t__init__(self, type, description, parent=None)\n\n\ttoString(self)\n\n\ttypeToString(self)\n\t returns the type of this node as a string\n\n Data descriptors defined here:\n\n\n\tchild\n\t returns the array of child nodes\n\n\tdescription\n\t returns an optional description (in the text this appears as [ .... 
] )\n\n\tinternalId\n\t each node has its own internal id\n\n\tkey\n\t the meaning/key of this node\n\t\n\tparent\n\t returns the parent node (one up in the ontology)\n\n\ttype\n\t returns the type of this node as an integer\n\n----\n\n class RogetThesaurusFormatterXML\n \tclass for formatting of Roget thesaurus as XML\n \n \tMethods defined here:\n\t show(self, roget, file)\n\n----\nclass Sense(RogetNode)\n a single sense (the leaf node of the Roget Thesaurus)\n \n Methods defined here:\n\t __init__(self, type, parent)\n\n\t toString(self)\n \n Data descriptors defined here:\n\n\t comment\n\t\tan optional comment (in the text this is the text that appears in brackets )\n\n\n\t link\n\t\toptional link to a node of type HeadWord (in the text this appears as \"&c; 111\" - link to headword with id 111)\n\n\n\t linkComment\n\t\toptional comment on a link\n\n\t wordType\n\t\toptional word type annotation\n\n Methods inherited from RogetNode:\n\t typeToString(self)\n\t\treturns the type of this node as a string\n\n Data descriptors inherited from RogetNode:\n\t child\n\t\treturns the array of child nodes\n\n\t description\n\t\treturns an optional description (in the text this appears as [ .... 
] )\n\t\n\t internalId\n\t\teach node has its own internal id\n\n\t key\n\t\tthe meaning/key of this node\n\n\t parent\n\t\treturns the parent node (one up in the ontology)\n\n\t type\n\t\treturns the type of this node as an integer\n\n\n----\n\n\nclass HeadWord(Sense)\n A headword\n \n \t\n Method resolution order:\n\tHeadWord\n\tSense\n\tRogetNode\n\n\n Methods defined here:\n\t__init__(self, HeadIndex, parent)\n\n\ttoString(self)\n\n Data descriptors defined here:\n\t\n\tindex\n\t the string id that identifies the headword in the Roget thesaurus\n\n Data descriptors inherited from Sense:\n\tcomment\n\t an optional comment (in the text this is the text that appears in brackets )\n\n\tlink\n\t optional link to a node of type HeadWord (in the text this appears as \"&c; 111\" - link to headword with id 111)\n\n\tlinkComment\n\t optional comment on a link\n\n\twordType\n\t optional word type annotation\n\n Methods inherited from RogetNode:\n\ttypeToString(self)\n\t returns the type of this node as a string\n\n Data descriptors inherited from RogetNode:\n\tchild\n\t returns the array of child nodes\n\n\tdescription\n\t returns an optional description (in the text this appears as [ .... 
] )\n\n\tinternalId\n\t each node has its own internal id\n\n\tkey\n\t the meaning/key of this node\n\n\tparent\n\t returns the parent node (one up in the ontology)\n\n\ttype\n\t returns the type of this node as an integer\t\n\n----\n class RogetThesaususFormatterText\n\tclass for formatting of Roget thesaurus as text report\n \n \tMethods defined here:\n\t show(self, roget, file, mask=15)", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://mosermichael.github.io/cstuff/all/projects/2015/02/26/roget.html", "keywords": "natural language processing; thesaurus", "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "roget", "package_url": "https://pypi.org/project/roget/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/roget/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://mosermichael.github.io/cstuff/all/projects/2015/02/26/roget.html" }, "release_url": "https://pypi.org/project/roget/0.0.1/", "requires_dist": null, "requires_python": null, "summary": "API to the Roget thesaurus", "version": "0.0.1" }, "last_serial": 1475818, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "437d52c5bc3edc49c5e8ada6f5599b4f", "sha256": "1e67144bf3696c54f2782499d8eb7c8998164e372afc89711588620700ac6169" }, "downloads": -1, "filename": "roget-0.0.1.tar.gz", "has_sig": false, "md5_digest": "437d52c5bc3edc49c5e8ada6f5599b4f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 598125, "upload_time": "2015-03-24T19:04:42", "url": "https://files.pythonhosted.org/packages/4c/24/c39c973758c551534016d44b7e2e59e4ac6831369503c500a05a098829f0/roget-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "437d52c5bc3edc49c5e8ada6f5599b4f", "sha256": "1e67144bf3696c54f2782499d8eb7c8998164e372afc89711588620700ac6169" }, "downloads": -1, "filename": 
"roget-0.0.1.tar.gz", "has_sig": false, "md5_digest": "437d52c5bc3edc49c5e8ada6f5599b4f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 598125, "upload_time": "2015-03-24T19:04:42", "url": "https://files.pythonhosted.org/packages/4c/24/c39c973758c551534016d44b7e2e59e4ac6831369503c500a05a098829f0/roget-0.0.1.tar.gz" } ] }