{ "info": { "author": "Jonathan Dunn", "author_email": "jdunn8@iit.edu", "bugtrack_url": null, "classifiers": [], "description": "\nc2xg 0.2\n=============\n\nComputational Construction Grammar, or c2xg, is two things: \n\n(1) A Python package for the unsupervised learning of CxG representations along with tools for vectorizing these representations for computational tasks\n\n(2) A discovery-device grammar that learns falsifiable and replicable CxGs from observed unannotated text data\n\nWhy CxGs? Constructions are grammatical entities that allow the straight-forward quantification of linguistic structure.\n\n\nInstallation\n--------------\n\n\t\tpip install c2xg\n\nor\n\n\t\tpip install \n\n\nEnvironment and Dependencies\n----------------------------------\n\nThis package is meant to run in Python 3.5 with a number of dependencies. The easiest way to maintain the necessary environment is to use Anaconda Python: https://www.continuum.io/downloads\n\nThis makes it easier to maintain the necessary environment. The package works with the dependency versions listed below. It will likely work with older versions of some packages but has not been tested with them. For example, older versions of numexpr have been known to cause issues withs pandas and may lead to lost candidates.\n\nDependencies:\n\n\tPython 3.5\n\tcytoolz 0.7.4\n\tgensim 0.12.2\n\tmatplotlib 1.5.0\n\tnumexpr 2.5\n\tnumpy 1.10.4\n\tpandas 0.18.0\n\tscipy 0.16.0\n\tsklearn 0.17\n\nUsage\n=====\nC2xG has two main classes:\n\n\tc2xg.Parameters loads and initializes the settings needed for running C2xG; these values are set in a file\n\n\tc2xg.Grammar contains the grammar resources used across all stages of the package\n\nInitialize the package with the following commands:\n\n\timport c2xg\n\tParameters = c2xg.Parameters(\"filename\")\n\nThe parameters class takes as input a string indicating the name of the parameters file. Now, run the API using the following template, where Parameters is an initialized c2xg.Parameter object:\n\n\tc2xg.learn_c2xg(Parameters)\n\tc2xg.learn_idioms(Parameters)\n\nAll functions in the API take a c2xg.Parameters object as an argument. The c2xg.Grammar object can be passed to each function or, if not passed, loaded from file.\n\nAPI\n====\n\nEach function in the API takes a Parameters object and either creates the Grammar object or loads it from the file specified in the parameters.\n\nAutomate Pipeline\n------------------\n\nlearn_c2xg\t\t\t\n\n\t\tUmbrella function for entire learning pipeline (from learn_mwes to learn_constructions).\n\nIndividual Learning Functions\n------------------------------\n\nlearn_dictionary\t\t\n\n\t\tUse GenSim to create the dictionary of semantic representations needed for c2xg.\n\nlearn_rdrpos_model\t\t\n\n\t\tUse RDRPOS Tagger Dependency to learn a new pos-tagging model.\n\nlearn_idioms\t\t\t\t\n\n\t\tUse c2xg to learn a dictionary of idioms (lexical constructions).\n\nlearn_constituents\t \t\n\n\t\tUse c2xg to learn a constituency grammar.\n\nlearn_constructions \t\n\n\t\tUse c2xg to learn a full Construction Grammar with lexical, MWE, semantic, and constituent representations.\n\nlearn_usage\t\t\t\t\n\n\t\tPrepare to use TF-IDF weighting during feature extraction.\n\nlearn_association\n\n\t\tProduce a CSV file of association measures for sequences of a given length and types of representation\n\nHelper Functions\n-----------------\n\nannotate_pos\t\t\t\n\n\t\tTokenize, pos-tag, mark emojis, and convert to CoNLL format.\n\nget_indexes\t\t\t\t\n\n\t\tGet indexes of representation types.\n\nget_candidates\t\t\t\n\n\t\tGetcandidate sequences from input files (covers MWEs, Constituents, and Constructions).\n\nget_association\t\t\t\n\n\t\tGet vector of association values for each candidate.\n\nget_vectors\t\t\t\t\n\n\t\tGet vector of CxG usage for input files.\n\nEvaluation Functions\n----------------------\n\nexamples_constituents\t\n\n\t\tGet examples of predicted constituents by type. (*Not stable in v 0.2)\n\nexamples_constructions\t\n\n\t\tGet examples of each predicted construction. (*Not stable in v 0.2)\n\nCommand-Line Usage\n==================\n\n\t(1) Begin a Python interpreter\n\n\t(2) Import the package:\n\n\t\t\timport c2xg\n\n\t(3) Initialize the parameters object:\n\n\t\t\tParameters = c2xg.Parameters(\"filename\")\n\n\t(4a) Run the API, loading grammar objects from disk:\n\n\t\t\tc2xg.learn_constituents(Parameters)\n\n\t(4b) Run the API, initializing and then passing grammar objects:\n\n\t\t\tGrammar = c2xg.Grammar()\n\t\t\tc2xg.learn_constituents(Parameters, Grammar)\t\n\n\nInput Formats\n===================\n\nThis section describes the input formats for the different components.\n\n(1) Creating Semantic Dictionary\n\n\tInput: Unannotated text, one sentence per line. Tokenization and emoji identification are performed on each line.\n\n(2) Creating Models of Grammar and Usage\n\n\tInput: Annotated: CoNLL format of tab-separate fields [Word-Form, Lemma, POS, Index]. \n\tUse to assign ids to documents.\n\n\tInput: Unannotated: Plain text with line breaks for documents / sentences as desired. \n\t[In both cases, each line is assumed to be a \"text\" or the containing unit of analysis; instances can be separated by the \"|\" character for aggregation]\n\n(3) Extracting Feature Vectors\n\n\tInput with Meta-Data: \t\tField:Value,Field:Value\\tText\n\tInput without Meta-Data:\tPlain text with line breaks (\\n) for documents or sentences depending on the level of analysis.\n\n\nFeature Extraction\n=========================\n\nGiven a language-specific CxG, the get_vectors and learn_usage functions convert that grammar into a vector representation of texts or sentences (i.e., one unit per line in the input files). There are two modes and three quantification methods for creating vectors:\n\n\tvector_scope = \"CxG+Units\": Constructions and lexical / POS / semantic features\n\tvector_scope = \"Lexical\": Only lexical features\n\tvector_scope = \"CxG\": Only construction features\t\n\n\texpand_check == True: Allow complex constituents to fill slots in extracted features\n\trelative_freq == True: Quantify using the relative frequency of the feature in given sentence or text (as negative logarithms)\n\trelative_freq == False: Quantify using unadjusted raw frequency of the feature\n\tuse_centroid == True: Extract vectors with centriod normalization learned using learn_usage. This is functionally equivalent to TF-IDF scaling\n\n\tCentroid normalization first finds the probability of a given feature in the background corpus. This is stored after running learn_usage in separate centroid_df models for the full grammar and for the lexical-only features. During extraction, if centroids are used for representation, this is converted into negative logarithms of the inverted joint probability of each feature occuring as many times as it does in a document.\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://www.c2xg.io", "keywords": "grammar induction,unsupervised language processing,construction grammar,cognitive linguistics", "license": "LGPL 3.0", "maintainer": "", "maintainer_email": "", "name": "c2xg", "package_url": "https://pypi.org/project/c2xg/", "platform": "", "project_url": "https://pypi.org/project/c2xg/", "project_urls": { "Homepage": "http://www.c2xg.io" }, "release_url": "https://pypi.org/project/c2xg/0.22/", "requires_dist": [ "cytoolz", "gensim", "matplotlib", "numexpr", "numpy", "pandas", "scipy", "seaborn", "sklearn" ], "requires_python": "", "summary": "Learn, vectorize, and annotate Construction Grammars", "version": "0.22" }, "last_serial": 2728765, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "f1638a0da94a59c941ea90a9df14cb97", "sha256": "0bb9dabfee576065b7e45421ace638f2ce059734c81ecc9c79fcc9d8756351c6" }, "downloads": -1, "filename": "c2xg-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "f1638a0da94a59c941ea90a9df14cb97", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8385760, "upload_time": "2017-03-05T16:47:03", "url": "https://files.pythonhosted.org/packages/92/11/c73ff29ddfa0eb3c35f156e93e24fdf834c4a437020f5f530d3b10c355c5/c2xg-0.2-py3-none-any.whl" } ], "0.21": [ { "comment_text": "", "digests": { "md5": "effa0dbfd9e588969800c0ef4562b0f2", "sha256": "40579bb62ba5643c9d6cf669461217d7f06339743c6cf9536777f783ca61fb51" }, "downloads": -1, "filename": "c2xg-0.21-py3-none-any.whl", "has_sig": false, "md5_digest": "effa0dbfd9e588969800c0ef4562b0f2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13555036, "upload_time": "2017-03-18T17:57:43", "url": "https://files.pythonhosted.org/packages/eb/84/66a1287616def496f9729b5b743b9fb3d094e45992008db74c7806509452/c2xg-0.21-py3-none-any.whl" } ], "0.22": [ { "comment_text": "", "digests": { "md5": "66f2d205b2c0ca61d78f6b95a0c5390d", "sha256": "1b6feffccf62b409697a6b61834b6d358ae92dbd47b9886e3862541d92855f42" }, "downloads": -1, "filename": "c2xg-0.22-py3-none-any.whl", "has_sig": false, "md5_digest": "66f2d205b2c0ca61d78f6b95a0c5390d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13555212, "upload_time": "2017-03-24T17:13:08", "url": "https://files.pythonhosted.org/packages/f9/d0/6af3818b211705279db062db9f082674e6aa4ef06866f0c317befcd608dc/c2xg-0.22-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "66f2d205b2c0ca61d78f6b95a0c5390d", "sha256": "1b6feffccf62b409697a6b61834b6d358ae92dbd47b9886e3862541d92855f42" }, "downloads": -1, "filename": "c2xg-0.22-py3-none-any.whl", "has_sig": false, "md5_digest": "66f2d205b2c0ca61d78f6b95a0c5390d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13555212, "upload_time": "2017-03-24T17:13:08", "url": "https://files.pythonhosted.org/packages/f9/d0/6af3818b211705279db062db9f082674e6aa4ef06866f0c317befcd608dc/c2xg-0.22-py3-none-any.whl" } ] }