{ "info": { "author": "Matthew Martin", "author_email": "products@mkylemartin.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 3.7" ], "description": "# essay_scorer\n\nAn automated essay scorer for english language learner essays.\n\n## Description\n\nExtracts a set of linguistic features and compares them to a model trained on over 3000 essays to predict the score on a 40 point scale. \n\n## Installation\n\n`pip install essay-scorer`\n\n## Usage/Tutorial\n\n### Command line usage\n\nAccepts either a directory of `.txt` files or a single `.txt` file.\n\n(You'll have to locate the bin where the script is saved by pip to use the command line like this.)\n\n**For a directory of text files**\n`python3 essay_scorer.py path/to/essays/`\n\n*Bonus*\n`python3 essay_scorer.py path/to/essays/ >> output.csv`\n\n**For a single text file**\n`python3 essay_scorer.py path/to/essays/test.txt`\n\n### Importing in a python script\n\n```\nimport essay_scorer\n\ntext = open('test.txt', 'r').read()\nfeat_set = essay_scorer.get_feats(text)\npred_score = essay_scorer.gbr_model.predict(feats)[0]\nprint('predicted score', pred_score)\n```\n## About\nThis automated essay scoring system is based on [Travis Moore's master's thesis work](https://github.com/travismoore3/aes_system).\n\nMoore's master's thesis, which lays the theoretical groundwork for this project can be found [here](https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=7835&context=etd).\n\nThis model is best used on english learner essays that are between 140-300 words. Due to the distribution of scores in the model, this model tends to make better predictions around the median of actual scores from the dataset which seems to be around 20. There is more variance in predictions of outlier scores.\n\nThe model itself is a `GradientBoostingRegressor` model. Here are the parameters and results of it testing itself on its own data:\n\n```\n`model.fit` results:\nGradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,\n learning_rate=0.02, loss='ls', max_depth=4, max_features=0.3,\n max_leaf_nodes=None, min_impurity_decrease=0.0,\n min_impurity_split=None, min_samples_leaf=9,\n min_samples_split=2, min_weight_fraction_leaf=0.0,\n n_estimators=500, n_iter_no_change=None, presort='auto',\n random_state=0, subsample=1.0, tol=0.0001,\n validation_fraction=0.1, verbose=0, warm_start=False)\nMean Absolute Error:\nTrain error:\t2.360482423992901\nTest error:\t2.32169958721344\n\nr2 scores of both train/test:\nr2_train:\t0.8341340712062068\nr2_test:\t0.8575492864872207\n```\n\n## License\n\nGNU GPLv3 - see LICENSE file for details.2\n\n## Contact\n\n`@mkylemartin` on Twitter, GitHub\n\n\n## Note\n\nThe pickled data file in this version does not include the age_bracket or language_id. The 49 features extracted are the following (in alphabetical order).\n```\n['ari', # readability index\n 'avg_len_word', \t# average word length\n 'cli', # readability measure\n 'conjunctions', \n 'cttr', # corrected type to token ratio\n 'dcrs', # dale chall readability score\n 'determiners', \n 'dw', # difficult words\n 'english_usage', # number of english words used\n 'fkg', # flesch_kincaid_grade\n 'fre', # flesch_reading_ease\n 'function_ttr', \n 'gf', # gunning_fog\n 'grammar_chk', # checks for 2000+ grammar errors\n 'lwf', # linsear_write_formula\n 'n_bigram_lemma_types', \n 'n_bigram_lemmas', \n 'n_trigram_lemma_types',\n 'n_trigram_lemmas', \n 'ncontent_words', \n 'nfunction_words', \n 'nlemma_types',\n 'nlemmas', \n 'noun_ttr', \n 'num_tokens', \n 'num_types', \n 'pct_rel_trigrams',\n 'pct_transitions', \n 'rank_avg', \n 'rank_total', \n 's1', # negation stages (the next several features)\n 's1a', \n 's1b', \n 's1c',\n 's2', \n 's2a', \n 's2b', \n 's2c', \n 's3', \n 's3a', \n 's3b', \n 's3c', \n 's4', \n 's4a',\n 's4b', \n 's4c', \n 'sent_density', # average words per sentence\n 'spelling_perc', # what percentage of words spelled correctly\n 'ttr' # type token ratio ]\n```\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mkylemartin/essay_scorer", "keywords": "automated essay scorer,essay,gradient boosting regressor,linguistics,feature extraction", "license": "", "maintainer": "", "maintainer_email": "", "name": "essay-scorer", "package_url": "https://pypi.org/project/essay-scorer/", "platform": "", "project_url": "https://pypi.org/project/essay-scorer/", "project_urls": { "Homepage": "https://github.com/mkylemartin/essay_scorer" }, "release_url": "https://pypi.org/project/essay-scorer/1.0/", "requires_dist": [ "progress" ], "requires_python": "~=3.7", "summary": "An automated essay scorer for english language learner essays.", "version": "1.0" }, "last_serial": 4954953, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "83e2b4da446346b1c827fb39440a7283", "sha256": "2930fdb9afb0dea2197106beea4843aa3e1d6ee4bbe88f2693d2b36eb79ff6ce" }, "downloads": -1, "filename": "essay_scorer-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "83e2b4da446346b1c827fb39440a7283", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "~=3.7", "size": 15216, "upload_time": "2019-03-18T16:42:21", "url": "https://files.pythonhosted.org/packages/4b/63/c1ef195b3928414093015fbb5c10534cf43160135a536d2fe48da5960b06/essay_scorer-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8f548502abe5b2e1cf2a0ae00873432f", "sha256": "031c2b4c3ede6a1b4875cb0bb34349fa2dfc3921d6cf150fcffb8a86832793a6" }, "downloads": -1, "filename": "essay_scorer-1.0.tar.gz", "has_sig": false, "md5_digest": "8f548502abe5b2e1cf2a0ae00873432f", "packagetype": "sdist", "python_version": "source", "requires_python": "~=3.7", "size": 3366, "upload_time": "2019-03-18T16:42:23", "url": "https://files.pythonhosted.org/packages/2a/76/30159f51786c5751434efaa01eeb5977f30de58c2130b4443e68a9e4ca3e/essay_scorer-1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "83e2b4da446346b1c827fb39440a7283", "sha256": "2930fdb9afb0dea2197106beea4843aa3e1d6ee4bbe88f2693d2b36eb79ff6ce" }, "downloads": -1, "filename": "essay_scorer-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "83e2b4da446346b1c827fb39440a7283", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "~=3.7", "size": 15216, "upload_time": "2019-03-18T16:42:21", "url": "https://files.pythonhosted.org/packages/4b/63/c1ef195b3928414093015fbb5c10534cf43160135a536d2fe48da5960b06/essay_scorer-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8f548502abe5b2e1cf2a0ae00873432f", "sha256": "031c2b4c3ede6a1b4875cb0bb34349fa2dfc3921d6cf150fcffb8a86832793a6" }, "downloads": -1, "filename": "essay_scorer-1.0.tar.gz", "has_sig": false, "md5_digest": "8f548502abe5b2e1cf2a0ae00873432f", "packagetype": "sdist", "python_version": "source", "requires_python": "~=3.7", "size": 3366, "upload_time": "2019-03-18T16:42:23", "url": "https://files.pythonhosted.org/packages/2a/76/30159f51786c5751434efaa01eeb5977f30de58c2130b4443e68a9e4ca3e/essay_scorer-1.0.tar.gz" } ] }