{ "info": { "author": "Fabien Couthouis", "author_email": "fcouthouis@ensc.fr", "bugtrack_url": null, "classifiers": [], "description": "# ClassMail\n\n![alt text](classmail_logo.png \"Classmail icon\") Classmail\n\nMail classification Python library optimized for french mails in the field of insurance. Classmail was created to automate mail classification workflow in quick experiments. Developped during my internship at [Cov\u00e9a](https://www.covea.eu).\n\nClassmail provides:\n\n* **Data visualisation:** For quick data analysis, based on matplotlib and seaborn\n\n* **Mails preprocessing (cleaning):** Optimised for inasurrance purposes, with prebuilt regular expressions (in french). This configuration file can be adapted for other languages or purposes.\n\n* **Deep learning model creation (for classification):** Simple interface to build Pytorch models quickly based on [Flair](https://github.com/zalandoresearch/flair) nlp library.\n\n* **Model analysis and explainer** Simple interface with prebuilt seaborn graphs and model explainer based on [Lime](https://github.com/marcotcr/lime).\n\n\n## Quick Start\n\n### Requirements and Installation\n\nThe project is based on Python 3.7+.\nIf you do not have Python 3.6, install it first. \nThen, in your favorite virtual environment, simply do:\n\n```\npip install classmail\n```\n\n### Example Usage\n\nLet's run named entity recognition (NER) over an example sentence. All you need to do is make a `Sentence`, load \na pre-trained model and use it to predict tags for the sentence:\n\n\n* Data analysis\n ```python\n import classmail.data_visualisation.data_visualisation as dv\n\n # show class balancing graph\n dv.plot_class_balancing(data,col_text='header_body',col_label=\"COMPETENCE\", title=\"Cat\u00e9gories des mails\")\n #show most frequent bigrams\n dv.plot_word_frequencies(data['message'],ngram=2,words_nb=20)\n #plot a wordcloud with most frequent terms in body\n dv.plot_wordcloud(data['body'])\n ```\n\n\n* Cleaning\n ```python\n from classmail.nlp.cleaning import clean_mail\n\n #create a new column in data (\"clean_text\") with preprocessed header and body\n data = clean_mail(data,\"body\",\"header\")\n ```\n\n* Model creation and training\n ```python\n from classmail.classification.trainer import Trainer\n\n trainer = Trainer()\n #generate train / test / val csv files\n trainer.prepare_data(data, col_text=\"clean_text\",col_label=\"COMPETENCE\", train_size=0.7, val_size=0.15, test_size=0.15)\n\n #create a new column in data (\"clean_text\") with preprocessed header and body\n data = clean_mail(data,\"body\",\"header\")\n\n #train a model with default parameters\n trainer.train_model(model_name=\"default_model\")\n ```\n\n* Model predictions, evaluation and explaination\n ```python\n from classmail.classification.model import Model\n\n #load our model, saved in \"ressources\" folder\n model = Model(\"ressources\\\\model_default\")\n #predictions\n predictions=model.get_predictions(X_test)\n #confusion matrix\n model.plot_confusion_matrix(pred_labels=predictions, true_labels=y_test)\n #explain one exemple at index 110\n model.visualize_one_ex(X_test,y_test,index=110,num_features=6)\n #compute most discriminants words in each category\n sorted_contributions = model.get_statistical_explanation(X_test, [\"class 1\",\"class 2\",\"class 3\"] sample_size=15)\n #plot them for first class\n model.plot_discriminant_words(sorted_contributions, \"class 1\", nb_words=15)\n ```\n\n\n## Tutorial\n\nHere is a more complete usage exemple for the mail classification task. Data cannot be provided for legislation and privacy matters.\n\n* [Tutorial : General workflow](/Tutorial.ipynb)\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Fabien-Couthouis/ClassMail", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "classmail", "package_url": "https://pypi.org/project/classmail/", "platform": "", "project_url": "https://pypi.org/project/classmail/", "project_urls": { "Homepage": "https://github.com/Fabien-Couthouis/ClassMail" }, "release_url": "https://pypi.org/project/classmail/0.1/", "requires_dist": [ "Unidecode (>=1.0.23)", "flair (>=0.4.3)", "joblib (>=0.13.2)", "lime (>=0.1.1.36)", "matplotlib (>=3.1.0)", "numpy (>=1.16.3)", "pandas (>=0.24.2)", "seaborn (>=0.9.0)", "sklearn (>=0.0)", "torch (>=1.1.0)", "tqdm (>=4.31.1)", "wordcloud (>=1.5.0)" ], "requires_python": ">=3.6", "summary": "A simple framework for automatise mail classification task", "version": "0.1" }, "last_serial": 5775359, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "5a8bad55d950db11f9d703ae1224d690", "sha256": "1c3ad5c3a57d0b6f8413bf04a6e80da02cee717f08312d89ea44ac16558833a1" }, "downloads": -1, "filename": "classmail-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5a8bad55d950db11f9d703ae1224d690", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 30533, "upload_time": "2019-09-03T11:15:26", "url": "https://files.pythonhosted.org/packages/4c/71/0aeb5d43441a6b670cf881c0b6c2a9a72a5a361c110276b8e1008045644a/classmail-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6710ab21ee6917213d6da06757d2de33", "sha256": "a377fa900d7698f9b846a95c97bbfd866489490000b96187f4a853c244c5d594" }, "downloads": -1, "filename": "classmail-0.1.tar.gz", "has_sig": false, "md5_digest": "6710ab21ee6917213d6da06757d2de33", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 22955, "upload_time": "2019-09-03T11:15:30", "url": "https://files.pythonhosted.org/packages/88/4d/2f6f1b6daa242f6e75e10964c6a9d30ddd4a8c850dfada4814d8e081e1cb/classmail-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5a8bad55d950db11f9d703ae1224d690", "sha256": "1c3ad5c3a57d0b6f8413bf04a6e80da02cee717f08312d89ea44ac16558833a1" }, "downloads": -1, "filename": "classmail-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5a8bad55d950db11f9d703ae1224d690", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 30533, "upload_time": "2019-09-03T11:15:26", "url": "https://files.pythonhosted.org/packages/4c/71/0aeb5d43441a6b670cf881c0b6c2a9a72a5a361c110276b8e1008045644a/classmail-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6710ab21ee6917213d6da06757d2de33", "sha256": "a377fa900d7698f9b846a95c97bbfd866489490000b96187f4a853c244c5d594" }, "downloads": -1, "filename": "classmail-0.1.tar.gz", "has_sig": false, "md5_digest": "6710ab21ee6917213d6da06757d2de33", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 22955, "upload_time": "2019-09-03T11:15:30", "url": "https://files.pythonhosted.org/packages/88/4d/2f6f1b6daa242f6e75e10964c6a9d30ddd4a8c850dfada4814d8e081e1cb/classmail-0.1.tar.gz" } ] }