{ "info": { "author": "Julian Hough, Tom Gurion, David Schlangen", "author_email": "julianchough@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "Intended Audience :: Telecommunications Industry", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Topic :: Multimedia :: Sound/Audio :: Speech", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Human Machine Interfaces" ], "description": "# Deep Learning Driven Incremental Disfluency Detection\n\nCode for Deep Learning driven incremental disfluency detection and related dialogue processing tasks.\n\n## Functionality ##\n\nThe deep disfluency tagger consumes words (and optionally, POS tags and word timings) word-by-word and outputs xml-style tags for each disfluent word, symbolising each part of any repair or edit term detected. The tags are:\n\n`` - an edit term word, not necessarily inside a repair structure\n\n`` - reparandum start word for repair with ID number N\n\n`` - mid-reparandum word for repair N\n\n`` - interregnum word for repair N\n\n`` - repair onset word for repair N (where N is normally the 0-indexed position in the sequence)\n\n`` - mid-repair word for repair N\n\n`` - repair end word for substitution or repetition repair N\n\n`` - repair end word for a delete repair N\n\nEvery repair detected or in the gold standard will have at least the `rms`, `rps` and `rpn`/`rpndel` tags, but the others may not be present.\n\nSome example output on Switchboard utterances is as below, where `` is the default tag for a fluent word:\n\n```\n\t4617:A:15:h\t\t1\tuh \tUH\t \n \t\t\t\t2\ti\t PRP\t \n \t\t\t\t3\tdont\t \tVBPRB\t \n \t\t\t\t4\tknow\t \tVB\t \n\n\t4617:A:16:sd\t\t1\tthe \tDT \n \t\t\t\t2\tthe\t DT\t \n \t\t\t\t3\tthings\t \tNNS\t \n \t\t\t\t4\tthey\t \tPRP\t \n \t\t\t\t5\tasked\t \tVBD \n \t\t\t\t6\tto\t TO\t \n \t\t\t\t7\ttalk\t \tVB\t \n \t\t\t\t8\tabout\t \tIN \n \t\t\t\t9\twere\t \tVBD\t \n \t\t\t\t10\twhether\t \tIN\t \n \t\t\t\t11\tthe\t DT\t \n \t\t\t\t12\tuh\t UH\t \n \t\t\t\t13\twhether\t \tIN\t \n \t\t\t\t14\tthe\t DT\t \n \t\t\t\t15\tjudge\t \tNN\t \n \t\t\t\t16\tshould\t \tMD\t \n \t\t\t\t17\tbe\t VB\t \n \t\t\t\t18\tthe\t DT\t \n \t\t\t\t19\tone\t NN\t \n \t\t\t\t20\tthat\t \tWDT\t \n \t\t\t\t21\tdoes\t \tVBZ\t \n \t\t\t\t22\tthe\t DT\t \n \t\t\t\t23\tuh\t UH\t \n\t\t\t\t24\tsentencing\tNN\t \n```\n\n## Set up and basic use ##\n\nTo run the code here you need to have `Python 2.7` installed, and also [`pip`](https://pip.readthedocs.org/en/1.1/installing.html) for installing the dependencies.\n\nYou need to run the below from the command line from inside this folder (depending on your user status, you may need to prefix the below with `sudo` or use a virtual environment):\n\n`pip install -r requirements.txt`\n\nIf you just want to use the tagger off-the-shelf see the usage in `demo.py` or the notebook `demo.ipynb`.\nMake sure this repository is on your system path if you want to use it in python more generally.\n\n### Use with live ASR ###\n\nIf you would like to run a live ASR version using the IBM Watson speech-to-text recognizer, you need to also do the following: \n\n1. Install [PortAudio](http://www.portaudio.com/) - a free, cross-platform, open-source, audio I/O library. Install it first.\n2. Prepare your credentials from IBM Watson (free trials are available):\n * Visit the [IBM Watson projects](https://console.bluemix.net/developer/watson/projects) page.\n * Choose your project.\n * Copy the credentials to `credentials.json` into this directory.\n\nThe ASR live streaming demo at `asr_demo.py` can then be run and you should be able to see the recognized words, timings, POS tags, and disfluency tags appearing in real time as you speak into your microphone.\n\n\n## Running experiments ##\n\nThe code can be used to run the experiments on Recurrent Neural Networks (RNNs) and LSTMs from:\n\n```\nJulian Hough and David Schlangen. Joint, Incremental Disfluency Detection and Utterance Segmentation from Speech. Proceedings of EACL 2017. Valencia, Spain, April 2017.\n```\n\nPlease cite [the paper](http://aclweb.org/anthology/E17-1031) if you use this code.\n\nIf you are using our pretrained models as in the usage in `demo.py` you can simply run `deep_disfluency/experiments/EACL_2017.py`, ensuring the boolean variables at the top of the file to:\n\n```python\ndownload_raw_data = False\ncreate_disf_corpus = False\nextract_features = False\ntrain_models = False\ntest_models = True\n```\n\nIf that level of reproducibility does not satisfy you, you can set all those boolean values to `True` (NB: be wary that training the models for each experiment in the script can take 24hrs+ even with a decent GPU).\n\nOnce the script has been run, running the Ipython notebook at `deep_disfluency/experiments/analysis/EACL_2017/EACL_2017.ipynb` should process the outputs and give similar results to those recorded in the paper.\n\n*Acknowledgments*\n\nThis basis of these models is the disfluency and dialogue act annotated Switchboard corpus, based on that provided by Christopher Potts's 2011 Computational Pragmatics course ([[at http://compprag.christopherpotts.net/swda.html]]) or at [[https://github.com/cgpotts/swda]]. Here we use Julian Hough's fork which corrects some of the POS-tags and disfluency annotation:\n\n[[https://github.com/julianhough/swda.git]]\n\nThe second basis is the word timings data for switchboard, which is a corrected version with word timing information to the Penn Treebank version of the MS alignments, which can be downloaded at:\n\n[[http://www.isip.piconepress.com/projects/switchboard/releases/ptree_word_alignments.tar.gz]]\n\n## Extra: using the Switchboard audio data ##\n\nIf you are satisfied just using lexical/POS/Dialogue Acts and word timing data alone, the above are sufficient, however if you want to use other acoustic data or generate ASR results from scratch, you must have access to the Switchboard corpus audio release. This is available for purchase from:\n\n[[https://catalog.ldc.upenn.edu/ldc97s62]]\n\nFrom the switchboard audio release, copy or move the folder which contains the .sph files (called `swbd1`) to within the `deep_disfluency/data/raw_data/` folder. Note this is very large at around 14GB.\n\n## Future: Creating your own data ##\n\nTraining data is created through creating dialogue matrices (one per speaker in each dialogue), whereby the format of these for each row in the matrix is as follows, where `,` indicates a new column, and `...` means there are potentially multiple columns:\n\n`word index, pos index, word duration, acoustic features..., lm features..., label`\n\nThere are methods for creating these in the `deep_disfluency/corpus` and `deep_disfluency/feature_extraction` modules.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/dsg-bielefeld/deep_disfluency", "keywords": "disfluency detection utterance segmentation deep machine learning", "license": "", "maintainer": "", "maintainer_email": "", "name": "deep-disfluency", "package_url": "https://pypi.org/project/deep-disfluency/", "platform": "", "project_url": "https://pypi.org/project/deep-disfluency/", "project_urls": { "Homepage": "https://github.com/dsg-bielefeld/deep_disfluency" }, "release_url": "https://pypi.org/project/deep-disfluency/0.0.1/", "requires_dist": [ "numpy", "scipy", "pandas", "theano", "scikit-learn", "gensim", "nltk", "python-crfsuite", "fluteline", "watson-streaming" ], "requires_python": "", "summary": "Deep Learning systems for training and testing disfluency detection and related tasks on speech data.", "version": "0.0.1" }, "last_serial": 4300539, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "855e281ec8f54e168434bbfdc7133c9d", "sha256": "8e7d30e19772ddb1033d4f88aae2905f507efc2e35788396e90d0f8aec29a79e" }, "downloads": -1, "filename": "deep_disfluency-0.0.1-py2-none-any.whl", "has_sig": false, "md5_digest": "855e281ec8f54e168434bbfdc7133c9d", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 61098560, "upload_time": "2018-09-22T21:14:13", "url": "https://files.pythonhosted.org/packages/34/31/488c86d4d22d10713cf46cc810e74730134a9d0b7c6fa1d0742bf709f4e9/deep_disfluency-0.0.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c902d934734b79b424ed0721fe300b31", "sha256": "664d4ba2dc59a13d80fcfd41d5209cffc705f37a99ef1d9a3fb755a7280b49ed" }, "downloads": -1, "filename": "deep_disfluency-0.0.1.tar.gz", "has_sig": false, "md5_digest": "c902d934734b79b424ed0721fe300b31", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 60529883, "upload_time": "2018-09-22T21:14:53", "url": "https://files.pythonhosted.org/packages/a5/da/9b604a71959a341956bca98f11d6fa87dd6e879196292ec2cc7fa7915ef6/deep_disfluency-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "855e281ec8f54e168434bbfdc7133c9d", "sha256": "8e7d30e19772ddb1033d4f88aae2905f507efc2e35788396e90d0f8aec29a79e" }, "downloads": -1, "filename": "deep_disfluency-0.0.1-py2-none-any.whl", "has_sig": false, "md5_digest": "855e281ec8f54e168434bbfdc7133c9d", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 61098560, "upload_time": "2018-09-22T21:14:13", "url": "https://files.pythonhosted.org/packages/34/31/488c86d4d22d10713cf46cc810e74730134a9d0b7c6fa1d0742bf709f4e9/deep_disfluency-0.0.1-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c902d934734b79b424ed0721fe300b31", "sha256": "664d4ba2dc59a13d80fcfd41d5209cffc705f37a99ef1d9a3fb755a7280b49ed" }, "downloads": -1, "filename": "deep_disfluency-0.0.1.tar.gz", "has_sig": false, "md5_digest": "c902d934734b79b424ed0721fe300b31", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 60529883, "upload_time": "2018-09-22T21:14:53", "url": "https://files.pythonhosted.org/packages/a5/da/9b604a71959a341956bca98f11d6fa87dd6e879196292ec2cc7fa7915ef6/deep_disfluency-0.0.1.tar.gz" } ] }