{ "info": { "author": "Martin van Harmelen", "author_email": "Martin@vanharmelen.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.6" ], "description": "# naf2conll\nScript to convert coreference data in NAF format to [CoNLL format][].\n\n!! NB !! At the moment, this script only supports the following columns:\n\n - 1: Document ID\n - 3: Word number\n - 4: Word itself\n - 12: Coreference\n\nThe following CoNLL columns are supported by NAF, but are not (yet) processed (correctly) by this script:\n\n - 5: POS tag\n - 6: constituency tree\n - ...?\n - 11: named entities\n\nSee [CoNLL-specification.md][CoNLL format] for an extensive description of the CoNLL format.\n\n\n## Usage\n\n### `naf2conll.py`\nTo automatically find all (sub)folders that contain NAF files and convert all data in those folders, run:\n\n```sh\nnaf2conll.py path/to/output_dir -d path/to/some/folder [-d path/to/another/folder ...]\n```\n\nTo only convert one file, run:\n```sh\nnaf2conll.py path/to/output.conll path/to/input.naf\n```\n\n## Columns of CoNLL output\nBy default only Column 1, 3, 4 and 12 are output.\n\nIf you choose to output more columns, the following values and place-holders are used.\n\n| Column | Description | Value | Conform CoNLL specification?\n| ---: | --- | --- | ---\n| **1** | Document ID | file path without extension | Yes\n| 2 | Part number | `0` | Yes\n| **3** | Word number | generated | Yes\n| **4** | Word itself | extracted from text layer of NAF | Yes\n| 5 | POS | `[POS]` | No\n| 6 | Parse bit | `*` | No\n| 7 | Predicate lemma | `-` | Yes\n| 8 | Predicate Frameset ID | `-` | Yes\n| 9 | Word sense | `-` | Yes\n| 10 | Speaker/Author | `UNKNOWN` | ???\n| 11 | Named Entities | `*` | Yes\n| - | Predicate Arguments | None: column(s) left out entirely | Yes, conform 
example in CoNLL 2012\n| **12** | Coreference | extracted from coreference layer of NAF (ISSUE! \\[1\\]) | Yes\n\n\\[1\\]:\n The reference spans are not closed in the correct order if they end at the same word. The following is an example of output from `naf2conll.py`:\n\n (10\n -\n (52|(55\n 52)\n -\n 10)|55)|(133)\n\n The pedantically correct output would be:\n\n (10\n -\n (55|(52\n 52)\n -\n (133)|55)|10)\n\n\n# Issues\n\n - [ ] 'on_missing' config key is not validated before use\n - [ ] Raise an error when there is no coref layer in `extract_coref_sets`\n\n[CoNLL format]: https://github.com/cltl/FormatConversions/blob/master/mmax2conll/CoNLL-specification.md", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cltl/FormatConversions/tree/master/naf2conll", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "naf2conll", "package_url": "https://pypi.org/project/naf2conll/", "platform": "", "project_url": "https://pypi.org/project/naf2conll/", "project_urls": { "Homepage": "https://github.com/cltl/FormatConversions/tree/master/naf2conll" }, "release_url": "https://pypi.org/project/naf2conll/1.0.1/", "requires_dist": null, "requires_python": "", "summary": "Script to convert files in NAF format to CoNLL format", "version": "1.0.1" }, "last_serial": 5144667, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "e23c6446e3a90a99fe11be2d9884a0ed", "sha256": "3012f115fea148e6ab2187ddf4c5dd1f197b7089d04105100c4a0d685d5e2982" }, "downloads": -1, "filename": "naf2conll-1.0.0.zip", "has_sig": false, "md5_digest": "e23c6446e3a90a99fe11be2d9884a0ed", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19337, "upload_time": "2018-07-12T10:30:29", "url": 
"https://files.pythonhosted.org/packages/be/a6/54f527133b37e626f2ca0d4a219506dbf36fb9946fb70db1836651d982b1/naf2conll-1.0.0.zip" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "e053ee57e155d46db2fadded6df354da", "sha256": "caeb3b9474f49ee2cf61e6955a36a3c44784b96ef282d0d6907887d2dc00dc63" }, "downloads": -1, "filename": "naf2conll-1.0.1.zip", "has_sig": false, "md5_digest": "e053ee57e155d46db2fadded6df354da", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19373, "upload_time": "2019-04-15T13:15:36", "url": "https://files.pythonhosted.org/packages/52/6e/3641da068c13edf031b9735d5af145ec89cd9e089cd97c96138b75d27936/naf2conll-1.0.1.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e053ee57e155d46db2fadded6df354da", "sha256": "caeb3b9474f49ee2cf61e6955a36a3c44784b96ef282d0d6907887d2dc00dc63" }, "downloads": -1, "filename": "naf2conll-1.0.1.zip", "has_sig": false, "md5_digest": "e053ee57e155d46db2fadded6df354da", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19373, "upload_time": "2019-04-15T13:15:36", "url": "https://files.pythonhosted.org/packages/52/6e/3641da068c13edf031b9735d5af145ec89cd9e089cd97c96138b75d27936/naf2conll-1.0.1.zip" } ] }