{ "info": { "author": "icoxfog417", "author_email": "icoxfog417@yahoo.co.jp", "bugtrack_url": null, "classifiers": [], "description": "# chaFiC: chakki Financial Report Corpus\n\n[![PyPI version](https://badge.fury.io/py/chafic.svg)](https://badge.fury.io/py/chaFiC)\n[![Build Status](https://travis-ci.org/chakki-works/chaFiC.svg?branch=master)](https://travis-ci.org/chakki-works/chaFiC)\n[![codecov](https://codecov.io/gh/chakki-works/chaFiC/branch/master/graph/badge.svg)](https://codecov.io/gh/chakki-works/chaFiC)\n\nWe organized Japanese financial reports to encourage applying NLP techniques to financial analytics.\n\n## Dataset\n\nYou can download dataset by command line tool.\n\n```\npip install chafic\n```\n\nPlease refer the usage by `--` (using [fire](https://github.com/google/python-fire)).\n\n```\nchafic --\n```\n\nExample command.\n\n```bash\n# Download raw file version dataset of 2014.\nchafic download --kind F --year 2014\n\n# Extract business.overview_of_result part of TIS.Inc (sec code=3626).\nchafic parse business.overview_of_result --sec_code 3626\n\n# Tokenize text by Janome (Janome or Sudachi is supported).\npip install janome\nchafic tokenize --tokenizer janome\n\n# Show tokenized result (words are separated by \\t).\nhead -n 5 data/processed/2014/docs/S100552V_business_overview_of_result_tokenized.txt\n1 \u3010 \u696d\u7e3e \u7b49 \u306e \u6982\u8981 \u3011\n( 1 ) \u696d\u7e3e\n\u5f53 \u9023\u7d50 \u4f1a\u8a08 \u5e74\u5ea6 \u306b\u304a\u3051\u308b \u6211\u304c\u56fd \u7d4c\u6e08 \u306f \u3001 \u6d88\u8cbb \u7a0e\u7387 \u5f15\u4e0a\u3052 \u306b \u4f34\u3046 \u99c6\u3051\u8fbc\u307f \u9700\u8981 \u306e \u53cd\u52d5 \u3084 \u6d77\u5916 \u666f\u6c17 \u52d5\u5411 \u306b\u5bfe\u3059\u308b \u5148\u884c\u304d \u61f8\u5ff5 \u7b49 \u304b\u3089 \u5f31\u3044 \u52d5\u304d \u3082 \u898b \u3089\u308c \u307e\u3057 \u305f \u304c \u3001 \u4f01\u696d \u53ce\u76ca \u306e \u6539\u5584 \u7b49 \u306b\u3088\u308a \u5168\u4f53 ...\n```\n\n* About the parsable part, please refer the [`edinet-python`](https://github.com/chakki-works/edinet-python#2-extract-contents-from-xbrl).\n\n\n### Raw dataset file\n\nThe corpora are separated to each financial years.\n\n| fiscal_year | Raw file version (F) | Text extracted version (E) | \n|-------------|-------------------|-----------------|\n| 2014 | [.zip (9.3GB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_2014.zip) | [.zip (269.9MB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_extracted_2014.zip) | \n| 2015 | [.zip (9.8GB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_2015.zip) | [.zip (291.1MB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_extracted_2015.zip) | \n| 2016 | [.zip (10.2GB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_2016.zip) | [.zip (334.7MB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_extracted_2016.zip) | \n| 2017 | [.zip (9.1GB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_2017.zip) | [.zip (309.4MB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_extracted_2017.zip) | \n| 2018 | [.zip (10.5GB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_2018.zip) | [.zip (260.9MB)](https://s3-ap-northeast-1.amazonaws.com/chakki.esg.financial.jp/dataset/release/chakki_esg_financial_extracted_2018.zip) | \n\n\n## Statistics\n\n| fiscal_year | number_of_reports | has_csr_reports | has_financial_data | has_stock_data | \n|-------------|-------------------|-----------------|--------------------|----------------| \n| 2014 | 3,724 | 92 | 3,583 | 3,595 | \n| 2015 | 3,870 | 96 | 3,725 | 3,751 | \n| 2016 | 4,066 | 97 | 3,924 | 3,941 | \n| 2017 | 3,578 | 89 | 3,441 | 3,472 | \n| 2018 | 3,513 | 70 | 2,893 | 3,413 | \n\n* financial data is from [\u6c7a\u7b97\u77ed\u4fe1\u60c5\u5831](http://db-ec.jpx.co.jp/category/C027/).\n * We use non-cosolidated data if it exist.\n* stock data is from [\u6708\u9593\u76f8\u5834\u8868\uff08\u5185\u56fd\u682a\u5f0f\uff09](http://db-ec.jpx.co.jp/category/C021/STAT1002.html).\n * `close` is fiscal period end and `open` is 1 year before of it.\n\n## Content\n\n### Raw file version (`--kind F`)\n\nThe structure of dataset is following.\n\n```\nchakki_esg_financial_{year}.zip\n\u2514\u2500\u2500{year}\n \u251c\u2500\u2500 documents.csv\n \u2514\u2500\u2500 docs/\n```\n\n`docs` includes XBRL and PDF file.\n\n* XBRL file of annual reports (files are retrieved from [EDINET]).\n* PDF file of CSR reports (additional content).\n\n`documents.csv` has metadata like following.\n\n* edinet_code: `E0000X`\n* filer_name: `XXX\u682a\u5f0f\u4f1a\u793e`\n* fiscal_year: `201X`\n* fiscal_period: `FY`\n* doc_path: `docs/S000000X.xbrl`\n* csr_path: `docs/E0000X_201X_JP_36.pdf`\n\n### Text extracted version (`--kind E`)\n\nText extracted version includes `txt` files that match each part of an annual report. \nThe extracted parts are defined at [`edinet-python`](https://github.com/chakki-works/edinet-python#2-extract-contents-from-xbrl).\n\n```\nchakki_esg_financial_{year}_extracted.zip\n\u2514\u2500\u2500{year}\n \u251c\u2500\u2500 documents.csv\n \u2514\u2500\u2500 docs/\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/chakki-works/chaFiC", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "chafic", "package_url": "https://pypi.org/project/chafic/", "platform": "", "project_url": "https://pypi.org/project/chafic/", "project_urls": { "Homepage": "https://github.com/chakki-works/chaFiC" }, "release_url": "https://pypi.org/project/chafic/0.1.10/", "requires_dist": null, "requires_python": "", "summary": "chakki Financial Report Corpus", "version": "0.1.10" }, "last_serial": 5889304, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "92a51dcf88282fe4f3a36a0a2f33a4d0", "sha256": "c51d6fc2a04abbbe3b0c86efd6e40d65d8e0df1cacc63ab976c3b34e49654493" }, "downloads": -1, "filename": "chafic-0.1.0.tar.gz", "has_sig": false, "md5_digest": "92a51dcf88282fe4f3a36a0a2f33a4d0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3725, "upload_time": "2019-09-25T06:22:20", "url": "https://files.pythonhosted.org/packages/8f/b8/bf72317ce9d4cb3273e1fe036fbd5cadd3074136d35142829e3636acb002/chafic-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "c75f483d4fcaab810a2cdb94221a5a07", "sha256": "f9f201ee93abe095821b69be8b0818161465044260dbb5a25fc5c2f0feae8e29" }, "downloads": -1, "filename": "chafic-0.1.1.tar.gz", "has_sig": false, "md5_digest": "c75f483d4fcaab810a2cdb94221a5a07", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3717, "upload_time": "2019-09-25T06:26:18", "url": "https://files.pythonhosted.org/packages/ca/95/1d03ffc75ed6e1e93103cb48a460f08ba1385d806c1a6de54e4f5bd4ae57/chafic-0.1.1.tar.gz" } ], "0.1.10": [ { "comment_text": "", "digests": { "md5": "34303cb3a015df81a2d9854a478bf3e9", "sha256": "bf7b429a0e2577c06f09a87757921089005cccb09a494f617289be6de7b90a15" }, "downloads": -1, "filename": "chafic-0.1.10.tar.gz", "has_sig": false, "md5_digest": "34303cb3a015df81a2d9854a478bf3e9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9238, "upload_time": "2019-09-26T08:35:13", "url": "https://files.pythonhosted.org/packages/1e/ba/c8562f57289be992f91226efcecb1552899b02772c34d2d95cc3aaa58561/chafic-0.1.10.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "af0e581754b7824a8711bfcbbde96246", "sha256": "6b4e545668598192ac9cb511e01371ae87461385296826118ade5e8172e8e96c" }, "downloads": -1, "filename": "chafic-0.1.2.tar.gz", "has_sig": false, "md5_digest": "af0e581754b7824a8711bfcbbde96246", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8239, "upload_time": "2019-09-26T06:30:19", "url": "https://files.pythonhosted.org/packages/6d/13/501f4ed7a4f8115fc2705588d637d1662c36d47af8e4fc4c6e3c120f8604/chafic-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "7ec5e6601be4f0c6944a134940e8f226", "sha256": "6119bb641aaeae36e89e8931b83a1acfaa131e683865b745ae5fce9b7a8e4dcd" }, "downloads": -1, "filename": "chafic-0.1.3.tar.gz", "has_sig": false, "md5_digest": "7ec5e6601be4f0c6944a134940e8f226", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8281, "upload_time": "2019-09-26T06:35:04", "url": "https://files.pythonhosted.org/packages/6f/62/fae5d2200308ba0607309eda839bef21c317791f8dfa7c52703151cdf7b8/chafic-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "9a256581fc82988eefc03f37d8c6c85b", "sha256": "8767aa84ccc40a9c6fda479609d02cbefd6868824428132c43b8602905fddc4a" }, "downloads": -1, "filename": "chafic-0.1.4.tar.gz", "has_sig": false, "md5_digest": "9a256581fc82988eefc03f37d8c6c85b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8283, "upload_time": "2019-09-26T07:07:17", "url": "https://files.pythonhosted.org/packages/df/56/934157558945b9c0bc105a5fc799f1a1ee047400babab12e0afb539cf9e5/chafic-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "1a94755b3ed04a8a1970d5e62ca0c2b6", "sha256": "39021a1453b10693527476b1906a34e6e39b14dd81e563520ca724a7864f6d14" }, "downloads": -1, "filename": "chafic-0.1.5.tar.gz", "has_sig": false, "md5_digest": "1a94755b3ed04a8a1970d5e62ca0c2b6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8688, "upload_time": "2019-09-26T07:26:10", "url": "https://files.pythonhosted.org/packages/14/03/6d6b3369536d4cce2763e45a44ca9fbc4759b520b18a368b5791f48b880f/chafic-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "c1ae5170c5dbd14fc005b9d502c8c44d", "sha256": "e860bef6724dc3ff76237d6fd492180309baf9b66727b05c623ef852256f16be" }, "downloads": -1, "filename": "chafic-0.1.6.tar.gz", "has_sig": false, "md5_digest": "c1ae5170c5dbd14fc005b9d502c8c44d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8695, "upload_time": "2019-09-26T07:33:50", "url": "https://files.pythonhosted.org/packages/0e/6e/33236198782c4fc5a95877d0bc20f7a3a4754a8ea013528bb0203b060b3d/chafic-0.1.6.tar.gz" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "761600e95d903472f5a83c257f116318", "sha256": "622961482522e33bcd6a633ce6fba9e63e185a00618ac95481a2f9278435aaa9" }, "downloads": -1, "filename": "chafic-0.1.7.tar.gz", "has_sig": false, "md5_digest": "761600e95d903472f5a83c257f116318", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8703, "upload_time": "2019-09-26T07:46:38", "url": "https://files.pythonhosted.org/packages/0c/d7/c49e146920a4a974795e6158c2aa24661731242f2474f96f7dec1ac08909/chafic-0.1.7.tar.gz" } ], "0.1.8": [ { "comment_text": "", "digests": { "md5": "e2a560bba1cdd95e3e64b8119d7799ff", "sha256": "464e72b8cb379470dc67d4e54c186ddfd120170bfedabf01a55f2b948b8ac390" }, "downloads": -1, "filename": "chafic-0.1.8.tar.gz", "has_sig": false, "md5_digest": "e2a560bba1cdd95e3e64b8119d7799ff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8848, "upload_time": "2019-09-26T07:56:12", "url": "https://files.pythonhosted.org/packages/f1/75/0d0a0ab1a44c4c0380358740ac2505f6d1c41d2f3730637e4c825d34f705/chafic-0.1.8.tar.gz" } ], "0.1.9": [ { "comment_text": "", "digests": { "md5": "ef78722bb19617ba0de51b7f64563a23", "sha256": "b182b18cef92a6ec77ad85b89b25e04d31ee86689b92b02528bf51bddedae056" }, "downloads": -1, "filename": "chafic-0.1.9.tar.gz", "has_sig": false, "md5_digest": "ef78722bb19617ba0de51b7f64563a23", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9244, "upload_time": "2019-09-26T08:22:02", "url": "https://files.pythonhosted.org/packages/41/02/f359cc4b5cd8fe805a011b7ad5fcddfa28a3755cecc30fc003063e8e4685/chafic-0.1.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "34303cb3a015df81a2d9854a478bf3e9", "sha256": "bf7b429a0e2577c06f09a87757921089005cccb09a494f617289be6de7b90a15" }, "downloads": -1, "filename": "chafic-0.1.10.tar.gz", "has_sig": false, "md5_digest": "34303cb3a015df81a2d9854a478bf3e9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9238, "upload_time": "2019-09-26T08:35:13", "url": "https://files.pythonhosted.org/packages/1e/ba/c8562f57289be992f91226efcecb1552899b02772c34d2d95cc3aaa58561/chafic-0.1.10.tar.gz" } ] }