{
"info": {
"author": "Jason Lewris, Steve To, Tyler Lewris",
"author_email": "datawrestler@gmail.com",
"bugtrack_url": null,
"classifiers": [
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3"
],
"description": "[](https://www.python.org/)\n[](https://travis-ci.org/datawrestler/sec-utils)\n[](https://badge.fury.io/py/secutils)\n\n#### Welcome to [secutils](https://github.com/datawrestler/sec-utils)\n**secutils** is a utility package to facilitate large bulk downloads of SEC documents. It works with any SEC document type and will retrieve the entire historical database if required. Multi-threaded file downloads are enabled in the command line utility.\n\nKey functionality includes:\n- Multi-threaded downloading\n- Caching of index files\n- Automatic directory structure buildout (i.e. downloading multiple file types w/dir structure: ftype --> year --> quarter --> files)\n- Resume downloading\n- Built in logging and download success tracker\n\nOverview of README:\n- [Motivation](#motivate)\n- [Installation](#install)\n- [Usage](#usage)\n- [Vision](#vision)\n\n##### Motivation \nsecutils picks up where a number of other repos left off. There are a couple SEC downloading python packages out there, however they are designed from retrieval of few documents. I needed a way to consistently download the latest updates from the SEC and secure a local copy of the entire history of the SEC for certain file types. This translates into TB's of documents, where networking, directory structure, logging, etc. issues arise. \n\nThere is a nice package available to download and construct index files, however the user is still left to download the actual files and must be comfortable with bash scripting. \n\nWith secutils the program handles files you have already retrieved, get's the missing files you don't have in your local archive, and continues. \n\nFor examples of other repos that exist: \n\n- [sec-edgar-downloader](https://github.com/jadchaar/sec-edgar-downloader)\n- [python-edgar](https://github.com/edouardswiac/python-edgar/)\n\nFurthermore, the hope of this package is to create parsers for repsective form types. A user could import the 10-K parser and call the Management Discussion and Analysis method to retrieve respective MD&A's from selected files. \n\nThere are also plans to integrate directly with popular cloud providers given the scale of these filings. Processing 10-K/Q's alone requires TB's of storage.\n\n##### Installation \nThere are two primary methods of installing sec-utils. The first is via the python packaging index (pypi). The second is straight from source. \n\nTo install from pypi:\n```bash\npip install secutils\n```\n\nAnd to install from source:\n```bash\ngit clone https://github.com/datawrestler/sec-utils && cd sec-utils\nconda create --name sec_env python=3.7 pip\nconda activate sec_env\npip install -r requirements.txt\npip install -e .\n```\n\n##### Usage \n```bash\nconda activate sec_env\npython download_sec.py --output_dir=/mnt/sda/sec --form_types=S-1 --num_workers=-1 --start_year=2014 --end_year=2019 --quarters 1 2 3 4\n```\nEven more cleanly, you can coordinate long running jobs and keep track of your parameters by modifying this [example script](https://github.com/datawrestler/sec-utils/blob/master/examples/run.sh)\n\nMake sure to make it executable on your system:\n```bash\nchmod +x run.sh\n./run.sh\n```\n\nYou can also generate a config file and use the config to control parameters of longer runs:\n```python\nfrom secutils.utils import generate_config\npath_for_config = ''\ngenerate_config(path_for_config)\n```\n\nthen when calling the longer download run:\n```bash\npython -m secutils.download_sec --config_path='path_for_config'\n```\n\nA useful trick when working with remote servers is to direct output from a session to a file. Using screen also maintains a session even if you disconnect from ssh:\n```bash\nscreen -dm -L python -m secutils.download_sec --config_path='path_for_config'\n```\n\nAdditionally, users can leverage the API directly for more hands on work. An overview resides in an [example jupyter notebook](https://github.com/datawrestler/sec-utils/blob/master/examples/Getting%20Started.ipynb) with additional details below:\n```python\nfrom secutils.edgar import FormIDX\nform = FormIDX(year=2017, quarter=1, seen_files=None, cache_dir=None, form_types=['10-K])\nfiles = form.index_to_files()\nform.master_index.head()\n# CIK\tCompany Name\tForm Type\tDate Filed\tFilename\tfname\n# 1000015\tMETA GROUP INC\t10-K\t1998-03-31\tedgar/data/1000015/0001000015-98-000009.txt\t0001000015-98-000009.txt\n# 1000112\tCHEVY CHASE MASTER CREDIT CARD TRUST II\t10-K\t1998-03-27\tedgar/data/1000112/0000920628-98-000038.txt\t0000920628-98-000038.txt\n# 1000179\tPARAMOUNT FINANCIAL CORP\t10-K\t1998-03-30\tedgar/data/1000179/0000950120-98-000108.txt\t0000950120-98-000108.txt\n\n# lets take a peek at attributes available to individual files:\nex = files[0]\nmsg = f\"\"\"\n Company Name: {ex.company_name}\n CIK Number: {ex.cik_number}\n Date Filed: {ex.date_filed}\n Form Type: {ex.form_type}\n File Name: {ex.file_name}\n Download URL: {ex.file_download_url}\n \"\"\"\nprint(msg) \n# Company Name: OPTICAL CABLE CORP\n# CIK Number: 1000230\n# Date Filed: 2017-12-20 00:00:00\n# Form Type: 10-K\n# File Name: 0001437749-17-020936.txt\n# Download URL: https://www.sec.gov/Archives/edgar/data/1000230/0001437749-17-020936.txt \n# get example file and download:\n# to download our example file:\noutput_dir = '.'\nex.download_file(output_dir) # 200 is a successful download\n\n# verify download \nimport os\nlist(filter(lambda x: x.endswith('txt'), os.listdir(output_dir)))\n# ['0001437749-17-020936.txt']\n```\n\nGetting hands on is great, however using the CLI does provide several advantages:\n- Automatic directory structure creation\n- Built in logging and caching\n- Ability to resume training via download scanning\n- Multi-threaded file downloading\n\n##### Vision \nThe vision for this project extends far beyond it's current state of downloading index and SEC files from the Edgar database. Currently, parsing SEC files is tremendously difficult. There are numerous reasons for these difficulties including:\n\n- No systematic tagging structure for SEC filings\n- File submissions changed over the years\n- Many different file types, header types, and content from one filing type to another\n\nGiven the above, parsing even a 10-K takes tremendous effort. The goal of this project is to bring together like minded individuals and take a stab at a systematic parsing effort with a consistent API. The future state of the project would allow users to download SEC filings and use convenient methods to retrieve particular sections of the filings. For instance, a user could do something like the following:\n\n```python\nfrom secutils.file_types import file_10k\n\nfile_path = '/path/to/10-K'\nf = file_10k.from_path(file_path)\n\n# and retrieve the management discussion and analysis section directly:\nf.management_discussion()\n# Here at XYZ company, we believe the following year will bring about great properity due to our R&D efforts in packages like secutils...\n```\nThis would open up a world of opportunity for collaboration, text analytics research, and general business information gathering. \n\n",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/datawrestler/sec-utils",
"keywords": "",
"license": "",
"maintainer": "",
"maintainer_email": "",
"name": "secutils",
"package_url": "https://pypi.org/project/secutils/",
"platform": "",
"project_url": "https://pypi.org/project/secutils/",
"project_urls": {
"Homepage": "https://github.com/datawrestler/sec-utils"
},
"release_url": "https://pypi.org/project/secutils/0.0.3/",
"requires_dist": null,
"requires_python": ">=3.6",
"summary": "Download SEC files in bulk",
"version": "0.0.3"
},
"last_serial": 5903759,
"releases": {
"0.0.1": [
{
"comment_text": "",
"digests": {
"md5": "305d2f9b6a6de1be6381515e34ab8330",
"sha256": "192a4a957dd6e836e1c12bf1d377367273bd90737c651df7e3cd58078d4e8968"
},
"downloads": -1,
"filename": "secutils-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "305d2f9b6a6de1be6381515e34ab8330",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 10993,
"upload_time": "2019-09-23T01:56:39",
"url": "https://files.pythonhosted.org/packages/13/94/d0bd110a31ae01868a8fa206717f9b4b83d83be5fe0510982868ede0b6be/secutils-0.0.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "e5e333b96f0a045743f2b9ccd1f6e151",
"sha256": "630c1c7908c8a25bd57348cc4a189861676022e8e71aff211eedc32b147ab7b6"
},
"downloads": -1,
"filename": "secutils-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "e5e333b96f0a045743f2b9ccd1f6e151",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 9475,
"upload_time": "2019-09-23T01:56:42",
"url": "https://files.pythonhosted.org/packages/ed/50/adbf4b84db17fcc1e05c010e08d2c36a1037d0ae25ea94b672bae0d7503c/secutils-0.0.1.tar.gz"
}
],
"0.0.2": [
{
"comment_text": "",
"digests": {
"md5": "dad99ad73fb7400e29c2abedd7b51884",
"sha256": "dd49c479ef74b7a4c9f335c7fd94b2ed6ca860c76bf122f7baa15e88c02a970f"
},
"downloads": -1,
"filename": "secutils-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dad99ad73fb7400e29c2abedd7b51884",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 11670,
"upload_time": "2019-09-23T04:37:56",
"url": "https://files.pythonhosted.org/packages/7d/b1/aa60b4b6ba907580afef623b71370a1d1dfc9c8b35fe3ec2ca71d0ed3639/secutils-0.0.2-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "e81f2ef066c879689a7536f89a3a5c18",
"sha256": "63617563a01ece5ef301edb20829997e0fe4bc54edccd279c5d37469ee10fe6c"
},
"downloads": -1,
"filename": "secutils-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "e81f2ef066c879689a7536f89a3a5c18",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 10465,
"upload_time": "2019-09-23T04:37:57",
"url": "https://files.pythonhosted.org/packages/80/2d/6d159dba3730a347b974c50db3d4013efc8c467993f956dcdc75dce9b0e5/secutils-0.0.2.tar.gz"
}
],
"0.0.3": [
{
"comment_text": "",
"digests": {
"md5": "48eedfdc6d105ed9dbfe2151c373fa15",
"sha256": "e5eddcd92d7ebfd8713f8c303dc7eb26fb1ecc64e17e261e7cabb209e44836b5"
},
"downloads": -1,
"filename": "secutils-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "48eedfdc6d105ed9dbfe2151c373fa15",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 12412,
"upload_time": "2019-09-29T20:25:52",
"url": "https://files.pythonhosted.org/packages/e5/4e/18b39f8dcf1e77ad1c86ee804f6ab8f55bd816bd6c1e411a54fcf4ab0f46/secutils-0.0.3-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "e55e61e7d42651e39e4f856a46e6b3e5",
"sha256": "e3570ea502c8d728f8a255663183e2f84171ebc2e29f709b424378f2bbc48a0c"
},
"downloads": -1,
"filename": "secutils-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "e55e61e7d42651e39e4f856a46e6b3e5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 13182,
"upload_time": "2019-09-29T20:25:54",
"url": "https://files.pythonhosted.org/packages/2d/a5/7070948ec4df0d1622006ca0083237296139d6b1033310e9c6164dcc4860/secutils-0.0.3.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "48eedfdc6d105ed9dbfe2151c373fa15",
"sha256": "e5eddcd92d7ebfd8713f8c303dc7eb26fb1ecc64e17e261e7cabb209e44836b5"
},
"downloads": -1,
"filename": "secutils-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "48eedfdc6d105ed9dbfe2151c373fa15",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 12412,
"upload_time": "2019-09-29T20:25:52",
"url": "https://files.pythonhosted.org/packages/e5/4e/18b39f8dcf1e77ad1c86ee804f6ab8f55bd816bd6c1e411a54fcf4ab0f46/secutils-0.0.3-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "e55e61e7d42651e39e4f856a46e6b3e5",
"sha256": "e3570ea502c8d728f8a255663183e2f84171ebc2e29f709b424378f2bbc48a0c"
},
"downloads": -1,
"filename": "secutils-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "e55e61e7d42651e39e4f856a46e6b3e5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 13182,
"upload_time": "2019-09-29T20:25:54",
"url": "https://files.pythonhosted.org/packages/2d/a5/7070948ec4df0d1622006ca0083237296139d6b1033310e9c6164dcc4860/secutils-0.0.3.tar.gz"
}
]
}