{ "info": { "author": "Cheney Ni", "author_email": "ncj19991213@126.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# py-stock-crawler\nA multithreaded web crawler that retrieves stock data from Yahoo Finance.\n\n## Usage\n\nExtremely easy to install!!! No extra C or binary libs required!!!\n\nOnly Python 3 is supported.\n\n```bash\npip install rookie-stock-crawler\n```\n\nFour lines are enough to create a multithreaded crawler and save the data locally as JSON files.\n\n```python\nfrom rookie_stock_crawler import StockCrawler\n\nsc = StockCrawler(['MSFT', 'GOOG', 'AMZN', 'AAPL'])\nsc.start()\nsc.save_all()\n```\n\nThe crawler object is iterable:\n\n```python\nfor item in sc:\n    print(item.get())\n```\n\n\n\n## Settings\n\n#### class StockCrawler(symbol_list, concurrent=5, auto_save=False, auto_sleep=None)[[Source]](https://github.com/nichujie/rookie-stock-crawler/blob/master/rookie_stock_crawler/__init__.py#L15)\n\n##### symbol_list\n\nA list containing the symbols you want to crawl. Each symbol must exist on Yahoo Finance; otherwise the crawler prints a message to notify you (it does not raise an exception).\n\n##### concurrent\n\nAn integer. The number of processes the program will start at one time to crawl data. Each process retrieves one stock.\n\n##### auto_save\n\n**True** or **False**. If set to **True**, the data for each stock is saved immediately after it is crawled.\n\n##### auto_sleep\n\n**None**, **True**, or any positive integer. This parameter sets how many seconds the crawler sleeps once Yahoo starts returning 404 responses (which means your client has reached its access limit).\n\nIf set to **True**, it sleeps for 600 seconds by default.\n\n## Utilities\n\nThe whole package is designed to be detachable. 
All methods and objects can be imported and used independently.\n\n#### class Stock[[Source]](https://github.com/nichujie/rookie-stock-crawler/blob/master/rookie_stock_crawler/stock.py#L11)\n\nThe main object stored in the crawler. The variable **item** in the code above is a Stock object, which means you can access its attributes directly or call its other methods.\n\n```python\nfrom rookie_stock_crawler.stock import Stock\n\nst = Stock('AAPL')\nst.retrieve()\nst.save()\nprint(st.get())\n```\n\nThis is an example of creating a single small crawler without multithreading.\n\n#### rookie_stock_crawler.utils[[Source]](https://github.com/nichujie/rookie-stock-crawler/blob/master/rookie_stock_crawler/utils.py)\n\nBreaking the modules down further, we can import the functions that **Stock** uses to retrieve data. They work almost like pure functions (as long as your home router doesn't explode).\n\n```python\nfrom rookie_stock_crawler.utils import get_financial, get_statistic, get_historical\n\nsymbol = 'AAPL'\nprint(get_financial(symbol))\nprint(get_statistic(symbol))\nprint(get_historical(symbol))\n```\n\nAll of these functions return a tuple of length 2. The first element is the stock data (a list or dict), and the second is the latest date of the data (e.g. the latest financial data of Apple Inc. was released on 2018-9-29).\n\n## Exceptions\n\nThis package does not offer any customized exceptions. However, all exceptions raised during crawling are caught and printed with a prefix tag like **\"[Error]\"**. 
This is by design, so that an error does not interrupt the crawl; an interruption would lose all the data unless the **auto_save** option is set to **True**.\n\nExceptions raised outside crawling will still interrupt the program.\n\n## Data Format\n\nAlmost raw JSON data from Yahoo Finance (parsed from the webpage source code).\n\nTo see a sample, please click [here](https://github.com/nichujie/rookie-stock-crawler/blob/master/json/AAPL%202019-02-09%2000:06:21.json).\n\nData format conversion tools will be introduced in later versions.\n\n## Special Instructions\n\nYahoo Finance no longer maintains its API or YQL query. As a result, we cannot know the exact limit on access frequency. The crawler gets the data by sending requests directly to the server, which is exactly the same as opening a browser and visiting the Yahoo website.\n\nIn other words, you cannot crawl a huge amount of data in a short time. It is already enough for individual developers and crawler fans. But if you want to go faster, the package also provides a distributed version to run on different servers. The whole solution includes a Django server and a front end. 
\n\nIf you are interested, you can visit my other repos, or else you can try other methods like fake-useragent (\ud83e\udd23that won't work) or global proxies(\ud83e\udd2amay also not work), etc.\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/nichujie/py-stock-crawler", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "rookie-stock-crawler", "package_url": "https://pypi.org/project/rookie-stock-crawler/", "platform": "", "project_url": "https://pypi.org/project/rookie-stock-crawler/", "project_urls": { "Homepage": "https://github.com/nichujie/py-stock-crawler" }, "release_url": "https://pypi.org/project/rookie-stock-crawler/0.1.0/", "requires_dist": [ "requests", "beautifulsoup4", "psycopg2", "psycopg2-binary" ], "requires_python": "", "summary": "A light weight tool to crawl stock data from yahoo finance.", "version": "0.1.0" }, "last_serial": 4796544, "releases": { "0.0.3": [ { "comment_text": "", "digests": { "md5": "cf4349179588d7e036ab2c52853c9c60", "sha256": "6fc7a2833f2badb3b7263a0ef017394a64f22cc67f22d3d26264518f0677f3a3" }, "downloads": -1, "filename": "rookie_stock_crawler-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "cf4349179588d7e036ab2c52853c9c60", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10267, "upload_time": "2019-02-08T17:13:53", "url": "https://files.pythonhosted.org/packages/56/fd/35be28fd09cf5e5ecb270ff4550f6ede8b8ce6a9af70c7f27a346b910f87/rookie_stock_crawler-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "36031d611af5cf2a1b201b17e457764e", "sha256": "5232370ae15a722dcdf8eeaac0632dfada5e3be54824d187313657cd6a1b5661" }, "downloads": -1, "filename": "rookie-stock-crawler-0.0.3.tar.gz", "has_sig": false, "md5_digest": "36031d611af5cf2a1b201b17e457764e", "packagetype": "sdist", "python_version": 
"source", "requires_python": null, "size": 7493, "upload_time": "2019-02-08T17:13:56", "url": "https://files.pythonhosted.org/packages/93/2c/20a8e58928efe6e4543e8ab8dc579f173025659967cd16696561f06eb5ed/rookie-stock-crawler-0.0.3.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "dd85f7aa59c6c41f9d0ae98001d5c36c", "sha256": "9babcabd0230add64175319f0f229413a8f9075b4adff141d729c38c6d59d841" }, "downloads": -1, "filename": "rookie_stock_crawler-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "dd85f7aa59c6c41f9d0ae98001d5c36c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10395, "upload_time": "2019-02-08T17:21:33", "url": "https://files.pythonhosted.org/packages/ca/cb/641771c93ef317ecb51e9ff34c956162481c7409ea0ab001bfedbd1a07cc/rookie_stock_crawler-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "813c4e97c89e25a306547616407d915d", "sha256": "4852bab56ae8c2487c152ec2964779fcc49e9c2e60ff568d44c29214c9ad1d21" }, "downloads": -1, "filename": "rookie-stock-crawler-0.1.0.tar.gz", "has_sig": false, "md5_digest": "813c4e97c89e25a306547616407d915d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7635, "upload_time": "2019-02-08T17:21:34", "url": "https://files.pythonhosted.org/packages/23/a3/320166f608e34e7506eb786932225030e88d76f1a3509e6603c789b896d1/rookie-stock-crawler-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "dd85f7aa59c6c41f9d0ae98001d5c36c", "sha256": "9babcabd0230add64175319f0f229413a8f9075b4adff141d729c38c6d59d841" }, "downloads": -1, "filename": "rookie_stock_crawler-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "dd85f7aa59c6c41f9d0ae98001d5c36c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10395, "upload_time": "2019-02-08T17:21:33", "url": 
"https://files.pythonhosted.org/packages/ca/cb/641771c93ef317ecb51e9ff34c956162481c7409ea0ab001bfedbd1a07cc/rookie_stock_crawler-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "813c4e97c89e25a306547616407d915d", "sha256": "4852bab56ae8c2487c152ec2964779fcc49e9c2e60ff568d44c29214c9ad1d21" }, "downloads": -1, "filename": "rookie-stock-crawler-0.1.0.tar.gz", "has_sig": false, "md5_digest": "813c4e97c89e25a306547616407d915d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7635, "upload_time": "2019-02-08T17:21:34", "url": "https://files.pythonhosted.org/packages/23/a3/320166f608e34e7506eb786932225030e88d76f1a3509e6603c789b896d1/rookie-stock-crawler-0.1.0.tar.gz" } ] }