{ "info": { "author": "SylvanasSun", "author_email": "sylvanas.sun@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": ".. image:: info/logo_904x487.png\n :target: https://github.com/SylvanasSun/FishFishJump\n\n\\\n\n.. image:: https://img.shields.io/github/forks/SylvanasSun/FishFishJump.svg?style=social&label=Fork\n :target: https://github.com/SylvanasSun/FishFishJump\n.. image:: https://img.shields.io/github/stars/SylvanasSun/FishFishJump.svg?style=social&label=Stars\n :target: https://github.com/SylvanasSun/FishFishJump\n.. image:: https://img.shields.io/github/watchers/SylvanasSun/FishFishJump.svg?style=social&label=Watch\n :target: https://github.com/SylvanasSun/FishFishJump\n.. image:: https://img.shields.io/github/followers/SylvanasSun.svg?style=social&label=Follow\n :target: https://github.com/SylvanasSun/FishFishJump\n\n\\\n\n\n.. image:: https://img.shields.io/badge/Scrapy-1.4.0-blue.svg\n :target: https://github.com/scrapy/scrapy\n\n.. image:: https://img.shields.io/badge/Flask-0.12.2-blue.svg\n :target: https://github.com/pallets/flask\n\n.. image:: https://img.shields.io/badge/Redis-required-green.svg\n :target: https://redis.io/\n\n.. image:: https://img.shields.io/badge/Elasticsearch-required-green.svg\n :target: https://www.elastic.co/\n\n.. image:: https://img.shields.io/badge/MongoDB-required-green.svg\n :target: https://www.mongodb.com/\n\n.. image:: https://img.shields.io/badge/docker-support-green.svg\n :target: https://www.docker.com/\n\n\\\n\n.. image:: https://badges.frapsoft.com/os/mit/mit.svg?v=103)](https://opensource.org/licenses/mit-license.php\n :target: LICENSE\n\n.. 
image:: https://travis-ci.org/SylvanasSun/FishFishJump.svg?branch=master\n :target: https://travis-ci.org/SylvanasSun/FishFishJump\n\n.. image:: https://img.shields.io/pypi/pyversions/FishFishJump.svg\n :target: https://pypi.python.org/pypi/FishFishJump\n\n.. image:: https://img.shields.io/pypi/v/FishFishJump.svg\n :target: https://pypi.python.org/pypi/FishFishJump\n\n.. image:: https://img.shields.io/badge/version-0.2.3-brightgreen.svg\n :target: HISTORY.rst\n\n.. image:: https://img.shields.io/github/release/SylvanasSun/FishFishJump.svg\n :target: https://github.com/SylvanasSun/FishFishJump\n\n.. image:: https://img.shields.io/github/tag/SylvanasSun/FishFishJump.svg\n :target: https://github.com/SylvanasSun/FishFishJump\n\n.. image:: https://img.shields.io/github/issues/SylvanasSun/FishFishJump.svg\n :target: https://github.com/SylvanasSun/FishFishJump\n\n\\\n\n\u7b80\u4f53\u4e2d\u6587_\n\n.. _\u7b80\u4f53\u4e2d\u6587: README_CH.rst\n\nFishFishJump is a simple, basic solution for building search engines. It provides several demos that can each be deployed independently with Docker; these examples can help you implement your own customizable search engine site.\n\n.. 
image:: info/flow_chat.png\n\n- **fish_core**: Contains common utilities and components that the other modules depend on.\n\n- **fish_crawlers**: A demo of a distributed crawler built on scrapy-redis. It contains two Scrapy projects: master_crawler crawls links from http://dmoztools.net/ and pushes them onto a Redis queue, and slave_crawler takes links from that Redis queue, extracts information from each page, and stores it in MongoDB.\n\n- **fish_dashboard**: A Flask-based web app for monitoring the health status and information of Scrapy and Elasticsearch.\n\n- **fish_searcher**: A web app, built on Elasticsearch, that accepts search queries and returns search results to the user; it depends on the data crawled by fish_crawlers.\n\nUsage\n---------\n\nTo deploy each component independently, run the following command in the root directory of the project (the directory must contain the file docker-compose.yml):\n\n::\n\n docker-compose up -d --build\n\nFor more about Docker and docker-compose, please refer to: https://docs.docker.com/compose/\n\nNote: for fish_crawlers, you also need to enter the Docker container and deploy the Scrapy projects. FishFishJump uses Scrapyd for deployment, and the related configuration is in the file scrapy.cfg, for example:\n\n::\n\n # Automatically created by: scrapy startproject\n #\n # For more information about the [deploy] section see:\n # https://scrapyd.readthedocs.org/en/latest/deploy.html\n\n [settings]\n default = master_crawler.settings\n\n [deploy:master_crawler01]\n url = http://127.0.0.1:6800/\n project = master_crawler\n\nUse the following commands to deploy:\n\n::\n\n # enter the Docker container\n docker exec -it [container id] /bin/bash\n # deploy each Scrapy project with the command 'scrapyd-deploy [deploy name]'; the deploy name is defined in the file scrapy.cfg\n cd master_crawler\n scrapyd-deploy master_crawler01\n cd ..\n cd slave_crawler\n scrapyd-deploy slave_crawler01\n # start crawlers, project_name and spider_name refer to the 
file scrapy.cfg\n # The spider dmoz_crawler requires a Redis list whose key is dmoz_crawler:start_urls and whose value is http://dmoztools.net/\n # Example: redis-cli LPUSH dmoz_crawler:start_urls http://dmoztools.net/\n curl http://localhost:6800/schedule.json -d project=master_crawler -d spider=dmoz_crawler\n curl http://localhost:6800/schedule.json -d project=slave_crawler -d spider=simple_fish_crawler\n # exit from the Docker container\n exit\n\nFor more details, please refer to: https://github.com/scrapy/scrapyd-client\n\nBy the way, fish_crawlers uses local Redis and MongoDB instances run by Docker. If you don't want that, you can delete the following content from docker-compose.yml and configure your own Redis and MongoDB addresses in the Scrapy project (settings.py).\n\n::\n\n redis:\n image: redis\n container_name: FishFishJump_Redis\n ports:\n - \"6379:6379\"\n\n mongo:\n image: mongo\n container_name: FishFishJump_Mongo\n ports:\n - \"27017:27017\"\n\n links:\n - redis\n - mongo\n\n\nIf you do not want to use Docker, you need to start fish_crawlers, fish_dashboard, or fish_searcher manually with the following commands:\n\n::\n\n # first, install the dependency\n pip install FishFishJump\n # from the root directory of master_crawler\n scrapy crawl dmoz_crawler\n # from the root directory of slave_crawler\n scrapy crawl simple_fish_crawler\n # from the root directory of fish_dashboard or fish_searcher\n python app.py\n\nFor fish_crawlers, you can also use Scrapyd for deployment, or manage the crawlers remotely through fish_dashboard.\n\n\nDashboard\n---------\n\nfish_dashboard is a monitoring platform for the health status and information of Scrapy and Elasticsearch. It has several features that help you manage Scrapy and Elasticsearch, such as:\n\n- real-time data display, updated by ajax polling; if you don't want to use it, set the config option POLLING_INTERVAL_TIME to 0 to disable ajax polling.\n\n- fault alarm mechanism: fish_dashboard will send an alarm email to you when your Scrapy or 
Elasticsearch has not responded for a long time (that is, after the maximum number of failures is reached; see MAX_FAILURE_TIMES in settings.py).\n\n- data transfer mechanism: there are two ways to transfer data from MongoDB into Elasticsearch to build the search index. The first is a manual transfer, in which the data is transmitted in one pass while offline. The second is an automatic transfer based on a polling thread, which keeps transferring data from MongoDB into Elasticsearch until you cancel it.\n\nfish_dashboard is built on Flask. Its config file is settings.py in the root directory of fish_dashboard; you can also configure it through the command line interface, whose options are as follows:\n\n::\n\n Usage: fish_dashboard [options] args\n\n Command line param for FishFishJump webapp.\n\n Options:\n -h, --help show this help message and exit\n --host=HOST host address, default: 0.0.0.0\n --port=PORT port, default: 5000\n --username=ADMIN_USERNAME\n administrator username for login, default: admin\n --password=ADMIN_PASSWORD\n administrator password for login, default: 123456\n -d, --debug enable debug pattern of the flask, default: True\n -t, --test enable test pattern of the flask, default: False\n --cached-expire=CACHE_EXPIRE\n expire of the flask cache, default: 60\n --scrapyd-url=SCRAPYD_URL\n url of the scrapyd for connect scrapyd service,\n default: http://localhost:6800/\n -v, --verbose verbose that log info, default: False\n --log-file-dir=LOG_FILE_DIR\n the dir path of the where store log file, default:\n E:\\FishFishJump\\log\\\n --log-file-name=LOG_FILE_BASIS_NAME\n the name of the what log file, default:\n fish_fish_jump_webapp.log\n --elasticsearch-hosts=ELASTICSEARCH_HOSTS\n the string represent a host address for Elasticsearch,\n format: hostname:port and able to write multiple\n address by comma separated default: localhost:9200\n 
--polling-interval=POLLING_INTERVAL_TIME\n the time of the interval time for real-time dynamic\n update, units second default: 3\n --failure-sleep-time=FAILURE_SLEEP_TIME\n if connected fail will turn to this time window and\n return backup data in this time window, units second\n default: 30\n --max-failure-times=MAX_FAILURE_TIMES\n the number of the max failure times if occurred fail\n reaching the upper limit will sent message into the\n front-end, default: 5\n --max-failure-message-key=MAX_FAILURE_MESSAGE_KEY\n the string of the key for message sent after reaching\n the upper limit, default: timeout_error\n\n\nHere are some screenshots:\n\n.. image:: info/dashboard-01.png\n.. image:: info/dashboard-02.png\n.. image:: info/dashboard-03.gif\n.. image:: info/dashboard-04.gif\n\nSearcher\n---------\n\nfish_searcher is a web app built on Elasticsearch that accepts search queries and returns search results; it provides the basic functions of a search engine site.\n\n.. image:: info/searching.gif\n\n::\n\n Usage: fish_searcher [options] args\n\n Command line param for FishFishJump webapp.\n\n Options:\n -h, --help show this help message and exit\n --host=HOST host address, default: 0.0.0.0\n --port=PORT port, default: 5009\n -d, --debug enable debug pattern of the flask, default: True\n -t, --test enable test pattern of the flask, default: False\n -v, --verbose verbose that log info, default: False\n --log-file-dir=LOG_FILE_DIR\n the dir path of the where store log file, default:\n E:\\FishFishJump\\log\\\n --log-file-name=LOG_FILE_BASIS_NAME\n the name of the what log file, default:\n fish_fish_jump_searcher.log\n --elasticsearch-hosts=ELASTICSEARCH_HOSTS\n the string represent a host address for Elasticsearch,\n format: hostname:port and able to write multiple\n address by comma separated default: localhost:9200\n --elasticsearch-index=ELASTICSEARCH_INDEX\n the string represents a list of the index for query\n data from Elasticsearch, if you want to 
assign\n multiple please separate with a comma, for example,\n index_a,index_b, default: ['pages']\n --elasticsearch-doc-type=ELASTICSEARCH_DOC_TYPE\n the string represents a list of the doc_type for query\n data from Elasticsearch, if you want to assign\n multiple please separate with a comma, for example,\n doc_type_a, doc_type_b, default: ['page_item']\n --redis-cache enable Redis for external cache, default: False\n --redis-host=REDIS_HOST\n the string represents a host of the Redis and the\n configuration invalid when not set config --redis-\n cache, default: 127.0.0.1\n --redis-port=REDIS_PORT\n the string represents a port of the Redis and the\n configuration invalid when not set config --redis-\n cache , default: 6379", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/SylvanasSun/FishFishJump", "keywords": "FishFishJump python scrapy scrapy-redis search engines", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "FishFishJump", "package_url": "https://pypi.org/project/FishFishJump/", "platform": "", "project_url": "https://pypi.org/project/FishFishJump/", "project_urls": { "Homepage": "https://github.com/SylvanasSun/FishFishJump" }, "release_url": "https://pypi.org/project/FishFishJump/0.2.3/", "requires_dist": null, "requires_python": "", "summary": "FishFishJump is a solution that simply and basic for search engines and provide multiple demos that independent deployment by used Docker.", "version": "0.2.3" }, "last_serial": 3651544, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "5dc3960fc95b9943f2c6a7c36ffe6fdb", "sha256": "9bc0fa76c54c139d0ae4db37a6045deb923cac3e049c26f5b706eb34b5affa81" }, "downloads": -1, "filename": "FishFishJump-0.1.0.tar.gz", "has_sig": false, "md5_digest": "5dc3960fc95b9943f2c6a7c36ffe6fdb", "packagetype": "sdist", "python_version": "source", "requires_python": null, 
"size": 13722, "upload_time": "2017-11-10T10:48:00", "url": "https://files.pythonhosted.org/packages/46/0b/ef92336f1d24165e3ddc44095654eb5b17af49651066b03d5e1639fa3745/FishFishJump-0.1.0.tar.gz" } ], "0.1.0.1": [ { "comment_text": "", "digests": { "md5": "a74179917a6c20761ee1c44668ad55b9", "sha256": "214c2ff6ca63547aaf048d72a58b140231931bfa8ff26609cbbeb3f10ddebc4d" }, "downloads": -1, "filename": "FishFishJump-0.1.0.1.tar.gz", "has_sig": false, "md5_digest": "a74179917a6c20761ee1c44668ad55b9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13729, "upload_time": "2017-11-10T13:14:06", "url": "https://files.pythonhosted.org/packages/15/7f/7e743ea994e77d0411dd1bbf75faa8dc61a655a8d2f240d9b6eda51b3a9e/FishFishJump-0.1.0.1.tar.gz" } ], "0.1.0.2": [ { "comment_text": "", "digests": { "md5": "195e3c51f84a06aef45d77705800a31a", "sha256": "b2fdcfe89b1d1b6ff4d5a203eb91e3f56ad97ab2c4b1feca2e6a9bc2fa181319" }, "downloads": -1, "filename": "FishFishJump-0.1.0.2.tar.gz", "has_sig": false, "md5_digest": "195e3c51f84a06aef45d77705800a31a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 78999, "upload_time": "2017-11-10T13:23:43", "url": "https://files.pythonhosted.org/packages/bf/98/f619bce675c7e3c1d816ea7244aeca306e35e8c41c24f92844b2215706d5/FishFishJump-0.1.0.2.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "4e180e5d1497225e967b831f5d2b0104", "sha256": "38475f1a72b5c3f7954c1b28d2a983921987d482ae1ea567904c3aefe436899e" }, "downloads": -1, "filename": "FishFishJump-0.1.5.tar.gz", "has_sig": false, "md5_digest": "4e180e5d1497225e967b831f5d2b0104", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15108, "upload_time": "2017-11-16T11:54:02", "url": "https://files.pythonhosted.org/packages/d3/57/2e534c2c83e25776496bc5a0c12183f96d44a710ac209146e293c06c188d/FishFishJump-0.1.5.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": 
"71e84146c71b7e84cefa89457f33ff56", "sha256": "1310b1ceb397250d99a3c053f66ca33d654870e13d8e8539c22865a34d18dad1" }, "downloads": -1, "filename": "FishFishJump-0.2.0.tar.gz", "has_sig": false, "md5_digest": "71e84146c71b7e84cefa89457f33ff56", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4109314, "upload_time": "2017-12-28T09:33:22", "url": "https://files.pythonhosted.org/packages/c4/02/d081a09dfb380d81edee820287c0c2f48d816fb04ea4e01f410eb12ee971/FishFishJump-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "be324b287497968487851e704a571d9e", "sha256": "dc3f6985471862711fa04c17cfc739866a38b9abba46f319c955f9599ada32c3" }, "downloads": -1, "filename": "FishFishJump-0.2.1.tar.gz", "has_sig": false, "md5_digest": "be324b287497968487851e704a571d9e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4109426, "upload_time": "2017-12-28T10:46:12", "url": "https://files.pythonhosted.org/packages/a2/d4/84c4bbb49f7254b29b6aa83d0141de6bb9b6558d1a387b437aeebe1ae59f/FishFishJump-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "b4bcf944fd91919b4e040c34687ad903", "sha256": "a41d13a25ea9370db2ba1b7e55b68c40fd0a09bc2f6c53dc945f06c9342bff42" }, "downloads": -1, "filename": "FishFishJump-0.2.2.tar.gz", "has_sig": false, "md5_digest": "b4bcf944fd91919b4e040c34687ad903", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4371577, "upload_time": "2018-02-16T12:53:46", "url": "https://files.pythonhosted.org/packages/9d/11/3c38e226ce1aae8b23417a9c5a29a8524ef3cbe6cb9f3737f095d80857a8/FishFishJump-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "aa878b8c15207adf9983af4d3e3985ec", "sha256": "552831844aa00a395e9f9a16855c6c8819f53c78dcd35da84fe36055930a00b2" }, "downloads": -1, "filename": "FishFishJump-0.2.3.tar.gz", "has_sig": false, "md5_digest": "aa878b8c15207adf9983af4d3e3985ec", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 13261044, "upload_time": "2018-03-08T15:12:26", "url": "https://files.pythonhosted.org/packages/e2/df/d6a8ea9bdbce92c9b776d8af26a4524a56d14873f67ef9d71c02567e7497/FishFishJump-0.2.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "aa878b8c15207adf9983af4d3e3985ec", "sha256": "552831844aa00a395e9f9a16855c6c8819f53c78dcd35da84fe36055930a00b2" }, "downloads": -1, "filename": "FishFishJump-0.2.3.tar.gz", "has_sig": false, "md5_digest": "aa878b8c15207adf9983af4d3e3985ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13261044, "upload_time": "2018-03-08T15:12:26", "url": "https://files.pythonhosted.org/packages/e2/df/d6a8ea9bdbce92c9b776d8af26a4524a56d14873f67ef9d71c02567e7497/FishFishJump-0.2.3.tar.gz" } ] }