{ "info": { "author": "refraction-ray", "author_email": "refraction-ray@protonmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# wos-statistics\n\nPython library for collecting data from Web of Science and exporting summaries of citation statistics tailored to various requirements. The crawling part is implemented with aiohttp for better speed.\n\n## Installation\n\nPython 3.6+ is supported.\n\n## Quick Start\n\n```python\nfrom pywos.crawler import WosQuery, construct_search\nfrom pywos.analysis import Papers\nimport asyncio\n\n# get data\nqd = construct_search(AI=\"D-3202-2011\", PY=\"2014-2018\") # construct the query for papers\nwq = WosQuery(querydict=qd) # create the crawler object based on the query\nloop = asyncio.get_event_loop()\ntask = asyncio.ensure_future(wq.main(path=\"data.json\")) # use main function of the object to download paper metadata and save them in the path\nloop.run_until_complete(task) # here we go\n\n# analyse data\np = Papers(\"data.json\") # fetch data from the path just now\np.show(['Last, First', 'Last, F.'],['flast@abcu.edu.cn'], ['2017','2018']) # generate the summary on citations in the form of pandas dataframe\n```\n\n## Usage\n\n### query part\n\nThe library downloads paper metadata based on a legitimate Web of Science query. You should provide a query dict to the crawler class. `value(select[n])` corresponds to the nth query condition, e.g. `AU` for the author's name, `AI` for the author identifier, `PY` for the publication year range, etc. `value(input[n])` corresponds to the nth query value, e.g. the name of the author or the year range 2012-2018. If there are multiple conditions, `value(bool_[m]_[n])` should also be added; its value is `AND`, `OR`, or `NOT`, indicating how to combine the search conditions. In addition, `fieldCount` should be set to the number of query conditions. 
A legitimate query looks like `{'fieldCount': 2, 'value(input1)': 'D-1234-5678', 'value(select1)': 'AI', 'value(input2)': '2014-2018', 'value(select2)': 'PY', 'value(bool_1_2)': 'AND'}`. A helper function, `construct_search`, is provided to build such a query dictionary easily for AND-connected queries.\n\n```python\nfrom pywos.crawler import construct_search\nconstruct_search(AI=\"D-1234-5678\", PY=\"2018-2018\")\n# return value below\n{'fieldCount': 2,\n 'value(bool_1_2)': 'AND',\n 'value(input1)': 'D-1234-5678',\n 'value(input2)': '2018-2018',\n 'value(select1)': 'AI',\n 'value(select2)': 'PY'}\n```\n\n### download part\n\nFirst, initialize the crawler object by providing the query dict and, optionally, a dict of headers used for all HTTP connections (a default user-agent is used otherwise).\n\n```python\nfrom pywos.crawler import WosQuery\nwq = WosQuery(querydict = {'value(input1)': '',...}, headers= {'User-Agent':'blah-blah'})\n```\n\nThe data collection task is started by `WosQuery.main(path=)`. All parameters are optional except `path`, which is the path where the output data is saved. `citedcheck` is a bool; if set to true, all papers citing each paper in the query are also collected. This is the basis for detailed citation analysis, such as citations by year and citations by others. The default value of `citedcheck` is false, in which case only the total citation count of each paper in the query can be obtained. The `limit` option gives the maximum number of connections in the HTTP connection pool; the default is 20. A larger number means faster downloads but also a higher risk of connection failure due to restrictions imposed by Web of Science. `limit=30` has been tested successfully without connection failure, and at that speed 1000 papers can be handled in around 1 minute. If the query task is large, the better practice is to set `savebyeach=True`, so that every paper in the query is saved immediately after downloading. 
Therefore, after a connection failure, the task can be resumed without fetching all data again. This is controlled by the `masklist` parameter of the main function. If `masklist` is provided, the paper corresponding to each integer in the list is skipped to avoid repeating work. In sum, for a large task, we have the following parameters.\n\n```python\nimport asyncio\ntask = asyncio.ensure_future(wq.main(path=\"prefix\", citedcheck=True, savebyeach=True, limit=30))\n```\n\nActually running the task is handled by asyncio, see below.\n\n```python\nloop = asyncio.get_event_loop()\n\ntry:\n loop.run_until_complete(task)\n\nexcept KeyboardInterrupt as e:\n asyncio.Task.all_tasks()\n asyncio.gather(*asyncio.Task.all_tasks()).cancel()\n loop.stop()\n loop.run_forever()\n\nfinally:\n loop.close()\n```\n\nTo see the download progress, enable logging.\n\n```python\nimport logging\nlogger = logging.getLogger('pywos')\nlogger.setLevel(logging.DEBUG)\nch = logging.StreamHandler()\nch.setLevel(logging.DEBUG)\nlogger.addHandler(ch)\n```\n\n### analysis part\n\nThe `Papers()` class is designed for analysing the metadata of the papers. To initialize the object, provide the path of the metadata saved using `WosQuery.main(path)`. One can also provide a list of paths, so that the data from all of these JSON files is imported. In addition, one can set `merge=True`, so that all files with the prefix `path-` are imported automatically; this is especially suitable for data files saved using `WosQuery.main(path, savebyeach=True)`.\n\nGenerate the citation analysis table by running `Papers.show(namelist, maillist, years)`. `namelist` and `maillist` are used to check whether one is the first or corresponding author of a paper, and citations within `years` are counted as recent citations. One can set `citedcheck=True` if the data to be analysed was obtained from `WosQuery.main(citedcheck=True)`. 
This includes further classification on citations in terms of years (recent citation) and authors (citation by others/self). The return object of `Papers.show()` is `pandas.DataFrame`, which can be easily transformed into other formats, including csv, html, tables in database and so on.\n\nIn sum, \n\n```python\nfrom pywos.analysis import Papers\np = Papers(\"path-prefix\", merge=True)\np.show([\"Last, First\"], [\"mail@server\"], [\"2018\"], citedcheck=True)\n```\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/refraction-ray/wos-statistics", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "pywos", "package_url": "https://pypi.org/project/pywos/", "platform": "", "project_url": "https://pypi.org/project/pywos/", "project_urls": { "Homepage": "https://github.com/refraction-ray/wos-statistics" }, "release_url": "https://pypi.org/project/pywos/0.0.1/", "requires_dist": [ "aiohttp (>=3.4)", "pandas", "beautifulsoup4" ], "requires_python": "", "summary": "Citation data export and analysis from web of science", "version": "0.0.1" }, "last_serial": 4578709, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "0a8014826a4ee7a05a83d5f4151a45e7", "sha256": "4daf95f3bfbcb05316a8fb486596e699fe61dc810ce6d3e99ead39be07905c03" }, "downloads": -1, "filename": "pywos-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "0a8014826a4ee7a05a83d5f4151a45e7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14511, "upload_time": "2018-12-10T03:36:51", "url": "https://files.pythonhosted.org/packages/ba/8a/a1588ffa02abb359132ba31e9e3242e8c7acc853433bcee751ab979044b3/pywos-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "523f79d4c1156ebea72f0d5bd4a38d61", "sha256": "a01ea4d7c83c2a16bc7a490c6844863033a9ab497ec9291b53c157b9b7461ec4" }, "downloads": 
-1, "filename": "pywos-0.0.1.tar.gz", "has_sig": false, "md5_digest": "523f79d4c1156ebea72f0d5bd4a38d61", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12375, "upload_time": "2018-12-10T03:36:54", "url": "https://files.pythonhosted.org/packages/19/15/ccd6bf05f2092a37c8513d634dad94bc40a7272af9ba8f90f1a7b1df8ed8/pywos-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0a8014826a4ee7a05a83d5f4151a45e7", "sha256": "4daf95f3bfbcb05316a8fb486596e699fe61dc810ce6d3e99ead39be07905c03" }, "downloads": -1, "filename": "pywos-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "0a8014826a4ee7a05a83d5f4151a45e7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14511, "upload_time": "2018-12-10T03:36:51", "url": "https://files.pythonhosted.org/packages/ba/8a/a1588ffa02abb359132ba31e9e3242e8c7acc853433bcee751ab979044b3/pywos-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "523f79d4c1156ebea72f0d5bd4a38d61", "sha256": "a01ea4d7c83c2a16bc7a490c6844863033a9ab497ec9291b53c157b9b7461ec4" }, "downloads": -1, "filename": "pywos-0.0.1.tar.gz", "has_sig": false, "md5_digest": "523f79d4c1156ebea72f0d5bd4a38d61", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12375, "upload_time": "2018-12-10T03:36:54", "url": "https://files.pythonhosted.org/packages/19/15/ccd6bf05f2092a37c8513d634dad94bc40a7272af9ba8f90f1a7b1df8ed8/pywos-0.0.1.tar.gz" } ] }