{ "info": { "author": "Ed Summers", "author_email": "ehs@pobox.com", "bugtrack_url": null, "classifiers": [], "description": "\n\n*\u00e9tudier* is a small Python program that uses [Selenium] and [requests-html] to\ndrive a *non-headless* browser to collect a citation graph around a particular\n[Google Scholar] citation or set of search results. The resulting network is\nwritten out as a [Gephi] file and a [D3] visualization using [networkx]. *The D3\nvisualization could use some work, so if you add style to it please submit a\npull request.*\n\nIf you are wondering why it uses a non-headless browser it's because Google is\n[quite protective] of this data and routinely will ask you to solve a captcha\n(identifying street signs, cars, etc in photos). *\u00e9tudier* will allow you to\ncomplete these tasks when they occur and then will continue on its way\ncollecting data.\n\n### Install\n\nYou'll need to install [ChromeDriver] before doing anything else. If you use\nHomebrew on OS X this is as easy as:\n\n brew install chromedriver\n\nThen you'll want to install [Python 3] and:\n\n pip3 install etudier\n\n### Run\n\nTo use it you first need to navigate to a page on Google Scholar that you are\ninterested in, for example here is the page of citations that reference Sherry\nOrtner's [Theory in Anthropology since the Sixties]. Then you start *etudier* up\npointed at that page.\n\n % etudier 'https://scholar.google.com/scholar?start=0&hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc='\n\nIf you are interested in starting with keyword search results in Google Scholar\nyou can do that too. 
For example here is the url for searching for \"cscw memory\"\nif I were interested in papers that talk about the CSCW conference and memory:\n\n % etudier 'https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&q=cscw+memory&btnG='\n\nNote: it's important to quote the URL so that the shell doesn't interpret the\nampersands as an attempt to background the process.\n\n### --pages\n\nBy default *\u00e9tudier* will collect the 10 citations on that page and then look at\nthe top 10 citations that reference each one. So you will end up with no more\nthan 100 citations being collected (10 on each page * 10 citations).\n\nIf you would like to get more than one page of results use the `--pages` option. For\nexample this would result in no more than 400 (20 * 20) results being collected:\n\n % etudier --pages 2 'https://scholar.google.com/scholar?start=0&hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc='\n\n### --depth\n\nAnd finally if you would like to look at the citations of the citations you can use the\n`--depth` parameter.\n\n % etudier --depth 2 'https://scholar.google.com/scholar?start=0&hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc='\n\nThis will collect the initial set of 10 citations, the top 10 citations for\neach, and then the top 10 citations of each of those, so no more than 1000\ncitations (10 * 10 * 10). The total is an upper bound because there is certain to be some\nduplication of publications in the citations of each.\n\n### --output\n\nBy default a file called `output.gexf` will be written, but you can change this\nwith the `--output` option. 
The output file will contain rudimentary metadata\ncollected from Google Scholar including:\n\n- *id* - the cluster identifier assigned by Google\n- *url* - the url for the publication\n- *title* - the title of the publication\n- *authors* - a comma separated list of the publication authors\n- *year* - the year of publication\n- *cited-by* - the number of other publications that cite the publication\n- *cited-by-url* - a Google Scholar URL for the list of citing publications\n\n[Theory in Anthropology since the Sixties]: https://scholar.google.com/scholar?hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc=\n[Google Scholar]: https://scholar.google.com\n[Selenium]: https://docs.seleniumhq.org/\n[requests-html]: http://html.python-requests.org/\n[quite protective]: https://www.quora.com/Are-there-technological-or-logistical-challenges-that-explain-why-Google-does-not-have-an-official-API-for-Google-Scholar\n[Gephi]: https://gephi.org/\n[networkx]: https://networkx.github.io/\n[D3]: https://d3js.org/\n[Python 3]: https://www.python.org/downloads/\n[ChromeDriver]: https://sites.google.com/a/chromium.org/chromedriver/", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/edsu/etudier", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "etudier", "package_url": "https://pypi.org/project/etudier/", "platform": "", "project_url": "https://pypi.org/project/etudier/", "project_urls": { "Homepage": "https://github.com/edsu/etudier" }, "release_url": "https://pypi.org/project/etudier/0.0.7/", "requires_dist": null, "requires_python": ">=3", "summary": "Collect a citation graph from Google Scholar", "version": "0.0.7" }, "last_serial": 4802219, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "fe72b2cfb3e94bb79e342f1201271545", "sha256": 
"a46ba65f70f4882b646b861fbd37c44826919a0c452eefa0748e38b660381c11" }, "downloads": -1, "filename": "etudier-0.0.1.tar.gz", "has_sig": false, "md5_digest": "fe72b2cfb3e94bb79e342f1201271545", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4747, "upload_time": "2018-03-18T20:38:09", "url": "https://files.pythonhosted.org/packages/4d/36/6aaf549afe2cdd46e3ac4ce1415b106dbef77a72f26b52eb3448e12f0a86/etudier-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "cd543ab087b646ad548cc4c3c92d05af", "sha256": "1abbac66f42fb9f9bcf18e58c58dd9350f07e226133da8f13e4e5444dc53c462" }, "downloads": -1, "filename": "etudier-0.0.2.tar.gz", "has_sig": false, "md5_digest": "cd543ab087b646ad548cc4c3c92d05af", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5442, "upload_time": "2018-03-19T16:27:40", "url": "https://files.pythonhosted.org/packages/83/17/33e3becb5419e517fc4fa27bcec83aaa8be60fab36214fad5faef679f82a/etudier-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "86d3d0ebd33dbd0ec333a5e26569ffb8", "sha256": "55b60820cb97cd0c6067c53c442074de1ffa0fad7633c0fd0a3c2d72d27383d8" }, "downloads": -1, "filename": "etudier-0.0.3.tar.gz", "has_sig": false, "md5_digest": "86d3d0ebd33dbd0ec333a5e26569ffb8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5447, "upload_time": "2018-03-19T16:56:20", "url": "https://files.pythonhosted.org/packages/d3/08/51a1ef020224dde9adab5924fe1d728a173a9fcde910227e2313055e4f6b/etudier-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "86f43b0d07fcbd9f8a14a980da80ab02", "sha256": "91d852ab509e300235c4a54feef9cab7e8e0bddc7e5b90304be149cf28f18c11" }, "downloads": -1, "filename": "etudier-0.0.4.tar.gz", "has_sig": false, "md5_digest": "86f43b0d07fcbd9f8a14a980da80ab02", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5719, "upload_time": 
"2018-03-27T19:57:16", "url": "https://files.pythonhosted.org/packages/84/3f/5cd24e9e158028a311d3d3c141acc0b75635eaf8762a13ba681422565629/etudier-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "644f8c85693f4dc360119ee07ab07554", "sha256": "b9994345751ed74e0048e8088a2aa37977fa0c9a4c04036d05466ed96f4494ae" }, "downloads": -1, "filename": "etudier-0.0.5.tar.gz", "has_sig": false, "md5_digest": "644f8c85693f4dc360119ee07ab07554", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 6289, "upload_time": "2018-06-25T10:45:05", "url": "https://files.pythonhosted.org/packages/97/90/92544f71b70537dbe01e470e17a5db96c4d0fe094176e985a04e4373b795/etudier-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "927980a59ddf0c90cb348e31c9127ba2", "sha256": "bbb8225cf2999551874b3f45a3027eba3d68d6b765cf830d5a039b919bd25369" }, "downloads": -1, "filename": "etudier-0.0.6.tar.gz", "has_sig": false, "md5_digest": "927980a59ddf0c90cb348e31c9127ba2", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 6325, "upload_time": "2018-06-28T14:01:28", "url": "https://files.pythonhosted.org/packages/f9/31/6dcac24e4138524695db561fb65a06aa76226fb408e8b7a2bf664bd03f0d/etudier-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "ad01f9d4b47e08e9a94bd312b0759f5b", "sha256": "64f1e9345d01bffee68805520b2663c0843f28f2b34939daa7335cda1680c063" }, "downloads": -1, "filename": "etudier-0.0.7.tar.gz", "has_sig": false, "md5_digest": "ad01f9d4b47e08e9a94bd312b0759f5b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 6308, "upload_time": "2019-02-10T14:09:20", "url": "https://files.pythonhosted.org/packages/b8/f2/9f002e63ec85d94976a72ef3724ec76b2613d47d36da663614c8e8c06d40/etudier-0.0.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ad01f9d4b47e08e9a94bd312b0759f5b", "sha256": 
"64f1e9345d01bffee68805520b2663c0843f28f2b34939daa7335cda1680c063" }, "downloads": -1, "filename": "etudier-0.0.7.tar.gz", "has_sig": false, "md5_digest": "ad01f9d4b47e08e9a94bd312b0759f5b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 6308, "upload_time": "2019-02-10T14:09:20", "url": "https://files.pythonhosted.org/packages/b8/f2/9f002e63ec85d94976a72ef3724ec76b2613d47d36da663614c8e8c06d40/etudier-0.0.7.tar.gz" } ] }