{
    "info": {
        "author": "Vahid Vaezian",
        "author_email": "vahid.vaezian@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 2.7"
        ],
        "description": "This package includes modules for finding links in a webpage and its child pages.\n\nIn the main module `find_links_by_extension`, links are found using two sub-modules and the results are then combined:\n\n1. Using Google search results (`get_links_using_Google_search`)  \n\nSince we can specify which file types we are looking for when we search in Google, this method scrapes those results.\nBut this method is not complete:  \n\na) Google search relies on crawlers, and sometimes they don't index pages properly. For example, [this][1] webpage has three PDF files at the moment (Aug 7 2018), but when we [use Google search][2] to find them, it finds only two, although the files were uploaded 4 years ago.  \n\nb) It doesn't work with some websites. For example, [this][3] webpage has three PDF files but Google [cannot find any][4]. \n\nc) If many requests are sent in a short period of time, Google blocks access and asks for a CAPTCHA to be solved.\n\n\n2. Using a direct method that finds all URLs in the given page, follows those links if they refer to child pages, and searches recursively (`get_links_directly`)  \n\nWhile this method does not miss any files in the pages that it reaches (in contrast to method 1, which sometimes does), it may not find all the files because:  \n\na) Some webpages in the domain may be isolated, i.e. there is no link to them in the parent pages. For these cases method 1 above works.  \n\nb) In rare cases the link to a file of type xyz may not have .xyz in the link ([example][5]). In these cases method 2 cannot detect the file (because it relies only on the extension appearing in the links), but method 1 detects it correctly.\n\nSo the two methods fill each other's gaps.\n\n\n[1]: http://www.midi.gouv.qc.ca/publications/en/planification/\n[2]: https://www.google.com/search?q=site%3Ahttp%3A%2F%2Fwww.midi.gouv.qc.ca%2Fpublications%2Fen%2Fplanification%2F+filetype%3Apdf\n[3]: http://www.sfu.ca/~vvaezian/Summary/\n[4]: https://www.google.com/search?q=site%3Ahttp%3A%2F%2Fwww.sfu.ca%2F~vvaezian%2FSummary%2F+filetype%3Apdf\n[5]: http://www.sfu.ca/~robson/Random\n\n\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/vvaezian/Web-Scraper",
        "keywords": "",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "web-scraper",
        "package_url": "https://pypi.org/project/web-scraper/",
        "platform": "",
        "project_url": "https://pypi.org/project/web-scraper/",
        "project_urls": {
            "Homepage": "https://github.com/vvaezian/Web-Scraper"
        },
        "release_url": "https://pypi.org/project/web-scraper/1.0/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "A package for getting data from the internet",
        "version": "1.0"
    },
    "last_serial": 4155099,
    "releases": {
        "1.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "58a1fdf6ce23d61e31242ced9d55c62d",
                    "sha256": "35f6600243771447ee726165cb8fd832ac4436b57ce7027fcf25cbb43da96686"
                },
                "downloads": -1,
                "filename": "web_scraper-1.0-py2-none-any.whl",
                "has_sig": false,
                "md5_digest": "58a1fdf6ce23d61e31242ced9d55c62d",
                "packagetype": "bdist_wheel",
                "python_version": "py2",
                "requires_python": null,
                "size": 10768,
                "upload_time": "2018-08-10T02:38:27",
                "url": "https://files.pythonhosted.org/packages/26/01/e3d461199c9341b7d39061c14b1af914654d00769241503a87f77505f95f/web_scraper-1.0-py2-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "bce6fd352d18e6eff36f5d5bbad38b1e",
                    "sha256": "ddb620311ebd618b3cee8ed6b08bf30f3813d710f9fef333852637152c00f702"
                },
                "downloads": -1,
                "filename": "web_scraper-1.0.tar.gz",
                "has_sig": false,
                "md5_digest": "bce6fd352d18e6eff36f5d5bbad38b1e",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 5694,
                "upload_time": "2018-08-10T02:38:28",
                "url": "https://files.pythonhosted.org/packages/b4/45/116acaa0e9242103e5c23cea4f368a5516d96386795994f9187b92015727/web_scraper-1.0.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "58a1fdf6ce23d61e31242ced9d55c62d",
                "sha256": "35f6600243771447ee726165cb8fd832ac4436b57ce7027fcf25cbb43da96686"
            },
            "downloads": -1,
            "filename": "web_scraper-1.0-py2-none-any.whl",
            "has_sig": false,
            "md5_digest": "58a1fdf6ce23d61e31242ced9d55c62d",
            "packagetype": "bdist_wheel",
            "python_version": "py2",
            "requires_python": null,
            "size": 10768,
            "upload_time": "2018-08-10T02:38:27",
            "url": "https://files.pythonhosted.org/packages/26/01/e3d461199c9341b7d39061c14b1af914654d00769241503a87f77505f95f/web_scraper-1.0-py2-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "bce6fd352d18e6eff36f5d5bbad38b1e",
                "sha256": "ddb620311ebd618b3cee8ed6b08bf30f3813d710f9fef333852637152c00f702"
            },
            "downloads": -1,
            "filename": "web_scraper-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "bce6fd352d18e6eff36f5d5bbad38b1e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5694,
            "upload_time": "2018-08-10T02:38:28",
            "url": "https://files.pythonhosted.org/packages/b4/45/116acaa0e9242103e5c23cea4f368a5516d96386795994f9187b92015727/web_scraper-1.0.tar.gz"
        }
    ]
}