{ "info": { "author": "Alecyrus", "author_email": "heyuangunia@gmail.com", "bugtrack_url": null, "classifiers": [ "Environment :: Web Environment", "Intended Audience :: Developers", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Topic :: Internet", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Indexing", "Topic :: Utilities" ], "description": "# AdIdentifier\n[![PyPI version](https://img.shields.io/pypi/pyversions/adidentifier.svg)](https://pypi.python.org/pypi/adidentifier)\n[![PyPI](https://img.shields.io/pypi/v/adidentifier.svg)](https://pypi.python.org/pypi/adidentifier)\n\n## Installation\nPrerequisites:\n* The re2 library from Google\n> \\# git clone https://github.com/google/re2.git & cd re2 & make & make install\n\n* The Python development headers \n> \\# apt-get install python-dev\n\n* Cython 0.20+ (pip install cython)\n> $ pip install cython\n\nAfter the prerequisites are installed, install as follows (pip3 for python3):\n> $ pip install https://github.com/andreasvc/pyre2/archive/master.zip\n\nor\n>$ git clone git://github.com/andreasvc/pyre2.git\n\n>$ cd pyre2\n\n>$ make install\n\nthen\n>$ pip install adidentifier\n\n## Usage\n\n### Import\n```python\nfrom adidentifier import AdIdentifier\n```\n### Initialize \n```python\n ad = AdIdentifier()\n```\n## API\n### is_finance(text)\nCheck whether the text or url is relevent to Finance.\n```python\n test1 = [\"\u901f\u8d37\u4e4b\u5bb6-\u501f\u94b1\u4e0d\u62c5\u5fc3_2\u5c0f\u65f6\u5230\u8d26\", \n \"https://www.aiqianzhan.com/html/register3_bd4.html?utm_source=bd4-pc-ss&utm_medium=bd4SEM&utm_campaign=D1-%BE%BA%C6%B7%B4%CA-YD&utm_content=%BE%BA%C6%B7%B4%CA-%C3%FB%B4%CA&utm_term=p2p%CD%F8%B4%FB\"]\n for test in test1:\n resu = ad.is_finance(text)\n print text,\"------->>\", resu\n```\n> Output:\n```\n\u901f\u8d37\u4e4b\u5bb6-\u501f\u94b1\u4e0d\u62c5\u5fc3_2\u5c0f\u65f6\u5230\u8d26 ------->> True\nhttps://www.aiqianzhan.com/html/register3_bd4.html?utm_source=bd4-pc-ss&utm_medium=bd4SEM&utm_campaign=D1-%BE%BA%C6%B7%B4%CA-YD&utm_content=%BE%BA%C6%B7%B4%CA-%C3%FB%B4%CA&utm_term=p2p%CD%F8%B4%FB ------->> True\n```\n### is_ad(url)\nCheck whether the url is relevent to AD\n```python\n test2 = [\"https://ss3.baidu.com/-rVXeDTa2gU2pMbgoY3K/it/u=3778907493,3669893773&fm=202&mola=new&crop=v1\",\n \"https://ss2.bdstatic.com/8_V1bjqh_Q23odCf/pacific/upload_25289207_1521622472509.png?x=0&y=0&h=150&w=242&vh=92.98&vw=150.00&oh=150.00&ow=242.00\",\n \"http://pagead2.googlesyndication.com/pagead/show_ads.js\",\n \"http://www.googletagservices.com/tag/js/gpt_mobile.js\"]\n for text in adtexts2:\n resu = ad.is_ad(text)\n print(text, \"------>>\", resu)\n```\n> Output:\n```\n('https://ss3.baidu.com/-rVXeDTa2gU2pMbgoY3K/it/u=3778907493,3669893773&fm=202&mola=new&crop=v1', '------>>', True)\n('https://ss2.bdstatic.com/8_V1bjqh_Q23odCf/pacific/upload_25289207_1521622472509.png?x=0&y=0&h=150&w=242&vh=92.98&vw=150.00&oh=150.00&ow=242.00', '------>>', True)\n('http://pagead2.googlesyndication.com/pagead/show_ads.js', '------>>', True)\n('http://www.googletagservices.com/tag/js/gpt_mobile.js', '------>>', False)\n```\n\n### get_target_from_href(href)\nExtract the target url from a hyperlink. eg. https://www.baidu.com/...%ASDD ----> https://www.wdzj.com/...1%E8%B4%B7\n\n```python\n print ad.get_target_from_href(\"https://www.baidu.com/baidu.php?url=0f0000jsnOdydCYpIY2xQXFCV1h5YmZnZh_pWjXI1sMrqQiM8Y55S59-6yXvznN6gm_5K2BIwOl4qzVcr2qRUIZdYnyTM2gOTAL-ed0xhaXP7ZI4XoxPJtWsnc4vPT3Qgcpo8dLTicCsAu_tZqqn5DH0sVytFArXV5kfFxBwLN5Kyia2R0.DD_NR2Ar5Od663rj6t8ae9zC63p_jnNKtAlEuw9zsISgZsIoDgQvTVxQgzdtEZ-LTEuzk3x5I9qxo9vU_5Mvmxgv3IhOj4en5VS8ZutEOOS1j4SrZdSyZxg9tqhZden5o3OOOqhZ1tT5ot_rSEj4en5ovmxgkl32AM-WI6h9ikX1BsIT7jHzlRL5spycTT5y9G4mgwRDkRAcY_1fdIT7jHzs_lTUQqRHAZ1tT5ot_rSEj4en5ovmxgkl32AM-CFhY_mx5ksSEzselt5M_sSEu9qx7i_nYQZu_LSr4f.U1Yk0ZDq1xBYSsKspynqn0KY5TL3V5_0pyYqnWcd0ATqmhRLn0KdpHdBmy-bIfKspyfqnWR0mv-b5Hckr0KVIjYknjDLg1DsnH-xnW0vn-t1PW0k0AVG5H00TMfqP1cz0ANGujYkPjmvg1cvnWR4g1cknH0Yg1cznHR40AFG5HcsP0KVm1YLPjDknjnknjIxP1fkPWckP1f1g1DkP1bkrHD1nHIxn0KkTA-b5H00TyPGujYs0ZFMIA7M5H00mycqn7ts0ANzu1Ys0ZKs5H00UMus5H08nj0snj0snj00Ugws5H00uAwETjYs0ZFJ5H00uANv5gKW0AuY5H00TA6qn0KET1Ys0AFL5HDs0A4Y5H00TLCq0ZwdT1YLPHTvnHnLPWTLrjmkPWmvnHfk0ZF-TgfqnHRzPHcYrH0knj0dPsK1pyfqrHNhmW-9m10snj0suARvrfKWTvYqPWD4PRuAPHc3Pbw7wj9arfK9m1Yk0ZK85H00TydY5H00Tyd15H00XMfqn0KVmdqhThqV5HKxn7tsg100uA78IyF-gLK_my4GuZnqn7tsg1Kxn0Ksmgwxuhk9u1Ys0AwWpyfqn0K-IA-b5iYk0A71TAPW5H00IgKGUhPW5H00Tydh5H00uhPdIjYs0AulpjYs0Au9IjYs0ZGsUZN15H00mywhUA7M5HD0UAuW5H00mLFW5HfsPHmv&us=0.0.0.0.0.0.0.101&ck=0.0.0.0.0.0.0.0&shh=www.baidu.com&sht=baidu\")\n```\n> Output:\n```shell\nhttps://www.wdzj.com/zhuanti/518lcj/?_pwk=n_4_1_1_1_3_5_4_s%E5%BF%85%E4%BA%89%E8%AF%8D|%E7%BD%91%E8%B4%B7|%E7%BD%91%E8%B4%B7&utm_source=baidu&utm_medium=cpc&tm_content=search&utm_campaign=%E7%BD%91%E8%B4%B7&utm_term=%E7%BD%91%E8%B4%B7\n```\n\n### get_domain_from_url(href)\nExtract the domain from a url . eg. https://www.asdasd.com/asdasd ----> www.asdasd.com\n\n```python\n print ad.get_domain_from_url(\"https://www.asdasd.com/asdasd\")\n```\n> Output:\n```shell\nwww.asdasd.com\n```\n\n\n## Config\nConfig will be generated automatically.\n```ini\n[CUSTOM]\nuri_keywords = qian,dai,cf,wd,jin\ntext_keywords = \u7f51\u8d37\nad_filter = https://ss3.baidu.com/*,https://ss2.bdstatic.com/*\n```\n\n## ATTENTION!!!\n\u8c03\u7528is_finance(),\u5224\u65ad\u94fe\u63a5\u662f\u5426\u662f\u91d1\u878d\u94fe\u63a5\u65f6\uff0c\u5fc5\u987b\u4f20\u5165 href \u8d85\u94fe\u63a5\u6307\u5411\u7684target\u5730\u5740\uff0c\u4e14\u683c\u5f0f\u5982\u540c`{scheme}://{domain}/{path}`,\u5176\u4e2d`path`\u53ef\u4ee5\u7701\u7565\u3002", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Alecyrus/AdIdentifier", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "adidentifier", "package_url": "https://pypi.org/project/adidentifier/", "platform": "", "project_url": "https://pypi.org/project/adidentifier/", "project_urls": { "Homepage": "https://github.com/Alecyrus/AdIdentifier" }, "release_url": "https://pypi.org/project/adidentifier/0.0.8/", "requires_dist": null, "requires_python": "", "summary": "AdIdentifier", "version": "0.0.8" }, "last_serial": 3826333, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "0768d49706ae7d46c29775ad98b87567", "sha256": "6946edfcdc583985170fe7757e278837c92b672602a480b5a4a4bb6a054c5ea6" }, "downloads": -1, "filename": "adidentifier-0.0.1.tar.gz", "has_sig": false, "md5_digest": "0768d49706ae7d46c29775ad98b87567", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2991, "upload_time": "2018-04-28T10:11:44", "url": "https://files.pythonhosted.org/packages/0b/5a/cbc5686205ec86732a57b8b77dd86a6cfa7846954c9d980de6573dc208a1/adidentifier-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "5f44db775653c474aad17d6ba26250d6", "sha256": "ec3ea1fce6a614cd3089e1df10e3c1fdd8acbc41ac507d83c4e5796631d9e79a" }, "downloads": -1, "filename": "adidentifier-0.0.2.tar.gz", "has_sig": false, "md5_digest": "5f44db775653c474aad17d6ba26250d6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3048, "upload_time": "2018-04-29T03:11:02", "url": "https://files.pythonhosted.org/packages/6f/5f/d248eed17a08987e6da173c84ba4b2b276ccb6e949709434a2cc15580a9f/adidentifier-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "c76071d18c5ef58ebbd217a38e4d93ae", "sha256": "9c5c1d539cea3d4189e6b27046b5b53d31307078a3d065962f38d9c16c54700b" }, "downloads": -1, "filename": "adidentifier-0.0.3.tar.gz", "has_sig": false, "md5_digest": "c76071d18c5ef58ebbd217a38e4d93ae", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 607883, "upload_time": "2018-04-29T03:32:48", "url": "https://files.pythonhosted.org/packages/94/81/dfcf14526c400013d44c7b9c5d5fde2c4730af55d530306f5f296b380919/adidentifier-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "51bcaabb023998a47419e535854a1f9b", "sha256": "c7c7b33a8c3e8469bf917f8155f2ec7658ede66368cc383300746918e1fb2c8f" }, "downloads": -1, "filename": "adidentifier-0.0.4.tar.gz", "has_sig": false, "md5_digest": "51bcaabb023998a47419e535854a1f9b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 607898, "upload_time": "2018-04-29T03:40:19", "url": "https://files.pythonhosted.org/packages/2e/9d/c0084f085c6b3264713118b4ef0fc99858490e5e6282f8cd1b3a343e0a24/adidentifier-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "f940d7bc563e2e25fa9c3246a2630e62", "sha256": "0ccb0264275c6e214de827245ca07ee4ac8a828d27ea94eb0d6cf39a3322aab3" }, "downloads": -1, "filename": "adidentifier-0.0.5.tar.gz", "has_sig": false, "md5_digest": "f940d7bc563e2e25fa9c3246a2630e62", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 609482, "upload_time": "2018-04-29T03:48:53", "url": "https://files.pythonhosted.org/packages/64/da/aef663d6e2d4fef0504b5adec34ae8e4d200106c4d2c7b29ecac9deb4860/adidentifier-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "e45d85b46f0bd6c7d11af646901459eb", "sha256": "d8ca54c77a67f9149ef3b624ffe7d960db2dec3118e4f1d2744bcde2e684335a" }, "downloads": -1, "filename": "adidentifier-0.0.6.tar.gz", "has_sig": false, "md5_digest": "e45d85b46f0bd6c7d11af646901459eb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 610202, "upload_time": "2018-05-01T05:46:45", "url": "https://files.pythonhosted.org/packages/75/27/f2c4ad916f659e0bd910e50c0b95c4d0e85d4b2b7ae744ee52354f2f69bd/adidentifier-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "68102c3086ca5476dbd958aa1619964b", "sha256": "bf1a057467dfffd6e66ea82c6a043b77ebb6f07516b0798fb80850f25d0617a5" }, "downloads": -1, "filename": "adidentifier-0.0.7.tar.gz", "has_sig": false, "md5_digest": "68102c3086ca5476dbd958aa1619964b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 613820, "upload_time": "2018-05-02T08:48:48", "url": "https://files.pythonhosted.org/packages/06/da/8e6ba8f1148367c9c40330ad4148d7101462c1668df15167b6a9c3310915/adidentifier-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "b8e5003bd673906a98bb564444c43474", "sha256": "46e2165dfc6f399677ba3be35d13f9dfbd437f3f5985a8758f419428b9725545" }, "downloads": -1, "filename": "adidentifier-0.0.8.tar.gz", "has_sig": false, "md5_digest": "b8e5003bd673906a98bb564444c43474", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 612036, "upload_time": "2018-05-02T09:06:36", "url": "https://files.pythonhosted.org/packages/b7/97/f002d7b60a6aa384988f2b5faafc3e4e847668aa23b4dd6b50256b70d890/adidentifier-0.0.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b8e5003bd673906a98bb564444c43474", "sha256": "46e2165dfc6f399677ba3be35d13f9dfbd437f3f5985a8758f419428b9725545" }, "downloads": -1, "filename": "adidentifier-0.0.8.tar.gz", "has_sig": false, "md5_digest": "b8e5003bd673906a98bb564444c43474", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 612036, "upload_time": "2018-05-02T09:06:36", "url": "https://files.pythonhosted.org/packages/b7/97/f002d7b60a6aa384988f2b5faafc3e4e847668aa23b4dd6b50256b70d890/adidentifier-0.0.8.tar.gz" } ] }