{ "info": { "author": "Almer Mendoza", "author_email": "amendoza@stratpoint.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# Gencon Miner\n\nA general content miner that uses Beautiful Soup and Requests to handle extraction. The core idea is to think in terms of targeting parent elements in an HTML document and then extracting the group of tags under each parent.\n\n```python\nfrom gencon_miner import GenconMiner\n```\n\n## From URL\n\n```python\nurl_miner = GenconMiner(url=\"http://google.com\")\ntxt = url_miner.extract('title')\nprint(txt[0].text) # Google\n```\n\n## From text\n\n```python\ntext_miner = GenconMiner(text=\"<div class='myclass'>Hello</div>\")\ntxt = text_miner.extract('.myclass')\nprint(txt[0].text) # Hello\n```\n\n## Convert all tag content to string\n\nNote that the contents of each tag are delimited by newlines.\n\n```python\nmeaning_of_life = \"\"\"\n\n Hello\n darkness my old friend\n
\n And another one\n\"\"\"\nbulk_miner = GenconMiner(text=meaning_of_life)\nprint(bulk_miner.to_text()) # Hello\\ndarkness my old friend\\nAnd another one\n```\n\n## Parent to target\n\nUse case: walking the document from a parent element and extracting its target tags.\n\n```python\nsong_of_the_day = \"\"\"\n| Mamma Mia | \nHere I go again | \nMy my | \nHow can I resist you | \n