{ "info": { "author": "Chao Li", "author_email": "chaoli.job@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "XML/TRXML Selector\n==================\n\nDescription\n-----------\n\nThis package provides two scripts: ``mine-xml`` and\n``mine-trxml``.\n\n``mine-xml`` selects tags from xml/mxml files, and save the\nselected values to file.\n\n``mine-trxml`` selects fields from trxml/mtrxml files, and save\nthe selected values to file.\n\nStatus\n------------\n\n.. image:: https://travis-ci.org/tilaboy/xml-miner.svg?branch=master\n :target: https://travis-ci.org/tilaboy/xml-miner\n\n.. image:: https://readthedocs.org/projects/xml-miner/badge/?version=latest\n :target: https://xml-miner.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\n.. image:: https://pyup.io/repos/github/tilaboy/xml-miner/shield.svg\n :target: https://pyup.io/repos/github/tilaboy/xml-miner/\n :alt: Updates\n\nRequirements\n------------\n\nPython 3.6+\n\nInstallation\n------------\n\n::\n\n pip install xml-selector\n\n\nUsage\n-----\n\nUse xml selector script\n~~~~~~~~~~~~~~~~~~~~~~~\n\nThe xml selector supports:\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- one or more tagnames:\n\n- selector could be one tagname ``name``\n\n- or comma separated tagnames ``langskill,compskill,softskills``\n\n- multiple sources:\n\n- e.g. select from xml dir, xml files, mxml file, or directly from\n annotation server\n\nexamples:\n^^^^^^^^^\n\n::\n\n #select from xml directory\n mine-xml --source tests/xmls/ --selector name --output_file name.tsv\n mine-xml --source tests/xmls/ --selector langskill,compskill,softskill --output_file skill.tsv --with_field_name\n\n #select from xml file or mxml file\n mine-xml --source tests/sample.mxml --selector experience --output_file experience.tsv\n\n #select directly from annotation server\n mine-xml --source localhost:50249 --selector name --output_file name.tsv --query \"set Data2018\"\n\nUse trxml selector script\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe trxml selector supports:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- one or more selectors:\n\n- selector can be one field: ``name.0.name``\n\n- or comma separated fields: ``name.0.name,address.0.address``\n\n- single or multi item:\n\n- can select field from one item, e.g. ``experienceitem.3.experience``\n\n- or select field value of all item, e.g. ``experienceitem.experience``\n (or ``experienceitem.*.experience``)\n\n- multiple sources:\n\n- e.g. select from trxml dir, trxml files, or mtrxml file\n\nexamples:\n^^^^^^^^^\n\n::\n\n # one selector, single item\n mine-trxml --source tests/trxmls/ --selector name.0.name --output_file name.tsv\n\n # one selector, multiple item\n mine-trxml --source tests/sample.mxml --selector experienceitem.experience --output_file experience.tsv\n\n # more selectors, single item\n mine-trxml --source tests/trxmls/ --selector name.0.name,address.0.address,phone.0.phone --output_file personal.tsv\n\n # more selectors, multiple item\n mine-trxml --source tests/sample.mxml --itemgroup experienceitem --fields experience,experiencedate --output_file experience.tsv\n mine-trxml --source tests/sample.mxml --selector experienceitem.*.experience,experienceitem.*.experiencedate --output_file experience.tsv\n mine-trxml --source tests/sample.mxml --selector experienceitem.experience,experienceitem.experiencedate --output_file experience.tsv\n\nDevelopment\n-----------\n\nTo install package and its dependencies, run the following from project\nroot directory:\n\n::\n\n python setup.py install\n\nTo work the code and develop the package, run the following from project\nroot directory:\n\n::\n\n python setup.py develop\n\nTo run unit tests, execute the following from the project root\ndirectory:\n\n::\n\n python setup.py test\n\nselector and output details:\n----------------------------\n\n- mine-xml:\n\n input: documents, selector(s), output\n\n output:\n\n - default (parameter ``with_field_name`` not set):\n ``filename, field_value``\n\n e.g. select all names with selector ``name``\n\n +------------+-----------+\n | filename | value |\n +============+===========+\n | xxxx | Chao Li |\n +------------+-----------+\n\n - parameter ``with_field_name`` set:\n ``filename, field_value, field_name``\n\n e.g. select skills with selector ``compskill,langskill,otherskill``\n\n +------------+---------+-------------+\n | filename | value | field |\n +============+=========+=============+\n | xxxx | java | compskill |\n +------------+---------+-------------+\n | xxxx | dutch | langskill |\n +------------+---------+-------------+\n\n- mine-trxml\n\n - input:\n - documents, selector(s), output,\n - documents, itemgroup, fields, output\n\n - single selector:\n - single item (``name.0.name``): filename field\n\n +------------+---------------+\n | filename | name.0.name |\n +============+===============+\n | xxxx | Chao Li |\n +------------+---------------+\n\n - multi items (``skill.*.skill``): filename item\\_index field\n\n +------------+---------------+---------+\n | filename | item\\_index | field |\n +============+===============+=========+\n | xxxx | 0 | java |\n +------------+---------------+---------+\n | xxxx | 1 | dutch |\n +------------+---------------+---------+\n\n - multiple selectors\n - single item: filename, field1, field2 ...\n\n each selector points to a field of a specific item with a digital\n index, e.g. ``name.0.lastname,name.0.firstname,address.0.country``\n\n +------------+-------------------+--------------------+---------------------+\n | filename | name.0.lastname | name.0.firstname | address.0.country |\n +============+===================+====================+=====================+\n | xxxx | Li | Chao | China |\n +------------+-------------------+--------------------+---------------------+\n | xxxx | Lee | Richard | USA |\n +------------+-------------------+--------------------+---------------------+\n\n - multi items: filename, item\\_index, field1, field2 ...\n\n each selector points to a field from all items in an itemgroup, e.g.\n ``skill.skill,skill.type,skill.date``\n\n +------------+---------+---------+-------------+-------------+\n | filename | skill | skill | type | date |\n +============+=========+=========+=============+=============+\n | xxxx | 0 | java | compskill | 2001-2005 |\n +------------+---------+---------+-------------+-------------+\n | xxxx | 1 | dutch | langskill | 2002- |\n +------------+---------+---------+-------------+-------------+\n\n\n\n0.0.5 (2019-10-14)\n==================\n- bug fix: ElementTree xpath find will return a None if value is an empty string, restore to empty string\n\n0.0.4 (2019-09-11)\n==================\n- bug fix: reading always use utf8, and not continue reading if failed on encoding of one document\n\n0.0.3 (2019-08-11)\n==================\n- expand miner.py module to generate matched phrases per doc\n\n0.0.2 (2019-08-09)\n==================\n\n- added support for CI\n\n\n0.0.1 (2019-08-09)\n==================\n\n- make two script: mine-xml and mine-trxml\n\n\n0.0.0 (2019-08-06)\n==================\n\n- Add the first version of the mine_xml and mine_trxml\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tilaboy/xml-miner", "keywords": "data mining,xml", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "xml-miner", "package_url": "https://pypi.org/project/xml-miner/", "platform": "", "project_url": "https://pypi.org/project/xml-miner/", "project_urls": { "Homepage": "https://github.com/tilaboy/xml-miner" }, "release_url": "https://pypi.org/project/xml-miner/0.0.5/", "requires_dist": null, "requires_python": "", "summary": "data mining tool, to mine data from batch of xml files", "version": "0.0.5" }, "last_serial": 5969680, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "da0d741ac2657f84387ebc54b3eecbe8", "sha256": "12d435887976e098662515f81dec2efc3c1ecdfe9ee7d32138715ec5ef9db6cf" }, "downloads": -1, "filename": "xml_miner-0.0.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "da0d741ac2657f84387ebc54b3eecbe8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 19590, "upload_time": "2019-08-09T05:56:59", "url": "https://files.pythonhosted.org/packages/5e/c8/dd0461aba357e5ca37dcef309ba11cbe0b0e3951996085499dc3c4daf9c0/xml_miner-0.0.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e2184ec2b7ebafc5ea199b4dfba9d197", "sha256": "3efdd59ef922c2d48be76c45a6cf618d2fe64713011ab2c5a55528bb1538b31a" }, "downloads": -1, "filename": "xml_miner-0.0.2.tar.gz", "has_sig": false, "md5_digest": "e2184ec2b7ebafc5ea199b4dfba9d197", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23011, "upload_time": "2019-08-09T05:57:01", "url": "https://files.pythonhosted.org/packages/84/13/4fb9b76b3a9e536af2b1e50ab6d742597483f270ff0213235f72899eeb32/xml_miner-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "16e0cd092ac143cadc20ebceee718d2d", "sha256": "85af50d822a99f602b72fdc83c1b0bbd682aecc55deebf1895821715039f6c71" }, "downloads": -1, "filename": "xml_miner-0.0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "16e0cd092ac143cadc20ebceee718d2d", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 19962, "upload_time": "2019-08-11T07:40:02", "url": "https://files.pythonhosted.org/packages/70/b2/f8dd936ae0d6c9b719ad8c7ef41b726c96f91a0f87f1d10cc7feeea69d01/xml_miner-0.0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "933cf2a2256f43bb382293f163517a21", "sha256": "f69cc0e2e17f382730db54eaff9cbfe3c3505986cfdaa936d88af666945fc2a7" }, "downloads": -1, "filename": "xml_miner-0.0.3.tar.gz", "has_sig": false, "md5_digest": "933cf2a2256f43bb382293f163517a21", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23415, "upload_time": "2019-08-11T07:40:03", "url": "https://files.pythonhosted.org/packages/81/18/b6b7a0ca8a35bd3a53d2c21293b382ec5c5dc60bd9035b8a9c60c4c1a85e/xml_miner-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "3caedc887077398e30226a84233d0873", "sha256": "6b615e02e5b1178e9a0d1c69d7308c19edc211eb492d426ddc6d80a79d912d59" }, "downloads": -1, "filename": "xml_miner-0.0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3caedc887077398e30226a84233d0873", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20130, "upload_time": "2019-09-11T02:10:20", "url": "https://files.pythonhosted.org/packages/9c/43/df4bd152238cd080774637f236bc0d98668ba02eb0945da069c556e04845/xml_miner-0.0.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "797f3509f08b207c883e06d3a667f1d2", "sha256": "c19fed28361c22b0b5b769ad71c51331cd03c2804a956207566d1b9875baabd0" }, "downloads": -1, "filename": "xml_miner-0.0.4.tar.gz", "has_sig": false, "md5_digest": "797f3509f08b207c883e06d3a667f1d2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23533, "upload_time": "2019-09-11T02:10:22", "url": "https://files.pythonhosted.org/packages/d4/b2/35b60e8dd25b2543e9aedf19f078ac32e7b9378fa225689bfc5a2c1aa09c/xml_miner-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "74e02e2fb6272f241c29d7de8e19469c", "sha256": "2a3ab1f6eca7df8e72a404de063ccb15d977e8e651fcf1978846049e37642d31" }, "downloads": -1, "filename": "xml_miner-0.0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "74e02e2fb6272f241c29d7de8e19469c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20197, "upload_time": "2019-10-14T03:12:30", "url": "https://files.pythonhosted.org/packages/f6/eb/2af8d8f0d9a154f4345f53a8326ab7848463147a46d9aa4a072b48cfe34a/xml_miner-0.0.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf8a290733166329b030673d5ef53469", "sha256": "bdab76c295a8de22f8b6e78e4452fd70097d8e754e0c8b7252758672878636d2" }, "downloads": -1, "filename": "xml_miner-0.0.5.tar.gz", "has_sig": false, "md5_digest": "bf8a290733166329b030673d5ef53469", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23674, "upload_time": "2019-10-14T03:12:31", "url": "https://files.pythonhosted.org/packages/83/71/5c6ce0b5835b6bdd6f68252c2f4afbe19b786f441f8c4ca6df4d9849b04e/xml_miner-0.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "74e02e2fb6272f241c29d7de8e19469c", "sha256": "2a3ab1f6eca7df8e72a404de063ccb15d977e8e651fcf1978846049e37642d31" }, "downloads": -1, "filename": "xml_miner-0.0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "74e02e2fb6272f241c29d7de8e19469c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 20197, "upload_time": "2019-10-14T03:12:30", "url": "https://files.pythonhosted.org/packages/f6/eb/2af8d8f0d9a154f4345f53a8326ab7848463147a46d9aa4a072b48cfe34a/xml_miner-0.0.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf8a290733166329b030673d5ef53469", "sha256": "bdab76c295a8de22f8b6e78e4452fd70097d8e754e0c8b7252758672878636d2" }, "downloads": -1, "filename": "xml_miner-0.0.5.tar.gz", "has_sig": false, "md5_digest": "bf8a290733166329b030673d5ef53469", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23674, "upload_time": "2019-10-14T03:12:31", "url": "https://files.pythonhosted.org/packages/83/71/5c6ce0b5835b6bdd6f68252c2f4afbe19b786f441f8c4ca6df4d9849b04e/xml_miner-0.0.5.tar.gz" } ] }