{ "info": { "author": "Vit Novotny", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.4", "Topic :: Scientific/Engineering :: Mathematics", "Topic :: Text Processing :: Markup :: XML" ], "description": "NTCIR MIaS Search \u2013 Our search engine for the NTCIR Math tasks\n==============================================================\n[![CircleCI](https://circleci.com/gh/MIR-MU/ntcir-mias-search/tree/master.svg?style=shield)][ci]\n\n [ci]: https://circleci.com/gh/MIR-MU/ntcir-mias-search/tree/master (CircleCI)\n\nNTCIR MIaS Search is a Python 3 command-line utility that operates on top of\n[WebMIaS][] and that implements the Math Information Retrieval system that won\nthe NTCIR-11 Math-2 main task (see the [task paper][aizawaetal14-ntcir11], and\nthe [system description paper][ruzickaetal14-math]).\n\nExperimentally, NTCIR MIaS Search also reranks subquery results according to\nthe relevance probability estimates from the [NTCIR Math Density\nEstimator][ntcir-math-density] package.\n\n[aizawaetal14-ntcir11]: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.686.444&rep=rep1&type=pdf (NTCIR-11 Math-2 Task Overview)\n[ntcir-math-density]: https://github.com/MIR-MU/ntcir-math-density (NTCIR Math Density Estimator)\n[ruzickaetal14-math]: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings11/pdf/NTCIR/Math-2/07-NTCIR11-MATH-RuzickaM.pdf (Math Indexer and Searcher under the Hood: History and Development of a Winning Strategy)\n[WebMIaS]: https://github.com/MIR-MU/WebMIaS (WebMIaS)\n\nUsage\n=====\nInstalling\n----------\nThe package can be installed by executing the following command:\n\n $ pip install ntcir-mias-search\n\nDisplaying the usage\n--------------------\nUsage information for the package can be displayed by executing the 
following\ncommand:\n\n $ ntcir-mias-search --help\n usage: ntcir-mias-search [-h] --dataset DATASET --topics TOPICS --positions\n POSITIONS --estimates ESTIMATES --webmias-url\n WEBMIAS_URL\n [--webmias-index-number WEBMIAS_INDEX_NUMBER]\n [--num-workers-querying NUM_WORKERS_QUERYING]\n [--num-workers-merging NUM_WORKERS_MERGING]\n --output-directory OUTPUT_DIRECTORY\n\n Use topics in the NTCIR-10 Math, NTCIR-11 Math-2, and NTCIR-12 MathIR format\n to query the WebMIaS interface of the MIaS Math Information Retrieval system\n and to retrieve result document lists.\n\n optional arguments:\n -h, --help show this help message and exit\n --dataset DATASET A path to a directory containing a dataset in the\n NTCIR-11 Math-2, and NTCIR-12 MathIR XHTML5 format.\n The directory does not need to exist, since the path\n is only required for extracting data from the file\n with estimated positions of paragraph identifiers.\n --topics TOPICS A path to a file containing topics in the NTCIR-10\n Math, NTCIR-11 Math-2, and NTCIR-12 MathIR format.\n --positions POSITIONS \n The path to the file, where the estimated positions of\n all paragraph identifiers from our dataset were stored\n by the NTCIR Math Density Estimator package.\n --estimates ESTIMATES \n The path to the file, where the density, and\n probability estimates for our dataset were stored by\n the NTCIR Math Density Estimator package.\n --webmias-url WEBMIAS_URL\n The URL at which a WebMIaS Java Servlet has been\n deployed.\n --webmias-index-number WEBMIAS_INDEX_NUMBER\n The numeric identifier of the WebMIaS index that\n corresponds to the dataset. Defaults to 0.\n --num-workers-querying NUM_WORKERS_QUERYING\n The number of processes that will send queries to\n WebMIaS. Defaults to 1. Note that querying, reranking,\n and merging takes place simmultaneously.\n --num-workers-merging NUM_WORKERS_MERGING\n The number of processes that will rerank results.\n Defaults to 3. 
Note that querying, reranking, and\n merging takes place simmultaneously.\n --output-directory OUTPUT_DIRECTORY\n The path to the directory, where the output files will\n be stored.\n --plots PLOTS [PLOTS ...]\n The path to the files, where the evaluation results\n will plotted.\n\nQuerying WebMIaS\n----------------\nThe following command queries a local WebMIaS instance using 64 worker\nprocesses:\n\n $ mkdir search_results\n\n $ ntcir-mias-search --num-workers-querying 8 --num-workers-merging 56 \\\n > --dataset ntcir-11-12 \\\n > --topics NTCIR11-Math2-queries-participants.xml \\\n > --judgements NTCIR11_Math-qrels.dat \\\n > --estimates estimates.pkl.gz --positions positions.pkl.gz \\\n > --webmias-url http://localhost:58080/WebMIaS --webmias-index-number 1 \\\n > --plots plot.pdf plot.svg \\\n > --output-directory search_results\n Reading relevance judgements from NTCIR11_Math-qrels.dat\n 50 judged topics and 2500 total judgements in NTCIR11_Math-qrels.dat\n Reading topics from NTCIR11-Math2-queries-participants.xml\n 50 topics (NTCIR11-Math-1, NTCIR11-Math-2, ...) 
contain 55 formulae, and 113 keywords\n Establishing connection with a WebMIaS Java Servlet at http://localhost:58080/WebMIaS\n Reading paragraph position estimates from positions.pkl.gz\n 8301578 total paragraph identifiers in positions.pkl.gz\n Reading density, and probability estimates from estimates.pkl.gz\n Querying WebMIaSIndex(http://localhost:58080/WebMIaS, 1), reranking and merging results\n Using 3 strategies to aggregate MIaS scores with probability estimates:\n - The best possible score that uses relevance judgements (look for 'best' in filenames)\n - The original MIaS score with the probability estimate discarded (look for 'orig' in filenames)\n - The worst possible score that uses relevance judgements (look for 'worst' in filenames)\n Storing reranked per-query result lists in search_results\n Using 4 formats to represent mathematical formulae in queries:\n - Content MathML XML language (look for 'CMath' in filenames)\n - Combined Presentation and Content MathML XML language (look for 'PCMath' in filenames)\n - Presentation MathML XML language (look for 'PMath' in filenames)\n - The TeX language by professor Knuth (look for 'TeX' in filenames)\n Result list for topic NTCIR11-Math-9 contains only 188 / 1000 results, sampling the dataset\n Result list for topic NTCIR11-Math-17 contains only 716 / 1000 results, sampling the dataset\n Result list for topic NTCIR11-Math-26 contains only 518 / 1000 results, sampling the dataset\n Result list for topic NTCIR11-Math-39 contains only 419 / 1000 results, sampling the dataset\n Result list for topic NTCIR11-Math-43 contains only 924 / 1000 results, sampling the dataset\n get_results: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 50/50 [00:26<00:00, 1.88it/s]\n 
rerank_and_merge_results: 200it [01:02, 3.18it/s]\n Storing final result lists in mias_search_results\n 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 12/12 [00:13<00:00, 3.73it/s]\n Evaluation results:\n - best, PCMath: 0.5569\n - best, PMath: 0.5283\n - best, TeX: 0.5076\n - best, CMath: 0.4983\n - orig, PCMath: 0.4917\n - ...\n - orig, PMath: 0.4616\n - worst, CMath: 0.3080\n - worst, TeX: 0.2810\n - worst, PMath: 0.1156\n - worst, PCMath: 0.1141\n Plotting plot.svg\n Plotting plot.pdf\n\n $ ls search_results\n final_CMath.best.tsv\n final_CMath.orig.tsv\n final_CMath.worst.tsv\n final_PCMath.best.tsv\n final_PCMath.orig.tsv\n final_PCMath.worst.tsv\n final_PMath.best.tsv\n final_PMath.orig.tsv\n final_PMath.worst.tsv\n final_TeX.best.tsv\n final_TeX.orig.tsv\n final_TeX.worst.tsv\n NTCIR11-Math-10_CMath.1.query.txt\n NTCIR11-Math-10_CMath.1.response.xml\n NTCIR11-Math-10_CMath.1.results.best.tsv\n NTCIR11-Math-10_CMath.1.results.orig.tsv\n NTCIR11-Math-10_CMath.1.results.worst.tsv\n NTCIR11-Math-10_CMath.2.query.txt\n NTCIR11-Math-10_CMath.2.response.xml\n ...\n\nThe following command queries a [remote WebMIaS instance][WebMIaS-demo] using\n64 worker processes:\n\n $ mkdir search_results\n\n $ ntcir-mias-search --num-workers-querying 8 --num-workers-merging 56 \\\n > --dataset ntcir-11-12 \\\n > --topics NTCIR11-Math2-queries-participants.xml \\\n > --judgements NTCIR11_Math-qrels.dat \\\n > --estimates estimates.pkl.gz --positions positions.pkl.gz \\\n > --webmias-url https://mir.fi.muni.cz/webmias-demo --webmias-index-number 0 \\\n > --plots plot.pdf plot.svg \\\n > --output-directory search_results\n Reading relevance judgements from 
NTCIR11_Math-qrels.dat\n 50 judged topics and 2500 total judgements in NTCIR11_Math-qrels.dat\n Reading topics from NTCIR11-Math2-queries-participants.xml\n 50 topics (NTCIR11-Math-1, NTCIR11-Math-2, ...) contain 55 formulae, and 113 keywords\n Establishing connection with a WebMIaS Java Servlet at https://mir.fi.muni.cz/webmias-demo\n Reading paragraph position estimates from positions.pkl.gz\n 8301578 total paragraph identifiers in positions.pkl.gz\n Reading density, and probability estimates from estimates.pkl.gz\n Querying WebMIaSIndex(https://mir.fi.muni.cz/webmias-demo, 0), reranking and merging results\n Using 3 strategies to aggregate MIaS scores with probability estimates:\n - The best possible score that uses relevance judgements (look for 'best' in filenames)\n - The original MIaS score with the probability estimate discarded (look for 'orig' in filenames)\n - The worst possible score that uses relevance judgements (look for 'worst' in filenames)\n Storing reranked per-query result lists in search_results\n Using 4 formats to represent mathematical formulae in queries:\n - Content MathML XML language (look for 'CMath' in filenames)\n - Combined Presentation and Content MathML XML language (look for 'PCMath' in filenames)\n - Presentation MathML XML language (look for 'PMath' in filenames)\n - The TeX language by professor Knuth (look for 'TeX' in filenames)\n get_results: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 50/50 [05:29<00:00, 6.58s/it]\n rerank_and_merge_results: 200it [06:57, 2.09s/it]\n Storing final result lists in mias_search_results\n 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 12/12 [00:13<00:00, 3.73it/s]\n Evaluation results:\n - best, PCMath: 0.5569\n - best, PMath: 0.5283\n - best, TeX: 0.5076\n - best, CMath: 0.4983\n - orig, PCMath: 0.4917\n - ...\n - orig, PMath: 0.4616\n - worst, CMath: 0.3080\n - worst, TeX: 0.2810\n - worst, PMath: 0.1156\n - worst, PCMath: 0.1141\n Plotting plot.svg\n Plotting plot.pdf\n\n $ ls search_results\n final_CMath.best.tsv\n final_CMath.orig.tsv\n final_CMath.worst.tsv\n final_PCMath.best.tsv\n final_PCMath.orig.tsv\n final_PCMath.worst.tsv\n final_PMath.best.tsv\n final_PMath.orig.tsv\n final_PMath.worst.tsv\n final_TeX.best.tsv\n final_TeX.orig.tsv\n final_TeX.worst.tsv\n NTCIR11-Math-10_CMath.1.query.txt\n NTCIR11-Math-10_CMath.1.response.xml\n NTCIR11-Math-10_CMath.1.results.best.tsv\n NTCIR11-Math-10_CMath.1.results.orig.tsv\n NTCIR11-Math-10_CMath.1.results.worst.tsv\n NTCIR11-Math-10_CMath.2.query.txt\n NTCIR11-Math-10_CMath.2.response.xml\n ...\n\n[WebMIaS-demo]: https://mir.fi.muni.cz/webmias-demo/ (Web Math Indexer and Searcher)\n\nContributing\n============\n\nTo get familiar with the codebase, please consult the UML class diagram in the\n[Umbrello][www:Umbrello] project document [project.xmi](project.xmi):\n\n![Rendered UML class diagram](project.svg)\n\n[www:Umbrello]: https://umbrello.kde.org/ (Umbrello Project - Welcome to Umbrello - The UML Modeller)\n\nCiting NTCIR MIaS Search\n========================\nText\n----\nR\u016e\u017dI\u010cKA, Michal, Petr SOJKA and Martin L\u00cd\u0160KA. Math Indexer and Searcher under\nthe Hood: History and Development of a Winning Strategy. In Noriko Kando, Hideo\nJoho, Kazuaki Kishida. 
*Proceedings of the 11th NTCIR Conference on Evaluation\nof Information Access Technologies.* Tokyo: National Institute of Informatics,\n2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430 Japan, 2014. p. 127-134, 8 pp.\nISBN 978-4-86049-065-2.\n\nBibTeX\n------\n``` bib\n@inproceedings{mir:MIaSNTCIR-11,\n author = \"Michal R\\r{u}\\v{z}i\\v{c}ka and Petr Sojka and Martin L{\\' i}\\v{s}ka\",\n title = \"{Math Indexer and Searcher under the Hood:\n History and Development of a Winning Strategy}\",\n month = Dec,\n year = 2014,\n address = \"Tokyo\",\n booktitle = \"{Proc. of the 11th NTCIR Conference on Evaluation\n of Information Access Technologies}\",\n editor = \"Hideo Joho and Kazuaki Kishida\",\n publisher = \"{NII, Tokyo, Japan}\",\n pages = \"127--134\",\n}\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/MIR-MU/ntcir-mias-search", "keywords": "ntcir mias math_information_retrieval", "license": "MIT", "maintainer": "Vit Novotny", "maintainer_email": "", "name": "ntcir-mias-search", "package_url": "https://pypi.org/project/ntcir-mias-search/", "platform": "", "project_url": "https://pypi.org/project/ntcir-mias-search/", "project_urls": { "Homepage": "https://github.com/MIR-MU/ntcir-mias-search", "Source": "https://github.com/MIR-MU/ntcir-mias-search" }, "release_url": "https://pypi.org/project/ntcir-mias-search/0.2.2/", "requires_dist": [ "tqdm (~=4.23.3)", "lxml (~=4.2.1)", "matplotlib (~=2.2.2)", "numpy (~=1.14.3)", "requests (~=2.18.4)" ], "requires_python": "~= 3.4", "summary": " The MIaS Search package implements the Math Information Retrieval system that won the NTCIR-11 Math-2 main task (R\u016f\u017ei\u010dka et al., 2014). 
", "version": "0.2.2" }, "last_serial": 3999422, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": "73bb7ca268b29ac03d8ba778af7d398a", "sha256": "7373c39157f7accb07b7bd09765de65b88e7e2e479ee143a19425097cfcc64ba" }, "downloads": -1, "filename": "ntcir_mias_search-0.2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "73bb7ca268b29ac03d8ba778af7d398a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": "~= 3.4", "size": 37346, "upload_time": "2018-06-21T21:15:56", "url": "https://files.pythonhosted.org/packages/85/90/8f3e4ceb0552bad203a2a17134e68f14c572b492a3170faa274b381d1e73/ntcir_mias_search-0.2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7e5a4583a1402ce5475b92ea6a46ef58", "sha256": "10a60e3850fdf11804d29b2497a6f8fd62e16593f67d7d7b976798883b660c1c" }, "downloads": -1, "filename": "ntcir_mias_search-0.2.0.tar.gz", "has_sig": false, "md5_digest": "7e5a4583a1402ce5475b92ea6a46ef58", "packagetype": "sdist", "python_version": "source", "requires_python": "~= 3.4", "size": 22143, "upload_time": "2018-06-21T21:15:59", "url": "https://files.pythonhosted.org/packages/90/c1/5c238a84568fe60898bc97362e33ed9a02343e67692fe5bdb27cf35dcf88/ntcir_mias_search-0.2.0.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "0795dad99b0fd621919049773afb454e", "sha256": "4d6a9a6a4dbc59e31fec8f2d83e02b9633efee6d9944e2aba7b38e20e3102867" }, "downloads": -1, "filename": "ntcir_mias_search-0.2.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0795dad99b0fd621919049773afb454e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": "~= 3.4", "size": 38325, "upload_time": "2018-06-24T23:36:43", "url": "https://files.pythonhosted.org/packages/7d/02/581a729ca99f34f176a19b3de6b5c317ec5f3203113c9e5d832dc1d097e6/ntcir_mias_search-0.2.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b3922bc88d822275cf1a731d625641e3", "sha256": 
"f7acc3e32636b4df6011e7156a1b817c910ae77d572766b4c497ad6bfb5ee791" }, "downloads": -1, "filename": "ntcir_mias_search-0.2.2.tar.gz", "has_sig": false, "md5_digest": "b3922bc88d822275cf1a731d625641e3", "packagetype": "sdist", "python_version": "source", "requires_python": "~= 3.4", "size": 22484, "upload_time": "2018-06-25T07:55:20", "url": "https://files.pythonhosted.org/packages/c6/7f/e9176691123636975a4059914147e75ab0b09c67d173dcf16a229d46e5cf/ntcir_mias_search-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0795dad99b0fd621919049773afb454e", "sha256": "4d6a9a6a4dbc59e31fec8f2d83e02b9633efee6d9944e2aba7b38e20e3102867" }, "downloads": -1, "filename": "ntcir_mias_search-0.2.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0795dad99b0fd621919049773afb454e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": "~= 3.4", "size": 38325, "upload_time": "2018-06-24T23:36:43", "url": "https://files.pythonhosted.org/packages/7d/02/581a729ca99f34f176a19b3de6b5c317ec5f3203113c9e5d832dc1d097e6/ntcir_mias_search-0.2.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b3922bc88d822275cf1a731d625641e3", "sha256": "f7acc3e32636b4df6011e7156a1b817c910ae77d572766b4c497ad6bfb5ee791" }, "downloads": -1, "filename": "ntcir_mias_search-0.2.2.tar.gz", "has_sig": false, "md5_digest": "b3922bc88d822275cf1a731d625641e3", "packagetype": "sdist", "python_version": "source", "requires_python": "~= 3.4", "size": 22484, "upload_time": "2018-06-25T07:55:20", "url": "https://files.pythonhosted.org/packages/c6/7f/e9176691123636975a4059914147e75ab0b09c67d173dcf16a229d46e5cf/ntcir_mias_search-0.2.2.tar.gz" } ] }