{
"info": {
"author": "Najia Ahmadi and Alexander Ott",
"author_email": "a.ott@student.uni-tuebingen.de",
"bugtrack_url": null,
"classifiers": [],
"description": "Copyright (c) 2018 QBiC\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n\nDescription: \n Valifor\n =============================\n \n __Valifor__ is a small easily extensible command line tool to validate different NGS file formats.\n Currently the following file formats are supported:\n * Fasta\n * Fastq\n \n Motivation\n ======================\n \n In any data driven workflow the quality of the data is of high importance. Especially with formats that get can be manually modified there is a chance for corrupted files that don't fit the format anymore. \n If corrupted files are able to enter the workflow undetected the whole system can break and may have to be reset, even if it doesn't break the result gotten at the end will not be usable and all the work done will have to be repeated.\n In the second case you may get an even worse situation if the mistake is not found quickly. All further work and conclusions building on the corrupted result might also be unusable. \n \n To combat this situation there are validators for different formats that are able to make sure they conform to all the defined rules. For example there is [FastaValidator](https://github.com/jwaldman/FastaValidator) and the [FastqValidator](https://github.com/statgen/fastQValidator) which both can validate their respective formats. There are also programs that not only validate certain formats but also allow to run analysis on them like [bamUtil](https://github.com/statgen/bamUtil)\n \n Yet there are no tools that concentrate on validating a broad range of formats that would allow to have a single program that validates all of the different files going into a workflow. Instead currently it is needed to have multiple tools which all need to be maintained. \n \n In order to address this issue we started creating *valifor* a command line application that will be able to validate different NGS file formats while it's currently a small start, it's easily extensible and a needed format can quickly be added.\n \n Table of Contents\n =======================\n \n * [The overall architecture](#toa)\n * [Installation](#install)\n * [The command line interface](#cli)\n * [Adding a new Format](#addFormat)\n \n The overall architecture \n =======================\n The *valifor* architecture is shown in the following graph: \n \n \n \n \n The basic structure is the factory pattern.\n \n Following the flow of information the command line interface receives the input of one ore multiple paths to files/directories to validate and optionally the format which it should check.\n This is used to start the validation process in the main logic. There the factory is used to get the correct validator for the needed format from the child-classes of AbsValidator.\n The AbsValidator is the base class for all implemented validators. Each of the child-classes has to implement the interface given in the parent. \n The validator given by the factory is then used to validate the files and it returns information about if the files are valid and if not where the format is broken. \n \n \n Installation \n =======================\n You can install *valifor* using the source files found on [Github](https://github.com/qbicsoftware/dmqb-grp1-2018)\n \n After you downloaded the directory you can call:\n \n $ pip install path_to_the_directory\n \n to install it on your system.\n \n \n Using Valifor: \n ========================\n \n Once *valifor* is installed, it can be used. \n You can get an overview on how to use it by adding the ``--help`` option in the command line. It will show an overview and explanation for *valifor*:\n \n $ valifor --help\n Usage: valifor [OPTIONS] [PATHS]...\n \n Welcome to the format validator Valifor:\n \n Valifor is a easily extensible validator for different formats. To get\n started you can call \"valifor --help\" to get this message again and more\n information to the options.\n \n To use valifor you can call:\n \n \"valifor [path_to_file]\" to check its format based on its\n file-ending.\n \n \"valifor [path_to_file] --format [format]\" to check its format\n based on the given format.\n \n \tOptions:\n \t -f, --format [fasta-dna|fasta-aa|fastq]\n \t Type of format to be tested for all given files\n \t -h, --help Show this message and exit.\n \n you can also test whole directories by giving the path to the directory:\n \n $ valifor [path_to_dir]\n \n or test multiple files and/or directories:\n \n $ valifor [path_to_dir] [path_to_file]\n \n \n #### An example use case: \n \n To use *valifor* to validate a fasta file call:\n \n $ valifor example.fasta --format fasta-dna\n \n Then it prints the result in the command line. In case of a valid file it prints the format tested and the name:\n \n $ valid: example.fasta - fasta-dna\n \n or in case of a corrupted file it additionally returns the reason if it is possible:\n \n $ failed: example.fasta - fasta-dna: Character [O] not allowed in sequence. At line: 3:10\n \n if the file or directory doesn't exist it prints a warning with the full path: \n \n Given path does not exist: /path/to/fileOrDir\n \n \n Adding a new Format: \n =======================\n \n To add a new format you have to write a corresponding validator of course. \n The new plug-in must consist of a class that inherits from the AbsValidator and overwrites all its functions following the description in the documentation of AbsValidator.\n (Additionally there should be a unittest-class with test-files.)\n \n With the new validator-class most of the work is finished the only thing left is to integrate it in the project.\n For this you only need add it in the functions of the validator-factory which is also in the validator module.\n \n You will find 4 functions:\n \n * available_formats(): \n \t* add the new name which the option in the CLI should have. \n * get_format_from_ending(file_ending):\n \t* add the conversion from the file-ending to the name of the option.\n * get_validator(name):\n \t* add the new class as a return for the new option.\n * def get_uncertain_endings():\n \t* if the file-ending is not enough to be completely certain about the exact format also add it here. \n \n And with that you are finished and have integrated a new format for this project.\n \n \n \n \nKeywords: validator,formats,bioinformatics,fasta,fastq\nPlatform: UNKNOWN\n",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/qbicsoftware/dmqb-grp1-2018",
"keywords": "",
"license": "MIT License",
"maintainer": "",
"maintainer_email": "",
"name": "Valifor",
"package_url": "https://pypi.org/project/Valifor/",
"platform": "",
"project_url": "https://pypi.org/project/Valifor/",
"project_urls": {
"Homepage": "https://github.com/qbicsoftware/dmqb-grp1-2018"
},
"release_url": "https://pypi.org/project/Valifor/1.0.0/",
"requires_dist": null,
"requires_python": "",
"summary": "A small validator for formats in the NCBI context",
"version": "1.0.0"
},
"last_serial": 4036828,
"releases": {
"1.0.0": [
{
"comment_text": "",
"digests": {
"md5": "cb8bdd458f84f5308cee45126a6229be",
"sha256": "0ed63a775167b4b15a9e5989f1456654e4e7a5aed892eefd8bd18c9279da7947"
},
"downloads": -1,
"filename": "Valifor-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "cb8bdd458f84f5308cee45126a6229be",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8559,
"upload_time": "2018-07-06T15:21:05",
"url": "https://files.pythonhosted.org/packages/f8/80/cfb71b1cbb1cc79ed1dd2c314c8bb76244250a83c6c6053afeec32e72c89/Valifor-1.0.0.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "cb8bdd458f84f5308cee45126a6229be",
"sha256": "0ed63a775167b4b15a9e5989f1456654e4e7a5aed892eefd8bd18c9279da7947"
},
"downloads": -1,
"filename": "Valifor-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "cb8bdd458f84f5308cee45126a6229be",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8559,
"upload_time": "2018-07-06T15:21:05",
"url": "https://files.pythonhosted.org/packages/f8/80/cfb71b1cbb1cc79ed1dd2c314c8bb76244250a83c6c6053afeec32e72c89/Valifor-1.0.0.tar.gz"
}
]
}