{
"info": {
"author": "ACEnglish",
"author_email": "acenglish@gmail.com",
"bugtrack_url": null,
"classifiers": [],
"description": "```\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2557\u2588\u2588\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2557 \u2588\u2588\u2557\u2588\u2588\u2557 \u2588\u2588\u2557 \u2588\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2557\n\u255a\u2550\u2550\u2588\u2588\u2554\u2550\u2550\u255d\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2551\n \u2588\u2588\u2551 \u2588\u2588\u2588\u2588\u2588\u2588\u2554\u255d\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2588\u2588\u2588\u2588\u2554\u255d\u2588\u2588\u2551\n \u2588\u2588\u2551 \u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2551 \u2588\u2588\u2551\u255a\u2588\u2588\u2557 \u2588\u2588\u2554\u255d\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2551\n \u2588\u2588\u2551 \u2588\u2588\u2551 \u2588\u2588\u2551\u255a\u2588\u2588\u2588\u2588\u2588\u2588\u2554\u255d \u255a\u2588\u2588\u2588\u2588\u2554\u255d \u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2551\n \u255a\u2550\u255d \u255a\u2550\u255d \u255a\u2550\u255d \u255a\u2550\u2550\u2550\u2550\u2550\u255d \u255a\u2550\u2550\u2550\u255d \u255a\u2550\u255d \u255a\u2550\u255d\u255a\u2550\u255d \u255a\u2550\u255d\u255a\u2550\u255d\n```\n\nStructural variant comparison tool for VCFs\n\nGiven benchmark and comparsion sets of SVs, calculate the recall, precision, and f-measure.\n\n[Spiral Genetics](https:www.spiralgenetics.com)\n\n[Motivation](https://docs.google.com/presentation/d/17mvC1XOpOm7khAbZwF3SgtG2Rl4M9Mro37yF2nN7GhE/edit)\n\nUPDATES\n=======\n\nTruvari has some big changes. 
To keep up with the retirement of Python 2.7 (https://pythonclock.org/), we now support only Python 3.\n\nAdditionally, we now package Truvari so it and its dependencies can be installed directly. See Installation \nbelow. This will enable us to refactor the code for easier maintenance and reusability.\n\nFinally, we now automatically report genotype comparisons in the summary stats.\n\nInstallation\n============\n\nTruvari uses Python 3.7 and can be installed with pip:\n\n  $ pip install Truvari \n\n\nQuick start\n===========\n\n  $ truvari -b base_calls.vcf -c compare_calls.vcf -o output_dir/\n\nOutputs\n=======\n\n * tp-call.vcf -- annotated true positive calls from the COMP\n * tp-base.vcf -- annotated true positive calls from the BASE\n * fn.vcf -- false negative calls from BASE\n * fp.vcf -- false positive calls from COMP\n * base-filter.vcf -- size-filtered calls from BASE\n * call-filter.vcf -- size-filtered calls from COMP\n * summary.txt -- JSON output of performance stats\n * log.txt -- run log\n * giab_report.txt -- (optional) Summary of GIAB benchmark calls. See \"Using the GIAB Report\" below.\n\nsummary.txt\n===========\n\nThe following stats are generated for benchmarking your call set.\n
| Metric | Definition |
\n|---|---|
\n| TP-base | Number of matching calls from the base vcf |
\n| TP-call | Number of matching calls from the comp vcf |
\n| FP | Number of non-matching calls from the comp vcf |
\n| FN | Number of non-matching calls from the base vcf |
\n| precision | TP-call / (TP-call + FP) |
\n| recall | TP-base / (TP-base + FN) |
\n| f1 | 2 * (recall * precision) / (recall + precision) |
\n| base cnt | Number of calls in the base vcf |
\n| call cnt | Number of calls in the comp vcf |
\n| base size filtered | Number of base vcf calls outside of (sizemin, sizemax) |
\n| call size filtered | Number of comp vcf calls outside of (sizemin, sizemax) |
\n| base gt filtered | Number of base calls not passing the no-ref parameter filter |
\n| call gt filtered | Number of comp calls not passing the no-ref parameter filter |
\n| TP-call_TP-gt | TP-call's with genotype match |
\n| TP-call_FP-gt | TP-call's without genotype match |
\n| TP-base_TP-gt | TP-base's with genotype match |
\n| TP-base_FP-gt | TP-base's without genotype match |
\n| gt_precision | TP-call_TP-gt / (TP-call_TP-gt + FP + TP-call_FP-gt) |
\n| gt_recall | TP-base_TP-gt / (TP-base_TP-gt + FN) |
\n| gt_f1 | 2 * (gt_recall * gt_precision) / (gt_recall + gt_precision) |
\n
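As an illustration of how these rows relate (a sketch of the definitions above, not code from the Truvari package; the function name and example counts are invented), the precision, recall, and f1 values can be recomputed from the raw counts:

```python
def summary_scores(tp_base, tp_call, fp, fn):
    """Recompute the performance rows of summary.txt from the raw counts.
    Illustrative helper only; the name and signature are assumptions."""
    precision = tp_call / (tp_call + fp)
    recall = tp_base / (tp_base + fn)
    # f1 is the harmonic mean of precision and recall
    f1 = 2 * (recall * precision) / (recall + precision)
    return {"precision": precision, "recall": recall, "f1": f1}

# e.g. 90 matched calls on each side, 10 false positives, 10 false negatives
scores = summary_scores(tp_base=90, tp_call=90, fp=10, fn=10)
```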
\n\nMethodology\n===========\n\n```\nInput:\n BaseCall - Benchmark TruthSet of SVs\n CompCalls - Comparison SVs from another program\nBuild IntervalTree of CompCalls\nFor each BaseCall:\n Fetch CompCalls overlapping within *refdist*. \n If typematch and LevDistRatio >= *pctsim* \\\n and SizeRatio >= *pctsize* and PctRecOvl >= *pctovl*: \n Add CompCall to list of Neighbors\n Sort list of Neighbors by TruScore ((2*sim + 1*size + 1*ovl) / 3.0)\n Take CompCall with highest TruScore and BaseCall as TPs\n Only use a CompCall once if not --multimatch\n If no neighbors: BaseCall is FN\nFor each CompCall:\n If not used: mark as FP\n```\n\nMatching Parameters\n--------------------\n\n| Anno | Definition |
\n|---|---|
\n| TruScore | Truvari score for similarity of match. `((2*sim + 1*size + 1*ovl) / 3.0)` |
\n| PctSeqSimilarity | Pct sequence similarity between this variant and its closest match |
\n| PctSizeSimilarity | Pct size similarity between this variant and its closest match |
\n| PctRecOverlap | Percent reciprocal overlap of the two calls' coordinates |
\n| StartDistance | Distance of this call's start from matching call's start |
\n| EndDistance | Distance of this call's end from matching call's end |
\n| SizeDiff | Difference in size(basecall) and size(compcall) |
\n| NumNeighbors | Number of comparison calls that were in the neighborhood (REFDIST) of the base call |
\n| NumThresholdNeighbors | Number of comparison calls that passed threshold matching of the base call |
\n
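The matching step in the Methodology can be sketched in a few lines (a simplified illustration of the README's formulas, not Truvari's actual implementation; the threshold defaults shown are assumptions):

```python
def tru_score(seq_sim, size_sim, rec_ovl):
    """TruScore as defined above: ((2*sim + 1*size + 1*ovl) / 3.0).
    Inputs are the three percent-similarity metrics, each in [0, 1]."""
    return (2 * seq_sim + 1 * size_sim + 1 * rec_ovl) / 3.0

def passes(seq_sim, size_sim, rec_ovl, pctsim=0.7, pctsize=0.7, pctovl=0.0):
    """Threshold check from the Methodology pseudocode.
    The default thresholds here are assumptions for illustration."""
    return seq_sim >= pctsim and size_sim >= pctsize and rec_ovl >= pctovl

# Rank candidate CompCalls for one BaseCall: keep those passing the
# thresholds, then take the one with the highest TruScore as the TP match.
candidates = [(0.95, 0.90, 0.80), (0.75, 0.99, 0.60)]
best = max((c for c in candidates if passes(*c)), key=lambda c: tru_score(*c))
```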
\n\nNumNeighbors and NumThresholdNeighbors are also added to the FN vcf.\n\nUsing the GIAB Report\n---------------------\n\nWhen running against the GIAB SV benchmark (link below), you can create a detailed report of \ncalls summarized by the GIAB VCF's SVTYPE, SVLEN, Technology, and Repeat annotations.\n\nTo create this report:\n\n1. Run truvari with the flag `--giabreport`.\n2. In your output directory, you will find a file named `giab_report.txt`.\n3. Next, make a copy of the \n[Truvari Report Template Google Sheet](https://docs.google.com/spreadsheets/d/1T3EdpyLO1Kq-bJ8SDatqJ5nP_wwFKCrH0qhxorvTVd4/edit?usp=sharing).\n4. Finally, paste ALL of the information inside `giab_report.txt` into the \"RawData\" tab. Be careful not \nto alter the report text in any way. If successful, the \"Formatted\" tab will contain a fully formatted report.\n\nWhile Truvari can use other benchmark sets, this formatted report currently only works with GIAB SV v0.5 and v0.6. Work\nwill need to be done to ensure Truvari can parse future GIAB SV releases.\n\n