{
    "info": {
        "author": "polycracker team",
        "author_email": "sgordon@lbl.gov",
        "bugtrack_url": null,
        "classifiers": [
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 2",
            "Programming Language :: Python :: 2.7"
        ],
        "description": "## Quick summary\n\npolyCRACKER is an unsupervised machine learning approach to the classification and extraction\nof subgenomes from a set of genomic sequences provided in FASTA format. It is currently\ntailored to the analysis of moderately to recently derived allopolyploid species. It does not\nrequire training data or even prior knowledge of the number of subgenomes (although knowing it helps). It does require\nsome empirical testing, however, to determine the most likely number of subgenomes. \n\n#### polyCRACKER can be used to:\n\n1. **Identify subgenomes**  \n\n2. **Extract subgenomes**\n\n3. **Validate subgenomes**\n\n4. **Explore subgenomes relative to genomic features**\n\npolyCRACKER works by using repeat kmers (corresponding to viruses, transposons, and other\nselfish repetitive elements) as molecular barcodes for identifying species of origin. Because\nsuch repetitive sequences evolve quickly and copy themselves throughout the genome of one species\n(but not of other closely related species), they can be used to group subsequences by species\nof origin. \n\nGiven a pool of DNA sequences derived from multiple species,\npolyCRACKER can be used to identify and separate sequences belonging to one species from those of another.\nIn some cases, polyCRACKER separates the subgenomes of an allopolyploid as well as manual\nextraction of subgenomes by sequence alignment does when the progenitor genome sequences are known and available.\n\n#### For more information, see the polyCRACKER manuscript preprint. 
Please cite the following article if you use polyCRACKER in your work.\n\n**[PolyCRACKER, a robust method for the unsupervised partitioning of polyploid subgenomes by signatures of repetitive DNA evolution.\nSean P Gordon, Joshua J Levy, John P Vogel](https://www.biorxiv.org/content/early/2018/12/03/484832)**\n\n(First and second authors are co-first authors.)\n\n## Getting Started With polyCRACKER\n(requires macOS or Linux)\n(Docker also works on Windows)\n\n### Installing polyCRACKER dependencies\n\n- Docker (recommended)\n\nInstalling the dependencies can be skipped entirely by using the provided docker image available\non Docker Hub.\n\n#### Miniconda-based image \n\nThe core functionality of polyCRACKER can be accessed by using a miniconda-based image:\n\n`docker pull sgordon/polycracker-miniconda:1.0.2`\n\nYou may need to increase the docker settings to allow additional memory and CPU usage from within\ndocker.  Please see this thread:\n[How to assign more memory to docker](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container)\n\nWe recommend allowing at least 5 GB of RAM and at least 4 CPUs.\n\n### Running polyCRACKER on test data\n\n```console\n    docker run -it sgordon/polycracker-miniconda:1.0.2\n\n    source activate pCRACKER_p27\n\n    tar -xzvf ./test_data/test_fasta_files/algae.fa.tar.gz && mv algae.fa ./test_data/test_fasta_files/\n\n    polycracker.py test_pipeline -env pCRACKER_p27\n``` \nResults are stored in the test_results directory.\n\nTo exit the container:\n```console\nexit\n``` \n\n#### Note that if you want to inspect the results outside of the docker container, you may need to mount a volume.  \n\nThe details of mounting a volume in docker are outside the\nscope of this tutorial.  
Nonetheless, if you have an `analysis_results` directory on your machine\nand wish to copy the results from polyCRACKER to that directory, you can modify the above commands as follows:\n\n```console\n    docker run -v \"$(pwd)\"/analysis_results:/analysis_results -i -t sgordon/polycracker-miniconda:1.0.2\n\n    source activate pCRACKER_p27\n\n    tar -xzvf ./test_data/test_fasta_files/algae.fa.tar.gz && mv algae.fa ./test_data/test_fasta_files/\n\n    polycracker.py test_pipeline -env pCRACKER_p27\n\n    cp -R test_results /analysis_results/\n``` \n\nThen exit the container as above.  The results should persist in the analysis_results/test_results\nsubdirectory.  You may also want to mount a volume this way when running on your own data.\n\nYou can also build your own docker image by using the Dockerfile at the root of this repository.\nDetails are given at the bottom of this page.\n\n- Manual conda installation of the required dependencies, then running within a conda environment.  See below for more\ndetails on manual conda installation of dependencies.\n\n#### For more testing data:\n\n* [Tobacco (pseudomolecule-anchored and unanchored)](ftp://ftp.solgenomics.net/genomes/Nicotiana_tabacum/edwards_et_al_2017/assembly/)\n* [2017 Wheat genome](https://www.ncbi.nlm.nih.gov/assembly/GCA_002220415.2)\n* [Creinhardtii](https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Creinhardtii)\n* [Csubellipsoidea](https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_CsubellipsoideaC_169)\n* [Ustilago Ustma](https://genome.jgi.doe.gov/Ustma1/Ustma1.home.html)\n* [Ustilago Usthor](https://genome.jgi.doe.gov/Usthor1/Usthor1.home.html) \n* [Aspergillus species](https://genome.jgi.doe.gov/Aspergillus/Aspergillus.info.html)  \n\n### Running on your own data using docker\n\nThe flow is similar to that for the test data, but at minimum you will need to:\n\n1. Edit config_polyCRACKER.txt (see below).\n2. 
Move the FASTA file in question to ./fasta_files.\nThis can be done by mounting a volume to the docker container as described above,\nprovided that the input FASTA file is in the directory being mounted (for example, \"analysis_results\"),\nand then copying the FASTA file from the mounted directory into the ./fasta_files directory\nthat already exists within the container.\n```console\n    docker pull sgordon/polycracker-miniconda:1.0.2\n\n    # assumes the user's input FASTA file has been copied into the analysis_results directory that we will mount\n    docker run -v \"$(pwd)\"/analysis_results:/analysis_results -i -t sgordon/polycracker-miniconda:1.0.2\n\n    source activate pCRACKER_p27\n\n    # copy the user's input FASTA file into the fasta_files directory\n    cp /analysis_results/[user FASTA file] ./fasta_files\n\n    polycracker.py run_pipeline -env pCRACKER_p27\n\n    cp -R analysisOutputs /analysis_results/\n``` \n\nResults should be in the ./analysisOutputs/\\*/\\* subdirectories.\n\n- There is a cluster results directory containing the initial clusters of subsequences, and a final results directory containing the final clusters after signal amplification. 
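\n\nIf you want to gather the intermediate extractedSubgenomes FASTA files mentioned in the next paragraph without browsing each `bootstrap_*` directory by hand, a short Python sketch can walk the output tree instead. It assumes the layout described in this section (files ending in `.fa` or `.fasta` inside an `extractedSubgenomes` directory somewhere below a `bootstrap_*` directory within `analysisOutputs`); `find_extracted_subgenomes` is a hypothetical helper, not part of polyCRACKER.

```python
import fnmatch
import os


def find_extracted_subgenomes(root='analysisOutputs'):
    # Assumed layout: FASTA files in an extractedSubgenomes directory located
    # somewhere below a bootstrap_* directory inside root (depth is not fixed).
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != 'extractedSubgenomes':
            continue
        # Keep only extractedSubgenomes directories that sit under a bootstrap_* directory
        parents = dirpath.split(os.sep)
        if not any(fnmatch.fnmatch(part, 'bootstrap_*') for part in parents):
            continue
        for name in filenames:
            if name.endswith(('.fa', '.fasta')):
                matches.append(os.path.join(dirpath, name))
    # Sort so the listing is stable between calls
    return sorted(matches)
```

Run it from the directory that contains `analysisOutputs` (for example, inside the container after `run_pipeline` finishes); it returns a sorted list of paths, or an empty list if nothing matches. The search depth is left open because this sketch does not assume a fixed nesting of the output directories.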
\n\nSometimes signal amplification may fail due to the overly aggressive iterative recruitment of kmers\nthat are either not subgenome-specific or are specific to the opposite subgenome and are\nincorrectly recruited.\nIn this case, you can obtain intermediate results by going into the ./analysisOutputs/\\*/\\*/bootstrap_\\* directories and looking for the extractedSubgenomes subdirectory containing FASTA files.\n\nNote that the extracted subgenome FASTA files are still \"chunked\" (split according to the specified subsequence length during normalization),\nbut contain positional information with respect to the scaffold of origin.\n\n- Clustering plots are found in the \\*.html files in the project directory.\n\n- Additional plots can be made using polycracker.py plotPositions (see polycracker.py plotPositions -h), and there are a few other plotting utilities.\n\n**Pro tip**: You can rerun or resume the pipeline from various steps by setting the config options for steps that have already run to 0 instead of 1.\n\n**Pro tip**: Use the command polycracker.py number_repeatmers_per_subsequence to generate a histogram of the number of repeat-mers present in each chunked genome fragment. \nThe plot is saved as kmers_per_subsequence.png.\n\nIf this histogram is skewed toward low kmer counts per subsequence, try one or more of the following:\n- Reduce the kmer size (kmerLength)\n- Increase the chunk size (splitFastaLineLength)\n- Reduce the low count threshold (kmer_low_count)\n- Set perfect_mode to 1\n- Consider setting removeNonChunk = 1 in the config  \n- Enforce a higher minChunkSize.  \n\nVERY IMPORTANT! 
\n\nIf there is not enough repeat content in the subsequences, they will be hard to bin.\nWhen running the pipeline, number_repeatmers_per_subsequence may be run to generate \"kmers_per_subsequence.png\", which shows the frequency of kmers across\nsubsequences, so that the relevant parameters can then be tuned.\n\n### Configuration for Running polyCRACKER Pipeline using Nextflow\n\npolyCRACKER itself is a python module at the root of the repository and contains command line\nfunctions that can be individually accessed as noted above.\n\nBecause polyCRACKER consists of many individual command line functions,\nwe provide a pipeline written in the nextflow workflow language for the convenience\nof users. The nextflow implementation allows a single command to execute all the required steps in series.\nThis workflow is accessed for test data through:\n\n`polycracker.py test_pipeline`\n\nor, for use on your own data, through:\n\n`polycracker.py run_pipeline`.\n\nThe workflow itself is polyCRACKER_pipeline.nf, which is now within the polycracker subdirectory.\nCurrently, several resource parameters may need to be edited within\nthe nextflow script itself, namely the number of CPUs to use and the memory resources.\nThese parameters are currently set to conservative values so that the pipeline can run on test data on\na modern laptop with 6 cores and at least 5 GB of memory.  When executing on larger datasets, you\nwill need to increase these resource settings.  In particular, the required memory scales with\nthe size of the input FASTA sequences being analyzed.  The memory parameters can be changed on lines such as:\n\n`blastMemStr = \"export _JAVA_OPTIONS='-Xms5G -Xmx\" + blastMemory + \"G'\"`\n\nCPU requirements are specified in lines prefixed by \"cpus\" like this:\n\n`cpus = { writeKmer == 1 ? 6 : 1 }`\n\n### polyCRACKER Configuration file setup\n\nThe provided config file within the root of the\nrepository is 'config_polyCRACKER.txt'.  
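\n\nThe configuration examples in this README use one `key = value` setting per line. As a minimal sketch, assuming that format (no section headers, comment lines starting with `#`) and assuming the input filename is stored under a `genome` key, the following Python reads such a file and checks two constraints stated in this section: the `.fa`/`.fasta` extension and the typical `splitFastaLineLength` range. `read_polycracker_config` and `check_settings` are hypothetical helpers, not polyCRACKER's own config parser.

```python
def read_polycracker_config(path='config_polyCRACKER.txt'):
    # Assumption: one 'key = value' setting per line, no section headers.
    # Blank lines, '#' comment lines and lines without '=' are skipped.
    settings = {}
    with open(path) as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith('#') or '=' not in line:
                continue
            # Split on the first '=' only, so values may themselves contain '='
            key, value = line.split('=', 1)
            settings[key.strip()] = value.strip()
    return settings


def check_settings(settings):
    # Checks only two constraints stated in this README; not a full validator.
    problems = []
    genome = settings.get('genome', '')
    if not genome.endswith(('.fa', '.fasta')):
        problems.append('genome should end with .fa or .fasta')
    chunk_length = int(settings.get('splitFastaLineLength', '0'))
    if not 30000 <= chunk_length <= 1000000:
        problems.append('splitFastaLineLength is outside the typical 30000-1000000 range')
    return problems
```

For example, `check_settings(read_polycracker_config('config_polyCRACKER.txt'))` returns an empty list when both checks pass. All values are kept as strings, so convert them as needed.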
\n\nParameters controlling the amount of resources for individual functions and third-party\nprograms are set within the config file.  Modify them as described below to suit your FASTA\ninput.\n\n-  **File paths:**\nCopy your input FASTA file (a single FASTA file with all your sequences) into the `fasta_files` directory.\nAlternatively, you may set `fastaPath` to the directory containing your FASTA input file.\nYou may leave the other paths as provided in the example config.\nFASTA files must have a `.fa` or `.fasta` file extension or they will not be recognized.\n\n```console\n    blastPath = ./blast_files/\n    kmercountPath = ./kmercount_files/\n    fastaPath = ./test_data/test_fasta_files/\n    bedPath = ./bed_files/\n```\n\n- **genome:**\nThe full filename (not the full path) of your input FASTA file.\n\n- **sge interpreter:**\nUnless using an SGE or Slurm cluster, set local to 1.  We do not currently document how to use\nSGE or Slurm multi-node clusters, but experienced users may try them on their own.\n\n- **use bbtools:**\nPlease leave this set to 1.\n\n- **Settings surrounding the number of anticipated subgenomes:**\nAs a recommended practice, set the number of dimensions greater than the number of subgenomes.\nFor example, if the number of anticipated subgenomes is 2, set n_dimensions to 3.\n\n- **FASTA normalization:**\nThe input FASTA is split into chunks. splitFastaLineLength determines the length of the subsequences into which the input FASTA is\ndivided. This is necessary for normalizing subsequences for analysis.  
The value is typically\nbetween 30000 and 1000000, but depends on the lengths of sequences within the input FASTA file.\nWe recommend these as starting values:\n\n```console\n    splitFasta = 1\n\n    preFilter = 0\n\n    splitFastaLineLength = 50000\n```\n\n- **Kmer counting settings:**\n'kmerLength' is an important parameter that may need to be adjusted depending on the analysis.\n'kmer_low_count', 'use_high_count', and 'kmer_high_count' control which kmers are used in the\nanalysis.  'kmer_low_count' determines which kmers are considered 'repetitive'.\n'use_high_count' and 'kmer_high_count' limit the use of kmers found in the FASTA at high frequencies.\nWe recommend these initial settings:\n```console\n    writeKmer = 1\n\n    kmerLength = 26\n\n    kmer2Fasta = 1\n\n    kmer_low_count = 30\n\n    use_high_count = 0\n\n    kmer_high_count = 2000000\n\n    sampling_sensitivity = 1\n```\n\n**Use original genome for final analysis output:**\nTypically this is set to 0.\n\n- **Re-mapping kmers to the genome and transforming the results into a clustering matrix:**\nThese settings include memory usage options.\n'blastMemory' is an important resource setting.  Set this to the amount of RAM (in GB) that\nyou would like to use.  
On a laptop, we recommend these settings:\n\n```console\n    writeBlast = 1\n    k_search_length = 13\n    runBlastParallel = 0\n    blastMemory = 5\n    blast2bed = 1\n    generateClusteringMatrix = 1\n    lowMemory = 0\n    minChunkSize = 50000\n    removeNonChunk = 1\n    minChunkThreshold = 0\n    tfidf = 1\n    perfect_mode = 0\n```\n\nOn a larger single-node cluster, you will want to increase the memory setting.\n'removeNonChunk' excludes sequences shorter than the specified 'minChunkSize'.\n\n**Transform and cluster the data:**\nTwo critical choices are which dimensionality reduction method and which\nclustering method to use.\n'reduction_techniques' indicates which method to use when performing dimensionality reduction\non the sparse repeat-kmer by subsequence matrix.\nAvailable dimensionality reducers include:\n\n- 'kpca': KernelPCA,\n- 'factor': FactorAnalysis,\n- 'feature': FeatureAgglomeration,\n- 'lda': LatentDirichletAllocation,\n- 'nmf': NMF.\n\nA description of these methods is beyond the scope of this document.\n\n'clusterMethods' specifies the clustering method that is used.  \nSupported methods are:\n\n- 'SpectralClustering': SpectralClustering,\n- 'hdbscan_genetic': GeneticHdbscan,\n- 'KMeans': MiniBatchKMeans,\n- 'GMM': GaussianMixture,\n- 'BGMM': BayesianGaussianMixture.\n\nExample parameters are:\n\n```console\n    transformData = 1\n    reduction_techniques = tsne\n    transformMetric = linear\n    ClusterAll = 1\n    clusterMethods = SpectralClustering\n    grabAllClusters = 1\n    n_neighbors = 20\n    metric = cosine\n    weighted_nn = 0\n    mst = 0\n```\n\n**Extract the subgenomes:** Heuristics based on subgenome repeat-kmer counts are used to\ndecide whether a subsequence belongs to one subgenome or another. 
\nExample parameters:\n\n```console\n    extract = 0\n    diff_kmer_threshold = 20\n    default_kmercount_value = 3\n    diff_sample_rate = 1\n    unionbed_threshold = 10,2\n    bootstrap = 1\n```\n\n### Using polyCRACKER command line functions outside the automated nextflow pipeline\n\npolyCRACKER is a python module with command line accessible functions.  The nextflow\nrun pipeline scripts let users avoid running individual functions one after another\nfor the common purpose of subgenome classification and extraction.\n\nNonetheless, there are cases where executing individual core and helper functions\nis useful.\n\nTo see the full list of functions available from the command line:\n\n```console\n    docker pull sgordon/polycracker-miniconda:1.0.2\n\n    docker run -it sgordon/polycracker-miniconda:1.0.2\n\n    source activate pCRACKER_p27\n\n    polycracker.py -h\n```\nThe resulting list:\n```console\nUsage: polycracker.py [OPTIONS] COMMAND [ARGS]...\n\nOptions:\n  --version   Show the version and exit.\n  -h, --help  Show this message and exit.\n\nCommands:\n  TE_Cluster_Analysis             Build clustering matrix (repeat counts vs...\n  align                           Align two fasta files.\n  anchor2bed                      Convert syntenic blocks of genes to bed...\n  avg-distance-between-diff-kmers\n                                  Report the average distance between...\n  bed2scaffolds-pickle            Convert correspondence bed file,...\n  bio-hyp-class                   Generate count matrix of kmers versus...\n  blast2bed                       Converts the blast results from blast or...\n  blast_kmers                     Maps kmers fasta file to a reference...\n  build-pairwise-align-similarity-structure\n                                  Take consensus repeats, generate graph...\n  categorize-repeats              Take outputs from denovo repeat...\n  cluster                         Perform clustering on the dimensionality...\n  cluster-exchange
               Prior to subgenome Extraction, can choose...\n  clusterGraph                    Plots nearest neighbors graph in html...\n  color-trees                     Color phylogenetic trees by progenitor of...\n  compare-scimm-metabat\n  compare-subclasses              In development: Grab abundance of top...\n  compareSubgenomes_progenitors_v_extracted\n                                  Compares the results found from the...\n  convert-mat2r                   Convert any sparse matrix into a format...\n  convert_subgenome_output_to_pickle\n                                  Find cluster labels for all...\n  correct-kappa                   Find corrected cohen's kappa score.\n  count-repetitive                Infer percent of repetitive sequence in...\n  dash-genetic-algorithm-hdbscan-test\n                                  Save labels of all hdbscan runs, generate...\n  dash-genome-quality-assessment  Input pre chromosome level scaffolded...\n  diff-kmer-analysis              Runs robust differential kmer analysis...\n  differential_TE_histogram       Compare the ratio of hits of certain...\n  explore-kmers                   Perform dimensionality reduction on...\n  extract-scaffolds-fasta         Extract scaffolds from fasta file using...\n  extract-sequences               Extract sequences from fasta file and...\n  final_stats                     Analyzes the accuracy and agreement...\n  find-best-cluster-parameters    In development: Experimenting with...\n  find-denovo-repeats             Wrapper for repeat modeler.\n  find-rules                      In development: Elucidate underlying...\n  find-rules2                     In development: Elucidate underlying...\n  generate-genome-signatures      Wrapper for sourmash.\n  generate-karyotype              Generate karyotype shinyCircos/omicCircos...\n  generate-out-bed                Find cluster labels for all...\n  generate-unionbed               Generate a unionbedgraph with intervals...\n  
generate_Kmer_Matrix            From blasted bed file, where kmers were...\n  genomic-label-propagation       Extend polyCRACKER labels up and...\n  get-density                     Return gene or repeat density information...\n  hipmer-output-to-kcount         Converts hipmer kmer count output into a...\n  kcount-hist                     Outputs a histogram plot of a given kmer...\n  kcount-hist-old                 Outputs a histogram plot of a given kmer...\n  kmer2Fasta                      Converts kmer count file into a fasta...\n  kmerratio2scaffasta             Bin genome regions into corresponding...\n  link2color                      Add color information to link file for...\n  maf2bed                         Convert maf file to bed and perform stats...\n  mash-test                       Sourmash integration in development.\n  merge-split-kmer-clusters       In development: working on merging and...\n  multicol2multifiles             Take matrix of total differential kmer...\n  number_repeatmers_per_subsequence\n                                  Find histogram depicting number of repeat...\n  out-bed-to-circos-csv           Take progenitor mapped, species ground...\n  plot-distance-matrix            Perform dimensionality reduction on...\n  plot-rules                      Plots normalized frequency distribution...\n  plot-rules-chromosomes          Plot distribution of rules/conservation...\n  plot-unionbed                   Plot results of union bed file, the...\n  plotPositions                   Another plotting function without...\n  polyploid-diff-kmer-comparison  Compare highly informative differential...\n  progenitorMapping               Takes reference progenitor fasta files,...\n  repeat-subclass-analysis        Input repeat_fasta and find phylogenies...\n  reset-cluster                   Delete cluster results, subgenome...\n  reset-transform                 Remove all html files from main work...\n  return-dash-data-structures     Return dash 
data structures needed to run...\n  run-iqtree                      Perform multiple sequence alignment on...\n  run-metabat\n  run-tests                       Run basic polyCRACKER tests to see if...\n  run_pipeline                    Run polyCRACKER pipeline locally or on...\n  scaffolds2colors-specified      Attach labels to each scaffold for use of...\n  send-repeats                    Use bbSketch to send fasta file...\n  shiny2omic                      Convert shinyCircos csv input files to...\n  species-comparison-scaffold2colors\n                                  Generate color pickle file for...\n  spectral-embed-plot             Spectrally embed PCA data of any origin.\n  splitFasta                      Split fasta file into chunks of a...\n  subgenome-extraction-via-repeats\n                                  Extends results of TE_cluster_analysis by...\n  subgenomeExtraction             Extract subgenomes from genome, either...\n  test_pipeline\n  transform_plot                  Perform dimensionality reduction on a...\n  txt2fasta                       Extract subgenome fastas from reference...\n  unionbed2matrix                 Convert unionbed file into a matrix of...\n  update_nextflow_config          Updates nextflow configuration file for...\n  writeKmerCount                  Takes list of fasta files and runs...\n```  \n\nTo obtain information on a specific function, for example, plotPositions:\n```console\n    docker pull sgordon/polycracker-miniconda:1.0.2\n\n    docker run -it sgordon/polycracker-miniconda:1.0.2\n\n    source activate pCRACKER_p27\n\n    polycracker.py plotPositions -h\n```\nresult for the above:\n```console\n    Usage: polycracker.py plotPositions [OPTIONS]\n\n      Another plotting function without emphasis on plotting the spectral graph. 
Emphasis\n      is on plotting positions and clusters.\n\n    Options:\n      -npy, --positions_npy PATH      If standard layout, then use these data points to\n                                      begin simulation.  [default:\n                                      graphInitialPositions.npy]\n      -p, --labels_pickle PATH        Pickle file containing scaffolds.  [default:\n                                      scaffolds.p]\n      -c, --colors_pickle PATH        Pickle file containing the cluster/class each\n                                      label/scaffold belongs to.  [default: colors_pickle.p]\n      -o, --output_fname PATH         Desired output plot name in html.  [default:\n                                      output.html]\n      -npz, --graph_file PATH         Sparse nearest neighbors graph npz file. If desired,\n                                      try spectralGraph.npz.\n      -l, --layout [standard|spectral|random]\n                                      Layout from which to plot graph.  [default: standard]\n      -i, --iterations INTEGER        Number of iterations you would like to simulate to. No\n                                      comma delimit, will only output a single iteration.\n                                      [default: 0]\n      -s, --graph_sampling_speed INTEGER\n                                      When exporting the graph edges to CSV, can choose to\n                                      decrease the number of edges for pdf report\n                                      generation.  [default: 1]\n      -ax, --axes_off                 When enabled, exports graph without the axes.\n      -cmap, --new_colors TEXT        Comma delimited list of colors if you want control\n                                      over coloring scheme.  
[default: ]\n      -h, --help                      Show this message and exit.\n```\n\n#### Re-running subgenome classification and extraction during manual optimization\n\nTwo functions of immediate interest for manual optimization and troubleshooting\nare:\n\n- `polycracker.py reset-cluster -h`\n\nresult:\n```console\nUsage: polycracker.py reset-cluster [OPTIONS]\n\n  Delete cluster results, subgenome extraction results and corresponding html files.\n```\nand\n\n- `polycracker.py reset-transform -h`\n\nresult:\n\n```console\nUsage: polycracker.py reset-transform [OPTIONS]\n\n  Remove all html files from main work directory. Must do this if hoping to\n  retransform sparse data.\n```\n\nThe above functions remove the intermediate files that must be deleted before the pipeline can be\nsuccessfully re-run.\n\n## Additional Documentation\n\nOther tips on setting up the config file and running the pipeline can be found in the jupyter notebook ./tutorials/RunningPipeline.ipynb:\n\n* This notebook explains what each config parameter means; we highly recommend checking it out.\n* Examples of older configuration files are in ./tutorials/old_configs.\n\nOther downstream analyses are not covered here; see the html help file described below for more commands.\n\n**Accessing additional help docs:**\n\n* After downloading the repository, you can find them at ./tutorials/help_docs/index.html.\n* This html file documents some of the polyCRACKER commands and is still being updated.\n\n## Genome Comparison Tool and K-Mer Conservation Rules\n* A separate utility of polyCRACKER that is NOT demonstrated in the paper above is the ability to compare the distribution of k-mers between different genomes/assemblies and to create a plotly/dash app for visualization.  \n* To establish a matrix of k-mers versus genomes for downstream analysis, use the *bio-hyp-class* command (see its -h output).  
\n  * E.g., `nohup python polycracker.py bio-hyp-class -f ../../,\\_,n -dk 5 -w ../../results/ -m 150 -l 23 -min 2 -max 25 > ../../analysis.log &`  \n* Additional scripts can then be used for downstream analysis (clustering, etc.; not detailed here).  This aspect will be published\nin a separate manuscript, in preparation. \n\n## Detailed instructions for environment setup\n\n### Building your own docker image \n\n_(from the provided Dockerfile at the root of this repository.)_\n\nThe Dockerfile has been tested and should build and run successfully in its current state.\nTo build the image:\n\n```console\n    docker build . -t polycracker\n    docker run -it polycracker\n    source activate pCRACKER_p27\n```\n\n### The recipe for conda install of the polyCRACKER environment \n(Note that the Docker method is preferred and much easier.)\n\n```console\n    conda create -y --name pCRACKER_p27 python=2.7\n\n    conda activate pCRACKER_p27\n\n    conda install -y -c bioconda nextflow scipy pybedtools pyfaidx pandas numpy bbmap\n\n    conda install -y -c anaconda networkx click biopython matplotlib scikit-learn seaborn pyamg\n\n    conda install -y -c plotly plotly\n\n    conda install -y -c conda-forge deap hdbscan multicore-tsne\n```\n\n**Test your conda environment by running polyCRACKER to classify algae genomes**\n1. Clone the repository to your project directory.\n```console\n    git clone git@bitbucket.org:berkeleylab/jgi-polycracker.git\n```\n2. Change to the root of the git project directory (the one containing polycracker.py).\n```console\n    cd [your project directory containing polycracker.py] \n```\n3. Extract the test data: `tar -xzvf ./test_data/test_fasta_files/algae.fa.tar.gz && mv algae.fa ./test_data/test_fasta_files/`  \n4. Activate your conda environment:\n```console\n    source activate pCRACKER_p27\n```\n5. Run `polycracker.py test_pipeline -env [your polyCRACKER conda environment]`. For example:\n```console\n    polycracker.py test_pipeline -env pCRACKER_p27\n```\n6. 
Results are stored in the test_results directory.\n\n# Gallery\n\n## Example Plots\n\n### [Deconvolution of green algal genomes Coccomyxa subellipsoidea and Chlamydomonas reinhardtii](http://portal.nersc.gov/dna/plant/B_distachyon/polycracker/SpectralClusteringmain_tsne_2_n3ClusterTest.html)\n(Plot of the spectral embedding of the dimensionality-reduced repeat-kmer matrix; genomes were split into 50 kb subsequences before classification.)\n\n### [Assigning sequences in the large tetraploid tobacco genome to two progenitor subgenomes](https://portal.nersc.gov/dna/plant/B_distachyon/polycracker/initial_clusters_BGMM_polyCRACKER.html)\n\n### [Classification of sequences in the massive hexaploid bread wheat genome into three ancestral subgenomes](https://portal.nersc.gov/dna/plant/B_distachyon/polycracker/wheat_spectral.html)\n\n\n## Schematics\n\nIllustrative schematic of polyCRACKER clustering of sequences linked by the repeat-kmers that they contain\n\n![fig1](https://user-images.githubusercontent.com/19698023/55671911-e85da000-5862-11e9-96be-1292de1c404c.png)\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://bitbucket.org/berkeleylab/jgi-polycracker",
        "keywords": "",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "polycracker",
        "package_url": "https://pypi.org/project/polycracker/",
        "platform": "",
        "project_url": "https://pypi.org/project/polycracker/",
        "project_urls": {
            "Homepage": "https://bitbucket.org/berkeleylab/jgi-polycracker"
        },
        "release_url": "https://pypi.org/project/polycracker/1.0.3/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "unsupervised classification of polyploid subgenomes",
        "version": "1.0.3"
    },
    "last_serial": 5360700,
    "releases": {
        "0.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "1f3aecef8560fef7da6b4f1f30f21355",
                    "sha256": "e95cc41c5a011d8e54b22069aa23d576c3baab5a4990a244c21865f1023039e2"
                },
                "downloads": -1,
                "filename": "polycracker-0.0.1-py2-none-any.whl",
                "has_sig": false,
                "md5_digest": "1f3aecef8560fef7da6b4f1f30f21355",
                "packagetype": "bdist_wheel",
                "python_version": "py2",
                "requires_python": null,
                "size": 100891,
                "upload_time": "2019-06-05T03:58:36",
                "url": "https://files.pythonhosted.org/packages/9b/c6/e89a2113546dbca4947eab7e97fedff135cebba3f5ef913f510d1384cf86/polycracker-0.0.1-py2-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "e9487308db519373b5b3acd32feaf4da",
                    "sha256": "d73b25284217cff52ad7a3858c4d56f98af1a6d18ea798e63b1b9281d826165f"
                },
                "downloads": -1,
                "filename": "polycracker-0.0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "e9487308db519373b5b3acd32feaf4da",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 114229,
                "upload_time": "2019-06-05T03:58:39",
                "url": "https://files.pythonhosted.org/packages/51/f1/bc36417dda6b3781ef445d99b7b5a183b29a9bdcdc33c9f262b9f7c8d071/polycracker-0.0.1.tar.gz"
            }
        ],
        "1.0.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "77c1e3f6ed86d0aefe0cc5db7bff7c40",
                    "sha256": "7d581dfcb28da60870135a06c5c9e3fc1761aca6cc45a7dd752a1d12aabfe704"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.0-py2-none-any.whl",
                "has_sig": false,
                "md5_digest": "77c1e3f6ed86d0aefe0cc5db7bff7c40",
                "packagetype": "bdist_wheel",
                "python_version": "py2",
                "requires_python": null,
                "size": 94164,
                "upload_time": "2019-05-29T04:10:52",
                "url": "https://files.pythonhosted.org/packages/b1/f3/4cddb2b536e46545d53d514bd15b5580e4ab87e0a0c7d32aea80430bf5f7/polycracker-1.0.0-py2-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "d1fd1c42d99c8169539cc65f03f03d46",
                    "sha256": "affece17529aaa723c96394c564647f0f7035f3f111924cd247ff75540a7055e"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.0.tar.gz",
                "has_sig": false,
                "md5_digest": "d1fd1c42d99c8169539cc65f03f03d46",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 98798,
                "upload_time": "2019-05-29T04:10:55",
                "url": "https://files.pythonhosted.org/packages/9a/1d/477ca99193c98aed338f8d543bcf00fef0ca35326ab9e107cf9f96178b3a/polycracker-1.0.0.tar.gz"
            }
        ],
        "1.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "5f943d97b0776db4d7e2777ce154acdd",
                    "sha256": "98d3f03937b79a17819467a7dcef91a8b2bf742eb895a5482f8a88c9d2782088"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.1-py2-none-any.whl",
                "has_sig": false,
                "md5_digest": "5f943d97b0776db4d7e2777ce154acdd",
                "packagetype": "bdist_wheel",
                "python_version": "py2",
                "requires_python": null,
                "size": 104529,
                "upload_time": "2019-05-29T05:02:43",
                "url": "https://files.pythonhosted.org/packages/e6/5b/a01c2c62fcf8268e5ef38923a15488f72fc1d7006e1699194a46500f3bdc/polycracker-1.0.1-py2-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "a990b921dd291a22905ca903ce0b5eea",
                    "sha256": "656dc67c51f932eac49fcfc8b6107119aa1ca39d7f08821cd880549ab41bb7eb"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "a990b921dd291a22905ca903ce0b5eea",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 117977,
                "upload_time": "2019-05-29T05:02:45",
                "url": "https://files.pythonhosted.org/packages/17/1d/ff5110f2a4cec78d600e068aa51cf37330512950d4c228d309925e8d600d/polycracker-1.0.1.tar.gz"
            }
        ],
        "1.0.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "a6e08e885ed4592e31d680f7a9fb8dab",
                    "sha256": "4fbf6e4c9e4b2fb24b6948754c8361289b16b2adbb650e01f1b7ee528a70d662"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.2-py2-none-any.whl",
                "has_sig": false,
                "md5_digest": "a6e08e885ed4592e31d680f7a9fb8dab",
                "packagetype": "bdist_wheel",
                "python_version": "py2",
                "requires_python": null,
                "size": 104520,
                "upload_time": "2019-05-29T05:06:32",
                "url": "https://files.pythonhosted.org/packages/c3/f6/80be80106489265087340c6efb5683732487bff69564dbcf932e91c0fd92/polycracker-1.0.2-py2-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "f8e5911a0cd17a70a498e400b0508e83",
                    "sha256": "4ed485c28f2955081ef5f7f27d0f774f4756ae665dd69eeb686245d898fa70f3"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.2.tar.gz",
                "has_sig": false,
                "md5_digest": "f8e5911a0cd17a70a498e400b0508e83",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 117949,
                "upload_time": "2019-05-29T05:06:35",
                "url": "https://files.pythonhosted.org/packages/2e/13/314b4084f8e17ec1a4b770179708b83bb97a0b6b20cfc443f5cf55a32e6a/polycracker-1.0.2.tar.gz"
            }
        ],
        "1.0.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "57499112e36c610da10bc6dc976f2b1a",
                    "sha256": "80d47c9b5d8a49709ebacb4b95b6477035072676838c2e0bc5bdf777681a539d"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.3-py2-none-any.whl",
                "has_sig": false,
                "md5_digest": "57499112e36c610da10bc6dc976f2b1a",
                "packagetype": "bdist_wheel",
                "python_version": "py2",
                "requires_python": null,
                "size": 104402,
                "upload_time": "2019-06-02T01:22:29",
                "url": "https://files.pythonhosted.org/packages/7f/f2/b4e0141ffd3a3fb6de16a1ddbfba117f589e62478567063002b352811457/polycracker-1.0.3-py2-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "89beaa4df1d40981f0e1af8bdb09350f",
                    "sha256": "b027824a661b17180271d70974a7a844a7abb36440176b51f90cf29591508d7f"
                },
                "downloads": -1,
                "filename": "polycracker-1.0.3.tar.gz",
                "has_sig": false,
                "md5_digest": "89beaa4df1d40981f0e1af8bdb09350f",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 117815,
                "upload_time": "2019-06-02T01:22:33",
                "url": "https://files.pythonhosted.org/packages/1c/ba/8008da28a83dc55f85734c97a8b941fc21bd2a931ecf3dbdf23942d193f9/polycracker-1.0.3.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "57499112e36c610da10bc6dc976f2b1a",
                "sha256": "80d47c9b5d8a49709ebacb4b95b6477035072676838c2e0bc5bdf777681a539d"
            },
            "downloads": -1,
            "filename": "polycracker-1.0.3-py2-none-any.whl",
            "has_sig": false,
            "md5_digest": "57499112e36c610da10bc6dc976f2b1a",
            "packagetype": "bdist_wheel",
            "python_version": "py2",
            "requires_python": null,
            "size": 104402,
            "upload_time": "2019-06-02T01:22:29",
            "url": "https://files.pythonhosted.org/packages/7f/f2/b4e0141ffd3a3fb6de16a1ddbfba117f589e62478567063002b352811457/polycracker-1.0.3-py2-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "89beaa4df1d40981f0e1af8bdb09350f",
                "sha256": "b027824a661b17180271d70974a7a844a7abb36440176b51f90cf29591508d7f"
            },
            "downloads": -1,
            "filename": "polycracker-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "89beaa4df1d40981f0e1af8bdb09350f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 117815,
            "upload_time": "2019-06-02T01:22:33",
            "url": "https://files.pythonhosted.org/packages/1c/ba/8008da28a83dc55f85734c97a8b941fc21bd2a931ecf3dbdf23942d193f9/polycracker-1.0.3.tar.gz"
        }
    ]
}