{ "info": { "author": "Luke Hodkinson", "author_email": "furious.luke@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python" ], "description": "# lizards-are-awesome\n\nA Docker based workflow for performing a Plink/fastStructure analysis\non DArTseq SNP data, read from an Excel file.\n\n\n## Overview\n\nThis software seeks to reduce the manual labour involved in preparing DArTseq SNP\ndata in 1 row format for analysis with Plink and fastStructure. LAA is designed specifically for\nSNP data sets generated by DArTseq, in 1 row format. As such, input\ndata use the following encoding provided by DArTseq: \"0\" =\nreference allele homozygote, \"1\" = SNP allele homozygote, \"2\" =\nheterozygote, and \"-\" = double null/null allele homozygote (absence\nof fragment with SNP in genomic representation). LAA first converts\nthese data into ped and map files for Plink analysis.\n\nApart from the external packages mentioned above, most of the work\nis done by a Python script. The primary operations\nperformed by the script are:\n\n 1. Duplicating the input data.\n 2. Performing a substitution on certain characters in both\n sets of data, in order to create Plink compatible characters (e.g. \"-\" to \"0\").\n 3. Independently indexing both sets of data.\n 4. Combining both sets of data.\n 5. Sorting on the combined index.\n 6. Transposing the combined data.\n 7. 
Outputting to Plink compatible `ped` and `map` formats.\n\nWhereas these steps previously had to be carried out manually using various software\npackages, they are now performed automatically.\n\nIn addition to the conversion operation, LAA automatically runs\nPlink on the generated ped and map files, and the\nresulting bed, bim and fam files are then passed on to and analysed\nwith fastStructure. The user can choose a maximum value of K (the number of\npopulations) to be analysed by fastStructure. Output files include\nthe meanQ value for each individual, giving the mean probability\nof belonging to each of the populations K1 to Kx.\n\n\n## Design Decisions\n\n### Why Docker?\n\nPlink is written for Linux based operating systems. As such, on a Linux system\nall operations could be performed directly, without the need for any kind of\nvirtualisation layer. However, in order to support researchers using Windows based\noperating systems, the decision was made to leverage Docker virtualisation.\n\nDocker provides a light-weight virtualisation layer enabling Linux software to\nrun on Windows with (relative) ease. It also has the added benefit of providing\na cloud based mechanism for disseminating software \"images\" to users. The advantages\nof Docker over other systems, like VirtualBox or VMware, are:\n\n * cloud based distribution of prebuilt images,\n * future releases will allow native Docker containers, and\n * easy to replicate virtual image creation.\n\n### Why Python?\n\nPython is a powerful and expressive scripting language. 
It comes with many\ndiverse packages, and has excellent support from developers (for example,\nfastStructure is written in Python).\n\n\n## Dependencies\n\nWhen installing on any platform there are a number of required dependencies:\n\n * Python\n * Docker\n\nIf you happen to be installing on Windows, then there are a couple of extra requirements:\n\n * Visual Studio Python compiler\n * MsysGit\n\n\n## Important\n\nWe've found that Docker has issues when running on Windows, resulting in faulty data\ntransformation. While you may be able to install LAA on a Windows system, the accuracy of\nresults is likely to be compromised.\n\nTo install on Windows, we recommend using a virtual machine running an Ubuntu\ninstallation, e.g. with VMware. All steps detailed below under Installation will have to be\nperformed inside the virtual machine, including installing Docker.\n\n\n## Installation\n\nBegin by installing all of the dependencies for your operating system as\nlisted above.\n\nOnce complete, open a system terminal (please see the subsection on system terminals\nbelow, under `Usage`).\n\nFrom an open system terminal, install the LAA Python interface with:\n\n```bash\npip install lizards-are-awesome\n```\n\nNext, from a system terminal, download and prepare the `laa` Docker image. This\nimage contains `plink`, `fastStructure`, and the conversion scripts, all built\ninto a light-weight Alpine Linux image:\n\n```bash\nlaa init\n```\n\n## Usage\n\n### Terminals\n\nLAA is currently run directly from your operating system terminal. On Linux\nlike operating systems (including Mac OS X) use the system terminal emulator. On\nWindows operating systems use the Docker quick start terminal.\n\n### Input Format\n\nLAA accepts XLSX Excel formats and CSV. Unfortunately, XLSX is extremely slow\nto parse using open-source utilities. 
As such, we recommend converting your Excel\ndata to CSV before use with LAA (simply open the file and save it as CSV using\nMicrosoft Office or an open-source spreadsheet tool, like LibreOffice).\n\nThe data sheet should contain only columns with DArTseq SNP data\n(i.e. 0, 1, 2 and -); all other columns have to be removed.\nThe first row should contain the name of the population each\nindividual belongs to (e.g. species), and the second row should contain\nthe ID of each individual. All following rows contain the SNP data.\n\nA short, fictitious, example:\n\n
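| Pminima | Pminima | Pminor | Pminima | Pminor | Pminima |
| --- | --- | --- | --- | --- | --- |
| lizard1 | lizard2 | lizard15 | lizard39 | lizard40 | lizard44 |
| 0 | 1 | 1 | 2 | 1 | 1 |
| 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | - | 1 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | - | 0 |
| 2 | 2 | 1 | 1 | 1 | 2 |
| 2 | 2 | 1 | 2 | 1 | 0 |
| 1 | 1 | 2 | 1 | 2 | 1 |
| 1 | 1 | 1 | 2 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 |
| - | 1 | 2 | 1 | 1 | 1 |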
\n\nAnd, in CSV format:\n\n```csv\nPminima,Pminima,Pminor,Pminima,Pminor,Pminima\nlizard1,lizard2,lizard15,lizard39,lizard40,lizard44\n0,1,1,2,1,1\n0,0,0,1,0,0\n1,-,1,0,1,1\n0,0,1,0,-,0\n2,2,1,1,1,2\n2,2,1,2,1,0\n1,1,2,1,2,1\n1,1,1,2,0,1\n0,0,0,0,0,0\n-,1,2,1,1,1\n```\n\n### Location\n\nAll LAA commands must be run from the directory that contains your CSV input\nfile. For the purpose of the examples, let's say we have an input file, `input.csv`,\nlocated at `/c/workspace/data`:\n\n```bash\ncd /c/workspace/data\n```\n\n### Quick-run\n\nTo perform the complete process, including conversion, Plink, fastStructure and\nanalysing for K values, you can just run:\n\n```bash\nlaa all input.csv --maxk=5\n```\n\nwhere `--maxk=5` may be replaced with a suitable value for the maximum K value to\nuse.\n\nThis will produce a range of files in the current working directory corresponding\nto the outputs of the conversion, Plink, and fastStructure.\n\n### Conversion\n\nConverting the input data performs recombination and transposition, writes\na PED file, and generates a matching map file:\n\n```bash\nlaa convert input.csv output.ped\n```\n\nThis will generate two files: `output.ped` and `output.map`. These files are\nsuitable for use with Plink.\n\n### Plink\n\nTo process the converted input files with Plink, run:\n\n```bash\nlaa plink output.ped\n```\n\n### fastStructure\n\nTo process the Plink outputs with fastStructure, run:\n\n```bash\nlaa fast output\n```\n\n### K Choice\n\nTo run fastStructure a number of times, and then choose an appropriate\nK value, run:\n\n```bash\nlaa choosek output --maxk=5\n```\n\nwhere `--maxk=5` may be replaced with a suitable value for the maximum K value to\nuse.\n\n## Getting Help\n\nHelp is always available from the command line. 
To get a printout of available commands,\nrun:\n\n```bash\nlaa -h\n```\n\nYou may also get help for a specific command with something like:\n\n```bash\nlaa convert -h\n```\n\nwhere `convert` may be replaced with the respective command help is sought for.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/furious-luke/lizards-are-awesome", "keywords": null, "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "lizards-are-awesome", "package_url": "https://pypi.org/project/lizards-are-awesome/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/lizards-are-awesome/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/furious-luke/lizards-are-awesome" }, "release_url": "https://pypi.org/project/lizards-are-awesome/0.4.0/", "requires_dist": null, "requires_python": null, "summary": "UNKNOWN", "version": "0.4.0" }, "last_serial": 2452470, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "5c1d3fa531061c6bd578e9f02d8abb7f", "sha256": "5bbee77623b02ea4b1cb65120f6e65c033af40d8c0783af3a5eb175e985f312d" }, "downloads": -1, "filename": "lizards-are-awesome-0.1.tar.gz", "has_sig": false, "md5_digest": "5c1d3fa531061c6bd578e9f02d8abb7f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5982, "upload_time": "2016-04-04T23:49:54", "url": "https://files.pythonhosted.org/packages/36/91/db909ee558b0f3e4cdd4c1849dba4c4a9ba8c7137dd96353ea7b31f74649/lizards-are-awesome-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "ef4dacba700c6a73a71b767f12900f99", "sha256": "bda7ffa67f74e6b954f44fe86c94022422b0d886e1b38ad785732342c5cf08fa" }, "downloads": -1, "filename": "lizards-are-awesome-0.2.tar.gz", "has_sig": false, "md5_digest": "ef4dacba700c6a73a71b767f12900f99", "packagetype": "sdist", "python_version": "source", "requires_python": null, 
"size": 6304, "upload_time": "2016-06-30T07:43:52", "url": "https://files.pythonhosted.org/packages/cb/88/a7e854d16978765a5d2a0cea2ce920cd42290e278e775ebc7af1f98dc6b3/lizards-are-awesome-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "aa612a6260129f5196a95c681cfc9983", "sha256": "b60ef05fc8515b8e777d82e9bb45529ab10755a7693ec3fb60c982755f4bcaaa" }, "downloads": -1, "filename": "lizards-are-awesome-0.2.1.tar.gz", "has_sig": false, "md5_digest": "aa612a6260129f5196a95c681cfc9983", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7086, "upload_time": "2016-07-01T05:38:10", "url": "https://files.pythonhosted.org/packages/cd/37/57daab70614205d1b7f0cd84e110b57724ec9485e62842383f5b821d5fb0/lizards-are-awesome-0.2.1.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "c949d91df3e228890e06e50d12b2342d", "sha256": "7c884ff1f09de720022fcb7f0aa423faad35e7627683a7255b946f65ac2d4bc1" }, "downloads": -1, "filename": "lizards-are-awesome-0.3.0.tar.gz", "has_sig": false, "md5_digest": "c949d91df3e228890e06e50d12b2342d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7094, "upload_time": "2016-07-01T06:50:35", "url": "https://files.pythonhosted.org/packages/29/9f/ed644785c61230b423e7b40942bcdd6df0441de119ee03e413ddc631f5a9/lizards-are-awesome-0.3.0.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "ef41b62d034825f5af14df2cd73634c7", "sha256": "18481e54c3d47bfda28a7994cf0b29297d16eeb7e425cf94c61e42f3b981ee90" }, "downloads": -1, "filename": "lizards-are-awesome-0.4.0.tar.gz", "has_sig": false, "md5_digest": "ef41b62d034825f5af14df2cd73634c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7145, "upload_time": "2016-11-10T06:51:20", "url": "https://files.pythonhosted.org/packages/6b/98/2c487d9d2f719ede89097cf9a21ced4f812c733f39cca93facd3dae1b95f/lizards-are-awesome-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", 
"digests": { "md5": "ef41b62d034825f5af14df2cd73634c7", "sha256": "18481e54c3d47bfda28a7994cf0b29297d16eeb7e425cf94c61e42f3b981ee90" }, "downloads": -1, "filename": "lizards-are-awesome-0.4.0.tar.gz", "has_sig": false, "md5_digest": "ef41b62d034825f5af14df2cd73634c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7145, "upload_time": "2016-11-10T06:51:20", "url": "https://files.pythonhosted.org/packages/6b/98/2c487d9d2f719ede89097cf9a21ced4f812c733f39cca93facd3dae1b95f/lizards-are-awesome-0.4.0.tar.gz" } ] }