QTL Reaper

From ea46f42ee640928b92947bfb204c41a482d80937 Mon Sep 17 00:00:00 2001 From: root Date: Tue, 8 May 2012 18:39:56 -0500 Subject: Add all the source codes into the github. --- web/qtlreaper.html | 404 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 404 insertions(+) create mode 100755 web/qtlreaper.html (limited to 'web/qtlreaper.html') diff --git a/web/qtlreaper.html b/web/qtlreaper.html new file mode 100755 index 00000000..daabb51f --- /dev/null +++ b/web/qtlreaper.html @@ -0,0 +1,404 @@ + +QTL Reaper + + + + + + + + + + + + + + + + + + + + +

+ + + + + + + + + +

+ What is the QTL Reaper?

+ +

+QTL Reaper is command line software for rapidly mapping quantitative trait +loci from large datasets with many phenotypes per genotype (per subject +or inbred strain) such as those generated using microarrays, where every +gene expression level measurement may be treated as a phenotype. +This program is a batch-mode version of WebQTL and is written in C and compiled as a Python +module. +

+QTL Reaper requires as input a phenotype data set and +genotypes for a mapping population such as set of recombinant inbred +strains, an F2 intercross, or a backcross. The program computes +linkage between each trait and all genotypes, evaluating the P-value +of linkage by permutation. For each trait, QTL Reaper ramps up the +number of permutations as necessary to define the empirical P-value +to a reasonable level of precision. It also performs bootstrap +resampling to estimate the confidence region for the location of +putative QTLs. + +

+ +How to Setup and Use QTL Reaper + +

+1. Download QTL Reaper from SourceForge. +

+ +2. Building and installing QTL Reaper +
+    a) Open a terminal or command prompt. + +
+    b) Gunzip and untar the package file, then enter the qtlreaper directory. If you are using the Macintosh OS, you may need to comment out lines 29 and 30 in setup.py using a text editor before building. + +
+    c) To compile qtlreaper module, execute: python setup.py build. Administrator privileges are needed to install qtlreaper, as it is a global module for python. To compile qtlreaper on a PC, Visual Studio 7.1+ (or its Python-component libraries) is required. The setup.py script may be modified if you wish to use your own blas and lapack libraries. To install the qtlreaper module, execute: python setup.py install + +
+    d) Check that the library reaper.so has been installed in a location that is included in the Python path. You can check the Python path when Python is running by executing the Python commands import sys and sys.path. Under Mac OS X 10.4, the standard location would be /System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages. + +
+    e) Run Python, if you have not already, by typing python at the terminal prompt. If reaper.so is installed properly, it is not necesary to change to a specific directory before running Python. Alternately, the user may navigate to the directory with reaper.so, then start python, then type 'import reaper' for the same functionality, without requiring the module be installed. + + +

+ + +3. Instructions for Using QTL Reaper + +
+    a) Now that Python is running, import qtlreaper by typing import reaper in the terminal. + +
+         From here, the user can create a data set into which data can be placed by executing the following command. This command creates a Dataset object which, in this example, we call mydataset. In place of mydataset, you can use any name that begins with a letter. + +
         mydataset = reaper.Dataset(). + +
+         At this point a properly formated file can be read into the newly created Dataset object using the command: + +
         mydataset.read("path/to/file"). + +
+    b) The user now may examine a number of data fields in the Dataset object by simply typing their names in the format mydataset.command. This usage is standard Python syntax for displaying the data of an object. + +
          The names of these data fields are: +

type: Displays the type of dataset (RI Set or Intercross or Backcross, etc.) +
name: Displays the name of the mapping population (BXD, B6D2F2, CXBRIX, etc.) +
prgy: Lists the names of the individuals or strains +
nprgy: Lists the number of individuals or strains +
pat: Shows the paternal parent type or strain +
mat: Shows the maternal parent type or strain +
nloci: Displays the number of marker loci used to map +

+ + + c) +Data related to a specific chromosome or a specific marker on a chromosome can be displayed by using the following syntax, where chr# represents a chromosome number and marker# represents a marker number. The syntax, which is standard Python sytax applied to a Dataset object is: +
mydataset[chr#].name or mydataset[chr#][marker#].name, where the possible data field names include: + + +

mydataset[chr#].name: Displays the name of the selected chromosome +
[0]: Indicates the first chromosome +
[-1]: Used as a shortcut to the last chromosome +
mydataset[chr#][marker#].name: Displays the marker at the selected array location +
[0][0]: Indicates the first chromosome and first marker location +
mydataset[chr#][marker#].cM: Displays the location of the marker in cM +
mydataset[chr#][marker#].Mb: Displays the location of the marker in Mb (if available) +
mydataset[chr#][marker#].genotype: Shows the genotype (scored as - 1, 0, and +1) of the elements in the array +
mydataset[chr#][marker#].genotext: Describes the selected elements in the array as maternal, heterozygous, or paternal genotypes +
len(mydataset[chr#]): Displays the number of markers in the selected chromosome +

+ + + d) Manually putting data into QTL Reaper +

NewData = ['String1', 12.5, 'String2']: Creates a list containing any number of strings and/or floats - quotes are needed between each string element, but not around a numerical element. +

NewData2 = 'Thing1', 3.141, 'Thing2': Creates a tuple containing any number of strings and/or floats - the same as above, but without square brackets around the sides. +

+ e) Generating new tables and data +

NewVariableName = YourVariableName.addinterval(): Creates a new object, calculating intervals at every 1cM +

NewVariableName.nloci: Displays the number of loci in the newly calculated interval. +

Chrom_No_Variable = reaper.Dataset(): Creates a new object that is a duplicate of the original YourVariableName. +

Chrom_No_Variable.chromosome = [YourVariableName[#]]: Extracts a specific chromosome out of YVN and sets ChromVar equal to it. +

+ f) Calculating regression, QTLs, bootstrap tests, permutations, and group variance + +
(All variable names with 'x' below are examples and the entire variable name can be changed.) +

QTLx = YourVariableName.regression(strains = NewData, trait = NewData2, variance = NewData3, control="NewData4": Sets up a QTL with data to be analyzed. Variance and control are optional controls. +

QTLx.sort(): Sorts the QTL data by absolute value of the LRS. +

max(QTLx): Displays the data for the highest QTL +

max(QTLx).lrs: Gives the exact LRS value for the highest QTL +

max(QTLx).locus: Shows the locus data +

min(QTLx): Displays the data for the lowest QTL +

min(QTLx).lrs: Gives the exact LRS value for the lowest QTL +

min(QTLx).locus: Shows the locus data +

bootX = YourVariableName.bootstrap(strains = newData, trait = NewData2, variance = newData 3, nboot = #): Does a bootstrap test on selected data. Variance is an optional control, nboot can be set to any positive integer. +

permuX = YourVariableName.permutation(straints = newData, trait = NewData2, variance = newData3, nperm = #, thresh = #): Returns a list of highest LRS from each permutation, ascending. Variance, nperm, and thresh are all optional. Thresh and nperm should be set as positive integers if either is used. +

pvalueX = reaper.pvalue(max(QTLx).lrs, permuX): Calculates the p-value for the selected permutation. +

ANOVA: ANalysis Of VAriance between groups, a two-step function. +

listvariable = [3, 2.1, 5, 3.2, 1.3, 3.6, 1.6, 3.9, 4.1, 2]: Creates the list that will be analyzed. +

mean, median, variance, stdev, stderr, N = reaper.anova(listvariable): Does the group calculations. The calculations are then displayed by entering them as an individual command (i.e. mean or median). +

+ g) Reading from an Input file, such as trait data. (Sample trait.txt is included in the QTLReaper download's Example folder) Below: This pulls the first line of File.txt and puts it into 'header' and the second line into 'fileData.' +

+import string +
newFile = open("..Path/..To/File.txt") +
header = newFile.readline() +
header = string.split(header) +
header = map(string.strip, header) +
strains = header[1:] +
fileData = newFile.readline() +
fileData = string.split(fileData) +
fileData = map(string.strip, fileData[1:]) +
fileData = map(float, fileData) +

+ + +4. Formatting Input Files:

+The primary input files for QTLReaper must be in an exact, specific +format to work properly. Before exporting this data, ensure that your +first three columns of data are in this order: Chromosome, Locus, and +cM. Mb location may be added as an optional fourth column, though it +cannot be a substitute for cM. It must be a plain text, tab-delimited +document with Unix line breaks. Microsoft Excel, BBEdit, and other +applications will have the option to export as a tab-delimited text +document, most likely under the "save as" option. Many applications have the +option to change the line breaks to Unix (in BBEdit, for example, in +the Options section of "Save As"). This can also be conveniently accomplished in +Windows by using fromdos.exe, a small free utility available here. +
+Once the file is exported, the +data may be pushed down a few lines to make room for (optional) +header information. (Remember to convert these line breaks to Unix style +if adding the header information on a Windows or Macintosh machine!) +At the top, above all the data, the user may add +the information that is called by the .type, .name, and .mat/.pat +commands listed above. This is done by: @type:, @name:, and @mat/@pat: with stating the necessary +information following the colon (no space). All these declarations +should be placed on separate lines. + +
+BXD.txt and BXD2.txt are examples of properly formatted +files, and are contained in the QTLReaper download's Example folder. +A larger, non-sample-only dataset is available for download (722k, its Macintosh line breaks will +have to be converted to Unix line breaks to function properly.) +

+ + +5. Example: +

+#Import the reaper module
+import reaper
+#Initiate a Python Dataset object
+bxdGeno = reaper.Dataset()
+#read genotype information from a file
+#argument is a string containing the filename and location
+#file is a tab-delimited text file
+bxdGeno.read("../../Example/BXD.txt")
+################
+# Dataset Object
+################
+#dataset type (RI Set or Intercross)
+bxdGeno.type
+#dataset name
+bxdGeno.name
+#progeny
+bxdGeno.prgy
+#number of progeny
+bxdGeno.nprgy
+#parent 1
+bxdGeno.pat
+#parent 2
+bxdGeno.mat
+#Number of Loci
+bxdGeno.nloci
+#Number of Chromosome
+len(bxdGeno[0])
+#1st Chromosome
+bxdGeno[0].name
+#last Chromosome
+bxdGeno[-1].name
+#1st Locus
+bxdGeno[0][0].name
+#last Locus
+bxdGeno[-1][-1].name
+#Locus cM
+bxdGeno[-1][-1].cM
+#Whether physical info (Mb) are available
+bxdGeno[-1][-1].Mbmap
+#Locus Mb
+bxdGeno[-1][-1].Mb, if physical info are available
+#Locus genotype in numbers
+bxdGeno[-1][-1].genotype
+#Locus genotype in abbrevs
+bxdGeno[-1][-1].genotext
+################
+# Interval Map
+################
+#Calculate intervals at 1cM, generate a new dataset object 
+bxdIntervalGeno = bxdGeno.addinterval()
+#Number of Loci
+bxdIntervalGeno.nloci
+################
+# individual chromosome
+################
+#You can create a new dataset object which contains
+only one chromosome for individual chromosome analyses 
+chr_1_bxdGeno = reaper.Dataset()
+chr_1_bxdGeno.chromosome = [bxdGeno[0]]
+#Number of Loci
+chr_1_bxdGeno.nloci
+################
+# Member functions
+################
+#Most reaper functions require two lists as inputs
+#the first list is the case or strain list; the second is the trait values list
+#the two lists should have the same number of members
+#the strain list should have the same cases or strains as those listed in the header 
+
 of the genotype file or should be a subset of those
+strains =['BXD1', 'BXD2', 'BXD5', 'BXD6', 'BXD8', 'BXD9', 'BXD11', 'BXD12', 'BXD13',
          'BXD14', 'BXD15', 'BXD16', 'BXD18', 'BXD19', 'BXD20', 'BXD21', 'BXD22',
          'BXD23', 'BXD24', 'BXD25', 'BXD27', 'BXD42']
+trait =  [53.570 ,63.885 ,56.700 ,61.750 ,66.325 ,65.150 ,60.400 ,57.920 ,
          51.925 ,62.350 ,67.175 ,65.850 ,52.425 ,60.925 ,65.350 ,56.750 ,
          59.750 ,57.888 ,60.250 ,64.433 ,57.125 ,63.600]
+#vaiance list will be used in weighted regression (using 1/variance as the weights)
+variance = [0.777, 0.108, 1.78, 1.18, 0.370, 0.808, 1.549, 0.710, 0.257, 1.482, 1.816, 
            0.711, 1.204, 0.059, 0.182, 0.591, 0.357, 0.072, 0.490, 0.239, 0.905, 1.327]
+#genotypes of a control locus as a cofactor in the regression
+control = "D1Mit1"
+################
+# Regression
+################
+#the result is a list of QTL Objects
+#simple regression (no variance and no control cofactor)
+qtl1 = bxdGeno.regression(strains = strains, trait = trait)
+#weighted regression (variance weighting but no control cofactor)
+qtl2 = bxdGeno.regression(strains = strains, trait = trait, variance = variance)
+#composite regression (control cofactor with or without variance weighting)
+qtl3 = bxdGeno.regression(strains = strains, trait = trait, control = "D1Mit1")
+qtl4 = bxdGeno.regression(strains = strains, trait = trait, variance = variance, control = "D1Mit1")
+#maximum QTL
+max(qtl1)
+max(qtl2)
+max(qtl3)
+max(qtl4)
+#maximum LRS
+max(qtl1).lrs
+max(qtl2).lrs
+max(qtl3).lrs
+max(qtl4).lrs
+#Locus with maximum LRS
+max(qtl1).locus
+max(qtl1).locus.name
+#sort the qtl list
+qtl1.sort()
+qtl2.sort()
+qtl3.sort()
+qtl4.sort()
+################
+# Permutation
+################
+#returns a list of highest LRS value for each permutation in ascending order
+#fixed number permutations
+permu1 = bxdGeno.permutation(strains = strains, trait = trait,nperm=1000)
+permu2 = bxdGeno.permutation(strains = strains, trait = trait, variance = variance,nperm=1000)
+#progressive permutation
+#keep on doing permutations until the threshold LRS is not in the top 10 
+#or the total number of permutations reaches 1,000,000 
+permu3 = bxdGeno.permutation(strains = strains, trait = trait, thresh = 23)
+#calculate p-value
+pv1 = reaper.pvalue(max(qtl1).lrs, permu1)
+################
+# Bootstrap
+################
+#returns a list of counts of times that a locus has the highest LRS score
+#the length of the list equal to the total number of markers
+boot1 = bxdGeno.bootstrap(strains = strains, trait = trait, nboot=1000)
+boot2 = bxdGeno.bootstrap(strains = strains, trait = trait, variance = variance, nboot=1000)
+################
+# Anova 
+################
+# A simple ANOVA (ANalysis Of VAriance between groups) function
+list = [1, 2, 4.1, 2, 4, 3.2, 1.1, 5, 5.6, 7.1, 2.3, 4.3, 3.6]
+mean, median, variance, stdev, stderr, N = reaper.anova(list)
+#######################
+# Read from Input file
+#######################
+#It is easy to read from tab-delimited input file to generate the strain and trait value lists
+#use the included trait.txt as example
+import string
+fp = open("../../Example/trait.txt")
+header = fp.readline()
+header = string.split(header)
+header = map(string.strip, header)
+#strip any blank characters
+strains = header[1:]
+#the header here is the strain list
+trait = fp.readline()
+trait = string.split(trait)
+trait = map(string.strip, trait[1:])
+#strip any blank characters
+trait = map(float, trait)
+#the trait here is the trait valuelist
+

+ + About this text file +

+QTL Reaper Tutorial written by Evan Williams, July 2005. Edits by EGW, Aug 4. 2005; KFM, March 20, 2006. Last edit by KFM, March 20, 2006.

+ + + +

+ +

+ + + + + + + + + -- cgit 1.4.1