How to Setup and Use QTL Reaper
1. Download QTL Reaper from SourceForge.
2. Building and installing QTL Reaper
a) Open a terminal or command prompt.
b) Gunzip and untar the package file, then enter the qtlreaper directory. If you are using the Macintosh OS, you may need to comment out lines 29 and 30 in setup.py using a text editor before building.
c) To compile qtlreaper module, execute: python setup.py build. Administrator privileges are needed to install qtlreaper, as it is a global module for python. To compile qtlreaper on a PC, Visual Studio 7.1+ (or its Python-component libraries) is required. The setup.py script may be modified if you wish to use your own blas and lapack libraries. To install the qtlreaper module, execute: python setup.py install
d) Check that the library reaper.so has been installed in a location that is included in the Python path. You can check the Python path when Python is running by executing the Python commands import sys and sys.path. Under Mac OS X 10.4, the standard location would be /System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages.
e) Run Python, if you have not already, by typing python at the terminal prompt. If reaper.so is installed properly, it is not necesary to change to a specific directory before running Python. Alternately, the user may navigate to the directory with reaper.so, then start python, then type 'import reaper' for the same functionality, without requiring the module be installed.
3. Instructions for Using QTL Reaper
a) Now that Python is running, import qtlreaper by typing import reaper in the terminal.
From here, the user can create a data set into which data can be placed by executing the following command. This command creates a Dataset object which, in this example, we call mydataset. In place of mydataset, you can use any name that begins with a letter.
mydataset = reaper.Dataset().
At this point a properly formated file can be read into the newly created Dataset object using the command:
mydataset.read("path/to/file").
b) The user now may examine a number of data fields in the Dataset object by simply typing their names in the format mydataset.command. This usage is standard Python syntax for displaying the data of an object.
The names of these data fields are:
- type: Displays the type of dataset (RI Set or Intercross or Backcross, etc.)
- name: Displays the name of the mapping population (BXD, B6D2F2, CXBRIX, etc.)
- prgy: Lists the names of the individuals or strains
- nprgy: Lists the number of individuals or strains
- pat: Shows the paternal parent type or strain
- mat: Shows the maternal parent type or strain
- nloci: Displays the number of marker loci used to map
c)
Data related to a specific chromosome or a specific marker on a chromosome can be displayed by using the following syntax, where chr# represents a chromosome number and marker# represents a marker number. The syntax, which is standard Python sytax applied to a Dataset object is:
mydataset[chr#].name or mydataset[chr#][marker#].name, where the possible data field names include:
- mydataset[chr#].name: Displays the name of the selected chromosome
- [0]: Indicates the first chromosome
- [-1]: Used as a shortcut to the last chromosome
- mydataset[chr#][marker#].name: Displays the marker at the selected array location
- [0][0]: Indicates the first chromosome and first marker location
- mydataset[chr#][marker#].cM: Displays the location of the marker in cM
- mydataset[chr#][marker#].Mb: Displays the location of the marker in Mb (if available)
- mydataset[chr#][marker#].genotype: Shows the genotype (scored as - 1, 0, and +1) of the elements in the array
- mydataset[chr#][marker#].genotext: Describes the selected elements in the array as maternal, heterozygous, or paternal genotypes
- len(mydataset[chr#]): Displays the number of markers in the selected chromosome
d) Manually putting data into QTL Reaper
- NewData = ['String1', 12.5, 'String2']: Creates a list containing any number of strings and/or floats - quotes are needed between each string element, but not around a numerical element.
- NewData2 = 'Thing1', 3.141, 'Thing2': Creates a tuple containing any number of strings and/or floats - the same as above, but without square brackets around the sides.
e) Generating new tables and data
- NewVariableName = YourVariableName.addinterval(): Creates a new object, calculating intervals at every 1cM
- NewVariableName.nloci: Displays the number of loci in the newly calculated interval.
- Chrom_No_Variable = reaper.Dataset(): Creates a new object that is a duplicate of the original YourVariableName.
- Chrom_No_Variable.chromosome = [YourVariableName[#]]: Extracts a specific chromosome out of YVN and sets ChromVar equal to it.
f) Calculating regression, QTLs, bootstrap tests, permutations, and group variance
(All variable names with 'x' below are examples and the entire variable name can be changed.)
- QTLx = YourVariableName.regression(strains = NewData, trait = NewData2, variance = NewData3, control="NewData4": Sets up a QTL with data to be analyzed. Variance and control are optional controls.
- QTLx.sort(): Sorts the QTL data by absolute value of the LRS.
- max(QTLx): Displays the data for the highest QTL
- max(QTLx).lrs: Gives the exact LRS value for the highest QTL
- max(QTLx).locus: Shows the locus data
- min(QTLx): Displays the data for the lowest QTL
- min(QTLx).lrs: Gives the exact LRS value for the lowest QTL
- min(QTLx).locus: Shows the locus data
- bootX = YourVariableName.bootstrap(strains = newData, trait = NewData2, variance = newData 3, nboot = #): Does a bootstrap test on selected data. Variance is an optional control, nboot can be set to any positive integer.
- permuX = YourVariableName.permutation(straints = newData, trait = NewData2, variance = newData3, nperm = #, thresh = #): Returns a list of highest LRS from each permutation, ascending. Variance, nperm, and thresh are all optional. Thresh and nperm should be set as positive integers if either is used.
- pvalueX = reaper.pvalue(max(QTLx).lrs, permuX): Calculates the p-value for the selected permutation.
- ANOVA: ANalysis Of VAriance between groups, a two-step function.
- listvariable = [3, 2.1, 5, 3.2, 1.3, 3.6, 1.6, 3.9, 4.1, 2]: Creates the list that will be analyzed.
- mean, median, variance, stdev, stderr, N = reaper.anova(listvariable): Does the group calculations. The calculations are then displayed by entering them as an individual command (i.e. mean or median).
g) Reading from an Input file, such as trait data. (Sample trait.txt is included in the QTLReaper download's Example folder) Below: This pulls the first line of File.txt and puts it into 'header' and the second line into 'fileData.'
import string
newFile = open("..Path/..To/File.txt")
header = newFile.readline()
header = string.split(header)
header = map(string.strip, header)
strains = header[1:]
fileData = newFile.readline()
fileData = string.split(fileData)
fileData = map(string.strip, fileData[1:])
fileData = map(float, fileData)
4. Formatting Input Files:
The primary input files for QTLReaper must be in an exact, specific
format to work properly. Before exporting this data, ensure that your
first three columns of data are in this order: Chromosome, Locus, and
cM. Mb location may be added as an optional fourth column, though it
cannot be a substitute for cM. It must be a plain text, tab-delimited
document with Unix line breaks. Microsoft Excel, BBEdit, and other
applications will have the option to export as a tab-delimited text
document, most likely under the "save as" option. Many applications have the
option to change the line breaks to Unix (in BBEdit, for example, in
the Options section of "Save As"). This can also be conveniently accomplished in
Windows by using fromdos.exe, a small free utility available here.
Once the file is exported, the
data may be pushed down a few lines to make room for (optional)
header information. (Remember to convert these line breaks to Unix style
if adding the header information on a Windows or Macintosh machine!)
At the top, above all the data, the user may add
the information that is called by the .type, .name, and .mat/.pat
commands listed above. This is done by: @type:, @name:, and @mat/@pat: with stating the necessary
information following the colon (no space). All these declarations
should be placed on separate lines.
BXD.txt and BXD2.txt are examples of properly formatted
files, and are contained in the QTLReaper download's Example folder.
A larger, non-sample-only dataset is available for download (722k, its Macintosh line breaks will
have to be converted to Unix line breaks to function properly.)
5. Example:
#Import the reaper module
import reaper
#Initiate a Python Dataset object
bxdGeno = reaper.Dataset()
#read genotype information from a file
#argument is a string containing the filename and location
#file is a tab-delimited text file
bxdGeno.read("../../Example/BXD.txt")
################
# Dataset Object
################
#dataset type (RI Set or Intercross)
bxdGeno.type
#dataset name
bxdGeno.name
#progeny
bxdGeno.prgy
#number of progeny
bxdGeno.nprgy
#parent 1
bxdGeno.pat
#parent 2
bxdGeno.mat
#Number of Loci
bxdGeno.nloci
#Number of Chromosome
len(bxdGeno[0])
#1st Chromosome
bxdGeno[0].name
#last Chromosome
bxdGeno[-1].name
#1st Locus
bxdGeno[0][0].name
#last Locus
bxdGeno[-1][-1].name
#Locus cM
bxdGeno[-1][-1].cM
#Whether physical info (Mb) are available
bxdGeno[-1][-1].Mbmap
#Locus Mb
bxdGeno[-1][-1].Mb, if physical info are available
#Locus genotype in numbers
bxdGeno[-1][-1].genotype
#Locus genotype in abbrevs
bxdGeno[-1][-1].genotext
################
# Interval Map
################
#Calculate intervals at 1cM, generate a new dataset object
bxdIntervalGeno = bxdGeno.addinterval()
#Number of Loci
bxdIntervalGeno.nloci
################
# individual chromosome
################
#You can create a new dataset object which contains
only one chromosome for individual chromosome analyses
chr_1_bxdGeno = reaper.Dataset()
chr_1_bxdGeno.chromosome = [bxdGeno[0]]
#Number of Loci
chr_1_bxdGeno.nloci
################
# Member functions
################
#Most reaper functions require two lists as inputs
#the first list is the case or strain list; the second is the trait values list
#the two lists should have the same number of members
#the strain list should have the same cases or strains as those listed in the header
of the genotype file or should be a subset of those
strains =['BXD1', 'BXD2', 'BXD5', 'BXD6', 'BXD8', 'BXD9', 'BXD11', 'BXD12', 'BXD13', 'BXD14', 'BXD15', 'BXD16', 'BXD18', 'BXD19', 'BXD20', 'BXD21', 'BXD22', 'BXD23', 'BXD24', 'BXD25', 'BXD27', 'BXD42']
trait = [53.570 ,63.885 ,56.700 ,61.750 ,66.325 ,65.150 ,60.400 ,57.920 , 51.925 ,62.350 ,67.175 ,65.850 ,52.425 ,60.925 ,65.350 ,56.750 , 59.750 ,57.888 ,60.250 ,64.433 ,57.125 ,63.600]
#vaiance list will be used in weighted regression (using 1/variance as the weights)
variance = [0.777, 0.108, 1.78, 1.18, 0.370, 0.808, 1.549, 0.710, 0.257, 1.482, 1.816, 0.711, 1.204, 0.059, 0.182, 0.591, 0.357, 0.072, 0.490, 0.239, 0.905, 1.327]
#genotypes of a control locus as a cofactor in the regression
control = "D1Mit1"
################
# Regression
################
#the result is a list of QTL Objects
#simple regression (no variance and no control cofactor)
qtl1 = bxdGeno.regression(strains = strains, trait = trait)
#weighted regression (variance weighting but no control cofactor)
qtl2 = bxdGeno.regression(strains = strains, trait = trait, variance = variance)
#composite regression (control cofactor with or without variance weighting)
qtl3 = bxdGeno.regression(strains = strains, trait = trait, control = "D1Mit1")
qtl4 = bxdGeno.regression(strains = strains, trait = trait, variance = variance, control = "D1Mit1")
#maximum QTL
max(qtl1)
max(qtl2)
max(qtl3)
max(qtl4)
#maximum LRS
max(qtl1).lrs
max(qtl2).lrs
max(qtl3).lrs
max(qtl4).lrs
#Locus with maximum LRS
max(qtl1).locus
max(qtl1).locus.name
#sort the qtl list
qtl1.sort()
qtl2.sort()
qtl3.sort()
qtl4.sort()
################
# Permutation
################
#returns a list of highest LRS value for each permutation in ascending order
#fixed number permutations
permu1 = bxdGeno.permutation(strains = strains, trait = trait,nperm=1000)
permu2 = bxdGeno.permutation(strains = strains, trait = trait, variance = variance,nperm=1000)
#progressive permutation
#keep on doing permutations until the threshold LRS is not in the top 10
#or the total number of permutations reaches 1,000,000
permu3 = bxdGeno.permutation(strains = strains, trait = trait, thresh = 23)
#calculate p-value
pv1 = reaper.pvalue(max(qtl1).lrs, permu1)
################
# Bootstrap
################
#returns a list of counts of times that a locus has the highest LRS score
#the length of the list equal to the total number of markers
boot1 = bxdGeno.bootstrap(strains = strains, trait = trait, nboot=1000)
boot2 = bxdGeno.bootstrap(strains = strains, trait = trait, variance = variance, nboot=1000)
################
# Anova
################
# A simple ANOVA (ANalysis Of VAriance between groups) function
list = [1, 2, 4.1, 2, 4, 3.2, 1.1, 5, 5.6, 7.1, 2.3, 4.3, 3.6]
mean, median, variance, stdev, stderr, N = reaper.anova(list)
#######################
# Read from Input file
#######################
#It is easy to read from tab-delimited input file to generate the strain and trait value lists
#use the included trait.txt as example
import string
fp = open("../../Example/trait.txt")
header = fp.readline()
header = string.split(header)
header = map(string.strip, header)
#strip any blank characters
strains = header[1:]
#the header here is the strain list
trait = fp.readline()
trait = string.split(trait)
trait = map(string.strip, trait[1:])
#strip any blank characters
trait = map(float, trait)
#the trait here is the trait valuelist
About this text file
QTL Reaper Tutorial written by Evan Williams, July 2005. Edits by EGW, Aug 4. 2005; KFM, March 20, 2006. Last edit by KFM, March 20, 2006.
|