<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>An HTML Guide to GeneNetwork and WebQTL</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<base href="http://www.genenetwork.org/">
<LINK REL="stylesheet" TYPE="text/css" HREF='/css/general.css'>
<LINK REL="stylesheet" TYPE="text/css" HREF='/css/menu.css'>
<SCRIPT SRC="/javascript/webqtl.js"></SCRIPT>
<SCRIPT SRC="/javascript/tooltip.js"></SCRIPT>
</HEAD>
<BODY  bottommargin="2" leftmargin="2" rightmargin="2" topmargin="2" text=#000000 bgColor=#ffffff>
<TABLE cellSpacing=5 cellPadding=4 width="100%" border=0>
	<TBODY>
		<!--  TOP BANNER   -->
	<TR>
		<script language="JavaScript" src="/javascript/header.js"></script> 
	</TR>
	<TR>
		<TD bgColor=#eeeeee class="solidBorder">
		<Table width= "100%" cellSpacing=0 cellPadding=5>
		<TR>



<div class=Section1> 
  <h1><span style='font-size:18.0pt;font-family:Verdana'>&nbsp;&nbsp;GeneNetwork: A tour and tutorial
</span></h1>

<DIR>

<DIR><I><SMALL>This text is taken from the GeneNetwork Tour available at http://www.genenetwork.org/tutorial/WebQTLTour/. See the GeneNetwork Help menu to find the latest version of the Tour.</SMALL></I></DIR>

<p><b>Aim of this tutorial.  </b><span style='font-weight:normal'> 
The goal is to illustrate how to use GeneNetwork (GN) to study gene function and relations between genes and traits such as differences in disease severity, differences in anatomy, and differences in physiology and behavior. You can also use GN to study relations among phenotypes, for example: to what degree does an injection with cocaine or alcohol lead to an increase in movement?  Most of the experimental data sets in GN are from small populations of mice (e.g., the BXD family of strains) and rats (HXB family). There are also some human, monkey, Drosophila, and plant data sets to explore, although you may find that mapping functions have not yet been implemented for these other species. 

<P>The focus in this tutorial is how to use some of the most important functions. You should be able to complete this tutorial in about an hour. For this demonstration we will study expression of a key gene known as NR2B or <I>Grin2b</I>. This gene/mRNA/protein is crucial in learning and memory. Once you have worked through this example, you should be able to use GN to explore single genes or sets of genes, mRNAs, and other standard traits that interest you.</span></p>
  
<p><b>What you will learn. </b><span style='font-weight:normal'> If you spend an hour working through this tutorial you will learn how to extract dozens of molecules that potentially interact with NR2B. It will be easy for you to generalize what you learn to any other gene or transcript of interest. You should be able to confirm known relations (information from the literature) and you should be able to uncover intriguing new relations among sets of molecules and other traits. You will learn how to exploit a gene ontology (a gene ontology is a simple and systematic way of categorizing the functions of genes) and you will also learn about gene or QTL mapping and complex trait analysis.</span></p>

<P>By the way: if a term is new to you (What is a <A HREF="http://www.genenetwork.org/glossary.html#Q"  class="fs14"  target="_empty">QTL</A>?), and you would like to read an explanation then have a look at the GeneNetwork <A HREF="http://www.genenetwork.org/glossary.html"  class="fs14"  target="_empty">Glossary</A>.

<P>Much of the data in GN was obtained from gene array experiments. As you have undoubtedly heard or experienced yourself, the analysis of array data sets is difficult and sometimes messy. You may encounter poor quality data in some of these enormous data sets. An important part of learning how to use GN involves tools to evaluate data quality. Treat the results you generate with GN with caution. There are solutions to some of the problems you will run into, but other problems, including comparison across multiple data sets, remain difficult. 

<P>After you have worked through this tour, please look quickly at the  <A HREF="http://www.genenetwork.org/faq.html">Frequently Asked Questions</A> and the <A HREF="http://www.genenetwork.org/glossary.html">Glossary</A>. Let me know if you have other questions or if you see mistakes that I should fix. Email should go to Rob Williams (rwilliams@uthsc.edu). 

<p><b>Step 1: Getting the terminology right.</b>
<span style='font-weight:normal'> 
   
The use of gene symbols and names in research papers is not consistent. When you search databases it helps to use the preferred or official gene name and symbol. In most papers, the NMDA 2B receptor is abbreviated <I>NR2B</I> and a Google search for &quot;NR2B&quot; will generate about 100,000 hits, many of which deal with the genetics and molecular biology of learning and memory. But it turns out that <I>NR2B</I> is not the official name of either the gene or protein. A great way to verify nomenclature is to go to </span><A HREF="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene" target="_blank"> NCBI Entrez Gene</a><span style='font-weight:normal'> and enter the gene name or symbol. Entrez Gene should be able to resolve your query and give you the correct symbol. </span></p>


</Dir>
<p> 
<img width=735 height=500 src="/images/upload/Tour_1_NR2B.png" v:shapes="_x0000_i1025"></p>

<DIR>
<p><B>Figure 1: Getting the terminology right</B> 

<p>When we enter <b>NR2B</b> or <B>nr2b</B> <span style='font-weight:normal'> we find that the official gene symbol in Entrez Gene is <i>Grin2b</i></span> for mice. The corresponding human gene is the same, but written entirely in capital letters <I>GRIN2B</I>. In both species the official gene name is <i>glutamate receptor, ionotropic, N-methyl D-aspartate 2B</i><span style='font-style:normal'>. </span></p>
  
<p><b>Step 2. Linking to the GeneNetwork search page</b><span style='font-weight:normal'> 

Link to GN at </span><B>www.genenetwork.org/</B>. Ideally, keep this tutorial page open at the same time so that you can look back and forth between the two windows.</b><span style='font-weight:normal'> You have a few choices to make: Choose species = <B>Mouse</B>,  Group = <B>BXD</B>, Type = <B>Hippocampus mRNA</B>. We have a great deal of data for the BXDs, so when in doubt, please select the BXD group of mice. Having made these choices, you still need to pick a particular database; in this case an array data set for a particular brain region called the hippocampus that, like NR2B, is critical in learning and memory. For the purpose of this tutorial, choose the database file called:

<P><B>Hippocampus Consortium M430v2 BXD (Jun06) PDNN</B>

<P>You can set your particular choice of species, group, type, and database as your personal default setting. Simply click on the <B>Set to Default</B> button (lower right). If you want to know what this long database term is all about, click on the <A HREF="http://www.genenetwork.org/dbdoc/HC_M2_0606_P.html"><B>INFO</B></A> button immediately to the right of the database name. 

<P>Now enter your search terms in either of the search term fields labeled <B>Get Any</B> or <B>Combined</B>. <B>Get Any</B> is usually best and will search for all of the entries you put in this field (logical OR). <B>Combined</B> will only get records that match all of the terms that you enter (logical AND). You could enter both </span><b>NR2B</b><span style='font-weight:normal'> and </span><b>Grin2b</b><span style='font-weight: normal'> in the </span><b>ANY</b><span style='font-weight:normal'> field. You can also use wildcard characters ? and * for single or multiple characters. It is often a good idea to enter an asterisk after a search term, such as Grin*. This will get all subunits of a molecule or complex.
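<P>The difference between the two search fields and the two wildcard characters can be sketched in a few lines of Python. This is only an illustration of the matching rules described above, not GN's actual search code. The record contents, the function names, and the case-insensitive matching are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

def term_matches(term, word):
    """True if word matches term; ? stands for one character, * for any run."""
    # Case-insensitive matching is an assumption of this sketch.
    return fnmatchcase(word.lower(), term.lower())

def record_matches_term(term, words):
    """True if the term matches at least one word in the record."""
    return any(term_matches(term, w) for w in words)

def get_any(terms, words):
    """Logical OR: at least one search term matches the record."""
    return any(record_matches_term(t, words) for t in terms)

def combined(terms, words):
    """Logical AND: every search term matches the record."""
    return all(record_matches_term(t, words) for t in terms)

# Hypothetical records: each is a list of words from a symbol and description.
records = {
    "Grin2b": ["Grin2b", "glutamate", "receptor", "NMDA2B"],
    "Grin1": ["Grin1", "glutamate", "receptor", "NMDA1"],
    "Kif17": ["Kif17", "kinesin", "NR2B", "transporter"],
}

any_hits = [name for name, words in records.items()
            if get_any(["NR2B", "Grin2b"], words)]
all_hits = [name for name, words in records.items()
            if combined(["Grin*", "NMDA2?"], words)]
print(any_hits)
print(all_hits)
```

<P>In this toy example the OR search picks up the <I>Kif17</I> record because its description mentions NR2B, which mirrors what you will see in the real search results in Step 3.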

<p ><a name="OLE_LINK1"></a><a
name="OLE_LINK2"> <img width=716 height=677
src="/images/upload/Tour_2.png" v:shapes="_x0000_i1026"> </a></p>
<p><B>Figure 2: The Search Screen. This is the page to bookmark.</B>
 

<p><b>Step 3. Retrieving the data.</b><span
style='font-weight:normal'> When you click on the </span><b>Search</b><span style='font-weight:normal'> button, your computer sends this string of search terms to GN (GN is an Apache-Python-MySQL web database system), which then looks through thousands of records for matching terms. The database that we just searched has 45,101 entries that represent close to 20,000 known genes and expressed sequence tags (ESTs). (To determine the size of any database enter a single asterisk (*) in the ANY field.)
</span></p> 

<p><span style='font-weight:normal'> <img width=822 height=688 src="/images/upload/Tour_3.png" v:shapes="_x0000_i1027"> </span></p>
<p><B>Figure 3: Search results for the search string <I>Grin2b</I></B>
 

<p>In this particular case, if you entered just "Grin2b", you will get a list of six data sets. The last three are measurements of <i>Grin2b</i></span> expression. These three measure different parts of the mRNA: the distal 3 prime UnTranslated Region (3' UTR), the 3' region of the last coding exon (coding exon 12), and an alternative 3' UTR of a short mRNA splice variant (an mRNA isoform). Which of the three should you pick? The best choice is usually the one that corresponds to coding sequence. In this case, you should study the 5th entry highlighted in red. You can come back to the other two data sets later to see how they compare. And you can ignore the top three <I>Kif17</I> data sets that were found only because the gene description includes the text "NR2B/GRIN2B NMDA receptor transporter." 

<p> <img width=703 height=29
src="/images/upload/tutorial_banner.jpg" v:shapes="_x0000_i1028"> </p>

<P>This is a good point to review several of the features of most GN pages. Feature 1 is the banner of terms toward the top labeled <b>Home, Search, Help, News,</b><span style='font-weight:normal'> etc. Most of these menu headings have pop-down lists from which you can select additional resources and tools. For example, the </span><b>Search</b><span style='font-weight:normal'> menu heading lists the <B>Search Databases</B> page (our starting point), the <B>SNP Browser</B> tool, the <B>GeneWiki</B> resource, <B>Interval Analyst</B>, and <B>GenomeGraph</B></span> displays. These features are worth trying out later. The <b>Policy</b><span style='font-weight:normal'> menu explains how to contact the data providers and how to use and cite data. </span></p> 
  
<p>Another useful feature of the Search Results window is the <B>Sort By</B> selector. You can sort longer lists of "hits" by their location, their expression levels, or by their maximum LRS or LOD scores.  There are also small check boxes to the left of each entry. These are used to select data you would like to move into a Collection. The <b>Add</b><span style='font-weight:normal'> button will move any checked items into your collection. Collections can include any gene, trait, or SNP marker that has been measured in the BXD family. You can even add your own BXD data using <B>Enter Trait Data</B> in the MAIN menu (top left). The ability to add diverse data types into a Collection provides a great deal of power.  The use of <B>Collections</B> is a topic for a more advanced tour. Let's get through this quick tour first; afterward, feel free to build up your own collections of phenotypes for "collective" analysis. 

<p><b>Step 4. Reviewing the NR2B expression data.</b><span style='font-weight:normal'> 
Roll your cursor over the term </span><b>ProbeSet/1422223_at </b><span style='font-weight:normal'>in the</span><b> Search Results</b><span style='font-weight:normal'> window. This is the probe set that targets the last exon of the <I>Grin2b</I> transcript. The text will turn red. Click on the term. This will generate a new page called the Trait Data and Analysis Form. 

This </span><b>Trait Data and Analysis Form</b><span style='font-weight:normal'> page is the most important page for the analysis of genes and traits. We will return here several times. The top of this page contains useful background information, including the database that we used, the trait identifier, gene symbol and aliases, and the chromosomal location and megabase position (Mb) of <i>Grin2b</i></span> in the mouse genome. GN also includes links to NCBI, OMIM, GenBank, BioGPS, STRING, PANTHER, Gemma, and the Allen Brain Atlas (ABA). To find out more about these resources, just click on the links. There are also many additional useful links under the <A HREF="http://www.genenetwork.org/links.html">Links</A> menu heading.</p>

<p> <img width=765 height=494
src="/images/upload/Tour_4.png" v:shapes="_x0000_i1030"> </p>
<p><B>Figure 4: Trait Data and Analysis Form for <I>Grin2b</I>, probe set 1422223_at</B>

<P>Eight buttons are shown in Figure 4, and you need to know what most of them do.
<OL>

<LI>SNP Variant Browser: This link provides you with a list of all known SNPs in <I>Grin2b</I> that are in the GN database. You can, of course, also search for SNPs in other genes or regions. The SNP Variant database is pretty well populated (about 8 million SNPs) and includes all of the Celera SNPs and many SNPs from Perlegen, NIEHS, and our own in-house sequencing projects. As of Jan 2011, there are nearly 7000 SNPs in <I>Grin2b</I>, but only about 20 of these are in exons. Click on the SNP Browser button to have a quick look (you may need to restrict the SNP search to just <B>Domain = Exon</B>).

<LI>GeneWiki: This link lets you annotate our databases. You can leave yourself notes and comments about particular genes or probe sets. You can easily find your own notes using a special search string described in the <A HREF="http://www.genenetwork.org/searchHelp.html">Advanced Search</A> page. In short, your search would be written out "wiki=myName". Leave out the quotes and make sure that "myName" is in your Wiki entry. It is that simple.  

<LI>Verify Location and Verify RNA Seq: These buttons are used to confirm that the data set correctly targets the last exon of <I>Grin2b</I>. Click on either link. The probe sequences used on the array are sent to the Genome Browser and the best match is found in real time using the BLAT algorithm. The Search Results page allows you to drill down to a view of the genome. Click on the "browser" link to the far left (top row). Look for the horizontal track that is made up of a series of black rectangles labeled "Blat Sequence" or "Probe XXXYYY".  You will also see several "tracks" labeled <I>Grin2b</I> and <I>GRIN2B</I>. If you use the <B>Zoom Out 10X</B> button you will see that the probes and probe set are aligned with the 3' end of the last exon--a bit more detail than we had before. You may also notice that <I>Grin2b</I> is encoded on the minus strand of chromosome 6 and that the tiny arrow heads visible on the last intron point to the left (the transcription direction is from right to left). The Verify RNA seq button has the same function, but takes you to a GN mirror of the UCSC Browser that has added RNA sequencing data for brain, hippocampus, and eye (Jan 2011) of many BXD strains of mice.

<LI>Basic Statistics: This button will generate summaries such as the average expression, the range, bar charts of expression ordered by strain and by rank. Try it quickly.

<LI>Similar Traits: This button will provide you a link to <I>Grin2b</I> expression data in other data sets that may interest you.

<LI>Probe Tool: A link to the sequence data for the individual probes that make up a probe set. This table can be used for a very fine-grained analysis of particular probe sets. 

<LI>Add to Collection: If you would like to add a trait to your collection of traits, transcripts, or markers, use this button. This is the same function we mentioned earlier.

<LI> Reset: GN allows you to modify values for traits and this button is used to reset to the original values.
</OL>

<p>In general, text that uses a blue font is also a link. For example, the text at the top of Figure 4 <A HREF="http://www.genenetwork.org/dbdoc/HC_M2_0606_P.html">Hippocampus Consortium M430v2 BXD (Jun06) PDNN</A> will link you to a Materials and Methods "metadata" or  <B>Info</B> page. There is lots of information on the Info page, but the short summary is that the NR2B expression data in the Hippocampus mRNA database we are exploring was generated from approximately 1200 hippocampi and 600 mice belonging to 99 strains (typically three animals per array and approximately 200 arrays). This is one of the largest data sets in GN. Each array includes samples from a single age, sex, and litter. The BXD strains were all made by crossing two parental strains, C57BL/6J and DBA/2J. Both of these parental strains have been fully sequenced. The Hippocampus data set includes expression estimates for both parental strains, and also data for 15 other common inbred strains, for example, 129S1/SvImJ, C3H/HeJ, CAST/EiJ, and others.  There is also a complementary, but smaller Hippocampus Consortium data set for the CXB strains of mice.</p>


<p>Farther down the page we encounter sections labeled <B>Trait Correlations</B> and <b>Interval Mapping</b><span style='font-weight:normal'> and </span><b>Trait Data</b><span style='font-weight:normal'>. We will come back to these tools in a moment, but keep scrolling down to the actual numerical data on gene expression for <I>Grin2b</I>.

The larger numbers in the boxes (6.631, 6.612, etc.) are estimates of the abundance of <i>Grin2b</i></span> mRNA in the hippocampi of different samples of mice. The smaller numbers (0.184, 0.205, etc.) are the standard errors of expression, usually based on two arrays (hence SEM also equals SD). Numbers are all expressed using a log base 2 scale. A value of 8 therefore corresponds to 2^8 or 256. A difference of one unit is equivalent to a two-fold difference in <i>Grin2b</i><span style='font-style:normal'> expression. DBA/2J has an expression of 6.506 +/- 0.008 whereas BXD14 has an expression of 7.604 +/- 0.232. That amounts to approximately a 2-fold difference in the amount of transcript. 

<P>These kinds of preliminary results often generate intriguing and testable hypotheses: do strains of mice have the anticipated differences in learning and memory performance given the known effects of <i>Grin2b</i> overexpression in transgenic mice? If you scroll down farther in this list of <i>Grin2b</i><span style='font-style:normal'> expression estimates you will see that the strain of mouse called BXD80 only expresses 6.098 units of NR2B whereas BXD42 expresses 8.400 units. That amounts to a putative 5-fold difference in mRNA level. We need to take this with a grain of salt, because these expression estimates are lower than we expect. The average expression for all transcripts (including those NOT expressed) is 8 units. The fact that the average expression of the <I>Grin2b</I> probe set is less than average should make us worry. Are these data too noisy to use? Is there an unsuspected problem with the data handling or the probes? These are hard questions to answer, but the <B>Probe Tool</B> is useful because you can look at the individual probe values used to generate the probe set summary value.  We will skip this process for now, but remember that this "deeper" level is always just one click away. For the time being, let's evaluate the data using the Basic Statistics function.
</span></p>
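<P>Because the expression values are on a log base 2 scale, a difference between two strains converts to a fold difference by raising 2 to the power of that difference. The short Python sketch below checks the two comparisons quoted above (BXD14 versus DBA/2J, and BXD42 versus BXD80). The function name is made up for this illustration.

```python
def fold_difference(log2_a, log2_b):
    """Fold difference between two expression values given on a log2 scale."""
    return 2 ** abs(log2_a - log2_b)

# Values quoted in the text above.
bxd14_vs_dba2j = fold_difference(7.604, 6.506)
bxd42_vs_bxd80 = fold_difference(8.400, 6.098)

print(round(bxd14_vs_dba2j, 2))  # roughly 2-fold
print(round(bxd42_vs_bxd80, 2))  # roughly 5-fold
```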

<p> <img width=572 height=351 src="/images/upload/Tutoral_Stats1.jpg" v:shapes="_x0000_i1031"> </p>
<p><B>Figure 5: Basic statistics for <I>Grin2b</I></B>  

<p><b>Step 5. Basic Statistics.&nbsp; </b><span style='font-weight:normal'>To get a better understanding of these values and how expression estimates are distributed click on the </span><b>Basic Statistics</b><span style='font-weight: normal'> button. A new window will open with a statistical summary table and a box plot toward the top (Figure 5), two bar charts in the middle, and a normal probability plot toward the bottom. You may not be familiar with these types of plots yet, but they are simple to read. The box plot is a simple summary of the spread of the 86 values. The blue plus sign represents the mean expression. The box defines the 25% and 75% quantiles (if we had studied exactly 100 strains, these would be the strains ranked 25th and 75th). If you want more information, just click on the link beneath the plot.

The bar charts are easy to read. They provide a graphic output of the data that you saw in the Trait Data and Analysis Form. The Y axis of the graph is truncated and does not extend down to a value of zero. This tends to highlight the variation within strains. The error bars are quite large, and are only based on two samples (in this case the SEM is usually the same as the SD). Also note that the size of these error bars tends to increase as the expression increases (non-uniform error). High noise and non-uniform variance are both characteristics that should reduce your enthusiasm. But let's persevere, because these data have not yet cost you a penny and because this is a great lesson.

<p> <img width=737 height=497
src="/images/upload/Tutorial_Stats2.jpg" v:shapes="_x0000_i1031"> </p>
<p><B>Figure 6: Normal Probability plot for <I>Grin2b</I></B>  

<P>Toward the bottom of the Basic Statistics page you will find a Normal Probability plot (Figure 6).  There is another link associated with this plot that will provide more background, but here is a brief explanation of how to read these plots. On the X axis is the expected Z score for every strain based on its ranking out of 86 strains. Values range from about -2.5 to +2.5. If you randomly drew 86 values from a normal distribution you would expect the lowest value to be about 2.5 standard deviations below the mean (a Z score of about -2.5). The Y axis provides a read-out of the actual expression level. If the expression of NR2B were normally distributed then the strain averages would form a straight line. If expression of <I>Grin2b</I> were obviously controlled by a single Mendelian factor, then this plot would have an S shape with many high strains and many low strains and few strains with intermediate values. Instead this plot highlights skew toward low values (also seen in the box plot). A few strains have comparatively high expression, but the main feature is the excess of strains with values from 6.0 to 6.5 units. It does not look like a Mendelian trait. But looks can deceive.
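<P>The X-axis values of a normal probability plot can be reproduced with the Python standard library. The sketch below assumes the common (rank - 0.5) / n plotting position. GN may use a slightly different convention, so treat the exact numbers as approximate.

```python
from statistics import NormalDist

def expected_z_scores(n):
    """Expected Z score for ranks 1..n, using the (rank - 0.5) / n position."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((rank - 0.5) / n) for rank in range(1, n + 1)]

z = expected_z_scores(86)

# Lowest and highest expected Z scores for 86 strains: about -2.5 and +2.5.
print(round(z[0], 2), round(z[-1], 2))
```

<P>Plotting the sorted strain means against these Z scores gives the plot in Figure 6. Points that fall on a straight line indicate a roughly normal distribution.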

<P>What this plot also highlights is the wide range of expression of the NR2B gene transcript in normal strains (the BXD strains are not mutant or knockout mice). But the high error raises the possibility that much of this variation is simply sampling error. If we performed an analysis of variance with Strain as our main effect, this data set would be associated with a modest and statistically insignificant F value. But we have other methods to evaluate this putative strain difference. We can map it and see if any interesting patterns emerge from the mapping that might cheer us up and demonstrate that the variation of strain means is actually true signal. In this case, the answer is (fortunately) a strong Yes (LOD = 16). But when you see data of this type, the usual answer will be No. 

<P>Variation in NR2B gene expression is a signal that we can now cautiously use to search for transcripts that co-vary. Does NR2B message covary with other subunits of the NMDA receptor complex (over 1000 transcripts are part of the postsynaptic density of which NR2B is a key member)? Does the expression of other genes compensate for the apparent 5-fold range in NR2B message level?</span></p>

<p><b>Step 6. Covariation of expression.</b><span style='font-weight:normal'> To answer these types of questions return to the </span><b>Trait Data and Analysis Form</b><span style='font-weight:normal'> that has all of the values for <i>Grin2b</i></span> in 86 different strains of mice. This time select the <b>Trait Correlation</b><span style='font-weight:normal'> button. There are five pop-down menus that allow you to modify search parameters.  

Let's modify one of the default settings. Change <B>Return</B> to read <B>top 500</B>. The other settings are fine for the time being.</span></p> Now click on the <B>Trait Correlations</B> button.

<p> <img width=464 height=101 src="/images/upload/Tutorial_Correlation1.jpg" v:shapes="_x0000_i1032"> </p>
<p><B>Figure 7: How to set up the search for covariates of <I>Grin2b</I></B> 

<p>Within a few seconds of clicking this button, GN will return a new page of data, a<b> Correlation Table</b><span style='font-weight:normal'> of the top 499 transcripts that covary with variation in <i>Grin2b</i></span> expression. At the top of this list (sorted by p value) is <i>Grin2b</i><span style='font-style:normal'> itself. The third best covariate of our <I>Grin2b</I> probe set is another  <I>Grin2b</I> probe set. That is reassuring. 

<P>Let's review the columns:

<OL>
<span style='font-style:normal'>
<LI>The first column is just an index with check boxes. You can easily add items into your BXD Collection using these checkboxes. 
<LI>Record ID: The ID of the trait; in this case just the probe set identifier given by Affymetrix
<LI>Symbol: The official gene symbol. Clicking on these symbols will link you to NCBI. 
<LI>Description: The name of the trait or the name of the gene from which the mRNA is transcribed
<LI>Chr: The chromosome of the gene from which the transcript is transcribed
<LI>Megabase: The chromosomal nucleotide position of the most proximal end of the probe set (mm6 alignment)
<LI>Mean Expression: The average expression of the probe set (mean of strain averages)
<LI>Correlation: The correlation with the reference trait, in this case with Grin2b probe set 1422223_at.
<LI>N Cases: The number of strains involved in the correlation analysis
<LI>p Value: The p value associated with the correlation and number of cases without correction for multiple tests.
<LI>Lit Corr: The Literature Correlation. This is a very cool column of data generated by Ramin Homayouni, Michael Berry and colleagues that summarizes the correlation of <I>Grin2b</I> with many other genes based upon an analysis of the PubMed Literature.
</OL>

<P>&nbsp;

<p> <img width=800 height=380
src="/images/upload/Tutorial-Correlation2.jpg" v:shapes="_x0000_i1033"> </p>
<p><B>Figure 8: Correlation Table</B> 

<P>Clicking on any correlation value will generate a scattergram of <i>Grin2b</i> on the X axis and the other transcript on the Y axis. For example, the scattergram of one <i>Grin2b</i><span style='font-style:normal'> probe set (X axis) versus a second <i>Grin2b</i> probe set is shown in Figure 9.  The </span><i>p</i><span style='font-style:normal'> value associated with this correlation is highly significant and is listed in the upper right corner for both the parametric Pearson's r value and for Spearman's rank order r value. GN generates the <b>Correlation Table</b></span>&nbsp; after performing 45,100 statistical tests; so we should correct for multiple tests. In this case, the <i>p</i> value is significant even if we apply a stringent Bonferroni correction. You may regard this <I>Grin2b-to-Grin2b</I> correlation as somewhat of a disappointment, but the more you appreciate the great complexity of mRNA metabolism, the less surprised you will be.  If you were planning follow-up functional or behavioral studies of strains with high and low <I>Grin2b</I> expression you would obviously want to resolve some of the discrepancies at the protein level or you could (with some risk) just select strains with high or low expression for all forms of <I>Grin2b</I> (BXD15, BXD24, BXD60 vs BXD14, BXD42, BXD96).</p>
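<P>A Bonferroni correction simply divides the chosen significance level by the number of tests. The sketch below works this out for the 45,100 tests mentioned above. The significance level of 0.05 and the function names are assumptions for the illustration and are not taken from GN.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p value cutoff that keeps the overall error rate at alpha."""
    return alpha / n_tests

def is_significant(p_value, alpha=0.05, n_tests=45100):
    """True if p_value survives a Bonferroni correction for n_tests tests."""
    return p_value <= bonferroni_threshold(alpha, n_tests)

threshold = bonferroni_threshold(0.05, 45100)
print(f"{threshold:.2e}")      # cutoff on the order of one in a million

print(is_significant(1e-10))   # a very small p value survives the correction
print(is_significant(0.001))   # a nominally significant p value does not
```

<P>The practical point is that an uncorrected p value of 0.001 in the Correlation Table is not strong evidence on its own when tens of thousands of transcripts have been tested.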

<P>Correlation Tables can be re-sorted. Click on the small arrowheads in the header column to re-sort by the Literature Correlation.  You will find that <I>Grin2b</I> actually does covary reasonably well with <I>Grin1</I> (r = 0.501). Generate the scatter plot for <I>Grin1</I> and <I>Grin2b</I>.

<p> <img width=806 height=867 src="/images/upload/Tutorial_Correlation3.jpg" v:shapes="_x0000_i1034"> </p>
<p><B>Figure 9: Scatter plot of two probe sets that measure expression of different parts of <I>Grin2b</I>.</B> 

<p><b>Step 7. Gene ontology analysis.</b> At the top of the <b>Correlation Table</b> is a button labeled <b>Gene Ontology</b>. When you click on this button GN sends a list of gene IDs to the WebGestalt server for analysis. Before you click on this button you need to decide which list of transcripts to send to WebGestalt. The easy answer is to send all 500 transcripts. To do this click on the <B>Select ALL</B> button.  This action will highlight all 500 transcripts. Now click on the Gene Ontology button. The output will be a large graph consisting of three major categories and a "bush" of subcategories. </p>


<p> <img width=791 height=878 src="/images/upload/Tutorial_GO1.jpg" v:shapes="_x0000_i1035"> </p>
<p><B>Figure 10: Gene ontology analysis with WebGestalt</B> 


<p>A gene ontology is a hierarchical categorization of genes by their functions. A large subset of the roughly 20,000 genes measured using microarrays have been assigned to one or more functional categories. The three independent trunks of this ontology are "biological process", "molecular function", and "cellular component". Within each GO category we see the number of genes included in this category and the p value. For example, for the category "Nervous System Development", the numbers are 21 genes with a p value of 0.005. You can click on the category and generate the list of all 21 genes--from <I>Acsl6</I> (the only covariate with a negative correlation) to <I>Ulk1</I>. The single category that has the highest "enrichment" is the molecular function "binding" with a p value of about 0.0000001. This category includes 261 of 500 transcripts on our <I>Grin2b</I> list.  

<p><b>Step 8: Mapping <I>Grin2b</I>.</b> At this point we have gained some confidence in the <I>Grin2b</I> expression data. While the expression values are low, they seem to be correctly associated with molecules enriched in the postsynaptic complex. What causes the variation in <I>Grin2b</I> expression among strains of mice? The simple but unsatisfying answer is that differences are caused by <i>genetic variation between the parental strains that is inherited by the BXD progeny.</i></span> The WebQTL module of GN can give us much better answers. WebQTL can tell us where in the genome this genetic variation is located and which parental strain (C57BL/6J or DBA/2J) is associated with higher expression. These chromosomal regions are the ultimate, but most distal causes of variation in <i>Grin2b</i> mRNA abundance. These sources of variance can be mapped like any other genetically determined trait. The method we use to do this is called complex trait analysis or quantitative trait locus (QTL) mapping (hence the term WebQTL). The method is covered at an accessible level in a previous <a href="http://www.nervenet.org/papers/shortcourse98.html" target="_blank">SfN Short Course article</a>. 

<P>But now we will skip the details and proceed directly to the results. Go back to the <b>Trait Data and Analysis Form</b> and click on the <b>Interval Mapping</b><span style='font-weight:normal'> button.</span></p>

<p> <img width=648 height=574 src="/images/upload/Tutorial_Map1.jpg" v:shapes="_x0000_i1038"> </p>
<p><B>Figure 11: QTL map for <I>Grin2b</I></B> 

<p>An <b>Interval Mapping Result</b><span style='font-weight:normal'> window will automatically open after a short pause during which WebQTL calculates and assembles results based on 4 million linear regression equations. The window should show a single major spike on the right side of chromosome (Chr) 6. The X-axis is a linear representation of all mouse chromosomes, as if they were tied end to end (Chr 1 to the far left, and Chr X to the far right). The Y-axis and the bold lines (blue) provide estimates of the likelihood that differences in NR2B expression are modulated by polymorphic loci (allelic variants). Likelihoods are presented using a chi-square statistic called the likelihood ratio statistic (LRS). Big numbers are good in the sense that they signify that we have successfully identified a chromosomal interval that controls <i>Grin2b</i></span> expression. In this case, the number is extraordinarily high, with a peak LRS of 72.9 (LOD of 15.8) on Chr 6. The horizontal gray line and the pink line are the statistical thresholds. If a spike exceeds the upper (pink) line, then the linkage between the chromosomal interval and variation in <i>Grin2b</i><span style='font-style:normal'> expression is significant. In this case, only the Chr 6 linkage is significant, whereas that on Chr 4 at about 70 Mb is suggestive. </span></p>
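<p>The two numbers quoted for the peak (LRS of 72.9, LOD of 15.8) are the same result on different scales: the LOD score uses base-10 logarithms, so LOD = LRS / (2 ln 10), or roughly LRS / 4.61. The sketch below shows that conversion, plus one standard way to get an LRS for a single marker from a linear regression. It is an illustration only and not WebQTL's own code; the function names and the -1/+1 genotype coding are assumptions, and real interval mapping also evaluates positions between markers.</p>

<pre>
```javascript
// Convert a likelihood ratio statistic (LRS) to a LOD score.
// LOD uses base-10 logs, so LOD = LRS / (2 * ln 10), roughly LRS / 4.61.
function lrsToLod(lrs) {
  return lrs / (2 * Math.LN10);
}

// LRS for a single marker from a simple linear regression of trait values
// on genotype. Assumed coding: -1 for one parental allele, +1 for the other.
function markerLrs(genotypes, phenotypes) {
  const n = phenotypes.length;
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / n;
  const gMean = mean(genotypes);
  const pMean = mean(phenotypes);
  let sxx = 0;
  let sxy = 0;
  let syy = 0;
  genotypes.forEach((g, i) => {
    const dx = g - gMean;
    const dy = phenotypes[i] - pMean;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  });
  const rssNull = syy;                       // residual sum of squares, mean-only model
  const rssMarker = syy - (sxy * sxy) / sxx; // residual sum of squares with the marker
  return n * Math.log(rssNull / rssMarker);
}

console.log(lrsToLod(72.9).toFixed(1)); // prints "15.8", the LOD quoted for Chr 6
```
</pre>

<p>Calling <code>markerLrs</code> once per marker and plotting the results along the genome would give a crude, marker-only version of the blue curve in Figure 11.</p>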

<p><b>Step 9. Evaluation of candidate genes. </b><span style='font-weight:normal'>Note that on the X-axis just under the largest spike there is a small purple triangle. This triangle indicates the genetic location of the <i>Grin2b</i></span> gene itself. The correspondence of the QTL and the location of the <I>Grin2b</I> gene suggests that polymorphisms in <i>Grin2b</i> modulate expression (for example, a variant in the <i>Grin2b</i> promoter). The finer jagged line (red or green) provides a gauge of whether DBA/2J (green) alleles or C57BL/6J (red) alleles contribute to higher expression of <I>Grin2b</I>. See the far right axis to see how to read this "additive effect" scale. In this case, the allele or haplotype inherited from C57BL/6J (red line) contributes to higher expression of <I>Grin2b</I>. Just to make sure, let's zoom in on the map of Chr 6 and confirm that the QTL does align with <I>Grin2b</I>. To zoom the map, click on the chromosome number (Chr 6) in the whole-genome interval map. This will generate a chromosome 6-specific map of <I>Grin2b</I> expression. Once you have this Chr 6 map on your screen, you can zoom again by clicking on the rose-colored horizontal bar at the top of the map. You will end up with a map that looks somewhat like Figure 12, in which the strongest positional candidate is obviously <I>Grin2b</I> itself. We have informally "cloned" a QTL for <I>Grin2b</I> expression. </p>

<p><span style='font-style:normal'><br> <img width=660 height=425
src="/images/upload/Tutorial_Map2.jpg" v:shapes="_x0000_i1039"> </span></p>
<p><B>Figure 12: Zoomed QTL map for <I>Grin2b</I> on Chr 6</B> 
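<p>The "additive effect" scale mentioned in Step 9 can be thought of as half the difference between the mean expression of strains carrying the DBA/2J allele and strains carrying the C57BL/6J allele at a given position. The sketch below is a minimal illustration of that calculation. The function name, the "B"/"D" genotype labels, the sign convention (negative when the C57BL/6J allele gives higher expression), and the expression values are all hypothetical and are not taken from the <I>Grin2b</I> data.</p>

<pre>
```javascript
// Additive effect at one marker: half the difference between the mean trait
// value of strains with the D (DBA/2J) allele and strains with the
// B (C57BL/6J) allele. Sign convention is an assumption: negative values
// mean the B allele is associated with higher expression.
function additiveEffect(genotypes, phenotypes) {
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
  const bValues = phenotypes.filter((_, i) => genotypes[i] === "B");
  const dValues = phenotypes.filter((_, i) => genotypes[i] === "D");
  return (mean(dValues) - mean(bValues)) / 2;
}

// Made-up expression values for four strains at a marker.
const effect = additiveEffect(["B", "B", "D", "D"], [9.0, 9.2, 8.4, 8.6]);
console.log(effect.toFixed(2)); // prints "-0.30": the B allele is higher here
```
</pre>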


<p>Having mapped the major controller of <I>Grin2b</I> expression, we can ask if any of the "left-over" variance can be explained by secondary QTLs. We can use either <B>Composite Interval Mapping</B> or the <B>Pair Scan</B> mapping function (to gain access to Composite Interval Mapping you first need to click on the <B>Marker Regression</B> button). Both of these mapping methods are somewhat more advanced than simple interval mapping. Without going into the details (the data are weak), there is a hint that a region near the centromere of Chr 2 may also affect <I>Grin2b</I> expression by interacting non-additively with the Chr 6 variant of <I>Grin2b</I>.</p>
  

<p><b>Conclusion. </b><span
style='font-weight:normal'>The tour you have just taken has led you through typical steps in analyzing and evaluating gene expression data. This tour may also have generated a number of intriguing hypotheses about the relations of NR2B to itself and to other molecules in the hippocampus. You can repeat this type of analysis with any of about 20,000 other genes in this data set. You can do the same analysis for complex sets of transcripts and traits. You can also repeat this type of analysis in other tissues. It takes time, but the benefit-to-cost ratio is high.</span></p>


<p><b>Step 10. A simple self-test</b></p>
  <p>Q1. Can you verify any of these NR2B results using a different data set such as the SJUT Cerebellum M430 data set? What does a success or failure indicate?</p>
  

<p>Q2. Are there any functionally interesting correlations between <i>Grin2b</i><span style='font-style:normal'> expression and behavioral traits in these same strains of mice? (Hint: generate a <b>Correlation Table </b></span>of <i>Grin2b</i><span style='font-style:normal'> with traits in the <b>Published Phenotypes</b></span> database.) What is the importance of a correlation and what mechanisms can generate these correlations? Can very high correlations still be entirely spurious? </p>

<p>Q3. Do NR1 (<i>Grin1</i><span style='font-style:
normal'>) and NR2B (</span><i>Grin2b</i><span style='font-style:normal'>) share common modulators on Chr 8?</span></p>
    
  
<p><b>Caveat emptor: </b><span style='font-weight:normal'>Always be skeptical regarding results. Some pitfalls of this type of analysis are highlighted in the questions above. These are more than counterbalanced by tremendous opportunities. Consider GN a great tool for generating interesting new hypotheses, and be prepared to validate or refute these hypotheses using independent data and direct experimental tests.</span></p>

<p>--------</p>
  
<p>ANSWERS:<b> Q1.</b><span style='font-weight:normal'> No, verification is not possible using the Cerebellum data set. Different tissues have different expression patterns and control. </span>

<b>Q2.</b><span style='font-weight:normal'> Yes, you should get a high correlation with <i>rearing movements after a cocaine injection</i></span> (a paper by B. Jones et al., 1999). Yes, very high correlations can be &quot;functionally&quot; spurious and can arise from linkage disequilibrium, sampling error, or (we hope rarely) poor experimental design.&nbsp; 

<b>Q3.</b> No. 


<p>This work was supported by a Human Brain Project funded jointly by the National Institute on Drug Abuse, the National Institute of Mental Health, the National Institute on Alcohol Abuse and Alcoholism, and the National Science Foundation (awards P20-MH 62009 and P20-DA 21131 to KFM and RW) and by a separate grant from the National Institute on Alcohol Abuse and Alcoholism (INIA grants U01AA13499, U24AA13513 to RW).</p>
  
<p><span style='color:black'>(Text version of March 28, 2011 by RW)</span></p>

<!--
<p><span style='color:black'>(Text generated October 8, 2003 by RW; modified Oct. 27, 2003 by RW; completely rewritten by RW on May 22, 2006.)</span></p>
-->




</TR>
</TABLE>
</TD>
</TR>

<TR>
<TD align=center bgColor=#ddddff class="solidBorder">
<!--Start of footer-->
<TABLE width="90%">
<script language='JavaScript' src='/javascript/footer.js'></script>
</TABLE>
<!--End of footer-->
</TD>
</TR>
</TBODY>
</TABLE><!-- /Footer -->

<!-- menu script itself. you should not modify this file -->
<script language="JavaScript" src="/javascript/menu_new.js"></script>
<!-- items structure. menu hierarchy and links are stored there -->
<script language="JavaScript" src="/javascript/menu_items.js"></script>

<!-- files with geometry and styles structures -->
<script language="JavaScript" src="/javascript/menu_tpl.js"></script>
<script language="JavaScript">
<!--//
// Note where menu initialization block is located in HTML document.
// Don't try to position menu locating menu initialization block in
// some table cell or other HTML element. Always put it before </body>
// each menu gets two parameters (see demo files)
// 1. items structure
// 2. geometry structure
new menu (MENU_ITEMS, MENU_POS);
// make sure files containing definitions for these variables are linked to the document
// if you got some javascript error like "MENU_POS is not defined", then you've made syntax
// error in menu_tpl.js file or that file isn't linked properly.

// also take a look at stylesheets loaded in header in order to set styles
//-->

</script>
</BODY>
</HTML>