<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Frequently Asked Questions</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<LINK REL="stylesheet" TYPE="text/css" HREF='css/general.css'>
<LINK REL="stylesheet" TYPE="text/css" HREF='css/menu.css'>
<SCRIPT SRC="javascript/webqtl.js"></SCRIPT>


</HEAD>
<BODY  bottommargin="2" leftmargin="2" rightmargin="2" topmargin="2" text=#000000 bgColor=#ffffff>
<TABLE cellSpacing=5 cellPadding=4 width="100%" border=0>
	<TBODY>
	<TR>
		<script language="JavaScript" src="/javascript/header.js"></script>
	</TR>
	<TR>
		<TD bgColor=#eeeeee class="solidBorder">
		<Table width= "100%" cellSpacing=0 cellPadding=5><TR>
		<!-- Body Start from Here -->
		<TD valign="top" height="200" width="100%" bgcolor="#eeeeee">
		<P class="title">Frequently Asked Questions
		<A HREF="/webqtl/main.py?FormID=editHtml"><img src="images/modify.gif" alt="modify" border=0 valign="middle"></A></P>
<A NAME="freqOfPeakLRS"></A>
		<a name="index"></a>
		<Blockquote class="subtitle">Questions<P>
		<OL>

<LI><A HREF="#Q-1" class="fs14">How do I report an error or program bug? </A>
<BR><BR>
<LI><A HREF="#Q-2" class="fs14">Expression levels are often measured by several probe sets. Which probe set should I use? </A>
<BR><BR>
<LI><A HREF="#Q-3" class="fs14">There are often multiple database versions associated with each tissue or experiment. Which database is best?</A>
<BR><BR>
<LI><A HREF="#Q-4" class="fs14">How should we cite the GeneNetwork and WebQTL, and what are conditions on use of data? </A>
<BR><BR>
<LI><A HREF="#Q-5" class="fs14">How can I compare the correlates from two transcripts that interest me? Let's say I am interested in transcripts that correlate well with both <I>Drd1a</I> and <I>Drd2</I>. </A>
<BR><BR>

<LI><A HREF="#Q-5.1" class="fs14">If I have a list of transcripts that covary with <I>Drd1a</I>, how do I decide if the correlations are truly significant or informative?</A>
<BR><BR>

<LI><A HREF="#Q-6" class="fs14">How much would it cost to add transcriptome data for an organ, tissue, or cell type that is more relevant for my research? </A>
<BR><BR>
<LI><A HREF="#Q-7" class="fs14">How many genes and transcripts are in the expression databases, and what fraction of the genome is being surveyed? </A>
<BR><BR>
<LI><A HREF="#Q-8" class="fs14">The Correlation Results window includes a maximum of 500 traits. How can I generate a complete list of all correlations? </A>
<BR><BR>
<LI><A HREF="#Q-9" class="fs14"><B>Validation</B>: Are there great examples of validated QTLs and correlation results? What is the proof that relations detected using the GeneNetwork and WebQTL are biologically compelling and meaningful? </A>
<BR><BR>
<LI><A HREF="#Q-10" class="fs14"><B>Relevance to protein expression</B>: Are measurements of steady-state mRNA levels relevant? Cells operate principally in the proteome domain, and there are many examples of poor correlations between mRNA and protein levels.</A>
<BR><BR>
<LI><A HREF="#Q-11" class="fs14">What is the  best way to  handle a whole set of interesting traits or transcripts simultaneously? For example, can I study the genetics of all dopamine receptors simultaneously?</A>
<BR><BR>
<LI><A HREF="#Q-12" class="fs14">What web browser do you recommend?</A>
<BR><BR>
<LI><A HREF="#Q-13" class="fs14"><B>Reverse mapping</B>: How can I find a set of transcripts and other traits that are possibly controlled by a transcription factor or other gene variant that I already know about? For example, in the paper by Chesler et al., the region near <I>D6Mit150</I> is nominated as a master controller. What are some of the controlled traits? How do I review them efficiently since they are not all listed in the paper?</A>
<BR><BR>
<LI><A HREF="#Q-14" class="fs14"><B>Finding transcripts that modulate their own expression levels (cis-QTs and cis-QTLs)</B>: How can I find a set of transcripts or proteins that are under tight control by a locus that overlaps their own physical location in the genome, that is, transcripts that have a cis-QTL? This class of transcripts is particularly interesting because polymorphic genes that modulate their own expression may also produce numerous downstream effects.</A>
<BR><BR>
<LI><A HREF="#Q-15" class="fs14">How do you error-check data?</A>
<BR><BR>
<LI><A HREF="#Q-16" class="fs14">Is there a way for me to automatically generate a log file of my use of the GeneNetwork and WebQTL?</A>
<BR><BR>
<LI><A HREF="#Q-17" class="fs14">How can I determine the precise region of the transcript that is targeted by a particular Affymetrix or Agilent probe set?</A>

<BR><BR>
<LI><A HREF="#Q-18" class="fs14">I am having trouble with the <B>Network Graph</B> feature. Problems include time-outs and failures to display the graphs.</A>

<BR><BR>
<LI><A HREF="#Q-19" class="fs14">What expression levels are considered high and reliable; what expression levels are so low as to disregard?</A>


</OL>
<BR></Blockquote>

<Blockquote class="subtitle">Answers<P>

<Blockquote>
<B>Q1: </B> <A NAME="Q-1" class="Normal">How do I report an error or program bug?</A><BR><BR>

<B>A1: </B>Software errors that generate on-screen error messages are automatically logged and reviewed by us, usually on a daily basis. If you note an error on the public site (rather than the less stable beta site) that persists for more than one day or that is causing you real trouble, please send us an email notification immediately and we will do our best to resolve the problem. Email us at: 
<BR>
webqtl@gmail.com, rwilliam@nb.utmem.edu
 
<SMALL>[RWW, September 27, 2005]</SMALL>
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q2:</B> <A NAME="Q-2" class="Normal">Expression levels of some transcripts are measured by two or more probe sets, but their values do not correlate well with each other. For example, two probe sets that target <I>Bcl2l</I> have no correlation with each other, whereas two probe sets for <I>Erbb3</I> show a strong negative correlation (<I>r</I> = -0.74 using the UTHSC Brain mRNA U74Av2 RMA data set). In cases such as these, which probe set should I trust? </A><BR><BR>

<B>A2:</B> Probes vary greatly in hybridization properties and sensitivity to cross-hybridization. They also target different exons and different parts of the 3' untranslated regions (3' UTRs) of transcripts. A very small number (&lt;0.1%) also contain SNPs that can affect hybridization efficiency.</P>

<P>The quickest answer is to use the set of probes with the highest and most consistent expression. Higher intensity signals usually have a higher signal-to-noise ratio. Select the <B>Probe Information</B> page from the <B>Trait Data and Analysis</B> form. It is interesting (and sometimes scary) to compare the mean and standard error of the mean (SEM) of the signal of different probes in the set. Also check the heritability estimate of the entire probe set in the <B>Basic Statistics</B> page. Heritability is often a reasonably good indicator. You can also compare the lists of the top 100 correlated transcripts for the different probe sets and see if one probe set makes more sense given the known biology and function of the gene. 

There are other important features that you may want to examine.
<OL>
<LI>Check the placement of the probes that are part of the probe set. Use the <I>Verify UCSC</I> or <I>Verify ENSEMBL</I> button next to the probe set position in the <B>Trait Data and
Analysis</B> window. The <I>Verify</I> functions will BLAT the concatenated probe sequences (overlap is trimmed away) to the most current mouse genome assembly. If the placement and annotation appear to be wrong, please email us. 

BLAT analysis of <I>Erbb3</I> reveals a relatively complex situation. The two probe sets target different <I>Erbb3</I> expressed sequence tags (ESTs).
<LI>Use the <B>Probe Information</B> link in the <B>Trait Data</B> window to view exon
targets and the original probe sequences and their mean expression.
<LI>Select all the probes and add them to your BXD selections. Use the <B>Cluster Map</B>
to view the probe-specific QTLs. Strong cis QTLs detected only in a group of tightly overlapping
probes may indicate a SNP.
<LI>Each probe can be examined as an individual trait. Check the noise of the
probe using the <B>Basic Statistics</B> window.
<LI>Individual probe sequences can be BLATed to the genome using UCSC's BLAT
function. You can also retrieve the sequence data to compare individual probes by
location and known polymorphisms.
<LI>Also from the selections page, use the <B>Correlation Matrix</B> function to generate a
correlation matrix and perform a principal component analysis (PCA). The PC scores can be used as "consensus
traits." You can eliminate probes that appear to misbehave from your selections
prior to performing the PCA. <SMALL>[EJ Chesler, Oct 2004; minor update by RWW, Jan 2004]</SMALL>
</OL>
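<P>The "consensus trait" idea in the last step above can be sketched in a few lines of Python. This is only an illustration of how first principal component scores are obtained from a set of probes, not the code that WebQTL runs. The function names and the probe values are invented for the example, and the first component is found by simple power iteration on the probe correlation matrix.

```python
import math

def standardize(values):
    """Center one probe's values and scale them to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]

def first_pc_scores(probes, iterations=200):
    """Return (weights, scores) for the first principal component.

    probes: list of per-probe lists, one value per strain.
    Power iteration starts from equal weights, which assumes the
    starting vector is not orthogonal to the leading component.
    """
    z = [standardize(p) for p in probes]
    n_probes, n_strains = len(z), len(z[0])
    # Correlation matrix among the probes.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n_strains - 1)
             for j in range(n_probes)] for i in range(n_probes)]
    w = [1.0] * n_probes
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(n_probes))
             for i in range(n_probes)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # One consensus score per strain: weighted sum of standardized probes.
    scores = [sum(w[i] * z[i][s] for i in range(n_probes))
              for s in range(n_strains)]
    return w, scores

# Invented values for three probes measured in six strains.
probe_a = [7.1, 7.9, 8.4, 7.3, 8.8, 7.6]
probe_b = [9.0, 9.9, 10.3, 9.2, 10.9, 9.5]
probe_c = [6.0, 5.2, 6.1, 5.5, 5.9, 6.3]   # behaves differently from a and b
weights, consensus = first_pc_scores([probe_a, probe_b, probe_c])
```

In this toy case the two probes that track each other receive the largest weights, while the poorly behaved probe contributes little to the consensus score. That is the reason for removing misbehaving probes before the PCA.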
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>
		

<Blockquote>
<B>Q3: </B> <A NAME="Q-3" class="Normal">There are often multiple database versions associated with each tissue or experiment. Which database is best? </A><BR><BR>

<B>A3: </B>GeneNetwork often provides several complementary transformations of data sets, for example PDNN, RMA, and MAS5. The Position-Dependent Nearest Neighbor (PDNN) method of Zhang and colleagues generally gives better results than two more common alternatives--RMA and MAS5 transforms. 

<P>To determine the best data set among alternatives, enter the string "CisLRS=(50 1000 10)" into the <B>ANY</B> search field for the first of the alternatives that interest you. This is one of GN's Advanced Search strings. It finds all transcripts that are associated with a very strong quantitative trait locus (QTL) very close to the location of the gene. The command translates as "find all transcripts with a QTL that has an LRS value above 50 and less than 1000 and that is located within 10 Mb on either side of the gene." GeneNetwork will compute the number of transcripts that are associated with very high LRS or LOD scores. The great majority of these hits are naturally genes that modulate their own expression. This number is an excellent measure of data quality. GN will open a new page with the total number of hits, listed in red font toward the top of the Search Results page. For example, there are several alternative data sets for the cerebellum of the BXD genetic reference panel. If you systematically test each of these you will get the following results:

<OL>
<LI>  n = 130   GE-NIAAA Cerebellum mRNA M430v2 (May05) RMA    
<LI>  n = 117   GE-NIAAA Cerebellum mRNA M430v2 (May05) MAS5  
<LI>  n = 207   GE-NIAAA Cerebellum mRNA M430v2 (May05) PDNN  
<LI>  n = 514   SJUT Cerebellum mRNA M430 (Mar05) RMA  
<LI>  n = 732   SJUT Cerebellum mRNA M430 (Mar05) PDNN  
<LI>  n = 420   SJUT Cerebellum mRNA M430 (Mar05) MAS5  
<LI>  n =  91   SJUT  Cerebellum mRNA M430 (Oct04) MAS5  
<LI>  n = 228   SJUT  Cerebellum mRNA M430 (Oct04) PDNN  
<LI>  n = 130   SJUT Cerebellum mRNA M430 (Oct04)  RMA  
<LI>  n =  85    SJUT Cerebellum mRNA M430 (Oct03) MAS5  
</OL>

In this case, the 5th data set is substantially better than all of the other transforms or data sets (n = 732 transcripts associated with LRS values above 50, equivalent to a LOD score &gt; 10). There is really no way to systematically generate high numbers of these so-called cisQTLs as an artifact. One of the advantages of large transcriptome mapping data sets is that we have internal but entirely objective measures of data quality. The only caveat is that some of the cisQTLs will be generated by hybridization artifacts (SNPs and other sequence variants). However, this is generally an artifact of the array platform and not of the transformation method.  
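<P>The logic of this comparison can be sketched in Python. The sketch below is a rough illustration under stated assumptions, not GeneNetwork's implementation. The record fields (<I>gene_chr</I>, <I>gene_mb</I>, <I>qtl_chr</I>, <I>qtl_mb</I>, <I>lrs</I>) and the three example records are invented. The hit counts are the ones listed above, and the LRS-to-LOD conversion uses the standard relation LOD = LRS / (2 ln 10), or roughly LRS / 4.61.

```python
import math

def lrs_to_lod(lrs):
    """Convert a likelihood ratio statistic to a LOD score."""
    return lrs / (2.0 * math.log(10.0))

def is_cis_hit(record, lrs_min=50.0, lrs_max=1000.0, window_mb=10.0):
    """Mimic the meaning of CisLRS=(50 1000 10) for one transcript."""
    same_chromosome = record["gene_chr"] == record["qtl_chr"]
    distance_mb = abs(record["gene_mb"] - record["qtl_mb"])
    return (same_chromosome
            and distance_mb <= window_mb
            and lrs_min < record["lrs"] < lrs_max)

# Invented records: one cis hit, one trans QTL, one QTL too far away.
example_records = [
    {"gene_chr": "1", "gene_mb": 36.2, "qtl_chr": "1", "qtl_mb": 35.0, "lrs": 62.0},
    {"gene_chr": "2", "gene_mb": 90.0, "qtl_chr": "7", "qtl_mb": 35.0, "lrs": 80.0},
    {"gene_chr": "6", "gene_mb": 80.0, "qtl_chr": "6", "qtl_mb": 120.0, "lrs": 75.0},
]
n_hits = sum(is_cis_hit(r) for r in example_records)

# Hit counts reported above for the cerebellum data sets.
cis_hit_counts = {
    "GE-NIAAA Cerebellum mRNA M430v2 (May05) RMA": 130,
    "GE-NIAAA Cerebellum mRNA M430v2 (May05) MAS5": 117,
    "GE-NIAAA Cerebellum mRNA M430v2 (May05) PDNN": 207,
    "SJUT Cerebellum mRNA M430 (Mar05) RMA": 514,
    "SJUT Cerebellum mRNA M430 (Mar05) PDNN": 732,
    "SJUT Cerebellum mRNA M430 (Mar05) MAS5": 420,
    "SJUT Cerebellum mRNA M430 (Oct04) MAS5": 91,
    "SJUT Cerebellum mRNA M430 (Oct04) PDNN": 228,
    "SJUT Cerebellum mRNA M430 (Oct04) RMA": 130,
    "SJUT Cerebellum mRNA M430 (Oct03) MAS5": 85,
}
best_data_set = max(cis_hit_counts, key=cis_hit_counts.get)
print(best_data_set, cis_hit_counts[best_data_set])
print(round(lrs_to_lod(50.0), 2))
```

The conversion shows why an LRS threshold of 50 corresponds to a LOD score somewhat above 10.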

<P>When available we recommend using databases that have the suffix <B>HWT</B>, for example the database "UTHSC Brain mRNA (Dec03) HWT1PM." The heritability weighted transform (HWT) accentuates meaningful variation in probe signal and takes advantage of the unusually large data sets used by GN. HWT outperforms PDNN for the majority of probe sets as assessed by the strong covariance among probe sets in single data sets and in terms of the yield of QTLs at a fixed false discovery rate. 

<Blockquote>
Manly KF, Wang J, Williams RW (<A HREF="http://genomebiology.com/2005/6/3/R27" class="fs14" target="_blank">2005</A>) Weighting by heritability for detection of quantitative trait loci with microarray estimates of gene expression. Genome Biology 6:R27  <A href="http://genomebiology.com/2005/6/3/R27" class="fs12"><I>Full Text HTML and PDF Version</I></A>. 
</Blockquote>

<P>MAS5 and dChip do not generally perform as well as the other transforms. However, there are a few probe sets for which MAS5's reliance on the mismatch probe actually does improve performance, one instructive example being the transcript of <I>Pam</I> using the selection sequence Mouse -> BXD -> Striatum. WebQTL also provides access to the primary probe signals, and it is possible to generate custom probe set consensus expression estimates by performing a principal component analysis of all or a subset of probes (see the previous question). <SMALL>[RWW, Dec 14, 2004; Sept 25, 2005; April 23, 2006]</SMALL> <BR><BR>
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q4: </B> <A NAME="Q-4" class="Normal">How should we cite the GeneNetwork and WebQTL, and what are the conditions on use of data? </A><BR><BR>

<B>A4: </B>Please have a look at the <A HREF="http://www.genenetwork.org/reference.html" class="normal" target="_blank">References</A> page or at the <A HREF="conditionsofUse.html" class="normal" target="_blank">Usage Conditions</A> page. If you have other questions about a particular data set, please follow the link to the <A HREF="statusandContact.html" class="normal" target="_blank">Contacts</A> page for the individual data sets. <SMALL>[RWW, Dec 14, 2004, Feb 23, 2005]</SMALL>
<BR><BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q5: </B> <A NAME="Q-5" class="Normal">How can I compare the correlates from two transcripts that interest me? Let's say I am interested in transcripts that correlate well with both <I>Drd1a</I> and <I>Drd2</I>. </A><BR><BR>

<B>A5: </B>The two traits need to have been measured using the same genetic reference population, such as the BXD strains, but they may have been measured in different tissues. Put the <I>Drd1a</I> and <I>Drd2</I> transcripts into a single <B>Selections</B> window. Click on their small selection boxes, and then use the <B>Compare Correlates</B> function. <SMALL>[RWW, Dec 23, 2004]</SMALL>
<BR><BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q6: </B> <A NAME="Q-5.1" class="Normal">If I have a list of transcripts that covary with <I>Drd1a</I>, how do I decide if the correlations are truly significant or informative? </A><BR><BR>

<B>A6: </B> In most databases correlations under 0.7 will have relatively high false discovery rates (FDR). However, this statement needs to be moderated if you already have strong prior data that suggest that such a correlation should exist. The Literature Correlation column (far right) tries to formalize the likely biological connection between two genes based on a comparison of PubMed abstracts for the genes.  

<P>One can compute a formal  FDR for the data in a correlation table given the size of the array, but the FDR does not account for the noise structure of the array data. Structured noise, such as batch effects, can seriously inflate correlations.  We recommend that your biological sense of the system you are studying be a major "prior" in evaluating a list of correlations. You can also compute the Gene Ontology stats for the top 100 or 500 transcripts. A "bad" list should not generate an interesting GO structure.
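<P>As a rough illustration of the formal FDR mentioned above, here is a minimal Python sketch. It rests on several assumptions that are not part of GeneNetwork: the p-value of a correlation is approximated with the Fisher z-transform, the FDR is estimated as the expected number of chance correlations divided by the number actually observed at or above the threshold, and the numbers of strains, probe sets, and observed correlations are invented. As noted above, a calculation of this kind ignores structured noise such as batch effects.

```python
import math

def correlation_p_value(r, n_strains):
    """Two-sided p-value for a correlation, Fisher z normal approximation."""
    z = math.atanh(r) * math.sqrt(n_strains - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def estimated_fdr(r_threshold, n_strains, n_probe_sets, n_observed):
    """Expected chance hits at |r| >= threshold divided by observed hits."""
    expected_false = n_probe_sets * correlation_p_value(r_threshold, n_strains)
    return min(1.0, expected_false / max(n_observed, 1))

# Hypothetical case: 30 strains, 45,000 probe sets, and 100 correlates
# observed at or above each threshold.
for threshold in (0.5, 0.6, 0.7):
    print(threshold, estimated_fdr(threshold, 30, 45000, 100))
```

With these invented numbers the estimated FDR falls steeply as the threshold rises from 0.5 to 0.7, which is the general pattern behind the rule of thumb given at the start of this answer.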

<P>Here is an operation that will help you in evaluating the significance of correlations: search the ANY field using the string "mean=(1 5)". This will find probe sets with very low expression. In the BXD Whole Brain INIA PDNN data set, this search string returns 10 probe sets. For example, the correlation table for <I>Abcd2</I> (probe set 1439835_x_at_B) starts at a high value of r = 0.65. Similarly, <I>Myo1f</I> has a top covariate of 0.73 but then shifts down immediately to 0.64. These correlations are not likely to be biologically meaningful, particularly without strong prior data.


 <SMALL>[RWW, May 12, 2006]</SMALL>
<BR><BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q7: </B> <A NAME="Q-6" class="Normal">How much would it cost to add transcriptome data for an organ, tissue, or cell type that is more relevant for my research? </A><BR><BR>

<B>A7: </B>Between $50,000 and $100,000. The minimum sample size is two biological replicates for each member of the genetic reference population (GRP), often one male sample or pool of male samples, and one female sample or pool of female samples. If the GRP contains 40 genomes or strains, then you need to budget for a minimum of 90 arrays (80 for the samples plus 10 for controls, wastage, and reruns). Ideally all of the samples should be processed in one large batch, although batches of 20 or more arrays can usually be normalized to each other fairly well. We would be happy to help generate new data sets at any stage, the earlier the better. <SMALL>[RWW, Dec 23, 2004]</SMALL>
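<P>The array count above is simple arithmetic, sketched here in Python so that it can be adapted to a GRP of a different size. The function name and parameter names are invented for this example. The defaults (two replicates per strain, 10 spare arrays) come from the answer above.

```python
def arrays_needed(n_strains, replicates_per_strain=2, spare_arrays=10):
    """Minimum number of arrays to budget for one tissue."""
    return n_strains * replicates_per_strain + spare_arrays

print(arrays_needed(40))   # 40 strains x 2 replicates + 10 spares
```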
<BR><BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

 
<Blockquote>
<B>Q8: </B> <A NAME="Q-7" class="Normal">How many genes and transcripts are in your databases, and what fraction of the genome is being surveyed? </A><BR><BR>

<B>A8: </B>The U74Av2 data sets (brain and hematopoietic stem cells) contain ~12,400 probe sets that target about 9,000 different UniGene clusters. A UniGene cluster is a group of real and putative mRNAs that appear to be generated from a single gene (unified gene). The M430 data sets contain ~45,000 probe sets that target at least one member from each of ~32,000 nonredundant UniGene clusters out of a total of 46,000 clusters in the most recent UniGene build (#143) of <I>Mus musculus</I>. 

<P>What fraction of the genome and transcriptome does this represent? According to the most recent <A HREF="http://www.ncbi.nih.gov/IEB/Research/Acembly/HelpJan.html#MainGenes" target="_empty" class="fs14">AceView</A> summary (Nov 2004), there are 51,000 main genes (well-supported genes that code for proteins with at least 100 amino acids or that contain conventional introns) in the human genome. There are another 60,000 putative genes, some of which may be pseudogenes. Finally, there are an additional 229,000 so-called cloud genes that have a few associated GenBank sequences (usually less than 6 entries) but do not contain introns and do not code for protein (no open reading frame). These cloud genes are often intercalated in the right orientation near or in main genes. The mouse genome is likely to have nearly the same numbers of these three categories of genes. The majority of main genes are associated with multiple alternative splice variant transcripts, often more than 5. Thus, old COT curve estimates that there are 200,000 or more unique transcript species in a single tissue such as the brain are entirely plausible. 

<P>In summary, the Affymetrix M430 2.0 array is likely to represent 50% to 70% of main genes, an unknown fraction of putative and cloud genes, and a more modest fraction of the entire transcriptome. However, it is likely that the M430 array samples at least one common transcript (or a collection of transcripts with the same 3' UTR) from the great majority of abundant and widely expressed genes that have 50 or more UniGene GenBank entries. Array platforms of this type can therefore be called "whole genome" arrays without too much inaccuracy. They cannot be considered true "whole transcriptome" arrays. 

<P>The Agilent <A HREF="http://www.chem.agilent.com/scripts/generic.asp?lPage=10408&indcol=Y&prodcol=Y" target="_empty" class="fs14">G4121A</A> toxarray consists of 20,868 60-mer probes and is likely to represent 40% of so-called main genes listed in <A HREF="http://www.ncbi.nih.gov/IEB/Research/Acembly/HelpJan.html#MainGenes" target="_empty" class="fs14">AceView</A>. 


<SMALL>[RWW, Jan 1,2, 2005]</SMALL>
<BR><BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q9: </B> <A NAME="Q-8" class="Normal">The Correlation Results window includes a maximum of 500 traits. How can I generate a more comprehensive list of all correlations? </A><BR><BR>

<B>A9: </B>Select the <B>SEARCH</B> menu item labeled <B>Simple Query Interface</B>. Select the appropriate menu items, enter the trait identifier (a specific ID), and choose an output order and format. The output can be saved as a tab-delimited text file and imported into spreadsheet and statistics programs. 

<SMALL>[RWW, Jan 2, 2005]</SMALL>
 <BR><BR>
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q10: </B> <A NAME="Q-9" class="Normal">Are there strong examples of validated QTLs and correlation results? What is the proof that relations detected using the GeneNetwork and WebQTL are biologically compelling and meaningful? </A><BR><BR>

<B>A10: </B>Yes, there are already several examples, and we expect the number of validated results to increase rapidly along with the depth and quality of data sets. Here are examples:
<OL>
<LI>Pumilio 2 is a mouse homolog of the Drosophila RNA-binding gene <A HREF="http://www.sdbonline.org/fly/gene/pumilio.htm" target="_new" class="fs14"><I>pum</I></A>. The <I>PUM</I> protein in Drosophila binds to a 3' UTR Puf domain in a number of mRNAs and strongly inhibits their translation (a translational repressor). While the mouse pumilio homologs of this well conserved eukaryotic gene have been known for several years, there were no known mRNAs that were <I>PUM2</I> targets. Using WebQTL, Scott and colleagues (2004) noted strong positive correlations between <I>Pum2</I> and <I>Rbbp6/P2P-R</I> message levels in three transcriptome data sets (forebrain, cerebellum, and hematopoietic stem cells). <I>P2P-R</I> is an important nuclear gene (also known as retinoblastoma binding protein 6) that is involved in p53-mediated transcriptional control. The robust covariance between <I>Pum2</I> and <I>P2P-R</I> suggested that <I>P2P-R</I> was a target of <I>Pum2</I> repression. Three nearly perfect Puf domains were subsequently found in the 3' UTR of <I>P2P-R</I>, providing additional bioinformatic support. Subsequent pull-down experiments carried out by E. White-Grindley and E. Ruley provided the final confirmation that <I>PUM2</I> protein binds to P2P-R mRNA. <SMALL>[RWW, Jan 8, 2005]</SMALL><BR>

<Blockquote>
<SMALL>
Scott RW, White-Grindley E, Ruley HE, Chesler EJ, Williams RW (<A HREF="http://www3.interscience.wiley.com/cgi-bin/fulltext/109860478/HTMLSTART" target="_blank" class="fs14">2004</A>) P2P-R expression is genetically coregulated with components of the translation machinery and with PUM2, a translational repressor that associates with the P2P-R mRNA. Journal of Cellular Physiology, in press. <A href="http://www3.interscience.wiley.com/cgi-bin/fulltext/109860478/HTMLSTART"  target="_blank" class="fs12"><I> Full text HTML version</I></A>
</SMALL>
</Blockquote>

<LI> Retinoblastoma binding protein 7 (<A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12376095l" target="_blank" class="fs14"><I>Rbbp7</I></A>, Mis16 or p55, probe set 1415775*) is part of the core histone deacetylase complex that modulates nucleosome structure via effects on histone transport and acetylation, and DNA methylation. Together with <I>RBBP4</I> (1434892*) and several other proteins such as <I>BRCA1</I> (1424629*), <I>MTA1</I> (1417295*), <I>MBD3</I> (1417728*), <I>HDAC1</I> (1448246*), and <I>HDAC2</I> (1445684*), <I>RBBP7</I> protein helps suppress levels of transcription, enhances apoptosis, and inhibits cell growth and transformation (Cheng et al., <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11394910" target="_blank" class="fs14">2001</A>). The gene maps to Chr X at approximately 153 Mb. Its expression is comparatively high in brain and kidney (Yang et al., <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12376095" target="_blank" class="fs14">2002</A>). We have shown that variation in <I>Rbbp7</I> expression in the striatum of BXD strains is substantial. Expression is high in C57BL/6J and comparatively low in DBA/2J (1415775* in the <A HREF="http://web2qtl.utmem.edu:81/dbdoc/Striatum_M430_V2_MAS5_November04.html" target="_blank" class="fs14">HBP/Rosen Striatum M430v2 11/04 PDNN data set</A>). This variation is caused by a strong QTL that peaks very near to the <I>Ahr</I> marker (LRS of 21.3; also see the adjacent marker <I>D12Mit153</I>) on proximal Chr 12. <I>Ahr</I> is not a typical marker; it is actually the aryl hydrocarbon response gene  (1449045* and BXD Published Phenotype ID 10371). 
The <I>AHR</I> protein is an important transcription factor that complexes with the <I>ARNT</I> nuclear translocator (Affymetrix probe set 1437042*) and binds to xenobiotic response elements and AhR elements in promoters to influence gene expression. There is a critical leucine-to-proline substitution in the <I>Ahr</I> gene that results in a 15 to 20-fold reduction in the binding affinity of the proline variant found in DBA/2J compared to the <I>b-1</I> leucine variant found in C57BL/6J (Chang et al., <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8148872" target="_blank" class="fs14">1993</A>). Note that Published Phenotype ID 10371 demonstrates an 80-fold variation in Ahr induction by anthracene that unequivocally maps to the <I>Ahr</I> gene locus on Chr 12 (despite the title of the <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=6547399&dopt=Abstract" target="_blank" class="fs14">1984</A> paper by Legraverend and colleagues). For this reason <I>Ahr</I> is an unusually compelling candidate gene. If variation in the binding affinity of <I>AHR</I> isoforms causes the expression difference, then we naturally expect an AhR element in the promoter of <I>Rbbp7</I>. The 5' UTR and proximal promoter of <I>Rbbp7</I> has the following sequence:<BR><BR>  
<DIR>
<SMALL>
ACACC GCGCT CGCAT CCGCC CCACC CCCGC GCGGG CCCAG CCGCC CCCGC GGCCA GCCTG GGGAG TGACG CCTCG CGCCT GCGCC TCGCC GACTT CCTGC 
CGCGG AACGC CCCAC CCACT CTCGA GAAGC CCACC CCCGG AGAGC GCGTC AGACC CTCCC GTCGC ACGCT ATTGG TCCAA GCCGC CGAGC CGTTG GCTCC 
CAGGC CCGCC TCTTC TCCGC CTCTC CAATT TCCCA GGGCG GCTGC GCCTG CGCTC AGCTG CCTGG GCGGG CTGAG AGGCG CGGGT TGAAA AGTCT CGTTC 
CAAGT TTGGC GAGAG GGAGA GAGAG GAGAG CGGCT CAGAC CTCGC TACCC GCCAG CGGGG AGGAG GCAG AAGAG GAGAT CGCGG CGTCT GGGGG GAGAA 
CCCAG ACGGC CAGAC CGAAC TCAGG CTTTT CCGAG C<B>GAGG ACTGC GTGAC GTGCC</B>
TGGGA GAGGC AAGGA GCGCC TGCCG GGCTG CTCTT GACTA GCGAG 
AGAGA AGTCC GAGGC GGCCA AGGGG GGCGA AACGA CCCGA CGCAA G<I>ATGG CGAGT AAAGA GAGTA AGGAT GCCTG CCCTG TGGGG CGGGC GGGCG TGCGG </I>
</SMALL>
</DIR>
<BR>The ATG translation initiation codon and exon 1 are highlighted using italic font, and the position of the AhR consensus binding site is highlighted using bold font. (Note: <I>Rbbp7</I> does not have a TATA box.) All of the conditions are met for <I>Ahr</I> to be the polymorphic gene that modulates <I>Rbbp7</I> expression among BXD strains. Do sequence differences in <I>Ahr</I> produce an effect on the steady state expression level of its own mRNA? In other words, is this gene also a cis QTL?  The answer is a qualified no. In the striatum, <I>Ahr</I> (1422631*) is a good example of a polymorphic gene that does not act primarily via changes in its own mRNA level but acts via "classical" differences in protein sequence and conformation. However, this is not true in the liver, in which there is unequivocal evidence of cis modulation of <I>Ahr</I> (Agilent probe P449133). Thus <I>Ahr</I> is likely to have downstream effects due to two distinct mechanisms--one acting via differences in <I>Ahr</I> gene expression, the other acting via changes in <I>AHR</I> protein binding affinity.

<BR><BR>Another gene regulated by a QTL that coincides with <I>Ahr</I> that also has an AhR response element is <I>Exoc2</I> (1426630*).

<!-- and perhaps AI661017 (1445448*), looks like <I>Cenpb</I> is controlled by Ahr in Striatum PDNN dataset and it does have an Ahr-Arnt site in the middle of this sequence ccacctcccgg <B>gttcacgccactctc</B> tgcctcagcct-->

<BR><BR>
<SMALL>*Affymetrix probe set identifiers are listed without the "underscore_at" type suffixes. Enter an asterisk when searching for the probe sets in WebQTL (e.g., 1415775*). When multiple probe sets are available, I have selected the best overall performer using criteria listed in Q&A 8. To enter all of these probe sets, just copy and paste this string into the "Any term" field: 1415775* 1434892* 1424629* 1417295* 1417728* 1448246* 1445684* 1422631* 1437042* . </SMALL>

<!-- Other related genes include 1428068* 1423967* 1427661* 1427171* 1451695* 1418646* 1427032* 1460741* 1436161* 1435986* 1448892*
-->

<SMALL>[Example 2 is based on preliminary work by RW Williams, GD Rosen, and colleagues (2005).  RWW, Jan 8,9, 2005]</SMALL><BR>


<!-- example 3 to go here -->


</OL>

<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q11: </B> <A NAME="Q-10" class="Normal">Are measurements of steady-state mRNA levels relevant? Cells operate principally in the proteome domain, and there are many examples of poor correlations between mRNA and protein levels.
</A><BR><BR>

<B>A11: </B>It is true that there are many examples of poor correlations between mRNA and protein levels, but this fact does not negate the strong global tendency of mRNA expression and protein expression to be correlated positively. It is important to recognize the strong coupling between message and protein levels. Technical errors in estimating mRNA and protein levels will inevitably degrade positive correlations. A powerful test of the mRNA-protein relation is the ability to predict cell phenotype from mRNA data. An excellent example is work by Markram and colleagues (Toledo-Rodriguez et al., <A HREF="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15192011" target="_blank" class="fs14">2004</A>) in which major electrophysiologically defined classes of neocortical neurons were accurately classified using expression data for merely 29 mRNAs (3 calcium-binding and 26 ion channel genes).

<P>Another interesting and related question to consider: How are positive and negative correlations between transcripts achieved at a mechanistic level? We always have to keep one mental eye on the idea of differences among individuals and strains. It is easy to get tied up in a mechanistic explanation and to neglect the actual source of the phenotypic variation among individuals that we are trying to explain. There are probably many answers to this question:
<OL>
<LI>Common transcription factors and cofactors (proteins <B>x</B>, <B>y</B>, and <B>z</B>) modulate the expression of a pair of transcripts A' and B'. The levels of <B>x</B>, <B>y</B>, and <B>z</B> differ among cases and strains and this variation generates well-coupled differences in expression of genes A and B that we pick up as positive or negative correlations in the array data sets between A' and B'. What is interesting about this idea is that the effectors <B>x</B>, <B>y</B>, and <B>z</B> may have a difference in protein expression or protein sequence among the cases or strains (protein variation --> mRNA variation). Genes A and B that vary in expression at the transcript level (A' and B') will not necessarily vary in expression at the protein level (<B>a</B> and <B>b</B>). A secondary homeostatic mechanism may neutralize differences (protein variation --> mRNA variation -- no protein variation).  While it is most likely that the <B>x</B>, <B>y</B>, and <B>z</B> protein effectors vary among cases and strains, this is not essential. Alternatively there may be segregating sequence variants in the promoters of BOTH genes A and B that generate coupled variation in A' and B' mRNA (no protein variation --> DNA target variation --> mRNA variation...). This final model would require both A' and B' transcripts to have so-called cis-QTLs. In other words, the variation in A' and B' mRNA is associated with local cis-sequence variants in their genes of origin, A and B.

<LI>The pair of transcripts A' and B' that covary in expression at the transcript level also covary in expression at the protein level, <B>a</B> and <B>b</B>. This mRNA and protein covariance is NOT due to the action of common transcription factors on genes A and B. Instead, the correlation is driven by networks of interactions in the protein domain that ultimately link different transcriptional control circuits: circuits <B>x</B>, <B>y</B>, and <B>z</B> for gene A and transcription control circuits <B>p</B>, <B>q</B>, and <B>r</B> for gene B. The two sets of transcriptional control circuits <B>xyz</B> and <B>pqr</B> are themselves partially coupled. In this model, I have stated that A and B covary at both mRNA and protein levels. This is not necessary. The variation and correlation could in principle be isolated to the mRNA domain. If we entertain this idea, then we are saying that the variation in mRNA level is effectively a read-out of differences in the amount or sequence of proteins that modulate mRNA expression (protein variation --> mRNA --> no protein variation). If we concede that the mRNA variation does not lead to protein variation, we still need a cause for the original mRNA variation, and that will usually be upstream strain variation in protein level or sequence. Variation in mRNA is essentially providing us with an assay of variation in the upstream transcriptional protein circuits. In some cases, it may also be due to local cis-acting promoter variants in both A and B, but this is likely to be uncommon and should be detected as pairs of reciprocal QTLs.

<LI>Technical confounds can introduce correlations in the expression of A' and B'.  Imagine if data for the first 20 cases or strains were all acquired in the winter months and data for the second set of 20 cases or strains were all acquired in the summer months. If there were major differences in the technical personnel handling arrays, or in the particular batch of arrays or reagents, one might easily introduce large differences in apparent expression. Technical factors or batch effects of this type can introduce large group differences that will tend to inflate the absolute values of correlations among many traits. The variation within the several batches may lead to relatively well distributed scatter plots. Batch effects are a major problem in large array experiments of the type incorporated into GeneNetwork. If you review the INFO pages for any of the data sets you will see detailed descriptions of how cases were processed to minimize the potential batch effect confound. More recent data sets have better and larger designs that are better protected from batch effects. Technical and biological replicates can be used to detect and control for batch effects. Interleaving samples across multiple batches is also important in minimizing batch effect confounds.
</OL>

<SMALL>[RWW, Jan 9, 2005; Sept 27, 2005]</SMALL>
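<P>The batch-effect confound described in point 3 above can be illustrated with a minimal simulation. This is only a sketch: the batch labels, the size of the batch shift, and the trait values are all hypothetical, and centering each batch on its own mean is just one simple adjustment, not the procedure used for GeneNetwork data sets. Two traits with independent noise become correlated once they share a batch shift.
<PRE>
```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def center_within_batch(values, batches):
    """Subtract each batch's mean from the values measured in that batch."""
    batch_means = {b: mean(v for v, vb in zip(values, batches) if vb == b)
                   for b in set(batches)}
    return [v - batch_means[b] for v, b in zip(values, batches)]

rng = random.Random(0)
batches = ["winter"] * 20 + ["summer"] * 20
offset = {"winter": 0.0, "summer": 2.0}   # hypothetical batch shift

# Two traits with independent noise plus the same batch shift.
trait_a = [rng.gauss(10, 0.5) + offset[b] for b in batches]
trait_b = [rng.gauss(8, 0.5) + offset[b] for b in batches]

r_raw = pearson(trait_a, trait_b)
r_adj = pearson(center_within_batch(trait_a, batches),
                center_within_batch(trait_b, batches))
print(f"raw r = {r_raw:.2f}, batch-centered r = {r_adj:.2f}")
```
</PRE>
Interleaving samples across batches, as recommended above, attacks the same problem at the design stage rather than after the fact.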
<BR><BR>
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q12: </B> <A NAME="Q-11" class="Normal">What is the best way to analyze a group of interesting traits or transcripts simultaneously? For example, can I study all dopamine receptors simultaneously?
</A><BR><BR>

<B>A12: </B>Yes, there are several tools for this type of multi-trait analysis, including (i) the <B>Correlation Matrix</B> tool that will perform a Principal Component Analysis (PCA) of a group of traits and (ii) the <B>Cluster Map</B> tool that allows you to visually detect common QTLs for sets of traits. Here are the instructions:
<DIR>
<OL>
<LI>Select the traits that interest you from any of the Genetic Reference Populations. You can select traits and transcripts from multiple databases. You can select traits from the Published Phenotypes databases, Genotype databases, and any of the array databases. All of these traits need to be moved to the <B>Selections</B> window by clicking on the <B>Add Selection</B> button. Of course, all of the traits in a single <B>Selections</B> window must come from a single genetic reference population. The reason is simple: to compute a correlation coefficient the different measurements have to originate from common cases or strains.
<LI>Once you have added traits to the <B>Selections</B> window, you now need to select the subset of traits that you would like to analyze together. If you plan to run a PCA using the <B>Correlation Matrix</B> function then keep the number of traits that you select under about 20 or 30 and/or drop any traits that have only been studied in a small number of strains. Click the check boxes to the left of each trait or click the <B>Select All</B> button.
<LI>Now click the <B>Correlation Matrix</B> button.
<LI>Review the matrix of correlation coefficients. You may want to drop traits if they do not appear to covary (positively or negatively) with any other traits. To drop a trait you must return to the <B>Selections</B> window and deselect the checkbox and click the <B>Correlation Matrix</B> button again.
<LI>Scroll down the <B>Correlation Matrix</B> window. You will (usually) find a heading that is labeled <B>PCA Traits</B> with one or more listed components. The components will have labels such as <B>PC01</B>, <B>PC02</B>, <B>PC03</B> etc.  The components are "synthetic" traits that share significant variance with members of your selection. We only list those components that can explain 10% or more of the variance that is common to your group of traits. If you click on one of the <B>PCA Traits</B> a new window will open that contains the synthetic trait values (component scores) for all strains that have complete data. (The positive and negative values of these component scores may be "flipped" relative to what you might have expected.) You can add the PC01 trait back into your <B>Selections</B> window if you want to see which of your traits covary best with each of the principal components. This allows you to view the effective "loading" of the original traits on the PCA factors.
<LI><B>Cluster Maps</B> are a particularly effective and intuitive way to look for shared covariance within a group of traits. Just click on the <B>Cluster Map</B> button in the <B>Selections</B> window and then read the explanatory text at the top of the page. <SMALL>[RWW, Jan 2, 2005, Sept 27, 2005]</SMALL>

</OL>
</DIR>
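<P>For readers who want to see what a first principal component of a correlation matrix looks like numerically, here is a minimal sketch. It is not the GeneNetwork implementation: the trait values are hypothetical, the function names are made up for illustration, and power iteration is used only because it needs nothing beyond the Python standard library. The 10% rule is applied to the share of total variance explained by the component.
<PRE>
```python
from statistics import mean, pstdev

def standardize(values):
    """Convert one trait's strain values to z-scores."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def correlation_matrix(z_traits):
    """Pearson correlations among standardized traits."""
    n = len(z_traits[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z_traits]
            for zi in z_traits]

def first_component(corr, iterations=200):
    """Leading eigenvector and eigenvalue of corr by power iteration."""
    k = len(corr)
    vec = [float(i + 1) for i in range(k)]  # arbitrary non-zero start
    for _ in range(iterations):
        vec = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in vec) ** 0.5
        vec = [x / norm for x in vec]
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j]
                     for i in range(k) for j in range(k))
    return vec, eigenvalue

# Hypothetical values for four traits measured in the same six strains.
traits = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 8.1, 9.8, 12.0],
    [5.8, 5.1, 4.2, 2.9, 2.2, 0.9],
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
]
z_traits = [standardize(t) for t in traits]
corr = correlation_matrix(z_traits)
weights, eigenvalue = first_component(corr)

# Eigenvalues of a correlation matrix sum to the number of traits.
explained = eigenvalue / len(traits)
keep_pc01 = explained >= 0.10   # the 10% reporting rule

# Component score per strain: weighted sum of that strain's z-scores.
pc01_scores = [sum(w * z[s] for w, z in zip(weights, z_traits))
               for s in range(len(traits[0]))]
print(f"PC01 explains {explained:.0%} of the variance; keep = {keep_pc01}")
```
</PRE>
The sign of the weights, and therefore of the component scores, is arbitrary. This is why scores can appear "flipped" relative to what you expect.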
 
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q13: </B> <A NAME="Q-12" class="Normal">What web browser do you recommend?
</A><BR><BR>

<B>A13: </B>Most browsers will work without any significant differences in functionality. However, the aesthetics of the text and graphs vary significantly among current-generation browsers. Safari 1.2.4 and Firefox 1.0.4 both look fine on Mac OS X (we use this browser for most in-house testing of Python). Please let us know if you encounter any differences in function among browsers or serious aesthetic issues that detract from your use of the GeneNetwork.
<SMALL>[RWW, Feb 19, 2005; May 14, 2005]</SMALL>
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q14: </B> <A NAME="Q-13" class="Normal"><B>Reverse Mapping</B>: How can I find a set of transcripts and other traits that are possibly controlled by a transcription factor or other gene variant that I already know about? For example, in the paper by Chesler et al. (2005), the region near <I>D6Mit150</I> was defined as a master control locus. What are some of the controlled traits? How do I review them efficiently, since they are not all listed in the paper?</A><BR><BR>

<B>A14: </B>Select the <B>BXD Genotype Database</B>. Search for and select <I>D6Mit150</I>. Generate the <B>Correlation Results</B> table for <I>D6Mit150</I> against any other BXD database. For example, the correlation of <I>D6Mit150</I> against the RMA database (UTHSC Brain mRNA U74Av2 (Mar04) RMA Orig) that was used in Chesler et al. generates a list of 100 transcripts. All 100 covary with this marker with Pearson product moment correlations that have absolute values between 0.56 and 0.72 (76 are positive correlations, 24 are negative correlations). Select all 100 and add them to your BXD "Selections" window (do not select more than 100). Select all 100 again and compute a <B>Cluster Map</B> for the whole set of traits. This map highlights calcium/calmodulin dependent kinase 1 (<I>Camk1</I>) and the GABA transporter (<I>Gabt</I> or <I>Slc6a1</I>) as two high-priority candidates for the Chr 6 QTL (both are logical candidates and both are apparent cis-QTLs). This cluster map also highlights more than 90 downstream candidates of the Chr 6 locus, including <I>Pax3, Bmp10, Dlx4, Myh7, Prph, Gata6, Hoxb6, Ifna5, Msx3, Caml, Reln, Dct, </I>and <I>Rgs9</I>.
<SMALL>[RWW, March 27, 2005]</SMALL>
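<P>The ranking behind a <B>Correlation Results</B> table can be sketched as follows. This is an illustration under stated assumptions rather than GeneNetwork code: the genotype coding (-1 and 1 for the two parental alleles), the transcript names, and all expression values are hypothetical. The idea is simply to correlate the marker genotypes with every trait and keep the traits with the largest absolute Pearson correlations.
<PRE>
```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def top_correlates(marker, traits, limit=100):
    """Rank traits by absolute correlation with the marker, strongest first."""
    scored = [(name, pearson(marker, values)) for name, values in traits.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:limit]

# Hypothetical genotypes at one marker in eight strains (-1 and 1 = parental alleles).
marker = [-1, -1, 1, 1, -1, 1, 1, -1]

# Hypothetical strain means for three transcripts in the same eight strains.
traits = {
    "transcript_1": [7.9, 8.1, 9.0, 9.2, 8.0, 9.1, 8.9, 8.2],     # tracks the marker
    "transcript_2": [10.2, 10.0, 9.1, 9.0, 10.1, 9.2, 9.1, 9.9],  # inverse pattern
    "transcript_3": [8.5, 9.1, 8.6, 9.0, 8.9, 8.4, 9.2, 8.7],     # unrelated
}

ranked = top_correlates(marker, traits, limit=2)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```
</PRE>
Sorting on the absolute value keeps both positive and negative covariates in the list, which matches the mix of positive and negative correlations reported for <I>D6Mit150</I>.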
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q15: </B> <A NAME="Q-14" class="Normal"><B>Finding transcripts that modulate their own expression levels (cis-QTs and cis-QTLs)</B>: How can I find a set of transcripts or proteins that are under tight control by a locus that overlaps their own physical location in the genome, that is, transcripts or proteins that have a cis-QTL?  This class of transcripts is particularly interesting because polymorphic genes that modulate their own expression may also produce numerous downstream effects.</A><BR><BR>

<B>A15: </B>Select the <B>Genotype Database</B> that corresponds to your species and tissue of interest. Select the marker that is most closely linked to the gene or transcript in which you are interested. Review the "Trait Data" window of the genotype that you have selected. Then compute the top 100 covariates of this genotype in any of the phenotype databases. Select the top 100 covariates of your marker and then run the Cluster Map. This may take a while if you selected 100 traits.  Review the cluster map. It will highlight a subset of transcripts that are linked by high correlation to your marker and that are marked with a yellow triangle.
<SMALL>[RWW, April 7, 2005]</SMALL>
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q16: </B> <A NAME="Q-15" class="Normal">How do you error-check the data that you put into the GeneNetwork?</A><BR><BR>


<B>A16: </B>Once an array data set has passed standard quality control steps (good RNA quality, good array hybridization signal), we still need to verify that data are assigned to the correct strain and sex. 

<P>Checking the "sex" of an array data set is done using probe sets that are sexually dimorphic in expression level. The transcripts <I>Xist</I> and <I>Ddx3y</I>, for example, have sexually dimorphic expression on the U74Av2 array using some transforms. The <I>Xist</I> probe set, 99126_at, can be used as a surrogate "factor" for sex in most U74Av2 data sets. Note that this probe set has high expression in 'all-female' strains (e.g., BXD6, 13, 25, and 28 in the Brain data sets). <I>Ddx3y</I>, or probe set 103842_at, tends to have high expression in male samples, although some transforms perform poorly with this particular probe set.

<P>Checking the "strain" of a data set is done using probe sets that are known to have nearly perfect Mendelian segregation patterns among BXD strains. Many probe sets (and single probes) can be used for this purpose. For the M430 Affymetrix arrays these include the following example probe sets:
<OL>
<LI>1452705_at_A [KIAA0251 on Chr 16 @ 12.570143 Mb]: pyridoxal dependent group II decarboxylase family member; deep 3' UTR, antisense probes in Ntan1 (test Mendelian 1)
<LI>1418908_at_A [Pam on Chr 1 @ 97.712988 Mb]: peptidylglycine alpha-amidating monooxygenase; whole 3' UTR (test Mendelian 2)
<LI>1450712_at_A [Kcnj9 on Chr 1 @ 172.39301 Mb]: potassium inwardly-rectifying channel, subfamily J, member 9; distal 3' UTR (test Mendelian 3)
<LI>1429509_at_B [FLJ30656 on Chr 11 @ 101.983718 Mb]: RIKEN cDNA 1110032E16; deep 3' UTR (test Mendelian 4)
<LI>1444806_at_B [6720456B07Rik on Chr 6 @ 114.179842 Mb]: 6720456B07Rik; intron or 3' UTR (test Mendelian 5)
<LI>1427011_a_at_A [Lancl1 on Chr 1 @ 67.399339 Mb]: LanC (bacterial lantibiotic synthetase component C)-like; last exons and proximal 3' UTR (test Mendelian 6)
</OL>

Strain means for these probe sets should in general be either high or low. When data for different arrays purportedly from the same strain fall into both high and low groups, this suggests that there has been an error of strain assignment at some stage of the process. In some cases, it is possible to fix these errors after the fact and to correctly reassign an array to a particular strain. 

<SMALL>[RWW, May 8, 2005]</SMALL>
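<P>The strain check described above can be sketched in a few lines. This is an illustration only: the strain labels and expression values are hypothetical, and the midpoint cutoff between the lowest and highest value is an assumption made for the example, not the rule used in GeneNetwork quality control. For one Mendelian probe set, each array is called high or low, and any strain whose arrays land in both groups is flagged for review.
<PRE>
```python
from collections import defaultdict

def flag_mixed_strains(arrays, cutoff):
    """Return strains whose arrays fall on both sides of the cutoff."""
    calls = defaultdict(set)
    for strain, value in arrays:
        calls[strain].add("high" if value >= cutoff else "low")
    return sorted(s for s, groups in calls.items() if len(groups) == 2)

# Hypothetical (strain, expression) pairs for one Mendelian probe set.
arrays = [
    ("BXD1", 11.2), ("BXD1", 11.0),   # consistently high
    ("BXD2", 8.1), ("BXD2", 8.3),     # consistently low
    ("BXD5", 11.1), ("BXD5", 8.2),    # one array in each group
]

values = [v for _, v in arrays]
cutoff = (min(values) + max(values)) / 2   # assumed midpoint cutoff
suspects = flag_mixed_strains(arrays, cutoff)
print("Strains to review:", suspects)
```
</PRE>
Repeating the check across several Mendelian probe sets on different chromosomes makes a chance agreement less likely, which is presumably why six test probe sets are listed above.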
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q17: </B> <A NAME="Q-16" class="Normal">Is there a way for me to automatically generate a log file of my use of the GeneNetwork?</A><BR><BR>

<B>A17: </B>No. The GeneNetwork does not track your activity and has no memory of your sequence of requests. However, there is a simple expedient that makes it possible for you to produce a history of your own activity. Open a slide presentation program such as PowerPoint or Keynote and incorporate screen shots from GeneNetwork as slides. Annotate as you progress. Even modest annotation will allow you to return to precisely the same point or graph. Note that there are functions in the GeneNetwork that allow you to export and save lists of traits or markers. For example, you can export the top 500 traits in a <B>Compare Correlates</B> window by clicking on the "download" link toward the top of the page. The contents of any <B>Selections</B> window can also be saved in a format that can be reloaded into the GeneNetwork. Scroll to the bottom of the <B>Selections</B> window to find the <I>Save</I> and <I>Load</I> buttons.

<SMALL>[RWW, May 15, 2005]</SMALL>
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q18: </B> <A NAME="Q-17" class="Normal">How can I determine the precise region of the transcript that is targeted by Affymetrix or Agilent probes?</A><BR><BR>

<B>A18: </B>The easiest way is to align the sequences of the probes with the most up-to-date version of the genome sequence. GeneNetwork does most of the work for you. Notice that most <B>Trait Data and Analysis Forms</B> have one or more <B>Verify</B> buttons (e.g., <B>UCSC by Probes</B>). When you click these verify buttons, the sequences of the probes are assembled into a single query sequence (overlapping sequence is trimmed away). The query string representing the four nucleotides is sent to the <A HREF="http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BLATAlign" target="_blank" class="fs14">BLAT</A> search program at UCSC. A BLAT window will load in a few seconds. There will typically be several rows of results, but the top row with the highest score is the one that will be of most relevance. Scores should be over 45, representing roughly a 45 nucleotide match. Review the whole row of data and note the target chromosome, the strand of DNA that matches the probe sequences, and the start and end base pairs of the probe sequence. Click on the <B>browser</B> link. The window will refresh with a graphic display of the probe sequence labeled <B>YourSeq</B> at the top. The black bars represent the probe sequences on the array (they are often interrupted by thin lines with arrow heads) aligned to the genome. <B>YourSeq</B> will either run from left to right on the plus strand of DNA or from right to left on the minus strand. Click on the <B>Zoom Out 10x</B> button in the upper right of the Genome Browser window. This will give you a better overview of the location of the probes on the target sequence. Look at the <B>Known Genes</B> track and see what part of the gene is targeted. Most probes are complementary to parts of the last few exons or the 3' untranslated region. If you still do not see any nearby genes, then zoom out again until you see the genome context of your probe sequence.
 
<SMALL>[RWW, July 15, 2005]</SMALL>
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>


<Blockquote>
<B>Q19: </B> <A NAME="Q-18" class="Normal">I am having trouble with the <B>Network Graph</B> feature. Problems include time-outs and failures to display the graphs.</A><BR><BR>

<B>A19: </B>Dr. Bob Clark has found a couple of fixes that may help if you 
have persistent time-out errors due to the calculations taking too long.
<OL>
<LI>  Change the Correlation Threshold minutely. For instance, I frequently get 
time-out errors when I use 0.9 as a correlation threshold. This is sometimes 
resolved when I use 0.8999 as a new correlation threshold.
<LI> Change the order of the traits in your selection menu. Sort your traits 
using different parameters in the Selection screen. This worked great today 
after trying multiple things to prevent a time-out.
</OL>
 
<SMALL>[RWW, Sept 26, 2005]</SMALL>
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<Blockquote>
<B>Q20: </B> <A NAME="Q-19" class="Normal">What expression levels are considered high and reliable? What expression levels are so low as to disregard?</A><BR><BR>

<B>A20: </B>Affymetrix expression data with mean values of less than 7.5 will tend to be noisy. This signal level corresponds to an mRNA concentration of less than 1.0 pM. Most probe sets with values of less than 7 would be "declared" as absent using the Affymetrix MAS 5 routine. Values greater than 10 can be referred to as "moderately high" and will usually be associated with probe sets that properly target the 3' UTR and last exons of transcripts present in the sample at concentrations of greater than 4 pM. Affymetrix data sets include a set of 64 probe sets that have IDs that start with "AFFX".  You can search for these control probe sets and gain some understanding of expression levels associated with exogenous labeled standards: from 1.5 pM through to 100 pM. For example:

<OL>
<LI>&nbsp;&nbsp;&nbsp;1.5 pM = AFFX-BioB-3_at
<LI>&nbsp;&nbsp;&nbsp;5.0 pM = AFFX-r2-Ec-bioC-3_at
<LI>&nbsp;&nbsp;25.0 pM = AFFX-BioDn-5_at
<LI>100.0 pM = AFFX-r2-P1-cre-3_at
</OL>
<BR>


<SMALL>[RWW, November 23, 2005]</SMALL>
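<P>The rules of thumb in this answer can be written as a small lookup. This is only a sketch: the function name is hypothetical, the "intermediate" label for values from 7.5 to 10 is an assumption added for the example, and the cutoffs apply to the mean values as they are reported in the Affymetrix data sets discussed above.
<PRE>
```python
def describe_expression(mean_value):
    """Map a mean Affymetrix expression value to a rough reliability label."""
    if mean_value > 10:
        return "moderately high"
    if mean_value >= 7.5:
        return "intermediate"   # assumed label; the answer does not name this range
    if mean_value >= 7:
        return "noisy"
    return "noisy, likely absent by MAS 5"

for value in (6.5, 7.2, 8.5, 10.5):
    print(value, describe_expression(value))
```
</PRE>
The AFFX control probe sets listed above are a useful sanity check for any cutoff of this kind, because their concentrations are known.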
<BR>  
		<BR>  
		<a href="#index" class= "fs12">Back to Index</a>
		<BR>
		<HR width="#30%" align="left">
		<BR>
		</Blockquote>

<DIR>
<SMALL>
Last edit Jan 18, 2005, by KAG. Feb 19, by RWW. May 12, 2006 by RWW.
</SMALL>
</DIR>

		</Blockquote>
		
		</TD>
		</TR></TABLE>
		</TD>
	</TR>
	<TR>
		<TD align=center bgColor=#ddddff class="solidBorder">
		<!--Start of footer-->
		<TABLE width="90%">
		<script language='JavaScript' src='/javascript/footer.js'></script>
		</TABLE>
		<!--End of footer-->
		</TD>
	</TR>
</TABLE>
<!-- /Footer -->
<script language="JavaScript" src="/javascript/menu_new.js"></script>
<script language="JavaScript" src="/javascript/menu_items.js"></script>
<script language="JavaScript" src="/javascript/menu_tpl.js"></script>
<script language="JavaScript">
	<!--//
	new menu (MENU_ITEMS, MENU_POS);
	//-->
</script>
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-3782271-1";
urchinTracker();
</script>
</BODY>
</HTML>