summaryrefslogtreecommitdiff
path: root/issues/systems/mariadb/ProbeData.gmi
diff options
context:
space:
mode:
Diffstat (limited to 'issues/systems/mariadb/ProbeData.gmi')
-rw-r--r--issues/systems/mariadb/ProbeData.gmi48
1 files changed, 48 insertions, 0 deletions
diff --git a/issues/systems/mariadb/ProbeData.gmi b/issues/systems/mariadb/ProbeData.gmi
new file mode 100644
index 0000000..c339571
--- /dev/null
+++ b/issues/systems/mariadb/ProbeData.gmi
@@ -0,0 +1,48 @@
+# ProbeData
+
+## Tags
+
+* assigned: pjotrp, aruni
+* status: unclear
+* priority: medium
+* type: enhancement
+* keywords: database, mariadb, innodb, ProbeData
+
+## Description
+
+Probe level data is used to examine the correlation structure among the
+N probes that have the same nominal target. Sometimes several probes
+are badly behaved or contain SNPs or indels.
+The well-behaved probes were then be used in GN1, at the user's
+discretion, to make an eigengene that sometimes performs quite a bit
+better than the Affymetrix probeset. Essentially, the user could design
+their own probesets. And the probe level data is quite fascinating to
+dissect some types of cis-eQTLs—the COMT story I have attached is a
+good example. Here is figure 1 that exploits this unique feature:
+
+Ideally, the probe level data would be in GN2 with the same basic
+functions as in GN1.
+
+All we need in GN2/3 is a new table to display the probe level
+expression (mean) with their metadata (melting temperature, sequence,
+location, etc). The probeset ID is the Table header and name (the
+parent), and the probes in the table are the children. Using our now
+standard DataTable format should work well.
+We have a similar parent-child relation among traits with peptides and
+proteins. All of the peptides of a single protein are should have
+the same parent probeset/protein. And peptides could be entered as
+"probes" in the same way that we did for Affymetrix.
+
+Arun—I wonder whether this hierarchy could be usefully combined to
+handle time-series data. Probably not ;-)
+In the case of probes and probesets there is almost never any overlap
+of probe sequence—all are disjoint. That is also usually true of
+peptides and proteins.
+
+Pjotr, the reason we have not added much probe level data to GN1 or GN2
+is because we did not have the bandwidth. Arthur simply did not have
+time and I did not push the issue. Instead we just started loading the
+probe level data separately as if they were probesets. This is what we
+have done for peptide data and the reason that there are now "parallel"
+data sets—one labeled "protein" and another as "peptide" or as "gene
+level" and "exon level". We just collapse the hierarchy.