Diffstat (limited to 'topics/gn-uploader')
-rw-r--r-- topics/gn-uploader/genome-details.gmi | 42
-rw-r--r-- topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi | 40
-rw-r--r-- topics/gn-uploader/types-of-data.gmi | 63
3 files changed, 145 insertions, 0 deletions
diff --git a/topics/gn-uploader/genome-details.gmi b/topics/gn-uploader/genome-details.gmi
new file mode 100644
index 0000000..f8a12f6
--- /dev/null
+++ b/topics/gn-uploader/genome-details.gmi
@@ -0,0 +1,42 @@
+# Genome Details
+
+This file is probably misnamed.
+
+*TODO*: Update name once we know where this fits
+
+## Tags
+
+* type: documentation, doc, docs
+* assigned: fredm
+* priority: docs
+* status: open
+* keywords: gn-uploader, uploader, genome
+
+## Location
+
+### centiMorgan (cM)
+
+GeneNetwork generally no longer uses centiMorgans.
+
+From the email threads:
+
+```
+> …
+> Sorry, we now generally do not use centimorgans. Chr 19 is 57 cM
+> using markers that exclude telomeres in most crosses.
+> …
+```
+
+and
+
+```
+> …
+> I know that cM is a bit more variable because it's not a direct measurement, …
+> …
+```
+
+### Megabasepairs (Mbp)
+
+The uploader stores any provided physical location values (in megabase pairs) in the
+=> https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#Geno Geno table
+specifically, in the `Mb` field of that table.
diff --git a/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi b/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi
new file mode 100644
index 0000000..db0ddf3
--- /dev/null
+++ b/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi
@@ -0,0 +1,40 @@
+# Genotypes, Assemblies, Markers and GeneNetwork
+
+## Tags
+
+* type: documentation, docs, doc
+* keywords: genotype, assembly, markers, data, database, genenetwork, uploader
+
+## Markers
+
+```
+The marker is the SNP…
+
+— Rob (Paraphrased)
+```
+
+SNPs (Single Nucleotide Polymorphisms) are specific positions of interest within the genome where the nucleotide (base pair) can take different forms.
+
+A SNP and its immediate neighbourhood (a number of base pairs before and after the SNP) form a sequence that is effectively the marker. For example, for mouse (Mus musculus) you could have the following sequence from the GRCm38 (mm10) genome assembly:
+
+```
+GAGATAAAGATGGGTCCCTTGGCACAGGACTGGCCCACATTTCCaatataaattacaacaattttttttaaatttttaaaCAAAACAAGCATCTCACACAC/TTGAAAAAGAAGATGCATTCAAAGAAAATAGATGTTTCAATGTATTTAAGATAATCAAGAGATAACCATGACCATATCATGAGGAAACTTAAGAATTGGCA
+```
+
+where the position with `C/T` represents the SNP of interest and thus the marker.
+
+You can search this on the UCSC Genome Browser, specifically the
+=> https://genome.ucsc.edu/cgi-bin/hgBlat BLAT search
+to get the name of the marker, and some extra details regarding it.
+
+## Genome Assemblies
+
+The genome assembly used determines the reported position of the marker on the genome. Newer assemblies will (generally) give a better position, since they account for more of the issues discovered in older assemblies.
+
+With most of the newer assemblies, the positions do not shift very drastically.
+
+## GeneNetwork
+
+Currently (September 2024), GeneNetwork uses the GRCm38 (mm10) assembly for mice.
+
+Unfortunately, since the system was built for mice, the tables (e.g. the Geno table) do not account for the fact that you could have markers (and other data) from species other than Mus musculus. The Geno table thus has fields like `Mb_mm8` and `Chr_mm8`, which are very mouse-specific.
diff --git a/topics/gn-uploader/types-of-data.gmi b/topics/gn-uploader/types-of-data.gmi
new file mode 100644
index 0000000..1f53dec
--- /dev/null
+++ b/topics/gn-uploader/types-of-data.gmi
@@ -0,0 +1,63 @@
+# Types of Data in GeneNetwork
+
+## Tags
+
+* assigned:
+* priority:
+* status: open
+* type: documentation
+* keywords: gn-uploader, uploader, genenetwork, documentation, doc, docs, data, data type, types of data
+
+## Description
+
+There are five (5) main types of data in GeneNetwork:
+
+* Classical Phenotypes (PublishData)
+* High Content Data
+* Genotype Data
+* Cofactors and Attributes
+* Metadata
+
+### Classical Phenotypes
+
+This is usually low-content data, e.g. body weight, tail length, etc.
+
+This is currently saved in the `Publish*` tables in the database.
+
+This data is saved as is, i.e. it is not log-transformed.
+
+### High Content Data
+
+This mainly includes molecular data such as:
+* mRNA assay data
+* gene expression data
+* probes
+* tissue type and data
+
+These data are saved in the `ProbeSet*` database tables (and possibly in other closely related tables like the `Tissue*` tables; *TODO*: fred added this, verify).
+
+These could be saved in the database in a log-transformed form (*TODO*: verify).
+
+How do you check for log-transformation in the data?
+
+### Genotype Data
+
+This is core data, and all other data seem to rely on its existence.
+
+Useful for:
+* correlations, cofactor and PheWAS computations
+* mapping purposes
+* search and display
+* editing and curation
+
+### Cofactors and Attributes
+
+This data can be alphanumeric (a mix of numerical and non-numerical values).
+
+It is not intended for mapping.
+
+### Metadata
+
+This data should (ideally) always accompany any and all of the data types above. It provides contextual information regarding the data it accompanies, and is useful for search, and other contextualising operations.
+
+It is alphanumeric data, and mostly cannot be used for numeric computations.