author: Frederick Muriuki Muriithi, 2024-02-22 12:07:21 +0300
committer: Frederick Muriuki Muriithi, 2024-02-22 12:07:21 +0300
commit: 4999ac15950b37581ed0124627a712d732b461bb
tree: b0cdc745bfe36b1272fcb07182da17087f65ed13 /topics
parent: 248e08680ff3fc2a93237b003a15fb580bfa6bf4
download: gn-gemtext-4999ac15950b37581ed0124627a712d732b461bb.tar.gz
Fix formatting and add missing file.
Diffstat (limited to 'topics')
-rw-r--r-- topics/data-uploads/data_dictionaries_20230222.txt.zip | bin 0 -> 2437 bytes
-rw-r--r-- topics/data-uploads/gn-uploader-requirements.gmi | 32
2 files changed, 16 insertions, 16 deletions
diff --git a/topics/data-uploads/data_dictionaries_20230222.txt.zip b/topics/data-uploads/data_dictionaries_20230222.txt.zip
new file mode 100644
index 0000000..5a8ba2f
--- /dev/null
+++ b/topics/data-uploads/data_dictionaries_20230222.txt.zip
Binary files differ
diff --git a/topics/data-uploads/gn-uploader-requirements.gmi b/topics/data-uploads/gn-uploader-requirements.gmi
index 8a1bfcd..871cf99 100644
--- a/topics/data-uploads/gn-uploader-requirements.gmi
+++ b/topics/data-uploads/gn-uploader-requirements.gmi
@@ -52,9 +52,9 @@ and for data in ProbeSetData, it would be something like:
We can then have table indexes composed of one or more of the elements of the *FULL IDENTIFIER* for faster queries.
-**NOTE 01**: The *FULL IDENTIFIERS* above should be hierarchical, beginning with the "oldest" ancestor and ending with the current record's ID.
+**NOTE 01**: The FULL IDENTIFIERS above should be hierarchical, beginning with the "oldest" ancestor and ending with the current record's ID.
-**NOTE 02**: The examples of the *FULL IDENTIFIERS* above might not be complete. I'll update them as I tease more information from the database.
+**NOTE 02**: The examples of the FULL IDENTIFIERS above might not be complete. I'll update them as I tease more information from the database.
## Data Categories
@@ -115,14 +115,14 @@ Hierarchy
We could index the genotype information by the following fields:
* SpeciesId: For faster queries for a particular species' genotypes
-* …
+* ...
### Assembly Information
* mm8
* mm10
* mm11
-* …
+* ...
etc.
I still do not wholly comprehend this. This might be related to the platform information.
@@ -135,7 +135,7 @@ Tables affected by this information:
* Geno
* Chr_Length
-* …
+* ...
### Population Information
@@ -167,7 +167,7 @@ The data we need to collect/have for the samples are:
From the existing `Strain` table, it seems you can only have one-and-only-one sample for a particular species with a specific name.
> MariaDB [db_webqtl]> SHOW CREATE TABLE Strain;
-> …
+> ...
> | Strain | CREATE TABLE `Strain` (
> `Id` int(20) NOT NULL AUTO_INCREMENT,
> `Name` varchar(100) DEFAULT NULL,
@@ -179,7 +179,7 @@ From the existing `Strain` table, it seems you can only have one-and-only-one sa
> UNIQUE KEY `Name` (`Name`,`SpeciesId`),
> KEY `Symbol` (`Symbol`)
> ) ENGINE=InnoDB AUTO_INCREMENT=180927 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
-> …
+> ...
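The effect of the `UNIQUE KEY Name (Name, SpeciesId)` constraint above can be sketched as follows. This is a minimal illustration, not the real schema: SQLite stands in for MariaDB, only the columns needed for the constraint are kept, the column types are assumptions, and the sample name and species IDs are made-up values.

```python
import sqlite3

# Simplified stand-in for the `Strain` table: only the columns needed to
# show the UNIQUE (Name, SpeciesId) constraint. SQLite replaces MariaDB
# here, and the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Strain (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        Name TEXT,
        SpeciesId INTEGER NOT NULL,
        UNIQUE (Name, SpeciesId)
    )
    """
)


def add_sample(name, species_id):
    """Insert a sample; return True on success, False on a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Strain (Name, SpeciesId) VALUES (?, ?)",
                (name, species_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical values, for illustration only.
first = add_sample("BXD1", 1)          # new (name, species) pair
duplicate = add_sample("BXD1", 1)      # same pair again: rejected
other_species = add_sample("BXD1", 2)  # same name, other species: accepted
```

Under these assumptions, the second insert fails while the third succeeds, which matches the reading that a sample name must be unique within a species but may be reused across species.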
We could index this information by any one, or combinations of the following fields:
@@ -191,7 +191,7 @@ and maybe even drop the need for the 'StrainXRef' table. (*To be considered*)
### Tissue Information
Hierarchy
-> Species --> ?? … ?? --> Tissue --> {{{ data of various sorts }}}
+> Species --> ?? ... ?? --> Tissue --> {{{ data of various sorts }}}
Felix discovered the need for this when uploading the Arabidopsis thaliana data into the test database with the uploader. Expression data to be uploaded has to be linked to a tissue, and the existing tissue information (as of before 2024-02-22T09:45+03:00UTC) seems to only belong to vertebrates, not plants.
@@ -205,15 +205,15 @@ Tables:
* TissueProbeSetFreeze
* TissueProbeSetXRef
-…
+...
### Expression Data Information
Hierarchy
-> Species --> ?? … ?? --> Expression Data --> {{{ data of various sorts }}}
+> Species --> ?? ... ?? --> Expression Data --> {{{ data of various sorts }}}
-The ' --> ?? … ?? --> ' section winds through Platform, Population, Genotype, Tissue, Samples, etc. before making its way to the expression data information. I still need to unwind the hierarchy and list the paths here.
+The ' --> ?? ... ?? --> ' section winds through Platform, Population, Genotype, Tissue, Samples, etc. before making its way to the expression data information. I still need to unwind the hierarchy and list the paths here.
Affects the following database tables:
@@ -228,18 +228,18 @@ Some mandatory data we need:
* SpeciesId (see 'Species Information' above)
* PlatformId (see 'Platform Information' above)
* Name: Phenotype identifier for the platform above
-* Gene Symbol: …
+* Gene Symbol: ...
* Chromosome:
* Megabases:
* Description: A description for the phenotype
* GeneId: Entrez gene ID from NCBI
* Strand_Gene/Strand_Probe: The DNA strand (+ or -) of the gene assigned to the phenotype. Leading or lagging strand.
-Maybe the *Chromosome* and *Megabases* value could be replaced by a single link to a ChromosomeId or such… maybe a table linking the chromosome to its specific assembly e.g.
+Maybe the *Chromosome* and *Megabases* value could be replaced by a single link to a ChromosomeId or such... maybe a table linking the chromosome to its specific assembly e.g.
> Probeset(ChromosomeAssemblyId) --> (Id)ChromosomeAssembly(ChromosomeId) --> Chromosome(Id)
-…
+...
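One possible shape for the linking chain above, as a rough sketch only. The table names and the `Id`, `ChromosomeId` and `ChromosomeAssemblyId` columns come from the chain itself; every other column (`Name`, `Assembly`) and all inserted values are hypothetical, SQLite stands in for MariaDB, and the megabase position is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Chromosome (
        Id INTEGER PRIMARY KEY,
        Name TEXT NOT NULL              -- hypothetical column
    );
    CREATE TABLE ChromosomeAssembly (
        Id INTEGER PRIMARY KEY,
        ChromosomeId INTEGER NOT NULL REFERENCES Chromosome(Id),
        Assembly TEXT NOT NULL          -- hypothetical column, e.g. 'mm10'
    );
    CREATE TABLE Probeset (
        Id INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,             -- hypothetical column
        ChromosomeAssemblyId INTEGER NOT NULL
            REFERENCES ChromosomeAssembly(Id)
    );
    -- Made-up rows, for illustration only.
    INSERT INTO Chromosome VALUES (1, '1');
    INSERT INTO ChromosomeAssembly VALUES (10, 1, 'mm10');
    INSERT INTO Probeset VALUES (100, 'probe_a', 10);
    """
)

# Follow Probeset --> ChromosomeAssembly --> Chromosome for one probeset.
row = conn.execute(
    """
    SELECT p.Name, c.Name, ca.Assembly
    FROM Probeset AS p
    JOIN ChromosomeAssembly AS ca ON ca.Id = p.ChromosomeAssemblyId
    JOIN Chromosome AS c ON c.Id = ca.ChromosomeId
    WHERE p.Name = ?
    """,
    ("probe_a",),
).fetchone()
```

With a layout like this, a probeset stores a single `ChromosomeAssemblyId` and the chromosome and its assembly are recovered by joining, rather than being repeated on every probeset row.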
### Publish Phenotype Data
@@ -262,9 +262,9 @@ Some important data required:
* Units: Units of measurement for the phenotype
=> https://info.genenetwork.org/faq.php#q-22 Description for "Publish Phenotypes"
-* Others? …
+* Others? ...
-…
+...
## Descriptions