author BonfaceKilz 2022-01-27 16:03:21 +0300
committer BonfaceKilz 2022-01-27 16:04:57 +0300
commit 7b14b2e9363b66be333f4fb4c652ba3abdcbd71a
tree 94a8d341f1b6da2bb8ac8193fb67728272bb46b2 /topics/data-uploads/inserting-data.gmi
parent eb435b76b3433ed97837299329229e919c3fffd6
topics: inserting-data: New topic
Diffstat (limited to 'topics/data-uploads/inserting-data.gmi')
-rw-r--r-- topics/data-uploads/inserting-data.gmi 163
1 files changed, 163 insertions, 0 deletions
diff --git a/topics/data-uploads/inserting-data.gmi b/topics/data-uploads/inserting-data.gmi
new file mode 100644
index 0000000..33b500c
--- /dev/null
+++ b/topics/data-uploads/inserting-data.gmi
@@ -0,0 +1,163 @@
+## Tags
+
+* assigned: bonfacem, zachs
+
+### Introduction
+
+The current uploader work documented in `editing-data.gmi` only caters
+for the following operations by people with the right access:
+
+- Editing phenotype and probeset metadata
+
+- Editing sample data from a published phenotype
+
+- Deleting sample data from published phenotypes
+
+- Inserting data from strains that already exist in a ".geno" file
+
+However, one of our beta users ran into a problem when attempting to
+insert new trait data for BDL_10001. At the moment, we can't add new
+samples, and adding case attributes for new samples is a manual
+process. New samples cannot be added because new genotype files need
+to be generated whenever new strains/samples are added, and adding
+these genotype files has always been manual. We can add strain data
+(inserting new strains into the Strain table(s) if they are not
+already there) by hacking existing code. However, the new strains
+will then show up in a separate "sample group" on the trait page and
+won't be used for mapping until a new .geno file containing them is
+generated.
+
+How is this genotype file generated?
+
+[From Rob] Genotypes should be added by code. For some species like
+humans, this won't happen; but for experimental animals and plants,
+many families may grow and spread, e.g. BXDs grow to the BXD Dax, then
+they expand to the BXDx Collaborative Cross DAX.
+
+New strains are added to the Strain table with their SpeciesId and linked to a group (InbredSetId) through the "StrainXRef" table. The session below shows both table layouts and a test insert into Strain:
+
+```
+MariaDB [db_webqtl]> desc Strain;
+
++-----------+----------------------+------+-----+---------+----------------+
+
+| Field     | Type                 | Null | Key | Default | Extra          |
+
++-----------+----------------------+------+-----+---------+----------------+
+
+| Id        | int(20)              | NO   | PRI | NULL    | auto_increment |
+
+| Name      | varchar(100)         | YES  | MUL | NULL    |                |
+
+| Name2     | varchar(100)         | YES  |     | NULL    |                |
+
+| SpeciesId | smallint(5) unsigned | NO   |     | 0       |                |
+
+| Symbol    | varchar(20)          | YES  | MUL | NULL    |                |
+
+| Alias     | varchar(255)         | YES  |     | NULL    |                |
+
++-----------+----------------------+------+-----+---------+----------------+
+
+6 rows in set (0.001 sec)
+
+MariaDB [db_webqtl]> desc StrainXRef;
+
++------------------+----------------------+------+-----+---------+-------+
+
+| Field            | Type                 | Null | Key | Default | Extra |
+
++------------------+----------------------+------+-----+---------+-------+
+
+| InbredSetId      | smallint(5) unsigned | NO   | PRI | 0       |       |
+
+| StrainId         | int(20)              | NO   | PRI | NULL    |       |
+
+| OrderId          | int(20)              | YES  |     | NULL    |       |
+
+| Used_for_mapping | char(1)              | YES  |     | N       |       |
+
+| PedigreeStatus   | varchar(255)         | YES  |     | NULL    |       |
+
++------------------+----------------------+------+-----+---------+-------+
+
+5 rows in set (0.001 sec)
+
+MariaDB [db_webqtl]> select max(Id) from Strain;
+
++---------+
+
+| max(Id) |
+
++---------+
+
+|   66085 |
+
++---------+
+
+1 row in set (0.000 sec)
+
+MariaDB [db_webqtl]> insert into Strain (Name,
+Name2,SpeciesId,Symbol,Alias) value ("Test1","Test1",30,"Test1","Test1");
+
+Query OK, 1 row affected (0.000 sec)
+
+MariaDB [db_webqtl]> select max(Id) from Strain;
+
++---------+
+
+| max(Id) |
+
++---------+
+
+|   66086 |
+
++---------+
+
+1 row in set (0.000 sec)
+
+MariaDB [db_webqtl]> select * from Strain where Id=66086;
+
++-------+-------+-------+-----------+--------+-------+
+
+| Id    | Name  | Name2 | SpeciesId | Symbol | Alias |
+
++-------+-------+-------+-----------+--------+-------+
+
+| 66086 | Test1 | Test1 |        30 | Test1  | Test1 |
+
++-------+-------+-------+-----------+--------+-------+
+
+1 row in set (0.000 sec)
+
+```
+
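The session above stops at the Strain insert. As a rough sketch of the remaining step (linking the new strain to its group through StrainXRef, and skipping the insert when the strain is already there), the following uses Python's built-in `sqlite3` with simplified stand-in tables so that it runs anywhere. Everything here is an assumption for illustration rather than existing GeneNetwork code: the helper names, the choice to identify a strain by (Name, SpeciesId), and the group id `1`. On MariaDB the equivalent of SQLite's `INSERT OR IGNORE` is `INSERT IGNORE`.

```python
import sqlite3

# Simplified stand-ins for the MariaDB tables described above; the
# column names match, but the types are SQLite's.
SCHEMA = """
CREATE TABLE Strain (
    Id INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT, Name2 TEXT,
    SpeciesId INTEGER NOT NULL DEFAULT 0,
    Symbol TEXT, Alias TEXT
);
CREATE TABLE StrainXRef (
    InbredSetId INTEGER NOT NULL DEFAULT 0,
    StrainId INTEGER NOT NULL,
    OrderId INTEGER,
    Used_for_mapping TEXT DEFAULT 'N',
    PedigreeStatus TEXT,
    PRIMARY KEY (InbredSetId, StrainId)
);
"""


def get_or_create_strain(conn, name, species_id):
    """Return the Id of the strain, inserting it first if it is missing.

    Assumption: a strain is identified by (Name, SpeciesId).
    """
    row = conn.execute(
        "SELECT Id FROM Strain WHERE Name = ? AND SpeciesId = ?",
        (name, species_id),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO Strain (Name, Name2, SpeciesId, Symbol, Alias) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, name, species_id, name, name),
    )
    return cursor.lastrowid


def link_strain_to_group(conn, strain_id, inbredset_id):
    """Attach a strain to a group; a repeated call leaves the row as is."""
    conn.execute(
        "INSERT OR IGNORE INTO StrainXRef (InbredSetId, StrainId) "
        "VALUES (?, ?)",
        (inbredset_id, strain_id),
    )


def add_strain(conn, name, species_id, inbredset_id):
    """Insert the strain (if new) and its group link in one transaction."""
    with conn:  # commits on success, rolls back on error
        strain_id = get_or_create_strain(conn, name, species_id)
        link_strain_to_group(conn, strain_id, inbredset_id)
    return strain_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# SpeciesId 30 mirrors the test row above; InbredSetId 1 is made up.
first = add_strain(conn, "Test1", species_id=30, inbredset_id=1)
second = add_strain(conn, "Test1", species_id=30, inbredset_id=1)
```

Calling `add_strain` twice with the same arguments returns the same Id and leaves a single row in each table. The link row keeps the column default `Used_for_mapping = 'N'`; this sketch does not attempt to decide when that flag should change.
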
+### Problems
+
+- Integration with genotype files complicates things, e.g. we can only
+  generate individual BXD genotypes from the "main" BXD genotype files
+  if the strain of each individual is stored somewhere.
+
+- CaseAttributes need to be updated manually. One option is to enable
+  this using the UI. At the moment, we need to query the Case Attribute
+  tables to look up the BXD strain for each individual when generating
+  the genotype files.
+
+- We should ideally be able to generate a set of genotype files or a
+ set of DB tables with all possible 20,000 BXD family genomes. As a
+ reference see David's smoothed WGS-based genotype files.
+
+- Genotype files will most likely not be static. We should soon be
+  able to support changing them at runtime.
+
+- When the sample list for BDL_10001 is updated, will the sample list
+  for all other records be synchronized automatically?
+
+
+### Goal
+
+- (Low-hanging fruit) Ability to insert new strains.
+
+- There are 1000s of mice with computable genomes that have not been
+  born yet. Can we compute their phenotypes? This is as difficult as
+  getting to Mars.
\ No newline at end of file