aboutsummaryrefslogtreecommitdiff

Database Information

WARNING: This document contains information on the GN databases which will change over time. The GN database is currently MySQL based and, while efficient, contains a number of design choices we want to grow 'out' of. Especially with an eye on reproducibility we want to introduce versioning.

So do not treat the information in this document as a final way of accessing data. It is better to use the REST API.

The small test database (2GB)

The default install comes with a smaller database which includes a number of the BXD's and the Human liver dataset (GSE9588).

GeneNetwork database

Estimated table sizes

select tablename,round(((datalength + indexlength) / 1024 / 1024), 2) `Size in MB` from informationschema.TABLES where tableschema = "dbwebqtl" order by datalength;

 table_name                Size in MB 
 ProbeSetData            
 SnpAll                  
 ProbeData               
 SnpPattern              
 ProbeSetSE              
 QuickSearch             
 ProbeSetXRef            
 LCorrRamin3             
 ProbeSE                 
 ProbeSet                
 Probe                   
 GenoData                
 CeleraINFO_mm6          
 pubmedsearch            
 ProbeXRef               
 GeneRIF_BASIC           
 BXDSnpPosition          
 EnsemblProbe            
 EnsemblProbeLocation    
 Genbank                 
 TissueProbeSetData      
 AccessLog               
 GeneList                
 Geno                    
 MachineAccessLog        
 IndelAll                
 PublishData             
 TissueProbeSetXRef      
 ProbeH2                 
 GenoXRef                
 TempData                
 GeneList_rn3            
 GORef                   
 Phenotype               
 temporary               
 InfoFiles               
 Publication             
 Homologene              
 Datasets                
 GeneList_rn33           
 PublishSE               
 GeneRIF                 
 Vlookup                 
 H2                      
 PublishXRef             
 NStrain                 
 IndelXRef               
 Strain                  
 GeneMap_cuiyan          
 user_collection         
 CaseAttributeXRef       
 StrainXRef              
 GeneIDXRef              
 Docs                    
 News                    
 ProbeSetFreeze          
 GeneRIFXRef             
 Sample                  
 login                   
 user                    
 TableFieldAnnotation    
 DatasetMapInvestigator  
 User                    
 ProbeFreeze             
 TableComments           
 Investigators           
 DBList                  
 Tissue                  
 GeneChip                
 GeneCategory            
 SampleXRef              
 InbredSet               
 SnpAllele_to_be_deleted 
 Organizations           
 PublishFreeze           
 GenoFreeze              
 Chr_Length              
 SnpSource               
 AvgMethod               
 Species                 
 Dataset_mbat            
 TissueProbeFreeze       
 EnsemblChip             
 TissueProbeSetFreeze    
 UserPrivilege           
 CaseAttribute           
 MappingMethod           
 DBType                  
 InfoFilesUser_md5       
 GenoCode                
 DatasetStatus           
 GeneChipEnsemblXRef     
 GenoSE                  
 user_openids            
 roles_users             
 role                    
 Temp                    
   59358.80 
   15484.67 
   22405.44 
    9177.05 
   14551.02 
    5972.86 
    4532.89 
   18506.53 
    6263.83 
    2880.21 
    2150.30 
    3291.91 
     989.80 
    1032.50 
     743.38 
     448.54 
     224.44 
     133.66 
     105.49 
      37.71 
      74.42 
      42.38 
      34.11 
      33.90 
      28.34 
      22.42 
      22.54 
      14.73 
      13.26 
      22.83 
       8.35 
       5.54 
       4.97 
       6.50 
       3.59 
       3.32 
       3.42 
       5.69 
       2.31 
       2.61 
       4.71 
       2.18 
       1.87 
       2.18 
       2.18 
       4.80 
       2.91 
       1.07 
       0.51 
       0.30 
       0.44 
       0.56 
       0.77 
       0.17 
       0.17 
       0.22 
       0.24 
       0.06 
       0.06 
       0.04 
       0.05 
       0.05 
       0.04 
       0.06 
       0.02 
       0.02 
       0.03 
       0.02 
       0.01 
       0.01 
       0.01 
       0.01 
       0.00 
       0.01 
       0.00 
       0.00 
       0.01 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.01 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       0.00 
       NULL 

97 rows in set, 1 warning (0.01 sec)

All *Data tables are large

User access

According to the meta data:

This table tracks access time and IP addresses. Used for logging in registered users and tracking cookies.

select * from AccessLog limit 5;

 id      accesstime            ip_address     
 12174 
 12173 
     3 
     4 
     5 
 2003-10-28 02:17:41 
 2003-10-28 02:16:27 
 2003-02-22 07:38:33 
 2003-02-22 07:49:13 
 2003-02-22 07:51:08 
 130.120.104.71 
 130.120.104.71 
 192.117.159.1  
 192.117.159.1  
 192.117.159.1  

select * from AccessLog order by accesstime desc limit 5;

 id        accesstime            ip_address    
 1025735 
 1025734 
 1025733 
 1025732 
 1025731 
 2016-02-08 14:23:29 
 2016-02-08 13:54:28 
 2016-02-08 13:43:37 
 2016-02-08 13:39:50 
 2016-02-08 13:15:46 
 100.43.81.157 
 180.76.15.144 
 66.249.65.217 
 66.249.65.217 
 66.249.65.217 

Quite a few trait page hits:

select count(*) from AccessLog;

 count(*) 
  1025685 

show indexes from AccessLog;

 Table       Non_unique   Key_name   Seq_in_index   Column_name   Collation   Cardinality   Sub_part   Packed   Null   Index_type   Comment   Index_comment 
 AccessLog            0   PRIMARY               1   id            A               1025685       NULL   NULL            BTREE                                

This table is being used by both GN1 and GN2 from the trait pages!

grep -ir AccessLog *|grep -e "^gn1\|^gn2"|grep \.py|grep -v doc

gn1/web/webqtl/showTrait/ShowTraitPage.py: query = "SELECT count(id) FROM AccessLog WHERE ipaddress = %s and \ gn1/web/webqtl/showTrait/ShowTraitPage.py: self.cursor.execute("insert into AccessLog(accesstime,ipaddress) values(Now(),%s)" ,userip) gn1/web/webqtl/textUI/cmdClass.py: query = """SELECT count(id) FROM AccessLog WHERE ipaddress = %s AND UNIXTIMESTAMP()-UNIXTIMESTAMP(accesstime)<86400""" gn1/web/webqtl/textUI/cmdClass.py: query = """INSERT INTO AccessLog(accesstime,ipaddress) values(Now(),%s)""" gn2/wqflask/wqflask/showtrait/showtraitpage.py: query = "SELECT count(id) FROM AccessLog WHERE ipaddress = %s and \ gn2/wqflask/wqflask/showtrait/showtraitpage.py: self.cursor.execute("insert into AccessLog(accesstime,ipaddress) values(Now(),%s)", userip)

When looking at the code in GN1 and GN2 it restricts the daily use of the trait data page (set to 1,000 - whoever reaches that?). Unlike mentioned in the schema description, this table does not keep track of cookies.

From the code it looks like GN2 uses a mixture of Redis and sqlalchemy to keep track of logged in sessions (see gn2/wqflask/wqflask/usermanager.py) and cookies through a useruuid in model.py.

In gn2/wqflask/wqflask/templates/collections/viewanonymous.html it showtraitpage appears to be loaded (need to check).

AvgMethod

Probesetfreeze refers to AvgMethod

BXDSnPosition

Snp table (all snps)

Mapping in GN1 shows snps when you select a chromosome.

CaseAttribute(XRef)

Metadata

CeleralINFOmm6

?

ChrLength (/cross/BXD.json)

Default mm9, column for mm8

select * from ChrLength;

Name SpeciesId OrderId Length Lengthmm8
1 1 1 197195432 197069962
2 1 2 181748087 181976762

Table should be merged with

Datasetmbat

Menu for BXD (linkouts)

DatasetMapInvestigator

Arthur?

DataSets

Information/metadata

DatasetStatus

Arthur private/public

DBList and DBType

Hooked in API (URL encoding)

Docs

GN2 only (see menu bar)

Ensembl*

Probe information

(will be deprecated)

Genbank

Linkout and not important

GeneCategory

Not important. GeneWiki notes function classification.

Deprecate.

GeneChip

GeneIDXRef

Interspecies gene comparison

GeneList

Track info

Genlistrn3(3)

Rat list

GeneMapcuiyan

Link outs

GeneRIF

Wiki info (nightly updated from NCBI)

XRef should be foreign keys

Geno (genotype/marker/'marker'.json)

SNP or marker info

base.trait:.sql: retrieveinfo: select Geno.Chr, Geno.Mb from Geno, Species where Species.Name = 'mouse' and Geno.Name = 'rs3693478' and Geno.SpeciesId = Species.Id

Id SpeciesId Name MarkerName Chr Mb Sequence Source chrnum Source2 Comments usedbygenofile Mbmm8 Chrmm8
1 1 01.001.695 01.001.695 1 4.678288 GCCCTGCCCACCTCAGAGCAAGCTGCCACCCAGGAGTCCGTGTTTCAGGAGATGTGTGAGGAGGGCCTGCTGGAGGAGTGTGATGGTGAGGATGAGGCAGGCCGTGCCGCG[T/C]AGCCAGAGGCTGGTGATGGGACCACCGAGATCTCACCCACTGGTGCTGCTGATCCTGAGAAGAGGATGGAGAAGAAGACGGAGCAGCAGCACACCGGCGGCGGGAGAAAGCTGCTCGTAAGCTGCTCGTAAGCTACGGGTGCAGCAGGCTGCACTTAGGGCAGCCCGGCTTCAGCACCAAGAACTCTTCAGGCTGCATGGGATCAAGGCCCAGGTGGCCCGAAGGCTGGCAGAACTCGCACACGGGAGGGAGCAGCAGCGCATACAGCGACTGGCAGAGGCTGACAAGCCCCGAAGGCTGGGACGACTCAAGTACCAGGCTCCTGACATTGATGTGCAGCTCAGCTCTGAGCTGTCTGGCCCACTCAGGACACTGAAACCAGAAGGTCACATTCTCCAAGACAGGTTCAAGAGCTTCCAGAAGAGAAATATGATTGAGCCCCGAGAACGAGCCAAGTTCAAGCGCAAATAAAAAATGAAGTTGGTGGAGAAGCGGGCCTACCATGAGATTCAGTTGTAGCTGTGCAGATGTCGGAGCCCCGCCCCTCAATAAAGTTCTGTGACAAAAAAAAAAAAAAAAAAAGAAGAAGAAGAAGAAAAGGAAAAAAAAGAAGAAAAAGAAAAAAAAAGAAAAAAGAAAAAGAAAACACATCACTTGGCAAAACTCCATAGACTCTATGTGATTCATGTTTCAAACATGCACCTA GNFSNP 1 GNF NULL NULL 4.678288 1

GenoCode

Belongs to someone else

GenoData

Allele info

GenoFreeze

Big menu (Freeze refers to menu)

GenoSE

SE standard err, not used

GenoXREF

Very important. Key links between Geno, GenoData

GORef

GO terms

H2

Heritability for probeset(?)

Homologene

Homology, not used much

InbredSet (/cross/BXD.info)

Group in menu

Indelall, SnpAll, SnpPattern, SnpSource

Indel Snp browser (variant browser Gn1)

Info*

Infra system PhP

Data Info button

Infosystem users has separate entries

Also Investigators, User, Organizations,

LCorrRamin3

Lit. Correlations Prof. Ramin

Login

GN2 login info

MachineAccessLog

Old

MappingMethod

GN1

News

GN2

NStrain

pheno publishfreeze (menu) xref (keys) xref links to publish (pubmed), phenotype, pubishdata geno genofreeze xref (keys) xref links to publish (pubmed), genotype, genodata probeset/expr. probesetfreeze xref (keys) xref links to publish (pubmed), probeset, probesetdata probe/expr. probefreeze xref (keys) xref links to publish (pubmed), probe, probedata

Each dataset has 3 values (real value (1), number of samples (2), stderr (3))

NStrain = number of phenotype samples