Diffstat (limited to 'topics')
75 files changed, 8435 insertions, 270 deletions
diff --git a/topics/ADR/gn-guile/000-markdown-editor-push-to-bare-repo.gmi b/topics/ADR/gn-guile/000-markdown-editor-push-to-bare-repo.gmi
new file mode 100644
index 0000000..05b2b6a
--- /dev/null
+++ b/topics/ADR/gn-guile/000-markdown-editor-push-to-bare-repo.gmi
@@ -0,0 +1,18 @@
+# [gn-guile/ADR-000] Extend Markdown Editor to push to Git Bare Repo
+
+* author: bonfacem
+* status: accepted
+* reviewed-by: alexm, jnduli
+
+## Context
+
+The gn-guile markdown editor currently reads from normal git repositories. However, for GN's self-hosted git repository, we use bare repositories. Bare repositories only store the git objects, so we can't edit files in them directly.
+
+## Decision
+
+gn-guile and the cgit instance run on the same server. We will have two repositories: a normal repository, configured by "CURRENT_REPO_PATH", which holds the raw files; and the bare repository, configured by "CGIT_REPO_PATH". We will make edits in the normal repository and, once that is done, push locally to the cgit instance.
+
+## Consequences
+
+* When creating the gn-guile container, this introduces extra complexity in that we will have to make sure that the container has the correct write access to the bare repository.
+* With this, we are coupled to our GN git set-up.
diff --git a/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists.gmi b/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists.gmi
new file mode 100644
index 0000000..1e3ee6a
--- /dev/null
+++ b/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists.gmi
@@ -0,0 +1,74 @@
+# [gn-transform-databases/ADR-000] Remodel GeneRIF Metadata Using predicateObject Lists
+
+* author: bonfacem
+* status: rejected
+* reviewed-by: pjotr, jnduli
+
+## Context
+
+In RDF 1.1 Turtle, the subject of a triple must be an IRI (for example a QName) or a blank node. As such, you cannot have a string literal forming the subject.
In simpler terms, this is not possible: + +``` +"Unique expression signature of a system that includes the subiculum, layer 6 in cortex ventral and lateral to dorsal striatum, and the endopiriform nucleus. Expression in cerebellum is apparently limited to Bergemann glia ABA" dct:created "2007-08-31T13:00:47"^^xsd:datetime . +``` + +As of commit "397745b554e0", a work-around was to manually create a unique identifier for each comment for the GeneRIF table. This identifier was created by combining GeneRIF.Id with GeneRIF.VersionId. One challenge with this is that we create some coupling with MySQL's unique generation of the GeneRIF.Id column. Here's an example of snipped turtle entries: + +``` +gn:wiki-352-0 rdfs:comment "Ubiquitously expressed. Hypomorphic vibrator allele shows degeneration of interneurons and tremor and juvenile lethality; modified by CAST alleles of Nxf1. Knockout has hepatic steatosis and hypoglycemia." . +gn:wiki-352-0 rdf:type gnc:GNWikiEntry . +gn:wiki-352-0 gnt:symbol gn:symbolPitpna . +gn:wiki-352-0 dct:created "2006-03-10T15:39:29"^^xsd:datetime . +gn:wiki-352-0 gnt:belongsToSpecies gn:Mus_musculus . +gn:wiki-352-0 dct:hasVersion "0"^^xsd:int . +gn:wiki-352-0 dct:identifier "352"^^xsd:int . +gn:wiki-352-0 gnt:initial "BAH" . +gn:wiki-352-0 foaf:mbox "XXX@XXX.XXX" . +gn:wiki-352-0 dct:references ( pubmed:9182797 pubmed:12788952 pubmed:14517553 ) . +gn:wiki-352-0 gnt:belongsToCategory ( "Cellular distribution" "Development and aging" "Expression patterns: mature cells, tissues" "Genetic variation and alleles" "Health and disease associations" "Interactions: mRNA, proteins, other molecules" ) . +``` + +## Decision + +We want to avoid manually generating a unique identifier for each WIKI comment. We should instead have that UID be a blank node reference that we don't care about and use predicateObjectLists as an idiom for representing string literals that can't be subjects. 
+
+=> https://www.w3.org/TR/turtle/#grammar-production-predicateObjectList Predicate Object Lists
+
+The above transform (gn:wiki-352-0) would now be represented as:
+
+```
+[ rdfs:comment '''Ubiquitously expressed. Hypomorphic vibrator allele shows degeneration of interneurons and tremor and juvenile lethality; modified by CAST alleles of Nxf1. Knockout has hepatic steatosis and hypoglycemia.'''@en] rdf:type gnc:GNWikiEntry ;
+    gnt:belongsToSpecies gn:Mus_musculus ;
+    dct:created "2006-03-10 12:39:29"^^xsd:datetime ;
+    dct:references ( pubmed:9182797 pubmed:12788952 pubmed:14517553 ) ;
+    foaf:mbox <XXX@XXX.XXX> ;
+    dct:identifier "352"^^xsd:integer ;
+    dct:hasVersion "0"^^xsd:integer ;
+    gnt:initial "BAH" ;
+    gnt:belongsToCategory ( "Cellular distribution" "Development and aging" "Expression patterns: mature cells, tissues" "Genetic variation and alleles" "Health and disease associations" "Interactions: mRNA, proteins, other molecules" ) ;
+    gnt:symbol gn:symbolPitpna .
+```
+
+The above can be loosely translated as:
+
+```
+_:comment rdfs:comment '''Ubiquitously expressed. Hypomorphic vibrator allele shows degeneration of interneurons and tremor and juvenile lethality; modified by CAST alleles of Nxf1. Knockout has hepatic steatosis and hypoglycemia.'''@en .
+_:comment rdf:type gnc:GNWikiEntry .
+_:comment dct:created "2006-03-10 12:39:29"^^xsd:datetime .
+_:comment dct:references ( pubmed:9182797 pubmed:12788952 pubmed:14517553 ) .
+_:comment foaf:mbox <XXX@XXX.XXX> .
+_:comment dct:identifier "352"^^xsd:integer .
+_:comment dct:hasVersion "0"^^xsd:integer .
+_:comment gnt:initial "BAH" .
+_:comment gnt:belongsToCategory ( "Cellular distribution" "Development and aging" "Expression patterns: mature cells, tissues" "Genetic variation and alleles" "Health and disease associations" "Interactions: mRNA, proteins, other molecules" ) .
+_:comment gnt:symbol gn:symbolPitpna .
+```
+
+## Consequences
+
+* Update SPARQL in tux02, tux01 in lockstep with updating GN3/GN2 and the XAPIAN index.
+* Reduction in size of the final output, and faster transform time, because PredicateObjectLists produce terser RDF.
+
+## Rejection Rationale
+
+This proposal was rejected because relying on blank-nodes as an identifier is opaque and not human-readable. We want to use human readable identifiers where possible.
diff --git a/topics/ADR/gn-transform-databases/001-remodel-ncbi-transform-with-predicateobject-lists.gmi b/topics/ADR/gn-transform-databases/001-remodel-ncbi-transform-with-predicateobject-lists.gmi
new file mode 100644
index 0000000..073525a
--- /dev/null
+++ b/topics/ADR/gn-transform-databases/001-remodel-ncbi-transform-with-predicateobject-lists.gmi
@@ -0,0 +1,102 @@
+# [gn-transform-databases/ADR-001] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata Using predicateObject Lists
+
+* author: bonfacem
+* status: rejected
+* reviewed-by: pjotr, jnduli
+
+## Context
+
+We can model RIF comments using predicateObject lists as described in:
+
+=> https://issues.genenetwork.org/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists [ADR/gn-transform-databases] Remodel GeneRIF Metadata Using predicateObject Lists
+
+However, currently for NCBI RIFs we represent comments as blank nodes:
+
+```
+gn:symbolsspA rdfs:comment [
+    rdf:type gnc:NCBIWikiEntry ;
+    rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ;
+    gnt:belongsToSpecies gn:Mus_musculus ;
+    skos:notation taxon:511145 ;
+    gnt:hasGeneId generif:944744 ;
+    dct:hasVersion '1'^^xsd:int ;
+    dct:references pubmed:97295 ;
+    ...
+    dct:references pubmed:15361618 ;
+    dct:created "2007-11-06T00:38:00"^^xsd:datetime ;
+] .
+gn:symbolaraC rdfs:comment [
+    rdf:type gnc:NCBIWikiEntry ;
+    rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ;
+    gnt:belongsToSpecies gn:Mus_musculus ;
+    skos:notation taxon:511145 ;
+    gnt:hasGeneId generif:944780 ;
+    dct:hasVersion '1'^^xsd:int ;
+    dct:references pubmed:320034 ;
+    ...
+    dct:references pubmed:16369539 ;
+    dct:created "2007-11-06T00:39:00"^^xsd:datetime ;
+] .
+
+```
+
+Here we see a lot of duplication between entries. For the above 2 entries, nearly everything is the same except for the symbol and the "gnt:hasGeneId" and "dct:references" predicates.
+
+## Decision
+
+We use predicateObjectLists with blankNodePropertyLists as an idiom to represent the generif comments.
+
+=> https://www.w3.org/TR/turtle/#grammar-production-predicateObjectList predicateObjectList
+=> https://www.w3.org/TR/turtle/#grammar-production-blankNodePropertyList blankNodePropertyList
+
+In so doing, we can de-duplicate the entries demonstrated above. A representation of the above RDF Turtle triples would be:
+
+```
+[ rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ]
+rdf:type gnc:NCBIWikiEntry ;
+dct:created "2007-11-06T00:39:00"^^xsd:datetime ;
+gnt:belongsToSpecies gn:Mus_musculus ;
+skos:notation taxon:511145 ;
+dct:hasVersion '1'^^xsd:int ;
+rdfs:seeAlso [
+    gnt:hasGeneId generif:944744 ;
+    gnt:symbol gn:symbolsspA ;
+    dct:references ( pubmed:97295 ... pubmed:15361618 ) ;
+] ;
+rdfs:seeAlso [
+    gnt:hasGeneId generif:944780 ;
+    gnt:symbol gn:symbolaraC ;
+    dct:references ( pubmed:320034 ... pubmed:16369539 ) ;
+] .
+```
+
+The above would translate to the following triples:
+
+```
+_:comment rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string .
+_:comment rdf:type gnc:NCBIWikiEntry .
+_:comment dct:created "2007-11-06T00:39:00"^^xsd:datetime .
+_:comment gnt:belongsToSpecies gn:Mus_musculus .
+_:comment skos:notation taxon:511145 .
+_:comment dct:hasVersion '1'^^xsd:int .
+_:comment rdfs:seeAlso _:metadata1 .
+_:comment rdfs:seeAlso _:metadata2 .
+_:metadata1 gnt:hasGeneId generif:944744 .
+_:metadata1 gnt:symbol gn:symbolsspA .
+_:metadata1 dct:references ( pubmed:97295 ... pubmed:15361618 ) .
+_:metadata2 gnt:hasGeneId generif:944780 .
+_:metadata2 gnt:symbol gn:symbolaraC .
+_:metadata2 dct:references ( pubmed:320034 ... pubmed:16369539 ) .
+```
+
+Beyond that, we intentionally use a sequence to store a list of pubmed references.
+
+## Consequences
+
+* De-duplication of comments during the transform while retaining the integrity of the RIF metadata.
+* Because of the terseness, less work during the I/O heavy operation.
+* Update SPARQL in tux02, tux01 in lockstep with updating GN3/GN2 and the XAPIAN index.
+
+## Rejection Rationale
+
+This proposal was rejected because relying on blank-nodes as an identifier is opaque and not human-readable. We want to use human readable identifiers where possible.
diff --git a/topics/ADR/gn-transform-databases/002-remodel-ncbi-transform-to-be-more-compact.gmi b/topics/ADR/gn-transform-databases/002-remodel-ncbi-transform-to-be-more-compact.gmi
new file mode 100644
index 0000000..ac06fc1
--- /dev/null
+++ b/topics/ADR/gn-transform-databases/002-remodel-ncbi-transform-to-be-more-compact.gmi
@@ -0,0 +1,127 @@
+# [gn-transform-databases/ADR-002] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata To Be More Compact
+
+* author: bonfacem
+* status: proposal
+* reviewed-by: pjotr, jnduli
+
+## Context
+
+Currently, we represent NCBI RIFs as blank nodes that form the object of a given symbol:
+
+```
+gn:symbolsspA rdfs:comment [
+    rdf:type gnc:NCBIWikiEntry ;
+    rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ;
+    gnt:belongsToSpecies gn:Mus_musculus ;
+    skos:notation taxon:511145 ;
+    gnt:hasGeneId generif:944744 ;
+    dct:hasVersion '1'^^xsd:int ;
+    dct:references pubmed:97295 ;
+    ...
+    dct:references pubmed:15361618 ;
+    dct:created "2007-11-06T00:38:00"^^xsd:datetime ;
+] .
+gn:symbolaraC rdfs:comment [ + rdf:type gnc:NCBIWikiEntry ; + rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ; + gnt:belongsToSpecies gn:Mus_musculus ; + skos:notation taxon:511145 ; + gnt:hasGeneId generif:944780 ; + dct:hasVersion '1'^^xsd:int ; + dct:references pubmed:320034 ; + ... + dct:references pubmed:16369539 ; + dct:created "2007-11-06T00:39:00"^^xsd:datetime ; +] . +``` + +Moreover, we also store all the different versions of a comment: + +``` +mysql> SELECT * FROM GeneRIF_BASIC WHERE SpeciesId=1 AND TaxID=7955 AND GeneId=323473 AND PubMed_ID = 15680355\G +*************************** 1. row *************************** + SpeciesId: 1 + TaxID: 7955 + GeneId: 323473 + symbol: prdm1a + PubMed_ID: 15680355 +createtime: 2010-01-21 00:00:00 + comment: One of two mutations in which defects are observed in both cell populations: it leads to a complete absence of RB neurons and a reduction in neural crest cells + VersionId: 1 +*************************** 2. row *************************** + SpeciesId: 1 + TaxID: 7955 + GeneId: 323473 + symbol: prdm1a + PubMed_ID: 15680355 +createtime: 2010-01-21 00:00:00 + comment: prdm1 functions to promote the cell fate specification of both neural crest cells and sensory neurons + VersionId: 2 +``` + +## Decision + +First, we should only store the latest version of a given RIF entry and ignore all other versions. RIF entries in the GeneRIF_BASIC table are uniquely identified by the columns: SpeciesId, GeneId, PubMed_ID, createtime, and VersionId. Since we are storing the latest version of a given RIF entry, we drop the version identifier during the RDF transform. + +We use a unique identifier for a given comment, and use that as a triple's QName: + +> gn:rif-<speciesId>-<GeneId> + +Finally instead of: + +``` +<symbol> predicate <comment metadata> +``` + +We use: + +``` +<comment-uid> predicate object ; + ... (more metadata) . 
+```
+
+An example triple would take the form:
+
+```
+gn:rif-1-511145 rdfs:label '''N-terminus verified by Edman degradation on mature peptide'''@en .
+gn:rif-1-511145 rdf:type gnc:NCBIWikiEntry .
+gn:rif-1-511145 gnt:belongsToSpecies gn:Mus_musculus .
+gn:rif-1-511145 skos:notation taxon:511145 .
+gn:rif-1-511145 rdfs:seeAlso [
+    gnt:hasGeneId generif:944744 ;
+    gnt:symbol "sspA" ;
+    dct:references ( pubmed:97295 ... pubmed:15361618 ) ;
+] .
+gn:rif-1-511145 rdfs:seeAlso [
+    gnt:hasGeneId generif:944780 ;
+    gnt:symbol "araC" ;
+    dct:references ( pubmed:320034 ... pubmed:16369539 ) ;
+] .
+```
+
+To efficiently store GeneIds, symbols and references, we use blank nodes. This reduces redundancy and simplifies the triples compared to including these details within the subject:
+
+```
+gn:rif-1-511145-944744 rdfs:label '''N-terminus verified by Edman degradation on mature peptide'''@en .
+gn:rif-1-511145-944744 rdf:type gnc:NCBIWikiEntry .
+gn:rif-1-511145-944744 gnt:belongsToSpecies gn:Mus_musculus .
+gn:rif-1-511145-944744 skos:notation taxon:511145 .
+gn:rif-1-511145-944744 gnt:hasGeneId generif:944744 .
+gn:rif-1-511145-944744 gnt:symbol "sspA" .
+gn:rif-1-511145-944744 dct:references ( pubmed:97295 ... pubmed:15361618 ) .
+
+gn:rif-1-511145-944780 rdfs:label '''N-terminus verified by Edman degradation on mature peptide'''@en .
+gn:rif-1-511145-944780 rdf:type gnc:NCBIWikiEntry .
+gn:rif-1-511145-944780 gnt:belongsToSpecies gn:Mus_musculus .
+gn:rif-1-511145-944780 skos:notation taxon:511145 .
+gn:rif-1-511145-944780 gnt:hasGeneId generif:944780 .
+gn:rif-1-511145-944780 gnt:symbol "araC" .
+gn:rif-1-511145-944780 dct:references ( pubmed:320034 ... pubmed:16369539 ) .
+```
+
+## Consequences
+
+* More complex SQL query required for the transform.
+* De-duplication of RIF entries during the transform.
+* Because of the terseness, less work during the I/O heavy operation.
+* Update SPARQL in tux02, tux01 in lockstep with updating GN3/GN2 and the XAPIAN index.
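The ADR-002 Decision (keep only the newest version of each RIF entry, then name it with the gn:rif-<speciesId>-<GeneId> pattern) could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the transform's actual code: the function names latest_versions and rif_qname are hypothetical, and rows are assumed to arrive as dictionaries keyed by the GeneRIF_BASIC column names.

```python
def latest_versions(rows):
    """Keep only the row with the highest VersionId for each RIF entry.

    Entries are grouped by the identifying columns named in the ADR,
    minus VersionId itself (assumption: rows are plain dicts).
    """
    latest = {}
    for row in rows:
        key = (row["SpeciesId"], row["GeneId"],
               row["PubMed_ID"], row["createtime"])
        if key not in latest or row["VersionId"] > latest[key]["VersionId"]:
            latest[key] = row
    return list(latest.values())


def rif_qname(row):
    """Build the subject QName following gn:rif-<speciesId>-<GeneId>."""
    return f"gn:rif-{row['SpeciesId']}-{row['GeneId']}"


# The two versions shown in the MySQL output of the ADR.
ROWS = [
    {"SpeciesId": 1, "GeneId": 323473, "PubMed_ID": 15680355,
     "createtime": "2010-01-21 00:00:00", "VersionId": 1,
     "comment": "One of two mutations in which defects are observed "
                "in both cell populations"},
    {"SpeciesId": 1, "GeneId": 323473, "PubMed_ID": 15680355,
     "createtime": "2010-01-21 00:00:00", "VersionId": 2,
     "comment": "prdm1 functions to promote the cell fate specification "
                "of both neural crest cells and sensory neurons"},
]

KEPT = latest_versions(ROWS)
```

With the sample rows, only the VersionId 2 row survives. In practice the same filtering would more likely be done in the SQL query itself, which is the "more complex SQL query" the Consequences mention.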
diff --git a/topics/ADR/gn3/000-add-test-cases-for-rdf.gmi b/topics/ADR/gn3/000-add-test-cases-for-rdf.gmi new file mode 100644 index 0000000..43ac2ba --- /dev/null +++ b/topics/ADR/gn3/000-add-test-cases-for-rdf.gmi @@ -0,0 +1,21 @@ +# [gn3/ADR-000] Add RDF Test Cases + +* author: bonfacem +* status: proposed +* reviewed-by: jnduli + +## Context + +We have no way of ensuring the integrity of our SPARQL queries in GN3. As such, GN3 is fragile to breaking changes when the TTL files are updated. + +## Decision + +In Virtuoso, we load all our data to a default named graph: <http://genenetwork.org>. For SPARQL/RDF tests, we should upload test ttl files to a test named graph: <http://cd-test.genenetwork.org>, and run our RDF unit tests against that named graph. + +## Consequences + +* Extra bootstrapping to load ttl files when running the test. +* Extra documentation to GN developers on how to run virtuoso locally to get the tests running. +* Testing against gn-machines to make sure that all things run accordingly. +* Extra maintenance costs to keep the TTL files in lockstep with the latest RDF changes during re-modeling. +* Improvement in GN3 reliability. 
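As a rough sketch of the gn3/ADR-000 Decision, a test helper could scope every query to the test named graph instead of the default one. The helper name scoped_select and its parameters are hypothetical and only illustrate the idea; how GN3 actually builds and sends its SPARQL queries is not shown in this ADR.

```python
# Named graphs taken from the ADR text.
DEFAULT_GRAPH = "http://genenetwork.org"
TEST_GRAPH = "http://cd-test.genenetwork.org"


def scoped_select(pattern: str, graph: str = TEST_GRAPH, limit: int = 10) -> str:
    """Build a SELECT query that only reads triples from `graph`.

    `pattern` is a SPARQL graph pattern such as "?s ?p ?o".
    """
    return f"SELECT * FROM <{graph}> WHERE {{ {pattern} }} LIMIT {limit}"


# A query a unit test could send to the local virtuoso instance.
QUERY = scoped_select("?s ?p ?o", limit=5)
```

Keeping the graph IRI in one place means the same test code can be pointed at another graph by changing a single argument.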
diff --git a/topics/ADR/gn3/001-remove-stace-traces-in-gn3-error-response.gmi b/topics/ADR/gn3/001-remove-stace-traces-in-gn3-error-response.gmi
new file mode 100644
index 0000000..0910415
--- /dev/null
+++ b/topics/ADR/gn3/001-remove-stace-traces-in-gn3-error-response.gmi
@@ -0,0 +1,49 @@
+# [gn3/ADR-001] Remove Stack Traces in GN3
+
+* author: bonfacem
+* status: rejected
+* reviewed-by: jnduli, zach, pjotr, fredm
+
+## Context
+
+Currently, GN3 error responses include stack traces:
+
+```
+def add_trace(exc: Exception, jsonmsg: dict) -> dict:
+    """Add the traceback to the error handling object."""
+    return {
+        **jsonmsg,
+        "error-trace": "".join(traceback.format_exception(exc))
+    }
+
+
+def page_not_found(pnf):
+    """Generic 404 handler."""
+    current_app.logger.error("Handling 404 errors", exc_info=True)
+    return jsonify(add_trace(pnf, {
+        "error": pnf.name,
+        "error_description": pnf.description
+    })), 404
+
+
+def internal_server_error(pnf):
+    """Generic 500 handler."""
+    current_app.logger.error("Handling internal server errors", exc_info=True)
+    return jsonify(add_trace(pnf, {
+        "error": pnf.name,
+        "error_description": pnf.description
+    })), 500
+```
+
+
+## Decision
+
+Stack traces can help malicious actors compromise our system by giving them more context. As such, we should send a useful description of what went wrong together with an appropriate error status code, and keep the stack traces in our logs. We can use the logs to troubleshoot our system.
+
+## Consequences
+
+* Lockstep update in GN2 UI on how we handle GN3 errors.
+
+## Rejection Rationale
+
+The proposal to remove stack traces from error responses was rejected because they are essential for troubleshooting, especially when issues are difficult to reproduce or production logs are inaccessible.
Stack traces provide immediate error context, and removing them would complicate debugging by requiring additional effort to link logs with specific requests; a trade-off we are not willing to make at the moment. diff --git a/topics/ADR/gn3/002-run-rdf-tests-in-build-container.gmi b/topics/ADR/gn3/002-run-rdf-tests-in-build-container.gmi new file mode 100644 index 0000000..a8026ce --- /dev/null +++ b/topics/ADR/gn3/002-run-rdf-tests-in-build-container.gmi @@ -0,0 +1,32 @@ +# [gn3/ADR-002] Move RDF Test Cases to Build Container + +* author: bonfacem +* status: accepted +* reviewed-by: jnduli + +## Context + +GN3 RDF tests are run against the CD's virtuoso instance. As such, we need to set special parameters when running tests: + +``` +SPARQL_USER = "dba" +SPARQL_PASSWORD = "dba" +SPARQL_AUTH_URI="http://localhost:8890/sparql-auth/" +SPARQL_CRUD_AUTH_URI="http://localhost:8890/sparql-graph-crud-auth" +FAHAMU_AUTH_TOKEN="XXXXXX" +``` + +This extra bootstrapping when running tests needs care, and locks tests to CD or special configuration when running locally. This leads to fragile tests that cause CD to break. Moreover, to add tests to CD, we would have to add extra g-exp to gn-machines. + +This ADR is related to: + +=> /topics/ADR/gn3/000-add-test-cases-for-rdf.gmi gn3/ADR-000. + +## Decision + +Move tests to the test build phase of building the genenetwork3 package. These tests are added in the ".guix/genenetwork3-all-tests.scm" file instead of the main "genenetwork3" package definition in guix-bioinformatics. This way, we have all our "light" tests I.e. unit tests running in guix-bioinformatics, while having all our heavier tests, in this case, RDF tests, running in CD. + +## Consequences + +* Extra bootstrapping to gn3's .guix/genenetwork3-package.scm to get tests working. +* GN3 RDF tests refactoring to use a virtuoso instance running in the background while tests are running. 
diff --git a/topics/ai/aider.gmi b/topics/ai/aider.gmi new file mode 100644 index 0000000..aa88e71 --- /dev/null +++ b/topics/ai/aider.gmi @@ -0,0 +1,16 @@ +# Aider + +=> https://aider.chat/ + +``` +python3 -m venv ~/opt/python-aider +~/opt/python-aider/bin/python3 -m pip install aider-install +~/opt/python-aider/bin/aider-install +``` + +Installed 1 executable: aider +Executable directory /home/wrk/.local/bin is already in PATH + +``` +aider --model gpt-4o --openai-api-key aa... +``` diff --git a/topics/ai/ontogpt.gmi b/topics/ai/ontogpt.gmi new file mode 100644 index 0000000..94bd165 --- /dev/null +++ b/topics/ai/ontogpt.gmi @@ -0,0 +1,7 @@ +# OntoGPT + +python3 -m venv ~/opt/ontogpt +~/opt/ontogpt/bin/python3 -m pip install ontogpt + + +runoak set-apikey -e openai diff --git a/topics/authentication/architecture.gmi b/topics/authentication/architecture.gmi index 931f9cb..2200745 100644 --- a/topics/authentication/architecture.gmi +++ b/topics/authentication/architecture.gmi @@ -54,13 +54,14 @@ Users are granted privileges (see "Privileges" section) to act upon resources, t Examples of "types" of resources on the system: -- system: The system itself -- group: Collection of users considered a group -- genotype: A resource representing a genotype trait -- phenotype: A resource representing a phenotype trait -- mrna: A resource representing a collection of mRNA Assay traits -- inbredset-group: A resource representing an InbredSet group - +* system: The system itself +* group: Collection of users considered a group +* genotype: A resource representing a genotype trait +* phenotype: A resource representing a phenotype trait +* mrna: A resource representing a collection of mRNA Assay traits +* inbredset-group: A resource representing an InbredSet group + +---- * TODO: Figure out a better name/description for "InbredSet group" -- so far, I have "a classification/grouping of traits/datasets within a species". Another is to use the term "population". 
## Users diff --git a/topics/authentication/development-guide.gmi b/topics/authentication/development-guide.gmi new file mode 100644 index 0000000..840c26b --- /dev/null +++ b/topics/authentication/development-guide.gmi @@ -0,0 +1,60 @@ +# GN-AUTH FAQ + +## Tags + +* type: docs, documentation +* status: ongoing, open +* keywords: authentication, authorisation, docs, documentation +* author: @jnduli + +## Quick configuration for local development + +Save a `local_settings.conf` file that has the contents: + +``` +SQL_URI = "mysql://user:password@localhost/db_name" # mysql uri +AUTH_DB = "/absolute/path/to/auth.db/" # path to sqlite db file +GN_AUTH_SECRETS = "/absolute/path/to/secrets/secrets.conf" +``` + +The `GN_AUTH_SECRETS` path has two functions: + +* It contains the `SECRET_KEY` we use in our application +* The folder containing this file is used to store our jwks. + +An example is: + +``` +SECRET_KEY = "qQIrgiK29kXZU6v8D09y4uw_sk8I4cqgNZniYUrRoUk" +``` + +## Quick set up cli commands + +``` +export FLASK_DEBUG=1 AUTHLIB_INSECURE_TRANSPORT=1 OAUTHLIB_INSECURE_TRANSPORT=1 FLASK_APP=gn_auth/wsgi +export GN_AUTH_CONF=/absolute/path/to/local_settings.conf +flask init-dev-clients --client-uri "http://localhost:port" +flask init-dev-users +flask assign-system-admin 0ad1917c-57da-46dc-b79e-c81c91e5b928 +``` + +## Handling verification for users in local development + +* Run flask init_dev_users, which will create a verified local user. +* Run `UPDATE users set verified=1` on the sqlite3 auth database. + +## Errors related to unsupported clients/redirect URIs for client + +Rerun + +``` +FLASK_DEBUG=1 AUTHLIB_INSECURE_TRANSPORT=1 OAUTHLIB_INSECURE_TRANSPORT=1 \ + GN_AUTH_CONF=/absolute/path/to/local_settings.conf FLASK_APP=gn_auth/wsgi \ + flask init-dev-clients --client-uri "http://localhost:port_you_use_for_gn2" +``` + +This will update your clients list to have all the related urls we want. 
+
+## 500 Server Error: INTERNAL SERVER ERROR
+
+When you see the error: `500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:8081/auth/token`, restart the gn2 server.
diff --git a/topics/authentication/permission_hooks.gmi b/topics/authentication/permission_hooks.gmi
new file mode 100644
index 0000000..dd475b6
--- /dev/null
+++ b/topics/authentication/permission_hooks.gmi
@@ -0,0 +1,62 @@
+# Permission Hooks System Design
+## Status: Draft
+
+## Objective
+
+We want to achieve:
+
+- Default permissions for users that come from `.edu` domains.
+- Support for visitors to the website.
+
+This should be dynamic and easily maintainable.
+
+## Design
+
+### Events
+
+* Use middleware to plug into the various aspects of a request's life cycle. We'll plug into `after_request` for providing default permissions.
+* Create a hook which contains: the event to handle, what part of the life cycle the hook plugs into, and the actual functions to call.
+* Events can be identified using their `request.base_url` parameter.
+* Each hook registers itself to the global set of hooks (TODO: Figure out how to automatically handle the registration).
+
+
+```
+from flask import Flask, request
+
+app = Flask(__name__)
+
+
+class RegistrationHook:
+    # Part of the request life cycle this hook plugs into.
+    lifecycle = "after_request"
+
+    def can_handle(self):
+        # Assumption: the registration URL ends in "register".
+        return request.base_url.endswith("register")
+
+    def run(self):
+        ...
+
+
+# Global set of hooks (registration is still manual, see TODO above).
+hooks = [RegistrationHook()]
+
+
+@app.after_request
+def handle_hooks(response):
+    for hook in hooks:
+        if hook.lifecycle == "after_request" and hook.can_handle():
+            hook.run()
+    # Flask expects after_request handlers to return the response.
+    return response
+```
+
+### Privilege Hooks
+
+* After login/registration, use the email to get extra privileges assigned to a user. We use `login` too to ensure that all users have the most up-to-date roles and privileges.
+* This means that any user gets assigned these privileges and normal workflows can happen.
+
+### Storage
+
+* Create a new role that contains the default privileges we want to assign to users depending on their domain.
+* This role will link up with the privileges to be assigned to said user.
+* Example privileges we may want to add to users in the `.edu` domain:
+  * group:resource:edit-resource
+  * system:inbredset:apply-case-attribute-edit
+  * system:inbredset:edit-case-attribute
+  * system:inbredset:view-case-attribute
+* Create an extra table that provides a link between some `email identifier` and the role we'd like to pre-assign. We can use a Python regex for the email identifier, e.g. `.*\.edu$` or `.*\.uthsc\.edu$`.
+* This will be the table used by the Registration Hook.
+* This also allows us to edit roles/privileges without code releases.
diff --git a/topics/biohackathon/biohackrxiv2024.gmi b/topics/biohackathon/biohackrxiv2024.gmi
new file mode 100644
index 0000000..a159ec4
--- /dev/null
+++ b/topics/biohackathon/biohackrxiv2024.gmi
@@ -0,0 +1,7 @@
+# BioHackRxiv
+
+We have a hacking week in Barcelona to work on BioHackRXiv.
+
+# Tasks
+
+* [ ] ORCIDs for authors in PDF
diff --git a/topics/R-qtl2-format-notes.gmi b/topics/data/R-qtl2-format-notes.gmi
index e0109b1..3397b5e 100644
--- a/topics/R-qtl2-format-notes.gmi
+++ b/topics/data/R-qtl2-format-notes.gmi
@@ -1,4 +1,4 @@
-# R/qtl2 Format Notes
+# R/qtl2 and GEMMA Format Notes
 
 This document is mostly to help other non-biologists figure out their way around the format(s) of the R/qtl2 files. It mostly deals with the meaning/significance of the various fields.
 
@@ -12,6 +12,39 @@ and
 
 We are going to consider the "non-transposed" form here, for ease of documentation: simply flip the meanings as appropriate for the transposed files.
 
+To convert between formats we should probably use Python, as that is what we can use as an 'esperanto'.
+
+## Control files
+
+Both GN and R/qtl2 have control files.
For GN it basically describes the individuals (genometypes) and looks like: + +```js +{ + "mat": "C57BL/6J", + "pat": "DBA/2J", + "f1s": ["B6D2F1", "D2B6F1"], + "genofile" : [{ + "title" : "WGS-based (Mar2022)", + "location" : "BXD.8.geno", + "sample_list" : ["BXD1", "BXD2", "BXD5", "BXD6", "BXD8", "BXD9", "BXD11", "BXD12", "BXD13", "BXD14", "BXD15", "BXD16", "BXD18", "BXD19", "BXD20", "BXD21", "BXD22", "BXD23", "BXD24", "BXD24a", "BXD25", "BXD27", "BXD28", "BXD29", "BXD30", "BXD31", "BXD32", "BXD33", "BXD34", "BXD35", "BXD36", "BXD37", "BXD38", "BXD39", "BXD40", "BXD41", "BXD42", "BXD43", "BXD44", + ...]}]} +``` + +In gn-guile this gets parsed in gn/data/genotype.scm to fetch the individuals that match the genotype and phenotype layouts. + +## pheno files and phenotypes + +The standard GEMMA input files are not very good for trouble shooting. R/qtl2 has at least the individual or genometype ID for every line: + +``` +id,bolting_days,seed_weight,seed_area,ttl_seedspfruit,branches,height,pc_seeds_aborted,fruit_length +MAGIC.1,15.33,17.15,0.64,45.11,10.5,NA,0,14.95 +MAGIC.2,22,22.71,0.75,49.11,4.33,42.33,1.09,13.27 +MAGIC.3,23,21.03,0.68,57,4.67,50,0,13.9 +``` + +This is a good standard and can match with the control files. + ## geno files > The genotype data file is a matrix of individuals × markers. The first column is the individual IDs; the first row is the marker names. @@ -22,10 +55,6 @@ For GeneNetwork, this means that the first column contains the Sample names (pre The first column of the gmap/pmap file contains genetic marker values. There are no Individuals/samples (or strains) here. -## pheno files - -The first column is the list of individuals (samples/strains) whereas the first column is the list of phenotypes. - ## phenocovar files These seem to contain extra metadata for the phenotypes. 
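Since the notes suggest Python for converting between formats, reading the R/qtl2 pheno layout shown above (an `id` column followed by one column per phenotype, with `NA` for missing values) could be sketched as below. The function name parse_pheno is hypothetical and the sample is trimmed to four columns; this is a minimal sketch using only the standard library, not an existing GN tool.

```python
import csv
import io

# Trimmed sample in the R/qtl2 pheno layout (values taken from the notes).
SAMPLE = """id,bolting_days,seed_weight,height
MAGIC.1,15.33,17.15,NA
MAGIC.2,22,22.71,42.33
MAGIC.3,23,21.03,50
"""


def parse_pheno(text):
    """Return (phenotype names, {individual id: {phenotype: value or None}})."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    phenotypes = header[1:]  # first column holds the individual ids
    values = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values[row[0]] = {
            name: (None if cell == "NA" else float(cell))
            for name, cell in zip(phenotypes, row[1:])
        }
    return phenotypes, values


PHENOTYPES, VALUES = parse_pheno(SAMPLE)
```

Keying the values by individual ID is what makes it possible to match them against the sample list of a control file.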
diff --git a/topics/data/precompute/steps.gmi b/topics/data/precompute/steps.gmi index 75e3bfd..d22778a 100644 --- a/topics/data/precompute/steps.gmi +++ b/topics/data/precompute/steps.gmi @@ -13,8 +13,18 @@ We will track precompute steps here. We will have: Trait archives will have steps for * [X] step p1: list-traits-to-compute -* [+] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper -* [ ] step p3: gemma-to-lmdb: create a clean vector +* [X] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper +* [X] step p3: gemma-to-lmdb: create a clean vector + +Start precompute + +* [ ] Fetch traits on tux04 +* [ ] Set up runner on tux04 and others +* [ ] Run on Octopus + +Work on published data + +* [ ] Fetch traits The DB itself can be updated from these @@ -22,8 +32,11 @@ The DB itself can be updated from these Later +* [ ] Rqtl2: Compute Rqtl2 vector * [ ] bulklmm: Compute bulklmm vector +Interestingly this work coincides with Arun's work on CWL. Rather than trying to write a workflow in bash, we'll use ccwl and accompanying tools to scale up the effort. + # Tags * assigned: pjotrp @@ -36,10 +49,10 @@ Later * [ ] Check Artyoms LMDB version for kinship and maybe add LOCO * [+] Create JSON metadata controller for every compute incl. 
type of content
-* [+] Create genotype archive
-* [+] Create kinship archive
+* [X] Create genotype archive
+* [X] Create kinship archive
 * [+] Create trait archives
-* [+] Kick off lmm9 step
+* [X] Kick off lmm9 step
 * [ ] Update DB step v1
 
 # Step p1: list traits to compute
@@ -62,7 +75,7 @@ At this point we can write
 {"2":9.40338,"3":10.196,"4":10.1093,"5":9.42362,"6":9.8285,"7":10.0808,"8":9.17844,"9":10.1527,"10":10.1167,"11":9.88551,"13":9.58127,"15":9.82312,"17":9.88005,"19":10.0761,"20":10.2739,"21":9.54171,"22":10.1056,"23":10.5702,"25":10.1433,"26":9.68685,"28":9.98464,"29":10.132,"30":9.96049,"31":10.2055,"35":10.1406,"36":9.94794,"37":9.96864,"39":9.31048}
 ```
 
-Note that it (potentially) includes the parents. Also the strain-id is a string and we may want to plug in the strain name. To allow for easy comparison downstream. Finally we may want to store a checksum of sorts. In Guile this can be achieved with:
+Note that it (potentially) includes the parents and that is corrected when generating the phenotype file for GEMMA. Also the strain-id is a string and we may want to plug in the strain name. To allow for easy comparison downstream. Finally we may want to store a checksum of sorts. In Guile this can be achieved with:
 
 ```scheme
 (use-modules (rnrs bytevectors)
diff --git a/topics/database/mariadb-database-architecture.gmi b/topics/database/mariadb-database-architecture.gmi
new file mode 100644
index 0000000..0454d71
--- /dev/null
+++ b/topics/database/mariadb-database-architecture.gmi
@@ -0,0 +1,830 @@
+# MariaDB Database Architecture
+
+The GeneNetwork database is running on MariaDB and the layout is almost carved in stone because so much code depends on it.
+We are increasingly moving material out into lmdb (genotypes and phenotypes) and virtuoso (all types of metadata), but this proves a lengthy and rather tedious process. We also run redis for caching, sqlite for authentication, and xapian for search!
+ +In this document we'll discuss where things are, where they ought to go, and how the nomenclature should change. + +An SVG of the SQL layout can be found here + +=> https://raw.githubusercontent.com/genenetwork/gn-gemtext-threads/main/topics/database/sql.svg + +# Nomenclature + +These are the terms we use + +* Genotypes +* Case or genometype: individual, strain, sample +* ProbeData: Now almost obsolete. [Comment by RWW perhaps for a footnote: In GeneNetwork 1 we had built and maintained a table for individual "Probe level" data simply because the Affymetrix data sets were so large. For example, the BXD Family: "UMUTAffy Hippocampus Exon (Feb09) RMA" array data consists of 1.236 million "probesets" each of which is a summary of many individual probe assays (ProbeData), a total of 4.5 million probes (see https://www.thermofisher.com/order/catalog/product/900817). In GN1 we built a special interface to interrogate these 4.5 million individual probes--extremely useful for studying the fine-structure of mRNA expression. We thought it best to split these very large "probe-level" data sets from the much smaller and more widely used "ProbeSetData". The term "Probe" in this particular context (Affymetrix Exon arrays) refers to short nucleotide probes used by Affymetrix and other microarray vendors. Affymetrix "Exon"-type arrays consist of 25 nt hybridization probes that target relatively specific parts of RNAs--mainly exons but also many intronic sequences.] +* ProbeSetData: trait/sample values almost exclusively used for molecular data types (mRNA, protein, methylation assays, metabolomics, etc). [Comment by RWW perhaps for a footnote: The term "ProbeSetData" should ideally be changed to "High_Content_Data_Assays". In 2003 the only high content data assays we had were Affymetrix microarrays that measured mRNA level, and the vendor called their assays "ProbeSets". We used this now obsolete term.
Most ProbeSetData in GN1 and GN2 as of 2024 are measurements of molecular traits that can be tagged to a single genome location: the location of the gene from which the mRNA and its derivative protein are transcribed and translated, or in the case of epigenomic studies, the site at which the genome is methylated. When these three types of molecular traits are mapped, we typically add a mark to all graphic output maps that highlights the location of the "parent" gene. For example, the sonic hedgehog gene in mice is located on chromosome 5 at about 28.457 Mb on the mm10 assembly (aka GRCm38). When we measure the expression of Shh mRNA, we place a purple triangle at the coordinate of the Shh gene. Two notes: 1. There are at least three ProbeSetData types that do NOT have parent genes--metabolomic data, metagenomic data, and new high-content brain connectome data. When we do NOT know the location of a parent gene, we should NOT place any mark along the X-axis. 2. Ideally GN databases would define the TYPE of high-content data, so that the code could fork to the correct GUI for that particular data type. Connectome data for the brain is an example of a data type that is very large (40,000 measurements per brain), that is truly high-content data, but that is NOT molecular. Time series data may also fall into this category.] +* ProbeSetFreeze: points to datasets + +## More on naming + +Naming convention-wise there is a confusing use of id and data-id in particular. We should stick to the table-id naming. + +# The small test database (2GB) + +The default install comes with a smaller database which includes a +number of the BXDs and the Human liver dataset (GSE9588). + +It can be downloaded from: + +=> https://files.genenetwork.org/database/ + +Try the latest one first.
+ +# GeneNetwork database + +Estimated table sizes with metadata comment for the important tables + +select table_name,round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` from information_schema.TABLES where table_schema = "db_webqtl" order by data_length; + +``` ++-------------------------+------------+ +| table_name | Size in MB | Should be named: ++-------------------------+------------+ +| PublishData | 22.54 | ClassicTraitValues <- data-id, strain-id, value (3M traits) +| PublishSE | 4.71 | ClassicTraitValueError (300K traits) <- data-id, strain-id, value +| PublishXRef | 2.18 | List of publications <- id, data-id, inbred-id, pheno-id, pub-id +| ProbeSetData | 59358.80 | BulkTraitValues <- id, strain, value +| ProbeSetSE | 14551.02 | BulkTraitValueError <- SE values aligns with ProbeSetData +| ProbeSetXRef | 4532.89 | PrecomputedLRS <- precomputed LRS values, pointing to dataset+trait +| ProbeSet | 2880.21 | ProbeSetInfo <- over utilized mRNA probeset description, e.g. 100001_at comes with sequence info +| ProbeSetFreeze | 0.22 | DatasetInfo <- dataset description, e.g. "Hippocampus_BXD_Jun06" - probesetfreezeid points to dataset, shortname, public? +| Probe | 2150.30 | ProbeInfo <- Probe trait info incl sequence, id, probeset-id +| ProbeFreeze | 0.06 | Dataset names <- Similar to ProbesetFreeze, id, chip-id, inbredset-id, tissue-id +| Phenotype | 6.50 | PhenotypeMeta <- "Hippocampus weight", id, prepublish short-name, postpublish short-name +| ProbeXRef | 743.38 | ProbeFreezeDataIDs <- link ProbeFreeze-Id,Probe-Id with Data-Id +| Datasets | 2.31 | DatasetMeta <- "Data generated by...", investigator-id, publication title +| NStrain | 4.80 | StrainCountDataId <- Strains used in dataset, count, strain-id, data-id +| Strain | 1.07 | StrainNames <- with species ID and alias, id, species-id, name +| TissueProbeSetData | 74.42 | <- link Id,TissueID with value +| TissueProbeSetXRef | 14.73 | TissueGeneTable? 
<- data-id, gene-id, mean, symbol, TissueProbeSetFreezeId | ProbesetId | DataId +| TissueProbeSetFreeze | 0.01 | tissueprobefreeze-id +| InbredSet | 0.01 | InbredSetMeta -> Id,SpeciesId,FullName +| ProbeData | 22405.44 | (OLD?) mRNAStrainValues used for partial correlations <- id, strain, value = individual probe data (mRNA) [GN1,GN3] +| ProbeSE | 6263.83 | (OLD?) Trait Error <- trait SE aligns with ProbeData? [GN3] ++-------------------------+------------+ +``` +Less commonly used tables: + +``` ++-------------------------+------------+ +| table_name | Size in MB | ++-------------------------+------------+ +| LCorrRamin3 | 18506.53 | +| SnpAll | 15484.67 | +| SnpPattern | 9177.05 | +| QuickSearch | 5972.86 | +| GenoData | 3291.91 | Strain by genotype - only used in GN1 +| CeleraINFO_mm6 | 989.80 | +| pubmedsearch | 1032.50 | +| GeneRIF_BASIC | 448.54 | +| BXDSnpPosition | 224.44 | +| EnsemblProbe | 133.66 | +| EnsemblProbeLocation | 105.49 | +| Genbank | 37.71 | +| AccessLog | 42.38 | +| GeneList | 34.11 | +| Geno | 33.90 | Marker probe info (incl. 
sequence) +| MachineAccessLog | 28.34 | +| IndelAll | 22.42 | +| ProbeH2 | 13.26 | +| GenoXRef | 22.83 | +| TempData | 8.35 | +| GeneList_rn3 | 5.54 | +| GORef | 4.97 | +| temporary | 3.59 | +| InfoFiles | 3.32 | +| Publication | 3.42 | +| Homologene | 5.69 | +| GeneList_rn33 | 2.61 | +| GeneRIF | 2.18 | +| Vlookup | 1.87 | +| H2 | 2.18 | +| IndelXRef | 2.91 | +| GeneMap_cuiyan | 0.51 | +| user_collection | 0.30 | +| CaseAttributeXRef | 0.44 | +| StrainXRef | 0.56 | +| GeneIDXRef | 0.77 | +| Docs | 0.17 | +| News | 0.17 | +| GeneRIFXRef | 0.24 | +| Sample | 0.06 | +| login | 0.06 | +| user | 0.04 | +| TableFieldAnnotation | 0.05 | +| DatasetMapInvestigator | 0.05 | +| User | 0.04 | +| TableComments | 0.02 | +| Investigators | 0.02 | +| DBList | 0.03 | +| Tissue | 0.02 | +| GeneChip | 0.01 | +| GeneCategory | 0.01 | +| SampleXRef | 0.01 | +| SnpAllele_to_be_deleted | 0.00 | +| Organizations | 0.01 | +| PublishFreeze | 0.00 | +| GenoFreeze | 0.00 | Used for public/private +| Chr_Length | 0.01 | +| SnpSource | 0.00 | +| AvgMethod | 0.00 | +| Species | 0.00 | +| Dataset_mbat | 0.00 | +| TissueProbeFreeze | 0.00 | +| EnsemblChip | 0.00 | +| UserPrivilege | 0.00 | +| CaseAttribute | 0.00 | +| MappingMethod | 0.00 | +| DBType | 0.00 | +| InfoFilesUser_md5 | 0.00 | +| GenoCode | 0.00 | +| DatasetStatus | 0.00 | +| GeneChipEnsemblXRef | 0.00 | +| GenoSE | 0.00 | +| user_openids | 0.00 | +| roles_users | 0.00 | +| role | 0.00 | +| Temp | NULL | ++-------------------------+------------+ +97 rows in set, 1 warning (0.01 sec) +``` + +All *Data tables are large + +## Tables containing trait values + +A trait on GN is defined by a trait-id with a dataset-id. + +=> https://genenetwork.org/show_trait?trait_id=10031&dataset=BXDPublish + +The trait-id can also be a probe name + +=> https://genenetwork.org/show_trait?trait_id=1441566_at&dataset=HC_M2_0606_P + +One of the more problematic aspects of GN is that there are two tables containing trait values (actually there are three!). 
ProbeSetData mostly contains expression data. PublishData contains 'classical' phenotypes. ProbeData is considered defunct. + +So, a set of trait values gets described by the dataset+probe (trait_id) OR by BXDPublish --- which is its own table --- and an identifier, here 10031. + +OK, let's look at the ProbeSetData (expression) traits: + +``` +MariaDB [db_webqtl]> select * from ProbeSetData limit 5; ++----+----------+-------+ +| Id | StrainId | value | ++----+----------+-------+ +| 1 | 1 | 5.742 | +| 1 | 2 | 5.006 | +| 1 | 3 | 6.079 | +| 1 | 4 | 6.414 | +| 1 | 5 | 4.885 | ++----+----------+-------+ +5 rows in set (0.193 sec) +MariaDB [db_webqtl]> select * from ProbeData limit 5; ++--------+----------+---------+ +| Id | StrainId | value | ++--------+----------+---------+ +| 503636 | 42 | 11.6906 | +| 503636 | 43 | 11.4205 | +| 503636 | 44 | 11.2491 | +| 503636 | 45 | 11.2373 | +| 503636 | 46 | 12.0471 | ++--------+----------+---------+ +5 rows in set (0.183 sec) +``` + +ProbeSet describes ProbeSetData. I.e., every probe ID comes with a sequence (microarray) etc. + +As for duplicated data: duplicated or "detached"* data happens sometimes, though that's not related to the PublishData/ProbeSetData distinction (unless this is done deliberately for some reason). I believe that whether data is entered as one or the other primarily comes down to the desire/need to divide it into datasets (or by tissue) within a group (with mRNA expression data just being the most common reason for this). I've encountered a situation before with Arthur where there was data in ProbeSetData that wasn't also in ProbeSetXRef. + +Can you give an example of exactly what you mean? PublishData would be stuff like sex, weight, etc (is this what you mean?) while ProbeSetData is used for mRNA expression data (except for a few situations where it isn't lol).
+ +That being said, *functionally*, I think the only real distinction (aside from what metadata is displayed) is that "ProbeSet" data has extra levels of "granularity" where it's also organized by tissue type and can be split into "datasets" (while "PublishData" traits are only associated with a Group (InbredSet in DB)). That's why some non-mRNA expression data is still classified as "ProbeSet" - I think it's basically just a way to separate it into datasets within a group, often for specific tissues. + +So the organization is something like this: + +``` +Group -> PublishData +Group -> Tissue -> Dataset -> ProbeSetData +``` + +## ProbeData + +[OBSOLETE] ProbeData meanwhile is a table with fine-grained probe level Affymetrix data only. It contained about 1 billion rows as of March 2016. This table may be *deleted* later since it is only used by the Probe Table display in GN1. It is not used in GN2. +"ProbeData" should probably be "AssayData" or something more neutral. + +In comparison the "ProbeSetData" table contains more molecular assay data, including probe set data, RNA-seq data, proteomic data, and metabolomic data. It had 2.5 billion rows as of March 2016. +ProbeData contains only Affymetrix probe level data (e.g. Exon array probes and M430 probes). + +"StrainId" should be "CaseId" or "SampleId" or "GenometypeId", see nomenclature above. + +``` +select * from ProbeData limit 2; ++--------+----------+---------+ +| Id | StrainId | value | ++--------+----------+---------+ +| 503636 | 42 | 11.6906 | +| 503636 | 43 | 11.4205 | ++--------+----------+---------+ +2 rows in set (0.00 sec) + +select count(*) from ProbeData limit 2; ++-----------+ +| count(*) | ++-----------+ +| 976753435 | ++-----------+ +1 row in set (0.00 sec) +``` + +## PublishData + +These are the classic phenotypes under BXDPublish.
+ +``` +MariaDB [db_webqtl]> select * from PublishData where StrainId=5 limit 5; ++---------+----------+------------+ +| Id | StrainId | value | ++---------+----------+------------+ +| 8967043 | 5 | 49.000000 | +| 8967044 | 5 | 50.099998 | +| 8967045 | 5 | 403.000000 | +| 8967046 | 5 | 45.500000 | +| 8967047 | 5 | 44.900002 | ++---------+----------+------------+ +5 rows in set (0.265 sec) +MariaDB [db_webqtl]> select * from PublishSE where StrainId=5 limit 5; ++---------+----------+-------+ +| DataId | StrainId | error | ++---------+----------+-------+ +| 8967043 | 5 | 1.25 | +| 8967044 | 5 | 0.71 | +| 8967045 | 5 | 8.6 | +| 8967046 | 5 | 1.23 | +| 8967047 | 5 | 1.42 | ++---------+----------+-------+ +5 rows in set (0.203 sec) +MariaDB [db_webqtl]> select * from PublishXRef limit 2; ++-------+-------------+-------------+---------------+---------+-------------------+----------------+------------------+------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Id | InbredSetId | PhenotypeId | PublicationId | DataId | mean | Locus | LRS | additive | Sequence | comments | ++-------+-------------+-------------+---------------+---------+-------------------+----------------+------------------+------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 10001 | 1 | 4 | 116 | 8967043 | 52.13529418496525 | rs48756159 | 13.4974911471087 | 2.39444435069444 | 1 | robwilliams modified post_publication_description at Mon Jul 30 14:58:10 2012 +robwilliams modified post_publication_description at Sat Jan 30 13:48:49 2016 + | +| 10002 | 1 | 10 | 116 | 8967044 | 52.22058767430923 | rsm10000005699 | 22.004269639323 | 2.08178575714286 | 1 | robwilliams modified phenotype at Thu Oct 28 21:43:28 2010 + | 
++-------+-------------+-------------+---------------+---------+-------------------+----------------+------------------+------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ +2 rows in set (0.328 sec) +``` + +## ProbeSet + + +Comment: PLEASE CHANGE TABLE NAME and rework fields carefully. This is a terrible table but it works well (RWW March 2016). It is used in combination with the crucial TRAIT DATA and ANALYSIS pages in GN1 and GN2. It is also used by annotators using the UPDATE INFO AND DATA web form to correct and update annotation. It is used by Arthur to enter new annotation files and metadata for arrays, genes, proteins, metabolites. The main problem with this table is that it is doing too much work. And it is not doing enough because it is huge, but does not track changes. The plan is to migrate to lmdb for that. + +Initially (2003) this table contained only Affymetrix ProbeSet data for mouse (U74aV2 initially). Many other array platforms for different species were added. At least four other major categories of molecular assays have been added since about 2010: + +1. RNA-seq annotation and sequence data for transcripts using ENSEMBL identifiers or NCBI NM_XXXXX and NR_XXXXX type identifiers + +2. Protein and peptide annotation and sequence data (see BXD Liver Proteome data, SRM and SWATH type data) with identifiers such as "abcb10_q9ji39_t311" for SRM data and "LLGNMIVIVLGHHLGKDFTPAAQAA" for SWATH data where the latter is just the peptide fragment that has been quantified. Data first entered in 2015 for work by Rudi Aebersold and colleagues. + +3. Metabolite annotation and metadata (see BXD Liver Metabolome data) with identifiers that are usually Mass charge ratios such as "149.0970810_MZ" + +4. Epigenomic and methylome data (e.g. 
Human CANDLE Methylation data with identifiers such as "cg24523000") + +It would make good sense to break this table into four or more types of molecular assay metadata or annotation tables (AssayRNA_Anno, AssayProtein_Anno, AssayMetabolite_Anno, AssayEpigenome_Anno, AssayMetagenome_Anno), since these assays will have many differences in annotation content compared to RNAs (RWW). + +Some complex logic is used to update contents of this table when annotators modify and correct the information (for example, updating gene symbols). These features were requested by Rob so that annotating one gene symbol in one species would annotate all gene symbols in the same species based on common NCBI GeneID number. For example, changing the gene alias for one ProbeSet.Id will change the list of aliases in all instances with the same gene symbol. + +If the ProbeSet.BlatSeq (or is this ProbeSet.TargetSeq?) is identical between different ProbeSet.Ids then annotation is forced to be the same even if the symbol or geneID is different. This "feature" was implemented when we found many probe sets with identical sequence but different annotations and identifiers.
+ + +``` +select count(*) from ProbeSet limit 5; ++----------+ +| count(*) | ++----------+ +| 4351030 | ++----------+ +| Id | ChipId | Name | TargetId | Symbol | description | Chr | Mb | alias | GeneId | GenbankId | SNP | BlatSeq |TargetSeq | UniGeneId | Strand_Probe | Strand_Gene | OMIM | comments | Probe_set_target_region | Probe_set_specificity | Probe_set_BLAT_score | Probe_set_Blat_Mb_start | Probe_set_Blat_Mb_end | Probe_set_strand | Probe_set_Note_by_RW | flag | Symbol_H | description_H | chromosome_H | MB_H | alias_H | GeneId_H | chr_num | name_num | Probe_Target_Description | RefSeq_TranscriptId | Chr_mm8 | Mb_mm8 | Probe_set_Blat_Mb_start_mm8 | Probe_set_Blat_Mb_end_mm8 | HomoloGeneID | Biotype_ENS | ProteinID | ProteinName | Flybase_Id | HMDB_ID | Confidence | ChEBI_ID | ChEMBL_ID | CAS_number | PubChem_ID | ChemSpider_ID | UNII_ID | EC_number | KEGG_ID | Molecular_Weight | Nugowiki_ID | Type | Tissue | PrimaryName | SecondaryNames | PeptideSequence | ++------+--------+----------+----------+--------+----------------------------------------------+------+-----------+----------+--------+-----------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------+-------------+--------+----------+-----------------
--------+-----------------------+----------------------+-------------------------+-----------------------+------------------+----------------------+------+----------+---------------+--------------+------+---------+----------+---------+----------+--------------------------+---------------------+---------+-----------+-----------------------------+---------------------------+--------------+-------------+-----------+-------------+------------+---------+------------+----------+-----------+------------+------------+---------------+---------+-----------+---------+------------------+-------------+------+--------+-------------+----------------+-----------------+ +| 7282 | 1 | 93288_at | NULL | Arpc2 | actin related protein 2/3 complex, subunit 2 | 1 | 74.310961 | AK008777 | 76709 | AI835883 | 0 | CCGACTTCCTTAAGGTGCTCAACCGGACTGCTTGCTACTGGATAATCGTGAGGGATTCTCCATTTGGGTTCCATTTTGTACGAGTTTGGCAAATAACCTGCAGAAACGAGCTGTGCTTGCAAGGACTTGATAGTTCCTAATCCTTTTCCAAGCTGTTTGCTTTGCAATATGT | ccgacttccttaaggtgctcaaccgtnnnnnnccnannnnccnagaaaaaagaaatgaaaannnnnnnnnnnnnnnnnnnttcatcccgctaactcttgggaactgaggaggaagcgctgtcgaccgaagnntggactgcttgctactggataatcgtnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnntgagggattctccatttgggttccattttgtacgagtttggcaaataacctgcagaaacgagctgtgcttgcaaggacttgatagttcctaagaattanaanaaaaaaaanaanttccacttgatcaanttaattcccttttatttttcctccctcantccccttccttttccaagctgtttgctttgcaatatgt | Mm.337038 | + | | 604224 | | NULL | 8.45 | 169 | 74.310961 | 74.31466 | NULL | NULL | 3 | NULL | NULL | NULL | NULL | NULL | NULL | 1 | 93288 | NULL | XM_129773 | 1 | 74.197594 | 74.197594 | 74.201293 | 4187 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 
++------+--------+----------+----------+--------+----------------------------------------------+------+-----------+----------+--------+-----------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------+-------------+--------+----------+-------------------------+-----------------------+----------------------+-------------------------+-----------------------+------------------+----------------------+------+----------+---------------+--------------+------+---------+----------+---------+----------+--------------------------+---------------------+---------+-----------+-----------------------------+---------------------------+--------------+-------------+-----------+-------------+------------+---------+------------+----------+-----------+------------+------------+---------------+---------+-----------+---------+------------------+-------------+------+--------+-------------+----------------+-----------------+ +2 rows in set (0.00 sec) +``` + +** ProbeSetXRef (phenotypes/dataset_name.json) + +For every probe set (read dataset measuring point): + +``` +select * from ProbeSetXRef; +| ProbeSetFreezeId | ProbeSetId | DataId | Locus_old | LRS_old | pValue_old | mean | se | Locus | LRS | pValue | additive | h2 | +| 112 | 
123528 | 23439389 | NULL | NULL | NULL | 6.7460707070707 | NULL | rs6239372 | 10.9675593568894 | 0.567 | 0.0448545966228878 | NULL | +| 112 | 123527 | 23439388 | NULL | NULL | NULL | 6.19416161616162 | NULL | rs13476936 | 10.9075670392762 | 0.567 | -0.0358456732993988 | NULL | +``` + +where ProbeSetFreezeId is the dataset (experiment). ProbeSetId refers to the probe set information (measuring point). DataId points to the data point. The other values are used for search. The table is used in search thus: + +``` +SELECT distinct ProbeSet.Name as TNAME, + ProbeSetXRef.Mean as TMEAN, ProbeSetXRef.LRS as TLRS, + ProbeSetXRef.PVALUE as TPVALUE, ProbeSet.Chr_num as TCHR_NUM, + ProbeSet.Mb as TMB, ProbeSet.Symbol as TSYMBOL, + ProbeSet.name_num as TNAME_NUM +FROM ProbeSetXRef, ProbeSet +WHERE ProbeSet.Id = ProbeSetXRef.ProbeSetId + and ProbeSetXRef.ProbeSetFreezeId = 112 + ORDER BY ProbeSet.symbol ASC limit 5; +| TNAME | TMEAN | TLRS | TPVALUE | TCHR_NUM | TMB | TSYMBOL | TNAME_NUM | +| 1445618_at | 7.05679797979798 | 13.5417452764616 | 0.17 | 8 | 75.077895 | NULL | 1445618 | +| 1452452_at | 7.232 | 30.4944361132252 | 0.0000609756097560421 | 12 | 12.6694 | NULL | 1452452 | +``` + +ProbeSetData - main molecular data: probe sets, metabolome, etc. + +Almost all important molecular assay data is in this table including probe set data, RNA-seq data, proteomic data, and metabolomic data. It had 2.5 billion rows as of March 2016. In comparison, ProbeData contains only Affymetrix probe level data (e.g. Exon array probes and M430 probes).
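
The key chain described above (ProbeSetFreezeId for the dataset, ProbeSetId for the probe set, DataId for the values) can be sketched as follows. This is a minimal sketch only: it uses Python's sqlite3 with cut-down tables as a stand-in for MariaDB, the probe set name 'example_at' is a made-up placeholder, and the values are the first ProbeSetData rows shown earlier in this document.

```python
import sqlite3

def build_demo():
    """In-memory stand-in with only the columns needed for the key chain."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ProbeSetFreeze (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE ProbeSet       (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE ProbeSetXRef   (ProbeSetFreezeId INTEGER, ProbeSetId INTEGER, DataId INTEGER);
        CREATE TABLE ProbeSetData   (Id INTEGER, StrainId INTEGER, value REAL);
        CREATE TABLE Strain         (Id INTEGER PRIMARY KEY, Name TEXT);
    """)
    conn.execute("INSERT INTO ProbeSetFreeze VALUES (1, 'Br_U_0803_M')")
    # 'example_at' is a hypothetical probe set name, not taken from the real DB
    conn.execute("INSERT INTO ProbeSet VALUES (1, 'example_at')")
    conn.execute("INSERT INTO ProbeSetXRef VALUES (1, 1, 1)")
    conn.executemany("INSERT INTO ProbeSetData VALUES (1, ?, ?)",
                     [(1, 5.742), (2, 5.006), (3, 6.079)])
    conn.executemany("INSERT INTO Strain VALUES (?, ?)",
                     [(1, "B6D2F1"), (2, "C57BL/6J"), (3, "DBA/2J")])
    return conn

def trait_values(conn, dataset, trait):
    """Return [(strain name, value)] for one dataset+trait, ordered by strain id."""
    return conn.execute("""
        SELECT Strain.Name, ProbeSetData.value
        FROM ProbeSetFreeze
        JOIN ProbeSetXRef ON ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id
        JOIN ProbeSet     ON ProbeSet.Id = ProbeSetXRef.ProbeSetId
        JOIN ProbeSetData ON ProbeSetData.Id = ProbeSetXRef.DataId
        JOIN Strain       ON Strain.Id = ProbeSetData.StrainId
        WHERE ProbeSetFreeze.Name = ? AND ProbeSet.Name = ?
        ORDER BY Strain.Id""", (dataset, trait)).fetchall()

if __name__ == "__main__":
    for name, value in trait_values(build_demo(), "Br_U_0803_M", "example_at"):
        print(name, value)
```

Note that ProbeSetXRef is the only place where dataset, probe set and data id meet, which is why values in ProbeSetData without a ProbeSetXRef row are effectively unreachable.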
+ +# Strain + +``` +select * from Strain limit 5; ++----+----------+----------+-----------+--------+-------+ +| Id | Name | Name2 | SpeciesId | Symbol | Alias | ++----+----------+----------+-----------+--------+-------+ +| 1 | B6D2F1 | B6D2F1 | 1 | NULL | NULL | +| 2 | C57BL/6J | C57BL/6J | 1 | B6J | NULL | +| 3 | DBA/2J | DBA/2J | 1 | D2J | NULL | +| 4 | BXD1 | BXD1 | 1 | NULL | NULL | +| 5 | BXD2 | BXD2 | 1 | NULL | NULL | ++----+----------+----------+-----------+--------+-------+ +``` + +``` +show indexes from Strain; ++--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | ++--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Strain | 0 | PRIMARY | 1 | Id | A | 14368 | NULL | NULL | | BTREE | | | +| Strain | 0 | Name | 1 | Name | A | 14368 | NULL | NULL | YES | BTREE | | | +| Strain | 0 | Name | 2 | SpeciesId | A | 14368 | NULL | NULL | | BTREE | | | +| Strain | 1 | Symbol | 1 | Symbol | A | 14368 | NULL | NULL | YES | BTREE | | | ++--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ + +A typical query may look like + +SELECT Strain.Name, ProbeSetData.value, ProbeSetSE.error, ProbeSetData.Id + FROM (ProbeSetData, ProbeSetFreeze, Strain, ProbeSet, ProbeSetXRef) + left join ProbeSetSE on + (ProbeSetSE.DataId = ProbeSetData.Id AND ProbeSetSE.StrainId = ProbeSetData.StrainId) + WHERE + ProbeSetFreeze.name = 'B139_K_1206_M' AND + ProbeSetXRef.ProbeSetId = ProbeSet.Id AND + ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id AND + ProbeSetXRef.DataId = ProbeSetData.Id AND + ProbeSetData.StrainId = Strain.Id + 
Order BY Strain.Name + ++-------+-------+-------+----------+ +| Name | value | error | Id | ++-------+-------+-------+----------+ +| SM001 | 38.3 | NULL | 25309550 | +| SM001 | 2.7 | NULL | 25309520 | +| SM001 | 20.3 | NULL | 25309507 | +| SM001 | 125.8 | NULL | 25309511 | +| SM001 | 8.2 | NULL | 25309534 | ++-------+-------+-------+----------+ +5 rows in set (22.28 sec) +``` + +# ProbeSetFreeze + +``` +select * from ProbeSetFreeze limit 5; ++----+---------------+-------+-------------+---------------------------------+---------------------------------------------+-------------------------+------------+-----------+--------+-----------------+-----------------+-----------+ +| Id | ProbeFreezeId | AvgID | Name | Name2 | FullName | ShortName | CreateTime | OrderList | public | confidentiality | AuthorisedUsers | DataScale | ++----+---------------+-------+-------------+---------------------------------+---------------------------------------------+-------------------------+------------+-----------+--------+-----------------+-----------------+-----------+ +| 1 | 3 | 1 | Br_U_0803_M | BXDMicroArray_ProbeSet_August03 | UTHSC Brain mRNA U74Av2 (Aug03) MAS5 | Brain U74Av2 08/03 MAS5 | 2003-08-01 | NULL | 0 | 0 | NULL | log2 | +| 2 | 10 | 1 | Br_U_0603_M | BXDMicroArray_ProbeSet_June03 | UTHSC Brain mRNA U74Av2 (Jun03) MAS5 | Brain U74Av2 06/03 MAS5 | 2003-06-01 | NULL | 0 | 0 | NULL | log2 | +| 3 | 8 | 1 | Br_U_0303_M | BXDMicroArray_ProbeSet_March03 | UTHSC Brain mRNA U74Av2 (Mar03) MAS5 | Brain U74Av2 03/03 MAS5 | 2003-03-01 | NULL | 0 | 0 | NULL | log2 | +| 4 | 5 | 1 | Br_U_0503_M | BXDMicroArray_ProbeSet_May03 | UTHSC Brain mRNA U74Av2 (May03) MAS5 | Brain U74Av2 05/03 MAS5 | 2003-05-01 | NULL | 0 | 0 | NULL | log2 | +| 5 | 4 | 1 | HC_U_0303_M | GNFMicroArray_ProbeSet_March03 | GNF Hematopoietic Cells U74Av2 (Mar03) MAS5 | GNF U74Av2 03/03 MAS5 | 2003-03-01 | NULL | 0 | 0 | NULL | log2 | 
++----+---------------+-------+-------------+---------------------------------+---------------------------------------------+-------------------------+------------+-----------+--------+-----------------+-----------------+-----------+ +``` + +# ProbeSetXRef + +``` +select * from ProbeSetXRef limit 5; ++------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+-----------------+--------------------+--------+----------------------+------+ +| ProbeSetFreezeId | ProbeSetId | DataId | Locus_old | LRS_old | pValue_old | mean | se | Locus | LRS | pValue | additive | h2 | ++------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+-----------------+--------------------+--------+----------------------+------+ +| 1 | 1 | 1 | 10.095.400 | 13.3971627898894 | 0.163 | 5.48794285714286 | 0.08525787814808819 | rs13480619 | 12.590069931048001 | 0.269 | -0.28515625 | NULL | +| 1 | 2 | 2 | D15Mit189 | 10.042057464356201 | 0.431 | 9.90165714285714 | 0.0374686634976217 | CEL-17_50896182 | 10.5970737900941 | 0.304 | -0.11678333333333299 | NULL | +| 1 | 3 | 3 | D5Mit139 | 5.43678531742749 | 0.993 | 7.83948571428571 | 0.0457583416912569 | rs13478499 | 6.0970532702754 | 0.988 | 0.112957489878542 | NULL | +| 1 | 4 | 4 | D1Mit511 | 9.87815279480766 | 0.483 | 8.315628571428569 | 0.0470396593931327 | rs6154379 | 11.774867551173099 | 0.286 | -0.157113725490196 | NULL | +| 1 | 5 | 5 | D16H21S16 | 10.191723834264499 | 0.528 | 9.19345714285714 | 0.0354801718293322 | rs4199265 | 10.923263374016202 | 0.468 | 0.11476470588235299 | NULL | ++------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+-----------------+--------------------+--------+----------------------+------+ +``` + + +Note that the following unlimited search is very slow: + +select max(value) from ProbeSetData; + +``` 
++------------+ +| max(value) | ++------------+ +| 26436006 | ++------------+ +1 row in set (2 min 16.31 sec) +``` + +which in some form is used in the search page, see the search code: + +=> https://github.com/genenetwork/genenetwork2_diet/blob/master/wqflask/wqflask/do_search.py#L811 the search code + + +### Comments + +I think the ProbeSetData table should be generalized to a 'phenotypes' table with a 'sample_id' column and a 'value' column. + +A new table 'samples' will link each sample against an 'experiment' and an 'individual', which in turn can link to a 'strain'. + +Experiment is here in a wide sense, GTEx can be one - I don't want to use dataset ;) + +This means a (slight) reordering: + +``` +phenotypes: (id), sample_id, value +samples: experiment_id, individual_id +experiments: name, version +individual: strain_id +strains: species_id +species: ... +``` + +ProbeData is also interesting, because it has the same structure as ProbeSetData, but only contains microarrays. These tables should be one (when we clear up the cross-referencing) as they both contain phenotype values. Both are large tables. + +PublishData is another phenotype table with values only which can be merged into that same table. This data does not require the annotations of probesets(!) + +=> https://genenetwork.org/show_trait?trait_id=10031&dataset=BXDPublish + +So we have phenotype data in 3 tables with exactly the same +layout. There is also TissueProbeSet*, but we'll ignore those for +now. I think we should merge these into one and have the sample ref +refer to the type of data (probeset, probe, metabolomics, +whatever). These are all phenotype values and by having them split +into different tables they won't play well when looking for +correlations. + +ProbeSet contains the metadata on the probes and should (eventually) +move into NoSQL. There is plenty of redundancy in that table now.
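
To make the proposed reordering more concrete, here is a minimal sketch of the merged layout using Python's sqlite3 as a stand-in. The table names follow the outline above; the 'name' columns, the 'data_type' column on samples, the version strings, and the one-individual-per-strain shortcut are assumptions added for illustration only. The values reuse rows shown earlier (ProbeSetData Id 1 and PublishData Id 8967043).

```python
import sqlite3

# Hypothetical schema for the merged 'phenotypes' proposal; not an existing GN schema.
SCHEMA = """
CREATE TABLE species     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE strains     (id INTEGER PRIMARY KEY, species_id INTEGER, name TEXT);
CREATE TABLE individual  (id INTEGER PRIMARY KEY, strain_id INTEGER);
CREATE TABLE experiments (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE samples     (id INTEGER PRIMARY KEY, experiment_id INTEGER,
                          individual_id INTEGER, data_type TEXT);
CREATE TABLE phenotypes  (id INTEGER, sample_id INTEGER, value REAL);
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO species VALUES (1, 'mouse')")
    conn.executemany("INSERT INTO strains VALUES (?, 1, ?)",
                     [(1, "B6D2F1"), (2, "C57BL/6J"), (5, "BXD2")])
    # Assumption: one individual per strain, to keep the demo small
    conn.executemany("INSERT INTO individual VALUES (?, ?)", [(1, 1), (2, 2), (5, 5)])
    # Version strings are placeholders
    conn.executemany("INSERT INTO experiments VALUES (?, ?, ?)",
                     [(1, "Br_U_0803_M", "1"), (2, "BXDPublish", "1")])
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, "probeset"), (2, 1, 2, "probeset"), (3, 2, 5, "classical")])
    # Expression values and a classical phenotype now live in the same table
    conn.executemany("INSERT INTO phenotypes VALUES (?, ?, ?)",
                     [(1, 1, 5.742), (1, 2, 5.006), (8967043, 3, 49.0)])
    return conn

def merged_trait_values(conn, trait_id):
    """Return [(strain name, value, data type)] for a trait id, whatever its origin."""
    return conn.execute("""
        SELECT strains.name, phenotypes.value, samples.data_type
        FROM phenotypes
        JOIN samples    ON samples.id = phenotypes.sample_id
        JOIN individual ON individual.id = samples.individual_id
        JOIN strains    ON strains.id = individual.strain_id
        WHERE phenotypes.id = ?
        ORDER BY strains.id""", (trait_id,)).fetchall()

if __name__ == "__main__":
    conn = build_demo()
    print(merged_trait_values(conn, 1))
    print(merged_trait_values(conn, 8967043))
```

The point of the sketch is that one query path serves both expression and classical traits, which is what should make correlations across data types easier.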
+ +I know it is going to be a pain to reorganize the database, but if we +want to use it in the long run we are going to have to simplify it. + +# ProbeSetFreeze and ProbeFreeze (/dataset/name.json) + +GN_SERVER: /dataset/HC_M2_0606_P.json + +ProbeSetFreeze contains DataSet information, such as name, full name of +datasets, as well as whether they are public and how the data is +scaled: + +``` +select * from ProbeSetFreeze; +| Id | ProbeFreezeId | AvgID | Name | Name2 | FullName | ShortName | CreateTime | OrderList | public | confidentiality | AuthorisedUsers | DataScale | +| 112 | 30 | 2 | HC_M2_0606_P | Hippocampus_M430_V2_BXD_PDNN_Jun06 | Hippocampus Consortium M430v2 (Jun06) PDNN | Hippocampus M430v2 BXD 06/06 PDNN | 2006-06-23 | NULL | 2 | 0 | NULL | log2 | +``` + +Another table contains a tissue reference and a back reference to the cross +type: + +``` +select * from ProbeFreeze; +| Id | ProbeFreezeId | ChipId | TissueId | Name | FullName | ShortName | CreateTime | InbredSetId | +| 30 | 30 | 4 | 9 | Hippocampus Consortium M430v2 Probe (Jun06) | | | 2006-07-07 | 1 | +``` + +NOTE: these tables can probably be merged into one. 
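Until the two tables are merged, a dataset name can be resolved to its chip, tissue and cross with a join along these lines. The query assumes that `ProbeSetFreeze.ProbeFreezeId` refers to `ProbeFreeze.Id`; the example rows above both carry the value 30, but the notes do not state the key relationship explicitly.

```sql
-- Sketch: look up chip, tissue and cross for one dataset.
-- Assumes ProbeSetFreeze.ProbeFreezeId references ProbeFreeze.Id.
SELECT psf.Name,
       psf.FullName,
       psf.DataScale,
       pf.ChipId,
       pf.TissueId,
       pf.InbredSetId
FROM ProbeSetFreeze AS psf
JOIN ProbeFreeze AS pf ON pf.Id = psf.ProbeFreezeId
WHERE psf.Name = 'HC_M2_0606_P';
```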
+ +``` +show indexes from ProbeSetFreeze; ++----------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | ++----------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| ProbeSetFreeze | 0 | PRIMARY | 1 | Id | A | 2 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | FullName | 1 | FullName | A | 2 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | Name | 1 | Name | A | 2 | NULL | NULL | YES | BTREE | | | +| ProbeSetFreeze | 1 | NameIndex | 1 | Name2 | A | 2 | NULL | NULL | | BTREE | | | ++----------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +``` + +# ProbeSetSE + +``` +select * from ProbeSetSE limit 5; ++--------+----------+----------+ +| DataId | StrainId | error | ++--------+----------+----------+ +| 1 | 1 | 0.681091 | +| 1 | 2 | 0.361151 | +| 1 | 3 | 0.364342 | +| 1 | 4 | 0.827588 | +| 1 | 5 | 0.303492 | ++--------+----------+----------+ +``` + +# More information + +For the other tables, you may check the GN2/doc/database.org document (the starting point for this document). + +# Contributions regarding data upload to the GeneNetwork webserver +* Ideas shared by the GeneNetwork team to facilitate the process of uploading data to production + +## Quality check and integrity of the data to be uploaded to gn2 + +* A note to add (from Arthur): Some datasets have the following identifiers: ProbeSet IDs {chr_3020701, chr_3020851, etc}. This is not an acceptable way to name the probeset IDs. 
So, the data provider needs to understand what format is needed for gn2 to accept the ProbeSet IDs in their dataset +* Also, for the annotation file, among other important columns, it is crucial that there are descriptions, aliases, and location columns. And the formatting should be exactly as found in the public repositories such as NCBI, Ensembl, etc. For instance, for description: `X-linked Kx blood group related 4`, and Aliases: `XRG4; Gm210; mKIAA1889` as in +=> https://www.ncbi.nlm.nih.gov/gene/497097 + +## Valid ProbeSetIDs + +* The official ProbeSetIDs would be the ones from the vendor. This would also constitute the platform used to generate data {Novogene-specific platform}, for instance, `NovaSeqPE150` for the MBD UTHSC mice seq dataset +* NB: in this case, if the vendor does not provide the official names as expected, we can use the platform + the numbering order of the file to generate probeset IDs. For instance, `NseqPE150_000001 to NseqPE150_432694` for samples 1 to 432694 +* Avoid IDs with meaning, e.g. =chr1_3020701= → Chromosome 1 at 3020701 base pairs. Prefer IDs with no meaning + +## The importance of having unique identifiers within a platform + +* Unique identifiers solve the hurdles that come with having duplicate genes. So, the QA tools in place should ensure the uploaded dataset adheres to the requirements mentioned +* However, newer RNA-seq data sets generated by sequencing do not usually have an official vendor identifier. The identifier is usually based on the NCBI mRNA model (NM_XXXXXX) that was used to evaluate an expression and on the sequence that is involved, usually the start and stop nucleotide positions based on a specific genome assembly or just a suffix to make sure it is unique. In this case, you are looking at mRNA assays for a single transcript, but different parts of the transcript that have different genome coordinates. We now typically use ENSEMBL identifiers. 
+* The mouse version of the sonic hedgehog gene as an example: `ENSMUST00000002708` or `ENSMUSG00000002633` sources should be fine. The important thing is to know the provenance of the ID: who is in charge of that ID type? +* When an mRNA assay is super precise (one exon only or a part of the 5' UTR), then we should probably use exon identifiers from ENSEMBL. +* Ideally, we should enter the sequence's first and last 100 nt in GeneNetwork for verification and alignment. We did this religiously for arrays, but have started to get lazy now. The sequence is the ultimate identifier +* For methylation arrays and CpG assays, we can use this format `cg14050475` as seen in MBD UTHSC Ben's data +* For metabolites like isoleucine, the ID we have been using is the mass-to-charge (MZ) ratio such as `130.0874220_MZ` +* For protein and peptide identifiers we have used the official Protein ID followed by an underscore character and then some or all of the sequence. This is then followed by another underscore and a number. Evan to confirm, but the suffix number is the charge state if I remember correctly +``` +Q9JHJ3_LLHTADVCQLEVALVGASPR_3 +A2A8E1_TIVEFECR_2 +A2A8E1_ATLENVTNLRPVGEDFR_3 +A2A8E1_ENSIDILSSTIK_2 +``` +* But in older protein expression databases Evan and the team used a different method +``` +abcb10_q9ji39_t311 +abcb10_q9ji39_t312 +``` +* The above is just the gene symbol followed by the protein ID; it is not clear what t311 and t312 mean +* Ideally these IDs are explained to some extent when they embed some information + + + +## BXD individuals + +* Basically, groups (represented by the InbredSet tables) are primarily defined by their list of samples/strains (represented by the Strain tables). When we create a new group, it's because we have data with a distinct set of samples/strains from any existing groups. 
+* So when we receive data for BXD individuals, as far as the database is concerned they are a completely separate group (since the list of samples is new/distinct from any other existing groups). We can choose to also enter it as part of the "generic" BXD group (by converting it to strain means/SEs using the strain of each individual, assuming it's provided like in the files Arthur was showing us). +* This same logic could apply to other groups as well - we could choose to make one group the "strain mean" group for another set of groups that contain sample data for individuals. But the database doesn't reflect the relationship between these groups* +* As far as the database is concerned, there is no distinction between strain means and individual sample data - they're all rows in the ProbeSetData/PublishData tables. The only difference is that strain mean data will probably also have an SE value in the ProbeSetSE/PublishSE tables and/or an N (number of individuals per strain) value in the NStrain table +* As for what this means for the uploader - I think it depends on whether Rob/Arthur/etc wants to give users the ability to simultaneously upload both strain mean and individual data. For example, if someone uploads some BXD individuals' data, do we want the uploader to both create a new group for this (or add to an existing BXD individuals group) and calculate the strain means/SE and enter it into the "main" BXD group? 
My personal feeling is that it's probably best to postpone that for later and only upload the data with the specific set of samples indicated in the file since it would insert some extra complexity to the uploading process that could always be added later (since the user would need to select "the group the strains are from" as a separate option) +* The relationship is sorta captured in the CaseAttribute and CaseAttributeXRefNew tables (which contain sample metadata), but only in the form of the metadata that is sometimes displayed as extra columns in the trait page table - this data isn't used in any queries/analyses currently (outside of some JS filters run on the table itself) and isn't that important as part of the uploading process (or at least can be postponed) + +## Individual Datasets and Derivatives datasets in gn2 +* Individual dataset reflects the actual data provided or submitted by the investigator (user). Derivative datasets include the processed information from the individual dataset, as in the case of the average datasets. +* An example of an individual dataset would look something like; (MBD dataset) +``` +#+begin_example +sample, strain, Sex, Age,… +FEB0001,BXD48a,M,63,… +FEB0002,BXD48a,M,15,… +FEB0003,BXD48a,F,22,… +FEB0004,BXD16,M,39,… +FEB0005,BXD16,F,14,… +⋮ +#+end_example +``` +* The strain column above has repetitive values. Each value has a one-to-many relationship with values on sample column. From this dataset, there can be several derivatives. 
For example; +- Sex-based categories +- Average data (3 sample values averaged to one strain value) +- Standard error table computed for the averages + +## Saving data to database +* Strain table schema +``` +#+begin_src sql + MariaDB [db_webqtl]> DESC Strain; + +-----------+----------------------+------+-----+---------+----------------+ + | Field | Type | Null | Key | Default | Extra | + +-----------+----------------------+------+-----+---------+----------------+ + | Id | int(20) | NO | PRI | NULL | auto_increment | + | Name | varchar(100) | YES | MUL | NULL | | + | Name2 | varchar(100) | YES | | NULL | | + | SpeciesId | smallint(5) unsigned | NO | | 0 | | + | Symbol | varchar(20) | YES | MUL | NULL | | + | Alias | varchar(255) | YES | | NULL | | + +-----------+----------------------+------+-----+---------+----------------+ + 6 rows in set (0.00 sec) +#+end_src +``` +* For the *individual data*, the =sample= field would be saved as =Name= and the =strain= would be saved as =Name2=. These records would then all be linked to an inbredset group (population?) in the =InbredSet= table via the =StrainXRef= table, whose schema is as follows: +``` +#+begin_src sql + MariaDB [db_webqtl]> DESC StrainXRef; + +------------------+----------------------+------+-----+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +------------------+----------------------+------+-----+---------+-------+ + | InbredSetId | smallint(5) unsigned | NO | PRI | 0 | | + | StrainId | int(20) | NO | PRI | NULL | | + | OrderId | int(20) | YES | | NULL | | + | Used_for_mapping | char(1) | YES | | N | | + | PedigreeStatus | varchar(255) | YES | | NULL | | + +------------------+----------------------+------+-----+---------+-------+ + 5 rows in set (0.00 sec) +#+end_src +``` +* Where the =InbredSetId= comes from the =InbredSet= table and the =StrainId= comes from the =Strain= table. 
The *individual data* would be linked to an inbredset group that is for individuals +* For the *average data*, the only value to save would be the =strain= field, which would be saved as =Name= in the =Strain= table and linked to an InbredSet group that is for averages +*Question 01*: How do we distinguish the inbredset groups? +*Answer*: The =Family= field is useful for this. + +*Question 02*: If you have more derived "datasets", e.g. males-only, females-only, under-10-years, 10-to-25-years, etc. How would the =Strains= table handle all those differences? + +## Metadata +* The data we looked at had =gene id= and =gene symbol= fields. These fields were used to fetch the *Ensembl ID* and *descriptions* from [[https://www.ncbi.nlm.nih.gov/][NCBI]] and the [[https://useast.ensembl.org/][Ensembl Genome Browser]] + +## Files for mapping +* Files used for mapping need to be in =bimbam= or =.geno= formats. We would need to do conversions to at least one of these formats where necessary + +## Annotation files +* Consider the following schema of DB tables +#+begin_src sql + MariaDB [db_webqtl]> DESC InbredSet; + +-----------------+----------------------+------+-----+---------+----------------+ + | Field | Type | Null | Key | Default | Extra | + +-----------------+----------------------+------+-----+---------+----------------+ + | Id | smallint(5) unsigned | NO | PRI | NULL | auto_increment | + | InbredSetId | int(5) unsigned | NO | | NULL | | + | InbredSetName | varchar(100) | YES | | NULL | | + | Name | char(30) | NO | | | | + | SpeciesId | smallint(5) unsigned | YES | | 1 | | + | FullName | varchar(100) | YES | | NULL | | + | public | tinyint(3) unsigned | YES | | 2 | | + | MappingMethodId | char(50) | YES | | 1 | | + | GeneticType | varchar(255) | YES | | NULL | | + | Family | varchar(100) | YES | | NULL | | + | FamilyOrder | int(5) | YES | | NULL | | + | MenuOrderId | double | NO | | NULL | | + | InbredSetCode | varchar(5) | YES | | NULL | | + | Description | longtext | 
YES | | NULL | | + +-----------------+----------------------+------+-----+---------+----------------+ + ⋮ + MariaDB [db_webqtl]> DESC Strain; + +-----------+----------------------+------+-----+---------+----------------+ + | Field | Type | Null | Key | Default | Extra | + +-----------+----------------------+------+-----+---------+----------------+ + | Id | int(20) | NO | PRI | NULL | auto_increment | + | Name | varchar(100) | YES | MUL | NULL | | + | Name2 | varchar(100) | YES | | NULL | | + | SpeciesId | smallint(5) unsigned | NO | | 0 | | + | Symbol | varchar(20) | YES | MUL | NULL | | + | Alias | varchar(255) | YES | | NULL | | + +-----------+----------------------+------+-----+---------+----------------+ + ⋮ + MariaDB [db_webqtl]> DESC StrainXRef; + +------------------+----------------------+------+-----+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +------------------+----------------------+------+-----+---------+-------+ + | InbredSetId | smallint(5) unsigned | NO | PRI | 0 | | + | StrainId | int(20) | NO | PRI | NULL | | + | OrderId | int(20) | YES | | NULL | | + | Used_for_mapping | char(1) | YES | | N | | + | PedigreeStatus | varchar(255) | YES | | NULL | | + +------------------+----------------------+------+-----+---------+-------+ +#+end_src + +* The =StrainXRef= table creates a link between the Samples/cases/individuals (stored in the =Strain= table) to the group (population?) they belong to in the =InbredSet= table +* Steps to prepare the TSV file for entering samples/cases into the database are: +- Clean up =Name= of the samples/cases/individuals in the file: + - Names should have no spaces + - Names should be the same length of characters: pad those that are shorter e.g. *SampleName12* → *SampleName012* to fit in with other names if, say, the samples range from 1 to 999. 
In a similar vein, you'd rename *SampleName1* to *SampleName001* +- Order samples by the names +- Create a new column, say, =orderId= in the TSV, and assign the order *1, 2, 3, …, n* for the rows, from the first to the "n^{th}" row. The order of the strains is very important and must be maintained +- retrieve the largest current =Id= value in the =Strain= table +- Increment by one (1) and assign that to the first row of your ordered data + - Assign subsequent rows, the subsequent values for the ID e.g. Assuming the largest =Id= value in the =Strain= table was *23*, the first row of the new data would have the id *24*. The second row would have *25*, the third, *26* and so on +- Get the =InbredSetId= for your samples' data. Add a new column in the data and copy this value for all rows +- Enter data into the =Strain= table +- Using the previously computed strain ID values, and the =InbredSetId= previously copied, enter data into the =StrainXRef= table + +* Some notes on the data: +- The =Symbol= field in the =Strain= table corresponds to the =Strain= field in the annotation file +- The =used_for_mapping= field should be set to ~Y~ unless otherwise informed +- The =PedigreeStatus= field is unknown to us for now: set to ~NULL~ + +* Annotation file format +The important fields are: +- =ChipId=: The platform that the data was collected from/with +Consider the following table; +#+begin_src sql + MariaDB [db_webqtl]> DESC GeneChip; + +---------------+----------------------+------+-----+---------+----------------+ + | Field | Type | Null | Key | Default | Extra | + +---------------+----------------------+------+-----+---------+----------------+ + | Id | smallint(5) unsigned | NO | PRI | NULL | auto_increment | + | GeneChipId | int(5) | YES | | NULL | | + | GeneChipName | varchar(200) | YES | | NULL | | + | Name | char(30) | NO | | | | + | GeoPlatform | char(15) | YES | | NULL | | + | Title | varchar(100) | YES | | NULL | | + | SpeciesId | int(5) | YES | | 1 | | + | 
GO_tree_value | varchar(50) | YES | | NULL | | +---------------+----------------------+------+-----+---------+----------------+ + #+end_src + Some of the important fields that were highlighted were: + - =GeoPlatform=: Links the details of the platform in our database with NCBI's [[https://www.ncbi.nlm.nih.gov/geo/][Gene Expression Omnibus (GEO)]] system. This is not always possible, but where we can, it would be nice to have this field populated + - =GO_tree_value=: This is supposed to link the detail we have with some external system "GO". I have not figured this one out on my own and will need to follow up on it. + - =Name=: The name corresponds to the =ProbeSetId=, and we want this to be the same value as the identifier on the [[https://www.ensembl.org][Ensembl genome browser]], e.g. for a gene, say =Shh=, for *mouse*, we want the =Name= value to be a variation on [[https://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000002633;r=5:28661813-28672254;t=ENSMUST00000002708][*ENSMUSG00000002633*]] + - =Probe_set_Blat_Mb_start=/=Probe_set_Blat_Mb_end=: In Byron's and Beni's data, these correspond to the =geneStart= and =geneEnd= fields respectively. These are the positions, in megabase pairs, at which the gene begins and ends, respectively. + - =Mb=: This is the =geneStart=/=Probe_set_Blat_Mb_start= value divided by *1000000*. (*Note to self*: Maybe the Probe_set_Blat_Mb_* fields above might not be in megabase pairs; please confirm) + - =Strand_Probe= and =Strand_Gene=: These fields' values are simply ~+~ or ~-~. 
If these values are missing, you can [[https://ftp.ncbi.nih.gov/gene/README][retrieve them from NCBI]], specifically from the =orientation= field of seemingly any text file with the field + - =Chr=: This is the chromosome on which the gene is found + +* The final annotation file will have (at minimum) the following fields (or their +analogs): +- =StrainName= +- =OrderId= +- =StrainId=: from the database +- =InbredSetId=: from the database +- =Symbol=: This could be named =Strain= +- =GeneChipId=: from the database +- =EnsemblId=: from the Ensembl genome browser +- =Probe_set_Blat_Mb_start=: possible analogue is =geneStart= +- =Probe_set_Blat_Mb_end=: possible analogue is =geneEnd= +- =Mb= +- =Strand_Probe= +- =Strand_Gene= +- =Chr= + +* =.geno= Files +- The =.geno= files have sample names, not the strain/symbol. The =Locus= field in the =.geno= file corresponds to the **marker**. =.geno= files are used with =QTLReaper= +- The sample names in the ~.geno~ files *MUST* be in the same order as the +strains/symbols for that species. For example; +Data format is as follows; +``` +#+begin_example +SampleName,Strain,… +⋮ +BJCWI0001,BXD40,… +BJCWI0002,BXD40,… +BJCWI0003,BXD33,… +BJCWI0004,BXD50,… +BJCWI0005,BXD50,… +⋮ +#+end_example +``` +and the order of strains is as follows; +``` +#+begin_example +…,BXD33,…,BXD40,…,BXD50,… +#+end_example +``` +then, the ~.geno~ file generated by this data should have a form such as shown +below; +``` +#+begin_example +…,BJCWI0003,…,BJCWI0001,BJCWI0002,…,BJCWI0004,BJCWI0005,… +#+end_example +``` +The order of samples that belong to the same strain is irrelevant - they share the same data, i.e. the order below is also valid; +``` +#+begin_example +…,BJCWI0003,…,BJCWI0002,BJCWI0001,…,BJCWI0004,BJCWI0005,… +#+end_example +``` +* =BimBam= Files +- Used with =GEMMA= +* Case Attributes +- These are metadata about every case/sample/individual in an InbredSet group. The metadata is any data that has nothing to do with phenotypes (e.g. 
height, weight, etc) that is useful for researchers to have in order to make sense of the data. +- Examples of case attributes: + - Treatment + - Sex (Really? Isn't sex an expression of genes?) + - batch + - Case ID, etc + +* Summary steps to load data to the database +- [x] Create *InbredSet* group (think population) +- [x] Load the strains/samples data +- [x] Load the sample cross-reference data to link the samples to their + InbredSet group +- Load the case-attributes data +- [x] Load the annotation (data into ProbeSet table) +- [x] Create the study for the data (At around this point, the InbredSet group + will show up in the UI). +- [x] Create the Dataset for the data +- [x] Load the *Log2* data (ProbeSetData and ProbeSetXRef tables) +- [x] Compute means (an SQL query was used — this could be pre-computed in code + and entered along with the data) +- [x] Run QTLReaper diff --git a/topics/database/setting-up-local-development-database.gmi b/topics/database/setting-up-local-development-database.gmi index 3b743b9..9ebb48b 100644 --- a/topics/database/setting-up-local-development-database.gmi +++ b/topics/database/setting-up-local-development-database.gmi @@ -41,7 +41,12 @@ Setting up mariadb in a Guix container is the preferred and easier method. But, ``` $ sudo $(./containers/db-container.sh) ``` -You should now be able to connect to the database using +By default, mariadb allows passwordless login for root only on the local machine. So, enter the container using guix container exec and set the root password to a blank. 
+``` +$ mysql -u root +MariaDB [(none)]> SET PASSWORD = PASSWORD(""); +``` +You should now be able to connect to the database from outside the container using ``` $ mysql --protocol tcp -u root ``` diff --git a/topics/database/sql.svg b/topics/database/sql.svg new file mode 100644 index 0000000..b7ab96e --- /dev/null +++ b/topics/database/sql.svg @@ -0,0 +1,2558 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" + "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> +<!-- Generated by graphviz version 2.49.0 (20210828.1703) + --> +<!-- Title: schema Pages: 1 --> +<svg width="13704pt" height="5921pt" + viewBox="0.00 0.00 13703.50 5921.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> +<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 5917)"> +<title>schema</title> +<polygon fill="white" stroke="transparent" points="-4,4 -4,-5917 13699.5,-5917 13699.5,4 -4,4"/> +<!-- NStrain --> +<g id="node1" class="node"> +<title>NStrain</title> +<polygon fill="white" stroke="transparent" points="6648.5,-1918 6648.5,-2008 6775.5,-2008 6775.5,-1918 6648.5,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="6652,-1984 6652,-2005 6773,-2005 6773,-1984 6652,-1984"/> +<polygon fill="none" stroke="black" points="6652,-1984 6652,-2005 6773,-2005 6773,-1984 6652,-1984"/> +<text text-anchor="start" x="6655" y="-1990.8" font-family="Times,serif" font-size="14.00">NStrain (9 MiB)</text> +<text text-anchor="start" x="6692.5" y="-1968.8" font-family="Times,serif" font-size="14.00">count</text> +<text text-anchor="start" x="6688" y="-1947.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6683" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6648.5,-1918 6648.5,-2008 6775.5,-2008 6775.5,-1918 6648.5,-1918"/> +</g> +<!-- Strain --> +<g id="node40" class="node"> 
+<title>Strain</title> +<polygon fill="lightgrey" stroke="transparent" points="5728.5,-765.5 5728.5,-918.5 5843.5,-918.5 5843.5,-765.5 5728.5,-765.5"/> +<polygon fill="#df65b0" stroke="transparent" points="5732,-894 5732,-915 5841,-915 5841,-894 5732,-894"/> +<polygon fill="none" stroke="black" points="5732,-894 5732,-915 5841,-915 5841,-894 5732,-894"/> +<text text-anchor="start" x="5735" y="-900.8" font-family="Times,serif" font-size="14.00">Strain (2 MiB)</text> +<polygon fill="green" stroke="transparent" points="5732,-873 5732,-892 5841,-892 5841,-873 5732,-873"/> +<text text-anchor="start" x="5769" y="-878.8" font-family="Times,serif" font-size="14.00">Alias</text> +<polygon fill="green" stroke="transparent" points="5732,-852 5732,-871 5841,-871 5841,-852 5732,-852"/> +<text text-anchor="start" x="5765" y="-857.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="green" stroke="transparent" points="5732,-831 5732,-850 5841,-850 5841,-831 5732,-831"/> +<text text-anchor="start" x="5760.5" y="-836.8" font-family="Times,serif" font-size="14.00">Name2</text> +<polygon fill="green" stroke="transparent" points="5732,-810 5732,-829 5841,-829 5841,-810 5732,-810"/> +<text text-anchor="start" x="5759.5" y="-815.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="5779" y="-794.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="5751.5" y="-773.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="none" stroke="black" points="5728.5,-765.5 5728.5,-918.5 5843.5,-918.5 5843.5,-765.5 5728.5,-765.5"/> +</g> +<!-- NStrain->Strain --> +<g id="edge1" class="edge"> +<title>NStrain:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6651,-1930C6610.43,-1930 6653.88,-1233.5 6631,-1200 6450.66,-935.96 6033.45,-866.5 5861.83,-848.81"/> +<polygon fill="black" stroke="black" points="5861.92,-845.3 5851.62,-847.79 5861.23,-852.27 5861.92,-845.3"/> +</g> 
+<!-- roles_users --> +<g id="node2" class="node"> +<title>roles_users</title> +<polygon fill="white" stroke="transparent" points="7071.5,-4853 7071.5,-4922 7204.5,-4922 7204.5,-4853 7071.5,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="7075,-4897.5 7075,-4918.5 7202,-4918.5 7202,-4897.5 7075,-4897.5"/> +<polygon fill="none" stroke="black" points="7075,-4897.5 7075,-4918.5 7202,-4918.5 7202,-4897.5 7075,-4897.5"/> +<text text-anchor="start" x="7078" y="-4904.3" font-family="Times,serif" font-size="14.00">roles_users (0 B)</text> +<text text-anchor="start" x="7114" y="-4882.3" font-family="Times,serif" font-size="14.00">role_id</text> +<text text-anchor="start" x="7112.5" y="-4861.3" font-family="Times,serif" font-size="14.00">user_id</text> +<polygon fill="none" stroke="black" points="7071.5,-4853 7071.5,-4922 7204.5,-4922 7204.5,-4853 7071.5,-4853"/> +</g> +<!-- role --> +<g id="node58" class="node"> +<title>role</title> +<polygon fill="white" stroke="transparent" points="7093.5,-3249 7093.5,-3339 7184.5,-3339 7184.5,-3249 7093.5,-3249"/> +<polygon fill="#f1eef6" stroke="transparent" points="7097,-3315 7097,-3336 7182,-3336 7182,-3315 7097,-3315"/> +<polygon fill="none" stroke="black" points="7097,-3315 7097,-3336 7182,-3336 7182,-3315 7097,-3315"/> +<text text-anchor="start" x="7106" y="-3321.8" font-family="Times,serif" font-size="14.00">role (0 B)</text> +<text text-anchor="start" x="7099" y="-3299.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="7119.5" y="-3278.8" font-family="Times,serif" font-size="14.00">name</text> +<text text-anchor="start" x="7117.5" y="-3257.8" font-family="Times,serif" font-size="14.00">the_id</text> +<polygon fill="none" stroke="black" points="7093.5,-3249 7093.5,-3339 7184.5,-3339 7184.5,-3249 7093.5,-3249"/> +</g> +<!-- roles_users->role --> +<g id="edge2" class="edge"> +<title>roles_users:role_id->role</title> +<path fill="none" stroke="black" 
d="M7203,-4885.5C7242.13,-4885.5 7161.86,-3639.62 7142.89,-3353.21"/> +<polygon fill="black" stroke="black" points="7146.37,-3352.78 7142.22,-3343.03 7139.39,-3353.24 7146.37,-3352.78"/> +</g> +<!-- User --> +<g id="node60" class="node"> +<title>User</title> +<polygon fill="white" stroke="transparent" points="7244,-3175.5 7244,-3412.5 7354,-3412.5 7354,-3175.5 7244,-3175.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="7247,-3388 7247,-3409 7351,-3409 7351,-3388 7247,-3388"/> +<polygon fill="none" stroke="black" points="7247,-3388 7247,-3409 7351,-3409 7351,-3388 7247,-3388"/> +<text text-anchor="start" x="7250" y="-3394.8" font-family="Times,serif" font-size="14.00">User (28 KiB)</text> +<text text-anchor="start" x="7260" y="-3372.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" x="7273" y="-3351.8" font-family="Times,serif" font-size="14.00">disable</text> +<text text-anchor="start" x="7279" y="-3330.8" font-family="Times,serif" font-size="14.00">email</text> +<text text-anchor="start" x="7265.5" y="-3309.8" font-family="Times,serif" font-size="14.00">grpName</text> +<text text-anchor="start" x="7292" y="-3288.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="7268" y="-3267.8" font-family="Times,serif" font-size="14.00">lastlogin</text> +<text text-anchor="start" x="7279" y="-3246.8" font-family="Times,serif" font-size="14.00">name</text> +<text text-anchor="start" x="7264.5" y="-3225.8" font-family="Times,serif" font-size="14.00">password</text> +<text text-anchor="start" x="7267" y="-3204.8" font-family="Times,serif" font-size="14.00">privilege</text> +<text text-anchor="start" x="7273" y="-3183.8" font-family="Times,serif" font-size="14.00">user_ip</text> +<polygon fill="none" stroke="black" points="7244,-3175.5 7244,-3412.5 7354,-3412.5 7354,-3175.5 7244,-3175.5"/> +</g> +<!-- roles_users->User --> +<g id="edge3" class="edge"> +<title>roles_users:user_id->User</title> 
+<path fill="none" stroke="black" d="M7139,-4854.5C7139,-4323.12 7232.06,-3695.19 7276.24,-3427.05"/> +<polygon fill="black" stroke="black" points="7279.74,-3427.32 7277.92,-3416.88 7272.83,-3426.18 7279.74,-3427.32"/> +</g> +<!-- SnpAllRat --> +<g id="node3" class="node"> +<title>SnpAllRat</title> +<polygon fill="white" stroke="transparent" points="2716,-702.5 2716,-981.5 2876,-981.5 2876,-702.5 2716,-702.5"/> +<polygon fill="#df65b0" stroke="transparent" points="2719,-957 2719,-978 2873,-978 2873,-957 2719,-957"/> +<polygon fill="none" stroke="black" points="2719,-957 2719,-978 2873,-978 2873,-957 2719,-957"/> +<text text-anchor="start" x="2722" y="-963.8" font-family="Times,serif" font-size="14.00">SnpAllRat (908 MiB)</text> +<text text-anchor="start" x="2772" y="-941.8" font-family="Times,serif" font-size="14.00">Alleles</text> +<text text-anchor="start" x="2749" y="-920.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="2728" y="-899.8" font-family="Times,serif" font-size="14.00">ConservationScore</text> +<text text-anchor="start" x="2768.5" y="-878.8" font-family="Times,serif" font-size="14.00">Domain</text> +<text text-anchor="start" x="2764" y="-857.8" font-family="Times,serif" font-size="14.00">Function</text> +<text text-anchor="start" x="2777.5" y="-836.8" font-family="Times,serif" font-size="14.00">Gene</text> +<text text-anchor="start" x="2788.5" y="-815.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2767" y="-794.8" font-family="Times,serif" font-size="14.00">Position</text> +<text text-anchor="start" x="2761" y="-773.8" font-family="Times,serif" font-size="14.00">SnpName</text> +<text text-anchor="start" x="2771" y="-752.8" font-family="Times,serif" font-size="14.00">Source</text> +<text text-anchor="start" x="2761" y="-731.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2758.5" y="-710.8" font-family="Times,serif" 
font-size="14.00">Transcript</text> +<polygon fill="none" stroke="black" points="2716,-702.5 2716,-981.5 2876,-981.5 2876,-702.5 2716,-702.5"/> +</g> +<!-- Species --> +<g id="node33" class="node"> +<title>Species</title> +<polygon fill="lightgrey" stroke="transparent" points="2734,-201 2734,-396 2858,-396 2858,-201 2734,-201"/> +<polygon fill="#f1eef6" stroke="transparent" points="2737,-371.5 2737,-392.5 2855,-392.5 2855,-371.5 2737,-371.5"/> +<polygon fill="none" stroke="black" points="2737,-371.5 2737,-392.5 2855,-392.5 2855,-371.5 2737,-371.5"/> +<text text-anchor="start" x="2740" y="-378.3" font-family="Times,serif" font-size="14.00">Species (796 B)</text> +<polygon fill="green" stroke="transparent" points="2737,-350.5 2737,-369.5 2855,-369.5 2855,-350.5 2737,-350.5"/> +<text text-anchor="start" x="2761" y="-356.3" font-family="Times,serif" font-size="14.00">FullName</text> +<polygon fill="green" stroke="transparent" points="2737,-329.5 2737,-348.5 2855,-348.5 2855,-329.5 2737,-329.5"/> +<text text-anchor="start" x="2754.5" y="-335.3" font-family="Times,serif" font-size="14.00">MenuName</text> +<polygon fill="green" stroke="transparent" points="2737,-308.5 2737,-327.5 2855,-327.5 2855,-308.5 2737,-308.5"/> +<text text-anchor="start" x="2747.5" y="-314.3" font-family="Times,serif" font-size="14.00">SpeciesName</text> +<text text-anchor="start" x="2788.5" y="-293.3" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="2737,-266.5 2737,-285.5 2855,-285.5 2855,-266.5 2737,-266.5"/> +<text text-anchor="start" x="2774.5" y="-272.3" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2767.5" y="-251.3" font-family="Times,serif" font-size="14.00">OrderId</text> +<text text-anchor="start" x="2761" y="-230.3" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2752.5" y="-209.3" font-family="Times,serif" font-size="14.00">TaxonomyId</text> 
+<polygon fill="none" stroke="black" points="2734,-201 2734,-396 2858,-396 2858,-201 2734,-201"/> +</g> +<!-- SnpAllRat->Species --> +<g id="edge4" class="edge"> +<title>SnpAllRat:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2874,-735C2906.96,-735 2860.65,-539.2 2826.56,-410.18"/> +<polygon fill="black" stroke="black" points="2829.87,-409 2823.92,-400.23 2823.1,-410.8 2829.87,-409"/> +</g> +<!-- SampleXRef --> +<g id="node4" class="node"> +<title>SampleXRef</title> +<polygon fill="white" stroke="transparent" points="3272,-3259.5 3272,-3328.5 3426,-3328.5 3426,-3259.5 3272,-3259.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3275,-3304 3275,-3325 3423,-3325 3423,-3304 3275,-3304"/> +<polygon fill="none" stroke="black" points="3275,-3304 3275,-3325 3423,-3325 3423,-3304 3275,-3304"/> +<text text-anchor="start" x="3278" y="-3310.8" font-family="Times,serif" font-size="14.00">SampleXRef (4 KiB)</text> +<text text-anchor="start" x="3296" y="-3288.8" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="3315" y="-3267.8" font-family="Times,serif" font-size="14.00">SampleId</text> +<polygon fill="none" stroke="black" points="3272,-3259.5 3272,-3328.5 3426,-3328.5 3426,-3259.5 3272,-3259.5"/> +</g> +<!-- ProbeFreeze --> +<g id="node42" class="node"> +<title>ProbeFreeze</title> +<polygon fill="white" stroke="transparent" points="2611,-1855 2611,-2071 2777,-2071 2777,-1855 2611,-1855"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2614,-2047 2614,-2068 2774,-2068 2774,-2047 2614,-2047"/> +<polygon fill="none" stroke="black" points="2614,-2047 2614,-2068 2774,-2068 2774,-2047 2614,-2047"/> +<text text-anchor="start" x="2617" y="-2053.8" font-family="Times,serif" font-size="14.00">ProbeFreeze (30 KiB)</text> +<text text-anchor="start" x="2670" y="-2031.8" font-family="Times,serif" font-size="14.00">ChipId</text> +<text text-anchor="start" x="2652" y="-2010.8" font-family="Times,serif" 
font-size="14.00">CreateTime</text> +<text text-anchor="start" x="2659" y="-1989.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="2686.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2651" y="-1947.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2672.5" y="-1926.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2641" y="-1905.8" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="2653" y="-1884.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<text text-anchor="start" x="2663.5" y="-1863.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<polygon fill="none" stroke="black" points="2611,-1855 2611,-2071 2777,-2071 2777,-1855 2611,-1855"/> +</g> +<!-- SampleXRef->ProbeFreeze --> +<g id="edge5" class="edge"> +<title>SampleXRef:ProbeFreezeId->ProbeFreeze</title> +<path fill="none" stroke="black" d="M3274,-3292C3032.87,-3292 3338.17,-2922.26 3158,-2762 3097.26,-2707.98 2852.39,-2782.55 2794,-2726 2622.74,-2560.12 2641.84,-2254.55 2669,-2085.12"/> +<polygon fill="black" stroke="black" points="2672.47,-2085.6 2670.63,-2075.16 2665.56,-2084.47 2672.47,-2085.6"/> +</g> +<!-- Sample --> +<g id="node95" class="node"> +<title>Sample</title> +<polygon fill="white" stroke="transparent" points="3653.5,-1792 3653.5,-2134 3782.5,-2134 3782.5,-1792 3653.5,-1792"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3657,-2110 3657,-2131 3780,-2131 3780,-2110 3657,-2110"/> +<polygon fill="none" stroke="black" points="3657,-2110 3657,-2131 3780,-2131 3780,-2110 3657,-2110"/> +<text text-anchor="start" x="3660" y="-2116.8" font-family="Times,serif" font-size="14.00">Sample (53 KiB)</text> +<text text-anchor="start" x="3704.5" y="-2094.8" font-family="Times,serif" font-size="14.00">Age</text> +<text text-anchor="start" x="3688" 
y="-2073.8" font-family="Times,serif" font-size="14.00">CELURL</text> +<text text-anchor="start" x="3686.5" y="-2052.8" font-family="Times,serif" font-size="14.00">CHPURL</text> +<text text-anchor="start" x="3676.5" y="-2031.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="3688" y="-2010.8" font-family="Times,serif" font-size="14.00">DATURL</text> +<text text-anchor="start" x="3688" y="-1989.8" font-family="Times,serif" font-size="14.00">EXPURL</text> +<text text-anchor="start" x="3687" y="-1968.8" font-family="Times,serif" font-size="14.00">FromSrc</text> +<text text-anchor="start" x="3711" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3680.5" y="-1926.8" font-family="Times,serif" font-size="14.00">ImageURL</text> +<text text-anchor="start" x="3697" y="-1905.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3688" y="-1884.8" font-family="Times,serif" font-size="14.00">RPTURL</text> +<text text-anchor="start" x="3705" y="-1863.8" font-family="Times,serif" font-size="14.00">Sex</text> +<text text-anchor="start" x="3689" y="-1842.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="3678" y="-1821.8" font-family="Times,serif" font-size="14.00">TissueType</text> +<text text-anchor="start" x="3688.5" y="-1800.8" font-family="Times,serif" font-size="14.00">TXTURL</text> +<polygon fill="none" stroke="black" points="3653.5,-1792 3653.5,-2134 3782.5,-2134 3782.5,-1792 3653.5,-1792"/> +</g> +<!-- SampleXRef->Sample --> +<g id="edge6" class="edge"> +<title>SampleXRef:SampleId->Sample</title> +<path fill="none" stroke="black" d="M3424,-3271C3878.8,-3271 3810.34,-2508.42 3752.65,-2148.25"/> +<polygon fill="black" stroke="black" points="3756.08,-2147.55 3751.03,-2138.24 3749.17,-2148.67 3756.08,-2147.55"/> +</g> +<!-- GeneIDXRef --> +<g id="node5" class="node"> +<title>GeneIDXRef</title> +<polygon 
fill="white" stroke="transparent" points="7441,-4842.5 7441,-4932.5 7613,-4932.5 7613,-4842.5 7441,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="7444,-4908.5 7444,-4929.5 7610,-4929.5 7610,-4908.5 7444,-4908.5"/> +<polygon fill="none" stroke="black" points="7444,-4908.5 7444,-4929.5 7610,-4929.5 7610,-4908.5 7444,-4908.5"/> +<text text-anchor="start" x="7447" y="-4915.3" font-family="Times,serif" font-size="14.00">GeneIDXRef (220 KiB)</text> +<text text-anchor="start" x="7502.5" y="-4893.3" font-family="Times,serif" font-size="14.00">human</text> +<text text-anchor="start" x="7503.5" y="-4872.3" font-family="Times,serif" font-size="14.00">mouse</text> +<text text-anchor="start" x="7516" y="-4851.3" font-family="Times,serif" font-size="14.00">rat</text> +<polygon fill="none" stroke="black" points="7441,-4842.5 7441,-4932.5 7613,-4932.5 7613,-4842.5 7441,-4842.5"/> +</g> +<!-- MachineAccessLog --> +<g id="node6" class="node"> +<title>MachineAccessLog</title> +<polygon fill="white" stroke="transparent" points="7647,-4811 7647,-4964 7861,-4964 7861,-4811 7647,-4811"/> +<polygon fill="#df65b0" stroke="transparent" points="7650,-4939.5 7650,-4960.5 7858,-4960.5 7858,-4939.5 7650,-4939.5"/> +<polygon fill="none" stroke="black" points="7650,-4939.5 7650,-4960.5 7858,-4960.5 7858,-4939.5 7650,-4939.5"/> +<text text-anchor="start" x="7653" y="-4946.3" font-family="Times,serif" font-size="14.00">MachineAccessLog (23 MiB)</text> +<text text-anchor="start" x="7714.5" y="-4924.3" font-family="Times,serif" font-size="14.00">accesstime</text> +<text text-anchor="start" x="7732" y="-4903.3" font-family="Times,serif" font-size="14.00">action</text> +<text text-anchor="start" x="7728" y="-4882.3" font-family="Times,serif" font-size="14.00">data_id</text> +<text text-anchor="start" x="7734.5" y="-4861.3" font-family="Times,serif" font-size="14.00">db_id</text> +<text text-anchor="start" x="7747" y="-4840.3" font-family="Times,serif" font-size="14.00">id</text> 
+<text text-anchor="start" x="7715.5" y="-4819.3" font-family="Times,serif" font-size="14.00">ip_address</text> +<polygon fill="none" stroke="black" points="7647,-4811 7647,-4964 7861,-4964 7861,-4811 7647,-4811"/> +</g> +<!-- metadata_audit --> +<g id="node7" class="node"> +<title>metadata_audit</title> +<polygon fill="white" stroke="transparent" points="292.5,-1897 292.5,-2029 479.5,-2029 479.5,-1897 292.5,-1897"/> +<polygon fill="#d7b5d8" stroke="transparent" points="296,-2005 296,-2026 477,-2026 477,-2005 296,-2005"/> +<polygon fill="none" stroke="black" points="296,-2005 296,-2026 477,-2026 477,-2005 296,-2005"/> +<text text-anchor="start" x="299" y="-2011.8" font-family="Times,serif" font-size="14.00">metadata_audit (16 KiB)</text> +<text text-anchor="start" x="349.5" y="-1989.8" font-family="Times,serif" font-size="14.00">dataset_id</text> +<text text-anchor="start" x="365" y="-1968.8" font-family="Times,serif" font-size="14.00">editor</text> +<text text-anchor="start" x="379.5" y="-1947.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="337.5" y="-1926.8" font-family="Times,serif" font-size="14.00">json_diff_data</text> +<text text-anchor="start" x="344.5" y="-1905.8" font-family="Times,serif" font-size="14.00">time_stamp</text> +<polygon fill="none" stroke="black" points="292.5,-1897 292.5,-2029 479.5,-2029 479.5,-1897 292.5,-1897"/> +</g> +<!-- Datasets --> +<g id="node16" class="node"> +<title>Datasets</title> +<polygon fill="lightgrey" stroke="transparent" points="305,-660.5 305,-1023.5 469,-1023.5 469,-660.5 305,-660.5"/> +<polygon fill="#df65b0" stroke="transparent" points="308,-999 308,-1020 466,-1020 466,-999 308,-999"/> +<polygon fill="none" stroke="black" points="308,-999 308,-1020 466,-1020 466,-999 308,-999"/> +<text text-anchor="start" x="326.5" y="-1005.8" font-family="Times,serif" font-size="14.00">Datasets (4 MiB)</text> +<polygon fill="green" stroke="transparent" points="308,-978 308,-997 466,-997 
466,-978 308,-978"/> +<text text-anchor="start" x="344.5" y="-983.8" font-family="Times,serif" font-size="14.00">AboutCases</text> +<polygon fill="green" stroke="transparent" points="308,-957 308,-976 466,-976 466,-957 308,-957"/> +<text text-anchor="start" x="310" y="-962.8" font-family="Times,serif" font-size="14.00">AboutDataProcessing</text> +<polygon fill="green" stroke="transparent" points="308,-936 308,-955 466,-955 466,-936 308,-936"/> +<text text-anchor="start" x="334.5" y="-941.8" font-family="Times,serif" font-size="14.00">AboutPlatform</text> +<polygon fill="green" stroke="transparent" points="308,-915 308,-934 466,-934 466,-915 308,-915"/> +<text text-anchor="start" x="343" y="-920.8" font-family="Times,serif" font-size="14.00">AboutTissue</text> +<polygon fill="green" stroke="transparent" points="308,-894 308,-913 466,-913 466,-894 308,-894"/> +<text text-anchor="start" x="325.5" y="-899.8" font-family="Times,serif" font-size="14.00">Acknowledgment</text> +<polygon fill="green" stroke="transparent" points="308,-873 308,-892 466,-892 466,-873 308,-873"/> +<text text-anchor="start" x="358" y="-878.8" font-family="Times,serif" font-size="14.00">Citation</text> +<polygon fill="green" stroke="transparent" points="308,-852 308,-871 466,-871 466,-852 308,-852"/> +<text text-anchor="start" x="341" y="-857.8" font-family="Times,serif" font-size="14.00">Contributors</text> +<text text-anchor="start" x="352" y="-836.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<polygon fill="green" stroke="transparent" points="308,-810 308,-829 466,-829 466,-810 308,-810"/> +<text text-anchor="start" x="338" y="-815.8" font-family="Times,serif" font-size="14.00">DatasetName</text> +<text text-anchor="start" x="328.5" y="-794.8" font-family="Times,serif" font-size="14.00">DatasetStatusId</text> +<polygon fill="green" stroke="transparent" points="308,-768 308,-787 466,-787 466,-768 308,-768"/> +<text text-anchor="start" x="320" y="-773.8" 
font-family="Times,serif" font-size="14.00">ExperimentDesign</text> +<polygon fill="green" stroke="transparent" points="308,-747 308,-766 466,-766 466,-747 308,-747"/> +<text text-anchor="start" x="350.5" y="-752.8" font-family="Times,serif" font-size="14.00">GeoSeries</text> +<text text-anchor="start" x="336" y="-731.8" font-family="Times,serif" font-size="14.00">InvestigatorId</text> +<polygon fill="green" stroke="transparent" points="308,-705 308,-724 466,-724 466,-705 308,-705"/> +<text text-anchor="start" x="365.5" y="-710.8" font-family="Times,serif" font-size="14.00">Notes</text> +<text text-anchor="start" x="330.5" y="-689.8" font-family="Times,serif" font-size="14.00">PublicationTitle</text> +<polygon fill="green" stroke="transparent" points="308,-663 308,-682 466,-682 466,-663 308,-663"/> +<text text-anchor="start" x="352" y="-668.8" font-family="Times,serif" font-size="14.00">Summary</text> +<polygon fill="none" stroke="black" points="305,-660.5 305,-1023.5 469,-1023.5 469,-660.5 305,-660.5"/> +</g> +<!-- metadata_audit->Datasets --> +<g id="edge7" class="edge"> +<title>metadata_audit:dataset_id->Datasets</title> +<path fill="none" stroke="black" d="M478,-1994C525.38,-1994 453.11,-1365.95 412.1,-1037.71"/> +<polygon fill="black" stroke="black" points="415.55,-1037.1 410.84,-1027.61 408.61,-1037.97 415.55,-1037.1"/> +</g> +<!-- GenoXRef --> +<g id="node8" class="node"> +<title>GenoXRef</title> +<polygon fill="white" stroke="transparent" points="4464,-3228 4464,-3360 4614,-3360 4614,-3228 4464,-3228"/> +<polygon fill="#df65b0" stroke="transparent" points="4467,-3336 4467,-3357 4611,-3357 4611,-3336 4467,-3336"/> +<polygon fill="none" stroke="black" points="4467,-3336 4467,-3357 4611,-3357 4611,-3336 4467,-3336"/> +<text text-anchor="start" x="4470" y="-3342.8" font-family="Times,serif" font-size="14.00">GenoXRef (14 MiB)</text> +<text text-anchor="start" x="4528" y="-3320.8" font-family="Times,serif" font-size="14.00">cM</text> +<text text-anchor="start" 
x="4514.5" y="-3299.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="4489" y="-3278.8" font-family="Times,serif" font-size="14.00">GenoFreezeId</text> +<text text-anchor="start" x="4513" y="-3257.8" font-family="Times,serif" font-size="14.00">GenoId</text> +<text text-anchor="start" x="4472.5" y="-3236.8" font-family="Times,serif" font-size="14.00">Used_for_mapping</text> +<polygon fill="none" stroke="black" points="4464,-3228 4464,-3360 4614,-3360 4614,-3228 4464,-3228"/> +</g> +<!-- Geno --> +<g id="node46" class="node"> +<title>Geno</title> +<polygon fill="white" stroke="transparent" points="4245,-671 4245,-1013 4383,-1013 4383,-671 4245,-671"/> +<polygon fill="#df65b0" stroke="transparent" points="4248,-989 4248,-1010 4380,-1010 4380,-989 4248,-989"/> +<polygon fill="none" stroke="black" points="4248,-989 4248,-1010 4380,-1010 4380,-989 4248,-989"/> +<text text-anchor="start" x="4262" y="-995.8" font-family="Times,serif" font-size="14.00">Geno (39 MiB)</text> +<text text-anchor="start" x="4300.5" y="-973.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="4279" y="-952.8" font-family="Times,serif" font-size="14.00">Chr_mm8</text> +<text text-anchor="start" x="4283" y="-931.8" font-family="Times,serif" font-size="14.00">chr_num</text> +<text text-anchor="start" x="4275.5" y="-910.8" font-family="Times,serif" font-size="14.00">Comments</text> +<text text-anchor="start" x="4306.5" y="-889.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4263" y="-868.8" font-family="Times,serif" font-size="14.00">Marker_Name</text> +<text text-anchor="start" x="4302" y="-847.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="4280.5" y="-826.8" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="4280.5" y="-805.8" font-family="Times,serif" font-size="14.00">Mb_mm8</text> +<text 
text-anchor="start" x="4292.5" y="-784.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4279" y="-763.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<text text-anchor="start" x="4289" y="-742.8" font-family="Times,serif" font-size="14.00">Source</text> +<text text-anchor="start" x="4284.5" y="-721.8" font-family="Times,serif" font-size="14.00">Source2</text> +<text text-anchor="start" x="4279" y="-700.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="4250" y="-679.8" font-family="Times,serif" font-size="14.00">used_by_geno_file</text> +<polygon fill="none" stroke="black" points="4245,-671 4245,-1013 4383,-1013 4383,-671 4245,-671"/> +</g> +<!-- GenoXRef->Geno --> +<g id="edge9" class="edge"> +<title>GenoXRef:GenoId->Geno</title> +<path fill="none" stroke="black" d="M4612,-3261C4626.31,-3261 4580.57,-1213.56 4576,-1200 4540.22,-1093.91 4460.35,-992.99 4398.15,-925.69"/> +<polygon fill="black" stroke="black" points="4400.41,-922.99 4391.03,-918.06 4395.29,-927.76 4400.41,-922.99"/> +</g> +<!-- GenoFreeze --> +<g id="node82" class="node"> +<title>GenoFreeze</title> +<polygon fill="white" stroke="transparent" points="4407,-1855 4407,-2071 4559,-2071 4559,-1855 4407,-1855"/> +<polygon fill="#d7b5d8" stroke="transparent" points="4410,-2047 4410,-2068 4556,-2068 4556,-2047 4410,-2047"/> +<polygon fill="none" stroke="black" points="4410,-2047 4410,-2068 4556,-2068 4556,-2047 4410,-2047"/> +<text text-anchor="start" x="4413" y="-2053.8" font-family="Times,serif" font-size="14.00">GenoFreeze (2 KiB)</text> +<text text-anchor="start" x="4422.5" y="-2031.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="4431.5" y="-2010.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="4441" y="-1989.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" 
x="4448" y="-1968.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="4475.5" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4440" y="-1926.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4461.5" y="-1905.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4461" y="-1884.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="4442" y="-1863.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<polygon fill="none" stroke="black" points="4407,-1855 4407,-2071 4559,-2071 4559,-1855 4407,-1855"/> +</g> +<!-- GenoXRef->GenoFreeze --> +<g id="edge8" class="edge"> +<title>GenoXRef:GenoFreezeId->GenoFreeze</title> +<path fill="none" stroke="black" d="M4466,-3282C4346.95,-3282 4432.68,-2411.13 4468.93,-2085.19"/> +<polygon fill="black" stroke="black" points="4472.41,-2085.56 4470.04,-2075.24 4465.45,-2084.79 4472.41,-2085.56"/> +</g> +<!-- TissueProbeSetXRef --> +<g id="node9" class="node"> +<title>TissueProbeSetXRef</title> +<polygon fill="white" stroke="transparent" points="6347,-4748 6347,-5027 6563,-5027 6563,-4748 6347,-4748"/> +<polygon fill="#df65b0" stroke="transparent" points="6350,-5002.5 6350,-5023.5 6560,-5023.5 6560,-5002.5 6350,-5002.5"/> +<polygon fill="none" stroke="black" points="6350,-5002.5 6350,-5023.5 6560,-5023.5 6560,-5002.5 6350,-5002.5"/> +<text text-anchor="start" x="6353" y="-5009.3" font-family="Times,serif" font-size="14.00">TissueProbeSetXRef (9 MiB)</text> +<text text-anchor="start" x="6441.5" y="-4987.3" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="6430.5" y="-4966.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6414.5" y="-4945.3" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="6429" y="-4924.3" 
font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="6443" y="-4903.3" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="6421.5" y="-4882.3" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="6435" y="-4861.3" font-family="Times,serif" font-size="14.00">Mean</text> +<text text-anchor="start" x="6362.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Probe_Target_Description</text> +<text text-anchor="start" x="6415.5" y="-4819.3" font-family="Times,serif" font-size="14.00">ProbesetId</text> +<text text-anchor="start" x="6428" y="-4798.3" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="6367.5" y="-4777.3" font-family="Times,serif" font-size="14.00">TissueProbeSetFreezeId</text> +<text text-anchor="start" x="6419" y="-4756.3" font-family="Times,serif" font-size="14.00">useStatus</text> +<polygon fill="none" stroke="black" points="6347,-4748 6347,-5027 6563,-5027 6563,-4748 6347,-4748"/> +</g> +<!-- TissueProbeSetFreeze --> +<g id="node23" class="node"> +<title>TissueProbeSetFreeze</title> +<polygon fill="white" stroke="transparent" points="4747,-3165 4747,-3423 4977,-3423 4977,-3165 4747,-3165"/> +<polygon fill="#f1eef6" stroke="transparent" points="4750,-3399 4750,-3420 4974,-3420 4974,-3399 4750,-3399"/> +<polygon fill="none" stroke="black" points="4750,-3399 4750,-3420 4974,-3420 4974,-3399 4750,-3399"/> +<text text-anchor="start" x="4753" y="-3405.8" font-family="Times,serif" font-size="14.00">TissueProbeSetFreeze (228 B)</text> +<text text-anchor="start" x="4801.5" y="-3383.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="4840" y="-3362.8" font-family="Times,serif" font-size="14.00">AvgID</text> +<text text-anchor="start" x="4810.5" y="-3341.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="4820" y="-3320.8" 
font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="4827" y="-3299.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="4854.5" y="-3278.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4840.5" y="-3257.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4836" y="-3236.8" font-family="Times,serif" font-size="14.00">Name2</text> +<text text-anchor="start" x="4840" y="-3215.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="4821" y="-3194.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<text text-anchor="start" x="4786.5" y="-3173.8" font-family="Times,serif" font-size="14.00">TissueProbeFreezeId</text> +<polygon fill="none" stroke="black" points="4747,-3165 4747,-3423 4977,-3423 4977,-3165 4747,-3165"/> +</g> +<!-- TissueProbeSetXRef->TissueProbeSetFreeze --> +<g id="edge11" class="edge"> +<title>TissueProbeSetXRef:TissueProbeSetFreezeId->TissueProbeSetFreeze</title> +<path fill="none" stroke="black" d="M6349,-4780.5C5901.77,-4780.5 6243.92,-4188.23 5938,-3862 5667.77,-3573.83 5217.81,-3404.02 4995.17,-3333.49"/> +<polygon fill="black" stroke="black" points="4995.98,-3330.08 4985.39,-3330.41 4993.88,-3336.75 4995.98,-3330.08"/> +</g> +<!-- ProbeSE --> +<g id="node78" class="node"> +<title>ProbeSE</title> +<polygon fill="white" stroke="transparent" points="6992,-1918 6992,-2008 7122,-2008 7122,-1918 6992,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="6995,-1984 6995,-2005 7119,-2005 7119,-1984 6995,-1984"/> +<polygon fill="none" stroke="black" points="6995,-1984 6995,-2005 7119,-2005 7119,-1984 6995,-1984"/> +<text text-anchor="start" x="6998" y="-1990.8" font-family="Times,serif" font-size="14.00">ProbeSE (3 GiB)</text> +<text text-anchor="start" x="7032.5" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text 
text-anchor="start" x="7038.5" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="7027.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6992,-1918 6992,-2008 7122,-2008 7122,-1918 6992,-1918"/> +</g> +<!-- TissueProbeSetXRef->ProbeSE --> +<g id="edge10" class="edge"> +<title>TissueProbeSetXRef:ProbesetId->ProbeSE</title> +<path fill="none" stroke="black" d="M6561,-4822.5C6998.45,-4822.5 6458.97,-4163.43 6776,-3862 6844.63,-3796.75 6923.59,-3897.22 6986,-3826 7107.35,-3687.52 7069.01,-2322.6 7059.04,-2022.25"/> +<polygon fill="black" stroke="black" points="7062.53,-2021.9 7058.7,-2012.02 7055.54,-2022.13 7062.53,-2021.9"/> +</g> +<!-- Homologene --> +<g id="node10" class="node"> +<title>Homologene</title> +<polygon fill="white" stroke="transparent" points="7895,-4842.5 7895,-4932.5 8055,-4932.5 8055,-4842.5 7895,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="7898,-4908.5 7898,-4929.5 8052,-4929.5 8052,-4908.5 7898,-4908.5"/> +<polygon fill="none" stroke="black" points="7898,-4908.5 7898,-4929.5 8052,-4929.5 8052,-4908.5 7898,-4908.5"/> +<text text-anchor="start" x="7901" y="-4915.3" font-family="Times,serif" font-size="14.00">Homologene (3 MiB)</text> +<text text-anchor="start" x="7949" y="-4893.3" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="7923" y="-4872.3" font-family="Times,serif" font-size="14.00">HomologeneId</text> +<text text-anchor="start" x="7931.5" y="-4851.3" font-family="Times,serif" font-size="14.00">TaxonomyId</text> +<polygon fill="none" stroke="black" points="7895,-4842.5 7895,-4932.5 8055,-4932.5 8055,-4842.5 7895,-4842.5"/> +</g> +<!-- PublishData --> +<g id="node11" class="node"> +<title>PublishData</title> +<polygon fill="white" stroke="transparent" points="5091,-1918 5091,-2008 5257,-2008 5257,-1918 5091,-1918"/> +<polygon fill="#df65b0" stroke="transparent" 
points="5094,-1984 5094,-2005 5254,-2005 5254,-1984 5094,-1984"/> +<polygon fill="none" stroke="black" points="5094,-1984 5094,-2005 5254,-2005 5254,-1984 5094,-1984"/> +<text text-anchor="start" x="5097" y="-1990.8" font-family="Times,serif" font-size="14.00">PublishData (34 MiB)</text> +<text text-anchor="start" x="5166.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="5144.5" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="5154.5" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="5091,-1918 5091,-2008 5257,-2008 5257,-1918 5091,-1918"/> +</g> +<!-- PublishData->Strain --> +<g id="edge12" class="edge"> +<title>PublishData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5255,-1951C5275.87,-1951 5264.11,-1218.38 5274,-1200 5368.85,-1023.7 5593.45,-915.93 5711.13,-869.6"/> +<polygon fill="black" stroke="black" points="5712.4,-872.86 5720.45,-865.97 5709.86,-866.34 5712.4,-872.86"/> +</g> +<!-- ProbeSetXRef --> +<g id="node12" class="node"> +<title>ProbeSetXRef</title> +<polygon fill="white" stroke="transparent" points="3033.5,-4737.5 3033.5,-5037.5 3200.5,-5037.5 3200.5,-4737.5 3033.5,-4737.5"/> +<polygon fill="#ce1256" stroke="transparent" points="3037,-5013.5 3037,-5034.5 3198,-5034.5 3198,-5013.5 3037,-5013.5"/> +<polygon fill="none" stroke="black" points="3037,-5013.5 3037,-5034.5 3198,-5034.5 3198,-5013.5 3037,-5013.5"/> +<text text-anchor="start" x="3040" y="-5020.3" font-family="Times,serif" font-size="14.00">ProbeSetXRef (2 GiB)</text> +<text text-anchor="start" x="3088.5" y="-4998.3" font-family="Times,serif" font-size="14.00">additive</text> +<text text-anchor="start" x="3093" y="-4977.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="3108" y="-4956.3" font-family="Times,serif" font-size="14.00">h2</text> +<text text-anchor="start" 
x="3096.5" y="-4935.3" font-family="Times,serif" font-size="14.00">Locus</text> +<text text-anchor="start" x="3082.5" y="-4914.3" font-family="Times,serif" font-size="14.00">Locus_old</text> +<text text-anchor="start" x="3102.5" y="-4893.3" font-family="Times,serif" font-size="14.00">LRS</text> +<text text-anchor="start" x="3088.5" y="-4872.3" font-family="Times,serif" font-size="14.00">LRS_old</text> +<text text-anchor="start" x="3097.5" y="-4851.3" font-family="Times,serif" font-size="14.00">mean</text> +<text text-anchor="start" x="3052.5" y="-4830.3" font-family="Times,serif" font-size="14.00">ProbeSetFreezeId</text> +<text text-anchor="start" x="3077" y="-4809.3" font-family="Times,serif" font-size="14.00">ProbeSetId</text> +<text text-anchor="start" x="3093" y="-4788.3" font-family="Times,serif" font-size="14.00">pValue</text> +<text text-anchor="start" x="3079" y="-4767.3" font-family="Times,serif" font-size="14.00">pValue_old</text> +<text text-anchor="start" x="3109.5" y="-4746.3" font-family="Times,serif" font-size="14.00">se</text> +<polygon fill="none" stroke="black" points="3033.5,-4737.5 3033.5,-5037.5 3200.5,-5037.5 3200.5,-4737.5 3033.5,-4737.5"/> +</g> +<!-- ProbeSetXRef->ProbeSE --> +<g id="edge14" class="edge"> +<title>ProbeSetXRef:ProbeSetId->ProbeSE</title> +<path fill="none" stroke="black" d="M3199,-4812.5C4021.93,-4812.5 3996.77,-4088.2 4788,-3862 4841.88,-3846.6 6765.02,-3865.27 6805,-3826 6889.39,-3743.1 6769.62,-2854.79 6843,-2762 6880.46,-2714.64 6934.85,-2771.97 6974,-2726 7149.11,-2520.43 7098.76,-2161.98 7070.36,-2022.18"/> +<polygon fill="black" stroke="black" points="7073.73,-2021.18 7068.27,-2012.1 7066.87,-2022.6 7073.73,-2021.18"/> +</g> +<!-- ProbeSetFreeze --> +<g id="node90" class="node"> +<title>ProbeSetFreeze</title> +<polygon fill="white" stroke="transparent" points="2639.5,-3144 2639.5,-3444 2838.5,-3444 2838.5,-3144 2639.5,-3144"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2643,-3420 2643,-3441 2836,-3441 
2836,-3420 2643,-3420"/> +<polygon fill="none" stroke="black" points="2643,-3420 2643,-3441 2836,-3441 2836,-3420 2643,-3420"/> +<text text-anchor="start" x="2646" y="-3426.8" font-family="Times,serif" font-size="14.00">ProbeSetFreeze (171 KiB)</text> +<text text-anchor="start" x="2679" y="-3404.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="2717.5" y="-3383.8" font-family="Times,serif" font-size="14.00">AvgID</text> +<text text-anchor="start" x="2688" y="-3362.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="2697.5" y="-3341.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="2703" y="-3320.8" font-family="Times,serif" font-size="14.00">DataScale</text> +<text text-anchor="start" x="2704.5" y="-3299.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="2732" y="-3278.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2718" y="-3257.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2713.5" y="-3236.8" font-family="Times,serif" font-size="14.00">Name2</text> +<text text-anchor="start" x="2704.5" y="-3215.8" font-family="Times,serif" font-size="14.00">OrderList</text> +<text text-anchor="start" x="2686.5" y="-3194.8" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="2717.5" y="-3173.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="2698.5" y="-3152.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<polygon fill="none" stroke="black" points="2639.5,-3144 2639.5,-3444 2838.5,-3444 2838.5,-3144 2639.5,-3144"/> +</g> +<!-- ProbeSetXRef->ProbeSetFreeze --> +<g id="edge13" class="edge"> +<title>ProbeSetXRef:ProbeSetFreezeId->ProbeSetFreeze</title> +<path fill="none" stroke="black" d="M3036,-4833.5C2816.79,-4833.5 
2907.79,-4076.99 2865,-3862 2837.79,-3725.3 2803.24,-3570.92 2777.19,-3457.81"/> +<polygon fill="black" stroke="black" points="2780.6,-3456.98 2774.94,-3448.03 2773.77,-3458.56 2780.6,-3456.98"/> +</g> +<!-- TraitMetadata --> +<g id="node13" class="node"> +<title>TraitMetadata</title> +<polygon fill="white" stroke="transparent" points="8089,-4853 8089,-4922 8267,-4922 8267,-4853 8089,-4853"/> +<polygon fill="#d7b5d8" stroke="transparent" points="8092,-4897.5 8092,-4918.5 8264,-4918.5 8264,-4897.5 8092,-4897.5"/> +<polygon fill="none" stroke="black" points="8092,-4897.5 8092,-4918.5 8264,-4918.5 8264,-4897.5 8092,-4897.5"/> +<text text-anchor="start" x="8095" y="-4904.3" font-family="Times,serif" font-size="14.00">TraitMetadata (16 KiB)</text> +<text text-anchor="start" x="8162" y="-4882.3" font-family="Times,serif" font-size="14.00">type</text> +<text text-anchor="start" x="8158.5" y="-4861.3" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="8089,-4853 8089,-4922 8267,-4922 8267,-4853 8089,-4853"/> +</g> +<!-- TissueProbeSetData --> +<g id="node14" class="node"> +<title>TissueProbeSetData</title> +<polygon fill="white" stroke="transparent" points="2313.5,-1918 2313.5,-2008 2538.5,-2008 2538.5,-1918 2313.5,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="2317,-1984 2317,-2005 2536,-2005 2536,-1984 2317,-1984"/> +<polygon fill="none" stroke="black" points="2317,-1984 2317,-2005 2536,-2005 2536,-1984 2317,-1984"/> +<text text-anchor="start" x="2320" y="-1990.8" font-family="Times,serif" font-size="14.00">TissueProbeSetData (33 MiB)</text> +<text text-anchor="start" x="2419" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2395" y="-1947.8" font-family="Times,serif" font-size="14.00">TissueID</text> +<text text-anchor="start" x="2407" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" 
points="2313.5,-1918 2313.5,-2008 2538.5,-2008 2538.5,-1918 2313.5,-1918"/> +</g> +<!-- Tissue --> +<g id="node79" class="node"> +<title>Tissue</title> +<polygon fill="lightgrey" stroke="transparent" points="2372.5,-755 2372.5,-929 2497.5,-929 2497.5,-755 2372.5,-755"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2376,-905 2376,-926 2495,-926 2495,-905 2376,-905"/> +<polygon fill="none" stroke="black" points="2376,-905 2376,-926 2495,-926 2495,-905 2376,-905"/> +<text text-anchor="start" x="2381" y="-911.8" font-family="Times,serif" font-size="14.00">Tissue (11 KiB)</text> +<text text-anchor="start" x="2390.5" y="-889.8" font-family="Times,serif" font-size="14.00">BIRN_lex_ID</text> +<text text-anchor="start" x="2378" y="-868.8" font-family="Times,serif" font-size="14.00">BIRN_lex_Name</text> +<text text-anchor="start" x="2428" y="-847.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="2376,-821 2376,-840 2495,-840 2495,-821 2376,-821"/> +<text text-anchor="start" x="2414" y="-826.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="green" stroke="transparent" points="2376,-800 2376,-819 2495,-819 2495,-800 2376,-800"/> +<text text-anchor="start" x="2391" y="-805.8" font-family="Times,serif" font-size="14.00">Short_Name</text> +<text text-anchor="start" x="2405" y="-784.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<text text-anchor="start" x="2391.5" y="-763.8" font-family="Times,serif" font-size="14.00">TissueName</text> +<polygon fill="none" stroke="black" points="2372.5,-755 2372.5,-929 2497.5,-929 2497.5,-755 2372.5,-755"/> +</g> +<!-- TissueProbeSetData->Tissue --> +<g id="edge15" class="edge"> +<title>TissueProbeSetData:TissueID->Tissue</title> +<path fill="none" stroke="black" d="M2537,-1951C2587.33,-1951 2488.08,-1216.42 2449.46,-943.5"/> +<polygon fill="black" stroke="black" points="2452.87,-942.61 2448,-933.2 2445.94,-943.59 2452.87,-942.61"/> 
+</g> +<!-- DBType --> +<g id="node15" class="node"> +<title>DBType</title> +<polygon fill="white" stroke="transparent" points="8304.5,-3259.5 8304.5,-3328.5 8421.5,-3328.5 8421.5,-3259.5 8304.5,-3259.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="8308,-3304 8308,-3325 8419,-3325 8419,-3304 8308,-3304"/> +<polygon fill="none" stroke="black" points="8308,-3304 8308,-3325 8419,-3325 8419,-3304 8308,-3304"/> +<text text-anchor="start" x="8311" y="-3310.8" font-family="Times,serif" font-size="14.00">DBType (99 B)</text> +<text text-anchor="start" x="8356" y="-3288.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="8342" y="-3267.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="8304.5,-3259.5 8304.5,-3328.5 8421.5,-3328.5 8421.5,-3259.5 8304.5,-3259.5"/> +</g> +<!-- DatasetStatus --> +<g id="node20" class="node"> +<title>DatasetStatus</title> +<polygon fill="lightgrey" stroke="transparent" points="305.5,-264 305.5,-333 468.5,-333 468.5,-264 305.5,-264"/> +<polygon fill="#f1eef6" stroke="transparent" points="309,-308.5 309,-329.5 466,-329.5 466,-308.5 309,-308.5"/> +<polygon fill="none" stroke="black" points="309,-308.5 309,-329.5 466,-329.5 466,-308.5 309,-308.5"/> +<text text-anchor="start" x="312" y="-315.3" font-family="Times,serif" font-size="14.00">DatasetStatus (40 B)</text> +<text text-anchor="start" x="329" y="-293.3" font-family="Times,serif" font-size="14.00">DatasetStatusId</text> +<polygon fill="green" stroke="transparent" points="309,-266.5 309,-285.5 466,-285.5 466,-266.5 309,-266.5"/> +<text text-anchor="start" x="315" y="-272.3" font-family="Times,serif" font-size="14.00">DatasetStatusName</text> +<polygon fill="none" stroke="black" points="305.5,-264 305.5,-333 468.5,-333 468.5,-264 305.5,-264"/> +</g> +<!-- Datasets->DatasetStatus --> +<g id="edge16" class="edge"> +<title>Datasets:DatasetStatusId->DatasetStatus</title> +<path fill="none" 
stroke="black" d="M467,-798C557.78,-798 449.28,-471.63 404.55,-347.04"/> +<polygon fill="black" stroke="black" points="407.75,-345.6 401.06,-337.38 401.16,-347.97 407.75,-345.6"/> +</g> +<!-- Investigators --> +<g id="node71" class="node"> +<title>Investigators</title> +<polygon fill="lightgrey" stroke="transparent" points="88,-117 88,-480 258,-480 258,-117 88,-117"/> +<polygon fill="#d7b5d8" stroke="transparent" points="91,-455.5 91,-476.5 255,-476.5 255,-455.5 91,-455.5"/> +<polygon fill="none" stroke="black" points="91,-455.5 91,-476.5 255,-476.5 255,-455.5 91,-455.5"/> +<text text-anchor="start" x="94" y="-462.3" font-family="Times,serif" font-size="14.00">Investigators (22 KiB)</text> +<polygon fill="green" stroke="transparent" points="91,-434.5 91,-453.5 255,-453.5 255,-434.5 91,-434.5"/> +<text text-anchor="start" x="144" y="-440.3" font-family="Times,serif" font-size="14.00">Address</text> +<polygon fill="green" stroke="transparent" points="91,-413.5 91,-432.5 255,-432.5 255,-413.5 91,-413.5"/> +<text text-anchor="start" x="158" y="-419.3" font-family="Times,serif" font-size="14.00">City</text> +<polygon fill="green" stroke="transparent" points="91,-392.5 91,-411.5 255,-411.5 255,-392.5 91,-392.5"/> +<text text-anchor="start" x="144" y="-398.3" font-family="Times,serif" font-size="14.00">Country</text> +<polygon fill="green" stroke="transparent" points="91,-371.5 91,-390.5 255,-390.5 255,-371.5 91,-371.5"/> +<text text-anchor="start" x="152" y="-377.3" font-family="Times,serif" font-size="14.00">Email</text> +<polygon fill="green" stroke="transparent" points="91,-350.5 91,-369.5 255,-369.5 255,-350.5 91,-350.5"/> +<text text-anchor="start" x="134.5" y="-356.3" font-family="Times,serif" font-size="14.00">FirstName</text> +<text text-anchor="start" x="122" y="-335.3" font-family="Times,serif" font-size="14.00">InvestigatorId</text> +<polygon fill="green" stroke="transparent" points="91,-308.5 91,-327.5 255,-327.5 255,-308.5 91,-308.5"/> +<text 
text-anchor="start" x="136.5" y="-314.3" font-family="Times,serif" font-size="14.00">LastName</text> +<text text-anchor="start" x="119.5" y="-293.3" font-family="Times,serif" font-size="14.00">OrganizationId</text> +<polygon fill="green" stroke="transparent" points="91,-266.5 91,-285.5 255,-285.5 255,-266.5 91,-266.5"/> +<text text-anchor="start" x="150.5" y="-272.3" font-family="Times,serif" font-size="14.00">Phone</text> +<polygon fill="green" stroke="transparent" points="91,-245.5 91,-264.5 255,-264.5 255,-245.5 91,-245.5"/> +<text text-anchor="start" x="153.5" y="-251.3" font-family="Times,serif" font-size="14.00">State</text> +<polygon fill="green" stroke="transparent" points="91,-224.5 91,-243.5 255,-243.5 255,-224.5 91,-224.5"/> +<text text-anchor="start" x="161" y="-230.3" font-family="Times,serif" font-size="14.00">Url</text> +<text text-anchor="start" x="138.5" y="-209.3" font-family="Times,serif" font-size="14.00">UserDate</text> +<text text-anchor="start" x="136.5" y="-188.3" font-family="Times,serif" font-size="14.00">UserLevel</text> +<text text-anchor="start" x="134.5" y="-167.3" font-family="Times,serif" font-size="14.00">UserName</text> +<text text-anchor="start" x="139.5" y="-146.3" font-family="Times,serif" font-size="14.00">UserPass</text> +<polygon fill="green" stroke="transparent" points="91,-119.5 91,-138.5 255,-138.5 255,-119.5 91,-119.5"/> +<text text-anchor="start" x="143" y="-125.3" font-family="Times,serif" font-size="14.00">ZipCode</text> +<polygon fill="none" stroke="black" points="88,-117 88,-480 258,-480 258,-117 88,-117"/> +</g> +<!-- Datasets->Investigators --> +<g id="edge17" class="edge"> +<title>Datasets:InvestigatorId->Investigators</title> +<path fill="none" stroke="black" d="M307,-735C252.81,-735 218.24,-610.26 197.82,-494.3"/> +<polygon fill="black" stroke="black" points="201.22,-493.45 196.07,-484.19 194.32,-494.64 201.22,-493.45"/> +</g> +<!-- IndelAll --> +<g id="node17" class="node"> +<title>IndelAll</title> +<polygon 
fill="white" stroke="transparent" points="3168,-692 3168,-992 3302,-992 3302,-692 3168,-692"/> +<polygon fill="#df65b0" stroke="transparent" points="3171,-968 3171,-989 3299,-989 3299,-968 3171,-968"/> +<polygon fill="none" stroke="black" points="3171,-968 3171,-989 3299,-989 3299,-968 3171,-968"/> +<text text-anchor="start" x="3174" y="-974.8" font-family="Times,serif" font-size="14.00">IndelAll (17 MiB)</text> +<text text-anchor="start" x="3188" y="-952.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="3227.5" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3181" y="-910.8" font-family="Times,serif" font-size="14.00">InDelSequence</text> +<text text-anchor="start" x="3206.5" y="-889.8" font-family="Times,serif" font-size="14.00">Mb_end</text> +<text text-anchor="start" x="3185" y="-868.8" font-family="Times,serif" font-size="14.00">Mb_end_2016</text> +<text text-anchor="start" x="3202.5" y="-847.8" font-family="Times,serif" font-size="14.00">Mb_start</text> +<text text-anchor="start" x="3181" y="-826.8" font-family="Times,serif" font-size="14.00">Mb_start_2016</text> +<text text-anchor="start" x="3213.5" y="-805.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3219.5" y="-784.8" font-family="Times,serif" font-size="14.00">Size</text> +<text text-anchor="start" x="3203" y="-763.8" font-family="Times,serif" font-size="14.00">SourceId</text> +<text text-anchor="start" x="3200" y="-742.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="3210.5" y="-721.8" font-family="Times,serif" font-size="14.00">Strand</text> +<text text-anchor="start" x="3217.5" y="-700.8" font-family="Times,serif" font-size="14.00">Type</text> +<polygon fill="none" stroke="black" points="3168,-692 3168,-992 3302,-992 3302,-692 3168,-692"/> +</g> +<!-- IndelAll->Species --> +<g id="edge18" class="edge"> 
+<title>IndelAll:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M3170,-746C3144.8,-746 3164.16,-541.49 3151,-520 3088.71,-418.27 2960,-356.26 2875.88,-324.91"/> +<polygon fill="black" stroke="black" points="2876.95,-321.58 2866.36,-321.42 2874.55,-328.15 2876.95,-321.58"/> +</g> +<!-- GORef --> +<g id="node18" class="node"> +<title>GORef</title> +<polygon fill="white" stroke="transparent" points="8459.5,-4842.5 8459.5,-4932.5 8576.5,-4932.5 8576.5,-4842.5 8459.5,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="8463,-4908.5 8463,-4929.5 8574,-4929.5 8574,-4908.5 8463,-4908.5"/> +<polygon fill="none" stroke="black" points="8463,-4908.5 8463,-4929.5 8574,-4929.5 8574,-4908.5 8463,-4908.5"/> +<text text-anchor="start" x="8466" y="-4915.3" font-family="Times,serif" font-size="14.00">GORef (2 MiB)</text> +<text text-anchor="start" x="8497" y="-4893.3" font-family="Times,serif" font-size="14.00">genes</text> +<text text-anchor="start" x="8492.5" y="-4872.3" font-family="Times,serif" font-size="14.00">goterm</text> +<text text-anchor="start" x="8511.5" y="-4851.3" font-family="Times,serif" font-size="14.00">id</text> +<polygon fill="none" stroke="black" points="8459.5,-4842.5 8459.5,-4932.5 8576.5,-4932.5 8576.5,-4842.5 8459.5,-4842.5"/> +</g> +<!-- Publication --> +<g id="node19" class="node"> +<title>Publication</title> +<polygon fill="lightgrey" stroke="transparent" points="2531.5,-723.5 2531.5,-960.5 2682.5,-960.5 2682.5,-723.5 2531.5,-723.5"/> +<polygon fill="#df65b0" stroke="transparent" points="2535,-936 2535,-957 2680,-957 2680,-936 2535,-936"/> +<polygon fill="none" stroke="black" points="2535,-936 2535,-957 2680,-957 2680,-936 2535,-936"/> +<text text-anchor="start" x="2538" y="-942.8" font-family="Times,serif" font-size="14.00">Publication (7 MiB)</text> +<polygon fill="green" stroke="transparent" points="2535,-915 2535,-934 2680,-934 2680,-915 2535,-915"/> +<text text-anchor="start" x="2577" y="-920.8" font-family="Times,serif" 
font-size="14.00">Abstract</text> +<polygon fill="green" stroke="transparent" points="2535,-894 2535,-913 2680,-913 2680,-894 2535,-894"/> +<text text-anchor="start" x="2579" y="-899.8" font-family="Times,serif" font-size="14.00">Authors</text> +<polygon fill="green" stroke="transparent" points="2535,-873 2535,-892 2680,-892 2680,-873 2535,-873"/> +<text text-anchor="start" x="2581.5" y="-878.8" font-family="Times,serif" font-size="14.00">Journal</text> +<polygon fill="green" stroke="transparent" points="2535,-852 2535,-871 2680,-871 2680,-852 2535,-852"/> +<text text-anchor="start" x="2584" y="-857.8" font-family="Times,serif" font-size="14.00">Month</text> +<polygon fill="green" stroke="transparent" points="2535,-831 2535,-850 2680,-850 2680,-831 2535,-831"/> +<text text-anchor="start" x="2586" y="-836.8" font-family="Times,serif" font-size="14.00">Pages</text> +<polygon fill="green" stroke="transparent" points="2535,-810 2535,-829 2680,-829 2680,-810 2535,-810"/> +<text text-anchor="start" x="2566" y="-815.8" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<polygon fill="green" stroke="transparent" points="2535,-789 2535,-808 2680,-808 2680,-789 2535,-789"/> +<text text-anchor="start" x="2591" y="-794.8" font-family="Times,serif" font-size="14.00">Title</text> +<polygon fill="green" stroke="transparent" points="2535,-768 2535,-787 2680,-787 2680,-768 2535,-768"/> +<text text-anchor="start" x="2581" y="-773.8" font-family="Times,serif" font-size="14.00">Volume</text> +<polygon fill="green" stroke="transparent" points="2535,-747 2535,-766 2680,-766 2680,-747 2535,-747"/> +<text text-anchor="start" x="2591.5" y="-752.8" font-family="Times,serif" font-size="14.00">Year</text> +<text text-anchor="start" x="2600" y="-731.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="none" stroke="black" points="2531.5,-723.5 2531.5,-960.5 2682.5,-960.5 2682.5,-723.5 2531.5,-723.5"/> +</g> +<!-- PublishFreeze --> +<g id="node21" class="node"> 
+<title>PublishFreeze</title> +<polygon fill="white" stroke="transparent" points="3246.5,-1855 3246.5,-2071 3415.5,-2071 3415.5,-1855 3246.5,-1855"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3250,-2047 3250,-2068 3413,-2068 3413,-2047 3250,-2047"/> +<polygon fill="none" stroke="black" points="3250,-2047 3250,-2068 3413,-2068 3413,-2047 3250,-2047"/> +<text text-anchor="start" x="3253" y="-2053.8" font-family="Times,serif" font-size="14.00">PublishFreeze (6 KiB)</text> +<text text-anchor="start" x="3271" y="-2031.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="3280" y="-2010.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="3289.5" y="-1989.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="3296.5" y="-1968.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="3324" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3288.5" y="-1926.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="3310" y="-1905.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3309.5" y="-1884.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="3290.5" y="-1863.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<polygon fill="none" stroke="black" points="3246.5,-1855 3246.5,-2071 3415.5,-2071 3415.5,-1855 3246.5,-1855"/> +</g> +<!-- InbredSet --> +<g id="node28" class="node"> +<title>InbredSet</title> +<polygon fill="lightgrey" stroke="transparent" points="3781.5,-692 3781.5,-992 3928.5,-992 3928.5,-692 3781.5,-692"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3785,-968 3785,-989 3926,-989 3926,-968 3785,-968"/> +<polygon fill="none" stroke="black" points="3785,-968 3785,-989 3926,-989 3926,-968 3785,-968"/> 
+<text text-anchor="start" x="3788" y="-974.8" font-family="Times,serif" font-size="14.00">InbredSet (10 KiB)</text> +<text text-anchor="start" x="3810" y="-952.8" font-family="Times,serif" font-size="14.00">FamilyOrder</text> +<text text-anchor="start" x="3848" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3801.5" y="-910.8" font-family="Times,serif" font-size="14.00">InbredSetCode</text> +<text text-anchor="start" x="3812.5" y="-889.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="3798.5" y="-868.8" font-family="Times,serif" font-size="14.00">InbredSetName</text> +<text text-anchor="start" x="3789" y="-847.8" font-family="Times,serif" font-size="14.00">MappingMethodId</text> +<text text-anchor="start" x="3807" y="-826.8" font-family="Times,serif" font-size="14.00">MenuOrderId</text> +<polygon fill="green" stroke="transparent" points="3785,-800 3785,-819 3926,-819 3926,-800 3785,-800"/> +<text text-anchor="start" x="3834" y="-805.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3833.5" y="-784.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="3820.5" y="-763.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="green" stroke="transparent" points="3785,-737 3785,-756 3926,-756 3926,-737 3785,-737"/> +<text text-anchor="start" x="3831" y="-742.8" font-family="Times,serif" font-size="14.00">Family</text> +<polygon fill="green" stroke="transparent" points="3785,-716 3785,-735 3926,-735 3926,-716 3785,-716"/> +<text text-anchor="start" x="3820.5" y="-721.8" font-family="Times,serif" font-size="14.00">FullName</text> +<polygon fill="green" stroke="transparent" points="3785,-695 3785,-714 3926,-714 3926,-695 3785,-695"/> +<text text-anchor="start" x="3810.5" y="-700.8" font-family="Times,serif" font-size="14.00">GeneticType</text> +<polygon fill="none" stroke="black" 
points="3781.5,-692 3781.5,-992 3928.5,-992 3928.5,-692 3781.5,-692"/> +</g> +<!-- PublishFreeze->InbredSet --> +<g id="edge19" class="edge"> +<title>PublishFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M3414,-1930C3454.58,-1930 3409.48,-1229.81 3437,-1200 3485.84,-1147.1 3703.73,-1210.15 3759,-1164 3805.64,-1125.05 3830.2,-1064.45 3842.93,-1006.34"/> +<polygon fill="black" stroke="black" points="3846.42,-1006.79 3845.03,-996.28 3839.56,-1005.36 3846.42,-1006.79"/> +</g> +<!-- TissueProbeFreeze --> +<g id="node22" class="node"> +<title>TissueProbeFreeze</title> +<polygon fill="white" stroke="transparent" points="4631,-1865.5 4631,-2060.5 4837,-2060.5 4837,-1865.5 4631,-1865.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="4634,-2036 4634,-2057 4834,-2057 4834,-2036 4634,-2036"/> +<polygon fill="none" stroke="black" points="4634,-2036 4634,-2057 4834,-2057 4834,-2036 4634,-2036"/> +<text text-anchor="start" x="4637" y="-2042.8" font-family="Times,serif" font-size="14.00">TissueProbeFreeze (116 B)</text> +<text text-anchor="start" x="4710" y="-2020.8" font-family="Times,serif" font-size="14.00">ChipId</text> +<text text-anchor="start" x="4692" y="-1999.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="4699" y="-1978.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="4726.5" y="-1957.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4691" y="-1936.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4712.5" y="-1915.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4693" y="-1894.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<text text-anchor="start" x="4704.5" y="-1873.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="4631,-1865.5 4631,-2060.5 
4837,-2060.5 4837,-1865.5 4631,-1865.5"/> +</g> +<!-- TissueProbeFreeze->InbredSet --> +<g id="edge20" class="edge"> +<title>TissueProbeFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4633,-1940C4550.53,-1940 4633.54,-1259.07 4576,-1200 4521.75,-1144.31 4299.4,-1194.77 4228,-1164 4116.11,-1115.79 4013.14,-1021.68 3943.86,-947.77"/> +<polygon fill="black" stroke="black" points="3946.22,-945.17 3936.85,-940.23 3941.1,-949.94 3946.22,-945.17"/> +</g> +<!-- TissueProbeSetFreeze->TissueProbeFreeze --> +<g id="edge21" class="edge"> +<title>TissueProbeSetFreeze:TissueProbeFreezeId->TissueProbeFreeze</title> +<path fill="none" stroke="black" d="M4862,-3167C4862,-2762.54 4789.57,-2285.87 4753.68,-2074.48"/> +<polygon fill="black" stroke="black" points="4757.13,-2073.88 4752,-2064.61 4750.23,-2075.06 4757.13,-2073.88"/> +</g> +<!-- ProbeXRef --> +<g id="node24" class="node"> +<title>ProbeXRef</title> +<polygon fill="white" stroke="transparent" points="4805,-4842.5 4805,-4932.5 4969,-4932.5 4969,-4842.5 4805,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="4808,-4908.5 4808,-4929.5 4966,-4929.5 4966,-4908.5 4808,-4908.5"/> +<polygon fill="none" stroke="black" points="4808,-4908.5 4808,-4929.5 4966,-4929.5 4966,-4908.5 4808,-4908.5"/> +<text text-anchor="start" x="4811" y="-4915.3" font-family="Times,serif" font-size="14.00">ProbeXRef (229 MiB)</text> +<text text-anchor="start" x="4862.5" y="-4893.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="4834" y="-4872.3" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="4858.5" y="-4851.3" font-family="Times,serif" font-size="14.00">ProbeId</text> +<polygon fill="none" stroke="black" points="4805,-4842.5 4805,-4932.5 4969,-4932.5 4969,-4842.5 4805,-4842.5"/> +</g> +<!-- Probe --> +<g id="node41" class="node"> +<title>Probe</title> +<polygon fill="white" stroke="transparent" points="6860.5,-3186 
6860.5,-3402 6969.5,-3402 6969.5,-3186 6860.5,-3186"/> +<polygon fill="#ce1256" stroke="transparent" points="6864,-3378 6864,-3399 6967,-3399 6967,-3378 6864,-3378"/> +<polygon fill="none" stroke="black" points="6864,-3378 6864,-3399 6967,-3399 6967,-3378 6864,-3378"/> +<text text-anchor="start" x="6867" y="-3384.8" font-family="Times,serif" font-size="14.00">Probe (2 GiB)</text> +<text text-anchor="start" x="6891" y="-3362.8" font-family="Times,serif" font-size="14.00">E_GSB</text> +<text text-anchor="start" x="6890.5" y="-3341.8" font-family="Times,serif" font-size="14.00">E_NSB</text> +<text text-anchor="start" x="6887" y="-3320.8" font-family="Times,serif" font-size="14.00">ExonNo</text> +<text text-anchor="start" x="6908" y="-3299.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6894" y="-3278.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="6875" y="-3257.8" font-family="Times,serif" font-size="14.00">ProbeSetId</text> +<text text-anchor="start" x="6880.5" y="-3236.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<text text-anchor="start" x="6873" y="-3215.8" font-family="Times,serif" font-size="14.00">SerialOrder</text> +<text text-anchor="start" x="6904" y="-3194.8" font-family="Times,serif" font-size="14.00">Tm</text> +<polygon fill="none" stroke="black" points="6860.5,-3186 6860.5,-3402 6969.5,-3402 6969.5,-3186 6860.5,-3186"/> +</g> +<!-- ProbeXRef->Probe --> +<g id="edge23" class="edge"> +<title>ProbeXRef:ProbeId->Probe</title> +<path fill="none" stroke="black" d="M4967,-4854.5C5534.68,-4854.5 5262.79,-4114.96 5771,-3862 5877.2,-3809.14 6749.63,-3905.13 6838,-3826 6950.47,-3725.29 6951.4,-3539.28 6936.93,-3416.33"/> +<polygon fill="black" stroke="black" points="6940.37,-3415.61 6935.68,-3406.11 6933.42,-3416.47 6940.37,-3415.61"/> +</g> +<!-- ProbeXRef->ProbeFreeze --> +<g id="edge22" class="edge"> +<title>ProbeXRef:ProbeFreezeId->ProbeFreeze</title> +<path 
fill="none" stroke="black" d="M4807,-4875.5C3968.98,-4875.5 3960.35,-4248.91 3217,-3862 3179.88,-3842.68 3157.46,-3857.58 3130,-3826 2809.52,-3457.41 3148.75,-3152.22 2855,-2762 2836.07,-2736.85 2811.36,-2752.26 2794,-2726 2665.13,-2531.04 2665.79,-2246.15 2679.06,-2085.66"/> +<polygon fill="black" stroke="black" points="2682.59,-2085.53 2679.95,-2075.27 2675.61,-2084.93 2682.59,-2085.53"/> +</g> +<!-- Publication_Test --> +<g id="node25" class="node"> +<title>Publication_Test</title> +<polygon fill="white" stroke="transparent" points="8610.5,-4769 8610.5,-5006 8797.5,-5006 8797.5,-4769 8610.5,-4769"/> +<polygon fill="#df65b0" stroke="transparent" points="8614,-4981.5 8614,-5002.5 8795,-5002.5 8795,-4981.5 8614,-4981.5"/> +<polygon fill="none" stroke="black" points="8614,-4981.5 8614,-5002.5 8795,-5002.5 8795,-4981.5 8614,-4981.5"/> +<text text-anchor="start" x="8617" y="-4988.3" font-family="Times,serif" font-size="14.00">Publication_Test (7 MiB)</text> +<text text-anchor="start" x="8674" y="-4966.3" font-family="Times,serif" font-size="14.00">Abstract</text> +<text text-anchor="start" x="8676" y="-4945.3" font-family="Times,serif" font-size="14.00">Authors</text> +<text text-anchor="start" x="8697" y="-4924.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="8678.5" y="-4903.3" font-family="Times,serif" font-size="14.00">Journal</text> +<text text-anchor="start" x="8681" y="-4882.3" font-family="Times,serif" font-size="14.00">Month</text> +<text text-anchor="start" x="8683" y="-4861.3" font-family="Times,serif" font-size="14.00">Pages</text> +<text text-anchor="start" x="8663" y="-4840.3" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<text text-anchor="start" x="8688" y="-4819.3" font-family="Times,serif" font-size="14.00">Title</text> +<text text-anchor="start" x="8678" y="-4798.3" font-family="Times,serif" font-size="14.00">Volume</text> +<text text-anchor="start" x="8688.5" y="-4777.3" 
font-family="Times,serif" font-size="14.00">Year</text> +<polygon fill="none" stroke="black" points="8610.5,-4769 8610.5,-5006 8797.5,-5006 8797.5,-4769 8610.5,-4769"/> +</g> +<!-- DBList --> +<g id="node26" class="node"> +<title>DBList</title> +<polygon fill="white" stroke="transparent" points="8301,-4821.5 8301,-4953.5 8425,-4953.5 8425,-4821.5 8301,-4821.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="8304,-4929.5 8304,-4950.5 8422,-4950.5 8422,-4929.5 8304,-4929.5"/> +<polygon fill="none" stroke="black" points="8304,-4929.5 8304,-4950.5 8422,-4950.5 8422,-4929.5 8304,-4929.5"/> +<text text-anchor="start" x="8307" y="-4936.3" font-family="Times,serif" font-size="14.00">DBList (99 KiB)</text> +<text text-anchor="start" x="8344.5" y="-4914.3" font-family="Times,serif" font-size="14.00">Code</text> +<text text-anchor="start" x="8327.5" y="-4893.3" font-family="Times,serif" font-size="14.00">DBTypeId</text> +<text text-anchor="start" x="8331" y="-4872.3" font-family="Times,serif" font-size="14.00">FreezeId</text> +<text text-anchor="start" x="8355.5" y="-4851.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="8341.5" y="-4830.3" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="8301,-4821.5 8301,-4953.5 8425,-4953.5 8425,-4821.5 8301,-4821.5"/> +</g> +<!-- DBList->DBType --> +<g id="edge24" class="edge"> +<title>DBList:DBTypeId->DBType</title> +<path fill="none" stroke="black" d="M8423,-4897.5C8462.94,-4897.5 8383.01,-3608.94 8366.07,-3342.76"/> +<polygon fill="black" stroke="black" points="8369.55,-3342.4 8365.42,-3332.64 8362.57,-3342.84 8369.55,-3342.4"/> +</g> +<!-- H2 --> +<g id="node27" class="node"> +<title>H2</title> +<polygon fill="white" stroke="transparent" points="8831.5,-4832 8831.5,-4943 8922.5,-4943 8922.5,-4832 8831.5,-4832"/> +<polygon fill="#df65b0" stroke="transparent" points="8835,-4918.5 8835,-4939.5 8920,-4939.5 8920,-4918.5 8835,-4918.5"/> 
+<polygon fill="none" stroke="black" points="8835,-4918.5 8835,-4939.5 8920,-4939.5 8920,-4918.5 8835,-4918.5"/> +<text text-anchor="start" x="8838" y="-4925.3" font-family="Times,serif" font-size="14.00">H2 (2 MiB)</text> +<text text-anchor="start" x="8853" y="-4903.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="8856.5" y="-4882.3" font-family="Times,serif" font-size="14.00">H2SE</text> +<text text-anchor="start" x="8856" y="-4861.3" font-family="Times,serif" font-size="14.00">HPH2</text> +<text text-anchor="start" x="8859" y="-4840.3" font-family="Times,serif" font-size="14.00">ICH2</text> +<polygon fill="none" stroke="black" points="8831.5,-4832 8831.5,-4943 8922.5,-4943 8922.5,-4832 8831.5,-4832"/> +</g> +<!-- InbredSet->Species --> +<g id="edge25" class="edge"> +<title>InbredSet:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M3784,-767C3728.83,-767 3795.51,-561.36 3759,-520 3641.66,-387.09 3085.79,-325.05 2876.21,-306.09"/> +<polygon fill="black" stroke="black" points="2876.47,-302.6 2866.2,-305.19 2875.85,-309.57 2876.47,-302.6"/> +</g> +<!-- DatasetMapInvestigator --> +<g id="node29" class="node"> +<title>DatasetMapInvestigator</title> +<polygon fill="white" stroke="transparent" points="8,-1918 8,-2008 258,-2008 258,-1918 8,-1918"/> +<polygon fill="#d7b5d8" stroke="transparent" points="11,-1984 11,-2005 255,-2005 255,-1984 11,-1984"/> +<polygon fill="none" stroke="black" points="11,-1984 11,-2005 255,-2005 255,-1984 11,-1984"/> +<text text-anchor="start" x="14" y="-1990.8" font-family="Times,serif" font-size="14.00">DatasetMapInvestigator (28 KiB)</text> +<text text-anchor="start" x="98" y="-1968.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<text text-anchor="start" x="125.5" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="82" y="-1926.8" font-family="Times,serif" font-size="14.00">InvestigatorId</text> +<polygon fill="none" 
stroke="black" points="8,-1918 8,-2008 258,-2008 258,-1918 8,-1918"/> +</g> +<!-- DatasetMapInvestigator->Datasets --> +<g id="edge26" class="edge"> +<title>DatasetMapInvestigator:DatasetId->Datasets</title> +<path fill="none" stroke="black" d="M256,-1973C277.48,-1973 271.49,-1221.19 275,-1200 283.9,-1146.31 298.97,-1089.52 315.22,-1037.42"/> +<polygon fill="black" stroke="black" points="318.6,-1038.33 318.27,-1027.74 311.93,-1036.23 318.6,-1038.33"/> +</g> +<!-- DatasetMapInvestigator->Investigators --> +<g id="edge27" class="edge"> +<title>DatasetMapInvestigator:InvestigatorId->Investigators</title> +<path fill="none" stroke="black" d="M133,-1920C133,-1405.22 153.42,-798.72 165.08,-494.41"/> +<polygon fill="black" stroke="black" points="168.59,-494.29 165.48,-484.16 161.59,-494.02 168.59,-494.29"/> +</g> +<!-- Docs --> +<g id="node30" class="node"> +<title>Docs</title> +<polygon fill="white" stroke="transparent" points="8956.5,-4832 8956.5,-4943 9075.5,-4943 9075.5,-4832 8956.5,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="8960,-4918.5 8960,-4939.5 9073,-4939.5 9073,-4918.5 8960,-4918.5"/> +<polygon fill="none" stroke="black" points="8960,-4918.5 8960,-4939.5 9073,-4939.5 9073,-4918.5 8960,-4918.5"/> +<text text-anchor="start" x="8963" y="-4925.3" font-family="Times,serif" font-size="14.00">Docs (148 KiB)</text> +<text text-anchor="start" x="8989" y="-4903.3" font-family="Times,serif" font-size="14.00">content</text> +<text text-anchor="start" x="8997" y="-4882.3" font-family="Times,serif" font-size="14.00">entry</text> +<text text-anchor="start" x="9009.5" y="-4861.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="9001.5" y="-4840.3" font-family="Times,serif" font-size="14.00">title</text> +<polygon fill="none" stroke="black" points="8956.5,-4832 8956.5,-4943 9075.5,-4943 9075.5,-4832 8956.5,-4832"/> +</g> +<!-- Phenotype --> +<g id="node31" class="node"> +<title>Phenotype</title> +<polygon fill="lightgrey" 
stroke="transparent" points="2910,-713 2910,-971 3134,-971 3134,-713 2910,-713"/> +<polygon fill="#df65b0" stroke="transparent" points="2913,-947 2913,-968 3131,-968 3131,-947 2913,-947"/> +<polygon fill="none" stroke="black" points="2913,-947 2913,-968 3131,-968 3131,-947 2913,-947"/> +<text text-anchor="start" x="2955" y="-953.8" font-family="Times,serif" font-size="14.00">Phenotype (9 MiB)</text> +<text text-anchor="start" x="3014.5" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2915" y="-910.8" font-family="Times,serif" font-size="14.00">Post_publication_abbreviation</text> +<text text-anchor="start" x="2918" y="-889.8" font-family="Times,serif" font-size="14.00">Pre_publication_abbreviation</text> +<polygon fill="green" stroke="transparent" points="2913,-863 2913,-882 3131,-882 3131,-863 2913,-863"/> +<text text-anchor="start" x="2958.5" y="-868.8" font-family="Times,serif" font-size="14.00">Authorized_Users</text> +<polygon fill="green" stroke="transparent" points="2913,-842 2913,-861 3131,-861 3131,-842 2913,-842"/> +<text text-anchor="start" x="2988.5" y="-847.8" font-family="Times,serif" font-size="14.00">Lab_code</text> +<polygon fill="green" stroke="transparent" points="2913,-821 2913,-840 3131,-840 3131,-821 2913,-821"/> +<text text-anchor="start" x="2949.5" y="-826.8" font-family="Times,serif" font-size="14.00">Original_description</text> +<polygon fill="green" stroke="transparent" points="2913,-800 2913,-819 3131,-819 3131,-800 2913,-800"/> +<text text-anchor="start" x="2998" y="-805.8" font-family="Times,serif" font-size="14.00">Owner</text> +<polygon fill="green" stroke="transparent" points="2913,-779 2913,-798 3131,-798 3131,-779 2913,-779"/> +<text text-anchor="start" x="2919.5" y="-784.8" font-family="Times,serif" font-size="14.00">Post_publication_description</text> +<polygon fill="green" stroke="transparent" points="2913,-758 2913,-777 3131,-777 3131,-758 2913,-758"/> +<text text-anchor="start" 
x="2922.5" y="-763.8" font-family="Times,serif" font-size="14.00">Pre_publication_description</text> +<polygon fill="green" stroke="transparent" points="2913,-737 2913,-756 3131,-756 3131,-737 2913,-737"/> +<text text-anchor="start" x="2985.5" y="-742.8" font-family="Times,serif" font-size="14.00">Submitter</text> +<polygon fill="green" stroke="transparent" points="2913,-716 2913,-735 3131,-735 3131,-716 2913,-716"/> +<text text-anchor="start" x="3002" y="-721.8" font-family="Times,serif" font-size="14.00">Units</text> +<polygon fill="none" stroke="black" points="2910,-713 2910,-971 3134,-971 3134,-713 2910,-713"/> +</g> +<!-- SnpPattern --> +<g id="node32" class="node"> +<title>SnpPattern</title> +<polygon fill="white" stroke="transparent" points="9110,-3866 9110,-5909 9294,-5909 9294,-3866 9110,-3866"/> +<polygon fill="#ce1256" stroke="transparent" points="9113,-5884.5 9113,-5905.5 9291,-5905.5 9291,-5884.5 9113,-5884.5"/> +<polygon fill="none" stroke="black" points="9113,-5884.5 9113,-5905.5 9291,-5905.5 9291,-5884.5 9113,-5884.5"/> +<text text-anchor="start" x="9134" y="-5891.3" font-family="Times,serif" font-size="14.00">SnpPattern (8 GiB)</text> +<text text-anchor="start" x="9150.5" y="-5869.3" font-family="Times,serif" font-size="14.00">129P2/OlaHsd</text> +<text text-anchor="start" x="9155.5" y="-5848.3" font-family="Times,serif" font-size="14.00">129S1/SvImJ</text> +<text text-anchor="start" x="9153.5" y="-5827.3" font-family="Times,serif" font-size="14.00">129S2/SvHsd</text> +<text text-anchor="start" x="9156.5" y="-5806.3" font-family="Times,serif" font-size="14.00">129S4/SvJae</text> +<text text-anchor="start" x="9145" y="-5785.3" font-family="Times,serif" font-size="14.00">129S5/SvEvBrd</text> +<text text-anchor="start" x="9158" y="-5764.3" font-family="Times,serif" font-size="14.00">129S6/SvEv</text> +<text text-anchor="start" x="9149.5" y="-5743.3" font-family="Times,serif" font-size="14.00">129T2/SvEmsJ</text> +<text text-anchor="start" x="9165" 
y="-5722.3" font-family="Times,serif" font-size="14.00">129X1/SvJ</text> +<text text-anchor="start" x="9192" y="-5701.3" font-family="Times,serif" font-size="14.00">A/J</text> +<text text-anchor="start" x="9181.5" y="-5680.3" font-family="Times,serif" font-size="14.00">AKR/J</text> +<text text-anchor="start" x="9115" y="-5659.3" font-family="Times,serif" font-size="14.00">B6A6_Esline_Regeneron</text> +<text text-anchor="start" x="9164" y="-5638.3" font-family="Times,serif" font-size="14.00">BALB/cByJ</text> +<text text-anchor="start" x="9173" y="-5617.3" font-family="Times,serif" font-size="14.00">BALB/cJ</text> +<text text-anchor="start" x="9176" y="-5596.3" font-family="Times,serif" font-size="14.00">BPH/2J</text> +<text text-anchor="start" x="9177.5" y="-5575.3" font-family="Times,serif" font-size="14.00">BPL/1J</text> +<text text-anchor="start" x="9176" y="-5554.3" font-family="Times,serif" font-size="14.00">BPN/3J</text> +<text text-anchor="start" x="9148.5" y="-5533.3" font-family="Times,serif" font-size="14.00">BTBRT&lt;+&gt;tf/J</text> +<text text-anchor="start" x="9170.5" y="-5512.3" font-family="Times,serif" font-size="14.00">BUB/BnJ</text> +<text text-anchor="start" x="9135.5" y="-5491.3" font-family="Times,serif" font-size="14.00">C2T1_Esline_Nagy</text> +<text text-anchor="start" x="9171" y="-5470.3" font-family="Times,serif" font-size="14.00">C3H/HeJ</text> +<text text-anchor="start" x="9163" y="-5449.3" font-family="Times,serif" font-size="14.00">C3HeB/FeJ</text> +<text text-anchor="start" x="9164" y="-5428.3" font-family="Times,serif" font-size="14.00">C57BL/10J</text> +<text text-anchor="start" x="9159" y="-5407.3" font-family="Times,serif" font-size="14.00">C57BL/6ByJ</text> +<text text-anchor="start" x="9168.5" y="-5386.3" font-family="Times,serif" font-size="14.00">C57BL/6J</text> +<text text-anchor="start" x="9140" y="-5365.3" font-family="Times,serif" font-size="14.00">C57BL/6JBomTac</text> +<text text-anchor="start" x="9157.5" y="-5344.3"
font-family="Times,serif" font-size="14.00">C57BL/6JCrl</text> +<text text-anchor="start" x="9142" y="-5323.3" font-family="Times,serif" font-size="14.00">C57BL/6JOlaHsd</text> +<text text-anchor="start" x="9154" y="-5302.3" font-family="Times,serif" font-size="14.00">C57BL/6NCrl</text> +<text text-anchor="start" x="9150.5" y="-5281.3" font-family="Times,serif" font-size="14.00">C57BL/6NHsd</text> +<text text-anchor="start" x="9162.5" y="-5260.3" font-family="Times,serif" font-size="14.00">C57BL/6NJ</text> +<text text-anchor="start" x="9150.5" y="-5239.3" font-family="Times,serif" font-size="14.00">C57BL/6NNIH</text> +<text text-anchor="start" x="9153" y="-5218.3" font-family="Times,serif" font-size="14.00">C57BL/6NTac</text> +<text text-anchor="start" x="9162.5" y="-5197.3" font-family="Times,serif" font-size="14.00">C57BLKS/J</text> +<text text-anchor="start" x="9164" y="-5176.3" font-family="Times,serif" font-size="14.00">C57BR/cdJ</text> +<text text-anchor="start" x="9178" y="-5155.3" font-family="Times,serif" font-size="14.00">C57L/J</text> +<text text-anchor="start" x="9182.5" y="-5134.3" font-family="Times,serif" font-size="14.00">C58/J</text> +<text text-anchor="start" x="9167.5" y="-5113.3" font-family="Times,serif" font-size="14.00">CALB/RkJ</text> +<text text-anchor="start" x="9170" y="-5092.3" font-family="Times,serif" font-size="14.00">CAST/EiJ</text> +<text text-anchor="start" x="9181.5" y="-5071.3" font-family="Times,serif" font-size="14.00">CBA/J</text> +<text text-anchor="start" x="9186.5" y="-5050.3" font-family="Times,serif" font-size="14.00">CE/J</text> +<text text-anchor="start" x="9157.5" y="-5029.3" font-family="Times,serif" font-size="14.00">CZECHII/EiJ</text> +<text text-anchor="start" x="9176.5" y="-5008.3" font-family="Times,serif" font-size="14.00">DBA/1J</text> +<text text-anchor="start" x="9176.5" y="-4987.3" font-family="Times,serif" font-size="14.00">DBA/2J</text> +<text text-anchor="start" x="9170.5" y="-4966.3" 
font-family="Times,serif" font-size="14.00">DDK/Pas</text> +<text text-anchor="start" x="9135.5" y="-4945.3" font-family="Times,serif" font-size="14.00">DDY/JclSidSeyFrkJ</text> +<text text-anchor="start" x="9148.5" y="-4924.3" font-family="Times,serif" font-size="14.00">EL/SuzSeyFrkJ</text> +<text text-anchor="start" x="9183.5" y="-4903.3" font-family="Times,serif" font-size="14.00">Fline</text> +<text text-anchor="start" x="9176" y="-4882.3" font-family="Times,serif" font-size="14.00">FVB/NJ</text> +<text text-anchor="start" x="9154" y="-4861.3" font-family="Times,serif" font-size="14.00">HTG/GoSfSnJ</text> +<text text-anchor="start" x="9185" y="-4840.3" font-family="Times,serif" font-size="14.00">I/LnJ</text> +<text text-anchor="start" x="9162.5" y="-4819.3" font-family="Times,serif" font-size="14.00">ILS/IbgTejJ</text> +<text text-anchor="start" x="9164" y="-4798.3" font-family="Times,serif" font-size="14.00">IS/CamRkJ</text> +<text text-anchor="start" x="9162.5" y="-4777.3" font-family="Times,serif" font-size="14.00">ISS/IbgTejJ</text> +<text text-anchor="start" x="9176.5" y="-4756.3" font-family="Times,serif" font-size="14.00">JF1/Ms</text> +<text text-anchor="start" x="9178" y="-4735.3" font-family="Times,serif" font-size="14.00">KK/HlJ</text> +<text text-anchor="start" x="9162.5" y="-4714.3" font-family="Times,serif" font-size="14.00">LEWES/EiJ</text> +<text text-anchor="start" x="9186.5" y="-4693.3" font-family="Times,serif" font-size="14.00">LG/J</text> +<text text-anchor="start" x="9184" y="-4672.3" font-family="Times,serif" font-size="14.00">Lline</text> +<text text-anchor="start" x="9187.5" y="-4651.3" font-family="Times,serif" font-size="14.00">LP/J</text> +<text text-anchor="start" x="9173.5" y="-4630.3" font-family="Times,serif" font-size="14.00">MA/MyJ</text> +<text text-anchor="start" x="9172.5" y="-4609.3" font-family="Times,serif" font-size="14.00">MAI/Pas</text> +<text text-anchor="start" x="9167" y="-4588.3" font-family="Times,serif" 
font-size="14.00">MOLF/EiJ</text> +<text text-anchor="start" x="9164" y="-4567.3" font-family="Times,serif" font-size="14.00">MOLG/DnJ</text> +<text text-anchor="start" x="9168.5" y="-4546.3" font-family="Times,serif" font-size="14.00">MRL/MpJ</text> +<text text-anchor="start" x="9169.5" y="-4525.3" font-family="Times,serif" font-size="14.00">MSM/Ms</text> +<text text-anchor="start" x="9160.5" y="-4504.3" font-family="Times,serif" font-size="14.00">NOD/ShiLtJ</text> +<text text-anchor="start" x="9171.5" y="-4483.3" font-family="Times,serif" font-size="14.00">NON/LtJ</text> +<text text-anchor="start" x="9172.5" y="-4462.3" font-family="Times,serif" font-size="14.00">NOR/LtJ</text> +<text text-anchor="start" x="9167" y="-4441.3" font-family="Times,serif" font-size="14.00">NZB/BlNJ</text> +<text text-anchor="start" x="9174" y="-4420.3" font-family="Times,serif" font-size="14.00">NZL/LtJ</text> +<text text-anchor="start" x="9164.5" y="-4399.3" font-family="Times,serif" font-size="14.00">NZO/HlLtJ</text> +<text text-anchor="start" x="9166.5" y="-4378.3" font-family="Times,serif" font-size="14.00">NZW/LacJ</text> +<text text-anchor="start" x="9187" y="-4357.3" font-family="Times,serif" font-size="14.00">O20</text> +<text text-anchor="start" x="9192" y="-4336.3" font-family="Times,serif" font-size="14.00">P/J</text> +<text text-anchor="start" x="9169" y="-4315.3" font-family="Times,serif" font-size="14.00">PERA/EiJ</text> +<text text-anchor="start" x="9168.5" y="-4294.3" font-family="Times,serif" font-size="14.00">PERC/EiJ</text> +<text text-anchor="start" x="9187.5" y="-4273.3" font-family="Times,serif" font-size="14.00">PL/J</text> +<text text-anchor="start" x="9170" y="-4252.3" font-family="Times,serif" font-size="14.00">PWD/PhJ</text> +<text text-anchor="start" x="9170" y="-4231.3" font-family="Times,serif" font-size="14.00">PWK/PhJ</text> +<text text-anchor="start" x="9185.5" y="-4210.3" font-family="Times,serif" font-size="14.00">Qsi5</text> +<text 
text-anchor="start" x="9171.5" y="-4189.3" font-family="Times,serif" font-size="14.00">RBA/DnJ</text> +<text text-anchor="start" x="9186.5" y="-4168.3" font-family="Times,serif" font-size="14.00">RF/J</text> +<text text-anchor="start" x="9179" y="-4147.3" font-family="Times,serif" font-size="14.00">RIIIS/J</text> +<text text-anchor="start" x="9171.5" y="-4126.3" font-family="Times,serif" font-size="14.00">SEA/GnJ</text> +<text text-anchor="start" x="9171.5" y="-4105.3" font-family="Times,serif" font-size="14.00">SEG/Pas</text> +<text text-anchor="start" x="9185" y="-4084.3" font-family="Times,serif" font-size="14.00">SJL/J</text> +<text text-anchor="start" x="9166.5" y="-4063.3" font-family="Times,serif" font-size="14.00">SKIVE/EiJ</text> +<text text-anchor="start" x="9185" y="-4042.3" font-family="Times,serif" font-size="14.00">SM/J</text> +<text text-anchor="start" x="9180.5" y="-4021.3" font-family="Times,serif" font-size="14.00">SnpId</text> +<text text-anchor="start" x="9168.5" y="-4000.3" font-family="Times,serif" font-size="14.00">SOD1/EiJ</text> +<text text-anchor="start" x="9164.5" y="-3979.3" font-family="Times,serif" font-size="14.00">SPRET/EiJ</text> +<text text-anchor="start" x="9183" y="-3958.3" font-family="Times,serif" font-size="14.00">ST/bJ</text> +<text text-anchor="start" x="9179.5" y="-3937.3" font-family="Times,serif" font-size="14.00">SWR/J</text> +<text text-anchor="start" x="9151.5" y="-3916.3" font-family="Times,serif" font-size="14.00">TALLYHO/JngJ</text> +<text text-anchor="start" x="9172" y="-3895.3" font-family="Times,serif" font-size="14.00">WSB/EiJ</text> +<text text-anchor="start" x="9153" y="-3874.3" font-family="Times,serif" font-size="14.00">ZALENDE/EiJ</text> +<polygon fill="none" stroke="black" points="9110,-3866 9110,-5909 9294,-5909 9294,-3866 9110,-3866"/> +</g> +<!-- AccessLog --> +<g id="node34" class="node"> +<title>AccessLog</title> +<polygon fill="white" stroke="transparent" points="9328,-4842.5 9328,-4932.5 
9482,-4932.5 9482,-4842.5 9328,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="9331,-4908.5 9331,-4929.5 9479,-4929.5 9479,-4908.5 9331,-4908.5"/> +<polygon fill="none" stroke="black" points="9331,-4908.5 9331,-4929.5 9479,-4929.5 9479,-4908.5 9331,-4908.5"/> +<text text-anchor="start" x="9334" y="-4915.3" font-family="Times,serif" font-size="14.00">AccessLog (46 MiB)</text> +<text text-anchor="start" x="9365.5" y="-4893.3" font-family="Times,serif" font-size="14.00">accesstime</text> +<text text-anchor="start" x="9398" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="9366.5" y="-4851.3" font-family="Times,serif" font-size="14.00">ip_address</text> +<polygon fill="none" stroke="black" points="9328,-4842.5 9328,-4932.5 9482,-4932.5 9482,-4842.5 9328,-4842.5"/> +</g> +<!-- GeneRIF --> +<g id="node35" class="node"> +<title>GeneRIF</title> +<polygon fill="white" stroke="transparent" points="3576.5,-692 3576.5,-992 3709.5,-992 3709.5,-692 3576.5,-692"/> +<polygon fill="#df65b0" stroke="transparent" points="3580,-968 3580,-989 3707,-989 3707,-968 3580,-968"/> +<polygon fill="none" stroke="black" points="3580,-968 3580,-989 3707,-989 3707,-968 3580,-968"/> +<text text-anchor="start" x="3583" y="-974.8" font-family="Times,serif" font-size="14.00">GeneRIF (2 MiB)</text> +<text text-anchor="start" x="3610" y="-952.8" font-family="Times,serif" font-size="14.00">comment</text> +<text text-anchor="start" x="3604.5" y="-931.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" x="3617.5" y="-910.8" font-family="Times,serif" font-size="14.00">display</text> +<text text-anchor="start" x="3623.5" y="-889.8" font-family="Times,serif" font-size="14.00">email</text> +<text text-anchor="start" x="3636" y="-868.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3622.5" y="-847.8" font-family="Times,serif" font-size="14.00">initial</text> +<text 
text-anchor="start" x="3602" y="-826.8" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<text text-anchor="start" x="3619" y="-805.8" font-family="Times,serif" font-size="14.00">reason</text> +<text text-anchor="start" x="3608.5" y="-784.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="3617.5" y="-763.8" font-family="Times,serif" font-size="14.00">symbol</text> +<text text-anchor="start" x="3617.5" y="-742.8" font-family="Times,serif" font-size="14.00">user_ip</text> +<text text-anchor="start" x="3610" y="-721.8" font-family="Times,serif" font-size="14.00">versionId</text> +<text text-anchor="start" x="3618.5" y="-700.8" font-family="Times,serif" font-size="14.00">weburl</text> +<polygon fill="none" stroke="black" points="3576.5,-692 3576.5,-992 3709.5,-992 3709.5,-692 3576.5,-692"/> +</g> +<!-- GeneRIF->Species --> +<g id="edge28" class="edge"> +<title>GeneRIF:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M3579,-788C3549.14,-788 3577.82,-543.18 3559,-520 3471.93,-412.76 3053.77,-338.32 2876.12,-311.02"/> +<polygon fill="black" stroke="black" points="2876.46,-307.54 2866.05,-309.49 2875.41,-314.46 2876.46,-307.54"/> +</g> +<!-- ProbeData --> +<g id="node36" class="node"> +<title>ProbeData</title> +<polygon fill="white" stroke="transparent" points="5291,-1918 5291,-2008 5443,-2008 5443,-1918 5291,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="5294,-1984 5294,-2005 5440,-2005 5440,-1984 5294,-1984"/> +<polygon fill="none" stroke="black" points="5294,-1984 5294,-2005 5440,-2005 5440,-1984 5294,-1984"/> +<text text-anchor="start" x="5297" y="-1990.8" font-family="Times,serif" font-size="14.00">ProbeData (10 GiB)</text> +<text text-anchor="start" x="5359.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="5337.5" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="5347.5" y="-1926.8" 
font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="5291,-1918 5291,-2008 5443,-2008 5443,-1918 5291,-1918"/> +</g> +<!-- ProbeData->Strain --> +<g id="edge29" class="edge"> +<title>ProbeData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5441,-1951C5461.87,-1951 5451.21,-1219.36 5459,-1200 5511.05,-1070.73 5632.85,-959.15 5712.21,-896.58"/> +<polygon fill="black" stroke="black" points="5714.51,-899.22 5720.23,-890.3 5710.2,-893.71 5714.51,-899.22"/> +</g> +<!-- AvgMethod --> +<g id="node37" class="node"> +<title>AvgMethod</title> +<polygon fill="lightgrey" stroke="transparent" points="982.5,-786.5 982.5,-897.5 1133.5,-897.5 1133.5,-786.5 982.5,-786.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="986,-873 986,-894 1131,-894 1131,-873 986,-873"/> +<polygon fill="none" stroke="black" points="986,-873 986,-894 1131,-894 1131,-873 986,-873"/> +<text text-anchor="start" x="989" y="-879.8" font-family="Times,serif" font-size="14.00">AvgMethod (792 B)</text> +<text text-anchor="start" x="1010" y="-857.8" font-family="Times,serif" font-size="14.00">AvgMethodId</text> +<text text-anchor="start" x="1051" y="-836.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="986,-810 986,-829 1131,-829 1131,-810 986,-810"/> +<text text-anchor="start" x="1037" y="-815.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="1007.5" y="-794.8" font-family="Times,serif" font-size="14.00">Normalization</text> +<polygon fill="none" stroke="black" points="982.5,-786.5 982.5,-897.5 1133.5,-897.5 1133.5,-786.5 982.5,-786.5"/> +</g> +<!-- GeneRIFXRef --> +<g id="node38" class="node"> +<title>GeneRIFXRef</title> +<polygon fill="white" stroke="transparent" points="3003,-1918 3003,-2008 3175,-2008 3175,-1918 3003,-1918"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3006,-1984 3006,-2005 3172,-2005 3172,-1984 3006,-1984"/> 
+<polygon fill="none" stroke="black" points="3006,-1984 3006,-2005 3172,-2005 3172,-1984 3006,-1984"/> +<text text-anchor="start" x="3009" y="-1990.8" font-family="Times,serif" font-size="14.00">GeneRIFXRef (82 KiB)</text> +<text text-anchor="start" x="3030.5" y="-1968.8" font-family="Times,serif" font-size="14.00">GeneCategoryId</text> +<text text-anchor="start" x="3050.5" y="-1947.8" font-family="Times,serif" font-size="14.00">GeneRIFId</text> +<text text-anchor="start" x="3055.5" y="-1926.8" font-family="Times,serif" font-size="14.00">versionId</text> +<polygon fill="none" stroke="black" points="3003,-1918 3003,-2008 3175,-2008 3175,-1918 3003,-1918"/> +</g> +<!-- GeneRIFXRef->GeneRIF --> +<g id="edge31" class="edge"> +<title>GeneRIFXRef:GeneRIFId->GeneRIF</title> +<path fill="none" stroke="black" d="M3173,-1951C3214.74,-1951 3168.49,-1230.49 3197,-1200 3252.21,-1140.95 3497.53,-1216.51 3559,-1164 3604.75,-1124.91 3627.15,-1064.28 3637.64,-1006.19"/> +<polygon fill="black" stroke="black" points="3641.12,-1006.59 3639.34,-996.14 3634.22,-1005.42 3641.12,-1006.59"/> +</g> +<!-- GeneCategory --> +<g id="node73" class="node"> +<title>GeneCategory</title> +<polygon fill="white" stroke="transparent" points="3373.5,-807.5 3373.5,-876.5 3542.5,-876.5 3542.5,-807.5 3373.5,-807.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3377,-852 3377,-873 3540,-873 3540,-852 3377,-852"/> +<polygon fill="none" stroke="black" points="3377,-852 3377,-873 3540,-873 3540,-852 3377,-852"/> +<text text-anchor="start" x="3380" y="-858.8" font-family="Times,serif" font-size="14.00">GeneCategory (5 KiB)</text> +<text text-anchor="start" x="3451" y="-836.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3437" y="-815.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="3373.5,-807.5 3373.5,-876.5 3542.5,-876.5 3542.5,-807.5 3373.5,-807.5"/> +</g> +<!-- GeneRIFXRef->GeneCategory --> +<g 
id="edge30" class="edge"> +<title>GeneRIFXRef:GeneCategoryId->GeneCategory</title> +<path fill="none" stroke="black" d="M3173,-1973C3215.97,-1973 3169.76,-1233.22 3197,-1200 3241.84,-1145.31 3299.78,-1211.69 3352,-1164 3430.43,-1092.39 3450.94,-961.62 3456.23,-891.11"/> +<polygon fill="black" stroke="black" points="3459.75,-890.96 3456.93,-880.75 3452.77,-890.49 3459.75,-890.96"/> +</g> +<!-- CaseAttribute --> +<g id="node39" class="node"> +<title>CaseAttribute</title> +<polygon fill="lightgrey" stroke="transparent" points="1168,-797 1168,-887 1334,-887 1334,-797 1168,-797"/> +<polygon fill="#d7b5d8" stroke="transparent" points="1171,-863 1171,-884 1331,-884 1331,-863 1171,-863"/> +<polygon fill="none" stroke="black" points="1171,-863 1171,-884 1331,-884 1331,-863 1171,-863"/> +<text text-anchor="start" x="1174" y="-869.8" font-family="Times,serif" font-size="14.00">CaseAttribute (2 KiB)</text> +<polygon fill="green" stroke="transparent" points="1171,-842 1171,-861 1331,-861 1331,-842 1171,-842"/> +<text text-anchor="start" x="1209.5" y="-847.8" font-family="Times,serif" font-size="14.00">Description</text> +<polygon fill="green" stroke="transparent" points="1171,-821 1171,-840 1331,-840 1331,-821 1171,-821"/> +<text text-anchor="start" x="1243.5" y="-826.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="1171,-800 1171,-819 1331,-819 1331,-800 1171,-800"/> +<text text-anchor="start" x="1229.5" y="-805.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="1168,-797 1168,-887 1334,-887 1334,-797 1168,-797"/> +</g> +<!-- Strain->Species --> +<g id="edge32" class="edge"> +<title>Strain:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M5731,-777C5128.52,-777 4994.43,-618.17 4400,-520 3817.59,-423.81 3111.33,-337.05 2876.33,-308.98"/> +<polygon fill="black" stroke="black" points="2876.51,-305.48 2866.17,-307.77 2875.68,-312.43 2876.51,-305.48"/> 
+</g> +<!-- Probe->ProbeSE --> +<g id="edge33" class="edge"> +<title>Probe:ProbeSetId->ProbeSE</title> +<path fill="none" stroke="black" d="M6968,-3261C6999.5,-3261 7043.75,-2274.36 7054.55,-2022.15"/> +<polygon fill="black" stroke="black" points="7058.05,-2022.23 7054.98,-2012.09 7051.06,-2021.93 7058.05,-2022.23"/> +</g> +<!-- ProbeFreeze->InbredSet --> +<g id="edge34" class="edge"> +<title>ProbeFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2775,-1951C2816.74,-1951 2764.69,-1229.71 2794,-1200 2866.79,-1126.23 3641.27,-1223.68 3726,-1164 3778.21,-1127.22 3809.31,-1065.62 3827.82,-1006.16"/> +<polygon fill="black" stroke="black" points="3831.27,-1006.83 3830.79,-996.25 3824.56,-1004.82 3831.27,-1006.83"/> +</g> +<!-- ProbeFreeze->Tissue --> +<g id="edge35" class="edge"> +<title>ProbeFreeze:TissueId->Tissue</title> +<path fill="none" stroke="black" d="M2613,-1867C2575.92,-1867 2609.31,-1231.02 2589,-1200 2568.75,-1169.06 2537.32,-1192.7 2514,-1164 2463.47,-1101.8 2444.56,-1011.96 2437.81,-943.13"/> +<polygon fill="black" stroke="black" points="2441.29,-942.77 2436.9,-933.13 2434.32,-943.41 2441.29,-942.77"/> +</g> +<!-- BXDSnpPosition --> +<g id="node43" class="node"> +<title>BXDSnpPosition</title> +<polygon fill="white" stroke="transparent" points="5476.5,-1886.5 5476.5,-2039.5 5681.5,-2039.5 5681.5,-1886.5 5476.5,-1886.5"/> +<polygon fill="#df65b0" stroke="transparent" points="5480,-2015 5480,-2036 5679,-2036 5679,-2015 5480,-2015"/> +<polygon fill="none" stroke="black" points="5480,-2015 5480,-2036 5679,-2036 5679,-2015 5480,-2015"/> +<text text-anchor="start" x="5483" y="-2021.8" font-family="Times,serif" font-size="14.00">BXDSnpPosition (230 MiB)</text> +<text text-anchor="start" x="5566" y="-1999.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="5572.5" y="-1978.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="5567.5" y="-1957.8" font-family="Times,serif" 
font-size="14.00">Mb</text> +<text text-anchor="start" x="5546" y="-1936.8" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="5545.5" y="-1915.8" font-family="Times,serif" font-size="14.00">StrainId1</text> +<text text-anchor="start" x="5545.5" y="-1894.8" font-family="Times,serif" font-size="14.00">StrainId2</text> +<polygon fill="none" stroke="black" points="5476.5,-1886.5 5476.5,-2039.5 5681.5,-2039.5 5681.5,-1886.5 5476.5,-1886.5"/> +</g> +<!-- BXDSnpPosition->Strain --> +<g id="edge36" class="edge"> +<title>BXDSnpPosition:StrainId1->Strain</title> +<path fill="none" stroke="black" d="M5680,-1919C5699.98,-1919 5696.36,-1219.8 5699,-1200 5711.36,-1107.45 5738.02,-1004.03 5758.6,-932.42"/> +<polygon fill="black" stroke="black" points="5762.04,-933.11 5761.46,-922.54 5755.32,-931.17 5762.04,-933.11"/> +</g> +<!-- BXDSnpPosition->Strain --> +<g id="edge37" class="edge"> +<title>BXDSnpPosition:StrainId2->Strain</title> +<path fill="none" stroke="black" d="M5680,-1898C5699.4,-1898 5696.43,-1219.22 5699,-1200 5711.39,-1107.46 5738.05,-1004.03 5758.62,-932.43"/> +<polygon fill="black" stroke="black" points="5762.06,-933.12 5761.48,-922.54 5755.34,-931.17 5762.06,-933.12"/> +</g> +<!-- GeneRIF_BASIC --> +<g id="node44" class="node"> +<title>GeneRIF_BASIC</title> +<polygon fill="white" stroke="transparent" points="531.5,-744.5 531.5,-939.5 734.5,-939.5 734.5,-744.5 531.5,-744.5"/> +<polygon fill="#df65b0" stroke="transparent" points="535,-915 535,-936 732,-936 732,-915 535,-915"/> +<polygon fill="none" stroke="black" points="535,-915 535,-936 732,-936 732,-915 535,-915"/> +<text text-anchor="start" x="538" y="-921.8" font-family="Times,serif" font-size="14.00">GeneRIF_BASIC (275 MiB)</text> +<text text-anchor="start" x="600" y="-899.8" font-family="Times,serif" font-size="14.00">comment</text> +<text text-anchor="start" x="594.5" y="-878.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" 
x="607.5" y="-857.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="592" y="-836.8" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<text text-anchor="start" x="598.5" y="-815.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="607.5" y="-794.8" font-family="Times,serif" font-size="14.00">symbol</text> +<text text-anchor="start" x="612.5" y="-773.8" font-family="Times,serif" font-size="14.00">TaxID</text> +<text text-anchor="start" x="599.5" y="-752.8" font-family="Times,serif" font-size="14.00">VersionId</text> +<polygon fill="none" stroke="black" points="531.5,-744.5 531.5,-939.5 734.5,-939.5 734.5,-744.5 531.5,-744.5"/> +</g> +<!-- GeneRIF_BASIC->Species --> +<g id="edge38" class="edge"> +<title>GeneRIF_BASIC:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M733,-819C766.29,-819 728.98,-544.05 752,-520 890.33,-375.45 2354.35,-314.96 2715.71,-302.17"/> +<polygon fill="black" stroke="black" points="2715.96,-305.66 2725.83,-301.81 2715.71,-298.67 2715.96,-305.66"/> +</g> +<!-- GeneList_rn33 --> +<g id="node45" class="node"> +<title>GeneList_rn33</title> +<polygon fill="white" stroke="transparent" points="9516.5,-4737.5 9516.5,-5037.5 9691.5,-5037.5 9691.5,-4737.5 9516.5,-4737.5"/> +<polygon fill="#df65b0" stroke="transparent" points="9520,-5013.5 9520,-5034.5 9689,-5034.5 9689,-5013.5 9520,-5013.5"/> +<polygon fill="none" stroke="black" points="9520,-5013.5 9520,-5034.5 9689,-5034.5 9689,-5013.5 9520,-5013.5"/> +<text text-anchor="start" x="9523" y="-5020.3" font-family="Times,serif" font-size="14.00">GeneList_rn33 (2 MiB)</text> +<text text-anchor="start" x="9578" y="-4998.3" font-family="Times,serif" font-size="14.00">cdsEnd</text> +<text text-anchor="start" x="9574" y="-4977.3" font-family="Times,serif" font-size="14.00">cdsStart</text> +<text text-anchor="start" x="9559" y="-4956.3" font-family="Times,serif" font-size="14.00">chromosome</text> +<text 
text-anchor="start" x="9566" y="-4935.3" font-family="Times,serif" font-size="14.00">exonCount</text> +<text text-anchor="start" x="9569.5" y="-4914.3" font-family="Times,serif" font-size="14.00">exonEnds</text> +<text text-anchor="start" x="9565" y="-4893.3" font-family="Times,serif" font-size="14.00">exonStarts</text> +<text text-anchor="start" x="9560.5" y="-4872.3" font-family="Times,serif" font-size="14.00">geneSymbol</text> +<text text-anchor="start" x="9597.5" y="-4851.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="9587.5" y="-4830.3" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="9579.5" y="-4809.3" font-family="Times,serif" font-size="14.00">NM_ID</text> +<text text-anchor="start" x="9581" y="-4788.3" font-family="Times,serif" font-size="14.00">strand</text> +<text text-anchor="start" x="9583" y="-4767.3" font-family="Times,serif" font-size="14.00">txEnd</text> +<text text-anchor="start" x="9578.5" y="-4746.3" font-family="Times,serif" font-size="14.00">txStart</text> +<polygon fill="none" stroke="black" points="9516.5,-4737.5 9516.5,-5037.5 9691.5,-5037.5 9691.5,-4737.5 9516.5,-4737.5"/> +</g> +<!-- Geno->Species --> +<g id="edge39" class="edge"> +<title>Geno:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M4247,-704C4089.83,-704 4091.63,-576.6 3945,-520 3561.93,-372.13 3067.37,-320.3 2876.27,-305.04"/> +<polygon fill="black" stroke="black" points="2876.28,-301.52 2866.03,-304.23 2875.73,-308.5 2876.28,-301.52"/> +</g> +<!-- Organizations --> +<g id="node47" class="node"> +<title>Organizations</title> +<polygon fill="white" stroke="transparent" points="90,-4 90,-73 256,-73 256,-4 90,-4"/> +<polygon fill="#d7b5d8" stroke="transparent" points="93,-48.5 93,-69.5 253,-69.5 253,-48.5 93,-48.5"/> +<polygon fill="none" stroke="black" points="93,-48.5 93,-69.5 253,-69.5 253,-48.5 93,-48.5"/> +<text text-anchor="start" x="96" y="-55.3" font-family="Times,serif" 
font-size="14.00">Organizations (3 KiB)</text> +<text text-anchor="start" x="119.5" y="-33.3" font-family="Times,serif" font-size="14.00">OrganizationId</text> +<text text-anchor="start" x="105.5" y="-12.3" font-family="Times,serif" font-size="14.00">OrganizationName</text> +<polygon fill="none" stroke="black" points="90,-4 90,-73 256,-73 256,-4 90,-4"/> +</g> +<!-- StrainXRef --> +<g id="node48" class="node"> +<title>StrainXRef</title> +<polygon fill="white" stroke="transparent" points="4871,-1897 4871,-2029 5019,-2029 5019,-1897 4871,-1897"/> +<polygon fill="#df65b0" stroke="transparent" points="4874,-2005 4874,-2026 5016,-2026 5016,-2005 4874,-2005"/> +<polygon fill="none" stroke="black" points="4874,-2005 4874,-2026 5016,-2026 5016,-2005 4874,-2005"/> +<text text-anchor="start" x="4877" y="-2011.8" font-family="Times,serif" font-size="14.00">StrainXRef (1 MiB)</text> +<text text-anchor="start" x="4902" y="-1989.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4916.5" y="-1968.8" font-family="Times,serif" font-size="14.00">OrderId</text> +<text text-anchor="start" x="4890" y="-1947.8" font-family="Times,serif" font-size="14.00">PedigreeStatus</text> +<text text-anchor="start" x="4915.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="4878.5" y="-1905.8" font-family="Times,serif" font-size="14.00">Used_for_mapping</text> +<polygon fill="none" stroke="black" points="4871,-1897 4871,-2029 5019,-2029 5019,-1897 4871,-1897"/> +</g> +<!-- StrainXRef->InbredSet --> +<g id="edge40" class="edge"> +<title>StrainXRef:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4873,-1994C4828.88,-1994 4884.67,-1231.72 4854,-1200 4805.57,-1149.92 4292.6,-1190.1 4228,-1164 4115.23,-1118.43 4012.54,-1024.28 3943.58,-949.66"/> +<polygon fill="black" stroke="black" points="3945.94,-947.05 3936.6,-942.05 3940.78,-951.79 3945.94,-947.05"/> +</g> +<!-- StrainXRef->Strain 
--> +<g id="edge41" class="edge"> +<title>StrainXRef:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5017,-1930C5057.58,-1930 5018.82,-1233.98 5041,-1200 5195.5,-963.36 5553.55,-879.5 5710.26,-853.43"/> +<polygon fill="black" stroke="black" points="5710.98,-856.86 5720.28,-851.79 5709.85,-849.95 5710.98,-856.86"/> +</g> +<!-- SnpSource --> +<g id="node49" class="node"> +<title>SnpSource</title> +<polygon fill="white" stroke="transparent" points="9726,-4832 9726,-4943 9870,-4943 9870,-4832 9726,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="9729,-4918.5 9729,-4939.5 9867,-4939.5 9867,-4918.5 9729,-4918.5"/> +<polygon fill="none" stroke="black" points="9729,-4918.5 9729,-4939.5 9867,-4939.5 9867,-4918.5 9729,-4918.5"/> +<text text-anchor="start" x="9732" y="-4925.3" font-family="Times,serif" font-size="14.00">SnpSource (1 KiB)</text> +<text text-anchor="start" x="9758.5" y="-4903.3" font-family="Times,serif" font-size="14.00">DateAdded</text> +<text text-anchor="start" x="9752.5" y="-4882.3" font-family="Times,serif" font-size="14.00">DateCreated</text> +<text text-anchor="start" x="9790.5" y="-4861.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="9776.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="9726,-4832 9726,-4943 9870,-4943 9870,-4832 9726,-4832"/> +</g> +<!-- user_openids --> +<g id="node50" class="node"> +<title>user_openids</title> +<polygon fill="white" stroke="transparent" points="9904.5,-4853 9904.5,-4922 10049.5,-4922 10049.5,-4853 9904.5,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="9908,-4897.5 9908,-4918.5 10047,-4918.5 10047,-4897.5 9908,-4897.5"/> +<polygon fill="none" stroke="black" points="9908,-4897.5 9908,-4918.5 10047,-4918.5 10047,-4897.5 9908,-4897.5"/> +<text text-anchor="start" x="9911" y="-4904.3" font-family="Times,serif" font-size="14.00">user_openids (0 B)</text> +<text 
text-anchor="start" x="9939.5" y="-4882.3" font-family="Times,serif" font-size="14.00">openid_url</text> +<text text-anchor="start" x="9951.5" y="-4861.3" font-family="Times,serif" font-size="14.00">user_id</text> +<polygon fill="none" stroke="black" points="9904.5,-4853 9904.5,-4922 10049.5,-4922 10049.5,-4853 9904.5,-4853"/> +</g> +<!-- GeneMap_cuiyan --> +<g id="node51" class="node"> +<title>GeneMap_cuiyan</title> +<polygon fill="white" stroke="transparent" points="10084,-4832 10084,-4943 10290,-4943 10290,-4832 10084,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="10087,-4918.5 10087,-4939.5 10287,-4939.5 10287,-4918.5 10087,-4918.5"/> +<polygon fill="none" stroke="black" points="10087,-4918.5 10087,-4939.5 10287,-4939.5 10287,-4918.5 10087,-4918.5"/> +<text text-anchor="start" x="10090" y="-4925.3" font-family="Times,serif" font-size="14.00">GeneMap_cuiyan (376 KiB)</text> +<text text-anchor="start" x="10160" y="-4903.3" font-family="Times,serif" font-size="14.00">GeneID</text> +<text text-anchor="start" x="10180" y="-4882.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="10160" y="-4861.3" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="10141.5" y="-4840.3" font-family="Times,serif" font-size="14.00">TranscriptID</text> +<polygon fill="none" stroke="black" points="10084,-4832 10084,-4943 10290,-4943 10290,-4832 10084,-4832"/> +</g> +<!-- InfoFilesUser_md5 --> +<g id="node52" class="node"> +<title>InfoFilesUser_md5</title> +<polygon fill="white" stroke="transparent" points="10324,-4853 10324,-4922 10520,-4922 10520,-4853 10324,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="10327,-4897.5 10327,-4918.5 10517,-4918.5 10517,-4897.5 10327,-4897.5"/> +<polygon fill="none" stroke="black" points="10327,-4897.5 10327,-4918.5 10517,-4918.5 10517,-4897.5 10327,-4897.5"/> +<text text-anchor="start" x="10330" y="-4904.3" font-family="Times,serif" 
font-size="14.00">InfoFilesUser_md5 (96 B)</text> +<text text-anchor="start" x="10387.5" y="-4882.3" font-family="Times,serif" font-size="14.00">Password</text> +<text text-anchor="start" x="10385" y="-4861.3" font-family="Times,serif" font-size="14.00">Username</text> +<polygon fill="none" stroke="black" points="10324,-4853 10324,-4922 10520,-4922 10520,-4853 10324,-4853"/> +</g> +<!-- PublishXRef --> +<g id="node53" class="node"> +<title>PublishXRef</title> +<polygon fill="lightgrey" stroke="transparent" points="2811.5,-1834 2811.5,-2092 2968.5,-2092 2968.5,-1834 2811.5,-1834"/> +<polygon fill="#df65b0" stroke="transparent" points="2815,-2068 2815,-2089 2966,-2089 2966,-2068 2815,-2068"/> +<polygon fill="none" stroke="black" points="2815,-2068 2815,-2089 2966,-2089 2966,-2068 2815,-2068"/> +<text text-anchor="start" x="2818" y="-2074.8" font-family="Times,serif" font-size="14.00">PublishXRef (2 MiB)</text> +<text text-anchor="start" x="2861.5" y="-2052.8" font-family="Times,serif" font-size="14.00">additive</text> +<text text-anchor="start" x="2853.5" y="-2031.8" font-family="Times,serif" font-size="14.00">comments</text> +<text text-anchor="start" x="2866" y="-2010.8" font-family="Times,serif" font-size="14.00">DataId</text> +<polygon fill="green" stroke="transparent" points="2815,-1984 2815,-2003 2966,-2003 2966,-1984 2815,-1984"/> +<text text-anchor="start" x="2883" y="-1989.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2847.5" y="-1968.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2869.5" y="-1947.8" font-family="Times,serif" font-size="14.00">Locus</text> +<text text-anchor="start" x="2875.5" y="-1926.8" font-family="Times,serif" font-size="14.00">LRS</text> +<text text-anchor="start" x="2870.5" y="-1905.8" font-family="Times,serif" font-size="14.00">mean</text> +<text text-anchor="start" x="2845" y="-1884.8" font-family="Times,serif" 
font-size="14.00">PhenotypeId</text> +<polygon fill="green" stroke="transparent" points="2815,-1858 2815,-1877 2966,-1877 2966,-1858 2815,-1858"/> +<text text-anchor="start" x="2843" y="-1863.8" font-family="Times,serif" font-size="14.00">PublicationId</text> +<text text-anchor="start" x="2855.5" y="-1842.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<polygon fill="none" stroke="black" points="2811.5,-1834 2811.5,-2092 2968.5,-2092 2968.5,-1834 2811.5,-1834"/> +</g> +<!-- PublishXRef->Publication --> +<g id="edge44" class="edge"> +<title>PublishXRef:PublicationId->Publication</title> +<path fill="none" stroke="black" d="M2814,-1867C2776.93,-1867 2815.52,-1230.19 2794,-1200 2767.79,-1163.23 2729.57,-1197.23 2699,-1164 2651.77,-1112.67 2628.61,-1038.69 2617.34,-974.68"/> +<polygon fill="black" stroke="black" points="2620.73,-973.78 2615.62,-964.5 2613.83,-974.94 2620.73,-973.78"/> +</g> +<!-- PublishXRef->InbredSet --> +<g id="edge42" class="edge"> +<title>PublishXRef:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2967,-1973C3009.96,-1973 2955.99,-1230.74 2986,-1200 3043.5,-1141.1 3658.94,-1211.74 3726,-1164 3777.91,-1127.05 3808.95,-1065.59 3827.5,-1006.29"/> +<polygon fill="black" stroke="black" points="3830.95,-1006.99 3830.49,-996.41 3824.25,-1004.97 3830.95,-1006.99"/> +</g> +<!-- PublishXRef->Phenotype --> +<g id="edge43" class="edge"> +<title>PublishXRef:PhenotypeId->Phenotype</title> +<path fill="none" stroke="black" d="M2967,-1888C2986.12,-1888 2984.78,-1219.08 2986,-1200 2990.55,-1129.04 2998.2,-1050.39 3005.28,-985.01"/> +<polygon fill="black" stroke="black" points="3008.76,-985.37 3006.37,-975.05 3001.8,-984.61 3008.76,-985.37"/> +</g> +<!-- RatSnpPattern --> +<g id="node54" class="node"> +<title>RatSnpPattern</title> +<polygon fill="white" stroke="transparent" points="10554,-4517 10554,-5258 10748,-5258 10748,-4517 10554,-4517"/> +<polygon fill="#df65b0" stroke="transparent" points="10557,-5233.5 10557,-5254.5 
10745,-5254.5 10745,-5233.5 10557,-5233.5"/> +<polygon fill="none" stroke="black" points="10557,-5233.5 10557,-5254.5 10745,-5254.5 10745,-5233.5 10557,-5233.5"/> +<text text-anchor="start" x="10560" y="-5240.3" font-family="Times,serif" font-size="14.00">RatSnpPattern (202 MiB)</text> +<text text-anchor="start" x="10638" y="-5218.3" font-family="Times,serif" font-size="14.00">ACI</text> +<text text-anchor="start" x="10628.5" y="-5197.3" font-family="Times,serif" font-size="14.00">ACI_N</text> +<text text-anchor="start" x="10629.5" y="-5176.3" font-family="Times,serif" font-size="14.00">BBDP</text> +<text text-anchor="start" x="10639.5" y="-5155.3" font-family="Times,serif" font-size="14.00">BN</text> +<text text-anchor="start" x="10630" y="-5134.3" font-family="Times,serif" font-size="14.00">BN_N</text> +<text text-anchor="start" x="10625" y="-5113.3" font-family="Times,serif" font-size="14.00">BUF_N</text> +<text text-anchor="start" x="10632.5" y="-5092.3" font-family="Times,serif" font-size="14.00">F344</text> +<text text-anchor="start" x="10623" y="-5071.3" font-family="Times,serif" font-size="14.00">F344_N</text> +<text text-anchor="start" x="10634" y="-5050.3" font-family="Times,serif" font-size="14.00">FHH</text> +<text text-anchor="start" x="10635.5" y="-5029.3" font-family="Times,serif" font-size="14.00">FHL</text> +<text text-anchor="start" x="10640" y="-5008.3" font-family="Times,serif" font-size="14.00">GK</text> +<text text-anchor="start" x="10643.5" y="-4987.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="10641" y="-4966.3" font-family="Times,serif" font-size="14.00">LE</text> +<text text-anchor="start" x="10634" y="-4945.3" font-family="Times,serif" font-size="14.00">LEW</text> +<text text-anchor="start" x="10640" y="-4924.3" font-family="Times,serif" font-size="14.00">LH</text> +<text text-anchor="start" x="10641.5" y="-4903.3" font-family="Times,serif" font-size="14.00">LL</text> +<text text-anchor="start" 
x="10640" y="-4882.3" font-family="Times,serif" font-size="14.00">LN</text> +<text text-anchor="start" x="10620.5" y="-4861.3" font-family="Times,serif" font-size="14.00">M520_N</text> +<text text-anchor="start" x="10632.5" y="-4840.3" font-family="Times,serif" font-size="14.00">MHS</text> +<text text-anchor="start" x="10632.5" y="-4819.3" font-family="Times,serif" font-size="14.00">MNS</text> +<text text-anchor="start" x="10629" y="-4798.3" font-family="Times,serif" font-size="14.00">MR_N</text> +<text text-anchor="start" x="10634.5" y="-4777.3" font-family="Times,serif" font-size="14.00">SBH</text> +<text text-anchor="start" x="10634.5" y="-4756.3" font-family="Times,serif" font-size="14.00">SBN</text> +<text text-anchor="start" x="10634.5" y="-4735.3" font-family="Times,serif" font-size="14.00">SHR</text> +<text text-anchor="start" x="10625" y="-4714.3" font-family="Times,serif" font-size="14.00">SHRSP</text> +<text text-anchor="start" x="10629.5" y="-4693.3" font-family="Times,serif" font-size="14.00">SnpId</text> +<text text-anchor="start" x="10640.5" y="-4672.3" font-family="Times,serif" font-size="14.00">SR</text> +<text text-anchor="start" x="10641.5" y="-4651.3" font-family="Times,serif" font-size="14.00">SS</text> +<text text-anchor="start" x="10633.5" y="-4630.3" font-family="Times,serif" font-size="14.00">WAG</text> +<text text-anchor="start" x="10634" y="-4609.3" font-family="Times,serif" font-size="14.00">WKY</text> +<text text-anchor="start" x="10625" y="-4588.3" font-family="Times,serif" font-size="14.00">WKY_N</text> +<text text-anchor="start" x="10636.5" y="-4567.3" font-family="Times,serif" font-size="14.00">WLI</text> +<text text-anchor="start" x="10634" y="-4546.3" font-family="Times,serif" font-size="14.00">WMI</text> +<text text-anchor="start" x="10628" y="-4525.3" font-family="Times,serif" font-size="14.00">WN_N</text> +<polygon fill="none" stroke="black" points="10554,-4517 10554,-5258 10748,-5258 10748,-4517 10554,-4517"/> +</g> +<!-- 
Genbank --> +<g id="node55" class="node"> +<title>Genbank</title> +<polygon fill="white" stroke="transparent" points="769,-797 769,-887 911,-887 911,-797 769,-797"/> +<polygon fill="#df65b0" stroke="transparent" points="772,-863 772,-884 908,-884 908,-863 772,-863"/> +<polygon fill="none" stroke="black" points="772,-863 772,-884 908,-884 908,-863 772,-863"/> +<text text-anchor="start" x="775" y="-869.8" font-family="Times,serif" font-size="14.00">Genbank (37 MiB)</text> +<text text-anchor="start" x="832.5" y="-847.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="805" y="-826.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<text text-anchor="start" x="805" y="-805.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="none" stroke="black" points="769,-797 769,-887 911,-887 911,-797 769,-797"/> +</g> +<!-- Genbank->Species --> +<g id="edge45" class="edge"> +<title>Genbank:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M909,-809C941.22,-809 910.62,-543.18 933,-520 1058.95,-389.57 2375.45,-319.21 2715.96,-303.1"/> +<polygon fill="black" stroke="black" points="2716.17,-306.6 2725.99,-302.63 2715.84,-299.61 2716.17,-306.6"/> +</g> +<!-- EnsemblChip --> +<g id="node56" class="node"> +<title>EnsemblChip</title> +<polygon fill="white" stroke="transparent" points="1780.5,-786.5 1780.5,-897.5 1945.5,-897.5 1945.5,-786.5 1780.5,-786.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="1784,-873 1784,-894 1943,-894 1943,-873 1784,-873"/> +<polygon fill="none" stroke="black" points="1784,-873 1784,-894 1943,-894 1943,-873 1784,-873"/> +<text text-anchor="start" x="1787" y="-879.8" font-family="Times,serif" font-size="14.00">EnsemblChip (296 B)</text> +<text text-anchor="start" x="1856" y="-857.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="1842" y="-836.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" 
x="1815" y="-815.8" font-family="Times,serif" font-size="14.00">ProbeSetSize</text> +<text text-anchor="start" x="1846" y="-794.8" font-family="Times,serif" font-size="14.00">Type</text> +<polygon fill="none" stroke="black" points="1780.5,-786.5 1780.5,-897.5 1945.5,-897.5 1945.5,-786.5 1780.5,-786.5"/> +</g> +<!-- LCorrRamin3 --> +<g id="node57" class="node"> +<title>LCorrRamin3</title> +<polygon fill="white" stroke="transparent" points="10782.5,-4842.5 10782.5,-4932.5 10945.5,-4932.5 10945.5,-4842.5 10782.5,-4842.5"/> +<polygon fill="#ce1256" stroke="transparent" points="10786,-4908.5 10786,-4929.5 10943,-4929.5 10943,-4908.5 10786,-4908.5"/> +<polygon fill="none" stroke="black" points="10786,-4908.5 10786,-4929.5 10943,-4929.5 10943,-4908.5 10786,-4908.5"/> +<text text-anchor="start" x="10789" y="-4915.3" font-family="Times,serif" font-size="14.00">LCorrRamin3 (2 GiB)</text> +<text text-anchor="start" x="10834" y="-4893.3" font-family="Times,serif" font-size="14.00">GeneId1</text> +<text text-anchor="start" x="10834" y="-4872.3" font-family="Times,serif" font-size="14.00">GeneId2</text> +<text text-anchor="start" x="10845" y="-4851.3" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="10782.5,-4842.5 10782.5,-4932.5 10945.5,-4932.5 10945.5,-4842.5 10782.5,-4842.5"/> +</g> +<!-- UserPrivilege --> +<g id="node59" class="node"> +<title>UserPrivilege</title> +<polygon fill="white" stroke="transparent" points="7239,-4842.5 7239,-4932.5 7407,-4932.5 7407,-4842.5 7239,-4842.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="7242,-4908.5 7242,-4929.5 7404,-4929.5 7404,-4908.5 7242,-4908.5"/> +<polygon fill="none" stroke="black" points="7242,-4908.5 7242,-4929.5 7404,-4929.5 7404,-4908.5 7242,-4908.5"/> +<text text-anchor="start" x="7245" y="-4915.3" font-family="Times,serif" font-size="14.00">UserPrivilege (224 B)</text> +<text text-anchor="start" x="7246.5" y="-4893.3" font-family="Times,serif" 
font-size="14.00">download_result_priv</text> +<text text-anchor="start" x="7258" y="-4872.3" font-family="Times,serif" font-size="14.00">ProbeSetFreezeId</text> +<text text-anchor="start" x="7298.5" y="-4851.3" font-family="Times,serif" font-size="14.00">UserId</text> +<polygon fill="none" stroke="black" points="7239,-4842.5 7239,-4932.5 7407,-4932.5 7407,-4842.5 7239,-4842.5"/> +</g> +<!-- UserPrivilege->User --> +<g id="edge46" class="edge"> +<title>UserPrivilege:UserId->User</title> +<path fill="none" stroke="black" d="M7323,-4844.5C7323,-4319.22 7309.04,-3693.9 7302.41,-3426.66"/> +<polygon fill="black" stroke="black" points="7305.91,-3426.44 7302.16,-3416.53 7298.91,-3426.61 7305.91,-3426.44"/> +</g> +<!-- GeneChip --> +<g id="node61" class="node"> +<title>GeneChip</title> +<polygon fill="lightgrey" stroke="transparent" points="1980,-744.5 1980,-939.5 2116,-939.5 2116,-744.5 1980,-744.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="1983,-915 1983,-936 2113,-936 2113,-915 1983,-915"/> +<polygon fill="none" stroke="black" points="1983,-915 1983,-936 2113,-936 2113,-915 1983,-915"/> +<text text-anchor="start" x="1986" y="-921.8" font-family="Times,serif" font-size="14.00">GeneChip (9 KiB)</text> +<text text-anchor="start" x="2005.5" y="-899.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<polygon fill="green" stroke="transparent" points="1983,-873 1983,-892 2113,-892 2113,-873 1983,-873"/> +<text text-anchor="start" x="1992" y="-878.8" font-family="Times,serif" font-size="14.00">GeneChipName</text> +<text text-anchor="start" x="2002.5" y="-857.8" font-family="Times,serif" font-size="14.00">GeoPlatform</text> +<text text-anchor="start" x="1996" y="-836.8" font-family="Times,serif" font-size="14.00">GO_tree_value</text> +<text text-anchor="start" x="2040.5" y="-815.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="1983,-789 1983,-808 2113,-808 2113,-789 1983,-789"/> +<text 
text-anchor="start" x="2026.5" y="-794.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2013" y="-773.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2031.5" y="-752.8" font-family="Times,serif" font-size="14.00">Title</text> +<polygon fill="none" stroke="black" points="1980,-744.5 1980,-939.5 2116,-939.5 2116,-744.5 1980,-744.5"/> +</g> +<!-- GeneChip->Species --> +<g id="edge47" class="edge"> +<title>GeneChip:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2114,-777C2142.63,-777 2115.4,-542.59 2133,-520 2274.95,-337.76 2572.58,-304.64 2715.73,-299.62"/> +<polygon fill="black" stroke="black" points="2715.88,-303.12 2725.77,-299.31 2715.66,-296.12 2715.88,-303.12"/> +</g> +<!-- IndelXRef --> +<g id="node62" class="node"> +<title>IndelXRef</title> +<polygon fill="white" stroke="transparent" points="5716,-1918 5716,-2008 5856,-2008 5856,-1918 5716,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="5719,-1984 5719,-2005 5853,-2005 5853,-1984 5719,-1984"/> +<polygon fill="none" stroke="black" points="5719,-1984 5719,-2005 5853,-2005 5853,-1984 5719,-1984"/> +<text text-anchor="start" x="5722" y="-1990.8" font-family="Times,serif" font-size="14.00">IndelXRef (1 MiB)</text> +<text text-anchor="start" x="5760.5" y="-1968.8" font-family="Times,serif" font-size="14.00">IndelId</text> +<text text-anchor="start" x="5752" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId1</text> +<text text-anchor="start" x="5752" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId2</text> +<polygon fill="none" stroke="black" points="5716,-1918 5716,-2008 5856,-2008 5856,-1918 5716,-1918"/> +</g> +<!-- IndelXRef->Strain --> +<g id="edge48" class="edge"> +<title>IndelXRef:StrainId1->Strain</title> +<path fill="none" stroke="black" d="M5854,-1951C5904.87,-1951 5825.54,-1197.02 5796.2,-933"/> +<polygon fill="black" stroke="black" points="5799.64,-932.24 
5795.05,-922.68 5792.68,-933.01 5799.64,-932.24"/> +</g> +<!-- IndelXRef->Strain --> +<g id="edge49" class="edge"> +<title>IndelXRef:StrainId2->Strain</title> +<path fill="none" stroke="black" d="M5786,-1920C5786,-1553.9 5786,-1117.79 5786,-932.93"/> +<polygon fill="black" stroke="black" points="5789.5,-932.72 5786,-922.72 5782.5,-932.72 5789.5,-932.72"/> +</g> +<!-- user --> +<g id="node63" class="node"> +<title>user</title> +<polygon fill="white" stroke="transparent" points="10979.5,-4779.5 10979.5,-4995.5 11108.5,-4995.5 11108.5,-4779.5 10979.5,-4779.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="10983,-4971.5 10983,-4992.5 11106,-4992.5 11106,-4971.5 10983,-4971.5"/> +<polygon fill="none" stroke="black" points="10983,-4971.5 10983,-4992.5 11106,-4992.5 11106,-4971.5 10983,-4971.5"/> +<text text-anchor="start" x="10997" y="-4978.3" font-family="Times,serif" font-size="14.00">user (64 KiB)</text> +<text text-anchor="start" x="11023" y="-4956.3" font-family="Times,serif" font-size="14.00">active</text> +<text text-anchor="start" x="11008.5" y="-4935.3" font-family="Times,serif" font-size="14.00">confirmed</text> +<text text-anchor="start" x="10993" y="-4914.3" font-family="Times,serif" font-size="14.00">email_address</text> +<text text-anchor="start" x="11009.5" y="-4893.3" font-family="Times,serif" font-size="14.00">full_name</text> +<text text-anchor="start" x="11037.5" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="10999.5" y="-4851.3" font-family="Times,serif" font-size="14.00">organization</text> +<text text-anchor="start" x="11010" y="-4830.3" font-family="Times,serif" font-size="14.00">password</text> +<text text-anchor="start" x="10985" y="-4809.3" font-family="Times,serif" font-size="14.00">registration_info</text> +<text text-anchor="start" x="11008" y="-4788.3" font-family="Times,serif" font-size="14.00">superuser</text> +<polygon fill="none" stroke="black" points="10979.5,-4779.5 
10979.5,-4995.5 11108.5,-4995.5 11108.5,-4779.5 10979.5,-4779.5"/> +</g> +<!-- PublishSE --> +<g id="node64" class="node"> +<title>PublishSE</title> +<polygon fill="white" stroke="transparent" points="5890,-1918 5890,-2008 6034,-2008 6034,-1918 5890,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="5893,-1984 5893,-2005 6031,-2005 6031,-1984 5893,-1984"/> +<polygon fill="none" stroke="black" points="5893,-1984 5893,-2005 6031,-2005 6031,-1984 5893,-1984"/> +<text text-anchor="start" x="5896" y="-1990.8" font-family="Times,serif" font-size="14.00">PublishSE (3 MiB)</text> +<text text-anchor="start" x="5937.5" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="5943.5" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="5932.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="5890,-1918 5890,-2008 6034,-2008 6034,-1918 5890,-1918"/> +</g> +<!-- PublishSE->Strain --> +<g id="edge50" class="edge"> +<title>PublishSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5962,-1920C5962,-1549.32 5859.2,-1116.17 5810.73,-932.54"/> +<polygon fill="black" stroke="black" points="5814.06,-931.43 5808.11,-922.66 5807.29,-933.22 5814.06,-931.43"/> +</g> +<!-- EnsemblProbe --> +<g id="node65" class="node"> +<title>EnsemblProbe</title> +<polygon fill="white" stroke="transparent" points="11143,-4821.5 11143,-4953.5 11327,-4953.5 11327,-4821.5 11143,-4821.5"/> +<polygon fill="#df65b0" stroke="transparent" points="11146,-4929.5 11146,-4950.5 11324,-4950.5 11324,-4929.5 11146,-4929.5"/> +<polygon fill="none" stroke="black" points="11146,-4929.5 11146,-4950.5 11324,-4950.5 11324,-4929.5 11146,-4929.5"/> +<text text-anchor="start" x="11149" y="-4936.3" font-family="Times,serif" font-size="14.00">EnsemblProbe (94 MiB)</text> +<text text-anchor="start" x="11211" y="-4914.3" font-family="Times,serif" 
font-size="14.00">ChipId</text> +<text text-anchor="start" x="11227.5" y="-4893.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="11212" y="-4872.3" font-family="Times,serif" font-size="14.00">length</text> +<text text-anchor="start" x="11213.5" y="-4851.3" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="11201.5" y="-4830.3" font-family="Times,serif" font-size="14.00">ProbeSet</text> +<polygon fill="none" stroke="black" points="11143,-4821.5 11143,-4953.5 11327,-4953.5 11327,-4821.5 11143,-4821.5"/> +</g> +<!-- InfoFiles --> +<g id="node66" class="node"> +<title>InfoFiles</title> +<polygon fill="lightgrey" stroke="transparent" points="2048.5,-1424.5 2048.5,-2501.5 2279.5,-2501.5 2279.5,-1424.5 2048.5,-1424.5"/> +<polygon fill="#df65b0" stroke="transparent" points="2052,-2477 2052,-2498 2277,-2498 2277,-2477 2052,-2477"/> +<polygon fill="none" stroke="black" points="2052,-2477 2052,-2498 2277,-2498 2277,-2477 2052,-2477"/> +<text text-anchor="start" x="2104" y="-2483.8" font-family="Times,serif" font-size="14.00">InfoFiles (4 MiB)</text> +<text text-anchor="start" x="2085.5" y="-2461.8" font-family="Times,serif" font-size="14.00">About_Array_Platform</text> +<text text-anchor="start" x="2119" y="-2440.8" font-family="Times,serif" font-size="14.00">About_Cases</text> +<text text-anchor="start" x="2054" y="-2419.8" font-family="Times,serif" font-size="14.00">About_Data_Values_Processing</text> +<text text-anchor="start" x="2104.5" y="-2398.8" font-family="Times,serif" font-size="14.00">About_Download</text> +<text text-anchor="start" x="2117" y="-2377.8" font-family="Times,serif" font-size="14.00">About_Tissue</text> +<text text-anchor="start" x="2104" y="-2356.8" font-family="Times,serif" font-size="14.00">AuthorizedUsers</text> +<text text-anchor="start" x="2116" y="-2335.8" font-family="Times,serif" font-size="14.00">AvgMethodId</text> +<text text-anchor="start" x="2135.5" y="-2314.8" 
font-family="Times,serif" font-size="14.00">Citation</text> +<text text-anchor="start" x="2149.5" y="-2293.8" font-family="Times,serif" font-size="14.00">City</text> +<text text-anchor="start" x="2112" y="-2272.8" font-family="Times,serif" font-size="14.00">Contact_Name</text> +<text text-anchor="start" x="2122" y="-2251.8" font-family="Times,serif" font-size="14.00">Contributor</text> +<text text-anchor="start" x="2135.5" y="-2230.8" font-family="Times,serif" font-size="14.00">Country</text> +<text text-anchor="start" x="2069" y="-2209.8" font-family="Times,serif" font-size="14.00">Data_Source_Acknowledge</text> +<text text-anchor="start" x="2129.5" y="-2188.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<text text-anchor="start" x="2129" y="-2167.8" font-family="Times,serif" font-size="14.00">DB_Name</text> +<text text-anchor="start" x="2121" y="-2146.8" font-family="Times,serif" font-size="14.00">Department</text> +<text text-anchor="start" x="2140" y="-2125.8" font-family="Times,serif" font-size="14.00">Emails</text> +<text text-anchor="start" x="2101.5" y="-2104.8" font-family="Times,serif" font-size="14.00">Experiment_Type</text> +<text text-anchor="start" x="2122" y="-2083.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<polygon fill="green" stroke="transparent" points="2052,-2057 2052,-2076 2277,-2076 2277,-2057 2052,-2057"/> +<text text-anchor="start" x="2111" y="-2062.8" font-family="Times,serif" font-size="14.00">GN_AccesionId</text> +<text text-anchor="start" x="2128.5" y="-2041.8" font-family="Times,serif" font-size="14.00">InbredSet</text> +<text text-anchor="start" x="2121.5" y="-2020.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2129.5" y="-1999.8" font-family="Times,serif" font-size="14.00">InfoFileId</text> +<polygon fill="green" stroke="transparent" points="2052,-1973 2052,-1992 2277,-1992 2277,-1973 2052,-1973"/> +<text text-anchor="start" x="2120.5" 
y="-1978.8" font-family="Times,serif" font-size="14.00">InfoFileTitle</text> +<text text-anchor="start" x="2112" y="-1957.8" font-family="Times,serif" font-size="14.00">InfoPageName</text> +<text text-anchor="start" x="2117" y="-1936.8" font-family="Times,serif" font-size="14.00">InfoPageTitle</text> +<text text-anchor="start" x="2125" y="-1915.8" font-family="Times,serif" font-size="14.00">Laboratory</text> +<text text-anchor="start" x="2113.5" y="-1894.8" font-family="Times,serif" font-size="14.00">Normalization</text> +<text text-anchor="start" x="2129.5" y="-1873.8" font-family="Times,serif" font-size="14.00">Organism</text> +<text text-anchor="start" x="2119" y="-1852.8" font-family="Times,serif" font-size="14.00">Organism_Id</text> +<text text-anchor="start" x="2093.5" y="-1831.8" font-family="Times,serif" font-size="14.00">Organization_Name</text> +<text text-anchor="start" x="2110" y="-1810.8" font-family="Times,serif" font-size="14.00">Overall_Design</text> +<text text-anchor="start" x="2142" y="-1789.8" font-family="Times,serif" font-size="14.00">Phone</text> +<text text-anchor="start" x="2129.5" y="-1768.8" font-family="Times,serif" font-size="14.00">Platforms</text> +<text text-anchor="start" x="2132" y="-1747.8" font-family="Times,serif" font-size="14.00">Progreso</text> +<text text-anchor="start" x="2088.5" y="-1726.8" font-family="Times,serif" font-size="14.00">QualityControlStatus</text> +<text text-anchor="start" x="2134" y="-1705.8" font-family="Times,serif" font-size="14.00">Samples</text> +<text text-anchor="start" x="2137" y="-1684.8" font-family="Times,serif" font-size="14.00">Species</text> +<text text-anchor="start" x="2129.5" y="-1663.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="green" stroke="transparent" points="2052,-1637 2052,-1656 2277,-1656 2277,-1637 2052,-1637"/> +<text text-anchor="start" x="2132.5" y="-1642.8" font-family="Times,serif" font-size="14.00">Specifics</text> +<text text-anchor="start" 
x="2145" y="-1621.8" font-family="Times,serif" font-size="14.00">State</text> +<text text-anchor="start" x="2141" y="-1600.8" font-family="Times,serif" font-size="14.00">Status</text> +<text text-anchor="start" x="2141.5" y="-1579.8" font-family="Times,serif" font-size="14.00">Street</text> +<text text-anchor="start" x="2102.5" y="-1558.8" font-family="Times,serif" font-size="14.00">Submission_Date</text> +<text text-anchor="start" x="2129.5" y="-1537.8" font-family="Times,serif" font-size="14.00">Summary</text> +<text text-anchor="start" x="2141.5" y="-1516.8" font-family="Times,serif" font-size="14.00">Tissue</text> +<text text-anchor="start" x="2134" y="-1495.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<polygon fill="green" stroke="transparent" points="2052,-1469 2052,-1488 2277,-1488 2277,-1469 2052,-1469"/> +<text text-anchor="start" x="2148" y="-1474.8" font-family="Times,serif" font-size="14.00">Title</text> +<text text-anchor="start" x="2148.5" y="-1453.8" font-family="Times,serif" font-size="14.00">URL</text> +<text text-anchor="start" x="2152" y="-1432.8" font-family="Times,serif" font-size="14.00">ZIP</text> +<polygon fill="none" stroke="black" points="2048.5,-1424.5 2048.5,-2501.5 2279.5,-2501.5 2279.5,-1424.5 2048.5,-1424.5"/> +</g> +<!-- InfoFiles->Datasets --> +<g id="edge52" class="edge"> +<title>InfoFiles:DatasetId->Datasets</title> +<path fill="none" stroke="black" d="M2051,-2193C1940.48,-2193 2072.47,-1276.81 1993,-1200 1933.9,-1142.88 581.41,-1211.03 514,-1164 470.71,-1133.8 442.18,-1086.38 423.37,-1037.17"/> +<polygon fill="black" stroke="black" points="426.6,-1035.81 419.85,-1027.64 420.03,-1038.23 426.6,-1035.81"/> +</g> +<!-- InfoFiles->InbredSet --> +<g id="edge54" class="edge"> +<title>InfoFiles:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2278,-2025C2323.84,-2025 2263.64,-1232.47 2296,-1200 2352.1,-1143.71 3660.72,-1209.33 3726,-1164 3778.57,-1127.49 3809.73,-1065.76 3828.19,-1006.12"/> 
+<polygon fill="black" stroke="black" points="3831.65,-1006.77 3831.16,-996.18 3824.94,-1004.76 3831.65,-1006.77"/> +</g> +<!-- InfoFiles->Species --> +<g id="edge55" class="edge"> +<title>InfoFiles:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2278,-1667C2303.96,-1667 2277.61,-1218.33 2296,-1200 2376.56,-1119.71 3240,-1245.83 3319,-1164 3368.7,-1112.52 3358.57,-579.62 3319,-520 3219.73,-370.42 2996.86,-322.06 2876.6,-306.62"/> +<polygon fill="black" stroke="black" points="2876.71,-303.1 2866.35,-305.35 2875.85,-310.05 2876.71,-303.1"/> +</g> +<!-- InfoFiles->AvgMethod --> +<g id="edge51" class="edge"> +<title>InfoFiles:AvgMethodId->AvgMethod</title> +<path fill="none" stroke="black" d="M2051,-2340C1924.17,-2340 2083.05,-1289.32 1993,-1200 1926.52,-1134.05 1224.64,-1221.84 1151,-1164 1075.17,-1104.44 1058.6,-986.94 1056.31,-911.82"/> +<polygon fill="black" stroke="black" points="1059.8,-911.43 1056.07,-901.51 1052.8,-911.59 1059.8,-911.43"/> +</g> +<!-- InfoFiles->GeneChip --> +<g id="edge53" class="edge"> +<title>InfoFiles:GeneChipId->GeneChip</title> +<path fill="none" stroke="black" d="M2051,-2088C2022.77,-2088 2038.62,-1258.67 2045.41,-953.75"/> +<polygon fill="black" stroke="black" points="2048.91,-953.64 2045.63,-943.57 2041.91,-953.49 2048.91,-953.64"/> +</g> +<!-- InfoFiles->Tissue --> +<g id="edge56" class="edge"> +<title>InfoFiles:TissueId->Tissue</title> +<path fill="none" stroke="black" d="M2278,-1499C2311.28,-1499 2278.84,-1228.52 2296,-1200 2311.83,-1173.68 2336.81,-1188.76 2355,-1164 2402.06,-1099.94 2421.62,-1011.33 2429.66,-943.43"/> +<polygon fill="black" stroke="black" points="2433.17,-943.47 2430.81,-933.15 2426.22,-942.7 2433.17,-943.47"/> +</g> +<!-- Vlookup --> +<g id="node67" class="node"> +<title>Vlookup</title> +<polygon fill="white" stroke="transparent" points="2070,-2766 2070,-3822 2258,-3822 2258,-2766 2070,-2766"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2073,-3798 2073,-3819 2255,-3819 2255,-3798 
2073,-3798"/> +<polygon fill="none" stroke="black" points="2073,-3798 2073,-3819 2255,-3819 2255,-3798 2073,-3798"/> +<text text-anchor="start" x="2099" y="-3804.8" font-family="Times,serif" font-size="14.00">Vlookup (120 KiB)</text> +<text text-anchor="start" x="2147" y="-3782.8" font-family="Times,serif" font-size="14.00">alias</text> +<text text-anchor="start" x="2137" y="-3761.8" font-family="Times,serif" font-size="14.00">AlignID</text> +<text text-anchor="start" x="2130.5" y="-3740.8" font-family="Times,serif" font-size="14.00">assembly</text> +<text text-anchor="start" x="2115.5" y="-3719.8" font-family="Times,serif" font-size="14.00">AvgMethodId</text> +<text text-anchor="start" x="2135.5" y="-3698.8" font-family="Times,serif" font-size="14.00">BlatSeq</text> +<text text-anchor="start" x="2117.5" y="-3677.8" font-family="Times,serif" font-size="14.00">CAS_number</text> +<text text-anchor="start" x="2137.5" y="-3656.8" font-family="Times,serif" font-size="14.00">cdsEnd</text> +<text text-anchor="start" x="2133.5" y="-3635.8" font-family="Times,serif" font-size="14.00">cdsStart</text> +<text text-anchor="start" x="2129" y="-3614.8" font-family="Times,serif" font-size="14.00">ChEBI_ID</text> +<text text-anchor="start" x="2120" y="-3593.8" font-family="Times,serif" font-size="14.00">ChEMBL_ID</text> +<text text-anchor="start" x="2108" y="-3572.8" font-family="Times,serif" font-size="14.00">ChemSpider_ID</text> +<text text-anchor="start" x="2150.5" y="-3551.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="2129" y="-3530.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<text text-anchor="start" x="2123.5" y="-3509.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="2122" y="-3488.8" font-family="Times,serif" font-size="14.00">EC_number</text> +<text text-anchor="start" x="2125.5" y="-3467.8" font-family="Times,serif" font-size="14.00">exonCount</text> +<text 
text-anchor="start" x="2129" y="-3446.8" font-family="Times,serif" font-size="14.00">exonEnds</text> +<text text-anchor="start" x="2124.5" y="-3425.8" font-family="Times,serif" font-size="14.00">exonStarts</text> +<text text-anchor="start" x="2105" y="-3404.8" font-family="Times,serif" font-size="14.00">Full_Description</text> +<text text-anchor="start" x="2121.5" y="-3383.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<text text-anchor="start" x="2138" y="-3362.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="2110.5" y="-3341.8" font-family="Times,serif" font-size="14.00">GN_AccesionId</text> +<text text-anchor="start" x="2128" y="-3320.8" font-family="Times,serif" font-size="14.00">HMDB_ID</text> +<text text-anchor="start" x="2156.5" y="-3299.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2121" y="-3278.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2129" y="-3257.8" font-family="Times,serif" font-size="14.00">InfoFileId</text> +<text text-anchor="start" x="2111.5" y="-3236.8" font-family="Times,serif" font-size="14.00">InfoPageName</text> +<text text-anchor="start" x="2130.5" y="-3215.8" font-family="Times,serif" font-size="14.00">KEGG_ID</text> +<text text-anchor="start" x="2147" y="-3194.8" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="2152" y="-3173.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="2099.5" y="-3152.8" font-family="Times,serif" font-size="14.00">Molecular_Weight</text> +<text text-anchor="start" x="2142.5" y="-3131.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2139" y="-3110.8" font-family="Times,serif" font-size="14.00">NM_ID</text> +<text text-anchor="start" x="2118.5" y="-3089.8" font-family="Times,serif" font-size="14.00">Nugowiki_ID</text> +<text text-anchor="start" 
x="2135" y="-3068.8" font-family="Times,serif" font-size="14.00">Position</text> +<text text-anchor="start" x="2079" y="-3047.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end</text> +<text text-anchor="start" x="2075" y="-3026.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start</text> +<text text-anchor="start" x="2129" y="-3005.8" font-family="Times,serif" font-size="14.00">ProteinID</text> +<text text-anchor="start" x="2117.5" y="-2984.8" font-family="Times,serif" font-size="14.00">PubChem_ID</text> +<text text-anchor="start" x="2129" y="-2963.8" font-family="Times,serif" font-size="14.00">SnpName</text> +<text text-anchor="start" x="2129" y="-2942.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2139.5" y="-2921.8" font-family="Times,serif" font-size="14.00">Strand</text> +<text text-anchor="start" x="2137" y="-2900.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="2133.5" y="-2879.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<text text-anchor="start" x="2141" y="-2858.8" font-family="Times,serif" font-size="14.00">TxEnd</text> +<text text-anchor="start" x="2136.5" y="-2837.8" font-family="Times,serif" font-size="14.00">TxStart</text> +<text text-anchor="start" x="2135" y="-2816.8" font-family="Times,serif" font-size="14.00">UNII_ID</text> +<text text-anchor="start" x="2126" y="-2795.8" font-family="Times,serif" font-size="14.00">VLBlatSeq</text> +<text text-anchor="start" x="2114" y="-2774.8" font-family="Times,serif" font-size="14.00">VLProbeSetId</text> +<polygon fill="none" stroke="black" points="2070,-2766 2070,-3822 2258,-3822 2258,-2766 2070,-2766"/> +</g> +<!-- Vlookup->Datasets --> +<g id="edge58" class="edge"> +<title>Vlookup:DatasetId->Datasets</title> +<path fill="none" stroke="black" d="M2072,-3535C1300.04,-3535 942.38,-3381.71 535,-2726 490.25,-2653.97 509.59,-1283.71 496,-1200 487.3,-1146.41 
472.62,-1089.65 456.8,-1037.55"/> +<polygon fill="black" stroke="black" points="460.12,-1036.41 453.84,-1027.88 453.42,-1038.46 460.12,-1036.41"/> +</g> +<!-- Vlookup->InbredSet --> +<g id="edge60" class="edge"> +<title>Vlookup:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2256,-3282C2538.62,-3282 2374.11,-2897.73 2622,-2762 2701.02,-2718.73 3368.94,-2790.34 3432,-2726 3491.36,-2665.43 3412.08,-1262.87 3469,-1200 3556.17,-1103.72 3659.82,-1247.85 3759,-1164 3805.29,-1124.86 3829.81,-1064.39 3842.6,-1006.44"/> +<polygon fill="black" stroke="black" points="3846.07,-1006.91 3844.7,-996.41 3839.22,-1005.48 3846.07,-1006.91"/> +</g> +<!-- Vlookup->Species --> +<g id="edge62" class="edge"> +<title>Vlookup:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2256,-2946C2438.07,-2946 2446.26,-2809.59 2622,-2762 2683.25,-2745.41 3148.01,-2771.74 3192,-2726 3250.79,-2664.88 3170.49,-1261.39 3229,-1200 3305.4,-1119.84 3650.58,-1245.08 3726,-1164 3774.74,-1111.61 3770.38,-576.13 3726,-520 3619.99,-385.91 3082.28,-324.84 2876.38,-306.1"/> +<polygon fill="black" stroke="black" points="2876.51,-302.6 2866.23,-305.19 2875.88,-309.57 2876.51,-302.6"/> +</g> +<!-- Vlookup->AvgMethod --> +<g id="edge57" class="edge"> +<title>Vlookup:AvgMethodId->AvgMethod</title> +<path fill="none" stroke="black" d="M2072,-3724C882.38,-3724 1769.05,-2234.12 1181,-1200 1170.7,-1181.9 1160.77,-1182.39 1151,-1164 1107.82,-1082.73 1082.45,-978.95 1069.39,-911.74"/> +<polygon fill="black" stroke="black" points="1072.79,-910.86 1067.48,-901.69 1065.91,-912.17 1072.79,-910.86"/> +</g> +<!-- Vlookup->GeneChip --> +<g id="edge59" class="edge"> +<title>Vlookup:GeneChipId->GeneChip</title> +<path fill="none" stroke="black" d="M2072,-3388C1777.21,-3388 2040.12,-3020.64 2031,-2726 2010.03,-2048.1 2014.87,-1878.03 2031,-1200 2032.96,-1117.52 2037.47,-1024.42 2041.35,-953.97"/> +<polygon fill="black" stroke="black" points="2044.86,-953.92 2041.92,-943.75 2037.87,-953.54 
2044.86,-953.92"/> +</g> +<!-- Vlookup->InfoFiles --> +<g id="edge61" class="edge"> +<title>Vlookup:InfoFileId->InfoFiles</title> +<path fill="none" stroke="black" d="M2256,-3261C2335.46,-3261 2299.68,-2868.62 2251.39,-2515.5"/> +<polygon fill="black" stroke="black" points="2254.86,-2515.02 2250.03,-2505.59 2247.92,-2515.97 2254.86,-2515.02"/> +</g> +<!-- Vlookup->Tissue --> +<g id="edge63" class="edge"> +<title>Vlookup:TissueId->Tissue</title> +<path fill="none" stroke="black" d="M2256,-2883C2406.09,-2883 2477.36,-2854.46 2555,-2726 2598.85,-2653.44 2589.96,-1277.23 2555,-1200 2545,-1177.91 2526.59,-1184.73 2514,-1164 2473.2,-1096.81 2453.44,-1009.63 2443.89,-943.08"/> +<polygon fill="black" stroke="black" points="2447.33,-942.43 2442.5,-933 2440.4,-943.39 2447.33,-942.43"/> +</g> +<!-- user_collection --> +<g id="node68" class="node"> +<title>user_collection</title> +<polygon fill="white" stroke="transparent" points="11361,-4811 11361,-4964 11543,-4964 11543,-4811 11361,-4811"/> +<polygon fill="#d7b5d8" stroke="transparent" points="11364,-4939.5 11364,-4960.5 11540,-4960.5 11540,-4939.5 11364,-4939.5"/> +<polygon fill="none" stroke="black" points="11364,-4939.5 11364,-4960.5 11540,-4960.5 11540,-4939.5 11364,-4939.5"/> +<text text-anchor="start" x="11367" y="-4946.3" font-family="Times,serif" font-size="14.00">user_collection (60 KiB)</text> +<text text-anchor="start" x="11380" y="-4924.3" font-family="Times,serif" font-size="14.00">changed_timestamp</text> +<text text-anchor="start" x="11383" y="-4903.3" font-family="Times,serif" font-size="14.00">created_timestamp</text> +<text text-anchor="start" x="11445" y="-4882.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="11418.5" y="-4861.3" font-family="Times,serif" font-size="14.00">members</text> +<text text-anchor="start" x="11432" y="-4840.3" font-family="Times,serif" font-size="14.00">name</text> +<text text-anchor="start" x="11436" y="-4819.3" font-family="Times,serif" 
font-size="14.00">user</text> +<polygon fill="none" stroke="black" points="11361,-4811 11361,-4964 11543,-4964 11543,-4811 11361,-4811"/> +</g> +<!-- pubmedsearch --> +<g id="node69" class="node"> +<title>pubmedsearch</title> +<polygon fill="white" stroke="transparent" points="11577.5,-4800.5 11577.5,-4974.5 11770.5,-4974.5 11770.5,-4800.5 11577.5,-4800.5"/> +<polygon fill="#df65b0" stroke="transparent" points="11581,-4950.5 11581,-4971.5 11768,-4971.5 11768,-4950.5 11581,-4950.5"/> +<polygon fill="none" stroke="black" points="11581,-4950.5 11581,-4971.5 11768,-4971.5 11768,-4950.5 11581,-4950.5"/> +<text text-anchor="start" x="11584" y="-4957.3" font-family="Times,serif" font-size="14.00">pubmedsearch (586 MiB)</text> +<text text-anchor="start" x="11619.5" y="-4935.3" font-family="Times,serif" font-size="14.00">authorfullname</text> +<text text-anchor="start" x="11612.5" y="-4914.3" font-family="Times,serif" font-size="14.00">authorshortname</text> +<text text-anchor="start" x="11650" y="-4893.3" font-family="Times,serif" font-size="14.00">geneid</text> +<text text-anchor="start" x="11667.5" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="11644" y="-4851.3" font-family="Times,serif" font-size="14.00">institute</text> +<text text-anchor="start" x="11638.5" y="-4830.3" font-family="Times,serif" font-size="14.00">pubmedid</text> +<text text-anchor="start" x="11659.5" y="-4809.3" font-family="Times,serif" font-size="14.00">title</text> +<polygon fill="none" stroke="black" points="11577.5,-4800.5 11577.5,-4974.5 11770.5,-4974.5 11770.5,-4800.5 11577.5,-4800.5"/> +</g> +<!-- EnsemblProbeLocation --> +<g id="node70" class="node"> +<title>EnsemblProbeLocation</title> +<polygon fill="white" stroke="transparent" points="6793,-4790 6793,-4985 7037,-4985 7037,-4790 6793,-4790"/> +<polygon fill="#df65b0" stroke="transparent" points="6796,-4960.5 6796,-4981.5 7034,-4981.5 7034,-4960.5 6796,-4960.5"/> +<polygon fill="none" 
stroke="black" points="6796,-4960.5 6796,-4981.5 7034,-4981.5 7034,-4960.5 6796,-4960.5"/> +<text text-anchor="start" x="6799" y="-4967.3" font-family="Times,serif" font-size="14.00">EnsemblProbeLocation (99 MiB)</text> +<text text-anchor="start" x="6901.5" y="-4945.3" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="6900.5" y="-4924.3" font-family="Times,serif" font-size="14.00">End</text> +<text text-anchor="start" x="6879" y="-4903.3" font-family="Times,serif" font-size="14.00">End_2016</text> +<text text-anchor="start" x="6867" y="-4882.3" font-family="Times,serif" font-size="14.00">MisMataches</text> +<text text-anchor="start" x="6886.5" y="-4861.3" font-family="Times,serif" font-size="14.00">ProbeId</text> +<text text-anchor="start" x="6896.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Start</text> +<text text-anchor="start" x="6875" y="-4819.3" font-family="Times,serif" font-size="14.00">Start_2016</text> +<text text-anchor="start" x="6890.5" y="-4798.3" font-family="Times,serif" font-size="14.00">Strand</text> +<polygon fill="none" stroke="black" points="6793,-4790 6793,-4985 7037,-4985 7037,-4790 6793,-4790"/> +</g> +<!-- EnsemblProbeLocation->Probe --> +<g id="edge64" class="edge"> +<title>EnsemblProbeLocation:ProbeId->Probe</title> +<path fill="none" stroke="black" d="M7035,-4864.5C7071.26,-4864.5 6964.83,-3784.86 6927.45,-3416.46"/> +<polygon fill="black" stroke="black" points="6930.91,-3415.9 6926.42,-3406.3 6923.95,-3416.61 6930.91,-3415.9"/> +</g> +<!-- Investigators->Organizations --> +<g id="edge65" class="edge"> +<title>Investigators:OrganizationId->Organizations</title> +<path fill="none" stroke="black" d="M256,-296.5C296.78,-296.5 271.73,-150.19 255,-113 250.33,-102.62 243.39,-93.09 235.5,-84.57"/> +<polygon fill="black" stroke="black" points="237.88,-82 228.35,-77.36 232.9,-86.93 237.88,-82"/> +</g> +<!-- ProbeSetSE --> +<g id="node72" class="node"> +<title>ProbeSetSE</title> +<polygon 
fill="white" stroke="transparent" points="6068,-1918 6068,-2008 6222,-2008 6222,-1918 6068,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="6071,-1984 6071,-2005 6219,-2005 6219,-1984 6071,-1984"/> +<polygon fill="none" stroke="black" points="6071,-1984 6071,-2005 6219,-2005 6219,-1984 6071,-1984"/> +<text text-anchor="start" x="6074" y="-1990.8" font-family="Times,serif" font-size="14.00">ProbeSetSE (7 GiB)</text> +<text text-anchor="start" x="6120.5" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6126.5" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="6115.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6068,-1918 6068,-2008 6222,-2008 6222,-1918 6068,-1918"/> +</g> +<!-- ProbeSetSE->Strain --> +<g id="edge66" class="edge"> +<title>ProbeSetSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6070,-1930C6049.72,-1930 6057.62,-1219.18 6051,-1200 6011.97,-1086.88 5923.03,-979.85 5858.94,-913.01"/> +<polygon fill="black" stroke="black" points="5861.11,-910.22 5851.65,-905.47 5856.08,-915.09 5861.11,-910.22"/> +</g> +<!-- TableComments --> +<g id="node74" class="node"> +<title>TableComments</title> +<polygon fill="white" stroke="transparent" points="11805,-4853 11805,-4922 11995,-4922 11995,-4853 11805,-4853"/> +<polygon fill="#d7b5d8" stroke="transparent" points="11808,-4897.5 11808,-4918.5 11992,-4918.5 11992,-4897.5 11808,-4897.5"/> +<polygon fill="none" stroke="black" points="11808,-4897.5 11808,-4918.5 11992,-4918.5 11992,-4897.5 11808,-4897.5"/> +<text text-anchor="start" x="11811" y="-4904.3" font-family="Times,serif" font-size="14.00">TableComments (34 KiB)</text> +<text text-anchor="start" x="11865" y="-4882.3" font-family="Times,serif" font-size="14.00">Comment</text> +<text text-anchor="start" x="11859.5" y="-4861.3" font-family="Times,serif" 
font-size="14.00">TableName</text> +<polygon fill="none" stroke="black" points="11805,-4853 11805,-4922 11995,-4922 11995,-4853 11805,-4853"/> +</g> +<!-- Dataset_mbat --> +<g id="node75" class="node"> +<title>Dataset_mbat</title> +<polygon fill="white" stroke="transparent" points="12029.5,-4800.5 12029.5,-4974.5 12198.5,-4974.5 12198.5,-4800.5 12029.5,-4800.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="12033,-4950.5 12033,-4971.5 12196,-4971.5 12196,-4950.5 12033,-4950.5"/> +<polygon fill="none" stroke="black" points="12033,-4950.5 12033,-4971.5 12196,-4971.5 12196,-4950.5 12033,-4950.5"/> +<text text-anchor="start" x="12036" y="-4957.3" font-family="Times,serif" font-size="14.00">Dataset_mbat (764 B)</text> +<text text-anchor="start" x="12095.5" y="-4935.3" font-family="Times,serif" font-size="14.00">cross</text> +<text text-anchor="start" x="12082" y="-4914.3" font-family="Times,serif" font-size="14.00">database</text> +<text text-anchor="start" x="12040" y="-4893.3" font-family="Times,serif" font-size="14.00">database_LongName</text> +<text text-anchor="start" x="12107.5" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="12088" y="-4851.3" font-family="Times,serif" font-size="14.00">species</text> +<text text-anchor="start" x="12091" y="-4830.3" font-family="Times,serif" font-size="14.00">switch</text> +<text text-anchor="start" x="12093" y="-4809.3" font-family="Times,serif" font-size="14.00">tissue</text> +<polygon fill="none" stroke="black" points="12029.5,-4800.5 12029.5,-4974.5 12198.5,-4974.5 12198.5,-4800.5 12029.5,-4800.5"/> +</g> +<!-- CaseAttributeXRefNew --> +<g id="node76" class="node"> +<title>CaseAttributeXRefNew</title> +<polygon fill="white" stroke="transparent" points="3817,-1907.5 3817,-2018.5 4053,-2018.5 4053,-1907.5 3817,-1907.5"/> +<polygon fill="#df65b0" stroke="transparent" points="3820,-1994 3820,-2015 4050,-2015 4050,-1994 3820,-1994"/> +<polygon fill="none" stroke="black" 
points="3820,-1994 3820,-2015 4050,-2015 4050,-1994 3820,-1994"/> +<text text-anchor="start" x="3823" y="-2000.8" font-family="Times,serif" font-size="14.00">CaseAttributeXRefNew (5 MiB)</text> +<text text-anchor="start" x="3877.5" y="-1978.8" font-family="Times,serif" font-size="14.00">CaseAttributeId</text> +<text text-anchor="start" x="3892" y="-1957.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="3905.5" y="-1936.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="3915" y="-1915.8" font-family="Times,serif" font-size="14.00">Value</text> +<polygon fill="none" stroke="black" points="3817,-1907.5 3817,-2018.5 4053,-2018.5 4053,-1907.5 3817,-1907.5"/> +</g> +<!-- CaseAttributeXRefNew->InbredSet --> +<g id="edge68" class="edge"> +<title>CaseAttributeXRefNew:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M3819,-1961C3795.41,-1961 3828.4,-1316.38 3845.65,-1006.1"/> +<polygon fill="black" stroke="black" points="3849.14,-1006.29 3846.2,-996.11 3842.15,-1005.9 3849.14,-1006.29"/> +</g> +<!-- CaseAttributeXRefNew->CaseAttribute --> +<g id="edge67" class="edge"> +<title>CaseAttributeXRefNew:CaseAttributeId->CaseAttribute</title> +<path fill="none" stroke="black" d="M3819,-1983C3775.49,-1983 3829.94,-1230.6 3799,-1200 3702.3,-1104.35 1459.95,-1245.42 1351,-1164 1269.39,-1103.01 1252.58,-975.97 1250.14,-901.3"/> +<polygon fill="black" stroke="black" points="1253.64,-901.03 1249.89,-891.12 1246.64,-901.2 1253.64,-901.03"/> +</g> +<!-- CaseAttributeXRefNew->Strain --> +<g id="edge69" class="edge"> +<title>CaseAttributeXRefNew:StrainId->Strain</title> +<path fill="none" stroke="black" d="M4051,-1940C4092.12,-1940 4042.15,-1230.26 4070,-1200 4119.95,-1145.72 4327.27,-1176.34 4400,-1164 4905.53,-1078.2 5502.61,-920.46 5710.32,-863.88"/> +<polygon fill="black" stroke="black" points="5711.48,-867.19 5720.21,-861.18 5709.64,-860.44 5711.48,-867.19"/> +</g> +<!-- GenoCode 
--> +<g id="node77" class="node"> +<title>GenoCode</title> +<polygon fill="white" stroke="transparent" points="3486.5,-1907.5 3486.5,-2018.5 3619.5,-2018.5 3619.5,-1907.5 3486.5,-1907.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="3490,-1994 3490,-2015 3617,-2015 3617,-1994 3490,-1994"/> +<polygon fill="none" stroke="black" points="3490,-1994 3490,-2015 3617,-2015 3617,-1994 3490,-1994"/> +<text text-anchor="start" x="3493" y="-2000.8" font-family="Times,serif" font-size="14.00">GenoCode (40 B)</text> +<text text-anchor="start" x="3506.5" y="-1978.8" font-family="Times,serif" font-size="14.00">AlleleSymbol</text> +<text text-anchor="start" x="3516" y="-1957.8" font-family="Times,serif" font-size="14.00">AlleleType</text> +<text text-anchor="start" x="3500.5" y="-1936.8" font-family="Times,serif" font-size="14.00">DatabaseValue</text> +<text text-anchor="start" x="3510.5" y="-1915.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<polygon fill="none" stroke="black" points="3486.5,-1907.5 3486.5,-2018.5 3619.5,-2018.5 3619.5,-1907.5 3486.5,-1907.5"/> +</g> +<!-- GenoCode->InbredSet --> +<g id="edge70" class="edge"> +<title>GenoCode:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M3618,-1919C3657.96,-1919 3611.64,-1231.67 3636,-1200 3670.72,-1154.85 3718.61,-1204.16 3759,-1164 3801.12,-1122.13 3824.91,-1062.6 3838.29,-1006.16"/> +<polygon fill="black" stroke="black" points="3841.71,-1006.93 3840.51,-996.4 3834.88,-1005.38 3841.71,-1006.93"/> +</g> +<!-- ProbeSE->Strain --> +<g id="edge71" class="edge"> +<title>ProbeSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6994,-1930C6953.43,-1930 6998.65,-1232.22 6974,-1200 6834.26,-1017.37 6100.93,-891 5861.61,-854.12"/> +<polygon fill="black" stroke="black" points="5862.02,-850.65 5851.61,-852.59 5860.96,-857.57 5862.02,-850.65"/> +</g> +<!-- Temp --> +<g id="node80" class="node"> +<title>Temp</title> +<polygon fill="white" stroke="transparent" 
points="4087.5,-1865.5 4087.5,-2060.5 4206.5,-2060.5 4206.5,-1865.5 4087.5,-1865.5"/> +<polygon fill="#df65b0" stroke="transparent" points="4091,-2036 4091,-2057 4204,-2057 4204,-2036 4091,-2036"/> +<polygon fill="none" stroke="black" points="4091,-2036 4091,-2057 4204,-2057 4204,-2036 4091,-2036"/> +<text text-anchor="start" x="4099" y="-2042.8" font-family="Times,serif" font-size="14.00">Temp (1 MiB)</text> +<text text-anchor="start" x="4108.5" y="-2020.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" x="4123" y="-1999.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="4093" y="-1978.8" font-family="Times,serif" font-size="14.00">dbdisplayname</text> +<text text-anchor="start" x="4107" y="-1957.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="4140" y="-1936.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4104.5" y="-1915.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4139.5" y="-1894.8" font-family="Times,serif" font-size="14.00">IP</text> +<text text-anchor="start" x="4126" y="-1873.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="4087.5,-1865.5 4087.5,-2060.5 4206.5,-2060.5 4206.5,-1865.5 4087.5,-1865.5"/> +</g> +<!-- Temp->InbredSet --> +<g id="edge72" class="edge"> +<title>Temp:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4090,-1919C4070.02,-1919 4075.62,-1219.17 4070,-1200 4043.91,-1110.94 3990,-1021.51 3942.68,-954.43"/> +<polygon fill="black" stroke="black" points="3945.3,-952.07 3936.65,-945.95 3939.59,-956.12 3945.3,-952.07"/> +</g> +<!-- GenoData --> +<g id="node81" class="node"> +<title>GenoData</title> +<polygon fill="white" stroke="transparent" points="6256.5,-1918 6256.5,-2008 6403.5,-2008 6403.5,-1918 6256.5,-1918"/> +<polygon fill="#ce1256" 
stroke="transparent" points="6260,-1984 6260,-2005 6401,-2005 6401,-1984 6260,-1984"/> +<polygon fill="none" stroke="black" points="6260,-1984 6260,-2005 6401,-2005 6401,-1984 6260,-1984"/> +<text text-anchor="start" x="6263" y="-1990.8" font-family="Times,serif" font-size="14.00">GenoData (10 GiB)</text> +<text text-anchor="start" x="6323" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6301" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="6311" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="6256.5,-1918 6256.5,-2008 6403.5,-2008 6403.5,-1918 6256.5,-1918"/> +</g> +<!-- GenoData->Strain --> +<g id="edge73" class="edge"> +<title>GenoData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6259,-1951C6217.26,-1951 6257.72,-1237.31 6239,-1200 6158.18,-1038.89 5967.05,-927.85 5860.69,-876.11"/> +<polygon fill="black" stroke="black" points="5862.14,-872.92 5851.61,-871.74 5859.11,-879.23 5862.14,-872.92"/> +</g> +<!-- GenoFreeze->InbredSet --> +<g id="edge74" class="edge"> +<title>GenoFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4409,-1930C4368.43,-1930 4415.79,-1231.31 4390,-1200 4343.1,-1143.07 4293.94,-1197.05 4228,-1164 4118.16,-1108.94 4014.02,-1014.44 3943.83,-942.19"/> +<polygon fill="black" stroke="black" points="3946.19,-939.59 3936.73,-934.83 3941.15,-944.45 3946.19,-939.59"/> +</g> +<!-- ProbeSetData --> +<g id="node83" class="node"> +<title>ProbeSetData</title> +<polygon fill="white" stroke="transparent" points="6438,-1918 6438,-2008 6614,-2008 6614,-1918 6438,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="6441,-1984 6441,-2005 6611,-2005 6611,-1984 6441,-1984"/> +<polygon fill="none" stroke="black" points="6441,-1984 6441,-2005 6611,-2005 6611,-1984 6441,-1984"/> +<text text-anchor="start" x="6444" y="-1990.8" 
font-family="Times,serif" font-size="14.00">ProbeSetData (62 GiB)</text> +<text text-anchor="start" x="6518.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6496.5" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="6506.5" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="6438,-1918 6438,-2008 6614,-2008 6614,-1918 6438,-1918"/> +</g> +<!-- ProbeSetData->Strain --> +<g id="edge75" class="edge"> +<title>ProbeSetData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6440,-1951C6398.26,-1951 6441.54,-1235.75 6420,-1200 6294.74,-992.11 6000.36,-895.18 5861.29,-859.75"/> +<polygon fill="black" stroke="black" points="5862.1,-856.35 5851.55,-857.31 5860.4,-863.14 5862.1,-856.35"/> +</g> +<!-- CeleraINFO_mm6 --> +<g id="node84" class="node"> +<title>CeleraINFO_mm6</title> +<polygon fill="white" stroke="transparent" points="12232,-4706 12232,-5069 12448,-5069 12448,-4706 12232,-4706"/> +<polygon fill="#df65b0" stroke="transparent" points="12235,-5044.5 12235,-5065.5 12445,-5065.5 12445,-5044.5 12235,-5044.5"/> +<polygon fill="none" stroke="black" points="12235,-5044.5 12235,-5065.5 12445,-5065.5 12445,-5044.5 12235,-5044.5"/> +<text text-anchor="start" x="12238" y="-5051.3" font-family="Times,serif" font-size="14.00">CeleraINFO_mm6 (780 MiB)</text> +<text text-anchor="start" x="12309.5" y="-5029.3" font-family="Times,serif" font-size="14.00">allele_AJ</text> +<text text-anchor="start" x="12307.5" y="-5008.3" font-family="Times,serif" font-size="14.00">allele_B6</text> +<text text-anchor="start" x="12307" y="-4987.3" font-family="Times,serif" font-size="14.00">allele_D2</text> +<text text-anchor="start" x="12308" y="-4966.3" font-family="Times,serif" font-size="14.00">allele_S1</text> +<text text-anchor="start" x="12308" y="-4945.3" font-family="Times,serif" font-size="14.00">allele_X1</text> +<text 
text-anchor="start" x="12319" y="-4924.3" font-family="Times,serif" font-size="14.00">B6_AJ</text> +<text text-anchor="start" x="12316.5" y="-4903.3" font-family="Times,serif" font-size="14.00">B6_D2</text> +<text text-anchor="start" x="12294.5" y="-4882.3" font-family="Times,serif" font-size="14.00">chromosome</text> +<text text-anchor="start" x="12318.5" y="-4861.3" font-family="Times,serif" font-size="14.00">D2_AJ</text> +<text text-anchor="start" x="12306.5" y="-4840.3" font-family="Times,serif" font-size="14.00">flanking3</text> +<text text-anchor="start" x="12306.5" y="-4819.3" font-family="Times,serif" font-size="14.00">flanking5</text> +<text text-anchor="start" x="12332.5" y="-4798.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="12302" y="-4777.3" font-family="Times,serif" font-size="14.00">MB_celera</text> +<text text-anchor="start" x="12302.5" y="-4756.3" font-family="Times,serif" font-size="14.00">MB_UCSC</text> +<text text-anchor="start" x="12283.5" y="-4735.3" font-family="Times,serif" font-size="14.00">MB_UCSC_OLD</text> +<text text-anchor="start" x="12315.5" y="-4714.3" font-family="Times,serif" font-size="14.00">SNPID</text> +<polygon fill="none" stroke="black" points="12232,-4706 12232,-5069 12448,-5069 12448,-4706 12232,-4706"/> +</g> +<!-- TableFieldAnnotation --> +<g id="node85" class="node"> +<title>TableFieldAnnotation</title> +<polygon fill="white" stroke="transparent" points="12482,-4842.5 12482,-4932.5 12710,-4932.5 12710,-4842.5 12482,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="12485,-4908.5 12485,-4929.5 12707,-4929.5 12707,-4908.5 12485,-4908.5"/> +<polygon fill="none" stroke="black" points="12485,-4908.5 12485,-4929.5 12707,-4929.5 12707,-4908.5 12485,-4908.5"/> +<text text-anchor="start" x="12488" y="-4915.3" font-family="Times,serif" font-size="14.00">TableFieldAnnotation (43 KiB)</text> +<text text-anchor="start" x="12556.5" y="-4893.3" font-family="Times,serif" 
font-size="14.00">Annotation</text> +<text text-anchor="start" x="12552" y="-4872.3" font-family="Times,serif" font-size="14.00">Foreign_Key</text> +<text text-anchor="start" x="12558.5" y="-4851.3" font-family="Times,serif" font-size="14.00">TableField</text> +<polygon fill="none" stroke="black" points="12482,-4842.5 12482,-4932.5 12710,-4932.5 12710,-4842.5 12482,-4842.5"/> +</g> +<!-- ProbeSet --> +<g id="node86" class="node"> +<title>ProbeSet</title> +<polygon fill="white" stroke="transparent" points="752.5,-1204 752.5,-2722 983.5,-2722 983.5,-1204 752.5,-1204"/> +<polygon fill="#ce1256" stroke="transparent" points="756,-2698 756,-2719 981,-2719 981,-2698 756,-2698"/> +<polygon fill="none" stroke="black" points="756,-2698 756,-2719 981,-2719 981,-2698 756,-2698"/> +<text text-anchor="start" x="808" y="-2704.8" font-family="Times,serif" font-size="14.00">ProbeSet (2 GiB)</text> +<text text-anchor="start" x="851.5" y="-2682.8" font-family="Times,serif" font-size="14.00">alias</text> +<text text-anchor="start" x="842.5" y="-2661.8" font-family="Times,serif" font-size="14.00">alias_H</text> +<text text-anchor="start" x="821.5" y="-2640.8" font-family="Times,serif" font-size="14.00">Biotype_ENS</text> +<text text-anchor="start" x="840" y="-2619.8" font-family="Times,serif" font-size="14.00">BlatSeq</text> +<text text-anchor="start" x="822" y="-2598.8" font-family="Times,serif" font-size="14.00">CAS_number</text> +<text text-anchor="start" x="833.5" y="-2577.8" font-family="Times,serif" font-size="14.00">ChEBI_ID</text> +<text text-anchor="start" x="824.5" y="-2556.8" font-family="Times,serif" font-size="14.00">ChEMBL_ID</text> +<text text-anchor="start" x="812.5" y="-2535.8" font-family="Times,serif" font-size="14.00">ChemSpider_ID</text> +<text text-anchor="start" x="844.5" y="-2514.8" font-family="Times,serif" font-size="14.00">ChipId</text> +<text text-anchor="start" x="855" y="-2493.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text 
text-anchor="start" x="833.5" y="-2472.8" font-family="Times,serif" font-size="14.00">Chr_2016</text> +<text text-anchor="start" x="833.5" y="-2451.8" font-family="Times,serif" font-size="14.00">Chr_mm8</text> +<text text-anchor="start" x="837.5" y="-2430.8" font-family="Times,serif" font-size="14.00">chr_num</text> +<text text-anchor="start" x="813.5" y="-2409.8" font-family="Times,serif" font-size="14.00">chromosome_H</text> +<text text-anchor="start" x="831.5" y="-2388.8" font-family="Times,serif" font-size="14.00">comments</text> +<text text-anchor="start" x="829" y="-2367.8" font-family="Times,serif" font-size="14.00">Confidence</text> +<text text-anchor="start" x="828" y="-2346.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="818.5" y="-2325.8" font-family="Times,serif" font-size="14.00">description_H</text> +<text text-anchor="start" x="826.5" y="-2304.8" font-family="Times,serif" font-size="14.00">EC_number</text> +<text text-anchor="start" x="804.5" y="-2283.8" font-family="Times,serif" font-size="14.00">ENSEMBLGeneId</text> +<text text-anchor="start" x="855" y="-2262.8" font-family="Times,serif" font-size="14.00">flag</text> +<text text-anchor="start" x="830" y="-2241.8" font-family="Times,serif" font-size="14.00">Flybase_Id</text> +<text text-anchor="start" x="829.5" y="-2220.8" font-family="Times,serif" font-size="14.00">GenbankId</text> +<text text-anchor="start" x="842.5" y="-2199.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="833.5" y="-2178.8" font-family="Times,serif" font-size="14.00">GeneId_H</text> +<text text-anchor="start" x="833.5" y="-2157.8" font-family="Times,serif" font-size="14.00">HGNC_ID</text> +<text text-anchor="start" x="832.5" y="-2136.8" font-family="Times,serif" font-size="14.00">HMDB_ID</text> +<text text-anchor="start" x="814" y="-2115.8" font-family="Times,serif" font-size="14.00">HomoloGeneID</text> +<text text-anchor="start" x="861" 
y="-2094.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="835" y="-2073.8" font-family="Times,serif" font-size="14.00">KEGG_ID</text> +<text text-anchor="start" x="856.5" y="-2052.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="835" y="-2031.8" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="846.5" y="-2010.8" font-family="Times,serif" font-size="14.00">MB_H</text> +<text text-anchor="start" x="835" y="-1989.8" font-family="Times,serif" font-size="14.00">Mb_mm8</text> +<text text-anchor="start" x="804" y="-1968.8" font-family="Times,serif" font-size="14.00">Molecular_Weight</text> +<text text-anchor="start" x="847" y="-1947.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="829.5" y="-1926.8" font-family="Times,serif" font-size="14.00">name_num</text> +<text text-anchor="start" x="823" y="-1905.8" font-family="Times,serif" font-size="14.00">Nugowiki_ID</text> +<text text-anchor="start" x="845.5" y="-1884.8" font-family="Times,serif" font-size="14.00">OMIM</text> +<text text-anchor="start" x="806.5" y="-1863.8" font-family="Times,serif" font-size="14.00">PeptideSequence</text> +<text text-anchor="start" x="818.5" y="-1842.8" font-family="Times,serif" font-size="14.00">PrimaryName</text> +<text text-anchor="start" x="783.5" y="-1821.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end</text> +<text text-anchor="start" x="762" y="-1800.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end_2016</text> +<text text-anchor="start" x="762" y="-1779.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end_mm8</text> +<text text-anchor="start" x="779.5" y="-1758.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start</text> +<text text-anchor="start" x="758" y="-1737.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start_2016</text> +<text 
text-anchor="start" x="758" y="-1716.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start_mm8</text> +<text text-anchor="start" x="788.5" y="-1695.8" font-family="Times,serif" font-size="14.00">Probe_set_BLAT_score</text> +<text text-anchor="start" x="784.5" y="-1674.8" font-family="Times,serif" font-size="14.00">Probe_set_Note_by_RW</text> +<text text-anchor="start" x="793.5" y="-1653.8" font-family="Times,serif" font-size="14.00">Probe_set_specificity</text> +<text text-anchor="start" x="806.5" y="-1632.8" font-family="Times,serif" font-size="14.00">Probe_set_strand</text> +<text text-anchor="start" x="781" y="-1611.8" font-family="Times,serif" font-size="14.00">Probe_set_target_region</text> +<text text-anchor="start" x="776" y="-1590.8" font-family="Times,serif" font-size="14.00">Probe_Target_Description</text> +<text text-anchor="start" x="833.5" y="-1569.8" font-family="Times,serif" font-size="14.00">ProteinID</text> +<text text-anchor="start" x="821" y="-1548.8" font-family="Times,serif" font-size="14.00">ProteinName</text> +<text text-anchor="start" x="822" y="-1527.8" font-family="Times,serif" font-size="14.00">PubChem_ID</text> +<text text-anchor="start" x="795" y="-1506.8" font-family="Times,serif" font-size="14.00">RefSeq_TranscriptId</text> +<text text-anchor="start" x="840" y="-1485.8" font-family="Times,serif" font-size="14.00">RGD_ID</text> +<text text-anchor="start" x="806" y="-1464.8" font-family="Times,serif" font-size="14.00">SecondaryNames</text> +<text text-anchor="start" x="852.5" y="-1443.8" font-family="Times,serif" font-size="14.00">SNP</text> +<text text-anchor="start" x="822" y="-1422.8" font-family="Times,serif" font-size="14.00">Strand_Gene</text> +<text text-anchor="start" x="819.5" y="-1401.8" font-family="Times,serif" font-size="14.00">Strand_Probe</text> +<text text-anchor="start" x="841.5" y="-1380.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="832" y="-1359.8" 
font-family="Times,serif" font-size="14.00">Symbol_H</text> +<text text-anchor="start" x="838" y="-1338.8" font-family="Times,serif" font-size="14.00">TargetId</text> +<text text-anchor="start" x="831.5" y="-1317.8" font-family="Times,serif" font-size="14.00">TargetSeq</text> +<text text-anchor="start" x="845.5" y="-1296.8" font-family="Times,serif" font-size="14.00">Tissue</text> +<text text-anchor="start" x="851" y="-1275.8" font-family="Times,serif" font-size="14.00">Type</text> +<text text-anchor="start" x="830" y="-1254.8" font-family="Times,serif" font-size="14.00">UniGeneId</text> +<text text-anchor="start" x="839.5" y="-1233.8" font-family="Times,serif" font-size="14.00">UNII_ID</text> +<text text-anchor="start" x="832" y="-1212.8" font-family="Times,serif" font-size="14.00">UniProtID</text> +<polygon fill="none" stroke="black" points="752.5,-1204 752.5,-2722 983.5,-2722 983.5,-1204 752.5,-1204"/> +</g> +<!-- ProbeSet->Genbank --> +<g id="edge76" class="edge"> +<title>ProbeSet:GenbankId->Genbank</title> +<path fill="none" stroke="black" d="M755,-2225C726.53,-2225 752.7,-1228.28 756,-1200 768.49,-1092.85 801.24,-971.17 821.96,-901.12"/> +<polygon fill="black" stroke="black" points="825.42,-901.75 824.93,-891.16 818.72,-899.75 825.42,-901.75"/> +</g> +<!-- GenoFile --> +<g id="node87" class="node"> +<title>GenoFile</title> +<polygon fill="white" stroke="transparent" points="4240.5,-1886.5 4240.5,-2039.5 4373.5,-2039.5 4373.5,-1886.5 4240.5,-1886.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="4244,-2015 4244,-2036 4371,-2036 4371,-2015 4244,-2015"/> +<polygon fill="none" stroke="black" points="4244,-2015 4244,-2036 4371,-2036 4371,-2015 4244,-2015"/> +<text text-anchor="start" x="4247" y="-2021.8" font-family="Times,serif" font-size="14.00">GenoFile (332 B)</text> +<text text-anchor="start" x="4300.5" y="-1999.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="4263.5" y="-1978.8" font-family="Times,serif" 
font-size="14.00">InbredSetID</text> +<text text-anchor="start" x="4279" y="-1957.8" font-family="Times,serif" font-size="14.00">location</text> +<text text-anchor="start" x="4284.5" y="-1936.8" font-family="Times,serif" font-size="14.00">server</text> +<text text-anchor="start" x="4293" y="-1915.8" font-family="Times,serif" font-size="14.00">sort</text> +<text text-anchor="start" x="4292.5" y="-1894.8" font-family="Times,serif" font-size="14.00">title</text> +<polygon fill="none" stroke="black" points="4240.5,-1886.5 4240.5,-2039.5 4373.5,-2039.5 4373.5,-1886.5 4240.5,-1886.5"/> +</g> +<!-- GenoFile->InbredSet --> +<g id="edge77" class="edge"> +<title>GenoFile:InbredSetID->InbredSet</title> +<path fill="none" stroke="black" d="M4243,-1983C4221.24,-1983 4231.73,-1219.93 4223,-1200 4165.37,-1068.5 4034.27,-960.98 3945.16,-899.43"/> +<polygon fill="black" stroke="black" points="3946.9,-896.38 3936.67,-893.62 3942.95,-902.16 3946.9,-896.38"/> +</g> +<!-- TempData --> +<g id="node88" class="node"> +<title>TempData</title> +<polygon fill="white" stroke="transparent" points="6636,-3228 6636,-3360 6788,-3360 6788,-3228 6636,-3228"/> +<polygon fill="#df65b0" stroke="transparent" points="6639,-3336 6639,-3357 6785,-3357 6785,-3336 6639,-3336"/> +<polygon fill="none" stroke="black" points="6639,-3336 6639,-3357 6785,-3357 6785,-3336 6639,-3336"/> +<text text-anchor="start" x="6642" y="-3342.8" font-family="Times,serif" font-size="14.00">TempData (11 MiB)</text> +<text text-anchor="start" x="6704.5" y="-3320.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6683.5" y="-3299.8" font-family="Times,serif" font-size="14.00">NStrain</text> +<text text-anchor="start" x="6701.5" y="-3278.8" font-family="Times,serif" font-size="14.00">SE</text> +<text text-anchor="start" x="6682.5" y="-3257.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="6692.5" y="-3236.8" font-family="Times,serif" 
font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="6636,-3228 6636,-3360 6788,-3360 6788,-3228 6636,-3228"/> +</g> +<!-- TempData->NStrain --> +<g id="edge78" class="edge"> +<title>TempData:NStrain->NStrain</title> +<path fill="none" stroke="black" d="M6786,-3304C6851.17,-3304 6745.87,-2280.14 6718.32,-2022.36"/> +<polygon fill="black" stroke="black" points="6721.77,-2021.66 6717.22,-2012.09 6714.81,-2022.4 6721.77,-2021.66"/> +</g> +<!-- TempData->Strain --> +<g id="edge79" class="edge"> +<title>TempData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6786,-3261C6799.61,-3261 6829.44,-1253.01 6792,-1200 6572.1,-888.62 6056.1,-847.14 5861.8,-842.87"/> +<polygon fill="black" stroke="black" points="5861.75,-839.37 5851.68,-842.67 5861.61,-846.36 5861.75,-839.37"/> +</g> +<!-- CaseAttributeXRef --> +<g id="node89" class="node"> +<title>CaseAttributeXRef</title> +<polygon fill="white" stroke="transparent" points="2630,-4832 2630,-4943 2848,-4943 2848,-4832 2630,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2633,-4918.5 2633,-4939.5 2845,-4939.5 2845,-4918.5 2633,-4918.5"/> +<polygon fill="none" stroke="black" points="2633,-4918.5 2633,-4939.5 2845,-4939.5 2845,-4918.5 2633,-4918.5"/> +<text text-anchor="start" x="2636" y="-4925.3" font-family="Times,serif" font-size="14.00">CaseAttributeXRef (753 KiB)</text> +<text text-anchor="start" x="2681.5" y="-4903.3" font-family="Times,serif" font-size="14.00">CaseAttributeId</text> +<text text-anchor="start" x="2674" y="-4882.3" font-family="Times,serif" font-size="14.00">ProbeSetFreezeId</text> +<text text-anchor="start" x="2709.5" y="-4861.3" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="2719" y="-4840.3" font-family="Times,serif" font-size="14.00">Value</text> +<polygon fill="none" stroke="black" points="2630,-4832 2630,-4943 2848,-4943 2848,-4832 2630,-4832"/> +</g> +<!-- CaseAttributeXRef->CaseAttribute --> +<g id="edge80" 
class="edge"> +<title>CaseAttributeXRef:CaseAttributeId->CaseAttribute</title> +<path fill="none" stroke="black" d="M2632,-4907.5C859.27,-4907.5 1188.58,-1398.42 1244.12,-901.29"/> +<polygon fill="black" stroke="black" points="1247.63,-901.45 1245.27,-891.12 1240.67,-900.66 1247.63,-901.45"/> +</g> +<!-- CaseAttributeXRef->Strain --> +<g id="edge82" class="edge"> +<title>CaseAttributeXRef:StrainId->Strain</title> +<path fill="none" stroke="black" d="M2846,-4864.5C3071.96,-4864.5 2844.72,-4009.37 3016,-3862 3099.31,-3790.32 4915.51,-3902.94 4994,-3826 5098.23,-3723.83 4995.8,-1323.24 5074,-1200 5218.94,-971.59 5558.15,-883.8 5710.07,-855.09"/> +<polygon fill="black" stroke="black" points="5711.05,-858.47 5720.24,-853.2 5709.77,-851.59 5711.05,-858.47"/> +</g> +<!-- CaseAttributeXRef->ProbeSetFreeze --> +<g id="edge81" class="edge"> +<title>CaseAttributeXRef:ProbeSetFreezeId->ProbeSetFreeze</title> +<path fill="none" stroke="black" d="M2846,-4885.5C3129.96,-4885.5 2889.92,-3863.52 2783.5,-3457.98"/> +<polygon fill="black" stroke="black" points="2786.86,-3457.01 2780.93,-3448.23 2780.09,-3458.79 2786.86,-3457.01"/> +</g> +<!-- ProbeSetFreeze->ProbeFreeze --> +<g id="edge83" class="edge"> +<title>ProbeSetFreeze:ProbeFreezeId->ProbeFreeze</title> +<path fill="none" stroke="black" d="M2642,-3198C2531.36,-3198 2632.91,-2395.98 2676.43,-2085.09"/> +<polygon fill="black" stroke="black" points="2679.9,-2085.53 2677.83,-2075.14 2672.97,-2084.56 2679.9,-2085.53"/> +</g> +<!-- temporary --> +<g id="node91" class="node"> +<title>temporary</title> +<polygon fill="white" stroke="transparent" points="12744.5,-4811 12744.5,-4964 12889.5,-4964 12889.5,-4811 12744.5,-4811"/> +<polygon fill="#df65b0" stroke="transparent" points="12748,-4939.5 12748,-4960.5 12887,-4960.5 12887,-4939.5 12748,-4939.5"/> +<polygon fill="none" stroke="black" points="12748,-4939.5 12748,-4960.5 12887,-4960.5 12887,-4939.5 12748,-4939.5"/> +<text text-anchor="start" x="12751" y="-4946.3" 
font-family="Times,serif" font-size="14.00">temporary (4 MiB)</text> +<text text-anchor="start" x="12790.5" y="-4924.3" font-family="Times,serif" font-size="14.00">GeneID</text> +<text text-anchor="start" x="12771.5" y="-4903.3" font-family="Times,serif" font-size="14.00">HomoloGene</text> +<text text-anchor="start" x="12794.5" y="-4882.3" font-family="Times,serif" font-size="14.00">OMIM</text> +<text text-anchor="start" x="12766.5" y="-4861.3" font-family="Times,serif" font-size="14.00">Other_GeneID</text> +<text text-anchor="start" x="12790.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="12796" y="-4819.3" font-family="Times,serif" font-size="14.00">tax_id</text> +<polygon fill="none" stroke="black" points="12744.5,-4811 12744.5,-4964 12889.5,-4964 12889.5,-4811 12744.5,-4811"/> +</g> +<!-- Chr_Length --> +<g id="node92" class="node"> +<title>Chr_Length</title> +<polygon fill="white" stroke="transparent" points="1368,-765.5 1368,-918.5 1518,-918.5 1518,-765.5 1368,-765.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="1371,-894 1371,-915 1515,-915 1515,-894 1371,-894"/> +<polygon fill="none" stroke="black" points="1371,-894 1371,-915 1515,-915 1515,-894 1371,-894"/> +<text text-anchor="start" x="1374" y="-900.8" font-family="Times,serif" font-size="14.00">Chr_Length (2 KiB)</text> +<text text-anchor="start" x="1417.5" y="-878.8" font-family="Times,serif" font-size="14.00">Length</text> +<text text-anchor="start" x="1396" y="-857.8" font-family="Times,serif" font-size="14.00">Length_2016</text> +<text text-anchor="start" x="1396" y="-836.8" font-family="Times,serif" font-size="14.00">Length_mm8</text> +<text text-anchor="start" x="1421.5" y="-815.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="1414.5" y="-794.8" font-family="Times,serif" font-size="14.00">OrderId</text> +<text text-anchor="start" x="1408" y="-773.8" font-family="Times,serif" 
font-size="14.00">SpeciesId</text> +<polygon fill="none" stroke="black" points="1368,-765.5 1368,-918.5 1518,-918.5 1518,-765.5 1368,-765.5"/> +</g> +<!-- Chr_Length->Species --> +<g id="edge84" class="edge"> +<title>Chr_Length:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M1516,-777C1544.63,-777 1515.78,-541.23 1535,-520 1694.07,-344.29 2463.44,-308.31 2715.71,-301.19"/> +<polygon fill="black" stroke="black" points="2716,-304.69 2725.9,-300.91 2715.81,-297.69 2716,-304.69"/> +</g> +<!-- GenoSE --> +<g id="node93" class="node"> +<title>GenoSE</title> +<polygon fill="white" stroke="transparent" points="6848.5,-1918 6848.5,-2008 6957.5,-2008 6957.5,-1918 6848.5,-1918"/> +<polygon fill="#f1eef6" stroke="transparent" points="6852,-1984 6852,-2005 6955,-2005 6955,-1984 6852,-1984"/> +<polygon fill="none" stroke="black" points="6852,-1984 6852,-2005 6955,-2005 6955,-1984 6852,-1984"/> +<text text-anchor="start" x="6855" y="-1990.8" font-family="Times,serif" font-size="14.00">GenoSE (0 B)</text> +<text text-anchor="start" x="6879" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6885" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="6874" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6848.5,-1918 6848.5,-2008 6957.5,-2008 6957.5,-1918 6848.5,-1918"/> +</g> +<!-- GenoSE->Strain --> +<g id="edge85" class="edge"> +<title>GenoSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6851,-1930C6810.42,-1930 6850.14,-1232.62 6826,-1200 6591.69,-883.44 6059.6,-845.25 5861.86,-842.35"/> +<polygon fill="black" stroke="black" points="5861.61,-838.85 5851.57,-842.23 5861.53,-845.85 5861.61,-838.85"/> +</g> +<!-- ProbeH2 --> +<g id="node94" class="node"> +<title>ProbeH2</title> +<polygon fill="white" stroke="transparent" points="5788.5,-4832 5788.5,-4943 5921.5,-4943 5921.5,-4832 
5788.5,-4832"/> +<polygon fill="#df65b0" stroke="transparent" points="5792,-4918.5 5792,-4939.5 5919,-4939.5 5919,-4918.5 5792,-4918.5"/> +<polygon fill="none" stroke="black" points="5792,-4918.5 5792,-4939.5 5919,-4939.5 5919,-4918.5 5792,-4918.5"/> +<text text-anchor="start" x="5795" y="-4925.3" font-family="Times,serif" font-size="14.00">ProbeH2 (9 MiB)</text> +<text text-anchor="start" x="5846" y="-4903.3" font-family="Times,serif" font-size="14.00">h2</text> +<text text-anchor="start" x="5802.5" y="-4882.3" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="5827" y="-4861.3" font-family="Times,serif" font-size="14.00">ProbeId</text> +<text text-anchor="start" x="5831" y="-4840.3" font-family="Times,serif" font-size="14.00">weight</text> +<polygon fill="none" stroke="black" points="5788.5,-4832 5788.5,-4943 5921.5,-4943 5921.5,-4832 5788.5,-4832"/> +</g> +<!-- ProbeH2->Probe --> +<g id="edge87" class="edge"> +<title>ProbeH2:ProbeId->Probe</title> +<path fill="none" stroke="black" d="M5920,-4864.5C6401.38,-4864.5 5940.09,-4144.3 6330,-3862 6421.67,-3795.63 6755.1,-3903.04 6838,-3826 6948.34,-3723.46 6950.01,-3538.6 6936.27,-3416.32"/> +<polygon fill="black" stroke="black" points="6939.72,-3415.69 6935.07,-3406.16 6932.76,-3416.5 6939.72,-3415.69"/> +</g> +<!-- ProbeH2->ProbeFreeze --> +<g id="edge86" class="edge"> +<title>ProbeH2:ProbeFreezeId->ProbeFreeze</title> +<path fill="none" stroke="black" d="M5791,-4885.5C5212.27,-4885.5 5503.91,-4120.25 4986,-3862 4899.92,-3819.08 3329.71,-3886.58 3255,-3826 2877.83,-3520.19 3360.75,-3094.62 3007,-2762 2937.05,-2696.23 2860.62,-2795.13 2794,-2726 2629.09,-2554.88 2645.25,-2253.02 2670.34,-2085.17"/> +<polygon fill="black" stroke="black" points="2673.84,-2085.47 2671.89,-2075.05 2666.92,-2084.41 2673.84,-2085.47"/> +</g> +<!-- MappingMethod --> +<g id="node96" class="node"> +<title>MappingMethod</title> +<polygon fill="white" stroke="transparent" points="12923.5,-4853 
12923.5,-4922 13110.5,-4922 13110.5,-4853 12923.5,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="12927,-4897.5 12927,-4918.5 13108,-4918.5 13108,-4897.5 12927,-4897.5"/> +<polygon fill="none" stroke="black" points="12927,-4897.5 12927,-4918.5 13108,-4918.5 13108,-4897.5 12927,-4897.5"/> +<text text-anchor="start" x="12930" y="-4904.3" font-family="Times,serif" font-size="14.00">MappingMethod (100 B)</text> +<text text-anchor="start" x="13010" y="-4882.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="12996" y="-4861.3" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="12923.5,-4853 12923.5,-4922 13110.5,-4922 13110.5,-4853 12923.5,-4853"/> +</g> +<!-- SnpAll --> +<g id="node97" class="node"> +<title>SnpAll</title> +<polygon fill="white" stroke="transparent" points="1552,-524 1552,-1160 1746,-1160 1746,-524 1552,-524"/> +<polygon fill="#ce1256" stroke="transparent" points="1555,-1136 1555,-1157 1743,-1157 1743,-1136 1555,-1136"/> +<polygon fill="none" stroke="black" points="1555,-1136 1555,-1157 1743,-1157 1743,-1136 1555,-1136"/> +<text text-anchor="start" x="1593.5" y="-1142.8" font-family="Times,serif" font-size="14.00">SnpAll (11 GiB)</text> +<text text-anchor="start" x="1603.5" y="-1120.8" font-family="Times,serif" font-size="14.00">3Prime_UTR</text> +<text text-anchor="start" x="1603.5" y="-1099.8" font-family="Times,serif" font-size="14.00">5Prime_UTR</text> +<text text-anchor="start" x="1625" y="-1078.8" font-family="Times,serif" font-size="14.00">Alleles</text> +<text text-anchor="start" x="1602" y="-1057.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="1581" y="-1036.8" font-family="Times,serif" font-size="14.00">ConservationScore</text> +<text text-anchor="start" x="1621.5" y="-1015.8" font-family="Times,serif" font-size="14.00">Domain</text> +<text text-anchor="start" x="1603.5" y="-994.8" 
font-family="Times,serif" font-size="14.00">Downstream</text> +<text text-anchor="start" x="1630.5" y="-973.8" font-family="Times,serif" font-size="14.00">Exon</text> +<text text-anchor="start" x="1630.5" y="-952.8" font-family="Times,serif" font-size="14.00">Gene</text> +<text text-anchor="start" x="1641.5" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="1612" y="-910.8" font-family="Times,serif" font-size="14.00">Intergenic</text> +<text text-anchor="start" x="1626.5" y="-889.8" font-family="Times,serif" font-size="14.00">Intron</text> +<text text-anchor="start" x="1591.5" y="-868.8" font-family="Times,serif" font-size="14.00">Non_Splice_Site</text> +<text text-anchor="start" x="1557" y="-847.8" font-family="Times,serif" font-size="14.00">Non_Synonymous_Coding</text> +<text text-anchor="start" x="1620" y="-826.8" font-family="Times,serif" font-size="14.00">Position</text> +<text text-anchor="start" x="1599" y="-805.8" font-family="Times,serif" font-size="14.00">Position_2016</text> +<text text-anchor="start" x="1639.5" y="-784.8" font-family="Times,serif" font-size="14.00">Rs</text> +<text text-anchor="start" x="1614" y="-763.8" font-family="Times,serif" font-size="14.00">SnpName</text> +<text text-anchor="start" x="1624" y="-742.8" font-family="Times,serif" font-size="14.00">Source</text> +<text text-anchor="start" x="1614" y="-721.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="1609.5" y="-700.8" font-family="Times,serif" font-size="14.00">Splice_Site</text> +<text text-anchor="start" x="1602" y="-679.8" font-family="Times,serif" font-size="14.00">Start_Gained</text> +<text text-anchor="start" x="1611.5" y="-658.8" font-family="Times,serif" font-size="14.00">Start_Lost</text> +<text text-anchor="start" x="1603.5" y="-637.8" font-family="Times,serif" font-size="14.00">Stop_Gained</text> +<text text-anchor="start" x="1613.5" y="-616.8" font-family="Times,serif" 
font-size="14.00">Stop_Lost</text> +<text text-anchor="start" x="1575" y="-595.8" font-family="Times,serif" font-size="14.00">Synonymous_Coding</text> +<text text-anchor="start" x="1611.5" y="-574.8" font-family="Times,serif" font-size="14.00">Transcript</text> +<text text-anchor="start" x="1558.5" y="-553.8" font-family="Times,serif" font-size="14.00">Unknown_Effect_In_Exon</text> +<text text-anchor="start" x="1613" y="-532.8" font-family="Times,serif" font-size="14.00">Upstream</text> +<polygon fill="none" stroke="black" points="1552,-524 1552,-1160 1746,-1160 1746,-524 1552,-524"/> +</g> +<!-- SnpAll->Species --> +<g id="edge88" class="edge"> +<title>SnpAll:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M1744,-725C1789.75,-725 1732.61,-554.2 1763,-520 1889.95,-377.13 2495.01,-320.73 2715.44,-304.71"/> +<polygon fill="black" stroke="black" points="2715.91,-308.18 2725.64,-303.98 2715.41,-301.2 2715.91,-308.18"/> +</g> +<!-- GeneInfo --> +<g id="node98" class="node"> +<title>GeneInfo</title> +<polygon fill="white" stroke="transparent" points="2150,-671 2150,-1013 2338,-1013 2338,-671 2150,-671"/> +<polygon fill="#df65b0" stroke="transparent" points="2153,-989 2153,-1010 2335,-1010 2335,-989 2153,-989"/> +<polygon fill="none" stroke="black" points="2153,-989 2153,-1010 2335,-1010 2335,-989 2153,-989"/> +<text text-anchor="start" x="2178" y="-995.8" font-family="Times,serif" font-size="14.00">GeneInfo (23 MiB)</text> +<text text-anchor="start" x="2226.5" y="-973.8" font-family="Times,serif" font-size="14.00">Alias</text> +<text text-anchor="start" x="2215.5" y="-952.8" font-family="Times,serif" font-size="14.00">BlatSeq</text> +<text text-anchor="start" x="2230.5" y="-931.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="2218" y="-910.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="2189.5" y="-889.8" font-family="Times,serif" font-size="14.00">HomoloGeneID</text> +<text 
text-anchor="start" x="2236.5" y="-868.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2232" y="-847.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="2221" y="-826.8" font-family="Times,serif" font-size="14.00">OMIM</text> +<text text-anchor="start" x="2159" y="-805.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end</text> +<text text-anchor="start" x="2155" y="-784.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start</text> +<text text-anchor="start" x="2209" y="-763.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2197.5" y="-742.8" font-family="Times,serif" font-size="14.00">Strand_Gene</text> +<text text-anchor="start" x="2195" y="-721.8" font-family="Times,serif" font-size="14.00">Strand_Probe</text> +<text text-anchor="start" x="2217" y="-700.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="2224" y="-679.8" font-family="Times,serif" font-size="14.00">TaxId</text> +<polygon fill="none" stroke="black" points="2150,-671 2150,-1013 2338,-1013 2338,-671 2150,-671"/> +</g> +<!-- GeneInfo->Species --> +<g id="edge89" class="edge"> +<title>GeneInfo:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2336,-767C2363.53,-767 2339.64,-542.84 2355,-520 2438.32,-396.09 2612.85,-338.66 2715.61,-314.65"/> +<polygon fill="black" stroke="black" points="2716.66,-318 2725.63,-312.36 2715.1,-311.18 2716.66,-318"/> +</g> +<!-- GeneList_rn3 --> +<g id="node99" class="node"> +<title>GeneList_rn3</title> +<polygon fill="white" stroke="transparent" points="552,-1718.5 552,-2207.5 718,-2207.5 718,-1718.5 552,-1718.5"/> +<polygon fill="#df65b0" stroke="transparent" points="555,-2183 555,-2204 715,-2204 715,-2183 555,-2183"/> +<polygon fill="none" stroke="black" points="555,-2183 555,-2204 715,-2204 715,-2183 555,-2183"/> +<text text-anchor="start" x="558" y="-2189.8" 
font-family="Times,serif" font-size="14.00">GeneList_rn3 (5 MiB)</text> +<text text-anchor="start" x="589.5" y="-2167.8" font-family="Times,serif" font-size="14.00">chromosome</text> +<text text-anchor="start" x="621.5" y="-2146.8" font-family="Times,serif" font-size="14.00">flag</text> +<text text-anchor="start" x="595.5" y="-2125.8" font-family="Times,serif" font-size="14.00">genBankID</text> +<text text-anchor="start" x="576" y="-2104.8" font-family="Times,serif" font-size="14.00">geneDescription</text> +<text text-anchor="start" x="609" y="-2083.8" font-family="Times,serif" font-size="14.00">geneID</text> +<text text-anchor="start" x="591" y="-2062.8" font-family="Times,serif" font-size="14.00">geneSymbol</text> +<text text-anchor="start" x="628" y="-2041.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="607" y="-2020.8" font-family="Times,serif" font-size="14.00">identity</text> +<text text-anchor="start" x="618" y="-1999.8" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="601.5" y="-1978.8" font-family="Times,serif" font-size="14.00">ProbeSet</text> +<text text-anchor="start" x="616" y="-1957.8" font-family="Times,serif" font-size="14.00">qEnd</text> +<text text-anchor="start" x="615" y="-1936.8" font-family="Times,serif" font-size="14.00">qSize</text> +<text text-anchor="start" x="612" y="-1915.8" font-family="Times,serif" font-size="14.00">qStart</text> +<text text-anchor="start" x="615.5" y="-1894.8" font-family="Times,serif" font-size="14.00">score</text> +<text text-anchor="start" x="601.5" y="-1873.8" font-family="Times,serif" font-size="14.00">sequence</text> +<text text-anchor="start" x="618" y="-1852.8" font-family="Times,serif" font-size="14.00">span</text> +<text text-anchor="start" x="598.5" y="-1831.8" font-family="Times,serif" font-size="14.00">specificity</text> +<text text-anchor="start" x="611.5" y="-1810.8" font-family="Times,serif" font-size="14.00">strand</text> 
+<text text-anchor="start" x="613.5" y="-1789.8" font-family="Times,serif" font-size="14.00">txEnd</text> +<text text-anchor="start" x="612.5" y="-1768.8" font-family="Times,serif" font-size="14.00">txSize</text> +<text text-anchor="start" x="609" y="-1747.8" font-family="Times,serif" font-size="14.00">txStart</text> +<text text-anchor="start" x="602" y="-1726.8" font-family="Times,serif" font-size="14.00">unigenID</text> +<polygon fill="none" stroke="black" points="552,-1718.5 552,-2207.5 718,-2207.5 718,-1718.5 552,-1718.5"/> +</g> +<!-- GeneList_rn3->Genbank --> +<g id="edge90" class="edge"> +<title>GeneList_rn3:genBankID->Genbank</title> +<path fill="none" stroke="black" d="M716,-2130C741.84,-2130 729.38,-1225.22 735,-1200 738.81,-1182.91 745.09,-1180.48 751,-1164 783.34,-1073.83 811.09,-965.96 826.65,-901.05"/> +<polygon fill="black" stroke="black" points="830.13,-901.54 829.04,-891 823.32,-899.92 830.13,-901.54"/> +</g> +<!-- News --> +<g id="node100" class="node"> +<title>News</title> +<polygon fill="white" stroke="transparent" points="13145,-4842.5 13145,-4932.5 13269,-4932.5 13269,-4842.5 13145,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="13148,-4908.5 13148,-4929.5 13266,-4929.5 13266,-4908.5 13148,-4908.5"/> +<polygon fill="none" stroke="black" points="13148,-4908.5 13148,-4929.5 13266,-4929.5 13266,-4908.5 13148,-4908.5"/> +<text text-anchor="start" x="13151" y="-4915.3" font-family="Times,serif" font-size="14.00">News (167 KiB)</text> +<text text-anchor="start" x="13191" y="-4893.3" font-family="Times,serif" font-size="14.00">date</text> +<text text-anchor="start" x="13182.5" y="-4872.3" font-family="Times,serif" font-size="14.00">details</text> +<text text-anchor="start" x="13200" y="-4851.3" font-family="Times,serif" font-size="14.00">id</text> +<polygon fill="none" stroke="black" points="13145,-4842.5 13145,-4932.5 13269,-4932.5 13269,-4842.5 13145,-4842.5"/> +</g> +<!-- login --> +<g id="node101" class="node"> 
+<title>login</title> +<polygon fill="white" stroke="transparent" points="13303.5,-4800.5 13303.5,-4974.5 13414.5,-4974.5 13414.5,-4800.5 13303.5,-4800.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="13307,-4950.5 13307,-4971.5 13412,-4971.5 13412,-4950.5 13307,-4950.5"/> +<polygon fill="none" stroke="black" points="13307,-4950.5 13307,-4971.5 13412,-4971.5 13412,-4950.5 13307,-4950.5"/> +<text text-anchor="start" x="13310" y="-4957.3" font-family="Times,serif" font-size="14.00">login (52 KiB)</text> +<text text-anchor="start" x="13315.5" y="-4935.3" font-family="Times,serif" font-size="14.00">assumed_by</text> +<text text-anchor="start" x="13352.5" y="-4914.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="13321" y="-4893.3" font-family="Times,serif" font-size="14.00">ip_address</text> +<text text-anchor="start" x="13323" y="-4872.3" font-family="Times,serif" font-size="14.00">session_id</text> +<text text-anchor="start" x="13322.5" y="-4851.3" font-family="Times,serif" font-size="14.00">successful</text> +<text text-anchor="start" x="13321" y="-4830.3" font-family="Times,serif" font-size="14.00">timestamp</text> +<text text-anchor="start" x="13343.5" y="-4809.3" font-family="Times,serif" font-size="14.00">user</text> +<polygon fill="none" stroke="black" points="13303.5,-4800.5 13303.5,-4974.5 13414.5,-4974.5 13414.5,-4800.5 13303.5,-4800.5"/> +</g> +<!-- GeneList --> +<g id="node102" class="node"> +<title>GeneList</title> +<polygon fill="white" stroke="transparent" points="1017.5,-1582 1017.5,-2344 1164.5,-2344 1164.5,-1582 1017.5,-1582"/> +<polygon fill="#df65b0" stroke="transparent" points="1021,-2320 1021,-2341 1162,-2341 1162,-2320 1021,-2320"/> +<polygon fill="none" stroke="black" points="1021,-2320 1021,-2341 1162,-2341 1162,-2320 1021,-2320"/> +<text text-anchor="start" x="1026" y="-2326.8" font-family="Times,serif" font-size="14.00">GeneList (37 MiB)</text> +<text text-anchor="start" x="1064.5" y="-2304.8" 
font-family="Times,serif" font-size="14.00">AlignID</text> +<text text-anchor="start" x="1065" y="-2283.8" font-family="Times,serif" font-size="14.00">cdsEnd</text> +<text text-anchor="start" x="1043.5" y="-2262.8" font-family="Times,serif" font-size="14.00">cdsEnd_2016</text> +<text text-anchor="start" x="1043.5" y="-2241.8" font-family="Times,serif" font-size="14.00">cdsEnd_mm8</text> +<text text-anchor="start" x="1061" y="-2220.8" font-family="Times,serif" font-size="14.00">cdsStart</text> +<text text-anchor="start" x="1039.5" y="-2199.8" font-family="Times,serif" font-size="14.00">cdsStart_2016</text> +<text text-anchor="start" x="1039.5" y="-2178.8" font-family="Times,serif" font-size="14.00">cdsStart_mm8</text> +<text text-anchor="start" x="1044.5" y="-2157.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="1023" y="-2136.8" font-family="Times,serif" font-size="14.00">Chromosome_mm8</text> +<text text-anchor="start" x="1053" y="-2115.8" font-family="Times,serif" font-size="14.00">exonCount</text> +<text text-anchor="start" x="1031.5" y="-2094.8" font-family="Times,serif" font-size="14.00">exonCount_mm8</text> +<text text-anchor="start" x="1056.5" y="-2073.8" font-family="Times,serif" font-size="14.00">exonEnds</text> +<text text-anchor="start" x="1035" y="-2052.8" font-family="Times,serif" font-size="14.00">exonEnds_mm8</text> +<text text-anchor="start" x="1052" y="-2031.8" font-family="Times,serif" font-size="14.00">exonStarts</text> +<text text-anchor="start" x="1031" y="-2010.8" font-family="Times,serif" font-size="14.00">exonStarts_mm8</text> +<text text-anchor="start" x="1050.5" y="-1989.8" font-family="Times,serif" font-size="14.00">GenBankID</text> +<text text-anchor="start" x="1031.5" y="-1968.8" font-family="Times,serif" font-size="14.00">GeneDescription</text> +<text text-anchor="start" x="1064.5" y="-1947.8" font-family="Times,serif" font-size="14.00">GeneID</text> +<text text-anchor="start" x="1046" 
y="-1926.8" font-family="Times,serif" font-size="14.00">GeneSymbol</text> +<text text-anchor="start" x="1084" y="-1905.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="1056" y="-1884.8" font-family="Times,serif" font-size="14.00">Info_mm9</text> +<text text-anchor="start" x="1074.5" y="-1863.8" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="1066.5" y="-1842.8" font-family="Times,serif" font-size="14.00">NM_ID</text> +<text text-anchor="start" x="1056.5" y="-1821.8" font-family="Times,serif" font-size="14.00">ProteinID</text> +<text text-anchor="start" x="1063" y="-1800.8" font-family="Times,serif" font-size="14.00">RGD_ID</text> +<text text-anchor="start" x="1056.5" y="-1779.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="1067" y="-1758.8" font-family="Times,serif" font-size="14.00">Strand</text> +<text text-anchor="start" x="1045.5" y="-1737.8" font-family="Times,serif" font-size="14.00">Strand_mm8</text> +<text text-anchor="start" x="1068.5" y="-1716.8" font-family="Times,serif" font-size="14.00">TxEnd</text> +<text text-anchor="start" x="1047" y="-1695.8" font-family="Times,serif" font-size="14.00">TxEnd_2016</text> +<text text-anchor="start" x="1047" y="-1674.8" font-family="Times,serif" font-size="14.00">TxEnd_mm8</text> +<text text-anchor="start" x="1064" y="-1653.8" font-family="Times,serif" font-size="14.00">TxStart</text> +<text text-anchor="start" x="1043" y="-1632.8" font-family="Times,serif" font-size="14.00">TxStart_2016</text> +<text text-anchor="start" x="1043" y="-1611.8" font-family="Times,serif" font-size="14.00">TxStart_mm8</text> +<text text-anchor="start" x="1057" y="-1590.8" font-family="Times,serif" font-size="14.00">UnigenID</text> +<polygon fill="none" stroke="black" points="1017.5,-1582 1017.5,-2344 1164.5,-2344 1164.5,-1582 1017.5,-1582"/> +</g> +<!-- GeneList->Species --> +<g id="edge92" class="edge"> 
+<title>GeneList:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M1020,-1783C987.59,-1783 1012.7,-1229.81 1000,-1200 991.25,-1179.47 973.39,-1184.68 965,-1164 938.08,-1097.7 917.52,-573.54 965,-520 1083.11,-386.82 2377.72,-318.63 2715.68,-303.02"/> +<polygon fill="black" stroke="black" points="2716.05,-306.51 2725.88,-302.55 2715.73,-299.51 2716.05,-306.51"/> +</g> +<!-- GeneList->Genbank --> +<g id="edge91" class="edge"> +<title>GeneList:GenBankID->Genbank</title> +<path fill="none" stroke="black" d="M1020,-1994C975.87,-1994 1023.12,-1237.58 1000,-1200 982.29,-1171.21 954.25,-1190.29 933,-1164 870.98,-1087.29 850.32,-970.88 843.44,-901.34"/> +<polygon fill="black" stroke="black" points="846.89,-900.65 842.48,-891.02 839.92,-901.3 846.89,-900.65"/> +</g> +<!-- GeneChipEnsemblXRef --> +<g id="node103" class="node"> +<title>GeneChipEnsemblXRef</title> +<polygon fill="white" stroke="transparent" points="1750,-1928.5 1750,-1997.5 1976,-1997.5 1976,-1928.5 1750,-1928.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="1753,-1973 1753,-1994 1973,-1994 1973,-1973 1753,-1973"/> +<polygon fill="none" stroke="black" points="1753,-1973 1753,-1994 1973,-1994 1973,-1973 1753,-1973"/> +<text text-anchor="start" x="1756" y="-1979.8" font-family="Times,serif" font-size="14.00">GeneChipEnsemblXRef (36 B)</text> +<text text-anchor="start" x="1808" y="-1957.8" font-family="Times,serif" font-size="14.00">EnsemblChipId</text> +<text text-anchor="start" x="1820.5" y="-1936.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<polygon fill="none" stroke="black" points="1750,-1928.5 1750,-1997.5 1976,-1997.5 1976,-1928.5 1750,-1928.5"/> +</g> +<!-- GeneChipEnsemblXRef->EnsemblChip --> +<g id="edge93" class="edge"> +<title>GeneChipEnsemblXRef:EnsemblChipId->EnsemblChip</title> +<path fill="none" stroke="black" d="M1974,-1961C2027,-1961 1909.96,-1154.89 1873.44,-911.66"/> +<polygon fill="black" stroke="black" points="1876.86,-910.9 1871.91,-901.53 
1869.94,-911.94 1876.86,-910.9"/> +</g> +<!-- GeneChipEnsemblXRef->GeneChip --> +<g id="edge94" class="edge"> +<title>GeneChipEnsemblXRef:GeneChipId->GeneChip</title> +<path fill="none" stroke="black" d="M1974,-1940C1994.57,-1940 1996.24,-1220.49 1998,-1200 2005.12,-1117.24 2018.29,-1024.34 2029.33,-954.05"/> +<polygon fill="black" stroke="black" points="2032.84,-954.27 2030.95,-943.85 2025.93,-953.18 2032.84,-954.27"/> +</g> +<!-- SnpAllele_to_be_deleted --> +<g id="node104" class="node"> +<title>SnpAllele_to_be_deleted</title> +<polygon fill="white" stroke="transparent" points="13448.5,-4842.5 13448.5,-4932.5 13687.5,-4932.5 13687.5,-4842.5 13448.5,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="13452,-4908.5 13452,-4929.5 13685,-4929.5 13685,-4908.5 13452,-4908.5"/> +<polygon fill="none" stroke="black" points="13452,-4908.5 13452,-4929.5 13685,-4929.5 13685,-4908.5 13452,-4908.5"/> +<text text-anchor="start" x="13455" y="-4915.3" font-family="Times,serif" font-size="14.00">SnpAllele_to_be_deleted (3 KiB)</text> +<text text-anchor="start" x="13551" y="-4893.3" font-family="Times,serif" font-size="14.00">Base</text> +<text text-anchor="start" x="13561" y="-4872.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="13554.5" y="-4851.3" font-family="Times,serif" font-size="14.00">Info</text> +<polygon fill="none" stroke="black" points="13448.5,-4842.5 13448.5,-4932.5 13687.5,-4932.5 13687.5,-4842.5 13448.5,-4842.5"/> +</g> +</g> +</svg> diff --git a/topics/deploy/configuring-nginx-on-host.gmi b/topics/deploy/configuring-nginx-on-host.gmi new file mode 100644 index 0000000..cb1c497 --- /dev/null +++ b/topics/deploy/configuring-nginx-on-host.gmi @@ -0,0 +1,220 @@ +# Configuring Nginx on the Host System + +## Tags + +* type: doc, docs, documentation +* keywords: deploy, deployment, deploying, nginx, guix, guix container, guix system container +* status: in progress + +## Introduction + +We deploy the GeneNetwork system
within GNU Guix system containers. All the configurations and HTTPS certificates are handled from within the container, thus all the host has to do is to pass the traffic on to the system container. + +This document shows you how to set up the host container to forward all the necessary traffic so that you do not run into all the problems that we did when figuring this stuff out :-). + +## Ports and Domains + +In your system container, there are certain ports that are defined for various traffic. The most important ones, and the ones we will deal with, are for HTTP and HTTPS. The ideas should translate for most other ports. + +For the examples is this document, we will assume the following ports are defined in the Guix system container: +* HTTP on port 9080 +* HTTPS on port 9081 + +## HTTPS Traffic + +### Nginx --with-stream_ssl_preread_module + +We handle all the necessary traffic details (e.g. SSL/TLS termination, etc.) within the container, and only need the host to forward the traffic. + +In order to achieve this, your Nginx will need to be compiled with the +=> https://nginx.org/en/docs/stream/ngx_stream_ssl_preread_module.html Nginx Stream SSL Preread Module. + +Now, because we are awesome, we include +=> https://git.genenetwork.org/gn-machines/tree/nginx-preread.scm a definition for nginx compiled with the module. +Simply install it on your host by doing something like: + +``` +$ git clone https://git.genenetwork.org/gn-machines +$ cd gn-machines +$ ./nginx-preread-deploy.sh +``` + +That will install the nginx under "/usr/local/sbin/nginx". + +Now, we comment out, or delete any/all lines loading any nginx modules for any previously existing nginx. Comment out/delete the following line in your "/etc/nginx/nginx.conf" file if it exists: + +``` +include /etc/nginx/modules-enabled/*.conf; +``` + +This is necessary since the nginx we installed from guix comes with all the modules we need, and even if not, it would not successfully use the hosts modules anyhow. 
You'd need to modify the nginx config for yourself to add any missing modules for the nginx from guix — how to do that is outside the scope of this document, but should not be particularly difficult. + +Set up your init system to use the nginx from guix. Assuming systemd, you need to have something like the following in your "/etc/systemd/system/nginx.service" unit file: + +``` +[Unit] +Description=nginx web server (from Guix, not the host) +After=network.target + +[Service] +Type=forking +PIDFile=/run/nginx.pid +ExecStartPre=/usr/local/sbin/nginx -q -t -c /etc/nginx/nginx.conf -e /var/log/nginx/error.log +ExecStart=/usr/local/sbin/nginx -c /etc/nginx/nginx.conf -p /var/run/nginx -e /var/log/nginx/error.log +ExecReload=/usr/local/sbin/nginx -c /etc/nginx/nginx.conf -s reload -e /var/log/nginx/error.log +ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid +TimeoutStopSec=5 +KillMode=mixed + +[Install] +WantedBy=multi-user.target +``` + +Awesome. Now enable the unit file: + +``` +$ sudo systemctl enable nginx.service +``` + +### Forwarding the HTTPS Traffic + +Now that we have nginx in place, we can forward HTTPS traffic for all the domains we want. In "/etc/nginx/nginx.conf" we add: + +``` +# Forward some HTTPS connections into existing guix containers +stream { + upstream my-container { + # This is our Guix system container + server 127.0.0.1:9081; + } + + upstream host-https { + # Forward any https traffic for any previously existing domains on the + # host itself. + server 127.0.0.1:6443; + } + + map $ssl_preread_server_name $upstream { + yourdomain1.genenetwork.org my-container; + yourdomain2.genenetwork.org my-container; + default host-https; + } + + server { + listen 443; + proxy_pass $upstream; + ssl_preread on; + } +} +``` + +## HTTP Traffic + +You will need to pass the HTTP traffic on to the container in order to enable HTTP-dependent traffic (e.g. 
setting up the SSL certificates using the ACME protocol) to be successfully handled. + +You have 2 options to do this: +* Add a separate server block in `/etc/nginx/sites-available/` (or other configured directory) +* Add the server block directly in `/etc/nginx/nginx.conf` (or your main nginx config file, if it's not the standard one mentioned here). + +The configuration to add is as follows: + +``` +server { + ## Forward HTTP traffic to container + ## Without this, the HTTP calls will fall through to the defaults in + ## /etc/nginx/sites-enabled/ leading to http-dependent traffic, like + ## that of the ACME client, failing. + server_name yourdomain1.genenetwork.org yourdomain2.genenetwork.org …; + listen 80; + location / { + proxy_pass http://127.0.0.1:9080; + proxy_set_header Host $host; + } +} +``` + +** Do please replace the "yourdomain*" parts in the example above as appropriate for your scenario. The ellipsis (…) indicates optional extra domains you might need to configure. + +Without this, the `Run ACME Client` step below will fail. + +## Run ACME Client + +Now that all traffic is set up, and you can reach your sites using both HTTP and HTTPS (you have tested your sites, right? right?) we can now request the SSL certificates from Let's Encrypt so that we no longer see the "Self-signed Certificate" warning. + +You need to get into your system container to do this. The steps are as follows: + +=> https://git.genenetwork.org/gn-machines/tree/README.org#n61 Figure out which process is your container +=> https://git.genenetwork.org/gn-machines/tree/README.org#n55 Get a shell into the container +=> https://guix-forge.systemreboot.net/manual/dev/en/#section-acme-service Run "/usr/bin/acme renew" to get your initial SSL certificates from Let's Encrypt. + +At this point, the traffic portion of the configuration is done.
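
The "have you tested your sites?" check from the ACME step above could be sketched roughly as follows. This is only an illustration, assuming `curl` is installed on the machine you test from; the helper names `check_site` and `check_domain` are hypothetical and not part of gn-machines, and the domain is the placeholder used in the examples.

```shell
# Hypothetical helper: print the HTTP status code curl receives for a URL.
# --insecure skips certificate verification, because the container still
# serves a self-signed certificate until the ACME client has run.
check_site() {
    curl --silent --insecure --output /dev/null --write-out '%{http_code}' "$1"
}

# Hypothetical helper: check both HTTP and HTTPS for one domain and print
# the two status codes on a single line.
check_domain() {
    printf '%s http=%s https=%s\n' "$1" \
        "$(check_site "http://$1/")" \
        "$(check_site "https://$1/")"
}
```

Calling `check_domain yourdomain1.genenetwork.org` prints one line per domain. Any ordinary status code (for instance a 200 or a redirect) suggests the traffic reaches the container, whereas a missing response points back at the forwarding configuration.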
+ +## Sample "/etc/nginx/nginx.conf" + +``` +user www-data; +worker_processes auto; +pid /run/nginx.pid; +# include /etc/nginx/modules-enabled/*.conf; + +access_log /var/log/nginx/access.log; +error_log /var/log/nginx/error.log error; + +events { + worker_connections 768; + # multi_accept on; +} + +stream { + upstream my-container { + # This is our Guix system container + server 127.0.0.1:9081; + } + + upstream host-https { + # Forward any https traffic for any previously existing domains on the + # host itself. + server 127.0.0.1:6443; + } + + map $ssl_preread_server_name $upstream { + yourdomain1.genenetwork.org my-container; + yourdomain2.genenetwork.org my-container; + default host-https; + } + + server { + listen 443; + proxy_pass $upstream; + ssl_preread on; + } +} + +http { + ## + # Basic Settings + ## + + ⋮ + + include /etc/nginx/conf.d/*.conf; + server { + ## Forward HTTP traffic to container + ## Without this, the HTTP calls will fall through to the defaults in + ## /etc/nginx/sites-enabled/ leading to http-dependent traffic, like + ## that of the ACME client, failing. + server_name yourdomain1.genenetwork.org yourdomain2.genenetwork.org …; + listen 80; + location / { + proxy_pass http://127.0.0.1:9080; + proxy_set_header Host $host; + } + } + include /etc/nginx/sites-enabled/*; + + ⋮ +} + +⋮ + +``` diff --git a/topics/deploy/deployment.gmi b/topics/deploy/deployment.gmi index b844821..74fd6f0 100644 --- a/topics/deploy/deployment.gmi +++ b/topics/deploy/deployment.gmi @@ -1,14 +1,21 @@ # Deploy GeneNetwork +## Tags + +* type: doc, docs, documentation +* keywords: deploy, deployment, deploying, guix, guix container, guix system container +* status: in progress + # Description This page attempts to document the deployment process we have for GeneNetwork. We use Guix system containers for deployment of CI/CD and the Guix configuration for the CI/CD container should be considered the authoritative reference. 
-=> https://github.com/genenetwork/genenetwork-machines/blob/main/genenetwork-development.scm +=> https://git.genenetwork.org/gn-machines/tree/genenetwork-development.scm See also => ./guix-system-containers-and-how-we-use-them +=> ./configuring-nginx-on-host ## genenetwork2 diff --git a/topics/deploy/genecup.gmi b/topics/deploy/genecup.gmi index c5aec17..fc93d07 100644 --- a/topics/deploy/genecup.gmi +++ b/topics/deploy/genecup.gmi @@ -53,3 +53,72 @@ and port forward: ssh -L 4200:127.0.0.1:4200 -f -N server curl localhost:4200 ``` + +# Troubleshooting + +## Moving the PubMed dir + +After moving the PubMed dir GeneCup stopped displaying part of the connections. This can be reproduced by running the standard example on the home page - the result should look like the image on the right of the home page. + +After fixing the paths and restarting the service there was still no result. + +GeneCup is currently managed by the shepherd as user shepherd. Stop the service as that user: + +``` +shepherd@tux02:~$ herd stop genecup +guile: warning: failed to install locale +Service genecup has been stopped. +``` + +Now the service looks stopped, but it is still running and you need to kill it by hand: + +``` +shepherd@tux02:~$ ps xau|grep genecup +shepherd 89524 0.0 0.0 12780 944 pts/42 S+ 00:32 0:00 grep genecup +shepherd 129334 0.0 0.7 42620944 2089640 ?
Sl Mar05 66:30 /gnu/store/1w5v338qk5m8khcazwclprs3znqp6f7f-python-3.10.7/bin/python3 /gnu/store/a6z0mmj6iq6grwynfvkzd0xbbr4zdm0l-genecup-latest-with-tensorflow-native-HEAD-of-master-branch/.server.py-real +shepherd@tux02:~$ kill -9 129334 +shepherd@tux02:~$ ps xau|grep genecup +shepherd 89747 0.0 0.0 12780 944 pts/42 S+ 00:32 0:00 grep genecup +shepherd@tux02:~$ +``` + +The log file lives in + +``` +shepherd@tux02:~/logs$ tail -f genecup.log +``` + +and we were getting errors on a reload and I had to fix + +``` +shepherd@tux02:~/shepherd-services$ grep export run_genecup.sh +export EDIRECT_PUBMED_MASTER=/export3/PubMed +export TMPDIR=/export/ratspub/tmp +export NLTK_DATA=/export3/PubMed/nltk_data +``` + +See + +=> https://git.genenetwork.org/gn-shepherd-services/commit/?id=cd4512634ce1407b14b0842b0ef6a9cd35e6d46c + +The symlink from /export2 is not honoured by the guix container. Now the service works. + +Note we have deprecation warnings that need to be addressed in the future: + +``` +2025-04-22 00:40:07 /home/shepherd/services/genecup/guix-past/modules/past/packages/python.scm:740:19: warning: 'texlive-union' is deprecated, + use 'texlive-updmap.cfg' instead +2025-04-22 00:40:07 guix build: warning: 'texlive-latex-base' is deprecated, use 'texlive-latex-bin' instead +2025-04-22 00:40:15 updating checkout of 'https://git.genenetwork.org/genecup'... +/gnu/store/9lbn1l04y0xciasv6zzigqrrk1bzz543-tensorflow-native-1.9.0/lib/python3.10/site-packages/tensorflow/python/framewo +rk/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. 
+2025-04-22 00:40:38 _np_quint16 = np.dtype([("quint16", np.uint16, 1)]) +2025-04-22 00:40:38 /gnu/store/9lbn1l04y0xciasv6zzigqrrk1bzz543-tensorflow-native-1.9.0/lib/python3.10/site-packages/tensorflow/python/framewo +rk/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. +2025-04-22 00:40:38 _np_qint32 = np.dtype([("qint32", np.int32, 1)]) +2025-04-22 00:40:38 /gnu/store/9lbn1l04y0xciasv6zzigqrrk1bzz543-tensorflow-native-1.9.0/lib/python3.10/site-packages/tensorflow/python/framewo +rk/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. +2025-04-22 00:40:38 np_resource = np.dtype([("resource", np.ubyte, 1)]) +2025-04-22 00:40:39 /gnu/store/7sam0mr9kxrd4p7g1hlz9wrwag67a6x6-python-flask-sqlalchemy-2.5.1/lib/python3.10/site-packages/flask_sqlalchemy/__ +init__.py:872: FSADeprecationWarning: SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and will be disabled by default in the future. Set it to True or False to suppress this warning. +``` diff --git a/topics/deploy/installation.gmi b/topics/deploy/installation.gmi index 757d848..d6baa79 100644 --- a/topics/deploy/installation.gmi +++ b/topics/deploy/installation.gmi @@ -319,7 +319,7 @@ Currently we have two databases for deployment, from BXD mice and 'db_webqtl_plant' which contains all plant related material. 
-Download one database from +Download a recent database from => https://files.genenetwork.org/database/ diff --git a/topics/deploy/machines.gmi b/topics/deploy/machines.gmi index d610c9f..a7c197c 100644 --- a/topics/deploy/machines.gmi +++ b/topics/deploy/machines.gmi @@ -2,17 +2,19 @@ ``` - [ ] bacchus 172.23.17.156 (00:11:32:ba:7f:17) - 1 Gbs -- [X] lambda01 172.23.18.212 (7c:c2:55:11:9c:ac) +- [ ] penguin2 +- [X] lambda01 172.23.18.212 (7c:c2:55:11:9c:ac) - currently 172.23.17.41 - [X] tux03i 172.23.17.181 (00:0a:f7:c1:00:8d) - 10 Gbs [X] tux03 128.169.5.101 (00:0a:f7:c1:00:8b) - 1 Gbs -- [ ] tux04i 172.23.17.170 (14:23:f2:4f:e6:10) -- [ ] tux04 128.169.5.119 (14:23:f2:4f:e6:11) +- [X] tux04i 172.23.17.170 (14:23:f2:4f:e6:10) +- [X] tux04 128.169.5.119 (14:23:f2:4f:e6:11) - [X] tux05 172.23.18.129 (14:23:f2:4f:35:00) - [X] tux06 172.23.17.188 (14:23:f2:4e:29:10) - [X] tux07 172.23.17.191 (14:23:f2:4e:7d:60) - [X] tux08 172.23.17.186 (14:23:f2:4f:4e:b0) - [X] tux09 172.23.17.182 (14:23:f2:4e:49:10) - [X] space 128.169.5.175 (e4:3d:1a:80:6c:40) +- [ ] space-i 172.23.18.153 (cc:48:3a:13:db:4c) - [ ] octopus01f 172.23.18.221 (2c:ea:7f:60:bf:61) - [ ] octopus02f 172.23.22.159 (2c:ea:7f:60:bd:61) - [ ] octopus03f 172.23.19.187 (2c:ea:7f:60:ac:2b) @@ -25,6 +27,8 @@ c for console or control ``` - [ ] DNS entries no longer visible +- [X] penguin2-c 172.23.31.83 +- [ ] octolair01 172.23.16.228 - [X] lambda01-c 172.23.17.173 (3c:ec:ef:aa:e5:50) - [X] tux01-c 172.23.31.85 (58:8A:5A:F9:3A:22) - [X] tux02-c 172.23.30.40 (58:8A:5A:F0:E6:E4) diff --git a/topics/deploy/paths-in-flask-applications.gmi b/topics/deploy/paths-in-flask-applications.gmi new file mode 100644 index 0000000..77bc201 --- /dev/null +++ b/topics/deploy/paths-in-flask-applications.gmi @@ -0,0 +1,22 @@ +# Paths in Flask Application + +## Tags + +* type: doc, docs, documentation +* assigned: fredm +* keywords: application paths, flask, absolute path, relative path + +## Content + +Always build and use absolute 
paths for the resources you use in your application. Assuming that the application will always be run with the root of the application's repository/package as the working directory is a recipe for failure. + +To demonstrate, see the following issue: +=> /issues/genenetwork2/haley-knott-regression-mapping-error + +In this case, the path issue was not caught in the CI/CD environment since it runs the application with the repository root as its working directory. This issue will also not show up in most development environments since it is easier to run the application from the root of the repository than to set up the PYTHONPATH variables. + +In the new containers making use of the "(genenetwork services genenetwork)" module in gn-machines[fn:1], the working directory where the application is invoked has no relation to the application's package — in fact, the working directory is actually the root of the container's file system ("/"). + +# Footnotes + +[fn:1] https://git.genenetwork.org/gn-machines/ diff --git a/topics/deploy/setting-up-or-migrating-production-across-machines.gmi b/topics/deploy/setting-up-or-migrating-production-across-machines.gmi new file mode 100644 index 0000000..1f35dae --- /dev/null +++ b/topics/deploy/setting-up-or-migrating-production-across-machines.gmi @@ -0,0 +1,58 @@ +# Setting Up or Migrating Production Across Machines + +## Tags + +* type: documentation, docs, doc +* status: in-progress +* assigned: fredm +* priority: undefined +* keywords: migration, production, genenetwork +* interested-parties: pjotrp, zachs + +## Introduction + +Recent events (late 2024 and early 2025) have led to us needing to move the production system from one machine to another several times, due to machine failures, disk space, security concerns, and the like. + +In this respect, a number of tasks rise to the front as necessary to accomplish for a successful migration.
Each of the following sections will detail a task that's necessary for a successful migration. + +## Set Up the Database + +* Extract: detail this — link to existing document in this repo. Also, probably note that we symlink the extraction back to `/var/lib/mysql`? +* Configure: detail this — link to existing document in this repo + +## Set Up the File System + +* TODO: List the necessary directories and describe what purpose each serves. This will be from the perspective of the container — actual paths on the host system are left to the builder's choice, and can vary wildly. +* TODO: Prefer explicit binding rather than implicit — makes the shell scripts longer, but no assumptions have to be made, everything is explicitly spelled out. + +## Redis + +We currently (2025-06-11) use Redis for: + +- Tracking user collections (this will be moved to a SQLite database) +- Tracking background jobs (this is being moved out to SQLite databases) +- Tracking running-time (not sure what this is about) +- Others? + +We do need to copy over the Redis save file whenever we do a migration, at least until the user collections and background jobs features have been moved completely out of Redis.
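
Copying the Redis save file during a migration might look roughly like the sketch below. It assumes `redis-cli` can reach the running Redis instance on the old machine and that `scp` access to the new machine exists; the helper names, the `REDIS_CLI` override variable, and the destination argument are all hypothetical, so adapt them to the actual set-up.

```shell
# Hypothetical helper: print the full path of the Redis save (RDB) file by
# asking the running server for its data directory and save-file name.
# REDIS_CLI is an assumed override for the client command; it defaults to
# plain "redis-cli".
redis_dump_path() {
    redis_dir="$(${REDIS_CLI:-redis-cli} CONFIG GET dir | tail -n 1)"
    redis_file="$(${REDIS_CLI:-redis-cli} CONFIG GET dbfilename | tail -n 1)"
    printf '%s/%s\n' "$redis_dir" "$redis_file"
}

# Hypothetical helper: force a synchronous snapshot, then copy the save file
# to the destination given as the first argument (e.g. host:/some/path/).
copy_redis_dump() {
    ${REDIS_CLI:-redis-cli} SAVE && scp "$(redis_dump_path)" "$1"
}
```

The save file has to be placed where the new Redis instance expects it before that instance starts; where that is depends on how the container binds its directories.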
+ +## Container Configurations: Secrets + +* TODO: Detail how to extract/restore the existing secrets configurations in the new machine + +## Build Production Container + +* TODO: Add notes on building +* TODO: Add notes on setting up systemd + +## NGINX + +* TODO: Add notes on streaming and configuration of it thereof + +## SSL Certificates + +* TODO: Add notes on acquisition and setup of SSL certificates + +## DNS + +* TODO: Migrate DNS settings diff --git a/topics/deploy/uthsc-vpn-with-free-software.gmi b/topics/deploy/uthsc-vpn-with-free-software.gmi index 344772c..95fd1cd 100644 --- a/topics/deploy/uthsc-vpn-with-free-software.gmi +++ b/topics/deploy/uthsc-vpn-with-free-software.gmi @@ -10,6 +10,11 @@ $ openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC ``` Note that openconnect-sso should be run as a regular user, not as root. After passing Duo authentication, openconnect-sso will try to gain root priviliges to set up the network routes. At that point, it will prompt you for your password using sudo. +## Recommended way + +The recommended way is to use Arun's g-expression setup using guix. See below. It should just work, provided you have the +chained certificate that you can get from the browser or one of us. + ## Avoid tunneling all your network traffic through the VPN (aka Split Tunneling) openconnect, by default, tunnels all your traffic through the VPN. This is not good for your privacy. It is better to tunnel only the traffic destined to the specific hosts that you want to access. This can be done using the vpn-slice script. @@ -44,46 +49,46 @@ export OPENSSL_CONF=/tmp/openssl.cnf ``` Then, run the openconnect-sso client as usual. -## Putting it all together using Guix G-expressions +## Misconfigured UTHSC TLS certificate -Remembering to do all these steps is a hassle. Writing a shell script to automate this is a good idea, but why write shell scripts when we have G-expressions! Here's a G-expression script that I prepared earlier. 
-=> uthsc-vpn.scm -Download it, tweak the %hosts variable to specify the hosts you are interested in, and run it like so: +The UTHSC TLS certificate does not validate on some systems. You can work around this by downloading the certificate chain and adding it to your system: +* Navigate with browser to https://uthscvpn1.uthsc.edu/. Inspect the certificate in the browser (lock icon next to search bar) and export .pem file +* Move it to /usr/local/share/ca-certificates (with .crt extension) or equivalent +* On Debian/Ubuntu update the certificate store with update-ca-certificates +You should see ``` -$(guix build -f uthsc-vpn.scm) +Updating certificates in /etc/ssl/certs... +1 added, 0 removed; done. ``` +Thanks Niklas. See also +=> https://superuser.com/a/719047/914881 -# Troubleshooting - -Older versions would not show a proper dialog for sign-in. Try - +However, adding certificates to your system manually is not good security practice. It is better to limit the added certificate to the openconnect process. You can do this using the REQUESTS_CA_BUNDLE environment variable like so: ``` -export QTWEBENGINE_CHROMIUM_FLAGS=--disable-seccomp-filter-sandbox +REQUESTS_CA_BUNDLE=/path/to/uthsc/certificate.pem openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC ``` -## Update certificate +## Putting it all together using Guix G-expressions -When the certificate expires you can download the new one with: +Remembering to do all these steps is a hassle. Writing a shell script to automate this is a good idea, but why write shell scripts when we have G-expressions! Here's a G-expression script that I prepared earlier. +=> uthsc-vpn.scm +Download it, download the UTHSC TLS certificate chain to uthsc-certificate.pem, tweak the %hosts variable to specify the hosts you are interested in, and run it like so: +``` +$(guix build -f uthsc-vpn.scm) +``` -* Navigate with browser to https://uthscvpn1.uthsc.edu/. 
Inspect the certificate in the browser (lock icon next to search bar) and export .pem file -* Move it to /usr/local/share/ca-certificates (with .crt extension) or equivalent -* On Debian/Ubuntu update the certificate store with update-ca-certificates - -You should see +To add a route by hand afterwards, you can do ``` -Updating certificates in /etc/ssl/certs... -1 added, 0 removed; done. +ip route add 172.23.17.156 dev tun0 ``` -Thanks Niklas. See also - -=> https://superuser.com/a/719047/914881 +# Troubleshooting -On GUIX you may need to point to the updated certificates file with: +Older versions would not show a proper dialog for sign-in. Try ``` -env REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC +export QTWEBENGINE_CHROMIUM_FLAGS=--disable-seccomp-filter-sandbox ``` ## Acknowledgement diff --git a/topics/deploy/uthsc-vpn.scm b/topics/deploy/uthsc-vpn.scm index c714731..82f67f5 100644 --- a/topics/deploy/uthsc-vpn.scm +++ b/topics/deploy/uthsc-vpn.scm @@ -1,11 +1,15 @@ -(use-modules ((gnu packages guile-xyz) #:select (guile-ini guile-lib guile-smc)) +(use-modules ((gnu packages python-web) #:select (python-requests python-urllib3)) + ((gnu packages guile-xyz) #:select (guile-ini guile-lib guile-smc)) ((gnu packages vpn) #:select (openconnect-sso vpn-slice)) - (guix gexp)) + (guix build-system python) + (guix download) + (guix gexp) + (guix packages)) ;; Put in the hosts you are interested in here.
(define %hosts (list "octopus01" - "tux01.genenetwork.org")) + "spacex.uthsc.edu")) (define (ini-file name scm) "Return a file-like object representing INI file with @var{name} and @@ -19,6 +23,46 @@ (call-with-output-file #$output (cut scm->ini #$scm #:port <>)))))) +(define python-urllib3-1.26 + (package + (inherit python-urllib3) + (version "1.26.15") + (source + (origin + (method url-fetch) + (uri (pypi-uri "urllib3" version)) + (sha256 + (base32 + "01dkqv0rsjqyw4wrp6yj8h3bcnl7c678qkj845596vs7p4bqff4a")))) + (build-system python-build-system))) + +(define python-requests-2.28 + (package + (inherit python-requests) + (name "python-requests") + (version "2.28.1") + (source (origin + (method url-fetch) + (uri (pypi-uri "requests" version)) + (sha256 + (base32 + "10vrr7bijzrypvms3g2sgz8vya7f9ymmcv423ikampgy0aqrjmbw")))) + (build-system python-build-system) + (arguments (list #:tests? #f)) + (native-inputs (list)) + (propagated-inputs + (modify-inputs (package-propagated-inputs python-requests) + (replace "python-urllib3" python-urllib3-1.26))))) + +;; Login to the UTHSC VPN fails with an SSLV3_ALERT_HANDSHAKE_FAILURE +;; on newer python-requests. +(define openconnect-sso-uthsc + (package + (inherit openconnect-sso) + (inputs + (modify-inputs (package-inputs openconnect-sso) + (replace "python-requests" python-requests-2.28))))) + (define uthsc-vpn (with-imported-modules '((guix build utils)) #~(begin @@ -34,7 +78,9 @@ ("system_default" . "system_default_sect")) ("system_default_sect" ("Options" . 
"UnsafeLegacyRenegotiation"))))) - (invoke #$(file-append openconnect-sso "/bin/openconnect-sso") + (setenv "REQUESTS_CA_BUNDLE" + #$(local-file "uthsc-certificate.pem")) + (invoke #$(file-append openconnect-sso-uthsc "/bin/openconnect-sso") "--server" "uthscvpn1.uthsc.edu" "--authgroup" "UTHSC" "--" diff --git a/topics/documentation/guides_vs_references.gmi b/topics/documentation/guides_vs_references.gmi new file mode 100644 index 0000000..7df0be2 --- /dev/null +++ b/topics/documentation/guides_vs_references.gmi @@ -0,0 +1,24 @@ +# Guides Vs References + +Before coming up with docs, figure out their use. It can either be as a guide (provides solutions to problems encountered) or a reference (similar to man pages, where we provide detailed explanations). + +## For guides: + +* Be as brief as possible, providing reference links for users that want to explore i.e. don't aim for completeness, but rather practicality. +* Prefer providing code or command snippets where possible. +* Preferably have another team member review the docs. This helps eliminate blind spots due to our current knowledge. +* Organize the document in such a way that it starts with the most actionable steps. +* Avoid stream of consciousness writing. + +### Example + +Wrong: + +When setting up guix OS, I couldn't get `tmux` to start, getting `tmux: invalid LC_ALL, LC_CTYPE or LANG`. Running `locale -a` failed too. It took me a while to figure out the solution for this problem, and I attempted to reinstall `glibc-locales` which didn't help. After a lot of research, I found that the root cause was that my applications were built on a different version of `glibc`. I ran `guix update` and the problem disappeared. + +Correct: + +`tmux` failing with `tmux: invalid LC_ALL, LC_CTYPE or LANG` could be caused by having packages built on a different version of `glibc`.
Attempt: + +> locale -a # should also fail +> guix update # rebuilds your packages with your current glibc diff --git a/topics/editing/case-attributes.gmi b/topics/editing/case-attributes.gmi new file mode 100644 index 0000000..1a86131 --- /dev/null +++ b/topics/editing/case-attributes.gmi @@ -0,0 +1,110 @@ +# Editing Case-Attributes + +## Tags + +* type: document +* keywords: case-attribute, editing +* assigned: fredm, zachs, acenteno, bonfacem +* status: requirements gathering + +## Introduction + +Case-attributes are metadata for samples. They include sex, age, etc. of the various individuals, and exist separately from "normal" traits mainly because they're non-numeric. From the GN2 traits page, they are shown as extra columns under the "Reviews and Edit Data" section. + +Case-attributes are determined at the group-level. E.g. for BXD, case attributes would apply at the level of each sample, across all BXD data. Every strain has a unique attribute and it's fixed, not variable. + +We need to differentiate these two things: + +* Case-Attribute labels/names/categories (e.g. Sex, Height, Cage-handler, etc) +* Case-Attribute values (e.g. Male/Female, 20cm, Frederick, etc.) + +Currently, both labels and values are set at the group level: + +=> https://github.com/genenetwork/genenetwork1/blob/0f170f0b748a4e10eaf8538f6bcbf88b573ce8e7/web/webqtl/showTrait/DataEditingPage.py Case-Attributes on GeneNetwork1 +is a good starting point to help with understanding how case-attributes were implemented and how they worked. + +A critical bug existed where editing one case-attribute affected all case-attributes defined for a group. + +Case attributes can have the following data-types: + +* Free-form text (no constraints) - see the `Status` column +* Enumerations - textual data, but where the user can only pick from specific values +* Links - The value displayed also acts as a link - e.g. 
the 'JAX:*' values in the `RRID` column + +## HOWTO + +Example SQL query to fetch case-attribute data: + +``` +SELECT + caxrn.*, ca.Name AS CaseAttributeName, + ca.Description AS CaseAttributeDescription, + iset.InbredSetId AS OrigInbredSetId +FROM + CaseAttribute AS ca INNER JOIN CaseAttributeXRefNew AS caxrn + ON ca.Id=caxrn.CaseAttributeId +INNER JOIN + StrainXRef AS sxr + ON caxrn.StrainId=sxr.StrainId +INNER JOIN + InbredSet AS iset + ON sxr.InbredSetId=iset.InbredSetId +WHERE + caxrn.value != 'x' + AND caxrn.value IS NOT NULL; +``` + +CaseAttributeXRefNew differs from CaseAttributeXRef: + +``` +mysql> describe CaseAttributeXRef; ++------------------+----------------------+------+-----+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++------------------+----------------------+------+-----+---------+-------+ +| ProbeSetFreezeId | smallint(5) unsigned | NO | PRI | 0 | | +| StrainId | smallint(5) unsigned | NO | PRI | 0 | | +| CaseAttributeId | smallint(5) | NO | PRI | 0 | | +| Value | varchar(100) | NO | | | | ++------------------+----------------------+------+-----+---------+-------+ +4 rows in set (0.01 sec) + +mysql> describe CaseAttributeXRefNew; ++-----------------+------------------+------+-----+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++-----------------+------------------+------+-----+---------+-------+ +| InbredSetId | int(5) unsigned | NO | PRI | NULL | | +| StrainId | int(20) unsigned | NO | PRI | NULL | | +| CaseAttributeId | int(5) unsigned | NO | PRI | NULL | | +| Value | varchar(100) | NO | | NULL | | ++-----------------+------------------+------+-----+---------+-------+ +4 rows in set (0.01 sec) +``` + +=> https://github.com/genenetwork/genenetwork3/blob/dd0b29c07017ec398c447ca683dd4b4be18d73b7/scripts/update-case-attribute-tables-20230818 Script to update CaseAttribute and CaseAttributeXRefNew table + +## Tasks + +* @bmunyoki: Model case-attributes correctly in RDF. 
+* @bmunyoki, @zachs: Implement case-attributes editing in GN3 that correctly models case-attributes at the group-level. CRUD operations with the correct authorization. People who can edit sample data should not be able to edit case-attributes because case-attributes are defined at the group level; and editing case-attributes at the group-level will affect other samples. +* @rob: Confirm to team whether "N" and "SE" are case-attributes. @bmunyoki AFAICT, no. + + +Possible set of privileges subject to discussion: + +* group:resource:add-case-attributes - Allows user to add a completely new case attribute +* group:resource:edit-case-attributes - Allows user to edit an existing case attribute +* group:resource:delete-case-attributes - Allows user to delete an existing case attribute +* group:resource:view-case-attributes - Allows user to view case attributes and their value + +Given that groups are not directly linked to any auth resource, we may need to introduce some level of indirection. Adding a new resource type that handles groups may solve this. 
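A check against the privileges listed above might look roughly like the following. This is only a sketch under assumptions: the `can` helper, the `CASE_ATTRIBUTE_PRIVILEGES` name and the idea of passing a plain set of privilege strings are hypothetical and are not the GN3 auth API; only the four privilege strings come from the list above.

```python
# Hypothetical sketch: only the privilege strings come from the list above.
# The helper name and the data shapes are assumptions for illustration.
CASE_ATTRIBUTE_PRIVILEGES = frozenset({
    "group:resource:add-case-attributes",
    "group:resource:edit-case-attributes",
    "group:resource:delete-case-attributes",
    "group:resource:view-case-attributes",
})


def can(user_privileges, action):
    """Return True if `user_privileges` contains the case-attribute
    privilege for `action` (one of: add, edit, delete, view)."""
    privilege = f"group:resource:{action}-case-attributes"
    if privilege not in CASE_ATTRIBUTE_PRIVILEGES:
        raise ValueError(f"unknown case-attribute action: {action}")
    return privilege in user_privileges
```

With this shape, a user who may edit sample data but only holds `group:resource:view-case-attributes` gets `True` for `can(privs, "view")` and `False` for `can(privs, "edit")`, which is the separation the task list asks for.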
+ +## See Also + +=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$myIoafLp_dIONnyNvEI0k2xf3Y8-LyiI_mkP2vBN08o?via=matrix.org Discussion on Case-Attributes Editing in Matrix +=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$P6SNnpY-nAZsDr3VZlRi05m6MT32lXBsCl-BYLh-YLM?via=matrix.org More Discussion on Matrix +=> /issues/case-attr-edit-error Case Attribute Editing Problems +=> /issues/fix-case-attribute-work Fix Case Attribute Work (Same Columns) +=> /issues/fix-case-attribute-editing Editing Case Attribute +=> /issues/consecutive-crud-applications-when-uploading-data Fix Case Attribute Work (Consecutive CRUD applications) +=> /issues/edit-metadata-bugs Cannot Edit Metadata of BXD Traits Effectively +=> /topics/data-uploads/datasets Some Historical Context diff --git a/topics/editing/case_attributes.gmi b/topics/editing/case_attributes.gmi deleted file mode 100644 index 5a11026..0000000 --- a/topics/editing/case_attributes.gmi +++ /dev/null @@ -1,180 +0,0 @@ -# Editing Case-Attributes - -## Tags - -* type: document -* keywords: case-attribute, editing -* assigned: fredm, zachs, acenteno -* status: requirements gathering - -## Introduction - -Case-Attributes are essentially the metadata for the samples. In the GN2 system, they are the extra columns in the table in the "Reviews and Edit Data" accordion tab besides the value and its error margin. - -To quote @zachs - -> "Case Attributes" are basically just sample metadata. So stuff like the sex, age, etc of the various individuals (and exist separately from "normal" traits mainly because they're non-numeric) - -They are the metadata for the various sample in a trait. The case attributes are determined at the group-level: - -> Since they're metadata (or "attributes" in this case) for samples, they're group-level so for BXD, case attributes would apply at the level of each sample, across all BXD data - -Also From email: -> Every strain has a unique attribute and it's fixed, not variable. 
- -## Direction - -We need to differentiate two things: -* Case-Attribute labels/names/categories (e.g. Sex, Height, Cage-handler, etc) -* Case-Attribute values (e.g. Male/Female, 20cm, Frederick, etc.) - -As is currently implemented (as of before 2023-08-31), both the labels and values are set at group level. - -A look at -=> https://github.com/genenetwork/genenetwork1/blob/0f170f0b748a4e10eaf8538f6bcbf88b573ce8e7/web/webqtl/showTrait/DataEditingPage.py Case-Attributes on GeneNetwork1 -is a good starting point to help with understanding how case-attributes were implemented and how they worked. - -## Status - -There is code that existed for the case-attributes editing, but it had a critical bug where the data for existing attributes would be deleted/replaced randomly when one made a change. This lead to a pause in this effort. - -The chosen course of action will, however, not make use of this existing code. Instead, we will reimplement the feature with code in GN3, exposing the data and its editing via API endpoints. - -## Database - -The existing database tables of concern to us are: - -* InbredSet -* CaseAttribute -* StrainXRef -* Strain -* CaseAttributeXRefNew - -We can fetch case-attribute data from the database with: - -``` -SELECT - caxrn.*, ca.Name AS CaseAttributeName, - ca.Description AS CaseAttributeDescription, - iset.InbredSetId AS OrigInbredSetId -FROM - CaseAttribute AS ca INNER JOIN CaseAttributeXRefNew AS caxrn - ON ca.Id=caxrn.CaseAttributeId -INNER JOIN - StrainXRef AS sxr - ON caxrn.StrainId=sxr.StrainId -INNER JOIN - InbredSet AS iset - ON sxr.InbredSetId=iset.InbredSetId -WHERE - caxrn.value != 'x' - AND caxrn.value IS NOT NULL; -``` - -which gives us all the information we need to rework the database schema. - -Since the Case-Attributes are group-level, we need to move the `InbredSetId` to the `CaseAttribute` table from the `CaseAttributeXRefNew` table. 
- -For more concrete relationship declaration, we can have the `CaseAttributeXRefNew` table have it primary key be composed of the `InbredSetId`, `StrainId` and `CaseAttributeId`. That has the added advantage that we can index the table on `InbredSetId` and `StrainId`. - -That leaves the `CaseAttribute` table with the following columns: - -* InbredSetId: Foreign Key from `InbredSet` table -* Id: The CaseAttribute identifier -* Name: Textual name for the Case-Attribute -* Description: Textual description fro the case-attribute - -while the `CaseAttributeXRefNew` table ends up with the following columns: - -* InbredSetId: Foreign Key from `InbredSet` table -* StrainId: The strain -* CaseAttributeId: The case-attribute identifier -* Value: The value for the case-attribute for this specific strain - -There will not be any `NULL` values allowed for any of the columns in both tables. If a strain has no value, we simply delete the corresponding record from the `CaseAttributeXRefNew` table. - -To that end, the following script has been added to ease the migration of the table schemas: -=> https://github.com/genenetwork/genenetwork3/blob/dd0b29c07017ec398c447ca683dd4b4be18d73b7/scripts/update-case-attribute-tables-20230818 -The script is meant to be run only once, and makes the changes mentioned above for both tables. - -## Data Types - -> ... (and exist separately from "normal" traits mainly because they're non-numeric) - -The values for Case-Attributes are non-numeric data. This will probably be mostly textual data. - -As an example: -=> https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish Trait Data and Analysis for BXD_10010 -we see Case-Attributes as: - -* Free-form text (no constraints) - see the `Status` column -* Enumerations - textual data, but where the user can only pick from specific values -* Links - The value displayed also acts as a link - e.g. 
the 'JAX:*' values in the `RRID` column - - -=> https://genenetwork.org/show_trait?trait_id=10002&dataset=CCPublish For this trait - -We see: -* Numeric data - see the `N` and `SE` columns -though that might be a misunderstanding of the quote - -> In the following link for example, every column after Value is a case attribute - https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish - -**TODO**: Verify whether `N` and `SE` are Case-Attributes - -## Authorisation - -From email: -> it's probably not okay to let anyone who can edit sample data for a trait also edit case attributes, since they're group level - -and from matrix: -> The weird bug aside, Bonface had (mostly) successfully implemented editing these through the CSV files in the same way as any other sample data, but for authorization reasons this probably doesn't make sense (since a user having access to editing sample data for specific traits doesn't imply that they'd have access for editing case attributes across the entire group) - -From this, it implies we might need a new set of privileges for dealing with case-attributes, e.g. -* group:resource:add-case-attributes - Allows user to add a completely new case attribute -* group:resource:edit-case-attributes - Allows user to edit an existing case attribute -* group:resource:delete-case-attributes - Allows user to delete an existing case attribute -* group:resource:view-case-attributes - Allows user to view case attributes and their value - -Considering, however, that groups (InbredSets) are not directly linked to any auth resource, this might mean some indirection of sorts, or maybe add a new resource type that handles groups. - -## Features - -* Editing existing case-attributes: YES -* Adding new case attributes: ??? -* Deleting existing case attributes: ??? - -Strains/samples are shared across traits. The values for the case attributes are the same for a particular strain/sample for all traits within a particular InbredSet (group). 
- -## Related and Unsynthesised Chats - -=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$myIoafLp_dIONnyNvEI0k2xf3Y8-LyiI_mkP2vBN08o?via=matrix.org -``` -Zachary SloanZ -I'm pretty sure multiple phenotypes and mRNA datasets can belong to the same experiment (and definitely for the purposes of case attributes -since the mRNA datasets are split by tissue -genotype traits should all be considered part of the same "experiment" (at least as long as we're still only databasing a single genotype file for each group) - -pjotrp -: Case attribute editing will still need to be group level, at least until the whole feature is completely changed. Since they're basically just phenotypes we choose to show in the trait page table, and phenotypes are at the group level -``` - -=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$P6SNnpY-nAZsDr3VZlRi05m6MT32lXBsCl-BYLh-YLM?via=matrix.org -``` -Zachary SloanZ -21:14 -Groups are defined by their list of samples/strains, and the "case attributes" are just "the characteristics of those samples/strains we choose to show on the trait page" (if we move away from the "group" concept entirely that could change, but if we did that we probably would also replace "case attributes" with something else because the way that's implemented is kind of weird to begin with) -ZB -``` - -## Related issues - -=> /issues/case-attr-edit-error -=> /issues/fix-case-attribute-work -=> /issues/fix-case-attribute-editing -=> /issues/consecutive-crud-applications-when-uploading-data -=> /issues/edit-metadata-bugs - -## References - -=> /topics/data-uploads/datasets diff --git a/topics/engineering/improving-wiki-rif-search-in-genenetwork.gmi b/topics/engineering/improving-wiki-rif-search-in-genenetwork.gmi new file mode 100644 index 0000000..74e7178 --- /dev/null +++ b/topics/engineering/improving-wiki-rif-search-in-genenetwork.gmi @@ -0,0 +1,119 @@ +# Improving RIF+WIKI Search + +* author: bonfacem +* reviewed-by: jnduli + +At the time of this writing, 
WIKI and/or RIF search is extremely slow in MySQL, e.g. searching "WIKI=nicotine MEAN=(12.103 12.105)" causes an Nginx time-out in Genenetwork2. This blog discusses how we improved the WIKI+RIF search using XAPIAN and some of our key learnings. + +### TLDR; Key Learnings from Adding RIF+WIKI to the Index + +* xapian-compacting is IO bound. +* Instrument your indexing script and choose a parallel process_count that fits your needs. +* Do NOT store positional data unless you need it. +* Consider stemming your data and removing stop-words from your data ahead of indexing. + +### Slow MySQL Performance + +When indexing genes, we have a complex query [0] which returns 48,308,714 rows. + +Running an "EXPLAIN" on [0] yields: + +``` +1 +------+-------------+----------------+--------+-----------------------------+------------------+---------+------------------------------------------------------------+-------+-------------+ +2 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +3 +------+-------------+----------------+--------+-----------------------------+------------------+---------+------------------------------------------------------------+-------+-------------+ +4 | 1 | SIMPLE | ProbeSetFreeze | ALL | PRIMARY | NULL | NULL | NULL | 931 | | +5 | 1 | SIMPLE | ProbeFreeze | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeSetFreeze.ProbeFreezeId | 1 | Using where | +6 | 1 | SIMPLE | Tissue | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.TissueId | 1 | | +7 | 1 | SIMPLE | InbredSet | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.InbredSetId | 1 | Using where | +8 | 1 | SIMPLE | Species | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.InbredSet.SpeciesId | 1 | | +9 | 1 | SIMPLE | ProbeSetXRef | ref | ProbeSetFreezeId,ProbeSetId | ProbeSetFreezeId | 2 | db_webqtl.ProbeSetFreeze.Id | 27287 | | +10 | 1 | SIMPLE | ProbeSet | eq_ref | PRIMARY | PRIMARY | 4 | db_webqtl.ProbeSetXRef.ProbeSetId | 1 | | 
+11 | 1 | SIMPLE | Geno | eq_ref | species_name | species_name | 164 | db_webqtl.InbredSet.SpeciesId,db_webqtl.ProbeSetXRef.Locus | 1 | Using where | ++------+-------------+----------------+--------+-----------------------------+------------------+---------+------------------------------------------------------------+-------+-------------+ +``` + +From the above table, we note that we have "ref" under the "type" column in line 9. The "type" column describes how the rows are found from the table (I.e the join type) [2]. In this case, "ref" means a non-unique index or prefix is used to find all the rows which we can see by running "SHOW INDEXES FROM ProbeSetXRef" (note the Non-unique value of 1 for ProbeSetFreezeId): + +``` ++--------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | ++--------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| ProbeSetXRef | 0 | PRIMARY | 1 | DataId | A | 46061750 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | ProbeSetFreezeId | 1 | ProbeSetFreezeId | A | 1688 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | ProbeSetId | 1 | ProbeSetId | A | 11515437 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | Locus_2 | 1 | Locus | A | 1806 | 5 | NULL | YES | BTREE | | | ++--------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +``` + +We get a performance hit on the join: "INNER JOIN ProbeSetXRef ON ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id" since ProbeSetXRef.ProbeSetFreezeId is a non-unique index. 
What this means for our query is that for each row scanned in the ProbeSetFreeze table, there are several rows in the ProbeSetXRef table that will satisfy the JOIN condition. This is analogous to nested loops in programming. + +In the RIF Search, we append "INNER JOIN GeneRIF_BASIC ON GeneRIF_BASIC.symbol = ProbeSet.Symbol" to [0]. Running an EXPLAIN on this new query yields: + +``` +1 +------+-------------+----------------+--------+---------------------------------------+--------------+---------+------------------------------------------------------------+---------+-----------------------+ +2 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +3 +------+-------------+----------------+--------+---------------------------------------+--------------+---------+------------------------------------------------------------+---------+-----------------------+ +4 | 1 | SIMPLE | GeneRIF_BASIC | index | NULL | symbol | 777 | NULL | 1366287 | Using index | +5 | 1 | SIMPLE | ProbeSet | ref | PRIMARY,symbol_IDX,ft_ProbeSet_Symbol | symbol_IDX | 403 | func | 1 | Using index condition | +6 | 1 | SIMPLE | ProbeSetXRef | ref | ProbeSetFreezeId,ProbeSetId | ProbeSetId | 4 | db_webqtl.ProbeSet.Id | 4 | | +7 | 1 | SIMPLE | ProbeSetFreeze | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeSetXRef.ProbeSetFreezeId | 1 | | +8 | 1 | SIMPLE | ProbeFreeze | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeSetFreeze.ProbeFreezeId | 1 | Using where | +9 | 1 | SIMPLE | InbredSet | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.InbredSetId | 1 | Using where | +10 | 1 | SIMPLE | Tissue | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.TissueId | 1 | | +11 | 1 | SIMPLE | Species | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.InbredSet.SpeciesId | 1 | | +12 | 1 | SIMPLE | Geno | eq_ref | species_name | species_name | 164 | db_webqtl.InbredSet.SpeciesId,db_webqtl.ProbeSetXRef.Locus | 1 | Using where | +13 
+------+-------------+----------------+--------+---------------------------------------+--------------+---------+------------------------------------------------------------+---------+-----------------------+ +``` + +From the above we see that we have an extra "ref" on line 5 which adds extra overhead. Additionally, now under the "ref" column we see "func" with a "Using index condition" under the "Extra" column. This means that we are using some function during this join [3]. Specifically, this is because the "symbol" column in the GeneRIF_BASIC table is indexed, but the "Symbol" column in the ProbeSet table is not indexed. Regardless, this degrades the performance of the query by some orders of magnitude. + +### Adding RIF+WIKI Search to the Existing Gene Index + +Our current indexer[4] works by indexing the results from [0] in chunks of 100,000 into separate xapian databases stored in different directories. This happens by spawning different child processes from the main indexer script. The final step in this process is to compact all the different databases into one database. + +To add RIF+WIKI indices to the existing gene index, we built a global cache. In each child process, we fetch the relevant RIF+WIKI entry from this cache and index. This increased our indexing time and space consumption. At one point we ran out of RAM, causing an intermittent outage on 2024-06-21 (search for "Outage for 2024-06-20" in the following link): + +=> https://issues.genenetwork.org/topics/meetings/jnduli_bmunyoki Meeting notes + +When troubleshooting our outage, we realized the indexing script consumed all the RAM. This was because the child processes spawned by the index script each consumed around 3GB of RAM; with the total number of child processes and their RAM usage exceeding the system RAM. 
To remedy this, we settled on a total_child_process count of 67, limiting the number of spawned children and putting a cap on the total amount of RAM the indexing script could consume. You can see the fix in this commit: + +=> https://github.com/genenetwork/genenetwork3/commit/99d0d1200d7dcd81e27ce65ab84bab145d9ae543 feat: set 67 parallel processes to run in prod + +To try to speed up our indexing, we attempted to parallelize our compacting. Parallelising reduced our compacting time somewhat, but not significantly. On a SATA drive, compacting 3 different databases which had been compacted from 50 different databases was significantly faster than compacting one database at once from 150 different databases. The conclusion we could draw from this was that the compacting process is IO bound. This is useful data because it informs the type of drive you would want to run our indexing script on, and in our case, an NVMe drive is an ideal candidate because of the fast IO speeds it has. + +To attempt to reduce the index script's space consumption and improve the script's performance, we first removed stop-words and most common words from the global cache, and stemmed words from other documents. This reduced the space footprint to 152 Gb. This was still unacceptable per our needs. Further research into how xapian indexing works pointed us to positional data in the XAPIAN index. In XAPIAN, positional data allows someone to perform phrase searches such as: "nicotine NEAR mouse" which loosely translates to "search for the term nicotine which occurs near the term mouse." One thing we noticed in the RIF+WIKI search is that we don't need this type of search, a trade-off we were willing to make to make search faster and our XAPIAN database smaller. The impact of dropping positional data from RIF+WIKI data was immediate. Our indexing times, on the NVMe drive, dropped to a record low of 1 hour 9 minutes with a size of 73 Gb! 
The table below summarizes our findings: + + +``` +| | Indexing Time (min) | Space (Gb) | % Inc Size (from G+P) | % Inc Time | +|-----------------------------------------------------------------------------------------------------------------------| +|G+P (no stop-words, no-stemming, pos. data) | 75 | 60 | 0 | 0 | +|G+P+W+R (no stop-words, no stemming, pos. data)| 429 | 152 | 153.3 | 472 | +|G+P+W+R (stop-words, stemming, no pos. data) | 69 | 73 | 21.6 | -8 | + +Key: +---- +G: Genes +P: Phenotypes +W: Wiki +R: RIF +``` + +### Some Example Searches + +With RIF+WIKI search added, here are some searches you can try out in the CD genenetwork instance: + +* wiki:nicotine AND mean:12.103..12.105 +* rif:isoneuronal AND mean:12.103..12.105 +* species:mouse wiki:addiction rif:heteroneuronal mean:12.103..12.105 +* symbol:shh rif:glioma wiki:nicotine + +### References + +=> https://github.com/genenetwork/genenetwork3/blob/52cd294c2f1d06dddbd6ff613b11f8bc43066038/scripts/index-genenetwork#L54-L89 [0] Gene Indexing SQL Query +=> https://mariadb.com/kb/en/explain/ [1] MariaDB EXPLAIN +=> https://stackoverflow.com/a/4528433 [2] What does eq_ref and ref types mean in MySQL explain? +=> https://planet.mysql.com/entry/?id=29724 [3] The meaning of ref=func in MySQL EXPLAIN +=> https://github.com/genenetwork/genenetwork3/blob/main/scripts/index-genenetwork#L54 [4] index-genenetwork +=> https://issues.genenetwork.org/topics/engineering/instrumenting-ram-usage [5] Instrument RAM Usage diff --git a/topics/engineering/instrumenting-ram-usage.gmi b/topics/engineering/instrumenting-ram-usage.gmi new file mode 100644 index 0000000..4f7ab96 --- /dev/null +++ b/topics/engineering/instrumenting-ram-usage.gmi @@ -0,0 +1,32 @@ +# Instrumenting RAM usage + +* author: bonfacem +* reviewed-by: jnduli + +On 2024-06-21, TUX02 experienced an outage because we ran out of RAM on the server. Here we outline how to instrument processes that consume RAM, in particular, what to watch out for. 
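For scripted monitoring, the same figures can be read from /proc/meminfo, which is where `free` takes its numbers from on Linux. The following is a minimal sketch assuming a Linux host with the `MemAvailable`, `SwapTotal` and `SwapFree` fields present; the function names `parse_meminfo` and `memory_summary` are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name
    to its integer value (kibibytes for most fields)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Return available memory and swap in use, both in kibibytes."""
    return {
        "available_kib": info["MemAvailable"],
        "swap_used_kib": info["SwapTotal"] - info["SwapFree"],
    }
```

On a live system one would pass in the contents of /proc/meminfo, for example `memory_summary(parse_meminfo(open("/proc/meminfo").read()))`, and alert when the available figure falls below a threshold of one's choosing.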
+ +=> https://issues.genenetwork.org/topics/meetings/jnduli_bmunyoki Meeting Notes + +The output of "free -m -h" looks like: + +``` + total used free shared buff/cache available +Mem: 251G 88G 57G 6.2G 105G 155G +Swap: 29G 20G 9.8G +``` + +When running "free", you can refresh the output regularly. As an example, to get human readable output every 2 seconds: + +> free -m -h -s 2 + +It's tempting to check the "free" column to see how much RAM is left. However, this column does not count memory held by the disk cache ("buff/cache"), which the kernel releases when applications need it. Disk caching doesn't prevent applications from getting the memory they want[1]. What we need to be aware of instead are: + +* available: Make sure this is within acceptable thresholds. +* swap used: Make sure this does not change significantly. + +Also, use htop/top, filter for the process you are monitoring (preferably ordering by RAM usage), and see how much RAM the process and its children (if any) consume. + +## References + +=> https://www.linuxatemyram.com/index.html [0] Linux ate my ram! +=> https://www.linuxatemyram.com/play.html [1] Experiments and fun with Linux disk cache diff --git a/topics/engineering/setting-up-a-basic-pre-commit-hook-for-linting-scheme-files.gmi b/topics/engineering/setting-up-a-basic-pre-commit-hook-for-linting-scheme-files.gmi new file mode 100644 index 0000000..5324de8 --- /dev/null +++ b/topics/engineering/setting-up-a-basic-pre-commit-hook-for-linting-scheme-files.gmi @@ -0,0 +1,31 @@ +# Setting Up a Basic Pre-Commit Hook for Linting Scheme Files + +* author: bonfacem +* reviewed-by: jnduli + +Git executes hooks before/after events such as: commit, push and receive. A pre-commit hook runs before a commit is finalized [0]. This post shows how to create a pre-commit hook for linting scheme files using `guix style`. 
+ +``` +# Step 1: Create the hook +touch .git/hooks/pre-commit + +# Step 2: Make the hook executable +chmod +x .git/hooks/pre-commit + +# Step 3: Copy the following to .git/hooks/pre-commit + +#!/bin/sh + +# Run guix style on staged .scm files +for file in $(git diff --cached --name-only --diff-filter=ACM | grep '\.scm$'); do + if ! guix style --whole-file "$file"; then + echo "Linting failed for $file. Please fix the errors and try again." + exit 1 + fi + git add "$file" +done +``` + +## References: + +=> https://www.slingacademy.com/article/git-pre-commit-hook-a-practical-guide-with-examples/ [0] Git Pre-Commit Hook: A Practical Guide (with Examples) diff --git a/topics/engineering/using-architecture-decision-records-in-genenetwork.gmi b/topics/engineering/using-architecture-decision-records-in-genenetwork.gmi new file mode 100644 index 0000000..43d344c --- /dev/null +++ b/topics/engineering/using-architecture-decision-records-in-genenetwork.gmi @@ -0,0 +1,56 @@ +# Using Architecture Decision Records at GeneNetwork + +* author: bonfacem +* reviewed-by: fredm, jnduli + +> One of the hardest things to track during the life of a project is the motivation behind certain decisions. A new person coming on to a project may be perplexed, baffled, delighted, or infuriated by some past decision. +> -- Michael Nygard + +When building or maintaining software, there are often moments when we ask, "What were they thinking?" This happens when we are trying to figure out why something was done a certain way, leading to speculation, humor, or criticism[0]. Given the constraints we face when writing code, it's important to make sure that important decisions are well-documented and transparent. Architecture Decision Records (ADRs) are one such tool. They provide a structured way to capture the reasoning behind key decisions. + +ADRs consist of four key sections [0]: + +* Status: An ADR begins with a proposed status. After discussions, it will be accepted or rejected. 
It is also possible for a decision to be superseded by a newer ADR later on. +* Context: The context section outlines the situation or problem, providing the background and constraints relevant to the decision. This section is meant to frame the issue concisely, not as a lengthy blog post or detailed explanation. +* Decision: This section clearly defines the chosen approach and the specific actions that will be taken to address the issue. +* Consequences: This part lays out the impact or outcomes of the decision, detailing the expected results and potential trade-offs. + +Optionally, when an ADR is rejected, you can add a section: + +* Rejection Rationale: Briefly provides some context for why the ADR was rejected. + +At GeneNetwork, we manage ADRs within our issue tracker, organizing them under the path "/topics/ADR/<project-name>/XXX-name.gmi". The "XXX" represents a three-digit number, allowing for an easy, chronological order of the proposals as they are created. + +Here is a template for a typical ADR in Genenetwork: + +``` +# [<project>/ADR-<XXX>] Title + +* author: author-name +* status: proposed +* reviewed-by: A, B, C + +## Context + +Some context. + +## Decision + +Decisions. + +## Consequences + +Consequences. +``` + +Here are some examples of Genenetwork specific ADRs: + +=> https://issues.genenetwork.org/topics/ADR/gn3/000-add-test-cases-for-rdf [gn3/ADR-000] Add RDF Test Case +=> https://issues.genenetwork.org/topics/ADR/gn3/000-remove-stace-traces-in-gn3-error-response [gn3/ADR-001] Remove Stack Traces in GN3 + +### References + +=> https://www.oreilly.com/library/view/mastering-api-architecture/9781492090625/ [0] Gough, J., Bryant, D., & Auburn, M. (2022). Mastering API Architecture: Design, Operate, and Evolve API-based Systems. O'Reilly Media, Incorporated. +=> https://adr.github.io/ [1] Architectural Decision Records. 
Homepage of the ADR GitHub organization +=> https://docs.aws.amazon.com/prescriptive-guidance/latest/architectural-decision-records/adr-process.html [2] Amazon's ADR process +=> https://cloud.google.com/architecture/architecture-decision-records [3] Google Cloud Center Architecture Decision Records Overview diff --git a/topics/engineering/working-with-virtuoso-locally.gmi b/topics/engineering/working-with-virtuoso-locally.gmi new file mode 100644 index 0000000..af249a5 --- /dev/null +++ b/topics/engineering/working-with-virtuoso-locally.gmi @@ -0,0 +1,70 @@ +# Working with Virtuoso for Local Development + +* author: bonfacem +* reviewed-by: jnduli + +Using guix, install the Virtuoso server: + +``` +guix install virtuoso-ose # or any other means to install virtuoso +cd /path/to/virtuoso/database/folder +cp $HOME/.guix-profile/var/lib/virtuoso/db/virtuoso.ini ./virtuoso.ini +# modify the virtuoso.ini file to save files to the folder you'd prefer +virtuoso-t +foreground +wait +debug +``` + +## Common Virtuoso Operations + +Use isql to load up data: + +``` +isql +# subsequent commands run in the isql prompt +# this folder is relative to the folder virtuoso was started from +ld_dir ('path/to/folder/with/ttls', '*.ttl', 'http://genenetwork.org'); +rdf_loader_run(); +checkpoint; +``` + +Add data using HTTP: + +``` +# Replace dba:dba with <user>:<password> +curl --digest --user 'dba:dba' --verbose --url \ +"http://localhost:8890/sparql-graph-crud-auth?graph=http://genenetwork.org" \ +-T test-data.ttl +``` + +Delete data using HTTP: + +``` +# Replace dba:dba with <user>:<password> +curl --digest --user 'dba:dba' --verbose --url \ +"http://localhost:8890/sparql-graph-crud-auth?graph=http://genenetwork.org" -X DELETE +``` + +Query the graph data: + +``` +curl --verbose --url \ +"http://localhost:8890/sparql-graph-crud?graph=http://genenetwork.org" +``` + +Check out more cURL examples here: + +=> https://vos.openlinksw.com/owiki/wiki/VOS/VirtGraphProtocolCURLExamples SPARQL 1.1 Graph
Store HTTP Protocol cURL Example Collection + +## Setting Passwords + +Virtuoso's default user is "dba" and its default password is "dba". To change a password, use isql to run the following, giving the old password first and the new password second: + +``` +set password "dba" "<new-password>"; +CHECKPOINT; +``` + +## More + +Read a more complete tutorial on Virtuoso here: + +=> https://issues.genenetwork.org/topics/systems/virtuoso Virtuoso diff --git a/topics/genenetwork-releases.gmi b/topics/genenetwork-releases.gmi new file mode 100644 index 0000000..e179629 --- /dev/null +++ b/topics/genenetwork-releases.gmi @@ -0,0 +1,77 @@ +# GeneNetwork Releases + +## Tags + +* status: open +* priority: +* assigned: +* type: documentation +* keywords: documentation, docs, release, releases, genenetwork + +## Introduction + +The sections that follow note down the commits used for various stable (and stable-ish) releases of genenetwork. + +The tagging of the commits will need to distinguish repository-specific tags from overall system tags. + +In this document, we only concern ourselves with the overall system tags, which shall have the template: + +``` +genenetwork-system-v<major>.<minor>.<patch>[-<commit>] +``` + +The portions in angle brackets will be replaced with the actual version numbers. + +## genenetwork-system-v1.0.0 + +This is the first, guix-system-container-based, stable release of the entire genenetwork system.
+The commits involved are: + +=> https://github.com/genenetwork/genenetwork2/commit/314c6d597a96ac903071fcb6e50df3d9e88935e9 GN2: 314c6d5 +=> https://github.com/genenetwork/genenetwork3/commit/0d902ec267d96b87648669a7a43b699c8a22a3de GN3: 0d902ec +=> https://git.genenetwork.org/gn-auth/commit/?id=8e64f7f8a392b8743a4f36c497cd2ec339fcfebc: gn-auth: 8e64f7f +=> https://git.genenetwork.org/gn-libs/commit/?id=72a95f8ffa5401649f70978e863dd3f21900a611: gn-libs: 72a95f8 + +The guix channels used for deployment of the system above are as follows: + +``` +(list (channel + (name 'guix-bioinformatics) + (url "https://git.genenetwork.org/guix-bioinformatics/") + (branch "master") + (commit + "039a3dd72c32d26b9c5d2cc99986fd7c968a90a5")) + (channel + (name 'guix-forge) + (url "https://git.systemreboot.net/guix-forge/") + (branch "main") + (commit + "bcb3e2353b9f6b5ac7bc89d639e630c12049fc42") + (introduction + (make-channel-introduction + "0432e37b20dd678a02efee21adf0b9525a670310" + (openpgp-fingerprint + "7F73 0343 F2F0 9F3C 77BF 79D3 2E25 EE8B 6180 2BB3")))) + (channel + (name 'guix-past) + (url "https://gitlab.inria.fr/guix-hpc/guix-past") + (branch "master") + (commit + "5fb77cce01f21a03b8f5a9c873067691cf09d057") + (introduction + (make-channel-introduction + "0c119db2ea86a389769f4d2b9c6f5c41c027e336" + (openpgp-fingerprint + "3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5")))) + (channel + (name 'guix) + (url "https://git.savannah.gnu.org/git/guix.git") + (branch "master") + (commit + "2394a7f5fbf60dd6adc0a870366adb57166b6d8b") + (introduction + (make-channel-introduction + "9edb3f66fd807b096b48283debdcddccfea34bad" + (openpgp-fingerprint + "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))) +``` diff --git a/topics/genenetwork/Case_Attributes_GN2 b/topics/genenetwork/Case_Attributes_GN2 new file mode 100644 index 0000000..52a956f --- /dev/null +++ b/topics/genenetwork/Case_Attributes_GN2 @@ -0,0 +1,2 @@ +# Update Case Attributes to capture hierarchy info +## The 
following provides guidelines and insight regarding case attributes as used in GeneNetwork Webservice searches diff --git a/topics/genenetwork/genenetwork-services.gmi b/topics/genenetwork/genenetwork-services.gmi new file mode 100644 index 0000000..717fdd8 --- /dev/null +++ b/topics/genenetwork/genenetwork-services.gmi @@ -0,0 +1,122 @@ +# GeneNetwork Services + +## Tags + +* type: documentation +* keywords: documentation, docs, doc, services, genenetwork services + +## GeneNetwork Core Services + +GeneNetwork is composed of a number of different services. This document attempts to document all the services that make up GeneNetwork and document what links give access to the services. + +### GeneNetwork2 + +This is the main user-interface to the entire GeneNetwork system. + +#### Links + +=> https://github.com/genenetwork/genenetwork2 Repository +=> https://genenetwork.org/ GN2 on production +=> https://fallback.genenetwork.org/ GN2 on old production +=> https://cd.genenetwork.org/ GN2 on CI/CD +=> https://staging.genenetwork.org/ GN2 on staging + +### GeneNetwork3 + +This is the main API server for GeneNetwork. + +#### Links + +=> https://github.com/genenetwork/genenetwork3 Repository +=> https://genenetwork.org/api3/ GN3 on production +=> https://fallback.genenetwork.org/api3/ GN3 on old production +=> https://cd.genenetwork.org/api3/ GN3 on CI/CD +=> https://staging.genenetwork.org/api3/ GN3 on staging + +### Sparql Service + +The SparQL service is served from a Virtuoso-OSE service. + +=> https://issues.genenetwork.org/topics/deploy/our-virtuoso-instances We have notes on our virtuoso instances here. + + +#### Links + +=> https://github.com/genenetwork/genenetwork3 Repository +=> https://sparql.genenetwork.org/sparql/ sparql-service on production +* ??? sparql-service on old production +* ??? sparql-service on CI/CD +* ??? sparql-service on staging + +### GN-Auth + +This is the authorisation server for the GeneNetwork system. 
+ +#### Links + +=> https://git.genenetwork.org/gn-auth/ Repository +=> https://auth.genenetwork.org/ gn-auth on production +=> https://fallback.genenetwork.org/gn-auth/ gn-auth on old production +* ??? gn-auth on CI/CD +=> https://staging-auth.genenetwork.org/ gn-auth on staging + +### GN-Uploader + +This service is to be used for uploading data to GeneNetwork. It is currently in development (best case, alpha). + +#### Links + +=> https://git.genenetwork.org/gn-uploader/ Repository +* ??? gn-uploader on production +* ??? gn-uploader on old production +* ??? gn-uploader on CI/CD +=> https://staging-uploader.genenetwork.org/ gn-uploader on staging + +### Aliases Server + +An extra server to respond with aliases for genetic (etc.) symbols. + +This is currently a project in racket, but we should probably pull in the features in this repository into one of the others (probably GeneNetwork3) and trash this repository. + +#### Links + +=> https://github.com/genenetwork/gn3 Repository +=> https://genenetwork.org/gn3/ aliases-server on production +=> https://fallback.genenetwork.org/gn3/ aliases-server on old production +=> https://cd.genenetwork.org/gn3/ aliases-server on CI/CD +=> https://staging.genenetwork.org/gn3/ aliases-server on staging + +### Markdown Editing Server + +#### Links + +=> https://git.genenetwork.org/gn-guile/ Repository +=> https://genenetwork.org/facilities/ markdown-editing-server on production +=> https://fallback.genenetwork.org/facilities/ markdown-editing-server on old production +=> https://cd.genenetwork.org/facilities/ markdown-editing-server on CI/CD +=> https://staging.genenetwork.org/facilities/ markdown-editing-server on staging + +## Support Services + +These are other services that support the development and maintenance of the core services. 
+ +### Issue Tracker + +We use a text-based issue tracker that is accessible via +=> https://issues.genenetwork.org/ + +The repository for this service is at +=> https://github.com/genenetwork/gn-gemtext-threads/ + +### Repositories Server + +This is where a lot of the genenetwork repositories live. You can access it at +=> https://git.genenetwork.org/ + +### Continuous Integration Service + +… + +=> https://ci.genenetwork.org/ + +### … diff --git a/topics/genenetwork/genenetwork-streaming-functionality.gmi b/topics/genenetwork/genenetwork-streaming-functionality.gmi new file mode 100644 index 0000000..4f81eea --- /dev/null +++ b/topics/genenetwork/genenetwork-streaming-functionality.gmi @@ -0,0 +1,43 @@ +# Genenetwork Streaming Functionality + +## Tags +* type: documentation +* Keywords: documentation, docs, genenetwork, streaming + +### Introduction +Genenetwork implements streaming functionality that logs results from a running external process to a terminal emulator. + +The streaming functionality can be divided into several sections. + +### Streaming UI +The terminal emulator is implemented using the `xterm.js` library and +logs results from the GN3 API. + +See: +=> https://github.com/xtermjs/xterm.js + +### Streaming API +This is the main endpoint for streaming: + +See reference: +=> https://github.com/genenetwork/genenetwork3/gn3/api/streaming.py + +### How to Integrate + +#### Import the `enable_streaming` Decorator + +``` +from gn3.computations.streaming import enable_streaming +``` + +#### Apply the Decorator to Your Endpoint that Runs an External Process + +Note: To run the external process, use the `run_process` function, +which captures the `stdout` in a file identified by the `run_id`. 
+ +``` +@app.route('/your-endpoint') +@enable_streaming +def your_endpoint(streaming_output_file): + run_process(command, streaming_output_file, run_id) +``` diff --git a/topics/genenetwork/starting_gn1.gmi b/topics/genenetwork/starting_gn1.gmi index efbfd0f..e31061f 100644 --- a/topics/genenetwork/starting_gn1.gmi +++ b/topics/genenetwork/starting_gn1.gmi @@ -51,9 +51,7 @@ On an update of guix the build may fail. Try #######################################' # Environment Variables - private ######################################### - # sql_host = '[1]tux02.uthsc.edu' - # sql_host = '128.169.4.67' - sql_host = '172.23.18.213' + sql_host = '170.23.18.213' SERVERNAME = sql_host MYSQL_SERVER = sql_host DB_NAME = 'db_webqtl' diff --git a/topics/gn-learning-team/next-steps.gmi b/topics/gn-learning-team/next-steps.gmi new file mode 100644 index 0000000..b427923 --- /dev/null +++ b/topics/gn-learning-team/next-steps.gmi @@ -0,0 +1,48 @@ +# Next steps + +Wednesday we had a wrap-up meeting of the gn-learning efforts. + +## Data uploading + +The goal of these meetings was to learn how to upload data into GN. In the process Felix has become the de facto uploader, next to Arthur. A C. elegans dataset was uploaded and Felix is preparing + +* More C. elegans +* HSRat +* Kilifish +* Medaka + +Updates are here: + +=> https://issues.genenetwork.org/tasks/felixl + +We'll keep focussing on that work and hopefully we'll get more parties interested in doing some actual work down the line. + +## Hosting GN in Wageningen + +Harm commented that he thought these meetings were valuable, particularly we learnt a lot about GN ins and outs. Harm suggests we focus on hosting GN in Wageningen for C. elegans and Arabidopsis. +Pjotr says that is a priority this year, even if we start on a privately hosted machine in NL. Wageningen requires Docker images and Bonface says that is possible - with some work. 
So: + +* Host GN in NL +* Make GN specific for C. elegans and Arabidopsis - both trim and add datasets +* Create Docker container +* Host Docker container in Wageningen +* Present to other parties in Wageningen + +Having the above datasets will help this effort succeed. + +## AI + +Harm is also very interested in the AI efforts and wants to pursue that in the context of the above server - i.e., functionality arrives when it lands in GN. + +## Wormbase + +Jameson suggests we can work with Wormbase and the CaeNDR folks once we have a running system. Interactive data analysis is very powerful and could run in conjunction with those sites. + +=> https://caendr.org/ +=> https://wormbase.org/ + +Other efforts are Flybase and Arabidopsis Magic which we can host, in principle. + +## Mapping methods + +Jameson will continue with his work on residuals. diff --git a/topics/gn-uploader/genome-details.gmi b/topics/gn-uploader/genome-details.gmi new file mode 100644 index 0000000..f8a12f6 --- /dev/null +++ b/topics/gn-uploader/genome-details.gmi @@ -0,0 +1,42 @@ +# Genome Details + +This file is probably misnamed. + +*TODO*: Update name once we know where this fits + +## Tags + +* type: documentation, doc, docs +* assigned: fredm +* priority: docs +* status: open +* keywords: gn-uploader, uploader, genome + +## Location + +### centiMorgan (cM) + +We no longer use centiMorgan in GeneNetwork. + +From the email threads: + +``` +> … +> Sorry, we now generally do not use centimorgans. Chr 19 is 57 cM +> using markers that exclude telomeres in most crosses. +> … +``` + +and + +``` +> … +> I know that cM is a bit more variable because it's not a direct measurement, … +> … +``` + +### Megabasepairs (Mbp) + +The uploader will store any provided physical location values (in megabasepairs) in the +=> https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#Geno Geno table +specifically in the `Mb` field of that table.
diff --git a/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi b/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi new file mode 100644 index 0000000..db0ddf3 --- /dev/null +++ b/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi @@ -0,0 +1,40 @@ +# Genotypes, Assemblies, Markers and GeneNetwork + +## Tags + +* type: documentation, docs, doc +* keywords: genotype, assembly, markers, data, database, genenetwork, uploader + +## Markers + +``` +The marker is the SNP… + +— Rob (Paraphrased) +``` + +SNPs (Single Nucleotide Polymorphisms) are specific locations of interest within the genome, where the pair of nucleotides can take different forms. + +A SNP and its immediate neighbourhood (a number of megabase pairs before and after the SNP) form a sequence that is effectively the marker, e.g. for mouse (Mus musculus) you could have the following sequence from the GRCm38 genome assembly (mm10): + +``` +GAGATAAAGATGGGTCCCTTGGCACAGGACTGGCCCACATTTCCaatataaattacaacaattttttttaaatttttaaaCAAAACAAGCATCTCACACAC/TTGAAAAAGAAGATGCATTCAAAGAAAATAGATGTTTCAATGTATTTAAGATAATCAAGAGATAACCATGACCATATCATGAGGAAACTTAAGAATTGGCA +``` + +where the position with `C/T` represents the SNP of interest and thus the marker. + +You can search this on the UCSC Genome Browser, specifically the +=> https://genome.ucsc.edu/cgi-bin/hgBlat BLAT search +to get the name of the marker, and some extra details regarding it. + +## Genome Assemblies + +The genome assembly used will "determine" the position of the marker on the genome — newer assemblies will (generally) give a better position accounting for more of the issues discovered in older assemblies. + +With most of the newer assemblies, the positions do not shift very drastically. + +## GeneNetwork + +Currently (September 2024), GeneNetwork uses the GRCm38 (mm10) assembly for mice. + +Unfortunately, since the system was built for mice, the tables (e.g. 
Geno table) do not account for the fact that you could have markers (and other data) from species other than Mus musculus. You thus have the Geno table with fields like `Mb_mm8`, `Chr_mm8` which are very mouse-specific. diff --git a/topics/gn-uploader/types-of-data.gmi b/topics/gn-uploader/types-of-data.gmi new file mode 100644 index 0000000..1f53dec --- /dev/null +++ b/topics/gn-uploader/types-of-data.gmi @@ -0,0 +1,63 @@ +# Types of Data in GeneNetwork + +## Tags + +* assigned: +* priority: +* status: open +* type: documentation +* keywords: gn-uploader, uploader, genenetwork, documentation, doc, docs, data, data type, types of data + +## Description + +There are five (5) main types of data in GeneNetwork: + +* Classical Phenotypes (PublishData) +* High Content Data +* Genotype Data +* Cofactors and Attributes +* Metadata + +### Classical Phenotypes + +This is usually low-content data, e.g. body weight, tail length, etc. + +This is currently saved in the `Publish*` tables in the database. + +This data is saved as is, i.e. not log-transformed. + +### High Content Data + +This includes mainly molecular data such as +* mRNA assay data +* genetic expression data +* probes +* tissue type and data + +These data are saved in the `ProbeSet*` database tables (and other closely related tables like the `Tissue*` tables - fred added this: verify). + +These could be saved in the database in a log-transformed form - verify. + +How do you check for log-transformation in the data? + +### Genotype Data + +This is core data, and all other data seem to rely on its existence. + +Useful for: +* correlations, cofactor and PheWAS computations. +* mapping purposes +* search and display +* editing and curation + +### Cofactors and Attributes + +This data can be alphanumeric (mix of numerical and non-numerical) data. + +It is not intended for mapping. + +### Metadata + +This data should (ideally) always accompany any and all of the data types above.
It provides contextual information regarding the data it accompanies, and is useful for search, and other contextualising operations. + +It is alphanumeric data, and mostly cannot be used for numeric computations. diff --git a/topics/guix/guix-profiles.gmi b/topics/guix/guix-profiles.gmi index 578bb82..8cf41d8 100644 --- a/topics/guix/guix-profiles.gmi +++ b/topics/guix/guix-profiles.gmi @@ -16,7 +16,7 @@ Alternatively put the following into a channels.scm file. ``` (list (channel (name 'gn-bioinformatics) - (url "https://gitlab.com/genenetwork/guix-bioinformatics") + (url "https://git.genenetwork.org/guix-bioinformatics") (branch "master"))) ``` Build a profile using diff --git a/topics/gunicorn/deploying-app-under-url-prefix.gmi b/topics/gunicorn/deploying-app-under-url-prefix.gmi new file mode 100644 index 0000000..b2e382f --- /dev/null +++ b/topics/gunicorn/deploying-app-under-url-prefix.gmi @@ -0,0 +1,121 @@ +# Deploying Your Flask Application Under a URL Prefix With GUnicorn + +## TAGS + +* type: doc, documentation, docs +* author: fredm, zachs +* keywords: flask, gunicorn, SCRIPT_NAME, URL prefix + +## Introduction + +You have your application and are ready to deploy it, however, for some reason, you want to deploy it under a URL prefix, rather than at a top-level-domain. + +This short article details the things you need to set up. + +## Set up Your WebServer (Nginx) + +You need to tell your webserver to serve the application under a particular url prefix. You do this using that particular webserver's reverse-proxying configurations: For this article, we will use Nginx as the server. 
+ +Normally, you'd simply do something like: + +``` +server { + server_name your.server.domain; + + ⋮ + + location /the-prefix/ { + proxy_pass http://127.0.0.1:8080/; + proxy_set_header Host $host; + ⋮ + } + + ⋮ +} +``` + +Here, your top-level domain will be https://your.server.domain and you therefore want to access your shiny new application at https://your.server.domain/the-prefix/ + +For a simple application, with no sessions or anything, this should work, somewhat, though you might run into trouble with things like static files (e.g. css, js, etc) if the application does not use the same ones as the one on the TLD. + +If you are using sessions, you might also run into an issue where there is an interaction in the session management of both applications, especially if the application on the TLD makes use of services from the application at the url prefix. This is mostly due to redirects from the url-prefix app getting lost and hitting the TLD app. + +To fix this, we change the configuration above to: + +``` +server { + server_name your.server.domain; + + ⋮ + + location /the-prefix/ { + proxy_pass http://127.0.0.1:8080/the-prefix/; + proxy_set_header Host $host; + ⋮ + } + + ⋮ +} +``` + +But now you get errors, since there is no endpoint in your shiny new app that is at the route /the-prefix/***. + +Enter Gunicorn! + + +## Setting up SCRIPT_NAME for GUnicorn + +### The "Hacky" Way + +At the point of invocation of GUnicorn, we set the SCRIPT_NAME environment variable to the value "/the-prefix" (note that there is no trailing slash; this is very important). You should now have something like: + +``` +$ export SCRIPT_NAME="/the-prefix" +$ gunicorn --bind 0.0.0.0:8080 --workers … +``` + +The first line tells GUnicorn what the URL prefix is. It will use this to compute what URL to pass to the flask application.
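The prefix handling can be pictured with a small Python sketch. It only illustrates the WSGI convention of splitting a request path into SCRIPT_NAME and PATH_INFO; it is not Gunicorn's actual code, and `split_prefix` is a hypothetical helper introduced here for illustration.

```python
def split_prefix(request_path, script_name):
    """Return (SCRIPT_NAME, PATH_INFO) for a request path.

    Hypothetical helper: it mimics the WSGI convention of stripping the
    mount prefix and is not taken from Gunicorn's source.
    """
    if not script_name:
        return "", request_path
    if not request_path.startswith(script_name):
        raise ValueError(f"{request_path!r} is not under {script_name!r}")
    return script_name, request_path[len(script_name):]


# No trailing slash: the application sees a path that starts with "/".
print(split_prefix("/the-prefix/auth/authorise", "/the-prefix"))
# With a trailing slash the leading "/" of the remaining path is swallowed.
print(split_prefix("/the-prefix/auth/authorise", "/the-prefix/"))
```

The second call shows why a trailing slash in the prefix is a problem: the remaining path no longer starts with "/", so it would not match any route in the application.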
+ +For example, say you try accessing the endpoint + +``` +https://your.server.domain/the-prefix/auth/authorise?response_type=code&client_id=some-id&redirect_uri=some-uri +``` + +Gunicorn will split that URL into 2 parts using the value of the SCRIPT_NAME environment variable, giving you: + +* https://your.server.domain +* /auth/authorise?response_type=code&client_id=some-id&redirect_uri=some-uri + +It will then pass on the second part to flask. This is why the value of SCRIPT_NAME should not have a trailing slash. + +Note that using the SCRIPT_NAME environment variable is a convenience feature provided by GUnicorn, not a WSGI feature. If you ever change your WSGI server, there is no guarantee this fix will work. + +### Using WSGI Routing MiddleWare + +A better way is to make use of a WSGI routing middleware. You could do this by defining a separate WSGI entry point in your application's repository. + +``` +# wsgi_url_prefix.py +from werkzeug.wrappers import Response +from werkzeug.middleware.dispatcher import DispatcherMiddleware + +from app import create_app + +def init_prefixed_app(theapp): + # Serve the app under /the-prefix; any other path gets a plain 404. + theapp.wsgi_app = DispatcherMiddleware( + Response("Not Found", 404), + { + "/the-prefix": theapp.wsgi_app + }) + return theapp + + +app = init_prefixed_app(create_app()) +``` + +## References + +=> https://docs.gunicorn.org/en/latest/faq.html#how-do-i-set-script-name +=> https://dlukes.github.io/flask-wsgi-url-prefix.html +=> https://www.reddit.com/r/Python/comments/juwj3x/comment/gchdsld/ diff --git a/topics/lmms/bulklmm/readme.gmi b/topics/lmms/bulklmm/readme.gmi new file mode 100644 index 0000000..8bd96a8 --- /dev/null +++ b/topics/lmms/bulklmm/readme.gmi @@ -0,0 +1 @@ +This is a stub diff --git a/topics/lmms/gemma/permutations.gmi b/topics/lmms/gemma/permutations.gmi new file mode 100644 index 0000000..4c8932a --- /dev/null +++ b/topics/lmms/gemma/permutations.gmi @@ -0,0 +1,1014 @@ +# Permutations + +Currently we use gemma-wrapper to compute the significance level - by
shuffling the phenotype vector 1000x. +As this is a lengthy procedure we have not incorporated it into the GN web service. The new bulklmm may work +in certain cases (genotypes have to be complete, for one). + +Because of many changes gemma-wrapper is not working for permutations. I have a few steps to take care of: + +* [X] read R/qtl2 format for phenotype + +# R/qtl2 and GEMMA formats + +See + +=> data/R-qtl2-format-notes + +# One-offs + +## Phenotypes + +For a study Dave handed me phenotype and covariate files for the BXD. Phenotypes look like: + +``` + +Record ID,21526,21527,21528,21529,21530,21531,21532,21537,24398,24401,24402,24403,24404,24405,24406,24407,24408,24412,27513,27514,27515,27516,27517 +BXD1,18.5,161.5,6.5,1919.450806,3307.318848,0.8655,1.752,23.07,0.5,161.5,18.5,6.5,1919.450806,3307.318848,0.8655,1.752,0.5,32,1.5,1.75,2.25,1.25,50 +BXD100,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x +BXD101,20.6,176.199997,4.4,2546.293945,4574.802734,1.729,3.245,25.172001,0.6,176.199997,20.6,4.4,2546.294189,4574.802734,1.7286,3.2446,0.6,32,1.875,2.375,2.75,1.75,38 +BXD102,18.785,159.582993,6.167,1745.671997,4241.505859,0.771,2.216,22.796667,0.25,159.583328,18.785,6.166667,1745.672485,4241.506348,0.770667,2.216242,0.25,28.08333,1.5,2,2.875,1.5,28.5 +... +``` + +which is close to the R/qtl2 format. GEMMA meanwhile expects a tab delimited file where x=NA. You can pass in the column number with the -n switch. One thing GEMMA lacks is the first ID, which has to align with the genotype file. The BIMBAM geno format, again, does not contain the IDs. See + +=> http://www.xzlab.org/software/GEMMAmanual.pdf + +What we need to do is create and use R/qtl2 format files because they can be error checked on IDs and convert those, again, to BIMBAM for use by GEMMA. In the past I wrote Python converters for gemma2lib: + +=> https://github.com/genetics-statistics/gemma2lib + +I kinda abandoned the project, but you can see a lot of functionality, e.g.
+ +=> https://github.com/genetics-statistics/gemma2lib/blob/master/gemma2/format/bimbam.py + +We also have bioruby-table as a generic command line tool + +=> https://github.com/pjotrp/bioruby-table + +which is an amazingly flexible tool and can probably do the same. I kinda abandoned that project too. You know, bioinformatics is a graveyard of projects :/ + +OK, let's try. The first step is to convert the phenotype file to something GEMMA can use. We have to make sure that the individuals align with the genotype file(!). So, because we work with GN's GEMMA files, the steps are: + +* [X] Read the JSON layout file - 'sample_list' is essentially the header of the BIMBAM geno file +* [X] Use the R/qtl2-style phenotype file to write a correct GEMMA pheno file (multi column) +* [X] Compare results with GN pheno output + +Running GEMMA by hand it complained + +``` +## number of total individuals = 235 +## number of analyzed individuals = 26 +## number of covariates = 1 +## number of phenotypes = 1 +## number of total SNPs/var = 21056 +## number of analyzed SNPs = 21056 +Calculating Relatedness Matrix ... 
+rsm10000000001, X, Y, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0.5, 0, 1, 0, 1, 0.5, 0, 1, 0, 0, 0, 1, 1, 0, 0.5, 1, 1, 0.5, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0.5, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0.5, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0.5, 0, 0, 0.5, 0, 1, 0, 1, 0, 0, 1, 0.5, 0, 1, 0, 0.5, 1, 1, 1, 1, 0.5, 0, 0, 0.5, 1, 0.5, 0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0.5, 0, 0, 1, 0, 0.5, 1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5 +237 != 235 +WARNING: Columns in geno file do not match # individuals in phenotypes +ERROR: Enforce failed for not enough genotype fields for marker in src/gemma_io.cpp at line 1470 in BimbamKin +``` + +GEMMA on production is fine. So, I counted BXDs. For comparison, GN's pheno outputs 241 BXDs. Daves pheno file has 241 BXDs (good). But when using my script we get 235 BXDs. Ah, apparently they are different from what we use on GN because GN does not use the parents and the F1s for GEMMA. So, my script should complain when a match is not made. Turns out the JSON file only contains 235 'mappable' BXDs and refers to BXD.8 which is from Apr 26, 2023. The header says `BXD_experimental_DGA_7_Dec_2021` and GN says WGS March 2022. So which one is it? I'll just go with latest, but genotype naming is problematic and the headers are not updated. + +> MOTTO: Always complain when there are problems! + +Luckily GEMMA complained, but the script should have also complained. The JSON file with 235 genometypes is not representing the actual 237 genometypes. We'll work on that in the next section. + +Meanwhile let's add this code to gemma-wrapper. 
The code can be found here: + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/rqtl2-pheno-to-gemma.py + +## Genotypes + +The pheno script now errors with + +``` +ERROR: sets differ {'BXD065xBXD102F1', 'C57BL/6J', 'DBA/2J', 'BXD077xBXD065F1', 'D2B6F1', 'B6D2F1'} +``` + +Since these are parents and F1s, and are all NAs in Dave's phenotypes, they are easy to remove. So, now we have 235 samples in the phenotype file and 237 genometypes in the genotype file (according to GEMMA). A quick check shows that BXD.geno has 236 genometypes. Same for the bimbam on production. We now have 3 values: 235, 236 and 237. The question is why these do not match. + +### Genotype probabilities for GEMMA + +Another problem on production is that we are not using the standard GEMMA values. So GEMMA complains with + +``` +WARNING: The maximum genotype value is not 2.0 - this is not the BIMBAM standard and will skew l_lme and effect sizes +``` + +This explains why we divide the effect size by 2 in the GN production code. Maybe it is a better idea to fix the geno files! + +* [X] Generate BIMBAM file from GENO .geno files (via R/qtl2) +* [X] Check bimbam files on production + +So we need to convert .geno files as they are the current source of genotypes in GN and contain the sample names that we need to align with pheno files. For this we'll output two files - one JSON file with metadata and sample names and the actual BIMBAM file GEMMA requires. I notice that I actually never had the need to parse a geno file! Zach wrote a tool `gn2/maintenance/convert_geno_to_bimbam.py` that also writes the GN JSON file and I'll take some ideas from that. We'll also need to convert to R/qtl2 as that is what Dave can use and then on to BIMBAM. So, let's add that code to gemma-wrapper again. + +This is another tool at + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/gn-geno-to-gemma.py + +where the generated JSON file helps create the pheno file.
We ended up with 237 genometypes/samples to match the genotype file and all of Dave's samples matched. Also, now I was able to run GEMMA successfully and passed in the pheno column number with + +``` +gemma -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 +gemma -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -k output/result.cXX.txt -n 5 +``` + +the pheno file can include the sample names as long as there are no spaces in them. For marker rs3718618 we get values -9 0 X Y 0.317 7.930689e+02 1.779940e+02 1.000000e+05 7.532662e-05. The last value translates to + +``` +-Math.log10(7.532662e-05) => 4.123051519468808 +``` + +and that matches GN's run of GEMMA w.o. LOCO. + +The next step is to make the -n switch run with LOCO on gemma-wrapper. + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --keep --force --json --loco --input K.json -- -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > GWA.json +``` + +Checking the output we get + +``` +-Math.log10(3.191755e-05) => 4.495970452606926 +``` + +and that matches Dave's output for LOCO and marker rs3718618. All good, so far. Next step permute. + +## Permute + +Now we have gemma-wrapper working we need to fix it to work with the latest type of files. + +* [X] randomize phenotypes using -n switch +* [X] Permute gemma and collect results +* [X] Unseed randomizer or make it an option +* [X] Fix tmpdir +* [X] Show final score +* [X] Compare small and large BXD set + +For the first one, the --permutate-phenotype switch takes the input pheno file. Because we pick a column with gemma we can randomize all input lines together. So, in the above example, we shuffle BXD_pheno_Dave-GEMMA.txt. Interestingly it looks like we are already shuffling by line in gemma-wrapper. 
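Shuffling by line can be sketched in Python as below. This is only an illustration of the idea, not the gemma-wrapper implementation; `shuffle_pheno_lines` and the sample rows are made up. The sketch assumes one row per individual with rows matched to genotypes by position, so permuting whole rows breaks the genotype-phenotype link for every trait column at once while the -n switch still selects the same trait.

```python
import random


def shuffle_pheno_lines(lines, seed=None):
    """Return the phenotype rows in a random order.

    Passing a seed makes the permutation repeatable, which helps when
    troubleshooting; leave it as None for real permutation runs.
    """
    rng = random.Random(seed)
    shuffled = list(lines)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled


# Made-up rows: one line per individual, one tab-separated column per trait.
rows = ["18.5\t161.5", "NA\tNA", "20.6\t176.2", "18.785\t159.583"]
permuted = shuffle_pheno_lines(rows, seed=42)
print("\n".join(permuted))
```

Each permutation run would write such a shuffled file and pass it to GEMMA in place of the original phenotype file.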
+ +The good news is that it runs, but the outcome is wrong: + +``` +["95 percentile (significant) ", 1000.0, -3.0] +["67 percentile (suggestive) ", 1000.0, -3.0] +``` + +Inspecting the phenotype files they are shuffled, e.g. + +``` +BXD073xBXD065F1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA +BXD49 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA +BXD86 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA +BXD161 15.623 142.908997 4.0 2350.637939 3294.824951 1.452 2.08 20.416365 0.363636 142.909088 15.622727 4.0 2350.638672 3294.825928 1.451636 2.079909 0.363636 33.545448 2.125 2.0 2.375 1.25 44.5 +BXD154 20.143 195.5 4.75 1533.689941 4568.76416 0.727 2.213748 27.9275 0.75 195.5 20.142857 4.75 1533.690796 4568.76416 0.72675 2.213748 0.75 54.5 0.75 1.75 3.0 1.5 33.0 +``` + +which brings out an interesting point. Most BXDs in the genotype file are missing from this experiment. We are computing LOD scores as if we have a full BXD population. So, what we are saying here is that if we have all BXD genotypes and we randomly assign phenotypes against a subset, what is the chance we get a hit at random. I don't think this is a bad assumption, but it is not exactly what Gary Churchill had in mind in his 1994 paper: + +=> https://pubmed.ncbi.nlm.nih.gov/7851788/ Empirical threshold values for quantitative trait mapping + +The idea is to shuffle genotypes against phenotypes. If there is a high correlation we get a result. The idea is to break the correlation and that should work for both the large and the small BXD set. Scoring the best 'random' result out of 1000 permutations at, say 95% highest, sets the significance level. +With our new precompute we should be able to show the difference. Anyway, that is one problem, the other is that the stats somehow do not add up to the final result. 
Score min is set at + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/7769f209bcaff2472ba185234fad47985e59e7a3/bin/gemma-wrapper#L667 + +The next line says 'if false'. Alright, that explains part of it at least as the next block was disabled for slurm and is never run. I should rip the slurm stuff out, actually, as Arun has come up with a much better solution. But that is for later. + +Disabling that, the permutation stopped with + +``` +Add parallel job: time -v /bin/gemma -loco X -k 02fe8482913a998e6e9559ff5e3f1b89e904d59d.X.cXX.txt.cXX.txt -o 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt -p phenotypes-1 -lmm 9 -g BXD-test.txt -n 5 -a BXD.8_snps.txt -outdir /tmp/d20240823-4481-xfrnp6 +DEBUG: Reading 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt.1.assoc.txt +./bin/gemma-wrapper:672:in `foreach': No such file or directory @ rb_sysopen - 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt.1.assoc.txt (Errno::ENOENT) +``` + +so it created a file, but can't find it because outdir is not shared. Now tmpdir is in the outer block so the file should still exist. For troubleshooting the first step is to seed the randomizer (seed) so we get the same run every time. +It turns out there are a number of problems. First of all the permutation output was numbered and the result was not found. Fixing that gave a first result without the -parallel switch: + +``` +[0.0008489742, 0.03214928, 0.03426648, 0.0351207, 0.0405179, 0.04688354, 0.0692488, 0.1217158, 0.1270747, 0.1880325] +["95 percentile (significant) ", 0.0008489742, 3.1] +["67 percentile (suggestive) ", 0.0351207, 1.5] +``` + +That is pleasing and it suggests that we have a significant result for the trait of interest: `volume of the first tumor that developed`. Running LOCO without parallel is slow (how did we survive in the past!). 
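The two summary lines above can be reproduced from the sorted list of best p-values with a short Python sketch. The indexing rule used here (take the value at position n*(100-percentile)/100 in the ascending list) is an assumption inferred from the output shown, not taken from the gemma-wrapper source, and the function name `permutation_threshold` is hypothetical.

```python
import math


def permutation_threshold(min_pvalues, percentile):
    """Return (p-value, -log10 p rounded to one decimal) for a percentile.

    Assumed rule: sort the best p-value of every permutation ascending and
    pick the entry at index n * (100 - percentile) // 100.
    """
    ranked = sorted(min_pvalues)  # most significant first
    index = len(ranked) * (100 - percentile) // 100  # integer arithmetic
    p = ranked[index]
    return p, round(-math.log10(p), 1)


# Best p-value of each of the 10 permutations listed above
best = [0.0008489742, 0.03214928, 0.03426648, 0.0351207, 0.0405179,
        0.04688354, 0.0692488, 0.1217158, 0.1270747, 0.1880325]

significant = permutation_threshold(best, 95)  # (0.0008489742, 3.1)
suggestive = permutation_threshold(best, 67)   # (0.0351207, 1.5)
```

With only 10 permutations the 95 percentile is simply the single best random hit, which is why more permutations are needed for a stable threshold.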
+ +The 100 run shows + +``` +[0.0001626146, 0.0001993085, 0.000652191, 0.0007356249, 0.0008489742, 0.0009828207, 0.00102203, 0.001091924, 0.00117823, 0.001282312, 0.001471041, 0.001663572, 0.001898194, 0.003467039, 0.004655921, 0.005284387, 0.005628393, 0.006319995, 0.006767502, 0.007752473, 0.008757406, 0.008826192, 0.009018125, 0.009735282, 0.01034488, 0.01039465, 0.0122644, 0.01231366, 0.01265093, 0.01317425, 0.01348443, 0.013548, 0.01399461, 0.01442383, 0.01534904, 0.01579931, 0.01668551, 0.01696015, 0.01770371, 0.01838937, 0.01883068, 0.02011034, 0.02234977, 0.02362105, 0.0242342, 0.02520063, 0.02536663, 0.0266905, 0.02932001, 0.03116032, 0.03139836, 0.03176087, 0.03214928, 0.03348359, 0.03426648, 0.0351207, 0.03538503, 0.0354338, 0.03609931, 0.0371134, 0.03739827, 0.03787489, 0.04022586, 0.0405179, 0.04056273, 0.04076034, 0.04545012, 0.04588635, 0.04688354, 0.04790254, 0.05871501, 0.05903692, 0.05904868, 0.05978341, 0.06103624, 0.06396175, 0.06628317, 0.06640048, 0.06676557, 0.06848021, 0.0692488, 0.07122914, 0.07166011, 0.0749728, 0.08174019, 0.08188341, 0.08647539, 0.0955264, 0.1019648, 0.1032776, 0.1169525, 0.1182405, 0.1217158, 0.1270747, 0.1316735, 0.1316905, 0.1392859, 0.1576149, 0.1685975, 0.1880325] +["95 percentile (significant) ", 0.0009828207, 3.0] +["67 percentile (suggestive) ", 0.01442383, 1.8] +``` + +Not too far off! + +The command was + +``` +./bin/gemma-wrapper --debug --no-parallel --keep --force --json --loco --input K.json --permutate 100 --permute-phenotype BXD_pheno_Dave-GEMMA.txt -- -lmm 9 -g BXD-test.txt -n 5 -a BXD.8_snps.txt +``` + +It is fun to see that when I did a second run the + +``` +[100, ["95 percentile (significant) ", 0.0002998286, 3.5], ["67 percentile (suggestive) ", 0.01167864, 1.9]] +``` + +significance value was 3.5. Still, our hit is a whopper - based on this. 
+ +## Run permutations in parallel + +Next I introduced and fixed parallel support for permutations. Now we can run gemma LOCO with decent speed - about 1 permutation per 3s! That is one trait in an hour on my machine. + +=> https://github.com/genetics-statistics/gemma-wrapper/commit/a8d3922a21c7807a9f20cf9ffb62d8b16f18c591 + +Now we can run 1000 permutations in an hour, rerunning the above we get + +``` +["95 percentile (significant) ", 0.0006983356, 3.2] +["67 percentile (suggestive) ", 0.01200505, 1.9] +``` + +which shows that 100 permutations is not enough. It is a bit crazy to think that 5% of randomized phenotypes will get a LOD score of 3.2 or higher! + +Down the line I can use Arun's CWL implementation to fire this on a cluster. Coming... + +## Reduce genotypes for permutations + +In the next phase we need to check if shuffling the full set of BXDs makes sense for computing permutations. Since I wrote a script for this exercise to transform BIMBAM genotypes I can reuse that: + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/a8d3922a21c7807a9f20cf9ffb62d8b16f18c591/bin/gn-geno-to-gemma.py#L31 + +If we check the sample names we can write a reduced genotype matrix. Use that to compute the GRM. Next permute with the smaller BXD sample set and genotypes. + +Instead of modifying the above script I decided to add another one + +``` +bimbam-filter.py --json BXD.geno.json --sample-file BXD_pheno_Dave-GEMMA-samples.txt BXD_geno.txt > BXD_geno-samples.txt +``` + +which takes as inputs the json file from gn-geno-to-gemma and the GEMMA input file. This avoids mixing targets and keeps the code simple. 
Now create the GRM with + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD_geno-samples.txt -p BXD_pheno_Dave-GEMMA-samples.txt -n 5 -a BXD.8_snps.txt > K-samples.json +./bin/gemma-wrapper --keep --force --json --loco --input K-samples.json -- -lmm 9 -g BXD_geno-samples.txt -p BXD_pheno_Dave-GEMMA-samples.txt -n 5 -a BXD.8_snps.txt > GWA-samples.json +``` + +Now the hit got reduced: + +``` +-Math.log10(1.111411e-04) +=> 3.9541253091741235 +``` + +and with 1000 permutations + +``` +./bin/gemma-wrapper --debug --parallel --keep --force --json --loco --input K-samples.json --permutate 1000 --permute-phenotype BXD_pheno_Dave-GEMMA-samples.txt -- -lmm 9 -g BXD_geno-samples.txt -n 5 -a BXD.8_snps.txt +["95 percentile (significant) ", 0.0004184217, 3.4] +["67 percentile (suggestive) ", 0.006213012, 2.2] +``` + +we are still significant. Though the question is now why results differ so much, compared to using the full BXD genotypes. + +## Why do we have a difference with the full BXD genotypes? + +GEMMA strips out the missing phenotypes in a list. Only the actual phenotypes are used. We need to check how the GRM is used and what genotypes are used by GEMMA. For the GRM the small genotype file compares vs the large: + +``` +Samples small large +BXD1 <-> BXD1 0.248 0.253 +BXD24 <-> BXD24 0.255 0.248 +BXD1 <-> BXD24 -0.040 -0.045 +BXD1 <-> BXD29 0.010 0.009 +``` + +You can see there is a small difference in the computation of K even though it looks pretty close. This is logical because with the full BXD set all genotypes are used. With a smaller BXD set only those genotypes are used. We expect a difference in values, but not much of a difference in magnitude (shift). The only way to prove that K impacts the outcome is to take the larger matrix and reduce it to the smaller one using those values. I feel another script coming ;) + +Above numbers are without LOCO. 
With LOCO on CHR18 + +``` +Samples small large +BXD1 <-> BXD1 0.254 0.248 +BXD1 <-> BXD24 -0.037 -0.042 +``` + +again a small shift. OK, let's try computing with a reduced matrix and compare results for rs3718618. Example: + +``` +gemma -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt -o full-bxd +gemma -lmm 9 -k output/full-bxd.cXX.txt -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt -o full-bxd +``` + +we get three outcomes where full-bxd is the full set, +``` +output/full-bxd.assoc.txt:18 rs3718618 7.532662e-05 +output/full-reduced-bxd.assoc.txt:18 rs3718618 2.336439e-04 +output/small-bxd.assoc.txt:18 rs3718618 2.338226e-04 +``` + +even without LOCO you can see a huge jump for the full BXD kinship matrix, just looking at our hit rs3718618: + +``` +-Math.log10(7.532662e-05) +=> 4.123051519468808 +-Math.log10(2.338226e-04) +=> 3.631113514641496 +``` + +With LOCO the difference may be even greater. + +So, which one to use? Truth is that the GRM is a blunt instrument. Essentially every combination of two samples/strains/genometypes gets compressed into a single number that gives a distance between the genomes. This number represents a hierarchy of relationships computed in differences in DNA (haplotypes) between those individuals. The more DNA variation is represented in the calculation, the more 'fine tuned' this GRM matrix becomes. Instinctively the larger matrix, or full BXD population, is a better estimate of distance between the individuals than just using a subset of DNA. + +So, I still underwrite using the full BXD for computing the GRM. To run GEMMA, I have just proven we can use the reduced GRM which will be quite a bit faster too, as the results are the same. For permutations we *should* use the reduced form of the full BXD GRM as it does not make sense to shuffle phenotypes against BXDs we don't use. So I need to recompute that. 
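Reducing the full kinship matrix to the samples that carry phenotypes amounts to selecting matching rows and columns. The sketch below illustrates the idea only: it is not the code in grm-filter.py, the function name `reduce_grm` is hypothetical, and the matrix values are toy numbers rather than real GRM output.

```python
def reduce_grm(grm, samples, keep):
    """Keep only the rows and columns of `grm` whose sample is in `keep`.

    `grm` is a square list of lists ordered like `samples`. The values are
    copied as-is, so they still come from the full-population computation.
    """
    wanted = set(keep)
    idx = [i for i, name in enumerate(samples) if name in wanted]
    reduced = [[grm[i][j] for j in idx] for i in idx]
    return reduced, [samples[i] for i in idx]


# Toy 4x4 symmetric matrix for illustration only
samples = ["BXD1", "BXD2", "BXD24", "BXD29"]
full = [
    [0.25, 0.01, -0.04, 0.01],
    [0.01, 0.26, 0.02, -0.03],
    [-0.04, 0.02, 0.25, 0.00],
    [0.01, -0.03, 0.00, 0.24],
]
small, small_samples = reduce_grm(full, samples, ["BXD1", "BXD24", "BXD29"])
```

The reduced matrix keeps the sample order of the full matrix, so it lines up with a phenotype file that was reduced in the same order.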
+ +## Recomputing significance with the reduced GRM matrix + +* [ ] Recompute significance with reduced GRM + +I can reuse the script I wrote for the previous section. + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/grm-filter.py + +So, the idea is to rerun permutations with the small set, but with the reduced GRM from the full BXD population. That ought to be straightforward by using the new matrix as an input for GWA. Only problem is that LOCO generates a GRM for every chromosome, so we need to make gemma-wrapper aware of the matrix reduction. As the reduction is fast we can do it for every run of gemma-wrapper and destroy it automatically with tmpdir. So: + +* [X] Compute the full GRM for every LOCO (if not cached) - already part of gemma-wrapper +* [X] Run through GRMs and reduce them in tmpdir +* [X] Plug new GRM name into computations - which really updates the JSON file that is input for GWA + +The interesting bit is that GEMMA requires input of phenotypes, but does not use them to compute the GRM. + +After giving it some thought we want GRM reduction to work in production GN because of the speed benefit. That means modifying gemma-wrapper to take a list of samples/genometypes as input - and we'll output that with GN. It is a good idea anyhow because it can give us some improved error feedback down the line. + +We'll use the --input switch to gemma-wrapper by providing the full list of genometypes that are used to compute the GRM and the 'reduced' list of genometypes that are used to reduce the GRM and compute GWA after. +So the first step is to create this JSON input file. We already created the "gn-geno-to-gemma" output that has a full list of samples as parsed from the GN .geno file. Now we need a script to generate the reduced samples JSON and merge that to "gn-geno-to-gemma-reduced" by adding a "samples-reduced" vector. + +The rqtl2-pheno-to-gemma.py script I wrote above already takes the "gn-geno-to-gemma" JSON. 
It now adds to the JSON: + +``` + "samples-column": 2, + "samples-reduced": { + "BXD1": 18.5, + "BXD24": 27.510204, + "BXD29": 17.204, + "BXD43": 21.825397, + "BXD44": 23.454, + "BXD60": 22.604, + "BXD63": 19.171, + "BXD65": 21.607, + "BXD66": 17.056999, + "BXD70": 17.962999, + "BXD73b": 20.231001, + "BXD75": 19.952999, + "BXD78": 19.514, + "BXD83": 18.031, + "BXD87": 18.258715, + "BXD89": 18.365, + "BXD90": 20.489796, + "BXD101": 20.6, + "BXD102": 18.785, + "BXD113": 24.52, + "BXD124": 21.762142, + "BXD128a": 18.952, + "BXD154": 20.143, + "BXD161": 15.623, + "BXD210": 23.771999, + "BXD214": 19.533117 + }, + "numsamples-reduced": 26 +``` + +which is kinda cool because now I can reduce and write the pheno file in one go. Implementation: + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/rqtl2-pheno-to-gemma.py + +OK, we are going to input the resulting JSON file into gemma-wrapper. At the GRM stage we ignore the reduction but we need to add these details to the outgoing JSON. So the following commands can run: + +``` +./bin/gemma-wrapper --loco --json --input BXD_pheno_Dave-GEMMA.txt.json -- -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > K.json +``` + +where K.json has a json["input"] which essentially is above structure. + +``` +./bin/gemma-wrapper --keep --force --json --loco --input K.json -- -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > GWA.json +``` + +Now I have to deal with phenotype files as they are rewritten. We should still cater for `-p` for GEMMA. We already have `--permute-phenotypes filen` for gemma-wrapper. Now we are adding `--phenotypes` to gemma-wrapper which replaces both! +Note that we can use -p if --phenotypes is NOT defined. 
Problem is we have a few paths now: + +* [X] Check phenotypes are directly passed into GEMMA with -p switch +* [X] Check phenotypes are passed in as a file with --phenotypes switch +* [X] Check phenotypes are coming in using the JSON file + +Fixed the first one with + +=> https://github.com/genetics-statistics/gemma-wrapper/commit/2b7570a7f0ba0d1080c730b208823c0622dd8f2c + +though that does not do caching (yet). Next stop, doing LOCO: I notice xz is phenomenally slow. Turns out it was not xz, but when using `tar -C` we switch into the path and somehow xz kept growing its output. + +At this point David told me that we don't have to do epoch or covariates. So it is just the traits. After getting side-tracked by a slow running python program for haplotype assessment we start up again. + +So, now we can pass in a trait using JSON. This is probably not a great idea when you have a million values, but for our purposes it will do. K.json contains the reduced samples. Next GWA is run on that. I had to fix minor niggles and get `parallel` to give more useful debug info. + +Next write the pheno file and pass it in! + +``` +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g test/data/input/BXD_geno.txt.gz -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -n 2 -debug +``` + +note the '-n 2' switch to get the second generated column in the phenotype file. We had our first successful run! To run permutations I get: + +``` +./bin/gemma-wrapper:722:in `<main>': You should supply --permute-phenotypes with gemma-wrapper --permutate (RuntimeError) +``` + +and, of course, as this reduced file is generated it is not available yet. That was an easy fix/hack. Next I got + +``` +./bin/gemma-wrapper:230:in `block in <main>': Do not use the GEMMA -p switch with gemma-wrapper if you are using JSON phenotypes! +``` + +Hmm. This is a bit harder. The call to GWAS takes a kinship matrix and it gets reduced with every permutation. 
That is probably OK because it runs quickly, but I'll need to remove the -p switch... OK. Done that and permutations are running in a second for 28 BXD! That implies computing significance in the web service comes into view - especially if we use a cluster on the backend. + +It is interesting to see that some 50-60% of time is spent in the kernel - which means still heavy IO on GEMMA's end - even with the reduced data: + +``` +%Cpu0 : 39.1 us, 51.0 sy +%Cpu1 : 34.0 us, 54.8 sy +%Cpu2 : 35.8 us, 54.5 sy +%Cpu3 : 37.5 us, 49.8 sy +%Cpu4 : 36.0 us, 53.3 sy +%Cpu5 : 29.5 us, 57.9 sy +%Cpu6 : 42.7 us, 44.7 sy +%Cpu7 : 35.9 us, 52.2 sy +%Cpu8 : 27.0 us, 60.7 sy +%Cpu9 : 24.5 us, 63.2 sy +%Cpu10 : 29.8 us, 58.9 sy +%Cpu11 : 25.3 us, 62.7 sy +%Cpu12 : 28.1 us, 58.9 sy +%Cpu13 : 34.2 us, 52.8 sy +%Cpu14 : 34.6 us, 52.2 sy +%Cpu15 : 37.5 us, 51.8 sy +``` + +There is room for more optimization. + +The good news is that the peak we have turns out to be statistically significant: + +``` +["95 percentile (significant) ", 0.0004945423, 3.3] +["67 percentile (suggestive) ", 0.009975183, 2.0] +``` + +Even though this was with a low number of permutations, there was actually a real bug. It turns out I only picked the values from the X chromosome (ugh!). It looks different now. + +For the peaks of + +=> https://genenetwork.org/show_trait?trait_id=21526&dataset=BXDPublish + +after 1000 permutations (I tried a few times) the significance threshold with MAF 0.05 ends up at approx. + +["95 percentile (significant) ", 1.434302e-05, 4.8] +["67 percentile (suggestive) ", 0.0001620244, 3.8] + +If that holds, it means that for this trait BXD_21526 the peaks on chr 14 at LOD 3.5 are not significant, but close to suggestive (aligning with Dave's findings and comments). It is interesting to see the numbers quickly stabilize by 100 permutations (see attached). Now, this is before correcting for epoch effects and other covariates. And I took the data from Dave as is (the distribution looks fairly normal). 
Also there is a problem with MAF I have to look into: + +GEMMA in GN2 shows the same result when setting MAF to 0.05 or 0.1 (you can try that). The GN2 GEMMA code for LOCO does pass in -maf (though I see that non-LOCO does not - ugh again). I need to run GEMMA to see if the output should differ and I'll need to see the GN2 logs to understand what is happening. Maybe it just says that the hits are haplotype driven - and that kinda makes sense because there is a range of them. + +That leads me to think that we only need to check for epoch when we have a single *low* MAF hit, say 0.01 for 28 mice. As we actively filter on MAF right now we won't likely see an epoch hit. + + +## Protocol for permutations + +First we run GEMMA just without LOCO using default settings that GN uses + +``` +# Convert the GN geno file to BIMBAM geno file +./bin/gn-geno-to-gemma.py BXD.geno > BXD.geno.txt +# Match pheno file +./bin/rqtl2-pheno-to-gemma.py BXD_pheno_Dave.csv --json BXD.geno.json > BXD_pheno_matched.txt + Wrote GEMMA pheno 237 from 237 with genometypes (rows) and 24 collections (cols)! +gemma -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 +gemma -lmm 9 -g BXD.geno.txt -p BXD_pheno_matched.txt -k output/result.cXX.txt -n 5 +``` + +So far the output is correct. + +``` +-Math.log10(7.532460e-05) +=> 4.123063165904243 +``` + +Try with gemma-wrapper + +``` +./bin/gemma-wrapper --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +cp output/bab43175329bd14d485e582b7ad890cf0ec28915.cXX.txt /tmp +``` + +Works, but the following failed without the -n switch: + +``` +./bin/gemma-wrapper --debug --verbose --force --json --lmdb --input K.json -- -g BXD.geno.txt -a BXD.8_snps.txt -lmm 9 -p BXD_pheno_matched.txt -n 5 +``` + +and worked with. 
That is logical, if you see output like + +``` +19 rs30886715 46903165 0 X Y 0.536 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00 +19 rs6376540 46905638 0 X Y 0.536 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00 +19 rs50610897 47412184 0 X Y 0.538 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00 +``` + +It means the phenotype column that was parsed has empty values. In this case the BXD strain names. GEMMA should show a meaningful error. + +Now that works we can move to a full LOCO + + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt -n 5 +./bin/./bin/view-gemma-mdb --sort /tmp/test/ca55b05e8b48fb139179fe09c35cff0340fe13bc.mdb +``` + +and we get + +``` +18,69216071,rs3718618,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69825784,rs50446650,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,68189477,rs29539715,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +``` + +When we converted BXD.geno to its BIMBAM BXD.geno.txt we also got a BXD.geno.json file which contains a list of the individuals/genometypes that were used in the genotype file. + +Now we reduce the traits file to something GEMMA can use for permutations - adding the trait number and output BXD_pheno_Dave.csv.json + +```sh +./bin/rqtl2-pheno-to-gemma.py BXD_pheno_Dave.csv --json BXD.geno.json -n 5 > BXD_pheno_matched-5.txt +``` + +The matched file should be identical to the earlier BXD_pheno_matched.txt file. Meanwhile, if you inspect the JSON file you should see + +``` +jq < BXD_pheno_Dave.csv.json + "samples-column": 5, + "trait": "21529", + "samples-reduced": { + "BXD1": 1919.450806, + "BXD101": 2546.293945, + "BXD102": 1745.671997, +``` + +So far we are OK! + +At this point we have a reduced sample set, a BIMBAM file and a phenotype file GEMMA can use! 
+ +``` +./bin/gemma-wrapper --loco --json --input BXD_pheno_Dave.csv.json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -a BXD.8_snps.txt -n 5 > K.json +``` + +Note that at this step we actually create a full GRM. Reducing happens in the next mapping stage. + +``` +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 +``` + +Note the use of '-n' switch. We should change that. + +``` +./bin/./bin/view-gemma-mdb /tmp/test/8599834ee474b9da9ff39cc4954d662518a6b5c8.mdb --sort +``` + +Look for rs3718618 at 69216071 and I am currently getting the wrong result for trait 21529 and it is not clear why that is: + +``` +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +16,88032783,?,0.538,-134.1339,75.7837,0.0,0.0009,3.02 +16,88038734,?,0.538,-134.1339,75.7837,0.0,0.0009,3.02 +(...) +18,69216071,?,0.462,10.8099,93.3936,0.0,0.8097,0.09 +``` + +The failing command is: + +``` +/bin/gemma -loco 18 -k /tmp/test/reduced-GRM-18.txt.tmp -o 69170e8a2d2f08905daa14461eca1d82a676b4c4.18.assoc.txt -p /tmp/test/reduced-pheno.txt.tmp -n 2 -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -outdir /tmp/test +``` + +produces + +``` +18 rs3718618 69216071 0 X Y 0.462 -2.161984e+01 9.339365e+01 1.000000e-05 8.097026e-01 +``` + +The pheno file looks correct, so it has to be the reduced GRM. And this does not look good either: + +``` +number of SNPS for K = 7070 +number of SNPS for GWAS = 250 +``` + +When running GEMMA on genenetwork.org we get a peak for LOCO at that position for rs3718618. I note that the non-LOCO version at 4.1 vs 4.5 for LOCO has a higher peak. We should compute the significance for both! + +Now, when I run the non-LOCO version by hand I get + +``` +-Math.log10(7.532460e-05) +=> 4.123063165904243 +``` + +## Finally + +So, we rolled back to not using reduced phenotypes for now. 
+ +For trait 21529 after 1000 permutations we get for LOCO: + +``` +["95 percentile (significant) ", 1.051208e-05, 5.0] +["67 percentile (suggestive) ", 0.0001483188, 3.8] +``` + +which means our GWA hit at 4.5 is not so close to being significant. + +Next I made sure the phenotypes got shuffled against the BXD used - which is arguably the right thing to do. +It should not have a huge impact because the BXDs share haplotypes - so randomized association should end up in the same ball park. The new result after 1000 permutations is: + +``` +["95 percentile (significant) ", 8.799303e-06, 5.1] +["67 percentile (suggestive) ", 0.0001048443, 4.0] +``` + +## More for Dave + + +Run and permute: + +``` +./bin/gemma-wrapper --lmdb --debug --phenotypes BXD_pheno_matched.txt --verbose --force --loco --json --input K.json -- -g BXD.geno.txt -a BXD.8. -lmm 9 -maf 0.05 -n 2 -p BXD_pheno_matched.txt +./bin/gemma-wrapper --debug --phenotypes BXD_pheno_matched.txt --permutate 1000 --phenotype-column 2 --verbose --force --loco --json --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 +``` + +``` +21526 How old was the mouse when a tumor was first detected? +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +14,99632276,?,0.462,-0.6627,0.3322,100000.0,0.0003,3.56 +14,99694520,?,0.462,-0.6627,0.3322,100000.0,0.0003,3.56 +17,80952261,?,0.538,0.6528,0.3451,100000.0,0.0005,3.31 +["95 percentile (significant) ", 6.352578e-06, 5.2] +["67 percentile (suggestive) ", 0.0001007502, 4.0] +``` + +``` +24406 What was the weight of the first tumor that developed, at death? 
+chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +11,9032629,?,0.536,0.1293,0.0562,100000.0,0.0,4.36 +11,9165457,?,0.536,0.1293,0.0562,100000.0,0.0,4.36 +11,11152439,?,0.5,0.126,0.0562,100000.0,0.0001,4.21 +11,11171143,?,0.5,0.126,0.0562,100000.0,0.0001,4.21 +11,11525458,?,0.5,0.126,0.0562,100000.0,0.0001,4.21 +11,8786241,?,0.571,0.1203,0.0581,100000.0,0.0002,3.78 +11,8836726,?,0.571,0.1203,0.0581,100000.0,0.0002,3.78 +11,19745817,?,0.536,0.1183,0.061,100000.0,0.0003,3.46 +11,19833554,?,0.536,0.1183,0.061,100000.0,0.0003,3.46 +["95 percentile (significant) ", 1.172001e-05, 4.9] +["67 percentile (suggestive) ", 0.0001175644, 3.9] +``` + +``` +27515 No description +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +4,103682035,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,103875085,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104004372,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104156915,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104166428,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104584276,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,103634906,?,0.519,-0.1497,0.0733,100000.0,0.0002,3.67 +4,103640707,?,0.519,-0.1497,0.0733,100000.0,0.0002,3.67 +["95 percentile (significant) ", 7.501004e-06, 5.1] +["67 percentile (suggestive) ", 7.804668e-05, 4.1] +``` + +## Dealing with significance + +Now the significance thresholds appear to be a bit higher than we expect. So, let's see what is going on. First I check the randomization of the phenotypes. That looks great. There are 1000 different phenotype files and they randomized only the BXD we used. Let's zoom in on our most interesting 27515. When running in GN2 I get more hits - they are at the same level, but somehow SNPs have dropped off. 
In those runs our SNP of interest shows only a few higher values: + +``` +./6abd89211d93b0d03dc4281ac3a0abe7fc10da46.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.932957e-01 7.337327e-02 1.000000e+05 2.700506e-04 +./b6e58d6092987d0c23ae1735d11d4a293782c511.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.413067e-01 6.416133e-02 1.000000e+05 5.188637e-04 +./4266656951ab0c5f3097ddb4bf917448d7542dd5.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 2.757074e-01 6.815899e-02 1.000000e+05 2.365318e-04 +./265e44a4c078d2a608b7117bbdcb9be36f56c7de.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 2.358494e-01 5.743872e-02 1.000000e+05 1.996261e-04 +napoli:/export/local/home/wrk/iwrk/opensource/code/genetics/gemma-wrapper/tmp/test$ rg 103682035 .|grep 5$ +./b29f08a4b1061301d52f939087f1a4c1376256f0.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.841255e-01 6.194426e-02 1.000000e+05 5.220922e-05 +./3e5b12e9b7478b127b47c23ccdfba2127cf7e2b2.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.813968e-01 6.379554e-02 1.000000e+05 8.533857e-05 +``` + +but none as high as the original hit of 5.57 + +``` +irb(main):001:0> -Math.log10(2.700506e-04) +=> 3.5685548534637 +irb(main):002:0> -Math.log10(5.220922e-05) +=> 4.282252795052573 +irb(main):003:0> -Math.log10(8.533857e-05) +=> 4.06885463879464 +``` + +All good. This leaves two things to look into. First, I see fewer hits than with GN2(!). Second, qnorm gives a higher peak in GN2. + +* [X] Check for number of SNPs + +The number of SNPs is not enough: + +``` +GEMMA 0.98.6 (2022-08-05) by Xiang Zhou, Pjotr Prins and team (C) 2012-2022 +Reading Files ... 
+## number of total individuals = 237 +## number of analyzed individuals = 26 +## number of covariates = 1 +## number of phenotypes = 1 +## leave one chromosome out (LOCO) = 1 +## number of total SNPs/var = 21056 +## number of SNPS for K = 6684 +## number of SNPS for GWAS = 636 +## number of analyzed SNPs = 21056 +``` + +Even when disabling MAF filtering we still see a subset of SNPs. I am wondering what GN2 does here. + +## Missing SNPs + +In our results we miss SNPs that are listed on GN2, but do appear in our genotypes, e.g. + +``` +BXD.8_snps.txt +19463:rsm10000013598, 69448067, 18 +``` + +First of all we find we used a total of 6360 SNPs out of the original 21056. For this SNP the genotype files show: + +``` +BXD_geno.txt +19463:rsm10000013598, X, Y, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 0.5, 1, 1, 1, 1, 0, 1, 0, 1, 0.5, 0, 0, 0, 1, 0.5, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0.5, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0, 0, 0, 1, 1, 1, 0.5, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0.5, 0.5, 0, 0.5, 0.5, 0.5, 0, 0.5, 0.5, 0.5, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0.5, 1, 0.5, 0, 0.5 +``` + +and in our updated + +``` +BXD.geno.txt 
+rsm10000013598,X,Y,2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1 +``` + +That looks good. Turns out we need the annotation file(?!) + +I figured out where the missing SNPs went. Turns out that, if you pass in an annotation file, and if it is not complete, GEMMA drops the non-annotated SNPs unceremoniously. Getting the right annotation file fixed it. GEMMA should obviously not behave like that ;). Anyway, I am in sync with GN2 now. Unfortunately, with permutations, the significance threshold did not change much (which kinda makes sense). + +I want to see why gemma is giving this number. If I can't find it fast I'll try to run bulklmm or R/qtl2 lmm instead and see if they disagree with gemma and if we can get close to what Rob expects. 
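The annotation problem described above (GEMMA silently dropping SNPs that are missing from the annotation file) can be caught before a run by comparing marker names. This is a minimal sketch under stated assumptions: both files are taken to be comma-separated with the marker name in the first field, as in the excerpts shown, and the function name `missing_annotations` and the shortened toy lines are hypothetical.

```python
def missing_annotations(geno_lines, anno_lines):
    """List markers that occur in the genotype lines but not in the annotation lines.

    Assumes the marker name is the first comma-separated field in both files.
    """
    def marker(line):
        return line.split(",", 1)[0].strip()

    annotated = {marker(line) for line in anno_lines if line.strip()}
    return [marker(line) for line in geno_lines
            if line.strip() and marker(line) not in annotated]


# Shortened toy lines for illustration; real files carry one value per sample
geno = [
    "rs3718618,X,Y,2,0,1",
    "rsm10000013598,X,Y,2,0,2",
]
anno = [
    "rs3718618, 69216071, 18",
]
dropped = missing_annotations(geno, anno)
```

An empty result means every genotyped marker has an annotation entry, so GEMMA should not lose SNPs for that reason.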
+ + +``` +gemma -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 22 +gemma -lmm 9 -g BXD.geno.txt -p BXD_pheno_matched.txt -k output/result.cXX.txt -n 22 +``` + +Now that works we can move to a full LOCO + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt -n 5 +./bin/./bin/view-gemma-mdb --sort /tmp/test/ca55b05e8b48fb139179fe09c35cff0340fe13bc.mdb +``` + +and we get + +``` +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +18,69216071,rs3718618,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69448067,rsm10000013598,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69463065,rsm10000013599,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69803489,rsm10000013600,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69825784,rs50446650,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69836387,rsm10000013601,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,68188822,rsm10000013579,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68189477,rs29539715,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68195226,rsm10000013580,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68195289,rsm10000013581,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68195758,rsm10000013582,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68454446,rs30216358,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68514475,rs6346101,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68521138,rsm10000013583,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68526029,rs29984158,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68542739,rsm10000013584,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68543456,rsm10000013585,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68564736,rsm10000013586,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68565230,rsm10000013587,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +``` + +which is in line with GN2. 
+ +Run and permute: + +``` +./bin/gemma-wrapper --debug --phenotypes BXD_pheno_matched.txt --permutate 1000 --phenotype-column 2 --verbose --force --loco --json --input K.json -- -g BXD.geno.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05 +``` + +* [X] Test significance effect for higher and lower MAF than 0.05 + +Lower MAF increases significance thresholds? + +``` +0.05? +["95 percentile (significant) ", 6.268117e-06, 5.2] +["67 percentile (suggestive) ", 7.457537e-05, 4.1] + +0.01 +["95 percentile (significant) ", 5.871237e-06, 5.2] +["67 percentile (suggestive) ", 7.046853e-05, 4.2] +``` + +* [ ] Check distribution of hits with permutations + +## What about significance + +What we are trying to do here is to decide on a significance level that says that the chance of a hit caused by a random event is less than 1 in a thousand. We are currently finding levels of 5.0 and from earlier work it should be less than 4.0. We are essentially following Gary Churchill's '94 paper: ``Empirical threshold values for quantitative trait mapping''. The significance level depends on the shape of the data - i.e., the shape of both genotypes and the trait under study. If the significance level is 5.0 it means that, at alpha=0.05, 5% of random trait vectors can be expected to show a LOD score of 5 or higher. + +What GEMMA does is look for a correlation between a marker, e.g. 
+ +``` +BXD.geno.txt +rsm10000013598,X,Y,2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1 +``` + +and a trait that is measured for a limited number against these individuals/strains/genometypes. We also correct for kinship between the individuals, but that is tied to the individuals, so we can ignore that for now. So you get a vector of: + +``` +marker rsm10000013598 +ind trait +0 8.1 +0 7.9 +2 12.3 +2 13.4 +``` + +We permute the data after breaking the correlation between left and right columns. When running 1000 permutations for this particular hit we find that the shuffled never gets a higher value then for our main run. That is comforting because random permutations are always less correlated (for this marker). + +If we do this genome-wide we also see a randomly positioned highest hit across all chromosomes after shuffling the trait vector and our hit never appears the highest. E.g. + +``` +[10, ["2", "rs13476914", "170826974"], ["95 percentile (significant) ", 1.870138e-05, 4.7], ["67 percentile (suggestive) ", 6.3797e-05, 4.2]] +[11, ["6", "rsm10000004149", "25227945"], ["95 percentile (significant) ", 1.870138e-05, 4.7], ["67 percentile (suggestive) ", 6.3797e-05, 4. 2]] +[12, ["9", "rsm10000006852", "81294046"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 4.216931e-05, 4.4]] +[13, ["2", "rsm10000001382", "57898368"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 6.3797e-05, 4. 
2]] +[14, ["1", "rsm10000000166", "94030054"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 6.3797e-05, 4.2]] +[15, ["X", "rsm10000014672", "163387262"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 6.3797e-05, 4.2]] +``` + +### Shuffling a normally distributed trait + + +So the randomization works well. Still, our 95% threshold is close to 5.0 and that is by chance. What happens when we change the shape of the data? Let's create a new trait, so the distribution is random and normal: + +``` +> rnorm(25, mean = 10, sd = 2) + [1] 10.347116 9.475156 11.747876 10.969742 11.374611 12.283834 11.499779 + [8] 11.123520 10.830300 11.640049 10.392085 11.586836 11.540470 10.700869 +[15] 8.802858 10.238498 11.099536 8.832104 6.463636 10.347956 11.222558 +[22] 8.658024 7.796304 10.684967 9.540483 +``` + +These random trait values render a hit of -Math.log10(8.325683e-04) = 3.0! Now we permute and we get: + +["95 percentile (significant) ", 5.22093e-06, 5.3] +["67 percentile (suggestive) ", 7.303966e-05, 4.1] + +So the shape of a normally distributed trait gives a higher threshold - it is easier to get a hit by chance. + +### Genotypes + +So 95% of randomly shuffled trait runs still give us 5.x. So this has to be a property of the genotypes in conjunction with the method GEMMA applies. With regard to genotypes, the BXD are not exactly random because they share markers from two parents which run along haplotypes. I.e. we are dealing with a patchwork of similar genotypes. You may expect that would suppress the chance of finding random hits. Let's try to prove that by creating fully random genotypes and an extreme haplotype set. And, for good measure, something in between. + +* [X] Fully random genotypes + +In the next phase we are going to play a bit with the haplotypes. First we fully randomize the genotype matrix. This way we break all haplotypes. 
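A minimal Python sketch of that randomization, assuming a comma-separated BIMBAM mean-genotype line made of a marker name, two allele columns and then one dosage per individual. The function name `randomize_bimbam_line` is hypothetical and this is not the script that produced the runs in these notes; it only illustrates replacing every dosage with a uniform random value between 0 and 2.

```python
import random


def randomize_bimbam_line(line, rng):
    """Replace each genotype dosage on one BIMBAM line with a random value in [0, 2]."""
    fields = line.rstrip("\n").split(",")
    # The first three columns (marker, allele 1, allele 2) are kept untouched.
    marker, allele1, allele2 = fields[0], fields[1], fields[2]
    # Every remaining column is one individual; draw a fresh dosage for each.
    dosages = [f"{rng.uniform(0.0, 2.0):.2f}" for _ in fields[3:]]
    return ",".join([marker, allele1, allele2] + dosages)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
example = "rs3677817,X,Y,2,0,2,0,1"  # shortened, made-up line for illustration
randomized = randomize_bimbam_line(example, rng)
print(randomized)
```

Applying this to every line of a genotype file removes any correlation between neighbouring markers, which is what breaking the haplotypes means here.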
As BIMBAM is a simple format we'll just modify an existing BIMBAM file. It looks like + +``` +rs3677817,X,Y,1.77,0.42,0.18,0.42,1.42,0.34,0.69,1.57,0.52,0.1,0.37,1.27,0.62,1.87,1.71,1.65,1.83,0.04,1.05,0.52,1.92,0.57,0.61,0.11,1.49,1.07,1.48,1.7,0.5,1.75,1.74,0.29,0.37,1.78,1.91,1.37,1.64,0.32,0.09,1.21,1.58,0.4,1.0,0.62,1.1,0.7,0.35,0.86,0.7,0.46,1.14,0.04,1.87,1.96,0.61,1.34,0.63,1.04,1.95,0.22,0.54,0.31,0.14,0.95,1.45,0.93,0.37,0.79,1.37,0.87,1.79,0.41,1.73,1.25,1.49,1.57,0.39,1.61,0.37,1.85,1.83,1.71,1.5,1.78,1.34,1.29,1.41,1.54,1.05,0.3,0.87,1.85,0.5,0.19,1.54,0.53,0.26,1.47,0.67,0.84,0.18,0.79,0.68,1.48,0.4,1.83,1.76,1.09,0.2,1.48,0.24,0.53,0.41,1.24,1.38,1.31,1.73,0.52,1.86,1.21,0.58,1.68,0.79,0.4,1.41,0.07,0.57,0.42,0.47,0.49,0.05,0.77,1.33,0.15,1.41,0.03,0.24,1.66,1.39,2.0,0.23,1.4,1.05,0.79,0.51,0.66,1.24,0.29,1.12,0.46,0.92,1.12,1.53,1.78,1.22,1.35,0.1,0.43,0.41,1.89,0.09,0.13,1.04,0.24,1.4,1.25,0.24,0.26,0.31,0.36,0.31,1.34,1.23,1.91,0.7,0.08,1.43,0.17,1.9,0.06,1.42,1.94,0.43,0.54,1.96,1.29,0.64,0.82,1.85,1.63,0.23,1.79,0.52,1.65,1.43,0.95,1.13,0.59,0.07,0.66,1.79,0.92,1.89,1.2,0.51,0.18,0.96,0.44,0.46,0.88,0.39,0.89,1.68,0.07,1.46,1.61,1.73,0.56,1.33,1.67,0.16,1.78,0.61,1.55,0.88,0.15,1.98,1.96,0.61,0.04,0.12,1.4,1.65,0.71,1.3,1.85,0.49 +``` + +We'll stick in the old hit for good measure and run our genotypes: + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD.geno.rand.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.rand.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt -n 22 +./bin/./bin/view-gemma-mdb --sort /tmp/test/ca55b05e8b48fb139179fe09c35cff0340fe13bc.mdb +./bin/view-gemma-mdb /tmp/e279abbebee8e41d7eb9dae...-gemma-GWA.tar.xz --anno BXD.8_snps.txt|head -20 +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +X,139258413,rsm10000014629,0.496,0.2248,0.093,100000.0,0.0,4.58 
+6,132586518,rsm10000003691,0.517,0.2399,0.1068,100000.0,0.0001,4.17 +2,161895805,rs27350606,0.585,-0.2303,0.1059,100000.0,0.0001,4.0 +X,47002415,rsm10000014323,0.562,-0.1904,0.0877,100000.0,0.0001,3.99 +3,32576363,rsm10000001568,0.468,-0.2251,0.104,100000.0,0.0001,3.97 +14,19281191,rs52350512,0.5,-0.2454,0.1154,100000.0,0.0001,3.88 +7,111680092,rs32385258,0.536,0.2022,0.0968,100000.0,0.0002,3.79 +4,151267320,rsm10000002095,0.604,-0.2257,0.1102,100000.0,0.0002,3.69 +2,157353289,rs27323024,0.455,0.2188,0.1072,100000.0,0.0002,3.67 +19,56503719,rsm10000013894,0.617,0.2606,0.1302,100000.0,0.0003,3.58 +``` + +Interestingly our trait did not do that well: + +``` +18,69448067,rsm10000013598,0.635,0.0941,0.0774,100000.0,0.0167,1.78 +``` + +It shows how large the impact of the GRM is. We can run our permutations. + +``` +./bin/gemma-wrapper --debug --phenotypes BXD_pheno_matched.txt --permutate 1000 --phenotype-column 22 --verbose --force --loco --json --input K.json -- -g BXD.geno.rand.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05 +["95 percentile (significant) ", 1.478479e-07, 6.8] +["67 percentile (suggestive) ", 1.892087e-06, 5.7] +``` + +Well that went through the roof :). It makes sense when you think about it. Randomizing genotypes of 21K SNPs gives you a high chance of finding SNPs that correlate with the trait. 
Let's go the other way and give 20% of indidivuals the exact same haplotypes, basically copying + +``` +rsm10000013598,X,Y,2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1 +``` + +``` +./bin/bimbam-rewrite.py --inject inject.geno.txt BXD.geno.txt --perc=20 > BXD.geno.20.txt +rg -c "2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1" BXD.geno.20.txt +4276 +``` + +so 4K out of 20K SNPs has identical haplotypes which correlate with our trait of interest: + +``` +["95 percentile (significant) ", 5.16167e-06, 5.3] +["67 percentile (suggestive) ", 6.163728e-05, 4.2] +``` + +and at 40% haplotype injection we get + +``` +["95 percentile (significant) ", 3.104788e-06, 5.5] +["67 percentile (suggestive) ", 7.032406e-05, 4.2] +``` + +* [X] Haplotype equal genotypes 20% and 40% + +All looks interesting, but does not help. + +Also when we halve the number of SNPs the results are similar too. + +``` +["95 percentile (significant) ", 6.026549e-06, 5.2] +["67 percentile (suggestive) ", 8.571557e-05, 4.1] +``` + +Even though the threshold is high, it is kind of interesting to see that no matter what you do you end up similar levels. 
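For reference, a minimal sketch of how percentile thresholds of this kind can be derived in the Churchill style: take the best (smallest) P-value of each permuted run and read off the 95th and 67th percentiles on the -log10 scale. The function name `permutation_thresholds`, the nearest-rank percentile and the toy input are assumptions of this sketch; how gemma-wrapper computes its numbers internally is not shown in these notes.

```python
import math
import random


def permutation_thresholds(min_pvalues, significant=0.95, suggestive=0.67):
    """Return (-log10 P) thresholds from the best P-value of each permuted run."""
    scores = sorted(-math.log10(p) for p in min_pvalues)

    def percentile(q):
        # Nearest-rank percentile over the sorted scores.
        rank = max(1, math.ceil(q * len(scores)))
        return scores[rank - 1]

    return percentile(significant), percentile(suggestive)


# Toy input: pretend each of 1000 permutations produced one smallest P-value.
rng = random.Random(1)
min_pvalues = [10 ** -rng.uniform(2.0, 6.0) for _ in range(1000)]
sig, sug = permutation_thresholds(min_pvalues)
print(f"significant: {sig:.1f}, suggestive: {sug:.1f}")
```

Because only the single best hit of each permutation enters the list, the threshold is experiment-wide rather than per marker.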
After a meeting with Rob and Saunak the latter pointed out that these numbers are not completely surprising. For LMMs we need to use an adaptation - i.e. shuffle the trait values after rotation and transformation and then reverse that procedure. There is only the assumption of normality that Churchill does not require. The good news is that BulkLMM contains that method and thresholds will be lower. The bad news is that I'll have to adapt it because it does not handle missing data. + +Oh yes, rereading the Churchill paper from 1994 I now realise he also suggests an at marker significance method that will end lower - we saw that already in an earlier comparison. Saunak, however, says that we *should* do experiment-wide. + +## BulkLMM + +* [ ] Run bulklmm + + +## Dealing with epoch + +Rob pointed out that the GRM does not necessarily represent epoch and that may influence the significance level. I.e. we should check for that. I agree that the GRM distances are not precise enough (blunt instrument) to capture a few variants that appeared in a new epoch of mice. I.e., the mice from the 90s may be different from the mice today in a few DNA variants that won't be reflected in the GRM. + +* [ ] Deal with epoch + +We have two or more possible solutions to deal with hierarchy in the population. + +## Covariates + +* [ ] Try covariates Dave + +## Later + +* [ ] Check running or trait without LOCO with both standard and random GRMs +* [ ] Test non-loco effect for rsm10000013598 - looks too low and does not agree with GN2 +* [X] Try qnorm run +* [ ] Fix non-use of MAF in GN for non-LOCO +* [ ] Fix running of -p switch when assoc cache exists (bug) + +Quantile-Based Permutation Thresholds for Quantitative Trait Loci Hotspots +https://academic.oup.com/genetics/article/191/4/1355/5935078 +by Karl, Ritsert et al. 
2012 diff --git a/topics/lmms/rqtl2/genenetwork-rqtl2-implementation.gmi b/topics/lmms/rqtl2/genenetwork-rqtl2-implementation.gmi new file mode 100644 index 0000000..452930f --- /dev/null +++ b/topics/lmms/rqtl2/genenetwork-rqtl2-implementation.gmi @@ -0,0 +1,71 @@ +# Implementation of QTL Analysis Using r-qtl2 in GeneNetwork +## Tags + +* Assigned: alexm +* Keywords: RQTL, GeneNetwork2, implementation +* Type: Feature +* Status: In Progress + +## Description + +This document outlines the implementation of a QTL analysis tool in GeneNetwork using r-qtl2 (see docs: https://kbroman.org/qtl2/) and explains what the script does. +This PR contains the implementation of the r-qtl2 script for genenetwork: +=> https://github.com/genenetwork/genenetwork3/pull/201 + +## Tasks + +The script currently aims to achieve the following: + +* [x] Parsing arguments required for the script +* [x] Data validation for the script +* [x] Generating the cross file +* [x] Reading the cross file +* [x] Calculating genotype probabilities +* [x] Performing Geno Scan (scan1) using HK, LOCO, etc. +* [x] Finding LOD peaks +* [x] Performing permutation tests +* [x] Conducting QTL analysis for multiparent populations +* [ ] Generating required plots + +## How to Run the Script + +The script requires an input file containing all the necessary data to generate the control file. Example: + +```json +{ + "crosstype": "riself", + "geno_file": "grav2_geno.csv", + "geno_map_file": "grav2_gmap.csv", + "pheno_file": "grav2_pheno.csv", + "phenocovar_file": "grav2_phenocovar.csv" +} + +``` +In addition, the following parameters are required: + +* output file (a file path where the output of the script will be written) +* --directory (a workspace in which to generate the control file) + +These are passed as the following flags: +* --output_file: The file path where the output for the script will be generated. +* --directory: The workspace directory where the control file will be generated. 
+ +Optional parameters: + +* --cores: The number of cores to use (set to 0 for using all cores). +* --method: The scanning method to use (e.g., Haley-Knott, Linear Mixed Model, or LMM with Leave-One-Chromosome-Out). +* --pstrata: Use permutation strata. +* --threshold: Minimum LOD score for a peak. + + +An example of how to run the script: + +```sh + +Rscript rqtl2_wrapper.R --input_file [file_path] --directory [workspace_dir] --output_file [file_path] --nperm 100 --cores 3 + +``` +## Related issues: +https://issues.genenetwork.org/topics/lmms/rqtl2/using-rqtl2 +=> ./using-rqtl2 +=> ./gn-rqtl-design-implementation diff --git a/topics/lmms/rqtl2/gn-rqtl-design-implementation.gmi b/topics/lmms/rqtl2/gn-rqtl-design-implementation.gmi new file mode 100644 index 0000000..f37da42 --- /dev/null +++ b/topics/lmms/rqtl2/gn-rqtl-design-implementation.gmi @@ -0,0 +1,203 @@ +# RQTL Implementation for GeneNetwork Design Proposal + +## Tags + +* Assigned: alexm +* Keywords: RQTL, GeneNetwork2, Design +* Type: Enhancements +* Status: In Progress + + + +## Description + +This document outlines the design proposal for the re-implementation of the RQTL feature in GeneNetwork, which also provides a console view to track the stdout from the external process. + +### Problem Definition + +The current RQTL implementation faces the following challenges: + +- Lack of adequate error handling for the API and scripts. + +- Insufficient separation of concerns between GN2 and GN3. + +- Lack of a way for users to track the progress of the r-qtl script being executed. + +- Lack of a clear way in which the r-qtl script is executed. + +We will address these challenges and add enhancements by: + +- Rewriting the R script using r-qtl2 instead of r-qtl. + +- Establishing clear separation of concerns between GN2 and GN3, eliminating file path transfers between the two. + +- Implementing better error handling for both the API and the RQTL script. 
+ +- Running the script as a job in a task queue. + +- Piping stdout from the script to the browser through a console for real-time monitoring. + +- Improving the overall design and architecture of the system. + + + +## High-Level Design +This is divided into three major components: + +* GN3 RQTL-2 Script implementation +* RQTL API +* Monitoring system for the rqtl script + + +### GN3 RQTL-2 Script implementation +We currently have a script written using rqtl: https://github.com/genenetwork/genenetwork3/blob/main/scripts/rqtl_wrapper.R +There is a newer rqtl implementation (rqtl-2) which is +a reimplementation of the QTL analysis software R/qtl, to better handle high-dimensional data and complex cross designs. +To see the difference between the two, see the documentation: +=> https://kbroman.org/qtl2/assets/vignettes/rqtl_diff.html +We aim to implement a separate script using this while maintaining the one +implemented using rqtl1 (rqtl). +(TODO) This probably needs to be split into a new issue (with enough knowledge), to capture +each computation step in the R script. + +### RQTL API + + +This component will serve as the entry point for running RQTL in GN3. At this stage, we need to improve the overall architecture and error handling. This process will be divided into the following steps: + +- Data Validation +In this step, we must validate that all required data to run RQTL is provided in the JSON format. This includes the mapping method, genotype file, phenotype file, etc. Please refer to the r-qtl2 documentation for an overview of the requirements: +=> https://rqtl.org/ + +- Data Preprocessing +During this stage, we will transform the data into a format that R can understand. This includes converting boolean values to the appropriate representations, preparing the RQTL command with all required values, and adding defaults where necessary. + +- Data Computation +In this stage, we will pass the RQTL script command to the task queue to run as a job. 
+ +- Output Data Processing +In this step, we need to retrieve the results output by the script in a specified format, such as JSON or CSV, and process the data. This may include outputs like RQTL pair scans and generated diagrams. Please refer to the documentation for an overview: +=> https://rqtl.org/ + + + +**Subtasks:** + +- [ ] Add the rqtl api endpoint (10%) +- [ ] Input data validation (15%) +- [ ] Input data processing (20%) +- [ ] Passing data to r-script for the computation (40%) +- [ ] Output data processing (80%) +- [ ] Add unit tests for this module (100%) + + +### Monitoring system for the rqtl script + +This component involves creating a monitoring system to track the state of the external process and output relevant information to the user. +We need a way to determine the status of the current job, for example +QUEUED, STARTED, IN PROGRESS, COMPLETED (see the Deep Dive section for more on this). + + +## Deep Dive + + +### Running the External Script +The RQTL implementation is in R, and we need a strategy for executing this script as an external process. This can be subdivided into several key steps: + +- **Task Queue Integration**: + + - We will utilize a task queue system. + We already have an implementation in gn3 + to manage script execution: + +- https://github.com/genenetwork/genenetwork3/blob/0820295202c2fe747c05b93ce0f1c5a604442f69/gn3/commands.py#L101 + +- **Job Submission**: + - Each API call will create a new job in the task queue, which will handle the execution of the R script. + +- **Script Execution**: + - This stage involves executing the R script in a controlled environment, ensuring all necessary dependencies are loaded. + +- **Monitoring and Logging**: + +- The system will include monitoring tools to track the status of each job. Users will receive real-time updates on job progress and logs for the current task. + +In this stage, we can have different states for the current job, such as QUEUED, IN PROGRESS, and COMPLETED. 
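A minimal sketch of how these job states could be modelled, assuming a strictly forward progression QUEUED, STARTED, IN PROGRESS, COMPLETED. The names `JobState`, `TRANSITIONS` and `advance` are hypothetical and do not refer to existing gn3 code.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "QUEUED"
    STARTED = "STARTED"
    IN_PROGRESS = "IN PROGRESS"
    COMPLETED = "COMPLETED"


# Allowed forward transitions (an assumption of this sketch).
TRANSITIONS = {
    JobState.QUEUED: {JobState.STARTED},
    JobState.STARTED: {JobState.IN_PROGRESS},
    JobState.IN_PROGRESS: {JobState.COMPLETED},
    JobState.COMPLETED: set(),
}


def advance(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


state = JobState.QUEUED
state = advance(state, JobState.STARTED)
print(state.value)
```

An explicit error state is not listed in this document and is therefore left out of the sketch; a real implementation would likely need one.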
+ +We need to show the user which stage of the computation the script +execution is currently at. + +- During the QUEUED state, the standard output (stdout) should display the command to be executed along with all its arguments. + +- During the STARTED stage, the stdout should notify the user that execution has begun. + +- In the IN PROGRESS stage, we need to fetch logs from the script being executed at each computation step. Please refer to this documentation for an overview of the different computations we +shall have: +=> https://rqtl.org/ + +- During the COMPLETED stage, the system should output the results from the R/qtl script to the user. + + +- **Result Retrieval**: + - Once the R script completes (either successfully or with an error), results will be returned to the API call. + +- **Error Handling**: + - Better error handling will be implemented to manage potential issues during script execution. This includes capturing errors from the R script and providing meaningful feedback to users through the application. + +### Additional Error Handling Considerations +This will involve: +* API error handling +* Error handling within the R script + +## Additional UI Considerations +We need to rethink where to output the external process stdout in the UI. Currently, we can add flags to the URL to enable this functionality, e.g., `URL/page&flags&console=1`. +Also, the design suggestion is to output the results in a terminal emulator, for +example xterm (see https://xtermjs.org/). An implementation already exists +for gn3, see: +=> https://github.com/genenetwork/genenetwork2/blob/abe324888fc3942d4b3469ec8d1ce2c7dcbd8a93/gn2/wqflask/templates/wgcna_setup.html#L89 + +### Design Suggestions: +#### With HTMX, offer a split screen +This will include an output page and a monitoring system page. + +#### Popup button for preview +A button that allows users to preview and hide the console output. 
+ + + + + +## Long-Term Goals +We aim to run computations on clusters rather than locally. This project will serve as a pioneer for that approach. + +## Related Issues +=> https://issues.genenetwork.org/topics/lmms/rqtl2/using-rqtl2 + +### Tasks + +* stage 1 (20%) * + +- [x] Implement the rqtl script using rqtl2 + +* stage 2 (40%) * + +- [ ] Implement the RQTL API endpoints +- [ ] Validation and preprocessing for data from the client +- [ ] Implement state-of-the-art error handling +- [ ] Add unit tests for the rqtl api module +- [ ] Make improvements to the current R script if possible + +* stage 3 (60%) * + +- [ ] Task queue integration (refer to the Deep Dive section) +- [ ] Implement a monitoring and logging system for job execution (refer to the Deep Dive section) +- [ ] Fetch results from running jobs +- [ ] Processing output from the external script + +* stage 4 (80%) * +- [ ] Implement a console preview UI for user feedback +- [ ] Refactor the GN2 UI + +* stage 5 (100%) * + +- [ ] Run this computation on clusters
\ No newline at end of file diff --git a/topics/lmms/rqtl2/using-rqtl2.gmi b/topics/lmms/rqtl2/using-rqtl2.gmi new file mode 100644 index 0000000..7f671ba --- /dev/null +++ b/topics/lmms/rqtl2/using-rqtl2.gmi @@ -0,0 +1,44 @@ +# R/qtl2 + +# Tags + +* assigned: pjotrp, alexm +* priority: high +* type: enhancement +* status: open +* keywords: database, gemma, reaper, rqtl2 + +# Description + +R/qtl2 handles multi-parent populations, such as DO, HS rat and the collaborative cross (CC). It also comes with an LMM implementation. Here we describe using and embedding R/qtl2 in GN2. + +# Tasks + + +## R/qtl2 + +R/qtl2 is packaged in guix and can be run in a shell with + + +``` +guix shell -C r r-qtl2 +R +library(qtl2) +``` + +R/qtl2 also comes with many tests. When starting up with development tools in the R/qtl2 checked out git repo + +```sh +cd qtl2 +guix shell -C -D r r-qtl2 r-devtools make coreutils gcc-toolchain +make test +Warning: Your system is mis-configured: '/var/db/timezone/localtime' is not a symlink +i Testing qtl2 +Error in dyn.load(dll_copy_file) : +unable to load shared object '/tmp/RtmpWaf4td/pkgload31850824d/qtl2.so': /gnu/store/hs6jjk97kzafl3qn4wkdc8l73bfqqmqh-gfortran-11.4.0-lib/lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /tmp/RtmpWaf4td/pkgload31850824d/qtl2.so) +Calls: <Anonymous> ... <Anonymous> -> load_dll -> library.dynam2 -> dyn.load +Execution halted +make: *** [Makefile:9: test] Error 1 +``` + +not sure what the problem is yet. diff --git a/topics/meetings/gn-nairobi-2025.gmi b/topics/meetings/gn-nairobi-2025.gmi new file mode 100644 index 0000000..fb357a5 --- /dev/null +++ b/topics/meetings/gn-nairobi-2025.gmi @@ -0,0 +1,17 @@ +# Meeting Notes + +## 2024-01-10 +* @flisso: Prepare gn-uploader presentation for KEMRI. +* @flisso: Put c-elegans dataset to staging. +* @flisso: PHEWAS --- extract phenotypes from genenetwork and analyse them using PHEWAS. +* @alexm: Clean up R/Qtl 1. +* @alexm: Add R/Qtl 2 in gn. 
+* @alexm: Fix UI issues around GN AI. +* @bonfacem: Fix UI for group pages. +* @bonfacem: Add git hooks to cd container for self-hosted repositories. +* @bonfacem: Share developer work container and have Alex test it out. +* @bonfacem: Prepare RDF presentation for KEMRI. + +Nice to have: +* @bonfacem: Start dataset metadata editing work. +* @flisso: Write PhD concept note. diff --git a/topics/meetings/jnduli_bmunyoki.gmi b/topics/meetings/jnduli_bmunyoki.gmi index 5af7221..26621d1 100644 --- a/topics/meetings/jnduli_bmunyoki.gmi +++ b/topics/meetings/jnduli_bmunyoki.gmi @@ -1,5 +1,462 @@ # Meeting Notes +## 2024-10-15 +* DONE: @flisso: Follow up with the Medaka team on verification of genotype sample names +* DONE: @flisso: Understand uploader scripts and help improve them. +* CANCELLED: @flisso: Set up virtuoso. @bonfacem shall share notes on this. +* NOT DONE: @flisso: Write PhD concept note. +* DONE: @alexm @jnduli: R/Qtl script. +* DONE: @bonfacem: Test the production container locally and provide @fredm some feedback. +* DONE: @bonfacem: Wrap-up re-writing gn-guile to be part of genenetwork-webservices. +* NOT DONE: @bonfacem: Start dataset metadata editing work. + +## 2024-10-08 +* NOT DONE: @bonfacem RIF Indexing for RIF page in Xapian. +* IN PROGRESS: @bonfacem: Test the production container locally and provide @fredm some feedback. +* IN PROGRESS: @bonfacem: Re-writing gn-guile to be part of genenetwork-webservices. +* NOT DONE: @shelbys @bonfacem: Getting RDF into R2R. +* NOT DONE: @flisso: Follow up with the Medaka team on verification of genotype sample names. NOTE: Medaka team are yet to respond. +* IN PROGRESS: @flisso: Figure out how to add C Elegans data in staging. NOTE: Got access to staging server. Ran example tests. Still working on some errors. +* NOT DONE: @flisso: Set up virtuoso. @bonfacem shall share notes on this. +* NOT DONE: @flisso: Write PhD concept note. NOTE: Doing some lit review. 
+* @shelbys: Be able to test things on lambda01 for LLM tests. +* @alexm @jnduli: R/Qtl script. + +## 2024-10-18 +* IN-PROGRESS: @priscilla @flisso: Set up mariadb and virtuoso to test out some GN3 endpoints. NOTE: Mariadb set-up +* NOT DONE: @priscilla @flisso @bmunyoki: Improve docs while hacking on the above. +* DONE: @jnduli Remove gn-auth code from GN3. +* DONE: @jnduli Resolve current issue with broken auth in gn-qa. +* DONE: @jnduli @alexm Work on the R/Qtl design doc. +* IN-PROGRESS: @alexm: R/Qtl script. NOTE: Reviewed by @jnduli. +* DONE: @flisso MIKK genotyping. NOTE: Verification pending from Medaka team. +* DONE: @flisso Make sure we have C Elegans and HS Rats dataset to testing, and have the genotyping pipeline working. NOTE: Issues with tux02 staging server. +* DONE: @shelbys: Modify existing Grant write-up for pangenomes. NOTES: Some more edits to be done. +* NOT DONE: @shelbys @bonfacem: Getting RDF into R2R. +* NOT DONE: @bonfacem RIF Indexing for RIF page in Xapian. +* DONE: @bonfacem Work on properly containerizing gn-guile. NOTE: Send in patches to @alexm, @aruni, and @fredm to review later today. +* DONE: @bonfacem: Fix the virtuoso CI job in CD: NOTE: I'm awaiting feedback from @arun/@fredm. + +## 2024-10-11 +* WIP @priscilla @flisso: Try out API endpoints that don't require auth. NOTE: Priscilla got to set-up guix channels for gn3. Felix ran into problems. Priscilla set up the MySQL in her Ubuntu system. +* NOT DONE: @jnduli Harden hook system for gn-auth. +* WIP: @jnduli Remove gn-auth code from GN3. NOTE: Sent latest patches to Fred. Running issue, some patches may have caused gn-qa to fail. +* DONE: @jnduli @bonfacem Finish up RIF Editing project. +* NOT DONE: @jnduli @alexm Create issue on describing the monitoring system. +* NOT DONE: @jnduli @alexm Create issue on prompt engineering in GN to improve what we already have. +* WIP: @alex Work on R/Qtl. NOTE: @jnduli/@bonfacem help out with this. 
NOTE: Finished writing the design doc for gn-qa. +* DONE: Looked at documentation for R/Qtl. +* NOT DONE: @alex: Review @bmunyoki's work on RIF/Indexing. +* WIP: @flisso: Make sure we have C Elegans dataset and MIKK genotypes to production. NOTE: Issues with data entry scripts. Fred/Zach working to set up test environment. +* WIP: @flisso: MIKK genotyping. NOTE: Still testing the pipeline. Halfway there. +* NOT DONE: @flisso: Make sure we have HS Rats in testing stage. +* WIP: @flisso: Make progress in learning back-end coding WRT GN. NOTE: Issue setting up GN3. +* WIP: @shelbys: Modify existing Grant write-up for pangenomes. NOTE: Reviewed by Pj and Eric. More mods based on feedback. Paper got accepted by bioRxiv. Added some docs to R2R evaluation code. +* DONE: @shelbys: Finish getting all the R2R scores from the first study. NOTE: Got scores for all the scores from first papers using R2R instead of Fahamu. +* NOT DONE: @bonfacem RIF Indexing for RIF page in Xapian. +* WIP: @bonfacem Work on properly containerizing gn-guile. +* DONE: @bonfacem Fix the gn-transform-database in CI. Sent patches to Arun for review. +* DONE: @bonfacem Fixed broken utf-8 characters in gn-gemtext. + +## 2024-10-04 +* IN PROGRESS: @priscilla @bonfacem Setting up GN3. @priscilla try out API endpoints that don't require auth. NOTE: @priscilla Able to set up guix as a package manager. Trouble with Guix set-up with GN3. @bonfacem good opportunity to improve docs in GN3. +* IN PROGRESS: @jnduli Harden hook system for gn-auth. +* IN PROGRESS: @jnduli Remove gn-auth code from GN3. +* DONE: @jnduli Finish UI changes for RIF editing. NOTE: Demo done in GN Learning team. +* IN PROGRESS: @alex Work on R/Qtl. NOTE: Met with Karl Broman/PJ. Been reading the docs. Will track this issue in GN. +* NOT DONE: @alex @bonfacem Work on properly containerizing gn-guile. +* DONE: @bonfacem API/Display of NCBI Rif metadata. +* IN PROGRESS: @bonfacem @alex RIF Indexing for RIF page in Xapian. 
+* IN PROGRESS: @flisso Push data to production. Commence work on Arabidopsis data and HS Rats data. NOTE: C-Elegans in process of being pushed to testing server, then later production. WIP with HS Rats data in collab with Palmer. +* DONE: @flisso: Learning how to use SQL WRT C Elegans data. +* IN PROGRESS: @shelbys Re-formatting grant to use pangenomes. Waiting for Garisson for feedback. +* DONE: @shelbys Got the R2R for the human generated questions. TODO: Run this for GPT 4.0 model. + +## 2024-09-27 + +* DONE: @jnduli @bonfacem @alex Look at base files refactor and merge work. +* DONE: @priscilla continue to upload more papers. NOTE: Uploaded an extra 200 papers. +* NOT DONE: @priscilla @flisso Set up GN3. Goal is to be able to query some APIs in cURL. +* IN PROGRESS: @jnduli Improve hook systems for gn-auth. NOTE: Still figuring out a cleaner implementation for some things. +* IN PROGRESS: @jnduli Trying to remove auth code from GN3. NOTE: Idea, though unsure about safety. @fred to review work and make sure things are safe. +* DONE: @jnduli @bonfacem @alex Push most recent changes to production. Figure out what needs doing. NOTE: @Zach is in charge of deployment. @fredm is working on the production container. +* DONE: @alex Close down remaining issues on issue tracker. NOTE: Merged work on cleaning up base files. Few more minor modifications to the UI. +* NOT DONE: @alex investigate the dumped static files for encoding issues. +* IN PROGRESS: @bonfacem NCBI Metadata - Modelling and Display. NOTE: Done with the modelling. Almost done with API/UI work. +* DONE: @bonfacem Fix broken CD tests. NOTE: We have tests running inside the guix build phase. +* IN-PROGRESS: @flisso Continue work on uploading datasets: C Elegans and MIKK. NOTE: Managed to create data files that need to be uploaded to the testing gn2 stage server. +* NOT DONE: @flisso @jnduli help @flisso with SQL. 
+ +## 2024-09-20 +* NOT DONE: @priscilla @flisso @bmunyoki @jnduli set up GN ecosystem and review UI PRs +* DONE: @priscilla continue to upload more papers. NOTE: Shared access to drive to @bmunyoki. We are at 800 papers. +* DONE: @bmunyoki update tux02/01 with recent RIF modifications +* DONE: @jnduli Finish up experiments on hook system. NOTE: Patches got merged. Needs to make some things more concrete. +* NOT DONE: @alex @bonfacem investigate the dumped static files for encoding issues. +* DONE: Refactoring base files for GN2. +* IN PROGRESS: @flisso: Continue work on uploading datasets: C Elegans and MIKK. Note: Waiting for the original MIKK genotype file from the Medaka team. C Elegans yet to process the annotation file---some info is missing. +* NOT DONE: @flisso: Do code reviews on Sarthak's script. +* NOT DONE: @bmunyoki NCBI Metadata - Modelling and Display. +* DONE: @bmunyoki update tux02/01 with recent RIF modifications. NOTE: CD tests are broken and need to be fixed. + +## 2024-09-13 +* NOT DONE: @jnduli @bmunyoki fetch ncbi metadata and display them in GN2 +* DONE: @jnduli @bmunyoki add auth layer to edit rifs functionality +* DONE: @jnduli complete design doc for hooks system for gn-auth. NOTE: More experimentation with this. +* DONE: @jnduli @alex bug fixes for LLM integration. +* DONE: @priscilla added more papers to the LLM ~ 250 papers. +* NOT DONE: @priscilla @flisso @bmunyoki @jnduli set up GN ecosystem and review UI PRs +* DONE: @bmunyoki modify edit api to also write to RIF +* NOT DONE: @bmunyoki update tux02/01 with recent RIF modifications +* DONE: @bmunyoki Add test cases for RDF +* DONE: @alex Bug fix for session expiry. +* DONE: @alex Update links for static content to use self-hosted git repo. +* IN PROGRESS: @flisso Upload C Elegans Dataset. Nb: MIKK one has some issues, so work is paused for now. NOTE: Waiting for annotation and phenotype file for the C Elegans Dataset. +* DONE @flisso: Reviewed gemma wrapper scripts. 
+ + +Nice to have: +* @bmunyoki build system container for gn-guile and write documentation for creating containers + +## 2024-09-06 + +* DONE: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* DONE: @bmunyoki update server to include latest code changes +* IN PROGRESS: @bmunyoki modify edit api to also write to RIF +* NOT DONE: @bmunyoki build system container for gn-guile and write documentation for creating containers +* DONE: @bmunyoki @flisso update case attributes to capture hierarchy info +* DONE: @bmunyoki prepare presentation for RIF work to GN learning team (goal is to present on Wednesday next week) +* NOT DONE: @bmunyoki update tux02/01 with recent RIF modifications +* NOT DONE: @jnduli @bmunyoki fetch ncbi metadata and display them in GN2 +* NOT DONE: @jnduli complete design doc for hooks system for gn-auth; Focus for next week. +* DONE: @alexm @jnduli integrate LLM in GN2 and GN3: On the look-out for bug-fixes. +* IN PROGRESS: @jnduli add auth layer to edit rifs functionality +* DONE: @flisso generate genotype file on Medaka fish dataset: @arthur to have a look at this. +* IN PROGRESS: @flisso code reviews for gemma-wrapper with @pjotr +* DONE: @flisso update gemtext documentation +* DONE: @flisso help Masters students with their proposal defences +* @priscilla add more papers to LLM +* NOT DONE: @priscilla @flisso @bmunyoki @jnduli set up GN ecosystem and review UI PRs + + +## 2024-09-02 (Sync with @flisso+@bonfacem) + +### Case-Attributes + +* @bmunyoki understood case attributes by reverse-engineering the relevant tables from GeneNetwork's database. + +* One source of confusion for @bmunyoki is that we have the same "CaseAttribute.Name" that applies to different strains. Example Query: + +``` +SELECT * FROM CaseAttribute JOIN CaseAttributeXRef ON CaseAttribute.CaseAttributeId = CaseAttributeXRef.CaseAttributeId WHERE CaseAttribute.Name = "Sex"\G +``` + +* @rob wants fine-grained access control with case attributes. 
+ +* @flisso, case-attributes are GN invention. Case Attributes are extra metadata about a given dataset beyond the phenotype measurements. E.g. We can have the phenotype: "Central nervous system"; whereby we collect the values, and SE. However, we can also collect extra metadata like "Body Weight", "Sex", "Status", etc, and in GN terminology, that extra metadata is called Case Attributes. + +* @bmunyoki. Most of the confusion around case-attributes is because of how we store case-attributes. We don't have unique identifiers for case-attributes. + +## 2024-08-30 + +* IN PROGRESS: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* DONE: @bmunyoki and @alex help Alex deploy gn-guile code on tux02, run this in a tmux session. +* DONE: @bmunyoki api for history for all tasks +* DONE: @bmunyoki UI layer for RDF history +* @bmunyoki modify edit api to also write to RIF +* @bmunyoki build system container for gn-guile and write documentation for creating containers +* NOT DONE: @jnduli complete design doc for hooks system for gn-auth +* DONE: @alexm @jnduli create branches to testing for LLM in GN2 and GN3 +* IN PROGRESS: @alexm @jnduli integrate LLM in GN2 and GN3 +* IN PROGRESS: @jnduli add auth layer to edit rifs functionality +* DONE: @bmunyoki @felix sync on case attributes and document +* DONE: @flisso managed to upload <TODO> dataset to production + + +### nice to haves + +* nice_to_have: @bmunyoki experiment and document updating gn-bioinformatics set up packages (to support rshiny) + +## 2024-08-23 +* @shelby re-ingest data and run RAGAs against the queries already in the system to perform comparison with new papers. +* @shelby figure out Claude Sonnet stuff. +* IN PROGRESS: @felix @fred push RQTL bundles to uploader, also includes metadata. +* IN PROGESS: @felix look for means to fix metadata challenge ie. trouble associating data we upload and metadata that provides descriptions. +* DONE: @bmunyoki API: Get all RIF metadata by symbols from rdf. 
+* NOT DONE: @bmunyoki UI: Modify traits page to have "GN2 (GeneWiki)", to be picked after RDF is updated in tux02 +* DONE: @bmunyoki UI: Integrate with API +* NOT DONE: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* IN PROGRESS: @bmunyoki and @alex help Alex deploy gn-guile code on tux02. +* DONE: @bmunyoki @jnduli review gn2 UI change for markdown editor +* NOT DONE: @bmunyoki create template for bio paper +* DONE: @alex sync with Boni to set up gn-guile +* DONE: @alex @bmunyoki @jnduli sync to plan out work for llm integration +* DONE: @jnduli edit WIKI+RIF +* NOT DONE: @jnduli set up gn-uploader locally and improve docs +* NOT DONE: @jnduli complete design doc for hooks system for gn-auth +* DONE: @felix to document email threads on gemtext + +## 2024-08-22 + +=> https://issues.genenetwork.org/issues/edit-rif-metadata APIs for wiki editing and broke down wiki-editing task to sub-projects. + +## 2024-08-20 + +Integrating GNQA into the GN2 website: how will it work? + +1. Have the context information displayed to the right of the GN2 xapian search page +2. When someone clicks the context info page, it opens the search from GNQA which has all the references. +3. Cache queries since many searches are the same. + +Problems: + +1. search has xapian specific terminology. How do we handle this? Remove xapian prefixes and provide the key words to search. +2. how do we handle cache expiry? + - no expiry for now. + - store them in a database table. + - every quarter year, the search can be updated. + - group:bxd, species: mouse -> bxd mouse + mouse bxd: -> when caching, the ordering of the search terms shouldn't matter much. + +Game Plan: + +1. Productionize the code relating to LLM search. Get the code for LLMs merged into main branch. +2. UI changes to show the search context from LLM. +3. Figuring out caching: + - database table structure + - cache expiry (use 1 month for now) + - modify LLM search to pick from cache if it exists. +4. 
Have another qa branch that fixes all errors since we had the freeze. +5. Only logged in users will have access to this functionality. + +## 2024-08-16 +* @jnduli Fix failing unit tests on GN-Auth. +* @jnduli Exploring Mechanical Rob for Integration Tests. GN-Auth should be as stable as possible. +* @jnduli Research e-mail patch workflow and propose a sane workflow for GN through an engineering blog post. +* @jnduli Help @alexm with auth work. +* @felix @fred push RQTL bundles to uploader. +* @felix look for means to fix metadata challenge ie. trouble associating data we upload and metadata that provides descriptions. +* @felix @jnduli programming learning: started building a web server to learn backend using Flask. +* @felix @jnduli Read Shelby's paper and provide feedback by the end of Saturday. + +## 2024-08-16 +* DONE: @jnduli Fix failing unit tests on GN-Auth. +* NOT DONE: @jnduli Exploring Mechanical Rob for Integration Tests. GN-Auth should be as stable as possible. +* NOT DONE: @jnduli Research e-mail patch workflow and propose a sane workflow for GN through an engineering blog post. +* DONE: @jnduli Help @alexm with auth work. +* IN PROGRESS: @felix @fred push RQTL bundles to uploader, also includes metadata. +* IN PROGRESS: @felix look for means to fix metadata challenge ie. trouble associating data we upload and metadata that provides descriptions. +* DONE: @felix @jnduli programming learning: started building a web server to learn backend using Flask. Learning html and css and will share the progress with this. +* DONE: @felix ~@jnduli~ Read Shelby's paper and provide feedback by the end of Saturday. +* DONE: @felix tested the time tracker script. +* IN PROGRESS: @bmunyoki implementation code work to edit Rif + WIki SQL n RDF data. We'll break this down. +* @bmunyoki and @alex help Alex deploy gn-guile code on tux02. +* NOT DONE: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* @shelby @bonfacem @alex Integrate QNQA Search to global search. 
+* @shelby handling edits with the current open paper + +Nice To Have: +* DEPRIORITIZED: @felix figure out how to fix large data uploads ie. most data sets are large e.g. 45GB. Uploader cannot handle these large files. +* DONE: @felix Try out John's time tracking tool and provide feedback. +* @shelby run RAGAs against the queries already in the system to perform comparison with new papers: re-ingesting, now at 1500 papers. +* @bmunyoki Send out emails to the culprit on failing tests in CI/CD. + +## 2024-08-15 +### RTF Editing (bmunyoki+alexm) + +In our static content, we don't really store RTF; instead we store HTML. As an example, compare these 2 documents and note their difference: + +=> https://github.com/bitfocus/rtf2text/blob/master/sample.rtf [Proper RTF] sample.rtf +=> https://github.com/genenetwork/gn-docs/blob/master/general/datasets/Br_u_1203_rr/acknowledgment.rtf [GN] acknowledgement.rtf + +* TODO @alexm Rename all the *rtf to *html during transform to make things clearer. Send @bonfacem PR. + +## 2024-08-13 +### Markdown Editor (bmunyoki+alexm) + +* @alexm @bonfacem Tested the Markdown Editor locally and it works fine. Only issue is that someone can make edits without logging in. +* API end-points to be only exposed locally. +* @alexm: Fix minor bug for when showing the diff. Have a back arrow. +* @bonfacem, @alexm: Deploy gn-guile; make sure it's only exposed locally. +* [blocking] @alexm having issues setting up gn-auth. @jnduli to help out to set up gn-auth and work out any quirks. @alexm to make sure you can't make edits without being logged in. +* @bmunyoki to merge ge-editor UI work once basic auth is figured out. +* [nice-to-have] @alexm work on packaging: "diff2html-ui.min.js", "diff.min.js", "marked.min.js", "index.umd.js", "diff2html.min.js". +* [nice-to-have] @alexm to check-out djlint for linting jinja templates. +* @bonfacem share pre-commit hooks for setting up djlint and auto-pep8. 
+* [nice-to-have] @alexm to check out: + +> djlint gn2/wqflask/templates/gn_editor.html --profile=jinja --reformat --format-css --format-js +=> https://www.djlint.com/ dj Lint; Lint & Format HTML Templates + +## 2024-08-09 + +* @shelby figure out Claude Sonnet stuff: NOT DONE, main focus was on the paper +* @shelby planning session for next work and tasks for Priscilla. DONE: Priscilla was given some work. Loop in Priscilla for our meetings. +* @shelby format output for ingested paper so that we can test the RAG engine. IN PROGRESS. Most focus has been on editing paper and some funding pursuit. +* @shelby run RAGAs against the queries already in the system to perform comparison with new papers. NOT DONE. +* @bmunyoki implementation code work to edit RIF + Wiki SQL and RDF data. IN PROGRESS. Updated the RDF transform for geneWIKI; Now we can do a single GET for a single comment in RDF. +* @bmunyoki @shelby group paper on dissertation to target Arxiv. NOT DONE. +* @bmunyoki and @alex help Alex deploy gn-guile code on tux02. NOT DONE. Currently auth is a blocker. +* @bmunyoki review UI code editor work. DONE. +* @alex address comments in UI work. DONE. +* @felix @fred push RQTL bundles to uploader. In Progress: OOM Killer killing upload process. +* @felix look for means to fix metadata challenge ie. trouble associating data we upload and metadata that provides descriptions. The metadata doesn't meet requirements. In Progress: Some things to be confirmed with Rob/PJ on coming up with a good format for adding metadata. NOT DONE. +* @felix figure out how to fix large data uploads ie. most data sets are large e.g. 45GB. Uploader cannot handle these large files. +* @felix @jnduli programming learning: started building a web server to learn backend using Flask. NOT DONE. +* @felix (@bmunyoki / @alex) learning emacs so that he figures out how to track times. @jnduli shared his time-tracking tool with @felix. DONE. +* @jnduli fix group creation bug in gn-auth. 
DONE: Group creation wasn't exactly a bug; updated docs, and fixed the masquerade API. +* @jnduli edit rif metadata using gn3. NOT DONE +* @jnduli update documentation for gn-auth setup. DONE +* @jnduli investigate more bugs related to gn-auth. DONE + +Note: When setting up sync between @jnduli and @felix, add @bmunyoki too. + + +## 2024-08-02 + +* DONE: @bmunyoki virtuoso and xapian updated in prod +* @bmunyoki code work to edit RIF + Wiki SQL and RDF data: WIP, we have desired API, but we need to implement code. +* NOT DONE: @bmunyoki group paper on dissertation to target Arxiv +* DONE: @bmunyoki fix case insensitivity in Xapian search +* DONE: @jnduli review Alex patches +* DONE: @bmunyoki: updated gn2 and gn3 on git.genenetwork server. Shared QA code with @shelby on a special branch. +* @bmunyoki @jnduli: fixed minor bug on xapian reflected with stemming. +* @shelby figure out Claude Sonnet stuff: NOT DONE, main focus was on the paper +* IN PROGRESS: @shelby edit paper with @pjotr +* @shelby planning session for next work and tasks for Priscilla. +* @shelby use RAGAS to test R2R with the new papers (follow up on the ingestion of papers tasks) +* @shelby and @boni to discuss R2R and interfacing with Virtuoso: deprioritized, we'll figure out interfacing with R2R. Implementation to happen later. +* DONE: @jnduli get up to speed on gn-auth +* @alex have an instance of gn-guile running on production: Code in prod, but needs to liaise with Boni to get this working. +* @jgart getting genecup and rshiny containers to run as normal users instead of root users. May use libvirts APIs; or podman/docker as normal user; or rewriting the services as guix home services: system container doesn't have work around this, there's no work around. Because guix by default needs root to run as a system container. We also need sudo since at root level we define our system containers in a systemd that needs to be run as root. Why systemd? Systemd no one needs to run this. 
+ +### Meeting with Sevila on Masters Papers + +- mainly stylistic changes provided. +- provide an email explaining how long ethical review took, so that he follows up on unexpected delays. +- met up with Dr Betsy, once done with defences in October (hopefully), and Boni may get his degree before graduation next year, to facilitate Boni applying for PhD. + +### Guix Root Container + +- With docker, to prevent the need for sudo, we usually create a docker group, and add the users that need to run it to this group. Can this happen in guix? +- Guix has a guix group. Why haven't we done this??? @jgart and @boni + +## 2024-07-26 +Plan for this week: + +* NOT DONE, needs a meeting: @bmunyoki virtuoso and xapian are up-to-date in prod. Boni doesn't have root access in production, so coordination with Fred and Zach is causing delays. +* API design DONE, actual CODE incomplete: @bmunyoki update RIF+WIKI on SQL and RDF from gn2 website +* DONE: @bmunyoki and @shelby review dissertation for Masters +* DONE, needs to review new changes: @bmunyoki and @jgart to review patches for `genecup` and `rshiny`. +* @bmunyoki and @jnduli to review patches for markdown parser +* DONE, patches sent. @alexm add validation and document to markdown parser. +* DONE: @shelby ingest ageing data to RAG, 10% left to complete. +* DONE: @shelby do another round for editing on the AI paper +* IN PROGRESS: @shelby RAG engine only works with OpenAI, figure out Claude Sonnet integration +* IN PROGRESS: @jnduli get up to speed on gn-auth +* @jgart enabling acme service in genecup and rshiny containers. +* @jnduli and @bmunyoki to attempt to get familiar with R2R + +Nice to have: +* @bmunyoki fix CI job for GN transformer database i.e. instead of checksums just run full job once per month: scheme script created that dumps the example files, next step is to create Gexp that runs this script. Bandwidth constraints. 
+ +## 2024-07-23 +### LLM Meeting (@shelby+@bmunyoki) +* There's no clear way of ingesting human-readable data with context into the RAG Graph from RDF. +* What specific graph should we ingest into the RAG Graph from RDF? @bmunyoki suggested RIF, PubMed Metadata. We'll figure this out. +* @bonfacem recommended: Much better to work with SPARQL than directly with TTL files. +* We've uploaded rdf triples, yet they loose their strength as the RAG system is not undergirded with a knowledge graph. @bonfacem should read the following for more context and should reach out to @shelby on how to move forward with SPARQL more concretely: + +=> https://r2r-docs.sciphi.ai/cookbooks/knowledge-graph#r2r-knowledge-graph-configuration + +* We need to test the knowledge graph backend of R2R to see how feasible it is to use with the existing data (RDF). +* Fahamu just stored the object and lost the subject+predicate +* Loop in Alex. + + +## 2024-07-19 +Plan for this week: + +* DONE: @jgart getting `genecup` app to run in a guix container i.e. `gunicorn service` should then run `genecup`, similar to how gn2 and gn-uploader work. Patches sent to Boni, include `genecup` and `rshiny` and the container patches are tested. +* @jgart enable acme certificates for `genecup` container: Should just enable a single form, let's use arun's email since its what we use for all our services. Reverse proxy happens inside the container. Add a comment explaining that this shouldn't be standard python set up. +* INPROGRESS: @bmunyoki virtuoso and xapian are up-to-date in prod: +* NOT DONE: @bmunyoki update RIF+WIKI on SQL and RDF from gn2 website +* INPROGRESS: @bmunyoki fix CI job for GN transformer database i.e. instead of checksums just run full job once per month: scheme script created that dumps the example files, next step is to create Gexp that runs this script. Bandwidth constraints. +* @bmunyoki and @shelby review dissertation for Masters: @bonz needs to send updated version. 
Also reviewed another masters by Johannes. +* ON HOLD: @alexm rewrite UI code using htmx +* INPROGRESS: @alexm address review comments in markdown parser. Api endpoints are getting reimplemented. Needs to add validation and documentation and send v2 patches for review. +* DONE: @shelby compile ingesting 500 more papers into RAG engine +* @shelby ingesting ageing research into the RAG engine: diabetes reseach is ingested, ageing will be done later. +* NOT DONE: @shelby RAG engine only works with OpenAI, figure out Claude Sonnet integration +* DONE: @shelby @bmunyoki @alexm to define the problem with RDF triple stores +* DONE: @jnduli finish up on RIF update +* IN PROGRESS: @jnduli get up to speed on gn-auth + +AOB + +* RAG engine uses R2R for the integration. It would be great if we could integrate this into guix. @shelby will send @jgart the paper on how we use the RAG. + + +## 2024-07-12 + +Plan for this week: + +* @shelby use Claude Sonnet with R2R RAG engine with 1000 papers and fix bugs: 500 papers ingested into R2R, remaining with 500. +* @shelby final run through for paper 1 before Pjotr's review. DONE, configurations fixed. New repo gnai that contains the results and will contain R2R stuff. +* NOT DONE: @shelby and @bmunyoki review dissertation paper for Masters +* @shelby @bmunyoki @alexm to define the problem with RDF triple stores +* @alexm integrate the markdown parser: DONE, patches sent to Boni +* @alexm rewrite UI code using htmx: NOT DONE +* @bmunyoki investigate why xapian index isn't getting rebuilt: DONE +* @bmunyoki investigate discrepancies between wiki and rif search: DONE, get this to prod to be tested +* @jnduli update the generif_basic table from NCBI: IN PROGRESS. +* @jnduli blog post of preference for documentation: DONE. + +We have qa.genenetwork.com. We need to have this set up to `qa.genenetwork.com/paper1` so that we always have the system that was used for this. How? 
+ +Nice to Haves + +* @bmunyoki Nice to have tag for paper1: Fix this with Boni and get done later on/iron them out then. +* @bmunyoki fix CI job that transforms gn tables to TTL: Move this to running a cron job once per month instead of + + +## 2024-06-24 + +Plan for this Week: + +* CANCELED: @bmunyoki Remove boolean prefixes from search where it makes sense. +* DONE: @bmunyoki GeneWiki + GeneRIF search in production. Mostly needs to be run in prod to see impact. +* DONE: @jnduli Children process termination when we kill the main index-genenetwork script +* CANCELED: @bmunyoki Follow up on getting virtuoso child pages in production +* IN PROGRESS @alexm push endpoints for editting and making commits for markdown files +* DONE: @all Reply to survey from Shelby +* DONE: @jnduli Fix JS import orders (without messing up the rest of Genenetwork) +* DONE: @jnduli fix search results when nothing is found +* CANCELED: @jnduli test out running guix cron jobs locally +* NOT DONE: @Jnduli mention our indexing documentation in gn2 README + +Note: For qa.genenetwork.com, we chose to pause work on this until papers are done. + +Review for last week + +* DONE: @bmunyoki rebuild guix container with new mcron changes +* WIP: @jnduli attempts to make UI change that shows all supported keys in the search: Blocked because our JS imports aren't ordered correctly and using `boolean_prefixes` means our searches don't work as we'd expect. +* WIP: @bmunyoki create an issue with all the problems experienced with search and potential solutions. Make sure it has replication steps, and plans for solutions. Issue was created but we need to get a better understanding for how cis and trans searches work. +* TODO: @bmunyoki and @jnduli genewiki indexing: PR for WIKI indexing is completed, but we didn't test it out due to the outage caused by RAM and our script. We don't have a way to easily instrument how much RAM our process uses and how to kill the process. 
+* DONE: @bmunyoki demoes and documents how to run and test guix cron job for indexing +* DONE: @bmunyoki trains @jnduli on how to review patchsets from emails +* DONE: @jnduli Follow up notes on setting up local index-genenetwork search +* DONE: @alexm handling with graduation, AFK +* TODO: @bmunyoki follow up with Rob to makes sure he tests search after everything is complete: He got some feedback and Rob is out of Town but wants RIF and Wiki search by July 2nd. + +Nice to haves: + +* TODO: minor: bonfacem makes sure that mypy/pylint in CI runs against the index-genenetwork script. +* TODO: @bmunyoki follow up how do we make sure that xapian prefix changes in code retrigger xapian indexing? + - howto: xapian prefix changes, let's maintain a hash for the file and store it in xapian + - howto: for RDF changes, since we have ttl files, if this ever changes we trigger the script. It's also nice to be able to automatically also load up data to virtuoso if this file changes. + + ## 2024-06-21 ### Outage for 2024-06-20 diff --git a/topics/octopus/lizardfs/README.gmi b/topics/octopus/lizardfs/README.gmi index 78316ef..7c91136 100644 --- a/topics/octopus/lizardfs/README.gmi +++ b/topics/octopus/lizardfs/README.gmi @@ -86,14 +86,23 @@ Other commands can be found with `man lizardfs-admin`. ## Deleted files -Lizardfs also keeps deleted files, by default for 30 days. If you need to recover deleted files (or delete them permanently) then the metadata directory can be mounted with: +Lizardfs also keeps deleted files, by default for 30 days in `/mnt/lizardfs-meta/trash`. 
If you need to recover deleted files (or delete them permanently) then the metadata directory can be mounted with: ``` $ mfsmount /path/to/unused/mount -o mfsmeta ``` For more information see the lizardfs documentation online -=> https://dev.lizardfs.com/docs/adminguide/advanced_configuration.html#trash-directory lizardfs documentation for the trash directory +=> https://lizardfs-docs.readthedocs.io/en/latest/adminguide/advanced_configuration.html#trash-directory lizardfs documentation for the trash directory + +## Start lizardfs-mount (lizardfs reader daemon) after a system reboot + +``` +sudo bash +systemctl daemon-reload +systemctl restart lizardfs-mount +systemctl status lizardfs-mount +``` ## Gotchas diff --git a/topics/octopus/maintenance.gmi b/topics/octopus/maintenance.gmi new file mode 100644 index 0000000..65ea52e --- /dev/null +++ b/topics/octopus/maintenance.gmi @@ -0,0 +1,98 @@ +# Octopus/Tux maintenance + +## To remember + +`fdisk -l` to see disk models +`lsblk -nd` to see mounted disks + +## Status + +octopus02 +- Devices: 2 3.7T SSDs + 2 894.3G SSDs + 2 4.6T HDDs +- **Status: Slurm not OK, LizardFS not OK** +- Notes: + - `octopus02 mfsmount[31909]: can't resolve master hostname and/or portname (octopus01:9421)`, + - **I don't see 2 drives that are physically mounted** + +octopus03 +- Devices: 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK +- Notes: **I don't see 2 drives that are physically mounted** + +octopus04 +- Devices: 4 7.3 T SSDs (Neil) + 1 4.6T HDD + 1 3.7T SSD + 2 894.3G SSDs +- Status: Slurm NO, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus05 +- Devices: 1 7.3 T SSDs (Neil) + 5 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK +- Notes: no + +octopus06 +- Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus07 +- Devices: 1 7.3 T SSDs (Neil) + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK 
+- Notes: **I don't see 1 device that is physically mounted** + +octopus08 +- Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus09 +- Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus10 +- Devices: 1 7.3 T SSDs (Neil) + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: **I don't see 1 device that is physically mounted** + +octopus11 +- Devices: 1 7.3 T SSDs (Neil) + 5 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK +- Notes: no + +tux05 +- Devices: 1 3.6 NVMe + 1 1.5T NVMe + 1 894.3G NVMe +- Status: Slurm OK, LizardFS OK (we don't share anything) +- Notes: **I don't have a picture to confirm physically mounted devices** + +tux06 +- Devices: 2 3.6 T SSDs (1 from Neil) + 1 1.5T NVMe + 1 894.3G NVMe +- Status: Slurm OK, LizardFS (we don't share anything) +- Notes: + - **Last picture reports 1 7.3 T SSD (Neil) that is missing** + - **Disk /dev/sdc: 3.64 TiB (Samsung SSD 990): free and usable for lizardfs** + - **Disk /dev/sdd: 3.64 TiB (Samsung SSD 990): free and usable for lizardfs** + +tux07 +- Devices: 3 3.6 T SSDs + 1 1.5T NVMe (Neil) + 1 894.3G NVMe +- Status: Slurm OK, LizardFS +- Notes: + - **Disk /dev/sdb: 3.64 TiB (Samsung SSD 990): free and usable for lizardfs** + - **Disk /dev/sdd: 3.64 TiB (Samsung SSD 990): mounted at /mnt/sdb and shared on LIZARDFS: TO CHECK BECAUSE IT HAS NO PARTITIONS** + +tux08 +- Devices: 3 3.6 T SSDs + 1 1.5T NVMe (Neil) + 1 894.3G NVMe +- Status: Slurm OK, LizardFS +- Notes: no + +tux09 +- Devices: 1 3.6 T SSDs + 1 1.5T NVMe + 1 894.3G NVMe +- Status: Slurm OK, LizardFS +- Notes: **I don't see 1 device that is physically mounted** + +## Neil disks +- four 8TB SSDs on the right of octopus04 +- one 8TB SSD in the left slot of octopus05 +- six 8TB SSDs bottom-right slot of 
octopus06,07,08,09,10,11 +- one 4TB NVMe and one 8TB SSDs on tux06, NVME in the bottom-right of the group of 4 on the left, SSD on the bottom-left of the group of 4 on the right +- one 4TB NVMe on tux07, on the top-left of the group of 4 on the right +- one 4TB NVMe on tux08, on the top-left of the group of 4 on the right diff --git a/topics/octopus/recent-rust.gmi b/topics/octopus/recent-rust.gmi new file mode 100644 index 0000000..7ce8968 --- /dev/null +++ b/topics/octopus/recent-rust.gmi @@ -0,0 +1,76 @@ +# Use a recent Rust on Octopus + + +For impg we currently need a rust that is more recent than what we have in Debian +or Guix. No panic, because Rust has few requirements. + +Install latest rust using the script + +``` +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +``` + +Set path + +``` +. ~/.cargo/env +``` + +Update rust + +``` +rustup default stable +``` + +Next update Rust + +``` +octopus01:~/tmp/impg$ . ~/.cargo/env +octopus01:~/tmp/impg$ rustup default stable +info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu' +info: latest update on 2025-05-15, rust version 1.87.0 (17067e9ac 2025-05-09) +info: downloading component 'cargo' +info: downloading component 'clippy' +info: downloading component 'rust-docs' +info: downloading component 'rust-std' +info: downloading component 'rustc' +(...) 
+``` + +and build the package + +``` +octopus01:~/tmp/impg$ cargo build +``` + +Since we are not in guix we get the local dependencies: + +``` +octopus01:~/tmp/impg$ ldd target/debug/impg + linux-vdso.so.1 (0x00007ffdb266a000) + libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe404001000) + librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe403ff7000) + libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe403fd6000) + libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe403fd1000) + libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe403e11000) + /lib64/ld-linux-x86-64.so.2 (0x00007fe404682000) +``` + +Log in on another octopus node, say octopus02, and you can run impg from this directory: + +``` +octopus02:~$ ~/tmp/impg/target/debug/impg +Command-line tool for querying overlaps in PAF files + +Usage: impg <COMMAND> + +Commands: + index Create an IMPG index + partition Partition the alignment + query Query overlaps in the alignment + stats Print alignment statistics + +Options: + -h, --help Print help + -V, --version Print version +``` diff --git a/topics/octopus/set-up-guix-for-new-users.gmi b/topics/octopus/set-up-guix-for-new-users.gmi new file mode 100644 index 0000000..f459559 --- /dev/null +++ b/topics/octopus/set-up-guix-for-new-users.gmi @@ -0,0 +1,38 @@ +# Set up Guix for new users + +This document describes how to set up Guix for new users on a machine in which Guix is already installed (such as octopus01). + +## Create a per-user profile for yourself by running your first guix pull + +"Borrow" some other user's guix to run guix pull. In the example below, we use root's guix, but it might as well be any guix. +``` +$ /var/guix/profiles/per-user/root/current-guix/bin/guix pull +``` +This should create your very own Guix profile at ~/.config/guix/current. You may invoke guix from this profile as +``` +$ ~/.config/guix/current/bin/guix ... +``` +But, you'd normally want to make this more convenient. 
So, add ~/.config/guix/current/bin to your PATH. To do this, add the following to your ~/.profile +``` +GUIX_PROFILE=~/.config/guix/current +. $GUIX_PROFILE/etc/profile +``` +Thereafter, you may run any guix command simply as +``` +$ guix ... +``` + +## Pulling from a different channels.scm + +By default, guix pull pulls the latest commit of the main upstream Guix channel. You may want to pull from additional channels as well. Put the channels you want into ~/.config/guix/channels.scm, and then run guix pull. For example, here's a channels.scm if you want to use the guix-bioinformatics channel. +``` +$ cat ~/.config/guix/channels.scm +(list (channel + (name 'gn-bioinformatics) + (url "https://git.genenetwork.org/guix-bioinformatics") + (branch "master"))) +``` +And, +``` +$ guix pull +``` diff --git a/topics/octopus/slurm-upgrade.gmi b/topics/octopus/slurm-upgrade.gmi new file mode 100644 index 0000000..822f68e --- /dev/null +++ b/topics/octopus/slurm-upgrade.gmi @@ -0,0 +1,89 @@ +# How to upgrade slurm on octopus + +This document closely mirrors the official upgrade guide. The official upgrade guide is very thorough. Please refer to it and update this document if something is not clear. +=> https://slurm.schedmd.com/upgrades.html Official slurm upgrade guide + +## Preparation + +It is possible to upgrade slurm in-place without upsetting running jobs. But, for our small cluster, we don't mind a little downtime. So, it is simpler if we schedule some downtime with other users and make sure there are no running jobs. + +slurm can only be upgraded safely in small version increments. For example, it is safe to upgrade version 18.08 to 19.05 or 20.02, but not to 20.11 or later. This compatibility information is in the RELEASE_NOTES file of the slurm git repo with the git tag corresponding to the version checked out. Any configuration file changes are also outlined in this file. 
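The "small version increments" rule above can be sketched as a quick pre-flight check. This is only an illustration: `can_upgrade` and `release_index` are hypothetical helpers, not part of slurm, and the release list and the two-release window are assumptions taken from the 18.08 example. RELEASE_NOTES of the target version remains the authority.

```shell
# Hand-maintained list of slurm releases, oldest first (an assumption:
# extend it yourself and cross-check against RELEASE_NOTES).
RELEASES="18.08 19.05 20.02 20.11 21.08 22.05"

# Print the position of release $1 in RELEASES; fail if it is unknown.
release_index() {
    i=0
    for r in $RELEASES; do
        if [ "$r" = "$1" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Succeed only if $2 is one or two releases newer than $1.
# Returns 2 when either version is not in the list.
can_upgrade() {
    from=$(release_index "$1") || return 2
    to=$(release_index "$2") || return 2
    step=$((to - from))
    [ "$step" -ge 1 ] && [ "$step" -le 2 ]
}
```

For example, `can_upgrade 18.08 20.02 && echo safe` prints `safe`, while `can_upgrade 18.08 20.11` fails, matching the example in the text.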
+=> https://github.com/SchedMD/slurm/ slurm git repository + +## Backup + +Stop the slurmdbd, slurmctld, slurmd and slurmrestd services. +``` +# systemctl stop slurmdbd slurmctld slurmd slurmrestd +``` +Back up the slurm StateSaveLocation (/var/spool/slurmd/ctld in our case) and the slurm configuration directory. +``` +# cp -av /var/spool/slurmd/ctld /somewhere/safe/ +# cp -av /etc/slurm /somewhere/safe/ +``` +Back up the slurmdbd MySQL database. Enter the password when prompted. The password is specified in StoragePass of /etc/slurm/slurmdbd.conf. +``` +$ mysqldump -u slurm -p --databases slurm_acct_db > /somewhere/safe/slurm_acct_db.sql +``` + +## Upgrade slurm on octopus01 (the head node) + +Clone the gn-machines git repo. +``` +$ git clone https://git.genenetwork.org/gn-machines +``` +Edit slurm.scm to build the version of slurm you are upgrading to. Ensure it builds successfully using +``` +$ guix build -f slurm.scm +``` +Upgrade slurm. +``` +# ./slurm-head-deploy.sh +``` +Make any configuration file changes outlined in RELEASE_NOTES. Next, run the slurmdbd daemon, wait for it to start up successfully and then exit with Ctrl+C. During upgrades, slurmdbd may take extra time to update the database. This may cause systemd to time out and kill slurmdbd. So, we do it this way, instead of simply starting the slurmdbd systemd service. +``` +# sudo -u slurm slurmdbd -D +``` +Reload the new systemd configuration files. Then, start the slurmdbd, slurmctld, slurmd and slurmrestd services one at a time, ensuring that each starts up correctly before proceeding to the next. +``` +# systemctl daemon-reload +# systemctl start slurmdbd +# systemctl start slurmctld +# systemctl start slurmd +# systemctl start slurmrestd +``` + +## Upgrade slurm on the worker nodes + +Repeat the steps below on every worker node. + +Stop the slurmd service. +``` +# systemctl stop slurmd +``` +Upgrade slurm, passing slurm-worker-deploy.sh the slurm store path obtained from building slurm using guix build on octopus01. 
Recall that you cannot invoke guix build on the worker nodes. +``` +# ./slurm-worker-deploy.sh /gnu/store/...-slurm +``` +Copy over any configuration file changes from octopus01. Then, reload the new systemd configuration files and start slurmd. +``` +# systemctl daemon-reload +# systemctl start slurmd +``` + +## Tip: Running the same command on all worker nodes + +It is a lot of typing to run the same command on all worker nodes. You could make this a little less cumbersome with the following bash for loop. +``` +for node in octopus02 octopus03 octopus05 octopus06 octopus07 octopus08 octopus09 octopus10 octopus11 tux05 tux06 tux07 tux08 tux09; +do + ssh $node your command +done +``` +You can even do this for sudo commands using the -S flag of sudo that makes it read the password from stdin. Assuming your password is in the pass password manager, the bash for loop would then look like: +``` +for node in octopus02 octopus03 octopus05 octopus06 octopus07 octopus08 octopus09 octopus10 octopus11 tux05 tux06 tux07 tux08 tux09; +do + pass octopus | ssh $node sudo -S your command +done +```
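The loops above keep going silently when one node fails. A minimal sketch of a wrapper that remembers which nodes failed is shown here; `run_on_nodes` is a hypothetical name, and the `SSH` override exists only so the loop can be dry-run with `SSH=echo`.

```shell
# Worker nodes, as listed in the loops above; override NODES to narrow the set.
NODES="${NODES:-octopus02 octopus03 octopus05 octopus06 octopus07 octopus08 octopus09 octopus10 octopus11 tux05 tux06 tux07 tux08 tux09}"

# Hypothetical helper: run the given command on every node and
# report the nodes where it returned a non-zero exit status.
run_on_nodes() {
    failed=""
    for node in $NODES; do
        # SSH defaults to ssh; set SSH=echo to print what would be run.
        if ! ${SSH:-ssh} "$node" "$@"; then
            failed="$failed $node"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed on:$failed" >&2
        return 1
    fi
}
```

For example, `run_on_nodes systemctl is-active slurmd` checks the slurmd service everywhere and lists the nodes where it is not active.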
\ No newline at end of file diff --git a/topics/programming/autossh-for-keeping-ssh-tunnels.gmi b/topics/programming/autossh-for-keeping-ssh-tunnels.gmi new file mode 100644 index 0000000..a977232 --- /dev/null +++ b/topics/programming/autossh-for-keeping-ssh-tunnels.gmi @@ -0,0 +1,65 @@ +# Using autossh to Keep SSH Tunnels Alive + +## Tags +* keywords: ssh, autossh, tunnel, alive + + +## TL;DR + +``` +guix package -i autossh # Install autossh with Guix +autossh -M 0 -o "ServerAliveInterval 60" -o "ServerAliveCountMax 5" -L 4000:127.0.0.1:3306 alexander@remoteserver.org +``` + +## Introduction + +Autossh is a utility for automatically restarting SSH sessions and tunnels if they drop or become inactive. It's particularly useful for long-lived tunnels in unstable network environments. + +See official docs: + +=> https://www.harding.motd.ca/autossh/ + +## Installing autossh + +Install autossh using Guix: + +``` +guix package -i autossh +``` + +Basic usage: + +``` +autossh [-V] [-M monitor_port[:echo_port]] [-f] [SSH_OPTIONS] +``` + +## Examples + +### Keep a database tunnel alive with autossh + +Forward a remote MySQL port to your local machine: + +**Using plain SSH:** + +``` +ssh -L 5000:localhost:3306 alexander@remoteserver.org +``` + +**Using autossh:** + +``` +autossh -L 5000:localhost:3306 alexander@remoteserver.org +``` + +### Better option + +``` +autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -L 5000:localhost:3306 alexander@remoteserver.org +``` + +#### Option explanations: + +- `ServerAliveInterval`: Seconds between sending keepalive packets to the server (default: 0). +- `ServerAliveCountMax`: Number of unanswered keepalive packets before SSH disconnects (default: 3). + +You can also configure these options in your `~/.ssh/config` file to simplify command-line usage. 
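As an illustration of the `~/.ssh/config` approach, the fragment below is a sketch that reuses the host, user and ports from the examples above. The alias `gn-db-tunnel` is made up for this example; the keywords themselves are standard OpenSSH client options.

```
# ~/.ssh/config
# "gn-db-tunnel" is a hypothetical alias; pick any name you like.
Host gn-db-tunnel
    HostName remoteserver.org
    User alexander
    ServerAliveInterval 30
    ServerAliveCountMax 3
    LocalForward 5000 localhost:3306
```

With this in place the tunnel could be started with `autossh -M 0 -N gn-db-tunnel`, where `-N` tells ssh not to run a remote command.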
diff --git a/topics/systems/backup-drops.gmi b/topics/systems/backup-drops.gmi index 191b185..3f81c5a 100644 --- a/topics/systems/backup-drops.gmi +++ b/topics/systems/backup-drops.gmi @@ -4,6 +4,10 @@ To make backups we use a combination of sheepdog, borg, sshfs, rsync. sheepdog i This system proves pretty resilient over time. Only on the synology server I can't get it to work because of some CRON permission issue. +For doing the actual backups see + +=> ./backups-with-borg.gmi + # Tags * assigned: pjotrp @@ -13,7 +17,7 @@ This system proves pretty resilient over time. Only on the synology server I can ## Borg backups -It is advised to use a backup password and not store that on the remote. +Despite our precautions it is advised to use a backup password and *not* store that on the remote. ## Running sheepdog on rabbit @@ -59,14 +63,14 @@ where remote can be an IP address. Warning: if you introduce this `AllowUsers` command all users should be listed or people may get locked out of the machine. -Next create a special key on the backup machine's ibackup user (just hit enter): +Next create a special password-less key on the backup machine's ibackup user (just hit enter): ``` su ibackup ssh-keygen -t ecdsa -f $HOME/.ssh/id_ecdsa_backup ``` -and copy the public key into the remote /home/bacchus/.ssh/authorized_keys +and copy the public key into the remote /home/bacchus/.ssh/authorized_keys. Now test it from the backup server with @@ -82,13 +86,20 @@ On the drop server you can track messages by tail -40 /var/log/auth.log ``` +or on recent linux with systemd + +``` +journalctl -r +``` + Next ``` ssh -v -i ~/.ssh/id_ecdsa_backup bacchus@dropserver ``` -should give a Broken pipe(!). In auth.log you may see something like +should give a Broken pipe(!) or -- more recently -- it says `This service allows sftp connections only`. 
+When running sshd with a verbose switch you may see something like fatal: bad ownership or modes for chroot directory component "/export/backup/" @@ -110,6 +121,19 @@ chown bacchus.bacchus backup/bacchus/drop/ chmod 0700 backup/bacchus/drop/ ``` +Another error may be: + +``` +fusermount3: mount failed: Operation not permitted +``` + +This means you need to set the suid on the fusermount3 command. Bit nasty in Guix. + +``` +apt-get install fuse(3) sshfs +chmod 4755 /usr/bin/fusermount +``` + If auth.log says error: /dev/pts/11: No such file or directory on ssh, or received disconnect (...) disconnected by user we are good to go! Note: at this stage it may pay to track the system log with @@ -171,3 +195,5 @@ sshfs -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3,IdentityFile=~/. The recent scripts can be found at => https://github.com/genenetwork/gn-deploy-servers/blob/master/scripts/tux01/backup_drop.sh + +# borg-borg diff --git a/topics/systems/backups-with-borg.gmi b/topics/systems/backups-with-borg.gmi new file mode 100644 index 0000000..1ad0112 --- /dev/null +++ b/topics/systems/backups-with-borg.gmi @@ -0,0 +1,220 @@ +# Borg backups + +We use borg for backups. Borg is an amazing tool and after 25+ years of making backups it just feels right. +With the new tux04 production install we need to organize backups off-site. The first step is to create a +borg runner using sheepdog -- sheepdog we use for monitoring success/failure. +Sheepdog essentially wraps a Unix command and sends a report to a local or remote redis instance. +Sheepdog also includes a web server for output: + +=> http://sheepdog.genenetwork.org/sheepdog/status.html + +which I run on one of my machines. + +# Tags + +* assigned: pjotrp +* keywords: systems, backup, sheepdog, database + +# Install borg + +Usually I use a version of borg from guix. This should really be done as the borg user (ibackup). 
+ +``` +mkdir ~/opt +guix package -i borg -p ~/opt/borg +tux04:~$ ~/opt/borg/bin/borg --version + 1.2.2 +``` + +# Create a new backup dir and user + +The backup should live on a different disk from the things we back up, so when that disk fails we have another. + +The SQL database lives on /export and the containers live on /export2. /export3 is a largish slow drive, so perfect. + +By convention I point /export/backup to the real backup dir on /export3/backup/borg/. Another convention is that we use an ibackup user which has the backup passphrase in ~/.borg-pass. As root: + +``` +mkdir /export/backup/borg +chown ibackup:ibackup /export/backup/borg +chown ibackup:ibackup /home/ibackup/.borg-pass +su ibackup +``` + +Now you should be able to load the passphrase and create the backup dir + +``` +id + uid=1003(ibackup) +. ~/.borg-pass +cd /export/backup/borg +~/opt/borg/bin/borg init --encryption=repokey-blake2 genenetwork +``` + +Now we can run our first backup. Note that ibackup should be a member of the mysql and gn groups + +``` +mysql:x:116:ibackup +``` + +# First backup + +Run the backup the first time: + +``` +id + uid=1003(ibackup) groups=1003(ibackup),116(mysql) +~/opt/borg/bin/borg create --progress --stats genenetwork::first-backup /export/mysql/database/* +``` + +You may first need to update permissions to give group access + +``` +chmod g+rx -R /var/lib/mysql/* +``` + +When that works borg reports: + +``` +Archive name: first-backup +Archive fingerprint: 376d32fda9738daa97078fe4ca6d084c3fa9be8013dc4d359f951f594f24184d +Time (start): Sat, 2025-02-08 04:46:48 +Time (end): Sat, 2025-02-08 05:30:01 +Duration: 43 minutes 12.87 seconds +Number of files: 799 +Utilization of max. 
archive size: 0% +------------------------------------------------------------------------------ + Original size Compressed size Deduplicated size +This archive: 534.24 GB 238.43 GB 237.85 GB +All archives: 534.24 GB 238.43 GB 238.38 GB + Unique chunks Total chunks +Chunk index: 200049 227228 +------------------------------------------------------------------------------ +``` + +50% compression is not bad. borg is incremental so it will only back up differences next round. + +Once borg works we could run a CRON job. But we should use the sheepdog monitor to make sure backups keep going without failures going unnoticed. + +# Using the sheepdog + +=> https://github.com/pjotrp/deploy sheepdog code + +## Clone sheepdog + +=> https://github.com/pjotrp/deploy#install sheepdog install + +Essentially clone the repo so it shows up in ~/deploy + +``` +cd /home/ibackup +git clone https://github.com/pjotrp/deploy.git +/export/backup/scripts/tux04/backup-tux04.sh +``` + +## Setup redis + +All sheepdog messages get pushed to redis. You can run it locally or remotely. + +By default we use redis, but syslog and others may also be used. The advantage of redis is that it is not bound to the same host, can cross firewalls using an ssh reverse tunnel, and is easy to query. + +=> https://github.com/pjotrp/deploy#install sheepdog install + +In our case we use redis on a remote host and the results get displayed by a webserver. Also some people get E-mail updates on failure. The configuration is in + +``` +/home/ibackup# cat .config/sheepdog/sheepdog.conf . +{ + "redis": { + "host" : "remote-host", + "password": "something" + } +} +``` + +If you see localhost with port 6377 it is probably a reverse tunnel setup: + +=> https://github.com/pjotrp/deploy#redis-reverse-tunnel + +Update the fields according to what we use. The main thing is that this is the definition of the sheepdog->redis connector. If you also use sheepdog as another user you'll need to add a config for that user too. 
+ +Sheepdog should show a warning when you configure redis and it is not connecting. + +## Scripts + +Typically I run the cron job from root CRON so people can find it. Still it is probably a better idea to use an ibackup CRON. In my version a script is run that also captures output: + +```cron root +0 6 * * * /bin/su ibackup -c /export/backup/scripts/tux04/backup-tux04.sh >> ~/cron.log 2>&1 +``` + +The script contains something like + +```bash +#! /bin/bash +if [ "$EUID" -eq 0 ] + then echo "Please do not run as root. Run as: su ibackup -c $0" + exit +fi +rundir=$(dirname "$0") +# ---- for sheepdog +source $rundir/sheepdog_env.sh +cd $rundir +sheepdog_borg.rb -t borg-tux04-sql --group ibackup -v -b /export/backup/borg/genenetwork /export/mysql/database/* +``` + +and the accompanying sheepdog_env.sh + +``` +export GEM_PATH=/home/ibackup/opt/deploy/lib/ruby/vendor_ruby +export PATH=/home/ibackup/opt/deploy/deploy/bin:/home/wrk/opt/deploy/bin:$PATH +``` + +If it reports + +``` +/export/backup/scripts/tux04/backup-tux04.sh: line 11: /export/backup/scripts/tux04/sheepdog_env.sh: No such file or directory +``` + +you need to install sheepdog first. + +If all shows green (and takes some time) we made a backup. Check the backup with + +``` +ibackup@tux04:/export/backup/borg$ borg list genenetwork/ +first-backup Sat, 2025-02-08 04:39:50 [58715b883c080996ab86630b3ae3db9bedb65e6dd2e83977b72c8a9eaa257cdf] +borg-tux04-sql-20250209-01:43-Sun Sun, 2025-02-09 01:43:23 [5e9698a032143bd6c625cdfa12ec4462f67218aa3cedc4233c176e8ffb92e16a] +``` +and you should see the latest. The contents with all files should be visible with + +``` +borg list genenetwork::borg-tux04-sql-20250209-01:43-Sun +``` + +Make sure you see the actual files and not just a symlink. + +# More backups + +Our production server runs databases and file stores that need to be backed up too. 
+ +# Drop backups + +Once backups work it is useful to copy them to a remote server, so when the machine stops functioning we have another chance at recovery. See + +=> ./backup-drops.gmi + +# Recovery + +With tux04 we ran into a problem where all disks were getting corrupted(!) Probably due to the RAID controller, but we still need to figure that one out. + +Anyway, we have to assume the DB is corrupt. Files are corrupt AND the backups are corrupt. Borg backup has checksums which you can verify with + +``` +borg check repo +``` + +It also has a --repair switch, which we needed to remove some faults in the backup itself: + +``` +borg check --repair repo +``` diff --git a/topics/systems/ci-cd.gmi b/topics/systems/ci-cd.gmi index 6aa17f2..a1ff2e3 100644 --- a/topics/systems/ci-cd.gmi +++ b/topics/systems/ci-cd.gmi @@ -31,7 +31,7 @@ Arun has figured out the CI part. It runs a suitably configured laminar CI servi CD hasn't been figured out. Normally, Guix VMs and containers created by `guix system` can only access the store read-only. Since containers don't have write access to the store, you cannot `guix build' from within a container or deploy new containers from within a container. This is a problem for CD. How do you make Guix containers have write access to the store? -Another alternative for CI/ CID were to have the quick running tests, e.g unit tests, run on each commit to branch "main". Once those are successful, the CI/CD system we choose should automatically pick the latest commit that passed the quick running tests for for further testing and deployment, maybe once an hour or so. Once the next battery of tests is passed, the CI/CD system will create a build/artifact to be deployed to staging and have the next battery of tests runs against it. If that passes, then that artifact could be deployed to production, and details on the commit and +Another alternative for CI/ CD were to have the quick running tests, e.g unit tests, run on each commit to branch "main". 
Once those are successful, the CI/CD system we choose should automatically pick the latest commit that passed the quick running tests for further testing and deployment, maybe once an hour or so. Once the next battery of tests is passed, the CI/CD system will create a build/artifact to be deployed to staging and have the next battery of tests run against it. If that passes, then that artifact could be deployed to production, and details on the commit and #### Possible Steps @@ -90,3 +90,49 @@ This contains a check-list of things that need to be done: => /topics/systems/orchestration Orchestration => /issues/broken-cd Broken-cd (Resolved) + +## Adding a web-hook + +### Github hooks + +IIRC actions run artifacts inside github's infrastructure. We use webhooks: e.g. + +Update the hook at + +=> https://github.com/genenetwork/genenetwork3/settings/hooks + +=> ./screenshot-github-webhook.png + +To trigger CI manually, run this with the project name: + +``` +curl https://ci.genenetwork.org/hooks/example-gn3 +``` + +For gemtext we have a github hook that adds a forge-project and looks like + +```lisp +(define gn-gemtext-threads-project + (forge-project + (name "gn-gemtext-threads") + (repository "https://github.com/genenetwork/gn-gemtext-threads/") + (ci-jobs (list (forge-laminar-job + (name "gn-gemtext-threads") + (run (with-packages (list nss-certs openssl) + (with-imported-modules '((guix build utils)) + #~(begin + (use-modules (guix build utils)) + + (setenv "LC_ALL" "en_US.UTF-8") + (invoke #$(file-append tissue "/bin/tissue") + "pull" "issues.genenetwork.org")))))))) + (ci-jobs-trigger 'webhook))) +``` + +Guix forge can be found at + +=> https://git.systemreboot.net/guix-forge/ + +### git.genenetwork.org hooks + +TBD diff --git a/topics/systems/dns-changes.gmi b/topics/systems/dns-changes.gmi index 7f1d8f1..30aae58 100644 --- a/topics/systems/dns-changes.gmi +++ b/topics/systems/dns-changes.gmi @@ -27,6 +27,7 @@ We are moving thing to a new DNS hosting service. 
We have accounts on both. To m * Import DNS settings on DNSimple (cut-N-paste) + Edit delegation - make sure the delegation box is set => https://support.dnsimple.com/articles/delegating-dnsimple-registered + + Registration menu item comes up after transfer... * Approve transfer on GoDaddy a few minutes later (!!), see + https://dcc.godaddy.com/control/transfers * Add DNSSec diff --git a/topics/systems/hpc/performance.gmi b/topics/systems/hpc/performance.gmi index ce6a111..ac5e861 100644 --- a/topics/systems/hpc/performance.gmi +++ b/topics/systems/hpc/performance.gmi @@ -12,6 +12,23 @@ For disk speeds make sure there is no load and run hdparm -Ttv /dev/sdc1 ``` +Cheap and cheerful: + +Write test: + +``` +dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct +``` + +Read test: + +``` +/sbin/sysctl -w vm.drop_caches=3 +dd if=./test of=/dev/zero bs=512k count=2048 +``` + + + ## Networking To check the network devices installed use diff --git a/topics/systems/linux/add-boot-partition.gmi b/topics/systems/linux/add-boot-partition.gmi new file mode 100644 index 0000000..564e044 --- /dev/null +++ b/topics/systems/linux/add-boot-partition.gmi @@ -0,0 +1,52 @@ +# Add (2nd) boot and other partitions + +As we handle machines remotely it is often useful to have a secondary boot partition that can be used from grub. 
+ +Basically, create a similar sized boot partition on a different disk and copy the running one over with: + +``` +parted -a optimal /dev/sdb +(parted) p +Model: NVMe CT4000P3SSD8 (scsi) +Disk /dev/sdb: 4001GB +Sector size (logical/physical): 512B/512B +Partition Table: gpt +Disk Flags: + +Number Start End Size File system Name Flags + 1 32.0GB 4001GB 3969GB ext4 bulk + +(parted) rm 1 +mklabel gpt +mkpart fat23 1 1GB +set 1 esp on +align-check optimal 1 +mkpart ext4 1GB 32GB +mkpart swap 32GB 48GB +set 2 boot on # other flags are raid, swap, lvm +set 3 swap on +mkpart scratch 48GB 512GB +mkpart ceph 512GB -1 +``` + +We also took the opportunity to create a new scratch partition (for moving things around) and a ceph partition (for testing). +Resulting in + +``` +Number Start End Size File system Name Flags + 1 1049kB 1000MB 999MB fat23 boot, esp + 2 1000MB 24.0GB 23.0GB ext4 boot, esp + 3 24.0GB 32.0GB 8001MB swap swap + 4 32.0GB 512GB 480GB ext4 scratch + 5 512GB 4001GB 3489GB ceph +``` + +Now that the drive is ready we can copy the existing boot partitions. Make sure you don't get the direction wrong and that the target partition is larger. +Here the original boot disk is /dev/sda (894GB). We copy that to the new disk /dev/sdb (3.64TB). + +``` +root@tux05:/home/wrk# dd if=/dev/sda1 of=/dev/sdb1 +root@tux05:/home/wrk# dd if=/dev/sda2 of=/dev/sdb2 +``` + +Next, test mount the dirs and reboot. You may want to run e2fsck and resize2fs on the new partitions (or their equivalent if you use xfs or something). diff --git a/topics/systems/mariadb/mariadb.gmi b/topics/systems/mariadb/mariadb.gmi index ae0ab19..ec8b739 100644 --- a/topics/systems/mariadb/mariadb.gmi +++ b/topics/systems/mariadb/mariadb.gmi @@ -16,6 +16,8 @@ To install Mariadb (as a container) see below and Start the client and: ``` +mysql +show databases MariaDB [db_webqtl]> show binary logs; +-----------------------+-----------+ | Log_name | File_size | @@ -60,4 +62,11 @@ Stop the running mariadb-guix.service. 
Restore the latest backup archive and ove => https://www.borgbackup.org/ Borg => https://borgbackup.readthedocs.io/en/stable/ Borg documentation -# +# Upgrade mariadb + +It is wise to upgrade mariadb once in a while. In a disaster recovery it is better to move forward in versions too. +Before upgrading make sure there is a decent backup of the current setup. + +See also + +=> issues/systems/tux04-disk-issues.gmi diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi index 0c89fe5..977120d 100644 --- a/topics/systems/mariadb/precompute-mapping-input-data.gmi +++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi @@ -49,10 +49,29 @@ The original reaper precompute lives in => https://github.com/genenetwork/genenetwork2/blob/testing/scripts/maintenance/QTL_Reaper_v6.py -This script first fetches inbredsets +More recent incarnations are at v8, including a PublishData version that can be found in + +=> https://github.com/genenetwork/genenetwork2/tree/testing/scripts/maintenance + +Note that the locations are on space: + +``` +cd /mount/space2/lily-clone/acenteno/GN-Data +ls -l +python QTL_Reaper_v8_space_good.py 116 +-- +python UPDATE_Mean_MySQL_tab.py +cd /mount/space2/lily-clone/gnshare/gn/web/webqtl/maintainance +ls -l +python QTL_Reaper_cal_lrs.py 7 +``` + +The first task is to prepare an update script that can run a set at a time and compute GEMMA output (instead of reaper). 
+ +The script first fetches inbredsets ``` - select Id,InbredSetId,InbredSetName,Name,SpeciesId,FullName,public,MappingMethodId,GeneticType,Family,FamilyOrder,MenuOrderId,InbredSetCode from InbredSet LIMIT 5; +select Id,InbredSetId,InbredSetName,Name,SpeciesId,FullName,public,MappingMethodId,GeneticType,Family,FamilyOrder,MenuOrderId,InbredSetCode from InbredSet LIMIT 5; +----+-------------+-------------------+----------+-----------+-------------------+--------+-----------------+-------------+--------------------------------------------------+-------------+-------------+---------------+ | Id | InbredSetId | InbredSetName | Name | SpeciesId | FullName | public | MappingMethodId | GeneticType | Family | FamilyOrder | MenuOrderId | InbredSetCode | +----+-------------+-------------------+----------+-----------+-------------------+--------+-----------------+-------------+--------------------------------------------------+-------------+-------------+---------------+ diff --git a/topics/systems/migrate-p2.gmi b/topics/systems/migrate-p2.gmi deleted file mode 100644 index c7fcb90..0000000 --- a/topics/systems/migrate-p2.gmi +++ /dev/null @@ -1,12 +0,0 @@ -* Penguin2 crash - -This week the boot partition of P2 crashed. We have a few lessons here, not least having a fallback for all services ;) - -* Tasks - -- [ ] setup space.uthsc.edu for GN2 development -- [ ] update DNS to tux02 128.169.4.52 and space 128.169.5.175 -- [ ] move CI/CD to tux02 - - -* Notes diff --git a/topics/systems/restore-backups.gmi b/topics/systems/restore-backups.gmi index 518c56d..b97af2b 100644 --- a/topics/systems/restore-backups.gmi +++ b/topics/systems/restore-backups.gmi @@ -26,7 +26,7 @@ The last backup on 'tux02' is from October 2022 - after I did a reinstall. 
That According to sheepdog the drops are happening to 'space' and 'epysode', but 'tux02' is missing: -=> https://rabbit.genenetwork.org/sheepdog/index.html +=> http://sheepdog.genenetwork.org/sheepdog/status.html ## Mariadb diff --git a/topics/systems/screenshot-github-webhook.png b/topics/systems/screenshot-github-webhook.png Binary files differnew file mode 100644 index 0000000..08feed3 --- /dev/null +++ b/topics/systems/screenshot-github-webhook.png diff --git a/topics/systems/synchronising-the-different-environments.gmi b/topics/systems/synchronising-the-different-environments.gmi new file mode 100644 index 0000000..207b234 --- /dev/null +++ b/topics/systems/synchronising-the-different-environments.gmi @@ -0,0 +1,68 @@ +# Synchronising the Different Environments + +## Tags + +* status: open +* priority: +* type: documentation +* assigned: fredm +* keywords: doc, docs, documentation + +## Introduction + +We have different environments we run for various reasons, e.g. + +* Production: This is the user-facing environment. This is what GeneNetwork is about. +* gn2-fred: production-adjacent. It is meant to test out changes before they get to production. It is **NOT** meant for users. +* CI/CD: Used for development. The latest commits get auto-deployed here. It's the first place (outside of developer machines) where errors and breakages are caught and/or revealed. This will break a lot. Do not expose to users! +* staging: Uploader environment. This is where Felix, Fred and Arthur flesh out the upload process, and tasks, and also test out the uploader. + +These different environments demand synchronisation, in order to have mostly similar results and failure modes. + +## Synchronisation of the Environments + +### Main Database: MariaDB + +* [ ] TODO: Describe process + +=> https://issues.genenetwork.org/topics/systems/restore-backups Extract borg archive +* Automate? Will probably need some checks for data sanity. 
+ +### Authorisation Database + +* [ ] TODO: Describe process + +* Copy backup from production +* Update/replace GN2 client configs in database +* What other things? + +### Virtuoso/RDF + +* [ ] TODO: Describe process + +* Copy TTL (Turtle) files from (where?). Production might not always be the latest source of TTL files. +=> https://issues.genenetwork.org/issues/set-up-virtuoso-on-production Run setup to "activate" database entries +* Can we automate this? What checks are necessary? + +### Genotype Files + +* [ ] TODO: Describe process + +* Copy from source-of-truth (currently Zach's tux01 and/or production). +* Rsync? + +### gn-docs + +* [ ] TODO: Describe process + +* Not sure changes from other environments should ever take + +### AI Summaries (aka. gnqna) + +* [ ] TODO: Describe process + +* Update configs (should be once, during container setup) + +### Others? + +* [ ] TODO: Describe process diff --git a/topics/systems/update-production-checklist.gmi b/topics/systems/update-production-checklist.gmi new file mode 100644 index 0000000..b17077b --- /dev/null +++ b/topics/systems/update-production-checklist.gmi @@ -0,0 +1,182 @@ +# Update production checklist + + +# Tasks + +* [X] Install underlying Debian +* [X] Get guix going +* [ ] Check database +* [ ] Check gemma working +* [ ] Check global search +* [ ] Check authentication +* [ ] Check sending E-mails +* [ ] Make sure info.genenetwork.org can reach the DB +* [ ] Backups + +The following are at the system level + +* [ ] Make journalctl persistent +* [ ] Update certificates in CRON +* [ ] Run trim in CRON + +# Install underlying Debian + +For our production systems we use Debian as a base install. 
Once installed: + +* [X] set up git in /etc and limit permissions to root user +* [X] add ttyS0 support for grub and kernel - so out-of-band works +* [X] start ssh server and configure it to disallow password logins +* [X] start nginx and check external networking +* [ ] set up E-mail routing + +It may help to mount the old root if you have it. Now it is on + +``` +mount /dev/sdd2 /mnt/old-root/ +``` + +# Get Guix going + +* [X] Install Guix daemon +* [X] Move /gnu/store to larger partition +* [X] Update Guix daemon and setup in systemd +* [X] Make available in /usr/local/guix-profiles +* [X] Clean up /etc/profile + +We can bootstrap with the Debian guix package. Next move the store to a large partition and hard mount it in /etc/fstab with + +``` +/export2/gnu /gnu none defaults,bind 0 0 +``` + +Run guix pull + +``` +wrk@tux04:~$ guix pull -p ~/opt/guix-pull --url=https://codeberg.org/guix/guix-mirror.git +``` + +Use that to install guix in /usr/local/guix-profiles + +``` +guix package -i guix -p /usr/local/guix-profiles/guix +``` + +and update the daemon in systemd accordingly. After that I tend to remove /usr/bin/guix + +The Debian installer configures guix. I tend to remove the profiles from /etc/profile so people have a minimal profile. + +# Check database + +* [X] Install mariadb +* [ ] Recover database +* [ ] Test permissions +* [ ] Mariadb update my.cnf + +Recovering the database from a backup and setting permissions is the best start. We usually take the default mariadb unless production is already on a newer version - so we move to guix deployment. + +On tux02 mariadb-10.5.8 is running. On Debian it is now 10.11.11-0+deb12u1, so we should be good. On Guix it is 10.10 at this point. + +``` +apt-get install mariadb-server +``` + +Next unpack the database files and set permissions to the mysql user. And (don't forget) update the /etc/mysql config files.
+ +Restart mysql until you see: + +``` +mysql -u webqtlout -p -e "show databases" ++---------------------------+ +| Database | ++---------------------------+ +| 20081110_uthsc_dbdownload | +| db_GeneOntology | +| db_webqtl | +| db_webqtl_s | +| go | +| information_schema | +| kegg | +| mysql | +| performance_schema | +| sys | ++---------------------------+ +``` + +=> topics/systems/mariadb/mariadb.gmi + +## Recover database + +We use borg for backups. First restore the backup on the PCIe. Also a test for overheating! + + +# Check sending E-mails + +The swaks package is quite useful to test for a valid receive host: + +``` +swaks --to testing-my-server@gmail.com --server smtp.uthsc.edu +=== Trying smtp.uthsc.edu:25... +=== Connected to smtp.uthsc.edu. +<- 220 mailrouter8.uthsc.edu ESMTP NO UCE + -> EHLO tux04.uthsc.edu +<- 250-mailrouter8.uthsc.edu +<- 250-PIPELINING +<- 250-SIZE 26214400 +<- 250-VRFY +<- 250-ETRN +<- 250-STARTTLS +<- 250-ENHANCEDSTATUSCODES +<- 250-8BITMIME +<- 250-DSN +<- 250 SMTPUTF8 + -> MAIL FROM:<root@tux04.uthsc.edu> +<- 250 2.1.0 Ok + -> RCPT TO:<pjotr2020@thebird.nl> +<- 250 2.1.5 Ok + -> DATA +<- 354 End data with <CR><LF>.<CR><LF> + -> Date: Thu, 06 Mar 2025 08:34:24 +0000 + -> To: pjotr2020@thebird.nl + -> From: root@tux04.uthsc.edu + -> Subject: test Thu, 06 Mar 2025 08:34:24 +0000 + -> Message-Id: <20250306083424.624509@tux04.uthsc.edu> + -> X-Mailer: swaks v20201014.0 jetmore.org/john/code/swaks/ + -> + -> This is a test mailing + -> + -> + -> . 
+<- 250 2.0.0 Ok: queued as 4157929DD + -> QUIT +<- 221 2.0.0 Bye === Connection closed with remote host +``` + +An exim configuration can be + +``` +dc_eximconfig_configtype='smarthost' +dc_other_hostnames='genenetwork.org' +dc_local_interfaces='127.0.0.1 ; ::1' +dc_readhost='' +dc_relay_domains='' +dc_minimaldns='false' +dc_relay_nets='' +dc_smarthost='smtp.uthsc.edu' +CFILEMODE='644' +dc_use_split_config='false' +dc_hide_mailname='false' +dc_mailname_in_oh='true' +dc_localdelivery='maildir_home' +``` + +And this should work: + +``` +swaks --to myemailaddress --from john@uthsc.edu --server localhost +``` + +# Backups + +* [ ] Create an ibackup user. +* [ ] Install borg (usually guix version) +* [ ] Create a borg passphrase diff --git a/topics/systems/virtuoso.gmi b/topics/systems/virtuoso.gmi index e911a8b..94a15f0 100644 --- a/topics/systems/virtuoso.gmi +++ b/topics/systems/virtuoso.gmi @@ -104,7 +104,7 @@ After running virtuoso, you will want to change the default password of the `dba In a typical production virtuoso installation, you will want to change the password of the dba user and disable the dav user. Here are the commands to do so. Pay attention to the single versus double quoting. ``` -SQL> set password "dba" "rFw,OntlJ@Sz"; +SQL> set password "dba" "dba"; SQL> UPDATE ws.ws.sys_dav_user SET u_account_disabled=1 WHERE u_name='dav'; SQL> CHECKPOINT; ``` diff --git a/topics/testing/mechanical-rob.gmi b/topics/testing/mechanical-rob.gmi index 9413b47..baf111a 100644 --- a/topics/testing/mechanical-rob.gmi +++ b/topics/testing/mechanical-rob.gmi @@ -1,9 +1,74 @@ # Mechanical Rob -We need to run Mechanical Rob tests as part of our continuous integration tests. +## Tags -The Mechanical Rob CI tests are functioning again now. To see how to run Mechanical Rob, see the CI job definition in the genenetwork-machines repo. 
+* type: documentation, docs +* assigned: bonfacem, rookie101, fredm +* priority: medium +* status: open +* keywords: tests, testing, mechanical-rob -=> genenetwork-machines/src/branch/main/genenetwork-development.scm +## What is Mechanical Rob? -The invocation procedure is bound to change as the many environment variables in genenetwork2 are cleared up. +Mechanical Rob is our name for what could be considered our integration tests. + +The idea is that we observe how Prof. Robert Williams (Rob) (and other scientists) use(s) GeneNetwork and create a "mechanical" facsimile of that. The purpose is to ensure that the system works correctly with each and every commit in any of our various repositories. + +If any commit causes any part of the Mechanical Rob system to raise an error, then we know, immediately, that something is broken, and the culprit can get onto fixing that with haste. + +## Show Me Some Code!!! + +Nice! I like your enthusiasm. + +You can find the +=> https://github.com/genenetwork/genenetwork2/tree/testing/test/requests Mechanical Rob code here +within the genenetwork2 repository. + +You can also see how it is triggered in the gn-machines repository in +=> https://git.genenetwork.org/gn-machines/tree/genenetwork-development.scm this module. +Search for "genenetwork2-mechanical-rob" within that module and you should find how the system is triggered. + +## How About Running it Locally + +All the above is nice and all, but sometimes you just want to run the checks locally. + +In that case, you can run Mechanical Rob locally by following the steps below: +(note that these steps are mostly the same ones to run GN2 locally). + + +1. 
Get a guix shell for GN2 development: +``` +$ cd genenetwork2/ +$ guix shell --container --network \ + --expose=</path/to/directory/with/genotypes> \ + --expose=</path/to/local/genenetwork3> \ + --expose=</path/to/settings/file> \ + --expose=</path/to/secrets/file> \ + --file=guix.scm bash +``` +The last `bash` is to ensure we install the Bourne-Again Shell which we use to launch the application. The `</path/to/local/genenetwork3>` can be omitted if you do not need the latest code in GN3 to be included in your running GN2. + +2. Set up the appropriate environment variables: +``` +[env]$ export HOME=</path/to/home/directory> +[env]$ export GN2_SETTINGS=</path/to/settings/file> +[env]$ export SERVER_PORT=5003 +[env]$ export GN2_PROFILE="${GUIX_ENVIRONMENT}" +[env]$ export GN3_PYTHONPATH=</path/to/local/genenetwork3> # Only needed if you need to test GN3 updates +``` + +3. Run the mechanical-rob tests +``` +[env]$ bash bin/genenetwork2 gn2/default_settings.py -c \ + test/requests/test-website.py \ + --all "http://localhost:${SERVER_PORT}" +``` +Of course, here we are assuming that `SERVER_PORT` has the value of the port on which GN2 is running. + + +## Possible Improvements + +Look into using geckodriver to help with the mechanical-rob tests. +`geckodriver` comes with the +=> https://icecatbrowser.org/index.html GNU IceCat browser +which is present as a package in GNU Guix. diff --git a/topics/xapian/xapian-indexing.gmi b/topics/xapian/xapian-indexing.gmi index 1c82018..68ab7a6 100644 --- a/topics/xapian/xapian-indexing.gmi +++ b/topics/xapian/xapian-indexing.gmi @@ -2,18 +2,48 @@ Due to the enormous size of the GeneNetwork database, indexing it in a reasonable amount of time is a tricky process that calls for careful identification and optimization of the performance bottlenecks. This document is a description of how we achieve it. -Indexing happens in the following three phases. +Indexing happens in these phases.
* Phase 1: retrieve data from SQL -* Phase 2: index text -* Phase 3: write Xapian index to disk +* Phase 2: retrieve metadata from RDF +* Phase 3: index text +* Phase 4: write Xapian index to disk -Phases 1 and 3 (that is, the retrieval of data from SQL and writing of the Xapian index to disk) are I/O bound processes. Phase 2 (the actual indexing of text) is CPU bound. So, we parallelize phase 2 while keeping phases 1 and 3 sequential. +Phases 1, 2 and 4 are I/O bound processes. Phase 3 (the actual indexing of text) is CPU bound. So, we parallelize phase 3 while keeping phases 1, 2 and 4 sequential. -There is a long delay in retrieving data from SQL and loading it into memory. In this time, the CPU is waiting on I/O and idling away. In order to avoid this, we retrieve SQL data chunk by chunk and spawn off phase 2 worker processes. Thus, we interleave phase 1 and 2 so that they don't block each other. Despite this, on tux02, the indexing script is only able to keep around 10 of the 128 CPUs busy. As phase 1 is dishing out jobs to phase 2 worker processes, before it can finish dishing out jobs to all 128 CPUs, the earliest worker processes finish and exit. The only way to avoid this and improve CPU utilization would be to further optimize the I/O of phase 1. +There is a long delay in retrieving data from SQL and loading it into memory. In this time, the CPU is waiting on I/O and idling away. In order to avoid this, we retrieve SQL data chunk by chunk and spawn off phase 3 worker processes. We get RDF data in one large call before any processing is done. Thus, we interleave phase 1 and 3 so that they don't block each other. Despite this, on tux02, the indexing script is only able to keep around 10 of the 128 CPUs busy. As phase 1 is dishing out jobs to phase 3 worker processes, before it can finish dishing out jobs to all 128 CPUs, the earliest worker processes finish and exit.
The only way to avoid this and improve CPU utilization would be to further optimize the I/O of phase 1. Building a single large Xapian index is not scalable. See detailed report on Xapian scalability. => xapian-scalability So, we let each process of phase 3 build its own separate Xapian index. Finally, we compact and combine them into one large index. When writing smaller indexes in parallel, we take care to lock access to the disk so that only one process is writing to the disk at any given time. If many processes try to simultaneously write to the disk, the write speed is slowed down, often considerably, due to I/O contention. -It is important to note that the performance bottlenecks identified in this document are machine-specific. For example, on my laptop with only 2 cores, CPU performance in phase 2 is the bottleneck. Phase 1 I/O waits on the CPU to finish instead of the other way around. +It is important to note that the performance bottlenecks identified in this document are machine-specific. For example, on my laptop with only 2 cores, CPU performance in phase 3 is the bottleneck. Phase 1 I/O waits on the CPU to finish instead of the other way around. + +## Local Development + +For local development, see: + +=> https://issues.genenetwork.org/topics/database/working-with-virtuoso-locally Working with Virtuoso for Local Development + +Ping @bmunyoki for the ttl folder backups. + +Set up mysql with instructions from + +=> https://issues.genenetwork.org/topics/database/setting-up-local-development-database + +and load up the backup file using: + +> mariadb gn2 < /path/to/backup/file.sql + +A backup file can be generated using: + +> mysqldump -u mysqluser -pmysqlpasswd --opt --where="1 limit 100000" db_webqtl > out.sql +> xz out.sql + +And run the index script using: + +> python3 scripts/index-genenetwork create-xapian-index /tmp/xapian "mysql://gn2:password@localhost/gn2" "http://localhost:8890/sparql" + +Verify the index with: + +> xapian-delve /tmp/xapian
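
The overall shape of the indexing pipeline described in this document (retrieve rows chunk by chunk, build a separate small index per chunk, then combine the small indexes into one) can be illustrated with a minimal Python sketch. This is not the code in `scripts/index-genenetwork` and it does not use Xapian: the helper names `chunked`, `index_chunk` and `merge_indexes`, the sample rows, and the plain dictionaries standing in for Xapian indexes are all assumptions made for illustration. The chunks are also processed one after another here, whereas the real script hands them to parallel worker processes.

```python
from collections import defaultdict
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows (chunk-by-chunk retrieval)."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk


def index_chunk(chunk):
    """Build a small term -> document-id index for one chunk of rows."""
    partial = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            partial[term].add(doc_id)
    return partial


def merge_indexes(partials):
    """Combine the per-chunk indexes into a single index."""
    combined = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            combined[term] |= doc_ids
    return dict(combined)


# Hypothetical sample rows: (document id, text to index).
rows = [(1, "BXD mouse liver"), (2, "mouse hippocampus"), (3, "BXD genotypes")]
index = merge_indexes(index_chunk(chunk) for chunk in chunked(rows, size=2))
print(sorted(index["mouse"]))  # document ids whose text contains "mouse"
```

Keeping the per-chunk indexes independent until the final merge is what allows the CPU-bound indexing step to be parallelized without the workers sharing one large index.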