197 files changed, 13290 insertions, 475 deletions
@@ -1,3 +1,5 @@ /**/*~ index.gmi tracker.gmi +.aider* +.tissue/**/* diff --git a/issues/CI-CD/configurations.gmi b/issues/CI-CD/configurations.gmi index 54cea47..acd2512 100644 --- a/issues/CI-CD/configurations.gmi +++ b/issues/CI-CD/configurations.gmi @@ -4,7 +4,7 @@ * assigned: aruni, fredm * priority: normal -* status: open +* status: closed, completed * keywords: CI, CD, configuration, config * type: bug @@ -38,3 +38,7 @@ and at least one of the values other than "localhost" is used to determine the c The secrets (e.g. SECRET_KEY, OAUTH_CLIENT_ID, OAUTH_CLIENT_SECRET, etc.) can be encrypted and stored in some secrets management system (e.g. Pass [https://www.passwordstore.org/] etc.) set up on each relevant host: better yet, have all configurations (secret or otherwise) encrypted and stored in such a secrets management system and fetch them from there. This reduces the mental overhead of dealing with multiple places to fetch the configs. From these, the CI/CD system can then build and intern the configurations into the store with guix functions like "plain-file", "local-file", etc. + +## Notes + +This idea was mostly rejected — it seems — in favour of using external settings files that are shared with the running container and separate build scripts for the different environments. This mostly covers all the bases necessary to get the settings correct. diff --git a/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi b/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi index 11f8f30..a96c18d 100644 --- a/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi +++ b/issues/add-documentation-and-data-retrieval-for-AI-repo.gmi @@ -6,7 +6,6 @@ * priority: high * type: ui * keywords: phenotypes -* status: stalled ## Description @@ -15,3 +14,4 @@ * Share alternate way of getting sparql json-ld data from public endpoint outside isql. * Share json-ld gotchas.
+* closed diff --git a/issues/add-genotype-files-to-rdf.gmi b/issues/add-genotype-files-to-rdf.gmi index 85ac39c..856c070 100644 --- a/issues/add-genotype-files-to-rdf.gmi +++ b/issues/add-genotype-files-to-rdf.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: bonfacem * type: bug -* status: open, in progress +* status: stalled In Penguin2, genotype files are located in: /export/data/genenetwork/genotype_files/genotype. Each genotype file has an identifier for the dataset it refers to: diff --git a/issues/add-unique-identifiers-for-case-attributes.gmi b/issues/add-unique-identifiers-for-case-attributes.gmi new file mode 100644 index 0000000..0c3123d --- /dev/null +++ b/issues/add-unique-identifiers-for-case-attributes.gmi @@ -0,0 +1,11 @@ +# Add Case Attributes to RDF + +## Tags + +* assigned: bonfacem +* priority: high +* status: open + +## Description + +Add case attributes and their metadata into RDF. diff --git a/issues/assorted-ui-issues.gmi b/issues/assorted-ui-issues.gmi new file mode 100644 index 0000000..5fbacea --- /dev/null +++ b/issues/assorted-ui-issues.gmi @@ -0,0 +1,36 @@ +# Various UI issues raised by Rob (8/19/2024) + +# Tags + +* assigned: zsloan +* keywords: user-interface +* priority: medium +* status: open + +## Tasks + +* [X] Fix collection encoding issue + +* [X] Don't import empty collections (like the Default Collection) + +* [X] Update/Creation dates aren't listed for collections + +* [X] Remove in-between ticks for Effect Size Plot (from mapping page) so it's just -1/0/1 + +* [X] Also make Effect Size Plot more narrow + +* [X] Prevent X/Y-axis summary text from extending beyond the graph width + +* [X] Longer tick markers as well + +* [X] Remove triangle for phenotype mapping + +* [X] Remove ProbeSetPosition from mapping for traits with no position + +* [X] Make Haplotype legend image thicker + change text to Haplotypes (Mat, Pat, Het, Unknown) + +* [X] Change "Sequence Site" in legend to "Gene Location" + +* [X] When adding genotype marker as covariate (for
scatter-plot, maybe also mapping), change description to Position instead of "undefined" + +* [ ] Check Add Covariation colorbox popup on Apple laptop (it shows up weird for Rob, but normal for me) diff --git a/issues/auth/reset-password-feature.gmi b/issues/auth/reset-password-feature.gmi index 8eaaa6a..299f915 100644 --- a/issues/auth/reset-password-feature.gmi +++ b/issues/auth/reset-password-feature.gmi @@ -1,6 +1,16 @@ # Reset/Forgot Password Feature for GN2 +# Tags + * assigned: fredm -* tags: critical +* priority: critical +* status: closed +* keywords: gn-auth, auth, reset password +* type: feature-request + +## Description Should a user forget his/her password, there's no clear way to reset the password. + +This issue is now implemented in the latest code: +=> https://git.genenetwork.org/gn-auth/tree/gn_auth/auth/authorisation/users/views.py?id=e829074e99fd5bec033765d18d5efa55e1edce44#n454 diff --git a/issues/cleanup-base-file-gn2.gmi b/issues/cleanup-base-file-gn2.gmi new file mode 100644 index 0000000..8a05323 --- /dev/null +++ b/issues/cleanup-base-file-gn2.gmi @@ -0,0 +1,30 @@ +# Cleanup GN2 Base HTML File + +## Tags + +* Assigned: alexm +* Keywords: base, HTML, JavaScript, cleanup +* Type: Refactoring +* Status: closed, completed, done + +## Description + +The base file should contain no custom JavaScript since it is inherited in almost all files in GN2. It should only include what is necessary. As a result, we need to move the global search from the base file to the index page, which renders the GN2 home.
+ +## Tasks + +* [x] Remove global search code from the base file and move it to the index page +* [x] Fix formatting and linting issues in the base file (e.g., tags) +* [x] Inherit from index page for all gn2 templates + + +## Notes + +See the PR that seeks to fix this: +=> https://github.com/genenetwork/genenetwork2/pull/877 + +## Notes 26/09/2024 + +It was agreed that global search should be a feature on all pages. +As such, all files need to inherit from the template which +defines the global search.
\ No newline at end of file diff --git a/issues/create-custom-rif-xapian-index.gmi b/issues/create-custom-rif-xapian-index.gmi new file mode 100644 index 0000000..a0b9039 --- /dev/null +++ b/issues/create-custom-rif-xapian-index.gmi @@ -0,0 +1,16 @@ +# Create Custom RIF XAPIAN Index + +## Tags + +* assigned: bonfacem +* priority: medium +* status: in-progress +* deadline: 2024-10-23 Wed + +## Description + +Given the GN Wiki search page: + +=> https://cd.genenetwork.org/genewiki GeneWiki Entries Search + +We only search by symbol. Add custom XAPIAN index to perform more powerful search. diff --git a/issues/edit-rif-metadata.gmi b/issues/edit-rif-metadata.gmi new file mode 100644 index 0000000..546dc80 --- /dev/null +++ b/issues/edit-rif-metadata.gmi @@ -0,0 +1,121 @@ +# Edit RIF Metadata in GN2 + +## Tags + +* assigned: bonfacem, jnduli +* priority: high +* status: closed + +## Tasks + +### Viewing +* [X] API: Get WIKI/RIF by symbol from rdf. + +> GET /wiki/<symbol> + +``` +[{ + "symbol": "XXXX", + "reason": "XXXX", + "species": "XXXX", + "pubmed_ids": ["XXXX", "XXXX"], // empty array when non-existent + "web_url": "XXXX" // Optional + "comment": "XXXX", + "email": "XXXX", + "categories": ["XXXX", "XXXX"], // Enumeration + "version": "XXXX", + "initial": "XXXX", // Optional user or project code or your initials. +}] +``` + +* [X] UI: Modify traits page to have "GN2 (GeneWiki)" +* [X] UI: Integrate with API + +### Editing + +* [X] API: Edit comment by id in mysql/rdf: modifies GeneRIF and GeneRIFXRef tables. +* [X] API: Modify edit comments by id to include RDF changes. + +> POST /wiki/<comment-id>/edit + +``` +{ + "symbol": "XXXX", + "reason": "XXXX", + "species": "XXXX", + "pubmed_ids": ["XXXX", "XXXX"], // Optional + "web_url": "XXXX" // Optional + "comment": "XXXX", + "email": "XXXX", + "categories": ["XXXX", "XXXX"], // Enumeration + "initial": "XXXX", // Optional user or project code or your initials. 
+} +``` +* [X] UI: Add buttons that edit various relevant sections. +* [X] UI: Edit page needs to fetch categories from GeneCategory table. When a comment write fails, alert with an error; when it succeeds, update the comment on the page and alert with success. +* [X] GN auth integration + +### History + +* [X] API: End-point to fetch all the historical data +* [X] UI: Page that contains history for how comments change. + +> GET /wiki/<comment-id>/history + +``` +[{ + "symbol": "XXXX", + "reason": "XXXX", + "species": "XXXX", + "pubmed_ids": ["XXXX", "XXXX"], // Optional + "web_url": "XXXX", // Optional + "comment": "XXXX", + "email": "XXXX", + "categories": ["XXXX", "XXXX"], // Enumeration + "version": "XXXX", + "initial": "XXXX", // Optional user or project code or your initials. +}] +``` + +### Misc ToDos: + +* [X] Review performance of query used in 72d9a24e8e65 [Genenetwork3] + +### Ops + +* [X] RDF synchronization with SQL (gn-machines). +* [X] Update RDF in tux02. +* [X] UI: Add "edit" button after testing. + +### Resolution + +Genenetwork2: +=> https://github.com/genenetwork/genenetwork2/pull/858 UI/fetch rif using recent apis #858 +=> https://github.com/genenetwork/genenetwork2/pull/864 Add comment history page. #864 +=> https://github.com/genenetwork/genenetwork2/pull/865 Add support for auth in Rif Edit #865 +=> https://github.com/genenetwork/genenetwork2/pull/866 Add a page for searching GeneWiki by symbol. #866 +=> https://github.com/genenetwork/genenetwork2/pull/881 Add display page for NCBI RIF metadata. #881
+=> https://github.com/genenetwork/genenetwork2/pull/882 GN editting UI improvements #882 + + +GeneNetwork3: +=> https://github.com/genenetwork/genenetwork3/pull/180 Update script that updates Generif_BASIC table #180 +=> https://github.com/genenetwork/genenetwork3/pull/181 Add case insensitive prefixes for rif wiki #181 +=> https://github.com/genenetwork/genenetwork3/pull/184 Api/get wiki from rdf #184 +=> https://github.com/genenetwork/genenetwork3/pull/185 feat: add api calls to get categories and last comment #185 +=> https://github.com/genenetwork/genenetwork3/pull/186 Api/fetch the latest wiki by versionid #186 +=> https://github.com/genenetwork/genenetwork3/pull/187 Api/get end point to fetch all historical data #187 +=> https://github.com/genenetwork/genenetwork3/pull/189 Add auth to edit RIF api call #189 +=> https://github.com/genenetwork/genenetwork3/pull/190 Api/update rif queries #190 +=> https://github.com/genenetwork/genenetwork3/pull/193 Api/edit rif endpoint #193 +=> https://github.com/genenetwork/genenetwork3/pull/194 Fix C0411/C0412 pylint errors in gn3.api.metadata.api.wiki. #194 +=> https://github.com/genenetwork/genenetwork3/pull/195 Add rif tests #195 +=> https://github.com/genenetwork/genenetwork3/pull/196 Handle missing GN3_SECRETS for CI testing. #196 +=> https://github.com/genenetwork/genenetwork3/pull/197 Rif edit atomicity #197 +=> https://github.com/genenetwork/genenetwork3/pull/198 Run tests against Virtuoso that is spun locally. #198 +=> https://github.com/genenetwork/genenetwork3/pull/199 Add rdf-tests after the check phase.
#199 => https://github.com/genenetwork/genenetwork3/pull/200 Api/ncbi metadata #200 + +* closed diff --git a/issues/editing-dataset-metadata.gmi b/issues/editing-dataset-metadata.gmi index 17d1693..70876e0 100644 --- a/issues/editing-dataset-metadata.gmi +++ b/issues/editing-dataset-metadata.gmi @@ -5,7 +5,7 @@ * assigned: bonfacem * priority: high * type: editing -* status: in-progress +* status: stalled * keywords: metadata editing ## Description diff --git a/issues/error-handling-external-errors.gmi b/issues/error-handling-external-errors.gmi index d1707de..640e1d1 100644 --- a/issues/error-handling-external-errors.gmi +++ b/issues/error-handling-external-errors.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: fredm -* status: open +* status: closed * type: bug * priority: high * keywords: error handling diff --git a/issues/fix-global-search-ui.gmi b/issues/fix-global-search-ui.gmi new file mode 100644 index 0000000..2979d99 --- /dev/null +++ b/issues/fix-global-search-ui.gmi @@ -0,0 +1,24 @@ +# Fix Broken Global Search UI + +## Tags + +* Assigned: alexm, zsloan +* Priority: high +* Status: in progress +* Keywords: search, UI, bug, refactor +* Type: UI, bug + +## Description + +The global search UI layout is broken on certain browser versions. +This issue was reported to occur on **Firefox 128.3.1 ESR**. +The root cause of the problem is unclear, +but after reviewing the global search UI code, +the following changes need to be implemented (see tasks below): + + + +## Tasks + +* [ ] Remove custom layout CSS and replace it with the Bootstrap layout for better uniformity and easier debugging. +* [ ] Modify the navbar to extend across the full width of the page on medium and small devices.
diff --git a/issues/fix-pairscan-mapping.gmi b/issues/fix-pairscan-mapping.gmi new file mode 100644 index 0000000..1b48fee --- /dev/null +++ b/issues/fix-pairscan-mapping.gmi @@ -0,0 +1,28 @@ +# Fix Pairscan Mapping + +## Tags + +* assigned: alexm, +* priority: medium, +* type: bug +* keywords: pairscan, debug, fix, mapping + +## Description +Pairscan mapping is currently not working: + +Error: + +``` +GeneNetwork 3.12-rc1 https://genenetwork.org/run_mapping ( 1:01PM UTC Jan 13, 2025) +Traceback (most recent call last): + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/gn2/wqflask/views.py", line 1035, in mapping_results_page + template_vars = run_mapping.RunMapping(start_vars, + File "/gnu/store/cxawl32jm0fgavc9ahcr3g0j66zdan30-profile/lib/python3.10/site-packages/gn2/wqflask/marker_regression/run_mapping.py", line 312, in __init__ + self.geno_db_exists = geno_db_exists(self.dataset, results[0]['name']) + KeyError: 'name' + +```
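The traceback above ends in `run_mapping.py` reading `results[0]['name']` without checking that the key exists. A minimal defensive sketch of the direction a fix could take (`first_marker_name` is a hypothetical helper, not the actual GN2 code):

```python
def first_marker_name(results):
    """Return the 'name' of the first mapping result row, or None when
    the rows lack a 'name' key (the KeyError seen in the traceback)."""
    if results and "name" in results[0]:
        return results[0]["name"]
    return None

# The caller could then degrade gracefully instead of crashing:
print(first_marker_name([{"marker": "rs123"}]))  # None
print(first_marker_name([{"name": "rs123"}]))    # rs123
```

Whether the right fix is to skip the genotype-db check when the name is missing, or to repair whatever upstream query drops the `name` key for pairscan, still needs investigating.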
\ No newline at end of file diff --git a/issues/fix-rqtl-rm-bug.gmi b/issues/fix-rqtl-rm-bug.gmi new file mode 100644 index 0000000..de71487 --- /dev/null +++ b/issues/fix-rqtl-rm-bug.gmi @@ -0,0 +1,95 @@ +# Investigate and Fix `rm` Command in `rqtl` Logs + +## Tags + +* assigned: alex, bonfacem +* type: Bug +* status: in progress +* keywords: external, qtl, rqtl, bug, logs + +## Description + +For QTL analysis, we invoke the `rqtl` script as an external process through Python's `subprocess` module. +For reference, see the `rqtl_wrapper.R` script: +=> https://github.com/genenetwork/genenetwork3/blob/main/scripts/rqtl_wrapper.R + +The issue is that, upon analyzing the logs for `rqtl`, we see that an `rm` command is unexpectedly invoked: + +``` +sh: line 1: rm: command not found +``` + +This command cannot be traced to its origin, and it does not appear to be part of the expected behavior. + +The issue is currently observed only in the CD environment. The only way I have attempted to reproduce this locally is by invoking the command in a shell environment with string injection, which is not the case for GeneNetwork3, where all strings are parsed and passed as a list argument. + +Here’s an example of the above attempt: + +```python +def run_process(cmd, output_file, run_id): + """Function to execute an external process and capture the stdout in a file. + + Args: + cmd: The command to execute, provided as a list of arguments. + output_file: Absolute file path to write the stdout. + run_id: Unique ID to identify the process. + + Returns: + A dictionary with the results, indicating success or failure. 
+ """ + cmd.append(" && rm") # Injecting potentially problematic command + cmd = " ".join(cmd) # The command is passed as a string + + try: + # Phase: Execute the command in a shell environment + with subprocess.Popen( + cmd, + shell=True, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + ) as process: + # Process output handling goes here +``` + +The error generated at the end of the `rqtl` run, if `rm` does not exist inside the container, is: + +``` +sh: line 1: rm: command not found +``` + +The actual code for GeneNetwork3 is: + +```python +def run_process(cmd, output_file, run_id): + """Function to execute an external process and capture the stdout in a file. + + Args: + cmd: The command to execute, provided as a list of arguments. + output_file: Absolute file path to write the stdout. + run_id: Unique ID to identify the process. + + Returns: + A dictionary with the results, indicating success or failure. + """ + try: + # Phase: Execute the command without a shell (list argv) + with subprocess.Popen( + cmd, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT, + ) as process: + # Process output handling goes here +``` + +## Investigated and Excluded Possibilities + +* [x] The `rm` command is not explicitly invoked within the `rqtl` script. +* [x] The `rqtl` command is passed as a list of parsed arguments (i.e., no direct string injection). +* [x] The subprocess is not invoked within a shell environment, which would otherwise result in string injection. +* [x] We simulated invoking a system command within the `rqtl` script, but the error does not match the observed issue. + +## TODO + +* [ ] Test in a similar environment to the CD environment to replicate the issue. + +* [ ] Investigate the internals of the QTL library for any unintended `rm` invocation.
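The distinction the excluded-possibilities list relies on can be demonstrated in isolation. A small sketch, using `echo` as a stand-in for the `rqtl` invocation, showing that a list argv without `shell=True` leaves `&&` inert, while a shell string interprets it:

```python
import subprocess

# List argv, no shell: the metacharacters reach the child as one
# literal argument, so no second command can be injected.
safe = subprocess.run(["echo", "hello && rm -rf /tmp/x"],
                      capture_output=True, text=True)
print(safe.stdout.strip())  # hello && rm -rf /tmp/x

# Single string with shell=True: the shell interprets `&&` and runs a
# second command, the only mode that would explain a stray `rm`.
unsafe = subprocess.run("echo hello && echo second-command-ran",
                        shell=True, capture_output=True, text=True)
print(unsafe.stdout.split())  # ['hello', 'second-command-ran']
```

Since GeneNetwork3 uses the first form, the injection, if any, would have to happen inside the R/qtl code itself, e.g. via `system()` calls.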
diff --git a/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi new file mode 100644 index 0000000..a0b2c5c --- /dev/null +++ b/issues/gemma/gemma2-has-different-output-from-rqtl2.gmi @@ -0,0 +1,80 @@ +# GEMMA output differs from R/qtl2 + +# Tags + +* assigned: pjotrp, davea +* priority: high +* type: bug, enhancement +* status: closed +* keywords: database, gemma, reaper, rqtl2 + +# Description + +When running trait BXD_21526 results differ significantly. + +=> https://genenetwork.org/show_trait?trait_id=21526&dataset=BXDPublish +=> https://genenetwork.org/show_trait?trait_id=21529&dataset=BXDPublish + +So I confirm I am getting the same results as Dave in GN for GEMMA (see Conclusion below). + +# Tasks + +## GeneNetwork + +I ran GEMMA for precompute on the command line and confirmed it to +be the same as what we see in the browser. This suggests either the data +or the method differs from Dave's approach. + +I confirmed that gemma in GN matches Dave's results. It is interesting +to see that running without LOCO has some impact, but not as bad as +the R/qtl2 difference. First we should check the genotype files to see +if they match. I checked that the phenotypes match. + +Our inputs are different if I count genotypes (first yours, the other +on production): + +``` + 1 2184941 B + 2 2132744 D + 3 628980 H + 1 2195662 B + 2 2142959 D + 3 650168 H +``` + +The number of rows/markers is the same. So we probably added some +genometypes, but if we miss one that would matter. Dave, you can find +the file in /home/wrk/BXD.geno on tux02 if you want to look. + +I notice that we don't use H in the R/qtl2 control file. That +might make a difference though it probably won't explain what we see +now. BTW I also correlated the LOD scores from GEMMA and R/qtl2 in +the spreadsheet and at 0.7 that is too low. So it is probably not +just a magnitude problem. The results differ a lot in your +spreadsheet.
+ +Next step is that I need to run R/qtl2 using the script in your +dropbox and see what Karl's code does. The exercise does not hurt +because it will help us bring R/qtl2 to GN. + +## R/qtl2 + +R/qtl2 is packaged in guix and can be run in a shell with + +``` +guix shell -C r r-qtl2 +> library(qtl2) +> bxd <- read_cross2(file = "bxd_cancer_new_GN_July_2024.json") +Warning messages: +1: In recode_geno(sheet, genotypes) : + 630519 genotypes treated as missing: "H", "U" +2: In matrix(as.numeric(unlist(pheno)), ncol = nc) : + NAs introduced by coercion +3: In check_cross2(output) : Physical map out of order on chr 1, 2, 11, 19 +``` + +The first warning matches above. If data is missing it may be filtered out. We'll have to check for that. The third warning I am not sure about. Probably a ranking of markers. + +# Conclusion + +It turned out that R/qtl was running HK - so it was a QTL mapping rather than an LMM. diff --git a/issues/genenetwork/cannot-connect-to-mariadb.gmi b/issues/genenetwork/cannot-connect-to-mariadb.gmi new file mode 100644 index 0000000..3dfe1bc --- /dev/null +++ b/issues/genenetwork/cannot-connect-to-mariadb.gmi @@ -0,0 +1,121 @@ +# Cannot Connect to MariaDB + + +## Description + +GeneNetwork3 is failing to connect to mariadb with the error: + +``` +⋮ +2024-11-05 14:49:00 Traceback (most recent call last): +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request +2024-11-05 14:49:00 rv = self.dispatch_request() +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request +2024-11-05 14:49:00 return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/gn3/api/menu.py", line 13, in generate_json +2024-11-05 14:49:00 with 
database_connection(current_app.config["SQL_URI"], logger=current_app.logger) as conn: +2024-11-05 14:49:00 File "/gnu/store/lzw93sik90d780n09svjx5la1bb8g3df-python-3.10.7/lib/python3.10/contextlib.py", line 135, in __enter__ +2024-11-05 14:49:00 return next(self.gen) +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/gn3/db_utils.py", line 34, in database_connection +2024-11-05 14:49:00 connection = mdb.connect(db=db_name, +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/MySQLdb/__init__.py", line 121, in Connect +2024-11-05 14:49:00 return Connection(*args, **kwargs) +2024-11-05 14:49:00 File "/gnu/store/83v79izrqn36nbn0l1msbcxa126v21nz-profile/lib/python3.10/site-packages/MySQLdb/connections.py", line 195, in __init__ +2024-11-05 14:49:00 super().__init__(*args, **kwargs2) +2024-11-05 14:49:00 MySQLdb.OperationalError: (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)") +``` + +We have previously defined the default socket file[^1][^2] as "/run/mysqld/mysqld.sock". + +## Troubleshooting Logs + +### 2024-11-05 + +I attempted to just bind `/run/mysqld/mysqld.sock` to `/tmp/mysql.sock` by adding the following mapping in GN3's `gunicorn-app` definition: + +``` +(file-system-mapping + (source "/run/mysqld/mysqld.sock") + (target "/tmp/mysql.sock") + (writable? #t)) +``` + +but that does not fix things. + +I had tried to change the mysql URI to use IP addresses, i.e. + +``` +SQL_URI="mysql://webqtlout:webqtlout@128.169.5.119:3306/db_webqtl" +``` + +but that simply changes the error from the above to the one below: + +``` +2024-11-05 15:27:12 MySQLdb.OperationalError: (2002, "Can't connect to MySQL server on '128.169.5.119' (115)") +``` + +I tried with both `127.0.0.1` and `128.169.5.119`. 
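Another avenue, assuming the GN3 side can be patched rather than the container mappings: `MySQLdb.connect` accepts a `unix_socket` keyword that overrides the client library's compiled-in default of `/tmp/mysql.sock`. A sketch (`connect_kwargs` is a hypothetical helper, not current gn3 code):

```python
from urllib.parse import urlparse

def connect_kwargs(sql_uri, socket=None):
    """Translate an SQL_URI like mysql://user:pass@host/db into keyword
    arguments for MySQLdb.connect(), optionally forcing the unix socket."""
    parts = urlparse(sql_uri)
    kwargs = {"user": parts.username,
              "passwd": parts.password,
              "host": parts.hostname or "localhost",
              "db": parts.path.lstrip("/")}
    if parts.port:
        kwargs["port"] = parts.port
    if socket:
        # Overrides the /tmp/mysql.sock default compiled into the client library.
        kwargs["unix_socket"] = socket
    return kwargs

print(connect_kwargs("mysql://webqtlout:webqtlout@localhost/db_webqtl",
                     socket="/run/mysqld/mysqld.sock"))
```

This would keep the container's `/run/mysqld/mysqld.sock` bind as-is and simply stop relying on the default socket path.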
+ +My hail-mary was to attempt to expose the `my.cnf` file generated by the `mysql-service-type` definition to the "pola-wrapper", but that is proving tricky, seeing as the file is generated elsewhere[^4] and we do not have a way of figuring out the actual final path of the file. + +I tried: + +``` +(file-system-mapping + (source (mixed-text-file "my.cnf" + (string-append "[client]\n" + "socket=/run/mysqld/mysqld.sock"))) + (target "/etc/mysql/my.cnf")) +``` + +but that did not work either. + +### 2024-11-07 + +Start digging into how GNU Guix services are defined[^5] to try and understand why the file mapping attempt did not work. + +=> http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/system/file-systems.scm?id=2394a7f5fbf60dd6adc0a870366adb57166b6d8b#n575 +Looking at the code linked above specifically at lines 575 to 588, and 166, it seems, to me, that the mappings attempt should have worked. + +Try it again, taking care to verify that the paths are correct, with: + +``` +(file-system-mapping + (source (mixed-text-file "my.cnf" + (string-append "[client-server]\n" + "socket=/run/mysqld/mysqld.sock"))) + (target "/etc/my.cnf")) +``` + +Try rebuilding on tux04: started getting `Segmentation fault` errors out of the blue for many guix commands 🤦🏿. +Try building container on local dev machine: this took a long time - quit and continue later. + +### 2024-11-08 + +After guix broke, causing the `Segmentation fault` errors above, I did some troubleshooting and was able to finally fix that by pinning guix to version b0b988c41c9e0e591274495a1b2d6f27fcdae15a as shown in the troubleshooting transcript[^6]. + +Now the fixes I did to make python requests work with the newer guix (defined in guix-bioinformatics[^7]) seem to be leading to failures in the older guix version. + +Let me attempt rebasing to reorder the commits, to make the python requests commit come last, to more easily do a `git reset` before rebuilding the container — not successful. 
+=> https://git.genenetwork.org/gn-machines/commit/?h=production-container&id=610049b2bfa32cae5d3f992b95aac711290efa2a Manually "undo" the changes in a new commit, + +then rebuild the container. This exposes a bug in gn-auth. + +=> https://git.genenetwork.org/gn-auth/commit/?id=4c21d0e43cf0de1084d0e0a243e441c6e72236eb Fix that. + +and update the `public-jwks-uri` value for the client in the admin dashboard, and voila!!! Now the system works. + +Attempt pulling guix "2394a7f5fbf60dd6adc0a870366adb57166b6d8b" into a profile locally: went through without a hitch + +Upgrade guix daemon, and restart it. Delete profile and run `guix gc`, then try pulling guix "2394a7f5fbf60dd6adc0a870366adb57166b6d8b" again. It also went through without a problem. This eliminates the daemon being the culprit: Running `sudo -i guix pull --list-generations` on both tux04 and my local dev machine gives both daemon commits as `2a6d96425eea57dc6dd48a2bec16743046e32e06`. + + +### Footnotes + +=> https://git.genenetwork.org/gn-machines/tree/production.scm?id=46a1c4c8d01198799e6ac3b99998dca40d2c7094#n47 [^1] Lines 47 to 49 of production.scm +=> https://guix.gnu.org/manual/en/html_node/Database-Services.html#index-mysql_002dconfiguration [^2] Guix's mysql-service-type configurations +=> https://mariadb.com/kb/en/server-system-variables/#socket [^3] MariaDB configuration variables: socket +=> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/databases.scm?id=4c56d0cccdc44e12484b26332715f54768738c5f#n576 [^4] Guix: mysql-service-type configuration code +=> https://guix.gnu.org/manual/en/html_node/Defining-Services.html [^5] Guix documentation: Defining Services +=> https://github.com/genenetwork/gn-gemtext-threads/blob/d785b06643b5e5a2470fd0da075dcf77bda82d16/miscellaneous/broken-guix-on-tux04-20241108.org [^6] Broken guix on tux04: Troubleshooting transcript +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=eb7beb340a9731775e8ad177e47b70dba2f2a84f [^7] guix-bioinformatics: 
Upgrade guix channel to 2394a7f diff --git a/issues/genenetwork/containerising-production-issues.gmi b/issues/genenetwork/containerising-production-issues.gmi new file mode 100644 index 0000000..ed5702a --- /dev/null +++ b/issues/genenetwork/containerising-production-issues.gmi @@ -0,0 +1,33 @@ +# Containerising Production: Issues + +## Tags + +* type: bug +* assigned: fredm +* priority: critical +* status: closed, completed +* keywords: production, container, tux04 +* interested: alexk, aruni, bonfacem, fredm, pjotrp, soloshelby, zsloan, jnduli + +## Description + +We recently got production into a container and deployed it. It has come up, however, that some services needed for a full-featured GeneNetwork system are not part of the container. + +This is, therefore, a meta-issue, tracking all issues that relate to the deployment of the disparate services that make up GeneNetwork. + +## Documentation + +=> https://issues.genenetwork.org/topics/genenetwork/genenetwork-services + +The link above documents the various services that make up the GeneNetwork service.
+ +## Issues + +* [x] Move user directories to a large partition +=> ./handle-tmp-dirs-in-container [x] Link TMPDIR in container to a directory on a large partition +=> ./markdown-editing-service-not-deployed [ ] Define and deploy Markdown Editing service +=> ./umhet3-samples-timing-slow [ ] Figure out and fix UM-HET3 Samples mappings on Tux04 +=> ./setup-mailing-on-tux04 [x] Setting up email service on Tux04 +=> ./virtuoso-shutdown-clears-data [x] Virtuoso seems to lose data on restart +=> ./python-requests-error-in-container [x] Fix python's requests library certificates error +=> ./cannot-connect-to-mariadb [ ] GN3 cannot connect to mariadb server diff --git a/issues/genenetwork/handle-tmp-dirs-in-container.gmi b/issues/genenetwork/handle-tmp-dirs-in-container.gmi new file mode 100644 index 0000000..5f6eb92 --- /dev/null +++ b/issues/genenetwork/handle-tmp-dirs-in-container.gmi @@ -0,0 +1,22 @@ +# Handle Temporary Directories in the Container + +## Tags + +* type: feature +* assigned: fredm +* priority: critical +* status: closed, completed +* keywords: production, container, tux04 +* interested: alexk, aruni, bonfacem, pjotrp, zsloan + +## Description + +The container's temporary directories should be in a large partition on the host to avoid a scenario where the writes fill up one of the smaller drives. + +Currently, we use the `/tmp` directory by default, but we should look into transitioning away from that — `/tmp` is world readable and world writable and therefore needs careful consideration to keep safe. + +Thankfully, we are running our systems within a container, and can bind the container's `/tmp` directory to a non-world-accessible directory, keeping things at least contained. 
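The bind described above might look something like the following mapping in the container definition. A sketch only: the host path is an assumption, and the directory would need restrictive permissions on the host.

```scheme
;; Sketch: bind a host directory on a large partition (path assumed)
;; over the container's /tmp.
(file-system-mapping
 (source "/export/tmp/gn-container-tmp") ;; host side, e.g. mode 0700
 (target "/tmp")
 (writable? #t))
```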
+ +### Fixes + +=> https://git.genenetwork.org/gn-machines/commit/?id=7306f1127df9d4193adfbfa51295615f13d32b55 diff --git a/issues/genenetwork/markdown-editing-service-not-deployed.gmi b/issues/genenetwork/markdown-editing-service-not-deployed.gmi new file mode 100644 index 0000000..e7a1717 --- /dev/null +++ b/issues/genenetwork/markdown-editing-service-not-deployed.gmi @@ -0,0 +1,34 @@ +# Markdown Editing Service: Not Deployed + +## Tags + +* type: bug +* status: open +* assigned: fredm +* priority: critical +* keywords: production, container, tux04 +* interested: alexk, aruni, bonfacem, fredm, pjotrp, zsloan + +## Description + +The Markdown Editing service is not working on production. + +* Link: https://genenetwork.org/facilities/ +* Repository: https://git.genenetwork.org/gn-guile + +Currently, the code is being run directly on the host, rather than inside the container. + +Some important things to note: + +* The service requires access to a checkout of https://github.com/genenetwork/gn-docs +* Currently, the service is hard-coded to use a specific port: we should probably fix that. + +## Reopened: 2024-11-01 + +While the service was deployed, the edit functionality is not working right: specifically, pushing the edits upstream to the remote seems to fail. + +If you do an edit and refresh the page, it will show up in the system, but it will not proceed to be pushed up to the remote. + +Set `CGIT_REPO_PATH="https://git.genenetwork.org/gn-guile"` which seems to allow the commit to work, but we do not actually get the changes pushed to the remote in any useful sense. + +It seems to me that we need to configure the environment in such a way that it will be able to push the changes to remote.
diff --git a/issues/genenetwork/python-requests-error-in-container.gmi b/issues/genenetwork/python-requests-error-in-container.gmi new file mode 100644 index 0000000..0289762 --- /dev/null +++ b/issues/genenetwork/python-requests-error-in-container.gmi @@ -0,0 +1,174 @@ +# Python Requests Error in Container + +## Tags + +* type: bug +* assigned: fredm +* priority: critical +* status: closed, completed, fixed +* interested: alexk, aruni, bonfacem, pjotrp, zsloan +* keywords: production, container, tux04, python, requests + +## Description + +Building the container with the +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=eb7beb340a9731775e8ad177e47b70dba2f2a84f upgraded guix definition +leads to python's requests library failing. + +``` +2024-10-30 16:04:13 OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt +``` + +If you log in to the container itself, however, you find that the file `/etc/ssl/certs/ca-certificates.crt` actually exists and has content. + +One suggested fix is to set up the correct environment variables for the requests library, such as `REQUESTS_CA_BUNDLE`. + +See +=> https://requests.readthedocs.io/en/latest/user/advanced/#ssl-cert-verification + +### Troubleshooting Logs + +Try reproducing the issue locally: + +``` +$ guix --version +hint: Consider installing the `glibc-locales' package and defining `GUIX_LOCPATH', along these lines: + + guix install glibc-locales + export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale" + +See the "Application Setup" section in the manual, for more info. + +guix (GNU Guix) 2394a7f5fbf60dd6adc0a870366adb57166b6d8b +Copyright (C) 2024 the Guix authors +License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law.
+$ +$ guix shell --container --network python python-requests coreutils +[env]$ ls "${GUIX_ENVIRONMENT}/etc" +ld.so.cache profile +``` + +We see from the above that there are no certificates in the environment with just python and python-requests. + +Okay. Now let's write a simple python script to test things out with: + +``` +import requests + +resp = requests.get("https://github.com") +print(resp) +``` + +and run it! + +``` +$ guix shell --container --network python python-requests coreutils -- python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 1, in <module> + import requests + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/__init__.py", line 164, in <module> + from .api import delete, get, head, options, patch, post, put, request + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/api.py", line 11, in <module> + from . import sessions + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/sessions.py", line 15, in <module> + from .adapters import HTTPAdapter + File "/gnu/store/b6ny4p29f32rrnnvgx7zz1nhsms2zmqk-profile/lib/python3.10/site-packages/requests/adapters.py", line 81, in <module> + _preloaded_ssl_context.load_verify_locations( +FileNotFoundError: [Errno 2] No such file or directory +``` + +Uhmm, what is this new error? + +Add `nss-certs` and try again. 
+ +``` +$ guix shell --container --network python python-requests nss-certs coreutils +[env]$ ls ${GUIX_ENVIRONMENT}/etc/ssl/ +certs +[env]$ python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 1, in <module> + import requests + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/__init__.py", line 164, in <module> + from .api import delete, get, head, options, patch, post, put, request + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/api.py", line 11, in <module> + from . import sessions + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/sessions.py", line 15, in <module> + from .adapters import HTTPAdapter + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/adapters.py", line 81, in <module> + _preloaded_ssl_context.load_verify_locations( +FileNotFoundError: [Errno 2] No such file or directory +[env]$ +[env]$ export REQUESTS_CA_BUNDLE="${GUIX_ENVIRONMENT}/etc/ssl/certs/ca-certificates.crt" +[env]$ python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 1, in <module> + import requests + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/__init__.py", line 164, in <module> + from .api import delete, get, head, options, patch, post, put, request + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/api.py", line 11, in <module> + from .
import sessions + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/sessions.py", line 15, in <module> + from .adapters import HTTPAdapter + File "/gnu/store/17dw8qczqqz9fmj2kxzsbfqn730frqd7-profile/lib/python3.10/site-packages/requests/adapters.py", line 81, in <module> + _preloaded_ssl_context.load_verify_locations( +FileNotFoundError: [Errno 2] No such file or directory +``` + +Welp! Looks like this error is a whole different thing. + +Let us try with the genenetwork2 package. + +``` +$ guix shell --container --network genenetwork2 coreutils +[env]$ ls "${GUIX_ENVIRONMENT}/etc" +bash_completion.d jupyter ld.so.cache profile +``` + +This does not seem to have the certificates in place either, so let's add nss-certs + +``` +$ guix shell --container --network genenetwork2 coreutils nss-certs +[env]$ ls "${GUIX_ENVIRONMENT}/etc" +bash_completion.d jupyter ld.so.cache profile ssl +[env]$ python3 test.py +Traceback (most recent call last): + File "/tmp/test.py", line 3, in <module> + resp = requests.get("https://github.com") + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/api.py", line 73, in get + return request("get", url, params=params, **kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/api.py", line 59, in request + return session.request(method=method, url=url, **kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/sessions.py", line 587, in request + resp = self.send(prep, **send_kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/sessions.py", line 701, in send + r = adapter.send(request, **kwargs) + File "/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/adapters.py", line 460, in send + self.cert_verify(conn, request.url, verify, cert) + File 
"/gnu/store/qigjz4i0dckbsjbd2has0md2dxwsa7ry-profile/lib/python3.10/site-packages/requests/adapters.py", line 263, in cert_verify + raise OSError( +OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt +``` + +We get the expected certificates error! This is good. Now define the envvar and try again. + +``` +[env]$ export REQUESTS_CA_BUNDLE="${GUIX_ENVIRONMENT}/etc/ssl/certs/ca-certificates.crt" +[env]$ python3 test.py +<Response [200]> +``` + +Success!!! + +Adding nss-certs and setting the `REQUESTS_CA_BUNDLE` fixes things. We'll need to do the same for the container, for both the genenetwork2 and genenetwork3 packages (and any other packages that use requests library). + +### Fixes + +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=fec68c4ca87eeca4eb9e69e71fc27e0eae4dd728 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=c3bb784c8c70857904ef97ecd7d36ec98772413d +The two commits above add nss-certs package to all the flask apps, which make use of the python-requests library, which requires a valid CA certificates bundle in each application's environment. + +=> https://git.genenetwork.org/gn-machines/commit/?h=production-container&id=04506c4496e5ca8b3bc38e28ed70945a145fb036 +The commit above defines the "REQUESTS_CA_BUNDLE" environment variable for all the flask applications that make use of python's requests library. diff --git a/issues/genenetwork/setup-mailing-on-tux04.gmi b/issues/genenetwork/setup-mailing-on-tux04.gmi new file mode 100644 index 0000000..45605d9 --- /dev/null +++ b/issues/genenetwork/setup-mailing-on-tux04.gmi @@ -0,0 +1,16 @@ +# Setup Mailing on Tux04 + +## Tags + +* type: bug +* status: closed +* assigned: fredm +* priority: critical +* interested: pjotrp, zsloan +* keywords: production, container, tux04 + +## Description + +We use emails to verify user accounts and allow changing of user passwords. 
We therefore need to set up a way to send emails from the system. + +I updated the configurations to use UTHSC's mail server. diff --git a/issues/genenetwork/umhet3-samples-timing-slow.gmi b/issues/genenetwork/umhet3-samples-timing-slow.gmi new file mode 100644 index 0000000..a3a33a7 --- /dev/null +++ b/issues/genenetwork/umhet3-samples-timing-slow.gmi @@ -0,0 +1,72 @@ +# UM-HET3 Timing: Slow + +## Tags + +* type: bug +* status: open +* assigned: fredm +* priority: critical +* interested: fredm, pjotrp, zsloan +* keywords: production, container, tux04, UM-HET3 + +## Description + +In an email from @robw: + +``` +> > Not sure why. Am I testing the wrong way? +> > Are we using memory and RAM in the same way on the two machines? +> > Here are data on the loading time improvement for Tux2: +> > I tested this using a "worst case" trait that we know when—the 25,000 +> > UM-HET3 samples: +> > [1]https://genenetwork.org/show_trait?trait_id=10004&dataset=HET3-ITPPu +> > blish +> > Tux02: 15.6, 15.6, 15.3 sec +> > Fallback: 37.8, 38.7, 38.5 sec +> > Here are data on Gemma speed/latency performance: +> > Also tested "worst case" performance using three large BXD data sets +> > tested in this order: +> > [2]https://genenetwork.org/show_trait?trait_id=10004&dataset=BXD-Longev +> > ityPublish +> > [3]https://genenetwork.org/show_trait?trait_id=10003&dataset=BXD-Longev +> > ityPublish +> > [4]https://genenetwork.org/show_trait?trait_id=10002&dataset=BXD-Longev +> > ityPublish +> > Tux02: 107.2, 329.9 (ouch), 360.0 sec (double ouch) for 1004, 1003, and +> > 1002 respectively. On recompute (from cache) 19.9, 19.9 and 20.0—still +> > too slow. +> > Fallback: 154.1, 115.9 for the first two traits (trait 10002 already in +> > the cache) +> > On recompute (from cache) 59.6, 59.0 and 59.7. Too slow from cache. +> > PROBLEM 2: Tux02 is unable to map UM-HET3. I still get an nginx 413 +> > error: Entity Too Large. +> +> Yeah, Fred should fix that one.
It is an nginx setting - we run 2x +> nginx. It was reported earlier. +> +> > I need this to work asap. Now mapping our amazing UM-HET3 data. I can +> > use Fallback, but it is painfully slow and takes about 214 sec. I hope +> > Tux02 gets that down to a still intolerable slow 86 sec. +> > Can we please fix and confirm by testing. The Trait is above for your +> > testing pleasure. +> > Even 86 secs is really too slow and should motivate us (or users like +> > me) to think about how we are using all of those 24 ultra-fast cores on +> > the AMD 9274F. Why not put them all to use for us and users. It is not +> > good enough just to have "it work". It has to work in about 5–10 +> > seconds. +> > Here are my questions for you guys: Are we able to use all 24 cores +> > for any one user? How does each user interact with the CPU? Can we +> > handle a class of 24 students with 24 cores, or is it "complicated"? +> > PROBLEM 3: Zach, Fred. Are we computing render time or transport +> > latency correctly? Ideally the printout at the bottom of mapping pages +> > would be true latency as experienced by the user. As far as I can tell +> > with a stop watch our estimates of time are incorrect by as much as 3 +> > secs. And note that the link +> > to [5]http://joss.theoj.org/papers/10.21105/joss.00025 is not working +> > correctly in the footer (see image below). Oddly enough it works fine +> > on Tux02 +> +> Fred, take a note. +``` + +Figure out what this is about and fix it. 
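The 413 "Entity Too Large" error from the email is the classic symptom of nginx's default 1m `client_max_body_size`; since we run two nginx instances, the limit would need raising in both. A sketch of the relevant directive (the value is illustrative, not the deployed setting):

```
http {
    # Raise the request-body limit from the 1m default so large mapping
    # submissions (e.g. the 25,000-sample UM-HET3 traits) are not
    # rejected with "413 Entity Too Large". 50m is an example value.
    client_max_body_size 50m;
}
```

The directive can also be set per `server` or per `location` block if only the mapping endpoints need the larger limit.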
diff --git a/issues/genenetwork/virtuoso-shutdown-clears-data.gmi b/issues/genenetwork/virtuoso-shutdown-clears-data.gmi new file mode 100644 index 0000000..2e01238 --- /dev/null +++ b/issues/genenetwork/virtuoso-shutdown-clears-data.gmi @@ -0,0 +1,98 @@ +# Virtuoso: Shutdown Clears Data + +## Tags + +* type: bug +* assigned: fredm +* priority: critical +* status: closed, completed +* interested: bonfacem, pjotrp, zsloan +* keywords: production, container, tux04, virtuoso + +## Description + +It seems that virtuoso has the bad habit of clearing data whenever it is stopped/restarted. + +This issue will track the work necessary to get the service behaving correctly. + +According to the documentation on +=> https://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader the bulk loading process + +``` +The bulk loader also disables checkpointing and the scheduler, which also need to be re-enabled post bulk load +``` + +That needs to be handled. + +### Notes + +After having a look at +=> https://docs.openlinksw.com/virtuoso/ch-server/#databaseadmsrv the configuration documentation +it occurs to me that the reason virtuoso supposedly clears the data is that the `DatabaseFile` value is not set, so it defaults to a new database file every time the server is restarted (See also the `Striping` setting). + +### Troubleshooting + +Reproduce locally: + +We begin by getting a look at the settings for the remote virtuoso +``` +$ ssh tux04 +fredm@tux04:~$ cat /gnu/store/bg6i4x96nm32gjp4qhphqmxqc5vggk3h-virtuoso.ini +[Parameters] +ServerPort = localhost:8981 +DirsAllowed = /var/lib/data +NumberOfBuffers = 4000000 +MaxDirtyBuffers = 3000000 +[HTTPServer] +ServerPort = localhost:8982 +``` + +Copy these into a file locally, and adjust the `NumberOfBuffers` and `MaxDirtyBuffers` for smaller local dev environment. Also update `DirsAllowed`. 
+ +We end up with our local configuration in `~/tmp/virtuoso/etc/virtuoso.ini` with the content: + +``` +[Parameters] +ServerPort = localhost:8981 +DirsAllowed = /var/lib/data +NumberOfBuffers = 10000 +MaxDirtyBuffers = 6000 +[HTTPServer] +ServerPort = localhost:8982 +``` + +Run virtuoso! +``` +$ cd ~/tmp/virtuoso/var/lib/virtuoso/ +$ ls +$ ~/opt/virtuoso/bin/virtuoso-t +foreground +configfile ~/tmp/virtuoso/etc/virtuoso.ini +``` + +Here we start by changing into the `~/tmp/virtuoso/var/lib/virtuoso/` directory, which is where virtuoso will put its state. Now, in a different terminal, list the files created in the state directory: + +``` +$ ls ~/tmp/virtuoso/var/lib/virtuoso +virtuoso.db virtuoso.lck virtuoso.log virtuoso.pxa virtuoso.tdb virtuoso.trx +``` + +That creates the database file (and other files) with the documented default values, i.e. `virtuoso.*`. + +We cannot quite reproduce the issue locally, since every restart will use exactly the same file names locally. + +Checking the state directory for virtuoso on tux04, however: + +``` +fredm@tux04:~$ sudo ls -al /export2/guix-containers/genenetwork/var/lib/virtuoso/ | grep '\.db$' +-rw-r--r-- 1 986 980 3787456512 Oct 28 14:16 js1b7qjpimdhfj870kg5b2dml640hryx-virtuoso.db +-rw-r--r-- 1 986 980 4152360960 Oct 28 17:11 rf8v0c6m6kn5yhf00zlrklhp5lmgpr4x-virtuoso.db +``` + +We see that there are multiple db files, each created when virtuoso was restarted. There is an extra (possibly random) string prepended to the `virtuoso.db` part. This happens for our service if we do not actually provide the `DatabaseFile` configuration.
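For reference, the eventual fix names the database files explicitly. In `virtuoso.ini` terms, that looks roughly like the following (the paths are illustrative, not the ones deployed on tux04):

```
[Database]
DatabaseFile    = /var/lib/virtuoso/virtuoso.db
ErrorLogFile    = /var/lib/virtuoso/virtuoso.log
TransactionFile = /var/lib/virtuoso/virtuoso.trx
```

With `DatabaseFile` pinned, a restarted server reopens the same database file instead of creating a fresh one, so the data is no longer "lost" on restart.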
+ + +## Fixes + +=> https://github.com/genenetwork/gn-gemtext-threads/commit/8211c1e49498ba2f3b578ed5b11b15c52299aa08 Document how to restart checkpointing and the scheduler after bulk loading +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=2dc335ca84ea7f26c6977e6b432f3420b113f0aa Add configs for scheduler and checkpointing +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=7d793603189f9d41c8ee87f8bb4c876440a1fce2 Set up virtuoso database configurations +=> https://git.genenetwork.org/gn-machines/commit/?id=46a1c4c8d01198799e6ac3b99998dca40d2c7094 Explicitly name virtuoso database files. diff --git a/issues/genenetwork2-account-registration-error.gmi b/issues/genenetwork2-account-registration-error.gmi index d617f93..14b6322 100644 --- a/issues/genenetwork2-account-registration-error.gmi +++ b/issues/genenetwork2-account-registration-error.gmi @@ -5,7 +5,7 @@ * type: bug * priority: critical * assigned: zachs, zsloan, fredm -* status: open +* status: closed, completed * keywords: genenetwork2, account management, user, registration ## Description diff --git a/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi b/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi index d2d2013..603de59 100644 --- a/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi +++ b/issues/genenetwork2-cd-sometimes-fails-to-restart.gmi @@ -10,4 +10,7 @@ A reminder that CD logs are publicly accessible on tux02. => /topics/cd-logs ## Resolution + This issue has been re-opened. Originally, we believed that the restart failures were due to occasional breakage in GN code, and were not a problem with the CI/CD system itself. This will need further investigation to figure out what the root cause is. 
+ +* closed diff --git a/issues/genenetwork2/broken-collections-features.gmi b/issues/genenetwork2/broken-collections-features.gmi new file mode 100644 index 0000000..4239929 --- /dev/null +++ b/issues/genenetwork2/broken-collections-features.gmi @@ -0,0 +1,44 @@ +# Broken Collections Features + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: zachs, fredm +* keywords: gn2, genenetwork2, genenetwork 2, collections + +## Description + +There are some features on the search results page and/or the collections page that are broken. These are: + +* "CTL" feature +* "MultiMap" feature +* "Partial Correlations" feature +* "Generate Heatmap" feature + +### Reproduce Issue + +* Go to https://genenetwork.org +* Select "Mouse (Mus musculus, mm10)" for "Species" +* Select "BXD Family" for "Group" +* Select "Traits and Cofactors" for "Type" +* Select "BXD Published Phenotypes" for "Dataset" +* Type "locomotion" in the "Get Any" field (without the quotes) +* Click "Search" +* In the results page, select the traits with the following "Record" values: "BXD_10050", "BXD_10051", "BXD_10088", "BXD_10091", "BXD_10092", "BXD_10455", "BXD_10569", "BXD_10570", "BXD_11316", "BXD_11317" +* Click the "Add" button and add them to a new collection +* In the resulting collections page, click the button for any of the listed failing features above + +### Failure modes + +* The "CTL" and "WGCNA" features have a failure mode that might have been caused by recent changes making use of AJAX calls, rather than submitting the form manually. +* The "MultiMap" and "Generate Heatmap" features raise exceptions that need to be investigated and resolved +* The "Partial Correlations" feature seems to run forever + +## Break-out Issues + +We break out the issues above into separate pages to track the progress of the fixes for each feature separately.
+ +=> /issues/genenetwork3/ctl-maps-error +=> /issues/genenetwork3/generate-heatmaps-failing diff --git a/issues/genenetwork2/fix-display-for-time-consumed-for-correlations.gmi b/issues/genenetwork2/fix-display-for-time-consumed-for-correlations.gmi new file mode 100644 index 0000000..0c8e9c8 --- /dev/null +++ b/issues/genenetwork2/fix-display-for-time-consumed-for-correlations.gmi @@ -0,0 +1,15 @@ +# Fix Display for the Time Consumed for Correlations + +## Tags + +* type: bug +* status: closed, completed +* priority: low +* assigned: @alexm, @bonz +* keywords: gn2, genenetwork2, genenetwork 2, gn3, genenetwork3 genenetwork 3, correlations, time display + +## Description + +The breakdown of the time consumed for the correlations computations, displayed at the bottom of the page, is not representative of reality. The time that GeneNetwork3 (or background process) takes for the computations is not actually represented in the breakdown, leading to wildly inaccurate displays of total time. + +This will need to be fixed. 
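One way to make the displayed total honest is to time the whole operation against a single wall clock and report anything not attributed to a named component as an explicit remainder. A minimal sketch (the helper and step names are hypothetical, not GN2's actual code):

```python
import time

def run_with_breakdown(steps):
    """Run (name, callable) pairs, timing each against one wall clock.

    Work not attributed to any named step (e.g. time spent waiting on a
    background process between steps) shows up as an explicit
    "untracked" entry instead of silently vanishing from the total.
    """
    breakdown = {}
    wall_start = time.perf_counter()
    for name, func in steps:
        start = time.perf_counter()
        func()
        breakdown[name] = time.perf_counter() - start
    wall = time.perf_counter() - wall_start
    breakdown["untracked"] = wall - sum(breakdown.values())
    return wall, breakdown
```

With this shape, the sum of the displayed components always equals the wall-clock total the user actually experienced.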
diff --git a/issues/genenetwork2/haley-knott-regression-mapping-error.gmi b/issues/genenetwork2/haley-knott-regression-mapping-error.gmi new file mode 100644 index 0000000..25bb221 --- /dev/null +++ b/issues/genenetwork2/haley-knott-regression-mapping-error.gmi @@ -0,0 +1,80 @@ +# Haley-Knott Regression Mapping Error + +## Tags + +* type: bug +* status: closed, completed +* priority: high +* assigned: fredm +* keywords: gn2, genenetwork2, genenetwork 2, mapping, haley-knott + +## Description + +To run the mapping: + +* Do a search +* Click on any trait in the results +* On the trait page, expand the "Mapping Tools" section +* Select the "Haley-Knott Regression" option under "Mapping Tools" +* Click "Compute" + +On running the mapping as above, we got the following error: + +``` + GeneNetwork 2.11-rc2 https://gn2-fred.genenetwork.org/run_mapping ( 6:14AM UTC Sep 11, 2024) +Traceback (most recent call last): + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/wqflask/views.py", line 1004, in mapping_results_page + gn1_template_vars = display_mapping_results.DisplayMappingResults( + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/wqflask/marker_regression/display_mapping_results.py", line 651, in __init__ + self.perm_filename = self.drawPermutationHistogram() + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/wqflask/marker_regression/display_mapping_results.py", line 3056, in drawPermutationHistogram + Plot.plotBar(myCanvas, perm_output, XLabel=self.LRS_LOD, + File 
"/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/gn2/utility/Plot.py", line 184, in plotBar + scaleFont = ImageFont.truetype(font=COUR_FILE, size=11) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/PIL/ImageFont.py", line 959, in truetype + return freetype(font) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/PIL/ImageFont.py", line 956, in freetype + return FreeTypeFont(font, size, index, encoding, layout_engine) + File "/gnu/store/hgcvlkn4bjl0f9wqiakpk5w66brbfxk6-profile/lib/python3.10/site-packages/PIL/ImageFont.py", line 247, in __init__ + self.font = core.getfont( +OSError: cannot open resource +``` + +### Hypothesis + +My hypothesis is that the use of relative paths[fn:1] is the cause of the failure. + +When running the application with the working directory being the root of the GeneNetwork2 repository, use of the relative paths works well. Unfortunately, that assumption breaks quickly if the application is ever run outside of the root of the GN2 repo. + +Verification: + +*Question*: Does the application run on root of GN2 repository/package? + +* Log out the path of the font file and use the results to answer the question +* https://github.com/genenetwork/genenetwork2/commit/ca8018a61f2e014b4aee4da2cbd00d7b591b2f6a +* https://github.com/genenetwork/genenetwork2/commit/01d56903ba01a91841d199fe393f9b307a7596a2 + +*Answer*: No! 
The application does not run with the working directory on the root of the GN2 repository/package, as evidenced by this snippet from the logs: + +``` +2024-09-11 07:41:13 [2024-09-11 07:41:13 +0000] [494] [DEBUG] POST /run_mapping +2024-09-11 07:41:18 [2024-09-11 07:41:18 +0000] [494] [DEBUG] Font file path: /gn2/wqflask/static/fonts/courbd.ttf +2024-09-11 07:41:18 DEBUG:gn2.wqflask:Font file path: /gn2/wqflask/static/fonts/courbd.ttf +2024-09-11 07:41:18 [2024-09-11 07:41:18 +0000] [494] [ERROR] https://gn2-fred.genenetwork.org/run_mapping ( 7:41AM UTC Sep 11, 2024) +2024-09-11 07:41:18 Traceback (most recent call last): +``` + +We see from this that the application seems to be running with the working directory being "/" rather than the root for the application's package files. + +### Fixes + +* https://github.com/genenetwork/genenetwork2/commit/d001c1e7cae8f69435545b8715038b1d0fc1ee62 +* https://git.genenetwork.org/guix-bioinformatics/commit/?id=7a1bf5bc1c3de67f01eabd23e1ddc0150f81b22b + +# Footnotes + +[fn:1] https://github.com/genenetwork/genenetwork2/blob/50fc0b4bc4106164745afc7e1099bb150f6e635f/gn2/utility/Plot.py#L44-L46 diff --git a/issues/genenetwork2/handle-oauth-errors-better.gmi b/issues/genenetwork2/handle-oauth-errors-better.gmi new file mode 100644 index 0000000..462ded5 --- /dev/null +++ b/issues/genenetwork2/handle-oauth-errors-better.gmi @@ -0,0 +1,17 @@ +# Handle OAuth Errors Better + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: fredm +* interested: zachs, robw +* keywords: gn2, genenetwork2, ui, user interface, oauth, oauth errors + +## Description + +When a session expires, for whatever reason, a notification is displayed to the user as shown in the image below: +=> ./session_expiry_oauth_error.png + +The message is a little jarring to the end user. Make it gentler, and probably more informative, so the user is not as surprised. 
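One gentler approach is to catch the session-expiry error at the application level and redirect with a friendly message. A minimal Flask sketch (the routes are hypothetical, and a stand-in exception class is used so the snippet is self-contained; the real handler would catch authlib's `OAuthError`):

```python
from flask import Flask, flash, redirect, url_for

class OAuthError(Exception):
    """Stand-in for authlib.integrations.base_client.errors.OAuthError."""

app = Flask(__name__)
app.secret_key = "dev-only"  # flash() needs a secret key; placeholder value

@app.route("/login")
def login():
    return "sign-in page"

@app.route("/collections")
def collections():
    # Stand-in for any view whose token refresh fails mid-request.
    raise OAuthError("invalid_client")

@app.errorhandler(OAuthError)
def handle_oauth_error(error):
    # Instead of surfacing the raw OAuth error, tell the user what
    # happened and send them somewhere they can recover.
    flash("Your session has expired. Please sign in again.", "info")
    return redirect(url_for("login"))
```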
diff --git a/issues/genenetwork2/mapping-error.gmi b/issues/genenetwork2/mapping-error.gmi new file mode 100644 index 0000000..2e28491 --- /dev/null +++ b/issues/genenetwork2/mapping-error.gmi @@ -0,0 +1,51 @@ +# Mapping Error + +## Tags + +* type: bug +* status: open +* priority: medium +* assigned: zachs, fredm, flisso +* keywords: gn2, genenetwork2, genenetwork 2, mapping + +## Reproduction + +* Go to https://staging.genenetwork.org/ +* For 'Species' select "Arabidopsis (Arabidopsis thaliana, araTha1)" +* For 'Group' select "BayXSha(RIL by sib-mating)" +* For 'Type' select "arabidopsis seeds" +* For 'Dataset' select "Arabidopsis BayXShaXRIL_expr_reg _ATH1" +* Leave 'Get Any' blank +* Enter "*" for "Combined" +* Click "Search" +* On the search results page, click on "AT1G01010" +* Expand the "Mapping Tools" section +* For 'Chromosome' select "All" +* For 'Minor Allele ≥' enter "0.05" +* For 'Use LOCO' select "Yes" +* Ignore covariates +* Click "Compute" + +### Expected + +The system would compute the maps and display the mapping diagram(s) and data. + +### Actual + +The computation fails with: + +``` + GeneNetwork 2.11-rc2 https://staging.genenetwork.org/loading ( 6:50PM UTC Jul 03, 2024) +Traceback (most recent call last): + File "/gnu/store/jsvqai0gz6fn40k7kx3r12yq4hzfini6-profile/lib/python3.10/site-packages/flask/app.py", line 1523, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/jsvqai0gz6fn40k7kx3r12yq4hzfini6-profile/lib/python3.10/site-packages/flask/app.py", line 1509, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args) + File "/gnu/store/jsvqai0gz6fn40k7kx3r12yq4hzfini6-profile/lib/python3.10/site-packages/gn2/wqflask/views.py", line 812, in loading_page + for sample in samples: +TypeError: 'NoneType' object is not iterable +``` + +### Updates + +This is likely just because the genotype file doesn't exist in the necessary format (BIMBAM). 
We probably need to convert the R/qtl2 genotypes to BIMBAM. diff --git a/issues/genenetwork2/refresh-token-failure.gmi b/issues/genenetwork2/refresh-token-failure.gmi new file mode 100644 index 0000000..dd33341 --- /dev/null +++ b/issues/genenetwork2/refresh-token-failure.gmi @@ -0,0 +1,108 @@ +# Refresh Token Failure + +## Tags + +* status: open +* priority: high +* type: bug +* assigned: fredm, zsloan, zachs +* keywords: gn2, genenetwork2 + +## Description + +* Go to https://genenetwork.org +* Click "Sign in" and sign in to the application +* Wait 15 minutes +* Close the entire browser +* Open the browser and go to https://genenetwork.org +* Observe the "ERROR" message at the "Collections" link's badge + +The expectation is that the Collections badge would list the number of collections the user has, rather than the error message. + +The logs show an 'invalid_client' error: + +``` +2025-01-08 20:48:56 raise self.oauth_error_class( +2025-01-08 20:48:56 authlib.integrations.base_client.errors.OAuthError: invalid_client: +2025-01-08 20:48:56 ERROR:gn2.wqflask:Error loading number of collections +2025-01-08 20:48:56 Traceback (most recent call last): +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/gn2/wqflask/__init__.py", +line 55, in numcoll +2025-01-08 20:48:56 return num_collections() +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/gn2/wqflask/oauth2/collect +ions.py", line 13, in num_collections +2025-01-08 20:48:56 all_collections = all_collections + oauth2_get( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/gn2/wqflask/oauth2/client.
+py", line 168, in oauth2_get +2025-01-08 20:48:56 resp = oauth2_client().get( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/sessions.py", lin +e 600, in get +2025-01-08 20:48:56 return self.request("GET", url, **kwargs) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/integrations/reque +sts_client/oauth2_session.py", line 109, in request +2025-01-08 20:48:56 return super(OAuth2Session, self).request( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/sessions.py", lin +e 573, in request +2025-01-08 20:48:56 prep = self.prepare_request(req) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/sessions.py", lin +e 484, in prepare_request +2025-01-08 20:48:56 p.prepare( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/models.py", line +372, in prepare +2025-01-08 20:48:56 self.prepare_auth(auth, url) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/requests/models.py", line +603, in prepare_auth +2025-01-08 20:48:56 r = auth(self) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/integrations/reque +sts_client/oauth2_session.py", line 24, in __call__ +2025-01-08 20:48:56 self.ensure_active_token() +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/integrations/reque +sts_client/oauth2_session.py", line 20, in ensure_active_token +2025-01-08 20:48:56 if self.client and not self.client.ensure_active_token(self.token): +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + 
line 262, in ensure_active_token +2025-01-08 20:48:56 self.refresh_token(url, refresh_token=refresh_token) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + line 252, in refresh_token +2025-01-08 20:48:56 return self._refresh_token( +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + line 373, in _refresh_token +2025-01-08 20:48:56 token = self.parse_response_token(resp) +2025-01-08 20:48:56 File "/gnu/store/3n1cl5cxal3qk7p9q363qgm2ag45a177-profile/lib/python3.10/site-packages/authlib/oauth2/client.py", + line 340, in parse_response_token +2025-01-08 20:48:56 raise self.oauth_error_class( +2025-01-08 20:48:56 authlib.integrations.base_client.errors.OAuthError: invalid_client: +``` + + +### Troubleshooting + +The following commits were done as part of the troubleshooting: + +=> https://github.com/genenetwork/genenetwork2/commit/55da5809d851a3c8bfa13637947b019a2c02cc93 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=d1cada0f0933732eb68b7786fb04ea541d8c51c9 +=> https://github.com/genenetwork/genenetwork2/commit/93dd7f7583af4e0bdd3c7b9c88d375fdc4b40039 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=5fe04ca1545f740cbb91474576891c7fd1dff13a +=> https://github.com/genenetwork/genenetwork2/commit/2031da216f3b62c23dca64eb6d1c533c07dc81f1 +=> https://github.com/genenetwork/genenetwork2/commit/125c436f5310b194c10385ce9d81135518ac0adf +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=758e6f0fbf6af4af5b94b9aa5a9264c31f050153 +=> https://github.com/genenetwork/genenetwork2/commit/8bf483a3ab23ebf25d73380e78271c368ff06b2d +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=f1ee97a17e670b12112d48bea8969e2ee162f808 +=> https://github.com/genenetwork/genenetwork2/commit/de01f83090184fc56dce2f9887d2dc910edc60fe +=> 
https://github.com/genenetwork/genenetwork2/commit/91017b97ee346e73bed9b77e3f3f72daa4acbacd +=> https://github.com/genenetwork/genenetwork2/commit/7e6bfe48167c70d26e27b043eb567608bc1fda84 +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=1f71a1e78af87266e7a4170ace8860111a1569d6 +=> https://github.com/genenetwork/genenetwork2/commit/9bdc8ca0b17739c1df9dc504f8cd978296b987dd +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=02a9a99e7e3c308157f7d740a244876ab4196337 +=> https://github.com/genenetwork/genenetwork2/commit/236a48835dc6557ba0ece6aef6014f496ddb163e +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=f928be361d2e331d72448416300c331e47341807 +=> https://github.com/genenetwork/genenetwork2/commit/5fb56c51ad4eaff13a7e24b6022dffb7d82aa41d +=> https://github.com/genenetwork/genenetwork2/commit/c6c9ef71718d650f9c19ae459d6d4e25e72de00a +=> https://github.com/genenetwork/genenetwork2/commit/dc606f39fb4aad74004959a6a15e481fa74d52ff +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=4ab597b734968916af5bae6332756af8168783b3 +=> https://github.com/genenetwork/genenetwork2/commit/854639bd46293b6791c629591fd934d1f34038ac +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=7e0083555150d151e566cebed4bd82d69e347eb6 +=> https://github.com/genenetwork/genenetwork2/commit/c4508901027a2d3ea98e1e9b3f8767a455cad02f +=> https://git.genenetwork.org/guix-bioinformatics/commit/?id=955e4ce9370be9811262d7c73fa5398385cc04d8 + + diff --git a/issues/genenetwork2/session_expiry_oauth_error.png b/issues/genenetwork2/session_expiry_oauth_error.png Binary files differnew file mode 100644 index 0000000..34e2dda --- /dev/null +++ b/issues/genenetwork2/session_expiry_oauth_error.png diff --git a/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip b/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip Binary files differnew file mode 100644 index 0000000..7681b88 --- /dev/null +++ 
b/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip diff --git a/issues/genenetwork3/broken-aliases.gmi b/issues/genenetwork3/broken-aliases.gmi new file mode 100644 index 0000000..5735a1c --- /dev/null +++ b/issues/genenetwork3/broken-aliases.gmi @@ -0,0 +1,27 @@ +# Broken Aliases + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: fredm +* interested: pjotrp +* keywords: aliases, aliases server + + +## Repository + +=> https://github.com/genenetwork/gn3 + +## Bug Report + +### Actual + +* Go to https://genenetwork.org/gn3/gene/aliases2/Shh,Brca2 +* Note that an exception is raised, with a "404 Not Found" message + +### Expected + +* We expect a list of aliases to be returned for the given symbols, as is done in https://fallback.genenetwork.org/gn3/gene/aliases2/Shh,Brca2 + diff --git a/issues/genenetwork3/check-for-mandatory-settings.gmi b/issues/genenetwork3/check-for-mandatory-settings.gmi new file mode 100644 index 0000000..16a2f8a --- /dev/null +++ b/issues/genenetwork3/check-for-mandatory-settings.gmi @@ -0,0 +1,40 @@ +# Check for Mandatory Settings + +## Tags + +* status: open +* priority: high +* type: bug, improvement +* interested: fredm, bonz +* assigned: jnduli, rookie101 +* keywords: GN3, gn3, genenetwork3, settings, config, configs, configurations + +## Explanation + +Giving defaults to some important settings leads to situations where the correct configuration is never actually set up, leading at best to outright failure and, at worst, to subtle failures that can be difficult to debug: e.g. when a default URI to a server points to an active domain, just not the correct one. + +We want to make such (arguably, sensitive) configurations explicit, and avoid giving them defaults. We want to check that they are set up before allowing the application to run, and fail loudly and obnoxiously if they are not provided.
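The intent here (no silent defaults, loud failure on missing settings, ENVVARs loaded explicitly and last) could look something like the following minimal sketch. All helper and setting names are hypothetical, not gn3's actual configuration API:

```python
# Illustrative sketch ONLY: helper and setting names below are assumptions
# for this example, not gn3's actual configuration code.
import os
import sys

MANDATORY_SETTINGS = ("AUTH_SERVER_URL", "SECRET_KEY")  # example names


def check_mandatory_settings(config: dict) -> None:
    """Fail loudly at startup if any mandatory setting is missing or empty."""
    missing = [name for name in MANDATORY_SETTINGS if not config.get(name)]
    if missing:
        sys.exit(f"ERROR: undefined mandatory settings: {', '.join(missing)}")


def override_from_environment(config: dict, prefix: str = "GNAUTH_") -> dict:
    """Apply prefixed ENVVARs last, so they actually override settings."""
    overrides = {key[len(prefix):]: value
                 for key, value in os.environ.items()
                 if key.startswith(prefix)}
    return {**config, **overrides}
```

Called at application start, `check_mandatory_settings` replaces a silent default with an immediate, obvious failure, and `override_from_environment` keeps the ENVVAR override both explicit and last.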
+ +Examples of configuration variables that should be checked for: + +* All external URIs (external to the app/repo under consideration) +* All secrets (secret keys, salts, tokens, etc.) + +We should also eliminate from the defaults: + +* Computed values +* Calls to get values from ENVVARs (`os.environ.get(…)` calls) + +### Note on ENVVARs + +Environment variables should be used for overriding values under specific conditions; loading them should therefore be both explicit and the last thing done, to ensure they actually override settings. + +=> https://git.genenetwork.org/gn-auth/tree/gn_auth/__init__.py?id=3a276642bea934f0a7ef8f581d8639e617357a2a#n70 See this example for a possible way of allowing ENVVARs to override settings. + +The example above could be improved by checking for environment variables starting with a specific prefix, e.g. the envvar `GNAUTH_SECRET_KEY` would override the `SECRET_KEY` configuration. This allows us to override settings without having to change the code. + +## Tasks + +* [ ] Explicitly check configs for ALL external URIs +* [ ] Explicitly check configs for ALL secrets +* [ ] Explicitly load ENVVARs last to override settings diff --git a/issues/genenetwork3/ctl-maps-error.gmi b/issues/genenetwork3/ctl-maps-error.gmi new file mode 100644 index 0000000..6726357 --- /dev/null +++ b/issues/genenetwork3/ctl-maps-error.gmi @@ -0,0 +1,46 @@ +# CTL Maps Error + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: alexm, zachs, fredm +* keywords: CTL, CTL Maps, gn3, genenetwork3, genenetwork 3 + +## Description + +Trying to run the CTL Maps feature in the collections page as described in +=> /issues/genenetwork2/broken-collections-feature + +We get an error in the results page of the form: + +``` +{'error': '{\'code\': 1, \'output\': \'Loading required package: MASS\\nLoading required package: parallel\\nLoading required package: qtl\\nThere were 13 warnings (use warnings() to see them)\\nError in xspline(x, y, shape = 0, lwd = 
lwd, border = col, lty = lty, : \\n invalid value specified for graphical parameter "lwd"\\nCalls: ctl.lineplot -> draw.spline -> xspline\\nExecution halted\\n\'}'} +``` + +On the CLI the same error is rendered: +``` +Loading required package: MASS +Loading required package: parallel +Loading required package: qtl +There were 13 warnings (use warnings() to see them) +Error in xspline(x, y, shape = 0, lwd = lwd, border = col, lty = lty, : + invalid value specified for graphical parameter "lwd" +Calls: ctl.lineplot -> draw.spline -> xspline +Execution halted +``` + +On my local development machine, the command run was +``` +Rscript /home/frederick/genenetwork/genenetwork3/scripts/ctl_analysis.R /tmp/01828928-26e6-4cad-bbc8-59fd7a7977de.json +``` + +Here is a zipped version of the json file (follow the link and click download): +=> https://github.com/genenetwork/gn-gemtext-threads/blob/main/issues/genenetwork3/01828928-26e6-4cad-bbc8-59fd7a7977de.json.zip + +After troubleshooting for a while, I suspect +=> https://github.com/genenetwork/genenetwork3/blob/27d9c9d6ef7f37066fc63af3d6585bf18aeec925/scripts/ctl_analysis.R#L79-L80 this is the offending code. + +=> https://cran.r-project.org/web/packages/ctl/ctl.pdf The manual for the ctl library +indicates that our call above might be okay, which might mean something changed in the dependencies that the ctl library uses. 
diff --git a/issues/genenetwork3/generate-heatmaps-failing.gmi b/issues/genenetwork3/generate-heatmaps-failing.gmi new file mode 100644 index 0000000..522dc27 --- /dev/null +++ b/issues/genenetwork3/generate-heatmaps-failing.gmi @@ -0,0 +1,64 @@ +# Generate Heatmaps Failing + +## Tags + +* type: bug +* status: open +* priority: medium +* assigned: fredm, zachs, zsloan +* keywords: genenetwork3, gn3, GN3, heatmaps + +## Reproduce + +* Go to https://genenetwork.org/ +* Under the "Select and Search" menu, enter "synap*" for the "Get Any" field +* Click "Search" +* In the search results page, select the first 10 traits +* Click "Add" +* Under "Create a new collection" enter the name "newcoll" and click "Create collection" +* In the collections page that shows up, click "Select All" once +* Ensure all the traits are selected +* Click "Generate Heatmap" and wait +* Note how the system fails silently with no heatmap presented + +### Notes + +On https://gn2-fred.genenetwork.org the heatmap generation fails with a note ("ERROR: undefined"). In the logs, I see "Module 'scipy' has no attribute 'array'", which seems to be due to scipy's removal of its deprecated numpy-alias functions. +=> https://github.com/MaartenGr/BERTopic/issues/1791 +=> https://github.com/scipy/scipy/issues/19972 + +This issue should not be present with python-plotly@5.20.0, but since guix-bioinformatics pins the guix version to `b0b988c41c9e0e591274495a1b2d6f27fcdae15a`, we are not able to pull in newer versions of packages from guix. 
+ + +### Update 2025-04-08T10:59CDT + +Got the following error when I ran the background command manually: + +``` +$ export RUST_BACKTRACE=full +$ /gnu/store/dp4zq4xiap6rp7h6vslwl1n52bd8gnwm-profile/bin/qtlreaper --geno /home/frederick/genotype_files/genotype/genotype/BXD.geno --n_permutations 1000 --traits /tmp/traits_test_file_n2E7V06Cx7.txt --main_output /tmp/qtlreaper/main_output_NGVW4sfYha.txt --permu_output /tmp/qtlreaper/permu_output_MJnzLbrsrC.txt +thread 'main' panicked at src/regression.rs:216:25: +index out of bounds: the len is 20 but the index is 20 +stack backtrace: + 0: 0x61399d77d46d - <unknown> + 1: 0x61399d7b5e13 - <unknown> + 2: 0x61399d78b649 - <unknown> + 3: 0x61399d78f26f - <unknown> + 4: 0x61399d78ee98 - <unknown> + 5: 0x61399d78f815 - <unknown> + 6: 0x61399d77d859 - <unknown> + 7: 0x61399d77d679 - <unknown> + 8: 0x61399d78f3f4 - <unknown> + 9: 0x61399d6f4063 - <unknown> + 10: 0x61399d6f41f7 - <unknown> + 11: 0x61399d708f18 - <unknown> + 12: 0x61399d6f6e4e - <unknown> + 13: 0x61399d6f9e93 - <unknown> + 14: 0x61399d6f9e89 - <unknown> + 15: 0x61399d78e505 - <unknown> + 16: 0x61399d6f8d55 - <unknown> + 17: 0x75ee2b945bf7 - __libc_start_call_main + 18: 0x75ee2b945cac - __libc_start_main@GLIBC_2.2.5 + 19: 0x61399d6f4861 - <unknown> + 20: 0x0 - <unknown> +``` diff --git a/issues/genenetwork3/rqtl2-mapping-error.gmi b/issues/genenetwork3/rqtl2-mapping-error.gmi new file mode 100644 index 0000000..480c7c6 --- /dev/null +++ b/issues/genenetwork3/rqtl2-mapping-error.gmi @@ -0,0 +1,42 @@ +# R/qtl2 Maps Error + +## Tags + +* type: bug +* status: open +* priority: high +* assigned: alexm, zachs, fredm +* keywords: R/qtl2, R/qtl2 Maps, gn3, genetwork3, genenetwork 3 + +## Reproduce + +* Go to https://genenetwork.org/ +* In the "Get Any" field, enter "synap*" and press the "Enter" key +* In the search results, click on the "1435464_at" trait +* Expand the "Mapping Tools" accordion section +* Select the "R/qtl2" option +* Click "Compute" +* In the "Computing 
the Maps" page that results, click on "Display System Log" + +### Observed + +A traceback is observed, with an error of the following form: + +``` +⋮ +FileNotFoundError: [Errno 2] No such file or directory: '/opt/gn/tmp/gn3-tmpdir/JL9PvKm3OyKk.txt' +``` + +### Expected + +The mapping runs successfully and the results are presented in the form of a mapping chart/graph and a table of values. + +### Debug Notes + +The directory "/opt/gn/tmp/gn3-tmpdir/" exists, and is actually used by other mappings (i.e. the "R/qtl" and "Pair Scan" mappings) successfully. + +This might imply a code issue: perhaps +* a path is hardcoded, or +* the wrong path value is passed + +The same error occurs on https://cd.genenetwork.org but does not seem to prevent CD from running the mapping to completion. Maybe something is missing on production — what, though? diff --git a/issues/genotype_search_bug.gmi b/issues/genotype_search_bug.gmi new file mode 100644 index 0000000..0f05f4e --- /dev/null +++ b/issues/genotype_search_bug.gmi @@ -0,0 +1,13 @@ +# The * Search for Genotypes Not Working + +## Tags + +* type: bug +* priority: medium +* status: closed +* assigned: zsloan +* keywords: bug, search + +## Description + +Currently, * searches for genotypes return no results, even when data exists. 
diff --git a/issues/global-search-results.gmi b/issues/global-search-results.gmi deleted file mode 100644 index 9cd773a..0000000 --- a/issues/global-search-results.gmi +++ /dev/null @@ -1,32 +0,0 @@ -# Global search does not return results - -## Tags - -* priority: critical -* type: bug -* assigned: zsloan, pjotrp -* status: unclear -* keywords: global search, from github - -## Description - -=> https://github.com/genenetwork/genenetwork2/issues/629 From GitHub - -> Try a search for Brca2 -> -> I am trying to add an example to this storyboard: -> -> => https://github.com/genenetwork/gn-docs/blob/master/story-boards/starting-from-known-gene/starting-from-known-gene.md#use-the-search-page -> -> -> Interestingly luna does no better: -> -> => http://luna.genenetwork.org/gsearch?type=gene&terms=brca2 - -@pjotr @zsloan, it seems to me this might be fixed, but please have a look and fix it in case it is not - -## Resolution - -With the new xapian search, this issue is no more. - -* closed diff --git a/issues/global-search-unhandled-error.gmi b/issues/global-search-unhandled-error.gmi index b2f6ba8..7626280 100644 --- a/issues/global-search-unhandled-error.gmi +++ b/issues/global-search-unhandled-error.gmi @@ -5,7 +5,7 @@ * assigned: aruni, fredm * priority: high * type: bug -* status: open +* status: closed * keywords: global search, gn2, genenetwork2 ## Description @@ -15,3 +15,7 @@ assume the request will always be successful. This is not always the case, as ca => https://test3.genenetwork.org/gsearch?type=gene&terms=Priscilla here (as of 2024-03-04T11:25+03:00UTC). Possible errors should be checked for and handled before attempting to read and/or process expected data. + +## Closing Comments + +This issue is closed as obsoleted. The issue is really old (>=7 months). Closing it for now. To be reopened if the issue happens again. 
diff --git a/issues/gn-auth/email_verification.gmi b/issues/gn-auth/email_verification.gmi index 8147bb5..fff3d54 100644 --- a/issues/gn-auth/email_verification.gmi +++ b/issues/gn-auth/email_verification.gmi @@ -2,7 +2,7 @@ ## Tags -* status: open +* status: closed, completed * priority: medium * type: enhancement * assigned: fredm, zsloan @@ -17,3 +17,5 @@ SMTP_PORT = 25 (not 587, which is what we first tried) SMTP_TIMEOUT = 200 # seconds Not sure about username/password yet. We tried UNKNOWN/UNKNOWN and my own (Zach's) username/password + +Note that this host is only visible on the internal network of UTHSC. It won't work for tux02. diff --git a/issues/gn-auth/example-privileges-script.gmi b/issues/gn-auth/example-privileges-script.gmi new file mode 100644 index 0000000..afda1a1 --- /dev/null +++ b/issues/gn-auth/example-privileges-script.gmi @@ -0,0 +1,36 @@ +# Example Python script for setting privileges for user/group + +## Description + +This is just an example of a python script for setting user/group privileges, for potential future reference + +Before running this script, stop the crontab job that automatically sets unlinked resource privileges + +```python +import uuid +import sqlite3 + +group_id = '0510dc91-0eb6-4d9d-97e5-405acc84ba2b' +resource_id = 'e5cc773d-ca28-44e2-b2a7-1c2901794238' + +publishxrefs = 
('10955','10957','10960','10961','10964','10966','10969','10970','10973','10975','10978','10979','10982','10984','10987','10988','12486','12487','12489','12490','12491','12492','12493','12494','12495','12496','12497','12498','12499','12500','12501','12502','12503','12504','12505','12506','12507','12508','12509','12510','12511','12512','12513','12514','12515','12516','12517','12518','12519','12520','12521','12522','12523','12524','12525','12526','12527','12528','12529','12530','12531','12532','12533','12534','12535','12536','12537','12538','12539','12540','12541','12542','12543','12544','12545','12546','12547','12548','12549','12550','12551','12566','12567','12568','12569','12574','12575','12576','12577','12578','12579','12580','12621','12735','12737','12741','12742','12743','12744','12745','12780','12781','12782','12783','12784','12785','12786','12787','12788','12789','12790','12791','12792','12793','12794','12795','12796','12797','12798','12799','12800','12801','12803','12804','12805','12806','12807','12808','12809','12810','12812','12813','12816','12817','12961','12962','12963','12964','12965','12966','12967','12970','13029','14803','14804','14805','14806','15572','15573','16197','16375','17329','17330','17331','17332','17333','17334','17335','17336','17337','17338','17339','17340','17341','17342') + +# I generated these separatedly with uuid.uuid4(); I probably could have just done this in the script itself, but wanted to make sure they stayed the same +data_link_ids = ('3041366d-1ffd-45fb-9617-043772b285c8', 'da41fc30-3cd6-4b41-83b5-8fedc4ccd65f', '364a4010-e3fe-470f-a8c9-2a9fd359a4e3', '4e878c0a-cc92-4b21-8152-310266291967', 'ab50a999-e9bb-4bb6-91c0-9828b804156e', 'd50d30e9-15f9-4578-8b48-2bcb0d7a8afb', 'd42d2ef5-278f-4b5e-ae57-10f49f48c2e9', '78c022d7-390b-4688-96c6-c1afadd45877', '17fca9ae-8e71-4c55-b035-15d04f96d936', '4f9893de-fccf-4d6a-845d-df2f83e4d06c', '8a660b03-786a-4143-9fb3-9d00e888f3a2', '3965417a-e47a-47c8-81f6-991eef8c4152', 
'e27707f7-5832-4e3f-9391-849e964bbaf6', 'bf9f6ff0-a131-46ef-8a2e-c37d8b66f992', '1ee744c4-95e1-4a66-958c-e785dc937563', '0fa79294-bbdc-4701-861d-9bb91ea72588', '38665214-7cdd-4b01-81dc-d1b78e63a0b0', '82a237df-96ce-404e-b052-8dbe45e793ee', 'ec4c1848-d326-462b-9c0d-f5e5c76e92f6', '46bee64b-8ce7-4910-80ec-211063725b1a', '7f489875-38b6-4cff-a05e-f11a7957b9b8', 'f39744a1-d673-406f-a2f1-c45082bb1975', '5f53a9e9-e40c-4a01-bf9d-430d7c2fd5ef', '1f0a4f2d-cd1c-41e5-a185-2ea2b2b05cd3', 'e282651c-7dc3-40e9-bb52-14e73c3a4ef7', '3c492e6d-e807-427b-acca-44afa4862894', '38e0df6c-3f44-4acb-9965-f0d3f0278150', '35e5ae63-3a32-49ac-93ed-b39d02ab5f5c', '0e6bfa4a-4fee-4b54-80c6-209f9b0ecd00', 'eb85e71a-8b4b-4f3f-9168-59b4ebc090a1', '3eb0325c-4dce-481e-bce7-46c37031da76', '7bc5ce49-4150-4d87-bfbf-d3a1cd20ad67', '03c0cba7-8712-4a27-9b79-e38818805b1f', '07d787ec-e0f9-4b7c-b368-d1f56ce030dc', '51d9e601-31c7-4643-b896-79d90bdc4105', '3cee3754-2822-4f0a-87ad-96bdfe2f0232', 'a7e9eb54-63bd-4ca9-a1f8-1aeac02a76db', '3ff132e5-7fb6-4763-943e-1efbe5f8000e', 'c685f0c9-084d-44d2-882e-ce66cdccef6d', 'ea062e07-1f59-4312-bfd9-6560e652c878', '75d33621-b5a4-447d-a094-7480d1d57a47', 'bb3dbd16-0c73-47d8-8e21-f095d3398b61', '0211177b-a92c-4215-a622-0cba5e8e2866', 'e2139b64-e74a-4263-9785-314e73b102df', '0426f12b-c223-487b-8ab7-baea5995c480', '4a467a72-174c-4ec7-9557-859656ad2c71', '38ab978e-e78f-4c0a-8af3-449b636fe5e6', 'a45c8d42-14d3-464d-8395-8a574148da78', 'e4171cc1-4a03-4311-a287-cee1b8084227', '75d70308-6f1a-49e4-9199-97ec8f60778e', 'efb5c834-b88a-4ee9-b09d-91913fddb546', '23866a00-a729-4ba9-af22-ee83ec164d34', '3feb1154-0613-464b-b758-aad308550a74', '7019d0f1-a590-46ce-a30e-4c21541b6ea8', '6e803182-71d2-4427-a5df-ad84651e5d11', 'fe1bf3f6-818b-4fae-9880-8ae2c1bdcff6', '66d480f7-da41-49ed-a222-8724b493313a', 'c908d2a3-8378-4574-83be-3bf8bdeff5fb', '96b36360-7258-43ab-bdda-23e93f15b0ac', 'daf90aca-6ee6-4c3c-9a60-1e7ae2e29cd2', '43800347-1fe1-40f7-9013-408f0b0740e9', 'e9350a78-a62f-4a08-8881-e6e51450d120', 
'bda9a217-d605-4a18-9c3f-5139679ae413', 'cbd8f79a-4992-43c9-8391-994e221b73e1', 'c6b64d90-63ff-482d-b205-f58f3cf656df', '3ecbf267-3655-42a6-a8f9-2751439efb27', '808ae753-a255-43a6-96d4-0ed02b14aefe', '1a5424df-49b3-4274-8281-a1eed838ffda', '89e6d278-e643-43a2-8a61-746cbf446109', 'b4940ece-80a0-4382-ba57-eaad1d35e83e', 'f46cd643-fccb-4037-b642-9a4a329e84e2', '497a235c-4253-4e94-a69c-4b2f200976dd', '02aa8e3a-f9ac-459b-8e35-7081f2849f48', 'da5018e2-38af-415a-ad43-8caf8d82290d', '574ee482-f534-475e-9e7a-0a14e05f4495', 'b90b3a02-fa8d-4393-9dbb-087224a80b40', 'd68370ec-f569-42f3-9c07-a3118aa73ad5', '4b6b099b-3a7c-46c2-a2fc-92c01463b698', 'c9f5608f-3301-4835-b6dc-b1891fe81c36', 'eead972c-0fc4-4c5e-b1ad-63db4d1e9409', 'd8b295eb-6d07-4abe-8b8a-8cfef066a32e', 'a89f3944-be64-42d0-aa66-d2501021760d', '02f42124-bc38-4a14-9400-bbc8e8bf41b7', 'abbcb901-da42-4ef1-bc2c-55b95d584461', 'e28b0cef-eddb-41f2-9479-722365c0b2e0', '9135c304-1dd3-4eb5-82d4-91a86e39068a', '0bbd5f1d-eef3-4c35-84ab-484165a4240d', '08ad9a25-b20d-4ad8-a5e0-a886edc4a7aa', '7e05bdf8-51f5-49dc-9ff6-fbbc6aa20c9f', 'c82d4943-dc6f-4ec8-b76f-1309290183fe', '6a8d76bc-156b-4925-823c-b4585a847efc', '2604e9a8-a4ee-49be-a754-126b1705516e', '8c32b69b-e796-418d-b254-104a179a84ba', '532dca31-c38e-4b77-a84c-563407e9ae00', '954cacda-179e-42a9-8c1f-987e6fae1079', 'bcfced8a-bd50-48e6-9edb-4776a1e95bf5', '66308324-1747-46df-8ddf-41e5bff1cd1a', 'f797e23c-7cb6-4869-97f5-3a79b685c6a3', '0869bb57-0133-4e57-9655-2b6eb1906f5e', 'fc0dddfa-e683-4a8d-9f57-82fb368f8a84', '35b7ffc1-6782-4c85-9bf8-d51629cab2d0', '232850b6-5a53-45e0-8668-7773b9cb39c2', 'af20291c-2be6-40e1-9576-b78df5d56774', 'f52f5c1a-1f8a-4b8a-8e00-fc2bdc6edc5b', '90819230-f372-4e48-96fc-6fb97199fa07', 'b31aefbf-fb67-49dc-b357-f8f0cd76cea9', '5d695f24-674a-4dc5-9e02-7817b77ab06b', '064d5972-f636-4771-95fe-3f6260fd550f', 'c2254f71-98dc-4303-bc26-9b9640582be1', '6eac9495-a366-4e65-90d2-d63472937925', '119398e3-b8cc-4ae5-addb-ec13db9834fa', '6cce7b35-fe2a-4348-9e42-5179ea9f42f1', 
'65940929-c9fc-47e9-b1cf-c9c9688f7871', '73ffdb1a-f70d-4e8e-88b7-0e22cfd1916e', 'c1b25581-7d28-4535-bcdc-44dc3bc7e438', '6e03a5f7-f200-439a-a465-97056d3c9f71', '4d270b71-2e06-4cfb-a60d-258ccbc7860a', '8b82e29f-a901-454f-a9ad-2f96be9d6c44', '7d699b76-f554-44db-9c68-6ff985cd6388', '3417b2dc-a88a-4cb6-a446-9e90063731f9', '18760f59-4b50-48d5-9814-8117490ab972', '4aaebf37-9529-4365-bdb8-dd53b0ac2499', '95ecdf43-12a5-4b3c-993a-ff03b58cee93', '2b5dd4e6-2310-417e-82bb-b16e96c7346b', '92ee883a-646d-44dd-b2c6-1bffb7b0d2cb', '979038e4-9392-4836-ad04-f125cf19eafa', '1220629d-000f-4508-8a41-3706eebeb812', '42abca44-8eb3-4aa7-adae-16afc211dff4', '82fe9559-718e-4424-9465-033204e1ec03', '8353fe08-e6c8-4f87-b0d8-412ab4a41d19', '1c6bebcf-c125-42a3-9d5b-4fae3113b62b', 'ba54b2ba-fee3-4f1d-a903-18edc7c694bd', '0ea0d40d-3204-4b9b-bae2-54355dce2b5c', '5ee4857c-00b4-46d6-880c-44dbae021b45', '2caa4c03-78ce-456d-8e20-edb531bdd45a', 'e2536a5e-357d-4f6d-a764-ac85a40a2f3f', 'e6341996-80bb-42f9-8842-92062680e957', '3612e03e-430d-4da3-ac87-93a310a3d780', '88c600d2-cefd-4a99-a904-bf2260554ac6', 'f1a6af16-2525-4650-b729-cbec60ad276c', '4b854252-9e87-4d7c-99d9-84ae9297d26e', 'be580989-3ccd-48bd-8c85-a750a800afbd', '5fd675fe-e765-4bf0-8e0f-8f81107a0bb8', 'cf852032-6399-4bf8-a8e7-474c84030430', 'eef27f8a-32d2-4add-a018-ff2d34208a11', '3aca3b1d-4589-4b4c-90de-588fd43fe835', 'd6187213-5a39-4089-ac50-eb144be2a3a5', '5bf60cda-b6b9-4992-91ac-c022e523202a', '4c4395ca-2f2e-4a85-93df-37d2c7f3d1d6', 'b8f9d837-2bd6-447c-9ad8-f581f84f36c1', '029a88bb-3850-4e85-87ab-8ecb3ad59538', '39ead890-0e1a-43df-9bbc-459a3ea0a016', '4b559ad2-c4d8-4763-bc08-90cb63fc79d0', '8361884a-248b-4dac-a9f9-d56f31ab477e', 'd79e2e00-9ea6-4d43-addc-3b1955bc7e5f', '4c0a35ac-c549-4c1a-9fc8-a2e93ba1c632', '50f558d0-c7b1-4204-8ebb-5855e7588998', 'be061746-1b34-4c04-a752-ab5c8d78fdef', 'f8edfb50-c572-4025-87c6-b34e88d8fb90', '0a799ff1-df2c-4c85-9b7e-4fe4885ab5cd', 'db373aa1-8ab9-4257-8d48-11dc92448344', '1e2b9de8-74a4-446a-970e-b47c662760b2', 
'ac09ffdf-9cb5-49be-8f52-b681598453f6', 'ae4a55af-a1bb-4698-b2e7-ffbed8760635', '7989ff1f-a9da-439a-bb8b-14482b15dd2e') + +# delete_query deletes from the AutoAdminGroup +delete_query = 'delete from linked_phenotype_data where group_id="5ea09f67-5426-4b66-9ea2-12bdd78350e8" and SpeciesId="1" and InbredSetId="1" and PublishFreezeId="1" and PublishXRefId=?' +resource_query = "insert into phenotype_resources values ('e5cc773d-ca28-44e2-b2a7-1c2901794238', ?)" +link_query = 'insert into linked_phenotype_data (data_link_id, group_id, SpeciesId, InbredSetId, PublishFreezeId, dataset_name, dataset_fullname, dataset_shortname, PublishXRefId) values (?,?,?,?,?,?,?,?,?)' + +db_path = '/home/gn2/auth.db' +conn = sqlite3.connect(db_path) +cursor = conn.cursor() + +the_data = tuple((dlid, group_id, 1, 1, 1, 'BXDPublish', 'BXD Phenotypes', 'BXD Publish', pxrid) for (dlid, pxrid) in zip(data_link_ids, publishxrefs)) + +cursor.executemany(delete_query, tuple((item,) for item in publishxrefs)) +cursor.executemany(link_query, the_data) +cursor.executemany(resource_query, tuple((item,) for item in data_link_ids)) +conn.commit() +``` diff --git a/issues/gn-auth/feature-request-create-test-accounts.gmi b/issues/gn-auth/feature-request-create-test-accounts.gmi new file mode 100644 index 0000000..9e8aa45 --- /dev/null +++ b/issues/gn-auth/feature-request-create-test-accounts.gmi @@ -0,0 +1,51 @@ +# Feature Request: Create Test Accounts + +## Tags + +* assigned: fredm, alex +* status: open +* type: feature request, feature-request +* priority: medium +* keywords: gn-auth, auth, test accounts + +## Description + +From the requests on Matrix: + +@alexm +``` +fredmanglis +: Can we create a generic, verified email for CD to make it easier for people to test our services that requires login? +``` + +and from @pjotrp + +``` +yes, please. Let it expire after a few weeks, or something, if possible. So we can hand out test accounts. 
+``` + +We thus want to have a feature that allows the system administrator, or some other user with the appropriate privileges, to create a bunch of test accounts that have the following properties: + +* The accounts are pre-verified +* The accounts are temporary and are deleted after a set amount of time + +This feature will need a corresponding UI, say on GN2, to enable users with the appropriate privileges to create the accounts easily. + +### Implementation Considerations + +Only system-admin level users will be able to create the test accounts. + +We'll probably need to track the plain-text passwords for these accounts. + +Information to collect might include: +* Start of test period (automatic on test account creation: mandatory) +* End of test period (entered at creation time: mandatory) +* A pattern of sorts to follow when creating the accounts — this brings up the question, is there a specific domain (e.g. …@uthsc.edu, …@genenetwork.org etc.) that these test accounts should use? +* Extra details on event/conference necessitating creation of the test account(s) (optional) + + +Interactions with the rest of the system that we need to consider and handle include: +* Assign public-read for all public data: mostly easy. +* Forgot Password: If such users request a password change, what happens? Password changes require emails to be sent out with a time-sensitive token. The emails in the test accounts are not meant to be actual existing emails and thus cannot reliably receive such emails. This needs to be considered. Probably just prevent these users from changing their passwords. +* What group to assign to these test accounts? I'm thinking probably a new group that is also temporary - deleted when users are deleted. +* What happens to any data uploaded by these accounts? They should probably not upload data meant to be permanent. All their data might need to be deleted along with the temporary accounts. 
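The account properties and time-bound lifecycle requested here could be modelled roughly as below. This is a sketch under assumed names only; the real schema, email pattern, and deletion mechanism are exactly the open questions listed in the considerations:

```python
# Rough sketch only: field names, the email pattern, and the purge mechanism
# are assumptions for illustration, not gn-auth's actual design.
import uuid
from datetime import datetime, timedelta


def create_test_accounts(count: int, days_valid: int,
                         domain: str = "genenetwork.org") -> list[dict]:
    """Create pre-verified accounts that expire after `days_valid` days."""
    start = datetime.now()
    return [{"user_id": str(uuid.uuid4()),
             "email": f"test-user-{idx:02d}@{domain}",  # assumed pattern
             "verified": True,  # pre-verified: no email round-trip needed
             "test_period_start": start,
             "test_period_end": start + timedelta(days=days_valid)}
            for idx in range(count)]


def purge_expired(accounts: list[dict], now: datetime) -> list[dict]:
    """Keep only accounts whose test period is still running (cron-job style)."""
    return [acct for acct in accounts if acct["test_period_end"] > now]
```

A periodic job calling something like `purge_expired` (together with the temporary group and any uploaded data) would implement the "deleted after a set amount of time" property.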
diff --git a/issues/gn-auth/fix-refresh-token.gmi b/issues/gn-auth/fix-refresh-token.gmi new file mode 100644 index 0000000..1a6a825 --- /dev/null +++ b/issues/gn-auth/fix-refresh-token.gmi @@ -0,0 +1,53 @@ +# Fix Refresh Token + +## Tags + +* status: open +* priority: high +* assigned: fredm +* type: feature-request, bug +* keywords: gn-auth, token, refresh token, jwt + +## Description + +The way we currently provide the refresh token is wrong and complicated, and +leads to subtle bugs in the clients. + +The refresh tokens should be sent back together with the access token in the +same response, with the following important considerations: + +* The access token is sent back as the body of the response +* The refresh token is sent back as a httpOnly cookie +* The refresh token should be opaque to the client — if it is a JWT, encrypt it + +### Server-Side Changes + +The following changes will be necessary at the generation of the access token: + +* Generate the refresh token (possibly in the `create_token_response()` function in `gn_auth.auth.authentication.oauth2.grants.JWTBearerGrant`). Put the user ID and expiration in the refresh token. Expiration can be provided as part of the initial request. +* Encrypt the refresh token (maybe use the auth-server's public key for this) +* Save the refresh token to the DB, with a link to the access token ID perhaps? +* Attach the token to the response as a httpOnly cookie + +At the refreshing of the access token, we'll need to: + +* Fetch the refresh token from the cookies +* Decrypt it +* Compare the user ID in the refresh token with that in the access token provided +* Verify the refresh token has not expired +* Check that the refresh token is not revoked (revocation will happen when the user logs out, or on manual sys-admin revocation) +* Generate a new access token +* Do we attach the same refresh token or generate a new one? + +#### Gotchas + +Since there are multiple workers, you could get a flurry of refresh requests using the same refresh token. 
We might need to handle that — maybe save the refresh request to the DB with the ID of the access token used and the new access token, and simply return the same new access token generated by the first successful refresh worker. + +This actually kills two birds with one stone: +* The refresh completes successfully if the refresh token is not expired and the access token is valid +* In case the access token and refresh token are somehow compromised, the system returns the same, possibly expired, access token, rendering the compromise moot. + +### Client-Side Changes + +* Get the refresh token from the cookies rather than from the body +* Maybe: make refreshing the access token unaware of threads/workers diff --git a/issues/gn-auth/implement-redirect-on-login.gmi b/issues/gn-auth/implement-redirect-on-login.gmi new file mode 100644 index 0000000..342b2e6 --- /dev/null +++ b/issues/gn-auth/implement-redirect-on-login.gmi @@ -0,0 +1,22 @@ +# Redirect Users to the Correct URL on Login for GN2 + +## Tags + +* assigned: alexm +* priority: medium +* status: in progress +* keywords: gn-auth, auth, redirect, login, completed, closed, done +* type: feature-request + +## Description + +The goal is to redirect users to the login page for services that require authentication, and then return them to the page they were trying to access before logging in, rather than sending them to the homepage. Additionally, display the message "You are required to log in" on the current page instead of on the homepage. + +## Tasks + +* [x] Redirect users to the login page if they are not logged in. +* [x] Implement a redirect to the correct resource after users log in. 
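These two tasks boil down to carrying the originally requested URL through the login round-trip. A framework-agnostic sketch of that hand-off (function names and the `/login` path are illustrative, not GN2's actual code):

```python
# Sketch of the redirect-on-login hand-off; names are illustrative only.
from urllib.parse import parse_qs, urlencode, urlparse


def login_redirect_url(requested_path: str, login_path: str = "/login") -> str:
    """Send an unauthenticated user to login, remembering where they were."""
    return login_path + "?" + urlencode({"next": requested_path})


def post_login_destination(login_url: str, default: str = "/") -> str:
    """After a successful login, return to `next` instead of the homepage."""
    query = parse_qs(urlparse(login_url).query)
    return query.get("next", [default])[0]
```

On a successful login the handler would redirect to `post_login_destination(...)`, falling back to the homepage only when no `next` value was carried along.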
+ +## Notes +See this PR for the commits that fix this: +=> https://github.com/genenetwork/genenetwork2/pull/875 diff --git a/issues/gn-auth/implement-refresh-token.gmi b/issues/gn-auth/implement-refresh-token.gmi index 6b697eb..0dc63f3 100644 --- a/issues/gn-auth/implement-refresh-token.gmi +++ b/issues/gn-auth/implement-refresh-token.gmi @@ -2,7 +2,7 @@ ## Tags -* status: open +* status: closed, completed, fixed * priority: high * assigned: fredm, bonfacem * type: feature-request, bug diff --git a/issues/gn-auth/new-privilegs-samples-ordering.gmi b/issues/gn-auth/new-privilegs-samples-ordering.gmi new file mode 100644 index 0000000..be9cfe9 --- /dev/null +++ b/issues/gn-auth/new-privilegs-samples-ordering.gmi @@ -0,0 +1,32 @@ +# New Privileges: Samples Ordering + +## Tags + +* status: open +* assigned: fredm +* interested: @zachs, @jnduli, @flisso +* priority: medium +* type: feature-request, feature request +* keywords: gn-auth, auth, privileges, samples, ordering + +## Description + +From the email thread: + +``` +Regarding the order of samples, it can basically be whatever we decide it is. It just needs to stay consistent (like if there are multiple genotype files). It only really affects how it's displayed, and any other genotype files we use for mapping needs to share the same order. +``` + +Since this has nothing to do with the data analysis, this could be considered a system-level privilege. I propose + +``` +system:species:samples:ordering +``` + +or something similar. + +This can be added into some sort of generic GN2 curator role (as opposed to a data curator role). + +This allows us to have users that are "data curators" that we can offload some of the data curation work to (e.g. @flisso, @suheeta etc.). + +We would then restrict the UI and display "curation" to users like @acenteno, @robw and @zachs. This second set of users would thus have both the "data curation" roles and the "UI curation" roles. 
diff --git a/issues/gn-auth/problems-with-roles.gmi b/issues/gn-auth/problems-with-roles.gmi index 46f3c52..2778b61 100644 --- a/issues/gn-auth/problems-with-roles.gmi +++ b/issues/gn-auth/problems-with-roles.gmi @@ -3,9 +3,9 @@ ## Tags * type: bug -* status: open * priority: critical * assigned: fredm, zachs +* status: closed, completed, fixed * keywords: gn-auth, authorisation, authorization, roles, privileges ## Description @@ -29,8 +29,8 @@ The implementation should instead, tie the roles to the specific resource, rathe * [x] migration: Add `resource:role:[create|delete|edit]-role` privileges to `resource-owner` role * [x] migration: Create new `resource_roles` db table linking each resource to roles that can act on it, and the user that created the role * [x] migration: Drop table `group_roles` deleting all data in the table: data here could already have privilege escalation in place -* [ ] Create a new "Roles" section on the "Resource-View" page, or a separate "Resource-Roles" page to handle the management of that resource's roles -* [ ] Ensure user can only assign roles they have created - maybe? +* [x] Create a new "Roles" section on the "Resource-View" page, or a separate "Resource-Roles" page to handle the management of that resource's roles +* [x] Ensure user can only assign roles they have created - maybe? 
### Fixes @@ -39,3 +39,4 @@ The implementation should instead, tie the roles to the specific resource, rathe => https://git.genenetwork.org/gn-auth/commit/?h=handle-role-privilege-escalation&id=5d34332f356164ce539044f538ed74b983fcc706 => https://git.genenetwork.org/gn-auth/commit/?h=handle-role-privilege-escalation&id=f691603a8e7a1700783b2be6f855f30d30f645f1 => https://git.genenetwork.org/gn-auth/commit/?h=handle-role-privilege-escalation&id=2363842cc81132a2592d5cda98e6ebf1305e8482 +=> https://github.com/genenetwork/genenetwork2/commit/a7a8754a57594e5705fea8e5bbea391a09e8f64c diff --git a/issues/gn-auth/registration.gmi b/issues/gn-auth/registration.gmi index 6558a6d..61ea94a 100644 --- a/issues/gn-auth/registration.gmi +++ b/issues/gn-auth/registration.gmi @@ -2,8 +2,11 @@ # Tags +* type: bug * assigned: fredm * priority: critical +* status: closed, completed, fixed +* keywords: gn-auth, auth, authorisation, authentication, registration # Issues diff --git a/issues/gn-auth/resources-duplicates-in-resources-list.gmi b/issues/gn-auth/resources-duplicates-in-resources-list.gmi new file mode 100644 index 0000000..379c1eb --- /dev/null +++ b/issues/gn-auth/resources-duplicates-in-resources-list.gmi @@ -0,0 +1,29 @@ +# Resources: Duplicates in Resources List + +## Tags + +* type: bug +* status: closed +* priority: medium +* assigned: fredm, zachs, zsloan +* keywords: gn-auth, auth, authorisation, resources + +## Reproduce + +* Go to https://genenetwork.org/ +* Sign in to the system +* Click on "Profile" at the top to go to your profile page +* Click on "Resources" on your profile page to see the resources you have access to + +## Expected + +Each resource appears on the list only one time + +## Actual + +Some resources appear more than once on the list + + +## Fix + +=> https://git.genenetwork.org/gn-auth/commit/?id=00f863b3dcb76f5fdca8e139e903e2f7edb861fc diff --git a/issues/send-out-confirmation-emails-on-registration.gmi 
b/issues/gn-auth/send-out-confirmation-emails-on-registration.gmi index c85e26b..e32c7c0 100644 --- a/issues/send-out-confirmation-emails-on-registration.gmi +++ b/issues/gn-auth/send-out-confirmation-emails-on-registration.gmi @@ -2,11 +2,11 @@ ## Tags -* status: open +* status: closed, completed * assigned: fredm * priority: medium -* keywords: email, user registration * type: feature request, feature-request +* keywords: gn-auth, email, user registration, email confirmation ## Description diff --git a/issues/gn-auth/test1-deployment-cant-find-templates.gmi b/issues/gn-auth/test1-deployment-cant-find-templates.gmi index bd2f57e..ca3bfad 100644 --- a/issues/gn-auth/test1-deployment-cant-find-templates.gmi +++ b/issues/gn-auth/test1-deployment-cant-find-templates.gmi @@ -4,7 +4,7 @@ * assigned: fredm, aruni * priority: critical -* status: open +* status: closed, completed, fixed * type: bug * keywords: gn-auth, deployment, test1 diff --git a/issues/gn-guile/Configurations.gmi b/issues/gn-guile/Configurations.gmi new file mode 100644 index 0000000..f1ae06e --- /dev/null +++ b/issues/gn-guile/Configurations.gmi @@ -0,0 +1,60 @@ +# gn-guile Configurations + +## Tags + +* type: bug +* assigned: +* priority: high +* status: open +* keywords: gn-guile, markdown editing +* interested: alexk, bonfacem, fredm, pjotrp + +## Description + +=> https://git.genenetwork.org/gn-guile/ The gn-guile service +is used to enable markdown editing in GeneNetwork. + +There are configurations that are needed to get the system to work as expected: + +* CURRENT_REPO_PATH: The local path to the cloned repository +* CGIT_REPO_PATH: path to the bare repo (according to docs [gn-guile-docs]) + +With these settings, we should be able to make edits. These edits, however, do not get pushed upstream.
+ +Looking at the code +=> https://git.genenetwork.org/gn-guile/tree/web/webserver.scm?id=4623225b0adb0846a4c2e879a33b31884d2e5f05#n212 +we see both the settings above being used, and we can further have a look at +=> https://git.genenetwork.org/gn-guile/tree/web/view/markdown.scm?id=4623225b0adb0846a4c2e879a33b31884d2e5f05#n78 the definition of git-invoke. + +With the above, we could, hypothetically, do a command like: + +``` +git -C ${CURRENT_REPO_PATH} push ${REMOTE_REPO_URI} master +``` + +where REMOTE_REPO_URI can be something like "appuser@git.genenetwork.org:/home/git/public/gn-guile" + +That means we change the (git-invoke …) call seen previously to something like: + +``` +(git-invoke +current-repo-path+ "push" +remote-repo-url+ "master") +``` + +and make sure that the "+remote-repo-url+" value is something along the URI above. + +### Gotchas + +We need to fetch and rebase with every push, to avoid conflicts. That means we'll need a sequence such as the following: + +``` +(git-invoke +current-repo-path+ "fetch" +remote-repo-url+ "master") +(git-invoke +current-repo-path+ "rebase" "origin/master") +(git-invoke +current-repo-path+ "push" +remote-repo-url+ "master") +``` + +The tests above work with a normal user. We'll be running this code within a container, so we do need to expose a specific private ssh key for the user to use to push to remote. This also means that the corresponding public key should be registered with the repository server. 
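For illustration, the same fetch/rebase/push sequence could be driven from Python, mirroring the (git-invoke …) calls above. The repository path and remote URL below are placeholder assumptions, not actual configuration values:

```python
import subprocess

# Hypothetical values standing in for +current-repo-path+ and +remote-repo-url+.
REPO_PATH = "/var/lib/gn-guile/gn-docs"
REMOTE = "appuser@git.genenetwork.org:/home/git/public/gn-guile"

def sync_and_push(repo_path=REPO_PATH, remote=REMOTE, run=subprocess.run):
    """Fetch, rebase, then push — the same sequence as the git-invoke calls above."""
    for args in (["fetch", remote, "master"],
                 ["rebase", "origin/master"],
                 ["push", remote, "master"]):
        # Each call is equivalent to: git -C <repo_path> <subcommand> ...
        run(["git", "-C", repo_path] + args, check=True)
```

The `run` parameter is injectable purely so the command composition can be exercised without a real repository.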
+ +## References + +* [gn-guile-docs] https://git.genenetwork.org/gn-guile/tree/doc/git-markdown-editor.md?id=4623225b0adb0846a4c2e879a33b31884d2e5f05 + diff --git a/issues/gn-guile/rendering-images-within-markdown-documents.gmi b/issues/gn-guile/rendering-images-within-markdown-documents.gmi new file mode 100644 index 0000000..fe3ed39 --- /dev/null +++ b/issues/gn-guile/rendering-images-within-markdown-documents.gmi @@ -0,0 +1,22 @@ +# Rendering Images Linked in Markdown Documents + +## Tags + +* status: open +* priority: high +* type: bug +* assigned: alexm, bonfacem, fredm +* keywords: gn-guile, images, markdown + +## Description + +Rendering images linked within markdown documents does not work as expected — we cannot render images if they have a relative path. +As an example, see the commit below: +=> https://github.com/genenetwork/gn-docs/commit/783e7d20368e370fb497974f843f985b51606d00 + +In that commit, we are forced to use the full GitHub URL to get the images to load correctly when rendered via gn-guile. This has two unfortunate consequences: + +* It makes editing more difficult, since the user has to remember to find and use the full GitHub URL for their images. +* It ties the data and code to GitHub. + +This needs to be fixed, such that any and all paths relative to the markdown file are resolved automatically at render time.
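As a sketch of the proposed fix, relative image links could be rewritten against the markdown file's own directory at render time. This is illustrative Python, not gn-guile's actual code; the function name and base URL are assumptions:

```python
import posixpath
import re

# Matches markdown image syntax: ![alt](path)
IMG_RE = re.compile(r"(!\[[^\]]*\]\()([^)]+)(\))")

def resolve_relative_images(markdown, document_dir, base_url):
    """Rewrite relative image paths into absolute URLs rooted at the repo base."""
    def rewrite(match):
        path = match.group(2)
        if "://" in path:  # already an absolute URL; leave untouched
            return match.group(0)
        # Join against the directory of the markdown file, then normalise ../ etc.
        full = posixpath.normpath(posixpath.join(document_dir, path))
        return match.group(1) + base_url.rstrip("/") + "/" + full + match.group(3)
    return IMG_RE.sub(rewrite, markdown)
```

With this, a document at `general/help/doc.md` linking `../img/logo.png` would resolve to `<base>/general/img/logo.png`, regardless of which forge hosts the repository.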
diff --git a/issues/gn-guile/rework-hard-dependence-on-github.gmi b/issues/gn-guile/rework-hard-dependence-on-github.gmi new file mode 100644 index 0000000..751e9fe --- /dev/null +++ b/issues/gn-guile/rework-hard-dependence-on-github.gmi @@ -0,0 +1,21 @@ +# Rework Hard Dependence on Github + +## Tags + +* status: open +* priority: medium +* type: bug +* assigned: alexm +* assigned: bonfacem +* assigned: fredm +* keywords: gn-guile, github + +## Description + +Currently, we have a hard-dependence on Github for our source repository — you can see this in lines 31, 41, 55 and 59 of the code linked below: + +=> https://git.genenetwork.org/gn-guile/tree/web/view/markdown.scm?id=0ebf6926db0c69e4c444a6f95907e0971ae9bf40 + +The most likely reason is that the "edit online" functionality might not exist in a lot of other popular source forges. + +This is rendered moot, however, since we do provide a means to edit the data on Genenetwork itself. We might as well get rid of this option, and only allow the "edit online" feature on Genenetwork and stop relying on its presence in the forges we use. 
diff --git a/issues/gn-uploader/AuthorisationError-gn-uploader.gmi b/issues/gn-uploader/AuthorisationError-gn-uploader.gmi new file mode 100644 index 0000000..50a236d --- /dev/null +++ b/issues/gn-uploader/AuthorisationError-gn-uploader.gmi @@ -0,0 +1,66 @@ +# AuthorisationError in gn uploader + +## Tags +* assigned: fredm +* status: open +* priority: critical +* type: error +* key words: authorisation, permission + +## Description + +Trying to create population for Kilifish dataset in the gn-uploader webpage, +then encountered the following error: +```sh +Traceback (most recent call last): + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/flask/app.py", line 917, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/flask/app.py", line 902, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/authorisation.py", line 23, in __is_session_valid__ + return session.user_token().either( + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/pymonad/either.py", line 89, in either + return right_function(self.value) + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/authorisation.py", line 25, in <lambda> + lambda token: function(*args, **kwargs)) + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/population/views.py", line 185, in create_population + ).either( + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/pymonad/either.py", line 91, in either + return left_function(self.monoid[0]) + File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/monadic_requests.py", line 99, 
in __fail__ + raise Exception(_data) +Exception: {'error': 'AuthorisationError', 'error-trace': 'Traceback (most recent call last): + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/flask/app.py", line 917, in full_dispatch_request + rv = self.dispatch_request() + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/flask/app.py", line 902, in dispatch_request + return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return] + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/authlib/integrations/flask_oauth2/resource_protector.py", line 110, in decorated + return f(*args, **kwargs) + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/views.py", line 95, in create_population_resource + ).then( + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/pymonad/monad.py", line 152, in then + result = self.map(function) + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/pymonad/either.py", line 106, in map + return self.__class__(function(self.value), (None, True)) + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/views.py", line 98, in <lambda> + "resource": create_resource( + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/models.py", line 25, in create_resource + return _create_resource(cursor, + File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/checks.py", line 56, in __authoriser__ + raise AuthorisationError(error_description) +gn_auth.auth.errors.AuthorisationError: Insufficient privileges to create a resource +', 'error_description': 
'Insufficient privileges to create a resource'} + +``` +The error above resulted from attempting to create a population in the gn-uploader "create population" section. +Input details are as follows: +Full Name: Kilifish F2 Intercross Lines +Name: KF2_Lines +Population code: KF2 +Description: Kilifish second generation population +Family: Crosses, AIL, HS +Mapping Methods: GEMMA, QTLReaper, R/qtl +Genetic type: intercross + +Pressing the `Create Population` button then led to the error above. + diff --git a/issues/gn-uploader/check-genotypes-in-database-too.gmi b/issues/gn-uploader/check-genotypes-in-database-too.gmi new file mode 100644 index 0000000..4e034b7 --- /dev/null +++ b/issues/gn-uploader/check-genotypes-in-database-too.gmi @@ -0,0 +1,22 @@ +# Check Genotypes in the Database for R/qtl2 Uploads + +## Tags + +* type: bug +* assigned: fredm +* priority: high +* status: closed, completed, fixed +* keywords: gn-uploader, uploader, upload, genotypes, geno + +## Description + +Currently, the uploader expects that an R/qtl2 bundle be self-contained, i.e. it contains all the genotypes and other data that fully describe the data in that bundle. + +This is unnecessary in a lot of situations, seeing as GeneNetwork might already have the appropriate genotypes in its database. + +This issue tracks the implementation of the check of genotypes against both the genotypes provided in the bundle and those already in the database.
+ +### Updates + +Fixed in +=> https://git.genenetwork.org/gn-uploader/commit/?id=0e74a1589db9f367cdbc3dce232b1b6168e3aca1 this commit diff --git a/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi b/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi index d2c33e8..5a5cdfa 100644 --- a/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi +++ b/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: fredm, aruni -* status: open +* status: closed, completed * priority: high * type: bug * keywords: guix, gn-uploader diff --git a/issues/gn-uploader/link-authentication-authorisation.gmi b/issues/gn-uploader/link-authentication-authorisation.gmi new file mode 100644 index 0000000..90b8e5e --- /dev/null +++ b/issues/gn-uploader/link-authentication-authorisation.gmi @@ -0,0 +1,15 @@ +# Link Authentication/Authorisation + +## Tags + +* status: open +* assigned: fredm +* priority: critical +* type: feature request, feature-request +* keywords: gn-uploader, gn-auth, authorisation, authentication, uploader, upload + +## Description + +The last link in the chain for uploads is authentication/authorisation. Once the user uploads their data, they need access to it. The auth system, by default, will deny everyone access to any data that is not linked to a resource and for which no user has any roles allowing them access. + +We currently assign such data to the user manually, but that is not a sustainable way of working, especially as the uploader is exposed to more and more users.
diff --git a/issues/quality-control/move-uploader-to-tux02.gmi b/issues/gn-uploader/move-uploader-to-tux02.gmi index 4459433..20c5b24 100644 --- a/issues/quality-control/move-uploader-to-tux02.gmi +++ b/issues/gn-uploader/move-uploader-to-tux02.gmi @@ -5,7 +5,7 @@ * type: migration * assigned: fredm * priority: high -* status: open +* status: closed, completed, fixed * keywords: gn-uploader, guix, container, deploy ## Databases @@ -17,13 +17,13 @@ This implies separate configurations, and separate startup. Some of the things to do to enable this, then, are: -- [x] Provide separate configs and run db server on separate port +* [x] Provide separate configs and run db server on separate port - Configs put in /etc/mysql3307 - Selected port 3307 - datadir in /var/lib/mysql3307 -> /export5 -- [x] Provide separate data directory for the content +* [x] Provide separate data directory for the content - extract backup -- [x] Maybe suffix the files with the port number, e.g. +* [x] Maybe suffix the files with the port number, e.g. ``` datadir = /var/lib/mysql3307 socket = /var/run/mysqld/mysqld3307.sock diff --git a/issues/gn-uploader/provide-page-for-uploaded-data.gmi b/issues/gn-uploader/provide-page-for-uploaded-data.gmi new file mode 100644 index 0000000..60b154b --- /dev/null +++ b/issues/gn-uploader/provide-page-for-uploaded-data.gmi @@ -0,0 +1,22 @@ +# Provide Page/Link for/to Uploaded Data + +## Tags + +* status: open +* assigned: fredm +* priority: medium +* type: feature, feature request, feature-request +* keywords: gn-uploader, uploader, data dashboard + +## Description + +Once a user has uploaded their data, provide them with a landing page/dashboard for the data they have uploaded, with details on what that data is. + +* Should we provide a means to edit the data here (mostly to add metadata and the like)? +* Maybe the page should actually be shown on GN2? 
+ +## Blockers + +Depends on + +=> /issues/gn-uploader/link-authentication-authorisation diff --git a/issues/gn-uploader/replace-redis-with-sqlite3.gmi b/issues/gn-uploader/replace-redis-with-sqlite3.gmi new file mode 100644 index 0000000..3e5020a --- /dev/null +++ b/issues/gn-uploader/replace-redis-with-sqlite3.gmi @@ -0,0 +1,17 @@ +# Replace Redis with SQL + +## Tags + +* status: open +* priority: low +* assigned: fredm +* type: feature, feature-request, feature request +* keywords: gn-uploader, uploader, redis, sqlite, sqlite3 + +## Description + +We currently (as of 2024-06-27) use Redis for tracking any asynchronous jobs (e.g. QC on uploaded files). + +A lot of what we use redis for, we can do in one of the many SQL databases (we'll probably use SQLite3 anyway), which are more standardised, and easier to migrate data from and to. It has the added advantage that we can open multiple connections to the database, enabling the different processes to update the status and metadata of the same job consistently. + +Changes done here can then be migrated to the other systems, i.e. GN2, GN3, and gn-auth, as necessary. diff --git a/issues/gn-uploader/resume-upload.gmi b/issues/gn-uploader/resume-upload.gmi new file mode 100644 index 0000000..0f9ba30 --- /dev/null +++ b/issues/gn-uploader/resume-upload.gmi @@ -0,0 +1,41 @@ +# gn-uploader: Resume Upload + +## Tags + +* status: closed, completed, fixed +* priority: medium +* assigned: fredm, flisso +* type: feature request, feature-request +* keywords: gn-uploader, uploader, upload, resume upload + +## Description + +If a user is uploading a particularly large file, we might need to provide a way for the user to resume their upload of the file. + +Maybe this can wait until we have +=> /issues/gn-uploader/link-authentication-authorisation linked authentication/authorisation to gn-uploader. +In this way, each upload can be linked to a specific user. 
+ +### TODOs + +* [x] Build UI to allow uploads +* [x] Build back-end to handle uploads +* [x] Handle upload failures/errors +* [x] Deploy to staging + +### Updates + +=> https://git.genenetwork.org/gn-uploader/commit/?id=9a8dddab072748a70d43416ac8e6db69ad6fb0cb +=> https://git.genenetwork.org/gn-uploader/commit/?id=df9da3d5b5e4382976ede1b54eb1aeb04c4c45e5 +=> https://git.genenetwork.org/gn-uploader/commit/?id=47c2ea64682064d7cb609e5459d7bd2e49efa17e +=> https://git.genenetwork.org/gn-uploader/commit/?id=a68fe177ae41f2e58a64b3f8dcf3f825d004eeca + +### Possible Resources + +=> https://javascript.info/resume-upload +=> https://github.com/23/resumable.js/ +=> https://www.dropzone.dev/ +=> https://stackoverflow.com/questions/69339582/what-hash-python-3-hashlib-yields-a-portable-hash-of-file-contents + + +This is mostly fixed. Any arising bugs can be tracked in separate issues. diff --git a/issues/gn-uploader/samplelist-details.gmi b/issues/gn-uploader/samplelist-details.gmi new file mode 100644 index 0000000..2e64d8a --- /dev/null +++ b/issues/gn-uploader/samplelist-details.gmi @@ -0,0 +1,17 @@ +# Explanation of how Sample Lists are handled in GN2 (and may be handled moving forward) + +## Tags + +* status: open +* assigned: fredm, zsloan +* priority: medium +* type: documentation +* keywords: strains, gn-uploader + +## Description + +Regarding the order of samples/strains, it can basically be whatever we decide it is. It just needs to stay consistent (like if there are multiple genotype files). It only really affects how the strains are displayed, and any other genotype files we use for mapping need to share the same order. + +I think this is the case regardless of whether it's strains or individuals (and both the code and files make no distinction).
Sometimes it just logically makes sense to sort them in a particular way for display purposes (like BXD1, BXD2, etc), but technically everything would still work the same if you swapped those columns across all genotype files. Users would be confused about why BXD2 is before BXD1, but everything would still work and all calculations would give the same results. + +zsloan's proposal for handling sample lists in the future is to just store them in a JSON file in the genotype_files/genotype directory. diff --git a/issues/gn-uploader/speed-up-rqtl2-qc.gmi b/issues/gn-uploader/speed-up-rqtl2-qc.gmi new file mode 100644 index 0000000..43e6d49 --- /dev/null +++ b/issues/gn-uploader/speed-up-rqtl2-qc.gmi @@ -0,0 +1,30 @@ +# Speed Up QC on R/qtl2 Bundles + +## Tags + +## Description + +The default format for the CSV files in an R/qtl2 bundle is: + +``` +matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.) +``` + +One or more files in the R/qtl2 bundle could, however, +=> https://kbroman.org/qtl2/assets/vignettes/input_files.html#csv-files be transposed, +which means the system needs to "un-transpose" the file(s) before processing. + +Currently, the system does this by reading all the files of a particular type, and then "un-transposing" the entire thing. This leads to a very slow system. + +This issue proposes to do the quality control/assurance processing on each file in isolation, where possible - this will allow parallelisation/multiprocessing of the QC checks. + +The main considerations that need to be handled are as follows: + +* Do QC on (founder) genotype files (when present) before any of the other files +* Genetic and physical maps (if present) can have QC run on them after the genotype files +* Do QC on phenotype files (when present) after genotype files but before any other files +* Covariate and phenotype covariate files come after the phenotype files +* Cross information files … ? +* Sex information files … ?
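The "un-transpose" step itself is just a matrix transpose of the parsed CSV rows. A minimal per-file sketch in Python (illustrative only — the uploader's actual implementation may differ, and this version holds the whole file in memory):

```python
import csv
import io

def untranspose_csv(text, delimiter=","):
    """Read a transposed R/qtl2 CSV and return it with rows and columns swapped."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter)
            if row and not row[0].startswith("#")]  # R/qtl2 comment lines start with '#'
    flipped = zip(*rows)  # transpose: column i becomes row i
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(flipped)
    return out.getvalue()
```

Doing this per file, rather than after concatenating all files of a type, is what would allow the QC checks to run in parallel.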
+ +We should probably detail the type of QC checks done for each type of file. diff --git a/issues/gn-uploader/uploading-samples.gmi b/issues/gn-uploader/uploading-samples.gmi new file mode 100644 index 0000000..11842b9 --- /dev/null +++ b/issues/gn-uploader/uploading-samples.gmi @@ -0,0 +1,51 @@ +# Uploading Samples + +## Tags + +* status: open +* assigned: fredm +* interested: acenteno, zachs, flisso +* priority: high +* type: feature-request +* keywords: gn-uploader, uploader, samples, strains + +## Description + +This will track the various notes regarding the upload of samples onto GeneNetwork. + +### Sample Lists + +From the email thread(s) with @zachs, @flisso and @acenteno: + +``` +When there's a new set of individuals, it generally needs to be added as a new group. In the absence of genotype data, a "dummy" .geno file currently needs to be generated* in order to define the sample list (if you look at the list of .geno files in genotype_files/genotype you'll find some really small files that just have either a single marker or a bunch of fake markers calls "Marker1, Marker2, etc" - these are solely just used to get the samplelist from the columns). So in theory such a file could be generated as a part of the upload process in the absence of genotypes +``` + +We note, however, that as @zachs mentions: + +``` +This is really goofy and should probably change. I've brought up the idea of just replacing these with JSON files containing group metadata (including samplelist), but we've never actually gone through with making any change to this. I already did something sorta similar to this with the existing JSON files (in genotype_files/genotype), but those are currently only used in situations where there are either multiple genotype files, or a genotype file only contains a subset of samples/strains from a group (so the JSON file tells mapping to only use those samples/strains).
+``` + +We need to explore whether such a change might need updates to the GN2/GN3 code to ensure code that depends on these dummy files can also use the new format JSON files too. + +Regarding the order of the samples, from the email thread: + +``` +Regarding the order of samples, it can basically be whatever we decide it is. It just needs to stay consistent (like if there are multiple genotype files). It only really affects how it's displayed, and any other genotype files we use for mapping needs to share the same order. +``` + +The ordering of the samples has no bearing on the analysis of the data, i.e. it does not affect the results of computations. + + +### Curation + +``` +But any time new samples are involved, there probably needs to be some explicit confirmation by a curator like Rob (since we want to avoid a situation where a sample/strain just has a typo or somethin and we treat it like a new sample/strain). +``` + +also + +``` +When there's a mix of existing individuals, I think it's usually the case that it's the same group (that is being expanded with new individuals), but anything that involves adding new samples should probably involve some sort of direct/explicit confirmation from a curator like Rob or something. +``` diff --git a/issues/gnqa/GNQA-for-evaluation.gmi b/issues/gnqa/GNQA-for-evaluation.gmi index 9f4a861..0b2e352 100644 --- a/issues/gnqa/GNQA-for-evaluation.gmi +++ b/issues/gnqa/GNQA-for-evaluation.gmi @@ -5,7 +5,7 @@ * Assigned: alexm, shelbys * Keywords: UI, GNQA, evaluation * Type: immediate -* Status: In Progress +* Status: completed ## Description @@ -13,5 +13,5 @@ We need to publish a paper on GeneNetwork Question & Answering system. 
To that e ## Tasks -* [ ] Add a thumbs up and down for rating the answer to a question -* [ ] Ensure to log the questions, respones, and ratings of each questions +* [X] Add a thumbs up and down for rating the answer to a question +* [X] Ensure to log the questions, responses, and ratings of each question diff --git a/issues/gnqna/rating-system-has-no-indication-for-login-requirement.gmi b/issues/gnqa/Login_no-indicator-for-req.gmi index 7ed713a..7ed713a 100644 --- a/issues/gnqna/rating-system-has-no-indication-for-login-requirement.gmi +++ b/issues/gnqa/Login_no-indicator-for-req.gmi diff --git a/issues/fetch-pubmed-references-to-gnqa.gmi b/issues/gnqa/fetch-pubmed-references-to-gnqa.gmi index 63351d1..43c45cf 100644 --- a/issues/fetch-pubmed-references-to-gnqa.gmi +++ b/issues/gnqa/fetch-pubmed-references-to-gnqa.gmi @@ -5,7 +5,7 @@ * assigned: alexm * keywords: llm, pubmed, api, references * type: enhancements -* status: in progress +* status: completed, closed ## Description @@ -18,13 +18,13 @@ The task is to integrate PubMed references into the GNQA system by querying the * [x] Query the API with the publication titles. -* [] Display the PubMed information as reference information on the GN2 user interface. +* [x] Display the PubMed information as reference information on the GN2 user interface. -* [] dump the results to a DB e.g sqlite,lmdb +* [x] dump the results to a DB, e.g. sqlite, lmdb * [x] If references are not found, perform a lossy search or list the closest three papers.
-* [] reimplement the reference ui to render the references as modal objects +* [x] reimplement the reference ui to render the references as modal objects For lossy search, see: diff --git a/issues/gn_llm_db_cache_integration.gmi b/issues/gnqa/gn_llm_db_cache_integration.gmi index 86f7c80..86f7c80 100644 --- a/issues/gn_llm_db_cache_integration.gmi +++ b/issues/gnqa/gn_llm_db_cache_integration.gmi diff --git a/issues/gnqa/gn_llm_integration_using_cached_searches.gmi b/issues/gnqa/gn_llm_integration_using_cached_searches.gmi new file mode 100644 index 0000000..e20b5a3 --- /dev/null +++ b/issues/gnqa/gn_llm_integration_using_cached_searches.gmi @@ -0,0 +1,43 @@ +# GN2 Integration with LLM search using cached results + +## Tags + +* assigned: jnduli, alexm, bmunyoki +* keywords: llm, genenetwork2 +* type: enhancement +* status: open + +## Description + +We'd like to integrate LLM searches into our GN searches: when someone attempts a Xapian search, e.g. `wiki:rif group:mouse nicotine`, we'd do a corresponding search for `rif mouse nicotine` on LLMs, and show the results on the main page. + +Another example: + +xapian search: rif:glioma species:human group:gtex_v8 +llm search: glioma human gtex_v8 + + +This can be split into the following phases: + +* [ ] 1. UI integration, where we modify the search page to include a dummy content box +* [ ] 2. LLM search integration, where we perform a search and modify the UI to show the results. This can either be async (i.e. the search results page waits for the LLM search results) or sync (i.e. we load the search results page after we've got the LLM results) +* [x] 2.1 create a copy branch for the gnqa-api branch +* [x] 2.2 create a PR containing all the branches +* [ ] 2.3 how much would it take to get the qa_*** branch merged into main? +* [ ] 3. Cache design and integration: we already have some cache using redis (gn search history), so we may use this for the moment.
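The Xapian-to-LLM query mapping described above — dropping the `prefix:` part of each fielded term — could be sketched as (illustrative Python, not the actual GN2 code):

```python
def to_llm_query(xapian_query):
    """Strip 'prefix:' from fielded terms, e.g. 'wiki:rif group:mouse nicotine' -> 'rif mouse nicotine'."""
    return " ".join(tok.split(":", 1)[1] if ":" in tok else tok
                    for tok in xapian_query.split())
```

For example, `to_llm_query("rif:glioma species:human group:gtex_v8")` yields `"glioma human gtex_v8"`, matching the second example above.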
+ + +Let's use flag: `LLM_SEARCH_ENABLED` to enable/disable this feature during development to make sure we don't release this before it's ready. + + +## Notes + +The branch for merging to gn2: + +https://github.com/genenetwork/genenetwork2/pull/863 + +The branch for merging to gn3: + +https://github.com/genenetwork/genenetwork3/pull/188
\ No newline at end of file diff --git a/issues/gnqa/gnqa_integration_to_global_search_Design.gmi b/issues/gnqa/gnqa_integration_to_global_search_Design.gmi new file mode 100644 index 0000000..0d5afd0 --- /dev/null +++ b/issues/gnqa/gnqa_integration_to_global_search_Design.gmi @@ -0,0 +1,74 @@ +# GNQA Integration to Global Search Design Proposal + +## Tags +* assigned: jnduli, alexm +* keywords: llm, genenetwork2 +* type: feature +* status: complete, closed, done + +## Description +This document outlines the design proposal for integrating GNQA into the Global Search feature. + +## High-Level Design + +### UI Design +When the GN2 Global Search page loads: +1. A request is initiated via HTMX to the GNQA search page with the search query. +2. Based on the results, a page or subsection is rendered, displaying the query and the answer, and providing links to references. + +For more details on the UI design, refer to the pull request: +=> https://github.com/genenetwork/genenetwork2/pull/862 + +### Backend Design +The API handles requests to the Fahamu API and manages result caching. Once a request to the Fahamu API is successful, the results are cached using SQLite for future queries. Additionally, a separate API is provided to query cached results. + +## Deep Dive + +### Caching Implementation +For caching, we will use SQLite3 since it is already implemented for search history. Based on our study, this approach will require minimal space: + +*Statistical Estimation:* +We calculated that this caching solution would require approximately 79MB annually for an estimated 20 users, each querying the system 5 times a day. + +How did we determine the average request size per user? +The average request size was an upper-bound calculation for documents returned from the Fahamu API. + +Why are we assuming 20 users making 5 requests per day?
+ +We’re assuming 20 users making 5 requests per day to estimate typical usage of GN2 services. + +### Error Handling +* Handle cases where users are not logged in, as GNQA requires authentication. +* Handle scenarios where there is no response from Fahamu. +* Handle general errors. + +### Passing Questions to Fahamu +We can choose to either pass the user's entire query to Fahamu or parse the query to extract keywords. + +### Generating Possible Questions +It is possible to generate potential questions based on the user's search and send those to Fahamu. Fahamu would then return possible related queries. + +## Related Issues +=> https://issues.genenetwork.org/issues/gn_llm_integration_using_cached_searches + +## Tasks + +* [x] Initiate a background task from HTMX to Fahamu once the search page loads. +* [x] Query Fahamu for data. +* [x] Cache results from Fahamu. +* [x] Render the UI page with the query and answer. +* [x] For "See more," render the entire GNQA page with the query, answer, references, and PubMed data. +* [x] Implement parsing for Xapian queries to normal queries. +* [x] Implement error handling. +* [x] Reimplement how GNQA uses gn-auth in GN3. +* [x] Query Fahamu to generate possible questions based on certain keywords. + + +## Notes +From the latest Fahamu API docs, they have implemented a way to include subquestions by setting `amplify=True` for the POST request. We also have our own implementation for parsing text to extract questions. + +## PRs Merged Related to This + +=> https://github.com/genenetwork/genenetwork2/pull/868 +=> https://github.com/genenetwork/genenetwork2/pull/862 +=> https://github.com/genenetwork/genenetwork2/pull/867 +=> https://github.com/genenetwork/genenetwork3/pull/191
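The SQLite caching described in this design could look roughly like the sketch below (a sketch only; the table and column names are hypothetical — the document specifies just that Fahamu results are cached in SQLite and served for repeat queries):

```python
import json
import sqlite3
import time

def init_cache(path=":memory:"):
    # Hypothetical schema: one row per query, answer stored as JSON text.
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS gnqa_cache (
               query   TEXT PRIMARY KEY,
               answer  TEXT NOT NULL,
               created REAL NOT NULL)"""
    )
    return con

def cache_put(con, query, answer):
    # INSERT OR REPLACE keeps the newest answer for a repeated query.
    con.execute(
        "INSERT OR REPLACE INTO gnqa_cache VALUES (?, ?, ?)",
        (query, json.dumps(answer), time.time()),
    )
    con.commit()

def cache_get(con, query):
    row = con.execute(
        "SELECT answer FROM gnqa_cache WHERE query = ?", (query,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

At the upper-bound request size behind the 79MB/year figure (20 users × 5 queries/day × 365 ≈ 36,500 rows a year, i.e. roughly 2KB per row), SQLite handles this volume trivially.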
\ No newline at end of file diff --git a/issues/implement-auth-to-gn-llm.gmi b/issues/gnqa/implement-auth-to-gn-llm.gmi index 496a7cb..2a5456b 100644 --- a/issues/implement-auth-to-gn-llm.gmi +++ b/issues/gnqa/implement-auth-to-gn-llm.gmi @@ -6,7 +6,7 @@ * keywords: llm, auth * type: feature * priority: high -* status: done, completed +* status: done, completed, closed ## Description diff --git a/issues/gnqa/implement-no-login-requirement-for-gnqa.gmi b/issues/gnqa/implement-no-login-requirement-for-gnqa.gmi new file mode 100644 index 0000000..9dcef53 --- /dev/null +++ b/issues/gnqa/implement-no-login-requirement-for-gnqa.gmi @@ -0,0 +1,20 @@ +# Implement No-Login Requirement for GNQA + +## Tags + +* type: feature +* status: in progress +* priority: medium +* assigned: alexm +* keywords: gnqa, user experience, authentication, login, llm + +## Description +This feature will allow usage of LLM/GNQA features without requiring user authentication, while implementing measures to filter out bots. + + +## Tasks + +* [x] If logged in: perform AI search with zero penalty +* [ ] Add caching lifetime to save on token usage +* [ ] Routes: check for referrer headers — if the previous search was not from the homepage, perform AI search +* [ ] If global search returns more than *n* results (*n = number*), perform an AI search diff --git a/issues/implement-reference-rating-gn-llm.gmi b/issues/gnqa/implement-reference-rating-gn-llm.gmi index f646a6f..f646a6f 100644 --- a/issues/implement-reference-rating-gn-llm.gmi +++ b/issues/gnqa/implement-reference-rating-gn-llm.gmi diff --git a/issues/integrate_gn_llm_search.gmi b/issues/gnqa/integrate_gn_llm_search.gmi index 5dfd9da..5dfd9da 100644 --- a/issues/integrate_gn_llm_search.gmi +++ b/issues/gnqa/integrate_gn_llm_search.gmi diff --git a/issues/merge-gnqa-to-production.gmi b/issues/gnqa/merge-gnqa-to-production.gmi index 3d34bb1..3d34bb1 100644 --- a/issues/merge-gnqa-to-production.gmi +++ b/issues/gnqa/merge-gnqa-to-production.gmi diff
--git a/issues/refactor-gn-llm-code.gmi b/issues/gnqa/refactor-gn-llm-code.gmi index 6e33737..64c43c4 100644 --- a/issues/refactor-gn-llm-code.gmi +++ b/issues/gnqa/refactor-gn-llm-code.gmi @@ -5,7 +5,7 @@ * assigned: alexm, shelby * keywords: refactoring, llm, tests * type: enhancements -* status: in progress +* status: completed, closed ## Description diff --git a/issues/implement_xapian_to_text_transformer.gmi b/issues/implement_xapian_to_text_transformer.gmi new file mode 100644 index 0000000..a3c3dc8 --- /dev/null +++ b/issues/implement_xapian_to_text_transformer.gmi @@ -0,0 +1,15 @@ +# Xapian to Text Transformer + +## Tags +* assigned: alexm, jnduli +* keywords: llm, genenetwork2, xapian, transform +* type: feature +* status: in-progress + +## Description + +Given a Xapian search query, e.g., "CYTOCHROME AND P450" or "CYTOCHROME NEAR P450," we need to convert the text to a format with no Xapian keywords. In this case, the transformed text would be "CYTOCHROME P450." + + +This issue is part of the main issue below. +=> https://issues.genenetwork.org/issues/gn_llm_integration_using_cached_searches diff --git a/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi b/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi new file mode 100644 index 0000000..98b46b6 --- /dev/null +++ b/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi @@ -0,0 +1,135 @@ +# Inspect Discrepancies Between Xapian and SQL Search. + +* assigned: bonfacem, rookie101 + +## Description + +When doing a Xapian search, we miss some data that is available from the SQL search.
The searches we tested: + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Dglioma&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=glioma (31 results) + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Aglioma species:mouse group:bxd dataset:hc_m2_0606_p wiki:glioma (26 results) + +We miss the following entries from the Xapian search: + +``` +15 1423803_s_at Gltscr2 glioma tumor suppressor candidate region gene 2 +16 1451121_a_at Gltscr2 glioma tumor suppressor candidate region 2; exons 8 and 9 +17 1452409_at Gltscr2 glioma tumor suppressor candidate region gene 2 +25 1416556_at Sas sarcoma amplified sequence +26 1430029_a_at Sas sarcoma amplified sequence +``` + +We want to figure out why there is a discrepancy between the 2 searches above. + +## Resolution + +Use "quest" to search for one of the symbols that don't appear in the Xapian search to get the exact document id: + +``` +quest --msize=2 -s en --boolean-prefix="iden:Qgene:" "iden:"1423803_s_at:hc_m2_0606_p"" \ +--db=/export/data/genenetwork-xapian/ + +Parsed Query: Query(0 * Qgene:1423803_s_at:hc_m2_0606_p) +Exactly 1 matches +MSet: +9665867: [0] +{ + "name": "1423803_s_at", + "symbol": "Gltscr2", + "description": "glioma tumor suppressor candidate region gene 2", + "chr": "1", + "mb": 4.687986, + "dataset": "HC_M2_0606_P", + "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", + "species": "mouse", + "group": "BXD", + "tissue": "Hippocampus mRNA", + "mean": 11.749030303030299, + "lrs": 11.3847971289981, + "additive": -0.0650828877005346, + "geno_chr": "5", + "geno_mb": 137.010795 +} +``` + +From the retrieved document-id, use "xapian-delve" to inspect the terms inside the index: + +``` +xapian-delve -r 9665867 -d /export/data/genenetwork-xapian/ + +Data for record #9665867: +{ + 
"name": "1423803_s_at", + "symbol": "Gltscr2", + "description": "glioma tumor suppressor candidate region gene 2", + "chr": "1", + "mb": 4.687986, + "dataset": "HC_M2_0606_P", + "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", + "species": "mouse", + "group": "BXD", + "tissue": "Hippocampus mRNA", + "mean": 11.749030303030299, + "lrs": 11.3847971289981, + "additive": -0.0650828877005346, + "geno_chr": "5", + "geno_mb": 137.010795 +} +Term List for record #9665867: 1423803_s_at 2 5330430h08rik +9430097c02rik Qgene:1423803_s_at:hc_m2_0606_p +XC1 XDShc_m2_0606_p XGbxd XIhippocampus XImrna XPC5 +XSmouse XTgene XYgltscr2 ZXDShc_m2_0606_p ZXGbxd +ZXIhippocampus ZXImrna ZXSmous ZXYgltscr2 Zbc017637 +Zbxd Zcandid Zgene Zglioma Zgltscr2 Zhc_m2_0606_p +Zhippocampus Zmous Zmrna Zregion Zsuppressor Ztumor +bc017637 bxd candidate gene glioma gltscr2 +hc_m2_0606_p hippocampus mouse mrna +region suppressor tumor +``` + +We have no wiki (XWK) entries from the above. When transforming to TTL files from SQL, we have symbols that exist in the GeneRIF table that do not exist in the GeneRIF_BASIC table: + +``` +SELECT COUNT(symbol) FROM GeneRIF WHERE +symbol NOT IN (SELECT symbol FROM GeneRIF_BASIC) +GROUP BY BINARY symbol; +``` + +Consequently, this means that after transforming to TTL files, we have some missing RDF entries that map a symbol (subject) to it's real name (object). When building the RDF cache, we thereby have some missing RIF/WIKI entries, and some entries are not indexed. This patch fixes the aforementioned error with missing symbols: + +=> https://git.genenetwork.org/gn-transform-databases/commit/?id=d95501bd2bd41ef8cf3584118382e83cbbbe0c87 [gn-transform-databases] Add missing RIF symbols. 
+ +Now these 2 queries return the same exact results: + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Dglioma&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=glioma (31 results) + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Aglioma species:mouse group:bxd dataset:hc_m2_0606_p wiki:glioma (31 results) + +However, Xapian search is case insensitive while the SQL search is case sensitive: + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Acancer species:mouse group:bxd dataset:hc_m2_0606_p wiki:cancer (72 results) + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Dcancer&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=cancer (70 results) + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3DCancer&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=Cancer (Note the change in the case "Cancer": 13 results) + +Another reason for discrepancies between search results, E.g. + +=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Adiabetes species:mouse group:bxd dataset:hc_m2_0606_p wiki:diabetes (59 results) + +=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Ddiabetes&search_terms_and=&accession_id=None&FormID=searchResulto SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=diabetes (52 results) + +is that Xapian performs stemming on the search terms. 
For example, in the above wiki search for "diabetes", Xapian will stem "diabetes" to "diabet" thereby matching "diabetic", "diabetes", or any other word variation of "diabetes." + +## Ordering of Results + +The ordering in the Xapian search and SQL search is different. By default, SQL orders by Symbol where we have: + +``` +[...] ORDER BY ProbeSet.symbol ASC +``` + +However, Xapian orders search results by decreasing relevance score. This is configurable. + +* closed diff --git a/issues/inspect-discrepancies-between-xapian-and-sql-search2.gmi b/issues/inspect-discrepancies-between-xapian-and-sql-search2.gmi new file mode 100644 index 0000000..451d5c3 --- /dev/null +++ b/issues/inspect-discrepancies-between-xapian-and-sql-search2.gmi @@ -0,0 +1,11 @@ +# Inspect Discrepancies Between Xapian and SQL Search. + +* assigned: bonfacem, rookie101 + +## Description + +When we type BXD_21526 in xapian search we should find + +=> https://genenetwork.org/search?species=mouse&group=BXD&type=Phenotypes&dataset=BXDPublish&search_terms_or=BXD_21526&search_terms_and=&accession_id=None&FormID=searchResult + +This is not the case right now. diff --git a/issues/integrate-markdown-editor-to-gn2.gmi b/issues/integrate-markdown-editor-to-gn2.gmi index 98c170b..5904eac 100644 --- a/issues/integrate-markdown-editor-to-gn2.gmi +++ b/issues/integrate-markdown-editor-to-gn2.gmi @@ -1,3 +1,4 @@ + # GN Markdown Editor Integration ## Tags @@ -5,26 +6,168 @@ * assigned: alexm * status: in progress * priority: high +* tags: markdown, integration, guile ## Notes -This is a to-do list to integrate the GN Markdown editor into GN2. + +This is a to-do list to integrate the GN Markdown editor into GN2. 
To see the implementation, see: -=> https://github.com/Alexanderlacuna/geditor +=> https://git.genenetwork.org/gn-guile/ ## Tasks -* [ ] Implement APIs to fetch file for edit -* [ ] Add verification for the repository -* [ ] Implement API to edit and commit changes -* [ ] Replace JS with HTMX -* [ ] Support external links and image rendering -* [ ] Package dependencies -* [ ] Handle errors +* [x] Implement APIs to fetch files for editing +* [x] Add verification for the repository +* [x] Implement API to edit and commit changes +* [x] Replace JS with HTMX +* [x] Support external links and image rendering +* [x] Package dependencies +* [x] Show diff for files +* [x] Handle errors * [ ] Review by users -* [ ] Integrate auth to the system. +* [x] Integrate authentication into the system + + +## API Documentation + +These API endpoints are implemented in Guile. See the repo: + +=> https://git.genenetwork.org/gn-guile/ + +The main endpoints are `/edit` and `/commit`. + +### Edit (GET) + +This is a `GET` request to retrieve file content. Make sure you pass a valid `file_path` as the search query (the path should be relative to the repository).
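As an illustration, the same GET request from Python (stdlib only; the helper names are hypothetical, and the `127.0.0.1:8091` dev address is the one used by the curl examples in this section):

```python
import json
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1:8091"  # assumed dev address, as in the curl examples

def build_edit_url(file_path: str, base: str = BASE) -> str:
    # file_path is relative to the repository, per the note above.
    return base + "/edit?" + urllib.parse.urlencode({"file_path": file_path})

def fetch_file(file_path: str, base: str = BASE) -> dict:
    # On success the payload carries "path", "content" and "hash";
    # on failure it carries "error" and "msg" (see the examples below).
    with urllib.request.urlopen(build_edit_url(file_path, base)) as resp:
        payload = json.load(resp)
    if "error" in payload:
        raise RuntimeError(f"{payload['error']}: {payload.get('msg')}")
    return payload
```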
+ +**Edit Request Example:** + +```bash + +curl -G -d "file_path=test.md" localhost:8091/edit +``` + +In case of a successful response, the expected result is: + + +```json +{ +"path": "<file_path>", +"content": "Test for new user\n test 2 for line\n test 3 for new line\n ## real markdown two test\n", +"hash": "<commit_sha>" +} +``` + +In case of an error, the expected response is: + +```json +{ +"error": "<error_type>", +"msg": "<error_reason>" +} +``` + +### Commit (POST) + +**Endpoint:** + +``` +localhost:8091/commit +``` + + +```bash + +curl -X POST http://127.0.0.1:8091/commit \ +-H 'Content-Type: application/json' \ +-d '{ +"content": "make test commit", +"filename": "test.md", +"email": "test@gmail.com", +"username": "test", +"commit_message": "init commit", +"prev_commit": "7cbfc40d98b49a64e98e7cd562f373053d0325bd" +}' + +``` -Related issues: +It expects the following data in JSON format: + +* `content` (the data you want to commit to the file, *valid markdown*) +* `prev_commit` (required for integrity) +* `filename` (file path to the file you are modifying) +* `username` (identifier for the user, in our case from auth) +* `email` (identifier email from the user, in our case from auth) +* `commit_message` + +If the request succeeds, the response should be: + +```json +{ +"status": "201", +"message": "Committed file successfully", +"content": "Test for new user\n test 2 for line\n test 3 for new line\n ## real markdown two test\n", +"commit_sha": "47df3b7f13a935d50cc8b40e98ca9e513cba104c", +"commit_message": "commit by genetics" +} +``` + +If there are no changes to the file: + +```json +{ +"status": "200", +"message": "Nothing to commit, working tree clean", +"commit_sha": "ecd96f27c45301279150fbda411544687db1aa45" +} +``` + +If the request fails, the expected results are: + +```json +{ +"error": "<error_type>", +"msg": "Commits do not match. 
Please pull in the latest changes for the current commit *ecd96f27c45301279150fbda411544687db1aa45* and previous commits." +} +``` + +## Related Issues => https://issues.genenetwork.org/issues/implement-gn-markdown-editor-in-guile -=> https://issues.genenetwork.org/issues/implement-gn-markdown-editor
\ No newline at end of file +=> https://issues.genenetwork.org/issues/implement-gn-markdown-editor + +## Notes on GN-Editor UI + +Here is the link to the PR for integrating the GN-Editor, including screenshots: + +=> https://github.com/genenetwork/genenetwork2/pull/854 + +Genenetwork2 consumes the endpoint for the GN-Editor. Authentication is required to prevent access by malicious users and bots. + +The main endpoint to fetch and edit a file is: + +``` +genenetwork.org/editor/edit?file-path=<relative file path> +``` + +This loads the editor with the content for editing. + +### Modifying Editor Settings + +You can modify editor settings, such as font size and keyboard bindings. To do this, navigate to: + +``` +genenetwork.org/editor/settings +``` + +Be sure to save your changes for them to take effect. + +### Showing Diff for Editor + +The editor also provides diff functionality to show you the changes made to the file. Use the "Diff" button in the navigation to view these changes. + +### Committing Changes + +To commit your changes, use the "Commit" button. A commit message is required in the text area for the commit to be processed. + diff --git a/issues/mgamma/mgamma-design.gmi b/issues/mgamma/mgamma-design.gmi index 23e02d5..ed4c061 100644 --- a/issues/mgamma/mgamma-design.gmi +++ b/issues/mgamma/mgamma-design.gmi @@ -7,3 +7,31 @@ We have a lot of experience running and hacking the GEMMA tool in GeneNetwork.or GEMMA proves to give great GWA results and has a decent speed for a single-threaded implementation - even though the matrix calls to openblas use multiple threads. The source code base of GEMMA, however, proves hard to build on. This is why we are creating a next generation tool that has a focus on *performance and hackability*. After several attempts using R, D, Julia, Python and Ruby, we have in 2023 settled on Guile+C+Zig. Guile provides a REPL and great hackability. C+Zig we'll use for performance.
The other languages are all great, but we think we can work faster in this setup. + +Well, it is the end of 2024 and we have ditched that effort. Who said life was easy! The Guile interface proved problematic, and Zig went out of favour because of its bootstrap story, which prevents it from becoming part of Guix, Debian, etc. Also, I discovered that new tensor MPUs support f64, so we may want to support vector and matrix computations on these cores. + +To write a GEMMA replacement, I now favour chunking up the existing GEMMA and making sure its components can talk with alternative implementations. We may use a propagator network approach. It is critical to keep the data in RAM, so it may need some message-passing interface with memory that can be shared. The chunking into CELLs (read: propagator network, PN) is a requirement because we kept tripping over state in GEMMA. A PN should make sure we can run two implementations of the same CELL and compare outcomes for testing. It will also allow us to test AVX, tensor and (say) MKL or CUDA implementations down the line, and it should allow us to start using new functionality on GN faster. It would also be fun to have an implementation run on the RISC-V manycore. + +So, what do we want out of our languages: + +* Nice matrix interface (Julia) +* Support for AVX (Julia) +* Possibility to drop to low level C programming (Julia+prescheme+C?) +* High level -- PN -- glue (Julia+Guile?) + +Julia looks like a great candidate, even though it has notable downsides, including the big 'server' blob deployment and the garbage collector (the latter also being a strength, mind). Alternatives could be Rust and Prescheme, which have no such concerns but lack the nice matrix notation. + +The approach will be to start with Julia, reimplementing GEMMA functions so they can be called from Julia and/or Guile. + +Oh, I just found out that Julia, like Zig, is no longer up-to-date on Debian. And the Guix version is 2 years old. That is really bad.
If these languages don't get supported on major distros, it is a dead end! + +=> https://mastodon.social/@pjotrprins/113379842047170785 + +What now? + +* Nice matrix interface (?) +* Support for AVX (?) +* Possibility to drop to low level C programming (?+prescheme+C?) +* High level -- PN -- glue (?+Guile?) + +Current candidates for ? are Nim and Rust. Neither has a really nice matrix interface, though Nim's is probably what I prefer, and it is close to Python. Chicken may work too if I get fed up with the two languages mentioned. diff --git a/issues/mgamma/mgamma-lmm.gmi b/issues/mgamma/mgamma-lmm.gmi new file mode 100644 index 0000000..61481c2 --- /dev/null +++ b/issues/mgamma/mgamma-lmm.gmi @@ -0,0 +1,17 @@ +# MGAMMA LMM + +MGamma does GWAS, which means it has to do Linear Mixed Models—both univariate and multivariate. + +# Tags + +* assigned: pjotrp, artyom +* type: feature +* priority: high + +# Tasks + +* [X] Kinship matrix computation. +* [X] Univariate LMM. +* [ ] Multivariate LMM. +* [X] Export data from GEMMA. +* [ ] Compare and ensure data match between MGamma and GEMMA.
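For the kinship task above, a small reference sketch may help when comparing MGamma output against GEMMA on toy data: the centered kinship (relatedness) matrix K = W·Wᵀ/p, where W is the column-centered n×p genotype matrix (to my understanding this is what GEMMA's `-gk 1` mode computes). Pure Python, no numpy, purely illustrative:

```python
def kinship(genotypes):
    # genotypes: n individuals, each a list of p marker values (coded e.g. 0/1/2)
    n = len(genotypes)
    p = len(genotypes[0])
    # Center each marker column on its mean across individuals.
    means = [sum(ind[j] for ind in genotypes) / n for j in range(p)]
    w = [[ind[j] - means[j] for j in range(p)] for ind in genotypes]
    # K = W · Wᵀ / p  (an n×n symmetric matrix)
    return [
        [sum(w[i][k] * w[j][k] for k in range(p)) / p for j in range(n)]
        for i in range(n)
    ]
```

A real implementation would of course use BLAS-backed matrices; the point here is a state-free reference to compare two CELL implementations against.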
\ No newline at end of file diff --git a/issues/move-racket-gn-rest-api-to-guile.gmi b/issues/move-racket-gn-rest-api-to-guile.gmi index 185e7de..659c586 100644 --- a/issues/move-racket-gn-rest-api-to-guile.gmi +++ b/issues/move-racket-gn-rest-api-to-guile.gmi @@ -6,7 +6,7 @@ * priority: medium * type: API, metadata * keywords: API -* status: open +* status: stalled ## Description diff --git a/issues/move-search-to-xapian.gmi b/issues/move-search-to-xapian.gmi index 57612e7..d98be9b 100644 --- a/issues/move-search-to-xapian.gmi +++ b/issues/move-search-to-xapian.gmi @@ -18,3 +18,5 @@ As a work around---to make search work with Python3.10, an inefficient hack was => https://github.com/genenetwork/genenetwork2/pull/805/commits/9a6ddf9f1560b3bc1611f50bf2b94f0dc44652a2 Replace escape with conn.escape_string To get rid of this inheritance, I propose rewriting the search functionality in a more straightforward and functional manner. In doing so, we can also transition to Xapian search, a faster and more efficient search system. 
+ +* closed diff --git a/issues/old_session_bug.gmi b/issues/old_session_bug.gmi index 649ea46..925b9f6 100644 --- a/issues/old_session_bug.gmi +++ b/issues/old_session_bug.gmi @@ -2,7 +2,7 @@ ## Tags -* status: open +* status: closed * priority: medium * type: bug * assigned: zsloan, fredm diff --git a/issues/production-container-mechanical-rob-failure.gmi b/issues/production-container-mechanical-rob-failure.gmi new file mode 100644 index 0000000..ae6bae8 --- /dev/null +++ b/issues/production-container-mechanical-rob-failure.gmi @@ -0,0 +1,224 @@ +# Production Container: `mechanical-rob` Failure + +## Tags + +* status: closed, completed, fixed +* priority: high +* type: bug +* assigned: fredm +* keywords: genenetwork, production, mechanical-rob + +## Description + +After deploying the following commits to https://gn2-fred.genenetwork.org on 2025-02-19 (UTC-0600): + +* genenetwork2: 2a3df8cfba6b29dddbe40910c69283a1afbc8e51 +* genenetwork3: 99fd5070a84f37f91993f329f9cc8dd82a4b9339 +* gn-auth: 073395ff331042a5c686a46fa124f9cc6e10dd2f +* gn-libs: 72a95f8ffa5401649f70978e863dd3f21900a611 + +I had the (not so) bright idea to run the `mechanical-rob` tests against it before pushing it to production proper. Here's where I ran into problems: some of the `mechanical-rob` tests failed, specifically the correlation tests. + +Meanwhile, a run of the same tests against https://cd.genenetwork.org with the same commits was successful: + +=> https://ci.genenetwork.org/jobs/genenetwork2-mechanical-rob/1531 See this. + +This points to a possible problem with the setup of the production container that leads to failures where none should occur. This needs investigation and fixing. + +### Update 2025-02-20 + +The MariaDB server is crashing.
To reproduce: + +* Go to https://gn2-fred.genenetwork.org/show_trait?trait_id=1435464_at&dataset=HC_M2_0606_P +* Click on "Calculate Correlations" to expand +* Click "Compute" + +Observe that after a little while, the system fails with the following errors: + +* `MySQLdb.OperationalError: (2013, 'Lost connection to MySQL server during query')` +* `MySQLdb.OperationalError: (2006, 'MySQL server has gone away')` + +I attempted updating the configuration for MariaDB, setting the `max_allowed_packet` to 16M and then 64M, but that did not resolve the problem. + +The log files indicate the following: + +``` +2025-02-20 7:46:07 0 [Note] Recovering after a crash using /var/lib/mysql/gn0-binary-log +2025-02-20 7:46:07 0 [Note] Starting crash recovery... +2025-02-20 7:46:07 0 [Note] Crash recovery finished. +2025-02-20 7:46:07 0 [Note] Server socket created on IP: '0.0.0.0'. +2025-02-20 7:46:07 0 [Warning] 'user' entry 'webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:46:07 0 [Warning] 'db' entry 'db_webqtl webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:46:07 0 [Note] Reading of all Master_info entries succeeded +2025-02-20 7:46:07 0 [Note] Added new Master_info '' to hash table +2025-02-20 7:46:07 0 [Note] /usr/sbin/mariadbd: ready for connections. +Version: '10.5.23-MariaDB-0+deb11u1-log' socket: '/run/mysqld/mysqld.sock' port: 3306 Debian 11 +2025-02-20 7:46:07 4 [Warning] Access denied for user 'root'@'localhost' (using password: NO) +2025-02-20 7:46:07 5 [Warning] Access denied for user 'root'@'localhost' (using password: NO) +2025-02-20 7:46:07 0 [Note] InnoDB: Buffer pool(s) load completed at 250220 7:46:07 +250220 7:50:12 [ERROR] mysqld got signal 11 ; +Sorry, we probably made a mistake, and this is a bug. + +Your assistance in bug reporting will enable us to fix this for the next release. 
+To report this bug, see https://mariadb.com/kb/en/reporting-bugs + +We will try our best to scrape up some info that will hopefully help +diagnose the problem, but since we have already crashed, +something is definitely wrong and this may fail. + +Server version: 10.5.23-MariaDB-0+deb11u1-log source revision: 6cfd2ba397b0ca689d8ff1bdb9fc4a4dc516a5eb +key_buffer_size=10485760 +read_buffer_size=131072 +max_used_connections=1 +max_threads=2050 +thread_count=1 +It is possible that mysqld could use up to +key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4523497 K bytes of memory +Hope that's ok; if not, decrease some variables in the equation. + +Thread pointer: 0x7f599c000c58 +Attempting backtrace. You can use the following information to find out +where mysqld died. If you see no messages after this, something went +terribly wrong... +stack_bottom = 0x7f6150282d78 thread_stack 0x49000 +/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55f43330c14e] +/usr/sbin/mariadbd(handle_fatal_signal+0x475)[0x55f432e013b5] +sigaction.c:0(__restore_rt)[0x7f615a1cb140] +/usr/sbin/mariadbd(+0xcbffbe)[0x55f43314efbe] +/usr/sbin/mariadbd(+0xd730ec)[0x55f4332020ec] +/usr/sbin/mariadbd(+0xd1b36b)[0x55f4331aa36b] +/usr/sbin/mariadbd(+0xd1cd8e)[0x55f4331abd8e] +/usr/sbin/mariadbd(+0xc596f3)[0x55f4330e86f3] +/usr/sbin/mariadbd(_ZN7handler18ha_index_next_sameEPhPKhj+0x2a5)[0x55f432e092b5] +/usr/sbin/mariadbd(+0x7b54d1)[0x55f432c444d1] +/usr/sbin/mariadbd(_Z10sub_selectP4JOINP13st_join_tableb+0x1f8)[0x55f432c37da8] +/usr/sbin/mariadbd(_ZN10JOIN_CACHE24generate_full_extensionsEPh+0x134)[0x55f432d24224] +/usr/sbin/mariadbd(_ZN10JOIN_CACHE21join_matching_recordsEb+0x206)[0x55f432d245d6] +/usr/sbin/mariadbd(_ZN10JOIN_CACHE12join_recordsEb+0x1cf)[0x55f432d23eff] +/usr/sbin/mariadbd(_Z16sub_select_cacheP4JOINP13st_join_tableb+0x8a)[0x55f432c382fa] +/usr/sbin/mariadbd(_ZN4JOIN10exec_innerEv+0xd16)[0x55f432c63826] +/usr/sbin/mariadbd(_ZN4JOIN4execEv+0x35)[0x55f432c63cc5] 
+/usr/sbin/mariadbd(_Z12mysql_selectP3THDP10TABLE_LISTR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x106)[0x55f432c61c26] +/usr/sbin/mariadbd(_Z13handle_selectP3THDP3LEXP13select_resultm+0x138)[0x55f432c62698] +/usr/sbin/mariadbd(+0x762121)[0x55f432bf1121] +/usr/sbin/mariadbd(_Z21mysql_execute_commandP3THD+0x3d6c)[0x55f432bfdd1c] +/usr/sbin/mariadbd(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x20b)[0x55f432bff17b] +/usr/sbin/mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0xdb5)[0x55f432c00f55] +/usr/sbin/mariadbd(_Z10do_commandP3THD+0x120)[0x55f432c02da0] +/usr/sbin/mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x2f2)[0x55f432cf8b32] +/usr/sbin/mariadbd(handle_one_connection+0x5d)[0x55f432cf8dad] +/usr/sbin/mariadbd(+0xbb4ceb)[0x55f433043ceb] +nptl/pthread_create.c:478(start_thread)[0x7f615a1bfea7] +x86_64/clone.S:97(__GI___clone)[0x7f6159dc6acf] + +Trying to get some variables. +Some pointers may be invalid and cause the dump to abort. 
+Query (0x7f599c012c50): SELECT ProbeSet.Name,ProbeSet.Chr,ProbeSet.Mb, + ProbeSet.Symbol,ProbeSetXRef.mean, + CONCAT_WS('; ', ProbeSet.description, ProbeSet.Probe_Target_Description) AS description, + ProbeSetXRef.additive,ProbeSetXRef.LRS,Geno.Chr, Geno.Mb + FROM ProbeSet INNER JOIN ProbeSetXRef + ON ProbeSet.Id=ProbeSetXRef.ProbeSetId + INNER JOIN Geno + ON ProbeSetXRef.Locus = Geno.Name + INNER JOIN Species + ON Geno.SpeciesId = Species.Id + WHERE ProbeSet.Name in ('1447591_x_at', '1422809_at', '1428917_at', '1438096_a_at', '1416474_at', '1453271_at', '1441725_at', '1452952_at', '1456774_at', '1438413_at', '1431110_at', '1453723_x_at', '1424124_at', '1448706_at', '1448762_at', '1428332_at', '1438389_x_at', '1455508_at', '1455805_x_at', '1433276_at', '1454989_at', '1427467_a_at', '1447448_s_at', '1438695_at', '1456795_at', '1454874_at', '1455189_at', '1448631_a_at', '1422697_s_at', '1423717_at', '1439484_at', '1419123_a_at', '1435286_at', '1439886_at', '1436348_at', '1437475_at', '1447667_x_at', '1421046_a_at', '1448296_x_at', '1460577_at', 'AFFX-GapdhMur/M32599_M_at', '1424393_s_at', '1426190_at', '1434749_at', '1455706_at', '1448584_at', '1434093_at', '1434461_at', '1419401_at', '1433957_at', '1419453_at', '1416500_at', '1439436_x_at', '1451413_at', '1455696_a_at', '1457190_at', '1455521_at', '1434842_s_at', '1442525_at', '1452331_s_at', '1428862_at', '1436463_at', '1438535_at', 'AFFX-GapdhMur/M32599_3_at', '1424012_at', '1440027_at', '1435846_x_at', '1443282_at', '1435567_at', '1450112_a_at', '1428251_at', '1429063_s_at', '1433781_a_at', '1436698_x_at', '1436175_at', '1435668_at', '1424683_at', '1442743_at', '1416944_a_at', '1437511_x_at', '1451254_at', '1423083_at', '1440158_x_at', '1424324_at', '1426382_at', '1420142_s_at', '1434553_at', '1428772_at', '1424094_at', '1435900_at', '1455322_at', '1453283_at', '1428551_at', '1453078_at', '1444602_at', '1443836_x_at', '1435590_at', '1434283_at', '1435240_at', '1434659_at', '1427032_at', '1455278_at', 
'1448104_at', '1421247_at', 'AFFX-MURINE_b1_at', '1460216_at', '1433969_at', '1419171_at', '1456699_s_at', '1456901_at', '1442139_at', '1421849_at', '1419824_a_at', '1460588_at', '1420131_s_at', '1446138_at', '1435829_at', '1434462_at', '1435059_at', '1415949_at', '1460624_at', '1426707_at', '1417250_at', '1434956_at', '1438018_at', '1454846_at', '1435298_at', '1442077_at', '1424074_at', '1428883_at', '1454149_a_at', '1423925_at', '1457060_at', '1433821_at', '1447923_at', '1460670_at', '1434468_at', '1454980_at', '1426913_at', '1456741_s_at', '1449278_at', '1443534_at', '1417941_at', '1433167_at', '1434401_at', '1456516_x_at', '1451360_at', 'AFFX-GapdhMur/M32599_5_at', '1417827_at', '1434161_at', '1448979_at', '1435797_at', '1419807_at', '1418330_at', '1426304_x_at', '1425492_at', '1437873_at', '1435734_x_at', '1420622_a_at', '1456019_at', '1449200_at', '1455314_at', '1428419_at', '1426349_s_at', '1426743_at', '1436073_at', '1452306_at', '1436735_at', '1439529_at', '1459347_at', '1429642_at', '1438930_s_at', '1437380_x_at', '1459861_s_at', '1424243_at', '1430503_at', '1434474_at', '1417962_s_at', '1440187_at', '1446809_at', '1436234_at', '1415906_at', 'AFFX-MURINE_B2_at', '1434836_at', '1426002_a_at', '1448111_at', '1452882_at', '1436597_at', '1455915_at', '1421846_at', '1428693_at', '1422624_at', '1423755_at', '1460367_at', '1433746_at', '1454872_at', '1429194_at', '1424652_at', '1440795_x_at', '1458690_at', '1434355_at', '1456324_at', '1457867_at', '1429698_at', '1423104_at', '1437585_x_at', '1437739_a_at', '1445605_s_at', '1436313_at', '1449738_s_at', '1437525_a_at', '1454937_at', '1429043_at', '1440091_at', '1422820_at', '1437456_x_at', '1427322_at', '1446649_at', '1433568_at', '1441114_at', '1456541_x_at', '1426985_s_at', '1454764_s_at', '1424071_s_at', '1429251_at', '1429155_at', '1433946_at', '1448771_a_at', '1458664_at', '1438320_s_at', '1449616_s_at', '1435445_at', '1433872_at', '1429273_at', '1420880_a_at', '1448645_at', '1449646_s_at', '1428341_at', 
'1431299_a_at', '1433427_at', '1418530_at', '1436247_at', '1454350_at', '1455860_at', '1417145_at', '1454952_s_at', '1435977_at', '1434807_s_at', '1428715_at', '1418117_at', '1447947_at', '1431781_at', '1428915_at', '1427197_at', '1427208_at', '1455460_at', '1423899_at', '1441944_s_at', '1455429_at', '1452266_at', '1454409_at', '1426384_a_at', '1428725_at', '1419181_at', '1454862_at', '1452907_at', '1433794_at', '1435492_at', '1424839_a_at', '1416214_at', '1449312_at', '1436678_at', '1426253_at', '1438859_x_at', '1448189_a_at', '1442557_at', '1446174_at', '1459718_x_at', '1437613_s_at', '1456509_at', '1455267_at', '1440480_at', '1417296_at', '1460050_x_at', '1433585_at', '1436771_x_at', '1424294_at', '1448648_at', '1417753_at', '1436139_at', '1425642_at', '1418553_at', '1415747_s_at', '1445984_at', '1440024_at', '1448720_at', '1429459_at', '1451459_at', '1428853_at', '1433856_at', '1426248_at', '1417765_a_at', '1439459_x_at', '1447023_at', '1426088_at', '1440825_s_at', '1417390_at', '1444744_at', '1435618_at', '1424635_at', '1443727_x_at', '1421096_at', '1427410_at', '1416860_s_at', '1442773_at', '1442030_at', '1452281_at', '1434774_at', '1416891_at', '1447915_x_at', '1429129_at', '1418850_at', '1416308_at', '1422858_at', '1447679_s_at', '1440903_at', '1417321_at', '1452342_at', '1453510_s_at', '1454923_at', '1454611_a_at', '1457532_at', '1438440_at', '1434232_a_at', '1455878_at', '1455571_x_at', '1436401_at', '1453289_at', '1457365_at', '1436708_x_at', '1434494_at', '1419588_at', '1433679_at', '1455159_at', '1428982_at', '1446510_at', '1434131_at', '1418066_at', '1435346_at', '1449415_at', '1455384_x_at', '1418817_at', '1442073_at', '1457265_at', '1447361_at', '1418039_at', '1428467_at', '1452224_at', '1417538_at', '1434529_x_at', '1442149_at', '1437379_x_at', '1416473_a_at', '1432750_at', '1428389_s_at', '1433823_at', '1451889_at', '1438178_x_at', '1441807_s_at', '1416799_at', '1420623_x_at', '1453245_at', '1434037_s_at', '1443012_at', '1443172_at', '1455321_at', 
'1438396_at', '1440823_x_at', '1436278_at', '1457543_at', '1452908_at', '1417483_at', '1418397_at', '1446589_at', '1450966_at', '1447877_x_at', '1446524_at', '1438592_at', '1455589_at', '1428629_at', '1429585_s_at', '1440020_at', '1417365_a_at', '1426442_at', '1427151_at', '1437377_a_at', '1433995_s_at', '1435464_at', '1417007_a_at', '1429690_at', '1427999_at', '1426819_at', '1454905_at', '1439516_at', '1434509_at', '1428707_at', '1416793_at', '1440822_x_at', '1437327_x_at', '1428682_at', '1435004_at', '1434238_at', '1417581_at', '1434699_at', '1455597_at', '1458613_at', '1456485_at', '1435122_x_at', '1452864_at', '1453122_at', '1435254_at', '1451221_at', '1460168_at', '1455336_at', '1427965_at', '1432576_at', '1455425_at', '1428762_at', '1455459_at', '1419317_x_at', '1434691_at', '1437950_at', '1426401_at', '1457261_at', '1433824_x_at', '1435235_at', '1437343_x_at', '1439964_at', '1444280_at', '1455434_a_at', '1424431_at', '1421519_a_at', '1428412_at', '1434010_at', '1419976_s_at', '1418887_a_at', '1428498_at', '1446883_at', '1435675_at', '1422599_s_at', '1457410_at', '1444437_at', '1421050_at', '1437885_at', '1459754_x_at', '1423807_a_at', '1435490_at', '1426760_at', '1449459_s_at', '1432098_a_at', '1437067_at', '1435574_at', '1433999_at', '1431289_at', '1428919_at', '1425678_a_at', '1434924_at', '1421640_a_at', '1440191_s_at', '1460082_at', '1449913_at', '1439830_at', '1425020_at', '1443790_x_at', '1436931_at', '1454214_a_at', '1455854_a_at', '1437061_at', '1436125_at', '1426385_x_at', '1431893_a_at', '1417140_a_at', '1435333_at', '1427907_at', '1434446_at', '1417594_at', '1426518_at', '1437345_a_at', '1420091_s_at', '1450058_at', '1435161_at', '1430348_at', '1455778_at', '1422653_at', '1447942_x_at', '1434843_at', '1454956_at', '1454998_at', '1427384_at', '1439828_at') AND + Species.Name = 'mouse' AND + ProbeSetXRef.ProbeSetFreezeId IN ( + SELECT ProbeSetFreeze.Id + FROM ProbeSetFreeze WHERE ProbeSetFreeze.Name = 'HC_M2_0606_P') + +Connection ID (thread ID): 41 
+Status: NOT_KILLED + +Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off + +The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains +information that should help you find out what is causing the crash. +Writing a core file... 
+Working directory at /export/mysql/var/lib/mysql +Resource Limits: +Limit Soft Limit Hard Limit Units +Max cpu time unlimited unlimited seconds +Max file size unlimited unlimited bytes +Max data size unlimited unlimited bytes +Max stack size 8388608 unlimited bytes +Max core file size 0 unlimited bytes +Max resident set unlimited unlimited bytes +Max processes 3094157 3094157 processes +Max open files 64000 64000 files +Max locked memory 65536 65536 bytes +Max address space unlimited unlimited bytes +Max file locks unlimited unlimited locks +Max pending signals 3094157 3094157 signals +Max msgqueue size 819200 819200 bytes +Max nice priority 0 0 +Max realtime priority 0 0 +Max realtime timeout unlimited unlimited us +Core pattern: core + +Kernel version: Linux version 5.10.0-22-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.178-3 (2023-04-22) + +2025-02-20 7:50:17 0 [Note] Starting MariaDB 10.5.23-MariaDB-0+deb11u1-log source revision 6cfd2ba397b0ca689d8ff1bdb9fc4a4dc516a5eb as process 3086167 +2025-02-20 7:50:17 0 [Note] InnoDB: !!! innodb_force_recovery is set to 1 !!! +2025-02-20 7:50:17 0 [Note] InnoDB: Uses event mutexes +2025-02-20 7:50:17 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 +2025-02-20 7:50:17 0 [Note] InnoDB: Number of pools: 1 +2025-02-20 7:50:17 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions +2025-02-20 7:50:17 0 [Note] InnoDB: Using Linux native AIO +2025-02-20 7:50:17 0 [Note] InnoDB: Initializing buffer pool, total size = 17179869184, chunk size = 134217728 +2025-02-20 7:50:17 0 [Note] InnoDB: Completed initialization of buffer pool +2025-02-20 7:50:17 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1537379110991,1537379110991 +2025-02-20 7:50:17 0 [Note] InnoDB: Last binlog file '/var/lib/mysql/gn0-binary-log.000134', position 82843148 +2025-02-20 7:50:17 0 [Note] InnoDB: 128 rollback segments are active. 
+2025-02-20 7:50:17 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" +2025-02-20 7:50:17 0 [Note] InnoDB: Creating shared tablespace for temporary tables +2025-02-20 7:50:17 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... +2025-02-20 7:50:17 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. +2025-02-20 7:50:17 0 [Note] InnoDB: 10.5.23 started; log sequence number 1537379111003; transaction id 3459549902 +2025-02-20 7:50:17 0 [Note] Plugin 'FEEDBACK' is disabled. +2025-02-20 7:50:17 0 [Note] InnoDB: Loading buffer pool(s) from /export/mysql/var/lib/mysql/ib_buffer_pool +2025-02-20 7:50:17 0 [Note] Loaded 'locales.so' with offset 0x7f9551bc0000 +2025-02-20 7:50:17 0 [Note] Recovering after a crash using /var/lib/mysql/gn0-binary-log +2025-02-20 7:50:17 0 [Note] Starting crash recovery... +2025-02-20 7:50:17 0 [Note] Crash recovery finished. +2025-02-20 7:50:17 0 [Note] Server socket created on IP: '0.0.0.0'. +2025-02-20 7:50:17 0 [Warning] 'user' entry 'webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:50:17 0 [Warning] 'db' entry 'db_webqtl webqtlout@tux01' ignored in --skip-name-resolve mode. +2025-02-20 7:50:17 0 [Note] Reading of all Master_info entries succeeded +2025-02-20 7:50:17 0 [Note] Added new Master_info '' to hash table +2025-02-20 7:50:17 0 [Note] /usr/sbin/mariadbd: ready for connections. 
+Version: '10.5.23-MariaDB-0+deb11u1-log' socket: '/run/mysqld/mysqld.sock' port: 3306 Debian 11
+2025-02-20 7:50:17 4 [Warning] Access denied for user 'root'@'localhost' (using password: NO)
+2025-02-20 7:50:17 5 [Warning] Access denied for user 'root'@'localhost' (using password: NO)
+2025-02-20 7:50:17 0 [Note] InnoDB: Buffer pool(s) load completed at 250220 7:50:17
+```
+
+A possible issue is the use of the environment variable SQL_URI at this point:
+
+=> https://github.com/genenetwork/genenetwork2/blob/testing/gn2/wqflask/correlation/rust_correlation.py#L34
+
+which is requested
+
+=> https://github.com/genenetwork/genenetwork2/blob/testing/gn2/wqflask/correlation/rust_correlation.py#L7 from here.
+
+I tried setting an environment variable "SQL_URI" with the same value as the config and rebuilt the container. That did not fix the problem.
+
+Running the query directly in the default mysql client also fails with:
+
+```
+ERROR 2013 (HY000): Lost connection to MySQL server during query
+```
+
+Huh, so this was not a code problem.
+
+Configured the database to allow upgrade of tables if necessary and restarted mariadbd.
+
+The problem still persists.
+
+Note Pjotr: this is likely a mariadb bug with 10.5.23, the most recent mariadbd we use (both tux01 and tux02 run older versions). The dump shows it balks on creating a new thread: pthread_create.c:478. Looks similar to https://jira.mariadb.org/browse/MDEV-32262
+
+10.5, 10.6 and 10.11 are affected. So running correlations on production crashes mysqld? I am not trying that for obvious reasons ;) The threading issues of mariadb look scary - I wonder how deep they go.
+
+We'll test a different version of mariadb, combined with a Debian update, because Debian on tux04 is broken.
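A fail-fast guard on SQL_URI in the container startup script would have ruled the missing-variable theory in or out immediately. A sketch (the URI value below is a placeholder, not a production credential):

```shell
# Abort early when SQL_URI is unset or empty, before any correlation code runs.
export SQL_URI="mysql://webqtl:placeholder@127.0.0.1:3306/db_webqtl"  # placeholder value

# ${VAR:?msg} exits the script with 'msg' on stderr when VAR is unset or empty.
: "${SQL_URI:?SQL_URI must be set before starting GN2 correlations}"

# Log only the scheme, never the credentials.
echo "SQL_URI scheme: ${SQL_URI%%://*}"   # prints: SQL_URI scheme: mysql
```

With the guard in place, a missing variable shows up as an immediate startup failure instead of a query that dies mid-flight.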
diff --git a/issues/quality-control/fix-flash-messages.gmi b/issues/quality-control/fix-flash-messages.gmi index da54c52..e65c0f6 100644 --- a/issues/quality-control/fix-flash-messages.gmi +++ b/issues/quality-control/fix-flash-messages.gmi @@ -5,7 +5,7 @@ * assigned: fredm * priority: low * type: bug -* status: open +* status: closed, completed, fixed * keywords: flask, flash ## Description diff --git a/issues/quality-control/qc-r-qtl2-bundles.gmi b/issues/quality-control/qc-r-qtl2-bundles.gmi index 9cc1452..6560594 100644 --- a/issues/quality-control/qc-r-qtl2-bundles.gmi +++ b/issues/quality-control/qc-r-qtl2-bundles.gmi @@ -3,7 +3,7 @@ ## Tags * assigned: fredm, acenteno -* status: open +* status: closed, completed * type: feature request * priority: medium * keywords: quality control, QC, R/qtl2 bundle diff --git a/issues/rdf/automate-rdf-generation-and-ingress.gmi b/issues/rdf/automate-rdf-generation-and-ingress.gmi new file mode 100644 index 0000000..ef4ba9f --- /dev/null +++ b/issues/rdf/automate-rdf-generation-and-ingress.gmi @@ -0,0 +1,37 @@ +# Update RDF Generation and Ingress to Virtuoso + +## Tags + +* assigned: bonfacem +* priority: high +* tags: in-progress +* deadline: 2024-10-23 Wed + +We need to update Virtuoso in production. At the moment this is done manually. 
For the current set-up, we need to update the recently modified RIF+WIKI models:
+
+
+```
+# Generate the RDF triples
+time guix shell guile-dbi guile-hashing -m manifest.scm -- ./pre-inst-env ./examples/generif.scm --settings conf.scm --output /home/bonfacem/ttl-files/generif-metadata-new.ttl --documentation ./docs/generif-metadata.md
+
+# Make sure they are valid
+guix shell -m manifest.scm -- rapper --input turtle --count /home/bonfacem/ttl-files/generif-metadata-new.ttl
+
+# Copy the files over to the exposed virtuoso path
+cp /home/bonfacem/ttl-files/generif-metadata-new.ttl </some/dir/>
+
+# Get into Virtuoso (with a password)
+guix shell virtuoso-ose -- isql <port-number>
+
+# Register the files to be loaded
+# Assuming that '/var/lib/data' is where the files are
+ld_dir('/var/lib/data', 'generif-metadata-new.ttl', 'http://genenetwork.org');
+
+# Load the files
+rdf_loader_run();
+CHECKPOINT;
+```
+
+The above steps should be automated and tested in CD before roll-out in production. Key considerations:
+
+- Pick up the latest changes from git, so that we can choose which files to regenerate instead of generating all the ttl files every time.
diff --git a/issues/rdf/hash-rdf-graph.gmi b/issues/rdf/hash-rdf-graph.gmi
index c896218..2863108 100644
--- a/issues/rdf/hash-rdf-graph.gmi
+++ b/issues/rdf/hash-rdf-graph.gmi
@@ -5,3 +5,12 @@
 ## Description
 Building the index is an expensive operation. Hash the graph and store the metadata in xapian, and similarly in the RDF store. The mcron-job should check whether this has changed, and if there's any difference, go ahead and re-build the index.
+
+Resolution:
+
+=> https://github.com/genenetwork/genenetwork3/pull/171 Improve Sharing Memory Across Processes.
+=> https://github.com/genenetwork/genenetwork3/pull/172 Check whether table names were stored in xapian.
+=> https://github.com/genenetwork/genenetwork3/pull/174 Wikidata index.
+=> https://github.com/genenetwork/genenetwork3/pull/175 Refactor how the generif md5 sum is calculated and stored in XAPIAN. + +* closed diff --git a/issues/redesign-global-search-design.gmi b/issues/redesign-global-search-design.gmi new file mode 100644 index 0000000..df63791 --- /dev/null +++ b/issues/redesign-global-search-design.gmi @@ -0,0 +1,23 @@ +# Redesign Global Search Design + +## Tags +* assigned: alexm, zac +* keywords: global search, design, HTML +* type: enhancement +* status: closed, completed, done + +## Description +Rob suggested we model the global search on the NCBI PubMed interface. We should remove the `?` button, which seems to be confusing for users, and have a better user guide. + +## Tasks + +* [x] Redesign the global search to fit the NCBI PubMed model. +* [x] Replace the "?" button that acts as a user guide + +## Related issues: + +=> https://issues.genenetwork.org/issues/cleanup-base-file-gn2 + +## Notes +PR that seeks to address this issue: +=> https://github.com/genenetwork/genenetwork2/pull/880
\ No newline at end of file diff --git a/issues/remove-custom-bootstrap-css.gmi b/issues/remove-custom-bootstrap-css.gmi index 7fa6f24..14c1c35 100644 --- a/issues/remove-custom-bootstrap-css.gmi +++ b/issues/remove-custom-bootstrap-css.gmi @@ -1,7 +1,7 @@ # Remove overrides to bootstrap classes in bootstrap-custom.css * assigned: zachs, bonfacem, alexm - +* status: stalled We have a "bootstrap-custom.css" in GeneNetwork. Consider this snippet: diff --git a/issues/remove-references-to-old-gn-auth-code.gmi b/issues/remove-references-to-old-gn-auth-code.gmi index 1a03c25..8c110aa 100644 --- a/issues/remove-references-to-old-gn-auth-code.gmi +++ b/issues/remove-references-to-old-gn-auth-code.gmi @@ -4,7 +4,7 @@ * assigned: bonfacem * keywords: auth -* status: open +* status: stalled ## Description diff --git a/issues/replace-neo4j-with-virtuoso.gmi b/issues/replace-neo4j-with-virtuoso.gmi new file mode 100644 index 0000000..450fb70 --- /dev/null +++ b/issues/replace-neo4j-with-virtuoso.gmi @@ -0,0 +1,8 @@ +# Replace Neo4J with Virtuoso + +## Tags + +* assigned: bonfacem, soloshelby +* deadline: 2024-10-25 Fri + +Currently, the RAG ingests TTL files into Neo4J. Replace this with Virtuoso. diff --git a/issues/reset-password-on-container-rebuild.gmi b/issues/reset-password-on-container-rebuild.gmi index b0e4dbb..6c0ad1e 100644 --- a/issues/reset-password-on-container-rebuild.gmi +++ b/issues/reset-password-on-container-rebuild.gmi @@ -2,5 +2,6 @@ ## Tags * assigned: bonfacem +* status: stalled Whenever the virtuoso container is rebuilt, we manually have to reset the password. We should fix this by modifying the virtuoso service so that things are set automatically. 
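Until the virtuoso service type handles this automatically, the manual reset can at least be scripted. A sketch, assuming the rebuilt container comes up with Virtuoso's stock dba/dba login and relying on the built-in user_set_password function (both assumptions to verify); the port and NEW_PASSWORD are placeholders:

```shell
# Write the password-reset statement to a file that isql can execute in one shot.
cat > /tmp/reset-pass.sql <<'EOF'
user_set_password ('dba', 'NEW_PASSWORD');
EOF

# Against a live Virtuoso this would be run as (not executed here):
#   guix shell virtuoso-ose -- isql 7892 dba dba /tmp/reset-pass.sql
cat /tmp/reset-pass.sql
```

Hooking such a script into the container's activation step would make the reset automatic on every rebuild.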
diff --git a/issues/search-for-brca.gmi b/issues/search-for-brca.gmi
index c42c745..05c6fd0 100644
--- a/issues/search-for-brca.gmi
+++ b/issues/search-for-brca.gmi
@@ -1,10 +1,31 @@
-# Search for brca
+# Search Improvements: capital insensitive search for RIF+WIKI; Examples
-* assigned: arun
+## Tags
-Search for brca does not return results for brca1 and brca2. It should.
-=> https://cd.genenetwork.org/gsearch?type=gene&terms=brca
+* assigned: bonfacem, rookie101
+* priority: high
+* type: ops
+* keywords: virtuoso
-The xapian stemmer does not stem brca1 to brca. That's why when one searches for brca, results for brca1 are not returned.
+## Description
+
+RIF search is finally working on production:
+
+> rif:Brca2 and group:BXD
+
+and capital insensitive search too for the BXD. See:
+
+=> https://github.com/genenetwork/genenetwork3/commit/4b2e9f3fb3383421d7a55df5399aab71e0cc3b4f Stem group field regardless of case.
+=> https://github.com/genenetwork/genenetwork3/commit/a37622b466f9f045db06a6f07e88fcf81b176f91 Stem all the time.
+
+## Questions:
+
+* How do we search genewiki data?
+
+* rif:Brca2 should also be RIF:Brca2 (prefer the latter if we have to
+choose as that is what people will try)
+
+* Can we continue giving examples at
+
+=> https://genenetwork.org/search-syntax search syntax
-Perhaps we should write a custom stemmer that stems brca1 to brca. But, at the same time, we should be wary of stemming terms like p450 to p. Pjotr suggests the heuristic that we look for at least 2 or 3 alphabetic characters at the beginning. Another approach is to hard-code a list of candidates to look for.
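The stemming heuristic from the original issue text (strip a trailing digit run only when the term begins with at least a few alphabetic characters, so brca1 stems to brca while p450 stays intact) can be sketched as:

```shell
# Heuristic stemmer sketch: require at least 3 leading letters
# before stripping a trailing run of digits.
stem () {
  printf '%s\n' "$1" | sed -E 's/^([A-Za-z]{3,})[0-9]+$/\1/'
}

stem brca1   # prints: brca
stem brca2   # prints: brca
stem p450    # prints: p450 (only one leading letter, left alone)
```

A real implementation would live next to the xapian indexer so that indexing and query-time search agree on the stemmed form.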
diff --git a/issues/set-up-gn-guile-in-tux02.gmi b/issues/set-up-gn-guile-in-tux02.gmi new file mode 100644 index 0000000..29eca68 --- /dev/null +++ b/issues/set-up-gn-guile-in-tux02.gmi @@ -0,0 +1,15 @@ +# Set Up gn-guile in tux02 + +## Tags + +* assigned: bonfacem +* priority: high +* status: in-progress +* deadline: 2024-10-23 Wed + +## Tasks + +* [-] Create gn-guile container. +* [X] Merge gn2 UI PR. +=> https://github.com/genenetwork/genenetwork2/pull/854 Feature/gn editor UI +* [-] Test out auth editing in CD. diff --git a/issues/set-up-virtuoso-on-production.gmi b/issues/set-up-virtuoso-on-production.gmi index 88c04f7..614565a 100644 --- a/issues/set-up-virtuoso-on-production.gmi +++ b/issues/set-up-virtuoso-on-production.gmi @@ -1,8 +1,8 @@ -# Set-up Virtuoso on Production +# Set-up Virtuoso+Xapian on Production ## Tags -* assigned: bonfacem +* assigned: bonfacem, zachs, fredm * priority: high * type: ops * keywords: virtuoso @@ -11,5 +11,121 @@ We already have virtuoso set-up in tux02. Right now, to be able to interact with RDF, we need to have virtuoso set-up. 
This issue will unblock:
+* Global Search in Production
+ => https://github.com/genenetwork/genenetwork3/pull/137 Update RDF endpoints
+ => https://github.com/genenetwork/genenetwork2/pull/808 UI/RDF frontend
+
+
+## HOWTO: Updating Virtuoso in Production (Tux01)
+
+
+Note where the virtuoso data directory is mapped from the "production.sh" script, as you will use this in the subsequent steps:
+
+> --share=/export2/guix-containers/genenetwork/var/lib/virtuoso=/var/lib/virtuoso
+
+### Generating the TTL Files
+
+=> https://git.genenetwork.org/gn-transform-databases/tree/generate-ttl-files.scm Run "generate-ttl-files" to generate the TTL files:
+
+```
+time guix shell guile-dbi -m manifest.scm -- \
+./generate-ttl-files.scm --settings conn-dev.scm --output \
+/export2/guix-containers/genenetwork-development/var/lib/virtuoso \
+--documentation /tmp/doc-directory
+```
+
+* [Recommended] Alternatively, copy over the TTL files (in Tux01) to the correct shared directory in the container:
+
+```
+cp /home/bonfacem/ttl-files/*ttl /export2/guix-containers/genenetwork/var/lib/virtuoso/
+```
+
+### Loading the TTL Files
+
+* Make sure that the virtuoso service type has the "dirs-allowed" variable set correctly:
+
+```
+(service virtuoso-service-type
+         (virtuoso-configuration
+          (server-port 7892)
+          (http-server-port 7893)
+          (dirs-allowed "/var/lib/virtuoso")))
+```
+
+* Get into isql:
+
+```
+guix shell virtuoso-ose -- isql 7892
+```
+* Make sure that no pre-existing TTL files exist in "DB.DBA.LOAD_LIST":
+
+```
+SQL> select * from DB.DBA.LOAD_LIST;
+SQL> delete from DB.DBA.load_list;
+```
+* Delete the genenetwork graph:
+
+```
+SQL> DELETE FROM rdf_quad WHERE g = iri_to_id('http://genenetwork.org');
+```
+
+* Load all the TTL files (this takes some time):
+
+```
+SQL> ld_dir('/var/lib/virtuoso', '*.ttl', 'http://genenetwork.org');
+SQL> rdf_loader_run();
+SQL> CHECKPOINT;
+SQL> checkpoint_interval(60);
+SQL> scheduler_interval(10);
+```
+* Verify you have some RDF data by
running:
+
+```
+SQL> SPARQL
+PREFIX gn: <http://genenetwork.org/id/>
+PREFIX gnc: <http://genenetwork.org/category/>
+PREFIX owl: <http://www.w3.org/2002/07/owl#>
+PREFIX gnt: <http://genenetwork.org/term/>
+PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
+PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
+PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
+
+SELECT * WHERE {
+  ?s skos:member gn:Mus_musculus .
+  ?s ?p ?o .
+};
+```
+
+* Update GN3 configurations to point to the correct Virtuoso instance:
+
+> SPARQL_ENDPOINT="http://localhost:7893/sparql"
+
+## HOWTO: Generating the Xapian Index
+
+* Make sure you are using the correct guix profile or that you have "PYTHONPATH" pointing to the GN3 repository.
+
+* Generate the Xapian index with the "create-xapian-index" sub-command of "genenetwork3/scripts/index-genenetwork", against the correct output directory (the build takes around 71 minutes on an SSD drive):
+
+```
+time python index-genenetwork create-xapian-index \
+/export/data/genenetwork-xapian/ \
+mysql://<user>:<password>@localhost/db_webqtl \
+http://localhost:7893/sparql
+```
+* After the build, you can verify that the index works with:
+
+```
+guix shell xapian -- xapian-delve /export/data/genenetwork-xapian/
+```
+* Update GN3 configuration files to point to the right Xapian path:
+
+> XAPIAN_DB_PATH="/export/data/genenetwork-xapian/"
+
+## Resolution
+
+@fredm updated virtuoso and @zachs updated the xapian index in production.
+
+* closed
diff --git a/issues/systems/apps.gmi b/issues/systems/apps.gmi
new file mode 100644
index 0000000..b9d4155
--- /dev/null
+++ b/issues/systems/apps.gmi
@@ -0,0 +1,207 @@
+# Apps
+
+GeneNetwork.org hosts a number of apps. Currently they are managed by shepherd as `guix shell` services, but we should really move them to system containers.
+ +# Tags + +* assigned: pjotrp +* type: enhancement +* status: in progress +* priority: medium +* keywords: system, sheepdog, shepherd + +# Tasks + +* [ ] Get services running +* [ ] Move guix shell into containers +* [ ] Make sure the container starts up on reboot and/or migrate to a new host + +# List of apps + +Current apps managed by shepherd/systemd on tux02/balg01 are + +=> https://genecup.org/ +* [+] genecup [shell] (hao) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [X] - add banner for GeneNetwork +* [ ] - create system container +* [X] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://bnw.genenetwork.org/ +* [+] bnw [container] (yan cui and rob) +* [X] - fire up service +* [X] - add sheepdog monitor +* [X] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - update system container +* [X] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> http://hrdp.genenetwork.org +* [+] hrdp-project (hao?) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - https +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://pluto.genenetwork.org/ +* [+] pluto (saunak) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://power.genenetwork.org/ +* [+] power app (dave) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [X] - create guix root +* [ ] - make sure it works on reboot (systemd) +* [ ] root? 
+=> http://longevity-explorer.genenetwork.org/ +* [+] Longevity explorer [container balg01] (dave) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - https +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> http://jumpshiny.genenetwork.org/ +* [+] jumpshiny app (xusheng) +* [+] - fire up service (still some dependencies) +* [X] - add sheepdog monitor +* [ ] - https +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [ ] - make sure it works on reboot (systemd) +=> https://hegp.genenetwork.org/ +* [+] hegp (pjotr) +* [X] - fire up service +* [X] - add sheepdog monitor +* [ ] - add link in GN2 +* [ ] - add banner for GeneNetwork +* [ ] - create system container +* [ ] - create guix root +* [X] - make sure it works on reboot (systemd) + +* [-] singlecell (siamak) +* [-] rn6app (hao - remove) +* [-] genome-browser (defunct) + +To fix them we need to validate the sheepdog monitor and make sure they are working in either shepherd (+), or as a system container (X). + +Sheepdog monitor is at + +=> http://sheepdog.genenetwork.org/sheepdog/status.html + +# Info + +## BNW + +The app is already a Guix system container! 
To make it part of the startup I had to move it away from shepherd (which runs in userland) and:
+
+```
+/home/shepherd/guix-profiles/bnw/bin/guix system container /home/shepherd/guix-bioinformatics/gn/services/bnw-container.scm --share=/home/shepherd/logs/bnw-server=/var/log --network
+ln -s /gnu/store/0hnfb9ynnxsig3yyprwxmg5h6c9g8mry-run-container /usr/local/bin/bnw-app-container
+```
+
+systemd service:
+
+```
+root@tux02:/etc/systemd/system# cat bnw-app-container.service
+[Unit]
+Description = Run genenetwork BNW app container
+[Service]
+ExecStart = /usr/local/bin/bnw-app-container
+[Install]
+WantedBy = multi-user.target
+```
+
+To make sure the garbage collector does not destroy the container, add the --root switch:
+
+```
+/home/shepherd/guix-profiles/bnw/bin/guix system container /home/shepherd/guix-bioinformatics/gn/services/bnw-container.scm --share=/home/shepherd/logs/bnw-server=/var/log --network --root=/usr/local/bin/bnw-app-container
+```
+
+Check with
+
+```
+root@tux02:/home/shepherd# /home/shepherd/guix-profiles/bnw/bin/guix gc --list-roots |grep bnw
+ /usr/local/bin/bnw-app-container
+```
+
+## R/shiny apps
+
+The R/shiny apps were showing a tarball mismatch:
+
+```
+building /gnu/store/rjnw7k56z955v4bl07flm9pjwxx5vs0r-r-minimal-4.0.2.drv...
+downloading from http://cran.r-project.org/src/contrib/Archive/KernSmooth/KernSmooth_2.23-17.tar.gz ...
+- 'configure' phase
+sha256 hash mismatch for /gnu/store/n05zjfhxl0iqx1jbw8i6vv1174zkj7ja-KernSmooth_2.23-17.tar.gz:
+  expected hash: 11g6b0q67vasxag6v9m4px33qqxpmnx47c73yv1dninv2pz76g9b
+  actual hash:   1ciaycyp79l5aj78gpmwsyx164zi5jc60mh84vxxzq4j7vlcdb5p
+hash mismatch for store item '/gnu/store/n05zjfhxl0iqx1jbw8i6vv1174zkj7ja-KernSmooth_2.23-17.tar.gz'
+```
+
+Guix checks the hash, and it is not great that CRAN allows changing tarballs under the same version number!! Luckily building with a more recent version of Guix just worked (TM).
Now we create a root too:
+
+```
+/home/wrk/opt/guix-pull/bin/guix pull -p ~/guix-profiles/guix-for-r-shiny
+```
+
+Note that I did not have to pull in the guix-bioinformatics channel.
+
+## Singlecell
+
+Singlecell is an R/shiny app. It starts with an error after the above upgrade:
+
+```
+no slot of name "counts" for this object of class
+```
+
+and the code needs to be updated:
+
+=> https://github.com/satijalab/seurat/issues/8804
+
+The 4-year-old code lives at
+
+=> https://github.com/genenetwork/singleCellRshiny
+
+and it looks like lines like these need to be updated:
+
+=> https://github.com/genenetwork/singleCellRshiny/blob/6b2a344dd0d02f65228ad8c350bac0ced5850d05/app.R#L167
+
+Let me ask the author Siamak Yousefi. I think we'll drop it.
+
+## longevity
+
+Package definition is at
+
+=> https://git.genenetwork.org/guix-bioinformatics/tree/gn/packages/mouse-longevity.scm
+
+Container is at
+
+=> https://git.genenetwork.org/guix-bioinformatics/tree/gn/services/bxd-power-container.scm
+
+## jumpshiny
+
+Jumpshiny is hosted on balg01. Scripts are in tux02 git.
+
+```
+root@balg01:/home/j*/gn-machines# . /usr/local/guix-profiles/guix-pull/etc/profile
+guix system container --network -L . -L ../guix-forge/guix/ -L ../guix-bioinformatics/ -L ../guix-past/modules/ --substitute-urls='https://ci.guix.gnu.org https://bordeaux.guix.gnu.org https://cuirass.genenetwork.org' test-r-container.scm -L ../guix-forge/guix/
+/gnu/store/xyks73sf6pk78rvrwf45ik181v0zw8rx-run-container
+/gnu/store/6y65x5jk3lxy4yckssnl32yayjx9nwl5-run-container
+```
diff --git a/issues/systems/fallbacks-and-backups.gmi b/issues/systems/fallbacks-and-backups.gmi
index 9b890c7..53bd8fa 100644
--- a/issues/systems/fallbacks-and-backups.gmi
+++ b/issues/systems/fallbacks-and-backups.gmi
@@ -1,6 +1,12 @@
 # Fallbacks and backups
-As a hurricane is barreling towards our machine room in Memphis we are checking our fallbacks and backups for GeneNetwork.
For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which earns itself (if it hadn't been down for months, but that is a different story). +A revisit to previous work on backups etc. The sheepdog hosts are no longer responding and we should really run sheepdog on a machine that is not physically with the other machines. In time sheepdog should also move away from redis and run in a system container, but that is for later. I did most of the work late 2021 when I wrote: + +> As a hurricane is barreling towards our machine room in Memphis we are checking our fallbacks and backups for GeneNetwork. For years we have been making backups on Amazon - both S3 and a running virtual machine. The latter was expensive, so I replaced it with a bare metal server which earns itself (if it hadn't been down for months, but that is a different story). + +As we are introducing an external sheepdog server we may give it a DNS entry as sheepdog.genenetwork.org. + +=> http://sheepdog.genenetwork.org/sheepdog/index.html See also @@ -16,13 +22,15 @@ See also ## Tasks -* [.] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services -* [X] /etc /home/shepherd backups for Octopus -* [X] /etc /home/shepherd backups for P2 -* [X] Get backups running again on fallback -* [ ] fix redis queue for P2 - needs to be on rabbit +* [X] fix redis queue and sheepdog server +* [X] check backups on tux01 +* [ ] drop tux02 backups off-site +* [ ] backup ratspub, r/shiny, bnw, covid19, hegp, pluto services +* [ ] /etc /home/shepherd backups for Octopus +* [ ] /etc /home/shepherd /home/git CI-CD GN-QA backups on Tux02 +* [ ] Get backups running again on fallback * [ ] fix bacchus large backups -* [ ] backup octopus01:/lizardfs/backup-pangenome on bacchus +* [ ] mount bacchus on HPC ## Backup and restore @@ -52,22 +60,21 @@ Recently epysode was reinstated after hardware failure. 
I took the opportunity t As epysode was one of the main sheepdog messaging servers I need to reinstate: * [X] scripts for sheepdog -* [X] enable trim -* [X] reinstate monitoring web services -* [X] reinstate daily backup from penguin2 -* [X] CRON -* [X] make sure messaging works through redis -* [X] fix and propagate GN1 backup -* [X] fix and propagate IPFS and gitea backups -* [X] add GN1 backup -* [X] add IPFS backup -* [X] other backups +* [ ] Check tunnel on tux01 is reinstated +* [ ] enable trim +* [ ] reinstate monitoring web services +* [ ] reinstate daily backups +* [ ] CRON +* [ ] make sure messaging works through redis +* [ ] fix and propagate GN1 backup +* [ ] fix and propagate fileserver and git backups +* [ ] add GN1 backup +* [ ] other backups * [ ] email on fail Tux01 is backed up now. Need to make sure it propagates to -* [X] P2 -* [X] epysode -* [X] rabbit -* [X] Tux02 +* [ ] rabbit +* [ ] Tux02 +* [ ] balg01 * [ ] bacchus diff --git a/issues/systems/machine-room.gmi b/issues/systems/machine-room.gmi deleted file mode 100644 index 28d9921..0000000 --- a/issues/systems/machine-room.gmi +++ /dev/null @@ -1,19 +0,0 @@ -# Machine room - -## Tags - -* assign: pjotrp, dana -* type: system administration -* priority: high -* keywords: systems -* status: unclear - -## Tasks - -* [X] Make tux02e visible from outside -* [ ] Network switch 10Gbs - add hosts -* [ ] Add disks to tux01 and tux02 - need to reboot -* [ ] Set up E-mail relay for tux01 and tux02 smtp.uthsc.edu, port 25 - -=> tux02-production.gmi setup new production machine -=> decommission-machines.gmi Decommission machines diff --git a/issues/systems/octoraid-storage.gmi b/issues/systems/octoraid-storage.gmi new file mode 100644 index 0000000..97e0e55 --- /dev/null +++ b/issues/systems/octoraid-storage.gmi @@ -0,0 +1,18 @@ +# OctoRAID + +We are building machines that can handle cheap drives. 
+
+# octoraid01
+
+This is a Jetson with four 22TB Seagate IronWolf Pro (ST22000NT001) enterprise NAS drives (7200 rpm).
+
+Unfortunately the stock kernel has no RAID support, so we simply mount the 4 drives (hosted on a USB-SATA bridge).
+
+Stress testing:
+
+```
+cd /export/nfs/lair01
+stress -v -d 1
+```
+
+Running on multiple disks, the Jetson is holding up well!
diff --git a/issues/systems/penguin2-raid5.gmi b/issues/systems/penguin2-raid5.gmi
new file mode 100644
index 0000000..f03075d
--- /dev/null
+++ b/issues/systems/penguin2-raid5.gmi
@@ -0,0 +1,61 @@
+# Penguin2 RAID 5
+
+# Tags
+
+* assigned: @fredm, @pjotrp
+* status: in progress
+
+# Description
+
+The current RAID contains 3 disks:
+
+```
+root@penguin2:~# cat /proc/mdstat
+md0 : active raid5 sdb1[1] sda1[0] sdg1[4]
+/dev/md0 33T 27T 4.2T 87% /export
+```
+
+using /dev/sda,sdb,sdg
+
+The current root and swap are on
+
+```
+# root
+/dev/sdd1 393G 121G 252G 33% /
+# swap
+/dev/sdd5 partition 976M 76.5M -2
+```
+
+We can therefore add four new disks in slots /dev/sdc,sde,sdf,sdh
+
+penguin2 has no out-of-band and no serial connector right now. That means any work needs to be done on the terminal.
+
+Boot loader menu:
+
+```
+menuentry 'Debian GNU/Linux' --class debian --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-7ff268df-cb90-4cbc-9d76-7fd6677b4964' {
+  load_video
+  insmod gzio
+  if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
+  insmod part_msdos
+  insmod ext2
+  set root='hd2,msdos1'
+  if [ x$feature_platform_search_hint = xy ]; then
+    search --no-floppy --fs-uuid --set=root --hint-bios=hd2,msdos1 --hint-efi=hd2,msdos1 --hint-baremetal=ahci2,msdos1 7ff268df-cb90-4cbc-9d76-7fd6677b4964
+  else
+    search --no-floppy --fs-uuid --set=root 7ff268df-cb90-4cbc-9d76-7fd6677b4964
+  fi
+  echo 'Loading Linux 5.10.0-18-amd64 ...'
+  linux /boot/vmlinuz-5.10.0-18-amd64 root=UUID=7ff268df-cb90-4cbc-9d76-7fd6677b4964 ro quiet
+  echo 'Loading initial ramdisk ...'
+ initrd /boot/initrd.img-5.10.0-18-amd64 +} +``` + +Added to sdd MBR + +``` +root@penguin2:~# grub-install /dev/sdd +Installing for i386-pc platform. +Installation finished. No error reported. +``` diff --git a/issues/systems/tux02-production.gmi b/issues/systems/tux02-production.gmi index 7de911f..d811c5e 100644 --- a/issues/systems/tux02-production.gmi +++ b/issues/systems/tux02-production.gmi @@ -14,9 +14,9 @@ We are going to move production to tux02 - tux01 will be the staging machine. Th * [X] update guix guix-1.3.0-9.f743f20 * [X] set up nginx (Debian) -* [X] test ipmi console (172.23.30.40) +* [X] test ipmi console * [X] test ports (nginx) -* [?] set up network for external tux02e.uthsc.edu (128.169.4.52) +* [?] set up network for external tux02 * [X] set up deployment evironment * [X] sheepdog copy database backup from tux01 on a daily basis using ibackup user * [X] same for GN2 production environment diff --git a/issues/systems/tux04-disk-issues.gmi b/issues/systems/tux04-disk-issues.gmi index cea5a59..bc6e1db 100644 --- a/issues/systems/tux04-disk-issues.gmi +++ b/issues/systems/tux04-disk-issues.gmi @@ -1,4 +1,4 @@ -# Tux04 disk issues +# Tux04/Tux05 disk issues We are facing some disk issues with Tux04: @@ -6,6 +6,10 @@ We are facing some disk issues with Tux04: May 02 20:57:42 tux04 kernel: Buffer I/O error on device sdf1, logical block 859240457 ``` +and the same happened to tux05 (same batch). Basically the controllers report no issues. Just to be sure we added +a copy of the boot partition. + +=> topics/system/linux/add-boot-partition # Tags @@ -52,6 +56,8 @@ Download megacli from => https://hwraid.le-vert.net/wiki/DebianPackages ``` +apt-get update +apt-get install megacli megacli -LDInfo -L5 -a0 ``` @@ -95,3 +101,280 @@ and nothing ;). 
Megacli is actually the tool to use
```
megacli -AdpAllInfo -aAll
```
+
+# Database
+
+During a backup the DB shows this error:
+
+```
+2025-03-02 06:28:33 Database page corruption detected at page 1079428, retrying...\n[01] 2025-03-02 06:29:33 Database page corruption detected at page 1103108, retrying...
+```
+
+Interestingly the DB recovered on a second backup.
+
+The database is hosted on a solid-state drive, /dev/sde (Dell Ent NVMe FI). The log says
+
+```
+kernel: I/O error, dev sde, sector 2136655448 op 0x0:(READ) flags 0x80700 phys_seg 40 prio class 2
+```
+
+This suggests:
+
+=> https://stackoverflow.com/questions/50312219/blk-update-request-i-o-error-dev-sda-sector-xxxxxxxxxxx
+
+> The errors that you see are interface errors, they are not coming from the disk itself but rather from the connection to it. It can be the cable or any of the ports in the connection.
+> Since the CRC errors on the drive do not increase I can only assume that the problem is on the receive side of the machine you use. You should check the cable and try a different SATA port on the server.
+
+and someone wrote
+
+> analyzed that most of the reasons are caused by intensive reading and writing. This is a CDN cache node. Type reading NVME temperature is relatively high, if it continues, it will start to throttle and then slowly collapse.
+
+and the temperature on that drive has been 70 C.
+
+The MariaDB log is showing errors:
+
+```
+2025-03-02 6:54:47 0 [ERROR] InnoDB: Failed to read page 449925 from file './db_webqtl/SnpAll.ibd': Page read from tablespace is corrupted.
+2025-03-02 7:01:43 489015 [ERROR] Got error 180 when reading table './db_webqtl/ProbeSetXRef'
+2025-03-02 8:10:32 489143 [ERROR] Got error 180 when reading table './db_webqtl/ProbeSetXRef'
+```
+
+Let's try and dump those tables when the backup is done.
+ +``` +mariadb-dump -uwebqtlout db_webqtl SnpAll +mariadb-dump: Error 1030: Got error 1877 "Unknown error 1877" from storage engine InnoDB when dumping table `SnpAll` at row: 0 +mariadb-dump -uwebqtlout db_webqtl ProbeSetXRef > ProbeSetXRef.sql +``` + +Eeep: + +``` +tux04:/etc$ mariadb-check -uwebqtlout -c db_webqtl ProbeSetXRef +db_webqtl.ProbeSetXRef +Warning : InnoDB: Index ProbeSetFreezeId is marked as corrupted +Warning : InnoDB: Index ProbeSetId is marked as corrupted +error : Corrupt +tux04:/etc$ mariadb-check -uwebqtlout -c db_webqtl SnpAll +db_webqtl.SnpAll +Warning : InnoDB: Index PRIMARY is marked as corrupted +Warning : InnoDB: Index SnpName is marked as corrupted +Warning : InnoDB: Index Rs is marked as corrupted +Warning : InnoDB: Index Position is marked as corrupted +Warning : InnoDB: Index Source is marked as corrupted +error : Corrupt +``` + +On tux01 we have a working database, we can test with + +``` +mysqldump --no-data --all-databases > table_schema.sql +mysqldump -uwebqtlout db_webqtl SnpAll > SnpAll.sql +``` + +Running the backup with rate limiting from: + +``` +Mar 02 17:09:59 tux04 sudo[548058]: pam_unix(sudo:session): session opened for user root(uid=0) by wrk(uid=1000) +Mar 02 17:09:59 tux04 sudo[548058]: wrk : TTY=pts/3 ; PWD=/export3/local/home/wrk/iwrk/deploy/gn-deploy-servers/scripts/tux04 ; USER=roo> +Mar 02 17:09:55 tux04 sudo[548058]: pam_unix(sudo:auth): authentication failure; logname=wrk uid=1000 euid=0 tty=/dev/pts/3 ruser=wrk rhost= > +Mar 02 17:04:26 tux04 su[548006]: pam_unix(su:session): session opened for user ibackup(uid=1003) by wrk(uid=0) +``` + +Oh oh + +Tux04 is showing errors on all disks. We have to bail out. I am copying the potentially corrupted files to tux01 right now. We have backups, so nothing serious I hope. 
I am only worried about the myisam files we have because they have no strong internal validation: + +``` +2025-03-04 8:32:45 502 [ERROR] db_webqtl.ProbeSetData: Record-count is not ok; is 5264578601 Should be: 5264580806 +2025-03-04 8:32:45 502 [Warning] db_webqtl.ProbeSetData: Found 28665 deleted space. Should be 0 +2025-03-04 8:32:45 502 [Warning] db_webqtl.ProbeSetData: Found 2205 deleted blocks Should be: 0 +2025-03-04 8:32:45 502 [ERROR] Got an error from thread_id=502, ./storage/myisam/ha_myisam.cc:1120 +2025-03-04 8:32:45 502 [ERROR] MariaDB thread id 502, OS thread handle 139625162532544, query id 837999 localhost webqtlout Checking table +CHECK TABLE ProbeSetData +2025-03-04 8:34:02 79695 [ERROR] mariadbd: Table './db_webqtl/ProbeSetData' is marked as crashed and should be repaired +``` + +See also + +=> https://dev.mysql.com/doc/refman/8.4/en/myisam-check.html + +Tux04 will require open heart 'disk controller' surgery and some severe testing before we move back. We'll also look at tux05-8 to see if they have similar problems. + +## Recovery + +According to the logs tux04 started showing serious errors on March 2nd - when I introduced sanitizing the mariadb backup: + +``` +Mar 02 05:00:42 tux04 kernel: I/O error, dev sde, sector 2071078320 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 2 +Mar 02 05:00:58 tux04 kernel: I/O error, dev sde, sector 2083650928 op 0x0:(READ) flags 0x80700 phys_seg 59 prio class 2 +... +``` + +The log started on Feb 23 when we had our last reboot. It probably is a good idea to turn on persistent logging! Anyway, it is likely files were fine until March 2nd. Similarly the mariadb logs also show + +``` +2025-03-02 6:53:52 489007 [ERROR] mariadbd: Index for table './db_webqtl/ProbeSetData.MYI' is corrupt; try to repair it +2025-03-02 6:53:52 489007 [ERROR] db_webqtl.ProbeSetData: Can't read key from filepos: 2269659136 +``` + +So, if we can restore a backup from March 1st we should be reasonably confident it is sane. 
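Picking the restore point can be scripted rather than eyeballed. A sketch that selects the newest archive no later than March 1st — the archive names here are illustrative, not the real ones; in practice the list would come from `borg list --short` on the backup repository:

```shell
# Illustrative archive names; in practice: borg list $REPO --short > /tmp/archives.txt
cat <<'EOF' > /tmp/archives.txt
tux04-sql-2025-02-27
tux04-sql-2025-02-28
tux04-sql-2025-03-01
tux04-sql-2025-03-02
EOF

# Newest archive at or before the last known-good date; plain string
# comparison works because the date suffix is ISO-formatted
good=$(awk '$0 <= "tux04-sql-2025-03-01"' /tmp/archives.txt | sort | tail -n 1)
echo "$good"
```

The selected name would then feed the actual restore, e.g. `borg extract $REPO::$good` into a scratch directory before moving it into place.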
+ +First is to backup the existing database(!) Next restore the new DB by changing the DB location (symlink in /var/lib/mysql as well as check /etc/mysql/mariadb.cnf). + +When upgrading it is an idea to switch on these in mariadb.cnf + +``` +# forcing recovery with these two lines: +innodb_force_recovery=3 +innodb_purge_threads=0 +``` + +Make sure to disable (and restart) once it is up and running! + +So the steps are: + +* [X] install updated guix version of mariadb in /usr/local/guix-profiles (don't use Debian!!) +* [X] repair borg backup +* [X] Stop old mariadb (on new host tux02) +* [X] backup old mariadb database +* [X] restore 'sane' version of DB from borg March 1st +* [X] point to new DB in /var/lib/mysql and cnf file +* [X] update systemd settings +* [X] start mariadb new version with recovery setting in cnf +* [X] check logs +* [X] once running revert on recovery setting in cnf and restart + +OK, looks like we are in business again. In the next phase we need to validate files. 
Normal files can be checked with + +``` +find -type f \( -not -name "md5sum.txt" \) -exec md5sum '{}' \; > md5sum.txt +``` + +and compared with another set on a different server with + +``` +md5sum -c md5sum.txt +``` + +* [X] check genotype file directory - some MAGIC files missing on tux01 + +gn-docs is a git repo, so that is easily checked + +* [X] check gn-docs and sync with master repo + + +## Other servers + +``` +journalctl -r|grep -i "I/O error"|less +# tux05 +Nov 18 02:19:55 tux05 kernel: XFS (sdc2): metadata I/O error in "xfs_da_read_buf+0xd9/0x130 [xfs]" at daddr 0x78 len 8 error 74 +Nov 05 14:36:32 tux05 kernel: blk_update_request: I/O error, dev sdb, sector 1993616 op 0x1:(WRITE) flags +0x0 phys_seg 35 prio class 0 +Jul 27 11:56:22 tux05 kernel: blk_update_request: I/O error, dev sdc, sector 55676616 op 0x0:(READ) flags +0x80700 phys_seg 26 prio class 0 +Jul 27 11:56:22 tux05 kernel: blk_update_request: I/O error, dev sdc, sector 55676616 op 0x0:(READ) flags +0x80700 phys_seg 26 prio class 0 +# tux06 +Apr 15 08:10:57 tux06 kernel: I/O error, dev sda, sector 21740352 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 2 +Dec 13 12:56:14 tux06 kernel: I/O error, dev sdb, sector 3910157327 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2 +# tux07 +Mar 27 08:00:11 tux07 mfschunkserver[1927469]: replication error: failed to create chunk (No space left) +# tux08 +Mar 27 08:12:11 tux08 mfschunkserver[464794]: replication error: failed to create chunk (No space left) +``` + +Tux04, 05 and 06 show disk errors. Tux07 and Tux08 are overloaded with a full disk, but no other errors. We need to babysit Lizard more! 
+ +``` +stress -v -d 1 +``` + +Write test: + +``` +dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct +``` + +Read test: + +``` +/sbin/sysctl -w vm.drop_caches=3 +dd if=./test of=/dev/zero bs=512k count=2048 +``` + + +smartctl -a /dev/sdd -d megaraid,0 + +RAID Controller in SL 3: Dell PERC H755N Front + +# The story continues + +I don't know what happened but the server gave a hard +error in the logs: + +``` +racadm getsel # get system log +Record: 340 +Date/Time: 05/31/2025 09:25:17 +Source: system +Severity: Critical +Description: A high-severity issue has occurred at the Power-On +Self-Test (POST) phase which has resulted in the system BIOS to +abruptly stop functioning. +``` + +Woops! I fixed it by resetting idrac and rebooting remotely. Nasty. + +Looking around I found this link + +=> +https://tomaskalabis.com/wordpress/a-high-severity-issue-has-occurred-at-the-power-on-self-te +st-post-phase-which-has-resulted-in-the-system-bios-to-abruptly-stop-functioning/ + +suggesting we should upgrade idrac firmware. I am not going to do that +without backups and a fully up-to-date fallback online. It may fix the +other hardware issues we have been seeing (who knows?). + +Fred, the boot sequence is not perfect yet. Turned out the network +interfaces do not come up in the right order and nginx failed because +of a missing /var/run/nginx. The container would not restart because - +missing above - it could not check the certificates. + +## A week later + +``` +[SMM] APIC 0x00 S00:C00:T00 > ASSERT [AmdPlatformRasRsSmm] u:\EDK2\MdePkg\Library\BasePciSegmentLibPci\PciSegmentLib.c(766): ((Address) & (0xfffffffff0000000ULL | (3))) == 0 !!!! X64 Exception Type - 03(#BP - Breakpoint) CPU Apic ID - 00000000 !!!! 
+RIP - 0000000076DA4343, CS - 0000000000000038, RFLAGS - 0000000000000002 +RAX - 0000000000000010, RCX - 00000000770D5B58, RDX - 00000000000002F8 +RBX - 0000000000000000, RSP - 0000000077773278, RBP - 0000000000000000 +RSI - 0000000000000087, RDI - 00000000777733E0 R8 - 00000000777731F8, R9 - 0000000000000000, R10 - 0000000000000000 +R11 - 00000000000000A0, R12 - 0000000000000000, R13 - 0000000000000000 +R14 - FFFFFFFFA0C1A118, R15 - 000000000005B000 +DS - 0000000000000020, ES - 0000000000000020, FS - 0000000000000020 +GS - 0000000000000020, SS - 0000000000000020 +CR0 - 0000000080010033, CR2 - 0000000015502000, CR3 - 0000000077749000 +CR4 - 0000000000001668, CR8 - 0000000000000001 +DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000 DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400 +GDTR - 000000007773C000 000000000000004F, LDTR - 0000000000000000 IDTR - 0000000077761000 00000000000001FF, TR - 0000000000000040 +FXSAVE_STATE - 0000000077772ED0 +!!!! Find image based on IP(0x76DA4343) u:\Build_Genoa\DellBrazosPkg\DEBUG_MYTOOLS\X64\DellPkgs\DellChipsetPkgs\AmdGenoaModulePkg\Override\AmdCpmPkg\Features\PlatformRas\Rs\Smm\AmdPlatformRasRsSmm\DEBUG\AmdPlatformRasRsSmm.pdb (ImageBase=0000000076D3E000, EntryPoint=0000000076D3E6C0) !!!! +``` + +New error in system log: + +``` +Record: 341 Date/Time: 06/04/2025 19:47:08 +Source: system +Severity: Critical Description: A high-severity issue has occurred at the Power-On Self-Test (POST) phase which has resulted in the system BIOS to abruptly stop functioning. +``` + +The error appears to relate to AMD Brazos which is probably part of the on board APU/GPU. + +The code where it segfaulted is online at: + +=> https://github.com/tianocore/edk2/blame/master/MdePkg/Library/BasePciSegmentLibPci/PciSegmentLib.c + +and has to do with PCI registers and that can actually be caused by the new PCIe card we hosted. 
diff --git a/issues/systems/tux04-production.gmi b/issues/systems/tux04-production.gmi new file mode 100644 index 0000000..58ff8c1 --- /dev/null +++ b/issues/systems/tux04-production.gmi @@ -0,0 +1,279 @@ +# Production on tux04 + +Lately we have been running production on tux04. Unfortunately Debian got broken and I don't see a way to fix it (something with python versions that break apt!). Also mariadb is giving problems: + +=> issues/production-container-mechanical-rob-failure.gmi + +and that is alarming. We might as well try an upgrade. I created a new partition on /dev/sda4 using debootstrap. + +The hardware RAID has proven unreliable on this machine (and perhaps others). + +We added a drive on a PCIe raiser outside the RAID. Use this for bulk data copying. We still bootstrap from the RAID. + +Luckily not too much is running on this machine and if we mount things again, most should work. + +# Tasks + +* [X] cleanly shut down mariadb +* [X] reboot into new partition /dev/sda4 +* [X] git in /etc +* [X] make sure serial boot works (/etc/default/grub) +* [X] fix groups and users +* [X] get guix going +* [X] get mariadb going +* [X] fire up GN2 service +* [X] fire up SPARQL service +* [X] sheepdog +* [ ] fix CRON jobs and backups +* [ ] test full reboots + + +# Boot in new partition + +``` +blkid /dev/sda4 +/dev/sda4: UUID="4aca24fe-3ece-485c-b04b-e2451e226bf7" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="2e3d569f-6024-46ea-8ef6-15b26725f811" +``` + +After debootstrap there are two things to take care of: the /dev directory and grub. For good measure +I also capture some state + +``` +cd ~ +ps xau > cron.log +systemctl > systemctl.txt +cp /etc/network/interfaces . +cp /boot/grub/grub.cfg . +``` + +we should still have access to the old root partition, so I don't need to capture everything. + +## /dev + +I ran MAKEDEV and that may not be needed with udev. + +## grub + +We need to tell grub to boot into the new partition. 
The old root is on
+UUID=8e874576-a167-4fa1-948f-2031e8c3809f /dev/sda2.
+
+Next I ran
+
+```
+tux04:~$ update-grub2 /dev/sda
+Generating grub configuration file ...
+Found linux image: /boot/vmlinuz-5.10.0-32-amd64
+Found initrd image: /boot/initrd.img-5.10.0-32-amd64
+Found linux image: /boot/vmlinuz-5.10.0-22-amd64
+Found initrd image: /boot/initrd.img-5.10.0-22-amd64
+Warning: os-prober will be executed to detect other bootable partitions.
+Its output will be used to detect bootable binaries on them and create new boot entries.
+Found Debian GNU/Linux 12 (bookworm) on /dev/sda4
+Found Windows Boot Manager on /dev/sdd1@/efi/Microsoft/Boot/bootmgfw.efi
+Found Debian GNU/Linux 11 (bullseye) on /dev/sdf2
+```
+
+Very good. Do a diff on grub.cfg and you see it even picked up the serial configuration. It only shows it added menu entries for the new boot. Very nice.
+
+At this point I feel safe to boot as we should be able to get back into the old partition.
+
+# /etc/fstab
+
+The old fstab looked like
+
+```
+UUID=8e874576-a167-4fa1-948f-2031e8c3809f / ext4 errors=remount-ro 0 1
+# /boot/efi was on /dev/sdc1 during installation
+UUID=998E-68AF /boot/efi vfat umask=0077 0 1
+# swap was on /dev/sdc3 during installation
+UUID=cbfcd84e-73f8-4cec-98ee-40cad404735f none swap sw 0 0
+UUID="783e3bd6-5610-47be-be82-ac92fdd8c8b8" /export2 ext4 auto 0 2
+UUID="9e6a9d88-66e7-4a2e-a12c-f80705c16f4f" /export ext4 auto 0 2
+UUID="f006dd4a-2365-454d-a3a2-9a42518d6286" /export3 auto auto 0 2
+/export2/gnu /gnu none defaults,bind 0 0
+# /dev/sdd1: PARTLABEL="bulk" PARTUUID="b1a820fe-cb1f-425e-b984-914ee648097e"
+# /dev/sdb4 /export ext4 auto 0 2
+# /dev/sdd1 /export2 ext4 auto 0 2
+```
+
+# reboot
+
+Next we are going to reboot, and we need a serial connector to the Dell out-of-band using racadm:
+
+```
+ssh IP
+console com2
+racadm getsel
+racadm serveraction powercycle
+racadm serveraction powerstatus
+
+```
+
+The main trick is to hit ESC, wait 2 seconds, and press 2 when you want the BIOS boot menu. Ctrl-\ to escape the console. Otherwise ESC (wait) then ! to get to the boot menu.
+
+# First boot
+
+It still boots by default into the old root. That gave an error:
+
+[FAILED] Failed to start File Syste…a-2365-454d-a3a2-9a42518d6286
+
+This is /export3. We can fix that later.
+
+When I booted into the proper partition the console clapped out. Also the racadm password did not work in tmux -- I had to switch to a standard console to log in again. Not sure why that is, but next I got:
+
+```
+Give root password for maintenance
+(or press Control-D to continue):
+```
+
+and giving the root password I was in maintenance mode on the correct partition!
+
+To rerun grub I had to add `GRUB_DISABLE_OS_PROBER=false`.
+
+Once booted up it is a matter of mounting partitions and ticking the check boxes above.
+
+The following contained errors:
+
+```
+/dev/sdd1 3.6T 1.8T 1.7T 52% /export2
+```
+
+# Guix
+
+Getting guix going is a bit tricky because we want to keep the store!
+
+```
+cp -vau /mnt/old-root/var/guix/ /var/
+cp -vau /mnt/old-root/usr/local/guix-profiles /usr/local/
+cp -vau /mnt/old-root/usr/local/bin/* /usr/local/bin/
+cp -vau /mnt/old-root/etc/systemd/system/guix-daemon.service* /etc/systemd/system/
+cp -vau /mnt/old-root/etc/systemd/system/gnu-store.mount* /etc/systemd/system/
+```
+
+Also had to add the guixbuild users and group by hand.
+
+# nginx
+
+We use the streaming facility. Check that
+
+```
+nginx -V
+```
+
+lists --with-stream=static, see
+
+=> https://serverfault.com/questions/858067/unknown-directive-stream-in-etc-nginx-nginx-conf86/858074#858074
+
+and load at the start of nginx.conf:
+
+```
+load_module /usr/lib/nginx/modules/ngx_stream_module.so;
+```
+
+and
+
+```
+nginx -t
+```
+
+passes.
+
+Now the container responds to the browser with `Internal Server Error`.
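For reference, the stream proxying itself lives in a top-level `stream` block (a sibling of `http`, not inside it). A minimal sketch — the upstream port here is illustrative, not the actual production value:

```
load_module /usr/lib/nginx/modules/ngx_stream_module.so;

stream {
    upstream genenetwork_container {
        server 127.0.0.1:9443;  # hypothetical container endpoint
    }
    server {
        listen 443;
        proxy_pass genenetwork_container;
    }
}
```

With a config like this `nginx -t` still passes, and the `Internal Server Error` above comes from inside the container rather than from this layer.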
+ +# container web server + +Visit the container with something like + +``` +nsenter -at 2838 /run/current-system/profile/bin/bash --login +``` + +The nginx log in the container has many + +``` +2025/02/22 17:23:48 [error] 136#0: *166916 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: genenetwork.org, request: "GET /gn3/gene/aliases/st%2029:1;o;s HTTP/1.1", upstream: "http://127.0.0.1:9800/gene/aliases/st%2029:1;o;s", host: "genenetwork.org" +``` + +that is interesting. Acme/https is working because GN2 is working: + +``` +curl https://genenetwork.org/api3/version +"1.0" +``` + +Looking at the logs it appears it is a redis problem first for GN2. + +Fred builds the container with `/home/fredm/opt/guix-production/bin/guix`. Machines are defined in + +``` +fredm@tux04:/export3/local/home/fredm/gn-machines +``` + +The shared dir for redis is at + +--share=/export2/guix-containers/genenetwork/var/lib/redis=/var/lib/redis + +with + +``` +root@genenetwork-production /var# ls lib/redis/ -l +-rw-r--r-- 1 redis redis 629328484 Feb 22 17:25 dump.rdb +``` + +In production.scm it is defined as + +``` +(service redis-service-type + (redis-configuration + (bind "127.0.0.1") + (port 6379) + (working-directory "/var/lib/redis"))) +``` + +The defaults are the same as the definition of redis-service-type (in guix). Not sure why we are duplicating. + +After starting redis by hand I get another error `500 DatabaseError: The following exception was raised while attempting to access http://auth.genenetwork.org/auth/data/authorisation: database disk image is malformed`. The problem is it created +a DB in the wrong place. 
Alright, the logs in the container say: + +``` +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:C 23 Feb 2025 14:04:31.040 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:C 23 Feb 2025 14:04:31.040 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=3977, just started +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:C 23 Feb 2025 14:04:31.040 # Configuration loaded +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.041 * Increased maximum number of open files to 10032 (it was originally set to 1024). +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.041 * monotonic clock: POSIX clock_gettime +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.041 * Running mode=standalone, port=6379. +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # Server initialized +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # Wrong signature trying to load DB from file +Feb 23 14:04:31 genenetwork-production shepherd[1]: [redis-server] 3977:M 23 Feb 2025 14:04:31.042 # Fatal error loading the DB: Invalid argument. Exiting. +Feb 23 14:04:31 genenetwork-production shepherd[1]: Service redis (PID 3977) exited with 1. +``` + +This is caused by a newer version of redis. This is odd because we are using the same version from the container?! 
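The "Wrong signature" message can be checked directly on the file: a readable RDB dump begins with the ASCII magic `REDIS` followed by a four-digit format version. A sketch using a stand-in file — the real check would read `/var/lib/redis/dump.rdb`:

```shell
# Stand-in for a dump file; real check: head -c 9 /var/lib/redis/dump.rdb
printf 'REDIS0011' > /tmp/dump.rdb

# A good file prints REDIS00xx; anything else means a bad or corrupt header
head -c 9 /tmp/dump.rdb; echo
```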
+ +Actually it turned out the redis DB was corrupted on the SSD! Same for some other databases (ugh). + +Fred copied all data to an enterprise level storage, and we rolled back to some older DBs, so hopefully we'll be OK for now. + +# Reinstating backups + +In the next step we need to restore backups as described in + +=> /topics/systems/backups-with-borg + +I already created an ibackup user. Next we test the backup script for mariadb. + +One important step is to check the database: + +``` +/usr/bin/mariadb-check -c -u user -p* db_webqtl +``` + +A successful mariadb backup consists of multiple steps + +``` +2025-02-27 11:48:28 +0000 (ibackup@tux04) SUCCESS 0 <32m43s> mariabackup-dump +2025-02-27 11:48:29 +0000 (ibackup@tux04) SUCCESS 0 <00m00s> mariabackup-make-consistent +2025-02-27 12:16:37 +0000 (ibackup@tux04) SUCCESS 0 <28m08s> borg-tux04-sql-backup +2025-02-27 12:16:46 +0000 (ibackup@tux04) SUCCESS 0 <00m07s> drop-rsync-balg01 +``` diff --git a/issues/xapian_bug.gmi b/issues/xapian_bug.gmi index f11b604..068d8eb 100644 --- a/issues/xapian_bug.gmi +++ b/issues/xapian_bug.gmi @@ -5,6 +5,7 @@ * assigned: zsloan * priority: high * type: search +* status: closed * keywords: xapian, gn2, gn3 ## Description diff --git a/miscellaneous/broken-guix-on-tux04-20241108.org b/miscellaneous/broken-guix-on-tux04-20241108.org new file mode 100644 index 0000000..a3eb550 --- /dev/null +++ b/miscellaneous/broken-guix-on-tux04-20241108.org @@ -0,0 +1,162 @@ +* Broken Guix on Tux04 Troubleshooting Log + +Noticed at arount 15:30 (UTC-0500) that guix broke for no apparent reason and I +could no longer do ~guix pull~ without a "Segmentation fault" error. + +I could not even do a ~guix search~ without the error. + +Began today (2024-11-08UTC-0500) by attempting a pull, and still got the Segmentation fault. 
+ +Try checking out an older generation + +#+begin_src sh + $ /home/fredm/opt/guix/bin/guix -p ~/opt/guix --list-generations + ⋮ + $ /home/fredm/opt/guix/bin/guix -p ~/opt/guix --delete-generations=46 + ⋮ + Segmentation fault + $ /home/fredm/opt/guix/bin/guix -p ~/opt/guix --delete-generations=40..47 + ⋮ + Segmentation fault +#+end_src + +Try pulling with someone else's guix: + +#+begin_src bash + /var/guix/profiles/per-user/aruni/current-guix/bin/guix --profile=/export3/local/home/fredm/opt/guix --channels=channels.scm +#+end_src + +#+begin_example +(repl-version 0 1 1) +Backtrace: + 1 (primitive-load "/gnu/store/2lpf5rl0dhq1v8wjfnfmglvr0p2?") +In ice-9/boot-9.scm: + 2007:7 0 (error _ . _) + +ice-9/boot-9.scm:2007:7: In procedure error: +inferior failed "/gnu/store/5rac7n9ya6qrpbz42gpn96sa47dkz7rj-guix-2394a7f5f" +#+end_example + +Fine, maybe the issue is my profile: delete all links and do a ~guix gc~ to clear any issues. + +Upgrade guix daemon + +#+begin_src bash + $ sudo -i /var/guix/profiles/per-user/aruni/current-guix/bin/guix pull + ⋮ + $ sudo systemctl stop guix-daemon.service + $ sudo systemctl start guix-daemon.service + $ sudo systemctl status guix-daemon.service +#+end_src + +Try again: No success. + +Try: +#+begin_src shell + $ /var/guix/profiles/per-user/aruni/current-guix/bin/guix describe --format=channels > channels-aruni.scm + $ /var/guix/profiles/per-user/aruni/current-guix/bin/guix pull -C channels-aruni.scm -p /home/fredm/opt/guix + $ /home/fredm/opt/guix/bin/guix search python-requests +#+end_src + +Okay, now that gives me an output! 
+ +Try pulling latest channels: + +#+begin_src shell + $ /home/fredm/opt/guix/bin/guix pull -C channels.scm -p ~/opt/guix + ⋮ + Building from these channels: + guix-bioinformaticshttps://git.genenetwork.org/guix-bioinformatics 3659ecd + guix-forgehttps://git.systemreboot.net/guix-forge/ 9864821 + guix-past https://gitlab.inria.fr/guix-hpc/guix-past 5fb77cc + guix https://git.savannah.gnu.org/git/guix.git 2394a7f + ⋮ + building /gnu/store/gzm7jb7hj2bmfhdjzsprar8lacz34ncr-guix-past.drv... + -builder for `/gnu/store/gzm7jb7hj2bmfhdjzsprar8lacz34ncr-guix-past.drv' failed with exit code 1 + build of /gnu/store/gzm7jb7hj2bmfhdjzsprar8lacz34ncr-guix-past.drv failed + View build log at '/var/log/guix/drvs/gz/m7jb7hj2bmfhdjzsprar8lacz34ncr-guix-past.drv.gz'. + cannot build derivation `/gnu/store/rwarhjfc9kymw5k5wvl0pz658bmmmqvh-profile.drv': 1 dependencies couldn't be built + guix pull: error: build of `/gnu/store/rwarhjfc9kymw5k5wvl0pz658bmmmqvh-profile.drv' failed +#+end_src + +Okay, view the logs: +#+begin_src sh + $ gunzip -kc /var/log/guix/drvs/gz/m7jb7hj2bmfhdjzsprar8lacz34ncr-guix-past.drv.gz + (repl-version 0 1 1) + Backtrace: + 1 (primitive-load "/gnu/store/2lpf5rl0dhq1v8wjfnfmglvr0p2?") + In ice-9/boot-9.scm: + 2007:7 0 (error _ . _) + + ice-9/boot-9.scm:2007:7: In procedure error: + inferior failed "/gnu/store/5rac7n9ya6qrpbz42gpn96sa47dkz7rj-guix-2394a7f5f" +#+end_src + +Oh boy! 
+ +How about we start with @aruni's channels and pin everything except +=guix-bioinformatics= in the code below: +#+begin_src scheme + (list (channel + (name 'guix-bioinformatics) + (url "https://git.genenetwork.org/guix-bioinformatics") + (branch "master")) + (channel + (name 'guix-forge) + (url "https://git.systemreboot.net/guix-forge/") + (branch "main") + (commit + "9864821f976b3de355138c12f287a19fdced67df") + (introduction + (make-channel-introduction + "0432e37b20dd678a02efee21adf0b9525a670310" + (openpgp-fingerprint + "7F73 0343 F2F0 9F3C 77BF 79D3 2E25 EE8B 6180 2BB3")))) + (channel + (name 'guix-past) + (url "https://gitlab.inria.fr/guix-hpc/guix-past") + (branch "master") + (commit + "5fb77cce01f21a03b8f5a9c873067691cf09d057") + (introduction + (make-channel-introduction + "0c119db2ea86a389769f4d2b9c6f5c41c027e336" + (openpgp-fingerprint + "3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5")))) + (channel + (name 'guix) + (url "https://git.savannah.gnu.org/git/guix.git") + (branch "master") + (commit + "b0b988c41c9e0e591274495a1b2d6f27fcdae15a") + (introduction + (make-channel-introduction + "9edb3f66fd807b096b48283debdcddccfea34bad" + (openpgp-fingerprint + "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))) +#+end_src +Put the code above in =channels.scm= and retry: + +#+begin_src sh + $ /home/fredm/opt/guix/bin/guix pull -C channels.scm -p ~/opt/guix + ⋮ + Building from these channels: + guix-bioinformaticshttps://git.genenetwork.org/guix-bioinformatics 3659ecd + guix-forgehttps://git.systemreboot.net/guix-forge/ 9864821 + guix-past https://gitlab.inria.fr/guix-hpc/guix-past 5fb77cc + guix https://git.savannah.gnu.org/git/guix.git b0b988c + ⋮ +#+end_src + +Success! + +Okay, now, try rebuilding the container: +#+begin_src sh + $ env PATH="/home/fredm/opt/guix/bin:${PATH}" ./production-deploy.sh +#+end_src + +Success! + +Check that https://gn2-fred.genenetwork.org/ is up and running: Yes! + +Okay. We are back! 
diff --git a/tasks/alexm.gmi b/tasks/alexm.gmi index 88d3927..7ec8e87 100644 --- a/tasks/alexm.gmi +++ b/tasks/alexm.gmi @@ -1,4 +1,4 @@ -# Tasks for Fred +# Tasks for Alex ## Description @@ -16,11 +16,83 @@ You can refine the search by constraining the checks some more, e.g. to get high # Tasks -* [ ] Make GNQA reliable (with @fahamu) -* [ ] Improve UX for GNQA (with @shelbys) -* [ ] GNQA add abstracts pubmed (with @shelbys) +## This week + +* [ ] Start application - Pwani +* - [X] Got all transcripts +* [+] Correlations - Fred is having issues - Rust updated on Guix +* - also take a look at long running SQL statement and large LIMIT value (check prod!) +* [ ] Friend of UTHSC - Pjotr needs to send forms +* [+] Disable spinner on production (check prod!) +* [+] Rqtl2 - BXD output work on CD +* - [ ] should go to production w. fredm + Disable for Production +* - [X] DO mice family file - children are heterozygous - family file contains parents->child +* - [X] DO GN2 compatible by generating .geno files + Test on CD +* [+ ] Minor refactorings - Rqtl2 is hacky +* [ ] Work in development system container and document +=> https://git.genenetwork.org/gn-machines/commit/?h=gn-local-development-container&id=589dcf32be90f5ec827cb6976d3cb5838d500ac0 +* [+] Create terminal output for external processes on *PRODUCTION* (Rqtl1, Rqtl2, GEMMA, pair-scan are done --- WGCNA as a pilot, with @bonfacem and @pjotrp) + + +## (14/4/25) + +* [x] Debug DO results for for genenetwork2 + * [x] inspect results from gn3 and display mapping results + * [x] Debug db tunneling connection + * [x] Debug rendering huge datatables + +## (21/4/25) +* [x] QTL computation for the DO dataset + * [x] Debug rendering large datasets using datatables + * [x] fix issue with qtl2 plot for DO dataset + * [x] Caching for qtl2 computations + +* [] Pwani Campus Application + +## 28/4/25 + +* [x] Push changes to CD/Production +* [x] Enable RQTL2 only for DO/bxd dataset +* [] look at integrating QTL for HS dataset 
+* [x] setup local container with bons
+
+## 5/05/25
+
+* [ ] Integrate hsrat dataset for rqtl2 mapping.
+* [ ] Pwani campus application.
+* [ ] Look at caching for genotype probabilities (rqtl2).
+* [ ] Add full logs on the mapping results page.
+* [x] Add test feature flag for rqtl2.
+
+## 2/06/2025
+
+* work on subset for HS dataset; define founder genotype files?
+* script to dump genotypes to db with bons
+* experiment with caching for genotypic probabilities rds objects
+* work on genenetwork llms; how to make search work without login
+
+* masters; submit documents
+
+## Next week(s)
+
+* [ ] Accelerate Xapian functionality - needs Aider key from Pjotr
+* Check and fix CTL?
+* [+] Create terminal output for external processes (Rqtl1, Rqtl2, pair-scan are done --- WGCNA as a pilot, with @bonfacem and @pjotrp)
+* [X] GNQA says there are no results, but has them
+* [X] Correlations are slow
+
+## Done
+
+* [X] Rqtl1 - ITP output - 3K individuals - family file
+* [X] When bonz is ready wire up GNQA
+* + balg-qa.genenetwork.org
+* [X] Don't support new PIL - stick to the old one in guix-bioinformatics
+* [X] Make GNQA reliable (with @fahamu)
+* [X] Improve UX for GNQA (with @shelbys) -- Adrian wants to use our AI UX for their setup
+* [X] GNQA add abstracts pubmed (with @shelbys)
+=> ../issues/fetch-pubmed-references-to-gnqa
+* [X] Edit markdown/gemtext pages through web UI (with @bonfacem)

-* [ ] Edit markdown/gemtext pages through web UI (with @bonfacem)
-* [ ] GNQA add GN metadata with @bonfacem
-* [ ] Create terminal output for external processes (WGCNA as a pilot, with @bonfacem and @pjotrp)
diff --git a/tasks/bonfacem.gmi b/tasks/bonfacem.gmi
index 52f4027..03848f1 100644
--- a/tasks/bonfacem.gmi
+++ b/tasks/bonfacem.gmi
@@ -8,9 +8,62 @@

 ## Tasks

-* [X] Indexing generif data / Improve Local Search
-* [ ] Add hashes to RDF metadata
-* [-] Brain Data (To be spec'ed further)
+### Note
+- GN-auth dashboard fixes. Follow up with Fred.
+- Case attributes used in covariates.
+- Encourage FahamuAI to be open.
+
+### This week
+* [+] Case Attributes (Do a diagnostic and delegate)
+* - Git blame. Add tests.
+* - Error when checking the history.
+* - Reach out to Zach.
+* - Disable diff in the UI.
+* [ ] Distinct admin and dev user.
+* [ ] Adapter to LMDB into a cross object.
+* - Try computations with R/qtl2.
+* - Look at R LMDB libraries.
+* - Look at functions that read the files.
+* - PJ: LMDB adapter in R and cross-type files.
+* [ ] Send Arun an e-mail on how to go about upgrading shepherd.
+* [ ] Dump all genotypes from production to LMDB.
+* - PJ sync tux01 genotypes with tux02/04.
+* [+] Correlations hash.
+* - Add dataset count to RDF.
+* [ ] Spam + LLMs
+* - Rate-limiting for RIF editing.
+* - Honey pot approach.
+* [+] Help Alex with SSL certificate container error.
+* - Put the changes in the actual scm files.
+* [X] Python Fahamu.
+* [X] Memvid - brief look.
+
+### Later
+* [ ] Dockerise GN container. For Harm.
+* [ ] Send emails when jobs fail.
+* [ ] Look at updating gn-auth/gn-libs to PYTHONPATH for gn2/3.
+* [ ] Sample/individual/strain/genotype counts for PublishData only - ProbeSetData? https://github.com/genenetwork/genenetwork2/blob/testing/scripts/sample_count.py - mirror in RDF and use global search
+* - search for all traits that have more than X samples
+* [ ] Add case attributes to RDF and share with Felix (depends on @felixl)
+* [ ] xapian search, add dataset size keys, as well as GN accession id, trait id, and date/year
+* - Improve xapian markdown docs to show all used fields/keys with examples
+* - genewiki search (link in table? check with Rob)
+* - baseline with GN1 search - add tests
+* - Fix missing search term for sh* - both menu search and global search
+* - Use GN1 as a benchmark for search results (mechanical Rob?)
+* - Xapian ranges for markers + +### Even later + +* [ ] Rest API for precompute output (mapping with GEMMA) +* [ ] GNQA add GN metadata (to RAG) +* - Focus on RIF +* - triple -> plain text +* - bob :fatherof nancy -> Bob is the father of Nancy. + +## Later + +* [ ] AI improvements ### On going tasks @@ -34,3 +87,47 @@ Should something in one of these closed issues be amiss, we can always and shoul Currently closed issues are: => https://issues.genenetwork.org/search?type=closed-issue&query=assigned%3ABonfaceKilz%20AND%20type%3Aissue%20AND%20is%3Aclosed Closed Issues + +* [X] Indexing generif data / Improve Local Search +* [X] lmdb publishdata output and share with Pjotr and Johannes + +## Done + +* [X] Add lmdb output hashes with index and export LMDB_DATA_DIRECTORY +* [X] Share small database with @pjotrp and @felixl +* [X] With Alex get rqtl2 demo going in CD (for BXD) +* [X] Set up meeting with ILRI +* - Zasper https://news.ycombinator.com/item?id=42572057 - Alan +* [X] Migrate fahamuai RAG to VPS and switch tokens to GGI OpenAI account +* 1. Running AI server using (our) VPS and our tokens +* + Pjotr gives API key - OpenAI - model? +* 2. Read the code base - Elixir is plumbing incl. authentication, Python processing text etc. +* 3. Try ingestion and prompt (REST API) - check out postgres tables +* 4. Backup state from production Elixir +* 5. Assess porting it to Guix (don't do any work) - minimum version Elixir +* 6. Get docs from Shelby/Brian +* [X] Set-up grobit on balg01 +* - guix docker/native +* - recent breaking changes +* [X] GeneRIF +* - Merge recent changes first. Ping Rob. +* - Brainstorm ideas around log-in. +* - Unlimited tokens that don't expire. +* - Sync prod with CD -- sqlite. +* - Add deletion +* [X] Describe Generif/wikidata access for Rob in an email with test account on CD +* 1. Send email to Rob +* 2. Work on production w. Fred +* [X] Distinguish CD from production -- banners/buttons/colors. 
+* [X] Use aider - give a presentation in the coming weeks +* [X] gn-auth fixes +* [X] Assess Brian's repo for deployment. +* [X] Finish container work +* - View diffs in BXD: Edit case attributes throws an error. +* [X] Check small db from: https://files.genenetwork.org/database/ +* [X] Changes to Production + (Alex) +* [X] File issue with syslog +* [X] LMDB database. +* - Simplify (focus on small files). Don't over-rely on Numpy. +* [X] Assess adding GeneRIF to LLM. +* [X] Referrer headers -- a way of preventing bots beyond rate-limiting. diff --git a/tasks/felixl.gmi b/tasks/felixl.gmi index 209e8c9..347f387 100644 --- a/tasks/felixl.gmi +++ b/tasks/felixl.gmi @@ -1,4 +1,4 @@ -# Tasks for Munyoki +# Tasks for Felix ## Tags @@ -6,12 +6,134 @@ * assigned: felixl * status: in progress -## October +## Tasks +### Goals + +1. Write papers for PhD +2. Load data into GN - serve the communities +3. Get comfortable with programming + +#### Previous week(s) + +* [x] Restless Legs Syndrome (RLS) - 'Traditional Phewas' - AI aspect - Johannes +* [+] Finalize the slide deck - so it can be read on its own +* [.] 
Review paper: one-liners for @pjotrp - why is this important for GN and/or thesis +* - [ ] list of relevant papers with one-liners - the WHY +=> https://pmc.ncbi.nlm.nih.gov/articles/PMC3294237/ +* [+] Analyse and discuss BXD case attributes with Rob --- both group level and dataset level +* [ ] Sane representation of case attributes in RDF with @bonfacem +* [X] Present C.elegans protocol and example mappings with GEMMA/Rqtl +* [ ] Uploader - setting up code with @fredm +* - [ ] Concrete improvement to work on +* - [X] run small database mysql locally +* - [X] aider with Sonnet + code fixes +* - [ ] document - add to code base - merge with Fred's tree - share changes with Pjotr & team +* [ ] Sort @alexm application with Pwani = this week + +### This week (07-04-2025 onwards) + +* GN2 tasks + * [ X ] Progress on Kilifish + - meet with Dennis (send him an email with all the queries needed) + - progress to format and upload data to gn2 (to be ready by latest Friday!) + * [ X ] Make a milestone with genotype smoothing + +* PhD tasks + * [ X ] Complete and share concept note and timeline to supervisors, have a meeting for progress + * [ ] Make a milestone on chapter one manuscript (deep dive into the selected papers){THE BIG PICTURE; a complete draft by early May} + +* Programming + * [ ] Make a milestone with the uploader (really push and learn!) 
+    - documentation (use ai); add to the code base of the uploader
+    - utilise the hurdles to learn programming principles in action
+
+### This week (14-04-2025 onwards)
+
+* gn-uploader programming
+  * [X] - Resolve the config file issue with your local uploader
+  * [ ] - Run the uploader locally, then break the system, see how components connect to each other
+  * [ ] - document your findings
+
+* genotype smoothing
+  * [ ] - resolve errors with plotting, document your findings
+
+### This week (21-04-Onwards)
+
+* genotype smoothing
+  * [ ] - haplotyping tools for smoothing (plink, etc.)
+    - see what they can offer with smoothing. See what others say about this.
+
+* gn-uploader programming
+  * [ ] - Run the uploader locally, then break the system, see how components connect to each other (ask help from Bonz)
+  * [ ] - document your findings
+
+### This week (28-04-Onwards)
+* gn-uploader programming
+  * [X] - Run the uploader locally, then break the system, see how components connect to each other (ask help from Bonz)
+  * [X] - document your findings
+    {Get help from your teammates/AI to jump-start this; swallow your pride! :(}
+
+* genotype smoothing
+  * [X] Keep refining the following:
+  * [X] filtering power adapted from plink
+  * [X] the xsomes mix-up in the plot (probably the phenotype data?)
+ * [X] Update findings and push to github + +### This week (05-05-Onwards) +* programming (gn-uploader) + * [ ] - pick one file each day, review it, understand it + * [ ] - pair programming with Alex on test runs + +* HS rats scripts + * [ ] - prepare/refine scripts to quickly process HS rats file + * [ ] - assist alex with hs rats cross info + +* AOBs + * [ X ] Weekly meetings + * [ X ] follow up with Paul on his progress + * [ X ] follow up on the MSc bioinformatics project + * [ X ] follow up on Alex's application with Pwani + +### (12-05-onwards) + * [X] - HS genotypes scripting + +### (19-05-onwards) + * [X] - HS genotypes debugging (memory issue) + * [X] - pair programming with Bonz to improve the script + +### this week (26-05-onwards) + * [X] - process the genotype file for hs rats + * [X] - approach by tissues categories + * [X] - adipose and liver + - test by Xsomes for memory capture + - run the working commands + * [X] - the rest 10 other tissues (in progress) + * [X] - *.bed file vs the updated vcf files from the website? 
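The "genotype smoothing" tracked in the weeks above can be illustrated as a sliding-window majority vote over marker calls, which removes isolated (likely miscalled) markers inside a haplotype block. This is a hypothetical Python sketch only; the actual work above is done with plink and bcftools.

```python
# Hypothetical illustration of genotype smoothing: replace each call by the
# majority call in a window around it, so lone outliers inside a haplotype
# block are treated as noise. Window size and call codes are assumptions.
from collections import Counter

def smooth_calls(calls, window=5):
    """Majority-vote smoothing over a list of genotype calls (e.g. 'B', 'D', 'H')."""
    half = window // 2
    smoothed = []
    for i in range(len(calls)):
        block = calls[max(0, i - half):i + half + 1]
        smoothed.append(Counter(block).most_common(1)[0][0])
    return smoothed

# A lone 'D' inside a run of 'B' calls is smoothed away:
print(smooth_calls(["B", "B", "D", "B", "B", "B"]))
```

The same idea underlies haplotype-aware tools; this sketch just makes the windowed-vote mechanics concrete.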
+
+### this week (02-06-onwards)
+* [X] - process the genotypes for the rest of the 10 tissues for HS rats
+* [X] - document the new findings about smoothing using bcftools and plink
+
+### this week (09-06-onwards)
+* [ ] - identify start and end points for haplotypes in hs genotype files
+* [ ] - upload the final updates to gn2, test and see the results
+* [ ] - gn-uploader/uploader folder, explore
+
+### Later weeks (non-programming tasks)
+
+* [ ] Kilifish into GN
+* [ ] Review paper on genotyping
+* [ ] HS Rat
+* [ ] Prepare others for C. elegans
 * [ ] Upload Arabidopsis dataset
 * [ ] Upload Medaka dataset
+* [ ] Work on improved DO and Ce genotyping
+
+### Done
+
+
-## Tasks

 ### On going tasks
=> https://issues.genenetwork.org/search?query=assigned%3Afelixl+AND+is%3Aopen&type=open-issue All in-progress tasks
diff --git a/tasks/fredm.gmi b/tasks/fredm.gmi
index 743007a..1cd3125 100644
--- a/tasks/fredm.gmi
+++ b/tasks/fredm.gmi
@@ -1,11 +1,27 @@
 # Tasks for Fred

+# Tags
+
+* kanban: fredm
+* assigned: @fredm
+* status: in progress
+
+# Tasks
+
+* [ ] Add drives to Penguin2, see issues/systems/penguin2-raid5
+* [X] Move production files from sdc to sde
+* [ ] Fix password weakness
+* [ ] Fix gn-docs and editing, e.g. facilities page by gn-guile in container
+* [ ] Unify container dirs
+* [ ] Fix wikidata gene aliases (see mapping page) with @pjotrp
+* [ ] Public SPARQL container?
+
 ## Description

 These are the tasks and issues to be handled by Fred. Here are all open issues

-=> https://issues.genenetwork.org/search?query=is%3Aopen+AND+assigned%3AFrederick&type=all assigned to Frederick
+=> https://issues.genenetwork.org/search?query=is%3Aopen+AND+assigned%3Afrederick&type=all assigned to Frederick

 The search used for the list above is

 > is:open AND assigned:Frederick

@@ -17,8 +33,21 @@ You can refine the search by constraining the checks some more, e.g.
to get high

 ## Priority

-…
+### gn-uploader

-## Extra
+=> https://issues.genenetwork.org/search?query=is%3Aopen+AND+assigned%3Afrederick+AND+keywords%3Agn-uploader&type=all
+
+### gn-auth
+
+=> https://issues.genenetwork.org/search?query=is%3Aopen+AND+assigned%3Afrederick+AND+keywords%3Agn-auth&type=all

-* Work on analysis of aging mice
+### GeneNetwork2
+
+=> https://issues.genenetwork.org/search?query=is%3Aopen+AND+assigned%3Afrederick+AND+keywords%3Agn2&type=all
+
+### GeneNetwork3
+
+=> https://issues.genenetwork.org/search?query=is%3Aopen+AND+assigned%3Afrederick+AND+keywords%3Agn3&type=all
+
+
+## Extra
diff --git a/tasks/machine-room.gmi b/tasks/machine-room.gmi
index f6c7737..77f7b8e 100644
--- a/tasks/machine-room.gmi
+++ b/tasks/machine-room.gmi
@@ -1,36 +1,45 @@
 # Machine room tasks

-## Tags
+# Tags

 * assigned: pjotrp
 * priority: medium
 * type: system administration
 * keywords: system administration, octopus, gateway, tux02, tux01, tux03

-## Tasks
-
-### UTHSC
-
-* [ ] describe machines with Rick Stripes
-* [ ] get bacchus back on line
-* [ ] fix www.genenetwork.org and gn2.genenetwork.org https
+# Tasks
+
+## GN
+
+* [ ] penguin2 has 90TB of space we can use on NFS/backups
+* [ ] Script to replace reaper with GEMMA
+* [ ] Transfer nervenet.org to dnsimple
+* [+] Trait vectors for Johannes
+* [X] grub on tux04
+* [ ] nft on tux04
+* [ ] !!Organize pluto, update Julia and add apps to GN menu Jupyter notebooks
+* [+] !!Xusheng jumpshiny services
+* [ ] Fix apps and create system containers for herd services - see issues/systems/apps
+* [ ] Slurm+ravanan on production for GEMMA speedup
+* [ ] Embed R/qtl2 (Alex)
+* [ ] Hoot in GN2 (Andrew)
 * [ ] tux02 certbot failing (manual now)
-* [ ] get data from summer211.uthsc.edu (access machine room)
-* [ ] VPN access and FoUT
-* [ ] penguin2 has 32TB of space we can use on NFS/backups
-
-Network:
-* [ ] Octopus: wire up machines so they talk with each other over fiber
+## Octopus:

-Lambda:
-
-* [ ] remote access?
(with Erik) - * [X] get BMC password +* [X] Fix Tux05 badblocks on /dev/sdb2 1050624 47925247 46874624 22.4G Linux filesystem + - see add-boot-partition +* [+] Copy linux partition on tux04, tux05, tux02 and test reboot +* [ ] !!Ceph on Tuxes +* [ ] Centralized user management system +* [ ] Monitor nodes +* [ ] Check machines so they talk with each other over fiber -Backups & storage: +## Backups & storage: -* [_] data warehousing +* [ ] Create and check backups of tux04 etc etc. +* [ ] set up zero to backup tux02 and report to redis +* [ ] reintroduce borg-borg on zero * [+] run sheepdog as root: redis password error; introduce SHEEPDOG_CONF * [ ] tux01 has unused 4TB spinning disk * [ ] tux02 has unused 2x4TB spinning disks and 2TB nvme /dev/nvme0n1 on adapter @@ -39,22 +48,23 @@ Backups & storage: fwupdmgr get-devices fwupdmgr update The previously problematic Samsung 980 Pro was basically using the 3B2QGXA7, and now Samsung has introduced a new 5B2QGXA7 firmware to fix the problem. The problem mainly affects the 2TB version of the 980 Pro -* [ ] Check backups of etc etc. Security: * [ ] Limit idrac access -* [X] space server out-of-band access -### Spice +## Spice -* [ ] Run GN off balg01 +* [ ] Add 2nd boot partition on balg01 * [ ] Add firewall test to sheepdog -* [ ] Convert balg02 to Guix server -* [ ] VM for student team -### Done +## Done +* [X] describe machines with Rick Stripes +* [X] get bacchus back on line +* [X] fix www.genenetwork.org and gn2.genenetwork.org https +* [-] get data from summer211.uthsc.edu (access machine room) +* [X] VPN access and FoUT * [X] lambda: get fiber working * [X] lambda: add to Octopus HPC * [X] lambda: racked up and runs @@ -82,3 +92,10 @@ Security: * [X] tux07 has no fiber * [X] tux08 has no fiber * [X] tux09 has no fiber +### Lambda +* [X] remote access? 
(with Erik) + * [X] get BMC password +* [X] space server out-of-band access +### Spice +* [X] Run GN off balg01 +* [X] Convert balg02 to Guix server diff --git a/tasks/octopus.gmi b/tasks/octopus.gmi index 27232ec..61955ec 100644 --- a/tasks/octopus.gmi +++ b/tasks/octopus.gmi @@ -2,6 +2,9 @@ In this file we track tasks that need to be done. +Tuxes still have some 30x 2.5" slots. +Lambda has 18x 2.5" slots. + # Tasks * [X] get lizardfs and NFS going on tuxes tux06-09 diff --git a/tasks/pjotrp.gmi b/tasks/pjotrp.gmi index 278323b..57620aa 100644 --- a/tasks/pjotrp.gmi +++ b/tasks/pjotrp.gmi @@ -6,28 +6,69 @@ * assigned: pjotrp * status: in progress -# Notes - -The tasks here should probably be broken out into appropriately tagged issues, where they have not - they can be found and filtered out with tissue (formerly gnbug). - -=> https://issues.genenetwork.org - # Current -## Mempang24 - -* [X] Invites -* [+] Payments -* [+] Rooms -* [+] Catering +## 1U01HG013760 + +* Prefix-Free Parsing Compressed Suffix Tree (PFP) for tokenization +* Mempang + +* [+] create backup server with @fredm +* [+] RAG with Shelby and Bonz +* [+] Moni builds 1U01HG013760 +* [+] test framework wfmash - vertebrate tree and HPC compute? +* - wfmash - wgatools -> PAF + FASTA to VCF +* - wfmash arch=native build +* [ ] gbam - data compression with Nick and Hasithak +* [X] accelerate wfmash with @santiago and team +* [+] package wfmash and Rust wfa2-lib +* [ ] add Ceph for distributed network storage 1U01HG013760 +* [ ] Work on pangenome genotyping 1U01HG013760 +* [ ] update freebayes into Debian (version #) +* - [ ] static build and prepare for conda +* [ ] update vcflib into Debian (version #) +* - [ ] static build and prepare for conda +* [ ] pangenome as a 1st class input for GEMMA +* kilifish pangenome with Paul and Dario + +## Systems + +* [+] jumpshiny +* [ ] pluto +* [ ] Backup production databases on Tux04 +* - [+] Dump containers w. 
databases +* - [X] Dump mariadb +* - [ ] backup remote +* - [ ] borg-borg +* - [ ] fix root scripts +* [ ] make sure production is up to scratch (see stable below) +* [ ] synchronize git repos for public, CD, fallback and production using sheepdog and document +* [ ] drop tux02 backups on balg01 +* [X] Small database public ## Ongoing tasks (current/urgent) -* [+] Set up stable GeneNetwork server instance with new hardware (see below) -=> /topics/systems/fire-up-genenetwork-system-container.gmi -* [+] Julia as part of GN3 deployment +* [ ] ~Felix, Alex, Rahul as friends of UTHSC +* [ ] Precompute with GEMMA + + [ ] Store N + + [ ] Store significance levels + + [ ] Check genotype input data + + [ ] Imputation + + [ ] Do same with bulkLMM + + [ ] Generate lmdb output + + [ ] Hook into Xapian + + [ ] Hook into correlations + +* [ ] Check email setup tux04 +* [ ] jbrowse plugin code - https://genenetwork.trop.in/mm10 +* [+] bulklmm Julia as part of GN3 deployment + - precompute & Julia +=> https://github.com/GregFa/TestSysimage + Here the repo with BulkLMMSysimage: +=> https://github.com/GregFa/BulkLMMSysimage => /topics/deploy/julia.gmi -* [+] Shelby's paper+application +* [X] Set up stable GeneNetwork server instance with new hardware (see below) +=> /topics/systems/fire-up-genenetwork-system-container.gmi # Tasks @@ -43,18 +84,16 @@ Now (X=done +=WIP _=kickoff ?=?) * [+] Build leadership team * [+] gBAM -* [ ] Audit DB security * [ ] p-value global search -* [ ] Xapian search add tags, notmuch style (with @zachs) +* [+] Xapian search add tags, notmuch style (with @bonfacem and @zachs) => ../issues/systems/octopus -* [ ] Add R/qtl2 and multi-parent support with Karl (DO and Magic populations) -* [ ] Fix slow search on Mariadb? +* [+] Add R/qtl2 and multi-parent support with Karl (DO and Magic populations) +* [+] Fix slow search on Mariadb? Moving to xapian * [.] 
GeneNetwork paper
* + [ ] add FAIR statement
 * [ ] RISC-V trials
-* [ ] Guix project server

 Longer term

@@ -64,22 +103,24 @@ Longer term

 Later

-* [ ] Remove ttf files GN2
+* [ ] Mempang25 1U01HG013760
+  + [X] Invites
+  + [ ] Payments
+  + [ ] Rooms
+  + [ ] Catering
+
+* [ ] Remove ttf files GN2 git repo
 * [+] DOI support GN
-* [+] Tree of life w. Andrea
 * [ ] Hao's idea for counting cis-qtl
 * [ ] Improve search for significant and suggestive hits
 * [ ] Include dominance matrix results in gemma (gene expression and transcript isoforms)
-* [ ] UCSC browser and big wig files
-* [ ] Jbrowse2 browser
+* [-] UCSC browser and big wig files (no longer applicable)
+* [+] Jbrowse2 browser

 ### Set up stable server instance with new hardware

-* [ ] R/qtl2 with Karl
-* [ ] Set up opensmtpd as a service
-  + [ ] Add package dependency
-  + [ ] Test on open port 25
-  + [ ] Add public-inbox (Arun)
+* [ ] ssh-shell access for git markdown
+* [+] R/qtl2 with Karl and Alex, see [alex.gmi]

 => ./machine-room.gmi machine room

@@ -92,6 +133,9 @@ Later

 ### Done

+* [X] add drives to Octopus 1U01HG013760
+* [X] mount NFS bacchus on octopus 1U01HG013760
+* [X] Guix project server
 * [X] Build junior team
 * [X] Complete Octopus setup with Erik, Andrea and Efraim - adding all machines
 * [X] Aging grant application w. Rob (fall)
@@ -104,3 +148,12 @@ Later
 * [X] Fix mariadb index search - need to upgrade mariadb to convert final utf8mb4, see
 => ../issues/slow-sql-query-for-xapian-indexing.gmi
 * [X] Debian/free software issues incl. vcflib work in Zig and release
+* [X] Set up opensmtpd as a service
+
+# Notes
+
+The tasks here should probably be broken out into appropriately tagged issues, where they have not - they can be found and filtered out with tissue (formerly gnbug).
+
+=> https://issues.genenetwork.org
+
+Generally work applies to NIH/R073237482 and other grants.
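The precompute checklist above (store N, store significance levels, generate lmdb output, hook into Xapian) suggests a content-addressed key/value layout. The following is a hypothetical Python sketch of one such record; the field names and key scheme are illustrative assumptions, not the actual GN design, and the real store would use LMDB (py-lmdb) rather than the in-comment hint shown.

```python
# Sketch: serialise one precomputed GEMMA mapping result into a (key, value)
# pair suitable for a key/value store such as LMDB. All field names here are
# hypothetical placeholders.
import hashlib
import json

def make_record(trait_id, n, significance, top_hit):
    """Return a content-addressed (key, value) pair for one precomputed result."""
    value = json.dumps(
        {"trait": trait_id, "N": n, "significance": significance, "top_hit": top_hit},
        sort_keys=True,  # deterministic serialisation -> stable hash
    ).encode()
    # Content-addressed key: identical results always map to the same entry.
    key = hashlib.sha256(value).hexdigest().encode()
    return key, value

key, value = make_record("BXD_10001", n=33, significance=4.1, top_hit="rs31443144")
# In the real pipeline the pair would then be interned in LMDB, roughly:
#   with env.begin(write=True) as txn: txn.put(key, value)
print(len(key))  # sha256 hex digest is 64 characters
```

Content-addressing makes the cache idempotent: rerunning a precompute on unchanged inputs produces the same key, so nothing is duplicated.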
diff --git a/tasks/programmer-team/meetings.gmi b/tasks/programmer-team/meetings.gmi
new file mode 100644
index 0000000..d972b3b
--- /dev/null
+++ b/tasks/programmer-team/meetings.gmi
@@ -0,0 +1,82 @@
+# Weekly meetings
+
+In this document we track tasks based on our weekly meetings. This list sets the agenda for progress at the next week's meeting.
+
+## 02-10-2024
+### @felixm
+* [ ] Use Aider to contribute to and cover Fred's coding. Share useful prompts.
+* [ ] Feed relevant papers to GPT and find similar summaries for other datasets. Start with C. elegans.
+
+
+### @bonfacem
+* [ ] Share values with PJ.
+* [ ] Assume LMDB files are transient. When a hash doesn't exist, generate the hash for that dataset. Use LMDB to store key-value pairs of hashes.
+* [ ] Add dump script to gn-guile.
+* [ ] Add Case Attributes in Virtuoso.
+
+### @alex
+* [ ] Push R/QTL2 to production
+* [ ] Have R/QTL2 work for ITP
+
+Nice to have:
+* Think about editing publish data and consequent updates to LMDB.
+
+### @pjotr
+* Kickstart UTHSC VPN access for Felix and Alex.
+
+## 01-20-2024
+### @bonfacem
+
+* [ ] Report: OpenAI on Aider - use AI for programming - discuss with @alexm
+
+=> https://issues.genenetwork.org/topics/ai/aider
+
+* [-] Metadata: Provide list of case attributes for BXD to @flisso
+* [-] Code UI: GeneRIF and GeneWiki should work from the mapping page - encourage people to use
+  - anyone logged in can edit
+  - If RIF does not exist point to GeneWiki
+  - If GeneWiki does not exist provide edit page
+* [ ] Code export: Exporting traits to lmdb PublishData - @alexm helps with SQL
+  - missing data should not be an X
+  - run lmdb design (first code) by @pjotrp
+  - start exporting traits for Johannes (he will need to write a python reader)
+* Later: Improve the work/dev container for @alexm
+
+### @flisso
+
+* [ ] Write: Uploader protocol. NOTES: Finished with C. elegans. Yet to test with other datasets.
+* [ ] Script: Run Reaper
+* [ ] Data: Case attributes - with @bonfacem
+* [ ] Write: Create protocol to upload case attributes
+
+### @alexm
+
+* [ ] Code: Rqtl2 match Rqtl1: match scan changes. NOTES: PR out and added tests.
+* [ ] Bug: Fix pair scan. NOTES: Fixed it. But can't test it now since CD is down.
+* Later: AI changes
+
+### @Pjotr
+
+* [ ] Code: Work on precompute with GEMMA (w. Jameson)
+* [ ] Code: Take Bonface's trait files when they become available
+
+
+## 01-27-2024
+
+Last week's errors with CD and production downtime:
+* [level 1] Container: Error messages when data is not loaded in Virtuoso, Indexing.
+* [level 2] Sheepdog: Check services --- sheepdog. Health checkpoints.
+* [level 3] User feedback. Escalate errors correctly to the users, so they can report to coders.
+
+### @bonfacem
+* [ ] Troubleshoot CD.
+* [ ] Export files in lmdb; a read-file-in-Python example for Johannes.
+* [ ] Metadata: Provide list of case attributes for BXD to @flisso
+* [ ] Aider: See if it can generate some guile and python. Give an example.
+
+### @alexm
+* [ ] UI for R/Qtl2.
+
+### @flisso
+* [ ] Look at Fred's Python code for the uploader and report on this.
diff --git a/tasks/roadmap.gmi b/tasks/roadmap.gmi
new file mode 100644
index 0000000..9bed63d
--- /dev/null
+++ b/tasks/roadmap.gmi
@@ -0,0 +1,65 @@
+# GN Road map
+
+GN is a web service for complex traits. The main version is currently deployed in Memphis TN, mostly targeting mouse and rat.
+Here we define a road map to bring GN to more communities by providing federated services.
+The aim is to have plant.genenetwork.org, nematode.genenetwork.org, big.genenetwork.org running in the coming years.
+
+# Getting an instance up (Step 1)
+
+## Deploy a new instance
+
+To test things we can use an existing database or a new one. We can deploy that as a (new) Guix service container.
+
+We'll need to run a few services including:
+
+* GN3
+* GN2
+* Auth (if required)
+* Uploader (if required)
+
+## Get database ready
+
+In the first step we have to upload data for the target community. This can be done by updating the databases with some example datasets. Care has to be taken that search etc. works and that we can do the mapping.
+
+* Add traits
+* Add genotype files
+* Add metadata
+
+# Branding and hosting (Step 2)
+
+Once we have a working database with a number of example use cases we can start rebranding the service and, ideally, host it on location.
+
+# Synchronization (Step 3)
+
+## Move traits into lmdb
+
+This is WIP. We need to adapt the GN3 code to work with lmdb when available.
+
+## Move genotypes into lmdb
+
+This is WIP. We need to adapt the GN3 code to work with lmdb when available.
+
+# Federated metadata (Step 4)
+
+## Move all metadata into RDF
+
+This is WIP and happening. We will need to document it.
+
+# LLM Integration (Step 5)
+
+Provide an LLM that integrates well with the GN ecosystem. Goals for the LLM:
+
+* Flexible data ingestion
+* Plug-and-play LLMs (local, OpenAI, Claude etc.)
+
+This is still a WIP.
+
+# Community (Step 6)
+
+## Uploading data examples
+
+## GN3 examples
+
+## UI examples
+
+## Provide programming examples
diff --git a/tasks/zachs.gmi b/tasks/zachs.gmi
new file mode 100644
index 0000000..6ae3df1
--- /dev/null
+++ b/tasks/zachs.gmi
@@ -0,0 +1,7 @@
+# Tasks for Zach
+
+# Tasks
+
+* [ ] Move non-ephemeral data out of redis into sqlite DB - see JSON dump
+* - [ ] Collections
+* - [ ] permanent URIs(?)
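The redis-to-sqlite move in Zach's list above could look roughly like the sketch below. This is a hedged illustration only: the plain dict stands in for the JSON dump of the live Redis data, and the table schema, key names, and sample collections are all hypothetical.

```python
# Sketch: migrate collection records from a Redis JSON dump into sqlite.
# `redis_dump` is a stand-in for the real dump; the schema is an assumption.
import json
import sqlite3

redis_dump = {
    "user:1:collections": [{"name": "BXD hippocampus", "traits": ["10001", "10002"]}],
    "user:2:collections": [{"name": "Aging", "traits": ["20010"]}],
}

conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute("CREATE TABLE collections (user_key TEXT, name TEXT, traits TEXT)")

for user_key, collections in redis_dump.items():
    for coll in collections:
        # Keep the trait list as a JSON string so no information is lost.
        conn.execute(
            "INSERT INTO collections VALUES (?, ?, ?)",
            (user_key, coll["name"], json.dumps(coll["traits"])),
        )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM collections").fetchone()[0]
print(count)  # 2 collections migrated
```

Keeping the trait list as an opaque JSON column mirrors the "see JSON dump" note: it ports the data without committing to a final relational design.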
diff --git a/topics/ADR/gn-guile/000-markdown-editor-push-to-bare-repo.gmi b/topics/ADR/gn-guile/000-markdown-editor-push-to-bare-repo.gmi
new file mode 100644
index 0000000..05b2b6a
--- /dev/null
+++ b/topics/ADR/gn-guile/000-markdown-editor-push-to-bare-repo.gmi
@@ -0,0 +1,18 @@
+# [gn-guile/ADR-000] Extend Markdown Editor to push to Git Bare Repo
+
+* author: bonfacem
+* status: accepted
+* reviewed-by: alexm, jnduli
+
+## Context
+
+The gn-guile markdown editor currently reads from normal git repositories. However, for GN's self-hosted git repository, we use bare repositories. Bare repositories only store the git objects, so we can't edit files in them directly.
+
+## Decision
+
+gn-guile and the cgit instance run on the same server. We will have one normal repository, and the bare repository, configurable by: "CURRENT_REPO_PATH", which has the normal raw files; and "CGIT_REPO_PATH", which is the bare repository. We will make edits to the normal repository, and once that is done, push locally to the cgit instance.
+
+## Consequences
+
+* When creating the gn-guile container, this introduces extra complexity in that we will have to make sure that the container has the correct write access to the bare repository in the container.
+* With this, we are coupled to our GN git set-up.
diff --git a/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists.gmi b/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists.gmi
new file mode 100644
index 0000000..1e3ee6a
--- /dev/null
+++ b/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists.gmi
@@ -0,0 +1,74 @@
+# [gn-transform-databases/ADR-000] Remodel GeneRIF Metadata Using predicateObject Lists
+
+* author: bonfacem
+* status: rejected
+* reviewed-by: pjotr, jnduli
+
+## Context
+
+In RDF 1.1 Turtle, the subject of a triple must be an IRI (e.g. a QName) or a blank node. As such, you cannot have a string literal as the subject.
In simpler terms, this is not possible: + +``` +"Unique expression signature of a system that includes the subiculum, layer 6 in cortex ventral and lateral to dorsal striatum, and the endopiriform nucleus. Expression in cerebellum is apparently limited to Bergemann glia ABA" dct:created "2007-08-31T13:00:47"^^xsd:datetime . +``` + +As of commit "397745b554e0", a work-around was to manually create a unique identifier for each comment for the GeneRIF table. This identifier was created by combining GeneRIF.Id with GeneRIF.VersionId. One challenge with this is that we create some coupling with MySQL's unique generation of the GeneRIF.Id column. Here's an example of snipped turtle entries: + +``` +gn:wiki-352-0 rdfs:comment "Ubiquitously expressed. Hypomorphic vibrator allele shows degeneration of interneurons and tremor and juvenile lethality; modified by CAST alleles of Nxf1. Knockout has hepatic steatosis and hypoglycemia." . +gn:wiki-352-0 rdf:type gnc:GNWikiEntry . +gn:wiki-352-0 gnt:symbol gn:symbolPitpna . +gn:wiki-352-0 dct:created "2006-03-10T15:39:29"^^xsd:datetime . +gn:wiki-352-0 gnt:belongsToSpecies gn:Mus_musculus . +gn:wiki-352-0 dct:hasVersion "0"^^xsd:int . +gn:wiki-352-0 dct:identifier "352"^^xsd:int . +gn:wiki-352-0 gnt:initial "BAH" . +gn:wiki-352-0 foaf:mbox "XXX@XXX.XXX" . +gn:wiki-352-0 dct:references ( pubmed:9182797 pubmed:12788952 pubmed:14517553 ) . +gn:wiki-352-0 gnt:belongsToCategory ( "Cellular distribution" "Development and aging" "Expression patterns: mature cells, tissues" "Genetic variation and alleles" "Health and disease associations" "Interactions: mRNA, proteins, other molecules" ) . +``` + +## Decision + +We want to avoid manually generating a unique identifier for each WIKI comment. We should instead have that UID be a blank node reference that we don't care about and use predicateObjectLists as an idiom for representing string literals that can't be subjects. 
+
+=> https://www.w3.org/TR/turtle/#grammar-production-predicateObjectList Predicate Object Lists
+
+The above transform (gn:wiki-352-0) would now be represented as:
+
+```
+[ rdfs:comment '''Ubiquitously expressed. Hypomorphic vibrator allele shows degeneration of interneurons and tremor and juvenile lethality; modified by CAST alleles of Nxf1. Knockout has hepatic steatosis and hypoglycemia.'''@en ] rdf:type gnc:GNWikiEntry ;
+        gnt:belongsToSpecies gn:Mus_musculus ;
+        dct:created "2006-03-10 12:39:29"^^xsd:datetime ;
+        dct:references ( pubmed:9182797 pubmed:12788952 pubmed:14517553 ) ;
+        foaf:mbox <XXX@XXX.XXX> ;
+        dct:identifier "352"^^xsd:integer ;
+        dct:hasVersion "0"^^xsd:integer ;
+        gnt:initial "BAH" ;
+        gnt:belongsToCategory ( "Cellular distribution" "Development and aging" "Expression patterns: mature cells, tissues" "Genetic variation and alleles" "Health and disease associations" "Interactions: mRNA, proteins, other molecules" ) ;
+        gnt:symbol gn:symbolPitpna .
+```
+
+The above can be loosely translated as:
+
+```
+_:comment rdfs:comment '''Ubiquitously expressed. Hypomorphic vibrator allele shows degeneration of interneurons and tremor and juvenile lethality; modified by CAST alleles of Nxf1. Knockout has hepatic steatosis and hypoglycemia.'''@en .
+_:comment rdf:type gnc:GNWikiEntry .
+_:comment dct:created "2006-03-10 12:39:29"^^xsd:datetime .
+_:comment dct:references ( pubmed:9182797 pubmed:12788952 pubmed:14517553 ) .
+_:comment foaf:mbox <XXX@XXX.XXX> .
+_:comment dct:identifier "352"^^xsd:integer .
+_:comment dct:hasVersion "0"^^xsd:integer .
+_:comment gnt:initial "BAH" .
+_:comment gnt:belongsToCategory ( "Cellular distribution" "Development and aging" "Expression patterns: mature cells, tissues" "Genetic variation and alleles" "Health and disease associations" "Interactions: mRNA, proteins, other molecules" ) .
+_:comment gnt:symbol gn:symbolPitpna .
+```
+
+## Consequences
+
+* Update SPARQL in tux02, tux01 in lockstep with updating GN3/GN2 and the XAPIAN index.
+* Reduction in size of the final output, and faster transform time, because predicateObjectLists produce terser RDF.
+
+## Rejection Rationale
+
+This proposal was rejected because relying on blank nodes as identifiers is opaque and not human-readable. We want to use human-readable identifiers where possible.
diff --git a/topics/ADR/gn-transform-databases/001-remodel-ncbi-transform-with-predicateobject-lists.gmi b/topics/ADR/gn-transform-databases/001-remodel-ncbi-transform-with-predicateobject-lists.gmi
new file mode 100644
index 0000000..073525a
--- /dev/null
+++ b/topics/ADR/gn-transform-databases/001-remodel-ncbi-transform-with-predicateobject-lists.gmi
@@ -0,0 +1,102 @@
+# [gn-transform-databases/ADR-001] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata Using predicateObject Lists
+
+* author: bonfacem
+* status: rejected
+* reviewed-by: pjotr, jnduli
+
+## Context
+
+We can model RIF comments using predicateObject lists as described in:
+
+=> https://issues.genenetwork.org/topics/ADR/gn-transform-databases/000-remodel-rif-transform-with-predicateobject-lists [ADR/gn-transform-databases] Remodel GeneRIF Metadata Using predicateObject Lists
+
+However, currently for NCBI RIFs we represent comments as blank nodes:
+
+```
+gn:symbolsspA rdfs:comment [
+        rdf:type gnc:NCBIWikiEntry ;
+        rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ;
+        gnt:belongsToSpecies gn:Mus_musculus ;
+        skos:notation taxon:511145 ;
+        gnt:hasGeneId generif:944744 ;
+        dct:hasVersion '1'^^xsd:int ;
+        dct:references pubmed:97295 ;
+        ...
+        dct:references pubmed:15361618 ;
+        dct:created "2007-11-06T00:38:00"^^xsd:datetime ;
+] .
+gn:symbolaraC rdfs:comment [
+        rdf:type gnc:NCBIWikiEntry ;
+        rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ;
+        gnt:belongsToSpecies gn:Mus_musculus ;
+        skos:notation taxon:511145 ;
+        gnt:hasGeneId generif:944780 ;
+        dct:hasVersion '1'^^xsd:int ;
+        dct:references pubmed:320034 ;
+        ...
+        dct:references pubmed:16369539 ;
+        dct:created "2007-11-06T00:39:00"^^xsd:datetime ;
+] .
+
+```
+
+Here we see a lot of duplicated entries for the same symbols. For the two entries above, everything is exactly the same except for the "gnt:hasGeneId" and "dct:references" predicates.
+
+## Decision
+
+We use predicateObjectLists with blankNodePropertyLists as an idiom to represent the GeneRIF comments.
+
+=> https://www.w3.org/TR/turtle/#grammar-production-predicateObjectList predicateObjectList
+=> https://www.w3.org/TR/turtle/#grammar-production-blankNodePropertyList blankNodePropertyList
+
+In so doing, we can de-duplicate the entries demonstrated above. A representation of the above RDF Turtle triples would be:
+
+```
+[ rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ]
+rdf:type gnc:NCBIWikiEntry ;
+dct:created "2007-11-06T00:39:00"^^xsd:datetime ;
+gnt:belongsToSpecies gn:Mus_musculus ;
+skos:notation taxon:511145 ;
+dct:hasVersion '1'^^xsd:int ;
+rdfs:seeAlso [
+    gnt:hasGeneId generif:944744 ;
+    gnt:symbol gn:symbolsspA ;
+    dct:references ( pubmed:97295 ... pubmed:15361618 ) ;
+] ;
+rdfs:seeAlso [
+    gnt:hasGeneId generif:944780 ;
+    gnt:symbol gn:symbolaraC ;
+    dct:references ( pubmed:320034 ... pubmed:16369539 ) ;
+] .
+```
+
+The above would translate to the following triples:
+
+```
+_:comment rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string .
+_:comment rdf:type gnc:NCBIWikiEntry .
+_:comment dct:created "2007-11-06T00:39:00"^^xsd:datetime .
+_:comment gnt:belongsToSpecies gn:Mus_musculus .
+_:comment skos:notation taxon:511145 .
+_:comment dct:hasVersion '1'^^xsd:int .
+_:comment rdfs:seeAlso _:metadata1 .
+_:comment rdfs:seeAlso _:metadata2 .
+_:metadata1 gnt:hasGeneId generif:944744 .
+_:metadata1 gnt:symbol gn:symbolsspA .
+_:metadata1 dct:references ( pubmed:97295 ... pubmed:15361618 ) .
+_:metadata2 gnt:hasGeneId generif:944780 .
+_:metadata2 gnt:symbol gn:symbolaraC .
+_:metadata2 dct:references ( pubmed:320034 ... pubmed:16369539 ) .
+```
+
+Beyond that, we intentionally use a sequence to store the list of PubMed references.
+
+## Consequences
+
+* De-duplication of comments during the transform while retaining the integrity of the RIF metadata.
+* Terser output, hence less work during the I/O-heavy transform.
+* Update SPARQL in tux02, tux01 in lockstep with updating GN3/GN2 and the XAPIAN index.
+
+## Rejection Rationale
+
+This proposal was rejected because relying on blank nodes as identifiers is opaque and not human-readable. We want to use human-readable identifiers where possible.
diff --git a/topics/ADR/gn-transform-databases/002-remodel-ncbi-transform-to-be-more-compact.gmi b/topics/ADR/gn-transform-databases/002-remodel-ncbi-transform-to-be-more-compact.gmi
new file mode 100644
index 0000000..ac06fc1
--- /dev/null
+++ b/topics/ADR/gn-transform-databases/002-remodel-ncbi-transform-to-be-more-compact.gmi
@@ -0,0 +1,127 @@
+# [gn-transform-databases/ADR-002] Remodel GeneRIF_BASIC (NCBI RIFs) Metadata To Be More Compact
+
+* author: bonfacem
+* status: proposal
+* reviewed-by: pjotr, jnduli
+
+## Context
+
+Currently, we represent NCBI RIFs as blank nodes that form the object of a given symbol:
+
+```
+gn:symbolsspA rdfs:comment [
+        rdf:type gnc:NCBIWikiEntry ;
+        rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ;
+        gnt:belongsToSpecies gn:Mus_musculus ;
+        skos:notation taxon:511145 ;
+        gnt:hasGeneId generif:944744 ;
+        dct:hasVersion '1'^^xsd:int ;
+        dct:references pubmed:97295 ;
+        ...
+        dct:references pubmed:15361618 ;
+        dct:created "2007-11-06T00:38:00"^^xsd:datetime ;
+] .
+gn:symbolaraC rdfs:comment [ + rdf:type gnc:NCBIWikiEntry ; + rdfs:comment "N-terminus verified by Edman degradation on mature peptide"^^xsd:string ; + gnt:belongsToSpecies gn:Mus_musculus ; + skos:notation taxon:511145 ; + gnt:hasGeneId generif:944780 ; + dct:hasVersion '1'^^xsd:int ; + dct:references pubmed:320034 ; + ... + dct:references pubmed:16369539 ; + dct:created "2007-11-06T00:39:00"^^xsd:datetime ; +] . +``` + +Moreover, we also store all the different versions of a comment: + +``` +mysql> SELECT * FROM GeneRIF_BASIC WHERE SpeciesId=1 AND TaxID=7955 AND GeneId=323473 AND PubMed_ID = 15680355\G +*************************** 1. row *************************** + SpeciesId: 1 + TaxID: 7955 + GeneId: 323473 + symbol: prdm1a + PubMed_ID: 15680355 +createtime: 2010-01-21 00:00:00 + comment: One of two mutations in which defects are observed in both cell populations: it leads to a complete absence of RB neurons and a reduction in neural crest cells + VersionId: 1 +*************************** 2. row *************************** + SpeciesId: 1 + TaxID: 7955 + GeneId: 323473 + symbol: prdm1a + PubMed_ID: 15680355 +createtime: 2010-01-21 00:00:00 + comment: prdm1 functions to promote the cell fate specification of both neural crest cells and sensory neurons + VersionId: 2 +``` + +## Decision + +First, we should only store the latest version of a given RIF entry and ignore all other versions. RIF entries in the GeneRIF_BASIC table are uniquely identified by the columns: SpeciesId, GeneId, PubMed_ID, createtime, and VersionId. Since we are storing the latest version of a given RIF entry, we drop the version identifier during the RDF transform. + +We use a unique identifier for a given comment, and use that as a triple's QName: + +> gn:rif-<speciesId>-<GeneId> + +Finally instead of: + +``` +<symbol> predicate <comment metadata> +``` + +We use: + +``` +<comment-uid> predicate object ; + ... (more metadata) . 
+```
+
+An example triple would take the form:
+
+```
+gn:rif-1-511145 rdfs:label '''N-terminus verified by Edman degradation on mature peptide'''@en .
+gn:rif-1-511145 rdf:type gnc:NCBIWikiEntry .
+gn:rif-1-511145 gnt:belongsToSpecies gn:Mus_musculus .
+gn:rif-1-511145 skos:notation taxon:511145 .
+gn:rif-1-511145 rdfs:seeAlso [
+    gnt:hasGeneId generif:944744 ;
+    gnt:symbol "sspA" ;
+    dct:references ( pubmed:97295 ... pubmed:15361618 )
+] .
+gn:rif-1-511145 rdfs:seeAlso [
+    gnt:hasGeneId generif:944780 ;
+    gnt:symbol "araC" ;
+    dct:references ( pubmed:320034 ... pubmed:16369539 )
+] .
+```
+
+To efficiently store GeneIds, symbols and references, we use blank nodes. This reduces redundancy and simplifies the triples compared to including these details within the subject:
+
+```
+gn:rif-1-511145-944744 rdfs:label '''N-terminus verified by Edman degradation on mature peptide'''@en .
+gn:rif-1-511145-944744 rdf:type gnc:NCBIWikiEntry .
+gn:rif-1-511145-944744 gnt:belongsToSpecies gn:Mus_musculus .
+gn:rif-1-511145-944744 skos:notation taxon:511145 .
+gn:rif-1-511145-944744 gnt:hasGeneId generif:944744 .
+gn:rif-1-511145-944744 gnt:symbol "sspA" .
+gn:rif-1-511145-944744 dct:references ( pubmed:97295 ... pubmed:15361618 ) .
+
+gn:rif-1-511145-944780 rdfs:label '''N-terminus verified by Edman degradation on mature peptide'''@en .
+gn:rif-1-511145-944780 rdf:type gnc:NCBIWikiEntry .
+gn:rif-1-511145-944780 gnt:belongsToSpecies gn:Mus_musculus .
+gn:rif-1-511145-944780 skos:notation taxon:511145 .
+gn:rif-1-511145-944780 gnt:hasGeneId generif:944780 .
+gn:rif-1-511145-944780 gnt:symbol "araC" .
+gn:rif-1-511145-944780 dct:references ( pubmed:320034 ... pubmed:16369539 ) .
+```
+
+## Consequences
+
+* More complex SQL query required for the transform.
+* De-duplication of RIF entries during the transform.
+* Terser output, hence less work during the I/O-heavy transform.
+* Update SPARQL in tux02, tux01 in lockstep with updating GN3/GN2 and the XAPIAN index.
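The "more complex SQL query" in the consequences above boils down to a group-wise maximum over the version column. A minimal sketch of the same selection logic in Python, assuming rows shaped like the GeneRIF_BASIC columns shown earlier (illustrative only, not GN transform code):

```python
# Sketch: keep only the latest VersionId per RIF key, mirroring a
# GROUP BY SpeciesId, GeneId, PubMed_ID, createtime / MAX(VersionId) query.
def latest_rifs(rows):
    """rows: dicts with SpeciesId, GeneId, PubMed_ID, createtime, VersionId, comment."""
    latest = {}
    for row in rows:
        key = (row["SpeciesId"], row["GeneId"], row["PubMed_ID"], row["createtime"])
        if key not in latest or row["VersionId"] > latest[key]["VersionId"]:
            latest[key] = row
    return list(latest.values())
```

In SQL the same thing would be a self-join against a `GROUP BY ... MAX(VersionId)` subquery; the Python form is shown here only because it is trivially testable.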
diff --git a/topics/ADR/gn3/000-add-test-cases-for-rdf.gmi b/topics/ADR/gn3/000-add-test-cases-for-rdf.gmi new file mode 100644 index 0000000..43ac2ba --- /dev/null +++ b/topics/ADR/gn3/000-add-test-cases-for-rdf.gmi @@ -0,0 +1,21 @@ +# [gn3/ADR-000] Add RDF Test Cases + +* author: bonfacem +* status: proposed +* reviewed-by: jnduli + +## Context + +We have no way of ensuring the integrity of our SPARQL queries in GN3. As such, GN3 is fragile to breaking changes when the TTL files are updated. + +## Decision + +In Virtuoso, we load all our data to a default named graph: <http://genenetwork.org>. For SPARQL/RDF tests, we should upload test ttl files to a test named graph: <http://cd-test.genenetwork.org>, and run our RDF unit tests against that named graph. + +## Consequences + +* Extra bootstrapping to load ttl files when running the test. +* Extra documentation to GN developers on how to run virtuoso locally to get the tests running. +* Testing against gn-machines to make sure that all things run accordingly. +* Extra maintenance costs to keep the TTL files in lockstep with the latest RDF changes during re-modeling. +* Improvement in GN3 reliability. 
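For the tests to hit the test named graph rather than the default one, each query request can be scoped with the standard SPARQL Protocol `default-graph-uri` parameter. A rough sketch (the endpoint URL and the `format` parameter are assumptions based on a stock local virtuoso, not GN3 code):

```python
# Sketch: build a SPARQL Protocol GET URL that scopes a query to the
# test named graph instead of the default <http://genenetwork.org> graph.
import urllib.parse

TEST_GRAPH = "http://cd-test.genenetwork.org"

def sparql_request_url(endpoint, query, graph=TEST_GRAPH):
    params = urllib.parse.urlencode({
        "query": query,
        "default-graph-uri": graph,  # standard SPARQL 1.1 Protocol parameter
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"
```

Pointing the same helper at the production graph is then a one-argument change, which keeps test and production queries structurally identical.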
diff --git a/topics/ADR/gn3/001-remove-stace-traces-in-gn3-error-response.gmi b/topics/ADR/gn3/001-remove-stace-traces-in-gn3-error-response.gmi
new file mode 100644
index 0000000..0910415
--- /dev/null
+++ b/topics/ADR/gn3/001-remove-stace-traces-in-gn3-error-response.gmi
@@ -0,0 +1,49 @@
+# [gn3/ADR-001] Remove Stack Traces in GN3
+
+* author: bonfacem
+* status: rejected
+* reviewed-by: jnduli, zach, pjotr, fredm
+
+## Context
+
+Currently, GN3 error responses include stack traces:
+
+```
+def add_trace(exc: Exception, jsonmsg: dict) -> dict:
+    """Add the traceback to the error handling object."""
+    return {
+        **jsonmsg,
+        "error-trace": "".join(traceback.format_exception(exc))
+    }
+
+
+def page_not_found(pnf):
+    """Generic 404 handler."""
+    current_app.logger.error("Handling 404 errors", exc_info=True)
+    return jsonify(add_trace(pnf, {
+        "error": pnf.name,
+        "error_description": pnf.description
+    })), 404
+
+
+def internal_server_error(pnf):
+    """Generic 500 handler."""
+    current_app.logger.error("Handling internal server errors", exc_info=True)
+    return jsonify(add_trace(pnf, {
+        "error": pnf.name,
+        "error_description": pnf.description
+    })), 500
+```
+
+
+## Decision
+
+Stack traces can give malicious actors extra context with which to compromise our system. As such, we should send a useful description of what went wrong, log the stack trace in our logs, and return an appropriate error status code. We can then use the logs to troubleshoot the system.
+
+## Consequences
+
+* Lockstep update in GN2 UI on how we handle GN3 errors.
+
+## Rejection Rationale
+
+The proposal to remove stack traces from error responses was rejected because they are essential for troubleshooting, especially when issues are difficult to reproduce or production logs are inaccessible.
Stack traces provide immediate error context, and removing them would complicate debugging by requiring additional effort to link logs with specific requests: a trade-off we are not willing to make at the moment.
diff --git a/topics/ADR/gn3/002-run-rdf-tests-in-build-container.gmi b/topics/ADR/gn3/002-run-rdf-tests-in-build-container.gmi
new file mode 100644
index 0000000..a8026ce
--- /dev/null
+++ b/topics/ADR/gn3/002-run-rdf-tests-in-build-container.gmi
@@ -0,0 +1,32 @@
+# [gn3/ADR-002] Move RDF Test Cases to Build Container
+
+* author: bonfacem
+* status: accepted
+* reviewed-by: jnduli
+
+## Context
+
+GN3 RDF tests are run against the CD's virtuoso instance. As such, we need to set special parameters when running tests:
+
+```
+SPARQL_USER = "dba"
+SPARQL_PASSWORD = "dba"
+SPARQL_AUTH_URI="http://localhost:8890/sparql-auth/"
+SPARQL_CRUD_AUTH_URI="http://localhost:8890/sparql-graph-crud-auth"
+FAHAMU_AUTH_TOKEN="XXXXXX"
+```
+
+This extra bootstrapping requires care, and ties the tests to CD or to special configuration when running locally. This leads to fragile tests that break CD. Moreover, to add tests to CD, we would have to add extra g-exps to gn-machines.
+
+This ADR is related to:
+
+=> /topics/ADR/gn3/000-add-test-cases-for-rdf.gmi gn3/ADR-000.
+
+## Decision
+
+Move the tests to the test build phase of the genenetwork3 package. These tests are added in the ".guix/genenetwork3-all-tests.scm" file instead of the main "genenetwork3" package definition in guix-bioinformatics. This way, all our "light" tests, i.e. unit tests, run in guix-bioinformatics, while all our heavier tests (in this case, the RDF tests) run in CD.
+
+## Consequences
+
+* Extra bootstrapping in gn3's .guix/genenetwork3-package.scm to get tests working.
+* GN3 RDF tests refactoring to use a virtuoso instance running in the background while tests are running.
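A throwaway background virtuoso for these build-phase tests could be driven by a minimal configuration along these lines (the section names follow stock virtuoso.ini conventions, but the ports, paths, and values here are assumptions for illustration; the actual gn-machines/Guix setup may differ):

```
[Database]
DatabaseFile   = ./test-virtuoso.db
ErrorLogFile   = ./test-virtuoso.log

[Parameters]
ServerPort     = 1111
DirsAllowed    = ., ./ttl-test-files

[HTTPServer]
ServerPort     = 8890
```

TTL files placed under a DirsAllowed directory can then be bulk-loaded into the test graph with virtuoso's ld_dir/rdf_loader_run functions via isql before the test suite runs.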
diff --git a/topics/ai/aider.gmi b/topics/ai/aider.gmi new file mode 100644 index 0000000..aa88e71 --- /dev/null +++ b/topics/ai/aider.gmi @@ -0,0 +1,16 @@ +# Aider + +=> https://aider.chat/ + +``` +python3 -m venv ~/opt/python-aider +~/opt/python-aider/bin/python3 -m pip install aider-install +~/opt/python-aider/bin/aider-install +``` + +Installed 1 executable: aider +Executable directory /home/wrk/.local/bin is already in PATH + +``` +aider --model gpt-4o --openai-api-key aa... +``` diff --git a/topics/ai/ontogpt.gmi b/topics/ai/ontogpt.gmi new file mode 100644 index 0000000..94bd165 --- /dev/null +++ b/topics/ai/ontogpt.gmi @@ -0,0 +1,7 @@ +# OntoGPT + +python3 -m venv ~/opt/ontogpt +~/opt/ontogpt/bin/python3 -m pip install ontogpt + + +runoak set-apikey -e openai diff --git a/topics/authentication/architecture.gmi b/topics/authentication/architecture.gmi index 931f9cb..2200745 100644 --- a/topics/authentication/architecture.gmi +++ b/topics/authentication/architecture.gmi @@ -54,13 +54,14 @@ Users are granted privileges (see "Privileges" section) to act upon resources, t Examples of "types" of resources on the system: -- system: The system itself -- group: Collection of users considered a group -- genotype: A resource representing a genotype trait -- phenotype: A resource representing a phenotype trait -- mrna: A resource representing a collection of mRNA Assay traits -- inbredset-group: A resource representing an InbredSet group - +* system: The system itself +* group: Collection of users considered a group +* genotype: A resource representing a genotype trait +* phenotype: A resource representing a phenotype trait +* mrna: A resource representing a collection of mRNA Assay traits +* inbredset-group: A resource representing an InbredSet group + +---- * TODO: Figure out a better name/description for "InbredSet group" -- so far, I have "a classification/grouping of traits/datasets within a species". Another is to use the term "population". 
## Users
diff --git a/topics/authentication/development-guide.gmi b/topics/authentication/development-guide.gmi
new file mode 100644
index 0000000..840c26b
--- /dev/null
+++ b/topics/authentication/development-guide.gmi
@@ -0,0 +1,60 @@
+# GN-AUTH FAQ
+
+## Tags
+
+* type: docs, documentation
+* status: ongoing, open
+* keywords: authentication, authorisation, docs, documentation
+* author: @jnduli
+
+## Quick configuration for local development
+
+Save a `local_settings.conf` file with the contents:
+
+```
+SQL_URI = "mysql://user:password@localhost/db_name" # mysql uri
+AUTH_DB = "/absolute/path/to/auth.db/" # path to sqlite db file
+GN_AUTH_SECRETS = "/absolute/path/to/secrets/secrets.conf"
+```
+
+The `GN_AUTH_SECRETS` path has two functions:
+
+* It contains the `SECRET_KEY` we use in our application.
+* The folder containing this file is used to store our JWKs.
+
+An example is:
+
+```
+SECRET_KEY = "qQIrgiK29kXZU6v8D09y4uw_sk8I4cqgNZniYUrRoUk"
+```
+
+## Quick set-up CLI commands
+
+```
+export FLASK_DEBUG=1 AUTHLIB_INSECURE_TRANSPORT=1 OAUTHLIB_INSECURE_TRANSPORT=1 FLASK_APP=gn_auth/wsgi
+export GN_AUTH_CONF=/absolute/path/to/local_settings.conf
+flask init-dev-clients --client-uri "http://localhost:port"
+flask init-dev-users
+flask assign-system-admin 0ad1917c-57da-46dc-b79e-c81c91e5b928
+```
+
+## Handling verification for users in local development
+
+* Run `flask init-dev-users`, which will create verified local users.
+* Alternatively, run `UPDATE users SET verified=1` on the sqlite3 auth database.
+
+## Errors related to unsupported clients/redirect URIs for client
+
+Rerun:
+
+```
+FLASK_DEBUG=1 AUTHLIB_INSECURE_TRANSPORT=1 OAUTHLIB_INSECURE_TRANSPORT=1 \
+    GN_AUTH_CONF=/absolute/path/to/local_settings.conf FLASK_APP=gn_auth/wsgi \
+    flask init-dev-clients --client-uri "http://localhost:port_you_use_for_gn2"
+```
+
+This will update your clients list to have all the related URLs we want.
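The manual `UPDATE users set verified=1` step above can be scripted against the sqlite file pointed to by `AUTH_DB`. A sketch (only the UPDATE statement comes from this guide; the function name and the minimal schema in the usage notes are hypothetical):

```python
# Sketch: mark every local development user as verified in the gn-auth
# sqlite database; returns the number of rows updated.
import sqlite3

def verify_all_users(auth_db):
    with sqlite3.connect(auth_db) as conn:
        cursor = conn.execute("UPDATE users SET verified=1")
        return cursor.rowcount  # the connection context manager commits on success
```

Run it with the same path you set for `AUTH_DB` in `local_settings.conf`.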
+
+## 500 Server Error: INTERNAL SERVER ERROR
+
+When you see the error `500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:8081/auth/token`, restart the gn2 server.
diff --git a/topics/authentication/permission_hooks.gmi b/topics/authentication/permission_hooks.gmi
new file mode 100644
index 0000000..dd475b6
--- /dev/null
+++ b/topics/authentication/permission_hooks.gmi
@@ -0,0 +1,62 @@
+# Permission Hooks System Design
+## Status: Draft
+
+## Objective
+
+We want to achieve:
+
+- Default permissions for users that come from `.edu` domains.
+- Support for visitors to the website.
+
+This should be dynamic and easily maintainable.
+
+## Design
+
+### Events
+
+* Use middleware to plug into the various aspects of a request's life cycle. We'll plug into `after_request` for providing default permissions.
+* Create a hook which contains: the event to handle, the part of the life cycle the hook plugs into, and the actual functions to call.
+* Events can be identified using their `request.base_url` parameter.
+* Each hook registers itself to the global set of hooks (TODO: Figure out how to automatically handle the registration).
+
+
+```
+@app.after_request
+def handle_hooks(response):
+    for hook in hooks:
+        if hook.lifecycle == "after_request" and hook.can_handle():
+            hook.run()
+    return response
+
+
+hooks = [RegistrationHook(), ...]
+
+
+class RegistrationHook:
+
+    def can_handle(self):
+        return request.base_url == "register"
+
+    @property
+    def lifecycle(self):
+        return "after_request"
+
+    def run(self):
+        ...
+```
+
+### Privilege Hooks
+
+* After login/registration, use the email to get extra privileges assigned to a user. We use `login` too to ensure that all users have the most up-to-date roles and privileges.
+* This means that any user gets assigned these privileges and normal workflows can happen.
+
+### Storage
+
+* Create a new role that contains the default privileges we want to assign to users depending on their domain.
+* This role will link up with the privileges to be assigned to said user.
+* Example privileges we may want to add to users in the `.edu` domain:
+ * group:resource:edit-resource
+ * system:inbredset:apply-case-attribute-edit
+ * system:inbredset:edit-case-attribute
+ * system:inbredset:view-case-attribute
+* Create an extra table that provides a link between some `email identifier` and the role we'd like to pre-assign. We can use Python regexes for the email identifier, e.g. `.*\.edu$` or `.*\.uthsc\.edu$`.
+* This will be the table used by the Registration Hook.
+* This also allows us to edit roles/privileges without code releases.
diff --git a/topics/biohackathon/biohackrxiv2024.gmi b/topics/biohackathon/biohackrxiv2024.gmi
new file mode 100644
index 0000000..a159ec4
--- /dev/null
+++ b/topics/biohackathon/biohackrxiv2024.gmi
@@ -0,0 +1,7 @@
+# BioHackrXiv
+
+We have a hacking week in Barcelona to work on BioHackrXiv.
+
+# Tasks
+
+* [ ] ORCIDs for authors in PDF
diff --git a/topics/R-qtl2-format-notes.gmi b/topics/data/R-qtl2-format-notes.gmi
index e0109b1..3397b5e 100644
--- a/topics/R-qtl2-format-notes.gmi
+++ b/topics/data/R-qtl2-format-notes.gmi
@@ -1,4 +1,4 @@
-# R/qtl2 Format Notes
+# R/qtl2 and GEMMA Format Notes
 
 This document is mostly to help other non-biologists figure out their way around the format(s) of the R/qtl2 files. It mostly deals with the meaning/significance of the various fields.
@@ -12,6 +12,39 @@ and
 We are going to consider the "non-transposed" form here, for ease of documentation: simply flip the meanings as appropriate for the transposed files.
 
+To convert between formats we should probably use Python, as that is what we can use as an 'esperanto'.
+
+## Control files
+
+Both GN and R/qtl2 have control files.
For GN it basically describes the individuals (genometypes) and looks like: + +```js +{ + "mat": "C57BL/6J", + "pat": "DBA/2J", + "f1s": ["B6D2F1", "D2B6F1"], + "genofile" : [{ + "title" : "WGS-based (Mar2022)", + "location" : "BXD.8.geno", + "sample_list" : ["BXD1", "BXD2", "BXD5", "BXD6", "BXD8", "BXD9", "BXD11", "BXD12", "BXD13", "BXD14", "BXD15", "BXD16", "BXD18", "BXD19", "BXD20", "BXD21", "BXD22", "BXD23", "BXD24", "BXD24a", "BXD25", "BXD27", "BXD28", "BXD29", "BXD30", "BXD31", "BXD32", "BXD33", "BXD34", "BXD35", "BXD36", "BXD37", "BXD38", "BXD39", "BXD40", "BXD41", "BXD42", "BXD43", "BXD44", + ...]}]} +``` + +In gn-guile this gets parsed in gn/data/genotype.scm to fetch the individuals that match the genotype and phenotype layouts. + +## pheno files and phenotypes + +The standard GEMMA input files are not very good for trouble shooting. R/qtl2 has at least the individual or genometype ID for every line: + +``` +id,bolting_days,seed_weight,seed_area,ttl_seedspfruit,branches,height,pc_seeds_aborted,fruit_length +MAGIC.1,15.33,17.15,0.64,45.11,10.5,NA,0,14.95 +MAGIC.2,22,22.71,0.75,49.11,4.33,42.33,1.09,13.27 +MAGIC.3,23,21.03,0.68,57,4.67,50,0,13.9 +``` + +This is a good standard and can match with the control files. + ## geno files > The genotype data file is a matrix of individuals × markers. The first column is the individual IDs; the first row is the marker names. @@ -22,10 +55,6 @@ For GeneNetwork, this means that the first column contains the Sample names (pre The first column of the gmap/pmap file contains genetic marker values. There are no Individuals/samples (or strains) here. -## pheno files - -The first column is the list of individuals (samples/strains) whereas the first column is the list of phenotypes. - ## phenocovar files These seem to contain extra metadata for the phenotypes. 
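Since Python is the suggested conversion 'esperanto', here is a small sketch that pulls one phenotype column out of an R/qtl2-style pheno CSV (as in the MAGIC example above) while keeping the individual IDs for troubleshooting; the function name and NA handling are assumptions for illustration, not GN code:

```python
# Sketch: read an R/qtl2 pheno CSV and return (id, value) pairs for one
# phenotype; "NA" becomes None (GEMMA expects "NA" in its pheno file).
import csv
import io

def pheno_column(csv_text, phenotype):
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["id"], None if row[phenotype] == "NA" else float(row[phenotype]))
            for row in reader]
```

Writing the second element of each pair (with None rendered as "NA") then gives a GEMMA-style single-column pheno file while the IDs remain available for cross-checking against the control file.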
diff --git a/topics/data/precompute/steps.gmi b/topics/data/precompute/steps.gmi index 75e3bfd..d22778a 100644 --- a/topics/data/precompute/steps.gmi +++ b/topics/data/precompute/steps.gmi @@ -13,8 +13,18 @@ We will track precompute steps here. We will have: Trait archives will have steps for * [X] step p1: list-traits-to-compute -* [+] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper -* [ ] step p3: gemma-to-lmdb: create a clean vector +* [X] step p2: gemma-lmm9-loco-output: Compute standard GEMMA lmm9 LOCO vector with gemma-wrapper +* [X] step p3: gemma-to-lmdb: create a clean vector + +Start precompute + +* [ ] Fetch traits on tux04 +* [ ] Set up runner on tux04 and others +* [ ] Run on Octopus + +Work on published data + +* [ ] Fetch traits The DB itself can be updated from these @@ -22,8 +32,11 @@ The DB itself can be updated from these Later +* [ ] Rqtl2: Compute Rqtl2 vector * [ ] bulklmm: Compute bulklmm vector +Interestingly this work coincides with Arun's work on CWL. Rather than trying to write a workflow in bash, we'll use ccwl and accompanying tools to scale up the effort. + # Tags * assigned: pjotrp @@ -36,10 +49,10 @@ Later * [ ] Check Artyoms LMDB version for kinship and maybe add LOCO * [+] Create JSON metadata controller for every compute incl. 
type of content
-* [+] Create genotype archive
-* [+] Create kinship archive
+* [X] Create genotype archive
+* [X] Create kinship archive
 * [+] Create trait archives
-* [+] Kick off lmm9 step
+* [X] Kick off lmm9 step
 * [ ] Update DB step v1
 
 # Step p1: list traits to compute
@@ -62,7 +75,7 @@ At this point we can write
 {"2":9.40338,"3":10.196,"4":10.1093,"5":9.42362,"6":9.8285,"7":10.0808,"8":9.17844,"9":10.1527,"10":10.1167,"11":9.88551,"13":9.58127,"15":9.82312,"17":9.88005,"19":10.0761,"20":10.2739,"21":9.54171,"22":10.1056,"23":10.5702,"25":10.1433,"26":9.68685,"28":9.98464,"29":10.132,"30":9.96049,"31":10.2055,"35":10.1406,"36":9.94794,"37":9.96864,"39":9.31048}
 ```
-Note that it (potentially) includes the parents. Also the strain-id is a string and we may want to plug in the strain name. To allow for easy comparison downstream. Finally we may want to store a checksum of sorts. In Guile this can be achieved with:
+Note that it (potentially) includes the parents and that is corrected when generating the phenotype file for GEMMA. Also the strain-id is a string and we may want to plug in the strain name, to allow for easy comparison downstream. Finally, we may want to store a checksum of sorts. In Guile this can be achieved with:
 ```scheme
 (use-modules (rnrs bytevectors)
diff --git a/topics/database/mariadb-database-architecture.gmi b/topics/database/mariadb-database-architecture.gmi
new file mode 100644
index 0000000..0454d71
--- /dev/null
+++ b/topics/database/mariadb-database-architecture.gmi
@@ -0,0 +1,830 @@
+# MariaDB Database Architecture
+
+The GeneNetwork database is running on MariaDB and the layout is almost carved in stone because so much code depends on it.
+We are increasingly moving material out into lmdb (genotypes and phenotypes) and virtuoso (all types of metadata), but this proves a lengthy and rather tedious process. We also run redis for caching, sqlite for authentication, and xapian for search!
+
+In this document we'll discuss where things are, where they ought to go, and how the nomenclature should change.
+
+An SVG of the SQL layout can be found here:
+
+=> https://raw.githubusercontent.com/genenetwork/gn-gemtext-threads/main/topics/database/sql.svg
+
+# Nomenclature
+
+These are the terms we use:
+
+* Genotypes
+* Case or genometype: individual, strain, sample
+* ProbeData: Now almost obsolete. [Comment by RWW perhaps for a footnote: In GeneNetwork 1 we had built and maintained a table for individual "Probe level" data simply because the Affymetrix data sets were so large. For example, the BXD Family: "UMUTAffy Hippocampus Exon (9Feb09) RMA" array data consists of 1.236 million "probesets", each of which is a summary of many individual probe assays (ProbeData)—a total of 4.5 million probes (see https://www.thermofisher.com/order/catalog/product/900817). In GN1 we built a special interface to interrogate these 4.5 million individual probes--extremely useful for studying the fine structure of mRNA expression. We thought it best to split these very large "probe-level" data sets from the much smaller and more widely used "ProbeSetData". The term "Probe" in this particular context (Affymetrix Exon arrays) refers to short nucleotide probes used by Affymetrix and other microarray vendors. Affymetrix "Exon"-type arrays consist of 25 nt hybridization probes that target relatively specific parts of RNAs--mainly exons but also many intronic sequences.]
+* ProbeSetData: trait/sample values almost exclusively used for molecular data types (mRNA, protein, methylation assays, metabolomics, etc). [Comment by RWW perhaps for a footnote: The term "ProbeSetData" should ideally be changed to "High_Content_Data_Assays". In 2003 the only high content data assays we had were Affymetrix microarrays that measured mRNA level, and the vendor called their assays "ProbeSets". We used this now obsolete term.
Most ProbeSetData in GN1 and GN2 as of 2024 are measurements of molecular traits that can be tagged to a single genome location--the location of the gene from which the mRNA and its derivative protein are transcribed and translated, or in the case of epigenomic studies--the site at which the genome is methylated. When these three types of molecular traits are mapped, we typically add a mark to all graphic output maps that highlights the location of the "parent" gene. For example, the sonic hedgehog gene in mice is located on chromosome 5 at about 28.457 Mb on the mm10 assembly (aka GRCm38). When we measure the expression of Shh mRNA, we place a purple triangle at the coordinate of the Shh gene. Two notes: 1. There are at least three ProbeSetData types that do NOT have parent genes--metabolomic data, metagenomic data, and new high-content brain connectome data. When we do NOT know the location of a parent gene, we should NOT place any mark along the X-axis. 2. Ideally GN databases would define the TYPE of high-content data, so that the code could fork to the correct GUI for that particular data type. Connectome data for the brain is an example of a data type that is very large (40,000 measurements per brain), that is truly high-content data, but that is NOT molecular. Time series data may also fall into this category.]
+* ProbeSetFreeze: points to datasets
+
+## More on naming
+
+Naming convention-wise, there is a confusing use of id and data-id in particular. We should stick to the table-id naming.
+
+# The small test database (2GB)
+
+The default install comes with a smaller database which includes a
+number of the BXDs and the Human liver dataset (GSE9588).
+
+It can be downloaded from:
+
+=> https://files.genenetwork.org/database/
+
+Try the latest one first.
+ +# GeneNetwork database + +Estimated table sizes with metadata comment for the important tables + +select table_name,round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` from information_schema.TABLES where table_schema = "db_webqtl" order by data_length; + +``` ++-------------------------+------------+ +| table_name | Size in MB | Should be named: ++-------------------------+------------+ +| PublishData | 22.54 | ClassicTraitValues <- data-id, strain-id, value (3M traits) +| PublishSE | 4.71 | ClassicTraitValueError (300K traits) <- data-id, strain-id, value +| PublishXRef | 2.18 | List of publications <- id, data-id, inbred-id, pheno-id, pub-id +| ProbeSetData | 59358.80 | BulkTraitValues <- id, strain, value +| ProbeSetSE | 14551.02 | BulkTraitValueError <- SE values aligns with ProbeSetData +| ProbeSetXRef | 4532.89 | PrecomputedLRS <- precomputed LRS values, pointing to dataset+trait +| ProbeSet | 2880.21 | ProbeSetInfo <- over utilized mRNA probeset description, e.g. 100001_at comes with sequence info +| ProbeSetFreeze | 0.22 | DatasetInfo <- dataset description, e.g. "Hippocampus_BXD_Jun06" - probesetfreezeid points to dataset, shortname, public? +| Probe | 2150.30 | ProbeInfo <- Probe trait info incl sequence, id, probeset-id +| ProbeFreeze | 0.06 | Dataset names <- Similar to ProbesetFreeze, id, chip-id, inbredset-id, tissue-id +| Phenotype | 6.50 | PhenotypeMeta <- "Hippocampus weight", id, prepublish short-name, postpublish short-name +| ProbeXRef | 743.38 | ProbeFreezeDataIDs <- link ProbeFreeze-Id,Probe-Id with Data-Id +| Datasets | 2.31 | DatasetMeta <- "Data generated by...", investigator-id, publication title +| NStrain | 4.80 | StrainCountDataId <- Strains used in dataset, count, strain-id, data-id +| Strain | 1.07 | StrainNames <- with species ID and alias, id, species-id, name +| TissueProbeSetData | 74.42 | <- link Id,TissueID with value +| TissueProbeSetXRef | 14.73 | TissueGeneTable? 
<- data-id, gene-id, mean, symbol, TissueProbeSetFreezeId | ProbesetId | DataId +| TissueProbeSetFreeze | 0.01 | tissueprobefreeze-id +| InbredSet | 0.01 | InbredSetMeta -> Id,SpeciesId,FullName +| ProbeData | 22405.44 | (OLD?) mRNAStrainValues used for partial correlations <- id, strain, value = individual probe data (mRNA) [GN1,GN3] +| ProbeSE | 6263.83 | (OLD?) Trait Error <- trait SE aligns with ProbeData? [GN3] ++-------------------------+------------+ +``` +Less commonly used tables: + +``` ++-------------------------+------------+ +| table_name | Size in MB | ++-------------------------+------------+ +| LCorrRamin3 | 18506.53 | +| SnpAll | 15484.67 | +| SnpPattern | 9177.05 | +| QuickSearch | 5972.86 | +| GenoData | 3291.91 | Strain by genotype - only used in GN1 +| CeleraINFO_mm6 | 989.80 | +| pubmedsearch | 1032.50 | +| GeneRIF_BASIC | 448.54 | +| BXDSnpPosition | 224.44 | +| EnsemblProbe | 133.66 | +| EnsemblProbeLocation | 105.49 | +| Genbank | 37.71 | +| AccessLog | 42.38 | +| GeneList | 34.11 | +| Geno | 33.90 | Marker probe info (incl. 
sequence) +| MachineAccessLog | 28.34 | +| IndelAll | 22.42 | +| ProbeH2 | 13.26 | +| GenoXRef | 22.83 | +| TempData | 8.35 | +| GeneList_rn3 | 5.54 | +| GORef | 4.97 | +| temporary | 3.59 | +| InfoFiles | 3.32 | +| Publication | 3.42 | +| Homologene | 5.69 | +| GeneList_rn33 | 2.61 | +| GeneRIF | 2.18 | +| Vlookup | 1.87 | +| H2 | 2.18 | +| IndelXRef | 2.91 | +| GeneMap_cuiyan | 0.51 | +| user_collection | 0.30 | +| CaseAttributeXRef | 0.44 | +| StrainXRef | 0.56 | +| GeneIDXRef | 0.77 | +| Docs | 0.17 | +| News | 0.17 | +| GeneRIFXRef | 0.24 | +| Sample | 0.06 | +| login | 0.06 | +| user | 0.04 | +| TableFieldAnnotation | 0.05 | +| DatasetMapInvestigator | 0.05 | +| User | 0.04 | +| TableComments | 0.02 | +| Investigators | 0.02 | +| DBList | 0.03 | +| Tissue | 0.02 | +| GeneChip | 0.01 | +| GeneCategory | 0.01 | +| SampleXRef | 0.01 | +| SnpAllele_to_be_deleted | 0.00 | +| Organizations | 0.01 | +| PublishFreeze | 0.00 | +| GenoFreeze | 0.00 | Used for public/private +| Chr_Length | 0.01 | +| SnpSource | 0.00 | +| AvgMethod | 0.00 | +| Species | 0.00 | +| Dataset_mbat | 0.00 | +| TissueProbeFreeze | 0.00 | +| EnsemblChip | 0.00 | +| UserPrivilege | 0.00 | +| CaseAttribute | 0.00 | +| MappingMethod | 0.00 | +| DBType | 0.00 | +| InfoFilesUser_md5 | 0.00 | +| GenoCode | 0.00 | +| DatasetStatus | 0.00 | +| GeneChipEnsemblXRef | 0.00 | +| GenoSE | 0.00 | +| user_openids | 0.00 | +| roles_users | 0.00 | +| role | 0.00 | +| Temp | NULL | ++-------------------------+------------+ +97 rows in set, 1 warning (0.01 sec) +``` + +All *Data tables are large + +## Tables containing trait values + +A trait on GN is defined by a trait-id with a dataset-id. + +=> https://genenetwork.org/show_trait?trait_id=10031&dataset=BXDPublish + +The trait-id can also be a probe name + +=> https://genenetwork.org/show_trait?trait_id=1441566_at&dataset=HC_M2_0606_P + +One of the more problematic aspects of GN is that there are two tables containing trait values (actually there are three!). 
ProbeSetData mostly contains expression data. PublishData contains 'classical' phenotypes. ProbeData is considered defunct.
+
+So, a set of trait values is described either by dataset+probe (trait_id), or by BXDPublish --- which is its own table --- plus an identifier, here 10031.
+
+OK, let's look at the ProbeSetData (expression) traits:
+
+```
+MariaDB [db_webqtl]> select * from ProbeSetData limit 5;
++----+----------+-------+
+| Id | StrainId | value |
++----+----------+-------+
+| 1 | 1 | 5.742 |
+| 1 | 2 | 5.006 |
+| 1 | 3 | 6.079 |
+| 1 | 4 | 6.414 |
+| 1 | 5 | 4.885 |
++----+----------+-------+
+5 rows in set (0.193 sec)
+MariaDB [db_webqtl]> select * from ProbeData limit 5;
++--------+----------+---------+
+| Id | StrainId | value |
++--------+----------+---------+
+| 503636 | 42 | 11.6906 |
+| 503636 | 43 | 11.4205 |
+| 503636 | 44 | 11.2491 |
+| 503636 | 45 | 11.2373 |
+| 503636 | 46 | 12.0471 |
++--------+----------+---------+
+5 rows in set (0.183 sec)
+```
+
+ProbeSet describes ProbeSetData. I.e., every probe ID comes with a sequence (microarray) etc.
+
+As for duplicated data: duplicated or "detached" data happens sometimes, though that's not related to the PublishData/ProbeSetData distinction (unless this is done deliberately for some reason). I believe that whether data is entered as one or the other primarily comes down to the desire/need to divide it into datasets (or by tissue) within a group (with mRNA expression data just being the most common reason for this). I've encountered a situation before with Arthur where there was data in ProbeSetData that wasn't also in ProbeSetXRef.
+
+Can you give an example of exactly what you mean? PublishData would be stuff like sex, weight, etc (is this what you mean?) while ProbeSetData is used for mRNA expression data (except for a few situations where it isn't lol).
+
+That being said, *functionally*, I think the only real distinction (aside from what metadata is displayed) is that "ProbeSet" data has extra levels of "granularity": it is also organized by tissue type and can be split into "datasets", while "PublishData" traits are only associated with a Group (InbredSet in the DB). That's why some non-mRNA expression data is still classified as "ProbeSet" - I think it's basically just a way to separate it into datasets within a group, often for specific tissues.
+
+So the organization is something like this:
+
+```
+Group -> PublishData
+Group -> Tissue -> Dataset -> ProbeSetData
+```
+
+## ProbeData
+
+[OBSOLETE] ProbeData, meanwhile, is a table with fine-grained probe-level Affymetrix data only. It contained 1 billion rows as of March 2016. This table may be *deleted* later since it is only used by the Probe Table display in GN1. Not used in GN2.
+"ProbeData" should probably be "AssayData" or something more neutral.
+
+In comparison, the "ProbeSetData" table contains more molecular assay data, including probe set data, RNA-seq data, proteomic data, and metabolomic data: 2.5 billion rows as of March 2016.
+ProbeData contains data only for Affymetrix probe-level data (e.g. Exon array probes and M430 probes).
+
+"StrainId" should be "CaseId" or "SampleId" or "GenometypeId", see nomenclature above.
+
+```
+select * from ProbeData limit 2;
++--------+----------+---------+
+| Id | StrainId | value |
++--------+----------+---------+
+| 503636 | 42 | 11.6906 |
+| 503636 | 43 | 11.4205 |
++--------+----------+---------+
+2 rows in set (0.00 sec)
+
+select count(*) from ProbeData limit 2;
++-----------+
+| count(*) |
++-----------+
+| 976753435 |
++-----------+
+1 row in set (0.00 sec)
+```
+
+## PublishData
+
+These are the classic phenotypes under BXDPublish.
+ +``` +MariaDB [db_webqtl]> select * from PublishData where StrainId=5 limit 5; ++---------+----------+------------+ +| Id | StrainId | value | ++---------+----------+------------+ +| 8967043 | 5 | 49.000000 | +| 8967044 | 5 | 50.099998 | +| 8967045 | 5 | 403.000000 | +| 8967046 | 5 | 45.500000 | +| 8967047 | 5 | 44.900002 | ++---------+----------+------------+ +5 rows in set (0.265 sec) +MariaDB [db_webqtl]> select * from PublishSE where StrainId=5 limit 5; ++---------+----------+-------+ +| DataId | StrainId | error | ++---------+----------+-------+ +| 8967043 | 5 | 1.25 | +| 8967044 | 5 | 0.71 | +| 8967045 | 5 | 8.6 | +| 8967046 | 5 | 1.23 | +| 8967047 | 5 | 1.42 | ++---------+----------+-------+ +5 rows in set (0.203 sec) +MariaDB [db_webqtl]> select * from PublishXRef limit 2; ++-------+-------------+-------------+---------------+---------+-------------------+----------------+------------------+------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Id | InbredSetId | PhenotypeId | PublicationId | DataId | mean | Locus | LRS | additive | Sequence | comments | ++-------+-------------+-------------+---------------+---------+-------------------+----------------+------------------+------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 10001 | 1 | 4 | 116 | 8967043 | 52.13529418496525 | rs48756159 | 13.4974911471087 | 2.39444435069444 | 1 | robwilliams modified post_publication_description at Mon Jul 30 14:58:10 2012 +robwilliams modified post_publication_description at Sat Jan 30 13:48:49 2016 + | +| 10002 | 1 | 10 | 116 | 8967044 | 52.22058767430923 | rsm10000005699 | 22.004269639323 | 2.08178575714286 | 1 | robwilliams modified phenotype at Thu Oct 28 21:43:28 2010 + | 
++-------+-------------+-------------+---------------+---------+-------------------+----------------+------------------+------------------+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ +2 rows in set (0.328 sec) +``` + +## ProbeSet + + +Comment: PLEASE CHANGE TABLE NAME and rework fields carefully. This is a terrible table but it works well (RWW March 2016). It is used in combination with the crucial TRAIT DATA and ANALYSIS pages in GN1 and GN2. It is also used by annotators using the UPDATE INFO AND DATA web form to correct and update annotation. It is used by Arthur to enter new annotation files and metadata for arrays, genes, proteins, metabolites. The main problem with this table is that it is doing too much work. And it is not doing enough because it is huge, but does not track changes. The plan is to migrate to lmdb for that. + +Initially (2003) this table contained only Affymetrix ProbeSet data for mouse (U74aV2 initially). Many other array platforms for different species were added. At least four other major categories of molecular assays have been added since about 2010: + +1. RNA-seq annotation and sequence data for transcripts using ENSEMBL identifiers or NCBI NM_XXXXX and NR_XXXXX type identifiers + +2. Protein and peptide annotation and sequence data (see BXD Liver Proteome data, SRM and SWATH type data) with identifiers such as "abcb10_q9ji39_t311" for SRM data and "LLGNMIVIVLGHHLGKDFTPAAQAA" for SWATH data where the latter is just the peptide fragment that has been quantified. Data first entered in 2015 for work by Rudi Aebersold and colleagues. + +3. Metabolite annotation and metadata (see BXD Liver Metabolome data) with identifiers that are usually Mass charge ratios such as "149.0970810_MZ" + +4. Epigenomic and methylome data (e.g. 
Human CANDLE Methylation data with identifiers such as "cg24523000")
+
+It would make good sense to break this table into four or more types of molecular assay metadata or annotation tables (AssayRNA_Anno, AssayProtein_Anno, AssayMetabolite_Anno, AssayEpigenome_Anno, AssayMetagenome_Anno), since these assays will have many differences in annotation content compared to RNAs (RWW).
+
+Some complex logic is used to update the contents of this table when annotators modify and correct the information (for example, updating gene symbols). These features were requested by Rob so that annotating one gene symbol in one species would annotate all gene symbols in the same species based on a common NCBI GeneID number. For example, changing the gene alias for one ProbeSet.Id will change the list of aliases in all instances with the same gene symbol.
+
+If the ProbeSet.BlatSeq (or is this ProbeSet.TargetSeq?) is identical between different ProbeSet.Ids then the annotation is forced to be the same even if the symbol or GeneID is different. This "feature" was implemented when we found many probe sets with identical sequence but different annotations and identifiers.
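The alias-propagation logic described above can be sketched as follows. This is a hypothetical illustration using an in-memory SQLite stand-in for the ProbeSet table; the column subset and rows are assumptions, not the production schema:

```python
import sqlite3

# In-memory SQLite stand-in for ProbeSet (assumed column subset, made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProbeSet (Id INTEGER, Name TEXT, GeneId INTEGER, Symbol TEXT, alias TEXT)"
)
conn.executemany(
    "INSERT INTO ProbeSet VALUES (?, ?, ?, ?, ?)",
    [
        (1, "93288_at", 76709, "Arpc2", "AK008777"),
        (2, "93289_at", 76709, "Arpc2", "old-alias"),  # same NCBI GeneID as row 1
        (3, "95000_at", 20423, "Shh", "Hx"),
    ],
)

def update_alias(gene_id, new_alias):
    # Annotating one probe set propagates to every row sharing the same GeneId.
    conn.execute("UPDATE ProbeSet SET alias = ? WHERE GeneId = ?", (new_alias, gene_id))

update_alias(76709, "AK008777; AI835883")
print(conn.execute(
    "SELECT Name, alias FROM ProbeSet WHERE GeneId = 76709 ORDER BY Id"
).fetchall())
# [('93288_at', 'AK008777; AI835883'), ('93289_at', 'AK008777; AI835883')]
```

Rows with a different GeneId are left untouched, which is the behaviour the annotation form relies on.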
+ + +``` +select count(*) from ProbeSet limit 5; ++----------+ +| count(*) | ++----------+ +| 4351030 | ++----------+ +| Id | ChipId | Name | TargetId | Symbol | description | Chr | Mb | alias | GeneId | GenbankId | SNP | BlatSeq |TargetSeq | UniGeneId | Strand_Probe | Strand_Gene | OMIM | comments | Probe_set_target_region | Probe_set_specificity | Probe_set_BLAT_score | Probe_set_Blat_Mb_start | Probe_set_Blat_Mb_end | Probe_set_strand | Probe_set_Note_by_RW | flag | Symbol_H | description_H | chromosome_H | MB_H | alias_H | GeneId_H | chr_num | name_num | Probe_Target_Description | RefSeq_TranscriptId | Chr_mm8 | Mb_mm8 | Probe_set_Blat_Mb_start_mm8 | Probe_set_Blat_Mb_end_mm8 | HomoloGeneID | Biotype_ENS | ProteinID | ProteinName | Flybase_Id | HMDB_ID | Confidence | ChEBI_ID | ChEMBL_ID | CAS_number | PubChem_ID | ChemSpider_ID | UNII_ID | EC_number | KEGG_ID | Molecular_Weight | Nugowiki_ID | Type | Tissue | PrimaryName | SecondaryNames | PeptideSequence | ++------+--------+----------+----------+--------+----------------------------------------------+------+-----------+----------+--------+-----------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------+-------------+--------+----------+-----------------
--------+-----------------------+----------------------+-------------------------+-----------------------+------------------+----------------------+------+----------+---------------+--------------+------+---------+----------+---------+----------+--------------------------+---------------------+---------+-----------+-----------------------------+---------------------------+--------------+-------------+-----------+-------------+------------+---------+------------+----------+-----------+------------+------------+---------------+---------+-----------+---------+------------------+-------------+------+--------+-------------+----------------+-----------------+ +| 7282 | 1 | 93288_at | NULL | Arpc2 | actin related protein 2/3 complex, subunit 2 | 1 | 74.310961 | AK008777 | 76709 | AI835883 | 0 | CCGACTTCCTTAAGGTGCTCAACCGGACTGCTTGCTACTGGATAATCGTGAGGGATTCTCCATTTGGGTTCCATTTTGTACGAGTTTGGCAAATAACCTGCAGAAACGAGCTGTGCTTGCAAGGACTTGATAGTTCCTAATCCTTTTCCAAGCTGTTTGCTTTGCAATATGT | ccgacttccttaaggtgctcaaccgtnnnnnnccnannnnccnagaaaaaagaaatgaaaannnnnnnnnnnnnnnnnnnttcatcccgctaactcttgggaactgaggaggaagcgctgtcgaccgaagnntggactgcttgctactggataatcgtnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnntgagggattctccatttgggttccattttgtacgagtttggcaaataacctgcagaaacgagctgtgcttgcaaggacttgatagttcctaagaattanaanaaaaaaaanaanttccacttgatcaanttaattcccttttatttttcctccctcantccccttccttttccaagctgtttgctttgcaatatgt | Mm.337038 | + | | 604224 | | NULL | 8.45 | 169 | 74.310961 | 74.31466 | NULL | NULL | 3 | NULL | NULL | NULL | NULL | NULL | NULL | 1 | 93288 | NULL | XM_129773 | 1 | 74.197594 | 74.197594 | 74.201293 | 4187 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 
++------+--------+----------+----------+--------+----------------------------------------------+------+-----------+----------+--------+-----------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------+-------------+--------+----------+-------------------------+-----------------------+----------------------+-------------------------+-----------------------+------------------+----------------------+------+----------+---------------+--------------+------+---------+----------+---------+----------+--------------------------+---------------------+---------+-----------+-----------------------------+---------------------------+--------------+-------------+-----------+-------------+------------+---------+------------+----------+-----------+------------+------------+---------------+---------+-----------+---------+------------------+-------------+------+--------+-------------+----------------+-----------------+ +2 rows in set (0.00 sec) +``` + +** ProbeSetXRef (phenotypes/dataset_name.json) + +For every probe set (read dataset measuring point): + +``` +select * from ProbeSetXRef; +| ProbeSetFreezeId | ProbeSetId | DataId | Locus_old | LRS_old | pValue_old | mean | se | Locus | LRS | pValue | additive | h2 | +| 112 | 
123528 | 23439389 | NULL | NULL | NULL | 6.7460707070707 | NULL | rs6239372 | 10.9675593568894 | 0.567 | 0.0448545966228878 | NULL |
+| 112 | 123527 | 23439388 | NULL | NULL | NULL | 6.19416161616162 | NULL | rs13476936 | 10.9075670392762 | 0.567 | -0.0358456732993988 | NULL |
+```
+
+where ProbeSetFreezeId is the dataset (experiment), ProbeSetId refers to the probe set information (measuring point), and DataId points to the data point. The other values are used for search. It is used in search thus:
+
+```
+SELECT distinct ProbeSet.Name as TNAME,
+    ProbeSetXRef.Mean as TMEAN, ProbeSetXRef.LRS as TLRS,
+    ProbeSetXRef.PVALUE as TPVALUE, ProbeSet.Chr_num as TCHR_NUM,
+    ProbeSet.Mb as TMB, ProbeSet.Symbol as TSYMBOL,
+    ProbeSet.name_num as TNAME_NUM
+FROM ProbeSetXRef, ProbeSet
+WHERE ProbeSet.Id = ProbeSetXRef.ProbeSetId
+    and ProbeSetXRef.ProbeSetFreezeId = 112
+    ORDER BY ProbeSet.symbol ASC limit 5;
+| TNAME | TMEAN | TLRS | TPVALUE | TCHR_NUM | TMB | TSYMBOL | TNAME_NUM |
+| 1445618_at | 7.05679797979798 | 13.5417452764616 | 0.17 | 8 | 75.077895 | NULL | 1445618 |
+| 1452452_at | 7.232 | 30.4944361132252 | 0.0000609756097560421 | 12 | 12.6694 | NULL | 1452452 |
+```
+
+## ProbeSetData
+
+Main molecular data: probe sets, metabolome, etc.
+
+Almost all important molecular assay data is in this table, including probe set data, RNA-seq data, proteomic data, and metabolomic data: 2.5 billion rows as of March 2016. In comparison, ProbeData contains data only for Affymetrix probe-level data (e.g. Exon array probes and M430 probes).
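To make the cross-referencing concrete, here is a minimal sketch of the ProbeSetXRef lookup chain using an in-memory SQLite database as a stand-in for MariaDB. Table subsets and values are illustrative, not real GN data:

```python
import sqlite3

# ProbeSetXRef ties a dataset (ProbeSetFreezeId) and a probe set (ProbeSetId)
# to a block of per-strain values in ProbeSetData via DataId.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProbeSet     (Id INTEGER, Name TEXT);
CREATE TABLE ProbeSetXRef (ProbeSetFreezeId INTEGER, ProbeSetId INTEGER, DataId INTEGER);
CREATE TABLE ProbeSetData (Id INTEGER, StrainId INTEGER, value REAL);
CREATE TABLE Strain       (Id INTEGER, Name TEXT);

INSERT INTO ProbeSet     VALUES (1, '1445618_at');
INSERT INTO ProbeSetXRef VALUES (112, 1, 23439389);
INSERT INTO ProbeSetData VALUES (23439389, 1, 5.742), (23439389, 2, 5.006);
INSERT INTO Strain       VALUES (1, 'C57BL/6J'), (2, 'DBA/2J');
""")

# Dataset id + probe set name resolve to a DataId, which keys the values.
rows = conn.execute("""
    SELECT Strain.Name, ProbeSetData.value
    FROM ProbeSetXRef
    JOIN ProbeSet     ON ProbeSet.Id = ProbeSetXRef.ProbeSetId
    JOIN ProbeSetData ON ProbeSetData.Id = ProbeSetXRef.DataId
    JOIN Strain       ON Strain.Id = ProbeSetData.StrainId
    WHERE ProbeSetXRef.ProbeSetFreezeId = 112
      AND ProbeSet.Name = '1445618_at'
    ORDER BY Strain.Name
""").fetchall()
print(rows)  # [('C57BL/6J', 5.742), ('DBA/2J', 5.006)]
```

The same three-hop resolution (dataset, probe set, DataId) underlies the production query shown earlier against the Strain table.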
+ +# Strain + +``` +select * from Strain limit 5; ++----+----------+----------+-----------+--------+-------+ +| Id | Name | Name2 | SpeciesId | Symbol | Alias | ++----+----------+----------+-----------+--------+-------+ +| 1 | B6D2F1 | B6D2F1 | 1 | NULL | NULL | +| 2 | C57BL/6J | C57BL/6J | 1 | B6J | NULL | +| 3 | DBA/2J | DBA/2J | 1 | D2J | NULL | +| 4 | BXD1 | BXD1 | 1 | NULL | NULL | +| 5 | BXD2 | BXD2 | 1 | NULL | NULL | ++----+----------+----------+-----------+--------+-------+ +``` + +``` +show indexes from Strain; ++--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | ++--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Strain | 0 | PRIMARY | 1 | Id | A | 14368 | NULL | NULL | | BTREE | | | +| Strain | 0 | Name | 1 | Name | A | 14368 | NULL | NULL | YES | BTREE | | | +| Strain | 0 | Name | 2 | SpeciesId | A | 14368 | NULL | NULL | | BTREE | | | +| Strain | 1 | Symbol | 1 | Symbol | A | 14368 | NULL | NULL | YES | BTREE | | | ++--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ + +A typical query may look like + +SELECT Strain.Name, ProbeSetData.value, ProbeSetSE.error, ProbeSetData.Id + FROM (ProbeSetData, ProbeSetFreeze, Strain, ProbeSet, ProbeSetXRef) + left join ProbeSetSE on + (ProbeSetSE.DataId = ProbeSetData.Id AND ProbeSetSE.StrainId = ProbeSetData.StrainId) + WHERE + ProbeSetFreeze.name = 'B139_K_1206_M' AND + ProbeSetXRef.ProbeSetId = ProbeSet.Id AND + ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id AND + ProbeSetXRef.DataId = ProbeSetData.Id AND + ProbeSetData.StrainId = Strain.Id + 
Order BY Strain.Name + ++-------+-------+-------+----------+ +| Name | value | error | Id | ++-------+-------+-------+----------+ +| SM001 | 38.3 | NULL | 25309550 | +| SM001 | 2.7 | NULL | 25309520 | +| SM001 | 20.3 | NULL | 25309507 | +| SM001 | 125.8 | NULL | 25309511 | +| SM001 | 8.2 | NULL | 25309534 | ++-------+-------+-------+----------+ +5 rows in set (22.28 sec) +``` + +# ProbeSetFreeze + +``` +select * from ProbeSetFreeze limit 5; ++----+---------------+-------+-------------+---------------------------------+---------------------------------------------+-------------------------+------------+-----------+--------+-----------------+-----------------+-----------+ +| Id | ProbeFreezeId | AvgID | Name | Name2 | FullName | ShortName | CreateTime | OrderList | public | confidentiality | AuthorisedUsers | DataScale | ++----+---------------+-------+-------------+---------------------------------+---------------------------------------------+-------------------------+------------+-----------+--------+-----------------+-----------------+-----------+ +| 1 | 3 | 1 | Br_U_0803_M | BXDMicroArray_ProbeSet_August03 | UTHSC Brain mRNA U74Av2 (Aug03) MAS5 | Brain U74Av2 08/03 MAS5 | 2003-08-01 | NULL | 0 | 0 | NULL | log2 | +| 2 | 10 | 1 | Br_U_0603_M | BXDMicroArray_ProbeSet_June03 | UTHSC Brain mRNA U74Av2 (Jun03) MAS5 | Brain U74Av2 06/03 MAS5 | 2003-06-01 | NULL | 0 | 0 | NULL | log2 | +| 3 | 8 | 1 | Br_U_0303_M | BXDMicroArray_ProbeSet_March03 | UTHSC Brain mRNA U74Av2 (Mar03) MAS5 | Brain U74Av2 03/03 MAS5 | 2003-03-01 | NULL | 0 | 0 | NULL | log2 | +| 4 | 5 | 1 | Br_U_0503_M | BXDMicroArray_ProbeSet_May03 | UTHSC Brain mRNA U74Av2 (May03) MAS5 | Brain U74Av2 05/03 MAS5 | 2003-05-01 | NULL | 0 | 0 | NULL | log2 | +| 5 | 4 | 1 | HC_U_0303_M | GNFMicroArray_ProbeSet_March03 | GNF Hematopoietic Cells U74Av2 (Mar03) MAS5 | GNF U74Av2 03/03 MAS5 | 2003-03-01 | NULL | 0 | 0 | NULL | log2 | 
++----+---------------+-------+-------------+---------------------------------+---------------------------------------------+-------------------------+------------+-----------+--------+-----------------+-----------------+-----------+ +``` + +# ProbeSetXRef + +``` +select * from ProbeSetXRef limit 5; ++------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+-----------------+--------------------+--------+----------------------+------+ +| ProbeSetFreezeId | ProbeSetId | DataId | Locus_old | LRS_old | pValue_old | mean | se | Locus | LRS | pValue | additive | h2 | ++------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+-----------------+--------------------+--------+----------------------+------+ +| 1 | 1 | 1 | 10.095.400 | 13.3971627898894 | 0.163 | 5.48794285714286 | 0.08525787814808819 | rs13480619 | 12.590069931048001 | 0.269 | -0.28515625 | NULL | +| 1 | 2 | 2 | D15Mit189 | 10.042057464356201 | 0.431 | 9.90165714285714 | 0.0374686634976217 | CEL-17_50896182 | 10.5970737900941 | 0.304 | -0.11678333333333299 | NULL | +| 1 | 3 | 3 | D5Mit139 | 5.43678531742749 | 0.993 | 7.83948571428571 | 0.0457583416912569 | rs13478499 | 6.0970532702754 | 0.988 | 0.112957489878542 | NULL | +| 1 | 4 | 4 | D1Mit511 | 9.87815279480766 | 0.483 | 8.315628571428569 | 0.0470396593931327 | rs6154379 | 11.774867551173099 | 0.286 | -0.157113725490196 | NULL | +| 1 | 5 | 5 | D16H21S16 | 10.191723834264499 | 0.528 | 9.19345714285714 | 0.0354801718293322 | rs4199265 | 10.923263374016202 | 0.468 | 0.11476470588235299 | NULL | ++------------------+------------+--------+------------+--------------------+------------+-------------------+---------------------+-----------------+--------------------+--------+----------------------+------+ +``` + + +Note that the following unlimited search is very slow: + +select max(value) from ProbeSetData; + +``` 
++------------+
+| max(value) |
++------------+
+| 26436006 |
++------------+
+1 row in set (2 min 16.31 sec)
+```
+
+which in some form is used in the search page. See the search code:
+
+=> https://github.com/genenetwork/genenetwork2_diet/blob/master/wqflask/wqflask/do_search.py#L811 the search code
+
+## Comments
+
+I think the ProbeSetData table should be generalized to a 'phenotypes' table with a 'sample_id' column and a 'value' column.
+
+A new table 'samples' will link each sample against an 'experiment' and an 'individual', which in turn can link to a 'strain'.
+
+Experiment is meant here in a wide sense - GTEx could be one; I don't want to use 'dataset' ;)
+
+This means a (slight) reordering:
+
+```
+phenotypes: (id), sample_id, value
+samples: experiment_id, individual_id
+experiments: name, version
+individual: strain_id
+strains: species_id
+species: ...
+```
+
+ProbeData is also interesting, because it has the same structure as ProbeSetData but only contains microarrays. These tables should become one (when we clear up the cross-referencing) as they both contain phenotype values. Both are large tables.
+
+PublishData is another phenotype table with values only, which can be merged into that same table. This data does not require the annotations of probesets(!)
+
+=> https://genenetwork.org/show_trait?trait_id=10031&dataset=BXDPublish
+
+So we have phenotype data in 3 tables with exactly the same
+layout. There is also TissueProbeSet*, but we'll ignore those for
+now. I think we should merge these into one and have the sample ref
+refer to the type of data (probeset, probe, metabolomics,
+whatever). These are all phenotype values, and by having them split
+into different tables they won't play well when looking for
+correlations.
+
+ProbeSet contains the metadata on the probes and should (eventually)
+move into NoSQL. There is plenty of redundancy in that table now.
+ +I know it is going to be a pain to reorganize the database, but if we +want to use it in the long run we are going to have to simplify it. + +# ProbeSetFreeze and ProbeFreeze (/dataset/name.json) + +GN_SERVER: /dataset/HC_M2_0606_P.json + +ProbesetFreeze contains DataSet information, such as name, fullname of +datasets, as well as whether they are public and how the data is +scaled: + +``` +select * from ProbeSetFreeze; +| Id | ProbeFreezeId | AvgID | Name | Name2 | FullName | ShortName | CreateTime | OrderList | public | confidentiality | AuthorisedUsers | DataScale | +| 112 | 30 | 2 | HC_M2_0606_P | Hippocampus_M430_V2_BXD_PDNN_Jun06 | Hippocampus Consortium M430v2 (Jun06) PDNN | Hippocampus M430v2 BXD 06/06 PDNN | 2006-06-23 | NULL | 2 | 0 | NULL | log2 | +``` + +Another table contains a tissue reference and a back reference to the cross +type: + +``` +select * from ProbeFreeze; +| Id | ProbeFreezeId | ChipId | TissueId | Name | FullName | ShortName | CreateTime | InbredSetId | +| 30 | 30 | 4 | 9 | Hippocampus Consortium M430v2 Probe (Jun06) | | | 2006-07-07 | 1 | +``` + +NOTE: these tables can probably be merged into one. 
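As a concrete illustration of the generalized 'phenotypes' layout proposed in the comments above, here is a sketch using SQLite as a stand-in. All table names follow the proposed layout and all rows are made up; nothing here is implemented in GN:

```python
import sqlite3

# Proposed generalized layout: one phenotypes table, with samples linking
# back through individuals to strains and species. Illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE strains     (id INTEGER PRIMARY KEY, species_id INTEGER, name TEXT);
CREATE TABLE individuals (id INTEGER PRIMARY KEY, strain_id INTEGER);
CREATE TABLE experiments (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE samples     (id INTEGER PRIMARY KEY, experiment_id INTEGER, individual_id INTEGER);
CREATE TABLE phenotypes  (id INTEGER PRIMARY KEY, sample_id INTEGER, value REAL);

INSERT INTO species     VALUES (1, 'mouse');
INSERT INTO strains     VALUES (1, 1, 'BXD1');
INSERT INTO individuals VALUES (1, 1);
INSERT INTO experiments VALUES (1, 'HC_M2_0606_P', '1');
INSERT INTO samples     VALUES (1, 1, 1);
INSERT INTO phenotypes  VALUES (1, 1, 5.742);
""")

# Any phenotype value traces back to its strain through
# sample -> individual -> strain, regardless of assay type.
row = conn.execute("""
    SELECT strains.name, phenotypes.value
    FROM phenotypes
    JOIN samples     ON samples.id = phenotypes.sample_id
    JOIN individuals ON individuals.id = samples.individual_id
    JOIN strains     ON strains.id = individuals.strain_id
""").fetchone()
print(row)  # ('BXD1', 5.742)
```

The point of the single phenotypes table is that probe set, probe, publish and metabolomic values would all correlate against each other without cross-table plumbing.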
+ +``` +show indexes from ProbeSetFreeze; ++----------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | ++----------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| ProbeSetFreeze | 0 | PRIMARY | 1 | Id | A | 2 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | FullName | 1 | FullName | A | 2 | NULL | NULL | | BTREE | | | +| ProbeSetFreeze | 0 | Name | 1 | Name | A | 2 | NULL | NULL | YES | BTREE | | | +| ProbeSetFreeze | 1 | NameIndex | 1 | Name2 | A | 2 | NULL | NULL | | BTREE | | | ++----------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +``` + +# ProbeSetSE + +``` +select * from ProbeSetSE limit 5; ++--------+----------+----------+ +| DataId | StrainId | error | ++--------+----------+----------+ +| 1 | 1 | 0.681091 | +| 1 | 2 | 0.361151 | +| 1 | 3 | 0.364342 | +| 1 | 4 | 0.827588 | +| 1 | 5 | 0.303492 | ++--------+----------+----------+ +``` + +# More information + +For the other tables, you may check the GN2/doc/database.org document (the starting point for this document). + +# Contributions regarding data upload to the GeneNetwork webserver +* Ideas shared by the GeneNetwork team to facilitate the process of uploading data to production + +## Quality check and integrity of the data to be uploaded to gn2 + +* A note to add (from Arthur): Some datasets have the following identifiers: ProbeSet IDs {chr_3020701, chr_3020851, etc}. This is not an acceptable way to name the probeset IDs. 
So, the data provider needs to understand what format gn2 needs in order to accept the ProbeSet IDs in their dataset.
+* Also, for the annotation file, among other important columns, it is crucial that there are description, alias, and location columns. The formatting should be exactly as found in public repositories such as NCBI, Ensembl, etc. For instance, for description: `X-linked Kx blood group related 4`, and aliases: `XRG4; Gm210; mKIAA1889`, as in
+=> https://www.ncbi.nlm.nih.gov/gene/497097
+
+## Valid ProbeSetIDs
+
+* The official ProbeSetIDs would be the ones from the vendor. This would also identify the platform used to generate the data (a Novogene-specific platform, for instance `NovaSeqPE150` for the MBD UTHSC mice seq dataset).
+* NB: in this case, if the vendor does not provide the official names as expected, we can use the platform plus the numbering order of the file to generate probeset IDs: for instance, `NseqPE150_000001` to `NseqPE150_432694` for samples 1 to 432694.
+* Avoid IDs with meaning, e.g. `chr1_3020701` → Chromosome 1 at 3020701 base pairs. Prefer IDs with no meaning.
+
+## The importance of having unique identifiers within a platform
+
+* Unique identifiers solve the hurdles that come with having duplicate genes. So, the QA tools in place should ensure the uploaded dataset adheres to the requirements mentioned.
+* However, newer RNA-seq data sets generated by sequencing do not usually have an official vendor identifier. The identifier is usually based on the NCBI mRNA model (NM_XXXXXX) that was used to evaluate expression and on the sequence that is involved, usually the start and stop nucleotide positions based on a specific genome assembly, or just a suffix to make sure it is unique. In this case, you are looking at mRNA assays for a single transcript, but different parts of the transcript that have different genome coordinates. We now typically use ENSEMBL identifiers.
+* Take the mouse version of the sonic hedgehog gene as an example: either `ENSMUST00000002708` (transcript) or `ENSMUSG00000002633` (gene) should be fine as a source. The important thing is to know the provenance of the ID: who is in charge of that ID type?
+* When an mRNA assay is super precise (one exon only, or a part of the 5' UTR), then we should probably use exon identifiers from ENSEMBL.
+* Ideally, we should enter the sequence's first and last 100 nt in GeneNetwork for verification and alignment. We did this religiously for arrays, but have started to get lazy now. The sequence is the ultimate identifier.
+* For methylation arrays and CpG assays, we can use the format `cg14050475`, as seen in Ben's MBD UTHSC data.
+* For metabolites like isoleucine, the ID we have been using is the mass-to-charge (MZ) ratio, such as `130.0874220_MZ`.
+* For protein and peptide identifiers we have used the official protein ID followed by an underscore character and then some or all of the sequence, followed by another underscore and a number. Evan to confirm, but the suffix number is the charge state if I remember correctly:
+```
+Q9JHJ3_LLHTADVCQLEVALVGASPR_3
+A2A8E1_TIVEFECR_2
+A2A8E1_ATLENVTNLRPVGEDFR_3
+A2A8E1_ENSIDILSSTIK_2
+```
+* But in older protein expression databases Evan and the team used a different method:
+```
+abcb10_q9ji39_t311
+abcb10_q9ji39_t312
+```
+* The above is just the gene symbol, then the protein ID; we are not sure what t311 and t312 mean.
+* Ideally these IDs should be explained to some extent when they embed information.
+
+## BXD individuals
+
+* Basically, groups (represented by the InbredSet tables) are primarily defined by their list of samples/strains (represented by the Strain tables). When we create a new group, it's because we have data with a distinct set of samples/strains from any existing groups.
+* So when we receive data for BXD individuals, as far as the database is concerned they are a completely separate group (since the list of samples is new/distinct from any other existing groups). We can choose to also enter it as part of the "generic" BXD group (by converting it to strain means/SEs using the strain of each individual, assuming it's provided like in the files Arthur was showing us).
+* This same logic could apply to other groups as well - we could choose to make one group the "strain mean" group for another set of groups that contain sample data for individuals. But the database doesn't reflect the relationship between these groups.
+* As far as the database is concerned, there is no distinction between strain means and individual sample data - they're all rows in the ProbeSetData/PublishData tables. The only difference is that strain mean data will probably also have an SE value in the ProbeSetSE/PublishSE tables and/or an N (number of individuals per strain) value in the NStrain table
+* As for what this means for the uploader - I think it depends on whether Rob/Arthur/etc want to give users the ability to simultaneously upload both strain mean and individual data. For example, if someone uploads some BXD individuals' data, do we want the uploader to both create a new group for this (or add to an existing BXD individuals group) and calculate the strain means/SE and enter it into the "main" BXD group?
My personal feeling is that it's probably best to postpone that and only upload the data with the specific set of samples indicated in the file, since simultaneous uploads would add some extra complexity to the uploading process (the user would need to select "the group the strains are from" as a separate option), and this could always be added later
+* The relationship is sort of captured in the CaseAttribute and CaseAttributeXRefNew tables (which contain sample metadata), but only in the form of the metadata that is sometimes displayed as extra columns in the trait page table - this data isn't used in any queries/analyses currently (outside of some JS filters run on the table itself) and isn't that important as part of the uploading process (or at least can be postponed)
+
+## Individual Datasets and Derivative Datasets in gn2
+* An individual dataset reflects the actual data provided or submitted by the investigator (user). Derivative datasets contain information processed from the individual dataset, as in the case of the average datasets.
+* An example of an individual dataset would look something like this (MBD dataset):
+```
+#+begin_example
+sample, strain, Sex, Age,…
+FEB0001,BXD48a,M,63,…
+FEB0002,BXD48a,M,15,…
+FEB0003,BXD48a,F,22,…
+FEB0004,BXD16,M,39,…
+FEB0005,BXD16,F,14,…
+⋮
+#+end_example
+```
+* The strain column above has repeated values. Each value has a one-to-many relationship with values in the sample column. From this dataset, there can be several derivatives. 
For example; +- Sex-based categories +- Average data (3 sample values averaged to one strain value) +- Standard error table computed for the averages + +## Saving data to database +* Strain table schema +``` +#+begin_src sql + MariaDB [db_webqtl]> DESC Strain; + +-----------+----------------------+------+-----+---------+----------------+ + | Field | Type | Null | Key | Default | Extra | + +-----------+----------------------+------+-----+---------+----------------+ + | Id | int(20) | NO | PRI | NULL | auto_increment | + | Name | varchar(100) | YES | MUL | NULL | | + | Name2 | varchar(100) | YES | | NULL | | + | SpeciesId | smallint(5) unsigned | NO | | 0 | | + | Symbol | varchar(20) | YES | MUL | NULL | | + | Alias | varchar(255) | YES | | NULL | | + +-----------+----------------------+------+-----+---------+----------------+ + 6 rows in set (0.00 sec) +#+end_src +``` +* For the *individual data*, the =sample= field would be saved as =Name= and the =strain= would be saved as =Name2=. These records would then all be linked to an inbredset group (population?) in the =InbredSet= table via the =StrainXRef= table, whose schema is as follows: +``` +#+begin_src sql + MariaDB [db_webqtl]> DESC StrainXRef; + +------------------+----------------------+------+-----+---------+-------+ + | Field | Type | Null | Key | Default | Extra | + +------------------+----------------------+------+-----+---------+-------+ + | InbredSetId | smallint(5) unsigned | NO | PRI | 0 | | + | StrainId | int(20) | NO | PRI | NULL | | + | OrderId | int(20) | YES | | NULL | | + | Used_for_mapping | char(1) | YES | | N | | + | PedigreeStatus | varchar(255) | YES | | NULL | | + +------------------+----------------------+------+-----+---------+-------+ + 5 rows in set (0.00 sec) +#+end_src +``` +* Where the =InbredSetId= comes from the =InbredSet= table and the =StrainId= comes from the =Strain= table. 
The *individual data* would be linked to an inbredset group for individuals
+* For the *average data*, the only value to save would be the =strain= field, which would be saved as =Name= in the =Strain= table and linked to an InbredSet group for averages
+*Question 01*: How do we distinguish the inbredset groups?
+*Answer*: The =Family= field is useful for this.
+
+*Question 02*: If you have more derived "datasets", e.g. males-only, females-only, under-10-years, 10-to-25-years, etc., how would the =Strain= table handle all those differences?
+
+## Metadata
+* The data we looked at had =gene id= and =gene symbol= fields. These fields were used to fetch the *Ensembl ID* and *descriptions* from [[https://www.ncbi.nlm.nih.gov/][NCBI]] and the [[https://useast.ensembl.org/][Ensembl Genome Browser]]
+
+## Files for mapping
+* Files used for mapping need to be in =bimbam= or =.geno= formats. We would need to do conversions to at least one of these formats where necessary
+
+## Annotation files
+* Consider the following schema of DB tables:
+#+begin_src sql
+  MariaDB [db_webqtl]> DESC InbredSet;
+  +-----------------+----------------------+------+-----+---------+----------------+
+  | Field           | Type                 | Null | Key | Default | Extra          |
+  +-----------------+----------------------+------+-----+---------+----------------+
+  | Id              | smallint(5) unsigned | NO   | PRI | NULL    | auto_increment |
+  | InbredSetId     | int(5) unsigned      | NO   |     | NULL    |                |
+  | InbredSetName   | varchar(100)         | YES  |     | NULL    |                |
+  | Name            | char(30)             | NO   |     |         |                |
+  | SpeciesId       | smallint(5) unsigned | YES  |     | 1       |                |
+  | FullName        | varchar(100)         | YES  |     | NULL    |                |
+  | public          | tinyint(3) unsigned  | YES  |     | 2       |                |
+  | MappingMethodId | char(50)             | YES  |     | 1       |                |
+  | GeneticType     | varchar(255)         | YES  |     | NULL    |                |
+  | Family          | varchar(100)         | YES  |     | NULL    |                |
+  | FamilyOrder     | int(5)               | YES  |     | NULL    |                |
+  | MenuOrderId     | double               | NO   |     | NULL    |                |
+  | InbredSetCode   | varchar(5)           | YES  |     | NULL    |                |
+  | Description     | longtext             | YES  |     | NULL    |                |
+  +-----------------+----------------------+------+-----+---------+----------------+
+  ⋮
+  MariaDB [db_webqtl]> DESC Strain;
+  +-----------+----------------------+------+-----+---------+----------------+
+  | Field     | Type                 | Null | Key | Default | Extra          |
+  +-----------+----------------------+------+-----+---------+----------------+
+  | Id        | int(20)              | NO   | PRI | NULL    | auto_increment |
+  | Name      | varchar(100)         | YES  | MUL | NULL    |                |
+  | Name2     | varchar(100)         | YES  |     | NULL    |                |
+  | SpeciesId | smallint(5) unsigned | NO   |     | 0       |                |
+  | Symbol    | varchar(20)          | YES  | MUL | NULL    |                |
+  | Alias     | varchar(255)         | YES  |     | NULL    |                |
+  +-----------+----------------------+------+-----+---------+----------------+
+  ⋮
+  MariaDB [db_webqtl]> DESC StrainXRef;
+  +------------------+----------------------+------+-----+---------+-------+
+  | Field            | Type                 | Null | Key | Default | Extra |
+  +------------------+----------------------+------+-----+---------+-------+
+  | InbredSetId      | smallint(5) unsigned | NO   | PRI | 0       |       |
+  | StrainId         | int(20)              | NO   | PRI | NULL    |       |
+  | OrderId          | int(20)              | YES  |     | NULL    |       |
+  | Used_for_mapping | char(1)              | YES  |     | N       |       |
+  | PedigreeStatus   | varchar(255)         | YES  |     | NULL    |       |
+  +------------------+----------------------+------+-----+---------+-------+
+#+end_src
+
+* The =StrainXRef= table links the samples/cases/individuals (stored in the =Strain= table) to the group (population?) they belong to in the =InbredSet= table
+* Steps to prepare the TSV file for entering samples/cases into the database are:
+- Clean up the =Name= of the samples/cases/individuals in the file:
+  - Names should have no spaces
+  - Names should all have the same number of characters: pad those that are shorter, e.g. *SampleName12* → *SampleName012*, to fit in with the other names if, say, the samples range from 1 to 999. 
In a similar vein, you'd rename *SampleName1* to *SampleName001*
+- Order samples by the names
+- Create a new column, say, =orderId= in the TSV, and assign the order *1, 2, 3, …, n* for the rows, from the first to the "n^{th}" row. The order of the strains is very important and must be maintained
+- Retrieve the largest current =Id= value in the =Strain= table
+- Increment by one (1) and assign that to the first row of your ordered data
+  - Assign subsequent rows the subsequent ID values, e.g. assuming the largest =Id= value in the =Strain= table was *23*, the first row of the new data would have the id *24*. The second row would have *25*, the third *26*, and so on
+- Get the =InbredSetId= for your samples' data. Add a new column in the data and copy this value to all rows
+- Enter data into the =Strain= table
+- Using the previously computed strain ID values, and the =InbredSetId= previously copied, enter data into the =StrainXRef= table
+
+* Some notes on the data:
+- The =Symbol= field in the =Strain= table corresponds to the =Strain= field in the annotation file
+- The =Used_for_mapping= field should be set to ~Y~ unless otherwise informed
+- The =PedigreeStatus= field is unknown to us for now: set it to ~NULL~
+
+* Annotation file format
+The important fields are:
+- =ChipId=: The platform that the data was collected from/with
+Consider the following table:
+#+begin_src sql
+  MariaDB [db_webqtl]> DESC GeneChip;
+  +---------------+----------------------+------+-----+---------+----------------+
+  | Field         | Type                 | Null | Key | Default | Extra          |
+  +---------------+----------------------+------+-----+---------+----------------+
+  | Id            | smallint(5) unsigned | NO   | PRI | NULL    | auto_increment |
+  | GeneChipId    | int(5)               | YES  |     | NULL    |                |
+  | GeneChipName  | varchar(200)         | YES  |     | NULL    |                |
+  | Name          | char(30)             | NO   |     |         |                |
+  | GeoPlatform   | char(15)             | YES  |     | NULL    |                |
+  | Title         | varchar(100)         | YES  |     | NULL    |                |
+  | SpeciesId     | int(5)               | YES  |     | 1       |                |
+  | GO_tree_value | varchar(50)          | YES  |     | NULL    |                |
+  +---------------+----------------------+------+-----+---------+----------------+
+ #+end_src
+ Some of the important fields that were highlighted were:
+ - =GeoPlatform=: Links the details of the platform in our database with NCBI's [[https://www.ncbi.nlm.nih.gov/geo/][Gene Expression Omnibus (GEO)]] system. This is not always possible, but where we can, it would be nice to have this field populated
+ - =GO_tree_value=: This is supposed to link the detail we have with some external system "GO". I have not figured this one out on my own and will need to follow up on it.
+ - =Name=: The name corresponds to the =ProbeSetId=, and we want this to be the same value as the identifier on the [[https://www.ensembl.org][Ensembl genome browser]], e.g. for a *mouse* gene, say =Shh=, we want the =Name= value to be a variation on [[https://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000002633;r=5:28661813-28672254;t=ENSMUST00000002708][*ENSMUSG00000002633*]]
+ - =Probe_set_Blat_Mb_start=/=Probe_set_Blat_Mb_end=: In Byron's and Beni's data, these correspond to the =geneStart= and =geneEnd= fields respectively. These are the positions, in megabase pairs, at which the gene begins and ends.
+ - =Mb=: This is the =geneStart=/=Probe_set_Blat_Mb_start= value divided by *1000000*. (*Note to self*: Maybe the Probe_set_Blat_Mb_* fields above might not be in megabase pairs — please confirm)
+ - =Strand_Probe= and =Strand_Gene=: These fields' values are simply ~+~ or ~-~. 
If these values are missing, you can [[https://ftp.ncbi.nih.gov/gene/README][retrieve them from NCBI]], specifically from the =orientation= field of seemingly any text file with that field
+ - =Chr=: This is the chromosome on which the gene is found
+
+* The final annotation file will have (at minimum) the following fields (or their
+analogs):
+- =StrainName=
+- =OrderId=
+- =StrainId=: from the database
+- =InbredSetId=: from the database
+- =Symbol=: This could be named =Strain=
+- =GeneChipId=: from the database
+- =EnsemblId=: from the Ensembl genome browser
+- =Probe_set_Blat_Mb_start=: possible analog is =geneStart=
+- =Probe_set_Blat_Mb_end=: possible analog is =geneEnd=
+- =Mb=
+- =Strand_Probe=
+- =Strand_Gene=
+- =Chr=
+
+* =.geno= Files
+- The =.geno= files have sample names, not the strain/symbol. The =Locus= field in the =.geno= file corresponds to the **marker**. =.geno= files are used with =QTLReaper=
+- The sample names in the ~.geno~ files *MUST* be in the same order as the
+strains/symbols for that species. For example, if the data is as follows:
+```
+#+begin_example
+SampleName,Strain,…
+⋮
+BJCWI0001,BXD40,…
+BJCWI0002,BXD40,…
+BJCWI0003,BXD33,…
+BJCWI0004,BXD50,…
+BJCWI0005,BXD50,…
+⋮
+#+end_example
+```
+and the order of strains is as follows:
+```
+#+begin_example
+…,BXD33,…,BXD40,…,BXD50,…
+#+end_example
+```
+then the ~.geno~ file generated from this data should have a form such as shown
+below:
+```
+#+begin_example
+…,BJCWI0003,…,BJCWI0001,BJCWI0002,…,BJCWI0004,BJCWI0005,…
+#+end_example
+```
+The order of samples that belong to the same strain is irrelevant - they share the same data, i.e. the order below is also valid:
+```
+#+begin_example
+…,BJCWI0003,…,BJCWI0002,BJCWI0001,…,BJCWI0004,BJCWI0005,…
+#+end_example
+```
+* =BimBam= Files
+- Used with =GEMMA=
+* Case Attributes
+- These are metadata about every case/sample/individual in an InbredSet group. The metadata is any data that has nothing to do with phenotypes (e.g. 
height, weight, etc) that is useful for researchers to have in order to make sense of the data.
+- Examples of case attributes:
+  - Treatment
+  - Sex (Really? Isn't sex an expression of genes?)
+  - Batch
+  - Case ID, etc
+
+* Summary steps to load data into the database
+- [x] Create *InbredSet* group (think population)
+- [x] Load the strains/samples data
+- [x] Load the sample cross-reference data to link the samples to their
+  InbredSet group
+- [ ] Load the case-attributes data
+- [x] Load the annotation data (into the ProbeSet table)
+- [x] Create the study for the data (At around this point, the InbredSet group
+  will show up in the UI).
+- [x] Create the Dataset for the data
+- [x] Load the *Log2* data (ProbeSetData and ProbeSetXRef tables)
+- [x] Compute means (an SQL query was used — this could be pre-computed in code
+  and entered along with the data)
+- [x] Run QTLReaper
diff --git a/topics/database/setting-up-local-development-database.gmi b/topics/database/setting-up-local-development-database.gmi
index 3b743b9..9ebb48b 100644
--- a/topics/database/setting-up-local-development-database.gmi
+++ b/topics/database/setting-up-local-development-database.gmi
@@ -41,7 +41,12 @@ Setting up mariadb in a Guix container is the preferred and easier method. But,
 ```
 $ sudo $(./containers/db-container.sh)
 ```
-You should now be able to connect to the database using
+By default, mariadb allows passwordless login for root only on the local machine. So, enter the container using `guix container exec` and set the root password to an empty string:
+``` +$ mysql -u root +MariaDB [(none)]> SET PASSWORD = PASSWORD(""); +``` +You should now be able to connect to the database from outside the container using ``` $ mysql --protocol tcp -u root ``` diff --git a/topics/database/sql.svg b/topics/database/sql.svg new file mode 100644 index 0000000..b7ab96e --- /dev/null +++ b/topics/database/sql.svg @@ -0,0 +1,2558 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" + "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> +<!-- Generated by graphviz version 2.49.0 (20210828.1703) + --> +<!-- Title: schema Pages: 1 --> +<svg width="13704pt" height="5921pt" + viewBox="0.00 0.00 13703.50 5921.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> +<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 5917)"> +<title>schema</title> +<polygon fill="white" stroke="transparent" points="-4,4 -4,-5917 13699.5,-5917 13699.5,4 -4,4"/> +<!-- NStrain --> +<g id="node1" class="node"> +<title>NStrain</title> +<polygon fill="white" stroke="transparent" points="6648.5,-1918 6648.5,-2008 6775.5,-2008 6775.5,-1918 6648.5,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="6652,-1984 6652,-2005 6773,-2005 6773,-1984 6652,-1984"/> +<polygon fill="none" stroke="black" points="6652,-1984 6652,-2005 6773,-2005 6773,-1984 6652,-1984"/> +<text text-anchor="start" x="6655" y="-1990.8" font-family="Times,serif" font-size="14.00">NStrain (9 MiB)</text> +<text text-anchor="start" x="6692.5" y="-1968.8" font-family="Times,serif" font-size="14.00">count</text> +<text text-anchor="start" x="6688" y="-1947.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6683" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6648.5,-1918 6648.5,-2008 6775.5,-2008 6775.5,-1918 6648.5,-1918"/> +</g> +<!-- Strain --> +<g id="node40" class="node"> 
+<title>Strain</title> +<polygon fill="lightgrey" stroke="transparent" points="5728.5,-765.5 5728.5,-918.5 5843.5,-918.5 5843.5,-765.5 5728.5,-765.5"/> +<polygon fill="#df65b0" stroke="transparent" points="5732,-894 5732,-915 5841,-915 5841,-894 5732,-894"/> +<polygon fill="none" stroke="black" points="5732,-894 5732,-915 5841,-915 5841,-894 5732,-894"/> +<text text-anchor="start" x="5735" y="-900.8" font-family="Times,serif" font-size="14.00">Strain (2 MiB)</text> +<polygon fill="green" stroke="transparent" points="5732,-873 5732,-892 5841,-892 5841,-873 5732,-873"/> +<text text-anchor="start" x="5769" y="-878.8" font-family="Times,serif" font-size="14.00">Alias</text> +<polygon fill="green" stroke="transparent" points="5732,-852 5732,-871 5841,-871 5841,-852 5732,-852"/> +<text text-anchor="start" x="5765" y="-857.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="green" stroke="transparent" points="5732,-831 5732,-850 5841,-850 5841,-831 5732,-831"/> +<text text-anchor="start" x="5760.5" y="-836.8" font-family="Times,serif" font-size="14.00">Name2</text> +<polygon fill="green" stroke="transparent" points="5732,-810 5732,-829 5841,-829 5841,-810 5732,-810"/> +<text text-anchor="start" x="5759.5" y="-815.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="5779" y="-794.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="5751.5" y="-773.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="none" stroke="black" points="5728.5,-765.5 5728.5,-918.5 5843.5,-918.5 5843.5,-765.5 5728.5,-765.5"/> +</g> +<!-- NStrain->Strain --> +<g id="edge1" class="edge"> +<title>NStrain:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6651,-1930C6610.43,-1930 6653.88,-1233.5 6631,-1200 6450.66,-935.96 6033.45,-866.5 5861.83,-848.81"/> +<polygon fill="black" stroke="black" points="5861.92,-845.3 5851.62,-847.79 5861.23,-852.27 5861.92,-845.3"/> +</g> 
+<!-- roles_users --> +<g id="node2" class="node"> +<title>roles_users</title> +<polygon fill="white" stroke="transparent" points="7071.5,-4853 7071.5,-4922 7204.5,-4922 7204.5,-4853 7071.5,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="7075,-4897.5 7075,-4918.5 7202,-4918.5 7202,-4897.5 7075,-4897.5"/> +<polygon fill="none" stroke="black" points="7075,-4897.5 7075,-4918.5 7202,-4918.5 7202,-4897.5 7075,-4897.5"/> +<text text-anchor="start" x="7078" y="-4904.3" font-family="Times,serif" font-size="14.00">roles_users (0 B)</text> +<text text-anchor="start" x="7114" y="-4882.3" font-family="Times,serif" font-size="14.00">role_id</text> +<text text-anchor="start" x="7112.5" y="-4861.3" font-family="Times,serif" font-size="14.00">user_id</text> +<polygon fill="none" stroke="black" points="7071.5,-4853 7071.5,-4922 7204.5,-4922 7204.5,-4853 7071.5,-4853"/> +</g> +<!-- role --> +<g id="node58" class="node"> +<title>role</title> +<polygon fill="white" stroke="transparent" points="7093.5,-3249 7093.5,-3339 7184.5,-3339 7184.5,-3249 7093.5,-3249"/> +<polygon fill="#f1eef6" stroke="transparent" points="7097,-3315 7097,-3336 7182,-3336 7182,-3315 7097,-3315"/> +<polygon fill="none" stroke="black" points="7097,-3315 7097,-3336 7182,-3336 7182,-3315 7097,-3315"/> +<text text-anchor="start" x="7106" y="-3321.8" font-family="Times,serif" font-size="14.00">role (0 B)</text> +<text text-anchor="start" x="7099" y="-3299.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="7119.5" y="-3278.8" font-family="Times,serif" font-size="14.00">name</text> +<text text-anchor="start" x="7117.5" y="-3257.8" font-family="Times,serif" font-size="14.00">the_id</text> +<polygon fill="none" stroke="black" points="7093.5,-3249 7093.5,-3339 7184.5,-3339 7184.5,-3249 7093.5,-3249"/> +</g> +<!-- roles_users->role --> +<g id="edge2" class="edge"> +<title>roles_users:role_id->role</title> +<path fill="none" stroke="black" 
d="M7203,-4885.5C7242.13,-4885.5 7161.86,-3639.62 7142.89,-3353.21"/> +<polygon fill="black" stroke="black" points="7146.37,-3352.78 7142.22,-3343.03 7139.39,-3353.24 7146.37,-3352.78"/> +</g> +<!-- User --> +<g id="node60" class="node"> +<title>User</title> +<polygon fill="white" stroke="transparent" points="7244,-3175.5 7244,-3412.5 7354,-3412.5 7354,-3175.5 7244,-3175.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="7247,-3388 7247,-3409 7351,-3409 7351,-3388 7247,-3388"/> +<polygon fill="none" stroke="black" points="7247,-3388 7247,-3409 7351,-3409 7351,-3388 7247,-3388"/> +<text text-anchor="start" x="7250" y="-3394.8" font-family="Times,serif" font-size="14.00">User (28 KiB)</text> +<text text-anchor="start" x="7260" y="-3372.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" x="7273" y="-3351.8" font-family="Times,serif" font-size="14.00">disable</text> +<text text-anchor="start" x="7279" y="-3330.8" font-family="Times,serif" font-size="14.00">email</text> +<text text-anchor="start" x="7265.5" y="-3309.8" font-family="Times,serif" font-size="14.00">grpName</text> +<text text-anchor="start" x="7292" y="-3288.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="7268" y="-3267.8" font-family="Times,serif" font-size="14.00">lastlogin</text> +<text text-anchor="start" x="7279" y="-3246.8" font-family="Times,serif" font-size="14.00">name</text> +<text text-anchor="start" x="7264.5" y="-3225.8" font-family="Times,serif" font-size="14.00">password</text> +<text text-anchor="start" x="7267" y="-3204.8" font-family="Times,serif" font-size="14.00">privilege</text> +<text text-anchor="start" x="7273" y="-3183.8" font-family="Times,serif" font-size="14.00">user_ip</text> +<polygon fill="none" stroke="black" points="7244,-3175.5 7244,-3412.5 7354,-3412.5 7354,-3175.5 7244,-3175.5"/> +</g> +<!-- roles_users->User --> +<g id="edge3" class="edge"> +<title>roles_users:user_id->User</title> 
+<path fill="none" stroke="black" d="M7139,-4854.5C7139,-4323.12 7232.06,-3695.19 7276.24,-3427.05"/> +<polygon fill="black" stroke="black" points="7279.74,-3427.32 7277.92,-3416.88 7272.83,-3426.18 7279.74,-3427.32"/> +</g> +<!-- SnpAllRat --> +<g id="node3" class="node"> +<title>SnpAllRat</title> +<polygon fill="white" stroke="transparent" points="2716,-702.5 2716,-981.5 2876,-981.5 2876,-702.5 2716,-702.5"/> +<polygon fill="#df65b0" stroke="transparent" points="2719,-957 2719,-978 2873,-978 2873,-957 2719,-957"/> +<polygon fill="none" stroke="black" points="2719,-957 2719,-978 2873,-978 2873,-957 2719,-957"/> +<text text-anchor="start" x="2722" y="-963.8" font-family="Times,serif" font-size="14.00">SnpAllRat (908 MiB)</text> +<text text-anchor="start" x="2772" y="-941.8" font-family="Times,serif" font-size="14.00">Alleles</text> +<text text-anchor="start" x="2749" y="-920.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="2728" y="-899.8" font-family="Times,serif" font-size="14.00">ConservationScore</text> +<text text-anchor="start" x="2768.5" y="-878.8" font-family="Times,serif" font-size="14.00">Domain</text> +<text text-anchor="start" x="2764" y="-857.8" font-family="Times,serif" font-size="14.00">Function</text> +<text text-anchor="start" x="2777.5" y="-836.8" font-family="Times,serif" font-size="14.00">Gene</text> +<text text-anchor="start" x="2788.5" y="-815.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2767" y="-794.8" font-family="Times,serif" font-size="14.00">Position</text> +<text text-anchor="start" x="2761" y="-773.8" font-family="Times,serif" font-size="14.00">SnpName</text> +<text text-anchor="start" x="2771" y="-752.8" font-family="Times,serif" font-size="14.00">Source</text> +<text text-anchor="start" x="2761" y="-731.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2758.5" y="-710.8" font-family="Times,serif" 
font-size="14.00">Transcript</text> +<polygon fill="none" stroke="black" points="2716,-702.5 2716,-981.5 2876,-981.5 2876,-702.5 2716,-702.5"/> +</g> +<!-- Species --> +<g id="node33" class="node"> +<title>Species</title> +<polygon fill="lightgrey" stroke="transparent" points="2734,-201 2734,-396 2858,-396 2858,-201 2734,-201"/> +<polygon fill="#f1eef6" stroke="transparent" points="2737,-371.5 2737,-392.5 2855,-392.5 2855,-371.5 2737,-371.5"/> +<polygon fill="none" stroke="black" points="2737,-371.5 2737,-392.5 2855,-392.5 2855,-371.5 2737,-371.5"/> +<text text-anchor="start" x="2740" y="-378.3" font-family="Times,serif" font-size="14.00">Species (796 B)</text> +<polygon fill="green" stroke="transparent" points="2737,-350.5 2737,-369.5 2855,-369.5 2855,-350.5 2737,-350.5"/> +<text text-anchor="start" x="2761" y="-356.3" font-family="Times,serif" font-size="14.00">FullName</text> +<polygon fill="green" stroke="transparent" points="2737,-329.5 2737,-348.5 2855,-348.5 2855,-329.5 2737,-329.5"/> +<text text-anchor="start" x="2754.5" y="-335.3" font-family="Times,serif" font-size="14.00">MenuName</text> +<polygon fill="green" stroke="transparent" points="2737,-308.5 2737,-327.5 2855,-327.5 2855,-308.5 2737,-308.5"/> +<text text-anchor="start" x="2747.5" y="-314.3" font-family="Times,serif" font-size="14.00">SpeciesName</text> +<text text-anchor="start" x="2788.5" y="-293.3" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="2737,-266.5 2737,-285.5 2855,-285.5 2855,-266.5 2737,-266.5"/> +<text text-anchor="start" x="2774.5" y="-272.3" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2767.5" y="-251.3" font-family="Times,serif" font-size="14.00">OrderId</text> +<text text-anchor="start" x="2761" y="-230.3" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2752.5" y="-209.3" font-family="Times,serif" font-size="14.00">TaxonomyId</text> 
+<polygon fill="none" stroke="black" points="2734,-201 2734,-396 2858,-396 2858,-201 2734,-201"/> +</g> +<!-- SnpAllRat->Species --> +<g id="edge4" class="edge"> +<title>SnpAllRat:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2874,-735C2906.96,-735 2860.65,-539.2 2826.56,-410.18"/> +<polygon fill="black" stroke="black" points="2829.87,-409 2823.92,-400.23 2823.1,-410.8 2829.87,-409"/> +</g> +<!-- SampleXRef --> +<g id="node4" class="node"> +<title>SampleXRef</title> +<polygon fill="white" stroke="transparent" points="3272,-3259.5 3272,-3328.5 3426,-3328.5 3426,-3259.5 3272,-3259.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3275,-3304 3275,-3325 3423,-3325 3423,-3304 3275,-3304"/> +<polygon fill="none" stroke="black" points="3275,-3304 3275,-3325 3423,-3325 3423,-3304 3275,-3304"/> +<text text-anchor="start" x="3278" y="-3310.8" font-family="Times,serif" font-size="14.00">SampleXRef (4 KiB)</text> +<text text-anchor="start" x="3296" y="-3288.8" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="3315" y="-3267.8" font-family="Times,serif" font-size="14.00">SampleId</text> +<polygon fill="none" stroke="black" points="3272,-3259.5 3272,-3328.5 3426,-3328.5 3426,-3259.5 3272,-3259.5"/> +</g> +<!-- ProbeFreeze --> +<g id="node42" class="node"> +<title>ProbeFreeze</title> +<polygon fill="white" stroke="transparent" points="2611,-1855 2611,-2071 2777,-2071 2777,-1855 2611,-1855"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2614,-2047 2614,-2068 2774,-2068 2774,-2047 2614,-2047"/> +<polygon fill="none" stroke="black" points="2614,-2047 2614,-2068 2774,-2068 2774,-2047 2614,-2047"/> +<text text-anchor="start" x="2617" y="-2053.8" font-family="Times,serif" font-size="14.00">ProbeFreeze (30 KiB)</text> +<text text-anchor="start" x="2670" y="-2031.8" font-family="Times,serif" font-size="14.00">ChipId</text> +<text text-anchor="start" x="2652" y="-2010.8" font-family="Times,serif" 
font-size="14.00">CreateTime</text> +<text text-anchor="start" x="2659" y="-1989.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="2686.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2651" y="-1947.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2672.5" y="-1926.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2641" y="-1905.8" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="2653" y="-1884.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<text text-anchor="start" x="2663.5" y="-1863.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<polygon fill="none" stroke="black" points="2611,-1855 2611,-2071 2777,-2071 2777,-1855 2611,-1855"/> +</g> +<!-- SampleXRef->ProbeFreeze --> +<g id="edge5" class="edge"> +<title>SampleXRef:ProbeFreezeId->ProbeFreeze</title> +<path fill="none" stroke="black" d="M3274,-3292C3032.87,-3292 3338.17,-2922.26 3158,-2762 3097.26,-2707.98 2852.39,-2782.55 2794,-2726 2622.74,-2560.12 2641.84,-2254.55 2669,-2085.12"/> +<polygon fill="black" stroke="black" points="2672.47,-2085.6 2670.63,-2075.16 2665.56,-2084.47 2672.47,-2085.6"/> +</g> +<!-- Sample --> +<g id="node95" class="node"> +<title>Sample</title> +<polygon fill="white" stroke="transparent" points="3653.5,-1792 3653.5,-2134 3782.5,-2134 3782.5,-1792 3653.5,-1792"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3657,-2110 3657,-2131 3780,-2131 3780,-2110 3657,-2110"/> +<polygon fill="none" stroke="black" points="3657,-2110 3657,-2131 3780,-2131 3780,-2110 3657,-2110"/> +<text text-anchor="start" x="3660" y="-2116.8" font-family="Times,serif" font-size="14.00">Sample (53 KiB)</text> +<text text-anchor="start" x="3704.5" y="-2094.8" font-family="Times,serif" font-size="14.00">Age</text> +<text text-anchor="start" x="3688" 
y="-2073.8" font-family="Times,serif" font-size="14.00">CELURL</text> +<text text-anchor="start" x="3686.5" y="-2052.8" font-family="Times,serif" font-size="14.00">CHPURL</text> +<text text-anchor="start" x="3676.5" y="-2031.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="3688" y="-2010.8" font-family="Times,serif" font-size="14.00">DATURL</text> +<text text-anchor="start" x="3688" y="-1989.8" font-family="Times,serif" font-size="14.00">EXPURL</text> +<text text-anchor="start" x="3687" y="-1968.8" font-family="Times,serif" font-size="14.00">FromSrc</text> +<text text-anchor="start" x="3711" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3680.5" y="-1926.8" font-family="Times,serif" font-size="14.00">ImageURL</text> +<text text-anchor="start" x="3697" y="-1905.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3688" y="-1884.8" font-family="Times,serif" font-size="14.00">RPTURL</text> +<text text-anchor="start" x="3705" y="-1863.8" font-family="Times,serif" font-size="14.00">Sex</text> +<text text-anchor="start" x="3689" y="-1842.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="3678" y="-1821.8" font-family="Times,serif" font-size="14.00">TissueType</text> +<text text-anchor="start" x="3688.5" y="-1800.8" font-family="Times,serif" font-size="14.00">TXTURL</text> +<polygon fill="none" stroke="black" points="3653.5,-1792 3653.5,-2134 3782.5,-2134 3782.5,-1792 3653.5,-1792"/> +</g> +<!-- SampleXRef->Sample --> +<g id="edge6" class="edge"> +<title>SampleXRef:SampleId->Sample</title> +<path fill="none" stroke="black" d="M3424,-3271C3878.8,-3271 3810.34,-2508.42 3752.65,-2148.25"/> +<polygon fill="black" stroke="black" points="3756.08,-2147.55 3751.03,-2138.24 3749.17,-2148.67 3756.08,-2147.55"/> +</g> +<!-- GeneIDXRef --> +<g id="node5" class="node"> +<title>GeneIDXRef</title> +<polygon 
fill="white" stroke="transparent" points="7441,-4842.5 7441,-4932.5 7613,-4932.5 7613,-4842.5 7441,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="7444,-4908.5 7444,-4929.5 7610,-4929.5 7610,-4908.5 7444,-4908.5"/> +<polygon fill="none" stroke="black" points="7444,-4908.5 7444,-4929.5 7610,-4929.5 7610,-4908.5 7444,-4908.5"/> +<text text-anchor="start" x="7447" y="-4915.3" font-family="Times,serif" font-size="14.00">GeneIDXRef (220 KiB)</text> +<text text-anchor="start" x="7502.5" y="-4893.3" font-family="Times,serif" font-size="14.00">human</text> +<text text-anchor="start" x="7503.5" y="-4872.3" font-family="Times,serif" font-size="14.00">mouse</text> +<text text-anchor="start" x="7516" y="-4851.3" font-family="Times,serif" font-size="14.00">rat</text> +<polygon fill="none" stroke="black" points="7441,-4842.5 7441,-4932.5 7613,-4932.5 7613,-4842.5 7441,-4842.5"/> +</g> +<!-- MachineAccessLog --> +<g id="node6" class="node"> +<title>MachineAccessLog</title> +<polygon fill="white" stroke="transparent" points="7647,-4811 7647,-4964 7861,-4964 7861,-4811 7647,-4811"/> +<polygon fill="#df65b0" stroke="transparent" points="7650,-4939.5 7650,-4960.5 7858,-4960.5 7858,-4939.5 7650,-4939.5"/> +<polygon fill="none" stroke="black" points="7650,-4939.5 7650,-4960.5 7858,-4960.5 7858,-4939.5 7650,-4939.5"/> +<text text-anchor="start" x="7653" y="-4946.3" font-family="Times,serif" font-size="14.00">MachineAccessLog (23 MiB)</text> +<text text-anchor="start" x="7714.5" y="-4924.3" font-family="Times,serif" font-size="14.00">accesstime</text> +<text text-anchor="start" x="7732" y="-4903.3" font-family="Times,serif" font-size="14.00">action</text> +<text text-anchor="start" x="7728" y="-4882.3" font-family="Times,serif" font-size="14.00">data_id</text> +<text text-anchor="start" x="7734.5" y="-4861.3" font-family="Times,serif" font-size="14.00">db_id</text> +<text text-anchor="start" x="7747" y="-4840.3" font-family="Times,serif" font-size="14.00">id</text> 
+<text text-anchor="start" x="7715.5" y="-4819.3" font-family="Times,serif" font-size="14.00">ip_address</text> +<polygon fill="none" stroke="black" points="7647,-4811 7647,-4964 7861,-4964 7861,-4811 7647,-4811"/> +</g> +<!-- metadata_audit --> +<g id="node7" class="node"> +<title>metadata_audit</title> +<polygon fill="white" stroke="transparent" points="292.5,-1897 292.5,-2029 479.5,-2029 479.5,-1897 292.5,-1897"/> +<polygon fill="#d7b5d8" stroke="transparent" points="296,-2005 296,-2026 477,-2026 477,-2005 296,-2005"/> +<polygon fill="none" stroke="black" points="296,-2005 296,-2026 477,-2026 477,-2005 296,-2005"/> +<text text-anchor="start" x="299" y="-2011.8" font-family="Times,serif" font-size="14.00">metadata_audit (16 KiB)</text> +<text text-anchor="start" x="349.5" y="-1989.8" font-family="Times,serif" font-size="14.00">dataset_id</text> +<text text-anchor="start" x="365" y="-1968.8" font-family="Times,serif" font-size="14.00">editor</text> +<text text-anchor="start" x="379.5" y="-1947.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="337.5" y="-1926.8" font-family="Times,serif" font-size="14.00">json_diff_data</text> +<text text-anchor="start" x="344.5" y="-1905.8" font-family="Times,serif" font-size="14.00">time_stamp</text> +<polygon fill="none" stroke="black" points="292.5,-1897 292.5,-2029 479.5,-2029 479.5,-1897 292.5,-1897"/> +</g> +<!-- Datasets --> +<g id="node16" class="node"> +<title>Datasets</title> +<polygon fill="lightgrey" stroke="transparent" points="305,-660.5 305,-1023.5 469,-1023.5 469,-660.5 305,-660.5"/> +<polygon fill="#df65b0" stroke="transparent" points="308,-999 308,-1020 466,-1020 466,-999 308,-999"/> +<polygon fill="none" stroke="black" points="308,-999 308,-1020 466,-1020 466,-999 308,-999"/> +<text text-anchor="start" x="326.5" y="-1005.8" font-family="Times,serif" font-size="14.00">Datasets (4 MiB)</text> +<polygon fill="green" stroke="transparent" points="308,-978 308,-997 466,-997 
466,-978 308,-978"/> +<text text-anchor="start" x="344.5" y="-983.8" font-family="Times,serif" font-size="14.00">AboutCases</text> +<polygon fill="green" stroke="transparent" points="308,-957 308,-976 466,-976 466,-957 308,-957"/> +<text text-anchor="start" x="310" y="-962.8" font-family="Times,serif" font-size="14.00">AboutDataProcessing</text> +<polygon fill="green" stroke="transparent" points="308,-936 308,-955 466,-955 466,-936 308,-936"/> +<text text-anchor="start" x="334.5" y="-941.8" font-family="Times,serif" font-size="14.00">AboutPlatform</text> +<polygon fill="green" stroke="transparent" points="308,-915 308,-934 466,-934 466,-915 308,-915"/> +<text text-anchor="start" x="343" y="-920.8" font-family="Times,serif" font-size="14.00">AboutTissue</text> +<polygon fill="green" stroke="transparent" points="308,-894 308,-913 466,-913 466,-894 308,-894"/> +<text text-anchor="start" x="325.5" y="-899.8" font-family="Times,serif" font-size="14.00">Acknowledgment</text> +<polygon fill="green" stroke="transparent" points="308,-873 308,-892 466,-892 466,-873 308,-873"/> +<text text-anchor="start" x="358" y="-878.8" font-family="Times,serif" font-size="14.00">Citation</text> +<polygon fill="green" stroke="transparent" points="308,-852 308,-871 466,-871 466,-852 308,-852"/> +<text text-anchor="start" x="341" y="-857.8" font-family="Times,serif" font-size="14.00">Contributors</text> +<text text-anchor="start" x="352" y="-836.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<polygon fill="green" stroke="transparent" points="308,-810 308,-829 466,-829 466,-810 308,-810"/> +<text text-anchor="start" x="338" y="-815.8" font-family="Times,serif" font-size="14.00">DatasetName</text> +<text text-anchor="start" x="328.5" y="-794.8" font-family="Times,serif" font-size="14.00">DatasetStatusId</text> +<polygon fill="green" stroke="transparent" points="308,-768 308,-787 466,-787 466,-768 308,-768"/> +<text text-anchor="start" x="320" y="-773.8" 
font-family="Times,serif" font-size="14.00">ExperimentDesign</text> +<polygon fill="green" stroke="transparent" points="308,-747 308,-766 466,-766 466,-747 308,-747"/> +<text text-anchor="start" x="350.5" y="-752.8" font-family="Times,serif" font-size="14.00">GeoSeries</text> +<text text-anchor="start" x="336" y="-731.8" font-family="Times,serif" font-size="14.00">InvestigatorId</text> +<polygon fill="green" stroke="transparent" points="308,-705 308,-724 466,-724 466,-705 308,-705"/> +<text text-anchor="start" x="365.5" y="-710.8" font-family="Times,serif" font-size="14.00">Notes</text> +<text text-anchor="start" x="330.5" y="-689.8" font-family="Times,serif" font-size="14.00">PublicationTitle</text> +<polygon fill="green" stroke="transparent" points="308,-663 308,-682 466,-682 466,-663 308,-663"/> +<text text-anchor="start" x="352" y="-668.8" font-family="Times,serif" font-size="14.00">Summary</text> +<polygon fill="none" stroke="black" points="305,-660.5 305,-1023.5 469,-1023.5 469,-660.5 305,-660.5"/> +</g> +<!-- metadata_audit->Datasets --> +<g id="edge7" class="edge"> +<title>metadata_audit:dataset_id->Datasets</title> +<path fill="none" stroke="black" d="M478,-1994C525.38,-1994 453.11,-1365.95 412.1,-1037.71"/> +<polygon fill="black" stroke="black" points="415.55,-1037.1 410.84,-1027.61 408.61,-1037.97 415.55,-1037.1"/> +</g> +<!-- GenoXRef --> +<g id="node8" class="node"> +<title>GenoXRef</title> +<polygon fill="white" stroke="transparent" points="4464,-3228 4464,-3360 4614,-3360 4614,-3228 4464,-3228"/> +<polygon fill="#df65b0" stroke="transparent" points="4467,-3336 4467,-3357 4611,-3357 4611,-3336 4467,-3336"/> +<polygon fill="none" stroke="black" points="4467,-3336 4467,-3357 4611,-3357 4611,-3336 4467,-3336"/> +<text text-anchor="start" x="4470" y="-3342.8" font-family="Times,serif" font-size="14.00">GenoXRef (14 MiB)</text> +<text text-anchor="start" x="4528" y="-3320.8" font-family="Times,serif" font-size="14.00">cM</text> +<text text-anchor="start" 
x="4514.5" y="-3299.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="4489" y="-3278.8" font-family="Times,serif" font-size="14.00">GenoFreezeId</text> +<text text-anchor="start" x="4513" y="-3257.8" font-family="Times,serif" font-size="14.00">GenoId</text> +<text text-anchor="start" x="4472.5" y="-3236.8" font-family="Times,serif" font-size="14.00">Used_for_mapping</text> +<polygon fill="none" stroke="black" points="4464,-3228 4464,-3360 4614,-3360 4614,-3228 4464,-3228"/> +</g> +<!-- Geno --> +<g id="node46" class="node"> +<title>Geno</title> +<polygon fill="white" stroke="transparent" points="4245,-671 4245,-1013 4383,-1013 4383,-671 4245,-671"/> +<polygon fill="#df65b0" stroke="transparent" points="4248,-989 4248,-1010 4380,-1010 4380,-989 4248,-989"/> +<polygon fill="none" stroke="black" points="4248,-989 4248,-1010 4380,-1010 4380,-989 4248,-989"/> +<text text-anchor="start" x="4262" y="-995.8" font-family="Times,serif" font-size="14.00">Geno (39 MiB)</text> +<text text-anchor="start" x="4300.5" y="-973.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="4279" y="-952.8" font-family="Times,serif" font-size="14.00">Chr_mm8</text> +<text text-anchor="start" x="4283" y="-931.8" font-family="Times,serif" font-size="14.00">chr_num</text> +<text text-anchor="start" x="4275.5" y="-910.8" font-family="Times,serif" font-size="14.00">Comments</text> +<text text-anchor="start" x="4306.5" y="-889.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4263" y="-868.8" font-family="Times,serif" font-size="14.00">Marker_Name</text> +<text text-anchor="start" x="4302" y="-847.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="4280.5" y="-826.8" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="4280.5" y="-805.8" font-family="Times,serif" font-size="14.00">Mb_mm8</text> +<text 
text-anchor="start" x="4292.5" y="-784.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4279" y="-763.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<text text-anchor="start" x="4289" y="-742.8" font-family="Times,serif" font-size="14.00">Source</text> +<text text-anchor="start" x="4284.5" y="-721.8" font-family="Times,serif" font-size="14.00">Source2</text> +<text text-anchor="start" x="4279" y="-700.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="4250" y="-679.8" font-family="Times,serif" font-size="14.00">used_by_geno_file</text> +<polygon fill="none" stroke="black" points="4245,-671 4245,-1013 4383,-1013 4383,-671 4245,-671"/> +</g> +<!-- GenoXRef->Geno --> +<g id="edge9" class="edge"> +<title>GenoXRef:GenoId->Geno</title> +<path fill="none" stroke="black" d="M4612,-3261C4626.31,-3261 4580.57,-1213.56 4576,-1200 4540.22,-1093.91 4460.35,-992.99 4398.15,-925.69"/> +<polygon fill="black" stroke="black" points="4400.41,-922.99 4391.03,-918.06 4395.29,-927.76 4400.41,-922.99"/> +</g> +<!-- GenoFreeze --> +<g id="node82" class="node"> +<title>GenoFreeze</title> +<polygon fill="white" stroke="transparent" points="4407,-1855 4407,-2071 4559,-2071 4559,-1855 4407,-1855"/> +<polygon fill="#d7b5d8" stroke="transparent" points="4410,-2047 4410,-2068 4556,-2068 4556,-2047 4410,-2047"/> +<polygon fill="none" stroke="black" points="4410,-2047 4410,-2068 4556,-2068 4556,-2047 4410,-2047"/> +<text text-anchor="start" x="4413" y="-2053.8" font-family="Times,serif" font-size="14.00">GenoFreeze (2 KiB)</text> +<text text-anchor="start" x="4422.5" y="-2031.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="4431.5" y="-2010.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="4441" y="-1989.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" 
x="4448" y="-1968.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="4475.5" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4440" y="-1926.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4461.5" y="-1905.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4461" y="-1884.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="4442" y="-1863.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<polygon fill="none" stroke="black" points="4407,-1855 4407,-2071 4559,-2071 4559,-1855 4407,-1855"/> +</g> +<!-- GenoXRef->GenoFreeze --> +<g id="edge8" class="edge"> +<title>GenoXRef:GenoFreezeId->GenoFreeze</title> +<path fill="none" stroke="black" d="M4466,-3282C4346.95,-3282 4432.68,-2411.13 4468.93,-2085.19"/> +<polygon fill="black" stroke="black" points="4472.41,-2085.56 4470.04,-2075.24 4465.45,-2084.79 4472.41,-2085.56"/> +</g> +<!-- TissueProbeSetXRef --> +<g id="node9" class="node"> +<title>TissueProbeSetXRef</title> +<polygon fill="white" stroke="transparent" points="6347,-4748 6347,-5027 6563,-5027 6563,-4748 6347,-4748"/> +<polygon fill="#df65b0" stroke="transparent" points="6350,-5002.5 6350,-5023.5 6560,-5023.5 6560,-5002.5 6350,-5002.5"/> +<polygon fill="none" stroke="black" points="6350,-5002.5 6350,-5023.5 6560,-5023.5 6560,-5002.5 6350,-5002.5"/> +<text text-anchor="start" x="6353" y="-5009.3" font-family="Times,serif" font-size="14.00">TissueProbeSetXRef (9 MiB)</text> +<text text-anchor="start" x="6441.5" y="-4987.3" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="6430.5" y="-4966.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6414.5" y="-4945.3" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="6429" y="-4924.3" 
font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="6443" y="-4903.3" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="6421.5" y="-4882.3" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="6435" y="-4861.3" font-family="Times,serif" font-size="14.00">Mean</text> +<text text-anchor="start" x="6362.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Probe_Target_Description</text> +<text text-anchor="start" x="6415.5" y="-4819.3" font-family="Times,serif" font-size="14.00">ProbesetId</text> +<text text-anchor="start" x="6428" y="-4798.3" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="6367.5" y="-4777.3" font-family="Times,serif" font-size="14.00">TissueProbeSetFreezeId</text> +<text text-anchor="start" x="6419" y="-4756.3" font-family="Times,serif" font-size="14.00">useStatus</text> +<polygon fill="none" stroke="black" points="6347,-4748 6347,-5027 6563,-5027 6563,-4748 6347,-4748"/> +</g> +<!-- TissueProbeSetFreeze --> +<g id="node23" class="node"> +<title>TissueProbeSetFreeze</title> +<polygon fill="white" stroke="transparent" points="4747,-3165 4747,-3423 4977,-3423 4977,-3165 4747,-3165"/> +<polygon fill="#f1eef6" stroke="transparent" points="4750,-3399 4750,-3420 4974,-3420 4974,-3399 4750,-3399"/> +<polygon fill="none" stroke="black" points="4750,-3399 4750,-3420 4974,-3420 4974,-3399 4750,-3399"/> +<text text-anchor="start" x="4753" y="-3405.8" font-family="Times,serif" font-size="14.00">TissueProbeSetFreeze (228 B)</text> +<text text-anchor="start" x="4801.5" y="-3383.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="4840" y="-3362.8" font-family="Times,serif" font-size="14.00">AvgID</text> +<text text-anchor="start" x="4810.5" y="-3341.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="4820" y="-3320.8" 
font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="4827" y="-3299.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="4854.5" y="-3278.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4840.5" y="-3257.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4836" y="-3236.8" font-family="Times,serif" font-size="14.00">Name2</text> +<text text-anchor="start" x="4840" y="-3215.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="4821" y="-3194.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<text text-anchor="start" x="4786.5" y="-3173.8" font-family="Times,serif" font-size="14.00">TissueProbeFreezeId</text> +<polygon fill="none" stroke="black" points="4747,-3165 4747,-3423 4977,-3423 4977,-3165 4747,-3165"/> +</g> +<!-- TissueProbeSetXRef->TissueProbeSetFreeze --> +<g id="edge11" class="edge"> +<title>TissueProbeSetXRef:TissueProbeSetFreezeId->TissueProbeSetFreeze</title> +<path fill="none" stroke="black" d="M6349,-4780.5C5901.77,-4780.5 6243.92,-4188.23 5938,-3862 5667.77,-3573.83 5217.81,-3404.02 4995.17,-3333.49"/> +<polygon fill="black" stroke="black" points="4995.98,-3330.08 4985.39,-3330.41 4993.88,-3336.75 4995.98,-3330.08"/> +</g> +<!-- ProbeSE --> +<g id="node78" class="node"> +<title>ProbeSE</title> +<polygon fill="white" stroke="transparent" points="6992,-1918 6992,-2008 7122,-2008 7122,-1918 6992,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="6995,-1984 6995,-2005 7119,-2005 7119,-1984 6995,-1984"/> +<polygon fill="none" stroke="black" points="6995,-1984 6995,-2005 7119,-2005 7119,-1984 6995,-1984"/> +<text text-anchor="start" x="6998" y="-1990.8" font-family="Times,serif" font-size="14.00">ProbeSE (3 GiB)</text> +<text text-anchor="start" x="7032.5" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text 
text-anchor="start" x="7038.5" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="7027.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6992,-1918 6992,-2008 7122,-2008 7122,-1918 6992,-1918"/> +</g> +<!-- TissueProbeSetXRef->ProbeSE --> +<g id="edge10" class="edge"> +<title>TissueProbeSetXRef:ProbesetId->ProbeSE</title> +<path fill="none" stroke="black" d="M6561,-4822.5C6998.45,-4822.5 6458.97,-4163.43 6776,-3862 6844.63,-3796.75 6923.59,-3897.22 6986,-3826 7107.35,-3687.52 7069.01,-2322.6 7059.04,-2022.25"/> +<polygon fill="black" stroke="black" points="7062.53,-2021.9 7058.7,-2012.02 7055.54,-2022.13 7062.53,-2021.9"/> +</g> +<!-- Homologene --> +<g id="node10" class="node"> +<title>Homologene</title> +<polygon fill="white" stroke="transparent" points="7895,-4842.5 7895,-4932.5 8055,-4932.5 8055,-4842.5 7895,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="7898,-4908.5 7898,-4929.5 8052,-4929.5 8052,-4908.5 7898,-4908.5"/> +<polygon fill="none" stroke="black" points="7898,-4908.5 7898,-4929.5 8052,-4929.5 8052,-4908.5 7898,-4908.5"/> +<text text-anchor="start" x="7901" y="-4915.3" font-family="Times,serif" font-size="14.00">Homologene (3 MiB)</text> +<text text-anchor="start" x="7949" y="-4893.3" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="7923" y="-4872.3" font-family="Times,serif" font-size="14.00">HomologeneId</text> +<text text-anchor="start" x="7931.5" y="-4851.3" font-family="Times,serif" font-size="14.00">TaxonomyId</text> +<polygon fill="none" stroke="black" points="7895,-4842.5 7895,-4932.5 8055,-4932.5 8055,-4842.5 7895,-4842.5"/> +</g> +<!-- PublishData --> +<g id="node11" class="node"> +<title>PublishData</title> +<polygon fill="white" stroke="transparent" points="5091,-1918 5091,-2008 5257,-2008 5257,-1918 5091,-1918"/> +<polygon fill="#df65b0" stroke="transparent" 
points="5094,-1984 5094,-2005 5254,-2005 5254,-1984 5094,-1984"/> +<polygon fill="none" stroke="black" points="5094,-1984 5094,-2005 5254,-2005 5254,-1984 5094,-1984"/> +<text text-anchor="start" x="5097" y="-1990.8" font-family="Times,serif" font-size="14.00">PublishData (34 MiB)</text> +<text text-anchor="start" x="5166.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="5144.5" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="5154.5" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="5091,-1918 5091,-2008 5257,-2008 5257,-1918 5091,-1918"/> +</g> +<!-- PublishData->Strain --> +<g id="edge12" class="edge"> +<title>PublishData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5255,-1951C5275.87,-1951 5264.11,-1218.38 5274,-1200 5368.85,-1023.7 5593.45,-915.93 5711.13,-869.6"/> +<polygon fill="black" stroke="black" points="5712.4,-872.86 5720.45,-865.97 5709.86,-866.34 5712.4,-872.86"/> +</g> +<!-- ProbeSetXRef --> +<g id="node12" class="node"> +<title>ProbeSetXRef</title> +<polygon fill="white" stroke="transparent" points="3033.5,-4737.5 3033.5,-5037.5 3200.5,-5037.5 3200.5,-4737.5 3033.5,-4737.5"/> +<polygon fill="#ce1256" stroke="transparent" points="3037,-5013.5 3037,-5034.5 3198,-5034.5 3198,-5013.5 3037,-5013.5"/> +<polygon fill="none" stroke="black" points="3037,-5013.5 3037,-5034.5 3198,-5034.5 3198,-5013.5 3037,-5013.5"/> +<text text-anchor="start" x="3040" y="-5020.3" font-family="Times,serif" font-size="14.00">ProbeSetXRef (2 GiB)</text> +<text text-anchor="start" x="3088.5" y="-4998.3" font-family="Times,serif" font-size="14.00">additive</text> +<text text-anchor="start" x="3093" y="-4977.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="3108" y="-4956.3" font-family="Times,serif" font-size="14.00">h2</text> +<text text-anchor="start" 
x="3096.5" y="-4935.3" font-family="Times,serif" font-size="14.00">Locus</text> +<text text-anchor="start" x="3082.5" y="-4914.3" font-family="Times,serif" font-size="14.00">Locus_old</text> +<text text-anchor="start" x="3102.5" y="-4893.3" font-family="Times,serif" font-size="14.00">LRS</text> +<text text-anchor="start" x="3088.5" y="-4872.3" font-family="Times,serif" font-size="14.00">LRS_old</text> +<text text-anchor="start" x="3097.5" y="-4851.3" font-family="Times,serif" font-size="14.00">mean</text> +<text text-anchor="start" x="3052.5" y="-4830.3" font-family="Times,serif" font-size="14.00">ProbeSetFreezeId</text> +<text text-anchor="start" x="3077" y="-4809.3" font-family="Times,serif" font-size="14.00">ProbeSetId</text> +<text text-anchor="start" x="3093" y="-4788.3" font-family="Times,serif" font-size="14.00">pValue</text> +<text text-anchor="start" x="3079" y="-4767.3" font-family="Times,serif" font-size="14.00">pValue_old</text> +<text text-anchor="start" x="3109.5" y="-4746.3" font-family="Times,serif" font-size="14.00">se</text> +<polygon fill="none" stroke="black" points="3033.5,-4737.5 3033.5,-5037.5 3200.5,-5037.5 3200.5,-4737.5 3033.5,-4737.5"/> +</g> +<!-- ProbeSetXRef->ProbeSE --> +<g id="edge14" class="edge"> +<title>ProbeSetXRef:ProbeSetId->ProbeSE</title> +<path fill="none" stroke="black" d="M3199,-4812.5C4021.93,-4812.5 3996.77,-4088.2 4788,-3862 4841.88,-3846.6 6765.02,-3865.27 6805,-3826 6889.39,-3743.1 6769.62,-2854.79 6843,-2762 6880.46,-2714.64 6934.85,-2771.97 6974,-2726 7149.11,-2520.43 7098.76,-2161.98 7070.36,-2022.18"/> +<polygon fill="black" stroke="black" points="7073.73,-2021.18 7068.27,-2012.1 7066.87,-2022.6 7073.73,-2021.18"/> +</g> +<!-- ProbeSetFreeze --> +<g id="node90" class="node"> +<title>ProbeSetFreeze</title> +<polygon fill="white" stroke="transparent" points="2639.5,-3144 2639.5,-3444 2838.5,-3444 2838.5,-3144 2639.5,-3144"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2643,-3420 2643,-3441 2836,-3441 
2836,-3420 2643,-3420"/> +<polygon fill="none" stroke="black" points="2643,-3420 2643,-3441 2836,-3441 2836,-3420 2643,-3420"/> +<text text-anchor="start" x="2646" y="-3426.8" font-family="Times,serif" font-size="14.00">ProbeSetFreeze (171 KiB)</text> +<text text-anchor="start" x="2679" y="-3404.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="2717.5" y="-3383.8" font-family="Times,serif" font-size="14.00">AvgID</text> +<text text-anchor="start" x="2688" y="-3362.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="2697.5" y="-3341.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="2703" y="-3320.8" font-family="Times,serif" font-size="14.00">DataScale</text> +<text text-anchor="start" x="2704.5" y="-3299.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="2732" y="-3278.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2718" y="-3257.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2713.5" y="-3236.8" font-family="Times,serif" font-size="14.00">Name2</text> +<text text-anchor="start" x="2704.5" y="-3215.8" font-family="Times,serif" font-size="14.00">OrderList</text> +<text text-anchor="start" x="2686.5" y="-3194.8" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="2717.5" y="-3173.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="2698.5" y="-3152.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<polygon fill="none" stroke="black" points="2639.5,-3144 2639.5,-3444 2838.5,-3444 2838.5,-3144 2639.5,-3144"/> +</g> +<!-- ProbeSetXRef->ProbeSetFreeze --> +<g id="edge13" class="edge"> +<title>ProbeSetXRef:ProbeSetFreezeId->ProbeSetFreeze</title> +<path fill="none" stroke="black" d="M3036,-4833.5C2816.79,-4833.5 
2907.79,-4076.99 2865,-3862 2837.79,-3725.3 2803.24,-3570.92 2777.19,-3457.81"/> +<polygon fill="black" stroke="black" points="2780.6,-3456.98 2774.94,-3448.03 2773.77,-3458.56 2780.6,-3456.98"/> +</g> +<!-- TraitMetadata --> +<g id="node13" class="node"> +<title>TraitMetadata</title> +<polygon fill="white" stroke="transparent" points="8089,-4853 8089,-4922 8267,-4922 8267,-4853 8089,-4853"/> +<polygon fill="#d7b5d8" stroke="transparent" points="8092,-4897.5 8092,-4918.5 8264,-4918.5 8264,-4897.5 8092,-4897.5"/> +<polygon fill="none" stroke="black" points="8092,-4897.5 8092,-4918.5 8264,-4918.5 8264,-4897.5 8092,-4897.5"/> +<text text-anchor="start" x="8095" y="-4904.3" font-family="Times,serif" font-size="14.00">TraitMetadata (16 KiB)</text> +<text text-anchor="start" x="8162" y="-4882.3" font-family="Times,serif" font-size="14.00">type</text> +<text text-anchor="start" x="8158.5" y="-4861.3" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="8089,-4853 8089,-4922 8267,-4922 8267,-4853 8089,-4853"/> +</g> +<!-- TissueProbeSetData --> +<g id="node14" class="node"> +<title>TissueProbeSetData</title> +<polygon fill="white" stroke="transparent" points="2313.5,-1918 2313.5,-2008 2538.5,-2008 2538.5,-1918 2313.5,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="2317,-1984 2317,-2005 2536,-2005 2536,-1984 2317,-1984"/> +<polygon fill="none" stroke="black" points="2317,-1984 2317,-2005 2536,-2005 2536,-1984 2317,-1984"/> +<text text-anchor="start" x="2320" y="-1990.8" font-family="Times,serif" font-size="14.00">TissueProbeSetData (33 MiB)</text> +<text text-anchor="start" x="2419" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2395" y="-1947.8" font-family="Times,serif" font-size="14.00">TissueID</text> +<text text-anchor="start" x="2407" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" 
points="2313.5,-1918 2313.5,-2008 2538.5,-2008 2538.5,-1918 2313.5,-1918"/> +</g> +<!-- Tissue --> +<g id="node79" class="node"> +<title>Tissue</title> +<polygon fill="lightgrey" stroke="transparent" points="2372.5,-755 2372.5,-929 2497.5,-929 2497.5,-755 2372.5,-755"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2376,-905 2376,-926 2495,-926 2495,-905 2376,-905"/> +<polygon fill="none" stroke="black" points="2376,-905 2376,-926 2495,-926 2495,-905 2376,-905"/> +<text text-anchor="start" x="2381" y="-911.8" font-family="Times,serif" font-size="14.00">Tissue (11 KiB)</text> +<text text-anchor="start" x="2390.5" y="-889.8" font-family="Times,serif" font-size="14.00">BIRN_lex_ID</text> +<text text-anchor="start" x="2378" y="-868.8" font-family="Times,serif" font-size="14.00">BIRN_lex_Name</text> +<text text-anchor="start" x="2428" y="-847.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="2376,-821 2376,-840 2495,-840 2495,-821 2376,-821"/> +<text text-anchor="start" x="2414" y="-826.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="green" stroke="transparent" points="2376,-800 2376,-819 2495,-819 2495,-800 2376,-800"/> +<text text-anchor="start" x="2391" y="-805.8" font-family="Times,serif" font-size="14.00">Short_Name</text> +<text text-anchor="start" x="2405" y="-784.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<text text-anchor="start" x="2391.5" y="-763.8" font-family="Times,serif" font-size="14.00">TissueName</text> +<polygon fill="none" stroke="black" points="2372.5,-755 2372.5,-929 2497.5,-929 2497.5,-755 2372.5,-755"/> +</g> +<!-- TissueProbeSetData->Tissue --> +<g id="edge15" class="edge"> +<title>TissueProbeSetData:TissueID->Tissue</title> +<path fill="none" stroke="black" d="M2537,-1951C2587.33,-1951 2488.08,-1216.42 2449.46,-943.5"/> +<polygon fill="black" stroke="black" points="2452.87,-942.61 2448,-933.2 2445.94,-943.59 2452.87,-942.61"/> 
+</g> +<!-- DBType --> +<g id="node15" class="node"> +<title>DBType</title> +<polygon fill="white" stroke="transparent" points="8304.5,-3259.5 8304.5,-3328.5 8421.5,-3328.5 8421.5,-3259.5 8304.5,-3259.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="8308,-3304 8308,-3325 8419,-3325 8419,-3304 8308,-3304"/> +<polygon fill="none" stroke="black" points="8308,-3304 8308,-3325 8419,-3325 8419,-3304 8308,-3304"/> +<text text-anchor="start" x="8311" y="-3310.8" font-family="Times,serif" font-size="14.00">DBType (99 B)</text> +<text text-anchor="start" x="8356" y="-3288.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="8342" y="-3267.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="8304.5,-3259.5 8304.5,-3328.5 8421.5,-3328.5 8421.5,-3259.5 8304.5,-3259.5"/> +</g> +<!-- DatasetStatus --> +<g id="node20" class="node"> +<title>DatasetStatus</title> +<polygon fill="lightgrey" stroke="transparent" points="305.5,-264 305.5,-333 468.5,-333 468.5,-264 305.5,-264"/> +<polygon fill="#f1eef6" stroke="transparent" points="309,-308.5 309,-329.5 466,-329.5 466,-308.5 309,-308.5"/> +<polygon fill="none" stroke="black" points="309,-308.5 309,-329.5 466,-329.5 466,-308.5 309,-308.5"/> +<text text-anchor="start" x="312" y="-315.3" font-family="Times,serif" font-size="14.00">DatasetStatus (40 B)</text> +<text text-anchor="start" x="329" y="-293.3" font-family="Times,serif" font-size="14.00">DatasetStatusId</text> +<polygon fill="green" stroke="transparent" points="309,-266.5 309,-285.5 466,-285.5 466,-266.5 309,-266.5"/> +<text text-anchor="start" x="315" y="-272.3" font-family="Times,serif" font-size="14.00">DatasetStatusName</text> +<polygon fill="none" stroke="black" points="305.5,-264 305.5,-333 468.5,-333 468.5,-264 305.5,-264"/> +</g> +<!-- Datasets->DatasetStatus --> +<g id="edge16" class="edge"> +<title>Datasets:DatasetStatusId->DatasetStatus</title> +<path fill="none" 
stroke="black" d="M467,-798C557.78,-798 449.28,-471.63 404.55,-347.04"/> +<polygon fill="black" stroke="black" points="407.75,-345.6 401.06,-337.38 401.16,-347.97 407.75,-345.6"/> +</g> +<!-- Investigators --> +<g id="node71" class="node"> +<title>Investigators</title> +<polygon fill="lightgrey" stroke="transparent" points="88,-117 88,-480 258,-480 258,-117 88,-117"/> +<polygon fill="#d7b5d8" stroke="transparent" points="91,-455.5 91,-476.5 255,-476.5 255,-455.5 91,-455.5"/> +<polygon fill="none" stroke="black" points="91,-455.5 91,-476.5 255,-476.5 255,-455.5 91,-455.5"/> +<text text-anchor="start" x="94" y="-462.3" font-family="Times,serif" font-size="14.00">Investigators (22 KiB)</text> +<polygon fill="green" stroke="transparent" points="91,-434.5 91,-453.5 255,-453.5 255,-434.5 91,-434.5"/> +<text text-anchor="start" x="144" y="-440.3" font-family="Times,serif" font-size="14.00">Address</text> +<polygon fill="green" stroke="transparent" points="91,-413.5 91,-432.5 255,-432.5 255,-413.5 91,-413.5"/> +<text text-anchor="start" x="158" y="-419.3" font-family="Times,serif" font-size="14.00">City</text> +<polygon fill="green" stroke="transparent" points="91,-392.5 91,-411.5 255,-411.5 255,-392.5 91,-392.5"/> +<text text-anchor="start" x="144" y="-398.3" font-family="Times,serif" font-size="14.00">Country</text> +<polygon fill="green" stroke="transparent" points="91,-371.5 91,-390.5 255,-390.5 255,-371.5 91,-371.5"/> +<text text-anchor="start" x="152" y="-377.3" font-family="Times,serif" font-size="14.00">Email</text> +<polygon fill="green" stroke="transparent" points="91,-350.5 91,-369.5 255,-369.5 255,-350.5 91,-350.5"/> +<text text-anchor="start" x="134.5" y="-356.3" font-family="Times,serif" font-size="14.00">FirstName</text> +<text text-anchor="start" x="122" y="-335.3" font-family="Times,serif" font-size="14.00">InvestigatorId</text> +<polygon fill="green" stroke="transparent" points="91,-308.5 91,-327.5 255,-327.5 255,-308.5 91,-308.5"/> +<text 
text-anchor="start" x="136.5" y="-314.3" font-family="Times,serif" font-size="14.00">LastName</text> +<text text-anchor="start" x="119.5" y="-293.3" font-family="Times,serif" font-size="14.00">OrganizationId</text> +<polygon fill="green" stroke="transparent" points="91,-266.5 91,-285.5 255,-285.5 255,-266.5 91,-266.5"/> +<text text-anchor="start" x="150.5" y="-272.3" font-family="Times,serif" font-size="14.00">Phone</text> +<polygon fill="green" stroke="transparent" points="91,-245.5 91,-264.5 255,-264.5 255,-245.5 91,-245.5"/> +<text text-anchor="start" x="153.5" y="-251.3" font-family="Times,serif" font-size="14.00">State</text> +<polygon fill="green" stroke="transparent" points="91,-224.5 91,-243.5 255,-243.5 255,-224.5 91,-224.5"/> +<text text-anchor="start" x="161" y="-230.3" font-family="Times,serif" font-size="14.00">Url</text> +<text text-anchor="start" x="138.5" y="-209.3" font-family="Times,serif" font-size="14.00">UserDate</text> +<text text-anchor="start" x="136.5" y="-188.3" font-family="Times,serif" font-size="14.00">UserLevel</text> +<text text-anchor="start" x="134.5" y="-167.3" font-family="Times,serif" font-size="14.00">UserName</text> +<text text-anchor="start" x="139.5" y="-146.3" font-family="Times,serif" font-size="14.00">UserPass</text> +<polygon fill="green" stroke="transparent" points="91,-119.5 91,-138.5 255,-138.5 255,-119.5 91,-119.5"/> +<text text-anchor="start" x="143" y="-125.3" font-family="Times,serif" font-size="14.00">ZipCode</text> +<polygon fill="none" stroke="black" points="88,-117 88,-480 258,-480 258,-117 88,-117"/> +</g> +<!-- Datasets->Investigators --> +<g id="edge17" class="edge"> +<title>Datasets:InvestigatorId->Investigators</title> +<path fill="none" stroke="black" d="M307,-735C252.81,-735 218.24,-610.26 197.82,-494.3"/> +<polygon fill="black" stroke="black" points="201.22,-493.45 196.07,-484.19 194.32,-494.64 201.22,-493.45"/> +</g> +<!-- IndelAll --> +<g id="node17" class="node"> +<title>IndelAll</title> +<polygon 
fill="white" stroke="transparent" points="3168,-692 3168,-992 3302,-992 3302,-692 3168,-692"/> +<polygon fill="#df65b0" stroke="transparent" points="3171,-968 3171,-989 3299,-989 3299,-968 3171,-968"/> +<polygon fill="none" stroke="black" points="3171,-968 3171,-989 3299,-989 3299,-968 3171,-968"/> +<text text-anchor="start" x="3174" y="-974.8" font-family="Times,serif" font-size="14.00">IndelAll (17 MiB)</text> +<text text-anchor="start" x="3188" y="-952.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="3227.5" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3181" y="-910.8" font-family="Times,serif" font-size="14.00">InDelSequence</text> +<text text-anchor="start" x="3206.5" y="-889.8" font-family="Times,serif" font-size="14.00">Mb_end</text> +<text text-anchor="start" x="3185" y="-868.8" font-family="Times,serif" font-size="14.00">Mb_end_2016</text> +<text text-anchor="start" x="3202.5" y="-847.8" font-family="Times,serif" font-size="14.00">Mb_start</text> +<text text-anchor="start" x="3181" y="-826.8" font-family="Times,serif" font-size="14.00">Mb_start_2016</text> +<text text-anchor="start" x="3213.5" y="-805.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3219.5" y="-784.8" font-family="Times,serif" font-size="14.00">Size</text> +<text text-anchor="start" x="3203" y="-763.8" font-family="Times,serif" font-size="14.00">SourceId</text> +<text text-anchor="start" x="3200" y="-742.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="3210.5" y="-721.8" font-family="Times,serif" font-size="14.00">Strand</text> +<text text-anchor="start" x="3217.5" y="-700.8" font-family="Times,serif" font-size="14.00">Type</text> +<polygon fill="none" stroke="black" points="3168,-692 3168,-992 3302,-992 3302,-692 3168,-692"/> +</g> +<!-- IndelAll->Species --> +<g id="edge18" class="edge"> 
+<title>IndelAll:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M3170,-746C3144.8,-746 3164.16,-541.49 3151,-520 3088.71,-418.27 2960,-356.26 2875.88,-324.91"/> +<polygon fill="black" stroke="black" points="2876.95,-321.58 2866.36,-321.42 2874.55,-328.15 2876.95,-321.58"/> +</g> +<!-- GORef --> +<g id="node18" class="node"> +<title>GORef</title> +<polygon fill="white" stroke="transparent" points="8459.5,-4842.5 8459.5,-4932.5 8576.5,-4932.5 8576.5,-4842.5 8459.5,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="8463,-4908.5 8463,-4929.5 8574,-4929.5 8574,-4908.5 8463,-4908.5"/> +<polygon fill="none" stroke="black" points="8463,-4908.5 8463,-4929.5 8574,-4929.5 8574,-4908.5 8463,-4908.5"/> +<text text-anchor="start" x="8466" y="-4915.3" font-family="Times,serif" font-size="14.00">GORef (2 MiB)</text> +<text text-anchor="start" x="8497" y="-4893.3" font-family="Times,serif" font-size="14.00">genes</text> +<text text-anchor="start" x="8492.5" y="-4872.3" font-family="Times,serif" font-size="14.00">goterm</text> +<text text-anchor="start" x="8511.5" y="-4851.3" font-family="Times,serif" font-size="14.00">id</text> +<polygon fill="none" stroke="black" points="8459.5,-4842.5 8459.5,-4932.5 8576.5,-4932.5 8576.5,-4842.5 8459.5,-4842.5"/> +</g> +<!-- Publication --> +<g id="node19" class="node"> +<title>Publication</title> +<polygon fill="lightgrey" stroke="transparent" points="2531.5,-723.5 2531.5,-960.5 2682.5,-960.5 2682.5,-723.5 2531.5,-723.5"/> +<polygon fill="#df65b0" stroke="transparent" points="2535,-936 2535,-957 2680,-957 2680,-936 2535,-936"/> +<polygon fill="none" stroke="black" points="2535,-936 2535,-957 2680,-957 2680,-936 2535,-936"/> +<text text-anchor="start" x="2538" y="-942.8" font-family="Times,serif" font-size="14.00">Publication (7 MiB)</text> +<polygon fill="green" stroke="transparent" points="2535,-915 2535,-934 2680,-934 2680,-915 2535,-915"/> +<text text-anchor="start" x="2577" y="-920.8" font-family="Times,serif" 
font-size="14.00">Abstract</text> +<polygon fill="green" stroke="transparent" points="2535,-894 2535,-913 2680,-913 2680,-894 2535,-894"/> +<text text-anchor="start" x="2579" y="-899.8" font-family="Times,serif" font-size="14.00">Authors</text> +<polygon fill="green" stroke="transparent" points="2535,-873 2535,-892 2680,-892 2680,-873 2535,-873"/> +<text text-anchor="start" x="2581.5" y="-878.8" font-family="Times,serif" font-size="14.00">Journal</text> +<polygon fill="green" stroke="transparent" points="2535,-852 2535,-871 2680,-871 2680,-852 2535,-852"/> +<text text-anchor="start" x="2584" y="-857.8" font-family="Times,serif" font-size="14.00">Month</text> +<polygon fill="green" stroke="transparent" points="2535,-831 2535,-850 2680,-850 2680,-831 2535,-831"/> +<text text-anchor="start" x="2586" y="-836.8" font-family="Times,serif" font-size="14.00">Pages</text> +<polygon fill="green" stroke="transparent" points="2535,-810 2535,-829 2680,-829 2680,-810 2535,-810"/> +<text text-anchor="start" x="2566" y="-815.8" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<polygon fill="green" stroke="transparent" points="2535,-789 2535,-808 2680,-808 2680,-789 2535,-789"/> +<text text-anchor="start" x="2591" y="-794.8" font-family="Times,serif" font-size="14.00">Title</text> +<polygon fill="green" stroke="transparent" points="2535,-768 2535,-787 2680,-787 2680,-768 2535,-768"/> +<text text-anchor="start" x="2581" y="-773.8" font-family="Times,serif" font-size="14.00">Volume</text> +<polygon fill="green" stroke="transparent" points="2535,-747 2535,-766 2680,-766 2680,-747 2535,-747"/> +<text text-anchor="start" x="2591.5" y="-752.8" font-family="Times,serif" font-size="14.00">Year</text> +<text text-anchor="start" x="2600" y="-731.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="none" stroke="black" points="2531.5,-723.5 2531.5,-960.5 2682.5,-960.5 2682.5,-723.5 2531.5,-723.5"/> +</g> +<!-- PublishFreeze --> +<g id="node21" class="node"> 
+<title>PublishFreeze</title> +<polygon fill="white" stroke="transparent" points="3246.5,-1855 3246.5,-2071 3415.5,-2071 3415.5,-1855 3246.5,-1855"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3250,-2047 3250,-2068 3413,-2068 3413,-2047 3250,-2047"/> +<polygon fill="none" stroke="black" points="3250,-2047 3250,-2068 3413,-2068 3413,-2047 3250,-2047"/> +<text text-anchor="start" x="3253" y="-2053.8" font-family="Times,serif" font-size="14.00">PublishFreeze (6 KiB)</text> +<text text-anchor="start" x="3271" y="-2031.8" font-family="Times,serif" font-size="14.00">AuthorisedUsers</text> +<text text-anchor="start" x="3280" y="-2010.8" font-family="Times,serif" font-size="14.00">confidentiality</text> +<text text-anchor="start" x="3289.5" y="-1989.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="3296.5" y="-1968.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="3324" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3288.5" y="-1926.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="3310" y="-1905.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3309.5" y="-1884.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="3290.5" y="-1863.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<polygon fill="none" stroke="black" points="3246.5,-1855 3246.5,-2071 3415.5,-2071 3415.5,-1855 3246.5,-1855"/> +</g> +<!-- InbredSet --> +<g id="node28" class="node"> +<title>InbredSet</title> +<polygon fill="lightgrey" stroke="transparent" points="3781.5,-692 3781.5,-992 3928.5,-992 3928.5,-692 3781.5,-692"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3785,-968 3785,-989 3926,-989 3926,-968 3785,-968"/> +<polygon fill="none" stroke="black" points="3785,-968 3785,-989 3926,-989 3926,-968 3785,-968"/> 
+<text text-anchor="start" x="3788" y="-974.8" font-family="Times,serif" font-size="14.00">InbredSet (10 KiB)</text> +<text text-anchor="start" x="3810" y="-952.8" font-family="Times,serif" font-size="14.00">FamilyOrder</text> +<text text-anchor="start" x="3848" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3801.5" y="-910.8" font-family="Times,serif" font-size="14.00">InbredSetCode</text> +<text text-anchor="start" x="3812.5" y="-889.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="3798.5" y="-868.8" font-family="Times,serif" font-size="14.00">InbredSetName</text> +<text text-anchor="start" x="3789" y="-847.8" font-family="Times,serif" font-size="14.00">MappingMethodId</text> +<text text-anchor="start" x="3807" y="-826.8" font-family="Times,serif" font-size="14.00">MenuOrderId</text> +<polygon fill="green" stroke="transparent" points="3785,-800 3785,-819 3926,-819 3926,-800 3785,-800"/> +<text text-anchor="start" x="3834" y="-805.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="3833.5" y="-784.8" font-family="Times,serif" font-size="14.00">public</text> +<text text-anchor="start" x="3820.5" y="-763.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="green" stroke="transparent" points="3785,-737 3785,-756 3926,-756 3926,-737 3785,-737"/> +<text text-anchor="start" x="3831" y="-742.8" font-family="Times,serif" font-size="14.00">Family</text> +<polygon fill="green" stroke="transparent" points="3785,-716 3785,-735 3926,-735 3926,-716 3785,-716"/> +<text text-anchor="start" x="3820.5" y="-721.8" font-family="Times,serif" font-size="14.00">FullName</text> +<polygon fill="green" stroke="transparent" points="3785,-695 3785,-714 3926,-714 3926,-695 3785,-695"/> +<text text-anchor="start" x="3810.5" y="-700.8" font-family="Times,serif" font-size="14.00">GeneticType</text> +<polygon fill="none" stroke="black" 
points="3781.5,-692 3781.5,-992 3928.5,-992 3928.5,-692 3781.5,-692"/> +</g> +<!-- PublishFreeze->InbredSet --> +<g id="edge19" class="edge"> +<title>PublishFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M3414,-1930C3454.58,-1930 3409.48,-1229.81 3437,-1200 3485.84,-1147.1 3703.73,-1210.15 3759,-1164 3805.64,-1125.05 3830.2,-1064.45 3842.93,-1006.34"/> +<polygon fill="black" stroke="black" points="3846.42,-1006.79 3845.03,-996.28 3839.56,-1005.36 3846.42,-1006.79"/> +</g> +<!-- TissueProbeFreeze --> +<g id="node22" class="node"> +<title>TissueProbeFreeze</title> +<polygon fill="white" stroke="transparent" points="4631,-1865.5 4631,-2060.5 4837,-2060.5 4837,-1865.5 4631,-1865.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="4634,-2036 4634,-2057 4834,-2057 4834,-2036 4634,-2036"/> +<polygon fill="none" stroke="black" points="4634,-2036 4634,-2057 4834,-2057 4834,-2036 4634,-2036"/> +<text text-anchor="start" x="4637" y="-2042.8" font-family="Times,serif" font-size="14.00">TissueProbeFreeze (116 B)</text> +<text text-anchor="start" x="4710" y="-2020.8" font-family="Times,serif" font-size="14.00">ChipId</text> +<text text-anchor="start" x="4692" y="-1999.8" font-family="Times,serif" font-size="14.00">CreateTime</text> +<text text-anchor="start" x="4699" y="-1978.8" font-family="Times,serif" font-size="14.00">FullName</text> +<text text-anchor="start" x="4726.5" y="-1957.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4691" y="-1936.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4712.5" y="-1915.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="4693" y="-1894.8" font-family="Times,serif" font-size="14.00">ShortName</text> +<text text-anchor="start" x="4704.5" y="-1873.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="4631,-1865.5 4631,-2060.5 
4837,-2060.5 4837,-1865.5 4631,-1865.5"/> +</g> +<!-- TissueProbeFreeze->InbredSet --> +<g id="edge20" class="edge"> +<title>TissueProbeFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4633,-1940C4550.53,-1940 4633.54,-1259.07 4576,-1200 4521.75,-1144.31 4299.4,-1194.77 4228,-1164 4116.11,-1115.79 4013.14,-1021.68 3943.86,-947.77"/> +<polygon fill="black" stroke="black" points="3946.22,-945.17 3936.85,-940.23 3941.1,-949.94 3946.22,-945.17"/> +</g> +<!-- TissueProbeSetFreeze->TissueProbeFreeze --> +<g id="edge21" class="edge"> +<title>TissueProbeSetFreeze:TissueProbeFreezeId->TissueProbeFreeze</title> +<path fill="none" stroke="black" d="M4862,-3167C4862,-2762.54 4789.57,-2285.87 4753.68,-2074.48"/> +<polygon fill="black" stroke="black" points="4757.13,-2073.88 4752,-2064.61 4750.23,-2075.06 4757.13,-2073.88"/> +</g> +<!-- ProbeXRef --> +<g id="node24" class="node"> +<title>ProbeXRef</title> +<polygon fill="white" stroke="transparent" points="4805,-4842.5 4805,-4932.5 4969,-4932.5 4969,-4842.5 4805,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="4808,-4908.5 4808,-4929.5 4966,-4929.5 4966,-4908.5 4808,-4908.5"/> +<polygon fill="none" stroke="black" points="4808,-4908.5 4808,-4929.5 4966,-4929.5 4966,-4908.5 4808,-4908.5"/> +<text text-anchor="start" x="4811" y="-4915.3" font-family="Times,serif" font-size="14.00">ProbeXRef (229 MiB)</text> +<text text-anchor="start" x="4862.5" y="-4893.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="4834" y="-4872.3" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="4858.5" y="-4851.3" font-family="Times,serif" font-size="14.00">ProbeId</text> +<polygon fill="none" stroke="black" points="4805,-4842.5 4805,-4932.5 4969,-4932.5 4969,-4842.5 4805,-4842.5"/> +</g> +<!-- Probe --> +<g id="node41" class="node"> +<title>Probe</title> +<polygon fill="white" stroke="transparent" points="6860.5,-3186 
6860.5,-3402 6969.5,-3402 6969.5,-3186 6860.5,-3186"/> +<polygon fill="#ce1256" stroke="transparent" points="6864,-3378 6864,-3399 6967,-3399 6967,-3378 6864,-3378"/> +<polygon fill="none" stroke="black" points="6864,-3378 6864,-3399 6967,-3399 6967,-3378 6864,-3378"/> +<text text-anchor="start" x="6867" y="-3384.8" font-family="Times,serif" font-size="14.00">Probe (2 GiB)</text> +<text text-anchor="start" x="6891" y="-3362.8" font-family="Times,serif" font-size="14.00">E_GSB</text> +<text text-anchor="start" x="6890.5" y="-3341.8" font-family="Times,serif" font-size="14.00">E_NSB</text> +<text text-anchor="start" x="6887" y="-3320.8" font-family="Times,serif" font-size="14.00">ExonNo</text> +<text text-anchor="start" x="6908" y="-3299.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6894" y="-3278.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="6875" y="-3257.8" font-family="Times,serif" font-size="14.00">ProbeSetId</text> +<text text-anchor="start" x="6880.5" y="-3236.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<text text-anchor="start" x="6873" y="-3215.8" font-family="Times,serif" font-size="14.00">SerialOrder</text> +<text text-anchor="start" x="6904" y="-3194.8" font-family="Times,serif" font-size="14.00">Tm</text> +<polygon fill="none" stroke="black" points="6860.5,-3186 6860.5,-3402 6969.5,-3402 6969.5,-3186 6860.5,-3186"/> +</g> +<!-- ProbeXRef->Probe --> +<g id="edge23" class="edge"> +<title>ProbeXRef:ProbeId->Probe</title> +<path fill="none" stroke="black" d="M4967,-4854.5C5534.68,-4854.5 5262.79,-4114.96 5771,-3862 5877.2,-3809.14 6749.63,-3905.13 6838,-3826 6950.47,-3725.29 6951.4,-3539.28 6936.93,-3416.33"/> +<polygon fill="black" stroke="black" points="6940.37,-3415.61 6935.68,-3406.11 6933.42,-3416.47 6940.37,-3415.61"/> +</g> +<!-- ProbeXRef->ProbeFreeze --> +<g id="edge22" class="edge"> +<title>ProbeXRef:ProbeFreezeId->ProbeFreeze</title> +<path 
fill="none" stroke="black" d="M4807,-4875.5C3968.98,-4875.5 3960.35,-4248.91 3217,-3862 3179.88,-3842.68 3157.46,-3857.58 3130,-3826 2809.52,-3457.41 3148.75,-3152.22 2855,-2762 2836.07,-2736.85 2811.36,-2752.26 2794,-2726 2665.13,-2531.04 2665.79,-2246.15 2679.06,-2085.66"/> +<polygon fill="black" stroke="black" points="2682.59,-2085.53 2679.95,-2075.27 2675.61,-2084.93 2682.59,-2085.53"/> +</g> +<!-- Publication_Test --> +<g id="node25" class="node"> +<title>Publication_Test</title> +<polygon fill="white" stroke="transparent" points="8610.5,-4769 8610.5,-5006 8797.5,-5006 8797.5,-4769 8610.5,-4769"/> +<polygon fill="#df65b0" stroke="transparent" points="8614,-4981.5 8614,-5002.5 8795,-5002.5 8795,-4981.5 8614,-4981.5"/> +<polygon fill="none" stroke="black" points="8614,-4981.5 8614,-5002.5 8795,-5002.5 8795,-4981.5 8614,-4981.5"/> +<text text-anchor="start" x="8617" y="-4988.3" font-family="Times,serif" font-size="14.00">Publication_Test (7 MiB)</text> +<text text-anchor="start" x="8674" y="-4966.3" font-family="Times,serif" font-size="14.00">Abstract</text> +<text text-anchor="start" x="8676" y="-4945.3" font-family="Times,serif" font-size="14.00">Authors</text> +<text text-anchor="start" x="8697" y="-4924.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="8678.5" y="-4903.3" font-family="Times,serif" font-size="14.00">Journal</text> +<text text-anchor="start" x="8681" y="-4882.3" font-family="Times,serif" font-size="14.00">Month</text> +<text text-anchor="start" x="8683" y="-4861.3" font-family="Times,serif" font-size="14.00">Pages</text> +<text text-anchor="start" x="8663" y="-4840.3" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<text text-anchor="start" x="8688" y="-4819.3" font-family="Times,serif" font-size="14.00">Title</text> +<text text-anchor="start" x="8678" y="-4798.3" font-family="Times,serif" font-size="14.00">Volume</text> +<text text-anchor="start" x="8688.5" y="-4777.3" 
font-family="Times,serif" font-size="14.00">Year</text> +<polygon fill="none" stroke="black" points="8610.5,-4769 8610.5,-5006 8797.5,-5006 8797.5,-4769 8610.5,-4769"/> +</g> +<!-- DBList --> +<g id="node26" class="node"> +<title>DBList</title> +<polygon fill="white" stroke="transparent" points="8301,-4821.5 8301,-4953.5 8425,-4953.5 8425,-4821.5 8301,-4821.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="8304,-4929.5 8304,-4950.5 8422,-4950.5 8422,-4929.5 8304,-4929.5"/> +<polygon fill="none" stroke="black" points="8304,-4929.5 8304,-4950.5 8422,-4950.5 8422,-4929.5 8304,-4929.5"/> +<text text-anchor="start" x="8307" y="-4936.3" font-family="Times,serif" font-size="14.00">DBList (99 KiB)</text> +<text text-anchor="start" x="8344.5" y="-4914.3" font-family="Times,serif" font-size="14.00">Code</text> +<text text-anchor="start" x="8327.5" y="-4893.3" font-family="Times,serif" font-size="14.00">DBTypeId</text> +<text text-anchor="start" x="8331" y="-4872.3" font-family="Times,serif" font-size="14.00">FreezeId</text> +<text text-anchor="start" x="8355.5" y="-4851.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="8341.5" y="-4830.3" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="8301,-4821.5 8301,-4953.5 8425,-4953.5 8425,-4821.5 8301,-4821.5"/> +</g> +<!-- DBList->DBType --> +<g id="edge24" class="edge"> +<title>DBList:DBTypeId->DBType</title> +<path fill="none" stroke="black" d="M8423,-4897.5C8462.94,-4897.5 8383.01,-3608.94 8366.07,-3342.76"/> +<polygon fill="black" stroke="black" points="8369.55,-3342.4 8365.42,-3332.64 8362.57,-3342.84 8369.55,-3342.4"/> +</g> +<!-- H2 --> +<g id="node27" class="node"> +<title>H2</title> +<polygon fill="white" stroke="transparent" points="8831.5,-4832 8831.5,-4943 8922.5,-4943 8922.5,-4832 8831.5,-4832"/> +<polygon fill="#df65b0" stroke="transparent" points="8835,-4918.5 8835,-4939.5 8920,-4939.5 8920,-4918.5 8835,-4918.5"/> 
+<polygon fill="none" stroke="black" points="8835,-4918.5 8835,-4939.5 8920,-4939.5 8920,-4918.5 8835,-4918.5"/> +<text text-anchor="start" x="8838" y="-4925.3" font-family="Times,serif" font-size="14.00">H2 (2 MiB)</text> +<text text-anchor="start" x="8853" y="-4903.3" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="8856.5" y="-4882.3" font-family="Times,serif" font-size="14.00">H2SE</text> +<text text-anchor="start" x="8856" y="-4861.3" font-family="Times,serif" font-size="14.00">HPH2</text> +<text text-anchor="start" x="8859" y="-4840.3" font-family="Times,serif" font-size="14.00">ICH2</text> +<polygon fill="none" stroke="black" points="8831.5,-4832 8831.5,-4943 8922.5,-4943 8922.5,-4832 8831.5,-4832"/> +</g> +<!-- InbredSet->Species --> +<g id="edge25" class="edge"> +<title>InbredSet:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M3784,-767C3728.83,-767 3795.51,-561.36 3759,-520 3641.66,-387.09 3085.79,-325.05 2876.21,-306.09"/> +<polygon fill="black" stroke="black" points="2876.47,-302.6 2866.2,-305.19 2875.85,-309.57 2876.47,-302.6"/> +</g> +<!-- DatasetMapInvestigator --> +<g id="node29" class="node"> +<title>DatasetMapInvestigator</title> +<polygon fill="white" stroke="transparent" points="8,-1918 8,-2008 258,-2008 258,-1918 8,-1918"/> +<polygon fill="#d7b5d8" stroke="transparent" points="11,-1984 11,-2005 255,-2005 255,-1984 11,-1984"/> +<polygon fill="none" stroke="black" points="11,-1984 11,-2005 255,-2005 255,-1984 11,-1984"/> +<text text-anchor="start" x="14" y="-1990.8" font-family="Times,serif" font-size="14.00">DatasetMapInvestigator (28 KiB)</text> +<text text-anchor="start" x="98" y="-1968.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<text text-anchor="start" x="125.5" y="-1947.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="82" y="-1926.8" font-family="Times,serif" font-size="14.00">InvestigatorId</text> +<polygon fill="none" 
stroke="black" points="8,-1918 8,-2008 258,-2008 258,-1918 8,-1918"/> +</g> +<!-- DatasetMapInvestigator->Datasets --> +<g id="edge26" class="edge"> +<title>DatasetMapInvestigator:DatasetId->Datasets</title> +<path fill="none" stroke="black" d="M256,-1973C277.48,-1973 271.49,-1221.19 275,-1200 283.9,-1146.31 298.97,-1089.52 315.22,-1037.42"/> +<polygon fill="black" stroke="black" points="318.6,-1038.33 318.27,-1027.74 311.93,-1036.23 318.6,-1038.33"/> +</g> +<!-- DatasetMapInvestigator->Investigators --> +<g id="edge27" class="edge"> +<title>DatasetMapInvestigator:InvestigatorId->Investigators</title> +<path fill="none" stroke="black" d="M133,-1920C133,-1405.22 153.42,-798.72 165.08,-494.41"/> +<polygon fill="black" stroke="black" points="168.59,-494.29 165.48,-484.16 161.59,-494.02 168.59,-494.29"/> +</g> +<!-- Docs --> +<g id="node30" class="node"> +<title>Docs</title> +<polygon fill="white" stroke="transparent" points="8956.5,-4832 8956.5,-4943 9075.5,-4943 9075.5,-4832 8956.5,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="8960,-4918.5 8960,-4939.5 9073,-4939.5 9073,-4918.5 8960,-4918.5"/> +<polygon fill="none" stroke="black" points="8960,-4918.5 8960,-4939.5 9073,-4939.5 9073,-4918.5 8960,-4918.5"/> +<text text-anchor="start" x="8963" y="-4925.3" font-family="Times,serif" font-size="14.00">Docs (148 KiB)</text> +<text text-anchor="start" x="8989" y="-4903.3" font-family="Times,serif" font-size="14.00">content</text> +<text text-anchor="start" x="8997" y="-4882.3" font-family="Times,serif" font-size="14.00">entry</text> +<text text-anchor="start" x="9009.5" y="-4861.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="9001.5" y="-4840.3" font-family="Times,serif" font-size="14.00">title</text> +<polygon fill="none" stroke="black" points="8956.5,-4832 8956.5,-4943 9075.5,-4943 9075.5,-4832 8956.5,-4832"/> +</g> +<!-- Phenotype --> +<g id="node31" class="node"> +<title>Phenotype</title> +<polygon fill="lightgrey" 
stroke="transparent" points="2910,-713 2910,-971 3134,-971 3134,-713 2910,-713"/> +<polygon fill="#df65b0" stroke="transparent" points="2913,-947 2913,-968 3131,-968 3131,-947 2913,-947"/> +<polygon fill="none" stroke="black" points="2913,-947 2913,-968 3131,-968 3131,-947 2913,-947"/> +<text text-anchor="start" x="2955" y="-953.8" font-family="Times,serif" font-size="14.00">Phenotype (9 MiB)</text> +<text text-anchor="start" x="3014.5" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2915" y="-910.8" font-family="Times,serif" font-size="14.00">Post_publication_abbreviation</text> +<text text-anchor="start" x="2918" y="-889.8" font-family="Times,serif" font-size="14.00">Pre_publication_abbreviation</text> +<polygon fill="green" stroke="transparent" points="2913,-863 2913,-882 3131,-882 3131,-863 2913,-863"/> +<text text-anchor="start" x="2958.5" y="-868.8" font-family="Times,serif" font-size="14.00">Authorized_Users</text> +<polygon fill="green" stroke="transparent" points="2913,-842 2913,-861 3131,-861 3131,-842 2913,-842"/> +<text text-anchor="start" x="2988.5" y="-847.8" font-family="Times,serif" font-size="14.00">Lab_code</text> +<polygon fill="green" stroke="transparent" points="2913,-821 2913,-840 3131,-840 3131,-821 2913,-821"/> +<text text-anchor="start" x="2949.5" y="-826.8" font-family="Times,serif" font-size="14.00">Original_description</text> +<polygon fill="green" stroke="transparent" points="2913,-800 2913,-819 3131,-819 3131,-800 2913,-800"/> +<text text-anchor="start" x="2998" y="-805.8" font-family="Times,serif" font-size="14.00">Owner</text> +<polygon fill="green" stroke="transparent" points="2913,-779 2913,-798 3131,-798 3131,-779 2913,-779"/> +<text text-anchor="start" x="2919.5" y="-784.8" font-family="Times,serif" font-size="14.00">Post_publication_description</text> +<polygon fill="green" stroke="transparent" points="2913,-758 2913,-777 3131,-777 3131,-758 2913,-758"/> +<text text-anchor="start" 
x="2922.5" y="-763.8" font-family="Times,serif" font-size="14.00">Pre_publication_description</text> +<polygon fill="green" stroke="transparent" points="2913,-737 2913,-756 3131,-756 3131,-737 2913,-737"/> +<text text-anchor="start" x="2985.5" y="-742.8" font-family="Times,serif" font-size="14.00">Submitter</text> +<polygon fill="green" stroke="transparent" points="2913,-716 2913,-735 3131,-735 3131,-716 2913,-716"/> +<text text-anchor="start" x="3002" y="-721.8" font-family="Times,serif" font-size="14.00">Units</text> +<polygon fill="none" stroke="black" points="2910,-713 2910,-971 3134,-971 3134,-713 2910,-713"/> +</g> +<!-- SnpPattern --> +<g id="node32" class="node"> +<title>SnpPattern</title> +<polygon fill="white" stroke="transparent" points="9110,-3866 9110,-5909 9294,-5909 9294,-3866 9110,-3866"/> +<polygon fill="#ce1256" stroke="transparent" points="9113,-5884.5 9113,-5905.5 9291,-5905.5 9291,-5884.5 9113,-5884.5"/> +<polygon fill="none" stroke="black" points="9113,-5884.5 9113,-5905.5 9291,-5905.5 9291,-5884.5 9113,-5884.5"/> +<text text-anchor="start" x="9134" y="-5891.3" font-family="Times,serif" font-size="14.00">SnpPattern (8 GiB)</text> +<text text-anchor="start" x="9150.5" y="-5869.3" font-family="Times,serif" font-size="14.00">129P2/OlaHsd</text> +<text text-anchor="start" x="9155.5" y="-5848.3" font-family="Times,serif" font-size="14.00">129S1/SvImJ</text> +<text text-anchor="start" x="9153.5" y="-5827.3" font-family="Times,serif" font-size="14.00">129S2/SvHsd</text> +<text text-anchor="start" x="9156.5" y="-5806.3" font-family="Times,serif" font-size="14.00">129S4/SvJae</text> +<text text-anchor="start" x="9145" y="-5785.3" font-family="Times,serif" font-size="14.00">129S5/SvEvBrd</text> +<text text-anchor="start" x="9158" y="-5764.3" font-family="Times,serif" font-size="14.00">129S6/SvEv</text> +<text text-anchor="start" x="9149.5" y="-5743.3" font-family="Times,serif" font-size="14.00">129T2/SvEmsJ</text> +<text text-anchor="start" x="9165" 
y="-5722.3" font-family="Times,serif" font-size="14.00">129X1/SvJ</text> +<text text-anchor="start" x="9192" y="-5701.3" font-family="Times,serif" font-size="14.00">A/J</text> +<text text-anchor="start" x="9181.5" y="-5680.3" font-family="Times,serif" font-size="14.00">AKR/J</text> +<text text-anchor="start" x="9115" y="-5659.3" font-family="Times,serif" font-size="14.00">B6A6_Esline_Regeneron</text> +<text text-anchor="start" x="9164" y="-5638.3" font-family="Times,serif" font-size="14.00">BALB/cByJ</text> +<text text-anchor="start" x="9173" y="-5617.3" font-family="Times,serif" font-size="14.00">BALB/cJ</text> +<text text-anchor="start" x="9176" y="-5596.3" font-family="Times,serif" font-size="14.00">BPH/2J</text> +<text text-anchor="start" x="9177.5" y="-5575.3" font-family="Times,serif" font-size="14.00">BPL/1J</text> +<text text-anchor="start" x="9176" y="-5554.3" font-family="Times,serif" font-size="14.00">BPN/3J</text> +<text text-anchor="start" x="9148.5" y="-5533.3" font-family="Times,serif" font-size="14.00">BTBRT&lt;+&gt;tf/J</text> +<text text-anchor="start" x="9170.5" y="-5512.3" font-family="Times,serif" font-size="14.00">BUB/BnJ</text> +<text text-anchor="start" x="9135.5" y="-5491.3" font-family="Times,serif" font-size="14.00">C2T1_Esline_Nagy</text> +<text text-anchor="start" x="9171" y="-5470.3" font-family="Times,serif" font-size="14.00">C3H/HeJ</text> +<text text-anchor="start" x="9163" y="-5449.3" font-family="Times,serif" font-size="14.00">C3HeB/FeJ</text> +<text text-anchor="start" x="9164" y="-5428.3" font-family="Times,serif" font-size="14.00">C57BL/10J</text> +<text text-anchor="start" x="9159" y="-5407.3" font-family="Times,serif" font-size="14.00">C57BL/6ByJ</text> +<text text-anchor="start" x="9168.5" y="-5386.3" font-family="Times,serif" font-size="14.00">C57BL/6J</text> +<text text-anchor="start" x="9140" y="-5365.3" font-family="Times,serif" font-size="14.00">C57BL/6JBomTac</text> +<text text-anchor="start" x="9157.5" y="-5344.3"
font-family="Times,serif" font-size="14.00">C57BL/6JCrl</text> +<text text-anchor="start" x="9142" y="-5323.3" font-family="Times,serif" font-size="14.00">C57BL/6JOlaHsd</text> +<text text-anchor="start" x="9154" y="-5302.3" font-family="Times,serif" font-size="14.00">C57BL/6NCrl</text> +<text text-anchor="start" x="9150.5" y="-5281.3" font-family="Times,serif" font-size="14.00">C57BL/6NHsd</text> +<text text-anchor="start" x="9162.5" y="-5260.3" font-family="Times,serif" font-size="14.00">C57BL/6NJ</text> +<text text-anchor="start" x="9150.5" y="-5239.3" font-family="Times,serif" font-size="14.00">C57BL/6NNIH</text> +<text text-anchor="start" x="9153" y="-5218.3" font-family="Times,serif" font-size="14.00">C57BL/6NTac</text> +<text text-anchor="start" x="9162.5" y="-5197.3" font-family="Times,serif" font-size="14.00">C57BLKS/J</text> +<text text-anchor="start" x="9164" y="-5176.3" font-family="Times,serif" font-size="14.00">C57BR/cdJ</text> +<text text-anchor="start" x="9178" y="-5155.3" font-family="Times,serif" font-size="14.00">C57L/J</text> +<text text-anchor="start" x="9182.5" y="-5134.3" font-family="Times,serif" font-size="14.00">C58/J</text> +<text text-anchor="start" x="9167.5" y="-5113.3" font-family="Times,serif" font-size="14.00">CALB/RkJ</text> +<text text-anchor="start" x="9170" y="-5092.3" font-family="Times,serif" font-size="14.00">CAST/EiJ</text> +<text text-anchor="start" x="9181.5" y="-5071.3" font-family="Times,serif" font-size="14.00">CBA/J</text> +<text text-anchor="start" x="9186.5" y="-5050.3" font-family="Times,serif" font-size="14.00">CE/J</text> +<text text-anchor="start" x="9157.5" y="-5029.3" font-family="Times,serif" font-size="14.00">CZECHII/EiJ</text> +<text text-anchor="start" x="9176.5" y="-5008.3" font-family="Times,serif" font-size="14.00">DBA/1J</text> +<text text-anchor="start" x="9176.5" y="-4987.3" font-family="Times,serif" font-size="14.00">DBA/2J</text> +<text text-anchor="start" x="9170.5" y="-4966.3" 
font-family="Times,serif" font-size="14.00">DDK/Pas</text> +<text text-anchor="start" x="9135.5" y="-4945.3" font-family="Times,serif" font-size="14.00">DDY/JclSidSeyFrkJ</text> +<text text-anchor="start" x="9148.5" y="-4924.3" font-family="Times,serif" font-size="14.00">EL/SuzSeyFrkJ</text> +<text text-anchor="start" x="9183.5" y="-4903.3" font-family="Times,serif" font-size="14.00">Fline</text> +<text text-anchor="start" x="9176" y="-4882.3" font-family="Times,serif" font-size="14.00">FVB/NJ</text> +<text text-anchor="start" x="9154" y="-4861.3" font-family="Times,serif" font-size="14.00">HTG/GoSfSnJ</text> +<text text-anchor="start" x="9185" y="-4840.3" font-family="Times,serif" font-size="14.00">I/LnJ</text> +<text text-anchor="start" x="9162.5" y="-4819.3" font-family="Times,serif" font-size="14.00">ILS/IbgTejJ</text> +<text text-anchor="start" x="9164" y="-4798.3" font-family="Times,serif" font-size="14.00">IS/CamRkJ</text> +<text text-anchor="start" x="9162.5" y="-4777.3" font-family="Times,serif" font-size="14.00">ISS/IbgTejJ</text> +<text text-anchor="start" x="9176.5" y="-4756.3" font-family="Times,serif" font-size="14.00">JF1/Ms</text> +<text text-anchor="start" x="9178" y="-4735.3" font-family="Times,serif" font-size="14.00">KK/HlJ</text> +<text text-anchor="start" x="9162.5" y="-4714.3" font-family="Times,serif" font-size="14.00">LEWES/EiJ</text> +<text text-anchor="start" x="9186.5" y="-4693.3" font-family="Times,serif" font-size="14.00">LG/J</text> +<text text-anchor="start" x="9184" y="-4672.3" font-family="Times,serif" font-size="14.00">Lline</text> +<text text-anchor="start" x="9187.5" y="-4651.3" font-family="Times,serif" font-size="14.00">LP/J</text> +<text text-anchor="start" x="9173.5" y="-4630.3" font-family="Times,serif" font-size="14.00">MA/MyJ</text> +<text text-anchor="start" x="9172.5" y="-4609.3" font-family="Times,serif" font-size="14.00">MAI/Pas</text> +<text text-anchor="start" x="9167" y="-4588.3" font-family="Times,serif" 
font-size="14.00">MOLF/EiJ</text> +<text text-anchor="start" x="9164" y="-4567.3" font-family="Times,serif" font-size="14.00">MOLG/DnJ</text> +<text text-anchor="start" x="9168.5" y="-4546.3" font-family="Times,serif" font-size="14.00">MRL/MpJ</text> +<text text-anchor="start" x="9169.5" y="-4525.3" font-family="Times,serif" font-size="14.00">MSM/Ms</text> +<text text-anchor="start" x="9160.5" y="-4504.3" font-family="Times,serif" font-size="14.00">NOD/ShiLtJ</text> +<text text-anchor="start" x="9171.5" y="-4483.3" font-family="Times,serif" font-size="14.00">NON/LtJ</text> +<text text-anchor="start" x="9172.5" y="-4462.3" font-family="Times,serif" font-size="14.00">NOR/LtJ</text> +<text text-anchor="start" x="9167" y="-4441.3" font-family="Times,serif" font-size="14.00">NZB/BlNJ</text> +<text text-anchor="start" x="9174" y="-4420.3" font-family="Times,serif" font-size="14.00">NZL/LtJ</text> +<text text-anchor="start" x="9164.5" y="-4399.3" font-family="Times,serif" font-size="14.00">NZO/HlLtJ</text> +<text text-anchor="start" x="9166.5" y="-4378.3" font-family="Times,serif" font-size="14.00">NZW/LacJ</text> +<text text-anchor="start" x="9187" y="-4357.3" font-family="Times,serif" font-size="14.00">O20</text> +<text text-anchor="start" x="9192" y="-4336.3" font-family="Times,serif" font-size="14.00">P/J</text> +<text text-anchor="start" x="9169" y="-4315.3" font-family="Times,serif" font-size="14.00">PERA/EiJ</text> +<text text-anchor="start" x="9168.5" y="-4294.3" font-family="Times,serif" font-size="14.00">PERC/EiJ</text> +<text text-anchor="start" x="9187.5" y="-4273.3" font-family="Times,serif" font-size="14.00">PL/J</text> +<text text-anchor="start" x="9170" y="-4252.3" font-family="Times,serif" font-size="14.00">PWD/PhJ</text> +<text text-anchor="start" x="9170" y="-4231.3" font-family="Times,serif" font-size="14.00">PWK/PhJ</text> +<text text-anchor="start" x="9185.5" y="-4210.3" font-family="Times,serif" font-size="14.00">Qsi5</text> +<text 
text-anchor="start" x="9171.5" y="-4189.3" font-family="Times,serif" font-size="14.00">RBA/DnJ</text> +<text text-anchor="start" x="9186.5" y="-4168.3" font-family="Times,serif" font-size="14.00">RF/J</text> +<text text-anchor="start" x="9179" y="-4147.3" font-family="Times,serif" font-size="14.00">RIIIS/J</text> +<text text-anchor="start" x="9171.5" y="-4126.3" font-family="Times,serif" font-size="14.00">SEA/GnJ</text> +<text text-anchor="start" x="9171.5" y="-4105.3" font-family="Times,serif" font-size="14.00">SEG/Pas</text> +<text text-anchor="start" x="9185" y="-4084.3" font-family="Times,serif" font-size="14.00">SJL/J</text> +<text text-anchor="start" x="9166.5" y="-4063.3" font-family="Times,serif" font-size="14.00">SKIVE/EiJ</text> +<text text-anchor="start" x="9185" y="-4042.3" font-family="Times,serif" font-size="14.00">SM/J</text> +<text text-anchor="start" x="9180.5" y="-4021.3" font-family="Times,serif" font-size="14.00">SnpId</text> +<text text-anchor="start" x="9168.5" y="-4000.3" font-family="Times,serif" font-size="14.00">SOD1/EiJ</text> +<text text-anchor="start" x="9164.5" y="-3979.3" font-family="Times,serif" font-size="14.00">SPRET/EiJ</text> +<text text-anchor="start" x="9183" y="-3958.3" font-family="Times,serif" font-size="14.00">ST/bJ</text> +<text text-anchor="start" x="9179.5" y="-3937.3" font-family="Times,serif" font-size="14.00">SWR/J</text> +<text text-anchor="start" x="9151.5" y="-3916.3" font-family="Times,serif" font-size="14.00">TALLYHO/JngJ</text> +<text text-anchor="start" x="9172" y="-3895.3" font-family="Times,serif" font-size="14.00">WSB/EiJ</text> +<text text-anchor="start" x="9153" y="-3874.3" font-family="Times,serif" font-size="14.00">ZALENDE/EiJ</text> +<polygon fill="none" stroke="black" points="9110,-3866 9110,-5909 9294,-5909 9294,-3866 9110,-3866"/> +</g> +<!-- AccessLog --> +<g id="node34" class="node"> +<title>AccessLog</title> +<polygon fill="white" stroke="transparent" points="9328,-4842.5 9328,-4932.5 
9482,-4932.5 9482,-4842.5 9328,-4842.5"/> +<polygon fill="#df65b0" stroke="transparent" points="9331,-4908.5 9331,-4929.5 9479,-4929.5 9479,-4908.5 9331,-4908.5"/> +<polygon fill="none" stroke="black" points="9331,-4908.5 9331,-4929.5 9479,-4929.5 9479,-4908.5 9331,-4908.5"/> +<text text-anchor="start" x="9334" y="-4915.3" font-family="Times,serif" font-size="14.00">AccessLog (46 MiB)</text> +<text text-anchor="start" x="9365.5" y="-4893.3" font-family="Times,serif" font-size="14.00">accesstime</text> +<text text-anchor="start" x="9398" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="9366.5" y="-4851.3" font-family="Times,serif" font-size="14.00">ip_address</text> +<polygon fill="none" stroke="black" points="9328,-4842.5 9328,-4932.5 9482,-4932.5 9482,-4842.5 9328,-4842.5"/> +</g> +<!-- GeneRIF --> +<g id="node35" class="node"> +<title>GeneRIF</title> +<polygon fill="white" stroke="transparent" points="3576.5,-692 3576.5,-992 3709.5,-992 3709.5,-692 3576.5,-692"/> +<polygon fill="#df65b0" stroke="transparent" points="3580,-968 3580,-989 3707,-989 3707,-968 3580,-968"/> +<polygon fill="none" stroke="black" points="3580,-968 3580,-989 3707,-989 3707,-968 3580,-968"/> +<text text-anchor="start" x="3583" y="-974.8" font-family="Times,serif" font-size="14.00">GeneRIF (2 MiB)</text> +<text text-anchor="start" x="3610" y="-952.8" font-family="Times,serif" font-size="14.00">comment</text> +<text text-anchor="start" x="3604.5" y="-931.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" x="3617.5" y="-910.8" font-family="Times,serif" font-size="14.00">display</text> +<text text-anchor="start" x="3623.5" y="-889.8" font-family="Times,serif" font-size="14.00">email</text> +<text text-anchor="start" x="3636" y="-868.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3622.5" y="-847.8" font-family="Times,serif" font-size="14.00">initial</text> +<text 
text-anchor="start" x="3602" y="-826.8" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<text text-anchor="start" x="3619" y="-805.8" font-family="Times,serif" font-size="14.00">reason</text> +<text text-anchor="start" x="3608.5" y="-784.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="3617.5" y="-763.8" font-family="Times,serif" font-size="14.00">symbol</text> +<text text-anchor="start" x="3617.5" y="-742.8" font-family="Times,serif" font-size="14.00">user_ip</text> +<text text-anchor="start" x="3610" y="-721.8" font-family="Times,serif" font-size="14.00">versionId</text> +<text text-anchor="start" x="3618.5" y="-700.8" font-family="Times,serif" font-size="14.00">weburl</text> +<polygon fill="none" stroke="black" points="3576.5,-692 3576.5,-992 3709.5,-992 3709.5,-692 3576.5,-692"/> +</g> +<!-- GeneRIF->Species --> +<g id="edge28" class="edge"> +<title>GeneRIF:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M3579,-788C3549.14,-788 3577.82,-543.18 3559,-520 3471.93,-412.76 3053.77,-338.32 2876.12,-311.02"/> +<polygon fill="black" stroke="black" points="2876.46,-307.54 2866.05,-309.49 2875.41,-314.46 2876.46,-307.54"/> +</g> +<!-- ProbeData --> +<g id="node36" class="node"> +<title>ProbeData</title> +<polygon fill="white" stroke="transparent" points="5291,-1918 5291,-2008 5443,-2008 5443,-1918 5291,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="5294,-1984 5294,-2005 5440,-2005 5440,-1984 5294,-1984"/> +<polygon fill="none" stroke="black" points="5294,-1984 5294,-2005 5440,-2005 5440,-1984 5294,-1984"/> +<text text-anchor="start" x="5297" y="-1990.8" font-family="Times,serif" font-size="14.00">ProbeData (10 GiB)</text> +<text text-anchor="start" x="5359.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="5337.5" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="5347.5" y="-1926.8" 
font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="5291,-1918 5291,-2008 5443,-2008 5443,-1918 5291,-1918"/> +</g> +<!-- ProbeData->Strain --> +<g id="edge29" class="edge"> +<title>ProbeData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5441,-1951C5461.87,-1951 5451.21,-1219.36 5459,-1200 5511.05,-1070.73 5632.85,-959.15 5712.21,-896.58"/> +<polygon fill="black" stroke="black" points="5714.51,-899.22 5720.23,-890.3 5710.2,-893.71 5714.51,-899.22"/> +</g> +<!-- AvgMethod --> +<g id="node37" class="node"> +<title>AvgMethod</title> +<polygon fill="lightgrey" stroke="transparent" points="982.5,-786.5 982.5,-897.5 1133.5,-897.5 1133.5,-786.5 982.5,-786.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="986,-873 986,-894 1131,-894 1131,-873 986,-873"/> +<polygon fill="none" stroke="black" points="986,-873 986,-894 1131,-894 1131,-873 986,-873"/> +<text text-anchor="start" x="989" y="-879.8" font-family="Times,serif" font-size="14.00">AvgMethod (792 B)</text> +<text text-anchor="start" x="1010" y="-857.8" font-family="Times,serif" font-size="14.00">AvgMethodId</text> +<text text-anchor="start" x="1051" y="-836.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="986,-810 986,-829 1131,-829 1131,-810 986,-810"/> +<text text-anchor="start" x="1037" y="-815.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="1007.5" y="-794.8" font-family="Times,serif" font-size="14.00">Normalization</text> +<polygon fill="none" stroke="black" points="982.5,-786.5 982.5,-897.5 1133.5,-897.5 1133.5,-786.5 982.5,-786.5"/> +</g> +<!-- GeneRIFXRef --> +<g id="node38" class="node"> +<title>GeneRIFXRef</title> +<polygon fill="white" stroke="transparent" points="3003,-1918 3003,-2008 3175,-2008 3175,-1918 3003,-1918"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3006,-1984 3006,-2005 3172,-2005 3172,-1984 3006,-1984"/> 
+<polygon fill="none" stroke="black" points="3006,-1984 3006,-2005 3172,-2005 3172,-1984 3006,-1984"/> +<text text-anchor="start" x="3009" y="-1990.8" font-family="Times,serif" font-size="14.00">GeneRIFXRef (82 KiB)</text> +<text text-anchor="start" x="3030.5" y="-1968.8" font-family="Times,serif" font-size="14.00">GeneCategoryId</text> +<text text-anchor="start" x="3050.5" y="-1947.8" font-family="Times,serif" font-size="14.00">GeneRIFId</text> +<text text-anchor="start" x="3055.5" y="-1926.8" font-family="Times,serif" font-size="14.00">versionId</text> +<polygon fill="none" stroke="black" points="3003,-1918 3003,-2008 3175,-2008 3175,-1918 3003,-1918"/> +</g> +<!-- GeneRIFXRef->GeneRIF --> +<g id="edge31" class="edge"> +<title>GeneRIFXRef:GeneRIFId->GeneRIF</title> +<path fill="none" stroke="black" d="M3173,-1951C3214.74,-1951 3168.49,-1230.49 3197,-1200 3252.21,-1140.95 3497.53,-1216.51 3559,-1164 3604.75,-1124.91 3627.15,-1064.28 3637.64,-1006.19"/> +<polygon fill="black" stroke="black" points="3641.12,-1006.59 3639.34,-996.14 3634.22,-1005.42 3641.12,-1006.59"/> +</g> +<!-- GeneCategory --> +<g id="node73" class="node"> +<title>GeneCategory</title> +<polygon fill="white" stroke="transparent" points="3373.5,-807.5 3373.5,-876.5 3542.5,-876.5 3542.5,-807.5 3373.5,-807.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="3377,-852 3377,-873 3540,-873 3540,-852 3377,-852"/> +<polygon fill="none" stroke="black" points="3377,-852 3377,-873 3540,-873 3540,-852 3377,-852"/> +<text text-anchor="start" x="3380" y="-858.8" font-family="Times,serif" font-size="14.00">GeneCategory (5 KiB)</text> +<text text-anchor="start" x="3451" y="-836.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="3437" y="-815.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="3373.5,-807.5 3373.5,-876.5 3542.5,-876.5 3542.5,-807.5 3373.5,-807.5"/> +</g> +<!-- GeneRIFXRef->GeneCategory --> +<g 
id="edge30" class="edge"> +<title>GeneRIFXRef:GeneCategoryId->GeneCategory</title> +<path fill="none" stroke="black" d="M3173,-1973C3215.97,-1973 3169.76,-1233.22 3197,-1200 3241.84,-1145.31 3299.78,-1211.69 3352,-1164 3430.43,-1092.39 3450.94,-961.62 3456.23,-891.11"/> +<polygon fill="black" stroke="black" points="3459.75,-890.96 3456.93,-880.75 3452.77,-890.49 3459.75,-890.96"/> +</g> +<!-- CaseAttribute --> +<g id="node39" class="node"> +<title>CaseAttribute</title> +<polygon fill="lightgrey" stroke="transparent" points="1168,-797 1168,-887 1334,-887 1334,-797 1168,-797"/> +<polygon fill="#d7b5d8" stroke="transparent" points="1171,-863 1171,-884 1331,-884 1331,-863 1171,-863"/> +<polygon fill="none" stroke="black" points="1171,-863 1171,-884 1331,-884 1331,-863 1171,-863"/> +<text text-anchor="start" x="1174" y="-869.8" font-family="Times,serif" font-size="14.00">CaseAttribute (2 KiB)</text> +<polygon fill="green" stroke="transparent" points="1171,-842 1171,-861 1331,-861 1331,-842 1171,-842"/> +<text text-anchor="start" x="1209.5" y="-847.8" font-family="Times,serif" font-size="14.00">Description</text> +<polygon fill="green" stroke="transparent" points="1171,-821 1171,-840 1331,-840 1331,-821 1171,-821"/> +<text text-anchor="start" x="1243.5" y="-826.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="1171,-800 1171,-819 1331,-819 1331,-800 1171,-800"/> +<text text-anchor="start" x="1229.5" y="-805.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="1168,-797 1168,-887 1334,-887 1334,-797 1168,-797"/> +</g> +<!-- Strain->Species --> +<g id="edge32" class="edge"> +<title>Strain:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M5731,-777C5128.52,-777 4994.43,-618.17 4400,-520 3817.59,-423.81 3111.33,-337.05 2876.33,-308.98"/> +<polygon fill="black" stroke="black" points="2876.51,-305.48 2866.17,-307.77 2875.68,-312.43 2876.51,-305.48"/> 
+</g> +<!-- Probe->ProbeSE --> +<g id="edge33" class="edge"> +<title>Probe:ProbeSetId->ProbeSE</title> +<path fill="none" stroke="black" d="M6968,-3261C6999.5,-3261 7043.75,-2274.36 7054.55,-2022.15"/> +<polygon fill="black" stroke="black" points="7058.05,-2022.23 7054.98,-2012.09 7051.06,-2021.93 7058.05,-2022.23"/> +</g> +<!-- ProbeFreeze->InbredSet --> +<g id="edge34" class="edge"> +<title>ProbeFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2775,-1951C2816.74,-1951 2764.69,-1229.71 2794,-1200 2866.79,-1126.23 3641.27,-1223.68 3726,-1164 3778.21,-1127.22 3809.31,-1065.62 3827.82,-1006.16"/> +<polygon fill="black" stroke="black" points="3831.27,-1006.83 3830.79,-996.25 3824.56,-1004.82 3831.27,-1006.83"/> +</g> +<!-- ProbeFreeze->Tissue --> +<g id="edge35" class="edge"> +<title>ProbeFreeze:TissueId->Tissue</title> +<path fill="none" stroke="black" d="M2613,-1867C2575.92,-1867 2609.31,-1231.02 2589,-1200 2568.75,-1169.06 2537.32,-1192.7 2514,-1164 2463.47,-1101.8 2444.56,-1011.96 2437.81,-943.13"/> +<polygon fill="black" stroke="black" points="2441.29,-942.77 2436.9,-933.13 2434.32,-943.41 2441.29,-942.77"/> +</g> +<!-- BXDSnpPosition --> +<g id="node43" class="node"> +<title>BXDSnpPosition</title> +<polygon fill="white" stroke="transparent" points="5476.5,-1886.5 5476.5,-2039.5 5681.5,-2039.5 5681.5,-1886.5 5476.5,-1886.5"/> +<polygon fill="#df65b0" stroke="transparent" points="5480,-2015 5480,-2036 5679,-2036 5679,-2015 5480,-2015"/> +<polygon fill="none" stroke="black" points="5480,-2015 5480,-2036 5679,-2036 5679,-2015 5480,-2015"/> +<text text-anchor="start" x="5483" y="-2021.8" font-family="Times,serif" font-size="14.00">BXDSnpPosition (230 MiB)</text> +<text text-anchor="start" x="5566" y="-1999.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="5572.5" y="-1978.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="5567.5" y="-1957.8" font-family="Times,serif" 
font-size="14.00">Mb</text> +<text text-anchor="start" x="5546" y="-1936.8" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="5545.5" y="-1915.8" font-family="Times,serif" font-size="14.00">StrainId1</text> +<text text-anchor="start" x="5545.5" y="-1894.8" font-family="Times,serif" font-size="14.00">StrainId2</text> +<polygon fill="none" stroke="black" points="5476.5,-1886.5 5476.5,-2039.5 5681.5,-2039.5 5681.5,-1886.5 5476.5,-1886.5"/> +</g> +<!-- BXDSnpPosition->Strain --> +<g id="edge36" class="edge"> +<title>BXDSnpPosition:StrainId1->Strain</title> +<path fill="none" stroke="black" d="M5680,-1919C5699.98,-1919 5696.36,-1219.8 5699,-1200 5711.36,-1107.45 5738.02,-1004.03 5758.6,-932.42"/> +<polygon fill="black" stroke="black" points="5762.04,-933.11 5761.46,-922.54 5755.32,-931.17 5762.04,-933.11"/> +</g> +<!-- BXDSnpPosition->Strain --> +<g id="edge37" class="edge"> +<title>BXDSnpPosition:StrainId2->Strain</title> +<path fill="none" stroke="black" d="M5680,-1898C5699.4,-1898 5696.43,-1219.22 5699,-1200 5711.39,-1107.46 5738.05,-1004.03 5758.62,-932.43"/> +<polygon fill="black" stroke="black" points="5762.06,-933.12 5761.48,-922.54 5755.34,-931.17 5762.06,-933.12"/> +</g> +<!-- GeneRIF_BASIC --> +<g id="node44" class="node"> +<title>GeneRIF_BASIC</title> +<polygon fill="white" stroke="transparent" points="531.5,-744.5 531.5,-939.5 734.5,-939.5 734.5,-744.5 531.5,-744.5"/> +<polygon fill="#df65b0" stroke="transparent" points="535,-915 535,-936 732,-936 732,-915 535,-915"/> +<polygon fill="none" stroke="black" points="535,-915 535,-936 732,-936 732,-915 535,-915"/> +<text text-anchor="start" x="538" y="-921.8" font-family="Times,serif" font-size="14.00">GeneRIF_BASIC (275 MiB)</text> +<text text-anchor="start" x="600" y="-899.8" font-family="Times,serif" font-size="14.00">comment</text> +<text text-anchor="start" x="594.5" y="-878.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" 
x="607.5" y="-857.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="592" y="-836.8" font-family="Times,serif" font-size="14.00">PubMed_ID</text> +<text text-anchor="start" x="598.5" y="-815.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="607.5" y="-794.8" font-family="Times,serif" font-size="14.00">symbol</text> +<text text-anchor="start" x="612.5" y="-773.8" font-family="Times,serif" font-size="14.00">TaxID</text> +<text text-anchor="start" x="599.5" y="-752.8" font-family="Times,serif" font-size="14.00">VersionId</text> +<polygon fill="none" stroke="black" points="531.5,-744.5 531.5,-939.5 734.5,-939.5 734.5,-744.5 531.5,-744.5"/> +</g> +<!-- GeneRIF_BASIC->Species --> +<g id="edge38" class="edge"> +<title>GeneRIF_BASIC:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M733,-819C766.29,-819 728.98,-544.05 752,-520 890.33,-375.45 2354.35,-314.96 2715.71,-302.17"/> +<polygon fill="black" stroke="black" points="2715.96,-305.66 2725.83,-301.81 2715.71,-298.67 2715.96,-305.66"/> +</g> +<!-- GeneList_rn33 --> +<g id="node45" class="node"> +<title>GeneList_rn33</title> +<polygon fill="white" stroke="transparent" points="9516.5,-4737.5 9516.5,-5037.5 9691.5,-5037.5 9691.5,-4737.5 9516.5,-4737.5"/> +<polygon fill="#df65b0" stroke="transparent" points="9520,-5013.5 9520,-5034.5 9689,-5034.5 9689,-5013.5 9520,-5013.5"/> +<polygon fill="none" stroke="black" points="9520,-5013.5 9520,-5034.5 9689,-5034.5 9689,-5013.5 9520,-5013.5"/> +<text text-anchor="start" x="9523" y="-5020.3" font-family="Times,serif" font-size="14.00">GeneList_rn33 (2 MiB)</text> +<text text-anchor="start" x="9578" y="-4998.3" font-family="Times,serif" font-size="14.00">cdsEnd</text> +<text text-anchor="start" x="9574" y="-4977.3" font-family="Times,serif" font-size="14.00">cdsStart</text> +<text text-anchor="start" x="9559" y="-4956.3" font-family="Times,serif" font-size="14.00">chromosome</text> +<text 
text-anchor="start" x="9566" y="-4935.3" font-family="Times,serif" font-size="14.00">exonCount</text> +<text text-anchor="start" x="9569.5" y="-4914.3" font-family="Times,serif" font-size="14.00">exonEnds</text> +<text text-anchor="start" x="9565" y="-4893.3" font-family="Times,serif" font-size="14.00">exonStarts</text> +<text text-anchor="start" x="9560.5" y="-4872.3" font-family="Times,serif" font-size="14.00">geneSymbol</text> +<text text-anchor="start" x="9597.5" y="-4851.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="9587.5" y="-4830.3" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="9579.5" y="-4809.3" font-family="Times,serif" font-size="14.00">NM_ID</text> +<text text-anchor="start" x="9581" y="-4788.3" font-family="Times,serif" font-size="14.00">strand</text> +<text text-anchor="start" x="9583" y="-4767.3" font-family="Times,serif" font-size="14.00">txEnd</text> +<text text-anchor="start" x="9578.5" y="-4746.3" font-family="Times,serif" font-size="14.00">txStart</text> +<polygon fill="none" stroke="black" points="9516.5,-4737.5 9516.5,-5037.5 9691.5,-5037.5 9691.5,-4737.5 9516.5,-4737.5"/> +</g> +<!-- Geno->Species --> +<g id="edge39" class="edge"> +<title>Geno:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M4247,-704C4089.83,-704 4091.63,-576.6 3945,-520 3561.93,-372.13 3067.37,-320.3 2876.27,-305.04"/> +<polygon fill="black" stroke="black" points="2876.28,-301.52 2866.03,-304.23 2875.73,-308.5 2876.28,-301.52"/> +</g> +<!-- Organizations --> +<g id="node47" class="node"> +<title>Organizations</title> +<polygon fill="white" stroke="transparent" points="90,-4 90,-73 256,-73 256,-4 90,-4"/> +<polygon fill="#d7b5d8" stroke="transparent" points="93,-48.5 93,-69.5 253,-69.5 253,-48.5 93,-48.5"/> +<polygon fill="none" stroke="black" points="93,-48.5 93,-69.5 253,-69.5 253,-48.5 93,-48.5"/> +<text text-anchor="start" x="96" y="-55.3" font-family="Times,serif" 
font-size="14.00">Organizations (3 KiB)</text> +<text text-anchor="start" x="119.5" y="-33.3" font-family="Times,serif" font-size="14.00">OrganizationId</text> +<text text-anchor="start" x="105.5" y="-12.3" font-family="Times,serif" font-size="14.00">OrganizationName</text> +<polygon fill="none" stroke="black" points="90,-4 90,-73 256,-73 256,-4 90,-4"/> +</g> +<!-- StrainXRef --> +<g id="node48" class="node"> +<title>StrainXRef</title> +<polygon fill="white" stroke="transparent" points="4871,-1897 4871,-2029 5019,-2029 5019,-1897 4871,-1897"/> +<polygon fill="#df65b0" stroke="transparent" points="4874,-2005 4874,-2026 5016,-2026 5016,-2005 4874,-2005"/> +<polygon fill="none" stroke="black" points="4874,-2005 4874,-2026 5016,-2026 5016,-2005 4874,-2005"/> +<text text-anchor="start" x="4877" y="-2011.8" font-family="Times,serif" font-size="14.00">StrainXRef (1 MiB)</text> +<text text-anchor="start" x="4902" y="-1989.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4916.5" y="-1968.8" font-family="Times,serif" font-size="14.00">OrderId</text> +<text text-anchor="start" x="4890" y="-1947.8" font-family="Times,serif" font-size="14.00">PedigreeStatus</text> +<text text-anchor="start" x="4915.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="4878.5" y="-1905.8" font-family="Times,serif" font-size="14.00">Used_for_mapping</text> +<polygon fill="none" stroke="black" points="4871,-1897 4871,-2029 5019,-2029 5019,-1897 4871,-1897"/> +</g> +<!-- StrainXRef->InbredSet --> +<g id="edge40" class="edge"> +<title>StrainXRef:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4873,-1994C4828.88,-1994 4884.67,-1231.72 4854,-1200 4805.57,-1149.92 4292.6,-1190.1 4228,-1164 4115.23,-1118.43 4012.54,-1024.28 3943.58,-949.66"/> +<polygon fill="black" stroke="black" points="3945.94,-947.05 3936.6,-942.05 3940.78,-951.79 3945.94,-947.05"/> +</g> +<!-- StrainXRef->Strain 
--> +<g id="edge41" class="edge"> +<title>StrainXRef:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5017,-1930C5057.58,-1930 5018.82,-1233.98 5041,-1200 5195.5,-963.36 5553.55,-879.5 5710.26,-853.43"/> +<polygon fill="black" stroke="black" points="5710.98,-856.86 5720.28,-851.79 5709.85,-849.95 5710.98,-856.86"/> +</g> +<!-- SnpSource --> +<g id="node49" class="node"> +<title>SnpSource</title> +<polygon fill="white" stroke="transparent" points="9726,-4832 9726,-4943 9870,-4943 9870,-4832 9726,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="9729,-4918.5 9729,-4939.5 9867,-4939.5 9867,-4918.5 9729,-4918.5"/> +<polygon fill="none" stroke="black" points="9729,-4918.5 9729,-4939.5 9867,-4939.5 9867,-4918.5 9729,-4918.5"/> +<text text-anchor="start" x="9732" y="-4925.3" font-family="Times,serif" font-size="14.00">SnpSource (1 KiB)</text> +<text text-anchor="start" x="9758.5" y="-4903.3" font-family="Times,serif" font-size="14.00">DateAdded</text> +<text text-anchor="start" x="9752.5" y="-4882.3" font-family="Times,serif" font-size="14.00">DateCreated</text> +<text text-anchor="start" x="9790.5" y="-4861.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="9776.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="9726,-4832 9726,-4943 9870,-4943 9870,-4832 9726,-4832"/> +</g> +<!-- user_openids --> +<g id="node50" class="node"> +<title>user_openids</title> +<polygon fill="white" stroke="transparent" points="9904.5,-4853 9904.5,-4922 10049.5,-4922 10049.5,-4853 9904.5,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="9908,-4897.5 9908,-4918.5 10047,-4918.5 10047,-4897.5 9908,-4897.5"/> +<polygon fill="none" stroke="black" points="9908,-4897.5 9908,-4918.5 10047,-4918.5 10047,-4897.5 9908,-4897.5"/> +<text text-anchor="start" x="9911" y="-4904.3" font-family="Times,serif" font-size="14.00">user_openids (0 B)</text> +<text 
text-anchor="start" x="9939.5" y="-4882.3" font-family="Times,serif" font-size="14.00">openid_url</text> +<text text-anchor="start" x="9951.5" y="-4861.3" font-family="Times,serif" font-size="14.00">user_id</text> +<polygon fill="none" stroke="black" points="9904.5,-4853 9904.5,-4922 10049.5,-4922 10049.5,-4853 9904.5,-4853"/> +</g> +<!-- GeneMap_cuiyan --> +<g id="node51" class="node"> +<title>GeneMap_cuiyan</title> +<polygon fill="white" stroke="transparent" points="10084,-4832 10084,-4943 10290,-4943 10290,-4832 10084,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="10087,-4918.5 10087,-4939.5 10287,-4939.5 10287,-4918.5 10087,-4918.5"/> +<polygon fill="none" stroke="black" points="10087,-4918.5 10087,-4939.5 10287,-4939.5 10287,-4918.5 10087,-4918.5"/> +<text text-anchor="start" x="10090" y="-4925.3" font-family="Times,serif" font-size="14.00">GeneMap_cuiyan (376 KiB)</text> +<text text-anchor="start" x="10160" y="-4903.3" font-family="Times,serif" font-size="14.00">GeneID</text> +<text text-anchor="start" x="10180" y="-4882.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="10160" y="-4861.3" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="10141.5" y="-4840.3" font-family="Times,serif" font-size="14.00">TranscriptID</text> +<polygon fill="none" stroke="black" points="10084,-4832 10084,-4943 10290,-4943 10290,-4832 10084,-4832"/> +</g> +<!-- InfoFilesUser_md5 --> +<g id="node52" class="node"> +<title>InfoFilesUser_md5</title> +<polygon fill="white" stroke="transparent" points="10324,-4853 10324,-4922 10520,-4922 10520,-4853 10324,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="10327,-4897.5 10327,-4918.5 10517,-4918.5 10517,-4897.5 10327,-4897.5"/> +<polygon fill="none" stroke="black" points="10327,-4897.5 10327,-4918.5 10517,-4918.5 10517,-4897.5 10327,-4897.5"/> +<text text-anchor="start" x="10330" y="-4904.3" font-family="Times,serif" 
font-size="14.00">InfoFilesUser_md5 (96 B)</text> +<text text-anchor="start" x="10387.5" y="-4882.3" font-family="Times,serif" font-size="14.00">Password</text> +<text text-anchor="start" x="10385" y="-4861.3" font-family="Times,serif" font-size="14.00">Username</text> +<polygon fill="none" stroke="black" points="10324,-4853 10324,-4922 10520,-4922 10520,-4853 10324,-4853"/> +</g> +<!-- PublishXRef --> +<g id="node53" class="node"> +<title>PublishXRef</title> +<polygon fill="lightgrey" stroke="transparent" points="2811.5,-1834 2811.5,-2092 2968.5,-2092 2968.5,-1834 2811.5,-1834"/> +<polygon fill="#df65b0" stroke="transparent" points="2815,-2068 2815,-2089 2966,-2089 2966,-2068 2815,-2068"/> +<polygon fill="none" stroke="black" points="2815,-2068 2815,-2089 2966,-2089 2966,-2068 2815,-2068"/> +<text text-anchor="start" x="2818" y="-2074.8" font-family="Times,serif" font-size="14.00">PublishXRef (2 MiB)</text> +<text text-anchor="start" x="2861.5" y="-2052.8" font-family="Times,serif" font-size="14.00">additive</text> +<text text-anchor="start" x="2853.5" y="-2031.8" font-family="Times,serif" font-size="14.00">comments</text> +<text text-anchor="start" x="2866" y="-2010.8" font-family="Times,serif" font-size="14.00">DataId</text> +<polygon fill="green" stroke="transparent" points="2815,-1984 2815,-2003 2966,-2003 2966,-1984 2815,-1984"/> +<text text-anchor="start" x="2883" y="-1989.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2847.5" y="-1968.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2869.5" y="-1947.8" font-family="Times,serif" font-size="14.00">Locus</text> +<text text-anchor="start" x="2875.5" y="-1926.8" font-family="Times,serif" font-size="14.00">LRS</text> +<text text-anchor="start" x="2870.5" y="-1905.8" font-family="Times,serif" font-size="14.00">mean</text> +<text text-anchor="start" x="2845" y="-1884.8" font-family="Times,serif" 
font-size="14.00">PhenotypeId</text> +<polygon fill="green" stroke="transparent" points="2815,-1858 2815,-1877 2966,-1877 2966,-1858 2815,-1858"/> +<text text-anchor="start" x="2843" y="-1863.8" font-family="Times,serif" font-size="14.00">PublicationId</text> +<text text-anchor="start" x="2855.5" y="-1842.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<polygon fill="none" stroke="black" points="2811.5,-1834 2811.5,-2092 2968.5,-2092 2968.5,-1834 2811.5,-1834"/> +</g> +<!-- PublishXRef->Publication --> +<g id="edge44" class="edge"> +<title>PublishXRef:PublicationId->Publication</title> +<path fill="none" stroke="black" d="M2814,-1867C2776.93,-1867 2815.52,-1230.19 2794,-1200 2767.79,-1163.23 2729.57,-1197.23 2699,-1164 2651.77,-1112.67 2628.61,-1038.69 2617.34,-974.68"/> +<polygon fill="black" stroke="black" points="2620.73,-973.78 2615.62,-964.5 2613.83,-974.94 2620.73,-973.78"/> +</g> +<!-- PublishXRef->InbredSet --> +<g id="edge42" class="edge"> +<title>PublishXRef:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2967,-1973C3009.96,-1973 2955.99,-1230.74 2986,-1200 3043.5,-1141.1 3658.94,-1211.74 3726,-1164 3777.91,-1127.05 3808.95,-1065.59 3827.5,-1006.29"/> +<polygon fill="black" stroke="black" points="3830.95,-1006.99 3830.49,-996.41 3824.25,-1004.97 3830.95,-1006.99"/> +</g> +<!-- PublishXRef->Phenotype --> +<g id="edge43" class="edge"> +<title>PublishXRef:PhenotypeId->Phenotype</title> +<path fill="none" stroke="black" d="M2967,-1888C2986.12,-1888 2984.78,-1219.08 2986,-1200 2990.55,-1129.04 2998.2,-1050.39 3005.28,-985.01"/> +<polygon fill="black" stroke="black" points="3008.76,-985.37 3006.37,-975.05 3001.8,-984.61 3008.76,-985.37"/> +</g> +<!-- RatSnpPattern --> +<g id="node54" class="node"> +<title>RatSnpPattern</title> +<polygon fill="white" stroke="transparent" points="10554,-4517 10554,-5258 10748,-5258 10748,-4517 10554,-4517"/> +<polygon fill="#df65b0" stroke="transparent" points="10557,-5233.5 10557,-5254.5 
10745,-5254.5 10745,-5233.5 10557,-5233.5"/> +<polygon fill="none" stroke="black" points="10557,-5233.5 10557,-5254.5 10745,-5254.5 10745,-5233.5 10557,-5233.5"/> +<text text-anchor="start" x="10560" y="-5240.3" font-family="Times,serif" font-size="14.00">RatSnpPattern (202 MiB)</text> +<text text-anchor="start" x="10638" y="-5218.3" font-family="Times,serif" font-size="14.00">ACI</text> +<text text-anchor="start" x="10628.5" y="-5197.3" font-family="Times,serif" font-size="14.00">ACI_N</text> +<text text-anchor="start" x="10629.5" y="-5176.3" font-family="Times,serif" font-size="14.00">BBDP</text> +<text text-anchor="start" x="10639.5" y="-5155.3" font-family="Times,serif" font-size="14.00">BN</text> +<text text-anchor="start" x="10630" y="-5134.3" font-family="Times,serif" font-size="14.00">BN_N</text> +<text text-anchor="start" x="10625" y="-5113.3" font-family="Times,serif" font-size="14.00">BUF_N</text> +<text text-anchor="start" x="10632.5" y="-5092.3" font-family="Times,serif" font-size="14.00">F344</text> +<text text-anchor="start" x="10623" y="-5071.3" font-family="Times,serif" font-size="14.00">F344_N</text> +<text text-anchor="start" x="10634" y="-5050.3" font-family="Times,serif" font-size="14.00">FHH</text> +<text text-anchor="start" x="10635.5" y="-5029.3" font-family="Times,serif" font-size="14.00">FHL</text> +<text text-anchor="start" x="10640" y="-5008.3" font-family="Times,serif" font-size="14.00">GK</text> +<text text-anchor="start" x="10643.5" y="-4987.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="10641" y="-4966.3" font-family="Times,serif" font-size="14.00">LE</text> +<text text-anchor="start" x="10634" y="-4945.3" font-family="Times,serif" font-size="14.00">LEW</text> +<text text-anchor="start" x="10640" y="-4924.3" font-family="Times,serif" font-size="14.00">LH</text> +<text text-anchor="start" x="10641.5" y="-4903.3" font-family="Times,serif" font-size="14.00">LL</text> +<text text-anchor="start" 
x="10640" y="-4882.3" font-family="Times,serif" font-size="14.00">LN</text> +<text text-anchor="start" x="10620.5" y="-4861.3" font-family="Times,serif" font-size="14.00">M520_N</text> +<text text-anchor="start" x="10632.5" y="-4840.3" font-family="Times,serif" font-size="14.00">MHS</text> +<text text-anchor="start" x="10632.5" y="-4819.3" font-family="Times,serif" font-size="14.00">MNS</text> +<text text-anchor="start" x="10629" y="-4798.3" font-family="Times,serif" font-size="14.00">MR_N</text> +<text text-anchor="start" x="10634.5" y="-4777.3" font-family="Times,serif" font-size="14.00">SBH</text> +<text text-anchor="start" x="10634.5" y="-4756.3" font-family="Times,serif" font-size="14.00">SBN</text> +<text text-anchor="start" x="10634.5" y="-4735.3" font-family="Times,serif" font-size="14.00">SHR</text> +<text text-anchor="start" x="10625" y="-4714.3" font-family="Times,serif" font-size="14.00">SHRSP</text> +<text text-anchor="start" x="10629.5" y="-4693.3" font-family="Times,serif" font-size="14.00">SnpId</text> +<text text-anchor="start" x="10640.5" y="-4672.3" font-family="Times,serif" font-size="14.00">SR</text> +<text text-anchor="start" x="10641.5" y="-4651.3" font-family="Times,serif" font-size="14.00">SS</text> +<text text-anchor="start" x="10633.5" y="-4630.3" font-family="Times,serif" font-size="14.00">WAG</text> +<text text-anchor="start" x="10634" y="-4609.3" font-family="Times,serif" font-size="14.00">WKY</text> +<text text-anchor="start" x="10625" y="-4588.3" font-family="Times,serif" font-size="14.00">WKY_N</text> +<text text-anchor="start" x="10636.5" y="-4567.3" font-family="Times,serif" font-size="14.00">WLI</text> +<text text-anchor="start" x="10634" y="-4546.3" font-family="Times,serif" font-size="14.00">WMI</text> +<text text-anchor="start" x="10628" y="-4525.3" font-family="Times,serif" font-size="14.00">WN_N</text> +<polygon fill="none" stroke="black" points="10554,-4517 10554,-5258 10748,-5258 10748,-4517 10554,-4517"/> +</g> +<!-- 
Genbank --> +<g id="node55" class="node"> +<title>Genbank</title> +<polygon fill="white" stroke="transparent" points="769,-797 769,-887 911,-887 911,-797 769,-797"/> +<polygon fill="#df65b0" stroke="transparent" points="772,-863 772,-884 908,-884 908,-863 772,-863"/> +<polygon fill="none" stroke="black" points="772,-863 772,-884 908,-884 908,-863 772,-863"/> +<text text-anchor="start" x="775" y="-869.8" font-family="Times,serif" font-size="14.00">Genbank (37 MiB)</text> +<text text-anchor="start" x="832.5" y="-847.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="805" y="-826.8" font-family="Times,serif" font-size="14.00">Sequence</text> +<text text-anchor="start" x="805" y="-805.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="none" stroke="black" points="769,-797 769,-887 911,-887 911,-797 769,-797"/> +</g> +<!-- Genbank->Species --> +<g id="edge45" class="edge"> +<title>Genbank:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M909,-809C941.22,-809 910.62,-543.18 933,-520 1058.95,-389.57 2375.45,-319.21 2715.96,-303.1"/> +<polygon fill="black" stroke="black" points="2716.17,-306.6 2725.99,-302.63 2715.84,-299.61 2716.17,-306.6"/> +</g> +<!-- EnsemblChip --> +<g id="node56" class="node"> +<title>EnsemblChip</title> +<polygon fill="white" stroke="transparent" points="1780.5,-786.5 1780.5,-897.5 1945.5,-897.5 1945.5,-786.5 1780.5,-786.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="1784,-873 1784,-894 1943,-894 1943,-873 1784,-873"/> +<polygon fill="none" stroke="black" points="1784,-873 1784,-894 1943,-894 1943,-873 1784,-873"/> +<text text-anchor="start" x="1787" y="-879.8" font-family="Times,serif" font-size="14.00">EnsemblChip (296 B)</text> +<text text-anchor="start" x="1856" y="-857.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="1842" y="-836.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" 
x="1815" y="-815.8" font-family="Times,serif" font-size="14.00">ProbeSetSize</text> +<text text-anchor="start" x="1846" y="-794.8" font-family="Times,serif" font-size="14.00">Type</text> +<polygon fill="none" stroke="black" points="1780.5,-786.5 1780.5,-897.5 1945.5,-897.5 1945.5,-786.5 1780.5,-786.5"/> +</g> +<!-- LCorrRamin3 --> +<g id="node57" class="node"> +<title>LCorrRamin3</title> +<polygon fill="white" stroke="transparent" points="10782.5,-4842.5 10782.5,-4932.5 10945.5,-4932.5 10945.5,-4842.5 10782.5,-4842.5"/> +<polygon fill="#ce1256" stroke="transparent" points="10786,-4908.5 10786,-4929.5 10943,-4929.5 10943,-4908.5 10786,-4908.5"/> +<polygon fill="none" stroke="black" points="10786,-4908.5 10786,-4929.5 10943,-4929.5 10943,-4908.5 10786,-4908.5"/> +<text text-anchor="start" x="10789" y="-4915.3" font-family="Times,serif" font-size="14.00">LCorrRamin3 (2 GiB)</text> +<text text-anchor="start" x="10834" y="-4893.3" font-family="Times,serif" font-size="14.00">GeneId1</text> +<text text-anchor="start" x="10834" y="-4872.3" font-family="Times,serif" font-size="14.00">GeneId2</text> +<text text-anchor="start" x="10845" y="-4851.3" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="10782.5,-4842.5 10782.5,-4932.5 10945.5,-4932.5 10945.5,-4842.5 10782.5,-4842.5"/> +</g> +<!-- UserPrivilege --> +<g id="node59" class="node"> +<title>UserPrivilege</title> +<polygon fill="white" stroke="transparent" points="7239,-4842.5 7239,-4932.5 7407,-4932.5 7407,-4842.5 7239,-4842.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="7242,-4908.5 7242,-4929.5 7404,-4929.5 7404,-4908.5 7242,-4908.5"/> +<polygon fill="none" stroke="black" points="7242,-4908.5 7242,-4929.5 7404,-4929.5 7404,-4908.5 7242,-4908.5"/> +<text text-anchor="start" x="7245" y="-4915.3" font-family="Times,serif" font-size="14.00">UserPrivilege (224 B)</text> +<text text-anchor="start" x="7246.5" y="-4893.3" font-family="Times,serif" 
font-size="14.00">download_result_priv</text> +<text text-anchor="start" x="7258" y="-4872.3" font-family="Times,serif" font-size="14.00">ProbeSetFreezeId</text> +<text text-anchor="start" x="7298.5" y="-4851.3" font-family="Times,serif" font-size="14.00">UserId</text> +<polygon fill="none" stroke="black" points="7239,-4842.5 7239,-4932.5 7407,-4932.5 7407,-4842.5 7239,-4842.5"/> +</g> +<!-- UserPrivilege->User --> +<g id="edge46" class="edge"> +<title>UserPrivilege:UserId->User</title> +<path fill="none" stroke="black" d="M7323,-4844.5C7323,-4319.22 7309.04,-3693.9 7302.41,-3426.66"/> +<polygon fill="black" stroke="black" points="7305.91,-3426.44 7302.16,-3416.53 7298.91,-3426.61 7305.91,-3426.44"/> +</g> +<!-- GeneChip --> +<g id="node61" class="node"> +<title>GeneChip</title> +<polygon fill="lightgrey" stroke="transparent" points="1980,-744.5 1980,-939.5 2116,-939.5 2116,-744.5 1980,-744.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="1983,-915 1983,-936 2113,-936 2113,-915 1983,-915"/> +<polygon fill="none" stroke="black" points="1983,-915 1983,-936 2113,-936 2113,-915 1983,-915"/> +<text text-anchor="start" x="1986" y="-921.8" font-family="Times,serif" font-size="14.00">GeneChip (9 KiB)</text> +<text text-anchor="start" x="2005.5" y="-899.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<polygon fill="green" stroke="transparent" points="1983,-873 1983,-892 2113,-892 2113,-873 1983,-873"/> +<text text-anchor="start" x="1992" y="-878.8" font-family="Times,serif" font-size="14.00">GeneChipName</text> +<text text-anchor="start" x="2002.5" y="-857.8" font-family="Times,serif" font-size="14.00">GeoPlatform</text> +<text text-anchor="start" x="1996" y="-836.8" font-family="Times,serif" font-size="14.00">GO_tree_value</text> +<text text-anchor="start" x="2040.5" y="-815.8" font-family="Times,serif" font-size="14.00">Id</text> +<polygon fill="green" stroke="transparent" points="1983,-789 1983,-808 2113,-808 2113,-789 1983,-789"/> +<text 
text-anchor="start" x="2026.5" y="-794.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2013" y="-773.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2031.5" y="-752.8" font-family="Times,serif" font-size="14.00">Title</text> +<polygon fill="none" stroke="black" points="1980,-744.5 1980,-939.5 2116,-939.5 2116,-744.5 1980,-744.5"/> +</g> +<!-- GeneChip->Species --> +<g id="edge47" class="edge"> +<title>GeneChip:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2114,-777C2142.63,-777 2115.4,-542.59 2133,-520 2274.95,-337.76 2572.58,-304.64 2715.73,-299.62"/> +<polygon fill="black" stroke="black" points="2715.88,-303.12 2725.77,-299.31 2715.66,-296.12 2715.88,-303.12"/> +</g> +<!-- IndelXRef --> +<g id="node62" class="node"> +<title>IndelXRef</title> +<polygon fill="white" stroke="transparent" points="5716,-1918 5716,-2008 5856,-2008 5856,-1918 5716,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="5719,-1984 5719,-2005 5853,-2005 5853,-1984 5719,-1984"/> +<polygon fill="none" stroke="black" points="5719,-1984 5719,-2005 5853,-2005 5853,-1984 5719,-1984"/> +<text text-anchor="start" x="5722" y="-1990.8" font-family="Times,serif" font-size="14.00">IndelXRef (1 MiB)</text> +<text text-anchor="start" x="5760.5" y="-1968.8" font-family="Times,serif" font-size="14.00">IndelId</text> +<text text-anchor="start" x="5752" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId1</text> +<text text-anchor="start" x="5752" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId2</text> +<polygon fill="none" stroke="black" points="5716,-1918 5716,-2008 5856,-2008 5856,-1918 5716,-1918"/> +</g> +<!-- IndelXRef->Strain --> +<g id="edge48" class="edge"> +<title>IndelXRef:StrainId1->Strain</title> +<path fill="none" stroke="black" d="M5854,-1951C5904.87,-1951 5825.54,-1197.02 5796.2,-933"/> +<polygon fill="black" stroke="black" points="5799.64,-932.24 
5795.05,-922.68 5792.68,-933.01 5799.64,-932.24"/> +</g> +<!-- IndelXRef->Strain --> +<g id="edge49" class="edge"> +<title>IndelXRef:StrainId2->Strain</title> +<path fill="none" stroke="black" d="M5786,-1920C5786,-1553.9 5786,-1117.79 5786,-932.93"/> +<polygon fill="black" stroke="black" points="5789.5,-932.72 5786,-922.72 5782.5,-932.72 5789.5,-932.72"/> +</g> +<!-- user --> +<g id="node63" class="node"> +<title>user</title> +<polygon fill="white" stroke="transparent" points="10979.5,-4779.5 10979.5,-4995.5 11108.5,-4995.5 11108.5,-4779.5 10979.5,-4779.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="10983,-4971.5 10983,-4992.5 11106,-4992.5 11106,-4971.5 10983,-4971.5"/> +<polygon fill="none" stroke="black" points="10983,-4971.5 10983,-4992.5 11106,-4992.5 11106,-4971.5 10983,-4971.5"/> +<text text-anchor="start" x="10997" y="-4978.3" font-family="Times,serif" font-size="14.00">user (64 KiB)</text> +<text text-anchor="start" x="11023" y="-4956.3" font-family="Times,serif" font-size="14.00">active</text> +<text text-anchor="start" x="11008.5" y="-4935.3" font-family="Times,serif" font-size="14.00">confirmed</text> +<text text-anchor="start" x="10993" y="-4914.3" font-family="Times,serif" font-size="14.00">email_address</text> +<text text-anchor="start" x="11009.5" y="-4893.3" font-family="Times,serif" font-size="14.00">full_name</text> +<text text-anchor="start" x="11037.5" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="10999.5" y="-4851.3" font-family="Times,serif" font-size="14.00">organization</text> +<text text-anchor="start" x="11010" y="-4830.3" font-family="Times,serif" font-size="14.00">password</text> +<text text-anchor="start" x="10985" y="-4809.3" font-family="Times,serif" font-size="14.00">registration_info</text> +<text text-anchor="start" x="11008" y="-4788.3" font-family="Times,serif" font-size="14.00">superuser</text> +<polygon fill="none" stroke="black" points="10979.5,-4779.5 
10979.5,-4995.5 11108.5,-4995.5 11108.5,-4779.5 10979.5,-4779.5"/> +</g> +<!-- PublishSE --> +<g id="node64" class="node"> +<title>PublishSE</title> +<polygon fill="white" stroke="transparent" points="5890,-1918 5890,-2008 6034,-2008 6034,-1918 5890,-1918"/> +<polygon fill="#df65b0" stroke="transparent" points="5893,-1984 5893,-2005 6031,-2005 6031,-1984 5893,-1984"/> +<polygon fill="none" stroke="black" points="5893,-1984 5893,-2005 6031,-2005 6031,-1984 5893,-1984"/> +<text text-anchor="start" x="5896" y="-1990.8" font-family="Times,serif" font-size="14.00">PublishSE (3 MiB)</text> +<text text-anchor="start" x="5937.5" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="5943.5" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="5932.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="5890,-1918 5890,-2008 6034,-2008 6034,-1918 5890,-1918"/> +</g> +<!-- PublishSE->Strain --> +<g id="edge50" class="edge"> +<title>PublishSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M5962,-1920C5962,-1549.32 5859.2,-1116.17 5810.73,-932.54"/> +<polygon fill="black" stroke="black" points="5814.06,-931.43 5808.11,-922.66 5807.29,-933.22 5814.06,-931.43"/> +</g> +<!-- EnsemblProbe --> +<g id="node65" class="node"> +<title>EnsemblProbe</title> +<polygon fill="white" stroke="transparent" points="11143,-4821.5 11143,-4953.5 11327,-4953.5 11327,-4821.5 11143,-4821.5"/> +<polygon fill="#df65b0" stroke="transparent" points="11146,-4929.5 11146,-4950.5 11324,-4950.5 11324,-4929.5 11146,-4929.5"/> +<polygon fill="none" stroke="black" points="11146,-4929.5 11146,-4950.5 11324,-4950.5 11324,-4929.5 11146,-4929.5"/> +<text text-anchor="start" x="11149" y="-4936.3" font-family="Times,serif" font-size="14.00">EnsemblProbe (94 MiB)</text> +<text text-anchor="start" x="11211" y="-4914.3" font-family="Times,serif" 
font-size="14.00">ChipId</text> +<text text-anchor="start" x="11227.5" y="-4893.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="11212" y="-4872.3" font-family="Times,serif" font-size="14.00">length</text> +<text text-anchor="start" x="11213.5" y="-4851.3" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="11201.5" y="-4830.3" font-family="Times,serif" font-size="14.00">ProbeSet</text> +<polygon fill="none" stroke="black" points="11143,-4821.5 11143,-4953.5 11327,-4953.5 11327,-4821.5 11143,-4821.5"/> +</g> +<!-- InfoFiles --> +<g id="node66" class="node"> +<title>InfoFiles</title> +<polygon fill="lightgrey" stroke="transparent" points="2048.5,-1424.5 2048.5,-2501.5 2279.5,-2501.5 2279.5,-1424.5 2048.5,-1424.5"/> +<polygon fill="#df65b0" stroke="transparent" points="2052,-2477 2052,-2498 2277,-2498 2277,-2477 2052,-2477"/> +<polygon fill="none" stroke="black" points="2052,-2477 2052,-2498 2277,-2498 2277,-2477 2052,-2477"/> +<text text-anchor="start" x="2104" y="-2483.8" font-family="Times,serif" font-size="14.00">InfoFiles (4 MiB)</text> +<text text-anchor="start" x="2085.5" y="-2461.8" font-family="Times,serif" font-size="14.00">About_Array_Platform</text> +<text text-anchor="start" x="2119" y="-2440.8" font-family="Times,serif" font-size="14.00">About_Cases</text> +<text text-anchor="start" x="2054" y="-2419.8" font-family="Times,serif" font-size="14.00">About_Data_Values_Processing</text> +<text text-anchor="start" x="2104.5" y="-2398.8" font-family="Times,serif" font-size="14.00">About_Download</text> +<text text-anchor="start" x="2117" y="-2377.8" font-family="Times,serif" font-size="14.00">About_Tissue</text> +<text text-anchor="start" x="2104" y="-2356.8" font-family="Times,serif" font-size="14.00">AuthorizedUsers</text> +<text text-anchor="start" x="2116" y="-2335.8" font-family="Times,serif" font-size="14.00">AvgMethodId</text> +<text text-anchor="start" x="2135.5" y="-2314.8" 
font-family="Times,serif" font-size="14.00">Citation</text> +<text text-anchor="start" x="2149.5" y="-2293.8" font-family="Times,serif" font-size="14.00">City</text> +<text text-anchor="start" x="2112" y="-2272.8" font-family="Times,serif" font-size="14.00">Contact_Name</text> +<text text-anchor="start" x="2122" y="-2251.8" font-family="Times,serif" font-size="14.00">Contributor</text> +<text text-anchor="start" x="2135.5" y="-2230.8" font-family="Times,serif" font-size="14.00">Country</text> +<text text-anchor="start" x="2069" y="-2209.8" font-family="Times,serif" font-size="14.00">Data_Source_Acknowledge</text> +<text text-anchor="start" x="2129.5" y="-2188.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<text text-anchor="start" x="2129" y="-2167.8" font-family="Times,serif" font-size="14.00">DB_Name</text> +<text text-anchor="start" x="2121" y="-2146.8" font-family="Times,serif" font-size="14.00">Department</text> +<text text-anchor="start" x="2140" y="-2125.8" font-family="Times,serif" font-size="14.00">Emails</text> +<text text-anchor="start" x="2101.5" y="-2104.8" font-family="Times,serif" font-size="14.00">Experiment_Type</text> +<text text-anchor="start" x="2122" y="-2083.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<polygon fill="green" stroke="transparent" points="2052,-2057 2052,-2076 2277,-2076 2277,-2057 2052,-2057"/> +<text text-anchor="start" x="2111" y="-2062.8" font-family="Times,serif" font-size="14.00">GN_AccesionId</text> +<text text-anchor="start" x="2128.5" y="-2041.8" font-family="Times,serif" font-size="14.00">InbredSet</text> +<text text-anchor="start" x="2121.5" y="-2020.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2129.5" y="-1999.8" font-family="Times,serif" font-size="14.00">InfoFileId</text> +<polygon fill="green" stroke="transparent" points="2052,-1973 2052,-1992 2277,-1992 2277,-1973 2052,-1973"/> +<text text-anchor="start" x="2120.5" 
y="-1978.8" font-family="Times,serif" font-size="14.00">InfoFileTitle</text> +<text text-anchor="start" x="2112" y="-1957.8" font-family="Times,serif" font-size="14.00">InfoPageName</text> +<text text-anchor="start" x="2117" y="-1936.8" font-family="Times,serif" font-size="14.00">InfoPageTitle</text> +<text text-anchor="start" x="2125" y="-1915.8" font-family="Times,serif" font-size="14.00">Laboratory</text> +<text text-anchor="start" x="2113.5" y="-1894.8" font-family="Times,serif" font-size="14.00">Normalization</text> +<text text-anchor="start" x="2129.5" y="-1873.8" font-family="Times,serif" font-size="14.00">Organism</text> +<text text-anchor="start" x="2119" y="-1852.8" font-family="Times,serif" font-size="14.00">Organism_Id</text> +<text text-anchor="start" x="2093.5" y="-1831.8" font-family="Times,serif" font-size="14.00">Organization_Name</text> +<text text-anchor="start" x="2110" y="-1810.8" font-family="Times,serif" font-size="14.00">Overall_Design</text> +<text text-anchor="start" x="2142" y="-1789.8" font-family="Times,serif" font-size="14.00">Phone</text> +<text text-anchor="start" x="2129.5" y="-1768.8" font-family="Times,serif" font-size="14.00">Platforms</text> +<text text-anchor="start" x="2132" y="-1747.8" font-family="Times,serif" font-size="14.00">Progreso</text> +<text text-anchor="start" x="2088.5" y="-1726.8" font-family="Times,serif" font-size="14.00">QualityControlStatus</text> +<text text-anchor="start" x="2134" y="-1705.8" font-family="Times,serif" font-size="14.00">Samples</text> +<text text-anchor="start" x="2137" y="-1684.8" font-family="Times,serif" font-size="14.00">Species</text> +<text text-anchor="start" x="2129.5" y="-1663.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<polygon fill="green" stroke="transparent" points="2052,-1637 2052,-1656 2277,-1656 2277,-1637 2052,-1637"/> +<text text-anchor="start" x="2132.5" y="-1642.8" font-family="Times,serif" font-size="14.00">Specifics</text> +<text text-anchor="start" 
x="2145" y="-1621.8" font-family="Times,serif" font-size="14.00">State</text> +<text text-anchor="start" x="2141" y="-1600.8" font-family="Times,serif" font-size="14.00">Status</text> +<text text-anchor="start" x="2141.5" y="-1579.8" font-family="Times,serif" font-size="14.00">Street</text> +<text text-anchor="start" x="2102.5" y="-1558.8" font-family="Times,serif" font-size="14.00">Submission_Date</text> +<text text-anchor="start" x="2129.5" y="-1537.8" font-family="Times,serif" font-size="14.00">Summary</text> +<text text-anchor="start" x="2141.5" y="-1516.8" font-family="Times,serif" font-size="14.00">Tissue</text> +<text text-anchor="start" x="2134" y="-1495.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<polygon fill="green" stroke="transparent" points="2052,-1469 2052,-1488 2277,-1488 2277,-1469 2052,-1469"/> +<text text-anchor="start" x="2148" y="-1474.8" font-family="Times,serif" font-size="14.00">Title</text> +<text text-anchor="start" x="2148.5" y="-1453.8" font-family="Times,serif" font-size="14.00">URL</text> +<text text-anchor="start" x="2152" y="-1432.8" font-family="Times,serif" font-size="14.00">ZIP</text> +<polygon fill="none" stroke="black" points="2048.5,-1424.5 2048.5,-2501.5 2279.5,-2501.5 2279.5,-1424.5 2048.5,-1424.5"/> +</g> +<!-- InfoFiles->Datasets --> +<g id="edge52" class="edge"> +<title>InfoFiles:DatasetId->Datasets</title> +<path fill="none" stroke="black" d="M2051,-2193C1940.48,-2193 2072.47,-1276.81 1993,-1200 1933.9,-1142.88 581.41,-1211.03 514,-1164 470.71,-1133.8 442.18,-1086.38 423.37,-1037.17"/> +<polygon fill="black" stroke="black" points="426.6,-1035.81 419.85,-1027.64 420.03,-1038.23 426.6,-1035.81"/> +</g> +<!-- InfoFiles->InbredSet --> +<g id="edge54" class="edge"> +<title>InfoFiles:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2278,-2025C2323.84,-2025 2263.64,-1232.47 2296,-1200 2352.1,-1143.71 3660.72,-1209.33 3726,-1164 3778.57,-1127.49 3809.73,-1065.76 3828.19,-1006.12"/> 
+<polygon fill="black" stroke="black" points="3831.65,-1006.77 3831.16,-996.18 3824.94,-1004.76 3831.65,-1006.77"/> +</g> +<!-- InfoFiles->Species --> +<g id="edge55" class="edge"> +<title>InfoFiles:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2278,-1667C2303.96,-1667 2277.61,-1218.33 2296,-1200 2376.56,-1119.71 3240,-1245.83 3319,-1164 3368.7,-1112.52 3358.57,-579.62 3319,-520 3219.73,-370.42 2996.86,-322.06 2876.6,-306.62"/> +<polygon fill="black" stroke="black" points="2876.71,-303.1 2866.35,-305.35 2875.85,-310.05 2876.71,-303.1"/> +</g> +<!-- InfoFiles->AvgMethod --> +<g id="edge51" class="edge"> +<title>InfoFiles:AvgMethodId->AvgMethod</title> +<path fill="none" stroke="black" d="M2051,-2340C1924.17,-2340 2083.05,-1289.32 1993,-1200 1926.52,-1134.05 1224.64,-1221.84 1151,-1164 1075.17,-1104.44 1058.6,-986.94 1056.31,-911.82"/> +<polygon fill="black" stroke="black" points="1059.8,-911.43 1056.07,-901.51 1052.8,-911.59 1059.8,-911.43"/> +</g> +<!-- InfoFiles->GeneChip --> +<g id="edge53" class="edge"> +<title>InfoFiles:GeneChipId->GeneChip</title> +<path fill="none" stroke="black" d="M2051,-2088C2022.77,-2088 2038.62,-1258.67 2045.41,-953.75"/> +<polygon fill="black" stroke="black" points="2048.91,-953.64 2045.63,-943.57 2041.91,-953.49 2048.91,-953.64"/> +</g> +<!-- InfoFiles->Tissue --> +<g id="edge56" class="edge"> +<title>InfoFiles:TissueId->Tissue</title> +<path fill="none" stroke="black" d="M2278,-1499C2311.28,-1499 2278.84,-1228.52 2296,-1200 2311.83,-1173.68 2336.81,-1188.76 2355,-1164 2402.06,-1099.94 2421.62,-1011.33 2429.66,-943.43"/> +<polygon fill="black" stroke="black" points="2433.17,-943.47 2430.81,-933.15 2426.22,-942.7 2433.17,-943.47"/> +</g> +<!-- Vlookup --> +<g id="node67" class="node"> +<title>Vlookup</title> +<polygon fill="white" stroke="transparent" points="2070,-2766 2070,-3822 2258,-3822 2258,-2766 2070,-2766"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2073,-3798 2073,-3819 2255,-3819 2255,-3798 
2073,-3798"/> +<polygon fill="none" stroke="black" points="2073,-3798 2073,-3819 2255,-3819 2255,-3798 2073,-3798"/> +<text text-anchor="start" x="2099" y="-3804.8" font-family="Times,serif" font-size="14.00">Vlookup (120 KiB)</text> +<text text-anchor="start" x="2147" y="-3782.8" font-family="Times,serif" font-size="14.00">alias</text> +<text text-anchor="start" x="2137" y="-3761.8" font-family="Times,serif" font-size="14.00">AlignID</text> +<text text-anchor="start" x="2130.5" y="-3740.8" font-family="Times,serif" font-size="14.00">assembly</text> +<text text-anchor="start" x="2115.5" y="-3719.8" font-family="Times,serif" font-size="14.00">AvgMethodId</text> +<text text-anchor="start" x="2135.5" y="-3698.8" font-family="Times,serif" font-size="14.00">BlatSeq</text> +<text text-anchor="start" x="2117.5" y="-3677.8" font-family="Times,serif" font-size="14.00">CAS_number</text> +<text text-anchor="start" x="2137.5" y="-3656.8" font-family="Times,serif" font-size="14.00">cdsEnd</text> +<text text-anchor="start" x="2133.5" y="-3635.8" font-family="Times,serif" font-size="14.00">cdsStart</text> +<text text-anchor="start" x="2129" y="-3614.8" font-family="Times,serif" font-size="14.00">ChEBI_ID</text> +<text text-anchor="start" x="2120" y="-3593.8" font-family="Times,serif" font-size="14.00">ChEMBL_ID</text> +<text text-anchor="start" x="2108" y="-3572.8" font-family="Times,serif" font-size="14.00">ChemSpider_ID</text> +<text text-anchor="start" x="2150.5" y="-3551.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="2129" y="-3530.8" font-family="Times,serif" font-size="14.00">DatasetId</text> +<text text-anchor="start" x="2123.5" y="-3509.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="2122" y="-3488.8" font-family="Times,serif" font-size="14.00">EC_number</text> +<text text-anchor="start" x="2125.5" y="-3467.8" font-family="Times,serif" font-size="14.00">exonCount</text> +<text 
text-anchor="start" x="2129" y="-3446.8" font-family="Times,serif" font-size="14.00">exonEnds</text> +<text text-anchor="start" x="2124.5" y="-3425.8" font-family="Times,serif" font-size="14.00">exonStarts</text> +<text text-anchor="start" x="2105" y="-3404.8" font-family="Times,serif" font-size="14.00">Full_Description</text> +<text text-anchor="start" x="2121.5" y="-3383.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<text text-anchor="start" x="2138" y="-3362.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="2110.5" y="-3341.8" font-family="Times,serif" font-size="14.00">GN_AccesionId</text> +<text text-anchor="start" x="2128" y="-3320.8" font-family="Times,serif" font-size="14.00">HMDB_ID</text> +<text text-anchor="start" x="2156.5" y="-3299.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2121" y="-3278.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="2129" y="-3257.8" font-family="Times,serif" font-size="14.00">InfoFileId</text> +<text text-anchor="start" x="2111.5" y="-3236.8" font-family="Times,serif" font-size="14.00">InfoPageName</text> +<text text-anchor="start" x="2130.5" y="-3215.8" font-family="Times,serif" font-size="14.00">KEGG_ID</text> +<text text-anchor="start" x="2147" y="-3194.8" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="2152" y="-3173.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="2099.5" y="-3152.8" font-family="Times,serif" font-size="14.00">Molecular_Weight</text> +<text text-anchor="start" x="2142.5" y="-3131.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="2139" y="-3110.8" font-family="Times,serif" font-size="14.00">NM_ID</text> +<text text-anchor="start" x="2118.5" y="-3089.8" font-family="Times,serif" font-size="14.00">Nugowiki_ID</text> +<text text-anchor="start" 
x="2135" y="-3068.8" font-family="Times,serif" font-size="14.00">Position</text> +<text text-anchor="start" x="2079" y="-3047.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end</text> +<text text-anchor="start" x="2075" y="-3026.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start</text> +<text text-anchor="start" x="2129" y="-3005.8" font-family="Times,serif" font-size="14.00">ProteinID</text> +<text text-anchor="start" x="2117.5" y="-2984.8" font-family="Times,serif" font-size="14.00">PubChem_ID</text> +<text text-anchor="start" x="2129" y="-2963.8" font-family="Times,serif" font-size="14.00">SnpName</text> +<text text-anchor="start" x="2129" y="-2942.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2139.5" y="-2921.8" font-family="Times,serif" font-size="14.00">Strand</text> +<text text-anchor="start" x="2137" y="-2900.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="2133.5" y="-2879.8" font-family="Times,serif" font-size="14.00">TissueId</text> +<text text-anchor="start" x="2141" y="-2858.8" font-family="Times,serif" font-size="14.00">TxEnd</text> +<text text-anchor="start" x="2136.5" y="-2837.8" font-family="Times,serif" font-size="14.00">TxStart</text> +<text text-anchor="start" x="2135" y="-2816.8" font-family="Times,serif" font-size="14.00">UNII_ID</text> +<text text-anchor="start" x="2126" y="-2795.8" font-family="Times,serif" font-size="14.00">VLBlatSeq</text> +<text text-anchor="start" x="2114" y="-2774.8" font-family="Times,serif" font-size="14.00">VLProbeSetId</text> +<polygon fill="none" stroke="black" points="2070,-2766 2070,-3822 2258,-3822 2258,-2766 2070,-2766"/> +</g> +<!-- Vlookup->Datasets --> +<g id="edge58" class="edge"> +<title>Vlookup:DatasetId->Datasets</title> +<path fill="none" stroke="black" d="M2072,-3535C1300.04,-3535 942.38,-3381.71 535,-2726 490.25,-2653.97 509.59,-1283.71 496,-1200 487.3,-1146.41 
472.62,-1089.65 456.8,-1037.55"/> +<polygon fill="black" stroke="black" points="460.12,-1036.41 453.84,-1027.88 453.42,-1038.46 460.12,-1036.41"/> +</g> +<!-- Vlookup->InbredSet --> +<g id="edge60" class="edge"> +<title>Vlookup:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M2256,-3282C2538.62,-3282 2374.11,-2897.73 2622,-2762 2701.02,-2718.73 3368.94,-2790.34 3432,-2726 3491.36,-2665.43 3412.08,-1262.87 3469,-1200 3556.17,-1103.72 3659.82,-1247.85 3759,-1164 3805.29,-1124.86 3829.81,-1064.39 3842.6,-1006.44"/> +<polygon fill="black" stroke="black" points="3846.07,-1006.91 3844.7,-996.41 3839.22,-1005.48 3846.07,-1006.91"/> +</g> +<!-- Vlookup->Species --> +<g id="edge62" class="edge"> +<title>Vlookup:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2256,-2946C2438.07,-2946 2446.26,-2809.59 2622,-2762 2683.25,-2745.41 3148.01,-2771.74 3192,-2726 3250.79,-2664.88 3170.49,-1261.39 3229,-1200 3305.4,-1119.84 3650.58,-1245.08 3726,-1164 3774.74,-1111.61 3770.38,-576.13 3726,-520 3619.99,-385.91 3082.28,-324.84 2876.38,-306.1"/> +<polygon fill="black" stroke="black" points="2876.51,-302.6 2866.23,-305.19 2875.88,-309.57 2876.51,-302.6"/> +</g> +<!-- Vlookup->AvgMethod --> +<g id="edge57" class="edge"> +<title>Vlookup:AvgMethodId->AvgMethod</title> +<path fill="none" stroke="black" d="M2072,-3724C882.38,-3724 1769.05,-2234.12 1181,-1200 1170.7,-1181.9 1160.77,-1182.39 1151,-1164 1107.82,-1082.73 1082.45,-978.95 1069.39,-911.74"/> +<polygon fill="black" stroke="black" points="1072.79,-910.86 1067.48,-901.69 1065.91,-912.17 1072.79,-910.86"/> +</g> +<!-- Vlookup->GeneChip --> +<g id="edge59" class="edge"> +<title>Vlookup:GeneChipId->GeneChip</title> +<path fill="none" stroke="black" d="M2072,-3388C1777.21,-3388 2040.12,-3020.64 2031,-2726 2010.03,-2048.1 2014.87,-1878.03 2031,-1200 2032.96,-1117.52 2037.47,-1024.42 2041.35,-953.97"/> +<polygon fill="black" stroke="black" points="2044.86,-953.92 2041.92,-943.75 2037.87,-953.54 
2044.86,-953.92"/> +</g> +<!-- Vlookup->InfoFiles --> +<g id="edge61" class="edge"> +<title>Vlookup:InfoFileId->InfoFiles</title> +<path fill="none" stroke="black" d="M2256,-3261C2335.46,-3261 2299.68,-2868.62 2251.39,-2515.5"/> +<polygon fill="black" stroke="black" points="2254.86,-2515.02 2250.03,-2505.59 2247.92,-2515.97 2254.86,-2515.02"/> +</g> +<!-- Vlookup->Tissue --> +<g id="edge63" class="edge"> +<title>Vlookup:TissueId->Tissue</title> +<path fill="none" stroke="black" d="M2256,-2883C2406.09,-2883 2477.36,-2854.46 2555,-2726 2598.85,-2653.44 2589.96,-1277.23 2555,-1200 2545,-1177.91 2526.59,-1184.73 2514,-1164 2473.2,-1096.81 2453.44,-1009.63 2443.89,-943.08"/> +<polygon fill="black" stroke="black" points="2447.33,-942.43 2442.5,-933 2440.4,-943.39 2447.33,-942.43"/> +</g> +<!-- user_collection --> +<g id="node68" class="node"> +<title>user_collection</title> +<polygon fill="white" stroke="transparent" points="11361,-4811 11361,-4964 11543,-4964 11543,-4811 11361,-4811"/> +<polygon fill="#d7b5d8" stroke="transparent" points="11364,-4939.5 11364,-4960.5 11540,-4960.5 11540,-4939.5 11364,-4939.5"/> +<polygon fill="none" stroke="black" points="11364,-4939.5 11364,-4960.5 11540,-4960.5 11540,-4939.5 11364,-4939.5"/> +<text text-anchor="start" x="11367" y="-4946.3" font-family="Times,serif" font-size="14.00">user_collection (60 KiB)</text> +<text text-anchor="start" x="11380" y="-4924.3" font-family="Times,serif" font-size="14.00">changed_timestamp</text> +<text text-anchor="start" x="11383" y="-4903.3" font-family="Times,serif" font-size="14.00">created_timestamp</text> +<text text-anchor="start" x="11445" y="-4882.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="11418.5" y="-4861.3" font-family="Times,serif" font-size="14.00">members</text> +<text text-anchor="start" x="11432" y="-4840.3" font-family="Times,serif" font-size="14.00">name</text> +<text text-anchor="start" x="11436" y="-4819.3" font-family="Times,serif" 
font-size="14.00">user</text> +<polygon fill="none" stroke="black" points="11361,-4811 11361,-4964 11543,-4964 11543,-4811 11361,-4811"/> +</g> +<!-- pubmedsearch --> +<g id="node69" class="node"> +<title>pubmedsearch</title> +<polygon fill="white" stroke="transparent" points="11577.5,-4800.5 11577.5,-4974.5 11770.5,-4974.5 11770.5,-4800.5 11577.5,-4800.5"/> +<polygon fill="#df65b0" stroke="transparent" points="11581,-4950.5 11581,-4971.5 11768,-4971.5 11768,-4950.5 11581,-4950.5"/> +<polygon fill="none" stroke="black" points="11581,-4950.5 11581,-4971.5 11768,-4971.5 11768,-4950.5 11581,-4950.5"/> +<text text-anchor="start" x="11584" y="-4957.3" font-family="Times,serif" font-size="14.00">pubmedsearch (586 MiB)</text> +<text text-anchor="start" x="11619.5" y="-4935.3" font-family="Times,serif" font-size="14.00">authorfullname</text> +<text text-anchor="start" x="11612.5" y="-4914.3" font-family="Times,serif" font-size="14.00">authorshortname</text> +<text text-anchor="start" x="11650" y="-4893.3" font-family="Times,serif" font-size="14.00">geneid</text> +<text text-anchor="start" x="11667.5" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="11644" y="-4851.3" font-family="Times,serif" font-size="14.00">institute</text> +<text text-anchor="start" x="11638.5" y="-4830.3" font-family="Times,serif" font-size="14.00">pubmedid</text> +<text text-anchor="start" x="11659.5" y="-4809.3" font-family="Times,serif" font-size="14.00">title</text> +<polygon fill="none" stroke="black" points="11577.5,-4800.5 11577.5,-4974.5 11770.5,-4974.5 11770.5,-4800.5 11577.5,-4800.5"/> +</g> +<!-- EnsemblProbeLocation --> +<g id="node70" class="node"> +<title>EnsemblProbeLocation</title> +<polygon fill="white" stroke="transparent" points="6793,-4790 6793,-4985 7037,-4985 7037,-4790 6793,-4790"/> +<polygon fill="#df65b0" stroke="transparent" points="6796,-4960.5 6796,-4981.5 7034,-4981.5 7034,-4960.5 6796,-4960.5"/> +<polygon fill="none" 
stroke="black" points="6796,-4960.5 6796,-4981.5 7034,-4981.5 7034,-4960.5 6796,-4960.5"/> +<text text-anchor="start" x="6799" y="-4967.3" font-family="Times,serif" font-size="14.00">EnsemblProbeLocation (99 MiB)</text> +<text text-anchor="start" x="6901.5" y="-4945.3" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="6900.5" y="-4924.3" font-family="Times,serif" font-size="14.00">End</text> +<text text-anchor="start" x="6879" y="-4903.3" font-family="Times,serif" font-size="14.00">End_2016</text> +<text text-anchor="start" x="6867" y="-4882.3" font-family="Times,serif" font-size="14.00">MisMataches</text> +<text text-anchor="start" x="6886.5" y="-4861.3" font-family="Times,serif" font-size="14.00">ProbeId</text> +<text text-anchor="start" x="6896.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Start</text> +<text text-anchor="start" x="6875" y="-4819.3" font-family="Times,serif" font-size="14.00">Start_2016</text> +<text text-anchor="start" x="6890.5" y="-4798.3" font-family="Times,serif" font-size="14.00">Strand</text> +<polygon fill="none" stroke="black" points="6793,-4790 6793,-4985 7037,-4985 7037,-4790 6793,-4790"/> +</g> +<!-- EnsemblProbeLocation->Probe --> +<g id="edge64" class="edge"> +<title>EnsemblProbeLocation:ProbeId->Probe</title> +<path fill="none" stroke="black" d="M7035,-4864.5C7071.26,-4864.5 6964.83,-3784.86 6927.45,-3416.46"/> +<polygon fill="black" stroke="black" points="6930.91,-3415.9 6926.42,-3406.3 6923.95,-3416.61 6930.91,-3415.9"/> +</g> +<!-- Investigators->Organizations --> +<g id="edge65" class="edge"> +<title>Investigators:OrganizationId->Organizations</title> +<path fill="none" stroke="black" d="M256,-296.5C296.78,-296.5 271.73,-150.19 255,-113 250.33,-102.62 243.39,-93.09 235.5,-84.57"/> +<polygon fill="black" stroke="black" points="237.88,-82 228.35,-77.36 232.9,-86.93 237.88,-82"/> +</g> +<!-- ProbeSetSE --> +<g id="node72" class="node"> +<title>ProbeSetSE</title> +<polygon 
fill="white" stroke="transparent" points="6068,-1918 6068,-2008 6222,-2008 6222,-1918 6068,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="6071,-1984 6071,-2005 6219,-2005 6219,-1984 6071,-1984"/> +<polygon fill="none" stroke="black" points="6071,-1984 6071,-2005 6219,-2005 6219,-1984 6071,-1984"/> +<text text-anchor="start" x="6074" y="-1990.8" font-family="Times,serif" font-size="14.00">ProbeSetSE (7 GiB)</text> +<text text-anchor="start" x="6120.5" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6126.5" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="6115.5" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6068,-1918 6068,-2008 6222,-2008 6222,-1918 6068,-1918"/> +</g> +<!-- ProbeSetSE->Strain --> +<g id="edge66" class="edge"> +<title>ProbeSetSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6070,-1930C6049.72,-1930 6057.62,-1219.18 6051,-1200 6011.97,-1086.88 5923.03,-979.85 5858.94,-913.01"/> +<polygon fill="black" stroke="black" points="5861.11,-910.22 5851.65,-905.47 5856.08,-915.09 5861.11,-910.22"/> +</g> +<!-- TableComments --> +<g id="node74" class="node"> +<title>TableComments</title> +<polygon fill="white" stroke="transparent" points="11805,-4853 11805,-4922 11995,-4922 11995,-4853 11805,-4853"/> +<polygon fill="#d7b5d8" stroke="transparent" points="11808,-4897.5 11808,-4918.5 11992,-4918.5 11992,-4897.5 11808,-4897.5"/> +<polygon fill="none" stroke="black" points="11808,-4897.5 11808,-4918.5 11992,-4918.5 11992,-4897.5 11808,-4897.5"/> +<text text-anchor="start" x="11811" y="-4904.3" font-family="Times,serif" font-size="14.00">TableComments (34 KiB)</text> +<text text-anchor="start" x="11865" y="-4882.3" font-family="Times,serif" font-size="14.00">Comment</text> +<text text-anchor="start" x="11859.5" y="-4861.3" font-family="Times,serif" 
font-size="14.00">TableName</text> +<polygon fill="none" stroke="black" points="11805,-4853 11805,-4922 11995,-4922 11995,-4853 11805,-4853"/> +</g> +<!-- Dataset_mbat --> +<g id="node75" class="node"> +<title>Dataset_mbat</title> +<polygon fill="white" stroke="transparent" points="12029.5,-4800.5 12029.5,-4974.5 12198.5,-4974.5 12198.5,-4800.5 12029.5,-4800.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="12033,-4950.5 12033,-4971.5 12196,-4971.5 12196,-4950.5 12033,-4950.5"/> +<polygon fill="none" stroke="black" points="12033,-4950.5 12033,-4971.5 12196,-4971.5 12196,-4950.5 12033,-4950.5"/> +<text text-anchor="start" x="12036" y="-4957.3" font-family="Times,serif" font-size="14.00">Dataset_mbat (764 B)</text> +<text text-anchor="start" x="12095.5" y="-4935.3" font-family="Times,serif" font-size="14.00">cross</text> +<text text-anchor="start" x="12082" y="-4914.3" font-family="Times,serif" font-size="14.00">database</text> +<text text-anchor="start" x="12040" y="-4893.3" font-family="Times,serif" font-size="14.00">database_LongName</text> +<text text-anchor="start" x="12107.5" y="-4872.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="12088" y="-4851.3" font-family="Times,serif" font-size="14.00">species</text> +<text text-anchor="start" x="12091" y="-4830.3" font-family="Times,serif" font-size="14.00">switch</text> +<text text-anchor="start" x="12093" y="-4809.3" font-family="Times,serif" font-size="14.00">tissue</text> +<polygon fill="none" stroke="black" points="12029.5,-4800.5 12029.5,-4974.5 12198.5,-4974.5 12198.5,-4800.5 12029.5,-4800.5"/> +</g> +<!-- CaseAttributeXRefNew --> +<g id="node76" class="node"> +<title>CaseAttributeXRefNew</title> +<polygon fill="white" stroke="transparent" points="3817,-1907.5 3817,-2018.5 4053,-2018.5 4053,-1907.5 3817,-1907.5"/> +<polygon fill="#df65b0" stroke="transparent" points="3820,-1994 3820,-2015 4050,-2015 4050,-1994 3820,-1994"/> +<polygon fill="none" stroke="black" 
points="3820,-1994 3820,-2015 4050,-2015 4050,-1994 3820,-1994"/> +<text text-anchor="start" x="3823" y="-2000.8" font-family="Times,serif" font-size="14.00">CaseAttributeXRefNew (5 MiB)</text> +<text text-anchor="start" x="3877.5" y="-1978.8" font-family="Times,serif" font-size="14.00">CaseAttributeId</text> +<text text-anchor="start" x="3892" y="-1957.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="3905.5" y="-1936.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="3915" y="-1915.8" font-family="Times,serif" font-size="14.00">Value</text> +<polygon fill="none" stroke="black" points="3817,-1907.5 3817,-2018.5 4053,-2018.5 4053,-1907.5 3817,-1907.5"/> +</g> +<!-- CaseAttributeXRefNew->InbredSet --> +<g id="edge68" class="edge"> +<title>CaseAttributeXRefNew:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M3819,-1961C3795.41,-1961 3828.4,-1316.38 3845.65,-1006.1"/> +<polygon fill="black" stroke="black" points="3849.14,-1006.29 3846.2,-996.11 3842.15,-1005.9 3849.14,-1006.29"/> +</g> +<!-- CaseAttributeXRefNew->CaseAttribute --> +<g id="edge67" class="edge"> +<title>CaseAttributeXRefNew:CaseAttributeId->CaseAttribute</title> +<path fill="none" stroke="black" d="M3819,-1983C3775.49,-1983 3829.94,-1230.6 3799,-1200 3702.3,-1104.35 1459.95,-1245.42 1351,-1164 1269.39,-1103.01 1252.58,-975.97 1250.14,-901.3"/> +<polygon fill="black" stroke="black" points="1253.64,-901.03 1249.89,-891.12 1246.64,-901.2 1253.64,-901.03"/> +</g> +<!-- CaseAttributeXRefNew->Strain --> +<g id="edge69" class="edge"> +<title>CaseAttributeXRefNew:StrainId->Strain</title> +<path fill="none" stroke="black" d="M4051,-1940C4092.12,-1940 4042.15,-1230.26 4070,-1200 4119.95,-1145.72 4327.27,-1176.34 4400,-1164 4905.53,-1078.2 5502.61,-920.46 5710.32,-863.88"/> +<polygon fill="black" stroke="black" points="5711.48,-867.19 5720.21,-861.18 5709.64,-860.44 5711.48,-867.19"/> +</g> +<!-- GenoCode 
--> +<g id="node77" class="node"> +<title>GenoCode</title> +<polygon fill="white" stroke="transparent" points="3486.5,-1907.5 3486.5,-2018.5 3619.5,-2018.5 3619.5,-1907.5 3486.5,-1907.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="3490,-1994 3490,-2015 3617,-2015 3617,-1994 3490,-1994"/> +<polygon fill="none" stroke="black" points="3490,-1994 3490,-2015 3617,-2015 3617,-1994 3490,-1994"/> +<text text-anchor="start" x="3493" y="-2000.8" font-family="Times,serif" font-size="14.00">GenoCode (40 B)</text> +<text text-anchor="start" x="3506.5" y="-1978.8" font-family="Times,serif" font-size="14.00">AlleleSymbol</text> +<text text-anchor="start" x="3516" y="-1957.8" font-family="Times,serif" font-size="14.00">AlleleType</text> +<text text-anchor="start" x="3500.5" y="-1936.8" font-family="Times,serif" font-size="14.00">DatabaseValue</text> +<text text-anchor="start" x="3510.5" y="-1915.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<polygon fill="none" stroke="black" points="3486.5,-1907.5 3486.5,-2018.5 3619.5,-2018.5 3619.5,-1907.5 3486.5,-1907.5"/> +</g> +<!-- GenoCode->InbredSet --> +<g id="edge70" class="edge"> +<title>GenoCode:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M3618,-1919C3657.96,-1919 3611.64,-1231.67 3636,-1200 3670.72,-1154.85 3718.61,-1204.16 3759,-1164 3801.12,-1122.13 3824.91,-1062.6 3838.29,-1006.16"/> +<polygon fill="black" stroke="black" points="3841.71,-1006.93 3840.51,-996.4 3834.88,-1005.38 3841.71,-1006.93"/> +</g> +<!-- ProbeSE->Strain --> +<g id="edge71" class="edge"> +<title>ProbeSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6994,-1930C6953.43,-1930 6998.65,-1232.22 6974,-1200 6834.26,-1017.37 6100.93,-891 5861.61,-854.12"/> +<polygon fill="black" stroke="black" points="5862.02,-850.65 5851.61,-852.59 5860.96,-857.57 5862.02,-850.65"/> +</g> +<!-- Temp --> +<g id="node80" class="node"> +<title>Temp</title> +<polygon fill="white" stroke="transparent" 
points="4087.5,-1865.5 4087.5,-2060.5 4206.5,-2060.5 4206.5,-1865.5 4087.5,-1865.5"/> +<polygon fill="#df65b0" stroke="transparent" points="4091,-2036 4091,-2057 4204,-2057 4204,-2036 4091,-2036"/> +<polygon fill="none" stroke="black" points="4091,-2036 4091,-2057 4204,-2057 4204,-2036 4091,-2036"/> +<text text-anchor="start" x="4099" y="-2042.8" font-family="Times,serif" font-size="14.00">Temp (1 MiB)</text> +<text text-anchor="start" x="4108.5" y="-2020.8" font-family="Times,serif" font-size="14.00">createtime</text> +<text text-anchor="start" x="4123" y="-1999.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="4093" y="-1978.8" font-family="Times,serif" font-size="14.00">dbdisplayname</text> +<text text-anchor="start" x="4107" y="-1957.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="4140" y="-1936.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="4104.5" y="-1915.8" font-family="Times,serif" font-size="14.00">InbredSetId</text> +<text text-anchor="start" x="4139.5" y="-1894.8" font-family="Times,serif" font-size="14.00">IP</text> +<text text-anchor="start" x="4126" y="-1873.8" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="4087.5,-1865.5 4087.5,-2060.5 4206.5,-2060.5 4206.5,-1865.5 4087.5,-1865.5"/> +</g> +<!-- Temp->InbredSet --> +<g id="edge72" class="edge"> +<title>Temp:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4090,-1919C4070.02,-1919 4075.62,-1219.17 4070,-1200 4043.91,-1110.94 3990,-1021.51 3942.68,-954.43"/> +<polygon fill="black" stroke="black" points="3945.3,-952.07 3936.65,-945.95 3939.59,-956.12 3945.3,-952.07"/> +</g> +<!-- GenoData --> +<g id="node81" class="node"> +<title>GenoData</title> +<polygon fill="white" stroke="transparent" points="6256.5,-1918 6256.5,-2008 6403.5,-2008 6403.5,-1918 6256.5,-1918"/> +<polygon fill="#ce1256" 
stroke="transparent" points="6260,-1984 6260,-2005 6401,-2005 6401,-1984 6260,-1984"/> +<polygon fill="none" stroke="black" points="6260,-1984 6260,-2005 6401,-2005 6401,-1984 6260,-1984"/> +<text text-anchor="start" x="6263" y="-1990.8" font-family="Times,serif" font-size="14.00">GenoData (10 GiB)</text> +<text text-anchor="start" x="6323" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6301" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="6311" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="6256.5,-1918 6256.5,-2008 6403.5,-2008 6403.5,-1918 6256.5,-1918"/> +</g> +<!-- GenoData->Strain --> +<g id="edge73" class="edge"> +<title>GenoData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6259,-1951C6217.26,-1951 6257.72,-1237.31 6239,-1200 6158.18,-1038.89 5967.05,-927.85 5860.69,-876.11"/> +<polygon fill="black" stroke="black" points="5862.14,-872.92 5851.61,-871.74 5859.11,-879.23 5862.14,-872.92"/> +</g> +<!-- GenoFreeze->InbredSet --> +<g id="edge74" class="edge"> +<title>GenoFreeze:InbredSetId->InbredSet</title> +<path fill="none" stroke="black" d="M4409,-1930C4368.43,-1930 4415.79,-1231.31 4390,-1200 4343.1,-1143.07 4293.94,-1197.05 4228,-1164 4118.16,-1108.94 4014.02,-1014.44 3943.83,-942.19"/> +<polygon fill="black" stroke="black" points="3946.19,-939.59 3936.73,-934.83 3941.15,-944.45 3946.19,-939.59"/> +</g> +<!-- ProbeSetData --> +<g id="node83" class="node"> +<title>ProbeSetData</title> +<polygon fill="white" stroke="transparent" points="6438,-1918 6438,-2008 6614,-2008 6614,-1918 6438,-1918"/> +<polygon fill="#ce1256" stroke="transparent" points="6441,-1984 6441,-2005 6611,-2005 6611,-1984 6441,-1984"/> +<polygon fill="none" stroke="black" points="6441,-1984 6441,-2005 6611,-2005 6611,-1984 6441,-1984"/> +<text text-anchor="start" x="6444" y="-1990.8" 
font-family="Times,serif" font-size="14.00">ProbeSetData (62 GiB)</text> +<text text-anchor="start" x="6518.5" y="-1968.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6496.5" y="-1947.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="6506.5" y="-1926.8" font-family="Times,serif" font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="6438,-1918 6438,-2008 6614,-2008 6614,-1918 6438,-1918"/> +</g> +<!-- ProbeSetData->Strain --> +<g id="edge75" class="edge"> +<title>ProbeSetData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6440,-1951C6398.26,-1951 6441.54,-1235.75 6420,-1200 6294.74,-992.11 6000.36,-895.18 5861.29,-859.75"/> +<polygon fill="black" stroke="black" points="5862.1,-856.35 5851.55,-857.31 5860.4,-863.14 5862.1,-856.35"/> +</g> +<!-- CeleraINFO_mm6 --> +<g id="node84" class="node"> +<title>CeleraINFO_mm6</title> +<polygon fill="white" stroke="transparent" points="12232,-4706 12232,-5069 12448,-5069 12448,-4706 12232,-4706"/> +<polygon fill="#df65b0" stroke="transparent" points="12235,-5044.5 12235,-5065.5 12445,-5065.5 12445,-5044.5 12235,-5044.5"/> +<polygon fill="none" stroke="black" points="12235,-5044.5 12235,-5065.5 12445,-5065.5 12445,-5044.5 12235,-5044.5"/> +<text text-anchor="start" x="12238" y="-5051.3" font-family="Times,serif" font-size="14.00">CeleraINFO_mm6 (780 MiB)</text> +<text text-anchor="start" x="12309.5" y="-5029.3" font-family="Times,serif" font-size="14.00">allele_AJ</text> +<text text-anchor="start" x="12307.5" y="-5008.3" font-family="Times,serif" font-size="14.00">allele_B6</text> +<text text-anchor="start" x="12307" y="-4987.3" font-family="Times,serif" font-size="14.00">allele_D2</text> +<text text-anchor="start" x="12308" y="-4966.3" font-family="Times,serif" font-size="14.00">allele_S1</text> +<text text-anchor="start" x="12308" y="-4945.3" font-family="Times,serif" font-size="14.00">allele_X1</text> +<text 
text-anchor="start" x="12319" y="-4924.3" font-family="Times,serif" font-size="14.00">B6_AJ</text> +<text text-anchor="start" x="12316.5" y="-4903.3" font-family="Times,serif" font-size="14.00">B6_D2</text> +<text text-anchor="start" x="12294.5" y="-4882.3" font-family="Times,serif" font-size="14.00">chromosome</text> +<text text-anchor="start" x="12318.5" y="-4861.3" font-family="Times,serif" font-size="14.00">D2_AJ</text> +<text text-anchor="start" x="12306.5" y="-4840.3" font-family="Times,serif" font-size="14.00">flanking3</text> +<text text-anchor="start" x="12306.5" y="-4819.3" font-family="Times,serif" font-size="14.00">flanking5</text> +<text text-anchor="start" x="12332.5" y="-4798.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="12302" y="-4777.3" font-family="Times,serif" font-size="14.00">MB_celera</text> +<text text-anchor="start" x="12302.5" y="-4756.3" font-family="Times,serif" font-size="14.00">MB_UCSC</text> +<text text-anchor="start" x="12283.5" y="-4735.3" font-family="Times,serif" font-size="14.00">MB_UCSC_OLD</text> +<text text-anchor="start" x="12315.5" y="-4714.3" font-family="Times,serif" font-size="14.00">SNPID</text> +<polygon fill="none" stroke="black" points="12232,-4706 12232,-5069 12448,-5069 12448,-4706 12232,-4706"/> +</g> +<!-- TableFieldAnnotation --> +<g id="node85" class="node"> +<title>TableFieldAnnotation</title> +<polygon fill="white" stroke="transparent" points="12482,-4842.5 12482,-4932.5 12710,-4932.5 12710,-4842.5 12482,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="12485,-4908.5 12485,-4929.5 12707,-4929.5 12707,-4908.5 12485,-4908.5"/> +<polygon fill="none" stroke="black" points="12485,-4908.5 12485,-4929.5 12707,-4929.5 12707,-4908.5 12485,-4908.5"/> +<text text-anchor="start" x="12488" y="-4915.3" font-family="Times,serif" font-size="14.00">TableFieldAnnotation (43 KiB)</text> +<text text-anchor="start" x="12556.5" y="-4893.3" font-family="Times,serif" 
font-size="14.00">Annotation</text> +<text text-anchor="start" x="12552" y="-4872.3" font-family="Times,serif" font-size="14.00">Foreign_Key</text> +<text text-anchor="start" x="12558.5" y="-4851.3" font-family="Times,serif" font-size="14.00">TableField</text> +<polygon fill="none" stroke="black" points="12482,-4842.5 12482,-4932.5 12710,-4932.5 12710,-4842.5 12482,-4842.5"/> +</g> +<!-- ProbeSet --> +<g id="node86" class="node"> +<title>ProbeSet</title> +<polygon fill="white" stroke="transparent" points="752.5,-1204 752.5,-2722 983.5,-2722 983.5,-1204 752.5,-1204"/> +<polygon fill="#ce1256" stroke="transparent" points="756,-2698 756,-2719 981,-2719 981,-2698 756,-2698"/> +<polygon fill="none" stroke="black" points="756,-2698 756,-2719 981,-2719 981,-2698 756,-2698"/> +<text text-anchor="start" x="808" y="-2704.8" font-family="Times,serif" font-size="14.00">ProbeSet (2 GiB)</text> +<text text-anchor="start" x="851.5" y="-2682.8" font-family="Times,serif" font-size="14.00">alias</text> +<text text-anchor="start" x="842.5" y="-2661.8" font-family="Times,serif" font-size="14.00">alias_H</text> +<text text-anchor="start" x="821.5" y="-2640.8" font-family="Times,serif" font-size="14.00">Biotype_ENS</text> +<text text-anchor="start" x="840" y="-2619.8" font-family="Times,serif" font-size="14.00">BlatSeq</text> +<text text-anchor="start" x="822" y="-2598.8" font-family="Times,serif" font-size="14.00">CAS_number</text> +<text text-anchor="start" x="833.5" y="-2577.8" font-family="Times,serif" font-size="14.00">ChEBI_ID</text> +<text text-anchor="start" x="824.5" y="-2556.8" font-family="Times,serif" font-size="14.00">ChEMBL_ID</text> +<text text-anchor="start" x="812.5" y="-2535.8" font-family="Times,serif" font-size="14.00">ChemSpider_ID</text> +<text text-anchor="start" x="844.5" y="-2514.8" font-family="Times,serif" font-size="14.00">ChipId</text> +<text text-anchor="start" x="855" y="-2493.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text 
text-anchor="start" x="833.5" y="-2472.8" font-family="Times,serif" font-size="14.00">Chr_2016</text> +<text text-anchor="start" x="833.5" y="-2451.8" font-family="Times,serif" font-size="14.00">Chr_mm8</text> +<text text-anchor="start" x="837.5" y="-2430.8" font-family="Times,serif" font-size="14.00">chr_num</text> +<text text-anchor="start" x="813.5" y="-2409.8" font-family="Times,serif" font-size="14.00">chromosome_H</text> +<text text-anchor="start" x="831.5" y="-2388.8" font-family="Times,serif" font-size="14.00">comments</text> +<text text-anchor="start" x="829" y="-2367.8" font-family="Times,serif" font-size="14.00">Confidence</text> +<text text-anchor="start" x="828" y="-2346.8" font-family="Times,serif" font-size="14.00">description</text> +<text text-anchor="start" x="818.5" y="-2325.8" font-family="Times,serif" font-size="14.00">description_H</text> +<text text-anchor="start" x="826.5" y="-2304.8" font-family="Times,serif" font-size="14.00">EC_number</text> +<text text-anchor="start" x="804.5" y="-2283.8" font-family="Times,serif" font-size="14.00">ENSEMBLGeneId</text> +<text text-anchor="start" x="855" y="-2262.8" font-family="Times,serif" font-size="14.00">flag</text> +<text text-anchor="start" x="830" y="-2241.8" font-family="Times,serif" font-size="14.00">Flybase_Id</text> +<text text-anchor="start" x="829.5" y="-2220.8" font-family="Times,serif" font-size="14.00">GenbankId</text> +<text text-anchor="start" x="842.5" y="-2199.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="833.5" y="-2178.8" font-family="Times,serif" font-size="14.00">GeneId_H</text> +<text text-anchor="start" x="833.5" y="-2157.8" font-family="Times,serif" font-size="14.00">HGNC_ID</text> +<text text-anchor="start" x="832.5" y="-2136.8" font-family="Times,serif" font-size="14.00">HMDB_ID</text> +<text text-anchor="start" x="814" y="-2115.8" font-family="Times,serif" font-size="14.00">HomoloGeneID</text> +<text text-anchor="start" x="861" 
y="-2094.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="835" y="-2073.8" font-family="Times,serif" font-size="14.00">KEGG_ID</text> +<text text-anchor="start" x="856.5" y="-2052.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="835" y="-2031.8" font-family="Times,serif" font-size="14.00">Mb_2016</text> +<text text-anchor="start" x="846.5" y="-2010.8" font-family="Times,serif" font-size="14.00">MB_H</text> +<text text-anchor="start" x="835" y="-1989.8" font-family="Times,serif" font-size="14.00">Mb_mm8</text> +<text text-anchor="start" x="804" y="-1968.8" font-family="Times,serif" font-size="14.00">Molecular_Weight</text> +<text text-anchor="start" x="847" y="-1947.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="829.5" y="-1926.8" font-family="Times,serif" font-size="14.00">name_num</text> +<text text-anchor="start" x="823" y="-1905.8" font-family="Times,serif" font-size="14.00">Nugowiki_ID</text> +<text text-anchor="start" x="845.5" y="-1884.8" font-family="Times,serif" font-size="14.00">OMIM</text> +<text text-anchor="start" x="806.5" y="-1863.8" font-family="Times,serif" font-size="14.00">PeptideSequence</text> +<text text-anchor="start" x="818.5" y="-1842.8" font-family="Times,serif" font-size="14.00">PrimaryName</text> +<text text-anchor="start" x="783.5" y="-1821.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end</text> +<text text-anchor="start" x="762" y="-1800.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end_2016</text> +<text text-anchor="start" x="762" y="-1779.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end_mm8</text> +<text text-anchor="start" x="779.5" y="-1758.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start</text> +<text text-anchor="start" x="758" y="-1737.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start_2016</text> +<text 
text-anchor="start" x="758" y="-1716.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start_mm8</text> +<text text-anchor="start" x="788.5" y="-1695.8" font-family="Times,serif" font-size="14.00">Probe_set_BLAT_score</text> +<text text-anchor="start" x="784.5" y="-1674.8" font-family="Times,serif" font-size="14.00">Probe_set_Note_by_RW</text> +<text text-anchor="start" x="793.5" y="-1653.8" font-family="Times,serif" font-size="14.00">Probe_set_specificity</text> +<text text-anchor="start" x="806.5" y="-1632.8" font-family="Times,serif" font-size="14.00">Probe_set_strand</text> +<text text-anchor="start" x="781" y="-1611.8" font-family="Times,serif" font-size="14.00">Probe_set_target_region</text> +<text text-anchor="start" x="776" y="-1590.8" font-family="Times,serif" font-size="14.00">Probe_Target_Description</text> +<text text-anchor="start" x="833.5" y="-1569.8" font-family="Times,serif" font-size="14.00">ProteinID</text> +<text text-anchor="start" x="821" y="-1548.8" font-family="Times,serif" font-size="14.00">ProteinName</text> +<text text-anchor="start" x="822" y="-1527.8" font-family="Times,serif" font-size="14.00">PubChem_ID</text> +<text text-anchor="start" x="795" y="-1506.8" font-family="Times,serif" font-size="14.00">RefSeq_TranscriptId</text> +<text text-anchor="start" x="840" y="-1485.8" font-family="Times,serif" font-size="14.00">RGD_ID</text> +<text text-anchor="start" x="806" y="-1464.8" font-family="Times,serif" font-size="14.00">SecondaryNames</text> +<text text-anchor="start" x="852.5" y="-1443.8" font-family="Times,serif" font-size="14.00">SNP</text> +<text text-anchor="start" x="822" y="-1422.8" font-family="Times,serif" font-size="14.00">Strand_Gene</text> +<text text-anchor="start" x="819.5" y="-1401.8" font-family="Times,serif" font-size="14.00">Strand_Probe</text> +<text text-anchor="start" x="841.5" y="-1380.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="832" y="-1359.8" 
font-family="Times,serif" font-size="14.00">Symbol_H</text> +<text text-anchor="start" x="838" y="-1338.8" font-family="Times,serif" font-size="14.00">TargetId</text> +<text text-anchor="start" x="831.5" y="-1317.8" font-family="Times,serif" font-size="14.00">TargetSeq</text> +<text text-anchor="start" x="845.5" y="-1296.8" font-family="Times,serif" font-size="14.00">Tissue</text> +<text text-anchor="start" x="851" y="-1275.8" font-family="Times,serif" font-size="14.00">Type</text> +<text text-anchor="start" x="830" y="-1254.8" font-family="Times,serif" font-size="14.00">UniGeneId</text> +<text text-anchor="start" x="839.5" y="-1233.8" font-family="Times,serif" font-size="14.00">UNII_ID</text> +<text text-anchor="start" x="832" y="-1212.8" font-family="Times,serif" font-size="14.00">UniProtID</text> +<polygon fill="none" stroke="black" points="752.5,-1204 752.5,-2722 983.5,-2722 983.5,-1204 752.5,-1204"/> +</g> +<!-- ProbeSet->Genbank --> +<g id="edge76" class="edge"> +<title>ProbeSet:GenbankId->Genbank</title> +<path fill="none" stroke="black" d="M755,-2225C726.53,-2225 752.7,-1228.28 756,-1200 768.49,-1092.85 801.24,-971.17 821.96,-901.12"/> +<polygon fill="black" stroke="black" points="825.42,-901.75 824.93,-891.16 818.72,-899.75 825.42,-901.75"/> +</g> +<!-- GenoFile --> +<g id="node87" class="node"> +<title>GenoFile</title> +<polygon fill="white" stroke="transparent" points="4240.5,-1886.5 4240.5,-2039.5 4373.5,-2039.5 4373.5,-1886.5 4240.5,-1886.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="4244,-2015 4244,-2036 4371,-2036 4371,-2015 4244,-2015"/> +<polygon fill="none" stroke="black" points="4244,-2015 4244,-2036 4371,-2036 4371,-2015 4244,-2015"/> +<text text-anchor="start" x="4247" y="-2021.8" font-family="Times,serif" font-size="14.00">GenoFile (332 B)</text> +<text text-anchor="start" x="4300.5" y="-1999.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="4263.5" y="-1978.8" font-family="Times,serif" 
font-size="14.00">InbredSetID</text> +<text text-anchor="start" x="4279" y="-1957.8" font-family="Times,serif" font-size="14.00">location</text> +<text text-anchor="start" x="4284.5" y="-1936.8" font-family="Times,serif" font-size="14.00">server</text> +<text text-anchor="start" x="4293" y="-1915.8" font-family="Times,serif" font-size="14.00">sort</text> +<text text-anchor="start" x="4292.5" y="-1894.8" font-family="Times,serif" font-size="14.00">title</text> +<polygon fill="none" stroke="black" points="4240.5,-1886.5 4240.5,-2039.5 4373.5,-2039.5 4373.5,-1886.5 4240.5,-1886.5"/> +</g> +<!-- GenoFile->InbredSet --> +<g id="edge77" class="edge"> +<title>GenoFile:InbredSetID->InbredSet</title> +<path fill="none" stroke="black" d="M4243,-1983C4221.24,-1983 4231.73,-1219.93 4223,-1200 4165.37,-1068.5 4034.27,-960.98 3945.16,-899.43"/> +<polygon fill="black" stroke="black" points="3946.9,-896.38 3936.67,-893.62 3942.95,-902.16 3946.9,-896.38"/> +</g> +<!-- TempData --> +<g id="node88" class="node"> +<title>TempData</title> +<polygon fill="white" stroke="transparent" points="6636,-3228 6636,-3360 6788,-3360 6788,-3228 6636,-3228"/> +<polygon fill="#df65b0" stroke="transparent" points="6639,-3336 6639,-3357 6785,-3357 6785,-3336 6639,-3336"/> +<polygon fill="none" stroke="black" points="6639,-3336 6639,-3357 6785,-3357 6785,-3336 6639,-3336"/> +<text text-anchor="start" x="6642" y="-3342.8" font-family="Times,serif" font-size="14.00">TempData (11 MiB)</text> +<text text-anchor="start" x="6704.5" y="-3320.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="6683.5" y="-3299.8" font-family="Times,serif" font-size="14.00">NStrain</text> +<text text-anchor="start" x="6701.5" y="-3278.8" font-family="Times,serif" font-size="14.00">SE</text> +<text text-anchor="start" x="6682.5" y="-3257.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="6692.5" y="-3236.8" font-family="Times,serif" 
font-size="14.00">value</text> +<polygon fill="none" stroke="black" points="6636,-3228 6636,-3360 6788,-3360 6788,-3228 6636,-3228"/> +</g> +<!-- TempData->NStrain --> +<g id="edge78" class="edge"> +<title>TempData:NStrain->NStrain</title> +<path fill="none" stroke="black" d="M6786,-3304C6851.17,-3304 6745.87,-2280.14 6718.32,-2022.36"/> +<polygon fill="black" stroke="black" points="6721.77,-2021.66 6717.22,-2012.09 6714.81,-2022.4 6721.77,-2021.66"/> +</g> +<!-- TempData->Strain --> +<g id="edge79" class="edge"> +<title>TempData:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6786,-3261C6799.61,-3261 6829.44,-1253.01 6792,-1200 6572.1,-888.62 6056.1,-847.14 5861.8,-842.87"/> +<polygon fill="black" stroke="black" points="5861.75,-839.37 5851.68,-842.67 5861.61,-846.36 5861.75,-839.37"/> +</g> +<!-- CaseAttributeXRef --> +<g id="node89" class="node"> +<title>CaseAttributeXRef</title> +<polygon fill="white" stroke="transparent" points="2630,-4832 2630,-4943 2848,-4943 2848,-4832 2630,-4832"/> +<polygon fill="#d7b5d8" stroke="transparent" points="2633,-4918.5 2633,-4939.5 2845,-4939.5 2845,-4918.5 2633,-4918.5"/> +<polygon fill="none" stroke="black" points="2633,-4918.5 2633,-4939.5 2845,-4939.5 2845,-4918.5 2633,-4918.5"/> +<text text-anchor="start" x="2636" y="-4925.3" font-family="Times,serif" font-size="14.00">CaseAttributeXRef (753 KiB)</text> +<text text-anchor="start" x="2681.5" y="-4903.3" font-family="Times,serif" font-size="14.00">CaseAttributeId</text> +<text text-anchor="start" x="2674" y="-4882.3" font-family="Times,serif" font-size="14.00">ProbeSetFreezeId</text> +<text text-anchor="start" x="2709.5" y="-4861.3" font-family="Times,serif" font-size="14.00">StrainId</text> +<text text-anchor="start" x="2719" y="-4840.3" font-family="Times,serif" font-size="14.00">Value</text> +<polygon fill="none" stroke="black" points="2630,-4832 2630,-4943 2848,-4943 2848,-4832 2630,-4832"/> +</g> +<!-- CaseAttributeXRef->CaseAttribute --> +<g id="edge80" 
class="edge"> +<title>CaseAttributeXRef:CaseAttributeId->CaseAttribute</title> +<path fill="none" stroke="black" d="M2632,-4907.5C859.27,-4907.5 1188.58,-1398.42 1244.12,-901.29"/> +<polygon fill="black" stroke="black" points="1247.63,-901.45 1245.27,-891.12 1240.67,-900.66 1247.63,-901.45"/> +</g> +<!-- CaseAttributeXRef->Strain --> +<g id="edge82" class="edge"> +<title>CaseAttributeXRef:StrainId->Strain</title> +<path fill="none" stroke="black" d="M2846,-4864.5C3071.96,-4864.5 2844.72,-4009.37 3016,-3862 3099.31,-3790.32 4915.51,-3902.94 4994,-3826 5098.23,-3723.83 4995.8,-1323.24 5074,-1200 5218.94,-971.59 5558.15,-883.8 5710.07,-855.09"/> +<polygon fill="black" stroke="black" points="5711.05,-858.47 5720.24,-853.2 5709.77,-851.59 5711.05,-858.47"/> +</g> +<!-- CaseAttributeXRef->ProbeSetFreeze --> +<g id="edge81" class="edge"> +<title>CaseAttributeXRef:ProbeSetFreezeId->ProbeSetFreeze</title> +<path fill="none" stroke="black" d="M2846,-4885.5C3129.96,-4885.5 2889.92,-3863.52 2783.5,-3457.98"/> +<polygon fill="black" stroke="black" points="2786.86,-3457.01 2780.93,-3448.23 2780.09,-3458.79 2786.86,-3457.01"/> +</g> +<!-- ProbeSetFreeze->ProbeFreeze --> +<g id="edge83" class="edge"> +<title>ProbeSetFreeze:ProbeFreezeId->ProbeFreeze</title> +<path fill="none" stroke="black" d="M2642,-3198C2531.36,-3198 2632.91,-2395.98 2676.43,-2085.09"/> +<polygon fill="black" stroke="black" points="2679.9,-2085.53 2677.83,-2075.14 2672.97,-2084.56 2679.9,-2085.53"/> +</g> +<!-- temporary --> +<g id="node91" class="node"> +<title>temporary</title> +<polygon fill="white" stroke="transparent" points="12744.5,-4811 12744.5,-4964 12889.5,-4964 12889.5,-4811 12744.5,-4811"/> +<polygon fill="#df65b0" stroke="transparent" points="12748,-4939.5 12748,-4960.5 12887,-4960.5 12887,-4939.5 12748,-4939.5"/> +<polygon fill="none" stroke="black" points="12748,-4939.5 12748,-4960.5 12887,-4960.5 12887,-4939.5 12748,-4939.5"/> +<text text-anchor="start" x="12751" y="-4946.3" 
font-family="Times,serif" font-size="14.00">temporary (4 MiB)</text> +<text text-anchor="start" x="12790.5" y="-4924.3" font-family="Times,serif" font-size="14.00">GeneID</text> +<text text-anchor="start" x="12771.5" y="-4903.3" font-family="Times,serif" font-size="14.00">HomoloGene</text> +<text text-anchor="start" x="12794.5" y="-4882.3" font-family="Times,serif" font-size="14.00">OMIM</text> +<text text-anchor="start" x="12766.5" y="-4861.3" font-family="Times,serif" font-size="14.00">Other_GeneID</text> +<text text-anchor="start" x="12790.5" y="-4840.3" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="12796" y="-4819.3" font-family="Times,serif" font-size="14.00">tax_id</text> +<polygon fill="none" stroke="black" points="12744.5,-4811 12744.5,-4964 12889.5,-4964 12889.5,-4811 12744.5,-4811"/> +</g> +<!-- Chr_Length --> +<g id="node92" class="node"> +<title>Chr_Length</title> +<polygon fill="white" stroke="transparent" points="1368,-765.5 1368,-918.5 1518,-918.5 1518,-765.5 1368,-765.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="1371,-894 1371,-915 1515,-915 1515,-894 1371,-894"/> +<polygon fill="none" stroke="black" points="1371,-894 1371,-915 1515,-915 1515,-894 1371,-894"/> +<text text-anchor="start" x="1374" y="-900.8" font-family="Times,serif" font-size="14.00">Chr_Length (2 KiB)</text> +<text text-anchor="start" x="1417.5" y="-878.8" font-family="Times,serif" font-size="14.00">Length</text> +<text text-anchor="start" x="1396" y="-857.8" font-family="Times,serif" font-size="14.00">Length_2016</text> +<text text-anchor="start" x="1396" y="-836.8" font-family="Times,serif" font-size="14.00">Length_mm8</text> +<text text-anchor="start" x="1421.5" y="-815.8" font-family="Times,serif" font-size="14.00">Name</text> +<text text-anchor="start" x="1414.5" y="-794.8" font-family="Times,serif" font-size="14.00">OrderId</text> +<text text-anchor="start" x="1408" y="-773.8" font-family="Times,serif" 
font-size="14.00">SpeciesId</text> +<polygon fill="none" stroke="black" points="1368,-765.5 1368,-918.5 1518,-918.5 1518,-765.5 1368,-765.5"/> +</g> +<!-- Chr_Length->Species --> +<g id="edge84" class="edge"> +<title>Chr_Length:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M1516,-777C1544.63,-777 1515.78,-541.23 1535,-520 1694.07,-344.29 2463.44,-308.31 2715.71,-301.19"/> +<polygon fill="black" stroke="black" points="2716,-304.69 2725.9,-300.91 2715.81,-297.69 2716,-304.69"/> +</g> +<!-- GenoSE --> +<g id="node93" class="node"> +<title>GenoSE</title> +<polygon fill="white" stroke="transparent" points="6848.5,-1918 6848.5,-2008 6957.5,-2008 6957.5,-1918 6848.5,-1918"/> +<polygon fill="#f1eef6" stroke="transparent" points="6852,-1984 6852,-2005 6955,-2005 6955,-1984 6852,-1984"/> +<polygon fill="none" stroke="black" points="6852,-1984 6852,-2005 6955,-2005 6955,-1984 6852,-1984"/> +<text text-anchor="start" x="6855" y="-1990.8" font-family="Times,serif" font-size="14.00">GenoSE (0 B)</text> +<text text-anchor="start" x="6879" y="-1968.8" font-family="Times,serif" font-size="14.00">DataId</text> +<text text-anchor="start" x="6885" y="-1947.8" font-family="Times,serif" font-size="14.00">error</text> +<text text-anchor="start" x="6874" y="-1926.8" font-family="Times,serif" font-size="14.00">StrainId</text> +<polygon fill="none" stroke="black" points="6848.5,-1918 6848.5,-2008 6957.5,-2008 6957.5,-1918 6848.5,-1918"/> +</g> +<!-- GenoSE->Strain --> +<g id="edge85" class="edge"> +<title>GenoSE:StrainId->Strain</title> +<path fill="none" stroke="black" d="M6851,-1930C6810.42,-1930 6850.14,-1232.62 6826,-1200 6591.69,-883.44 6059.6,-845.25 5861.86,-842.35"/> +<polygon fill="black" stroke="black" points="5861.61,-838.85 5851.57,-842.23 5861.53,-845.85 5861.61,-838.85"/> +</g> +<!-- ProbeH2 --> +<g id="node94" class="node"> +<title>ProbeH2</title> +<polygon fill="white" stroke="transparent" points="5788.5,-4832 5788.5,-4943 5921.5,-4943 5921.5,-4832 
5788.5,-4832"/> +<polygon fill="#df65b0" stroke="transparent" points="5792,-4918.5 5792,-4939.5 5919,-4939.5 5919,-4918.5 5792,-4918.5"/> +<polygon fill="none" stroke="black" points="5792,-4918.5 5792,-4939.5 5919,-4939.5 5919,-4918.5 5792,-4918.5"/> +<text text-anchor="start" x="5795" y="-4925.3" font-family="Times,serif" font-size="14.00">ProbeH2 (9 MiB)</text> +<text text-anchor="start" x="5846" y="-4903.3" font-family="Times,serif" font-size="14.00">h2</text> +<text text-anchor="start" x="5802.5" y="-4882.3" font-family="Times,serif" font-size="14.00">ProbeFreezeId</text> +<text text-anchor="start" x="5827" y="-4861.3" font-family="Times,serif" font-size="14.00">ProbeId</text> +<text text-anchor="start" x="5831" y="-4840.3" font-family="Times,serif" font-size="14.00">weight</text> +<polygon fill="none" stroke="black" points="5788.5,-4832 5788.5,-4943 5921.5,-4943 5921.5,-4832 5788.5,-4832"/> +</g> +<!-- ProbeH2->Probe --> +<g id="edge87" class="edge"> +<title>ProbeH2:ProbeId->Probe</title> +<path fill="none" stroke="black" d="M5920,-4864.5C6401.38,-4864.5 5940.09,-4144.3 6330,-3862 6421.67,-3795.63 6755.1,-3903.04 6838,-3826 6948.34,-3723.46 6950.01,-3538.6 6936.27,-3416.32"/> +<polygon fill="black" stroke="black" points="6939.72,-3415.69 6935.07,-3406.16 6932.76,-3416.5 6939.72,-3415.69"/> +</g> +<!-- ProbeH2->ProbeFreeze --> +<g id="edge86" class="edge"> +<title>ProbeH2:ProbeFreezeId->ProbeFreeze</title> +<path fill="none" stroke="black" d="M5791,-4885.5C5212.27,-4885.5 5503.91,-4120.25 4986,-3862 4899.92,-3819.08 3329.71,-3886.58 3255,-3826 2877.83,-3520.19 3360.75,-3094.62 3007,-2762 2937.05,-2696.23 2860.62,-2795.13 2794,-2726 2629.09,-2554.88 2645.25,-2253.02 2670.34,-2085.17"/> +<polygon fill="black" stroke="black" points="2673.84,-2085.47 2671.89,-2075.05 2666.92,-2084.41 2673.84,-2085.47"/> +</g> +<!-- MappingMethod --> +<g id="node96" class="node"> +<title>MappingMethod</title> +<polygon fill="white" stroke="transparent" points="12923.5,-4853 
12923.5,-4922 13110.5,-4922 13110.5,-4853 12923.5,-4853"/> +<polygon fill="#f1eef6" stroke="transparent" points="12927,-4897.5 12927,-4918.5 13108,-4918.5 13108,-4897.5 12927,-4897.5"/> +<polygon fill="none" stroke="black" points="12927,-4897.5 12927,-4918.5 13108,-4918.5 13108,-4897.5 12927,-4897.5"/> +<text text-anchor="start" x="12930" y="-4904.3" font-family="Times,serif" font-size="14.00">MappingMethod (100 B)</text> +<text text-anchor="start" x="13010" y="-4882.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="12996" y="-4861.3" font-family="Times,serif" font-size="14.00">Name</text> +<polygon fill="none" stroke="black" points="12923.5,-4853 12923.5,-4922 13110.5,-4922 13110.5,-4853 12923.5,-4853"/> +</g> +<!-- SnpAll --> +<g id="node97" class="node"> +<title>SnpAll</title> +<polygon fill="white" stroke="transparent" points="1552,-524 1552,-1160 1746,-1160 1746,-524 1552,-524"/> +<polygon fill="#ce1256" stroke="transparent" points="1555,-1136 1555,-1157 1743,-1157 1743,-1136 1555,-1136"/> +<polygon fill="none" stroke="black" points="1555,-1136 1555,-1157 1743,-1157 1743,-1136 1555,-1136"/> +<text text-anchor="start" x="1593.5" y="-1142.8" font-family="Times,serif" font-size="14.00">SnpAll (11 GiB)</text> +<text text-anchor="start" x="1603.5" y="-1120.8" font-family="Times,serif" font-size="14.00">3Prime_UTR</text> +<text text-anchor="start" x="1603.5" y="-1099.8" font-family="Times,serif" font-size="14.00">5Prime_UTR</text> +<text text-anchor="start" x="1625" y="-1078.8" font-family="Times,serif" font-size="14.00">Alleles</text> +<text text-anchor="start" x="1602" y="-1057.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="1581" y="-1036.8" font-family="Times,serif" font-size="14.00">ConservationScore</text> +<text text-anchor="start" x="1621.5" y="-1015.8" font-family="Times,serif" font-size="14.00">Domain</text> +<text text-anchor="start" x="1603.5" y="-994.8" 
font-family="Times,serif" font-size="14.00">Downstream</text> +<text text-anchor="start" x="1630.5" y="-973.8" font-family="Times,serif" font-size="14.00">Exon</text> +<text text-anchor="start" x="1630.5" y="-952.8" font-family="Times,serif" font-size="14.00">Gene</text> +<text text-anchor="start" x="1641.5" y="-931.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="1612" y="-910.8" font-family="Times,serif" font-size="14.00">Intergenic</text> +<text text-anchor="start" x="1626.5" y="-889.8" font-family="Times,serif" font-size="14.00">Intron</text> +<text text-anchor="start" x="1591.5" y="-868.8" font-family="Times,serif" font-size="14.00">Non_Splice_Site</text> +<text text-anchor="start" x="1557" y="-847.8" font-family="Times,serif" font-size="14.00">Non_Synonymous_Coding</text> +<text text-anchor="start" x="1620" y="-826.8" font-family="Times,serif" font-size="14.00">Position</text> +<text text-anchor="start" x="1599" y="-805.8" font-family="Times,serif" font-size="14.00">Position_2016</text> +<text text-anchor="start" x="1639.5" y="-784.8" font-family="Times,serif" font-size="14.00">Rs</text> +<text text-anchor="start" x="1614" y="-763.8" font-family="Times,serif" font-size="14.00">SnpName</text> +<text text-anchor="start" x="1624" y="-742.8" font-family="Times,serif" font-size="14.00">Source</text> +<text text-anchor="start" x="1614" y="-721.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="1609.5" y="-700.8" font-family="Times,serif" font-size="14.00">Splice_Site</text> +<text text-anchor="start" x="1602" y="-679.8" font-family="Times,serif" font-size="14.00">Start_Gained</text> +<text text-anchor="start" x="1611.5" y="-658.8" font-family="Times,serif" font-size="14.00">Start_Lost</text> +<text text-anchor="start" x="1603.5" y="-637.8" font-family="Times,serif" font-size="14.00">Stop_Gained</text> +<text text-anchor="start" x="1613.5" y="-616.8" font-family="Times,serif" 
font-size="14.00">Stop_Lost</text> +<text text-anchor="start" x="1575" y="-595.8" font-family="Times,serif" font-size="14.00">Synonymous_Coding</text> +<text text-anchor="start" x="1611.5" y="-574.8" font-family="Times,serif" font-size="14.00">Transcript</text> +<text text-anchor="start" x="1558.5" y="-553.8" font-family="Times,serif" font-size="14.00">Unknown_Effect_In_Exon</text> +<text text-anchor="start" x="1613" y="-532.8" font-family="Times,serif" font-size="14.00">Upstream</text> +<polygon fill="none" stroke="black" points="1552,-524 1552,-1160 1746,-1160 1746,-524 1552,-524"/> +</g> +<!-- SnpAll->Species --> +<g id="edge88" class="edge"> +<title>SnpAll:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M1744,-725C1789.75,-725 1732.61,-554.2 1763,-520 1889.95,-377.13 2495.01,-320.73 2715.44,-304.71"/> +<polygon fill="black" stroke="black" points="2715.91,-308.18 2725.64,-303.98 2715.41,-301.2 2715.91,-308.18"/> +</g> +<!-- GeneInfo --> +<g id="node98" class="node"> +<title>GeneInfo</title> +<polygon fill="white" stroke="transparent" points="2150,-671 2150,-1013 2338,-1013 2338,-671 2150,-671"/> +<polygon fill="#df65b0" stroke="transparent" points="2153,-989 2153,-1010 2335,-1010 2335,-989 2153,-989"/> +<polygon fill="none" stroke="black" points="2153,-989 2153,-1010 2335,-1010 2335,-989 2153,-989"/> +<text text-anchor="start" x="2178" y="-995.8" font-family="Times,serif" font-size="14.00">GeneInfo (23 MiB)</text> +<text text-anchor="start" x="2226.5" y="-973.8" font-family="Times,serif" font-size="14.00">Alias</text> +<text text-anchor="start" x="2215.5" y="-952.8" font-family="Times,serif" font-size="14.00">BlatSeq</text> +<text text-anchor="start" x="2230.5" y="-931.8" font-family="Times,serif" font-size="14.00">Chr</text> +<text text-anchor="start" x="2218" y="-910.8" font-family="Times,serif" font-size="14.00">GeneId</text> +<text text-anchor="start" x="2189.5" y="-889.8" font-family="Times,serif" font-size="14.00">HomoloGeneID</text> +<text 
text-anchor="start" x="2236.5" y="-868.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="2232" y="-847.8" font-family="Times,serif" font-size="14.00">Mb</text> +<text text-anchor="start" x="2221" y="-826.8" font-family="Times,serif" font-size="14.00">OMIM</text> +<text text-anchor="start" x="2159" y="-805.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_end</text> +<text text-anchor="start" x="2155" y="-784.8" font-family="Times,serif" font-size="14.00">Probe_set_Blat_Mb_start</text> +<text text-anchor="start" x="2209" y="-763.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="2197.5" y="-742.8" font-family="Times,serif" font-size="14.00">Strand_Gene</text> +<text text-anchor="start" x="2195" y="-721.8" font-family="Times,serif" font-size="14.00">Strand_Probe</text> +<text text-anchor="start" x="2217" y="-700.8" font-family="Times,serif" font-size="14.00">Symbol</text> +<text text-anchor="start" x="2224" y="-679.8" font-family="Times,serif" font-size="14.00">TaxId</text> +<polygon fill="none" stroke="black" points="2150,-671 2150,-1013 2338,-1013 2338,-671 2150,-671"/> +</g> +<!-- GeneInfo->Species --> +<g id="edge89" class="edge"> +<title>GeneInfo:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M2336,-767C2363.53,-767 2339.64,-542.84 2355,-520 2438.32,-396.09 2612.85,-338.66 2715.61,-314.65"/> +<polygon fill="black" stroke="black" points="2716.66,-318 2725.63,-312.36 2715.1,-311.18 2716.66,-318"/> +</g> +<!-- GeneList_rn3 --> +<g id="node99" class="node"> +<title>GeneList_rn3</title> +<polygon fill="white" stroke="transparent" points="552,-1718.5 552,-2207.5 718,-2207.5 718,-1718.5 552,-1718.5"/> +<polygon fill="#df65b0" stroke="transparent" points="555,-2183 555,-2204 715,-2204 715,-2183 555,-2183"/> +<polygon fill="none" stroke="black" points="555,-2183 555,-2204 715,-2204 715,-2183 555,-2183"/> +<text text-anchor="start" x="558" y="-2189.8" 
font-family="Times,serif" font-size="14.00">GeneList_rn3 (5 MiB)</text> +<text text-anchor="start" x="589.5" y="-2167.8" font-family="Times,serif" font-size="14.00">chromosome</text> +<text text-anchor="start" x="621.5" y="-2146.8" font-family="Times,serif" font-size="14.00">flag</text> +<text text-anchor="start" x="595.5" y="-2125.8" font-family="Times,serif" font-size="14.00">genBankID</text> +<text text-anchor="start" x="576" y="-2104.8" font-family="Times,serif" font-size="14.00">geneDescription</text> +<text text-anchor="start" x="609" y="-2083.8" font-family="Times,serif" font-size="14.00">geneID</text> +<text text-anchor="start" x="591" y="-2062.8" font-family="Times,serif" font-size="14.00">geneSymbol</text> +<text text-anchor="start" x="628" y="-2041.8" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="607" y="-2020.8" font-family="Times,serif" font-size="14.00">identity</text> +<text text-anchor="start" x="618" y="-1999.8" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="601.5" y="-1978.8" font-family="Times,serif" font-size="14.00">ProbeSet</text> +<text text-anchor="start" x="616" y="-1957.8" font-family="Times,serif" font-size="14.00">qEnd</text> +<text text-anchor="start" x="615" y="-1936.8" font-family="Times,serif" font-size="14.00">qSize</text> +<text text-anchor="start" x="612" y="-1915.8" font-family="Times,serif" font-size="14.00">qStart</text> +<text text-anchor="start" x="615.5" y="-1894.8" font-family="Times,serif" font-size="14.00">score</text> +<text text-anchor="start" x="601.5" y="-1873.8" font-family="Times,serif" font-size="14.00">sequence</text> +<text text-anchor="start" x="618" y="-1852.8" font-family="Times,serif" font-size="14.00">span</text> +<text text-anchor="start" x="598.5" y="-1831.8" font-family="Times,serif" font-size="14.00">specificity</text> +<text text-anchor="start" x="611.5" y="-1810.8" font-family="Times,serif" font-size="14.00">strand</text> 
+<text text-anchor="start" x="613.5" y="-1789.8" font-family="Times,serif" font-size="14.00">txEnd</text> +<text text-anchor="start" x="612.5" y="-1768.8" font-family="Times,serif" font-size="14.00">txSize</text> +<text text-anchor="start" x="609" y="-1747.8" font-family="Times,serif" font-size="14.00">txStart</text> +<text text-anchor="start" x="602" y="-1726.8" font-family="Times,serif" font-size="14.00">unigenID</text> +<polygon fill="none" stroke="black" points="552,-1718.5 552,-2207.5 718,-2207.5 718,-1718.5 552,-1718.5"/> +</g> +<!-- GeneList_rn3->Genbank --> +<g id="edge90" class="edge"> +<title>GeneList_rn3:genBankID->Genbank</title> +<path fill="none" stroke="black" d="M716,-2130C741.84,-2130 729.38,-1225.22 735,-1200 738.81,-1182.91 745.09,-1180.48 751,-1164 783.34,-1073.83 811.09,-965.96 826.65,-901.05"/> +<polygon fill="black" stroke="black" points="830.13,-901.54 829.04,-891 823.32,-899.92 830.13,-901.54"/> +</g> +<!-- News --> +<g id="node100" class="node"> +<title>News</title> +<polygon fill="white" stroke="transparent" points="13145,-4842.5 13145,-4932.5 13269,-4932.5 13269,-4842.5 13145,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="13148,-4908.5 13148,-4929.5 13266,-4929.5 13266,-4908.5 13148,-4908.5"/> +<polygon fill="none" stroke="black" points="13148,-4908.5 13148,-4929.5 13266,-4929.5 13266,-4908.5 13148,-4908.5"/> +<text text-anchor="start" x="13151" y="-4915.3" font-family="Times,serif" font-size="14.00">News (167 KiB)</text> +<text text-anchor="start" x="13191" y="-4893.3" font-family="Times,serif" font-size="14.00">date</text> +<text text-anchor="start" x="13182.5" y="-4872.3" font-family="Times,serif" font-size="14.00">details</text> +<text text-anchor="start" x="13200" y="-4851.3" font-family="Times,serif" font-size="14.00">id</text> +<polygon fill="none" stroke="black" points="13145,-4842.5 13145,-4932.5 13269,-4932.5 13269,-4842.5 13145,-4842.5"/> +</g> +<!-- login --> +<g id="node101" class="node"> 
+<title>login</title> +<polygon fill="white" stroke="transparent" points="13303.5,-4800.5 13303.5,-4974.5 13414.5,-4974.5 13414.5,-4800.5 13303.5,-4800.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="13307,-4950.5 13307,-4971.5 13412,-4971.5 13412,-4950.5 13307,-4950.5"/> +<polygon fill="none" stroke="black" points="13307,-4950.5 13307,-4971.5 13412,-4971.5 13412,-4950.5 13307,-4950.5"/> +<text text-anchor="start" x="13310" y="-4957.3" font-family="Times,serif" font-size="14.00">login (52 KiB)</text> +<text text-anchor="start" x="13315.5" y="-4935.3" font-family="Times,serif" font-size="14.00">assumed_by</text> +<text text-anchor="start" x="13352.5" y="-4914.3" font-family="Times,serif" font-size="14.00">id</text> +<text text-anchor="start" x="13321" y="-4893.3" font-family="Times,serif" font-size="14.00">ip_address</text> +<text text-anchor="start" x="13323" y="-4872.3" font-family="Times,serif" font-size="14.00">session_id</text> +<text text-anchor="start" x="13322.5" y="-4851.3" font-family="Times,serif" font-size="14.00">successful</text> +<text text-anchor="start" x="13321" y="-4830.3" font-family="Times,serif" font-size="14.00">timestamp</text> +<text text-anchor="start" x="13343.5" y="-4809.3" font-family="Times,serif" font-size="14.00">user</text> +<polygon fill="none" stroke="black" points="13303.5,-4800.5 13303.5,-4974.5 13414.5,-4974.5 13414.5,-4800.5 13303.5,-4800.5"/> +</g> +<!-- GeneList --> +<g id="node102" class="node"> +<title>GeneList</title> +<polygon fill="white" stroke="transparent" points="1017.5,-1582 1017.5,-2344 1164.5,-2344 1164.5,-1582 1017.5,-1582"/> +<polygon fill="#df65b0" stroke="transparent" points="1021,-2320 1021,-2341 1162,-2341 1162,-2320 1021,-2320"/> +<polygon fill="none" stroke="black" points="1021,-2320 1021,-2341 1162,-2341 1162,-2320 1021,-2320"/> +<text text-anchor="start" x="1026" y="-2326.8" font-family="Times,serif" font-size="14.00">GeneList (37 MiB)</text> +<text text-anchor="start" x="1064.5" y="-2304.8" 
font-family="Times,serif" font-size="14.00">AlignID</text> +<text text-anchor="start" x="1065" y="-2283.8" font-family="Times,serif" font-size="14.00">cdsEnd</text> +<text text-anchor="start" x="1043.5" y="-2262.8" font-family="Times,serif" font-size="14.00">cdsEnd_2016</text> +<text text-anchor="start" x="1043.5" y="-2241.8" font-family="Times,serif" font-size="14.00">cdsEnd_mm8</text> +<text text-anchor="start" x="1061" y="-2220.8" font-family="Times,serif" font-size="14.00">cdsStart</text> +<text text-anchor="start" x="1039.5" y="-2199.8" font-family="Times,serif" font-size="14.00">cdsStart_2016</text> +<text text-anchor="start" x="1039.5" y="-2178.8" font-family="Times,serif" font-size="14.00">cdsStart_mm8</text> +<text text-anchor="start" x="1044.5" y="-2157.8" font-family="Times,serif" font-size="14.00">Chromosome</text> +<text text-anchor="start" x="1023" y="-2136.8" font-family="Times,serif" font-size="14.00">Chromosome_mm8</text> +<text text-anchor="start" x="1053" y="-2115.8" font-family="Times,serif" font-size="14.00">exonCount</text> +<text text-anchor="start" x="1031.5" y="-2094.8" font-family="Times,serif" font-size="14.00">exonCount_mm8</text> +<text text-anchor="start" x="1056.5" y="-2073.8" font-family="Times,serif" font-size="14.00">exonEnds</text> +<text text-anchor="start" x="1035" y="-2052.8" font-family="Times,serif" font-size="14.00">exonEnds_mm8</text> +<text text-anchor="start" x="1052" y="-2031.8" font-family="Times,serif" font-size="14.00">exonStarts</text> +<text text-anchor="start" x="1031" y="-2010.8" font-family="Times,serif" font-size="14.00">exonStarts_mm8</text> +<text text-anchor="start" x="1050.5" y="-1989.8" font-family="Times,serif" font-size="14.00">GenBankID</text> +<text text-anchor="start" x="1031.5" y="-1968.8" font-family="Times,serif" font-size="14.00">GeneDescription</text> +<text text-anchor="start" x="1064.5" y="-1947.8" font-family="Times,serif" font-size="14.00">GeneID</text> +<text text-anchor="start" x="1046" 
y="-1926.8" font-family="Times,serif" font-size="14.00">GeneSymbol</text> +<text text-anchor="start" x="1084" y="-1905.8" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="1056" y="-1884.8" font-family="Times,serif" font-size="14.00">Info_mm9</text> +<text text-anchor="start" x="1074.5" y="-1863.8" font-family="Times,serif" font-size="14.00">kgID</text> +<text text-anchor="start" x="1066.5" y="-1842.8" font-family="Times,serif" font-size="14.00">NM_ID</text> +<text text-anchor="start" x="1056.5" y="-1821.8" font-family="Times,serif" font-size="14.00">ProteinID</text> +<text text-anchor="start" x="1063" y="-1800.8" font-family="Times,serif" font-size="14.00">RGD_ID</text> +<text text-anchor="start" x="1056.5" y="-1779.8" font-family="Times,serif" font-size="14.00">SpeciesId</text> +<text text-anchor="start" x="1067" y="-1758.8" font-family="Times,serif" font-size="14.00">Strand</text> +<text text-anchor="start" x="1045.5" y="-1737.8" font-family="Times,serif" font-size="14.00">Strand_mm8</text> +<text text-anchor="start" x="1068.5" y="-1716.8" font-family="Times,serif" font-size="14.00">TxEnd</text> +<text text-anchor="start" x="1047" y="-1695.8" font-family="Times,serif" font-size="14.00">TxEnd_2016</text> +<text text-anchor="start" x="1047" y="-1674.8" font-family="Times,serif" font-size="14.00">TxEnd_mm8</text> +<text text-anchor="start" x="1064" y="-1653.8" font-family="Times,serif" font-size="14.00">TxStart</text> +<text text-anchor="start" x="1043" y="-1632.8" font-family="Times,serif" font-size="14.00">TxStart_2016</text> +<text text-anchor="start" x="1043" y="-1611.8" font-family="Times,serif" font-size="14.00">TxStart_mm8</text> +<text text-anchor="start" x="1057" y="-1590.8" font-family="Times,serif" font-size="14.00">UnigenID</text> +<polygon fill="none" stroke="black" points="1017.5,-1582 1017.5,-2344 1164.5,-2344 1164.5,-1582 1017.5,-1582"/> +</g> +<!-- GeneList->Species --> +<g id="edge92" class="edge"> 
+<title>GeneList:SpeciesId->Species</title> +<path fill="none" stroke="black" d="M1020,-1783C987.59,-1783 1012.7,-1229.81 1000,-1200 991.25,-1179.47 973.39,-1184.68 965,-1164 938.08,-1097.7 917.52,-573.54 965,-520 1083.11,-386.82 2377.72,-318.63 2715.68,-303.02"/> +<polygon fill="black" stroke="black" points="2716.05,-306.51 2725.88,-302.55 2715.73,-299.51 2716.05,-306.51"/> +</g> +<!-- GeneList->Genbank --> +<g id="edge91" class="edge"> +<title>GeneList:GenBankID->Genbank</title> +<path fill="none" stroke="black" d="M1020,-1994C975.87,-1994 1023.12,-1237.58 1000,-1200 982.29,-1171.21 954.25,-1190.29 933,-1164 870.98,-1087.29 850.32,-970.88 843.44,-901.34"/> +<polygon fill="black" stroke="black" points="846.89,-900.65 842.48,-891.02 839.92,-901.3 846.89,-900.65"/> +</g> +<!-- GeneChipEnsemblXRef --> +<g id="node103" class="node"> +<title>GeneChipEnsemblXRef</title> +<polygon fill="white" stroke="transparent" points="1750,-1928.5 1750,-1997.5 1976,-1997.5 1976,-1928.5 1750,-1928.5"/> +<polygon fill="#f1eef6" stroke="transparent" points="1753,-1973 1753,-1994 1973,-1994 1973,-1973 1753,-1973"/> +<polygon fill="none" stroke="black" points="1753,-1973 1753,-1994 1973,-1994 1973,-1973 1753,-1973"/> +<text text-anchor="start" x="1756" y="-1979.8" font-family="Times,serif" font-size="14.00">GeneChipEnsemblXRef (36 B)</text> +<text text-anchor="start" x="1808" y="-1957.8" font-family="Times,serif" font-size="14.00">EnsemblChipId</text> +<text text-anchor="start" x="1820.5" y="-1936.8" font-family="Times,serif" font-size="14.00">GeneChipId</text> +<polygon fill="none" stroke="black" points="1750,-1928.5 1750,-1997.5 1976,-1997.5 1976,-1928.5 1750,-1928.5"/> +</g> +<!-- GeneChipEnsemblXRef->EnsemblChip --> +<g id="edge93" class="edge"> +<title>GeneChipEnsemblXRef:EnsemblChipId->EnsemblChip</title> +<path fill="none" stroke="black" d="M1974,-1961C2027,-1961 1909.96,-1154.89 1873.44,-911.66"/> +<polygon fill="black" stroke="black" points="1876.86,-910.9 1871.91,-901.53 
1869.94,-911.94 1876.86,-910.9"/> +</g> +<!-- GeneChipEnsemblXRef->GeneChip --> +<g id="edge94" class="edge"> +<title>GeneChipEnsemblXRef:GeneChipId->GeneChip</title> +<path fill="none" stroke="black" d="M1974,-1940C1994.57,-1940 1996.24,-1220.49 1998,-1200 2005.12,-1117.24 2018.29,-1024.34 2029.33,-954.05"/> +<polygon fill="black" stroke="black" points="2032.84,-954.27 2030.95,-943.85 2025.93,-953.18 2032.84,-954.27"/> +</g> +<!-- SnpAllele_to_be_deleted --> +<g id="node104" class="node"> +<title>SnpAllele_to_be_deleted</title> +<polygon fill="white" stroke="transparent" points="13448.5,-4842.5 13448.5,-4932.5 13687.5,-4932.5 13687.5,-4842.5 13448.5,-4842.5"/> +<polygon fill="#d7b5d8" stroke="transparent" points="13452,-4908.5 13452,-4929.5 13685,-4929.5 13685,-4908.5 13452,-4908.5"/> +<polygon fill="none" stroke="black" points="13452,-4908.5 13452,-4929.5 13685,-4929.5 13685,-4908.5 13452,-4908.5"/> +<text text-anchor="start" x="13455" y="-4915.3" font-family="Times,serif" font-size="14.00">SnpAllele_to_be_deleted (3 KiB)</text> +<text text-anchor="start" x="13551" y="-4893.3" font-family="Times,serif" font-size="14.00">Base</text> +<text text-anchor="start" x="13561" y="-4872.3" font-family="Times,serif" font-size="14.00">Id</text> +<text text-anchor="start" x="13554.5" y="-4851.3" font-family="Times,serif" font-size="14.00">Info</text> +<polygon fill="none" stroke="black" points="13448.5,-4842.5 13448.5,-4932.5 13687.5,-4932.5 13687.5,-4842.5 13448.5,-4842.5"/> +</g> +</g> +</svg> diff --git a/topics/deploy/configuring-nginx-on-host.gmi b/topics/deploy/configuring-nginx-on-host.gmi new file mode 100644 index 0000000..cb1c497 --- /dev/null +++ b/topics/deploy/configuring-nginx-on-host.gmi @@ -0,0 +1,220 @@ +# Configuring Nginx on the Host System + +## Tags + +* type: doc, docs, documentation +* keywords: deploy, deployment, deploying, nginx, guix, guix container, guix system container +* status: in progress + +## Introduction + +We deploy the GeneNetwork system 
within GNU Guix system containers. All the configurations and HTTPS certificates are handled from within the container, thus all the host has to do is pass the traffic on to the system container. + +This document shows you how to set up the host system to forward all the necessary traffic so that you do not run into all the problems that we did when figuring this stuff out :-). + +## Ports and Domains + +In your system container, certain ports are defined for the various kinds of traffic. The most important ones, and the ones we will deal with here, are for HTTP and HTTPS. The same ideas should translate to most other ports. + +For the examples in this document, we will assume the following ports are defined in the Guix system container: +* HTTP on port 9080 +* HTTPS on port 9081 + +## HTTPS Traffic + +### Nginx --with-stream_ssl_preread_module + +We handle all the necessary traffic details (e.g. SSL/TLS termination, etc.) within the container, and only need the host to forward the traffic. + +In order to achieve this, your Nginx will need to be compiled with the +=> https://nginx.org/en/docs/stream/ngx_stream_ssl_preread_module.html Nginx Stream SSL Preread Module. + +Now, because we are awesome, we include +=> https://git.genenetwork.org/gn-machines/tree/nginx-preread.scm a definition for nginx compiled with the module. +Simply install it on your host by doing something like: + +``` +$ git clone https://git.genenetwork.org/gn-machines +$ cd gn-machines +$ ./nginx-preread-deploy.sh +``` + +That will install nginx under "/usr/local/sbin/nginx". + +Next, comment out or delete any lines that load nginx modules for a previously existing nginx. In particular, comment out or delete the following line in your "/etc/nginx/nginx.conf" file if it exists: + +``` +include /etc/nginx/modules-enabled/*.conf; +``` + +This is necessary since the nginx we installed from guix comes with all the modules we need, and even if it did not, it would not successfully use the host's modules anyhow.
You'd need to modify the nginx config for yourself to add any missing modules for the nginx from guix — how to do that is outside the scope of this document, but should not be particularly difficult. + +Set up your init system to use the nginx from guix. Assuming systemd, you need to have something like the following in your "/etc/systemd/system/nginx.service" unit file: + +``` +[Unit] +Description=nginx web server (from Guix, not the host) +After=network.target + +[Service] +Type=forking +PIDFile=/run/nginx.pid +ExecStartPre=/usr/local/sbin/nginx -q -t -c /etc/nginx/nginx.conf -e /var/log/nginx/error.log +ExecStart=/usr/local/sbin/nginx -c /etc/nginx/nginx.conf -p /var/run/nginx -e /var/log/nginx/error.log +ExecReload=/usr/local/sbin/nginx -c /etc/nginx/nginx.conf -s reload -e /var/log/nginx/error.log +ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid +TimeoutStopSec=5 +KillMode=mixed + +[Install] +WantedBy=multi-user.target +``` + +Awesome. Now enable the unit file: + +``` +$ sudo systemctl enable nginx.service +``` + +### Forwarding the HTTPS Traffic + +Now that we have nginx in place, we can forward HTTPS traffic for all the domains we want. In "/etc/nginx/nginx.conf" we add: + +``` +# Forward some HTTPS connections into existing guix containers +stream { + upstream my-container { + # This is our Guix system container + server 127.0.0.1:9081; + } + + upstream host-https { + # Forward any https traffic for any previously existing domains on the + # host itself. + server 127.0.0.1:6443; + } + + map $ssl_preread_server_name $upstream { + yourdomain1.genenetwork.org my-container; + yourdomain2.genenetwork.org my-container; + default host-https; + } + + server { + listen 443; + proxy_pass $upstream; + ssl_preread on; + } +} +``` + +## HTTP Traffic + +You will need to pass the HTTP traffic on to the container in order to enable HTTP-dependent traffic (e.g. 
setting up the SSL certificates using the ACME protocol) to be handled successfully. + +You have two options to do this: +* Add a separate server block in `/etc/nginx/sites-available/` (or other configured directory) +* Add the server block directly in `/etc/nginx/nginx.conf` (or your main nginx config file, if it's not the standard one mentioned here). + +The configuration to add is as follows: + +``` +server { + ## Forward HTTP traffic to container + ## Without this, the HTTP calls will fall through to the defaults in + ## /etc/nginx/sites-enabled/ leading to http-dependent traffic, like + ## that of the ACME client, failing. + server_name yourdomain1.genenetwork.org yourdomain2.genenetwork.org …; + listen 80; + location / { + proxy_pass http://127.0.0.1:9080; + proxy_set_header Host $host; + } +} +``` + +** Please replace the "yourdomain*" parts in the example above as appropriate for your scenario. The ellipsis (…) indicates optional extra domains you might need to configure. + +Without this, the `Run ACME Client` step below will fail. + +## Run ACME Client + +Now that all traffic is set up, and you can reach your sites using both HTTP and HTTPS (you have tested your sites, right? right?), we can request the SSL certificates from Let's Encrypt so that we no longer see the "Self-signed Certificate" warning. + +You need to get into your system container to do this. The steps are as follows: + +=> https://git.genenetwork.org/gn-machines/tree/README.org#n61 Figure out which process is your container +=> https://git.genenetwork.org/gn-machines/tree/README.org#n55 Get a shell into the container +=> https://guix-forge.systemreboot.net/manual/dev/en/#section-acme-service Run "/usr/bin/acme renew" to get your initial SSL certificates from Let's Encrypt. + +At this point, the traffic portion of the configuration is done.
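As a sanity check on the host (a sketch, assuming the example ports and domains used in this document; replace "yourdomain1.genenetwork.org" as appropriate), you can verify that the preread module is present and that the routing behaves as expected:

```
# Confirm the guix-built nginx was compiled with the stream SSL preread module
/usr/local/sbin/nginx -V 2>&1 | grep -o stream_ssl_preread_module

# Validate the nginx configuration without restarting
sudo /usr/local/sbin/nginx -t -c /etc/nginx/nginx.conf

# SNI routing: the certificate presented for your domain should be the
# container's, not the host's
openssl s_client -connect 127.0.0.1:443 -servername yourdomain1.genenetwork.org </dev/null 2>/dev/null | openssl x509 -noout -subject

# HTTP forwarding: this should reach the container on port 9080
curl -s -o /dev/null -w "%{http_code}\n" -H "Host: yourdomain1.genenetwork.org" http://127.0.0.1/
```

These commands only inspect the running setup; they make no changes to the host.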
+ +## Sample "/etc/nginx/nginx.conf" + +``` +user www-data; +worker_processes auto; +pid /run/nginx.pid; +# include /etc/nginx/modules-enabled/*.conf; + +access_log /var/log/nginx/access.log; +error_log /var/log/nginx/error.log error; + +events { + worker_connections 768; + # multi_accept on; +} + +stream { + upstream my-container { + # This is our Guix system container + server 127.0.0.1:9081; + } + + upstream host-https { + # Forward any https traffic for any previously existing domains on the + # host itself. + server 127.0.0.1:6443; + } + + map $ssl_preread_server_name $upstream { + yourdomain1.genenetwork.org my-container; + yourdomain2.genenetwork.org my-container; + default host-https; + } + + server { + listen 443; + proxy_pass $upstream; + ssl_preread on; + } +} + +http { + ## + # Basic Settings + ## + + ⋮ + + include /etc/nginx/conf.d/*.conf; + server { + ## Forward HTTP traffic to container + ## Without this, the HTTP calls will fall through to the defaults in + ## /etc/nginx/sites-enabled/ leading to http-dependent traffic, like + ## that of the ACME client, failing. + server_name yourdomain1.genenetwork.org yourdomain2.genenetwork.org …; + listen 80; + location / { + proxy_pass http://127.0.0.1:9080; + proxy_set_header Host $host; + } + } + include /etc/nginx/sites-enabled/*; + + ⋮ +} + +⋮ + +``` diff --git a/topics/deploy/deployment.gmi b/topics/deploy/deployment.gmi index b844821..74fd6f0 100644 --- a/topics/deploy/deployment.gmi +++ b/topics/deploy/deployment.gmi @@ -1,14 +1,21 @@ # Deploy GeneNetwork +## Tags + +* type: doc, docs, documentation +* keywords: deploy, deployment, deploying, guix, guix container, guix system container +* status: in progress + # Description This page attempts to document the deployment process we have for GeneNetwork. We use Guix system containers for deployment of CI/CD and the Guix configuration for the CI/CD container should be considered the authoritative reference. 
-=> https://github.com/genenetwork/genenetwork-machines/blob/main/genenetwork-development.scm +=> https://git.genenetwork.org/gn-machines/tree/genenetwork-development.scm See also => ./guix-system-containers-and-how-we-use-them +=> ./configuring-nginx-on-host ## genenetwork2 diff --git a/topics/deploy/genecup.gmi b/topics/deploy/genecup.gmi index c5aec17..fc93d07 100644 --- a/topics/deploy/genecup.gmi +++ b/topics/deploy/genecup.gmi @@ -53,3 +53,72 @@ and port forward: ssh -L 4200:127.0.0.1:4200 -f -N server curl localhost:4200 ``` + +# Troubleshooting + +## Moving the PubMed dir + +After moving the PubMed dir, GeneCup stopped displaying part of the connections. This can be reproduced by running the standard example on the home page - the result should look like the image on the right of the home page. + +After fixing the paths and restarting the service there still was no result. + +GeneCup is currently managed by the shepherd as user shepherd. Stop the service as that user: + +``` +shepherd@tux02:~$ herd stop genecup +guile: warning: failed to install locale +Service genecup has been stopped. +``` + +Now the service looks stopped, but it is still running and you need to kill it by hand: + +``` +shepherd@tux02:~$ ps xau|grep genecup +shepherd 89524 0.0 0.0 12780 944 pts/42 S+ 00:32 0:00 grep genecup +shepherd 129334 0.0 0.7 42620944 2089640 ?
Sl Mar05 66:30 /gnu/store/1w5v338qk5m8khcazwclprs3znqp6f7f-python-3.10.7/bin/python3 /gnu/store/a6z0mmj6iq6grwynfvkzd0xbbr4zdm0l-genecup-latest-with-tensorflow-native-HEAD-of-master-branch/.server.py-real +shepherd@tux02:~$ kill -9 129334 +shepherd@tux02:~$ ps xau|grep genecup +shepherd 89747 0.0 0.0 12780 944 pts/42 S+ 00:32 0:00 grep genecup +shepherd@tux02:~$ +``` + +The log file lives in + +``` +shepherd@tux02:~/logs$ tail -f genecup.log +``` + +and we were getting errors on a reload and I had to fix + +``` +shepherd@tux02:~/shepherd-services$ grep export run_genecup.sh +export EDIRECT_PUBMED_MASTER=/export3/PubMed +export TMPDIR=/export/ratspub/tmp +export NLTK_DATA=/export3/PubMed/nltk_data +``` + +See + +=> https://git.genenetwork.org/gn-shepherd-services/commit/?id=cd4512634ce1407b14b0842b0ef6a9cd35e6d46c + +The symlink from /export2 is not honoured by the guix container. Now the service works. + +Note we have deprecation warnings that need to be addressed in the future: + +``` +2025-04-22 00:40:07 /home/shepherd/services/genecup/guix-past/modules/past/packages/python.scm:740:19: warning: 'texlive-union' is deprecated, + use 'texlive-updmap.cfg' instead +2025-04-22 00:40:07 guix build: warning: 'texlive-latex-base' is deprecated, use 'texlive-latex-bin' instead +2025-04-22 00:40:15 updating checkout of 'https://git.genenetwork.org/genecup'... +/gnu/store/9lbn1l04y0xciasv6zzigqrrk1bzz543-tensorflow-native-1.9.0/lib/python3.10/site-packages/tensorflow/python/framewo +rk/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. 
+2025-04-22 00:40:38 _np_quint16 = np.dtype([("quint16", np.uint16, 1)]) +2025-04-22 00:40:38 /gnu/store/9lbn1l04y0xciasv6zzigqrrk1bzz543-tensorflow-native-1.9.0/lib/python3.10/site-packages/tensorflow/python/framewo +rk/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. +2025-04-22 00:40:38 _np_qint32 = np.dtype([("qint32", np.int32, 1)]) +2025-04-22 00:40:38 /gnu/store/9lbn1l04y0xciasv6zzigqrrk1bzz543-tensorflow-native-1.9.0/lib/python3.10/site-packages/tensorflow/python/framewo +rk/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. +2025-04-22 00:40:38 np_resource = np.dtype([("resource", np.ubyte, 1)]) +2025-04-22 00:40:39 /gnu/store/7sam0mr9kxrd4p7g1hlz9wrwag67a6x6-python-flask-sqlalchemy-2.5.1/lib/python3.10/site-packages/flask_sqlalchemy/__ +init__.py:872: FSADeprecationWarning: SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and will be disabled by default in the future. Set it to True or False to suppress this warning. +``` diff --git a/topics/deploy/installation.gmi b/topics/deploy/installation.gmi index 757d848..d6baa79 100644 --- a/topics/deploy/installation.gmi +++ b/topics/deploy/installation.gmi @@ -319,7 +319,7 @@ Currently we have two databases for deployment, from BXD mice and 'db_webqtl_plant' which contains all plant related material. 
-Download one database from +Download a recent database from => https://files.genenetwork.org/database/ diff --git a/topics/deploy/machines.gmi b/topics/deploy/machines.gmi index d610c9f..a7c197c 100644 --- a/topics/deploy/machines.gmi +++ b/topics/deploy/machines.gmi @@ -2,17 +2,19 @@ ``` - [ ] bacchus 172.23.17.156 (00:11:32:ba:7f:17) - 1 Gbs -- [X] lambda01 172.23.18.212 (7c:c2:55:11:9c:ac) +- [ ] penguin2 +- [X] lambda01 172.23.18.212 (7c:c2:55:11:9c:ac) - currently 172.23.17.41 - [X] tux03i 172.23.17.181 (00:0a:f7:c1:00:8d) - 10 Gbs [X] tux03 128.169.5.101 (00:0a:f7:c1:00:8b) - 1 Gbs -- [ ] tux04i 172.23.17.170 (14:23:f2:4f:e6:10) -- [ ] tux04 128.169.5.119 (14:23:f2:4f:e6:11) +- [X] tux04i 172.23.17.170 (14:23:f2:4f:e6:10) +- [X] tux04 128.169.5.119 (14:23:f2:4f:e6:11) - [X] tux05 172.23.18.129 (14:23:f2:4f:35:00) - [X] tux06 172.23.17.188 (14:23:f2:4e:29:10) - [X] tux07 172.23.17.191 (14:23:f2:4e:7d:60) - [X] tux08 172.23.17.186 (14:23:f2:4f:4e:b0) - [X] tux09 172.23.17.182 (14:23:f2:4e:49:10) - [X] space 128.169.5.175 (e4:3d:1a:80:6c:40) +- [ ] space-i 172.23.18.153 (cc:48:3a:13:db:4c) - [ ] octopus01f 172.23.18.221 (2c:ea:7f:60:bf:61) - [ ] octopus02f 172.23.22.159 (2c:ea:7f:60:bd:61) - [ ] octopus03f 172.23.19.187 (2c:ea:7f:60:ac:2b) @@ -25,6 +27,8 @@ c for console or control ``` - [ ] DNS entries no longer visible +- [X] penguin2-c 172.23.31.83 +- [ ] octolair01 172.23.16.228 - [X] lambda01-c 172.23.17.173 (3c:ec:ef:aa:e5:50) - [X] tux01-c 172.23.31.85 (58:8A:5A:F9:3A:22) - [X] tux02-c 172.23.30.40 (58:8A:5A:F0:E6:E4) diff --git a/topics/deploy/paths-in-flask-applications.gmi b/topics/deploy/paths-in-flask-applications.gmi new file mode 100644 index 0000000..77bc201 --- /dev/null +++ b/topics/deploy/paths-in-flask-applications.gmi @@ -0,0 +1,22 @@ +# Paths in Flask Application + +## Tags + +* type: doc, docs, documentation +* assigned: fredm +* keywords: application paths, flask, absolute path, relative path + +## Content + +Always build and use absolute 
paths for the resources you use in your application. Assuming that the application will always be run with the root of the application's repository/package as the working directory is a recipe for failure. + +To demonstrate, see the following issue: +=> /issues/genenetwork2/haley-knott-regression-mapping-error + +In this case, the path issue was not caught in the CI/CD environment since it runs the application with the repository root as its working directory. This issue will also not show up in most development environments since it is easier to run the application from the root of the repository than to have to set up the PYTHONPATH variables. + +In the new containers making use of the "(genenetwork services genenetwork)" module in gn-machines[fn:1], the working directory where the application is invoked has no relation to the application's package — in fact, the working directory is actually the root of the container's file system ("/"). + +# Footnotes + +[fn:1] https://git.genenetwork.org/gn-machines/ diff --git a/topics/deploy/setting-up-or-migrating-production-across-machines.gmi b/topics/deploy/setting-up-or-migrating-production-across-machines.gmi new file mode 100644 index 0000000..1f35dae --- /dev/null +++ b/topics/deploy/setting-up-or-migrating-production-across-machines.gmi @@ -0,0 +1,58 @@ +# Setting Up or Migrating Production Across Machines + +## Tags + +* type: documentation, docs, doc +* status: in-progress +* assigned: fredm +* priority: undefined +* keywords: migration, production, genenetwork +* interested-parties: pjotrp, zachs + +## Introduction + +Recent events (late 2024 and early 2025) have led to us needing to move the production system from one machine to another several times, due to machine failures, disk space, security concerns, and the like. + +In this respect, a number of tasks stand out as necessary for a successful migration.
Each of the following sections will detail a task that's necessary for a successful migration. + +## Set Up the Database + +* Extract: detail this — link to existing document in this repo. Also, probably note that we symlink the extraction back to `/var/lib/mysql`? +* Configure: detail this — link to existing document in this repo + +## Set Up the File System + +* TODO: List the necessary directories and describe what purpose each serves. This will be from the perspective of the container — actual paths on the host system are left to the builder's choice, and can vary wildly. +* TODO: Prefer explicit binding rather than implicit — makes the shell scripts longer, but no assumptions have to be made, everything is explicitly spelled out. + +## Redis + +We currently (2025-06-11) use Redis for: + +- Tracking user collections (these will be moved to an SQLite database) +- Tracking background jobs (this is being moved out to SQLite databases) +- Tracking running-time (not sure what this is about) +- Others? + +We do need to copy over the Redis save file whenever we do a migration, at least until the user collections and background jobs features have been moved completely out of Redis.
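The Redis copy above can be sketched like so (the dump paths and "newhost" are illustrative; get the real locations from the Redis configuration):

```
# On the old host: force Redis to write a fresh snapshot, then find it
redis-cli BGSAVE
redis-cli CONFIG GET dir        # directory holding the snapshot
redis-cli CONFIG GET dbfilename # snapshot file name, usually dump.rdb

# Copy the snapshot to the new host (illustrative paths)
rsync -av /var/lib/redis/dump.rdb newhost:/var/lib/redis/

# On the new host, make sure Redis is stopped when the file is put in
# place; it will load the snapshot on startup.
```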
+ +## Container Configurations: Secrets + +* TODO: Detail how to extract/restore the existing secrets configurations in the new machine + +## Build Production Container + +* TODO: Add notes on building +* TODO: Add notes on setting up systemd + +## NGINX + +* TODO: Add notes on streaming and its configuration + +## SSL Certificates + +* TODO: Add notes on acquisition and setup of SSL certificates + +## DNS + +* TODO: Migrate DNS settings diff --git a/topics/deploy/uthsc-vpn-with-free-software.gmi b/topics/deploy/uthsc-vpn-with-free-software.gmi index 344772c..95fd1cd 100644 --- a/topics/deploy/uthsc-vpn-with-free-software.gmi +++ b/topics/deploy/uthsc-vpn-with-free-software.gmi @@ -10,6 +10,11 @@ $ openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC ``` Note that openconnect-sso should be run as a regular user, not as root. After passing Duo authentication, openconnect-sso will try to gain root privileges to set up the network routes. At that point, it will prompt you for your password using sudo. +## Recommended way + +The recommended way is to use Arun's g-expression setup using guix. See below. It should just work, provided you have the +chained certificate that you can get from the browser or one of us. + ## Avoid tunneling all your network traffic through the VPN (aka Split Tunneling) openconnect, by default, tunnels all your traffic through the VPN. This is not good for your privacy. It is better to tunnel only the traffic destined to the specific hosts that you want to access. This can be done using the vpn-slice script. @@ -44,46 +49,46 @@ export OPENSSL_CONF=/tmp/openssl.cnf ``` Then, run the openconnect-sso client as usual. -## Putting it all together using Guix G-expressions +## Misconfigured UTHSC TLS certificate -Remembering to do all these steps is a hassle. Writing a shell script to automate this is a good idea, but why write shell scripts when we have G-expressions! Here's a G-expression script that I prepared earlier.
-=> uthsc-vpn.scm -Download it, tweak the %hosts variable to specify the hosts you are interested in, and run it like so: +The UTHSC TLS certificate does not validate on some systems. You can work around this by downloading the certificate chain and adding it to your system: +* Navigate with browser to https://uthscvpn1.uthsc.edu/. Inspect the certificate in the browser (lock icon next to search bar) and export .pem file +* Move it to /usr/local/share/ca-certificates (with .crt extension) or equivalent +* On Debian/Ubuntu update the certificate store with update-ca-certificates +You should see ``` -$(guix build -f uthsc-vpn.scm) +Updating certificates in /etc/ssl/certs... +1 added, 0 removed; done. ``` +Thanks Niklas. See also +=> https://superuser.com/a/719047/914881 -# Troubleshooting - -Older versions would not show a proper dialog for sign-in. Try - +However, adding certificates to your system manually is not good security practice. It is better to limit the added certificate to the openconnect process. You can do this using the REQUESTS_CA_BUNDLE environment variable like so: ``` -export QTWEBENGINE_CHROMIUM_FLAGS=--disable-seccomp-filter-sandbox +REQUESTS_CA_BUNDLE=/path/to/uthsc/certificate.pem openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC ``` -## Update certificate +## Putting it all together using Guix G-expressions -When the certificate expires you can download the new one with: +Remembering to do all these steps is a hassle. Writing a shell script to automate this is a good idea, but why write shell scripts when we have G-expressions! Here's a G-expression script that I prepared earlier. +=> uthsc-vpn.scm +Download it, download the UTHSC TLS certificate chain to uthsc-certificate.pem, tweak the %hosts variable to specify the hosts you are interested in, and run it like so: +``` +$(guix build -f uthsc-vpn.scm) +``` -* Navigate with browser to https://uthscvpn1.uthsc.edu/. 
Inspect the certificate in the browser (lock icon next to search bar) and export .pem file -* Move it to /usr/local/share/ca-certificates (with .crt extension) or equivalent -* On Debian/Ubuntu update the certificate store with update-ca-certificates - -You should see +To add a route by hand afterwards you can do ``` -Updating certificates in /etc/ssl/certs... -1 added, 0 removed; done. +ip route add 172.23.17.156 dev tun0 ``` -Thanks Niklas. See also - -=> https://superuser.com/a/719047/914881 +# Troubleshooting -On GUIX you may need to point to the updated certificates file with: +Older versions would not show a proper dialog for sign-in. Try ``` -env REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt openconnect-sso --server uthscvpn1.uthsc.edu --authgroup UTHSC +export QTWEBENGINE_CHROMIUM_FLAGS=--disable-seccomp-filter-sandbox ``` ## Acknowledgement diff --git a/topics/deploy/uthsc-vpn.scm b/topics/deploy/uthsc-vpn.scm index c714731..82f67f5 100644 --- a/topics/deploy/uthsc-vpn.scm +++ b/topics/deploy/uthsc-vpn.scm @@ -1,11 +1,15 @@ -(use-modules ((gnu packages guile-xyz) #:select (guile-ini guile-lib guile-smc)) +(use-modules ((gnu packages python-web) #:select (python-requests python-urllib3)) + ((gnu packages guile-xyz) #:select (guile-ini guile-lib guile-smc)) ((gnu packages vpn) #:select (openconnect-sso vpn-slice)) - (guix gexp)) + (guix build-system python) + (guix download) + (guix gexp) + (guix packages)) ;; Put in the hosts you are interested in here.
(define %hosts (list "octopus01" - "tux01.genenetwork.org")) + "spacex.uthsc.edu")) (define (ini-file name scm) "Return a file-like object representing INI file with @var{name} and @@ -19,6 +23,46 @@ (call-with-output-file #$output (cut scm->ini #$scm #:port <>)))))) +(define python-urllib3-1.26 + (package + (inherit python-urllib3) + (version "1.26.15") + (source + (origin + (method url-fetch) + (uri (pypi-uri "urllib3" version)) + (sha256 + (base32 + "01dkqv0rsjqyw4wrp6yj8h3bcnl7c678qkj845596vs7p4bqff4a")))) + (build-system python-build-system))) + +(define python-requests-2.28 + (package + (inherit python-requests) + (name "python-requests") + (version "2.28.1") + (source (origin + (method url-fetch) + (uri (pypi-uri "requests" version)) + (sha256 + (base32 + "10vrr7bijzrypvms3g2sgz8vya7f9ymmcv423ikampgy0aqrjmbw")))) + (build-system python-build-system) + (arguments (list #:tests? #f)) + (native-inputs (list)) + (propagated-inputs + (modify-inputs (package-propagated-inputs python-requests) + (replace "python-urllib3" python-urllib3-1.26))))) + +;; Login to the UTHSC VPN fails with an SSLV3_ALERT_HANDSHAKE_FAILURE +;; on newer python-requests. +(define openconnect-sso-uthsc + (package + (inherit openconnect-sso) + (inputs + (modify-inputs (package-inputs openconnect-sso) + (replace "python-requests" python-requests-2.28))))) + (define uthsc-vpn (with-imported-modules '((guix build utils)) #~(begin @@ -34,7 +78,9 @@ ("system_default" . "system_default_sect")) ("system_default_sect" ("Options" . 
"UnsafeLegacyRenegotiation"))))) - (invoke #$(file-append openconnect-sso "/bin/openconnect-sso") + (setenv "REQUESTS_CA_BUNDLE" + #$(local-file "uthsc-certificate.pem")) + (invoke #$(file-append openconnect-sso-uthsc "/bin/openconnect-sso") "--server" "uthscvpn1.uthsc.edu" "--authgroup" "UTHSC" "--" diff --git a/topics/documentation/guides_vs_references.gmi b/topics/documentation/guides_vs_references.gmi new file mode 100644 index 0000000..7df0be2 --- /dev/null +++ b/topics/documentation/guides_vs_references.gmi @@ -0,0 +1,24 @@ +# Guides Vs References + +Before coming up with docs, figure out their use. It can either be as a guide (provides solutions to problems encountered) or a reference (similar to man pages, where we provide detailed explanations). + +## For guides: + +* Be as brief as possible, providing reference links for users that want to explore i.e. don't aim from completeness, but rather practicality. +* Prefer providing code or command snippets where possible. +* Preferable have another team member review the docs. This helps eliminate blindspots due to our current knowledge. +* Organize the document in such a way that it starts with the most actionable steps. +* Avoid stream of consciousness writing. + +### Example + +Wrong: + +When setting up guix OS, I couldn't get `tmux` to start, getting `tmux: invalid LC_ALL, LC_CTYPE or LANG`. Running `locale -a` failed too. It took me a while to figure out the solution for this problem, and I attempted to reinstall `glibc-locales` which didn't help. After a lot of research, I found that the root cause was that my applications were built on a different version of `glibc`. I ran `guix update` and the problem disappeared. + +Correct: + +`tmux` failing with `tmux: invalid LC_ALL, LC_CTYPE or LANG` could be caused by having packages build on a different version of `glibc`. 
Attempt: + +> locale -a # should also fail +> guix update # rebuilds your packages with your current glibc diff --git a/topics/editing/case-attributes.gmi b/topics/editing/case-attributes.gmi new file mode 100644 index 0000000..1a86131 --- /dev/null +++ b/topics/editing/case-attributes.gmi @@ -0,0 +1,110 @@ +# Editing Case-Attributes + +## Tags + +* type: document +* keywords: case-attribute, editing +* assigned: fredm, zachs, acenteno, bonfacem +* status: requirements gathering + +## Introduction + +Case-attributes are metadata for samples. They include the sex, age, etc. of the various individuals, and exist separately from "normal" traits mainly because they're non-numeric. From the GN2 traits page, they are shown as extra columns under the "Reviews and Edit Data" section. + +Case-attributes are determined at the group-level. E.g. for BXD, case attributes would apply at the level of each sample, across all BXD data. Every strain has a unique attribute and it's fixed, not variable. + +We need to differentiate these two things: + +* Case-Attribute labels/names/categories (e.g. Sex, Height, Cage-handler, etc) +* Case-Attribute values (e.g. Male/Female, 20cm, Frederick, etc.) + +Currently, both labels and values are set at the group level: + +=> https://github.com/genenetwork/genenetwork1/blob/0f170f0b748a4e10eaf8538f6bcbf88b573ce8e7/web/webqtl/showTrait/DataEditingPage.py Case-Attributes on GeneNetwork1 +is a good starting point to help with understanding how case-attributes were implemented and how they worked. + +A critical bug existed where editing one case-attribute affected all case-attributes defined for a group. + +Case attributes can have the following data-types: + +* Free-form text (no constraints) - see the `Status` column +* Enumerations - textual data, but where the user can only pick from specific values +* Links - The value displayed also acts as a link - e.g.
the 'JAX:*' values in the `RRID` column + +## HOWTO + +Example SQL query to fetch case-attribute data: + +``` +SELECT + caxrn.*, ca.Name AS CaseAttributeName, + ca.Description AS CaseAttributeDescription, + iset.InbredSetId AS OrigInbredSetId +FROM + CaseAttribute AS ca INNER JOIN CaseAttributeXRefNew AS caxrn + ON ca.Id=caxrn.CaseAttributeId +INNER JOIN + StrainXRef AS sxr + ON caxrn.StrainId=sxr.StrainId +INNER JOIN + InbredSet AS iset + ON sxr.InbredSetId=iset.InbredSetId +WHERE + caxrn.value != 'x' + AND caxrn.value IS NOT NULL; +``` + +CaseAttributeXRefNew differs from CaseAttributeXRef: + +``` +mysql> describe CaseAttributeXRef; ++------------------+----------------------+------+-----+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++------------------+----------------------+------+-----+---------+-------+ +| ProbeSetFreezeId | smallint(5) unsigned | NO | PRI | 0 | | +| StrainId | smallint(5) unsigned | NO | PRI | 0 | | +| CaseAttributeId | smallint(5) | NO | PRI | 0 | | +| Value | varchar(100) | NO | | | | ++------------------+----------------------+------+-----+---------+-------+ +4 rows in set (0.01 sec) + +mysql> describe CaseAttributeXRefNew; ++-----------------+------------------+------+-----+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++-----------------+------------------+------+-----+---------+-------+ +| InbredSetId | int(5) unsigned | NO | PRI | NULL | | +| StrainId | int(20) unsigned | NO | PRI | NULL | | +| CaseAttributeId | int(5) unsigned | NO | PRI | NULL | | +| Value | varchar(100) | NO | | NULL | | ++-----------------+------------------+------+-----+---------+-------+ +4 rows in set (0.01 sec) +``` + +=> https://github.com/genenetwork/genenetwork3/blob/dd0b29c07017ec398c447ca683dd4b4be18d73b7/scripts/update-case-attribute-tables-20230818 Script to update CaseAttribute and CaseAttributeXRefNew table + +## Tasks + +* @bmunyoki: Model case-attributes correctly in RDF. 
+* @bmunyoki, @zachs: Implement case-attributes editing in GN3 that correctly models case-attributes at the group-level. CRUD operations with the correct authorization. People who can edit sample data should not be able to edit case-attributes, because case-attributes are defined at the group level, and editing case-attributes at the group-level will affect other samples. +* @rob: Confirm to team whether "N" and "SE" are case-attributes. @bmunyoki AFAICT, no. + + +Possible set of privileges subject to discussion: + +* group:resource:add-case-attributes - Allows user to add a completely new case attribute +* group:resource:edit-case-attributes - Allows user to edit an existing case attribute +* group:resource:delete-case-attributes - Allows user to delete an existing case attribute +* group:resource:view-case-attributes - Allows user to view case attributes and their values + +Given groups are not directly linked to any auth resource, we may introduce some level of indirection. Adding a new resource type that handles groups may solve this.
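A rough sketch of how such privilege strings could gate operations (the privilege names are the proposals above; the helper and the sample-data privilege name are hypothetical, not existing GN3 code):

```
# Hypothetical helper: succeed only if the user's privileges include the
# privilege required for the operation.
user_privileges="group:resource:edit-sample-data group:resource:view-case-attributes"

authorised() { # usage: authorised <privilege>
    case " $user_privileges " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# A user who can edit sample data but lacks the case-attribute editing
# privilege can view, but not edit, case attributes:
authorised group:resource:view-case-attributes && echo "can view"
authorised group:resource:edit-case-attributes || echo "cannot edit"
```

This illustrates the point in the task list above: the sample-data editing privilege alone does not imply any case-attribute privilege.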
+ +## See Also + +=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$myIoafLp_dIONnyNvEI0k2xf3Y8-LyiI_mkP2vBN08o?via=matrix.org Discussion on Case-Attributes Editing in Matrix +=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$P6SNnpY-nAZsDr3VZlRi05m6MT32lXBsCl-BYLh-YLM?via=matrix.org More Discussion on Matrix +=> /issues/case-attr-edit-error Case Attribute Editing Problems +=> /issues/fix-case-attribute-work Fix Case Attribute Work (Same Columns) +=> /issues/fix-case-attribute-editing Editing Case Attribute +=> /issues/consecutive-crud-applications-when-uploading-data Fix Case Attribute Work (Consecutive CRUD applications) +=> /issues/edit-metadata-bugs Cannot Edit Metadata of BXD Traits Effectively +=> /topics/data-uploads/datasets Some Historical Context diff --git a/topics/editing/case_attributes.gmi b/topics/editing/case_attributes.gmi deleted file mode 100644 index 5a11026..0000000 --- a/topics/editing/case_attributes.gmi +++ /dev/null @@ -1,180 +0,0 @@ -# Editing Case-Attributes - -## Tags - -* type: document -* keywords: case-attribute, editing -* assigned: fredm, zachs, acenteno -* status: requirements gathering - -## Introduction - -Case-Attributes are essentially the metadata for the samples. In the GN2 system, they are the extra columns in the table in the "Reviews and Edit Data" accordion tab besides the value and its error margin. - -To quote @zachs - -> "Case Attributes" are basically just sample metadata. So stuff like the sex, age, etc of the various individuals (and exist separately from "normal" traits mainly because they're non-numeric) - -They are the metadata for the various sample in a trait. The case attributes are determined at the group-level: - -> Since they're metadata (or "attributes" in this case) for samples, they're group-level so for BXD, case attributes would apply at the level of each sample, across all BXD data - -Also From email: -> Every strain has a unique attribute and it's fixed, not variable.
- -## Direction - -We need to differentiate two things: -* Case-Attribute labels/names/categories (e.g. Sex, Height, Cage-handler, etc) -* Case-Attribute values (e.g. Male/Female, 20cm, Frederick, etc.) - -As is currently implemented (as of before 2023-08-31), both the labels and values are set at group level. - -A look at -=> https://github.com/genenetwork/genenetwork1/blob/0f170f0b748a4e10eaf8538f6bcbf88b573ce8e7/web/webqtl/showTrait/DataEditingPage.py Case-Attributes on GeneNetwork1 -is a good starting point to help with understanding how case-attributes were implemented and how they worked. - -## Status - -There is code that existed for the case-attributes editing, but it had a critical bug where the data for existing attributes would be deleted/replaced randomly when one made a change. This lead to a pause in this effort. - -The chosen course of action will, however, not make use of this existing code. Instead, we will reimplement the feature with code in GN3, exposing the data and its editing via API endpoints. - -## Database - -The existing database tables of concern to us are: - -* InbredSet -* CaseAttribute -* StrainXRef -* Strain -* CaseAttributeXRefNew - -We can fetch case-attribute data from the database with: - -``` -SELECT - caxrn.*, ca.Name AS CaseAttributeName, - ca.Description AS CaseAttributeDescription, - iset.InbredSetId AS OrigInbredSetId -FROM - CaseAttribute AS ca INNER JOIN CaseAttributeXRefNew AS caxrn - ON ca.Id=caxrn.CaseAttributeId -INNER JOIN - StrainXRef AS sxr - ON caxrn.StrainId=sxr.StrainId -INNER JOIN - InbredSet AS iset - ON sxr.InbredSetId=iset.InbredSetId -WHERE - caxrn.value != 'x' - AND caxrn.value IS NOT NULL; -``` - -which gives us all the information we need to rework the database schema. - -Since the Case-Attributes are group-level, we need to move the `InbredSetId` to the `CaseAttribute` table from the `CaseAttributeXRefNew` table. 
- -For more concrete relationship declaration, we can have the `CaseAttributeXRefNew` table have it primary key be composed of the `InbredSetId`, `StrainId` and `CaseAttributeId`. That has the added advantage that we can index the table on `InbredSetId` and `StrainId`. - -That leaves the `CaseAttribute` table with the following columns: - -* InbredSetId: Foreign Key from `InbredSet` table -* Id: The CaseAttribute identifier -* Name: Textual name for the Case-Attribute -* Description: Textual description fro the case-attribute - -while the `CaseAttributeXRefNew` table ends up with the following columns: - -* InbredSetId: Foreign Key from `InbredSet` table -* StrainId: The strain -* CaseAttributeId: The case-attribute identifier -* Value: The value for the case-attribute for this specific strain - -There will not be any `NULL` values allowed for any of the columns in both tables. If a strain has no value, we simply delete the corresponding record from the `CaseAttributeXRefNew` table. - -To that end, the following script has been added to ease the migration of the table schemas: -=> https://github.com/genenetwork/genenetwork3/blob/dd0b29c07017ec398c447ca683dd4b4be18d73b7/scripts/update-case-attribute-tables-20230818 -The script is meant to be run only once, and makes the changes mentioned above for both tables. - -## Data Types - -> ... (and exist separately from "normal" traits mainly because they're non-numeric) - -The values for Case-Attributes are non-numeric data. This will probably be mostly textual data. - -As an example: -=> https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish Trait Data and Analysis for BXD_10010 -we see Case-Attributes as: - -* Free-form text (no constraints) - see the `Status` column -* Enumerations - textual data, but where the user can only pick from specific values -* Links - The value displayed also acts as a link - e.g. 
the 'JAX:*' values in the `RRID` column - - -=> https://genenetwork.org/show_trait?trait_id=10002&dataset=CCPublish For this trait - -We see: -* Numeric data - see the `N` and `SE` columns -though that might be a misunderstanding of the quote - -> In the following link for example, every column after Value is a case attribute - https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish - -**TODO**: Verify whether `N` and `SE` are Case-Attributes - -## Authorisation - -From email: -> it's probably not okay to let anyone who can edit sample data for a trait also edit case attributes, since they're group level - -and from matrix: -> The weird bug aside, Bonface had (mostly) successfully implemented editing these through the CSV files in the same way as any other sample data, but for authorization reasons this probably doesn't make sense (since a user having access to editing sample data for specific traits doesn't imply that they'd have access for editing case attributes across the entire group) - -From this, it implies we might need a new set of privileges for dealing with case-attributes, e.g. -* group:resource:add-case-attributes - Allows user to add a completely new case attribute -* group:resource:edit-case-attributes - Allows user to edit an existing case attribute -* group:resource:delete-case-attributes - Allows user to delete an existing case attribute -* group:resource:view-case-attributes - Allows user to view case attributes and their value - -Considering, however, that groups (InbredSets) are not directly linked to any auth resource, this might mean some indirection of sorts, or maybe add a new resource type that handles groups. - -## Features - -* Editing existing case-attributes: YES -* Adding new case attributes: ??? -* Deleting existing case attributes: ??? - -Strains/samples are shared across traits. The values for the case attributes are the same for a particular strain/sample for all traits within a particular InbredSet (group). 
- -## Related and Unsynthesised Chats - -=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$myIoafLp_dIONnyNvEI0k2xf3Y8-LyiI_mkP2vBN08o?via=matrix.org -``` -Zachary SloanZ -I'm pretty sure multiple phenotypes and mRNA datasets can belong to the same experiment (and definitely for the purposes of case attributes -since the mRNA datasets are split by tissue -genotype traits should all be considered part of the same "experiment" (at least as long as we're still only databasing a single genotype file for each group) - -pjotrp -: Case attribute editing will still need to be group level, at least until the whole feature is completely changed. Since they're basically just phenotypes we choose to show in the trait page table, and phenotypes are at the group level -``` - -=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$P6SNnpY-nAZsDr3VZlRi05m6MT32lXBsCl-BYLh-YLM?via=matrix.org -``` -Zachary SloanZ -21:14 -Groups are defined by their list of samples/strains, and the "case attributes" are just "the characteristics of those samples/strains we choose to show on the trait page" (if we move away from the "group" concept entirely that could change, but if we did that we probably would also replace "case attributes" with something else because the way that's implemented is kind of weird to begin with) -ZB -``` - -## Related issues - -=> /issues/case-attr-edit-error -=> /issues/fix-case-attribute-work -=> /issues/fix-case-attribute-editing -=> /issues/consecutive-crud-applications-when-uploading-data -=> /issues/edit-metadata-bugs - -## References - -=> /topics/data-uploads/datasets diff --git a/topics/engineering/improving-wiki-rif-search-in-genenetwork.gmi b/topics/engineering/improving-wiki-rif-search-in-genenetwork.gmi new file mode 100644 index 0000000..74e7178 --- /dev/null +++ b/topics/engineering/improving-wiki-rif-search-in-genenetwork.gmi @@ -0,0 +1,119 @@ +# Improving RIF+WIKI Search + +* author: bonfacem +* reviewed-by: jnduli + +At the time of this writing, 
WIKI and/or RIF search is extremely slow in MySQL, e.g. searching "WIKI=nicotine MEAN=(12.103 12.105)" causes an Nginx time-out in GeneNetwork2. This blog discusses how we improved the WIKI+RIF search using XAPIAN and some of our key learnings. + +### TLDR; Key Learnings from Adding RIF+WIKI to the Index + +* Xapian compacting is IO-bound. +* Instrument your indexing script and choose a parallel process_count that fits your needs. +* Do NOT store positional data unless you need it. +* Consider stemming your data and removing stop-words ahead of indexing. + +### Slow MySQL Performance + +When indexing genes, we have a complex query [0] which returns 48,308,714 rows. + +Running an "EXPLAIN" on [0] yields: + +``` +1 +------+-------------+----------------+--------+-----------------------------+------------------+---------+------------------------------------------------------------+-------+-------------+ +2 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +3 +------+-------------+----------------+--------+-----------------------------+------------------+---------+------------------------------------------------------------+-------+-------------+ +4 | 1 | SIMPLE | ProbeSetFreeze | ALL | PRIMARY | NULL | NULL | NULL | 931 | | +5 | 1 | SIMPLE | ProbeFreeze | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeSetFreeze.ProbeFreezeId | 1 | Using where | +6 | 1 | SIMPLE | Tissue | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.TissueId | 1 | | +7 | 1 | SIMPLE | InbredSet | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.InbredSetId | 1 | Using where | +8 | 1 | SIMPLE | Species | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.InbredSet.SpeciesId | 1 | | +9 | 1 | SIMPLE | ProbeSetXRef | ref | ProbeSetFreezeId,ProbeSetId | ProbeSetFreezeId | 2 | db_webqtl.ProbeSetFreeze.Id | 27287 | | +10 | 1 | SIMPLE | ProbeSet | eq_ref | PRIMARY | PRIMARY | 4 | db_webqtl.ProbeSetXRef.ProbeSetId | 1 | | 
+11 | 1 | SIMPLE | Geno | eq_ref | species_name | species_name | 164 | db_webqtl.InbredSet.SpeciesId,db_webqtl.ProbeSetXRef.Locus | 1 | Using where | ++------+-------------+----------------+--------+-----------------------------+------------------+---------+------------------------------------------------------------+-------+-------------+ +``` + +From the above table, we note that we have "ref" under the "type" column in line 9. The "type" column describes how the rows are found from the table (i.e. the join type) [2]. In this case, "ref" means a non-unique index or prefix is used to find the matching rows, which we can confirm by running "SHOW INDEXES FROM ProbeSetXRef" (note the Non-unique value of 1 for ProbeSetFreezeId): + +``` ++--------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | ++--------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +| ProbeSetXRef | 0 | PRIMARY | 1 | DataId | A | 46061750 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | ProbeSetFreezeId | 1 | ProbeSetFreezeId | A | 1688 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | ProbeSetId | 1 | ProbeSetId | A | 11515437 | NULL | NULL | | BTREE | | | +| ProbeSetXRef | 1 | Locus_2 | 1 | Locus | A | 1806 | 5 | NULL | YES | BTREE | | | ++--------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +``` + +We get a performance hit on the join: "INNER JOIN ProbeSetXRef ON ProbeSetXRef.ProbeSetFreezeId = ProbeSetFreeze.Id" since ProbeSetXRef.ProbeSetFreezeId is a non-unique index. 
What this means for our query is that, for each row scanned in the ProbeSetFreeze table, there are several rows in the ProbeSetXRef table that will satisfy the JOIN condition. This is analogous to nested loops in programming. + +In the RIF Search, we append "INNER JOIN GeneRIF_BASIC ON GeneRIF_BASIC.symbol = ProbeSet.Symbol" to [0]. Running an EXPLAIN on this new query yields: + +``` +1 +------+-------------+----------------+--------+---------------------------------------+--------------+---------+------------------------------------------------------------+---------+-----------------------+ +2 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +3 +------+-------------+----------------+--------+---------------------------------------+--------------+---------+------------------------------------------------------------+---------+-----------------------+ +4 | 1 | SIMPLE | GeneRIF_BASIC | index | NULL | symbol | 777 | NULL | 1366287 | Using index | +5 | 1 | SIMPLE | ProbeSet | ref | PRIMARY,symbol_IDX,ft_ProbeSet_Symbol | symbol_IDX | 403 | func | 1 | Using index condition | +6 | 1 | SIMPLE | ProbeSetXRef | ref | ProbeSetFreezeId,ProbeSetId | ProbeSetId | 4 | db_webqtl.ProbeSet.Id | 4 | | +7 | 1 | SIMPLE | ProbeSetFreeze | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeSetXRef.ProbeSetFreezeId | 1 | | +8 | 1 | SIMPLE | ProbeFreeze | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeSetFreeze.ProbeFreezeId | 1 | Using where | +9 | 1 | SIMPLE | InbredSet | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.InbredSetId | 1 | Using where | +10 | 1 | SIMPLE | Tissue | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.ProbeFreeze.TissueId | 1 | | +11 | 1 | SIMPLE | Species | eq_ref | PRIMARY | PRIMARY | 2 | db_webqtl.InbredSet.SpeciesId | 1 | | +12 | 1 | SIMPLE | Geno | eq_ref | species_name | species_name | 164 | db_webqtl.InbredSet.SpeciesId,db_webqtl.ProbeSetXRef.Locus | 1 | Using where | +13 
+------+-------------+----------------+--------+---------------------------------------+--------------+---------+------------------------------------------------------------+---------+-----------------------+ +``` + +From the above we see that we have an extra "ref" on line 5, which adds extra overhead. Additionally, now under the "ref" column we see "func" with a "Using index condition" under the "Extra" column. This means that we are using some function during this join [3]. Specifically, this is because the "symbol" column in the GeneRIF_BASIC table is indexed, but the "Symbol" column in the ProbeSet table is not indexed. Regardless, using the index still improves the performance of the query by some orders of magnitude. + +### Adding RIF+WIKI Search to the Existing Gene Index + +Our current indexer[4] works by indexing the results from [0] in chunks of 100,000 into separate xapian databases stored in different directories. This happens by spawning different child processes from the main indexer script. The final step in this process is to compact all the different databases into one database. + +To add RIF+WIKI indices to the existing gene index, we built a global cache. In each child process, we fetch the relevant RIF+WIKI entry from this cache and index it. This increased our indexing time and space consumption. At one point we ran out of RAM, causing an intermittent outage on 2024-06-21 (search for "Outage for 2024-06-20" in the following link): + +=> https://issues.genenetwork.org/topics/meetings/jnduli_bmunyoki Meeting notes + +When troubleshooting the outage, we realized the indexing script consumed all the RAM. This was because the child processes spawned by the index script each consumed around 3GB of RAM, and the combined RAM usage of all the child processes exceeded the system RAM. 
To remedy this, we settled on a total_child_process count of 67, limiting the number of spawned children and putting a cap on the total amount of RAM the indexing script could consume. You can see the fix in this commit: + +=> https://github.com/genenetwork/genenetwork3/commit/99d0d1200d7dcd81e27ce65ab84bab145d9ae543 feat: set 67 parallel processes to run in prod + +To speed up indexing, we attempted to parallelise our compacting. Parallelising reduced our compacting time somewhat, but not significantly. On a SATA drive, compacting 3 databases (each already compacted from 50 databases) into one was significantly faster than compacting 150 databases into one in a single step. The conclusion we drew from this was that the compacting process is IO-bound. This is useful because it informs the type of drive you would want to run the indexing script on; in our case, an NVMe drive is an ideal candidate because of its fast IO speeds. + +To reduce the indexing script's space consumption and improve its performance, we first removed stop-words and the most common words from the global cache, and stemmed the words in the other documents. This reduced the space footprint to 152 Gb, which was still unacceptable for our needs. Further research into how Xapian indexing works pointed us to positional data in the XAPIAN index. In XAPIAN, positional data allows phrase searches such as "nicotine NEAR mouse", which loosely translates to "search for the term nicotine occurring near the term mouse." We noticed that the RIF+WIKI search does not need this type of query, a trade-off we were willing to make for faster search and a smaller XAPIAN database. The impact of dropping positional data from the RIF+WIKI data was immediate: on the NVMe drive, our indexing time dropped to 1 hour 9 minutes, with an index size of 73 Gb! 
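The worker cap described above can be sketched with Python's multiprocessing. Note this is an illustrative sketch, not the actual index-genenetwork code: the names `index_chunk`, `PROCESS_CAP` and `CHUNK_SIZE` are assumptions made for the example.

```python
# Sketch: cap the number of parallel indexing workers so that total RAM
# consumption stays bounded. All names here are illustrative.
import multiprocessing
import os

PROCESS_CAP = 67      # cap chosen so ~3GB per worker stays under system RAM
CHUNK_SIZE = 100_000  # rows indexed into each standalone database


def index_chunk(offset):
    # Stand-in for a child that indexes rows [offset, offset + CHUNK_SIZE)
    # into its own database directory; returns the range it covered.
    return (offset, offset + CHUNK_SIZE)


def run_indexing(total_rows):
    # Never spawn more children than the cap, regardless of core count.
    workers = min(PROCESS_CAP, os.cpu_count() or 1)
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(index_chunk, range(0, total_rows, CHUNK_SIZE))
```

In the real script each worker would run the chunked SQL query and write a standalone Xapian database, and a final step would compact the per-chunk databases into one.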
The table below summarizes our findings: + +``` +| Index | Indexing Time (min) | Space (Gb) | % Inc Size (from G+P) | % Inc Time | +|-------------------------------------------------|---------------------|------------|-----------------------|------------| +| G+P (no stop-words, no stemming, pos. data) | 75 | 60 | 0 | 0 | +| G+P+W+R (no stop-words, no stemming, pos. data) | 429 | 152 | 153.3 | 472 | +| G+P+W+R (stop-words, stemming, no pos. data) | 69 | 73 | 21.6 | -8 | + +Key: +---- +G: Genes +P: Phenotypes +W: Wiki +R: RIF +``` + +### Some Example Searches + +With RIF+WIKI search added, here are some searches you can try out in the CD GeneNetwork instance: + +* wiki:nicotine AND mean:12.103..12.105 +* rif:isoneuronal AND mean:12.103..12.105 +* species:mouse wiki:addiction rif:heteroneuronal mean:12.103..12.105 +* symbol:shh rif:glioma wiki:nicotine + +### References + +=> https://github.com/genenetwork/genenetwork3/blob/52cd294c2f1d06dddbd6ff613b11f8bc43066038/scripts/index-genenetwork#L54-L89 [0] Gene Indexing SQL Query +=> https://mariadb.com/kb/en/explain/ [1] MariaDB EXPLAIN +=> https://stackoverflow.com/a/4528433 [2] What does eq_ref and ref types mean in MySQL explain? +=> https://planet.mysql.com/entry/?id=29724 [3] The meaning of ref=func in MySQL EXPLAIN +=> https://github.com/genenetwork/genenetwork3/blob/main/scripts/index-genenetwork#L54 [4] The index-genenetwork indexing script +=> https://issues.genenetwork.org/topics/engineering/instrumenting-ram-usage [5] Instrument RAM Usage diff --git a/topics/engineering/instrumenting-ram-usage.gmi b/topics/engineering/instrumenting-ram-usage.gmi new file mode 100644 index 0000000..4f7ab96 --- /dev/null +++ b/topics/engineering/instrumenting-ram-usage.gmi @@ -0,0 +1,32 @@ +# Instrumenting RAM usage + +* author: bonfacem +* reviewed-by: jnduli + +On 2024-06-21, TUX02 experienced an outage because we ran out of RAM on the server. Here we outline how to instrument processes that consume RAM, in particular, what to watch out for. 
+ +=> https://issues.genenetwork.org/topics/meetings/jnduli_bmunyoki Meeting Notes + +The output of "free -m -h" looks like: + +``` + total used free shared buff/cache available +Mem: 251G 88G 57G 6.2G 105G 155G +Swap: 29G 20G 9.8G +``` + +When running "free", you can refresh the output regularly. As an example, to get human-readable output every 2 seconds: + +> free -m -h -s 2 + +It's tempting to check the "free" column to see how much RAM is left. However, that number looks low on a healthy system because Linux uses spare RAM for disk caching, and disk caching doesn't prevent applications from getting the memory they want[1]. What we need to watch instead are: + +* available: Make sure this is within acceptable thresholds. +* swap used: Make sure this does not change significantly. + +Also, use htop/top and filter for the process you are monitoring (preferably ordering by RAM usage) to see how much RAM the process and its children (if any) consume. + +## References + +=> https://www.linuxatemyram.com/index.html [0] Linux ate my ram! +=> https://www.linuxatemyram.com/play.html [1] Experiments and fun with Linux disk cache diff --git a/topics/engineering/setting-up-a-basic-pre-commit-hook-for-linting-scheme-files.gmi b/topics/engineering/setting-up-a-basic-pre-commit-hook-for-linting-scheme-files.gmi new file mode 100644 index 0000000..5324de8 --- /dev/null +++ b/topics/engineering/setting-up-a-basic-pre-commit-hook-for-linting-scheme-files.gmi @@ -0,0 +1,31 @@ +# Setting Up a Basic Pre-Commit Hook for Linting Scheme Files + +* author: bonfacem +* reviewed-by: jnduli + +Git executes hooks before/after events such as commit, push and receive. A pre-commit hook runs before a commit is finalized [0]. This post shows how to create a pre-commit hook for linting scheme files using `guix style`. 
+ +``` +# Step 1: Create the hook +touch .git/hooks/pre-commit + +# Step 2: Make the hook executable +chmod +x .git/hooks/pre-commit + +# Step 3: Copy the following to .git/hooks/pre-commit + +#!/bin/sh + +# Run guix style on staged .scm files +for file in $(git diff --cached --name-only --diff-filter=ACM | grep '\.scm$'); do + if ! guix style --whole-file "$file"; then + echo "Linting failed for $file. Please fix the errors and try again." + exit 1 + fi + git add "$file" +done +``` + +## References + +=> https://www.slingacademy.com/article/git-pre-commit-hook-a-practical-guide-with-examples/ [0] Git Pre-Commit Hook: A Practical Guide (with Examples) diff --git a/topics/engineering/using-architecture-decision-records-in-genenetwork.gmi b/topics/engineering/using-architecture-decision-records-in-genenetwork.gmi new file mode 100644 index 0000000..43d344c --- /dev/null +++ b/topics/engineering/using-architecture-decision-records-in-genenetwork.gmi @@ -0,0 +1,56 @@ +# Using Architecture Decision Records at GeneNetwork + +* author: bonfacem +* reviewed-by: fredm, jnduli + +> One of the hardest things to track during the life of a project is the motivation behind certain decisions. A new person coming on to a project may be perplexed, baffled, delighted, or infuriated by some past decision. +> -- Michael Nygard + +When building or maintaining software, there are often moments when we ask, "What were they thinking?" This happens when we are trying to figure out why something was done a certain way, leading to speculation, humor, or criticism[0]. Given the constraints we face when writing code, it's important to make sure that important decisions are well-documented and transparent. Architecture Decision Records (ADRs) are one such tool. They provide a structured way to capture the reasoning behind key decisions. + +ADRs consist of 4 key sections [0]: + +* Status: An ADR begins with a proposed status. After discussions, it will be accepted or rejected. 
It is also possible for a decision to be superseded by a newer ADR later on. +* Context: The context section outlines the situation or problem, providing the background and constraints relevant to the decision. This section is meant to frame the issue concisely, not as a lengthy blog post or detailed explanation. +* Decision: This section clearly defines the chosen approach and the specific actions that will be taken to address the issue. +* Consequences: This part lays out the impact or outcomes of the decision, detailing the expected results and potential trade-offs. + +Optionally, when an ADR is rejected, you can add a section: + +* Rejection Rationale: Briefly explains why the ADR was rejected. + +At GeneNetwork, we manage ADRs within our issue tracker, organizing them under the path "/topics/ADR/<project-name>/XXX-name.gmi". The "XXX" represents a three-digit number, allowing easy, chronological ordering of the proposals as they are created. + +Here is a template for a typical ADR in GeneNetwork: + +``` +# [<project>/ADR-<XXX>] Title + +* author: author-name +* status: proposed +* reviewed-by: A, B, C + +## Context + +Some context. + +## Decision + +Decisions. + +## Consequences + +Consequences. +``` + +Here are some examples of GeneNetwork-specific ADRs: + +=> https://issues.genenetwork.org/topics/ADR/gn3/000-add-test-cases-for-rdf [gn3/ADR-000] Add RDF Test Case +=> https://issues.genenetwork.org/topics/ADR/gn3/000-remove-stace-traces-in-gn3-error-response [gn3/ADR-001] Remove Stack Traces in GN3 + +### References + +=> https://www.oreilly.com/library/view/mastering-api-architecture/9781492090625/ [0] Gough, J., Bryant, D., & Auburn, M. (2022). Mastering API Architecture: Design, Operate, and Evolve API-based Systems. O'Reilly Media, Incorporated. +=> https://adr.github.io/ [1] Architectural Decision Records. 
Homepage of the ADR GitHub organization +=> https://docs.aws.amazon.com/prescriptive-guidance/latest/architectural-decision-records/adr-process.html [2] Amazon's ADR process +=> https://cloud.google.com/architecture/architecture-decision-records [3] Google Cloud Architecture Center: Architecture Decision Records Overview diff --git a/topics/engineering/working-with-virtuoso-locally.gmi b/topics/engineering/working-with-virtuoso-locally.gmi new file mode 100644 index 0000000..af249a5 --- /dev/null +++ b/topics/engineering/working-with-virtuoso-locally.gmi @@ -0,0 +1,70 @@ +# Working with Virtuoso for Local Development + +* author: bonfacem +* reviewed-by: jnduli + +Using guix, install the Virtuoso server: + +``` +guix install virtuoso-ose # or any other means to install virtuoso +cd /path/to/virtuoso/database/folder +cp $HOME/.guix-profile/var/lib/virtuoso/db/virtuoso.ini ./virtuoso.ini +# modify the virtuoso.ini file to save files to the folder you'd prefer +virtuoso-t +foreground +wait +debug +``` + +## Common Virtuoso Operations + +Use isql to load up data: + +``` +isql +# subsequent commands run in isql prompt +# this folder is relative to the folder virtuoso was started from +ld_dir ('path/to/folder/with/ttls', '*.ttl', 'http://genenetwork.org'); +rdf_loader_run(); +checkpoint; +``` + +Add data using HTTP: + +``` +# Replace dba:dba with <user>:<password> +curl --digest --user 'dba:dba' --verbose --url\ +"http://localhost:8890/sparql-graph-crud-auth?graph=http://genenetwork.org"\ +-T test-data.ttl +``` + +Delete data using HTTP: + +``` +# Replace dba:dba with <user>:<password> +curl --digest --user 'dba:dba' --verbose --url\ +"http://localhost:8890/sparql-graph-crud-auth?graph=http://genenetwork.org" -X DELETE +``` + +Query the graph data: + +``` +curl --verbose --url\ +"http://localhost:8890/sparql-graph-crud?graph=http://genenetwork.org" +``` + +Check out more cURL examples here: + +=> https://vos.openlinksw.com/owiki/wiki/VOS/VirtGraphProtocolCURLExamples SPARQL 1.1 Graph 
Store HTTP Protocol cURL Example Collection + +## Setting Passwords + +Virtuoso's default user is "dba" and its default password is "dba". To change a password, use isql to run: + +``` +set password "dba" "dba"; +CHECKPOINT; +``` + +## More + +Read a fuller, more complete tutorial on Virtuoso here: + +=> https://issues.genenetwork.org/topics/systems/virtuoso Virtuoso diff --git a/topics/genenetwork-releases.gmi b/topics/genenetwork-releases.gmi new file mode 100644 index 0000000..e179629 --- /dev/null +++ b/topics/genenetwork-releases.gmi @@ -0,0 +1,77 @@ +# GeneNetwork Releases + +## Tags + +* status: open +* priority: +* assigned: +* type: documentation +* keywords: documentation, docs, release, releases, genenetwork + +## Introduction + +The sections that follow note down the commits used for various stable (and stable-ish) releases of GeneNetwork. + +The tagging of the commits will need to distinguish repository-specific tags from overall system tags. + +In this document, we only concern ourselves with the overall system tags, which shall have the template: + +``` +genenetwork-system-v<major>.<minor>.<patch>[-<commit>] +``` + +The portions in angle brackets will be replaced with the actual version numbers. + +## genenetwork-system-v1.0.0 + +This is the first, guix-system-container-based, stable release of the entire GeneNetwork system. 
+The commits involved are: + +=> https://github.com/genenetwork/genenetwork2/commit/314c6d597a96ac903071fcb6e50df3d9e88935e9 GN2: 314c6d5 +=> https://github.com/genenetwork/genenetwork3/commit/0d902ec267d96b87648669a7a43b699c8a22a3de GN3: 0d902ec +=> https://git.genenetwork.org/gn-auth/commit/?id=8e64f7f8a392b8743a4f36c497cd2ec339fcfebc gn-auth: 8e64f7f +=> https://git.genenetwork.org/gn-libs/commit/?id=72a95f8ffa5401649f70978e863dd3f21900a611 gn-libs: 72a95f8 + +The guix channels used for deployment of the system above are as follows: + +``` +(list (channel + (name 'guix-bioinformatics) + (url "https://git.genenetwork.org/guix-bioinformatics/") + (branch "master") + (commit + "039a3dd72c32d26b9c5d2cc99986fd7c968a90a5")) + (channel + (name 'guix-forge) + (url "https://git.systemreboot.net/guix-forge/") + (branch "main") + (commit + "bcb3e2353b9f6b5ac7bc89d639e630c12049fc42") + (introduction + (make-channel-introduction + "0432e37b20dd678a02efee21adf0b9525a670310" + (openpgp-fingerprint + "7F73 0343 F2F0 9F3C 77BF 79D3 2E25 EE8B 6180 2BB3")))) + (channel + (name 'guix-past) + (url "https://gitlab.inria.fr/guix-hpc/guix-past") + (branch "master") + (commit + "5fb77cce01f21a03b8f5a9c873067691cf09d057") + (introduction + (make-channel-introduction + "0c119db2ea86a389769f4d2b9c6f5c41c027e336" + (openpgp-fingerprint + "3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5")))) + (channel + (name 'guix) + (url "https://git.savannah.gnu.org/git/guix.git") + (branch "master") + (commit + "2394a7f5fbf60dd6adc0a870366adb57166b6d8b") + (introduction + (make-channel-introduction + "9edb3f66fd807b096b48283debdcddccfea34bad" + (openpgp-fingerprint + "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))) +``` diff --git a/topics/genenetwork/Case_Attributes_GN2 b/topics/genenetwork/Case_Attributes_GN2 new file mode 100644 index 0000000..52a956f --- /dev/null +++ b/topics/genenetwork/Case_Attributes_GN2 @@ -0,0 +1,2 @@ +# Update Case Attributes to capture hierarchy info +## The 
following provides guidelines and insight regarding case attributes as used in GeneNetwork Webservice searches diff --git a/topics/genenetwork/genenetwork-services.gmi b/topics/genenetwork/genenetwork-services.gmi new file mode 100644 index 0000000..717fdd8 --- /dev/null +++ b/topics/genenetwork/genenetwork-services.gmi @@ -0,0 +1,122 @@ +# GeneNetwork Services + +## Tags + +* type: documentation +* keywords: documentation, docs, doc, services, genenetwork services + +## GeneNetwork Core Services + +GeneNetwork is composed of a number of different services. This document lists all the services that make up GeneNetwork and the links that give access to them. + +### GeneNetwork2 + +This is the main user interface to the entire GeneNetwork system. + +#### Links + +=> https://github.com/genenetwork/genenetwork2 Repository +=> https://genenetwork.org/ GN2 on production +=> https://fallback.genenetwork.org/ GN2 on old production +=> https://cd.genenetwork.org/ GN2 on CI/CD +=> https://staging.genenetwork.org/ GN2 on staging + +### GeneNetwork3 + +This is the main API server for GeneNetwork. + +#### Links + +=> https://github.com/genenetwork/genenetwork3 Repository +=> https://genenetwork.org/api3/ GN3 on production +=> https://fallback.genenetwork.org/api3/ GN3 on old production +=> https://cd.genenetwork.org/api3/ GN3 on CI/CD +=> https://staging.genenetwork.org/api3/ GN3 on staging + +### SPARQL Service + +The SPARQL service is served by a Virtuoso-OSE instance. + +=> https://issues.genenetwork.org/topics/deploy/our-virtuoso-instances We have notes on our virtuoso instances here. + + +#### Links + +=> https://github.com/genenetwork/genenetwork3 Repository +=> https://sparql.genenetwork.org/sparql/ sparql-service on production +* ??? sparql-service on old production +* ??? sparql-service on CI/CD +* ??? sparql-service on staging + +### GN-Auth + +This is the authorisation server for the GeneNetwork system. 
+ +#### Links + +=> https://git.genenetwork.org/gn-auth/ Repository +=> https://auth.genenetwork.org/ gn-auth on production +=> https://fallback.genenetwork.org/gn-auth/ gn-auth on old production +* ??? gn-auth on CI/CD +=> https://staging-auth.genenetwork.org/ gn-auth on staging + +### GN-Uploader + +This service is to be used for uploading data to GeneNetwork. It is currently in development (best case, alpha). + +#### Links + +=> https://git.genenetwork.org/gn-uploader/ Repository +* ??? gn-uploader on production +* ??? gn-uploader on old production +* ??? gn-uploader on CI/CD +=> https://staging-uploader.genenetwork.org/ gn-uploader on staging + +### Aliases Server + +An extra server to respond with aliases for genetic (etc.) symbols. + +This is currently a project in Racket, but we should probably pull the features in this repository into one of the others (probably GeneNetwork3) and trash this repository. + +#### Links + +=> https://github.com/genenetwork/gn3 Repository +=> https://genenetwork.org/gn3/ aliases-server on production +=> https://fallback.genenetwork.org/gn3/ aliases-server on old production +=> https://cd.genenetwork.org/gn3/ aliases-server on CI/CD +=> https://staging.genenetwork.org/gn3/ aliases-server on staging + +### Markdown Editing Server + +#### Links + +=> https://git.genenetwork.org/gn-guile/ Repository +=> https://genenetwork.org/facilities/ markdown-editing-server on production +=> https://fallback.genenetwork.org/facilities/ markdown-editing-server on old production +=> https://cd.genenetwork.org/facilities/ markdown-editing-server on CI/CD +=> https://staging.genenetwork.org/facilities/ markdown-editing-server on staging + +## Support Services + +These are other services that support the development and maintenance of the core services. 
+ +### Issue Tracker + +We use a text-based issue tracker that is accessible via +=> https://issues.genenetwork.org/ + +The repository for this service is at +=> https://github.com/genenetwork/gn-gemtext-threads/ + +### Repositories Server + +This is where a lot of the genenetwork repositories live. You can access it at +=> https://git.genenetwork.org/ + +### Continuous Integration Service + +… + +=> https://ci.genenetwork.org/ + +### … diff --git a/topics/genenetwork/genenetwork-streaming-functionality.gmi b/topics/genenetwork/genenetwork-streaming-functionality.gmi new file mode 100644 index 0000000..4f81eea --- /dev/null +++ b/topics/genenetwork/genenetwork-streaming-functionality.gmi @@ -0,0 +1,43 @@ +# Genenetwork Streaming Functionality + +## Tags +* type: documentation +* Keywords: documentation, docs, genenetwork, streaming + +### Introduction +Genenetwork implements streaming functionality that logs results from a running external process to a terminal emulator. + +The streaming functionality can be divided into several sections. + +### Streaming UI +The terminal emulator is implemented using the `xterm.js` library and +logs results from the GN3 API. + +See: +=> https://github.com/xtermjs/xterm.js + +### Streaming API +This is the main endpoint for streaming: + +See reference: +=> https://github.com/genenetwork/genenetwork3/gn3/api/streaming.py + +### How to Integrate + +#### Import the `enable_streaming` Decorator + +``` +from gn3.computations.streaming import enable_streaming +``` + +#### Apply the Decorator to Your Endpoint that Runs an External Process + +Note: To run the external process, use the `run_process` function, +which captures the `stdout` in a file identified by the `run_id`. 
+ +``` +@app.route('/your-endpoint') +@enable_streaming +def your_endpoint(streaming_output_file): + run_process(command, streaming_output_file, run_id) +``` diff --git a/topics/genenetwork/starting_gn1.gmi b/topics/genenetwork/starting_gn1.gmi index efbfd0f..e31061f 100644 --- a/topics/genenetwork/starting_gn1.gmi +++ b/topics/genenetwork/starting_gn1.gmi @@ -51,9 +51,7 @@ On an update of guix the build may fail. Try #######################################' # Environment Variables - private ######################################### - # sql_host = '[1]tux02.uthsc.edu' - # sql_host = '128.169.4.67' - sql_host = '172.23.18.213' + sql_host = '170.23.18.213' SERVERNAME = sql_host MYSQL_SERVER = sql_host DB_NAME = 'db_webqtl' diff --git a/topics/gn-learning-team/next-steps.gmi b/topics/gn-learning-team/next-steps.gmi new file mode 100644 index 0000000..b427923 --- /dev/null +++ b/topics/gn-learning-team/next-steps.gmi @@ -0,0 +1,48 @@ +# Next steps + +Wednesday we had a wrap-up meeting of the gn-learning efforts. + +## Data uploading + +The goal of these meetings was to learn how to upload data into GN. In the process Felix has become the de facto uploader, next to Arthur. A C. elegans dataset was uploaded and Felix is preparing + +* More C. elegans +* HSRat +* Killifish +* Medaka + +Updates are here: + +=> https://issues.genenetwork.org/tasks/felixl + +We'll keep focussing on that work and hopefully we'll get more parties interested in doing some actual work down the line. + +## Hosting GN in Wageningen + +Harm commented that he thought these meetings were valuable, particularly because we learnt a lot about GN's ins and outs. Harm suggests we focus on hosting GN in Wageningen for C. elegans and Arabidopsis. +Pjotr says that is a priority this year, even if we start on a privately hosted machine in NL. Wageningen requires Docker images and Bonface says that is possible - with some work.
So: + +* Host GN in NL +* Make GN specific for C. elegans and Arabidopsis - both trim and add datasets +* Create Docker container +* Host Docker container in Wageningen +* Present to other parties in Wageningen + +Having the above datasets will help this effort succeed. + +## AI + +Harm is also very interested in the AI efforts and wants to pursue that in the context of the above server - i.e., functionality arrives when it lands in GN. + +## WormBase + +Jameson suggests we can work with WormBase and the CaeNDR folks once we have a running system. Interactive data analysis is very powerful and could run in conjunction with those sites. + +=> https://caendr.org/ +=> https://wormbase.org/ + +Other efforts are FlyBase and Arabidopsis Magic which we can host, in principle. + +## Mapping methods + +Jameson will continue with his work on residuals. diff --git a/topics/gn-uploader/genome-details.gmi b/topics/gn-uploader/genome-details.gmi new file mode 100644 index 0000000..f8a12f6 --- /dev/null +++ b/topics/gn-uploader/genome-details.gmi @@ -0,0 +1,42 @@ +# Genome Details + +This file is probably misnamed. + +*TODO*: Update name once we know where this fits + +## Tags + +* type: documentation, doc, docs +* assigned: fredm +* priority: docs +* status: open +* keywords: gn-uploader, uploader, genome + +## Location + +### centiMorgan (cM) + +We no longer use centiMorgan in GeneNetwork. + +From the email threads: + +``` +> … +> Sorry, we now generally do not use centimorgans. Chr 19 is 57 cM +> using markers that exclude telomeres in most crosses. +> … +``` + +and + +``` +> … +> I know that cM is a bit more variable because it's not a direct measurement, … +> … +``` + +### Megabasepairs (Mbp) + +The uploader will store any provided physical location values (in megabasepairs) in the +=> https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#Geno Geno table +specifically in the `Mb` field of that table.
diff --git a/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi b/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi new file mode 100644 index 0000000..db0ddf3 --- /dev/null +++ b/topics/gn-uploader/genotypes-assemblies-markers-and-genenetwork.gmi @@ -0,0 +1,40 @@ +# Genotypes, Assemblies, Markers and GeneNetwork + +## Tags + +* type: documentation, docs, doc +* keywords: genotype, assembly, markers, data, database, genenetwork, uploader + +## Markers + +``` +The marker is the SNP… + +— Rob (Paraphrased) +``` + +SNPs (Single Nucleotide Polymorphisms) are specific locations of interest within the genome, where the pair of nucleotides can take different forms. + +A SNP and its immediate neighbourhood (a number of megabase pairs before and after the SNP) form a sequence that is effectively the marker, e.g. for mouse (Mus musculus) you could have the following sequence from the GRCm38 genome assembly (mm10): + +``` +GAGATAAAGATGGGTCCCTTGGCACAGGACTGGCCCACATTTCCaatataaattacaacaattttttttaaatttttaaaCAAAACAAGCATCTCACACAC/TTGAAAAAGAAGATGCATTCAAAGAAAATAGATGTTTCAATGTATTTAAGATAATCAAGAGATAACCATGACCATATCATGAGGAAACTTAAGAATTGGCA +``` + +where the position with `C/T` represents the SNP of interest and thus the marker. + +You can search this on the UCSC Genome Browser, specifically the +=> https://genome.ucsc.edu/cgi-bin/hgBlat BLAT search +to get the name of the marker, and some extra details regarding it. + +## Genome Assemblies + +The genome assembly used will "determine" the position of the marker on the genome — newer assemblies will (generally) give a better position accounting for more of the issues discovered in older assemblies. + +With most of the newer assemblies, the positions do not shift very drastically. + +## GeneNetwork + +Currently (September 2024), GeneNetwork uses the GRCm38 (mm10) assembly for mice. + +Unfortunately, since the system was built for mice, the tables (e.g. 
Geno table) do not account for the fact that you could have markers (and other data) from species other than Mus musculus. You thus have the Geno table with fields like `Mb_mm8`, `Chr_mm8` which are very mouse-specific. diff --git a/topics/gn-uploader/types-of-data.gmi b/topics/gn-uploader/types-of-data.gmi new file mode 100644 index 0000000..1f53dec --- /dev/null +++ b/topics/gn-uploader/types-of-data.gmi @@ -0,0 +1,63 @@ +# Types of Data in GeneNetwork + +## Tags + +* assigned: +* priority: +* status: open +* type: documentation +* keywords: gn-uploader, uploader, genenetwork, documentation, doc, docs, data, data type, types of data + +## Description + +There are five (5) main types of data in GeneNetwork: + +* Classical Phenotypes (PublishData) +* High Content Data +* Genotype Data +* Cofactors and Attributes +* Metadata + +### Classical Phenotypes + +This is usually low-content data, e.g. body weight, tail length, etc. + +This is currently saved in the `Publish*` tables in the database. + +This data is saved as-is, i.e. not log-transformed. + +### High Content Data + +This includes mainly molecular data such as +* mRNA assay data +* genetic expression data +* probes +* tissue type and data + +These data are saved in the `ProbeSet*` database tables (and other closely related tables like the `Tissue*` tables - fred added this: verify). + +These could be saved in the database in a log-transformed form - verify. + +How do you check for log-transformation in the data? + +### Genotype Data + +This is core data, and all other data seem to rely on its existence. + +Useful for: +* correlations, cofactor and PheWAS computations. +* mapping purposes +* search and display +* editing and curation + +### Cofactors and Attributes + +This data can be alphanumeric (a mix of numerical and non-numerical values). + +It is not intended for mapping. + +### Metadata + +This data should (ideally) always accompany any and all of the data types above.
It provides contextual information regarding the data it accompanies, and is useful for search, and other contextualising operations. + +It is alphanumeric data, and mostly cannot be used for numeric computations. diff --git a/topics/guix/guix-profiles.gmi b/topics/guix/guix-profiles.gmi index 578bb82..8cf41d8 100644 --- a/topics/guix/guix-profiles.gmi +++ b/topics/guix/guix-profiles.gmi @@ -16,7 +16,7 @@ Alternatively put the following into a channels.scm file. ``` (list (channel (name 'gn-bioinformatics) - (url "https://gitlab.com/genenetwork/guix-bioinformatics") + (url "https://git.genenetwork.org/guix-bioinformatics") (branch "master"))) ``` Build a profile using diff --git a/topics/gunicorn/deploying-app-under-url-prefix.gmi b/topics/gunicorn/deploying-app-under-url-prefix.gmi new file mode 100644 index 0000000..b2e382f --- /dev/null +++ b/topics/gunicorn/deploying-app-under-url-prefix.gmi @@ -0,0 +1,121 @@ +# Deploying Your Flask Application Under a URL Prefix With GUnicorn + +## TAGS + +* type: doc, documentation, docs +* author: fredm, zachs +* keywords: flask, gunicorn, SCRIPT_NAME, URL prefix + +## Introduction + +You have your application and are ready to deploy it, however, for some reason, you want to deploy it under a URL prefix, rather than at a top-level-domain. + +This short article details the things you need to set up. + +## Set up Your WebServer (Nginx) + +You need to tell your webserver to serve the application under a particular url prefix. You do this using that particular webserver's reverse-proxying configurations: For this article, we will use Nginx as the server. 
+ +Normally, you'd simply do something like: + +``` +server { + server_name your.server.domain + + ⋮ + + location /the-prefix/ { + proxy_pass http://127.0.0.1:8080/; + proxy_set_header Host $host; + ⋮ + } + + ⋮ +} +``` + +Here, your top-level domain will be https://your.server.domain and you therefore want to access your shiny new application at https://your.server.domain/the-prefix/ + +For a simple application, with no sessions or anything, this should mostly work, though you might run into trouble with things like static files (e.g. css, js, etc) if the application does not use the same ones as that one on the TLD. + +If you are using sessions, you might also run into an issue where there is an interaction in the session management of both applications, especially if the application on the TLD makes use of services from the application at the url prefix. This is mostly due to redirects from the url-prefix app getting lost and hitting the TLD app. + +To fix this, we change the configuration above to: + +``` +server { + server_name your.server.domain + + ⋮ + + location /the-prefix/ { + proxy_pass http://127.0.0.1:8080/the-prefix/; + proxy_set_header Host $host; + ⋮ + } + + ⋮ +} +``` + +but now, you get errors, since there is no endpoint in your shiny new app that is at the route /the-prefix/***. + +Enter Gunicorn! + + +## Setting up SCRIPT_NAME for GUnicorn + +### The "Hacky" Way + +At the point of invocation of GUnicorn, we set the SCRIPT_NAME environment variable to the value "/the-prefix" — note that there is no trailing slash; this is very important. You should now have something like: + +``` +$ export SCRIPT_NAME="/the-prefix" +$ gunicorn --bind 0.0.0.0:8082 --workers … +``` + +The first line tells GUnicorn what the URL prefix is. It will use this to compute what URL to pass to the flask application.
+ +For example, say you try accessing the endpoint + +``` +https://your.server.domain/the-prefix/auth/authorise?response_type=code&client_id=some-id&redirect_uri=some-uri +``` + +Gunicorn will split that URL into two parts using the value of the SCRIPT_NAME environment variable, giving you: + +* https://your.server.domain +* /auth/authorise?response_type=code&client_id=some-id&redirect_uri=some-uri + +It will then pass on the second part to flask. This is why the value of SCRIPT_NAME should not have a trailing slash. + +Note that using the SCRIPT_NAME environment variable is a convenience feature provided by GUnicorn, not a WSGI feature. If you ever change your WSGI server, there is no guarantee this fix will work. + +### Using WSGI Routing Middleware + +A better way is to make use of a WSGI routing middleware. You could do this by defining a separate WSGI entry point in your application's repository. + +``` +# wsgi_url_prefix.py +from werkzeug.wrappers import Response +from werkzeug.middleware.dispatcher import DispatcherMiddleware + +from app import create_app + +def init_prefixed_app(theapp): + theapp.wsgi_app = DispatcherMiddleware( + Response("Not Found", 404), + { + "/the-prefix": theapp.wsgi_app + }) + return theapp + + +app = init_prefixed_app(create_app()) +``` + +## References + +=> https://docs.gunicorn.org/en/latest/faq.html#how-do-i-set-script-name +=> https://dlukes.github.io/flask-wsgi-url-prefix.html +=> https://www.reddit.com/r/Python/comments/juwj3x/comment/gchdsld/ diff --git a/topics/lmms/bulklmm/readme.gmi b/topics/lmms/bulklmm/readme.gmi new file mode 100644 index 0000000..8bd96a8 --- /dev/null +++ b/topics/lmms/bulklmm/readme.gmi @@ -0,0 +1 @@ +This is a stub diff --git a/topics/lmms/gemma/permutations.gmi b/topics/lmms/gemma/permutations.gmi new file mode 100644 index 0000000..4c8932a --- /dev/null +++ b/topics/lmms/gemma/permutations.gmi @@ -0,0 +1,1014 @@ +# Permutations + +Currently we use gemma-wrapper to compute the significance level - by
shuffling the phenotype vector 1000x. +As this is a lengthy procedure we have not incorporated it into the GN web service. The new bulklmm may work +in certain cases (genotypes have to be complete, for one). + +Because of many changes gemma-wrapper is not working for permutations. I have a few steps to take care of: + +* [X] read R/qtl2 format for phenotype + +# R/qtl2 and GEMMA formats + +See + +=> data/R-qtl2-format-notes + +# One-offs + +## Phenotypes + +For a study Dave handed me phenotype and covariate files for the BXD. Phenotypes look like: + +``` + +Record ID,21526,21527,21528,21529,21530,21531,21532,21537,24398,24401,24402,24403,24404,24405,24406,24407,24408,24412,27513,27514,27515,27516,27517 +BXD1,18.5,161.5,6.5,1919.450806,3307.318848,0.8655,1.752,23.07,0.5,161.5,18.5,6.5,1919.450806,3307.318848,0.8655,1.752,0.5,32,1.5,1.75,2.25,1.25,50 +BXD100,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x +BXD101,20.6,176.199997,4.4,2546.293945,4574.802734,1.729,3.245,25.172001,0.6,176.199997,20.6,4.4,2546.294189,4574.802734,1.7286,3.2446,0.6,32,1.875,2.375,2.75,1.75,38 +BXD102,18.785,159.582993,6.167,1745.671997,4241.505859,0.771,2.216,22.796667,0.25,159.583328,18.785,6.166667,1745.672485,4241.506348,0.770667,2.216242,0.25,28.08333,1.5,2,2.875,1.5,28.5 +... +``` + +which is close to the R/qtl2 format. GEMMA meanwhile expects a tab-delimited file where x=NA. You can pass in the column number with the -n switch. One thing GEMMA lacks is the first ID which has to align with the genotype file. The BIMBAM geno format, again, does not contain the IDs. See + +=> http://www.xzlab.org/software/GEMMAmanual.pdf + +What we need to do is create and use R/qtl2 format files because they can be error-checked on IDs and convert those, again, to BIMBAM for use by GEMMA. In the past I wrote Python converters for gemma2lib: + +=> https://github.com/genetics-statistics/gemma2lib + +I kinda abandoned the project, but you can see a lot of functionality, e.g.
+ +=> https://github.com/genetics-statistics/gemma2lib/blob/master/gemma2/format/bimbam.py + +We also have bioruby-table as a generic command line tool + +=> https://github.com/pjotrp/bioruby-table + +which is an amazingly flexible tool and can probably do the same. I kinda abandoned that project too. You know, bioinformatics is a graveyard of projects :/ + +OK, let's try. The first step is to convert the phenotype file to something GEMMA can use. We have to make sure that the individuals align with the genotype file(!). So, because we work with GN's GEMMA files, the steps are: + +* [X] Read the JSON layout file - 'sample_list' is essentially the header of the BIMBAM geno file +* [X] Use the R/qtl2-style phenotype file to write a correct GEMMA pheno file (multi column) +* [X] Compare results with GN pheno output + +Running GEMMA by hand it complained + +``` +## number of total individuals = 235 +## number of analyzed individuals = 26 +## number of covariates = 1 +## number of phenotypes = 1 +## number of total SNPs/var = 21056 +## number of analyzed SNPs = 21056 +Calculating Relatedness Matrix ... 
+rsm10000000001, X, Y, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0.5, 0, 1, 0, 1, 0.5, 0, 1, 0, 0, 0, 1, 1, 0, 0.5, 1, 1, 0.5, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0.5, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0.5, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0.5, 0, 0, 0.5, 0, 1, 0, 1, 0, 0, 1, 0.5, 0, 1, 0, 0.5, 1, 1, 1, 1, 0.5, 0, 0, 0.5, 1, 0.5, 0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0.5, 0, 0, 1, 0, 0.5, 1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5 +237 != 235 +WARNING: Columns in geno file do not match # individuals in phenotypes +ERROR: Enforce failed for not enough genotype fields for marker in src/gemma_io.cpp at line 1470 in BimbamKin +``` + +GEMMA on production is fine. So, I counted BXDs. For comparison, GN's pheno outputs 241 BXDs. Daves pheno file has 241 BXDs (good). But when using my script we get 235 BXDs. Ah, apparently they are different from what we use on GN because GN does not use the parents and the F1s for GEMMA. So, my script should complain when a match is not made. Turns out the JSON file only contains 235 'mappable' BXDs and refers to BXD.8 which is from Apr 26, 2023. The header says `BXD_experimental_DGA_7_Dec_2021` and GN says WGS March 2022. So which one is it? I'll just go with latest, but genotype naming is problematic and the headers are not updated. + +> MOTTO: Always complain when there are problems! + +Luckily GEMMA complained, but the script should have also complained. The JSON file with 235 genometypes is not representing the actual 237 genometypes. We'll work on that in the next section. + +Meanwhile let's add this code to gemma-wrapper. 
The code can be found here: + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/rqtl2-pheno-to-gemma.py + +## Genotypes + +The pheno script now errors with + +``` +ERROR: sets differ {'BXD065xBXD102F1', 'C57BL/6J', 'DBA/2J', 'BXD077xBXD065F1', 'D2B6F1', 'B6D2F1'} +``` + +Since these are parents and F1s, and are all NAs in Dave's phenotypes, they are easy to remove. So, now we have 235 samples in the phenotype file and 237 genometypes in the genotype file (according to GEMMA). A quick check shows that BXD.geno has 236 genometypes. Same for the bimbam on production. We now have 3 values: 235, 236 and 237. The question is why these counts do not agree. + +### Genotype probabilities for GEMMA + +Another problem on production is that we are not using the standard GEMMA values. So GEMMA complains with + +``` +WARNING: The maximum genotype value is not 2.0 - this is not the BIMBAM standard and will skew l_lme and effect sizes +``` + +This explains why we divide the effect size by 2 in the GN production code. Maybe it is a better idea to fix the geno files! + +* [X] Generate BIMBAM file from GENO .geno files (via R/qtl2) +* [X] Check bimbam files on production + +So we need to convert .geno files as they are the current source of genotypes in GN and contain the sample names that we need to align with pheno files. For this we'll output two files - one JSON file with metadata and sample names and the actual BIMBAM file GEMMA requires. I notice that I actually never had the need to parse a geno file! Zach wrote a tool `gn2/maintenance/convert_geno_to_bimbam.py` that also writes the GN JSON file and I'll take some ideas from that. We'll also need to convert to R/qtl2 as that is what Dave can use and then on to BIMBAM. So, let's add that code to gemma-wrapper again. + +This is another tool at + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/gn-geno-to-gemma.py + +where the generated JSON file helps create the pheno file.
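The conversion described above (an R/qtl2-style CSV, with `x` for missing values, into a tab-delimited GEMMA phenotype table aligned to the genotype sample order) can be sketched as follows. This is an illustration, not the actual rqtl2-pheno-to-gemma.py code, and it complains loudly when samples do not match:

```python
def rqtl2_csv_to_gemma(lines, sample_order):
    """Convert R/qtl2-style phenotype CSV rows (sample,v1,v2,...) into a
    GEMMA phenotype table: one tab-delimited row per sample in
    sample_order (the genotype file's sample list), with 'x' -> NA.

    Raises ValueError if a genotype sample has no phenotype row, so
    misaligned files fail instead of silently shifting rows."""
    rows = {}
    for line in lines[1:]:  # skip the header row
        fields = line.rstrip("\n").split(",")
        rows[fields[0]] = ["NA" if v == "x" else v for v in fields[1:]]
    missing = [s for s in sample_order if s not in rows]
    if missing:
        raise ValueError(f"no phenotypes for: {missing}")
    return ["\t".join(rows[s]) for s in sample_order]
```

The column to analyze is then picked at GEMMA run time with the -n switch, as in the commands shown in these notes.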
We ended up with 237 genometypes/samples to match the genotype file and all of Dave's samples matched. Also, now I was able to run GEMMA successfully and passed in the pheno column number with + +``` +gemma -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 +gemma -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -k output/result.cXX.txt -n 5 +``` + +The pheno file can include the sample names as long as there are no spaces in them. For marker rs3718618 we get values -9 0 X Y 0.317 7.930689e+02 1.779940e+02 1.000000e+05 7.532662e-05. The last value translates to + +``` +-Math.log10(7.532662e-05) => 4.123051519468808 +``` + +and that matches GN's run of GEMMA without LOCO. + +The next step is to make the -n switch run with LOCO on gemma-wrapper. + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --keep --force --json --loco --input K.json -- -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > GWA.json +``` + +Checking the output we get + +``` +-Math.log10(3.191755e-05) => 4.495970452606926 +``` + +and that matches Dave's output for LOCO and marker rs3718618. All good, so far. Next step: permute. + +## Permute + +Now that we have gemma-wrapper working, we need to fix it to work with the latest type of files. + +* [X] randomize phenotypes using -n switch +* [X] Permute gemma and collect results +* [X] Unseed randomizer or make it an option +* [X] Fix tmpdir +* [X] Show final score +* [X] Compare small and large BXD set + +For the first one, the --permutate-phenotype switch takes the input pheno file. Because we pick a column with gemma we can randomize all input lines together. So, in the above example, we shuffle BXD_pheno_Dave-GEMMA.txt. Interestingly it looks like we are already shuffling by line in gemma-wrapper.
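The by-line shuffling described above can be sketched like this (a minimal illustration in Python; gemma-wrapper itself is Ruby and its actual code differs). The key point is that the sample-name column stays in place while the phenotype rows are permuted, which breaks any genotype-phenotype correlation:

```python
import random

def shuffle_phenotypes(lines, seed=None):
    """Shuffle phenotype rows against the (fixed) sample order.

    Each line is 'sample<TAB>value1<TAB>value2...'; the sample column
    keeps its order while the value rows are permuted."""
    rng = random.Random(seed)
    rows = [line.rstrip("\n").split("\t") for line in lines]
    samples = [r[0] for r in rows]
    values = [r[1:] for r in rows]
    rng.shuffle(values)
    return ["\t".join([s] + v) for s, v in zip(samples, values)]
```

Seeding the randomizer, as discussed further down, makes a permutation run reproducible while debugging.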
The good news is that it runs, but the outcome is wrong: + +``` +["95 percentile (significant) ", 1000.0, -3.0] +["67 percentile (suggestive) ", 1000.0, -3.0] +``` + +Inspecting the phenotype files, they are indeed shuffled, e.g. + +``` +BXD073xBXD065F1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA +BXD49 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA +BXD86 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA +BXD161 15.623 142.908997 4.0 2350.637939 3294.824951 1.452 2.08 20.416365 0.363636 142.909088 15.622727 4.0 2350.638672 3294.825928 1.451636 2.079909 0.363636 33.545448 2.125 2.0 2.375 1.25 44.5 +BXD154 20.143 195.5 4.75 1533.689941 4568.76416 0.727 2.213748 27.9275 0.75 195.5 20.142857 4.75 1533.690796 4568.76416 0.72675 2.213748 0.75 54.5 0.75 1.75 3.0 1.5 33.0 +``` + +which brings out an interesting point. Most BXDs in the genotype file are missing from this experiment. We are computing LOD scores as if we have a full BXD population. So, what we are saying here is that if we have all BXD genotypes and we randomly assign phenotypes against a subset, what is the chance we get a hit at random. I don't think this is a bad assumption, but it is not exactly what Gary Churchill had in mind in his 1994 paper: + +=> https://pubmed.ncbi.nlm.nih.gov/7851788/ Empirical threshold values for quantitative trait mapping + +The idea is to shuffle genotypes against phenotypes. If there is a high correlation we get a result. Permuting breaks that correlation, and that should work for both the large and the small BXD set. Scoring the best 'random' result out of 1000 permutations at, say, the 95% highest, sets the significance level. +With our new precompute we should be able to show the difference. Anyway, that is one problem; the other is that the stats somehow do not add up to the final result.
Score min is set at + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/7769f209bcaff2472ba185234fad47985e59e7a3/bin/gemma-wrapper#L667 + +The next line says 'if false'. Alright, that explains part of it at least as the next block was disabled for slurm and is never run. I should rip the slurm stuff out, actually, as Arun has come up with a much better solution. But that is for later. + +After disabling that, the permutation run stopped with + +``` +Add parallel job: time -v /bin/gemma -loco X -k 02fe8482913a998e6e9559ff5e3f1b89e904d59d.X.cXX.txt.cXX.txt -o 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt -p phenotypes-1 -lmm 9 -g BXD-test.txt -n 5 -a BXD.8_snps.txt -outdir /tmp/d20240823-4481-xfrnp6 +DEBUG: Reading 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt.1.assoc.txt +./bin/gemma-wrapper:672:in `foreach': No such file or directory @ rb_sysopen - 55b49eb774f638d16fd267313d8b4d1d6d2a0a25.X.assoc.txt.1.assoc.txt (Errno::ENOENT) +``` + +so it created a file, but can't find it because outdir is not shared. Now tmpdir is in the outer block so the file should still exist. For troubleshooting, the first step is to seed the randomizer (seed) so we get the same run every time. +It turns out there are a number of problems. First of all, the permutation output was numbered and the result was not found. Fixing that gave a first result without the -parallel switch: + +``` +[0.0008489742, 0.03214928, 0.03426648, 0.0351207, 0.0405179, 0.04688354, 0.0692488, 0.1217158, 0.1270747, 0.1880325] +["95 percentile (significant) ", 0.0008489742, 3.1] +["67 percentile (suggestive) ", 0.0351207, 1.5] +``` + +That is pleasing and it suggests that we have a significant result for the trait of interest: `volume of the first tumor that developed`. Running LOCO without parallel is slow (how did we survive in the past!).
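The percentile lines in the outputs above boil down to: sort the best (smallest) p-value of each permutation run, index at 5% and 33%, and convert to a LOD-style score with -log10. A sketch (the index choice is inferred from the outputs shown in these notes, not taken from gemma-wrapper's source, which may differ):

```python
import math

def permutation_thresholds(min_pvalues):
    """Derive significance thresholds from permutation results.

    min_pvalues holds the best (smallest) p-value of each permutation
    run; only 5% (resp. 33%) of random shuffles beat the returned
    significant (resp. suggestive) threshold."""
    ps = sorted(min_pvalues)
    n = len(ps)
    significant = ps[int(n * 0.05)]
    suggestive = ps[int(n * 0.33)]
    return (round(-math.log10(significant), 1),
            round(-math.log10(suggestive), 1))
```

Fed the ten p-values listed above, this returns (3.1, 1.5), matching that run.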
The 100-permutation run shows + +``` +[0.0001626146, 0.0001993085, 0.000652191, 0.0007356249, 0.0008489742, 0.0009828207, 0.00102203, 0.001091924, 0.00117823, 0.001282312, 0.001471041, 0.001663572, 0.001898194, 0.003467039, 0.004655921, 0.005284387, 0.005628393, 0.006319995, 0.006767502, 0.007752473, 0.008757406, 0.008826192, 0.009018125, 0.009735282, 0.01034488, 0.01039465, 0.0122644, 0.01231366, 0.01265093, 0.01317425, 0.01348443, 0.013548, 0.01399461, 0.01442383, 0.01534904, 0.01579931, 0.01668551, 0.01696015, 0.01770371, 0.01838937, 0.01883068, 0.02011034, 0.02234977, 0.02362105, 0.0242342, 0.02520063, 0.02536663, 0.0266905, 0.02932001, 0.03116032, 0.03139836, 0.03176087, 0.03214928, 0.03348359, 0.03426648, 0.0351207, 0.03538503, 0.0354338, 0.03609931, 0.0371134, 0.03739827, 0.03787489, 0.04022586, 0.0405179, 0.04056273, 0.04076034, 0.04545012, 0.04588635, 0.04688354, 0.04790254, 0.05871501, 0.05903692, 0.05904868, 0.05978341, 0.06103624, 0.06396175, 0.06628317, 0.06640048, 0.06676557, 0.06848021, 0.0692488, 0.07122914, 0.07166011, 0.0749728, 0.08174019, 0.08188341, 0.08647539, 0.0955264, 0.1019648, 0.1032776, 0.1169525, 0.1182405, 0.1217158, 0.1270747, 0.1316735, 0.1316905, 0.1392859, 0.1576149, 0.1685975, 0.1880325] +["95 percentile (significant) ", 0.0009828207, 3.0] +["67 percentile (suggestive) ", 0.01442383, 1.8] +``` + +Not too far off! + +The command was + +``` +./bin/gemma-wrapper --debug --no-parallel --keep --force --json --loco --input K.json --permutate 100 --permute-phenotype BXD_pheno_Dave-GEMMA.txt -- -lmm 9 -g BXD-test.txt -n 5 -a BXD.8_snps.txt +``` + +It is fun to see that when I did a second run the + +``` +[100, ["95 percentile (significant) ", 0.0002998286, 3.5], ["67 percentile (suggestive) ", 0.01167864, 1.9]] +``` + +significance value was 3.5. Still, our hit is a whopper - based on this.
+ +## Run permutations in parallel + +Next I introduced and fixed parallel support for permutations; now we can run gemma LOCO with decent speed - about 1 permutation per 3s! That is one trait in an hour on my machine. + +=> https://github.com/genetics-statistics/gemma-wrapper/commit/a8d3922a21c7807a9f20cf9ffb62d8b16f18c591 + +Now we can run 1000 permutations in an hour; rerunning the above we get + +``` +["95 percentile (significant) ", 0.0006983356, 3.2] +["67 percentile (suggestive) ", 0.01200505, 1.9] +``` + +which proves that 100 permutations are not enough. It is a bit crazy to think that 5% of randomized phenotypes will get a LOD score of 3.2 or higher! + +Down the line I can use Arun's CWL implementation to fire this on a cluster. Coming... + +## Reduce genotypes for permutations + +In the next phase we need to check if shuffling the full set of BXDs makes sense for computing permutations. Since I wrote a script for this exercise to transform BIMBAM genotypes I can reuse that: + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/a8d3922a21c7807a9f20cf9ffb62d8b16f18c591/bin/gn-geno-to-gemma.py#L31 + +If we check the sample names we can write a reduced genotype matrix. Use that to compute the GRM. Next permute with the smaller BXD sample set and genotypes. + +Instead of modifying the above script I decided to add another one + +``` +bimbam-filter.py --json BXD.geno.json --sample-file BXD_pheno_Dave-GEMMA-samples.txt BXD_geno.txt > BXD_geno-samples.txt +``` + +which takes as inputs the json file from gn-geno-to-gemma and the GEMMA input file. This keeps targets separate and the code simple.
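What bimbam-filter.py does amounts to dropping genotype columns for samples that are not phenotyped. A sketch (illustrative, not the script's actual code; the three leading BIMBAM columns are the marker name and the two alleles, per the GEMMA manual):

```python
def filter_bimbam(geno_lines, all_samples, keep_samples):
    """Keep only the genotype columns of phenotyped samples.

    BIMBAM rows are: marker, major allele, minor allele, then one
    genotype value per sample, ordered as in all_samples (the sample
    list from the gn-geno-to-gemma JSON)."""
    wanted = set(keep_samples)
    keep = [i for i, s in enumerate(all_samples) if s in wanted]
    out = []
    for line in geno_lines:
        fields = [f.strip() for f in line.rstrip("\n").split(",")]
        meta, values = fields[:3], fields[3:]
        out.append(", ".join(meta + [values[i] for i in keep]))
    return out
```

The reduced matrix keeps the genotype columns in the same order as the phenotype rows, which is exactly the alignment GEMMA needs.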
Now create the GRM with + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD_geno-samples.txt -p BXD_pheno_Dave-GEMMA-samples.txt -n 5 -a BXD.8_snps.txt > K-samples.json +./bin/gemma-wrapper --keep --force --json --loco --input K-samples.json -- -lmm 9 -g BXD_geno-samples.txt -p BXD_pheno_Dave-GEMMA-samples.txt -n 5 -a BXD.8_snps.txt > GWA-samples.json +``` + +Now the hit got reduced: + +``` +-Math.log10(1.111411e-04) +=> 3.9541253091741235 +``` + +and with 1000 permutations + +``` +./bin/gemma-wrapper --debug --parallel --keep --force --json --loco --input K-samples.json --permutate 1000 --permute-phenotype BXD_pheno_Dave-GEMMA-samples.txt -- -lmm 9 -g BXD_geno-samples.txt -n 5 -a BXD.8_snps.txt +["95 percentile (significant) ", 0.0004184217, 3.4] +["67 percentile (suggestive) ", 0.006213012, 2.2] +``` + +we are still significant. Though the question is now why results differ so much, compared to using the full BXD genotypes. + +## Why do we have a difference with the full BXD genotypes? + +GEMMA strips out the missing phenotypes in a list. Only the actual phenotypes are used. We need to check how the GRM is used and what genotypes are used by GEMMA. For the GRM the small genotype file compares vs the large: + +``` +Samples small large +BXD1 <-> BXD1 0.248 0.253 +BXD24 <-> BXD24 0.255 0.248 +BXD1 <-> BXD24 -0.040 -0.045 +BXD1 <-> BXD29 0.010 0.009 +``` + +You can see there is a small difference in the computation of K even though it looks pretty close. This is logical because with the full BXD set all genotypes are used. With a smaller BXD set only those genotypes are used. We expect a difference in values, but not much of a difference in magnitude (shift). The only way to prove that K impacts the outcome is to take the larger matrix and reduce it to the smaller one using those values. I feel another script coming ;) + +Above numbers are without LOCO. 
With LOCO on CHR18
+
+```
+Samples small large
+BXD1 <-> BXD1 0.254 0.248
+BXD1 <-> BXD24 -0.037 -0.042
+```
+
+again a small shift. OK, let's try computing with a reduced matrix and compare results for rs3718618. Example:
+
+```
+gemma -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt -o full-bxd
+gemma -lmm 9 -k output/full-bxd.cXX.txt -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt -o full-bxd
+```
+
+we get three outcomes, where full-bxd uses the full set, full-reduced-bxd the full GRM reduced to the used samples, and small-bxd the GRM computed from the small set:
+```
+output/full-bxd.assoc.txt:18 rs3718618 7.532662e-05
+output/full-reduced-bxd.assoc.txt:18 rs3718618 2.336439e-04
+output/small-bxd.assoc.txt:18 rs3718618 2.338226e-04
+```
+
+even without LOCO you can see a huge jump for the full BXD kinship matrix, just looking at our hit rs3718618:
+
+```
+-Math.log10(7.532662e-05)
+=> 4.123051519468808
+-Math.log10(2.338226e-04)
+=> 3.631113514641496
+```
+
+With LOCO the difference may be even greater.
+
+So, which one to use? Truth is that the GRM is a blunt instrument. Essentially every combination of two samples/strains/genometypes gets compressed into a single number that gives a distance between the genomes. This number represents a hierarchy of relationships computed from differences in DNA (haplotypes) between those individuals. The more DNA variation is represented in the calculation, the more 'fine tuned' this GRM matrix becomes. Intuitively the larger matrix, from the full BXD population, is a better estimate of distance between the individuals than one using just a subset of DNA.
+
+So, I still endorse using the full BXD set for computing the GRM. To run GEMMA, I have just proven we can use the reduced GRM, which will be quite a bit faster too, as the results are practically the same. For permutations we *should* use the reduced form of the full BXD GRM as it does not make sense to shuffle phenotypes against BXDs we don't use. So I need to recompute that.
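Reducing the larger matrix to the smaller one is just selecting matching rows and columns. A pure-Python sketch of the idea (sample ordering assumed to match the GRM):

```python
def reduce_grm(grm, samples, keep):
    """Cut a full-population GRM (list of rows) down to a sample subset
    by selecting the matching rows and columns."""
    idx = [samples.index(s) for s in keep]
    return [[grm[i][j] for j in idx] for i in idx]

# Toy 3x3 kinship: keep BXD1 and BXD29, drop BXD24.
k = [[0.25, -0.04, 0.01],
     [-0.04, 0.26, 0.02],
     [0.01, 0.02, 0.24]]
small = reduce_grm(k, ["BXD1", "BXD24", "BXD29"], ["BXD1", "BXD29"])
# small == [[0.25, 0.01], [0.01, 0.24]]
```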
+
+## Recomputing significance with the reduced GRM matrix
+
+* [ ] Recompute significance with reduced GRM
+
+I can reuse the script I wrote for the previous section.
+
+=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/grm-filter.py
+
+So, the idea is to rerun permutations with the small set, but with the reduced GRM from the full BXD population. That ought to be straightforward by using the new matrix as an input for GWA. The only problem is that LOCO generates a GRM for every chromosome, so we need to make gemma-wrapper aware of the matrix reduction. As the reduction is fast we can do it for every run of gemma-wrapper and clean it up automatically with tmpdir. So:
+
+* [X] Compute the full GRM for every LOCO (if not cached) - already part of gemma-wrapper
+* [X] Run through GRMs and reduce them in tmpdir
+* [X] Plug new GRM name into computations - which really updates the JSON file that is input for GWA
+
+The interesting bit is that GEMMA requires input of phenotypes, but does not use them to compute the GRM.
+
+After giving it some thought we want GRM reduction to work in production GN because of the speed benefit. That means modifying gemma-wrapper to take a list of samples/genometypes as input - and we'll output that with GN. It is a good idea anyhow because it can give us some improved error feedback down the line.
+
+We'll use the --input switch to gemma-wrapper by providing the full list of genometypes that are used to compute the GRM and the 'reduced' list of genometypes that are used to reduce the GRM and compute GWA after.
+So the first step is to create this JSON input file. We already created the "gn-geno-to-gemma" output that has a full list of samples as parsed from the GN .geno file. Now we need a script to generate the reduced samples JSON and merge that into "gn-geno-to-gemma-reduced" by adding a "samples-reduced" vector.
+
+The rqtl2-pheno-to-gemma.py script I wrote above already takes the "gn-geno-to-gemma" JSON.
It now adds to the JSON: + +``` + "samples-column": 2, + "samples-reduced": { + "BXD1": 18.5, + "BXD24": 27.510204, + "BXD29": 17.204, + "BXD43": 21.825397, + "BXD44": 23.454, + "BXD60": 22.604, + "BXD63": 19.171, + "BXD65": 21.607, + "BXD66": 17.056999, + "BXD70": 17.962999, + "BXD73b": 20.231001, + "BXD75": 19.952999, + "BXD78": 19.514, + "BXD83": 18.031, + "BXD87": 18.258715, + "BXD89": 18.365, + "BXD90": 20.489796, + "BXD101": 20.6, + "BXD102": 18.785, + "BXD113": 24.52, + "BXD124": 21.762142, + "BXD128a": 18.952, + "BXD154": 20.143, + "BXD161": 15.623, + "BXD210": 23.771999, + "BXD214": 19.533117 + }, + "numsamples-reduced": 26 +``` + +which is kinda cool because now I can reduce and write the pheno file in one go. Implementation: + +=> https://github.com/genetics-statistics/gemma-wrapper/blob/master/bin/rqtl2-pheno-to-gemma.py + +OK, we are going to input the resulting JSON file into gemma-wrapper. At the GRM stage we ignore the reduction but we need to add these details to the outgoing JSON. So the following commands can run: + +``` +./bin/gemma-wrapper --loco --json --input BXD_pheno_Dave-GEMMA.txt.json -- -gk -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > K.json +``` + +where K.json has a json["input"] which essentially is above structure. + +``` +./bin/gemma-wrapper --keep --force --json --loco --input K.json -- -lmm 9 -g BXD-test.txt -p BXD_pheno_Dave-GEMMA.txt -n 5 -a BXD.8_snps.txt > GWA.json +``` + +Now I have to deal with phenotype files as they are rewritten. We should still cater for `-p` for GEMMA. We already have `--permute-phenotypes filen` for gemma-wrapper. Now we are adding `--phenotypes` to gemma-wrapper which replaces both! +Note that we can use -p if --phenotypes is NOT defined. 
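Building the "samples-reduced" structure boils down to intersecting the genotyped sample list with the phenotype rows. A sketch of that merge (field names mirror the JSON above; the function name is illustrative, not the actual script):

```python
def reduce_samples(geno_samples, phenotypes, column):
    """Keep only genotyped samples that actually have a phenotype value,
    preserving the genotype-file sample order."""
    reduced = {s: phenotypes[s] for s in geno_samples if s in phenotypes}
    return {
        "samples-column": column,
        "samples-reduced": reduced,
        "numsamples-reduced": len(reduced),
    }

doc = reduce_samples(["BXD1", "BXD24", "BXD999"],
                     {"BXD1": 18.5, "BXD24": 27.510204}, 2)
# doc["numsamples-reduced"] == 2; BXD999 has no phenotype and is dropped
```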
Problem is we have a few paths now:
+
+* [X] Check phenotypes are directly passed into GEMMA with the -p switch
+* [X] Check phenotypes are passed in as a file with the --phenotypes switch
+* [X] Check phenotypes are coming in through the JSON file
+
+Fixed the first one with
+
+=> https://github.com/genetics-statistics/gemma-wrapper/commit/2b7570a7f0ba0d1080c730b208823c0622dd8f2c
+
+though that does not do caching (yet). Next, while doing LOCO, I noticed xz was phenomenally slow. Turns out it was not xz itself: when using `tar -C` we switch into the path and somehow xz kept growing its output.
+
+At this point David told me that we don't have to do epoch or covariates. So it is just the traits. After getting side-tracked by a slow-running Python program for haplotype assessment we start up again.
+
+So, now we can pass in a trait using JSON. This is probably not a great idea when you have a million values, but for our purposes it will do. K.json contains the reduced samples. Next, GWA is run on that. I had to fix minor niggles and get `parallel` to give more useful debug info.
+
+Next write the pheno file and pass it in!
+
+```
+./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g test/data/input/BXD_geno.txt.gz -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -n 2 -debug
+```
+
+note the '-n 2' switch to get the second generated column in the phenotype file. We had our first successful run! To run permutations I get:
+
+```
+./bin/gemma-wrapper:722:in `<main>': You should supply --permute-phenotypes with gemma-wrapper --permutate (RuntimeError)
+```
+
+and, of course, as this reduced file is generated it is not available yet. That was an easy fix/hack. Next I got
+
+```
+./bin/gemma-wrapper:230:in `block in <main>': Do not use the GEMMA -p switch with gemma-wrapper if you are using JSON phenotypes!
+```
+
+Hmm. This is a bit harder. The call to GWAS takes a kinship matrix and it gets reduced with every permutation.
That is probably OK because it runs quickly, but I'll need to remove the -p switch... OK. Done that, and permutations now run in a second for 28 BXD! That means computing significance in the web service comes into view - especially if we use a cluster on the backend.
+
+It is interesting to see that 60% of time is spent in the kernel - which means still heavy IO on GEMMA's end - even with the reduced data:
+
+```
+%Cpu0 : 39.1 us, 51.0 sy
+%Cpu1 : 34.0 us, 54.8 sy
+%Cpu2 : 35.8 us, 54.5 sy
+%Cpu3 : 37.5 us, 49.8 sy
+%Cpu4 : 36.0 us, 53.3 sy
+%Cpu5 : 29.5 us, 57.9 sy
+%Cpu6 : 42.7 us, 44.7 sy
+%Cpu7 : 35.9 us, 52.2 sy
+%Cpu8 : 27.0 us, 60.7 sy
+%Cpu9 : 24.5 us, 63.2 sy
+%Cpu10 : 29.8 us, 58.9 sy
+%Cpu11 : 25.3 us, 62.7 sy
+%Cpu12 : 28.1 us, 58.9 sy
+%Cpu13 : 34.2 us, 52.8 sy
+%Cpu14 : 34.6 us, 52.2 sy
+%Cpu15 : 37.5 us, 51.8 sy
+```
+
+There is room for more optimization.
+
+The good news is that for a peak we have, we find it is statistically significant:
+
+```
+["95 percentile (significant) ", 0.0004945423, 3.3]
+["67 percentile (suggestive) ", 0.009975183, 2.0]
+```
+
+Even though the permutation count was low, there was actually a real bug: it turns out I only picked the values from the X chromosome (ugh!). It looks different now.
+
+For the peaks of
+
+=> https://genenetwork.org/show_trait?trait_id=21526&dataset=BXDPublish
+
+after 1000 permutations (I tried a few times) the significance threshold with MAF 0.05 ends up at approx.
+
+```
+["95 percentile (significant) ", 1.434302e-05, 4.8]
+["67 percentile (suggestive) ", 0.0001620244, 3.8]
+```
+
+If so, it means that for this trait (BXD_21526) the peaks on chr 14 at LOD 3.5 are not significant, but close to suggestive (aligning with Dave's findings and comments). It is interesting to see the numbers quickly stabilize by 100 permutations (see attached). Now, this is before correcting for epoch effects and other covariates. And I took the data from Dave as is (the distribution looks fairly normal).
Also there is a problem with MAF I have to look into: + +GEMMA in GN2 shows the same result when setting MAF to 0.05 or 0.1 (you can try that). The GN2 GEMMA code for LOCO does pass in -maf (though I see that non-LOCO does not - ugh again). I need to run GEMMA to see if the output should differ and I'll need to see the GN2 logs to understand what is happening. Maybe it just says that the hits are haplotype driven - and that kinda makes sense because there is a range of them. + +That leads me to think that we only need to check for epoch when we have a single *low* MAF hit, say 0.01 for 28 mice. As we actively filter on MAF right now we won't likely see an epoch hit. + + +## Protocol for permutations + +First we run GEMMA just without LOCO using default settings that GN uses + +``` +# Convert the GN geno file to BIMBAM geno file +./bin/gn-geno-to-gemma.py BXD.geno > BXD.geno.txt +# Match pheno file +./bin/rqtl2-pheno-to-gemma.py BXD_pheno_Dave.csv --json BXD.geno.json > BXD_pheno_matched.txt + Wrote GEMMA pheno 237 from 237 with genometypes (rows) and 24 collections (cols)! +gemma -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 +gemma -lmm 9 -g BXD.geno.txt -p BXD_pheno_matched.txt -k output/result.cXX.txt -n 5 +``` + +So far the output is correct. + +``` +-Math.log10(7.532460e-05) +=> 4.123063165904243 +``` + +Try with gemma-wrapper + +``` +./bin/gemma-wrapper --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +cp output/bab43175329bd14d485e582b7ad890cf0ec28915.cXX.txt /tmp +``` + +Works, but the following failed without the -n switch: + +``` +./bin/gemma-wrapper --debug --verbose --force --json --lmdb --input K.json -- -g BXD.geno.txt -a BXD.8_snps.txt -lmm 9 -p BXD_pheno_matched.txt -n 5 +``` + +and worked with. 
That is logical: if you see output like
+
+```
+19 rs30886715 46903165 0 X Y 0.536 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00
+19 rs6376540 46905638 0 X Y 0.536 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00
+19 rs50610897 47412184 0 X Y 0.538 0.000000e+00 0.000000e+00 1.000000e-05 1.000000e+00
+```
+
+it means the phenotype column that was parsed has empty values - in this case it picked up the BXD strain names. GEMMA should show a meaningful error.
+
+Now that that works, we can move to a full LOCO
+
+
+```
+./bin/gemma-wrapper --loco --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json
+./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt -n 5
+./bin/./bin/view-gemma-mdb --sort /tmp/test/ca55b05e8b48fb139179fe09c35cff0340fe13bc.mdb
+```
+
+and we get
+
+```
+18,69216071,rs3718618,0.635,-195.5784,82.1243,100000.0,0.0,4.5
+18,69825784,rs50446650,0.635,-195.5784,82.1243,100000.0,0.0,4.5
+18,68189477,rs29539715,0.596,-189.7332,79.7479,100000.0,0.0,4.49
+```
+
+When we converted BXD.geno to its BIMBAM version BXD.geno.txt we also got a BXD.geno.json file which contains the list of individuals/genometypes used in the genotype file.
+
+Now we reduce the traits file to something GEMMA can use for permutations - adding the trait number and outputting BXD_pheno_Dave.csv.json:
+
+```sh
+./bin/rqtl2-pheno-to-gemma.py BXD_pheno_Dave.csv --json BXD.geno.json -n 5 > BXD_pheno_matched-5.txt
+```
+
+The matched file should be identical to the earlier BXD_pheno_matched.txt file. Meanwhile, if you inspect the JSON file you should see
+
+```
+jq < BXD_pheno_Dave.csv.json
+ "samples-column": 5,
+ "trait": "21529",
+ "samples-reduced": {
+ "BXD1": 1919.450806,
+ "BXD101": 2546.293945,
+ "BXD102": 1745.671997,
+```
+
+So far we are OK!
+
+At this point we have a reduced sample set, a BIMBAM file and a phenotype file GEMMA can use!
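Rows with beta and se both 0.0 and p_lrt 1.0, as shown above, are a reliable tell that the parsed phenotype column was empty or non-numeric. A small sanity check one could run over the assoc output (column positions taken from the -lmm 9 output in this post):

```python
def looks_degenerate(assoc_lines, frac=0.9):
    """Flag a GEMMA assoc file where nearly every row has beta == 0 and
    p_lrt == 1 - the signature of an empty/non-numeric phenotype column."""
    bad = total = 0
    for line in assoc_lines:
        cols = line.split()
        if len(cols) < 11:
            continue  # skip headers and short lines
        total += 1
        beta, p_lrt = float(cols[7]), float(cols[10])
        if beta == 0.0 and p_lrt == 1.0:
            bad += 1
    return total > 0 and bad >= frac * total
```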
+
+```
+./bin/gemma-wrapper --loco --json --input BXD_pheno_Dave.csv.json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -a BXD.8_snps.txt -n 5 > K.json
+```
+
+Note that at this step we actually create a full GRM. Reducing happens in the next mapping stage.
+
+```
+./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05
+```
+
+Note the use of the '-n' switch. We should change that.
+
+```
+./bin/./bin/view-gemma-mdb /tmp/test/8599834ee474b9da9ff39cc4954d662518a6b5c8.mdb --sort
+```
+
+Looking for rs3718618 at 69216071, I am currently getting the wrong result for trait 21529, and it is not clear why:
+
+```
+chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP
+16,88032783,?,0.538,-134.1339,75.7837,0.0,0.0009,3.02
+16,88038734,?,0.538,-134.1339,75.7837,0.0,0.0009,3.02
+(...)
+18,69216071,?,0.462,10.8099,93.3936,0.0,0.8097,0.09
+```
+
+The failing command is:
+
+```
+/bin/gemma -loco 18 -k /tmp/test/reduced-GRM-18.txt.tmp -o 69170e8a2d2f08905daa14461eca1d82a676b4c4.18.assoc.txt -p /tmp/test/reduced-pheno.txt.tmp -n 2 -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05 -outdir /tmp/test
+```
+
+which produces
+
+```
+18 rs3718618 69216071 0 X Y 0.462 -2.161984e+01 9.339365e+01 1.000000e-05 8.097026e-01
+```
+
+The pheno file looks correct, so it has to be the reduced GRM. And this does not look good either:
+
+```
+number of SNPS for K = 7070
+number of SNPS for GWAS = 250
+```
+
+When running GEMMA on genenetwork.org we get a peak for LOCO at that position for rs3718618. I note that the LOCO version, at 4.5 vs 4.1 for non-LOCO, has the higher peak. We should compute the significance for both!
+
+Now, when I run the non-LOCO version by hand I get
+
+```
+-Math.log10(7.532460e-05)
+=> 4.123063165904243
+```
+
+## Finally
+
+So, we rolled back to not using reduced phenotypes for now.
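The "number of SNPS for K = 7070" vs "GWAS = 250" mismatch is exactly the kind of thing a log check can guard against. A sketch that parses those counts from GEMMA's debug output (regex keyed to the log lines shown in this post):

```python
import re

def snp_counts(log_text):
    """Pull the '## number of SNPS for K/GWAS' counts out of a GEMMA log."""
    counts = {}
    for kind in ("K", "GWAS"):
        m = re.search(r"number of SNPS for %s\s*=\s*(\d+)" % kind, log_text)
        if m:
            counts[kind] = int(m.group(1))
    return counts

def suspicious(counts, min_ratio=0.5):
    """True when the GWAS scan kept only a sliver of the GRM's SNPs."""
    return {"K", "GWAS"} <= counts.keys() and \
        counts["GWAS"] < min_ratio * counts["K"]
```

The 0.5 ratio is an arbitrary tripwire, but it would have caught the 7070-vs-250 run above.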
+
+For trait 21529, after 1000 permutations we get for LOCO:
+
+```
+["95 percentile (significant) ", 1.051208e-05, 5.0]
+["67 percentile (suggestive) ", 0.0001483188, 3.8]
+```
+
+which means our GWA hit at 4.5 is not that close to being significant.
+
+Next I made sure the phenotypes got shuffled against the BXDs actually used - which is arguably the right thing to do.
+It should not have a huge impact because the BXDs share haplotypes - so randomized associations should end up in the same ball park. The new result after 1000 permutations is:
+
+```
+["95 percentile (significant) ", 8.799303e-06, 5.1]
+["67 percentile (suggestive) ", 0.0001048443, 4.0]
+```
+
+## More for Dave
+
+
+Run and permute:
+
+```
+./bin/gemma-wrapper --lmdb --debug --phenotypes BXD_pheno_matched.txt --verbose --force --loco --json --input K.json -- -g BXD.geno.txt -a BXD.8. -lmm 9 -maf 0.05 -n 2 -p BXD_pheno_matched.txt
+./bin/gemma-wrapper --debug --phenotypes BXD_pheno_matched.txt --permutate 1000 --phenotype-column 2 --verbose --force --loco --json --input K.json -- -g BXD.geno.txt -a test/data/input/BXD_snps.txt -lmm 9 -maf 0.05
+```
+
+```
+21526 How old was the mouse when a tumor was first detected?
+chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP
+14,99632276,?,0.462,-0.6627,0.3322,100000.0,0.0003,3.56
+14,99694520,?,0.462,-0.6627,0.3322,100000.0,0.0003,3.56
+17,80952261,?,0.538,0.6528,0.3451,100000.0,0.0005,3.31
+["95 percentile (significant) ", 6.352578e-06, 5.2]
+["67 percentile (suggestive) ", 0.0001007502, 4.0]
+```
+
+```
+24406 What was the weight of the first tumor that developed, at death?
+chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +11,9032629,?,0.536,0.1293,0.0562,100000.0,0.0,4.36 +11,9165457,?,0.536,0.1293,0.0562,100000.0,0.0,4.36 +11,11152439,?,0.5,0.126,0.0562,100000.0,0.0001,4.21 +11,11171143,?,0.5,0.126,0.0562,100000.0,0.0001,4.21 +11,11525458,?,0.5,0.126,0.0562,100000.0,0.0001,4.21 +11,8786241,?,0.571,0.1203,0.0581,100000.0,0.0002,3.78 +11,8836726,?,0.571,0.1203,0.0581,100000.0,0.0002,3.78 +11,19745817,?,0.536,0.1183,0.061,100000.0,0.0003,3.46 +11,19833554,?,0.536,0.1183,0.061,100000.0,0.0003,3.46 +["95 percentile (significant) ", 1.172001e-05, 4.9] +["67 percentile (suggestive) ", 0.0001175644, 3.9] +``` + +``` +27515 No description +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +4,103682035,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,103875085,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104004372,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104156915,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104166428,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,104584276,?,0.481,-0.1653,0.0585,100000.0,0.0,5.57 +4,103634906,?,0.519,-0.1497,0.0733,100000.0,0.0002,3.67 +4,103640707,?,0.519,-0.1497,0.0733,100000.0,0.0002,3.67 +["95 percentile (significant) ", 7.501004e-06, 5.1] +["67 percentile (suggestive) ", 7.804668e-05, 4.1] +``` + +## Dealing with significance + +Now the significance thresholds appear to be a bit higher than we expect. So, let's see what is going on. First I check the randomization of the phenotypes. That looks great. There are 1000 different phenotype files and they randomized only the BXD we used. Let's zoom in on our most interesting 27515. When running in GN2 I get more hits - they are at the same level, but somehow SNPs have dropped off. 
In those runs our SNP of interest shows only a few higher values: + +``` +./6abd89211d93b0d03dc4281ac3a0abe7fc10da46.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.932957e-01 7.337327e-02 1.000000e+05 2.700506e-04 +./b6e58d6092987d0c23ae1735d11d4a293782c511.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.413067e-01 6.416133e-02 1.000000e+05 5.188637e-04 +./4266656951ab0c5f3097ddb4bf917448d7542dd5.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 2.757074e-01 6.815899e-02 1.000000e+05 2.365318e-04 +./265e44a4c078d2a608b7117bbdcb9be36f56c7de.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 2.358494e-01 5.743872e-02 1.000000e+05 1.996261e-04 +napoli:/export/local/home/wrk/iwrk/opensource/code/genetics/gemma-wrapper/tmp/test$ rg 103682035 .|grep 5$ +./b29f08a4b1061301d52f939087f1a4c1376256f0.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.841255e-01 6.194426e-02 1.000000e+05 5.220922e-05 +./3e5b12e9b7478b127b47c23ccdfba2127cf7e2b2.4.assoc.txt.assoc.txt:4 rs28166983 103682035 0 X Y 0.481 -2.813968e-01 6.379554e-02 1.000000e+05 8.533857e-05 +``` + +but none as high as the original hit of 5.57 + +``` +irb(main):001:0> -Math.log10(2.700506e-04) +=> 3.5685548534637 +irb(main):002:0> -Math.log10(5.220922e-05) +=> 4.282252795052573 +irb(main):003:0> -Math.log10(8.533857e-05) +=> 4.06885463879464 +``` + +All good. This leaves two things to look into. First, I see less hits than with GN2(!). Second, qnorm gives a higher peak in GN2. + +* [X] Check for number of SNPs + +The number of SNPs is not enough: + +``` +GEMMA 0.98.6 (2022-08-05) by Xiang Zhou, Pjotr Prins and team (C) 2012-2022 +Reading Files ... 
+## number of total individuals = 237 +## number of analyzed individuals = 26 +## number of covariates = 1 +## number of phenotypes = 1 +## leave one chromosome out (LOCO) = 1 +## number of total SNPs/var = 21056 +## number of SNPS for K = 6684 +## number of SNPS for GWAS = 636 +## number of analyzed SNPs = 21056 +``` + +Even when disabling MAF filtering we still see a subset of SNPs. I am wondering what GN2 does here. + +## Missing SNPs + +In our results we miss SNPs that are listed on GN2, but do appear in our genotypes, e.g. + +``` +BXD.8_snps.txt +19463:rsm10000013598, 69448067, 18 +``` + +First of all we find we used a total of 6360 SNPs out of the original 21056. For this SNP the genotype files show: + +``` +BXD_geno.txt +19463:rsm10000013598, X, Y, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 0.5, 1, 1, 1, 1, 0, 1, 0, 1, 0.5, 0, 0, 0, 1, 0.5, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0.5, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0, 0, 0, 1, 1, 1, 0.5, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0.5, 0.5, 0, 0.5, 0.5, 0.5, 0, 0.5, 0.5, 0.5, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0.5, 1, 0.5, 0, 0.5 +``` + +and in our updated + +``` +BXD.geno.txt 
+rsm10000013598,X,Y,2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1 +``` + +That looks good. Turns out we need the annotation file(?!) + +I figured out where the missing SNPs went. Turns out that, if you pass in an annotation file, and if it is not complete, GEMMA drops the non-annotated SNPs unceremoniously. Getting the right annotation file fixed it. GEMMA should obviously not behave like that ;). Anyway, I am in sync with GN2 now. Unfortunately, with permutations, the significance threshold did not change much (which kinda makes sense). + +I want to see why gemma is giving this number. If I can't find it fast I'll try to run bulklmm or R/qtl2 lmm instead and see if they disagree with gemma and if we can get close to what Rob expects. 
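Given that GEMMA silently drops SNPs missing from the annotation file, a pre-flight completeness check is cheap insurance. A sketch (assuming the first comma-separated field of both the BIMBAM genotype file and the annotation file is the marker name, as in the files above):

```python
def unannotated_markers(geno_lines, anno_lines):
    """Markers present in the BIMBAM genotype file but absent from the
    annotation file - GEMMA drops these without warning."""
    annotated = {line.split(",")[0].strip() for line in anno_lines}
    return [name for name in
            (line.split(",")[0].strip() for line in geno_lines)
            if name not in annotated]

missing = unannotated_markers(
    ["rs3718618,X,Y,1,0", "rsm10000013598,X,Y,2,0"],
    ["rs3718618, 69216071, 18"])
# missing == ["rsm10000013598"]
```

An empty result means no SNPs will be dropped for lack of annotation.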
+ + +``` +gemma -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 22 +gemma -lmm 9 -g BXD.geno.txt -p BXD_pheno_matched.txt -k output/result.cXX.txt -n 22 +``` + +Now that works we can move to a full LOCO + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD.geno.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt -n 5 +./bin/./bin/view-gemma-mdb --sort /tmp/test/ca55b05e8b48fb139179fe09c35cff0340fe13bc.mdb +``` + +and we get + +``` +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +18,69216071,rs3718618,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69448067,rsm10000013598,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69463065,rsm10000013599,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69803489,rsm10000013600,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69825784,rs50446650,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,69836387,rsm10000013601,0.635,-195.5784,82.1243,100000.0,0.0,4.5 +18,68188822,rsm10000013579,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68189477,rs29539715,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68195226,rsm10000013580,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68195289,rsm10000013581,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68195758,rsm10000013582,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68454446,rs30216358,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68514475,rs6346101,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68521138,rsm10000013583,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68526029,rs29984158,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68542739,rsm10000013584,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68543456,rsm10000013585,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68564736,rsm10000013586,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +18,68565230,rsm10000013587,0.596,-189.7332,79.7479,100000.0,0.0,4.49 +``` + +which is in line with GN2. 
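To stay in sync with GN2 it helps to diff two result sets marker by marker - both for drifting -logP values and for silently dropped SNPs. A sketch, assuming each run is summarized as a {marker: -logP} dict:

```python
def compare_hits(a, b, tol=0.05):
    """Compare two GWA runs: markers whose -logP drifts by more than
    `tol`, plus markers present in only one of the runs."""
    drifted = {m: (a[m], b[m]) for m in a.keys() & b.keys()
               if abs(a[m] - b[m]) > tol}
    return drifted, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())

ours = {"rs3718618": 4.5, "rs29539715": 4.49}
gn2 = {"rs3718618": 4.5, "rs29539715": 4.49, "rsm10000013598": 4.5}
drifted, only_ours, only_gn2 = compare_hits(ours, gn2)
# drifted == {}; only_gn2 == ["rsm10000013598"]
```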
+
+Run and permute:
+
+```
+./bin/gemma-wrapper --debug --phenotypes BXD_pheno_matched.txt --permutate 1000 --phenotype-column 2 --verbose --force --loco --json --input K.json -- -g BXD.geno.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05
+```
+
+* [X] Test significance effect for higher and lower MAF than 0.05
+
+Lower MAF increases significance thresholds?
+
+```
+0.05?
+["95 percentile (significant) ", 6.268117e-06, 5.2]
+["67 percentile (suggestive) ", 7.457537e-05, 4.1]
+
+0.01
+["95 percentile (significant) ", 5.871237e-06, 5.2]
+["67 percentile (suggestive) ", 7.046853e-05, 4.2]
+```
+
+* [ ] Check distribution of hits with permutations
+
+## What about significance
+
+What we are trying to do here is to decide on a significance level that says that the chance of a hit caused by a random event is less than 1 in a thousand. We are currently finding levels of 5.0, and from earlier work it should be less than 4.0. We are essentially following Churchill & Doerge's '94 paper "Empirical threshold values for quantitative trait mapping". The significance level depends on the shape of the data - i.e., the shape of both the genotypes and the trait under study. If the significance level is 5.0 it means alpha=0.05: 5% of random trait vectors can be expected to show a LOD score of 5 or higher.
+
+What GEMMA does is look for a correlation between a marker, e.g.
+ +``` +BXD.geno.txt +rsm10000013598,X,Y,2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1 +``` + +and a trait that is measured for a limited number against these individuals/strains/genometypes. We also correct for kinship between the individuals, but that is tied to the individuals, so we can ignore that for now. So you get a vector of: + +``` +marker rsm10000013598 +ind trait +0 8.1 +0 7.9 +2 12.3 +2 13.4 +``` + +We permute the data after breaking the correlation between left and right columns. When running 1000 permutations for this particular hit we find that the shuffled never gets a higher value then for our main run. That is comforting because random permutations are always less correlated (for this marker). + +If we do this genome-wide we also see a randomly positioned highest hit across all chromosomes after shuffling the trait vector and our hit never appears the highest. E.g. + +``` +[10, ["2", "rs13476914", "170826974"], ["95 percentile (significant) ", 1.870138e-05, 4.7], ["67 percentile (suggestive) ", 6.3797e-05, 4.2]] +[11, ["6", "rsm10000004149", "25227945"], ["95 percentile (significant) ", 1.870138e-05, 4.7], ["67 percentile (suggestive) ", 6.3797e-05, 4. 2]] +[12, ["9", "rsm10000006852", "81294046"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 4.216931e-05, 4.4]] +[13, ["2", "rsm10000001382", "57898368"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 6.3797e-05, 4. 
2]]
+[14, ["1", "rsm10000000166", "94030054"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 6.3797e-05, 4.2]]
+[15, ["X", "rsm10000014672", "163387262"], ["95 percentile (significant) ", 1.555683e-05, 4.8], ["67 percentile (suggestive) ", 6.3797e-05, 4.2]]
+```
+
+### Shuffling a normally distributed trait
+
+
+So the randomization works well. Still, our 95% threshold is close to 5.0, i.e. that level is reached by chance alone in 5% of the shuffles. What happens when we change the shape of the data? Let's create a new trait, so the distribution is random and normal:
+
+```
+> rnorm(25, mean = 10, sd = 2)
+ [1] 10.347116 9.475156 11.747876 10.969742 11.374611 12.283834 11.499779
+ [8] 11.123520 10.830300 11.640049 10.392085 11.586836 11.540470 10.700869
+[15] 8.802858 10.238498 11.099536 8.832104 6.463636 10.347956 11.222558
+[22] 8.658024 7.796304 10.684967 9.540483
+```
+
+These random trait values render a hit of -Math.log10(8.325683e-04) = 3.0! Now we permute and we get:
+
+```
+["95 percentile (significant) ", 5.22093e-06, 5.3]
+["67 percentile (suggestive) ", 7.303966e-05, 4.1]
+```
+
+So the shape of a normally distributed trait gives a higher threshold - it is easier to get a hit by chance.
+
+### Genotypes
+
+So the 95% threshold for randomly shuffled trait runs still comes out at 5.x. This has to be a property of the genotypes in conjunction with the method GEMMA applies. With regard to genotypes, the BXD are not exactly random because they share markers from two parents which run along haplotypes. I.e. we are dealing with a patchwork of similar genotypes. You may expect that would suppress the chance of finding random hits. Let's try to prove that by creating fully random genotypes and an extreme haplotype set. And, for good measure, something in between.
+
+* [X] Fully random genotypes
+
+In the next phase we are going to play a bit with the haplotypes. First we fully randomize the genotype matrix. This way we break all haplotypes.
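The random BIMBAM shown below suggests each dosage was simply replaced by a uniform draw in [0, 2]; a sketch of producing such rows (this is my reconstruction, not the script actually used):

```python
import random

def randomize_bimbam_row(row, rng):
    """Replace the dosages of a BIMBAM row (marker, allele1, allele0,
    then per-individual dosages) with uniform random values in [0, 2]."""
    fields = [f.strip() for f in row.split(",")]
    head, dosages = fields[:3], fields[3:]
    return ",".join(head + [f"{rng.uniform(0, 2):.2f}" for _ in dosages])

rng = random.Random(0)
print(randomize_bimbam_row("rsm10000013598,X,Y,2,0,2,0", rng))
```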
As BIMBAM is a simple format we'll just modify an existing BIMBAM file. It looks like + +``` +rs3677817,X,Y,1.77,0.42,0.18,0.42,1.42,0.34,0.69,1.57,0.52,0.1,0.37,1.27,0.62,1.87,1.71,1.65,1.83,0.04,1.05,0.52,1.92,0.57,0.61,0.11,1.49,1.07,1.48,1.7,0.5,1.75,1.74,0.29,0.37,1.78,1.91,1.37,1.64,0.32,0.09,1.21,1.58,0.4,1.0,0.62,1.1,0.7,0.35,0.86,0.7,0.46,1.14,0.04,1.87,1.96,0.61,1.34,0.63,1.04,1.95,0.22,0.54,0.31,0.14,0.95,1.45,0.93,0.37,0.79,1.37,0.87,1.79,0.41,1.73,1.25,1.49,1.57,0.39,1.61,0.37,1.85,1.83,1.71,1.5,1.78,1.34,1.29,1.41,1.54,1.05,0.3,0.87,1.85,0.5,0.19,1.54,0.53,0.26,1.47,0.67,0.84,0.18,0.79,0.68,1.48,0.4,1.83,1.76,1.09,0.2,1.48,0.24,0.53,0.41,1.24,1.38,1.31,1.73,0.52,1.86,1.21,0.58,1.68,0.79,0.4,1.41,0.07,0.57,0.42,0.47,0.49,0.05,0.77,1.33,0.15,1.41,0.03,0.24,1.66,1.39,2.0,0.23,1.4,1.05,0.79,0.51,0.66,1.24,0.29,1.12,0.46,0.92,1.12,1.53,1.78,1.22,1.35,0.1,0.43,0.41,1.89,0.09,0.13,1.04,0.24,1.4,1.25,0.24,0.26,0.31,0.36,0.31,1.34,1.23,1.91,0.7,0.08,1.43,0.17,1.9,0.06,1.42,1.94,0.43,0.54,1.96,1.29,0.64,0.82,1.85,1.63,0.23,1.79,0.52,1.65,1.43,0.95,1.13,0.59,0.07,0.66,1.79,0.92,1.89,1.2,0.51,0.18,0.96,0.44,0.46,0.88,0.39,0.89,1.68,0.07,1.46,1.61,1.73,0.56,1.33,1.67,0.16,1.78,0.61,1.55,0.88,0.15,1.98,1.96,0.61,0.04,0.12,1.4,1.65,0.71,1.3,1.85,0.49 +``` + +We'll stick in the old hit for good measure and run our genotypes: + +``` +./bin/gemma-wrapper --loco --json -- -gk -g BXD.geno.rand.txt -p BXD_pheno_matched.txt -n 5 -a BXD.8_snps.txt > K.json +./bin/gemma-wrapper --debug --verbose --force --loco --json --lmdb --input K.json -- -g BXD.geno.rand.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05 -p BXD_pheno_matched.txt -n 22 +./bin/./bin/view-gemma-mdb --sort /tmp/test/ca55b05e8b48fb139179fe09c35cff0340fe13bc.mdb +./bin/view-gemma-mdb /tmp/e279abbebee8e41d7eb9dae...-gemma-GWA.tar.xz --anno BXD.8_snps.txt|head -20 +chr,pos,marker,af,beta,se,l_mle,l_lrt,-logP +X,139258413,rsm10000014629,0.496,0.2248,0.093,100000.0,0.0,4.58 
+6,132586518,rsm10000003691,0.517,0.2399,0.1068,100000.0,0.0001,4.17 +2,161895805,rs27350606,0.585,-0.2303,0.1059,100000.0,0.0001,4.0 +X,47002415,rsm10000014323,0.562,-0.1904,0.0877,100000.0,0.0001,3.99 +3,32576363,rsm10000001568,0.468,-0.2251,0.104,100000.0,0.0001,3.97 +14,19281191,rs52350512,0.5,-0.2454,0.1154,100000.0,0.0001,3.88 +7,111680092,rs32385258,0.536,0.2022,0.0968,100000.0,0.0002,3.79 +4,151267320,rsm10000002095,0.604,-0.2257,0.1102,100000.0,0.0002,3.69 +2,157353289,rs27323024,0.455,0.2188,0.1072,100000.0,0.0002,3.67 +19,56503719,rsm10000013894,0.617,0.2606,0.1302,100000.0,0.0003,3.58 +``` + +Interestingly our trait did not do that well: + +``` +18,69448067,rsm10000013598,0.635,0.0941,0.0774,100000.0,0.0167,1.78 +``` + +It shows how large the impact of the GRM is. We can run our permutations. + +``` +./bin/gemma-wrapper --debug --phenotypes BXD_pheno_matched.txt --permutate 1000 --phenotype-column 22 --verbose --force --loco --json --input K.json -- -g BXD.geno.rand.txt -a BXD.8_snps.txt -lmm 9 -maf 0.05 +["95 percentile (significant) ", 1.478479e-07, 6.8] +["67 percentile (suggestive) ", 1.892087e-06, 5.7] +``` + +Well that went through the roof :). It makes sense when you think about it. Randomizing genotypes of 21K SNPs gives you a high chance of finding SNPs that correlate with the trait. 
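For illustration, fully randomizing a BIMBAM mean-genotype matrix as described above could be sketched like this (a hypothetical helper, not the actual script used; dosages are drawn uniformly from [0, 2], which destroys all haplotype structure while keeping the file layout):

```python
import random

def randomize_bimbam(lines, seed=42):
    """Replace every dosage in a BIMBAM mean-genotype file with a random
    value in [0, 2], keeping the marker name and allele columns intact."""
    rng = random.Random(seed)
    out = []
    for line in lines:
        fields = line.strip().split(",")
        marker, a1, a2 = fields[:3]  # e.g. "rs3677817", "X", "Y"
        dosages = [f"{rng.uniform(0, 2):.2f}" for _ in fields[3:]]
        out.append(",".join([marker, a1, a2] + dosages))
    return out
```

Feeding the rewritten file back through gemma-wrapper then proceeds exactly as with the original genotypes.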
Let's go the other way and give 20% of individuals the exact same haplotypes, basically copying this genotype row: + +``` +rsm10000013598,X,Y,2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1 +``` + +``` +./bin/bimbam-rewrite.py --inject inject.geno.txt BXD.geno.txt --perc=20 > BXD.geno.20.txt +rg -c "2,0,2,0,2,0,2,0,0,0,0,0,0,2,2,2,0,0,0,0,2,2,2,2,2,0,0,0,0,2,0,2,2,2,0,0,0,0,2,0,2,2,2,2,0,0,2,0,0,0,2,2,0,2,0,0,2,2,2,0,0,2,2,2,2,2,2,2,2,2,2,0,0,2,2,0,2,2,2,2,0,2,2,2,2,2,2,2,0,0,2,2,0,2,0,0,2,2,2,0,2,2,2,0,1,1,1,1,1,1,2,2,1,2,2,2,2,0,2,0,2,1,0,0,0,2,1,0,2,2,2,2,2,0,0,2,2,0,2,2,0,2,2,2,2,2,2,2,2,0,2,2,2,2,2,0,0,0,0,0,2,0,0,2,0,2,1,0,2,0,0,0,0,0,0,0,0,1,2,0,0,0,2,2,2,1,0,2,2,2,2,0,2,0,0,0,2,2,2,2,1,1,0,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,0,1,2,1,0,1" BXD.geno.20.txt +4276 +``` + +So 4K out of 20K SNPs have identical haplotypes which correlate with our trait of interest: + +``` +["95 percentile (significant) ", 5.16167e-06, 5.3] +["67 percentile (suggestive) ", 6.163728e-05, 4.2] +``` + +And at 40% haplotype injection we get: + +``` +["95 percentile (significant) ", 3.104788e-06, 5.5] +["67 percentile (suggestive) ", 7.032406e-05, 4.2] +``` + +* [X] Haplotype equal genotypes 20% and 40% + +This all looks interesting, but it does not help. + +Also, when we halve the number of SNPs, the results are similar: + +``` +["95 percentile (significant) ", 6.026549e-06, 5.2] +["67 percentile (suggestive) ", 8.571557e-05, 4.1] +``` + +Even though the threshold is high, it is kind of interesting to see that no matter what you do, you end up at similar levels.
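The significant/suggestive pairs quoted throughout are percentiles of the best hit per permuted run. A sketch of how such thresholds could be computed from a list of per-permutation minimum p-values (the function name and exact indexing/tie conventions are my assumptions, not gemma-wrapper's actual code; only the 95%/67% labels follow its output):

```python
import math

def permutation_thresholds(min_pvalues):
    """Given the smallest p-value from each of N permuted GWA runs, return
    ((p_sig, -log10 p_sig), (p_sug, -log10 p_sug)): the significant and
    suggestive thresholds at the 95 and 67 percentiles of the null."""
    ranked = sorted(min_pvalues)  # ascending: most significant first

    def pick(tail):
        # e.g. tail=0.05 -> the p-value only 5% of permutations beat
        idx = min(len(ranked) - 1, round(len(ranked) * tail))
        p = ranked[idx]
        return p, round(-math.log10(p), 1)

    return pick(0.05), pick(0.33)
```

With 1000 permutations this amounts to taking roughly the 50th and 330th smallest minima, which is why the estimates stabilize at similar levels across the genotype experiments above.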
After a meeting with Rob and Saunak, the latter pointed out that these numbers are not completely surprising. For LMMs we need to use an adaptation - i.e. shuffle the trait values after rotation and transformation and then reverse that procedure. The only added assumption is normality, which the Churchill method does not require. The good news is that BulkLMM contains that method and thresholds will be lower. The bad news is that I'll have to adapt it because it does not handle missing data. + +Oh yes, rereading the Churchill paper from 1994 I now realise he also suggests an at-marker significance method that will end up lower - we saw that already in an earlier comparison. Saunak, however, says that we *should* do experiment-wide. + +## BulkLMM + +* [ ] Run bulklmm + + +## Dealing with epoch + +Rob pointed out that the GRM does not necessarily represent epoch and that may influence the significance level. I.e. we should check for that. I agree that the GRM distances are not precise enough (blunt instrument) to capture a few variants that appeared in a new epoch of mice. I.e., the mice from the 90s may be different from the mice today in a few DNA variants that won't be reflected in the GRM. + +* [ ] Deal with epoch + +We have two or more possible solutions to deal with hierarchy in the population. + +## Covariates + +* [ ] Try covariates Dave + +## Later + +* [ ] Check running our trait without LOCO with both standard and random GRMs +* [ ] Test non-loco effect for rsm10000013598 - looks too low and does not agree with GN2 +* [X] Try qnorm run +* [ ] Fix non-use of MAF in GN for non-LOCO +* [ ] Fix running of -p switch when assoc cache exists (bug) + +Quantile-Based Permutation Thresholds for Quantitative Trait Loci Hotspots +https://academic.oup.com/genetics/article/191/4/1355/5935078 +by Karl Broman, Ritsert Jansen et al.
2012 diff --git a/topics/lmms/rqtl2/genenetwork-rqtl2-implementation.gmi b/topics/lmms/rqtl2/genenetwork-rqtl2-implementation.gmi new file mode 100644 index 0000000..452930f --- /dev/null +++ b/topics/lmms/rqtl2/genenetwork-rqtl2-implementation.gmi @@ -0,0 +1,71 @@ +# Implementation of QTL Analysis Using r-qtl2 in GeneNetwork +## Tags + +* Assigned: alexm +* Keywords: RQTL, GeneNetwork2, implementation +* Type: Feature +* Status: In Progress + +## Description + +This document outlines the implementation of a QTL analysis tool in GeneNetwork using r-qtl2 (see docs: https://kbroman.org/qtl2/) and explains what the script does. +This PR contains the implementation of the r-qtl2 script for genenetwork: +=> https://github.com/genenetwork/genenetwork3/pull/201 + +## Tasks + +The script currently aims to achieve the following: + +* [x] Parsing arguments required for the script +* [x] Data validation for the script +* [x] Generating the cross file +* [x] Reading the cross file +* [x] Calculating genotype probabilities +* [x] Performing genome scan (scan1) using HK, LOCO, etc. +* [x] Finding LOD peaks +* [x] Performing permutation tests +* [x] Conducting QTL analysis for multiparent populations +* [ ] Generating required plots + +## How to Run the Script + +The script requires an input file containing all the necessary data to generate the control file. Example: + +```json +{ + "crosstype": "riself", + "geno_file": "grav2_geno.csv", + "geno_map_file": "grav2_gmap.csv", + "pheno_file": "grav2_pheno.csv", + "phenocovar_file": "grav2_phenocovar.csv" +} + +``` +In addition, the following parameters are required: + +* --output_file: The file path where the output for the script will be generated. +* --directory: The workspace directory where the control file will be generated.
+ +Optional parameters: + +* --cores: The number of cores to use (set to 0 for using all cores). +* --method: The scanning method to use (e.g., Haley-Knott, Linear Mixed Model, or LMM with Leave-One-Chromosome-Out). +* --pstrata: Use permutation strata. +* --threshold: Minimum LOD score for a peak. + + +An example of how to run the script: + +```sh + +Rscript rqtl2_wrapper.R --input_file [file_path] --directory [workspace_dir] --output_file [file_path] --nperm 100 --cores 3 + +``` +## Related issues: +=> https://issues.genenetwork.org/topics/lmms/rqtl2/using-rqtl2 +=> ./using-rqtl2 +=> ./gn-rqtl-design-implementation diff --git a/topics/lmms/rqtl2/gn-rqtl-design-implementation.gmi b/topics/lmms/rqtl2/gn-rqtl-design-implementation.gmi new file mode 100644 index 0000000..f37da42 --- /dev/null +++ b/topics/lmms/rqtl2/gn-rqtl-design-implementation.gmi @@ -0,0 +1,203 @@ +# RQTL Implementation for GeneNetwork Design Proposal + +## Tags + +* Assigned: alexm +* Keywords: RQTL, GeneNetwork2, Design +* Type: Enhancements +* Status: In Progress + + + +## Description + +This document outlines the design proposal for the re-implementation of the RQTL feature in GeneNetwork, also providing a console view to track the stdout from the external process. + +### Problem Definition + +The current RQTL implementation faces the following challenges: + +- Lack of adequate error handling for the API and scripts. + +- Insufficient separation of concerns between GN2 and GN3. + +- Lack of a way for users to track the progress of the R/qtl script being executed. + +- Lack of a clear definition of how the R/qtl script is executed. + +We will address these challenges and add enhancements by: + +- Rewriting the R script using r-qtl2 instead of r-qtl. + +- Establishing clear separation of concerns between GN2 and GN3, eliminating file path transfers between the two. + +- Implementing better error handling for both the API and the RQTL script.
+ +- Running the script as a job in a task queue. + +- Piping stdout from the script to the browser through a console for real-time monitoring. + +- Improving the overall design and architecture of the system. + + + +## High-Level Design +This is divided into three major components: + +* GN3 RQTL-2 Script implementation +* RQTL Api +* Monitoring system for the rqtl script + + +### GN3 RQTL-2 Script implementation +We currently have a wrapper script written using R/qtl: https://github.com/genenetwork/genenetwork3/blob/main/scripts/rqtl_wrapper.R +There is a newer implementation (R/qtl2), a reimplementation of the QTL analysis software R/qtl designed to better handle high-dimensional data and complex cross designs. +To see the differences between the two, see the documentation: +=> https://kbroman.org/qtl2/assets/vignettes/rqtl_diff.html +We aim to implement a separate script using R/qtl2 while maintaining the existing R/qtl one. +(TODO) This probably needs to be split into a new issue to capture each computation step in the R script. + +### RQTL Api + + +This component will serve as the entry point for running RQTL in GN3. At this stage, we need to improve the overall architecture and error handling. This process will be divided into the following steps: + +- Data Validation +In this step, we must validate that all required data to run RQTL is provided in the JSON format. This includes the mapping method, genotype file, phenotype file, etc. Please refer to the r-qtl2 documentation for an overview of the requirements: +=> https://rqtl.org/ + +- Data Preprocessing +During this stage, we will transform the data into a format that R can understand. This includes converting boolean values to the appropriate representations, preparing the RQTL command with all required values, and adding defaults where necessary. + +- Data Computation +In this stage, we will pass the RQTL script command to the task queue to run as a job.
+ +- Output Data Processing +In this step, we need to retrieve the results outputted from the script in a specified format, such as JSON or CSV, and process the data. This may include outputs like RQTL pair scans and generated diagrams. Please refer to the documentation for an overview: +=> https://rqtl.org/ + + + +**Subtasks:** + +- [ ] Add the RQTL API endpoint (10%) +- [ ] Input data validation (15%) +- [ ] Input data processing (20%) +- [ ] Passing data to the R script for the computation (40%) +- [ ] Output data processing (80%) +- [ ] Add unit tests for this module (100%) + + +### Monitoring system for the rqtl script + +This component involves creating a monitoring system to track the state of the external process and output relevant information to the user. +We need a way to determine the status of the current job, for example: QUEUED, STARTED, IN PROGRESS, COMPLETED (see the deep dive for more on this). + + +## Deep Dive + + +### Running the External Script +The RQTL implementation is in R, and we need a strategy for executing this script as an external process. This can be subdivided into several key steps: + +- **Task Queue Integration**: + - We will utilize a task queue system to manage script execution; we already have an implementation in GN3: +- https://github.com/genenetwork/genenetwork3/blob/0820295202c2fe747c05b93ce0f1c5a604442f69/gn3/commands.py#L101 + +- **Job Submission**: + - Each API call will create a new job in the task queue, which will handle the execution of the R script. + +- **Script Execution**: + - This stage involves executing the R script in a controlled environment, ensuring all necessary dependencies are loaded. + +- **Monitoring and Logging**: + - The system will include monitoring tools to track the status of each job. Users will receive real-time updates on job progress and logs for the current task. + +In this stage, we can have different states for the current job, such as QUEUED, IN PROGRESS, and COMPLETED.
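As a sketch, the job lifecycle described above could be modelled like this (illustrative names only, not the actual GN3 task-queue API):

```python
from enum import Enum

class JobState(Enum):
    QUEUED = "queued"
    STARTED = "started"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

class Job:
    """Tracks one external R/qtl2 run and the log lines collected so far,
    so a UI can poll for both the state and the console output."""
    def __init__(self, command):
        self.command = command
        self.state = JobState.QUEUED
        # while QUEUED, the log shows the command that will be executed
        self.log = ["queued: " + " ".join(command)]

    def advance(self, state, message=None):
        self.state = state
        if message:
            self.log.append(message)
```

Each API call would create one such job, and the monitoring component would read its state and log.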
+ +We need to output to the user which stage of computation we are currently on during the script execution. + +- During the QUEUED state, the standard output (stdout) should display the command to be executed along with all its arguments. + +- During the STARTED stage, the stdout should notify the user that execution has begun. + +- In the IN PROGRESS stage, we need to fetch logs from the script being executed at each computation step. Please refer to this documentation for an overview of the different computations we shall have: +=> https://rqtl.org/ + +- During the COMPLETED step, the system should output the results from the R/qtl script to the user. + + +- **Result Retrieval**: + - Once the R script completes (either successfully or with an error), results will be returned to the API call. + +- **Error Handling**: + - Better error handling will be implemented to manage potential issues during script execution. This includes capturing errors from the R script and providing meaningful feedback to users through the application. + +### Additional Error Handling Considerations +This will involve: +* API error handling +* Error handling within the R script + +## Additional UI Considerations +We need to rethink where to output the external process stdout in the UI. Currently, we can add flags to the URL to enable this functionality, e.g., `URL/page&flags&console=1`. +The design suggestion is also to output the results in a terminal emulator, for example xterm.js (see https://xtermjs.org/). A current implementation already exists for GN3: +=> https://github.com/genenetwork/genenetwork2/blob/abe324888fc3942d4b3469ec8d1ce2c7dcbd8a93/gn2/wqflask/templates/wgcna_setup.html#L89 + +### Design Suggestions: +#### With HTMX, offer a split screen +This will include an output page and a monitoring system page. + +#### Popup button for preview +A button that allows users to preview and hide the console output.
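Piping the script's stdout to such a console could be based on reading the external process line by line; a minimal sketch (the command is a stand-in for the real R script, and transport to the browser, e.g. SSE or websockets, is out of scope):

```python
import subprocess
import sys

def stream_stdout(command):
    """Run an external command and yield its output line by line, so each
    line can be forwarded to the browser console as soon as it appears."""
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        yield line.rstrip("\n")
    proc.wait()

# example: stream a trivial stand-in for the R script
lines = list(stream_stdout(
    [sys.executable, "-c", "print('step 1'); print('step 2')"]))
```

Merging stderr into stdout keeps R's warnings in the same console stream the user is watching.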
+ + + + + +## Long-Term Goals +We aim to run computations on clusters rather than locally. This project will pioneer that approach. + +## Related Issues +=> https://issues.genenetwork.org/topics/lmms/rqtl2/using-rqtl2 + +### Tasks + +* Stage 1 (20%) + +- [x] Implement the rqtl script using rqtl2 + +* Stage 2 (40%) + +- [ ] Implement the RQTL API endpoints +- [ ] Validate and preprocess data from the client +- [ ] Implement robust error handling +- [ ] Add unit tests for the rqtl api module +- [ ] Make improvements to the current R script if possible + +* Stage 3 (60%) + +- [ ] Task queue integration (refer to the Deep Dive section) +- [ ] Implement a monitoring and logging system for job execution (refer to the Deep Dive section) +- [ ] Fetch results from running jobs +- [ ] Process output from the external script + +* Stage 4 (80%) +- [ ] Implement a console preview UI for user feedback +- [ ] Refactor the GN2 UI + +* Stage 5 (100%) + +- [ ] Run this computation on clusters
\ No newline at end of file diff --git a/topics/lmms/rqtl2/using-rqtl2.gmi b/topics/lmms/rqtl2/using-rqtl2.gmi new file mode 100644 index 0000000..7f671ba --- /dev/null +++ b/topics/lmms/rqtl2/using-rqtl2.gmi @@ -0,0 +1,44 @@ +# R/qtl2 + +## Tags + +* assigned: pjotrp, alexm +* priority: high +* type: enhancement +* status: open +* keywords: database, gemma, reaper, rqtl2 + +## Description + +R/qtl2 handles multi-parent populations, such as DO, HS rat and the collaborative cross (CC). It also comes with an LMM implementation. Here we describe using and embedding R/qtl2 in GN2. + +## Tasks + + +### R/qtl2 + +R/qtl2 is packaged in guix and can be run in a shell with: + + +``` +guix shell -C r r-qtl2 +R +library(qtl2) +``` + +R/qtl2 also comes with many tests. When starting up with development tools in a checked-out R/qtl2 git repo: + +```sh +cd qtl2 +guix shell -C -D r r-qtl2 r-devtools make coreutils gcc-toolchain +make test +Warning: Your system is mis-configured: '/var/db/timezone/localtime' is not a symlink +i Testing qtl2 +Error in dyn.load(dll_copy_file) : +unable to load shared object '/tmp/RtmpWaf4td/pkgload31850824d/qtl2.so': /gnu/store/hs6jjk97kzafl3qn4wkdc8l73bfqqmqh-gfortran-11.4.0-lib/lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /tmp/RtmpWaf4td/pkgload31850824d/qtl2.so) +Calls: <Anonymous> ... <Anonymous> -> load_dll -> library.dynam2 -> dyn.load +Execution halted +make: *** [Makefile:9: test] Error 1 +``` + +Not sure what the problem is yet. diff --git a/topics/meetings/gn-nairobi-2025.gmi b/topics/meetings/gn-nairobi-2025.gmi new file mode 100644 index 0000000..fb357a5 --- /dev/null +++ b/topics/meetings/gn-nairobi-2025.gmi @@ -0,0 +1,17 @@ +# Meeting Notes + +## 2024-01-10 +* @flisso: Prepare gn-uploader presentation for KEMRI. +* @flisso: Put c-elegans dataset to staging. +* @flisso: PHEWAS --- extract phenotypes from genenetwork and analyse them using PHEWAS. +* @alexm: Clean up R/Qtl 1. +* @alexm: Add R/Qtl 2 in gn.
+* @alexm: Fix UI issues around GN AI. +* @bonfacem: Fix UI for group pages. +* @bonfacem: Add git hooks to the CD container for self-hosted repositories. +* @bonfacem: Share developer work container and have Alex test it out. +* @bonfacem: Prepare RDF presentation for KEMRI. + +Nice to have: +* @bonfacem: Start dataset metadata editing work. +* @flisso: Write PhD concept note. diff --git a/topics/meetings/jnduli_bmunyoki.gmi b/topics/meetings/jnduli_bmunyoki.gmi index 5af7221..26621d1 100644 --- a/topics/meetings/jnduli_bmunyoki.gmi +++ b/topics/meetings/jnduli_bmunyoki.gmi @@ -1,5 +1,462 @@ # Meeting Notes +## 2024-10-15 +* DONE: @flisso: Follow up with the Medaka team on verification of genotype sample names +* DONE: @flisso: Understand uploader scripts and help improve them. +* CANCELLED: @flisso: Set up virtuoso. @bonfacem shall share notes on this. +* NOT DONE: @flisso: Write PhD concept note. +* DONE: @alexm @jnduli: R/Qtl script. +* DONE: @bonfacem: Test the production container locally and provide @fredm some feedback. +* DONE: @bonfacem: Wrap-up re-writing gn-guile to be part of genenetwork-webservices. +* NOT DONE: @bonfacem: Start dataset metadata editing work. + +## 2024-10-08 +* NOT DONE: @bonfacem RIF Indexing for RIF page in Xapian. +* IN PROGRESS: @bonfacem: Test the production container locally and provide @fredm some feedback. +* IN PROGRESS: @bonfacem: Re-writing gn-guile to be part of genenetwork-webservices. +* NOT DONE: @shelbys @bonfacem: Getting RDF into R2R. +* NOT DONE: @flisso: Follow up with the Medaka team on verification of genotype sample names. NOTE: Medaka team are yet to respond. +* IN PROGRESS: @flisso: Figure out how to add C Elegans data in staging. NOTE: Got access to staging server. Ran example tests. Still working on some errors. +* NOT DONE: @flisso: Set up virtuoso. @bonfacem shall share notes on this. +* NOT DONE: @flisso: Write PhD concept note. NOTE: Doing some lit review.
+* @shelbys: Be able to test things on lambda01 for LLM tests. +* @alexm @jnduli: R/Qtl script. + +## 2024-10-18 +* IN-PROGRESS: @priscilla @flisso: Set up mariadb and virtuoso to test out some GN3 endpoints. NOTE: Mariadb set-up +* NOT DONE: @priscilla @flisso @bmunyoki: Improve docs while hacking on the above. +* DONE: @jnduli Remove gn-auth code from GN3. +* DONE: @jnduli Resolve current issue with broken auth in gn-qa. +* DONE: @jnduli @alexm Work on the R/Qtl design doc. +* IN-PROGRESS: @alexm: R/Qtl script. NOTE: Reviewed by @jnduli. +* DONE: @flisso MIKK genotyping. NOTE: Verification pending from Medaka team. +* DONE: @flisso Make sure we have C Elegans and HS Rats datasets in testing, and have the genotyping pipeline working. NOTE: Issues with tux02 staging server. +* DONE: @shelbys: Modify existing Grant write-up for pangenomes. NOTES: Some more edits to be done. +* NOT DONE: @shelbys @bonfacem: Getting RDF into R2R. +* NOT DONE: @bonfacem RIF Indexing for RIF page in Xapian. +* DONE: @bonfacem Work on properly containerizing gn-guile. NOTE: Sent patches to @alexm, @aruni, and @fredm to review later today. +* DONE: @bonfacem: Fix the virtuoso CI job in CD. NOTE: I'm awaiting feedback from @arun/@fredm.
NOTE: Finished writing the design doc for gn-qa. +* DONE: Looked at documentation for R/Qtl. +* NOT DONE: @alex: Review @bmunyoki's work on RIF/Indexing. +* WIP: @flisso: Make sure we have C Elegans dataset and MIKK genotypes in production. NOTE: Issues with data entry scripts. Fred/Zach working to set up test environment. +* WIP: @flisso: MIKK genotyping. NOTE: Still testing the pipeline. Halfway there. +* NOT DONE: @flisso: Make sure we have HS Rats in testing stage. +* WIP: @flisso: Make progress in learning back-end coding WRT GN. NOTE: Issue setting up GN3. +* WIP: @shelbys: Modify existing Grant write-up for pangenomes. NOTE: Reviewed by Pj and Eric. More mods based on feedback. Paper got accepted by bioRxiv. Added some docs to R2R evaluation code. +* DONE: @shelbys: Finish getting all the R2R scores from the first study. NOTE: Got scores for all the queries from the first papers using R2R instead of Fahamu. +* NOT DONE: @bonfacem RIF Indexing for RIF page in Xapian. +* WIP: @bonfacem Work on properly containerizing gn-guile. +* DONE: @bonfacem Fix the gn-transform-database in CI. Sent patches to Arun for review. +* DONE: @bonfacem Fixed broken utf-8 characters in gn-gemtext. + +## 2024-10-04 +* IN PROGRESS: @priscilla @bonfacem Setting up GN3. @priscilla try out API endpoints that don't require auth. NOTE: @priscilla Able to set up guix as a package manager. Trouble with Guix set-up with GN3. @bonfacem good opportunity to improve docs in GN3. +* IN PROGRESS: @jnduli Harden hook system for gn-auth. +* IN PROGRESS: @jnduli Remove gn-auth code from GN3. +* DONE: @jnduli Finish UI changes for RIF editing. NOTE: Demo done in GN Learning team. +* IN PROGRESS: @alex Work on R/Qtl. NOTE: Met with Karl Broman/PJ. Been reading the docs. Will track this issue in GN. +* NOT DONE: @alex @bonfacem Work on properly containerizing gn-guile. +* DONE: @bonfacem API/Display of NCBI Rif metadata. +* IN PROGRESS: @bonfacem @alex RIF Indexing for RIF page in Xapian.
+* IN PROGRESS: @flisso Push data to production. Commence work on Arabidopsis data and HS Rats data. NOTE: C-Elegans in the process of being pushed to the testing server, then later production. WIP with HS Rats data in collab with Palmer. +* DONE: @flisso: Learning how to use SQL WRT C Elegans data. +* IN PROGRESS: @shelbys Re-formatting grant to use pangenomes. Waiting for Garrison for feedback. +* DONE: @shelbys Got the R2R for the human-generated questions. TODO: Run this for GPT 4.0 model. + +## 2024-09-27 + +* DONE: @jnduli @bonfacem @alex Look at base files refactor and merge work. +* DONE: @priscilla continue to upload more papers. NOTE: Uploaded an extra 200 papers. +* NOT DONE: @priscilla @flisso Set up GN3. Goal is to be able to query some APIs in cURL. +* IN PROGRESS: @jnduli Improve hook systems for gn-auth. NOTE: Still figuring out a cleaner implementation for some things. +* IN PROGRESS: @jnduli Trying to remove auth code from GN3. NOTE: Idea, though unsure about safety. @fred to review work and make sure things are safe. +* DONE: @jnduli @bonfacem @alex Push most recent changes to production. Figure out what needs doing. NOTE: @Zach is in charge of deployment. @fredm is working on the production container. +* DONE: @alex Close down remaining issues on issue tracker. NOTE: Merged work on cleaning up base files. Few more minor modifications to the UI. +* NOT DONE: @alex investigate the dumped static files for encoding issues. +* IN PROGRESS: @bonfacem NCBI Metadata - Modelling and Display. NOTE: Done with the modelling. Almost done with API/UI work. +* DONE: @bonfacem Fix broken CD tests. NOTE: We have tests running inside the guix build phase. +* IN-PROGRESS: @flisso Continue work on uploading datasets: C Elegans and MIKK. NOTE: Managed to create data files that need to be uploaded to the testing gn2 stage server. +* NOT DONE: @jnduli help @flisso with SQL.
+ +## 2024-09-20 +* NOT DONE: @priscilla @flisso @bmunyoki @jnduli set up GN ecosystem and review UI PRs +* DONE: @priscilla continue to upload more papers. NOTE: Shared access to drive to @bmunyoki. We are at 800 papers. +* DONE: @jnduli Finish up experiments on hook system. NOTE: Patches got merged. Needs to make some things more concrete. +* NOT DONE: @alex @bonfacem investigate the dumped static files for encoding issues. +* DONE: Refactoring base files for GN2. +* IN PROGRESS: @flisso: Continue work on uploading datasets: C Elegans and MIKK. Note: Waiting for the original MIKK genotype file from the Medaka team. C Elegans yet to process the annotation file---some info is missing. +* NOT DONE: @flisso: Do code reviews on Sarthak's script. +* NOT DONE: @bmunyoki NCBI Metadata - Modelling and Display. +* DONE: @bmunyoki update tux02/01 with recent RIF modifications. NOTE: CD tests are broken and need to be fixed. + +## 2024-09-13 +* NOT DONE: @jnduli @bmunyoki fetch ncbi metadata and display them in GN2 +* DONE: @jnduli @bmunyoki add auth layer to edit rifs functionality +* DONE: @jnduli complete design doc for hooks system for gn-auth. NOTE: More experimentation with this. +* DONE: @jnduli @alex bug fixes for LLM integration. +* DONE: @priscilla added more papers to the LLM ~ 250 papers. +* NOT DONE: @priscilla @flisso @bmunyoki @jnduli set up GN ecosystem and review UI PRs +* DONE: @bmunyoki modify edit api to also write to RIF +* NOT DONE: @bmunyoki update tux02/01 with recent RIF modifications +* DONE: @bmunyoki Add test cases for RDF +* DONE: @alex Bug fix for session expiry. +* DONE: @alex Update links for static content to use self-hosted git repo. +* IN PROGRESS: @flisso Upload C Elegans Dataset. Nb: MIKK one has some issues, so work is paused for now. NOTE: Waiting for annotation and phenotype file for the C Elegans Dataset. +* DONE: @flisso Reviewed gemma wrapper scripts.
+ + +Nice to have: +* @bmunyoki build system container for gn-guile and write documentation for creating containers + +## 2024-09-06 + +* DONE: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* DONE: @bmunyoki update server to include latest code changes +* IN PROGRESS: @bmunyoki modify edit api to also write to RIF +* NOT DONE: @bmunyoki build system container for gn-guile and write documentation for creating containers +* DONE: @bmunyoki @flisso update case attributes to capture hierarchy info +* DONE: @bmunyoki prepare presentation for RIF work to GN learning team (goal is to present on Wednesday next week) +* NOT DONE: @bmunyoki update tux02/01 with recent RIF modifications +* NOT DONE: @jnduli @bmunyoki fetch ncbi metadata and display them in GN2 +* NOT DONE: @jnduli complete design doc for hooks system for gn-auth; Focus for next week. +* DONE: @alexm @jnduli integrate LLM in GN2 and GN3: On the look-out for bug-fixes. +* IN PROGRESS: @jnduli add auth layer to edit rifs functionality +* DONE: @flisso generate genotype file on Medaka fish dataset: @arthur to have a look at this. +* IN PROGRESS: @flisso code reviews for gemma-wrapper with @pjotr +* DONE: @flisso update gemtext documentation +* DONE: @flisso help Masters students with their proposal defences +* @priscilla add more papers to LLM +* NOT DONE: @priscilla @flisso @bmunyoki @jnduli set up GN ecosystem and review UI PRs + + +## 2024-09-02 (Sync with @flisso+@bonfacem) + +### Case-Attributes + +* @bmunyoki understood case attributes by reverse-engineering the relevant tables from GeneNetwork's database. + +* One source of confusion for @bmunyoki is that we have the same "CaseAttribute.Name" that applies to different strains. Example Query: + +``` +SELECT * FROM CaseAttribute JOIN CaseAttributeXRef ON CaseAttribute.CaseAttributeId = CaseAttributeXRef.CaseAttributeId WHERE CaseAttribute.Name = "Sex"\G +``` + +* @rob wants fine-grained access control with case attributes. 
+ +* @flisso, case-attributes are a GN invention. Case Attributes are extra metadata about a given dataset beyond the phenotype measurements. E.g. We can have the phenotype: "Central nervous system"; whereby we collect the values and SE. However, we can also collect extra metadata like "Body Weight", "Sex", "Status", etc., and in GN terminology, that extra metadata is called Case Attributes. + +* @bmunyoki. Most of the confusion around case-attributes is because of how we store case-attributes. We don't have unique identifiers for case-attributes. + +## 2024-08-30 + +* IN PROGRESS: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* DONE: @bmunyoki and @alex help Alex deploy gn-guile code on tux02, run this in a tmux session. +* DONE: @bmunyoki api for history for all tasks +* DONE: @bmunyoki UI layer for RDF history +* @bmunyoki modify edit api to also write to RIF +* @bmunyoki build system container for gn-guile and write documentation for creating containers +* NOT DONE: @jnduli complete design doc for hooks system for gn-auth +* DONE: @alexm @jnduli create branches to testing for LLM in GN2 and GN3 +* IN PROGRESS: @alexm @jnduli integrate LLM in GN2 and GN3 +* IN PROGRESS: @jnduli add auth layer to edit rifs functionality +* DONE: @bmunyoki @felix sync on case attributes and document +* DONE: @flisso managed to upload <TODO> dataset to production + + +### nice to haves + +* nice_to_have: @bmunyoki experiment and document updating gn-bioinformatics set up packages (to support rshiny) + +## 2024-08-23 +* @shelby re-ingest data and run RAGAs against the queries already in the system to perform comparison with new papers. +* @shelby figure out Claude Sonnet stuff. +* IN PROGRESS: @felix @fred push RQTL bundles to uploader; this also includes metadata. +* IN PROGRESS: @felix look for means to fix metadata challenge, i.e. trouble associating data we upload and metadata that provides descriptions. +* DONE: @bmunyoki API: Get all RIF metadata by symbols from rdf.
+* NOT DONE: @bmunyoki UI: Modify traits page to have "GN2 (GeneWiki)", to be picked after RDF is updated in tux02 +* DONE: @bmunyoki UI: Integrate with API +* NOT DONE: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* IN PROGRESS: @bmunyoki and @alex help Alex deploy gn-guile code on tux02. +* DONE: @bmunyoki @jnduli review gn2 UI change for markdown editor +* NOT DONE: @bmunyoki create template for bio paper +* DONE: @alex sync with Boni to set up gn-guile +* DONE: @alex @bmunyoki @jnduli sync to plan out work for llm integration +* DONE: @jnduli edit WIKI+RIF +* NOT DONE: @jnduli set up gn-uploader locally and improve docs +* NOT DONE: @jnduli complete design doc for hooks system for gn-auth +* DONE: @felix to document email threads on gemtext + +## 2024-08-22 + +=> https://issues.genenetwork.org/issues/edit-rif-metadata APIs for wiki editing and broke down the wiki-editing task into sub-projects. + +## 2024-08-20 + +Integrating GNQA into the GN2 website: how will it work? + +1. Have the context information displayed to the right of the GN2 xapian search page +2. When someone clicks the context info page, it opens the search from GNQA which has all the references. +3. Cache queries since many searches are the same. + +Problems: + +1. Search has xapian-specific terminology. How do we handle this? Remove xapian prefixes and provide the key words to search. +2. How do we handle cache expiry? + - no expiry for now. + - store them in a database table. + - every quarter year, the search can be updated. + - group:bxd, species: mouse -> bxd mouse + mouse bxd: -> when caching, the ordering of the search terms shouldn't matter much. + +Game Plan: + +1. Productionize the code relating to LLM search. Get the code for LLMs merged into main branch. +2. UI changes to show the search context from LLM. +3. Figuring out caching: + - database table structure + - cache expiry (use 1 month for now) + - modify LLM search to pick from cache if it exists. +4.
Have another qa branch that fixes all errors since we had the freeze. +5. Only logged-in users will have access to this functionality. + +## 2024-08-16 +* @jnduli Fix failing unit tests on GN-Auth. +* @jnduli Exploring Mechanical Rob for Integration Tests. GN-Auth should be as stable as possible. +* @jnduli Research e-mail patch workflow and propose a sane workflow for GN through an engineering blog post. +* @jnduli Help @alexm with auth work. +* @felix @fred push RQTL bundles to uploader. +* @felix look for means to fix metadata challenge, i.e. trouble associating data we upload and metadata that provides descriptions. +* @felix @jnduli programming learning: started building a web server to learn backend using Flask. +* @felix @jnduli Read Shelby's paper and provide feedback by the end of Saturday. + +## 2024-08-16 +* DONE: @jnduli Fix failing unit tests on GN-Auth. +* NOT DONE: @jnduli Exploring Mechanical Rob for Integration Tests. GN-Auth should be as stable as possible. +* NOT DONE: @jnduli Research e-mail patch workflow and propose a sane workflow for GN through an engineering blog post. +* DONE: @jnduli Help @alexm with auth work. +* IN PROGRESS: @felix @fred push RQTL bundles to uploader, also includes metadata. +* IN PROGRESS: @felix look for means to fix metadata challenge, i.e. trouble associating data we upload and metadata that provides descriptions. +* DONE: @felix @jnduli programming learning: started building a web server to learn backend using Flask. Learning HTML and CSS; will share the progress with this. +* DONE: @felix ~@jnduli~ Read Shelby's paper and provide feedback by the end of Saturday. +* DONE: @felix tested the time tracker script. +* IN PROGRESS: @bmunyoki implementation code work to edit RIF + Wiki SQL and RDF data. We'll break this down. +* @bmunyoki and @alex help Alex deploy gn-guile code on tux02. +* NOT DONE: @bmunyoki Replicate GN1 WIKI+RIF in GN2. +* @shelby @bonfacem @alex Integrate GNQA Search into global search.
+* @shelby handling edits with the current open paper + +Nice To Have: +* DEPRIORITIZED: @felix figure out how to fix large data uploads, i.e. most data sets are large, e.g. 45GB. Uploader cannot handle these large files. +* DONE: @felix Try out John's time tracking tool and provide feedback. +* @shelby run RAGAs against the queries already in the system to perform comparison with new papers: re-ingesting, now at 1500 papers. +* @bmunyoki Send out emails to the culprits behind failing tests in CI/CD. + +## 2024-08-15 +### RTF Editing (bmunyoki+alexm) + +In our static content, we don't really store RTF; instead we store HTML. As an example, compare these two documents and note their differences: + +=> https://github.com/bitfocus/rtf2text/blob/master/sample.rtf => [Proper RTF] sample.rtf +=> https://github.com/genenetwork/gn-docs/blob/master/general/datasets/Br_u_1203_rr/acknowledgment.rtf => [GN] acknowledgement.rtf + +* TODO @alexm Rename all the *rtf to *html during transform to make things clearer. Send @bonfacem PR. + +## 2024-08-13 +### Markdown Editor (bmunyoki+alexm) + +* @alexm @bonfacem Tested the Markdown Editor locally and it works fine. Only issue is that someone can make edits without logging in. +* API end-points to be only exposed locally. +* @alexm: Fix minor bug when showing the diff. Have a back arrow. +* @bonfacem, @alexm: Deploy gn-guile; make sure it's only exposed locally. +* [blocking] @alexm having issues setting up gn-auth. @jnduli to help out to set up gn-auth and work out any quirks. @alexm to make sure you can't make edits without being logged in. +* @bmunyoki to merge the gn-editor UI work once basic auth is figured out. +* [nice-to-have] @alexm work on packaging: "diff2html-ui.min.js", "diff.min.js", "marked.min.js", "index.umd.js", "diff2html.min.js". +* [nice-to-have] @alexm to check-out djlint for linting jinja templates. +* @bonfacem share pre-commit hooks for setting up djlint and auto-pep8.
+* [nice-to-have] @alexm to checkout: + +> djlint gn2/wqflask/templates/gn_editor.html --profile=jinja --reformat --format-css --format-js +=> https://www.djlint.com/ dj Lint; Lint & Format HTML Templates + +## 2024-08-09 + +* @shelby figure out Claude Sonnet stuff: NOT DONE, main focus was on the paper +* @shelby planning session for next work and tasks for Priscilla. DONE: Priscilla was given some work. Loop in Priscilla for our meetings. +* @shelby format output for ingested paper so that we can test the RAG engine. IN PROGRESS. Most focus has been on editing paper and some funding pursuit. +* @shelby run RAGAs against the queries already in the system to perform comparison with new papers. NOT DONE. +* @bmunyoki implementation code work to edit RIF + Wiki SQL and RDF data. IN PROGRESS. Updated the RDF transform for geneWIKI; now we can do a single GET for a single comment in RDF. +* @bmunyoki @shelby group paper on dissertation to target arXiv. NOT DONE. +* @bmunyoki and @alex help Alex deploy gn-guile code on tux02. NOT DONE. Currently auth is a blocker. +* @bmunyoki review UI code editor work. DONE. +* @alex address comments in UI work. DONE. +* @felix @fred push RQTL bundles to uploader. In Progress: OOM Killer killing upload process. +* @felix look for means to fix metadata challenge, i.e. trouble associating data we upload and metadata that provides descriptions. The metadata doesn't meet requirements. In Progress: Some things to be confirmed with Rob/PJ on coming up with a good format for adding metadata. NOT DONE. +* @felix figure out how to fix large data uploads, i.e. most data sets are large, e.g. 45GB. Uploader cannot handle these large files. +* @felix @jnduli programming learning: started building a web server to learn backend using Flask. NOT DONE. +* @felix (@bmunyoki / @alex) learning emacs so that he figures out how to track times. @jnduli shared his time-tracking tool with @felix. DONE. +* @jnduli fix group creation bug in gn-auth.
DONE: Group creation wasn't exactly a bug; updated docs, and fixed the masquerade API. +* @jnduli edit rif metadata using gn3. NOT DONE +* @jnduli update documentation for gn-auth setup. DONE +* @jnduli investigate more bugs related to gn-auth. DONE + +Note: When setting up sync between @jnduli and @felix, add @bmunyoki too. + + +## 2024-08-02 + +* DONE: @bmunyoki virtuoso and xapian updated in prod +* @bmunyoki code work to edit RIF + Wiki SQL and RDF data: WIP, we have the desired API, but we need to implement the code. +* NOT DONE: @bmunyoki group paper on dissertation to target arXiv +* DONE: @bmunyoki fix case insensitivity in Xapian search +* DONE: @jnduli review Alex patches +* DONE: @bmunyoki: updated gn2 and gn3 on the git.genenetwork server. Shared QA code with @shelby on a special branch. +* @bmunyoki @jnduli: fixed minor bug on xapian related to stemming. +* @shelby figure out Claude Sonnet stuff: NOT DONE, main focus was on the paper +* IN PROGRESS: @shelby edit paper with @pjtor +* @shelby planning session for next work and tasks for Priscilla. +* @shelby use RAGAS to test R2R with the new papers (follow up on the ingestion of papers tasks) +* @shelby and @boni to discuss R2R and interfacing with Virtuoso: deprioritized, we'll figure out interfacing with R2R. Implementation to happen later. +* DONE: @jnduli get up to speed on gn-auth +* @alex have an instance of gn-guile running on production: Code in prod, but needs to liaise with Boni to get this working. +* @jgart getting genecup and rshiny containers to run as normal users instead of root. Options: use libvirt's APIs, run podman/docker as a normal user, or rewrite the services as Guix home services. There is no workaround for the system container approach, because Guix by default needs root to run a system container; we also need sudo, since our system containers are defined as systemd units that run as root. Why systemd? So that no one needs to run this by hand.
+ +### Meeting with Sevila on Masters Papers + +- mainly stylistic changes provided. +- provide an email explaining how long ethical review took, so that he follows up on unexpected delays. +- met up with Dr Betsy; once done with defences in October (hopefully), Boni may get his degree before graduation next year, which would facilitate Boni applying for a PhD. + +### Guix Root Container + +- With docker, to prevent the need for sudo, we usually create a docker group and add the users that need to run this to that group. Can this happen in guix? +- Guix has a guix group. Why haven't we done this??? @jgart and @boni + +## 2024-07-26 +Plan for this week: + +* NOT DONE, needs a meeting: @bmunyoki virtuoso and xapian are up-to-date in prod. Boni doesn't have root access in production, so coordination with Fred and Zach is causing delays. +* API design DONE, actual CODE incomplete: @bmunyoki update RIF+WIKI on SQL and RDF from gn2 website +* DONE: @bmunyoki and @shelby review dissertation for Masters +* DONE, needs to review new changes: @bmunyoki and @jgart to review patches for `genecup` and `rshiny`. +* @bmunyoki and @jnduli to review patches for markdown parser +* DONE, patches sent. @alexm add validation and documentation to markdown parser. +* DONE: @shelby ingest ageing data to RAG, 10% left to complete. +* DONE: @shelby do another round of editing on the AI paper +* IN PROGRESS: @shelby RAG engine only works with OpenAI, figure out Claude Sonnet integration +* IN PROGRESS: @jnduli get up to speed on gn-auth +* @jgart enabling acme service in genecup and rshiny containers. +* @jnduli and @bmunyoki to attempt to get familiar with R2R + +Nice to have: +* @bmunyoki fix CI job for GN transformer database, i.e. instead of checksums just run the full job once per month: scheme script created that dumps the example files, next step is to create a Gexp that runs this script. Bandwidth constraints.
+ +## 2024-07-23 +### LLM Meeting (@shelby+@bmunyoki) +* There's no clear way of ingesting human-readable data with context into the RAG Graph from RDF. +* What specific graph should we ingest into the RAG Graph from RDF? @bmunyoki suggested RIF, PubMed Metadata. We'll figure this out. +* @bonfacem recommended: Much better to work with SPARQL than directly with TTL files. +* We've uploaded RDF triples, yet they lose their strength as the RAG system is not undergirded with a knowledge graph. @bonfacem should read the following for more context and should reach out to @shelby on how to move forward with SPARQL more concretely: + +=> https://r2r-docs.sciphi.ai/cookbooks/knowledge-graph#r2r-knowledge-graph-configuration + +* We need to test the knowledge graph backend of R2R to see how feasible it is to use with the existing data (RDF). +* Fahamu just stored the object and lost the subject+predicate. +* Loop in Alex. + + +## 2024-07-19 +Plan for this week: + +* DONE: @jgart getting `genecup` app to run in a guix container, i.e. the `gunicorn` service should then run `genecup`, similar to how gn2 and gn-uploader work. Patches sent to Boni, include `genecup` and `rshiny`, and the container patches are tested. +* @jgart enable acme certificates for `genecup` container: Should just enable a single form, let's use arun's email since it's what we use for all our services. Reverse proxy happens inside the container. Add a comment explaining that this shouldn't be a standard Python setup. +* IN PROGRESS: @bmunyoki virtuoso and xapian are up-to-date in prod. +* NOT DONE: @bmunyoki update RIF+WIKI on SQL and RDF from gn2 website +* IN PROGRESS: @bmunyoki fix CI job for GN transformer database, i.e. instead of checksums just run the full job once per month: scheme script created that dumps the example files, next step is to create a Gexp that runs this script. Bandwidth constraints. +* @bmunyoki and @shelby review dissertation for Masters: @bonz needs to send updated version.
Also reviewed another masters by Johannes. +* ON HOLD: @alexm rewrite UI code using htmx +* IN PROGRESS: @alexm address review comments in markdown parser. API endpoints are getting reimplemented. Needs to add validation and documentation and send v2 patches for review. +* DONE: @shelby compile ingesting 500 more papers into RAG engine +* @shelby ingesting ageing research into the RAG engine: diabetes research is ingested, ageing will be done later. +* NOT DONE: @shelby RAG engine only works with OpenAI, figure out Claude Sonnet integration +* DONE: @shelby @bmunyoki @alexm to define the problem with RDF triple stores +* DONE: @jnduli finish up on RIF update +* IN PROGRESS: @jnduli get up to speed on gn-auth + +AOB + +* RAG engine uses R2R for the integration. It would be great if we could integrate this into guix. @shelby will send @jgart the paper on how we use the RAG. + + +## 2024-07-12 + +Plan for this week: + +* @shelby use Claude Sonnet with R2R RAG engine with 1000 papers and fix bugs: 500 papers ingested into R2R, 500 remaining. +* @shelby final run-through for paper 1 before Pjotr's review. DONE, configurations fixed. New repo gnai that contains the results and will contain R2R stuff. +* NOT DONE: @shelby and @bmunyoki review dissertation paper for Masters +* @shelby @bmunyoki @alexm to define the problem with RDF triple stores +* @alexm integrate the markdown parser: DONE, patches sent to Boni +* @alexm rewrite UI code using htmx: NOT DONE +* @bmunyoki investigate why xapian index isn't getting rebuilt: DONE +* @bmunyoki investigate discrepancies between wiki and rif search: DONE, get this to prod to be tested +* @jnduli update the generif_basic table from NCBI: IN PROGRESS. +* @jnduli blog post on preferences for documentation: DONE. + +We have qa.genenetwork.com. We need to have this set up as `qa.genenetwork.com/paper1` so that we always have the system that was used for this. How?
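One possible answer, assuming nginx fronts qa.genenetwork.com (the port, path, and headers below are hypothetical, not our actual config): keep the paper-1 instance running on its own port and route the sub-path to it.

```
# Hypothetical nginx fragment: freeze the system used for paper 1
# under /paper1/, served by a dedicated instance on port 8091.
location /paper1/ {
    proxy_pass http://127.0.0.1:8091/;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```

The trailing slashes on both `location` and `proxy_pass` make nginx strip the `/paper1` prefix before handing the request to the pinned instance.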
+ +Nice to Haves + +* @bmunyoki Nice-to-have tag for paper1: Fix this with Boni and iron them out later. +* @bmunyoki fix CI job that transforms gn tables to TTL: Move this to running a cron job once per month instead of + + +## 2024-06-24 + +Plan for this Week: + +* CANCELED: @bmunyoki Remove boolean prefixes from search where it makes sense. +* DONE: @bmunyoki GeneWiki + GeneRIF search in production. Mostly needs to be run in prod to see impact. +* DONE: @jnduli Child process termination when we kill the main index-genenetwork script +* CANCELED: @bmunyoki Follow up on getting virtuoso child pages in production +* IN PROGRESS: @alexm push endpoints for editing and making commits for markdown files +* DONE: @all Reply to survey from Shelby +* DONE: @jnduli Fix JS import orders (without messing up the rest of Genenetwork) +* DONE: @jnduli fix search results when nothing is found +* CANCELED: @jnduli test out running guix cron jobs locally +* NOT DONE: @jnduli mention our indexing documentation in gn2 README + +Note: For qa.genenetwork.com, we chose to pause work on this until papers are done. + +Review for last week + +* DONE: @bmunyoki rebuild guix container with new mcron changes +* WIP: @jnduli attempts to make UI change that shows all supported keys in the search: Blocked because our JS imports aren't ordered correctly and using `boolean_prefixes` means our searches don't work as we'd expect. +* WIP: @bmunyoki create an issue with all the problems experienced with search and potential solutions. Make sure it has replication steps, and plans for solutions. Issue was created but we need to get a better understanding of how cis and trans searches work. +* TODO: @bmunyoki and @jnduli genewiki indexing: PR for WIKI indexing is completed, but we didn't test it out due to the outage caused by RAM and our script. We don't have a way to easily instrument how much RAM our process uses and how to kill the process.
+* DONE: @bmunyoki demos and documents how to run and test the guix cron job for indexing +* DONE: @bmunyoki trains @jnduli on how to review patchsets from emails +* DONE: @jnduli Follow-up notes on setting up local index-genenetwork search +* DONE: @alexm dealing with graduation, AFK +* TODO: @bmunyoki follow up with Rob to make sure he tests search after everything is complete: He got some feedback and Rob is out of town but wants RIF and Wiki search by July 2nd. + +Nice to haves: + +* TODO: minor: bonfacem makes sure that mypy/pylint in CI runs against the index-genenetwork script. +* TODO: @bmunyoki follow up: how do we make sure that xapian prefix changes in code retrigger xapian indexing? + - howto: for xapian prefix changes, let's maintain a hash for the file and store it in xapian + - howto: for RDF changes, since we have ttl files, if this ever changes we trigger the script. It's also nice to be able to automatically load up data to virtuoso if this file changes. + + ## 2024-06-21  ### Outage for 2024-06-20 diff --git a/topics/octopus/lizardfs/README.gmi b/topics/octopus/lizardfs/README.gmi index 78316ef..7c91136 100644 --- a/topics/octopus/lizardfs/README.gmi +++ b/topics/octopus/lizardfs/README.gmi @@ -86,14 +86,23 @@ Other commands can be found with `man lizardfs-admin`.  ## Deleted files  -Lizardfs also keeps deleted files, by default for 30 days. If you need to recover deleted files (or delete them permanently) then the metadata directory can be mounted with: +Lizardfs also keeps deleted files, by default for 30 days in `/mnt/lizardfs-meta/trash`.
If you need to recover deleted files (or delete them permanently) then the metadata directory can be mounted with: ``` $ mfsmount /path/to/unused/mount -o mfsmeta ``` For more information see the lizardfs documentation online -=> https://dev.lizardfs.com/docs/adminguide/advanced_configuration.html#trash-directory lizardfs documentation for the trash directory +=> https://lizardfs-docs.readthedocs.io/en/latest/adminguide/advanced_configuration.html#trash-directory lizardfs documentation for the trash directory + +## Start lizardfs-mount (lizardfs reader daemon) after a system reboot + +``` +sudo bash +systemctl daemon-reload +systemctl restart lizardfs-mount +systemctl status lizardfs-mount +``` ## Gotchas diff --git a/topics/octopus/maintenance.gmi b/topics/octopus/maintenance.gmi new file mode 100644 index 0000000..65ea52e --- /dev/null +++ b/topics/octopus/maintenance.gmi @@ -0,0 +1,98 @@ +# Octopus/Tux maintenance + +## To remember + +`fdisk -l` to see disk models +`lsblk -nd` to see mounted disks + +## Status + +octopus02 +- Devices: 2 3.7T SSDs + 2 894.3G SSDs + 2 4.6T HDDs +- **Status: Slurm not OK, LizardFS not OK** +- Notes: + - `octopus02 mfsmount[31909]: can't resolve master hostname and/or portname (octopus01:9421)`, + - **I don't see 2 drives that are physically mounted** + +octopus03 +- Devices: 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK +- Notes: **I don't see 2 drives that are physically mounted** + +octopus04 +- Devices: 4 7.3 T SSDs (Neil) + 1 4.6T HDD + 1 3.7T SSD + 2 894.3G SSDs +- Status: Slurm NO, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus05 +- Devices: 1 7.3 T SSDs (Neil) + 5 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK +- Notes: no + +octopus06 +- Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus07 +- Devices: 1 7.3 T SSDs (Neil) + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK 
+- Notes: **I don't see 1 device that is physically mounted** + +octopus08 +- Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus09 +- Devices: 1 7.3 T SSDs (Neil) + 1 4.6T HDD + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: no + +octopus10 +- Devices: 1 7.3 T SSDs (Neil) + 4 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK (we don't share the HDD) +- Notes: **I don't see 1 device that is physically mounted** + +octopus11 +- Devices: 1 7.3 T SSDs (Neil) + 5 3.7T SSDs + 2 894.3G SSDs +- Status: Slurm OK, LizardFS OK +- Notes: no + +tux05 +- Devices: 1 3.6T NVMe + 1 1.5T NVMe + 1 894.3G NVMe +- Status: Slurm OK, LizardFS OK (we don't share anything) +- Notes: **I don't have a picture to confirm physically mounted devices** + +tux06 +- Devices: 2 3.6 T SSDs (1 from Neil) + 1 1.5T NVMe + 1 894.3G NVMe +- Status: Slurm OK, LizardFS (we don't share anything) +- Notes: + - **Last picture reports 1 7.3 T SSD (Neil) that is missing** + - **Disk /dev/sdc: 3.64 TiB (Samsung SSD 990): free and usable for lizardfs** + - **Disk /dev/sdd: 3.64 TiB (Samsung SSD 990): free and usable for lizardfs** + +tux07 +- Devices: 3 3.6 T SSDs + 1 1.5T NVMe (Neil) + 1 894.3G NVMe +- Status: Slurm OK, LizardFS +- Notes: + - **Disk /dev/sdb: 3.64 TiB (Samsung SSD 990): free and usable for lizardfs** + - **Disk /dev/sdd: 3.64 TiB (Samsung SSD 990): mounted at /mnt/sdb and shared on LIZARDFS: TO CHECK BECAUSE IT HAS NO PARTITIONS** + +tux08 +- Devices: 3 3.6 T SSDs + 1 1.5T NVMe (Neil) + 1 894.3G NVMe +- Status: Slurm OK, LizardFS +- Notes: no + +tux09 +- Devices: 1 3.6 T SSDs + 1 1.5T NVMe + 1 894.3G NVMe +- Status: Slurm OK, LizardFS +- Notes: **I don't see 1 device that is physically mounted** + +## Neil disks +- four 8TB SSDs on the right of octopus04 +- one 8TB SSD in the left slot of octopus05 +- six 8TB SSDs bottom-right slot of
octopus06,07,08,09,10,11 +* one 4TB NVMe and one 8TB SSD on tux06, NVMe in the bottom-right of the group of 4 on the left, SSD on the bottom-left of the group of 4 on the right +* one 4TB NVMe on tux07, on the top-left of the group of 4 on the right +* one 4TB NVMe on tux08, on the top-left of the group of 4 on the right diff --git a/topics/octopus/recent-rust.gmi b/topics/octopus/recent-rust.gmi new file mode 100644 index 0000000..7ce8968 --- /dev/null +++ b/topics/octopus/recent-rust.gmi @@ -0,0 +1,76 @@ +# Use a recent Rust on Octopus + + +For impg we currently need a Rust that is more recent than what we have in Debian +or Guix. No panic, because Rust has few requirements. + +Install the latest Rust using the rustup script: + +``` +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +``` + +Set the path: + +``` +. ~/.cargo/env +``` + +Update Rust and make stable the default toolchain: + +``` +rustup default stable +``` + +A later update session looks like this: + +``` +octopus01:~/tmp/impg$ . ~/.cargo/env +octopus01:~/tmp/impg$ rustup default stable +info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu' +info: latest update on 2025-05-15, rust version 1.87.0 (17067e9ac 2025-05-09) +info: downloading component 'cargo' +info: downloading component 'clippy' +info: downloading component 'rust-docs' +info: downloading component 'rust-std' +info: downloading component 'rustc' +(...)
+``` + +and build the package: + +``` +octopus01:~/tmp/impg$ cargo build +``` + +Since we are not in Guix, the binary picks up the local system dependencies: + +``` +octopus01:~/tmp/impg$ ldd target/debug/impg + linux-vdso.so.1 (0x00007ffdb266a000) + libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe404001000) + librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe403ff7000) + libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe403fd6000) + libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe403fd1000) + libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe403e11000) + /lib64/ld-linux-x86-64.so.2 (0x00007fe404682000) +``` + +Log in to another octopus node, say octopus02, and you can run impg from this directory: + +``` +octopus02:~$ ~/tmp/impg/target/debug/impg +Command-line tool for querying overlaps in PAF files + +Usage: impg <COMMAND> + +Commands: + index Create an IMPG index + partition Partition the alignment + query Query overlaps in the alignment + stats Print alignment statistics + +Options: + -h, --help Print help + -V, --version Print version +``` diff --git a/topics/octopus/set-up-guix-for-new-users.gmi b/topics/octopus/set-up-guix-for-new-users.gmi new file mode 100644 index 0000000..f459559 --- /dev/null +++ b/topics/octopus/set-up-guix-for-new-users.gmi @@ -0,0 +1,38 @@ +# Set up Guix for new users + +This document describes how to set up Guix for new users on a machine on which Guix is already installed (such as octopus01). + +## Create a per-user profile for yourself by running your first guix pull + +"Borrow" some other user's guix to run guix pull. In the example below, we use root's guix, but it might as well be any guix. +``` +$ /var/guix/profiles/per-user/root/current-guix/bin/guix pull +``` +This should create your very own Guix profile at ~/.config/guix/current. You may invoke guix from this profile as +``` +$ ~/.config/guix/current/bin/guix ... +``` +But, you'd normally want to make this more convenient.
So, add ~/.config/guix/current/bin to your PATH. To do this, add the following to your ~/.profile +``` +GUIX_PROFILE=~/.config/guix/current +. $GUIX_PROFILE/etc/profile +``` +Thereafter, you may run any guix command simply as +``` +$ guix ... +``` + +## Pulling from a different channels.scm + +By default, guix pull pulls the latest commit of the main upstream Guix channel. You may want to pull from additional channels as well. Put the channels you want into ~/.config/guix/channels.scm, and then run guix pull. For example, here's a channels.scm if you want to use the guix-bioinformatics channel. +``` +$ cat ~/.config/guix/channels.scm +(list (channel + (name 'gn-bioinformatics) + (url "https://git.genenetwork.org/guix-bioinformatics") + (branch "master"))) +``` +And, +``` +$ guix pull +``` diff --git a/topics/octopus/slurm-upgrade.gmi b/topics/octopus/slurm-upgrade.gmi new file mode 100644 index 0000000..822f68e --- /dev/null +++ b/topics/octopus/slurm-upgrade.gmi @@ -0,0 +1,89 @@ +# How to upgrade slurm on octopus + +This document closely mirrors the official upgrade guide. The official upgrade guide is very thorough. Please refer to it and update this document if something is not clear. +=> https://slurm.schedmd.com/upgrades.html Official slurm upgrade guide + +## Preparation + +It is possible to upgrade slurm in-place without upsetting running jobs. But, for our small cluster, we don't mind a little downtime. So, it is simpler if we schedule some downtime with other users and make sure there are no running jobs. + +slurm can only be upgraded safely in small version increments. For example, it is safe to upgrade version 18.08 to 19.05 or 20.02, but not to 20.11 or later. This compatibility information is in the RELEASE_NOTES file of the slurm git repo with the git tag corresponding to the version checked out. Any configuration file changes are also outlined in this file. 
+=> https://github.com/SchedMD/slurm/ slurm git repository + +## Backup + +Stop the slurmdbd, slurmctld and slurmd services. +``` +# systemctl stop slurmdbd slurmctld slurmd slurmrestd +``` +Backup the slurm StateSaveLocation (/var/spool/slurmd/ctld in our case) and the slurm configuration directory. +``` +# cp -av /var/spool/slurmd/ctld /somewhere/safe/ +# cp -av /etc/slurm /somewhere/safe/ +``` +Backup the slurmdbd MySQL database. Enter the password when prompted. The password is specified in StoragePass of /etc/slurm/slurmdbd.conf. +``` +$ mysqldump -u slurm -p --databases slurm_acct_db > /somewhere/safe/slurm_acct_db.sql +``` + +## Upgrade slurm on octopus01 (the head node) + +Clone the gn-machines git repo. +``` +$ git clone https://git.genenetwork.org/gn-machines +``` +Edit slurm.scm to build the version of slurm you are upgrading to. Ensure it builds successfully using +``` +$ guix build -f slurm.scm +``` +Upgrade slurm. +``` +# ./slurm-head-deploy.sh +``` +Make any configuration file changes outlined in RELEASE_NOTES. Next, run the slurmdbd daemon, wait for it to start up successfully and then exit with Ctrl+C. During upgrades, slurmdbd may take extra time to update the database. This may cause systemd to timeout and kill slurmdbd. So, we do it this way, instead of simply starting the slurmdbd systemd service. +``` +# sudo -u slurm slurmdbd -D +``` +Reload the new systemd configuration files. Then, start the slurmdbd, slurmctld and slurmd services one at a time ensuring that each starts up correctly before proceeding on to the next. +``` +# systemctl daemon-reload +# systemctl start slurmdbd +# systemctl start slurmctld +# systemctl start slurmd +# systemctl start slurmrestd +``` + +## Upgrade slurm on the worker nodes + +Repeat the steps below on every worker node. + +Stop the slurmd service. +``` +# systemctl stop slurmd +``` +Upgrade slurm, passing slurm-worker-deploy.sh the slurm store path obtained from building slurm using guix build on octopus01. 
Recall that you cannot invoke guix build on the worker nodes. +``` +# ./slurm-worker-deploy.sh /gnu/store/...-slurm +``` +Copy over any configuration file changes from octopus01. Then, reload the new systemd configuration files and start slurmd. +``` +# systemctl daemon-reload +# systemctl start slurmd +``` + +## Tip: Running the same command on all worker nodes + +It is a lot of typing to run the same command on all worker nodes. You could make this a little less cumbersome with the following bash for loop. +``` +for node in octopus02 octopus03 octopus05 octopus06 octopus07 octopus08 octopus09 octopus10 octopus11 tux05 tux06 tux07 tux08 tux09; +do + ssh $node your command +done +``` +You can even do this for sudo commands using the -S flag of sudo that makes it read the password from stdin. Assuming your password is in the pass password manager, the bash for loop would then look like: +``` +for node in octopus02 octopus03 octopus05 octopus06 octopus07 octopus08 octopus09 octopus10 octopus11 tux05 tux06 tux07 tux08 tux09; +do + pass octopus | ssh $node sudo -S your command +done +```
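The loop above can also be wrapped in a small function with a dry-run guard, so a mistyped command is previewed before it hits all fourteen nodes at once. This is a hypothetical sketch (the `run_all` helper and `DRY_RUN` flag are illustrations, not existing scripts in gn-machines):

```shell
#!/bin/bash
# Hypothetical helper: run one command on every worker node.
# With DRY_RUN=1 it only prints what it would do, instead of calling ssh.
NODES="octopus02 octopus03 octopus05 octopus06 octopus07 octopus08 \
octopus09 octopus10 octopus11 tux05 tux06 tux07 tux08 tux09"

run_all() {
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would run on $node: $*"
        else
            ssh "$node" "$@"
        fi
    done
}

# Preview first; drop DRY_RUN=1 to run for real.
DRY_RUN=1 run_all systemctl status slurmd
```

The dry run prints one line per node, which is a cheap way to double-check the node list before doing anything destructive.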
\ No newline at end of file diff --git a/topics/programming/autossh-for-keeping-ssh-tunnels.gmi b/topics/programming/autossh-for-keeping-ssh-tunnels.gmi new file mode 100644 index 0000000..a977232 --- /dev/null +++ b/topics/programming/autossh-for-keeping-ssh-tunnels.gmi @@ -0,0 +1,65 @@ +# Using autossh to Keep SSH Tunnels Alive + +## Tags +* keywords: ssh, autossh, tunnel, alive + + +## TL;DR + +``` +guix package -i autossh # Install autossh with Guix +autossh -M 0 -o "ServerAliveInterval 60" -o "ServerAliveCountMax 5" -L 4000:127.0.0.1:3306 alexander@remoteserver.org +``` + +## Introduction + +Autossh is a utility for automatically restarting SSH sessions and tunnels if they drop or become inactive. It's particularly useful for long-lived tunnels in unstable network environments. + +See official docs: + +=> https://www.harding.motd.ca/autossh/ + +## Installing autossh + +Install autossh using Guix: + +``` +guix package -i autossh +``` + +Basic usage: + +``` +autossh [-V] [-M monitor_port[:echo_port]] [-f] [SSH_OPTIONS] +``` + +## Examples + +### Keep a database tunnel alive with autossh + +Forward a remote MySQL port to your local machine: + +**Using plain SSH:** + +``` +ssh -L 5000:localhost:3306 alexander@remoteserver.org +``` + +**Using autossh:** + +``` +autossh -L 5000:localhost:3306 alexander@remoteserver.org +``` + +### Better option + +``` +autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -L 5000:localhost:3306 alexander@remoteserver.org +``` + +#### Option explanations: + +- `ServerAliveInterval`: Seconds between sending keepalive packets to the server (default: 0). +- `ServerAliveCountMax`: Number of unanswered keepalive packets before SSH disconnects (default: 3). + +You can also configure these options in your `~/.ssh/config` file to simplify command-line usage. 
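For example, a minimal `~/.ssh/config` entry carrying the same keepalive and forwarding settings (the `dbtunnel` alias is a made-up name; adjust host, user, and ports to your setup):

```
Host dbtunnel
    HostName remoteserver.org
    User alexander
    LocalForward 5000 localhost:3306
    ServerAliveInterval 30
    ServerAliveCountMax 3
```

With this in place the long command line shrinks to `autossh -M 0 dbtunnel`.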
diff --git a/topics/systems/backup-drops.gmi b/topics/systems/backup-drops.gmi index 191b185..3f81c5a 100644 --- a/topics/systems/backup-drops.gmi +++ b/topics/systems/backup-drops.gmi @@ -4,6 +4,10 @@ To make backups we use a combination of sheepdog, borg, sshfs, rsync. sheepdog i This system proves pretty resilient over time. Only on the synology server I can't get it to work because of some CRON permission issue. +For doing the actual backups see + +=> ./backups-with-borg.gmi + # Tags * assigned: pjotrp @@ -13,7 +17,7 @@ This system proves pretty resilient over time. Only on the synology server I can ## Borg backups -It is advised to use a backup password and not store that on the remote. +Despite our precautions it is advised to use a backup password and *not* store that on the remote. ## Running sheepdog on rabbit @@ -59,14 +63,14 @@ where remote can be an IP address. Warning: if you introduce this `AllowUsers` command all users should be listed or people may get locked out of the machine. -Next create a special key on the backup machine's ibackup user (just hit enter): +Next create a special password-less key on the backup machine's ibackup user (just hit enter): ``` su ibackup ssh-keygen -t ecdsa -f $HOME/.ssh/id_ecdsa_backup ``` -and copy the public key into the remote /home/bacchus/.ssh/authorized_keys +and copy the public key into the remote /home/bacchus/.ssh/authorized_keys. Now test it from the backup server with @@ -82,13 +86,20 @@ On the drop server you can track messages by tail -40 /var/log/auth.log ``` +or on recent linux with systemd + +``` +journalctl -r +``` + Next ``` ssh -v -i ~/.ssh/id_ecdsa_backup bacchus@dropserver ``` -should give a Broken pipe(!). In auth.log you may see something like +should give a Broken pipe(!) or -- more recently -- it says `This service allows sftp connections only`. 
+When running sshd with a verbose switch you may see something like fatal: bad ownership or modes for chroot directory component "/export/backup/" @@ -110,6 +121,19 @@ chown bacchus.bacchus backup/bacchus/drop/ chmod 0700 backup/bacchus/drop/ ``` +Another error may be: + +``` +fusermount3: mount failed: Operation not permitted +``` + +This means you need to set the suid on the fusermount3 command. Bit nasty in Guix. + +``` +apt-get install fuse(3) sshfs +chmod 4755 /usr/bin/fusermount +``` + If auth.log says error: /dev/pts/11: No such file or directory on ssh, or received disconnect (...) disconnected by user we are good to go! Note: at this stage it may pay to track the system log with @@ -171,3 +195,5 @@ sshfs -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3,IdentityFile=~/. The recent scripts can be found at => https://github.com/genenetwork/gn-deploy-servers/blob/master/scripts/tux01/backup_drop.sh + +# borg-borg diff --git a/topics/systems/backups-with-borg.gmi b/topics/systems/backups-with-borg.gmi new file mode 100644 index 0000000..1ad0112 --- /dev/null +++ b/topics/systems/backups-with-borg.gmi @@ -0,0 +1,220 @@ +# Borg backups + +We use borg for backups. Borg is an amazing tool and after 25+ years of making backups it just feels right. +With the new tux04 production install we need to organize backups off-site. The first step is to create a +borg runner using sheepdog -- sheepdog we use for monitoring success/failure. +Sheepdog essentially wraps a Unix command and sends a report to a local or remote redis instance. +Sheepdog also includes a web server for output: + +=> http://sheepdog.genenetwork.org/sheepdog/status.html + +which I run on one of my machines. + +# Tags + +* assigned: pjotrp +* keywords: systems, backup, sheepdog, database + +# Install borg + +Usually I use a version of borg from guix. This should really be done as the borg user (ibackup). 
+
+```
+mkdir ~/opt
+guix package -i borg -p ~/opt/borg
+tux04:~$ ~/opt/borg/bin/borg --version
+ 1.2.2
+```
+
+# Create a new backup dir and user
+
+The backup should live on a different disk from the things we backup, so when that disk fails we have another.
+
+The SQL database lives on /export and the containers live on /export2. /export3 is a largish slow drive, so perfect.
+
+By convention I point /export/backup to the real backup dir on /export3/backup/borg/. Another convention is that we use an ibackup user which has the backup passphrase in ~/.borg-pass. As root:
+
+```
+mkdir /export/backup/borg
+chown ibackup:ibackup /export/backup/borg
+chown ibackup:ibackup /home/ibackup/.borg-pass
+su ibackup
+```
+
+Now you should be able to load the passphrase and create the backup dir:
+
+```
+id
+  uid=1003(ibackup)
+. ~/.borg-pass
+cd /export/backup/borg
+~/opt/borg/bin/borg init --encryption=repokey-blake2 genenetwork
+```
+
+Now we can run our first backup. Note that ibackup should be a member of the mysql and gn groups:
+
+```
+mysql:x:116:ibackup
+```
+
+# First backup
+
+Run the backup the first time:
+
+```
+id
+  uid=1003(ibackup) groups=1003(ibackup),116(mysql)
+~/opt/borg/bin/borg create --progress --stats genenetwork::first-backup /export/mysql/database/*
+```
+
+You may first need to update permissions to give group access:
+
+```
+chmod g+rx -R /var/lib/mysql/*
+```
+
+When that works borg reports:
+
+```
+Archive name: first-backup
+Archive fingerprint: 376d32fda9738daa97078fe4ca6d084c3fa9be8013dc4d359f951f594f24184d
+Time (start): Sat, 2025-02-08 04:46:48
+Time (end): Sat, 2025-02-08 05:30:01
+Duration: 43 minutes 12.87 seconds
+Number of files: 799
+Utilization of max.
archive size: 0%
+------------------------------------------------------------------------------
+                       Original size      Compressed size    Deduplicated size
+This archive:              534.24 GB            238.43 GB            237.85 GB
+All archives:              534.24 GB            238.43 GB            238.38 GB
+                       Unique chunks         Total chunks
+Chunk index:                  200049               227228
+------------------------------------------------------------------------------
+```
+
+50% compression is not bad. borg is incremental so it will only back up differences next round.
+
+Once borg works we could run a CRON job. But we should use the sheepdog monitor to make sure backups keep running and failures don't go unnoticed.
+
+# Using the sheepdog
+
+=> https://github.com/pjotrp/deploy sheepdog code
+
+## Clone sheepdog
+
+=> https://github.com/pjotrp/deploy#install sheepdog install
+
+Essentially clone the repo so it shows up in ~/deploy
+
+```
+cd /home/ibackup
+git clone https://github.com/pjotrp/deploy.git
+/export/backup/scripts/tux04/backup-tux04.sh
+```
+
+## Setup redis
+
+All sheepdog messages get pushed to redis. You can run it locally or remotely.
+
+By default we use redis, but syslog and others may also be used. The advantage of redis is that it is not bound to the same host, can cross firewalls using an ssh reverse tunnel, and is easy to query.
+
+=> https://github.com/pjotrp/deploy#install sheepdog install
+
+In our case we use redis on a remote host and the results get displayed by a webserver. Also some people get E-mail updates on failure. The configuration is in
+
+```
+/home/ibackup# cat .config/sheepdog/sheepdog.conf
+{
+  "redis": {
+    "host"    : "remote-host",
+    "password": "something"
+  }
+}
+```
+
+If you see localhost with port 6377 it is probably a reverse tunnel setup:
+
+=> https://github.com/pjotrp/deploy#redis-reverse-tunnel
+
+Update the fields according to what we use. The main thing is that it defines the sheepdog->redis connector. If you also use sheepdog as another user you'll need to add a config.
+
+Sheepdog should show a warning when you configure redis and it is not connecting.
+
+## Scripts
+
+Typically I run the cron job from root CRON so people can find it. Still it is probably a better idea to use an ibackup CRON. In my version a script is run that also captures output:
+
+```cron root
+0 6 * * * /bin/su ibackup -c /export/backup/scripts/tux04/backup-tux04.sh >> ~/cron.log 2>&1
+```
+
+The script contains something like
+
+```bash
+#! /bin/bash
+if [ "$EUID" -eq 0 ]
+  then echo "Please do not run as root. Run as: su ibackup -c $0"
+  exit
+fi
+rundir=$(dirname "$0")
+# ---- for sheepdog
+source $rundir/sheepdog_env.sh
+cd $rundir
+sheepdog_borg.rb -t borg-tux04-sql --group ibackup -v -b /export/backup/borg/genenetwork /export/mysql/database/*
+```
+
+and the accompanying sheepdog_env.sh
+
+```
+export GEM_PATH=/home/ibackup/opt/deploy/lib/ruby/vendor_ruby
+export PATH=/home/ibackup/opt/deploy/deploy/bin:/home/wrk/opt/deploy/bin:$PATH
+```
+
+If it reports
+
+```
+/export/backup/scripts/tux04/backup-tux04.sh: line 11: /export/backup/scripts/tux04/sheepdog_env.sh: No such file or directory
+```
+
+you need to install sheepdog first.
+
+If all shows green (and takes some time) we made a backup. Check the backup with
+
+```
+ibackup@tux04:/export/backup/borg$ borg list genenetwork/
+first-backup                        Sat, 2025-02-08 04:39:50 [58715b883c080996ab86630b3ae3db9bedb65e6dd2e83977b72c8a9eaa257cdf]
+borg-tux04-sql-20250209-01:43-Sun   Sun, 2025-02-09 01:43:23 [5e9698a032143bd6c625cdfa12ec4462f67218aa3cedc4233c176e8ffb92e16a]
+```
+and you should see the latest. The contents with all files should be visible with
+
+```
+borg list genenetwork::borg-tux04-sql-20250209-01:43-Sun
+```
+
+Make sure you see actual files and not just a symlink.
+
+# More backups
+
+Our production server runs databases and file stores that need to be backed up too.
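+The file stores can be covered with the same sheepdog_borg.rb pattern; a hypothetical second CRON entry (the script name and paths are illustrative, not the actual production setup):
+
+```cron root
+30 6 * * * /bin/su ibackup -c /export/backup/scripts/tux04/backup-files-tux04.sh >> ~/cron.log 2>&1
+```
+
+where that script would call sheepdog_borg.rb with a different tag, e.g. `-t borg-tux04-files`, and the file store paths instead of /export/mysql/database/*.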
+
+# Drop backups
+
+Once backups work it is useful to copy them to a remote server, so when the machine stops functioning we have another chance at recovery. See
+
+=> ./backup-drops.gmi
+
+# Recovery
+
+With tux04 we ran into a problem where all disks were getting corrupted(!) Probably due to the RAID controller, but we still need to figure that one out.
+
+Anyway, we have to assume the DB is corrupt. Files are corrupt AND the backups are corrupt. Borg backup has checksums which you can verify with
+
+```
+borg check repo
+```
+
+It has a --repair switch which we needed to use to remove some faults in the backup itself:
+
+```
+borg check --repair repo
+```
diff --git a/topics/systems/ci-cd.gmi b/topics/systems/ci-cd.gmi
index 6aa17f2..a1ff2e3 100644
--- a/topics/systems/ci-cd.gmi
+++ b/topics/systems/ci-cd.gmi
@@ -31,7 +31,7 @@ Arun has figured out the CI part. It runs a suitably configured laminar CI servi
 CD hasn't been figured out. Normally, Guix VMs and containers created by `guix system` can only access the store read-only. Since containers don't have write access to the store, you cannot `guix build' from within a container or deploy new containers from within a container. This is a problem for CD. How do you make Guix containers have write access to the store?
-Another alternative for CI/ CID were to have the quick running tests, e.g unit tests, run on each commit to branch "main". Once those are successful, the CI/CD system we choose should automatically pick the latest commit that passed the quick running tests for for further testing and deployment, maybe once an hour or so. Once the next battery of tests is passed, the CI/CD system will create a build/artifact to be deployed to staging and have the next battery of tests runs against it. If that passes, then that artifact could be deployed to production, and details on the commit and
+Another alternative for CI/CD would be to have the quick running tests, e.g. unit tests, run on each commit to branch "main".
Once those are successful, the CI/CD system we choose should automatically pick the latest commit that passed the quick running tests for further testing and deployment, maybe once an hour or so. Once the next battery of tests is passed, the CI/CD system will create a build/artifact to be deployed to staging and have the next battery of tests run against it. If that passes, then that artifact could be deployed to production, and details on the commit and

#### Possible Steps

@@ -90,3 +90,49 @@ This contains a check-list of things that need to be done:
 => /topics/systems/orchestration Orchestration
 => /issues/broken-cd Broken-cd (Resolved)
+
+## Adding a web-hook
+
+### Github hooks
+
+IIRC actions run artifacts inside github's infrastructure. We use webhooks: e.g.
+
+Update the hook at
+
+=> https://github.com/genenetwork/genenetwork3/settings/hooks
+
+=> ./screenshot-github-webhook.png
+
+To trigger CI manually, run this with the project name:
+
+```
+curl https://ci.genenetwork.org/hooks/example-gn3
+```
+
+For gemtext we have a github hook that adds a forge-project and looks like
+
+```lisp
+(define gn-gemtext-threads-project
+  (forge-project
+   (name "gn-gemtext-threads")
+   (repository "https://github.com/genenetwork/gn-gemtext-threads/")
+   (ci-jobs (list (forge-laminar-job
+                   (name "gn-gemtext-threads")
+                   (run (with-packages (list nss-certs openssl)
+                          (with-imported-modules '((guix build utils))
+                            #~(begin
+                                (use-modules (guix build utils))
+
+                                (setenv "LC_ALL" "en_US.UTF-8")
+                                (invoke #$(file-append tissue "/bin/tissue")
+                                        "pull" "issues.genenetwork.org"))))))))
+   (ci-jobs-trigger 'webhook)))
+```
+
+Guix forge can be found at
+
+=> https://git.systemreboot.net/guix-forge/
+
+### git.genenetwork.org hooks
+
+TBD
diff --git a/topics/systems/dns-changes.gmi b/topics/systems/dns-changes.gmi
index 7f1d8f1..30aae58 100644
--- a/topics/systems/dns-changes.gmi
+++ b/topics/systems/dns-changes.gmi
@@ -27,6 +27,7 @@ We are moving thing to a new DNS hosting service.
We have accounts on both. To m
 * Import DNS settings on DNSimple (cut-N-paste)
+ Edit delegation - make sure the delegation box is set
=> https://support.dnsimple.com/articles/delegating-dnsimple-registered
+
+ Registration menu item comes up after transfer...
 * Approve transfer on GoDaddy a few minutes later (!!), see
+ https://dcc.godaddy.com/control/transfers
 * Add DNSSec
diff --git a/topics/systems/hpc/performance.gmi b/topics/systems/hpc/performance.gmi
index ce6a111..ac5e861 100644
--- a/topics/systems/hpc/performance.gmi
+++ b/topics/systems/hpc/performance.gmi
@@ -12,6 +12,23 @@ For disk speeds make sure there is no load and run
 hdparm -Ttv /dev/sdc1
 ```
+Cheap and cheerful:
+
+Write test:
+
+```
+dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct
+```
+
+Read test (drop the caches first so the read hits the disk):
+
+```
+/sbin/sysctl -w vm.drop_caches=3
+dd if=./test of=/dev/null bs=512k count=2048
+```
+
+
 ## Networking

To check the network devices installed use
diff --git a/topics/systems/linux/add-boot-partition.gmi b/topics/systems/linux/add-boot-partition.gmi
new file mode 100644
index 0000000..564e044
--- /dev/null
+++ b/topics/systems/linux/add-boot-partition.gmi
@@ -0,0 +1,52 @@
+# Add (2nd) boot and other partitions
+
+As we handle machines remotely it is often useful to have a secondary boot partition that can be used from grub.
+
+Basically, create a similar sized boot partition on a different disk and copy the running one over with:
+
+```
+parted -a optimal /dev/sdb
+(parted) p
+Model: NVMe CT4000P3SSD8 (scsi)
+Disk /dev/sdb: 4001GB
+Sector size (logical/physical): 512B/512B
+Partition Table: gpt
+Disk Flags:
+
+Number  Start   End     Size    File system  Name  Flags
+ 1      32.0GB  4001GB  3969GB  ext4         bulk
+
+(parted) rm 1
+mklabel gpt
+mkpart fat32 1 1GB
+set 1 esp on
+align-check optimal 1
+mkpart ext4 1GB 32GB
+mkpart swap 32GB 48GB
+set 2 boot on # other flags are raid, swap, lvm
+set 3 swap on
+mkpart scratch 48GB 512GB
+mkpart ceph 512GB -1
+```
+
+We also took the opportunity to create a new scratch partition (for moving things around) and a ceph partition (for testing).
+Resulting in
+
+```
+Number  Start   End     Size    File system  Name     Flags
+ 1      1049kB  1000MB  999MB   fat32                 boot, esp
+ 2      1000MB  24.0GB  23.0GB  ext4                  boot, esp
+ 3      24.0GB  32.0GB  8001MB  swap                  swap
+ 4      32.0GB  512GB   480GB   ext4         scratch
+ 5      512GB   4001GB  3489GB               ceph
+```
+
+Now that the drive is ready we can copy the existing boot partitions. Make sure you don't get it wrong and that the target partition is larger.
+Here the original boot disk is /dev/sda (894Gb). We copy that to the new disk /dev/sdb (3.64Tb).
+
+```
+root@tux05:/home/wrk# dd if=/dev/sda1 of=/dev/sdb1
+root@tux05:/home/wrk# dd if=/dev/sda2 of=/dev/sdb2
+```
+
+Next, test mount the dirs and reboot. You may want to run e2fsck and resize2fs on the new partitions (or their equivalent if you use xfs or something).
diff --git a/topics/systems/mariadb/mariadb.gmi b/topics/systems/mariadb/mariadb.gmi
index ae0ab19..ec8b739 100644
--- a/topics/systems/mariadb/mariadb.gmi
+++ b/topics/systems/mariadb/mariadb.gmi
@@ -16,6 +16,8 @@ To install Mariadb (as a container) see below and
 Start the client and:
 ```
+mysql
+show databases;
 MariaDB [db_webqtl]> show binary logs;
 +-----------------------+-----------+
 | Log_name              | File_size |
@@ -60,4 +62,11 @@ Stop the running mariadb-guix.service.
Restore the latest backup archive and ove => https://www.borgbackup.org/ Borg => https://borgbackup.readthedocs.io/en/stable/ Borg documentation -# +# Upgrade mariadb + +It is wise to upgrade mariadb once in a while. In a disaster recovery it is better to move forward in versions too. +Before upgrading make sure there is a decent backup of the current setup. + +See also + +=> issues/systems/tux04-disk-issues.gmi diff --git a/topics/systems/mariadb/precompute-mapping-input-data.gmi b/topics/systems/mariadb/precompute-mapping-input-data.gmi index 0c89fe5..977120d 100644 --- a/topics/systems/mariadb/precompute-mapping-input-data.gmi +++ b/topics/systems/mariadb/precompute-mapping-input-data.gmi @@ -49,10 +49,29 @@ The original reaper precompute lives in => https://github.com/genenetwork/genenetwork2/blob/testing/scripts/maintenance/QTL_Reaper_v6.py -This script first fetches inbredsets +More recent incarnations are at v8, including a PublishData version that can be found in + +=> https://github.com/genenetwork/genenetwork2/tree/testing/scripts/maintenance + +Note that the locations are on space: + +``` +cd /mount/space2/lily-clone/acenteno/GN-Data +ls -l +python QTL_Reaper_v8_space_good.py 116 +-- +python UPDATE_Mean_MySQL_tab.py +cd /mount/space2/lily-clone/gnshare/gn/web/webqtl/maintainance +ls -l +python QTL_Reaper_cal_lrs.py 7 +``` + +The first task is to prepare an update script that can run a set at a time and compute GEMMA output (instead of reaper). 
+ +The script first fetches inbredsets ``` - select Id,InbredSetId,InbredSetName,Name,SpeciesId,FullName,public,MappingMethodId,GeneticType,Family,FamilyOrder,MenuOrderId,InbredSetCode from InbredSet LIMIT 5; +select Id,InbredSetId,InbredSetName,Name,SpeciesId,FullName,public,MappingMethodId,GeneticType,Family,FamilyOrder,MenuOrderId,InbredSetCode from InbredSet LIMIT 5; +----+-------------+-------------------+----------+-----------+-------------------+--------+-----------------+-------------+--------------------------------------------------+-------------+-------------+---------------+ | Id | InbredSetId | InbredSetName | Name | SpeciesId | FullName | public | MappingMethodId | GeneticType | Family | FamilyOrder | MenuOrderId | InbredSetCode | +----+-------------+-------------------+----------+-----------+-------------------+--------+-----------------+-------------+--------------------------------------------------+-------------+-------------+---------------+ diff --git a/topics/systems/migrate-p2.gmi b/topics/systems/migrate-p2.gmi deleted file mode 100644 index c7fcb90..0000000 --- a/topics/systems/migrate-p2.gmi +++ /dev/null @@ -1,12 +0,0 @@ -* Penguin2 crash - -This week the boot partition of P2 crashed. We have a few lessons here, not least having a fallback for all services ;) - -* Tasks - -- [ ] setup space.uthsc.edu for GN2 development -- [ ] update DNS to tux02 128.169.4.52 and space 128.169.5.175 -- [ ] move CI/CD to tux02 - - -* Notes diff --git a/topics/systems/restore-backups.gmi b/topics/systems/restore-backups.gmi index 518c56d..b97af2b 100644 --- a/topics/systems/restore-backups.gmi +++ b/topics/systems/restore-backups.gmi @@ -26,7 +26,7 @@ The last backup on 'tux02' is from October 2022 - after I did a reinstall. 
That According to sheepdog the drops are happening to 'space' and 'epysode', but 'tux02' is missing: -=> https://rabbit.genenetwork.org/sheepdog/index.html +=> http://sheepdog.genenetwork.org/sheepdog/status.html ## Mariadb diff --git a/topics/systems/screenshot-github-webhook.png b/topics/systems/screenshot-github-webhook.png Binary files differnew file mode 100644 index 0000000..08feed3 --- /dev/null +++ b/topics/systems/screenshot-github-webhook.png diff --git a/topics/systems/synchronising-the-different-environments.gmi b/topics/systems/synchronising-the-different-environments.gmi new file mode 100644 index 0000000..207b234 --- /dev/null +++ b/topics/systems/synchronising-the-different-environments.gmi @@ -0,0 +1,68 @@ +# Synchronising the Different Environments + +## Tags + +* status: open +* priority: +* type: documentation +* assigned: fredm +* keywords: doc, docs, documentation + +## Introduction + +We have different environments we run for various reasons, e.g. + +* Production: This is the user-facing environment. This is what GeneNetwork is about. +* gn2-fred: production-adjacent. It is meant to test out changes before they get to production. It is **NOT** meant for users. +* CI/CD: Used for development. The latest commits get auto-deployed here. It's the first place (outside of developer machines) where errors and breakages are caught and/or revealed. This will break a lot. Do not expose to users! +* staging: Uploader environment. This is where Felix, Fred and Arthur flesh out the upload process, and tasks, and also test out the uploader. + +These different environments demand synchronisation, in order to have mostly similar results and failure modes. + +## Synchronisation of the Environments + +### Main Database: MariaDB + +* [ ] TODO: Describe process + +=> https://issues.genenetwork.org/topics/systems/restore-backups Extract borg archive +* Automate? Will probably need some checks for data sanity. 
+
+### Authorisation Database
+
+* [ ] TODO: Describe process
+
+* Copy backup from production
+* Update/replace GN2 client configs in database
+* What other things?
+
+### Virtuoso/RDF
+
+* [ ] TODO: Describe process
+
+* Copy TTL (Turtle) files from (where?). Production might not always be latest source of TTL files.
+=> https://issues.genenetwork.org/issues/set-up-virtuoso-on-production Run setup to "activate" database entries
+* Can we automate this? What checks are necessary?
+
+### Genotype Files
+
+* [ ] TODO: Describe process
+
+* Copy from source-of-truth (currently Zach's tux01 and/or production).
+* Rsync?
+
+### gn-docs
+
+* [ ] TODO: Describe process
+
+* Not sure changes from other environments should ever take
+
+### AI Summaries (aka. gnqna)
+
+* [ ] TODO: Describe process
+
+* Update configs (should be once, during container setup)
+
+### Others?
+
+* [ ] TODO: Describe process
diff --git a/topics/systems/update-production-checklist.gmi b/topics/systems/update-production-checklist.gmi
new file mode 100644
index 0000000..b17077b
--- /dev/null
+++ b/topics/systems/update-production-checklist.gmi
@@ -0,0 +1,182 @@
+# Update production checklist
+
+
+# Tasks
+
+* [X] Install underlying Debian
+* [X] Get guix going
+* [ ] Check database
+* [ ] Check gemma working
+* [ ] Check global search
+* [ ] Check authentication
+* [ ] Check sending E-mails
+* [ ] Make sure info.genenetwork.org can reach the DB
+* [ ] Backups
+
+The following are at the system level
+
+* [ ] Make journalctl persistent
+* [ ] Update certificates in CRON
+* [ ] Run trim in CRON
+
+# Install underlying Debian
+
+For our production systems we use Debian as a base install.
Once installed:
+
+* [X] set up git in /etc and limit permissions to root user
+* [X] add ttyS0 support for grub and kernel - so out-of-band works
+* [X] start ssh server and configure it to disallow password logins
+* [X] start nginx and check external networking
+* [ ] set up E-mail routing
+
+It may help to mount the old root if you have it. Now it is on
+
+```
+mount /dev/sdd2 /mnt/old-root/
+```
+
+# Get Guix going
+
+* [X] Install Guix daemon
+* [X] Move /gnu/store to larger partition
+* [X] Update Guix daemon and setup in systemd
+* [X] Make available in /usr/local/guix-profiles
+* [X] Clean up /etc/profile
+
+We can bootstrap with the Debian guix package. Next move the store to a large partition and hard mount it in /etc/fstab with
+
+```
+/export2/gnu /gnu none defaults,bind 0 0
+```
+
+Run guix pull
+
+```
+wrk@tux04:~$ guix pull -p ~/opt/guix-pull --url=https://codeberg.org/guix/guix-mirror.git
+```
+
+Use that to install guix in /usr/local/guix-profiles
+
+```
+guix package -i guix -p /usr/local/guix-profiles/guix
+```
+
+and update the daemon in systemd accordingly. After that I tend to remove /usr/bin/guix.
+
+The Debian installer configures guix. I tend to remove the profiles from /etc/profile so people have a minimal profile.
+
+# Check database
+
+* [X] Install mariadb
+* [ ] Recover database
+* [ ] Test permissions
+* [ ] Mariadb update my.cnf
+
+Basically, recovering the database from a backup and setting permissions is the best start. We usually take the default mariadb unless production is already on a newer version - in that case we move to a guix deployment.
+
+On tux02 mariadb-10.5.8 is running. On Debian it is now 10.11.11-0+deb12u1, so we should be good. On Guix it is 10.10 at this point.
+
+```
+apt-get install mariadb-server
+```
+
+Next unpack the database files and set permissions to the mysql user. And (don't forget) update the /etc/mysql config files.
+ +Restart mysql until you see: + +``` +mysql -u webqtlout -p -e "show databases" ++---------------------------+ +| Database | ++---------------------------+ +| 20081110_uthsc_dbdownload | +| db_GeneOntology | +| db_webqtl | +| db_webqtl_s | +| go | +| information_schema | +| kegg | +| mysql | +| performance_schema | +| sys | ++---------------------------+ +``` + +=> topics/systems/mariadb/mariadb.gmi + +## Recover database + +We use borg for backups. First restore the backup on the PCIe. Also a test for overheating! + + +# Check sending E-mails + +The swaks package is quite useful to test for a valid receive host: + +``` +swaks --to testing-my-server@gmail.com --server smtp.uthsc.edu +=== Trying smtp.uthsc.edu:25... +=== Connected to smtp.uthsc.edu. +<- 220 mailrouter8.uthsc.edu ESMTP NO UCE + -> EHLO tux04.uthsc.edu +<- 250-mailrouter8.uthsc.edu +<- 250-PIPELINING +<- 250-SIZE 26214400 +<- 250-VRFY +<- 250-ETRN +<- 250-STARTTLS +<- 250-ENHANCEDSTATUSCODES +<- 250-8BITMIME +<- 250-DSN +<- 250 SMTPUTF8 + -> MAIL FROM:<root@tux04.uthsc.edu> +<- 250 2.1.0 Ok + -> RCPT TO:<pjotr2020@thebird.nl> +<- 250 2.1.5 Ok + -> DATA +<- 354 End data with <CR><LF>.<CR><LF> + -> Date: Thu, 06 Mar 2025 08:34:24 +0000 + -> To: pjotr2020@thebird.nl + -> From: root@tux04.uthsc.edu + -> Subject: test Thu, 06 Mar 2025 08:34:24 +0000 + -> Message-Id: <20250306083424.624509@tux04.uthsc.edu> + -> X-Mailer: swaks v20201014.0 jetmore.org/john/code/swaks/ + -> + -> This is a test mailing + -> + -> + -> . 
+<- 250 2.0.0 Ok: queued as 4157929DD + -> QUIT +<- 221 2.0.0 Bye === Connection closed with remote host +``` + +An exim configuration can be + +``` +dc_eximconfig_configtype='smarthost' +dc_other_hostnames='genenetwork.org' +dc_local_interfaces='127.0.0.1 ; ::1' +dc_readhost='' +dc_relay_domains='' +dc_minimaldns='false' +dc_relay_nets='' +dc_smarthost='smtp.uthsc.edu' +CFILEMODE='644' +dc_use_split_config='false' +dc_hide_mailname='false' +dc_mailname_in_oh='true' +dc_localdelivery='maildir_home' +``` + +And this should work: + +``` +swaks --to myemailaddress --from john@uthsc.edu --server localhost +``` + +# Backups + +* [ ] Create an ibackup user. +* [ ] Install borg (usually guix version) +* [ ] Create a borg passphrase diff --git a/topics/systems/virtuoso.gmi b/topics/systems/virtuoso.gmi index e911a8b..94a15f0 100644 --- a/topics/systems/virtuoso.gmi +++ b/topics/systems/virtuoso.gmi @@ -104,7 +104,7 @@ After running virtuoso, you will want to change the default password of the `dba In a typical production virtuoso installation, you will want to change the password of the dba user and disable the dav user. Here are the commands to do so. Pay attention to the single versus double quoting. ``` -SQL> set password "dba" "rFw,OntlJ@Sz"; +SQL> set password "dba" "dba"; SQL> UPDATE ws.ws.sys_dav_user SET u_account_disabled=1 WHERE u_name='dav'; SQL> CHECKPOINT; ``` diff --git a/topics/testing/mechanical-rob.gmi b/topics/testing/mechanical-rob.gmi index 9413b47..baf111a 100644 --- a/topics/testing/mechanical-rob.gmi +++ b/topics/testing/mechanical-rob.gmi @@ -1,9 +1,74 @@ # Mechanical Rob -We need to run Mechanical Rob tests as part of our continuous integration tests. +## Tags -The Mechanical Rob CI tests are functioning again now. To see how to run Mechanical Rob, see the CI job definition in the genenetwork-machines repo. 
+* type: documentation, docs +* assigned: bonfacem, rookie101, fredm +* priority: medium +* status: open +* keywords: tests, testing, mechanical-rob -=> genenetwork-machines/src/branch/main/genenetwork-development.scm +## What is Mechanical Rob? -The invocation procedure is bound to change as the many environment variables in genenetwork2 are cleared up. +Mechanical Rob is our name for what could be considered our integration tests. + +The idea is that we observe how Prof. Robert Williams (Rob) (and other scientists) use(s) GeneNetwork and create a "mechanical" facsimile of that. The purpose is to ensure that the system works correctly with each and every commit in any of our various repositories. + +If any commit causes any part of the Mechanical Rob system to raise an error, then we know, immediately, that something is broken, and the culprit can get onto fixing that with haste. + +## Show Me Some Code!!! + +Nice! I like your enthusiasm. + +You can find the +=> https://github.com/genenetwork/genenetwork2/tree/testing/test/requests Mechanical Rob code here +within the genenetwork2 repository. + +You can also see how it is triggered in the gn-machines repository in +=> https://git.genenetwork.org/gn-machines/tree/genenetwork-development.scm this module. +Search for "genenetwork2-mechanical-rob" within that module and you should find how the system is triggered. + +## How About Running it Locally + +All the above is nice and all, but sometimes you just want to run the checks locally. + +In that case, you can run Mechanical Rob locally by following the steps below: +(note that these steps are mostly the same ones to run GN2 locally). + + +1. 
Get a guix shell for GN2 development:
+```
+$ cd genenetwork2/
+$ guix shell --container --network \
+    --expose=</path/to/directory/with/genotypes> \
+    --expose=</path/to/local/genenetwork3> \
+    --expose=</path/to/settings/file> \
+    --expose=</path/to/secrets/file> \
+    --file=guix.scm bash
+```
+The last `bash` is to ensure we install the Bourne-Again Shell which we use to launch the application. The `</path/to/local/genenetwork3>` can be omitted if you do not need the latest code in GN3 to be included in your running GN2.
+
+2. Set up the appropriate environment variables:
+```
+[env]$ export HOME=</path/to/home/directory>
+[env]$ export GN2_SETTINGS=</path/to/settings/file>
+[env]$ export SERVER_PORT=5003
+[env]$ export GN2_PROFILE="${GUIX_ENVIRONMENT}"
+[env]$ export GN3_PYTHONPATH=</path/to/local/genenetwork3> # Only needed if you need to test GN3 updates
+```
+
+3. Run the mechanical-rob tests:
+```
+[env]$ bash bin/genenetwork2 gn2/default_settings.py -c \
+    test/requests/test-website.py \
+    --all "http://localhost:${SERVER_PORT}"
+```
+Of course, here we are assuming that `SERVER_PORT` has the value of the port on which GN2 is running.
+
+
+## Possible Improvements
+
+Look into using geckodriver to help with the mechanical-rob tests.
+`geckodriver` comes with the
+=> https://icecatbrowser.org/index.html GNU IceCat browser
+which is present as a package in GNU Guix.
diff --git a/topics/xapian/xapian-indexing.gmi b/topics/xapian/xapian-indexing.gmi
index 1c82018..68ab7a6 100644
--- a/topics/xapian/xapian-indexing.gmi
+++ b/topics/xapian/xapian-indexing.gmi
@@ -2,18 +2,48 @@
 Due to the enormous size of the GeneNetwork database, indexing it in a reasonable amount of time is a tricky process that calls for careful identification and optimization of the performance bottlenecks. This document is a description of how we achieve it.
-Indexing happens in the following three phases.
+Indexing happens in these phases.
 * Phase 1: retrieve data from SQL
-* Phase 2: index text
-* Phase 3: write Xapian index to disk
+* Phase 2: retrieve metadata from RDF
+* Phase 3: index text
+* Phase 4: write Xapian index to disk

-Phases 1 and 3 (that is, the retrieval of data from SQL and writing of the Xapian index to disk) are I/O bound processes. Phase 2 (the actual indexing of text) is CPU bound. So, we parallelize phase 2 while keeping phases 1 and 3 sequential.
+Phases 1, 2 and 4 are I/O bound processes. Phase 3 (the actual indexing of text) is CPU bound. So, we parallelize phase 3 while keeping phases 1, 2 and 4 sequential.

-There is a long delay in retrieving data from SQL and loading it into memory. In this time, the CPU is waiting on I/O and idling away. In order to avoid this, we retrieve SQL data chunk by chunk and spawn off phase 2 worker processes. Thus, we interleave phase 1 and 2 so that they don't block each other. Despite this, on tux02, the indexing script is only able to keep around 10 of the 128 CPUs busy. As phase 1 is dishing out jobs to phase 2 worker processes, before it can finish dishing out jobs to all 128 CPUs, the earliest worker processes finish and exit. The only way to avoid this and improve CPU utilization would be to further optimize the I/O of phase 1.
+There is a long delay in retrieving data from SQL and loading it into memory. In this time, the CPU is waiting on I/O and idling away. In order to avoid this, we retrieve SQL data chunk by chunk and spawn off phase 3 worker processes. We get RDF data in one large call before any processing is done. Thus, we interleave phase 1 and 3 so that they don't block each other. Despite this, on tux02, the indexing script is only able to keep around 10 of the 128 CPUs busy. As phase 1 is dishing out jobs to phase 3 worker processes, before it can finish dishing out jobs to all 128 CPUs, the earliest worker processes finish and exit.
The only way to avoid this and improve CPU utilization would be to further optimize the I/O of phase 1.

 Building a single large Xapian index is not scalable. See detailed report on Xapian scalability.
 => xapian-scalability
 So, we let each process of phase 3 build its own separate Xapian index. Finally, we compact and combine them into one large index. When writing smaller indexes in parallel, we take care to lock access to the disk so that only one process is writing to the disk at any given time. If many processes try to simultaneously write to the disk, the write speed is slowed down, often considerably, due to I/O contention.

-It is important to note that the performance bottlenecks identified in this document are machine-specific. For example, on my laptop with only 2 cores, CPU performance in phase 2 is the bottleneck. Phase 1 I/O waits on the CPU to finish instead of the other way around.
+It is important to note that the performance bottlenecks identified in this document are machine-specific. For example, on my laptop with only 2 cores, CPU performance in phase 3 is the bottleneck. Phase 1 I/O waits on the CPU to finish instead of the other way around.
+
+## Local Development
+
+For local development, see:
+
+=> https://issues.genenetwork.org/topics/database/working-with-virtuoso-locally Working with Virtuoso for Local Development
+
+Ping @bmunyoki for the ttl folder backups.
+
+Set up mysql with instructions from
+
+=> https://issues.genenetwork.org/topics/database/setting-up-local-development-database
+
+and load up the backup file using:
+
+> mariadb gn2 < /path/to/backup/file.sql
+
+A backup file can be generated using:
+
+> mysqldump -u mysqluser -pmysqlpasswd --opt --where="1 limit 100000" db_webqtl > out.sql
+> xz out.sql
+
+And run the index script using:
+
+> python3 scripts/index-genenetwork create-xapian-index /tmp/xapian "mysql://gn2:password@localhost/gn2" "http://localhost:8890/sparql"
+
+Verify the index with:
+
+> xapian-delve /tmp/xapian |