author Munyoki Kilyungi 2026-02-09 15:19:56 +0300
committer Munyoki Kilyungi 2026-02-09 15:20:57 +0300
commit 3f5b36124bc2ade33aaf205f5b7b179334fd39c0
tree 5264f86d1b949a1f1545b4a9329e205c807e1c21 /issues
parent 838c4f5595abce433f534839a262d6634610b910
download gn-ai-3f5b36124bc2ade33aaf205f5b7b179334fd39c0.tar.gz
Update issues.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
Diffstat (limited to 'issues')
-rw-r--r--  issues/rdf/expose-lmdb-view-in-rdf.gmi | 7
-rw-r--r--  issues/rdf/rdf-refinement.gmi          | 40
2 files changed, 45 insertions, 2 deletions
diff --git a/issues/rdf/expose-lmdb-view-in-rdf.gmi b/issues/rdf/expose-lmdb-view-in-rdf.gmi
index d58f7a7c..12b5d868 100644
--- a/issues/rdf/expose-lmdb-view-in-rdf.gmi
+++ b/issues/rdf/expose-lmdb-view-in-rdf.gmi
@@ -57,12 +57,17 @@ ORDER BY
 > Comments: PR in genenetwork3
 => https://github.com/genenetwork/genenetwork3/pull/240
 
-* [-] Deploy functionality to tux02
+* [X] Deploy functionality to tux02
 
 Prototype:
 
 => https://github.com/Alexanderlacuna/gn_lmdb_rdf_interface/tree/master
 
+* [X] (bonfacem) Modify the endpoint to use a .json extension:
+```
+curl http://127.0.0.1:8091/dataset/bxd-publish/values/10002.json
+```
+* [X] (alexm) Remove null values from the data endpoint.
 * [ ] (bonfacem, pjotrp, alexm) How to work with case-attributes metadata.
 * [ ] (bonfacem) Add above link to RDF.
 
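For reference, a minimal Python client sketch for the `.json` values endpoint above, assuming the server is reachable at the local address used in the curl example and returns a JSON body. `values_url`, `fetch_values` and `drop_nulls` are hypothetical helpers, not genenetwork3 code; `drop_nulls` only illustrates the kind of null filtering described in the (alexm) item and assumes nothing about the payload's shape beyond it being JSON.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def values_url(base: str, dataset: str, trait: Any) -> str:
    """Build a per-trait values URL, e.g. .../dataset/bxd-publish/values/10002.json."""
    return (f"{base.rstrip('/')}/dataset/{urllib.parse.quote(dataset, safe='')}"
            f"/values/{urllib.parse.quote(str(trait), safe='')}.json")


def fetch_values(base: str, dataset: str, trait: Any, timeout: float = 10.0) -> Any:
    """GET the endpoint and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(values_url(base, dataset, trait), timeout=timeout) as resp:
        return json.load(resp)


def drop_nulls(obj: Any) -> Any:
    """Recursively remove None (JSON null) entries from dicts and lists."""
    if isinstance(obj, dict):
        return {key: drop_nulls(value) for key, value in obj.items() if value is not None}
    if isinstance(obj, list):
        return [drop_nulls(item) for item in obj if item is not None]
    return obj
```

Usage, assuming the server from the curl example is running: `drop_nulls(fetch_values("http://127.0.0.1:8091", "bxd-publish", 10002))`. Filtering recursively keeps the sketch independent of whether the response is a flat sample-to-value mapping or something nested.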
diff --git a/issues/rdf/rdf-refinement.gmi b/issues/rdf/rdf-refinement.gmi
index 9199b656..a841c011 100644
--- a/issues/rdf/rdf-refinement.gmi
+++ b/issues/rdf/rdf-refinement.gmi
@@ -275,7 +275,7 @@ Genotypes and markers are different but related.  Different Species can have dif
 * [X] Link geno-files to the correct data (ref gn2 code on how this is done)
 => https://files.genenetwork.org/current/ Genotype files.   The dir reps the InfoPages.AccesionId.
 * [X] Create global namespace for geno-files.
-* [ ] probesets
+* [X] probesets
 
 All probesets should have a name:
 ```
@@ -283,8 +283,45 @@ SELECT *
 FROM ProbeSet
 WHERE ProbeSet.Name IS NULL
    OR TRIM(ProbeSet.Name) = ''\G
+```
+Number of probesets we have:
+```
+MariaDB [db_webqtl]> select count(*) from ProbeSet;
++----------+
+| count(*) |
++----------+
+|  6436251 |
++----------+
+```
 
+Number of experiment records that reference probesets (rows in ProbeSetXRef):
+
+```
+MariaDB [db_webqtl]> select count(*) from ProbeSetXRef;
++----------+
+| count(*) |
++----------+
+| 49131499 |
++----------+
+```
+
+We can get away with transforming ProbeSet in one go.   However, ProbeSetXRef is much larger: the output file gets too big and rapper complains about it.   Instead, figure out a way to transform ProbeSetXRef in chunks.   Note: the total transform time currently averages ~21 mins.   Including ProbeSet/ProbeSetXRef, that will balloon to over 1 hr.   Not worried about optimising things now; that can be worked out later.
+* [ ] ProbeSetXRef
+* [ ] (w/ Johannesm/pjotrp/rob) Decide which columns to put into RDF.   ProbeSet has 72 columns ATM:
+
+```
+MariaDB [db_webqtl]> SELECT COUNT(*) AS column_count
+    -> FROM INFORMATION_SCHEMA.COLUMNS
+    -> WHERE TABLE_SCHEMA = DATABASE()
+    ->   AND TABLE_NAME = 'ProbeSet';
++--------------+
+| column_count |
++--------------+
+|           72 |
++--------------+
+1 row in set (0.00 sec)
 ```
+* [ ] (w/ Alex) ProbeSetData
 * [X] RIF
 * [-] (cancelled) Gene Symbols
 
@@ -295,6 +332,7 @@ WHERE ProbeSet.Name IS NULL
 
 ## Post Mark-up
 * [ ] ! Generate a list of data older than 2020 and ping Rob/Pjotr.
+* [ ] (w/Alex) Revisit data privacy in the LMDB view.
 * [-] (Cancelled) Re-visit how we store all HTML metadata.   Clean this up.
 * [ ] Sync mariadb tux01 with tux02; have rdf.genenetwork.org be the latest.
 * [ ] Make sure that the rdf.genenetwork.org named graph is available on public end-point (mention to Fred about the nuance of moving to a new graph without breaking CD/Prod from old code that used the old genenetwork.org graph).
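The chunked ProbeSetXRef transform mentioned in rdf-refinement.gmi could be approached with keyset pagination. Below is a minimal Python sketch, assuming a DB-API connection and a unique, indexed integer key column. `DataId` is used only as an illustrative key and should be checked against the actual schema; `iter_chunks` and `demo_chunk_sizes` are hypothetical helpers, and the demo runs against a toy in-memory SQLite table rather than MariaDB.

```python
import sqlite3
from typing import Any, Iterator, List, Sequence, Tuple


def iter_chunks(conn: Any, table: str, key: str, columns: Sequence[str],
                chunk_size: int, placeholder: str = "?") -> Iterator[List[Tuple]]:
    """Yield rows of `table` ordered by `key`, at most `chunk_size` per chunk.

    Keyset pagination: each query resumes after the last key already seen,
    so the server never re-scans exported rows (as LIMIT ... OFFSET would).
    `table`, `key` and `columns` are formatted into the SQL directly, so they
    must be trusted identifiers; only values go through `placeholder`.
    """
    select = f"SELECT {', '.join([key, *columns])} FROM {table}"
    first_sql = f"{select} ORDER BY {key} LIMIT {placeholder}"
    next_sql = f"{select} WHERE {key} > {placeholder} ORDER BY {key} LIMIT {placeholder}"
    cursor = conn.cursor()
    cursor.execute(first_sql, (chunk_size,))
    while True:
        rows = cursor.fetchall()
        if not rows:
            return
        yield rows
        if len(rows) < chunk_size:  # short page: nothing left to fetch
            return
        # The key is selected first, so rows[-1][0] is the last key seen.
        cursor.execute(next_sql, (rows[-1][0], chunk_size))


def demo_chunk_sizes(n_rows: int, chunk_size: int) -> List[int]:
    """Run iter_chunks against a toy in-memory SQLite table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProbeSetXRef (DataId INTEGER PRIMARY KEY, ProbeSetId INTEGER)")
    conn.executemany("INSERT INTO ProbeSetXRef VALUES (?, ?)",
                     [(i, i * 10) for i in range(1, n_rows + 1)])
    sizes = [len(chunk) for chunk in
             iter_chunks(conn, "ProbeSetXRef", "DataId", ["ProbeSetId"], chunk_size)]
    conn.close()
    return sizes
```

For the real export, one would pass a MariaDB connection instead (mysqlclient/MySQLdb uses the `%s` paramstyle, hence the `placeholder` parameter), transform each chunk to RDF and write it to its own numbered file, so rapper only ever parses one chunk at a time. Keyset pagination is preferred over `LIMIT ... OFFSET` because an offset makes the server scan and discard every skipped row, which gets progressively slower across ~49 million rows.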