From f71c760fab17446605d9f1d101a759a1a391dfc7 Mon Sep 17 00:00:00 2001 From: Pjotr Prins Date: Sun, 3 Dec 2023 09:49:19 -0600 Subject: Renames --- topics/genenetwork/add-metadata-to-trait-page.gmi | 61 +++++++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 topics/genenetwork/add-metadata-to-trait-page.gmi (limited to 'topics/genenetwork') diff --git a/topics/genenetwork/add-metadata-to-trait-page.gmi b/topics/genenetwork/add-metadata-to-trait-page.gmi new file mode 100644 index 0000000..8e2d433 --- /dev/null +++ b/topics/genenetwork/add-metadata-to-trait-page.gmi @@ -0,0 +1,61 @@ +# Add Metadata To The Trait Page (RDF) + +Fri 30 Sep 2022 11:48:41 EAT + +## Introduction + +We are migrating the GN2 relational database to a plain text and RDF database. Matrix-like data (E.g. fetching sample data for a given data) will be stored inside GN. + +So far, we are able to convert the sql data to rdf using "dump.scm" defined in: + +=> https://github.com/genenetwork/dump-genenetwork-database + +## What are we trying to solve? + +Data stored in genenetwork resembles a tree. As an example: we have several species; each of these species belong to a group; each group belongs to a "data type"; and each data type belongs to a particular dataset. The first step: capturing - albeit requiring more refinement - this data in RDF has been achieved using the aformentioned scheme script. + +The overall goal is to be ablet to: + +* Incrementally replace MySQL queries with RDF. + +* Annotating existing data with metadata that does not yet exist in GN2. + +## Goals + +In the Trait Analysis page, for example: + +=> https://genenetwork.org/show_trait?trait_id=1434280_at&dataset=HC_M2_0606_P + +and the corresponding GN1 link: + +=> http://gn1.genenetwork.org/webqtl/main.py?cmd=show&db=HC_M2_0606_P&probeset=1434280_at + +which on further inspection presents metadata on that specific dataset group here: + +=> http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112 + +We notice that there's metadata in GN1 - which we have in RDF - that we can add to the GN2 traits page. As such, this design doc will be limited to using RDF to: + +* Append metadata about the tissue +* Append relevant metadata about the dataset group, in particular: about the data values and it's processing; about the array platform; experiment type; and contributors. + +Beyond querying metadata, this design doc also proposes the creation of a monadic rdf-fetch similar to what happens in: + +=> https://issues.genenetwork.org/topics/maybe-monad + +### Non-goals + +* Refactoring base classes/sql to solely use RDF. +* Using federated queries - they are slow. +* Writing a script in Guile to fetch and append extra metadata from wikidata and insert them into RDF as extra nodes. This should be tackled as a separate issue. + +## Actual Design + +* Rewrite the existing way of fetching RDF using pymonads. +* React to the change-amplification - should any exist - caused by the above change and add tests where feasible. +* Create endpoints to add extra annotations for Tissue, Dataset Group, Dataset Values and Processing; array platform; experiment type; and contributors. +* Add metadata as links, tooltips, or html tag to the relevant html section(s). + +## Resources + +=> https://www.linkedin.com/pulse/six-secret-sparql-ninja-tricks-kurt-cagle/ -- cgit v1.2.3