blob: 8e2d433ae094b948acae24700aff3940b8fd3ca2 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
|
# Add Metadata To The Trait Page (RDF)
Fri 30 Sep 2022 11:48:41 EAT
## Introduction
We are migrating the GN2 relational database to a plain text and RDF database. Matrix-like data (E.g. fetching sample data for a given data) will be stored inside GN.
So far, we are able to convert the sql data to rdf using "dump.scm" defined in:
=> https://github.com/genenetwork/dump-genenetwork-database
## What are we trying to solve?
Data stored in genenetwork resembles a tree. As an example: we have several species; each of these species belong to a group; each group belongs to a "data type"; and each data type belongs to a particular dataset. The first step: capturing - albeit requiring more refinement - this data in RDF has been achieved using the aformentioned scheme script.
The overall goal is to be ablet to:
* Incrementally replace MySQL queries with RDF.
* Annotating existing data with metadata that does not yet exist in GN2.
## Goals
In the Trait Analysis page, for example:
=> https://genenetwork.org/show_trait?trait_id=1434280_at&dataset=HC_M2_0606_P
and the corresponding GN1 link:
=> http://gn1.genenetwork.org/webqtl/main.py?cmd=show&db=HC_M2_0606_P&probeset=1434280_at
which on further inspection presents metadata on that specific dataset group here:
=> http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112
We notice that there's metadata in GN1 - which we have in RDF - that we can add to the GN2 traits page. As such, this design doc will be limited to using RDF to:
* Append metadata about the tissue
* Append relevant metadata about the dataset group, in particular: about the data values and it's processing; about the array platform; experiment type; and contributors.
Beyond querying metadata, this design doc also proposes the creation of a monadic rdf-fetch similar to what happens in:
=> https://issues.genenetwork.org/topics/maybe-monad
### Non-goals
* Refactoring base classes/sql to solely use RDF.
* Using federated queries - they are slow.
* Writing a script in Guile to fetch and append extra metadata from wikidata and insert them into RDF as extra nodes. This should be tackled as a separate issue.
## Actual Design
* Rewrite the existing way of fetching RDF using pymonads.
* React to the change-amplification - should any exist - caused by the above change and add tests where feasible.
* Create endpoints to add extra annotations for Tissue, Dataset Group, Dataset Values and Processing; array platform; experiment type; and contributors.
* Add metadata as links, tooltips, or html <summary> tag to the relevant html section(s).
## Resources
=> https://www.linkedin.com/pulse/six-secret-sparql-ninja-tricks-kurt-cagle/
|