summaryrefslogtreecommitdiff
path: root/issues/add-metadata-to-traits-page.gmi
blob: 5844aa9539f0c0f57add9a4618badc4d331f5cbc (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
# Annotate traits page with metadata from RDF

* assigned: bonfacem
* priority: high
* status: in progress
* keywords: rdf, critical, metadata

Read the design-doc here:

=> /topics/add-metadata-to-trait-page

This task is related to:

=> /issues/capture-data-on-BXDs-in-RDF


# Tasks
## Exploration

* [X] Modify - experimental - how we dump the CSV files

In one of the reviews, it was pointed that we shouldn't really store table IDs in RDF.  They are confusing and not necessarily important.

* [X] Experiment with replacing some of the base SQL queries with RDF

No much progress was made with this.  We use table IDs to reference and build relationships in GN.  This has made it difficult to have drop-in replacements in RDF without breaking a big part of GN2 functionality.  Also, how we fetch data in GN through deep inheritance.  With time, this deep inheritance that introduces unnecessary coupling should be untangled - a task for another day - in favor of composition.

* [X] Explore federated queries using wikidata/Uniprot

This demo was unsatisfactory; but nevertheless, I have a good understanding of how federated queries work. My findings are: federated queries can be slow (querying wikidata took as long as 5-30 seconds).  As such, a better strategy would be to write scripts to enrich our dataset from other data sources as entries with the right ontology.  Also, being exposed to many other RDF sources that use different ontology was confusing.


## Fetch metadata - datasets - using RDF

* [X] Initial experimentation in python-script + demo to the team.

The submitted demo was for this SPARQL query for the trait:

```
PREFIX gn: <http://genenetwork.org/>
SELECT ?name ?dataset ?dataset_group ?title ?summary ?aboutTissue ?aboutPlatform ?aboutProcessing
WHERE {
  ?dataset gn:accessionId "GN112" ;
           rdf:type gn:dataset .
OPTIONAL { ?dataset gn:name ?name } .
OPTIONAL { ?dataset gn:aboutTissue ?aboutTissue} .
OPTIONAL { ?dataset gn:title ?title } .
OPTIONAL { ?dataset gn:summary ?summary } .
OPTIONAL { ?dataset gn:aboutPlatform ?aboutPlatform} .
OPTIONAL { ?dataset gn:aboutProcessing ?aboutProcessing} .
OPTIONAL { ?dataset gn:geoSeries ?geo_series } .
}
```

The particular trait in GN2 is:

=> https://genenetwork.org/show_trait?trait_id=1458764_at&dataset=HC_M2_0606_P

The equivalent version in GN1 is:

=> http://gn1.genenetwork.org/webqtl/main.py?cmd=show&db=HC_M2_0606_P&probeset=1454998_at

Metadata about this dataset can be found in:

=> http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112


* [X] Work out what type of datasets have accession id's
* [X] Refactor the dataset fetch fn in GN3 to use the Maybe Monad
* [X] Write tests for the above
* [X] Test on test database upstream - if this is set-up
* [X] Submit patches for review

## Display the metadata in GN2 as HTML

* [x] Display the metadata as part of GN2 web-page.  This data is stored as HTML.  How do work around this WRT displaying in a UI friendly way?
* [x] Determine whether to load the RDF metadata as part of the response or create an entirely different endpoint for it
* [x] Submit patches for review

## Editing metadata

* [x] Research ways of editing that text if the user really wants to
* [x] Inspect how GN1 did the edits
* [x] Work out if the edits are feasible/important to work on and communicate to PJ/Arun

## Next steps

* [x] Spec out LMDB integration and how it would work out with current statistical operations.  Collaborate with Alex/Fred on this

## Resolution

This has been worked on in

=> https://github.com/genenetwork/genenetwork3/pull/108 GN3#108

=> https://github.com/genenetwork/genenetwork3/pull/107 GN3#107

=> https://github.com/genenetwork/genenetwork3/pull/106 GN3#106

=> https://github.com/genenetwork/genenetwork3/pull/104 GN3#104

=> https://github.com/genenetwork/genenetwork3/pull/101 GN3#101

=> https://github.com/genenetwork/genenetwork2/pull/757 GN2#757

=> https://github.com/genenetwork/genenetwork2/pull/756 GN2#756

=> https://github.com/genenetwork/genenetwork2/pull/755 GN2#755

=> https://github.com/genenetwork/genenetwork2/pull/753 GN2#753

=> https://github.com/genenetwork/genenetwork2/pull/751 GN2#751

=> https://github.com/genenetwork/genenetwork2/pull/747 GN2#747

* closed