# Capture Data on the BXDs in RDF

## Tags

* assigned: bonfacem, pjotrp, zsloan
* priority: high
* status: unclear
* type: feature-request, enhancement
* keywords: RDF, BXD, REST, from github, high priority

## Description

### Is your feature request related to a problem? Please describe.

We aim to capture metadata on the BXDs in RDF, which will make querying
that metadata fairly trivial. Next to a pangenome, which is a graph on
DNA, we should present metadata in a structured way (a graph on
everything else).

Metadata and pangenomes ought to go hand in hand.


### Describe the solution you'd like

For the BXD, let's take the GN metadata forms as a starting point and
explain those in RDF terms. Next, we gradually add what we think is
sensible. When you have metadata, what do you want to get out of it?
For example:

* What experiments are in GN related to BXD and addiction?
* What experiments are in GN on addiction related genes?
* What publications are in GN related to addiction and published after 2010?

We want to make search easy and to disambiguate terms.
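
As a rough illustration of how the third question above could be answered once metadata is stored as triples, here is a minimal sketch in plain Python. It is not GN code: the `ex:` identifiers, the predicates, and the three publications are all made up for this example, and a real setup would more likely use a triple store queried with SPARQL.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, object]

# Hypothetical sample data: none of these identifiers exist in GN.
TRIPLES: List[Triple] = [
    ("ex:pub1", "rdf:type", "ex:Publication"),
    ("ex:pub1", "ex:keyword", "addiction"),
    ("ex:pub1", "ex:year", 2008),
    ("ex:pub2", "rdf:type", "ex:Publication"),
    ("ex:pub2", "ex:keyword", "addiction"),
    ("ex:pub2", "ex:year", 2015),
    ("ex:pub3", "rdf:type", "ex:Publication"),
    ("ex:pub3", "ex:keyword", "longevity"),
    ("ex:pub3", "ex:year", 2019),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[object] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None is a wildcard."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def publications_about(keyword: str, after_year: int) -> List[str]:
    """Publications tagged with `keyword` and published after `after_year`."""
    pubs = {s for s, _, _ in match(p="rdf:type", o="ex:Publication")}
    tagged = {s for s, _, _ in match(p="ex:keyword", o=keyword)}
    recent = {s for s, _, year in match(p="ex:year") if year > after_year}
    return sorted(pubs & tagged & recent)


print(publications_about("addiction", 2010))  # prints ['ex:pub2']
```

Each question becomes an intersection of simple triple patterns, so adding a new kind of metadata means adding triples rather than changing a schema.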

### Describe alternatives you've considered

Modelling everything in MariaDB. The problem with that is that we end
up with redundant data, and complexity increases over time.

### Additional context

=> http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112 View existing metadata here

We should use Wikidata because it is based on RDF and is here to stay. For example, this mouse gene

=> https://www.wikidata.org/wiki/Q18303218

relates to mouse

=> https://www.wikidata.org/wiki/Q83310

You can find some models at

=> https://www.wikidata.org/wiki/Q24082698

To add the BXD, we would start by editing Wikidata, which has the benefit that the data gets presented in Wikipedia (Wikidata is the backend).
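
To show how our own RDF could point at Wikidata items such as the ones linked above, here is a small sketch that turns a Wikidata page URL into the entity URI that Wikidata uses in its RDF (`http://www.wikidata.org/entity/Q...`). The subject `gn:BXD` and the predicate `ex:relatesTo` are placeholders invented for this example, not an agreed vocabulary.

```python
import re
from typing import Tuple

# Wikidata's RDF identifies items with URIs under this prefix.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def wikidata_qid(page_url: str) -> str:
    """Extract the item id (e.g. 'Q83310') from a Wikidata page URL."""
    # Tolerate a sentence-ending period pasted along with the URL.
    found = re.search(r"/(Q[1-9][0-9]*)$", page_url.strip().rstrip("."))
    if found is None:
        raise ValueError(f"not a Wikidata item URL: {page_url!r}")
    return found.group(1)


def entity_uri(page_url: str) -> str:
    """Map a wiki page URL to the corresponding entity URI."""
    return WIKIDATA_ENTITY + wikidata_qid(page_url)


def link_triple(subject: str, predicate: str, page_url: str) -> Tuple[str, str, str]:
    """Build a (subject, predicate, object) triple pointing at Wikidata."""
    return (subject, predicate, entity_uri(page_url))


# 'gn:BXD' and 'ex:relatesTo' are hypothetical names used only here.
print(link_triple("gn:BXD", "ex:relatesTo", "https://www.wikidata.org/wiki/Q83310"))
```

Storing the entity URI rather than the page URL keeps our triples directly joinable with data exported from Wikidata.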



Look at existing ontologies. A few are mentioned in:
=> https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-13 The Vertebrate Trait Ontology: a controlled vocabulary for the annotation of trait data across species

When an ontology exists and looks sensible, we should reuse it. If none exists, we create our own. Obviously we can't get away from adding free-text fields. What we want to lift out of them is whatever we want to be able to search on.


## Resolution

Relevant topic:

=> topics/add-metadata-to-trait-page Add Metadata To The Trait Page (RDF)

This is currently being worked on in:

=> issues/add-metadata-to-traits-page Annotate traits page with metadata from RDF

Work on dumping RDF has already been done in:

=> https://github.com/genenetwork/dump-genenetwork-database dump-genenetwork-database

Also, vector/matrix data should be put in lmdb; that is a separate issue of its own.

* closed