summaryrefslogtreecommitdiff
path: root/issues/annotation.gmi
blob: da739de46f670ee31b385b19f9c9fa5561bbfde7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# Annotation

Annotation, assembly and liftover are recurring themes because they are tied to (updating) reference genomes. Here we track a next generation annotation system which will support versions of annotation with a matching reference genome. In the future we'll build this out with pangenome support.

# Tags

* assigned: pjotrp,zsloan,arthurc,robw
* priority: medium
* type: enhancement
* status: open, in progress
* keywords: mapping, architecture


# TODO

* [ ] Point out where annotation is used in code (@zsloan)
* [ ] Come up with a storage model to replace the existing Mariadb tables (pjotrp, aruni)
* [ ] Implement storage backend
* [ ] Create examples to replace existing code
* [ ] Replace all code to allow for selecting versions

# Notes

=> https://github.com/genenetwork/genenetwork2/blob/testing/wqflask/wqflask/interval_analyst/GeneUtil.py GeneList table

is only used here, with all position fields being the ones that would ideally account for assembly (txStart, txEnd, cdsStart, cdsEnd, exonStarts, exonEnds - no idea why this last one has an "s" after Start/End)

=> https://github.com/genenetwork/genenetwork2/blob/df36cc808e21202075d32559c9f295087bd8aee2/wqflask/base/trait.py#L528-L597

where max LRS locus position is fetched for traits

=> https://github.com/genenetwork/genenetwork2/blob/df36cc808e21202075d32559c9f295087bd8aee2/wqflask/wqflask/do_search.py#L819-L857 - Search queries

notably the Position Search (the most "ambitious" version of this would allow for querying based off of different assemblies, but I can't think of a good way to do this with SQL)