# Search

## Overview

One of the key features of GeneNetwork (GN) is its powerful search functionality. For most users it is the entry point to the GeneNetwork web service. The front page offers a menu for selecting a species, e.g. mouse or rat, and relevant datasets grouped by family, e.g. BXD, and type, e.g. Hippocampus mRNA.
Recently we introduced the Xapian search engine, which allows for fast lookups and powerful search queries.
For example, a search for "BRCA2" makes GN look for the term in 754 datasets and 39,765,944 traits across 10 species, and returns 7998 results that match the query.
The search URL looks like 'https://genenetwork.org/gsearch?type=gene&terms=BRCA2' and can be copy-pasted and shared with other users.

More powerful queries narrow the search down to a field in the result table. For example, to get only mouse results, the query "species:mouse BRCA2" returns 5916 results.
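
Because the search URL is shareable, it can also be generated from a script. The following is a minimal sketch that only assumes the URL pattern shown above; the helper name build_search_url is hypothetical, and it is an assumption that the server accepts standard form encoding for field-prefixed queries.

```python
from urllib.parse import urlencode

# Base URL taken from the example URL in this document.
GSEARCH_URL = "https://genenetwork.org/gsearch"


def build_search_url(terms, search_type="gene"):
    """Return a shareable search URL for a query string (hypothetical helper)."""
    # urlencode escapes spaces and colons, so a field-prefixed query such as
    # "species:mouse BRCA2" ends up as a single well-formed parameter value.
    return GSEARCH_URL + "?" + urlencode({"type": search_type, "terms": terms})


print(build_search_url("BRCA2"))
print(build_search_url("species:mouse BRCA2"))
```

The first call reproduces the example URL exactly; the second shows how the mouse-only query would be encoded.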

Example search terms:

* species:mouse BRCA2 - looks good, has Liver
* species:mouse Tissue:Liver - 0 results
* species:mouse tissue:Liver - 0 results
* brca - only renders 2 results, why not BRCA2?

Field keywords such as tissue appear to be case sensitive. They should not be.
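
One possible way to make field keywords case insensitive is to normalize them before the query reaches the search engine. The sketch below is an illustration only, not GN's implementation: the function normalize_query and the set KNOWN_FIELDS are hypothetical, and only the species and tissue fields are taken from the notes above.

```python
import re

# Hypothetical list of field keywords; only these two appear in the notes above.
KNOWN_FIELDS = {"species", "tissue"}

# A token of the form "field:value", e.g. "Tissue:Liver".
PREFIX_RE = re.compile(r"^([A-Za-z]+):(.+)$")


def normalize_query(query):
    """Lower-case known field keywords, leaving values and free text untouched."""
    tokens = []
    for token in query.split():
        match = PREFIX_RE.match(token)
        if match and match.group(1).lower() in KNOWN_FIELDS:
            token = match.group(1).lower() + ":" + match.group(2)
        tokens.append(token)
    return " ".join(tokens)


print(normalize_query("species:mouse Tissue:Liver"))
```

With this step in front of the query parser, "Tissue:Liver" and "tissue:Liver" would be treated as the same query. Whether the value itself (Liver) should also be matched case insensitively is a separate decision.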

In the paper 'New Insights on Gene by Environmental Effects of Drugs of Abuse in Animal Models Using GeneNetwork' the authors reanalyzed older experimental data in the GeneNetwork database.
They discovered QTLs on mouse chromosomes 3, 5, 9, 11, and 14 that were not found in the original study, and identified new candidate genes, including Slitrk6 and Cdk14. Slitrk6, in a chromosome 14 QTL for locomotion, was found to be part of a co-expression network involved in voluntary movement and associated with neuropsychiatric phenotypes. Cdk14, one of only three genes in a chromosome 5 QTL associated with handling-induced convulsions after ethanol treatment, is regulated by the anticonvulsant drug valproic acid\cite{PMC9024903}.

## Future

* Add human SNP search and synteny
* Support more sources, e.g. geneweaver

## Methods

GN search is built on Xapian, an open source search engine library released under the GPL v2+. It is written in C++, with bindings that allow use from Python, Guile and other languages.
Xapian is actively maintained and current Xapian users include, for example, the Debian website and the notmuch E-mail indexer.
Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It has built-in support for several families of weighting models and also supports a rich set of boolean query operators\cite{Xapian}.
We build the Xapian index from SQL data and RDF data in the GN databases.

=> topics/xapian-indexing.gmi indexing optimizations