Pjotr Prins 10 months ago
2 changed files with 46 additions and 2 deletions

+ 2
- 2

@@ -631,8 +631,8 @@ to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

- <one line to give the program's name and a brief idea of what it does.>
- Copyright (C) <year> <name of author>
+ Mining summary statistics
+ Copyright (C) 2019 Pjotr Prins

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by

+ 44
- 0

@@ -0,0 +1,44 @@
* Mining summary statistics with Racket

This module fetches summary statistics, starting with the new EBI API.
The general idea is to start from a gene (alias) or genome segment and
find relevant SNPs and associated phenotypes. We can then use these in
GeneNetwork to locate interesting hits related to mouse/rat genes and
phenotypes.

In the near future we may be adding more resources.

* EBI Summary Statistics

EBI is providing an update of their old GWAS resource in the form of
an API that can get candidates. They also provide the classic
downloadable tabular data resources which we do not want to use for
two reasons:

1. We want to use the latest information automatically
2. For EBI it is easier to track use and fund future initiatives

The latter reason is more important than meets the eye. These
resources are *not* free.

* Start with a gene

Let's start with mouse gene [[][Shh]] (human alias SHH) and rat [[][Brca2]] (human
alias BRCA2). For the human variant the [[][wikidata page for BRCA2]] points
to the [[][NCBI]] gene description (which is usefully elaborate). That page
in turn points to the [[][Uniprot]] resource.

Wikidata also has the start and end positions of the gene for two
reference genomes. hg38 is known as [[][GRCh38.p2]] (2014), which is now at
p13 (2019). So, which one does EBI use? GRCh38.p12 according to the
[[][FAQ]] (October 2019).

The [[][FAQ]] suggests the API supports searching by gene, which should make
a lookup easier. According to the full API docs, the query

: curl '' -i -H 'Accept: application/json' > BRCA2.json

results in a list of SNPs with top/curated associations. For summary statistics
there is no lookup by gene yet - I have put in a request. So, for now
we need to figure out the gene position first using the ENSEMBL API.
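
As a minimal sketch of that position lookup in Racket: the code below
assumes the public Ensembl REST endpoint =lookup/symbol= at
rest.ensembl.org and the response fields =seq_region_name=, =start=,
=end= and =assembly_name=. None of these are spelled out in this README,
and the function names are hypothetical, so treat it as an illustration
rather than the module's actual code.

#+BEGIN_SRC racket
#lang racket/base
;; Sketch: find a gene's coordinates through the Ensembl REST API.
(require json
         net/url)

;; Assumption: base URL of the public Ensembl REST service.
(define ensembl-base "https://rest.ensembl.org")

;; Build the lookup URL for a species and gene symbol (no network access).
(define (gene-lookup-url species symbol)
  (string-append ensembl-base "/lookup/symbol/" species "/" symbol
                 "?content-type=application/json"))

;; Pick chromosome, start, end and assembly out of a parsed response.
;; The assembly name is returned so it can be compared with the build
;; used by EBI; it is #f when the field is missing.
(define (gene-region js)
  (list (hash-ref js 'seq_region_name)
        (hash-ref js 'start)
        (hash-ref js 'end)
        (hash-ref js 'assembly_name #f)))

;; Fetch and parse the record; this performs an HTTP GET.
(define (fetch-gene-region species symbol)
  (define in (get-pure-port (string->url (gene-lookup-url species symbol))))
  (define js (read-json in))
  (close-input-port in)
  (gene-region js))
#+END_SRC

Calling =(fetch-gene-region "homo_sapiens" "BRCA2")= should then give a
list of chromosome, start, end and assembly name. The assembly name is
worth checking against the GRCh38 build mentioned above before the
positions are used in a summary statistics query.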