aboutsummaryrefslogtreecommitdiff
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Running GEMMA from GN3\n",
    "\n",
    "This document outlines how to use gemma from Genenetwork3.\n",
    "\n",
    "The current mechanism for how Gemma runs is that when you run one of the endpoints that runs the actual gemma, it constructs the command and queues it in to a REDIS queue; thereby bypassing any time-out issues with the endpoint for a long running process. A worker(sheepdog) processes the endpoints.\n",
    "\n",
    "If you are running gn3 through a development environment, ensure that it is up and running by running the command:\n",
    "\n",
    "```sh\n",
    "env FLASK_APP=\"main.py\" flask run --port 8080\n",
    "```\n",
    "\n",
    "**PS: For these examples, I'm assuming you provide your Genotype files. This will be edited out since they will be replaced by those provided by IPFS.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "### All imports go here!\n",
    "import requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### GET api/gemma/version\n",
    "\n",
    "To ensure that things are working, run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'code': 0, 'output': 'gemma-wrapper 0.98.1 (Ruby 2.6.5) by Pjotr Prins 2017,2018\\n'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.get(\"http://gn3-test.genenetwork.org/api/gemma/version\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Uploading data\n",
    "\n",
    "Before you perform any computation, you need to ensure you have your data uploaded. For these examples, I'll use data provided [here](https://github.com/genetics-statistics/gemma-wrapper/tree/master/test/data/input)\n",
    "\n",
    "An example of a metadata.json file:\n",
    "\n",
    "```\n",
    "{\n",
    "    \"title\": \"This is my dataset for testing the REST API\",\n",
    "    \"description\": \"Longer description\",\n",
    "    \"date\": \"20210127\",\n",
    "    \"authors\": [\n",
    "        \"R. W. Williams\"\n",
    "    ],\n",
    "    \"cross\": \"BXD\",\n",
    "    \"geno\": \"/ipfs/QmakcPHuxKouUvuNZ5Gna1pyXSAPB5fFSkqFt5pDydd9A4/GN638/BXH.geno\",\n",
    "    \"pheno\": \"BXD_pheno.txt\",\n",
    "    \"snps\": \"BXD_snps.txt\",\n",
    "    \"covar\": \"BXD_covariates.txt\"\n",
    "}\n",
    "\n",
    "```\n",
    "\n",
    "Note that you provide the genotype file with an ipfs valid address.\n",
    "\n",
    "##### POST /api/metadata/upload/:token\n",
    "\n",
    "If token is provided, your user directory will be overridden with the new upload data!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PEERO1-WE4BBX\n"
     ]
    }
   ],
   "source": [
    "# I intentionally upload the data file without having the metadata file so that I can upload it later.\n",
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/metadata/upload/\",\n",
    "                  files=[('file', ('file.tar.gz',\n",
    "                                   open('/tmp/file-no-metadata.tar.gz',\n",
    "                                        'rb'), 'application/octet-stream'))])\n",
    "\n",
    "token_no_metadata = r.json()[\"token\"]\n",
    "print(token_no_metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can override the upload with the relevant metadatafile if we want. If the contents are the same, we get the same token!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/metadata/upload/VROD4N-XVW1L0\",\n",
    "                  files=[('file', ('file.tar.gz',\n",
    "                                   open('/tmp/file.tar.gz',\n",
    "                                        'rb'), 'application/octet-stream'))])\n",
    "\n",
    "token = r.json()[\"token\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "VROD4N-XVW1L0\n"
     ]
    }
   ],
   "source": [
    "print(token)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the above token for computations!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## K-Computation\n",
    "\n",
    "Make sure that your metadata file is up to date!\n",
    "You need the genofile, traitfile, and snpsfile. You also need a token provided when you first uploaded your metadata file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Computing K-Values (no loco): \n",
    "##### POST /gemma/k-compute/:token\n",
    "The end command will look something like:\n",
    "\n",
    "```\n",
    "gemma-wrapper --json -- -debug \\\n",
    "                -g genotype-file -p traitfile \\\n",
    "                -a genotypte-snps -gk > input-hash-k-output-filename\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': '5e9b1e32053f3d456418af2119a0a9e0-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-2209-1901-1901-aef4973e-6d07-49f3-8216-ea883c6442b3'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/\"\n",
    "                  f\"gemma/k-compute/{token}\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This generates the command:\n",
    "\n",
    "```\n",
    "gemma-wrapper --json -- -g /tmp/cache/QmRZpiWCPxn6d1sCaPLpFQBpDmaPaPygvZ165oK4B9myjy/GN602/BXD.geno -p /tmp/VROD4N-XVW1L0/BXD_pheno.txt -a /tmp/VROD4N-XVW1L0/BXD_snps.txt -gk > /tmp/VROD4N-XVW1L0/5e9b1e32053f3d456418af2119a0a9e0-output.json\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the status of the command"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'status': 'error'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.get(\"http://gn3-test.genenetwork.org/api/gemma/\"\n",
    "                 \"status/cmd%3A%3A2021-03-2209-1901-1901-aef4973e-6d07-\"\n",
    "                 \"49f3-8216-ea883c6442b3\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Work as expected with the error log:\n",
    "\n",
    "```\n",
    "Using GEMMA 0.98.4 (2020-12-15) by Xiang Zhou and team (C) 2012-2020\n",
    "\n",
    "Found 0.98.4, comparing against expected v0.98.0\n",
    "\n",
    "GEMMA 0.98.4 (2020-12-15) by Xiang Zhou and team (C) 2012-2020\n",
    "Reading Files ...\n",
    "**** FAILED: Parsing input file '/tmp/cache/QmRZpiWCPxn6d1sCaPLpFQBpDmaPaPygvZ165oK4B9myjy/GN602/BXD.geno' failed in function ReadFile_geno in src/gemma_io.cpp at line 744                                                              \n",
    "{\"warnings\":[],\"errno\":2,\"debug\":[],\"type\":\"K\",\"files\":[[null,\"/tmp/605220a4d963542392fb44c15060cf6a8cee659b.log.txt\",\"/tmp/605220a4d963542392fb44c15060cf6a8cee659b.cXX.txt\"]],\"gemma_command\":\"/gnu/store/yyrfmg0i18w190l4lb21cv86fqclalsk-gemma-gn2-git-0.98.3-47221d6/bin/gemma -g /tmp/cache/QmRZpiWCPxn6d1sCaPLpFQBpDmaPaPygvZ165oK4B9myjy/GN602/BXD.geno -p /tmp/VROD4N-XVW1L0/BXD_pheno.txt -a /tmp/VROD4N-XVW1L0/BXD_snps.txt -gk -outdir /tmp -o 605220a4d963542392fb44c15060cf6a8cee659b\"}Traceback (most recent call last):                          \n",
    "        6: from /gnu/store/86df7mjr3y1mrz62k4zipm6bznj10faj-profile/bin/gemma-wrapper:4:in `<main>'                                                        \n",
    "        5: from /gnu/store/86df7mjr3y1mrz62k4zipm6bznj10faj-profile/bin/gemma-wrapper:4:in `load'                                                          \n",
    "        4: from /gnu/store/p1alkfcsw3m24vlgxfb6gk24zn67h8n2-gemma-wrapper-0.98.1/bin/.real/gemma-wrapper:23:in `<top (required)>'                          \n",
    "        3: from /gnu/store/p1alkfcsw3m24vlgxfb6gk24zn67h8n2-gemma-wrapper-0.98.1/bin/.real/gemma-wrapper:23:in `load'                                      \n",
    "        2: from /gnu/store/p1alkfcsw3m24vlgxfb6gk24zn67h8n2-gemma-wrapper-0.98.1/lib/ruby/vendor_ruby/gems/bio-gemma-wrapper-0.98.1/bin/gemma-wrapper:345:in `<top (required)>'                                                          \n",
    "        1: from /gnu/store/p1alkfcsw3m24vlgxfb6gk24zn67h8n2-gemma-wrapper-0.98.1/lib/ruby/vendor_ruby/gems/bio-gemma-wrapper-0.98.1/bin/gemma-wrapper:316:in `block in <top (required)>'                                                 \n",
    "/gnu/store/p1alkfcsw3m24vlgxfb6gk24zn67h8n2-gemma-wrapper-0.98.1/lib/ruby/vendor_ruby/gems/bio-gemma-wrapper-0.98.1/bin/gemma-wrapper:278:in `block in <top (required)>': exit on GEMMA error 2 (RuntimeError) \n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'status': 'error'}\n"
     ]
    }
   ],
   "source": [
    "# After a short while:\n",
    "# Commands will throw error since I didn't know where the right\n",
    "# genotype files for BXD are :(\n",
    "r = requests.get(\"http://gn3-test.genenetwork.org/api/gemma/status/\"\n",
    "                 \"cmd%3A%3A2021-03-2209-0015-\"\n",
    "                 \"0015-bb5c765f-93a3-45d1-a50d-b817cd7192e7\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### POST /api/gemma/k-compute/loco/:chromosomes/:token\n",
    "\n",
    "Cqmpute K values with chromosomes. The end command will look similar to:\n",
    "\n",
    "```\n",
    "  gemma-wrapper --json --loco 1,2,3,4 \\\n",
    "                -debug -g genotypefile -p traitfile \\\n",
    "                -a genotype-snps -gk > k_output_filename.json\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': '8f4906862459e59dcb452fd8162d2cc1-+O9bus-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1006-1926-1926-60d0aed2-1645-44e0-ba21-28f37bb4e688'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/gemma/\"\n",
    "                  \"k-compute/loco/1%2C2%2C3%2C4/VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## GWA Computation\n",
    "##### POST /api/gemma/gwa-compute/:k-inputfile/:token\n",
    "(No Loco; No covars)\n",
    "Assuming we use the previously generated k-inputfile\n",
    "Also, K-inputfile can be any file you added during the data upload!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': '8f4906862459e59dcb452fd8162d2cc1-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1006-3257-3257-30669a04-dc3d-4cce-a622-8be2103a864f'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/\"\n",
    "                  \"gemma/gwa-compute/8f4906862459e59dcb\"\n",
    "                  \"452fd8162d2cc1-output.json/VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### POST /api/gemma/gwa-compute/covars/:k-inputfile/:token\n",
    "\n",
    "The covars file is fetched from the metadata file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': 'c718773b04935405258315b9588d13e6-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1006-3518-3518-70d057d3-cb07-4171-be07-e1dafe1fb278'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/gemma/\"\n",
    "                  \"gwa-compute/covars/\"\n",
    "                  \"8f4906862459e59dcb452fd8162d2cc1-output.json/\"\n",
    "                  \"VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### POST /api/gemma/gwa-compute/:k-inputfile/loco/maf/:maf/:token\n",
    "\n",
    "Compute GWA with loco(maf has to be given), no covars.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': '8f4906862459e59dcb452fd8162d2cc1-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1006-3848-3848-c66924be-fbf9-494a-9123-eb5941aca912'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/\"\n",
    "                  \"gemma/gwa-compute/\"\n",
    "                  \"8f4906862459e59dcb452fd8162d2cc1-output.json/\"\n",
    "                  \"loco/maf/9/VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### POST /api/gemma/gwa-compute/:k-inputfile/loco/covars/maf/:maf/:token\n",
    "\n",
    "The covariate file is fetched from the name defined in the metadata json file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': 'c718773b04935405258315b9588d13e6-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1006-5255-5255-46765ebe-bbca-4402-86fa-a4c145ad4f71'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/gemma/\"\n",
    "                  \"gwa-compute/8f4906862459e59dcb452fd8162d2cc1\"\n",
    "                  \"-output.json/loco/covars/maf/9/VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## K-GWA computation\n",
    "\n",
    "Computing k and gwa in one full swoop. This is important since in GN2, gemma does this in one full swoop."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### POST /api/gemma/k-gwa-compute/covars/maf/:maf/:token\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': 'c718773b04935405258315b9588d13e6-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1010-1138-1138-558d7849-793f-4625-aac9-73d6bbc6dfdb'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/gemma/\"\n",
    "                  \"k-gwa-compute/covars/VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### POST /api/gemma/k-gwa-compute/loco/:chromosomes/maf/:maf/:token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': '8f4906862459e59dcb452fd8162d2cc1-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1010-1402-1402-1fe13423-e4f6-4f4f-9c1d-c855e3ab55b5'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/gemma/\"\n",
    "                  \"k-gwa-compute/loco/1%2C2%2C3/maf/9/VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### POST /api/k-gwa-compute/covars/loco/:chromosomes/maf/:maf/:token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'output_file': 'c718773b04935405258315b9588d13e6-output.json', 'status': 'queued', 'unique_id': 'cmd::2021-03-1010-1947-1947-c07c85e7-1355-4fab-bf9c-6c5e68f91a36'}\n"
     ]
    }
   ],
   "source": [
    "r = requests.post(\"http://gn3-test.genenetwork.org/api/gemma/\"\n",
    "                  \"k-gwa-compute/covars/loco/1%2C2%2C3/maf/9/VROD4N-XVW1L0\")\n",
    "print(r.json())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}