From c045122908d36bba4ca197f3f67e89d80958f38f Mon Sep 17 00:00:00 2001
From: Muriithi Frederick Muriuki
Date: Mon, 30 Aug 2021 05:23:22 +0300
Subject: Document acquired knowledge on `rust-qtlreaper`

Issue:
https://github.com/genenetwork/gn-gemtext-threads/blob/main/topics/gn1-migration-to-gn2/clustering.gmi

* gn3/heatmaps/heatmaps.py: document format of the traits file

  To assist future developers, and development of the system, this commit
  documents some of the hard-won knowledge about the operation of the system
  to ease future development of the system.

  The documentation, if good, might also help with future onboarding of new
  developers to the system.
---
 README.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

(limited to 'README.md')
diff --git a/README.md b/README.md
index c3a9848..0e0e509 100644
--- a/README.md
+++ b/README.md
@@ -120,3 +120,24 @@ guix. To freeze dependencies:
 pip freeze --path venv/lib/python3.8/site-packages > requirements.txt
 
 ```
+
+## QTLReaper (rust-qtlreaper) and Trait Files
+
+To run QTL computations, this system makes use of the [rust-qtlreaper](https://github.com/chfi/rust-qtlreaper.git) utility.
+
+To do this, the system needs to export the trait data into a tab-separated file, that can then be passed to the utility using the `--traits` option. For more information about the available options, please [see the rust-qtlreaper](https://github.com/chfi/rust-qtlreaper.git) repository.
+
+### Traits File Format
+
+The traits file begins with a header row/line with the column headers. The first column in the file has the header **"Trait"**. Every other column has a header for one of the strains in consideration.
+
+Under the **"Trait"** column, the traits are numbered from **T1** to **T<n>** where **<n>** is the count of the total number of traits in consideration.
+
+As an example, you could end up with a trait file like the following:
+
+```txt
+Traits	BXD27	BXD32	DBA/2J	BXD21	...
+T1	10.5735	9.27408	9.48255	9.18253	...
+T2	6.4471	6.7191	5.98015	6.68051	...
+...
+```
-- 
cgit v1.2.3


From 58f59b8f7df82969b58a604070aec095d17e0501 Mon Sep 17 00:00:00 2001
From: Muriithi Frederick Muriuki
Date: Mon, 30 Aug 2021 11:44:37 +0300
Subject: Fix issues with traits file format

* README.md: update header: Traits ==> Trait
* gn3/computations/qtlreaper.py: update header: Traits ==> Trait
* qtlfilesexport.py: Choose only BXD strains

  Rename the first column header from "Traits" to "Trait" to correspond with
  what `rust-qtlreaper` expects.

  Choose only the BXD strains for the proof-of-concept example - this helped
  bring out the fact that the traits file SHOULD NOT contain a strain column
  for a strain that does not exist in the genotype file in consideration.

  If the traits file has a strain column which does not exist in the genotype
  file, then `rust-qtlreaper` fails with a panic, since, from what I can tell,
  it tries to get a value from the genotype file for the non-existent strain,
  which results to a `None` type. Subsequent attempts at running an operation
  on the `None` type lead to the panic.
---
 README.md                     |  4 +++-
 gn3/computations/qtlreaper.py |  2 +-
 qtlfilesexport.py             | 31 ++++++++++++++++++++++++++++++-
 3 files changed, 34 insertions(+), 3 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 0e0e509..b54015f 100644
--- a/README.md
+++ b/README.md
@@ -136,8 +136,10 @@ Under the **"Trait"** column, the traits are numbered from **T1** to **T<n>** wh
 As an example, you could end up with a trait file like the following:
 
 ```txt
-Traits	BXD27	BXD32	DBA/2J	BXD21	...
+Trait	BXD27	BXD32	DBA/2J	BXD21	...
 T1	10.5735	9.27408	9.48255	9.18253	...
 T2	6.4471	6.7191	5.98015	6.68051	...
 ...
 ```
+
+It is very important that the column header names for the strains correspond to the genotype file used.
diff --git a/gn3/computations/qtlreaper.py b/gn3/computations/qtlreaper.py
index a88659e..9b13a55 100644
--- a/gn3/computations/qtlreaper.py
+++ b/gn3/computations/qtlreaper.py
@@ -24,7 +24,7 @@ def generate_traits_file(strains, trait_values, traits_filename):
     traits_filename: The tab-separated value to put the values in for
         computation of QTLs.
     """
-    header = "Traits\t{}\n".format("\t".join(strains))
+    header = "Trait\t{}\n".format("\t".join(strains))
     data = [header] + [
         "T{}\t{}\n".format(i+1, "\t".join([str(i) for i in t]))
         for i, t in enumerate(trait_values[:-1])] + [
diff --git a/qtlfilesexport.py b/qtlfilesexport.py
index 0543dc9..adc5e77 100644
--- a/qtlfilesexport.py
+++ b/qtlfilesexport.py
@@ -41,7 +41,36 @@ def main():
         retrieve_trait_info(threshold, fullname, conn)
         for fullname in trait_fullnames()]
     traits_data_list = [retrieve_trait_data(t, conn) for t in traits]
-    strains = list(set([k for td in traits_data_list for k in td["data"].keys()]))
+    # strains = list(set([k for td in traits_data_list for k in td["data"].keys()]))
+    strains = [# Use only the strains in the BXD.geno genotype file
+        "BXD1", "BXD2", "BXD5", "BXD6", "BXD8", "BXD9", "BXD11", "BXD12",
+        "BXD13", "BXD14", "BXD15", "BXD16", "BXD18", "BXD19", "BXD20", "BXD21",
+        "BXD22", "BXD23", "BXD24", "BXD24a", "BXD25", "BXD27", "BXD28", "BXD29",
+        "BXD30", "BXD31", "BXD32", "BXD33", "BXD34", "BXD35", "BXD36", "BXD37",
+        "BXD38", "BXD39", "BXD40", "BXD41", "BXD42", "BXD43", "BXD44", "BXD45",
+        "BXD48", "BXD48a", "BXD49", "BXD50", "BXD51", "BXD52", "BXD53", "BXD54",
+        "BXD55", "BXD56", "BXD59", "BXD60", "BXD61", "BXD62", "BXD63", "BXD64",
+        "BXD65", "BXD65a", "BXD65b", "BXD66", "BXD67", "BXD68", "BXD69",
+        "BXD70", "BXD71", "BXD72", "BXD73", "BXD73a", "BXD73b", "BXD74",
+        "BXD75", "BXD76", "BXD77", "BXD78", "BXD79", "BXD81", "BXD83", "BXD84",
+        "BXD85", "BXD86", "BXD87", "BXD88", "BXD89", "BXD90", "BXD91", "BXD93",
+        "BXD94", "BXD95", "BXD98", "BXD99", "BXD100", "BXD101", "BXD102",
+        "BXD104", "BXD105", "BXD106", "BXD107", "BXD108", "BXD109", "BXD110",
+        "BXD111", "BXD112", "BXD113", "BXD114", "BXD115", "BXD116", "BXD117",
+        "BXD119", "BXD120", "BXD121", "BXD122", "BXD123", "BXD124", "BXD125",
+        "BXD126", "BXD127", "BXD128", "BXD128a", "BXD130", "BXD131", "BXD132",
+        "BXD133", "BXD134", "BXD135", "BXD136", "BXD137", "BXD138", "BXD139",
+        "BXD141", "BXD142", "BXD144", "BXD145", "BXD146", "BXD147", "BXD148",
+        "BXD149", "BXD150", "BXD151", "BXD152", "BXD153", "BXD154", "BXD155",
+        "BXD156", "BXD157", "BXD160", "BXD161", "BXD162", "BXD165", "BXD168",
+        "BXD169", "BXD170", "BXD171", "BXD172", "BXD173", "BXD174", "BXD175",
+        "BXD176", "BXD177", "BXD178", "BXD180", "BXD181", "BXD183", "BXD184",
+        "BXD186", "BXD187", "BXD188", "BXD189", "BXD190", "BXD191", "BXD192",
+        "BXD193", "BXD194", "BXD195", "BXD196", "BXD197", "BXD198", "BXD199",
+        "BXD200", "BXD201", "BXD202", "BXD203", "BXD204", "BXD205", "BXD206",
+        "BXD207", "BXD208", "BXD209", "BXD210", "BXD211", "BXD212", "BXD213",
+        "BXD214", "BXD215", "BXD216", "BXD217", "BXD218", "BXD219", "BXD220"
+    ]
     exported_traits_data_list = [
         export_trait_data(td, strains) for td in traits_data_list]
     slinked = slink(cluster_traits(exported_traits_data_list))
-- 
cgit v1.2.3


From b8777bcfee70325263d5389367e3a93ec2842f69 Mon Sep 17 00:00:00 2001
From: Muriithi Frederick Muriuki
Date: Mon, 30 Aug 2021 12:01:54 +0300
Subject: Update documentation on genotype files

* Provide documentation on downloading and using the genotype files.
---
 README.md | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index b54015f..61ca539 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ guix environment --load=guix.scm
 Also, make sure you have the [guix-bioinformatics](https://git.genenetwork.org/guix-bioinformatics/guix-bioinformatics) channel set up.
 
 ```bash
-env GUIX_PACKAGE_PATH=~/guix-bioinformatics/ ~/.config/guix/current/bin/guix environment --load=guix.scm
+env GUIX_PACKAGE_PATH=~/guix-bioinformatics/ ~/.config/guix/current/bin/guix environment --expose=$HOME/genotype_files/ --load=guix.scm
 python3
   import redis
 ```
@@ -22,7 +22,7 @@ python3
 Better run a proper container
 
 ```
-env GUIX_PACKAGE_PATH=~/guix-bioinformatics/ ~/.config/guix/current/bin/guix environment -C --network --load=guix.scm
+env GUIX_PACKAGE_PATH=~/guix-bioinformatics/ ~/.config/guix/current/bin/guix environment -C --network --expose=$HOME/genotype_files/ --load=guix.scm
 ```
 
 If you get a Guix error, such as `ice-9/boot-9.scm:1669:16: In procedure raise-exception:
@@ -121,6 +121,19 @@ pip freeze --path venv/lib/python3.8/site-packages > requirements.txt
 
 ```
 
+## Genotype Files
+
+You can get the genotype files from http://ipfs.genenetwork.org/ipfs/QmXQy3DAUWJuYxubLHLkPMNCEVq1oV7844xWG2d1GSPFPL and save them on your host machine at, say `$HOME/genotype_files` with something like:
+
+```bash
+$ mkdir -p $HOME/genotype_files
+$ cd $HOME/genotype_files
+$ yes | 7z x genotype_files.tar.7z
+$ tar xf genotype_files.tar
+```
+
+The `genotype_files.tar.7z` file seems to only contain the **BXD.geno** genotype file.
+
 ## QTLReaper (rust-qtlreaper) and Trait Files
 
 To run QTL computations, this system makes use of the [rust-qtlreaper](https://github.com/chfi/rust-qtlreaper.git) utility.
-- 
cgit v1.2.3