author Muriithi Frederick Muriuki 2021-08-20 10:37:58 +0300
committer Muriithi Frederick Muriuki 2021-08-20 10:37:58 +0300
commit 9f6dd5a02aea5751bd22185dfc5c90abf894a8a3
tree ff4d67716ae9766d85c087014db9a87444a5bbd8
parent deeb6dfc10a029b2a0ca1d65358e9bbca80c89f0
Update issue with write-up on heatmap image generation
* Work through the migrated code and the old GN1 code, and write up the flow of the data, to assist in figuring out the differences in the data. The write-up helped me realise that the code migrated so far only helps with the "distance" lines, and there is still a need to figure out, and compute or lay out, the data needed for the actual heatmap generation.
-rw-r--r-- topics/gn1-migration-to-gn2/clustering.gmi | 62
1 files changed, 62 insertions, 0 deletions
diff --git a/topics/gn1-migration-to-gn2/clustering.gmi b/topics/gn1-migration-to-gn2/clustering.gmi
index b18f231..186228f 100644
--- a/topics/gn1-migration-to-gn2/clustering.gmi
+++ b/topics/gn1-migration-to-gn2/clustering.gmi
@@ -147,3 +147,65 @@ Paused on heatmap generation to first test out the database access code.
Added tests and fixed issues with older db-access code to get a sample of the data for drawing heatmaps.
+## 2021-08-20
+
+The following data from strains seems to be what is used for drawing the actual heatmap:
+
+* value
+* variance
+* N (I'm not sure what N is)
+
+This is retrieved with the
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/db/traits.py#L627-L668 `retrieve_trait_data` function
+
+which is then processed with the
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/heatmap.py#L12-L77 `export_trait_data` function
+
+into a list of tuples, an example of which is shown below:
+
+```
+[(7.51879, 7.77141, 8.39265, 8.17443, 8.30401, 7.80944), (6.1427, 6.50588, 7.73705, 6.68328, 7.49293, 7.27398), (8.4211, 8.30581, 9.24076, 8.51173, 9.18455, 8.36077), (10.0904, 10.6509, 9.36716, 9.91202, 8.57444, 10.5731), (10.188, 9.76652, 9.54813, 9.05074, 9.52319, 9.10505), (6.74676, 7.01029, 7.54169, 6.48574, 7.01427, 7.26815), (6.39359, 6.85321, 5.78337, 7.11141, 6.22101, 6.16544), (6.84118, 7.08432, 7.59844, 7.08229, 7.26774, 7.24991), (9.45215, 10.6943, 8.64719, 10.1592, 7.75044, 8.78615), (7.04737, 6.87185, 7.58586, 6.92456, 6.84243, 7.36913)]
+```
+
+Clustering the example data above with
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/heatmap.py#L104-L126 the `cluster_traits` function
+
+gives
+
+```
+((0.0, 0.20337048635536847, 0.16381088984330505, 1.7388553629398245, 1.5025235756329178, 0.6952839500255574, 1.271661230252733, 0.2100487290977544, 1.4699690641062024, 0.7934461515867415), (0.20337048635536847, 0.0, 0.2198321044997198, 1.5753041735592204, 1.4815755944537086, 0.26087293140686374, 1.6939790104301427, 0.06024619831474998, 1.7430082449189215, 0.4497104244247795), (0.16381088984330505, 0.2198321044997198, 0.0, 1.9073926868549234, 1.0396738891139845, 0.5278328671176757, 1.6275069061182947, 0.2636503792482082, 1.739617877037615, 0.7127042590637039), (1.7388553629398245, 1.5753041735592204, 1.9073926868549234, 0.0, 0.9936846292920328, 1.1169999189889366, 0.6007483980555253, 1.430209221053372, 0.25879514152086425, 0.9313185954797953), (1.5025235756329178, 1.4815755944537086, 1.0396738891139845, 0.9936846292920328, 0.0, 1.027827186339337, 1.1441743109173244, 1.4122477962364253, 0.8968250491499363, 1.1683723389247052), (0.6952839500255574, 0.26087293140686374, 0.5278328671176757, 1.1169999189889366, 1.027827186339337, 0.0, 1.8420471110023269, 0.19179284676938602, 1.4875072385631605, 0.23451785425383564), (1.271661230252733, 1.6939790104301427, 1.6275069061182947, 0.6007483980555253, 1.1441743109173244, 1.8420471110023269, 0.0, 1.6540234785929928, 0.2140799896286565, 1.7413442197913358), (0.2100487290977544, 0.06024619831474998, 0.2636503792482082, 1.430209221053372, 1.4122477962364253, 0.19179284676938602, 1.6540234785929928, 0.0, 1.5225640692832796, 0.33370067057028485), (1.4699690641062024, 1.7430082449189215, 1.739617877037615, 0.25879514152086425, 0.8968250491499363, 1.4875072385631605, 0.2140799896286565, 1.5225640692832796, 0.0, 1.3256191648260216), (0.7934461515867415, 0.4497104244247795, 0.7127042590637039, 0.9313185954797953, 1.1683723389247052, 0.23451785425383564, 1.7413442197913358, 0.33370067057028485, 1.3256191648260216, 0.0))
+```
+
+and that is then run through
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/slink.py#L140-L198 the `slink` function
+
+to give
+
+```
+[(((0, 2, 0.16381088984330505), ((1, 7, 0.06024619831474998), 5, 0.19179284676938602), 0.20337048635536847), 9, 0.23451785425383564), ((3, (6, 8, 0.2140799896286565), 0.25879514152086425), 4, 0.8968250491499363), 0.9313185954797953]
+```
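As a cross-check on the distance matrix shown earlier, here is a minimal sketch that assumes each distance is `1 - r`, where `r` is the Pearson correlation between two rows of the exported data. The function names `pearson` and `distance_matrix` are hypothetical and are not taken from gn3; this is not the `cluster_traits` implementation, only an approximation of what it appears to compute. Only the first three rows of the example data are used, to keep it short.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def distance_matrix(rows):
    """Assumed distance: 1 - Pearson r, with 0.0 on the diagonal."""
    return tuple(
        tuple(0.0 if i == j else 1 - pearson(row_i, row_j)
              for j, row_j in enumerate(rows))
        for i, row_i in enumerate(rows))

# First three rows of the example data above.
rows = [
    (7.51879, 7.77141, 8.39265, 8.17443, 8.30401, 7.80944),
    (6.1427, 6.50588, 7.73705, 6.68328, 7.49293, 7.27398),
    (8.4211, 8.30581, 9.24076, 8.51173, 9.18455, 8.36077),
]
matrix = distance_matrix(rows)
print(matrix[0][1])  # approximately 0.2034
```

Under that assumption the first off-diagonal entry comes out close to the 0.2033... value in the matrix above, which suggests the assumption is at least plausible for this sample.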
+
+This "slinked" data is, I think, what is used to draw the "distance" lines in
+
+=> ./heatmap.png the 'Cluster Traits' heatmap diagram
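To make the shape of the "slinked" data easier to read, here is a naive single-linkage sketch that produces the same kind of nested `(left, right, distance)` structure. It is not the gn3 `slink` code: the names `members` and `single_linkage` and the 4x4 `toy` matrix are made up for illustration, and the real function may order the children differently or return a list at the top level.

```python
from itertools import combinations

def members(node):
    """Return the leaf indices inside a (possibly nested) cluster."""
    if isinstance(node, int):
        return [node]
    left, right, _distance = node
    return members(left) + members(right)

def single_linkage(matrix):
    """Repeatedly merge the two closest clusters.

    The distance between two clusters is the smallest distance
    between any member of one and any member of the other.
    """
    clusters = list(range(len(matrix)))
    while len(clusters) > 1:
        dist, i, j = min(
            (min(matrix[a][b]
                 for a in members(clusters[i])
                 for b in members(clusters[j])), i, j)
            for i, j in combinations(range(len(clusters)), 2))
        merged = (clusters[i], clusters[j], dist)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

# Hypothetical symmetric distance matrix, for illustration only.
toy = [
    [0.0, 0.2, 0.9, 1.5],
    [0.2, 0.0, 0.7, 1.4],
    [0.9, 0.7, 0.0, 0.4],
    [1.5, 1.4, 0.4, 0.0],
]
tree = single_linkage(toy)
print(tree)  # ((0, 1, 0.2), (2, 3, 0.4), 0.7)
```

Each inner tuple records which two items (or sub-clusters) were joined and at what distance, which is presumably the height at which the corresponding "distance" line is drawn.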
+
+
+For the actual heatmap representation, it looks to me like the `neworder` variable initialised to an empty list in
+
+=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L120 GN1's `buildCanvas` function
+
+is what is populated and used to draw the "cells" of the heatmap diagram: see
+
+=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L206-L316
+
+This has not yet been migrated over.
+
+There **might** also be a need to migrate the
+
+=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L419-L438 `getNearestMarker` function
+
+So, it does seem like I had previously missed out on a lot of extra computation that still needs migration.