author | Muriithi Frederick Muriuki | 2021-08-20 10:37:58 +0300 |
---|---|---|
committer | Muriithi Frederick Muriuki | 2021-08-20 10:37:58 +0300 |
commit | 9f6dd5a02aea5751bd22185dfc5c90abf894a8a3 (patch) | |
tree | ff4d67716ae9766d85c087014db9a87444a5bbd8 /topics/gn1-migration-to-gn2/clustering.gmi | |
parent | deeb6dfc10a029b2a0ca1d65358e9bbca80c89f0 (diff) | |
Update issue with write-up on heatmap image generation
* Work through the migrated code and the old GN1 code, and write up a
flow of the data, to assist in figuring out the differences in the
data.
The write-up helped me realise that the code migrated so far only helps
with the "distance" lines, and that there is still a need to figure out,
and compute or lay out, the data needed for the actual heatmap
generation.
Diffstat (limited to 'topics/gn1-migration-to-gn2/clustering.gmi')
-rw-r--r-- | topics/gn1-migration-to-gn2/clustering.gmi | 62 |
1 files changed, 62 insertions, 0 deletions
diff --git a/topics/gn1-migration-to-gn2/clustering.gmi b/topics/gn1-migration-to-gn2/clustering.gmi
index b18f231..186228f 100644
--- a/topics/gn1-migration-to-gn2/clustering.gmi
+++ b/topics/gn1-migration-to-gn2/clustering.gmi
@@ -147,3 +147,65 @@
 Paused on heatmap generation to first test out the database access code.
 
 Added tests and fixed issues with older db-access code to get a sample of the data for drawing heatmaps.
+## 2021-08-20
+
+The data that seems to be used for drawing the actual heatmap is the following data from strains:
+
+* value
+* variance
+* N (I'm not sure what N is)
+
+this is retrieved with the
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/db/traits.py#L627-L668 `retrieve_trait_data` function
+
+which is then processed with the
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/heatmap.py#L12-L77 `export_trait_data` function
+
+into a list of lists the example of which is as shown:
+
+```
+[(7.51879, 7.77141, 8.39265, 8.17443, 8.30401, 7.80944), (6.1427, 6.50588, 7.73705, 6.68328, 7.49293, 7.27398), (8.4211, 8.30581, 9.24076, 8.51173, 9.18455, 8.36077), (10.0904, 10.6509, 9.36716, 9.91202, 8.57444, 10.5731), (10.188, 9.76652, 9.54813, 9.05074, 9.52319, 9.10505), (6.74676, 7.01029, 7.54169, 6.48574, 7.01427, 7.26815), (6.39359, 6.85321, 5.78337, 7.11141, 6.22101, 6.16544), (6.84118, 7.08432, 7.59844, 7.08229, 7.26774, 7.24991), (9.45215, 10.6943, 8.64719, 10.1592, 7.75044, 8.78615), (7.04737, 6.87185, 7.58586, 6.92456, 6.84243, 7.36913)]
+```
+
+clustering the example data above with
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/heatmap.py#L104-L126 the `cluster_traits` function
+
+gives
+
+```
+((0.0, 0.20337048635536847, 0.16381088984330505, 1.7388553629398245, 1.5025235756329178, 0.6952839500255574, 1.271661230252733, 0.2100487290977544, 1.4699690641062024, 0.7934461515867415), (0.20337048635536847, 0.0, 0.2198321044997198, 1.5753041735592204, 1.4815755944537086, 0.26087293140686374, 1.6939790104301427, 0.06024619831474998, 1.7430082449189215, 0.4497104244247795), (0.16381088984330505, 0.2198321044997198, 0.0, 1.9073926868549234, 1.0396738891139845, 0.5278328671176757, 1.6275069061182947, 0.2636503792482082, 1.739617877037615, 0.7127042590637039), (1.7388553629398245, 1.5753041735592204, 1.9073926868549234, 0.0, 0.9936846292920328, 1.1169999189889366, 0.6007483980555253, 1.430209221053372, 0.25879514152086425, 0.9313185954797953), (1.5025235756329178, 1.4815755944537086, 1.0396738891139845, 0.9936846292920328, 0.0, 1.027827186339337, 1.1441743109173244, 1.4122477962364253, 0.8968250491499363, 1.1683723389247052), (0.6952839500255574, 0.26087293140686374, 0.5278328671176757, 1.1169999189889366, 1.027827186339337, 0.0, 1.8420471110023269, 0.19179284676938602, 1.4875072385631605, 0.23451785425383564), (1.271661230252733, 1.6939790104301427, 1.6275069061182947, 0.6007483980555253, 1.1441743109173244, 1.8420471110023269, 0.0, 1.6540234785929928, 0.2140799896286565, 1.7413442197913358), (0.2100487290977544, 0.06024619831474998, 0.2636503792482082, 1.430209221053372, 1.4122477962364253, 0.19179284676938602, 1.6540234785929928, 0.0, 1.5225640692832796, 0.33370067057028485), (1.4699690641062024, 1.7430082449189215, 1.739617877037615, 0.25879514152086425, 0.8968250491499363, 1.4875072385631605, 0.2140799896286565, 1.5225640692832796, 0.0, 1.3256191648260216), (0.7934461515867415, 0.4497104244247795, 0.7127042590637039, 0.9313185954797953, 1.1683723389247052, 0.23451785425383564, 1.7413442197913358, 0.33370067057028485, 1.3256191648260216, 0.0))
+```
+
+and that is then run through the
+
+=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/slink.py#L140-L198 the `slink` function
+
+to give
+
+```
+[(((0, 2, 0.16381088984330505), ((1, 7, 0.06024619831474998), 5, 0.19179284676938602), 0.20337048635536847), 9, 0.23451785425383564), ((3, (6, 8, 0.2140799896286565), 0.25879514152086425), 4, 0.8968250491499363), 0.9313185954797953]
+```
+
+this, "slinked" data, I think, is what is used to draw the "distance" lines in
+
+=> ./heatmap.png the 'Cluster Traits' heatmap diagram
+
+
+For the actual heatmap representation, it looks to me like the `neworder` variable initialised to an empty list in
+
+=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L120 GN1's `buildCanvas` function
+
+is what is populated and used to draw the "cells" of the heatmap diagram: see
+
+=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L206-L316
+
+This has not yet been migrated over
+
+There **might** be need to migrate the
+
+=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L419-L438 `getNearestMarker` function out
+
+So, it does seem like I had previously missed out on a lot of extra computation that still needs migration.
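
The "slinked" output shown in the diff reads as a nested structure in which every node is `(left, right, distance)` and every integer leaf is an index into the list of traits. Below is a minimal sketch, assuming that reading is correct, of how a left-to-right ordering of the traits could be read off such a structure. The helper name `leaf_order` is hypothetical and is not part of genenetwork3. Whether this ordering corresponds to what GN1's `neworder` ends up holding has not been checked against the GN1 code.

```python
from typing import List, Sequence, Union

# Assumption: a node is either an int leaf (an index into the list of
# traits) or a 3-element sequence (left_subtree, right_subtree, distance).
Node = Union[int, Sequence]


def leaf_order(node: Node) -> List[int]:
    """Return the leaf indices of a nested slink-style tree, left to right."""
    if isinstance(node, int):
        return [node]
    left, right, _distance = node  # the distance is not needed for ordering
    return leaf_order(left) + leaf_order(right)


# The "slinked" example data, copied from the write-up in the diff.
SLINKED = [
    (((0, 2, 0.16381088984330505),
      ((1, 7, 0.06024619831474998), 5, 0.19179284676938602),
      0.20337048635536847),
     9, 0.23451785425383564),
    ((3, (6, 8, 0.2140799896286565), 0.25879514152086425),
     4, 0.8968250491499363),
    0.9313185954797953,
]

print(leaf_order(SLINKED))  # prints [0, 2, 1, 7, 5, 9, 3, 6, 8, 4]
```

In this ordering, traits that were merged at a small distance (for example 1 and 7, at about 0.06) sit next to each other, which is the property a clustered heatmap layout would need from its row order.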