diff options
-rw-r--r-- | topics/gn1-migration-to-gn2/clustering.gmi | 62 |
1 files changed, 62 insertions, 0 deletions
diff --git a/topics/gn1-migration-to-gn2/clustering.gmi b/topics/gn1-migration-to-gn2/clustering.gmi index b18f231..186228f 100644 --- a/topics/gn1-migration-to-gn2/clustering.gmi +++ b/topics/gn1-migration-to-gn2/clustering.gmi @@ -147,3 +147,65 @@ Paused on heatmap generation to first test out the database access code. Added tests and fixed issues with older db-access code to get a sample of the data for drawing heatmaps. +## 2021-08-20 + +The data that seems to be used for drawing the actual heatmap is the following data from strains: + +* value +* variance +* N (I'm not sure what N is) + +this is retrieved with the + +=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/db/traits.py#L627-L668 `retrieve_trait_data` function + +which is then processed with the + +=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/heatmap.py#L12-L77 `export_trait_data` function + +into a list of lists the example of which is as shown: + +``` +[(7.51879, 7.77141, 8.39265, 8.17443, 8.30401, 7.80944), (6.1427, 6.50588, 7.73705, 6.68328, 7.49293, 7.27398), (8.4211, 8.30581, 9.24076, 8.51173, 9.18455, 8.36077), (10.0904, 10.6509, 9.36716, 9.91202, 8.57444, 10.5731), (10.188, 9.76652, 9.54813, 9.05074, 9.52319, 9.10505), (6.74676, 7.01029, 7.54169, 6.48574, 7.01427, 7.26815), (6.39359, 6.85321, 5.78337, 7.11141, 6.22101, 6.16544), (6.84118, 7.08432, 7.59844, 7.08229, 7.26774, 7.24991), (9.45215, 10.6943, 8.64719, 10.1592, 7.75044, 8.78615), (7.04737, 6.87185, 7.58586, 6.92456, 6.84243, 7.36913)] +``` + +clustering the example data above with + +=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/heatmap.py#L104-L126 the `cluster_traits` function + +gives + +``` +((0.0, 0.20337048635536847, 0.16381088984330505, 1.7388553629398245, 1.5025235756329178, 0.6952839500255574, 1.271661230252733, 0.2100487290977544, 1.4699690641062024, 0.7934461515867415), (0.20337048635536847, 0.0, 0.2198321044997198, 1.5753041735592204, 1.4815755944537086, 0.26087293140686374, 1.6939790104301427, 0.06024619831474998, 1.7430082449189215, 0.4497104244247795), (0.16381088984330505, 0.2198321044997198, 0.0, 1.9073926868549234, 1.0396738891139845, 0.5278328671176757, 1.6275069061182947, 0.2636503792482082, 1.739617877037615, 0.7127042590637039), (1.7388553629398245, 1.5753041735592204, 1.9073926868549234, 0.0, 0.9936846292920328, 1.1169999189889366, 0.6007483980555253, 1.430209221053372, 0.25879514152086425, 0.9313185954797953), (1.5025235756329178, 1.4815755944537086, 1.0396738891139845, 0.9936846292920328, 0.0, 1.027827186339337, 1.1441743109173244, 1.4122477962364253, 0.8968250491499363, 1.1683723389247052), (0.6952839500255574, 0.26087293140686374, 0.5278328671176757, 1.1169999189889366, 1.027827186339337, 0.0, 1.8420471110023269, 0.19179284676938602, 1.4875072385631605, 0.23451785425383564), (1.271661230252733, 1.6939790104301427, 1.6275069061182947, 0.6007483980555253, 1.1441743109173244, 1.8420471110023269, 0.0, 1.6540234785929928, 0.2140799896286565, 1.7413442197913358), (0.2100487290977544, 0.06024619831474998, 0.2636503792482082, 1.430209221053372, 1.4122477962364253, 0.19179284676938602, 1.6540234785929928, 0.0, 1.5225640692832796, 0.33370067057028485), (1.4699690641062024, 1.7430082449189215, 1.739617877037615, 0.25879514152086425, 0.8968250491499363, 1.4875072385631605, 0.2140799896286565, 1.5225640692832796, 0.0, 1.3256191648260216), (0.7934461515867415, 0.4497104244247795, 0.7127042590637039, 0.9313185954797953, 1.1683723389247052, 0.23451785425383564, 1.7413442197913358, 0.33370067057028485, 1.3256191648260216, 0.0)) +``` + +and that is then run through the + +=> https://github.com/genenetwork/genenetwork3/blob/main/gn3/computations/slink.py#L140-L198 the `slink` function + +to give + +``` +[(((0, 2, 0.16381088984330505), ((1, 7, 0.06024619831474998), 5, 0.19179284676938602), 0.20337048635536847), 9, 0.23451785425383564), ((3, (6, 8, 0.2140799896286565), 0.25879514152086425), 4, 0.8968250491499363), 0.9313185954797953] +``` + +this, "slinked" data, I think, is what is used to draw the "distance" lines in + +=> ./heatmap.png the 'Cluster Traits' heatmap diagram + + +For the actual heatmap representation, it looks to me like the `neworder` variable initialised to an empty list in + +=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L120 GN1's `buildCanvas` function + +is what is populated and used to draw the "cells" of the heatmap diagram: see + +=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L206-L316 + +This has not yet been migrated over + +There **might** be need to migrate the + +=> https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/heatmap/Heatmap.py#L419-L438 `getNearestMarker` function out + +So, it does seem like I had previously missed out on a lot of extra computation that still needs migration. |