Diffstat (limited to 'test/performance')
-rw-r--r-- test/performance/releases.org | 45
1 files changed, 45 insertions, 0 deletions
diff --git a/test/performance/releases.org b/test/performance/releases.org
index b208e54..c973607 100644
--- a/test/performance/releases.org
+++ b/test/performance/releases.org
@@ -1,5 +1,50 @@
 * GEMMA performance stats
 
+** GEMMA 0.98.5
+
+Measurements taken on a recent AMD Ryzen 7 3700X 8-Core Processor @2.195GHz.
+
+We are facing a run-time regression.
+
+premake5 gmake2 && make verbose=1 config=release -j 8 gemma && time LD_LIBRARY_PATH=$GUIX_ENVIRONMENT/lib ./build/bin/Release/gemma -g ./example/mouse_hs1940.geno.txt.gz -p ./example/mouse_hs1940.pheno.txt -n 1 -a ./example/mouse_hs1940.anno.txt -k ./output/result.cXX.txt -lmm -no-check -debug
+
+
+#+begin_src sh
+Pangemma --- GEMMA 0.98.5 compatible executable 1.0.0 (2025-11-22) with guile 3.0.9 by Xiang Zhou, Pjotr Prins and team (C) 2012-2025
+Reading Files ...
+## number of total individuals = 1940
+## number of analyzed individuals = 1410
+## number of covariates = 1
+## number of phenotypes = 1
+## number of total SNPs/var        =    12226
+## number of analyzed SNPs         =    10768
+Start Eigen-Decomposition...
+pve estimate =0.608801
+se(pve) =0.032774
+================================================== 100%
+real    0m16.772s
+user    0m25.443s
+sys     0m0.901s
+#+end_src
+
+The output looks the same. Good. So far, the first difference I see is a much newer openblas: 0.3.30, up from 0.3.9. In the source code we also added checkpoints and more debugging, particularly write statements. I disabled the latter, but still no dice.
+
+When compiled with the profiling library, prefix the gemma run with CPUPROFILE=gemma.prof and then read the resulting profile with pprof:
+
+#+begin_src sh
+CPUPROFILE=gemma.prof
+pprof --text build/bin/Debug/gemma gemma.prof
+
+    1024  50.7%  50.7%     1024  50.7% dcopy_k_ZEN
+      99   4.9%  55.6%       99   4.9% openblas_read_env
+      67   3.3%  58.9%      107   5.3% ____strtod_l_internal
+      67   3.3%  62.3%       67   3.3% gsl_vector_div
+#+end_src
+
+This led me to try the newer openblas on the older gemma, and indeed the regression comes from the openblas version. Even though it reports 'OpenBLAS 0.3.30 DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=128', I suspect the dynamic arch is not really optimizing.
+
+Well, at least I found the problem. Time for a special openblas build like I used to do.
+
 ** GEMMA 0.98.5-pre1
 
 Measurements taken on a recent AMD Ryzen 7 3700X 8-Core Processor @2.195GHz.