Diffstat (limited to 'test')
-rw-r--r-- test/performance/releases.org | 12
1 files changed, 7 insertions, 5 deletions
diff --git a/test/performance/releases.org b/test/performance/releases.org
index c973607..af0cbb7 100644
--- a/test/performance/releases.org
+++ b/test/performance/releases.org
@@ -29,16 +29,18 @@ sys     0m0.901s
 
 The output looks the same. Good. So far the first difference is a much later openblas 0.3.30 (over 0.3.9). In the source code we added checkpoints and more debugging, particularly write statements. I disabled the latter, but still no dice.
 
-When compiled with the profile library prefix the gemma run with
+When compiled with the profiler library prefix the gemma run with
 
 #+begin_src sh
+premake5 gmake2 && make verbose=1 config=debug -j 8 gemma && time CPUPROFILE=gemma.prof LD_LIBRARY_PATH=$GUIX_ENVIRONMENT/lib ./build/bin/Debug/gemma -g ./example/mouse_hs1940.geno.txt.gz -p ./example/mouse_hs1940.pheno.txt -n 1 -a ./example/mouse_hs1940.anno.txt -k ./output/result.cXX.txt -lmm -no-check -debug
 CPUPROFILE=gemma.prof
 pprof --text build/bin/Debug/gemma gemma.prof
 
-    1024  50.7%  50.7%     1024  50.7% dcopy_k_ZEN
-      99   4.9%  55.6%       99   4.9% openblas_read_env
-      67   3.3%  58.9%      107   5.3% ____strtod_l_internal
-      67   3.3%  62.3%       67   3.3% gsl_vector_div
+    1007  49.2%  49.2%     1015  49.6% dot_compute
+      94   4.6%  53.8%       94   4.6% rpcc
+      74   3.6%  57.5%       74   3.6% gsl_vector_div
+      62   3.0%  60.5%       92   4.5% ____strtod_l_internal
+      42   2.1%  62.5%       42   2.1% dgemm_kernel_ZEN
 #+end_src sh
 
 this led me to try the newer openblas on the older gemma - and indeed, the regression is coming from the openblas version. Even though it says 'OpenBLAS 0.3.30 DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=128' I suspect the dynamic arch is not really optimizing.