Diffstat (limited to 'test/performance')
-rw-r--r--  test/performance/releases.org  45
1 files changed, 45 insertions, 0 deletions
diff --git a/test/performance/releases.org b/test/performance/releases.org
index b208e54..c973607 100644
--- a/test/performance/releases.org
+++ b/test/performance/releases.org
@@ -1,5 +1,50 @@
 * GEMMA performance stats
 
+** GEMMA 0.98.5
+
+Measurements taken on a recent AMD Ryzen 7 3700X 8-Core Processor @2.195GHz.
+
+We are facing a runtime regression.
+
+#+begin_src sh
+premake5 gmake2 && make verbose=1 config=release -j 8 gemma && time LD_LIBRARY_PATH=$GUIX_ENVIRONMENT/lib ./build/bin/Release/gemma -g ./example/mouse_hs1940.geno.txt.gz -p ./example/mouse_hs1940.pheno.txt -n 1 -a ./example/mouse_hs1940.anno.txt -k ./output/result.cXX.txt -lmm -no-check -debug
+#+end_src
+#+begin_src sh
+Pangemma --- GEMMA 0.98.5 compatible executable 1.0.0 (2025-11-22) with guile 3.0.9 by Xiang Zhou, Pjotr Prins and team (C) 2012-2025
+Reading Files ...
+## number of total individuals = 1940
+## number of analyzed individuals = 1410
+## number of covariates = 1
+## number of phenotypes = 1
+## number of total SNPs/var = 12226
+## number of analyzed SNPs = 10768
+Start Eigen-Decomposition...
+pve estimate =0.608801
+se(pve) =0.032774
+================================================== 100%
+real	0m16.772s
+user	0m25.443s
+sys	0m0.901s
+#+end_src
+
+The output looks the same as before. Good. The first difference we found is a much later OpenBLAS: 0.3.30 instead of 0.3.9. In the source code we also added checkpoints and more debugging, particularly write statements. I disabled the latter, but still no dice.
+
+When gemma is compiled against the profiler library, prefix the gemma run with CPUPROFILE=gemma.prof and read the resulting profile with pprof:
+
+#+begin_src sh
+CPUPROFILE=gemma.prof
+pprof --text build/bin/Debug/gemma gemma.prof
+
+    1024  50.7%  50.7%     1024  50.7% dcopy_k_ZEN
+      99   4.9%  55.6%       99   4.9% openblas_read_env
+      67   3.3%  58.9%      107   5.3% ____strtod_l_internal
+      67   3.3%  62.3%       67   3.3% gsl_vector_div
+#+end_src
+
+This led me to try the newer OpenBLAS on the older gemma and, indeed, the regression comes from the OpenBLAS version. Even though it reports 'OpenBLAS 0.3.30 DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=128', I suspect the dynamic arch build is not really optimizing.
+
+Well, at least I found the problem. Time for a special OpenBLAS build, as I used to do.
+
 ** GEMMA 0.98.5-pre1
 
 Measurements taken on a recent AMD Ryzen 7 3700X 8-Core Processor @2.195GHz.
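The flat profile in the diff can be summarised mechanically. The sketch below is a hypothetical shell helper, `top_flat` (not part of GEMMA or gperftools). It reads `pprof --text` output on stdin, assuming the usual gperftools column order `flat flat% sum% cum cum% symbol`, and prints the symbol with the most flat samples, its share, and the total sample count implied by that share. For the rows shown above it should print `dcopy_k_ZEN 50.7% 2020`, i.e. about half of roughly 2,020 samples go to vector copies. Assuming gperftools' default sampling rate of 100 samples per second was used, that is on the order of 20 s of profiled CPU time.

```shell
#!/bin/sh
# top_flat: hypothetical helper, not part of GEMMA or gperftools.
# Reads `pprof --text` output on stdin and prints
#   <symbol> <flat%> <implied total samples>
# for the row with the most flat samples. Assumes the gperftools layout
#   flat flat% sum% cum cum% symbol
# and keeps only the first word of the symbol column.
top_flat() {
    awk '
        # Only data rows: an integer sample count followed by a percentage,
        # so header lines such as "Total: N samples" are skipped.
        $1 ~ /^[0-9]+$/ && $2 ~ /%$/ {
            if ($1 + 0 > max) { max = $1 + 0; pct = $2; sym = $6 }
        }
        END {
            share = pct + 0    # "50.7%" -> 50.7 (leading numeric prefix)
            total = (share > 0) ? int(max * 100 / share + 0.5) : 0
            printf "%s %s %d\n", sym, pct, total
        }'
}
```

Usage: `pprof --text build/bin/Debug/gemma gemma.prof | top_flat`. Demangled C++ symbols containing spaces would be cut at the first space, since only column 6 is kept.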

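The A/B check described in the diff (the same gemma binary run against two OpenBLAS versions) can be scripted. Below is a hypothetical POSIX-shell sketch, `run_with_blas` (not part of GEMMA or OpenBLAS). It prepends a given library directory to `LD_LIBRARY_PATH`, runs the command with `OPENBLAS_VERBOSE=2` so that a DYNAMIC_ARCH OpenBLAS should report the core it selected on stderr, discards the command's stdout and prints the elapsed wall-clock time in seconds. It assumes GNU `date` (for `%N`) and `awk`.

```shell
#!/bin/sh
# run_with_blas: hypothetical helper, not part of GEMMA or OpenBLAS.
# Usage: run_with_blas LIBDIR COMMAND [ARGS...]
# Runs COMMAND with LIBDIR prepended to LD_LIBRARY_PATH and OPENBLAS_VERBOSE=2
# (a DYNAMIC_ARCH OpenBLAS then reports its selected core on stderr),
# discards COMMAND's stdout and prints the elapsed wall-clock time in seconds.
# Returns COMMAND's exit status, without printing a time, if COMMAND fails.
# Assumes GNU date, which supports %N (nanoseconds).
run_with_blas() {
    rwb_libdir=$1
    shift
    rwb_start=$(date +%s.%N)
    LD_LIBRARY_PATH="$rwb_libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        OPENBLAS_VERBOSE=2 "$@" >/dev/null || return
    rwb_end=$(date +%s.%N)
    awk -v s="$rwb_start" -v e="$rwb_end" 'BEGIN { printf "%.3f\n", e - s }'
}
```

Usage would be `run_with_blas /path/to/openblas-0.3.9/lib ./build/bin/Release/gemma` followed by the gemma arguments shown above, then the same with the 0.3.30 library directory; both paths are placeholders. On a DYNAMIC_ARCH build, additionally setting `OPENBLAS_CORETYPE` (for example `OPENBLAS_CORETYPE=Haswell`) overrides the detected core, which is one way to probe whether the automatic Zen kernel selection is the slow path.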