| | | |
|---|---|---|
| author | Pjotr Prins | 2025-11-24 13:06:50 +0100 |
| committer | Pjotr Prins | 2025-11-24 13:06:50 +0100 |
| commit | f03c82ea21acda54de8cced07ba8150cfafb3769 | |
| tree | 2432c99cbfed02f3fe9a84a5b55643aff44c1bdb /test | |
| parent | c5a402a651d3c6393b1f758fc011c7247e4f042f | |
Added profiler and figured speed regression with openblas
Diffstat (limited to 'test')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | test/performance/releases.org | 45 |
| -rwxr-xr-x | test/runner | 24 |

2 files changed, 69 insertions, 0 deletions
```diff
diff --git a/test/performance/releases.org b/test/performance/releases.org
index b208e54..c973607 100644
--- a/test/performance/releases.org
+++ b/test/performance/releases.org
@@ -1,5 +1,50 @@
 * GEMMA performance stats
 
+** GEMMA 0.98.5
+
+Measurements taken on a recent AMD Ryzen 7 3700X 8-Core Processor @2.195GHz.
+
+We are facing a time regression.
+
+premake5 gmake2 && make verbose=1 config=release -j 8 gemma && time LD_LIBRARY_PATH=$GUIX_ENVIRONMENT/lib ./build/bin/Release/gemma -g ./example/mouse_hs1940.geno.txt.gz -p ./example/mouse_hs1940.pheno.txt -n 1 -a ./example/mouse_hs1940.anno.txt -k ./output/result.cXX.txt -lmm -no-check -debug
+
+
+#+begin_src sh
+Pangemma --- GEMMA 0.98.5 compatible executable 1.0.0 (2025-11-22) with guile 3.0.9 by Xiang Zhou, Pjotr Prins and team (C) 2012-2025
+Reading Files ...
+## number of total individuals = 1940
+## number of analyzed individuals = 1410
+## number of covariates = 1
+## number of phenotypes = 1
+## number of total SNPs/var = 12226
+## number of analyzed SNPs = 10768
+Start Eigen-Decomposition...
+pve estimate =0.608801
+se(pve) =0.032774
+================================================== 100%
+real 0m16.772s
+user 0m25.443s
+sys 0m0.901s
+#+end_src sh
+
+The output looks the same. Good. So far the first difference is a much later openblas 0.3.30 (over 0.3.9). In the source code we added checkpoints and more debugging, particularly write statements. I disabled the latter, but still no dice.
+
+When compiled with the profile library prefix the gemma run with
+
+#+begin_src sh
+CPUPROFILE=gemma.prof
+pprof --text build/bin/Debug/gemma gemma.prof
+
+ 1024 50.7% 50.7% 1024 50.7% dcopy_k_ZEN
+ 99 4.9% 55.6% 99 4.9% openblas_read_env
+ 67 3.3% 58.9% 107 5.3% ____strtod_l_internal
+ 67 3.3% 62.3% 67 3.3% gsl_vector_div
+#+end_src sh
+
+this led me to try the newer openblas on the older gemma - and indeed, the regression is coming from the openblas version. Even though it says 'OpenBLAS 0.3.30 DYNAMIC_ARCH NO_AFFINITY Zen MAX_THREADS=128' I suspect the dynamic arch is not really optimizing.
+
+Well, at least I found the problem. Time for a special openblas build like I used to do.
+
 ** GEMMA 0.98.5-pre1
 
 Measurements taken on a recent AMD Ryzen 7 3700X 8-Core Processor @2.195GHz.
diff --git a/test/runner b/test/runner
new file mode 100755
index 0000000..ad5b381
--- /dev/null
+++ b/test/runner
@@ -0,0 +1,24 @@
+#!/bin/sh
+# -*- mode: scheme; -*-
+exec guile --debug -s "$0" "$@"
+!#
+
+(define-module (test-runner)
+  #:use-module (ice-9 match)
+  #:use-module (srfi srfi-64)
+  )
+
+(test-begin "runner")
+
+(test-begin "vec-test")
+(define v (make-vector 5 99))
+;; Require that an expression evaluate to true.
+(test-assert (vector? v))
+;; Test that an expression is eqv? to some other expression.
+(test-eqv 99 (vector-ref v 2))
+(vector-set! v 2 7)
+(test-eqv 7 (vector-ref v 2))
+;; Finish the testsuite, and report results.
+(test-end "vec-test")
+
+(test-end "runner")
```
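The diagnosis recorded in this commit has two steps: profile the run to find the hot function, then run the same gemma binary against a different OpenBLAS to see whether the slowdown follows the library. The POSIX sh sketch below covers both steps. The function names `hot_symbol` and `with_blas` are hypothetical, not part of the repository. `hot_symbol` assumes the gperftools `pprof --text` layout seen in the profile excerpt in this commit (flat samples, flat%, sum%, cum, cum%, function name), and `with_blas` assumes gemma links OpenBLAS dynamically so that `LD_LIBRARY_PATH` can redirect it.

```sh
# Hypothetical helpers (names are illustrative, not from the pangemma repo).
# Source this file, or paste it into a POSIX sh session.

# hot_symbol: read `pprof --text` output on stdin and print the function with
# the largest flat sample count, followed by its flat percentage.
# Assumes the gperftools text layout, where data lines look like:
#   1024 50.7% 50.7% 1024 50.7% dcopy_k_ZEN
# Header lines such as "Total: N samples" do not match and are skipped.
# On ties, the first line seen wins. Prints nothing if no data lines match.
hot_symbol() {
  awk 'BEGIN { best = -1 }
       $1 ~ /^[0-9]+$/ && $2 ~ /%$/ && NF >= 6 {
         if ($1 + 0 > best) {
           best = $1 + 0
           pct = $2
           # Function names may contain spaces (e.g. C++ signatures),
           # so join fields 6..NF back together.
           name = $6
           for (i = 7; i <= NF; i++) name = name " " $i
         }
       }
       END { if (best >= 0) print name, pct }'
}

# with_blas DIR CMD [ARGS...]: run CMD with DIR searched first for shared
# libraries, keeping any existing LD_LIBRARY_PATH entries after it.
# The prefix assignment applies to CMD only; the caller's environment is
# left unchanged. Assumes the binary has no DT_RPATH that would be searched
# before LD_LIBRARY_PATH (DT_RUNPATH is searched after it).
with_blas() {
  dir=$1
  shift
  LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

For example, `pprof --text build/bin/Debug/gemma gemma.prof | hot_symbol` picks out the dominant kernel from a profile like the one above, and running `time with_blas <libdir> ./build/bin/Release/gemma ...` once per OpenBLAS install compares the same binary against each library (`<libdir>` is a placeholder for wherever each OpenBLAS build lives). Prepending rather than replacing `LD_LIBRARY_PATH` keeps the rest of the environment's libraries visible.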
