path: root/INSTALL.md
Diffstat (limited to 'INSTALL.md')
-rw-r--r-- INSTALL.md 45
1 files changed, 44 insertions, 1 deletions
diff --git a/INSTALL.md b/INSTALL.md
index e450a2a..897b062 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -14,7 +14,7 @@ GEMMA runs on Linux and MAC OSX and the runtime has the following
dependencies:
* C++ tool chain >= 4.9
-* GNU Science library (GSL) 1.x (GEMMA does not currently work with GSL >= 2).
+* GNU Scientific Library (GSL) 1.x (note that 2.x is not yet supported)
* blas/openblas
* lapack
* [Eigen3 library](http://eigen.tuxfamily.org/dox/)
@@ -100,3 +100,46 @@ GEMMA includes the shunit2 test framework (version 2.0).
or
./run_tests.sh
+
+## Optimizing performance
+
+### OpenBLAS
+
+Linking against an OpenBLAS built from source is a good first
+optimization step because the build is tuned for the local architecture.
+After a run, the GEMMA output .log file reports how the linked-in
+OpenBLAS was compiled.
+
+To link a new version, compile OpenBLAS as per the
+[instructions](http://www.openblas.net/). You can start with the default:
+
+ make -j 4
+
+or adjust the build switches:
+
+ make USE_THREAD=1 NUM_THREADS=16 NO_AFFINITY=1 -j 4
+
+which produces, for example:
+
+ OpenBLAS build complete. (BLAS CBLAS)
+ OS ... Linux
+ Architecture ... x86_64
+ BINARY ... 64bit
+ C compiler ... GCC (command line : gcc)
+ Library Name ... libopenblas_haswellp-r0.3.0.dev.a (Multi threaded; Max num-threads is 16)
+
+ To install the library, you can run "make PREFIX=/path/to/your/installation install".
+
+
+This generates a static library, which you can link by passing its full
+path to the GEMMA Makefile:
+
+ time env OPENBLAS_NUM_THREADS=4 make EIGEN_INCLUDE_PATH=~/.guix-profile/include/eigen3 LIBS="~/tmp/OpenBLAS/libopenblas_haswellp-r0.3.0.dev.a -lgsl -lgslcblas -pthread -lz -llapack" WITH_OPENBLAS=1 -j 4 fast-check
+
+ make EIGEN_INCLUDE_PATH=~/.guix-profile/include/eigen3 LIBS="~/tmp/OpenBLAS/libopenblas_haswellp-r0.3.0.dev.a -lgsl -lgslcblas -pthread -lz -llapack" WITH_OPENBLAS=1 -j 4 unittests
+
+Batch of 1000:
+
+real 4m24.923s
+user 4m33.576s
+sys 0m11.004s
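
The link-and-time commands added in this hunk can be parameterized so that the same build is repeated with different `OPENBLAS_NUM_THREADS` values. The following is a minimal sketch, not part of the commit: the function names `gemma_make_cmd` and `time_with_threads` are hypothetical, and the functions only print the commands (a dry run) rather than executing them. The make variables and library flags are copied from the commands in the diff.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of GEMMA.
# The make variables (EIGEN_INCLUDE_PATH, LIBS, WITH_OPENBLAS) and the
# library flags are taken from the commands shown in the diff.

# Print the GEMMA make command for a given static OpenBLAS archive.
#   $1: full path to the static libopenblas archive
#   $2: Eigen3 include directory
#   $3: make target (for example fast-check or unittests)
gemma_make_cmd() {
    printf 'make EIGEN_INCLUDE_PATH=%s LIBS="%s -lgsl -lgslcblas -pthread -lz -llapack" WITH_OPENBLAS=1 -j 4 %s\n' \
        "$2" "$1" "$3"
}

# Print one timed fast-check command per requested thread count.
#   $1: archive path, $2: Eigen3 include directory, $3...: thread counts
time_with_threads() {
    lib=$1
    eigen=$2
    shift 2
    for n in "$@"; do
        printf 'time env OPENBLAS_NUM_THREADS=%s %s\n' \
            "$n" "$(gemma_make_cmd "$lib" "$eigen" fast-check)"
    done
}
```

For example, `time_with_threads "$HOME/tmp/OpenBLAS/libopenblas_haswellp-r0.3.0.dev.a" "$HOME/.guix-profile/include/eigen3" 1 4 16` prints three commands that can be reviewed and then run one at a time to compare timings.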