From 3787440fb80f3d0e0f6834aaf9fda07d6dd9b2e1 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sun, 14 Nov 2021 16:15:19 -0600
Subject: gemma-wrapper transactions

---
 .../gemma/gemma-wrapper-has-incomplete-files.gmi | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 issues/gemma/gemma-wrapper-has-incomplete-files.gmi

diff --git a/issues/gemma/gemma-wrapper-has-incomplete-files.gmi b/issues/gemma/gemma-wrapper-has-incomplete-files.gmi
new file mode 100644
index 0000000..4bea71d
--- /dev/null
+++ b/issues/gemma/gemma-wrapper-has-incomplete-files.gmi
@@ -0,0 +1,22 @@
+# gemma-wrapper has incomplete files
+
+Gemma wrapper caches files - but it can happen a cached file is incomplete and never updated again. The problem appears when GNU parallel is invoked and hits an error. The task here is to make gemma-wrapper transactional.
+
+## Tags
+
+* assigned: pjotrp, zachs
+
+## Tasks
+
+* [ ] parse parallel job log for failed tasks and remove the output files.
+* [ ] create a (global) lock file for gemma-wrapper
+
+## Info
+
+GNU parallel can fail, but does not tell how individual processes did. Need to check if it can return a thread (number). If not we have the option of checking the GEMMA status file and/or see if the output file is complete (by counting number of lines).
+
+Turns out GNU parallel can keep track of jobs in a job log - and even rerun the ones missing. The last we don't need because we are using a cache. But we can use the log file to remove any incomplete output files!
+
+There is another parallel issue (pun intended) where gemma-wrapper is invoked twice for the same job. This is quite possible when people get impatient waiting for a first job to finish.
+
+One solution is to write a lock file using the inputs as a hash. The lock file can contain a PID and we can check if that is still alive. I should do the same for sheepdog locks(!)
-- 
cgit v1.2.3
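
The two tasks in the issue (cleaning up after failed parallel jobs, and a lock file keyed on a hash of the inputs) could be sketched roughly as below. This is a Python illustration only, not gemma-wrapper's actual code. It assumes the job log was written with GNU parallel's --joblog option, whose tab-separated columns are Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal and Command. The function names, the output_for callback that maps a command to its output file, and the lock directory are all hypothetical.

```python
import hashlib
import os
from pathlib import Path


def failed_commands(joblog_path):
    """Return the commands that failed according to a GNU parallel --joblog file."""
    failed = []
    with open(joblog_path) as fh:
        next(fh, None)  # skip the header line
        for line in fh:
            # Split into at most 9 fields so a command containing tabs stays whole.
            fields = line.rstrip("\n").split("\t", 8)
            if len(fields) < 9:
                continue  # ignore malformed or truncated lines
            exitval, signal, command = fields[6], fields[7], fields[8]
            if exitval != "0" or signal != "0":
                failed.append(command)
    return failed


def remove_incomplete(joblog_path, output_for):
    """Delete output files of failed jobs.

    output_for is a hypothetical caller-supplied function that maps a
    command string to the output file that command would have written.
    """
    removed = []
    for command in failed_commands(joblog_path):
        path = Path(output_for(command))
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed


def lock_path(inputs, lock_dir):
    """Name the lock file after a SHA-256 hash of the input strings."""
    digest = hashlib.sha256("\0".join(inputs).encode()).hexdigest()
    return Path(lock_dir) / f"{digest}.lock"


def pid_alive(pid):
    """Check whether a process exists (POSIX only: signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(inputs, lock_dir):
    """Return the lock path if acquired, or None if a live process holds it."""
    path = lock_path(inputs, lock_dir)
    for _ in range(2):  # second attempt happens only after removing a stale lock
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                owner = int(path.read_text().strip())
            except (ValueError, FileNotFoundError):
                owner = None
            if owner is not None and pid_alive(owner):
                return None
            # Stale or unreadable lock. Limitation of this sketch: a lock
            # whose PID has not been written yet also looks stale here.
            path.unlink(missing_ok=True)
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))
        return path
    return None


def release_lock(path):
    """Remove the lock file once the job has finished."""
    Path(path).unlink(missing_ok=True)
```

A wrapper would call acquire_lock before starting GEMMA, give up or wait when it returns None, and call release_lock when done. After the parallel run it would call remove_incomplete so that no partial file stays in the cache. The PID check can be fooled when the operating system reuses a PID, so this is a best-effort guard rather than a strict guarantee.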