-rw-r--r-- | issues/gemma/gemma-wrapper-has-incomplete-files.gmi | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/issues/gemma/gemma-wrapper-has-incomplete-files.gmi b/issues/gemma/gemma-wrapper-has-incomplete-files.gmi
new file mode 100644
index 0000000..4bea71d
--- /dev/null
+++ b/issues/gemma/gemma-wrapper-has-incomplete-files.gmi
@@ -0,0 +1,22 @@
+# gemma-wrapper has incomplete files
+
+Gemma wrapper caches files - but it can happen that a cached file is incomplete and never updated again. The problem appears when GNU parallel is invoked and hits an error. The task here is to make gemma-wrapper transactional.
+
+## Tags
+
+* assigned: pjotrp, zachs
+
+## Tasks
+
+* [ ] parse the parallel job log for failed tasks and remove their output files.
+* [ ] create a (global) lock file for gemma-wrapper
+
+## Info
+
+GNU parallel can fail, but does not report how individual processes did. Need to check if it can return a thread (number). If not, we have the option of checking the GEMMA status file and/or seeing if the output file is complete (by counting the number of lines).
+
+Turns out GNU parallel can keep track of jobs in a job log - and even rerun the ones missing. We don't need the rerun feature because we are using a cache. But we can use the log file to remove any incomplete output files!
+
+There is another parallel issue (pun intended) where gemma-wrapper is invoked twice for the same job. This is quite possible when people get impatient waiting for a first job to finish.
+
+One solution is to write a lock file named after a hash of the inputs. The lock file can contain a PID and we can check if that process is still alive. I should do the same for sheepdog locks(!)
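
The two tasks above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not gemma-wrapper's actual code: it assumes the job log was written with `parallel --joblog` (tab-separated, with a header row containing `Exitval`, `Signal` and `Command`), and that the lock runs on a POSIX system. All function names, the `outputs_for` callback (how a command maps to its output files is not described in the issue), the lock file naming and the choice of SHA-1 are hypothetical.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable, Iterable, List


def failed_commands(joblog_path: str) -> List[str]:
    """Return the commands whose Exitval or Signal column is non-zero."""
    failed = []
    with open(joblog_path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        exit_i = header.index("Exitval")
        sig_i = header.index("Signal")
        cmd_i = header.index("Command")
        for line in fh:
            # Command is the last column; limit the split so tabs inside it survive.
            fields = line.rstrip("\n").split("\t", len(header) - 1)
            if len(fields) < len(header):
                continue  # skip truncated lines
            if fields[exit_i] != "0" or fields[sig_i] != "0":
                failed.append(fields[cmd_i])
    return failed


def remove_incomplete_outputs(
    joblog_path: str, outputs_for: Callable[[str], Iterable[str]]
) -> List[str]:
    """Delete the output files of failed jobs; outputs_for is a hypothetical
    callback mapping a command line to the files it writes."""
    removed = []
    for command in failed_commands(joblog_path):
        for path in outputs_for(command):
            try:
                os.remove(path)
                removed.append(path)
            except FileNotFoundError:
                pass  # nothing was written for this job
    return removed


def lock_path_for(inputs: Iterable[str], lock_dir: str) -> Path:
    """Name the lock after a hash of the (sorted) inputs. Naming is an assumption."""
    digest = hashlib.sha1("\n".join(sorted(inputs)).encode()).hexdigest()
    return Path(lock_dir) / f"gemma-wrapper-{digest}.lock"


def pid_alive(pid: int) -> bool:
    """POSIX only: signal 0 checks for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(path: Path) -> bool:
    """Create the lock atomically; take over a lock whose PID is dead."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            owner = int(path.read_text().strip())
        except (ValueError, FileNotFoundError):
            return False  # lock is mid-write or just vanished; caller may retry
        if pid_alive(owner):
            return False
        path.unlink(missing_ok=True)  # stale lock; removal itself is not race-free
        return acquire_lock(path)
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True
```

A caller would take the lock with `acquire_lock(lock_path_for(inputs, lock_dir))` before starting GNU parallel, call `remove_incomplete_outputs` once it returns, and delete the lock file at the end. Two caveats: a job that was killed before parallel logged it will not appear in the job log, so the line-count check mentioned above may still be needed, and the stale-lock takeover has a small race window if two processes detect the same dead PID at once.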