Diffstat (limited to 'issues')
-rw-r--r-- issues/gemma/gemma-wrapper-has-incomplete-files.gmi 22
1 files changed, 22 insertions, 0 deletions
diff --git a/issues/gemma/gemma-wrapper-has-incomplete-files.gmi b/issues/gemma/gemma-wrapper-has-incomplete-files.gmi
new file mode 100644
index 0000000..4bea71d
--- /dev/null
+++ b/issues/gemma/gemma-wrapper-has-incomplete-files.gmi
@@ -0,0 +1,22 @@
+# gemma-wrapper has incomplete files
+
+gemma-wrapper caches its output files, but a cached file can be left incomplete and is then never updated again. The problem appears when GNU parallel is invoked and hits an error. The task here is to make gemma-wrapper transactional.
+
+## Tags
+
+* assigned: pjotrp, zachs
+
+## Tasks
+
+* [ ] parse the GNU parallel job log for failed tasks and remove their output files
+* [ ] create a (global) lock file for gemma-wrapper
+
+## Info
+
+GNU parallel can fail, but does not report how the individual processes fared. We need to check whether it can return which job (number) failed. If not, we have the option of checking the GEMMA status file and/or checking whether the output file is complete (by counting the number of lines).
+
+It turns out GNU parallel can keep track of jobs in a job log (the --joblog option) and can even rerun the ones that are missing. We don't need the rerun feature because we are using a cache, but we can use the log file to remove any incomplete output files!
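A minimal sketch of that cleanup step, in Python for illustration only. It assumes the tab-separated job log that GNU parallel writes with --joblog, whose header includes the Exitval, Signal and Command columns. The function names and the output_for callback, which maps a command line to the output file it would have written, are hypothetical; how gemma-wrapper actually names its cached files is not covered here.

```python
import os


def failed_commands(joblog_path):
    """Return the Command field of every job that exited non-zero or was signalled."""
    failed = []
    with open(joblog_path) as fh:
        # The first line of a joblog is a tab-separated header.
        header = fh.readline().rstrip("\n").split("\t")
        exitval_col = header.index("Exitval")
        signal_col = header.index("Signal")
        command_col = header.index("Command")
        for line in fh:
            # Command is the last column; limit the split so it stays in one piece.
            fields = line.rstrip("\n").split("\t", len(header) - 1)
            if len(fields) < len(header):
                continue  # skip truncated lines
            if fields[exitval_col] != "0" or fields[signal_col] != "0":
                failed.append(fields[command_col])
    return failed


def remove_incomplete_outputs(joblog_path, output_for):
    """Delete the output file of every failed job and return the removed paths.

    output_for is a hypothetical callback: command line -> output path (or None).
    """
    removed = []
    for command in failed_commands(joblog_path):
        path = output_for(command)
        if path and os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Run after parallel returns, this leaves the outputs of successful jobs in the cache and drops the rest, so the next invocation recomputes only what is missing.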
+
+There is another parallel issue (pun intended) where gemma-wrapper is invoked twice for the same job. This is quite possible when people get impatient waiting for a first job to finish.
+
+One solution is to write a lock file whose name is a hash of the inputs. The lock file can contain the PID of the owning process, and we can check whether that process is still alive. I should do the same for sheepdog locks(!)
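The lock file idea could look roughly like the sketch below, in Python for illustration only. It assumes a POSIX system (the liveness check sends signal 0 with os.kill), that the inputs are given as a list of strings such as file names, and that a lock directory already exists. The names lock_path, pid_alive, acquire_lock and release_lock are hypothetical and not part of gemma-wrapper.

```python
import hashlib
import os


def lock_path(inputs, lock_dir):
    """Derive the lock file name from a SHA-256 hash of the input strings."""
    digest = hashlib.sha256("\0".join(inputs).encode()).hexdigest()
    return os.path.join(lock_dir, digest + ".lock")


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(inputs, lock_dir):
    """Return the lock path on success, or None if a live process holds the lock."""
    path = lock_path(inputs, lock_dir)
    for _ in range(2):  # second pass runs only after removing a stale lock
        try:
            # O_EXCL makes creation atomic: only one process can succeed.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(path) as fh:
                    owner = int(fh.read().strip())
            except (FileNotFoundError, ValueError):
                owner = None  # lock vanished or holds no readable PID
            if owner is not None and pid_alive(owner):
                return None
            try:
                os.remove(path)  # stale lock: the owner is gone
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))
        return path
    return None


def release_lock(path):
    """Remove the lock file, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A second invocation with the same inputs gets None back and can exit or wait. Two caveats of this sketch: a lock file that exists but has no PID written yet is treated as stale, and PIDs can be reused by the operating system, so the liveness check is a heuristic rather than a guarantee.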