# gemma-wrapper has incomplete files

gemma-wrapper caches its output files, but a cached file can end up incomplete and is then never updated again. The problem appears when GNU parallel is invoked and hits an error. The task here is to make gemma-wrapper transactional.

## Tags

* assigned: pjotrp, zachs

## Tasks

* [ ] Parse the GNU parallel job log for failed tasks and remove their output files.
* [ ] Create a (global) lock file for gemma-wrapper.

## Info

GNU parallel can fail, but does not report how the individual processes did. We need to check whether it can return the thread (number) that failed. If not, we have the option of checking the GEMMA status file and/or checking whether the output file is complete (by counting the number of lines).

It turns out GNU parallel can keep track of jobs in a job log (the --joblog option) and can even rerun the ones that are missing. We don't need the rerun feature because we are using a cache, but we can use the log file to remove any incomplete output files!
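
A minimal sketch of the job log cleanup, in Python for illustration only. It relies on the tab-separated job log that GNU parallel writes with --joblog, whose header names the Exitval, Signal and Command columns. The function names and the outputs_for callback, which maps a command line to the files it was supposed to write, are hypothetical and not part of gemma-wrapper.

```python
import os


def failed_commands(joblog_text):
    """Return the Command field of every job that exited non-zero or was killed by a signal."""
    lines = joblog_text.splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    exit_i = header.index("Exitval")
    signal_i = header.index("Signal")
    command_i = header.index("Command")
    failed = []
    for line in lines[1:]:
        if not line.strip():
            continue
        # Command is the last column; limit the split so tabs inside it survive.
        fields = line.split("\t", len(header) - 1)
        if int(fields[exit_i]) != 0 or int(fields[signal_i]) != 0:
            failed.append(fields[command_i])
    return failed


def remove_incomplete_outputs(joblog_text, outputs_for):
    """Delete the output files of failed jobs and return the paths removed.

    outputs_for is a caller-supplied (hypothetical) function: command -> list of paths.
    """
    removed = []
    for command in failed_commands(joblog_text):
        for path in outputs_for(command):
            if os.path.exists(path):
                os.remove(path)
                removed.append(path)
    return removed
```

Only jobs that appear in the log are covered here; anything that never produced a log entry would still need the line-count or status-file check mentioned above.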

There is another parallel issue (pun intended) where gemma-wrapper is invoked twice for the same job. This is quite possible when people get impatient waiting for a first job to finish.

One solution is to write a lock file named after a hash of the inputs. The lock file can contain a PID, so we can check whether that process is still alive. I should do the same for sheepdog locks(!)
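
The lock file idea could look roughly like the Python sketch below. It assumes a POSIX system, where sending signal 0 with os.kill only checks that a process exists. The names lock_path, pid_alive, acquire and release, the choice of SHA-256, and the lock directory are all illustrative assumptions. Removing a stale lock is itself racy if two processes do it at once, so treat this as a starting point rather than a finished design.

```python
import hashlib
import os


def lock_path(inputs, lock_dir):
    """Name the lock after a hash of the input names (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256("\0".join(inputs).encode()).hexdigest()
    return os.path.join(lock_dir, digest + ".lock")


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire(path):
    """Try to take the lock; return True on success, False if it appears to be held."""
    for _ in range(2):  # second pass retries after clearing a stale lock
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(path) as f:
                    holder = int(f.read().strip())
            except FileNotFoundError:
                continue  # the holder released it in the meantime; try again
            except ValueError:
                return False  # empty or garbled: the holder may still be writing its PID
            if pid_alive(holder):
                return False
            try:
                os.remove(path)  # the holder is gone, so the lock is stale
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


def release(path):
    """Remove the lock file; a missing file is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A second invocation with the same inputs computes the same lock path, sees a live PID and can back off; after a crash the PID is dead and the next run takes over the lock.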