# Background Jobs

We run background jobs for long-running processes, e.g. quality-assurance checks across multiple huge files, inserting large amounts of data into databases, etc. The system needs to keep track of the progress of these jobs and report their state to the user whenever the user requests it.

This document details some thoughts on how to handle these jobs, especially in failure conditions.

We currently use Redis[^redis] to keep track of the state of the background processes. Every background job that is started will have a Redis[^redis] key with the prefix `gn-uploader:jobs`.

## Users

Currently (2024-10-23T13:29UTC-05:00), we do not track the user that started the job. Moving forward, we will track this information.

We could have the keys be something like `gn-uploader:jobs::`.

Another option is to track any particular user's jobs with a key of the form `gn-uploader:users::jobs` and, in that case, have the job keys take the form `gn-uploader:jobs:`. I (@fredmanglis) favour this option over having the user's ID in the job keys directly, since it provides a way to interact with **ALL** the jobs without indirecting through each specific user. This is a useful ability to have, especially for system administration tasks.

## Multiprocessing Within Jobs

Some jobs, e.g. quality-assurance jobs, can run multiple threads/processes themselves. This brings up a problem because Redis[^redis] processes commands one at a time, so writes to a single key from multiple threads/processes are serialized rather than run in parallel. We also do not want to create bottlenecks by writing to the same key from multiple threads/processes.

The design I have currently come up with, which might work, is as follows:

- Just before multiple threads/processes are started, a list of new keys is built, each of which will collect the output from a single thread/process.
- These keys are recorded in the parent's Redis key data.
- The threads/processes are started and do whatever they need, pushing their outputs to the appropriate keys within Redis.
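
The three steps listed above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the uploader's actual code: the child key naming (`<job key>:children:<n>`), the `child-keys` field, storing the parent's data as a Redis hash, and the function names are all hypothetical. `InMemoryRedis` is a small stand-in so the example runs without a server. It provides only the four calls used here (`hset`, `hget`, `rpush`, `lrange`), which share their names with methods on redis-py's `redis.Redis` client.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor

JOBS_PREFIX = "gn-uploader:jobs"


class InMemoryRedis:
    """Stand-in for a Redis client (assumption: used only to run the sketch)."""

    def __init__(self):
        self._hashes, self._lists = {}, {}
        self._lock = threading.Lock()

    def hset(self, name, key, value):
        with self._lock:
            self._hashes.setdefault(name, {})[key] = value

    def hget(self, name, key):
        with self._lock:
            return self._hashes.get(name, {}).get(key)

    def rpush(self, name, *values):
        with self._lock:
            self._lists.setdefault(name, []).extend(values)

    def lrange(self, name, start, end):
        # Simplified: only non-negative indices and end == -1 are handled.
        with self._lock:
            items = self._lists.get(name, [])
            stop = len(items) if end == -1 else end + 1
            return items[start:stop]


def child_keys(job_key, count):
    """Step 1: build one output key per child (hypothetical naming)."""
    return [f"{job_key}:children:{index}" for index in range(count)]


def run_children(rconn, job_key, work_items, worker):
    """Record the child keys on the parent, then run one thread per item."""
    keys = child_keys(job_key, len(work_items))
    # Step 2: the parent's data records which keys its children write to.
    rconn.hset(job_key, "child-keys", json.dumps(keys))

    def _run(key, item):
        # Step 3: each thread pushes only to its own key; no key is shared.
        for line in worker(item):
            rconn.rpush(key, line)

    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        futures = [pool.submit(_run, key, item)
                   for key, item in zip(keys, work_items)]
        for future in futures:
            future.result()  # re-raise any exception from a child
    return keys


def collect_outputs(rconn, job_key):
    """Read the child keys from the parent and gather each child's output."""
    keys = json.loads(rconn.hget(job_key, "child-keys"))
    return {key: rconn.lrange(key, 0, -1) for key in keys}
```

Because the parent records its children's keys before any child starts, a status request can find every child's output even if some children fail partway through.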

The new keys for the child threads/processes could build on the theme

## Fetching Jobs Status

Different jobs could have different requirements for handling/processing their outputs, and those of any children they might spawn.

The system will need to provide a way to pass in the correct function/code to process the outputs at the point where the job status is requested.

This implies that we need to track the type of each job in order to be able to select the correct code for processing its output.

## Links

- [^redis]: https://redis.io/