# RQTL Implementation for GeneNetwork Design Proposal
## Tags
* Assigned: alexm
* Keywords: RQTL, GeneNetwork2, Design
* Type: Enhancements
* Status: In Progress
## Description
This document outlines the design proposal for re-implementing the RQTL feature in GeneNetwork. The re-implementation also provides a console view for tracking the external process.
### Problem Definition
The current RQTL implementation faces the following challenges:
- Lack of adequate error handling for the API and scripts.
- Insufficient separation of concerns between GN2 and GN3.
- No way for the user to track the progress of the r-qtl script while it is executing.
- No clearly defined way in which the r-qtl script is executed.
We will address these challenges and add enhancements by:
- Rewriting the R script using r-qtl2 instead of r-qtl.
- Establishing clear separation of concerns between GN2 and GN3, eliminating file path transfers between the two.
- Implementing better error handling for both the API and the RQTL script.
- Running the script as a job in a task queue.
- Piping stdout from the script to the browser through a console for real-time monitoring.
- Improving the overall design and architecture of the system, so that it is cleaner and easier to build on in the future.
## High-Level Design
This is divided into two major components:
* RQTL API
* Monitoring system for the rqtl script
### RQTL API
This component will serve as the entry point for running RQTL in GN3. At this stage, we need to improve the overall architecture and error handling. This process will be divided into the following steps:
- Data Validation
In this step, we must validate that all the data required to run RQTL is provided in the JSON request. This includes the mapping method, genotype file, phenotype file, etc. Please refer to the r-qtl2 documentation for an overview of the requirements:
=> https://rqtl.org/
- Data Preprocessing
During this stage, we will transform the data into a format that R can understand. This includes converting boolean values to the appropriate representations, preparing the RQTL command with all required values, and adding defaults where necessary.
- Data Computation
In this stage, we will pass the RQTL script command to the task queue to run as a job.
- Output Data Processing
In this step, we need to retrieve the results output by the script in a specified format, such as JSON or CSV. This may include outputs like RQTL pair scans and generated diagrams. Please refer to the documentation for an overview:
=> https://rqtl.org/
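
The validation and preprocessing steps above might look roughly like the following sketch. The field names (`geno_file`, `pheno_file`, `method`, `nperm`, `pairscan`), the defaults, the script name `rqtl2_wrapper.R`, and the `--flag value` command layout are all assumptions made for illustration; they are not the actual GN3 request schema.

```python
from typing import Any

# Hypothetical required fields and defaults; the real request schema may differ.
REQUIRED_FIELDS = ("geno_file", "pheno_file", "method")
DEFAULTS = {"nperm": 0, "pairscan": False}


def validate_request(payload: dict[str, Any]) -> list[str]:
    """Return a list of validation errors (empty when the payload is valid)."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if not payload.get(name)]
    nperm = payload.get("nperm", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(nperm, int) or isinstance(nperm, bool) or nperm < 0:
        errors.append("nperm must be a non-negative integer")
    return errors


def to_r_value(value: Any) -> str:
    """Convert a Python value to the string form R expects."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    return str(value)


def build_command(payload: dict[str, Any],
                  script: str = "rqtl2_wrapper.R") -> list[str]:
    """Merge defaults into the payload and build an Rscript argument list."""
    params = {**DEFAULTS, **payload}
    command = ["Rscript", script]
    for key in sorted(params):
        command += [f"--{key}", to_r_value(params[key])]
    return command
```

Building the command as a list of arguments, rather than as one shell string, avoids quoting problems when values contain spaces.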
**Subtasks:**
- [ ] Add the RQTL API endpoint (10%)
- [ ] Input data validation (15%)
- [ ] Input data processing (20%)
- [ ] Passing data to the R script for the computation (40%)
- [ ] Output data processing (80%)
- [ ] Add unit tests for this module (100%)
### Monitoring system for the rqtl script
This component involves creating a monitoring system to track the state of the external process and output relevant information to the user.
We need a way to determine the status of the current job, for example `QUEUED`, `STARTED`, `INPROGRESS`, or `COMPLETED`.
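
One way to model these states is an enumeration plus a table of allowed transitions, as in the sketch below. The four state names come from this proposal; the transition rules and the `advance` helper are assumptions for illustration, and a failure state is left out because the proposal does not name one yet.

```python
from enum import Enum


class JobStatus(Enum):
    QUEUED = "queued"
    STARTED = "started"
    INPROGRESS = "inprogress"
    COMPLETED = "completed"


# Assumed forward-only transitions between states.
TRANSITIONS = {
    JobStatus.QUEUED: {JobStatus.STARTED},
    JobStatus.STARTED: {JobStatus.INPROGRESS, JobStatus.COMPLETED},
    JobStatus.INPROGRESS: {JobStatus.COMPLETED},
    JobStatus.COMPLETED: set(),
}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {new.name}")
    return new
```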
## Deep Dive
### Running the External Script
The RQTL implementation is in R, and we need a strategy for executing this script as an external process. This can be subdivided into several key steps:
- **Task Queue Integration**:
- We will use a task queue system to manage script execution. An implementation already exists in GN3:
=> https://github.com/genenetwork/genenetwork3/blob/0820295202c2fe747c05b93ce0f1c5a604442f69/gn3/commands.py#L101
- **Job Submission**:
- Each API call will create a new job in the task queue, which will handle the execution of the R script.
- **Script Execution**:
- This stage involves executing the R script in a controlled environment, ensuring all necessary dependencies are loaded.
- **Monitoring and Logging**:
- The system will have monitoring tools to track the status of each job. Users will receive real-time updates on job progress and logs for the current task.
- **Result Retrieval**:
- Once the R script completes (either successfully or with an error), results will be returned to the API call.
- **Error Handling**:
- Better error handling will be implemented to manage potential issues during script execution. This includes capturing errors from the R script and providing meaningful feedback to users through the application.
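
As a rough illustration of the execution and monitoring steps above, the sketch below runs an external command and hands each output line to a callback as it arrives. It uses only the standard library `subprocess` module. The `run_and_stream` helper and its `on_line` callback are hypothetical names, and the Python one-liner stands in for the real `Rscript` invocation; in a real job the callback could write to whatever store the console view reads from.

```python
import subprocess
import sys
from typing import Callable


def run_and_stream(command: list[str], on_line: Callable[[str], None]) -> int:
    """Run `command`, pass each output line to `on_line`, return the exit code."""
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so script errors reach the log
        text=True,
        bufsize=1,  # line-buffered reads on our side of the pipe
    ) as process:
        assert process.stdout is not None
        for line in process.stdout:
            on_line(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return process.returncode


if __name__ == "__main__":
    # Stand-in for something like ["Rscript", "<script>", ...].
    demo = [sys.executable, "-c", "print('scanning'); print('done')"]
    exit_code = run_and_stream(demo, print)
    print("exit code:", exit_code)
```

A non-zero return code is the signal to mark the job as failed and surface the captured log lines to the user. Note that the child process may buffer its own output, so lines are not guaranteed to arrive the moment they are printed.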
### Additional Error Handling Considerations
This will involve:
* API error handling
* Error handling within the R script
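
A minimal sketch of how the two kinds of errors could be reported in one consistent shape is shown below. The `RQTLScriptError` class, the error codes, and the choice of HTTP status codes are assumptions for illustration; the sketch is framework-agnostic and returns a plain `(body, status)` pair rather than using any particular web framework API.

```python
class RQTLScriptError(Exception):
    """Hypothetical error raised when the R script exits with a non-zero status."""

    def __init__(self, returncode: int, log_tail: list[str]):
        super().__init__(f"R script exited with status {returncode}")
        self.returncode = returncode
        self.log_tail = log_tail


def error_response(error: Exception) -> tuple[dict, int]:
    """Map an exception to a (JSON-serialisable body, HTTP status) pair."""
    if isinstance(error, ValueError):  # invalid input from the client
        return {"error": "invalid_request", "detail": str(error)}, 400
    if isinstance(error, RQTLScriptError):
        return {
            "error": "rqtl_script_failed",
            "returncode": error.returncode,
            # Only the last lines of the log are returned to the client.
            "log_tail": error.log_tail[-20:],
        }, 500
    return {"error": "internal_error", "detail": "unexpected failure"}, 500
```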
## Additional UI Considerations
We need to rethink where to output the external process logs in the UI. Currently, we can add flags to the URL to enable this functionality, e.g., `URL/page&flags&console=1`.
The design suggestion is also to display the output in a terminal emulator, for example xterm.js:
=> https://xtermjs.org/
An implementation already exists for GN3, see:
=> https://github.com/genenetwork/genenetwork2/blob/abe324888fc3942d4b3469ec8d1ce2c7dcbd8a93/gn2/wqflask/templates/wgcna_setup.html#L89
### Current Design Suggestions:
#### With HTMX, offer a split screen
This will include an output page and a monitoring system page.
#### Popup button for preview
A button that allows users to preview and hide the console output.
## Long-Term Goals
We aim to run computations on clusters rather than locally. This project will serve as a pioneer for that approach.
## Related Issues
=> https://issues.genenetwork.org/topics/lmms/rqtl2/using-rqtl2
### Tasks
* stage 1 *
- [ ] Implement the RQTL API endpoints
- [ ] Validation and preprocessing of data from the client
- [ ] Implement error handling for both the API and the R script
- [ ] Add unit tests for the rqtl api module
- [ ] Make improvements to the current R script if possible
* stage 2 *
- [ ] Task queue integration (refer to the Deep Dive section)
- [ ] Implement a monitoring and logging system for job execution (refer to the Deep Dive section)
- [ ] Fetch results from running jobs
- [ ] Processing output from the external script
* stage 3 *
- [ ] Implement a console preview UI for user feedback
- [ ] Refactor the GN2 UI
* stage 4 *
- [ ] Run this computation on clusters