# Investigate and Fix `rm` Command in `rqtl` Logs
## Tags
* assigned: alex, bonfacem
* type: Bug
* status: in progress
* keywords: external, qtl, rqtl, bug, logs
## Description
For QTL analysis, we invoke the `rqtl` script as an external process through Python's `subprocess` module.
For reference, see the `rqtl_wrapper.R` script:
=> https://github.com/genenetwork/genenetwork3/blob/main/scripts/rqtl_wrapper.R
Analyzing the `rqtl` logs shows that an `rm` command is unexpectedly invoked:
```
sh: line 1: rm: command not found
```
We have not been able to trace this command to its origin, and it does not appear to be part of the expected behavior.
So far the issue has been observed only in the CD environment. The only way I have been able to reproduce the error locally is by running the command through a shell with an injected string. That is not how GeneNetwork3 works: there, all strings are parsed and passed to `subprocess` as a list of arguments.
Here is an example of that local attempt:
```python
import subprocess


def run_process(cmd, output_file, run_id):
    """Function to execute an external process and capture the stdout in a file.

    Args:
        cmd: The command to execute, provided as a list of arguments.
        output_file: Absolute file path to write the stdout.
        run_id: Unique ID to identify the process.

    Returns:
        A dictionary with the results, indicating success or failure.
    """
    cmd.append(" && rm")  # Injecting potentially problematic command
    cmd = " ".join(cmd)  # The command is passed as a string
    try:
        # Phase: Execute the command in a shell environment
        with subprocess.Popen(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        ) as process:
            # Process output handling goes here
```
If `rm` does not exist inside the container, the error generated at the end of the `rqtl` run is:
```
sh: line 1: rm: command not found
```
The actual code for GeneNetwork3 is:
```python
import subprocess


def run_process(cmd, output_file, run_id):
    """Function to execute an external process and capture the stdout in a file.

    Args:
        cmd: The command to execute, provided as a list of arguments.
        output_file: Absolute file path to write the stdout.
        run_id: Unique ID to identify the process.

    Returns:
        A dictionary with the results, indicating success or failure.
    """
    try:
        # Phase: Execute the command directly (list argument, no shell)
        with subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        ) as process:
            # Process output handling goes here
```
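To illustrate why the list form rules out string injection, here is a minimal sketch. It does not run `rqtl`; `ECHO_ARGS` and `run_without_shell` are hypothetical names, and the child process is a stand-in that only prints the arguments it receives.

```python
import subprocess
import sys

# Hypothetical stand-in for the rqtl command: a child process that only
# prints the arguments it received.
ECHO_ARGS = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]


def run_without_shell(extra_args):
    """Run the stand-in command with a list argument and no shell."""
    result = subprocess.run(
        ECHO_ARGS + extra_args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "&&" and "rm" reach the child as plain strings; no shell interprets them.
    print(run_without_shell(["input.csv", "&&", "rm"]))
```

Because no shell is involved, `&&` and `rm` arrive in the child as ordinary arguments and nothing is executed on their behalf. This is consistent with the observation that the `rm` call must come from somewhere other than the argument list.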
## Investigated and Excluded Possibilities
* [x] The `rm` command is not explicitly invoked within the `rqtl` script.
* [x] The `rqtl` command is passed as a list of parsed arguments (i.e., no direct string injection).
* [x] The subprocess is not invoked within a shell environment, which is what would make string injection possible.
* [x] We simulated invoking a system command within the `rqtl` script, but the error does not match the observed issue.
## TODO
* [ ] Test in an environment similar to the CD environment to replicate the issue.
* [ ] Investigate the internals of the QTL library for any unintended `rm` invocation.
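As a starting point for the first TODO item, the sketch below approximates a container without `rm` by running a shell command with `PATH` pointing at an empty directory. This is an assumption about what differs in the CD environment, not a confirmed cause, and `run_with_empty_path` is a hypothetical helper. The exact wording of the error message depends on which shell `/bin/sh` is.

```python
import shutil
import subprocess
import tempfile


def run_with_empty_path(shell_command):
    """Run a shell command with PATH pointing at an empty directory."""
    with tempfile.TemporaryDirectory() as empty_dir:
        result = subprocess.run(
            shell_command,
            shell=True,  # deliberate: we want a shell to look up `rm`
            env={"PATH": empty_dir},
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=False,
        )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    # Shows whether `rm` is available in the current environment at all.
    print("rm on current PATH:", shutil.which("rm"))
    code, output = run_with_empty_path("rm")
    # POSIX shells report exit status 127 when a command is not found.
    print(code, output)
```

Running the same check inside the CD container (for example `shutil.which("rm")`) would show whether `rm` is missing there, which would explain the message but not which component issues the call.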