# Speed Up QC on R/qtl2 Bundles
## Tags
## Description
The default format for the CSV files in an R/qtl2 bundle is:
```
matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.)
```
One or more files in the R/qtl2 bundle could, however,
=> https://kbroman.org/qtl2/assets/vignettes/input_files.html#csv-files be transposed,
which means the system needs to "un-transpose" the file(s) before processing.
Currently, the system does this by reading all the files of a particular type, and then "un-transposing" the combined content as a whole. This makes processing very slow.
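As a rough illustration of what "un-transposing" a single file in isolation could look like, here is a minimal Python sketch. The function names and the "transposed" parameter are hypothetical, not the system's actual API; the sketch assumes the file is rectangular and ignores comment lines and any other settings from the bundle's control file.

```python
import csv
import io


def untranspose(rows):
    """Swap rows and columns so individuals end up as rows again.

    Assumes the file is rectangular: zip() silently drops trailing
    fields of any row longer than the shortest one.
    """
    return [list(column) for column in zip(*rows)]


def read_bundle_csv(text, transposed=False):
    """Parse one CSV file's content, un-transposing it only if flagged.

    `transposed` is a hypothetical flag; in practice it would be derived
    from the bundle's control file.
    """
    rows = list(csv.reader(io.StringIO(text)))
    return untranspose(rows) if transposed else rows


# A transposed genotype file: markers as rows, individuals as columns.
transposed_geno = "id,ind1,ind2\nm1,A,B\nm2,H,A\n"
rows = read_bundle_csv(transposed_geno, transposed=True)
```

Because each call only touches one file's content, nothing here depends on the other files of the same type having been read first.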
This issue proposes running the quality control/assurance processing on each file in isolation, where possible. This will allow parallelisation/multiprocessing of the QC checks.
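A minimal sketch of per-file QC, assuming each file can be checked without reference to the others. The check shown (every row has as many fields as the header) and all the names are hypothetical stand-ins for the real QC checks; the executor is passed in so that the caller decides between processes and threads.

```python
from concurrent.futures import Executor


def check_rectangular(rows):
    """Hypothetical QC check: every row has as many fields as the header.

    Returns a list of error messages; an empty list means the file passed.
    """
    if not rows:
        return ["file is empty"]
    width = len(rows[0])
    return [
        f"row {number}: expected {width} fields, found {len(row)}"
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != width
    ]


def qc_files(files, check, executor: Executor):
    """Run `check` on each file independently and collect errors per file.

    `files` maps a file name to its parsed rows. Executor.map() returns
    results in input order, so zipping them with the names is safe.
    """
    names = list(files)
    results = executor.map(check, (files[name] for name in names))
    return dict(zip(names, results))
```

With a concurrent.futures.ProcessPoolExecutor the checks run in separate processes, which requires the check function and the rows to be picklable. A ThreadPoolExecutor has the same interface and is handy for trying the idea out.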
The main considerations that need to be handled are as follows:
* Do QC on (founder) genotype files (when present) before any of the other files
* Genetic and physical maps (if present) can have QC run on them after the genotype files
* Do QC on phenotype files (when present) after genotype files but before any other files
* Covariate and phenotype covariate files come after the phenotype files
* Cross information files … ?
* Sex information files … ?
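The ordering in the list above could be expressed as a small dependency graph, so that file types within the same stage can be checked in parallel. This is only a sketch: the file-type keys are hypothetical labels, maps and phenotype files are both treated as depending only on the genotype files, and cross and sex information files are left out because their position is still an open question. It uses graphlib from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical file-type labels mapped to the types whose QC must finish first.
QC_DEPENDENCIES = {
    "geno": set(),
    "founder_geno": set(),
    "gmap": {"geno", "founder_geno"},
    "pmap": {"geno", "founder_geno"},
    "pheno": {"geno", "founder_geno"},
    "covar": {"pheno"},
    "phenocovar": {"pheno"},
    # Cross and sex information files: ordering not yet decided.
}


def qc_stages(present):
    """Group the file types present in a bundle into ordered QC stages.

    Dependencies on file types that are absent from the bundle are dropped,
    matching the "when present" wording in the list.
    """
    present = set(present)
    graph = {ftype: QC_DEPENDENCIES[ftype] & present for ftype in present}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = qc_stages({"geno", "gmap", "pheno", "covar"})
```

Each inner list is one stage; everything in a stage can be checked concurrently once the previous stage has finished.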
We should probably detail the types of QC checks done for each type of file.