author | Frederick Muriuki Muriithi | 2024-06-24 15:20:55 -0500 |
---|---|---|
committer | Frederick Muriuki Muriithi | 2024-06-24 15:20:55 -0500 |
commit | a4d802329f5139e6d37fc4a7d4c56b2409e17bfe (patch) | |
tree | 06304bc1c71b96deba83425f7ec608f114626da0 /issues/gn-uploader/speed-up-rqtl2-qc.gmi | |
parent | d2178e0179fdc723621b63d5dcaf82050cf869e0 (diff) | |
download | gn-gemtext-a4d802329f5139e6d37fc4a7d4c56b2409e17bfe.tar.gz |
New Issue: gn-uploader: Speed up QC
Diffstat (limited to 'issues/gn-uploader/speed-up-rqtl2-qc.gmi')
-rw-r--r-- | issues/gn-uploader/speed-up-rqtl2-qc.gmi | 30 |
1 files changed, 30 insertions, 0 deletions
diff --git a/issues/gn-uploader/speed-up-rqtl2-qc.gmi b/issues/gn-uploader/speed-up-rqtl2-qc.gmi
new file mode 100644
index 0000000..43e6d49
--- /dev/null
+++ b/issues/gn-uploader/speed-up-rqtl2-qc.gmi
@@ -0,0 +1,30 @@
+# Speed Up QC on R/qtl2 Bundles
+
+## Tags
+
+## Description
+
+The default format for the CSV files in an R/qtl2 bundle is:
+
+```
+matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.)
+```
+
+One or more files in the R/qtl2 bundle could, however,
+=> https://kbroman.org/qtl2/assets/vignettes/input_files.html#csv-files be transposed,
+which means the system needs to "un-transpose" the file(s) before processing.
+
+Currently, the system does this by reading all the files of a particular type, and then "un-transposing" the entire thing. This makes the QC step very slow.
+
+This issue proposes to do the quality control/assurance processing on each file in isolation, where possible. This will allow parallelisation/multiprocessing of the QC checks.
+
+The main considerations that need to be handled are as follows:
+
+* Do QC on (founder) genotype files (when present) before any of the other files
+* Genetic and physical maps (if present) can have QC run on them after the genotype files
+* Do QC on phenotype files (when present) after genotype files but before any other files
+* Covariate and phenotype covariate files come after the phenotype files
+* Cross information files … ?
+* Sex information files … ?
+
+We should probably detail the type of QC checks done for each type of file.
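
The per-file approach proposed in the issue could look roughly like the sketch below. It is not the gn-uploader implementation: the function names, the `bundle` layout and the "empty cell" check are hypothetical, and the file-type keys are assumed to mirror the R/qtl2 control-file keys. The stage order is one reading of the listed considerations; cross and sex information files are left out because the issue leaves their position open.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

# Assumed stage order, following the considerations listed in the issue.
# Files within a stage are checked in parallel; stages run one after another.
QC_STAGES = (
    ("founder_geno", "geno"),
    ("gmap", "pmap"),
    ("pheno",),
    ("covar", "phenocovar"),
)


def read_rows(text, transposed=False):
    """Parse ONE CSV file and un-transpose only that file if needed."""
    rows = list(csv.reader(io.StringIO(text)))
    if transposed:
        # zip_longest pads short rows with "" rather than dropping cells.
        rows = [list(column) for column in zip_longest(*rows, fillvalue="")]
    return rows


def check_file(filetype, text, transposed=False):
    """Toy QC check (hypothetical): report empty cells in a single file."""
    rows = read_rows(text, transposed)
    if not rows:
        return [f"{filetype}: file is empty"]
    header = rows[0]
    errors = []
    for row in rows[1:]:
        if not row:  # skip blank lines
            continue
        for column, cell in zip(header[1:], row[1:]):
            if cell.strip() == "":
                errors.append(
                    f"{filetype}: empty value for {row[0]!r} in column {column!r}")
    return errors


def run_qc(bundle, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Run the checks stage by stage.

    `bundle` maps a file type to a list of (csv_text, transposed) pairs.
    """
    errors = []
    with executor_cls(max_workers=max_workers) as pool:
        for stage in QC_STAGES:
            jobs = [
                pool.submit(check_file, filetype, text, transposed)
                for filetype in stage
                for text, transposed in bundle.get(filetype, [])
            ]
            # Wait for the whole stage before starting the next one.
            for job in jobs:
                errors.extend(job.result())
    return errors
```

Each file is parsed and un-transposed on its own, so no step has to hold every file of a type in memory at once. A thread pool keeps the sketch simple; for CPU-bound checks, `concurrent.futures.ProcessPoolExecutor` can be passed as `executor_cls`, since it offers the same `submit` interface.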