author Frederick Muriuki Muriithi 2024-06-24 15:20:55 -0500
committer Frederick Muriuki Muriithi 2024-06-24 15:20:55 -0500
commit a4d802329f5139e6d37fc4a7d4c56b2409e17bfe
tree 06304bc1c71b96deba83425f7ec608f114626da0 /issues/gn-uploader/speed-up-rqtl2-qc.gmi
parent d2178e0179fdc723621b63d5dcaf82050cf869e0
New Issue: gn-uploader: Speed up QC
Diffstat (limited to 'issues/gn-uploader/speed-up-rqtl2-qc.gmi')
-rw-r--r-- issues/gn-uploader/speed-up-rqtl2-qc.gmi 30
1 files changed, 30 insertions, 0 deletions
diff --git a/issues/gn-uploader/speed-up-rqtl2-qc.gmi b/issues/gn-uploader/speed-up-rqtl2-qc.gmi
new file mode 100644
index 0000000..43e6d49
--- /dev/null
+++ b/issues/gn-uploader/speed-up-rqtl2-qc.gmi
@@ -0,0 +1,30 @@
+# Speed Up QC on R/qtl2 Bundles
+
+## Tags
+
+## Description
+
+The default format for the CSV files in an R/qtl2 bundle is:
+
+```
+matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.)
+```
+
+However, any of the CSV files in an R/qtl2 bundle could be transposed, i.e. laid out as (markers/phenotypes/covariates/etc.) × individuals:
+=> https://kbroman.org/qtl2/assets/vignettes/input_files.html#csv-files R/qtl2 documentation: CSV files
+In that case, the system needs to "un-transpose" the file(s) before processing.
+
+Currently, the system does this by reading all the files of a particular type together and then "un-transposing" the combined data. This makes the QC step very slow.
+
+This issue proposes running the quality control/assurance (QC/QA) checks on each file in isolation, where possible; this will allow the QC checks to be parallelised (e.g. with multiprocessing).
+
+The main considerations that need to be handled are as follows:
+
+* Run QC on (founder) genotype files (when present) before any of the other files
+* Run QC on genetic and physical maps (if present) after the genotype files
+* Run QC on phenotype files (when present) after the genotype files but before any other files
+* Run QC on covariate and phenotype covariate files after the phenotype files
+* Cross information files … ?
+* Sex information files … ?
+
+We should probably also document the QC checks performed on each type of file.
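The issue proposes un-transposing each file on its own instead of all files of a type together. The Python sketch below shows what that could look like for a single file: it reads one CSV file, skips comment lines, and swaps rows and columns. It makes three assumptions. The file is already known to be transposed (e.g. from the bundle's control file). The file fits in memory on its own. Comment lines start with "#". The helpers read_rows and untranspose are hypothetical names, not existing gn-uploader functions.

```python
import csv
import io
from typing import List, TextIO


def read_rows(stream: TextIO, comment_char: str = "#") -> List[List[str]]:
    """Read one R/qtl2 CSV file, dropping comment lines and blank rows.

    Assumption: comment lines start with comment_char (hypothetical default "#").
    """
    lines = (line for line in stream if not line.startswith(comment_char))
    return [row for row in csv.reader(lines) if row]


def untranspose(rows: List[List[str]]) -> List[List[str]]:
    """Turn a (markers × individuals) matrix into (individuals × markers).

    Only this one file is held in memory; other files of the same type
    can be handled independently (and in parallel).
    """
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"ragged matrix, row widths: {sorted(widths)}")
    return [list(column) for column in zip(*rows)]


if __name__ == "__main__":
    transposed = io.StringIO(
        "# hypothetical transposed genotype file\n"
        "marker,ind1,ind2\n"
        "m1,A,B\n"
        "m2,B,B\n"
    )
    for row in untranspose(read_rows(transposed)):
        print(",".join(row))
    # marker,m1,m2
    # ind1,A,B
    # ind2,B,B
```

Because each call only touches one file, a ragged row (a likely QC failure) is reported against that file rather than against the combined data of every file of that type.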
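The per-file, ordered QC described in the issue can be sketched as a dependency graph over file types. Python's graphlib.TopologicalSorter hands out every file type whose prerequisites have finished. All files of those types are then checked in parallel on a concurrent.futures executor. The dependency table is only one reading of the bullet list in the issue: maps and phenotypes come after genotypes, and covariates and phenotype covariates come after phenotypes. It borrows R/qtl2 control-file names for the file types. It leaves out cross-information and sex files because their place in the order is still open. run_qc, QC_DEPENDENCIES and the check_file callback are all hypothetical.

```python
from concurrent.futures import Executor
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical dependency table: file type -> file types whose QC must finish
# first. Cross-information and sex files are left out (order still undecided).
QC_DEPENDENCIES: Dict[str, Set[str]] = {
    "geno": set(),
    "founder_geno": set(),
    "gmap": {"geno", "founder_geno"},
    "pmap": {"geno", "founder_geno"},
    "pheno": {"geno", "founder_geno"},
    "covar": {"pheno"},
    "phenocovar": {"pheno"},
}


def run_qc(
    files: Dict[str, List[str]],
    check_file: Callable[[str], List[str]],
    pool: Executor,
) -> Dict[str, List[str]]:
    """QC each file in isolation, respecting the order between file types.

    files maps a file type to the paths of that type found in the bundle;
    check_file checks one path and returns its error messages.
    """
    unknown = set(files) - set(QC_DEPENDENCIES)
    if unknown:
        raise ValueError(f"no QC ordering defined for: {sorted(unknown)}")

    sorter = TopologicalSorter(QC_DEPENDENCIES)
    sorter.prepare()
    errors: Dict[str, List[str]] = {}
    while sorter.is_active():
        ready = sorter.get_ready()  # every type whose prerequisites are done
        paths = [path for ftype in ready for path in files.get(ftype, [])]
        # All files of all ready types are checked concurrently; iterating the
        # results of map() waits for every check, so the stage ends first.
        for path, errs in zip(paths, pool.map(check_file, paths)):
            errors[path] = errs
        sorter.done(*ready)
    return errors
```

A ProcessPoolExecutor suits CPU-bound checks, provided check_file is a module-level (picklable) function. A ThreadPoolExecutor is enough when the checks are mostly I/O. Waiting for a whole stage before starting the next keeps the sketch simple, at the cost that one slow file holds up every file type that depends on it.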