author Pjotr Prins 2024-08-16 12:26:42 +0200
committer Pjotr Prins 2024-08-18 13:49:40 +0200
commit a3e2e81c5775a52b09c55e2693e2528d504b9953
tree 396edf998762129b75788021eaf2f212b029ba82 /topics/data
parent 1a2c0e52ff88c1b12aa1b8984b3aa6c220dc7bf2
Move file
Diffstat (limited to 'topics/data')
-rw-r--r-- topics/data/R-qtl2-format-notes.gmi 43
1 files changed, 43 insertions, 0 deletions
diff --git a/topics/data/R-qtl2-format-notes.gmi b/topics/data/R-qtl2-format-notes.gmi
new file mode 100644
index 0000000..e0109b1
--- /dev/null
+++ b/topics/data/R-qtl2-format-notes.gmi
@@ -0,0 +1,43 @@
+# R/qtl2 Format Notes
+
+This document is meant to help other non-biologists find their way around the format(s) of the R/qtl2 files. It mostly deals with the meaning/significance of the various fields.
+
+From the R/qtl2 format documentation:
+
+> The comma-delimited (CSV) files are each in the form of a simple matrix, with the first column being a set of IDs and the first row being a set of variable names.
+
+and
+
+> All of these CSV files may be transposed relative to the form described below.
+
+We are going to consider the "non-transposed" form here, for ease of documentation: simply flip the meanings as appropriate for the transposed files.
+
+## geno files
+
+> The genotype data file is a matrix of individuals × markers. The first column is the individual IDs; the first row is the marker names.
+
+For GeneNetwork, this means that the first column contains the Sample names (previously "strain names"). The first row would be a list of markers.
+
+## gmap and pmap files
+
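+The first column of the gmap/pmap file contains the marker names; the remaining columns give each marker's chromosome and its position (genetic for gmap, physical for pmap). There are no individuals/samples (or strains) here.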
+
+## pheno files
+
+The first column is the list of individuals (samples/strains) whereas the first row is the list of phenotypes.
+
+## phenocovar files
+
+These seem to contain extra metadata for the phenotypes.
+
+The first column is the list of phenotype identifiers whereas the first row is a list of metadata headers (phenotype covariates).
+
+As an example,
+=> https://github.com/rqtl/qtl2data/blob/main/BXD/bxd_phenocovar.csv The phenocovar file for BXD mice
+
+We see here that this contains the phenotype identifier (id) and a description for each phenotype.
+
+# References
+
+=> https://kbroman.org/qtl2/assets/vignettes/input_files.html
+=> https://github.com/rqtl/qtl2data
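
The layout described in the notes above (IDs down the first column, variable names along the first row, with the whole matrix possibly transposed) can be sketched in a few lines of Python. This is only an illustration and is not part of the commit: the sample data, the genotype codes and the `read_matrix` helper are all hypothetical, and real R/qtl2 files carry further options that are ignored here.

```python
import csv
import io

# Hypothetical geno-style data in the "non-transposed" form:
# sample names in the first column, marker names in the first row.
GENO_CSV = """id,marker1,marker2,marker3
sampleA,B,D,B
sampleB,D,D,B
"""

# The same (made-up) data in the transposed form:
# marker names in the first column, sample names in the first row.
GENO_T_CSV = """id,sampleA,sampleB
marker1,B,D
marker2,D,D
marker3,B,B
"""


def read_matrix(text, transposed=False):
    """Return (row_ids, column_names, cells) in the non-transposed sense.

    If `transposed` is true, the matrix is flipped first, so callers
    always see IDs as rows and variable names as columns.
    """
    # Parse the CSV text, dropping any completely empty lines.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if transposed:
        # Swap rows and columns to recover the non-transposed layout.
        rows = [list(column) for column in zip(*rows)]
    header, body = rows[0], rows[1:]
    column_names = header[1:]            # first row: variable names
    row_ids = [row[0] for row in body]   # first column: IDs
    cells = {row[0]: dict(zip(column_names, row[1:])) for row in body}
    return row_ids, column_names, cells


samples, markers, genotypes = read_matrix(GENO_CSV)
print(samples)                          # sample names (first column)
print(markers)                          # marker names (first row)
print(genotypes["sampleB"]["marker1"])  # one genotype call
```

Calling `read_matrix(GENO_T_CSV, transposed=True)` yields the same three values, which is the "flip the meanings" rule from the notes in executable form. The same helper applies to pheno files (individuals by phenotypes) and phenocovar files (phenotypes by covariates), since only the meaning of the rows and columns changes.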