# Quality Control of Data in Uploaded R/qtl2 Bundles

## Tags

* assigned: fredm, acenteno
* status: open
* type: feature request
* priority: medium
* keywords: quality control, QC, R/qtl2 bundle

## Description

As of 2024-02-02T05:41+03:00, the code accepts uploaded data while doing only the bare minimum of quality control. This document details the quality control (QC) checks that must be run against uploaded data to ensure that the data we hold is acceptable.

The following key explains the status markers used in this file:

* [ ]: not started
* [-]: partially done or in progress
* [x]: completed

### [-] Control File

* [x] MUST exist in bundle
* [x] One and only one control file in the bundle
* [-] Defaults for control data are auto-provided by code
* [ ] Every file listed in control file MUST exist in the bundle
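
The outstanding "every listed file MUST exist" check could be sketched roughly as below. This is a minimal sketch, not the uploader's actual code: it assumes a JSON control file at the top level of a zip bundle, and the names FILE_KEYS, listed_files, missing_files and check_bundle are hypothetical. It ignores the "sex" and "cross_info" entries, which can point to files through a nested mapping, and it does not handle YAML control files.

```python
import io
import json
import zipfile

# Assumption: a subset of the control-file keys that name data files.
FILE_KEYS = ("geno", "pheno", "phenocovar", "covar", "gmap", "pmap", "founder_geno")


def listed_files(control):
    """Collect every file name the control data refers to under FILE_KEYS."""
    files = []
    for key in FILE_KEYS:
        value = control.get(key)
        if value is None:
            continue
        # A key may name a single file or a list of files.
        files.extend([value] if isinstance(value, str) else value)
    return files


def missing_files(control, bundle_names):
    """Return listed files that are absent from the bundle, in listed order."""
    present = set(bundle_names)
    return [name for name in listed_files(control) if name not in present]


def check_bundle(zip_bytes, control_name):
    """Read the named JSON control file from a zip bundle and report missing files."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bundle:
        control = json.loads(bundle.read(control_name))
        return missing_files(control, bundle.namelist())
```

An empty list from check_bundle means every file named under the assumed keys is present in the bundle.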

### [ ] geno File(s)

* [ ] Every value in the file is one of the genotype encodings declared in the control file
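
One possible shape for this check, as a hedged sketch: it assumes the encodings come from the control file's "genotypes" mapping, that missing values are given by its "na.strings" entry, and that the first row and first column of the geno file hold labels rather than genotypes. The function name invalid_genotypes is hypothetical, and comment lines in the file are not handled.

```python
import csv
import io


def invalid_genotypes(geno_csv_text, encodings, na_strings=("NA",), sep=","):
    """Yield (row_id, column_name, value) for each value that is not a known code."""
    allowed = set(encodings) | set(na_strings)
    reader = csv.reader(io.StringIO(geno_csv_text), delimiter=sep)
    header = next(reader)
    for row in reader:
        # The first column is assumed to hold the row label, not a genotype.
        for column, value in zip(header[1:], row[1:]):
            if value not in allowed:
                yield (row[0], column, value)
```

Because only the first row and first column are treated as labels, the same loop works whether or not the file is transposed. Reporting the row and column of each offending value lets the uploader point the user at the exact cell.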

### [ ] phenocovar File(s)

* [ ] At least one of the phenocovar files contains a "description" column
* [ ] The description of every phenotype fits the rules[1]
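
The first of these checks might look something like the sketch below. The function names are hypothetical, and matching the column name case-insensitively is an assumption rather than a stated requirement. The description rules in [1] are deliberately not encoded here.

```python
import csv
import io


def has_description_column(phenocovar_csv_text, sep=","):
    """Return True if the header row contains a "description" column."""
    reader = csv.reader(io.StringIO(phenocovar_csv_text), delimiter=sep)
    header = next(reader, [])
    # Assumption: surrounding whitespace and letter case are not significant.
    return "description" in (name.strip().lower() for name in header)


def any_has_description(phenocovar_texts):
    """Return True if at least one phenocovar file has a "description" column."""
    return any(has_description_column(text) for text in phenocovar_texts)
```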

### [ ] pheno File(s)

* [ ] Check for a minimum number of decimal places (three?)
* [ ] Verify that all listed samples/cases exist in the database, prior to attempting to parse the file and load data into the database
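
Both pheno checks could be sketched as follows, under several assumptions: samples are in rows with the identifier in the first column, missing values are written as "NA", values are plain decimals (scientific notation would be reported as having too few places), and the database lookup is replaced by a set of known sample names passed in by the caller. The names decimal_places and check_pheno are hypothetical, and the threshold of three is taken from the open question above.

```python
import csv
import io


def decimal_places(text):
    """Count the digits after the decimal point in a plain decimal string."""
    _, _, fraction = text.strip().partition(".")
    return len(fraction)


def check_pheno(pheno_csv_text, known_samples, min_places=3, na_strings=("NA",)):
    """Return (unknown_samples, low_precision_values) for one pheno file."""
    reader = csv.reader(io.StringIO(pheno_csv_text))
    header = next(reader)
    rows = [row for row in reader if row]
    # Samples are checked first so a caller can stop before loading any values.
    unknown = [row[0] for row in rows if row[0] not in known_samples]
    low_precision = [
        (row[0], column, value)
        for row in rows
        for column, value in zip(header[1:], row[1:])
        if value not in na_strings and decimal_places(value) < min_places
    ]
    return unknown, low_precision
```

Values are compared as strings because parsing to float would discard trailing zeros, which are part of the stated precision.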

### [ ] phenose File(s)

This is a proposed addition for our specific use-case. If the data in the pheno file(s) was derived from averaging values, then the user could provide the corresponding "standard error" file(s).

These files are subject to QC checks similar to those for the "pheno" file(s) above. The required number of decimal places might differ, however.

## Questions Fred Has

* Is there a way to detect whether data has been log2 normalised? GN requires that all data is log2 normalised.

## Resources

* [1]: Description rules: https://info.genenetwork.org/faq.php#q-22