{%extends "base.html"%}

{%block title%}Data Review{%endblock%}

{%block contents%}
<h1 class="heading">data review</h1>

<div class="row">
  <h2 id="data-concerns">Data Concerns</h2>
  <p>The following are some of the requirements that the data in your file
    <strong>MUST</strong> fulfil before it is considered valid for this system:
  </p>

  <ol>
    <li>File headings
      <ul>
	<li>The first row in the file should contain the headings. The number of
	  headings in this first row determines the number of columns expected for
	  all other lines in the file.</li>
	<li>Each heading MUST appear in the first row
	  <strong>ONE AND ONLY ONE</strong> time.</li>
	<li>The sample/case (previously 'strain') headings in your first row will be
          checked against those in the <a href="https://genenetwork.org"
                                  title="Link to the GeneNetwork service">
            GeneNetwork</a> database.<br />
          <small class="text-muted">
            If you encounter an error saying your sample(s)/case(s) do not exist
            in the GeneNetwork database, then you will have to use the
            <a href="{{url_for('samples.select_species')}}"
               title="Upload samples/cases feature">Upload Samples/Cases</a>
            option on this system to upload them.
          </small>
	</li>
      </ul>
    </li>

    <li>Data
      <ol>
	<li><strong>NONE</strong> of the data cells/fields is allowed to be empty.
	  All fields/cells <strong>MUST</strong> contain a value.</li>
	<li>The first column of each data row will be treated as a textual field,
	  holding the "identifier" for that row.</li>
	<li>Except for the first column/field of each data row,
	  <strong>NONE</strong> of the data columns/cells/fields should contain
	  spurious characters like <code>eeeee</code>,
	  <code>5.555iloveguix</code>, etc.<br />
	  All of them should be decimal values.</li>
	<li>Decimal numbers must conform to the following criteria:
	  <ul>
	    <li>When checking an average file, decimal numbers must have exactly
	      three decimal places to the right of the decimal point.</li>
	    <li>When checking a standard error file, decimal numbers must have six
	      or more decimal places to the right of the decimal point.</li>
	    <li>There must be at least one digit to the left of the decimal point
	      (e.g. 0.55555 is allowed, but .55555 is not).</li>
	  </ul>
	</li>
      </ol>
    </li>
  </ol>
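  <p>As an illustration only, the requirements above could be sketched in
    Python as follows. This is not this system's own validation code: the
    function name <code>check_rows</code> and its <code>filetype</code>
    parameter (<code>"average"</code> or <code>"standard-error"</code>) are
    hypothetical, the rows are assumed to be already split into lists of
    strings, and accepting an optional leading minus sign is an assumption,
    since the rules above do not mention signs.</p>

```python
import re

# Hypothetical patterns illustrating the decimal rules above (assumption:
# an optional leading minus sign is accepted; the rules do not mention signs).
# Average files: a digit before the point and exactly three decimal places.
AVERAGE_PATTERN = re.compile(r"-?[0-9]+\.[0-9]{3}")
# Standard error files: a digit before the point and six or more decimal places.
STDERR_PATTERN = re.compile(r"-?[0-9]+\.[0-9]{6,}")


def check_rows(rows, filetype="average"):
    """Return a list of error messages for `rows` (a list of lists of strings).

    Hypothetical helper; `filetype` is "average" or "standard-error".
    """
    pattern = AVERAGE_PATTERN if filetype == "average" else STDERR_PATTERN
    errors = []
    headings = rows[0]

    # Each heading must appear exactly once in the first row.
    seen = set()
    for heading in headings:
        if heading in seen:
            errors.append(f"duplicate heading: {heading!r}")
        seen.add(heading)

    for line_no, row in enumerate(rows[1:], start=2):
        # The number of headings fixes the number of columns on every line.
        if len(row) != len(headings):
            errors.append(
                f"line {line_no}: expected {len(headings)} columns, got {len(row)}")
            continue
        for col_no, value in enumerate(row, start=1):
            if value.strip() == "":
                # No cell may be empty, including the identifier column.
                errors.append(f"line {line_no}, column {col_no}: empty value")
            elif col_no > 1 and not pattern.fullmatch(value):
                # Column 1 is the textual identifier; all other cells are decimals.
                errors.append(
                    f"line {line_no}, column {col_no}: invalid value {value!r}")
    return errors
```

  <p>With these assumptions, a cell holding <code>.555</code> or
    <code>5.555iloveguix</code> is reported as invalid, while the identifier in
    the first column is only checked for being non-empty.</p>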
</div>


<div class="row">
  <h2 id="file-types">Supported File Types</h2>
  <p>We support the following file types:</p>

  <ul>
    <li>Tab-Separated value files (.tsv)
      <ul>
	<li>The <strong>TAB</strong> character is used to separate the fields
	  (columns) of each row.</li>
	<li>The values of each field <strong>ARE NOT</strong> quoted.</li>
	<li>Here is an
	  <a href="https://gitlab.com/fredmanglis/gnqc_py/-/blob/main/tests/test_data/no_data_errors.tsv">
	    example file</a> with a single data row.</li>
      </ul>
    </li>
    <li>.txt files: the content has the same format as the .tsv files above.</li>
    <li>.zip files: each zip file should contain
      <strong>ONE AND ONLY ONE</strong> file of the .tsv or .txt type above.
      <br />Any zip file with more than one file is invalid, and so is an empty
      zip file.</li>
  </ul>
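  <p>A minimal sketch of how an upload in one of these formats might be read,
    in Python and under stated assumptions: the helper
    <code>read_upload</code> is hypothetical and not part of this system, the
    file contents are assumed to be UTF-8 encoded, and directory entries
    inside a zip file are assumed not to count as files.</p>

```python
import csv
import io
import zipfile


def read_upload(filename, data):
    """Return the rows of an uploaded .tsv, .txt or .zip file given as bytes.

    Hypothetical helper; raises ValueError for a zip file that does not hold
    exactly one .tsv or .txt file.
    """
    if filename.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: directory entries are ignored; only real files count.
            members = [info for info in archive.infolist() if not info.is_dir()]
            if len(members) != 1:
                raise ValueError(
                    f"zip file must contain exactly one file, found {len(members)}")
            inner = members[0].filename
            if not inner.endswith((".tsv", ".txt")):
                raise ValueError(f"unsupported file in zip: {inner}")
            data = archive.read(members[0])
    # Assumption: the file is UTF-8 encoded.
    text = data.decode("utf-8")
    # Fields are separated by TAB characters and are never quoted.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]
```

  <p>Reading with <code>csv.QUOTE_NONE</code> keeps any quote characters as
    part of the field value, matching the rule that fields are not quoted.</p>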

</div>
{%endblock%}