{%extends "base.html"%}

{%block title%}Data Review{%endblock%}

{%block contents%}
<h1 class="heading">Data Review</h1>

<div id="explainer">
  <h2 id="data-concerns">Data Concerns</h2>
  <p>The following are some of the requirements that the data in your file
    <strong>MUST</strong> fulfil before it is considered valid for this system:
  </p>

  <ol>
    <li>File headings
      <ul>
	<li>The first row in the file should contain the headings. The number of
	  headings in this first row determines the number of columns expected for
	  all other lines in the file.</li>
	<li>Each heading value MUST appear in the first row
	  <strong>ONE AND ONLY ONE</strong> time.</li>
	<li>The strain headers in your first row will be checked against a source of truth
	  (<a href="https://gitlab.com/fredmanglis/gnqc_py/-/blob/main/etc/strains.csv"
	      title="list of expected strains">see strains.csv [1.7M]</a>).<br />
	  Pick the strain names from the <strong>'Name'</strong> and
	  <strong>'Name2'</strong> fields.</li>
      </ul>
    </li>

    <li>Data
      <ol>
	<li><strong>NONE</strong> of the data cells/fields is allowed to be empty.
	  All fields/cells <strong>MUST</strong> contain a value.</li>
	<li>The first column of the data rows will be considered a textual field,
	  holding the "identifier" for that row.</li>
	<li>Except for the first column/field for each data row,
	  <strong>NONE</strong> of the data columns/cells/fields should contain
	  spurious characters such as <code>eeeee</code> or
	  <code>5.555iloveguix</code>.<br />
	  All of them should be decimal values.</li>
	<li>Decimal numbers must conform to the following criteria:
	  <ul>
	    <li>When checking an average file, decimal numbers must have exactly
	      three digits to the right of the decimal point.</li>
	    <li>When checking a standard error file, decimal numbers must have six
	      or more digits to the right of the decimal point.</li>
	    <li>There must be a number to the left of the decimal point
	      (e.g. 0.55555 is allowed but .55555 is not).</li>
	  </ul>
	</li>
      </ol>
    </li>
  </ol>

  
  <h2 id="file-types">Supported File Types</h2>
  <p>We support the following file types:</p>

  <ul>
    <li>Tab-separated values files (.tsv)
      <ul>
	<li>The <strong>TAB</strong> character is used to separate the fields in
	  each row.</li>
	<li>The values of each field <strong>ARE NOT</strong> quoted.</li>
	<li>Here is an
	  <a href="https://gitlab.com/fredmanglis/gnqc_py/-/blob/main/tests/test_data/no_data_errors.tsv">
	    example file</a> with a single data row.</li>
      </ul>
    </li>
    <li>.txt files: the content has the same format as the .tsv files above.</li>
    <li>.zip files: each zip file should contain
      <strong>ONE AND ONLY ONE</strong> file of the .tsv or .txt type above.
      <br />Any zip file with more than one file is invalid, and so is an empty
      zip file.</li>
  </ul>

</div>
{%endblock%}