From f9c2780c228a8f8e0290f758e19ea6985be9883e Mon Sep 17 00:00:00 2001
From: Frederick Muriuki Muriithi
Date: Fri, 13 Dec 2024 13:57:37 -0600
Subject: Add page documentation.

---
 .../phenotypes/add-phenotypes-raw-files.html       | 148 ++++++++++++++++++++-
 1 file changed, 143 insertions(+), 5 deletions(-)

(limited to 'uploader')
diff --git a/uploader/templates/phenotypes/add-phenotypes-raw-files.html b/uploader/templates/phenotypes/add-phenotypes-raw-files.html
index 612bff7..a39ace8 100644
--- a/uploader/templates/phenotypes/add-phenotypes-raw-files.html
+++ b/uploader/templates/phenotypes/add-phenotypes-raw-files.html
@@ -43,7 +43,11 @@
     <span class="form-text text-muted">
       Provide the character that separates the fields in your file(s). It should
       be the same character for all files (if more than one is provided).<br />
-      A tab character will be assumed if you leave this field blank.</span>
+      A tab character will be assumed if you leave this field blank. See
+      <a href="#docs-file-separator"
+         title="Documentation for file-separator characters">
+        documentation for more information</a>.
+    </span>
   </div>
 
   <div class="form-group">
@@ -62,7 +66,11 @@
     </div>
     <span class="form-text text-muted">
       This specifies that lines that begin with the character provided will be
-      considered comment lines and ignored in their entirety.</span>
+      considered comment lines and ignored in their entirety. See
+      <a href="#docs-file-comment-character"
+         title="Documentation for comment characters">
+        documentation for more information</a>.
+    </span>
   </div>
 
   <div class="form-group">
@@ -81,7 +89,9 @@
       This specifies strings in your file indicate that there is no value for a
       particular cell (a cell is where a column and row intersect). Provide a
       space-separated list of strings if you have more than one way of
-      indicating no values.</span>
+      indicating no values. See
+      <a href="#docs-file-na" title="Documentation for no-value fields">
+        documentation for more information</a>.</span>
   </div>
 </fieldset>
 
@@ -112,10 +122,11 @@
            required="required"  />
     <span class="form-text text-muted">
       Provide a file that contains only the phenotype data. See
-      <a href="#docs-file-example"
+      <a href="#docs-phenotype-data"
          title="Documentation of the phenotype data file format.">
         the documentation for the expected format of the file</a>.</span>
   </div>
+
   {%if population.Family in families_with_se_and_n%}
   <div class="form-group">
     <label for="finput-phenotype-se" class="form-label">Phenotype: Standard Errors</label>
@@ -146,7 +157,134 @@
 
 
 {%block page_documentation%}
-page documentation goes here!!!
+<div class="row">
+  <h2 class="heading" id="docs-help">Help</h2>
+  <h3 class="subheading">Common Features</h3>
+  <p>The following are the common expectations for <strong>ALL</strong> the
+    files provided in the form above:
+    <ul>
+      <li>The file <strong>MUST</strong> be a character-separated values (CSV)
+        text file.</li>
+      <li>The first row in the file <strong>MUST</strong> be a heading row, and
+        will be composed of the identifiers for all of the
+        samples/individuals/cases involved in your study.</li>
+      <li>The first column of data in the file <strong>MUST</strong> be the
+        identifiers for all of the phenotypes you wish to upload.</li>
+    </ul>
+  </p>
+
+  <p>If you do not specify the separator character, then we will assume a
+    <strong>TAB</strong> character was used as your separator.</p>
+
+  <p>We also assume you might include comment lines in your files. In that
+    case, if you do not specify what character denotes that a line in your files
+    is a comment line, we will assume the <strong>#</strong> character.<br />
+    A comment <strong>MUST ALWAYS</strong> begin at the start of the line marked
+    with the comment character specified.</p>
+
+  <h3 class="subheading" id="docs-file-metadata">File Metadata</h3>
+  <p>We request some details about your files to help us parse and process the
+    files correctly. The details we collect are:</p>
+  <dl>
+    <dt id="docs-file-separator">File separator</dt>
+    <dd>The files you provide should be character-separated value (CSV) files.
+      We need to know what character you used to separate the values in your
+      file. Some common ones are the Tab character, the comma, etc.<br />
+      Providing that information makes it possible for the system to parse and
+      process your files correctly.<br />
+      <strong>NOTE:</strong> All the files you upload MUST use the same
+      separator.</dd>
+
+    <dt id="docs-file-comment-character">Comment character</dt>
+    <dd>We support use of comment lines in your files. We only support one type
+      of comment style, the <em>line comment</em>.<br />
+      This means the comment begins at the start of the line, and the end of that
+      line indicates the end of that comment. If you have a really long comment,
+      then you need to break it across multiple lines, marking each line as a
+      comment line.<br />
+      The "comment character" is the character at the start of the line that
+      indicates that the line is a line comment.</dd>
+
+    <dt id="docs-file-na">No-Value indicator(s)</dt>
+    <dd>Data in the real world is messy, and in some cases, entirely absent. You
+      need to indicate, in your files, that a particular field did not have a
+      value, and once you do that, you then need to let the system know how you
+      mark such fields. Common ways of indicating "empty values" are: leaving
+      the field blank, using a character such as '-', or using strings like
+      "NA", "N/A", "NULL", etc.<br />
+      Providing this information will help with parsing and processing such
+      no-value fields the correct way.</dd>
+  </dl>
+
+  <h3 class="subheading" id="docs-file-phenotype-description">
+    file: Phenotype Descriptions</h3>
+  <p>The data in this file is a matrix of <em>phenotypes × metadata-fields</em>.
+    Please note we use the term "metadata-fields" above loosely, due to lack of
+    a good word for this.</p>
+  <p>The file <strong>MUST</strong> have columns in this order:
+    <dl>
+      <dt>Phenotype Identifiers</dt>
+      <dd>These are the names/identifiers for your phenotypes. These
+        names/identifiers are the same ones you will have in all the other files you are
+        uploading.</dd>
+
+      <dt>Descriptions</dt>
+      <dd>Each phenotype will need a description. Good descriptions are necessary
+        to inform other people of what the data is about. Good descriptions are
+        hard to construct, so we provide
+        <a href="https://info.genenetwork.org/faq.php#q-22"
+           title="How to write phenotype descriptions">
+          advice on describing your phenotypes.</a></dd>
+
+      <dt>Units</dt>
+      <dd>Each phenotype will need units for the measurements taken. If there are
+        none, then indicate the field is a no-value field.</dd>
+  </dl></p>
+  <p>You can add more columns after those three if you want to, but these 3
+    <strong>MUST</strong> be present.</p>
+  <p>The file would, for example, look like the following:</p>
+  <code>id|description|units|…<br />
+    pheno10001|Central nervous system, behavior, cognition; …|mg|…<br />
+    pheno10002|Aging, metabolism, central nervous system: …|mg|…<br />
+    ⋮<br /></code>
+
+  <p><strong>Note 01</strong>: The first usable row is the heading row.</p>
+  <p><strong>Note 02: </strong>This example demonstrates a subtle issue that
+    could make your CSV file invalid &mdash; the choice of your field separator
+    character.<br />
+    In the example above, we use the pipe character (<code>|</code>) as our
+    field separator. This is because, if we follow the advice on how to write
+    good descriptions, then we cannot use the comma as our separator &ndash; if
+    we did, then our CSV file would be invalid because the system would have no
+    way to tell the difference between the comma as a field separator, and the
+    comma as a way to separate the "general category and ontology terms".</p>
+
+  <h3 class="subheading">file: Phenotype Data, Standard Errors and/or Sample Counts</h3>
+  <span id="docs-phenotype-data"></span>
+  <span id="docs-phenotype-se"></span>
+  <span id="docs-phenotype-n"></span>
+  <p>The data is a matrix of <em>phenotypes × individuals</em>, e.g.</p>
+  <code>
+    # num-cases: 2549<br />
+    # num-phenos: 13<br />
+    id,IND001,IND002,IND003,IND004,…<br />
+    pheno10001,61.400002,54.099998,483,49.799999,…<br />
+    pheno10002,49,50.099998,403,45.5,…<br />
+    pheno10003,62.5,53.299999,501,62.900002,…<br />
+    pheno10004,53.099998,55.099998,403,NA,…<br />
+    ⋮<br /></code>
+
+  <p>where <code>IND001,IND002,IND003,IND004,…</code> are the
+    samples/individuals/cases in your study, and
+    <code>pheno10001,pheno10002,pheno10003,pheno10004,…</code> are the
+    identifiers for your phenotypes.</p>
+  <p>The lines beginning with the "<em>#</em>" symbol (i.e.
+    <code># num-cases: 2549</code> and <code># num-phenos: 13</code>) are comment
+    lines and will be ignored.</p>
+  <p>In this example, the comma (,) is used as the file separator.</p>
+</div>
+
+{%endblock%}
 
 
 {%block more_javascript%}
-- 
cgit v1.2.3