From f9c2780c228a8f8e0290f758e19ea6985be9883e Mon Sep 17 00:00:00 2001
From: Frederick Muriuki Muriithi
Date: Fri, 13 Dec 2024 13:57:37 -0600
Subject: Add page documentation.
---
.../phenotypes/add-phenotypes-raw-files.html | 148 ++++++++++++++++++++-
1 file changed, 143 insertions(+), 5 deletions(-)
diff --git a/uploader/templates/phenotypes/add-phenotypes-raw-files.html b/uploader/templates/phenotypes/add-phenotypes-raw-files.html
index 612bff7..a39ace8 100644
--- a/uploader/templates/phenotypes/add-phenotypes-raw-files.html
+++ b/uploader/templates/phenotypes/add-phenotypes-raw-files.html
@@ -43,7 +43,11 @@
Provide the character that separates the fields in your file(s). It should
be the same character for all files (if more than one is provided).
- A tab character will be assumed if you leave this field blank.
+ A tab character will be assumed if you leave this field blank. See
+
+ documentation for more information.
+
The following are the common expectations for ALL the
+ files provided in the form above:
If you do not specify the separator character, then we will assume a
+ TAB character was used as your separator.
+We also assume you might include comment lines in your files. In that
+ case, if you do not specify what character denotes that a line in your files
+ is a comment line, we will assume the # character.
+ A comment line MUST ALWAYS start with the specified comment character, which
+ must be the very first character on the line.
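The defaults described above can be sketched in Python. This is only an illustration under stated assumptions: `read_rows` is a hypothetical helper, not the uploader's actual parser, and the sample text is made up for the demonstration.

```python
import csv
import io


def read_rows(text, separator="", comment_char=""):
    """Yield data rows from character-separated text (hypothetical helper)."""
    sep = separator or "\t"       # blank separator field: assume a TAB
    marker = comment_char or "#"  # blank comment field: assume '#'
    # Only lines whose very first character is the marker count as comments.
    lines = (line for line in io.StringIO(text) if not line.startswith(marker))
    yield from csv.reader(lines, delimiter=sep)


# Made-up sample: one comment line, one heading row, one data row.
sample = "# a comment line\nid\tIND001\tIND002\npheno10001\t61.4\t54.1\n"
rows = list(read_rows(sample))

# A '#' that is not at the start of the line is kept as ordinary data.
kept = list(read_rows("a\tb # not a comment\n"))
```

Here `rows` holds the heading row and the data row, with the comment line dropped.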
We request some details about your files to help us parse and process the
+ files correctly. The details we collect are:
+The data in this file is a matrix of phenotypes × metadata-fields.
+ Please note we use the term "metadata-fields" above loosely, due to lack of
+ a good word for this.
+The file MUST have columns in this order:
You can add more columns after those three if you want to, but these 3
+ MUST be present.
+The file would, for example, look like the following:
+id|description|units|…
+ pheno10001|Central nervous system, behavior, cognition; …|mg|…
+ pheno10002|Aging, metabolism, central nervous system: …|mg|…
+ ⋮
+
+ Note 01: The first usable row is the heading row.
+Note 02: This example demonstrates a subtle issue that
+ could make your CSV file invalid — the choice of your field separator
+ character.
+ In the example above, we use the pipe character (|) as our
+ field separator. This is because, if we follow the advice on how to write
+ good descriptions, then we cannot use the comma as our separator – if
+ we did, then our CSV file would be invalid because the system would have no
+ way to tell the difference between the comma as a field separator, and the
+ comma as a way to separate the "general category and ontology terms".
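The ambiguity described in Note 02 can be shown with a short Python sketch. The row below is a shortened, illustrative version of the example row (the elided parts are left out, not reconstructed), and the variable names are hypothetical.

```python
# Shortened illustrative row: id | description | units.
row = "pheno10001|Central nervous system, behavior, cognition|mg"

# With the pipe as the separator, the description stays in one field.
by_pipe = row.split("|")

# The same row written with commas as the separator: the commas inside
# the description can no longer be told apart from field separators.
by_comma = row.replace("|", ",").split(",")

print(len(by_pipe), len(by_comma))
```

The pipe-separated row yields three fields, while the comma-separated version of the same row yields five.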
The data is a matrix of phenotypes × individuals, e.g.
+
+ # num-cases: 2549
+ # num-phenos: 13
+ id,IND001,IND002,IND003,IND004,…
+ pheno10001,61.400002,54.099998,483,49.799999,…
+ pheno10002,49,50.099998,403,45.5,…
+ pheno10003,62.5,53.299999,501,62.900002,…
+ pheno10004,53.099998,55.099998,403,NA,…
+ ⋮
+
+ where IND001,IND002,IND003,IND004,… are the
+ samples/individuals/cases in your study, and
+ pheno10001,pheno10002,pheno10003,pheno10004,… are the
+ identifiers for your phenotypes.
The lines beginning with the "#" symbol (i.e.
+ # num-cases: 2549 and # num-phenos: 13) are comment
+ lines and will be ignored.
In this example, the comma (,) is used as the field separator.
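As a rough sketch, a phenotypes × individuals file like the one above could be read as follows. `parse_matrix` is a hypothetical helper, not the uploader's code. The sample is trimmed to the columns and rows shown (the elided ones are omitted). Treating NA as a missing value is an assumption made for this illustration.

```python
import csv

# Trimmed copy of the example above; elided columns and rows are left out.
EXAMPLE = """\
# num-cases: 2549
# num-phenos: 13
id,IND001,IND002,IND003,IND004
pheno10001,61.400002,54.099998,483,49.799999
pheno10004,53.099998,55.099998,403,NA
"""


def parse_matrix(text, separator=",", comment_char="#"):
    """Map each phenotype id to {individual: value} (hypothetical helper)."""
    # Drop comment lines: those starting with the comment character.
    lines = [ln for ln in text.splitlines() if not ln.startswith(comment_char)]
    reader = csv.reader(lines, delimiter=separator)
    header = next(reader)      # the first usable row is the heading row
    individuals = header[1:]   # every column after "id" is an individual
    return {
        row[0]: {
            # Assumption: "NA" marks a missing measurement.
            ind: (None if val == "NA" else float(val))
            for ind, val in zip(individuals, row[1:])
        }
        for row in reader
    }


matrix = parse_matrix(EXAMPLE)
```

Each phenotype identifier then maps to its per-individual values, for example `matrix["pheno10001"]["IND003"]`.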
+