From f9c2780c228a8f8e0290f758e19ea6985be9883e Mon Sep 17 00:00:00 2001 From: Frederick Muriuki Muriithi Date: Fri, 13 Dec 2024 13:57:37 -0600 Subject: Add page documentation. --- .../phenotypes/add-phenotypes-raw-files.html | 148 ++++++++++++++++++++- 1 file changed, 143 insertions(+), 5 deletions(-) (limited to 'uploader') diff --git a/uploader/templates/phenotypes/add-phenotypes-raw-files.html b/uploader/templates/phenotypes/add-phenotypes-raw-files.html index 612bff7..a39ace8 100644 --- a/uploader/templates/phenotypes/add-phenotypes-raw-files.html +++ b/uploader/templates/phenotypes/add-phenotypes-raw-files.html @@ -43,7 +43,11 @@ Provide the character that separates the fields in your file(s). It should be the same character for all files (if more than one is provided).
- A tab character will be assumed if you leave this field blank.
+ A tab character will be assumed if you leave this field blank. See + + documentation for more information. +
@@ -62,7 +66,11 @@
This specifies that lines that begin with the character provided will be - considered comment lines and ignored in their entirety. + considered comment lines and ignored in their entirety. See + + documentation for more information. +
@@ -81,7 +89,9 @@ This specifies strings in your file indicate that there is no value for a particular cell (a cell is where a column and row intersect). Provide a space-separated list of strings if you have more than one way of - indicating no values. + indicating no values. See + + documentation for more information.
@@ -112,10 +122,11 @@ required="required" /> Provide a file that contains only the phenotype data. See - the documentation for the expected format of the file. + {%if population.Family in families_with_se_and_n%}
@@ -146,7 +157,134 @@ {%block page_documentation%} -page documentation goes here!!! +
+

Help

+

Common Features

+

The following are the common expectations for ALL the + files provided in the form above: +

+

+ +

If you do not specify the separator character, then we will assume a + TAB character was used as your separator.

+ +

We also assume you might include comment lines in your files. If you do not specify which character marks a line as a comment line, we will assume the # character.
+ A comment line MUST ALWAYS begin with the specified comment character as the very first character of the line.

+ +
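As an illustration only, the two defaults above can be sketched in Python. The names `DEFAULT_SEPARATOR`, `DEFAULT_COMMENT_CHAR`, `effective_separator` and `is_comment_line` are hypothetical and do not come from the uploader's code. The sketch assumes that a blank form field falls back to the default. It also assumes that a line is a comment only when the comment character is its very first character, so an indented `#` does not count.

```python
# Hypothetical defaults mirroring the behaviour described above.
DEFAULT_SEPARATOR = "\t"
DEFAULT_COMMENT_CHAR = "#"


def effective_separator(separator: str = "") -> str:
    """Return the separator to use, falling back to TAB when none was given."""
    return separator or DEFAULT_SEPARATOR


def is_comment_line(line: str, comment_char: str = "") -> bool:
    """Return True when `line` is a comment line.

    An empty `comment_char` (field left blank) falls back to '#'. Only a
    comment character in the very first position counts, so leading
    whitespace means the line is NOT treated as a comment.
    """
    return line.startswith(comment_char or DEFAULT_COMMENT_CHAR)


print(repr(effective_separator()))              # '\t'
print(is_comment_line("# num-cases: 2549"))     # True
print(is_comment_line("  # indented comment"))  # False
print(is_comment_line("; a note", ";"))         # True
```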

File Metadata

+

We request some details about your files to help us parse and process the + files correctly. The details we collect are:

+
+
File separator
+
The files you provide should be character-separated value (CSV) files. We need to know what character you used to separate the values in your file. Common choices are the Tab character and the comma.
+ Providing that information makes it possible for the system to parse and + process your files correctly.
+ NOTE: All the files you upload MUST use the same + separator.
+ +
Comment character
+
We support the use of comment lines in your files, but only one comment style: the line comment.
+ This means the comment begins at the start of the line and ends at the end of that same line. If you have a really long comment, break it across multiple lines, marking each line as a comment line.
+ The "comment character" is the character at the start of the line that + indicates that the line is a line comment.
+ +
No-Value indicator(s)
+
Data in the real world is messy, and in some cases entirely absent. You need to indicate, in your files, that a particular field has no value, and then let the system know how you mark such fields. Common ways of indicating "empty values" are leaving the field blank, using a character such as '-', or using strings like "NA", "N/A", "NULL", etc.
+ Providing this information helps the system parse and process such no-value fields correctly.
+
+ +
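A minimal sketch of how a space-separated list of no-value indicators might be applied to the fields of a row. The helpers `parse_no_value_indicators` and `normalise_field` are hypothetical. Treating a blank field as "no value" even when it is not listed is an assumption, not documented behaviour. The sample row is the pheno10004 row from the phenotype data example further down, plus one extra blank field.

```python
from __future__ import annotations


def parse_no_value_indicators(raw: str) -> set[str]:
    """Split the space-separated form input into a set of indicator strings."""
    return set(raw.split())


def normalise_field(field: str, indicators: set[str]) -> str | None:
    """Return None for a no-value field, otherwise the stripped field.

    Assumption: a blank field is always treated as "no value", in addition
    to any indicators the user supplied.
    """
    value = field.strip()
    if value == "" or value in indicators:
        return None
    return value


indicators = parse_no_value_indicators("NA N/A -")
row = ["pheno10004", "53.099998", "55.099998", "403", "NA", ""]
print([normalise_field(f, indicators) for f in row])
# ['pheno10004', '53.099998', '55.099998', '403', None, None]
```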

+ file: Phenotypes Descriptions

+

The data in this file is a matrix of phenotypes × metadata-fields. Please note that we use the term "metadata-fields" loosely here, for lack of a better word.

+

The file MUST have columns in this order: +

+
Phenotype Identifiers
+
These are the names/identifiers for your phenotypes. These + names/identifiers are the same ones you will have in all the other files you are + uploading.
+ +
Descriptions
+
Each phenotype will need a description. Good descriptions are necessary to inform other people of what the data is about. Good descriptions are hard to construct, so we provide advice on describing your phenotypes.
+ +
Units
+
Each phenotype will need units for the measurements taken. If there are + none, then indicate the field is a no-value field.
+

+

You can add more columns after those three if you want to, but the first three MUST be present.

+

The file would, for example, look like the following:

+ id|description|units|…
+ pheno10001|Central nervous system, behavior, cognition; …|mg|…
+ pheno10002|Aging, metabolism, central nervous system: …|mg|…
+ ⋮
+ +

Note 01: The first usable row is the heading row.

+

Note 02: This example demonstrates a subtle issue that + could make your CSV file invalid — the choice of your field separator + character.
+ In the example above, we use the pipe character (|) as our + field separator. This is because, if we follow the advice on how to write + good descriptions, then we cannot use the comma as our separator – if + we did, then our CSV file would be invalid because the system would have no + way to tell the difference between the comma as a field separator, and the + comma as a way to separate the "general category and ontology terms".
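The separator problem in Note 02 can be checked with a short Python sketch. It writes the same record with each separator, splits the line back, and checks whether the original fields come out. `FIELDS` and `round_trips` are hypothetical names. The sketch assumes a plain split with no quoting; whether the uploader accepts quoted fields is not stated here.

```python
from __future__ import annotations

# Illustrative record adapted from the example above (trailing "…" dropped).
FIELDS = ["pheno10001", "Central nervous system, behavior, cognition", "mg"]


def round_trips(fields: list[str], sep: str) -> bool:
    """Join `fields` with `sep`, split the line back, and compare.

    Assumes a plain split with no quoting, so any field that contains
    `sep` is broken apart into extra columns.
    """
    return sep.join(fields).split(sep) == fields


for sep in ("|", ","):
    columns = sep.join(FIELDS).split(sep)
    print(f"{sep!r}: {len(columns)} columns, round-trips: {round_trips(FIELDS, sep)}")
# '|': 3 columns, round-trips: True
# ',': 5 columns, round-trips: False
```

With the comma, the description is split into three columns, and " behavior" lands in the position where the units should be.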

+ +

file: Phenotype Data, Standard Errors and/or Sample Counts

+ + + +

The data is a matrix of phenotypes × individuals, e.g.

+ + # num-cases: 2549 + # num-phenos: 13 + id,IND001,IND002,IND003,IND004,…
+ pheno10001,61.400002,54.099998,483,49.799999,…
+ pheno10002,49,50.099998,403,45.5,…
+ pheno10003,62.5,53.299999,501,62.900002,…
+ pheno10004,53.099998,55.099998,403,NA,…
+ ⋮
+ +

where IND001,IND002,IND003,IND004,… are the samples/individuals/cases in your study, and pheno10001,pheno10002,pheno10003,pheno10004,… are the identifiers for your phenotypes.

+

The lines beginning with the "#" symbol (i.e. # num-cases: 2549 and # num-phenos: 13) are comment lines and will be ignored.

+

In this example, the comma (,) is used as the file separator.
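Putting the pieces together, here is a hedged Python sketch that reads the phenotype data example into a phenotype → individual → value mapping. `parse_matrix` is a hypothetical helper. The sample is trimmed to four individuals and two phenotypes, with the "…" columns and rows dropped. The sketch assumes that "NA" is the only no-value indicator and that the first non-comment line is the heading row.

```python
from __future__ import annotations

# Trimmed from the example above: four individuals, two phenotypes.
SAMPLE = """\
# num-cases: 2549
# num-phenos: 13
id,IND001,IND002,IND003,IND004
pheno10001,61.400002,54.099998,483,49.799999
pheno10004,53.099998,55.099998,403,NA
"""


def parse_matrix(
    text: str,
    sep: str = ",",
    comment_char: str = "#",
    no_values: frozenset[str] = frozenset({"NA"}),
) -> dict[str, dict[str, float | None]]:
    """Parse a phenotypes x individuals matrix into nested dictionaries."""
    # Drop blank lines and comment lines; the first remaining line is the header.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith(comment_char)]
    header, *rows = lines
    individuals = header.split(sep)[1:]  # first column holds phenotype ids
    data: dict[str, dict[str, float | None]] = {}
    for row in rows:
        pheno_id, *values = row.split(sep)
        if len(values) != len(individuals):
            raise ValueError(
                f"{pheno_id}: expected {len(individuals)} values, got {len(values)}")
        data[pheno_id] = {
            ind: (None if not val.strip() or val.strip() in no_values
                  else float(val))
            for ind, val in zip(individuals, values)
        }
    return data


result = parse_matrix(SAMPLE)
print(result["pheno10004"]["IND004"])  # None
print(result["pheno10001"]["IND003"])  # 483.0
```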

+
+ +{%endblock%} {%block more_javascript%}