# Editing Case-Attributes

## Tags

* type: document
* keywords: case-attribute, editing
* assigned: fredm, zachs, acenteno
* status: requirements gathering

## Introduction

Case-Attributes are essentially the metadata for the samples. In the GN2 system, they are the extra columns in the table in the "Reviews and Edit Data" accordion tab besides the value and its error margin.

To quote @zachs

> "Case Attributes" are basically just sample metadata. So stuff like the sex, age, etc of the various individuals (and exist separately from "normal" traits mainly because they're non-numeric)

They are the metadata for the various samples in a trait. The case attributes are determined at the group level:

> Since they're metadata (or "attributes" in this case) for samples, they're group-level so for BXD, case attributes would apply at the level of each sample, across all BXD data

Also, from email:
> Every strain has a unique attribute and it's fixed, not variable.

## Direction

We need to differentiate two things:
* Case-Attribute labels/names/categories (e.g. Sex, Height, Cage-handler, etc)
* Case-Attribute values (e.g. Male/Female, 20cm, Frederick, etc.)

As is currently implemented (as of before 2023-08-31), both the labels and values are set at group level.
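
The distinction between labels and values can be sketched roughly as two record types. This is only an illustration: the class names, field names and IDs below are hypothetical and do not come from the GN2/GN3 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseAttributeLabel:
    """A label/category such as "Sex" (hypothetical class name)."""
    inbredset_id: int  # the group the label belongs to
    attribute_id: int
    name: str
    description: str = ""


@dataclass(frozen=True)
class CaseAttributeValue:
    """The value one strain holds for one label, e.g. "Male"."""
    inbredset_id: int
    strain_id: int
    attribute_id: int
    value: str


# Illustrative IDs only; they are not taken from the real database.
sex = CaseAttributeLabel(inbredset_id=1, attribute_id=10, name="Sex")
strain_sex = CaseAttributeValue(
    inbredset_id=1, strain_id=42, attribute_id=sex.attribute_id, value="Male")
```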

The GeneNetwork1 implementation is a good starting point for understanding how case-attributes were implemented and how they worked:
=> https://github.com/genenetwork/genenetwork1/blob/0f170f0b748a4e10eaf8538f6bcbf88b573ce8e7/web/webqtl/showTrait/DataEditingPage.py Case-Attributes on GeneNetwork1

## Status

There is existing code for case-attributes editing, but it had a critical bug where the data for existing attributes would be deleted/replaced randomly when one made a change. This led to a pause in this effort.

The chosen course of action will, however, not make use of this existing code. Instead, we will reimplement the feature with code in GN3, exposing the data and its editing via API endpoints.

## Database

The existing database tables of concern to us are:

* InbredSet
* CaseAttribute
* StrainXRef
* Strain
* CaseAttributeXRefNew

We can fetch case-attribute data from the database with:

```
SELECT
    caxrn.*,
    ca.Name AS CaseAttributeName,
    ca.Description AS CaseAttributeDescription,
    iset.InbredSetId AS OrigInbredSetId
FROM
    CaseAttribute AS ca
INNER JOIN
    CaseAttributeXRefNew AS caxrn
    ON ca.Id=caxrn.CaseAttributeId
INNER JOIN
    StrainXRef AS sxr
    ON caxrn.StrainId=sxr.StrainId
INNER JOIN
    InbredSet AS iset
    ON sxr.InbredSetId=iset.InbredSetId
WHERE
    caxrn.value != 'x'
    AND caxrn.value IS NOT NULL;
```

which gives us all the information we need to rework the database schema.
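
One way to inspect rows shaped like that query's result is to group the case-attribute names by the group they were found in. The sketch below assumes each row is a dictionary keyed by the column aliases from the query; the function name and the sample rows are made up for illustration and are not real data.

```python
from collections import defaultdict


def attributes_by_group(rows):
    """Map each InbredSetId to the set of case-attribute names seen for it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["OrigInbredSetId"]].add(row["CaseAttributeName"])
    return dict(grouped)


# Made-up rows shaped like the query result; not real data.
rows = [
    {"OrigInbredSetId": 1, "CaseAttributeName": "Sex"},
    {"OrigInbredSetId": 1, "CaseAttributeName": "Age"},
    {"OrigInbredSetId": 2, "CaseAttributeName": "Sex"},
]
groups = attributes_by_group(rows)
```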

Since the Case-Attributes are group-level, we need to move the `InbredSetId` to the `CaseAttribute` table from the `CaseAttributeXRefNew` table.

For a more concrete declaration of the relationship, the `CaseAttributeXRefNew` table can have its primary key composed of the `InbredSetId`, `StrainId` and `CaseAttributeId` columns. That has the added advantage that we can index the table on `InbredSetId` and `StrainId`.

That leaves the `CaseAttribute` table with the following columns:

* InbredSetId: Foreign Key from `InbredSet` table
* Id: The CaseAttribute identifier
* Name: Textual name for the Case-Attribute
* Description: Textual description for the case-attribute

while the `CaseAttributeXRefNew` table ends up with the following columns:

* InbredSetId: Foreign Key from `InbredSet` table
* StrainId: The strain
* CaseAttributeId: The case-attribute identifier
* Value: The value for the case-attribute for this specific strain

`NULL` values will not be allowed in any column of either table. If a strain has no value, we simply delete the corresponding record from the `CaseAttributeXRefNew` table.
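
A minimal sketch of the reworked tables follows. SQLite stands in for the real database purely so that the example is runnable. The primary key on `CaseAttribute`, the column types, and the `set_value` helper are assumptions made for illustration, and foreign keys are left out because the `InbredSet` and `Strain` tables are not created here. The migration script linked below remains the authoritative source for the actual changes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE CaseAttribute(
    InbredSetId INTEGER NOT NULL,
    Id INTEGER NOT NULL,
    Name TEXT NOT NULL,
    Description TEXT NOT NULL,
    PRIMARY KEY (InbredSetId, Id)  -- assumed; not stated in this document
);
CREATE TABLE CaseAttributeXRefNew(
    InbredSetId INTEGER NOT NULL,
    StrainId INTEGER NOT NULL,
    CaseAttributeId INTEGER NOT NULL,
    Value TEXT NOT NULL,
    PRIMARY KEY (InbredSetId, StrainId, CaseAttributeId)
);
"""


def set_value(conn, inbredset_id, strain_id, attribute_id, value):
    """Store a value, or delete the record when there is no value."""
    key = (inbredset_id, strain_id, attribute_id)
    if value is None:
        conn.execute(
            "DELETE FROM CaseAttributeXRefNew WHERE InbredSetId=? "
            "AND StrainId=? AND CaseAttributeId=?", key)
    else:
        conn.execute(
            "INSERT OR REPLACE INTO CaseAttributeXRefNew VALUES (?, ?, ?, ?)",
            key + (value,))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
set_value(conn, 1, 42, 10, "Male")  # illustrative IDs only
set_value(conn, 1, 42, 10, None)    # no value: the record is removed
remaining = conn.execute(
    "SELECT COUNT(*) FROM CaseAttributeXRefNew").fetchone()[0]
```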

To that end, the following script has been added to ease the migration of the table schemas:
=> https://github.com/genenetwork/genenetwork3/blob/dd0b29c07017ec398c447ca683dd4b4be18d73b7/scripts/update-case-attribute-tables-20230818
The script is meant to be run only once, and makes the changes mentioned above for both tables.

## Data Types

> ... (and exist separately from "normal" traits mainly because they're non-numeric)

The values for Case-Attributes are non-numeric data. This will probably be mostly textual data.

As an example:
=> https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish Trait Data and Analysis for BXD_10010
we see Case-Attributes as:

* Free-form text (no constraints) - see the `Status` column
* Enumerations - textual data, but where the user can only pick from specific values
* Links - The value displayed also acts as a link - e.g. the 'JAX:*' values in the `RRID` column
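
If these three kinds of values were to be validated on edit, a sketch could look like the following. The `ValueKind` enum, the `validate` function, and the rule that a link value looks like `<prefix>:<identifier>` are all assumptions made for illustration; none of them exist in the current code.

```python
from enum import Enum


class ValueKind(Enum):
    """Hypothetical classification of case-attribute values."""
    FREE_TEXT = "free-text"
    ENUMERATION = "enumeration"
    LINK = "link"


def validate(kind, value, allowed=()):
    """Return True if `value` is acceptable for the given kind."""
    if kind is ValueKind.ENUMERATION:
        # Only values from the fixed list may be chosen.
        return value in allowed
    if kind is ValueKind.LINK:
        # Assumed shape: "<prefix>:<identifier>", as with the 'JAX:*' values.
        prefix, sep, identifier = value.partition(":")
        return bool(prefix and sep and identifier)
    # Free-form text carries no constraints.
    return True
```

For example, `validate(ValueKind.ENUMERATION, "Male", allowed=("Male", "Female"))` accepts the value, while a link value without a prefix would be rejected.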


=> https://genenetwork.org/show_trait?trait_id=10002&dataset=CCPublish For this trait

we see:

* Numeric data - see the `N` and `SE` columns

though treating these columns as Case-Attributes might be a misreading of the following quote:

> In the following link for example, every column after Value is a case attribute - https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish

**TODO**: Verify whether `N` and `SE` are Case-Attributes

## Authorisation

From email:
> it's probably not okay to let anyone who can edit sample data for a trait also edit case attributes, since they're group level

and from matrix:
> The weird bug aside, Bonface had (mostly) successfully implemented editing these through the CSV files in the same way as any other sample data, but for authorization reasons this probably doesn't make sense (since a user having access to editing sample data for specific traits doesn't imply that they'd have access for editing case attributes across the entire group)

This implies we might need a new set of privileges for dealing with case-attributes, e.g.
* group:resource:add-case-attributes - Allows user to add a completely new case attribute
* group:resource:edit-case-attributes - Allows user to edit an existing case attribute
* group:resource:delete-case-attributes - Allows user to delete an existing case attribute
* group:resource:view-case-attributes - Allows user to view case attributes and their values

Considering, however, that groups (InbredSets) are not directly linked to any auth resource, this might require some indirection, or perhaps a new resource type that handles groups.
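
As a rough sketch of how such privileges could gate each operation, assuming a user's privileges are available as a plain set of names: the `is_authorised` function and the `trait:edit-sample-data` privilege name are hypothetical and only serve to show that trait-level edit rights do not grant group-level ones.

```python
# Privilege names as proposed in the list above.
REQUIRED_PRIVILEGE = {
    "add": "group:resource:add-case-attributes",
    "edit": "group:resource:edit-case-attributes",
    "delete": "group:resource:delete-case-attributes",
    "view": "group:resource:view-case-attributes",
}


def is_authorised(user_privileges, action):
    """Check whether a set of privilege names permits `action`."""
    return REQUIRED_PRIVILEGE[action] in user_privileges


# Hypothetical user who may edit sample data for a trait and view
# case attributes, but holds no edit privilege at the group level.
trait_editor = {"trait:edit-sample-data", "group:resource:view-case-attributes"}
can_view = is_authorised(trait_editor, "view")
can_edit = is_authorised(trait_editor, "edit")
```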

## Features

* Editing existing case-attributes: YES
* Adding new case attributes: ???
* Deleting existing case attributes: ???

Strains/samples are shared across traits. The values for the case attributes are the same for a particular strain/sample for all traits within a particular InbredSet (group).

## Related and Unsynthesised Chats

=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$myIoafLp_dIONnyNvEI0k2xf3Y8-LyiI_mkP2vBN08o?via=matrix.org
```
Zachary Sloan
I'm pretty sure multiple phenotypes and mRNA datasets can belong to the same experiment (and definitely for the purposes of case attributes
since the mRNA datasets are split by tissue
genotype traits should all be considered part of the same "experiment" (at least as long as we're still only databasing a single genotype file for each group)

pjotrp
: Case attribute editing will still need to be group level, at least until the whole feature is completely changed. Since they're basically just phenotypes we choose to show in the trait page table, and phenotypes are at the group level
```

=> https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$P6SNnpY-nAZsDr3VZlRi05m6MT32lXBsCl-BYLh-YLM?via=matrix.org
```
Zachary Sloan
21:14
Groups are defined by their list of samples/strains, and the "case attributes" are just "the characteristics of those samples/strains we choose to show on the trait page" (if we move away from the "group" concept entirely that could change, but if we did that we probably would also replace "case attributes" with something else because the way that's implemented is kind of weird to begin with)
```

## Related issues

=> /issues/case-attr-edit-error
=> /issues/fix-case-attribute-work
=> /issues/fix-case-attribute-editing
=> /issues/consecutive-crud-applications-when-uploading-data
=> /issues/edit-metadata-bugs

## References

=> /topics/data-uploads/datasets