# Editing Case-Attributes

## Tags

* type: document
* keywords: case-attribute, editing
* assigned: fredm, zachs, acenteno
* status: requirements gathering

## Introduction

Case-Attributes are essentially the metadata for the samples. In the GN2 system, they are the extra columns in the table in the "Reviews and Edit Data" accordion tab besides the value and its error margin.

To quote @zachs:

> "Case Attributes" are basically just sample metadata. So stuff like the sex, age, etc of the various individuals (and exist separately from "normal" traits mainly because they're non-numeric)

They are the metadata for the various samples in a trait. Case attributes are determined at the group level:

> Since they're metadata (or "attributes" in this case) for samples, they're group-level so for BXD, case attributes would apply at the level of each sample, across all BXD data

Also, from email:
> Every strain has a unique attribute and it's fixed, not variable.

## Status

Code for editing case-attributes already existed, but it had a critical bug: the data for existing attributes would be deleted or replaced, seemingly at random, when one made a change. This led to a pause in this effort.

The chosen course of action will, however, not make use of this existing code. Instead, we will reimplement the feature with code in GN3, exposing the data and its editing via API endpoints.

## Database

The existing database tables of concern to us are:

* InbredSet
* CaseAttribute
* StrainXRef
* Strain
* CaseAttributeXRefNew

We can fetch case-attribute data from the database with:

```
SELECT
    caxrn.*,
    ca.Name AS CaseAttributeName,
    ca.Description AS CaseAttributeDescription,
    iset.InbredSetId AS OrigInbredSetId
FROM
    CaseAttribute AS ca
INNER JOIN
    CaseAttributeXRefNew AS caxrn
    ON ca.Id = caxrn.CaseAttributeId
INNER JOIN
    StrainXRef AS sxr
    ON caxrn.StrainId = sxr.StrainId
INNER JOIN
    InbredSet AS iset
    ON sxr.InbredSetId = iset.InbredSetId
WHERE
    caxrn.value != 'x'
    AND caxrn.value IS NOT NULL;
```

which gives us all the information we need to rework the database schema.

Since the Case-Attributes are group-level, we need to move the `InbredSetId` to the `CaseAttribute` table from the `CaseAttributeXRefNew` table.

For a more concrete declaration of the relationships, the `CaseAttributeXRefNew` table can have its primary key composed of `InbredSetId`, `StrainId` and `CaseAttributeId`. This has the added advantage that we can index the table on `InbredSetId` and `StrainId`.

That leaves the `CaseAttribute` table with the following columns:

* InbredSetId: Foreign Key from `InbredSet` table
* Id: The CaseAttribute identifier
* Name: Textual name for the Case-Attribute
* Description: Textual description for the Case-Attribute

while the `CaseAttributeXRefNew` table ends up with the following columns:

* InbredSetId: Foreign Key from `InbredSet` table
* StrainId: The strain
* CaseAttributeId: The case-attribute identifier
* Value: The value for the case-attribute for this specific strain

No `NULL` values will be allowed in any column of either table. If a strain has no value for a case-attribute, we simply delete the corresponding record from the `CaseAttributeXRefNew` table.
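
The proposed layout can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's standard library) stands in for the real database so the sketch is runnable, the `InbredSet` table is reduced to a one-column stub, the composite primary key on `CaseAttribute` is a guess, and `connect` and `set_value` are hypothetical helper names.

```python
import sqlite3

# Assumption: SQLite stands in for the production database; column types and
# constraint syntax will differ there. InbredSet is a one-column stub.
SCHEMA = """
CREATE TABLE InbredSet (InbredSetId INTEGER PRIMARY KEY);
CREATE TABLE CaseAttribute (
    InbredSetId INTEGER NOT NULL REFERENCES InbredSet (InbredSetId),
    Id          INTEGER NOT NULL,
    Name        TEXT NOT NULL,
    Description TEXT NOT NULL,
    PRIMARY KEY (InbredSetId, Id)  -- assumption: not specified in the text
);
CREATE TABLE CaseAttributeXRefNew (
    InbredSetId     INTEGER NOT NULL,
    StrainId        INTEGER NOT NULL,  -- foreign key to Strain omitted here
    CaseAttributeId INTEGER NOT NULL,
    Value           TEXT NOT NULL,
    PRIMARY KEY (InbredSetId, StrainId, CaseAttributeId),
    FOREIGN KEY (InbredSetId, CaseAttributeId)
        REFERENCES CaseAttribute (InbredSetId, Id)
);
"""


def connect():
    """Open an in-memory database with the sketched schema loaded."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def set_value(conn, inbredset_id, strain_id, attribute_id, value):
    """Store a value, or delete the record when there is no value to store."""
    key = (inbredset_id, strain_id, attribute_id)
    if value is None or value.strip() == "":
        conn.execute(
            "DELETE FROM CaseAttributeXRefNew "
            "WHERE InbredSetId = ? AND StrainId = ? AND CaseAttributeId = ?",
            key,
        )
    else:
        # INSERT OR REPLACE is SQLite-specific upsert syntax.
        conn.execute(
            "INSERT OR REPLACE INTO CaseAttributeXRefNew "
            "(InbredSetId, StrainId, CaseAttributeId, Value) "
            "VALUES (?, ?, ?, ?)",
            key + (value.strip(),),
        )
```

With this shape, a missing value is never stored as `NULL` or as a placeholder; the record is simply absent.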

## Data Types

> ... (and exist separately from "normal" traits mainly because they're non-numeric)

The values for Case-Attributes are non-numeric data. This will probably be mostly textual data.

As an example:
=> https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish Trait Data and Analysis for BXD_10010
we see Case-Attributes as:

* Free-form text (no constraints) - see the `Status` column
* Enumerations - textual data, but where the user can only pick from specific values
* Links - The value displayed also acts as a link - e.g. the 'JAX:*' values in the `RRID` column
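
One way to model these kinds of values is sketched below. Everything in it is an assumption made for illustration: the names `ValueKind`, `AttributeSpec` and `validate_value` are hypothetical, and how an attribute's kind or its permitted values would be recorded has not been decided.

```python
from dataclasses import dataclass
from enum import Enum


class ValueKind(Enum):
    """Hypothetical classification of case-attribute values."""
    FREE_TEXT = "free-text"
    ENUMERATION = "enumeration"
    LINK = "link"


@dataclass(frozen=True)
class AttributeSpec:
    """Hypothetical description of how one case-attribute is validated."""
    name: str
    kind: ValueKind
    allowed: tuple = ()  # only meaningful for ENUMERATION


def validate_value(spec, value):
    """Return the cleaned value, or raise ValueError if it does not fit."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError(f"{spec.name}: empty value")
    if spec.kind is ValueKind.ENUMERATION and cleaned not in spec.allowed:
        raise ValueError(f"{spec.name}: {cleaned!r} is not one of {spec.allowed}")
    # FREE_TEXT and LINK values are accepted as plain text here; how a link
    # target is derived from the displayed value is left open.
    return cleaned
```

For example, `validate_value(AttributeSpec("Sex", ValueKind.ENUMERATION, ("F", "M")), " F ")` returns `"F"`, while a value outside the permitted set raises `ValueError`. The permitted values shown are invented for the example.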


=> https://genenetwork.org/show_trait?trait_id=10002&dataset=CCPublish For this trait

We see:

* Numeric data - see the `N` and `SE` columns

though this might stem from a misunderstanding of the following quote:

> In the following link for example, every column after Value is a case attribute - https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish

**TODO**: Verify whether `N` and `SE` are Case-Attributes

## Authorisation

From email:
> it's probably not okay to let anyone who can edit sample data for a trait also edit case attributes, since they're group level

and from matrix:
> The weird bug aside, Bonface had (mostly) successfully implemented editing these through the CSV files in the same way as any other sample data, but for authorization reasons this probably doesn't make sense (since a user having access to editing sample data for specific traits doesn't imply that they'd have access for editing case attributes across the entire group)

This implies we might need a new set of privileges for dealing with case-attributes, e.g.:
* group:resource:add-case-attributes - Allows user to add a completely new case attribute
* group:resource:edit-case-attributes - Allows user to edit an existing case attribute
* group:resource:delete-case-attributes - Allows user to delete an existing case attribute
* group:resource:view-case-attributes - Allows user to view case attributes and their values

Considering, however, that groups (InbredSets) are not directly linked to any auth resource, this might require some indirection, or a new resource type that handles groups.
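
A group-level check against such privileges might look like the sketch below. It is deliberately minimal and rests on assumptions: the `granted` mapping is a hypothetical stand-in for whatever the auth system eventually provides, and `can_perform` is not an existing function.

```python
# The privilege names proposed above, keyed by the action they permit.
CASE_ATTRIBUTE_PRIVILEGES = {
    "add": "group:resource:add-case-attributes",
    "edit": "group:resource:edit-case-attributes",
    "delete": "group:resource:delete-case-attributes",
    "view": "group:resource:view-case-attributes",
}


def can_perform(action, inbredset_id, granted):
    """Check whether a user may perform `action` on a group's case attributes.

    `granted` is a hypothetical mapping of InbredSetId to the set of
    privilege names the user holds for that group.
    """
    privilege = CASE_ATTRIBUTE_PRIVILEGES[action]
    return privilege in granted.get(inbredset_id, frozenset())
```

The point of the sketch is that the lookup is keyed on the group, not on a trait, so being able to edit sample data for a trait grants nothing here.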

## Features

* Editing existing case-attributes: YES
* Adding new case attributes: ???
* Deleting existing case attributes: ???

Strains/samples are shared across traits. The values for the case attributes are the same for a particular strain/sample for all traits within a particular InbredSet (group).
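
Because the values are shared across all traits in a group, an edit should touch only the cells the user actually changed. The sketch below shows one possible way to compute such a change set. `plan_changes` and its data shapes are hypothetical, not part of any existing code.

```python
def plan_changes(current, edited):
    """Compare two {(strain_id, attribute_id): value} mappings for one group.

    Returns (upserts, deletions). Keys missing from `edited` are left
    untouched, so a partial submission never removes existing data.
    """
    upserts = {}
    deletions = set()
    for key, value in edited.items():
        new = (value or "").strip()
        old = current.get(key)
        if new == "":
            # An emptied cell means "no value": delete the record if it exists.
            if old is not None:
                deletions.add(key)
        elif new != old:
            upserts[key] = new
    return upserts, deletions
```

Treating only explicitly emptied cells as deletions is a design choice meant to avoid accidental data loss of the kind described under Status.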

## Related issues

=> /issues/case-attr-edit-error
=> /issues/fix-case-attribute-work
=> /issues/fix-case-attribute-editing
=> /issues/consecutive-crud-applications-when-uploading-data
=> /issues/edit-metadata-bugs

## References

=> /topics/data-uploads/datasets