# Editing Case-Attributes ## Tags * type: document * keywords: case-attribute, editing * assigned: fredm, zachs, acenteno * status: requirements gathering ## Introduction Case-Attributes are essentially the metadata for the samples. In the GN2 system, they are the extra columns in the table in the "Reviews and Edit Data" accordion tab besides the value and its error margin. To quote @zachs > "Case Attributes" are basically just sample metadata. So stuff like the sex, age, etc of the various individuals (and exist separately from "normal" traits mainly because they're non-numeric) They are the metadata for the various sample in a trait. The case attributes are determined at the group-level: > Since they're metadata (or "attributes" in this case) for samples, they're group-level so for BXD, case attributes would apply at the level of each sample, across all BXD data Also From email: > Every strain has a unique attribute and it's fixed, not variable. ## Direction We need to differentiate two things: * Case-Attribute labels/names/categories (e.g. Sex, Height, Cage-handler, etc) * Case-Attribute values (e.g. Male/Female, 20cm, Frederick, etc.) As is currently implemented (as of before 2023-08-31), both the labels and values are set at group level. A look at => https://github.com/genenetwork/genenetwork1/blob/0f170f0b748a4e10eaf8538f6bcbf88b573ce8e7/web/webqtl/showTrait/DataEditingPage.py Case-Attributes on GeneNetwork1 is a good starting point to help with understanding how case-attributes were implemented and how they worked. ## Status There is code that existed for the case-attributes editing, but it had a critical bug where the data for existing attributes would be deleted/replaced randomly when one made a change. This lead to a pause in this effort. The chosen course of action will, however, not make use of this existing code. Instead, we will reimplement the feature with code in GN3, exposing the data and its editing via API endpoints. ## Database The existing database tables of concern to us are: * InbredSet * CaseAttribute * StrainXRef * Strain * CaseAttributeXRefNew We can fetch case-attribute data from the database with: ``` SELECT caxrn.*, ca.Name AS CaseAttributeName, ca.Description AS CaseAttributeDescription, iset.InbredSetId AS OrigInbredSetId FROM CaseAttribute AS ca INNER JOIN CaseAttributeXRefNew AS caxrn ON ca.Id=caxrn.CaseAttributeId INNER JOIN StrainXRef AS sxr ON caxrn.StrainId=sxr.StrainId INNER JOIN InbredSet AS iset ON sxr.InbredSetId=iset.InbredSetId WHERE caxrn.value != 'x' AND caxrn.value IS NOT NULL; ``` which gives us all the information we need to rework the database schema. Since the Case-Attributes are group-level, we need to move the `InbredSetId` to the `CaseAttribute` table from the `CaseAttributeXRefNew` table. For more concrete relationship declaration, we can have the `CaseAttributeXRefNew` table have it primary key be composed of the `InbredSetId`, `StrainId` and `CaseAttributeId`. That has the added advantage that we can index the table on `InbredSetId` and `StrainId`. That leaves the `CaseAttribute` table with the following columns: * InbredSetId: Foreign Key from `InbredSet` table * Id: The CaseAttribute identifier * Name: Textual name for the Case-Attribute * Description: Textual description fro the case-attribute while the `CaseAttributeXRefNew` table ends up with the following columns: * InbredSetId: Foreign Key from `InbredSet` table * StrainId: The strain * CaseAttributeId: The case-attribute identifier * Value: The value for the case-attribute for this specific strain There will not be any `NULL` values allowed for any of the columns in both tables. If a strain has no value, we simply delete the corresponding record from the `CaseAttributeXRefNew` table. To that end, the following script has been added to ease the migration of the table schemas: => https://github.com/genenetwork/genenetwork3/blob/dd0b29c07017ec398c447ca683dd4b4be18d73b7/scripts/update-case-attribute-tables-20230818 The script is meant to be run only once, and makes the changes mentioned above for both tables. ## Data Types > ... (and exist separately from "normal" traits mainly because they're non-numeric) The values for Case-Attributes are non-numeric data. This will probably be mostly textual data. As an example: => https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish Trait Data and Analysis for BXD_10010 we see Case-Attributes as: * Free-form text (no constraints) - see the `Status` column * Enumerations - textual data, but where the user can only pick from specific values * Links - The value displayed also acts as a link - e.g. the 'JAX:*' values in the `RRID` column => https://genenetwork.org/show_trait?trait_id=10002&dataset=CCPublish For this trait We see: * Numeric data - see the `N` and `SE` columns though that might be a misunderstanding of the quote > In the following link for example, every column after Value is a case attribute - https://genenetwork.org/show_trait?trait_id=10010&dataset=BXDPublish **TODO**: Verify whether `N` and `SE` are Case-Attributes ## Authorisation From email: > it's probably not okay to let anyone who can edit sample data for a trait also edit case attributes, since they're group level and from matrix: > The weird bug aside, Bonface had (mostly) successfully implemented editing these through the CSV files in the same way as any other sample data, but for authorization reasons this probably doesn't make sense (since a user having access to editing sample data for specific traits doesn't imply that they'd have access for editing case attributes across the entire group) From this, it implies we might need a new set of privileges for dealing with case-attributes, e.g. * group:resource:add-case-attributes - Allows user to add a completely new case attribute * group:resource:edit-case-attributes - Allows user to edit an existing case attribute * group:resource:delete-case-attributes - Allows user to delete an existing case attribute * group:resource:view-case-attributes - Allows user to view case attributes and their value Considering, however, that groups (InbredSets) are not directly linked to any auth resource, this might mean some indirection of sorts, or maybe add a new resource type that handles groups. ## Features * Editing existing case-attributes: YES * Adding new case attributes: ??? * Deleting existing case attributes: ??? Strains/samples are shared across traits. The values for the case attributes are the same for a particular strain/sample for all traits within a particular InbredSet (group). ## Related and Unsynthesised Chats => https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$myIoafLp_dIONnyNvEI0k2xf3Y8-LyiI_mkP2vBN08o?via=matrix.org ``` Zachary SloanZ I'm pretty sure multiple phenotypes and mRNA datasets can belong to the same experiment (and definitely for the purposes of case attributes since the mRNA datasets are split by tissue genotype traits should all be considered part of the same "experiment" (at least as long as we're still only databasing a single genotype file for each group) pjotrp : Case attribute editing will still need to be group level, at least until the whole feature is completely changed. Since they're basically just phenotypes we choose to show in the trait page table, and phenotypes are at the group level ``` => https://matrix.to/#/!EhYjlMbZHbGPjmAFbh:matrix.org/$P6SNnpY-nAZsDr3VZlRi05m6MT32lXBsCl-BYLh-YLM?via=matrix.org ``` Zachary SloanZ 21:14 Groups are defined by their list of samples/strains, and the "case attributes" are just "the characteristics of those samples/strains we choose to show on the trait page" (if we move away from the "group" concept entirely that could change, but if we did that we probably would also replace "case attributes" with something else because the way that's implemented is kind of weird to begin with) ZB ``` ## Related issues => /issues/case-attr-edit-error => /issues/fix-case-attribute-work => /issues/fix-case-attribute-editing => /issues/consecutive-crud-applications-when-uploading-data => /issues/edit-metadata-bugs ## References => /topics/data-uploads/datasets