From 798deb388638f13ed40ecc19eed8c53d44b6ab99 Mon Sep 17 00:00:00 2001 From: Munyoki Kilyungi Date: Wed, 20 Dec 2023 01:26:39 +0300 Subject: Update genelist issue. Signed-off-by: Munyoki Kilyungi --- issues/transform-genelist-to-rdf.gmi | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) (limited to 'issues') diff --git a/issues/transform-genelist-to-rdf.gmi b/issues/transform-genelist-to-rdf.gmi index 3c20b5e..bc0f1b8 100644 --- a/issues/transform-genelist-to-rdf.gmi +++ b/issues/transform-genelist-to-rdf.gmi @@ -47,11 +47,21 @@ Identifying duplicates: SELECT GeneSymbol, GeneId, SpeciesId, COUNT(CONCAT(GeneSymbol, "_", GeneId, "_", SpeciesId)) AS `count` FROM GeneList GROUP BY BINARY GeneSymbol, GeneId, chromosome, txStart, txEnd HAVING COUNT(CONCAT(GeneSymbol, "_", GeneId, "_", SpeciesId)) > 1; ``` -## Resolution -This has been resolved in 533c8d85809b, cfcfa78e0149 in: +## Unique Gene Identifiers + +In the GeneList table, some genes share GeneIds and GeneSymbols. GeneIds are unique within a species, while GeneSymbols are unique across species. In cases where GeneSymbols and GeneIDs match, different AlignIDs exist. To create unique identifiers for genes in the GeneList table, we use a query like: + +```sql +SELECT CONCAT_WS("_", GeneSymbol, GeneID, AlignID) FROM GeneList; +``` + +For the GeneList_rn33 table, due to ambiguous cases, we rely on the table's id as a unique identifier. Here's an example of duplicate entries for a gene, differing only in txStart/txEnd/cdsStart/cdsEnd/exonStarts/exonEnd values: + +```sql +SELECT * FROM GeneList_rn33 WHERE geneSymbol="Cbara1" AND NM_ID="NM_199412"\G +``` -=> https://git.genenetwork.org/gn-transform-databases/ gn-transform-databases * closed -- cgit v1.2.3