author: Munyoki Kilyungi, 2023-12-20 01:26:39 +0300
committer: Munyoki Kilyungi, 2023-12-20 01:26:39 +0300
commit: 798deb388638f13ed40ecc19eed8c53d44b6ab99
tree: a1d4ca2026d20dd88fa478ebae827d1de17de8c6 /issues
parent: d0714caaa88df251584ed9cb47745c1b35069533
Update genelist issue.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
Diffstat (limited to 'issues')
-rw-r--r-- issues/transform-genelist-to-rdf.gmi | 16
1 files changed, 13 insertions, 3 deletions
diff --git a/issues/transform-genelist-to-rdf.gmi b/issues/transform-genelist-to-rdf.gmi
index 3c20b5e..bc0f1b8 100644
--- a/issues/transform-genelist-to-rdf.gmi
+++ b/issues/transform-genelist-to-rdf.gmi
@@ -47,11 +47,21 @@ Identifying duplicates:
SELECT GeneSymbol, GeneId, SpeciesId, COUNT(CONCAT(GeneSymbol, "_", GeneId, "_", SpeciesId)) AS `count` FROM GeneList GROUP BY BINARY GeneSymbol, GeneId, chromosome, txStart, txEnd HAVING COUNT(CONCAT(GeneSymbol, "_", GeneId, "_", SpeciesId)) > 1;
```
-## Resolution
-This has been resolved in 533c8d85809b, cfcfa78e0149 in:
+## Unique Gene Identifiers
+
+In the GeneList table, some genes share GeneIds and GeneSymbols. GeneIds are unique within a species, while GeneSymbols are unique across species. Rows that share both GeneSymbol and GeneId differ in their AlignID. To create a unique identifier for each gene in the GeneList table, we use a query like:
+
+```sql
+SELECT CONCAT_WS("_", GeneSymbol, GeneID, AlignID) FROM GeneList;
+```
+
+For the GeneList_rn33 table, some cases remain ambiguous, so we rely on the table's id as the unique identifier. Here is an example of duplicate entries for one gene that differ only in their txStart/txEnd/cdsStart/cdsEnd/exonStarts/exonEnd values:
+
+```sql
+SELECT * FROM GeneList_rn33 WHERE geneSymbol="Cbara1" AND NM_ID="NM_199412"\G
+```
-=> https://git.genenetwork.org/gn-transform-databases/ gn-transform-databases
* closed
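
The identifier scheme in this change (GeneSymbol, GeneID and AlignID joined with "_") can be sanity-checked outside the database. The following is only a minimal sketch in Python, not the project's actual tooling: the helper names `concat_ws` and `duplicate_identifiers` and the sample rows are hypothetical, made up for illustration. It mimics MySQL's `CONCAT_WS`, which skips NULL arguments, and reports any identifier that occurs more than once.

```python
from collections import Counter


def concat_ws(sep, *values):
    """Mimic MySQL CONCAT_WS: join values with sep, skipping NULL (None)."""
    return sep.join(str(v) for v in values if v is not None)


def duplicate_identifiers(rows):
    """Return {identifier: count} for identifiers that occur more than once."""
    counts = Counter(
        concat_ws("_", row["GeneSymbol"], row["GeneID"], row["AlignID"])
        for row in rows
    )
    return {ident: n for ident, n in counts.items() if n > 1}


# Hypothetical rows, not taken from the real GeneList table.
rows = [
    {"GeneSymbol": "Abc1", "GeneID": 101, "AlignID": 1},
    {"GeneSymbol": "Abc1", "GeneID": 101, "AlignID": 2},
    # Because NULLs are skipped, these two rows produce the same identifier.
    {"GeneSymbol": "Xyz2", "GeneID": None, "AlignID": 7},
    {"GeneSymbol": "Xyz2", "GeneID": 7, "AlignID": None},
]

dupes = duplicate_identifiers(rows)
print(dupes)
```

If any of the three columns can be NULL, it may be worth running a check along these lines before relying on the concatenated value as a key, since a skipped NULL can make two distinct rows collide.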