author    Munyoki Kilyungi  2023-12-15 21:38:24 +0300
committer Munyoki Kilyungi  2023-12-15 21:45:03 +0300
commit    1ea6e2dd7655788e198dc13695c829287132498f
tree      3ef484f9bd0010a40e2781826f11e9d673e887b6 /examples/genelist.scm
parent    4a62e17816928e271ba982038ac36fcaf72783d2
download  gn-transform-databases-1ea6e2dd7655788e198dc13695c829287132498f.tar.gz

Preserve gene symbol case when used as an identifier.

Gene symbols occur with varying case (e.g., Shh, SHH), but
`string->identifier` capitalizes the first letter by default.  This
creates inconsistencies in gene symbols, leading to different
predicates and objects for the same entity and introducing errors.

Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
Diffstat (limited to 'examples/genelist.scm')
-rwxr-xr-x examples/genelist.scm 10
1 files changed, 6 insertions, 4 deletions
diff --git a/examples/genelist.scm b/examples/genelist.scm
index fbd39c1..b19b30f 100755
--- a/examples/genelist.scm
+++ b/examples/genelist.scm
@@ -78,10 +78,12 @@
    (gnt:hasTargetSeq rdfs:domain gnc:Probeset))
   (triples
       (string->identifier
-       "gene" (regexp-substitute/global #f "[^A-Za-z0-9:]"
-                                        (string-trim-both
-                                         (field GeneList GeneSymbol))
-                                        'pre "_" 'post))
+       "gene" (regexp-substitute/global
+               #f "[^A-Za-z0-9:]"
+               (string-trim-both
+                (field GeneList GeneSymbol))
+               'pre "_" 'post)
+       #:proc (lambda (x) x))
     (set rdf:type 'gnc:GeneSymbol)
     (set rdfs:label (field GeneList GeneSymbol))
     (set dct:description (sanitize-rdf-string (field GeneList GeneDescription)))
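
The effect of passing `#:proc (lambda (x) x)` can be sketched in plain Guile. This is only an illustration under stated assumptions: `make-identifier` and `capitalize-first` are hypothetical stand-ins written for this example, not the project's `string->identifier`, and the `prefix_symbol` output format is assumed. Only the default behaviour described in the commit message (capitalizing the first letter) is modelled.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical default transform: upper-case the first character and
;; leave the rest of the string untouched.
(define (capitalize-first str)
  (if (string-null? str)
      str
      (string-append (string-upcase (substring str 0 1))
                     (substring str 1))))

;; Hypothetical, simplified stand-in for string->identifier.
;; The "prefix_symbol" layout is an assumption made for this sketch.
(define* (make-identifier prefix str #:key (proc capitalize-first))
  (string-append prefix "_" (proc str)))

;; Same sanitizing step as in the diff: trim whitespace, then replace
;; every character outside [A-Za-z0-9:] with an underscore.
(define (sanitize-symbol raw)
  (regexp-substitute/global #f "[^A-Za-z0-9:]"
                            (string-trim-both raw)
                            'pre "_" 'post))

;; Default transform: the identifier no longer matches the stored symbol.
(display (make-identifier "gene" (sanitize-symbol " shh ")))
(newline)                               ; prints gene_Shh

;; Identity transform, as in the commit: case is preserved.
(display (make-identifier "gene" (sanitize-symbol " shh ")
                          #:proc (lambda (x) x)))
(newline)                               ; prints gene_shh
```

With the identity procedure the identifier is derived from the sanitized symbol exactly as stored, so it stays consistent with the `rdfs:label` value taken from the same `GeneSymbol` field.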