author Munyoki Kilyungi 2023-03-31 16:00:03 +0300
committer Munyoki Kilyungi 2023-04-03 10:48:05 +0300
commit 95614b282d8bc201161282c053ac406e2558dd76
tree 70dadb5f54d0809cb40a21dc8aa8b234cd5a4e33
parent 222fc5196623f177db77d00439dcbb367ed69b7b
List broken utf-8 characters during genewiki dump
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
-rw-r--r-- issues/dump-genewiki-metadata.gmi 18
1 files changed, 18 insertions, 0 deletions
diff --git a/issues/dump-genewiki-metadata.gmi b/issues/dump-genewiki-metadata.gmi
index 398fb67..a665dee 100644
--- a/issues/dump-genewiki-metadata.gmi
+++ b/issues/dump-genewiki-metadata.gmi
@@ -43,3 +43,21 @@ To query these entries:
 ```
 SELECT * FROM GeneRIF_BASIC WHERE symbol = 'NEWENTRY'\G
 ```
+
+* Broken UTF-8 character sets that rapper errored out on and that had to be manually fixed.  Here's a list:
+
+```
+'(("\x28" . "")
+  ("\x29" . "")
+  ("\xa0" . " ")
+  ("â\x81„" . "/")
+  ("â€\x9d" . #\")
+  ("’" . #\')
+  ("\x02" . "")
+  ("\x01" . "")
+  ("β" . "β")
+  ("α-Â\xad" . "α")
+  ("Â\xad" . "")
+  ("α" . "α")
+  ("–" . "-"))
+```
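
The replacement table in the patch can be read as an ordered list of (broken, fixed) pairs applied to each text field before it is handed to rapper. Below is a minimal Python sketch of that idea. It is not the code used for the dump; the names `REPLACEMENTS` and `clean_text` are hypothetical, and only a subset of the pairs is copied over, written with explicit escapes so the bytes are unambiguous. Longer patterns are listed first, since the "α-Â\xad" pair would never match if the shorter "Â\xad" pair ran before it.

```python
# Hypothetical sketch: apply an ordered table of (broken, fixed) pairs.
# The pairs below are a subset of the list in the patch above.
REPLACEMENTS = [
    ("\u03b1-\u00c2\u00ad", "\u03b1"),  # "α-Â" + soft hyphen -> "α"
    ("\u00c2\u00ad", ""),               # stray "Â" + soft hyphen -> removed
    ("\xa0", " "),                      # non-breaking space -> plain space
    ("\x01", ""),                       # control character -> removed
    ("\x02", ""),                       # control character -> removed
    ("\u2013", "-"),                    # en dash -> ASCII hyphen
]


def clean_text(text: str) -> str:
    """Return text with every broken sequence replaced, in table order."""
    for broken, fixed in REPLACEMENTS:
        text = text.replace(broken, fixed)
    return text


if __name__ == "__main__":
    sample = "TNF\u03b1-\u00c2\u00adinduced\xa0gene\x01"
    print(clean_text(sample))
```

Keeping the table as a list rather than a dict makes the application order explicit, which matters whenever one broken sequence is a substring of another.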