From 95614b282d8bc201161282c053ac406e2558dd76 Mon Sep 17 00:00:00 2001
From: Munyoki Kilyungi
Date: Fri, 31 Mar 2023 16:00:03 +0300
Subject: List broken utf-8 characters during genewiki dump

Signed-off-by: Munyoki Kilyungi
---
 issues/dump-genewiki-metadata.gmi | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

(limited to 'issues')

diff --git a/issues/dump-genewiki-metadata.gmi b/issues/dump-genewiki-metadata.gmi
index 398fb67..a665dee 100644
--- a/issues/dump-genewiki-metadata.gmi
+++ b/issues/dump-genewiki-metadata.gmi
@@ -43,3 +43,21 @@ To query these entries:
 ```
 SELECT * FROM GeneRIF_BASIC WHERE symbol = 'NEWENTRY'\G
 ```
+
+* Broken UTF-8 character sets that rapper errored out on and that had to be manually fixed. Here's a list:
+
+```
+'(("\x28" . "")
+  ("\x29" . "")
+  ("\xa0" . " ")
+  ("â\x81„" . "/")
+  ("â€\x9d" . #\")
+  ("’" . #\')
+  ("\x02" . "")
+  ("\x01" . "")
+  ("β" . "β")
+  ("α-Â\xad" . "α")
+  ("Â\xad" . "")
+  ("α" . "α")
+  ("–" . "-"))
+```
--
cgit v1.2.3
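The list added by this patch reads as an ordered table of (broken sequence, replacement) pairs. As a rough illustration of how such a table could be applied to a string before handing it to rapper, here is a minimal Python sketch. It is not the code used for the actual dump: the names `REPLACEMENTS` and `clean_text` are hypothetical, the keys are transcribed from the list as it appears in the patch (written with explicit escapes so the invisible characters are unambiguous), and the two entries whose key and value print identically are left out because they would be no-ops here. It also assumes the pairs are applied in list order.

```python
# Hypothetical replacement table, transcribed from the list in the patch.
# Order is preserved: the longer "α-Â\xad" key comes before the shorter
# "Â\xad" key so it gets a chance to match first.
REPLACEMENTS = [
    ("\x28", ""),                       # "(" is dropped
    ("\x29", ""),                       # ")" is dropped
    ("\u00a0", " "),                    # "\xa0": no-break space
    ("\u00e2\u0081\u201e", "/"),        # "â\x81„"
    ("\u00e2\u20ac\u009d", '"'),        # "â€\x9d"
    ("\u2019", "'"),                    # "’"
    ("\x02", ""),                       # control character
    ("\x01", ""),                       # control character
    ("\u03b1-\u00c2\u00ad", "\u03b1"),  # "α-Â\xad" becomes "α"
    ("\u00c2\u00ad", ""),               # "Â\xad"
    ("\u2013", "-"),                    # "–" becomes an ASCII hyphen
]


def clean_text(text, table=REPLACEMENTS):
    """Apply each (broken, fixed) pair to text, in table order."""
    for broken, fixed in table:
        text = text.replace(broken, fixed)
    return text


if __name__ == "__main__":
    sample = "TNF\u03b1-\u00c2\u00ad induced (in vitro)\u00a0response"
    print(clean_text(sample))
```

Keeping the table as an ordered list rather than a dictionary makes the precedence between overlapping keys explicit, which matters for the two entries that share the `Â\xad` suffix.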