# Fix Broken UTF-8 characters in our Database

## Tags

* assigned: bonfacem, arthur
* type: database
* priority: high

## Description

We have had garbled text in our database for years. Fixing it by hand through the metadata editing form is impractical because there are too many cases, so we should write a script that repairs the affected records.

This Stack Overflow thread has some useful ideas for detecting the problem:

=> https://stackoverflow.com/questions/1476356/detecting-utf8-broken-characters-in-mysql Detecting broken characters in mysql
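The thread approaches detection from the MySQL side. A complementary Python-side check is sketched below. It is our own heuristic, not taken from the thread, and it assumes the damage comes from UTF-8 bytes being read as Windows-1252 (cp1252). It flags any value where a character that could stand for a UTF-8 lead byte is followed by characters that could stand for continuation bytes. The name looks_broken is hypothetical, and the check can produce false positives, so treat its hits as candidates for review rather than certain errors.

```python
import re

# Characters that cp1252 produces for the UTF-8 continuation bytes 0x80-0xBF.
# Bytes that cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are skipped.
_CONTINUATION = bytes(range(0x80, 0xC0)).decode("cp1252", errors="ignore")

# UTF-8 lead bytes 0xC2-0xF4 decode in cp1252 to U+00C2-U+00F4 (Â ... ô).
# A lead character followed by one or more continuation characters is suspicious.
_SUSPECT = re.compile("[\u00c2-\u00f4][" + re.escape(_CONTINUATION) + "]+")


def looks_broken(text: str) -> bool:
    """Heuristic: True if text contains a UTF-8 byte pattern misread as cp1252."""
    return _SUSPECT.search(text) is not None
```

Running looks_broken over the text columns of a table gives a list of rows to inspect before any automated fix is applied.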

An example of a broken character is ">" appearing where "≥" was intended. It counts as broken because it is not the character that was originally entered. This can happen for several reasons: a mistake when typing or pasting the character, corruption during transmission (the most likely cause here) or in storage, or a font or program that does not support the character.
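To illustrate how transmission or storage can corrupt text, here is a minimal sketch of one common mechanism, assuming (not confirmed for our data) that UTF-8 bytes were decoded as Windows-1252 (cp1252). The three UTF-8 bytes of "≥" turn into three unrelated characters, and reversing the two steps restores the original.

```python
# "≥" (U+2265) is stored in UTF-8 as the three bytes E2 89 A5.
original = "≥"

# Misreading those bytes as cp1252 yields one character per byte: â, ‰, ¥.
broken = original.encode("utf-8").decode("cp1252")
print(broken, len(broken))  # â‰¥ 3

# Undoing the mistake: re-encode as cp1252 to recover the bytes, then decode as UTF-8.
repaired = broken.encode("cp1252").decode("utf-8")
print(repaired)  # ≥
```

Not every byte sequence survives this round trip (cp1252 leaves a few byte values undefined), so a real repair script has to handle decode errors.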

To find the correct replacement for ">", or for any other broken character, look up the Unicode code point of the intended character. In this case the intended character is "≥", whose code point is U+2265 (GREATER-THAN OR EQUAL TO); U+2273 is the similar-looking "≳" (GREATER-THAN OR EQUIVALENT TO). Once you know the correct character, you can search for the broken one and replace it in the text.
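Code points and names can be looked up with Python's standard unicodedata module. The sketch below also shows a simple search-and-replace table. The names describe, REPLACEMENTS and replace_known are hypothetical, and the single mapping shown is an illustrative assumption, to be extended as real cases are confirmed.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


print(describe("≥"))       # U+2265 GREATER-THAN OR EQUAL TO
print(describe("\u2273"))  # U+2273 GREATER-THAN OR EQUIVALENT TO
print(describe(">"))       # U+003E GREATER-THAN SIGN

# Hypothetical table of confirmed broken sequences and their intended characters.
REPLACEMENTS = {
    "â‰¥": "≥",  # assumed: "≥" in UTF-8 misread as cp1252
}


def replace_known(text: str) -> str:
    """Replace every known broken sequence in text with its intended character."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text
```

An explicit table is slower to build than a generic decoder-based fix, but every replacement it makes has been checked by a person.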

Tables I've had to convert:

* Investigators
* InfoFiles
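A table-by-table repair script could look roughly like the sketch below. It assumes the damage is UTF-8 read as cp1252 and leaves any value it cannot re-decode unchanged. The function names and the example column names are hypothetical. It uses the standard sqlite3 module only so the sketch runs on its own; our database is MySQL, and a MySQL driver such as PyMySQL uses %s placeholders instead of ?.

```python
import sqlite3
from typing import Sequence


def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8-read-as-cp1252; return text unchanged if that fails."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Correct non-ASCII text (e.g. "José") lands here and is left alone.
        return text


def fix_table(conn: sqlite3.Connection, table: str, key: str,
              columns: Sequence[str]) -> int:
    """Repair the given text columns of one table; return the number of rows changed.

    Table and column names are interpolated into the SQL, so they must come from
    a trusted, hard-coded list, never from user input.
    """
    col_list = ", ".join(columns)
    assignments = ", ".join(f"{c} = ?" for c in columns)
    select = f"SELECT {key}, {col_list} FROM {table}"
    update = f"UPDATE {table} SET {assignments} WHERE {key} = ?"

    changed = 0
    # fetchall() first, so the UPDATEs do not interfere with the open SELECT cursor.
    for row_key, *values in conn.execute(select).fetchall():
        fixed = [fix_mojibake(v) if isinstance(v, str) else v for v in values]
        if fixed != values:
            conn.execute(update, (*fixed, row_key))
            changed += 1
    conn.commit()
    return changed
```

For example, with a hypothetical Investigators table that has Id, FirstName and LastName columns, fix_table(conn, "Investigators", "Id", ["FirstName", "LastName"]) turns "JosÃ©" into "José" and leaves plain ASCII rows untouched. Running it against a backup first is prudent, since a legitimate value that happens to decode cleanly would also be rewritten.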

* closed