# Wrong CSV in ITP_10001 longevity dataset

## Tags

* assigned: bonfacem
* type: bug
* status: closed
* priority: critical
* keywords: critical bug, in progress, metadata, phenotypes

## Description

[From Suheeta]:

"I downloaded the sample csv file for ITP_10001 to add a couple of attributes to it. What I see is that the csv file is somewhat messed up. The values for
Sex, Site, Tx, and Year have shifted down to next row for each sample (see screenshot below). Not sure if this is the case for everyone or did something
malfunction for me?"

The screenshot:

=> https://github.com/genenetwork/gn-gemtext-threads/blob/main/pictures/ITP_10001-error.png

## Tasks

- [X] Reproduce the issue

- [X] Check if database is affected in any way

- [X] Send a patch to fix

- [X] Notify every relevant user

- [X] Update case-attr page with links

## Notes

### Mon 11 Apr 2022 13:19:14 EAT

The CSV file is technically fine. In the database, some values contain control characters that need to be stripped out. Here is a snippet of what that currently looks like:

```
JL00005,896.000000,x,x,896,4/22/04,,4OHPBN_J,Oct,,0^M,M,JL,1,2004
JL00006,1077.000000,x,x,1077,4/22/04,,4OHPBN_J,Apr,,0^M,M,JL,1,2004
JL00007,790.000000,x,x,790,4/22/04,,4OHPBN_J,Jun,,0^M,M,JL,1,2004
JL00032,916.000000,x,x,916,4/21/04,2017.06,Cont_04_J,Oct,M,2^M,M,JL,0,2004
JL00033,1099.000000,x,x,1099,4/21/04,,Cont_04_J,Apr,,0^M,M,JL,0,2004
```

Note that "^M" represents a carriage return (CR, 0x0D). See:

=> https://en.wikipedia.org/wiki/ANSI_escape_code#C0_control_codes

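Case-attribute values are stored together with any control characters they contain, and those characters were not being stripped out when the data was downloaded. As a consequence, opening the CSV file in Excel (or similar software) showed data that appeared jumbled: the stray carriage return is read as a line break, which pushes the remaining fields (Sex, Site, Tx, Year) onto the next row.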
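
The stripping step can be sketched roughly as follows. This is a minimal illustration, not the code from the actual patches: the names `CONTROL_CHARS`, `strip_control_chars` and `rows_to_csv` are hypothetical, and it assumes that every C0 control character (including tabs) plus DEL can safely be dropped from a field.

```python
import csv
import io
import re

# Assumption: all C0 control characters (0x00-0x1F, which covers CR and LF)
# and DEL (0x7F) are unwanted inside a case-attribute value.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def strip_control_chars(value):
    """Remove control characters from strings; return other types unchanged."""
    if isinstance(value, str):
        return CONTROL_CHARS.sub("", value)
    return value


def rows_to_csv(rows):
    """Serialise rows to CSV text, cleaning every field before it is written."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([strip_control_chars(field) for field in row])
    return buffer.getvalue()
```

For example, `rows_to_csv([["JL00005", "0\r", "M", "JL"]])` produces a single line with the carriage return removed, so the trailing fields stay on the same row.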

### Tue 12 Apr 2022 13:33:31 EAT

The relevant fixes:

=> https://github.com/genenetwork/genenetwork2/pull/697

=> https://github.com/genenetwork/genenetwork3/pull/91