# Wrong CSV in ITP_10001 longevity dataset ## Tags * assigned: bonfacem * type: bug * status: closed * priority: critical * keywords: critical bug, in progress, metadata, phenotypes ## Description [From Suheeta]: "I downloaded the sample csv file for ITP_10001 to add a couple of attributes to it. What I see is that the csv file is somewhat messed up. The values for Sex, Site, Tx, and Year have shifted down to next row for each sample (see screenshot below). Not sure if this is the case for everyone or did something malfunction for me?" The screenshot: => https://github.com/genenetwork/gn-gemtext-threads/blob/main/pictures/ITP_10001-error.png ## Tasks - [X] Reproduce the issue - [X] Check if database is affected in any way - [X] Send a patch to fix - [X] Notify every relevant user - [X] Update case-attr page with links ## Notes ### Mon 11 Apr 2022 13:19:14 EAT: The CSV file is technically fine. In that database, some characters are inserted with control sequences that need to be stripped out. Here's a current snip of how that looks like: ``` JL00005,896.000000,x,x,896,4/22/04,,4OHPBN_J,Oct,,0^M,M,JL,1,2004 JL00006,1077.000000,x,x,1077,4/22/04,,4OHPBN_J,Apr,,0^M,M,JL,1,2004 JL00007,790.000000,x,x,790,4/22/04,,4OHPBN_J,Jun,,0^M,M,JL,1,2004 JL00032,916.000000,x,x,916,4/21/04,2017.06,Cont_04_J,Oct,M,2^M,M,JL,0,2004 JL00033,1099.000000,x,x,1099,4/21/04,,Cont_04_J,Apr,,0^M,M,JL,0,2004 ``` Notice the "^M" represents a carriage return. See: => https://en.wikipedia.org/wiki/ANSI_escape_code#C0_control_codes When storing case attributes, values with control sequences are also stored, and when you download that data for use, they were not being stripped out. As a consequence, opening the csv file in Excel (or similar) software resulted in data that seems jumbled up. ### Tue 12 Apr 2022 13:33:31 EAT The relevant fixes: => https://github.com/genenetwork/genenetwork2/pull/697 => https://github.com/genenetwork/genenetwork3/pull/91