# Data Integrity for Table Identifiers

## Tags

* type: bug
* assigned: fredm
* priority: critical
* status: closed, completed
* keywords: data integrity, mariadb, gn-uploader

## Description

The use of

```
SELECT MAX(Id) FROM …
```

to get the identifier for a new row is not very robust, especially for auto-increment fields.

It also leads to problems when we have tables that have multiple identifiers, e.g table 'InbredSet' with identifiers 'Id' and 'InbredSetId'.

Instead, we need to do something like:

```
︙
cursor.execute("INSERT INTO InbredSet(…) …")
new_id = cursor.lastrowid
cursor.execute("UPDATE InbredSet SET InbredSetId=%s WHERE Id=%s",
               (new_id, new_id))
︙
```

To see the modules that need to be updated, do

```
$ cd /path/to/gn-uploader/
$ find ./ -name '*.py' -print0 | xargs --no-run-if-empty --null --max-procs=7 grep 'MAX(Id)'
```

The changes proposed above only work for single inserts; it will not work for `executemany()` calls like those present in the following files:

* scripts/insert_data.py
* scripts/rqtl2/install_phenos.py
* scripts/rqtl2/install_genotypes.py

For these, we have to retain the `SELECT MAX(Id) …` form.