diff options
-rw-r--r-- | issues/gn-uploader/data-integrity-for-table-ids.gmi | 39 |
1 files changed, 39 insertions, 0 deletions
diff --git a/issues/gn-uploader/data-integrity-for-table-ids.gmi b/issues/gn-uploader/data-integrity-for-table-ids.gmi new file mode 100644 index 0000000..7e9d815 --- /dev/null +++ b/issues/gn-uploader/data-integrity-for-table-ids.gmi @@ -0,0 +1,39 @@ +# Data Integrity for Table Identifiers + +## Tags + +* type: bug +* status: open +* assigned: fredm +* priority: critical +* keywords: data integrity, mariadb, gn-uploader + +## Description + +The use of + +``` +SELECT MAX(Id) FROM … +``` + +to get the identifier for a new row is not very robust, especially for auto-increment fields. + +It also leads to problems when we have tables that have multiple identifiers, e.g table 'InbredSet' with identifiers 'Id' and 'InbredSetId'. + +Instead, we need to do something like: + +``` +︙ +cursor.execute("INSERT INTO InbredSet(…) …") +new_id = cursor.lastrowid +cursor.execute("UPDATE InbredSet SET InbredSetId=%s WHERE Id=%s", + (new_id, new_id)) +︙ +``` + +To see the modules that need to be updated, do + +``` +$ cd /path/to/gn-uploader/ +$ find ./ -name '*.py' -print0 | xargs --no-run-if-empty --null --max-procs=7 grep 'MAX(Id)' +``` |