From e1cfcced925e5f290c67b0194817b1af907ddeb6 Mon Sep 17 00:00:00 2001 From: Frederick Muriuki Muriithi Date: Tue, 16 Apr 2024 11:58:51 +0300 Subject: New issue: Data integrity issue with table identifiers --- .../gn-uploader/data-integrity-for-table-ids.gmi | 39 ++++++++++++++++++++++ 1 file changed, 39 insertions(+) create mode 100644 issues/gn-uploader/data-integrity-for-table-ids.gmi diff --git a/issues/gn-uploader/data-integrity-for-table-ids.gmi b/issues/gn-uploader/data-integrity-for-table-ids.gmi new file mode 100644 index 0000000..7e9d815 --- /dev/null +++ b/issues/gn-uploader/data-integrity-for-table-ids.gmi @@ -0,0 +1,39 @@ +# Data Integrity for Table Identifiers + +## Tags + +* type: bug +* status: open +* assigned: fredm +* priority: critical +* keywords: data integrity, mariadb, gn-uploader + +## Description + +The use of + +``` +SELECT MAX(Id) FROM … +``` + +to get the identifier for a new row is not very robust, especially for auto-increment fields. + +It also leads to problems when we have tables that have multiple identifiers, e.g table 'InbredSet' with identifiers 'Id' and 'InbredSetId'. + +Instead, we need to do something like: + +``` +︙ +cursor.execute("INSERT INTO InbredSet(…) …") +new_id = cursor.lastrowid +cursor.execute("UPDATE InbredSet SET InbredSetId=%s WHERE Id=%s", + (new_id, new_id)) +︙ +``` + +To see the modules that need to be updated, do + +``` +$ cd /path/to/gn-uploader/ +$ find ./ -name '*.py' -print0 | xargs --no-run-if-empty --null --max-procs=7 grep 'MAX(Id)' +``` -- cgit v1.2.3