Diffstat (limited to 'issues/gn-uploader')
-rw-r--r--  issues/gn-uploader/AuthorisationError-gn-uploader.gmi  70
-rw-r--r--  issues/gn-uploader/check-genotypes-in-database-too.gmi  22
-rw-r--r--  issues/gn-uploader/export-uploaded-data-to-RDF-store.gmi  88
-rw-r--r--  issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi  2
-rw-r--r--  issues/gn-uploader/guix-build-gn-uploader-error.gmi  2
-rw-r--r--  issues/gn-uploader/handling-tissues-in-uploader.gmi  10
-rw-r--r--  issues/gn-uploader/link-authentication-authorisation.gmi  21
-rw-r--r--  issues/gn-uploader/move-uploader-to-tux02.gmi  48
-rw-r--r--  issues/gn-uploader/probeset-not-applicable-to-all-data.gmi  9
-rw-r--r--  issues/gn-uploader/provide-page-for-uploaded-data.gmi  27
-rw-r--r--  issues/gn-uploader/replace-redis-with-sqlite3.gmi  29
-rw-r--r--  issues/gn-uploader/resume-upload.gmi  41
-rw-r--r--  issues/gn-uploader/speed-up-rqtl2-qc.gmi  30
-rw-r--r--  issues/gn-uploader/uploading-samples.gmi  51
14 files changed, 445 insertions, 5 deletions
diff --git a/issues/gn-uploader/AuthorisationError-gn-uploader.gmi b/issues/gn-uploader/AuthorisationError-gn-uploader.gmi
new file mode 100644
index 0000000..262ad19
--- /dev/null
+++ b/issues/gn-uploader/AuthorisationError-gn-uploader.gmi
@@ -0,0 +1,70 @@
+# AuthorisationError in gn uploader 
+
+## Tags 
+* assigned: fredm 
+* status: closed, obsoleted
+* priority: critical 
+* type: error 
+* keywords: authorisation, permission
+
+## Description 
+
+While trying to create a population for the Kilifish dataset on the gn-uploader web page,
+the following error was encountered:
+```sh 
+Traceback (most recent call last):
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/flask/app.py", line 917, in full_dispatch_request
+   rv = self.dispatch_request()
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/flask/app.py", line 902, in dispatch_request
+   return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/authorisation.py", line 23, in __is_session_valid__
+   return session.user_token().either(
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/pymonad/either.py", line 89, in either
+   return right_function(self.value)
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/authorisation.py", line 25, in <lambda>
+   lambda token: function(*args, **kwargs))
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/population/views.py", line 185, in create_population
+   ).either(
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/pymonad/either.py", line 91, in either
+   return left_function(self.monoid[0])
+ File "/gnu/store/wxb6rqf7125sb6xqd4kng44zf9yzsm5p-profile/lib/python3.10/site-packages/uploader/monadic_requests.py", line 99, in __fail__
+   raise Exception(_data)
+Exception: {'error': 'AuthorisationError', 'error-trace': 'Traceback (most recent call last):
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/flask/app.py", line 917, in full_dispatch_request
+      rv = self.dispatch_request()
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/flask/app.py", line 902, in dispatch_request
+    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/authlib/integrations/flask_oauth2/resource_protector.py", line 110, in decorated
+    return f(*args, **kwargs)
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/views.py", line 95, in create_population_resource
+    ).then(
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/pymonad/monad.py", line 152, in then
+    result = self.map(function)
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/pymonad/either.py", line 106, in map
+    return self.__class__(function(self.value), (None, True))
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/views.py", line 98, in <lambda>
+    "resource": create_resource(
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/resources/inbredset/models.py", line 25, in create_resource
+    return _create_resource(cursor,
+  File "/gnu/store/38iayxz7dgm86f2x76kfaa6gwicnnjg4-profile/lib/python3.10/site-packages/gn_auth/auth/authorisation/checks.py", line 56, in __authoriser__
+    raise AuthorisationError(error_description)
+gn_auth.auth.errors.AuthorisationError: Insufficient privileges to create a resource
+', 'error_description': 'Insufficient privileges to create a resource'}
+
+```
+The error above resulted from an attempt to submit the following information in the gn-uploader `create population` section.
+The input details were as follows:
+
+* Full Name: Kilifish F2 Intercross Lines
+* Name: KF2_Lines
+* Population code: KF2
+* Description: Kilifish second generation population
+* Family: Crosses, AIL, HS
+* Mapping Methods: GEMMA, QTLReaper, R/qtl
+* Genetic type: intercross
+
+Pressing the `Create Population` button then led to the error above.
+
+## Closed as Obsolete
+
+* The service this was happening on (https://staging-uploader.genenenetwork.org) is no longer running
+* Most of the authorisation issues are resolved in newer code
diff --git a/issues/gn-uploader/check-genotypes-in-database-too.gmi b/issues/gn-uploader/check-genotypes-in-database-too.gmi
new file mode 100644
index 0000000..4e034b7
--- /dev/null
+++ b/issues/gn-uploader/check-genotypes-in-database-too.gmi
@@ -0,0 +1,22 @@
+# Check Genotypes in the Database for R/qtl2 Uploads
+
+## Tags
+
+* type: bug
+* assigned: fredm
+* priority: high
+* status: closed, completed, fixed
+* keywords: gn-uploader, uploader, upload, genotypes, geno
+
+## Description
+
+Currently, the uploader expects an R/qtl2 bundle to be self-contained, i.e. to contain all the genotypes and other data that fully describe the data in that bundle.
+
+This is unnecessary in many situations, since GeneNetwork might already have the appropriate genotypes in its database.
+
+This issue tracks the implementation of a check of the genotypes provided in the bundle against those already in the database.
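
The check against the database can be thought of as a set comparison over marker names. The sketch below is an illustration only, not the uploader's actual code; the input shapes are assumptions for the example.

```python
# Illustrative sketch only (not the uploader's implementation): treat the
# genotype check as a set comparison over marker names.

def missing_markers(bundle_markers, database_markers):
    """Return markers in the bundle that the database does not already have."""
    return sorted(set(bundle_markers) - set(database_markers))

# Markers already in the database need not be present in the bundle;
# only the returned difference would require genotype data in the upload.
```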
+
+### Updates
+
+Fixed in
+=> https://git.genenetwork.org/gn-uploader/commit/?id=0e74a1589db9f367cdbc3dce232b1b6168e3aca1 this commit
diff --git a/issues/gn-uploader/export-uploaded-data-to-RDF-store.gmi b/issues/gn-uploader/export-uploaded-data-to-RDF-store.gmi
new file mode 100644
index 0000000..3ef05cd
--- /dev/null
+++ b/issues/gn-uploader/export-uploaded-data-to-RDF-store.gmi
@@ -0,0 +1,88 @@
+# Export Uploaded Data to LMDB and RDF Stores
+
+## Tags
+
+* assigned: fredm, bonz
+* priority: medium
+* type: feature-request
+* status: open
+* keywords: API, data upload, gn-uploader
+
+## Description
+
+With the QC/Data Upload project nearing completion and being placed in front of the initial user-testing cohort, we need a way to export all uploaded data into the RDF store, either at upload time or shortly after.
+
+
+Users will use the QC/Data upload project[1] to upload their data to GeneNetwork. This will mostly be numerical data in Tab-Separated-Values (.tsv) files.
+
+Once this is done, we do want to have this data available to the user on GeneNetwork as soon as possible so that they can start doing their analyses with the data.
+
+Following @Munyoki's work[2] on the data endpoints in GN3, it should, hypothetically, be possible for the user to simply upload the data and, using the GN3 API, immediately begin their analyses. In practice, however, we will need to export the uploaded data into LMDB, and possibly any related metadata into Virtuoso, for this to work.
+
+This document explores what is needed to get that to work.
+
+## Exporting Sample Data
+
+We can export the sample (numeric) data to LMDB with the "dataset->lmdb" project[3].
+
+The project (as of 2023-11-14T10:12+03:00UTC) does not define an installable binary/script, and therefore cannot be simply added to the data upload project[1] as a dependency and invoked in the background.
+
+### Data Differences
+
+The first line of the .tsv file uploaded is a header line indicating what each field is.
+The first field of the .tsv is a trait's name/identifier. All other fields are numerical strain/sample values for each line/record in the file.
+
+A sample of a .tsv for upload
+=> https://gitlab.com/fredmanglis/gnqc_py/-/blob/main/tests/test_data/average.tsv?ref_type=heads can be found here
+
+From
+=> https://github.com/BonfaceKilz/gn-dataset-dump/blob/main/README.org the readme
+it looks like each record/line/trait from the .tsv file will correspond to a "db-path" in the LMDB data store. This path could be of the form:
+
+```
+/path/to/lmdb/storage/directory/<group-or-inbredset>/<trait-name-or-identifier>/
+```
+
+where
+
+* `<group-or-inbredset>` is a population/group of sorts, e.g. BXD, BayXSha, etc
+* `<trait-name-or-identifier>` is the value in the first field for each and every line
+
+**NB**: Verify this with @Munyoki
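
To make the assumed layout concrete, here is a hedged sketch of how an uploaded .tsv (header line of sample names, first field a trait identifier) might map onto such db-paths. The function name and key layout are assumptions pending the verification noted above.

```python
import csv
from pathlib import PurePosixPath

def records_for_lmdb(tsv_lines, storage_root, group):
    """Yield (db-path, {sample: value}) pairs from an uploaded .tsv.

    Assumes the first line is a header of sample names and the first
    field of every other line is the trait name/identifier."""
    reader = csv.reader(tsv_lines, delimiter="\t")
    samples = next(reader)[1:]  # header: trait field first, then samples
    for row in reader:
        db_path = str(PurePosixPath(storage_root) / group / row[0])
        yield db_path, dict(zip(samples, row[1:]))
```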
+
+### TODOs
+
+* [ ] build an entrypoint binary/script to invoke from other projects
+* [ ] verify initial inference/assumptions regarding data with @Munyoki
+* [ ] translate the uploaded data into a form ingestible by the export program. This could be done in either of the two projects -- I propose the QC/Data Upload project
+* [ ] figure out and document new GN3 data endpoints for users
+
+## Exporting Metadata
+
+Immediately after upload of the data from the .tsv files, the data will most likely have very little metadata attached. Some of the metadata that is assured to be present is:
+
+* Species: The species that the data regards
+* Group/InbredSet
+* Dataset: The dataset that the data is attached to
+
+The metadata is useful for searching for the data. The "metadata->rdf" project[4] is used for exporting the metadata to RDF and will need to be used to initialise the metadata for newly uploaded data.
+
+### TODOs
+
+* [ ] How do we handle this?
+
+
+## Related Issues and Topics
+
+=> https://issues.genenetwork.org/topics/next-gen-databases/design-doc
+=> https://issues.genenetwork.org/topics/lmms/rqtl2/using-rqtl2-lmdb-adapter
+=> https://issues.genenetwork.org/issues/dump-sample-data-to-lmdb
+=> https://issues.genenetwork.org/topics/database/genotype-database
+
+## Footnotes
+
+=> https://git.genenetwork.org/gn-uploader/ 1: QC/Data upload project (gn-uploader) repository
+=> https://github.com/genenetwork/genenetwork3/pull/130 2: Munyoki's Pull request
+=> https://github.com/BonfaceKilz/gn-dataset-dump 3: Dataset -> LMDB export repository
+=> https://git.genenetwork.org/gn-transform-databases/ 4: Metadata -> RDF export repository
diff --git a/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi b/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi
index d2c33e8..5a5cdfa 100644
--- a/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi
+++ b/issues/gn-uploader/gn-uploader-container-running-wrong-gn2.gmi
@@ -3,7 +3,7 @@
 ## Tags
 
 * assigned: fredm, aruni
-* status: open
+* status: closed, completed
 * priority: high
 * type: bug
 * keywords: guix, gn-uploader
diff --git a/issues/gn-uploader/guix-build-gn-uploader-error.gmi b/issues/gn-uploader/guix-build-gn-uploader-error.gmi
index 44a5c4b..aeb6308 100644
--- a/issues/gn-uploader/guix-build-gn-uploader-error.gmi
+++ b/issues/gn-uploader/guix-build-gn-uploader-error.gmi
@@ -86,7 +86,7 @@ Filesystem      Size  Used Avail Use% Mounted on
 
 so we know that's not a problem.
 
-A similar thing had shown up on space.uthsc.edu.
+A similar thing had shown up on our space server.
 
 ### More Troubleshooting Efforts
 
diff --git a/issues/gn-uploader/handling-tissues-in-uploader.gmi b/issues/gn-uploader/handling-tissues-in-uploader.gmi
index 826af15..0c43040 100644
--- a/issues/gn-uploader/handling-tissues-in-uploader.gmi
+++ b/issues/gn-uploader/handling-tissues-in-uploader.gmi
@@ -2,11 +2,11 @@
 
 ## Tags
 
-* status: open
+* status: closed, wontfix
 * priority: high
 * assigned: fredm
 * type: feature-request
-* keywords: gn-uploader, tissues
+* keywords: gn-uploader, tissues, archived
 
 ## Description
 
@@ -112,3 +112,9 @@ ALTER TABLE Tissue MODIFY Id INT(5) UNIQUE NOT NULL;
 
 * [1] https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#ProbeFreeze
 * [2] https://gn1.genenetwork.org/webqtl/main.py?FormID=schemaShowPage#Tissue
+
+## Closed as WONTFIX
+
+I am closing this issue because it was created (2024-03-28) while I had a fundamental misunderstanding of the way data is laid out in the database.
+
+The information on the schema/layout of the tables is still useful, but chances are, we'll look at the tables themselves anyway should we need to figure out the schema.
diff --git a/issues/gn-uploader/link-authentication-authorisation.gmi b/issues/gn-uploader/link-authentication-authorisation.gmi
new file mode 100644
index 0000000..b64f887
--- /dev/null
+++ b/issues/gn-uploader/link-authentication-authorisation.gmi
@@ -0,0 +1,21 @@
+# Link Authentication/Authorisation
+
+## Tags
+
+* status: closed, completed
+* assigned: fredm
+* priority: critical
+* type: feature request, feature-request
+* keywords: gn-uploader, gn-auth, authorisation, authentication, uploader, upload
+
+## Description
+
+The last link in the chain to the uploads is authentication/authorisation. Once the user uploads their data, they need access to it. By default, the auth system denies everyone access to any data that is not linked to a resource and for which no user holds a role granting access.
+
+We currently assign such data to users manually, but that is not a sustainable way of working, especially as the uploader is exposed to more and more users.
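
The default-deny behaviour can be sketched roughly as follows; the data structures here are invented for illustration and do not mirror gn-auth's actual schema.

```python
def can_access(linked_resource, user_roles):
    """Default-deny check: the data must be linked to a resource, and the
    user must hold at least one role on that resource, or access is refused."""
    if linked_resource is None:  # data not linked to any resource
        return False
    return len(user_roles.get(linked_resource, [])) > 0  # no roles => deny
```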
+
+### Close as Completed
+
+The current iteration of the uploader does take into account the user uploading the data, granting them ownership of it. By default, the data is not public and is accessible only to the user who uploaded it.
+
+The user who uploads the data (and therefore owns it) can later grant access to other users of the system.
diff --git a/issues/gn-uploader/move-uploader-to-tux02.gmi b/issues/gn-uploader/move-uploader-to-tux02.gmi
new file mode 100644
index 0000000..20c5b24
--- /dev/null
+++ b/issues/gn-uploader/move-uploader-to-tux02.gmi
@@ -0,0 +1,48 @@
+# Move Uploader to tux02
+
+## Tags
+
+* type: migration
+* assigned: fredm
+* priority: high
+* status: closed, completed, fixed
+* keywords: gn-uploader, guix, container, deploy
+
+## Databases
+
+### MariaDB
+
+To avoid corrupting the data on CI/CD, we need to run a separate database server.
+This implies separate configurations, and separate startup.
+
+Some of the things to do to enable this, then, are:
+
+* [x] Provide separate configs and run db server on separate port
+  - Configs put in /etc/mysql3307
+  - Selected port 3307
+  - datadir in /var/lib/mysql3307 -> /export5
+* [x] Provide separate data directory for the content
+  - extract backup
+* [x] Maybe suffix the files with the port number, e.g.
+  ```
+    datadir       = /var/lib/mysql3307
+    socket        = /var/run/mysqld/mysqld3307.sock
+    ︙
+  ```
+
+### SQLite
+
+- [ ] Provide separate path for the SQLite database file
+- [ ] Run migrations on SQLite database file
+- [ ] Create admin user
+- [ ] Make existing data public by default
+
+## Build Script
+
+- [x] Provide separate host directories that are writeable from the container(s)
+
+## Systemd
+
+- [x] Provide unit file for a separate MariaDB instance running on a different port
+
+## …
diff --git a/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi b/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi
index 1841d36..af3b274 100644
--- a/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi
+++ b/issues/gn-uploader/probeset-not-applicable-to-all-data.gmi
@@ -4,7 +4,7 @@
 
 * type: bug
 * assigned: fredm
-* status: open
+* status: closed
 * priority: high
 * keywords: gn-uploader, uploader, ProbeSet
 
@@ -20,3 +20,10 @@ applicable to our data, I don't think.
 ```
 
 It seems like some of the data does not require a ProbeSet, and in that case, it should be possible to add it without one.
+
+
+## Notes
+
+This "bug" is obsoleted by the fact that the implementation leading to it was entirely wrong.
+
+The feature that was leading to this bug no longer exists, and will have to be re-implemented from scratch with the involvement of @acenteno.
diff --git a/issues/gn-uploader/provide-page-for-uploaded-data.gmi b/issues/gn-uploader/provide-page-for-uploaded-data.gmi
new file mode 100644
index 0000000..5ab7f80
--- /dev/null
+++ b/issues/gn-uploader/provide-page-for-uploaded-data.gmi
@@ -0,0 +1,27 @@
+# Provide Page/Link for/to Uploaded Data
+
+## Tags
+
+* status: closed, completed
+* assigned: fredm
+* priority: medium
+* type: feature, feature request, feature-request
+* keywords: gn-uploader, uploader, data dashboard
+
+## Description
+
+Once a user has uploaded their data, provide them with a landing page/dashboard for the data they have uploaded, with details on what that data is.
+
+* Should we provide a means to edit the data here (mostly to add metadata and the like)?
+* Maybe the page should actually be shown on GN2?
+
+## Blockers
+
+Depends on
+
+=> /issues/gn-uploader/link-authentication-authorisation
+
+
+## Close as complete
+
+The current uploader directs the user to a view of the data they uploaded on GN2. This is complete.
diff --git a/issues/gn-uploader/replace-redis-with-sqlite3.gmi b/issues/gn-uploader/replace-redis-with-sqlite3.gmi
new file mode 100644
index 0000000..d3f94f0
--- /dev/null
+++ b/issues/gn-uploader/replace-redis-with-sqlite3.gmi
@@ -0,0 +1,29 @@
+# Replace Redis with SQL
+
+## Tags
+
+* status: open
+* priority: low
+* assigned: fredm
+* type: feature, feature-request, feature request
+* keywords: gn-uploader, uploader, redis, sqlite, sqlite3
+
+## Description
+
+We currently (as of 2024-06-27) use Redis for tracking any asynchronous jobs (e.g. QC on uploaded files).
+
+A lot of what we use Redis for can be done in one of the many SQL databases (we'll probably use SQLite3 anyway), which are more standardised and easier to migrate data from and to. This has the added advantage that we can open multiple connections to the database, enabling different processes to update the status and metadata of the same job consistently.
+
+Changes done here can then be migrated to the other systems, i.e. GN2, GN3, and gn-auth, as necessary.
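
As a rough sketch of the idea (the table and column names below are assumptions, not the eventual schema), job tracking in SQLite3 might look like:

```python
import sqlite3

def init_jobs(conn):
    """Create the (hypothetical) jobs table if it does not exist."""
    conn.execute("CREATE TABLE IF NOT EXISTS jobs ("
                 "job_id TEXT PRIMARY KEY, "
                 "status TEXT NOT NULL DEFAULT 'queued', "
                 "metadata TEXT)")

def update_status(conn, job_id, status):
    """Update a job's status. Each process can open its own connection;
    SQLite serialises the writes, keeping the job row consistent."""
    with conn:  # commits on success
        conn.execute("UPDATE jobs SET status = ? WHERE job_id = ?",
                     (status, job_id))
```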
+
+### 2025-12-31: Progress Update
+
+Initial basic implementation can be found in:
+
+=> https://git.genenetwork.org/gn-libs/tree/gn_libs/jobs
+=> https://git.genenetwork.org/gn-uploader/commit/?id=774a0af9db439f50421a47249c57e5a0a6932301
+=> https://git.genenetwork.org/gn-uploader/commit/?id=589ab74731aed62b1e1b3901d25a95fc73614f57
+
+and others.
+
+More work needs to be done to clean up some minor annoyances.
diff --git a/issues/gn-uploader/resume-upload.gmi b/issues/gn-uploader/resume-upload.gmi
new file mode 100644
index 0000000..0f9ba30
--- /dev/null
+++ b/issues/gn-uploader/resume-upload.gmi
@@ -0,0 +1,41 @@
+# gn-uploader: Resume Upload
+
+## Tags
+
+* status: closed, completed, fixed
+* priority: medium
+* assigned: fredm, flisso
+* type: feature request, feature-request
+* keywords: gn-uploader, uploader, upload, resume upload
+
+## Description
+
+If a user is uploading a particularly large file, we might need to provide a way for the user to resume their upload of the file.
+
+Maybe this can wait until we have
+=> /issues/gn-uploader/link-authentication-authorisation linked authentication/authorisation to gn-uploader.
+In this way, each upload can be linked to a specific user.
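
One possible client-side shape for this, sketched under assumptions (none of these names exist in the uploader), is to identify an upload by a portable hash of the file's contents and resume from however many bytes the server reports it already holds:

```python
import hashlib

def file_digest(path, chunk_size=8192):
    """Hash the file in chunks, so large files need not fit in memory.
    The digest identifies the upload across resumed sessions."""
    digest = hashlib.sha256()
    with open(path, "rb") as infile:
        for chunk in iter(lambda: infile.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def next_chunk(path, server_offset, chunk_size=8192):
    """Read the next chunk to send, starting where the server left off."""
    with open(path, "rb") as infile:
        infile.seek(server_offset)
        return infile.read(chunk_size)
```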
+
+### TODOs
+
+* [x] Build UI to allow uploads
+* [x] Build back-end to handle uploads
+* [x] Handle upload failures/errors
+* [x] Deploy to staging
+
+### Updates
+
+=> https://git.genenetwork.org/gn-uploader/commit/?id=9a8dddab072748a70d43416ac8e6db69ad6fb0cb
+=> https://git.genenetwork.org/gn-uploader/commit/?id=df9da3d5b5e4382976ede1b54eb1aeb04c4c45e5
+=> https://git.genenetwork.org/gn-uploader/commit/?id=47c2ea64682064d7cb609e5459d7bd2e49efa17e
+=> https://git.genenetwork.org/gn-uploader/commit/?id=a68fe177ae41f2e58a64b3f8dcf3f825d004eeca
+
+### Possible Resources
+
+=> https://javascript.info/resume-upload
+=> https://github.com/23/resumable.js/
+=> https://www.dropzone.dev/
+=> https://stackoverflow.com/questions/69339582/what-hash-python-3-hashlib-yields-a-portable-hash-of-file-contents
+
+
+This is mostly fixed. Any arising bugs can be tracked in separate issues.
diff --git a/issues/gn-uploader/speed-up-rqtl2-qc.gmi b/issues/gn-uploader/speed-up-rqtl2-qc.gmi
new file mode 100644
index 0000000..43e6d49
--- /dev/null
+++ b/issues/gn-uploader/speed-up-rqtl2-qc.gmi
@@ -0,0 +1,30 @@
+# Speed Up QC on R/qtl2 Bundles
+
+## Tags
+
+## Description
+
+The default format for the CSV files in a R/qtl2 bundle is:
+
+```
+matrix of individuals × (markers/phenotypes/covariates/phenotype covariates/etc.)
+```
+
+One or more files in the R/qtl2 bundle could, however,
+=> https://kbroman.org/qtl2/assets/vignettes/input_files.html#csv-files be transposed,
+which means the system needs to "un-transpose" such files before processing.
+
+Currently, the system does this by reading all the files of a particular type, and then "un-transposing" the entire thing. This leads to a very slow system.
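
A minimal sketch of "un-transposing" one parsed CSV in isolation (illustration only, not the uploader's code):

```python
def untranspose(rows):
    """Swap rows and columns of one parsed CSV file.

    `rows` is a list of equal-length rows, as csv.reader would yield;
    handling a single file at a time keeps memory bounded per file and
    lets separate files be processed independently."""
    return [list(column) for column in zip(*rows)]
```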
+
+This issue proposes to do the quality control/assurance processing on each file in isolation, where possible - this will allow parallelisation/multiprocessing of the QC checks.
+
+The main considerations that need to be handled are as follows:
+
+* Do QC on (founder) genotype files (when present) before any of the other files
+* Genetic and physical maps (if present) can have QC run on them after the genotype files
+* Do QC on phenotype files (when present) after genotype files but before any other files
+* Covariate and phenotype covariate files come after the phenotype files
+* Cross information files … ?
+* Sex information files … ?
+
+We should probably detail the type of QC checks done for each type of file.
diff --git a/issues/gn-uploader/uploading-samples.gmi b/issues/gn-uploader/uploading-samples.gmi
new file mode 100644
index 0000000..11842b9
--- /dev/null
+++ b/issues/gn-uploader/uploading-samples.gmi
@@ -0,0 +1,51 @@
+# Uploading Samples
+
+## Tags
+
+* status: open
+* assigned: fredm
+* interested: acenteno, zachs, flisso
+* priority: high
+* type: feature-request
+* keywords: gn-uploader, uploader, samples, strains
+
+## Description
+
+This will track the various notes regarding the upload of samples onto GeneNetwork.
+
+### Sample Lists
+
+From the email thread(s) with @zachs, @flisso and @acenteno:
+
+```
+When there's a new set of individuals, it generally needs to be added as a new group. In the absence of genotype data, a "dummy" .geno file currently needs to be generated* in order to define the sample list (if you look at the list of .geno files in genotype_files/genotype you'll find some really small files that just have either a single marker or a bunch of fake markers calls "Marker1, Marker2, etc" - these are solely just used to get the samplelist from the columns). So in theory such a file could be generated as a part of the upload process in the absence of genotypes
+```
+
+We note, however, that as @zachs mentions:
+
+```
+This is really goofy and should probably change. I've brought up the idea of just replacing these with JSON files containing group metadata (including samplelist), but we've never actually gone through with making any change to this. I already did something sorta similar to this with the existing JSON files (in genotype_files/genotype), but those are currently only used in situations where there are either multiple genotype files, or a genotype file only contains a subset of samples/strains from a group (so the JSON file tells mapping to only use those samples/strains).
+```
+
+We need to explore whether such a change might need updates to the GN2/GN3 code to ensure code that depends on these dummy files can also use the new format JSON files too.
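
A minimal sketch of the JSON replacement @zachs suggests (the key names, e.g. "samplelist", are borrowed from the discussion above and are assumptions, not an agreed format):

```python
import json

def group_metadata_json(group_name, samples):
    """Serialise a group's sample list, preserving the given order:
    the order must stay consistent across a group's genotype files."""
    return json.dumps({"group": group_name, "samplelist": list(samples)},
                      indent=2)
```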
+
+Regarding the order of the samples, from the email thread:
+
+```
+Regarding the order of samples, it can basically be whatever we decide it is. It just needs to stay consistent (like if there are multiple genotype files). It only really affects how it's displayed, and any other genotype files we use for mapping needs to share the same order.
+```
+
+The ordering of the samples has no bearing on the analysis of the data, i.e. it does not affect the results of computations.
+
+
+### Curation
+
+```
+But any time new samples are involved, there probably needs to be some explicit confirmation by a curator like Rob (since we want to avoid a situation where a sample/strain just has a typo or somethin and we treat it like a new sample/strain).
+```
+
+also
+
+```
+When there's a mix of existing individuals, I think it's usually the case that it's the same group (that is being expanded with new individuals), but anything that involves adding new samples should probably involve some sort of direct/explicit confirmation from a curator like Rob or something.
+```