The following are some of the requirements that the data in your file MUST fulfil before it is considered valid for this system:
File headings
The first row in the file should contain the headings. The number of headings in this first row determines the number of columns expected on all other lines in the file.

Each heading value MUST appear in the first row ONE AND ONLY ONE time.
The samples/cases (previously 'strains') headers in your first row will be checked against those in the GeneNetwork database.

If you encounter an error saying your sample(s)/case(s) do not exist in the GeneNetwork database, then you will have to use the Upload Samples/Cases option on this system to upload them.
Data
NONE of the data cells/fields is allowed to be empty. All fields/cells MUST contain a value.
The first column of the data rows will be considered a textual field, holding the "identifier" for that row.
Except for the first column/field of each data row, NONE of the data columns/cells/fields should contain spurious characters like `eeeee`, `5.555iloveguix`, etc. All of them should be decimal values.
Decimal numbers must conform to the following criteria:
When checking an average file, decimal numbers must have exactly three decimal places to the right of the decimal point.
When checking a standard-error file, decimal numbers must have six or more decimal places to the right of the decimal point.
There must be a digit to the left of the decimal point (e.g. 0.55555 is allowed but .55555 is not).
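As an illustration, the three rules above can be captured with regular expressions. The following is a minimal sketch, assuming unsigned values; the helper names are hypothetical and not part of this system:

import re

AVERAGE_RE = re.compile(r"^[0-9]+\.[0-9]{3}$")  # exactly three decimal places
STDERR_RE = re.compile(r"^[0-9]+\.[0-9]{6,}$")  # six or more decimal places

def valid_average_value(value: str) -> bool:
    """Hypothetical helper: True for e.g. '0.555'; False for '.555' or '0.5555'."""
    return bool(AVERAGE_RE.match(value))

def valid_stderr_value(value: str) -> bool:
    """Hypothetical helper: True for e.g. '0.555555'; False for '0.555'."""
    return bool(STDERR_RE.match(value))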
Supported File Types

We support the following file types:
Tab-separated value files (.tsv): the TAB character is used to separate the fields of each column.

.txt files: content has the same format as the .tsv files above.
.zip files: each zip file should contain ONE AND ONLY ONE file of the .tsv or .txt type above. A zip file with more than one file is invalid, and so is an empty zip file.
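The zip rule above can be checked with Python's standard zipfile module. This is a minimal sketch; the helper name is hypothetical and not part of this system:

from zipfile import ZipFile, is_zipfile

def bundle_has_one_data_file(zippath: str) -> bool:
    """Hypothetical helper: True only if the zip holds exactly one .tsv/.txt file."""
    if not is_zipfile(zippath):
        return False
    with ZipFile(zippath) as zfile:
        # Ignore directory entries; only actual files count.
        members = [name for name in zfile.namelist() if not name.endswith("/")]
    return len(members) == 1 and members[0].lower().endswith((".tsv", ".txt"))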
Each of the sections below gives you a different option for data upload. Please read the documentation for each section carefully to understand what it is about.
R/qtl2 Bundles
This feature combines and extends the two upload methods below. Instead of uploading one item at a time, the R/qtl2 bundle you upload can contain both the genotype data (samples/individuals/cases and their data) and the expression data.

Additionally, the R/qtl2 bundle can contain extra metadata that neither of the methods below can handle.
This feature enables you to upload expression data. It expects the data to be in tab-separated values (TSV) files. The data should be a simple matrix of phenotype × sample, i.e. the first column is a list of the phenotypes and the first row is a list of samples/cases.
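For illustration, a small (hypothetical) expression file with three samples might look like the following, with the columns separated by TAB characters (shown here as spaces):

ProbeSetID    BXD1     BXD2     BXD5
trait_0001    9.123    9.456    8.789
trait_0002    7.111    7.222    7.333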
If you haven't done so, please go to this page to learn the requirements for file formats, along with helpful suggestions for entering your data in a fast and easy way.
PLEASE REVIEW YOUR DATA. Make sure your data complies with our system requirements. (Help)
UPLOAD YOUR DATA FOR DATA VERIFICATION. We accept .tsv, .txt and .zip files. (Help)
For the expression data above, the samples/cases in your file need to already exist in the GeneNetwork database. If any samples do not already exist, the upload of the expression data will fail.

This section gives you the opportunity to upload any missing samples.
{{job_name}}: parse results

{%if user_aborted%}
Job aborted by the user
{%endif%}

{{errors_display(errors, "No errors found in the file", "We found the following errors", True)}}

{%if errors | length == 0 and not user_aborted %}
The processing of the R/qtl2 bundle you uploaded has failed. We have provided some information below to help you figure out what the problem could be.

If you find that you cannot figure out what the problem is on your own, please contact the team running the system for assistance, providing the following details:
The R/qtl2 bundle you uploaded
This URL: {{request_url()}}
(maybe) a screenshot of this page
stdout
{{cli_output(job, "stdout")}}

stderr
{{cli_output(job, "stderr")}}

Log
{%for msg in messages%}
{{msg}}
{%endfor%}
Your R/qtl2 file bundle contains a "geno" specification. You will therefore need to select one of the existing Genotype datasets or create a new one.

This is the dataset your data will be organised under.
The data is organised in a hierarchical form, beginning with species at the very top. Under species, the data is organised by population, sometimes referred to as grouping. (In some really old documents/systems, you might see this referred to as InbredSet.)

In this section, you get to define what population your data is to be organised by.

This is the information you have provided to accompany the R/qtl2 bundle you have uploaded. Please verify the information is correct before proceeding.
Provide a valid R/qtl2 zip file here. In particular, ensure your zip bundle contains exactly one control file and the corresponding files mentioned in the control file.

The control file can be either a YAML or JSON file. ALL other data files in the zip bundle should be CSV files.
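For reference, a minimal (hypothetical) YAML control file might look like the following; the file names and genotype codes are placeholders for whatever your bundle actually contains:

# control.yaml -- illustrative only
crosstype: risib
geno: geno.csv
pheno: pheno.csv
genotypes:
  B: 1
  D: 2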
You have successfully uploaded the zipped bundle of R/qtl2 files.

The next step is to select the various extra information we need to figure out what to do with the data. You will select/create the relevant studies and/or datasets to organise the data in the steps that follow.

We organise the samples/cases/strains in a hierarchical form, starting with species at the very top. Under species, we have a grouping in terms of the relevant population (e.g. inbred populations, cell tissue, etc.).
There was a critical failure launching the job to parse your file. This is our fault and (probably) has nothing to do with the file you uploaded.

Please notify the developers of this issue when you encounter it, providing the link to this page, or the information below.

Debugging Information

job id: {{job_id}}

{%endblock%}
diff --git a/qc_app/upload/__init__.py b/qc_app/upload/__init__.py
deleted file mode 100644
index 5f120d4..0000000
--- a/qc_app/upload/__init__.py
+++ /dev/null
@@ -1,7 +0,0 @@
-"""Package handling upload of files."""
-from flask import Blueprint
-
-from .rqtl2 import rqtl2
-
-upload = Blueprint("upload", __name__)
-upload.register_blueprint(rqtl2, url_prefix="/rqtl2")
diff --git a/qc_app/upload/rqtl2.py b/qc_app/upload/rqtl2.py
deleted file mode 100644
index 51d8321..0000000
--- a/qc_app/upload/rqtl2.py
+++ /dev/null
@@ -1,1157 +0,0 @@
-"""Module to handle uploading of R/qtl2 bundles."""#pylint: disable=[too-many-lines]
-import sys
-import json
-import traceback
-from pathlib import Path
-from datetime import date
-from uuid import UUID, uuid4
-from functools import partial
-from zipfile import ZipFile, is_zipfile
-from typing import Union, Callable, Optional
-
-import MySQLdb as mdb
-from redis import Redis
-from MySQLdb.cursors import DictCursor
-from werkzeug.utils import secure_filename
-from flask import (
- flash,
- escape,
- request,
- jsonify,
- url_for,
- redirect,
- Response,
- Blueprint,
- render_template,
- current_app as app)
-
-from r_qtl import r_qtl2
-
-from qc_app import jobs
-from qc_app.files import save_file, fullpath
-from qc_app.dbinsert import species as all_species
-from qc_app.db_utils import with_db_connection, database_connection
-
-from qc_app.db.platforms import platform_by_id, platforms_by_species
-from qc_app.db.averaging import averaging_methods, averaging_method_by_id
-from qc_app.db.tissues import all_tissues, tissue_by_id, create_new_tissue
-from qc_app.db import (
- species_by_id,
- save_population,
- populations_by_species,
- population_by_species_and_id,)
-from qc_app.db.datasets import (
- geno_dataset_by_id,
- geno_datasets_by_species_and_population,
-
- probeset_study_by_id,
- probeset_create_study,
- probeset_dataset_by_id,
- probeset_create_dataset,
- probeset_datasets_by_study,
- probeset_studies_by_species_and_population)
-
-rqtl2 = Blueprint("rqtl2", __name__)
-
-@rqtl2.route("/", methods=["GET", "POST"])
-@rqtl2.route("/select-species", methods=["GET", "POST"])
-def select_species():
- """Select the species."""
- if request.method == "GET":
- return render_template("rqtl2/index.html", species=with_db_connection(all_species))
-
- species_id = request.form.get("species_id")
- species = with_db_connection(
- lambda conn: species_by_id(conn, species_id))
- if bool(species):
- return redirect(url_for(
- "upload.rqtl2.select_population", species_id=species_id))
- flash("Invalid species or no species selected!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
-
-
-@rqtl2.route("/upload/species//select-population",
- methods=["GET", "POST"])
-def select_population(species_id: int):
- """Select/Create the population to organise data under."""
- with database_connection(app.config["SQL_URI"]) as conn:
- species = species_by_id(conn, species_id)
- if not bool(species):
- flash("Invalid species selected!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
-
- if request.method == "GET":
- return render_template(
- "rqtl2/select-population.html",
- species=species,
- populations=populations_by_species(conn, species_id))
-
- population = population_by_species_and_id(
- conn, species["SpeciesId"], request.form.get("inbredset_id"))
- if not bool(population):
- flash("Invalid Population!", "alert-error error-rqtl2")
- return redirect(
- url_for("upload.rqtl2.select_population", pgsrc="error"),
- code=307)
-
- return redirect(url_for("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species["SpeciesId"],
- population_id=population["InbredSetId"]))
-
-
-@rqtl2.route("/upload/species//create-population",
- methods=["POST"])
-def create_population(species_id: int):
- """Create a new population for the given species."""
- population_page = redirect(url_for("upload.rqtl2.select_population",
- species_id=species_id))
- with database_connection(app.config["SQL_URI"]) as conn:
- species = species_by_id(conn, species_id)
- population_name = request.form.get("inbredset_name", "").strip()
- population_fullname = request.form.get("inbredset_fullname", "").strip()
- if not bool(species):
- flash("Invalid species!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
- if not bool(population_name):
- flash("Invalid Population Name!", "alert-error error-rqtl2")
- return population_page
- if not bool(population_fullname):
- flash("Invalid Population Full Name!", "alert-error error-rqtl2")
- return population_page
- new_population = save_population(conn, {
- "SpeciesId": species["SpeciesId"],
- "Name": population_name,
- "InbredSetName": population_fullname,
- "FullName": population_fullname,
- "Family": request.form.get("inbredset_family") or None,
- "Description": request.form.get("description") or None
- })
-
- flash("Population created successfully.", "alert-success")
- return redirect(
- url_for("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species_id,
- population_id=new_population["population_id"],
- pgsrc="create-population"),
- code=307)
-
-
-class __RequestError__(Exception): #pylint: disable=[invalid-name]
- """Internal class to avoid pylint's `too-many-return-statements` error."""
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle"),
- methods=["GET", "POST"])
-def upload_rqtl2_bundle(species_id: int, population_id: int):
- """Allow upload of R/qtl2 bundle."""
- with database_connection(app.config["SQL_URI"]) as conn:
- species = species_by_id(conn, species_id)
- population = population_by_species_and_id(
- conn, species["SpeciesId"], population_id)
- if not bool(species):
- flash("Invalid species!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
- if not bool(population):
- flash("Invalid Population!", "alert-error error-rqtl2")
- return redirect(
- url_for("upload.rqtl2.select_population", pgsrc="error"),
- code=307)
- if request.method == "GET" or (
- request.method == "POST"
- and bool(request.args.get("pgsrc"))):
- return render_template("rqtl2/upload-rqtl2-bundle-step-01.html",
- species=species,
- population=population)
-
- try:
- app.logger.debug("Files in the form: %s", request.files)
- the_file = save_file(request.files["rqtl2_bundle_file"],
- Path(app.config["UPLOAD_FOLDER"]))
- except AssertionError:
- app.logger.debug(traceback.format_exc())
- flash("Please provide a valid R/qtl2 zip bundle.",
- "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species_id,
- population_id=population_id))
-
- if not is_zipfile(str(the_file)):
- app.logger.debug("The file is not a zip file.")
- raise __RequestError__("Invalid file! Expected a zip file.")
-
- jobid = trigger_rqtl2_bundle_qc(
- species_id,
- population_id,
- the_file,
- request.files["rqtl2_bundle_file"].filename)#type: ignore[arg-type]
- return redirect(url_for(
- "upload.rqtl2.rqtl2_bundle_qc_status", jobid=jobid))
-
-
-def trigger_rqtl2_bundle_qc(
- species_id: int,
- population_id: int,
- rqtl2bundle: Path,
- originalfilename: str
-) -> UUID:
- """Trigger QC on the R/qtl2 bundle."""
- redisuri = app.config["REDIS_URL"]
- with Redis.from_url(redisuri, decode_responses=True) as rconn:
- jobid = uuid4()
- redis_ttl_seconds = app.config["JOBS_TTL_SECONDS"]
- jobs.launch_job(
- jobs.initialise_job(
- rconn,
- jobs.jobsnamespace(),
- str(jobid),
- [sys.executable, "-m", "scripts.qc_on_rqtl2_bundle",
- app.config["SQL_URI"], app.config["REDIS_URL"],
- jobs.jobsnamespace(), str(jobid), str(species_id),
- str(population_id), "--redisexpiry",
- str(redis_ttl_seconds)],
- "rqtl2-bundle-qc-job",
- redis_ttl_seconds,
- {"job-metadata": json.dumps({
- "speciesid": species_id,
- "populationid": population_id,
- "rqtl2-bundle-file": str(rqtl2bundle.absolute()),
- "original-filename": originalfilename})}),
- redisuri,
- f"{app.config['UPLOAD_FOLDER']}/job_errors")
- return jobid
-
-
-def chunk_name(uploadfilename: str, chunkno: int) -> str:
- """Generate chunk name from original filename and chunk number"""
- if uploadfilename == "":
- raise ValueError("Name cannot be empty!")
- if chunkno < 1:
- raise ValueError("Chunk number must be greater than zero")
- return f"{secure_filename(uploadfilename)}_part_{chunkno:05d}"
-
-
-def chunks_directory(uniqueidentifier: str) -> Path:
- """Compute the directory where chunks are temporarily stored."""
- if uniqueidentifier == "":
- raise ValueError("Unique identifier cannot be empty!")
- return Path(app.config["UPLOAD_FOLDER"], f"tempdir_{uniqueidentifier}")
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle-chunked"),
- methods=["GET"])
-def upload_rqtl2_bundle_chunked_get(# pylint: disable=["unused-argument"]
- species_id: int,
- population_id: int
-):
- """
- Extension to the `upload_rqtl2_bundle` endpoint above that provides a way
- for testing whether all the chunks have been uploaded and to assist with
- resuming a failed upload.
- """
- fileid = request.args.get("resumableIdentifier", type=str) or ""
- filename = request.args.get("resumableFilename", type=str) or ""
- chunk = request.args.get("resumableChunkNumber", type=int) or 0
- if not (fileid and filename and chunk):
- return jsonify({
- "message": "At least one required query parameter is missing.",
- "error": "BadRequest",
- "statuscode": 400
- }), 400
-
- if Path(chunks_directory(fileid),
- chunk_name(filename, chunk)).exists():
- return "OK"
-
- return jsonify({
- "message": f"Chunk {chunk} was not found.",
- "error": "NotFound",
- "statuscode": 404
- }), 404
-
-
-def __merge_chunks__(targetfile: Path, chunkpaths: tuple[Path, ...]) -> Path:
- """Merge the chunks into a single file."""
- with open(targetfile, "ab") as _target:
- for chunkfile in chunkpaths:
- with open(chunkfile, "rb") as _chunkdata:
- _target.write(_chunkdata.read())
-
- chunkfile.unlink()
- return targetfile
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle-chunked"),
- methods=["POST"])
-def upload_rqtl2_bundle_chunked_post(species_id: int, population_id: int):
- """
- Extension to the `upload_rqtl2_bundle` endpoint above that allows large
- files to be uploaded in chunks.
-
- This should hopefully speed up uploads, and if done right, even enable
- resumable uploads
- """
- _totalchunks = request.form.get("resumableTotalChunks", type=int) or 0
- _chunk = request.form.get("resumableChunkNumber", default=1, type=int)
- _uploadfilename = request.form.get(
- "resumableFilename", default="", type=str) or ""
- _fileid = request.form.get(
- "resumableIdentifier", default="", type=str) or ""
- _targetfile = Path(app.config["UPLOAD_FOLDER"], _fileid)
-
- if _targetfile.exists():
- return jsonify({
- "message": (
- "A file with a similar unique identifier has previously been "
- "uploaded and possibly is/has being/been processed."),
- "error": "BadRequest",
- "statuscode": 400
- }), 400
-
- try:
- # save chunk data
- chunks_directory(_fileid).mkdir(exist_ok=True, parents=True)
- request.files["file"].save(Path(chunks_directory(_fileid),
- chunk_name(_uploadfilename, _chunk)))
-
- # Check whether upload is complete
- chunkpaths = tuple(
- Path(chunks_directory(_fileid), chunk_name(_uploadfilename, _achunk))
- for _achunk in range(1, _totalchunks+1))
- if all(_file.exists() for _file in chunkpaths):
- # merge_files and clean up chunks
- __merge_chunks__(_targetfile, chunkpaths)
- chunks_directory(_fileid).rmdir()
- jobid = trigger_rqtl2_bundle_qc(
- species_id, population_id, _targetfile, _uploadfilename)
- return url_for(
- "upload.rqtl2.rqtl2_bundle_qc_status", jobid=jobid)
- except Exception as exc:# pylint: disable=[broad-except]
- msg = "Error processing uploaded file chunks."
- app.logger.error(msg, exc_info=True, stack_info=True)
- return jsonify({
- "message": msg,
- "error": type(exc).__name__,
- "error-description": " ".join(str(arg) for arg in exc.args),
- "error-trace": traceback.format_exception(exc)
- }), 500
-
- return "OK"
-
-
-@rqtl2.route("/upload/species/rqtl2-bundle/qc-status/",
- methods=["GET", "POST"])
-def rqtl2_bundle_qc_status(jobid: UUID):
- """Check the status of the QC jobs."""
- with (Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn,
- database_connection(app.config["SQL_URI"]) as dbconn):
- try:
- thejob = jobs.job(rconn, jobs.jobsnamespace(), jobid)
- messagelistname = thejob.get("log-messagelist")
- logmessages = (rconn.lrange(messagelistname, 0, -1)
- if bool(messagelistname) else [])
- jobstatus = thejob["status"]
- if jobstatus == "error":
- return render_template("rqtl2/rqtl2-qc-job-error.html",
- job=thejob,
- errorsgeneric=json.loads(
- thejob.get("errors-generic", "[]")),
- errorsgeno=json.loads(
- thejob.get("errors-geno", "[]")),
- errorspheno=json.loads(
- thejob.get("errors-pheno", "[]")),
- errorsphenose=json.loads(
- thejob.get("errors-phenose", "[]")),
- errorsphenocovar=json.loads(
- thejob.get("errors-phenocovar", "[]")),
- messages=logmessages)
- if jobstatus == "success":
- jobmeta = json.loads(thejob["job-metadata"])
- species = species_by_id(dbconn, jobmeta["speciesid"])
- return render_template(
- "rqtl2/rqtl2-qc-job-results.html",
- species=species,
- population=population_by_species_and_id(
- dbconn, species["SpeciesId"], jobmeta["populationid"]),
- rqtl2bundle=Path(jobmeta["rqtl2-bundle-file"]).name,
- rqtl2bundleorig=jobmeta["original-filename"])
-
- def compute_percentage(thejob, filetype) -> Union[str, None]:
- if f"{filetype}-linecount" in thejob:
- return "100"
- if f"{filetype}-filesize" in thejob:
- percent = ((int(thejob.get(f"{filetype}-checked", 0))
- /
- int(thejob.get(f"{filetype}-filesize", 1)))
- * 100)
- return f"{percent:.2f}"
- return None
-
- return render_template(
- "rqtl2/rqtl2-qc-job-status.html",
- job=thejob,
- geno_percent=compute_percentage(thejob, "geno"),
- pheno_percent=compute_percentage(thejob, "pheno"),
- phenose_percent=compute_percentage(thejob, "phenose"),
- messages=logmessages)
- except jobs.JobNotFound:
- return render_template("rqtl2/no-such-job.html", jobid=jobid)
-
-
-def redirect_on_error(flaskroute, **kwargs):
- """Utility to redirect on error"""
- return redirect(url_for(flaskroute, **kwargs, pgsrc="error"),
- code=(307 if request.method == "POST" else 302))
-
-
-def check_species(conn: mdb.Connection, formargs: dict) -> Optional[
- tuple[str, Response]]:
- """
- Check whether the 'species_id' value is provided, and whether a
- corresponding species exists in the database.
-
- Maybe give the function a better name..."""
- speciespage = redirect_on_error("upload.rqtl2.select_species")
- if "species_id" not in formargs:
- return "You MUST provide the Species identifier.", speciespage
-
- if not bool(species_by_id(conn, formargs["species_id"])):
- return "No species with the provided identifier exists.", speciespage
-
- return None
-
-
-def check_population(conn: mdb.Connection,
- formargs: dict,
- species_id) -> Optional[tuple[str, Response]]:
- """
- Check whether the 'population_id' value is provided, and whether a
- corresponding population exists in the database.
-
- Maybe give the function a better name..."""
- poppage = redirect_on_error(
- "upload.rqtl2.select_species", species_id=species_id)
- if "population_id" not in formargs:
- return "You MUST provide the Population identifier.", poppage
-
- if not bool(population_by_species_and_id(
- conn, species_id, formargs["population_id"])):
- return "No population with the provided identifier exists.", poppage
-
- return None
-
-
-def check_r_qtl2_bundle(formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the existence of the R/qtl2 bundle."""
- fileuploadpage = redirect_on_error("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species_id,
- population_id=population_id)
- if not "rqtl2_bundle_file" in formargs:
- return (
- "You MUST provide a R/qtl2 zip bundle for upload.", fileuploadpage)
-
- if not Path(fullpath(formargs["rqtl2_bundle_file"])).exists():
- return "No R/qtl2 bundle with the given name exists.", fileuploadpage
-
- return None
-
-
-def check_geno_dataset(conn: mdb.Connection,
- formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the Genotype dataset."""
- genodsetpg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id)
- if not bool(formargs.get("geno-dataset-id")):
- return (
- "You MUST provide a valid Genotype dataset identifier", genodsetpg)
-
- with conn.cursor(cursorclass=DictCursor) as cursor:
- cursor.execute("SELECT * FROM GenoFreeze WHERE Id=%s",
- (formargs["geno-dataset-id"],))
- results = cursor.fetchall()
- if not bool(results):
- return ("No genotype dataset with the provided identifier exists.",
- genodsetpg)
- if len(results) > 1:
- return (
- "Data corruption: More than one genotype dataset with the same "
- "identifier.",
- genodsetpg)
-
- return None
-
-def check_tissue(
- conn: mdb.Connection,formargs: dict) -> Optional[tuple[str, Response]]:
- """Check for tissue/organ/biological material."""
- selectdsetpg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=formargs["species_id"],
- population_id=formargs["population_id"])
- if not bool(formargs.get("tissueid", "").strip()):
- return ("No tissue/organ/biological material provided.", selectdsetpg)
-
- with conn.cursor(cursorclass=DictCursor) as cursor:
- cursor.execute("SELECT * FROM Tissue WHERE Id=%s",
- (formargs["tissueid"],))
- results = cursor.fetchall()
- if not bool(results):
- return ("No tissue/organ with the provided identifier exists.",
- selectdsetpg)
-
- if len(results) > 1:
- return (
- "Data corruption: More than one tissue/organ with the same "
- "identifier.",
- selectdsetpg)
-
- return None
-
-
-def check_probe_study(conn: mdb.Connection,
- formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the ProbeSet study."""
- dsetinfopg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id)
- if not bool(formargs.get("probe-study-id")):
- return "No probeset study was selected!", dsetinfopg
-
- if not bool(probeset_study_by_id(conn, formargs["probe-study-id"])):
- return ("No probeset study with the provided identifier exists",
- dsetinfopg)
-
- return None
-
-
-def check_probe_dataset(conn: mdb.Connection,
- formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the ProbeSet dataset."""
- dsetinfopg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id)
- if not bool(formargs.get("probe-dataset-id")):
- return "No probeset dataset was selected!", dsetinfopg
-
- if not bool(probeset_dataset_by_id(conn, formargs["probe-dataset-id"])):
- return ("No probeset dataset with the provided identifier exists",
- dsetinfopg)
-
- return None
-
-
-def with_errors(endpointthunk: Callable, *checkfns):
- """Run 'endpointthunk' with error checking."""
- formargs = {**dict(request.args), **dict(request.form)}
- errors = tuple(item for item in (_fn(formargs=formargs) for _fn in checkfns)
- if item is not None)
- if len(errors) > 0:
- flash(errors[0][0], "alert-error error-rqtl2")
- return errors[0][1]
-
- return endpointthunk()
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/select-geno-dataset"),
- methods=["POST"])
-def select_geno_dataset(species_id: int, population_id: int):
- """Select from existing geno datasets."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- geno_dset = geno_datasets_by_species_and_population(
- conn, species_id, population_id)
- if not bool(geno_dset):
- flash("No genotype dataset was provided!",
- "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_geno_dataset",
- species_id=species_id,
- population_id=population_id,
- pgsrc="error"),
- code=307)
-
- flash("Genotype accepted", "alert-success error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="upload.rqtl2.select_geno_dataset"),
- code=307)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population, conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/create-geno-dataset"),
- methods=["POST"])
-def create_geno_dataset(species_id: int, population_id: int):
- """Create a new geno dataset."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- sgeno_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="error"),
- code=307)
- errorclasses = "alert-error error-rqtl2 error-rqtl2-create-geno-dataset"
- if not bool(request.form.get("dataset-name")):
- flash("You must provide the dataset name", errorclasses)
- return sgeno_page
- if not bool(request.form.get("dataset-fullname")):
- flash("You must provide the dataset full name", errorclasses)
- return sgeno_page
- public = 2 if request.form.get("dataset-public") == "on" else 0
-
- with conn.cursor(cursorclass=DictCursor) as cursor:
- datasetname = request.form["dataset-name"]
- new_dataset = {
- "name": datasetname,
- "fname": request.form.get("dataset-fullname"),
- "sname": request.form.get("dataset-shortname") or datasetname,
- "today": date.today().isoformat(),
- "pub": public,
- "isetid": population_id
- }
- cursor.execute("SELECT * FROM GenoFreeze WHERE Name=%s",
- (datasetname,))
- results = cursor.fetchall()
- if bool(results):
- flash(
- f"A genotype dataset with name '{escape(datasetname)}' "
- "already exists.",
- errorclasses)
- return redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="error"),
- code=307)
- cursor.execute(
- "INSERT INTO GenoFreeze("
- "Name, FullName, ShortName, CreateTime, public, InbredSetId"
- ") "
- "VALUES("
- "%(name)s, %(fname)s, %(sname)s, %(today)s, %(pub)s, %(isetid)s"
- ")",
- new_dataset)
- flash("Created dataset successfully.", "alert-success")
- return render_template(
- "rqtl2/create-geno-dataset-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset={**new_dataset, "id": cursor.lastrowid})
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population, conn=conn, species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/select-tissue"),
- methods=["POST"])
-def select_tissue(species_id: int, population_id: int):
- """Select from existing tissues."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- if not bool(request.form.get("tissueid", "").strip()):
- flash("Invalid tissue selection!",
- "alert-error error-select-tissue error-rqtl2")
-
- return redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="upload.rqtl2.select_geno_dataset"),
- code=307)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/create-tissue"),
- methods=["POST"])
-def create_tissue(species_id: int, population_id: int):
- """Add new tissue, organ or biological material to the system."""
- form = request.form
- datasetinfopage = redirect(
- url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="upload.rqtl2.select_geno_dataset"),
- code=307)
- with database_connection(app.config["SQL_URI"]) as conn:
- tissuename = form.get("tissuename", "").strip()
- tissueshortname = form.get("tissueshortname", "").strip()
- if not bool(tissuename):
- flash("Organ/Tissue name MUST be provided.",
- "alert-error error-create-tissue error-rqtl2")
- return datasetinfopage
-
- if not bool(tissueshortname):
- flash("Organ/Tissue short name MUST be provided.",
- "alert-error error-create-tissue error-rqtl2")
- return datasetinfopage
-
- try:
- tissue = create_new_tissue(conn, tissuename, tissueshortname)
- flash("Tissue created successfully!", "alert-success")
- return render_template(
- "rqtl2/create-tissue-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset=geno_dataset_by_id(
- conn,
- int(request.form["geno-dataset-id"])),
- tissue=tissue)
- except mdb.IntegrityError as _ierr:
- flash("Tissue/Organ with that short name already exists!",
- "alert-error error-create-tissue error-rqtl2")
- return datasetinfopage
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/select-probeset-study"),
- methods=["POST"])
-def select_probeset_study(species_id: int, population_id: int):
- """Select or create a probeset study."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
- if not bool(probeset_study_by_id(conn, int(request.form["probe-study-id"]))):
- flash("Invalid study selected!", "alert-error error-rqtl2")
- return summary_page
-
- return summary_page
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/select-probeset-dataset"),
- methods=["POST"])
-def select_probeset_dataset(species_id: int, population_id: int):
- """Select or create a probeset dataset."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
- if not bool(probeset_study_by_id(conn, int(request.form["probe-study-id"]))):
- flash("Invalid study selected!", "alert-error error-rqtl2")
- return summary_page
-
- return summary_page
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_probe_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/create-probeset-study"),
- methods=["POST"])
-def create_probeset_study(species_id: int, population_id: int):
- """Create a new probeset study."""
- errorclasses = "alert-error error-rqtl2 error-rqtl2-create-probeset-study"
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- form = request.form
- dataset_info_page = redirect(
- url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
-
- if not (bool(form.get("platformid")) and
- bool(platform_by_id(conn, int(form["platformid"])))):
- flash("Invalid platform selected.", errorclasses)
- return dataset_info_page
-
- if not (bool(form.get("tissueid")) and
- bool(tissue_by_id(conn, int(form["tissueid"])))):
- flash("Invalid tissue selected.", errorclasses)
- return dataset_info_page
-
- studyname = form["studyname"]
- try:
- study = probeset_create_study(
- conn, population_id, int(form["platformid"]), int(form["tissueid"]),
- studyname, form.get("studyfullname") or "",
- form.get("studyshortname") or "")
- except mdb.IntegrityError as _ierr:
- flash(f"ProbeSet study with name '{escape(studyname)}' already "
- "exists.",
- errorclasses)
- return dataset_info_page
- return render_template(
- "rqtl2/create-probe-study-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset=geno_dataset_by_id(
- conn,
- int(request.form["geno-dataset-id"])),
- study=study)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn))
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/create-probeset-dataset"),
- methods=["POST"])
-def create_probeset_dataset(species_id: int, population_id: int):#pylint: disable=[too-many-return-statements]
- """Create a new probeset dataset."""
- errorclasses = "alert-error error-rqtl2 error-rqtl2-create-probeset-dataset"
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():#pylint: disable=[too-many-return-statements]
- form = request.form
- summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
- if not bool(form.get("averageid")):
- flash("Averaging method not selected!", errorclasses)
- return summary_page
- if not bool(form.get("datasetname")):
- flash("Dataset name not provided!", errorclasses)
- return summary_page
- if not bool(form.get("datasetfullname")):
- flash("Dataset full name not provided!", errorclasses)
- return summary_page
-
- tissue = tissue_by_id(conn, form.get("tissueid", "").strip())
-
- study = probeset_study_by_id(conn, int(form["probe-study-id"]))
- if not bool(study):
- flash("Invalid ProbeSet study provided!", errorclasses)
- return summary_page
-
- avgmethod = averaging_method_by_id(conn, int(form["averageid"]))
- if not bool(avgmethod):
- flash("Invalid averaging method provided!", errorclasses)
- return summary_page
-
- try:
- dset = probeset_create_dataset(conn,
- int(form["probe-study-id"]),
- int(form["averageid"]),
- form["datasetname"],
- form["datasetfullname"],
- form["datasetshortname"],
- form["datasetpublic"] == "on",
- form.get(
- "datasetdatascale", "log2"))
- except mdb.IntegrityError as _ierr:
- app.logger.debug("Possible integrity error: %s", traceback.format_exc())
- flash(("IntegrityError: The data you provided has some errors: "
- f"{_ierr.args}"),
- errorclasses)
- return summary_page
- except Exception as _exc:# pylint: disable=[broad-except]
- app.logger.debug("Error creating ProbeSet dataset: %s",
- traceback.format_exc())
- flash(("There was a problem creating your dataset. Please try "
- "again."),
- errorclasses)
- return summary_page
- return render_template(
- "rqtl2/create-probe-dataset-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset=geno_dataset_by_id(
- conn,
- int(request.form["geno-dataset-id"])),
- tissue=tissue,
- study=study,
- avgmethod=avgmethod,
- dataset=dset)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/dataset-info"),
- methods=["POST"])
-def select_dataset_info(species_id: int, population_id: int):
- """
- If `geno` files exist in the R/qtl2 bundle, prompt user to provide the
- dataset the genotypes belong to.
- """
- form = request.form
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- species = species_by_id(conn, species_id)
- population = population_by_species_and_id(
- conn, species_id, population_id)
- thefile = fullpath(form["rqtl2_bundle_file"])
- with ZipFile(str(thefile), "r") as zfile:
- cdata = r_qtl2.control_data(zfile)
-
- geno_dataset = geno_dataset_by_id(
- conn,form.get("geno-dataset-id", "").strip())
- if "geno" in cdata and not bool(form.get("geno-dataset-id")):
- return render_template(
- "rqtl2/select-geno-dataset.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- datasets=geno_datasets_by_species_and_population(
- conn, species_id, population_id))
-
- tissue = tissue_by_id(conn, form.get("tissueid", "").strip())
- if "pheno" in cdata and not bool(tissue):
- return render_template(
- "rqtl2/select-tissue.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- studies=probeset_studies_by_species_and_population(
- conn, species_id, population_id),
- platforms=platforms_by_species(conn, species_id),
- tissues=all_tissues(conn))
-
- probeset_study = probeset_study_by_id(
- conn, form.get("probe-study-id", "").strip())
- if "pheno" in cdata and not bool(probeset_study):
- return render_template(
- "rqtl2/select-probeset-study-id.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- studies=probeset_studies_by_species_and_population(
- conn, species_id, population_id),
- platforms=platforms_by_species(conn, species_id),
- tissue=tissue)
- probeset_study = probeset_study_by_id(
- conn, int(form["probe-study-id"]))
-
- probeset_dataset = probeset_dataset_by_id(
- conn, form.get("probe-dataset-id", "").strip())
- if "pheno" in cdata and not bool(probeset_dataset):
- return render_template(
- "rqtl2/select-probeset-dataset.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- probe_study=probeset_study,
- tissue=tissue,
- datasets=probeset_datasets_by_study(
- conn, int(form["probe-study-id"])),
- avgmethods=averaging_methods(conn))
-
- return render_template("rqtl2/summary-info.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- tissue=tissue,
- probe_study=probeset_study,
- probe_dataset=probeset_dataset)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species/<int:species_id>/population/"
- "<int:population_id>/rqtl2-bundle/confirm-bundle-details"),
- methods=["POST"])
-def confirm_bundle_details(species_id: int, population_id: int):
- """Confirm the details and trigger R/qtl2 bundle processing..."""
- redisuri = app.config["REDIS_URL"]
- with (database_connection(app.config["SQL_URI"]) as conn,
- Redis.from_url(redisuri, decode_responses=True) as rconn):
- def __thunk__():
- redis_ttl_seconds = app.config["JOBS_TTL_SECONDS"]
- jobid = str(uuid4())
- _job = jobs.launch_job(
- jobs.initialise_job(
- rconn,
- jobs.jobsnamespace(),
- jobid,
- [
- sys.executable, "-m", "scripts.process_rqtl2_bundle",
- app.config["SQL_URI"], app.config["REDIS_URL"],
- jobs.jobsnamespace(), jobid, "--redisexpiry",
- str(redis_ttl_seconds)],
- "R/qtl2 Bundle Upload",
- redis_ttl_seconds,
- {
- "bundle-metadata": json.dumps({
- "speciesid": species_id,
- "populationid": population_id,
- "rqtl2-bundle-file": str(fullpath(
- request.form["rqtl2_bundle_file"])),
- "geno-dataset-id": request.form.get(
- "geno-dataset-id", ""),
- "probe-study-id": request.form.get(
- "probe-study-id", ""),
- "probe-dataset-id": request.form.get(
- "probe-dataset-id", ""),
- **({
- "platformid": probeset_study_by_id(
- conn,
- int(request.form["probe-study-id"]))["ChipId"]
- } if bool(request.form.get("probe-study-id")) else {})
- })
- }),
- redisuri,
- f"{app.config['UPLOAD_FOLDER']}/job_errors")
-
- return redirect(url_for("upload.rqtl2.rqtl2_processing_status",
- jobid=jobid))
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_probe_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route("/status/")
-def rqtl2_processing_status(jobid: UUID):
- """Retrieve the status of the job processing the uploaded R/qtl2 bundle."""
- with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
- try:
- thejob = jobs.job(rconn, jobs.jobsnamespace(), jobid)
-
- messagelistname = thejob.get("log-messagelist")
- logmessages = (rconn.lrange(messagelistname, 0, -1)
- if bool(messagelistname) else [])
-
- if thejob["status"] == "error":
- return render_template(
- "rqtl2/rqtl2-job-error.html", job=thejob, messages=logmessages)
- if thejob["status"] == "success":
- return render_template("rqtl2/rqtl2-job-results.html",
- job=thejob,
- messages=logmessages)
-
- return render_template(
- "rqtl2/rqtl2-job-status.html", job=thejob, messages=logmessages)
- except jobs.JobNotFound as _exc:
- return render_template("rqtl2/no-such-job.html", jobid=jobid)
diff --git a/scripts/insert_data.py b/scripts/insert_data.py
index 1465348..4b2e5f3 100644
--- a/scripts/insert_data.py
+++ b/scripts/insert_data.py
@@ -14,8 +14,8 @@ from MySQLdb.cursors import DictCursor
from functional_tools import take
from quality_control.file_utils import open_file
-from qc_app.db_utils import database_connection
-from qc_app.check_connections import check_db, check_redis
+from uploader.db_utils import database_connection
+from uploader.check_connections import check_db, check_redis
# Set up logging
stderr_handler = logging.StreamHandler(stream=sys.stderr)
diff --git a/scripts/insert_samples.py b/scripts/insert_samples.py
index 8431462..87f29dc 100644
--- a/scripts/insert_samples.py
+++ b/scripts/insert_samples.py
@@ -7,10 +7,10 @@ import argparse
import MySQLdb as mdb
from redis import Redis
-from qc_app.db_utils import database_connection
-from qc_app.check_connections import check_db, check_redis
-from qc_app.db import species_by_id, population_by_id
-from qc_app.samples import (
+from uploader.db_utils import database_connection
+from uploader.check_connections import check_db, check_redis
+from uploader.db import species_by_id, population_by_id
+from uploader.samples import (
save_samples_data,
read_samples_file,
cross_reference_samples)
diff --git a/scripts/process_rqtl2_bundle.py b/scripts/process_rqtl2_bundle.py
index 4da3936..a7e169d 100644
--- a/scripts/process_rqtl2_bundle.py
+++ b/scripts/process_rqtl2_bundle.py
@@ -17,9 +17,9 @@ import r_qtl.errors as rqe
import r_qtl.r_qtl2 as rqtl2
import r_qtl.r_qtl2_qc as rqc
-from qc_app import jobs
-from qc_app.db_utils import database_connection
-from qc_app.check_connections import check_db, check_redis
+from uploader import jobs
+from uploader.db_utils import database_connection
+from uploader.check_connections import check_db, check_redis
from scripts.cli_parser import init_cli_parser
from scripts.redis_logger import setup_redis_logger
diff --git a/scripts/qc.py b/scripts/qc.py
index e8573a9..6de051f 100644
--- a/scripts/qc.py
+++ b/scripts/qc.py
@@ -11,7 +11,7 @@ from quality_control.utils import make_progress_calculator
from quality_control.errors import InvalidValue, DuplicateHeading
from quality_control.parsing import FileType, strain_names, collect_errors
-from qc_app.db_utils import database_connection
+from uploader.db_utils import database_connection
from .cli_parser import init_cli_parser
diff --git a/scripts/qc_on_rqtl2_bundle.py b/scripts/qc_on_rqtl2_bundle.py
index 40809b7..150fbce 100644
--- a/scripts/qc_on_rqtl2_bundle.py
+++ b/scripts/qc_on_rqtl2_bundle.py
@@ -16,9 +16,9 @@ from redis import Redis
from quality_control.errors import InvalidValue
from quality_control.checks import decimal_points_error
-from qc_app import jobs
-from qc_app.db_utils import database_connection
-from qc_app.check_connections import check_db, check_redis
+from uploader import jobs
+from uploader.db_utils import database_connection
+from uploader.check_connections import check_db, check_redis
from r_qtl import errors as rqe
from r_qtl import r_qtl2 as rqtl2
diff --git a/scripts/qcapp_wsgi.py b/scripts/qcapp_wsgi.py
index 349c006..fe77031 100644
--- a/scripts/qcapp_wsgi.py
+++ b/scripts/qcapp_wsgi.py
@@ -5,8 +5,8 @@ from logging import getLogger, StreamHandler
from flask import Flask
-from qc_app import create_app
-from qc_app.check_connections import check_db, check_redis
+from uploader import create_app
+from uploader.check_connections import check_db, check_redis
def setup_logging(appl: Flask) -> Flask:
"""Setup appropriate logging paradigm depending on environment."""
diff --git a/scripts/rqtl2/entry.py b/scripts/rqtl2/entry.py
index 93fc130..b7fb68e 100644
--- a/scripts/rqtl2/entry.py
+++ b/scripts/rqtl2/entry.py
@@ -6,9 +6,9 @@ from argparse import Namespace
from redis import Redis
from MySQLdb import Connection
-from qc_app import jobs
-from qc_app.db_utils import database_connection
-from qc_app.check_connections import check_db, check_redis
+from uploader import jobs
+from uploader.db_utils import database_connection
+from uploader.check_connections import check_db, check_redis
from scripts.redis_logger import setup_redis_logger
diff --git a/scripts/validate_file.py b/scripts/validate_file.py
index 0028795..a40d7e7 100644
--- a/scripts/validate_file.py
+++ b/scripts/validate_file.py
@@ -12,8 +12,8 @@ from redis.exceptions import ConnectionError # pylint: disable=[redefined-builtin]
from quality_control.utils import make_progress_calculator
from quality_control.parsing import FileType, strain_names, collect_errors
-from qc_app import jobs
-from qc_app.db_utils import database_connection
+from uploader import jobs
+from uploader.db_utils import database_connection
from .cli_parser import init_cli_parser
from .qc import add_file_validation_arguments
diff --git a/scripts/worker.py b/scripts/worker.py
index 0eb9ea5..91b0332 100644
--- a/scripts/worker.py
+++ b/scripts/worker.py
@@ -11,8 +11,8 @@ from tempfile import TemporaryDirectory
from redis import Redis
-from qc_app import jobs
-from qc_app.check_connections import check_redis
+from uploader import jobs
+from uploader.check_connections import check_redis
def parse_args():
"Parse the command-line arguments"
diff --git a/tests/conftest.py b/tests/conftest.py
index a39acf0..9012221 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -11,8 +11,8 @@ from redis import Redis
from functional_tools import take
-from qc_app import jobs, create_app
-from qc_app.jobs import JOBS_PREFIX
+from uploader import jobs, create_app
+from uploader.jobs import JOBS_PREFIX
from quality_control.errors import InvalidValue, DuplicateHeading
diff --git a/tests/qc_app/test_parse.py b/tests/qc_app/test_parse.py
index 3915a4d..076c47c 100644
--- a/tests/qc_app/test_parse.py
+++ b/tests/qc_app/test_parse.py
@@ -4,7 +4,7 @@ import sys
import redis
import pytest
-from qc_app.jobs import job, jobsnamespace
+from uploader.jobs import job, jobsnamespace
from tests.conftest import uploadable_file_object
@@ -24,7 +24,7 @@ def test_parse_with_existing_uploaded_file(#pylint: disable=[too-many-arguments]
1. the system redirects to the job/parse status page
2. the job is placed on redis for processing
"""
- monkeypatch.setattr("qc_app.jobs.uuid4", lambda : job_id)
+ monkeypatch.setattr("uploader.jobs.uuid4", lambda : job_id)
# Upload a file
speciesid = 1
filename = "no_data_errors.tsv"
diff --git a/uploader/__init__.py b/uploader/__init__.py
new file mode 100644
index 0000000..3ee8aa0
--- /dev/null
+++ b/uploader/__init__.py
@@ -0,0 +1,48 @@
+"""The Quality-Control Web Application entry point"""
+import os
+import logging
+from pathlib import Path
+
+from flask import Flask, request
+
+from .entry import entrybp
+from .upload import upload
+from .parse import parsebp
+from .samples import samples
+from .base_routes import base
+from .dbinsert import dbinsertbp
+from .errors import register_error_handlers
+
+def override_settings_with_envvars(
+ app: Flask, ignore: tuple[str, ...]=tuple()) -> None:
+ """Override settings in `app` with those in ENVVARS"""
+ for setting in (key for key in app.config if key not in ignore):
+ app.config[setting] = os.environ.get(setting) or app.config[setting]
+
+
+def create_app():
+ """The application factory"""
+ app = Flask(__name__)
+ app.config.from_pyfile(
+ Path(__file__).parent.joinpath("default_settings.py"))
+ if "QCAPP_CONF" in os.environ:
+ app.config.from_envvar("QCAPP_CONF") # Override defaults with instance path
+
+ override_settings_with_envvars(app, ignore=tuple())
+
+ if "QCAPP_SECRETS" in os.environ:
+ app.config.from_envvar("QCAPP_SECRETS")
+
+ # setup jinja2 symbols
+ app.jinja_env.globals.update(request_url=lambda : request.url)
+
+ # setup blueprints
+ app.register_blueprint(base, url_prefix="/")
+ app.register_blueprint(entrybp, url_prefix="/")
+ app.register_blueprint(parsebp, url_prefix="/parse")
+ app.register_blueprint(upload, url_prefix="/upload")
+ app.register_blueprint(dbinsertbp, url_prefix="/dbinsert")
+ app.register_blueprint(samples, url_prefix="/samples")
+
+ register_error_handlers(app)
+ return app
diff --git a/uploader/base_routes.py b/uploader/base_routes.py
new file mode 100644
index 0000000..9daf439
--- /dev/null
+++ b/uploader/base_routes.py
@@ -0,0 +1,29 @@
+"""Basic routes required for all pages"""
+import os
+from flask import Blueprint, send_from_directory
+
+base = Blueprint("base", __name__)
+
+def appenv():
+ """Get app's guix environment path."""
+ return os.environ.get("GN_UPLOADER_ENVIRONMENT")
+
+@base.route("/bootstrap/")
+def bootstrap(filename):
+ """Fetch bootstrap files."""
+ return send_from_directory(
+ appenv(), f"share/genenetwork2/javascript/bootstrap/{filename}")
+
+
+@base.route("/jquery/")
+def jquery(filename):
+ """Fetch jquery files."""
+ return send_from_directory(
+ appenv(), f"share/genenetwork2/javascript/jquery/{filename}")
+
+
+@base.route("/node-modules/")
+def node_modules(filename):
+ """Fetch node-js modules."""
+ return send_from_directory(
+ appenv(), f"lib/node_modules/{filename}")
diff --git a/uploader/check_connections.py b/uploader/check_connections.py
new file mode 100644
index 0000000..2561e55
--- /dev/null
+++ b/uploader/check_connections.py
@@ -0,0 +1,28 @@
+"""Check the various connection used in the application"""
+import sys
+import traceback
+
+import redis
+import MySQLdb
+
+from uploader.db_utils import database_connection
+
+def check_redis(uri: str):
+ "Check the redis connection"
+ try:
+ with redis.Redis.from_url(uri) as rconn:
+ rconn.ping()
+ except redis.exceptions.ConnectionError as conn_err:
+ print(conn_err, file=sys.stderr)
+ print(traceback.format_exc(), file=sys.stderr)
+ sys.exit(1)
+
+def check_db(uri: str):
+ "Check the mysql connection"
+ try:
+ with database_connection(uri) as dbconn: # pylint: disable=[unused-variable]
+ pass
+ except MySQLdb.OperationalError as op_err:
+ print(op_err, file=sys.stderr)
+ print(traceback.format_exc(), file=sys.stderr)
+ sys.exit(1)
diff --git a/uploader/db/__init__.py b/uploader/db/__init__.py
new file mode 100644
index 0000000..36e93e8
--- /dev/null
+++ b/uploader/db/__init__.py
@@ -0,0 +1,8 @@
+"""Database functions"""
+from .species import species, species_by_id
+from .populations import (
+ save_population,
+ population_by_id,
+ populations_by_species,
+ population_by_species_and_id)
+from .datasets import geno_datasets_by_species_and_population
diff --git a/uploader/db/averaging.py b/uploader/db/averaging.py
new file mode 100644
index 0000000..62bbe67
--- /dev/null
+++ b/uploader/db/averaging.py
@@ -0,0 +1,23 @@
+"""Functions for db interactions for averaging methods"""
+from typing import Optional
+
+import MySQLdb as mdb
+from MySQLdb.cursors import DictCursor
+
+def averaging_methods(conn: mdb.Connection) -> tuple[dict, ...]:
+ """Fetch all available averaging methods"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM AvgMethod")
+ return tuple(dict(row) for row in cursor.fetchall())
+
+def averaging_method_by_id(
+ conn: mdb.Connection, averageid: int) -> Optional[dict]:
+ """Fetch the averaging method by its ID"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM AvgMethod WHERE Id=%s",
+ (averageid,))
+ result = cursor.fetchone()
+ if bool(result):
+ return dict(result)
+
+ return None
diff --git a/uploader/db/datasets.py b/uploader/db/datasets.py
new file mode 100644
index 0000000..767ec41
--- /dev/null
+++ b/uploader/db/datasets.py
@@ -0,0 +1,133 @@
+"""Functions for accessing the database relating to datasets."""
+from datetime import date
+from typing import Optional
+
+import MySQLdb as mdb
+from MySQLdb.cursors import DictCursor
+
+def geno_datasets_by_species_and_population(
+ conn: mdb.Connection,
+ speciesid: int,
+ populationid: int) -> tuple[dict, ...]:
+ """Retrieve all genotypes datasets by species and population"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "SELECT gf.* FROM InbredSet AS iset INNER JOIN GenoFreeze AS gf "
+ "ON iset.InbredSetId=gf.InbredSetId "
+ "WHERE iset.SpeciesId=%(sid)s AND iset.InbredSetId=%(pid)s",
+ {"sid": speciesid, "pid": populationid})
+ return tuple(dict(row) for row in cursor.fetchall())
+
+def geno_dataset_by_id(conn: mdb.Connection, dataset_id) -> Optional[dict]:
+ """Retrieve genotype dataset by ID"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM GenoFreeze WHERE Id=%s", (dataset_id,))
+ _dataset = cursor.fetchone()
+ return dict(_dataset) if bool(_dataset) else None
+
+def probeset_studies_by_species_and_population(
+ conn: mdb.Connection,
+ speciesid: int,
+ populationid: int) -> tuple[dict, ...]:
+ """Retrieve all probesets"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "SELECT pf.* FROM InbredSet AS iset INNER JOIN ProbeFreeze AS pf "
+ "ON iset.InbredSetId=pf.InbredSetId "
+ "WHERE iset.SpeciesId=%(sid)s AND iset.InbredSetId=%(pid)s",
+ {"sid": speciesid, "pid": populationid})
+ return tuple(dict(row) for row in cursor.fetchall())
+
+def probeset_datasets_by_study(conn: mdb.Connection,
+ studyid: int) -> tuple[dict, ...]:
+ """Retrieve all probeset databases by study."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM ProbeSetFreeze WHERE ProbeFreezeId=%s",
+ (studyid,))
+ return tuple(dict(row) for row in cursor.fetchall())
+
+def probeset_study_by_id(conn: mdb.Connection, studyid) -> Optional[dict]:
+ """Retrieve ProbeSet study by ID"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM ProbeFreeze WHERE Id=%s", (studyid,))
+ _study = cursor.fetchone()
+ return dict(_study) if bool(_study) else None
+
+def probeset_create_study(conn: mdb.Connection,#pylint: disable=[too-many-arguments]
+ populationid: int,
+ platformid: int,
+ tissueid: int,
+ studyname: str,
+ studyfullname: str = "",
+ studyshortname: str = ""):
+ """Create a new ProbeSet study."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ studydata = {
+ "platid": platformid,
+ "tissueid": tissueid,
+ "name": studyname,
+ "fname": studyfullname or studyname,
+ "sname": studyshortname,
+ "today": date.today().isoformat(),
+ "popid": populationid
+ }
+ cursor.execute(
+ """
+ INSERT INTO ProbeFreeze(
+ ChipId, TissueId, Name, FullName, ShortName, CreateTime,
+ InbredSetId
+ ) VALUES (
+ %(platid)s, %(tissueid)s, %(name)s, %(fname)s, %(sname)s,
+ %(today)s, %(popid)s
+ )
+ """,
+ studydata)
+ studyid = cursor.lastrowid
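+        # The schema keeps a separate ProbeFreezeId column; mirror the
+        # auto-incremented Id into it below, as the other create_* helpers
+        # in this change do.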
+ cursor.execute("UPDATE ProbeFreeze SET ProbeFreezeId=%s WHERE Id=%s",
+ (studyid, studyid))
+ return {**studydata, "studyid": studyid}
+
+def probeset_create_dataset(conn: mdb.Connection,#pylint: disable=[too-many-arguments]
+ studyid: int,
+ averageid: int,
+ datasetname: str,
+ datasetfullname: str,
+ datasetshortname: str="",
+ public: bool = True,
+ datascale="log2") -> dict:
+ """Create a new ProbeSet dataset."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ dataset = {
+ "studyid": studyid,
+ "averageid": averageid,
+ "name2": datasetname,
+ "fname": datasetfullname,
+ "name": datasetshortname,
+ "sname": datasetshortname,
+ "today": date.today().isoformat(),
+ "public": 2 if public else 0,
+ "authorisedusers": "williamslab",
+ "datascale": datascale
+ }
+ cursor.execute(
+ """
+ INSERT INTO ProbeSetFreeze(
+ ProbeFreezeId, AvgId, Name, Name2, FullName, ShortName,
+ CreateTime, public, AuthorisedUsers, DataScale)
+ VALUES(
+ %(studyid)s, %(averageid)s, %(name)s, %(name2)s, %(fname)s,
+ %(sname)s, %(today)s, %(public)s, %(authorisedusers)s,
+ %(datascale)s)
+ """,
+ dataset)
+ return {**dataset, "datasetid": cursor.lastrowid}
+
+def probeset_dataset_by_id(conn: mdb.Connection, datasetid) -> Optional[dict]:
+ """Fetch a ProbeSet dataset by its ID"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM ProbeSetFreeze WHERE Id=%s", (datasetid,))
+ result = cursor.fetchone()
+ if bool(result):
+ return dict(result)
+
+ return None
diff --git a/uploader/db/platforms.py b/uploader/db/platforms.py
new file mode 100644
index 0000000..cb527a7
--- /dev/null
+++ b/uploader/db/platforms.py
@@ -0,0 +1,25 @@
+"""Handle db interactions for platforms."""
+from typing import Optional
+
+import MySQLdb as mdb
+from MySQLdb.cursors import DictCursor
+
+def platforms_by_species(
+ conn: mdb.Connection, speciesid: int) -> tuple[dict, ...]:
+ """Retrieve platforms by the species"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM GeneChip WHERE SpeciesId=%s "
+ "ORDER BY GeneChipName ASC",
+ (speciesid,))
+ return tuple(dict(row) for row in cursor.fetchall())
+
+def platform_by_id(conn: mdb.Connection, platformid: int) -> Optional[dict]:
+ """Retrieve a platform by its ID"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM GeneChip WHERE Id=%s",
+ (platformid,))
+ result = cursor.fetchone()
+ if bool(result):
+ return dict(result)
+
+ return None
diff --git a/uploader/db/populations.py b/uploader/db/populations.py
new file mode 100644
index 0000000..4485e52
--- /dev/null
+++ b/uploader/db/populations.py
@@ -0,0 +1,54 @@
+"""Functions for accessing the database relating to species populations."""
+import MySQLdb as mdb
+from MySQLdb.cursors import DictCursor
+
+def population_by_id(conn: mdb.Connection, population_id) -> dict:
+ """Get the grouping/population by id."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM InbredSet WHERE InbredSetId=%s",
+ (population_id,))
+ return cursor.fetchone()
+
+def population_by_species_and_id(
+        conn: mdb.Connection, species_id, population_id) -> Optional[dict]:
+ """Retrieve a population by its identifier and species."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM InbredSet WHERE SpeciesId=%s AND Id=%s",
+ (species_id, population_id))
+ return cursor.fetchone()
+
+def populations_by_species(conn: mdb.Connection, speciesid) -> tuple:
+ "Retrieve group (InbredSet) information from the database."
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ query = "SELECT * FROM InbredSet WHERE SpeciesId=%s"
+ cursor.execute(query, (speciesid,))
+ return tuple(cursor.fetchall())
+
+def save_population(conn: mdb.Connection, population_details: dict) -> dict:
+ """Save the population details to the db."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "INSERT INTO InbredSet("
+ "InbredSetId, InbredSetName, Name, SpeciesId, FullName, "
+ "MenuOrderId, Description"
+ ") "
+ "VALUES ("
+ "%(InbredSetId)s, %(InbredSetName)s, %(Name)s, %(SpeciesId)s, "
+ "%(FullName)s, %(MenuOrderId)s, %(Description)s"
+ ")",
+ {
+ "MenuOrderId": 0,
+ "InbredSetId": 0,
+ **population_details
+ })
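+        # InbredSetId is inserted as a placeholder (0) above and then synced
+        # to the auto-incremented Id below -- the same two-step pattern used
+        # for ProbeFreezeId and TissueId elsewhere in this change.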
+ new_id = cursor.lastrowid
+ cursor.execute("UPDATE InbredSet SET InbredSetId=%s WHERE Id=%s",
+ (new_id, new_id))
+ return {
+ **population_details,
+ "Id": new_id,
+ "InbredSetId": new_id,
+ "population_id": new_id
+ }
diff --git a/uploader/db/species.py b/uploader/db/species.py
new file mode 100644
index 0000000..653e59b
--- /dev/null
+++ b/uploader/db/species.py
@@ -0,0 +1,22 @@
+"""Database functions for species."""
+import MySQLdb as mdb
+from MySQLdb.cursors import DictCursor
+
+def species(conn: mdb.Connection) -> tuple:
+ "Retrieve the species from the database."
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "SELECT SpeciesId, SpeciesName, LOWER(Name) AS Name, MenuName, "
+ "FullName FROM Species")
+ return tuple(cursor.fetchall())
+
+def species_by_id(conn: mdb.Connection, speciesid) -> Optional[dict]:
+ "Retrieve the species from the database by id."
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "SELECT SpeciesId, SpeciesName, LOWER(Name) AS Name, MenuName, "
+ "FullName FROM Species WHERE SpeciesId=%s",
+ (speciesid,))
+ return cursor.fetchone()
diff --git a/uploader/db/tissues.py b/uploader/db/tissues.py
new file mode 100644
index 0000000..9fe7bab
--- /dev/null
+++ b/uploader/db/tissues.py
@@ -0,0 +1,50 @@
+"""Handle db interactions for tissue."""
+from typing import Union, Optional
+
+import MySQLdb as mdb
+from MySQLdb.cursors import DictCursor
+
+def all_tissues(conn: mdb.Connection) -> tuple[dict, ...]:
+ """All available tissue."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM Tissue ORDER BY TissueName")
+ return tuple(dict(row) for row in cursor.fetchall())
+
+
+def tissue_by_id(conn: mdb.Connection, tissueid) -> Optional[dict]:
+ """Retrieve a tissue by its ID"""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM Tissue WHERE Id=%s", (tissueid,))
+ result = cursor.fetchone()
+ if bool(result):
+ return dict(result)
+
+ return None
+
+
+def create_new_tissue(
+ conn: mdb.Connection,
+ name: str,
+ shortname: str,
+ birnlexid: Optional[str] = None,
+ birnlexname: Optional[str] = None
+) -> dict[str, Union[int, str, None]]:
+ """Add a new tissue, organ or biological material to the database."""
+ with conn.cursor() as cursor:
+ cursor.execute(
+ "INSERT INTO "
+ "Tissue(TissueName, Name, Short_Name, BIRN_lex_ID, BIRN_lex_Name) "
+ "VALUES (%s, %s, %s, %s, %s)",
+ (name, name, shortname, birnlexid, birnlexname))
+ tissueid = cursor.lastrowid
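+        # Same Id-mirroring convention as elsewhere in the schema: keep
+        # TissueId in step with the auto-incremented Id.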
+ cursor.execute("UPDATE Tissue SET TissueId=%s WHERE Id=%s",
+ (tissueid, tissueid))
+ return {
+ "Id": tissueid,
+ "TissueId": tissueid,
+ "TissueName": name,
+ "Name": name,
+ "Short_Name": shortname,
+ "BIRN_lex_ID": birnlexid,
+ "BIRN_lex_Name": birnlexname
+ }
diff --git a/uploader/db_utils.py b/uploader/db_utils.py
new file mode 100644
index 0000000..ef26398
--- /dev/null
+++ b/uploader/db_utils.py
@@ -0,0 +1,46 @@
+"""module contains all db related stuff"""
+import logging
+import traceback
+import contextlib
+from urllib.parse import urlparse
+from typing import Any, Tuple, Optional, Iterator, Callable
+
+import MySQLdb as mdb
+from redis import Redis
+from flask import current_app as app
+
+def parse_db_url(db_url) -> Tuple:
+ """
+ Parse SQL_URI configuration variable.
+ """
+ parsed_db = urlparse(db_url)
+ return (parsed_db.hostname, parsed_db.username,
+ parsed_db.password, parsed_db.path[1:], parsed_db.port)
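+# For example, a (hypothetical) URI such as
+# "mysql://webqtl:secret@localhost:3306/db_webqtl" parses to
+# ("localhost", "webqtl", "secret", "db_webqtl", 3306).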
+
+
+@contextlib.contextmanager
+def database_connection(db_url: Optional[str] = None) -> Iterator[mdb.Connection]:
+ """function to create db connector"""
+ host, user, passwd, db_name, db_port = parse_db_url(
+ db_url or app.config["SQL_URI"])
+ connection = mdb.connect(
+ host, user, passwd, db_name, port=(db_port or 3306))
+ try:
+ yield connection
+ connection.commit()
+ except mdb.Error as _mdb_err:
+ logging.error(traceback.format_exc())
+ connection.rollback()
+ finally:
+ connection.close()
+
+def with_db_connection(func: Callable[[mdb.Connection], Any]) -> Any:
+ """Call `func` with a MySQDdb database connection."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ return func(conn)
+
+def with_redis_connection(func: Callable[[Redis], Any]) -> Any:
+ """Call `func` with a redis connection."""
+ redisuri = app.config["REDIS_URL"]
+ with Redis.from_url(redisuri, decode_responses=True) as rconn:
+ return func(rconn)
diff --git a/uploader/dbinsert.py b/uploader/dbinsert.py
new file mode 100644
index 0000000..88d16ef
--- /dev/null
+++ b/uploader/dbinsert.py
@@ -0,0 +1,397 @@
+"Handle inserting data into the database"
+import os
+import json
+from typing import Union
+from functools import reduce
+from datetime import datetime
+
+from redis import Redis
+from MySQLdb.cursors import DictCursor
+from flask import (
+ flash, request, url_for, Blueprint, redirect, render_template,
+ current_app as app)
+
+from uploader.db_utils import with_db_connection, database_connection
+from uploader.db import species, species_by_id, populations_by_species
+
+from uploader import jobs
+
+dbinsertbp = Blueprint("dbinsert", __name__)
+
+def render_error(error_msg):
+ "Render the generic error page"
+ return render_template("dbupdate_error.html", error_message=error_msg), 400
+
+def make_menu_items_grouper(grouping_fn=lambda item: item):
+ "Build function to be used to group menu items."
+ def __grouper__(acc, row):
+ grouping = grouping_fn(row[2])
+ row_values = (row[0].strip(), row[1].strip())
+ if acc.get(grouping) is None:
+ return {**acc, grouping: (row_values,)}
+ return {**acc, grouping: (acc[grouping] + (row_values,))}
+ return __grouper__
+
+def genechips():
+ "Retrieve the genechip information from the database"
+ def __organise_by_species__(acc, chip):
+ speciesid = chip["SpeciesId"]
+ if acc.get(speciesid) is None:
+ return {**acc, speciesid: (chip,)}
+ return {**acc, speciesid: acc[speciesid] + (chip,)}
+
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM GeneChip ORDER BY GeneChipName ASC")
+ return reduce(__organise_by_species__, cursor.fetchall(), {})
+
+def platform_by_id(genechipid:int) -> Union[dict, None]:
+ "Retrieve the gene platform by id"
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "SELECT * FROM GeneChip WHERE GeneChipId=%s",
+ (genechipid,))
+ return cursor.fetchone()
+
+def studies_by_species_and_platform(speciesid:int, genechipid:int) -> tuple:
+ "Retrieve the studies by the related species and gene platform"
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ query = (
+ "SELECT Species.SpeciesId, ProbeFreeze.* "
+ "FROM Species INNER JOIN InbredSet "
+ "ON Species.SpeciesId=InbredSet.SpeciesId "
+ "INNER JOIN ProbeFreeze "
+ "ON InbredSet.InbredSetId=ProbeFreeze.InbredSetId "
+ "WHERE Species.SpeciesId = %s "
+ "AND ProbeFreeze.ChipId = %s")
+ cursor.execute(query, (speciesid, genechipid))
+ return tuple(cursor.fetchall())
+
+def organise_groups_by_family(acc:dict, group:dict) -> dict:
+ "Organise the group (InbredSet) information by the group field"
+ family = group["Family"]
+ if acc.get(family):
+ return {**acc, family: acc[family] + (group,)}
+ return {**acc, family: (group,)}
+
+def tissues() -> tuple:
+ "Retrieve type (Tissue) information from the database."
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM Tissue ORDER BY Name")
+ return tuple(cursor.fetchall())
+
+@dbinsertbp.route("/platform", methods=["POST"])
+def select_platform():
+ "Select the platform (GeneChipId) used for the data."
+ job_id = request.form["job_id"]
+ with (Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn,
+ database_connection(app.config["SQL_URI"]) as conn):
+ job = jobs.job(rconn, jobs.jobsnamespace(), job_id)
+ if job:
+ filename = job["filename"]
+ filepath = f"{app.config['UPLOAD_FOLDER']}/{filename}"
+ if os.path.exists(filepath):
+ default_species = 1
+ gchips = genechips()
+ return render_template(
+ "select_platform.html", filename=filename,
+ filetype=job["filetype"], totallines=int(job["currentline"]),
+ default_species=default_species, species=species(conn),
+ genechips=gchips[default_species],
+ genechips_data=json.dumps(gchips))
+ return render_error(f"File '{filename}' no longer exists.")
+ return render_error(f"Job '{job_id}' no longer exists.")
+
+@dbinsertbp.route("/study", methods=["POST"])
+def select_study():
+ "View to select/create the study (ProbeFreeze) associated with the data."
+ form = request.form
+ try:
+ assert form.get("filename"), "filename"
+ assert form.get("filetype"), "filetype"
+ assert form.get("species"), "species"
+ assert form.get("genechipid"), "platform"
+
+ speciesid = form["species"]
+ genechipid = form["genechipid"]
+
+ the_studies = studies_by_species_and_platform(speciesid, genechipid)
+ the_groups = reduce(
+ organise_groups_by_family,
+ with_db_connection(
+ lambda conn: populations_by_species(conn, speciesid)),
+ {})
+ return render_template(
+ "select_study.html", filename=form["filename"],
+ filetype=form["filetype"], totallines=form["totallines"],
+ species=speciesid, genechipid=genechipid, studies=the_studies,
+            groups=the_groups, tissues=tissues(),
+ selected_group=int(form.get("inbredsetid", -13)),
+ selected_tissue=int(form.get("tissueid", -13)))
+ except AssertionError as aserr:
+ return render_error(f"Missing data: {aserr.args[0]}")
+
+@dbinsertbp.route("/create-study", methods=["POST"])
+def create_study():
+ "Create a new study (ProbeFreeze)."
+ form = request.form
+ try:
+ assert form.get("filename"), "filename"
+ assert form.get("filetype"), "filetype"
+ assert form.get("species"), "species"
+ assert form.get("genechipid"), "platform"
+ assert form.get("studyname"), "study name"
+ assert form.get("inbredsetid"), "group"
+ assert form.get("tissueid"), "type/tissue"
+
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ values = (
+ form["genechipid"],
+ form["tissueid"],
+ form["studyname"],
+ form.get("studyfullname", ""),
+ form.get("studyshortname", ""),
+ datetime.now().date().strftime("%Y-%m-%d"),
+ form["inbredsetid"])
+ query = (
+ "INSERT INTO ProbeFreeze("
+ "ChipId, TissueId, Name, FullName, ShortName, CreateTime, "
+ "InbredSetId"
+ ") VALUES (%s, %s, %s, %s, %s, %s, %s)")
+ cursor.execute(query, values)
+ new_studyid = cursor.lastrowid
+ cursor.execute(
+ "UPDATE ProbeFreeze SET ProbeFreezeId=%s WHERE Id=%s",
+ (new_studyid, new_studyid))
+ flash("Study created successfully", "alert-success")
+ return render_template(
+ "continue_from_create_study.html",
+ filename=form["filename"], filetype=form["filetype"],
+ totallines=form["totallines"], species=form["species"],
+ genechipid=form["genechipid"], studyid=new_studyid)
+ except AssertionError as aserr:
+ flash(f"Missing data: {aserr.args[0]}", "alert-error")
+ return redirect(url_for("dbinsert.select_study"), code=307)
+
+def datasets_by_study(studyid:int) -> tuple:
+ "Retrieve datasets associated with a study with the ID `studyid`."
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ query = "SELECT * FROM ProbeSetFreeze WHERE ProbeFreezeId=%s"
+ cursor.execute(query, (studyid,))
+ return tuple(cursor.fetchall())
+
+def averaging_methods() -> tuple:
+ "Retrieve averaging methods from database"
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM AvgMethod")
+ return tuple(cursor.fetchall())
+
+def dataset_datascales() -> tuple:
+ "Retrieve datascales from database"
+ with database_connection() as conn:
+ with conn.cursor() as cursor:
+ cursor.execute(
+ 'SELECT DISTINCT DataScale FROM ProbeSetFreeze '
+ 'WHERE DataScale IS NOT NULL AND DataScale != ""')
+ return tuple(
+ item for item in
+ (res[0].strip() for res in cursor.fetchall())
+ if (item is not None and item != ""))
+
+@dbinsertbp.route("/dataset", methods=["POST"])
+def select_dataset():
+ "Select the dataset to add the file contents against"
+ form = request.form
+ try:
+ assert form.get("filename"), "filename"
+ assert form.get("filetype"), "filetype"
+ assert form.get("species"), "species"
+ assert form.get("genechipid"), "platform"
+ assert form.get("studyid"), "study"
+
+ studyid = form["studyid"]
+ datasets = datasets_by_study(studyid)
+ return render_template(
+ "select_dataset.html", **{**form, "studyid": studyid},
+ datasets=datasets, avgmethods=averaging_methods(),
+ datascales=dataset_datascales())
+ except AssertionError as aserr:
+ return render_error(f"Missing data: {aserr.args[0]}")
+
+@dbinsertbp.route("/create-dataset", methods=["POST"])
+def create_dataset():
+ "Select the dataset to add the file contents against"
+ form = request.form
+ try:
+ assert form.get("filename"), "filename"
+ assert form.get("filetype"), "filetype"
+ assert form.get("species"), "species"
+ assert form.get("genechipid"), "platform"
+ assert form.get("studyid"), "study"
+        assert form.get("avgid"), "averaging method"
+        assert form.get("datasetname"), "Dataset Name"
+        assert form.get("datasetname2"), "Dataset Name 2"
+ assert form.get("datasetfullname"), "Dataset Full Name"
+ assert form.get("datasetshortname"), "Dataset Short Name"
+ assert form.get("datasetpublic"), "Dataset public specification"
+ assert form.get("datasetconfidentiality"), "Dataset confidentiality"
+ assert form.get("datasetdatascale"), "Dataset Datascale"
+
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ datasetname = form["datasetname"]
+ cursor.execute("SELECT * FROM ProbeSetFreeze WHERE Name=%s",
+ (datasetname,))
+ results = cursor.fetchall()
+ if bool(results):
+ flash("A dataset with that name already exists.",
+ "alert-error")
+ return redirect(url_for("dbinsert.select_dataset"), code=307)
+ values = (
+ form["studyid"], form["avgid"],
+ datasetname, form["datasetname2"],
+ form["datasetfullname"], form["datasetshortname"],
+ datetime.now().date().strftime("%Y-%m-%d"),
+ form["datasetpublic"], form["datasetconfidentiality"],
+ "williamslab", form["datasetdatascale"])
+ query = (
+ "INSERT INTO ProbeSetFreeze("
+ "ProbeFreezeId, AvgID, Name, Name2, FullName, "
+ "ShortName, CreateTime, OrderList, public, "
+ "confidentiality, AuthorisedUsers, DataScale) "
+ "VALUES"
+ "(%s, %s, %s, %s, %s, %s, %s, NULL, %s, %s, %s, %s)")
+ cursor.execute(query, values)
+ new_datasetid = cursor.lastrowid
+ return render_template(
+ "continue_from_create_dataset.html",
+ filename=form["filename"], filetype=form["filetype"],
+ species=form["species"], genechipid=form["genechipid"],
+ studyid=form["studyid"], datasetid=new_datasetid,
+ totallines=form["totallines"])
+ except AssertionError as aserr:
+ flash(f"Missing data {aserr.args[0]}", "alert-error")
+ return redirect(url_for("dbinsert.select_dataset"), code=307)
+
+def study_by_id(studyid:int) -> Union[dict, None]:
+ "Get a study by its Id"
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "SELECT * FROM ProbeFreeze WHERE Id=%s",
+ (studyid,))
+ return cursor.fetchone()
+
+def dataset_by_id(datasetid:int) -> Union[dict, None]:
+ "Retrieve a dataset by its id"
+ with database_connection() as conn:
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ ("SELECT AvgMethod.Name AS AvgMethodName, ProbeSetFreeze.* "
+ "FROM ProbeSetFreeze INNER JOIN AvgMethod "
+ "ON ProbeSetFreeze.AvgId=AvgMethod.AvgMethodId "
+ "WHERE ProbeSetFreeze.Id=%s"),
+ (datasetid,))
+ return cursor.fetchone()
+
+def selected_keys(original: dict, keys: tuple) -> dict:
+ "Return a new dict from the `original` dict with only `keys` present."
+    return {key: value for key, value in original.items() if key in keys}
+
+@dbinsertbp.route("/final-confirmation", methods=["POST"])
+def final_confirmation():
+ "Preview the data before triggering entry into the database"
+ form = request.form
+ try:
+ assert form.get("filename"), "filename"
+ assert form.get("filetype"), "filetype"
+ assert form.get("species"), "species"
+ assert form.get("genechipid"), "platform"
+ assert form.get("studyid"), "study"
+ assert form.get("datasetid"), "dataset"
+
+ speciesid = form["species"]
+ genechipid = form["genechipid"]
+ studyid = form["studyid"]
+        datasetid = form["datasetid"]
+ return render_template(
+ "final_confirmation.html", filename=form["filename"],
+ filetype=form["filetype"], totallines=form["totallines"],
+ species=speciesid, genechipid=genechipid, studyid=studyid,
+ datasetid=datasetid, the_species=selected_keys(
+ with_db_connection(lambda conn: species_by_id(conn, speciesid)),
+ ("SpeciesName", "Name", "MenuName")),
+ platform=selected_keys(
+ platform_by_id(genechipid),
+ ("GeneChipName", "Name", "GeoPlatform", "Title", "GO_tree_value")),
+ study=selected_keys(
+ study_by_id(studyid), ("Name", "FullName", "ShortName")),
+ dataset=selected_keys(
+ dataset_by_id(datasetid),
+ ("AvgMethodName", "Name", "Name2", "FullName", "ShortName",
+ "DataScale")))
+ except AssertionError as aserr:
+ return render_error(f"Missing data: {aserr.args[0]}")
+
+@dbinsertbp.route("/insert-data", methods=["POST"])
+def insert_data():
+ "Trigger data insertion"
+ form = request.form
+ try:
+ assert form.get("filename"), "filename"
+ assert form.get("filetype"), "filetype"
+ assert form.get("species"), "species"
+ assert form.get("genechipid"), "platform"
+ assert form.get("studyid"), "study"
+ assert form.get("datasetid"), "dataset"
+
+ filename = form["filename"]
+ filepath = f"{app.config['UPLOAD_FOLDER']}/{filename}"
+ redisurl = app.config["REDIS_URL"]
+ if os.path.exists(filepath):
+ with Redis.from_url(redisurl, decode_responses=True) as rconn:
+ job = jobs.launch_job(
+ jobs.data_insertion_job(
+ rconn, filepath, form["filetype"], form["totallines"],
+ form["species"], form["genechipid"], form["datasetid"],
+ app.config["SQL_URI"], redisurl,
+ app.config["JOBS_TTL_SECONDS"]),
+ redisurl, f"{app.config['UPLOAD_FOLDER']}/job_errors")
+
+ return redirect(url_for("dbinsert.insert_status", job_id=job["jobid"]))
+ return render_error(f"File '{filename}' no longer exists.")
+ except AssertionError as aserr:
+ return render_error(f"Missing data: {aserr.args[0]}")
+
+@dbinsertbp.route("/status/", methods=["GET"])
+def insert_status(job_id: str):
+ "Retrieve status of data insertion."
+ with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
+ job = jobs.job(rconn, jobs.jobsnamespace(), job_id)
+
+ if job:
+ job_status = job["status"]
+ if job_status == "success":
+ return render_template("insert_success.html", job=job)
+ if job["status"] == "error":
+ return render_template("insert_error.html", job=job)
+ return render_template("insert_progress.html", job=job)
+ return render_template("no_such_job.html", job_id=job_id), 400
diff --git a/uploader/default_settings.py b/uploader/default_settings.py
new file mode 100644
index 0000000..7a9da0f
--- /dev/null
+++ b/uploader/default_settings.py
@@ -0,0 +1,14 @@
+"""
+The default configuration file. The values here should be overridden in the
+actual configuration file used for the production and staging systems.
+"""
+
+import os
+
+LOG_LEVEL = os.getenv("LOG_LEVEL", "WARNING")
+SECRET_KEY = b""
+UPLOAD_FOLDER = "/tmp/qc_app_files"
+REDIS_URL = "redis://"
+JOBS_TTL_SECONDS = 1209600 # 14 days
+GNQC_REDIS_PREFIX = "GNQC"
+SQL_URI = ""
diff --git a/uploader/entry.py b/uploader/entry.py
new file mode 100644
index 0000000..4a02f1e
--- /dev/null
+++ b/uploader/entry.py
@@ -0,0 +1,127 @@
+"""Entry-point module"""
+import os
+import mimetypes
+from typing import Tuple
+from zipfile import ZipFile, is_zipfile
+
+from werkzeug.utils import secure_filename
+from flask import (
+ flash,
+ request,
+ url_for,
+ redirect,
+ Blueprint,
+ render_template,
+ current_app as app,
+ send_from_directory)
+
+from uploader.db import species
+from uploader.db_utils import with_db_connection
+
+entrybp = Blueprint("entry", __name__)
+
+@entrybp.route("/favicon.ico", methods=["GET"])
+def favicon():
+ """Return the favicon."""
+ return send_from_directory(os.path.join(app.root_path, "static"),
+ "images/CITGLogo.png",
+ mimetype="image/png")
+
+
+def errors(rqst) -> Tuple[str, ...]:
+ """Return a tuple of the errors found in the request `rqst`. If no error is
+ found, then an empty tuple is returned."""
+ def __filetype_error__():
+ return (
+ ("Invalid file type provided.",)
+ if rqst.form.get("filetype") not in ("average", "standard-error")
+ else tuple())
+
+ def __file_missing_error__():
+ return (
+ ("No file was uploaded.",)
+ if ("qc_text_file" not in rqst.files or
+ rqst.files["qc_text_file"].filename == "")
+ else tuple())
+
+ def __file_mimetype_error__():
+ text_file = rqst.files["qc_text_file"]
+ return (
+ (
+ ("Invalid file! Expected a tab-separated-values file, or a zip "
+ "file of the a tab-separated-values file."),)
+ if text_file.mimetype not in (
+ "text/plain", "text/tab-separated-values",
+ "application/zip")
+ else tuple())
+
+ return (
+ __filetype_error__() +
+ (__file_missing_error__() or __file_mimetype_error__()))
+
+def zip_file_errors(filepath, upload_dir) -> Tuple[str, ...]:
+ """Check the uploaded zip file for errors."""
+ zfile_errors: Tuple[str, ...] = tuple()
+ if is_zipfile(filepath):
+ with ZipFile(filepath, "r") as zfile:
+ infolist = zfile.infolist()
+ if len(infolist) != 1:
+ zfile_errors = zfile_errors + (
+ ("Expected exactly one (1) member file within the uploaded zip "
+ f"file. Got {len(infolist)} member files."),)
+ if len(infolist) == 1 and infolist[0].is_dir():
+ zfile_errors = zfile_errors + (
+ ("Expected a member text file in the uploaded zip file. Got a "
+ "directory/folder."),)
+
+ if len(infolist) == 1 and not infolist[0].is_dir():
+ zfile.extract(infolist[0], path=upload_dir)
+ mime = mimetypes.guess_type(f"{upload_dir}/{infolist[0].filename}")
+ if mime[0] != "text/tab-separated-values":
+ zfile_errors = zfile_errors + (
+ ("Expected the member text file in the uploaded zip file to"
+ " be a tab-separated file."),)
+
+ return zfile_errors
+
+@entrybp.route("/", methods=["GET"])
+def index():
+ """Load the landing page"""
+ return render_template("index.html")
+
+@entrybp.route("/upload", methods=["GET", "POST"])
+def upload_file():
+ """Enables uploading the files"""
+ if request.method == "GET":
+ return render_template(
+ "select_species.html", species=with_db_connection(species))
+
+ upload_dir = app.config["UPLOAD_FOLDER"]
+ request_errors = errors(request)
+ if request_errors:
+ for error in request_errors:
+ flash(error, "alert-danger error-expr-data")
+ return redirect(url_for("entry.upload_file"))
+
+ filename = secure_filename(request.files["qc_text_file"].filename)
+ if not os.path.exists(upload_dir):
+ os.mkdir(upload_dir)
+
+    filepath = os.path.join(upload_dir, filename)
+    request.files["qc_text_file"].save(filepath)
+
+ zip_errors = zip_file_errors(filepath, upload_dir)
+ if zip_errors:
+ for error in zip_errors:
+ flash(error, "alert-danger error-expr-data")
+ return redirect(url_for("entry.upload_file"))
+
+ return redirect(url_for("parse.parse",
+ speciesid=request.form["speciesid"],
+ filename=filename,
+ filetype=request.form["filetype"]))
+
+@entrybp.route("/data-review", methods=["GET"])
+def data_review():
+ """Provide some help on data expectations to the user."""
+ return render_template("data_review.html")
diff --git a/uploader/errors.py b/uploader/errors.py
new file mode 100644
index 0000000..3e7c893
--- /dev/null
+++ b/uploader/errors.py
@@ -0,0 +1,29 @@
+"""Application error handling."""
+import traceback
+from werkzeug.exceptions import HTTPException
+
+import MySQLdb as mdb
+from flask import Flask, request, render_template, current_app as app
+
+def handle_general_exception(exc: Exception):
+ """Handle generic exceptions."""
+ trace = traceback.format_exc()
+ app.logger.error(
+ "Error (%s.%s): Generic unhandled exception!! (URI: %s)\n%s",
+ exc.__class__.__module__, exc.__class__.__name__, request.url, trace)
+ return render_template("unhandled_exception.html", trace=trace), 500
+
+def handle_http_exception(exc: HTTPException):
+ """Handle HTTP exceptions."""
+ app.logger.error(
+ "HTTP Error %s: %s", exc.code, exc.description, exc_info=True)
+ return render_template("http-error.html",
+ request_url=request.url,
+ exc=exc,
+ trace=traceback.format_exception(exc)), exc.code
+
+def register_error_handlers(appl: Flask):
+ """Register top-level error/exception handlers."""
+ appl.register_error_handler(Exception, handle_general_exception)
+ appl.register_error_handler(HTTPException, handle_http_exception)
+ appl.register_error_handler(mdb.MySQLError, handle_general_exception)
diff --git a/uploader/files.py b/uploader/files.py
new file mode 100644
index 0000000..b163612
--- /dev/null
+++ b/uploader/files.py
@@ -0,0 +1,26 @@
+"""Utilities to deal with uploaded files."""
+import hashlib
+from pathlib import Path
+from datetime import datetime
+from flask import current_app
+
+from werkzeug.utils import secure_filename
+from werkzeug.datastructures import FileStorage
+
+def save_file(fileobj: FileStorage, upload_dir: Path) -> Path:
+ """Save the uploaded file and return the path."""
+ assert bool(fileobj), "Invalid file object!"
+ hashed_name = hashlib.sha512(
+ f"{fileobj.filename}::{datetime.now().isoformat()}".encode("utf8")
+ ).hexdigest()
+ filename = Path(secure_filename(hashed_name)) # type: ignore[arg-type]
+ if not upload_dir.exists():
+ upload_dir.mkdir()
+
+ filepath = Path(upload_dir, filename)
+ fileobj.save(filepath)
+ return filepath
+
+def fullpath(filename: str):
+ """Get a file's full path. This makes use of `flask.current_app`."""
+ return Path(current_app.config["UPLOAD_FOLDER"], filename).absolute()
diff --git a/uploader/input_validation.py b/uploader/input_validation.py
new file mode 100644
index 0000000..9abe742
--- /dev/null
+++ b/uploader/input_validation.py
@@ -0,0 +1,27 @@
+"""Input validation utilities"""
+from typing import Any
+
+def is_empty_string(value: str) -> bool:
+ """Check whether as string is empty"""
+ return (isinstance(value, str) and value.strip() == "")
+
+def is_empty_input(value: Any) -> bool:
+ """Check whether user provided an empty value."""
+ return (value is None or is_empty_string(value))
+
+def is_integer_input(value: Any) -> bool:
+ """
+ Check whether user provided a value that can be parsed into an integer.
+ """
+ def __is_int__(val, base):
+ try:
+ int(val, base=base)
+ except ValueError:
+ return False
+ return True
+ return isinstance(value, int) or (
+ (not is_empty_input(value)) and (
+ isinstance(value, str) and (
+ __is_int__(value, 10)
+ or __is_int__(value, 8)
+ or __is_int__(value, 16))))
diff --git a/uploader/jobs.py b/uploader/jobs.py
new file mode 100644
index 0000000..21889da
--- /dev/null
+++ b/uploader/jobs.py
@@ -0,0 +1,130 @@
+"""Handle jobs"""
+import os
+import sys
+import shlex
+import subprocess
+from uuid import UUID, uuid4
+from datetime import timedelta
+from typing import Union, Optional
+
+from redis import Redis
+from flask import current_app as app
+
+JOBS_PREFIX = "JOBS"
+
+class JobNotFound(Exception):
+ """Raised if we try to retrieve a non-existent job."""
+
+def jobsnamespace():
+ """
+ Return the jobs namespace prefix. It depends on app configuration.
+
+ Calling this function outside of an application context will cause an
+ exception to be raised. It is mostly a convenience utility to use within the
+ application.
+ """
+ return f"{app.config['GNQC_REDIS_PREFIX']}:{JOBS_PREFIX}"
+
+def job_key(namespaceprefix: str, jobid: Union[str, UUID]) -> str:
+ """Build the key by appending it to the namespace prefix."""
+ return f"{namespaceprefix}:{jobid}"
+
+def raise_jobnotfound(rprefix: str, jobid: Union[str, UUID]):
+    """Utility to raise a `JobNotFound` error."""
+    raise JobNotFound(f"Could not retrieve job '{jobid}' from '{rprefix}'.")
+
+def error_filename(jobid, error_dir):
+ "Compute the path of the file where errors will be dumped."
+ return f"{error_dir}/job_{jobid}.error"
+
+def initialise_job(# pylint: disable=[too-many-arguments]
+ rconn: Redis, rprefix: str, jobid: str, command: list, job_type: str,
+ ttl_seconds: int = 86400, extra_meta: Optional[dict] = None) -> dict:
+ "Initialise a job 'object' and put in on redis"
+ the_job = {
+ "jobid": jobid, "command": shlex.join(command), "status": "pending",
+ "percent": 0, "job-type": job_type, **(extra_meta or {})
+ }
+ rconn.hset(job_key(rprefix, jobid), mapping=the_job)
+ rconn.expire(
+ name=job_key(rprefix, jobid), time=timedelta(seconds=ttl_seconds))
+ return the_job
+
+def build_file_verification_job(#pylint: disable=[too-many-arguments]
+ redis_conn: Redis,
+ dburi: str,
+ redisuri: str,
+ speciesid: int,
+ filepath: str,
+ filetype: str,
+ ttl_seconds: int):
+ "Build a file verification job"
+ jobid = str(uuid4())
+ command = [
+ sys.executable, "-m", "scripts.validate_file",
+ dburi, redisuri, jobsnamespace(), jobid,
+ "--redisexpiry", str(ttl_seconds),
+ str(speciesid), filetype, filepath,
+ ]
+ return initialise_job(
+ redis_conn, jobsnamespace(), jobid, command, "file-verification",
+ ttl_seconds, {
+ "filetype": filetype,
+ "filename": os.path.basename(filepath), "percent": 0
+ })
+
+def data_insertion_job(# pylint: disable=[too-many-arguments]
+ redis_conn: Redis, filepath: str, filetype: str, totallines: int,
+ speciesid: int, platformid: int, datasetid: int, databaseuri: str,
+ redisuri: str, ttl_seconds: int) -> dict:
+ "Build a data insertion job"
+ jobid = str(uuid4())
+    command = [
+        sys.executable, "-m", "scripts.insert_data", filetype, filepath,
+        str(speciesid), str(platformid), str(datasetid), databaseuri, redisuri
+    ]
+ return initialise_job(
+ redis_conn, jobsnamespace(), jobid, command, "data-insertion",
+ ttl_seconds, {
+ "filename": os.path.basename(filepath), "filetype": filetype,
+ "totallines": totallines
+ })
+
+def launch_job(the_job: dict, redisurl: str, error_dir):
+ """Launch a job in the background"""
+ if not os.path.exists(error_dir):
+ os.mkdir(error_dir)
+
+ jobid = the_job["jobid"]
+ with open(error_filename(jobid, error_dir),
+ "w",
+ encoding="utf-8") as errorfile:
+ subprocess.Popen( # pylint: disable=[consider-using-with]
+ [sys.executable, "-m", "scripts.worker", redisurl, jobsnamespace(),
+ jobid],
+ stderr=errorfile,
+ env={"PYTHONPATH": ":".join(sys.path)})
+
+ return the_job
+
+def job(rconn: Redis, rprefix: str, jobid: Union[str,UUID]):
+ "Retrieve the job"
+ thejob = (rconn.hgetall(job_key(rprefix, jobid)) or
+ raise_jobnotfound(rprefix, jobid))
+ return thejob
+
+def update_status(
+ rconn: Redis, rprefix: str, jobid: Union[str, UUID], status: str):
+ """Update status of job in redis."""
+ rconn.hset(name=job_key(rprefix, jobid), key="status", value=status)
+
+def update_stdout_stderr(rconn: Redis,
+ rprefix: str,
+ jobid: Union[str, UUID],
+ bytes_read: bytes,
+ stream: str):
+ "Update the stdout/stderr keys according to the value of `stream`."
+ thejob = job(rconn, rprefix, jobid)
+ contents = thejob.get(stream, '')
+ new_contents = contents + bytes_read.decode("utf-8")
+ rconn.hset(name=job_key(rprefix, jobid), key=stream, value=new_contents)
diff --git a/uploader/parse.py b/uploader/parse.py
new file mode 100644
index 0000000..865dae2
--- /dev/null
+++ b/uploader/parse.py
@@ -0,0 +1,175 @@
+"""File parsing module"""
+import os
+
+import jsonpickle
+from redis import Redis
+from flask import flash, request, url_for, redirect, Blueprint, render_template
+from flask import current_app as app
+
+from quality_control.errors import InvalidValue, DuplicateHeading
+
+from uploader import jobs
+from uploader.db import species_by_id
+from uploader.db_utils import with_db_connection
+
+parsebp = Blueprint("parse", __name__)
+
+def isinvalidvalue(item):
+ """Check whether item is of type InvalidValue"""
+ return isinstance(item, InvalidValue)
+
+def isduplicateheading(item):
+ """Check whether item is of type DuplicateHeading"""
+ return isinstance(item, DuplicateHeading)
+
+@parsebp.route("/parse", methods=["GET"])
+def parse():
+ """Trigger file parsing"""
+ errors = False
+ speciesid = request.args.get("speciesid")
+ filename = request.args.get("filename")
+ filetype = request.args.get("filetype")
+ if speciesid is None:
+ flash("No species selected", "alert-error error-expr-data")
+ errors = True
+ else:
+ try:
+ speciesid = int(speciesid)
+ species = with_db_connection(
+ lambda con: species_by_id(con, speciesid))
+ if not bool(species):
+ flash("No such species.", "alert-error error-expr-data")
+ errors = True
+ except ValueError:
+ flash("Invalid speciesid provided. Expected an integer.",
+ "alert-error error-expr-data")
+ errors = True
+
+ if filename is None:
+ flash("No file provided", "alert-error error-expr-data")
+ errors = True
+
+    if filetype is None:
+        flash("No filetype provided", "alert-error error-expr-data")
+        errors = True
+    elif filetype not in ("average", "standard-error"):
+        flash("Invalid filetype provided", "alert-error error-expr-data")
+        errors = True
+
+ if filename:
+ filepath = os.path.join(app.config["UPLOAD_FOLDER"], filename)
+ if not os.path.exists(filepath):
+ flash("Selected file does not exist (any longer)",
+ "alert-error error-expr-data")
+ errors = True
+
+ if errors:
+ return redirect(url_for("entry.upload_file"))
+
+ redisurl = app.config["REDIS_URL"]
+ with Redis.from_url(redisurl, decode_responses=True) as rconn:
+ job = jobs.launch_job(
+ jobs.build_file_verification_job(
+ rconn, app.config["SQL_URI"], redisurl,
+ speciesid, filepath, filetype,
+ app.config["JOBS_TTL_SECONDS"]),
+ redisurl,
+ f"{app.config['UPLOAD_FOLDER']}/job_errors")
+
+ return redirect(url_for("parse.parse_status", job_id=job["jobid"]))
+
+@parsebp.route("/status/", methods=["GET"])
+def parse_status(job_id: str):
+ "Retrieve the status of the job"
+ with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
+ try:
+ job = jobs.job(rconn, jobs.jobsnamespace(), job_id)
+ except jobs.JobNotFound as _exc:
+ return render_template("no_such_job.html", job_id=job_id), 400
+
+ error_filename = jobs.error_filename(
+ job_id, f"{app.config['UPLOAD_FOLDER']}/job_errors")
+ if os.path.exists(error_filename):
+ stat = os.stat(error_filename)
+ if stat.st_size > 0:
+ return redirect(url_for("parse.fail", job_id=job_id))
+
+ job_id = job["jobid"]
+ progress = float(job["percent"])
+ status = job["status"]
+ filename = job.get("filename", "uploaded file")
+ errors = jsonpickle.decode(
+ job.get("errors", jsonpickle.encode(tuple())))
+ if status in ("success", "aborted"):
+ return redirect(url_for("parse.results", job_id=job_id))
+
+ if status == "parse-error":
+ return redirect(url_for("parse.fail", job_id=job_id))
+
+ app.jinja_env.globals.update(
+ isinvalidvalue=isinvalidvalue,
+ isduplicateheading=isduplicateheading)
+    return render_template(
+        "job_progress.html",
+        job_id=job_id,
+        job_status=status,
+        progress=progress,
+        message=job.get("message", ""),
+        job_name=f"Parsing '{filename}'",
+        errors=errors)
+
+@parsebp.route("/results/", methods=["GET"])
+def results(job_id: str):
+ """Show results of parsing..."""
+ with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
+ job = jobs.job(rconn, jobs.jobsnamespace(), job_id)
+
+ if job:
+ filename = job["filename"]
+ errors = jsonpickle.decode(job.get("errors", jsonpickle.encode(tuple())))
+ app.jinja_env.globals.update(
+ isinvalidvalue=isinvalidvalue,
+ isduplicateheading=isduplicateheading)
+        return render_template(
+            "parse_results.html",
+            errors=errors,
+            job_name=f"Parsing '{filename}'",
+            user_aborted=job.get("user_aborted"),
+            job_id=job["jobid"])
+
+ return render_template("no_such_job.html", job_id=job_id)
+
+@parsebp.route("/fail/", methods=["GET"])
+def fail(job_id: str):
+ """Handle parsing failure"""
+ with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
+ job = jobs.job(rconn, jobs.jobsnamespace(), job_id)
+
+ if job:
+ error_filename = jobs.error_filename(
+ job_id, f"{app.config['UPLOAD_FOLDER']}/job_errors")
+ if os.path.exists(error_filename):
+ stat = os.stat(error_filename)
+ if stat.st_size > 0:
+ return render_template(
+ "worker_failure.html", job_id=job_id)
+
+ return render_template("parse_failure.html", job=job)
+
+ return render_template("no_such_job.html", job_id=job_id)
+
+@parsebp.route("/abort", methods=["POST"])
+def abort():
+ """Handle user request to abort file processing"""
+ job_id = request.form["job_id"]
+
+ with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
+ job = jobs.job(rconn, jobs.jobsnamespace(), job_id)
+
+ if job:
+ rconn.hset(name=jobs.job_key(jobs.jobsnamespace(), job_id),
+ key="user_aborted",
+ value=int(True))
+
+ return redirect(url_for("parse.parse_status", job_id=job_id))
diff --git a/uploader/samples.py b/uploader/samples.py
new file mode 100644
index 0000000..9c95770
--- /dev/null
+++ b/uploader/samples.py
@@ -0,0 +1,354 @@
+"""Code regarding samples"""
+import os
+import sys
+import csv
+import uuid
+from pathlib import Path
+from typing import Iterator
+
+import MySQLdb as mdb
+from redis import Redis
+from MySQLdb.cursors import DictCursor
+from flask import (
+ flash,
+ request,
+ url_for,
+ redirect,
+ Blueprint,
+ render_template,
+ current_app as app)
+
+from functional_tools import take
+
+from uploader import jobs
+from uploader.files import save_file
+from uploader.input_validation import is_integer_input
+from uploader.db_utils import (
+ with_db_connection,
+ database_connection,
+ with_redis_connection)
+from uploader.db import (
+ species_by_id,
+ save_population,
+ population_by_id,
+ populations_by_species,
+ species as fetch_species)
+
+samples = Blueprint("samples", __name__)
+
+@samples.route("/upload/species", methods=["GET", "POST"])
+def select_species():
+ """Select the species."""
+ if request.method == "GET":
+ return render_template("samples/select-species.html",
+ species=with_db_connection(fetch_species))
+
+ index_page = redirect(url_for("entry.upload_file"))
+ species_id = request.form.get("species_id")
+ if bool(species_id):
+ species_id = int(species_id)
+ species = with_db_connection(
+ lambda conn: species_by_id(conn, species_id))
+ if bool(species):
+ return redirect(url_for(
+ "samples.select_population", species_id=species_id))
+ flash("Invalid species selected!", "alert-error")
+ flash("You need to select a species", "alert-error")
+ return index_page
+
+@samples.route("/upload/species//create-population",
+ methods=["POST"])
+def create_population(species_id: int):
+ """Create new grouping/population."""
+ if not is_integer_input(species_id):
+ flash("You did not provide a valid species. Please select one to "
+ "continue.",
+ "alert-danger")
+ return redirect(url_for("samples.select_species"))
+ species = with_db_connection(lambda conn: species_by_id(conn, species_id))
+ if not bool(species):
+ flash("Species with given ID was not found.", "alert-danger")
+ return redirect(url_for("samples.select_species"))
+
+ species_page = redirect(url_for("samples.select_species"), code=307)
+ with database_connection(app.config["SQL_URI"]) as conn:
+ species = species_by_id(conn, species_id)
+ pop_name = request.form.get("inbredset_name", "").strip()
+ pop_fullname = request.form.get("inbredset_fullname", "").strip()
+
+ if not bool(species):
+ flash("Invalid species!", "alert-error error-create-population")
+ return species_page
+ if (not bool(pop_name)) or (not bool(pop_fullname)):
+ flash("You *MUST* provide a grouping/population name",
+ "alert-error error-create-population")
+ return species_page
+
+ pop = save_population(conn, {
+ "SpeciesId": species["SpeciesId"],
+ "Name": pop_name,
+ "InbredSetName": pop_fullname,
+ "FullName": pop_fullname,
+ "Family": request.form.get("inbredset_family") or None,
+ "Description": request.form.get("description") or None
+ })
+
+ flash("Grouping/Population created successfully.", "alert-success")
+ return redirect(url_for("samples.upload_samples",
+ species_id=species_id,
+ population_id=pop["population_id"]))
+
+@samples.route("/upload/species//population",
+ methods=["GET", "POST"])
+def select_population(species_id: int):
+ """Select from existing groupings/populations."""
+ if not is_integer_input(species_id):
+ flash("You did not provide a valid species. Please select one to "
+ "continue.",
+ "alert-danger")
+ return redirect(url_for("samples.select_species"))
+ species = with_db_connection(lambda conn: species_by_id(conn, species_id))
+ if not bool(species):
+ flash("Species with given ID was not found.", "alert-danger")
+ return redirect(url_for("samples.select_species"))
+
+ if request.method == "GET":
+ return render_template(
+ "samples/select-population.html",
+ species=species,
+ populations=with_db_connection(
+ lambda conn: populations_by_species(conn, species_id)))
+
+ population_page = redirect(url_for(
+ "samples.select_population", species_id=species_id), code=307)
+ _population_id = request.form.get("inbredset_id")
+ if not is_integer_input(_population_id):
+ flash("You did not provide a valid population. Please select one to "
+ "continue.",
+ "alert-danger")
+ return population_page
+ population = with_db_connection(
+ lambda conn: population_by_id(conn, _population_id))
+ if not bool(population):
+ flash("Invalid grouping/population!",
+ "alert-error error-select-population")
+ return population_page
+
+ return redirect(url_for("samples.upload_samples",
+ species_id=species_id,
+ population_id=_population_id),
+ code=307)
+
+def read_samples_file(filepath, separator: str, firstlineheading: bool, **kwargs) -> Iterator[dict]:
+ """Read the samples file."""
+ with open(filepath, "r", encoding="utf-8") as inputfile:
+ reader = csv.DictReader(
+ inputfile,
+ fieldnames=(
+ None if firstlineheading
+ else ("Name", "Name2", "Symbol", "Alias")),
+ delimiter=separator,
+ quotechar=kwargs.get("quotechar", '"'))
+ for row in reader:
+ yield row
+
+def save_samples_data(conn: mdb.Connection,
+ speciesid: int,
+ file_data: Iterator[dict]):
+ """Save the samples to DB."""
+ data = ({**row, "SpeciesId": speciesid} for row in file_data)
+ total = 0
+ with conn.cursor() as cursor:
+ while True:
+ batch = take(data, 5000)
+ if len(batch) == 0:
+ break
+ cursor.executemany(
+ "INSERT INTO Strain(Name, Name2, SpeciesId, Symbol, Alias) "
+ "VALUES("
+ " %(Name)s, %(Name2)s, %(SpeciesId)s, %(Symbol)s, %(Alias)s"
+ ") ON DUPLICATE KEY UPDATE Name=Name",
+ batch)
+ total += len(batch)
+ print(f"\tSaved {total} samples total so far.")
+
+def cross_reference_samples(conn: mdb.Connection,
+ species_id: int,
+ population_id: int,
+ strain_names: Iterator[str]):
+ """Link samples to their population."""
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute(
+ "SELECT MAX(OrderId) AS loid FROM StrainXRef WHERE InbredSetId=%s",
+ (population_id,))
+ last_order_id = (cursor.fetchone()["loid"] or 10)
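+        # OrderId advances in strides of 10 (presumably to leave gaps for
+        # manual reordering); continue from the population's current maximum.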
+ total = 0
+ while True:
+ batch = take(strain_names, 5000)
+ if len(batch) == 0:
+ break
+ params_str = ", ".join(["%s"] * len(batch))
+ ## This query is slow -- investigate.
+ cursor.execute(
+ "SELECT s.Id FROM Strain AS s LEFT JOIN StrainXRef AS sx "
+ "ON s.Id = sx.StrainId WHERE s.SpeciesId=%s AND s.Name IN "
+ f"({params_str}) AND sx.StrainId IS NULL",
+ (species_id,) + tuple(batch))
+ strain_ids = (sid["Id"] for sid in cursor.fetchall())
+ params = tuple({
+ "pop_id": population_id,
+ "strain_id": strain_id,
+ "order_id": last_order_id + (order_id * 10),
+ "mapping": "N",
+ "pedigree": None
+ } for order_id, strain_id in enumerate(strain_ids, start=1))
+ cursor.executemany(
+ "INSERT INTO StrainXRef( "
+ " InbredSetId, StrainId, OrderId, Used_for_mapping, PedigreeStatus"
+ ")"
+ "VALUES ("
+ " %(pop_id)s, %(strain_id)s, %(order_id)s, %(mapping)s, "
+ " %(pedigree)s"
+ ")",
+ params)
+ last_order_id += (len(params) * 10)
+ total += len(batch)
+ print(f"\t{total} total samples cross-referenced to the population "
+ "so far.")
+
+def build_sample_upload_job(# pylint: disable=[too-many-arguments]
+ speciesid: int,
+ populationid: int,
+ samplesfile: Path,
+ separator: str,
+ firstlineheading: bool,
+ quotechar: str):
+ """Define the async command to run the actual samples data upload."""
+ return [
+ sys.executable, "-m", "scripts.insert_samples", app.config["SQL_URI"],
+ str(speciesid), str(populationid), str(samplesfile.absolute()),
+ separator, f"--redisuri={app.config['REDIS_URL']}",
+ f"--quotechar={quotechar}"
+ ] + (["--firstlineheading"] if firstlineheading else [])
+
+@samples.route("/upload/species//populations//samples",
+ methods=["GET", "POST"])
+def upload_samples(species_id: int, population_id: int):#pylint: disable=[too-many-return-statements]
+ """Upload the samples."""
+ samples_uploads_page = redirect(url_for("samples.upload_samples",
+ species_id=species_id,
+ population_id=population_id))
+ if not is_integer_input(species_id):
+ flash("You did not provide a valid species. Please select one to "
+ "continue.",
+ "alert-danger")
+ return redirect(url_for("samples.select_species"))
+ species = with_db_connection(lambda conn: species_by_id(conn, species_id))
+ if not bool(species):
+ flash("Species with given ID was not found.", "alert-danger")
+ return redirect(url_for("samples.select_species"))
+
+ if not is_integer_input(population_id):
+ flash("You did not provide a valid population. Please select one "
+ "to continue.",
+ "alert-danger")
+ return redirect(url_for("samples.select_population",
+ species_id=species_id),
+ code=307)
+ population = with_db_connection(
+ lambda conn: population_by_id(conn, int(population_id)))
+ if not bool(population):
+ flash("Invalid grouping/population!", "alert-error")
+ return redirect(url_for("samples.select_population",
+ species_id=species_id),
+ code=307)
+
+ if request.method == "GET" or request.files.get("samples_file") is None:
+ return render_template("samples/upload-samples.html",
+ species=species,
+ population=population)
+
+ try:
+ samples_file = save_file(request.files["samples_file"],
+ Path(app.config["UPLOAD_FOLDER"]))
+ except AssertionError:
+ flash("You need to provide a file with the samples data.",
+ "alert-error")
+ return samples_uploads_page
+
+ firstlineheading = (request.form.get("first_line_heading") == "on")
+
+ separator = request.form.get("separator", ",")
+ if separator == "other":
+ separator = request.form.get("other_separator", ",")
+ if not bool(separator):
+ flash("You need to provide a separator character.", "alert-error")
+ return samples_uploads_page
+
+ quotechar = (request.form.get("field_delimiter", '"') or '"')
+
+ redisuri = app.config["REDIS_URL"]
+ with Redis.from_url(redisuri, decode_responses=True) as rconn:
+ the_job = jobs.launch_job(
+ jobs.initialise_job(
+ rconn,
+ jobs.jobsnamespace(),
+ str(uuid.uuid4()),
+ build_sample_upload_job(
+ species["SpeciesId"],
+ population["InbredSetId"],
+ samples_file,
+ separator,
+ firstlineheading,
+ quotechar),
+ "samples_upload",
+ app.config["JOBS_TTL_SECONDS"],
+ {"job_name": f"Samples Upload: {samples_file.name}"}),
+ redisuri,
+ f"{app.config['UPLOAD_FOLDER']}/job_errors")
+ return redirect(url_for(
+ "samples.upload_status", job_id=the_job["jobid"]))
+
+@samples.route("/upload/status/", methods=["GET"])
+def upload_status(job_id: uuid.UUID):
+ """Check on the status of a samples upload job."""
+ job = with_redis_connection(lambda rconn: jobs.job(
+ rconn, jobs.jobsnamespace(), job_id))
+ if job:
+ status = job["status"]
+ if status == "success":
+ return render_template("samples/upload-success.html", job=job)
+
+ if status == "error":
+ return redirect(url_for("samples.upload_failure", job_id=job_id))
+
+ error_filename = Path(jobs.error_filename(
+ job_id, f"{app.config['UPLOAD_FOLDER']}/job_errors"))
+ if error_filename.exists():
+ stat = os.stat(error_filename)
+ if stat.st_size > 0:
+ return redirect(url_for(
+ "samples.upload_failure", job_id=job_id))
+
+ return render_template(
+ "samples/upload-progress.html",
+ job=job) # maybe also handle this?
+
+ return render_template("no_such_job.html", job_id=job_id), 400
+
+@samples.route("/upload/failure/", methods=["GET"])
+def upload_failure(job_id: uuid.UUID):
+ """Display the errors of the samples upload failure."""
+ job = with_redis_connection(lambda rconn: jobs.job(
+ rconn, jobs.jobsnamespace(), job_id))
+ if not bool(job):
+ return render_template("no_such_job.html", job_id=job_id), 400
+
+ error_filename = Path(jobs.error_filename(
+ job_id, f"{app.config['UPLOAD_FOLDER']}/job_errors"))
+ if error_filename.exists():
+ stat = os.stat(error_filename)
+ if stat.st_size > 0:
+ return render_template("worker_failure.html", job_id=job_id)
+
+ return render_template("samples/upload-failure.html", job=job)
diff --git a/uploader/static/css/custom-bootstrap.css b/uploader/static/css/custom-bootstrap.css
new file mode 100644
index 0000000..67f1199
--- /dev/null
+++ b/uploader/static/css/custom-bootstrap.css
@@ -0,0 +1,23 @@
+/** Customize some bootstrap selectors **/
+.btn {
+ text-transform: capitalize;
+}
+
+.navbar-inverse {
+ background-color: #336699;
+ border-color: #080808;
+ color: #FFFFFF;
+ background-image: none;
+}
+
+.navbar-inverse .navbar-nav>li>a {
+ color: #FFFFFF;
+}
+
+.navbar-nav > li > a {
+ padding: 5px;
+}
+
+.navbar {
+ min-height: 30px;
+}
diff --git a/uploader/static/css/styles.css b/uploader/static/css/styles.css
new file mode 100644
index 0000000..a88c229
--- /dev/null
+++ b/uploader/static/css/styles.css
@@ -0,0 +1,7 @@
+.heading {
+ text-transform: capitalize;
+}
+
+label {
+ text-transform: capitalize;
+}
diff --git a/uploader/static/css/two-column-with-separator.css b/uploader/static/css/two-column-with-separator.css
new file mode 100644
index 0000000..b6efd46
--- /dev/null
+++ b/uploader/static/css/two-column-with-separator.css
@@ -0,0 +1,27 @@
+.two-column-with-separator {
+ display: grid;
+ grid-template-columns: 9fr 1fr 9fr;
+}
+
+.two-col-sep-col1 {
+ grid-column: 1 / 2;
+}
+
+.two-col-sep-separator {
+ grid-column: 2 / 3;
+ text-align: center;
+ color: #FE3535;
+ font-weight: bolder;
+}
+
+.two-col-sep-col2 {
+ grid-column: 3 / 4;
+}
+
+.two-col-sep-col1, .two-col-sep-col2 {
+ border-style: solid;
+ border-color: #FE3535;
+ border-width: 1px;
+ border-radius: 2em;
+ padding: 2em 3em 2em 3em;
+}
diff --git a/uploader/static/images/CITGLogo.png b/uploader/static/images/CITGLogo.png
new file mode 100644
index 0000000..ae99fed
Binary files /dev/null and b/uploader/static/images/CITGLogo.png differ
diff --git a/uploader/static/js/select_platform.js b/uploader/static/js/select_platform.js
new file mode 100644
index 0000000..4fdd865
--- /dev/null
+++ b/uploader/static/js/select_platform.js
@@ -0,0 +1,70 @@
+function radio_column(chip) {
+    // Build the table cell holding the radio button for one platform/chip.
+    const col = document.createElement("td");
+    const radio = document.createElement("input");
+    radio.setAttribute("type", "radio");
+    radio.setAttribute("name", "genechipid");
+    radio.setAttribute("value", chip["GeneChipId"]);
+    radio.setAttribute("required", "required");
+    col.appendChild(radio);
+    return col;
+}
+
+function setup_genechips(genechip_data) {
+    const columns = ["GeneChipId", "GeneChipName"];
+    const submit_button = document.querySelector(
+        "#select-platform-form button[type='submit']");
+    const elt = document.getElementById(
+        "genechips-table").getElementsByTagName("tbody")[0];
+    remove_children(elt);
+    if((genechip_data === undefined) || genechip_data.length === 0) {
+        // No platforms for this species: show a message and block submission.
+        const row = document.createElement("tr");
+        const col = document.createElement("td");
+        col.setAttribute("colspan", "3");
+        col.appendChild(document.createTextNode(
+            "No chips found for selected species"));
+        row.appendChild(col);
+        elt.appendChild(row);
+        submit_button.setAttribute("disabled", "disabled");
+        return false;
+    }
+
+    submit_button.removeAttribute("disabled");
+    genechip_data.forEach(chip => {
+        const row = document.createElement("tr");
+        row.appendChild(radio_column(chip));
+        columns.forEach(column => {
+            const col = document.createElement("td");
+            col.appendChild(document.createTextNode(chip[column]));
+            row.appendChild(col);
+        });
+        elt.appendChild(row);
+    });
+}
+
+function genechips() {
+ return JSON.parse(
+ document.getElementById("select-platform-form").getAttribute(
+ "data-genechips"));
+}
+
+function update_genechips(event) {
+    const genec = genechips();
+    const species_elt = document.getElementById("species");
+
+    if(event.target == species_elt) {
+        setup_genechips(genec[species_elt.value.toLowerCase()]);
+    }
+}
+
+function select_row_radio(row) {
+    const radio = row.getElementsByTagName(
+        "td")[0].getElementsByTagName(
+            "input")[0];
+    if(radio === undefined) {
+        return false;
+    }
+    radio.setAttribute("checked", "checked");
+    return true;
+}
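+
+/*
+ * Wiring sketch: the template is expected to hook the species selector up to
+ * update_genechips; the listener registration itself is not part of this
+ * file, so the following is illustrative only:
+ *
+ *     document.getElementById("species").addEventListener(
+ *         "change", update_genechips);
+ */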
diff --git a/uploader/static/js/upload_progress.js b/uploader/static/js/upload_progress.js
new file mode 100644
index 0000000..9638b36
--- /dev/null
+++ b/uploader/static/js/upload_progress.js
@@ -0,0 +1,97 @@
+function make_processing_indicator(elt) {
+    // Returns a closure that animates a trailing-dots "busy" message in `elt`.
+    var count = 0;
+    return function() {
+        var message = "Finalising upload and saving file: ";
+        if(count > 5) {
+            count = 1;
+        }
+        for(var i = 0; i < count; i++) {
+            message = message + ".";
+        }
+        elt.innerHTML = message;
+        count = count + 1;
+    };
+}
+
+function make_progress_updater(file, indicator_elt) {
+    var progress_bar = indicator_elt.querySelector("#progress-bar");
+    var progress_text = indicator_elt.querySelector("#progress-text");
+    var extra_text = indicator_elt.querySelector("#progress-extra-text");
+    return function(event) {
+        if(event.loaded <= file.size) {
+            var percent = Math.round((event.loaded / file.size) * 100);
+            progress_bar.value = percent;
+            progress_text.innerHTML = "Uploading: " + percent + "%";
+            extra_text.setAttribute("class", "hidden");
+        }
+
+        if(event.loaded == event.total) {
+            progress_bar.value = 100;
+            progress_text.innerHTML = "Uploaded: 100%";
+            extra_text.setAttribute("class", "");
+            var intv = setInterval(make_processing_indicator(extra_text), 400);
+            // An interval must be cleared with clearInterval, not clearTimeout.
+            setTimeout(function() {clearInterval(intv);}, 20000);
+        }
+    };
+}
+
+function setup_cancel_upload(request, indicator_elt) {
+ document.getElementById("btn-cancel-upload").addEventListener(
+ "click", function(event) {
+ event.preventDefault();
+ request.abort();
+ });
+}
+
+function setup_request(file, progress_indicator_elt) {
+ var request = new XMLHttpRequest();
+ var updater = make_progress_updater(file, progress_indicator_elt);
+ request.upload.addEventListener("progress", updater);
+ request.onload = function(event) {
+ document.location.assign(request.responseURL);
+ };
+    setup_cancel_upload(request, progress_indicator_elt);
+ return request;
+}
+
+function selected_filetype(radios) {
+    for(var idx = 0; idx < radios.length; idx++) {
+        if(radios[idx].checked) {
+            return radios[idx].value;
+        }
+    }
+}
+
+function make_data_uploader(setup_formdata) {
+ return function(event) {
+ event.preventDefault();
+
+ var pindicator = document.getElementById("upload-progress-indicator");
+
+ var form = event.target;
+ var the_file = form.querySelector("input[type='file']").files[0];
+        if(the_file === undefined) {
+            form.querySelector("input[type='file']").parentElement.setAttribute(
+                "class", "invalid-input");
+            // querySelector returns null (not undefined) when nothing matches.
+            var error_elt = form.querySelector("#no-file-error");
+            if(error_elt !== null) {
+                error_elt.setAttribute("style", "display: block;");
+            }
+            return false;
+        }
+ var formdata = setup_formdata(form);
+
+ document.getElementById("progress-filename").innerHTML = the_file.name;
+ var request = setup_request(the_file, pindicator);
+ request.open(form.getAttribute("method"), form.getAttribute("action"));
+ request.send(formdata);
+ return false;
+ }
+}
+
+
+function setup_upload_handlers(formid, datauploader) {
+    console.info("Setting up the upload handlers.");
+    var upload_form = document.getElementById(formid);
+    upload_form.addEventListener("submit", datauploader);
+}
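+
+/*
+ * Usage sketch (the form id and the FormData construction are illustrative
+ * assumptions; the real ids live in the templates):
+ *
+ *     setup_upload_handlers(
+ *         "frm:upload-data",
+ *         make_data_uploader((form) => new FormData(form)));
+ */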
diff --git a/uploader/static/js/upload_samples.js b/uploader/static/js/upload_samples.js
new file mode 100644
index 0000000..aed536f
--- /dev/null
+++ b/uploader/static/js/upload_samples.js
@@ -0,0 +1,132 @@
+/*
+ * Read the file content and set the `data-preview-content` attribute on the
+ * file element
+ */
+function read_first_n_lines(event,
+                            fileelement,
+                            numlines,
+                            firstlineheading = true) {
+    var thefile = fileelement.files[0];
+    var reader = new FileReader();
+    reader.addEventListener("load", (event) => {
+        var filecontent = event.target.result.split(
+            "\n").slice(
+                0, (numlines + (firstlineheading ? 1 : 0))).map(
+                    // String.prototype.trim takes no arguments; strip the
+                    // trailing carriage return explicitly instead.
+                    (line) => {return line.replace(/\r$/, "");});
+        fileelement.setAttribute(
+            "data-preview-content", JSON.stringify(filecontent));
+        display_preview(event);
+    });
+    reader.readAsText(thefile);
+}
+
+function remove_rows(preview_table) {
+ var table_body = preview_table.getElementsByTagName("tbody")[0];
+ while(table_body.children.length > 0) {
+ table_body.removeChild(table_body.children.item(0));
+ }
+}
+
+/*
+ * Display error row
+ */
+function display_error_row(preview_table, error_message) {
+    remove_rows(preview_table);
+    var row = document.createElement("tr");
+    var cell = document.createElement("td");
+    cell.setAttribute("colspan", 4);
+    cell.innerHTML = error_message;
+    row.appendChild(cell);
+    preview_table.getElementsByTagName("tbody")[0].appendChild(row);
+}
+
+function strip(str, chars) {
+ var end = str.length;
+ var start = 0
+ for(var j = str.length; j > 0; j--) {
+ if(!chars.includes(str[j - 1])) {
+ break;
+ }
+ end = end - 1;
+ }
+ for(var i = 0; i < end; i++) {
+ if(!chars.includes(str[i])) {
+ break;
+ }
+ start = start + 1;
+ }
+ return str.slice(start, end);
+}
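+
+// Examples:
+//   strip('"name"', '"')  => 'name'
+//   strip("__x__", "_")   => 'x'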
+
+function process_preview_data(preview_data, separator, delimiter) {
+ return preview_data.map((line) => {
+ return line.split(separator).map((field) => {
+ return strip(field, delimiter);
+ });
+ });
+}
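+
+// e.g. process_preview_data(['"id","name"', '"1","foo"'], ",", '"')
+//      => [["id", "name"], ["1", "foo"]]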
+
+function render_preview(preview_table, preview_data) {
+ remove_rows(preview_table);
+ var table_body = preview_table.getElementsByTagName("tbody")[0];
+ preview_data.forEach((line) => {
+ var row = document.createElement("tr");
+ line.forEach((field) => {
+ var cell = document.createElement("td");
+ cell.innerHTML = field;
+ row.appendChild(cell);
+ });
+ table_body.appendChild(row);
+ });
+}
+
+/*
+ * Display a preview of the data, relying on the user's selection.
+ */
+function display_preview(event) {
+ var data_preview_table = document.getElementById("tbl:samples-preview");
+ remove_rows(data_preview_table);
+
+ var separator = document.getElementById("select:separator").value;
+ if(separator === "other") {
+ separator = document.getElementById("txt:separator").value;
+ }
+ if(separator == "") {
+ display_error_row(data_preview_table, "Please provide a separator.");
+ return false;
+ }
+
+ var delimiter = document.getElementById("txt:delimiter").value;
+
+ var firstlineheading = document.getElementById("chk:heading").checked;
+
+ var fileelement = document.getElementById("file:samples");
+    var preview_data = JSON.parse(
+        fileelement.getAttribute("data-preview-content") || "[]");
+    if(preview_data.length == 0) {
+        display_error_row(
+            data_preview_table,
+            "No file data to preview. Check that file is provided.");
+        // Without this return, render_preview would immediately clear the
+        // error row again.
+        return false;
+    }
+
+    render_preview(data_preview_table, process_preview_data(
+        preview_data.slice(firstlineheading ? 1 : 0),
+        separator,
+        delimiter));
+}
+
+document.getElementById("chk:heading").addEventListener(
+ "change", display_preview);
+document.getElementById("select:separator").addEventListener(
+ "change", display_preview);
+document.getElementById("txt:separator").addEventListener(
+ "keyup", display_preview);
+document.getElementById("txt:delimiter").addEventListener(
+ "keyup", display_preview);
+document.getElementById("file:samples").addEventListener(
+ "change", (event) => {
+ read_first_n_lines(event,
+ document.getElementById("file:samples"),
+ 30,
+ document.getElementById("chk:heading").checked);
+ });
diff --git a/uploader/static/js/utils.js b/uploader/static/js/utils.js
new file mode 100644
index 0000000..045dd47
--- /dev/null
+++ b/uploader/static/js/utils.js
@@ -0,0 +1,10 @@
+function remove_children(element) {
+ Array.from(element.children).forEach(child => {
+ element.removeChild(child);
+ });
+}
+
+function trigger_change_event(element) {
+    const evt = new Event("change");
+    element.dispatchEvent(evt);
+}
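+
+// Example: after changing a <select> programmatically, notify its listeners
+// (the element id is illustrative):
+//   const sel = document.getElementById("species");
+//   sel.value = "mouse";
+//   trigger_change_event(sel);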
diff --git a/uploader/templates/base.html b/uploader/templates/base.html
new file mode 100644
index 0000000..eb5e6b7
--- /dev/null
+++ b/uploader/templates/base.html
@@ -0,0 +1,51 @@
+[base.html markup was stripped during extraction; the recoverable fragments
+ are the template hooks {%block extrameta%}{%endblock%} and
+ {%block css%}{%endblock%}, and the page title
+ "GN Uploader: {%block title%}{%endblock%}".]
+[Elided: the text content of several moved template files. Their HTML markup
+ and per-file "diff --git" headers were lost in extraction, and the surviving
+ text duplicates, verbatim, the file-requirements and index help text that
+ opens this document.]
{{job_name}}: parse results
+
+{%if user_aborted%}
+Job aborted by the user
+{%endif%}
+
+{{errors_display(errors, "No errors found in the file", "We found the following errors", True)}}
+
+{%if errors | length == 0 and not user_aborted %}
+
The processing of the R/qtl2 bundle you uploaded has failed. We have
+ provided some information below to help you figure out what the problem
+ could be.
+
If you find that you cannot figure out what the problem is on your own,
+ please contact the team running the system for assistance, providing the
+ following details:
+
+
R/qtl2 bundle you uploaded
+
This URL: {{request_url()}}
+
(maybe) a screenshot of this page
+
+
+
+
+
stdout
+{{cli_output(job, "stdout")}}
+
+
stderr
+{{cli_output(job, "stderr")}}
+
+
Log
+
+ {%for msg in messages%}
+ {{msg}}
+ {%endfor%}
+
+Your R/qtl2 bundle contains a "geno" specification. You will
+ therefore need to select one of the existing Genotype datasets or
+ create a new one.
+
+This is the dataset under which your data will be organised.
+The data is organised in a hierarchical form, beginning with
+ species at the very top. Under species the data is
+ organised by population, sometimes referred to as grouping.
+ (In some really old documents/systems, you might see this referred to as
+ InbredSet.)
+
In this section, you get to define what population your data is to be
+ organised by.
This is the information you have provided to accompany the R/qtl2 bundle
+ you have uploaded. Please verify the information is correct before
+ proceeding.
+ Provide a valid R/qtl2 zip file here. In particular, ensure your zip bundle
+ contains exactly one control file and the corresponding files mentioned in
+ the control file.
+
+
+ The control file can be either a YAML or JSON file. ALL other data
+ files in the zip bundle should be CSV files.
+
You have successfully uploaded the zipped bundle of R/qtl2 files.
+
+The next step is to provide the extra information we need to figure
+ out what to do with the data. You will select/create the relevant studies
+ and/or datasets to organise the data in the steps that follow.
+We organise the samples/cases/strains in a hierarchical form, starting
+ with species at the very top. Under species, we have a
+ grouping in terms of the relevant population
+ (e.g. inbred populations, cell tissue, etc.).
+ There was a critical failure launching the job to parse your file.
+ This is our fault and (probably) has nothing to do with the file you uploaded.
+
+
+
+ Please notify the developers of this issue when you encounter it,
+ providing the link to this page, or the information below.
+
+
+
Debugging Information
+
+
+
job id: {{job_id}}
+
+
+{%endblock%}
diff --git a/uploader/upload/__init__.py b/uploader/upload/__init__.py
new file mode 100644
index 0000000..5f120d4
--- /dev/null
+++ b/uploader/upload/__init__.py
@@ -0,0 +1,7 @@
+"""Package handling upload of files."""
+from flask import Blueprint
+
+from .rqtl2 import rqtl2
+
+upload = Blueprint("upload", __name__)
+upload.register_blueprint(rqtl2, url_prefix="/rqtl2")
diff --git a/uploader/upload/rqtl2.py b/uploader/upload/rqtl2.py
new file mode 100644
index 0000000..6aed1f7
--- /dev/null
+++ b/uploader/upload/rqtl2.py
@@ -0,0 +1,1157 @@
+"""Module to handle uploading of R/qtl2 bundles."""#pylint: disable=[too-many-lines]
+import sys
+import json
+import traceback
+from pathlib import Path
+from datetime import date
+from uuid import UUID, uuid4
+from functools import partial
+from zipfile import ZipFile, is_zipfile
+from typing import Union, Callable, Optional
+
+import MySQLdb as mdb
+from redis import Redis
+from MySQLdb.cursors import DictCursor
+from werkzeug.utils import secure_filename
+from flask import (
+ flash,
+ escape,
+ request,
+ jsonify,
+ url_for,
+ redirect,
+ Response,
+ Blueprint,
+ render_template,
+ current_app as app)
+
+from r_qtl import r_qtl2
+
+from uploader import jobs
+from uploader.files import save_file, fullpath
+from uploader.dbinsert import species as all_species
+from uploader.db_utils import with_db_connection, database_connection
+
+from uploader.db.platforms import platform_by_id, platforms_by_species
+from uploader.db.averaging import averaging_methods, averaging_method_by_id
+from uploader.db.tissues import all_tissues, tissue_by_id, create_new_tissue
+from uploader.db import (
+ species_by_id,
+ save_population,
+ populations_by_species,
+ population_by_species_and_id,)
+from uploader.db.datasets import (
+ geno_dataset_by_id,
+ geno_datasets_by_species_and_population,
+
+ probeset_study_by_id,
+ probeset_create_study,
+ probeset_dataset_by_id,
+ probeset_create_dataset,
+ probeset_datasets_by_study,
+ probeset_studies_by_species_and_population)
+
+rqtl2 = Blueprint("rqtl2", __name__)
+
+@rqtl2.route("/", methods=["GET", "POST"])
+@rqtl2.route("/select-species", methods=["GET", "POST"])
+def select_species():
+ """Select the species."""
+ if request.method == "GET":
+ return render_template("rqtl2/index.html", species=with_db_connection(all_species))
+
+ species_id = request.form.get("species_id")
+ species = with_db_connection(
+ lambda conn: species_by_id(conn, species_id))
+ if bool(species):
+ return redirect(url_for(
+ "upload.rqtl2.select_population", species_id=species_id))
+ flash("Invalid species or no species selected!", "alert-error error-rqtl2")
+ return redirect(url_for("upload.rqtl2.select_species"))
+
+
+@rqtl2.route("/upload/species//select-population",
+ methods=["GET", "POST"])
+def select_population(species_id: int):
+ """Select/Create the population to organise data under."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ species = species_by_id(conn, species_id)
+ if not bool(species):
+ flash("Invalid species selected!", "alert-error error-rqtl2")
+ return redirect(url_for("upload.rqtl2.select_species"))
+
+ if request.method == "GET":
+ return render_template(
+ "rqtl2/select-population.html",
+ species=species,
+ populations=populations_by_species(conn, species_id))
+
+ population = population_by_species_and_id(
+ conn, species["SpeciesId"], request.form.get("inbredset_id"))
+ if not bool(population):
+ flash("Invalid Population!", "alert-error error-rqtl2")
+ return redirect(
+ url_for("upload.rqtl2.select_population", pgsrc="error"),
+ code=307)
+
+ return redirect(url_for("upload.rqtl2.upload_rqtl2_bundle",
+ species_id=species["SpeciesId"],
+ population_id=population["InbredSetId"]))
+
+
+@rqtl2.route("/upload/species//create-population",
+ methods=["POST"])
+def create_population(species_id: int):
+ """Create a new population for the given species."""
+ population_page = redirect(url_for("upload.rqtl2.select_population",
+ species_id=species_id))
+ with database_connection(app.config["SQL_URI"]) as conn:
+ species = species_by_id(conn, species_id)
+ population_name = request.form.get("inbredset_name", "").strip()
+ population_fullname = request.form.get("inbredset_fullname", "").strip()
+ if not bool(species):
+ flash("Invalid species!", "alert-error error-rqtl2")
+ return redirect(url_for("upload.rqtl2.select_species"))
+ if not bool(population_name):
+ flash("Invalid Population Name!", "alert-error error-rqtl2")
+ return population_page
+ if not bool(population_fullname):
+ flash("Invalid Population Full Name!", "alert-error error-rqtl2")
+ return population_page
+ new_population = save_population(conn, {
+ "SpeciesId": species["SpeciesId"],
+ "Name": population_name,
+ "InbredSetName": population_fullname,
+ "FullName": population_fullname,
+ "Family": request.form.get("inbredset_family") or None,
+ "Description": request.form.get("description") or None
+ })
+
+ flash("Population created successfully.", "alert-success")
+ return redirect(
+ url_for("upload.rqtl2.upload_rqtl2_bundle",
+ species_id=species_id,
+ population_id=new_population["population_id"],
+ pgsrc="create-population"),
+ code=307)
+
+
+class __RequestError__(Exception): #pylint: disable=[invalid-name]
+ """Internal class to avoid pylint's `too-many-return-statements` error."""
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle"),
+ methods=["GET", "POST"])
+def upload_rqtl2_bundle(species_id: int, population_id: int):
+ """Allow upload of R/qtl2 bundle."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ species = species_by_id(conn, species_id)
+ population = population_by_species_and_id(
+ conn, species["SpeciesId"], population_id)
+ if not bool(species):
+ flash("Invalid species!", "alert-error error-rqtl2")
+ return redirect(url_for("upload.rqtl2.select_species"))
+ if not bool(population):
+ flash("Invalid Population!", "alert-error error-rqtl2")
+ return redirect(
+ url_for("upload.rqtl2.select_population", pgsrc="error"),
+ code=307)
+ if request.method == "GET" or (
+ request.method == "POST"
+ and bool(request.args.get("pgsrc"))):
+ return render_template("rqtl2/upload-rqtl2-bundle-step-01.html",
+ species=species,
+ population=population)
+
+ try:
+ app.logger.debug("Files in the form: %s", request.files)
+ the_file = save_file(request.files["rqtl2_bundle_file"],
+ Path(app.config["UPLOAD_FOLDER"]))
+ except AssertionError:
+ app.logger.debug(traceback.format_exc())
+ flash("Please provide a valid R/qtl2 zip bundle.",
+ "alert-error error-rqtl2")
+ return redirect(url_for("upload.rqtl2.upload_rqtl2_bundle",
+ species_id=species_id,
+ population_id=population_id))
+
+ if not is_zipfile(str(the_file)):
+ app.logger.debug("The file is not a zip file.")
+ raise __RequestError__("Invalid file! Expected a zip file.")
+
+ jobid = trigger_rqtl2_bundle_qc(
+ species_id,
+ population_id,
+ the_file,
+ request.files["rqtl2_bundle_file"].filename)#type: ignore[arg-type]
+ return redirect(url_for(
+ "upload.rqtl2.rqtl2_bundle_qc_status", jobid=jobid))
+
+
+def trigger_rqtl2_bundle_qc(
+ species_id: int,
+ population_id: int,
+ rqtl2bundle: Path,
+ originalfilename: str
+) -> UUID:
+ """Trigger QC on the R/qtl2 bundle."""
+ redisuri = app.config["REDIS_URL"]
+ with Redis.from_url(redisuri, decode_responses=True) as rconn:
+ jobid = uuid4()
+ redis_ttl_seconds = app.config["JOBS_TTL_SECONDS"]
+ jobs.launch_job(
+ jobs.initialise_job(
+ rconn,
+ jobs.jobsnamespace(),
+ str(jobid),
+ [sys.executable, "-m", "scripts.qc_on_rqtl2_bundle",
+ app.config["SQL_URI"], app.config["REDIS_URL"],
+ jobs.jobsnamespace(), str(jobid), str(species_id),
+ str(population_id), "--redisexpiry",
+ str(redis_ttl_seconds)],
+ "rqtl2-bundle-qc-job",
+ redis_ttl_seconds,
+ {"job-metadata": json.dumps({
+ "speciesid": species_id,
+ "populationid": population_id,
+ "rqtl2-bundle-file": str(rqtl2bundle.absolute()),
+ "original-filename": originalfilename})}),
+ redisuri,
+ f"{app.config['UPLOAD_FOLDER']}/job_errors")
+ return jobid
+
+
+def chunk_name(uploadfilename: str, chunkno: int) -> str:
+ """Generate chunk name from original filename and chunk number"""
+ if uploadfilename == "":
+ raise ValueError("Name cannot be empty!")
+ if chunkno < 1:
+ raise ValueError("Chunk number must be greater than zero")
+ return f"{secure_filename(uploadfilename)}_part_{chunkno:05d}"
+
+
+def chunks_directory(uniqueidentifier: str) -> Path:
+ """Compute the directory where chunks are temporarily stored."""
+ if uniqueidentifier == "":
+ raise ValueError("Unique identifier cannot be empty!")
+ return Path(app.config["UPLOAD_FOLDER"], f"tempdir_{uniqueidentifier}")
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle-chunked"),
+ methods=["GET"])
+def upload_rqtl2_bundle_chunked_get(# pylint: disable=["unused-argument"]
+ species_id: int,
+ population_id: int
+):
+ """
+ Extension to the `upload_rqtl2_bundle` endpoint above that provides a way
+ for testing whether all the chunks have been uploaded and to assist with
+ resuming a failed upload.
+ """
+ fileid = request.args.get("resumableIdentifier", type=str) or ""
+ filename = request.args.get("resumableFilename", type=str) or ""
+ chunk = request.args.get("resumableChunkNumber", type=int) or 0
+    if not (fileid and filename and chunk):
+ return jsonify({
+ "message": "At least one required query parameter is missing.",
+ "error": "BadRequest",
+ "statuscode": 400
+ }), 400
+
+ if Path(chunks_directory(fileid),
+ chunk_name(filename, chunk)).exists():
+ return "OK"
+
+ return jsonify({
+ "message": f"Chunk {chunk} was not found.",
+ "error": "NotFound",
+ "statuscode": 404
+ }), 404
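+
+# Illustrative round trip (the query-parameter names follow the resumable.js
+# conventions used above; the values are made up):
+#   GET .../rqtl2-bundle-chunked?resumableIdentifier=abc123
+#       &resumableFilename=bundle.zip&resumableChunkNumber=3
+#   returns "OK" when chunk 3 is already on disk, and a JSON "NotFound"
+#   response otherwise, so a client can skip chunks it has already sent.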
+
+
+def __merge_chunks__(targetfile: Path, chunkpaths: tuple[Path, ...]) -> Path:
+ """Merge the chunks into a single file."""
+ with open(targetfile, "ab") as _target:
+ for chunkfile in chunkpaths:
+ with open(chunkfile, "rb") as _chunkdata:
+ _target.write(_chunkdata.read())
+
+ chunkfile.unlink()
+ return targetfile
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle-chunked"),
+ methods=["POST"])
+def upload_rqtl2_bundle_chunked_post(species_id: int, population_id: int):
+ """
+ Extension to the `upload_rqtl2_bundle` endpoint above that allows large
+ files to be uploaded in chunks.
+
+ This should hopefully speed up uploads, and if done right, even enable
+ resumable uploads
+ """
+ _totalchunks = request.form.get("resumableTotalChunks", type=int) or 0
+ _chunk = request.form.get("resumableChunkNumber", default=1, type=int)
+ _uploadfilename = request.form.get(
+ "resumableFilename", default="", type=str) or ""
+ _fileid = request.form.get(
+ "resumableIdentifier", default="", type=str) or ""
+ _targetfile = Path(app.config["UPLOAD_FOLDER"], _fileid)
+
+ if _targetfile.exists():
+ return jsonify({
+ "message": (
+ "A file with a similar unique identifier has previously been "
+ "uploaded and possibly is/has being/been processed."),
+ "error": "BadRequest",
+ "statuscode": 400
+ }), 400
+
+ try:
+ # save chunk data
+ chunks_directory(_fileid).mkdir(exist_ok=True, parents=True)
+ request.files["file"].save(Path(chunks_directory(_fileid),
+ chunk_name(_uploadfilename, _chunk)))
+
+ # Check whether upload is complete
+ chunkpaths = tuple(
+ Path(chunks_directory(_fileid), chunk_name(_uploadfilename, _achunk))
+ for _achunk in range(1, _totalchunks+1))
+ if all(_file.exists() for _file in chunkpaths):
+ # merge_files and clean up chunks
+ __merge_chunks__(_targetfile, chunkpaths)
+ chunks_directory(_fileid).rmdir()
+ jobid = trigger_rqtl2_bundle_qc(
+ species_id, population_id, _targetfile, _uploadfilename)
+ return url_for(
+ "upload.rqtl2.rqtl2_bundle_qc_status", jobid=jobid)
+ except Exception as exc:# pylint: disable=[broad-except]
+ msg = "Error processing uploaded file chunks."
+ app.logger.error(msg, exc_info=True, stack_info=True)
+ return jsonify({
+ "message": msg,
+ "error": type(exc).__name__,
+ "error-description": " ".join(str(arg) for arg in exc.args),
+ "error-trace": traceback.format_exception(exc)
+ }), 500
+
+ return "OK"
+
+
+@rqtl2.route("/upload/species/rqtl2-bundle/qc-status/",
+ methods=["GET", "POST"])
+def rqtl2_bundle_qc_status(jobid: UUID):
+ """Check the status of the QC jobs."""
+ with (Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn,
+ database_connection(app.config["SQL_URI"]) as dbconn):
+ try:
+ thejob = jobs.job(rconn, jobs.jobsnamespace(), jobid)
+ messagelistname = thejob.get("log-messagelist")
+ logmessages = (rconn.lrange(messagelistname, 0, -1)
+ if bool(messagelistname) else [])
+ jobstatus = thejob["status"]
+ if jobstatus == "error":
+ return render_template("rqtl2/rqtl2-qc-job-error.html",
+ job=thejob,
+ errorsgeneric=json.loads(
+ thejob.get("errors-generic", "[]")),
+ errorsgeno=json.loads(
+ thejob.get("errors-geno", "[]")),
+ errorspheno=json.loads(
+ thejob.get("errors-pheno", "[]")),
+ errorsphenose=json.loads(
+ thejob.get("errors-phenose", "[]")),
+ errorsphenocovar=json.loads(
+ thejob.get("errors-phenocovar", "[]")),
+ messages=logmessages)
+ if jobstatus == "success":
+ jobmeta = json.loads(thejob["job-metadata"])
+ species = species_by_id(dbconn, jobmeta["speciesid"])
+ return render_template(
+ "rqtl2/rqtl2-qc-job-results.html",
+ species=species,
+ population=population_by_species_and_id(
+ dbconn, species["SpeciesId"], jobmeta["populationid"]),
+ rqtl2bundle=Path(jobmeta["rqtl2-bundle-file"]).name,
+ rqtl2bundleorig=jobmeta["original-filename"])
+
+ def compute_percentage(thejob, filetype) -> Union[str, None]:
+ if f"{filetype}-linecount" in thejob:
+ return "100"
+ if f"{filetype}-filesize" in thejob:
+ percent = ((int(thejob.get(f"{filetype}-checked", 0))
+ /
+ int(thejob.get(f"{filetype}-filesize", 1)))
+ * 100)
+ return f"{percent:.2f}"
+ return None
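+
+            # Illustrative values: with thejob == {"geno-checked": "512",
+            # "geno-filesize": "2048", ...} (Redis returns strings),
+            # compute_percentage(thejob, "geno") evaluates to "25.00".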
+
+ return render_template(
+ "rqtl2/rqtl2-qc-job-status.html",
+ job=thejob,
+ geno_percent=compute_percentage(thejob, "geno"),
+ pheno_percent=compute_percentage(thejob, "pheno"),
+ phenose_percent=compute_percentage(thejob, "phenose"),
+ messages=logmessages)
+ except jobs.JobNotFound:
+ return render_template("rqtl2/no-such-job.html", jobid=jobid)
+
+
+def redirect_on_error(flaskroute, **kwargs):
+ """Utility to redirect on error"""
+ return redirect(url_for(flaskroute, **kwargs, pgsrc="error"),
+ code=(307 if request.method == "POST" else 302))
+
+
+def check_species(conn: mdb.Connection, formargs: dict) -> Optional[
+ tuple[str, Response]]:
+ """
+ Check whether the 'species_id' value is provided, and whether a
+ corresponding species exists in the database.
+
+ Maybe give the function a better name..."""
+ speciespage = redirect_on_error("upload.rqtl2.select_species")
+ if "species_id" not in formargs:
+ return "You MUST provide the Species identifier.", speciespage
+
+ if not bool(species_by_id(conn, formargs["species_id"])):
+ return "No species with the provided identifier exists.", speciespage
+
+ return None
+
+
+def check_population(conn: mdb.Connection,
+ formargs: dict,
+ species_id) -> Optional[tuple[str, Response]]:
+ """
+ Check whether the 'population_id' value is provided, and whether a
+ corresponding population exists in the database.
+
+ Maybe give the function a better name..."""
+ poppage = redirect_on_error(
+ "upload.rqtl2.select_species", species_id=species_id)
+ if "population_id" not in formargs:
+ return "You MUST provide the Population identifier.", poppage
+
+ if not bool(population_by_species_and_id(
+ conn, species_id, formargs["population_id"])):
+ return "No population with the provided identifier exists.", poppage
+
+ return None
+
+
+def check_r_qtl2_bundle(formargs: dict,
+ species_id,
+ population_id) -> Optional[tuple[str, Response]]:
+ """Check for the existence of the R/qtl2 bundle."""
+ fileuploadpage = redirect_on_error("upload.rqtl2.upload_rqtl2_bundle",
+ species_id=species_id,
+ population_id=population_id)
+ if not "rqtl2_bundle_file" in formargs:
+ return (
+ "You MUST provide a R/qtl2 zip bundle for upload.", fileuploadpage)
+
+ if not Path(fullpath(formargs["rqtl2_bundle_file"])).exists():
+ return "No R/qtl2 bundle with the given name exists.", fileuploadpage
+
+ return None
+
+
+def check_geno_dataset(conn: mdb.Connection,
+ formargs: dict,
+ species_id,
+ population_id) -> Optional[tuple[str, Response]]:
+ """Check for the Genotype dataset."""
+ genodsetpg = redirect_on_error("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id)
+ if not bool(formargs.get("geno-dataset-id")):
+ return (
+ "You MUST provide a valid Genotype dataset identifier", genodsetpg)
+
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM GenoFreeze WHERE Id=%s",
+ (formargs["geno-dataset-id"],))
+ results = cursor.fetchall()
+ if not bool(results):
+ return ("No genotype dataset with the provided identifier exists.",
+ genodsetpg)
+ if len(results) > 1:
+ return (
+ "Data corruption: More than one genotype dataset with the same "
+ "identifier.",
+ genodsetpg)
+
+ return None
+
+def check_tissue(
+        conn: mdb.Connection, formargs: dict) -> Optional[tuple[str, Response]]:
+ """Check for tissue/organ/biological material."""
+ selectdsetpg = redirect_on_error("upload.rqtl2.select_dataset_info",
+ species_id=formargs["species_id"],
+ population_id=formargs["population_id"])
+ if not bool(formargs.get("tissueid", "").strip()):
+ return ("No tissue/organ/biological material provided.", selectdsetpg)
+
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ cursor.execute("SELECT * FROM Tissue WHERE Id=%s",
+ (formargs["tissueid"],))
+ results = cursor.fetchall()
+ if not bool(results):
+ return ("No tissue/organ with the provided identifier exists.",
+ selectdsetpg)
+
+ if len(results) > 1:
+ return (
+ "Data corruption: More than one tissue/organ with the same "
+ "identifier.",
+ selectdsetpg)
+
+ return None
+
+
+def check_probe_study(conn: mdb.Connection,
+ formargs: dict,
+ species_id,
+ population_id) -> Optional[tuple[str, Response]]:
+ """Check for the ProbeSet study."""
+ dsetinfopg = redirect_on_error("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id)
+ if not bool(formargs.get("probe-study-id")):
+ return "No probeset study was selected!", dsetinfopg
+
+ if not bool(probeset_study_by_id(conn, formargs["probe-study-id"])):
+ return ("No probeset study with the provided identifier exists",
+ dsetinfopg)
+
+ return None
+
+
+def check_probe_dataset(conn: mdb.Connection,
+ formargs: dict,
+ species_id,
+ population_id) -> Optional[tuple[str, Response]]:
+ """Check for the ProbeSet dataset."""
+ dsetinfopg = redirect_on_error("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id)
+ if not bool(formargs.get("probe-dataset-id")):
+ return "No probeset dataset was selected!", dsetinfopg
+
+ if not bool(probeset_dataset_by_id(conn, formargs["probe-dataset-id"])):
+ return ("No probeset dataset with the provided identifier exists",
+ dsetinfopg)
+
+ return None
+
+
+def with_errors(endpointthunk: Callable, *checkfns):
+ """Run 'endpointthunk' with error checking."""
+ formargs = {**dict(request.args), **dict(request.form)}
+ errors = tuple(item for item in (_fn(formargs=formargs) for _fn in checkfns)
+ if item is not None)
+ if len(errors) > 0:
+ flash(errors[0][0], "alert-error error-rqtl2")
+ return errors[0][1]
+
+ return endpointthunk()
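+
+# For example, an endpoint body can be wrapped as:
+#   with_errors(__thunk__,
+#               partial(check_species, conn=conn),
+#               partial(check_population, conn=conn, species_id=species_id))
+# Each check receives `formargs`; the first one to return an
+# (error-message, response) pair has its message flashed and its response
+# returned, otherwise the thunk runs.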
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/select-geno-dataset"),
+ methods=["POST"])
+def select_geno_dataset(species_id: int, population_id: int):
+ """Select from existing geno datasets."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():
+ geno_dset = geno_datasets_by_species_and_population(
+ conn, species_id, population_id)
+ if not bool(geno_dset):
+ flash("No genotype dataset was provided!",
+ "alert-error error-rqtl2")
+ return redirect(url_for("upload.rqtl2.select_geno_dataset",
+ species_id=species_id,
+ population_id=population_id,
+ pgsrc="error"),
+ code=307)
+
+ flash("Genotype accepted", "alert-success error-rqtl2")
+ return redirect(url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id,
+ pgsrc="upload.rqtl2.select_geno_dataset"),
+ code=307)
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population, conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_geno_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id))
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/create-geno-dataset"),
+ methods=["POST"])
+def create_geno_dataset(species_id: int, population_id: int):
+ """Create a new geno dataset."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():
+ sgeno_page = redirect(url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id,
+ pgsrc="error"),
+ code=307)
+ errorclasses = "alert-error error-rqtl2 error-rqtl2-create-geno-dataset"
+ if not bool(request.form.get("dataset-name")):
+ flash("You must provide the dataset name", errorclasses)
+ return sgeno_page
+ if not bool(request.form.get("dataset-fullname")):
+ flash("You must provide the dataset full name", errorclasses)
+ return sgeno_page
+ public = 2 if request.form.get("dataset-public") == "on" else 0
+
+ with conn.cursor(cursorclass=DictCursor) as cursor:
+ datasetname = request.form["dataset-name"]
+ new_dataset = {
+ "name": datasetname,
+ "fname": request.form.get("dataset-fullname"),
+ "sname": request.form.get("dataset-shortname") or datasetname,
+ "today": date.today().isoformat(),
+ "pub": public,
+ "isetid": population_id
+ }
+ cursor.execute("SELECT * FROM GenoFreeze WHERE Name=%s",
+ (datasetname,))
+ results = cursor.fetchall()
+ if bool(results):
+ flash(
+ f"A genotype dataset with name '{escape(datasetname)}' "
+ "already exists.",
+ errorclasses)
+ return redirect(url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id,
+ pgsrc="error"),
+ code=307)
+ cursor.execute(
+ "INSERT INTO GenoFreeze("
+ "Name, FullName, ShortName, CreateTime, public, InbredSetId"
+ ") "
+ "VALUES("
+ "%(name)s, %(fname)s, %(sname)s, %(today)s, %(pub)s, %(isetid)s"
+ ")",
+ new_dataset)
+ flash("Created dataset successfully.", "alert-success")
+ return render_template(
+ "rqtl2/create-geno-dataset-success.html",
+ species=species_by_id(conn, species_id),
+ population=population_by_species_and_id(
+ conn, species_id, population_id),
+ rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
+ geno_dataset={**new_dataset, "id": cursor.lastrowid})
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population, conn=conn, species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id))
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/select-tissue"),
+ methods=["POST"])
+def select_tissue(species_id: int, population_id: int):
+ """Select from existing tissues."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():
+ if not bool(request.form.get("tissueid", "").strip()):
+ flash("Invalid tissue selection!",
+ "alert-error error-select-tissue error-rqtl2")
+
+ return redirect(url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id,
+ pgsrc="upload.rqtl2.select_geno_dataset"),
+ code=307)
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population,
+ conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_geno_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id))
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/create-tissue"),
+ methods=["POST"])
+def create_tissue(species_id: int, population_id: int):
+ """Add new tissue, organ or biological material to the system."""
+ form = request.form
+ datasetinfopage = redirect(
+ url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id,
+ pgsrc="upload.rqtl2.select_geno_dataset"),
+ code=307)
+ with database_connection(app.config["SQL_URI"]) as conn:
+ tissuename = form.get("tissuename", "").strip()
+ tissueshortname = form.get("tissueshortname", "").strip()
+ if not bool(tissuename):
+ flash("Organ/Tissue name MUST be provided.",
+ "alert-error error-create-tissue error-rqtl2")
+ return datasetinfopage
+
+ if not bool(tissueshortname):
+ flash("Organ/Tissue short name MUST be provided.",
+ "alert-error error-create-tissue error-rqtl2")
+ return datasetinfopage
+
+ try:
+ tissue = create_new_tissue(conn, tissuename, tissueshortname)
+ flash("Tissue created successfully!", "alert-success")
+ return render_template(
+ "rqtl2/create-tissue-success.html",
+ species=species_by_id(conn, species_id),
+ population=population_by_species_and_id(
+ conn, species_id, population_id),
+ rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
+ geno_dataset=geno_dataset_by_id(
+ conn,
+ int(request.form["geno-dataset-id"])),
+ tissue=tissue)
+ except mdb.IntegrityError as _ierr:
+ flash("Tissue/Organ with that short name already exists!",
+ "alert-error error-create-tissue error-rqtl2")
+ return datasetinfopage
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/select-probeset-study"),
+ methods=["POST"])
+def select_probeset_study(species_id: int, population_id: int):
+ """Select or create a probeset study."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():
+ summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id),
+ code=307)
+ if not bool(probeset_study_by_id(conn, int(request.form["probe-study-id"]))):
+ flash("Invalid study selected!", "alert-error error-rqtl2")
+ return summary_page
+
+ return summary_page
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population,
+ conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_geno_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_tissue, conn=conn),
+ partial(check_probe_study,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id))
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/select-probeset-dataset"),
+ methods=["POST"])
+def select_probeset_dataset(species_id: int, population_id: int):
+ """Select or create a probeset dataset."""
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():
+ summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id),
+ code=307)
+ if not bool(probeset_study_by_id(conn, int(request.form["probe-study-id"]))):
+ flash("Invalid study selected!", "alert-error error-rqtl2")
+ return summary_page
+
+ return summary_page
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population,
+ conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_geno_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_tissue, conn=conn),
+ partial(check_probe_study,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_probe_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id))
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/create-probeset-study"),
+ methods=["POST"])
+def create_probeset_study(species_id: int, population_id: int):
+ """Create a new probeset study."""
+ errorclasses = "alert-error error-rqtl2 error-rqtl2-create-probeset-study"
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():
+ form = request.form
+ dataset_info_page = redirect(
+ url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id),
+ code=307)
+
+ if not (bool(form.get("platformid")) and
+ bool(platform_by_id(conn, int(form["platformid"])))):
+ flash("Invalid platform selected.", errorclasses)
+ return dataset_info_page
+
+ if not (bool(form.get("tissueid")) and
+ bool(tissue_by_id(conn, int(form["tissueid"])))):
+ flash("Invalid tissue selected.", errorclasses)
+ return dataset_info_page
+
+ studyname = form["studyname"]
+ try:
+ study = probeset_create_study(
+ conn, population_id, int(form["platformid"]), int(form["tissueid"]),
+ studyname, form.get("studyfullname") or "",
+ form.get("studyshortname") or "")
+ except mdb.IntegrityError as _ierr:
+ flash(f"ProbeSet study with name '{escape(studyname)}' already "
+ "exists.",
+ errorclasses)
+ return dataset_info_page
+ return render_template(
+ "rqtl2/create-probe-study-success.html",
+ species=species_by_id(conn, species_id),
+ population=population_by_species_and_id(
+ conn, species_id, population_id),
+ rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
+ geno_dataset=geno_dataset_by_id(
+ conn,
+ int(request.form["geno-dataset-id"])),
+ study=study)
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population,
+ conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_geno_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_tissue, conn=conn))
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/create-probeset-dataset"),
+ methods=["POST"])
+def create_probeset_dataset(species_id: int, population_id: int):#pylint: disable=[too-many-return-statements]
+ """Create a new probeset dataset."""
+ errorclasses = "alert-error error-rqtl2 error-rqtl2-create-probeset-dataset"
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():#pylint: disable=[too-many-return-statements]
+ form = request.form
+ summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
+ species_id=species_id,
+ population_id=population_id),
+ code=307)
+ if not bool(form.get("averageid")):
+ flash("Averaging method not selected!", errorclasses)
+ return summary_page
+ if not bool(form.get("datasetname")):
+ flash("Dataset name not provided!", errorclasses)
+ return summary_page
+ if not bool(form.get("datasetfullname")):
+ flash("Dataset full name not provided!", errorclasses)
+ return summary_page
+
+ tissue = tissue_by_id(conn, form.get("tissueid", "").strip())
+
+ study = probeset_study_by_id(conn, int(form["probe-study-id"]))
+ if not bool(study):
+ flash("Invalid ProbeSet study provided!", errorclasses)
+ return summary_page
+
+ avgmethod = averaging_method_by_id(conn, int(form["averageid"]))
+ if not bool(avgmethod):
+ flash("Invalid averaging method provided!", errorclasses)
+ return summary_page
+
+ try:
+ dset = probeset_create_dataset(conn,
+ int(form["probe-study-id"]),
+ int(form["averageid"]),
+ form["datasetname"],
+ form["datasetfullname"],
+ form["datasetshortname"],
+ form["datasetpublic"] == "on",
+ form.get(
+ "datasetdatascale", "log2"))
+ except mdb.IntegrityError as _ierr:
+ app.logger.debug("Possible integrity error: %s", traceback.format_exc())
+ flash(("IntegrityError: The data you provided has some errors: "
+ f"{_ierr.args}"),
+ errorclasses)
+ return summary_page
+ except Exception as _exc:# pylint: disable=[broad-except]
+ app.logger.debug("Error creating ProbeSet dataset: %s",
+ traceback.format_exc())
+ flash(("There was a problem creating your dataset. Please try "
+ "again."),
+ errorclasses)
+ return summary_page
+ return render_template(
+ "rqtl2/create-probe-dataset-success.html",
+ species=species_by_id(conn, species_id),
+ population=population_by_species_and_id(
+ conn, species_id, population_id),
+ rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
+ geno_dataset=geno_dataset_by_id(
+ conn,
+ int(request.form["geno-dataset-id"])),
+ tissue=tissue,
+ study=study,
+ avgmethod=avgmethod,
+ dataset=dset)
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population,
+ conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_geno_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_tissue, conn=conn),
+ partial(check_probe_study,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id))
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/dataset-info"),
+ methods=["POST"])
+def select_dataset_info(species_id: int, population_id: int):
+ """
+ If `geno` files exist in the R/qtl2 bundle, prompt user to provide the
+ dataset the genotypes belong to.
+ """
+ form = request.form
+ with database_connection(app.config["SQL_URI"]) as conn:
+ def __thunk__():
+ species = species_by_id(conn, species_id)
+ population = population_by_species_and_id(
+ conn, species_id, population_id)
+ thefile = fullpath(form["rqtl2_bundle_file"])
+ with ZipFile(str(thefile), "r") as zfile:
+ cdata = r_qtl2.control_data(zfile)
+
+ geno_dataset = geno_dataset_by_id(
+                conn, form.get("geno-dataset-id", "").strip())
+ if "geno" in cdata and not bool(form.get("geno-dataset-id")):
+ return render_template(
+ "rqtl2/select-geno-dataset.html",
+ species=species,
+ population=population,
+ rqtl2_bundle_file=thefile.name,
+ datasets=geno_datasets_by_species_and_population(
+ conn, species_id, population_id))
+
+ tissue = tissue_by_id(conn, form.get("tissueid", "").strip())
+ if "pheno" in cdata and not bool(tissue):
+ return render_template(
+ "rqtl2/select-tissue.html",
+ species=species,
+ population=population,
+ rqtl2_bundle_file=thefile.name,
+ geno_dataset=geno_dataset,
+ studies=probeset_studies_by_species_and_population(
+ conn, species_id, population_id),
+ platforms=platforms_by_species(conn, species_id),
+ tissues=all_tissues(conn))
+
+ probeset_study = probeset_study_by_id(
+ conn, form.get("probe-study-id", "").strip())
+ if "pheno" in cdata and not bool(probeset_study):
+ return render_template(
+ "rqtl2/select-probeset-study-id.html",
+ species=species,
+ population=population,
+ rqtl2_bundle_file=thefile.name,
+ geno_dataset=geno_dataset,
+ studies=probeset_studies_by_species_and_population(
+ conn, species_id, population_id),
+ platforms=platforms_by_species(conn, species_id),
+ tissue=tissue)
+
+ probeset_dataset = probeset_dataset_by_id(
+ conn, form.get("probe-dataset-id", "").strip())
+ if "pheno" in cdata and not bool(probeset_dataset):
+ return render_template(
+ "rqtl2/select-probeset-dataset.html",
+ species=species,
+ population=population,
+ rqtl2_bundle_file=thefile.name,
+ geno_dataset=geno_dataset,
+ probe_study=probeset_study,
+ tissue=tissue,
+ datasets=probeset_datasets_by_study(
+ conn, int(form["probe-study-id"])),
+ avgmethods=averaging_methods(conn))
+
+ return render_template("rqtl2/summary-info.html",
+ species=species,
+ population=population,
+ rqtl2_bundle_file=thefile.name,
+ geno_dataset=geno_dataset,
+ tissue=tissue,
+ probe_study=probeset_study,
+ probe_dataset=probeset_dataset)
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population,
+ conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id))
+
+
+@rqtl2.route(("/upload/species//population/"
+ "/rqtl2-bundle/confirm-bundle-details"),
+ methods=["POST"])
+def confirm_bundle_details(species_id: int, population_id: int):
+ """Confirm the details and trigger R/qtl2 bundle processing..."""
+ redisuri = app.config["REDIS_URL"]
+ with (database_connection(app.config["SQL_URI"]) as conn,
+ Redis.from_url(redisuri, decode_responses=True) as rconn):
+ def __thunk__():
+ redis_ttl_seconds = app.config["JOBS_TTL_SECONDS"]
+ jobid = str(uuid4())
+ _job = jobs.launch_job(
+ jobs.initialise_job(
+ rconn,
+ jobs.jobsnamespace(),
+ jobid,
+ [
+ sys.executable, "-m", "scripts.process_rqtl2_bundle",
+ app.config["SQL_URI"], app.config["REDIS_URL"],
+ jobs.jobsnamespace(), jobid, "--redisexpiry",
+ str(redis_ttl_seconds)],
+ "R/qtl2 Bundle Upload",
+ redis_ttl_seconds,
+ {
+ "bundle-metadata": json.dumps({
+ "speciesid": species_id,
+ "populationid": population_id,
+ "rqtl2-bundle-file": str(fullpath(
+ request.form["rqtl2_bundle_file"])),
+ "geno-dataset-id": request.form.get(
+ "geno-dataset-id", ""),
+ "probe-study-id": request.form.get(
+ "probe-study-id", ""),
+ "probe-dataset-id": request.form.get(
+ "probe-dataset-id", ""),
+ **({
+ "platformid": probeset_study_by_id(
+ conn,
+ int(request.form["probe-study-id"]))["ChipId"]
+ } if bool(request.form.get("probe-study-id")) else {})
+ })
+ }),
+ redisuri,
+ f"{app.config['UPLOAD_FOLDER']}/job_errors")
+
+ return redirect(url_for("upload.rqtl2.rqtl2_processing_status",
+ jobid=jobid))
+
+ return with_errors(__thunk__,
+ partial(check_species, conn=conn),
+ partial(check_population,
+ conn=conn,
+ species_id=species_id),
+ partial(check_r_qtl2_bundle,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_geno_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_probe_study,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id),
+ partial(check_probe_dataset,
+ conn=conn,
+ species_id=species_id,
+ population_id=population_id))
+
+
+@rqtl2.route("/status/")
+def rqtl2_processing_status(jobid: UUID):
+ """Retrieve the status of the job processing the uploaded R/qtl2 bundle."""
+ with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
+ try:
+ thejob = jobs.job(rconn, jobs.jobsnamespace(), jobid)
+
+ messagelistname = thejob.get("log-messagelist")
+ logmessages = (rconn.lrange(messagelistname, 0, -1)
+ if bool(messagelistname) else [])
+
+ if thejob["status"] == "error":
+ return render_template(
+ "rqtl2/rqtl2-job-error.html", job=thejob, messages=logmessages)
+ if thejob["status"] == "success":
+ return render_template("rqtl2/rqtl2-job-results.html",
+ job=thejob,
+ messages=logmessages)
+
+ return render_template(
+ "rqtl2/rqtl2-job-status.html", job=thejob, messages=logmessages)
+ except jobs.JobNotFound as _exc:
+ return render_template("rqtl2/no-such-job.html", jobid=jobid)
--
cgit v1.2.3