The following are some of the requirements that the data in your file
- MUST fulfil before it is considered valid for this system:
-
File headings
-
-
The first row in the file should contain the headings. The number of
- headings in this first row determines the number of columns expected for
- all other lines in the file.
-
Each heading value MUST appear in the first row
- ONE AND ONLY ONE time.
-
The sample/case (previously 'strain') headers in your first row will be
- checked against those in the
- GeneNetwork database.
-
- If you encounter an error saying your sample(s)/case(s) do not exist
- in the GeneNetwork database, then you will have to use the
- Upload Samples/Cases
- option on this system to upload them.
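The heading rules above can be sketched as a small check. This is only an illustration under the stated rules, not the system's own validator; the function name `check_headings` and the message wording are assumptions.

```python
from collections import Counter


def check_headings(lines: list[str]) -> list[str]:
    """Return human-readable problems found in the heading row and field counts.

    Hypothetical helper: it only illustrates the rules in the text.
    """
    problems = []
    headings = lines[0].rstrip("\n").split("\t")

    # Each heading must appear exactly once in the first row.
    for heading, count in Counter(headings).items():
        if count > 1:
            problems.append(f"Heading '{heading}' appears {count} times")

    # Every other line must have as many fields as there are headings.
    for lineno, line in enumerate(lines[1:], start=2):
        nfields = len(line.rstrip("\n").split("\t"))
        if nfields != len(headings):
            problems.append(
                f"Line {lineno}: expected {len(headings)} fields, got {nfields}")

    return problems
```

An empty list means the file passed both checks. Whether the sample/case headings exist in the GeneNetwork database is a separate lookup that this sketch does not attempt.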
-
Data
-
-
NONE of the data cells/fields is allowed to be empty.
- All fields/cells MUST contain a value.
-
The first column of the data rows will be considered a textual field,
- holding the "identifier" for that row
-
Except for the first column/field for each data row,
- NONE of the data columns/cells/fields should contain
- spurious characters like `eeeee`, `5.555iloveguix`, etc.
- All of them should be decimal values.
-
Decimal numbers must conform to the following criteria:
-
-
When checking an average file, decimal numbers must have exactly three
- decimal places to the right of the decimal point.
-
When checking a standard error file, decimal numbers must have six or
- more decimal places to the right of the decimal point.
-
There must be a number to the left of the decimal point
- (e.g. 0.55555 is allowed but .55555 is not).
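One way to express the decimal criteria above is as regular expressions. The sketch below is an assumption about how such a check could look, not the checker this system actually runs; the names `is_valid_cell`, `AVERAGE_PATTERN` and `STANDARD_ERROR_PATTERN`, and the `filetype` strings, are hypothetical. The rules do not mention signs, so the sketch does not accept a leading `-` or `+`.

```python
import re

# At least one digit before the point, then exactly three decimals.
AVERAGE_PATTERN = re.compile(r"[0-9]+\.[0-9]{3}")
# At least one digit before the point, then six or more decimals.
STANDARD_ERROR_PATTERN = re.compile(r"[0-9]+\.[0-9]{6,}")


def is_valid_cell(value: str, filetype: str) -> bool:
    """Check one data cell against the rule for the given file type.

    `filetype` is assumed to be either "average" or "standard-error".
    """
    if filetype == "average":
        pattern = AVERAGE_PATTERN
    elif filetype == "standard-error":
        pattern = STANDARD_ERROR_PATTERN
    else:
        raise ValueError(f"Unknown file type: {filetype}")
    # fullmatch rejects trailing junk such as '5.555iloveguix'.
    return pattern.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` matters here: it rejects values that merely start with a valid number.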
-
Supported File Types
- We support the following file types:
-
-
-
Tab-Separated value files (.tsv)
-
-
The TAB character is used to separate the fields of each
- line.
.txt files: Content has the same format as the .tsv files above.
-
.zip files: each zip file should contain
- ONE AND ONLY ONE file of the .tsv or .txt type above.
- Any zip file with more than one file is invalid, and so is an empty
- zip file.
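The zip rule above (exactly one `.tsv` or `.txt` member, never empty) could be checked roughly as follows. This is a sketch under those stated rules only; `check_zip_upload` and its messages are hypothetical names, not part of this system.

```python
from typing import Optional
from zipfile import ZipFile, is_zipfile


def check_zip_upload(zipsource) -> Optional[str]:
    """Return an error message, or None when the zip file looks acceptable.

    `zipsource` may be a path or a seekable binary file object.
    """
    if not is_zipfile(zipsource):
        return "Not a zip file"

    with ZipFile(zipsource) as zfile:
        # Ignore directory entries; only real files count.
        names = [info.filename for info in zfile.infolist()
                 if not info.is_dir()]

    if len(names) != 1:
        return f"Expected exactly one file in the zip, found {len(names)}"
    if not names[0].lower().endswith((".tsv", ".txt")):
        return f"Unsupported file type: {names[0]}"
    return None
```

The single member still has to pass the heading and data checks described earlier; this sketch only covers the container.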
Each of the sections below gives you a different option for data upload.
- Please read the documentation for each section carefully to understand what
- each section is about.
-
R/qtl2 Bundles
-
-
-
This feature combines and extends the two upload methods below. Instead of
- uploading one item at a time, the R/qtl2 bundle you upload can contain both
- the genotypes data (samples/individuals/cases and their data) and the
- expression data.
-
The R/qtl2 bundle, additionally, can contain extra metadata, that neither
- of the methods below can handle.
This feature enables you to upload expression data. It expects the data to
- be in tab-separated values (TSV) files. The data should be
- a simple matrix of phenotype × sample, i.e. the first column is a
- list of the phenotypes and the first row is a list of
- samples/cases.
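To make the phenotype × sample layout concrete, the sketch below writes a tiny matrix as TSV text. The heading `ProbeSetID`, the sample names and the phenotype identifier are made-up placeholders, and the three-decimal formatting simply follows the rule for average files.

```python
import csv
import io


def write_expression_matrix(samples: list[str],
                            rows: dict[str, list[float]]) -> str:
    """Render a phenotype × sample matrix as tab-separated text.

    Hypothetical helper: the first column holds the phenotype identifier and
    the first row holds the sample/case names.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["ProbeSetID", *samples])
    for phenotype, values in rows.items():
        if len(values) != len(samples):
            raise ValueError(f"Row '{phenotype}' has the wrong number of values")
        # Average files need exactly three decimal places.
        writer.writerow([phenotype, *(f"{value:.3f}" for value in values)])
    return buffer.getvalue()
```

For example, `write_expression_matrix(["BXD1", "BXD2"], {"1415670_at": [8.1234, 7.9]})` produces a heading row followed by the row `1415670_at`, `8.123`, `7.900`, each separated by a TAB.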
-
-
If you haven't done so, please go to this page to learn the requirements for
- file formats and helpful suggestions to enter your data in a fast and easy
- way.
-
-
-
PLEASE REVIEW YOUR DATA. Make sure your data complies
- with our system requirements. (
- Help
- )
-
UPLOAD YOUR DATA FOR DATA VERIFICATION. We accept
- .csv, .txt and .zip
- files (Help)
For the expression data above, you need the samples/cases in your file to
- already exist in the GeneNetwork database. If any samples do
- not already exist, the upload of the expression data will fail.
-
This section gives you the opportunity to upload any missing samples.
{{job_name}}: parse results
-
-{%if user_aborted%}
-Job aborted by the user
-{%endif%}
-
-{{errors_display(errors, "No errors found in the file", "We found the following errors", True)}}
-
-{%if errors | length == 0 and not user_aborted %}
-
The processing of the R/qtl2 bundle you uploaded has failed. We have
- provided some information below to help you figure out what the problem
- could be.
-
If you find that you cannot figure out what the problem is on your own,
- please contact the team running the system for assistance, providing the
- following details:
-
-
R/qtl2 bundle you uploaded
-
This URL: {{request_url()}}
-
(maybe) a screenshot of this page
-
-
-
-
-
stdout
-{{cli_output(job, "stdout")}}
-
-
stderr
-{{cli_output(job, "stderr")}}
-
-
Log
-
- {%for msg in messages%}
- {{msg}}
- {%endfor%}
-
Your R/qtl2 files bundle contains a "geno" specification. You will
- therefore need to select from one of the existing Genotype datasets or
- create a new one.
-
This is the dataset under which your data will be organised.
The data is organised in a hierarchical form, beginning with
- species at the very top. Under species the data is
- organised by population, sometimes referred to as grouping.
- (In some really old documents/systems, you might see this referred to as
- InbredSet.)
-
In this section, you get to define what population your data is to be
- organised by.
This is the information you have provided to accompany the R/qtl2 bundle
- you have uploaded. Please verify the information is correct before
- proceeding.
- Provide a valid R/qtl2 zip file here. In particular, ensure your zip bundle
- contains exactly one control file and the corresponding files mentioned in
- the control file.
-
-
- The control file can be either a YAML or JSON file. ALL other data
- files in the zip bundle should be CSV files.
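The bundle layout described above (one YAML or JSON control file, every other file a CSV) can be approximated with a member-name check. This is a rough sketch, not the QC the system performs: it does not parse the control file or confirm that the files it mentions are present. The name `check_bundle_members` is hypothetical, and treating `.yml` as a YAML extension is an assumption.

```python
from zipfile import ZipFile

# Assumed control-file extensions; '.yml' is included as a guess.
CONTROL_SUFFIXES = (".yaml", ".yml", ".json")


def check_bundle_members(bundle) -> list[str]:
    """List problems with the member names of an R/qtl2 zip bundle.

    `bundle` may be a path or a seekable binary file object.
    """
    problems = []
    with ZipFile(bundle) as zfile:
        names = [info.filename for info in zfile.infolist()
                 if not info.is_dir()]

    control = [name for name in names
               if name.lower().endswith(CONTROL_SUFFIXES)]
    if len(control) != 1:
        problems.append(
            f"Expected exactly one control file, found {len(control)}")

    # Everything that is not a control file must be a CSV data file.
    for name in names:
        if name not in control and not name.lower().endswith(".csv"):
            problems.append(f"Not a CSV data file: {name}")

    return problems
```

A fuller check would also read the control file and verify each referenced data file exists in the archive.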
-
You have successfully uploaded the zipped bundle of R/qtl2 files.
-
The next step is to provide the extra information we need to figure
- out what to do with the data. You will select/create the relevant studies
- and/or datasets to organise the data in the steps that follow.
We organise the samples/cases/strains in a hierarchical form, starting
- with species at the very top. Under species, we have a
- grouping in terms of the relevant population
- (e.g. Inbred populations, cell tissue, etc.)
- There was a critical failure launching the job to parse your file.
- This is our fault and (probably) has nothing to do with the file you uploaded.
-
-
-
- Please notify the developers of this issue when you encounter it,
- providing the link to this page, or the information below.
-
-
-
Debugging Information
-
-
-
job id: {{job_id}}
-
-
-{%endblock%}
diff --git a/qc_app/upload/__init__.py b/qc_app/upload/__init__.py
deleted file mode 100644
index 5f120d4..0000000
--- a/qc_app/upload/__init__.py
+++ /dev/null
@@ -1,7 +0,0 @@
-"""Package handling upload of files."""
-from flask import Blueprint
-
-from .rqtl2 import rqtl2
-
-upload = Blueprint("upload", __name__)
-upload.register_blueprint(rqtl2, url_prefix="/rqtl2")
diff --git a/qc_app/upload/rqtl2.py b/qc_app/upload/rqtl2.py
deleted file mode 100644
index 51d8321..0000000
--- a/qc_app/upload/rqtl2.py
+++ /dev/null
@@ -1,1157 +0,0 @@
-"""Module to handle uploading of R/qtl2 bundles."""#pylint: disable=[too-many-lines]
-import sys
-import json
-import traceback
-from pathlib import Path
-from datetime import date
-from uuid import UUID, uuid4
-from functools import partial
-from zipfile import ZipFile, is_zipfile
-from typing import Union, Callable, Optional
-
-import MySQLdb as mdb
-from redis import Redis
-from MySQLdb.cursors import DictCursor
-from werkzeug.utils import secure_filename
-from flask import (
- flash,
- escape,
- request,
- jsonify,
- url_for,
- redirect,
- Response,
- Blueprint,
- render_template,
- current_app as app)
-
-from r_qtl import r_qtl2
-
-from qc_app import jobs
-from qc_app.files import save_file, fullpath
-from qc_app.dbinsert import species as all_species
-from qc_app.db_utils import with_db_connection, database_connection
-
-from qc_app.db.platforms import platform_by_id, platforms_by_species
-from qc_app.db.averaging import averaging_methods, averaging_method_by_id
-from qc_app.db.tissues import all_tissues, tissue_by_id, create_new_tissue
-from qc_app.db import (
- species_by_id,
- save_population,
- populations_by_species,
- population_by_species_and_id,)
-from qc_app.db.datasets import (
- geno_dataset_by_id,
- geno_datasets_by_species_and_population,
-
- probeset_study_by_id,
- probeset_create_study,
- probeset_dataset_by_id,
- probeset_create_dataset,
- probeset_datasets_by_study,
- probeset_studies_by_species_and_population)
-
-rqtl2 = Blueprint("rqtl2", __name__)
-
-@rqtl2.route("/", methods=["GET", "POST"])
-@rqtl2.route("/select-species", methods=["GET", "POST"])
-def select_species():
- """Select the species."""
- if request.method == "GET":
- return render_template("rqtl2/index.html", species=with_db_connection(all_species))
-
- species_id = request.form.get("species_id")
- species = with_db_connection(
- lambda conn: species_by_id(conn, species_id))
- if bool(species):
- return redirect(url_for(
- "upload.rqtl2.select_population", species_id=species_id))
- flash("Invalid species or no species selected!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
-
-
-@rqtl2.route("/upload/species//select-population",
- methods=["GET", "POST"])
-def select_population(species_id: int):
- """Select/Create the population to organise data under."""
- with database_connection(app.config["SQL_URI"]) as conn:
- species = species_by_id(conn, species_id)
- if not bool(species):
- flash("Invalid species selected!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
-
- if request.method == "GET":
- return render_template(
- "rqtl2/select-population.html",
- species=species,
- populations=populations_by_species(conn, species_id))
-
- population = population_by_species_and_id(
- conn, species["SpeciesId"], request.form.get("inbredset_id"))
- if not bool(population):
- flash("Invalid Population!", "alert-error error-rqtl2")
- return redirect(
- url_for("upload.rqtl2.select_population", pgsrc="error"),
- code=307)
-
- return redirect(url_for("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species["SpeciesId"],
- population_id=population["InbredSetId"]))
-
-
-@rqtl2.route("/upload/species//create-population",
- methods=["POST"])
-def create_population(species_id: int):
- """Create a new population for the given species."""
- population_page = redirect(url_for("upload.rqtl2.select_population",
- species_id=species_id))
- with database_connection(app.config["SQL_URI"]) as conn:
- species = species_by_id(conn, species_id)
- population_name = request.form.get("inbredset_name", "").strip()
- population_fullname = request.form.get("inbredset_fullname", "").strip()
- if not bool(species):
- flash("Invalid species!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
- if not bool(population_name):
- flash("Invalid Population Name!", "alert-error error-rqtl2")
- return population_page
- if not bool(population_fullname):
- flash("Invalid Population Full Name!", "alert-error error-rqtl2")
- return population_page
- new_population = save_population(conn, {
- "SpeciesId": species["SpeciesId"],
- "Name": population_name,
- "InbredSetName": population_fullname,
- "FullName": population_fullname,
- "Family": request.form.get("inbredset_family") or None,
- "Description": request.form.get("description") or None
- })
-
- flash("Population created successfully.", "alert-success")
- return redirect(
- url_for("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species_id,
- population_id=new_population["population_id"],
- pgsrc="create-population"),
- code=307)
-
-
-class __RequestError__(Exception): #pylint: disable=[invalid-name]
- """Internal class to avoid pylint's `too-many-return-statements` error."""
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle"),
- methods=["GET", "POST"])
-def upload_rqtl2_bundle(species_id: int, population_id: int):
- """Allow upload of R/qtl2 bundle."""
- with database_connection(app.config["SQL_URI"]) as conn:
- species = species_by_id(conn, species_id)
- population = population_by_species_and_id(
- conn, species["SpeciesId"], population_id)
- if not bool(species):
- flash("Invalid species!", "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_species"))
- if not bool(population):
- flash("Invalid Population!", "alert-error error-rqtl2")
- return redirect(
- url_for("upload.rqtl2.select_population", pgsrc="error"),
- code=307)
- if request.method == "GET" or (
- request.method == "POST"
- and bool(request.args.get("pgsrc"))):
- return render_template("rqtl2/upload-rqtl2-bundle-step-01.html",
- species=species,
- population=population)
-
- try:
- app.logger.debug("Files in the form: %s", request.files)
- the_file = save_file(request.files["rqtl2_bundle_file"],
- Path(app.config["UPLOAD_FOLDER"]))
- except AssertionError:
- app.logger.debug(traceback.format_exc())
- flash("Please provide a valid R/qtl2 zip bundle.",
- "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species_id,
- population_id=population_id))
-
- if not is_zipfile(str(the_file)):
- app.logger.debug("The file is not a zip file.")
- raise __RequestError__("Invalid file! Expected a zip file.")
-
- jobid = trigger_rqtl2_bundle_qc(
- species_id,
- population_id,
- the_file,
- request.files["rqtl2_bundle_file"].filename)#type: ignore[arg-type]
- return redirect(url_for(
- "upload.rqtl2.rqtl2_bundle_qc_status", jobid=jobid))
-
-
-def trigger_rqtl2_bundle_qc(
- species_id: int,
- population_id: int,
- rqtl2bundle: Path,
- originalfilename: str
-) -> UUID:
- """Trigger QC on the R/qtl2 bundle."""
- redisuri = app.config["REDIS_URL"]
- with Redis.from_url(redisuri, decode_responses=True) as rconn:
- jobid = uuid4()
- redis_ttl_seconds = app.config["JOBS_TTL_SECONDS"]
- jobs.launch_job(
- jobs.initialise_job(
- rconn,
- jobs.jobsnamespace(),
- str(jobid),
- [sys.executable, "-m", "scripts.qc_on_rqtl2_bundle",
- app.config["SQL_URI"], app.config["REDIS_URL"],
- jobs.jobsnamespace(), str(jobid), str(species_id),
- str(population_id), "--redisexpiry",
- str(redis_ttl_seconds)],
- "rqtl2-bundle-qc-job",
- redis_ttl_seconds,
- {"job-metadata": json.dumps({
- "speciesid": species_id,
- "populationid": population_id,
- "rqtl2-bundle-file": str(rqtl2bundle.absolute()),
- "original-filename": originalfilename})}),
- redisuri,
- f"{app.config['UPLOAD_FOLDER']}/job_errors")
- return jobid
-
-
-def chunk_name(uploadfilename: str, chunkno: int) -> str:
-    """Generate chunk name from original filename and chunk number"""
-    if uploadfilename == "":
-        raise ValueError("Name cannot be empty!")
-    if chunkno < 1:
-        raise ValueError("Chunk number must be greater than zero")
-    return f"{secure_filename(uploadfilename)}_part_{chunkno:05d}"
-
-
-def chunks_directory(uniqueidentifier: str) -> Path:
-    """Compute the directory where chunks are temporarily stored."""
-    if uniqueidentifier == "":
-        raise ValueError("Unique identifier cannot be empty!")
-    return Path(app.config["UPLOAD_FOLDER"], f"tempdir_{uniqueidentifier}")
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle-chunked"),
- methods=["GET"])
-def upload_rqtl2_bundle_chunked_get(# pylint: disable=["unused-argument"]
- species_id: int,
- population_id: int
-):
- """
- Extension to the `upload_rqtl2_bundle` endpoint above that provides a way
- for testing whether all the chunks have been uploaded and to assist with
- resuming a failed upload.
- """
- fileid = request.args.get("resumableIdentifier", type=str) or ""
- filename = request.args.get("resumableFilename", type=str) or ""
- chunk = request.args.get("resumableChunkNumber", type=int) or 0
- if not (fileid and filename and chunk):
- return jsonify({
- "message": "At least one required query parameter is missing.",
- "error": "BadRequest",
- "statuscode": 400
- }), 400
-
- if Path(chunks_directory(fileid),
- chunk_name(filename, chunk)).exists():
- return "OK"
-
- return jsonify({
- "message": f"Chunk {chunk} was not found.",
- "error": "NotFound",
- "statuscode": 404
- }), 404
-
-
-def __merge_chunks__(targetfile: Path, chunkpaths: tuple[Path, ...]) -> Path:
-    """Merge the chunks into a single file."""
-    with open(targetfile, "ab") as _target:
-        for chunkfile in chunkpaths:
-            with open(chunkfile, "rb") as _chunkdata:
-                _target.write(_chunkdata.read())
-
-            chunkfile.unlink()
-    return targetfile
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle-chunked"),
- methods=["POST"])
-def upload_rqtl2_bundle_chunked_post(species_id: int, population_id: int):
- """
- Extension to the `upload_rqtl2_bundle` endpoint above that allows large
- files to be uploaded in chunks.
-
- This should hopefully speed up uploads, and if done right, even enable
- resumable uploads
- """
- _totalchunks = request.form.get("resumableTotalChunks", type=int) or 0
- _chunk = request.form.get("resumableChunkNumber", default=1, type=int)
- _uploadfilename = request.form.get(
- "resumableFilename", default="", type=str) or ""
- _fileid = request.form.get(
- "resumableIdentifier", default="", type=str) or ""
- _targetfile = Path(app.config["UPLOAD_FOLDER"], _fileid)
-
- if _targetfile.exists():
- return jsonify({
- "message": (
- "A file with a similar unique identifier has previously been "
- "uploaded and possibly is/has being/been processed."),
- "error": "BadRequest",
- "statuscode": 400
- }), 400
-
- try:
- # save chunk data
- chunks_directory(_fileid).mkdir(exist_ok=True, parents=True)
- request.files["file"].save(Path(chunks_directory(_fileid),
- chunk_name(_uploadfilename, _chunk)))
-
- # Check whether upload is complete
- chunkpaths = tuple(
- Path(chunks_directory(_fileid), chunk_name(_uploadfilename, _achunk))
- for _achunk in range(1, _totalchunks+1))
- if all(_file.exists() for _file in chunkpaths):
- # merge_files and clean up chunks
- __merge_chunks__(_targetfile, chunkpaths)
- chunks_directory(_fileid).rmdir()
- jobid = trigger_rqtl2_bundle_qc(
- species_id, population_id, _targetfile, _uploadfilename)
- return url_for(
- "upload.rqtl2.rqtl2_bundle_qc_status", jobid=jobid)
- except Exception as exc:# pylint: disable=[broad-except]
- msg = "Error processing uploaded file chunks."
- app.logger.error(msg, exc_info=True, stack_info=True)
- return jsonify({
- "message": msg,
- "error": type(exc).__name__,
- "error-description": " ".join(str(arg) for arg in exc.args),
- "error-trace": traceback.format_exception(exc)
- }), 500
-
- return "OK"
-
-
-@rqtl2.route("/upload/species/rqtl2-bundle/qc-status/",
- methods=["GET", "POST"])
-def rqtl2_bundle_qc_status(jobid: UUID):
- """Check the status of the QC jobs."""
- with (Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn,
- database_connection(app.config["SQL_URI"]) as dbconn):
- try:
- thejob = jobs.job(rconn, jobs.jobsnamespace(), jobid)
- messagelistname = thejob.get("log-messagelist")
- logmessages = (rconn.lrange(messagelistname, 0, -1)
- if bool(messagelistname) else [])
- jobstatus = thejob["status"]
- if jobstatus == "error":
- return render_template("rqtl2/rqtl2-qc-job-error.html",
- job=thejob,
- errorsgeneric=json.loads(
- thejob.get("errors-generic", "[]")),
- errorsgeno=json.loads(
- thejob.get("errors-geno", "[]")),
- errorspheno=json.loads(
- thejob.get("errors-pheno", "[]")),
- errorsphenose=json.loads(
- thejob.get("errors-phenose", "[]")),
- errorsphenocovar=json.loads(
- thejob.get("errors-phenocovar", "[]")),
- messages=logmessages)
- if jobstatus == "success":
- jobmeta = json.loads(thejob["job-metadata"])
- species = species_by_id(dbconn, jobmeta["speciesid"])
- return render_template(
- "rqtl2/rqtl2-qc-job-results.html",
- species=species,
- population=population_by_species_and_id(
- dbconn, species["SpeciesId"], jobmeta["populationid"]),
- rqtl2bundle=Path(jobmeta["rqtl2-bundle-file"]).name,
- rqtl2bundleorig=jobmeta["original-filename"])
-
- def compute_percentage(thejob, filetype) -> Union[str, None]:
- if f"{filetype}-linecount" in thejob:
- return "100"
- if f"{filetype}-filesize" in thejob:
- percent = ((int(thejob.get(f"{filetype}-checked", 0))
- /
- int(thejob.get(f"{filetype}-filesize", 1)))
- * 100)
- return f"{percent:.2f}"
- return None
-
- return render_template(
- "rqtl2/rqtl2-qc-job-status.html",
- job=thejob,
- geno_percent=compute_percentage(thejob, "geno"),
- pheno_percent=compute_percentage(thejob, "pheno"),
- phenose_percent=compute_percentage(thejob, "phenose"),
- messages=logmessages)
- except jobs.JobNotFound:
- return render_template("rqtl2/no-such-job.html", jobid=jobid)
-
-
-def redirect_on_error(flaskroute, **kwargs):
-    """Utility to redirect on error"""
-    return redirect(url_for(flaskroute, **kwargs, pgsrc="error"),
-                    code=(307 if request.method == "POST" else 302))
-
-
-def check_species(conn: mdb.Connection, formargs: dict) -> Optional[
- tuple[str, Response]]:
- """
- Check whether the 'species_id' value is provided, and whether a
- corresponding species exists in the database.
-
- Maybe give the function a better name..."""
- speciespage = redirect_on_error("upload.rqtl2.select_species")
- if "species_id" not in formargs:
- return "You MUST provide the Species identifier.", speciespage
-
- if not bool(species_by_id(conn, formargs["species_id"])):
- return "No species with the provided identifier exists.", speciespage
-
- return None
-
-
-def check_population(conn: mdb.Connection,
- formargs: dict,
- species_id) -> Optional[tuple[str, Response]]:
- """
- Check whether the 'population_id' value is provided, and whether a
- corresponding population exists in the database.
-
- Maybe give the function a better name..."""
- poppage = redirect_on_error(
- "upload.rqtl2.select_species", species_id=species_id)
- if "population_id" not in formargs:
- return "You MUST provide the Population identifier.", poppage
-
- if not bool(population_by_species_and_id(
- conn, species_id, formargs["population_id"])):
- return "No population with the provided identifier exists.", poppage
-
- return None
-
-
-def check_r_qtl2_bundle(formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the existence of the R/qtl2 bundle."""
- fileuploadpage = redirect_on_error("upload.rqtl2.upload_rqtl2_bundle",
- species_id=species_id,
- population_id=population_id)
- if not "rqtl2_bundle_file" in formargs:
- return (
- "You MUST provide a R/qtl2 zip bundle for upload.", fileuploadpage)
-
- if not Path(fullpath(formargs["rqtl2_bundle_file"])).exists():
- return "No R/qtl2 bundle with the given name exists.", fileuploadpage
-
- return None
-
-
-def check_geno_dataset(conn: mdb.Connection,
- formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the Genotype dataset."""
- genodsetpg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id)
- if not bool(formargs.get("geno-dataset-id")):
- return (
- "You MUST provide a valid Genotype dataset identifier", genodsetpg)
-
- with conn.cursor(cursorclass=DictCursor) as cursor:
- cursor.execute("SELECT * FROM GenoFreeze WHERE Id=%s",
- (formargs["geno-dataset-id"],))
- results = cursor.fetchall()
- if not bool(results):
- return ("No genotype dataset with the provided identifier exists.",
- genodsetpg)
- if len(results) > 1:
- return (
- "Data corruption: More than one genotype dataset with the same "
- "identifier.",
- genodsetpg)
-
- return None
-
-def check_tissue(
- conn: mdb.Connection,formargs: dict) -> Optional[tuple[str, Response]]:
- """Check for tissue/organ/biological material."""
- selectdsetpg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=formargs["species_id"],
- population_id=formargs["population_id"])
- if not bool(formargs.get("tissueid", "").strip()):
- return ("No tissue/organ/biological material provided.", selectdsetpg)
-
- with conn.cursor(cursorclass=DictCursor) as cursor:
- cursor.execute("SELECT * FROM Tissue WHERE Id=%s",
- (formargs["tissueid"],))
- results = cursor.fetchall()
- if not bool(results):
- return ("No tissue/organ with the provided identifier exists.",
- selectdsetpg)
-
- if len(results) > 1:
- return (
- "Data corruption: More than one tissue/organ with the same "
- "identifier.",
- selectdsetpg)
-
- return None
-
-
-def check_probe_study(conn: mdb.Connection,
- formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the ProbeSet study."""
- dsetinfopg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id)
- if not bool(formargs.get("probe-study-id")):
- return "No probeset study was selected!", dsetinfopg
-
- if not bool(probeset_study_by_id(conn, formargs["probe-study-id"])):
- return ("No probeset study with the provided identifier exists",
- dsetinfopg)
-
- return None
-
-
-def check_probe_dataset(conn: mdb.Connection,
- formargs: dict,
- species_id,
- population_id) -> Optional[tuple[str, Response]]:
- """Check for the ProbeSet dataset."""
- dsetinfopg = redirect_on_error("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id)
- if not bool(formargs.get("probe-dataset-id")):
- return "No probeset dataset was selected!", dsetinfopg
-
- if not bool(probeset_dataset_by_id(conn, formargs["probe-dataset-id"])):
- return ("No probeset dataset with the provided identifier exists",
- dsetinfopg)
-
- return None
-
-
-def with_errors(endpointthunk: Callable, *checkfns):
-    """Run 'endpointthunk' with error checking."""
-    formargs = {**dict(request.args), **dict(request.form)}
-    errors = tuple(item for item in (_fn(formargs=formargs) for _fn in checkfns)
-                   if item is not None)
-    if len(errors) > 0:
-        flash(errors[0][0], "alert-error error-rqtl2")
-        return errors[0][1]
-
-    return endpointthunk()
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/select-geno-dataset"),
- methods=["POST"])
-def select_geno_dataset(species_id: int, population_id: int):
- """Select from existing geno datasets."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- geno_dset = geno_datasets_by_species_and_population(
- conn, species_id, population_id)
- if not bool(geno_dset):
- flash("No genotype dataset was provided!",
- "alert-error error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_geno_dataset",
- species_id=species_id,
- population_id=population_id,
- pgsrc="error"),
- code=307)
-
- flash("Genotype accepted", "alert-success error-rqtl2")
- return redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="upload.rqtl2.select_geno_dataset"),
- code=307)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population, conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/create-geno-dataset"),
- methods=["POST"])
-def create_geno_dataset(species_id: int, population_id: int):
- """Create a new geno dataset."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- sgeno_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="error"),
- code=307)
- errorclasses = "alert-error error-rqtl2 error-rqtl2-create-geno-dataset"
- if not bool(request.form.get("dataset-name")):
- flash("You must provide the dataset name", errorclasses)
- return sgeno_page
- if not bool(request.form.get("dataset-fullname")):
- flash("You must provide the dataset full name", errorclasses)
- return sgeno_page
- public = 2 if request.form.get("dataset-public") == "on" else 0
-
- with conn.cursor(cursorclass=DictCursor) as cursor:
- datasetname = request.form["dataset-name"]
- new_dataset = {
- "name": datasetname,
- "fname": request.form.get("dataset-fullname"),
- "sname": request.form.get("dataset-shortname") or datasetname,
- "today": date.today().isoformat(),
- "pub": public,
- "isetid": population_id
- }
- cursor.execute("SELECT * FROM GenoFreeze WHERE Name=%s",
- (datasetname,))
- results = cursor.fetchall()
- if bool(results):
- flash(
- f"A genotype dataset with name '{escape(datasetname)}' "
- "already exists.",
- errorclasses)
- return redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="error"),
- code=307)
- cursor.execute(
- "INSERT INTO GenoFreeze("
- "Name, FullName, ShortName, CreateTime, public, InbredSetId"
- ") "
- "VALUES("
- "%(name)s, %(fname)s, %(sname)s, %(today)s, %(pub)s, %(isetid)s"
- ")",
- new_dataset)
- flash("Created dataset successfully.", "alert-success")
- return render_template(
- "rqtl2/create-geno-dataset-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset={**new_dataset, "id": cursor.lastrowid})
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population, conn=conn, species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/select-tissue"),
- methods=["POST"])
-def select_tissue(species_id: int, population_id: int):
- """Select from existing tissues."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- if not bool(request.form.get("tissueid", "").strip()):
- flash("Invalid tissue selection!",
- "alert-error error-select-tissue error-rqtl2")
-
- return redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="upload.rqtl2.select_geno_dataset"),
- code=307)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/create-tissue"),
- methods=["POST"])
-def create_tissue(species_id: int, population_id: int):
- """Add new tissue, organ or biological material to the system."""
- form = request.form
- datasetinfopage = redirect(
- url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id,
- pgsrc="upload.rqtl2.select_geno_dataset"),
- code=307)
- with database_connection(app.config["SQL_URI"]) as conn:
- tissuename = form.get("tissuename", "").strip()
- tissueshortname = form.get("tissueshortname", "").strip()
- if not bool(tissuename):
- flash("Organ/Tissue name MUST be provided.",
- "alert-error error-create-tissue error-rqtl2")
- return datasetinfopage
-
- if not bool(tissueshortname):
- flash("Organ/Tissue short name MUST be provided.",
- "alert-error error-create-tissue error-rqtl2")
- return datasetinfopage
-
- try:
- tissue = create_new_tissue(conn, tissuename, tissueshortname)
- flash("Tissue created successfully!", "alert-success")
- return render_template(
- "rqtl2/create-tissue-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset=geno_dataset_by_id(
- conn,
- int(request.form["geno-dataset-id"])),
- tissue=tissue)
- except mdb.IntegrityError as _ierr:
- flash("Tissue/Organ with that short name already exists!",
- "alert-error error-create-tissue error-rqtl2")
- return datasetinfopage
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/select-probeset-study"),
- methods=["POST"])
-def select_probeset_study(species_id: int, population_id: int):
- """Select or create a probeset study."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
- if not bool(probeset_study_by_id(conn, int(request.form["probe-study-id"]))):
- flash("Invalid study selected!", "alert-error error-rqtl2")
- return summary_page
-
- return summary_page
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/select-probeset-dataset"),
- methods=["POST"])
-def select_probeset_dataset(species_id: int, population_id: int):
- """Select or create a probeset dataset."""
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
- if not bool(probeset_study_by_id(conn, int(request.form["probe-study-id"]))):
- flash("Invalid study selected!", "alert-error error-rqtl2")
- return summary_page
-
- return summary_page
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_probe_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/create-probeset-study"),
- methods=["POST"])
-def create_probeset_study(species_id: int, population_id: int):
- """Create a new probeset study."""
- errorclasses = "alert-error error-rqtl2 error-rqtl2-create-probeset-study"
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- form = request.form
- dataset_info_page = redirect(
- url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
-
- if not (bool(form.get("platformid")) and
- bool(platform_by_id(conn, int(form["platformid"])))):
- flash("Invalid platform selected.", errorclasses)
- return dataset_info_page
-
- if not (bool(form.get("tissueid")) and
- bool(tissue_by_id(conn, int(form["tissueid"])))):
- flash("Invalid tissue selected.", errorclasses)
- return dataset_info_page
-
- studyname = form["studyname"]
- try:
- study = probeset_create_study(
- conn, population_id, int(form["platformid"]), int(form["tissueid"]),
- studyname, form.get("studyfullname") or "",
- form.get("studyshortname") or "")
- except mdb.IntegrityError as _ierr:
- flash(f"ProbeSet study with name '{escape(studyname)}' already "
- "exists.",
- errorclasses)
- return dataset_info_page
- return render_template(
- "rqtl2/create-probe-study-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset=geno_dataset_by_id(
- conn,
- int(request.form["geno-dataset-id"])),
- study=study)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn))
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/create-probeset-dataset"),
- methods=["POST"])
-def create_probeset_dataset(species_id: int, population_id: int):#pylint: disable=[too-many-return-statements]
- """Create a new probeset dataset."""
- errorclasses = "alert-error error-rqtl2 error-rqtl2-create-probeset-dataset"
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():#pylint: disable=[too-many-return-statements]
- form = request.form
- summary_page = redirect(url_for("upload.rqtl2.select_dataset_info",
- species_id=species_id,
- population_id=population_id),
- code=307)
- if not bool(form.get("averageid")):
- flash("Averaging method not selected!", errorclasses)
- return summary_page
- if not bool(form.get("datasetname")):
- flash("Dataset name not provided!", errorclasses)
- return summary_page
- if not bool(form.get("datasetfullname")):
- flash("Dataset full name not provided!", errorclasses)
- return summary_page
-
- tissue = tissue_by_id(conn, form.get("tissueid", "").strip())
-
- study = probeset_study_by_id(conn, int(form["probe-study-id"]))
- if not bool(study):
- flash("Invalid ProbeSet study provided!", errorclasses)
- return summary_page
-
- avgmethod = averaging_method_by_id(conn, int(form["averageid"]))
- if not bool(avgmethod):
- flash("Invalid averaging method provided!", errorclasses)
- return summary_page
-
- try:
- dset = probeset_create_dataset(conn,
- int(form["probe-study-id"]),
- int(form["averageid"]),
- form["datasetname"],
- form["datasetfullname"],
- form["datasetshortname"],
- form["datasetpublic"] == "on",
- form.get(
- "datasetdatascale", "log2"))
- except mdb.IntegrityError as _ierr:
- app.logger.debug("Possible integrity error: %s", traceback.format_exc())
- flash(("IntegrityError: The data you provided has some errors: "
- f"{_ierr.args}"),
- errorclasses)
- return summary_page
- except Exception as _exc:# pylint: disable=[broad-except]
- app.logger.debug("Error creating ProbeSet dataset: %s",
- traceback.format_exc())
- flash(("There was a problem creating your dataset. Please try "
- "again."),
- errorclasses)
- return summary_page
- return render_template(
- "rqtl2/create-probe-dataset-success.html",
- species=species_by_id(conn, species_id),
- population=population_by_species_and_id(
- conn, species_id, population_id),
- rqtl2_bundle_file=request.form["rqtl2_bundle_file"],
- geno_dataset=geno_dataset_by_id(
- conn,
- int(request.form["geno-dataset-id"])),
- tissue=tissue,
- study=study,
- avgmethod=avgmethod,
- dataset=dset)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_tissue, conn=conn),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/dataset-info"),
- methods=["POST"])
-def select_dataset_info(species_id: int, population_id: int):
- """
- If `geno` files exist in the R/qtl2 bundle, prompt user to provide the
- dataset the genotypes belong to.
- """
- form = request.form
- with database_connection(app.config["SQL_URI"]) as conn:
- def __thunk__():
- species = species_by_id(conn, species_id)
- population = population_by_species_and_id(
- conn, species_id, population_id)
- thefile = fullpath(form["rqtl2_bundle_file"])
- with ZipFile(str(thefile), "r") as zfile:
- cdata = r_qtl2.control_data(zfile)
-
- geno_dataset = geno_dataset_by_id(
- conn,form.get("geno-dataset-id", "").strip())
- if "geno" in cdata and not bool(form.get("geno-dataset-id")):
- return render_template(
- "rqtl2/select-geno-dataset.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- datasets=geno_datasets_by_species_and_population(
- conn, species_id, population_id))
-
- tissue = tissue_by_id(conn, form.get("tissueid", "").strip())
- if "pheno" in cdata and not bool(tissue):
- return render_template(
- "rqtl2/select-tissue.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- studies=probeset_studies_by_species_and_population(
- conn, species_id, population_id),
- platforms=platforms_by_species(conn, species_id),
- tissues=all_tissues(conn))
-
- probeset_study = probeset_study_by_id(
- conn, form.get("probe-study-id", "").strip())
- if "pheno" in cdata and not bool(probeset_study):
- return render_template(
- "rqtl2/select-probeset-study-id.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- studies=probeset_studies_by_species_and_population(
- conn, species_id, population_id),
- platforms=platforms_by_species(conn, species_id),
- tissue=tissue)
- probeset_study = probeset_study_by_id(
- conn, int(form["probe-study-id"]))
-
- probeset_dataset = probeset_dataset_by_id(
- conn, form.get("probe-dataset-id", "").strip())
- if "pheno" in cdata and not bool(probeset_dataset):
- return render_template(
- "rqtl2/select-probeset-dataset.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- probe_study=probeset_study,
- tissue=tissue,
- datasets=probeset_datasets_by_study(
- conn, int(form["probe-study-id"])),
- avgmethods=averaging_methods(conn))
-
- return render_template("rqtl2/summary-info.html",
- species=species,
- population=population,
- rqtl2_bundle_file=thefile.name,
- geno_dataset=geno_dataset,
- tissue=tissue,
- probe_study=probeset_study,
- probe_dataset=probeset_dataset)
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route(("/upload/species//population/"
- "/rqtl2-bundle/confirm-bundle-details"),
- methods=["POST"])
-def confirm_bundle_details(species_id: int, population_id: int):
- """Confirm the details and trigger R/qtl2 bundle processing..."""
- redisuri = app.config["REDIS_URL"]
- with (database_connection(app.config["SQL_URI"]) as conn,
- Redis.from_url(redisuri, decode_responses=True) as rconn):
- def __thunk__():
- redis_ttl_seconds = app.config["JOBS_TTL_SECONDS"]
- jobid = str(uuid4())
- _job = jobs.launch_job(
- jobs.initialise_job(
- rconn,
- jobs.jobsnamespace(),
- jobid,
- [
- sys.executable, "-m", "scripts.process_rqtl2_bundle",
- app.config["SQL_URI"], app.config["REDIS_URL"],
- jobs.jobsnamespace(), jobid, "--redisexpiry",
- str(redis_ttl_seconds)],
- "R/qtl2 Bundle Upload",
- redis_ttl_seconds,
- {
- "bundle-metadata": json.dumps({
- "speciesid": species_id,
- "populationid": population_id,
- "rqtl2-bundle-file": str(fullpath(
- request.form["rqtl2_bundle_file"])),
- "geno-dataset-id": request.form.get(
- "geno-dataset-id", ""),
- "probe-study-id": request.form.get(
- "probe-study-id", ""),
- "probe-dataset-id": request.form.get(
- "probe-dataset-id", ""),
- **({
- "platformid": probeset_study_by_id(
- conn,
- int(request.form["probe-study-id"]))["ChipId"]
- } if bool(request.form.get("probe-study-id")) else {})
- })
- }),
- redisuri,
- f"{app.config['UPLOAD_FOLDER']}/job_errors")
-
- return redirect(url_for("upload.rqtl2.rqtl2_processing_status",
- jobid=jobid))
-
- return with_errors(__thunk__,
- partial(check_species, conn=conn),
- partial(check_population,
- conn=conn,
- species_id=species_id),
- partial(check_r_qtl2_bundle,
- species_id=species_id,
- population_id=population_id),
- partial(check_geno_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_probe_study,
- conn=conn,
- species_id=species_id,
- population_id=population_id),
- partial(check_probe_dataset,
- conn=conn,
- species_id=species_id,
- population_id=population_id))
-
-
-@rqtl2.route("/status/")
-def rqtl2_processing_status(jobid: UUID):
- """Retrieve the status of the job processing the uploaded R/qtl2 bundle."""
- with Redis.from_url(app.config["REDIS_URL"], decode_responses=True) as rconn:
- try:
- thejob = jobs.job(rconn, jobs.jobsnamespace(), jobid)
-
- messagelistname = thejob.get("log-messagelist")
- logmessages = (rconn.lrange(messagelistname, 0, -1)
- if bool(messagelistname) else [])
-
- if thejob["status"] == "error":
- return render_template(
- "rqtl2/rqtl2-job-error.html", job=thejob, messages=logmessages)
- if thejob["status"] == "success":
- return render_template("rqtl2/rqtl2-job-results.html",
- job=thejob,
- messages=logmessages)
-
- return render_template(
- "rqtl2/rqtl2-job-status.html", job=thejob, messages=logmessages)
- except jobs.JobNotFound as _exc:
- return render_template("rqtl2/no-such-job.html", jobid=jobid)
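Note on the removed handlers: each one wraps its work in a local `__thunk__` and passes it to `with_errors` along with a list of `partial(check_..., ...)` validators. The definition of `with_errors` is not part of this diff, so the following is only a minimal sketch of how such a helper could behave. It assumes each check returns `None` on success and an error response otherwise. The check functions, their keyword arguments, and the string "responses" here are hypothetical stand-ins, not the project's actual code.

```python
from functools import partial
from typing import Callable, Optional


def with_errors(thunk: Callable[[], str],
                *checks: Callable[[], Optional[str]]) -> str:
    """Run the checks in order; return the first error, else run the thunk.

    Assumption: a check returns None when it passes and an error
    response (a plain string in this sketch) when it fails.
    """
    for check in checks:
        error = check()
        if error is not None:
            return error
    return thunk()


# Hypothetical validators standing in for check_species/check_population.
def check_species(*, known_species: set, species_id: int) -> Optional[str]:
    if species_id not in known_species:
        return f"error: no species with id {species_id}"
    return None


def check_population(*, known_populations: set,
                     species_id: int, population_id: int) -> Optional[str]:
    if (species_id, population_id) not in known_populations:
        return f"error: no population {population_id} for species {species_id}"
    return None


def handle(species_id: int, population_id: int) -> str:
    """Mimic the shape of a removed route handler, minus Flask and the DB."""
    known_species = {1}
    known_populations = {(1, 10)}

    def __thunk__() -> str:
        # Only reached when every check above has passed.
        return "ok: summary page"

    return with_errors(
        __thunk__,
        partial(check_species,
                known_species=known_species,
                species_id=species_id),
        partial(check_population,
                known_populations=known_populations,
                species_id=species_id,
                population_id=population_id))
```

Under these assumptions the checks short-circuit in order, so a request with an unknown species never reaches the population check or the thunk. That would match the way the removed handlers list their checks from the most general (species) to the most specific (probe dataset).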