From baeafc5ccc4a9893d22e6629db97720e3fa6d3ae Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sun, 3 Dec 2023 09:47:38 -0600
Subject: Rename/move

---
 topics/genenetwork/developing-against-gn.gmi       | 198 +++++++++++++++++++++
 topics/genenetwork/phenotype-naming-convention.gmi |  33 ++++
 topics/genenetwork/starting_gn1.gmi                | 102 +++++++++++
 topics/genenetwork/starting_gn2_and_gn3.gmi        |  52 ++++++
 topics/genenetwork/temp-trait-submission.gmi       |  11 ++
 5 files changed, 396 insertions(+)
 create mode 100644 topics/genenetwork/developing-against-gn.gmi
 create mode 100644 topics/genenetwork/phenotype-naming-convention.gmi
 create mode 100644 topics/genenetwork/starting_gn1.gmi
 create mode 100644 topics/genenetwork/starting_gn2_and_gn3.gmi
 create mode 100644 topics/genenetwork/temp-trait-submission.gmi

(limited to 'topics/genenetwork')

diff --git a/topics/genenetwork/developing-against-gn.gmi b/topics/genenetwork/developing-against-gn.gmi
new file mode 100644
index 0000000..b94b681
--- /dev/null
+++ b/topics/genenetwork/developing-against-gn.gmi
@@ -0,0 +1,198 @@
+# Developing against GeneNetwork
+
+## Configuration
+
+GeneNetwork2 comes with a [default configuration file](./etc/default_settings.py)
+which can be used as a starting point.
+
+The recommended way to deal with the configurations is to **copy** this default configuration file to a location outside of the repository, say,
+
+```sh
+.../genenetwork2$ cp etc/default_settings.py "${HOME}/configurations/gn2.py"
+```
+
+then change the appropriate values in the new file. You can then pass in the new
+file as the configuration file when launching the application,
+
+```sh
+.../genenetwork2$ bin/genenetwork2 "${HOME}/configurations/gn2.py"
+```
+
+The other option is to override the configurations in `etc/default_settings.py`
+by setting the configuration you want to override as an environment variable e.g.
+to override the `SQL_URI` value, you could do something like:
+
+```sh
+.../genenetwork2$ env SQL_URI="mysql://:@:/" \
+    bin/genenetwork2 "${HOME}/configurations/gn2.py"
+```
+
+replacing the placeholders in the angle brackets with appropriate values.
+
+For a detailed breakdown of the configuration variables and their use, see the
+[configuration documentation](doc/configurations.org)
+
+## Run
+
+Once GN2 is installed, it can be run through a browser
+interface
+
+```sh
+genenetwork2
+```
+
+A quick example is
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+    GENENETWORK_FILES=~/data/gn2_data/ \
+    GN_PROXY_URL="http://localhost:8080" \
+    GN3_LOCAL_URL="http://localhost:8081" \
+    SPARQL_ENDPOINT=http://localhost:8892/sparql \
+    ./bin/genenetwork2 ./etc/default_settings.py -gunicorn-dev
+```
+
+For full examples (you may need to set a number of environment
+variables), including running scripts and a Python REPL, also see the
+startup script [./bin/genenetwork2](https://github.com/genenetwork/genenetwork2/blob/testing/bin/genenetwork2).
+
+Also mariadb and redis need to be running, see
+[INSTALL](./doc/README.org).
+
+## Debugging
+
+To run the application under the pdb debugger, you can add the `--with-pdb`
+option when launching the application, for example:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+    GENENETWORK_FILES=~/data/gn2_data/ \
+    GN_PROXY_URL="http://localhost:8080" \
+    GN3_LOCAL_URL="http://localhost:8081" \
+    SPARQL_ENDPOINT=http://localhost:8892/sparql \
+    ./bin/genenetwork2 ./etc/default_settings.py --with-pdb
+```
+
+**NOTE**: This should only ever be run in development.
+**NOTE 2**: You will probably need to tell pdb to continue at least once before
+the system begins serving the pages.
+
+Now, you can add the `breakpoint()` call wherever you need to debug and the
+terminal where you started the application with `--with-pdb` will allow you to
+issue commands to pdb to debug your application.
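The Run and Debugging commands above repeat the same environment settings, so a small helper can keep them in one place. The following is only a sketch, not part of GeneNetwork: the function name `gn2_launch_cmd` and the variable `GN2_SETTINGS` are hypothetical, and the fallback values are simply the example values used in this document. It prints the command instead of running it, so the result can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not part of GN2): print a launch command.
# Each setting keeps the value already present in the environment and
# otherwise falls back to the example value used in this document.
# GN2_SETTINGS is an assumed name for the path to the settings file.
gn2_launch_cmd() {
    # First argument is the launch mode; default to -gunicorn-dev.
    mode=${1:--gunicorn-dev}
    printf 'env GN2_PROFILE=%s SERVER_PORT=%s GENENETWORK_FILES=%s ./bin/genenetwork2 %s %s\n' \
        "${GN2_PROFILE:-$HOME/opt/gn-latest}" \
        "${SERVER_PORT:-5300}" \
        "${GENENETWORK_FILES:-$HOME/data/gn2_data/}" \
        "${GN2_SETTINGS:-./etc/default_settings.py}" \
        "$mode"
}

# Show the command that would start GN2 under pdb.
gn2_launch_cmd --with-pdb
```

Printing rather than executing is a deliberate choice here: it makes the effect of each override visible before anything is started.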
+
+## Development
+
+It may be useful to pull in the GN3 python modules locally. For this
+use the `GN3_PYTHONPATH` environment variable, which gets injected in
+the ./bin/genenetwork2 startup.
+
+A continuously deployed instance of genenetwork2 is available at
+[https://cd.genenetwork.org/](https://cd.genenetwork.org/). This
+instance is redeployed on every commit provided that the [continuous
+integration tests](https://ci.genenetwork.org/jobs/genenetwork2) pass.
+
+## Testing
+
+To have tests pass, the redis and mariadb instances should be running, because of
+asserts sprinkled in the code base.
+
+Right now, the only tests running in CI are unittests. Please make
+sure the existing unittests are green when submitting a PR.
+
+From the root directory of the repository, you can run the tests with something
+like:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest SERVER_PORT=5300 \
+    SQL_URI= \
+    ./bin/genenetwork2 ./etc/default_settings.py \
+    -c -m pytest -vv
+```
+
+In the case where you use the default `etc/default_settings.py` configuration file, you can override any setting as demonstrated with the `SQL_URI` setting in the command above.
+
+In order to avoid having to set up a whole host of settings every time with the `env` command, you could copy the `etc/default_settings.py` file to a new location (outside the repository is best), and pass that to `bin/genenetwork2` instead.
+
+See
+[./bin/genenetwork2](https://github.com/genenetwork/genenetwork2/blob/testing/doc/docker-container.org)
+for more details.
+
+#### Mechanical Rob
+
+We are building 'Mechanical Rob' automated testing using Python
+[requests](https://github.com/genenetwork/genenetwork2/tree/testing/test/requests)
+which can be run with:
+
+```sh
+env GN2_PROFILE=~/opt/gn-latest \
+    GN_PROXY_URL="http://localhost:8080" \
+    GN3_LOCAL_URL="http://localhost:8081" \
+    ./bin/genenetwork2 \
+    ./etc/default_settings.py -c \
+    ../test/requests/test-website.py -a http://localhost:5003
+```
+
+The GN2_PROFILE is the Guix profile that contains all
+dependencies. The ./bin/genenetwork2 script sets up the environment
+and executes test-website.py in a Python interpreter. The -a switch
+says to run all tests and the URL points to the running GN2 http
+server.
+
+#### Unit tests
+
+To run unittests, first `cd` into the genenetwork2 directory:
+
+```sh
+# You can use the coverage tool to run the tests
+# You could omit the -v which makes the output verbose
+runcmd coverage run -m unittest discover -v
+
+# Alternatively, you could run the unittests using:
+runpython -m unittest discover -v
+
+# To generate a report in wqflask/coverage_html_report/:
+runcmd coverage html
+```
+
+The `runcmd` and `runpython` are shell aliases defined in the following way:
+
+```sh
+alias runpython='env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ GN_PROXY_URL="http://localhost:8080" GN3_LOCAL_URL="http://localhost:8081" ./bin/genenetwork2 ./etc/default_settings.py -c'
+
+alias runcmd='time env GN2_PROFILE=~/opt/gn-latest TMPDIR=/tmp SERVER_PORT=5004 GENENETWORK_FILES=/gnu/data/gn2_data/ GN_PROXY_URL="http://localhost:8080" GN3_LOCAL_URL="http://localhost:8081" ./bin/genenetwork2 ./etc/default_settings.py -cli'
+```
+
+Replace some of the env variables as per your use case.
+
+### Troubleshooting
+
+If the menu does not pop up, check your `GN2_BASE_URL`, e.g.
+
+```
+curl http://gn2-pjotr.genenetwork.org/api/v_pre1/gen_dropdown
+```
+
+Then check the logs.
+If there is ERROR 1054 (42S22): Unknown column
+'InbredSet.Family' in 'field list', you may be running against the small
+database.
+
+### Run Scripts
+
+As part of the profiling effort, some scripts were added to run specific parts of the system under a profiler without running the entire web-server. To run such a script, you could do something like:
+
+```
+env HOME=/home/frederick \
+    GN2_PROFILE=~/opt/gn2-latest \
+    GN3_DEV_REPO_PATH=~/genenetwork/genenetwork3 \
+    SQL_URI="mysql://username:password@host-ip:host-port/db_webqtl" \
+    SPARQL_ENDPOINT=http://localhost:8892/sparql \
+    SERVER_PORT=5001 \
+    bin/genenetwork2 ../gn2_settings.py \
+    -cli python3 -m scripts.profile_corrs \
+    ../performance_$(date +"%Y%m%dT%H:%M:%S").profile
+```
+
+and you can find the performance metrics in the file specified, in this case, a file starting with `performance_` with the date and time of the run, and ending with `.profile`.
+
+Please replace the environment variables in the sample command above with the appropriate values for your environment.
diff --git a/topics/genenetwork/phenotype-naming-convention.gmi b/topics/genenetwork/phenotype-naming-convention.gmi
new file mode 100644
index 0000000..dd7583f
--- /dev/null
+++ b/topics/genenetwork/phenotype-naming-convention.gmi
@@ -0,0 +1,33 @@
+# Phenotype Naming Conventions
+
+In our phenotype data entry in GeneNetwork we have two fields for users to enter abbreviations of their phenotypes - abbreviation before publication and abbreviation after publication. The former must have a value but can be cryptic such as EJC_Trait749. But the latter abbreviation - which MUST be entered at the same time - is the permanent abbreviation to be used in graphs and figures.
+
+Many of these abbreviations are getting way way too long to be useful on graphs and plots.
+The painful reality is that there is almost no rhyme or reason to the format of these abbreviations because we have bad curation:
+
+* ymaze_SponAlt_12m_NtgBXD_Males
+* Barnesiella_genus_HFD_log10_fraction
+* OTU_12_CD_log10_fraction
+* HW_BW_Male_16_months_and_older
+* Log2Fold_vs_CTL_IL6_M_CORT_PFC
+* Complex motor Learning
+* M_CONSTRICT
+* F_LD_TRANSITIONS
+* LOC OFLD 20-25
+* Cnt_AdrWts
+* Hbidm
+
+Since we have a second-generation curation tool in progress, it would be great to apply some formal reasoning and formatting conventions to our phenotype descriptions at a higher level. We can build a system that begs or demands that the user follow a particular structure when BUILDING up the abbreviations for their study. For example, we might ask users to use the following conventions for age and sex of cases:
+
+* "_M6-8m" for males 6 to 8 months of age
+* "_F>24m" for females older than 24 months
+* "_MF6-8d" for both males and females at 6 to 8 days of age
+
+1. First we need to impose a limit of 15 characters for true graph-compatible abbreviations. The main purpose of abbreviations is to add labels to graphs and figures. Even 15 characters may be too long, but we can truncate middle characters and just keep the first and last 5 characters if we need to be brutal. We can also allow a "Wordy Abbreviation" or the "Data Owner's Laboratory Style Abbreviation".
+
+2. Our GN abbreviations must be unique within a particular study but not necessarily across studies. But "across study" is a problem if we have *BW_M_6m* as the body weight of males at 6 months for 6 or more publications. Then we may need to programmatically add further tags such as year of publication (last two digits).
+
+3. We have to decide on a format that WE IMPOSE. For better or for worse, we are apparently one of the major curators for formats for phenotype abbreviations. Perhaps we need to formalize this with the Phenome Database team.
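The "brutal" truncation in point 1 can be sketched in shell. This is a minimal illustration under stated assumptions, not an existing GeneNetwork tool: the function name `shorten_abbrev` is hypothetical, abbreviations are assumed to be plain ASCII, and the 15-character limit and the 5-character head and tail are taken directly from the text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GeneNetwork): shorten a phenotype
# abbreviation for use as a graph label. Assumes ASCII input.
# Abbreviations of 15 characters or fewer pass through unchanged;
# longer ones keep only their first 5 and last 5 characters.
shorten_abbrev() {
    abbrev=$1
    if [ "${#abbrev}" -le 15 ]; then
        printf '%s\n' "$abbrev"
    else
        head5=$(printf '%s' "$abbrev" | cut -c1-5)
        tail5=$(printf '%s' "$abbrev" | tail -c 5)
        printf '%s%s\n' "$head5" "$tail5"
    fi
}

# A short abbreviation is left alone; a long one is cut in the middle.
shorten_abbrev "Cnt_AdrWts"
shorten_abbrev "ymaze_SponAlt_12m_NtgBXD_Males"
```

Dropping the middle this way can make two different abbreviations collide, which is one more reason to treat the result as a display label only and not as an identifier.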
+
+Given the above concerns, the real way to think about metadata is descriptive RDF. I.e. separate terms for species, breed, trait, individual. It is fine to come up with identifiers that look descriptive, but they really should not be more than identifiers. Our current practice of parsing identifiers for 'logic' is very fragile and therefore a bad idea.
+
+There are better ways to do computable semantics; we have some need for “pretty” abbreviations but these are not required to be unique and must be usable on charts, so we constrain the length and usually include a uid. We are still able to do the curation for mouse traits, so you can access.
diff --git a/topics/genenetwork/starting_gn1.gmi b/topics/genenetwork/starting_gn1.gmi
new file mode 100644
index 0000000..efbfd0f
--- /dev/null
+++ b/topics/genenetwork/starting_gn1.gmi
@@ -0,0 +1,102 @@
+# Starting GN1
+
+The GN1 repos are at
+
+=> https://github.com/genenetwork/genenetwork1
+
+Branches are:
+
+* master: my main branch - used in pjotr-test
+* lily: running but almost discontinued
+* production: on tux01
+
+Note that there are some hard coded paths/IPs - so simply merging is not a great idea.
+
+On tux01 GN1 is running inside a Guix container.
+
+Start a screen and run the guix deploy script. See the README file in
+
+gn1@tux01:~/production/gnshare/gn
+
+# Guix
+
+At this point GN1 is fixed at Feb 2021:
+
+guix: 1.2.0-12.dffc918
+guix-past: 159be3d7e86e1f22b2b7b1efc938ed63120dc973
+guix-bioinformatics: 697a66bf0e897a101e8e3cefbaf250491039fe93
+
+# Building
+
+On an update of guix the build may fail.
+Try
+
+```
+~/opt/guix-gn1/bin/guix build \
+  -L /home/gn1/guix-past/modules/ \
+  -L /home/gn1/guix-bioinformatics/ \
+  genenetwork1
+```
+
+## Updating mariadb connection on lily
+
+```
+restart apache in lily
+[root@lily base]# /etc/init.d/httpd restart
+Stopping httpd: [ OK ]
+Starting httpd: [ OK ]
+[root@lily base]# pwd
+/gnshare/gn/web/webqtl/base
+[root@lily base]#
+/gnshare/gn/web/webqtl/base/webqtlConfigLocal.py
+#########################################
+# Environment Variables - private
+#########################################
+# sql_host = 'tux02.uthsc.edu'
+# sql_host = '128.169.4.67'
+sql_host = '172.23.18.213'
+SERVERNAME = sql_host
+MYSQL_SERVER = sql_host
+DB_NAME = 'db_webqtl'
+DB_USER = 'x'
+DB_PASSWD = 'x'
+MYSQL_UPDSERVER = sql_host
+DB_UPDNAME = 'db_webqtl'
+DB_UPDUSER = 'x'
+DB_UPDPASSWD = 'x'
+GNROOT = '/gnshare/gn/'
+PythonPath = '/usr/bin/python'
+PIDDLE_FONT_PATH =
+'/usr/lib/python2.4/site-packages/piddle/truetypefonts/'
+```
+
+SQL may also need to be updated here:
+
+=> /gnshare/gn/web/webqtl/base/webqtlConfigLocal.py
+=> /gnshare/gn/web/infoshare/includes/config.html
+=> /gnshare/gn/web/infoshare/includes/db.inc
+
+## Updating from lily
+
+Git sync
+
+```
+gn1@tux01:~/production/gnshare/gn-pjotr-test$
+git checkout lily
+git pull pjotr@lily.genenetwork.org:/gnshare/gn/
+```
+
+Menu sync
+
+```
+gn1@tux01:~/production/gnshare/gn-pjotr-test$
+scp pjotr@lily.genenetwork.org:/gnshare/gn/web/javascript/*.js web/javascript/
+git status
+```
+
+## Updating httpd.conf
+
+To update the httpd.conf you need to edit the system file in guix-bioinformatics.
+It can be built with
+
+```
+guix build -L ~/guix-past/modules/ -L ~/guix-bioinformatics/ -e '(@ (gn services gn1-httpd-config) GN1-httpd-config)'
+```
diff --git a/topics/genenetwork/starting_gn2_and_gn3.gmi b/topics/genenetwork/starting_gn2_and_gn3.gmi
new file mode 100644
index 0000000..1cfed14
--- /dev/null
+++ b/topics/genenetwork/starting_gn2_and_gn3.gmi
@@ -0,0 +1,52 @@
+# How to Start GN2 and GN3
+
+This document describes in short how we run GN2 and GN3 on the current production setup.
+
+Note that we should replace this with a system container.
+
+This details how GN2/GN3 production are currently started. It's probably a good idea to create a shell script for starting GN3 like we have for GN2 at some point, since currently environment variables are set manually.
+
+See also
+
+=> systems/gn-services.gmi
+
+# GN3
+
+GN2 depends on GN3 for REST services and libraries.
+
+## Environment
+
+Set PATH/PYTHONPATH/GN2_PROFILE environment variables
+
+Example:
+
+```
+export GN2_PROFILE=/home/zas1024/opt/gn-latest-20221206
+export PATH=$GN2_PROFILE/bin:$PATH
+export PYTHONPATH="$GN2_PROFILE/lib/python3.9/site-packages"
+```
+
+## Start development on port 8081
+
+Start GN3 from the relevant directory
+
+```
+env FLASK_DEBUG=1 FLASK_APP="main.py" CORS_ORIGINS="http://gn2-zach.genenetwork.org:*,https://gn2-zach.genenetwork.org:*,http://genenetwork.org:*,https://genenetwork.org:*" flask run --port=8081
+```
+
+GN3 has a settings.py file now. See the README.
+
+## Start production on port 8087
+
+```
+gn2@tux01:
+cd ~/gn3_production/genenetwork3
+gunicorn --bind 0.0.0.0:8087 --workers 8 --keep-alive 6000 --max-requests 10 --max-requests-jitter 5 --timeout 1200 wsgi:app
+```
+
+Note I had to comment out some oauth stuff on the latest.
+
+
+# GN2
+
+1.
Just run /home/gn2/production/run_production.sh
diff --git a/topics/genenetwork/temp-trait-submission.gmi b/topics/genenetwork/temp-trait-submission.gmi
new file mode 100644
index 0000000..7029e2a
--- /dev/null
+++ b/topics/genenetwork/temp-trait-submission.gmi
@@ -0,0 +1,11 @@
+# How to Submit a Temp trait (for testing purposes or otherwise)
+
+1. Click Submit Trait under the Intro dropdown in the header bar
+
+2. Select the species and group you want to submit for from the dropdowns under Step 1 (I just use Mouse/BXD for testing)
+
+3. Navigate to the following GN1 link to get a sample list of trait values (so in this case navigate down to BXD) - https://gn1.genenetwork.org/RIsample.html
+
+4. Copy and paste those values into the Step 2 text area
+
+5. Click Submit Trait (which should then take you to a trait page with the submitted sample values)
-- 
cgit v1.2.3