From f71c760fab17446605d9f1d101a759a1a391dfc7 Mon Sep 17 00:00:00 2001
From: Pjotr Prins
Date: Sun, 3 Dec 2023 09:49:19 -0600
Subject: Renames

---
 topics/add-metadata-to-trait-page.gmi             |  61 ----------
 topics/automated-testing.gmi                      | 136 ----------------------
 topics/genenetwork/add-metadata-to-trait-page.gmi |  61 ++++++++++
 topics/programming/automated-testing.gmi          | 136 ++++++++++++++++++++++
 4 files changed, 197 insertions(+), 197 deletions(-)
 delete mode 100644 topics/add-metadata-to-trait-page.gmi
 delete mode 100644 topics/automated-testing.gmi
 create mode 100644 topics/genenetwork/add-metadata-to-trait-page.gmi
 create mode 100644 topics/programming/automated-testing.gmi

diff --git a/topics/add-metadata-to-trait-page.gmi b/topics/add-metadata-to-trait-page.gmi
deleted file mode 100644
index 8e2d433..0000000
--- a/topics/add-metadata-to-trait-page.gmi
+++ /dev/null
@@ -1,61 +0,0 @@
-# Add Metadata To The Trait Page (RDF)
-
-Fri 30 Sep 2022 11:48:41 EAT
-
-## Introduction
-
-We are migrating the GN2 relational database to a plain text and RDF database. Matrix-like data (E.g. fetching sample data for a given data) will be stored inside GN.
-
-So far, we are able to convert the sql data to rdf using "dump.scm" defined in:
-
-=> https://github.com/genenetwork/dump-genenetwork-database
-
-## What are we trying to solve?
-
-Data stored in genenetwork resembles a tree. As an example: we have several species; each of these species belong to a group; each group belongs to a "data type"; and each data type belongs to a particular dataset. The first step: capturing - albeit requiring more refinement - this data in RDF has been achieved using the aformentioned scheme script.
-
-The overall goal is to be ablet to:
-
-* Incrementally replace MySQL queries with RDF.
-
-* Annotating existing data with metadata that does not yet exist in GN2.
-
-## Goals
-
-In the Trait Analysis page, for example:
-
-=> https://genenetwork.org/show_trait?trait_id=1434280_at&dataset=HC_M2_0606_P
-
-and the corresponding GN1 link:
-
-=> http://gn1.genenetwork.org/webqtl/main.py?cmd=show&db=HC_M2_0606_P&probeset=1434280_at
-
-which on further inspection presents metadata on that specific dataset group here:
-
-=> http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112
-
-We notice that there's metadata in GN1 - which we have in RDF - that we can add to the GN2 traits page. As such, this design doc will be limited to using RDF to:
-
-* Append metadata about the tissue
-* Append relevant metadata about the dataset group, in particular: about the data values and it's processing; about the array platform; experiment type; and contributors.
-
-Beyond querying metadata, this design doc also proposes the creation of a monadic rdf-fetch similar to what happens in:
-
-=> https://issues.genenetwork.org/topics/maybe-monad
-
-### Non-goals
-
-* Refactoring base classes/sql to solely use RDF.
-* Using federated queries - they are slow.
-* Writing a script in Guile to fetch and append extra metadata from wikidata and insert them into RDF as extra nodes. This should be tackled as a separate issue.
-
-## Actual Design
-
-* Rewrite the existing way of fetching RDF using pymonads.
-* React to the change-amplification - should any exist - caused by the above change and add tests where feasible.
-* Create endpoints to add extra annotations for Tissue, Dataset Group, Dataset Values and Processing; array platform; experiment type; and contributors.
-* Add metadata as links, tooltips, or html tag to the relevant html section(s).
-
-## Resources
-
-=> https://www.linkedin.com/pulse/six-secret-sparql-ninja-tricks-kurt-cagle/
diff --git a/topics/automated-testing.gmi b/topics/automated-testing.gmi
deleted file mode 100644
index 6b22423..0000000
--- a/topics/automated-testing.gmi
+++ /dev/null
@@ -1,136 +0,0 @@
-# Automated Testing
-
-## Tags
-
-* keywords: testing, CI, CD
-
-## Introduction
-
-As part of the
-=> ../systems/ci-cd.gmi CI/CD effort
-there is need for automated tests to ensure that the system is working as expected.
-
-This document is meant to track the implementation of the automated tests and possibly the related infrastructure for running the tests.
-
-## Genenetwork 3
-
-### Unit Tests
-
-There is a collection of unit tests in the *tests/unit* in the
-=> https://github.com/genenetwork/genenetwork3 Genenetwork 3 repository
-
-### Integration
-
-There is (as of 2022-Feb-10) an
-=> https://github.com/genenetwork/genenetwork3/tree/main/tests/integration integration tests directory
-in the Genenetwork 3 repository.
-
-The tests there, however, are technically unit tests. Each test seems to test a single logical unit of the system e.g. correlations, gemma, etc.
-
-There is no test that seems to check for interactions among the logical units/modules of the system e.g.
-
-* authorisation <==> file-upload <==> correlations
-* partial-correlations <==> trait-editting <==> gemma analysis
-
-etc.
-
-### API Tests
-
-There is need for tests to ensure that all expected endpoints are up and running.
-
-Maybe even check that the data is correct.
-
-### Performance and Responsiveness
-
-There is a need to ensure that the system does not take forever to compute stuff.
-
-There is a single performance tests module in
-=> https://github.com/genenetwork/genenetwork3/tree/main/tests/performance the performance tests directory
-for Genenetwork 3 but it is run manually, and mostly tests a very specific query that might or might not have been used in the code.
-
-The performance tests in GN3 should probably be focussed on checking the following (among others):
-
-* Each API endpoint responds within a specified amount of time
-* Select computation-heavy functions respond within a specified amount of time for given data
-* Database-querying functions used in the system respond within specified amount of time
-
-etc.
-
-This is relevant since GN3 is behind Nginx which defines a timeout.
-
-### Regression Tests
-
-Checks that previously working features are not broken. These can be added as we go along
-
-## Genenetwork 2
-
-### Unit Tests
-
-Present under the
-=> https://github.com/genenetwork/genenetwork2/tree/testing/wqflask/tests/unit unit tests directory
-in the GN2 repository.
-
-### Integration
-
-Genenetwork 2 has a "Mechanical Rob" testing system that is under construction whose purpose (as far as I - fredm - can tell) is to "walk" some common paths that have multiple logical units working together, thus performing some form of integration testing.
-
-The only issue I (fredm) find in that as it is currently, it will not be able to test javascript interactions that are crucial to some operations in certain flows.
-
-### Performance and Responsiveness
-
-Since GN2 is not meant to handle computations itself, the bigger concern here is responsiveness.
-
-There might need to be checks for responsiveness built in.
-
-### Regression Tests
-
-Checks that previously working features are not broken. These can be added as we go along
-
-### Notes from Email Correspondence
-
-Selenium (and other browser-automation tools) were said to be too complicated, and are to be avoided as much as possible. It is better to use headless Firefox/Chromium and fetch pages with Mechanical Rob.
-
-Selenium had been previously introduced in GN2 and then swiftly removed.
-
-
-=> https://github.com/genenetwork/gn-docs/blob/master/scripts/screenshot.rb sample script to create screen shots
-
-
-=> https://github.com/genenetwork/gn-deploy-servers/blob/master/scripts/rabbit/monitor_websites.sh tests currently run to monitor GN2 and end points
-
-We should cover the GN2/GN3 ones.
-
-
-For GN2 we should write scripts that test:
-
-* [ ] main menu selector
-* [ ] search
-* [ ] global search
-* [ ] search with wild cards
-* [ ] select items and add to shopping basket/collections
-* [ ] mapping page
-* [ ] run R/qtl mapping
-* [ ] run GEMMA mapping
-* [ ] run correlations
-* [ ] Run the functions on the shopping basket/collections, such as CTL, network graph
-
-
-> Please don't emulate hitting browser buttons (selenium style). Simply
-> find URL paths that do the job. You may have to set cookies.
->
-> Mechanical Rob can do that.
->
-> If using the browser interface is too hard, then create 'back-end'
-> tests that cover the real functionality. I am not too concerned about
-> the browser display - i.e., real Rob catches that quickly enough. I am
-> *really* worried about regressions where search etc. starts giving
-> different results.
->
-> See what I mean?
->
-> Pj.
-
-
-## Testing interface
-
-Tests in different categories should be grouped into different command-line endpoints. For example, unit tests could be run by "python3 setup.py check", integration tests could be run by "python3 setup.py integration-check", performance tests could be run by "python3 setup.py performance-check", and so on. This way, the CI will have to be configured only once, and then committers will be able to add new tests without requesting for a CI reconfiguration each time. We won't have to wait on others to respond. Less coordination will be required leading to smoother work for everyone.
diff --git a/topics/genenetwork/add-metadata-to-trait-page.gmi b/topics/genenetwork/add-metadata-to-trait-page.gmi
new file mode 100644
index 0000000..8e2d433
--- /dev/null
+++ b/topics/genenetwork/add-metadata-to-trait-page.gmi
@@ -0,0 +1,61 @@
+# Add Metadata To The Trait Page (RDF)
+
+Fri 30 Sep 2022 11:48:41 EAT
+
+## Introduction
+
+We are migrating the GN2 relational database to a plain text and RDF database. Matrix-like data (E.g. fetching sample data for a given data) will be stored inside GN.
+
+So far, we are able to convert the sql data to rdf using "dump.scm" defined in:
+
+=> https://github.com/genenetwork/dump-genenetwork-database
+
+## What are we trying to solve?
+
+Data stored in genenetwork resembles a tree. As an example: we have several species; each of these species belong to a group; each group belongs to a "data type"; and each data type belongs to a particular dataset. The first step: capturing - albeit requiring more refinement - this data in RDF has been achieved using the aformentioned scheme script.
+
+The overall goal is to be ablet to:
+
+* Incrementally replace MySQL queries with RDF.
+
+* Annotating existing data with metadata that does not yet exist in GN2.
+
+## Goals
+
+In the Trait Analysis page, for example:
+
+=> https://genenetwork.org/show_trait?trait_id=1434280_at&dataset=HC_M2_0606_P
+
+and the corresponding GN1 link:
+
+=> http://gn1.genenetwork.org/webqtl/main.py?cmd=show&db=HC_M2_0606_P&probeset=1434280_at
+
+which on further inspection presents metadata on that specific dataset group here:
+
+=> http://gn1.genenetwork.org/webqtl/main.py?FormID=sharinginfo&GN_AccessionId=112
+
+We notice that there's metadata in GN1 - which we have in RDF - that we can add to the GN2 traits page. As such, this design doc will be limited to using RDF to:
+
+* Append metadata about the tissue
+* Append relevant metadata about the dataset group, in particular: about the data values and it's processing; about the array platform; experiment type; and contributors.
+
+Beyond querying metadata, this design doc also proposes the creation of a monadic rdf-fetch similar to what happens in:
+
+=> https://issues.genenetwork.org/topics/maybe-monad
+
+### Non-goals
+
+* Refactoring base classes/sql to solely use RDF.
+* Using federated queries - they are slow.
+* Writing a script in Guile to fetch and append extra metadata from wikidata and insert them into RDF as extra nodes. This should be tackled as a separate issue.
+
+## Actual Design
+
+* Rewrite the existing way of fetching RDF using pymonads.
+* React to the change-amplification - should any exist - caused by the above change and add tests where feasible.
+* Create endpoints to add extra annotations for Tissue, Dataset Group, Dataset Values and Processing; array platform; experiment type; and contributors.
+* Add metadata as links, tooltips, or html tag to the relevant html section(s).
+
+## Resources
+
+=> https://www.linkedin.com/pulse/six-secret-sparql-ninja-tricks-kurt-cagle/
diff --git a/topics/programming/automated-testing.gmi b/topics/programming/automated-testing.gmi
new file mode 100644
index 0000000..6b22423
--- /dev/null
+++ b/topics/programming/automated-testing.gmi
@@ -0,0 +1,136 @@
+# Automated Testing
+
+## Tags
+
+* keywords: testing, CI, CD
+
+## Introduction
+
+As part of the
+=> ../systems/ci-cd.gmi CI/CD effort
+there is need for automated tests to ensure that the system is working as expected.
+
+This document is meant to track the implementation of the automated tests and possibly the related infrastructure for running the tests.
+
+## Genenetwork 3
+
+### Unit Tests
+
+There is a collection of unit tests in the *tests/unit* in the
+=> https://github.com/genenetwork/genenetwork3 Genenetwork 3 repository
+
+### Integration
+
+There is (as of 2022-Feb-10) an
+=> https://github.com/genenetwork/genenetwork3/tree/main/tests/integration integration tests directory
+in the Genenetwork 3 repository.
+
+The tests there, however, are technically unit tests. Each test seems to test a single logical unit of the system e.g. correlations, gemma, etc.
+
+There is no test that seems to check for interactions among the logical units/modules of the system e.g.
+
+* authorisation <==> file-upload <==> correlations
+* partial-correlations <==> trait-editting <==> gemma analysis
+
+etc.
+
+### API Tests
+
+There is need for tests to ensure that all expected endpoints are up and running.
+
+Maybe even check that the data is correct.
+
+### Performance and Responsiveness
+
+There is a need to ensure that the system does not take forever to compute stuff.
+
+There is a single performance tests module in
+=> https://github.com/genenetwork/genenetwork3/tree/main/tests/performance the performance tests directory
+for Genenetwork 3 but it is run manually, and mostly tests a very specific query that might or might not have been used in the code.
+
+The performance tests in GN3 should probably be focussed on checking the following (among others):
+
+* Each API endpoint responds within a specified amount of time
+* Select computation-heavy functions respond within a specified amount of time for given data
+* Database-querying functions used in the system respond within specified amount of time
+
+etc.
+
+This is relevant since GN3 is behind Nginx which defines a timeout.
+
+### Regression Tests
+
+Checks that previously working features are not broken. These can be added as we go along
+
+## Genenetwork 2
+
+### Unit Tests
+
+Present under the
+=> https://github.com/genenetwork/genenetwork2/tree/testing/wqflask/tests/unit unit tests directory
+in the GN2 repository.
+
+### Integration
+
+Genenetwork 2 has a "Mechanical Rob" testing system that is under construction whose purpose (as far as I - fredm - can tell) is to "walk" some common paths that have multiple logical units working together, thus performing some form of integration testing.
+
+The only issue I (fredm) find in that as it is currently, it will not be able to test javascript interactions that are crucial to some operations in certain flows.
+
+### Performance and Responsiveness
+
+Since GN2 is not meant to handle computations itself, the bigger concern here is responsiveness.
+
+There might need to be checks for responsiveness built in.
+
+### Regression Tests
+
+Checks that previously working features are not broken. These can be added as we go along
+
+### Notes from Email Correspondence
+
+Selenium (and other browser-automation tools) were said to be too complicated, and are to be avoided as much as possible. It is better to use headless Firefox/Chromium and fetch pages with Mechanical Rob.
+
+Selenium had been previously introduced in GN2 and then swiftly removed.
+
+
+=> https://github.com/genenetwork/gn-docs/blob/master/scripts/screenshot.rb sample script to create screen shots
+
+
+=> https://github.com/genenetwork/gn-deploy-servers/blob/master/scripts/rabbit/monitor_websites.sh tests currently run to monitor GN2 and end points
+
+We should cover the GN2/GN3 ones.
+
+
+For GN2 we should write scripts that test:
+
+* [ ] main menu selector
+* [ ] search
+* [ ] global search
+* [ ] search with wild cards
+* [ ] select items and add to shopping basket/collections
+* [ ] mapping page
+* [ ] run R/qtl mapping
+* [ ] run GEMMA mapping
+* [ ] run correlations
+* [ ] Run the functions on the shopping basket/collections, such as CTL, network graph
+
+
+> Please don't emulate hitting browser buttons (selenium style). Simply
+> find URL paths that do the job. You may have to set cookies.
+>
+> Mechanical Rob can do that.
+>
+> If using the browser interface is too hard, then create 'back-end'
+> tests that cover the real functionality. I am not too concerned about
+> the browser display - i.e., real Rob catches that quickly enough. I am
+> *really* worried about regressions where search etc. starts giving
+> different results.
+>
+> See what I mean?
+>
+> Pj.
+
+
+## Testing interface
+
+Tests in different categories should be grouped into different command-line endpoints. For example, unit tests could be run by "python3 setup.py check", integration tests could be run by "python3 setup.py integration-check", performance tests could be run by "python3 setup.py performance-check", and so on. This way, the CI will have to be configured only once, and then committers will be able to add new tests without requesting for a CI reconfiguration each time. We won't have to wait on others to respond. Less coordination will be required leading to smoother work for everyone.
--
cgit v1.2.3