author | jgart | 2021-10-21 23:31:40 -0400 |
---|---|---|
committer | jgart | 2021-10-21 23:31:40 -0400 |
commit | 3adb467dbf239d12b2e5476abd44ab3d0ac1df57 (patch) | |
tree | a1d7bdf6e8c8424f33dbe6b543714f7261da079d /topics | |
parent | 352d9f223a811f49120003397b34f8d13fec17e9 (diff) | |
download | gn-gemtext-3adb467dbf239d12b2e5476abd44ab3d0ac1df57.tar.gz |
Add QC project
Diffstat (limited to 'topics')
-rw-r--r-- | topics/quality-control/qc.gmi | 33 |
1 files changed, 33 insertions, 0 deletions
diff --git a/topics/quality-control/qc.gmi b/topics/quality-control/qc.gmi
new file mode 100644
index 0000000..96c4604
--- /dev/null
+++ b/topics/quality-control/qc.gmi
@@ -0,0 +1,33 @@
+# Quality Control Project
+
+Develop an app with a web interface to automate the job of cleaning tsv data
+files for entry. The app would be used by a group of users on a network to
+upload data.
+
+QC should be embedded functionality of the data uploader that Bonface has written.
+
+* Upload data through REST API - it goes into a temp dir for a user (data is in
+  escrow) - Bonface wrote this already
+* Run QC - what Arthur proposes (start here)
+* Show results - run tools (hard part!)
+* User can say - please accept data (Bonface wrote this)
+* Curator accepts data (different person!) (Bonface wrote this)
+* Data gets piped into GN proper
+
+The QC step consists of
+
+* Standard checks - some GN tools, such as outliers
+* Run mapping
+
+So, even though the data is in 'escrow' we should be able to use it as
+something that is in the database. GN1 does some of that. This is
+where Arun comes in - we need to have a common handler for data that
+is in the database and data that is in escrow. My idea is that this
+will all be text files (truth files). A simple first QC step is to
+check that all fields in the table are numbers where should be. Not
+text.
+
+Note we could run QC through the REST API too. That would allow it to
+be run from R and Python and Jupyter notebooks. Make it part of GN3.
+
+The tricky part is still how the data is handled in escrow.
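
The note proposes, as a simple first QC step, checking that every field that should hold a number actually does. A minimal sketch of that check follows. It is not the project's implementation: the function name `find_non_numeric_cells`, the header row, the single leading identifier column, and the `NA`/empty missing-value markers are all assumptions made for illustration.

```python
import csv
import io


def find_non_numeric_cells(tsv_text, id_columns=1, missing=("", "NA")):
    """Return (line_number, column_name, value) for each non-numeric cell.

    Assumptions (hypothetical, not taken from the commit): the first row
    is a header, the first `id_columns` columns hold identifiers, and the
    values listed in `missing` are acceptable placeholders.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to check.
        return []

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for index, value in enumerate(row):
            if index < id_columns:
                continue  # identifier columns may contain text
            cell = value.strip()
            if cell in missing:
                continue  # tolerated missing-value marker
            try:
                float(cell)  # also accepts forms such as "1e5" and "nan"
            except ValueError:
                if index < len(header):
                    name = header[index]
                else:
                    name = f"column {index + 1}"
                problems.append((line_number, name, value))
    return problems
```

Calling `find_non_numeric_cells` on the text of an uploaded file returns an empty list when every data cell parses as a number, and otherwise one tuple per offending cell. A report like that could feed the "Show results" step. Working on plain text rather than database rows fits the note's idea that escrow data stays in text files.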