Diffstat (limited to 'tasks')
| -rw-r--r-- | tasks/alexm.gmi | 89 |
| -rw-r--r-- | tasks/bonfacem.gmi | 76 |
| -rw-r--r-- | tasks/felixl.gmi | 105 |
| -rw-r--r-- | tasks/hasitha.gmi | 37 |
| -rw-r--r-- | tasks/johannesm.gmi | 90 |
| -rw-r--r-- | tasks/machine-room.gmi | 2 |
6 files changed, 373 insertions, 26 deletions
diff --git a/tasks/alexm.gmi b/tasks/alexm.gmi
index 7ec8e87..e410046 100644
--- a/tasks/alexm.gmi
+++ b/tasks/alexm.gmi
@@ -75,6 +75,95 @@ You can refine the search by constraining the checks some more, e.g. to get high
 * masters ; submit documents
+## 9/06/2025
+
+* [x] no login for gnqna
+* [-] hsrat rqtl2 integration: follow up on the dumping genotypes files to lmdb
+
+
+## 16/06/2025
+
+* [x] implementation for no login for gnqa users.
+ see issue here: https://issues.genenetwork.org/issues/gnqa/implement-no-login-requirement-for-gnqa
+* [-] hsrat rqtl2 integration: follow up on the dumping genotypes files to lmdb with bons
+* [] create rqtl2 adapter for reading cross from lmdb
+
+
+## 23/06/2025
+
+* [-] focus on documentation/source code for read cross ;;add option for reading lmdb
+* [-] looking at thor an lmdb interface
+* [x] implementation of no login for gnqna users ;; TODO push this code to cd.
+* [x] for nologin llm provide the correct feedbacks to user if ai search does not meet criteria;; currently only static response `Please login to view AI generated summary`
+
+
+## 30/06/2025
+
+* [] rqtl2 lmdb interfac
+e
+
+## 7/7/2025
+
+* [x] guix packaging for thor package
+* [x] reading metadata from lmdb file using thor
+* [-] working on reconstructing the genotype files to geno, geno_map and pheno_map file
+* [x] rate limiting for gnqna users.
+* [x] check on tokens for no logged in users
+* [x] Look at isse about signing tokens for non logged in users
+
+
+## 21/7/2025
+* [x] working on reconstructing/reading the genotype files based on bons dumping script
+* [x] generate and validate cross objects
+
+## 29/7/2025
+* [x] adding founder_geno and pheno covariates pheno
+* [-] minor fixes for lmdb matrix script (missing metadata)
+* [-] check phenotype work for lmdb
+
+## 4/8/2025
+* [-] refactoring lmdb matrix script
+* [-] integrating fetching rqtl2 from lmdb using bxd as a test pilot
+
+
+## 11/8/2025
+
+* [x] integrating lmdb genotypes for rqtl2 computation for BXD
+
+
+## 18/8/2025
+
+* [x] integrating lmdb genotypes for rqtl2 computation for BXD
+
+## 25/8/2025
+* [x] script to dump phenotypes and cross metadata to lmdb
+
+## 31/8/2025
+
+* [x] generic script that can parse json cross files and dump the metadata to lmdb
+* [x] follow up on dumped phenotypes in LMDB for GeneNetwork cc @bons
+* [x] integrating lmdb rqtl2 adapter to CD (test this on cd for bxd.)
+
+## 23/9/2025
+* [x] dumping phenotypes to lmdb (BXDPublish)
+* [x] dumping cross metadata to lmdb
+
+## 30/9/2025
+* [x] profiling and benchmarking read_lmdb_cross against read_cross
+* [-] Run this on tux02 and integrate to CD
+
+
+## 7/9/2025
+
+* [-] integrate rqtl2-lmdb adapter to CD cc @bons with directory setup.
+* [x] improvement on rqtl2-lmdb adapter script; add covariates supports.
+
+
+## 15/9/2025
+
+* [-] clone qtl2 repo and build rqtl2-lmdb package locally.
+* [] package rqtl2-lmdb package to guix-bioinformatics.
+
 ## Next week(s)
 * [ ] Accelerate Xapian functionality - needs Aider key from Pjotr
diff --git a/tasks/bonfacem.gmi b/tasks/bonfacem.gmi
index 03848f1..2d56e72 100644
--- a/tasks/bonfacem.gmi
+++ b/tasks/bonfacem.gmi
@@ -4,41 +4,46 @@
 * kanban: bonfacem
 * assigned: bonfacem
-* status: in progress
 ## Tasks
 ### Note
-- GN-auth dashboard fixes. Follow up with Fred.
-- Case-attributes used in co-variates.
-- Encourage FahamuAI to be open.
+* Don't lose metadata. Have an array of disabled snips.
+* Store by snip (rows). Storage by marker. 2 different files.
+* gn-auth:
+ have wrappers around gn-auth (draw-back: folk may forget).
+ use Nginx as a MTM (re-routing calls). How to add handler in Nginx and to work with tokens.
+* GN-auth dashboard fixes. Follow up with Fred.
+* Case-attributes used in co-variates.
+
+### PhD Work
+
+* Concept note/ideas: Add all metadata in GN to an LLM that enriches GnQA.
+* Use mapping output as full vectors for gpt/transformers. Integrate this work into GN.
+* Share PhD concept note with PJ for polishing
 ### This week
-* [+] Case Attributes (Do a diagnostic and delegate)
-* - Git blame. Add tests.
-* - Error when checking the history.
-* - Reach out to Zach.
-* - Disable diff in the UI.
-* [ ] Distinct admin and dev user.
-* [ ] Adapter to LMDB into a cross object.
+* [] Look at deep-seek/anthropic (also really doc deployment in balg01). Run in debian machine.
+* [~] Adapter to LMDB into a cross object.
 * - Try computations with R/qtl2.
 * - Look at R LMDB libraries.
 * - Look at functions that read the files.
 * - PJ: LMDB adapter in R and cross-type files.
-* [ ] Send Arun an e-mail on how to go about upgrading shepherd.
-* [ ] Dump all genotypes from production to LMDB.
+* [~] gn-guile webhook.
+* [~] Dump all genotypes from tux02 to LMDB.
 * - PJ sync tux01 genotypes with tux02/04.
+* - Yet to set-up 2FA on new device
+
+### Later
+* [ ] Generate RDF docs using AI.
+* [ ] Editing genotype metadata
+* [ ] Look at XAPIAN search for gene alias.
+* [ ] Add GeneNetwork abstracts to XAPIAN search.
 * [+] Correlations hash.
 * - Add dataset count to RDF.
 * [ ] Spam + LLMs
 * - RateLimiting for Rif Editing.
-* - Honep Pot approach.
-* [+] Help Alex with SSL certification container error.
-* - Put the changes in the actual scm files.
-* [X] Python Fahamu.
-* [X] Memvid - brief look.
-
-### Later
+* - Honepot approach.
 * [ ] Dockerise GN container. For Harm.
 * [ ] Send emails when job fail.
 * [ ] Look at updating gn-auth/gn-libs to PYTHONPATH for gn2/3.
@@ -92,7 +97,8 @@ Currently closed issues are:
 * [X] lmdb publishdata output and share with Pjotr and Johannes
 ## Done
-
+* [X] Updated Penguin2 with cuda drivers. Cuda no longer supports K80s
+* [X] Provided Johannes anthropic tokens
 * [X] Add lmdb output hashes with index and export LMDB_DATA_DIRECTORY
 * [X] Share small database with @pjotrp and @felixl
 * [X] With Alex get rqtl2 demo going in CD (for BXD)
@@ -131,3 +137,31 @@ Currently closed issues are:
 * - Simplify (focus on small files). Don't over-rely on Numpy.
 * [X] Assess adding GeneRIF to LLM.
 * [X] Referrer headers -- a way of preventing bots beyond rate-limiting.
+* [X] Python Fahamu.
+* [X] Memvid - brief look.
+* [X] Encourage FahamuAI to be open.
+* Another paper with his group should be out this month
+* [X] Help Alex with SSL certification container error.
+* - Fix SSL issues in local container.
+* [X] Send Arun an e-mail on how to go about upgrading shepherd.
+* [X] Case Attributes.
+* - Git blame. Add tests. Fred.
+* - NOTE: Fixed the diffs. But there's an edge-case with BXD longevity (I haven't checked. Shared scripts)
+* - NOTE: Elpy broke. Eglot/lspemacs doesn't work.
+* - NOTE: Moved away from storing diffs in files to LMDB.
+* - Error when checking the history. Fixed by fixing the diffs.
+* - Reach out to Zach. NOTE: Timing differences.
+* - Disable diff in the UI - unnecessary.
+* [X] Added LMDB_PATH to dev container. Updated old commits.
+* [X] Merged no-login AI work that Alex did.
+* [X] Talk to Fred and hand over case-attributes.
+* [X] Distinct admin and dev user. [w/ Fred]
+* - Extra fluff to grant dev user access to everything.
+* [X] Merged rate-limiter.
+* [X] Look at slow running CD (look at issue tracker and be systematic).
+* [X] Fix CD. Build guix against a recent pinned profile.
+* [X] Fix CD tests.
+* [X] Look at different provider(s) for LLMs.
+* [X] Install OpenCL in Penguin2 and try LLM script. Check differences between OpenCL and CUDA. PJ installed CUDA in balg01.
+* [X] Look at container work. Look at permissions issue.
+* [X] Set-up wolfshead: Resolving dependency conflicts in Python. Using DSPY
diff --git a/tasks/felixl.gmi b/tasks/felixl.gmi
index 347f387..7a472a1 100644
--- a/tasks/felixl.gmi
+++ b/tasks/felixl.gmi
@@ -41,7 +41,7 @@
 * PhD tasks
 * [ X ] Complete and share concept note and timeline to supervisors, have a meeting for progress
- * [ ] Make a milestone on chapter one manuscript (deep dive into the selected papers){THE BIG PICTURE; a complete draft by early May}
+ * [+] Make a milestone on chapter one manuscript (deep dive into the selected papers){THE BIG PICTURE; a complete draft by early May}
 * Programming
 * [ ] Make a milestone with the uploader (really push and learn!)
@@ -116,9 +116,106 @@
 * [X] - document the new findings about smoothing using bcftools and plink
 * ## this week (09-06-onwards)
-* [ ] - identify start and end points for haplotypes in hs genotype files
-* [ ] - upload the final updates to gn2, test and see the results
-* [ ] - gn-uploader/uploader folder, explore
+* [+] - identify start and end points for haplotypes in hs genotype files
+* [+] - upload the final updates to gn2, test and see the results
+* [-] - gn-uploader/uploader folder, explore
+
+* ## this week (16-06-onwards)
+* [X] - hs rats proximal and distal haplotype edges
+* [+] - uploading kilifish using the backend route
+
+* ## this week (23-06-onwards)
+* [X] - hs rats recombination counts
+* [+] - kilifish to gn2 via backend
+
+* ## this week (30-06-onwards)
+* [ ] - mapping offsprings to founders (hs rats)
+* [ ] - upload kilifish to genenetwork
+* [ ] - revise celegans smoothing (genotypes)
+
+* ## this week (07-07-onwards)
+* [X] - generate haplotypes for offsprings and founders combined; intepretation next..,
+* [+] - keep improving the uploader via data uploading and error solving
+* [-] - close smoothing revision for celegans, as left before
+* [X] - why should people read my paper on improving genotyping methods?
+* - on smoothing (low density genotypes for mapping, high density genotypes for fine mapping.,)
+* - liftovers due to reference versions (currently, a challenge to be looked upon)
+* - founders and their offsprings in genotyping
+* - pangenomics and machine learning for improved genotyping
+
+** keys (+; in progress, X; done, -; not yet)
+* ## this week (14-07-onwards)
+* [+] - map founders to offspring, work with only pure recombiantions
+ [+] - tools available? (plink, rqtl2, beagle, etc)
+ [+] - custom pipeline, to reflect gaps in the existing tools? (dealing with multiparent species)
+ [+] - documentation for the paper write up
+
+* ## this week (21-07-onwards)
+* [ ] - HS rats smoothing continues
+* [ ] - documenting the milestones
+* [ ] - see the possibility to write a tool from it
+* [ ] - Pushing kilifish to genenetwork2/learn the source code build up
+* [ ] - resmoothen celegans genotypes with the new knowledge
+
+* ## this week (28-07-onwards)
+* [-] - predict genotype probabilities with rqlt2 functions
+ - problems with control setup to load in the needed files for the functions
+* [+] - comparison models for @individual rat vs 8 founders (similarities and percentage composition)
+ [+] - ongoing discussion with alex, there's progress
+
+* ## this week (04-08-onwards)
+* [+] - Testing the logic to infer Hs outbred genotypes with the founders
+ - Managed to identify parents of origin for each snp on each rat per position, corresponding to the 8 founders
+ - Still, need to filter in the disntictive snps, then generate haplo blocks.,
+
+* ## this week (11 - 08 - onwards)
+* [X] - generate final haplo file and document
+* [+] - testing on local gemma and in gn2
+
+* ## this week (18-08-onwards)
+* [+] - push for the file to be in gn2, and feedback from the team
+* [X] - complete the local gemma run, interpret the results
+* [+] - process the rest of the Xsomes for a ready file to go to gn2
+* - issues: over filtering snps, neglecting the one parent of origin, takes long to run.
+* [+] - prepare an abstract for CTC conference in Barcelona
+
+* ## this week (01-09-onWards)
+* [ ] - finetune abstract
+* - include more of what i achieved: main focus; genotype smoothing on models with complex traits
+* - thought map: generate plots, compare before and after smoothing, check for overlaps, and whether or not the peaks in traits are same before and after smoothing
+* [ ] - troubleshoot inferring scripts for all Xsomes
+* - request bonz/alex's help on this (to save time)
+
+
+* ## this week (30-06-onwards)
+* [X] - mapping offsprings to founders (hs rats)
+* [+] - upload kilifish to genenetwork
+* [-] - revise celegans smoothing (genotypes)
+
+* ## this week (07-07-onwards)
+* [X] - generate haplotypes for offsprings and founders combined; intepretation next..,
+* [+] - keep improving the uploader via data uploading and error solving
+* [-] - close smoothing revision for celegans, as left before
+* [X] - why should people read my paper on improving genotyping methods?
+* - on smoothing (low density genotypes for mapping, high density genotypes for fine mapping.,)
+* - liftovers due to reference versions (currently, a challenge to be looked upon)
+* - founders and their offsprings in genotyping
+* - pangenomics and machine learning for improved genotyping
+
+** keys (+; in progress, X; done, -; not yet)
+* ## this week (14-07-onwards)
+* [+] - map founders to offspring, work with only pure recombiantions
+ [+] - tools available? (plink, rqtl2, beagle, etc)
+ [+] - custom pipeline, to reflect gaps in the existing tools? (dealing with multiparent species)
+ [+] - documentation for the paper write up
+
+* ## this week (21-07-onwards)
+* [ ] - HS rats smoothing continues
+* [ ] - documenting the milestones
+* [ ] - see the possibility to write a tool from it
+* [ ] - Pushing kilifish to genenetwork2/learn the source code build up
+* [ ] - resmoothen celegans genotypes with the new knowledge
+
 ### Later weeks (non-programming tasks)
diff --git a/tasks/hasitha.gmi b/tasks/hasitha.gmi
new file mode 100644
index 0000000..fcef29b
--- /dev/null
+++ b/tasks/hasitha.gmi
@@ -0,0 +1,37 @@
+# Tasks for Hasitha
+
+## Tags
+
+* kanban: hasitha
+* assigned: hasitha
+* status: in progress
+
+## Tasks
+
+### Notes
+*
+
+### This week
+* [ ] Implementing CRAM encoding methods in GBAM
+* - [ ] ReadName tokenization pipeline - could save ~6% space using this technique. Need to work on decoding.
+* [ ] Starting off with Cigar compression sub project with Andrea
+* [ ] Discuss AGC and population compression with Andrea
+
+### Later
+* [ ]
+
+### Even later
+
+* [ ]
+
+### On Hold
+* [ ] GBAM reader using noodles
+
+## Done
+
+* [X] Moving GBAM python and rust stuff to C
+* [X] Fixing memory issues in C
+* [X] SAM input to GBAM in Rust
+* [X] agc-rs setup testing on M1 mac
+
+
diff --git a/tasks/johannesm.gmi b/tasks/johannesm.gmi
new file mode 100644
index 0000000..840bd3e
--- /dev/null
+++ b/tasks/johannesm.gmi
@@ -0,0 +1,90 @@
+# Tasks for Johannes
+
+## Tags
+
+* kanban: johannesm
+* assigned: johannesm
+* status: in progress
+
+## Tasks
+
+### Ongoing
+
+* [] Get system into use in GN
+
+* [] Draft outline paper
+
+* [] Talk with Hao about publication on agent system and gain over GNQA
+
+* [] Read literature for paper
+
+
+
+### Later
+
+* [~] Catch up on SPARQL
+
+* [~] Catch up on LMDB
+
+
+### Past
+
+* [X] Make RAG script available for reuse and clean it with Bonface
+
+* [X] Discuss with Bonface on how to get metadata for RAG
+
+* [X] Pickle RAG -> not successful :)
+
+* [X] Try out SPARQLWrapper
+
+* [X] Optimize RAG
+
+* [X] Integrate with RAG
+
+* [X] Get actual metadata with SPARQL endpoint
+
+* [X] Make RAG agentic -> AI system
+
+* [X] Visit precompute issue
+
+* [X] Find more affordable options for GNQA
+
+* [X] Replace GNQA backend with Bonz
+
+* [X] Test GPU
+
+* [X] Test new AI system
+
+* [X] Use GPU to make naturalization faster
+
+* [X] Fix bugs and optimize AI system
+
+* [X] Test and validate AI system working
+
+* [X] Read up on performance evaluation for AI systems
+
+* [X] Refetch all data from SPARQL
+
+* [X] Preproces RDF for improved naturalization
+
+* [X] Make asynchronous requests to server for naturalization
+
+* [X] Document work on issue tracker
+
+* [X] Package code AI system
+
+* [X] Draft API endpoint
+
+* [X] Test package and share with Bonz
+
+* [X] Test AI system with descriptions, qtl and real biology questions
+
+* [X] Compare performance of Claude and open model on GN data before finetuning
+
+* [X] Look into system finetuning
+
+* [X] Test performance gain
+
+* [X] Think about how to show responses of agent system and GNQA in one UI
+
+* [X] Get API working
diff --git a/tasks/machine-room.gmi b/tasks/machine-room.gmi
index 77f7b8e..d656f2f 100644
--- a/tasks/machine-room.gmi
+++ b/tasks/machine-room.gmi
@@ -63,7 +63,7 @@ Security:
 * [X] describe machines with Rick Stripes
 * [X] get bacchus back on line
 * [X] fix www.genenetwork.org and gn2.genenetwork.org https
-* [-] get data from summer211.uthsc.edu (access machine room)
+* [-] get data from summer211 (access machine room)
 * [X] VPN access and FoUT
 * [X] lambda: get fiber working
 * [X] lambda: add to Octopus HPC
