Diffstat (limited to 'tasks')
-rw-r--r--  tasks/alexm.gmi         89
-rw-r--r--  tasks/bonfacem.gmi      76
-rw-r--r--  tasks/felixl.gmi       105
-rw-r--r--  tasks/hasitha.gmi       37
-rw-r--r--  tasks/johannesm.gmi     90
-rw-r--r--  tasks/machine-room.gmi   2
6 files changed, 373 insertions, 26 deletions
diff --git a/tasks/alexm.gmi b/tasks/alexm.gmi
index 7ec8e87..e410046 100644
--- a/tasks/alexm.gmi
+++ b/tasks/alexm.gmi
@@ -75,6 +75,95 @@ You can refine the search by constraining the checks some more, e.g. to get high
 
 * masters ; submit documents 
 
+## 9/06/2025
+
+* [x] no login for gnqna
+* [-] hsrat rqtl2 integration: follow up on the dumping genotypes files to lmdb 
+
+
+## 16/06/2025
+
+* [x] implementation for  no login  for gnqa users.
+ see issue here: https://issues.genenetwork.org/issues/gnqa/implement-no-login-requirement-for-gnqa
+* [-] hsrat rqtl2 integration: follow up on the dumping genotypes files to lmdb with bons 
+* [] create rqtl2 adapter for reading cross from  lmdb 
+
+
+## 23/06/2025
+
+*  [-]  focus on documentation/source code  for read cross  ;;add option for reading lmdb
+*  [-]  looking at thor an lmdb interface
+*  [x] implementation of no login for gnqna users ;; TODO push this code to cd.
+* [x]  for nologin llm provide the correct feedbacks to user  if ai search does not meet criteria;; currently only static response `Please login to view AI generated summary`
+
+
+## 30/06/2025
+
+* [] rqtl2 lmdb interface
+
+## 7/7/2025
+
+* [x] guix packaging for thor package
+* [x] reading metadata from lmdb file using thor 
+* [-] working on reconstructing the genotype files to geno, geno_map and pheno_map file
+* [x] rate limiting for gnqna users.
+* [x]  check on tokens for no logged in users
+* [x]  Look at issue about signing tokens for non logged in users 
+
+
+## 21/7/2025
+* [x] working on reconstructing/reading  the genotype files based on bons dumping script 
+* [x]  generate and validate cross objects
+
+## 29/7/2025
+* [x] adding founder_geno and pheno covariates pheno
+* [-] minor fixes for lmdb matrix script (missing metadata)
+* [-] check phenotype work  for lmdb
+
+## 4/8/2025
+* [-] refactoring lmdb matrix script
+* [-] integrating fetching rqtl2 from lmdb using bxd as a test pilot
+
+
+##  11/8/2025
+
+* [x] integrating lmdb genotypes  for rqtl2 computation for BXD
+
+
+##  18/8/2025
+
+* [x] integrating lmdb genotypes for rqtl2 computation for BXD
+
+## 25/8/2025
+* [x] script to dump phenotypes and cross metadata to lmdb
+
+## 31/8/2025
+
+* [x] generic script that can  parse json cross files and dump the metadata to lmdb
+* [x] follow up on dumped phenotypes in LMDB for GeneNetwork cc @bons
+* [x] integrating lmdb rqtl2 adapter to CD (test this on cd for bxd.)
+
+## 23/9/2025
+* [x] dumping  phenotypes to lmdb (BXDPublish)
+* [x] dumping cross metadata to lmdb
+
+## 30/9/2025
+* [x]  profiling and benchmarking  read_lmdb_cross against read_cross
+* [-] Run this on tux02 and integrate to CD
+
+
+## 7/9/2025
+
+* [-] integrate rqtl2-lmdb adapter to CD cc @bons with directory setup.
+* [x] improvement on rqtl2-lmdb adapter script;  add covariates supports.
+
+
+## 15/9/2025
+
+* [-] clone qtl2 repo and  build rqtl2-lmdb package locally.
+* [] package rqtl2-lmdb package to guix-bioinformatics.
+
 ## Next week(s)
 
 * [ ] Accelerate Xapian functionality - needs Aider key from Pjotr
diff --git a/tasks/bonfacem.gmi b/tasks/bonfacem.gmi
index 03848f1..2d56e72 100644
--- a/tasks/bonfacem.gmi
+++ b/tasks/bonfacem.gmi
@@ -4,41 +4,46 @@
 
 * kanban: bonfacem
 * assigned: bonfacem
-* status: in progress
 
 ## Tasks
 
 ### Note
-- GN-auth dashboard fixes.  Follow up with Fred.
-- Case-attributes used in co-variates.
-- Encourage FahamuAI to be open.
+* Don't lose metadata.   Have an array of disabled snips.
+* Store by snip (rows).  Storage by marker.  2 different files.
+* gn-auth:
+  have wrappers around gn-auth (draw-back: folk may forget).
+  use Nginx as a MTM (re-routing calls).  How to add handler in Nginx and to work with tokens.
+* GN-auth dashboard fixes.  Follow up with Fred.
+* Case-attributes used in co-variates.
+
+### PhD Work
+
+* Concept note/ideas: Add all metadata in GN to an LLM that enriches GnQA.
+* Use mapping output as full vectors for gpt/transformers.  Integrate this work into GN.
+* Share PhD concept note with PJ for polishing
 
 ### This week
-* [+] Case Attributes (Do a diagnostic and delegate)
-*     - Git blame.  Add tests.
-*     - Error when checking the history.
-*     - Reach out to Zach.
-*     - Disable diff in the UI.
-* [ ] Distinct admin and dev user.
-* [ ] Adapter to LMDB into a cross object.
+* [] Look at deep-seek/anthropic (also really doc deployment in balg01).  Run in debian machine.
+* [~] Adapter to LMDB into a cross object.
 *     - Try computations with R/qtl2.
 *     - Look at R LMDB libraries.
 *     - Look at functions that read the files.
 *     - PJ: LMDB adapter in R and cross-type files.
-* [ ] Send Arun an e-mail on how to go about upgrading shepherd.
-* [ ] Dump all genotypes from production to LMDB.
+* [~] gn-guile webhook.
+* [~] Dump all genotypes from tux02 to LMDB.
 *     - PJ sync tux01 genotypes with tux02/04.
+*     - Yet to set-up 2FA on new device
+
+### Later
+* [ ] Generate RDF docs using AI.
+* [ ] Editing genotype metadata
+* [ ] Look at XAPIAN search for gene alias.
+* [ ] Add GeneNetwork abstracts to XAPIAN search.
 * [+] Correlations hash.
 *     - Add dataset count to RDF.
 * [ ] Spam + LLMs
 *     - RateLimiting for Rif Editing.
-*     - Honep Pot approach.
-* [+] Help Alex with SSL certification container error.
-*     - Put the changes in the actual scm files.
-* [X] Python Fahamu.
-* [X] Memvid - brief look.
-
-### Later
+*     - Honepot approach.
 * [ ] Dockerise GN container.   For Harm.
 * [ ] Send emails when job fail.
 * [ ] Look at updating gn-auth/gn-libs to PYTHONPATH for gn2/3.
@@ -92,7 +97,8 @@ Currently closed issues are:
 * [X] lmdb publishdata output and share with Pjotr and Johannes
 
 ## Done
-
+* [X] Updated Penguin2 with cuda drivers.  Cuda no longer supports K80s
+* [X] Provided Johannes anthropic tokens
 * [X] Add lmdb output hashes with index and export LMDB_DATA_DIRECTORY
 * [X] Share small database with @pjotrp and @felixl
 * [X] With Alex get rqtl2 demo going in CD (for BXD)
@@ -131,3 +137,31 @@ Currently closed issues are:
 *     - Simplify (focus on small files).  Don't over-rely on Numpy.
 * [X] Assess adding GeneRIF to LLM.
 * [X] Referrer headers -- a way of preventing bots beyond rate-limiting. 
+* [X] Python Fahamu.
+* [X] Memvid - brief look.
+* [X] Encourage FahamuAI to be open.
+*     Another paper with his group should be out this month
+* [X] Help Alex with SSL certification container error.
+*     - Fix SSL issues in local container.
+* [X] Send Arun an e-mail on how to go about upgrading shepherd.
+* [X] Case Attributes.
+*     - Git blame.  Add tests.  Fred.
+*     - NOTE: Fixed the diffs.  But there's an edge-case with BXD longevity (I haven't checked.  Shared scripts)
+*     - NOTE: Elpy broke.  Eglot/lspemacs doesn't work.
+*     - NOTE: Moved away from storing diffs in files to LMDB.
+*     - Error when checking the history.  Fixed by fixing the diffs.
+*     - Reach out to Zach.  NOTE: Timing differences.
+*     - Disable diff in the UI - unnecessary.
+* [X] Added LMDB_PATH to dev container.   Updated old commits.
+* [X] Merged no-login AI work that Alex did.
+* [X] Talk to Fred and hand over case-attributes.
+* [X] Distinct admin and dev user. [w/ Fred]
+*     - Extra fluff to grant dev user access to everything.
+* [X] Merged rate-limiter.
+* [X] Look at slow running CD (look at issue tracker and be systematic).
+* [X] Fix CD.  Build guix against a recent pinned profile.
+* [X] Fix CD tests.
+* [X] Look at different provider(s) for LLMs.
+* [X] Install OpenCL in Penguin2 and try LLM script.  Check differences between OpenCL and CUDA.  PJ installed CUDA in balg01.
+* [X] Look at container work.  Look at permissions issue.
+* [X] Set-up wolfshead: Resolving dependency conflicts in Python.  Using DSPY
diff --git a/tasks/felixl.gmi b/tasks/felixl.gmi
index 347f387..7a472a1 100644
--- a/tasks/felixl.gmi
+++ b/tasks/felixl.gmi
@@ -41,7 +41,7 @@
 
 * PhD tasks 
   * [ X ] Complete and share concept note and timeline to supervisors, have a meeting for progress 
-  * [ ] Make a milestone on chapter one manuscript (deep dive into the selected papers){THE BIG PICTURE; a complete draft by early May} 
+  * [+] Make a milestone on chapter one manuscript (deep dive into the selected papers){THE BIG PICTURE; a complete draft by early May} 
 
 * Programming 
   * [ ] Make a milestone with the uploader (really push and learn!) 
@@ -116,9 +116,106 @@
 * [X] - document the new findings about smoothing using bcftools and plink
 
 * ## this week (09-06-onwards) 
-* [ ] - identify start and end points for haplotypes in hs genotype files 
-* [ ] - upload the final updates to gn2, test and see the results 
-* [ ] - gn-uploader/uploader folder, explore 
+* [+] - identify start and end points for haplotypes in hs genotype files 
+* [+] - upload the final updates to gn2, test and see the results 
+* [-] - gn-uploader/uploader folder, explore 
+
+* ## this week (16-06-onwards)
+* [X] - hs rats proximal and distal haplotype edges
+* [+] - uploading kilifish using the backend route
+
+* ## this week (23-06-onwards)
+* [X] - hs rats recombination counts
+* [+] - kilifish to gn2 via backend
+
+* ## this week (30-06-onwards)
+* [ ] - mapping offsprings to founders (hs rats)
+* [ ] - upload kilifish to genenetwork
+* [ ] - revise celegans smoothing (genotypes)
+
+* ## this week (07-07-onwards) 
+* [X] - generate haplotypes for offsprings and founders combined; interpretation next.., 
+* [+] - keep improving the uploader via data uploading and error solving 
+* [-] - close smoothing revision for celegans, as left before 
+* [X] - why should people read my paper on improving genotyping methods? 
+*        - on smoothing (low density genotypes for mapping, high density genotypes for fine mapping.,)  
+*        - liftovers due to reference versions (currently, a challenge to be looked upon) 
+*        - founders and their offsprings in genotyping 
+*        - pangenomics and machine learning for improved genotyping 
+
+** keys (+; in progress, X; done, -; not yet) 
+* ## this week (14-07-onwards) 
+* [+] - map founders to offspring, work with only pure recombinations 
+     [+] - tools available? (plink, rqtl2, beagle, etc)  
+     [+] - custom pipeline, to reflect gaps in the existing tools? (dealing with multiparent species)  
+     [+] - documentation for the paper write up
+
+* ## this week (21-07-onwards) 
+* [ ] - HS rats smoothing continues 
+*    [ ] - documenting the milestones 
+*    [ ] - see the possibility to write a tool from it 
+* [ ] - Pushing kilifish to genenetwork2/learn the source code build up 
+* [ ] - resmoothen celegans genotypes with the new knowledge  
+
+* ## this week (28-07-onwards)
+* [-] - predict genotype probabilities with rqtl2 functions
+       - problems with control setup to load in the needed files for the functions
+* [+] - comparison models for @individual rat vs 8 founders (similarities and percentage composition)
+     [+] - ongoing discussion with alex, there's progress
+
+* ## this week (04-08-onwards)
+* [+] - Testing the logic to infer Hs outbred genotypes with the founders 
+           - Managed to identify parents of origin for each snp on each rat per position, corresponding to the 8 founders
+           - Still, need to filter in the distinctive snps, then generate haplo blocks., 
+
+* ## this week (11 - 08 - onwards) 
+* [X] - generate final haplo file and document 
+* [+] - testing on local gemma and in gn2 
+
+* ## this week (18-08-onwards) 
+* [+] - push for the file to be in gn2, and feedback from the team 
+* [X] - complete the local gemma run, interpret the results 
+* [+] - process the rest of the Xsomes for a ready file to go to gn2 
+*       - issues: over filtering snps, neglecting the one parent of origin, takes long to run.  
+* [+] - prepare an abstract for CTC conference in Barcelona  
+
+* ## this week (01-09-onWards) 
+* [ ] - finetune abstract 
+*       - include more of what i achieved: main focus; genotype smoothing on models with complex traits 
+*       - thought map: generate plots, compare before and after smoothing, check for overlaps, and whether or not the peaks in traits are same before and after smoothing 
+* [ ] - troubleshoot inferring scripts for all Xsomes 
+*       - request bonz/alex's help on this (to save time) 
+
+
+* ## this week (30-06-onwards)  
+* [X] - mapping offsprings to founders (hs rats) 
+* [+] - upload kilifish to genenetwork 
+* [-] - revise celegans smoothing (genotypes) 
+
+* ## this week (07-07-onwards) 
+* [X] - generate haplotypes for offsprings and founders combined; interpretation next.., 
+* [+] - keep improving the uploader via data uploading and error solving 
+* [-] - close smoothing revision for celegans, as left before 
+* [X] - why should people read my paper on improving genotyping methods? 
+*        - on smoothing (low density genotypes for mapping, high density genotypes for fine mapping.,)  
+*        - liftovers due to reference versions (currently, a challenge to be looked upon) 
+*        - founders and their offsprings in genotyping 
+*        - pangenomics and machine learning for improved genotyping 
+
+** keys (+; in progress, X; done, -; not yet) 
+* ## this week (14-07-onwards) 
+* [+] - map founders to offspring, work with only pure recombinations 
+     [+] - tools available? (plink, rqtl2, beagle, etc)  
+     [+] - custom pipeline, to reflect gaps in the existing tools? (dealing with multiparent species)  
+     [+] - documentation for the paper write up
+
+* ## this week (21-07-onwards) 
+* [ ] - HS rats smoothing continues 
+*    [ ] - documenting the milestones 
+*    [ ] - see the possibility to write a tool from it 
+* [ ] - Pushing kilifish to genenetwork2/learn the source code build up 
+* [ ] - resmoothen celegans genotypes with the new knowledge  
+
 
 ### Later weeks (non-programming tasks)
 
diff --git a/tasks/hasitha.gmi b/tasks/hasitha.gmi
new file mode 100644
index 0000000..fcef29b
--- /dev/null
+++ b/tasks/hasitha.gmi
@@ -0,0 +1,37 @@
+# Tasks for Hasitha
+
+## Tags
+
+* kanban: hasitha
+* assigned: hasitha
+* status: in progress
+
+## Tasks
+
+### Notes
+* 
+
+### This week
+* [ ] Implementing CRAM encoding methods in GBAM
+*       - [ ] ReadName tokenization pipeline - could save ~6% space using this technique. Need to work on decoding.
+* [ ] Starting off with Cigar compression sub project with Andrea
+* [ ] Discuss AGC and population compression with Andrea
+
+### Later
+* [ ] 
+
+### Even later
+
+* [ ]
+
+### On Hold
+* [ ] GBAM reader using noodles
+
+## Done
+
+* [X] Moving GBAM python and rust stuff to C
+* [X] Fixing memory issues in C
+* [X] SAM input to GBAM in Rust
+* [X] agc-rs setup testing on M1 mac
+
+
diff --git a/tasks/johannesm.gmi b/tasks/johannesm.gmi
new file mode 100644
index 0000000..840bd3e
--- /dev/null
+++ b/tasks/johannesm.gmi
@@ -0,0 +1,90 @@
+# Tasks for Johannes
+
+## Tags
+
+* kanban: johannesm
+* assigned: johannesm
+* status: in progress
+
+## Tasks
+
+### Ongoing
+
+* [] Get system into use in GN
+
+* [] Draft outline paper
+
+* [] Talk with Hao about publication on agent system and gain over GNQA
+
+* [] Read literature for paper
+
+
+
+### Later
+
+* [~] Catch up on SPARQL
+
+* [~] Catch up on LMDB
+
+
+### Past
+
+* [X] Make RAG script available for reuse and clean it with Bonface
+
+* [X] Discuss with Bonface on how to get metadata for RAG
+
+* [X] Pickle RAG -> not successful :)
+
+* [X] Try out SPARQLWrapper
+
+* [X] Optimize RAG
+
+* [X] Integrate with RAG
+
+* [X] Get actual metadata with SPARQL endpoint
+
+* [X] Make RAG agentic -> AI system
+
+* [X] Visit precompute issue
+
+* [X] Find more affordable options for GNQA
+
+* [X] Replace GNQA backend with Bonz
+
+* [X] Test GPU
+
+* [X] Test new AI system
+
+* [X] Use GPU to make naturalization faster
+
+* [X] Fix bugs and optimize AI system
+
+* [X] Test and validate AI system working
+
+* [X] Read up on performance evaluation for AI systems
+
+* [X] Refetch all data from SPARQL
+
+* [X] Preprocess RDF for improved naturalization
+
+* [X] Make asynchronous requests to server for naturalization
+
+* [X] Document work on issue tracker
+
+* [X] Package code AI system
+
+* [X] Draft API endpoint
+
+* [X] Test package and share with Bonz
+
+* [X] Test AI system with descriptions, qtl and real biology questions
+
+* [X] Compare performance of Claude and open model on GN data before finetuning
+
+* [X] Look into system finetuning
+
+* [X] Test performance gain
+
+* [X] Think about how to show responses of agent system and GNQA in one UI
+
+* [X] Get API working
diff --git a/tasks/machine-room.gmi b/tasks/machine-room.gmi
index 77f7b8e..d656f2f 100644
--- a/tasks/machine-room.gmi
+++ b/tasks/machine-room.gmi
@@ -63,7 +63,7 @@ Security:
 * [X] describe machines with Rick Stripes
 * [X] get bacchus back on line
 * [X] fix www.genenetwork.org and gn2.genenetwork.org https
-* [-] get data from summer211.uthsc.edu (access machine room)
+* [-] get data from summer211 (access machine room)
 * [X] VPN access and FoUT
 * [X] lambda: get fiber working
 * [X] lambda: add to Octopus HPC