# GNSoC 2023: GN Summer of Code

## Introduction

We are running a GN Summer of Code in small teams.

* Runs July + August 2023
* Weekly plenary where projects present progress: Thursday 9am EU, 10am EAT
* Projects should take participants (slightly) out of their comfort zone
* Use gemtext documentation
* Option to publish progress by the end as a blog post or on BioHackrXiv

## CI for guix-bioinformatics (guix pull)

* lead: Arun
* team: Efraim, Pjotr, Sarthak

Making GN deployment rock solid.

git repos: genenetwork-machines, guix-bioinformatics

=> ../../issues/gnsoc-ci-rethink Tracking progress

### Week 1

* Proposal written
* guix pull on guix-bioinformatics: updating guix broke gemma (Pjotr)
* Efraim: guix for GN2, so we can have a channel
* Next step: build substitutes for guix-bioinformatics; once built, they are shared
* And create a GN3 channel (Efraim?)

### Week 2

* RISC-V port progressing with node and zig 0.10
* guix-bioinformatics now has CI!
* ~700 packages, 240 of which are broken ;)

=> https://ci.genenetwork.org/jobs/guix-bioinformatics guix-bioinformatics CI jobs

* CI/CD is up and running again (and broken)
* Rethink: channels and pull channels are used for CI/CD
* Move unused packages elsewhere

## Next-gen databases (lmdb + RDF)

* lead: Bonface
* team: Fred, Alex
* contact: Pjotr

git repo: genenetwork3

=> ../../topics/next-gen-databases/design-doc Design doc

### Week 1

* RDF dumps
* Parsing S-exp -> markdown
* Hashing tables for automated updates (Fred)
* Some progress on moving sample data from SQL -> lmdb (Alex)
* Next week: guile bindings for lmdb; improving RDF

### Week 2

* RDF structure to markdown dump
* Fred is running SPARQL queries
* Alex is adding lmdb phenotype API endpoints

## LLMs & metadata (RDF)

* lead: Shelby
* team: Priscilla
* contact: Bonface, Pjotr, Rupert

=> ../lmms/llm-metadata Tracking progress

### Week 1

* Created issue page
* Downloading publications (Priscilla)
* Flask server
* Next: connecting OpenAI
* Create Matrix room

### Week 2

* OpenAI API is working
* Shelby is integrating it into a Flask interface for GeneNetwork
* Using a PubMed-style UI

## API to access data from GN

* lead: Rupert
* team: Flavia
* contacts: Bonface, Zach, Fred

Documentation and adding endpoints.

git repos: gn-docs & genenetwork3, plus SPARQL

=> https://github.com/genenetwork/gn-docs gn-docs repository

### Week 1

* Mapping out the API

=> https://github.com/genenetwork/gn-docs/blob/master/api/questions-to-ask-GN.md Questions to ask GN

* Ideas on structuring
* Questions on GN
* Next: unify access to information
* Collect questions from users
* Settle on the form of the API
* Create example URLs for mouse

### Week 2

* GraphQL: Arun gave a mumi demo; the schema allows for (partial) queries and for querying the schema itself
* Pjotr gave a convenience API demo: add endpoints in results
* Flavia added questions in gn-docs, e.g. for synteny search

=> https://issues.genenetwork.org/topics/xapian-search-queries Examples for synteny

## Editing data

* lead: Fred
* team: Arthur, Rupert, Zach
* contacts: Rob

### Week 1

* Editing phenotype metadata works
* Next: editing phenotype values and testing on the live site

### Week 2

* Fixing issues
* Meeting with Arthur and Zach on their requirements

## Guix parametrization

* lead: Sarthak
* team: Pjotr, Gabor
* contacts: Ludo

### Week 1

=> https://blog.lispy.tech/parameterized-packages-an-update.html Parameterized packages: an update (blog)

* Next: focus on statically built packages optimized for a given architecture

### Week 2

* Looking into the GeneNetwork3 service
* Enumerated types

## Links

* The Matrix room is GNSoC2023

=> https://fosdem.org/2023/schedule/event/tissue/ Arun's talk on our issue tracker
=> https://github.com/genenetwork/gn-gemtext-threads Git repo for issues/tasks/topics
=> https://issues.genenetwork.org/topics/biohackathon/GNGSoC2023 This page

For more info contact pjotr.public912 at thebird.nl