# GNSoC 2023

GN Summer of Code

## Introduction

We are running a GN Summer of Code in small teams.

* Runs July + August 2023
* Weekly plenary where projects present progress - Thursday 9 am EU, 10 am EAT.
* Projects should be (slightly) out of comfort zone
* Use gemtext documentation
* Option to publish on progress by the end as a BLOG or on BioHackrXiv

## CI for guix-bioinformatics (guix pull)

* lead: Arun
* team: Efraim, Pjotr, Sarthak

Making GN deployment rock solid

git repo genenetwork-machines, guix-bioinformatics

=> ../../issues/gnsoc-ci-rethink Tracking progress

### Week 1

* Proposal written
  - guix pull on guix-bioinformatics
  - gemma is broken under updated guix (Pjotr)
* Efraim guix GN2 - so we can have a channel
* Next step: build substitutes for guix-bioinformatics
  - once built they are shared
* And create a GN3 channel (Efraim?)

### Week 2

* RISC-V port progressing with node and zig 0.10
* guix-bioinformatics now has CI!
* ~700 packages, 240 are broken ;)

### Week 3

* Arun gives a presentation on laminar using guix-forge. Slides:
=> https://issues.genenetwork.org/topics/ci-rethink-slides
* New system is simpler and has reproducibility issues
* Efraim is doing channels for GN2 and GN3
* Sarthak will try to run guix-forge

### Week 4

* Arun added channels to CI
* localhost
* cgit
* Efraim: gn2 -> channel; Arun tested

### Week 5

* Added Klaus server for git
* Fixed channels to work with Python 3.9 instead of 3.10

### Week 6

* Towards a new workbench with cgit support

### Week 7

* Arun has cgit running on his own server - soon bringing up tux02 after resolving https

### Week 8

* cgit deployment at https://git.genenetwork.org/
* guix forge

### Week 9

* Discussion on importers for guix forge
* Talked about propagator networks - part of Seattle presentation

### More

=> https://ci.genenetwork.org/jobs/guix-bioinformatics

* CI/CD is up and running again (and broken)
* Rethink: channels and pull channels are used for CI/CD
* Move unused packages elsewhere

## Nextgen databases

lmdb+RDF

* lead: Bonface
* team: Fred, Alex
* contact: Pjotr

git repo genenetwork3

=> ../../topics/next-gen-databases/design-doc Design doc

### Week 1

* RDF dumps
* Parsing S-exp -> markdown
* Hashing tables (Fred)
  - automated updates
* Some progress on sample data from SQL -> lmdb (Alex)
* Next week: guile bindings for lmdb
  - improving RDF
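
The idea of hashing tables to drive automated updates can be sketched as follows. This is a minimal illustration, not the project's code: the names table_digest and needs_update, the row format, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import json


def table_digest(rows):
    """Return a SHA-256 hex digest over an iterable of rows.

    Assumption: each row is a tuple or list of simple values, and rows
    arrive in a stable order (for example, sorted by primary key).
    """
    h = hashlib.sha256()
    for row in rows:
        # Serialize each row deterministically so equal content hashes equally.
        h.update(json.dumps(list(row), default=str).encode("utf-8"))
        # Row separator, so that row boundaries affect the digest.
        h.update(b"\n")
    return h.hexdigest()


def needs_update(rows, previous_digest):
    """True when the table content no longer matches the stored digest."""
    return table_digest(rows) != previous_digest
```

A dump would then only be regenerated when needs_update reports a change against the digest stored from the previous run.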

### Week 2

* RDF structure to markdown dump
* Fred is running SPARQL queries
* Alex is adding lmdb phenotype API endpoints

### Week 3

* Bonface demos new documentation & code

### Week 4

* Settled on prefixes, terms and IDs
* Updated man pages
=> https://github.com/genenetwork/gn-docs/tree/master/rdf-documentation
* Started work on lmdb+guile:
=> https://github.com/BonfaceKilz/gn-data-vault

### Week 5

* Metadata - renamed prefixes
* Short names gn: gnt:
* Updated virtuoso
* parsing geno files - lmdb support

### Week 6

* Improvements on RDF

### Week 7

* RDF improvements with ontology
* Inconsistencies and privacy discussion today

### Week 8

* Transformed most tables now
* guile-lmdb started by Alex

### Week 9

* New renaming and modelling
* GeneRif
* Working on unique IDs
* SKOS

## LLMs & metadata (RDF)

* lead: Shelby
* team: Priscilla
* contact: Bonface, Pjotr, Rupert

=> ../lmms/llm-metadata Tracking progress

### Week 1

* Created issue page
* Downloading publications (Priscilla)
* Flask server
* Next: Connecting OpenAI
* Create matrix room

### Week 2

* OpenAI API is working
* Shelby is integrating into a Flask interface for GeneNetwork
* Using a pubmed UI style

### Week 3

* Shelby shows code
* Plan to host
* Priscilla is working on SLA and document acquisition
* Hosting GN Q&A

### Week 4

* Working on container
* Fetched 1000 publications
* JSON documentation on references

### Week 5

* Guix container for LLM
* Expose container to Rupert
* Add a password

### Week 6

* Very close to a working Flask app that Rupert can try next week

### Week 7

* Working Flask app ready for testing - Rupert will have a go

### Week 8

* Working prototype!
* references as JSON

### Week 9

* Shelby showed working references and challenges
* System is ready for viewing by Rob

## API to access data from GN

* lead: Rupert
* team: Flavia
* contacts: Bonface, Zach, Fred

Documentation and adding endpoints

git repo gn-docs & genenetwork3 & SPARQL

=> https://github.com/genenetwork/gn-docs

### Week 1

* Mapping out the API

=> https://github.com/genenetwork/gn-docs/blob/master/api/questions-to-ask-GN.md

* Ideas on structuring
* Questions on GN
* Next: unify access to information
* collecting questions from users
* settle on form of API
* create example URLs for mouse

### Week 2

* GraphQL: Arun gives a mumi demo - the schema allows for (partial) queries and for querying the schema itself
* Pjotr convenience API demo - add endpoints in results
* Flavia added questions in gn-docs - e.g. for synteny search
=> https://issues.genenetwork.org/topics/xapian-search-queries Examples for synteny

### Week 3

* Rupert proposes endpoints and metadata traversing

=> https://github.com/genenetwork/gn-docs/blob/master/api/alternative-API-structure.md

### Week 4

* Start working on endpoints
* R API - reference GN/API
* synteny

### Week 5

* Added back-end support for wikidata - finding inconsistencies

### Week 6

* API ready for running in a production environment
* Using latest RDF

### Week 7

* Test version of API is running at https://luna.genenetwork.org/api/v2.0/
* We'll continue building up facilities
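
Building request URLs against the test server can be sketched as follows. Only the base URL comes from the notes above; the helper name endpoint_url and the path segments used in the example are hypothetical, and no actual endpoint of the API is implied.

```python
from urllib.parse import quote

# Test server mentioned in the notes above.
BASE_URL = "https://luna.genenetwork.org/api/v2.0/"


def endpoint_url(*segments, base=BASE_URL):
    """Join path segments onto the API base URL, percent-encoding each one."""
    path = "/".join(quote(str(s), safe="") for s in segments)
    return base.rstrip("/") + "/" + path


# Hypothetical segments, for illustration only.
example = endpoint_url("mouse", "datasets")
```

Percent-encoding each segment separately keeps names that contain spaces or slashes from changing the path structure.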

### Week 8

* R script to parse API by Rupert
* Move forward with populations/strains and datasets

### Week 9

* Progressed API software to include groups

## Editing data

* lead: Fred
* team: Arthur, Rupert, Zach
* contacts: Rob

### Week 1

* Edit phenotype metadata works
* Next: phenotype values and testing on live

### Week 2

* Fixing issues
* Meeting on requirements from Arthur and Zach

### Week 4

* Editing works!
* Discussed the approval procedure and edit button for everyone

### Week 5

* Editing has gone on production - fixing issues
* Discussion on REST API

### Week 6

* Progress on authorization and editing

### Week 7

* Fred has moved code into a new repo, gn-auth: https://github.com/genenetwork/gn-auth

### Week 8

* gn-auth is building locally and needs to go on the forge
* case attributes

### Week 9

* gn-auth continuation
* case-attribute editing progress and challenges

## Guix parametrization

* lead: Sarthak
* team: Pjotr, Gabor
* contacts: Ludo

### Week 1

=> https://blog.lispy.tech/parameterized-packages-an-update.html

* Next: focus on statically built packages optimized for arch.

### Week 2

* Looking into GeneNetwork3 service
* Enumerated types

### Week 3

* Preparing for BLOG on S-exp

### Week 4

* BLOG is at the stage where we are discussing naming conventions

### Week 5

* Proposed DSL for parameters

### Week 6

* Posted BLOG and started implementation

### Week 7

* Agreed on final deliverables for GSoC

### Week 8

* Milestone on dependencies!

### Week 9

* Sarthak showed us the working prototype

## Links

* Matrix room is GNSoC2023

=> https://fosdem.org/2023/schedule/event/tissue/ Arun's talk on our issue tracker
=> https://github.com/genenetwork/gn-gemtext-threads Git repo on issues/tasks/topics
=> https://issues.genenetwork.org/topics/biohackathon/GNGSoC2023 This page

For more info contact pjotr.public912 at thebird.nl