Age | Commit message | Author |
2024-07-23 | chore: add logging info | John Nduli |
2024-07-23 | chore: mypy and pylint fixes | John Nduli |
2024-07-23 | fix: resolve duplicate errors when updating data | John Nduli |
2024-07-23 | refactor: clean query for insert | John Nduli |
2024-07-23 | refactor: clean up insert query | John Nduli |
2024-07-23 | refactor: reorganize update_rif script to be more pythonic | John Nduli |
2024-07-23 | chore: fix pylint errors | John Nduli |
2024-07-23 | refactor: rename addRif to update_rif_table.py | John Nduli |
2024-07-12 | Rename hash_rdf_graph -> md5hash_ttl_dir.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-07-12 | Use correct ttl-dir path when generating checksums.
* scripts/index-genenetwork (is_data_modified): Provide directory
instead of specific ttl file.
(create_xapian_index): Ditto.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-07-12 | fix: remove .py extension for addRif to prevent pylint checks | John Nduli |
2024-07-12 | refactor: fix mypy and pylint errors | John Nduli |
2024-07-12 | feat: copy addRif script from genenetwork1
original: https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/maintainance/addRif.py
Included some changes to make it Python 3 compatible.
| John Nduli |
2024-07-08 | Pass output directory to R/qtl script instead of pulling it from the environment
Also fixes an issue where the control marker keyword was wrong.
| zsloan |
2024-07-03 | Return a "-1" if the turtle directory does not exist.
* scripts/index-genenetwork (hash_rdf_graph): Remove check for the
turtle directory.
(is_data_modified): Ditto.
(create_xapian_index): Ditto.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-07-03 | Generate a checksum for all the ttl files.
* scripts/index-genenetwork (hash_generif_graph): Rename to
hash_rdf_graph. Generate a checksum of all the turtle files inside
the ttl directory that's the basis for the GN virtuoso graph.
(create_xapian_index): Rename hash_generif_graph -> hash_rdf_graph.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
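The directory-wide checksum described in this commit can be sketched as follows. This is a minimal illustration, assuming the checksum is a single MD5 digest over every `*.ttl` file read in sorted filename order. The function name follows the later `md5hash_ttl_dir` rename, but its signature and the choice to mix file names into the digest are assumptions, not the script's confirmed behaviour.

```python
import hashlib
from pathlib import Path


def md5hash_ttl_dir(ttl_dir: Path) -> str:
    """Return one MD5 hex digest covering every *.ttl file in ttl_dir.

    Sketch only: ordering, chunk size and name hashing are assumptions.
    """
    digest = hashlib.md5()
    # Sorted order keeps the checksum independent of directory listing order.
    for ttl_file in sorted(ttl_dir.glob("*.ttl")):
        # Mixing in the file name means a rename also changes the checksum
        # (an illustrative choice, not confirmed for index-genenetwork).
        digest.update(ttl_file.name.encode("utf-8"))
        with ttl_file.open("rb") as handle:
            # Read in 1 MiB chunks so large graphs are not loaded at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Hashing the whole directory means adding, removing or editing any turtle file changes the result, which is what lets a single stored value stand in for the state of the whole GN virtuoso graph.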
2024-07-03 | Add type-hints to hash_generif_graph.
* scripts/index-genenetwork (hash_generif_graph): Add proper type
hints.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-07-03 | Refactor how the generif md5 sum is calculated and stored in XAPIAN.
* scripts/index-genenetwork (hash_generif_graph): Build the generif
checksum by directly building it from the file.
(is_data_modified): Update how generif-checksums are verified.
(create_xapian_index): Update how generif-checksums are stored in
XAPIAN.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-07-03 | Use correct cache for RIF/Wiki entries.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-07-03 | feat: drop intermediate folders when running parallel xapian compact | John Nduli |
2024-07-03 | feat: add support for parallel xapian compact | John Nduli |
2024-07-03 | feat: index rif and wiki without positions | John Nduli |
2024-07-03 | feat: drop common words when building rdf caches | John Nduli |
2024-07-03 | feat: set 67 parallel processes to run in prod | John Nduli |
2024-07-03 | fix: remove namespaces since child processes copy the rdf caches | John Nduli |
2024-07-03 | fix: use correct prefix and index key; group wiki cache query | John Nduli |
2024-07-03 | feat: add wikidata indexing | John Nduli |
2024-07-03 | feat: add global wikicache | John Nduli |
2024-07-03 | feat: add sparql query to get wikidata | John Nduli |
2024-06-24 | Use dataset Name instead of FullName for indexing
The Name is generally used as the identifier, while the FullName can contain spaces, which can cause problems.
| zsloan |
2024-06-18 | Revert "Set the file path for the logger."
This reverts commit b21102bc4ad3678173e7c94d3e66333ec7c1d40a.
| Munyoki Kilyungi |
2024-06-18 | refactor: drop global variables | John Nduli |
2024-06-17 | Check table names in Xapian; if not, default to "-1".
Without this check, there will always be an error when this script is
run with the "is-data-modified" flag should there be no database in
the XAPIAN_DIRECTORY.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
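A minimal sketch of the "-1" fallback, assuming the stored checksums behave like a key/value mapping; a plain `dict` stands in for the Xapian database metadata here. `stored_checksum` and `is_table_modified` are hypothetical helper names. The `or "-1"` form also covers an empty value, which is what Xapian's `get_metadata` returns for a key that was never set.

```python
from typing import Mapping


def stored_checksum(metadata: Mapping[str, str], table: str) -> str:
    # Missing or empty entries (e.g. no database yet) fall back to "-1".
    return metadata.get(table) or "-1"


def is_table_modified(metadata: Mapping[str, str], table: str, current: str) -> bool:
    # "-1" never equals a real checksum, so an absent entry counts as modified.
    return stored_checksum(metadata, table) != current
```

Treating a missing database as "modified" turns what used to be an error into a rebuild, which is the safer default when `XAPIAN_DIRECTORY` is empty.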
2024-06-17 | Fetch distinct comments.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-14 | fix: typehints in index-genenetwork script | John Nduli |
2024-06-14 | fix: fix incorrect parameters in index_query function | John Nduli |
2024-06-12 | Move the generated xapian files to the correct directory.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-12 | Set the file path for the logger.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-12 | Change the date format for the logger.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-12 | Log how long it takes to run the indexing script.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-12 | Check for a running process by viewing the build dir's contents.
In the CI build, the actual build is run in the
xapian_directory/build, which is seen as the xapian_directory in this
script. The CI handles clean up WRT removing files related to the
build process.
* scripts/index-genenetwork (create_xapian_index): Create the xapian
directory if it doesn't exist. If the xapian directory has files,
exit. Create the temporary directory inside the xapian_directory.
Remove "build_directory.rmdir()"
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
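The build-directory check described in this commit can be sketched with `pathlib` and `tempfile`. This assumes, as the message says, that any entry at all inside the xapian directory means another build is in progress. `prepare_build_dir` is a hypothetical name, and returning `None` so the caller decides how to exit is an illustrative choice, not the script's confirmed control flow.

```python
import tempfile
from pathlib import Path
from typing import Optional


def prepare_build_dir(xapian_dir: Path) -> Optional[Path]:
    """Return a fresh build directory inside xapian_dir, or None if busy."""
    # Create the xapian directory on first run.
    xapian_dir.mkdir(parents=True, exist_ok=True)
    # Any existing entry is read as "another build is already running".
    if any(xapian_dir.iterdir()):
        return None
    # Keep intermediate files inside xapian_dir so the CI's cleanup covers them.
    return Path(tempfile.mkdtemp(dir=str(xapian_dir)))
```

Placing the temporary directory inside the xapian directory also means its mere presence is the "running" marker that a second invocation sees.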
2024-06-12 | Return 0 if data changes, else exit with 1.
* scripts/index-genenetwork (is_data_modified): Replace click.echo
with the respective sys.exit call.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
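The exit-status convention can be sketched as below, assuming the stored and current checksums are already available as strings; `data_modified_exit_code` and `main` are hypothetical names. Returning 0 for "modified" follows the shell convention that status 0 means success (true), so a calling shell script or build expression can use the command directly as an `if` condition instead of parsing printed output.

```python
import sys


def data_modified_exit_code(stored_checksum: str, current_checksum: str) -> int:
    # 0 means "true" to a shell, so 0 signals that the data has changed.
    return 0 if stored_checksum != current_checksum else 1


def main(stored_checksum: str, current_checksum: str) -> None:
    # Report through the exit status rather than printing (cf. click.echo).
    sys.exit(data_modified_exit_code(stored_checksum, current_checksum))
```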
2024-06-12 | Explicitly pass sparql_uri to script.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-12 | Rework how the indexes are built.
Right now, the checks are done in Guix's build expression. This moves
that work to the index-genenetwork script.
| Munyoki Kilyungi |
2024-06-12 | Add method to check the validity of the tables+RDF checksums.
* scripts/index-genenetwork (verify_checksums): New function.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-12 | Generate a SHA256 checksum for the generif graph.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
2024-06-01 | Use global cache to store generif metadata.
This global cache has 3,528 entries and there's no expectation for it
to grow significantly. Since child processes inherit the parent’s
memory, we can pass the global cache to them, reducing fetch times
from 0.001s to 0.00001s, significantly boosting performance when
indexing the entire database and enriching results with RDF metadata.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
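The inherited-cache pattern can be sketched with the `fork` start method of `multiprocessing` (POSIX only). The cache contents and the names `RIF_CACHE`, `load_rif_cache`, `rif_comments` and `annotate` are hypothetical placeholders, not taken from index-genenetwork. The point illustrated is that the cache is filled once in the parent before the pool is created, so each forked worker reads it from its copy of the parent's memory instead of re-querying.

```python
import multiprocessing
from typing import Dict, List

# Module-level cache; populated once in the parent before workers are forked.
RIF_CACHE: Dict[str, List[str]] = {}


def load_rif_cache() -> None:
    # Hypothetical stand-in for the one-off SPARQL fetch of GeneRIF metadata.
    RIF_CACHE.update({"geneA": ["comment 1"], "geneB": ["comment 2", "comment 3"]})


def rif_comments(symbol: str) -> List[str]:
    # Runs in a worker: a plain dict lookup in memory inherited from the parent.
    return RIF_CACHE.get(symbol, [])


def annotate(symbols: List[str]) -> List[List[str]]:
    load_rif_cache()
    # "fork" snapshots the parent's memory, cache included, when the pool starts.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        return pool.map(rif_comments, symbols)


if __name__ == "__main__":
    print(annotate(["geneA", "geneC"]))  # [['comment 1'], []]
```

Order matters in this design: anything added to the cache after the pool is created would not be visible to the already-forked workers.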
2024-06-01 | Add geneRIF to gene index.
* scripts/index-genenetwork: Import Template, lru_cache,
SPARQLWrapper, JSON
(get_rif_metadata): New function.
(index_rif_comments): New function.
(index_genes): Add rif comments to probeset index.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
| Munyoki Kilyungi |
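A sketch of how `Template` and `lru_cache` could combine in a memoized metadata lookup. The query text, predicate URIs and the `run_sparql` stub are hypothetical; the stub replaces the `SPARQLWrapper` call so the example runs offline and records each query it receives. Only the general shape, a templated query whose results are cached per gene symbol, is suggested by the imports listed in this commit.

```python
from functools import lru_cache
from string import Template
from typing import List, Tuple

# Hypothetical query shape; the real GeneNetwork RDF predicates are not shown here.
RIF_QUERY = Template(
    'SELECT ?comment WHERE { ?rif <http://example.org/symbol> "$symbol" ; '
    '<http://example.org/comment> ?comment . }'
)

QUERIES_RUN: List[str] = []


def run_sparql(query: str) -> List[str]:
    # Stand-in for a SPARQLWrapper call; records queries so caching is observable.
    QUERIES_RUN.append(query)
    return []


@lru_cache(maxsize=None)
def get_rif_metadata(symbol: str) -> Tuple[str, ...]:
    # A tuple is returned so cached results cannot be mutated by callers.
    return tuple(run_sparql(RIF_QUERY.substitute(symbol=symbol)))
```

Because `symbol` is substituted into the query text unescaped, real code should validate or escape it before building the query.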
2024-03-18 | pep8 formatting | Alexander_Kabui |
2024-03-18 | pep8 formatting | Alexander_Kabui |