Age | Commit message | Author
2024-08-01 | Stem group field regardless of case. | Arun Isaac
    * gn3/api/search.py (parse_boolean_prefixed_field): New function.
    (parse_query): Use parse_boolean_prefixed_field for the group field.
2024-08-01 | Stem all the time. | Arun Isaac
    With the STEM_SOME strategy, Xapian does not stem query words that start with a capital letter. Hence, we switch to the STEM_ALL strategy.
    * gn3/api/search.py (parse_query): Set stemming strategy to STEM_ALL.
2024-07-23 | chore: add logging info | John Nduli
2024-07-23 | chore: mypy and pylint fixes | John Nduli
2024-07-23 | fix: resolve duplicate errors when updating data | John Nduli
2024-07-23 | refactor: clean query for insert | John Nduli
2024-07-23 | refactor: clean up insert query | John Nduli
2024-07-23 | refactor: reorganize update_rif script to be more pythonic | John Nduli
2024-07-23 | chore: fix pylint errors | John Nduli
2024-07-23 | refactor: rename addRIf to update_rif_table.py | John Nduli
2024-07-12 | Rename hash_rdf_graph -> md5hash_ttl_dir. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-07-12 | Use correct ttl-dir path when generating checksums. | Munyoki Kilyungi
    * scripts/index-genenetwork (is_data_modified): Provide directory instead of specific ttl file.
    (create_xapian_index): Ditto.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-07-12 | fix: remove .py extension for addRif to prevent pylint checks | John Nduli
2024-07-12 | refactor: fix mypy and pylint errors | John Nduli
2024-07-12 | feat: copy addRif script from genenetwork1 | John Nduli
    original: https://github.com/genenetwork/genenetwork1/blob/master/web/webqtl/maintainance/addRif.py
    Included some changes to make it python3 compatible
2024-07-08 | Pass output directory to R/qtl script instead of pulling it from the environment | zsloan
    Also fixes issue where the control marker keyword was wrong.
2024-07-05 | fix: return query error message from xapian | John Nduli
2024-07-03 | Return a "-1" if the turtle directory does not exist. | Munyoki Kilyungi
    * scripts/index-genenetwork (hash_rdf_graph): Remove check for the turtle directory.
    (is_data_modified): Ditto.
    (create_xapian_index): Ditto.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-07-03 | Generate a checksum for all the ttl files. | Munyoki Kilyungi
    * scripts/index-genenetwork (hash_generif_graph): Rename to hash_rdf_graph. Generate a checksum of all the turtle files inside the ttl directory that's the basis for the GN virtuoso graph.
    (create_xapian_index): Rename hash_generif_graph -> hash_rdf_graph.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
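The checksum described in the two commits above can be illustrated with a minimal sketch. This is not the code in scripts/index-genenetwork: the function name `md5hash_ttl_dir_sketch`, the `*.ttl` glob, the chunk size, and the choice to mix file names into the digest are all assumptions made for illustration. Only the ideas of one checksum over every turtle file and a "-1" result for a missing directory come from the commit messages.

```python
import hashlib
from pathlib import Path


def md5hash_ttl_dir_sketch(ttl_dir: Path) -> str:
    """Return one MD5 hex digest covering every .ttl file in ttl_dir.

    Hypothetical helper, not the project's actual function. Returns "-1"
    when the directory does not exist.
    """
    if not ttl_dir.is_dir():
        return "-1"
    digest = hashlib.md5()
    # Sort the files so the result does not depend on listing order.
    for ttl_file in sorted(ttl_dir.glob("*.ttl")):
        # Assumption: include the file name so that renames change the checksum.
        digest.update(ttl_file.name.encode("utf-8"))
        with ttl_file.open("rb") as handle:
            # Read in chunks so large turtle files are not loaded whole.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Because "-1" can never equal a real 32-character hex digest, a missing directory always compares as different from any stored checksum.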
2024-07-03 | Add type-hints to hash_generif_graph. | Munyoki Kilyungi
    * scripts/index-genenetwork (hash_generif_graph): Add proper type hints.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-07-03 | Refactor how the generif md5 sum is calculated and stored in XAPIAN. | Munyoki Kilyungi
    * scripts/index-genenetwork (hash_generif_graph): Build the generif checksum by directly building it from the file.
    (is_data_modified): Update how generif-checksums are verified.
    (create_xapian_index): Update how generif-checksums are stored in XAPIAN.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-07-03 | Use correct cache for RIF/Wiki entries. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-07-03 | feat: drop intermediate folders when running parallel xapian compact | John Nduli
2024-07-03 | feat: add support for parallel xapian compact | John Nduli
2024-07-03 | feat: index rif and wiki without positions | John Nduli
2024-07-03 | feat: drop common words when building rdf caches | John Nduli
2024-07-03 | feat: set 67 parallel processes to run in prod | John Nduli
2024-07-03 | fix: remove namespaces since child processes copy the rdf caches | John Nduli
2024-07-03 | fix: use correct prefix and index key; group wiki cache query | John Nduli
2024-07-03 | feat: add wikidata prefix to search api | John Nduli
2024-07-03 | feat: add wikidata indexing | John Nduli
2024-07-03 | feat: add global wikicache | John Nduli
2024-07-03 | feat: add sparql query to get wikidata | John Nduli
2024-06-26 | Increase max number of results to 50000 for Xapian search | zsloan
    This change needs to be accompanied by a change in GN2! If it's lower than the GN2 MAX_SEARCH_RESULTS value, searches will throw an error.
2024-06-24 | Use dataset Name instead of FullName for indexing | zsloan
    The Name is generally used as the identifier, while the FullName can contain spaces, which can cause problems.
2024-06-18 | Revert "Set the file path for the logger." | Munyoki Kilyungi
    This reverts commit b21102bc4ad3678173e7c94d3e66333ec7c1d40a.
2024-06-18 | refactor: drop global variables | John Nduli
2024-06-17 | Check table names in Xapian; if not, default to "-1". | Munyoki Kilyungi
    Without this check, there will always be an error when this script is run with the "is-data-modified" flag should there be no database in the XAPIAN_DIRECTORY.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-06-17 | Fetch distinct comments. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-06-14 | fix: typehints in index-genenetwork script | John Nduli
2024-06-14 | fix: fix incorrect parameters in index_query function | John Nduli
2024-06-12 | Move the generated xapian files to the correct directory. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-06-12 | Set the file path for the logger. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-06-12 | Change the date format for the logger. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-06-12 | Log how long it takes to run the indexing script. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-06-12 | Check for a running process by viewing the build dir's contents. | Munyoki Kilyungi
    In the CI build, the actual build is run in the xapian_directory/build, which is seen as the xapian_directory in this script. The CI handles clean up WRT removing files related to the build process.
    * scripts/index-genenetwork (create_xapian_index): Create the xapian directory if it doesn't exist. If the xapian directory has files, exit. Create the temporary directory inside the xapian_directory. Remove "build_directory.rmdir()".
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
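The directory handling listed in the commit above (create the directory, exit if it already holds files, create the temporary directory inside it) might look roughly like the sketch below. The function name `prepare_build_dir_sketch` and the exit message are hypothetical; the real create_xapian_index may differ in every detail.

```python
import sys
import tempfile
from pathlib import Path


def prepare_build_dir_sketch(xapian_directory: Path) -> Path:
    """Return a fresh temporary build directory inside xapian_directory.

    Hypothetical helper for illustration only.
    """
    # Create the xapian directory if it doesn't exist.
    xapian_directory.mkdir(parents=True, exist_ok=True)
    # Existing contents are taken as a sign that another build is running
    # (or has left files behind), so stop instead of clobbering them.
    if any(xapian_directory.iterdir()):
        sys.exit(f"{xapian_directory} is not empty; refusing to build.")
    # The temporary directory lives inside xapian_directory and is not
    # removed here; per the commit message, CI handles the clean up.
    return Path(tempfile.mkdtemp(dir=xapian_directory))
```

Passing a string to `sys.exit` prints it to stderr and exits with status 1, so a caller sees the refusal as a failure.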
2024-06-12 | Return 0 if data changes, else exit with 1. | Munyoki Kilyungi
    * scripts/index-genenetwork (is_data_modified): Replace click.echo with the respective sys.exit call.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
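The exit-status convention in the commit above can be sketched as follows. The helper names and the idea of comparing two checksum strings are assumptions for illustration; only the mapping (0 when the data changed, 1 otherwise) and the use of sys.exit come from the commit message.

```python
import sys


def exit_code_for_sketch(stored_checksum: str, current_checksum: str) -> int:
    """Map a checksum comparison to a process exit status.

    Hypothetical helper: 0 means the data changed, 1 means it did not.
    """
    return 0 if stored_checksum != current_checksum else 1


def report_and_exit_sketch(stored_checksum: str, current_checksum: str) -> None:
    """Exit with the status above instead of printing the answer."""
    sys.exit(exit_code_for_sketch(stored_checksum, current_checksum))
```

Since a shell treats status 0 as success, a calling script can use the check directly as the condition for a rebuild. A stored value of "-1" (no existing index) never matches a real checksum, so under this sketch it also yields 0.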
2024-06-12 | Explicitly pass sparql_uri to script. | Munyoki Kilyungi
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
2024-06-12 | Rework how the indexes are built. | Munyoki Kilyungi
    Right now, the checks are done in Guix's build expression. This moves that work to the index-genenetwork script.
2024-06-12 | Add method to check the validity of the tables+RDF checksums. | Munyoki Kilyungi
    * scripts/index-genenetwork (verify_checksums): New function.
    Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>