author Munyoki Kilyungi 2024-06-21 19:50:33 +0300
committer Munyoki Kilyungi 2024-06-21 19:50:33 +0300
commit eae3d25e056db22faf98da4a6dca016381378138 (patch)
tree 79aaa3d25e96ec9ff05a0440e24dbddcfc1c5938
parent 02a6d9ebc2e2c160874e2f52412cfc7be67b7231 (diff)
doc: document our current xapian search issues.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
-rw-r--r-- issues/rdf/search-indexing-general-issues.gmi 32
1 files changed, 32 insertions, 0 deletions
diff --git a/issues/rdf/search-indexing-general-issues.gmi b/issues/rdf/search-indexing-general-issues.gmi
new file mode 100644
index 0000000..3bcc36a
--- /dev/null
+++ b/issues/rdf/search-indexing-general-issues.gmi
@@ -0,0 +1,32 @@
+
+# XAPIAN Search General Issues
+
+* assigned: bonfacem
+
+## Dataset Search Issues
+
+The following search on a full dataset name yields no results:
+
+> dataset:"BXD Published Phenotypes"
+
+In the indexer, we index the dataset name using "index_text":
+
+> index_dataset = lambda dataset: termgenerator.index_text(dataset, 0, "XDS")
+
+Yet in the search, we use a boolean prefix:
+
+> queryparser.add_boolean_prefix("dataset", "XDS")
+
+Currently, to search for "BXD Published Phenotypes", one has to query each word separately:
+
+> dataset:bxd dataset:published dataset:phenotypes
+
+Note that the search is in all lower-case. The reason for this is that we have:
+
+> queryparser.set_stemming_strategy(queryparser.STEM_SOME)
+
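+A fix for this would be to replace "add_boolean_prefix" with "add_prefix", so that the query parser treats the dataset name as free text, matching how "index_text" indexed it.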
+
+## CIS/TRANS Searches
+
+The challenge with this search is that we would have to compare values for each possible result against one another, necessitating the generation of position values separately for every possible result. Also, we the devs (jnduli, bonfacem) need a better understanding of how this search works; our current understanding is vague.
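The prefix mismatch described under "Dataset Search Issues" can be illustrated with a toy model in plain Python. This is a sketch only: it does not use the xapian bindings, it ignores stemming and Xapian's exact term encoding, and the helper names (tokenize, toy_index_text, boolean_match, phrase_match) are hypothetical stand-ins rather than GeneNetwork code or the Xapian API.

```python
import re
from collections import defaultdict

PREFIX = "XDS"  # term prefix used for dataset names in the issue


def tokenize(text):
    """Split text into lower-cased words (assumed free-text behaviour)."""
    return re.findall(r"\w+", text.lower())


def toy_index_text(text, prefix):
    """Toy free-text indexing: one prefixed term per word, with positions."""
    postings = defaultdict(set)
    for position, word in enumerate(tokenize(text), start=1):
        postings[prefix + word].add(position)
    return dict(postings)


def boolean_match(postings, prefix, value):
    """Toy boolean-prefix lookup: the whole value is one exact, verbatim term."""
    return (prefix + value) in postings


def phrase_match(postings, prefix, value):
    """Toy free-text-prefix lookup: tokenize the value, require adjacent positions."""
    words = tokenize(value)
    if not words:
        return False
    starts = postings.get(prefix + words[0], set())
    return any(
        all(start + offset in postings.get(prefix + word, set())
            for offset, word in enumerate(words))
        for start in starts
    )


postings = toy_index_text("BXD Published Phenotypes", PREFIX)
full_name = "BXD Published Phenotypes"
print("boolean, full name:", boolean_match(postings, PREFIX, full_name))
print("boolean, single word:", boolean_match(postings, PREFIX, "bxd"))
print("phrase, full name:", phrase_match(postings, PREFIX, full_name))
```

In this model the full-name boolean lookup fails because no single indexed term holds the whole name, while the phrase-style lookup tokenizes the query the same way the indexer did. Whether the real index behaves identically should be confirmed against Xapian itself.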