author: Arun Isaac, 2023-05-02 20:50:45 +0100
committer: Arun Isaac, 2023-05-02 20:50:45 +0100
commit: 3f637a2a6155859abb86f29a589a83a30a489188
tree: 907309a7bbde65a0b1ca62f230950d79ff6a5b1d /topics
parent: bb209973accd98b5b490b1f3355ec01a98c1da01
download: gn-gemtext-3f637a2a6155859abb86f29a589a83a30a489188.tar.gz
Document xapian search code architecture.
Diffstat (limited to 'topics')
-rw-r--r-- topics/xapian-search.gmi | 9
1 files changed, 9 insertions, 0 deletions
diff --git a/topics/xapian-search.gmi b/topics/xapian-search.gmi
new file mode 100644
index 0000000..732cb31
--- /dev/null
+++ b/topics/xapian-search.gmi
@@ -0,0 +1,9 @@
+# Xapian search
+
+Our main search engine (sometimes called the "global search" for historical reasons) is powered by Xapian, the excellent lightweight search engine library. This document aims to describe the architecture of the search code.
+
+The search engine consists of two separate parts: the indexer and the search query responder. In Xapian (or rather, information retrieval) parlance, each possible search result is called a "document". Each document is associated with a set of "terms". The indexer builds an index mapping terms to documents. When a user submits a search query, the query is decomposed into a set of terms, and these terms are looked up in the index. Terms are often merely the words that constitute a document or search query, normalized to remove verb conjugations, plural forms of nouns, etc. For example, "using" is normalized to "use", "looked" to "look", and "books" to "book". This process is called stemming. Thanks to stemming and some statistical trickery, the Xapian search engine can approximate a crude understanding of natural language.
+
+## Boolean terms, values, position information, and others
+
+TODO
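
Note (not part of the commit): the indexer/responder split and the role of stemming described in the document above can be illustrated with a small, self-contained sketch. This is a toy model only. It does not use the Xapian API, `ToyIndex`, `stem`, and `terms` are hypothetical names, the suffix stripper is a crude stand-in for a real stemmer, and ranking by the number of matching terms is an assumption made to keep the example short.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude suffix stripper. A real stemmer handles
# far more cases; this only shows why stemming helps matching.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Decompose text into stemmed terms."""
    return [stem(word) for word in re.findall(r"[a-z0-9]+", text.lower())]


class ToyIndex:
    """Toy inverted index: maps each term to the set of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Indexer side: record every term of the document.
        for term in terms(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Responder side: stem the query the same way, look each term up,
        # and rank documents by how many query terms they match.
        scores = defaultdict(int)
        for term in set(terms(query)):
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1
        return sorted(scores, key=lambda d: (-scores[d], d))


index = ToyIndex()
index.add_document("doc1", "She looked at the books")
index.add_document("doc2", "A book about indexing")

print(index.search("looking books"))  # ['doc1', 'doc2']
print(index.search("indexed"))        # ['doc2']
```

Because documents and queries pass through the same `terms` function, "looking" in a query matches "looked" in a document even though the surface words differ.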