Document xapian search code architecture.

author: Arun Isaac 2023-05-02 20:50:45 +0100
committer: Arun Isaac 2023-05-02 20:50:45 +0100
commit: 3f637a2a6155859abb86f29a589a83a30a489188 (patch)
tree: 907309a7bbde65a0b1ca62f230950d79ff6a5b1d /topics
parent: bb209973accd98b5b490b1f3355ec01a98c1da01 (diff)
download: gn-gemtext-3f637a2a6155859abb86f29a589a83a30a489188.tar.gz
1 files changed, 9 insertions, 0 deletions
diff --git a/topics/xapian-search.gmi b/topics/xapian-search.gmi
new file mode 100644
index 0000000..732cb31
--- /dev/null
+++ b/topics/xapian-search.gmi
@@ -0,0 +1,9 @@
+# Xapian search
+
+Our main search engine (sometimes called the "global search" for historical reasons) is powered by Xapian, the excellent lightweight search engine library. This document aims to describe the architecture of the search code.
+
+The search engine consists of two separate parts: the indexer and the search query responder. In Xapian (or rather, information retrieval) parlance, each possible search result is called a "document". Each document is associated with a set of "terms". The indexer builds an index mapping terms to documents. When a user submits a search query, the query is decomposed into a set of terms, and these terms are looked up in the index. Terms are often merely the words that constitute a document or search query, normalized to remove verb conjugations, plural forms of nouns, etc. For example, "using" is normalized to "use", "looked" to "look", and "books" to "book". This process is called stemming. Thanks to stemming and the trickery of statistics, the Xapian search engine can pretend to a crude understanding of natural language.
+
+## Boolean terms, values, position information, and others
+
+TODO
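
The indexer and query responder flow described in the patch above can be illustrated with a toy inverted index. This is only a minimal sketch in plain Python and does not use Xapian's API. The names toy_stem, to_terms, build_index and search are hypothetical, the stemmer is a lookup table covering only the three examples from the text, and requiring every query term to match is a simplifying assumption. A real search engine would also rank matches by relevance.

```python
from collections import defaultdict

# Hypothetical stand-in for a real stemming algorithm: a tiny lookup table
# covering only the examples given in the text.
TOY_STEMS = {"using": "use", "looked": "look", "books": "book"}


def toy_stem(word):
    """Normalize a word to its stem; unknown words are only lowercased."""
    word = word.lower()
    return TOY_STEMS.get(word, word)


def to_terms(text):
    """Decompose text into a set of stemmed terms."""
    return {toy_stem(word) for word in text.split()}


def build_index(documents):
    """Indexer: map each term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in to_terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Query responder: return ids of documents matching every query term."""
    matches = [index.get(term, set()) for term in to_terms(query)]
    return set.intersection(*matches) if matches else set()


documents = {
    1: "She looked for books",
    2: "Using a search engine",
}
index = build_index(documents)
print(search(index, "book"))        # {1}
print(search(index, "use engine"))  # {2}
```

Because both documents and queries pass through the same to_terms function, a query for "book" finds a document that only contains "books".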