author Munyoki Kilyungi 2024-07-10 13:51:22 +0300
committer Munyoki Kilyungi 2024-07-10 13:51:22 +0300
commit acdc3b89d6ed18bc6addf7b8df84e98de919fe1c
tree 6a03c9ef8562359f02ee527cd115edfcac105c9d
parent 647b3bae2514efa5d3050b8f0c6cf1366b0dfd8f
New issue.
Signed-off-by: Munyoki Kilyungi <me@bonfacemunyoki.com>
-rw-r--r-- issues/inspect-discrepancies-between-xapian-and-sql-search.gmi 44
1 files changed, 44 insertions, 0 deletions
diff --git a/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi b/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi
new file mode 100644
index 0000000..c464187
--- /dev/null
+++ b/issues/inspect-discrepancies-between-xapian-and-sql-search.gmi
@@ -0,0 +1,44 @@
+# Inspect Discrepancies Between XAPIAN and SQL Search
+
+* assigned: bonfacem, rookie101
+
+## Description
+
+A XAPIAN search misses some results that the SQL search returns. The searches we tested:
+
+=> https://cd.genenetwork.org/search?species=mouse&group=BXD&type=Hippocampus+mRNA&dataset=HC_M2_0606_P&search_terms_or=WIKI%3Dglioma&search_terms_and=&accession_id=None&FormID=searchResulto Normal SQL search for dataset=HC_M2_0606_P species=mouse group=BXD WIKI=glioma
+
+For the above search, we get 31 results.
+
+=> https://cd.genenetwork.org/gsearch?type=gene&terms=species%3Amouse+group%3Abxd+dataset%3Ahc_m2_0606_p+wiki%3Aglioma species:mouse group:bxd dataset:hc_m2_0606_p wiki:glioma
+
+For the above search, we get 26 results.
+
+We miss the following entries from the XAPIAN search:
+
+```
+15 1423803_s_at Gltscr2 glioma tumor suppressor candidate region gene 2
+16 1451121_a_at Gltscr2 glioma tumor suppressor candidate region 2; exons 8 and 9
+17 1452409_at Gltscr2 glioma tumor suppressor candidate region gene 2
+25 1416556_at Sas sarcoma amplified sequence
+26 1430029_a_at Sas sarcoma amplified sequence
+```
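A minimal sketch of how the gap between the two result sets could be computed, assuming the probe-set names from each search have already been collected into Python lists. The helper name `missing_from_xapian` and the placeholder name `shared_probe_1` are hypothetical; only the five probe-set names come from this issue.

```python
def missing_from_xapian(sql_names, xapian_names):
    """Return names found by the SQL search but not by the Xapian search.

    The order of the SQL results is preserved so the output lines up
    with the SQL result table.
    """
    seen = set(xapian_names)
    return [name for name in sql_names if name not in seen]


# The five probe sets reported missing in this issue.
reported_missing = [
    "1423803_s_at",
    "1451121_a_at",
    "1452409_at",
    "1416556_at",
    "1430029_a_at",
]

# Hypothetical stand-in for the entries that both searches return.
shared = ["shared_probe_1"]

sql_names = shared + reported_missing
xapian_names = list(shared)

for name in missing_from_xapian(sql_names, xapian_names):
    print(name)
```

Keeping the SQL order, rather than returning a bare set difference, makes it easier to check the output against the row numbers in the SQL result page.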
+
+We want to work out why the above entries are missing from the XAPIAN results. To do that, we first use quest to look up one of the missing traits (1423803_s_at) and get its exact doc-id:
+
+```
+quest --msize=2 -s en --boolean-prefix="iden:Qgene:" \
+"iden:"1423803_s_at:hc_m2_0606_p"" --db=/export/data/genenetwork-xapian/
+
+Parsed Query: Query(0 * Qgene:1423803_s_at:hc_m2_0606_p) Exactly 1 matches MSet: 9665867: [0] {"name": "1423803_s_at", "symbol": "Gltscr2", "description": "glioma tumor suppressor candidate region gene 2", "chr": "1", "mb": 4.687986, "dataset": "HC_M2_0606_P", "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", "species": "mouse", "group": "BXD", "tissue": "Hippocampus mRNA", "mean": 11.749030303030299, "lrs": 11.3847971289981, "additive": -0.0650828877005346, "geno_chr": "5", "geno_mb": 137.010795}
+```
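The same lookup could be repeated for the other missing probe sets. The sketch below only builds the quest command lines, reusing the flags and database path from the command above; it does not run them. The helper names `iden_query` and `quest_argv` are hypothetical, and the `iden:<name>:<dataset>` layout with a lower-cased dataset is an assumption inferred from the parsed query shown above.

```python
import shlex

# Database path as used in the quest command in this issue.
DB_PATH = "/export/data/genenetwork-xapian/"


def iden_query(name, dataset):
    """Build the iden: query string for one trait (assumed layout)."""
    return f"iden:{name}:{dataset.lower()}"


def quest_argv(name, dataset, db=DB_PATH):
    """Return the quest argument list for looking up a single trait."""
    return [
        "quest",
        "--msize=2",
        "-s", "en",
        "--boolean-prefix=iden:Qgene:",
        iden_query(name, dataset),
        f"--db={db}",
    ]


# The remaining probe sets reported missing from the XAPIAN results.
remaining = ["1451121_a_at", "1452409_at", "1416556_at", "1430029_a_at"]

for name in remaining:
    # shlex.join quotes each argument safely for pasting into a shell.
    print(shlex.join(quest_argv(name, "HC_M2_0606_P")))
```

Passing an argument list, for example to `subprocess.run`, avoids nested shell quoting around the query term.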
+
+Inspecting that doc-id with xapian-delve gives:
+
+```
+bonfacem@tux02 /export5/xapian-test/xapian-07-04 $ xapian-delve -r 9665867 -d /export/data/genenetwork-xapian/
+
+Data for record #9665867:
+{"name": "1423803_s_at", "symbol": "Gltscr2", "description": "glioma tumor suppressor candidate region gene 2", "chr": "1", "mb": 4.687986, "dataset": "HC_M2_0606_P", "dataset_fullname": "Hippocampus Consortium M430v2 (Jun06) PDNN", "species": "mouse", "group": "BXD", "tissue": "Hippocampus mRNA", "mean": 11.749030303030299, "lrs": 11.3847971289981, "additive": -0.0650828877005346, "geno_chr": "5", "geno_mb": 137.010795}
+Term List for record #9665867: 1423803_s_at 2 5330430h08rik 9430097c02rik Qgene:1423803_s_at:hc_m2_0606_p XC1 XDShc_m2_0606_p XGbxd XIhippocampus XImrna XPC5 XSmouse XTgene XYgltscr2 ZXDShc_m2_0606_p ZXGbxd ZXIhippocampus ZXImrna ZXSmous ZXYgltscr2 Zbc017637 Zbxd Zcandid Zgene Zglioma Zgltscr2 Zhc_m2_0606_p Zhippocampus Zmous Zmrna Zregion Zsuppressor Ztumor bc017637 bxd candidate gene glioma gltscr2 hc_m2_0606_p hippocampus mouse mrna region suppressor tumor
+```