How searching works
Read
parcing.md
before
Entrypoint is Search
in search.go
. We take an array of fractions and search them in searchFrac
independently, and then merge the results. There's searchFrac
and searchFracImpl
: the first is an entrypoint and does some fraction initialization and takes locks, and the second is actual searching.
- First we need to build an eval tree from AST:
BuildEvalTree
insearch/eval.go
. This is mostly 1 to 1 mapping of AST nodes, but with different types (to split functionality in different types). If a field query have several candidates (ranges or wildcards), it will build binary tree with all the candidates. Making this binary tree saves a lot of time, so it is important. This tree's leaves implement node.Node interface.
- The active fraction uses the node.nodeStatic implementation of node.Node, based on slice of LIDs in memory.
- For sealed fraction lids.Iterator used. This iterator implements lazy load LIDs from disk/cache.
- Also, if we need to build an aggregation, we build a similar OR tree for the query "agg_field: *". Then, at the evaluation stage, we are getting the intersection of the received LIDs with the LIDs of the result of the main search query and build a map of counters with a token as a key.
- Also we may add histograms, which is basically a map from time interval into how many documents are there.
- Then we evaluate the tree, while there are some documents. We do this in "online" or "lazy", meaning that we don't evaluate each node fully, but read values one by one. It can heavily reduce the RAM usage during request. Values come in sorted order from each source, so we can do efficient operations on them.
Some basic overview of nodes:- OR: works kinda like a merge sort: it reads an integer from both sources, and returns the smallest. If they are equal, it removes the duplicate.
- AND: works similar to OR, but returns an element only if it's in both sources (== they are equal).
- NOT: iterates over all documents and returns only that, which are not in the source.
- NAND (AND with one NOT): returns only the documents from regular source, that are not in the negative source. Reads all the negative skipping them, until we go above current regular; if they're equal, skip the regular, if not, we return it.