element word positions - conceptual questions

Posted by Deadly on 2019-12-06 03:00:54

When you turn on positions, we store a positions vector for each document in the index for the relevant term, instead of just the document id.
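To make the difference concrete, here is a minimal Python sketch of the two kinds of postings. It is a toy model, not the actual on-disk index format: `build_index`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(docs, with_positions):
    """Toy inverted index (illustrative only, not the real index layout).

    Without positions: term -> set of document ids.
    With positions:    term -> {document id: [word offsets]}.
    """
    if with_positions:
        index = defaultdict(lambda: defaultdict(list))
    else:
        index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; a real tokenizer is more involved.
        for pos, word in enumerate(text.lower().split()):
            if with_positions:
                index[word][doc_id].append(pos)
            else:
                index[word].add(doc_id)
    return index

# Hypothetical sample documents.
docs = {
    1: "element word positions",
    2: "positions word element",
}
ids_only = build_index(docs, with_positions=False)
with_pos = build_index(docs, with_positions=True)
```

With ids only, both documents look identical for the term "element"; with positions, the index also records where in each document the word occurs.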

The way to think about this is in terms of the specificity of the leaf queries and the work involved in calculating them and intersecting intermediate results.

When you see a term-query in the query plan, it is just looking up document ids, so it has no knowledge of relative positioning. For a long phrase like this, that gives a less accurate result, because "element word" and "word position" could occur in two separate parent elements in the document. If your data only ever has one element with this name in each document, that cannot happen, although you could still get false matches where the two-word subphrases occur in, say, the reverse order, or separated by other words.

When you see a word-query in the query plan, that means we are going to be looking at positions, and here you see the relative positions for each of the words in the phrase. When this is resolved, we examine the positions vector and toss out the matches that don't meet this positional constraint. So all the remaining matches will have this sequence of words in this order: a more precise match.
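As a rough illustration of that filtering step, the sketch below checks a phrase against a small hand-written positional index. The index contents and the `phrase_match` helper are hypothetical; the real resolution logic is certainly more elaborate.

```python
def phrase_match(index, phrase):
    """Return ids of documents that contain the words of `phrase`
    at consecutive positions, in order (toy model)."""
    words = phrase.lower().split()
    postings = [index.get(w, {}) for w in words]

    # Step 1: documents containing every word (what id-only lookups give you).
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)

    # Step 2: toss out candidates that fail the positional constraint.
    matches = set()
    for doc_id in candidates:
        for start in postings[0][doc_id]:
            if all(start + i in postings[i][doc_id] for i in range(1, len(words))):
                matches.add(doc_id)
                break
    return matches

# Hypothetical positional index: term -> {document id: [word offsets]}.
index = {
    "element":   {1: [0], 2: [2]},
    "word":      {1: [1], 2: [1]},
    "positions": {1: [2], 2: [0]},
}
result = phrase_match(index, "element word positions")
```

Both documents contain all three words, so an id-only lookup would return both; only document 1 has them in the required order.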

The element-query in the plan is also applying positional constraints, of the element instances relative to the matches inside the element. There are optimizations where the element positional constraints are actually pushed down to the leaves of the query tree to avoid excess intermediate calculations.
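One way to picture the element constraint is as a containment check between position ranges. The sketch below assumes, purely for illustration, that each element instance and each phrase match can be described by a (start, end) pair of word offsets; the names are hypothetical and the actual representation may differ.

```python
def matches_inside_element(element_spans, match_spans):
    """Keep only the match spans that lie entirely inside
    some element instance span (both ends inclusive)."""
    return [
        (m_start, m_end)
        for (m_start, m_end) in match_spans
        if any(e_start <= m_start and m_end <= e_end
               for (e_start, e_end) in element_spans)
    ]

# Hypothetical spans within one document.
element_spans = [(0, 2), (5, 9)]   # two instances of the element
match_spans = [(1, 3), (6, 8)]     # two candidate phrase matches
kept = matches_inside_element(element_spans, match_spans)
```

The first candidate straddles the end of the first element instance and is dropped; the second sits wholly inside the second instance and is kept.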

You also see some technically redundant term queries: the point of these is to do simple term lookups that are probably more constrained than the leaf word queries. Since intersection of term lists from an and-query always proceeds from the shortest matching posting list, this can provide a fail-fast mechanism to avoid the more expensive positions calculations. There is a certain amount of heuristic judgement in that, and given a complex set of index options and query variations, sometimes those additional terms are, in fact, not helpful.
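The fail-fast idea can be sketched as follows. This is a simplified model that uses Python sets as posting lists; `intersect_shortest_first` is a hypothetical helper, not the server's implementation.

```python
def intersect_shortest_first(posting_lists):
    """Intersect document-id lists, starting from the shortest one
    and stopping as soon as the running result is empty."""
    if not posting_lists:
        return set()
    ordered = sorted(posting_lists, key=len)
    result = set(ordered[0])
    for plist in ordered[1:]:
        if not result:
            break  # nothing can match; skip the remaining lists
        result &= set(plist)
    return result

# Hypothetical posting lists: one rare term and two common ones.
common_a = {1, 2, 3, 4, 5, 6}
common_b = {2, 3, 4, 5}
rare = {2}
survivors = intersect_shortest_first([common_a, common_b, rare])
```

Only the surviving documents would then need the more expensive positions check. If the extra term lookup turns out not to be more selective than the other lists, it adds work without pruning anything, which is the case where those additional terms do not help.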
