How to use Morton Order(z order curve) in range search?

岁酱吖の 提交于 2020-02-18 05:06:10

问题


How to use Morton Order in range search? From the wiki, In the paragraph "Use with one-dimensional data structures for range searching",

it says

"the range being queried (x = 2, ..., 3, y = 2, ..., 6) is indicated by the dotted rectangle. Its highest Z-value (MAX) is 45. In this example, the value F = 19 is encountered when searching a data structure in increasing Z-value direction. ......BIGMIN (36 in the example).....only search in the interval between BIGMIN and MAX...."

My questions are:

1) why the F is 19? Why the F should not be 16?

2) How to get the BIGMIN?

3) Are there any web blogs demonstrate how to do the range search?


回答1:


EDIT: The AWS Database Blog now has a detailed introduction to this subject.


This blog post does a reasonable job of illustrating the process.

When searching the rectangular space x=[2,3], y=[2,6]:

  1. The minimum Z Value (12) is found by interleaving the bits of the lowest x and y values: 2 and 2, respectively.
  2. The maximum Z value (45) is found by interleaving the bits of the highest x and y values: 3 and 6, respectively.
  3. Having found the min and max Z values (12 and 45), we now have a linear range that we can iterate across that is guaranteed to contain all of the entries inside of our rectangular space. The data within the linear range is going to be a superset of the data we actually care about: the data in the rectangular space. If we simply iterate across the entire range, we are going to find all of the data we care about and then some. You can test each value you visit to see if it's relevant or not.

An obvious optimization is to try to minimize the amount of superfluous data that you must traverse. This is largely a function of the number of 'seams' that you cross in the data -- places where the 'Z' curve has to make large jumps to continue its path (e.g. from Z value 31 to 32 below).

This can be mitigated by employing the BIGMIN and LITMAX functions to identify these seams and navigate back to the rectangle. To minimize the amount of irrelevant data we evaluate, we can:

  1. Keep a count of the number of consecutive pieces of junk data we've visited.
  2. Decide on a maximum allowable value (maxConsecutiveJunkData) for this count. The blog post linked at the top uses 3 for this value.
  3. If we encounter maxConsecutiveJunkData pieces of irrelevant data in a row, we initiate BIGMIN and LITMAX. Importantly, at the point at which we've decided to use them, we're now somewhere within our linear search space (Z values 12 to 45) but outside the rectangular search space. In the Wikipedia article, they appear to have chosen a maxConsecutiveJunkData value of 4; they started at Z=12 and walked until they were 4 values outside of the rectangle (beyond 15) before deciding that it was now time to use BIGMIN. Because maxConsecutiveJunkData is left to your tastes, BIGMIN can be used on any value in the linear range (Z values 12 to 45). Somewhat confusingly, the article only shows the area from 19 on as crosshatched because that is the subrange of the search that will be optimized out when we use BIGMIN with a maxConsecutiveJunkData of 4.

When we realize that we've wandered outside of the rectangle too far, we can conclude that the rectangle in non-contiguous. BIGMIN and LITMAX are used to identify the nature of the split. BIGMIN is designed to, given any value in the linear search space (e.g. 19), find the next smallest value that will be back inside the half of the split rectangle with larger Z values (i.e. jumping us from 19 to 36). LITMAX is similar, helping us to find the largest value that will be inside the half of the split rectangle with smaller Z values. The implementations of BIGMIN and LITMAX are explained in depth in the zdivide function explanation in the linked blog post.




回答2:


It appears that the quoted example in the Wikipedia article has not been edited to clarify the context and assumptions. The approach used in that example is applicable to linear data structures that only allow sequential (forward and backward) seeking; that is, it is assumed that one cannot randomly seek to a storage cell in constant time using its morton index alone.

With that constraint, one's strategy begins with a full range that is the mininum morton index (16) and the maximum morton index (45). To make optimizations, one tries to find and eliminate large swaths of subranges that are outside the query rectangle. The hatched area in the diagram refers to what would have been accessed (sequentially) if such optimization (eliminating subranges) had not been applied.

After discussing the main optimization strategy for linear sequential data structures, it goes on to talk about other data structures with better seeking capability.



来源:https://stackoverflow.com/questions/30170783/how-to-use-morton-orderz-order-curve-in-range-search

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!