Traversal of Bounding Volume Hierachy in Shaders

ぐ巨炮叔叔 提交于 2020-05-13 05:30:05

问题


I am working on a path tracer using vulkan compute shaders. I implemented a tree representing a bounding volume hierachy. The idea of the BVH is to minimize the amount of objects a ray intersection test needs to be performed on.


#1 Naive Implementation

My first implementation is very fast, it traverses the tree down to a single leaf of the BVH tree. However, the ray might intersect multiple leaves. This code then leads to some triangles not being rendered (although they should).

int box_index = -1;

for (int i = 0; i < boxes_count; i++) {
    // the first box has no parent, boxes[0].parent is set to -1
    if (boxes[i].parent == box_index) {
        if (intersect_box(boxes[i], ray)) {
            box_index = i;
        }
    }
}

if (box_index > -1) {
    uint a = boxes[box_index].ids_offset;
    uint b = a + boxes[box_index].ids_count;

    for (uint j = a; j < b; j++) {
        uint triangle_id = triangle_references[j];
        // triangle intersection code ...
    }
}

#2 Multi-Leaf Implementation

My second implementation accounts for the fact that multiple leaves might be intersected. However, this implementation is 36x slower than implementation #1 (okay, I miss some intersection tests in #1, but still...).

bool[boxes.length()] hits;
hits[0] = intersect_box(boxes[0], ray);

for (int i = 1; i < boxes_count; i++) {
    if (hits[boxes[i].parent]) {
        hits[i] = intersect_box(boxes[i], ray);
    } else {
        hits[i] = false;
    }
}

for (int i = 0; i < boxes_count; i++) {
    if (!hits[i]) {
        continue;
    }

    // only leaves have ids_offset and ids_count defined (not set to -1)
    if (boxes[i].ids_offset < 0) {
        continue;
    }

    uint a = boxes[i].ids_offset;
    uint b = a + boxes[i].ids_count;

    for (uint j = a; j < b; j++) {
        uint triangle_id = triangle_references[j];
        // triangle intersection code ...
    }
}

This performance difference drives me crazy. It seems only having a single statement like if(dynamically_modified_array[some_index]) has a huge impact on performance. I suspect that the SPIR-V or GPU compiler is no longer able to do its optimization magic? So here are my questions:

  1. Is this indeed an optimization problem?

  2. If yes, can I transform implementation #2 to be better optimizable? Can I somehow give optimization hints?

  3. Is there a standard way to implement BVH tree queries in shaders?


回答1:


After some digging, I found a solution. Important to understand is that the BVH tree does not exclude the possibility that one needs to evaluate all leaves.

Implementation #3 below, uses hit and miss links. The boxes need to be sorted in a way that in the worst case all of them are queried in the correct order (so a single loop is enough). However, links are used to skip nodes which don't need to be evaluated. When the current node is a leaf node, the actual triangle intersections are performed.

  • hit link ~ which node to jump to in case of a hit (green below)
  • miss link ~ which node to jump to in case of a miss (red below)

Image taken from here. The associated paper and source code is also on Prof. Toshiya Hachisuka's page. The same concept is also described in this paper referenced in the slides.


#3 BVH Tree with Hit and Miss Links

I had to extend the data which is pushed to the shader with the links. Also some offline fiddling was required to store the tree correctly. At first I tried using a while loop (loop until box_index_next is -1) which resulted in a crazy slowdown again. Anyway, the following works reasonably fast:

int box_index_next = 0;

for (int box_index = 0; box_index < boxes_count; box_index++) {
    if (box_index != box_index_next) {
        continue;
    }

    bool hit = intersect_box(boxes[box_index], ray);
    bool leaf = boxes[box_index].ids_count > 0;

    if (hit) {
        box_index_next = boxes[box_index].links.x; // hit link
    } else {
        box_index_next = boxes[box_index].links.y; // miss link
    }

    if (hit && leaf) {
        uint a = boxes[box_index].ids_offset;
        uint b = a + boxes[box_index].ids_count;

        for (uint j = a; j < b; j++) {
            uint triangle_id = triangle_references[j];
            // triangle intersection code ...
        }
    }
}

This code is about 3x slower than the fast, but flawed implementation #1. This is somewhat expected, now the speed depends on the actual tree, not on the gpu optimization. Consider, for example, a degenerate case where triangles are aligned along an axis: a ray in the same direction might intersect with all triangles, then all tree leaves need to be evaluated.

Prof. Toshiya Hachisuka proposes a further optimization for such cases in his sildes (page 36 and onward): One stores multiple versions of the BVH tree, spatially sorted along x, -x, y, -y, z and -z. For traversal the correct version needs to be selected based on the ray. Then one can stop the traversal as soon as a triangle from a leaf is intersected, since all remaining nodes to be visited will be spatially behind this node (from the ray point of view).


Once the BVH tree is built, finding the links is quite straightforward (some python code below):

class NodeAABB(object):

    def __init__(self, obj_bounds, obj_ids):
        self.children = [None, None]
        self.obj_bounds = obj_bounds
        self.obj_ids = obj_ids

    def split(self):
        # split recursively and create children here
        raise NotImplementedError()

    def is_leaf(self):
        return set(self.children) == {None}

    def build_links(self, next_right_node=None):
        if not self.is_leaf():
            child1, child2 = self.children

            self.hit_node = child1
            self.miss_node = next_right_node

            child1.build_links(next_right_node=child2)
            child2.build_links(next_right_node=next_right_node)

        else:
            self.hit_node = next_right_node
            self.miss_node = self.hit_node

    def collect(self):
        # retrieve in depth first fashion for correct order
        yield self
        if not self.is_leaf():
            child1, child2 = self.children
            yield from child1.collect()
            yield from child2.collect()

After you store all AABBs in an array (which will be sent to the GPU) you can use hit_node and miss_node to look up the indices for the links and store them as well.



来源:https://stackoverflow.com/questions/55479683/traversal-of-bounding-volume-hierachy-in-shaders

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!