Using SIMD/AVX/SSE for tree traversal
I am currently researching whether it would be possible to speed up a van Emde Boas (or any tree) tree traversal. Given a single search query as input, already having multiple tree nodes in the cache line (van emde Boas layout), tree traversal seems to be instruction-bottlenecked. Being kinda new to SIMD/AVX/SSE instructions, I would like to know from experts in that topic whether it would be possible to compare multiple nodes at once to a value and then find out which tree path to follow further on. My research lead to the following question: How many cpu cycles/instructions are wasted on