Use C# Vector<T> SIMD to find index of matching element

穿精又带淫゛_ 提交于 2020-07-20 17:21:45

问题


Using C#'s Vector<T>, how can we most efficiently vectorize the operation of finding the index of a particular element in a set?

As constraints, the set will always be a Span<T> of an integer primitive, and it will contain at most 1 matching element.

I have come up with a solution that seems alright, but I'm curious if we can do better. Here is the approach:

  1. Create a Vector<T> consisting only of the target element, in each slot.
  2. Use Vector.Equals() between the input set vector and the vector from the previous step, to get a mask containing a 1 in the single matching slot (or only 0s if there is no match).
  3. Using a pre-initialized vector containing 1-based indexes (1, 2, 3, 4, ...), call Vector.Dot() between that vector and the mask from the previous step. Each index will be multiplied by 0, except the potential matching index, which will be multiplied by 1. What we get back is the sum of those multiplications, which is either 0, or the 1-based index of the matching element.
  4. If the result was 0, return -1 for no match. Otherwise, subtract one from the result to make it 0-based, and return that.

        // One-time initialized vector containing { 1, 2, 3, 4, ... }
        Vector<ushort> indexes = MemoryMarshal.Cast<ushort, Vector<ushort>>(Enumerable.Range(1, Vector<ushort>.Count).Select(index => (ushort)index).ToArray())[0];
    
        // The input set and the element to search for
        Span<ushort> set = stackalloc ushort[]{ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 };
        ushort element = 22;
    
        // Interpret input set as a sequence of vectors (set is assumed to have length power of two for brevity)
        var setVectors = MemoryMarshal.Cast<ushort, Vector<ushort>>(set);
    
        // Create a vector that contains the target element in each slot
        var elementVector = new Vector<ushort>(element);
    
        // Loop per vector rather than per element
        foreach (var vector in setVectors)
        {
            // Get a mask that has a 1 in the single matching slot, or only 0s
            var mask = Vector.Equals(vector, elementVector);
    
            // Get the dot product of the mask and the indexes
            // This will multiple each index by 0, or by 1 if it is the matching one, and return their sum, i.e. the matching index or 0
            // Note that the indexes are deliberately 1-based, to distinguished from 0 (no match)
            var index = Vector.Dot(indexes, mask);
    
            // Either return 0 for no match, or reduce the index by 1 to get the 0-based index
            return index == 0 ? -1 : index - 1;
        }
    

回答1:


The x86 asm you want the compiler to generate is compare-for-equal (pcmpeqb), pmovmskb or movmskps (vector to bitmask with 1-byte or 4-byte elements) and then if the mask is non-zero, bit-scan for first set bit (bsf or tzcnt).

That'll be more efficient than an integer dot product!!

You already have the compare-for-equal, and I think I've seen other C# Q&As with an intrinsic for vector->bitmap. If someone wants to edit this answer or post their own with C# that compiles / JITs to this asm, please do so. I don't know C#, I'm just here for the x86 SIMD.




回答2:


As I can see the simple Span<char>.IndexOf already is using Intrinsics for searching a simple value. You don't even need to cast to char to use it, since MemoryExtensions.IndexOf only cares about size and Unsafe.SizeOf<ushort>() == sizeof(char) !

Also in JsonReaderHelper.IndexOfOrLessThan you will find a more complex Vectorization example for searching. It is using byte searching, but I am sure that you can adapt it to your needs if Span<ushort>.IndexOf is not suitable.



来源:https://stackoverflow.com/questions/56955433/use-c-sharp-vectort-simd-to-find-index-of-matching-element

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!