Is there any difference between logical SSE intrinsics for different types? For example if we take OR operation, there are three intrinsics: _mm_or_ps, _mm_or_pd and _mm_or_
According to Intel and AMD optimization guidelines mixing op types with data types produces a performance hit as the CPU internally tags 64 bit halves of the register for a particular data type. This seems to mostly effect pipe-lining as the instruction is decoded and the uops are scheduled. Functionally they produce the same result. The newer versions for the integer data types have larger encoding and take up more space in the code segment. So if code size is a problem use the old ops as these have smaller encoding.