This is a multipurpose question:
To answer your second question, I think the naive byte-based strlen
implementation will result in better auto-vectorization by the compiler, if it's smart and support for vector instruction set extensions (e.g. SSE) has been enabled (e.g. with -msse
or an appropriate -march
). Unfortunately, it won't result in any vectorization with baseline cpus which lack these features, even though the compiler could generate 32- or 64-bit pseudo-vectorized code like the C code cited in the question, if it were smart enough...