strlen performance implementation


别那么骄傲

This is a multipurpose question:

How does this compare to the glibc strlen implementation?
Is there a better way to do this, both in general and for auto-vectorization?


3 Answers

旧时难觅i

2020-12-16 23:59
Also, please note this implementation can read past the end of a char array here:
```
for (w = (const void *)s; !HASZERO(*w); w++);
```
and therefore relies on undefined behaviour.
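When the size of the underlying buffer is known, the over-read can be avoided entirely. The following is only a sketch of that idea, not the code from the question: `bounded_strlen` and its `cap` parameter are hypothetical names, the macros are reconstructed from the names mentioned in this thread, and `memcpy` is used for the word load so that alignment does not matter.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Assumed definitions, reconstructed from the macro names used in this thread. */
#define ONES  ((size_t)-1 / UCHAR_MAX)          /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))      /* 0x8080...80 */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* Hypothetical helper: length of the string in buf, scanning at most cap bytes.
 * Returns cap if no terminator is found within the buffer. */
size_t bounded_strlen(const char *buf, size_t cap)
{
    size_t i = 0;

    /* Word-at-a-time scan, but only while a full word still lies inside buf. */
    while (cap - i >= sizeof(size_t)) {
        size_t w;
        memcpy(&w, buf + i, sizeof w);  /* well-defined for any alignment */
        if (HASZERO(w))
            break;                      /* a zero byte is somewhere in this word */
        i += sizeof w;
    }

    /* Byte-at-a-time tail finds the exact position of the terminator. */
    while (i < cap && buf[i] != '\0')
        i++;

    return i;
}
```

A plain `strlen` has no `cap` to work with, which is exactly why word-at-a-time versions of it end up reading bytes beyond the terminator.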
春和景丽

2020-12-17 00:07

To answer your second question: I think the naive byte-based strlen implementation gives the compiler a better chance at auto-vectorization, provided the compiler is smart enough and support for a vector instruction set extension (e.g. SSE) has been enabled (e.g. with -msse or an appropriate -march). Unfortunately, it won't be vectorized at all when targeting baseline CPUs that lack these extensions, even though a sufficiently smart compiler could generate 32- or 64-bit pseudo-vectorized code like the C code cited in the question.
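For reference, the naive byte-based loop referred to here might look like the sketch below. The name `naive_strlen` is made up for illustration. Whether a given compiler vectorizes it, or recognizes the idiom and does something else with it, depends on the compiler version and flags, so it is worth inspecting the generated assembly (e.g. with `-O3 -S`) rather than assuming.

```c
#include <stddef.h>

/* Hypothetical name: a plain byte-at-a-time strlen.
 * It never reads past the terminating '\0', so it has no over-read issue. */
size_t naive_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;

    return (size_t)(p - s);
}
```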

半阙折子戏

2020-12-17 00:11
Well, this implementation is based on virtually the same trick (Determine if a word has a zero byte) as the glibc implementation you linked. They do pretty much the same thing, except that in the glibc version some loops are unrolled and the bit masks are spelled out explicitly. The ONES and HIGHS from the code you posted are exactly lomagic = 0x01010101L and himagic = 0x80808080L from the glibc version.

The only difference I see is that glibc's version uses a slightly different criterion for detecting a zero byte:
```
if ((longword - lomagic) & himagic)
```
without doing ... & ~longword (compare with the HASZERO(x) macro in your example, which does the same thing with x but also includes the ~(x) term). Apparently the glibc authors believed this shorter formula to be more efficient. However, it can produce false positives: a byte of 0x81 or above also sets the high bit after the subtraction. So they check for false positives inside that if.
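The difference between the two criteria can be seen on a 32-bit word. This is a minimal sketch, assuming 32-bit masks; the function names `has_zero_exact` and `has_zero_rough` are invented for illustration and do not appear in either implementation.

```c
#include <stdint.h>

#define LOMAGIC 0x01010101u   /* same value as ONES / lomagic  */
#define HIMAGIC 0x80808080u   /* same value as HIGHS / himagic */

/* Single-stage precise test (the HASZERO form): nonzero iff x has a zero byte. */
static int has_zero_exact(uint32_t x)
{
    return ((x - LOMAGIC) & ~x & HIMAGIC) != 0;
}

/* Rough first-stage test (the shorter form quoted above): never misses a
 * zero byte, but also fires for some words that contain bytes >= 0x81. */
static int has_zero_rough(uint32_t x)
{
    return ((x - LOMAGIC) & HIMAGIC) != 0;
}
```

For example, the word 0x81010101 contains no zero byte: the rough test reports a hit (0x81010101 - 0x01010101 = 0x80000000), while the exact test rejects it because `~x` clears that high bit. For plain ASCII text the two tests agree, which is presumably why the cheaper first stage is attractive.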

It is indeed an interesting question which is more efficient: a single-stage precise test (your code) or a two-stage test that begins with a rough, imprecise check followed, if necessary, by a precise second check (glibc code).

If you want to see how they compare in terms of actual performance - time them on your platform and your data. There's no other way.
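A very rough timing harness could look like the sketch below. Everything in it is an assumption made for illustration: the `strlen_fn` typedef, the `time_strlen` helper, and the use of `clock()`, which measures CPU time at coarse resolution. A serious benchmark would also vary string lengths and alignments and would check that the compiler has not hoisted the call out of the loop.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by all candidate implementations. */
typedef size_t (*strlen_fn)(const char *);

/* Returns the CPU seconds spent calling fn(s) reps times. */
static double time_strlen(strlen_fn fn, const char *s, int reps)
{
    volatile size_t sink = 0;   /* keeps the results observably "used" */
    clock_t start = clock();

    for (int i = 0; i < reps; i++)
        sink += fn(s);

    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Usage would be along the lines of `time_strlen(strlen, buf, 1000000)` for the library version, then the same call with each of your own implementations, on the data you actually care about.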