Fastest way to count number of 1s in a register, ARM assembly

不思量自难忘° 2020-12-05 11:44

So I had an interview question before regarding bit manipulation. The company is a well-known GPU company. I had very little background in assembly language (weird despite b

6 answers

慢半拍i (original poster)

2020-12-05 12:05
Whether this code is fast depends on the processor. It will certainly not be very fast on a Cortex-A8, where moving a result from a NEON register back to a core register stalls the pipeline, but it may run very fast on a Cortex-A9 and newer CPUs.

It is, however, a very short solution.

It expects the input in r0 and returns the result in r0:
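```
  vmov.32 d0[0], r0        @ copy r0 into the low 32 bits of NEON register d0
  vcnt.8  d0, d0           @ per-byte popcount: each byte now holds 0..8
  vmov.32 r0, d0[0]        @ move the four byte counts back into r0

  add r0, r0, r0, lsr #16  @ byte0 = b0+b2, byte1 = b1+b3
  add r0, r0, r0, lsr #8   @ byte0 = b0+b1+b2+b3 (0..32)
  and r0, r0, #63          @ keep 6 bits; #31 would turn a count of 32 into 0
```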
The main work is done by the vcnt.8 instruction, which counts the set bits in each byte of a NEON register and stores each count back into the corresponding byte of D0.
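The effect of vcnt.8 on a D register can be modelled in portable C, which is handy for checking the arithmetic on a machine without NEON. This is only a sketch: the function name `vcnt8_model` is hypothetical, and a 64-bit integer stands in for the D register, with byte 0 in the low-order bits.

```c
#include <stdint.h>

/* Hypothetical helper: models what vcnt.8 does to a 64-bit D register.
 * Each byte of the result holds the number of set bits (0..8) in the
 * corresponding byte of the input. */
uint64_t vcnt8_model(uint64_t d)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t byte = (uint8_t)(d >> (8 * lane));
        uint64_t count = 0;
        while (byte) {
            /* clear the lowest set bit until none remain */
            byte &= (uint8_t)(byte - 1);
            count++;
        }
        result |= count << (8 * lane);
    }
    return result;
}
```

For example, `vcnt8_model(0x0103070F)` returns `0x01020304`, because the bytes 0x0F, 0x07, 0x03 and 0x01 contain 4, 3, 2 and 1 set bits.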

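There is no vcnt.32 form, only .8, so the four byte counts have to be added together horizontally, which is what the rest of the code does: adding r0 shifted right by 16 sums bytes 0 and 2 (and bytes 1 and 3), and adding that result shifted right by 8 leaves the total of all four bytes in the low byte.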
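The horizontal add can be traced the same way. Below is a C sketch (the name `horizontal_add_bytes` is hypothetical) that applies the answer's shift-and-add steps to the four byte counts as they sit in r0 after the move back from d0.

```c
#include <stdint.h>

/* Hypothetical helper: models the three-instruction reduction in the answer.
 * Input: four per-byte bit counts (each 0..8) packed into a 32-bit word,
 * as left in r0 after vcnt.8 and the move back from d0.
 * Returns their sum, computed with the same shift-and-add steps. */
uint32_t horizontal_add_bytes(uint32_t r0)
{
    r0 += r0 >> 16; /* add r0, r0, r0, lsr #16: byte0 = b0+b2, byte1 = b1+b3 */
    r0 += r0 >> 8;  /* add r0, r0, r0, lsr #8:  byte0 = b0+b1+b2+b3 */
    return r0 & 63; /* the total can be 32, so six bits must be kept */
}
```

For an input of 0xFFFFFFFF, vcnt.8 leaves 0x08080808 in r0, and `horizontal_add_bytes(0x08080808)` returns 32; masking with 31 instead of 63 would return 0 for that case.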