What's the coolest hack you've seen or done? [closed]

忘掉有多难 2020-12-22 15:44

30 answers

情深已故 (original poster)

2020-12-22 15:53
It's such a trivial thing, but when I first saw this code (written by a fellow developer of mine) I was shocked, because it is something I would never have thought of (comments added by me):
```
cglobal x264_sub8x8_dct_sse2, 3,3  ;3,3 means 3 arguments and 3 registers used
.skip_prologue:
    call .8x4
    add  r0, 64                    ;increment pointers
    add  r1, 4*FENC_STRIDE
    add  r2, 4*FDEC_STRIDE
.8x4:
    SUB_DCT4 2x4x4W                ;this macro does the actual transform
    movhps [r0+32], m0             ;store second half of output data
    movhps [r0+40], m1             ;the rest is done in the macro
    movhps [r0+48], m2
    movhps [r0+56], m3
    ret
```
It does an 8x8 block of 4 transforms by handling one 8x4 set at a time. But it doesn't paste the code twice (that would waste code size), it doesn't have a separate 8x4 function that it calls twice, and it doesn't use a loop either. Instead, it calls the "function", increments the pointers, and then "falls" right into the same code and runs it again. The `ret` at the end of that second pass returns to the original caller.

It gets the best of both worlds: no function-call overhead beyond the original call (since the pointers r0, r1, and r2 aren't incremented in SUB_DCT4), no code duplication, and no loop overhead.
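
To make the control flow easier to follow, here is a minimal sketch in Python that simulates the call-then-fall-through pattern with a toy instruction list and a return-address stack. It is not x264 code: the `run` function, the `work` pseudo-instruction (standing in for the SUB_DCT4 body and stores), and the `PROGRAM`/`LABELS` names are all assumptions made up for illustration, and only one pointer (the analogue of r0) is modelled.

```python
def run(program, labels, entry):
    """Execute a toy instruction list and return the pointer value seen by each 'work' step."""
    trace = []
    stack = [None]   # None stands in for the outer caller's return address
    ptr = 0          # hypothetical stand-in for r0
    pc = labels[entry]
    while True:
        op, arg = program[pc]
        if op == "call":
            stack.append(pc + 1)   # push the address of the next instruction
            pc = labels[arg]
        elif op == "add":
            ptr += arg             # advance the pointer between the two passes
            pc += 1
        elif op == "work":
            trace.append(ptr)      # stand-in for the transform body
            pc += 1
        elif op == "ret":
            target = stack.pop()
            if target is None:     # returned to the outer caller: done
                return trace
            pc = target


# Hypothetical program mirroring the shape of the assembly above.
PROGRAM = [
    ("call", "half"),  # 0: first pass, entered by call
    ("add", 64),       # 1: increment the pointer
    ("work", None),    # 2: label "half"; second pass arrives here by fall-through
    ("ret", None),     # 3: first ret goes back to 1, second ret leaves the function
]
LABELS = {"entry": 0, "half": 2}

trace = run(PROGRAM, LABELS, "entry")
print(trace)
```

The body runs twice, once at offset 0 and once at offset 64, even though the program contains a single `call` and no loop or duplicated body. The first `ret` pops the address pushed by that `call`; the second pops the outer caller's address.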