How to add values from vector to each other

纵饮孤独 submitted on 2020-02-07 03:39:25

Question


In my code I approximate the integral of

y=x^2-4x+6

I use SSE, which lets me operate on 4 values at a time. I wrote a program that evaluates this function at sample points from 0 to 5, split into four 4-element vectors n1, n2, n3, n4.

.data
n1: .float 0.3125,0.625,0.9375,1.25
n2: .float 1.5625,1.875,2.1875,2.5
n3: .float 2.8125,3.12500,3.4375,3.75
n4: .float 4.0625,4.37500,4.6875,5
szostka: .float 6,6,6,6
czworka: .float 4,4,4,4
.text
.global main
main:  
        movups (n1),%xmm0

        mulps %xmm0,%xmm0
        movups (szostka),%xmm2
        addps %xmm2,%xmm0
        movups (n1),%xmm1
        movups (czworka),%xmm2
        mulps %xmm2,%xmm1
        subps %xmm1,%xmm0
        movups %xmm0,%xmm7

        movups (n2),%xmm0

        mulps %xmm0,%xmm0
        movups (szostka),%xmm2
        addps %xmm2,%xmm0
        movups (n2),%xmm1
        movups (czworka),%xmm2
        mulps %xmm2,%xmm1
        subps %xmm1,%xmm0
        movups %xmm0,%xmm6

        movups (n3),%xmm0

        mulps %xmm0,%xmm0
        movups (szostka),%xmm2
        addps %xmm2,%xmm0
        movups (n3),%xmm1
        movups (czworka),%xmm2
        mulps %xmm2,%xmm1
        subps %xmm1,%xmm0
        movups %xmm0,%xmm5

        movups (n4),%xmm0

        mulps %xmm0,%xmm0
        movups (szostka),%xmm2
        addps %xmm2,%xmm0
        movups (n4),%xmm1
        movups (czworka),%xmm2
        mulps %xmm2,%xmm1
        subps %xmm1,%xmm0
        movups %xmm0,%xmm4

        mov $1,%eax
        mov $0,%ebx
        int $0x80 

In the end, I have 4 vectors in registers xmm7, xmm6, xmm5, xmm4. To finish the integral, I need to add the vectors to each other (which is easy) and then add the four values inside the resulting vector to each other.
How should I do this?


Answer 1:


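As Paul R said in a comment, you can use haddps for horizontal ops within a vector, at the end. Note that haddps is an SSE3 instruction, and reducing four floats to a single sum takes two of them (haddps %xmm0,%xmm0 executed twice leaves the total in every element).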
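
For readers working from C, the same horizontal reduction can be sketched with intrinsics. This is a minimal sketch, not the answerer's code: the helper name hsum_ps is hypothetical, and it uses SSE1 shuffles (movhlps and shufps) rather than haddps, so it does not require SSE3.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: add the four floats in v and return the total. */
static float hsum_ps(__m128 v)
{
    /* hi = [v2, v3, v2, v3]: the upper half of v copied into the lower half. */
    __m128 hi = _mm_movehl_ps(v, v);
    /* sums = [v0+v2, v1+v3, ...]; only the two low elements matter from here. */
    __m128 sums = _mm_add_ps(v, hi);
    /* Broadcast element 1 (v1+v3) so it also sits in element 0. */
    __m128 odd = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 1, 1, 1));
    /* Low element becomes (v0+v2) + (v1+v3); extract it as a scalar. */
    return _mm_cvtss_f32(_mm_add_ss(sums, odd));
}
```

In the question's terms, this would be applied once, after xmm4 through xmm7 have been added together with three addps instructions.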

Your code looks inefficient. If you're going to fully unroll rather than use a loop and an accumulator, compute each block in a different register in the first place, so you don't need a movups %xmm0,%xmmX at the end of every block.

Also, keep (szostka) and (czworka) in registers across iterations. Don't reload them every time. Similarly, replace movups (n1),%xmm1 with movups %xmm0, %xmm1 (before you square %xmm0). On Ivy Bridge and later, the register-renaming stage handles reg-reg moves, and they happen with zero latency.

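If you did need to load (szostka) every time, it would be better to use addps with a memory operand (addps (szostka),%xmm0), instead of a separate move and add. Micro-fusion could keep that operation as a single uop. Note that, unlike a movups load, a memory operand to addps must be 16-byte aligned, so the data would need something like .balign 16 in front of it.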

Check out http://agner.org/optimize/ for docs on how to optimize assembly. You might find it more useful to use intrinsics, to let the compiler take care of small details like register allocation, instead of writing in asm directly.
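
As an illustration of the intrinsics route, here is a minimal sketch under stated assumptions, not the answerer's code. The function name sum_poly is hypothetical, the input length is assumed to be a multiple of 4, and the final reduction uses SSE1 shuffles instead of haddps. It follows the advice above: the constants are set up once, a loop with a single accumulator replaces the unrolled blocks, and the horizontal add happens only at the end.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: returns the sum of x*x - 4*x + 6 over xs[0..n-1].
 * Assumption: n is a multiple of 4, so there is no scalar tail loop. */
static float sum_poly(const float *xs, size_t n)
{
    const __m128 four = _mm_set1_ps(4.0f);  /* plays the role of czworka */
    const __m128 six  = _mm_set1_ps(6.0f);  /* plays the role of szostka */
    __m128 acc = _mm_setzero_ps();          /* four running partial sums */

    for (size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_loadu_ps(xs + i);                /* unaligned load, like movups */
        __m128 y = _mm_add_ps(_mm_mul_ps(x, x), six);   /* x^2 + 6 */
        y = _mm_sub_ps(y, _mm_mul_ps(four, x));         /* x^2 + 6 - 4x */
        acc = _mm_add_ps(acc, y);                       /* vertical add */
    }

    /* Horizontal add of the four partial sums, done once at the end. */
    __m128 hi   = _mm_movehl_ps(acc, acc);              /* [a2, a3, a2, a3] */
    __m128 sums = _mm_add_ps(acc, hi);                  /* [a0+a2, a1+a3, ...] */
    __m128 odd  = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sums, odd));
}
```

If the question's 16 points are read as the right endpoints of 16 equal subintervals of [0, 5] (an interpretation, since the question does not say so), multiplying the returned sum by the step width 0.3125 gives a Riemann-sum estimate of the integral.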



Source: https://stackoverflow.com/questions/30719340/how-to-add-values-from-vector-to-each-other
