Multiplication using SSE: (x*x*x)+(y*y*y)
Question

I'm trying to optimize this function using SIMD, but I don't know where to start.

    long sum(int x, int y)
    {
        return x*x*x + y*y*y;
    }

The disassembled function looks like this:

    4007a0: 48 89 f2          mov    %rsi,%rdx
    4007a3: 48 89 f8          mov    %rdi,%rax
    4007a6: 48 0f af d6       imul   %rsi,%rdx
    4007aa: 48 0f af c7       imul   %rdi,%rax
    4007ae: 48 0f af d6       imul   %rsi,%rdx
    4007b2: 48 0f af c7       imul   %rdi,%rax
    4007b6: 48 8d 04 02       lea    (%rdx,%rax,1),%rax
    4007ba: c3                retq
    4007bb: 0f 1f 44 00 00    nopl   0x0(%rax,%rax,1)

The calling code