simd

Optimization of 3D Direct Convolution Implementation in C

本小妞迷上赌 submitted on 2020-08-10 20:21:28
Question: For my project, I've written a naive C implementation of direct 3D convolution with periodic padding on the input. Unfortunately, since I'm new to C, the performance isn't very good. Here's the code:

    int mod(int a, int b) {
        // calculate mod to get the correct index with periodic padding
        int r = a % b;
        return r < 0 ? r + b : r;
    }

    void convolve3D(const double *image, const double *kernel, const int imageDimX, const int imageDimY, const int imageDimZ, const int stencilDimX, const int stencilDimY
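A common first step is to move the modulo out of the innermost loop: precompute, once per axis, which wrapped input index each kernel tap reads for each output coordinate, so the multiply-add loop does no modulo arithmetic at all. The sketch below is one way to do that under assumptions that are not in the question: row-major layout with x fastest, odd kernel sizes centred at k/2, a true convolution (kernel flipped), and a hypothetical function name `convolve3d_periodic` whose parameters differ from the truncated `convolve3D` above. It keeps the same O(N·K) operation count; it only makes each operation cheaper.

```c
#include <stdlib.h>

/* Positive modulo so negative offsets wrap around (periodic padding). */
static int wrap(int a, int n) {
    int r = a % n;
    return r < 0 ? r + n : r;
}

/* Table t[p * k + i] = wrap(p + c - i, n): the input coordinate read by
   kernel tap i when computing output coordinate p along one axis, with
   c = k / 2 the kernel centre. Caller frees; returns NULL on failure. */
static int *wrap_table(int n, int k) {
    int *t = malloc((size_t)n * (size_t)k * sizeof *t);
    if (!t) return NULL;
    int c = k / 2;
    for (int p = 0; p < n; ++p)
        for (int i = 0; i < k; ++i)
            t[p * k + i] = wrap(p + c - i, n);
    return t;
}

/* Hypothetical signature (assumption). image and out hold nx*ny*nz values,
   index (z * ny + y) * nx + x; kernel holds kx*ky*kz values in the same
   layout. Returns 0 on success, -1 if a table allocation fails. */
int convolve3d_periodic(const double *restrict image,
                        const double *restrict kernel,
                        double *restrict out,
                        int nx, int ny, int nz, int kx, int ky, int kz) {
    int *tx = wrap_table(nx, kx), *ty = wrap_table(ny, ky), *tz = wrap_table(nz, kz);
    if (!tx || !ty || !tz) { free(tx); free(ty); free(tz); return -1; }

    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x) {
                double acc = 0.0;
                const int *xi = tx + x * kx;           /* wrapped x per x-tap */
                for (int k = 0; k < kz; ++k) {         /* z-taps */
                    int iz = tz[z * kz + k];
                    for (int j = 0; j < ky; ++j) {     /* y-taps */
                        int iy = ty[y * ky + j];
                        const double *row  = image + ((size_t)iz * ny + iy) * nx;
                        const double *krow = kernel + ((size_t)k * ky + j) * kx;
                        for (int i = 0; i < kx; ++i)   /* x-taps: no modulo here */
                            acc += krow[i] * row[xi[i]];
                    }
                }
                out[((size_t)z * ny + y) * nx + x] = acc;
            }
    free(tx); free(ty); free(tz);
    return 0;
}
```

The tables cost nx·kx + ny·ky + nz·kz ints, which is small next to the image itself. With a kernel that is all zeros except a 1 at its centre, the output should equal the input, which makes a convenient sanity check before profiling further (e.g. loop order, blocking, or vectorizing the x-tap loop).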

Golang assembly implementation of _mm_add_epi32

一个人想着一个人 submitted on 2020-08-10 13:10:13
Question: I'm trying to implement _mm_add_epi32 in Go assembly, optionally with the help of avo. But I know little about assembly and don't even know how to start. Can you give me some hints or code? Thank you all. Here's the equivalent, slower Go version:

    func add(x, y []uint32) []uint32 {
        if len(x) != len(y) {
            return nil
        }
        result := make([]uint32, len(x))
        for i := 0; i < len(x); i++ {
            result[i] = x[i] + y[i]
        }
        return result
    }

I know that the instruction paddd xmm, xmm is what we need (_mm_add_epi32 adds 32-bit lanes; paddq adds 64-bit lanes), but do not
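Before writing the assembly, it may help to pin down exactly what _mm_add_epi32 computes. The C sketch below (C rather than Go; `add_u32` is a hypothetical name) mirrors the Go `add` above: it adds four 32-bit lanes per 128-bit SSE2 register with wrap-around, finishes leftover elements with a scalar loop, and falls back to plain C when `__SSE2__` is not defined (for example on non-x86 targets). A Go assembly version would likely follow the same shape: a 4-wide loop around a packed 32-bit add, then a scalar tail.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2: _mm_add_epi32, unaligned load/store */
#endif

/* Hypothetical helper: dst[i] = x[i] + y[i] for i < n, wrapping modulo 2^32
   like Go's uint32 addition. The caller provides dst (no allocation). */
void add_u32(uint32_t *dst, const uint32_t *x, const uint32_t *y, size_t n) {
    size_t i = 0;
#ifdef __SSE2__
    /* Four 32-bit lanes per register; _mm_add_epi32 maps to PADDD. */
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(y + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(a, b));
    }
#endif
    /* Scalar tail, or the whole array when SSE2 is unavailable. */
    for (; i < n; ++i)
        dst[i] = x[i] + y[i];
}
```

Writing into a caller-supplied `dst` instead of allocating, as the Go version does with `make`, keeps the hot loop free of allocation; the same choice carries over naturally to an assembly routine that takes three slices.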