问题
I've written a simple serial 1D convolution function (below). I'm also experimenting with GPU convolution implementations. This is mostly for my own curiosity; I'm trying to learn the performance tradeoffs among various non-FFT implementation strategies.
Avoiding branching will be important for my GPU convolution experiments, since branching is expensive on Nvidia GPUs. One of my friends mentioned that there's a way to implement the code below without if/else statements, but he couldn't remember how it works.
How can I do a correct 1D convolution implementation without using any if/else statements?
Here's my basic 1D serial code in C++:
vector<int> myConv1d(vector<int> vec, vector<int> kernel)
{
int paddedLength = vec.size() + kernel.size() - 1;
vector<int> convolved(paddedLength); //zeros
reverse(kernel.begin(), kernel.end()); //flip the kernel (if we don't flip it, then we have correlation instead of convolution)
for(int outputIdx=0; outputIdx<paddedLength; outputIdx++) //index into 'convolved' vector
{
int vecIdx = outputIdx - kernel.size() + 1; //aligns with leftmost element of kernel
for(int kernelIdx=0; kernelIdx<kernel.size(); kernelIdx++)
{
if( (vecIdx+kernelIdx) >= 0 && (vecIdx+kernelIdx) < vec.size() ) //TODO: FIND A WAY TO REMOVE THIS
{
convolved[outputIdx] += kernel[kernelIdx]*vec[vecIdx+kernelIdx];
}
}
}
return convolved;
}
A couple of quick notes:
- I did find some related posts, but I didn't quite understand the strategy avoiding conditional statements.
- I've also written a 2D convolution implementation, and I'm hoping to apply the results of this SO post to the 2D version as well.
- This is NOT homework. It's marginally related to one of our research projects, but it's mostly for the sake of learning.
回答1:
Why don't you do something like this?
int lowerBound = std::max( 0, -vecIdx );
int upperBound = std::min( kernel.size(), vec.size() - vecIdx );
for( int kernelIdx = lowerBound; kernelIdx < upperBound; kernelIdx++ )
Sorry If I didn't understand the question.
回答2:
Either zero-extend or border-extend the source vector to avoid the checks. If the source vector V is of size L and the kernel of size K, pad it by prepending and appending K-1 elements.
Let L = 5 and K = 3, you should end up with the padded vector
p p v v v v v q q
where the vs are the vector elements, the ps and qs the padding. Keep in mind that GPU toolkits should allow to clamp elements outside of a source vector to either 0 or to the border value - effectively making the above padding useless.
来源:https://stackoverflow.com/questions/12576976/1d-convolution-without-if-else-statements-non-fft