reduction | 易学教程

SSE reduction of float vector

Question: How can I sum the elements of a float vector (a reduction) using SSE intrinsics? Simple serial code:

    void(float *input, float &result, unsigned int NumElems)
    {
        result = 0;
        for(auto i=0; i<NumElems; ++i)
            result += input[i];
    }

Answer 1: Typically you generate 4 partial sums in your loop and then just sum horizontally across the 4 elements after the loop, e.g.

    #include <cassert>
    #include <cstdint>
    #include <emmintrin.h>

    float vsum(const float *a, int n)
    {
        float sum;
        __m128 vsum = _mm_set1_ps(0.0f);
        assert((n & 3) == 0);
        assert(((uintptr_t)a & 15) == 0);
        for (int i = 0; i < n; i += 4)
        {
            __m128 v = _mm_load_ps(
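The answer excerpt above is cut off mid-loop. The following is a minimal sketch of the same idea (four partial sums in one register, then a horizontal add), not a reconstruction of the original answer. The name `vsum_sketch` is hypothetical, and unlike the quoted code it assumes nothing about alignment or length: it uses unaligned loads and handles a tail of `n % 4` elements serially.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, ...

// Hypothetical helper: sum n floats using four running partial sums.
float vsum_sketch(const float* a, std::size_t n) {
    __m128 acc = _mm_setzero_ps();  // [s0, s1, s2, s3], all zero
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Unaligned load, so the input does not need 16-byte alignment.
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    }
    // Horizontal sum of the four lanes.
    __m128 hi   = _mm_movehl_ps(acc, acc);  // [s2, s3, s2, s3]
    __m128 pair = _mm_add_ps(acc, hi);      // [s0+s2, s1+s3, ...]
    __m128 odd  = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 tot  = _mm_add_ss(pair, odd);    // lane 0 = s0+s1+s2+s3
    float sum = _mm_cvtss_f32(tot);
    // Serial tail for the last n % 4 elements.
    for (; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Because the additions happen in a different order than in the serial loop, the result can differ from the serial sum in the last few bits for general floating-point input.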

How to create a coupon on specific product in Magento?

Question: Let's say I have a 10% off coupon code. This coupon applies only to Product B. A customer has in their cart: Product P1, Product B, Product P2. I don't want my 10% off coupon to apply to the other products, only to Product B. Do you know how I can do that in Magento?

Answer 1: Here is the process to create a coupon code for a particular product: Log in to Admin. Go to Promotions -> Shopping Cart Price Rules. Click Add New Rule. Fill in Rule Information. Set Conditions. On left sidebar, click Conditions tab

CUDA: In warp reduction and volatile keyword

Question: After reading the question and its answer at the following LINK, I still have a question. From my background in C/C++, I understand that using volatile has its demerits. The answers also point out that, in CUDA, if the volatile keyword is not used, the optimizer can replace the shared array with registers to hold the data. I want to know what performance issues can be encountered when calculating a (sum) reduction, e.g. __device__ void sum

CUDA Thrust: reduce_by_key on only some values in an array, based off values in a “key” array

Question: Let's say I have two device_vector<byte> arrays, d_keys and d_data. If d_data is, for example, a flattened 2D 3x5 array (e.g. { 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3 }) and d_keys is a 1D array of size 5 (e.g. { 1, 0, 0, 1, 1 }), how can I do a reduction such that I end up adding values on a per-row basis only if the corresponding d_keys value is one (e.g. ending up with a result of { 10, 23, 14 })? The sum_rows.cu example allows me to add every value in d_data, but that's not
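To make the requested result concrete, here is a host-side reference sketch in plain C++ of the masked per-row sum the question describes. It is not a Thrust implementation; the function name `masked_row_sums` and the use of `int` instead of `byte` are assumptions made for illustration. A GPU version could be checked against it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: for each row of a flattened rows x cols matrix,
// add only the columns whose key equals 1.
std::vector<int> masked_row_sums(const std::vector<int>& data,
                                 const std::vector<int>& keys,
                                 std::size_t rows) {
    const std::size_t cols = keys.size();  // one key per column
    assert(data.size() == rows * cols);
    std::vector<int> sums(rows, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (keys[c] == 1) {
                sums[r] += data[r * cols + c];
            }
        }
    }
    return sums;
}
```

With the question's data, row 0 keeps 1, 4 and 5 (sum 10), row 1 keeps 6, 9 and 8 (sum 23), and row 2 keeps 7, 4 and 3 (sum 14), which matches the expected { 10, 23, 14 }.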

Class Scheduling to Boolean satisfiability [Polynomial-time reduction]

Question: I have a theoretical/practical problem that I don't yet have a clue how to handle. Here it is: I created a SAT solver in C, using genetic algorithms, that can find a model of a CNF problem when one exists and prove the contradiction when it does not. A SAT problem looks basically like this kind of problem : My goal is to use this solver to find solutions to many different NP-complete problems. Basically, I translate different problems into SAT, solve SAT with my solver

How to perform reduction on a huge 2D matrix along the row direction using cuda? (max value and max value's index for each row)

Question: I'm trying to implement a reduction along the row direction of a 2D matrix. I'm starting from code I found on Stack Overflow (thanks a lot, Robert!): "thrust::max_element slow in comparison cublasIsamax - More efficient implementation?" The above link shows a custom kernel that performs a reduction on a single row. It divides the input row into many rows and each row has 1024 threads. Works very well. For the 2D case, everything's the same except that now there's a y grid dimension. So each block
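For reference, the operation named in the title (max value and its index for each row) can be written as a short host-side sketch in plain C++. This is not the CUDA kernel the question refers to; the `RowMax` struct and `row_max` function are hypothetical names, and the tie-breaking rule (first occurrence wins) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the largest value in a row and its column index.
struct RowMax {
    float value;
    std::size_t index;
};

// Row-wise max over a flattened rows x cols matrix (row-major).
// Assumes cols > 0; on ties the first occurrence is reported.
std::vector<RowMax> row_max(const std::vector<float>& m,
                            std::size_t rows, std::size_t cols) {
    assert(cols > 0 && m.size() == rows * cols);
    std::vector<RowMax> out;
    out.reserve(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        RowMax best{m[r * cols], 0};
        for (std::size_t c = 1; c < cols; ++c) {
            const float v = m[r * cols + c];
            if (v > best.value) {  // strict '>' keeps the first maximum
                best = RowMax{v, c};
            }
        }
        out.push_back(best);
    }
    return out;
}
```

A serial reference like this is mainly useful for validating the output of a parallel kernel on small inputs.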

Converting BMP image to set of instructions for a plotter?

Question: I have a plotter like this one: The task I have to implement is converting a 24-bit BMP into a set of instructions for this plotter. The plotter can switch between 16 common colors. The first difficulty I face is color reduction. The second is how to transform pixels into a set of drawing instructions. The drawing tool will be a brush with oil paint, which means the lines the plotter draws will not be very thin and will be relatively short. Please suggest algorithms that can be used to solve this image data conversion problem. Some initial results:

Reducing on array in OpenMP

Question: I am trying to parallelize the following program, but don't know how to reduce on an array. I know it is not possible to do so, but is there an alternative? Thanks. (I added a reduction on m, which is wrong, but I would like advice on how to do it.)

    #include <iostream>
    #include <stdio.h>
    #include <time.h>
    #include <omp.h>
    using namespace std;

    int main ()
    {
        int A [] = {84, 30, 95, 94, 36, 73, 52, 23, 2, 13};
        int S [10];
        time_t start_time = time(NULL);
        #pragma omp parallel for private(m)
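The excerpt is cut off before the loop body, so the actual computation is unknown. As a hedged illustration of reducing on an array, the sketch below uses the array-section form of the `reduction` clause, which is available in OpenMP 4.5 and later for C and C++. The workload is hypothetical: it counts how many values fall into each bucket of ten, assuming every value lies in 0..99. The function name `bucket_counts` is also an assumption.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical workload: s[k] counts the values v with v / 10 == k.
// Assumes 0 <= v <= 99 for every element of a.
std::array<int, 10> bucket_counts(const std::vector<int>& a) {
    std::array<int, 10> out{};  // zero-initialized
    int* s = out.data();        // the array-section syntax needs an array or pointer
    const int n = static_cast<int>(a.size());
    // Each thread gets a private zeroed copy of s[0..9]; the copies are
    // summed element-wise into the original when the loop ends.
    #pragma omp parallel for reduction(+ : s[:10])
    for (int i = 0; i < n; ++i) {
        s[a[i] / 10] += 1;
    }
    return out;
}
```

Built without OpenMP support, the pragma is ignored and the loop runs serially with the same result. With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. On compilers older than OpenMP 4.5, a common alternative is a per-thread local array that is merged into the shared one inside a `critical` section.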
