vectorization

Efficient search for permutations that contain sub-permutations via array operations?

◇◆丶佛笑我妖孽 submitted on 2019-12-19 03:33:21
Question: I have a set of integers, say S = {1,...,10}, and two matrices N and M, whose rows are some (but not necessarily all possible) permutations of elements from S of orders (lengths), say, 3 and 5 respectively, e.g. N = [1 2 3; 2 5 3;...], M = [1 2 3 4 5; 2 4 7 8 1;...]. A sub-permutation Q of a permutation P is just an indexed subset of P such that the order of the indices of the elements of Q is the same as the order of their indices in P. Example: [2,4,7] is a sub-permutation of [2,3,4,6,7,1], but [1,2
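A minimal NumPy sketch of the array-operation approach, assuming the entries are integers in 1..10 (as with S) and that every row of M has distinct entries: build, for each row of M, a lookup table from value to position; a row of N is then a sub-permutation of that row exactly when all its values are present and their positions strictly increase. The function name contains_subperm and the n_values parameter are illustrative, not taken from the question.

```python
import numpy as np

def contains_subperm(M, N, n_values):
    """Return R with R[i, j] True iff N[j] is a sub-permutation of M[i].

    Assumes entries are integers in 1..n_values and that each row of M
    has distinct entries (i.e. it is a permutation of some subset of S).
    """
    M = np.asarray(M)
    N = np.asarray(N)
    # pos[i, v] = index of value v in row M[i], or -1 if v does not occur there
    pos = np.full((M.shape[0], n_values + 1), -1, dtype=int)
    rows = np.arange(M.shape[0])[:, None]
    pos[rows, M] = np.arange(M.shape[1])
    # P[i, j, k] = position of N[j, k] inside M[i]; shape (len(M), len(N), N.shape[1])
    P = pos[:, N]
    present = np.all(P >= 0, axis=2)                 # every value of N[j] occurs in M[i]
    in_order = np.all(np.diff(P, axis=2) > 0, axis=2)  # positions strictly increasing
    return present & in_order
```

With the question's first rows, M = [[1, 2, 3, 4, 5], [2, 4, 7, 8, 1]] and N = [[1, 2, 3], [2, 5, 3]], the result is [[True, False], [False, False]]; R.any(axis=1) then marks the rows of M that contain at least one row of N. The intermediate array P has len(M) × len(N) × 3 entries, so very large inputs may need to be processed in chunks.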

Using OpenMP stops GCC auto vectorising

纵然是瞬间 submitted on 2019-12-19 02:51:52
Question: I have been working on making my code auto-vectorisable by GCC; however, when I include the -fopenmp flag, it seems to stop all attempts at auto-vectorisation. I am using -ftree-vectorize and -ftree-vectorizer-verbose=5 to vectorise and monitor it. If I do not include -fopenmp, GCC gives me a lot of information about each loop: whether it is vectorised and, if not, why. Without the flag, however, the build stops when I try to use the omp_get_wtime() function, since it can't be linked. Once the flag is

Is there a reason to prefer '&&' over '&' in 'if' statements, other than short-circuiting?

五迷三道 submitted on 2019-12-18 12:59:10
Question: Yes, I know there have been a number of questions (see this one, for example) regarding the usage of & vs. && in R, but I have not found one that specifically answers my question. As I understand the differences, & does element-wise, vectorised comparison, much like the other arithmetic operations. It hence returns a logical vector that has length > 1 if both arguments have length > 1. && compares the first elements of both vectors and always returns a result of length 1. Moreover, it does
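The distinction the question describes has a close NumPy analogue, sketched below in Python rather than R (an illustration of the idea, not of R's exact semantics): & works element-wise and returns an array, while Python's short-circuiting and needs a single truth value, so NumPy raises a ValueError instead of quietly using the first element. The helper scalar_and is hypothetical.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([True, True, False])

# `&` is element-wise (vectorised), like R's `&`: one result per element
elementwise = a & b

def scalar_and(x, y):
    # `and` short-circuits and needs one truth value, closer in spirit to R's `&&`
    return x and y

try:
    scalar_and(a, b)
    ambiguous = False
except ValueError:
    # NumPy refuses to collapse a multi-element array to a single bool
    ambiguous = True
```

In an if statement, reducing explicitly with np.all() or np.any() makes it clear which single truth value is meant.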

Auto-Vectorize comparison

不想你离开。 submitted on 2019-12-18 09:17:22
Question: I'm having trouble getting g++ 5.4 to use vectorization for a comparison. Basically, I want to compare 4 unsigned ints using vectorization. My first approach was straightforward:

    bool compare(unsigned int const pX[4]) {
        bool c1 = (pX[0] < 1);
        bool c2 = (pX[1] < 2);
        bool c3 = (pX[2] < 3);
        bool c4 = (pX[3] < 4);
        return c1 && c2 && c3 && c4;
    }

Compiling with g++ -std=c++11 -Wall -O3 -funroll-loops -march=native -mtune=native -ftree-vectorize -msse -msse2 -ffast-math -fopt-info-vec-missed told
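As a language-neutral illustration of the "compare all lanes, then reduce" pattern that SIMD code relies on, here is a NumPy sketch of the same check: one element-wise comparison against the per-lane thresholds 1..4, followed by a single reduction instead of a chain of short-circuiting &&. The THRESHOLDS name is illustrative, and the sketch says nothing about what g++ 5.4 does with the original C++ function.

```python
import numpy as np

# Per-lane thresholds matching the question's compare(): pX[k] < k + 1
THRESHOLDS = np.array([1, 2, 3, 4], dtype=np.uint32)

def compare(pX):
    """True iff pX[k] < THRESHOLDS[k] for all four lanes."""
    x = np.asarray(pX, dtype=np.uint32)
    lanes = x < THRESHOLDS      # one vectorised comparison over all four lanes
    return bool(lanes.all())    # single reduction, no per-lane branching
```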

Reverse an AVX register containing doubles using a single AVX intrinsic

大憨熊 submitted on 2019-12-18 09:08:15
Question: If I have an AVX register with 4 doubles in it and I want to store the reverse of this in another register, is it possible to do this with a single intrinsic? For example, if I had 4 floats in an SSE register, I could use: _mm_shuffle_ps(A,A,_MM_SHUFFLE(0,1,2,3)); Can I do this using, maybe, _mm256_permute2f128_pd()? I don't think you can address each individual double using the above intrinsic. Answer 1: You actually need 2 permutes to do this: _mm256_permute2f128_pd() only permutes in
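The answer is cut off before naming the second permute; a common pairing, assumed here, is _mm256_permute2f128_pd(a, a, 0x01) to swap the two 128-bit halves, followed by _mm256_permute_pd(..., 0x5) to swap the two doubles within each half. The sketch below is a plain-Python model of those two intrinsics' lane semantics (index 0 is the lowest element), handy for checking the immediates; the helper names are hypothetical and this is not SIMD code.

```python
# Scalar model of two AVX intrinsics acting on a 4 x double register.
# Index 0 is the lowest element. This only mimics the lane semantics.

def permute2f128_pd(a, b, imm):
    """Model of _mm256_permute2f128_pd: each 128-bit half of the result picks one
    of a.lo, a.hi, b.lo, b.hi (the zeroing bits 3 and 7 are ignored here)."""
    halves = [a[0:2], a[2:4], b[0:2], b[2:4]]
    lo = halves[imm & 0x3]
    hi = halves[(imm >> 4) & 0x3]
    return list(lo) + list(hi)

def permute_pd_256(a, imm):
    """Model of _mm256_permute_pd: bit k of imm picks which of the two doubles
    in element k's own 128-bit lane ends up in element k."""
    out = []
    for k in range(4):
        lane_base = 0 if k < 2 else 2
        out.append(a[lane_base + ((imm >> k) & 1)])
    return out

def reverse4(a):
    swapped = permute2f128_pd(a, a, 0x01)   # [a2, a3, a0, a1]: swap the 128-bit halves
    return permute_pd_256(swapped, 0b0101)  # [a3, a2, a1, a0]: swap within each lane
```

On hardware with AVX2, _mm256_permute4x64_pd(a, _MM_SHUFFLE(0,1,2,3)) can do the full reversal with one intrinsic, since it may move doubles across the 128-bit halves.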

Remove for loop from clustering algorithm in MATLAB

左心房为你撑大大i submitted on 2019-12-18 09:03:03
Question: I am trying to improve the performance of the OPTICS clustering algorithm. The open-source implementation I found uses a for loop over every sample and can run for hours. I believe using the repmat() function may improve its performance when the system has enough RAM. You are more than welcome to suggest other ways of improving the implementation. Here is the code: x is the data, an [m x n] array where m is the sample size and n is the feature dimensionality,
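The question's code is cut off, so this only sketches the idea it asks about, in NumPy: broadcasting does the job of repmat by producing all pairwise differences at once, and np.partition then picks each sample's core distance, taken here as the distance to its min_pts-th nearest sample, counting the sample itself (conventions differ between implementations). The names core_distances and min_pts are assumptions, not from the original.

```python
import numpy as np

def core_distances(X, min_pts):
    """Distance from each sample to its min_pts-th nearest sample (itself included),
    computed for all samples at once. X is an (m, n) array."""
    X = np.asarray(X, dtype=float)
    # Broadcasting replaces repmat: (m, 1, n) - (1, m, n) -> (m, m, n)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=2))   # (m, m) pairwise distances
    # np.partition finds the k-th smallest value per row without a full sort
    return np.partition(D, min_pts - 1, axis=1)[:, min_pts - 1]
```

The (m, m, n) difference array is the RAM cost the question alludes to, so for large m it has to be built in blocks of rows. The ordering phase of OPTICS, which repeatedly updates a seed list, stays sequential; vectorisation mainly speeds up the neighbour queries.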

New Dataframe column as a generic function of other rows (pandas)

一世执手 submitted on 2019-12-18 08:46:05
Question: What is the fastest (and most efficient) way to create a new column in a DataFrame that is a function of other rows in pandas? Consider the following example:

    import pandas as pd
    d = {
        'id': [1, 2, 3, 4, 5, 6],
        'word': ['cat', 'hat', 'hag', 'hog', 'dog', 'elephant']
    }
    pandas_df = pd.DataFrame(d)

Which yields:

       id      word
    0   1       cat
    1   2       hat
    2   3       hag
    3   4       hog
    4   5       dog
    5   6  elephant

Suppose I want to create a new column bar containing a value that is based on the output of using a function foo to compare
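Because foo is cut off in the question, the sketch below uses a hypothetical stand-in, one_letter_apart, and sets bar to the number of other rows whose word differs from this row's word in exactly one position. It pairs every row with every other row through a cross join (merge(how='cross'), available in pandas 1.2 and later) and aggregates with groupby. foo itself is still called once per pair in Python; the gain is structural (no nested loops over rows), and memory grows with the square of the row count.

```python
import pandas as pd

pandas_df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'word': ['cat', 'hat', 'hag', 'hog', 'dog', 'elephant'],
})

def one_letter_apart(a, b):
    """Hypothetical stand-in for foo: same length, exactly one differing position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

# Every (row, other row) pair in one frame; columns: id, word, id_other, word_other
pairs = pandas_df.merge(pandas_df, how='cross', suffixes=('', '_other'))
pairs = pairs[pairs['id'] != pairs['id_other']].copy()   # drop self-pairs

# foo is evaluated once per pair, without nested loops over the DataFrame
pairs['match'] = [one_letter_apart(a, b)
                  for a, b in zip(pairs['word'], pairs['word_other'])]

# bar = how many other rows match this row
counts = pairs.groupby('id')['match'].sum()
pandas_df['bar'] = pandas_df['id'].map(counts).fillna(0).astype(int)
```

For the example frame this gives bar = [1, 2, 2, 2, 1, 0]: 'cat' is one letter from 'hat', 'hat' is one letter from both 'cat' and 'hag', and so on, while 'elephant' has no other word of the same length.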

Numpy: assigning values to 2d array with list of indices

谁说我不能喝 submitted on 2019-12-18 08:25:42
Question: I have a 2D numpy array (think greyscale image). I want to assign a certain value to a list of coordinates in this array, such that:

    img = np.zeros((5, 5))
    coords = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])

    def bad_use_of_numpy(img, coords):
        for i, coord in enumerate(coords):
            img[coord[0], coord[1]] = 255
        return img

    bad_use_of_numpy(img, coords)

This works, but I feel like I can take advantage of numpy functionality to make it faster. I also might have a use case later to do something like
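A minimal sketch of the vectorised version: passing the row indices and the column indices as two integer arrays assigns all coordinates in one fancy-indexing operation, with no Python loop. img[tuple(coords.T)] = 255 is an equivalent spelling.

```python
import numpy as np

img = np.zeros((5, 5))
coords = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])

# Rows come from the first column of coords, columns from the second;
# a single fancy-indexing assignment replaces the loop.
img[coords[:, 0], coords[:, 1]] = 255
```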

How to compute sum of binomial more efficiently?

社会主义新天地 submitted on 2019-12-18 07:22:37
Question: I need to calculate the following equation, where k1 and k2 are given. I am using MATLAB to compute P. I think my implementation of the equation is correct, but it is very slow, and I suspect the binomial coefficients are the cause. Given the form of the equation, is there a more efficient way to compute it? Thanks, all. For k1=150; k2=150; D=200;, it takes 11.6 seconds:

    function main
    warning ('off');
    function test_binom()
    k1=150; k2=150; D=200; P=0;
    for i=0:D-1
        for j=0:i
            if (i-j>k2||j
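The equation itself did not survive, so this cannot reproduce P. Assuming the loop body calls nchoosek for each (i, j) (the code is cut off), one standard speed-up is to compute every coefficient the sum needs once, in log space, with array operations; the terms can then be formed from exp(log C + the other log-terms) and masked to the allowed (i, j) region. Working in log space also keeps large values such as C(200, 100) ≈ 9 × 10^58 manageable. The sketch is in NumPy, and log_binom_table is a hypothetical helper name.

```python
import numpy as np

def log_binom_table(n_max):
    """L[n, k] = log C(n, k) for 0 <= k <= n <= n_max, and -inf where k > n."""
    # log n! for n = 0..n_max as a running sum of logs (log 0! = 0)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n_max + 1)))))
    n = np.arange(n_max + 1)[:, None]      # column vector of n
    k = np.arange(n_max + 1)[None, :]      # row vector of k
    valid = k <= n
    n_minus_k = np.where(valid, n - k, 0)  # keep indices non-negative
    L = log_fact[n] - log_fact[k] - log_fact[n_minus_k]
    return np.where(valid, L, -np.inf)

L = log_binom_table(200)   # every coefficient up to D = 200 in one shot
C = np.exp(L)              # C[n, k] = C(n, k); entries with k > n become 0
```

Row n of C holds C(n, 0), ..., C(n, n) followed by zeros, so sums over k can use whole rows or masked slices instead of scalar calls.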

How to Build a Distance Matrix without a Loop (Vectorization)?

心已入冬 submitted on 2019-12-18 06:59:21
Question: I have many points and I want to build a distance matrix, i.e. the distance of every point to all the other points, but I don't want to use a loop because it takes too much time. Is there a better way to build this matrix? This is my loop; for a set setl of size 10000x3 it takes a lot of time :(

    for i=1:size(setl,1)
        for j=1:size(setl,1)
            dist = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+...
                (zl(i)-zl(j))^2);
            distanceMatrix(i,j) = dist;
        end
    end

Answer 1: How about using some linear algebra? The distance of
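The answer is cut off, but the linear-algebra route it points to is usually the expansion |x − y|² = |x|² + |y|² − 2 x·y, which turns the double loop into one matrix product plus two broadcasts. A NumPy sketch, with distance_matrix as a hypothetical name:

```python
import numpy as np

def distance_matrix(P):
    """Pairwise Euclidean distances between the rows of P (e.g. an (m, 3) array),
    using |x - y|^2 = |x|^2 + |y|^2 - 2 x.y instead of a double loop."""
    P = np.asarray(P, dtype=float)
    sq = np.sum(P ** 2, axis=1)                        # |x|^2 for every point
    D2 = sq[:, None] + sq[None, :] - 2.0 * (P @ P.T)   # all squared distances at once
    np.maximum(D2, 0.0, out=D2)                        # clamp tiny negatives from round-off
    return np.sqrt(D2)
```

For 10000 points the full matrix has 10^8 doubles, about 800 MB, so at that size memory rather than arithmetic becomes the limit.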