vectorization

Why does vectorization fail?

两盒软妹~` submitted on 2020-01-14 09:44:48
Question: I want to optimize my code for vectorization using -msse2 -ftree-vectorizer-verbose=2. I have the following simple code:

    int main(){
        int a[2048], b[2048], c[2048];
        int i;
        for (i=0; i<2048; i++){
            b[i]=0;
            c[i]=0;
        }
        for (i=0; i<2048; i++){
            a[i] = b[i] + c[i];
        }
        return 0;
    }

Why do I get the note test.cpp:10: note: not vectorized: not enough data-refs in basic block. Thanks!

Answer 1: For what it's worth, after adding an asm volatile("": "+m"(a), "+m"(b), "+m"(c)::"memory"); near the end of main, my

Product of a sequence in NumPy

℡╲_俬逩灬. submitted on 2020-01-13 20:39:07
Question: I need to implement the following function with NumPy - where F_l(x) are N arrays that I need to calculate, which depend on an array G(x) that I am given, and A_j are N coefficients that are also given. I would like to implement it in NumPy because I have to calculate F_l(x) in every iteration of my program. The naive way to do this is with for loops and ifs:

    import numpy as np
    A = np.arange(1.,5.,1)
    G = np.array([[1.,2.],[3.,4.]])
    def calcF(G,A):
        N = A.size
        print A
        print

NumPy - Vectorizing loops involving range iterators

牧云@^-^@ submitted on 2020-01-13 10:10:12
Question: Is there any way to make this work without for loops?

    import numpy as np
    import matplotlib.pyplot as plt
    L = 1
    N = 255
    dh = 2*L/N
    dh2 = dh*dh
    phi_0 = 1
    c = int(N/2)
    r_0 = L/2
    arr = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            arr[i, j] = phi_0 if (i - c)**2 + (j - c)**2 < r_0**2/dh2 else 0
    plt.imshow(arr)

I've tried calling function(x[None,:], y[:, None]), where:

    def function(i, j):
        return phi_0 if (i - c)**2 + (j - c)**2 < r_0**2/dh2 else 0

but it requires list .any or .all
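The attempt in the question fails because Python's `x if cond else y` needs a single truth value, while the comparison on broadcast arrays yields a whole boolean array. A minimal sketch of one loop-free alternative, assuming the same constants as in the question: build the index grids by broadcasting and pick values with `np.where`. The names `i`, `j` and `inside` are illustrative choices, and the plotting call is left out so the snippet runs without matplotlib.

```python
import numpy as np

# Constants copied from the question.
L = 1
N = 255
dh = 2 * L / N
dh2 = dh * dh
phi_0 = 1
c = int(N / 2)
r_0 = L / 2

# Column vector of row indices and row vector of column indices;
# broadcasting expands them to an (N, N) grid without explicit loops.
i = np.arange(N)[:, None]
j = np.arange(N)[None, :]

# Element-wise boolean mask: True inside the disc centred on (c, c).
inside = (i - c) ** 2 + (j - c) ** 2 < r_0 ** 2 / dh2

# np.where selects phi_0 where the mask is True and 0 elsewhere.
arr = np.where(inside, phi_0, 0.0)
```

The resulting `arr` can be passed to `plt.imshow(arr)` exactly as in the loop version.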

MATLAB: detect and remove mirror imaged pairs in 2 column matrix

风格不统一 submitted on 2020-01-13 09:25:09
Question: I have a matrix

    [1 2
     3 6
     7 1
     2 1]

and would like to remove mirror-imaged pairs, i.e. the output would be either:

    [1 2
     3 6
     7 1]

or

    [3 6
     7 1
     2 1]

Is there a simple way to do this? I can imagine a complicated for loop, something like this (or a version which wouldn't delete the original pair, only the duplicates):

    for i=1:y
        var1=(i,1);
        var2=(i,2);
        for i=1:y
            if array(i,1)==var1 && array(i,2)==var2 | array(i,1)==var2 && array(i,2)==var1
                array(i,1:2)=[];
            end
        end
    end

Thanks.

Answer 1: How's this for simplicity - A
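The question is about MATLAB and the answer above is cut off, so the following is not that answer. It is a NumPy sketch of the same idea, using the matrix from the question: sort each row so that a pair and its mirror image become identical, then keep the first occurrence of each distinct sorted row. The names `pairs`, `key`, `first` and `result` are illustrative.

```python
import numpy as np

# The 4x2 matrix from the question; [2, 1] mirrors [1, 2].
pairs = np.array([[1, 2],
                  [3, 6],
                  [7, 1],
                  [2, 1]])

# Sorting within each row maps a pair and its mirror to the same key.
key = np.sort(pairs, axis=1)

# Index of the first occurrence of every distinct key row.
_, first = np.unique(key, axis=0, return_index=True)

# Restore the original row order and keep the unsorted originals.
result = pairs[np.sort(first)]
print(result)
```

With this input the result keeps `[1 2]`, `[3 6]` and `[7 1]`, which is the first of the two outputs the question accepts.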

Max value per diagonal in 2d array

点点圈 submitted on 2020-01-13 09:15:13
Question: I have an array and need the max of a rolling difference with a dynamic window.

    a = np.array([8, 18, 5,15,12])
    print (a)
    [ 8 18  5 15 12]

So first I create the differences of the array with itself:

    b = a - a[:, None]
    print (b)
    [[  0  10  -3   7   4]
     [-10   0 -13  -3  -6]
     [  3  13   0  10   7]
     [ -7   3 -10   0  -3]
     [ -4   6  -7   3   0]]

Then I replace the upper triangle with 0:

    c = np.tril(b)
    print (c)
    [[  0   0   0   0   0]
     [-10   0   0   0   0]
     [  3  13   0   0   0]
     [ -7   3 -10   0   0]
     [ -4   6  -7   3   0]]

Last, I need the max value per diagonal, which means: max([0,0,0,0,0]) = 0 max([-10,13
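One straightforward way to get the per-diagonal maxima, sketched here with the array from the question, is to read the main diagonal and each sub-diagonal of `c` with `ndarray.diagonal` and take its maximum. This still loops in Python over the n diagonals (not over the n*n elements), so it is a readable baseline rather than a fully vectorized solution. The name `diag_max` is illustrative.

```python
import numpy as np

a = np.array([8, 18, 5, 15, 12])

# b[i, j] = a[j] - a[i], as in the question.
b = a - a[:, None]

# Zero out everything above the main diagonal.
c = np.tril(b)

n = len(a)

# Offset 0 is the main diagonal; offset -k is the k-th sub-diagonal.
diag_max = np.array([c.diagonal(-k).max() for k in range(n)])
print(diag_max)  # [ 0 13  3  6 -4]
```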

Improve performance of a for loop in Python (possibly with numpy or numba)

浪尽此生 submitted on 2020-01-13 04:49:07
Question: I want to improve the performance of the for loop in this function.

    import numpy as np
    import random

    def play_game(row, n=1000000):
        """Play the game! This game is a kind of random walk.

        Arguments:
            row (int[]): row index to use in the p matrix for each step in the
                walk. The length of this array is the same as n.
            n (int): number of steps in the random walk
        """
        p = np.array([[ 0.499, 0.499, 0.499],
                      [ 0.099, 0.749, 0.749]])
        X0 = 100
        Y0 = X0 % 3
        X = np.zeros(n)
        tempX = X0
        Y = Y0
        for j in range(n)

Fast column shuffle of each row numpy

女生的网名这么多〃 submitted on 2020-01-12 13:59:07
Question: I have a large array (10,000,000+ rows) and I need to shuffle each row individually. For example:

    [[1,2,3]
     [1,2,3]
     [1,2,3]
     ...
     [1,2,3]]

to

    [[3,1,2]
     [2,1,3]
     [1,3,2]
     ...
     [1,2,3]]

I'm currently using

    map(numpy.random.shuffle, array)

But it's a Python (not NumPy) loop and it's taking 99% of my execution time. Sadly, the PyPy JIT doesn't implement numpypy.random, so I'm out of luck. Is there any faster way? I'm willing to use any library (pandas, scikit-learn, scipy, theano
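One common NumPy idiom for this, sketched below on a small stand-in array, is to draw a matrix of random keys, argsort each row of keys to obtain an independent permutation per row, and apply those permutations with `np.take_along_axis`. The seed, the 8-row example array and the names `rng`, `order` and `shuffled` are assumptions made for illustration. Note that the approach allocates a float array of the same shape as the data.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Small stand-in for the large array from the question.
arr = np.tile(np.array([1, 2, 3]), (8, 1))

# Sorting independent uniform keys row by row yields one random
# permutation of the column indices per row.
order = rng.random(arr.shape).argsort(axis=1)

# Reorder every row by its own permutation in a single vectorized call.
shuffled = np.take_along_axis(arr, order, axis=1)
print(shuffled)
```

Unlike `numpy.random.shuffle`, this returns a new array instead of shuffling in place.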

Vectorized Trig functions in C?

筅森魡賤 submitted on 2020-01-12 07:30:27
Question: I'm looking to calculate highly parallelized trig functions (in blocks of about 1024), and I'd like to take advantage of at least some of the parallelism that modern architectures have. When I compile a block

    for(int i=0; i<SIZE; i++) {
        arr[i]=sin((float)i/1024);
    }

GCC won't vectorize it, and says

    not vectorized: relevant stmt not supported: D.3068_39 = __builtin_sinf (D.3069_38);

Which makes sense to me. However, I'm wondering if there's a library to do parallel trig computations. With just a

Vectorizing the Kinect real-world coordinate processing algorithm for speed

末鹿安然 submitted on 2020-01-11 19:51:27
Question: I recently started working with the Kinect V2 on Linux with pylibfreenect2. When I was first able to show the depth frame data in a scatter plot, I was disappointed to see that none of the depth pixels seemed to be in the correct location. Side view of a room (notice that the ceiling is curved). I did some research and realized there's some simple trig involved in the conversions. To test, I started with a pre-written function in pylibfreenect2 which accepts a column, row and a depth pixel