vectorization | 易学教程

Why does vectorization fail?

Question: I want to optimize my code for vectorization using -msse2 -ftree-vectorizer-verbose=2. I have the following simple code:

    int main(){
        int a[2048], b[2048], c[2048];
        int i;
        for (i=0; i<2048; i++){
            b[i]=0;
            c[i]=0;
        }
        for (i=0; i<2048; i++){
            a[i] = b[i] + c[i];
        }
        return 0;
    }

Why do I get the note "test.cpp:10: note: not vectorized: not enough data-refs in basic block"? Thanks!

Answer 1: For what it's worth, after adding an asm volatile("": "+m"(a), "+m"(b), "+m"(c)::"memory"); near the end of main, my

Product of a sequence in NumPy

Question: I need to implement the following function with NumPy, where F_l(x) are N arrays that I need to calculate, which depend on a given array G(x), and A_j are N given coefficients. I would like to implement it in NumPy because I have to calculate F_l(x) on every iteration of my program. The naive way to do this is with for loops and ifs:

    import numpy as np
    A = np.arange(1.,5.,1)
    G = np.array([[1.,2.],[3.,4.]])
    def calcF(G,A):
        N = A.size
        print A
        print

NumPy - Vectorizing loops involving range iterators

Question: Is there any way to make this work without for loops?

    import numpy as np
    import matplotlib.pyplot as plt

    L = 1
    N = 255
    dh = 2*L/N
    dh2 = dh*dh
    phi_0 = 1
    c = int(N/2)
    r_0 = L/2
    arr = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            arr[i, j] = phi_0 if (i - c)**2 + (j - c)**2 < r_0**2/dh2 else 0
    plt.imshow(arr)

I've tried calling function(x[None,:], y[:, None]), where:

    def function(i, j):
        return phi_0 if (i - c)**2 + (j - c)**2 < r_0**2/dh2 else 0

but it requires list .any or .all
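One possible way to drop both loops in the question above is to build the index grids with broadcasting and select values with np.where. This is a minimal sketch, not the answer from the original thread; the names inside and arr are reused or introduced here for illustration only, and the plotting call is omitted.

```python
import numpy as np

# Parameters copied from the question.
L = 1
N = 255
dh = 2 * L / N
dh2 = dh * dh
phi_0 = 1
c = int(N / 2)
r_0 = L / 2

# Open index grids: i has shape (N, 1) and j has shape (1, N), so the
# comparison below broadcasts to an (N, N) boolean mask.
i, j = np.ogrid[:N, :N]
inside = (i - c) ** 2 + (j - c) ** 2 < r_0 ** 2 / dh2

# np.where chooses elementwise, which replaces the scalar `if ... else`
# that fails on arrays with the ".any or .all" error.
arr = np.where(inside, phi_0, 0).astype(float)
```

The scalar conditional expression fails on arrays because Python needs a single truth value; np.where avoids that by evaluating the condition per element.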

MATLAB: detect and remove mirror imaged pairs in 2 column matrix

Question: I have a matrix

    [1 2
     3 6
     7 1
     2 1]

and would like to remove mirror-imaged pairs, i.e. the output would be either

    [1 2
     3 6
     7 1]

or

    [3 6
     7 1
     2 1]

Is there a simple way to do this? I can imagine a complicated for loop, something like the following (or a version which wouldn't delete the original pair, only the duplicates):

    for i=1:y
        var1=(i,1);
        var2=(i,2);
        for i=1:y
            if array(i,1)==var1 && array(i,2)==var2 | array(i,1)==var2 && array(i,2)==var1
                array(i,1:2)=[];
            end
        end
    end

Thanks.

Answer 1: How's this for simplicity - A
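The question above is about MATLAB and its answer is cut off, so the following is only a NumPy analogue of one common approach, not the original answer: sort within each row so that mirrored pairs become identical, then keep the first occurrence of each distinct row. The names pairs, keys, first and deduped are hypothetical.

```python
import numpy as np

# The 4x2 matrix from the question.
pairs = np.array([[1, 2], [3, 6], [7, 1], [2, 1]])

# Sorting within each row maps (1, 2) and (2, 1) to the same key.
keys = np.sort(pairs, axis=1)

# With axis=0, np.unique treats each row as one item and return_index
# gives the position of the first occurrence of each distinct row.
_, first = np.unique(keys, axis=0, return_index=True)

# Sort the indices so the surviving rows keep their original order.
deduped = pairs[np.sort(first)]
print(deduped)
```

This keeps the first pair of each mirrored group, which corresponds to the first of the two outputs the asker would accept.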

Max value per diagonal in 2d array

Question: I have an array and need the max of the rolling difference with a dynamic window.

    a = np.array([8, 18, 5,15,12])
    print (a)
    [ 8 18  5 15 12]

First I create the difference of the array with itself:

    b = a - a[:, None]
    print (b)
    [[  0  10  -3   7   4]
     [-10   0 -13  -3  -6]
     [  3  13   0  10   7]
     [ -7   3 -10   0  -3]
     [ -4   6  -7   3   0]]

Then I replace the upper triangle of the matrix with 0:

    c = np.tril(b)
    print (c)
    [[  0   0   0   0   0]
     [-10   0   0   0   0]
     [  3  13   0   0   0]
     [ -7   3 -10   0   0]
     [ -4   6  -7   3   0]]

Last, I need the max value per diagonal, which means: max([0,0,0,0,0]) = 0 max([-10,13
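The excerpt above is cut off before the expected result, so the following sketch assumes the goal is the maximum of the main diagonal and of each diagonal below it in c. It loops over the n diagonals rather than over every element, so it is only partially vectorized; diag_max is a hypothetical name.

```python
import numpy as np

a = np.array([8, 18, 5, 15, 12])
b = a - a[:, None]   # b[i, j] = a[j] - a[i]
c = np.tril(b)       # zero out everything above the main diagonal

n = len(a)
# c.diagonal(-k) is the k-th diagonal below the main one (k = 0 is the
# main diagonal); take the maximum of each.
diag_max = np.array([c.diagonal(-k).max() for k in range(n)])
print(diag_max)
```

For the sample array this gives 0, 13, 3, 6 and -4, where the first value matches the max([0,0,0,0,0]) = 0 shown in the question.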

Improve performance of a for loop in Python (possibly with numpy or numba)

Question: I want to improve the performance of the for loop in this function.

    import numpy as np
    import random

    def play_game(row, n=1000000):
        """Play the game! This game is a kind of random walk.

        Arguments:
            row (int[]): row index to use in the p matrix for each step in the walk. The length of this array is the same as n.
            n (int): number of steps in the random walk
        """
        p = np.array([[ 0.499, 0.499, 0.499],
                      [ 0.099, 0.749, 0.749]])
        X0 = 100
        Y0 = X0 % 3
        X = np.zeros(n)
        tempX = X0
        Y = Y0
        for j in range(n)

Fast column shuffle of each row numpy

Question: I have a large array (10,000,000+ rows) and I need to shuffle each row individually. For example:

    [[1,2,3]
     [1,2,3]
     [1,2,3]
     ...
     [1,2,3]]

to

    [[3,1,2]
     [2,1,3]
     [1,3,2]
     ...
     [1,2,3]]

I'm currently using

    map(numpy.random.shuffle, array)

But it's a Python (not NumPy) loop and it's taking 99% of my execution time. Sadly, the PyPy JIT doesn't implement numpypy.random, so I'm out of luck. Is there any faster way? I'm willing to use any library (pandas, scikit-learn, scipy, theano
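One commonly used way to shuffle every row without a Python-level loop is to draw a random key per element and argsort the keys along each row. The sketch below assumes a reasonably recent NumPy (for np.random.default_rng and np.take_along_axis); the small array, the seed and the names order and shuffled are stand-ins for illustration, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Small stand-in for the 10,000,000+ row array in the question.
arr = np.tile(np.array([1, 2, 3]), (10, 1))

# One random key per element; argsort along axis 1 turns each row of
# keys into an independent random permutation of the column indices.
order = rng.random(arr.shape).argsort(axis=1)

# Reorder every row by its own permutation in a single call.
shuffled = np.take_along_axis(arr, order, axis=1)
print(shuffled)
```

The sort costs O(m log m) per row of length m, which is negligible for short rows like these. Newer NumPy releases also provide Generator.permuted(arr, axis=1), which shuffles each row independently and may be simpler where it is available.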

Vectorized Trig functions in C?

Question: I'm looking to calculate highly parallelized trig functions (in blocks of about 1024), and I'd like to take advantage of at least some of the parallelism that modern architectures have. When I compile a block

    for(int i=0; i<SIZE; i++) {
        arr[i]=sin((float)i/1024);
    }

GCC won't vectorize it, and says

    not vectorized: relevant stmt not supported: D.3068_39 = __builtin_sinf (D.3069_38);

Which makes sense to me. However, I'm wondering if there's a library to do parallel trig computations. With just a

Vectorizing the Kinect real-world coordinate processing algorithm for speed

Question: I recently started working with the Kinect V2 on Linux with pylibfreenect2. When I was first able to show the depth frame data in a scatter plot, I was disappointed to see that none of the depth pixels seemed to be in the correct location.

[Image: side view of a room (notice that the ceiling is curved).]

I did some research and realized there's some simple trig involved to do the conversions. To test, I started with a pre-written function in pylibfreenect2 which accepts a column, row and a depth pixel