vectorization | 易学教程

vectorize a loop which accesses non-consecutive memory locations

Question: I have a loop of this structure (reference: Maxwell Code Example):

    do z=1,zend
      do y=1,yend
        do x=1,xend
          k=arr(x,y,z)
          do while(k.ne.0)
            ix=fooX(k)
            iy=fooY(k)
            iz=fooZ(k)
            x1=x(ix  ,iy  ,iz)
            x2=x(ix+1,iy  ,iz)
            x3=x(ix  ,iy+1,iz)
            x4=x(ix+1,iy+1,iz)
            x5=x(ix  ,iy  ,iz+1)
            x6=x(ix+1,iy  ,iz+1)
            x7=x(ix  ,iy+1,iz+1)
            x8=x(ix+1,iy+1,iz+1)
            y1=y(ix  ,iy  ,iz)
            y2=y(ix+1,iy  ,iz)
            y3=y(ix  ,iy+1,iz)
            y4=y(ix+1,iy+1,iz)
            y5=y(ix  ,iy  ,iz+1)
            y6=y(ix+1,iy  ,iz+1)
            y7=y(ix  ,iy+1,iz+1)
            y8=y(ix+1,iy+1,iz+1)
            z1=z(ix  ,iy  ,iz)
            z2=z(ix+1,iy

More efficient method for counting open cases as of creation time of each case

Question: I am trying to find a more efficient way to count the number of cases that are open as of the creation time of each case. A case is "open" between its creation date/time stamp and its censor date/time stamp. You can copy-paste the code below to view a simple functional example:

    # Create a bunch of date/time stamps for our example
    two_thousand <- as.POSIXct("2000-01-01 00:00:00", format="%Y-%m-%d %H:%M:%S", tz="UTC", origin="1970-01-01");
    two_thousand_one <- as.POSIXct("2001-01-01 00:00:00",
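
The counting problem described above can be sketched without a per-case loop. The following is only an illustration in Python with NumPy, not the question's R code. It assumes timestamps are plain numbers, that a case is open on the half-open interval [created, censored), and that no case is censored before it is created. The function name `count_open_at_creation` and the sample data are hypothetical.

```python
import numpy as np

def count_open_at_creation(created, censored):
    """For each case, count the cases open at that case's creation time.

    Assumption: a case is open on [created, censored) and
    censored >= created for every case.
    """
    created = np.asarray(created)
    censored = np.asarray(censored)
    created_sorted = np.sort(created)
    censored_sorted = np.sort(censored)
    # Number of cases created at or before each creation time ...
    started = np.searchsorted(created_sorted, created, side="right")
    # ... minus the number already censored at or before that time.
    ended = np.searchsorted(censored_sorted, created, side="right")
    return started - ended

# Hypothetical data: times as plain numbers (e.g. seconds since an epoch).
created = np.array([0, 1, 2, 5])
censored = np.array([3, 2, 6, 7])
open_counts = count_open_at_creation(created, censored)
```

Each count includes the case itself. Two sorts and two binary searches replace the pairwise comparison, so the cost grows as n log n rather than n squared; a different boundary convention would change the `side` arguments.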

cumulative argmax of a numpy array

Question: Consider the array a:

    np.random.seed([3,1415])
    a = np.random.randint(0, 10, (10, 2))
    a

    array([[0, 2],
           [7, 3],
           [8, 7],
           [0, 6],
           [8, 6],
           [0, 2],
           [0, 4],
           [9, 7],
           [3, 2],
           [4, 3]])

What is a vectorized way to get the cumulative argmax?

    array([[0, 0],  <-- both start off as max position
           [1, 1],  <-- 7 > 0 so 1st col = 1, 3 > 2 2nd col = 1
           [2, 2],  <-- 8 > 7 1st col = 2, 7 > 3 2nd col = 2
           [2, 2],  <-- 0 < 8 1st col stays the same, 6 < 7 2nd col stays the same
           [2, 2],
           [2, 2],
           [2, 2],
           [7, 2],  <-- 9 is new
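
One possible approach to the cumulative argmax, offered as a sketch rather than as the accepted answer: mark the rows where a column strictly exceeds its running maximum, record those row indices, and carry them forward with a second running maximum. The function name `cumulative_argmax` is hypothetical, the input is assumed to be 2-D, and the array is typed in from the question instead of being regenerated from the seed.

```python
import numpy as np

def cumulative_argmax(a):
    """Row index of the running maximum down each column (first occurrence wins)."""
    a = np.asarray(a)
    running_max = np.maximum.accumulate(a, axis=0)
    # A row sets a new maximum when it strictly exceeds the running max above it.
    is_new_max = np.ones(a.shape, dtype=bool)
    is_new_max[1:] = a[1:] > running_max[:-1]
    # Keep the row index where a new maximum appears and 0 elsewhere,
    # then carry the latest such index forward.
    rows = np.arange(a.shape[0])[:, None]
    candidates = np.where(is_new_max, rows, 0)
    return np.maximum.accumulate(candidates, axis=0)

# The array shown in the question.
a = np.array([[0, 2], [7, 3], [8, 7], [0, 6], [8, 6],
              [0, 2], [0, 4], [9, 7], [3, 2], [4, 3]])
result = cumulative_argmax(a)
```

The strict comparison means ties keep the earlier index, which matches how `np.argmax` resolves ties on each prefix.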

Subtracting multiple vectors from each row of an array (super broadcasting)

Question: I have a data set, X, that is m x 2, and three vectors stored in a matrix C = [c1'; c2'; c3'] that is 3 x 2. I am trying to vectorize my code that finds, for each data point in X, which vector in C is closest (squared distance). I would like to subtract each vector (row) in C from each vector (row) in X, resulting in an m x 6 or 3m x 2 matrix of differences between the elements of X and the elements of C. My current implementation does this one row in X at a time:

    for i = 1:size(X, 1)
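
The question is about MATLAB, but the same "subtract every row of C from every row of X" idea can be illustrated with NumPy broadcasting. This is a sketch under assumptions: the function name `nearest_row` and the sample values of `X` and `C` are hypothetical, and the differences are kept as an m x 3 x 2 array instead of the m x 6 layout mentioned above.

```python
import numpy as np

def nearest_row(X, C):
    """Return all pairwise row differences and the index of the closest row of C."""
    # X has shape (m, d) and C has shape (k, d); inserting singleton axes
    # lets broadcasting produce differences of shape (m, k, d).
    diffs = X[:, None, :] - C[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)  # squared distances, shape (m, k)
    return diffs, sq_dist.argmin(axis=1)

# Hypothetical data: four points and three candidate vectors.
X = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0], [1.0, 8.0]])
C = np.array([[0.0, 1.0], [6.0, 4.0], [10.0, 0.0]])
diffs, closest = nearest_row(X, C)
```

If the flat layout is needed, `diffs.reshape(len(X), -1)` gives an m x 6 array with the differences against each row of C side by side.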

Compare two 16-byte values for equality using up to SSE 4.2?

Question: I have a struct like this:

    struct {
        uint32_t a;
        uint16_t b;
        uint16_t c;
        uint16_t d;
        uint8_t e;
    } s;

and I would like to compare two of the above structs for equality, in the fastest way possible. I looked at the Intel Intrinsics Guide but couldn't find a compare for integers; the options available were mainly for double- and single-precision floating-point vector inputs. Could somebody please advise the best approach? I can add a union to my struct to make processing easier. I am limited (for now) to using

GCC C vector extension: How to move contents of a vector to the left by one element?

Question: I am new to GCC's C vector extensions. I am considering using them in my project, but their utility is (somewhat) contingent on the ability to efficiently move all elements in a vector one position to the left and store the result in a new vector. How can I do this efficiently (such as in a SIMD-accelerated way)? So, basically:

    OriginalVector = {1, 2, 3, 4, 5, 6, 7, 8}
    ShiftedVector  = {2, 3, 4, 5, 6, 7, 8, X}

(where X can be anything.) Background information (you can skip this): The purpose

vectorizing a for loop in numpy/scipy?

Question: I'm trying to vectorize a for loop that I have inside a class method. The for loop has the following form: it iterates through a bunch of points and, depending on whether a certain variable (called "self.condition_met" below) is true, calls a pair of functions on the point and adds the result to a list. Each point here is an element in a vector of lists, i.e. a data structure that looks like array([[1,2,3], [4,5,6], ...]). Here is the problematic function:

    def myClass: def my_inefficient

Numpy: find the euclidean distance between two 3-D arrays

Question: Given two 3-D arrays of dimensions (2,2,2):

    A = [[[ 0,  0], [92, 92]],
         [[ 0, 92], [ 0, 92]]]
    B = [[[ 0,  0], [92,  0]],
         [[ 0, 92], [92, 92]]]

How do you find the Euclidean distance for each vector in A and B efficiently? I have tried for-loops but these are slow, and I'm working with 3-D arrays in the order of (>>2, >>2, 2). Ultimately I want a matrix of the form:

    C = [[d1, d2], [d3, d4]]

Edit: I've tried the following loop, but the biggest issue with it is that it loses the dimensions I want to
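
A minimal NumPy sketch of one way to get that matrix, assuming "each vector" means the pair of length-2 vectors sitting at the same position along the first two axes of A and B:

```python
import numpy as np

# The two arrays from the question.
A = np.array([[[0, 0], [92, 92]],
              [[0, 92], [0, 92]]])
B = np.array([[[0, 0], [92, 0]],
              [[0, 92], [92, 92]]])

# Subtract element-wise, then take the vector norm over the last axis only,
# so the first two dimensions are preserved.
C = np.linalg.norm(A - B, axis=-1)
```

Because the reduction runs over the last axis only, `C` keeps the leading shape (2, 2) here, and the same line works unchanged for arrays of shape (m, n, 2).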

MATLAB vectorization: filling struct fields from vector elements

Question: I have a vector of structs, each having a field x:

    s1.x = 1; s2.x = 2; s3.x = 3;
    S = [s1, s2, s3];

I would like to set the field x of all structs in S from a given vector X, i.e. I would like to vectorize the following loop:

    X = [97, 98, 99];
    for i = 1 : length(S)
        S(i).x = X(i);
    end

Is this possible?

Answer 1: You can do it this way:

    Xc = num2cell(X); %// convert X to cell array of numbers
    [S.x] = Xc{:}; %// generate comma-separated list from cell array, and assign

For Matlab versions before 7.0

More efficient way to loop?

Question: I have a small piece of code from a much larger script. I figured out that the function t_area is responsible for most of the run time when it is called. I tested the function by itself and it is not slow; I believe it takes a lot of time because of the number of times it has to be run. Here is the code where the function is called:

    tri_area = np.zeros((numx,numy),dtype=float)
    for jj in range(0,numy-1):
        for ii in range(0,numx-1):
            xp = x[ii,jj]
            yp = y[ii,jj]
            zp = surface[ii,jj]
            ap = np