vectorization

MATLAB fast (componentwise) vector operations are…really fast

血红的双手。 Submitted on 2021-02-19 02:53:29
Question: I have been writing MATLAB scripts for some time and still do not understand how MATLAB works "under the hood." Consider the following script, which does some computation on (big) vectors in three different ways: (1) MATLAB vector operations; (2) a simple for loop that does the same computation component-wise; (3) an optimized loop that is supposed to be faster than (2), since it avoids some allocations and assignments. Here is the code (truncated in the original):

    N = 10000000;
    A = linspace(0,100,N);
    B = linspace(-100,100,N);
    C = linspace(0
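The MATLAB code in the excerpt is cut off, but the three-way comparison it describes can be sketched in NumPy, which behaves analogously (the elementwise formula below is a placeholder assumption, since the original computation is truncated):

```python
import numpy as np
import time

N = 1_000_000  # smaller than the question's 1e7, to keep the demo quick
A = np.linspace(0, 100, N)
B = np.linspace(-100, 100, N)

# Vectorized: one elementwise expression over whole arrays.
t0 = time.perf_counter()
C_vec = A * B + np.sin(A)
t_vec = time.perf_counter() - t0

# Component-wise loop doing the same computation.
t0 = time.perf_counter()
C_loop = np.empty(N)
for i in range(N):
    C_loop[i] = A[i] * B[i] + np.sin(A[i])
t_loop = time.perf_counter() - t0

assert np.allclose(C_vec, C_loop)
print(f"vectorized: {t_vec:.4f}s  loop: {t_loop:.4f}s")
```

On typical hardware the vectorized line is one to two orders of magnitude faster, for the same reason as in MATLAB: the loop body dispatches per element, while the vector expression runs in compiled, often SIMD-enabled, inner loops.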

AVX 4-bit integers

[亡魂溺海] Submitted on 2021-02-18 12:12:32
Question: I need to perform the following operation: w[i] = scale * v[i] + point, where scale and point are fixed and v[] is a vector of 4-bit integers. I need to compute w[] for an arbitrary input vector v[], and I want to speed up the process using AVX intrinsics. However, v[i] is a vector of 4-bit integers. The question is: how do I perform operations on 4-bit integers using intrinsics? I could use 8-bit integers and perform operations that way, but is there a way to do the following: [a,b] + [c,d] = [a
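AVX has no native 4-bit lanes, but the classic SWAR trick for adding packed nibbles inside a wider integer, without carries leaking between lanes, can be illustrated in plain Python (the same two masks would be the constants fed to the AND/XOR intrinsics; this is a sketch of the idea, not intrinsics code):

```python
LOW3 = 0x7777777777777777  # low 3 bits of every nibble
TOP1 = 0x8888888888888888  # top bit of every nibble

def add_nibbles(x, y):
    """Add corresponding 4-bit lanes of x and y (mod 16), with no
    carry propagating from one nibble into the next."""
    # Add the low 3 bits of each nibble normally; a carry out of
    # them lands in the nibble's own top bit. Then restore the top
    # bits with XOR (carry-less addition), so nothing crosses lanes.
    t = (x & LOW3) + (y & LOW3)
    return t ^ ((x ^ y) & TOP1)

# Each nibble wraps independently: 0xF + 0x1 -> 0x0, 0x1 + 0x0 -> 0x1
print(hex(add_nibbles(0x1F, 0x01)))  # 0x10
```

The same masking pattern applies lane-by-lane inside `__m256i` registers; multiplication by `scale` is harder and usually done by unpacking nibbles to 8- or 16-bit lanes first.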

Numpy two matrices, pairwise dot product of rows [duplicate]

生来就可爱ヽ(ⅴ&lt;●) Submitted on 2021-02-17 06:25:25
Question: This question already has answers here: Vectorized way of calculating row-wise dot product two matrices with Scipy (5 answers). Closed 4 years ago. We are currently working on a Python project and have to vectorize a lot due to performance constraints. We end up with the following calculation: we have two NumPy arrays of shape (20,6) and want to calculate the pairwise dot product of the rows, i.e. we should obtain a (20,1) matrix in the end, where each row is the scalar obtained by the
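A vectorized sketch of the row-wise dot product with the shapes from the question (the arrays here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 6))
Y = rng.standard_normal((20, 6))

# Row-wise dot product: multiply elementwise, sum along the columns.
d = (X * Y).sum(axis=1)                 # shape (20,)
d_einsum = np.einsum('ij,ij->i', X, Y)  # same result via einsum

assert np.allclose(d, d_einsum)
result = d.reshape(20, 1)  # (20,1), as requested in the question
```

Both forms avoid materializing the full 20x20 matrix that `X @ Y.T` would build just to take its diagonal.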

Numpy index of the maximum with reduction - numpy.argmax.reduceat

给你一囗甜甜゛ Submitted on 2021-02-16 18:43:33
Question: I have a flat array a:

    a = numpy.array([0, 1, 1, 2, 3, 1, 2])

and an array b of indices marking the start of each "chunk":

    b = numpy.array([0, 4])

I know I can find the maximum in each "chunk" using a reduction:

    m = numpy.maximum.reduceat(a, b)
    >>> array([2, 3], dtype=int32)

But... is there a way to find the index of the maximum within a chunk (like numpy.argmax), with vectorized operations (no lists, no loops)?

Answer 1: Borrowing the idea from this post. Steps involved: Offset all
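The linked answer is cut off, but one possible vectorized emulation of the missing `numpy.argmax.reduceat` (assuming contiguous chunks whose starts are given by `b`) looks like this:

```python
import numpy as np

def argmax_reduceat(a, b):
    """Index (into a) of the maximum of each chunk; chunk i spans
    a[b[i]:b[i+1]] (the last chunk runs to the end). No Python loops."""
    maxes = np.maximum.reduceat(a, b)      # max value per chunk
    ids = np.zeros(a.size, dtype=int)      # chunk id of each element
    ids[b[1:]] = 1
    ids = ids.cumsum()
    # Candidate indices: positions holding their chunk's max; all
    # other positions are pushed past the end so the min ignores them.
    cand = np.where(a == maxes[ids], np.arange(a.size), a.size)
    return np.minimum.reduceat(cand, b)    # first such index per chunk

a = np.array([0, 1, 1, 2, 3, 1, 2])
b = np.array([0, 4])
print(argmax_reduceat(a, b))  # [3 4]
```

Like `argmax`, ties resolve to the first occurrence within each chunk.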

How to implement sign function with SSE3?

一世执手 Submitted on 2021-02-16 13:08:38
Question: Is there a way to efficiently implement the sign function using SSE3 (no SSE4) with the following characteristics? The input is a float vector __m128. The output should also be a __m128 with [-1.0f, 0.0f, 1.0f] as its values. I tried this, but it didn't work (though I think it should):

    inputVal = _mm_set_ps(-0.5, 0.5, 0.0, 3.0);
    comp1 = _mm_cmpgt_ps(_mm_setzero_ps(), inputVal);
    comp2 = _mm_cmpgt_ps(inputVal, _mm_setzero_ps());
    comp1 = _mm_castsi128_ps(_mm_castps_si128(comp1));
    comp2 = _mm
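The two-comparison approach in the snippet is sound in principle: sign(x) = (x &gt; 0) − (x &lt; 0). A NumPy sketch of that same branchless logic (not SSE code, just the arithmetic the two `_mm_cmpgt_ps` masks implement once converted to 1.0f):

```python
import numpy as np

def sign_branchless(x):
    # Each comparison yields a 0/1 mask; their difference is
    # -1.0, 0.0, or 1.0 per element, mirroring the combination of
    # the two compare masks in the SSE version.
    return (x > 0).astype(np.float32) - (x < 0).astype(np.float32)

x = np.array([-0.5, 0.5, 0.0, 3.0], dtype=np.float32)
print(sign_branchless(x))  # [-1.  1.  0.  1.]
```

In SSE terms the usual fix is to convert each all-ones compare mask to 1.0f (e.g. by ANDing with a vector of 1.0f) before subtracting, rather than casting the raw mask bits.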

R apply multiple functions when large number of categories/types are present using case_when (R vectorization)

夙愿已清 Submitted on 2021-02-11 17:24:10
Question: Suppose I have a dataset of the following form:

    City = c(1,2,2,1)
    Business = c(2,1,1,2)
    ExpectedRevenue = c(35,20,15,19)
    zz = data.frame(City, Business, ExpectedRevenue)
    zz_new = do.call("rbind", replicate(zz, n=30, simplify = FALSE))

My actual dataset contains about 200K rows and holds information for over 100 cities. Suppose, for each city (which I also call "Type"), I have the following functions which need to be applied:

    # Writing the custom functions for the categories here
    Type1
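The question is R-specific, but the shape of the problem, applying one of many vectorized functions depending on a category column, has a direct NumPy analogue in `np.select`, which plays the role of `case_when` (the two per-type formulas below are invented for illustration):

```python
import numpy as np

city = np.array([1, 2, 2, 1] * 30)
revenue = np.array([35, 20, 15, 19] * 30, dtype=float)

# One condition/result pair per city "Type". Each result is computed
# vectorized over the whole column; np.select picks per row.
conditions = [city == 1, city == 2]
results = [revenue * 1.10,   # hypothetical Type-1 adjustment
           revenue - 5.0]    # hypothetical Type-2 adjustment
adjusted = np.select(conditions, results)
```

With 100+ types, building the `conditions`/`results` lists from a dict of per-type functions keeps this a single vectorized call instead of a per-row dispatch.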

Issue vectorizing a recursive function that is used in iterative scheme to calculate Numpy array

假如想象 Submitted on 2021-02-11 15:07:46
Question: I have the following recursive function:

    def subspaceiterate(A, V, v, j):
        if j == 0:
            return v
        else:
            v_jm1 = V[:, j-1]
            v_jm1 = np.reshape(v_jm1, (np.size(V, axis=0), 1))
            v = v - np.matmul(v_jm1.T, np.matmul(A, v_jm1))
            j = j - 1
            subspaceiterate(A, V, v, j)

A is an m-by-m matrix whose eigenvalues and eigenvectors I want to compute using an iterative method, V is an m-by-m matrix that stores my initial guess for the eigenvectors of A, v_j is a particular column of V, and j is an index that I descend and use to
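Two observations about the code as posted: the recursive branch drops its result (the last line needs a `return`), and each step subtracts the 1x1 scalar v_{k}^T A v_{k} from v, so the whole recursion collapses to subtracting a sum of j scalars. That makes it straightforward to vectorize, assuming that scalar subtraction is really what is intended:

```python
import numpy as np

def subspaceiterate_vec(A, V, v, j):
    """One-shot equivalent of the recursion: subtract the sum of the
    scalars v_k^T A v_k over the first j columns of V from v."""
    Vj = V[:, :j]
    scalars = np.einsum('ik,ij,jk->k', Vj, A, Vj)  # v_k^T A v_k per column
    return v - scalars.sum()

# Reference: the original recursion, with the missing return added.
def subspaceiterate(A, V, v, j):
    if j == 0:
        return v
    v_jm1 = V[:, j-1].reshape(-1, 1)
    v = v - (v_jm1.T @ A @ v_jm1)
    return subspaceiterate(A, V, v, j - 1)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
V = rng.standard_normal((5, 5))
v = rng.standard_normal((5, 1))
assert np.allclose(subspaceiterate_vec(A, V, v, 3),
                   subspaceiterate(A, V, v, 3))
```

If the intent was instead to deflate v against the previous eigenvector estimates (a vector projection, not a scalar), the einsum would change, but the same "replace the recursion over columns with one contraction over `V[:, :j]`" pattern applies.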

Replacing for loops with function call inside with broadcasting/vectorized solution

a 夏天 Submitted on 2021-02-11 13:26:41
Question: Problem: when using broadcasting, rather than broadcasting scalars to match the arrays, the vectorized function is instead, for some reason, shrinking the arrays to scalars.

MWE: Below is a minimal working example. It contains a double for loop. I am having trouble writing faster code that does not use the for loops but instead uses broadcasting/vectorized NumPy.

    import numpy as np

    def OneD(x, y, z):
        ret = np.exp(x)**(y+1) / (z+1)
        return ret

    def ThreeD(a, b, c):
        value = OneD(a[0], b[0], c)
        value *= OneD(a[1], b[1]
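The excerpt is cut off before the loops, but the general recipe for eliminating such a double loop is to give each loop variable its own array axis so that a single vectorized call replaces the nested iteration. A generic sketch (`OneD` is taken from the question; the grids and the scalar `z` are made up):

```python
import numpy as np

def OneD(x, y, z):
    return np.exp(x)**(y + 1) / (z + 1)

xs = np.linspace(0.0, 1.0, 4)
ys = np.linspace(0.0, 2.0, 5)
z = 0.5

# Double-loop version: one scalar call per (i, k) pair.
loop = np.empty((4, 5))
for i in range(4):
    for k in range(5):
        loop[i, k] = OneD(xs[i], ys[k], z)

# Broadcast version: xs varies down axis 0, ys across axis 1,
# and the scalar z broadcasts to every element.
broadcast = OneD(xs[:, None], ys[None, :], z)

assert np.allclose(loop, broadcast)
```

The "arrays shrinking to scalars" symptom usually means the vectorized function was handed single elements (e.g. `a[0]`) instead of axis-expanded arrays like `xs[:, None]`.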