vectorization

Efficiently replace part of value from one column with value from another column in pandas using regex?

喜夏-厌秋 submitted on 2019-12-14 01:21:04
Question: I have a pandas dataframe df with dates as strings:

    Date1       Date2
    2017-08-31  1970-01-01 17:35:00
    2017-10-31  1970-01-01 15:00:00
    2017-11-30  1970-01-01 16:30:00
    2017-10-31  1970-01-01 16:00:00
    2017-10-31  1970-01-01 16:12:00

What I want to do is replace each date part in the Date2 column with the corresponding date in Date1 but leave the time untouched, so the output is:

    Date1       Date2
    2017-08-31  2017-08-31 17:35:00
    2017-10-31  2017-10-31 15:00:00
    2017-11-30  2017-11-30 16:30:00
    2017-10-31  2017-10-31 16
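One possible approach, sketched under the assumption that both columns hold plain strings in exactly the formats shown: extract the HH:MM:SS part of Date2 with a regex and prepend the matching Date1 value. The sample frame is abbreviated from the question, and `time_part` is a name introduced here for illustration only.

```python
import pandas as pd

# Abbreviated sample data taken from the question.
df = pd.DataFrame({
    "Date1": ["2017-08-31", "2017-10-31", "2017-11-30"],
    "Date2": ["1970-01-01 17:35:00", "1970-01-01 15:00:00", "1970-01-01 16:30:00"],
})

# Capture the trailing HH:MM:SS of every Date2 string in one vectorized call.
time_part = df["Date2"].str.extract(r"(\d{2}:\d{2}:\d{2})$", expand=False)

# Element-wise string concatenation: Date1 date + original time.
df["Date2"] = df["Date1"] + " " + time_part

print(df)
```

Both steps work on whole columns, so no Python-level loop over rows is needed. If the columns were real datetimes rather than strings, a different approach would be required.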

How to check if all elements of a numpy array are in another numpy array

我的未来我决定 submitted on 2019-12-14 01:07:32
Question: I have two 2D numpy arrays, for example:

    A = numpy.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])

and

    B = numpy.array([[1, 2], [32, 32]])

I want to have all lines from A where I can find all elements from any of the lines of B. Where there are 2 of the same element in a row of B, lines from A must contain at least 2 as well. In case of my example, I want to achieve this:

    A_filtered = [[1, 2, 4, 8], [16, 32, 32, 8]]

I have control over the values representation so I chose numbers
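A minimal sketch of one way to express this requirement with broadcasting, assuming small integer arrays like those in the question. It counts how often each distinct value of B occurs in every row of A and of B, then keeps the rows of A whose counts cover at least one row of B. The names `vals`, `counts_A`, `counts_B` and `contains` are illustrative, not from the original post.

```python
import numpy as np

A = np.array([[1, 2, 4, 8], [16, 32, 32, 8], [64, 32, 16, 8]])
B = np.array([[1, 2], [32, 32]])

# Distinct values that matter (only values occurring in B can be required).
vals = np.unique(B)

# counts_X[r, k] = how many times vals[k] occurs in row r of X.
counts_A = (A[:, :, None] == vals).sum(axis=1)
counts_B = (B[:, :, None] == vals).sum(axis=1)

# contains[i, j] is True when row i of A has at least as many of every
# value as row j of B (this handles repeated elements such as [32, 32]).
contains = (counts_A[:, None, :] >= counts_B[None, :, :]).all(axis=2)

# Keep rows of A that cover any row of B.
A_filtered = A[contains.any(axis=1)]
print(A_filtered)
```

The intermediate comparison arrays grow with rows × columns × distinct values, so for very large inputs a per-row counting approach may be more memory-friendly.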

MATLAB Efficiently find the row that contains two of three elements in a large matrix

匆匆过客 submitted on 2019-12-13 20:26:30
Question: I have a large matrix, let's call it A, which has dimension Mx3, e.g. M=4000 rows x 3 columns. Each row in the matrix contains three numbers, e.g. [241 112 478]. Out of these three numbers, we can construct three pairs, e.g. [241 112], [112 478], [241 478]. Of the other 3999 rows: For each of the three pairs, exactly one row of M (only one) will contain the same pair. However, the order of the numbers could be scrambled. For example, exactly one row will read: [333 478 112]. No other row will

Expression Template implementation not being optimized

时光毁灭记忆、已成空白 submitted on 2019-12-13 20:00:02
Question: I'm trying to understand the concept of expression templates in C++. As such, I've cobbled together pieces of example code etc. to produce a simple vector and associated expression template infrastructure that supports only binary operators (+, -, *). Everything compiles, but I've noticed that the performance difference between the standard hand-written loop and the expression template variant is quite large: ET is nearly twice as slow as the hand-written loop. I expected a difference but not that much

Vectorizing the higher dimensions in nested for loop in Matlab

﹥>﹥吖頭↗ submitted on 2019-12-13 19:12:05
Question: I have a 5D matrix A, and I need to multiply the 3rd-5th dimensions with a vector. For example, see the following sample code:

    A=rand(50,50,10,8,6);
    B=rand(10,1);
    C=rand(8,1);
    D=rand(6,1);
    for i=1:size(A,3)
        for j=1:size(A,4)
            for K=1:size(A,5)
                A(:,:,i,j,K)=A(:,:,i,j,K)*B(i)*C(j)*D(K);
            end
        end
    end

I wonder if there's a better / vectorized / faster way to do this?

Answer 1: Firstly, as a note, these days in Matlab, with JIT compilation, vectorised code is not necessarily faster/better. For big
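For comparison, the same scaling can be sketched in NumPy with broadcasting. This is a Python analogue of the loop above, not the MATLAB answer from the original thread; in MATLAB the corresponding idea would be to reshape each vector onto its own dimension and rely on implicit expansion or bsxfun. The leading dimensions are shrunk from 50x50 to 5x5 here only to keep the check fast.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5, 10, 8, 6))  # question uses 50x50 leading dims
B = rng.random(10)
C = rng.random(8)
D = rng.random(6)

# Loop version, mirroring the nested loops in the question.
expected = A.copy()
for i in range(A.shape[2]):
    for j in range(A.shape[3]):
        for k in range(A.shape[4]):
            expected[:, :, i, j, k] *= B[i] * C[j] * D[k]

# Broadcast version: give each vector trailing singleton axes so that it
# lines up with its own dimension of A (axes 2, 3 and 4 respectively).
scaled = A * B[:, None, None] * C[:, None] * D

print(np.allclose(scaled, expected))
```

Whether the broadcast form is actually faster depends on array sizes and memory bandwidth, so it is worth timing both on realistic data.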

vectorize numpy mean across the slices of an array

ぐ巨炮叔叔 submitted on 2019-12-13 18:37:05
Question: Is there a way to vectorize a function so that the output would be an array of means where each mean represents the mean of the values from 0-index of the input array? Looping this is pretty straightforward but I am trying to be as efficient as possible. e.g. 0 = mean(0), 1 = mean(0-1), N = mean(0-N)

Answer 1: The intended operation could be described as cumulative averaging. So, an obvious solution would involve cumulative summation and dividing those summations by the number of elements
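The cumulative-sum idea described in the answer can be sketched as follows. The function name `cumulative_mean` and the `axis` parameter are choices made for this illustration; the original answer's exact code is not shown in the excerpt.

```python
import numpy as np

def cumulative_mean(a, axis=0):
    """Running mean along `axis`: out[n] = mean(a[0..n])."""
    a = np.asarray(a, dtype=float)
    # Number of elements contributing to each running sum: 1, 2, ..., N.
    counts = np.arange(1, a.shape[axis] + 1)
    # Reshape counts so it broadcasts along the chosen axis only.
    shape = [1] * a.ndim
    shape[axis] = -1
    return np.cumsum(a, axis=axis) / counts.reshape(shape)

print(cumulative_mean([2, 4, 6, 8]))  # [2. 3. 4. 5.]
```

For a 2D input, `cumulative_mean(x, axis=0)` averages over growing prefixes of rows, which matches the "mean of slices 0 through N" reading of the question.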

Linspace applied on array [duplicate]

天涯浪子 submitted on 2019-12-13 17:23:21
Question: This question already has an answer here: Linspace using matrix input matlab (1 answer). Closed last year.

Given an array like a = [ -1; 0; 1];. For each a(i), I need to compute a linearly spaced vector with linspace(min(a(i),0),max(a(i),0),3);, where each linspace-vector should be stored into a matrix:

    A = [-1 -0.5 0;
          0  0   0;
          0  0.5 1];

With a for loop, I can do this like so:

    for i=1:3
        A(i,:) = linspace(min(a(i),0),max(a(i),0),3);
    end

How can I achieve this without using loops?

Answer 1: The
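As an illustration of the loop-free idea in NumPy (a Python analogue, not the MATLAB answer from the thread, which is cut off in this excerpt): every row is an affine map of one shared unit grid, so a single broadcast expression builds the whole matrix. The names `lo`, `hi` and `t` are introduced here for the sketch.

```python
import numpy as np

a = np.array([-1.0, 0.0, 1.0])

# Per-row start and end points, as in linspace(min(a(i),0), max(a(i),0), 3).
lo = np.minimum(a, 0.0)
hi = np.maximum(a, 0.0)

# One shared grid on [0, 1]; each row rescales it to [lo, hi].
t = np.linspace(0.0, 1.0, 3)
A = lo[:, None] + (hi - lo)[:, None] * t

print(A)
```

The same construction carries over to MATLAB by replacing the `[:, None]` indexing with column vectors and relying on implicit expansion.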

Efficiency problem of customizing numpy's vectorized operation

不问归期 submitted on 2019-12-13 17:06:10
Question: I have a python function given below:

    def myfun(x):
        if x > 0:
            return 0
        else:
            return np.exp(x)

where np is the numpy library. I want to make the function vectorized in numpy, so I use:

    vec_myfun = np.vectorize(myfun)

I did a test to evaluate the efficiency. First I generate a vector of 100 random numbers:

    x = np.random.randn(100)

Then I run the following code to obtain the runtime:

    %timeit np.exp(x)
    %timeit vec_myfun(x)

The runtime for np.exp(x) is 1.07 µs ± 24.9 ns per loop (mean ± std. dev.
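A common alternative to np.vectorize for a branch like this is np.where, which evaluates both branches on whole arrays. The sketch below assumes that is acceptable for this function; `myfun_vec` is a hypothetical name, and the np.minimum guard is an addition here so that exp is never evaluated on large positive inputs.

```python
import numpy as np

def myfun(x):
    # Scalar version from the question.
    if x > 0:
        return 0
    else:
        return np.exp(x)

def myfun_vec(x):
    x = np.asarray(x, dtype=float)
    # exp is applied to min(x, 0), so positive inputs cannot overflow;
    # np.where then selects 0 wherever x > 0.
    return np.where(x > 0, 0.0, np.exp(np.minimum(x, 0.0)))

x = np.random.randn(100)

# otypes=[float] pins the output dtype; without it np.vectorize infers the
# dtype from the first element, which may be the integer 0.
reference = np.vectorize(myfun, otypes=[float])(x)

print(np.allclose(myfun_vec(x), reference))
```

np.vectorize is essentially a convenience wrapper around a Python-level loop, which is why it tends to be much slower than array expressions such as the one above.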

create a matrix from array of elements under diagonal in numpy

て烟熏妆下的殇ゞ submitted on 2019-12-13 17:03:19
Question: I would like to create a matrix using a list whose elements would be the elements of the matrix under the diagonal.

    import numpy as np
    x1 = np.array([0.9375, 0.75, 0.4375, 0.0, 0.9375, 0.75, 0.4375, 0.9375, 0.75, 0.9375])
    x1

The matrix I would like to have is

    array([[ 1.    , 0.9375, 0.75  , 0.4375, 0.    ],
           [ 0.9375, 1.    , 0.9375, 0.75  , 0.4375],
           [ 0.75  , 0.9375, 1.    , 0.9375, 0.75  ],
           [ 0.4375, 0.75  , 0.9375, 1.    , 0.9375],
           [ 0.    , 0.4375, 0.75  , 0.9375, 1.    ]])

I thought you could do this with np.tril
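One way to build such a matrix, sketched under the assumption that x1 lists the off-diagonal entries in row-major order of the upper triangle (which is what the expected output above implies): fill the upper triangle via np.triu_indices, then mirror it. The variable names `n`, `rows`, `cols` and `M` are illustrative.

```python
import numpy as np

x1 = np.array([0.9375, 0.75, 0.4375, 0.0, 0.9375,
               0.75, 0.4375, 0.9375, 0.75, 0.9375])

n = 5  # 10 off-diagonal values correspond to n * (n - 1) / 2 with n = 5

# Start from the identity so the diagonal is already 1.
M = np.eye(n)

# Index pairs strictly above the diagonal, in row-major order.
rows, cols = np.triu_indices(n, k=1)

M[rows, cols] = x1  # upper triangle
M[cols, rows] = x1  # mirrored into the lower triangle

print(M)
```

If the input were instead ordered along the lower triangle row by row, np.tril_indices(n, k=-1) would be the matching index helper.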

User Warning: Your stop_words may be inconsistent with your preprocessing

不问归期 submitted on 2019-12-13 15:23:26
Question: I am following this document clustering tutorial. As an input I give a txt file which can be downloaded here. It's a combined file of 3 other txt files separated by \n. After creating a tf-idf matrix I received this warning: "UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['abov', 'afterward', 'alon', 'alreadi', 'alway', 'ani', 'anoth', 'anyon', 'anyth', 'anywher', 'becam', 'becaus', 'becom', 'befor', 'besid',