vectorization

numpy/pandas vectorize custom for loop

牧云@^-^@ submitted on 2020-12-31 14:57:40
Question: I created some example code that mimics the code I have:

    import numpy as np

    arr = np.random.random(100)
    arr2 = np.linspace(0, 1, 20)
    arr3 = np.zeros(20)  # this is the array I want to store the result in

    for index, num in enumerate(list(arr2)):
        arr3[index] = np.mean(arr[np.abs(num - arr) < 0.2])

    >>> arr3
    array([0.10970893, 0.1132479 , 0.14687451, 0.17257954, 0.19401919,
           0.23852137, 0.29151448, 0.35715096, 0.43273118, 0.45800796,
           0.52940421, 0.60345354, 0.63969432, 0.67656363, 0.72921913, 0
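One possible way to drop the loop in the question above is NumPy broadcasting: compare every element of `arr2` against every element of `arr` at once, then average under the resulting boolean mask. This is a minimal sketch, not an answer taken from the original thread; the function name `windowed_means` and the parameter `half_width` are hypothetical, and the seeded generator is used only to make the example reproducible.

```python
import numpy as np


def windowed_means(values, centers, half_width=0.2):
    """Mean of the `values` lying within `half_width` of each center.

    Hypothetical helper: the name and signature are illustrative only.
    """
    values = np.asarray(values, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # mask[i, j] is True when values[j] is within half_width of centers[i]
    mask = np.abs(centers[:, None] - values[None, :]) < half_width
    counts = mask.sum(axis=1)
    sums = (mask * values).sum(axis=1)
    # An empty window yields NaN, matching np.mean of an empty selection
    out = np.full(centers.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


rng = np.random.default_rng(0)  # seeded only for reproducibility
arr = rng.random(100)
arr2 = np.linspace(0, 1, 20)
arr3 = windowed_means(arr, arr2)
```

The mask has shape `(len(arr2), len(arr))`, so memory grows with the product of the two lengths; for the sizes in the question (20 by 100) that is negligible.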

How do I write a function in R to do calculations on a record?

跟風遠走 submitted on 2020-12-26 10:58:05
Question: In C# I am used to the concept of a data set and a current record. It would be easy for me to write a complicated calc-price function with conditions on the current record. I am having trouble understanding how to do this in R. I tried the following:

    train <- read.csv("Train.csv")
    df <- as.data.frame.matrix(train)
    v = c(df$Fuel.Type, df$No.Gears)
    names(v) <- c("FuelType", "NoGears")
    df$FEType = FEType(v)

where my function is defined as

    FEType <- function(v ){
      ret="Low"
      if (v["FuelType

add a vector to all rows of a matrix

倾然丶 夕夏残阳落幕 submitted on 2020-12-25 09:46:48
Question: I am maximizing a likelihood function and trying to reduce the looping. I want to add a vector (the parameters to be estimated) to every row of a matrix (the data). The length of the vector equals the number of columns of the matrix. a+b would give the wrong result because R's recycling rule works by column, not by row.

    a <- c(1,2,0,0,0)              # parameters to be optimized
    b <- matrix(1,ncol=5,nrow=6)   # data
    t(a+t(b))                      # my code would work, anything more intuitive?

Desired output

         [,1] [,2] [,3] [,4] [,5]
    [1,]    2    3    1    1    1
    [2,]    2    3    1    1    1

add a vector to all rows of a matrix

心已入冬 submitted on 2020-12-25 09:45:12
Question: I am maximizing a likelihood function and trying to reduce the looping. I want to add a vector (the parameters to be estimated) to every row of a matrix (the data). The length of the vector equals the number of columns of the matrix. a+b would give the wrong result because R's recycling rule works by column, not by row.

    a <- c(1,2,0,0,0)              # parameters to be optimized
    b <- matrix(1,ncol=5,nrow=6)   # data
    t(a+t(b))                      # my code would work, anything more intuitive?

Desired output

         [,1] [,2] [,3] [,4] [,5]
    [1,]    2    3    1    1    1
    [2,]    2    3    1    1    1

R: Vectorize loop to create pairwise matrix

白昼怎懂夜的黑 submitted on 2020-11-29 04:11:12
Question: I want to speed up a function for creating a pairwise matrix that describes the number of times an object is selected before and after all other objects, within a set of locations. Here is an example df:

    df <- data.frame(Shop = c("A","A","A","B","B","C","C","D","D","D","E","E","E"),
                     Fruit = c("apple", "orange", "pear", "orange", "pear", "pear", "apple", "pear", "apple", "orange", "pear", "apple", "orange"),
                     Order = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 3, 1, 1, 1))

In each Shop, Fruit is picked by a

Vectorized creation of an array of diagonal square arrays from a linear array in NumPy or TensorFlow

佐手、 submitted on 2020-11-28 09:19:08
Question: I have an array of shape [batch_size, N], for example:

    [[1 2]
     [3 4]
     [5 6]]

and I need to create a 3-index array of shape [batch_size, N, N] where, for every batch, I have an N x N diagonal matrix whose diagonal is taken from the corresponding batch element. In this simple case, the result I am looking for is:

    [
     [[1,0],[0,2]],
     [[3,0],[0,4]],
     [[5,0],[0,6]],
    ]

How can I perform this operation without for loops, exploiting vectorization? I guess it is an extension of
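The operation described above can be sketched in NumPy by broadcasting the input against an identity matrix. This is one possible approach under stated assumptions, not the answer from the original thread; the helper name `batched_diag` is hypothetical.

```python
import numpy as np


def batched_diag(a):
    """Turn shape (batch_size, N) into (batch_size, N, N) diagonal matrices.

    Hypothetical helper: the name is illustrative only.
    """
    a = np.asarray(a)
    n = a.shape[-1]
    # a[:, :, None] has shape (batch_size, N, 1); the (N, N) identity
    # broadcasts against it, so a[b, i] lands at [b, i, i] and every
    # off-diagonal entry is multiplied by zero.
    return a[:, :, None] * np.eye(n, dtype=a.dtype)


a = np.array([[1, 2], [3, 4], [5, 6]])
result = batched_diag(a)
print(result.shape)  # (3, 2, 2)
```

In TensorFlow, `tf.linalg.diag` accepts a batched input and should produce the same layout, though that call is not exercised here.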

Efficient computation of minimum of Haversine distances

懵懂的女人 submitted on 2020-11-28 08:32:49
Question: I have a dataframe with >2.7MM coordinates, and a separate list of ~2,000 coordinates. I'm trying to return the minimum distance between the coordinates in each individual row and every coordinate in the list. The following code works on a small scale (a dataframe with 200 rows), but when calculating over 2.7MM rows, it seemingly runs forever.

    from haversine import haversine

    df
    Latitude  Longitude
    39.989    -89.980
    39.923    -89.901
    39.990    -89.987
    39.884    -89.943
    39.030    -89.931

    end_coords
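A per-row Python loop over 2.7MM rows and ~2,000 targets is slow mainly because each distance is computed one call at a time. The sketch below writes the haversine formula directly in NumPy and processes the rows in chunks, so the full rows-by-targets distance matrix never has to sit in memory at once. It is a sketch under stated assumptions rather than the answer from the original thread: `min_haversine_km`, `chunk_size`, and the Earth radius constant are illustrative choices, and the `end_coords` values are made up because the list is cut off in the question.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in kilometres


def min_haversine_km(points, targets, chunk_size=10_000):
    """For each (lat, lon) row in `points` (degrees), return the minimum
    great-circle distance in km to any (lat, lon) row in `targets`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    points = np.radians(np.asarray(points, dtype=float))
    targets = np.radians(np.asarray(targets, dtype=float))
    t_lat = targets[:, 0][None, :]   # shape (1, n_targets)
    t_lon = targets[:, 1][None, :]
    cos_t_lat = np.cos(t_lat)
    out = np.empty(len(points))
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        p_lat = chunk[:, 0][:, None]  # shape (chunk, 1)
        p_lon = chunk[:, 1][:, None]
        # Haversine formula, broadcast to shape (chunk, n_targets)
        a = (np.sin((t_lat - p_lat) / 2) ** 2
             + np.cos(p_lat) * cos_t_lat * np.sin((t_lon - p_lon) / 2) ** 2)
        # clip guards against rounding pushing `a` slightly outside [0, 1]
        dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
        out[start:start + chunk_size] = dist.min(axis=1)
    return out


# Sample rows from the question; end_coords is hypothetical.
coords = np.array([[39.989, -89.980],
                   [39.923, -89.901],
                   [39.990, -89.987],
                   [39.884, -89.943],
                   [39.030, -89.931]])
end_coords = np.array([[40.000, -90.000],
                       [39.000, -89.900]])
min_dist = min_haversine_km(coords, end_coords)
```

With a pandas dataframe, the input could be passed as `df[["Latitude", "Longitude"]].to_numpy()` and the result assigned back as a new column. At the default `chunk_size`, each chunk against ~2,000 targets needs a few float64 temporaries of 10,000 by 2,000 elements, which is roughly 160 MB apiece; lower the chunk size if memory is tight.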

Efficient computation of minimum of Haversine distances

本秂侑毒 submitted on 2020-11-28 08:32:39
Question: I have a dataframe with >2.7MM coordinates, and a separate list of ~2,000 coordinates. I'm trying to return the minimum distance between the coordinates in each individual row and every coordinate in the list. The following code works on a small scale (a dataframe with 200 rows), but when calculating over 2.7MM rows, it seemingly runs forever.

    from haversine import haversine

    df
    Latitude  Longitude
    39.989    -89.980
    39.923    -89.901
    39.990    -89.987
    39.884    -89.943
    39.030    -89.931

    end_coords

Efficient computation of minimum of Haversine distances

一世执手 submitted on 2020-11-28 08:32:26
Question: I have a dataframe with >2.7MM coordinates, and a separate list of ~2,000 coordinates. I'm trying to return the minimum distance between the coordinates in each individual row and every coordinate in the list. The following code works on a small scale (a dataframe with 200 rows), but when calculating over 2.7MM rows, it seemingly runs forever.

    from haversine import haversine

    df
    Latitude  Longitude
    39.989    -89.980
    39.923    -89.901
    39.990    -89.987
    39.884    -89.943
    39.030    -89.931

    end_coords

Efficient computation of minimum of Haversine distances

落花浮王杯 submitted on 2020-11-28 08:32:09
Question: I have a dataframe with >2.7MM coordinates, and a separate list of ~2,000 coordinates. I'm trying to return the minimum distance between the coordinates in each individual row and every coordinate in the list. The following code works on a small scale (a dataframe with 200 rows), but when calculating over 2.7MM rows, it seemingly runs forever.

    from haversine import haversine

    df
    Latitude  Longitude
    39.989    -89.980
    39.923    -89.901
    39.990    -89.987
    39.884    -89.943
    39.030    -89.931

    end_coords