apply

Looping over dates and applying a function to a pandas dataframe

我的梦境 submitted on 2019-12-11 06:27:19
Question: I'm trying to detect the first date on which an event occurs. In my dataframe, for product A (see pivot table) I have 20 items stored for the first time on 2017-04-03, so I want to create a new variable called new_var_2017-04-03 that stores the increment. On the next day, 2017-04-04, I don't mind if the count is now 50 instead of 20; I only want to store the first event. It gives me several errors. I would like to know at least whether the entire logic behind it makes sense, it
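A minimal pandas sketch of the "first event only" logic, assuming long-format data with hypothetical columns `product`, `date` and `items` (the asker's pivot table and full code are not shown): keep only rows with a positive count, sort by date, and keep the earliest row per product.

```python
import pandas as pd

# Hypothetical long-format data: one row per (product, date) with a stock count.
df = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2017-04-02", "2017-04-03", "2017-04-04",
                            "2017-04-03", "2017-04-05"]),
    "items": [0, 20, 50, 0, 7],
})


def first_events(frame: pd.DataFrame) -> pd.DataFrame:
    """Return, per product, the first date with a positive count and that count."""
    stocked = frame[frame["items"] > 0].sort_values("date")
    # After sorting by date, the first row seen for each product is its first event;
    # later increases (e.g. 20 -> 50 for product A) are ignored.
    return stocked.drop_duplicates("product", keep="first").reset_index(drop=True)


result = first_events(df)
print(result)
```

Working on the long format avoids looping over date columns; the result can be merged back onto the original frame if a per-row flag is needed.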

avoid R loop and parallelize with snow

女生的网名这么多〃 submitted on 2019-12-11 06:12:15
Question: I have a large loop that will take too long (~100 days). I'm hoping to speed it up with the snow library, but I'm not great with apply statements. This is only part of the loop, but if I can figure this part out, the rest should be straightforward. I'm OK with a bunch of apply statements or loops, but one apply statement using a function to get object 'p' would be ideal. Original data:
dim(m1) == x x      # x >>> 0
dim(m2) == y x      # y >>> 0, y > x, y > x-10
dim(mout) == x x
thresh == x-10      #specific

For each row, extract the value in the column whose name matches another value in the cell

自古美人都是妖i submitted on 2019-12-11 05:51:01
Question: I have a question that can easily be solved with a for-loop. However, since my dataframe has hundreds of thousands of rows, this would take a very long computation time, so I am looking for a quick and smart solution. For each row in my dataframe, I would like to paste the value of the cell whose column name matches the value in the first column (INDEX). The dataframe looks like this:
> mydata
  INDEX    1   2    3   4    5   6
1     2 18.9 9.5 22.6 4.7 16.2 7.4
2     2 18.9 9.5 22.6 4.7 16.2 7.4
3     2 18.9 9.5 22.6
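The question is about R, but the vectorized idea carries over (in R the counterpart would be matrix indexing with a two-column row/column index). As a pandas analogue, a sketch only: look up each row's target column position once, then pick every value in a single NumPy fancy-indexing step instead of a row loop. The toy frame is modeled on the excerpt; the third row's INDEX value and the helper name `pick_by_index` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame modeled on the excerpt; the third row's INDEX (5) is invented.
mydata = pd.DataFrame(
    [[2, 18.9, 9.5, 22.6, 4.7, 16.2, 7.4],
     [2, 18.9, 9.5, 22.6, 4.7, 16.2, 7.4],
     [5, 18.9, 9.5, 22.6, 4.7, 16.2, 7.4]],
    columns=["INDEX", "1", "2", "3", "4", "5", "6"],
)


def pick_by_index(df: pd.DataFrame, key: str = "INDEX") -> np.ndarray:
    """For each row, return the value in the column named by that row's key."""
    values = df.drop(columns=key)
    # Column position matching each row's key (-1 if there is no such column)
    col_pos = values.columns.get_indexer(df[key].astype(str))
    if (col_pos < 0).any():
        raise KeyError("some key values have no matching column")
    # One fancy-indexing step: element (i, col_pos[i]) for every row i
    return values.to_numpy()[np.arange(len(df)), col_pos]


mydata["picked"] = pick_by_index(mydata)
print(mydata)
```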

How to apply a custom function over each column of a matrix?

邮差的信 submitted on 2019-12-11 05:02:59
Question: I have been trying to use a custom function that I found on here to recalculate median household income from census tracts aggregated to neighborhoods. My data looks like this:
> inc_df[, 1:5]
            San Francisco  Bayview Hunters Point  Bernal Heights  Castro/Upper Market  Chinatown
2500-9999           22457                   1057             287                  329       1059
10000-14999         20708                    920             288                  463       1327
1500-19999          12701                    626             145                  148        867
20000-24999         12106                    491             285                  160        689
25000-29999         10129                    554             238                  328        167
30000-34999         10310                    338             257                  179        289
35000-39999          9028                    383
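The custom function itself is not shown, so what follows is a hypothetical stand-in written in pandas rather than R: a grouped median that finds the bracket containing half of the households and interpolates linearly inside it, applied to every column with `DataFrame.apply`. It uses only the first four brackets from the excerpt, with bracket bounds assumed to be [lower, upper + 1), so the numbers are illustrative rather than real neighborhood medians. The excerpt's "1500-19999" label is presumably 15000-19999 and is written that way here.

```python
import pandas as pd

# First four brackets from the excerpt only, so these medians are illustrative.
brackets = ["2500-9999", "10000-14999", "15000-19999", "20000-24999"]
inc_df = pd.DataFrame(
    {"Bayview Hunters Point": [1057, 920, 626, 491],
     "Chinatown": [1059, 1327, 867, 689]},
    index=brackets,
)

# Assumed bracket bounds: "a-b" covers [a, b + 1)
lower = [int(b.split("-")[0]) for b in brackets]
upper = [int(b.split("-")[1]) + 1 for b in brackets]


def grouped_median(counts: pd.Series, lower: list, upper: list) -> float:
    """Median of binned counts, interpolating linearly inside the median bracket."""
    total = counts.sum()
    if total == 0:
        return float("nan")
    half = total / 2
    below = 0  # households in brackets already passed
    for i, n in enumerate(counts):
        if below + n >= half:
            return lower[i] + (half - below) / n * (upper[i] - lower[i])
        below += n
    return float("nan")


# DataFrame.apply with the default axis=0 calls grouped_median once per column
medians = inc_df.apply(grouped_median, args=(lower, upper))
print(medians)
```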

parSapply and progress bar

房东的猫 submitted on 2019-12-11 04:45:40
Question: I am using the function parSapply to run a simulation in a parallel environment. Here is my code:
runpar <- function(i) MonteCarloKfun(i=i)
# Detect number of cores available
ncores <- detectCores(logical=TRUE)
# Set up parallel environment
cl <- makeCluster(ncores, methods=FALSE)
# Export objects to parallel environment
clusterSetRNGStream(cl,1234567) # not necessary since we do not sample
clusterExport(cl, c("kfunctions","frq","dvec","case","control","polygon", "MonteCarloKfun", "khat",

Create multiple new columns for pandas dataframe with apply + function

谁说胖子不能爱 submitted on 2019-12-11 04:38:34
Question: I have a pandas dataframe df with shape (763, 65). I use the following code to create 4 new columns:
df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)
def myFunc(row):
    #code to get some result from another dataframe
    return result1, result2, result3, result4
The shape of the dataframe returned in myFunc is (1, 4). The code runs into the following error:
ValueError: Shape of passed values is (763, 4), indices imply (763, 65)
I know that df has 65 columns and
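One common way around this error, sketched in pandas with a made-up two-column frame and a hypothetical `my_func` (the real lookup against another dataframe is not shown): have the row function return a `pd.Series` indexed by the new column names, so that `apply(axis=1)` builds a DataFrame with exactly those columns, then join it onto the original frame.

```python
import pandas as pd

# Made-up stand-in for the (763, 65) frame in the question
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
new_cols = ["col1", "col2", "col3", "col4"]


def my_func(row: pd.Series) -> pd.Series:
    """Hypothetical row function returning four results labelled by column name."""
    a, b = row["a"], row["b"]
    return pd.Series([a + b, a * b, b - a, a ** 2], index=new_cols)


# apply(axis=1) stacks the returned Series into an (n_rows, 4) DataFrame
df = df.join(df.apply(my_func, axis=1))
print(df.shape)
```

If the function has to keep returning a tuple, `df.apply(my_func, axis=1, result_type="expand")` also produces a four-column frame (columns labelled 0 to 3) that can be renamed before joining.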

Python pandas with lambda apply difficulty

时光总嘲笑我的痴心妄想 submitted on 2019-12-11 04:34:17
Question: I am running the following function but am struggling to get it to take the length condition (the if part) into account. It only runs the first part of the function:
stringDataFrame.apply(lambda x: x.str.replace(r'[^0-9]', '') if (len(x) >= 7) else x)
For some reason it only runs the x.str.replace(r'[^0-9]', '') part. What am I doing wrong here? I have been stuck.
Answer 1: You can use applymap when you need to work on each value separately, because apply works with the whole column (
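As the answer points out, `DataFrame.apply` passes whole columns, so `len(x)` is the column length rather than the string length. A minimal element-wise sketch (the cleaning function name and the sample data are hypothetical): apply the length check to each value through `Series.map`.

```python
import re

import pandas as pd


def strip_non_digits_if_long(value, min_len=7):
    """Remove non-digits from strings of at least min_len characters; keep others."""
    if isinstance(value, str) and len(value) >= min_len:
        return re.sub(r"[^0-9]", "", value)
    return value


# Hypothetical sample data standing in for stringDataFrame
string_df = pd.DataFrame({"phone": ["555-123-4567", "12-34"],
                          "code": ["AB-1234567", "x1"]})

# apply hands each column to the lambda; Series.map then visits every value
cleaned = string_df.apply(lambda col: col.map(strip_non_digits_if_long))
print(cleaned)
```

`applymap` does the same in older pandas; since pandas 2.1 it is deprecated in favor of `DataFrame.map`. Also note that `Series.str.replace` defaults to `regex=False` from pandas 2.0 on, so the original `x.str.replace(r'[^0-9]', '')` would need `regex=True` there.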

Adding column sums in a dataframe row-wise

三世轮回 submitted on 2019-12-11 03:07:36
Question: I would like to add the sums of the columns of my dataframe one row at a time, so that for each row I get the sum of the column values above it. Is there an elegant way to do this with a combination of colSums and apply (or sapply, rollapply)? I have tried a couple of combinations of those but could not quite figure it out.
Answer 1: new_df <- apply(data_frame, 2, cumsum)
Answer 2: With dplyr, we can do
library(dplyr)
data %>% mutate_all(cumsum)
Source: https://stackoverflow.com/questions
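For comparison, a pandas counterpart of the cumulative column sums in the answers, sketched on made-up data: `DataFrame.cumsum` works column by column and returns a DataFrame, whereas R's `apply(data_frame, 2, cumsum)` returns a matrix. If "above it" should exclude the current row, shifting the running totals down by one row does that.

```python
import pandas as pd

data = pd.DataFrame({"x": [1, 2, 3], "y": [10, 0, 5]})

# Running totals down each column, including the current row
running = data.cumsum()

# Totals of the rows strictly above each row (the first row gets 0)
above = running.shift(fill_value=0)
print(running)
print(above)
```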

Apply function too slow in R

折月煮酒 submitted on 2019-12-11 02:52:56
Question: For a lot of species, I have to calculate a specific formula per row. The formula is the product of an abundance value and a value in the last row of the data frame; all these products are then summed. My current script uses an apply function, which appears to be as slow as the for-loop I started with. I simplified the problem in the following script, using a simple df called az:
az=data.frame(c(1,2,10),c(2,4,20),c(3,6,30))
colnames(az)=c("a","b","c")
# Initial for
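Assuming the intended result is, for every row except the last, the sum over species of abundance times the corresponding last-row value, the loop reduces to one vectorized matrix-vector product (in R, the analogous tool is `%*%`). A pandas sketch using the `az` values from the question; `weighted_row_sums` is a hypothetical helper name.

```python
import pandas as pd

# The simplified az data frame from the question
az = pd.DataFrame({"a": [1, 2, 10], "b": [2, 4, 20], "c": [3, 6, 30]})


def weighted_row_sums(df: pd.DataFrame) -> pd.Series:
    """Sum over columns of (abundance * last-row value) for every row but the last."""
    weights = df.iloc[-1]       # last row: one value per species
    abundance = df.iloc[:-1]    # remaining rows: abundances
    # Equivalent to abundance @ weights (a matrix-vector product)
    return abundance.mul(weights, axis=1).sum(axis=1)


print(weighted_row_sums(az))
```

For the first row this is 1*10 + 2*20 + 3*30 = 140, computed for all rows at once without an explicit loop.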

Select the first row from multiple dataframes and bind

≡放荡痞女 submitted on 2019-12-11 01:10:57
Question: I have three data frames which I have combined in a list:
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
d3 <- data.frame(y1 = c(5, 7, 8),y2 = c(6, 4, 2))
my.list <- list(d1, d2,d3)
I want to extract the first row of each element in the list, bind them row-wise and save the result as a CSV file. For example, in the above example, I want to extract the first row from d1, d2 and d3:
row1.d1 <- c(1,4)
row1.d2 <- c(3,6)
row1.d3 <- c(5,6)
and bind them together dat
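A pandas analogue of the requested steps, using the same three frames: take row 0 of each frame, concatenate, and produce CSV. In R, `do.call(rbind, lapply(my.list, function(d) d[1, ]))` followed by `write.csv` follows the same pattern. The file name in the comment is a placeholder.

```python
import pandas as pd

d1 = pd.DataFrame({"y1": [1, 2, 3], "y2": [4, 5, 6]})
d2 = pd.DataFrame({"y1": [3, 2, 1], "y2": [6, 5, 4]})
d3 = pd.DataFrame({"y1": [5, 7, 8], "y2": [6, 4, 2]})
my_list = [d1, d2, d3]

# iloc[[0]] (a list, not a scalar) keeps each first row as a one-row DataFrame
first_rows = pd.concat([d.iloc[[0]] for d in my_list], ignore_index=True)

# Without a path, to_csv returns the CSV text; pass e.g. "first_rows.csv" to write a file
csv_text = first_rows.to_csv(index=False)
print(csv_text)
```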