apply | 易学教程

looping into dates and apply function to pandas dataframe

Question: I'm trying to detect the first date on which an event occurs. In my dataframe, for product A (see the pivot table), 20 items are stored for the first time on 2017-04-03, so I want to create a new variable called new_var_2017-04-03 that stores the increment. On the next day, 2017-04-04, I don't care that the count is now 50 instead of 20; I only want to store the first event. My code gives me several errors, and I would like to know at least whether the overall logic behind it makes sense, it
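A minimal pandas sketch of the "first event" logic. It assumes long-format data with hypothetical columns date, product and items (the question's actual dataframe and pivot table are not shown), and it assumes a product's first positive count is its increment because nothing was stored before that date:

```python
import pandas as pd

# Hypothetical long-format data; column names and values are illustrative only.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-04-02", "2017-04-03", "2017-04-04",
                            "2017-04-03", "2017-04-05"]),
    "product": ["A", "A", "A", "B", "B"],
    "items": [0, 20, 50, 0, 7],
})

# Keep rows where something is stored, order them by date, and keep only the
# earliest row per product; later changes (e.g. 20 -> 50 for A) are ignored.
first_events = (
    df[df["items"] > 0]
    .sort_values("date")
    .drop_duplicates(subset="product", keep="first")
    .reset_index(drop=True)
)

# Build a label in the style of the question's new_var_2017-04-03.
first_events["label"] = "new_var_" + first_events["date"].dt.strftime("%Y-%m-%d")
print(first_events[["product", "items", "label"]])
```

Filtering and de-duplicating avoids looping over dates; each product keeps exactly one row, its first stored event.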

avoid R loop and parallelize with snow

Question: I have a large loop that will take too long (~100 days). I'm hoping to speed it up with the snow library, but I'm not great with apply statements. This is only part of the loop, but if I can figure this part out, the rest should be straightforward. I'm OK with a bunch of apply statements or loops, but a single apply statement using a function to get the object 'p' would be ideal. Original data:

    dim(m1) == x x      # x >>> 0
    dim(m2) == y x      # y >>> 0, y > x, y > x-10
    dim(mout) == x x
    thresh == x-10 #specific

For each row extract the value in the column name that match another value in the cell

Question: I have a problem that could easily be solved with a for-loop. However, since my dataframe has hundreds of thousands of rows, a loop would take a very long time to run, so I am looking for a quick and smart solution. For each row in my dataframe, I would like to paste the value of the cell whose column name matches the value in the first column (INDEX). The dataframe looks like this:

    > mydata
      INDEX    1   2    3   4    5   6
    1     2 18.9 9.5 22.6 4.7 16.2 7.4
    2     2 18.9 9.5 22.6 4.7 16.2 7.4
    3     2 18.9 9.5 22.6
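For readers doing the same thing in pandas, a loop-free analogue (not an R answer) is to turn each row's INDEX into a column position and pick one cell per row with NumPy fancy indexing. This sketch assumes the value columns are named "1" to "6" as strings and that INDEX holds integers matching those names; rows 1 and 2 copy the question's data, while row 3 (INDEX 5) is made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy version of `mydata`: rows 1-2 follow the question, row 3 is hypothetical.
value_cols = ["1", "2", "3", "4", "5", "6"]
mydata = pd.DataFrame({
    "INDEX": [2, 2, 5],
    "1": [18.9] * 3, "2": [9.5] * 3, "3": [22.6] * 3,
    "4": [4.7] * 3, "5": [16.2] * 3, "6": [7.4] * 3,
})

values = mydata[value_cols].to_numpy()

# Position of the column whose name equals each row's INDEX (-1 if no match).
col_pos = pd.Index(value_cols).get_indexer(mydata["INDEX"].astype(str))
if (col_pos < 0).any():
    raise ValueError("some INDEX values have no matching column")

# One (row, column) pair per row: a single vectorized lookup, no loop.
mydata["picked"] = values[np.arange(len(mydata)), col_pos]
print(mydata["picked"].tolist())
```

The explicit check matters because a -1 position would otherwise silently select the last column.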

How to apply a custom function over each column of a matrix?

Question: I have been trying to use a custom function that I found here to recalculate median household income from census tracts aggregated to neighborhoods. My data looks like this:

    > inc_df[, 1:5]
                San Francisco Bayview Hunters Point Bernal Heights Castro/Upper Market Chinatown
    2500-9999           22457                  1057            287                 329      1059
    10000-14999         20708                   920            288                 463      1327
    1500-19999          12701                   626            145                 148       867
    20000-24999         12106                   491            285                 160       689
    25000-29999         10129                   554            238                 328       167
    30000-34999         10310                   338            257                 179       289
    35000-39999          9028                   383

parSapply and progress bar

Question: I am using the function parSapply to run a simulation in a parallel environment. Here is my code:

    runpar <- function(i) MonteCarloKfun(i=i)
    # Detect number of cores available
    ncores <- detectCores(logical=TRUE)
    # Set up parallel environment
    cl <- makeCluster(ncores, methods=FALSE)
    # Export objects to parallel environment
    clusterSetRNGStream(cl,1234567) # not necessary since we do not sample
    clusterExport(cl, c("kfunctions","frq","dvec","case","control","polygon", "MonteCarloKfun", "khat",

Create multiple new columns for pandas dataframe with apply + function

Question: I have a pandas dataframe df of shape (763, 65). I use the following code to create 4 new columns:

    df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

    def myFunc(row):
        #code to get some result from another dataframe
        return result1, result2, result3, result4

The shape of the dataframe returned by myFunc is (1, 4). The code runs into the following error:

    ValueError: Shape of passed values is (763, 4), indices imply (763, 65)

I know that df has 65 columns and
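One commonly used fix is to have the row function return a plain tuple and pass result_type="expand" to DataFrame.apply, so the result has exactly four columns to match the four new column names. A minimal sketch with toy data and a hypothetical my_func (the question's lookup against another dataframe is not shown, so the arithmetic below is only a placeholder):

```python
import pandas as pd

# Hypothetical stand-in for the question's df.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def my_func(row):
    # Placeholder logic: return four values for one row as a tuple.
    a, b = row["a"], row["b"]
    return a + b, a * b, a - b, b / a

# result_type="expand" turns each returned tuple into one row of a new
# DataFrame with four columns, which is then assigned to the four new names.
df[["col1", "col2", "col3", "col4"]] = df.apply(my_func, axis=1, result_type="expand")
print(df)
```

Returning a pd.Series from the function instead of a tuple has a similar effect, since apply with axis=1 then builds a DataFrame from the returned Series.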

Python pandas with lambda apply difficulty

Question: I am running the following function, but I'm struggling to get it to take the length condition (the if part) into account. It only ever runs the first part of the function:

    stringDataFrame.apply(lambda x: x.str.replace(r'[^0-9]', '') if (len(x) >= 7) else x)

For some reason it only runs the x.str.replace(r'[^0-9]', '') part. What am I doing wrong here? I have been stuck.

Answer 1: You can use applymap when you need to work on each value separately, because apply works with a whole column (
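The likely cause is that DataFrame.apply passes a whole column (a Series) to the lambda, so len(x) is the number of rows rather than the length of each string, and every cell in a column takes the same branch. A vectorized sketch that tests each string's own length, using a hypothetical string_df in place of stringDataFrame; it also passes regex=True, because recent pandas versions default to regex=False in Series.str.replace and would otherwise treat the pattern literally:

```python
import pandas as pd

# Hypothetical stand-in for the question's stringDataFrame.
string_df = pd.DataFrame({
    "phone": ["(555) 123-4567", "12-34"],
    "code": ["A1", "ab-1234-5"],
})

def digits_if_long(col):
    # col is one column (a Series); str.len() gives each cell's own length.
    is_long = col.str.len() >= 7
    stripped = col.str.replace(r"[^0-9]", "", regex=True)
    # Keep the original where the string is short, use the stripped version otherwise.
    return col.where(~is_long, stripped)

result = string_df.apply(digits_if_long)
print(result)
```

The per-cell alternative from Answer 1 is applymap (renamed DataFrame.map in pandas 2.1) with a function that checks len on a single string.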

Adding columns sums in dataframe row wise

Question: I would like to add up the columns of my dataframe one row at a time, so that for each row I get the sum of each column over that row and all rows above it. Is there an elegant way to do this with a combination of colSums and apply (or sapply, rollapply)? I have tried a couple of combinations of those but could not quite figure it out.

Answer 1:

    new_df <- apply(data_frame, 2, cumsum)

Answer 2: With dplyr, we can do:

    library(dplyr)
    data %>% mutate_all(cumsum)

Source: https://stackoverflow.com/questions
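For comparison, the pandas counterpart of apply(data_frame, 2, cumsum) is DataFrame.cumsum, which by default computes running totals down each column. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up data; each column is accumulated independently.
data = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Running totals down each column (axis=0 is the default), including the current row.
running = data.cumsum()
print(running)
```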

Apply function too slow in r

Question: For many species, I have to calculate a specific formula per row. The formula is the product of an abundance value and a value present in the last row of the data frame; all of these products are then summed. My current script uses an apply function, which turns out to be as slow as the for-loop I started with. I simplified the problem in the following script, using a simple data frame called az:

    az=data.frame(c(1,2,10),c(2,4,20),c(3,6,30))
    colnames(az)=c("a","b","c")
    # Initial for
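A row-wise "multiply by the last row, then sum" is a matrix-vector product, which removes the per-row loop entirely. A pandas/NumPy sketch using the question's az values, assuming (since the script is truncated) that every row except the last holds abundances and the last row holds the per-column factors; in R the analogous operation would use the matrix product operator %*%:

```python
import pandas as pd

# The question's az, rebuilt in pandas (columns a, b, c).
az = pd.DataFrame({"a": [1, 2, 10], "b": [2, 4, 20], "c": [3, 6, 30]})

# Assumption: all rows but the last are abundances; the last row holds the factors.
abundances = az.iloc[:-1].to_numpy()
factors = az.iloc[-1].to_numpy()

# One matrix-vector product computes sum(abundance * factor) for every row at once.
row_scores = abundances @ factors
print(row_scores)
```

For the first row this is 1*10 + 2*20 + 3*30 = 140.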

Select first row from multiple dataframe and bind

Question: I have three data frames, which I have combined in a list:

    d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
    d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
    d3 <- data.frame(y1 = c(5, 7, 8),y2 = c(6, 4, 2))
    my.list <- list(d1, d2,d3)

I want to extract the first row of each element in the list, bind them row-wise, and save the result as a CSV file. That is, for the data above, I want to extract the first row from d1, d2 and d3:

    row1.d1 <- c(1,4)
    row1.d2 <- c(3,6)
    row1.d3 <- c(5,6)

and bind them together dat
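A pandas analogue, for comparison: take head(1) of each data frame in the list and concatenate the results. The sketch recreates d1, d2 and d3 from the question; to_csv is called without a path so it returns the CSV text, and the file name in the comment is hypothetical:

```python
import pandas as pd

# The three data frames from the question, collected in a list.
d1 = pd.DataFrame({"y1": [1, 2, 3], "y2": [4, 5, 6]})
d2 = pd.DataFrame({"y1": [3, 2, 1], "y2": [6, 5, 4]})
d3 = pd.DataFrame({"y1": [5, 7, 8], "y2": [6, 4, 2]})
frames = [d1, d2, d3]

# First row of each frame, stacked row-wise with a fresh 0..n-1 index.
first_rows = pd.concat([frame.head(1) for frame in frames], ignore_index=True)

# Without a path, to_csv returns the CSV as a string; pass a path such as
# "first_rows.csv" (hypothetical name) to write a file instead.
csv_text = first_rows.to_csv(index=False)
print(csv_text)
```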