apply

Set column names while calling a function

我们两清 submitted on 2019-12-04 18:09:45
Consider a numeric data.frame foo where we want the sum of every pair of columns:

    foo <- data.frame(x = 1:5, y = 4:8, z = 10:14, w = 8:4)
    bar <- combn(colnames(foo), 2, function(x) foo[, x[1]] + foo[, x[2]])
    bar
    #      [,1] [,2] [,3] [,4] [,5] [,6]
    # [1,]    5   11    9   14   12   18
    # [2,]    7   13    9   16   12   18
    # [3,]    9   15    9   18   12   18
    # [4,]   11   17    9   20   12   18
    # [5,]   13   19    9   22   12   18

Everything works, except that bar has no column names. I want the column names of bar to show which columns of foo were summed; in this example:

    colnames(bar) <- apply(combn(colnames(foo), 2), 2, paste0, collapse = "")
    colnames(bar)
    …
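For comparison, here is a minimal Python sketch of the same idea, assuming plain dicts of lists stand in for the data.frame. Iterating over itertools.combinations yields each column pair exactly once, so the sum and its name (the two column names pasted together, as paste0 with collapse = "" does above) come from the same pair and cannot get out of step.

```python
from itertools import combinations

# Plain dict of lists standing in for the R data.frame `foo` (assumption for illustration)
foo = {
    "x": list(range(1, 6)),      # 1:5
    "y": list(range(4, 9)),      # 4:8
    "z": list(range(10, 15)),    # 10:14
    "w": list(range(8, 3, -1)),  # 8:4
}

# One entry per column pair; the key is the two column names pasted together
bar = {
    a + b: [u + v for u, v in zip(foo[a], foo[b])]
    for a, b in combinations(foo, 2)
}

for name, values in bar.items():
    print(name, values)
```

combinations(foo, 2) visits the pairs in the same order as combn(colnames(foo), 2), so the keys line up with the six columns of bar in the question.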

Pandas groupby/apply has different behaviour with int and string types

大憨熊 submitted on 2019-12-04 17:04:43
I have the following dataframe:

       X    Y
    0  A   10
    1  A    9
    2  A    8
    3  A    5
    4  B  100
    5  B   90
    6  B   80
    7  B   50

and two very similar functions:

    def func1(x):
        if x.iloc[0]['X'] == 'A':
            x['D'] = 1
        else:
            x['D'] = 0
        return x[['X', 'D']]

    def func2(x):
        if x.iloc[0]['X'] == 'A':
            x['D'] = 'u'
        else:
            x['D'] = 'v'
        return x[['X', 'D']]

Now I can groupby/apply these functions:

    df.groupby('X').apply(func1)
    df.groupby('X').apply(func2)

The first line gives me what I want, i.e.

       X  D
    0  A  1
    1  A  1
    2  A  1
    3  A  1
    4  B  0
    5  B  0
    6  B  0
    7  B  0

But the second line returns something quite strange:

       X  D
    0  A  u
    1  A  u
    2  A  u
    3  A  u
    4  A  u
    5 …
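The excerpt stops before any explanation of the odd output, so the following is not one; it is a minimal sketch of a way to sidestep groupby/apply for this particular case, assuming df holds the eight rows shown. Because D depends only on whether X is 'A', a vectorized numpy.where builds the column directly, and the integer and string variants go through exactly the same code path.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": list("AAAABBBB"),
                   "Y": [10, 9, 8, 5, 100, 90, 80, 50]})

# D depends only on the group key X, so no per-group function is needed
ints = df[["X"]].assign(D=np.where(df["X"] == "A", 1, 0))
strs = df[["X"]].assign(D=np.where(df["X"] == "A", "u", "v"))

print(ints)
print(strs)
```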

`tapply()` to return data frame

人盡茶涼 submitted on 2019-12-04 12:03:48
I have a dataset with a datetime (POSIXct) column "date", a factor column "node" and a numeric column "c", for example:

                     date node           c
    1 2011-08-14 10:30:00    2 0.051236000
    2 2011-08-14 10:30:00    2 0.081230000
    3 2011-08-14 10:31:00    1 0.000000000
    4 2011-08-14 10:31:00    4 0.001356337
    5 2011-08-14 10:31:00    3 0.001356337
    6 2011-08-14 10:32:00    2 0.000000000

I need the mean of column "c" for every pair of "date" and "node", so I did this:

    tapply(data$c, list(data$node, data$date), mean)

The result contains what I want, but in a strange structure:

    num [1:5, 1:8923] 0 0 0.00092 0.00146 NA ...
    - attr(*, …
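For comparison, the pandas counterpart of this aggregation is a group-by on both keys followed by reset_index(), which returns a long data frame with one row per (date, node) pair that actually occurs, instead of a node-by-date matrix padded with NA. A minimal sketch using the six sample rows, assuming plain integers stand in for the R factor node:

```python
import pandas as pd

data = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-08-14 10:30:00", "2011-08-14 10:30:00",
        "2011-08-14 10:31:00", "2011-08-14 10:31:00",
        "2011-08-14 10:31:00", "2011-08-14 10:32:00",
    ]),
    "node": [2, 2, 1, 4, 3, 2],  # integers standing in for the R factor
    "c": [0.051236, 0.081230, 0.0, 0.001356337, 0.001356337, 0.0],
})

# Mean of c for every (date, node) pair that occurs, as a flat data frame
means = data.groupby(["date", "node"])["c"].mean().reset_index()
print(means)
```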

Help me replace a for loop with an “apply” function

爱⌒轻易说出口 submitted on 2019-12-04 10:51:40
...if that is possible.

My task is to find the longest streak of consecutive days on which a user participated in a game. Instead of writing an SQL function, I chose to use R's rle function to get the longest streaks and then update my db table with the results. The (attached) dataframe looks like this:

    day         user_id
    2008/11/01  2001
    2008/11/01  2002
    2008/11/01  2003
    2008/11/01  2004
    2008/11/01  2005
    2008/11/02  2001
    2008/11/02  2005
    2008/11/03  2001
    2008/11/03  2003
    2008/11/03  2004
    2008/11/03  2005
    2008/11/04  2001
    2008/11/04  2003
    2008/11/04  2004
    2008/11/04  2005

I tried the following to get per user …
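The run-length idea itself is language-independent. A minimal Python sketch, assuming the rows shown above and using a single pass over each user's sorted days in place of rle: the counter grows while consecutive dates differ by exactly one day and resets otherwise, and the largest value it reaches is that user's longest streak.

```python
from collections import defaultdict
from datetime import datetime

rows = [
    ("2008/11/01", 2001), ("2008/11/01", 2002), ("2008/11/01", 2003),
    ("2008/11/01", 2004), ("2008/11/01", 2005),
    ("2008/11/02", 2001), ("2008/11/02", 2005),
    ("2008/11/03", 2001), ("2008/11/03", 2003), ("2008/11/03", 2004), ("2008/11/03", 2005),
    ("2008/11/04", 2001), ("2008/11/04", 2003), ("2008/11/04", 2004), ("2008/11/04", 2005),
]

def longest_streak(days):
    """Length of the longest run of consecutive calendar days."""
    ordinals = sorted({d.toordinal() for d in days})  # deduplicate, then sort
    best = run = 0
    prev = None
    for o in ordinals:
        # extend the run if this day directly follows the previous one, else restart
        run = run + 1 if prev is not None and o == prev + 1 else 1
        best = max(best, run)
        prev = o
    return best

days_by_user = defaultdict(list)
for day, user in rows:
    days_by_user[user].append(datetime.strptime(day, "%Y/%m/%d").date())

streaks = {user: longest_streak(days) for user, days in days_by_user.items()}
print(streaks)
```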

Apply a function on a rolling window in a DataFrame where the whole DataFrame is passed to the function

我与影子孤独终老i submitted on 2019-12-04 09:30:50
I have a dataframe of 5 columns indexed by YearMo:

    import numpy as np
    import pandas as pd

    yearmo = np.repeat(np.arange(2000, 2010) * 100, 12) + [x for x in range(1, 13)] * 10
    rates = pd.DataFrame(data=np.random.random((120, 5)),
                         index=pd.Series(data=yearmo, name='YearMo'),
                         columns=['A', 'B', 'C', 'D', 'E'])
    rates.head()

    YearMo         A         B         C         D         E
    200411  0.237696  0.341937  0.258713  0.569689  0.470776
    200412  0.601713  0.313006  0.221821  0.720162  0.889891
    200501  0.024379  0.761315  0.225032  0.293682  0.302431
    200502  0.996778  0.388783  0.026448  0.056188  0.744850
    200503  0.942024  0.768416  0.484236  0.102904  0.287446

What I would like to do is to be able to …
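By default, DataFrame.rolling(...).apply hands the function one column at a time, so the function never sees the whole window. A minimal workaround, sketched under the assumption that a plain Python loop is fast enough for 120 rows, is to slice each trailing window with iloc and pass the sub-DataFrame to the function. window_stat (the mean of every value in the window) and rolling_frame_apply are hypothetical names for illustration, and the random data is seeded only to make the result reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
yearmo = np.repeat(np.arange(2000, 2010) * 100, 12) + np.tile(np.arange(1, 13), 10)
rates = pd.DataFrame(rng.random((120, 5)),
                     index=pd.Index(yearmo, name="YearMo"),
                     columns=["A", "B", "C", "D", "E"])

def window_stat(frame):
    # hypothetical per-window function: it receives the whole sub-DataFrame
    return frame.to_numpy().mean()

def rolling_frame_apply(df, window, func):
    """Apply func to each trailing window of `window` rows; NaN until the first full window."""
    out = [np.nan] * (window - 1)
    for end in range(window, len(df) + 1):
        out.append(func(df.iloc[end - window:end]))
    return pd.Series(out, index=df.index)

result = rolling_frame_apply(rates, 12, window_stat)
print(result.tail())
```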

R plyr, data.table: apply a function to certain columns of a data.frame

怎甘沉沦 submitted on 2019-12-04 08:30:20
I am looking for ways to speed up my code. I am looking into the apply/ply methods as well as data.table. Unfortunately, I am running into problems. Here is a small sample of the data:

    ids1 <- c(1, 1, 1, 1, 2, 2, 2, 2)
    ids2 <- c(1, 2, 3, 4, 1, 2, 3, 4)
    chars1 <- c("aa", " bb ", "__cc__", "dd ", "__ee", NA, NA, "n/a")
    chars2 <- c("vv", "_ ww_", " xx ", "yy__", " zz", NA, "n/a", "n/a")
    data <- data.frame(col1 = ids1, col2 = ids2, col3 = chars1, col4 = chars2,
                       stringsAsFactors = FALSE)

Here is a solution using loops:

    library("plyr")
    cols_to_fix <- c("col3", "col4")
    for (i in 1:length(cols_to_fix)) { …
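The truncated loop does not show what "fixing" a column means, so the rule below is an assumption made purely for illustration: trim leading and trailing spaces and underscores, and treat "n/a" as missing. The pandas sketch shows the pattern the title asks about, applying one function to a chosen subset of columns with DataFrame.apply, which passes each selected column to the function as a Series; clean_column is a hypothetical name.

```python
import pandas as pd

data = pd.DataFrame({
    "col1": [1, 1, 1, 1, 2, 2, 2, 2],
    "col2": [1, 2, 3, 4, 1, 2, 3, 4],
    "col3": ["aa", " bb ", "__cc__", "dd ", "__ee", None, None, "n/a"],
    "col4": ["vv", "_ ww_", " xx ", "yy__", " zz", None, "n/a", "n/a"],
})
cols_to_fix = ["col3", "col4"]

def clean_column(s):
    # hypothetical cleaning rule: strip spaces/underscores from both ends,
    # then turn the literal "n/a" into a missing value
    cleaned = s.str.strip(" _")
    return cleaned.where(cleaned != "n/a")

# apply() is called once per selected column; the other columns are untouched
data[cols_to_fix] = data[cols_to_fix].apply(clean_column)
print(data)
```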

Passing multiple arguments to apply (Python)

天大地大妈咪最大 submitted on 2019-12-04 08:20:32
Question: I'm trying to clean up some Python code that vectorizes a set of features, and I'm wondering if there's a good way to use apply to pass multiple arguments. Consider the following (current version):

    def function_1(x):
        if "string" in x:
            return 1
        else:
            return 0

    df['newFeature'] = df['oldFeature'].apply(function_1)

With the above, I have to write a new function (function_1, function_2, etc.) for each substring "string" that I want to find. In an ideal world I could combine all of these …
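Series.apply forwards extra positional arguments through its args tuple (keyword arguments are forwarded as well), so one function parameterized by the substring can replace function_1, function_2 and so on. A minimal sketch, where contains_flag, the sample strings and the list of substrings are hypothetical:

```python
import pandas as pd

def contains_flag(x, substring):
    # 1 if the substring occurs in x, else 0
    return 1 if substring in x else 0

df = pd.DataFrame({"oldFeature": ["a string here", "nothing", "another string"]})

# One function, many substrings: pass each one through args=(...)
for substring in ["string", "nothing"]:
    df["has_" + substring] = df["oldFeature"].apply(contains_flag, args=(substring,))

print(df)
```

For plain substring tests, df['oldFeature'].str.contains(substring, regex=False).astype(int) produces the same column without calling a Python function per element.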

Row-wise iteration like apply with purrr

删除回忆录丶 submitted on 2019-12-04 07:46:30
Question: How do I achieve row-wise iteration using purrr::map? Here's how I'd do it with a standard row-wise apply:

    df <- data.frame(a = 1:10, b = 11:20, c = 21:30)
    lst_result <- apply(df, 1, function(x){
      var1 <- (x[['a']] + x[['b']])
      var2 <- x[['c']]/2
      return(data.frame(var1 = var1, var2 = var2))
    })

However, this is not very elegant, and I would rather do it with purrr. It may (or may not) be faster, too.

Answer 1: You can use pmap for row-wise iteration. The columns are used as the arguments of whatever …
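pmap's idea of feeding several columns to one function element by element has a direct counterpart in Python's built-in map, which accepts several iterables and calls the function with one item from each. A minimal sketch of the question's calculation under that analogy, with the columns as plain ranges; note that map matches arguments by position, whereas pmap on a named list matches element names to argument names.

```python
# Columns of the question's data frame as plain ranges
a = range(1, 11)
b = range(11, 21)
c = range(21, 31)

def row_fn(a, b, c):
    # one call per row: the same calculation as the apply() version
    return {"var1": a + b, "var2": c / 2}

# map() walks the three columns in lockstep, one row at a time
lst_result = list(map(row_fn, a, b, c))
print(lst_result[:2])
```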

Multiply multiple columns and find the sum of each column for multiple values

谁说胖子不能爱 submitted on 2019-12-04 07:16:55
Question: I'm trying to multiply columns together and keep track of the resulting column names. I have a data frame:

    v1 v2 v3 v4 v5
     0  1  1  1  1
     0  1  1  0  1
     1  0  1  1  0

I'm trying to multiply each column with the others, e.g. v1v2, v1v3, v1v4, v1v5 and v2v3, v2v4, v2v5, etc., then the three-column products v1v2v3, v1v2v4, v1v2v5, v2v3v4, v2v3v5, and likewise the 4- and 5-column combinations; with n columns, all the way up to the n-column combination. I'm trying to use the following code in a while loop, but it is not working:

    i<-1
    while(i<=ncol(data) {
    results<-data.frame()
    v<-i
    results<- t(apply(data,1,function(x) combn(x,v …
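A minimal Python sketch of the full enumeration, assuming the three rows shown and using plain dicts of lists: for each size k from 2 to n, itertools.combinations lists every k-column subset once, the product is taken row by row with math.prod (Python 3.8+), and the name is the column names pasted together. The last step computes the per-column sums mentioned in the title.

```python
from itertools import combinations
from math import prod

data = {
    "v1": [0, 0, 1],
    "v2": [1, 1, 0],
    "v3": [1, 1, 1],
    "v4": [1, 0, 1],
    "v5": [1, 1, 0],
}

products = {}
for k in range(2, len(data) + 1):          # pairs, triples, ... up to all n columns
    for combo in combinations(data, k):
        name = "".join(combo)              # e.g. "v1v2v3"
        # multiply the chosen columns element-wise, row by row
        products[name] = [prod(row) for row in zip(*(data[col] for col in combo))]

column_sums = {name: sum(values) for name, values in products.items()}
print(column_sums)
```

The number of subsets is 2^n - n - 1 (26 for five columns), so this exhaustive approach is only practical for a modest number of columns.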

How to apply rolling functions in a groupby object in pandas

跟風遠走 submitted on 2019-12-04 07:11:57
I'm having difficulty solving a look-back or roll-over problem in a dataframe, or perhaps in a groupby. The following is a simple example of the dataframe I have:

                fruit  amount
    20140101    apple       3
    20140102    apple       5
    20140102   orange      10
    20140104   banana       2
    20140104    apple      10
    20140104   orange       4
    20140105   orange       6
    20140105    grape       1
    …
    20141231    apple       3
    20141231    grape       2

I need to calculate, for every day, the average 'amount' of each fruit over the previous 3 days, and create the following data frame:

                fruit  average_in_last_3_days
    20140104    apple                       4
    20140104   orange                      10
    ...

For example, on 20140104, the previous 3 …
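One way to express this in pandas, sketched under two assumptions: "previous 3 days" means the three calendar days before the current date, excluding the current day (which matches the apple value of 4 on 20140104, the mean of 3 and 5), and only the first eight sample rows are used. A time-based "3D" window with closed="left" covers [t - 3 days, t), and grouping by fruit keeps each fruit's window separate. Values are produced only on dates where that fruit has a row.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["20140101", "20140102", "20140102", "20140104",
                            "20140104", "20140104", "20140105", "20140105"],
                           format="%Y%m%d"),
    "fruit": ["apple", "apple", "orange", "banana",
              "apple", "orange", "orange", "grape"],
    "amount": [3, 5, 10, 2, 10, 4, 6, 1],
})

# Time-based window per fruit: [t - 3 days, t), so the current day is excluded
avg_last_3_days = (
    df.set_index("date")
      .groupby("fruit")["amount"]
      .rolling("3D", closed="left")
      .mean()
)
print(avg_last_3_days.dropna())
```

The result is indexed by (fruit, date); .dropna().reset_index() turns it into a flat frame like the one in the question, dropping dates with no earlier observations in the window.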