plyr

Object not found error with ddply inside a function

谁都会走 submitted on 2019-11-27 03:38:36
This has really challenged my ability to debug R code. I want to use ddply() to apply the same functions to different columns that are sequentially named, e.g. a, b, c. To do this I intend to repeatedly pass the column name as a string and use eval(parse(text=ColName)) to allow the function to reference it. I grabbed this technique from another answer. This works well until I put ddply() inside another function. Here is the sample code:

    # Required packages:
    library(plyr)
    myFunction <- function(x, y){
      NewColName = "a"
      z = ddply(x, y, summarize, Ave = mean(eval(parse(text=NewColName)),
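One possible way around the scoping problem described in this question is to avoid eval(parse()) altogether and index the column by name inside an anonymous function, which can see the wrapper's local variables. The following is only a sketch: the helper name group_mean and the toy data are assumptions for illustration, not part of the original question.

```r
library(plyr)

# Hypothetical helper: mean of the column named `col` within the groups in `by`.
# The anonymous function is a closure, so it sees `col` from the enclosing call.
group_mean <- function(data, by, col) {
  ddply(data, by, function(d) data.frame(Ave = mean(d[[col]], na.rm = TRUE)))
}

# Assumed toy data, not taken from the question
toy <- data.frame(g = c("x", "x", "y"), a = c(1, 3, 10))
res <- group_mean(toy, "g", "a")
print(res)
```

Because the column is looked up with d[[col]], the same helper can be called in a loop over column names such as "a", "b", "c".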

Finding the column number and value of the second highest value in a row

。_饼干妹妹 submitted on 2019-11-27 03:31:57
Question: I am trying to write some code that identifies the two greatest values in each row and provides their column numbers and values.

    df = data.frame(car = c(2,1,1,1,0), bus = c(0,2,0,1,0), walk = c(0,3,2,0,0), bike = c(0,4,0,0,1))

I've managed to do this for the maximum value using the max and max.col functions.

    df$max = max.col(df, ties.method="first")
    df$val = apply(df[ ,1:4], 1, max)

As far as I know there are no equivalent functions for the second highest value so doing this has
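A minimal base R sketch of one way to get the second highest value per row: order each row's column indices by decreasing value and take the second entry. It assumes ties should go to the earlier column, mirroring ties.method="first" in the question; the names vals, ord, second and second_val are illustrative.

```r
df <- data.frame(car  = c(2, 1, 1, 1, 0),
                 bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0),
                 bike = c(0, 4, 0, 0, 1))

vals <- as.matrix(df[, 1:4])

# For each row, column indices sorted by decreasing value.
# order() is stable, so tied values keep their original column order.
ord <- t(apply(vals, 1, order, decreasing = TRUE))

rows <- seq_len(nrow(vals))
df$second     <- ord[, 2]                    # column number of the 2nd highest
df$second_val <- vals[cbind(rows, ord[, 2])] # its value
print(df)
```

The first column of ord gives the position of the maximum in the same way, so the approach extends to the third or fourth highest value by changing the column index.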

R Dynamically build “list” in data.table (or ddply)

戏子无情 submitted on 2019-11-27 03:19:37
Question: My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to the data.table dynamically. As a minimal example:

    require(data.table)
    type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3))
    a <- (rep(1:3, 3))
    b <- runif(9)
    c <- runif(9)
    df <- data.frame(cbind(type, a, b, c), stringsAsFactors=F)
    DT <- data.table(df)

This call:

    DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type]

will have result similar to this

Subtract pairs of columns based on matching column

≡放荡痞女 submitted on 2019-11-27 02:57:09
Question: I'll apologise in advance - I know this has likely been answered elsewhere, but I can't seem to find the answer I need, and I can't manage to adapt other code I have found to my needs. I have a data frame:

    FILE | TECHNIQUE | COUNT
    ------------------------
    A    | ONE       | 10
    A    | TWO       | 25
    B    | ONE       | 5
    B    | TWO       | 30
    C    | ONE       | 30
    C    | TWO       | 50

I would like to produce a data frame of the difference of the COUNT values between ONE and TWO, with a row for each FILE, i.e.

    FILE | DIFFERENCE
    ----------
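A base R sketch of one way to pair the rows: split the data by TECHNIQUE, merge the two halves on FILE, and subtract. The excerpt does not say which direction the difference should take, so TWO minus ONE is an assumption here, as are the object names.

```r
counts <- data.frame(FILE      = c("A", "A", "B", "B", "C", "C"),
                     TECHNIQUE = rep(c("ONE", "TWO"), 3),
                     COUNT     = c(10, 25, 5, 30, 30, 50))

one <- counts[counts$TECHNIQUE == "ONE", c("FILE", "COUNT")]
two <- counts[counts$TECHNIQUE == "TWO", c("FILE", "COUNT")]

# Matching on FILE keeps the pairing correct even if the rows are not sorted.
paired <- merge(one, two, by = "FILE", suffixes = c("_ONE", "_TWO"))

# Assumption: DIFFERENCE is TWO minus ONE.
diffs <- data.frame(FILE       = paired$FILE,
                    DIFFERENCE = paired$COUNT_TWO - paired$COUNT_ONE)
print(diffs)
```

Files that lack one of the two techniques are dropped by merge() by default; passing all = TRUE would keep them with an NA difference.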

How do I time out a lapply when a list item fails or takes too long?

偶尔善良 submitted on 2019-11-27 02:43:43
Question: For several efforts I'm involved in at the moment, I am running large datasets with numerous parameter combinations through a series of functions. The functions have a wrapper (so I can mclapply) for ease of operation on a cluster. However, I run into two major challenges. a) My parameter combinations are large (think 20k to 100k). Sometimes particular combinations will fail (e.g. survival is too high and mortality is too low so the model never converges as a hypothetical scenario). It's
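One base R approach to the situation described here is to wrap each call in tryCatch() together with setTimeLimit(), so that a failing or slow item yields NA instead of stopping the whole run. This is only a sketch with plain lapply: run_with_limit and slow_task are hypothetical names, and the 0.5 second limit is arbitrary. Note that setTimeLimit() can only interrupt code at points where R checks for interrupts, so long-running compiled code may not be stopped promptly.

```r
# Hypothetical wrapper: run task(item) under an elapsed-time limit and return
# NA when the call errors or exceeds `timeout` seconds.
run_with_limit <- function(item, task, timeout) {
  # Always clear the limit again, whether task() finished, failed, or timed out.
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  tryCatch({
    setTimeLimit(elapsed = timeout, transient = TRUE)
    task(item)
  }, error = function(e) NA)
}

# Assumed toy task: sleeps for `x` seconds, or fails on non-numeric input.
slow_task <- function(x) {
  if (!is.numeric(x)) stop("model did not converge")
  Sys.sleep(x)
  x
}

results <- lapply(list(0, 2, "bad"), run_with_limit,
                  task = slow_task, timeout = 0.5)
str(results)
```

Returning NA keeps the result list the same length as the input, so failed parameter combinations can be identified afterwards by position.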

Change value of variable with dplyr [duplicate]

我们两清 submitted on 2019-11-27 02:34:13
This question already has an answer here: Set certain values to NA with dplyr 4 answers

I regularly need to change the values of a variable based on the values of a different variable, like this:

    mtcars$mpg[mtcars$cyl == 4] <- NA

I tried doing this with dplyr but failed miserably:

    mtcars %>% mutate(mpg = mpg == NA[cyl == 4]) %>% as.data.frame()

How could I do this with dplyr?

We can use replace to change the values in 'mpg' that correspond to cyl==4 to NA.

    mtcars %>% mutate(mpg=replace(mpg, cyl==4, NA)) %>% as.data.frame()

Source: https://stackoverflow.com/questions/28013850/change-value-of

Create columns from factors and count [duplicate]

被刻印的时光 ゝ submitted on 2019-11-27 02:23:44
This question already has an answer here: How do I get a contingency table? 6 answers Faster ways to calculate frequencies and cast from long to wide 4 answers

A seemingly easy problem is keeping me very busy. I have a data frame:

    > df1
      Name Score
    1  Ben     1
    2  Ben     2
    3 John     1
    4 John     2
    5 John     3

I would like to create a summary of the table like this:

    > df2
      Name Score_1 Score_2 Score_3
    1  Ben       1       1       0
    2 John       1       1       1

So df2 must (i) only show unique "Names", (ii) create columns from the unique factors in "Score", and (iii) count the number of times a person received said score. I have tried: df2 <- ddply
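As the linked duplicates suggest, this is a contingency table. A minimal base R sketch, using table() rather than ddply, could look like the following; the intermediate names counts and wide are illustrative.

```r
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# Cross-tabulate: one row per Name, one column per distinct Score.
counts <- table(df1$Name, df1$Score)

# Convert the table to a wide data frame and label the columns.
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("Score_", names(wide))

df2 <- cbind(Name = rownames(wide), wide)
rownames(df2) <- NULL
print(df2)
```

Scores that a person never received show up as 0, which matches the desired df2 layout.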

Convert R list to dataframe with missing/NULL elements

a 夏天 submitted on 2019-11-27 02:09:26
Question: Given a list:

    alist = list( list(name="Foo",age=22), list(name="Bar"), list(name="Baz",age=NULL) )

what's the best way to convert this into a dataframe with name and age columns, with missing values (I'll accept NA or "" in that order of preference)? Simple methods using ldply fail because it tries to convert each list element into a data frame, but the one with the NULL barfs because the lengths don't match. The best I have at the moment is:

    > ldply(alist,function(s){t(data.frame(unlist(s)))})
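One base R option is to extract each field separately and substitute NA whenever the field is absent or NULL. The sketch below assumes every field holds at most one value; field_or_na and people are hypothetical names.

```r
alist <- list(
  list(name = "Foo", age = 22),
  list(name = "Bar"),
  list(name = "Baz", age = NULL)
)

# Hypothetical helper: return x[[field]], or `na` if it is missing or NULL.
field_or_na <- function(x, field, na) {
  v <- x[[field]]
  if (is.null(v)) na else v
}

people <- data.frame(
  name = vapply(alist, field_or_na, character(1),
                field = "name", na = NA_character_),
  age  = vapply(alist, field_or_na, numeric(1),
                field = "age", na = NA_real_),
  stringsAsFactors = FALSE
)
print(people)
```

Using vapply() with an explicit type makes a malformed element fail loudly instead of silently changing the column type.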

adding text to ggplot geom_jitter points that match a condition

*爱你&永不变心* submitted on 2019-11-27 02:00:31
Question: How can I add text to points rendered with geom_jitter to label them? geom_text will not work because I don't know the coordinates of the jittered dots. Is it possible to capture the positions of the jittered points so I can pass them to geom_text? My practical usage would be to plot a boxplot with geom_jitter over it to show the data distribution, and I would like to label the outlier dots or the ones that match a certain condition (for example the lower 10% of the values used to color the plot).

for each group summarise means for all variables in dataframe (ddply? split?)

心不动则不痛 submitted on 2019-11-27 00:57:41
Question: A week ago I would have done this manually: subset the dataframe by group into new dataframes, compute the means of each variable for each dataframe, then rbind. Very clunky ... Now I have learned about split and plyr, and I guess there must be an easier way using these tools. Please don't prove me wrong.

    test_data <- data.frame(cbind( var0 = rnorm(100), var1 = rnorm(100,1), var2 = rnorm(100,2), var3 = rnorm(100,3), var4 = rnorm(100,4), group = sample(letters[1:10],100,replace=T), year = sample(c
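For comparison, base R's aggregate() can produce per-group means of several variables in one call, without plyr. The sketch below uses a smaller toy data frame because the question's test_data is cut off; the column names follow the question, but the data and the name means are assumptions.

```r
set.seed(1)

# Assumed toy data: two numeric variables and a two-level grouping column.
toy <- data.frame(var0  = rnorm(20),
                  var1  = rnorm(20, 1),
                  group = rep(c("a", "b"), each = 10))

# Mean of every remaining column within each group.
means <- aggregate(. ~ group, data = toy, FUN = mean)
print(means)
```

The dot on the left-hand side of the formula stands for all columns not named on the right, so further numeric variables are picked up automatically.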