plyr | 易学教程

Object not found error with ddply inside a function

This has really challenged my ability to debug R code. I want to use ddply() to apply the same functions to different columns that are sequentially named, e.g. a, b, c. To do this I intend to repeatedly pass the column name as a string and use eval(parse(text=ColName)) to let the function reference it. I grabbed this technique from another answer. It works well until I put ddply() inside another function. Here is the sample code: # Required packages: library(plyr) myFunction <- function(x, y){ NewColName = "a" z = ddply(x, y, summarize, Ave = mean(eval(parse(text=NewColName)),
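One way to sidestep eval(parse()) is to index the column by its string name with [[ ]]. The sketch below is an illustration under stated assumptions, not the original poster's code: it swaps ddply() for base R's tapply(), and the names group_mean, group_col, value_col, and toy are hypothetical.

```r
# Hypothetical helper: per-group mean of a column chosen by its string name.
# Uses base R tapply() instead of ddply(); no eval(parse()) is needed.
group_mean <- function(x, group_col, value_col) {
  means <- tapply(x[[value_col]], x[[group_col]], mean)
  out <- data.frame(names(means), as.vector(means), stringsAsFactors = FALSE)
  names(out) <- c(group_col, "Ave")
  out
}

# Toy data with sequentially named columns a and b (made up for illustration).
toy <- data.frame(g = c("u", "u", "v"), a = c(1, 3, 5), b = c(2, 4, 6))
res_a <- group_mean(toy, "g", "a")
res_b <- group_mean(toy, "g", "b")
```

The same [[ ]] indexing should also work inside an anonymous function passed to ddply(), for example function(d) data.frame(Ave = mean(d[[value_col]])), since the column name is then resolved in the enclosing function rather than inside summarize.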

Finding the column number and value of the second highest value in a row

Question: I am trying to write some code which identifies the greatest two values for each row and provides their column number and value. df = data.frame( car = c(2,1,1,1,0), bus = c(0,2,0,1,0), walk = c(0,3,2,0,0), bike = c(0,4,0,0,1)) I've managed to get it to do this for the maximum value using the max and max.col functions. df$max = max.col(df,ties.method="first") df$val = apply(df[ ,1:4], 1, max) As far as I know there are no equivalent functions for the second highest value so doing this has
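A minimal base R sketch of one possible approach: rank each row with order() and pick the second entry. The helper nth_largest and the output columns max2 and val2 are hypothetical names, and ties are resolved however order() resolves them, which may differ from max.col's ties.method.

```r
df <- data.frame(
  car  = c(2, 1, 1, 1, 0),
  bus  = c(0, 2, 0, 1, 0),
  walk = c(0, 3, 2, 0, 0),
  bike = c(0, 4, 0, 0, 1)
)

# Hypothetical helper: column index and value of the n-th largest entry in one row.
nth_largest <- function(row, n = 2) {
  idx <- order(row, decreasing = TRUE)[n]
  c(col = idx, val = row[[idx]])
}

# apply() returns one column per row of df, so transpose to get one row per row.
second <- t(apply(df[, 1:4], 1, nth_largest, n = 2))
df$max2 <- second[, "col"]  # column number of the second highest value
df$val2 <- second[, "val"]  # the second highest value itself
```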

R Dynamically build “list” in data.table (or ddply)

Question: My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to the data.table dynamically. As a minimal example: require(data.table) type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3)) a <- (rep(1:3, 3)) b <- runif(9) c <- runif(9) df <- data.frame(cbind(type, a, b, c), stringsAsFactors=F) DT <- data.table(df) This call: DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type] gives a result similar to this

Subtract pairs of columns based on matching column

Question: I'll apologise in advance. I know this has likely been answered elsewhere, but I can't find the answer I need and can't manage to adapt other code I have found to my needs. I have a data frame:

FILE | TECHNIQUE | COUNT
------------------------
A    | ONE       | 10
A    | TWO       | 25
B    | ONE       | 5
B    | TWO       | 30
C    | ONE       | 30
C    | TWO       | 50

I would like to produce a data frame of the difference of the COUNT values between ONE and TWO, with a row for each FILE, i.e.

FILE | DIFFERENCE
----------
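A hedged base R sketch of one way to pair the rows: split by TECHNIQUE, then merge() on FILE. The excerpt does not say which direction the difference should take, so TWO minus ONE is an assumption here, and the names counts, one, two, paired, and result are hypothetical.

```r
counts <- data.frame(
  FILE      = c("A", "A", "B", "B", "C", "C"),
  TECHNIQUE = c("ONE", "TWO", "ONE", "TWO", "ONE", "TWO"),
  COUNT     = c(10, 25, 5, 30, 30, 50)
)

# Split by technique, keeping only the key column and the value column.
one <- counts[counts$TECHNIQUE == "ONE", c("FILE", "COUNT")]
two <- counts[counts$TECHNIQUE == "TWO", c("FILE", "COUNT")]

# Join the two halves on FILE; the suffixes disambiguate the two COUNT columns.
paired <- merge(one, two, by = "FILE", suffixes = c("_ONE", "_TWO"))

# Assumption: DIFFERENCE means TWO minus ONE.
result <- data.frame(
  FILE       = paired$FILE,
  DIFFERENCE = paired$COUNT_TWO - paired$COUNT_ONE
)
```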

How do I time out a lapply when a list item fails or takes too long?

Question: For several efforts I'm involved in at the moment, I am running large datasets with numerous parameter combinations through a series of functions. The functions have a wrapper (so I can mclapply) for ease of operation on a cluster. However, I run into two major challenges. a) My parameter combinations are large (think 20k to 100k). Sometimes particular combinations will fail (e.g. survival is too high and mortality is too low so the model never converges as a hypothetical scenario). It's
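For the failing-combination part only, a minimal sketch is to wrap each call in tryCatch() so that one error does not abort the whole lapply(). This does not address timeouts. The function safe_run and its toy failure condition are hypothetical stand-ins for the real model wrapper.

```r
# Hypothetical wrapper: returns NA instead of stopping when one item fails.
safe_run <- function(param) {
  tryCatch(
    {
      # Stand-in for a model fit that fails for some parameter values.
      if (param < 0) stop("model did not converge")
      sqrt(param)
    },
    error = function(e) NA_real_
  )
}

# The second item fails, but the remaining items are still processed.
results <- lapply(c(4, -1, 9), safe_run)
```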

Change value of variable with dplyr [duplicate]

This question already has an answer here: Set certain values to NA with dplyr (4 answers).

Question: I regularly need to change the values of a variable based on the values of a different variable, like this: mtcars$mpg[mtcars$cyl == 4] <- NA I tried doing this with dplyr but failed miserably: mtcars %>% mutate(mpg = mpg == NA[cyl == 4]) %>% as.data.frame() How could I do this with dplyr?

Answer: We can use replace to change the values in 'mpg' to NA where cyl == 4. mtcars %>% mutate(mpg=replace(mpg, cyl==4, NA)) %>% as.data.frame()

Source: https://stackoverflow.com/questions/28013850/change-value-of

Create columns from factors and count [duplicate]

This question already has an answer here: How do I get a contingency table? (6 answers); Faster ways to calculate frequencies and cast from long to wide (4 answers).

A seemingly easy problem is keeping me very busy. I have a data frame:

> df1
  Name Score
1  Ben     1
2  Ben     2
3 John     1
4 John     2
5 John     3

I would like to create a summary of the table like this:

> df2
  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

So df2 must (i) only show unique "Names", (ii) create columns from the unique factors in "Score", and (iii) count the number of times a person received said score. I have tried: df2 <- ddply
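As the linked duplicate title suggests, this is a contingency table. A minimal base R sketch, assuming the df1 shown above, uses table() and then reshapes the result into the df2 layout; the intermediate name counts is hypothetical.

```r
df1 <- data.frame(
  Name  = c("Ben", "Ben", "John", "John", "John"),
  Score = c(1, 2, 1, 2, 3)
)

# Cross-tabulate Name against Score: one row per name, one column per score.
counts <- table(df1$Name, df1$Score)

# Convert the table to a wide data frame and prefix the score columns.
df2 <- as.data.frame.matrix(counts)
names(df2) <- paste0("Score_", names(df2))

# Move the row names into a regular Name column.
df2 <- cbind(Name = rownames(df2), df2)
rownames(df2) <- NULL
```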

Convert R list to dataframe with missing/NULL elements

Question: Given a list: alist = list( list(name="Foo",age=22), list(name="Bar"), list(name="Baz",age=NULL) ) what's the best way to convert this into a data frame with name and age columns, with missing values (I'll accept NA or "" in that order of preference)? Simple methods using ldply fail because it tries to convert each list element into a data frame, but the one with the NULL barfs because the lengths don't match. Best I have at the moment is: > ldply(alist,function(s){t(data.frame(unlist(s)))})
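One possible base R sketch, not taken from the original thread: fix the set of expected fields up front, replace any absent or NULL field with NA, and bind the resulting one-row data frames. The names fields, rows, and result are hypothetical.

```r
alist <- list(
  list(name = "Foo", age = 22),
  list(name = "Bar"),
  list(name = "Baz", age = NULL)
)

# Assumption: the expected columns are known in advance.
fields <- c("name", "age")

rows <- lapply(alist, function(item) {
  # item[[f]] is NULL both when the field is absent and when it was set to NULL.
  vals <- lapply(fields, function(f) if (is.null(item[[f]])) NA else item[[f]])
  names(vals) <- fields
  as.data.frame(vals, stringsAsFactors = FALSE)
})

# Stack the one-row data frames into a single data frame.
result <- do.call(rbind, rows)
```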

adding text to ggplot geom_jitter points that match a condition

Question: How can I add text to points rendered with geom_jitter to label them? geom_text will not work because I don't know the coordinates of the jittered dots. Could you capture the position of the jittered points so I can pass them to geom_text? My practical usage would be to plot a boxplot with geom_jitter over it to show the data distribution, and I would like to label the outlier dots or the ones that match a certain condition (for example the lower 10% of the values used to color the plots).

for each group summarise means for all variables in dataframe (ddply? split?)

Question: A week ago I would have done this manually: subset the data frame by group into new data frames, compute the means of each variable for each data frame, then rbind. Very clunky ... Now I have learned about split and plyr, and I guess there must be an easier way using these tools. Please don't prove me wrong. test_data <- data.frame(cbind( var0 = rnorm(100), var1 = rnorm(100,1), var2 = rnorm(100,2), var3 = rnorm(100,3), var4 = rnorm(100,4), group = sample(letters[1:10],100,replace=T), year = sample(c
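A minimal base R sketch of per-group means for every variable, using aggregate() with a formula rather than plyr. The toy data frame below is a small made-up stand-in for test_data. It is built with data.frame() directly, because wrapping mixed numeric and character columns in cbind() as in the excerpt coerces every column to character.

```r
# Hypothetical stand-in for test_data, small enough to check by hand.
toy <- data.frame(
  group = c("a", "a", "b", "b"),
  var0  = c(1, 3, 10, 20),
  var1  = c(2, 4, 30, 50)
)

# "." on the left-hand side means: every column other than the grouping column.
means <- aggregate(. ~ group, data = toy, FUN = mean)
```

The result has one row per group and one mean column per variable.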