plyr

Updated: Plyr rename() not recognizing identical 'x'; Error: The following `from` values were not present in `x`:

做~自己de王妃 submitted on 2019-12-18 09:45:15
Question: R 3.2.4, plyr updated 2016-03-10. I am trying to rename columns in a large data set and keep running into the "The following `from` values were not present in `x`:" error. The column names from the original export are atrocious, which is why I'm using plyr's rename(), but even rename() seems to have trouble. An example trouble column is [,3] in the linked data set, titled: "Experimental.or.quasi.experimental..evaluation..compares.mentored.youth.to.a.comparison.or.â.œcontrolâ...group.of.non.mentored.youth.

Subset data based on Minimum Value

梦想与她 submitted on 2019-12-18 09:30:09
Question: This might be an easy one. Here's the data: dat <- read.table(header=TRUE, text=" Seg ID Distance Seg46 V21 160.37672 Seg72 V85 191.24400 Seg373 V85 167.38930 Seg159 V147 14.74852 Seg233 V171 193.01636 Seg234 V171 200.21458 ") dat Seg ID Distance Seg46 V21 160.37672 Seg72 V85 191.24400 Seg373 V85 167.38930 Seg159 V147 14.74852 Seg233 V171 193.01636 Seg234 V171 200.21458 I want a table like the following, giving the Seg with the minimum Distance (as duplication is seen in ID
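The excerpt is cut off before the expected table, so the following is only a sketch under the assumption that the goal is one row per ID, namely the row with the smallest Distance. The result name `closest` is a placeholder that does not appear in the question.

```r
library(plyr)

dat <- read.table(header = TRUE, text = "
Seg    ID   Distance
Seg46  V21  160.37672
Seg72  V85  191.24400
Seg373 V85  167.38930
Seg159 V147 14.74852
Seg233 V171 193.01636
Seg234 V171 200.21458
")

# Within each ID, keep the single row whose Distance is smallest.
# `closest` is a hypothetical name chosen for this sketch.
closest <- ddply(dat, .(ID), function(d) d[which.min(d$Distance), ])
print(closest)
```

which.min() returns only the first minimum, so ties within an ID would yield a single row; if all tied rows are wanted, filtering on `d$Distance == min(d$Distance)` is one alternative.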

How do I use plyr to number rows?

被刻印的时光 ゝ submitted on 2019-12-17 17:52:36
Question: Basically I want an auto-incremented id column based on my cohorts - in this case .(kmer, cvCut) > myDataFrame size kmer cvCut cumsum 1 8132 23 10 8132 10000 778 23 10 13789274 30000 324 23 10 23658740 50000 182 23 10 28534840 100000 65 23 10 33943283 200000 25 23 10 37954383 250000 584 23 12 16546507 300000 110 23 12 29435303 400000 28 23 12 34697860 600000 127 23 2 47124443 600001 127 23 2 47124570 I want to add a column that holds new row numbers based on the kmer/cvCut group > myDataFrame size
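One possible way to get a per-group counter is sketched below, assuming the counter should restart at 1 inside every (kmer, cvCut) group. The data frame is a small made-up subset of the rows shown in the question, and the column name `id` and the result name `numbered` are placeholders.

```r
library(plyr)

# A small hypothetical subset of the frame shown in the question.
myDataFrame <- data.frame(
  size  = c(8132, 778, 324, 584, 110, 127, 127),
  kmer  = 23,
  cvCut = c(10, 10, 10, 12, 12, 2, 2)
)

# transform() is applied once per (kmer, cvCut) group, so seq_along()
# restarts at 1 in every group. `id` is a placeholder column name.
numbered <- ddply(myDataFrame, .(kmer, cvCut), transform, id = seq_along(size))
print(numbered)
```

ddply() returns the groups sorted by the grouping variables, so the row order of the result can differ from the input.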

How can I use functions returning vectors (like fivenum) with ddply or aggregate?

最后都变了- submitted on 2019-12-17 16:56:14
Question: I would like to split my data frame using a couple of columns and call, say, fivenum on each group. aggregate(Petal.Width ~ Species, iris, function(x) summary(fivenum(x))) The returned value is a data.frame with only 2 columns, the second being a matrix. How can I turn it into normal columns of a data.frame? Update: I want something like the following, with less code, using fivenum ddply(iris, .(Species), summarise, Min = min(Petal.Width), Q1 = quantile(Petal.Width, .25), Med = median
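A minimal sketch of one approach: have the per-group function return a one-row data frame, so each element of fivenum() becomes its own column. The names Min, Q1 and Med follow the question; Q3, Max and the result name `five` are assumptions added here.

```r
library(plyr)

# Each group yields a one-row data frame; ddply() stacks the rows and
# prepends the Species column. Q3, Max and `five` are assumed names.
five <- ddply(iris, .(Species), function(d) {
  q <- fivenum(d$Petal.Width)
  data.frame(Min = q[1], Q1 = q[2], Med = q[3], Q3 = q[4], Max = q[5])
})
print(five)
```

fivenum() reports Tukey's hinges, which can differ slightly from quantile(x, .25) and quantile(x, .75), so Q1 and Q3 here may not match the quantile-based version in the question exactly.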

ddply with lm() function

隐身守侯 submitted on 2019-12-17 15:41:21
Question: Hi guys, how can I use the ddply function to fit a linear model? x1 <- c(1:10, 1:10) x2 <- c(1:5, 1:5, 1:5, 1:5) x3 <- c(rep(1,5), rep(2,5), rep(1,5), rep(2,5)) set.seed(123) y <- rnorm(20, 10, 3) mydf <- data.frame(x1, x2, x3, y) require(plyr) ddply(mydf, mydf$x3, .fun = lm(mydf$y ~ mydf$X1 + mydf$x2)) This generates the following error: Error in model.frame.default(formula = mydf$y ~ mydf$X1 + mydf$x2, drop.unused.levels = TRUE) : invalid type (NULL) for variable 'mydf$X1' Appreciate your help. Answer 1: Here is what
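The answer text is cut off, so the following is not the original answer, only a sketch of one way the call could be repaired. Three things stand out in the question's code: `mydf$X1` is misspelled (the column is `x1`, hence the NULL in the error), `.fun` receives the result of lm() rather than a function, and the `mydf$` prefixes would make every fit use the full data instead of one group. The output column names and `fits` are placeholders.

```r
library(plyr)

x1 <- c(1:10, 1:10)
x2 <- c(1:5, 1:5, 1:5, 1:5)
x3 <- c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5))
set.seed(123)
y <- rnorm(20, 10, 3)
mydf <- data.frame(x1, x2, x3, y)

# .fun must be a function of one group's rows; bare column names plus
# data = d make each fit see only that group. Output names are placeholders.
fits <- ddply(mydf, .(x3), function(d) {
  est <- coef(lm(y ~ x1 + x2, data = d))
  data.frame(intercept = est[["(Intercept)"]], x1 = est[["x1"]], x2 = est[["x2"]])
})
print(fits)
```

In this particular data set x2 is an exact linear function of x1 within each x3 group (x1 equals x2 in one group and x2 + 5 in the other), so lm() reports NA for the x2 coefficient. If the fitted model objects themselves are needed rather than a table of coefficients, dlply() returns them as a list.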

Reshape multiple categorical variables to binary response variables

徘徊边缘 submitted on 2019-12-17 11:45:17
Question: I am trying to convert the following format: mydata <- data.frame(movie = c("Titanic", "Departed"), actor1 = c("Leo", "Jack"), actor2 = c("Kate", "Leo")) movie actor1 actor2 1 Titanic Leo Kate 2 Departed Jack Leo to binary response variables: movie Leo Kate Jack 1 Titanic 1 1 0 2 Departed 1 0 1 I tried the solution described in Convert row data to binary columns but I could get it to work for two variables, not three. I would really appreciate it if there is a clean way to do this. Answer 1: An
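The answer is truncated, so this is an independent base R sketch rather than the original solution: stack the actor columns into long form, cross-tabulate movie against actor, then cap the counts at 1. The names `long` and `wide` are placeholders, and the rows and columns come out in alphabetical order rather than the order shown in the question.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# Stack both actor columns into one long (movie, actor) table.
long <- data.frame(movie = rep(mydata$movie, times = 2),
                   actor = c(mydata$actor1, mydata$actor2),
                   stringsAsFactors = FALSE)

# Cross-tabulate, then turn counts into 0/1 indicators.
wide <- as.data.frame.matrix(table(long$movie, long$actor))
wide[] <- lapply(wide, function(n) as.integer(n > 0))

# Move the movie names from row names into a regular column.
wide <- cbind(movie = rownames(wide), wide)
rownames(wide) <- NULL
print(wide)
```

Because the actor columns are stacked by hand, a third column such as a hypothetical `actor3` would only need to be added to both `rep()` (times = 3) and the `c()` call.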

How to merge two data frames on common columns in R with sum of others?

风格不统一 submitted on 2019-12-17 10:42:09
Question: R Version 2.11.1 32-bit on Windows 7. I have two data sets, data_A and data_B: data_A USER_A USER_B ACTION 1 11 0.3 1 13 0.25 1 16 0.63 1 17 0.26 2 11 0.14 2 14 0.28 data_B USER_A USER_B ACTION 1 13 0.17 1 14 0.27 2 11 0.25 Now I want to add the ACTION of data_B to that of data_A wherever their USER_A and USER_B are equal. For the example above, the result would be: data_A USER_A USER_B ACTION 1 11 0.3 1 13 0.25+0.17 1 16 0.63 1 17 0.26 2 11 0.14+0.25 2 14 0.28 So how can I achieve this? Answer 1: You can use
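The answer is cut off after "You can use", so what it recommended is unknown. One base R possibility, offered only as a sketch, is a left join with merge() followed by an NA-safe sum. A left join is assumed because the expected result keeps only the key pairs present in data_A (the 1/14 row of data_B is dropped). The `_A`/`_B` suffixes and the names `merged` and `result` are choices made here.

```r
data_A <- data.frame(
  USER_A = c(1, 1, 1, 1, 2, 2),
  USER_B = c(11, 13, 16, 17, 11, 14),
  ACTION = c(0.3, 0.25, 0.63, 0.26, 0.14, 0.28)
)
data_B <- data.frame(
  USER_A = c(1, 1, 2),
  USER_B = c(13, 14, 11),
  ACTION = c(0.17, 0.27, 0.25)
)

# all.x = TRUE keeps every row of data_A; rows without a match in
# data_B get NA in ACTION_B, which is treated as 0 when summing.
merged <- merge(data_A, data_B, by = c("USER_A", "USER_B"),
                all.x = TRUE, suffixes = c("_A", "_B"))
merged$ACTION <- merged$ACTION_A +
  ifelse(is.na(merged$ACTION_B), 0, merged$ACTION_B)
result <- merged[, c("USER_A", "USER_B", "ACTION")]
print(result)
```

If key pairs that occur only in data_B should also appear in the output, stacking both frames with rbind() and summing ACTION per (USER_A, USER_B) group would be the alternative design.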

Can `ddply` (or similar) do a sliding window?

[亡魂溺海] submitted on 2019-12-17 10:36:07
Question: Something like sliding = function(df, n, f) ldply(1:(nrow(df) - n + 1), function(k) f(df[k:(k + n - 1), ]) ) That would be used like > df n a 1 1 0.8021891 2 2 0.9446330 ... > sliding(df, 2, function(df) with(df, + data.frame(n = n[1], a = a[1], b = sum(n - a)) + )) n a b 1 1 0.8021891 1.253178 ... Except directly inside ddply, so that I could get the nice syntactic sugar that comes with it? Answer 1: Since there hasn't been an answer posted to this question, I thought I'd put one up to make the

R use ddply or aggregate

那年仲夏 submitted on 2019-12-17 09:53:55
Question: I have a data frame with 3 columns: custId, saleDate, DelivDateTime. > head(events22) custId saleDate DelivDate 1 280356593 2012-11-14 14:04:59 11/14/12 17:29 2 280367076 2012-11-14 17:04:44 11/14/12 20:48 3 280380097 2012-11-14 17:38:34 11/14/12 20:45 4 280380095 2012-11-14 20:45:44 11/14/12 23:59 5 280380095 2012-11-14 20:31:39 11/14/12 23:49 6 280380095 2012-11-14 19:58:32 11/15/12 00:10 Here's the dput: > dput(events22) structure(list(custId = c(280356593L, 280367076L, 280380097L,

Object not found error with ddply inside a function

南笙酒味 submitted on 2019-12-17 07:15:12
Question: This has really challenged my ability to debug R code. I want to use ddply() to apply the same functions to different columns that are sequentially named, e.g. a, b, c. To do this I intend to repeatedly pass the column name as a string and use eval(parse(text=ColName)) to let the function reference it. I grabbed this technique from another answer. It works well until I put ddply() inside another function. Here is the sample code: # Required packages: library(plyr) myFunction <-