plyr

Why am I seeing “Error: length(rows) == 1 is not TRUE” with ddply?

Submitted by ↘锁芯ラ on 2019-12-05 23:45:20

Question: I have a data frame, say payroll, like:

    payroll <- read.table(text="
    AgencyName       Rate       PayBasis       Status    NumRate
    HousingAuthority $26,843.00 Annual         Full-Time 26843.00
    HousingAuthority $14,970.00 ProratedAnnual Part-Time 14970.00
    HousingAuthority $26,843.00 Annual         Full-Time 26843.00
    HousingAuthority $14,970.00 ProratedAnnual Part-Time 14970.00
    HousingAuthority $13.50     Hourly         Part-Time 13.50
    HousingAuthority $14,970.00 ProratedAnnual Part-Time 14970.00
    HousingAuthority $26,843.00 Annual         Full-Time
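The call that actually raised the "length(rows) == 1" error is cut off above, so as a hedged sketch only, here is the usual ddply grouped-summary pattern running cleanly on data shaped like the question's (a small stand-in frame, values taken from the question):

```r
library(plyr)

# Small stand-in for the payroll frame in the question
payroll <- data.frame(
  AgencyName = "HousingAuthority",
  Status     = c("Full-Time", "Part-Time", "Part-Time"),
  NumRate    = c(26843, 14970, 13.5))

# One summary row per (AgencyName, Status) group
res <- ddply(payroll, .(AgencyName, Status), summarise,
             mean_rate = mean(NumRate))
```

This is not the asker's failing call, only the baseline pattern against which their code would be compared.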

Concatenate values by group in descending order [duplicate]

Submitted by 隐身守侯 on 2019-12-05 21:25:43

This question already has answers here: Collapse / concatenate / aggregate a column to a single comma separated string within each group (3 answers). Closed 2 years ago.

My data A looks like:

    author_id paper_id prob
    731       24943    1
    731       24943    1
    731       688974   1
    731       964345   .8
    731       1201905  .9
    731       1267992  1
    736       249      .2
    736       6889     1
    736       94345    .7
    736       1201905  .9
    736       126992   .8

The output I am desiring is:

    author_id paper_id
    731       24943,24943,688974,1201905,964345
    736       6889,1201945,126992,94345,249

That is, the paper_id values are arranged according to decreasing order of probability. If I use a combination of sql
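In plyr itself, ordering each group by descending prob before pasting can be sketched like this (note the result may differ slightly from the hand-written output in the question, which appears to contain typos):

```r
library(plyr)

A <- data.frame(
  author_id = c(rep(731, 6), rep(736, 5)),
  paper_id  = c(24943, 24943, 688974, 964345, 1201905, 1267992,
                249, 6889, 94345, 1201905, 126992),
  prob      = c(1, 1, 1, .8, .9, 1, .2, 1, .7, .9, .8))

# Within each author, sort paper_id by decreasing prob, then collapse
res <- ddply(A, .(author_id), summarise,
             paper_id = paste(paper_id[order(-prob)], collapse = ","))
```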

How can I extract the rows from a large data set by common IDs and take the means of these rows and make a column having these IDs

Submitted by 人盡茶涼 on 2019-12-05 21:02:14

I know it is a very silly question, but I could not sort it out, which is why I am asking. How can I extract the rows from a large data set by common IDs, take the means of these rows, and make a column having these IDs as row names? e.g.

    IDs Var2
    Ae4 2
    Ae4 4
    Ae4 6
    Bc3 3
    Bc3 5
    Ad2 8
    Ad2 7

Output:

        Var(x)
    Ae4 4
    Bc3 4
    Ad2 7.5

This kind of thing can easily be done using the plyr function ddply:

    dat = data.frame(ID = rep(LETTERS[1:5], each = 20), value = runif(100))
    > head(dat)
      ID      value
    1  A 0.45800889
    2  A 0.11221072
    3  A 0.58833532
    4  A 0.70056704
    5  A 0.08337996
    6  A 0.05195357
    ddply(dat, .(ID),
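The ddply call above is cut off in the source; the usual completion of this pattern is a summarise step (the result column name here is an assumption):

```r
library(plyr)

dat <- data.frame(ID = rep(LETTERS[1:5], each = 20), value = runif(100))

# One mean per ID, as in the Ae4/Bc3/Ad2 example above
res <- ddply(dat, .(ID), summarise, mean_value = mean(value))
```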

Why doesn't the plyr package use my parallel backend?

Submitted by £可爱£侵袭症+ on 2019-12-05 20:54:08

Question: I'm trying to use the parallel package in R for parallel operations rather than doSNOW, since it's built in and ostensibly the way the R Project wants things to go. I'm doing something wrong that I can't pin down, though. Take for example this:

    a <- rnorm(50)
    b <- rnorm(50)
    arr <- matrix(cbind(a,b), nrow=50)
    aaply(arr, .margin=1, function(x){x[1]+x[2]}, .parallel=F)

This works just fine, producing the sums of my two columns. But if I try to bring in the parallel package:

    library(parallel)
    nodes <-
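plyr's .parallel=TRUE dispatches the work through foreach, so a foreach backend has to be registered; loading the parallel package alone does not register one. A sketch using doParallel (assumed installed) as that backend:

```r
library(plyr)
library(doParallel)  # pulls in foreach and parallel

a <- rnorm(50)
b <- rnorm(50)
arr <- matrix(cbind(a, b), nrow = 50)

cl <- makeCluster(2)     # makeCluster() comes from parallel
registerDoParallel(cl)   # register the foreach backend plyr will use
res <- aaply(arr, .margins = 1, function(x) x[1] + x[2], .parallel = TRUE)
stopCluster(cl)
```

(`.margins`, with an s, is the documented argument name; the question's `.margin=1` only works through partial matching.)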

plyr split_indices function crashes for long vectors

Submitted by 大城市里の小女人 on 2019-12-05 18:48:30

I am trying to run the acast function from the package reshape2 on a large data set, and the program crashes. I was able to localize this problem:

    library(plyr)
    n <- 15784000
    g <- 1:n
    split_indices(g, n)
    # NOTE for copy/pasters: this may result in an abort and R exit

I am getting the following error message:

    *** caught segfault ***
    address 0x7ffffc3c44f0, cause 'memory not mapped'

    Traceback:
     1: .Call("split_indices", group, as.integer(n))
     2: split_indices(g, n)

If I reduce the value of n:

    n <- 3946000

then the error message is different:

    Error: segfault from C stack overflow

The R system I
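The crash looks like a bug in plyr's compiled split_indices routine (reportedly fixed in later plyr releases). As a small sanity check on what the function is supposed to compute, base R's split reproduces the same grouping without touching the C code:

```r
# Small stand-in for the huge vector; split() groups positions by value,
# which is the base-R analogue of plyr's split_indices()
g <- rep(1:3, times = c(2, 1, 2))        # c(1, 1, 2, 3, 3)
idx <- unname(split(seq_along(g), g))    # list(1:2, 3, 4:5)
```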

Loops to create new variables in ddply

Submitted by 眉间皱痕 on 2019-12-05 16:56:33

Question: I am using ddply to aggregate and summarize data frame variables, and I am interested in looping through my data frame's column list to create the new variables.

    new.data <- ddply(old.data, c("factor", "factor2"),
                      function(df) c(a11_a10 = CustomFunction(df$a11_a10),
                                     a12_a11 = CustomFunction(df$a12_a11),
                                     a13_a12 = CustomFunction(df$a13_a12),
                                     ... ... ...))

Is there a way for me to insert a loop in ddply so that I can avoid writing each new summary variable out, e.g.

    for (i in 11:n) { paste("a", i, "_a
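One way to avoid spelling each column out is to sapply over the matching column names inside the .fun; a sketch (CustomFunction and the a*_a* columns are stand-ins for the asker's objects):

```r
library(plyr)

# Hypothetical stand-ins for the asker's objects
CustomFunction <- mean
old.data <- data.frame(factor  = c("x", "x", "y"),
                       factor2 = c("u", "u", "u"),
                       a11_a10 = 1:3,
                       a12_a11 = 4:6)

# Apply CustomFunction to every a<i>_a<j> column within each group
new.data <- ddply(old.data, c("factor", "factor2"), function(df) {
  cols <- grep("^a[0-9]+_a[0-9]+$", names(df), value = TRUE)
  sapply(df[cols], CustomFunction)
})
```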

R - Group data but apply different functions to different columns

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-05 15:25:50

I'd like to group this data but apply different functions to some columns when grouping.

    ID type isDesc isImage
    1  1    1      0
    1  1    0      1
    1  1    0      1
    4  2    0      1
    4  2    1      0
    6  1    1      0
    6  1    0      1
    6  1    0      0

I want to group by ID; the columns isDesc and isImage can be summed, but I would like to get the value of type as it is. type will be the same through the whole dataset. The result should look like this:

    ID type isDesc isImage
    1  1    1      2
    4  2    1      1
    6  1    1      1

Currently I am using:

    library(plyr)
    summarized = ddply(data, .(ID), numcolwise(sum))

but it simply sums up all the columns. You don't have to use ddply but if you think it's
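With summarise instead of numcolwise, each output column can get its own function; a sketch on the data above, taking type from the first row of each group and summing the flags:

```r
library(plyr)

data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0))

# Per-column treatment: carry type through, sum the two indicator columns
res <- ddply(data, .(ID), summarise,
             type = type[1], isDesc = sum(isDesc), isImage = sum(isImage))
```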

Dividing values in a column of a data frame by values from a different data frame when row values match

Submitted by 我们两清 on 2019-12-05 14:54:21

I have a data.frame x with the following format:

         species site count
      1:       A  1.1    25
      2:       A  1.2  1152
      3:       A  2.1    26
      4:       A  3.5     1
      5:       A  3.7    98
     ---
    101:       B  1.2     6
    102:       B  1.3    10
    103:       B  2.1     8
    104:       B  2.2     8
    105:       B  2.3     5

I also have another data.frame area with the following format:

       species area
    1:       A 59.7
    2:       B 34.4
    3:       C 37.7
    4:       D 22.8

I would like to divide the count column of data.frame x by the values in the area column of data.frame area when the values in the species column of each data.frame match. I have been trying to make it work with a ddply function:

    density = ddply(x, "species", mutate, density = x$count/area[,2
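A merge-based alternative sidesteps the lookup entirely: join the area onto x by species, then do one vectorized division (sketched on a small stand-in for x):

```r
# Small stand-in for the 105-row frame in the question
x <- data.frame(species = c("A", "A", "B"),
                site    = c(1.1, 1.2, 1.2),
                count   = c(25, 1152, 6))
area <- data.frame(species = c("A", "B", "C", "D"),
                   area    = c(59.7, 34.4, 37.7, 22.8))

# Attach each species' area to its rows, then divide once
m <- merge(x, area, by = "species")
m$density <- m$count / m$area
```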

How do I make doSMP play nicely with plyr?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-05 14:33:34

This code works:

    library(plyr)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel=FALSE)

While this code fails:

    library(doSMP)
    workers <- startWorkers(2)
    registerDoSMP(workers)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel=TRUE)
    stopWorkers(workers)

    > Error in do.ply(i) : task 3 failed - "subscript out of bounds"
    In addition: Warning messages:
    1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
    2: <anonymous>: ... may be used in an incorrect context: '.fun
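As an aside, doSMP shipped with Revolution R and appears to be long unavailable on CRAN; the same parallel ddply call can be sketched with doParallel as the registered foreach backend instead (doParallel assumed installed):

```r
library(plyr)
library(doParallel)

x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)

cl <- makeCluster(2)
registerDoParallel(cl)
res <- ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
stopCluster(cl)
```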

Working with unique values at scale (for loops, apply, or plyr)

Submitted by 天涯浪子 on 2019-12-05 12:24:11

I'm not sure if this is possible, but if it is, it would make life oh so much more efficient. The general problem, which would be interesting to the wider SO community: for loops (and base functions like apply) are suited to general/consistent operations, like adding X to every column or row of a data frame. I have a general/consistent operation I want to carry out, but with unique values for each element of the data frame. Is there a way to do this more efficiently than subsetting my data frame for every grouping, applying the function with specific numbers relative to that grouping, then
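One common shape for this problem: store the per-group constants in their own small frame, merge them onto the data, and run a single vectorized operation instead of looping over subsets. A purely hypothetical sketch (all names invented for illustration):

```r
# Hypothetical data and per-group parameters
dat    <- data.frame(group = c("a", "a", "b"), value = c(1, 2, 3))
params <- data.frame(group = c("a", "b"), offset = c(10, 100))

# Attach each group's offset to its rows, then apply the operation once
m <- merge(dat, params, by = "group")
m$result <- m$value + m$offset   # unique offset per group, one pass
```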