plyr

Calculate elapsed “times”, where the reference time depends on a factor

末鹿安然 submitted on 2019-12-02 12:19:51
I'm trying to calculate elapsed times in a data frame, where the 'start' value for the elapsed time depends on the value of a factor column in the data frame. (To simplify the question, I'll treat the time values as numeric rather than time objects; my question is about split-apply-combine, not time objects.) My data frame looks like this: df <- data.frame(id=gl(2, 3, 5, labels=c("a", "b")), time=1:5) I'd like to calculate elapsed times by subtracting the minimum time in each factor level from each time.
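A minimal sketch of one way to do this in base R, using ave() to apply a function within each level of id. The column name elapsed is an assumption for illustration and does not come from the question; the plyr call in the comment is an alternative, not necessarily the accepted answer.

```r
# Data frame from the question: id is a factor (a, a, a, b, b), time is 1..5
df <- data.frame(id = gl(2, 3, 5, labels = c("a", "b")), time = 1:5)

# Within each id level, subtract that level's minimum time from every time.
# "elapsed" is a hypothetical column name.
df$elapsed <- ave(df$time, df$id, FUN = function(x) x - min(x))

# A plyr alternative, if plyr is loaded:
#   ddply(df, .(id), transform, elapsed = time - min(time))
print(df)
```

ave() keeps the original row order, so the result can be assigned straight back as a new column.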

Not sure why dcast() this data set results in dropping variables

这一生的挚爱 submitted on 2019-12-02 12:11:31
Question: I have a data frame that looks like:

   id fromuserid touserid from_country to_country length
1   1   54525953 47195889           US         US      2
2   2   54525953 54361607           US         US      1
3   3   54525953 53571081           US         US      2
4   4   41943048 55379244           US         US      1
5   5   47185938 53140304           US         PR      1
6   6   47185938 54121387           US         US      1
7   7   54525974 50928645           GB         GB      1
8   8   54525974 53495302           GB         GB      1
9   9   51380247 45214216           SG         SG      2
10 10   51380247 43972484           SG         US      2

Each row describes a number of messages (length) sent from one user to another user. What I would like

when is plyr better than data.table? [closed]

流过昼夜 submitted on 2019-12-02 11:51:32
Better here can mean faster, or easier to read/shorter syntax, or it could also mean that the command is not even doable in data.table. I don't use plyr a lot and would like to know if there are cases when I should. Because I don't use it a lot, the only example I can come up with is rbind.fill, which to my knowledge doesn't have a data.table analog; in every other example I've seen of something being done in both plyr and data.table, the latter was faster and easier to read/more compact. They are different packages with different purposes. One is not a substitute for the other, despite there being a

Error when calculating values greater than 95% quantile using plyr

Deadly submitted on 2019-12-02 10:49:46
My data is structured as follows: Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Harry", "Harry", "Harry", "Harry","Harry", "Harry", "Harry", "Harry", "Paul", "Paul", "Paul", "Paul"), Time = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr"), Power = c(400,

Merge rows with duplicate IDs

为君一笑 submitted on 2019-12-02 09:34:12
I would like to merge and sum the values of each row that contains duplicated IDs. For example, the data frame below contains a duplicated symbol 'LOC102723897'. I would like to merge these two rows and sum the value within each column, so that one row appears for the duplicated symbol. > head(y$genes) SM01 SM02 SM03 SM04 SM05 SM06 SM07 SM08 SM09 SM10 SM11 SM12 SM13 SM14 SM15 SM16 SM17 SM18 SM19 SM20 SM21 SM22 1 32 29 23 20 27 105 80 64 83 80 94 58 122 76 78 70 34 32 45 42 138 30 2 246 568 437 343 304 291 542 457 608 433 218 329 483 376 410 296 550 533 537 473 296 382 3 30 23 30 13 20 18 23 13
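A minimal sketch of the merge-and-sum step, assuming the IDs live in a column called Symbol (a hypothetical name; the excerpt does not show where the symbols are stored) and using made-up values rather than the question's data. It uses base R's aggregate() with a formula.

```r
# Hypothetical example data: "LOC102723897" appears twice
genes <- data.frame(
  Symbol = c("LOC102723897", "GENE_A", "LOC102723897"),
  SM01   = c(10, 5, 7),
  SM02   = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Sum every other column within each Symbol, leaving one row per Symbol
merged <- aggregate(. ~ Symbol, data = genes, FUN = sum)
print(merged)
```

If the IDs are row names of a numeric matrix instead, base R's rowsum(x, group) does the same job.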

Group (factorial) data with multiple factors. error: incompatible size (0), expecting 1 (the group size) or 1

≡放荡痞女 submitted on 2019-12-02 07:58:25
This post is a follow-up to Changing line color in ggplot based on "several factors" slope. I would like to group the data (below) by "PQ"; however, I get the following error: "incompatible size (0), expecting 1 (the group size) or 1" Data ID<-c("A_P1","A_P1","A_P1","A_P1","A_P1","A_P2","A_P2","A_P2","A_P2","A_P2","A_P2","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P1","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2","B_P2") Q<-c("C1","C1","C2","C3","C3","C1","C1","C2","C2","C3","C3","Q1","Q1","Q1","Q1","Q3","Q3","Q4","Q4","Q1","Q1","Q1","Q1","Q3","Q3","Q4","Q4") PQ<-c("A_P1C1","A

rbinding a list of lists of dataframes based on nested order

风格不统一 submitted on 2019-12-02 07:45:57
I have a dataframe, df, and a function, process, that returns a list of two dataframes, a and b. I use dlply to split up the df on an id column, and then return a list of lists of dataframes. Here's sample data/code that approximates the actual data and methods: df <- data.frame(id1=rep(c(1,2,3,4), each=2)) process <- function(df) { a <- data.frame(d1=rnorm(1), d2=rnorm(1)) b <- data.frame(id1=df$id1, a=rnorm(nrow(df)), b=runif(nrow(df))) list(a=a, b=b) } require(plyr) output <- dlply(df, .(id1), process) output is a list of lists of dataframes, the nested list will always have two dataframes,
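A minimal sketch of one way to combine such a nested list, assuming the goal is one stacked data frame for all the a elements and one for all the b elements (the excerpt is cut off before it states the desired output). To stay self-contained it uses split() plus lapply() as a stand-in for dlply(); the names a_all and b_all are hypothetical.

```r
set.seed(1)
df <- data.frame(id1 = rep(c(1, 2, 3, 4), each = 2))

process <- function(df) {
  a <- data.frame(d1 = rnorm(1), d2 = rnorm(1))
  b <- data.frame(id1 = df$id1, a = rnorm(nrow(df)), b = runif(nrow(df)))
  list(a = a, b = b)
}

# Stand-in for dlply(df, .(id1), process): a list with one list(a, b) per id
output <- lapply(split(df, df$id1), process)

# Pull the same-named element out of every inner list, then stack the pieces
a_all <- do.call(rbind, lapply(output, `[[`, "a"))
b_all <- do.call(rbind, lapply(output, `[[`, "b"))
```

With plyr loaded, ldply(output, function(x) x$a) should give a similar result and also adds an id column taken from the list names.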

Maximum slope for a given interval each day

巧了我就是萌 submitted on 2019-12-02 07:25:34
Question: I have a set of time series data with ground surface temperatures measured every 10 minutes over multiple days (actually 2 years of data) from three different locations. What I am interested in calculating is the maximum slope (rate of temperature increase) for any 60 minute interval for each day for each site. So essentially I would like to work through each day, 10 minutes at a time, with a 60 minute window and calculate the slope for each window, and then determine the maximum slope and
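A rough sketch of the sliding-window idea in base R. It rests on several assumptions that are not in the question: the column names site, day, minutes and temp are hypothetical, readings within a group are sorted and have no gaps, a 60 minute window is taken as 7 consecutive readings, and "slope" means the least-squares slope from lm().

```r
# Maximum least-squares slope over all windows of n_points consecutive readings
max_window_slope <- function(minutes, temp, n_points = 7) {
  n <- length(temp)
  if (n < n_points) return(NA_real_)
  starts <- seq_len(n - n_points + 1)
  slopes <- vapply(starts, function(i) {
    idx <- i:(i + n_points - 1)
    coef(lm(temp[idx] ~ minutes[idx]))[[2]]  # slope in degrees per minute
  }, numeric(1))
  max(slopes)
}

# Hypothetical data: one site, one day, flat for an hour, then rising steadily
readings <- data.frame(
  site    = "s1",
  day     = "2019-01-01",
  minutes = seq(0, 120, by = 10),
  temp    = c(rep(0, 7), 6, 12, 18, 24, 30, 36)
)

# One group per site/day combination, then one maximum slope per group
groups <- split(readings, list(readings$site, readings$day), drop = TRUE)
max_slopes <- sapply(groups, function(g) max_window_slope(g$minutes, g$temp))
```

With plyr, the same per-group step could be written with ddply(readings, .(site, day), summarise, ...) calling the helper above.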

splitting text in column and add row number [duplicate]

无人久伴 submitted on 2019-12-02 07:05:44
This question already has an answer here: Split comma-separated strings in a column into separate rows. I would like to split some text in a data frame column and save it into a data frame together with the row number or an id column. I normally used plyr to do that, but this is no longer working in dplyr. If I understand it correctly, it is more a bug in plyr and my code works since it is a bug. So I am looking for the correct way to do this. This is a minimal example in plyr: library(plyr) set.seed(1) df <- data.frame(a=seq(2), b=c(paste(sample(letters,3), collapse=';'), paste
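A minimal base R sketch of splitting a delimited column into rows while keeping the id, assuming a semicolon separator as in the excerpt. The example strings are made up, and the name long is hypothetical.

```r
# Hypothetical data: column a is the id, column b holds ';'-separated text
df <- data.frame(a = 1:2, b = c("x;y;z", "u;v"), stringsAsFactors = FALSE)

# Split each string into its pieces (a list with one character vector per row)
parts <- strsplit(df$b, ";", fixed = TRUE)

# Repeat each id once per piece and flatten the pieces into one column
long <- data.frame(
  a = rep(df$a, lengths(parts)),
  b = unlist(parts),
  stringsAsFactors = FALSE
)
print(long)
```

This avoids depending on plyr or dplyr behavior at all; tidyr::separate_rows() is another option if tidyr is available.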

R: using ddply in a loop over data frame columns

旧巷老猫 submitted on 2019-12-02 06:37:53
Question: I need to calculate and add to a data frame multiple new columns based on the values in each column in a subset of columns in the data frame. These columns all hold time series data (there is a common date column). For example I need to calculate the change for the same month in the previous year for a dozen columns. I could specify them and calculate them individually but that becomes onerous with a large number of columns to transform, so I am trying to automate the process with a for loop.
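A minimal sketch of looping over columns to add year-over-year change columns, in base R. It assumes monthly data sorted by date with no missing months and more than 12 rows; the column names sales and costs, the _yoy suffix, and the helper yoy_change are all hypothetical.

```r
# Change versus the value `lag` rows earlier; the first `lag` entries are NA.
# Assumes length(x) > lag.
yoy_change <- function(x, lag = 12) {
  c(rep(NA, lag), diff(x, lag = lag))
}

# Hypothetical monthly data covering two years
df <- data.frame(
  date  = seq(as.Date("2018-01-01"), by = "month", length.out = 24),
  sales = 1:24,
  costs = seq(100, by = 2, length.out = 24)
)

# Add one new column per source column, named e.g. "sales_yoy"
cols <- c("sales", "costs")
for (col in cols) {
  df[[paste0(col, "_yoy")]] <- yoy_change(df[[col]])
}
```

Using df[[name]] inside the loop sidesteps the non-standard evaluation issues that tend to come up when passing column names to ddply programmatically.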