plyr | 易学教程

R: Function “diff” over various groups

Question: While searching for a solution to my problem I found this thread: Function "diff" over various groups in R. I have a very similar question, so I'll work with the example there. This is what my desired output should look like:

      name class year diff
    1    a    c1 2009   NA
    2    a    c1 2010   67
    3    b    c1 2009   NA
    4    b    c1 2010   20

I have two variables that form subgroups: class and name. I want to compare only the values that share the same name and class. I also want to have the differences from 2009 to ...
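A minimal base-R sketch of a per-group difference, for illustration only. The excerpt does not show the input data, so the `value` column and its numbers are hypothetical, chosen so that the result matches the `diff` column quoted in the question. It swaps in base R's `ave()` rather than plyr.

```r
# Hypothetical input: only the expected diff column (NA, 67, NA, 20)
# comes from the question; the value column is invented for illustration.
df <- data.frame(
  name  = c("a", "a", "b", "b"),
  class = c("c1", "c1", "c1", "c1"),
  year  = c(2009, 2010, 2009, 2010),
  value = c(33, 100, 80, 100),
  stringsAsFactors = FALSE
)

# Sort first so that diff() runs in year order inside each group.
df <- df[order(df$name, df$class, df$year), ]

# ave() applies FUN within each name/class group and returns a vector
# aligned with the original rows; the first row of a group gets NA.
df$diff <- ave(df$value, df$name, df$class,
               FUN = function(x) c(NA, diff(x)))

print(df[, c("name", "class", "year", "diff")])
```

Because `ave()` returns a vector of the same length as its input, the result can be assigned straight back as a column without a merge step.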

return rows with max/min value of column, by group, using plyr::ddply

Question: I found an answer (now deleted) to this question, and I'm curious why it doesn't work. The question is: return the row corresponding to the minimum value, by group. For example, given the dataset:

    df <- data.frame(State = c(rep('AK',4),rep('RI',4)),
                     Company = LETTERS[1:8],
                     Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L))

...the correct answer is:

       State Company Employees
    1:    AK       D        24
    2:    RI       E        19

as can be obtained, for example, by

    library(data.table); setDT(df)[ , .SD[which.min(Employees)]
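For comparison, here is a base-R sketch of the same split-apply-combine idea, using the dataset from the question. It is not the deleted answer the question refers to, and the name `min_rows` is hypothetical. The per-group function used here should also be usable as the function argument of `plyr::ddply(df, .(State), ...)`.

```r
df <- data.frame(State = c(rep("AK", 4), rep("RI", 4)),
                 Company = LETTERS[1:8],
                 Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L),
                 stringsAsFactors = FALSE)

# Split by State, keep the row with the smallest Employees value in each
# piece, then bind the one-row pieces back into a single data frame.
min_rows <- do.call(rbind, lapply(split(df, df$State), function(d) {
  d[which.min(d$Employees), , drop = FALSE]
}))
rownames(min_rows) <- NULL

print(min_rows)
```

`which.min()` returns only the first minimum, so ties within a state yield a single row.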

how to create a column including the maximum value of another column in R? [duplicate]

Question: This question already has answers here: Calculate group mean (or other summary stats) and assign to original data (4 answers). Closed 2 years ago.

Using R, I would like to create a new column (MaxAct) showing the maximum of a different column (ActNo) while grouping by two factors (HHID and PERID). For example, I have this data set:

    UID HHID PERID ActNo
      1 1000     1     1
      2 1000     1     2
      3 1000     1     3
      4 1000     2     1
      5 1000     2     2
      6 2000     1     1
      7 2000     1     2
      8 2000     1     3
      9 2000     1     4
     10 2000     2     1
     11 2000     2     2

Then I want ...
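One way to attach a group maximum to every row, sketched in base R with the data from the question. The excerpt is cut off before the expected output, so the values of `MaxAct` below follow from the stated goal (maximum of `ActNo` per `HHID`/`PERID` pair) rather than from the original post.

```r
df <- data.frame(
  UID   = 1:11,
  HHID  = c(rep(1000, 5), rep(2000, 6)),
  PERID = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2),
  ActNo = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2)
)

# ave() computes max(ActNo) inside each HHID/PERID group and repeats it
# for every row of that group, so the result lines up with df.
df$MaxAct <- ave(df$ActNo, df$HHID, df$PERID, FUN = max)

print(df)
```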

Read All Excel Files into R by Sheet with file name as column

Question: I have a local folder with Excel files in the same format. Each Excel file has 10 sheets. I want to be able to do the following:

1) Read all the Excel files into R.
2) Rbind all the results together, but by sheet.
3) The result would be 10 new data frames, with all the Excel files rbinded together.
4) A new column will be added with the file name.

I have looked up code and the best I could find is this, but it doesn't do it by sheet:

    files = list.files()
    library(plyr)
    library(readr)
    library(readxl)
    data2

create a new column in a data.table from group by multiple columns

Question: I'm working on a data.table that includes X and Y columns, and I want to create a new column Z which is the number of all records with the same value of (X, Y). I know the syntax when working with a data.frame:

    ddply(df,.(X,Y),nrow)

I tested different syntaxes I found on this forum, but they didn't work:

    dt[, Z := lapply(.SD,nrow), by="X,Y"]
    # or
    dt[, `:=`(Z = lapply(.SD,nrow)), by="X,Y"]

Note that X and Y are numeric.

Answer 1: Starting from

    library(data.table)
    dt <- data.table(X = c(1, 1, 2), Y =

r plyr revalue limitation of number of operations?

Question: I'm currently using revalue() from plyr to revalue factor levels in a data frame, from a code like A01-21 to the real value. There are around 2400 levels, and I want to use revalue so that I can keep the code as a reference in my data frame and the corresponding values in translatable texts (to show them in French on French web pages, etc.). You can test this yourself. First create a data frame:

    test <- c("H07-24", "H07-25", "H07-26", "H07-27", "H07-28", "H07-29", "H07-30", "H07-31", "H07-32

create variable conditionally by group in R (write function)

Question: I want to create a variable by group, conditioned on an existing variable at the individual level. Each individual has an outlier variable with the value 1, 2, or 3. I want to create a new variable by group so that the new var = 2 whenever there is at least one individual in that group whose outlier variable = 2, and the new var = 3 whenever there is at least one individual in that group whose outlier variable = 3. The data looks like this:

    grpid id outlier
        1  1       1
        1  2       1
        1  3       2
        2  4       1
        2  5       3
        2  6       1
        3  7       1
        3  8       1
        3  9       1

Ideal ...
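The rule described above can be sketched as a small function applied per group. The excerpt is cut off before the expected output, so two points are assumptions: that 3 takes precedence over 2 when a group contains both, and that a group with neither gets 1. The names `group_flag` and `grp_outlier` are hypothetical.

```r
df <- data.frame(
  grpid   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  id      = 1:9,
  outlier = c(1, 1, 2, 1, 3, 1, 1, 1, 1)
)

# Assumed precedence: any 3 in the group wins, then any 2, otherwise 1.
group_flag <- function(x) {
  if (any(x == 3)) 3 else if (any(x == 2)) 2 else 1
}

# One flag per group, named by the group id ...
flags <- tapply(df$outlier, df$grpid, group_flag)

# ... then looked up by name to spread it back over every row.
df$grp_outlier <- as.vector(flags[as.character(df$grpid)])

print(df)
```

Keeping the rule in a named function makes the precedence explicit and easy to change if the intended behavior differs.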

R sapply vs apply vs lapply + as.data.frame

Question: I'm working with some Date columns and trying to cleanse them of obviously incorrect dates. I've written a function using the safe.ifelse function mentioned here. Here's my toy data set:

    df1 <- data.frame(id = 1:25
                      , month1 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month' )
                      , month2 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month' )
                      , month3 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month' )
                      , letter1 = letters[1:25]
                      )

This works fine for a single ...
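The excerpt stops before the actual problem is stated, so the following is only a guess at the usual stumbling block with this combination of functions: `sapply()` simplifies its result to a matrix and loses the Date class, while `lapply()` keeps each column as a Date vector. The cleaning rule, the cutoff date, and the name `blank_after_cutoff` are all hypothetical, and the sketch does not use the question's safe.ifelse helper.

```r
df1 <- data.frame(
  id     = 1:25,
  month1 = seq(as.Date("2012-01-01"), as.Date("2014-01-01"), by = "month"),
  month2 = seq(as.Date("2012-01-01"), as.Date("2014-01-01"), by = "month")
)

# Hypothetical rule: treat any date after the cutoff as invalid.
cutoff <- as.Date("2013-06-01")
blank_after_cutoff <- function(d) {
  d[d > cutoff] <- NA
  d
}

date_cols <- c("month1", "month2")

# sapply() simplifies to a numeric matrix, which drops the Date class.
as_matrix <- sapply(df1[date_cols], blank_after_cutoff)

# lapply() returns a list of Date vectors, so the class survives when the
# list is assigned back onto the same columns.
df1[date_cols] <- lapply(df1[date_cols], blank_after_cutoff)

str(df1)
```

Assigning the `lapply()` result back with `df1[date_cols] <- ...` avoids a round trip through `as.data.frame()`.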

Must ddply use all possible combinations of the splitting variable(s), or only observed?

Question: I have a data frame called thetas containing about 2.7 million observations.

    > str(thetas)
    'data.frame': 2700000 obs. of 8 variables:
     $ rho_cnd   : num 0 0 0 0 0 0 0 0 0 0 ...
     $ pct_cnd   : num 0 0 0 0 0 0 0 0 0 0 ...
     $ sx        : num 1 2 3 4 5 6 7 8 9 10 ...
     $ model     : Factor w/ 7 levels "dN.mN","dN.mL",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ estTheta  : num -1.58 -1.716 0.504 -2.296 0.98 ...
     $ trueTheta : num 0.0962 -3.3913 3.6006 -0.1971 2.1906 ...
     $ estError  : num -1.68 1.68 -3.1 -2.1 -1.21 ...
     $ trueAberSx:

data.table syntax for split-apply-combine ala plyr

Question: I'm just starting to learn data.table and working my way through the vignettes, although I'm simultaneously using it in a project. How do I replace some plyr syntax with data.table?

    input <- data.table(ID = c(37, 45, 900),
                        a1 = c(1, 2, 3),
                        a2 = c(43, 320,390),
                        b1 = c(-0.94, 2.2, -1.223),
                        b2 = c(2.32, 4.54, 7.21),
                        c1 = c(1, 2, 3),
                        c2 = c(-0.94, 2.2, -1.223))

    # simple user defined function that conveys my problem
    func <- function(x, num) {
      x <- data.table(x)
      new_b <- x$b1[1]
      x2 <- within(x[1,]