plyr

R: Function “diff” over various groups

旧时模样 submitted on 2019-12-12 02:06:25
Question: While searching for a solution to my problem I found this thread: Function "diff" over various groups in R. I have a very similar question, so I'll work with the example there. This is what my desired output should look like:

      name class year diff
    1    a    c1 2009   NA
    2    a    c1 2010   67
    3    b    c1 2009   NA
    4    b    c1 2010   20

I have two variables that form subgroups: class and name. So I want to compare only the values that have the same name and class. I also want to have the differences from 2009 to

return rows with max/min value of column, by group, using plyr::ddply

风格不统一 submitted on 2019-12-12 01:46:00
Question: I found an answer (now deleted) to this question, and I'm curious why it doesn't work. The question is: return the row corresponding to the minimum value, by group. For example, given the dataset:

    df <- data.frame(State = c(rep('AK',4),rep('RI',4)),
                     Company = LETTERS[1:8],
                     Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L))

...the correct answer is:

       State Company Employees
    1:    AK       D        24
    2:    RI       E        19

as can be obtained, for example, by

    library(data.table); setDT(df)[ , .SD[which.min(Employees)]

how to create a column including the maximum value of another column in R? [duplicate]

别来无恙 submitted on 2019-12-11 18:53:38
Question: This question already has answers here: Calculate group mean (or other summary stats) and assign to original data (4 answers). Closed 2 years ago.

Using R, I would like to create a new column (MaxAct) showing the maximum value of a different column (ActNo) while grouping by two factors (HHID and PERID). For example, I have this data set:

    UID HHID PERID ActNo
      1 1000     1     1
      2 1000     1     2
      3 1000     1     3
      4 1000     2     1
      5 1000     2     2
      6 2000     1     1
      7 2000     1     2
      8 2000     1     3
      9 2000     1     4
     10 2000     2     1
     11 2000     2     2

Then I want

Read All Excel Files into R by Sheet with file name as column

╄→尐↘猪︶ㄣ submitted on 2019-12-11 18:45:57
Question: I have a local folder with Excel files in the same format. Each Excel file has 10 sheets. I want to be able to do the following:

1) Read all the Excel files into R.
2) Rbind all the results together, but by sheet.
3) The result would be 10 new data frames, each with the matching sheet from all the Excel files rbinded together.
4) A new column will be added with the file name.

I have looked up code and the best I could find is this, but it doesn't do it by sheet:

    files = list.files()
    library(plyr)
    library(readr)
    library(readxl)
    data2

create a new column in a data.table from group by multiple columns

走远了吗. submitted on 2019-12-11 17:37:46
Question: I'm working on a data.table that includes X and Y columns, and I want to create a new column Z which is the number of all records with the same value of (X, Y). I know the syntax when working with a data.frame:

    ddply(df,.(X,Y),nrow)

I tested different syntaxes I found on this forum but they didn't work:

    dt[, Z := lapply(.SD,nrow), by="X,Y"]
    # or
    dt[, `:=`(Z = lapply(.SD,nrow)), by="X,Y"]

To be precise, X and Y are numeric.

Answer 1: Starting from

    library(data.table)
    dt <- data.table(X = c(1, 1, 2), Y =

r plyr revalue limitation of number of operations?

亡梦爱人 submitted on 2019-12-11 16:33:27
Question: I'm currently using revalue() from plyr to revalue factor levels in a data frame from a code, like A01-21, to the real value. There are around 2400 levels, and I want to revalue so that I can keep the code as a reference in my data frame and the corresponding values in translatable texts (to show them in French on French web pages, etc.). You can test this yourself. First create a data frame:

    test <- c("H07-24", "H07-25", "H07-26", "H07-27", "H07-28", "H07-29", "H07-30", "H07-31", "H07-32

create variable conditionally by group in R (write function)

依然范特西╮ submitted on 2019-12-11 15:09:15
Question: I want to create a variable by group, conditioned on an existing variable at the individual level. Each individual has an outlier variable with value 1, 2, or 3. I want to create a new variable by group so that the new var = 2 whenever there is at least one individual in that group whose outlier variable = 2, and the new var = 3 whenever there is at least one individual in that group whose outlier variable = 3. The data looks like this:

    grpid id outlier
        1  1       1
        1  2       1
        1  3       2
        2  4       1
        2  5       3
        2  6       1
        3  7       1
        3  8       1
        3  9       1

Ideal

R sapply vs apply vs lapply + as.data.frame

被刻印的时光 ゝ submitted on 2019-12-11 14:15:20
Question: I'm working with some Date columns and trying to clean out obviously incorrect dates. I've written a function using the safe.ifelse function mentioned here. Here's my toy data set:

    df1 <- data.frame(id = 1:25
      , month1 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month' )
      , month2 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month' )
      , month3 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month' )
      , letter1 = letters[1:25]
    )

This works fine for a single

Must ddply use all possible combinations of the splitting variable(s), or only observed?

若如初见. submitted on 2019-12-11 12:55:53
Question: I have a data frame called thetas containing about 2.7 million observations.

    > str(thetas)
    'data.frame': 2700000 obs. of 8 variables:
     $ rho_cnd   : num 0 0 0 0 0 0 0 0 0 0 ...
     $ pct_cnd   : num 0 0 0 0 0 0 0 0 0 0 ...
     $ sx        : num 1 2 3 4 5 6 7 8 9 10 ...
     $ model     : Factor w/ 7 levels "dN.mN","dN.mL",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ estTheta  : num -1.58 -1.716 0.504 -2.296 0.98 ...
     $ trueTheta : num 0.0962 -3.3913 3.6006 -0.1971 2.1906 ...
     $ estError  : num -1.68 1.68 -3.1 -2.1 -1.21 ...
     $ trueAberSx:

data.table syntax for split-apply-combine ala plyr

て烟熏妆下的殇ゞ submitted on 2019-12-11 12:35:47
Question: I'm just starting to learn data.table and working my way through the vignettes, although I'm simultaneously using it in a project. How do I replace some plyr syntax with data.table?

    input <- data.table(ID = c(37, 45, 900),
                        a1 = c(1, 2, 3),
                        a2 = c(43, 320,390),
                        b1 = c(-0.94, 2.2, -1.223),
                        b2 = c(2.32, 4.54, 7.21),
                        c1 = c(1, 2, 3),
                        c2 = c(-0.94, 2.2, -1.223))

    # simple user defined function that conveys my problem
    func <- function(x, num) {
      x <- data.table(x)
      new_b <- x$b1[1]
      x2 <- within(x[1,]