Aggregating in R

Submitted by 时光怂恿深爱的人放手 on 2020-01-03 11:33:33

Question


I have a data frame with two columns. I want to add an additional two columns to the data set with counts based on aggregates.

df <- structure(list(ID = c(1045937900, 1045937900), 
SMS.Type = c("DF1", "WCB14"), 
SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38"), 
Reply.Date = c("", "13/02/2015 09:52")
), row.names = 4286:4287, class = "data.frame")

I simply want to count the number of instances of SMS.Type and Reply.Date that are not null. So in the toy example above, this would give 2 for SMS.Type and 1 for Reply.Date.

I then want to add these to the data frame as total counts (I'm aware they will be duplicated across the rows of the original data set, but that's OK).

I have been playing around with the aggregate and count functions, but to no avail:

mytempdf <-aggregate(cbind(testtrain$SMS.Type,testtrain$Response.option)~testtrain$ID,
                  train, 
                  function(x) length(unique(which(!is.na(x)))))

mytempdf <- aggregate(testtrain$Reply.Date~testtrain$ID,
                  testtrain, 
                  function(x) length(which(!is.na(x))))

Can anyone help?

Thank you for your time


Answer 1:


Using data.table, you could do the following (I've added a real NA to your original data). I'm also not sure whether you are really looking for length(unique()) or just length.

library(data.table)
cols <- c("SMS.Type", "Reply.Date")
setDT(df)[, paste0(cols, ".count") := 
                  lapply(.SD, function(x) length(unique(na.omit(x)))), 
                  .SDcols = cols, 
            by = ID]
#            ID SMS.Type         SMS.Date       Reply.Date SMS.Type.count Reply.Date.count
# 1: 1045937900      DF1 12/02/2015 19:51               NA              2                1
# 2: 1045937900    WCB14 13/02/2015 08:38 13/02/2015 09:52              2                1

In the development version (v >= 1.9.5) you could also use the uniqueN function.
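As a rough sketch of the uniqueN variant, the same call could be written as below. It assumes a data.table version that ships uniqueN, rebuilds the toy data with a real NA in Reply.Date, and drops the NAs by hand with x[!is.na(x)] rather than relying on any particular uniqueN argument.

```r
library(data.table)

# Toy data from the question, with the empty Reply.Date replaced by a real NA
df <- data.table(
  ID         = c(1045937900, 1045937900),
  SMS.Type   = c("DF1", "WCB14"),
  SMS.Date   = c("12/02/2015 19:51", "13/02/2015 08:38"),
  Reply.Date = c(NA, "13/02/2015 09:52")
)

cols <- c("SMS.Type", "Reply.Date")

# uniqueN(x) counts distinct values, like length(unique(x));
# NAs are removed first so they are not counted as a value
df[, paste0(cols, ".count") := lapply(.SD, function(x) uniqueN(x[!is.na(x)])),
   .SDcols = cols, by = ID]
```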


Explanation

This is a general solution that works for any number of columns. All you need to do is put the column names into cols.

  1. lapply(.SD, ...) calls a given function over the columns specified in .SDcols = cols
  2. paste0(cols, ".count") creates the new column names by appending ".count" to the column names specified in cols
  3. := performs assignment by reference, meaning it updates the newly created columns in place with the output of lapply(.SD, ...)
  4. the by argument specifies the grouping columns
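The four steps above can be looked at one at a time. The sketch below is only an illustration on a trimmed copy of the toy data; the names dt, new_names and per_id are made up for this example and do not appear in the original answer.

```r
library(data.table)

# Trimmed copy of the toy data, with a real NA in Reply.Date
dt <- data.table(
  ID         = c(1045937900, 1045937900),
  SMS.Type   = c("DF1", "WCB14"),
  Reply.Date = c(NA, "13/02/2015 09:52")
)
cols <- c("SMS.Type", "Reply.Date")

# Step 2: names of the columns that will be created
new_names <- paste0(cols, ".count")

# Steps 1 and 4 without `:=`: one summary row per ID, dt itself is unchanged
per_id <- dt[, lapply(.SD, function(x) length(unique(na.omit(x)))),
             .SDcols = cols, by = ID]

# Step 3: with `:=` the same values are written into dt by reference,
# repeated on every row of the group
dt[, (new_names) := lapply(.SD, function(x) length(unique(na.omit(x)))),
   .SDcols = cols, by = ID]
```

Without := the call returns a separate summary table (per_id here), which is closer to what aggregate produces. With := the counts are attached to the original rows, which is what the question asks for.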



Answer 2:


After converting your empty strings to NAs:

library(dplyr)
mutate(df, SMS.Type.count   = sum(!is.na(SMS.Type)),
           Reply.Date.count = sum(!is.na(Reply.Date)))
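The conversion step is not shown in the answer. One possible way to do it, sketched here in base R so that it runs without extra packages, is to overwrite the empty strings with NA and then count per ID with ave. Note that the mutate call above counts over the whole data frame; a per-ID count in dplyr would need a group_by(ID) first. The per-ID grouping below is an assumption based on the question's use of aggregate by ID.

```r
# Toy data from the question, with the empty string still in Reply.Date
df <- data.frame(
  ID         = c(1045937900, 1045937900),
  SMS.Type   = c("DF1", "WCB14"),
  SMS.Date   = c("12/02/2015 19:51", "13/02/2015 08:38"),
  Reply.Date = c("", "13/02/2015 09:52"),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing values
df[df == ""] <- NA

# Count non-missing values within each ID and repeat the count on every row
df$SMS.Type.count   <- ave(as.integer(!is.na(df$SMS.Type)),   df$ID, FUN = sum)
df$Reply.Date.count <- ave(as.integer(!is.na(df$Reply.Date)), df$ID, FUN = sum)
```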


Source: https://stackoverflow.com/questions/30211704/aggregating-in-r
