data.table or dplyr - data manipulation

会有一股神秘感。 提交于 2019-11-30 09:43:50
lukeA
library(dplyr)
df %.% 
  arrange(Date) %.% 
  filter(!duplicated(Col1)) %.% 
  group_by(Date) %.% 
  summarise(Count=n()) %.% # n() <=> length(Date)
  mutate(Count = cumsum(Count))
# Source: local data frame [3 x 2]
# 
#         Date Count
# 1 2014-01-01     3
# 2 2014-01-02     5
# 3 2014-01-03     6

library(data.table)
dt <- data.table(df, key="Date")
dt <- unique(dt, by="Col1")
(dt <- dt[, list(Count=.N), by=Date][, Count:=cumsum(Count)])
#          Date Count
# 1: 2014-01-01     3
# 2: 2014-01-02     5
# 3: 2014-01-03     6

Or

dt <- data.table(df, key="Date")
dt <- unique(dt, by="Col1")
dt[, .N, by=Date][, Count:=cumsum(N)]

.N is named N (no dot) automatically for convenience in chained operations like this, so you can use both .N and N together in the next operation if need be.

With ddply and duplicated, you just have to do

df <- ddply(data, .(Date, Col1), nrow)
df2 <- ddply(df[!duplicated(df$Col1),], .(Date), nrow)
ddply(df2, .(Date, V1), nrow)

ie you first count for all couples Date, Col1, then you remove the duplicated columns. You finally count the colums.

Your data must be sorted before.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!