data.table | 易学教程

Multiple pairwise differences based on column name patterns

Source: https://stackoverflow.com/questions/62043917/multiple-pairwise-differences-based-on-column-name-patterns

What are helpful optimizations in R for big data sets?

Source: https://stackoverflow.com/questions/63774476/what-are-helpful-optimizations-in-r-for-big-data-sets

Multiple indexing with multiple idxmin() and idxmax() in one aggregate in pandas

Question: In R data.table it is possible and easy to aggregate on multiple columns using argmin or argmax functions in one aggregate. For example, for DT:

    > DT = data.table(id=c(1,1,1,2,2,2,2,3,3,3), col1=c(1,3,5,2,5,3,6,3,67,7), col2=c(4,6,8,3,65,3,5,4,4,7), col3=c(34,64,53,5,6,2,4,6,4,67))
    > DT
        id col1 col2 col3
     1:  1    1    4   34
     2:  1    3    6   64
     3:  1    5    8   53
     4:  2    2    3    5
     5:  2    5   65    6
     6:  2    3    3    2
     7:  2    6    5    4
     8:  3    3    4    6
     9:  3   67    4    4
    10:  3    7    7   67
    > DT_agg = DT[, .(agg1 = col1[which.max(col2)] , agg2 = col2[which.min(col3
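The aggregate expression in this excerpt is cut off. As a minimal sketch of the pattern the question describes, and not a reconstruction of the asker's exact code, the following assumes the grouping column is `id`; the result name `res` is hypothetical.

```r
library(data.table)

DT <- data.table(
  id   = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  col1 = c(1, 3, 5, 2, 5, 3, 6, 3, 67, 7),
  col2 = c(4, 6, 8, 3, 65, 3, 5, 4, 4, 7),
  col3 = c(34, 64, 53, 5, 6, 2, 4, 6, 4, 67)
)

# Assumption: group by id. Within each group, take col1 at the row where
# col2 is largest, and col2 at the row where col3 is smallest.
res <- DT[, .(agg1 = col1[which.max(col2)],
              agg2 = col2[which.min(col3)]),
          by = id]
print(res)
```

Note that `which.max()` and `which.min()` return the first matching position when there are ties.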

Cartesian join in data.table

Question: I am trying to do a full Cartesian join using data.table, but with little luck. Code:

    a = data.table(dt=c(20131017,20131018))
    setkey(a,dt)
    b = data.table(ticker=c("ABC","DEF","XYZ"),ind=c("MISC1","MISC2","MISC3"))
    setkey(b,ticker)

Expected output:

    merge(data.frame(a),data.frame(b),all.x=TRUE,all.y=TRUE)

I have tried merge(a,b,allow.cartesian=TRUE), but it gives me the following error:

    Error in merge.data.table(a, b, allow.cartesian = TRUE) : A non-empty vector of column names for by is required.
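The error arises because the two tables share no column to join on. One possible workaround, sketched here under the assumption that a temporary constant column is acceptable, is to add a dummy key to copies of both tables and merge on it. The column name `k` and the names `a_k`, `b_k`, and `res` are hypothetical and not part of the question.

```r
library(data.table)

a <- data.table(dt = c(20131017, 20131018))
b <- data.table(ticker = c("ABC", "DEF", "XYZ"),
                ind    = c("MISC1", "MISC2", "MISC3"))

# Assumption: a constant helper column `k` gives merge() a shared column.
# copy() keeps the original tables unchanged.
a_k <- copy(a)[, k := 1L]
b_k <- copy(b)[, k := 1L]

# Every row of a_k matches every row of b_k on k, so the result is the
# full cross product; allow.cartesian = TRUE permits the row blow-up.
res <- merge(a_k, b_k, by = "k", allow.cartesian = TRUE)
res[, k := NULL]  # drop the helper column
print(res)
```

With 2 rows in `a` and 3 rows in `b`, the result has 6 rows, one per (dt, ticker) combination.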

Why is R's data.table so much faster than pandas?

Question: I have a dataset of 12 million rows, with 3 columns as unique identifiers and another 2 columns with values. I'm trying to do a rather simple task:

- Group by the three identifiers. This yields about 2.6 million unique combinations.
- Task 1: calculate the median for column Val1.
- Task 2: calculate the mean for column Val1 given some condition on Val2.

Here are my results, using pandas and data.table (both latest versions at the moment, on the same machine):

+-----------------+-----------------+--
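The two tasks can be illustrated on a toy table. This is only a sketch: the identifier column names `ID1`, `ID2`, `ID3`, the condition `Val2 > 0`, and the data are all assumptions made for illustration, since the excerpt names only Val1 and Val2 and does not state the actual condition.

```r
library(data.table)

# Hypothetical toy data; the real dataset has about 12 million rows.
DT <- data.table(
  ID1  = c("a", "a", "a", "b"),
  ID2  = c(1, 1, 1, 2),
  ID3  = c("x", "x", "x", "y"),
  Val1 = c(1, 2, 6, 10),
  Val2 = c(0, 1, 1, 1)
)

# Task 1: median of Val1 per group.
# Task 2: mean of Val1 restricted to rows meeting an assumed condition.
res <- DT[, .(med       = median(Val1),
              cond_mean = mean(Val1[Val2 > 0])),
          by = .(ID1, ID2, ID3)]
print(res)
```

Both aggregates are computed in a single grouped pass over the table.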

R data.table remove rows where one column is duplicated if another column is NA

Question: Here is an example data.table:

    dt <- data.table(col1 = c('A', 'A', 'B', 'C', 'C', 'D'), col2 = c(NA, 'dog', 'cat', 'jeep', 'porsch', NA))

       col1   col2
    1:    A     NA
    2:    A    dog
    3:    B    cat
    4:    C   jeep
    5:    C porsch
    6:    D     NA

I want to remove rows where col1 is duplicated if col2 is NA and col1 has a non-NA value in another row. In other words: group by col1, then if a group has more than one row and one of them is NA, remove that row. This would be the result for dt:

       col1   col2
    2:    A    dog
    3:    B    cat
    4:    C   jeep
    5:    C porsch
    6:    D     NA

I tried this:
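The asker's attempt is cut off in this excerpt and is not reproduced here. Separately, the stated rule can be sketched as follows; this is one possible reading of the requirement, and the result name `res` is hypothetical.

```r
library(data.table)

dt <- data.table(
  col1 = c("A", "A", "B", "C", "C", "D"),
  col2 = c(NA, "dog", "cat", "jeep", "porsch", NA)
)

# Within each col1 group, keep every non-NA row. Keep NA rows only when
# the whole group is NA, i.e. there is no non-NA value to prefer.
res <- dt[, .SD[!is.na(col2) | all(is.na(col2))], by = col1]
print(res)
```

On the example data this drops only the first row (A, NA) and keeps the lone (D, NA) row.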