Joining data.table with by argument

徘徊边缘 提交于 2021-02-19 04:02:12

问题


I have two data.table dx and dy

dx <- data.table(a = c(1,1,1,1,2,2), b = 3:8)
dy <- data.table(a = c(1,1,2), c = 7:9)

I want to join dy to each row of dx, and below is the desired output

data.table(plyr::ddply(dx, c("a", "b"), function(d) merge(d, dy, by = "a")))
    a b c
 1: 1 3 7
 2: 1 3 8
 3: 1 4 7
 4: 1 4 8
 5: 1 5 7
 6: 1 5 8
 7: 1 6 7
 8: 1 6 8
 9: 2 7 9
10: 2 8 9

However, I failed to make the output only using operation inside [] of data.table or merge? I have tired

merge(dx, dy, by = "a", all = TRUE)

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 10 rows; more than 9 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

dy[dx,on="a"]

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 10 rows; more than 9 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

dx[, merge(dy, by = "a"), by = c("a", "b")]

Error in is.data.table(y) : argument "y" is missing, with no default

dx[, merge(.SD, dy, by = "a"), by = c("a", "b")]

Error in merge.data.table(.SD, dy, by = "a") : Elements listed in by must be valid column names in x and y

How can I do it actually?

Thanks!


回答1:


If, I understood your requirement correctly, There is a direct merge option that you can use,

dx <- data.table(a = c(1,1,2,2), b = 3:6)
dy <- data.table(a = c(1,1,2), c = 7:9)
merge(x = dx, y = dy, by = "a", all = TRUE)

It gives your desired output that you mentioned. How to join (merge) data frames (inner, outer, left, right)?

I hope it clears your doubt if not, I am sory.




回答2:


Because of multiple 1s, the error is coming. In default merge, i.e the implementation for data.frames, allow.cartesian=TRUE by default. But in the data.table implemetation, this is not the case. Hence, if you run the below code it will give you merge output.

merge(dx, dy, by = "a", all = TRUE, allow.cartesian=TRUE)

The above code will give you the output that you want.



来源:https://stackoverflow.com/questions/36349613/joining-data-table-with-by-argument

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!