Inner Join with conditions in R

Submitted by 与世无争的帅哥 on 2019-12-02 00:05:38

Question


I want to do an inner join that, instead of keeping both Value columns, gives me the difference between them.

df1 = data.frame(Term = c("T1","T2","T3"), Sec = c("s1","s2","s3"), Value =c(10,30,30))

df2 = data.frame(Term = c("T1","T2","T3"), Sec = c("s1","s3","s2"), Value =c(40,20,10))

 df1
 Term Sec Value
  T1  s1    10
  T2  s2    30
  T3  s3    30

  df2
  Term  Sec Value
  T1  s1    40
  T2  s3    20
  T3  s2    10

The result I want is

  Term  Sec Value
   T1   s1   30
   T2   s2   20
   T3   s3   10

Basically I am joining two tables and for the column value I am taking

Value=  abs(df1$Value - df2$Value)

I have struggled but could not find any way to do this conditional merge in base R. If it is not possible with base R, dplyr should be able to do it with inner_join(), but I am not very familiar with that package.

So, any suggestion using base R and/or dplyr will be appreciated.

EDIT

I have included my original data as asked. My data is here

https://jsfiddle.net/6z6smk80/1/

DF1 is the first table and DF2 is the second. DF2 starts at the 168th row.

The logic is the same: I want to join these two tables, which have 160 rows each, by ID and take the difference of the Value column from both tables. The resulting dataset should have the same number of rows (160), with an extra column diff.
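
For illustration, the requirement in the edit can be sketched on toy data. The three-row frames and their values below are assumptions standing in for the linked data, which is not reproduced here; the sketch also assumes the columns are named ID and Value and that each ID appears exactly once in each table.

```r
# Toy stand-ins for DF1 and DF2 (hypothetical values; the real tables have 160 rows each)
DF1 <- data.frame(ID = c(1, 2, 3), Value = c(10, 30, 30))
DF2 <- data.frame(ID = c(3, 1, 2), Value = c(20, 40, 10))

# Inner join on ID; the clashing Value columns become Value.x and Value.y
joined <- merge(DF1, DF2, by = "ID")

# Extra column holding the absolute difference of the two Value columns
joined$diff <- abs(joined$Value.x - joined$Value.y)

# With exactly one matching row per ID on each side, the row count is preserved
stopifnot(nrow(joined) == nrow(DF1))
print(joined)
```

If an ID were missing from one table or duplicated in either, the row count would no longer match, so the stopifnot() check doubles as a sanity test on the real data.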


Answer 1:


Here is a "base R" solution using the merge() function on the Sec column shared by your original df1 and df2 data frames:

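# Inner join on Sec; the other shared columns get .x (df1) and .y (df2) suffixes
df_merged <- merge(df1, df2, by="Sec")
# Replace the two Value columns with their absolute difference
df_merged$Value <- abs(df_merged$Value.x - df_merged$Value.y)
# Keep df1's Term and rename it back to "Term"
df_merged <- df_merged[, c("Sec", "Term.x", "Value")]
names(df_merged)[2] <- "Term"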

> df_merged
  Sec Term Value
1  s1   T1    30
2  s2   T2    20
3  s3   T3    10
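
As a variation, the same result can be sketched without merge() by looking up each row of df1 in df2 with match(). This is only an alternative under the assumption that Sec is unique in df2 (match() returns the first hit only); it keeps the rows in df1's order and the column order of the desired output.

```r
df1 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s2", "s3"), Value = c(10, 30, 30))
df2 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s3", "s2"), Value = c(40, 20, 10))

# Position in df2 of each df1$Sec (NA where df2 has no such Sec)
idx <- match(df1$Sec, df2$Sec)
# Inner join: keep only the rows of df1 that have a partner in df2
keep <- !is.na(idx)

result <- data.frame(
  Term  = df1$Term[keep],
  Sec   = df1$Sec[keep],
  Value = abs(df1$Value[keep] - df2$Value[idx[keep]])
)
print(result)
```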



Answer 2:


Using data.table's binary join, you can modify columns while joining. nomatch = 0L ensures that you are doing an inner join:

library(data.table)
setkey(setDT(df2), Sec)
setkey(setDT(df1), Sec)[df2, .(Term, Sec, Value = abs(Value - i.Value)), nomatch = 0L]
#    Term Sec Value
# 1:   T1  s1    30
# 2:   T2  s2    20
# 3:   T3  s3    10



Answer 3:


Since the question also asks about dplyr, here is a dplyr solution:

First use inner_join(), then transmute() to keep the variables you need and compute the new one.

library(dplyr)
inner_join(df1, df2, by = "Sec") %>%
  transmute(Term = Term.x, Sec, Value = abs(Value.x - Value.y))


Source: https://stackoverflow.com/questions/31179805/inner-join-with-conditions-in-r
