Range join data.frames - specific date column with date ranges/intervals in R


孤城傲影 2020-12-15 13:06

Although the details of this are, of course, app specific, in the SO spirit I'm trying to keep this as general as possible! The basic problem is how to merge data.frames by

2 answers

无人及你 (OP)

2020-12-15 13:33
Here's an approach using sqldf(...) from the sqldf package. This produces your result, with the following exceptions:
1. The Member.n columns contain values in alphabetical order, rather than the order in which they appear in the History data frame. So Member.1 would contain c and Member.2 would contain f, rather than the other way around.
2. Your result set has all the role-related columns as factors, whereas this result set has them as character. If that matters, it can easily be changed.
Note that Speeches and History are the input data frames, and your Output data frame is used only to get the column order.
```
library(sqldf)    # for sqldf(...)
library(reshape2) # for dcast(...)

colnames(History)[4:5] <- c("Start","End")   # sqldf doesn't like "." in colnames
Speeches$id <- rownames(Speeches)            # need unique id column
result <- sqldf("select a.id, a.Name, a.Date, b.Role, b.Value 
                from Speeches a, History b 
                where a.Name=b.Name and a.Date between b.Start and b.End")
Roles <- aggregate(Role~Name+Date+id,result,function(x)
  ifelse(x=="Member",paste(x,1:length(x),sep="."),as.character(x)))$Role
result$Roles <- unlist(Roles)
result <- dcast(result,Name+Date+id~Roles,value.var="Value")
result <- result[order(as.numeric(result$id)),]   # re-order the rows; ids are character row names (assumed to be the default 1..n), so sort numerically
result <- result[,colnames(Output)]   # re-order the columns
```
Explanation
- First, we need an id column in Speeches to differentiate between the replicated columns in the result. So we use the row names for that.
- Second, we use sqldf(...) to merge the Speeches and History tables based on your criteria. Because you want dates to match based on a range, this may be the best approach.
- Third, we have to convert multiple instances of "Member" into "Member.1", "Member.2", etc. We do this using aggregate(...) and paste(...).
- Fourth, we have to convert the result of the sql, which is in "long" format (all Values in one column, distinguished by a second column Roles), into "wide" format, values for each Role in different columns. We do this using dcast(...).
- Finally, we reorder the rows and columns to be consistent with your result.
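The steps above can be sketched end to end on a small made-up example. The contents of Speeches and History below are hypothetical (they are not the data from the question), and ave(...) is swapped in for the aggregate(...) step as one possible way to number the "Member" rows within each speech.

```
library(sqldf)    # for sqldf(...)
library(reshape2) # for dcast(...)

# Hypothetical toy data, for illustration only
Speeches <- data.frame(
  Name = c("Ann", "Ann", "Bob"),
  Date = as.Date(c("2001-03-01", "2005-06-15", "2003-01-10")),
  stringsAsFactors = FALSE
)
History <- data.frame(
  Name  = c("Ann", "Ann", "Ann", "Bob"),
  Role  = c("Member", "Member", "Chair", "Member"),
  Value = c("c", "f", "x", "g"),
  Start = as.Date(c("2000-01-01", "2000-06-01", "2004-01-01", "2002-01-01")),
  End   = as.Date(c("2002-12-31", "2001-12-31", "2006-12-31", "2004-12-31")),
  stringsAsFactors = FALSE
)
Speeches$id <- seq_len(nrow(Speeches))   # numeric id, one per speech

# Range join: keep History rows whose [Start, End] interval contains the speech date
long <- sqldf("select a.id, a.Name, a.Date, b.Role, b.Value
               from Speeches a, History b
               where a.Name = b.Name and a.Date between b.Start and b.End
               order by a.id, b.Value")

# Number repeated "Member" roles within each speech: Member.1, Member.2, ...
long$Roles <- as.character(long$Role)
is_member  <- long$Roles == "Member"
member_no  <- ave(seq_along(long$id)[is_member], long$id[is_member], FUN = seq_along)
long$Roles[is_member] <- paste0("Member.", member_no)

# Long to wide: one row per speech, one column per role
wide <- dcast(long, id + Name + Date ~ Roles, value.var = "Value")
wide <- wide[order(wide$id), ]
```

Using a numeric id here (rather than row names) keeps the final order(...) numeric instead of alphabetical, which only matters once there are ten or more speeches.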