Match columns of two data frames according to a reference column that is common on both data frames in R

Question

I am trying to create a data frame using data from two other data frames. Basically I have this:

structure(list(V1 = c(1L, 2L, 3L, 5L, 6L, 7L, 8L, 10L, 11L, 12L
), V2 = c(0.916983532, 1.032711089, 0.836822161, 1.006113655, 
1.008669791, 1.036207351, 1.097991705, 1.002907627, 1.108148337, 
1.092072261)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-10L))

And this:

structure(list(V1 = c(1L, 2L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 12L
), V2 = c(0.965881642, 1.061808325, 1.270001821, 1.018682611, 
1.18481589, 1.073037748, 1.039466199, 0.848856926, 0.839672387, 
0.802535575)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-10L))

And want to get the following output:

structure(list(V1 = 1:12, V2 = c(0.9169835, 1.0327111, 0.8368222, 
0, 1.0061137, 1.0086698, 1.0362074, 1.0979917, 0, 1.0029076, 
1.1081483, 1.0920723), V3 = c(0.965881642, 1.061808325, 0, 1.270001821, 
1.018682611, 1.18481589, 0, 1.073037748, 1.039466199, 0.848856926, 
0.839672387, 0.802535575)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-12L))

So, what I want R to do is put the V2 values from both data frames into a new data frame, in the same row whenever their V1 values match, for further analysis. The problem is that V1 won't follow the same sequence in each data frame: sometimes a V1 value appears in the first data frame but not in the second, or the values are in a different order. So I want R to search the V1 columns of both data frames and group the V2 values according to V1; if a V1 value is missing from one of the data frames, the output data frame should contain a zero or an NA in its place.

I have tried the match and merge functions but with no luck so far.

Thanks in advance for any help

Answer 1:

How about this?

# df.1 and df.2 are the first and second data frames from the question
merge(df.1, df.2, by = "V1", all = TRUE)

   V1      V2.x      V2.y
1   1 0.9169835 0.9658816
2   2 1.0327111 1.0618083
3   3 0.8368222        NA
4   4        NA 1.2700018
5   5 1.0061137 1.0186826
6   6 1.0086698 1.1848159
7   7 1.0362074        NA
8   8 1.0979917 1.0730377
9   9        NA 1.0394662
10 10 1.0029076 0.8488569
11 11 1.1081483 0.8396724
12 12 1.0920723 0.8025356

When the all argument is set to TRUE, merge keeps every row from both data.frames, even when the other data.frame has no row with a matching V1. Wherever a data.frame has no matching row, its columns are filled in with NA.
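To see what all = TRUE changes, here is a minimal sketch on two small toy data frames. The names a and b and their values are made up for illustration and are not the data from the question; all.x is the one-sided variant of the same argument in base R's merge.

```r
# Toy data frames (hypothetical, not the data from the question)
a <- data.frame(V1 = c(1L, 2L, 3L), V2 = c(10, 20, 30))
b <- data.frame(V1 = c(2L, 3L, 4L), V2 = c(200, 300, 400))

# Default: only V1 values present in both data frames are kept
inner <- merge(a, b, by = "V1")

# all.x = TRUE: keep every row of a, fill b's column with NA where needed
left <- merge(a, b, by = "V1", all.x = TRUE)

# all = TRUE: keep every V1 value found in either data frame
full <- merge(a, b, by = "V1", all = TRUE)
```

Only full contains all four V1 values, which is the behaviour the question asks for.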

To get the variable names V2 and V3 you can either rename V2 to V3 in the second data.frame (here defined as df.2) beforehand or rename V2.x and V2.y after merging.
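A minimal sketch of the rename-after-merging route, using the data from the question. The name merged is just a placeholder, and the last step (replacing NA with 0) is optional; it is only needed if you want zeros as in the desired output.

```r
# The two data frames from the question
df.1 <- data.frame(
  V1 = c(1L, 2L, 3L, 5L, 6L, 7L, 8L, 10L, 11L, 12L),
  V2 = c(0.916983532, 1.032711089, 0.836822161, 1.006113655, 1.008669791,
         1.036207351, 1.097991705, 1.002907627, 1.108148337, 1.092072261)
)
df.2 <- data.frame(
  V1 = c(1L, 2L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 12L),
  V2 = c(0.965881642, 1.061808325, 1.270001821, 1.018682611, 1.18481589,
         1.073037748, 1.039466199, 0.848856926, 0.839672387, 0.802535575)
)

# Full outer join on V1; unmatched rows get NA in the other column
merged <- merge(df.1, df.2, by = "V1", all = TRUE)

# Rename V2.x and V2.y to the names wanted in the question
names(merged) <- c("V1", "V2", "V3")

# Optional: replace the NAs with zeros
merged[is.na(merged)] <- 0
```

Keeping the NAs is often the safer choice for further analysis, since a zero is indistinguishable from a real measurement of 0.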

Source: https://stackoverflow.com/questions/18506828/match-columns-of-two-data-frames-according-to-a-reference-column-that-is-common
