Question
I have two data frames with the same columns but a different number of rows:
> dput(smalldf)
structure(list(X = structure(1:5, .Label = c("A", "B", "C", "F",
"G"), class = "factor"), Y = c(1L, 2L, 3L, 6L, 7L), Z = c(10L,
20L, 30L, 60L, 70L)), .Names = c("X", "Y", "Z"), class = "data.frame", row.names = c(NA,
-5L))
> dput(bigdf)
structure(list(X = structure(1:7, .Label = c("A", "B", "C", "D",
"E", "F", "G"), class = "factor"), Y = c(10L, 20L, 30L, 40L,
50L, 60L, 70L), Z = c(100L, 200L, 300L, 400L, 500L, 600L, 700L
)), .Names = c("X", "Y", "Z"), class = "data.frame", row.names = c(NA,
-7L))
I would like to match the rows that have the same X value and subtract their Y values. I know this is quite a simple task, but I wasn't able to do it! Should I be using match() or some sort of apply() function here?
Answer 1:
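This is kind of a common problem. One way to do it in base R is to use match(), as you suggest, with no apply() in sight. The matching below is done on row names, so the key values (the X column) need to be the row names of both data frames: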
# match() below works on row names, so move the key column X into the row names first
rownames(smalldf) <- as.character(smalldf$X)
rownames(bigdf) <- as.character(bigdf$X)
smalldf$X <- NULL
bigdf$X <- NULL
# positions of the rows of bigdf that appear in smalldf, in the order they appear in smalldf
idx <- match( rownames(smalldf) , rownames(bigdf) )
# subtract smalldf from the matching rows of bigdf, then rbind the rows of bigdf that do not appear in smalldf
result <- rbind( ( bigdf[ idx , ] - smalldf ) , bigdf[ -idx , ] )
# order the result by row name
result <- result[ order( rownames(result) ) , ]
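Example output (this table comes from a different toy data set with numeric X, Y and Z columns and row names A to E, not from smalldf and bigdf above):
X Y Z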
A 3 2 5
B 10 3 7
C 0 0 6
D 5 3 4
E 9 -2 20
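A minimal sketch of the same idea applied directly to the question's data, assuming you only want the difference in Y for the rows whose X value appears in both data frames. Here match() is called on the X column itself, so no row names are needed. The column name diff_y and the object name merged are hypothetical. As a swapped-in alternative, base R's merge() performs the same inner join on X, after which the two Y columns can be subtracted.

```r
# Recreate the question's data frames (same values as the dput() output above)
smalldf <- data.frame(X = factor(c("A", "B", "C", "F", "G")),
                      Y = c(1L, 2L, 3L, 6L, 7L),
                      Z = c(10L, 20L, 30L, 60L, 70L))
bigdf <- data.frame(X = factor(c("A", "B", "C", "D", "E", "F", "G")),
                    Y = c(10L, 20L, 30L, 40L, 50L, 60L, 70L),
                    Z = c(100L, 200L, 300L, 400L, 500L, 600L, 700L))

# Approach 1: match() on the key column.
# For each X in smalldf, idx holds the row of bigdf with the same X (NA if absent).
# The factors are compared as character so differing factor levels do not matter.
idx <- match(as.character(smalldf$X), as.character(bigdf$X))
result <- data.frame(X = smalldf$X, diff_y = bigdf$Y[idx] - smalldf$Y)
print(result)

# Approach 2 (alternative): an inner join with merge(), then subtract the two Y columns.
# The suffixes rename the overlapping columns to Y.big/Y.small and Z.big/Z.small.
merged <- merge(bigdf, smalldf, by = "X", suffixes = c(".big", ".small"))
merged$diff_y <- merged$Y.big - merged$Y.small
print(merged[, c("X", "diff_y")])
```

For the question's data both approaches give differences of 9, 18, 27, 54 and 63 for A, B, C, F and G. If a value of smalldf$X is missing from bigdf, match() returns NA for it and the corresponding difference is NA, whereas merge() with its default all = FALSE simply drops that row.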
Source: https://stackoverflow.com/questions/18077158/match-rows-and-subtract-columns