I have a dataframe that looks like:
animal_id trait_id sire_id dam_id
1 25.05 0 0
2 -46.3 1 2
3
You can use match; match(col, df$animal_id) gives corresponding index of elements from col in the animal_id, which can be used further to locate the values of trait:
df[c("trait_sire", "trait_dam")] <-
lapply(df[c("sire_id", "dam_id")], function(col) df$trait_id[match(col, df$animal_id)])
df
# animal_id trait_id sire_id dam_id trait_sire trait_dam
#1 1 25.05 0 0 NA NA
#2 2 -46.30 1 2 25.05 -46.30
#3 3 41.60 1 2 25.05 -46.30
#4 4 -42.76 3 4 41.60 -42.76
#5 5 -10.99 3 4 41.60 -42.76
#6 6 -49.81 5 4 -10.99 -42.76
You can do this using match (in base R) in one run (no need to loop over)
df[c("trait_sire", "trait_dam")] <-
cbind(with(df, trait_id[match(sire_id, animal_id)]),
with(df, trait_id[match(dam_id, animal_id)]))
# animal_id trait_id sire_id dam_id trait_sire trait_dam
# 1 1 25.05 0 0 NA NA
# 2 2 -46.30 1 2 25.05 -46.30
# 3 3 41.60 1 2 25.05 -46.30
# 4 4 -42.76 3 4 41.60 -42.76
# 5 5 -10.99 3 4 41.60 -42.76
# 6 6 -49.81 5 4 -10.99 -42.76
With data.table joins...
library(data.table)
setDT(DT)
DT[, trait_sire :=
.SD[.SD, on=.(animal_id = sire_id), x.trait_id ]
]
DT[, trait_dam :=
.SD[.SD, on=.(animal_id = dam_id), x.trait_id ]
]
animal_id trait_id sire_id dam_id trait_sire trait_dam
1: 1 25.05 0 0 NA NA
2: 2 -46.30 1 2 25.05 -46.30
3: 3 41.60 1 2 25.05 -46.30
4: 4 -42.76 3 4 41.60 -42.76
5: 5 -10.99 3 4 41.60 -42.76
6: 6 -49.81 5 4 -10.99 -42.76
The syntax is x[i, on=, j] where j is some function of the columns. To see how it works, try out DT[DT, on=.(animal_id = dam_id)] and variations. Some notes:
i.* / x.* syntax helps to distinguish where a column is taken from. j is v := expression, the expression is assigned to column, v.x[i, ...] uses rows of i to look up rows of x. on= syntax is like .(xcol = icol).j, the table itself can be written as .SD.One advantage of this approach over match is that it extends to joins on more than one column, like on = .(xcol = icol, xcol2 = icol2) or even "non equi joins" like on = .(xcol < icol). Also, it's part of a consistent syntax for operating on the table (explained in the package's introductory material), rather than specialized code for one task.