Lookup value from another column that matches with variable

Submitted by 我是研究僧i on 2019-12-01 22:18:53

You can use match: match(col, df$animal_id) returns, for each element of col, the position of its first match in animal_id (or NA if there is none). Those positions can then be used to pick out the corresponding values of trait_id:

df[c("trait_sire", "trait_dam")] <- 
    lapply(df[c("sire_id", "dam_id")], function(col) df$trait_id[match(col, df$animal_id)])

df
#  animal_id trait_id sire_id dam_id trait_sire trait_dam
#1         1    25.05       0      0         NA        NA
#2         2   -46.30       1      2      25.05    -46.30
#3         3    41.60       1      2      25.05    -46.30
#4         4   -42.76       3      4      41.60    -42.76
#5         5   -10.99       3      4      41.60    -42.76
#6         6   -49.81       5      4     -10.99    -42.76
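
To see why this works, it can help to look at what match returns on its own. The sketch below rebuilds the example data from the printed output above (the construction of df is an assumption, since the original data definition is not shown) and walks through the sire lookup in two steps.

```r
# Rebuild the example data from the printed output (assumed construction)
df <- data.frame(
  animal_id = 1:6,
  trait_id  = c(25.05, -46.30, 41.60, -42.76, -10.99, -49.81),
  sire_id   = c(0, 1, 1, 3, 3, 5),
  dam_id    = c(0, 2, 2, 4, 4, 4)
)

# Position of each sire_id within animal_id; ids with no match (here 0) give NA
idx <- match(df$sire_id, df$animal_id)
idx
# [1] NA  1  1  3  3  5

# Indexing with NA yields NA, so animals with an unknown sire get a missing trait
trait_sire <- df$trait_id[idx]
trait_sire
# [1]     NA  25.05  25.05  41.60  41.60 -10.99
```

The lapply call above simply repeats these two steps for sire_id and dam_id.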

With data.table joins...

library(data.table)
setDT(DT)  # convert DT (the example data) to a data.table by reference

DT[, trait_sire := 
  .SD[.SD, on=.(animal_id = sire_id), x.trait_id ]
]

DT[, trait_dam := 
  .SD[.SD, on=.(animal_id = dam_id), x.trait_id ]
]

   animal_id trait_id sire_id dam_id trait_sire trait_dam
1:         1    25.05       0      0         NA        NA
2:         2   -46.30       1      2      25.05    -46.30
3:         3    41.60       1      2      25.05    -46.30
4:         4   -42.76       3      4      41.60    -42.76
5:         5   -10.99       3      4      41.60    -42.76
6:         6   -49.81       5      4     -10.99    -42.76

The syntax is x[i, on=, j], where j is an expression evaluated on the columns of the join result. To see how it works, try out DT[DT, on=.(animal_id = dam_id)] and variations. Some notes:

  1. The i.* / x.* syntax helps to distinguish where a column is taken from.
  2. When j is v := expression, the result of the expression is assigned to column v.
  3. The join x[i, ...] uses rows of i to look up rows of x.
  4. The on= syntax is like .(xcol = icol).
  5. Inside j, the table itself can be written as .SD.
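
As a rough illustration of these notes, the sketch below runs the self-join outside of := so the looked-up rows are visible. The data is rebuilt from the printed output (an assumed construction), and the result column names animal, own_trait and dam_trait are made up for this example.

```r
library(data.table)

# Example data rebuilt from the printed output (assumed construction)
DT <- data.table(
  animal_id = 1:6,
  trait_id  = c(25.05, -46.30, 41.60, -42.76, -10.99, -49.81),
  sire_id   = c(0L, 1L, 1L, 3L, 3L, 5L),
  dam_id    = c(0L, 2L, 2L, 4L, 4L, 4L)
)

# x = DT (the rows being looked up), i = DT (the rows doing the looking up).
# For each row of i, find the row of x whose animal_id equals i's dam_id,
# then return the dam's trait (x.trait_id) next to the animal's own (i.trait_id).
res <- DT[DT, on = .(animal_id = dam_id),
          .(animal = i.animal_id, own_trait = i.trait_id, dam_trait = x.trait_id)]
res
```

The result has one row per row of i; the first animal has dam_id 0, which matches nothing in x, so its dam_trait is NA.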

One advantage of this approach over match is that it extends to joins on more than one column, like on = .(xcol = icol, xcol2 = icol2), or even to "non-equi joins" like on = .(xcol < icol). It is also part of a consistent syntax for operating on the table (explained in the package's introductory material), rather than specialized code for one task.
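
A minimal sketch of the multi-column case follows. The tables traits and offspring and the year column are hypothetical and not part of the original question; they only serve to show a lookup that must match on two keys at once.

```r
library(data.table)

# Hypothetical data: one trait value per animal per year
traits <- data.table(
  animal_id = c(1L, 1L, 2L, 2L),
  year      = c(2018L, 2019L, 2018L, 2019L),
  trait_id  = c(25.05, 26.10, -46.30, -45.00)
)

# Hypothetical offspring records, each tied to a sire and a year
offspring <- data.table(
  animal_id = c(3L, 4L),
  sire_id   = c(1L, 2L),
  year      = c(2019L, 2018L)
)

# Join on two columns: the sire's id and the year must both match
offspring[, trait_sire :=
  traits[offspring, on = .(animal_id = sire_id, year = year), x.trait_id]
]
offspring
```

Doing the same with match would require building a combined key first (for example by pasting the two columns together), which is the kind of one-off code the join syntax avoids.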

You can also do this with match (in base R) in a single assignment, with no need to loop over the columns:

df[c("trait_sire", "trait_dam")] <- 
cbind(with(df, trait_id[match(sire_id, animal_id)]), 
      with(df, trait_id[match(dam_id, animal_id)]))

#   animal_id trait_id sire_id dam_id trait_sire trait_dam
# 1         1    25.05       0      0         NA        NA
# 2         2   -46.30       1      2      25.05    -46.30
# 3         3    41.60       1      2      25.05    -46.30
# 4         4   -42.76       3      4      41.60    -42.76
# 5         5   -10.99       3      4      41.60    -42.76
# 6         6   -49.81       5      4     -10.99    -42.76