Create new variable based on other columns using R

Submitted by 前提是你 on 2020-01-04 02:27:27

Question


I have a huge file in which I want to create a column based on other columns. My file looks like this:

person = c(1,2,3,4,5,6,7,8)
father = c(0,0,1,1,4,5,5,7)
mother = c(0,0,2,3,2,2,6,6)
ped = data.frame(person,father,mother)

I want to create a gender column indicating whether each person appears as a father or as a mother. I got it working with a for loop on a small example, but on the whole file it takes hours to finish. How can I do this with an apply-style function instead? Thanks.

for(i in 1:nrow(ped)){
  ped$test[i] = ifelse(ped[i,1] %in% ped[,2], "M", ifelse(ped[i,1] %in% ped[,3], "F", NA)) 
}

Answer 1:


Try this:

ped <- transform(ped, gender = ifelse(person %in% father,
                                      'M',
                                      ifelse(person %in% mother, 'F', NA)
                                     ))

Instead of looping over the rows one value at a time, this is vectorized: %in% and ifelse each operate on the whole column in a single call.
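As a quick check, the sketch below rebuilds the example data from the question and compares the vectorized result with the original loop. The names is_father, is_mother, gender_vec and gender_loop are illustrative and do not come from the answer.

```r
# Rebuild the example data from the question.
ped <- data.frame(
  person = c(1, 2, 3, 4, 5, 6, 7, 8),
  father = c(0, 0, 1, 1, 4, 5, 5, 7),
  mother = c(0, 0, 2, 3, 2, 2, 6, 6)
)

# %in% tests every element of `person` in one call and returns a logical vector.
is_father <- ped$person %in% ped$father
is_mother <- ped$person %in% ped$mother

# Vectorized version: nested ifelse() applied to whole columns.
gender_vec <- ifelse(is_father, "M", ifelse(is_mother, "F", NA))

# Loop version from the question, kept only for comparison.
gender_loop <- character(nrow(ped))
for (i in seq_len(nrow(ped))) {
  gender_loop[i] <- ifelse(ped[i, 1] %in% ped[, 2], "M",
                           ifelse(ped[i, 1] %in% ped[, 3], "F", NA))
}

# Both approaches should give the same answer on this data.
print(identical(gender_vec, gender_loop))
```

On a large file the vectorized form avoids running a separate lookup, and a separate data frame assignment, for every row.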




Answer 2:


You could try

ped$gender <- c(NA, 'M', 'F')[as.numeric(factor(with(ped, 
                  1+2*person %in% father + 4*person %in% mother)))]

A faster option is to assign by reference with := from data.table:

library(data.table)
setDT(ped)[person %in% father, gender:='M'][person %in% mother, gender:='F']
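The arithmetic in the first expression of this answer can be unpacked step by step. The sketch below uses base R only. The names is_father, is_mother, code and lookup are illustrative, and mapping code 7 (a person found in both columns) to 'M' is an assumption that mirrors the loop in the question, where the father check comes first.

```r
# Rebuild the example data from the question.
ped <- data.frame(
  person = c(1, 2, 3, 4, 5, 6, 7, 8),
  father = c(0, 0, 1, 1, 4, 5, 5, 7),
  mother = c(0, 0, 2, 3, 2, 2, 6, 6)
)

is_father <- ped$person %in% ped$father
is_mother <- ped$person %in% ped$mother

# %in% binds tighter than * and +, so `2 * person %in% father`
# is read as `2 * (person %in% father)`.
# code is 1 (neither), 3 (father only), 5 (mother only) or 7 (both).
code <- 1 + 2 * is_father + 4 * is_mother

# Look the codes up by name instead of by factor level position, so the
# mapping does not shift when one of the codes is absent from the data.
# Assumption: code 7 maps to "M", matching the loop in the question.
lookup <- c("1" = NA, "3" = "M", "5" = "F", "7" = "M")
gender <- unname(lookup[as.character(code)])

print(gender)
```

Indexing through as.numeric(factor(...)) relies on all of the codes 1, 3 and 5 (and only those) occurring in the data, which holds for this example; the named lookup is one way to avoid that dependency.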



Answer 3:


To avoid hard-coding each "father" / "mother" / etc. column in the expression, you could do:

vars <- c("father","mother")
factor(
  do.call(pmax, Map(function(x,y) (ped$person %in% x) * y, ped[vars], seq_along(vars) )),
  labels=c(NA,"M","F")
)
#[1] M    F    F    M    M    F    M    <NA>
#Levels: <NA> M F
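To see what the Map / pmax combination does, the intermediate values can be inspected one step at a time. This is a sketch on the example data; codes_by_col, code and labels are illustrative names, and the final step returns a plain character vector rather than the factor used above.

```r
# Rebuild the example data from the question.
ped <- data.frame(
  person = c(1, 2, 3, 4, 5, 6, 7, 8),
  father = c(0, 0, 1, 1, 4, 5, 5, 7),
  mother = c(0, 0, 2, 3, 2, 2, 6, 6)
)
vars <- c("father", "mother")

# One vector per column: 0 where the person is absent from that column,
# otherwise the column's position in `vars` (1 for father, 2 for mother).
codes_by_col <- Map(function(x, y) (ped$person %in% x) * y,
                    ped[vars], seq_along(vars))

# Element-wise maximum across the columns; 0 means "in no column".
# If a person appeared in several columns, the last one in `vars` would win.
code <- do.call(pmax, codes_by_col)

# Turn 0 into NA and use the remaining codes as positions in a label vector.
labels <- c("M", "F")  # illustrative labels, in the same order as `vars`
gender <- labels[ifelse(code == 0, NA, code)]

print(gender)
```

Adding another role then only requires extending vars and labels, not the expression itself.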


Source: https://stackoverflow.com/questions/30339765/create-new-variable-based-on-other-columns-using-r
