Add column to data frame based on long list and values in another column is too slow

Submitted by 半城伤御伤魂 on 2021-02-05 11:39:25

Question


I am adding a new column to a data frame using apply() and mutate(). It works, but it is very slow. I have 24M rows, and the new column depends on whether values in other columns appear in a long list (58 items). It was bearable with a smaller list, but not anymore. Here is my example:

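library(dplyr)  # provides mutate()

large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'))
long_list <- c('e', 'f', 'g')

# For each row, check whether any value in columns B and C is in long_list
large_df <- mutate(large_df, new_C = apply(large_df[, 2:3], 1,
                   function(r) any(r %in% long_list)))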


The new column (new_C) holds TRUE or FALSE. It works, but I am looking for a faster alternative.

Thank you so much. Serhiy

Bonus question: I couldn't select just one column within apply(); I needed a range. Why?


Answer 1:


Try one of these alternatives. Using lapply:

large_df$new_c <- Reduce(`|`, lapply(large_df[, 2:3], `%in%`, long_list))

or sapply:

large_df$new_c <- rowSums(sapply(large_df[, 2:3], `%in%`, long_list)) > 0

Both of which return:

large_df
#  A B C new_c
#1 1 a e  TRUE
#2 2 b f  TRUE
#3 3 c g  TRUE
#4 4 d h FALSE
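
As a rough check that the two alternatives above match the original row-wise apply() call, here is a minimal sketch on the toy data from the question. The variable names by_row, by_reduce and by_rowsums are made up for illustration, and stringsAsFactors = FALSE is an assumption added so the columns are character vectors on older R versions as well. The likely reason for the speed-up is that apply() calls the function once per row, while the alternatives call the vectorised %in% once per column.

```r
# Toy data from the question; stringsAsFactors = FALSE is an assumption
# so that B and C are character columns on any R version
large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'),
                       stringsAsFactors = FALSE)
long_list <- c('e', 'f', 'g')

# Row-wise version from the question: one function call per row
by_row <- apply(large_df[, 2:3], 1, function(r) any(r %in% long_list))

# Column-wise versions from the answer: one vectorised %in% call per column
by_reduce  <- Reduce(`|`, lapply(large_df[, 2:3], `%in%`, long_list))
by_rowsums <- rowSums(sapply(large_df[, 2:3], `%in%`, long_list)) > 0

# unname() drops any names that apply() or rowSums() may attach
stopifnot(identical(unname(by_row), by_reduce),
          identical(unname(by_rowsums), by_reduce))
```

On 24M rows the difference can be measured by wrapping each call in system.time().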

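On the bonus question, a likely explanation, assuming large_df is a plain data.frame as in the example: selecting a single column with large_df[, 3] drops the result to a plain vector, and apply() needs an input with dimensions. The sketch below illustrates this; the names one_col, res, kept and direct are hypothetical.

```r
# Toy data from the question (stringsAsFactors = FALSE is an assumption)
large_df <- data.frame(A = 1:4,
                       B = c('a', 'b', 'c', 'd'),
                       C = c('e', 'f', 'g', 'h'),
                       stringsAsFactors = FALSE)
long_list <- c('e', 'f', 'g')

# Selecting one column with [, j] drops the result to a plain vector,
# which has no dim attribute
one_col <- large_df[, 3]
stopifnot(is.null(dim(one_col)))

# apply() needs an input with dimensions, so it stops with an error here
res <- try(apply(one_col, 1, function(r) any(r %in% long_list)),
           silent = TRUE)
stopifnot(inherits(res, "try-error"))

# drop = FALSE keeps a one-column data frame, so apply() works again
kept <- apply(large_df[, 3, drop = FALSE], 1,
              function(r) any(r %in% long_list))

# For a single column no apply() is needed: %in% is already vectorised
direct <- large_df$C %in% long_list
stopifnot(identical(unname(kept), direct))
```

For a single column, the direct %in% call is the simplest option and avoids the row-wise loop entirely.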

Source: https://stackoverflow.com/questions/62502950/add-column-to-data-frame-based-on-long-list-and-values-in-another-column-is-too
