Remove everything after the last underscore of a column in R [duplicate]

时光怂恿深爱的人放手 submitted on 2021-02-07 06:55:47

Question


I have a data frame, and for one particular column I want to strip out the last underscore and everything after it.

So:

test <- data.frame(label=c('test_test_test', 'test_tom_cat', 'tset_eat_food', 'tisk - tisk'), 
                   stuff=c('blah', 'blag', 'gah', 'nah') , 
                   numbers=c(1,2,3, 4))

should become

result <- data.frame(label=c('test_test', 'test_tom', 'tset_eat', 'tisk - tisk'), 
                   stuff=c('blah', 'blag', 'gah', 'nah') , 
                   numbers=c(1,2,3, 4))

So far I have:

require(dplyr)
test %>%
  mutate(label = gsub('_.*','',label))

but that drops everything from the first underscore and gives me

wrong_result <- data.frame(label=c('test', 'test', 'tset', 'tisk - tisk'), 
                   stuff=c('blah', 'blag', 'gah', 'nah') , 
                   numbers=c(1,2,3, 4))

Answer 1:


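We can use sub, which needs no external packages. The pattern "_[^_]+$" matches an underscore followed by one or more non-underscore characters up to the end of the string, so only the last underscore and what follows it are removed; strings without an underscore are left unchanged.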

test$label <- sub("_[^_]+$", "", test$label)
test$label
#[1] "test_test"   "test_tom"    "tset_eat"    "tisk - tisk"
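
Since the question uses a dplyr pipeline, the same pattern can be dropped into mutate. This is a minimal sketch, assuming dplyr is installed and reusing the test data frame from the question; the name result is only illustrative.

```r
library(dplyr)

# Data frame from the question
test <- data.frame(label = c('test_test_test', 'test_tom_cat', 'tset_eat_food', 'tisk - tisk'),
                   stuff = c('blah', 'blag', 'gah', 'nah'),
                   numbers = c(1, 2, 3, 4))

# sub() replaces only the first match; the trailing $ anchors that match
# to the last underscore and whatever follows it
result <- test %>%
  mutate(label = sub('_[^_]+$', '', label))

print(result$label)
#[1] "test_test"   "test_tom"    "tset_eat"    "tisk - tisk"
```

The other columns (stuff and numbers) pass through unchanged.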



Answer 2:


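This will also work. The greedy (.*) captures everything up to the last underscore, and the whole match is replaced with that captured group: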

gsub('(.*)_\\w+', '\\1', test$label)
#[1] "test_test"   "test_tom"    "tset_eat"    "tisk - tisk"
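
One possible variant of the greedy-capture idea, offered as a sketch rather than as part of the original answer: using sub with '.*' after the underscore and a trailing $ also covers labels that end in an underscore or have non-word characters after the last underscore. The helper name strip_last is hypothetical.

```r
# Hypothetical helper: keep everything before the last underscore.
# The greedy (.*) takes as much as it can, so the literal "_" that
# follows it is forced to be the last underscore in the string.
strip_last <- function(x) {
  sub('(.*)_.*$', '\\1', x)
}

strip_last(c('test_test_test', 'a_b_', 'a_b c', 'tisk - tisk'))
#[1] "test_test"   "a_b"         "a"           "tisk - tisk"
```

Strings without any underscore do not match the pattern and are returned as they are.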


Source: https://stackoverflow.com/questions/40857694/remove-everything-after-the-last-underscore-of-a-column-in-r
