Separate a column into 2 columns at the last underscore in R

Submitted by 让人想犯罪 on 2019-12-19 18:12:58

Question


I have a data frame like this:

id <- c("1","2","3")
col <- c("CHB_len_SCM_max","CHB_brf_SCM_min","CHB_PROC_S_SV_mean")

df <- data.frame(id,col)

I want to split "col" into two columns, Measurement and stat. The stat is the text after the last underscore (max, min, mean, etc.), and Measurement is everything before that underscore.

My desired output is

  id   Measurement stat
   1   CHB_len_SCM  max  
   2   CHB_brf_SCM  min   
   3 CHB_PROC_S_SV mean    

I tried it this way, but the stat column is empty. I am not sure whether my pattern is pointing at the last underscore.

library(tidyverse)
df1 <- df %>%
  # Separate the sensors and the summary statistic
  separate(col, into = c("Measurement", "stat"),sep = '\\_[^\\_]*$')

What am I missing here? Can someone point me in the right direction?


Answer 1:


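We could use extract with two capture groups. The first group, (.*), is greedy and takes everything up to the last underscore; the second group, ([^_]+), requires one or more characters that are not a _ up to the end ($) of the string, so it captures only the text after the last underscore.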

library(tidyverse)
df %>% 
   extract(col, into = c("Measurement", "stat"), "(.*)_([^_]+)$")
#   id   Measurement stat
#1  1   CHB_len_SCM  max
#2  2   CHB_brf_SCM  min
#3  3 CHB_PROC_S_SV mean
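
To see what each capture group picks up, the same pattern can be tried outside the pipeline. The following is only a base R sketch using sub() for inspecting the regex; it is not the tidyr approach of this answer, and the variable names pattern, measurement and stat are chosen here for illustration.

```r
# Same sample strings as in the question
col <- c("CHB_len_SCM_max", "CHB_brf_SCM_min", "CHB_PROC_S_SV_mean")

# Group 1: everything before the last underscore (greedy)
# Group 2: one or more non-underscore characters up to the end of the string
pattern <- "(.*)_([^_]+)$"

# Replace the whole match with one group at a time to inspect it
measurement <- sub(pattern, "\\1", col)
stat <- sub(pattern, "\\2", col)

print(data.frame(Measurement = measurement, stat = stat))
```

Because (.*) is greedy, it extends as far as it can while still leaving a final underscore and a non-empty suffix for the second group to match.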

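Or use separate with a regex lookahead. separate discards whatever the sep pattern matches, and the pattern in the question matches the stat text together with the last underscore, which is why stat came out empty. The lookahead (?=[^_]+$) only checks that the underscore is followed by non-underscore characters up to the end of the string, so just the underscore itself is matched and removed.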

df %>% 
   separate(col, into = c("Measurement", "stat"), sep="_(?=[^_]+$)")
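
The difference between the separator in the question and the lookahead version shows up when comparing what each pattern actually matches. This is a minimal base R sketch using regexpr(), regmatches() and strsplit() with perl = TRUE; it is meant for checking the patterns and is not a substitute for separate. The names consumed, kept and parts are illustrative.

```r
col <- c("CHB_len_SCM_max", "CHB_brf_SCM_min", "CHB_PROC_S_SV_mean")

# Pattern from the question: matches the last underscore AND the text after it,
# so the stat text becomes part of the separator
consumed <- regmatches(col, regexpr("_[^_]*$", col))
print(consumed)

# Lookahead pattern: matches only the last underscore;
# the text after it is checked but not included in the match
kept <- regmatches(col, regexpr("_(?=[^_]+$)", col, perl = TRUE))
print(kept)

# Splitting on the lookahead pattern keeps both pieces
parts <- strsplit(col, "_(?=[^_]+$)", perl = TRUE)
print(parts)
```

Lookarounds need a PCRE-style engine, hence perl = TRUE in the base R calls.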


Source: https://stackoverflow.com/questions/50518137/separate-a-column-into-2-columns-at-the-last-underscore-in-r
