Tidyr Separate using regex

Question

I searched and searched for this and found similar stuff but nothing quite right. Hopefully this hasn't been answered.

Let's say I have a column containing Y, N, and sometimes extra information:

    df<-data.frame(Names=c("Patient1","patient2","Patient3","Patient4","patient5"),Surgery=c("Y","N","Y-this kind of surgery","See note","Y"))

And I'm trying to separate out the Y or N into one column, and everything else from that column into another.

I've tried

    df%>%separate('Surgery',c("Surgery","Notes"), sep=" ")

This ends up with "See" in one column and "note" in the next.

    df%>%separate('Surgery',c("Surgery","Notes"), sep = '^Y|^N')

This just gets weird.

    df%>%separate('Surgery',c("Surgery","Notes"), sep= "^[YN]?")

This splits the notes correctly but removes the Y and N.

Anybody know how to separate it? The result I'm looking for would have only Y or N in the surgery column and anything else pushed to a different column.

Answer 1:

We can use extract from tidyr, which splits a column by regex capture groups instead of by a separator:

    library(tidyr)
    library(dplyr)
    df %>%
      extract(Surgery, into = c("Surgery", "Notes"), "^([YN]*)[[:punct:]]*(.*)")
    #     Names Surgery                Notes
    #1 Patient1       Y
    #2 patient2       N
    #3 Patient3       Y this kind of surgery
    #4 Patient4                     See note
    #5 patient5       Y
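
To see what each capture group in that pattern picks up, here is a minimal base R sketch (not part of the original answer) that applies the same regular expression with sub(). The names pattern, surgery_raw and parsed are only illustrative, and the input vector is copied from the Surgery column in the question.

```r
# Same pattern as in the extract() call above:
#   ^([YN]*)      group 1: a leading run of Y/N characters (may be empty)
#   [[:punct:]]*  optional punctuation such as "-", matched but not captured
#   (.*)          group 2: everything that remains
pattern <- "^([YN]*)[[:punct:]]*(.*)"

# Values taken from the Surgery column in the question
surgery_raw <- c("Y", "N", "Y-this kind of surgery", "See note", "Y")

# The pattern always matches the whole string, so replacing the match with
# a backreference leaves just the contents of that capture group.
parsed <- data.frame(
  Surgery = sub(pattern, "\\1", surgery_raw),
  Notes   = sub(pattern, "\\2", surgery_raw),
  stringsAsFactors = FALSE
)

print(parsed)
```

One thing to watch for: because the first group is `[YN]*`, a note that itself starts with a capital Y or N (for example "None") would have that letter pulled into the Surgery column. If that can happen in the real data, the pattern would need tightening.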

Source: https://stackoverflow.com/questions/49437010/tidyr-separate-using-regex

Tags

regex

tidyr
