Text mining in R: using a package or regex to replace smart (curly) quotes

夕颜 2020-12-04 00:37

I've got a bunch of texts containing different smart quotes, both single and double. With the packages I'm aware of, all I could manage is to remove them rather than replace them with straight quotes.

3 Answers
  •  感情败类
    2020-12-04 01:08

    For a base R option, we can use gsub here, replacing one kind of curly quote at a time.

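    text <- "You don‘t get “your” money’s worth"
    # Turn each balanced pair of curly double quotes into straight double quotes
    new_text <- gsub("“(.*?)”", "\"\\1\"", text)
    # Replace both left and right curly single quotes with a straight apostrophe
    new_text <- gsub("[‘’]", "'", new_text)
    new_text
    [1] "You don't get \"your\" money's worth"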
    

    I have assumed here that your curly double quotes are always balanced, i.e. each opening “ has a matching closing ” around a word. If not, you might have to do more work.
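
    One way to find the strings that need that extra work is to count the opening and closing curly double quotes in each string and flag those where the counts differ. This is a minimal sketch, assuming your texts sit in a character vector; the second example string is made up for illustration. Equal counts do not guarantee correct nesting, so treat this as a first filter only.

    ```r
    # \u201C is the left curly double quote, \u201D the right one;
    # \u2018 / \u2019 are the left / right curly single quotes
    text <- c("You don\u2018t get \u201Cyour\u201D money\u2019s worth",
              "An \u201Cunclosed quote here")  # hypothetical unbalanced example
    # Count literal occurrences of each curly double quote per string
    n_open  <- lengths(regmatches(text, gregexpr("\u201C", text, fixed = TRUE)))
    n_close <- lengths(regmatches(text, gregexpr("\u201D", text, fixed = TRUE)))
    # TRUE where opening and closing counts match
    balanced <- n_open == n_close
    balanced
    #> [1]  TRUE FALSE
    ```

    Strings where balanced is FALSE can then be inspected by hand, e.g. with text[!balanced], before running the pair-based gsub.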

    A blanket replacement of all opening/closing curly double quotes may not do what you intend if you want them left as they are when they are not quoting a word.
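
    If you do want every curly quote converted regardless of context, base R's chartr() can do a blanket character-for-character mapping in one call. A minimal sketch using the same example string as the gsub answer; per the caveat about blanket replacement, it also converts curly quotes that do not wrap a word.

    ```r
    # Same example string; \u2018 / \u2019 are ‘ ’ and \u201C / \u201D are “ ”
    text <- "You don\u2018t get \u201Cyour\u201D money\u2019s worth"
    # chartr() maps each character of `old` to the character at the same
    # position in `new`: both single quotes -> ', both double quotes -> "
    clean <- chartr("\u2018\u2019\u201C\u201D", "''\"\"", text)
    clean
    #> [1] "You don't get \"your\" money's worth"
    ```

    Because chartr() works character by character, it needs no regex and is vectorised over a character vector of texts.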

