Extracting Date from text using R

Submitted by 别来无恙 on 2019-12-02 12:06:55

Question


My data frame looks like this:

df <- setNames(data.frame(c("2 June 2004, 5 words, ()(", "profit, Insight, 2 May 2004, 188 words,  reports, by ()("), stringsAsFactors = FALSE), "split")

I want to split the column into date and words. So far I have found "Extract date text from string":

lapply(df2, function(x) gsub(".*(\\d{2} \\w{3} \\d{4}).*", "\\1", x))

But it's not working with my example. Thanks for the help, as always.


Answer 1:


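Since there is only a single column, we can apply sub/gsub directly after extracting it. The original pattern requires a two-digit day and a three-letter month, so it does not match a single-digit day or 'June'. The day can have one or more digits, and the month name here has 3 ('May') or 4 ('June') characters, so the pattern has to allow for both. The \\b word boundary keeps the greedy .* from consuming the leading digits of a multi-digit day.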

sub(".*\\b(\\d{1,} \\w{3,4} \\d{4}).*", "\\1", df$split)
#[1] "2 June 2004" "2 May 2004" 
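
If the goal is to end up with separate columns, the same idea can be sketched with regexpr/regmatches, which return only the matched text instead of rewriting the whole string. The helper name extract_first, the new columns date and words, and the assumption that the count always appears as "<number> words" are illustrative and not part of the original answer. The month pattern is also widened to three or more letters so that names such as "August" would match.

```r
# Sample data from the question
df <- setNames(
  data.frame(
    c("2 June 2004, 5 words, ()(",
      "profit, Insight, 2 May 2004, 188 words,  reports, by ()("),
    stringsAsFactors = FALSE
  ),
  "split"
)

# Hypothetical helper: first match of `pattern` in each string, NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1
  out[hit] <- regmatches(x, m)  # regmatches() returns matched elements only
  out
}

# Day (1-2 digits), month name (3 or more letters), 4-digit year
df$date <- extract_first(df$split, "\\b\\d{1,2} [A-Za-z]{3,} \\d{4}\\b")

# Assumption: the word count is the number directly before " words"
df$words <- as.integer(extract_first(df$split, "\\d+(?= words)"))

# Optional: convert to Date; month-name parsing depends on the locale
# and yields NA where the names are not recognised
df$date_parsed <- as.Date(df$date, format = "%d %B %Y")
```

Rows without a date or a word count get NA rather than raising an error, which is the main reason for using a helper here instead of assigning the regmatches result directly.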


Source: https://stackoverflow.com/questions/50557460/extracting-date-from-text-using-r
