How to replace square brackets with curly brackets using R's regex?

Submitted by 允我心安 on 2019-12-01 09:15:29

You can use

gsub("\\[@([^]]*)]", "\\\\cite{\\1}", x) 

See IDEONE demo

Regex breakdown:

  • \\[@ - the literal character sequence [@
  • ([^]]*) - capture group 1, which matches zero or more characters other than ] (a ] placed first in a character class, including right after the negating ^, does not need escaping)
  • ] - a literal ]
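A minimal sketch of the TRE pattern in action, assuming a sample vector `x` that holds one citation-style entry and one ordinary indexing expression (the same test data the second answer uses); `result` is just an illustrative name.

```r
# Sample input: one "[@key]" citation and one string that must stay untouched
x <- c("[@Fotheringham1981]", "df[1,2]")

# Default (TRE) engine: "[^]]*" needs no escaping because "]" comes first
# in the bracket expression (right after the negating "^")
result <- gsub("\\[@([^]]*)]", "\\\\cite{\\1}", x)

print(result)
# [1] "\\cite{Fotheringham1981}" "df[1,2]"

# "\\\\cite" in the replacement yields ONE literal backslash in the string;
# cat() shows the strings without R's print-time escaping
cat(result, sep = "\n")
```

Note that `print()` doubles the backslash when displaying the string, while `cat()` shows the single backslash that is actually stored.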

You do not need perl=TRUE with this pattern because the ] inside the character class is not escaped. If you escaped it (as in [^\\]]), you would need that option.
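For comparison, a sketch that runs the backslash-escaped class through PCRE (`perl = TRUE`) and checks that it agrees with the unescaped TRE pattern on the same assumed sample vector; `tre` and `pcre` are illustrative names.

```r
x <- c("[@Fotheringham1981]", "df[1,2]")

# TRE (default engine): "]" placed first in the class, no escaping needed
tre <- gsub("\\[@([^]]*)]", "\\\\cite{\\1}", x)

# PCRE: the "]" inside the class is escaped as "\\]", so perl = TRUE is used
pcre <- gsub("\\[@([^\\]]*)\\]", "\\\\cite{\\1}", x, perl = TRUE)

# Both engines should produce the same result for this input
stopifnot(identical(tre, pcre))
print(pcre)
```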

Also, I believe we should escape only what must be escaped: if there is a way to avoid backslash hell, we should take it. Thus, you can even use

gsub("[[]@([^]]*)]", "\\\\cite{\\1}", x) 

Here is another demo
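A quick check, under the same assumed sample data, that the bracket expression `[[]` (a class whose only member is a literal `[`) can stand in for the escaped `\\[`; `escaped` and `bracketed` are illustrative names.

```r
x <- c("[@Fotheringham1981]", "df[1,2]")

# Escaped opening bracket
escaped <- gsub("\\[@([^]]*)]", "\\\\cite{\\1}", x)

# Bracket expression "[[]": matches a literal "[" without any backslash
bracketed <- gsub("[[]@([^]]*)]", "\\\\cite{\\1}", x)

# The two spellings should be interchangeable here
stopifnot(identical(escaped, bracketed))
print(bracketed)
```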

Why the TRE-based regex works better than the PCRE one:

In R 2.10.0 and later, the default regex engine is a modified version of Ville Laurikari's TRE engine [source]. The library's author states that matching time grows linearly with the length of the input text, while memory requirements stay almost constant (tens of kilobytes). TRE is also said to use a matching algorithm with predictable, modest memory consumption and a worst-case time that is quadratic in the length of the regular expression. That is why it seems best to rely on TRE rather than PCRE when dealing with larger documents.
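The paragraph above makes a scaling argument, so it is worth measuring on your own data rather than taking it on trust. A rough sketch, assuming a synthetic "document" built by pasting the sample strings together many times (the size `n` and the names `big`, `tre_out`, `pcre_out` are illustrative); timings vary by machine and R version, so no numbers are claimed here.

```r
# Synthetic "larger document": one long string of repeated sample text
n <- 2e4
big <- paste(rep(c("[@Fotheringham1981]", "df[1,2]"), n), collapse = " ")

# Time the TRE (default engine) pattern
tre_time <- system.time(
  tre_out <- gsub("\\[@([^]]*)]", "\\\\cite{\\1}", big)
)

# Time the PCRE pattern with the escaped "]"
pcre_time <- system.time(
  pcre_out <- gsub("\\[@([^\\]]*)\\]", "\\\\cite{\\1}", big, perl = TRUE)
)

# The outputs must agree; only the elapsed times are of interest
stopifnot(identical(tre_out, pcre_out))
print(rbind(TRE = tre_time[["elapsed"]], PCRE = pcre_time[["elapsed"]]))
```

Timing a single long string, rather than a vector of short ones, is closer to the "larger documents" case the paragraph describes.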

You need to use a capturing group.

x <- c("[@Fotheringham1981]", "df[1,2]")
gsub("\\[@([^\\]]*)\\]", "\\\\cite{\\1}", x, perl = TRUE)
# [1] "\\cite{Fotheringham1981}" "df[1,2]"

or

gsub("\\[@(.*?)\\]", "\\\\cite{\\1}", x) # [1] "\\cite{Fotheringham1981}" "df[1,2]" 

This matches [@, then captures everything up to the closing ] in group 1 (everything within (...)), with .*? matching the shortest possible string. If you move the @ inside the capture group, it is kept in the output:

gsub("\\[(@.*?)\\]", "\\\\cite{\\1}", x) ## [1] "\\cite{@Fotheringham1981}" "df[1,2]"  
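The `?` in `.*?` matters as soon as one string holds more than one citation. A small sketch, assuming a hypothetical input `y` with two keys, contrasting the lazy pattern with a greedy `.*`:

```r
# Hypothetical input with two citations on one line
y <- "see [@A] and [@B]"

# Lazy: ".*?" stops at the first "]", so each citation is replaced separately
lazy <- gsub("\\[@(.*?)\\]", "\\\\cite{\\1}", y)

# Greedy: ".*" runs to the last "]", swallowing the text between the citations
greedy <- gsub("\\[@(.*)\\]", "\\\\cite{\\1}", y)

cat(lazy, greedy, sep = "\n")
```

The negated class `[^]]*` from the first answer sidesteps the issue entirely, because it can never cross a `]`.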

[Railroad diagram of the regular expression \[(@.*?)\], Debuggex demo]
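To see what the capture group in `\[(@.*?)\]` actually holds, base R's `regexec()` combined with `regmatches()` returns the full match followed by each group. A sketch on the same assumed sample vector (`parts` is an illustrative name):

```r
x <- c("[@Fotheringham1981]", "df[1,2]")

# regexec() records the whole match plus the capture group for each element
parts <- regmatches(x, regexec("\\[(@.*?)\\]", x))

# First element: the full match, then group 1 (which keeps the "@")
print(parts[[1]])

# Second element: there is no "[@", so nothing matches (zero-length result)
print(parts[[2]])
```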
