I have a large text vector I would like to search for a particular character or phrase. Regular expressions are taking forever. How do I search it quickly?
Sample
There's no need for regular expressions here, and their power comes with a computational cost.
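You can turn off regular expression parsing in R's pattern-matching functions (grep, grepl, sub, gsub, regexpr, gregexpr) with the fixed=TRUE argument, which makes R treat the pattern as a literal string. In the benchmark below, the fixed=TRUE call is roughly 3.6 times faster by median: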
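library(microbenchmark)

# Time literal (fixed) matching against default regex matching on the same vector
m <- microbenchmark(
  grep(" ", garbage, fixed = TRUE),  # pattern treated as a literal string
  grep(" ", garbage)                 # pattern interpreted as a regular expression
)
m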
Unit: milliseconds
                             expr       min        lq   median        uq      max neval
 grep(" ", garbage, fixed = TRUE)  491.5634  497.1309  499.109  503.3009 1128.643   100
               grep(" ", garbage) 1786.8500 1801.9837 1810.294 1825.2755 3620.346   100
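One thing to keep in mind is that fixed=TRUE changes what the pattern means, not just how fast it runs. The sketch below uses a small made-up stand-in for garbage (the real vector is not shown in the question, so its size and contents here are assumptions) to check that both modes agree for a plain pattern like a space, and that they differ once the pattern contains a metacharacter such as a period.

```r
# Hypothetical stand-in for `garbage`; the original data is not shown in the question
set.seed(1)
garbage <- replicate(
  1000,
  paste(sample(c(letters, " ", "."), 20, replace = TRUE), collapse = "")
)

# For a pattern with no metacharacters, both modes select the same elements
idx_fixed <- grep(" ", garbage, fixed = TRUE)
idx_regex <- grep(" ", garbage)
same_hits <- identical(idx_fixed, idx_regex)

# With fixed = TRUE, "." is a literal period rather than "any character"
x <- c("a.b", "acb")
hits_regex <- grep("a.b", x)                # regex: both elements match
hits_fixed <- grep("a.b", x, fixed = TRUE)  # literal: only the first matches

# The same argument is accepted by the other base matching functions
has_space <- grepl(" ", "a b", fixed = TRUE)
replaced  <- gsub(".", "-", "a.b", fixed = TRUE)
```

So the switch is safe whenever the search term is a literal character or phrase, as in the question. If the pattern relies on anchors, character classes, or alternation, it has to stay a regular expression.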