I'm trying to clean a bunch of .txt files in a folder using regex. I can't seem to get R to find line breaks.
This is the code I'm using. It works for character subst
You can't do that with xfun::gsub_dir.
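Have a look at the source code: the files are read with read_utf8, which basically executes x = readLines(con, encoding = 'UTF-8', warn = FALSE). readLines splits the file at line breaks and drops them, so gsub is fed the lines one at a time and never sees a line break; when all replacements are done, the lines are written back to the file.
You need to use a custom function for that. Here is a "quick and dirty" one that will replace all LF symbols with #: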
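lbr_change_gsub_dir = function(newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
  # Every file under dir (including subfolders when recursive = TRUE)
  files = list.files(dir, full.names = TRUE, recursive = recursive)
  for (f in files) {
    # readLines returns one element per line, without the line terminators
    x = readLines(f, encoding = encoding, warn = FALSE)
    # Write the lines back, joined with the chosen separator, overwriting f
    cat(x, sep = newline, file = f)
  }
}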
folder <- "C:\\MyFolder\\Here"
lbr_change_gsub_dir(newline="#", dir=folder)
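To see why a line-by-line substitution cannot match line breaks, the behaviour can be checked on a throwaway file. This is only a small sketch: the temporary file and its three lines are made up for illustration and are not part of the original question.

```r
# Hypothetical demo file; its name and contents are assumptions for illustration
tmp <- tempfile(fileext = ".txt")
writeLines(c("first", "second", "third"), tmp)

x <- readLines(tmp, warn = FALSE)
length(x)             # one element per line
any(grepl("\n", x))   # the line terminators were stripped by readLines

# A per-line substitution on "\n" therefore changes nothing
per_line <- gsub("\n", "#", x)
identical(per_line, x)

# Once the lines are collapsed into one string, gsub can see the line breaks
joined <- paste(x, collapse = "\n")
replaced <- gsub("\n", "#", joined)
replaced

unlink(tmp)  # remove the demo file
```

The same idea, collapsing before substituting, is what makes multiline patterns possible.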
If you want to be able to match multiline patterns, paste the lines together, collapsing them with a newline, and then use any pattern you like:
lbr_gsub_dir = function(pattern, replacement, perl = TRUE, newline = '\n', encoding = 'UTF-8', dir = '.', recursive = TRUE) {
  files = list.files(dir, full.names = TRUE, recursive = recursive)
  for (f in files) {
    x <- readLines(f, encoding = encoding, warn = FALSE)
    # Collapse into a single string so the pattern can span line breaks
    x <- paste(x, collapse = newline)
    x <- gsub(pattern, replacement, x, perl = perl)
    # Overwrite the file with the modified text
    cat(x, file = f)
  }
}
folder <- "C:\\1"
lbr_gsub_dir("(?m)\\d+\\R(.+)", "\\1", dir = folder)
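This will remove each run of digits together with the line break after it whenever the next line is non-empty, so digit-only lines are dropped and the lines that follow them are kept. Note that \d+ is not anchored, so digits at the end of a longer line are removed too.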
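Before running the function on real files, the pattern can be tried on a string in memory. The sample lines below are invented for illustration, and the anchored variant at the end is an assumption about what is usually wanted (dropping only lines that start with digits), not part of the original answer.

```r
# Made-up sample content, collapsed the same way lbr_gsub_dir does it
x <- c("12", "keep me", "345", "and me", "last line")
joined <- paste(x, collapse = "\n")

# Same pattern as above: digits, a line break (\R), then the next line's text
cleaned <- gsub("(?m)\\d+\\R(.+)", "\\1", joined, perl = TRUE)
result <- strsplit(cleaned, "\n", fixed = TRUE)[[1]]
result

# Caveat: \d+ is not anchored, so trailing digits on a text line are eaten too
unanchored <- gsub("(?m)\\d+\\R(.+)", "\\1", "abc123\nfoo", perl = TRUE)
unanchored

# Assumed alternative: anchor with ^ so only digits at the start of a line match
anchored <- gsub("(?m)^\\d+\\R", "", "abc123\n45\nfoo", perl = TRUE)
anchored
```

With perl = TRUE, (?m) makes ^ match at the start of every line, which is what lets the anchored variant leave "abc123" untouched.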