Question
In terminal, I am attempting to clean up some .txt files so they can be imported into another program. Only literal search/replaces seem to be working. I cannot get regular expression searches to work.
If I attempt a search and replace with a literal string, it works:
find . -type f -name '*.txt' -exec sed -i '' 's/Title Page//' {} +;
(remove the words "Title Page" from every text file)
But if I am attempting even the most basic of regular expressions, it does not work:
find . -type f -name '*.txt' -exec sed -i '' s/\n\nDOWN/\\n<DOWN\>/ {} +;
(In every text file, reformat any word "DOWN" that follows a double return: remove the extra newline and put the word in brackets: "\n<DOWN>")
This does not work. The only thing at all "regular expression" about this is looking for the newline.
I must be doing something incorrectly.
Any help is much appreciated.
Update: part 2
John1024's answer helped me out a lot for one aspect.
find . -type f -name '*.txt' -exec sed -i '' '/^$/{N; s/\n[0-9]+/\n/;}' {} +;
Now I am having trouble getting other types of regular expressions to respond properly. With the example above, I want to remove all numbers that appear at the beginning of a line.
Argh! What am I missing?
Answer 1:
By default, sed handles only one line at a time. When a line is read into sed's pattern space the newline character is removed.
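A small sketch of why the original attempt cannot match, assuming a made-up three-line sample (the variable names `sample` and `unchanged` are only for illustration). Because each line reaches the pattern space without its newline, a pattern containing \n finds nothing and the text passes through untouched:

```shell
# Hypothetical sample: a line, an empty line, then DOWN.
# (Command substitution strips the final trailing newline.)
sample=$(printf 'thing\n\nDOWN\n')

# sed processes one line at a time and each line arrives without its newline,
# so a pattern that contains \n has nothing to match against.
unchanged=$(printf '%s\n' "$sample" | sed 's/\n\nDOWN/<DOWN>/')

# Prints the sample exactly as it went in.
printf '%s\n' "$unchanged"
```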
I see that you want to look for an empty line followed by DOWN and, when found, remove the empty line and change the text to <DOWN>. That can be done. Consider this as the test file:
$ cat file
some
thing
DOWN

DOWN
other
Try:
$ sed '/^$/{N; s/\nDOWN/<DOWN>/;}' file
some
thing
DOWN
<DOWN>
other
How it works
/^$/
This looks for empty lines. The commands in braces which follow are executed only on empty lines.
{N; s/\nDOWN/<DOWN>/;}
The N command reads the next line into the pattern space, separated from the current line by a newline character. If the pattern space matches an empty line followed by DOWN, the substitution command, s/\nDOWN/<DOWN>/, removes the newline and replaces the DOWN with <DOWN>.
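To run the same command over every .txt file, as in the question, something along these lines may work. This is a sketch, not part of the original answer: the helper names `fix_down` and `fix_all` and the `.tmp` suffix are assumptions. It writes through a temporary file instead of using -i, because the argument syntax of -i differs between BSD/macOS sed and GNU sed.

```shell
# fix_down is a hypothetical helper: it rewrites one file via a temporary copy,
# which sidesteps the differing -i syntax of BSD/macOS sed and GNU sed.
fix_down() {
    sed '/^$/{N; s/\nDOWN/<DOWN>/;}' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# fix_all is also hypothetical: it applies fix_down to every .txt file
# under the directory given as its first argument.
# Note: this simple loop does not handle file names that contain newlines.
fix_all() {
    find "$1" -type f -name '*.txt' | while IFS= read -r f; do
        fix_down "$f"
    done
}
```

Calling `fix_all .` would then process the current directory tree. Since mv replaces each file with a newly created one, it is worth trying this on a copy of the data first.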
Special Case: DOS/Windows Files
If a file has DOS/Windows line endings, \r\n, sed will only remove the \n when the line is read in. The \r will remain. When dealing with these files, the presence of that character, if unanticipated, may lead to surprising results.
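One way to see the effect, and one possible workaround, is sketched below with a made-up sample. Using tr to delete the carriage returns is an assumption about what is acceptable for the files in question, and the variable names are illustrative only.

```shell
# Hypothetical sample with DOS/Windows (\r\n) line endings.
# The "empty" line still holds a lone \r, so /^$/ does not match it and
# nothing is substituted. The trailing tr only makes the result easy to compare.
dos_result=$(printf 'thing\r\n\r\nDOWN\r\n' | sed '/^$/{N; s/\nDOWN/<DOWN>/;}' | tr -d '\r')

# Deleting the carriage returns first lets the same sed command match.
unix_result=$(printf 'thing\r\n\r\nDOWN\r\n' | tr -d '\r' | sed '/^$/{N; s/\nDOWN/<DOWN>/;}')

printf '%s\n' "$unix_result"
```

Here `dos_result` keeps the empty line and the plain DOWN, while `unix_result` contains <DOWN> directly after the first line.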
Source: https://stackoverflow.com/questions/30655001/sed-on-mac-not-recognizing-regular-expressions