Question
While doing a regex find-and-replace in a text file, I want to skip (ignore) certain segments of the text. That is, certain parts of the text should be excluded from the search, and the search & replace should only be done in the remaining parts. The criteria are:
(1) anything between START and END should be excluded from the search & replace.
START may or may not be at the start of a line;
END may or may not be at the end of a line;
one pair of START & END may span multiple lines;
(2) anything within an inline comment (from // to the end of the line) should be ignored;
// may or may not be at the start of a line;
(3) the first word after . should be ignored;
. may or may not be at the start of a line;
the word may immediately follow the . or be separated from it by spaces, tabs, or newlines.
Example code:
#!/usr/bin/env perl
use strict;
use warnings;
$/ = undef;
#iterate the DATA filehandle
while (<DATA>) {
    # This one replaces ALL occurrences of pattern.
    s/old/new/gs;
    # How do I skip the unwanted segments and do the replace?
    #print all
    print;
}
##inlined data filehandle for testing.
__DATA__
xx START xx old xx END xx --> ignore
xx old xx --> REPLACE !
START xx old --> ignore
xx old xx END --> ignore
xx old xx --> REPLACE !
// xx old --> ignore
xx // xx old --> ignore
xx . old old xx --> ignore first one, replace second one
.
old --> ignore
(old) xx --> REPLACE !
xx old xx --> REPLACE !
Expected output is:
xx START xx old xx END xx --> ignore
xx new xx --> REPLACE !
START xx old --> ignore
xx old xx END --> ignore
xx new xx --> REPLACE !
// xx old --> ignore
xx // xx old --> ignore
xx . old new xx --> ignore first one, replace second one
.
old --> ignore
(new) xx --> REPLACE !
xx new xx --> REPLACE !
Can anyone help me with the regex here? I posted a similar question a couple of hours ago, but that post was full of ambiguities and did not allow a clear answer. Hopefully this one is a "good" and "clear" question.
Answer 1:
You can use the (*SKIP)(*F) backtracking control verbs to skip the unwanted parts:
(?:(?s:START.*?END)|\/\/.*|\.\s*\w+\b)(*SKIP)(*F)|old
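It works like this: (?:part 1 to skip|part 2 to skip|...)(*SKIP)(*F)|part to match. When one of the parts to skip matches, (*F) forces that attempt to fail, and (*SKIP) tells the engine not to retry anywhere inside the text it has just consumed but to resume right after it. So only the part after the last | can ever produce a match.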
(?: opens a non-capturing group for the alternation
(?s: turns on the s flag inside the group, so that . also matches newlines
\w matches a word character, [A-Za-z0-9_]
\b matches a word boundary
See the demo at regex101.
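To see why (*SKIP) is needed rather than (*F) alone, here is a small Perl sketch (the backtracking control verbs require Perl 5.10 or later); the input string is made up for illustration. With (*F) alone, the failed START...END attempt only moves the start position forward by one character, so the old inside the protected region is still reachable; with (*SKIP) in front, the next attempt starts right after END.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input: one protected region plus one replaceable "old".
my $text = "START old END old";

# Without (*SKIP): after (*F) fails the first alternative, the engine just
# retries one character further on, so the "old" inside START...END is
# matched on a later attempt and gets replaced.
(my $no_skip = $text) =~ s/(?:START.*?END)(*F)|old/new/g;

# With (*SKIP): once (*F) fails past (*SKIP), the next attempt starts at
# the position where (*SKIP) was reached, i.e. right after END.
(my $with_skip = $text) =~ s/(?:START.*?END)(*SKIP)(*F)|old/new/g;

print "$no_skip\n";    # START new END new
print "$with_skip\n";  # START old END new
```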
Answer 2:
You need to be more precise about your structure (i.e. when old should be ignored), but for your example the following regex works (demo on regex101.com):
~                                   # delimiter
(?:                                 # group everything that should be skipped
  (?s)(?:START).*?(?:END)(?-s)|     # START-END in single-line mode (. matches newlines) OR
  //.+|                             # everything after two forward slashes OR
  \.\sold|                          # the word old after a dot and one whitespace character OR
  ^\s+old                           # old after whitespace at the beginning of a line
)
(*SKIP)(*FAIL)|                     # all these matches shall fail and be skipped
\b(old)\b                           # this one is to be kept
~xmg                                # x = verbose (free-spacing), m = multiline, g = global
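A minimal Perl sketch of the same idea in free-spacing (/x) mode: it uses the three skip rules from Answer 1 but adds the \b word boundaries from this answer, so that old inside longer words such as bold or older is left alone. The sample input is hypothetical and was chosen so that each skip rule applies once.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample input; each skip rule applies once.
my $text = <<'TEXT';
xx START old
old END old
bold old // old
x . old older old
TEXT

(my $out = $text) =~ s{
    (?:
        (?s: START .*? END )   # rule 1: START ... END, possibly across lines
      | // .*                  # rule 2: from // to the end of the line
      | \. \s* \w+ \b          # rule 3: a dot and the first word after it
    )
    (*SKIP)(*F)                # consume the skipped text, then fail past it
  | \b old \b                  # only a whole-word "old" is replaced
}{new}gx;

print $out;
```

Only three occurrences change: the old after END, the standalone old before //, and the last old on the final line.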
To read more about the concept, check out this fantastic site: rexegg.com.
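For regex engines that lack (*SKIP)(*F), a common alternative is to capture the unwanted segments in a group and put them back unchanged, replacing only when that group did not participate. The Perl sketch below shows this with /e on a hypothetical input; it is a different technique from the answers above, not part of them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample input.
my $text = "xx START old END old // old\n. old (old)\n";

# Group 1 captures the segments to skip; /e evaluates the replacement as
# Perl code, so skipped segments are written back as-is and only a bare
# "old" match (group 1 undefined) becomes "new".
(my $out = $text) =~ s{((?s:START.*?END)|//.*|\.\s*\w+\b)|old}
                      {defined $1 ? $1 : 'new'}ge;

print $out;
```

Because /g resumes after each match, a skipped segment is consumed whole, which gives the same effect as (*SKIP) here.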
Answer 3:
Thanks to the valuable contributions from @bobblebubble and @Jan, and based on the Perl code in their replies, I eventually learned to use (*SKIP)(*F) to skip, jump over or ignore unwanted segments. The final code is:
#!/usr/bin/env perl
use strict;
use warnings;
$/ = undef;
#iterate the DATA filehandle
while (<DATA>) {
    # This one replaces ALL occurrences of pattern.
    # s/old/new/gs;
    # How to skip the unwanted segments and do the replace:
    # Both work on this data; the first one needs a newline after a // comment.
    #s/(?:(?:START.*?END)|\/\/.*?\n|\.\s*\w+\b)(*SKIP)(*F)|old/new/gs;
    s/(?:(?s:START.*?END)|\/\/.*|\.\s*\w+\b)(*SKIP)(*F)|old/new/g;
    #print all
    print;
}
##inlined data filehandle for testing.
__DATA__
xx START xx old xx END xx --> ignore
xx old xx --> REPLACE !
START xx old --> ignore
xx old xx END --> ignore
xx old xx --> REPLACE !
// xx old --> ignore
xx // xx old --> ignore
xx . old old xx --> ignore first one, replace second one
.
old --> ignore
(old) xx --> REPLACE !
xx old xx --> REPLACE !
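The two substitutions in the script above differ in one edge case: the commented-out variant ends the // alternative at \n, so a // comment on the last line of a file with no trailing newline is not skipped. A minimal sketch with a hypothetical one-line input:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input: a // comment on the last line, no trailing newline.
my $text = "xx // old";

# Commented-out variant: the // alternative must end at a "\n".
(my $v1 = $text) =~ s/(?:(?:START.*?END)|\/\/.*?\n|\.\s*\w+\b)(*SKIP)(*F)|old/new/gs;

# Active variant: without /s, .* stops at a newline or at the end of the string.
(my $v2 = $text) =~ s/(?:(?s:START.*?END)|\/\/.*|\.\s*\w+\b)(*SKIP)(*F)|old/new/g;

print "$v1\n";   # xx // new  (the "old" after // was replaced)
print "$v2\n";   # xx // old
```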
And, again, many thanks to bobble bubble and Jan.
Source: https://stackoverflow.com/questions/35547683/how-to-ignore-parts-of-the-text-and-do-search-and-replace-in-the-remaining-part