How to ignore parts of the text and do search-and-replace in the remaining part? [duplicate]

Submitted by 烂漫一生 on 2019-12-23 17:58:08

Question


While doing a regex find-and-replace in a text file, I want to skip over certain segments of the text. That is, those segments should be excluded from the search, and the search & replace should only happen in the remaining parts. The criteria are:

(1) anything between START and END should be excluded from the search & replace. START may or may not be at the start of a line; END may or may not be at the end of a line; one pair of START & END may span multiple lines;

(2) anything within an inline comment // should be ignored, up to the end of that line; // may or may not be at the start of a line;

(3) the first word after . should be ignored; . may or may not be at the start of a line; the word may follow . immediately, or be separated from it by spaces, newlines or tabs.

Example code:

#!/usr/bin/env perl
use strict;
use warnings;

$/ = undef;

#iterate the DATA filehandle
while (<DATA>) {
    # This one replaces ALL occurrences of pattern.
    s/old/new/gs;

    # How do I skip the unwanted segments and do the replace?
    #print all
    print;
}

##inlined data filehandle for testing. 
__DATA__
xx START xx old xx END xx   --> ignore
xx old xx                   --> REPLACE !
START xx old                --> ignore
      xx old xx END         --> ignore
      xx old xx             --> REPLACE !
// xx old                   --> ignore
xx // xx old                --> ignore
xx . old old xx             --> ignore first one, replace second one
.
  old                       --> ignore
  (old) xx                  --> REPLACE !
xx old xx                   --> REPLACE !

Expected output is:

xx START xx old xx END xx   --> ignore
xx new xx                   --> REPLACE !
START xx old                --> ignore
      xx old xx END         --> ignore
      xx new xx             --> REPLACE !
// xx old                   --> ignore
xx // xx old                --> ignore
xx . old new xx             --> ignore first one, replace second one
.
  old                       --> ignore
  (new) xx                  --> REPLACE !
xx new xx                   --> REPLACE !

Can anyone help me with the regex here? I posted a similar question a couple of hours ago, but that post was full of ambiguities, which precluded a clear answer. Hopefully this post is a "good" & "clear" question.


Answer 1:


You can use (*SKIP)(*F) verbs to skip something.

(?:(?s:START.*?END)|\/\/.*|\.\s*\w+\b)(*SKIP)(*F)|old

It works like this: (?:part 1 to skip|part 2 to skip|...)(*SKIP)(*F) | part to match

  • (?: opens a non-capturing group for the alternation
  • (?s: opens a group with the s flag set, so that the dot also matches newlines
  • \w matches a word character [A-Za-z0-9_]
  • \b matches a word boundary

See demo at regex101
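
As a minimal sketch of how that pattern behaves inside a substitution: the sub name replace_outside_skips and the sample string are made up for illustration and are not part of the answer. The pattern is the one shown above, only spread out with the x flag.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: replace "old" with "new" everywhere except inside
# START...END blocks, after //, and in the first word after a dot.
sub replace_outside_skips {
    my ($text) = @_;
    $text =~ s{
        (?:
            (?s:START.*?END)   # a START...END block, possibly multi-line
          | //.*               # an inline comment up to the end of the line
          | \.\s*\w+\b         # a dot plus the first word after it
        )
        (*SKIP)(*F)            # fail here and resume after the skipped text
      |
        old                    # anything still reachable gets replaced
    }{new}gx;
    return $text;
}

my $sample = "a old START old\nold END old // old\n. old old\n";
print replace_outside_skips($sample);
# prints:
# a new START old
# old END new // old
# . old new
```

When one of the skip alternatives matches, (*SKIP) records the current position and (*F) forces a failure, so the engine restarts the search after the skipped text instead of retrying inside it.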




Answer 2:


You need to be more precise about your structure (i.e. when old should be ignored), but for your example the following regex will work (demo on regex101.com):

~                                       # delimiter
    (?:                                 # group every alternative that must be skipped
    (?s:START.*?END)|                   # START ... END, with the dot matching newlines, OR
    //.+|                               # everything after two forward slashes OR
    \.\sold|                            # the word old after a dot and one whitespace character OR
    ^\s+old                             # old after whitespace at the beginning of a line
    )
    (*SKIP)(*FAIL)|                     # all these matches shall fail
    \b(old)\b                           # this one is to be kept
~xgm                                    # verbose, global and multiline modifiers

To read more about the concept, check this fantastic site - rexegg.com.
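
A sketch of how a commented pattern of this kind might be applied in an actual substitution. The names $skip_or_old and replace_old are made up for this example. It keeps the skip alternatives inside one non-capturing group so that (*SKIP)(*FAIL) applies to each of them, and uses /m so that ^ matches at the start of every line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Precompiled verbose pattern; the variable name is chosen for this sketch.
my $skip_or_old = qr~
    (?:
        (?s:START.*?END)    # START...END, dot also matches newlines here
      | //.+                # everything after two forward slashes
      | \.\sold             # "old" right after a dot and one whitespace
      | ^\s+old             # "old" after leading whitespace on a line
    )
    (*SKIP)(*FAIL)          # throw away any of the above
  |
    \b(old)\b               # the occurrences to replace
~xm;

# Hypothetical wrapper that applies the pattern globally.
sub replace_old {
    my ($text) = @_;
    $text =~ s/$skip_or_old/new/g;
    return $text;
}

print replace_old("x old\n. old old\n  old\n  (old) x\n");
# prints:
# x new
# . old new
#   old
#   (new) x
```

Because this version hard-codes old into the skip alternatives, it only covers the sample data; the more general \.\s*\w+ form handles any first word after a dot.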




Answer 3:


Thanks to the valuable contributions from @bobblebubble and @Jan, and based on the Perl code in their replies, I eventually learned to use (*SKIP)(*F) to skip, jump over or ignore unwanted segments. The final code is:

#!/usr/bin/env perl
use strict;
use warnings;

$/ = undef;

#iterate the DATA filehandle
while (<DATA>) {
    # This one replaces ALL occurrences of pattern.
#    s/old/new/gs;

    # How to skip the unwanted segments and do the replace:
    # Both are good.
    #s/(?:(?:START.*?END)|\/\/.*?\n|\.\s*\w+\b)(*SKIP)(*F)|old/new/gs;
    s/(?:(?s:START.*?END)|\/\/.*|\.\s*\w+\b)(*SKIP)(*F)|old/new/g;
    #print all
    print;
}

##inlined data filehandle for testing. 
__DATA__
xx START xx old xx END xx   --> ignore
xx old xx                   --> REPLACE !
START xx old                --> ignore
      xx old xx END         --> ignore
      xx old xx             --> REPLACE !
// xx old                   --> ignore
xx // xx old                --> ignore
xx . old old xx             --> ignore first one, replace second one
.
  old                       --> ignore
  (old) xx                  --> REPLACE !
xx old xx                   --> REPLACE !

And, again, many thanks to bobble bubble and Jan.
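
For comparison, a different technique from the one used in the answers above: where backtracking control verbs are not available, a commonly used alternative is to capture the segments that must be skipped and write them back unchanged with the e modifier. This is only a sketch; the sub name replace_with_capture and the sample string are invented for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: same skip rules as above, but without (*SKIP)(*F).
# Skipped segments land in group 1 and are put back as they were.
sub replace_with_capture {
    my ($text) = @_;
    $text =~ s{
        ( (?s:START.*?END) | //.* | \.\s*\w+\b )   # group 1: keep verbatim
      | old                                        # otherwise a real hit
    }{ defined $1 ? $1 : 'new' }gex;
    return $text;
}

print replace_with_capture("xx . old old xx\nSTART old END old // old\n");
# prints:
# xx . old new xx
# START old END new // old
```

The replacement side is evaluated as code for every match: if group 1 took part in the match, the skipped text is returned untouched, otherwise the match was a plain old and becomes new.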



Source: https://stackoverflow.com/questions/35547683/how-to-ignore-parts-of-the-text-and-do-search-and-replace-in-the-remaining-part
