Perl regular expression to find an exact word

空扰寡人 submitted on 2021-02-05 05:51:26

Question


I want to find the word sprintf in my code. Which Perl regular expression should I use? Some lines contain text like sprintf_private, which I want to exclude; I need just sprintf.


Answer 1:


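Use \b to mark a word boundary on each side of the word. Since underscore counts as a word character, there is no boundary between sprintf and _private, so sprintf_private is not matched: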

/\bsprintf\b/
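
As a quick check of that pattern, here is a minimal sketch. The helper name has_exact_sprintf and the sample strings are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $text contains sprintf as a whole word,
# 0 otherwise.
sub has_exact_sprintf {
    my ($text) = @_;
    return $text =~ /\bsprintf\b/ ? 1 : 0;
}

# Sample inputs invented for this illustration.
my @samples = (
    'sprintf(buf, "%d", n);',
    'sprintf_private(buf);',
    'my_sprintf(buf);',
    'vsprintf(buf);',
);

for my $text (@samples) {
    printf "%-26s %s\n", $text, has_exact_sprintf($text) ? 'match' : 'no match';
}
```

Only the first sample should report a match, because in the other three a word character (an underscore or a letter) sits directly next to sprintf.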



Answer 2:


If you want to find all occurrences of sprintf on lines that do not contain sprintf_private, you might use a pair of regexes:

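# Reads from the __DATA__ section; substitute any open filehandle.
while( my $line = <DATA> ) {
    # Skip any line that mentions sprintf_private.
    next if $line =~ m/\bsprintf_private\b/;
    # /g in scalar context steps through every match on the line.
    while( $line =~ m/\bsprintf\b/g ) {
        # $. is the current line number; $-[0] is the zero-based match offset.
        print "[sprintf] found on line $. at column $-[0]\n";
    }
}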

This first rejects any line containing sprintf_private. Lines that do not contain that disqualifier are then scanned for all occurrences of sprintf. Wherever one is found, a message is printed identifying the line in the file and the starting column of the match (a zero-based offset into the line).

The $. and @- special variables are described in perlvar, and some good reading on regular expressions can be found in perlrequick and perlretut. The first regular expression is pretty simple; it just uses the \b zero-width assertion to ensure that the disqualifying substring has a word boundary on each side of it. The second regex uses the same technique but also applies the /g modifier to iterate over all occurrences of sprintf, in case there is more than one occurrence per line.
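
To try the two-regex approach without a file or a __DATA__ section, the loop can be wrapped in a small sub that takes lines directly. This is only a sketch: find_sprintf and the sample lines are hypothetical names and data introduced here, and the line counter is kept by hand because $. only tracks filehandle reads.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [line_number, zero_based_column] pairs for
# every whole-word sprintf on lines that do not mention sprintf_private.
sub find_sprintf {
    my @lines = @_;
    my @hits;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        # Disqualify the whole line if it mentions sprintf_private.
        next if $line =~ /\bsprintf_private\b/;
        # /g in scalar context resumes after the previous match each time.
        while ( $line =~ /\bsprintf\b/g ) {
            push @hits, [ $line_no, $-[0] ];
        }
    }
    return @hits;
}

# Sample lines invented for this illustration.
my @sample = (
    'sprintf(a, "%d", x); sprintf(b, "%d", y);',
    'sprintf_private(c); sprintf(d, "%d", z);',
    'puts("done");',
);

print "line $_->[0], column $_->[1]\n" for find_sprintf(@sample);
```

Note that the second sample line is skipped entirely, including its genuine sprintf call, since the filter works per line rather than per match.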

The zero-width assertion \b matches wherever a \w character sits next to a \W character, or next to the start or end of the string. Since the character class \w contains all alphabetic characters (where what constitutes "all" depends on whether Unicode rules are in effect, for example through the unicode_strings feature or the /u modifier), plus underscore and numeric digits (i.e., roughly the characters permissible in an identifier), you might find the \b word boundary too restrictive: an underscore or digit next to sprintf prevents a match. If you find that the "simple" solution is too naive an approach, you could go the extra mile and define for yourself what qualifies as a word boundary, treating only alphabetic characters as part of a word, by using a regex that looks like this:

(?<!\p{Alpha})sprintf(?!\p{Alpha})

If you choose to go this route, the solution would look like this:

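while( my $line = <DATA> ) {
    # Skip lines containing sprintf_private with no letter on either side of it.
    next if $line =~ m/(?<!\p{Alpha})sprintf_private(?!\p{Alpha})/;
    # Report each sprintf that has no alphabetic character on either side.
    while( $line =~ m/(?<!\p{Alpha})sprintf(?!\p{Alpha})/g ) {
        print "[sprintf] found on line $. at column $-[0]\n";
    }
}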

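This uses zero-width negative lookbehind and negative lookahead assertions, which reject a match when the character immediately to the left or right of the primary substring is an "Alpha" character, rather than using the slightly more naive \b. Because underscores and digits are not Alpha characters, the sprintf pattern on its own would also match the sprintf inside sprintf_private; the next if filter before it is what keeps those lines out.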
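
The difference between the two boundary definitions is easiest to see side by side. The following is a rough sketch, with compare_patterns and the sample identifiers made up for illustration; it reports, for each string, whether the \b pattern and the \p{Alpha} lookaround pattern match.

```perl
use strict;
use warnings;

# The two patterns discussed above, precompiled.
my $word_boundary = qr/\bsprintf\b/;
my $alpha_guard   = qr/(?<!\p{Alpha})sprintf(?!\p{Alpha})/;

# Hypothetical helper: returns (matched_by_word_boundary, matched_by_alpha_guard)
# as a pair of 0/1 flags.
sub compare_patterns {
    my ($text) = @_;
    my $wb    = $text =~ $word_boundary ? 1 : 0;
    my $alpha = $text =~ $alpha_guard   ? 1 : 0;
    return ( $wb, $alpha );
}

# Sample identifiers invented for this illustration.
for my $text (qw( sprintf sprintf_private sprintf2 _sprintf vsprintf sprintfs )) {
    my ( $wb, $alpha ) = compare_patterns($text);
    printf "%-16s \\b: %d  Alpha lookaround: %d\n", $text, $wb, $alpha;
}
```

The two patterns should agree on sprintf, vsprintf, and sprintfs, and disagree on sprintf_private, sprintf2, and _sprintf, where the neighbouring character is a word character but not a letter.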



Source: https://stackoverflow.com/questions/11683672/perl-regular-expression-to-find-a-exact-word
