$ and Perl's global regular expression modifier

南旧 2021-01-02 21:42

I finally figured out how to append text to the end of each line in a file:

perl -pe 's/$/addthis/' myfile.txt

However, as I'm trying to

3 answers
  • 2021-01-02 22:12

    As Jim Davis pointed out, $ matches either at the very end of the string or just before a \n character at the end of the string (with the /m option, just before any \n). See the Regular Expressions section of the perlre Perldoc page. Using the g modifier allowed it to continue matching, so it matched at both positions.

    Multi-line Perl regular expression matching (i.e., matching against text with a newline character in it, even if it occurs only once at the end of the line) causes all sorts of complications that most Perl programmers have trouble handling.

    • If you're reading in a file one line at a time, always use chomp before doing ANYTHING with that line. This would have solved your issue when using the g modifier.

    • Further issues can happen if you're reading files on Linux/Mac which came from Windows. In that case, you will have both the \r and \n characters at the end of each line. As I found out recently in attempting to debug a program, the \r character isn't removed by chomp. I now make sure I always open my text files for reading like this:

    open my $file_handle, "<:crlf", $file...
    

    This will automatically replace each \r\n pair with just \n if this is in fact a Windows file on a Linux/Mac system. If this is a regular Linux/Mac text file, it will do nothing. The other obvious solution is not to use Windows (rim shot!).

    Of course, in your case, using chomp first would have done the following:

    $ cat myfile.txt
    line one
    line two
    line three
    line four
    $ perl -pe 'chomp;s/$/addthis::/g' myfile.txt
    line oneaddthis::line twoaddthis::line threeaddthis::line fouraddthis::
    

    The chomp removed the \n, so now you don't see it when the lines print out. Hmm...

    $ perl -pe 'chomp;s/$/addthis/g;$_ .= "\n"' myfile.txt
    line oneaddthis
    line twoaddthis
    line threeaddthis
    line fouraddthis
    

    That works! And, your one liner is only mildly incomprehensible.


    The other thing is to take a more modern approach that Damian Conway recommends in Chapter 12 of his book Perl Best Practices:

    Use \A and \z as string boundary anchors.

    Even if you don’t adopt the previous practice of always using /m, using ^ and $ with their default meanings is a bad idea. Sure, you know what ^ and $ actually mean in a Perl regex1. But will those who read or maintain your code know? Or is it more likely that they will misinterpret those metacharacters in the ways described earlier? Perl provides markers that always—and unambiguously—mean “start of string” and “end of string”: \A and \z (capital A, but lowercase z). They mean “start/end of string” regardless of whether /m is active. They mean “start/end of string” regardless of what the reader thinks ^ and $ mean.

    If you followed Conway's advice, and did this:

    perl -pe 's/\z/addthis/mg' myfile.txt
    

    You would see that your phrase addthis got added only to the end of each and every line:

    $ cat myfile.txt
    line one
    line two
    line three
    line four
    $ perl -pe 's/\z/addthis/mg' myfile.txt
    line one
    addthisline two
    addthisline three
    addthisline four
    addthis
    

    See how well that works. That addthis was added to the very end of each line! ...Right after the \n character on that line.

    Enough fun and back to work. (Wait, it's President's Day. It's a paid holiday. No work today except of course all that stuff I promised to have done by Tuesday morning).

    Hope this helped you understand how much fun regular expressions are and why so many people have decided to learn Python.


    1. Know what ^ and $ really mean in Perl? Uh, yes of course I do. I've been programming in Perl for a few decades. Yup, I know all this stuff. (Note to self: $ apparently doesn't mean what I always thought it meant.)

  • 2021-01-02 22:13

    Summary: For what you're doing, drop the /g so it only matches before the newline. The /g is telling it to match before the newline and at the end of the string (after the newline).

    Without the /m modifier, $ will match either before a newline (if it occurs at the end of the string) or at the end of the string. For instance, with both "foo" and "foo\n", the $ would match after foo. With "foo\nbar", though, it would match after bar, because the embedded newline isn't at the end of the string.

    With the /g modifier, you're getting all the places that $ would match -- so

    s/$/X/g;
    

    would take a line like "foo\n" and turn it into "fooX\nX".

    Sidebar: The /m modifier will allow $ to match before newlines that occur before the end of the string, so that

    s/$/X/mg;
    

    would convert "foo\nbar\n" into "fooX\nbarX\nX".

  • 2021-01-02 22:16

    A workaround:

    perl -pe 's/\n/addthis\n/' 
    

    No need for the g modifier: with -p the regex is applied to one line at a time.
