$ and Perl's global regular expression modifier


I finally figured out how to append text to the end of each line in a file:

perl -pe 's/$/addthis/' myfile.txt

However, as I'm trying to

3 answers

慢半拍i

2021-01-02 22:12
As Jim Davis pointed out, $ matches at the end of the string and also just before a \n character at the end of the string (with the /m option, it matches before every \n). See the Regular Expressions section of the perlre Perldoc page. Using the g modifier allowed it to continue matching after the first match.

Multiple line Perl regular expressions (i.e., Perl regular expressions applied to text with the newline character in it, even if it only occurs once at the end of the line) cause all sorts of complications that most Perl programmers have trouble handling.
- If you're reading in a file one line at a time, always use chomp before doing ANYTHING with that line. This would have solved your issue when using the g modifier.
- Further issues can happen if you're reading files on Linux/Mac which came from Windows. In that case, you will have both the \r and \n characters. As I found out recently while debugging a program, the \r character isn't removed by chomp. I now make sure I always open my text files for reading like this:
```
open my $file_handle, "<:crlf", $file...
```
This will automatically replace the \r\n characters with just \n if this is in fact a Windows file on a Linux/Mac system. If this is a regular Linux/Mac text file, it will do nothing. The other obvious solution is not to use Windows (rim shot!).

Of course, in your case, using chomp first would have done the following:
```
$ cat file
line one
line two
line three
line four
$ perl -pe 'chomp;s/$/addthis::/g' file
line oneaddthis::line twoaddthis::line threeaddthis::line fouraddthis::
```
The chomp removed the \n, so now you don't see it when the lines print out. Hmm...
```
$ perl -pe 'chomp;s/$/addthis/g;$_ .= "\n"' file
line oneaddthis
line twoaddthis
line threeaddthis
line fouraddthis
```
That works! And, your one liner is only mildly incomprehensible.

The other thing is to take a more modern approach that Damian Conway recommends in Chapter 12 of his book Perl Best Practices:

Use \A and \z as string boundary anchors.

Even if you don’t adopt the previous practice of always using /m, using ^ and $ with their default meanings is a bad idea. Sure, you know what ^ and $ actually mean in a Perl regex¹. But will those who read or maintain your code know? Or is it more likely that they will misinterpret those metacharacters in the ways described earlier? Perl provides markers that always—and unambiguously—mean “start of string” and “end of string”: \A and \z (capital A, but lowercase z). They mean “start/end of string” regardless of whether /m is active. They mean “start/end of string” regardless of what the reader thinks ^ and $ mean.

If you followed Conway's advice and did this:
```
perl -pe 's/\z/addthis/mg' myfile.txt
```
You would see that your phrase addthis got added only to the end of each and every line:
```
$ cat myfile.txt
line one
line two
line three
line four
$ perl -pe 's/\z/addthis/mg' myfile.txt
line one
addthisline two
addthisline three
addthisline four
addthis
```
See how well that works. That addthis was added to the very end of each line! ...Right after the \n character on that line.

Enough fun and back to work. (Wait, it's President's Day. It's a paid holiday. No work today except of course all that stuff I promised to have done by Tuesday morning).

Hope this helped you understand how much fun regular expressions are and why so many people have decided to learn Python.

^1. Know what ^ and $ really mean in Perl? Uh, yes of course I do. I've been programming in Perl for a few decades. Yup, I know all this stuff. (Note to self: $ apparently doesn't mean what I always thought it meant.)
梦毁少年i

2021-01-02 22:13
Summary: For what you're doing, drop the /g so it only matches before the newline. The /g is telling it to match before the newline and at the end of the string (after the newline).

Without the /m modifier, $ will match either before a newline (if it occurs at the end of the string) or at the end of the string. For instance, with both "foo" and "foo\n", the $ would match after foo. With "foo\nbar", though, it would match after bar, because the embedded newline isn't at the end of the string.

With the /g modifier, you're getting all the places that $ would match -- so
```
s/$/X/g;
```
would take a line like "foo\n" and turn it into "fooX\nX".

Sidebar: The /m modifier will allow $ to match newlines that occur before the end of the string, so that
```
s/$/X/mg;
```
would convert "foo\nbar\n" into "fooX\nbarX\nX".
谎友^

2021-01-02 22:16
A workaround:
```
perl -pe 's/\n/addthis\n/' 
```
No need for the g modifier: with -p, the regex is applied line by line.