Inventory of regex anchors

Question

^ is said to match the beginning of a line, but it does not match right after a "\n", "\r" or "\r\n". It matches the beginning of a string, though. In what sense does it match the beginning of a line, and how is it different from \A?
$ is said to match the end of a line, but it does not match right before a "\n", "\r" or "\r\n". It matches the end of a string, though. In what sense does it match the end of a line, and how is it different from \z?
\Z, unlike \z, matches right before "\n" if that "\n" is at the end of the string. It seems to me that \A and \z are a naturally paired concept, and \Z is rather an odd one. Why are \Z and \z defined as they are, and not the other way around? And when would you want to use \Z?

Can you illustrate the above using examples? If differences among languages/standards matter, it would be helpful to list them.

Answer 1:

The difference is that the behavior of the ^ and $ anchors can be changed by a modifier, while \A, \Z and \z always mean the same thing.

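With multiline mode on, the ^ and $ anchors assert the beginning and end of each line, i.e. they also match right after and right before a newline.

With multiline mode off, the ^ and $ anchors assert the beginning and end of the string. (In Perl, .NET, Java and Python, $ in this mode also matches just before a newline that ends the string; in JavaScript it does not.)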
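A minimal sketch of the two modes, using Python's re module purely as a convenient example (the answer does not mention Python; the sample text and variable names are made up for illustration):

```python
import re

text = "alpha one\nbeta two\ngamma three"

# Without MULTILINE, ^ and $ anchor to the whole string,
# so only the first word and the last word are found.
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# With MULTILINE, ^ also matches right after each "\n"
# and $ also matches right before each "\n".
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)

print(starts_default)    # ['alpha']
print(ends_default)      # ['three']
print(starts_multiline)  # ['alpha', 'beta', 'gamma']
print(ends_multiline)    # ['one', 'two', 'three']
```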

Most regex implementations have a multiline mode.

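In Perl or JavaScript, it is enabled with the m modifier, e.g. /pattern/m. Ruby is an exception: there ^ and $ always match at line boundaries, so you need \A and \z to anchor to the whole string, and Ruby's /m modifier instead makes . match newlines.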

In .NET it is enabled with (?m) inside the pattern itself, or by passing the RegexOptions.Multiline option.
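Python (not covered by the answer, used here only as an example) offers the same two spellings: an inline (?m) at the start of the pattern, or the re.MULTILINE flag (alias re.M). A small sketch, with made-up sample text, suggesting they behave the same:

```python
import re

text = "alpha\nbeta\ngamma"

# Inline flag: (?m) at the start of the pattern turns on multiline mode.
inline = re.findall(r"(?m)^\w+", text)

# Equivalent flag argument: re.MULTILINE (also available as re.M).
flagged = re.findall(r"^\w+", text, re.MULTILINE)

print(inline)   # ['alpha', 'beta', 'gamma']
print(flagged)  # ['alpha', 'beta', 'gamma']
```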

To answer your third question:

\A - The match must occur at the start of the string.

\Z - The match must occur at the end of the string or before \n at the end of the string.

\z - The match must occur at the end of the string.
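A sketch of how these anchors ignore multiline mode, again in Python as an assumed example flavor. Note a naming difference: Python's \Z matches only at the very end of the string, so it behaves like the \z described above, not like \Z.

```python
import re

text = "alpha\nbeta\n"

# In MULTILINE mode, ^ matches at the start of every line...
caret_starts = re.findall(r"^\w+", text, re.M)
# ...but \A still matches only at the very start of the string.
a_starts = re.findall(r"\A\w+", text, re.M)

# In MULTILINE mode, $ matches before every "\n" (and at the very end).
dollar_ends = re.findall(r"\w+$", text, re.M)
# Python's \Z matches only at the absolute end of the string (like \z in
# Perl/.NET/Java), so nothing matches: the text ends with "\n".
z_ends = re.findall(r"\w+\Z", text, re.M)

print(caret_starts)  # ['alpha', 'beta']
print(a_starts)      # ['alpha']
print(dollar_ends)   # ['alpha', 'beta']
print(z_ends)        # []
```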

These three anchors are fixed: no modifier changes what they match. I agree that the naming of \A, \Z and \z seems illogical; it doesn't make a great deal of sense to me either. But when the input may end with a trailing line feed that you want to ignore, such as a line read from a file, \Z is the convenient choice.
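To illustrate the trailing line feed case, here is a sketch in Python (an assumed example flavor). Because Python's \Z is strict, the lenient "end or before a final \n" behavior of Perl-style \Z is spelled as a non-multiline $ there:

```python
import re

line = "12345\n"  # e.g. a line read with readline(), newline still attached

# Non-MULTILINE $ tolerates one trailing "\n" (what \Z does in
# Perl, Ruby, .NET and Java), so the line validates as all digits.
lenient = re.search(r"\A\d+$", line) is not None

# Python's \Z demands the true end of the string (what \z does elsewhere),
# so the same check fails because of the trailing newline.
strict = re.search(r"\A\d+\Z", line) is not None

# Only a single final newline is tolerated; "\n\n" fails even with $.
two_newlines = re.search(r"\A\d+$", "12345\n\n") is not None

print(lenient, strict, two_newlines)  # True False False
```

The flip side is that the lenient anchor silently accepts a trailing newline, so when the input must contain no newline at all, the strict end-of-string anchor is the safer one.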

Source: https://stackoverflow.com/questions/5451453/inventory-of-regex-anchors

Tags

regex

anchor