GNU sed, ^ and $ with | when first/last character matches

Posted by 社会主义新天地 on 2019-12-11 01:54:13

Question


When a substitution's REGEXP includes something like ^|., sed doesn't match the null string at the beginning of the pattern space if the first character also matches. Likewise, it doesn't match the null string at the end if the last character matches. Why is that?

Here are some examples using 123 as input (with the -r option):

substitution    expected output     actual output   comments
s/^/x/g         x123                x123            works as expected
s/$/x/g         123x                123x            works as expected
s/^|$/x/g       x123x               x123x           works as expected
s/^|./x/g       xxxx                xxx             didn't match the very beginning
s/.|$/x/g       xxxx                xxx             didn't match the very end
s/^|1/x/g       xx23                x23             didn't match the very beginning
s/^|2/x/g       x1x3                x1x3            this time it did match the beginning

I get the same results when using \` instead of ^.
I've tried GNU sed versions 4.2.1 and 4.2.2.
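The table can be reproduced with a short loop. This is only a sketch: it assumes GNU sed, uses -E as the portable spelling of -r, and `run_sed` is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: apply one sed expression to the input line "123".
run_sed() {
    printf '123\n' | sed -E "$1"
}

# Print each expression from the table next to the output sed actually produces.
for expr in 's/^/x/g' 's/$/x/g' 's/^|$/x/g' 's/^|./x/g' 's/.|$/x/g' 's/^|1/x/g' 's/^|2/x/g'; do
    printf '%-12s %s\n' "$expr" "$(run_sed "$expr")"
done
```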



Answer 1:


AFAIK, sed picks the longest match among the alternatives that can match at a given position (POSIX leftmost-longest matching).

So when both the null string at the beginning of the pattern space and 1 can match at the same position, 1 is chosen because it is the longer match.

Consider the following:

$ sed 's/12\|123/x/g' <<< 123
x
$ sed 's/123\|12/x/g' <<< 123
x
$ sed 's/^1\|12/x/g' <<< 123
x3
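One way to see which alternative wins at the start is to drop the g flag and wrap the matched text in brackets with &. This is a sketch assuming GNU sed; the variable names `first` and `second` are only for illustration.

```shell
# "[&]" wraps whatever text the regex matched, so an empty match shows up as "[]".
first=$(printf '123\n' | sed -E 's/^|1/[&]/')   # both ^ and 1 can match at position 0
second=$(printf '123\n' | sed -E 's/^|2/[&]/')  # only ^ can match at position 0

printf '%s\n' "$first"    # [1]23  (the longer match, 1, was chosen)
printf '%s\n' "$second"   # []123  (the empty match of ^ was chosen)
```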

Something similar happens when reaching the end. Let's break down sed 's/.\|$/x/g' <<< 123:

123
^
. matches and is replaced with x
x23
 ^
 . matches and is replaced with x
xx3
  ^
  . matches and is replaced with x
xxx
   ^
   End of pattern space: only the empty match of $ is left, directly after the previous match, so it is not replaced.
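The skipped empty match at the end can be seen in isolation, and the output the question expected can be obtained by running the anchor as its own command. The following is a sketch assuming GNU sed; splitting the substitution in two is a suggested workaround, not something the original answer proposes, and the variable names are hypothetical.

```shell
# An empty match directly after a previous match is skipped:
# b* matches "" before a, then "b", but not the "" right after b.
skipped=$(printf 'abc\n' | sed 's/b*/x/g')
printf '%s\n' "$skipped"   # xaxcx

# Workaround sketch: replace the characters first, then handle the anchor separately.
at_start=$(printf '123\n' | sed -E 's/./x/g; s/^/x/')
at_end=$(printf '123\n' | sed -E 's/./x/g; s/$/x/')
one_and_start=$(printf '123\n' | sed -E 's/1/x/g; s/^/x/')

printf '%s\n' "$at_start" "$at_end" "$one_and_start"
```

With the input 123, the three workaround commands give xxxx, xxxx and xx23, which are the values in the question's "expected output" column.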


Source: https://stackoverflow.com/questions/39808332/gnu-sed-and-with-when-first-last-character-matches
