What do I need to quote in sed command lines?

Submitted by 给你一囗甜甜゛ on 2019-12-06 03:16:16
Kent

If I understood you right, your problem is not about bash/sh, it is about the regex flavour sed uses by default: BRE.

The other [= anything but dot, star, caret and dollar] BRE metacharacters require a backslash to give them their special meaning. The reason is that the oldest versions of UNIX grep did not support these.

In BRE, grouping parentheses (..) have to be escaped to get their special meaning, and the same goes for +. Otherwise sed matches them as literal characters. That's why the pattern in your s#\(\w\+\) #...# needs the backslashes. The replacement part doesn't need escaping, so:

sed 's#\(\w\+\) #\1 /#' 

should work.

sed usually has an option to use extended regular expressions, where ?, +, |, () and {m,n} are special without a backslash; e.g. GNU sed has -r, so your one-liner could be:

sed -r 's#(\w+) #\1 /#'

I paste some examples here that may help you understand what's going on:

kent$  echo "abcd "|sed 's#\(\w\+\) #\1 /#'
abcd /
kent$  echo "abcd "|sed -r 's#(\w+) #\1 /#'
abcd /
kent$  echo "(abcd+) "|sed 's#(\w*+) #&/#'
(abcd+) /

What you're observing is correct. Certain characters like ?, +, (, ), {, } need to be escaped when using basic regular expressions.

Quoting from the sed manual:

The only difference between basic and extended regular expressions is in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces (‘{}’). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.

(Emphasis mine.) These don't need to be escaped, though, when using extended regexps, except when matching a literal character (as mentioned in the last line quoted above.)
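
To see the swap described in the manual, here is a small sketch that uses only parentheses. The sample strings and variable names are made up for illustration, and it assumes your sed accepts -E (older GNU versions may need -r instead).

```shell
# BRE: bare parentheses are literal, escaped parentheses form a group.
bre_literal=$(printf '%s\n' '(ab)' | sed 's/(ab)/X/')
bre_group=$(printf '%s\n' 'ab' | sed 's/\(ab\)/[\1]/')

# ERE: the roles are reversed.
ere_group=$(printf '%s\n' 'ab' | sed -E 's/(ab)/[\1]/')
ere_literal=$(printf '%s\n' '(ab)' | sed -E 's/\(ab\)/X/')

printf '%s\n' "$bre_literal" "$bre_group" "$ere_group" "$ere_literal"
# prints:
# X
# [ab]
# [ab]
# X
```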

If you want a general answer,

  • Shell metacharacters need to be quoted or escaped from the shell;
  • Regex metacharacters need to be escaped if you want a literal interpretation;
  • Some regex constructs are formed by a backslash escape; depending on context, these backslashes may need quoting.

So you have the following scenarios:

# Match a literal question mark
echo '?' | grep \?
# or equivalently
echo '?' | grep "?"
# or equivalently
echo '?' | grep '?'

# Match a literal asterisk
echo '*' | grep \\\*
# or equivalently
echo '*' | grep "\\*"
# or equivalently
echo '*' | grep '\*'

# Match a backreference: the same character twice in a row
echo 'aa' | grep \\\(.\\\)\\1
# or equivalently
echo 'aa' | grep "\(.\)\\1"
# or equivalently
echo 'aa' | grep '\(.\)\1'

As you can see, single quotes probably make the most sense most of the time.
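
If you are unsure what the shell will actually hand to grep or sed, one way to check is to print the argument with printf instead of running the real command. This is only a debugging sketch; the variable names are made up for illustration.

```shell
# Each spelling is printed exactly as the called program would receive it.
arg_unquoted=$(printf '%s' \\\*)
arg_double=$(printf '%s' "\\*")
arg_single=$(printf '%s' '\*')
printf '%s\n' "$arg_unquoted" "$arg_double" "$arg_single"
# prints \* three times

# The backreference spellings from above also collapse to one and the same regex.
ref_unquoted=$(printf '%s' \\\(.\\\)\\1)
ref_double=$(printf '%s' "\(.\)\\1")
ref_single=$(printf '%s' '\(.\)\1')
printf '%s\n' "$ref_unquoted" "$ref_double" "$ref_single"
# prints \(.\)\1 three times
```

Only the single-quoted spelling looks the same on the command line as it does in the output, which is the practical argument for preferring it.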

If you are embedding into a language which requires backslash quoting of its own, you have to add yet another set of backslashes, or avoid invoking a shell.
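
As a rough illustration of the extra layer, the sketch below uses a nested sh -c with a double-quoted command string as a stand-in for a host language that does its own backslash processing. The sample input a\b is made up, and the goal is simply to match a literal backslash.

```shell
# One layer: single quotes hand \\ to grep unchanged, and the regex \\ matches one literal backslash.
direct=$(printf '%s\n' 'a\b' | grep '\\')

# Two layers: the outer double quotes turn each \\ into \ before the inner shell sees the command,
# so every backslash meant for the inner command has to be written twice.
nested=$(sh -c "printf '%s\n' 'a\\b' | grep '\\\\'")

printf '%s\n' "$direct" "$nested"
# prints a\b twice
```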

As others have pointed out, extended regular expressions obey a slightly different syntax, but the general pattern is the same. Bottom line, to minimize interference from the shell, use single quotes whenever you can.

For literal characters, you can avoid some backslashitis by using a character class instead.

echo '*' | grep \[\*\]
# or equivalently
echo '*' | grep "[*]"
# or equivalently
echo '*' | grep '[*]'
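
The same trick carries over to sed, and it has a side benefit: a bracket expression means the same thing in basic and extended mode, so you don't have to remember which characters flip. A minimal sketch with a made-up input string, assuming your sed accepts -E:

```shell
# [(], [)] and [+] are literal in both modes, so one pattern serves both.
bre=$(printf '%s\n' 'a(b)+c' | sed 's/[(]b[)][+]/_/')
ere=$(printf '%s\n' 'a(b)+c' | sed -E 's/[(]b[)][+]/_/')
printf '%s\n' "$bre" "$ere"
# prints a_c twice
```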

FreeBSD sed, which is also used on Mac OS X, uses -E instead of -r for extended regular expressions (recent versions of GNU sed accept -E as well). For maximum portability, use basic regular expressions: a + in extended mode, for example, has to be written as \{1,\} in basic mode. In both basic and extended mode, FreeBSD sed does not seem to recognize \w, which has to be replaced with [[:alnum:]_] (cf. man re_format).

# using FreeBSD sed (on Mac OS X)

# output: Hello, world!
echo 'hello    world' | sed -e 's/h/H/' -e 's/ \{1,\}/, /g' -e 's/\([[:alnum:]_]\{1,\}\)$/\1!/'
echo 'hello    world' | sed -E -e 's/h/H/' -e 's/ +/, /g' -e 's/([[:alnum:]_]+)$/\1!/'
echo 'hello    world' | sed -E -e 's/h/H/' -e 's/ +/, /g' -e 's/(\w+)$/\1!/'  # does not work
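
If a script has to run against both kinds of sed and you would still like extended syntax, one possible approach is to probe for the flag at run time. This is only a sketch: the variable name ere_flag is made up, and it assumes that a sed without -E fails with a non-zero exit status when given that option.

```shell
# Try -E first; fall back to -r if this sed rejects it.
if printf 'x\n' | sed -E 's/x+/y/' >/dev/null 2>&1; then
    ere_flag=-E
else
    ere_flag=-r
fi

# Same substitutions as above, with the POSIX class instead of \w.
result=$(echo 'hello    world' | sed "$ere_flag" -e 's/h/H/' -e 's/ +/, /g' -e 's/([[:alnum:]_]+)$/\1!/')
echo "$result"
# prints: Hello, world!
```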

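# find a sequence of characters in a line
# replace the following space with a slash
# output: abcd+/abcd+/
echo 'abcd+ abcd+ ' > test
python3
import subprocess
# no shell is involved here, so only Python's own string quoting matters;
# a raw string (r'...') passes the backslashes to sed unchanged
output = subprocess.check_output(['/usr/bin/sed', '-e', r's#\([[:alnum:]_+]\{1,\}\) #\1/#g', 'test'], text=True)
print(output, end='')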

To use a single quote inside a sed regular expression while keeping the outer single quotes (and so avoiding any shell expansion), concatenate three pieces: a single-quoted string, a backslash-escaped single quote, and another single-quoted string.

# man bash:
# "A single quote may not occur between single quotes, even when preceded by a backslash."
# cf. http://stackoverflow.com/a/9114512 & http://unix.stackexchange.com/a/82757
# concatenate: 's/doesn'  +  \'  +  't/does not/'
echo "sed doesn't work for me" | sed -e 's/doesn'\''t/does not/'
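
When the sed script contains nothing the shell would expand inside double quotes (no $, no backticks, no backslashes), wrapping it in double quotes is a simpler alternative. A small sketch comparing the two spellings on the same sentence; the variable names are made up for illustration.

```shell
input="sed doesn't work for me"

# Concatenation: 's/doesn' + \' + 't/does not/'
concat=$(echo "$input" | sed -e 's/doesn'\''t/does not/')

# Double quotes: safe here only because the script has no $, backtick or backslash.
dquote=$(echo "$input" | sed -e "s/doesn't/does not/")

printf '%s\n' "$concat" "$dquote"
# prints "sed does not work for me" twice
```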