How to replace all the blanks within square brackets with an underscore using sed?

Question

I figured out that in order to turn [some name] into [some_name] I need to use the following expression:

s/\(\[[^ ]*\) /\1_/

i.e. capture (for use as a backreference) a literal '[' followed by any number of non-space characters, followed by a space, and replace the whole match with the captured text followed by an underscore. What I don't know yet, though, is how to alter this expression so that it works for ALL the blanks within the brackets, e.g. turning [a few words] into [a_few_words].

I sense that I'm close, but I'm missing the piece of knowledge that will make this work any number of times within the first set of []s on a line (of SQL Server DDL, in this case).

Any suggestions gratefully received.
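The behaviour described in the question can be reproduced with a short sketch, assuming a POSIX shell and sed; the function name `one_blank` is hypothetical. The expression replaces only the first blank after the first `[`, so a bracketed group with several blanks is only partly converted:

```shell
# Hypothetical helper: apply the question's expression once to each input line.
one_blank() {
    sed -e 's/\(\[[^ ]*\) /\1_/'
}

echo '[some name]'   | one_blank   # [some_name]
echo '[a few words]' | one_blank   # [a_few words]  (second blank untouched)
```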

Answer 1:

There are two parts to the trickery needed:

First, stop replacing when you reach a close square bracket (but do it repeatedly on the line):
```
s/\(\[[^] ]*\) /\1_/g
```
This matches an open square bracket, followed by zero or more characters that are neither a blank nor a close square bracket, followed by a blank. The global suffix means that the substitution is applied to every such sequence on the line, so each bracketed group gets at most one replacement per pass. Note, too, that this regex does not alter '[single-word] and context', whereas the original would translate that to '[single-word]_and context', which is not the object of the exercise.
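To see the effect of this first part on its own, here is a sketch (POSIX shell and sed assumed; the function names are hypothetical) comparing a single pass of the bracket-aware pattern with the question's original expression:

```shell
# One pass of the bracket-aware pattern: at most one blank per bracketed group
# is replaced, because sed resumes scanning after each replacement.
bracket_pass() {
    sed -e 's/\(\[[^] ]*\) /\1_/g'
}

# The question's original expression, for comparison.
original() {
    sed -e 's/\(\[[^ ]*\) /\1_/'
}

echo '[a few words] and [two words]' | bracket_pass  # [a_few words] and [two_words]
echo '[single-word] and context' | bracket_pass      # unchanged
echo '[single-word] and context' | original          # [single-word]_and context
```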
Second, get sed to repeat the search from where the previous one started. Unfortunately, there isn't a truly good way to do that. Sed always resumes searching after the text that was substituted, and this is one occasion when we don't want that. Sometimes you can get away with simply repeating the substitute operation a fixed number of times, but here you have to repeat it every time the substitution succeeds, stopping when there are no more substitutions.
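As a sketch of the "simply repeat the substitution" approach (POSIX shell and sed assumed; `two_passes` is a hypothetical name), stacking the command twice converts up to two blanks per bracketed group, and anything beyond that is left alone, which is why a loop is needed in general:

```shell
# Apply the bracket-aware substitution a fixed number of times (two here).
two_passes() {
    sed -e 's/\(\[[^] ]*\) /\1_/g' \
        -e 's/\(\[[^] ]*\) /\1_/g'
}

echo '[a few words]' | two_passes   # [a_few_words]
echo '[a b c d e]'   | two_passes   # [a_b_c d e]
```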

Two of the less well known operations in sed are the ':label' and the 't' commands. They were present in the 7th Edition of Unix (circa 1978), though, so they are not new features. The first simply identifies a position in the script which can be jumped to with 'b' (not wanted here) or 't':

[2addr]t [label]
Branch to the ':' function bearing the label if any substitutions have been made since the most recent reading of an input line or execution of a 't' function. If no label is specified, branch to the end of the script.

Marvellous. So we need:

```
sed -e ':redo; s/\(\[[^] ]*\) /\1_/g; t redo' data.file
```

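Except that it doesn't work all on one line like that (at least, not on MacOS X). POSIX does not require sed to accept a semicolon after a label, and the BSD-derived sed on MacOS X takes everything after the colon, semicolons included, as the label name; GNU sed does accept the one-line form. This did work admirably, though: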

```
sed -e ':redo
        s/\(\[[^] ]*\) /\1_/g
        t redo' data.file
```

Or, as noted in the comments, you could write three separate '-e' options (which works on MacOS X):

```
sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo' data.file
```

Given the data file:

```
a line with [one blank] word inside square brackets.
a line with [two blank] or [three blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple words in a single bracket] inside square brackets.
a line with [multiple words in a single bracket] [several times on one line]
```

the output from the sed script shown is:

```
a line with [one_blank] word inside square brackets.
a line with [two_blank] or [three_blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple_words_in_a_single_bracket] inside square brackets.
a line with [multiple_words_in_a_single_bracket] [several_times_on_one_line]
```
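The transformation above can be checked with a small self-contained script. This is a sketch assuming a POSIX shell; it uses the portable three-`-e` form, the hypothetical function name `underscore_brackets`, and a here-document in place of `data.file`:

```shell
# Loop until no substitution succeeds: t branches back to :redo whenever the
# s command has changed something since the line was read or the previous t.
underscore_brackets() {
    sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo'
}

underscore_brackets <<'EOF'
a line with [two blank] or [three blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple words in a single bracket] [several times on one line]
EOF
```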

And, finally, reading the fine print in the question: if you need this done only in the first square-bracketed field on each line, then we need to ensure that the match can never extend past the first close square bracket on the line. This variant works:

```
sed -e ':redo' -e 's/^\([^]]*\[[^] ]*\) /\1_/' -e 't redo' data.file
```

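(The 'g' qualifier is gone. With the pattern anchored to the start of the line it could never match more than once per pass anyway, and in the other variants the loop makes it unnecessary too; its presence there might make the process marginally more efficient, but the difference would most likely be essentially impossible to detect. The pattern is now anchored to the start of the line (the caret), and the leading [^]]* matches zero or more characters that are not close square brackets, so the open square bracket that starts the match must come before the first close square bracket on the line, i.e. inside the first bracketed field.)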

Sample output:

```
a line with [two_blank] or [three blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple_words_in_a_single_bracket] inside square brackets.
a line with [multiple_words_in_a_single_bracket] [several times on one line]
```
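A sketch for checking the first-field-only variant, assuming a POSIX shell and sed; `first_field_only` is a hypothetical name. Only the first bracketed group on each line is converted, and a group without blanks is left alone:

```shell
# The leading [^]]* cannot step over a close bracket, so the match
# (and therefore every replacement) stays inside the first [...] group.
first_field_only() {
    sed -e ':redo' -e 's/^\([^]]*\[[^] ]*\) /\1_/' -e 't redo'
}

echo 'a line with [two blank] or [three blank] words' | first_field_only
# a line with [two_blank] or [three blank] words
```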

Answer 2:

This is easier in a language like Perl, which has "executable" substitutions:

```
perl -wne 's/(\[.*?])/ do { my $x = $1; $x =~ y, ,_,; $x } /ge; print'
```

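Or, to split it up more clearly as a complete script that reads standard input or the files named on the command line:

```
use strict;
use warnings;
# Turn every blank in the string into an underscore.
sub replace_with_underscores {
    my $s = shift;
    $s =~ y/ /_/;
    return $s;
}
while (<>) {
    s/(\[.*?])/ replace_with_underscores($1) /ge;
    print;
}
```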

The .*? is a non-greedy match (so that two bracketed phrases on the same line are not merged into a single match), and the e flag causes the replacement part of the substitution to be evaluated as Perl code, so you can call a function to do the inner work.

Source: https://stackoverflow.com/questions/4503535/how-to-replace-all-the-blanks-within-square-brackets-with-an-underscore-using-se

Tags: regex, sed, backreference