sed substitute whitespace for dash only between specific character patterns

Question

I have lines like these:

ORIGINAL

sometext1 sometext2 word:A12 B34 C56 sometext3 sometext4
sometext5 sometext6 word:A123 B45 C67 sometext7 sometext8
sometext9 sometext10 anotherword:(someword1 someword2 someword3) sometext11 sometext12

EDITED

asdjfkklj lkdsjfic kdiw:A12 B34 C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123 B45 C678 oknes lkwid 
cnqule nkdal anotherword:(kdlklks inlqok mncvmnx) unqieo lksdnf

Desired output:

asdjfkklj lkdsjfic kdiw:A12-B34-C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123-B45-C678 oknes lkwid 
cnqule nkdal anotherword:(kdlklks-inlqok-mncvmnx) unqieo lksdnf

EDITED: Would this be more explicit? But frankly this is much more difficult to read and answer than writing sometext#. I do not know people's preference.

I only want to replace the whitespace with dashes between the tokens made of a letter followed by some digits (e.g. A12 B34 C56), and between the words inside the two parentheses. No other whitespace in the line should change. I would appreciate an explanation of the syntax too.

Thanks!

Answer 1:

This might work for you (GNU sed):

sed -r ':a;s/(A[0-9]+(-[A-Z][0-9]+)*) ([A-Z][0-9]+)/\1-\3/;ta;s/(\(\S+(-\S+)*) (\S+( \S+)*\))/\1-\3/;ta' file

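This iteratively replaces the spaces in the required strings using a regexp and back-references: after each successful substitution, ta jumps back to the label :a, so one space is replaced per iteration until nothing matches any more.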
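To make the loop easier to follow, here is a sketch of the same script split over several lines with comments. It assumes GNU sed (for -r and \S); dashify is a hypothetical function name used only for this example.

```shell
# Sketch only: the one-liner above, reformatted. Requires GNU sed.
# "dashify" is a hypothetical name, not part of the original answer.
dashify() {
  sed -r '
    :a
    # Join the next letter+digits token onto a run that starts with A<digits>.
    # Group 1 is the run joined so far, group 3 is the token being attached.
    s/(A[0-9]+(-[A-Z][0-9]+)*) ([A-Z][0-9]+)/\1-\3/
    ta
    # Replace one space inside the parentheses, then loop again.
    s/(\(\S+(-\S+)*) (\S+( \S+)*\))/\1-\3/
    ta
  '
}

printf '%s\n' \
  'asdjfkklj lkdsjfic kdiw:A12 B34 C56 lksjdfioe sldkjflkjd' \
  'cnqule nkdal anotherword:(kdlklks inlqok mncvmnx) unqieo lksdnf' | dashify
```

Note that the first expression only starts a run at a literal A, which matches the sample data but would need [A-Z] instead if the first token can begin with another letter.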

Answer 2:

This code works well:

darby@Debian:~/Scrivania$ cat test.txt | sed -r 's@\s+([A-Z][0-9]+)@-\1@g' | sed ':l s/\(([^ )]*\)[ ]/\1-/;tl'
asdjfkklj lkdsjfic kdiw:A12-B34-C56 lksjdfioe sldkjflkjd
lknal niewoc kdiw:A123-B45-C678 oknes lkwid 
cnqule nkdal anotherword:(kdlklks-inlqok-mncvmnx) unqieo lksdnf

Explanation of the regexes

In the first regex

Options

-r              Enable extended regular expressions (ERE)

Pattern

\s+             One or more whitespace characters
([A-Z][0-9]+)   Capture group 1: an uppercase letter followed by one or more digits

Replace

-               A literal dash
\1              Back-reference to capture group 1

Note

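The g flag after the closing delimiter (here @ is used as the delimiter instead of /) makes the substitution global, i.e. it is applied to every match on the line.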
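One thing worth checking with a pattern like this is that it is not tied to the kdiw: field: it matches whitespace before any uppercase-letter-plus-digits token on the line. The sketch below illustrates this on made-up input (the X9 token is an assumption, it does not appear in the question); first_pass is a hypothetical wrapper name and GNU sed is assumed.

```shell
# Sketch only: wraps the first sed command from the answer. Requires GNU sed.
# "first_pass" is a hypothetical name used for this example.
first_pass() {
  sed -r 's@\s+([A-Z][0-9]+)@-\1@g'
}

# Tokens inside the field are joined as intended.
printf '%s\n' 'lknal kdiw:A123 B45 C678 oknes' | first_pass

# A letter+digits token elsewhere on the line (X9, invented here) is joined too.
printf '%s\n' 'lknal X9 kdiw:A123 B45 oknes' | first_pass
```

For the sample data in the question this makes no difference, because the other words are all lowercase.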

In the second regex

Pattern

:l             Defines a label named l, the target of a t or b branch
tl             Conditional branch: jump to label l if a substitution has succeeded since the last input line was read or since the last t branch was taken. If no label is given, jump to the end of the script
\(([^ )]*\)    Capture group 1 (this sed runs without -r, so \( and \) delimit the group and the inner ( is a literal parenthesis): an opening parenthesis followed by any run of characters that are neither a space nor )
[ ]            A single space character

Replace

\1             Back-reference to capture group 1
-              A literal dash
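The role of the :l / tl loop can be seen by running the substitution once and then in a loop. This is a minimal sketch on shortened sample input; join_once and join_loop are hypothetical names, and the label is passed with separate -e options so that it does not depend on how a given sed parses ":l s/.../".

```shell
# Sketch only: same substitution as the second sed command above.
# "join_once" and "join_loop" are hypothetical names used for this example.
join_once() {
  # A single substitution: only the first space after "(" is replaced.
  sed 's/\(([^ )]*\) /\1-/'
}

join_loop() {
  # Repeat the substitution until it no longer matches.
  sed -e ':l' -e 's/\(([^ )]*\) /\1-/' -e 'tl'
}

line='w:(aa bb cc) tail end'
printf '%s\n' "$line" | join_once
printf '%s\n' "$line" | join_loop
```

Each iteration extends the captured prefix by one word, so the loop stops on its own once the character after the prefix is the closing parenthesis rather than a space.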

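Answer 3:

You need to capture the first alphanumeric group with () and the start of the second group. Then you can replace them using the back-references \1 and \2.

Using sed twice (matches cannot overlap, so the first pass consumes the B of B34 and the space before C56 is only replaced by the second pass):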

sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g' | sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g'

Or using perl (with the lookahead (?=...), the second group is checked but not consumed, so a single pass is enough):

perl -pe 's/(\b[A-Za-z][0-9]+) (?=[A-Z])/$1-/g'

\b word boundary
[A-Za-z] one letter
[0-9]+ one or more digits

sed does not support lookahead or lookbehind.
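Since sed has no lookahead, a fixed number of passes only handles a fixed number of tokens. One possible alternative, sketched here by borrowing the t loop used in the other answers, is to repeat the substitution until nothing changes. It assumes GNU sed (for \b with -E); one_pass and until_stable are hypothetical names. Like the original command, it covers only the letter-plus-digits tokens, not the parentheses.

```shell
# Sketch only: the substitution from this answer, once and in a loop.
# Requires GNU sed. Function names are hypothetical.
one_pass() {
  sed -E 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/g'
}

until_stable() {
  # Same expression without g, repeated via a label and a conditional branch.
  sed -E -e ':a' -e 's/(\b[A-Za-z][0-9]+) ([A-Z])/\1-\2/' -e 'ta'
}

line='x kdiw:A12 B34 C56 y'
printf '%s\n' "$line" | one_pass       # the space before C56 is left alone
printf '%s\n' "$line" | until_stable   # all three tokens are joined
```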

Source: https://stackoverflow.com/questions/46946593/sed-substitute-whitespace-for-dash-only-between-specific-character-patterns

Tags: bash, sed, character, substitution, digits