How to remove the words of a line up to a specific character pattern (regex)

Question

I want the words after the word "test" on each line of a file. In other words, I don't want the words that come before "test".

That's the pattern.

E.g.:

Input:

***This is a*** test page.

***My*** test work of test is complete.

Output:

test page.

work of test is complete.

Answer 1:

Using sed:

sed -n 's/^.*test/test/p' input

If you want to print non-matching lines, untouched:

sed 's/^.*test/test/' input

Both commands above are greedy: they remove all text up to the last test on a line. If you want to delete up to and including the first test instead, use potong's suggestion (the \n in the replacement relies on GNU sed):

sed -n 's/test/&\n/;s/.*\n//p' input
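To see the difference between the two commands, here is a small sketch that runs both on the sample input from the question. The function names are only for illustration, and it assumes GNU sed (because of \n in the replacement).

```shell
# Sample input from the question.
sample() {
  printf '%s\n' 'This is a test page.' 'My test work of test is complete.'
}

# Greedy: .* runs up to the LAST "test" on each line.
cut_to_last() { sed -n 's/^.*test/test/p'; }

# Put a newline after the FIRST "test", then delete everything through
# that newline. The matched "test" itself is removed as well.
cut_through_first() { sed -n 's/test/&\n/;s/.*\n//p'; }

sample | cut_to_last
sample | cut_through_first
```

On the second sample line the greedy version leaves "test is complete.", while the first-match version leaves " work of test is complete." (with the leading space).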

Answer 2:

A pure bash one-liner:

while IFS= read -r x; do [[ $x =~ test.* ]] && echo "${BASH_REMATCH[0]}"; done <infile

Input: infile

This is a test page.
My test work of test is complete.

Output:

test page.
test work of test is complete.

It reads every line of the file infile, checks whether the line contains the string test, and if so prints the line from the first test onward (including test).
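The same read, check, print loop can also be written without the bash regex operator. This is a different technique from the one-liner above: a sketch using prefix-stripping parameter expansion, which should also work in plain sh. The function name from_first_test is made up for the example.

```shell
# from_first_test: hypothetical helper, not part of the original answer.
# Reads lines on stdin; for each line containing "test", prints the line
# starting at the first "test". Lines without "test" are skipped.
from_first_test() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *test*)
        # ${line#*test} strips the shortest prefix ending in "test",
        # so "test" is put back in front of the remainder.
        printf '%s\n' "test${line#*test}"
        ;;
    esac
  done
}

printf '%s\n' 'This is a test page.' 'My test work of test is complete.' | from_first_test
```

Since only a fixed string is matched here, no regex engine is needed; for a real pattern the [[ =~ ]] form is the one to use.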

The same in sed:

~~sed 's/.*\(test.*\)/\1/' infile~~ (Oops! This is wrong: .* is greedy, so it cuts too much from the second example line.) This works well:

sed -e 's/\(test.*\)/\x03&/' -e 's/.*\x03//' infile
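The two expressions work as a mark-then-cut pair, which can be sketched step by step as below. This assumes GNU sed (for the \x03 escape) and that the input contains no 0x03 byte of its own; the function names are illustrative, and the unused \( \) group is dropped.

```shell
# Step 1 only: insert the marker byte (0x03) before the leftmost "test".
mark_first() { sed 's/test.*/\x03&/'; }

# Both steps: insert the marker, then delete everything through it.
# Lines without "test" pass through unchanged (no -n / p here).
from_first() { sed -e 's/test.*/\x03&/' -e 's/.*\x03//'; }

line='My test work of test is complete.'
printf '%s\n' "$line" | mark_first | cat -v   # cat -v shows the marker as ^C
printf '%s\n' "$line" | from_first
```

Because the regex engine finds the leftmost match first, the marker always lands before the first test, and the greedy .* in the second expression is harmless since there is only one marker per line.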

I did some speed testing (using the original, wrong sed version). For small files the bash solution performs better; for larger files sed is better. I also tried this awk version, which is even faster on big files:

awk 'match($0,"test.*"){print substr($0,RSTART)}' infile

Similar in perl:

perl -ne 's/(.*?)(test.*)/$2/ and print' infile

I started with the two-line example input file and doubled its size at each step. Each version was run 1000 times. The results:

  Size |  bash  |  sed   |  awk   |  perl
   [B] |  [sec] |  [sec] |  [sec] |  [sec]
------------------------------------------
    55 |  0.420 | 10.510 | 10.900 | 17.911
   110 |  0.460 | 10.491 | 10.761 | 17.901
   220 |  0.800 | 10.451 | 10.730 | 17.901
   440 |  1.780 | 10.511 | 10.741 | 17.871
   880 |  4.030 | 10.671 | 10.771 | 17.951
  1760 |  8.600 | 10.901 | 10.840 | 18.011
  3520 | 17.691 | 11.460 | 10.991 | 18.181
  7040 | 36.042 | 12.401 | 11.300 | 18.491
 14080 | 72.355 | 14.461 | 11.861 | 19.161
 28160 |145.950 | 18.621 | 12.981 | 20.451
 56320 |        |        | 15.132 | 23.022
112640 |        |        | 19.763 | 28.402
225280 |        |        | 29.113 | 39.203
450560 |        |        | 47.634 | 60.652
901120 |        |        | 85.047 |103.997
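The measurement procedure described above (double the file, run each command many times) could be reproduced roughly like this. It is only a sketch: the helper names, the temp file, and the whole-second timing via date +%s are my assumptions, not the script that produced the table, and only the awk variant is timed.

```shell
# Illustrative helpers (double_file, time_runs); not from the original answer.

# Double a file in place by appending a copy of itself.
double_file() {
  cat "$1" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Run the awk variant N times on a file; print elapsed whole seconds.
time_runs() {
  runs=$1
  file=$2
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$runs" ]; do
    awk 'match($0,"test.*"){print substr($0,RSTART)}' "$file" > /dev/null
    i=$((i + 1))
  done
  echo $(( $(date +%s) - start ))
}

infile=$(mktemp)
printf '%s\n' 'This is a test page.' 'My test work of test is complete.' > "$infile"

# The answer used 1000 runs per size; 100 keeps this demo short.
for step in 1 2 3; do
  size=$(wc -c < "$infile" | tr -d ' ')
  printf '%s bytes: %s s\n' "$size" "$(time_runs 100 "$infile")"
  double_file "$infile"
done
rm -f "$infile"
```

The two-line sample is 55 bytes, which matches the first row of the table. For finer resolution than whole seconds, something like the bash time keyword around the loop would be needed.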

Source: https://stackoverflow.com/questions/16892013/how-to-remove-words-of-a-line-upto-specific-character-pattern-regex

Tags

regex

bash

sed