GNU sed: remove portion of line after pattern match with special characters

Submitted by 懵懂的女人 on 2019-12-11 15:20:07

Question


The goal is to use sed to return only the URL from each line of the Firefox extension Mining Blocker, which uses this format for its regex lines:

{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},

The result should be:

002.0x1f4b0.com
003.0x1f4b0.com

One way would be to keep everything after suburl":"*://*/ and then remove each occurrence of /*"},

I found https://unix.stackexchange.com/questions/24140/return-only-the-portion-of-a-line-after-a-matching-pattern, but the special characters are a problem.

This won't work:

sed -n -e s@^.*suburl":"*://*/@@g hosts

Would someone please show me how to mark the 2 asterisks in the string so they are seen by regex as literal characters, not wildcards?

Edit:

sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' hosts

Unfortunately, this doesn't work.

Regarding character substitution, thanks for directing me to the references.

I reduced the searched-for string to //*/ and used ASCII character codes like this:

sed -n -e s@^.*\d047\d047\d042\d047@@g hosts

Unfortunately, that didn't output any changes to the lines.

My assumptions are:

^.*something specifies everything up to and including the last occurrence of "something" in a line

sed -n -e s@search@@g deletes (replace with nothing) "search" within a line

So, this line:

sed -n -e s@^.*\d047\d047\d042\d047@@g hosts

Should output everything after //*/ in each line...except it doesn't.

What is incorrect with that line?

Regarding deleting everything including and after the first / AFTER that first operation, yes, that's wanted too.


Answer 1:


This might work for you (GNU sed):

sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' file

Match greedily (the longest string that matches) all characters up to the last ://*/, followed by a group of characters that are not / (captured as \1), followed by a / and the rest of the line, and replace the whole match with the group \1.
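To make the matching concrete, here is a minimal sketch that runs the command on the two sample lines from the question. It assumes GNU sed (the \+ operator is a GNU extension in basic regular expressions); the temporary file and the variable names hosts_sample and result are stand-ins introduced only for this illustration.

```shell
# Recreate the two sample lines from the question in a temporary file.
# (hosts_sample is a hypothetical name, not part of the original post.)
hosts_sample=$(mktemp)
cat > "$hosts_sample" <<'EOF'
{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},
EOF

# Same substitution as above: the single quotes keep the shell from
# touching the backslashes, asterisks and double quotes in the script.
result=$(sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' "$hosts_sample")
printf '%s\n' "$result"

rm -f "$hosts_sample"
```

On these two lines the only occurrence of ://*/ is in the suburl part, so the captured group is the host name that follows it.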

N.B. The sed substitution delimiters are arbitrary; here # is chosen so that / can be matched without escaping. The character * on the left-hand side of the substitution command is normally a metacharacter meaning zero or more of the preceding character or group, so it is escaped as \* to make it match a literal asterisk. Finally, the -n option turns off the usual printing of the pattern space after all the sed commands have been executed. The p flag on the substitution command prints the pattern space after a successful substitution, so the output contains only the URLs, or nothing for lines that do not match.
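The effect of -n and the p flag, and the two-step approach described in the question, can be sketched as follows. This is only an illustration under the same GNU sed assumption; the variable names (line, no_n, with_n, two_step) and the test string 'no url here' are made up for the example.

```shell
line='{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},'

# Without -n and p, a line that does not match is printed unchanged.
no_n=$(printf '%s\n' 'no url here' | sed 's#.*://\*/\([^/]\+\)/.*#\1#')

# With -n and p, a line that does not match produces no output at all.
with_n=$(printf '%s\n' 'no url here' | sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p')

# Two-step variant using @ as the delimiter:
#   1. delete everything up to and including the last ://*/ (asterisk escaped)
#   2. delete everything from the first remaining / to the end of the line
two_step=$(printf '%s\n' "$line" | sed -e 's@^.*://\*/@@' -e 's@/.*@@')

printf 'no_n=[%s]\nwith_n=[%s]\ntwo_step=[%s]\n' "$no_n" "$with_n" "$two_step"
```

Note that the sed script is wrapped in single quotes in every call. When the script is left unquoted on the command line, the shell strips the double quotes and backslashes and may expand the asterisks before sed ever sees them.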



Source: https://stackoverflow.com/questions/51490729/gnu-sed-remove-portion-of-line-after-pattern-match-with-special-characters
