Question
I am trying to create a simple shell script to get the title from a magnet link and write it to a .out file.
If I try the regex below on regex101.com, there is a hit. See screenshot.
&dn=(.*?)&
(https://imge.to/i/Fw26r)
The problem is that I get the following error all the time: "unterminated address regex".
I tried different options, yet same result:
u@d:~/Documents/tmp $ sed -e '\&dn=(.*?)\&$' magnet.txt >> magnet.out
sed: -e expression #1, char 13: unterminated address regex
u@d:~/Documents/tmp $ sed -E '\&dn=(.*?)\&' magnet.txt >> magnet.out
sed: -e expression #1, char 12: unterminated address regex
u@d:~/Documents/tmp $ cat magnet.txt | sed -e '\&dn=(.*?)\&i'
sed: -e expression #1, char 13: unterminated address regex
u@d:~/Documents/tmp $ sed -e '&dn=(.*?)&' magnet.txt >> magnet.out
sed: -e expression #1, char 1: unknown command: `&'
Can you please point me in the right direction?
Answer 1:
The backslash before the closing delimiter is wrong. The first backslash is necessary to say "I want to use a different delimiter than the default slash" but the second backslash says "this is a literal ampersand, not the closing delimiter" (and so sed expects the regex to continue, and complains when it never sees the closing delimiter).
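As a minimal sketch of the delimiter rule described above: the leading backslash selects the delimiter, and the closing delimiter is written without a backslash. The comma delimiter and the two sample lines are assumptions chosen for illustration, not taken from the question.

```shell
# Hypothetical sample input: one line with a dn= parameter, one without.
input='magnet:?xt=urn:btih:abc&dn=Some+Name&tr=x
a line without the parameter'

# \, opens an address that uses a comma as the delimiter;
# the closing comma is NOT escaped. p prints the selected lines,
# and -n suppresses the default printing of every line.
matched=$(printf '%s\n' "$input" | sed -n '\,dn=,p')
printf '%s\n' "$matched"
```

Escaping the closing comma as well would reproduce the "unterminated address regex" error from the question.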
An address expression on its own only selects lines and still needs a command; with the p command, sed would print matching lines in their entirety (a second time, without -n, as the default behavior is to print all lines). It also seems that you want the ampersand to be part of the regex, not the delimiter around the regex. If the intent is to extract a string between ampersands, you want something like
sed -n 's/.*&dn=\([^&]*\)&.*/\1/p' magnet.txt
that is, replace the entire line with just the extracted parenthesized expression, then print that line.
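For illustration, here is the same substitution run against a single made-up magnet link. The link, its hash, and the tracker host are hypothetical sample data; a real magnet.txt would contain your own links.

```shell
# Hypothetical sample line standing in for the contents of magnet.txt.
link='magnet:?xt=urn:btih:0123456789abcdef&dn=Example+Title&tr=udp%3A%2F%2Ftracker.example%3A80'

# -n suppresses the default output; the p flag on s/// prints only
# lines where the substitution actually happened.
title=$(printf '%s\n' "$link" | sed -n 's/.*&dn=\([^&]*\)&.*/\1/p')
printf '%s\n' "$title"
```

Note that this pattern requires an ampersand after the title, so a link where dn= is the last parameter would print nothing.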
sed is a scripting language. Most commands other than slash (and colon and equals) are single-letter alphabetics; the s command - which is the only command many people ever encounter - performs substitutions in text.
Just to reiterate, your original script looks like
sed '/dn=.*?/'
with a custom & delimiter instead of /. This looks for lines containing dn= followed by anything, followed by a literal question mark. An address alone is not a complete script, though; paired with the p command, sed would print those lines twice (and all other lines only once).
The non-greedy quantifier .*? is a Perl extension which is not supported in any sed dialect I am familiar with; but expressing exactly what you want is actually better (even when you do have access to non-greedy quantifiers).
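A small sketch of why spelling out "anything except an ampersand" matters: with a greedy .* inside the group, the capture runs on to the last ampersand on the line. The sample line is hypothetical.

```shell
# Hypothetical sample line with more than one parameter after dn=.
line='a&dn=Title&tr=one&tr=two'

# Greedy .* in the group captures up to the LAST ampersand.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*&dn=\(.*\)&.*/\1/p')

# [^&]* stops at the first ampersand after dn=, which is what we want.
exact=$(printf '%s\n' "$line" | sed -n 's/.*&dn=\([^&]*\)&.*/\1/p')

printf 'greedy: %s\n' "$greedy"
printf 'exact:  %s\n' "$exact"
```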
Source: https://stackoverflow.com/questions/57026016/regex-to-get-title-from-magnet-link-unterminated-address-regex