Regex to get title from magnet link: “unterminated address regex”

十年热恋 submitted on 2020-04-30 09:07:53

Question


I am trying to create a simple shell script to get the title from a magnet link and write it to a .out file.

If I try the regex below on regex101.com, it matches. See the screenshot.

&dn=(.*?)&

(https://imge.to/i/Fw26r)

The problem is that I get the following error all the time: "unterminated address regex".

I tried different variations, all with a similar result:

u@d:~/Documents/tmp $ sed -e '\&dn=(.*?)\&$' magnet.txt >> magnet.out
sed: -e expression #1, char 13: unterminated address regex
u@d:~/Documents/tmp $ sed -E '\&dn=(.*?)\&' magnet.txt >> magnet.out
sed: -e expression #1, char 12: unterminated address regex
u@d:~/Documents/tmp $ cat magnet.txt | sed -e '\&dn=(.*?)\&i'
sed: -e expression #1, char 13: unterminated address regex
u@d:~/Documents/tmp $ sed -e '&dn=(.*?)&' magnet.txt >> magnet.out
sed: -e expression #1, char 1: unknown command: `&'

Can you please point me in the right direction?


Answer 1:


The backslash before the closing delimiter is wrong. The first backslash is necessary to say "I want to use a different delimiter than the default slash" but the second backslash says "this is a literal ampersand, not the closing delimiter" (and so sed expects the regex to continue, and complains when it never sees the closing delimiter).
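As a small sketch of the custom-delimiter syntax described above: the example below uses a comma as the delimiter so that the ampersand can stay an ordinary character in the regex. The two input lines are made up for illustration, and the p command is added so the address has something to do.

```shell
# Two made-up input lines; only the first contains "&dn=".
input='magnet:?xt=urn:btih:abc&dn=Example&tr=x
no title here'

# "\," opens the address with "," as the delimiter; the plain "," closes it.
# -n suppresses the default output, p prints only the selected lines.
matches=$(printf '%s\n' "$input" | sed -n '\,&dn=,p')
printf '%s\n' "$matches"
```

Only the opening delimiter takes a backslash. Escaping the closing one as well turns it into a literal character, which is what leaves the address unterminated.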

An address on its own only selects lines; it needs a command such as p to act on them (GNU sed rejects a bare address with "missing command"). With p, sed prints matching lines in their entirety (a second time, without -n, as the default behavior is to print all lines). In any case, it seems that you want the ampersand to be part of the regex, not the delimiter around the regex. If the intent is to extract a string between ampersands, you want something like

sed -n 's/.*&dn=\([^&]*\)&.*/\1/p' magnet.txt

that is, replace the entire line with just the extracted parenthesized expression, then print that line.
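To see the substitution at work without a magnet.txt at hand, here is a minimal sketch that feeds it a single made-up magnet link. The hash, title, and tracker in the link are hypothetical, as is the variable name title.

```shell
# Hypothetical sample magnet link; hash, title and tracker are made up.
link='magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567&dn=Example+Title&tr=udp%3A%2F%2Ftracker.example.org%3A1337'

# Replace the whole line with the captured dn= value; with -n and the p flag,
# the line is printed only when the substitution succeeded.
title=$(printf '%s\n' "$link" | sed -n 's/.*&dn=\([^&]*\)&.*/\1/p')
printf '%s\n' "$title"
```

Note that the pattern requires an ampersand on both sides of the dn= parameter, so a link where dn= is the first or last parameter would produce no output with this exact expression.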

sed is a scripting language. Apart from a few punctuation commands (such as colon and equals), its commands are single letters, which is why the last attempt, beginning with a bare &, fails with "unknown command". The s command - which is the only command many people ever encounter - performs substitutions in text.

Just to reiterate, your original script looks like

sed '/dn=.*?/'

with a custom & delimiter instead of /. As an address, this selects lines containing dn= followed by anything, followed by a literal question mark. An address still needs a command; paired with p, sed would print those lines twice (and all other lines only once), because without -n every line is printed by default anyway.
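The double printing can be reproduced with a short sketch, assuming a p command is appended to the address and that the regex is a basic regular expression, where an unescaped question mark is a literal character. The input lines are made up for illustration.

```shell
# Made-up input: the first line contains "dn=" followed later by a literal "?".
input='has dn=foo?
plain line'

# Without -n, every line is printed once by default; p prints matches again.
out=$(printf '%s\n' "$input" | sed '/dn=.*?/p')
printf '%s\n' "$out"
```

The matching line appears twice and the other line once. Adding -n would leave only the single copy printed by p.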

The non-greedy quantifier .*? is a Perl extension which is not supported in any sed dialect I am familiar with; but expressing exactly what you want is actually better (even when you do have access to non-greedy quantifiers).
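The difference between a greedy .* and an explicit character class can be sketched on a made-up link with more than one parameter after dn=. The variable names greedy and exact are only for illustration.

```shell
# Hypothetical link with two tr= parameters after the title.
link='magnet:?xt=urn:btih:abc&dn=Example+Title&tr=one&tr=two'

# Greedy group: runs up to the LAST ampersand on the line.
greedy=$(printf '%s\n' "$link" | sed -n 's/.*&dn=\(.*\)&.*/\1/p')

# Negated class: stops at the FIRST ampersand after dn=.
exact=$(printf '%s\n' "$link" | sed -n 's/.*&dn=\([^&]*\)&.*/\1/p')

printf 'greedy: %s\n' "$greedy"
printf 'exact:  %s\n' "$exact"
```

The greedy version captures Example+Title&tr=one, while the negated class captures just Example+Title, which is the behavior the non-greedy quantifier was meant to achieve.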



Source: https://stackoverflow.com/questions/57026016/regex-to-get-title-from-magnet-link-unterminated-address-regex
