Decode URL Unix/Bash Command Line (without sed) [duplicate]

Submitted by 梦想的初衷 on 2019-12-11 04:25:54

Question


I am scraping a website with curl and parsing out what I need.

The URLs are returned with ASCII-encoded characters like

GET v2.12/...?fields=&#123;fieldname_of_type_Tab&#125; HTTP/1.1

How can I convert this to UTF-8 (char) directly from the command line (ideally something I can pipe | to) so that the result is...

GET v2.12/...?fields={fieldname_of_type_Tab} HTTP/1.1

EDIT: There are a number of solutions with sed, but the regex that goes along with them is quite ugly. Since the provided answer leveraging perl is very clean, I hope we can leave this question open.


Answer 1:


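These are not really UTF-8 but HTML entities (here, numeric character references: &#123; stands for { and &#125; for }).

Try this with perl, using the HTML::Entities module: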

$ echo 'http://domain.tld/?fields=&#123;fieldname_of_type_Tab&#125;' |
    perl -MHTML::Entities -pe 'decode_entities($_)'

Output:

http://domain.tld/?fields={fieldname_of_type_Tab}
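
HTML::Entities is not part of a stock perl installation on every system, so a fallback may be useful. The following is only a sketch of the same pipe-friendly idea, assuming python3 is on the PATH; it relies on html.unescape from the Python 3 standard library. The function name html_unescape is a hypothetical helper chosen for illustration, not something from the original answer.

```shell
# Hypothetical helper: decode HTML entities read from stdin, line by line.
# Assumption: python3 is installed; html.unescape ships with its standard library.
html_unescape() {
    python3 -c 'import html, sys
for line in sys.stdin:
    sys.stdout.write(html.unescape(line))'
}

# Same input as the perl example; &#123; and &#125; decode to { and }.
printf '%s\n' 'http://domain.tld/?fields=&#123;fieldname_of_type_Tab&#125;' | html_unescape
```

Because the function reads stdin and writes stdout, it can sit at the end of a curl pipeline in the same way as the perl one-liner.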


Source: https://stackoverflow.com/questions/48998515/decode-url-unix-bash-command-line-without-sed
