Using regex to extract URLs from plain text with Perl

梦如初夏 2020-12-16 05:12

How can I use Perl regexps to extract all URLs of a specific domain (with possibly variable subdomains) with a specific extension from plain text? I have tried:



        
7 answers
  •  予麋鹿 (OP)
    2020-12-16 05:57

    I thought that shouldn't happen because I am using .*?, which ought to be non-greedy and give me the smallest match.

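    It does, but only in the sense of "the smallest match going right". The regex engine tries starting positions from left to right and returns the first one at which the whole pattern succeeds. A match starting at the first http succeeds, so later starting points are never tried; .*? only keeps the match as short as possible from that starting point onward.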
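    A minimal sketch of this behavior, assuming a hypothetical sample string in which an unrelated URL comes before the wanted ones. The lazy pattern starts at the first http and stretches until it reaches a homepage.com path ending in .gif. Constraining each part so it cannot cross a URL boundary (subdomain labels made of word characters and hyphens, a path with no whitespace) keeps each match inside one URL and also handles variable subdomains. The character classes, and accepting https as well as http, are assumptions about your input, not a full URL grammar.

```perl
use strict;
use warnings;

# Hypothetical sample text: an unrelated URL comes before the wanted ones.
my $text = 'See http://other.org/page.html and http://img.homepage.com/pics/cat.gif '
         . 'or http://homepage.com/dog.gif for details.';

# Lazy quantifiers still start at the leftmost possible position,
# so this match begins at the first "http" in the string.
my ($naive) = $text =~ m{(http://.*?homepage\.com/.*?\.gif)};
print "naive: $naive\n";

# Constrain each part so a match cannot run across URL boundaries:
#   https?://       http or https (assumption: both are wanted)
#   (?:[-\w]+\.)*   zero or more subdomain labels such as "img."
#   homepage\.com   the fixed domain, with the dot escaped
#   /\S*?\.gif      a path with no whitespace, ending in .gif
#   \b              .gif must not be the prefix of a longer word like .gifs
# With /g in list context, every captured URL is returned.
my @urls = $text =~ m{(https?://(?:[-\w]+\.)*homepage\.com/\S*?\.gif)\b}g;
print "found: $_\n" for @urls;
```

    The naive match spans from http://other.org/page.html through .../cat.gif, while the constrained pattern returns the two homepage.com URLs separately. The \S*? path assumes URLs are separated by whitespace; text such as "a.html,http://..." would need a tighter class like [^\s,].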

    Note for the future: you don't have to escape the slashes, because you don't have to use slashes as your delimiter. You don't have to escape the colon either. Next time, just do this:

    m|(http://.*?homepage\.com/.*?\.gif)|
    

    or

    m#(http://.*?homepage\.com/.*?\.gif)#
    

    or

    m<(http://.*?homepage\.com/.*?\.gif)>
    

    or one of many other delimiter characters; see the perlop documentation ("Quote and Quote-like Operators") and perlre.
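    A quick check that the delimiter only changes quoting, not meaning: this minimal sketch (the sample string is a placeholder) runs the same pattern with four delimiters, and all four capture the same URL. It also escapes the dot in homepage\.com, since an unescaped dot matches any character.

```perl
use strict;
use warnings;

# Placeholder sample text.
my $text = 'Logo: http://homepage.com/img/logo.gif';

# The same pattern written with four different delimiters.
# Only the slash-delimited version needs its "/" characters escaped;
# with |, # or <> the slashes are ordinary characters.
my @matches = (
    $text =~ m/(http:\/\/.*?homepage\.com\/.*?\.gif)/,
    $text =~ m|(http://.*?homepage\.com/.*?\.gif)|,
    $text =~ m#(http://.*?homepage\.com/.*?\.gif)#,
    $text =~ m<(http://.*?homepage\.com/.*?\.gif)>,
);
print "$_\n" for @matches;
```

    Pick a delimiter that does not occur in the pattern itself: with m|...|, for example, an alternation bar inside the regex would have to be escaped.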
