Properly Matching an IDN URL

刺人心 2020-12-04 00:42

I need help building a regular expression that can properly match a URL inside free text.

  • scheme
    • One of the follow
3 Answers
  • 2020-12-04 01:17

    This will get you most of the way there. If you need it more refined, please provide test data.

    (ftp|https?)://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
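
To see how the pattern above behaves on free text, here is a minimal Python sketch. The function name `find_urls` and the sample hosts are made up for illustration. It relies on Python 3's `re` treating `\w` as Unicode-aware for `str` patterns, which is why a non-ASCII host name matches; other regex engines may need a Unicode flag for the same effect.

```python
import re

# The pattern from the answer above, unchanged. For str patterns in
# Python 3, \w also matches non-ASCII letters and digits.
URL_RE = re.compile(r"(ftp|https?)://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?")


def find_urls(text):
    """Return every substring of text that URL_RE matches (hypothetical helper)."""
    return [m.group(0) for m in URL_RE.finditer(text)]


sample = "Docs at http://bücher.example:8080/a/b.html?x=1 and ftp://files.example.org/pub today"
print(find_urls(sample))
```

Two limitations show up quickly with test data: a period that ends the sentence right after the host is swallowed into the match, and a path containing `-` is cut short because the path character class does not include it.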
    
  • 2020-12-04 01:27

    If you require the protocol and aren't worried too much about false positives, by far the easiest thing to do is to match all non-whitespace characters around ://.
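
A rough Python sketch of that idea follows. The helper name `find_loose_urls` is hypothetical, and stripping trailing punctuation is an extra assumption on my part, not something the answer asks for.

```python
import re

# One or more non-whitespace characters on each side of "://".
LOOSE_URL_RE = re.compile(r"\S+://\S+")

# Assumption: punctuation that ends a sentence or clause is not part of the URL.
TRAILING = ".,;:!?)\"'"


def find_loose_urls(text):
    """Return whitespace-delimited tokens containing '://', minus trailing punctuation."""
    return [m.group(0).rstrip(TRAILING) for m in LOOSE_URL_RE.finditer(text)]


print(find_loose_urls("Visit https://пример.example/путь, or (see ftp://host/file)."))
```

Because the match is purely whitespace-delimited, non-ASCII hosts and paths come through untouched, but so does any punctuation glued to the front of the scheme: `(http://a.example)` yields `(http://a.example`. That is the false-positive trade-off the answer mentions.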

  • 2020-12-04 01:30

    John Gruber, of Daring Fireball fame, had a post recently that detailed his quest for a good URL-recognizing regex string. What he came up with was this:

    \b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

    This apparently does OK with Unicode-containing URLs as well. You'd need to modify it slightly to get the rest of what you're looking for: the scheme, username, password, etc. Alan Storm wrote a piece explaining Gruber's regex pattern, which I definitely needed (regex is so write-once-have-no-clue-how-to-read-ever-again!).
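
If you want to try the pattern quoted above in Python, note that the standard `re` module does not support POSIX classes such as `[:punct:]`. The sketch below is an adaptation, not Gruber's original: it substitutes an explicit ASCII punctuation set built from `string.punctuation`, so Unicode punctuation is not excluded the way a Unicode-aware `[:punct:]` might be. The name `find_urls_gruber` is made up for the example.

```python
import re
import string

# Assumption: ASCII punctuation stands in for the POSIX [:punct:] class.
PUNCT = re.escape(string.punctuation)

GRUBER_RE = re.compile(
    r"\b(([\w-]+://?|www[.])"   # scheme such as "http://" or "mailto:/", or a bare "www."
    r"[^\s()<>]+"               # body: anything except whitespace, parens, angle brackets
    r"(?:\([\w\d]+\)"           # end with a balanced "(word)" group ...
    r"|([^" + PUNCT + r"\s]|/)))"  # ... or a non-punctuation character, or a slash
)


def find_urls_gruber(text):
    """Return every match of the adapted pattern in text (hypothetical helper)."""
    return [m.group(0) for m in GRUBER_RE.finditer(text)]


print(find_urls_gruber("Siehe http://bücher.example/straße, danke"))
print(find_urls_gruber("x (http://en.wikipedia.org/wiki/Rust_(language)) y"))
```

The last alternation is what keeps sentence punctuation out of the match: the URL has to end on a parenthesized word, a slash, or a character that is not punctuation, so the trailing comma and the outer closing parenthesis in the examples are left behind.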
