Extracting top-level and second-level domain from a URL using regex

误落风尘 2020-12-05 08:02

How can I extract only the top-level and second-level domains from a URL using regex? I want to skip all lower-level subdomains. Any ideas?

9 answers
  •  小蘑菇 (original poster)
    2020-12-05 08:46

    You can likely do that with an expression similar to:

    ^(?:https?:\/\/)(?:w{3}\.)?.*?([^.\r\n\/]+\.)([^.\r\n\/]+\.[^.\r\n\/]{2,6}(?:\.[^.\r\n\/]{2,6})?).*$
    

    and add as many capturing groups as you need to capture the other components of the URL.
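
As a cross-check on the regex approach, here is a minimal Python sketch that swaps in a different technique: it lets `urllib.parse.urlsplit` isolate the hostname and then applies a short regex that keeps only the last two dot-separated labels. The function name `last_two_labels` and the sample URLs are made up for illustration. It assumes the top-level domain is a single label, so a name under a multi-label suffix such as `co.uk` comes back as `co.uk`; handling those properly would need a public-suffix list.

```python
import re
from urllib.parse import urlsplit

# Matches the last two dot-separated labels at the end of a hostname.
LAST_TWO = re.compile(r"([^.]+)\.([^.]+)$")


def last_two_labels(url):
    """Return 'second-level.top-level' for a URL, or None if it cannot be found.

    Hypothetical helper: assumes the URL includes a scheme (e.g. https://)
    and that the top-level domain is a single label.
    """
    host = urlsplit(url).hostname  # lowercased host, without port or credentials
    if host is None:
        return None
    match = LAST_TWO.search(host)
    return match.group(0) if match else None


print(last_two_labels("https://www.sub.example.com/path?q=1"))  # example.com
print(last_two_labels("http://example.com"))                    # example.com
print(last_two_labels("https://localhost/"))                    # None
```

Splitting off the hostname first keeps the regex short, because the pattern no longer has to account for the scheme, port, path, or query string.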

    Demo


    If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. There you can also see how it matches against some sample inputs.


    RegEx Circuit

    jex.im visualizes regular expressions.
