Extracting top-level and second-level domain from a URL using regex

误落风尘 2020-12-05 08:02

How can I extract only top-level and second-level domain from a URL using regex? I want to skip all lower level domains. Any ideas?

9 Answers
  •  长情又很酷
    2020-12-05 08:49

    For anyone using JavaScript and wanting a simple way to extract the top- and second-level domains, I ended up doing this:

    'example.aus.com'.match(/\.\w{2,3}\b/g).join('')
    

    This matches every period followed by two or three word characters (letters, digits or underscores) and then a word boundary; join('') glues the matches back together.
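To see why the join('') is needed: with the g flag, String.prototype.match returns an array of every matched substring, or null when nothing matches. A minimal sketch, where the helper name topLevelParts is hypothetical and the || [] fallback is an addition that keeps inputs without any match from throwing:

```javascript
// With the global (g) flag, String.prototype.match returns an array of every
// matched substring, or null when there is no match at all.
const pattern = /\.\w{2,3}\b/g;

// Hypothetical helper: returns the matched parts, never null.
function topLevelParts(host) {
  return host.match(pattern) || [];
}

console.log(JSON.stringify(topLevelParts("example.aus.com")));    // [".aus",".com"]
console.log(topLevelParts("example.aus.com").join(""));           // .aus.com
// Without the || [] fallback, calling .join() here would throw a TypeError.
console.log(JSON.stringify(topLevelParts("localhost").join(""))); // ""
```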

    Here's some example outputs:

    'example.aus.com'       // .aus.com
    'example.austin.com'    // .com (".austin" exceeds the 3-character limit, so it is skipped)
    'example.aus.com/howdy' // .aus.com
    'example.co.uk/howdy'   // .co.uk
    

    Some people might need something a bit cleverer, but this was enough for me with my particular dataset.
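For readers who do need something cleverer, here is a rough sketch of a different technique (not the regex approach above): let the built-in URL parser isolate the hostname, then keep its last two labels, or the last three when the final two form a known multi-part suffix such as co.uk. The registrableDomain function and the three-entry MULTI_PART_SUFFIXES set are hypothetical placeholders; a real implementation would consult the full Public Suffix List instead of a hard-coded set.

```javascript
// Hypothetical, deliberately tiny stand-in for the Public Suffix List.
// Real code should load the full list rather than hard-coding entries.
const MULTI_PART_SUFFIXES = new Set(["co.uk", "org.uk", "com.au"]);

function registrableDomain(url) {
  // Let the URL parser strip the scheme, port, path, query and fragment.
  // Inputs without a scheme get "http://" prepended so URL() accepts them.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(url) ? url : `http://${url}`;
  const labels = new URL(withScheme).hostname.split(".");

  // Keep three labels when the last two form a known multi-part suffix,
  // otherwise keep the last two (second-level + top-level).
  const lastTwo = labels.slice(-2).join(".");
  const keep = MULTI_PART_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

console.log(registrableDomain("example.aus.com/howdy"));         // aus.com
console.log(registrableDomain("https://www.example.co.uk/a.b")); // example.co.uk
console.log(registrableDomain("a.b.example.com:8080"));          // example.com
```

Because the hostname is extracted first, periods in the path or a port number never reach the label logic.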

    Edit

    I've realised there are actually quite a few valid second-level domains longer than 3 characters. So, again for simplicity, I just removed the character-count quantifier ({2,3}) from my regex:

    'example.aus.com'.match(/\.\w*\b/g).join('')
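A quick way to check the revised pattern is to run it over a few inputs. The sketch below wraps it in a hypothetical domainSuffix helper with the same || [] guard against null. Because every period-prefixed run of word characters is kept, it mostly amounts to dropping the first label, so deeper subdomains and dotted paths end up in the result:

```javascript
// The revised pattern: a period followed by any run of word characters.
// \w matches [A-Za-z0-9_] only, so a hyphen ends the current match.
const revised = /\.\w*\b/g;

// Hypothetical helper: match() returns null when there is no period.
function domainSuffix(input) {
  return (input.match(revised) || []).join("");
}

console.log(domainSuffix("example.austin.com"));    // .austin.com
console.log(domainSuffix("www.example.com"));       // .example.com
// Every label after the first is kept, so deeper subdomains survive:
console.log(domainSuffix("a.b.example.com"));       // .b.example.com
// Periods in the path are matched too:
console.log(domainSuffix("example.com/page.html")); // .com.html
// The "-site" part is never preceded by a period, so it is dropped:
console.log(domainSuffix("example.my-site.com"));   // .my.com
```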
    
