Extracting top-level and second-level domain from a URL using regex

误落风尘 2020-12-05 08:02

How can I extract only top-level and second-level domain from a URL using regex? I want to skip all lower level domains. Any ideas?

9 Answers

长情又很酷 (original poster)

2020-12-05 08:49
For anyone using JavaScript who wants a simple way to extract the top- and second-level domains, I ended up doing this:
```
'example.aus.com'.match(/\.\w{2,3}\b/g).join('')
```
This matches every period followed by two or three word characters (letters, digits or underscores) and then a word boundary; `join('')` glues the matches back together.
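One caveat with the one-liner: `String.prototype.match` with the `g` flag returns `null` when nothing matches (for example a bare `localhost`), so calling `.join('')` on it throws a `TypeError`. A minimal null-safe sketch; the helper name `topAndSecondLevel` is hypothetical:

```javascript
// Hypothetical helper wrapping the regex from the answer above.
function topAndSecondLevel(input) {
  // match() with the g flag returns null when there are no matches,
  // so fall back to an empty array before joining.
  const parts = input.match(/\.\w{2,3}\b/g) || [];
  return parts.join('');
}

console.log(topAndSecondLevel('example.aus.com')); // .aus.com
console.log(topAndSecondLevel('localhost'));       // (empty string)
```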

Here are some example outputs:
```
'example.aus.com'       // .aus.com
'example.austin.com'    // .com (".austin" has more than 3 characters, so it is skipped)
'example.aus.com/howdy' // .aus.com
'example.co.uk/howdy'   // .co.uk
```
Some people might need something a bit cleverer, but this was enough for me with my particular dataset.
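For something a little cleverer, one option (not the author's approach) is to let the WHATWG `URL` parser, available as a global in modern browsers and Node.js, extract the hostname and then keep only its last two labels. A minimal sketch; `lastTwoLabels` is a hypothetical name, and it assumes `http://` when the input has no scheme:

```javascript
// Hypothetical helper: returns the last two labels of a URL's hostname,
// e.g. "aus.com" for "example.aus.com". Not Public Suffix List aware.
function lastTwoLabels(input) {
  // The URL parser needs a scheme; assume http:// when none is present.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  const { hostname } = new URL(hasScheme ? input : `http://${input}`);
  // Split the hostname into labels and keep the second- and top-level ones.
  return hostname.split('.').slice(-2).join('.');
}

console.log(lastTwoLabels('example.aus.com'));                   // aus.com
console.log(lastTwoLabels('example.co.uk/howdy'));               // co.uk
console.log(lastTwoLabels('https://www.example.com:8080/path')); // example.com
```

Because the path, port and scheme are handled by the parser, only the hostname is ever split. Note that `co.uk` is literally the top- and second-level domain; if you want the registrable domain (`example.co.uk`), you need a Public Suffix List lookup instead.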

Edit

I've realised that quite a few (perfectly valid) second-level domains are longer than three characters. So, again for simplicity, I just removed the character-count quantifier from my regex:
```
'example.aus.com'.match(/\.\w*\b/g).join('')
```
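The edited regex now keeps longer labels such as `austin`, but since it collects every dot-prefixed word it no longer skips lower-level domains, and a file extension in the path is picked up too. A minimal sketch to check this, assuming Node.js or a browser console; `withEditedRegex` is a hypothetical name:

```javascript
// Hypothetical helper applying the edited regex, with a null guard
// because match() returns null when nothing matches.
function withEditedRegex(input) {
  return (input.match(/\.\w*\b/g) || []).join('');
}

console.log(withEditedRegex('example.austin.com'));      // .austin.com
console.log(withEditedRegex('www.example.aus.com'));     // .example.aus.com
console.log(withEditedRegex('example.co.uk/page.html')); // .co.uk.html
```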