Regex to extract subdomain from URL?

名媛妹妹 2020-12-05 15:07

I have a bunch of domain names coming in like this:

http://subdomain.example.com (example.com is always example.com, but the subdomain varies).

I need "subd

7 answers
  • 2020-12-05 15:34

    Purely the subdomain string (result is $1):

    ^http://([^.]+)\.domain\.com
    

    Making http:// optional (result is $2):

    ^(http://)?([^.]+)\.domain\.com
    

    Making the http:// and the subdomain optional (result is $3):

    (http://)?(([^.]+)\.)?domain\.com
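As a rough illustration of how the group number shifts between the three patterns above, here is a minimal JavaScript sketch. The sample inputs and the pick helper are made up for the example, and domain.com is only a placeholder for the real domain.

```javascript
// Three variants from the answer above; domain.com is a placeholder domain.
const strict = /^http:\/\/([^.]+)\.domain\.com/;            // subdomain in group 1
const optionalScheme = /^(http:\/\/)?([^.]+)\.domain\.com/; // subdomain in group 2
const optionalBoth = /(http:\/\/)?(([^.]+)\.)?domain\.com/; // subdomain in group 3

// Hypothetical helper: return the requested group, or null when it is absent.
function pick(regex, input, group) {
  const m = input.match(regex);
  return m && m[group] !== undefined ? m[group] : null;
}

const a = pick(strict, "http://blog.domain.com", 1);   // scheme and subdomain present
const b = pick(optionalScheme, "blog.domain.com", 2);  // no scheme
const c = pick(optionalBoth, "http://domain.com", 3);  // no subdomain
console.log(a, b, c);
```

With the third pattern, an input without a subdomain still matches, but group 3 is left unset, which the helper reports as null.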
    
  • 2020-12-05 15:34

    To match subdomains that contain a dot character, I used this one:

    https?:\/\/?(?:([^*]+)\.)?domain\.com
    

    This captures everything between the protocol and the domain:

    https://sub.domain.com (sub)

    https://sub.sub.domain.com (sub.sub) ...
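The behaviour described above can be checked with a short JavaScript sketch. The subdomainOf helper and the sample URLs are assumptions made for illustration; the pattern itself is the one from the answer.

```javascript
// Pattern from the answer above; domain.com is a placeholder domain.
const multiLevel = /https?:\/\/?(?:([^*]+)\.)?domain\.com/;

// Hypothetical helper: group 1 holds everything between the scheme and domain.com.
function subdomainOf(url) {
  const m = url.match(multiLevel);
  return m && m[1] !== undefined ? m[1] : null;
}

const one = subdomainOf("https://sub.domain.com");
const two = subdomainOf("https://sub.sub.domain.com");
const none = subdomainOf("https://domain.com");
console.log(one, two, none);
```

Because [^*]+ excludes only the asterisk, it can also run across a slash, so an input such as https://a.com/x.domain.com would yield a.com/x. Narrowing the class to [^/]+ may be safer if paths can appear in the input.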

  • 2020-12-05 15:35

    The problem with the regexes above is that if you do not know the protocol or the domain suffix, you will get unexpected results. Here is a little regex that accounts for those situations. :D

    /(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i  //javascript
    

    This should always return your subdomain (if present) in group 1. Here it is in a JavaScript example, but it should also work in any other engine that supports positive look-ahead assertions:

    // EXAMPLE of use
    var regex = /(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i
      , whoKnowsWhatItCouldBe = [
            "www.mydomain.com/whatever/my-site" // matches: www
          , "mydomain.com" // does not match
          , "http://mydomain.com" // does not match
          , "https://mydomain.com" // does not match
          , "banana.com/somethingelse" // does not match
          , "https://banana.com/somethingelse.org" // does not match
          , "http://what-ever.mydomain.mu" // matches: what-ever
          , "dev-www.thisdomain.com/whatever" // matches: dev-www
          , "hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
          , "http://hot-MamaSitas.SomE_doma-in.au.xxx" // matches: hot-MamaSitas
          , "пуст.пустыня.ru" // even non-English chars! Woohoo! matches: пуст
          , "пустыня.ru" // does not match
        ];
    
    // Run a loop and test it out.
    for ( var i = 0, length = whoKnowsWhatItCouldBe.length; i < length; i++ ){
        var result = whoKnowsWhatItCouldBe[i].match(regex);
        if(result != null){
          // YAY! We have a match!
        } else {
          // Boo... No subdomain was found
        }
    }
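The loop above only branches on whether a match exists. A minimal sketch of actually reading group 1 could look like the following; the extractSubdomain helper name is an assumption for illustration, and the inputs are taken from the list above.

```javascript
// Same pattern as in the answer above.
const anySubdomain = /(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i;

// Hypothetical helper: returns the captured subdomain, or null when none is found.
function extractSubdomain(input) {
  const result = input.match(anySubdomain);
  return result === null ? null : result[1];
}

const samples = [
  "www.mydomain.com/whatever/my-site",
  "http://what-ever.mydomain.mu",
  "https://mydomain.com",
];
const extracted = samples.map(extractSubdomain);
console.log(extracted);
```

Since (.*?) is lazy, a nested name such as a.b.example.com would yield only the first label (a) in group 1.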
    
  • 2020-12-05 15:39

    Use the first capture group of (dots escaped so they match only a literal dot):

    http://(.*)\.example\.com
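To illustrate why it matters whether the dots in such a pattern are escaped, here is a small JavaScript sketch comparing the two spellings. Both sample inputs are made up for the example.

```javascript
// Unescaped dots: each "." matches any character.
const loose = /http:\/\/(.*).example.com/;
// Escaped dots: only a literal "." is accepted.
const escaped = /http:\/\/(.*)\.example\.com/;

const sample = "http://shop.example.com";    // made-up input
const lookalike = "http://shopXexampleYcom"; // made-up input with no dots at all

const looseOk = loose.exec(sample)[1];
const looseLookalike = loose.test(lookalike);
const escapedOk = escaped.exec(sample)[1];
const escapedLookalike = escaped.test(lookalike);
console.log(looseOk, looseLookalike, escapedOk, escapedLookalike);
```

Both spellings capture the subdomain from a well-formed URL, but only the escaped one rejects the lookalike string.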
    
  • 2020-12-05 15:49
    /(http:\/\/)?(([^.]+)\.)?domain\.com/
    

    Then $3 (or \3) will contain "subdomain" if one was supplied.

    If you want to have the subdomain in the first group, and your regex engine supports non-capturing groups (shy groups), use this as suggested by palindrom:

    /(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/
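The effect of the non-capturing groups can be seen in a short JavaScript sketch. The input string is a made-up sample, and domain.com is only a placeholder.

```javascript
// Capturing version: the subdomain lands in group 3.
const capturing = /(http:\/\/)?(([^.]+)\.)?domain\.com/;
// Non-capturing version: (?:...) groups are not numbered, so it lands in group 1.
const nonCapturing = /(?:http:\/\/)?(?:([^.]+)\.)?domain\.com/;

const input = "http://subdomain.domain.com"; // made-up sample input

const viaThird = capturing.exec(input)[3];
const viaFirst = nonCapturing.exec(input)[1];
// The match array holds the full match plus one slot per capturing group.
const groupCount = nonCapturing.exec(input).length - 1;
console.log(viaThird, viaFirst, groupCount);
```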
    
  • 2020-12-05 15:52

    It should just be

    \Qhttp://\E(\w+)\.domain\.com
    

    The subdomain will be in the first group.
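The \Q...\E quoting used above is available in Perl, PCRE and Java, but JavaScript's RegExp has no equivalent syntax. A rough JavaScript approximation, assuming a hypothetical escapeRegExp helper that backslash-escapes metacharacters, might look like this:

```javascript
// Hypothetical helper: backslash-escape regex metacharacters in a literal string.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the equivalent of \Qhttp://\E(\w+)\.domain\.com from quoted pieces.
const quoted = new RegExp(
  escapeRegExp("http://") + "(\\w+)" + escapeRegExp(".domain.com")
);

const plain = quoted.exec("http://blog.domain.com");         // made-up input
const hyphenated = quoted.exec("http://my-blog.domain.com"); // made-up input
console.log(plain && plain[1], hyphenated);
```

Note that \w+ does not match a hyphen, so a hyphenated subdomain is not matched by this pattern; a class such as [\w-]+ may be needed if such names can occur.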
