What is the fastest way to get the domain/host name from a URL?

不知归路 2020-12-08 15:48

I need to go through a large list of URL strings and extract the domain name from each one.

For example:

http://www.stackoverflow.com/questions
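For comparison, here is a minimal baseline that uses the standard java.net.URI class over a list of strings. It is a sketch rather than a benchmarked "fastest" solution. The class name HostExtractor and the method extractHosts are hypothetical, the sketch assumes each input includes a scheme such as "http://", and strings that URI cannot parse are mapped to null. Note that getHost() keeps any leading "www.".

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class HostExtractor {

    // Hypothetical helper: returns the host part of each URL string, or null
    // when the string cannot be parsed or has no host component
    // (for example, when the scheme is missing).
    static List<String> extractHosts(List<String> urls) {
        List<String> hosts = new ArrayList<>(urls.size());
        for (String url : urls) {
            try {
                hosts.add(new URI(url).getHost());
            } catch (URISyntaxException e) {
                hosts.add(null); // malformed input, e.g. contains a space
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("http://www.stackoverflow.com/questions");
        System.out.println(extractHosts(urls)); // [www.stackoverflow.com]
    }
}
```

Catching URISyntaxException per element keeps one bad string from aborting the whole batch.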

8 Answers
  •  猫巷女王i
    2020-12-08 16:29

    I wrote a method (see below) that extracts a URL's domain name using simple String matching. It takes the part between the first "://" (or index 0 if there is no "://") and the first subsequent "/" (or the end of the String if there is no subsequent "/"). A leading "www" followed by any characters up to the first "." (e.g. "www." or "www2.") is then chopped off. I'm sure there are cases where this won't be good enough, but it should work in most cases!

    I read here that the java.net.URI class could do this (and that it is preferred over the java.net.URL class), but I ran into problems with URI. Notably, URI.getHost() returns null if the URL does not include a scheme, i.e. the "http(s)://" part.
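The null-host behaviour described above can be reproduced, and one common workaround is to prepend a default scheme before parsing. The sketch below is an assumption-laden illustration, not part of the original answer: the class SchemelessHost and the helper hostOf are hypothetical, and the "contains ://" test is a deliberately crude way to decide whether a scheme is present.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemelessHost {

    // Hypothetical helper: if the input has no "://", prepend a default
    // "http://" so that URI parses the host as the authority component.
    // Returns null for strings that URI cannot parse.
    static String hostOf(String url) {
        String withScheme = url.contains("://") ? url : "http://" + url;
        try {
            return new URI(withScheme).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String noScheme = "www.stackoverflow.com/questions";
        // With no scheme, the whole string is parsed as a relative path,
        // so there is no authority for getHost() to read and it returns null.
        System.out.println(URI.create(noScheme).getHost()); // null
        System.out.println(hostOf(noScheme));               // www.stackoverflow.com
    }
}
```

Assuming "http" is only a guess about the scheme; it does not matter for extracting the host, but it would if the code went on to use the port or open a connection.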

    /**
     * Extracts the domain name from {@code url}
     * by means of String manipulation
     * rather than using the {@link URI} or {@link URL} class.
     *
     * @param url is non-null.
     * @return the domain name within {@code url}.
     */
    public String getUrlDomainName(String url) {
      String domainName = url; // Strings are immutable, so no copy is needed
    
      int index = domainName.indexOf("://");
    
      if (index != -1) {
        // keep everything after the "://"
        domainName = domainName.substring(index + 3);
      }
    
      index = domainName.indexOf('/');
    
      if (index != -1) {
        // keep everything before the '/'
        domainName = domainName.substring(0, index);
      }
    
      // check for and remove a preceding 'www'
      // followed by any sequence of characters (non-greedy)
      // followed by a '.'
      // from the beginning of the string
      domainName = domainName.replaceFirst("^www.*?\\.", "");
    
      return domainName;
    }
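To see what the method above does on a few inputs, including the kinds of cases the author expects it may not handle, the sketch below wraps the same logic in a hypothetical DomainNameDemo class with a static method so it can run from main. The results in the comments come from tracing the code: a port is kept, a query string not preceded by "/" is kept, and the "^www.*?\\." pattern also strips any first label that merely starts with "www".

```java
public class DomainNameDemo {

    // Same String-matching logic as getUrlDomainName above,
    // made static so it can be called from main.
    static String getUrlDomainName(String url) {
        String domainName = url;
        int index = domainName.indexOf("://");
        if (index != -1) {
            domainName = domainName.substring(index + 3); // drop the scheme
        }
        index = domainName.indexOf('/');
        if (index != -1) {
            domainName = domainName.substring(0, index);  // drop the path
        }
        // drop a leading "www" plus anything up to the first '.'
        return domainName.replaceFirst("^www.*?\\.", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.stackoverflow.com/questions", // stackoverflow.com
            "www.stackoverflow.com/questions",        // stackoverflow.com (no scheme needed)
            "https://www.example.com:8080/path",      // example.com:8080 (port kept)
            "http://example.com?q=1",                 // example.com?q=1 (query kept)
            "http://wwwfoo.com/",                     // com (label starting with "www" stripped)
        };
        for (String s : samples) {
            System.out.println(s + " -> " + getUrlDomainName(s));
        }
    }
}
```

Unlike URI.getHost(), this approach needs no scheme, which is the trade-off the answer describes.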
    
