Top level domain from URL in C#

误落风尘 2020-11-29 10:11

I am using C# and ASP.NET for this.

We receive a lot of "strange" requests on our IIS 6.0 servers and I want to log and catalog these by domain.

E.g., we get

7 Answers
  • 2020-11-29 10:42

    Use a regular expression:

    ^https?://(?:([\w.-]+)\.)?([\w-]+)\.(com|co\.uk|com\.au)$
    

    This matches any host-only URL (no path, port, or query string) that ends with one of the TLDs you are interested in. Extend the list with as many as you want. The capturing groups contain the subdomain, hostname, and TLD respectively.
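
A minimal sketch of how a pattern of this kind could be applied from C#. The class name SuffixRegex and the method GetRegistrableDomain are hypothetical, the suffix list is illustrative only, and the grouped alternation with escaped dots is an assumption about what the pattern is meant to do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixRegex
{
    // Hypothetical pattern: the suffix list (com, co.uk, com.au) is illustrative only.
    // The alternation is grouped and the dots are escaped so that each capture
    // group lines up with one part of the host name.
    private static readonly Regex HostPattern = new Regex(
        @"^https?://(?:([\w.-]+)\.)?([\w-]+)\.(com|co\.uk|com\.au)$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns "name.suffix" (for example "test.co.uk"), or null when the
    // URL is not absolute or does not end in one of the listed suffixes.
    public static string GetRegistrableDomain(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return null;

        // Match only scheme + host, because the pattern does not allow a path.
        Match m = HostPattern.Match(uri.Scheme + "://" + uri.Host);
        return m.Success ? m.Groups[2].Value + "." + m.Groups[3].Value : null;
    }
}
```

Passing the URL through Uri first means a request such as http://sub.test.co.uk/some/path is reduced to its scheme and host before matching. Any suffix missing from the list simply yields null, which is the main weakness of the regex approach.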

  • 2020-11-29 10:45

    You can use the Nager.PublicSuffix NuGet package. It uses the same data source that browser vendors use.

    NuGet

    PM> Install-Package Nager.PublicSuffix
    

    Example

    var domainParser = new DomainParser(new WebTldRuleProvider());
    
    var domainName = domainParser.Get("sub.test.co.uk");
    //domainName.Domain = "test";
    //domainName.Hostname = "sub.test.co.uk";
    //domainName.RegistrableDomain = "test.co.uk";
    //domainName.SubDomain = "sub";
    //domainName.TLD = "co.uk";
    
  • 2020-11-29 10:52

    I've written a library for use in .NET 2+ to help pick out the domain components of a URL.

    More details are on GitHub. One benefit over the previous options is that it can download the latest data from http://publicsuffix.org automatically (once per month), so its output should be more or less on a par with what web browsers use to establish domain security boundaries (i.e. pretty good).

    It's not perfect yet, but it suits my needs and shouldn't take much work to adapt to other use cases, so please fork it and send a pull request if you want.

  • 2020-11-29 10:54

    This is not possible without an up-to-date database of the different domain levels.

    Consider:

    s1.moh.gov.cn
    moh.gov.cn
    s1.google.com
    google.com
    

    At which level do you want to cut the domain? That depends entirely on the TLD, SLD, ccTLD, and so on. Because ccTLDs are under the control of individual countries, a registry may define special second-level domains that you do not know about.
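
To illustrate why a lookup table is needed, here is a minimal sketch of a longest-suffix match. The class SuffixLookup and its tiny KnownSuffixes set are hypothetical; a real table would have to be loaded from a maintained source such as the Public Suffix List.

```csharp
using System;
using System.Collections.Generic;

public static class SuffixLookup
{
    // Hypothetical, hand-picked sample. A real table would be loaded from a
    // maintained source such as the Public Suffix List.
    private static readonly HashSet<string> KnownSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "com", "cn", "gov.cn", "uk", "co.uk"
        };

    // Returns the longest known suffix plus one more label,
    // or null when the host is itself a suffix or no suffix matches.
    public static string GetRegistrableDomain(string host)
    {
        if (KnownSuffixes.Contains(host))
            return null;

        string[] labels = host.Split('.');

        // Start at i = 1 so at least one label is left in front of the suffix.
        // Scanning left to right finds the longest matching suffix first.
        for (int i = 1; i < labels.Length; i++)
        {
            string suffix = string.Join(".", labels, i, labels.Length - i);
            if (KnownSuffixes.Contains(suffix))
                return labels[i - 1] + "." + suffix;
        }
        return null;
    }
}
```

With this sample table, s1.moh.gov.cn and moh.gov.cn both map to moh.gov.cn, while s1.google.com and google.com both map to google.com. Removing "gov.cn" from the set would change the first answer to gov.cn, which is exactly the dependency on the database described above.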

  • 2020-11-29 10:54
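    string host = uri.Host.ToLowerInvariant().Replace("www.", "");
    string suffix = host.Substring(host.IndexOf('.'));
    
    • returns ".com" for Uri uri = new Uri("http://stackoverflow.com/questions/4643227/top-level-domain-from-url-in-c");

    • returns ".co.jp" for Uri uri = new Uri("http://stackoverflow.co.jp");

    • returns ".s1.moh.gov.cn" for Uri uri = new Uri("http://stackoverflow.s1.moh.gov.cn");

    This removes "www." and then the first remaining label, so the result is the real suffix only when exactly one label precedes it, and a host with no dot at all makes Substring throw.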

  • 2020-11-29 10:57

    There may be cases where this returns something other than what you want, but country-code TLDs are the only TLDs that are two characters long, and they may or may not use a short second level (two or three characters, as in co.uk). This will therefore give you what you want in most cases:

    string GetRootDomain(string host)
    {
        string[] domains = host.Split('.');
    
        if (domains.Length >= 3)
        {
            int c = domains.Length;
            // handle international country code TLDs 
            // www.amazon.co.uk => amazon.co.uk
            if (domains[c - 1].Length < 3 && domains[c - 2].Length <= 3)
                return string.Join(".", domains, c - 3, 3);
            else
                return string.Join(".", domains, c - 2, 2);
        }
        else
            return host;
    }
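
To see where the length heuristic holds and where it breaks, it helps to run it on a few hosts. The sketch below repeats the same logic inside a hypothetical RootDomainHeuristic class so that it compiles on its own. The sample hosts in the note that follows are illustrative.

```csharp
using System;

public static class RootDomainHeuristic
{
    // Same logic as GetRootDomain above, repeated so the sketch is self-contained.
    public static string GetRootDomain(string host)
    {
        string[] domains = host.Split('.');
        if (domains.Length < 3)
            return host;

        int c = domains.Length;

        // A two-letter last label preceded by a label of at most three
        // characters is treated as a country-code suffix such as co.uk.
        bool looksLikeCountryCode =
            domains[c - 1].Length < 3 && domains[c - 2].Length <= 3;

        return looksLikeCountryCode
            ? string.Join(".", domains, c - 3, 3)
            : string.Join(".", domains, c - 2, 2);
    }
}
```

GetRootDomain("www.amazon.co.uk") gives amazon.co.uk and GetRootDomain("s1.google.com") gives google.com, as intended. A short name registered directly under a country code is misread, though: GetRootDomain("www.abc.de") comes back unchanged as www.abc.de, because abc is mistaken for a second-level label.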
    