Top level domain from URL in C#

情到浓时终转凉″ submitted on 2019-11-27 02:05:27

I needed the same thing, so I wrote a class that you can copy and paste into your solution. It uses a hard-coded string array of TLDs: http://pastebin.com/raw.php?i=VY3DCNhp

Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.com/path/page.htm"));

outputs microsoft.com

and

Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.co.uk/path/page.htm"));

outputs microsoft.co.uk

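The following code uses the Uri class to obtain the host name, and then obtains the second-level domain (examplecompany.com) from Uri.Host by splitting the host name on periods. This works only when the public suffix is a single label; for a host such as www.amazon.co.uk it yields co.uk.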

var uri = new Uri("http://www.poker.winner4ever.examplecompany.com/");
var splitHostName = uri.Host.Split('.');
if (splitHostName.Length >= 2)
{
    var secondLevelHostName = splitHostName[splitHostName.Length - 2] + "." +
                              splitHostName[splitHostName.Length - 1];
}
Garr Godfrey

This can return the wrong result in some cases. It relies on the fact that country-code TLDs are the only two-character TLDs, and that they are often, but not always, preceded by a short second-level label (2 or 3 characters). It will therefore give you what you want in most cases:

string GetRootDomain(string host)
{
    string[] domains = host.Split('.');

    if (domains.Length >= 3)
    {
        int c = domains.Length;
        // handle international country code TLDs 
        // www.amazon.co.uk => amazon.co.uk
        if (domains[c - 1].Length < 3 && domains[c - 2].Length <= 3)
            return string.Join(".", domains, c - 3, 3);
        else
            return string.Join(".", domains, c - 2, 2);
    }
    else
        return host;
}

This is not possible without an up-to-date database of the different domain levels.

Consider:

s1.moh.gov.cn
moh.gov.cn
s1.google.com
google.com

At which level, then, do you want to extract the domain? It depends entirely on the TLD, SLD, ccTLD and so on. Because ccTLDs are controlled by individual countries, a registry may define special SLDs that you do not know about.
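To make the "database" point concrete, here is a minimal sketch of a suffix lookup in C#. The class name SuffixLookup, the method GetRegistrableDomain and the tiny KnownSuffixes set are all made up for illustration; a real implementation would load the full list from publicsuffix.org, which also has wildcard and exception rules that this sketch ignores.

```csharp
using System;
using System.Collections.Generic;

public static class SuffixLookup
{
    // Hypothetical, deliberately tiny suffix set for illustration only.
    // Real code should load the complete, current public suffix list.
    private static readonly HashSet<string> KnownSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "com", "cn", "gov.cn", "uk", "co.uk"
        };

    // Returns the longest known suffix plus the one label to its left.
    // Returns null when no suffix matches or the host is itself a suffix.
    public static string GetRegistrableDomain(string host)
    {
        string[] labels = host.Split('.');

        // Try candidates from longest to shortest so "gov.cn" wins over "cn".
        for (int i = 0; i < labels.Length; i++)
        {
            string candidate = string.Join(".", labels, i, labels.Length - i);
            if (KnownSuffixes.Contains(candidate))
            {
                // i == 0 means the whole host is a bare suffix such as "co.uk".
                return i == 0 ? null : labels[i - 1] + "." + candidate;
            }
        }

        return null;
    }
}
```

With this set, both s1.moh.gov.cn and moh.gov.cn map to moh.gov.cn, while s1.google.com and google.com map to google.com. The result is only as good as the suffix data behind it, which is exactly why the list has to be kept up to date.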

You can use the Nager.PublicSuffix NuGet package.

NuGet

PM> Install-Package Nager.PublicSuffix

Example

var domainParser = new DomainParser(new WebTldRuleProvider());

var domainName = domainParser.Get("sub.test.co.uk");
//domainName.Domain = "test";
//domainName.Hostname = "sub.test.co.uk";
//domainName.RegistrableDomain = "test.co.uk";
//domainName.SubDomain = "sub";
//domainName.TLD = "co.uk";

Use a regular expression:

^https?://(?:([\w.-]+)\.)?([\w-]+)\.(com|co\.uk|com\.au)$

This matches any URL that ends with one of the TLDs you are interested in, so strip any path first. Extend the alternation with as many TLDs as you need. The three capturing groups contain the subdomain, hostname and TLD respectively.
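A minimal C# sketch of applying such a pattern, assuming the dots are escaped and the TLD alternatives sit in a single group (com|co\.uk|com\.au). The class name UrlRegexDemo and the method GetDomain are made up for this example, and the pattern only covers the three TLDs listed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRegexDemo
{
    // Assumed pattern: group 1 = subdomain (optional), group 2 = hostname,
    // group 3 = TLD. Only the three listed TLDs are recognised.
    private static readonly Regex DomainPattern = new Regex(
        @"^https?://(?:([\w.-]+)\.)?([\w-]+)\.(com|co\.uk|com\.au)$",
        RegexOptions.IgnoreCase);

    // Returns "hostname.tld", or null when the URL does not match,
    // for example because it has a path or an unlisted TLD.
    public static string GetDomain(string url)
    {
        Match match = DomainPattern.Match(url);
        return match.Success
            ? match.Groups[2].Value + "." + match.Groups[3].Value
            : null;
    }
}
```

For example, UrlRegexDemo.GetDomain("http://www.beta.microsoft.co.uk") gives microsoft.co.uk, while a URL with a path such as http://www.beta.microsoft.com/path/page.htm does not match because the pattern is anchored at the end. Passing only the scheme and host (for instance built from Uri.Scheme and Uri.Host) avoids that.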

I've written a library for use in .NET 2+ to help pick out the domain components of a URL.

More details are on GitHub, but one benefit over the previous options is that it can download the latest data from http://publicsuffix.org automatically (once per month), so the library's output should be more or less on a par with what web browsers use to establish domain security boundaries (i.e. pretty good).

It's not perfect yet, but it suits my needs and shouldn't take much work to adapt to other use cases, so please fork it and send a pull request if you want.

TruncatedCoDr
uri.Host.ToLower().Replace("www.","").Substring(uri.Host.ToLower().Replace("www.","").IndexOf('.'))

  • returns ".com" for Uri uri = new Uri("http://stackoverflow.com/questions/4643227/top-level-domain-from-url-in-c");

  • returns ".co.jp" for Uri uri = new Uri("http://stackoverflow.co.jp");

  • returns ".s1.moh.gov.cn" for Uri uri = new Uri("http://stackoverflow.s1.moh.gov.cn");

etc.
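The one-liner is easier to follow when unpacked into a small helper. This is only a sketch of a close variant, not the exact same behaviour: the names HostSuffix and FromFirstDot are made up, it strips "www." only at the start of the host instead of everywhere, and it returns an empty string for a host without a dot, where the one-liner would throw because IndexOf returns -1.

```csharp
using System;

public static class HostSuffix
{
    // Returns everything from the first dot of the host onward,
    // after dropping a leading "www." label if present.
    public static string FromFirstDot(Uri uri)
    {
        string host = uri.Host.ToLowerInvariant();
        if (host.StartsWith("www.", StringComparison.Ordinal))
        {
            host = host.Substring(4);
        }

        int firstDot = host.IndexOf('.');
        // A single-label host such as "localhost" has no dot to cut at.
        return firstDot < 0 ? string.Empty : host.Substring(firstDot);
    }
}
```

As the third bullet shows, this is a plain string cut and knows nothing about public suffixes, so deeper hosts give results like ".s1.moh.gov.cn" rather than a real TLD.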
