I need to trim URL\'s in Microsoft Excel to the root domain and to the subdomain.
A1 = Contains https://blog.example.com/page/
For B1 (Extracting root domain), If A1 is complete URL:
=SUBSTITUTE(SUBSTITUTE(REPLACE(A1,1,FIND(".",$A1),""),REPLACE(REPLACE(A1,1,FIND(".",$A1),""),1,FIND("/",REPLACE(A1,1,FIND(".",$A1),"")),""),""),"/","")
Subdomain - it's Jeeped's answer, but I've added support for blank cells, because the original version outputted "/":
=IF(ISBLANK(A1), "", SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,)))
Domain - a version from MrExcel that supports international domains (e.g. this.co.uk). But unlike Jeeped's version it doesn't support 1 word TLDs like www.this.co or test.this.co - does anyone know how to fix this? For now I use a helper row at least for "www":
=IF(LEFT(a1,LEN("www."))="www.",RIGHT(a1,LEN(a1)-LEN("www.")), a1)
=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
It worked on:
A | B | C
(blank) | "" | ""
blog.test.com | blog.test.com | test.com
http://blog.test.com | blog.test.com | test.com
test.com | test.com | test.com
http://test.com | test.com | test.com
https://test.com | test.com | test.com
www.test.com | test.com | test.com
http://www.test.com | test.com | test.com
https://www.test.com | test.com | test.com
test.co.uk | test.co.uk | test.co.uk
http://test.co.uk | test.co.uk | test.co.uk
https://test.co.uk | test.co.uk | test.co.uk
www.test.co.uk | test.co.uk | test.co.uk
http://www.test.co.uk | test.co.uk | test.co.uk
https://www.test.co.uk | test.co.uk | test.co.uk
example.test.co.uk | example.test.co.uk | test.co.uk
http://example.test.co.uk | example.test.co.uk | test.co.uk
https://example.test.co.uk | example.test.co.uk | test.co.uk
example.com/test | example.com | example.com
http://example.com/test | example.com | example.com
https://example.com/test | example.com | example.com
http://blog.example.com/page/ | blog.example.com | example.com
example.com/page | example.com | example.com
www.example.com/page | example.com | example.com
Try this in B1,
=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), CHAR(46), REPT(CHAR(32), LEN(A1))), LEN(A1)*2)), CHAR(32), CHAR(46))
.... and this in C1,
=SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,))
If your version of excel has FILTERXML function (which can be found in Excel 365, Excel 2019, Excel 2016, and Excel 2013),
Suppose your URLs are in range A2:A29
To find Sub Domain, enter the following formula in Cell B2 and drag it down:
=SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(IFERROR(MID(A2,FIND("//",A2)+2,LEN(A2)),A2),"/","</s><s>")&"</s></t>","t/s[1]"),"www.","")
For the logic behind this formula you may give a read to this article: Extract Words with FILTERXML.
To find Root Domain, enter the following formula in Cell C2 and drag it down:
=IF((SUMPRODUCT(--(MID(B2,ROW($1:$100),1)="."))-IF(SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))=3,2,SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))))>0,RIGHT(B2,LEN(B2)-FIND(".",B2)),B2)
I used the Sub Domain from the first formula to find Root Domain. The trick is to find out if the component of the URL before the first dot
.is the root domain or sub domain, and take action accordingly.
Sample Data
| URL | Sub | Root |
|----------------------------------|---------------------|----------------|
| https://example.com/page/page | example.com | example.com |
| http://www.example.com/page/page | example.com | example.com |
| http://blog.example.com/page/ | blog.example.com | example.com |
| example.com/page | example.com | example.com |
| www.example.com/page | example.com | example.com |
| blog.test.com | blog.test.com | test.com |
| http://blog.test.com | blog.test.com | test.com |
| test.com | test.com | test.com |
| http://blog.test.uk.net/ | blog.test.uk.net | test.uk.net |
| https://test.cn | test.cn | test.cn |
| www.test.com | test.com | test.com |
| http://www.test.com | test.com | test.com |
| https://www.test.com | test.com | test.com |
| test.co.uk | test.co.uk | test.co.uk |
| https://test.co.uk | test.co.uk | test.co.uk |
| www.test.co.uk | test.co.uk | test.co.uk |
| http://www.test.co.uk | test.co.uk | test.co.uk |
| https://www.test.co.uk | test.co.uk | test.co.uk |
| blog.123.firm.in | blog.123.firm.in | 123.firm.in |
| http://example.test.co.uk | example.test.co.uk | test.co.uk |
| https://test.7.org.au | test.7.org.au | 7.org.au |
| test.example.org.nz/page | test.example.org.nz | example.org.nz |
| http://example.com/test | example.com | example.com |
| https://example.com/test | example.com | example.com |
| http://blog.example.com/page/ | blog.example.com | example.com |
| example.com/page | example.com | example.com |
| www.example.com/page | example.com | example.com |
| http://blog.1.co.uk | blog.1.co.uk | 1.co.uk |