问题
I need to trim URL's in Microsoft Excel to the root domain and to the subdomain.
A1 = Contains https://blog.example.com/page/
B1 = Should result in example.com
C1 = Should result in blog.example.com
Two formulas removing http, https, .www and PATH. The first version (B1) should also remove SUBDOMAIN.
I only have one formula right now:
=MID(SUBSTITUTE(A2;"www.";"");SEARCH(":";A2)+3;SEARCH("/";SUBSTITUTE(A2;"www.";"");9)-SEARCH(":";A2)-3)
https://example.com/page/page results in example.comhttp://www.example.com/page/page results in example.comhttp://blog.example.com/page/ results in blog.example.comexample.com/page results in #VALUE!www.example.com/page results in #VALUE!
As you can see in the sample above I get good results. But it dosen't work without http or https. Also this version keeps subdomain.
回答1:
Try this in B1,
=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), CHAR(46), REPT(CHAR(32), LEN(A1))), LEN(A1)*2)), CHAR(32), CHAR(46))
.... and this in C1,
=SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,))
回答2:
Subdomain - it's Jeeped's answer, but I've added support for blank cells, because the original version outputted "/":
=IF(ISBLANK(A1), "", SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,)))
Domain - a version from MrExcel that supports international domains (e.g. this.co.uk). But unlike Jeeped's version it doesn't support 1 word TLDs like www.this.co or test.this.co - does anyone know how to fix this? For now I use a helper row at least for "www":
=IF(LEFT(a1,LEN("www."))="www.",RIGHT(a1,LEN(a1)-LEN("www.")), a1)
=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
It worked on:
A | B | C
(blank) | "" | ""
blog.test.com | blog.test.com | test.com
http://blog.test.com | blog.test.com | test.com
test.com | test.com | test.com
http://test.com | test.com | test.com
https://test.com | test.com | test.com
www.test.com | test.com | test.com
http://www.test.com | test.com | test.com
https://www.test.com | test.com | test.com
test.co.uk | test.co.uk | test.co.uk
http://test.co.uk | test.co.uk | test.co.uk
https://test.co.uk | test.co.uk | test.co.uk
www.test.co.uk | test.co.uk | test.co.uk
http://www.test.co.uk | test.co.uk | test.co.uk
https://www.test.co.uk | test.co.uk | test.co.uk
example.test.co.uk | example.test.co.uk | test.co.uk
http://example.test.co.uk | example.test.co.uk | test.co.uk
https://example.test.co.uk | example.test.co.uk | test.co.uk
example.com/test | example.com | example.com
http://example.com/test | example.com | example.com
https://example.com/test | example.com | example.com
http://blog.example.com/page/ | blog.example.com | example.com
example.com/page | example.com | example.com
www.example.com/page | example.com | example.com
回答3:
If your version of excel has FILTERXML function (which can be found in Excel 365, Excel 2019, Excel 2016, and Excel 2013),
Suppose your URLs are in range A2:A29
To find Sub Domain, enter the following formula in Cell B2 and drag it down:
=SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(IFERROR(MID(A2,FIND("//",A2)+2,LEN(A2)),A2),"/","</s><s>")&"</s></t>","t/s[1]"),"www.","")
For the logic behind this formula you may give a read to this article: Extract Words with FILTERXML.
To find Root Domain, enter the following formula in Cell C2 and drag it down:
=IF((SUMPRODUCT(--(MID(B2,ROW($1:$100),1)="."))-IF(SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))=3,2,SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))))>0,RIGHT(B2,LEN(B2)-FIND(".",B2)),B2)
I used the Sub Domain from the first formula to find Root Domain. The trick is to find out if the component of the URL before the first dot
.is the root domain or sub domain, and take action accordingly.
Sample Data
| URL | Sub | Root |
|----------------------------------|---------------------|----------------|
| https://example.com/page/page | example.com | example.com |
| http://www.example.com/page/page | example.com | example.com |
| http://blog.example.com/page/ | blog.example.com | example.com |
| example.com/page | example.com | example.com |
| www.example.com/page | example.com | example.com |
| blog.test.com | blog.test.com | test.com |
| http://blog.test.com | blog.test.com | test.com |
| test.com | test.com | test.com |
| http://blog.test.uk.net/ | blog.test.uk.net | test.uk.net |
| https://test.cn | test.cn | test.cn |
| www.test.com | test.com | test.com |
| http://www.test.com | test.com | test.com |
| https://www.test.com | test.com | test.com |
| test.co.uk | test.co.uk | test.co.uk |
| https://test.co.uk | test.co.uk | test.co.uk |
| www.test.co.uk | test.co.uk | test.co.uk |
| http://www.test.co.uk | test.co.uk | test.co.uk |
| https://www.test.co.uk | test.co.uk | test.co.uk |
| blog.123.firm.in | blog.123.firm.in | 123.firm.in |
| http://example.test.co.uk | example.test.co.uk | test.co.uk |
| https://test.7.org.au | test.7.org.au | 7.org.au |
| test.example.org.nz/page | test.example.org.nz | example.org.nz |
| http://example.com/test | example.com | example.com |
| https://example.com/test | example.com | example.com |
| http://blog.example.com/page/ | blog.example.com | example.com |
| example.com/page | example.com | example.com |
| www.example.com/page | example.com | example.com |
| http://blog.1.co.uk | blog.1.co.uk | 1.co.uk |
来源:https://stackoverflow.com/questions/39221589/trim-url-to-root-subdomain-with-excel