Trim URL to ROOT/SUBDOMAIN with Excel

小蘑菇 2020-12-06 08:46

I need to trim URLs in Microsoft Excel down to the root domain and to the subdomain.

A1 contains https://blog.example.com/page/

4 answers
  • 2020-12-06 08:55

    For B1 (extracting the root domain), if A1 holds a complete URL:

    =SUBSTITUTE(SUBSTITUTE(REPLACE(A1,1,FIND(".",$A1),""),REPLACE(REPLACE(A1,1,FIND(".",$A1),""),1,FIND("/",REPLACE(A1,1,FIND(".",$A1),"")),""),""),"/","")
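As a rough restatement of what the formula above does, the Python sketch below drops everything up to and including the first dot and then keeps the text before the next slash. The name `drop_first_label` is hypothetical, and the sketch is a simplification: the Excel formula returns an error when no slash follows the first dot, whereas this function does not.

```python
def drop_first_label(url: str) -> str:
    """Simplified restatement of the formula (hypothetical helper, not Excel code)."""
    # Remove everything through the first "." (scheme plus first host label).
    after_dot = url[url.find(".") + 1:]
    # Keep only the text before the next "/".
    return after_dot.split("/", 1)[0]


print(drop_first_label("https://blog.example.com/page/"))  # example.com
print(drop_first_label("https://example.com/page/"))       # com
```

The second call suggests that this approach assumes the URL carries exactly one subdomain label in front of the root domain.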
    
  • 2020-12-06 09:01

    Subdomain: this is Jeeped's answer, with added support for blank cells, because the original version returned "/" for them:

    =IF(ISBLANK(A1), "", SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,)))
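The steps nested inside the subdomain formula can be easier to follow when written out one at a time. The Python sketch below mirrors them under the assumption that the input is a plain string; `host_part` is a hypothetical name, not something from the original answer.

```python
def host_part(url: str) -> str:
    """Mirror of the subdomain formula's steps (hypothetical helper)."""
    # Blank cell: return an empty string instead of "/".
    if url == "":
        return ""
    # Drop the scheme: everything through the first "//", if present.
    pos = url.find("//")
    rest = url[pos + 2:] if pos != -1 else url
    # Cut at the first "/" so that only the host remains.
    host = rest.split("/", 1)[0]
    # Remove "www." (like SUBSTITUTE, this removes every occurrence).
    return host.replace("www.", "")


print(host_part("http://blog.example.com/page/"))  # blog.example.com
print(host_part("www.example.com/page"))           # example.com
```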
    

    Domain: a version from MrExcel that supports multi-part suffixes (e.g. this.co.uk). Unlike Jeeped's version, however, it does not handle hosts that end in a single two-letter TLD, such as www.this.co or test.this.co. Does anyone know how to fix this? For now I use a helper cell (B1), at least to strip a leading "www.":

    =IF(LEFT(a1,LEN("www."))="www.",RIGHT(a1,LEN(a1)-LEN("www.")), a1)
    =SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99))),".",REPT(" ",99)),99*(2+(LEN(TRIM(RIGHT(SUBSTITUTE(TRIM(TRIM(LEFT(SUBSTITUTE(TRIM(IFERROR(MID(b1,FIND("://",b1)+3,99),b1))&"/","/",REPT(" ",99)),99)))&".",".",REPT(" ",99)),198)))=2))))," ",".")
    

    It worked on:

                A                   |           B           |       C
    (blank)                         |   ""                  |   ""                          
    blog.test.com                   |   blog.test.com       |   test.com
    http://blog.test.com            |   blog.test.com       |   test.com
    test.com                        |   test.com            |   test.com
    http://test.com                 |   test.com            |   test.com
    https://test.com                |   test.com            |   test.com
    www.test.com                    |   test.com            |   test.com
    http://www.test.com             |   test.com            |   test.com
    https://www.test.com            |   test.com            |   test.com
    test.co.uk                      |   test.co.uk          |   test.co.uk
    http://test.co.uk               |   test.co.uk          |   test.co.uk
    https://test.co.uk              |   test.co.uk          |   test.co.uk
    www.test.co.uk                  |   test.co.uk          |   test.co.uk
    http://www.test.co.uk           |   test.co.uk          |   test.co.uk
    https://www.test.co.uk          |   test.co.uk          |   test.co.uk
    example.test.co.uk              |   example.test.co.uk  |   test.co.uk
    http://example.test.co.uk       |   example.test.co.uk  |   test.co.uk
    https://example.test.co.uk      |   example.test.co.uk  |   test.co.uk
    example.com/test                |   example.com         |   example.com
    http://example.com/test         |   example.com         |   example.com
    https://example.com/test        |   example.com         |   example.com
    http://blog.example.com/page/   |   blog.example.com    |   example.com
    example.com/page                |   example.com         |   example.com
    www.example.com/page            |   example.com         |   example.com
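Reading the domain formula above, it appears to keep the last three dot-separated labels when the final label is two characters long, and the last two labels otherwise. A minimal Python sketch of that rule follows, assuming the scheme, the path, and any leading "www." have already been removed; `keep_domain` is a hypothetical name.

```python
def keep_domain(host: str) -> str:
    """Sketch of the label-counting rule (hypothetical helper, not Excel code)."""
    labels = host.split(".")
    # A two-character final label (uk, au, ...) is treated as a sign of a
    # two-part suffix such as co.uk, so one extra label is kept.
    keep = 3 if len(labels[-1]) == 2 else 2
    return ".".join(labels[-keep:])


print(keep_domain("example.test.co.uk"))  # test.co.uk
print(keep_domain("blog.test.com"))       # test.com
print(keep_domain("test.this.co"))        # test.this.co
```

The last call reproduces the limitation mentioned above: a host ending in a single two-letter TLD keeps its subdomain, because the rule cannot tell `this.co` apart from a suffix like `co.uk`.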
    
  • 2020-12-06 09:09

    Try this in B1,

    =SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), CHAR(46), REPT(CHAR(32), LEN(A1))), LEN(A1)*2)), CHAR(32), CHAR(46))
    

    ... and this in C1:

    =SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,))
    

  • 2020-12-06 09:12

    If your version of Excel has the FILTERXML function (available in Excel 365, Excel 2019, Excel 2016, and Excel 2013), you can use the following approach.

    Suppose your URLs are in the range A2:A29.

    To find the subdomain, enter the following formula in cell B2 and drag it down:

    =SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(IFERROR(MID(A2,FIND("//",A2)+2,LEN(A2)),A2),"/","</s><s>")&"</s></t>","t/s[1]"),"www.","")
    

    For the logic behind this formula, see the article "Extract Words with FILTERXML".

    To find the root domain, enter the following formula in cell C2 and drag it down:

    =IF((SUMPRODUCT(--(MID(B2,ROW($1:$100),1)="."))-IF(SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))=3,2,SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))))>0,RIGHT(B2,LEN(B2)-FIND(".",B2)),B2)
    

    The root domain formula works on the subdomain returned by the first formula. The trick is to determine whether the component before the first dot belongs to the root domain or is a subdomain, and to strip it only in the latter case.

    Sample Data

    | URL                              | Sub                 | Root           |
    |----------------------------------|---------------------|----------------|
    | https://example.com/page/page    | example.com         | example.com    |
    | http://www.example.com/page/page | example.com         | example.com    |
    | http://blog.example.com/page/    | blog.example.com    | example.com    |
    | example.com/page                 | example.com         | example.com    |
    | www.example.com/page             | example.com         | example.com    |
    | blog.test.com                    | blog.test.com       | test.com       |
    | http://blog.test.com             | blog.test.com       | test.com       |
    | test.com                         | test.com            | test.com       |
    | http://blog.test.uk.net/         | blog.test.uk.net    | test.uk.net    |
    | https://test.cn                  | test.cn             | test.cn        |
    | www.test.com                     | test.com            | test.com       |
    | http://www.test.com              | test.com            | test.com       |
    | https://www.test.com             | test.com            | test.com       |
    | test.co.uk                       | test.co.uk          | test.co.uk     |
    | https://test.co.uk               | test.co.uk          | test.co.uk     |
    | www.test.co.uk                   | test.co.uk          | test.co.uk     |
    | http://www.test.co.uk            | test.co.uk          | test.co.uk     |
    | https://www.test.co.uk           | test.co.uk          | test.co.uk     |
    | blog.123.firm.in                 | blog.123.firm.in    | 123.firm.in    |
    | http://example.test.co.uk        | example.test.co.uk  | test.co.uk     |
    | https://test.7.org.au            | test.7.org.au       | 7.org.au       |
    | test.example.org.nz/page         | test.example.org.nz | example.org.nz |
    | http://example.com/test          | example.com         | example.com    |
    | https://example.com/test         | example.com         | example.com    |
    | http://blog.example.com/page/    | blog.example.com    | example.com    |
    | example.com/page                 | example.com         | example.com    |
    | www.example.com/page             | example.com         | example.com    |
    | http://blog.1.co.uk              | blog.1.co.uk        | 1.co.uk        |
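The dot-counting test in the root domain formula above can be restated step by step. The Python sketch below follows the formula's arithmetic as I read it: count the dots in the whole host, count the dots in its last eight characters (treating three as two), and strip the first label only when the difference is positive. `root_domain` is a hypothetical name, and the input is assumed to be the value from the Sub column.

```python
def root_domain(host: str) -> str:
    """Restatement of the dot-counting rule (hypothetical helper, not Excel code)."""
    # The formula inspects the first 100 characters only (ROW($1:$100)).
    total_dots = host[:100].count(".")
    # Dots within the last 8 characters, i.e. RIGHT(B2, 8).
    tail_dots = host[-8:].count(".")
    if tail_dots == 3:
        tail_dots = 2
    # More dots overall than in the tail: the first label is a subdomain.
    if total_dots - tail_dots > 0:
        return host[host.find(".") + 1:]
    return host


print(root_domain("blog.example.com"))     # example.com
print(root_domain("test.co.uk"))           # test.co.uk
print(root_domain("blog.1.co.uk"))         # 1.co.uk
```

Because the rule depends on where the dots fall within the last eight characters, it is a heuristic fitted to suffixes of roughly that length rather than a general parser for domain suffixes.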
    