Slicing URL with Python

予麋鹿 2020-12-15 01:17

I am working with a huge list of URLs. Quick question: I am trying to slice part of each URL out, see below:

http://www.domainname.com/page?CONTEN         


        
10 answers
  •  离开以前
    2020-12-15 01:46

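    Use the urlparse module (in Python 3 it lives at urllib.parse). Check this function:

    try:
        import urlparse                       # Python 2
    except ImportError:
        from urllib import parse as urlparse  # Python 3

    def process_url(url, keep_params=('CONTENT_ITEM_ID=',)):
        parsed = urlparse.urlsplit(url)
        # Keep only the query items that start with one of the given prefixes.
        filtered_query = '&'.join(
            qry_item
            for qry_item in parsed.query.split('&')
            if qry_item.startswith(keep_params))
        # Rebuild the URL with the filtered query in place of the original one.
        return urlparse.urlunsplit(parsed[:3] + (filtered_query,) + parsed[4:])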
    

    In your example:

    >>> process_url(a)
    'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'
    

    This function has the added bonus that it is easy to extend if you later decide to keep more query parameters, and it still works if the order of the parameters is not fixed, as in:

    >>> url='http://www.domainname.com/page?other_value=xx&param3&CONTENT_ITEM_ID=1234&param1'
    >>> process_url(url, ('CONTENT_ITEM_ID', 'other_value'))
    'http://www.domainname.com/page?other_value=xx&CONTENT_ITEM_ID=1234'
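
As a possible variation, here is a minimal Python 3 sketch that filters on exact parameter names instead of string prefixes. It is not part of the answer above: the function name keep_query_params and its keep argument are made up for illustration, and the URL is only an example in the same shape as the one used above. Prefix matching with 'CONTENT_ITEM_ID' (no trailing '=') would also keep a parameter such as CONTENT_ITEM_ID_OLD; comparing parsed names avoids that.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_query_params(url, keep=("CONTENT_ITEM_ID",)):
    """Return url with only the query parameters whose names are in keep.

    Hypothetical helper for illustration; names are compared exactly.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so value-less items such as "param3" are
    # parsed as ("param3", "") instead of being skipped by the parser.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if name in keep]
    # SplitResult is a namedtuple, so _replace swaps in the new query.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Example URL, assumed for illustration only.
url = "http://www.domainname.com/page?other_value=xx&param3&CONTENT_ITEM_ID=1234&param1"
print(keep_query_params(url))
print(keep_query_params(url, ("CONTENT_ITEM_ID", "other_value")))
```

Note that urlencode re-encodes the kept values, so percent-escapes in the result may be spelled differently from the input even though they decode to the same text. If the query string must be preserved byte for byte, the string-based approach above is the safer choice.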
    
