parsing a url in python with changing part in it

≡放荡痞女 提交于 2019-12-31 03:53:05

问题


I'm parsing a url in Python, below you can find a sample url and the code, what i want to do is splitting the (74743) from the url and make a for loop which will be taking it from a parts list. Tried to use urlparse but couldn't complete it to the end mostly because of the changing parts in the url. Ijust want the easiest and fastest way to do this.

Sample url:

http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is=

(http://example.com/wps/portal) Always fixed

(lYuxDoIwGAYf6f9aqKSjMNQ) Always changing

(74743) Will be taken from a list name Parts

(IntNumberOf=&is=) Also changing depending on the section of the website

Here's the Code:

from lxml import html
import requests
import urlparse


Parts = [74743, 85731, 93021]

url = 'http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is='

parsing = urlparse.urlsplit(url)

print parsing

回答1:


>>> import urlparse

>>> url = 'http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is='

>>> split_url = urlparse.urlsplit(url)
>>> split_url.path
'/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/'

You can split the path into a list of strings using '/', slice the list, and re-join:

>>> path = split_url.path
>>> path.split('/')
['', 'wps', 'portal', 'lYuxDoIwGAYf6f9aqKSjMNQ', '']

Slice off the last two:

>>> path.split('/')[:-2]
['', 'wps', 'portal']

And re-join:

>>> '/'.join(path.split('/')[:-2])
'/wps/portal'

To parse the query, use parse_qs:

>>> parsed_query = urlparse.parse_qs(split_url.query)
{'PartNo': ['74743']}

To keep the empty parameters use keep_blank_values=True:

>>> query = urlparse.parse_qs(split_url.query, keep_blank_values=True)
>>> query
{'PartNo': ['74743'], 'is': [''], 'IntNumberOf': ['']}

You can then modify the query dictionary:

>>> query['PartNo'] = 85731

And update the original split_url:

>>> updated = split_url._replace(path='/'.join(base_path.split('/')[:-2] +
                                              ['ASDFZXCVQWER', '']),
                                query=urllib.urlencode(query, doseq=True))

>>> urlparse.urlunsplit(updated)
'http://example.com/wps/portal/ASDFZXCVQWER/?PartNo=85731&IntNumberOf=&is='


来源:https://stackoverflow.com/questions/33203727/parsing-a-url-in-python-with-changing-part-in-it

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!