Extract a word that follows a particular word from a webpage with python

倾然丶 夕夏残阳落幕 submitted on 2020-01-17 05:04:43

Question


I am writing a simple web scraper script to extract a single word from a web page. The word I require changes regularly, but comes after a word that never changes, so I can search for it.

This is my script so far:

#!/bin/python

import requests
response = requests.get('http://vpnbook.com/freevpn')
print(response.text)

This prints the whole HTML of the page, but the only part I need is the password:

<li>All bundles include UDP53, UDP 25000, TCP 80, TCP 443 profile</li>
<li>Username: <strong>vpnbook</strong></li>
<li>Password: <strong>binbd5ar</strong></li>
</ul>  

How could I print ONLY 'binbd5ar' (or whatever replaces it) to stdout?


Answer 1:


from bs4 import BeautifulSoup
import requests

response = requests.get('http://vpnbook.com/freevpn')
soup = BeautifulSoup(response.text, 'html.parser')
pricing = soup.find(id = 'pricing')
first_column = pricing.find('div', {'class': 'one-third'})
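password = None
# Iterate over the <li> tags only; looping over the <ul> itself also yields whitespace text nodes.
for li in first_column.find('ul', {'class': 'disc'}).find_all('li'):
    if 'password' in li.text.lower():
        password = li.find('strong').text
print(password)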



Answer 2:


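import re
# response is the requests.get(...) result from the question.
# The non-greedy group stops at the first closing </strong> tag.
print(re.search(r'Password: <strong>(.+?)</strong>', response.text).group(1))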
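A minimal offline sketch of the same idea, assuming the markup looks like the fragment quoted in the question. The helper name extract_password and the SAMPLE_HTML constant are hypothetical and used only for illustration. Unlike the one-liner, it returns None instead of raising when the pattern is not found.

```python
import re

# Sample markup copied from the question; the live page may differ.
SAMPLE_HTML = """<li>Username: <strong>vpnbook</strong></li>
<li>Password: <strong>binbd5ar</strong></li>
</ul>"""


def extract_password(html):
    """Return the text inside <strong> after 'Password:', or None if absent."""
    # Non-greedy group so the match stops at the first closing tag.
    match = re.search(r'Password:\s*<strong>(.+?)</strong>', html)
    return match.group(1) if match else None


print(extract_password(SAMPLE_HTML))
```

With a live page, response.text from the question's script would be passed in place of SAMPLE_HTML.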



Answer 3:


You can use regex search.

"Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string" (Python re module documentation)

>>> import re
>>> x = re.search(r"Password: <strong>(?P<pass>\w+)</strong>", response.text)
>>> print(x.groupdict())
{'pass': 'binbd5ar'}
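The difference between re.match() and re.search() described in the quote can be checked on a single line of the question's markup. This is a small sketch; the variable names line, at_start and anywhere are illustrative only.

```python
import re

# One line of markup from the question, standing in for response.text.
line = '<li>Password: <strong>binbd5ar</strong></li>'
pattern = r'Password: <strong>(?P<pass>\w+)</strong>'

at_start = re.match(pattern, line)    # the line starts with "<li>", so this is None
anywhere = re.search(pattern, line)   # scans the whole string and finds the match

print(at_start)
print(anywhere.group('pass'))
```

Because real HTML never begins with "Password:", re.search() is the call that fits this task.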



Answer 4:


password = re.search(r'Password: <strong>(.*?)</strong>',response.text).group(1)

Then, to change it:

re.sub(re.escape(password), newPassword, response.text, count=1)
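A self-contained sketch of the extract-then-replace step, run against the fragment from the question rather than a live response. The replacement value 's3cret' and the variable names are hypothetical. Note that re.sub() limits the number of substitutions with the count argument, and re.escape() keeps any regex metacharacters in the extracted text from being interpreted as a pattern.

```python
import re

# Fragment from the question, standing in for response.text.
html = ('<li>Username: <strong>vpnbook</strong></li>\n'
        '<li>Password: <strong>binbd5ar</strong></li>')
new_password = 's3cret'  # hypothetical replacement value

# Extract the current password with a non-greedy capture group.
password = re.search(r'Password: <strong>(.*?)</strong>', html).group(1)

# Replace only the first occurrence of the extracted text.
updated = re.sub(re.escape(password), new_password, html, count=1)
print(updated)
```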


Source: https://stackoverflow.com/questions/32894497/extract-a-word-that-follows-a-particular-word-from-a-webpage-with-python
