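When developing a web crawler, we often have to deal with dynamic pages. For example, JavaScript on the page may modify or fill in content after the page has loaded, or the content we need may only become available after interacting with the page (logging in, clicking buttons or links, and so on). The python-selenium library lets us drive a real browser, which executes the JavaScript and lets us simulate such interactions.
This article describes how to install Chrome, the Chrome webdriver (chromedriver), and the Python Selenium library on Ubuntu.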
1. Install Chrome
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update
sudo apt -y install google-chrome-stable
Check the messages printed on the console: the installed Chrome version is 79.0, and the webdriver installed in the next step must match it.
Setting up google-chrome-stable (79.0.3945.88-1) ...
update-alternatives: using /usr/bin/google-chrome-stable to provide /usr/bin/x-www-browser (x-www-browser) in auto mode
update-alternatives: using /usr/bin/google-chrome-stable to provide /usr/bin/gnome-www-browser (gnome-www-browser) in auto mode
update-alternatives: using /usr/bin/google-chrome-stable to provide /usr/bin/google-chrome (google-chrome) in auto mode
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for mime-support (3.60ubuntu1) ...
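Since the webdriver has to match Chrome's major version, the comparison can also be scripted instead of read off the console. The following is a minimal sketch, assuming the version appears as a four-part dotted number in the text (as in the console output above, or in the output of `google-chrome --version` and `chromedriver --version`). The helper names `major_version` and `versions_match` are hypothetical and not part of any library.

```python
import re


def major_version(version_text):
    """Return the major version from the first four-part version number, or None."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    return int(match.group(1)) if match else None


def versions_match(chrome_text, driver_text):
    """True if both strings contain a version and the major versions are equal."""
    chrome_major = major_version(chrome_text)
    return chrome_major is not None and chrome_major == major_version(driver_text)


# Strings taken from the versions used in this article
print(major_version("Setting up google-chrome-stable (79.0.3945.88-1) ..."))  # → 79
print(versions_match("Google Chrome 79.0.3945.88", "ChromeDriver 79.0.3945.36"))  # → True
```

Only the major version is compared here, because the article requires the major versions to agree while the later parts (88 vs. 36 in this case) may differ.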
2. Install the webdriver
# Pick a suitable version from http://chromedriver.storage.googleapis.com/index.html. The major version must match the Chrome version installed above.
wget http://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod a+x chromedriver
sudo mv chromedriver /usr/local/share/
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
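The download and install steps above can be checked from Python. This is a rough sketch: the URL layout is copied from the wget command above and is only assumed to hold for other versions listed on that index page, and `chromedriver_url` and `chromedriver_on_path` are hypothetical helper names.

```python
import shutil

# Base URL taken from the wget command in this article
BASE_URL = "http://chromedriver.storage.googleapis.com"


def chromedriver_url(version, platform="linux64"):
    """Build the download URL for a given chromedriver version (assumed layout)."""
    return f"{BASE_URL}/{version}/chromedriver_{platform}.zip"


def chromedriver_on_path():
    """True if an executable named chromedriver can be found on PATH."""
    return shutil.which("chromedriver") is not None


print(chromedriver_url("79.0.3945.36"))
print("chromedriver on PATH:", chromedriver_on_path())
```

If the second line prints False after the symlinks are created, the directory holding the link is probably not on PATH for the current user.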
3. Install the Selenium Python library
pip3 install selenium
4. Test
Write a Python script to test the setup:
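from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Run Chrome without a display; --no-sandbox and --disable-dev-shm-usage
# help when running as root or inside a container
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

url = "https://www.python.org"
driver = webdriver.Chrome(options=chrome_options)
try:
    driver.get(url)
    # find_element(By.XPATH, ...) works in Selenium 3 and 4; the older
    # find_element_by_* helpers were removed in Selenium 4.3
    menu = driver.find_element(By.XPATH, "//ul[@class='menu' and @role='tree']")
    for li in menu.find_elements(By.TAG_NAME, "li"):
        print(li.text)
finally:
    # Always quit so that no chrome or chromedriver processes are left behind
    driver.quit()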
Output:
python3 test.py
Python
PSF
Docs
PyPI
Jobs
Community
Check whether any chrome or chromedriver processes are left over:
ps -ef | grep chrome
root 14366 12560 0 09:17 pts/0 00:00:00 grep --color=auto chrome
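In the output above, the only match is the grep command itself, so nothing was left running. The same check can be automated. Below is a minimal sketch, assuming the default column layout of `ps -ef` (UID, PID, PPID, C, STIME, TTY, TIME, CMD). `leftover_chrome_processes` is a hypothetical helper name.

```python
def leftover_chrome_processes(ps_output):
    """Return the ps -ef lines whose command mentions chrome, ignoring grep itself."""
    leftovers = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command stays in the last one
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        command = fields[7]
        if "chrome" in command and not command.startswith("grep"):
            leftovers.append(line)
    return leftovers


# The line printed in this article: only the grep process matches
sample = "root 14366 12560 0 09:17 pts/0 00:00:00 grep --color=auto chrome"
print(leftover_chrome_processes(sample))  # → []
```

To run it against the live process list, the text could be obtained with `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` and passed to the function.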
Source: CSDN
Author: cyxueecust
Link: https://blog.csdn.net/cyxueecust/article/details/103793461