Question
This question is for Python 3.6.3, bs4 and Selenium 3.8 on Win10.
I am trying to scrape pages with dynamic content. What I am trying to scrape is numbers and text (from http://www.oddsportal.com, for example). From my understanding, requests+beautifulsoup will not do the job, because content rendered by JavaScript is not present in the HTML that requests downloads. So I have to use other tools such as selenium webdriver.
Then, given that I will use selenium webdriver anyway, do you recommend ignoring beautifulsoup and sticking with selenium webdriver functions, e.g.
elem = driver.find_element_by_name("q")
or is it considered better practice to use selenium+beautifulsoup?
Do you have any opinion as to which of the two routes will give me more convenient functions to work with?
Thanks.
Answer 1:
Beautifulsoup
Beautifulsoup is a powerful tool for web scraping. It does not fetch pages itself: it parses HTML that you have already downloaded, for example with urllib.request or requests, and is well suited to extracting data from static pages.
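As a rough illustration of the static case, the sketch below downloads a page with urllib.request and pulls the text out of table cells with Beautifulsoup. The helper names and the sample markup are made up for this example and are not taken from oddsportal.com; JavaScript is never executed here, so anything rendered client-side would be missing.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_static_html(url):
    """Download the raw HTML of a page; no JavaScript is executed."""
    with urlopen(url) as response:
        return response.read()


def extract_cells(html):
    """Return the stripped text of every <td> cell in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True) for td in soup.find_all("td")]


# Hypothetical markup standing in for a static page.
SAMPLE_HTML = """
<table>
  <tr><td>Team A</td><td>1.85</td></tr>
  <tr><td>Team B</td><td>2.10</td></tr>
</table>
"""

print(extract_cells(SAMPLE_HTML))  # ['Team A', '1.85', 'Team B', '2.10']
```

For a real static page, the two helpers would be chained as extract_cells(fetch_static_html(url)).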
Selenium
Selenium is currently the most widely accepted and efficient tool for web automation. Selenium supports interacting with dynamic pages, content and elements.
Conclusion
To create a robust and efficient framework for scraping pages with dynamic content, you should integrate both Selenium and Beautifulsoup in your framework. Browse and interact with dynamic elements through Selenium, and scrape the contents efficiently through Beautifulsoup.
An Example
Here is an example using Selenium and Beautifulsoup for scraping.
Answer 2:
Selenium has many selectors
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
# and
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
so mostly you don't need BeautifulSoup. Especially xpath and css_selector can be useful.
Source: https://stackoverflow.com/questions/47983495/python-which-is-considered-better-for-scrapping-selenium-or-beautifulsoup-wit