Question
This question is for Python 3.6.3, bs4 and Selenium 3.8 on Win10.
I am trying to scrape pages with dynamic content. What I want to extract is numbers and text (from http://www.oddsportal.com, for example). From my understanding, requests+beautifulsoup will not do the job, because content that is generated dynamically is not present in the HTML they receive. So I have to use other tools such as selenium webdriver.
Then, given that I will be using selenium webdriver anyway, do you recommend ignoring beautifulsoup and sticking with selenium webdriver functions, e.g.
elem = driver.find_element_by_name("q")
or is it considered better practice to use selenium+beautifulsoup?
Do you have any opinion as to which of the two routes will give me more convenient functions to work with?
Thanks.
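Answer 1:
BeautifulSoup
BeautifulSoup is a powerful tool for web scraping. It is an HTML/XML parser and does not download pages itself, so it is usually paired with a library such as urllib.request (or requests) that fetches the HTML. That combination works well for extracting data from static pages.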
Selenium
Selenium is a widely used tool for web automation. It drives a real browser, so it can interact with dynamic pages, content and elements.
Conclusion
To create a robust and efficient framework for scraping pages with dynamic content, integrate both Selenium and BeautifulSoup: browse and interact with dynamic elements through Selenium, then parse the rendered content with BeautifulSoup.
An Example
Here is an example using Selenium and BeautifulSoup for scraping.
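A minimal sketch of the split described in the conclusion, not a definitive implementation: Selenium loads the page and waits for the dynamic content, then BeautifulSoup parses the rendered HTML. The helper names fetch_rendered_html and parse_rows, the wait_css parameter and the "table tr" selector are assumptions made for illustration. It also assumes Chrome with a matching chromedriver on the PATH.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load url in a real browser and return the HTML after scripts have run."""
    # Imported lazily so parse_rows() below can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: chromedriver is installed
    try:
        driver.get(url)
        # Block until at least one element matching wait_css is in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_rows(html, row_css="table tr"):
    """Return the cell texts of every table row matched by row_css."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select(row_css):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip rows that contain no td/th cells
            rows.append(cells)
    return rows
```

Usage would look like parse_rows(fetch_rendered_html("http://www.oddsportal.com", "table")), where "table" is only a guess at what to wait for; the real selector depends on the page's markup.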
Answer 2:
Selenium has many selectors:
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
# and the plural forms, which return a list of all matches
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
So in most cases you don't need BeautifulSoup.
The xpath and css_selector variants are especially useful.
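As a rough sketch of the Selenium-only route, the helper below collects the visible text of every element matched by a selector. It uses the generic find_elements(by, value) call, which exists in Selenium 3.8 and still works in later releases where the find_element_by_* helpers were removed. The names texts_of and demo, and the selectors "td.odds" and "//table//a", are hypothetical and would need to match the real page markup.

```python
def texts_of(driver, by, selector):
    """Return the stripped visible text of every element matching selector."""
    return [el.text.strip() for el in driver.find_elements(by, selector)]


def demo(url="http://www.oddsportal.com"):
    """Open url in Chrome and pull text with a CSS selector and an XPath."""
    # Imported here so texts_of() can be exercised without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: chromedriver is installed
    try:
        driver.get(url)
        # Both selectors are placeholders, not taken from the real site.
        odds = texts_of(driver, By.CSS_SELECTOR, "td.odds")
        links = texts_of(driver, By.XPATH, "//table//a")
    finally:
        driver.quit()
    return odds, links
```

Calling demo() needs a working browser setup; texts_of only needs an object with a find_elements method, so it also works on a WebElement to search inside one part of the page.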
Source: https://stackoverflow.com/questions/47983495/python-which-is-considered-better-for-scrapping-selenium-or-beautifulsoup-wit