Beautiful Soup can't find the part of the HTML I want

て烟熏妆下的殇ゞ submitted on 2019-12-06 02:33:41

Try this:

from bs4 import BeautifulSoup as bs

html='''<div class="legend-block legend-block--pageviews">
      <h5>Pageviews</h5><hr>
      <div class="legend-block--body">
        <div class="linear-legend--counts">
          Pageviews:
          <span class="pull-right">101,172
          </span>
        </div>
        <div class="linear-legend--counts">
          Daily average:
          <span class="pull-right">
            4,818
          </span>
        </div></div></div>'''
soup = bs(html, 'html.parser')
div = soup.find("div", {"class": "linear-legend--counts"})
span = div.find('span')
text = span.get_text()
print(text)

Output:

101,172

Or, more simply, in one line:

soup = bs(html, 'html.parser')
result = soup.find("div", {"class": "linear-legend--counts"}).find('span').get_text()
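If you want both numbers rather than only the first, a minimal sketch along the same lines is shown here. It assumes the markup looks like the snippet above; the counts dictionary and the comma-stripping step are illustrative additions, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '''<div class="legend-block legend-block--pageviews">
  <div class="legend-block--body">
    <div class="linear-legend--counts">
      Pageviews:
      <span class="pull-right">101,172
      </span>
    </div>
    <div class="linear-legend--counts">
      Daily average:
      <span class="pull-right">
        4,818
      </span>
    </div>
  </div>
</div>'''

soup = BeautifulSoup(html, "html.parser")

counts = {}
# find_all returns every matching div, not only the first one.
for block in soup.find_all("div", class_="linear-legend--counts"):
    # strip=True drops the newlines and indentation around the number.
    value = block.find("span", class_="pull-right").get_text(strip=True)
    # The label is the text of the block before the first colon.
    label = block.get_text(" ", strip=True).partition(":")[0]
    # Remove the thousands separator before converting to int.
    counts[label] = int(value.replace(",", ""))

print(counts)  # → {'Pageviews': 101172, 'Daily average': 4818}
```

Passing strip=True to get_text also avoids the trailing whitespace that the plain get_text() call returns for the first span.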

EDIT:

The OP has since posted another question that is a possible duplicate of this one, and found an answer there. For anyone looking for an answer to a similar question, I am posting the accepted answer to that question below. It can be found here.

The JavaScript code won't be executed if you retrieve the page with requests.get, so Selenium should be used instead. It mimics user behaviour by opening the page in a browser, so the JS code gets executed.
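To see why the element appears to be missing, here is a small sketch that parses a hypothetical stand-in for the raw response. The markup and the inline script are assumptions made for illustration, and the real page's source may well differ. The point is only that BeautifulSoup parses the text it is given and never runs the script.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the text that requests.get(url).text might return:
# the span is empty in the source and is only filled in by JavaScript.
raw_html = '''<div class="linear-legend--counts">
  Pageviews:
  <span class="pull-right"></span>
</div>
<script>
  document.querySelector(".pull-right").textContent = "101,172";
</script>'''

soup = BeautifulSoup(raw_html, "html.parser")
span = soup.find("span", class_="pull-right")

# The script was parsed as plain text, not executed, so the span is still empty.
static_text = span.get_text(strip=True)
print(repr(static_text))  # → ''
```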

To get started with Selenium, install it with pip install selenium. Then use the code below to retrieve your item:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
# List of (page URL, CSS selector of the element to retrieve) pairs.
wiki_pages = [("https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Star_Wars:_The_Last_Jedi",
               ".summary-column--container .legend-block--pageviews .linear-legend--counts:first-child span.pull-right"),]
for url, selector in wiki_pages:
    # Load the page in a real browser so that its JavaScript runs.
    browser.get(url)
    # find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector,
    # which was removed in Selenium 4.
    page_views_count = browser.find_element(By.CSS_SELECTOR, selector)
    print(page_views_count.text)
browser.quit()

NOTE: If you need to run a headless browser, consider using PyVirtualDisplay (a wrapper for Xvfb) to run headless WebDriver tests; see 'How do I run Selenium in Xvfb?' for more information.
