Using BeautifulSoup to select div blocks within HTML

Submitted by 隐身守侯 on 2020-01-13 03:04:16

Question


I am trying to parse several div blocks from a website's HTML using Beautiful Soup. However, I cannot work out which function should be used to select these div blocks. I have tried the following:

import urllib2
from bs4 import BeautifulSoup

def getData():

    html = urllib2.urlopen("http://www.racingpost.com/horses2/results/home.sd?r_date=2013-09-22", timeout=10).read().decode('UTF-8')

    soup = BeautifulSoup(html)

    print(soup.title)
    print(soup.find_all('<div class="crBlock ">'))

getData()

I want to be able to select everything between <div class="crBlock "> and its correct end </div>. (Obviously there are other div tags but I want to select the block all the way down to the one that represents the end of this section of html.)


Answer 1:


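Pass the tag name and the class as separate arguments. Beautiful Soup splits the class attribute on whitespace, so the class is matched without the trailing space:

soup.find_all('div', class_="crBlock")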
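As a minimal, self-contained sketch of that call: the HTML below is a made-up stand-in for the real page (the headings and the "inner" and "other" classes are assumptions for illustration), and it matches on "crBlock" without the trailing space because, as far as I can tell, Beautiful Soup splits class values on whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; note the trailing
# space inside the class attribute, as in the question.
SAMPLE_HTML = """
<div class="crBlock ">
  <h3>Race 1</h3>
  <div class="inner"><p>Result A</p></div>
</div>
<div class="other"><p>Ignored</p></div>
<div class="crBlock ">
  <h3>Race 2</h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tag name first, class as a keyword argument (class_ avoids the
# Python keyword "class").
blocks = soup.find_all("div", class_="crBlock")

for block in blocks:
    # Each result is a Tag covering everything from the opening <div>
    # to its matching </div>, nested elements included.
    print(block.h3.get_text())
```

Each element of blocks is a Tag object, so nested divs inside a block come along with it and do not end the match early.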

By default, Beautiful Soup returns the entire tag, including its contents. If you store the result in a variable, you can then search or modify it further. If you are only looking for one div, you can use find() instead, which returns the first match. For instance:

div = soup.find('div', class_="crBlock")
# string= replaces the older text= argument in Beautiful Soup 4.4.0 and later
print(div.find_all(string='foobar'))
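A runnable sketch of the same idea, under stated assumptions: the HTML is invented for illustration, and the None check is an addition that guards against find() returning None when no matching div exists. It also shows that a search on the returned tag is scoped to that tag's contents.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two "foobar" strings inside the block, one outside.
SAMPLE_HTML = """
<div class="crBlock ">
  <p>foobar</p>
  <div><span>foobar</span><span>other</span></div>
</div>
<p>foobar</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching Tag, or None if nothing matches.
div = soup.find("div", class_="crBlock")

if div is None:
    matches = []
else:
    # Searching on the Tag only looks inside that div, at any depth.
    # string= matches text nodes whose content is exactly "foobar".
    matches = div.find_all(string="foobar")

print(len(matches))
```

Searching soup instead of div would also pick up the "foobar" paragraph outside the block.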

See the Beautiful Soup documentation for more information on the filters you can use.



Source: https://stackoverflow.com/questions/19011613/using-beautifulsoup-to-select-div-blocks-within-html
