Extracting value in BeautifulSoup

Submitted by 混江龙づ霸主 on 2019-12-22 00:44:10

Question


I have the following code:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)

f = open(path, 'r')
html = f.read()  # no arguments => reads to EOF and returns a string
f.close()

soup = BeautifulSoup(html)
schoolname = soup.findAll(attrs={'id':'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
print schoolname

which gives:

[<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">A B Paterson College, Arundel, QLD</span>]

When I try to access the value (i.e. 'A B Paterson College, Arundel, QLD') by using schoolname['value'], I get the following error:

print schoolname['value']
TypeError: list indices must be integers, not str

What am I doing wrong to get that value?
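A minimal sketch of what goes wrong, assuming Python 3 with BeautifulSoup 4 (pip install beautifulsoup4) instead of the Python 2 / BeautifulSoup 3 setup above; bs4 spells findAll as find_all. The HTML string stands in for the file contents. The search returns a list-like ResultSet even when there is only one match, so it has to be indexed with an integer before the tag's text can be read.

```python
from bs4 import BeautifulSoup

LABEL_ID = "ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel"
# Stand-in for the HTML that the question reads from a file.
html = f'<span id="{LABEL_ID}">A B Paterson College, Arundel, QLD</span>'

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all(attrs={"id": LABEL_ID})

# find_all returns a ResultSet (a list subclass), even for a single match.
print(type(results).__name__, len(results))

# Indexing a list with a string is what produced the TypeError.
try:
    results["value"]
    string_index_failed = False
except TypeError:
    string_index_failed = True
print(string_index_failed)

# Take the first Tag with an integer index, then read its text.
school_name = results[0].get_text()
print(school_name)
```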


Answer 1:


You can use contents to move down the tree:

>>> for x in schoolname:
...     print x.contents
...
[u'A B Paterson College, Arundel, QLD']

Note that contents is a list whose items are not necessarily strings: in general it can also hold further tags, or a mixture of strings and tags.
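To illustrate that point, here is a sketch with made-up markup (the nested <b> tag is hypothetical, not part of the question's HTML), again assuming Python 3 and BeautifulSoup 4. Each item in contents is either a Tag or a NavigableString; .string is None when a tag has more than one child, while get_text() joins all the text underneath.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical markup: the span holds a nested <b> tag as well as plain text.
html = '<span id="label"><b>A B Paterson College</b>, Arundel, QLD</span>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span")
kinds = []
for child in span.contents:
    # Each child is either a Tag or a NavigableString (a str subclass).
    if isinstance(child, Tag):
        kinds.append("tag")
        print("tag: ", child.name, repr(child.get_text()))
    elif isinstance(child, NavigableString):
        kinds.append("text")
        print("text:", repr(str(child)))

# .string is None here because the span has more than one child.
print(span.string)
# get_text() concatenates every string anywhere below the tag.
print(span.get_text())
```

If only the visible text is wanted, get_text() is usually the simpler choice; contents is useful when the tag structure inside matters.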




Answer 2:


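findAll returns a list (of Tag objects, not strings), which is why you get an exception when you index it with a string. I'm pretty sure your problem is solved simply by using find instead of findAll: find returns the first matching tag itself rather than a list. Then you should be able to access the text you want with:

schoolname.string

Note that schoolname['value'] would look up an HTML attribute named value, which this span does not have, so it would raise a KeyError.

Obviously this only works if you need just one specific value (the first match).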
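A sketch of the find route, assuming Python 3 and BeautifulSoup 4 with the question's HTML inlined as a string. find returns a single Tag (or None when nothing matches), and subscripting a Tag with a string reads an HTML attribute, so the text has to come from .string or get_text().

```python
from bs4 import BeautifulSoup

LABEL_ID = "ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel"
html = f'<span id="{LABEL_ID}">A B Paterson College, Arundel, QLD</span>'

soup = BeautifulSoup(html, "html.parser")
tag = soup.find(id=LABEL_ID)  # first match only, or None

if tag is None:
    raise SystemExit("label not found")

# Subscripting a Tag reads an HTML attribute.
print(tag["id"])

# The span has no value attribute: [] raises KeyError, .get() returns None.
try:
    tag["value"]
    has_value_attr = True
except KeyError:
    has_value_attr = False
print(has_value_attr, tag.get("value"))

# The text between the tags is what the question is after.
school_name = tag.string
print(school_name)
```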



Source: https://stackoverflow.com/questions/2616659/extracting-value-in-beautifulsoup
