Using BeautifulSoup get_text()

Submitted by 感情迁移 on 2019-12-23 23:44:45

Question


I can parse the field that I need from a website with this code block:

import bs4
import requests

response = requests.get(index_url)
soup = bs4.BeautifulSoup(response.text, "lxml")
poem = soup.select('div.siir p[id^=siir]')
print(poem)

However, the output still contains the HTML tags. I tried to strip them with the get_text() method, like this:

print(poem.get_text())

I get this error:

AttributeError: 'list' object has no attribute 'get_text'

I also tried calling it directly on the result of select():

poem = soup.select('div.siir p[id^=siir]').get_text()

I get the same error again. How can I strip the HTML tags once I have selected the correct field?


Answer 1:


soup.select() always returns a list of elements, not just one element. Call get_text() on each element in turn:

for element in poem:
    print(element.get_text())

If you expected just one element, then extract it with indexing:

print(poem[0].get_text())
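
To try both approaches without fetching a live page, here is a minimal, self-contained sketch. The HTML snippet is a hypothetical stand-in for the real page, and it uses the built-in "html.parser" instead of "lxml" only to avoid an extra dependency. It also shows select_one(), which returns a single element (or None) and so can be used in place of indexing into the list:

```python
import bs4

# Hypothetical stand-in for the downloaded page (not taken from the real site).
html = """
<div class="siir">
  <p id="siir1">First line</p>
  <p id="siir2">Second line</p>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# select() returns a list-like collection of matching tags, even for one match.
poem = soup.select("div.siir p[id^=siir]")

# Call get_text() on each element and collect the results.
lines = [element.get_text() for element in poem]
print("\n".join(lines))

# select_one() returns only the first match, or None if nothing matched.
first = soup.select_one("div.siir p[id^=siir]")
if first is not None:
    print(first.get_text())
```

Checking for None before calling get_text() avoids a different AttributeError when the selector matches nothing.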


Source: https://stackoverflow.com/questions/33318980/using-beautifulsoup-get-text
