Extract text between link tags in python using BeautifulSoup

一个人想着一个人 submitted on 2019-12-01 09:57:17

Question


I have HTML like this:

<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>

<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>

I need to extract the text (the link descriptions) between the 'a' tags and store it in an array, like this:

a[0] = "My HomePage"

a[1] = "Sections"

I need to do this in Python using BeautifulSoup.

Please help me, thank you!


Answer 1:


You can do something like this:

from bs4 import BeautifulSoup

html = """
<html><head></head>
<body>
<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Take the text of the first <a> inside each <h2 class="title">
print([elm.a.text for elm in soup.find_all('h2', {'class': 'title'})])
# Output: ['My HomePage', 'Sections']
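
One caveat with this approach: `elm.a` is `None` for any matching `h2` that has no link inside it, so `.text` would raise `AttributeError`. Below is a minimal sketch of a more defensive variant, assuming the bs4 package (BeautifulSoup 4) and its `select()` CSS-selector method. The third `h2` is an invented test case, not part of the question's HTML.

```python
from bs4 import BeautifulSoup

# The question's markup plus one hypothetical heading with no link in it.
html = """
<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>
<h2 class="title">No link here</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# "h2.title > a" matches only <a> tags that are direct children of
# <h2 class="title">, so headings without a link are simply skipped.
link_texts = [a.get_text() for a in soup.select("h2.title > a")]
print(link_texts)
```

Selecting the `a` tags directly, rather than reaching into each `h2`, avoids the `None` check altogether.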



Answer 2:


# For each link, collect its text nodes (gives one list per <a> tag)
print([a.findAll(text=True) for a in soup.findAll('a')])
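
Note that this returns a list of text nodes for each link, so the result is a list of lists rather than a flat list of strings. Here is a sketch of one way to flatten it, assuming bs4 (BeautifulSoup 4), where `find_all(string=True)` is the current spelling of `findAll(text=True)`. The `<b>` tag in the second link is invented to show a link with more than one text node.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's HTML: the second link has nested markup.
html = """
<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">All <b>Sections</b></a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# One list of text nodes per link; nested tags split the text into pieces.
nested = [a.find_all(string=True) for a in soup.find_all("a")]
print(nested)

# get_text() concatenates all text nodes of a link into a single string.
flat = [a.get_text() for a in soup.find_all("a")]
print(flat)
```

With `get_text()`, each link contributes exactly one string, which matches the `a[0]`, `a[1]` layout the question asks for.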




Answer 3:


The following code extracts the text (link descriptions) between the 'a' tags and stores it in a list.

>>> from bs4 import BeautifulSoup
>>> data = """<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
...
... <h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> reqTxt = soup.find_all("h2", {"class":"title"})
>>> a = []
>>> for i in reqTxt:
...     a.append(i.get_text())
...
>>> a
['My HomePage', 'Sections']
>>> a[0]
'My HomePage'
>>> a[1]
'Sections'
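
If the markup wraps across lines inside a tag, `get_text()` keeps those line breaks in the result. Here is a sketch of one way to normalise them, assuming bs4 (BeautifulSoup 4). The wrapped `data` string is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical input where the link text and a closing tag wrap onto new lines.
data = """<h2 class="title"><a href="http://www.gurletins.com">My
HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a>
</h2>"""

soup = BeautifulSoup(data, "html.parser")

# Raw text still contains the newlines from the source markup.
raw = [h2.get_text() for h2 in soup.find_all("h2", class_="title")]

# str.split() with no arguments splits on any run of whitespace, so
# re-joining with one space collapses newlines and trims both ends.
clean = [" ".join(text.split()) for text in raw]
print(clean)
```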


Source: https://stackoverflow.com/questions/6251319/extract-text-between-link-tags-in-python-using-beautifulsoup
