Using find_all in BS4 to get text as a list

Submitted by 拟墨画扇 on 2019-12-11 04:38:06

Question


I'll start by saying I'm very new to Python. I've been building a Discord bot with discord.py and Beautiful Soup 4. Here's where I'm at:

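import aiohttp
from bs4 import BeautifulSoup
from discord.ext import commands

# Method of a cog class; aiohttp.get() and self.bot.say() come from older aiohttp and discord.py releases
@commands.command(hidden=True)
async def roster(self):
    """Gets a list of CD's members"""
    url = "http://www.clandestine.pw/roster.html"
    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")
    try:
        text = soupObject.find_all("font", attrs={'size': '4'})
        await self.bot.say(text)
    except:
        await self.bot.say("Not found!")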

Here's the output:

Now, I've tried using get_text() in several different ways to strip the brackets and HTML tags from this output, but it throws an error each time. How can I either do that, or put this data into an array or list and then print just the plain text?
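A likely reason get_text() fails here is that find_all() returns a ResultSet, which is a list of Tag objects; get_text() is defined on each Tag, not on the list. The sketch below assumes beautifulsoup4 is installed and uses a small hypothetical HTML snippet in place of the real roster page, whose markup isn't shown here. It also shows that find_all() returns an empty list instead of raising when nothing matches, so the except branch in the command above would not run just because no tags were found.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the roster page markup (not the real page).
SAMPLE_HTML = """
<font size="4">Alice</font>
<font size="4">Bob</font>
<font size="2">Last updated</font>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a ResultSet: a list subclass holding Tag objects.
tags = soup.find_all("font", attrs={"size": "4"})

# Calling get_text() on the whole list raises AttributeError.
error_message = None
try:
    tags.get_text()
except AttributeError as exc:
    error_message = str(exc)

# Calling get_text() on each Tag works and yields plain strings.
names = [tag.get_text() for tag in tags]
print(names)  # ['Alice', 'Bob']

# No match is not an error: find_all() just returns an empty list.
missing = soup.find_all("font", attrs={"size": "7"})
print(len(missing))  # 0
```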


Answer 1:


Replace

text = soupObject.find_all("font", attrs={'size': '4'})

with this:

all_font_tags = soupObject.find_all("font", attrs={'size': '4'})
list_of_inner_text = [x.text for x in all_font_tags]
# If you want to print the text as a comma-separated string
text = ', '.join(list_of_inner_text)
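The list comprehension in this answer can be taken one step further: Tag.text returns the text exactly as it appears, including any surrounding whitespace, while get_text(strip=True) trims each piece of text. The sketch below assumes beautifulsoup4 is installed and uses hypothetical markup with stray whitespace; whether the real roster page contains such whitespace is not known.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the kind of stray whitespace real pages often contain.
SAMPLE_HTML = """
<font size="4">
    Alice
</font>
<font size="4">Bob </font>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
all_font_tags = soup.find_all("font", attrs={"size": "4"})

# .text keeps surrounding whitespace; get_text(strip=True) trims each text piece.
raw = [x.text for x in all_font_tags]
clean = [x.get_text(strip=True) for x in all_font_tags]

text = ", ".join(clean)
print(text)  # Alice, Bob
```

With strip=True the joined string contains only the names, so it can be sent as-is.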



Answer 2:


You are returning a list of Tags from BeautifulSoup; the brackets you are seeing come from the list object.

Either return them as a list of strings:

# In Python 3, encode() returns bytes, so the prefix must be b"\xe2"
text = [member.get_text().strip()
        for member in soupObject.find_all("font", attrs={'size': '4'})
        if not member.get_text().encode("utf-8").startswith(b"\xe2")]

Or a single string:

text = ",".join(member.get_text()
                for member in soupObject.find_all("font", attrs={'size': '4'})
                if not member.get_text().encode("utf-8").startswith(b"\xe2"))
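The "\xe2" check in this answer comes from Python 2, where encode() produced a str. In Python 3, which discord.py requires, encode() returns bytes, so the prefix has to be b"\xe2"; comparing bytes against a str prefix raises TypeError. A UTF-8 lead byte of 0xE2 marks code points U+2000 to U+2FFF, a range that includes general punctuation such as bullets and dashes, arrows, and many symbols such as ★, so the filter appears intended to skip decorative entries that start with such a character. The sketch below assumes beautifulsoup4 is installed; the HTML and the helper names starts_with_e2_byte and starts_in_e2_range are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one decorative heading starts with a bullet (U+2022).
SAMPLE_HTML = """
<font size="4">Alice</font>
<font size="4">\u2022 Officers \u2022</font>
<font size="4">Bob</font>
"""


def starts_with_e2_byte(s: str) -> bool:
    """Python 3 form of the answer's check: compare bytes with a bytes prefix."""
    return s.encode("utf-8").startswith(b"\xe2")


def starts_in_e2_range(s: str) -> bool:
    """Same test without encoding: lead byte 0xE2 covers U+2000..U+2FFF."""
    return bool(s) and 0x2000 <= ord(s[0]) <= 0x2FFF


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
members = [
    tag.get_text().strip()
    for tag in soup.find_all("font", attrs={"size": "4"})
    if not starts_with_e2_byte(tag.get_text())
]
print(members)  # ['Alice', 'Bob']

# The original str prefix fails on Python 3 because encode() returns bytes.
str_prefix_fails = False
try:
    "\u2022".encode("utf-8").startswith("\xe2")
except TypeError:
    str_prefix_fails = True
print(str_prefix_fails)  # True
```

Both helpers give the same answer for any string; the range check simply avoids encoding each entry.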


Source: https://stackoverflow.com/questions/42652147/using-find-all-in-bs4-to-get-text-as-a-list
