Question
I'll start by saying I'm very new to Python. I've been building a Discord bot with discord.py and Beautiful Soup 4. Here's where I'm at:
@commands.command(hidden=True)
async def roster(self):
    """Gets a list of CD's members"""
    url = "http://www.clandestine.pw/roster.html"
    async with aiohttp.get(url) as response:
        soupObject = BeautifulSoup(await response.text(), "html.parser")
    try:
        text = soupObject.find_all("font", attrs={'size': '4'})
        await self.bot.say(text)
    except:
        await self.bot.say("Not found!")
Here's the output:
Now, I've tried using get_text() in several different ways to strip the brackets and HTML tags from this output, but it throws an error each time. How can I either achieve that, or put this data into a list and then just print the plain text?
Answer 1:
Replace
text = soupObject.find_all("font", attrs={'size': '4'})
with this:
all_font_tags = soupObject.find_all("font", attrs={'size': '4'})
list_of_inner_text = [x.text for x in all_font_tags]
# If you want to print the text as a comma separated string
text = ', '.join(list_of_inner_text)
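As a rough, self-contained illustration of the replacement above, the sketch below runs the same find_all / join pattern against a small inline HTML string. The markup and the names in it are made up for the example, since the real roster page's HTML is not shown in the question; get_text(strip=True) is used in place of .text only to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the roster page (an assumption,
# not the real page's HTML).
html = """
<font size="4">Alice</font>
<font size="2">ignored</font>
<font size="4"> Bob </font>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like collection of Tag objects, not strings.
all_font_tags = soup.find_all("font", attrs={"size": "4"})

# Pull the plain text out of each tag, trimming surrounding whitespace.
names = [tag.get_text(strip=True) for tag in all_font_tags]

# Join into one string so no list brackets appear when it is printed or sent.
text = ", ".join(names)

print(names)
print(text)
```

Sending `text` instead of the raw find_all result should then produce a plain comma-separated line rather than a bracketed list of tags.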
Answer 2:
You are returning a list of Tag objects from BeautifulSoup; the brackets you are seeing come from the list object itself.
Either return them as a list of strings:
# Python 3: keep the results as str; encode only to test the first byte, and compare against a bytes literal
text = [Member.get_text().strip() for Member in soupObject.find_all("font", attrs={'size': '4'}) if not Member.get_text().encode("utf-8").startswith(b"\xe2")]
Or a single string:
text = ",".join([Member.get_text().strip() for Member in soupObject.find_all("font", attrs={'size': '4'}) if not Member.get_text().encode("utf-8").startswith(b"\xe2")])
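The "\xe2" check is not explained in the answer, so here is a minimal sketch of what it appears to do. In UTF-8, every code point from U+2000 to U+2FFF (bullets, en and em dashes, curly quotes and similar symbols) encodes to three bytes beginning with 0xE2, so the filter drops entries whose text starts with one of those characters. The sample strings and the helper name starts_with_e2 are assumptions made for illustration; which characters actually occur on the roster page is not stated in the thread.

```python
def starts_with_e2(s):
    """Return True if the UTF-8 encoding of s begins with the byte 0xE2."""
    # In Python 3 the encoded value is bytes, so compare against a bytes literal.
    return s.encode("utf-8").startswith(b"\xe2")


# Hypothetical scraped strings: two decorated headings and two member names.
raw = ["\u2022 Officers", "Alice", "  Bob  ", "\u2013 Members"]

# Keep only the entries that do not start with a U+2000..U+2FFF character,
# and trim whitespace from the ones that are kept.
names = [s.strip() for s in raw if not starts_with_e2(s)]
text = ",".join(names)

print(names)
print(text)
```

If the unwanted entries are known to start with one particular character, comparing against that character directly (for example `s.startswith("\u2022")`) would be a more readable alternative to the byte-level test.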
Source: https://stackoverflow.com/questions/42652147/using-find-all-in-bs4-to-get-text-as-a-list