Question
I have a Pandas DataFrame that I created by reading a CSV file, and that file contains HTML tags that I want to remove. I want to remove them with BeautifulSoup because it is more reliable than a simple regex like <.*?>.
I usually remove HTML tags from strings like this:
text = BeautifulSoup(text, 'html.parser').get_text()
Now I want to do this with every element in my DataFrame, so I tried the following:
df.apply(lambda text: BeautifulSoup(text, 'html.parser').get_text())
But that raises the following error:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index id')
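Why apply fails here: by default (axis=0), DataFrame.apply calls the function once per column and passes the entire column as a Series, not one string at a time. Somewhere inside the BeautifulSoup constructor that Series ends up in a boolean test, which is what the ValueError reports, and "occurred at index id" names the column that was being processed. A minimal sketch to check what apply actually hands to the function, using a made-up two-column frame (the column names are hypothetical) in place of the CSV:

```python
import pandas as pd

# Hypothetical frame standing in for the CSV: a numeric "id" column plus an HTML column.
df = pd.DataFrame({"id": [1, 2], "body": ["<p>Hi</p>", "<b>Bye</b>"]})

# Record the name and type of whatever DataFrame.apply passes to the function.
received = []
df.apply(lambda col: received.append((col.name, type(col).__name__)))

print(received)
```

Each recorded call receives a whole column (a Series named after that column), so the lambda in the question never sees an individual string.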
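Answer 1:
Use applymap instead of apply. apply passes each whole column (a Series) to the function, whereas applymap calls it once per cell, so BeautifulSoup receives one string at a time.
Example: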
import pandas as pd
from bs4 import BeautifulSoup
df = pd.DataFrame({"a": ["<a>Hello</a>"], "b": ["<c>World</c>"]})
# applymap runs the lambda on every individual cell and returns a new DataFrame
print(df.applymap(lambda text: BeautifulSoup(text, 'html.parser').get_text()))
Output:
       a      b
0  Hello  World
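A slightly more defensive sketch, assuming the CSV may contain non-string cells (numeric ids, or NaN for empty fields) that should not be passed to BeautifulSoup. It only parses str values and leaves everything else unchanged. It also uses DataFrame.map, the name applymap was given in pandas 2.1 (where applymap was deprecated), and falls back to applymap on older versions. The helper names strip_html and strip_html_frame are hypothetical.

```python
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(value):
    """Hypothetical helper: return the text of an HTML string, pass other values through."""
    # Only strings are parsed, so ints and NaN from the CSV are left untouched.
    if isinstance(value, str):
        return BeautifulSoup(value, "html.parser").get_text()
    return value


def strip_html_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply strip_html to every cell of df."""
    # DataFrame.map exists from pandas 2.1 on; older versions only provide applymap.
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_html)
    return df.applymap(strip_html)


df = pd.DataFrame({
    "id": [1, 2],
    "a": ["<a>Hello</a>", np.nan],
    "b": ["<c>World</c>", "<b>Again</b>"],
})
print(strip_html_frame(df))
```

On pandas 2.1 and later, calling applymap directly still works but emits a FutureWarning, which is why the sketch prefers map when it is available.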
Source: https://stackoverflow.com/questions/53189494/apply-beautifulsoup-function-to-pandas-dataframe