Apply BeautifulSoup function to Pandas DataFrame

Submitted by 家住魔仙堡 on 2020-01-06 05:56:09

Question


I have a Pandas DataFrame that I read from a CSV file. The file contains HTML tags that I want to remove. I want to remove the tags with BeautifulSoup because it is more reliable than a simple regex like <.*?>.

I usually remove HTML tags from strings by executing:

text = BeautifulSoup(text, 'html.parser').get_text()

Now I want to do this with every element in my DataFrame, so I tried the following:

df.apply(lambda text: BeautifulSoup(text, 'html.parser').get_text())

But that returns the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index id')

Answer 1:


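Use applymap, which calls the function on each element. DataFrame.apply passes a whole column (a Series) to the function instead of a single string, which is why the BeautifulSoup call raises the ValueError. Note that in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which works the same way.

Example: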
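To see what DataFrame.apply actually hands to the function, here is a minimal sketch. The list name seen_types is only an illustration and is not part of the original question; the point is that the lambda receives Series objects rather than strings.

```python
import pandas as pd

df = pd.DataFrame({"a": ["<a>Hello</a>"], "b": ["<c>World</c>"]})

# Record the type of every argument that DataFrame.apply passes in.
# seen_types is a hypothetical name used only for this demonstration.
seen_types = []
df.apply(lambda col: seen_types.append(type(col).__name__))

# Each entry is "Series": apply works column by column, not cell by cell.
print(seen_types)
```

BeautifulSoup expects a single string or file-like object, so handing it a whole Series is what triggers the "truth value of a Series is ambiguous" error.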

import pandas as pd
from bs4 import BeautifulSoup


df = pd.DataFrame({"a": ["<a>Hello</a>"], "b":["<c>World</c>"]})
print(df.applymap(lambda text: BeautifulSoup(text, 'html.parser').get_text()))

Output:

       a      b
0  Hello  World
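A DataFrame read from a CSV usually also contains numeric columns and missing values, which BeautifulSoup cannot parse. The following is a sketch of one way to handle that, not part of the original answer: the helper names strip_html and strip_html_frame and the sample data are assumptions made for illustration. It maps over each column with Series.map, so it does not depend on whether applymap or DataFrame.map is available in your pandas version.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(value):
    """Remove HTML tags from a string; return any other value unchanged."""
    if isinstance(value, str):
        return BeautifulSoup(value, "html.parser").get_text()
    return value


def strip_html_frame(frame):
    """Hypothetical helper: apply strip_html to every cell of a DataFrame."""
    # apply hands over one column at a time; Series.map then goes cell by cell.
    return frame.apply(lambda col: col.map(strip_html))


# Sample data (assumed): a numeric id column and a text column with a missing value.
df = pd.DataFrame({"id": [1, 2], "text": ["<p>Hello <b>World</b></p>", None]})
clean = strip_html_frame(df)
print(clean)
```

Numbers and missing values pass through untouched, and only real strings are sent to BeautifulSoup.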

MoreInfo



Source: https://stackoverflow.com/questions/53189494/apply-beautifulsoup-function-to-pandas-dataframe
