Remove all <a> tags

Submitted by 大兔子大兔子 on 2019-12-21 22:38:25

Question


I scraped a container that includes URLs, for example:

<a href="url">text</a>

I need all of these tags removed so that only the text remains.

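from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

site = "http://mysite.com"
page = urlopen(site)
# Name the parser explicitly; bs4 warns when none is given
soup = BeautifulSoup(page, "html.parser")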

Is it possible?


Answer 1:


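from bs4 import BeautifulSoup

soup = BeautifulSoup(page, "html.parser")
anchors = soup.find_all('a')
for anchor in anchors:
    # unwrap() (older name: replaceWithChildren) replaces the tag with its contents
    anchor.unwrap()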
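If BeautifulSoup is not available, roughly the same effect can be sketched with only Python 3's standard-library html.parser. This is not the answerer's method: AnchorStripper and strip_anchors are hypothetical names, and the re-serialization is approximate (end-tag names come back lowercased, and markup the handlers below do not cover, such as processing instructions, is dropped).

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Re-emits parsed HTML, dropping <a> start/end tags but keeping their contents."""

    def __init__(self):
        # convert_charrefs=False so entities like &amp; are passed through unchanged
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            # get_starttag_text() returns the raw source of the tag, attributes included
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text (including the text inside <a> tags) is always kept
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_anchors(html):
    """Return html with every <a> tag removed and its inner text left in place."""
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, strip_anchors('<a href="url">text</a>') returns 'text', and other tags such as <p> are left untouched.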



Answer 2:


You can do this with Bleach:

PyPI - Bleach

>>> import bleach

>>> bleach.clean('an <script>evil()</script> example')
u'an &lt;script&gt;evil()&lt;/script&gt; example'

>>> bleach.linkify('an http://example.com url')
u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'

>>> bleach.delinkify('a <a href="http://ex.mp">link</a>')
u'a link'


Source: https://stackoverflow.com/questions/13058284/remove-all-a-tags
