bs4 | 易学教程

How to find all comments with Beautiful Soup

This question was asked four years ago, but the answer is now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 represents each comment as a special type of navigable string, I thought this code would work: `for comments in soup.find_all('comment'): comments.decompose()`. That didn't work. How do I find all comments using BS4?

Flickerlight: You can pass a function to `find_all()` to help it check whether the string is a `Comment`. For example, I have the HTML below: `<body> <div class="Branding">The Science & Safety`
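The function-based lookup described in the answer can be sketched roughly as follows. `find_all('comment')` searches for tags named `comment`, which is why the attempt in the question matches nothing. The sample HTML here is hypothetical (the original excerpt is cut off), and the sketch removes comments with `extract()`, a choice made here rather than something the excerpt confirms.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup; the HTML in the original answer is truncated.
html = """
<body>
  <!-- navigation placeholder -->
  <div class="Branding">Example text<!-- inline note --></div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Pass a function as the string filter: keep only strings that are Comment objects.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
comment_texts = [c.strip() for c in comments]

# Detach each comment from the tree.
for comment in comments:
    comment.extract()

# After removal, the same search should find nothing.
remaining = soup.find_all(string=lambda text: isinstance(text, Comment))
```

The `string=` keyword is used instead of the older `text=` spelling; on an old bs4 release that predates `string=`, the older keyword may be needed.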

Installing and Using BeautifulSoup

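Installing and using BeautifulSoup. BeautifulSoup is a great tool. The official site is http://www.crummy.com/software/BeautifulSoup/ and the download page is http://www.crummy.com/software/BeautifulSoup/bs4/download/4.1/ ; the attachment includes the 4.1.2 source for installation. A Chinese translation of the documentation is at http://www.crummy.com/software/BeautifulSoup/bs3/documentation.zh.html , but it is dated: it covers version 3.0 and is of limited use. I recommend http://www.crummy.com/software/BeautifulSoup/bs4/doc/ instead, the official English documentation, which is much clearer and easier to read. If you want to read web pages in Python in a jQuery-like way, without being tripped up by nonstandard markup and tags, BeautifulSoup is a good choice. According to the official site, BeautifulSoup is a screen-scraping library designed to save the effort of handling HTML tags and extracting information from the web. BeautifulSoup has several strengths that make it especially powerful: 1: Beautiful Soup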
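As a rough illustration of the jQuery-like access mentioned above, here is a minimal sketch using bs4's CSS selector methods. It assumes "jQuery-like" refers to CSS selectors, which the excerpt does not spell out. The sample HTML and variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page used only for illustration.
html = """
<html><body>
<div id="menu">
  <a class="nav" href="/home">Home</a>
  <a class="nav" href="/docs">Docs</a>
</div>
<p class="intro">Hello world</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector, much like jQuery's $("div#menu a.nav").
nav_links = [a["href"] for a in soup.select("div#menu a.nav")]

# select_one() returns the first match, or None if nothing matches.
intro_text = soup.select_one("p.intro").get_text()
```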

Extract `src` attribute from `img` tag using BeautifulSoup

Question: `<div class="someClass"> <a href="href"> <img alt="some" src="some"/> </a> </div>` I use bs4 and I cannot use `a.attrs['src']` to get the `src`, but I can get `href`. What should I do?

Answer 1: You can use BeautifulSoup to extract the `src` attribute of an HTML `img` tag. In my example, `htmlText` contains the `img` tag itself, but this can also be used for a URL, along with `urllib2`. For URLs:

from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2
page = urllib2.urlopen('http://www.youtube.com/')
soup =
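For the question above, `a.attrs['src']` fails because `src` is an attribute of the nested `img` tag, not of the `a` tag. A minimal sketch with bs4 (rather than the older `BeautifulSoup` 3 import and `urllib2` shown in the answer), using the markup from the question, might look like this. Variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = '<div class="someClass"> <a href="href"> <img alt="some" src="some"/> </a> </div>'

soup = BeautifulSoup(html, "html.parser")

a = soup.find("a")
href = a["href"]  # href belongs to the <a> tag itself

# src belongs to the <img> nested inside <a>, so look the image up first.
img = a.find("img")
src = img.get("src") if img is not None else None

# To collect src from every image on a page, filter on tags that have the attribute.
all_srcs = [tag["src"] for tag in soup.find_all("img", src=True)]
```

Using `get("src")` returns `None` for a missing attribute instead of raising `KeyError`, which can be convenient when some images lack a `src`.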