Getting BeautifulSoup to catch tags in a non-case-sensitive way

Submitted by ぃ、小莉子 on 2021-02-07 08:35:58

Question


I want to catch some tags with BeautifulSoup: Some <p> tags, the <title> tag, some <meta> tags. But I want to catch them regardless of their case; I know that some sites do meta like this: <META> and I want to be able to catch that.

I noticed that BeautifulSoup is case-sensitive by default. How do I catch these tags in a non-case-sensitive way?


Answer 1:


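You can use soup.findAll with the lower-case tag name. It matches regardless of how the tag is capitalised in the source HTML, because BeautifulSoup lower-cases tag names when it parses the document: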

import BeautifulSoup

html = '''<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" /> 
<META name="keywords" content="HTML, CSS, XML" /> 
<title>Test</title>
</head>
<body>
</body>
</html>'''

soup = BeautifulSoup.BeautifulSoup(html)
for x in soup.findAll('meta'):
    print x

Result:

<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<meta name="keywords" content="HTML, CSS, XML" />
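The snippet above uses the old BeautifulSoup 3 module and the Python 2 print statement. A rough equivalent for Python 3 might look like the sketch below. It assumes the bs4 package is installed and uses the standard-library "html.parser" backend; both are my assumptions, not part of the original answer. It prints selected attributes rather than the whole tag, since the exact serialisation can differ between versions.

```python
from bs4 import BeautifulSoup

html = """<html>
<head>
<meta name="description" content="Free Web tutorials on HTML, CSS, XML" />
<META name="keywords" content="HTML, CSS, XML" />
<title>Test</title>
</head>
<body>
</body>
</html>"""

# Assumption: bs4 with the built-in "html.parser" backend.
soup = BeautifulSoup(html, "html.parser")

# Search with the lower-case name; the upper-case <META> is found too,
# because the parser lower-cased its name while building the tree.
metas = soup.find_all("meta")
for tag in metas:
    print(tag.name, tag.get("name"), tag.get("content"))
```

Both meta tags should come back here, whichever case they were written in.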



Answer 2:


BeautifulSoup standardises the parse tree on input. It converts tags to lower-case. You don't have anything to worry about IMO.
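To see that normalisation for yourself, something like the following sketch may help. It assumes bs4 with the "html.parser" backend, and find_tags is a hypothetical helper I made up for illustration, not a BeautifulSoup function. Note that this applies to HTML parsing; an XML parser would, as far as I know, keep tag names as written.

```python
from bs4 import BeautifulSoup

html = "<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY><P>one</P><p>two</p></BODY></HTML>"

# Assumption: bs4 with the built-in "html.parser" backend.
soup = BeautifulSoup(html, "html.parser")


def find_tags(soup, name):
    """Hypothetical helper: lower-case the requested name before searching,
    so callers can pass 'P', 'p', 'META', etc."""
    return soup.find_all(name.lower())


# find_all(True) returns every tag in document order.
names = [tag.name for tag in soup.find_all(True)]
paragraphs = find_tags(soup, "P")

print(names)
print([p.get_text() for p in paragraphs])
```

Since the tree only holds lower-case names, lower-casing the name you search for is the only normalisation needed on your side.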



Source: https://stackoverflow.com/questions/3352563/getting-beautifulsoup-to-catch-tags-in-a-non-case-sensitive-way
