How would I, using BeautifulSoup, search for tags containing ONLY the attributes I search for?
For example, I want to find all `<td>` tags whose only attribute is `valign="top"`.

Adding a combination of Chris Redford's and Amr's answers, you can also search for an attribute name with any value using the `select` method:

```python
from bs4 import BeautifulSoup as Soup

html = '''
<td valign="top">.....</td>
<td width="580" valign="top">.......</td>
<td>.....</td>
'''
soup = Soup(html, 'lxml')
# Matches every <td> that has a valign attribute, whatever its value
results = soup.select('td[valign]')
```

As explained in the BeautifulSoup documentation, you may use this:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("td", {"valign": "top"})
```

EDIT: To return only the tags whose single attribute is `valign="top"`, you can check the length of the tag's `attrs` property:

```python
from bs4 import BeautifulSoup

html = '''
<td valign="top">.....</td>
<td width="580" valign="top">.......</td>
<td>.....</td>
'''
soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("td", {"valign": "top"})
for result in results:
    # Keep only tags whose one and only attribute is valign
    if len(result.attrs) == 1:
        print(result)
```

That returns:

```
<td valign="top">.....</td>
```

Just pass it as a keyword argument to `find_all` (called `findAll` in BeautifulSoup 3):

```python
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <html>
... <head><title>My Title!</title></head>
... <body><table>
... <tr><td>First!</td>
... <td valign="top">Second!</td></tr>
... </table></body></html>
... """, "html.parser")
>>>
>>> soup.find_all('td')
[<td>First!</td>, <td valign="top">Second!</td>]
>>>
>>> soup.find_all('td', valign='top')
[<td valign="top">Second!</td>]
```

The easiest way to do this is with the newer CSS-style `select` method:

```python
soup = BeautifulSoup(html, "html.parser")
# Like the find_all version, this also matches <td> tags that have extra attributes
results = soup.select('td[valign="top"]')
```

If you only want to search by attribute name, with any value:

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
results = soup.find_all("td", {"valign": re.compile(r".*")})
```

As per Steve Lorimer, it is better to pass `True` instead of a regex:

```python
results = soup.find_all("td", {"valign": True})
```
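The regex, `True` and `td[valign]` variants all match on the mere presence of `valign`. On their own they still return tags that carry other attributes as well. Here is a minimal check, assuming BeautifulSoup 4 with the built-in `html.parser`. The `<table>` wrapper is added here only so the fragment is valid HTML.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td valign="top">.....</td>
<td width="580" valign="top">.......</td>
<td>.....</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Presence-only matching: any <td> that has a valign attribute, whatever its value.
by_true = soup.find_all("td", {"valign": True})
by_css = soup.select("td[valign]")

# Both results still include the <td> that also carries width="580".
print([dict(tag.attrs) for tag in by_true])
print([dict(tag.attrs) for tag in by_css])
```

To get the "only these attributes" behaviour, you still need an extra check on `tag.attrs`, such as the length test shown above.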
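You can use `lambda` functions in `find_all`, as explained in the documentation. In your case, to search for `td` tags whose only attribute is `valign="top"`, use the following:

```python
td_tag_list = soup.find_all(
    lambda tag: tag.name == "td"
    and len(tag.attrs) == 1
    # tag.get() avoids a KeyError when the single attribute is not valign
    and tag.get("valign") == "top"
)
```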
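The following is a minimal sketch of a reusable exact-match search, assuming BeautifulSoup 4 and the built-in `html.parser`. The helper name `find_with_exact_attrs` is hypothetical and not part of the BeautifulSoup API. Instead of checking only the number of attributes, it compares the tag's whole attribute dictionary with the one you ask for. This approach also works when several attributes are required.

```python
from bs4 import BeautifulSoup


def find_with_exact_attrs(soup, name, wanted):
    """Return tags called `name` whose attributes are exactly `wanted`.

    Hypothetical helper, not part of BeautifulSoup. bs4 stores multi-valued
    attributes such as class as lists, so list values are joined with spaces
    before comparing them with the plain strings in `wanted`.
    """

    def normalized(tag):
        return {
            key: " ".join(value) if isinstance(value, list) else value
            for key, value in tag.attrs.items()
        }

    # find_all calls the function once per tag and keeps those returning True
    return soup.find_all(lambda tag: tag.name == name and normalized(tag) == wanted)


html = """
<table><tr>
<td valign="top">a</td>
<td width="580" valign="top">b</td>
<td>c</td>
<td class="x y">d</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

only_valign = find_with_exact_attrs(soup, "td", {"valign": "top"})
only_class = find_with_exact_attrs(soup, "td", {"class": "x y"})
no_attrs = find_with_exact_attrs(soup, "td", {})
print([td.get_text() for td in only_valign])
print([td.get_text() for td in only_class])
print([td.get_text() for td in no_attrs])
```

Joining list values is the design choice that matters here. Without it, `{"class": "x y"}` would never equal the parsed tag's attributes, because bs4 splits `class` into a list.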