I would like to get all the tags in a document and then process each one based on the presence (or absence) of certain attributes.
E.g.,
By using the pprint module you can examine the contents of an element.
from pprint import pprint
pprint(vars(element))
Using this on a bs4 element will print something similar to this:
{'attrs': {u'class': [u'pie-productname', u'size-3', u'name', u'global-name']},
'can_be_empty_element': False,
'contents': [u'\n\t\t\t\tNESNA\n\t'],
'hidden': False,
'name': u'span',
'namespace': None,
'next_element': u'\n\t\t\t\tNESNA\n\t',
'next_sibling': u'\n',
'parent': \nBedside table\n\n\t\t\t\tNESNA\n\t\n
,
'parser_class': ,
'prefix': None,
'previous_element': u'\n',
'previous_sibling': u'\n'}
To access an attribute, let's say the class list, use the following:
class_list = element.attrs.get('class', [])
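Here is a small self-contained sketch of that lookup. The HTML snippet and variable names are made up for illustration, and it assumes the built-in `html.parser`. Multi-valued attributes such as `class` come back as a list, single-valued ones as a plain string, and a missing attribute falls back to the default you pass to `get`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = '<span class="size-3 name" id="title">NESNA</span>'
soup = BeautifulSoup(html, "html.parser")
element = soup.find("span")

class_list = element.attrs.get("class", [])  # list of class names
element_id = element.attrs.get("id")         # plain string
missing = element.attrs.get("href")          # None, the attribute is absent

print(class_list, element_id, missing)
```

`element.get('class', [])` is a shorthand for the same dictionary lookup.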
You can filter elements using this approach:
for script in soup.find_all('script'):
    if script.attrs.get('for'):
        # ... Has 'for' attr
        pass
    elif "myClass" in script.attrs.get('class', []):
        # ... Has class "myClass"
        pass
    else:
        # ... Do something else
        pass
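To cover every tag in the document rather than just `script` elements, the same branching can be applied to the result of `find_all(True)`, which matches all tags. The following is only a sketch: the sample HTML and the `classify` helper are hypothetical, and `has_attr` is used so that an attribute that is present but empty still counts as present.

```python
from bs4 import BeautifulSoup

# Hypothetical document, used only for illustration.
html = """
<div id="main">
  <label for="q">Search</label>
  <span class="myClass other">hit</span>
  <p>plain</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def classify(tag):
    """Return a label for the branch a tag falls into (hypothetical helper)."""
    if tag.has_attr("for"):
        return "has-for"
    if "myClass" in tag.get("class", []):
        return "has-myClass"
    return "other"


# find_all(True) returns every tag in the document, in document order.
results = [(tag.name, classify(tag)) for tag in soup.find_all(True)]
print(results)
```

Replace the returned labels with whatever processing each case needs.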