I'm having trouble parsing HTML elements with a "class" attribute using BeautifulSoup. The code looks like this:
soup = BeautifulSoup(sdata)
mydivs = soup.fi
Other answers did not work for me. In those answers, findAll is called on the soup object itself, but I needed a way to find by class name on objects inside a specific element extracted from the result of an earlier findAll.
If you are trying to search inside nested HTML elements to get objects by class name, try the following:
from bs4 import BeautifulSoup as soup

# parse html (web_page is an already-opened response object, defined elsewhere)
page_soup = soup(web_page.read(), "html.parser")
# filter out items matching class name
all_songs = page_soup.findAll("li", "song_item")
# traverse through all_songs
for song in all_songs:
    # get text out of span element matching class 'song_name'
    # doing a 'find' by class name within a specific song element taken out of 'all_songs' collection
    print(song.find("span", "song_name").text)
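To try the nested lookup without a live page, here is a minimal self-contained sketch. The HTML string and the helper name song_names are invented for illustration and are not part of the original code. It also guards against find returning None when an item has no matching span, which the loop above would otherwise hit as an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example only
html = """
<ul>
  <li class="song_item"><span class="song_name">First Song</span></li>
  <li class="song_item"><span class="artist">No name here</span></li>
  <li class="other"><span class="song_name">Ignored</span></li>
</ul>
"""


def song_names(markup):
    """Collect the text of span.song_name found inside each li.song_item."""
    page_soup = BeautifulSoup(markup, "html.parser")
    names = []
    # Outer search: every <li> whose class is song_item
    for song in page_soup.find_all("li", "song_item"):
        # Inner search: runs on the individual <li> element, not on the whole soup
        span = song.find("span", "song_name")
        # find returns None when nothing matches, so check before reading .text
        if span is not None:
            names.append(span.text)
    return names


print(song_names(html))
```

find_all is the current spelling of the method; findAll, used above, is an older alias that behaves the same way.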
Points to note:
I'm not explicitly specifying the 'class' attribute, as in findAll("li", {"class": "song_item"}), since it's the only attribute I'm searching on. When the second argument is a plain string, it is matched against the class attribute by default, unless you explicitly say which attribute to search on.
findAll returns an object of class bs4.element.ResultSet, which is a subclass of list; each item in it is a bs4.element.Tag. find returns a single Tag, or None if nothing matches. find and findAll are methods of Tag, not of ResultSet, so you can call them on each element at any level of nesting, but not on the result list itself.
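Both points can be checked with a short sketch. The HTML snippet and variable names below are made up for the example; the calls themselves (find_all, find, the class_ keyword) are standard bs4 API.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Hypothetical one-item document, for illustration only
html = '<ul><li class="song_item"><span class="song_name">A</span></li></ul>'
page_soup = BeautifulSoup(html, "html.parser")

# Three ways to ask for <li> elements with class "song_item"
by_position = page_soup.find_all("li", "song_item")
by_dict = page_soup.find_all("li", {"class": "song_item"})
by_keyword = page_soup.find_all("li", class_="song_item")
same_results = by_position == by_dict == by_keyword

# find_all gives a ResultSet (a list subclass) whose items are Tag objects
result_is_list = isinstance(by_position, ResultSet) and isinstance(by_position, list)
items_are_tags = all(isinstance(item, Tag) for item in by_position)

# find gives a single Tag, or None when nothing matches
first = page_soup.find("li", "song_item")
missing = page_soup.find("li", "no_such_class")

# The search methods live on Tag, not on the result list
list_has_find = hasattr(by_position, "find_all")
tag_has_find = hasattr(first, "find_all")

print(same_results, result_is_list, items_are_tags, missing, list_has_find, tag_has_find)
```

In practice this means a nested search always goes through an individual element, for example by_position[0].find("span", "song_name"), rather than through the list returned by find_all.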
My BS4 version - 4.9.1, Python version - 3.8.1