How do you parse HTML with a variety of languages and parsing libraries?
When answering:
Individual comments will be linked to in answers to questions
language: Python
library: BeautifulSoup
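# BeautifulSoup 3 on Python 2; with BeautifulSoup 4 the import is `from bs4 import BeautifulSoup`
from BeautifulSoup import BeautifulSoup

# Build a small test document containing three links
html = "<html><body>"
for link in ("foo", "bar", "baz"):
    html += '<a href="http://%s.com">%s</a>' % (link, link)
html += "</body></html>"

soup = BeautifulSoup(html)
links = soup.findAll('a', href=True)  # find <a> tags with a defined href attribute
print links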
output:
[<a href="http://foo.com">foo</a>,
 <a href="http://bar.com">bar</a>,
 <a href="http://baz.com">baz</a>]
also possible:
for link in links:
    print link['href']
output:
http://foo.com
http://bar.com
http://baz.com
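For readers who do not have BeautifulSoup installed, the same link extraction can be sketched with Python 3's standard-library `html.parser` module. This swaps in a different parser and is not part of the answer above. `LinkCollector` and `extract_hrefs` are hypothetical names chosen for illustration, and the anchor markup in the test document is assumed from the printed output.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that defines one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href values found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Same three-link test document as in the BeautifulSoup example (assumed markup)
html = "<html><body>"
for link in ("foo", "bar", "baz"):
    html += '<a href="http://%s.com">%s</a>' % (link, link)
html += "</body></html>"

for href in extract_hrefs(html):
    print(href)
```

This event-driven style does not build a tree, so it suits simple extraction tasks; for navigating or modifying a document, a tree-building library such as BeautifulSoup is usually more convenient.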