I'm trying to parse out content from specific meta tags. Here's the structure of the meta tags. The first two are closed with a forward slash, but the rest don't have any clo
Edit: Added a case-insensitive regex, as suggested by @Albert Chen.
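`re.I` makes the match case-insensitive. As far as I know, BeautifulSoup tests a regex attribute filter with `search()` rather than a full match, so the unanchored pattern also accepts names such as `og:description`. A quick standard-library check of what each pattern accepts; the anchored `exact` pattern is my own suggestion, not part of the original answer:

```python
import re

# The pattern used in the answer: case-insensitive but unanchored.
loose = re.compile(r"description", re.I)
# A stricter alternative (an assumption, not from the answer): anchored,
# so only the exact name matches, in any letter case.
exact = re.compile(r"^description$", re.I)

names = ["description", "Description", "DESCRIPTION", "og:description", "keywords"]

# Record which names each pattern accepts when tested with search().
loose_hits = [n for n in names if loose.search(n)]
exact_hits = [n for n in names if exact.search(n)]

print(loose_hits)  # ['description', 'Description', 'DESCRIPTION', 'og:description']
print(exact_hits)  # ['description', 'Description', 'DESCRIPTION']
```

If a page has both `og:description` and `description`, `desc[0]` is simply whichever appears first in the markup.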
Python 3 Edit:
from bs4 import BeautifulSoup
import re
import urllib.request
page3 = urllib.request.urlopen("https://angel.co/uber").read()
soup3 = BeautifulSoup(page3, "html.parser")  # name the parser explicitly to avoid bs4's warning
desc = soup3.find_all(attrs={"name": re.compile(r"description", re.I)})
print(desc[0]['content'])
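Since the question mentions meta tags that aren't closed: a tag-event parser like the standard library's `html.parser` reports each `<meta>` start tag whether it is written as `<meta ...>` or `<meta ... />`, so the missing close shouldn't matter. Below is a dependency-free sketch of the same lookup; `MetaContentParser` and `meta_contents` are hypothetical names, and it works on an HTML string so it can be tried without fetching the page. Unlike `desc[0]`, it returns an empty list instead of raising `IndexError` when nothing matches.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect the content of <meta> tags whose name equals `wanted`, ignoring case."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted.lower()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # Called for <meta ...>; for <meta ... /> the default handle_startendtag
        # forwards here too. Tag and attribute names arrive lowercased.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower() == self.wanted and attributes.get("content") is not None:
            self.contents.append(attributes["content"])


def meta_contents(html, name):
    """Return the content values of all matching meta tags, in document order."""
    parser = MetaContentParser(name)
    parser.feed(html)
    parser.close()
    return parser.contents


sample = """
<head>
  <meta name="Description" content="Self-closed tag" />
  <meta name="keywords" content="uber, rides">
  <meta name="description" content="Unclosed tag">
</head>
"""

print(meta_contents(sample, "description"))  # ['Self-closed tag', 'Unclosed tag']
```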
Original answer (Python 2), although I'm not sure it will work for every page:
from bs4 import BeautifulSoup
import re
import urllib
page3 = urllib.urlopen("https://angel.co/uber").read()
soup3 = BeautifulSoup(page3, "html.parser")  # name the parser explicitly to avoid bs4's warning
desc = soup3.find_all(attrs={"name": re.compile(r"description", re.I)})
print(desc[0]['content'].encode('utf-8'))
Yields:
Learn about Uber's product, founders, investors and team. Everyone's Private Driver - Request a car from any mobile phone—text message, iPhone and Android apps. Within minutes, a professional driver in a sleek black car will arrive curbside. Automatically charged to your credit card on file, tip included.
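The description contains an em-dash (U+2014). The Python 2 code prints the UTF-8 encoded bytes, so a console that decodes output with a different code page can render that dash as `ΓÇö`. I'm assuming cp437, the default on US-English Windows consoles; the round trip can be checked like this:

```python
dash = "\u2014"  # EM DASH, as it appears in the meta description

utf8_bytes = dash.encode("utf-8")                 # what .encode('utf-8') hands to print
garbled = utf8_bytes.decode("cp437")              # how a cp437 console would render those bytes
fixed = garbled.encode("cp437").decode("utf-8")   # undoing the wrong decode recovers the dash

print(utf8_bytes)      # b'\xe2\x80\x94'
print(ascii(garbled))  # '\u0393\xc7\xf6', i.e. the three characters Γ, Ç, ö
print(fixed == dash)   # True
```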