How can I extract the first paragraph from a Wikipedia article, using Python?
For example, for Albert Einstein, that would be:
<
What I did is this:
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup
article= "Albert Einstein"
article = urllib.quote(article)
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')] #wikipedia needs this
resource = opener.open("http://en.wikipedia.org/wiki/" + article)
data = resource.read()
resource.close()
soup = BeautifulSoup(data)
print soup.find('div',id="bodyContent").p