Extract the first paragraph from a Wikipedia article (Python)

闹比i 2020-11-28 01:36

How can I extract the first paragraph from a Wikipedia article, using Python?

For example, for Albert Einstein, that would be:

<
10 answers
  •  猫巷女王i
    2020-11-28 02:18

    What I did is this:

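    # Python 2 code: uses urllib2 and the BeautifulSoup 3 package.
    import urllib
    import urllib2
    from BeautifulSoup import BeautifulSoup
    
    # URL-encode the title so it can be used in the URL path.
    article = "Albert Einstein"
    article = urllib.quote(article)
    
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]  # Wikipedia needs this
    
    # Download the article page.
    resource = opener.open("http://en.wikipedia.org/wiki/" + article)
    data = resource.read()
    resource.close()
    
    # Parse the HTML and print the first <p> inside the bodyContent div.
    soup = BeautifulSoup(data)
    print soup.find('div', id="bodyContent").p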
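
The same steps (quote the title, fetch the page with a User-agent header, take the first paragraph inside the bodyContent div) can also be sketched in Python 3 with only the standard library. This is a rough sketch, not a port of the code above: it swaps BeautifulSoup for html.parser, returns plain text rather than the tag, and skips empty paragraphs. The names FirstParagraphParser, first_paragraph and fetch_article_html are made up for illustration, and it assumes the page closes its p tags and still uses a div with id="bodyContent".

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collect the text of the first non-empty <p> inside <div id="bodyContent">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0     # > 0 while inside the bodyContent div
        self.in_p = False      # True while inside a candidate <p>
        self.chunks = []       # text pieces of the current <p>
        self.paragraph = None  # set once a non-empty paragraph is complete

    def handle_starttag(self, tag, attrs):
        if self.paragraph is not None:
            return
        if tag == "div":
            # Start counting at bodyContent, then track nested divs.
            if self.div_depth or dict(attrs).get("id") == "bodyContent":
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_p = True
            self.chunks = []

    def handle_endtag(self, tag):
        if self.paragraph is not None:
            return
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            # Collapse runs of whitespace; ignore paragraphs with no text.
            text = " ".join("".join(self.chunks).split())
            if text:
                self.paragraph = text

    def handle_data(self, data):
        if self.in_p and self.paragraph is None:
            self.chunks.append(data)


def first_paragraph(html):
    """Return the first non-empty paragraph text, or None if there is none."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.paragraph


def fetch_article_html(title):
    """Download the article page, sending a browser-like User-Agent."""
    url = "https://en.wikipedia.org/wiki/" + urllib.parse.quote(title)
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Usage would be something like print(first_paragraph(fetch_article_html("Albert Einstein"))). Footnote markers inside the paragraph are kept as text, so some cleanup may still be needed.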
    
