How can I extract the first paragraph from a Wikipedia article, using Python?
For example, for Albert Einstein, that would be the article's opening paragraph.
The relatively new REST API has a summary method that is perfect for this use and does a lot of the things mentioned in the other answers here (e.g. removing wikicode). It even includes an image and geocoordinates if applicable.
Using the lovely requests module and Python 3:
import requests
r = requests.get("https://en.wikipedia.org/api/rest_v1/page/summary/Amsterdam")
r.raise_for_status()  # fail early on HTTP errors such as 404 for a missing page
page = r.json()
print(page["extract"])  # Prints 'Amsterdam is the capital and...'
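If you want this for arbitrary titles (such as the Albert Einstein example from the question), something along these lines might work. It is only a rough sketch: it uses the standard library's urllib instead of requests so that it has no dependencies, the helper names summary_url and first_paragraph are made up for illustration, and the User-Agent string is a placeholder you should replace with your own. It also assumes that any paragraph breaks in the extract are newlines.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://en.wikipedia.org/api/rest_v1/page/summary/"
# Placeholder User-Agent (an assumption): substitute your own tool name and contact.
USER_AGENT = "first-paragraph-example/0.1 (you@example.com)"


def summary_url(title):
    """Build the summary endpoint URL for an article title (hypothetical helper)."""
    # Wikipedia titles use underscores for spaces; quote() escapes the rest,
    # including "/" so that a title like "AC/DC" stays in one path segment.
    return API_ROOT + urllib.parse.quote(title.replace(" ", "_"), safe="")


def first_paragraph(title):
    """Fetch the plain-text extract and return the text before the first newline."""
    request = urllib.request.Request(
        summary_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        page = json.load(response)
    # Assumption: paragraphs inside "extract" are separated by newlines, if at all.
    return page["extract"].split("\n", 1)[0]
```

Calling first_paragraph("Albert Einstein") should then return the opening text of that article as a plain string. The URL building is kept in its own function so it can be checked without touching the network.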