How can I extract the first paragraph from a Wikipedia article, using Python?
For example, for Albert Einstein, that would be:
Wikipedia runs a MediaWiki extension that provides exactly this functionality as an API module. TextExtracts implements action=query&prop=extracts with options to return the first N sentences and/or just the introduction, as HTML or plain text.
Here's the API call you want to make, try it: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Albert%20Einstein&exintro=&exsentences=2&explaintext=&redirects=&formatversion=2
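- action=query&prop=extracts to request this info
- exintro=, exsentences=2 and explaintext= ask TextExtracts for at most the first two sentences of the introduction, as plain text (leave off explaintext= to get HTML)
- redirects=(true) so if you ask for "titles=Einstein" you'll get the Albert Einstein page info
- formatversion=2 for a cleaner format in UTF-8.

There are various libraries that wrap invoking the MediaWiki action API, such as the one in DGund's answer, but it's not too hard to make the API calls yourself.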
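For example, here is a minimal sketch of making that call with only Python's standard library. It assumes the formatversion=2 JSON shape, where query.pages is a list. The function names and the User-Agent string are hypothetical (Wikimedia's User-Agent policy asks clients to send a descriptive one). format=json is added because the browser-friendly URL above relies on the API's default human-readable output.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent: Wikimedia asks API clients to identify themselves.
USER_AGENT = "first-paragraph-example/0.1 (contact: you@example.com)"


def build_extract_url(title: str, sentences: int = 2) -> str:
    """Build a TextExtracts query URL equivalent to the one above."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "exintro": "",             # only the lead section, before the first heading
        "exsentences": sentences,  # at most this many sentences
        "explaintext": "",         # plain text instead of HTML
        "redirects": "",           # follow redirects, e.g. "Einstein"
        "formatversion": 2,        # pages as a list, UTF-8 output
        "format": "json",          # raw JSON instead of the default pretty-printed jsonfm
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_extract(payload: dict) -> Optional[str]:
    """Return the extract of the first page in a formatversion=2 response, or None."""
    pages = payload.get("query", {}).get("pages", [])
    # A nonexistent title comes back as a page flagged "missing" with no extract.
    if not pages or pages[0].get("missing"):
        return None
    return pages[0].get("extract")


def first_paragraph(title: str, sentences: int = 2) -> Optional[str]:
    """Fetch the plain-text intro extract for a Wikipedia article title."""
    request = Request(build_extract_url(title, sentences),
                      headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return parse_extract(payload)
```

Calling first_paragraph("Einstein") makes a live request; because of redirects= it should resolve to the Albert Einstein article. Remove the exsentences entry if you want the whole introduction rather than a fixed number of sentences.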
The "Page info in search results" documentation page discusses getting this text extract, along with getting a description and lead image for articles.