Extract the first paragraph from a Wikipedia article (Python)

闹比i 2020-11-28 01:36

How can I extract the first paragraph from a Wikipedia article, using Python?

For example, for Albert Einstein, that would be:


10 answers

小蘑菇 (original poster)

2020-11-28 02:01
Wikipedia runs a MediaWiki extension that provides exactly this functionality as an API module. TextExtracts implements action=query&prop=extracts with options to return the first N sentences and/or just the introduction, as HTML or plain text.

Here's the API call you want to make; try it: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Albert%20Einstein&exintro=&exsentences=2&explaintext=&redirects=&formatversion=2
- action=query&prop=extracts to request this info
- exsentences=2, exintro= and explaintext= are parameters to the module (see the first link for its API doc) asking for two sentences from the intro as plain text; leave off explaintext to get HTML instead.
- redirects= (true) so that if you ask for "titles=Einstein" you'll get the Albert Einstein page info
- formatversion=2 for a cleaner format in UTF-8.
There are various libraries that wrap invoking the MediaWiki action API, such as the one in DGund's answer, but it's not too hard to make the API calls yourself.
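
To illustrate making the call yourself, here is a minimal sketch using only the Python standard library. It sends the same parameters as the URL above, plus format=json so the API returns machine-readable JSON. The function names, the User-Agent string, and the timeout are my own placeholders, not anything prescribed by the API, and the parsing assumes the formatversion=2 response shape, where query.pages is a list.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, sentences=2):
    """Build the TextExtracts query URL (helper name is hypothetical)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "exintro": "",            # only the introduction
        "exsentences": sentences, # first N sentences
        "explaintext": "",        # plain text instead of HTML
        "redirects": "",          # follow redirects, e.g. "Einstein"
        "formatversion": 2,
        "format": "json",         # added here so the reply is JSON
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_extract(payload):
    """Return the extract of the first page, or None if there is none."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages:
        return None
    return pages[0].get("extract")


def fetch_intro(title, sentences=2):
    """Call the API and return the intro text (requires network access)."""
    request = urllib.request.Request(
        build_extract_url(title, sentences),
        # Placeholder User-Agent; replace it with one describing your tool.
        headers={"User-Agent": "intro-extract-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_extract(json.load(response))


if __name__ == "__main__":
    print(fetch_intro("Albert Einstein"))
```

If the title does not exist, the page entry has no extract field and fetch_intro returns None, so check for that before using the result.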

The "Page info in search results" documentation page discusses getting this text extract, along with getting a description and lead image for articles.