wikipedia

Is there a clean Wikipedia API just for retrieving a content summary?

感情迁移 posted on 2019-11-26 06:09:25
Question: I just need to retrieve the first paragraph of a Wikipedia page. The content must be HTML-formatted, ready to be displayed on my websites (so no BBCode or Wikipedia-specific markup). Answer 1: There's a way to get the entire "intro section" without any HTML parsing. Similar to AnthonyS's answer, but with an additional explaintext param, you can get the intro section in plain text. Query (getting Stack Overflow's intro in plain text): https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts
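The query above can be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not the answer's exact request: the URL in the excerpt is cut off after `prop=extracts`, so the `exintro` and `titles` parameters used here are assumptions taken from the TextExtracts API, and the helper names `build_intro_query` and `extract_intro` are hypothetical. It also assumes the default JSON layout, where each page sits under `query.pages.<pageid>`.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_intro_query(title, plain_text=True):
    """Build a query URL for the intro section of one page (hypothetical helper)."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": 1,        # assumption: restrict the extract to the intro section
        "titles": title,     # assumption: page selected by title
    }
    if plain_text:
        # The param named in the answer: return plain text instead of HTML.
        params["explaintext"] = 1
    return API_ENDPOINT + "?" + urlencode(params)


def extract_intro(response):
    """Return the first extract found in a decoded JSON response, or None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None
```

Calling `build_intro_query("Stack Overflow")` gives a URL that can be fetched with any HTTP client. Passing `plain_text=False` leaves out `explaintext`, in which case the API should return the extract as limited HTML, which is closer to what the question asks for.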

How to extract information from a Wikipedia infobox?

你离开我真会死。 posted on 2019-11-26 04:52:31
Question: There is this fancy infobox in <some Wikipedia article>. How do I get the value of <this field and that>? Tgr: The wrong way: trying to parse HTML. Use (cURL/jQuery/file_get_contents/requests/wget/more jQuery) to fetch the HTML of the article, then use a DOM parser to extract table.infobox tr[3] td, or use a regex. This is actually a really bad idea most of the time. Wikipedia's HTML is not particularly parsing-friendly (especially infoboxes, which are a system of hand-written templates), the exact structure changes from infobox to infobox, and the structure of an infobox might

Is there a Wikipedia API?

╄→尐↘猪︶ㄣ posted on 2019-11-26 03:17:50
Question: On my Wikipedia user page, I run a Wikipedia script that displays my statistics (number of pages edited, number of new pages, monthly activity, etc.). I'd like to put this information on my blog. Is there an API that would allow me to do something like this? Answer 1: MediaWiki's API is running on Wikipedia (docs). You can also use the Special:Export feature to dump data and parse it yourself. More information. Answer 2: Wikipedia is built on MediaWiki, and here's the MediaWiki API. Answer 3: If you want
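As a rough illustration of how the MediaWiki API mentioned above might be used for user statistics, here is a minimal Python sketch. It is one possible approach, not something taken from the answers: it assumes the `list=users` query module with the `ususers` and `usprop=editcount` parameters, and the helper names `build_user_stats_query` and `parse_edit_counts` are hypothetical. It covers only the total edit count, not new pages or monthly activity.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_user_stats_query(username):
    """Build a query URL for a user's edit count (hypothetical helper)."""
    params = {
        "format": "json",
        "action": "query",
        "list": "users",         # assumption: the users list module
        "ususers": username,     # assumption: user selected by name
        "usprop": "editcount",   # assumption: only the total edit count
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_edit_counts(response):
    """Map user names to edit counts from a decoded JSON response."""
    users = response.get("query", {}).get("users", [])
    # Entries for unknown users carry no "editcount" key and are skipped.
    return {u["name"]: u["editcount"] for u in users if "editcount" in u}
```

The URL from `build_user_stats_query` can be fetched server-side and the parsed count rendered on the blog; the user name passed in is whatever account the statistics belong to.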

How to extract information from a Wikipedia infobox?

大城市里の小女人 posted on 2019-11-26 01:49:49
Question: There is this fancy infobox in <some Wikipedia article>. How do I get the value of <this field and that>? Answer 1: The wrong way: trying to parse HTML. Use (cURL/jQuery/file_get_contents/requests/wget/more jQuery) to fetch the HTML of the article, then use a DOM parser to extract table.infobox tr[3] td, or use a regex. This is actually a really bad idea most of the time. Wikipedia's HTML is not particularly parsing-friendly (especially infoboxes, which are a system of hand-written