How to extract data from a Wikipedia article?

Submitted by 雨燕双飞 on 2019-12-03 07:52:42

Question


I have a question regarding parsing data from Wikipedia for my Android app. I have a script that can download the XML by reading the source from http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (and also the JSON, by replacing format=xml with format=json).

But what I can't figure out is how to access only certain sections from the table of contents. What I want is that, once the page is loaded, the user can press a button that opens a pop-up listing the headers from the table of contents, and then read that section and only that section, for convenience. I'm a little shaky with JSON, but is it possible to do this? Or is there a Wikipedia API that lets the developer retrieve only certain parts of a page?

Thanks!


Answer 1:


Unfortunately, it seems the mediawiki.org documentation for parse doesn't tell you how to do this. But the documentation in the API itself does: you can use the section parameter to parse a single section, and prop=sections to get the list of sections.

So, you could first use:

http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=sections

to get the list of sections and then

http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=text&section=26

to get the HTML for a certain section.
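
The two requests above can be sketched in Python as follows. This is only an illustration, not part of the original answer: the helper names (build_parse_url, section_titles, fetch_json, show_sections) are made up for the example, it uses https and format=json instead of the http and format=xml URLs shown above, and it assumes that the JSON reply to prop=sections keeps the section list under parse → sections, with each entry carrying an "index" and a "line" (heading text) field.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: same endpoint as in the answer, but over https.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page, prop, section=None):
    """Build an action=parse URL; `section` is the section index, if any."""
    params = {"format": "json", "action": "parse", "page": page, "prop": prop}
    if section is not None:
        params["section"] = section
    return API_URL + "?" + urlencode(params)


def section_titles(response):
    """Map section index -> heading text from a prop=sections reply.

    Assumes the reply looks like {"parse": {"sections": [{"index": ..., "line": ...}]}}.
    """
    return {s["index"]: s["line"] for s in response["parse"]["sections"]}


def fetch_json(url):
    """Download and decode one API reply (needs network access)."""
    # Hypothetical User-Agent string; replace it with one identifying your app.
    request = Request(url, headers={"User-Agent": "section-demo/0.1 (example)"})
    with urlopen(request) as reply:
        return json.load(reply)


def show_sections(page):
    """Print the index and heading of every section of `page`."""
    toc = section_titles(fetch_json(build_parse_url(page, "sections")))
    for index, title in toc.items():
        print(index, title)
```

Calling show_sections("Android_(operating_system)") would list the section indexes and headings; passing one of those indexes to build_parse_url(page, "text", index) gives the URL for that section's HTML.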




Answer 2:


action=parse doesn't work well with per-section parsing. Consider this short example:

Foo is a bar<ref>really!</ref>
== References ==
<references/>

Parsing just the zeroth section will result in a red error message about a <ref> without <references>, while parsing the first one will result in an empty references list.

However, there's a better solution: action=mobileview is not only free from this problem, but it's also specifically intended for mobile apps and gives you mobile-optimized HTML.



Source: https://stackoverflow.com/questions/10467492/how-to-extract-data-from-a-wikipedia-article
