How do I grab just the parsed Infobox of a wikipedia article?

后端 未结 8 1675
萌比男神i
萌比男神i 2020-12-16 04:05

I\'m still stuck on my problem of trying to parse articles from wikipedia. Actually I wish to parse the infobox section of articles from wikipedia i.e. my application has re

8条回答
  •  余生分开走
    2020-12-16 04:07

    I'd use the wikipedia (wikimedia) API. You can get data back in JSON, XML, php native format, and others. You'll then still need to parse the returned information to extract and format the info you want, but the info box start, stop, and information types are clear.

    Run your query for just rvsection=0, as this first section gets you the material before the first section break, including the infobox. Then you'll need to parse the infobox content, which shouldn't be too hard. See en.wikipedia.org/w/api.php for the formal wikipedia api documentation, and www.mediawiki.org/wiki/API for the manual.

    Run, for example, the query: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=fortran&rvsection=0

提交回复
热议问题