Parsing a Wikipedia dump

生来不讨喜 2020-12-03 05:33

For example, using this Wikipedia dump:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=t

9 answers
  •  情话喂你
    2020-12-03 05:51

    There's some information on Python and XML libraries here.

    If you're asking whether there is an existing library designed specifically to parse Wiki(pedia) XML and match your requirements, that is doubtful. However, you can use one of the existing XML libraries to traverse the DOM and pull out the data you need.
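
    As a rough illustration of the traversal approach, here is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the response was requested as XML (for instance by adding format=xml to the query) and that the wikitext sits in a rev element under page/revisions; the sample document, the pageid value, and the helper name extract_wikitext are made up for the example, and the real response layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample shaped like an XML API response.
# The real response may use a different layout; adjust the paths if so.
SAMPLE_XML = """<api>
  <query>
    <pages>
      <page pageid="1" ns="0" title="LeBron James">
        <revisions>
          <rev>'''LeBron James''' is an American basketball player.</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>"""


def extract_wikitext(xml_text):
    """Return a {page title: wikitext} dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    # Walk every <page> element, wherever it sits in the tree.
    for page in root.iter("page"):
        rev = page.find("./revisions/rev")
        if rev is not None:
            result[page.get("title")] = rev.text or ""
    return result


pages = extract_wikitext(SAMPLE_XML)
for title, wikitext in pages.items():
    print(title, "->", wikitext)
```

    This only gets you the raw wikitext; turning that markup into structured data is a separate step.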

    Another option is to write an XSLT stylesheet that does something similar and run it with lxml. lxml also lets you call Python functions from inside the XSLT, so you get the best of both worlds.
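
    A minimal sketch of the XSLT route, assuming lxml is installed and the same hypothetical response layout (page/revisions/rev). The namespace URI urn:example:wiki-helpers, the py prefix, and the strip_bold function are invented for the example; the Python function is registered through lxml's FunctionNamespace so the stylesheet can call it.

```python
from lxml import etree

# Hypothetical namespace URI, used only to expose the Python function to XSLT.
NS = "urn:example:wiki-helpers"


def strip_bold(context, text):
    """Extension function (hypothetical): drop wiki bold markers from a string."""
    return text.replace("'''", "")


# Register the function so XPath/XSLT expressions can call py:strip_bold(...).
etree.FunctionNamespace(NS)["strip_bold"] = strip_bold

STYLESHEET = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:py="urn:example:wiki-helpers"
    exclude-result-prefixes="py">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//page">
      <xsl:value-of select="@title"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="py:strip_bold(string(revisions/rev))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>""")

# Hand-written sample input; the real response layout may differ.
SAMPLE_XML = """<api><query><pages>
  <page pageid="1" ns="0" title="LeBron James">
    <revisions>
      <rev>'''LeBron James''' is an American basketball player.</rev>
    </revisions>
  </page>
</pages></query></api>"""

transform = etree.XSLT(STYLESHEET)
output = str(transform(etree.XML(SAMPLE_XML)))
print(output)
```

    The stylesheet handles the structural selection, while anything awkward to express in XSLT 1.0 (string cleanup here) is delegated to Python.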
