Parsing a Wikipedia dump


生来不讨喜 2020-12-03 05:33

For example, using this Wikipedia dump:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=t

9 answers

情话喂你 (original poster)

2020-12-03 05:51

There's some information on Python and XML libraries here.

If you're asking whether there is an existing library designed specifically to parse Wiki(pedia) XML that also matches your requirements, that is doubtful. However, you can use one of the existing libraries to traverse the DOM and pull out the data you need.
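As a rough illustration of the traversal approach, here is a minimal sketch using the standard library's xml.etree.ElementTree (an element tree rather than a strict W3C DOM, but the idea is the same). The SAMPLE string is a hypothetical, trimmed-down response; its page/revisions/rev layout is an assumption about what the query above returns when XML output is requested, so check it against the real payload.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down response. The element layout
# (api/query/pages/page/revisions/rev) is an assumption, not real data.
SAMPLE = """<api>
  <query>
    <pages>
      <page pageid="1" title="LeBron James">
        <revisions>
          <rev>{{Infobox basketball biography}} '''LeBron James''' is ...</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>"""


def extract_pages(xml_text):
    """Return a dict mapping each page title to the wikitext of its first revision."""
    root = ET.fromstring(xml_text)
    pages = {}
    # iter() walks the whole tree, so the nesting depth of <page> does not matter.
    for page in root.iter("page"):
        rev = page.find("./revisions/rev")
        pages[page.get("title")] = rev.text if rev is not None else None
    return pages


pages = extract_pages(SAMPLE)
for title, wikitext in pages.items():
    print(title, "->", wikitext[:40])
```

What comes out is raw wikitext, so pulling structured fields out of it (infobox values, links) is a separate parsing step.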

Another option is to write an XSLT stylesheet that does something similar and call it using lxml. This also lets you call Python functions from inside the XSLT, so you get the best of both worlds.
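A minimal sketch of the XSLT route, assuming lxml is installed. The namespace URI urn:example:wiki, the word_count helper, and the sample document are all made up for illustration; only etree.XSLT and its extensions argument come from lxml itself.

```python
from lxml import etree

WIKI_NS = "urn:example:wiki"  # hypothetical namespace URI for the extension function


def word_count(context, text):
    """Extension function called from XSLT: count whitespace-separated words."""
    return str(len(text.split()))


# Hypothetical stylesheet: prints "<title>: <word count>" for each page.
STYLESHEET = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wiki="urn:example:wiki"
    exclude-result-prefixes="wiki">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//page">
      <xsl:value-of select="@title"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="wiki:word_count(string(revisions/rev))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
""")

# Hypothetical sample document; the real response layout is assumed to be similar.
DOC = etree.XML("""\
<api><query><pages>
  <page title="LeBron James">
    <revisions><rev>'''LeBron James''' is an American basketball player.</rev></revisions>
  </page>
</pages></query></api>
""")

# Map (namespace, function name) to the Python callable.
transform = etree.XSLT(STYLESHEET, extensions={(WIKI_NS, "word_count"): word_count})
output = str(transform(DOC))
print(output)
```

The stylesheet handles the tree walking declaratively, while anything awkward to express in XSLT 1.0, such as the word counting here, is delegated to Python.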
