Parsing a Wikipedia dump

Asked 2020-12-03 05:33

For example, using the content returned by this Wikipedia API query:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=t
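That URL hits the MediaWiki API rather than a full dump file. A minimal stdlib-only sketch of building the same query (with `format=json` added, which the real API accepts) and pulling the wikitext out of the response might look like this; the sample response below is a trimmed, hypothetical illustration of the JSON shape, not a real API reply:

```python
import json
from urllib.parse import urlencode

# Rebuild the query string from the URL above, adding format=json.
params = urlencode({
    "action": "query",
    "prop": "revisions",
    "titles": "lebron james",
    "rvprop": "content",
    "redirects": "t",
    "format": "json",
})
url = "http://en.wikipedia.org/w/api.php?" + params

# Trimmed, hypothetical example of the response structure:
sample = json.loads("""
{"query": {"pages": {"20396": {"title": "LeBron James",
  "revisions": [{"*": "{{Infobox basketball biography}} LeBron..."}]}}}}
""")

def revision_text(response):
    """Map each page title to the wikitext of its returned revision."""
    pages = response["query"]["pages"]
    return {p["title"]: p["revisions"][0]["*"] for p in pages.values()}

print(revision_text(sample)["LeBron James"])
```

Fetching `url` with `urllib.request.urlopen` and feeding the body to `json.load` would give a real response of roughly this shape.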

9 Answers
  • 2020-12-03 06:04

    I would suggest using Beautiful Soup and just fetching the Wikipedia page as HTML instead of using the API.

    I'll try and post an example.
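A short sketch of that approach, assuming Beautiful Soup 4 is installed and run here on an inline HTML snippet rather than a live download (fetching the article URL with `urllib.request` would supply the real page):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Stand-in for the HTML you would download from the article page.
html = """
<html><body><div id="bodyContent">
  <p>LeBron Raymone James Sr. is an American basketball player.</p>
  <p>He plays in the NBA.</p>
</div></body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Wikipedia article text lives inside the bodyContent div.
content = soup.find("div", id="bodyContent")
paragraphs = [p.get_text(strip=True) for p in content.find_all("p")]
print(paragraphs[0])
```

The `id="bodyContent"` selector matches Wikipedia's page layout at the time; scraping HTML is more fragile than the API, since the markup can change.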

  • 2020-12-03 06:05

    I know the question is old, but I was searching for a library that parses the Wikipedia XML dump. The suggested libraries, wikidump and mwlib, don't offer much code documentation. I then found MediaWiki-utilities, which has some documentation at http://pythonhosted.org/mediawiki-utilities/.
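Whatever library you pick, the underlying task is streaming through the dump's `<page>`/`<revision>` elements. A library-free sketch using the stdlib's `iterparse`, shown on an inline fragment (real dumps wrap everything in a MediaWiki XML namespace, omitted here for brevity):

```python
import io
import xml.etree.ElementTree as ET

# Trimmed, namespace-free stand-in for a pages-articles dump file.
dump = io.BytesIO(b"""
<mediawiki>
  <page>
    <title>LeBron James</title>
    <revision><text>'''LeBron James''' is a basketball player.</text></revision>
  </page>
</mediawiki>
""")

pages = {}
# iterparse streams the input, so a multi-GB dump never sits in memory at once.
for event, elem in ET.iterparse(dump, events=("end",)):
    if elem.tag == "page":
        title = elem.findtext("title")
        text = elem.findtext("./revision/text")
        pages[title] = text
        elem.clear()  # free the subtree we just processed

print(pages["LeBron James"])
```

With a real dump you would open the file (or the bz2 stream) instead of the `BytesIO`, and match tags with the dump's namespace prefix.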

  • 2020-12-03 06:08

    Just stumbled over a library on PyPI, wikidump, that claims to provide

    Tools to manipulate and extract data from wikipedia dumps

    I haven't used it yet, so you're on your own trying it...
