Generating plain text from a Wikipedia database dump

Submitted by 烂漫一生 on 2019-12-22 04:41:10

Question


I found a Python script (here: Wikipedia Extractor) that can generate plain text from an (English) Wikipedia database dump. When I run this command (as stated on the script's page):

$ python enwiki-latest-pages-articles.xml WikiExtractor.py -b 500K -o extracted

I get this error:

File "enwiki-latest-pages-articles.xml", line 1
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
^
SyntaxError: invalid syntax

I'm running the script with Python 2.7.6 and Cygwin on Windows 7.

I hope someone who has used this script, or who has experience with Python, can help me solve this error.

Thanks in advance!


Answer 1:


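The first argument to python should be the script name. Python runs whatever file is named first, so here it tried to parse the XML dump as Python source code, which is why the SyntaxError points at line 1 of the .xml file.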
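The error can be reproduced without the multi-gigabyte dump. The minimal sketch below (Python 3; the string is a shortened copy of the dump's opening tag, and try_compile_as_python is a hypothetical helper name) compiles that first line as Python source, which is effectively what the interpreter does when the .xml file is named first:

```python
def try_compile_as_python(source, filename):
    """Compile `source` as a Python module; return the SyntaxError, or None if it compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err
    return None


# Shortened version of the first line of the dump (an XML start tag, not Python code).
xml_first_line = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" '
    'version="0.8" xml:lang="en">'
)

if __name__ == "__main__":
    err = try_compile_as_python(xml_first_line, "enwiki-latest-pages-articles.xml")
    # Reports the same file and line as the traceback in the question.
    print(type(err).__name__, "in", err.filename, "at line", err.lineno)
```

The leading `<` is not valid Python syntax, so compilation fails on line 1, exactly as in the reported traceback.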

Swap the .xml and .py file names so the script comes first:

$ python WikiExtractor.py enwiki-latest-pages-articles.xml -b 500K -o extracted
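To see why this order works, here is a minimal sketch (Python 3.7+, standard library only; show_argv.py and run_with_args are hypothetical names used for illustration, not part of WikiExtractor). It runs a throwaway script with the same arguments as the corrected command and prints what the script receives: everything after the script name, including -b 500K -o extracted, is handed to the script in sys.argv rather than interpreted by Python itself.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway script that only reports the arguments passed to it.
ECHO_ARGV = "import sys\nprint(sys.argv[1:])\n"


def run_with_args(args):
    """Run `python show_argv.py <args...>` in a temp dir and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "show_argv.py"
        script.write_text(ECHO_ARGV)
        result = subprocess.run(
            [sys.executable, str(script), *args],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_args(
        ["enwiki-latest-pages-articles.xml", "-b", "500K", "-o", "extracted"]
    ))
```

Running it prints the list of arguments in the order given, with the dump file as the script's first argument, which is how the extractor expects to receive its input.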


Source: https://stackoverflow.com/questions/22772952/generating-plain-text-from-a-wikipedia-database-dump
