Parsing XML with SAX/Python + no validation

Posted by 空扰寡人 on 2019-12-23 01:47:09

Question


I am new to Python and I'm trying to parse an XML file with SAX without validating it.

The beginning of my XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE n:document SYSTEM "schema.dtd">
<n:document....

and I've tried to parse it with Python 2.5.2:

from xml.sax import make_parser, handler
import sys

parser = make_parser()
parser.setFeature(handler.feature_namespaces, True)
parser.setFeature(handler.feature_validation, False)
parser.setContentHandler(handler.ContentHandler())
parser.parse(sys.argv[1])

but I got an error:

python doc.py document.xml
(...)
  File "/usr/lib/python2.5/urllib2.py", line 244, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: schema.dtd

I don't want the SAX parser to look for a schema. What am I doing wrong? Thanks!


Answer 1:


expatreader treats the DTD external subset as an external general entity, so turning validation off does not stop it from fetching the DTD. The feature you want is:

parser.setFeature(handler.feature_external_ges, False)

However, it's a bit dodgy to point the DTD external subset at a nonexistent URL; as this shows, it's not only validating parsers that read it.
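
For reference, here is a minimal sketch of the whole setup in Python 3 syntax, with the extra feature added to the code from the question. The sample document, the namespace URI http://example.com/ns, and the names ElementCollector and parse_without_dtd are made up for illustration; only the setFeature calls come from the question and the answer above. As far as I know, Python 3.7.1 and later already leave external general entities off by default, so on those versions the explicit call mostly documents the intent.

```python
import io
from xml.sax import make_parser, handler

# Hypothetical sample document: "schema.dtd" deliberately does not exist,
# and the namespace URI is a placeholder.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE n:document SYSTEM "schema.dtd">
<n:document xmlns:n="http://example.com/ns"><n:item>hello</n:item></n:document>
"""


class ElementCollector(handler.ContentHandler):
    """Illustrative handler that records each element's (uri, localname)."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (uri, localname) tuple.
        self.names.append(name)


def parse_without_dtd(source):
    """Parse source without fetching the external DTD subset."""
    parser = make_parser()
    parser.setFeature(handler.feature_namespaces, True)
    parser.setFeature(handler.feature_validation, False)
    # The important line: do not load external general entities,
    # which for expatreader includes the DTD external subset.
    parser.setFeature(handler.feature_external_ges, False)
    collector = ElementCollector()
    parser.setContentHandler(collector)
    parser.parse(source)
    return collector.names


names = parse_without_dtd(io.BytesIO(SAMPLE))
print(names)
```

The parser accepts a file name as well as a file-like object, so parse_without_dtd(sys.argv[1]) would match the original script more closely.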



Source: https://stackoverflow.com/questions/1998425/parsing-xml-with-sax-python-no-validation
