
Question:

I need to convert a web page to XML (using Python 3.4.3). If I write the contents of the URL to a file then I can read and parse it perfectly but if I try to read directly from the web page I get the following error in my terminal:

    File "./AnimeXML.py", line 22, in <module>
      xml = ElementTree.parse (xmlData)
    File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/xml/etree/ElementTree.py", line 1187, in parse
      tree.parse(source, parser)
    File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/xml/etree/ElementTree.py", line 587, in parse
      source = open(source, "rb")
    OSError: [Errno 36] File name too long:

My Python code:

    # AnimeXML.py
    #! /usr/bin/Python

    # Import xml parser.
    import xml.etree.ElementTree as ElementTree

    # Import urlopen (needed by the call below).
    from urllib.request import urlopen

    # XML to parse.
    sampleUrl = "http://cdn.animenewsnetwork.com/encyclopedia/api.xml?anime=16989"

    # Read the xml as a file.
    content = urlopen (sampleUrl)

    # XML content is stored here to start working on it.
    xmlData = content.readall().decode('utf-8')

    # Close the file.
    content.close()

    # Start parsing XML.
    xml = ElementTree.parse (xmlData)

    # Get root of the XML file.
    root = xml.getroot()

    for info in root.iter("info"):
        print (info.attrib)

Is there any way I can fix my code so that I can read the web page directly into python without getting this error?

Answer 1:

As explained in the Parsing XML section of the ElementTree docs:

We can import this data by reading from a file:

    import xml.etree.ElementTree as ET
    tree = ET.parse('country_data.xml')
    root = tree.getroot()

Or directly from a string:

    root = ET.fromstring(country_data_as_string)

You're passing the whole XML contents as a giant pathname. Your XML file is probably bigger than 2K, or whatever the maximum pathname size is for your platform, hence the error. If it weren't, you'd just get a different error about there being no directory named [everything up to the first / in your XML file].
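To see that failure mode in isolation, here is a minimal sketch. The variable xml_text is a made-up stand-in for the downloaded data, not the real API response. Because the argument is a plain string with no read method, parse hands it to open(), which fails with some OSError; the exact errno depends on the length of the string and on the platform.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded document, long enough to exceed
# typical path-length limits.
xml_text = "<root>" + "<item>x</item>" * 1000 + "</root>"

try:
    ET.parse(xml_text)  # the string is treated as a file name, not as XML
    raised = False
except OSError as exc:
    raised = True
    # Print only the exception type and errno; the message would include
    # the whole "file name".
    print(type(exc).__name__, exc.errno)
```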

Just use fromstring instead of parse.

Or, notice that parse can take a file object, not just a filename. And the thing returned by urlopen is a file object.
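A minimal sketch of the file-object route, assuming a small made-up document. io.BytesIO stands in for the response returned by urlopen so the example runs offline; in the question's code you would pass content itself instead of reading it into a string first.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the object returned by urlopen(); any object with a read()
# method that yields the document's bytes is handled the same way.
# The XML below is invented for illustration.
response = io.BytesIO(
    b'<ann><anime id="1"><info type="Main title">Example</info></anime></ann>'
)

tree = ET.parse(response)  # a file object, so nothing is opened by name
root = tree.getroot()      # parse() returns an ElementTree, so getroot() applies

titles = [info.text for info in root.iter("info")]
```

With this approach the getroot() call from the question stays, because parse still returns an ElementTree.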

Also notice the very next line in that section:

fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree.

So, you don't want that root = tree.getroot() either.
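The difference in return types can be checked directly. This is only an illustrative sketch with an invented one-element document:

```python
import io
import xml.etree.ElementTree as ET

# Invented sample document.
xml_text = "<data><country name='Liechtenstein'/></data>"

# fromstring() hands back the root Element itself.
root_from_string = ET.fromstring(xml_text)

# parse() hands back an ElementTree wrapper; its root needs getroot().
tree_from_file = ET.parse(io.BytesIO(xml_text.encode("utf-8")))
root_from_file = tree_from_file.getroot()
```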

So:

    # ...
    content.close()
    root = ElementTree.fromstring(xmlData)
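Put together, the whole script might look like the sketch below. The function names info_attributes and fetch_info_attributes are hypothetical, and the sample document is made up so that the parsing step can be checked without network access. It uses read() rather than readall(), and passes the raw bytes to fromstring, which accepts bytes as well as str.

```python
import xml.etree.ElementTree as ElementTree
from urllib.request import urlopen


def info_attributes(xml_data):
    """Return the attribute dict of every <info> element in xml_data."""
    root = ElementTree.fromstring(xml_data)  # already the root Element
    return [info.attrib for info in root.iter("info")]


def fetch_info_attributes(url):
    """Download url and return the attributes of its <info> elements."""
    with urlopen(url) as content:  # the response is closed automatically
        return info_attributes(content.read())


# Offline check with a small, invented document.
sample = '<ann><anime><info type="Genres">drama</info></anime></ann>'
result = info_attributes(sample)
```

Calling fetch_info_attributes(sampleUrl) with the URL from the question would then return the same kind of list for the live data.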

Source: OSError: [Errno 36] File name too long:

Tags

errno

python