sax | 易学教程

How to solve SAXException: Invalid element in

Question: I am trying to get results from a web service in the following way: List result = new Vector(); LibrarySearchRequest request = new LibrarySearchRequest(queryString); LibrarySearchServicePortTypeProxy proxy = new LibrarySearchServicePortTypeProxy(); LibrarySearchServicePortType port = proxy.getLibrarySearchServicePortType(); LibrarySearchResponse response = port.process(request); librarysearch.soft.Book[] books = response.getBooks(); When I do this, I get the following exception (stack trace): org.xml

Parsing of badly formatted HTML in PHP

In my code I convert a styled XLS document to HTML using OpenOffice. I then parse the tables using xml_parser_create. The problem is that OpenOffice produces old-school HTML with unclosed <BR> and <HR> tags; it doesn't emit doctypes and doesn't quote attributes (<TABLE WIDTH=4>). The PHP parsers I know of don't like this and report XML formatting errors. My current solution is to run some regexes over the file before I parse it, but this is neither nice nor fast. Do you know of a (hopefully bundled) PHP parser that doesn't care about these kinds of mistakes? Or perhaps a fast way to fix a

Is there a SaxParser that reads json and fires events so it looks like xml

This would be great, as it would allow my XML code to read JSON without any change except for the different SAX parser.

Ryan Fernandes: If you meant an event-based parser, then there are a couple of projects out there that do this:

http://code.google.com/p/json-simple/ (stoppable SAX-like interface for streaming input of JSON text; this project has moved to https://github.com/fangyidong/json-simple)

http://jackson.codehaus.org/Tutorial (the Jackson Streaming API is similar to the StAX API; this project has moved to https://github.com/FasterXML/jackson-core)

I think it is a bad idea to try treat JSON as if it was

Cure for 'The string “--” is not permitted within comments.' exception?

I'm using Java 6. I have this dependency in my pom ... <dependency> <groupId>xerces</groupId> <artifactId>xercesImpl</artifactId> <version>2.10.0</version> </dependency> I'm trying to parse an XHTML doc with this line <!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:TrackMoves/> <w:TrackFormatting/> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>EN-US</w

Storing specific XML node values with R's xmlEventParse

I have a big XML file which I need to parse with xmlEventParse in R . Unfortunately on-line examples are more complex than I need, and I just want to flag a matching node tag to store the matched node text (not attribute), each text in a separate list, see the comments in the code below: library(XML) z <- xmlEventParse( "my.xml", handlers = list( startDocument = function() { cat("Starting document\n") }, startElement = function(name,attr) { if ( name == "myNodeToMatch1" ){ cat("FLAG Matched element 1\n") } if ( name == "myNodeToMatch2" ){ cat("FLAG Matched element 2\n") } }, text = function

SAXParseException: Content is not allowed in prolog

Question: I need to add the following file to my Tomcat's '/conf' directory: <?xml version="1.0" encoding="UTF-8"?> <Context useHttpOnly="false" path="/bbc"> <Realm className="com.bbc.tomcat.BBCSecurityRealm"/> </Context> After adding this file, I get the following error when Tomcat starts up: ERROR ecmdefault util.digester.Digester 18:37:14,477 localhost-startStop-1 : Parse Fatal Error at line 1 column 1: Content is not allowed in prolog. org.xml.sax.SAXParseException: Content is not allowed in prolog
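
This error means the parser met something other than markup before the XML declaration or the first element. One common cause, which the excerpt above does not confirm for this case, is a byte order mark (BOM) or other stray bytes saved at the very start of the file; some reading paths (for example a character Reader) pass those through to the parser. A minimal sketch for detecting and skipping a UTF-8 BOM on a stream follows. The class `BomSkipper` and the method `skipUtf8Bom` are hypothetical helpers written for illustration, not part of Tomcat or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper for illustration; not part of Tomcat or the JDK.
final class BomSkipper {
    private BomSkipper() {}

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream is shorter than three bytes
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back untouched
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // first byte after the BOM
    }
}
```

For a configuration file that Tomcat reads itself, the practical fix is usually to re-save the file without the BOM or leading characters; the sketch only shows what to look for.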

Keep numeric character entity characters such as `&#10; &#13;` when parsing XML in Java

Question: I am parsing XML in Java that contains numeric character entities such as (but not limited to) `&#10; &#13; &lt; &gt;` (line feed, carriage return, <, >). While parsing, I am appending the text content of nodes to a StringBuffer to later write it out to a text file. However, these characters are resolved or transformed into newlines/whitespace when I write the String to a file or print it out. How can I keep the original numeric character entity symbols when iterating over nodes of an XML
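
An XML parser resolves character references before the application sees the text, so a handler receives a real line feed rather than the five characters `&#10;`. One possible workaround, sketched below, is to re-encode selected characters when writing the output. It assumes that every line feed, carriage return, `<` and `>` should be written back as a reference, because after parsing a literal newline can no longer be told apart from one that was written as `&#10;`. The class `CharRefWriter` and the method `reencode` are hypothetical names.

```java
// Hypothetical helper for illustration: turns selected characters back into references.
final class CharRefWriter {
    private CharRefWriter() {}

    /** Re-encodes line feed, carriage return, '<' and '>' as XML references. */
    static String reencode(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\n': sb.append("&#10;"); break;
                case '\r': sb.append("&#13;"); break;
                case '<':  sb.append("&lt;");  break;
                case '>':  sb.append("&gt;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Text as a parser would deliver it, with references already resolved.
        System.out.println(reencode("first\nsecond"));
    }
}
```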

Help the Java SAX parser to understand bad xml

Question: I am parsing XML returned from a website, but sadly it is slightly malformed. I am getting XML like: <tag attrib="Buy two for &pound;1" /> which, I am informed, is invalid because &pound; is an HTML character entity, not an XML one, and definitely cannot appear in an attribute. What can I do to fix this, assuming I cannot tell the website to obey the rules? I am considering using a FilterInputStream to filter the arriving data before it gets to the SAX parser, but this seems over the top. Answer 1: In the end I
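
One way to pre-filter such input, not necessarily what the truncated answer settled on, is to replace the undeclared HTML entity with a numeric character reference before the text reaches the SAX parser. The sketch below assumes the response is small enough to hold in a String and that `&pound;` is the only offending entity; the class `EntityFixup` and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration.
final class EntityFixup {
    private EntityFixup() {}

    /** Replaces the HTML-only entity with the equivalent numeric reference (U+00A3). */
    static String fixEntities(String xml) {
        return xml.replace("&pound;", "&#163;");
    }

    /** Parses the repaired text with SAX and returns the value of the "attrib" attribute. */
    static String readAttrib(String xml) throws Exception {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                String value = attrs.getValue("attrib");
                if (value != null) {
                    out.append(value);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(fixEntities(xml))), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAttrib("<tag attrib=\"Buy two for &pound;1\" />"));
    }
}
```

If several HTML entities can occur, the single `replace` call could be swapped for a lookup table; for large responses a streaming filter would avoid buffering the whole document.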

How to Parse Big (50 GB) XML Files in Java

Currently I'm trying to use a SAX parser, but about 3/4 of the way through the file it completely freezes up. I have tried allocating more memory etc. but am not seeing any improvement. Is there any way to speed this up? A better method? I stripped it to the bare bones, so I now have the following code, and when running it from the command line it still doesn't go as fast as I would like. Running it with "java -Xms-4096m -Xmx8192m -jar reader.jar" I get a "GC overhead limit exceeded" error around article 700000. Main: public class Read { public static void main(String[] args) { pages = XMLManager.getPages(); } } XMLManager
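
A "GC overhead limit exceeded" error during SAX parsing often points to the handler keeping every parsed record in memory rather than to the parser itself. The truncated excerpt does not show whether that is the case here, although returning all pages from `getPages()` suggests it might be. A minimal sketch of the alternative follows: each record is handed to a callback as soon as its end tag arrives and the buffer is then cleared. The class `PageStreamer`, the method `stream`, and the element name `page` are assumptions for illustration, and the sketch assumes records are not nested inside each other.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: processes one record at a time.
final class PageStreamer extends DefaultHandler {
    private final String recordElement;
    private final Consumer<String> onRecord;
    private final StringBuilder text = new StringBuilder();
    private boolean inRecord;

    PageStreamer(String recordElement, Consumer<String> onRecord) {
        this.recordElement = recordElement;
        this.onRecord = onRecord;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (recordElement.equals(qName)) {
            inRecord = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inRecord) {
            text.append(ch, start, length); // may be called several times per text node
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (recordElement.equals(qName)) {
            onRecord.accept(text.toString()); // hand the record off, keep nothing
            inRecord = false;
            text.setLength(0);
        }
    }

    /** Streams the document and calls onRecord once per matching element. */
    static void stream(InputStream in, String recordElement, Consumer<String> onRecord)
            throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(in, new PageStreamer(recordElement, onRecord));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<pages><page>a</page><page>b</page></pages>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        stream(in, "page", System.out::println);
    }
}
```

With this shape, memory use depends on the size of one record instead of the whole file, provided the callback itself does not accumulate the records.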

ElementTree iterparse strategy

I have to handle XML documents that are fairly big (up to 1 GB) and parse them with Python. I am using the iterparse() function (SAX-style parsing). My concern is the following: imagine you have XML like this: <?xml version="1.0" encoding="UTF-8" ?> <families> <family> <name>Simpson</name> <members> <name>Homer</name> <name>Marge</name> <name>Bart</name> </members> </family> <family> <name>Griffin</name> <members> <name>Peter</name> <name>Brian</name> <name>Meg</name> </members> </family> </families> The problem is, of course, to know when I am getting a family name (such as Simpson) and when I am