sax

How to solve SAXException: Invalid element in

萝らか妹 posted on 2019-11-28 09:42:05
Question: I am trying to get results from a web service in the following way:
    List result = new Vector();
    LibrarySearchRequest request = new LibrarySearchRequest(queryString);
    LibrarySearchServicePortTypeProxy proxy = new LibrarySearchServicePortTypeProxy();
    LibrarySearchServicePortType port = proxy.getLibrarySearchServicePortType();
    LibrarySearchResponse response = port.process(request);
    librarysearch.soft.Book[] books = response.getBooks();
When I do this, I get the following exception (stack trace): org.xml

Parsing of badly formatted HTML in PHP

帅比萌擦擦* posted on 2019-11-28 09:20:08
In my code I convert some styled XLS documents to HTML using OpenOffice. I then parse the tables using xml_parser_create. The problem is that OpenOffice produces old-school HTML: it leaves <BR> and <HR> tags unclosed, omits doctypes, and doesn't quote attributes (<TABLE WIDTH=4>). The PHP parsers I know of don't like this and yield XML formatting errors. My current solution is to run some regexes over the file before I parse it, but this is neither nice nor fast. Do you know a (hopefully built-in) PHP parser that doesn't care about these kinds of mistakes? Or perhaps a fast way to fix a

Is there a SaxParser that reads json and fires events so it looks like xml

社会主义新天地 posted on 2019-11-28 09:02:05
This would be great, as it would allow my XML code to read JSON without any change except for the different SAX parser.
Ryan Fernandes: If you meant an event-based parser, there are a couple of projects out there that do this:
    http://code.google.com/p/json-simple/ : stoppable SAX-like interface for streaming input of JSON text. This project has moved to https://github.com/fangyidong/json-simple
    http://jackson.codehaus.org/Tutorial : the Jackson Streaming API is similar to the StAX API. This project has moved to https://github.com/FasterXML/jackson-core
I think it is a bad idea to try to treat JSON as if it was

Cure for 'The string “--” is not permitted within comments.' exception?

本小妞迷上赌 posted on 2019-11-28 09:00:59
I'm using Java 6. I have this dependency in my pom:
    <dependency>
      <groupId>xerces</groupId>
      <artifactId>xercesImpl</artifactId>
      <version>2.10.0</version>
    </dependency>
I'm trying to parse an XHTML doc with this line:
    <!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:TrackMoves/> <w:TrackFormatting/> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>EN-US</w

Storing specific XML node values with R's xmlEventParse

浪子不回头ぞ posted on 2019-11-28 08:45:28
I have a big XML file which I need to parse with xmlEventParse in R. Unfortunately, the online examples are more complex than I need. I just want to flag a matching node tag and store the matched node's text (not its attributes), with each text in a separate list; see the comments in the code below:
    library(XML)
    z <- xmlEventParse(
      "my.xml",
      handlers = list(
        startDocument = function() {
          cat("Starting document\n")
        },
        startElement = function(name, attr) {
          if (name == "myNodeToMatch1") {
            cat("FLAG Matched element 1\n")
          }
          if (name == "myNodeToMatch2") {
            cat("FLAG Matched element 2\n")
          }
        },
        text = function

SAXParseException: Content is not allowed in prolog

佐手、 posted on 2019-11-28 07:32:20
Question: I need to add the following file to my Tomcat's '/conf' directory:
    <?xml version="1.0" encoding="UTF-8"?>
    <Context useHttpOnly="false" path="/bbc">
      <Realm className="com.bbc.tomcat.BBCSecurityRealm"/>
    </Context>
After adding this file, I get the following error when Tomcat starts up:
    ERROR ecmdefault util.digester.Digester 18:37:14,477 localhost-startStop-1 : Parse Fatal Error at line 1 column 1: Content is not allowed in prolog.
    org.xml.sax.SAXParseException: Content is not allowed in prolog
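One frequent cause of "Content is not allowed in prolog" at line 1, column 1 is a byte-order mark (BOM) or other stray bytes in front of the XML declaration. The excerpt does not show whether that is what happened here, so the following is only a minimal diagnostic sketch, written in Python for brevity. The helper names `leading_junk` and `strip_utf8_bom` are hypothetical, and the sample bytes are made up for illustration.

```python
import codecs


def leading_junk(data: bytes) -> bytes:
    """Return whatever bytes precede the first '<' (empty if the file starts cleanly)."""
    idx = data.find(b"<")
    return data if idx < 0 else data[:idx]


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a UTF-8 byte-order mark if present; other encodings are left untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Illustrative input: a context file saved by an editor that prepends a UTF-8 BOM.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><Context/>'

junk = leading_junk(sample)
cleaned = strip_utf8_bom(sample)
print(repr(junk))            # the bytes a strict parser would trip over
print(cleaned[:5])           # the declaration now starts at byte 0
```

In practice the bytes would come from `open(path, "rb").read()`; if `leading_junk` returns anything other than an empty byte string, re-saving the file without a BOM (or without the stray characters) is worth trying first.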

Keep numeric character entity characters such as `&#10; &#13;` when parsing XML in Java

梦想的初衷 posted on 2019-11-28 04:38:28
Question: I am parsing XML in Java that contains numeric character entities such as (but not limited to) &#10; &#13; &lt; &gt; (line feed, carriage return, <, >). While parsing, I append the text content of nodes to a StringBuffer to later write it out to a text file. However, these characters are resolved or transformed into newlines/whitespace when I write the String to a file or print it out. How can I keep the original numeric character entity symbols when iterating over nodes of an XML

Help the Java SAX parser to understand bad xml

给你一囗甜甜゛ posted on 2019-11-28 04:35:26
Question: I am parsing XML returned from a website, but sadly it is slightly malformed. I am getting XML like:
    <tag attrib="Buy two for &pound;1" />
which, I am informed, is invalid because &pound; is an HTML entity, not an XML entity, and definitely cannot appear in an attribute. What can I do to fix this, assuming I cannot tell the website to obey the rules? I am considering using a FilterInputStream to filter the arriving data before it gets to the SAX parser, but this seems over the top.
Answer 1: In the end I
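The filtering idea mentioned in the question can be sketched as a small pre-processing pass. This assumes the offending input uses a named HTML entity such as `&pound;`, which XML does not predefine, and it is written in Python for brevity rather than as the Java `FilterInputStream` the asker had in mind. The function name `html_entities_to_numeric` is hypothetical; it rewrites named HTML entities as numeric character references, which any XML parser accepts, and leaves the five XML-predefined entities alone.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML itself defines; these must pass through unchanged.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text: str) -> str:
    """Replace named HTML entities (e.g. &pound;) with numeric references (&#163;)."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave unknown or XML-native entities as they are
        return "&#%d;" % name2codepoint[name]
    return _ENTITY.sub(repl, text)


fixed = html_entities_to_numeric('<tag attrib="Buy two for &pound;1" />')
root = ET.fromstring(fixed)
print(fixed)
print(root.get("attrib"))
```

A regex pass like this is blind to CDATA sections and comments, so it is a workaround for a known, narrow defect rather than a general repair tool.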

How to Parse Big (50 GB) XML Files in Java

醉酒当歌 posted on 2019-11-28 04:33:20
Currently I'm trying to use a SAX parser, but about 3/4 of the way through the file it completely freezes up. I have tried allocating more memory, etc., but see no improvement. Is there any way to speed this up? A better method? I stripped it to the bare bones, so I now have the following code, and when run from the command line it still doesn't go as fast as I would like. Running it with "java -Xms-4096m -Xmx8192m -jar reader.jar", I get a "GC overhead limit exceeded" error around article 700000. Main:
    public class Read {
        public static void main(String[] args) {
            pages = XMLManager.getPages();
        }
    }
XMLManager

ElementTree iterparse strategy

谁说我不能喝 posted on 2019-11-28 04:24:32
I have to handle fairly large XML documents (up to 1 GB) and parse them with Python. I am using the iterparse() function (SAX-style parsing). My concern is the following: imagine you have an XML document like this:
    <?xml version="1.0" encoding="UTF-8" ?>
    <families>
      <family>
        <name>Simpson</name>
        <members>
          <name>Homer</name>
          <name>Marge</name>
          <name>Bart</name>
        </members>
      </family>
      <family>
        <name>Griffin</name>
        <members>
          <name>Peter</name>
          <name>Brian</name>
          <name>Meg</name>
        </members>
      </family>
    </families>
The problem is, of course, to know when I am getting a family name (such as Simpson) and when I am
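One possible way to tell the two kinds of `<name>` apart is to keep a stack of open tags while iterating over both "start" and "end" events, then look at the parent tag when a `<name>` closes. The sketch below runs against the sample document from the question. The generator name `family_names` and the "family"/"member" labels are hypothetical, and this is one strategy among several rather than the asker's eventual solution.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<families>
  <family>
    <name>Simpson</name>
    <members><name>Homer</name><name>Marge</name><name>Bart</name></members>
  </family>
  <family>
    <name>Griffin</name>
    <members><name>Peter</name><name>Brian</name><name>Meg</name></members>
  </family>
</families>
"""


def family_names(source):
    """Yield (kind, text) for each <name>, where kind depends on the parent element."""
    path = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            continue
        # On "end" the element is complete and path[-1] is its own tag.
        if elem.tag == "name":
            parent = path[-2] if len(path) > 1 else None
            kind = "family" if parent == "family" else "member"
            yield kind, elem.text
        path.pop()
        if elem.tag == "family":
            # Discard the finished family's children so memory stays bounded.
            elem.clear()


results = list(family_names(io.BytesIO(SAMPLE)))
for kind, text in results:
    print(kind, text)
```

For a real file, `family_names("big.xml")` works the same way, since `iterparse` accepts a filename as well as a file object. The `elem.clear()` call is what keeps a 1 GB input from being held in memory as a full tree.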