sax

Python sax to lxml for 80+GB XML

谁说胖子不能爱 posted on 2019-11-27 18:37:54
How would you read an XML file using SAX and convert it to an lxml etree.iterparse element? To provide an overview of the problem: I have built an XML ingestion tool using lxml for an XML feed that ranges in size from 25 to 500 MB and needs ingestion on a bi-daily basis, but the tool also needs to perform a one-time ingestion of a file that is 60 to 100 GB. I chose lxml based on specifications stating that a node would not exceed 4 to 8 GB in size, which I thought would allow each node to be read into memory and cleared when finished. An overview of the code is below: elements = etree
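The read-then-clear approach described in this question can be sketched as follows. This is a minimal illustration, not the asker's code: it uses the standard library's xml.etree.ElementTree as a stand-in (lxml's etree.iterparse offers a similar events-based interface), and the feed layout, the `item` tag, and the helper name `iter_records` are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield each completed <tag> element, then release its memory."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Once the caller is done with the record, detach everything
            # built so far so the in-memory tree stays small.
            root.clear()


# Hypothetical feed; the element names are illustrative only.
sample = io.BytesIO(
    b"<feed><item id='1'>a</item><item id='2'>b</item><item id='3'>c</item></feed>"
)
ids = [item.get("id") for item in iter_records(sample, "item")]
print(ids)
```

Memory use in this sketch is bounded by the largest single record rather than by the file size, so a record of several gigabytes would still have to fit in memory.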

When should I choose SAX over StAX?

谁说我不能喝 posted on 2019-11-27 16:54:28
Streaming XML parsers like SAX and StAX are faster and more memory-efficient than parsers that build a tree structure, such as DOM parsers. SAX is a push parser, meaning that it is an instance of the observer pattern (also called the listener pattern). SAX was there first, but then came StAX, a pull parser, meaning that it basically works like an iterator. You can find reasons to prefer StAX over SAX everywhere, but it usually boils down to: "it's easier to use". In the Java tutorial on JAXP, StAX is vaguely presented as the middle ground between DOM and SAX: "it's easier than SAX and more efficient than
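The push versus pull distinction can be sketched in Python. StAX itself is a Java API, so this example uses the standard library's xml.sax for the push style and xml.dom.pulldom as a rough pull-style analogue; the sample document and the function names are assumptions made for illustration.

```python
import xml.sax
from xml.dom import pulldom

DOC = "<books><book>A</book><book>B</book></books>"  # hypothetical sample


class TitleHandler(xml.sax.ContentHandler):
    """Push style: the parser calls these methods for every event."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None

    def startElement(self, name, attrs):
        if name == "book":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buf))
            self._buf = None


def titles_push(doc):
    handler = TitleHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.titles


def titles_pull(doc):
    """Pull style: the calling loop asks the parser for the next event."""
    titles = []
    events = pulldom.parseString(doc)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            events.expandNode(node)  # read the rest of this element only
            titles.append("".join(child.data for child in node.childNodes))
    return titles


print(titles_push(DOC), titles_pull(DOC))
```

In the push version the parsing state has to live in handler fields between callbacks, while the pull version keeps it in ordinary local variables inside the loop, which is the usual reason given for "easier to use".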

Exception reading XLSB File Apache POI java.io.CharConversionException

 ̄綄美尐妖づ posted on 2019-11-27 15:44:13
I'm developing a Java application that reads an Excel xlsb file using Apache POI, but I get an exception while reading it. My code is as follows: import java.io.IOException; import java.io.InputStream; import org.apache.poi.xssf.eventusermodel.XSSFReader; import org.apache.poi.xssf.model.SharedStringsTable; import org.apache.poi.xssf.usermodel.XSSFRichTextString; import org.apache.poi.openxml4j.exceptions.InvalidFormatException; import org.apache.poi.openxml4j.exceptions.OpenXML4JException; import org.apache.poi.openxml4j.opc.Package; import org.xml.sax.Attributes; import org.xml.sax

Android: parse XML from string problems

只谈情不闲聊 posted on 2019-11-27 15:40:01
I've got a custom contentHandler (called XMLHandler). I've been to a lot of sites via Google and StackOverflow that detail how to set that up. What I do not understand is how to USE it. Xml.parse(...,...) returns nothing, because it is a void method. How do I access my parsed XML data? I realize this question is probably trivial, but I've been searching for (literally) hours and have found no solution. Please help. String result = fetchData(doesntmatter); Xml.parse(result, new XMLHandler()); Here is one example; I hope it will be useful for understanding "SAXParser": package test.example; import
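One common way to get data out of a void parse call is to keep a reference to the handler and read the state it accumulated. The sketch below shows that pattern with Python's xml.sax as an analogue of the Android call in the question; the `ItemHandler` class and the sample XML are hypothetical.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []   # parsed results accumulate on the handler itself
        self._text = []

    def startElement(self, name, attrs):
        self._text = []   # start collecting text for the new element

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._text).strip())


result = "<items><item>one</item><item>two</item></items>"  # hypothetical input
handler = ItemHandler()   # keep a reference instead of passing a throwaway object
xml.sax.parseString(result.encode("utf-8"), handler)  # returns None
print(handler.items)
```

Passing `new XMLHandler()` inline, as in the question, discards the only object that holds the parsed data once the call returns.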

SAX vs XmlTextReader - SAX in C#

*爱你&永不变心* posted on 2019-11-27 15:05:55
Question: I am attempting to read a large XML document and I want to do it in chunks, as opposed to XmlDocument's way of reading the entire file into memory. I know I can use XmlTextReader to do this, but I was wondering if anyone has used SAX for .NET? I know Java developers swear by it, and I was wondering if it is worth giving it a try and, if so, what the benefits of using it are. I am looking for specifics. Answer 1: If you're talking about SAX for .NET, the project doesn't appear to be maintained. The last release

Why is SAX parsing faster than DOM parsing, and how does StAX work?

陌路散爱 posted on 2019-11-27 12:30:46
Somewhat related to: libxml2 from java. Yes, this question is rather long-winded, sorry. I kept it as dense as I felt possible. I bolded the questions to make them easier to peek at before reading the whole thing. Why is SAX parsing faster than DOM parsing? The only thing I can come up with is that with SAX you're probably ignoring the majority of the incoming data, and thus not wasting time processing parts of the XML you don't care about. IOW, after parsing with SAX, you can't recreate the original input. If you wrote your SAX parser so that it accounted for each and every XML node (and could

What is the difference between localname and qname?

核能气质少年 posted on 2019-11-27 12:05:25
Question: When using SAX to parse an XML file in Java, what is the difference between the parameters localName and qName in SAX methods such as startElement(String uri, String localName, String qName, Attributes attributes)? Answer 1: The qualified name includes both the namespace prefix and the local name: att1 and foo:att2. Sample XML: <root xmlns="http://www.example.com/DEFAULT" att1="Hello" xmlns:foo="http://www.example.com/FOO" foo:att2="World"/> Java Code: att1 Attributes without a namespace prefix do
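The same distinction can be observed with Python's xml.sax in namespace mode, using the sample XML from the answer. This is a sketch in a different language from the Java code the answer goes on to show; the `NameLogger` class is hypothetical. With Python's default parser the element-level qname argument may be None, so the sketch reads qualified names from the attributes object instead.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XML = (
    b'<root xmlns="http://www.example.com/DEFAULT" att1="Hello" '
    b'xmlns:foo="http://www.example.com/FOO" foo:att2="World"/>'
)


class NameLogger(ContentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        # In namespace mode each attribute name is a (uri, localname) pair;
        # the qualified name additionally carries the prefix, if any.
        for uri, local in attrs.getNames():
            self.seen.append((uri, local, attrs.getQNameByName((uri, local))))


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = NameLogger()
parser.setContentHandler(handler)
parser.feed(XML)
parser.close()

for uri, local, qualified in handler.seen:
    print(uri, local, qualified)
```

Here att1 has no namespace, so its local name and qualified name are identical, while foo:att2 has the local name att2 and the qualified name foo:att2.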

XML parsing - ElementTree vs SAX and DOM

我怕爱的太早我们不能终老 posted on 2019-11-27 09:20:54
Question: Python has several ways to parse XML... I understand the very basics of parsing with SAX. It functions as a stream parser, with an event-driven API. I understand the DOM parser also. It reads the XML into memory and converts it to objects that can be accessed with Python. Generally speaking, it was easy to choose between the two depending on what you needed to do, memory constraints, performance, etc. (Hopefully I'm correct so far.) Since Python 2.5, we also have ElementTree. How does this
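To make the comparison concrete, the sketch below reads the same small document with the standard library's DOM implementation (xml.dom.minidom) and with ElementTree. The document and variable names are assumptions made for illustration; both approaches load the whole document into memory.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document.
DOC = "<library><book year='2001'>A</book><book year='2007'>B</book></library>"

# DOM: generic node objects, navigated through DOM API methods.
dom = minidom.parseString(DOC)
dom_titles = [n.firstChild.data for n in dom.getElementsByTagName("book")]

# ElementTree: each element acts like a list of children
# with a dict-like set of attributes.
root = ET.fromstring(DOC)
et_titles = [book.text for book in root.findall("book")]
et_years = [int(book.get("year")) for book in root]

print(dom_titles, et_titles, et_years)
```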

How to set Saxon as the Xslt processor in Java?

随声附和 posted on 2019-11-27 09:19:25
Question: This is a simple question, but one I cannot find the answer to. I have an XSLT 2.0 stylesheet that I'm trying to process in Java. It relies on XSL elements from Saxon. My current class works fine with simple XSLT 1.0, but I'm getting errors about unrecognized elements with my 2.0 XSLT built with Saxon. I cannot figure out how to tell Java to use Saxon as the processor. I'm using javax.xml.transform in my class. Is this a property I can set? What do I set it to? Thanks! Edit: I figured out

Can SAX Parsers use XPath in Java?

懵懂的女人 posted on 2019-11-27 09:08:35
I'm trying to migrate one of my classes, which uses DOM parsing with lots of XPath expressions, to SAX parsing. DOM parsing was good for me, but some of the files I try to parse are too big and they cause server timeouts. I want to reuse the XPath expressions with SAX parsing, but I'm not sure if that is possible. If it is not, could you please help me, because I have no idea what the following code would look like using only SAX: Document doc = bpsXml.getDocument(); String supplierName = BPSXMLUtils.getXpathString(doc, "/Invoice/InvoiceHeader/Party[@stdValue='SU']/Name/Name1"); String language =
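One way to approximate a simple XPath expression without building a tree is to track the path of open elements inside the SAX handler and compare it to the target path. The sketch below does this in Python for the first expression in the question; it is an illustration of the idea rather than a general XPath engine, and the `PathMatcher` class and sample invoice are hypothetical.

```python
import xml.sax


class PathMatcher(xml.sax.ContentHandler):
    """Collect text at /Invoice/InvoiceHeader/Party[@stdValue='SU']/Name/Name1."""

    TARGET = ["Invoice", "InvoiceHeader", "Party", "Name", "Name1"]

    def __init__(self):
        super().__init__()
        self.path = []        # names of the currently open elements
        self.party_std = []   # stdValue of each open Party element
        self.matches = []
        self._buf = None

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == "Party":
            self.party_std.append(attrs.get("stdValue"))
        # Emulate the path steps plus the [@stdValue='SU'] predicate.
        if self.path == self.TARGET and self.party_std[-1] == "SU":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if self._buf is not None and self.path == self.TARGET:
            self.matches.append("".join(self._buf))
            self._buf = None
        if name == "Party":
            self.party_std.pop()
        self.path.pop()


# Hypothetical invoice; only the element names come from the question's XPath.
SAMPLE = (
    b"<Invoice><InvoiceHeader>"
    b"<Party stdValue='BY'><Name><Name1>Buyer Ltd</Name1></Name></Party>"
    b"<Party stdValue='SU'><Name><Name1>Supplier Inc</Name1></Name></Party>"
    b"</InvoiceHeader></Invoice>"
)
handler = PathMatcher()
xml.sax.parseString(SAMPLE, handler)
supplier_name = handler.matches[0]
print(supplier_name)
```

This only covers absolute child-axis paths with a single attribute predicate; expressions that look backwards or sideways in the document generally need a tree or a dedicated streaming XPath library.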