sax

Use CSS selectors to collect HTML elements from a streaming parser (e.g. SAX stream)

泪湿孤枕 submitted on 2019-12-03 12:29:17
How can I parse a CSS (CSS3) selector and use it, in a jQuery-like way, to collect HTML elements not from a DOM (a tree structure) but from a stream (e.g. SAX), i.e. using a sequential-access, event-based parser? By the way, are there any CSS selectors (or combinations of them) that need access to the DOM (the Wikipedia SAX page says that XPath selectors "need to be able to access any node at any time in the parsed XML tree")? I am most interested in implementing selector combinators, e.g. the 'A B' descendant selector. I prefer solutions that describe the algorithm, or that are in Perl (for HTML::Zoom). I would do it with regular
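The descendant combinator only needs to know which elements are currently open, and a streaming parser can track that with a stack. The following is a minimal Java sketch of that idea, not the approach from the original thread: the class name `DescendantMatcher` and its `match` helper are hypothetical, it handles only plain tag names (no classes, ids, or attributes), and it assumes well-formed XML/XHTML input because it relies on the JDK SAX parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: matches the selector "ancestor target" on a SAX event stream.
class DescendantMatcher extends DefaultHandler {
    private final String ancestor;
    private final String target;
    // Names of the elements that are currently open, outermost first.
    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> matches = new ArrayList<>();

    DescendantMatcher(String ancestor, String target) {
        this.ancestor = ancestor;
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // "A B" matches when a B opens while some A is still open above it.
        if (qName.equals(target) && open.contains(ancestor)) {
            matches.add(String.join("/", open) + "/" + qName);
        }
        open.addLast(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    // Parses the XML text and returns the path of every matched element.
    static List<String> match(String xml, String ancestor, String target) {
        DescendantMatcher handler = new DescendantMatcher(ancestor, target);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.matches;
    }
}
```

For the input `<html><div><p><span/></p></div><span/></html>`, `DescendantMatcher.match(xml, "div", "span")` returns a single path, `html/div/p/span`; the second `span` is skipped because no `div` is open when it starts. A child combinator ('A > B') could be handled the same way by checking only the top of the stack.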

Xml not parsing String as input with sax

佐手、 submitted on 2019-12-03 12:09:07
I have a string input from which I need to extract simple information. Here is the sample XML (from mkyong):

    <?xml version="1.0"?>
    <company>
      <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
      </staff>
      <staff>
        <firstname>low</firstname>
        <lastname>yin fong</lastname>
        <nickname>fong fong</nickname>
        <salary>200000</salary>
      </staff>
    </company>

This is how I parse it in my code (my class has a field String name):

    public String getNameFromXml(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser
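The excerpt is cut off before the parse call, so the actual cause of the problem is not visible here. One common pitfall, offered only as an assumption, is passing the XML text to an overload that expects a URI or file name. The sketch below wraps the string in an `InputSource` backed by a `StringReader` instead; the class name `FirstNameExtractor` and its method are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of every <firstname> element.
class FirstNameExtractor {
    static List<String> firstNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a <firstname> element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("firstname".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may be called several times for one text node.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("firstname".equals(qName)) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            // Wrap the XML text itself; a bare String would be read as a URI.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return names;
    }
}
```

Accumulating text in a `StringBuilder` matters because the parser is allowed to deliver one text node in several `characters` callbacks.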

How to use SAXParseException effectively in Java

与世无争的帅哥 submitted on 2019-12-03 09:04:36
Question: I'm validating against XML Schema in Java, and getting SAXParseExceptions thrown when I have invalid content models. I'm going to be using these exceptions to highlight where the validation has failed, but the SAXParseExceptions seem to be a little too low-level. For example, for a failure on an enumeration, I get the validity error that the value provided doesn't match the content model in one exception, and the element it applies to in the next. I'm thinking I need to have a utility that

How to get an error's line number while validating an XML file against an XML schema

余生颓废 submitted on 2019-12-03 08:02:18
Question: I'm trying to validate an XML file against a W3C XML Schema. The following code does the job and reports when an error occurs, but I'm unable to get the line number of the error. It always returns -1. Is there an easy way to get the line number?

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Source;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream
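The imports in the excerpt suggest the document is parsed into a DOM first and validated as a `DOMSource`. A DOM tree does not normally carry source positions, which would explain the -1. A minimal sketch of one alternative, assuming the validator can read the XML text directly, is to validate a `StreamSource` so that the `SAXParseException` is raised while the text is being read. The class name `LineNumberValidation` and its method are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.OptionalInt;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports the line of the first validation error, if any.
class LineNumberValidation {
    static OptionalInt firstErrorLine(String xsd, String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // Validate the character stream itself rather than a DOM tree,
            // so the parser still knows where each construct was read.
            validator.validate(new StreamSource(new StringReader(xml)));
            return OptionalInt.empty();
        } catch (SAXParseException e) {
            // With no ErrorHandler set, the first error is thrown to the caller.
            return OptionalInt.of(e.getLineNumber());
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("validation could not run", e);
        }
    }
}
```

For a file on disk, `new StreamSource(file)` can be used in place of the `StringReader` variant shown here.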

Light weight C++ SAX XML parser

偶尔善良 submitted on 2019-12-03 06:34:22
I know of at least three lightweight C++ XML parsers: RapidXML, TinyXML and PugiXML. However, all three use a DOM-based interface (i.e., they build their own in-memory representation of the XML document and then provide an interface to traverse and manipulate it). For most situations that I have to deal with, I much prefer the SAX interface (where the parser just spits out a stream of events like start-of-tag, and the application code is responsible for doing whatever it wants based on those events). Can anyone recommend a lightweight C++ XML library with a SAX interface? Edit: I should also

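Learning Java: Parsing XML with SAX

限于喜欢 submitted on 2019-12-03 01:02:13
This article is a piece of code I wrote while studying Core Java, 9th edition, Volume II (Advanced Features). The original authors are Cay S. Horstmann and Gary Cornell (USA); the translators are 陈昊鹏, 王浩, 姚建平 and others. My Java version is 1.8. Java provides two kinds of XML parsers: the tree parser DOM (Document Object Model) and the streaming parser SAX (Simple API for XML). When the document is large and the processing algorithm is simple enough to handle nodes as they are parsed, without needing to see the complete tree structure, DOM is less efficient than SAX. The XML document is as follows. File name: NameList.xml, file path: C:\Users\Tsybius\Desktop\NameList.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <list1>
        <person id="101" name="Tsybius" remark="1" />
        <person id="102" name="Galatea" remark="2" />
        <person id="103" name="Quintus" remark="3" />
        <person id="104" name="Atia" remark="4" />
        <person id="105" name="Justitia" remark=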
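The article's own code is not part of this excerpt. As an illustration of the streaming approach it describes, here is a minimal sketch that reads the `id` and `name` attributes of each `person` element as the parser encounters it, without building a tree. The class name `PersonListReader` and its method are hypothetical and are not taken from the article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: maps each person's id attribute to its name attribute.
class PersonListReader {
    static Map<String, String> namesById(String xml) {
        // LinkedHashMap keeps the persons in document order.
        Map<String, String> result = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Each element is handled as soon as its start tag is read.
                if ("person".equals(qName)) {
                    result.put(atts.getValue("id"), atts.getValue("name"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return result;
    }
}
```

To read NameList.xml from disk instead of a string, the `SAXParser.parse(File, DefaultHandler)` overload can be used with the same handler.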

parsing large xml 500M with node.js

北城余情 submitted on 2019-12-03 00:48:09
I am using isaacs' SAX to parse a huge XML file (also recommended by La Gentz). The process uses about 650M of memory. How can I reduce this, or allow node to use even more?

    FATAL ERROR: CALL_AND_RETRY_0 Allocation failed - process out of memory

My XML file is larger than 300M and could grow to 1GB.

You should stream the file into the parser; that's the whole point of a streaming parser, after all.

    var fs = require('fs');
    var parser = require('sax').createStream(strict, options);
    fs.createReadStream(file).pipe(parser);

Source: https://stackoverflow.com/questions/8707255/parsing-large-xml-500m-with-node-js

How to use SAXParseException effectively in Java

一曲冷凌霜 submitted on 2019-12-02 23:11:38
I'm validating against XML Schema in Java, and getting SAXParseExceptions thrown when I have invalid content models. I'm going to be using these exceptions to highlight where the validation has failed, but the SAXParseExceptions seem to be a little too low-level. For example, for a failure on an enumeration, I get the validity error that the value provided doesn't match the content model in one exception, and the element it applies to in the next. I'm thinking I need to have a utility that abstracts a little to merge related errors together and parse exception text into usable exception
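One way to approach the merging described above, sketched here under the assumption that related errors are reported at the same line and column, is to install an `ErrorHandler` that collects every error instead of stopping at the first one and groups the messages by location. The class name `GroupingErrorHandler` and the grouping key are hypothetical choices; other validators may report related errors at different positions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: groups validation messages by "line:column".
class GroupingErrorHandler implements ErrorHandler {
    private final Map<String, List<String>> byLocation = new LinkedHashMap<>();

    @Override
    public void warning(SAXParseException e) {
        record(e);
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable validity error: record it and let validation continue.
        record(e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Not well-formed: record it, then stop by rethrowing.
        record(e);
        throw e;
    }

    private void record(SAXParseException e) {
        String key = e.getLineNumber() + ":" + e.getColumnNumber();
        byLocation.computeIfAbsent(key, k -> new ArrayList<>()).add(e.getMessage());
    }

    // Validates xml against xsd and returns the messages grouped by location.
    static Map<String, List<String>> validate(String xsd, String xml) {
        GroupingErrorHandler handler = new GroupingErrorHandler();
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.setErrorHandler(handler);
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("validation could not run", e);
        }
        return handler.byLocation;
    }
}
```

Each map entry can then be turned into a single application-level error for highlighting. Message texts are implementation-specific, so parsing them is more fragile than grouping by position.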

Unable to understand Event model of POI API for reading Excel files (.xlsx)

感情迁移 submitted on 2019-12-02 13:28:15
Question: I have been using the User model of the POI API for reading Excel files (.xlsx) until I encountered an out-of-memory (GC Overhead) exception while processing a pretty big file. After some analysis, I was advised to use the Event model (XSSF SAX Event API) of the same POI API instead. The problem is that I am unable to understand how to use this API by reading the docs. I did not have any problem understanding the User model, which I am using now. I know this Event model requires some

Discard html tags within custom tags while getting text in XHTML using SAX Parser in Groovy

自古美人都是妖i submitted on 2019-12-02 11:22:41
Question: I am trying to get the text between the tags, and so far I have been successful. But sometimes, when there are special characters or HTML tags inside my custom tags, I am unable to get the text. The sample XML looks like this:

    <records>
      <car name='HSV Maloo' make='Holden' year='2006'>
        <ae_definedTermTitleBegin />Australia<ae_definedTermTitleEnd />
        <ae_clauseTitleBegin />1.02 <u>Accounting Terms</u>.<ae_clauseTitleEnd />
      </car>
      <car name='P50' make='Peel' year='1962'>
        <ae_definedTermTitleBegin />Isle