sax

open-uri and sax parsing for a giant xml document

被刻印的时光 ゝ posted on 2019-12-07 23:33:22
Question: I need to connect to an external XML file (300MB+), download it, and process it, then run through the XML document and save elements in the database. I am already doing this with no problem on a production server, using Saxerator to be gentle on memory. It works great. Here is my issue now -- I need to use open-uri (though there could be alternative solutions?) to grab the file to parse through. The problem is that open-uri has to load the whole file before anything starts parsing, which defeats the

Determine root Element during SAX parsing

﹥>﹥吖頭↗ posted on 2019-12-07 17:38:38
Question: I am using SAX to parse XML files. Let's suppose that I want my application to deal only with XML files whose root element is "animalList"; if the root node is something else, the SAX parser should terminate parsing. Using DOM, you would do it like this: ... Element rootElement = xmldoc.getDocumentElement(); if ( ! rootElement.getNodeName().equalsIgnoreCase("animalList") ) throw new Exception("File is not an animalList file."); ... but I can't ascertain how to do it using SAX - I can't figure
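The root check described above can be sketched with a SAX handler that inspects the first startElement event and aborts by raising. This is a minimal illustration in Python's standard xml.sax rather than the Java API the question uses; the names WrongRootError, RootCheckingHandler, and parse_animal_list are hypothetical, and the case-insensitive comparison simply mirrors equalsIgnoreCase from the DOM snippet.

```python
import xml.sax


class WrongRootError(xml.sax.SAXException):
    """Raised from the handler to abort parsing on an unexpected root."""


class RootCheckingHandler(xml.sax.ContentHandler):
    """Rejects the document as soon as the first element is seen."""

    def __init__(self, expected_root):
        super().__init__()
        self.expected_root = expected_root
        self.depth = 0
        self.names = []  # element names seen, kept only for demonstration

    def startElement(self, name, attrs):
        # depth == 0 means this is the document (root) element.
        if self.depth == 0 and name.lower() != self.expected_root.lower():
            raise WrongRootError("File is not an %s file." % self.expected_root)
        self.depth += 1
        self.names.append(name)

    def endElement(self, name):
        self.depth -= 1


def parse_animal_list(xml_bytes):
    """Parse xml_bytes, raising WrongRootError unless the root is animalList."""
    handler = RootCheckingHandler("animalList")
    xml.sax.parseString(xml_bytes, handler)
    return handler.names
```

An exception raised inside a handler callback propagates out of the parse call, so the rest of the document is never processed.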

Proper way to store a URL in XML?

♀尐吖头ヾ posted on 2019-12-07 17:15:10
Question: I am storing data in an XML file. In one of the nodes I have to store a URL that contains special characters such as &. I used &amp instead of & and the XML shows no error, but when I parse it with SAX, the String value returned for the node is only the part after the &. I am guessing the way I am storing the URL is not proper. What is the right way to store a URL in XML? Currently I am storing it as <param>http://www.example.com?param1=abc&v=1</param>. The XML has no errors, but the SAX parser won't return the
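For the storage side of this question, a literal & in element text has to be written as the entity &amp; (with the trailing semicolon). The sketch below uses Python's xml.sax.saxutils helpers to show the escaped form; it is an illustration, not the asker's Java code, and build_param_element is a hypothetical helper name.

```python
from xml.sax.saxutils import escape, unescape


def build_param_element(url):
    """Return a <param> element whose text is the XML-escaped URL."""
    # escape() rewrites &, < and > as &amp;, &lt; and &gt;.
    return "<param>%s</param>" % escape(url)


url = "http://www.example.com?param1=abc&v=1"
element = build_param_element(url)
escaped_text = element[len("<param>"):-len("</param>")]
# unescape() reverses the substitution, recovering the original URL.
recovered = unescape(escaped_text)
```

A conforming parser hands the text back with &amp; already decoded to &, though it may deliver that text over more than one characters() call.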

JAXB Unmarshalling a subset of Unknown XML content

孤街浪徒 posted on 2019-12-07 15:08:42
Question: I have a requirement to unmarshal a subset of unknown XML content. With the unmarshalled object, I need to modify some contents and re-bind the same XML content (the subset) into the original XML. Sample input XML: <Message> <x> </x> <y> </y> <z> </z> <!-- Need to unmarshall this content to "Content" - java Object --> <Content> <Name>Robin</Name> <Role>SM</Role> <Status>Active</Status> </Content> ..... </Message> I need to unmarshal the <Content> tag alone, keeping the rest of the XML the same.

SAX XML Java Entities problem

旧巷老猫 posted on 2019-12-07 14:39:22
Question: I have a problem with SAX and Java. I'm parsing the dblp digital library database XML file (which enumerates journals, conferences, and papers). The XML file is very large (> 700MB). However, my problem is that when the characters() callback returns, if the string retrieved contains several entities, the method only returns the string starting from the last entity character found. i.e.: Rüdiger Mecke is the original author name held between <author> tags; üdiger Mecke is the result (The String
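The symptom described above is consistent with SAX being allowed to report one run of text through several characters() calls, for example splitting at an entity boundary, so a handler that keeps only the latest call loses the earlier pieces. The following minimal sketch, in Python's xml.sax rather than Java, buffers every piece and joins them when the element closes. The SAMPLE document, its inline uuml entity declaration, and the names AuthorHandler and read_authors are assumptions made for illustration; they are not taken from the dblp file.

```python
import xml.sax

# Hypothetical miniature document; the real dblp file declares its entities in a DTD.
SAMPLE = (
    b'<!DOCTYPE dblp [<!ENTITY uuml "&#252;">]>'
    b"<dblp><author>R&uuml;diger Mecke</author></dblp>"
)


class AuthorHandler(xml.sax.ContentHandler):
    """Collects the complete text of every <author> element."""

    def __init__(self):
        super().__init__()
        self.buffer = None  # list of text pieces while inside <author>
        self.authors = []

    def startElement(self, name, attrs):
        if name == "author":
            self.buffer = []

    def characters(self, content):
        # May be called several times for one element; keep every piece.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "author":
            self.authors.append("".join(self.buffer))
            self.buffer = None


def read_authors(xml_bytes):
    """Return the full text of each <author> element in xml_bytes."""
    handler = AuthorHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.authors
```

The same idea carries over to a Java ContentHandler by appending to a StringBuilder in characters() and reading it in endElement().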

readstream pipe does not close

寵の児 posted on 2019-12-07 13:54:41
Question: I am using sax-js to read large XML files. I cannot get the program to exit when the parser is finished. Here is the shape of the script, with parser logic removed. var fs = require('fs'); var sax = require('sax'); var feedFile = 'foo.xml'; var saxStream = sax.createStream(true) .on('opentag', function(node) { // do stuff }) .on('end', function() { console.log("parser end event"); }); var options = { flags: 'r', encoding: 'utf8', mode: 0666, bufferSize: 1024 }; fs.createReadStream(feedFile,

How to tidy up malformed XML in Ruby

送分小仙女□ posted on 2019-12-07 13:04:57
Question: I'm having issues tidying up malformed XML that I'm getting back from the SEC's EDGAR database. For some reason their XML is horribly formed. Tags that contain any sort of string aren't closed, and a tag can actually contain other XML or HTML documents inside it. Normally I'd hand this off to Tidy, but that isn't being maintained. I've tried using Nokogiri::XML::SAX::Parser, but that seems to choke because the tags aren't closed. It seems to work all right until it hits the first ending tag

Android XML Parsing omitting “&”

浪尽此生 posted on 2019-12-07 11:48:07
Question: The problem again is that although I have successfully implemented a SAX parser in my code, it is behaving weirdly. It just skips the entries after the & and goes to the next entry. I just wanted to know whether this is the typical behavior of a SAX parser or whether I am implementing it wrongly. I have implemented org.xml.sax.ContentHandler and have provided the following code inside: public void characters(char[] ch, int start, int length) { if(lastName.equals("id")) { String id = String

XML SAX parser for scripting using reflection

家住魔仙堡 posted on 2019-12-07 11:19:45
Question: I'd like an opinion on creating a hypothetical scripting system using XML. The idea is to use a SAX parser and C# reflection. I cannot find a library/framework that allows specifying custom actions using XML files. At this time I use XML to serialize application classes, but it would be awesome to specify which actions the application should execute using XML. So, I'm thinking about: SAX parser implementation for C#? XML script conventions? What I'd like to achieve is the: - Possibility to

Lazy SAX XML parser with stop/resume

橙三吉。 posted on 2019-12-07 08:38:31
I am pretty sure the answer is no, but of course there are cleverer guys than me! Is there a way to construct a lazy SAX-based XML parser that can be stopped (e.g. raising an exception is a possible way of doing this) but is also resumable? I am looking for a possible solution for Python >= 2.6 with the standard XML libraries. The "lazy" part is also trivial: I am really after the "resumable" property here. Expat can be stopped and is resumable. AFAIK the Python SAX parser uses Expat. Does the API really not expose the stopping stuff to the Python side? EDIT: nope, looks like the parser stopping isn't
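One way to approximate the resumable behavior asked about, without Expat's own stop/resume calls, is to drive the standard parser incrementally with feed() and hand events out of a generator, so the consumer decides when parsing continues. The sketch below assumes Python 3 and pauses only between chunks, not in the middle of a buffer; QueueingHandler, lazy_events, and the chunk_size parameter are hypothetical names.

```python
import xml.sax
from collections import deque


class QueueingHandler(xml.sax.ContentHandler):
    """Stores SAX events in a queue instead of acting on them immediately."""

    def __init__(self):
        super().__init__()
        self.events = deque()

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def lazy_events(stream, chunk_size=64):
    """Yield (kind, name) events, reading from stream only as needed."""
    parser = xml.sax.make_parser()  # the default reader supports feed()/close()
    handler = QueueingHandler()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        # Hand over whatever this chunk produced; the generator is
        # suspended at each yield until the caller asks for more.
        while handler.events:
            yield handler.events.popleft()
    parser.close()  # flushes any remaining input and checks well-formedness
    while handler.events:
        yield handler.events.popleft()
```

A caller can take one event with next(), do other work, and later keep iterating over the same generator; no further input is read until it does.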