xml-conduit

How to skip elements in xml-conduit

早过忘川 submitted on 2021-01-27 22:01:28
Question: I have to handle rather big XML files, and I want to use the streaming API of xml-conduit to go through them and extract the info I need. In my case the streaming API is especially appealing because I don't need much data from these files and I only need to perform simple aggregations on it, so conduits are a perfect fit. Now, I don't always know the exact structure of the file. Files are generated by different versions of (sometimes buggy) software around the world, so I can't impose the schema
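One possible way to skip unknown elements with the streaming API is sketched below. It assumes a recent xml-conduit release in which `Text.XML.Stream.Parse` provides `many'` (which silently ignores tags and content the inner parser does not match) and in which tag matchers accept string literals. The input document and the element names `root`, `item`, `junk`, and `other` are hypothetical and not taken from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Lazy.Char8 as L8
import           Data.Conduit               (runConduit, (.|))
import           Text.XML.Stream.Parse      (content, def, force, many',
                                             parseLBS, tagIgnoreAttrs, tagNoAttr)

-- Hypothetical input: only <item> matters; <junk> and <other> stand in
-- for elements whose structure is not known in advance.
sample :: L8.ByteString
sample = "<root><junk><deep>x</deep></junk><item>a</item><other/><item>b</item></root>"

main :: IO ()
main = do
  items <- runConduit $
    parseLBS def sample
      .| force "expected a <root> element"
           -- many' collects every <item> and skips any sibling subtree
           -- that the inner parser does not match.
           (tagIgnoreAttrs "root" (many' (tagNoAttr "item" content)))
  print items
```

The plain `many` combinator stops at the first element it cannot match, which is why `many'` is used here. Whether this suits a given file depends on how deeply the wanted elements are nested.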

Why doesn't runConduit send all the data?

二次信任 submitted on 2021-01-06 02:47:13
Question: Here's some XML I'm parsing:

<?xml version="1.0" encoding="utf-8"?>
<data>
  <row ows_Document='Weekly Report 10.21.2020' ows_Category='Weekly Report'/>
  <row ows_Document='Daily Update 10.20.2020' ows_Category='Daily Update'/>
  <row ows_Document='Weekly Report 10.14.2020' ows_Category='Weekly Report'/>
  <row ows_Document='Weekly Report 10.07.2020' ows_Category='Weekly Report'/>
  <row ows_Document='Spanish: Reporte Semanal 07.10.2020' ows_Category='Weekly Report'/>
</data>

I've been trying to

heap memory buildup with xml-conduit parseBytes

六眼飞鱼酱① submitted on 2019-12-11 08:35:47
Question: I'm parsing some rather large XML files with xml-conduit's streaming interface https://hackage.haskell.org/package/xml-conduit-1.8.0/docs/Text-XML-Stream-Parse.html#v:parseBytes but I'm seeing this memory buildup (here on a small test file): where the top users are: The actual data shouldn't take up that much heap; if I serialise and re-read, the resident memory use is kilobytes versus the megabytes here. The minimal example I've managed to reproduce this with: {-# LANGUAGE BangPatterns #-} {-#

How to use the xml-conduit Cursor Interface for information extraction from a large XML file (around 30G)

放肆的年华 submitted on 2019-12-10 21:46:38
Question: The following question is based on the accepted answer to this question. The author of the accepted answer said that the streaming helper API in xml-conduit had not been updated for years (source: accepted answer of the SO question), and he recommends the Cursor interface. Based on the solution to the first question, I wrote the following Haskell code, which uses the Cursor interface of the xml-conduit package. import Text.XML as XML (readFile, def) import Text.XML.Cursor (Cursor, ($/), (&/), ($//), (>=>

Streaming xml-conduit parse results

寵の児 submitted on 2019-12-04 03:47:48
Question: I want to use xml-conduit, specifically Text.XML.Stream.Parse, to lazily extract a list of objects from a large XML file. As a test case, I use the recently re-released StackOverflow data dumps. To keep it simple, I intend to extract all usernames from stackoverflow.com-Users.7z. Even though the file is a .7z, the file command says it is just bzip2-compressed data (there might be some 7zip stuff at the end of the file, but right now I don't care). A simplified version of the XML would be <users>

Get all Names from xml-conduit

故事扮演 submitted on 2019-12-03 07:52:38
I'm parsing a modified version of the XML from http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html Here's what it looks like:

<?xml version="1.0" encoding="utf-8"?>
<population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com">
  <success>true</success>
  <row_count>2</row_count>
  <summary>
    <bananas>0</bananas>
  </summary>
  <people>
    <person>
      <firstname>Michael</firstname>
      <age>25</age>
    </person>
    <person>
      <firstname>Eliezer</firstname>
      <age>2</age>
    </person>
  </people>
</population>

How do I get a list of
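Going by the entry's title, the goal appears to be collecting the firstname values; that reading is an assumption, since the excerpt is cut off. A minimal sketch with the Cursor interface follows. It uses `laxElement`, which matches on the local name only, so the default namespace `http://example.com` does not have to be spelled out. The helper name `firstNames` and the shortened sample document are illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Lazy.Char8 as L8
import           Data.Text                  (Text)
import           Text.XML                   (def, parseLBS_)
import           Text.XML.Cursor            (content, fromDocument, laxElement,
                                             ($//), (&/))

-- Shortened version of the document in the question (same default namespace).
sample :: L8.ByteString
sample = L8.unlines
  [ "<?xml version='1.0' encoding='utf-8'?>"
  , "<population xmlns='http://example.com'>"
  , "  <people>"
  , "    <person><firstname>Michael</firstname><age>25</age></person>"
  , "    <person><firstname>Eliezer</firstname><age>2</age></person>"
  , "  </people>"
  , "</population>"
  ]

-- Hypothetical helper: the text of every <firstname> descendant,
-- in document order. parseLBS_ throws on malformed input.
firstNames :: L8.ByteString -> [Text]
firstNames bytes =
  fromDocument (parseLBS_ def bytes) $// laxElement "firstname" &/ content

main :: IO ()
main = print (firstNames sample)
```

This loads the whole document into memory, which is fine for a file of this size. Matching with `element` instead would require a fully qualified `Name` that includes the namespace.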

How to ignore unclosed tags in XML or HTML?

北城余情 submitted on 2019-12-02 07:11:13
Question: I'm writing a parser in Haskell for a site using the modules Text.XML and Text.XML.Cursor. The page has unclosed tags, and I get an error: Main.hs: Error parsing XML file dat.html: 29:1-29:8: Expected end element for: Name {nameLocalName = "br", nameNamespace = Nothing, namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "body", nameNamespace = Nothing, namePrefix = Nothing}) What should I do? How can I ignore such tags?

Answer 1: A text object with unclosed tags is not well-formed and
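One commonly used workaround, sketched here as an assumption rather than as what the truncated answer goes on to say, is to parse the page with the separate html-conduit package. Its `Text.HTML.DOM` module is lenient about HTML such as an unclosed `br` and returns the same `Document` type that `Text.XML.Cursor` works on. The sample markup below is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Lazy.Char8 as L8
import qualified Text.HTML.DOM              as HTML
import           Text.XML.Cursor            (content, element, fromDocument,
                                             ($//), (&/))

-- Hypothetical page fragment containing an unclosed <br>.
page :: L8.ByteString
page = "<html><body><p>first<br>second</p></body></html>"

main :: IO ()
main = do
  -- HTML.parseLBS does not demand well-formed XML, unlike Text.XML.parseLBS.
  let doc   = HTML.parseLBS page
      texts = fromDocument doc $// element "p" &/ content
  print texts
```

For a file on disk, `Text.HTML.DOM.readFile` plays the same role as `Text.XML.readFile`, and the rest of the Cursor code can stay unchanged.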