How to skip elements in xml-conduit

Submitted by 早过忘川 on 2021-01-27 22:01:28

Question


I have to handle rather large XML files, and I want to use the streaming API of xml-conduit to go through them and extract the information I need. Streaming is especially appealing in my case because I need only a small part of the data in these files and only perform simple aggregations on it, so conduits are a good fit.

Now, I don't always know the exact structure of the file. The files are generated by different versions of (sometimes buggy) software around the world, so I can't impose a schema.

I do know, however, which elements I am interested in and what they look like. But, as I said, these elements can appear in any order, mixed with other elements.

What I need, I guess, is to skip all the elements I am not interested in and consider only the ones I want.

I initially wanted to write something like this:

tagName "person" (requireAttr "age" <* ignoreAttrs) <|> ignoreTag (const True)

but it doesn't compile because ignoreTag returns Maybe ()

What would be the way to skip all the "unknown" tags when using xml-conduit streaming API?


Answer 1:


As proposed here

λ> runConduit $ Text.XML.Stream.Parse.parseLBS def  "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>" .| many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent]) .| manyYield parsePerson .| Data.Conduit.List.consume 
[Person 25 "Michael",Person 2 "Eliezer"]
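
The one-liner above relies on a parsePerson that is not shown. The following is a minimal sketch of a complete program built around the same pipeline. The Person type, the parsePerson definition, and the helper names keepPeople and peopleFrom are assumptions made for illustration (they do not come from the answer), and the imports assume the conduit, xml-conduit, xml-types and text packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Conduit (ConduitT, MonadThrow, runConduit, sinkList, (.|))
import qualified Data.ByteString.Lazy as L
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

-- Hypothetical record for the data we care about: age and name.
data Person = Person Int Text
  deriving (Show, Eq)

-- Assumed definition of parsePerson: match a <person> element, require its
-- "age" attribute, ignore any other attributes, and read the text content.
-- Note: 'read' is partial and will fail on a non-numeric age.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age" <* ignoreAttrs) $ \age -> do
  name <- content
  return (Person (read (unpack age)) name)

-- Pass the events of every <person> subtree downstream unchanged and
-- silently drop every other subtree or piece of content.
keepPeople :: MonadThrow m => ConduitT Event Event m ()
keepPeople =
  many_ (choose [takeTree "person" ignoreAttrs, ignoreAnyTreeContent])

-- Run the whole pipeline on an in-memory document.
peopleFrom :: L.ByteString -> IO [Person]
peopleFrom input =
  runConduit $
    parseLBS def input
      .| keepPeople
      .| manyYield parsePerson
      .| sinkList

main :: IO ()
main = do
  people <- peopleFrom
    "<foo>bar</foo><person age=\"25\">Michael</person><person age=\"2\">Eliezer</person>"
  print people
```

With the same input as in the GHCi session, this should produce the list shown there. The filtering stage comes first so that parsePerson only ever sees person events; the order of the alternatives in choose matters, because ignoreAnyTreeContent would also match a person element if it were tried first.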


Source: https://stackoverflow.com/questions/42265047/how-to-skip-elements-in-xml-conduit
