Question
How can you filter text nodes in XML with Clojure zippers? For example, you may have a pretty-printed XML document that interleaves element nodes with text nodes containing whitespace:
(def doc
"<?xml version=\"1.0\"?>
<root>
<a>1</a>
<b>2</b>
</root>")
If you want to retrieve the content of the root's children, you can do this:
(require '[clojure.data.xml :as xml]
         '[clojure.zip :as zip]
         '[clojure.data.zip :as zf]
         '[clojure.data.zip.xml :as zip-xml])
(-> doc
    xml/parse-str
    zip/xml-zip
    (zip-xml/xml-> :root zf/children zip-xml/text))
However, this returns (" " "1" " " "2" " "), including the whitespace.
How do you filter the zipper, so that only element nodes are selected?
I've come up with this.
(def filter-elements
  (comp (partial filter (comp xml/element? zip/node))
        zf/children))
(-> doc
    xml/parse-str
    zip/xml-zip
    (zip-xml/xml-> :root filter-elements zip-xml/text))
; => ("1" "2")
I suspect it's unnecessarily complex and hence I'm looking for a better solution.
Answer 1:
I think this relates to the general XML parsing problem of deciding which whitespace is meaningful and which isn’t. See for example this Q&A: Why am I getting extra text nodes as child nodes of root node?
I checked and found that data.xml does support skipping whitespace via the :skip-whitespace option. It’s undocumented, though it can be found in the source.
So it’s best to solve this at the parsing stage:
(-> doc
    (xml/parse-str :skip-whitespace true)
    zip/xml-zip
    (zip-xml/xml-> :root zf/children zip-xml/text))
; => ("1" "2")
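If the tree has already been parsed and re-parsing is not an option, the same filtering can be done afterwards with clojure.zip alone. The following is only a minimal sketch: `parsed` is a hand-written stand-in for a parsed, pretty-printed document (an assumption, not actual parser output), and `element-children` and `child-texts` are hypothetical names chosen for illustration.

```clojure
(require '[clojure.zip :as zip])

;; Hand-written stand-in for a parsed, pretty-printed document (an assumption,
;; not actual parser output): whitespace strings sit between the elements.
(def parsed
  {:tag :root
   :attrs {}
   :content ["\n"
             {:tag :a :attrs {} :content ["1"]}
             "\n"
             {:tag :b :attrs {} :content ["2"]}
             "\n"]})

(defn element-children
  "Hypothetical helper: returns the child locs of loc that are elements,
  skipping text nodes. In an xml-zip, strings are the only non-branch nodes."
  [loc]
  (->> (zip/down loc)          ; first child, or nil if there are none
       (iterate zip/right)     ; walk rightwards through the siblings
       (take-while some?)      ; zip/right returns nil past the last sibling
       (filter zip/branch?)))  ; keep only non-string nodes

;; Text of each element child of the root.
(def child-texts
  (map (comp first :content zip/node)
       (element-children (zip/xml-zip parsed))))
```

With the stand-in tree above, `child-texts` evaluates to ("1" "2"). The sketch relies on `zip/xml-zip` treating every non-string node as a branch, so `zip/branch?` doubles as an "is this an element?" check.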
Answer 2:
You can do this using the Tupelo library, which offers XML parsing using both clojure.data.xml and tagsoup parsers:
(ns tst.demo.core
  (:use demo.core tupelo.core tupelo.test)
  (:require
    [tupelo.forest :as tf]
    [tupelo.parse.tagsoup :as tagsoup]
    [tupelo.string :as ts]))
(dotest
  (let [doc "<?xml version=\"1.0\"?>
<root>
<a>1</a>
<b>2</b>
</root>"
        result-enlive (tagsoup/parse (ts/string->stream doc))
        result-hiccup (tf/enlive->hiccup result-enlive)]
    (is= result-enlive
      {:tag :root,
       :attrs {},
       :content
       [{:tag :a, :attrs {}, :content ["1"]}
        {:tag :b, :attrs {}, :content ["2"]}]})
    (is= result-hiccup
      [:root
       [:a "1"]
       [:b "2"]])))
Source: https://stackoverflow.com/questions/47475799/filter-element-nodes-in-xml-with-clojure-zippers