Question
I am getting an exception parsing an XML file with clojure.data.xml, because the stream is closing before the parsing is complete.
What I do not understand is why doall is not forcing the evaluation of the XML data before with-open closes it (as suggested by this related answer):
(ns example.core ; hypothetical namespace name
  (:require [clojure.java.io :as io]
            [clojure.data.xml :as xml]))
(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    (doall (xml/parse rdr))))
Which throws the exception:
(file->xml "example.xml")
;-> XMLStreamException ParseError at [row,col]:[80,1926]
Message: Stream closed com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next
If I remove the with-open wrapper, it returns the XML data as expected (so the file is valid, though the reader is then no longer guaranteed to be closed).
Looking at (source xml/parse), I see that it returns a lazy result:
(defn parse
  "Parses the source, which can be an
  InputStream or Reader, and returns a lazy tree of Element records.
  Accepts key pairs with XMLInputFactory options, see http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLInputFactory.html
  and xml-input-factory-props for more information.
  Defaults coalescing true."
  [source & opts]
  (event-tree (event-seq source opts)))
so perhaps that is related, but the function I have is very similar to the "round-trip" example on the clojure.data.xml README.
What am I missing here?
Answer 1:
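I was surprised to see this behavior. It appears that clojure.data.xml.Element (the return type) is a kind of "lazy map": its :content is a lazy sequence, and doall only forces the outermost collection it is given, so the nested content is never read from the stream before with-open closes it.
Here is a solution that transforms the lazy values into ordinary vectors and maps: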
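A minimal REPL sketch of why doall is not enough, using plain Clojure data instead of XML (the names realized-count and node are hypothetical). doall steps through whatever collection it receives; handed a map, it walks the map entries and never forces a lazy sequence stored as a value. Each check is wrapped in do so that the REPL does not print the map, since printing would itself realize :content.

```clojure
;; Counts how many elements of the nested lazy seq have been realized.
(def realized-count (atom 0))

;; A map whose :content is a lazy seq, loosely mimicking an XML Element.
(def node
  {:tag     :root
   :content (map (fn [x] (swap! realized-count inc) x) [1 2 3])})

(do (doall node) @realized-count)
;; => 0, doall walked the map's entries but left :content lazy

(do (doall (:content node)) @realized-count)
;; => 3, forcing the nested seq directly realizes it
```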
(ns tst.clj.core
  (:require
    [clojure.pprint :refer [pprint]]
    [clojure.java.io :as io]
    [clojure.data.xml :as xml]
    [clojure.walk :refer [postwalk]]))
;; Recursively replace sequential values with vectors and map-like values
;; (including clojure.data.xml.Element) with plain maps. postwalk visits
;; every node, which forces each lazy :content sequence along the way.
(defn unlazy
  [coll]
  (let [unlazy-item (fn [item]
                      (cond
                        (sequential? item) (vec item)
                        (map? item)        (into {} item)
                        :else              item))]
    (postwalk unlazy-item coll)))
(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    ;; Realize the whole tree while rdr is still open.
    (unlazy (xml/parse rdr))))
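Another option, assuming the file comfortably fits in memory, is to read it completely before parsing. slurp opens and closes its own reader, and xml/parse-str then parses from an in-memory string, so there is no file handle for with-open to close too early. The function name file->xml-in-memory is hypothetical, and its argument can be anything slurp accepts (a java.io.File, a URL such as (io/resource "books.xml"), or a path string).

```clojure
(ns example.xml-in-memory
  (:require [clojure.data.xml :as xml]))

;; Hypothetical variant of file->xml: slurp reads the whole source and closes
;; it before parsing starts, so the lazy tree returned by xml/parse-str only
;; ever reads from an in-memory string.
(defn file->xml-in-memory [source]
  (xml/parse-str (slurp source)))
```

The trade-off is memory: the whole document is held as a string, which gives up the streaming benefit of clojure.data.xml for very large files.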
(pprint (file->xml "books.xml"))
{:tag :catalog,
:attrs {},
:content
[{:tag :book,
:attrs {:id "bk101"},
:content
[{:tag :author, :attrs {}, :content ["Gambardella, Matthew"]}
{:tag :title, :attrs {}, :content ["XML Developer's Guide"]}
{:tag :genre, :attrs {}, :content ["Computer"]}
{:tag :price, :attrs {}, :content ["44.95"]}
{:tag :publish_date, :attrs {}, :content ["2000-10-01"]}
{:tag :description,
:attrs {},
:content
["An in-depth look at creating applications\n with XML."]}]}
{:tag :book,
:attrs {:id "bk102"},
:content
[{:tag :author, :attrs {}, :content ["Ralls, Kim"]}
{:tag :title, :attrs {}, :content ["Midnight Rain"]}
{:tag :genre, :attrs {}, :content ["Fantasy"]}
{:tag :price, :attrs {}, :content ["5.95"]}
{:tag :publish_date, :attrs {}, :content ["2000-12-16"]}
{:tag :description,
:attrs {},
:content
["A former architect battles corporate zombies,\n an evil sorceress, and her own childhood to become queen\n of the world."]}]}
{:tag :book,
:attrs {:id "bk103"},
:content .....
Since clojure.data.xml.Element implements clojure.lang.IPersistentMap, (map? item) returns true for it, so unlazy converts each Element into a plain map.
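If you would rather keep the Element values than convert them to plain maps, a minimal alternative sketch is to force every :content sequence while the reader is still open and return the tree unchanged. realize-tree is a hypothetical helper, not part of clojure.data.xml, and this assumes the :content sequences are the only lazy parts of the tree.

```clojure
(ns example.realize-tree
  (:require [clojure.java.io :as io]
            [clojure.data.xml :as xml]))

;; Hypothetical helper: recursively force every :content sequence and return
;; the original node, so Element values are kept as they are.
;; Strings and other non-map content are leaves and are left alone.
(defn realize-tree [node]
  (when (map? node)
    ;; run! reduces over the lazy seq, realizing every child in order.
    (run! realize-tree (:content node)))
  node)

(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    ;; All reads from rdr happen here, before with-open closes it.
    (realize-tree (xml/parse rdr))))
```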
Here is the sample data for books.xml
Please note:
clojure.data.xml is different from clojure.xml. You may need to explore both libraries to find the one that best fits your needs.
- https://clojuredocs.org/clojure.xml
- https://github.com/clojure/data.xml
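For comparison, clojure.xml/parse uses a SAX parser and builds the complete tree before returning, so closing the stream right after it returns is safe. A minimal sketch (file->xml-eager is a hypothetical name; its argument is anything io/input-stream accepts, such as (io/resource "books.xml")):

```clojure
(ns example.clojure-xml
  (:require [clojure.java.io :as io]
            [clojure.xml]))

;; clojure.xml/parse reads the whole document before it returns, so with-open
;; can close the stream afterwards without losing data. The result is nested
;; maps with :tag, :attrs and :content keys (:content is a vector of children).
(defn file->xml-eager [source]
  (with-open [in (io/input-stream source)]
    (clojure.xml/parse in)))
```

Note that clojure.xml represents missing attributes as nil rather than {}, and it holds the entire document in memory, unlike the lazy tree from clojure.data.xml.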
You can also use crossclj.info to find API docs when needed:
- https://crossclj.info/doc/org.clojure/clojure/latest/clojure.xml.html
- https://crossclj.info/doc/org.clojure/data.xml/0.2.0-alpha2/index.html
Update:
About a week after I saw this question, I ran into a similar XML parsing problem that needed the unlazy function. You can now find unlazy in the Tupelo library.
Source: https://stackoverflow.com/questions/43194162/clojure-xml-stream-closed-exception