Clojure XML Stream Closed Exception

Posted by 心已入冬 on 2019-12-10 16:14:09

Question


I am getting an exception parsing an XML file with clojure.data.xml, because the stream is closing before the parsing is complete.

What I do not understand is why doall is not forcing the evaluation of the XML data before with-open closes it (as suggested by this related answer):

(require '[clojure.java.io :as io]
         '[clojure.data.xml :as xml])

(defn file->xml [path] 
  (with-open [rdr (-> path io/resource io/reader)] 
    (doall (xml/parse rdr))))

Calling it throws this exception:

(file->xml "example.xml")
;-> XMLStreamException ParseError at [row,col]:[80,1926]
Message: Stream closed com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next

If I remove the with-open wrapper, it returns the XML data as expected (so the file itself is valid, although the reader is then no longer guaranteed to be closed).

I see that (source xml/parse) yields lazy results:

(defn parse
  "Parses the source, which can be an
   InputStream or Reader, and returns a lazy tree of Element records. 
   Accepts key pairs with XMLInputFactory options, see http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLInputFactory.html
   and xml-input-factory-props for more information. 
   Defaults coalescing true."
   [source & opts]
     (event-tree (event-seq source opts)))

so perhaps that is related, but the function I have is very similar to the "round-trip" example on the clojure.data.xml README.
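The suspected interaction between laziness and with-open can be reproduced without any XML library. The following is only a minimal sketch, assuming nothing beyond clojure.core and java.io: open-reader, read-lazily and read-eagerly are hypothetical names, and line-seq stands in for any function that returns a lazy result backed by an open reader.

```clojure
(import '(java.io BufferedReader StringReader))

(defn open-reader
  "Hypothetical helper: a reader over in-memory text instead of a file."
  []
  (BufferedReader. (StringReader. "a\nb\nc")))

(defn read-lazily
  "Returns the lazy seq itself; the remaining lines are still unread
   when with-open closes rdr."
  []
  (with-open [rdr (open-reader)]
    (line-seq rdr)))

(defn read-eagerly
  "Forces every line while rdr is still open."
  []
  (with-open [rdr (open-reader)]
    (doall (line-seq rdr))))
```

(read-eagerly) returns all three lines, while realizing the result of (read-lazily) fails with an exception because the reader has already been closed. doall is enough in this sketch only because line-seq produces a flat sequence with nothing lazy nested inside it.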

What am I missing here?


Answer 1:


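I was surprised to see this behavior. It appears that clojure.data.xml.Element (the return type) acts as a kind of "lazy map": it is a record whose :content is a lazy sequence nested inside the element. doall only realizes the top level of the collection it is given, so the nested content is still unread when with-open closes the reader.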
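A small sketch of why doall may not be enough here, assuming a plain map with a lazy :content as a hypothetical stand-in for an Element. make-node and the two predicate functions are illustrative names only and are not part of clojure.data.xml.

```clojure
(defn make-node
  "Hypothetical stand-in for a parsed element: a map whose :content is lazy."
  []
  {:tag     :root
   :attrs   {}
   :content (map inc [1 2 3])})        ; `map` returns an unrealized lazy seq

(defn content-realized-after-outer-doall?
  "Calls doall on the node itself, then reports whether :content was realized."
  []
  (let [node (make-node)]
    (doall node)                       ; walks the map's entries only
    (realized? (:content node))))

(defn content-realized-after-inner-doall?
  "Calls doall directly on the nested seq, then reports whether it was realized."
  []
  (let [node (make-node)]
    (doall (:content node))            ; forces the nested seq itself
    (realized? (:content node))))
```

The first function returns false and the second returns true: doall steps through the entries of the outer collection but never touches the lazy value stored under :content. In a real element tree the same applies at every level of nesting, which is why a recursive walk is needed.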

Here is a solution which transforms the lazy values into normal maps:

(ns tst.clj.core
  (:use clj.core clojure.test tupelo.test)
  (:require
    [tupelo.core :as t]
    [clojure.string :as str]
    [clojure.pprint :refer [pprint]]
    [clojure.java.io :as io]
    [clojure.data.xml :as xml]
    [clojure.walk :refer [postwalk]]))
(t/refer-tupelo)

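(defn unlazy
  "Recursively converts lazy seqs to vectors and map-like values to plain maps."
  [coll]
  (let [unlazy-item (fn [item]
                      (cond
                        (sequential? item) (vec item)     ; realize seqs as vectors
                        (map? item)        (into {} item) ; records become plain maps
                        :else              item))]
    (postwalk unlazy-item coll)))

(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    ;; Realize the whole tree while the reader is still open.
    (let [lazy-vals  (xml/parse rdr)
          eager-vals (unlazy lazy-vals)]
      eager-vals)))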
(pprint (file->xml "books.xml"))

{:tag :catalog,
 :attrs {},
 :content
 [{:tag :book,
   :attrs {:id "bk101"},
   :content
   [{:tag :author, :attrs {}, :content ["Gambardella, Matthew"]}
    {:tag :title, :attrs {}, :content ["XML Developer's Guide"]}
    {:tag :genre, :attrs {}, :content ["Computer"]}
    {:tag :price, :attrs {}, :content ["44.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-10-01"]}
    {:tag :description,
     :attrs {},
     :content
     ["An in-depth look at creating applications\n      with XML."]}]}
  {:tag :book,
   :attrs {:id "bk102"},
   :content
   [{:tag :author, :attrs {}, :content ["Ralls, Kim"]}
    {:tag :title, :attrs {}, :content ["Midnight Rain"]}
    {:tag :genre, :attrs {}, :content ["Fantasy"]}
    {:tag :price, :attrs {}, :content ["5.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-12-16"]}
    {:tag :description,
     :attrs {},
     :content
     ["A former architect battles corporate zombies,\n      an evil sorceress, and her own childhood to become queen\n      of the world."]}]}
  {:tag :book,
   :attrs {:id "bk103"},
   :content .....

Since clojure.data.xml.Element implements clojure.lang.IPersistentMap, using (map? item) returns true.
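The same conversion can be tried without an XML file. This is only a sketch: Node is a hypothetical record used in place of clojure.data.xml.Element, tree and eager are illustrative names, and the unlazy function from above is repeated so that the snippet runs on its own.

```clojure
(require '[clojure.walk :refer [postwalk]])

;; Hypothetical record standing in for clojure.data.xml's Element.
(defrecord Node [tag attrs content])

(defn unlazy
  "Same idea as the unlazy function above, repeated to keep this self-contained."
  [coll]
  (postwalk (fn [item]
              (cond
                (sequential? item) (vec item)
                (map? item)        (into {} item)
                :else              item))
            coll))

;; A record tree whose :content is a lazy seq of further records.
(def tree
  (->Node :catalog {}
          (map (fn [id] (->Node :book {:id id} (list "text")))
               ["bk101" "bk102"])))

;; After the walk, every level is a plain map with vector content.
(def eager (unlazy tree))
```

Here (map? tree) is true even though tree is a record, so the (map? item) branch applies to it. After the walk, (record? eager) is false and (:content eager) is a vector of plain maps, so nothing lazy remains to be read later.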

Here is the sample data for books.xml

Please Note:

clojure.data.xml is different from clojure.xml. You may need to explore both libraries to find the one that best fits your needs.

  • https://clojuredocs.org/clojure.xml
  • https://github.com/clojure/data.xml

You can also use crossclj.info to find api docs when needed:

  • https://crossclj.info/doc/org.clojure/clojure/latest/clojure.xml.html
  • https://crossclj.info/doc/org.clojure/data.xml/0.2.0-alpha2/index.html

Update:

Just a week or so after I saw this question, I ran into an XML parsing problem just like this one that needed the unlazy function. You can now find unlazy in the Tupelo library.



Source: https://stackoverflow.com/questions/43194162/clojure-xml-stream-closed-exception
