Improving Clojure lazy-seq usage for iterative text parsing

Submitted by 非 Y 不嫁゛ on 2019-12-05 17:17:32
Alex Taggart

It probably doesn't matter, but average is holding onto the head of the seq of lengths.
The following is a wholly untested but lazier way to do what I think you want.

(use 'clojure.java.io) ; clojure.java.io is available since Clojure 1.2

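(defn lazy-avg [coll]
  ;; Accumulate the running sum and the element count in a single pass over coll.
  (let [f (fn [[v c] val] [(+ v val) (inc c)])
        [sum cnt] (reduce f [0 0] coll)]
    ;; Return 0 for an empty coll instead of dividing by zero.
    (if (zero? cnt) 0 (/ sum cnt))))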

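(defn fasta-avg [f]
  ;; with-open closes the reader once the average has been computed.
  (with-open [rdr (reader f)]
    (->> (line-seq rdr)
         (filter #(not (.startsWith ^String % ">"))) ; skip FASTA header lines
         (map #(.length ^String %))
         lazy-avg)))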

Your average function is non-lazy -- it needs to realise the entire coll argument while holding onto its head.

Update: I just realised that my original answer included a nonsensical suggestion as to how to solve the above problem... argh. Fortunately, ataggart has since posted a correct solution.
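To make the head-retention point concrete, here is a minimal sketch. The question's own average function is not shown here, so two-pass-avg is only a hypothetical stand-in for the usual sum-then-count shape. one-pass-avg (also a name made up for this example) folds the sum and the count together, so no second reference to the seq is needed.

```clojure
;; Hypothetical stand-in for an average that walks the collection twice.
;; coll is used again by count, so the fully realised seq stays reachable
;; while reduce walks it.
(defn two-pass-avg [coll]
  (/ (reduce + coll) (count coll)))

;; Single pass: sum and count are accumulated together, so elements that
;; have already been consumed can be garbage collected, provided the caller
;; does not keep its own reference to the head of the seq.
(defn one-pass-avg [coll]
  (let [[sum cnt] (reduce (fn [[s c] x] [(+ s x) (inc c)]) [0 0] coll)]
    (if (zero? cnt) 0 (/ sum cnt))))

(one-pass-avg (map count ["ACGT" "GG" "TTTTTT"]))
```

Both versions give the same result on small inputs. The difference should only show up as memory pressure on very large lazy seqs.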

Other than that, your code does seem lazy at first glance, though the use of read-lines is currently discouraged (use line-seq instead).
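As a rough illustration of the line-seq suggestion, the sketch below reads from an in-memory string so that it runs without a file. count-sequence-lines is a hypothetical helper, not something from the question. The main thing to note is that the lazy seq is consumed entirely inside with-open, before the reader is closed.

```clojure
(import '(java.io BufferedReader StringReader))

;; Hypothetical helper: counts the non-header lines of FASTA-style input.
;; line-seq is lazy, so the caller must keep rdr open until count returns.
(defn count-sequence-lines [^BufferedReader rdr]
  (->> (line-seq rdr)
       (remove #(.startsWith ^String % ">"))
       count))

;; The whole seq is consumed inside with-open, before the reader is closed.
(with-open [rdr (BufferedReader. (StringReader. ">seq1\nACGT\nGG\n>seq2\nTTT\n"))]
  (count-sequence-lines rdr))
```

Returning an unrealised line-seq out of the with-open body would fail later with a closed-stream error, so any laziness has to stay inside that scope.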

If the file is really large and your functions will be called a large number of times, type-hinting seq-iter in the argument vector of seq-length might make a significant difference: write ^NameOfBiojavaSeqIterClass seq-iter (use #^ in place of ^ if you're on Clojure 1.1). More generally, evaluate (set! *warn-on-reflection* true), then compile your code and add type hints until all reflection warnings are gone.
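A small sketch of that workflow, using java.lang.String as a stand-in because the actual Biojava iterator class is not named here. unhinted-length and hinted-length are hypothetical names for illustration only.

```clojure
;; Assumes this is evaluated at the REPL or loaded as a file, where
;; *warn-on-reflection* is thread-bound and can be set!.
(set! *warn-on-reflection* true)

;; No hint: the compiler cannot resolve .length at compile time, prints a
;; reflection warning, and looks the method up reflectively on every call.
(defn unhinted-length [s]
  (.length s))

;; ^String lets the compiler emit a direct method call, so no warning.
(defn hinted-length [^String s]
  (.length s))

(hinted-length "ACGT")
```

Both functions return the same value. Only the hinted one avoids the reflective lookup, which is where the speed-up in a tight loop would come from.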
