Question
Starting with a collection of strings like:
(def str-coll ["abcd" "efgh" "jklm"])
The goal is to pull a fixed number of characters at a time off the head of the string collection, producing a partitioned grouping of characters. This is the desired behavior:
(use '[clojure.contrib.str-utils2 :only (join)])
(partition-all 3 (join "" str-coll))
((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))
However, using join forces evaluation of the entire collection, which causes memory issues when dealing with very large collections of strings. My specific use case is generating subsets of strings from a lazy collection generated by parsing a large file of delimited records:
; reader is assumed to come from clojure.java.io
(defn file-coll [in-file]
  (->> (line-seq (reader in-file))
       (partition-by #(.startsWith ^String % ">"))
       (partition 2)))
and is building on work from this previous question. I've tried combinations of reduce, partition and join but can't come up with the right incantation to pull characters from the head of the first string and lazily evaluate subsequent strings as needed. Thanks much for any ideas or pointers.
Answer 1:
Not quite sure what you're going for, but the following does what your first example does, and does so lazily.
Step-by-step for clarity:
user=> (def str-coll ["abcd" "efgh" "jklm"])
#'user/str-coll
user=> (map seq str-coll)
((\a \b \c \d) (\e \f \g \h) (\j \k \l \m))
user=> (flatten *1)
(\a \b \c \d \e \f \g \h \j \k \l \m)
user=> (partition 3 *1)
((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))
All together now:
(->> str-coll (map seq) flatten (partition 3))
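One way to check the laziness claim is to run the same pipeline over an endless source: if anything forced the whole collection, the call would never return. The sketch below assumes a hypothetical endless-strs built with cycle as a stand-in for a very large lazy collection. It also shows that partition drops a short final group, whereas the partition-all used in the question keeps it.

```clojure
;; Hypothetical stand-in for a very large lazy collection of strings.
(def endless-strs (cycle ["abcd" "efgh" "jklm"]))

;; Only as many strings as needed are pulled from the endless source.
(def first-four
  (->> endless-strs (map seq) flatten (partition 3) (take 4) doall))
;; first-four is ((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))

;; Seven characters in total: partition drops the leftover \g ...
(def dropped (->> ["abcd" "efg"] (map seq) flatten (partition 3)))
;; dropped is ((\a \b \c) (\d \e \f))

;; ... while partition-all keeps it as a short final group.
(def kept (->> ["abcd" "efg"] (map seq) flatten (partition-all 3)))
;; kept is ((\a \b \c) (\d \e \f) (\g))
```

If the trailing characters matter, as they probably do for delimited records, swapping partition for partition-all in the pipeline is likely the safer choice.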
Answer 2:
EDIT: EVERYTHING I'VE WRITTEN WAS WRONG
When a function with a var-arg is applied to a seq longer than its number of discrete args, the remainder of the seq is passed as the var-arg without being realized (see RestFn.applyTo).
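A small sketch of that behavior, using a hypothetical function head-and-rest (not from the original thread) with one discrete arg and a var-arg. Applying it to an infinite seq returns normally, which would not be possible if apply had to realize every argument first.

```clojure
;; Hypothetical function with one discrete arg and a var-arg.
(defn head-and-rest [x & more]
  [x (take 3 more)])

;; (range) is infinite, yet apply returns: only the front of the seq is
;; realized to fill the discrete arg; the remainder arrives as `more`.
(def applied (apply head-and-rest (range)))
;; applied is [0 (1 2 3)]

;; The same property lets mapcat consume an endless source incrementally.
(def some-chars (doall (take 5 (mapcat seq (repeat "ab")))))
;; some-chars is (\a \b \a \b \a)
```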
To Jürgen: I'm stupid. You're smart. I was wrong. You were right. You're the best. I'm the worst. You're very good-looking. I'm not attractive.
The following is a record of my idiocy...
Responding to Jürgen Hötzel's comment.
mapcat isn't fully lazy because apply isn't lazy in evaluating the number of args to apply. Further, apply can't be lazy because functions must be invoked with a discrete number of args. Currently if the number of args exceeds 20, the remaining args are dumped into an array, hence non-lazy.
So looking at the source for mapcat:
(defn mapcat
  "Returns the result of applying concat to the result of applying map
  to f and colls.  Thus function f should return a collection."
  {:added "1.0"}
  [f & colls]
  (apply concat (apply map f colls)))
If we expand the evaluation out using the example, the inner apply would evaluate to:
user=> (map seq str-coll)
((\a \b \c \d) (\e \f \g \h) (\j \k \l \m))
which is fine since str-coll doesn't get fully realized, but then the outer apply would evaluate to:
user=> (concat '(\a \b \c \d) '(\e \f \g \h) '(\j \k \l \m))
(\a \b \c \d \e \f \g \h \j \k \l \m)
Note that the outer apply applies n arguments to concat, one for each string in the original str-coll. Now, it's true that the result of concat is lazy, and each arg is itself lazy, but you still need to realize the full length of str-coll to get those n lazy seqs. If str-coll has 1000 strings, then concat will get 1000 args, and all 1000 strings would need to be read out of the file and into memory before concat could be called.
For the unbelievers, a demonstration of the seq-realizing behavior of apply:
user=> (defn loud-seq [] (lazy-seq (println "HELLO") (cons 1 (loud-seq))))
#'user/loud-seq
user=> (take 3 (loud-seq)) ; displaying the lazy-seq realizes it, thus printing HELLO
(HELLO
HELLO
1 HELLO
1 1)
user=> (do (take 3 (loud-seq)) nil) ; lazy-seq not realized; no printing of HELLO
nil
user=> (do (apply concat (take 3 (loud-seq))) nil) ; draw your own conclusions
HELLO
HELLO
HELLO
nil
And a demonstration that varargs are not lazy:
user=> (defn foo [& more] (type more))
#'user/foo
user=> (foo 1 2 3 4)
clojure.lang.ArraySeq
user=> (apply foo (repeat 4 1))
clojure.lang.Cons
Though as counterpoint, that the following works baffles me:
user=> (take 10 (apply concat (repeat [1 2 3 4])))
(1 2 3 4 1 2 3 4 1 2)
Source: https://stackoverflow.com/questions/3348719/partitioning-in-clojure-with-a-lazy-collection-of-strings