Question
It is very convenient to use Tasks to express a lazy collection (a generator).
E.g.:
function fib()
    Task() do
        prev_prev = 0
        prev = 1
        produce(prev)
        while true
            cur = prev_prev + prev
            produce(cur)
            prev_prev = prev
            prev = cur
        end
    end
end
collect(take(fib(), 10))
Output:
10-element Array{Int64,1}:
1
1
2
3
5
8
13
21
34
55
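For readers on Julia 1.x: produce and consume were deprecated in Julia 0.6 and removed in 1.0, and the usual replacement for this kind of Task-based generator is a Channel whose producer runs in its own task. The following is a minimal sketch assuming Julia 1.3 or later (for the Channel{T}(f) constructor); fib_channel is a hypothetical name, not part of the original question.

```julia
# Sketch for Julia >= 1.3: a Channel fed by a task stands in for the
# produce/consume Task. `fib_channel` is a hypothetical name.
function fib_channel()
    Channel{Int}() do ch            # the do-block runs in its own task
        prev_prev, prev = 0, 1
        put!(ch, prev)              # put! plays the role of produce
        while true
            cur = prev_prev + prev
            put!(ch, cur)           # blocks until a consumer takes the value
            prev_prev, prev = prev, cur
        end
    end
end

first_ten = collect(Iterators.take(fib_channel(), 10))
println(first_ten)
```

Note that a Channel is still a stateful iterator: iterating it takes values out of the channel object itself, so it shares the "mutates itself" property discussed below.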
However, they do not follow good iterator conventions at all. They are as badly behaved as they can be.
They do not use the returned state:
start(fib()) == nothing #It has no state
So they are instead mutating the iterator object itself.
A proper iterator uses its state, rather than mutating itself, so that multiple callers can iterate over it at once. It creates that state with start and advances it during next.
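To make the contrast concrete in Julia 1.x terms (where start, next and done were merged into a single iterate function), the sketch below keeps all progress in the state for a hypothetical Evens iterator, and compares it with a Channel, which, like the Task, keeps its position inside the object. Evens is an illustrative type, not something from the question; Julia 1.3 or later is assumed for the Channel constructor.

```julia
# Julia 1.x sketch. `Evens` is a hypothetical, well-behaved iterator:
# all progress lives in the state value, never in the iterator object.
struct Evens end
Base.iterate(::Evens, state::Int = 0) = (state, state + 2)
Base.IteratorSize(::Type{Evens}) = Base.IsInfinite()

it = Evens()
a = iterate(it)              # (0, 2)
b = iterate(it)              # (0, 2) again: `it` was not mutated
resumed = iterate(it, a[2])  # (2, 4): progress comes only from the state

# A Channel behaves like the Task: its state is always `nothing`,
# and each call to iterate takes a value out of the channel itself.
ch = Channel{Int}(chan -> foreach(i -> put!(chan, i), 0:2:10))
x = iterate(ch)              # first element is 0
y = iterate(ch)              # first element is 2: the channel advanced
```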
Debatably, that state should be immutable, with next returning a new state, so that the iterator can be trivially tee'd. (On the other hand, that means allocating new memory, though only on the stack.)
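With an immutable state, tee-ing needs no special support: copying the state value gives two independent cursors. A small Julia 1.x sketch, using a hypothetical Squares iterator and a hypothetical tee_demo function (the work is wrapped in a function to avoid global-scope issues when run as a script):

```julia
# Julia 1.x sketch. `Squares` and `tee_demo` are hypothetical names.
struct Squares end
Base.iterate(::Squares, n::Int = 1) = (n^2, n + 1)   # yields 1, 4, 9, ...
Base.IteratorSize(::Type{Squares}) = Base.IsInfinite()

function tee_demo()
    sq = Squares()
    _, s = iterate(sq)            # consume the first element; s == 2
    st1 = st2 = s                 # the "tee": both branches share one state value
    v1, st1 = iterate(sq, st1)    # branch 1 sees 4
    w1, st1 = iterate(sq, st1)    # branch 1 sees 9
    v2, st2 = iterate(sq, st2)    # branch 2 still sees 4
    return (v1, w1, v2)
end

println(tee_demo())
```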
Furthermore, the hidden state is not advanced during next.
The following does not work:
@show ff = fib()
@show state = start(ff)
@show next(ff, state)
Output:
ff = fib() = Task (runnable) @0x00007fa544c12230
state = start(ff) = nothing
next(ff,state) = (nothing,nothing)
Instead, the hidden state is advanced during done.
The following works:
@show ff = fib()
@show state = start(ff)
@show done(ff,state)
@show next(ff, state)
Output:
ff = fib() = Task (runnable) @0x00007fa544c12230
state = start(ff) = nothing
done(ff,state) = false
next(ff,state) = (1,nothing)
Advancing state during done isn't the worst thing in the world.
After all, it is often hard to know whether you are done without trying to find the next state. One would hope that done is always called before next.
Still, it is not great, since the following happens:
ff = fib()
state = start(ff)
done(ff,state)
done(ff,state)
done(ff,state)
done(ff,state)
done(ff,state)
done(ff,state)
@show next(ff, state)
Output:
next(ff,state) = (8,nothing)
This is really not what you expect. It is reasonable to assume that done is safe to call multiple times.
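For comparison, Julia 1.x's Iterators.Stateful wraps an iterator and caches the next element, so its emptiness check (the nearest analogue of done) can be called repeatedly without skipping anything. A sketch, assuming Julia 1.3 or later for the Channel constructor:

```julia
# Julia >= 1.3 sketch: repeated isempty checks on a Stateful wrapper
# do not consume elements, unlike repeated `done` calls on a 0.5 Task.
ch = Channel{Int}() do c
    for v in (1, 1, 2, 3, 5, 8)
        put!(c, v)
    end
end
s = Iterators.Stateful(ch)
for _ in 1:6
    isempty(s)                   # idempotent: nothing is skipped
end
first_two = [popfirst!(s), popfirst!(s)]
println(first_two)
```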
Basically, Tasks make poor iterators. In many cases they are not compatible with other code that expects an iterator. (In many cases they are, but it is hard to tell which is which.)
This is because Tasks are not really meant to be used as iterators in these "generator" functions. They are intended for low-level control flow, and are optimized as such.
So what is the better way?
Writing an iterator for fib isn't too bad:
immutable Fib end
immutable FibState
    prev::Int
    prevprev::Int
end
Base.start(::Fib) = FibState(0, 1)
Base.done(::Fib, ::FibState) = false
function Base.next(::Fib, s::FibState)
    cur = s.prev + s.prevprev
    ns = FibState(cur, s.prev)
    cur, ns
end
Base.iteratoreltype(::Type{Fib}) = Base.HasEltype()
Base.eltype(::Type{Fib}) = Int
Base.iteratorsize(::Type{Fib}) = Base.IsInfinite()
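In Julia 1.0 and later the same iterator is written against the single iterate function that replaced start, next and done: the method with no state argument plays the role of start, and returning nothing (which this infinite iterator never does) plays the role of done returning true. A sketch of that port:

```julia
# Julia 1.x port of the Fib iterator above.
struct Fib end
struct FibState
    prev::Int
    prevprev::Int
end

# No state argument: the equivalent of `start`.
Base.iterate(f::Fib) = iterate(f, FibState(0, 1))

# With a state: the equivalent of `next`; returning `nothing` would mean "done".
function Base.iterate(::Fib, s::FibState)
    cur = s.prev + s.prevprev
    return cur, FibState(cur, s.prev)   # a fresh immutable state each step
end

Base.IteratorEltype(::Type{Fib}) = Base.HasEltype()
Base.eltype(::Type{Fib}) = Int
Base.IteratorSize(::Type{Fib}) = Base.IsInfinite()

fibs = collect(Iterators.take(Fib(), 10))
println(fibs)
```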
But it is a bit less intuitive. For more complex functions, it is much less pleasant.
So my question is: what is a better way to get something that works like a Task does, as a way to build up an iterator from a single function, but that is well behaved?
I would not be surprised if someone has already written a package with a macro to solve this.
Answer 1:
The current iterator interface for Tasks is fairly simple:
# in share/julia/base/task.jl
start(t::Task) = nothing
function done(t::Task, val)
    t.result = consume(t)
    istaskdone(t)
end
next(t::Task, val) = (t.result, nothing)
Not sure why the devs chose to put the consumption step in the done function rather than in the next function. This is what produces your weird side effect. To me it sounds much more straightforward to implement the interface like this:
import Base: start, next, done
start(t::Task) = t
next(t::Task, s::Task) = (consume(s), s)
done(t::Task, s::Task) = istaskdone(s)
Therefore, this is what I would propose as the answer to your question.
I think this simpler implementation is a lot more meaningful, fulfils your criteria above, and even has the desired outcome of outputting a meaningful state: the Task itself! (which you're allowed to "inspect" if you really want to, as long as that doesn't involve consumption :p ).
However, there are certain caveats:
Caveat 1: The task is REQUIRED to have a return value, signifying the final element in the iteration, otherwise "unexpected" behaviour might occur.
I'm assuming the devs chose the first approach to avoid exactly this kind of "unintended" output; however I believe this should have actually been the expected behaviour! A task expected to be used as an iterator should be expected to define an appropriate iteration endpoint (by means of a clear return value) by design!
Example 1: The wrong way to do it
julia> t = Task() do; for i in 1:10; produce(i); end; end;

julia> collect(t) |> show
Any[1,2,3,4,5,6,7,8,9,10,nothing]
# last item is a return value of nothing,
# corresponding to the "return value" of the
# for loop statement, which is 'nothing'.
# Presumably not the intended output!
Example 2: Another wrong way to do it
julia> t = Task() do; produce(1); produce(2); produce(3); produce(4); end;

julia> collect(t) |> show
Any[1,2,3,4,()]
# last item is the return value of the produce statement,
# which returns any items passed to it by the last
# 'consume' call; in this case an empty tuple.
# Presumably not the intended output!
Example 3: The (in my humble opinion) right way to do it!
julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;

julia> collect(t) |> show
[1,2,3,4]
# An appropriate return value ending the Task function ensures an
# appropriate final value for the iteration, as intended.
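In Julia 1.3 and later, a Channel-based generator sidesteps Caveat 1: the channel is closed automatically when its do-block returns, and iteration stops at that point, so the block's return value never becomes an element. A minimal sketch mirroring Example 1:

```julia
# Julia >= 1.3 sketch: Example 1 again, but with a Channel instead of a Task.
ch = Channel{Any}() do c
    for i in 1:10
        put!(c, i)
    end
    # the for loop evaluates to `nothing`, but that value is simply discarded
end
vals = collect(ch)
println(vals)
```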
Caveat 2: The task should not be modified / consumed further inside the iteration (a common requirement with iterators in general), except in the understanding that this intentionally causes a 'skip' in the iteration (which would be a hack at best, and presumably not advisable).
Example:
julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;

julia> for i in t; show(consume(t)); end
24
More Subtle example:
julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;

julia> for i in t          # collecting i is a consumption event
           for j in t      # collecting j is *also* a consumption event
               show(j)
           end
       end                 # at the end of this loop, i = 1, and j = 4
234
Caveat 3: With this scheme it is expected behaviour that you can 'continue where you left off'. e.g.
julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;

julia> take(t, 2) |> collect |> show
[1,2]

julia> take(t, 2) |> collect |> show
[3,4]
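Channels in Julia 1.x show the same "continue where you left off" behaviour, since iterating a channel takes values out of it. A sketch, assuming Julia 1.3 or later:

```julia
# Julia >= 1.3 sketch of Caveat 3 with a Channel.
ch = Channel{Int}() do c
    for i in 1:4
        put!(c, i)
    end
end
first_batch = collect(Iterators.take(ch, 2))
second_batch = collect(Iterators.take(ch, 2))   # resumes where the first stopped
println(first_batch, " ", second_batch)
```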
However, if one would prefer the iterator to always start from the pre-consumption state of a task, the start function could be modified to achieve this:
import Base: start, next, done
start(t::Task) = Task(t.code)
next(t::Task, s::Task) = (consume(s), s)
done(t::Task, s::Task) = istaskdone(s)

julia> for i in t
           for j in t
               show(j)
           end
       end    # at the end of this loop, i = 4, and j = 4 independently
1234123412341234
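The same "always restart" behaviour can be sketched in Julia 1.x by storing a function that builds a fresh Channel and creating the channel inside the stateless iterate method, much as start returns Task(t.code) above. Restartable is a hypothetical wrapper, not a Base type; Julia 1.3 or later is assumed.

```julia
# Julia >= 1.3 sketch. `Restartable` is a hypothetical wrapper type.
struct Restartable{F}
    make::F                       # zero-argument function returning a new Channel
end

# Starting an iteration builds a private channel, like `Task(t.code)`.
function Base.iterate(r::Restartable)
    return iterate(r, r.make())
end

# The channel is the state; each step takes one value from it.
function Base.iterate(::Restartable, ch::Channel)
    y = iterate(ch)
    y === nothing && return nothing
    return y[1], ch
end
Base.IteratorSize(::Type{<:Restartable}) = Base.SizeUnknown()

counter = Restartable(() -> Channel{Int}(c -> foreach(i -> put!(c, i), 1:4)))
nested = Int[]
for i in counter, j in counter    # each loop gets its own fresh channel
    push!(nested, j)
end
println(nested)
```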
Interestingly, note how this variant would affect the 'inner consumption' scenario from 'caveat 2':
julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;

julia> for i in t; show(consume(t)); end
1234

julia> for i in t; show(consume(t)); end
4444
See if you can spot why this makes sense! :)
Having said all this, there is a philosophical question of whether the way a Task behaves with the start, next, and done functions matters at all, since these functions are considered "an informal interface": i.e. they are supposed to be "under the hood" functions, not intended to be called manually.
Therefore, as long as they do their job and return the expected iteration values, you shouldn't care too much about how they do it under the hood, even if technically they don't quite follow the 'spec' while doing so, since you were never supposed to be calling them manually in the first place.
Answer 2:
How about the following (uses fib as defined in the OP):
type NewTask
    t::Task
end

import Base: start, done, next, iteratorsize, iteratoreltype

start(t::NewTask) = istaskdone(t.t) ? nothing : consume(t.t)
next(t::NewTask, state) = (state == nothing || istaskdone(t.t)) ?
    (state, nothing) : (state, consume(t.t))
done(t::NewTask, state) = state == nothing
iteratorsize(::Type{NewTask}) = Base.SizeUnknown()
iteratoreltype(::Type{NewTask}) = Base.EltypeUnknown()

function fib()
    Task() do
        prev_prev = 0
        prev = 1
        produce(prev)
        while true
            cur = prev_prev + prev
            produce(cur)
            prev_prev = prev
            prev = cur
        end
    end
end

nt = NewTask(fib())
take(nt, 10) |> collect
This is a good question, and is possibly better suited to the Julia mailing list (now on the Discourse platform). In any case, using the NewTask defined above, an improved answer to a recent StackOverflow question is possible. See: https://stackoverflow.com/a/41068765/3580870
Source: https://stackoverflow.com/questions/41072425/better-way-than-using-task-produce-consume-for-lazy-collections-express-as-cor