Better way than using `Task/produce/consume` for lazy collections expressed as coroutines

Submitted by 妖精的绣舞 on 2019-12-05 14:48:47

The current iterator interface for Tasks is fairly simple:

# in share/julia/base/task.jl
start(t::Task) = nothing
function done(t::Task, val)
    t.result = consume(t)
    istaskdone(t)
end
next(t::Task, val) = (t.result, nothing)
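
The effect of calling `consume` inside `done` can be seen in a small sketch. It assumes Julia 1.x, where `produce`/`consume` and `start`/`next`/`done` no longer exist, so a `Channel` stands in for the producing task. The names `OldStyleTask`, `old_start`, `old_done` and `old_next` are hypothetical and only imitate the three methods quoted above.

```julia
# Hypothetical stand-in for a Julia 0.5 Task used as an iterator.
mutable struct OldStyleTask
    ch::Channel   # plays the role of the producing coroutine
    result::Any   # plays the role of `t.result`
end

# Imitations of the three Base methods quoted above, under hypothetical names.
old_start(t::OldStyleTask) = nothing
function old_done(t::OldStyleTask, state)
    item = iterate(t.ch)                            # the consumption happens here
    t.result = item === nothing ? nothing : item[1]
    return item === nothing                         # true once the producer has finished
end
old_next(t::OldStyleTask, state) = (t.result, nothing)

producer = Channel() do ch
    for i in 1:3
        put!(ch, i)
    end
end
t = OldStyleTask(producer, nothing)

state = old_start(t)
old_done(t, state)               # consumes 1
old_done(t, state)               # consumes 2; the 1 is never seen by old_next
value, _ = old_next(t, state)
println(value)                   # prints 2
```

A `for` loop calls `done` exactly once per element, so the loss only shows up when `done` is called by hand or when something else consumes from the task between the two calls.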

I'm not sure why the devs chose to put the consumption step in the done function rather than in the next function. This is what produces the odd side effect you observed: every call to done advances the task. To me it seems much more straightforward to implement the interface like this:

import Base.start; function Base.start(t::Task) return t end
import Base.next;  function Base.next(t::Task, s::Task) return consume(s), s end
import Base.done;  function Base.done(t::Task, s::Task) istaskdone(s) end

Therefore, this is what I would propose as the answer to your question.

I think this simpler implementation is a lot more meaningful, fulfils your criteria above, and even has the desired outcome of outputting a meaningful state: the Task itself! (which you're allowed to "inspect" if you really want to, as long as that doesn't involve consumption :p ).
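
For readers on Julia 1.x, the same idea can be approximated with the `iterate` protocol. This is only a sketch under assumptions: `ReturningProducer` is a hypothetical wrapper, a `Channel` replaces `produce`/`consume`, and a `Bool` state replaces the Task-as-state of the definitions above. As in the proposal, consumption happens in the advancing step, and the producer function's return value is delivered as the last element.

```julia
# Hypothetical wrapper, not a Base type: it yields every value the producer
# puts on the channel, then the producer function's return value.
struct ReturningProducer
    ch::Channel
    task::Task
end

function ReturningProducer(f)
    ch = Channel(0)                # unbuffered: each put! waits for a consumer
    task = @async begin
        try
            f(ch)                  # the value of the try block is f's return value
        finally
            close(ch)              # lets the consumer see the end of the stream
        end
    end
    return ReturningProducer(ch, task)
end

# The Bool state records whether the return value has already been delivered.
function Base.iterate(p::ReturningProducer, finished::Bool=false)
    finished && return nothing
    item = iterate(p.ch)                               # nothing once the channel is closed
    item === nothing && return (fetch(p.task), true)   # final element: the return value
    return (item[1], false)
end
Base.IteratorSize(::Type{ReturningProducer}) = Base.SizeUnknown()

with_return = ReturningProducer() do ch
    put!(ch, 1); put!(ch, 2); put!(ch, 3)
    return 4
end
println(collect(with_return))      # prints Any[1, 2, 3, 4]

without_return = ReturningProducer() do ch
    for i in 1:3
        put!(ch, i)
    end
end
println(collect(without_return))   # prints Any[1, 2, 3, nothing]
```

The second call shows what happens when the producer has no explicit return value: the `for` loop evaluates to `nothing`, and that becomes the final element.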


However, there are certain caveats:

  • Caveat 1: The task is REQUIRED to have a return value, signifying the final element in the iteration, otherwise "unexpected" behaviour might occur.

    I'm assuming the devs chose the first approach to avoid exactly this kind of "unintended" output; however I believe this should have actually been the expected behaviour! A task expected to be used as an iterator should be expected to define an appropriate iteration endpoint (by means of a clear return value) by design!

    Example 1: The wrong way to do it

    julia> t = Task() do; for i in 1:10; produce(i); end; end;
    julia> collect(t) |> show
    Any[1,2,3,4,5,6,7,8,9,10,nothing] # last item is a return value of nothing
                                      # corresponding to the "return value" of the
                                      # for loop statement, which is 'nothing'.
                                      # Presumably not the intended output!
    

    Example 2: Another wrong way to do it

    julia> t = Task() do; produce(1); produce(2); produce(3); produce(4); end;
    julia> collect(t) |> show
    Any[1,2,3,4,()] # last item is the return value of the produce statement,
                    # which returns any items passed to it by the last
                    # 'consume' call; in this case an empty tuple.
                    # Presumably not the intended output!
    

    Example 3: The (in my humble opinion) right way to do it!

    julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;
    julia> collect(t) |> show
    [1,2,3,4] # An appropriate return value ending the Task function ensures an
              # appropriate final value for the iteration, as intended.
    
  • Caveat 2: The task should not be modified / consumed further inside the iteration (a common requirement with iterators in general), except with the understanding that this intentionally causes a 'skip' in the iteration (which would be a hack at best, and presumably not advisable).

    Example:

    julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;
    julia> for i in t; show(consume(t)); end
    24
    

    More Subtle example:

    julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;
    julia> for i in t   # collecting i is a consumption event
            for j in t  # collecting j is *also* a consumption event
              show(j)
            end
           end # at the end of this loop, i = 1, and j = 4
    234
    
  • Caveat 3: With this scheme it is expected behaviour that you can 'continue where you left off'. e.g.

    julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;
    julia> take(t, 2) |> collect |> show
    [1,2]
    julia> take(t, 2) |> collect |> show
    [3,4]
    

    However, if one would prefer the iterator to always start from the pre-consumption state of a task, the start function could be modified to achieve this:

    import Base.start; function Base.start(t::Task) return Task(t.code) end;
    import Base.next;  function Base.next(t::Task, s::Task) consume(s), s end;
    import Base.done;  function Base.done(t::Task, s::Task) istaskdone(s) end;
    
    julia> for i in t
             for j in t
               show(j)
             end
           end # at the end of this loop, i = 4, and j = 4 independently
    1234123412341234
    

    Interestingly, note how this variant would affect the 'inner consumption' scenario from 'caveat 2':

    julia> t = Task() do; produce(1); produce(2); produce(3); return 4; end;
    julia> for i in t; show(consume(t)); end
    1234
    julia> for i in t; show(consume(t)); end
    4444       
    

    See if you can spot why this makes sense! :)
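
The "always start from the beginning" variant has a rough analogue on Julia 1.x. The sketch below assumes a hypothetical `Restartable` wrapper that stores the producer function and builds a fresh `Channel` whenever iteration starts, which plays the role of `Task(t.code)` in the modified `start`. It does not deliver the function's return value as a final element, so the producer puts all four values explicitly.

```julia
# Hypothetical wrapper: every new iteration gets its own producer.
struct Restartable{F}
    f::F     # producer body, called with a fresh Channel each time iteration starts
end

# Starting an iteration creates a new channel, which then serves as the state.
function Base.iterate(r::Restartable)
    ch = Channel(r.f)
    return iterate(r, ch)
end

# Advancing consumes one value from this iteration's own channel.
function Base.iterate(r::Restartable, ch::Channel)
    item = iterate(ch)
    return item === nothing ? nothing : (item[1], ch)
end
Base.IteratorSize(::Type{<:Restartable}) = Base.SizeUnknown()

r = Restartable() do ch
    for i in 1:4
        put!(ch, i)
    end
end

# Nested loops no longer interfere with each other.
for i in r
    for j in r
        print(j)
    end
end
println()                                     # the line printed is 1234123412341234

# Taking twice starts over instead of continuing where the first call stopped.
println(collect(Iterators.take(r, 2)))        # prints Any[1, 2]
println(collect(Iterators.take(r, 2)))        # prints Any[1, 2]
```

One design cost of this sketch: an iteration that is abandoned early, as with `Iterators.take`, leaves its producer blocked on `put!` until it is garbage collected.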


Having said all this, there is a philosophical question of whether the way a Task behaves with the start, next, and done functions matters at all, given that these functions are considered "an informal interface": they are supposed to be "under the hood" functions, not intended to be called manually.

Therefore, as long as they do their job and return the expected iteration values, you shouldn't care too much about how they do it under the hood, even if technically they don't quite follow the 'spec' while doing so, since you were never supposed to be calling them manually in the first place.

Dan Getz

How about the following (uses fib defined in OP):

type NewTask
  t::Task
end

import Base: start,done,next,iteratorsize,iteratoreltype

start(t::NewTask) = istaskdone(t.t) ? nothing : consume(t.t)
next(t::NewTask,state) = (state==nothing || istaskdone(t.t)) ?
  (state,nothing) : (state,consume(t.t))
done(t::NewTask,state) = state==nothing
iteratorsize(::Type{NewTask}) = Base.SizeUnknown()
iteratoreltype(::Type{NewTask}) = Base.EltypeUnknown()

function fib()
    Task() do
        prev_prev = 0
        prev = 1
        produce(prev)
        while true
            cur = prev_prev + prev
            produce(cur)
            prev_prev = prev
            prev = cur
        end
    end
end
nt = NewTask(fib())
take(nt,10)|>collect
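
On Julia 1.x, where `produce` and `consume` are gone, a `Channel` built from a function is already iterable, so the same lazy Fibonacci sequence can be sketched without a wrapper type. This is an assumption-laden port rather than part of the answer above; `fib_channel` is a hypothetical name.

```julia
# Lazy Fibonacci producer as a Channel (Julia 1.x sketch).
function fib_channel()
    Channel() do ch
        prev_prev = 0
        prev = 1
        put!(ch, prev)
        while true
            cur = prev_prev + prev
            put!(ch, cur)            # blocks until a consumer asks for the next value
            prev_prev = prev
            prev = cur
        end
    end
end

first_ten = collect(Iterators.take(fib_channel(), 10))
println(first_ten)   # prints Any[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```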

This is a good question, and is possibly better suited to the Julia list (now on the Discourse platform). In any case, using the NewTask type defined above, an improved answer to a recent StackOverflow question is possible. See: https://stackoverflow.com/a/41068765/3580870
