F#: How do i split up a sequence into a sequence of sequences

后端 未结 8 1979
执念已碎
执念已碎 2021-01-02 06:12

Background:

I have a sequence of contiguous, time-stamped data. The data-sequence has gaps in it where the data is not contiguous. I want create a

相关标签:
8条回答
  • 2021-01-02 06:13

    A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:

    type TimeStamp = Integer -- ticks
    type TimeSpan  = Integer -- difference between TimeStamps
    
    groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
    

    There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in the Prelude:

    The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,

    group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
    

    It is a special case of groupBy, which allows the programmer to supply their own equality test.

    It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:

    groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
    

    So let's write it!

    groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
    groupBy1 _    []            = [[]]
    groupBy1 _    [x]           = [[x]]
    groupBy1 comp (x : xs@(y : _))
      | comp x y  = (x : firstGroup) : otherGroups
      | otherwise = [x] : groups
      where groups@(firstGroup : otherGroups) = groupBy1 comp xs
    

    UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.

    UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList (not to seq, because the generated data is cached (and garbage collected, of course, if you no longer hold a reference to it)).

    0 讨论(0)
  • 2021-01-02 06:17

    You seem to want a function that has signature

    (`a -> bool) -> seq<'a> -> seq<seq<'a>>
    

    I.e. a function and a sequence, then break up the input sequence into a sequence of sequences based on the result of the function.

    Caching the values into a collection that implements IEnumerable would likely be simplest (albeit not exactly purist, but avoiding iterating the input multiple times. It will lose much of the laziness of the input):

    let groupBy (fun: 'a -> bool) (input: seq) =
      seq {
        let cache = ref (new System.Collections.Generic.List())
        for e in input do
          (!cache).Add(e)
          if not (fun e) then
            yield !cache
            cache := new System.Collections.Generic.List()
        if cache.Length > 0 then
         yield !cache
      }
    

    An alternative implementation could pass cache collection (as seq<'a>) to the function so it can see multiple elements to chose the break points.

    0 讨论(0)
  • 2021-01-02 06:23

    I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.

    I expect there is a better way, but I'll have to try again later.

    let N = 20
    let data =  // produce some arbitrary data with holes
        seq {
            for x in 1..N do
                if x % 4 <> 0 && x % 7 <> 0 then
                    printfn "producing %d" x
                    yield x
        }
    
    let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> = 
        LazyList.delayed (fun () ->
        match input with
        | LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
        | LazyList.Cons(x,LazyList.Nil) -> 
            LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
        | LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
            let groups = GroupBy comp xs
            if comp x y then
                LazyList.consf 
                    (LazyList.consf x (fun () -> 
                        let (LazyList.Cons(firstGroup,_)) = groups
                        firstGroup)) 
                    (fun () -> 
                        let (LazyList.Cons(_,otherGroups)) = groups
                        otherGroups)
            else
                LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
    
    let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
    printfn "Consuming..."
    for group in result do
        printfn "about to do a group"
        for x in group do
            printfn "  %d" x
    
    0 讨论(0)
  • 2021-01-02 06:23

    Below is some code that does what I think you want. It is not idiomatic F#.

    (It may be similar to Brian's answer, though I can't tell because I'm not familiar with the LazyList semantics.)

    But it doesn't exactly match your test specification: Seq.length enumerates its entire input. Your "test code" calls Seq.length and then calls Seq.hd. That will generate an enumerator twice, and since there is no caching, things get messed up. I'm not sure if there is any clean way to allow multiple enumerators without caching. Frankly, seq<seq<'a>> may not be the best data structure for this problem.

    Anyway, here's the code:

    type State<'a> = Unstarted | InnerOkay of 'a | NeedNewInner of 'a | Finished
    
    // f() = true means the neighbors should be kept together
    // f() = false means they should be split
    let split_up (f : 'a -> 'a -> bool) (input : seq<'a>) =
        // simple unfold that assumes f captured a mutable variable
        let iter f = Seq.unfold (fun _ -> 
            match f() with
            | Some(x) -> Some(x,())
            | None -> None) ()
    
        seq {
            let state = ref (Unstarted)
            use ie = input.GetEnumerator()
    
            let innerMoveNext() = 
                match !state with
                | Unstarted -> 
                    if ie.MoveNext()
                    then let cur = ie.Current
                         state := InnerOkay(cur); Some(cur)
                    else state := Finished; None 
                | InnerOkay(last) ->
                    if ie.MoveNext()
                    then let cur = ie.Current
                         if f last cur
                         then state := InnerOkay(cur); Some(cur)
                         else state := NeedNewInner(cur); None
                    else state := Finished; None
                | NeedNewInner(last) -> state := InnerOkay(last); Some(last)
                | Finished -> None 
    
            let outerMoveNext() =
                match !state with
                | Unstarted | NeedNewInner(_) -> Some(iter innerMoveNext)
                | InnerOkay(_) -> failwith "Move to next inner seq when current is active: undefined behavior."
                | Finished -> None
    
            yield! iter outerMoveNext }
    
    
    open System
    
    let groupContigs (contigTime : TimeSpan) (holey : seq<DateTime * int>) =
        split_up (fun (t1,_) (t2,_) -> (t2 - t1) <= contigTime) holey
    
    
    // Test data
    let numbers = {1 .. 15}
    let contiguousTimeStamps = 
        let baseTime = DateTime.Now
        seq { for n in numbers -> baseTime.AddMinutes(float n)}
    
    let holeyData = 
        Seq.zip contiguousTimeStamps numbers 
            |> Seq.filter (fun (dateTime, num) -> num % 7 <> 0)
    
    let grouped_data = groupContigs (new TimeSpan(0,1,0)) holeyData
    
    
    printfn "Consuming..."
    for group in grouped_data do
        printfn "about to do a group"
        for x in group do
            printfn "  %A" x
    
    0 讨论(0)
  • 2021-01-02 06:24

    I think this does what you want

    dataWithOccationalHoles 
    |> Seq.pairwise
    |> Seq.map(fun ((time1,elem1),(time2,elem2)) -> if time2-time1 = timeBetweenContiguousValues then 0, ((time1,elem1),(time2,elem2)) else 1, ((time1,elem1),(time2,elem2)) )
    |> Seq.scan(fun (indexres,(t1,e1),(t2,e2)) (index,((time1,elem1),(time2,elem2))) ->  (index+indexres,(time1,elem1),(time2,elem2))  ) (0,(baseTime,-1.0),(baseTime,-1.0))
    |> Seq.map( fun (index,(time1,elem1),(time2,elem2)) -> index,(time2,elem2) )
    |> Seq.filter( fun (_,(_,elem)) -> elem <> -1.0)
    |> PSeq.groupBy(fst)
    |> Seq.map(snd>>Seq.map(snd))
    

    Thanks for asking this cool question

    0 讨论(0)
  • 2021-01-02 06:30

    (EDIT: This suffers from a similar problem to Brian's solution, in that iterating the outer sequence without iterating over each inner sequence will mess things up badly!)

    Here's a solution that nests sequence expressions. The imperitave nature of .NET's IEnumerable<T> is pretty apparent here, which makes it a bit harder to write idiomatic F# code for this problem, but hopefully it's still clear what's going on.

    let groupBy cmp (sq:seq<_>) =
      let en = sq.GetEnumerator()
      let rec partitions (first:option<_>) =
        seq {
          match first with
          | Some first' ->             //'
            (* The following value is always overwritten; 
               it represents the first element of the next subsequence to output, if any *)
            let next = ref None
    
            (* This function generates a subsequence to output, 
               setting next appropriately as it goes *)
            let rec iter item = 
              seq {
                yield item
                if (en.MoveNext()) then
                  let curr = en.Current
                  if (cmp item curr) then
                    yield! iter curr
                  else // consumed one too many - pass it on as the start of the next sequence
                    next := Some curr
                else
                  next := None
              }
            yield iter first' (* ' generate the first sequence *)
            yield! partitions !next (* recursively generate all remaining sequences *)
          | None -> () // return an empty sequence if there are no more values
        }
      let first = if en.MoveNext() then Some en.Current else None
      partitions first
    
    let groupContiguousDataPoints (time:TimeSpan) : (seq<DateTime*_> -> _) = 
      groupBy (fun (t,_) (t',_) -> t' - t <= time)
    
    0 讨论(0)
提交回复
热议问题