Background:
I have a sequence of contiguous, time-stamped data.
The data-sequence has holes in it, some large, others just a single missing value.
Whenever the hole is j
Any time you break apart a seq using Seq.hd
and Seq.skip 1
you are almost surely falling into the trap of going O(N^2). IEnumerable<T>
is an awful type for recursive algorithms (including e.g. Seq.unfold
), since these algorithms almost always have the structure of 'first element' and 'remainder of elements', and there is no efficient way to create a new IEnumerable
that represents the 'remainder of elements'. (IEnumerator<T>
is workable, but its API programming model is not so fun/easy to work with.)
If you need the original data to 'stay lazy', then you should use a LazyList (in the F# PowerPack). If you don't need the laziness, then you should use a concrete data type like 'list', which you can 'tail' into in O(1).
(You should also check out Avoiding stack overflow (with F# infinite sequences of sequences) as an FYI, though it's only tangentially applicable to this problem.)
Seq.skip constructs a new sequence. I think that is why your original approach is slow.
My first inclination is to use a sequence expression and Seq.pairwise. This is fast and easy to read.
let insertDummyValuesWhereASingleValueIsMissingSeq (timeBetweenContiguousValues : TimeSpan) (values : seq<(DateTime * float)>) =
let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal
seq {
yield Seq.hd values
for ((prevTime, _), ((currentTime, _) as next)) in Seq.pairwise values do
let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime)
if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then
let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) // 42 is chosen here for obvious reasons, making this comment superfluous
yield dummyValue
yield next
}