This weekend I decided to try my hand at some Scala and Clojure. I'm proficient with object-oriented programming, and so Scala was easy to pick up as a language, but wante
I was (surprised and) disappointed by the performance of what seemed to me the most idiomatic Clojure solutions, @JamesCunningham's lazy-seq solutions.
(def integers (iterate inc 0))
(def coll (take 10000 integers))
(def n 1000)
(time (doall (moving-average-james-1 coll n)))
;; "Elapsed time: 3022.862 msecs"
(time (doall (moving-average-james-2 coll n)))
;; "Elapsed time: 3433.988 msecs"
So here's a combination of James' solution with @DanielC.Sobral's idea of adapting fast-exponentiation to moving sums:
(defn moving-average
  [coll n]
  (letfn [(moving-sum [coll n]
            (lazy-seq
              (cond
                (= n 1)  coll
                (= n 2)  (map + coll (rest coll))
                (odd? n) (map + coll (moving-sum (rest coll) (dec n)))
                :else    (let [half (quot n 2)
                               hcol (moving-sum coll half)]
                           (map + hcol (drop half hcol))))))]
    (cond
      (< n 1) nil
      (= n 1) coll
      :else   (map #(/ % n) (moving-sum coll n)))))
(time (doall (moving-average coll n)))
;; "Elapsed time: 42.034 msecs"
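For readers who don't read Clojure, here is a rough Python sketch of the same halving trick. It is only an illustration under my own assumptions (eager lists instead of lazy seqs, and the names `moving_sum` and `moving_average` are mine): a window sum of width n is built from two window sums of width n/2, so the number of passes over the data grows with log n rather than n.

```python
def moving_sum(xs, n):
    """Sums of every run of n consecutive items of the list xs (n >= 1)."""
    if n < 1:
        raise ValueError("window size must be at least 1")
    if n == 1:
        return list(xs)
    if n % 2 == 1:
        # Odd width: one element plus the (n - 1)-wide sum starting right after it.
        tail = moving_sum(xs[1:], n - 1)
        return [x + t for x, t in zip(xs, tail)]
    # Even width: two adjacent half-width sums.
    half = n // 2
    hs = moving_sum(xs, half)
    return [a + b for a, b in zip(hs, hs[half:])]


def moving_average(xs, n):
    return [s / n for s in moving_sum(xs, n)]


print(moving_average([2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0], 4))
# [4.75, 5.0, 6.0, 7.25, 8.0, 8.25, 6.5]
```

Unlike the Clojure version this copies lists at every level, so it shows the idea rather than the performance.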
Edit: this one, based on @mikera's solution, is even faster.
(defn moving-average
  [coll n]
  (cond
    (< n 1) nil
    (= n 1) coll
    :else   (let [sums (reductions + 0 coll)]
              (map #(/ (- %1 %2) n) (drop n sums) sums))))
(time (doall (moving-average coll n)))
;; "Elapsed time: 9.184 msecs"
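The running-sums version translates almost word for word into Python. This is a sketch under my own naming (`moving_average` taking a list), with `itertools.accumulate` playing the role of `reductions`: each window sum is the difference of two prefix sums that are n positions apart.

```python
from itertools import accumulate


def moving_average(xs, n):
    """Moving average via prefix sums: window sum = sums[i + n] - sums[i]."""
    if n < 1:
        return []
    sums = [0] + list(accumulate(xs))  # sums[i] is the total of the first i items
    return [(hi - lo) / n for hi, lo in zip(sums[n:], sums)]


print(moving_average([2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0], 4))
# [4.75, 5.0, 6.0, 7.25, 8.0, 8.25, 6.5]
```

One caveat worth keeping in mind: with floating-point data, subtracting large prefix sums can lose precision on very long inputs, which the per-window approaches avoid.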
Here's a partially point-free one-line Haskell solution:
ma p = reverse . map ((/ (fromIntegral p)) . sum . take p) . (drop p) . reverse . tails
First it applies tails to the list to get the "tails" lists, so:
Prelude List> tails [2.0, 4.0, 7.0, 6.0, 3.0]
[[2.0,4.0,7.0,6.0,3.0],[4.0,7.0,6.0,3.0],[7.0,6.0,3.0],[6.0,3.0],[3.0],[]]
Reverses it and drops the first 'p' entries (taking p as 2 here):
Prelude List> (drop 2 . reverse . tails) [2.0, 4.0, 7.0, 6.0, 3.0]
[[6.0,3.0],[7.0,6.0,3.0],[4.0,7.0,6.0,3.0],[2.0,4.0,7.0,6.0,3.0]]
In case you aren't familiar with the (.) dot symbol, it is the operator for 'function composition', meaning it passes the output of one function as the input of another, "composing" them into a single function. (g . f) means "run f on a value then pass the output to g", so ((g . f) x) is the same as (g (f x)). Used well, it leads to a clearer programming style.
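If the ordering is hard to keep straight, a tiny Python sketch may help. The helper `compose` and the two toy functions are my own, purely for illustration: the function written on the right runs first.

```python
def compose(f, g):
    """Return the composition of f and g: the function x -> f(g(x))."""
    return lambda x: f(g(x))


def double(x):
    return x * 2


def increment(x):
    return x + 1


double_after_increment = compose(double, increment)  # like (double . increment)
increment_after_double = compose(increment, double)  # like (increment . double)

print(double_after_increment(3))  # 8
print(increment_after_double(3))  # 7
```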
It then maps the function ((/ (fromIntegral p)) . sum . take p) onto the list. So for every list in the list it takes the first 'p' elements, sums them, then divides the sum by 'p'. Then we just flip the list back again with "reverse".
Prelude List> map ((/ (fromIntegral 2)) . sum . take 2) [[6.0,3.0],[7.0,6.0,3.0],[4.0,7.0,6.0,3.0],[2.0,4.0,7.0,6.0,3.0]]
[4.5,6.5,5.5,3.0]
This all looks a lot more inefficient than it is. Because Haskell is lazy, "reverse" does no work until its result is demanded, although it then has to walk the whole list once. "tails" also doesn't create all those separate lists; each tail just references a suffix of the original list. The real cost is that every window of 'p' elements is summed from scratch, so the work grows with the list length times 'p'. It's still not a great solution, but it is one line long :)
Here's a slightly nicer but longer solution that uses mapAccumL to do a sliding subtraction and addition:
ma p l = snd $ mapAccumL ma' a l'
  where
    (h, t) = splitAt p l
    a = sum h
    l' = (0, 0) : (zip l t)
    ma' s (x, y) = let s' = (s - x) + y in (s', s' / (fromIntegral p))
First we split the list into two parts at "p", so:
Prelude List> splitAt 2 [2.0, 4.0, 7.0, 6.0, 3.0]
([2.0,4.0],[7.0,6.0,3.0])
Sum the first bit:
Prelude List> sum [2.0, 4.0]
6.0
Zip the second bit with the original list (this just pairs off items in order from the two lists). The original list is obviously longer, but we lose this extra bit:
Prelude List> zip [2.0, 4.0, 7.0, 6.0, 3.0] [7.0,6.0,3.0]
[(2.0,7.0),(4.0,6.0),(7.0,3.0)]
Now we define a function for our mapAccum(ulator). mapAccumL is the same as "map", but with an extra running state/accumulator parameter, which is passed from the previous "mapping" to the next one as map runs through the list. We use the accumulator as our moving sum, and as our list is formed of pairs of the element that has just left the sliding window and the element that has just entered it (the list we just zipped), our sliding function takes the first number 'x' away from the sum and adds the second number 'y'. The extra (0, 0) pair at the front leaves the initial sum unchanged, so the average of the first window is emitted too. We then pass the new sum s' along and return s' divided by 'p'. "snd" (second) just takes the second member of a pair (tuple); mapAccumL returns the final accumulator as well as the mapped list, and we only want the list.
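Python has no built-in mapAccumL, so the sketch below defines a small stand-in (`map_accum_l` is my own hypothetical helper, not a library function) and then mirrors the Haskell `ma` step by step. It is meant only to make the data flow concrete.

```python
def map_accum_l(step, acc, items):
    """Like map, but threads an accumulator through; returns (final_acc, outputs)."""
    outputs = []
    for item in items:
        acc, out = step(acc, item)
        outputs.append(out)
    return acc, outputs


def ma(p, xs):
    head, tail = xs[:p], xs[p:]
    # (leaving, entering) pairs; the leading (0, 0) emits the first window unchanged.
    pairs = [(0, 0)] + list(zip(xs, tail))

    def step(window_sum, pair):
        leaving, entering = pair
        new_sum = window_sum - leaving + entering
        return new_sum, new_sum / p

    return map_accum_l(step, sum(head), pairs)[1]


print(ma(4, [2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0]))
# [4.75, 5.0, 6.0, 7.25, 8.0, 8.25, 6.5]
```

Each element is touched a constant number of times, so the cost no longer depends on the window size.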
For those of you not familiar with the $ symbol, it is the "application operator". It doesn't really do anything, but it has "low, right-associative binding precedence", so it means you can leave out the brackets (take note LISPers), i.e. (f x) is the same as f $ x, and f (g x) can be written f $ g x
Running (ma 4 [2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0]) yields [4.75, 5.0, 6.0, 7.25, 8.0, 8.25, 6.5] for either solution.
Oh and you'll need to import the module "Data.List" (plain "List" in Haskell 98) to compile either solution.
I know how I would do it in Python (note: the first 3 elements with the values 0.0 are not returned since that is actually not the appropriate way to represent a moving average). I would imagine similar techniques will be feasible in Scala. Here are multiple ways to do it.
data = (2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0)
terms = 4
expected = (4.75, 5.0, 6.0, 7.25, 8.0, 8.25, 6.5)
# Method 1 : Simple. Uses slices
assert expected == \
    tuple((sum(data[i:i+terms])/terms for i in range(len(data)-terms+1)))
# Method 2 : Tracks slots each of terms elements
# Note: slot, and block mean the same thing.
# Block is the internal tracking deque, slot is the final output
from collections import deque
def slots(data, terms):
    block = deque()
    for datum in data:
        block.append(datum)
        if len(block) > terms:
            block.popleft()
        if len(block) == terms:
            yield block
assert expected == \
    tuple(sum(slot)/terms for slot in slots(data, terms))
# Method 3 : Reads value one at a time, computes the sums and throws away read values
# (Python 3: reduce lives in functools, and tuple parameters are unpacked in the body)
from functools import reduce
def moving_average(acc, val):
    avgs, sums = acc
    sums = tuple((s + val) for s in sums)
    return (avgs + ((sums[0] / terms),), sums[1:] + (val,))
assert expected == reduce(
    moving_average,
    tuple(data[terms-1:]),
    ((),tuple(sum(data[i:terms-1]) for i in range(terms-1))))[0]
# Method 4 : Semantically same as method 3, intentionally obfuscates just to fit in a lambda
assert expected == \
    reduce(
        lambda acc, val: tuple((acc[0] + ((nsum[0] / terms),), nsum[1:] + (val,))
            for nsum in (tuple((s + val) for s in acc[1]),))[0],
        tuple(data[terms-1:]),
        ((),tuple(sum(data[i:terms-1]) for i in range(terms-1))))[0]
Using Haskell:
import Data.List (tails)
import Data.Maybe (catMaybes)
movingAverage :: Int -> [Double] -> [Double]
movingAverage n xs = catMaybes . fmap (avg . take n) . tails $ xs
  where avg list = case length list == n of
          True -> Just . (/ fromIntegral n) . foldl (+) 0 $ list
          _    -> Nothing
The key is the tails function, which maps a list to a list of copies of the original list, with the property that the n-th element of the result is missing the first n-1 elements.
So
[1,2,3,4,5] -> [[1,2,3,4,5], [2,3,4,5], [3,4,5], [4,5], [5], []]
We apply fmap (avg . take n) to the result, which means we take the n-length prefix from the sublist, and compute its avg. If the length of the list we are avg'ing is not n, then we do not compute the average (since it is undefined). In that case, we return Nothing. If it is, we do, and wrap it in "Just". Finally, we run "catMaybes" on the result of fmap (avg . take n), to get rid of the Maybe type.
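The same pipeline can be sketched in Python, with None standing in for Nothing and a list comprehension filter standing in for catMaybes. The names `tails`, `avg` and `moving_average` below are my own, and note one difference: Python slices copy, whereas Haskell's tails shares the original list.

```python
def tails(xs):
    """All suffixes of xs, longest first, ending with the empty list."""
    return [xs[i:] for i in range(len(xs) + 1)]


def moving_average(n, xs):
    def avg(window):
        # Only a full window has a defined average; shorter ones give None.
        return sum(window) / n if len(window) == n else None

    candidates = (avg(t[:n]) for t in tails(xs))
    return [a for a in candidates if a is not None]  # the catMaybes step


print(tails([1, 2, 3]))
# [[1, 2, 3], [2, 3], [3], []]
print(moving_average(4, [2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0]))
# [4.75, 5.0, 6.0, 7.25, 8.0, 8.25, 6.5]
```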
In Haskell pseudocode:
group4 (a:b:c:d:xs) = [a,b,c,d] : group4 (b:c:d:xs)
group4 _ = []
avg4 xs = sum xs / 4
running4avg nums = (map avg4 (group4 nums))
or pointfree
running4avg = map avg4 . group4
(Now one really should abstract the 4 out ....)
This solution is in Haskell, which is more familiar to me:
slidingSums :: Num t => Int -> [t] -> [t]
slidingSums n list = case (splitAt (n - 1) list) of
    (window, []) -> [] -- list contains fewer than n elements
    (window, rest) -> slidingSums' list rest (sum window)
  where
    slidingSums' _ [] _ = []
    slidingSums' (hl : tl) (hr : tr) sumLastNm1 = sumLastN : slidingSums' tl tr (sumLastN - hl)
      where sumLastN = sumLastNm1 + hr
movingAverage :: Fractional t => Int -> [t] -> [t]
movingAverage n list = map (/ (fromIntegral n)) (slidingSums n list)
paddedMovingAverage :: Fractional t => Int -> [t] -> [t]
paddedMovingAverage n list = replicate (n - 1) 0 ++ movingAverage n list
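For comparison, here is how the sliding-sum idea might look as Python generators. This is my own sketch rather than a line-by-line port: it keeps the current window in a deque so it works on any iterable, and the names `sliding_sums` and `padded_moving_average` simply echo the Haskell ones.

```python
from collections import deque


def sliding_sums(n, items):
    """Yield the sum of each run of n consecutive items, in constant time per item."""
    window = deque()
    total = 0
    for x in items:
        window.append(x)
        total += x
        if len(window) > n:
            total -= window.popleft()  # drop the element that left the window
        if len(window) == n:
            yield total


def padded_moving_average(n, items):
    """Moving average preceded by n - 1 zeros, so output lines up with input."""
    yield from [0.0] * (n - 1)
    for s in sliding_sums(n, items):
        yield s / n


print(list(padded_moving_average(4, [2.0, 4.0, 7.0, 6.0, 3.0, 8.0, 12.0, 9.0, 4.0, 1.0])))
# [0.0, 0.0, 0.0, 4.75, 5.0, 6.0, 7.25, 8.0, 8.25, 6.5]
```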
Scala translation:
def slidingSums1(list: List[Double], rest: List[Double], n: Int, sumLastNm1: Double): List[Double] = rest match {
  case Nil => Nil
  case hr :: tr => {
    val sumLastN = sumLastNm1 + hr
    sumLastN :: slidingSums1(list.tail, tr, n, sumLastN - list.head)
  }
}
def slidingSums(list: List[Double], n: Int): List[Double] = list.splitAt(n - 1) match {
  case (_, Nil) => Nil
  // .sum returns 0.0 for an empty prefix (n = 1), where reduceLeft would throw
  case (firstNm1, rest) => slidingSums1(list, rest, n, firstNm1.sum)
}
def movingAverage(list: List[Double], n: Int): List[Double] = slidingSums(list, n).map(_ / n)
// List.make was removed from the standard library; List.fill is its replacement
def paddedMovingAverage(list: List[Double], n: Int): List[Double] = List.fill(n - 1)(0.0) ++ movingAverage(list, n)