With help from two very smart colleagues, I've written a dataflow-optimization library in both Objective Caml and Haskell. The Haskell version is a bit more polymorphic and does more type checking at compile time, so it needs less checking at run time. The OCaml version uses mutable state to accumulate dataflow facts, which might be faster or slower this week, depending on the phase of the moon. The key fact is that in their intended applications, both libraries are so fast that they are not worth fooling with. That is, in the respective compilers (Quick C-- and GHC), so little time is spent in dataflow optimization that the code is not worth improving.
Benchmarking is hell.