I know Haskell a little bit, and I wonder if it's possible to write something like a matrix-matrix product in Haskell that is all of the following:
Like Java, Haskell is not the best language for writing numerical code.
Haskell's code generation for numeric-heavy code is... average. It hasn't had the years of research behind it that the likes of Intel and GCC have.
What Haskell gives you instead is a way to cleanly interface your "fast" code with the rest of your application. Remember that 3% of code is responsible for 97% of your application's running time. [1]
With Haskell, you can call these highly optimized functions in a way that meshes extremely nicely with the rest of your code, via the C Foreign Function Interface (FFI). In fact, if you so desired, you could write your numeric code in your architecture's assembly language and get even more performance! Dipping into C for the performance-heavy parts of your application isn't a bug - it's a feature.
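To make that concrete, here is a minimal sketch of what such a binding looks like. The C symbol dgemm_simple and its signature are hypothetical stand-ins; a real project would bind an actual BLAS routine:

    {-# LANGUAGE ForeignFunctionInterface #-}
    -- Sketch: binding a hypothetical C routine
    --   void dgemm_simple(int n, const double *a, const double *b, double *c);
    -- The name and signature are placeholders for whatever optimized
    -- routine (e.g. a real BLAS dgemm) you actually link against.
    module MatMulFFI (cMatMul) where

    import Foreign.C.Types (CDouble, CInt)
    import Foreign.Ptr (Ptr)

    foreign import ccall unsafe "dgemm_simple"
      cMatMul :: CInt          -- n: the matrices are n x n
              -> Ptr CDouble   -- A, row-major
              -> Ptr CDouble   -- B, row-major
              -> Ptr CDouble   -- C (output), row-major
              -> IO ()

Everything on the Haskell side sees cMatMul as an ordinary (if impure) function.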
But I digress.
Because these highly optimized functions are isolated behind an interface that looks like any other Haskell function, you can perform high-level optimizations with Haskell's very powerful rewrite rules, which let you write rules such as reverse . reverse == id that automagically simplify complex expressions at compile time [2]. This leads to extremely fast, purely functional, and easy-to-use libraries like Data.Text [3] and Data.Vector [4].
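Here's what one of these rules looks like in practice (an illustrative sketch - whether this exact rule fires depends on how GHC inlines reverse, but the mechanism is exactly what Data.Text and friends use):

    module ReverseRules where

    -- Tell GHC: wherever you see reverse (reverse xs), replace it with xs.
    -- Real libraries use the same pragma to fuse away whole intermediate
    -- data structures at compile time.
    {-# RULES
    "reverse/reverse"  forall xs.  reverse (reverse xs) = xs
      #-}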
By combining high-level and low-level optimization, we end up with a much faster implementation, with each half ("C/asm" and "Haskell") relatively easy to read. The low-level optimization is done in its native tongue (C or assembly), the high-level optimization gets a special DSL (Haskell rewrite rules), and the rest of the code is completely oblivious to both.
In conclusion: yes, Haskell can be faster than Java. But it cheats by going through C for the raw FLOPS. This is much harder to do in Java (and Java's FFI carries much higher overhead), so it's avoided there; in Haskell, it's natural. If your application spends an exorbitant amount of time doing numeric calculations, then maybe instead of looking at Haskell or Java you should look at Fortran. If your application spends a large portion of its time in a tiny piece of performance-sensitive code, then the Haskell FFI is your best bet. If your application doesn't spend any time in numeric code... then use whatever you like. =)
Haskell isn't Fortran (and neither is Java, for that matter).
[1] These numbers were made up, but you get my point.
[2] http://www.cse.unsw.edu.au/~dons/papers/CLS07.html
[3] http://hackage.haskell.org/package/text
[4] http://hackage.haskell.org/package/vector
Now that that's out of the way, to answer your actual question:
No, it's not currently smart to hand-write your matrix multiplications in Haskell. At the moment, REPA's implementation is the canonical way to do this [5]. That implementation partially breaks memory safety (it uses unsafeSlice), but the breakage is isolated to that one function, is actually very safe in practice (just not easily verified by the compiler), and is easy to back out if things go wrong (replace unsafeSlice with slice).
But this is Haskell! The performance characteristics of a function can very rarely be taken in isolation. That can be a bad thing (in the case of space leaks), or a very, very good thing.
Because the matrix multiplication algorithm used is naive, it will perform worse in a raw benchmark. But our code rarely looks like benchmarks.
What if you're a scientist with millions of data points who needs to multiply huge matrices? [7]
For those people, we have mmultP [6]. It performs the same matrix multiplication, but in data-parallel fashion, scheduled across your cores by REPA. Also note that the code is essentially unchanged from the sequential version.
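For a feel of the API, here's a minimal sketch using the repa and repa-algorithms packages (compile with -threaded and run with +RTS -N to spread the work across your cores):

    import Data.Array.Repa (Z(..), (:.)(..), fromListUnboxed, toList)
    import Data.Array.Repa.Algorithms.Matrix (mmultP)

    main :: IO ()
    main = do
      let a = fromListUnboxed (Z :. 2 :. 2) [1, 2, 3, 4 :: Double]
          b = fromListUnboxed (Z :. 2 :. 2) [5, 6, 7, 8 :: Double]
      c <- mmultP a b          -- evaluated in parallel
      print (toList c)         -- [19.0,22.0,43.0,50.0]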
For those who don't multiply huge matrices, and instead multiply lots of little ones, there tends to be other code interacting with those matrices: perhaps cutting them up into column vectors and finding their dot products, perhaps finding their eigenvalues, perhaps something else entirely. Unlike C, Haskell knows that although you like to solve problems in isolation, the most efficient solution usually isn't found there.
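For instance, here's a hedged sketch of carving a column out of a REPA matrix and taking its dot product with itself; the column is a delayed view, so it's never actually built in memory:

    import Data.Array.Repa as R

    -- slice gives a delayed (unmaterialized) view of column j; the
    -- zipWith/sumAllS pipeline then runs as a single fused pass.
    columnNormSq :: Array U DIM2 Double -> Int -> Double
    columnNormSq m j =
      let col = slice m (Any :. All :. j)
      in  sumAllS (R.zipWith (*) col col)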
Like ByteString, Text, and Vector, REPA arrays are subject to fusion [2]. (You should actually read [2], by the way - it's a very well written paper.) This, combined with aggressive inlining of the relevant code and REPA's highly parallel nature, lets us express high-level mathematical concepts while very advanced optimizations happen behind the scenes.
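As a tiny illustration of what fusion buys you, the two maps below never allocate an intermediate array; computeP materializes the whole pipeline in one parallel pass (a sketch, assuming the repa package):

    import Data.Array.Repa as R

    -- R.map produces delayed arrays, so (+ 1) and (* 2) are fused into
    -- a single traversal when computeP forces the result.
    scaleAndShift :: Monad m => Array U DIM1 Double -> m (Array U DIM1 Double)
    scaleAndShift arr = computeP (R.map (+ 1) (R.map (* 2) arr))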
Although no way is currently known to write an optimally efficient matrix multiplication in a pure functional language, we can come somewhat close (no automatic vectorization, a few excess dereferences to get at the actual data, etc.) - though nothing near what IFORT or GCC can do. But programs don't exist on an island, and making the island as a whole perform well is much, much easier in Haskell than in Java.
[5] http://hackage.haskell.org/packages/archive/repa-algorithms/3.2.1.1/doc/html/src/Data-Array-Repa-Algorithms-Matrix.html#mmultS
[6] http://hackage.haskell.org/packages/archive/repa-algorithms/3.2.1.1/doc/html/src/Data-Array-Repa-Algorithms-Matrix.html#mmultP
[7] Actually, the best way to do this is by using the GPU. There are a few GPU DSLs available for Haskell that make this possible to do natively. They're really neat!