As someone in the world of HPC who came from the world of enterprise web development, I'm always curious to see how developers back in the "real world" are taking advanta
I believe that "Cycles are an engineer's best friend".
My company provides a commercial tool for analyzing and transforming very large software systems in many computer languages. "Large" means 10-30 million lines of code. The tool is the DMS Software Reengineering Toolkit (DMS for short).
Analyses (and even transformations) on such huge systems take a long time: our points-to analyzer for C code takes 90 CPU hours on an x86-64 machine with 16 GB of RAM. Engineers want answers faster than that.
Consequently, we implemented DMS in PARLANSE, a parallel programming language of our own design, intended to harness small-scale multicore shared memory systems.
The key ideas behind PARLANSE are: a) let the programmer expose parallelism, b) let the compiler choose which parts it can realize, c) keep context switching to an absolute minimum. Static partial orders over computations are an easy way to help achieve all three: they are easy to state, their costs are relatively easy to measure, and they are easy for the compiler to schedule. (Writing a parallel quicksort this way is trivial.)
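To make the partial-order idea concrete, here is a minimal sketch in Python rather than PARLANSE (whose syntax is not shown here). The only ordering constraint is that the partition step precedes both sub-sorts; the two sub-sorts are unordered relative to each other, so they may run concurrently. The names `partition` and `parallel_quicksort` and the `depth` cutoff are assumptions made for illustration, and CPython's global interpreter lock means this shows the structure rather than a real speedup.

```python
import threading

def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def parallel_quicksort(a, lo=0, hi=None, depth=3):
    """Sort a[lo..hi] in place. `depth` (hypothetical knob) limits thread spawning."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Step 1: partition must finish before either half is sorted.
    p = partition(a, lo, hi)
    if depth <= 0:
        # Below the cutoff, run sequentially to avoid spawning tiny tasks.
        parallel_quicksort(a, lo, p - 1, 0)
        parallel_quicksort(a, p + 1, hi, 0)
        return
    # Step 2: the halves touch disjoint index ranges and are unordered
    # with respect to each other, so one runs in a new thread.
    left = threading.Thread(target=parallel_quicksort,
                            args=(a, lo, p - 1, depth - 1))
    left.start()
    parallel_quicksort(a, p + 1, hi, depth - 1)
    # Step 3: join, so the caller sees both halves completed.
    left.join()
```

The depth cutoff is one simple way to keep task-creation overhead down; a compiler that knows the partial order statically could make that choice itself.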
Unfortunately, we did this in 1996 :-( The last few years have finally been a vindication; I can now get 8-core machines at Fry's for under $1K and 24-core machines for about the same price as a small car (and prices are likely to drop rapidly).
The good news is that DMS is now fairly mature, and a number of key internal mechanisms in DMS take advantage of this, notably an entire class of analyzers called "attribute grammars", which we write using a domain-specific language which is NOT PARLANSE. DMS compiles these attribute grammars into PARLANSE, and then they are executed in parallel. Our C++ front end uses attribute grammars and is about 100K SLOC; it is compiled into 800K SLOC of parallel PARLANSE code that actually works reliably.
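For readers unfamiliar with why attribute grammars parallelize well, here is a rough Python sketch. It is not DMS's DSL or its generated code. The assumption is a tree with one synthesized attribute: each node's attribute depends only on its children's, so sibling subtrees can be evaluated concurrently and the node's own rule runs once they have all finished. The `Node` class, the `evaluate` function, and the `depth` cutoff are all hypothetical.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    op: str                                  # "num", "+", or "*"
    value: int = 0                           # literal value for "num" leaves
    children: List["Node"] = field(default_factory=list)
    attr: Optional[int] = None               # synthesized attribute, set by evaluate()

def evaluate(node, depth=2):
    """Compute node.attr bottom-up; `depth` (hypothetical) limits thread spawning."""
    if node.op == "num":
        node.attr = node.value
        return
    if depth > 0:
        # Sibling subtrees are independent: evaluate them concurrently.
        threads = [threading.Thread(target=evaluate, args=(c, depth - 1))
                   for c in node.children]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for c in node.children:
            evaluate(c, 0)
    # The node's own rule runs only after every child attribute is known.
    if node.op == "+":
        node.attr = sum(c.attr for c in node.children)
    elif node.op == "*":
        result = 1
        for c in node.children:
            result *= c.attr
        node.attr = result
    else:
        raise ValueError(f"unknown operator: {node.op}")
```

For example, building the tree for `(2 + 3) * (4 + 5)` and calling `evaluate` on the root leaves 45 in the root's `attr`.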
Now (June 2009), we are pretty busy making DMS useful, and don't always have enough time to harness the parallelism well; hence the 90-hour points-to analysis. We are working on parallelizing that, and have reasonable hope of a 10-20x speedup.
We believe that in the long run, harnessing SMP well will make workstations far more friendly to engineers asking hard questions. As well they should.