Yeah, I've been programming with threads, too. But I'm not masochistic enough to love them. It's still way too easy to get cross-talk between threads, no matter how much of a superman you are or how much help you get from coworkers. Threads are easy to do, but very difficult to do correctly, so of course Joe Schmoe gravitates to them. Plus, they're fast! (Which is all that matters, of course.)
On *nix, good old fork() is still a good way to go for many things. The overhead is not too bad (yes, I'll need to measure that to back up my BS some day), particularly if you are forking an interpreter and then generating a bunch of task-specific data in the child process.
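To make the fork() idea concrete, here's a rough sketch in Python (my pick for illustration, *nix only). The names run_in_child and build_squares are made up for this example, and shipping the result back over a pipe with pickle is just one assumed way to do it, not the only one.

```python
import os
import pickle


def run_in_child(task, arg):
    """Fork, run task(arg) in the child, and return its result to the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: build the task-specific data here, report back, then exit.
        os.close(read_fd)
        status = 0
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(task(arg), out)
        except BaseException:
            status = 1
        finally:
            # Skip the parent's cleanup handlers; the OS reclaims the memory.
            os._exit(status)
    # Parent: read until the child closes its end, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0 or not data:
        raise RuntimeError("child process failed")
    return pickle.loads(data)


def build_squares(n):
    # Hypothetical stand-in for "a bunch of task-specific data".
    return [i * i for i in range(n)]
```

Calling run_in_child(build_squares, 5) hands back the list the child built. Whatever garbage the child piled up along the way never touches the parent's heap, and no state is shared after the fork.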
That said, child processes are hideously expensive on Windoze, I'm told. So the Erlang approach is looking pretty good: force Joe Schmoe to write pure functions and use message passing instead of his seemingly-infinite-state automata global (instance) variable whack-fest with bonus thread cross-talk extravaganza.
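For what the message-passing style looks like, here's a minimal sketch, again in Python rather than Erlang, so take it as an imitation of the idea and not how Erlang actually does it. The names word_count, worker and count_all are invented for the example. It even uses threads, but the workers only ever talk through queues and the real work is a pure function.

```python
import queue
import threading


def word_count(line):
    # Pure function: the result depends only on the argument.
    return len(line.split())


def worker(inbox, outbox):
    # The worker touches nothing but its two queues.
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: shut down
            break
        index, line = msg
        outbox.put((index, word_count(line)))


def count_all(lines, n_workers=2):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for item in enumerate(lines):
        inbox.put(item)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Replies arrive in any order; the index puts them back in input order.
    results = [outbox.get() for _ in lines]
    return [count for _, count in sorted(results)]
```

There are no globals and no instance variables shared between the workers, so there is nothing for them to cross-talk over.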
But I'm not bitter :-)
Revision / comment:
Excellent comment elsewhere about distance-to-memory. I had been thinking about this quite a bit recently as well. Mark-and-sweep garbage collection really hurts the "locality" aspect of running processes. Mark-and-sweep GC on zero-wait-state RAM on an old 80286 may have seemed harmless, but it really hurts on multi-level caching architectures. Maybe reference counting + fork/exit isn't such a bad idea as a GC implementation in some cases?
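A quick sketch of what "reference counting + fork/exit" could look like in practice, assuming CPython (which reference-counts objects and runs a separate tracing collector only for cycles). The function names are hypothetical, and this is an illustration of the idea, not a measured recommendation.

```python
import gc
import os


def run_without_tracing_gc(task):
    """Run task() in a forked child with the cycle collector switched off."""
    pid = os.fork()
    if pid == 0:
        # Child: reference counting still frees acyclic garbage right away.
        gc.disable()
        code = 1
        try:
            task()
            code = 0
        except BaseException:
            pass
        finally:
            # Exit is the "collector" for anything left, cycles included.
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def make_cycles():
    # Self-referencing lists that reference counting alone never frees.
    for _ in range(1000):
        node = []
        node.append(node)
```

run_without_tracing_gc(make_cycles) returns the child's exit code. The child never walks its heap looking for garbage, so that source of cache-unfriendly traffic goes away, at the price of leaking cycles until the process exits.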
edit: I put some effort into backing up my talk here (results vary):
http://roboprogs.com/devel/2009.04.html