What techniques promote efficient opcode dispatch to make a fast interpreter? Are there some techniques that only work well on modern hardware and others that don't work we
I found a blog post on threaded interpreter implementation that was useful.
The author describes label-based threading in GCC and also shows how to do the same thing in Visual Studio using inline assembler.
http://abepralle.wordpress.com/2009/01/25/how-not-to-make-a-virtual-machine-label-based-threading/
The results are interesting. He reports a 33% performance improvement when using GCC but, surprisingly, the Visual Studio inline assembly implementation is 3 times slower!
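For anyone who hasn't seen label-based threading before, here is a minimal sketch of the GCC version of the idea. It is not the code from the blog post: the opcode names, the `run` function and the tiny stack machine are all made up for illustration. It relies on the GNU "labels as values" extension (`&&label` and `goto *ptr`), which GCC and Clang accept but Visual C++ does not, which is presumably why the author fell back to inline assembler there.

```c
#include <assert.h>

/* Hypothetical opcodes for a tiny stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Executes bytecode and returns the value left on top of the stack. */
int run(const int *code)
{
    /* GNU extension: a table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    int stack[64];
    int sp = 0;               /* number of values currently on the stack */
    const int *pc = code;     /* points at the next opcode or operand */

/* Each handler ends by jumping straight to the next handler,
   instead of returning to a central switch statement. */
#define NEXT() goto *dispatch[*pc++]

    NEXT();

do_push:
    assert(sp < 64);
    stack[sp++] = *pc++;      /* the operand follows the opcode */
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

Calling `run` on the program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` evaluates (2 + 3) * 4. The point of the technique is that every handler has its own indirect jump rather than sharing the single one at the top of a `switch` loop; whether that actually helps depends on the compiler and CPU, so measure it for your own interpreter.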