I remember reading somewhere that, to really optimize and speed up certain sections of code, programmers write those sections in assembly language. My questions are -
On some embedded devices (phones and PDAs), hand-written assembly is useful because the compilers are not terribly mature and can generate extremely slow or even incorrect code. I have personally had to work around the buggy output of several different compilers for ARM-based embedded platforms, sometimes by writing assembly code to replace it.
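
To give a rough idea of what "writing that section in assembly" tends to look like in practice, here is a minimal sketch, not code from any of the platforms mentioned above. It assumes a GCC- or Clang-style compiler with extended inline assembly, and the function name `sat_add32` is made up for the example. The assembly path is only used on 32-bit ARM targets that advertise the DSP extension (where the `QADD` instruction exists); everywhere else a plain C fallback is compiled, which also gives you something to test the assembly version against.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit signed saturating add.
 * Assumption: GCC/Clang extended asm syntax; __ARM_FEATURE_DSP is the
 * ACLE feature macro indicating that QADD is available. */
static int32_t sat_add32(int32_t a, int32_t b)
{
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    int32_t result;
    /* QADD clamps the sum to the int32_t range in a single instruction. */
    __asm__("qadd %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
    return result;
#else
    /* Portable fallback: add in 64 bits, then clamp to the 32-bit range. */
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
#endif
}
```

Keeping the assembly confined to one small function behind a preprocessor check means the rest of the program stays ordinary C, and the workaround can be dropped if a later compiler version produces good code on its own.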