In Haskell, if I write

    fac n = facRec n 1
      where facRec 0 acc = acc
            facRec n acc = facRec (n-1) (acc*n)

and compile it with GHC
In Haskell, writing your program in a tail-recursive way only helps if your accumulator is strict and you need the whole result.
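The standard library's `foldl`/`foldl'` pair shows the strict-accumulator point concretely. This is a substituted illustration, not the question's `facRec`: both folds are tail-recursive over the list, but only `foldl'` forces the accumulator at each step. The names `facLazy` and `facStrict` are hypothetical.

```haskell
import Data.List (foldl')

-- Lazy accumulator: foldl is tail-recursive, but (without optimisation)
-- each step only builds a thunk like ((1*1)*2)*3 ..., which is evaluated
-- all at once at the end.
facLazy :: Integer -> Integer
facLazy n = foldl (*) 1 [1 .. n]

-- Strict accumulator: foldl' evaluates the accumulator to WHNF at every
-- step, so no chain of thunks builds up.
facStrict :: Integer -> Integer
facStrict n = foldl' (*) 1 [1 .. n]

main :: IO ()
main = do
  print (facStrict 20)
  print (facLazy 20)
```

For small inputs both give the same answer; the difference only shows up as memory or stack use for large `n`, and with `-O` GHC may make the lazy version strict anyway.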
With GHC's `runhaskell`, the program won't be optimised, so no strictness analysis is performed and you may get a stack overflow. If you compile with optimisations (`ghc -O`), the compiler may detect that the accumulator needs to be strict and optimise accordingly.
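If you don't want to rely on the optimiser, you can make the accumulator strict yourself. A minimal sketch, assuming the `BangPatterns` extension (a technique swapped in here, not something the answer above prescribes): the `!acc` pattern forces the accumulator on every call, so the loop behaves the same under `runhaskell` and compiled code.

```haskell
{-# LANGUAGE BangPatterns #-}

-- The accumulator-passing factorial from the question, with a bang
-- pattern on `acc` so it is evaluated at each step even when GHC performs
-- no strictness analysis (e.g. under runhaskell / GHCi).
fac :: Integer -> Integer
fac n = facRec n 1
  where
    facRec :: Integer -> Integer -> Integer
    facRec 0 !acc = acc
    facRec k !acc = facRec (k - 1) (acc * k)

main :: IO ()
main = print (fac 10)
```

Writing ``acc `seq` facRec (k - 1) (acc * k)`` in the second equation achieves the same effect without the extension.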
To see whether (and how) things differ, the best way is to inspect the generated Core language; a good blog post from Don Stewart explains how. Many of his blog posts are interesting if you're interested in performance, by the way.
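One way to get at the Core is to put GHC's dump flags in an `OPTIONS_GHC` pragma, as in the sketch below; the file name `Fac.hs` is an assumption. Compiling it with `ghc Fac.hs` prints the simplified Core to standard output. Removing `-O2` from the pragma and recompiling lets you compare the unoptimised Core and see whether the optimiser made `facRec`'s accumulator strict.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -dsuppress-uniques #-}
-- Hypothetical file Fac.hs: compiling it dumps the optimised Core
-- (-ddump-simpl), with type annotations and unique suffixes suppressed
-- to keep the output readable.
module Main (main) where

-- The question's lazy-accumulator version, unchanged apart from the
-- type signature and renaming the inner `n` to `k`.
fac :: Integer -> Integer
fac n = facRec n 1
  where
    facRec 0 acc = acc
    facRec k acc = facRec (k - 1) (acc * k)

main :: IO ()
main = print (fac 10)
```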