This question is mystifying me for years and considering this site\'s name, this is the place to ask.
Why do we, programmers, still have this StackOverflow
I can't speak for "major languages". Many "minor" languages do heap-allocated activation records, with each call using a chunk of heap space instead of a linear stack chunk. This allows recursion to go as deep as you have address space to allocate.
Some folks here claim that recursion that deep is wrong, and that using a "big linear stack" is just fine. That isn't right. I'd agree that if you have to use the entire address space, you do a problem of some kind. However, when one has very large graph or tree structures, you want to allow deep recursion and you don't want to guess at how much linear stack space you need first, because you'll guess wrong.
If you decide to go parallel, and you have lots (thousand to million of "grains" [think, small threads]) you can't have 10Mb of stack space allocated to each thread, because you'll be wasting gigabytes of RAM. How on earth could you ever have a million grains? Easy: lots of grains that interlock with one another; when a grain is frozen waiting for a lock, you can't get rid of it, and yet you still want to run other grains to use your available CPUs. This maximizes the amount of available work, and thus allows many physical processors to be used effectively.
The PARLANSE parallel programming language uses this very-large-number of parallel grains model, and heap allocation on function calls. We designed PARLANSE to enable the symbolic analysis and transformation of very large source computer programs (say, several million lines of code). These produce... giant abstract syntax trees, giant control/data flow graphs, giant symbol tables, with tens of millions of nodes. Lots of opportunity for parallel workers.
The heap allocation allows PARLANSE programs to be lexically scoped, even across parallelism boundaries, because one can implement "the stack" as a cactus stack, where forks occur in "the stack" for subgrains, and each grain can consequently see the activation records (parent scopes) of its callers. This makes passing big data structures cheap when recursing; you just reference them lexically.
One might think that heap allocation slows down the program. It does; PARLANSE pays about a 5% penalty in performance but gains the ability to process very large structures in parallel, with as many grains as the address space can hold.