Technically, why are processes in Erlang more efficient than OS threads?

放肆的年华 提交于 2019-11-26 15:35:37

There are several contributing factors:

  1. Erlang processes are not OS processes. They are implemented by the Erlang VM using a lightweight cooperative threading model (preemptive at the Erlang level, but under the control of a cooperatively scheduled runtime). This means that it is much cheaper to switch context, because they only switch at known, controlled points and therefore don't have to save the entire CPU state (normal, SSE and FPU registers, address space mapping, etc.).
  2. Erlang processes use dynamically allocated stacks, which start very small and grow as necessary. This permits the spawning of many thousands — even millions — of Erlang processes without sucking up all available RAM.
  3. Erlang used to be single-threaded, meaning that there was no requirement to ensure thread-safety between processes. It now supports SMP, but the interaction between Erlang processes on the same scheduler/core is still very lightweight (there are separate run queues per core).
Jonas

After some more research I found a presentation by Joe Armstrong.

From Erlang - software for a concurrent world (presentation) (at 13 min):

[Erlang] is a concurrent language – by that I mean that threads are part of the programming language, they do not belong to the operating system. That's really what's wrong with programming languages like Java and C++. It's threads aren't in the programming language, threads are something in the operating system – and they inherit all the problems that they have in the operating system. One of the problems is granularity of the memory management system. The memory management in the operating system protects whole pages of memory, so the smallest size that a thread can be is the smallest size of a page. That's actually too big.

If you add more memory to your machine – you have the same number of bits that protects the memory so the granularity of the page tables goes upyou end up using say 64kB for a process you know running in a few hundred bytes.

I think it answers if not all, at least a few of my questions

I've implemented coroutines in assembler, and measured performance.

Switching between coroutines, a.k.a. Erlang processes, takes about 16 instructions and 20 nanoseconds on a modern processor. Also, you often know the process you are switching to (example: a process receiving a message in its queue can be implemented as straight hand-off from the calling process to the receiving process) so the scheduler doesn't come into play, making it an O(1) operation.

To switch OS threads, it takes about 500-1000 nanoseconds, because you're calling down to the kernel. The OS thread scheduler might run in O(log(n)) or O(log(log(n))) time, which will start to be noticeable if you have tens of thousands, or even millions of threads.

Therefore, Erlang processes are faster and scale better because both the fundamental operation of switching is faster, and the scheduler runs less often.

Erlang processes correspond (approximately) to green threads in other languages; there's no OS-enforced separation between the processes. (There may well be language-enforced separation, but that's a lesser protection despite Erlang doing a better job than most.) Because they're so much lighter-weight, they can be used far more extensively.

OS threads on the other hand are able to be simply scheduled on different CPU cores, and are (mostly) able to support independent CPU-bound processing. OS processes are like OS threads, but with much stronger OS-enforced separation. The price of these capabilities is that OS threads and (even more so) processes are more expensive.


Another way to understand the difference is this. Supposing you were going to write an implementation of Erlang on top of the JVM (not a particularly crazy suggestion) then you'd make each Erlang process be an object with some state. You'd then have a pool of Thread instances (typically sized according to the number of cores in your host system; that's a tunable parameter in real Erlang runtimes BTW) which run the Erlang processes. In turn, that will distribute the work that is to be done across the real system resources available. It's a pretty neat way of doing things, but relies utterly on the fact that each individual Erlang process doesn't do very much. That's OK of course; Erlang is structured to not require those individual processes to be heavyweight since it is the overall ensemble of them which execute the program.

In many ways, the real problem is one of terminology. The things that Erlang calls processes (and which correspond strongly to the same concept in CSP, CCS, and particularly the π-calculus) are simply not the same as the things that languages with a C heritage (including C++, Java, C#, and many others) call a process or a thread. There are some similarities (all involve some notion of concurrent execution) but there's definitely no equivalence. So be careful when someone says “process” to you; they might understand it to mean something utterly different…

I think Jonas wanted some numbers on comparing OS threads to Erlang processes. The author of Programming Erlang, Joe Armstrong, a while back tested the scalability of the spawning of Erlang processes to OS threads. He wrote a simple web server in Erlang and tested it against multi-threaded Apache (since Apache uses OS threads). There's an old website with the data dating back to 1998. I've managed only to find that site exactly once. So I can't supply a link. But the information is out there. The main point of the study showed that Apache maxed out just under 8K processes, while his hand written Erlang server handled 10K+ processes.

Because Erlang interpreter has only to worry about itself, the OS has many other things to worry about.

one of the reason is erlang process is created not in the OS, but in the evm(erlang virtual machine), so the cost is smaller.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!