Understanding NodeJS & Non-Blocking IO

Submitted by 我怕爱的太早我们不能终老 on 2019-12-05 10:05:19

Question


So, I've recently been injected with the Node virus which is spreading in the Programming world very fast.

I am fascinated by its "Non-Blocking IO" approach and have indeed tried out a couple of programs myself.

However, I fail to understand certain concepts at the moment.

I need answers in layman's terms (I come from a Java background).

1. Multithreading & Non-Blocking IO.

Let's consider a practical scenario. Say, we have a website where users can register. Below would be the code.

..
..
   // Read HTTP Parameters
   // Do some Database work
   // Do some file work
   // Return a confirmation message
..
..

In a traditional programming language, the above happens in a sequential way. And, if there are multiple requests for registration, the web server creates a new thread and the rest is history. Of course, programmers can create threads of their own to work on Line 2 and Line 3 simultaneously.

In Node, as I understand, Lines 2 & 3 will be run in parallel while the rest of the program gets executed and the Interpreter polls the lines 2 & 3 every 'x' ms.

Now, my question is, if Node is a single threaded language, what does the job of lines 2 & 3 while the rest of the program is being executed?

2. Scalability

I recently read that LinkedIn has adopted Node as a back-end for their mobile apps and has seen massive improvements.

Can anyone explain how it has made such a difference?

3. Adoption in other programming languages

If people claim that Node makes such a difference when it comes to performance, why haven't other programming languages adopted this Non-Blocking IO paradigm?

I'm sure I'm missing something. It would be helpful if you could explain this and point me to some links.

Thanks.


Answer 1:


A similar question was asked and probably contains all the info you're looking for: How the single threaded non blocking IO model works in Node.js

But I'll briefly cover your 3 parts:

1.
Lines 2 and 3 in a very simple form could look like:
      db.query(..., function(query_data) { ... });
      fs.readFile('/path/to/file', function(err, file_data) { ... });

Now function(query_data) and function(err, file_data) are callbacks (Node's fs.readFile passes an error as the first argument and the data as the second). The functions db.query and fs.readFile send the actual I/O requests, but the callbacks allow the processing of the data from the database or the file to be delayed until the responses are received. Node doesn't really "poll lines 2 and 3". The callbacks are registered with an event loop and associated with some file descriptors for their respective I/O events. The event loop then polls those file descriptors to see if they are ready to perform I/O. If they are, it executes the callback functions with the I/O data.

I think the phrase "Everything runs in parallel except your code" sums it up well. For example, something like "Read HTTP parameters" would execute sequentially, but I/O functions like in lines 2 and 3 are associated with callbacks that are added to the event loop and execute later. So basically the whole point is it doesn't have to wait for I/O.
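A minimal sketch of that ordering. It assumes a hypothetical `dbQuery` helper that simulates a database round trip with `setTimeout` (the `db.query` above does not belong to any particular library), while `fs.readFile` is the real Node API. The `log` array and the temporary file name are illustrative only.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for db.query: pretends the database answers after 20 ms.
function dbQuery(sql, callback) {
  setTimeout(function () { callback(null, { rows: [] }); }, 20);
}

// Create a small file up front so the read below has something to find.
const file = path.join(os.tmpdir(), 'register-demo-' + process.pid + '.txt');
fs.writeFileSync(file, 'welcome');

const log = []; // records the order in which each step actually runs

function register(done) {
  let pending = 2; // two I/O operations will be in flight
  function finish(label) {
    log.push(label);
    if (--pending === 0) done(log);
  }

  log.push('1: read HTTP parameters');        // ordinary code, runs right now
  dbQuery('INSERT ...', function () { finish('2: db callback'); });
  fs.readFile(file, 'utf8', function (err, data) {
    finish(err ? '3: file error' : '3: file callback');
  });
  log.push('4: code after the I/O calls');    // runs before either callback
}

register(function (finalLog) {
  fs.unlinkSync(file); // remove the demo file
  console.log(finalLog.join('\n'));
});
```

Steps 1 and 4 are logged first because they are plain synchronous code; the two callbacks are logged afterwards, in whichever order their I/O completes.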

2.
Because of the things explained in 1., Node scales well for I/O intensive requests and allows many users to be connected simultaneously. It is single threaded, so it doesn't necessarily scale well for CPU intensive tasks.
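To illustrate the CPU-intensive caveat, here is a small sketch in which a timer due after 10 ms cannot fire until a synchronous loop releases the only thread. The `busyWait` and `measureTimerDelay` helpers and the 100 ms figure are assumptions made for the demonstration.

```javascript
// Spin the CPU for roughly `ms` milliseconds without yielding to the event loop.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* burn CPU */ }
}

function measureTimerDelay(blockMs, callback) {
  const start = Date.now();
  // Ask for a callback after 10 ms...
  setTimeout(function () {
    callback(Date.now() - start); // ...and report how long it really took.
  }, 10);
  // ...then hog the thread. The timer cannot fire until this returns.
  busyWait(blockMs);
}

measureTimerDelay(100, function (elapsed) {
  console.log('timer asked for 10 ms, fired after about ' + elapsed + ' ms');
});
```

The reported delay is at least as long as the busy loop, because every other callback (including other users' requests) has to wait while the single thread is occupied.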

3.
This paradigm has been used with JavaScript because JavaScript has support for callbacks, event loops and closures that make this easy. This isn't necessarily true in other languages.

I might be a little off, but this is the gist of what's happening.




Answer 2:


Q1. "What does the job of lines 2 & 3 while the rest of the program is being executed?" Answer: "Nothing". Lines 2 and 3 each start their respective jobs, but those jobs cannot be done immediately because (for example) the required disk sectors are not loaded yet. So the operating system issues a request to the disk to fetch those sectors, and then "nothing happens" (Node goes on with its next task) until the disk subsystem later issues an interrupt to report that they are ready, at which point Node returns control to the callbacks registered by lines 2 and 3.
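The same hand-off can be sketched with promises. This is an illustration under assumptions: `saveUser` is a hypothetical database call simulated with a timer, and the "file work" is a small write to the temp directory; only `fs.promises.writeFile` and `fs.promises.unlink` are real Node APIs.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical database call: resolves after a simulated 20 ms round trip.
function saveUser(name) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve({ id: 1, name: name }); }, 20);
  });
}

async function registerUser(name) {
  const file = path.join(os.tmpdir(), 'welcome-' + process.pid + '-' + name + '.txt');
  // Start both jobs; neither call blocks, so both are in flight at once.
  const dbJob = saveUser(name);
  const fileJob = fs.promises.writeFile(file, 'Welcome, ' + name);
  // await suspends only this function; the thread is free for other events.
  const [user] = await Promise.all([dbJob, fileJob]);
  await fs.promises.unlink(file); // clean up the demo file
  return 'Registered ' + user.name + ' with id ' + user.id;
}

registerUser('alice').then(console.log);
```

While `registerUser` is suspended at `await`, no JavaScript runs on its behalf; the function resumes only once both jobs have been reported complete.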

Q2. Single-threaded non-blocking I/O dedicates almost no resources to each incoming connection (just some housekeeping data about the connected socket), so it is very memory efficient. Many traditional web servers "fork" a whole new process (or spawn a thread) to handle each new connection. That means making a humongous copy of every bit of code and data needed, and time-slicing the CPU to deal with it all, which is massively wasteful of resources. So if your load is a lot of idle connections waiting for stuff, as LinkedIn's was, Node makes a lot more sense.
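A rough sketch of the "many idle connections" case, with the waiting simulated by timers rather than real sockets. The `handleConnection` and `serveAll` names, the 50 ms wait, and the count of 1000 are all illustrative assumptions.

```javascript
// Hypothetical "connection" that spends 50 ms waiting on I/O (simulated with a timer).
function handleConnection(id) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(id); }, 50);
  });
}

async function serveAll(count) {
  const start = Date.now();
  const waiting = [];
  for (let id = 0; id < count; id++) {
    waiting.push(handleConnection(id)); // start the wait, do not block on it
  }
  const done = await Promise.all(waiting); // one thread, all waits overlap
  return { served: done.length, elapsedMs: Date.now() - start };
}

serveAll(1000).then(function (r) {
  console.log(r.served + ' connections served in about ' + r.elapsedMs + ' ms');
});
```

Handled one at a time with blocking waits, 1000 such connections would take about 50 seconds. Here the waits overlap, and each pending connection costs only a small object and a timer entry rather than a process or thread.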

Q3. Almost every programming language already has non-blocking I/O if you want to use it. Node is not a programming language; it's a web server that runs JavaScript and uses non-blocking I/O (e.g. I personally wrote my own identical thing 10 years ago in Perl, as did Google (in C) when they started, and I'm sure loads of other people have similar web servers too). The non-blocking I/O is not the hard part; getting the programmer to understand how to use it is the tricky bit. JavaScript happens to work well for that, because those programmers are already familiar with event programming.




Answer 3:


Even though node.js has been around for a few years, its performance model is still a bit mysterious.

I recently started a blog and decided that the node.js model would be a good first topic since I wanted to understand it better myself and it would be helpful to others to share what I learned. Here are a couple of articles I wrote that explain the high level concepts and some tradeoffs:

Blocking vs. Non-Blocking I/O – What’s going on?

Understanding node.js Performance



Source: https://stackoverflow.com/questions/18040366/understanding-nodejs-non-blocking-io
