Question
I've got a fairly complicated NodeJS application which manages various operations on a server. It listens for messages on a PubNub channel, and makes several requests to 3rd-party APIs. It also has a timer (using setTimeout) which runs every second to check that all relevant processes are still running and healthy.
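The health check itself is roughly a self-rescheduling setTimeout along these lines (checkManagedProcesses is a placeholder here, not the real check):

function checkManagedProcesses() {
    // placeholder for the real checks that the relevant processes are alive and healthy
}

function scheduleHealthCheck() {
    setTimeout(function() {
        checkManagedProcesses();
        scheduleHealthCheck(); // re-arm so the check runs every second
    }, 1000);
}
scheduleHealthCheck();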
Recently I've noticed some strange behaviour. Under certain circumstances (specific, but 100% reproducible) it goes into a strange state where all network I/O stops working. PubNub updates stop being received, and any HTTP requests made (using the request-promise-native library) will timeout. This state appears to be permanent once the application has entered it, as I have tried running requests with very long timeouts and they just never resolve.
I've been reading online about how it's possible to starve IO by stalling the Event Loop, often using long-running operations on the main thread, process.nextTick or bad usage of promises. I haven't got any of these in my application.
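For reference, this is the kind of pattern people describe (a standalone illustration, not code from my application): recursive process.nextTick keeps the event loop from ever reaching its poll or timer phases, starving everything.

// Standalone illustration of event-loop starvation via recursive process.nextTick.
// The nextTick queue never drains, so the loop never reaches the poll (IO) or
// timer phases: network callbacks AND timers all stop firing.
setTimeout(function() { console.log('never printed'); }, 100);
(function spin() { process.nextTick(spin); })();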
But even weirder, I've read that if you block the Event Loop, all timers also stop working. I use setTimeout, setInterval and setImmediate within the app and they all continue working even when the application has entered this state. So my Event Loop is obviously still running.
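As a quick sanity check of that claim (again a standalone experiment, not application code), a synchronous busy-wait really does stop timers from firing, which is why I take the still-working timers to mean the loop itself is alive:

// Standalone experiment: while the main thread busy-waits, the 100 ms timer
// cannot fire; it only runs once the loop is unblocked, roughly 2000 ms later.
const start = Date.now();
setTimeout(function() { console.log('timer fired after', Date.now() - start, 'ms'); }, 100);
while (Date.now() - start < 2000) { /* block the event loop for ~2 seconds */ }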
I think the issue is most likely caused by my usage of named pipes. Part of my application creates a named pipe and then reads it as a stream:
const fs = require('fs');
const childProcess = require('child_process');
const { promisify } = require('util');
const { O_RDWR } = fs.constants;

let pipePath = "/tmp/myPipe";
await promisify(fs.unlink)(pipePath)
    .then(function() {}, function(err) {console.warn("Error removing pipe", err);}); // Ignore errors which may happen if the pipe doesn't exist
childProcess.spawnSync('mkfifo', [pipePath]);
// (not shown) here it starts a 3rd party application that writes to the pipe
let pipeHandle = await promisify(fs.open)(pipePath, O_RDWR); // https://stackoverflow.com/a/53492514/674675
let stream = fs.createReadStream(null, {fd: pipeHandle, autoClose: false});
// (not shown) here several event listeners are added to stream's data event
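The listeners are just ordinary 'data' handlers added with stream.on, along these lines (handler names and bodies are placeholders, not the real processing code):

// Placeholders for the real handlers; several listeners are attached to 'data'.
function onPipeDataA(chunk) { /* placeholder: parse the chunk of binary data */ }
function onPipeDataB(chunk) { /* placeholder: forward the chunk elsewhere */ }
stream.on('data', onPipeDataA);
stream.on('data', onPipeDataB);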
I also detect if the 3rd-party application is terminated for any reason, and when it is I clean up the pipe:
// (not-shown) here all the listeners on the stream are removed using removeListener
stream.close(); // This also seems to close the handle to the file
The state with frozen IO occurs if the 3rd-party application is killed without ever writing to the pipe. But really weirdly, the frozen IO state is only entered after the fourth time that this happens. So I can create a pipe, launch the third-party application, read the pipe and kill the application before it writes any data 3 times, but the 4th will somehow break NodeJS.
I'm using Node v8.10.0. Does anyone have any idea what's going on?
Things I've tried (without success):
I didn't trust the stream.close() call to close the file handle, so I tried this alternative:
// (not-shown) here all the listeners on the stream are removed using removeListener
stream.pause();
await promisify(fs.close)(pipeHandle);
I updated to Node v11.8.0.
I switched from using fs to graceful-fs.
I figured out how to connect Chrome's DevTools to the process and have found that the exact moment of the bug appears to be when fs.unlink is called before creating the fifth pipe. The promise for fs.unlink is never resolved or rejected. Still not sure why this is happening, but at least this is progress.
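The same hang can also be detected in code, without attaching DevTools, by racing the promisified call against a plain timer (a diagnostic sketch only; it reuses the fs, promisify and pipePath from the snippets above):

// Diagnostic sketch (inside an async function): does promisify(fs.unlink) ever settle?
const unlinkResult = promisify(fs.unlink)(pipePath).then(() => 'settled', () => 'settled');
const timedOut = new Promise(resolve => setTimeout(() => resolve('timed out'), 5000));
if (await Promise.race([unlinkResult, timedOut]) === 'timed out') {
    console.warn('fs.unlink still pending after 5 s - IO appears to be frozen');
}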
I tried replacing the fs.unlink call with spawnSync("rm", [pipePath]). This allowed the code to progress past the call, but the fs.open operation failed instead. I also tried running fs.openSync and that froze the entire program. So it seems that after the fourth process is killed, the app for some reason can't do any more IO and the next fs call blocks indefinitely.
I stepped through the program and used lsof in a different screen to check the file descriptors at every interesting part of the process. Usually there's a file descriptor to my temporary file after the pipe is created and I start reading it, and then once I close the stream the descriptor is gone. But after closing the stream for the fourth time, the descriptor is still there. It's always the fourth time. What's the significance of 4? I'm starting to become tetraphobic.
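The same descriptor check can also be done from inside the process on Linux, without lsof, by listing /proc/self/fd (a diagnostic sketch, Linux-only):

// Diagnostic sketch (Linux-only): list this process's open file descriptors,
// roughly what lsof -p <pid> shows, to spot a leaked descriptor for the pipe.
const fs = require('fs');
for (const fd of fs.readdirSync('/proc/self/fd')) {
    try {
        console.log(fd, '->', fs.readlinkSync('/proc/self/fd/' + fd));
    } catch (err) {
        // the descriptor used for the readdir itself may already be gone
    }
}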
Tried changing autoClose to true.
Attempted to create a minimal example that reproduces the issue. Didn't exactly reproduce the issue, but maybe found something related? Question here
Used lslocks to see if the process is waiting for a lock. (It's not.)
Switched to using fifo-js to handle the pipes. This fixed the issue! But unfortunately it mangles the binary data coming through the pipe, so it's unusable.
Source: https://stackoverflow.com/questions/54648248/nodejs-io-starved-but-timers-still-running