I have an application that reads large files from a server and hangs frequently on a particular machine. It has worked successfully under RHEL5.2 for a long time. We have
Consider these two possible rules:
The receiver may wait for the sender to send more before receiving what has already been sent.
The sender may wait for the receiver to receive what has already been sent before sending more.
We can have either of these rules, but we cannot have both of these rules.
Why? Because if the receiver is permitted to wait for the sender, that means the sender cannot wait for the receiver to receive before sending more, otherwise we deadlock. And if the sender is permitted to wait for the receiver, that means the receiver cannot wait for the sender to send before receiving more, otherwise we deadlock.
If both of these things happen at the same time, we deadlock. The sender will not send more until the receiver receives what has already been sent, and the receiver will not receive what has already been sent unless the sender send more. Boom.
TCP chooses rule 2 (for reasons that should be obvious). Thus it cannot support rule 1. But in your code, you are the receiver, and you are waiting for the sender to send more before you receive what has already been sent. So this will deadlock.
MSG_WAITALL
should block until all data has been received. From the manual page on recv:
This flag requests that the operation block until the full request is satisfied.
However, the buffers in the network stack probably are not large enough to contain everything, which is the reason for the error messages on the server. The client network stack simply can't hold that much data.
The solution is either to increase the buffer sizes (SO_RCVBUF
option to setsockopt
), split the message into smaller pieces, or receiving smaller chunks putting it into your own buffer. The last is what I would recommend.
Edit: I see in your code that you already do what I suggested (read smaller chunks with own buffering,) so just remove the MSG_WAITALL
flag and it should work.
Oh, and when recv
returns zero, that means the other end have closed the connection, and that you should do it too.