The Problem
An app I\'m maintaining keeps getting socket timeouts after approximately 21000 ms, despite the fact that I\'ve explicitly set longer ti
According to RFC 1122 (TRANSPORT LAYER -- TCP), section 4.2.3.1 ("Retransmission Timeout Calculation"):
"Implementation also MUST include exponential backoff for successive RTO values for the same segment".
So xpa1492's answer sounds plausible (despite its Windows-specific nature); the implementation of a TCP stack either follows this RFC or gets panned for failing to do so.
By the way, RFC 1122 specifies 3 seconds as the initial timeout, explicitly, making xpa1492's (3 + 6 + 12 = 21) answer sound like the answer to your mystery.
And yes, the Android TCP stack shares a common ancestor with Windows TCP stack; they were both created using RFC 1122 as a guide ("[The Linux TCP stack is] an implementation of the TCP protocol defined in RFC 793, RFC 1122 and RFC 2001 with the NewReno and SACK extensions").
I suspect that your problem is related to radio interference, so you might want to try enabling F-RTO, as you might be hitting the "magic number" repeatedly because of the environment in which you are testing.
It seems like it is a Windows default configuration...
https://social.technet.microsoft.com/Forums/windows/en-US/9e7f59dd-6469-4ade-91ca-ceb5bcaf2675/windows-7-tcp-parameter-tcpmaxconnectretransmissions-and-tcpinitialrtt?forum=w7itpronetworking
Based on the link and some further reading, Windows will by default do 3 retries and double the timeout with each attempt, starting a s 3sec one. So you end up with 3sec + 6sec + 12sec = 21sec timeout.
I wrote a crude test app, based on the code in my question, that simulates a connect timeout by attempting to connect to a non-routable address as suggested in this answer. On my Moto G (Android 4.4.2), it throws a SocketTimeoutException
in approximately 45 seconds as expected. Curiously, if I do not explicitly set the connect timeout, it instead throws a ConnectException
after approximately one minute.
I'm going to write a slightly more sophisticated test app and send it to the customer to try to determine if the device itself is imposing a 21s timeout, or if some router on their mobile network might be the culprit. I'll update this answer with the results.
Result: This appears to be an OS bug that affects the Samsung SPH-P100 (Galaxy Tab 1) from Sprint. I don't have access to a Tab 1 from any other carrier, so this could be blamed on Samsung or Sprint. It does not seem to generally affect Android 2.x, because I have a ZTE X501 running 2.3.6 which allows me to set longer timeouts.