Question
I have a self-hosted WCF application. The application connects to an OPC server with OpcNetApi, which internally creates an in-process COM component to communicate with the OPC server. WCF exposes methods for synchronous reading from and writing to the OPC server. The platform target is x86 because we have no 64-bit version of the COM component; the target framework is .NET 4.0.

Our application also supports asynchronous reading from the OPC server. For this we ask the OPC server component to notify us if any item from a list has changed. Notifications are configured to arrive not more than once every 100 ms, and at least every 2000 ms even if no items have changed. The notification comes to an event handler attached to an event of the OPC server class. When we receive a notification from the OPC server, we transform it and deliver it to clients via a WCF call.
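For context, the subscription side looks roughly like this (a minimal sketch, not our actual code; the OpcNetApi type and member names below are written from memory and should be treated as assumptions):

using Opc;
using Opc.Da;

class OpcReader
{
    private Opc.Da.Server _server;
    private Opc.Da.Subscription _subscription;

    public void Start(string serverUrl, Item[] items)
    {
        // Connect to the OPC DA server through the in-process COM component.
        _server = new Opc.Da.Server(new OpcCom.Factory(), new URL(serverUrl));
        _server.Connect();

        var state = new SubscriptionState
        {
            Name = "AsyncRead",
            Active = true,
            UpdateRate = 100,   // notify at most once every 100 ms
            KeepAlive = 2000    // but at least every 2000 ms, even without changes
        };

        _subscription = (Opc.Da.Subscription)_server.CreateSubscription(state);
        _subscription.AddItems(items);
        _subscription.DataChanged += OnDataChanged;
    }

    // Invoked by the OPC component when the subscription fires.
    private void OnDataChanged(object subscriptionHandle, object requestHandle, ItemValueResult[] values)
    {
        // Transform the values and push them to connected WCF clients
        // (forwarding code omitted).
    }
}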
In production we have 3 instances of the application deployed on 3 different machines; each instance is connected to its own type of OPC server. The problem is that one particular instance (always on the same machine, working with the same OPC server) stops working.
"Stops working" means that the process is alive, but the application no longer accepts WCF calls, none of the clients receive notifications, and the log has no new entries about application activity. The application stops working all of a sudden. Sometimes it happens several days after the last restart, sometimes after several weeks. I didn't notice any regularity, nor could I reproduce it locally.
When a client calls a WCF method it gets the following exception:
System.TimeoutException: The open operation did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The socket transfer timed out after 00:00:59.9990234. You have exceeded the timeout set on your binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)
--- End of inner exception stack trace ---
at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)
at System.ServiceModel.Channels.SocketConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.SendPreamble(IConnection connection, ArraySegment`1 preamble, TimeoutHelper& timeoutHelper)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.DuplexConnectionPoolHelper.AcceptPooledConnection(IConnection connection, TimeoutHelper& timeoutHelper)
at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)
--- End of inner exception stack trace ---
Server stack trace:
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at System.ServiceModel.ICommunicationObject.Open()
at MEScontrol.Contracts.ChannelManager.TryToEsteblishChannel[TService](Boolean asyncPattern, ChannelParameters channelParameters, IAvailableServicesManager servicesManager, ServiceClientOptions options, IClientChannel& channel, Exception& error)
at MEScontrol.Contracts.SyncClientProxyBase`1.GetSyncClient()
at IDataCenterServiceProxy.Read(DataCenterConnectionItems[] )
I also noticed that when the application stops working, the number of handles grows steadily up to some limit. Normally the application owns about 750 handles; when the problem occurs the handle count grows to about 5800. Sometimes it takes 24 hours to reach the limit, sometimes only 30 minutes. I figured out that the handles whose count is growing are of the types Thread and Event.
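If it helps, the handle count can also be watched from inside the process with something like this (an illustrative sketch using System.Diagnostics; the HandleMonitor class is hypothetical and not part of our application):

using System;
using System.Diagnostics;
using System.Threading;

static class HandleMonitor
{
    private static Timer _timer;   // keep a reference so the timer is not collected

    public static void Start(TimeSpan interval)
    {
        _timer = new Timer(_ =>
        {
            // Process.HandleCount and Process.Threads reflect the same counters
            // that are visible in Task Manager / Process Explorer.
            var p = Process.GetCurrentProcess();
            Console.WriteLine("{0:u} handles={1} os-threads={2}",
                DateTime.UtcNow, p.HandleCount, p.Threads.Count);
        }, null, TimeSpan.Zero, interval);
    }
}

Calling HandleMonitor.Start(TimeSpan.FromMinutes(1)) at startup would log one line per minute, which makes it easier to correlate the handle growth with other log activity.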
Last time I managed to take several application dumps while the handle count was growing. I examined the dumps with WinDbg and found that the number of Unstarted (and Pending) threads is growing. The number of Unstarted threads is large: in the last dump (when the handle count had stopped growing, or was about to) there were 953 Unstarted threads.
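To make the "Unstarted" state concrete, this is the kind of object I believe WinDbg is counting (an illustrative snippet, not code from our application):

using System;
using System.Threading;

class UnstartedExample
{
    static void Main()
    {
        // A managed Thread object that has been constructed but not started
        // reports ThreadState.Unstarted and appears in
        // !dumpheap -type System.Threading.Thread, although no OS thread exists yet.
        var t = new Thread(() => Console.WriteLine("work"));
        Console.WriteLine(t.ThreadState);   // prints: Unstarted

        // If the code path that would eventually call Start() is blocked
        // (for example waiting on a starved thread pool), such objects accumulate.
        t.Start();
        t.Join();
    }
}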
What is also strange is that almost all 'Threadpool worker' and 'Threadpool Completion Port' threads seem to be dead, except the one with ID 7.
What do these unstarted threads mean and why did they not start? Why are the thread pool threads dead? What else can I do to find the reason why the application stops working?
Below are the results I got with WinDbg for one of the dumps, taken while the handle count was still growing:
results of !threads
results of !eestack
results of !dumpheap -type System.Threading.Thread
Update for Thomas:
0:000> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock(Slim) instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
No deadlocks detected.
Update:
0:000> !threadpool
CPU utilization: 100%
Worker Thread: Total: 301 Running: 301 Idle: 0 MaxLimit: 1023 MinLimit: 1
Work Request in Queue: 4005
--------------------------------------
Number of Timers: 8
--------------------------------------
Completion Port Thread:Total: 6 Free: 0 MaxFree: 2 CurrentLimit: 6 MaxLimit: 1000 MinLimit: 1
Source: https://stackoverflow.com/questions/36644902/big-number-of-unstarted-threads-in-net-application