I have been running an Azure worker role deployment that uses the Microsoft.ServiceBus 2.2 library to respond to jobs posted from other worker roles and web roles. Recently (suspiciously, around the time of the OS update discussed here), the instances in the deployment started constantly recycling: rebooting, running for a short period, and then recycling again.
I can confirm from the trace messages in my diagnostics that the role instances make it all the way through the OnStart() method of my RoleEntryPoint. Occasionally, the Instances pane of the Azure Management Portal mentions that a recycling role experienced an "unhandled exception," but gives no further detail. After logging in to one of the instances over Remote Desktop, I have two clues:
- Performance counters indicate that \Processor(_Total)\% Processor Time is hovering at 100%, periodically dropping to the mid-80s coinciding with drops in \TCPv4\Connections Established. Some drops in \TCPv4\Connections Established do not correlate with drops in \Processor(_Total)\% Processor Time.
- In the Local Server Events view of Server Manager on one of the instances, I found the following message:
    Application: WaWorkerHost.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: Microsoft.ServiceBus.Common.CallbackException
    Stack:
       at Microsoft.ServiceBus.Common.Fx+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)
       at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
There have been no permissions configuration changes to the service bus during this time, and this message occurs even though we have not updated any of our VMs. Nonetheless, the service still appears to be functioning: jobs are being processed and removed from the Service Bus queues the roles listen to.
Most Googling on these issues turns up suggestions that this is somehow related to IntelliTrace; however, IntelliTrace is not enabled on these VMs.
Does anyone have any ideas on what is going on here?
The Service Bus exceptions turned out to be a red herring as far as the crashing is concerned. The actual cause was a namespace conflict in one of the data contracts exchanged between two VM roles that were published at different times. Adding extra tracing to exceptions thrown during the receive retries revealed it. It is still a mystery why the system works at all, and the role recycling has not ceased; only the Service Bus exception has.
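The extra tracing described above can be sketched roughly as follows. This is a hedged illustration, assuming the Microsoft.ServiceBus 2.2 QueueClient API; ProcessJob and TracedReceiver are placeholder names, not the original code.

```csharp
// Sketch: wrap the receive loop so exceptions thrown during receive/
// deserialization are traced instead of silently retried. Assumes the
// Microsoft.ServiceBus 2.2 QueueClient API; ProcessJob is hypothetical.
using System;
using System.Diagnostics;
using Microsoft.ServiceBus.Messaging;

public class TracedReceiver
{
    private readonly QueueClient _client;

    public TracedReceiver(QueueClient client)
    {
        _client = client;
    }

    public void ReceiveLoop()
    {
        while (true)
        {
            try
            {
                BrokeredMessage message = _client.Receive(TimeSpan.FromSeconds(30));
                if (message == null)
                    continue; // receive timed out; retry

                ProcessJob(message); // placeholder for the real job handler
                message.Complete();
            }
            catch (Exception ex)
            {
                // This kind of tracing is what surfaced the data-contract
                // namespace conflict between the two roles.
                Trace.TraceError("Receive failed: {0}", ex);
            }
        }
    }

    private void ProcessJob(BrokeredMessage message)
    {
        // ... deserialize the data contract and run the job ...
    }
}
```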
I had a similar issue. The root cause was that the Service Bus DLL version could not be resolved: make sure the version you redirect to in your configuration file and the version you actually reference in the project are the same. This can happen with any DLL mismatch, not only the Service Bus DLL.
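The check described above can be illustrated with a minimal app.config sketch, assuming a project that references Microsoft.ServiceBus 2.2; verify the publicKeyToken and newVersion against the assembly you actually ship.

```xml
<!-- Sketch of an app.config binding redirect: newVersion must match the
     Microsoft.ServiceBus assembly version actually referenced (2.2 here). -->
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Microsoft.ServiceBus"
                          publicKeyToken="31bf3856ad364e35"
                          culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-2.2.0.0"
                         newVersion="2.2.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

If the redirect points at a version that is not the one deployed with the role, the assembly load fails at runtime and can surface as an unhandled exception that recycles the role.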
Source: https://stackoverflow.com/questions/19669355/azure-worker-role-recycling-with-unhandled-service-bus-fault-message