Worse multithreaded performance on better system (possibly due to Deedle) [closed]

大兔子大兔子 提交于 2019-12-13 17:26:36

问题


We are dealing with a multithreaded C# service using Deedle. Tests on a quad-core current system versus an octa-core target system show that the service is about two times slower on the target system instead of two times faster. Even when restricting the number of threads to two, the target system is still almost 40% slower.

Analysis shows a lot of waiting in Deedle(/F#), making the target system basically run on two core. Non-Deedle test programs show normal behaviour and superiour memory bandwidth on the target system.

Any ideas on what could cause this or how to best approach this situation?

EDIT: It seems most of the time waiting is done in calls to Invoke.


回答1:


The problem turned out to be a combination of using Windows 7, .NET 4.5 (or actually the 4.0 runtime) and the heavy use of tail recursion in F#/Deedle.

Using Visual Studio's Concurrency Visualizer, I already found that most time is spent waiting in Invoke calls. On closer inspection, these result in the following call trace:

ntdll.dll:RtlEnterCriticalSection
ntdll.dll:RtlpLookupDynamicFunctionEntry
ntdll.dll:RtlLookupFunctionEntry
clr.dll:JIT_TailCall
<some Deedle/F# thing>.Invoke

Searching for these function gave multiple articles and forum threads indicating that using F# can result in a lot of calls to JIT_TailCall and that .NET 4.6 has a new JIT compiler that seems to deal with some issues relating to these calls. Although I didn't find anything mentioning problems relating to locking/synchronisation, this did give me the idea that updating to .NET 4.6 might be a solution.

However, on my own Windows 8.1 system that also uses .NET 4.5, the problem doesn't occur. After searching a bit for similar Invoke calls, I found that the call trace on this system looked as follows:

ntdll.dll:RtlAcquireSRWLockShared
ntdll.dll:RtlpLookupDynamicFunctionEntry
ntdll.dll:RtlLookupFunctionEntry
clr.dll:JIT_TailCall
<some Deedle/F# thing>.Invoke

Apparently, in Windows 8(.1) the locking mechanism was changed to something less strict, which resulted in a lot less need for waiting for the lock.

So only with the target system's combination of both Windows 7's strict locking and .NET 4.5's less efficient JIT compiler, did F#'s heavy usage of tail recursion cause problems. After updating to .NET 4.6, the problem disappeared and our service is running as expected.



来源:https://stackoverflow.com/questions/38304137/worse-multithreaded-performance-on-better-system-possibly-due-to-deedle

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!