performance

Performance cost of SetWinEventHook vs Polling in c#?

这一生的挚爱 posted on 2021-02-10 06:15:41
Question: I am developing a time-tracking application that monitors window changes and user idleness. My question is: which one costs more in terms of performance and system resources: using SetWinEventHook (EVENT_SYSTEM_FOREGROUND), or setting a Timer.Tick and checking whether the active window title changed (user32.dll GetForegroundWindow() and GetWindowText() all the time)? For testing user idleness I already figured out that using low-level mouse and keyboard hooks is more expensive than calling

Understanding `_mm_prefetch`

雨燕双飞 posted on 2021-02-10 00:24:39
Question: The answer to "What are _mm_prefetch() locality hints?" goes into detail on what each hint means. My question is: which one do I WANT? I work on a function that is called repeatedly, billions of times, with some int parameter among others. The first thing I do is look up a cached value, using that parameter (its low 32 bits) as a key into a 4GB cache. Based on the algorithm from which this function is called, I know that most often that key will be doubled (shifted left by 1 bit) from one call to
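The access pattern described in this excerpt (look up the current key, expect the next key to be the current one shifted left by 1) can be sketched in C as follows. This is only an illustration under stated assumptions: `cached_lookup`, `PREFETCH_NEXT`, the one-byte entry type, and the power-of-two table size are hypothetical and not from the question, and `_MM_HINT_T0` is a placeholder because the choice of hint is exactly what the question leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
/* _MM_HINT_T0 is a placeholder; which hint fits best is the open question. */
#define PREFETCH_NEXT(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#elif defined(__GNUC__)
#define PREFETCH_NEXT(p) __builtin_prefetch((p))
#else
#define PREFETCH_NEXT(p) ((void)(p))
#endif

/* Hypothetical lookup: one byte per key, table_size must be a power of two
   (both are assumptions made for this sketch). */
static uint8_t cached_lookup(const uint8_t *table, size_t table_size,
                             uint32_t key)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    size_t mask = table_size - 1;

    /* The next call is expected to use the doubled key, so request that
       entry's cache line now; a prefetch is only a hint and never faults. */
    size_t next = ((size_t)key << 1) & mask;
    PREFETCH_NEXT(&table[next]);

    /* Serve the current request. */
    return table[(size_t)key & mask];
}
```

Whether the prefetch helps depends on how much work happens between calls: if the next call arrives before the line has been fetched, the hint hides little latency, so any benefit has to be measured rather than assumed.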

How can memory destination BTS be significantly slower than load / BTS reg,reg / store?

拜拜、爱过 posted on 2021-02-09 04:37:06
Question: In the general case, how can an instruction that can take memory or register operands ever be slower with memory operands than mov + mov -> instruction -> mov + mov? Based on the throughput and latency found in Agner Fog's instruction tables (looking at Skylake in my case, p238), I see the following numbers for the btr/bts instructions:

instruction  operands  uops fused domain  uops unfused domain  latency  throughput
mov          r,r       1                  1                    0-1      .25
mov          m,r       1                  2                    2        1
mov          r,m       1                  1                    2        .5
...
bts/btr      r,r       1                  1
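As a rough illustration of the two sequences being compared, the C sketch below models the load / BTS reg,reg / store sequence and, separately, the memory-destination form with a register bit offset. The function names are hypothetical. The modelled semantics follow Intel's documented behaviour: the register form takes the bit offset modulo the operand width, while the memory form treats the offset as an index into a bit string, so it can select a different word. Negative offsets, which the instruction also allows, are left out. Whether this difference accounts for the timings in the question is not established by the excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* Register form (bts r,r): the bit offset is taken modulo the operand width.
   Returns the previous bit value, which the instruction places in CF. */
static int bts_reg(uint64_t *reg, uint64_t bit)
{
    uint64_t mask = (uint64_t)1 << (bit & 63);
    int old = (*reg & mask) != 0;
    *reg |= mask;
    return old;
}

/* load / BTS reg,reg / store on a single 64-bit word in memory. */
static int set_bit_load_store(uint64_t *word, uint64_t bit)
{
    uint64_t tmp = *word;          /* mov r,m  */
    int old = bts_reg(&tmp, bit);  /* bts r,r  */
    *word = tmp;                   /* mov m,r  */
    return old;
}

/* Memory-destination form with a register offset (bts m,r), modelled for
   non-negative offsets only: the offset first selects the word, then the
   bit inside that word. */
static int bts_mem(uint64_t *base, uint64_t bit)
{
    return set_bit_load_store(&base[bit >> 6], bit);
}
```

With an offset of 67, `set_bit_load_store` touches bit 3 of the addressed word, whereas `bts_mem` touches bit 3 of the following word, so the two sequences are interchangeable only when the offset is known to be below the operand width.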
