cpu

Why are 50 threads faster than 4?

依然范特西╮ posted on 2019-11-30 04:41:41
DWORD WINAPI MyThreadFunction(LPVOID lpParam) { volatile auto x = 1; for (auto i = 0; i < 800000000 / MAX_THREADS; ++i) { x += i / 3; } return 0; } This function is run in MAX_THREADS threads. I ran the tests on an Intel Core 2 Duo, Windows 7, MS Visual Studio 2012, using Concurrency Visualizer, with MAX_THREADS=4 and MAX_THREADS=50. test1 (4 threads) completed in 7.1 seconds, but test2 (50 threads) completed in 5.8 seconds, while test1 has more context switches than test2. I ran the same tests on an Intel Core i5, Mac OS 10.7.5, and got the same results. I decided to benchmark this

branch prediction vs branch target prediction

两盒软妹~` posted on 2019-11-30 03:36:49
Question: Have I understood this right: if statements depend more on branch prediction, and v-table look-ups depend more on branch target prediction? Regarding v-tables, is there no "branch prediction", just the target prediction? I am trying to understand how a v-table is processed by the CPU. Answer 1: Branch prediction is predicting whether or not the branch will be taken. Branch target prediction is predicting where the branch is going to. These two things are independent and can occur in all
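The distinction drawn in the answer can be illustrated with a toy model. The sketch below is not a description of any real CPU: the 2-bit saturating counter and the "last target seen" table are textbook simplifications chosen for illustration, and all class and function names are hypothetical. The first predictor only answers "taken or not?", the second only answers "where to?".

```python
class DirectionPredictor:
    """Toy 2-bit saturating counter: answers 'taken or not taken?'."""

    def __init__(self):
        self.counter = 2  # 0-1 predict not taken, 2-3 predict taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


class TargetPredictor:
    """Toy target table: remembers the last target seen per branch address."""

    def __init__(self):
        self.last_target = {}

    def predict(self, branch_addr):
        return self.last_target.get(branch_addr)  # None if never seen

    def update(self, branch_addr, target):
        self.last_target[branch_addr] = target


def count_direction_hits(outcomes):
    """Count correct taken/not-taken guesses for one conditional branch."""
    predictor = DirectionPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits


def count_target_hits(branch_addr, targets):
    """Count correct target guesses for one indirect branch (e.g. a virtual call)."""
    predictor = TargetPredictor()
    hits = 0
    for target in targets:
        hits += predictor.predict(branch_addr) == target
        predictor.update(branch_addr, target)
    return hits


# A loop branch taken 9 times, then falling through once.
print(count_direction_hits([True] * 9 + [False]))
# An indirect call that always jumps, but to two different targets.
print(count_target_hits(0x40, [0x100, 0x100, 0x200, 0x200]))
```

In this model an if statement exercises only the direction predictor (its target is fixed), while a virtual call exercises only the target predictor (it is always taken, but the destination varies).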

How does a CPU idle (or run below 100%)?

て烟熏妆下的殇ゞ posted on 2019-11-30 03:14:43
Question: I first learned about how computers work in terms of a primitive single stored program machine. Now I'm learning about multitasking operating systems, scheduling, context switching, etc. I think I have a fairly good grasp of it all, except for one thing. I have always thought of a CPU as something which is just charging forward non-stop. It always knows where to go next (program counter), and it goes to that instruction, etc., ad infinitum. Clearly this is not the case since my desktop

How is it possible to read the CPU registers using a debugger running on the same CPU?

点点圈 posted on 2019-11-30 02:58:52
Question: As I was learning about assembly, I used GDB in the following way: gdb ./a.out (a.out is a compiled C program that only prints hello world), then break main, run, info registers. Why can I see the registers used by my program when I am myself using the same CPU to print the registers? Shouldn't the use of GDB (or the operating system) overwrite the registers and only show me the overwritten registers? The only answer I can think of is the fact that my CPU is dual-core and that one of the cores is being used and

How to determine CPU and memory consumption from inside a process?

为君一笑 posted on 2019-11-30 02:16:46
I once had the task of determining the following performance parameters from inside a running application: total virtual memory available; virtual memory currently used; virtual memory currently used by my process; total RAM available; RAM currently used; RAM currently used by my process; % CPU currently used; % CPU currently used by my process. The code had to run on Windows and Linux. Even though this seems to be a standard task, finding the necessary information in the manuals (WIN32 API, GNU docs) as well as on the Internet took me several days, because there's so much incomplete/incorrect
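As an illustration of just one item on that list, "% CPU currently used by my process", here is a minimal sketch using only the Python standard library. It is not the original author's method (the post concerned the native Windows and Linux APIs); the helper names `process_cpu_percent` and `busy` are hypothetical. The idea is to compare the CPU time the process consumed with the wall-clock time that elapsed, scaled by the number of logical CPUs.

```python
import os
import time


def process_cpu_percent(work, ncpu=None):
    """Run work() and return this process's CPU use as a % of total CPU capacity."""
    ncpu = ncpu or os.cpu_count() or 1
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    if wall <= 0:
        return 0.0
    return 100.0 * cpu / (wall * ncpu)


def busy():
    """Hypothetical workload: a short CPU-bound loop."""
    total = 0
    for i in range(200_000):
        total += i // 3
    return total


print(f"{process_cpu_percent(busy):.1f}% of all logical CPUs")
```

Dividing by the CPU count is a design choice: without it, a fully busy single thread reports roughly 100% regardless of how many cores the machine has.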

Python - get process names, CPU, Mem Usage and Peak Mem Usage in Windows

倾然丶 夕夏残阳落幕 posted on 2019-11-30 01:52:44
I want to get a list of all the process names, CPU, Mem Usage and Peak Mem Usage. I was hoping I could use ctypes, but I am happy to hear any other options. Thanks for your time. Bakuriu: You can use psutil. For example, to obtain the list of process names: process_names = [proc.name() for proc in psutil.process_iter()] For info about the CPU, use psutil.cpu_percent or psutil.cpu_times. For info about memory usage, use psutil.virtual_memory. Note that psutil works with Linux, OS X, Windows, Solaris and FreeBSD, and with Python 2.4 through 3.3. I like using wmic on Windows. You can run it

Branch target prediction in conjunction with branch prediction?

落爺英雄遲暮 posted on 2019-11-30 01:42:07
EDIT: My confusion arises because surely, by predicting which branch is taken, you are effectively doing the target prediction too? This question is intrinsically linked to my first question on the topic: branch prediction vs branch target prediction. Looking at the accepted answer: Unconditional branch, fixed target: infinite loop; goto statement; break or continue statement; end of the 'then' clause of an if/else statement (to jump past the else clause); non-virtual function call. Unconditional branch, variable target: returning from a function; virtual function call; function pointer call; switch

Assigning a cpu core to a process - Linux

倾然丶 夕夏残阳落幕 posted on 2019-11-30 01:29:42
Question: Is there any way to force a process with a specific PID to run on only one of the CPUs of a server? I know that there is a command like taskset -cp <Cpu_Number> <Pid>, but it does not work on my system, so please let me know if there is any other command. Answer 1: There are two ways of assigning a CPU core or cores to a running process. First method: taskset -cp 0,4 9030. This assigns CPU cores 0 and 4 to PID 9030. Second method: taskset -p 0x11 9030
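The two methods name the same set of cores in different notations: the -c form takes a list of CPU numbers, while the plain -p form takes a hexadecimal bit mask in which bit n stands for CPU n. The following sketch shows the conversion; the helper names `cpus_to_mask` and `mask_to_cpus` are hypothetical and are not part of taskset.

```python
def cpus_to_mask(cpus):
    """Build an affinity bit mask: bit n is set when CPU n is in the list."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask):
    """List the CPU numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(hex(cpus_to_mask([0, 4])))  # 0x11
print(mask_to_cpus(0x11))         # [0, 4]
```

So 0x11 (binary 10001) selects cores 0 and 4, matching the list 0,4 in the first method. From Python on Linux, os.sched_setaffinity(pid, {0, 4}) is one way to request the same affinity without calling taskset.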

Why does CPU access memory on a word boundary?

非 Y 不嫁゛ posted on 2019-11-29 22:25:49
I have often heard that data should be properly aligned in memory for better access efficiency, because the CPU accesses memory on a word boundary. So in the following scenario, the CPU has to make 2 memory accesses to get a single word. Supposing: 1 word = 4 bytes ("|" stands for word boundary, "o" stands for byte boundary) |----o----o----o----|----o----o----o----| (The word boundary in the CPU's eye) ----o----o----o---- (What I want to read from memory) Why should this happen? What is the root cause of the CPU only being able to read at a word boundary? If the CPU can only access at the 4-byte word boundary, the address
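The "2 memory accesses" claim in the question follows from simple arithmetic under its own assumption that memory is fetched one aligned 4-byte word at a time. The sketch below models only that assumption (real hardware adds caches, wider buses and other details); `words_touched` is a hypothetical helper name.

```python
WORD_SIZE = 4  # bytes per word, as assumed in the question


def words_touched(address, length, word_size=WORD_SIZE):
    """Count the aligned words a read of `length` bytes at `address` overlaps."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_word = address // word_size
    last_word = (address + length - 1) // word_size
    return last_word - first_word + 1


print(words_touched(0, 4))  # aligned read: 1 word
print(words_touched(2, 4))  # read straddling a boundary: 2 words
```

A 4-byte read at address 0 or 4 stays inside one word, while the same read at address 1, 2 or 3 overlaps two words, which is the situation drawn in the diagram.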

Is integer multiplication really done at the same speed as addition on a modern CPU?

六眼飞鱼酱① posted on 2019-11-29 20:53:59
I hear this statement quite often: that multiplication on modern hardware is so optimized that it is actually as fast as addition. Is that true? I can never find any authoritative confirmation. My own research only raises more questions. The speed tests usually show data that confuses me. Here is an example: #include <stdio.h> #include <sys/time.h> unsigned int time1000() { timeval val; gettimeofday(&val, 0); val.tv_sec &= 0xffff; return val.tv_sec * 1000 + val.tv_usec / 1000; } int main() { unsigned int sum = 1, T = time1000(); for (int i = 1; i < 100000000; i++) { sum += i + (i+1); sum++;