How should I understand "All threads in a warp execute the same instruction at the same time" on a GPU?
Question: I am reading Professional CUDA C Programming, and in the GPU Architecture Overview section it says: "CUDA employs a Single Instruction Multiple Thread (SIMT) architecture to manage and execute threads in groups of 32 called warps. All threads in a warp execute the same instruction at the same time. Each thread has its own instruction address counter and register state, and carries out the current instruction on its own data. Each SM partitions the thread blocks assigned to it into 32-thread warps that it …"