To answer the second part of your question: when control flow diverges at the if statement, the threads where threadIdx.x != 0 simply wait at the convergence point after the if statement. They do not proceed to the printf statement until thread 0 has completed the if block.
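As a rough illustration (not your original code), here is a minimal sketch of that behaviour. The kernel name divergeDemo, the shared variable flag, and the host helper countThreadsThatSawFlag are all hypothetical names made up for this example. The write to flag stands in for whatever work your if block does. One caveat: on GPUs with independent thread scheduling (Volta and later), the warp is not guaranteed to reconverge immediately after the if block, so the sketch calls __syncwarp() to make the convergence point explicit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: only thread 0 takes the branch.
__global__ void divergeDemo(int *seen)
{
    __shared__ int flag;

    if (threadIdx.x == 0) {
        // Only thread 0 executes this block; the other threads
        // of the warp do not run it.
        flag = 1;
    }

    // Explicit convergence point: no thread of the warp continues
    // until all of them have arrived, and thread 0's write to flag
    // is visible to the others afterwards.
    __syncwarp();

    printf("thread %d sees flag = %d\n", threadIdx.x, flag);
    seen[threadIdx.x] = flag;
}

// Hypothetical host helper: launches one warp and counts how many
// threads observed thread 0's write. Returns -1 on a CUDA error.
int countThreadsThatSawFlag()
{
    const int n = 32;  // one full warp
    int host[n] = {0};
    int *dev = nullptr;

    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(dev, 0, n * sizeof(int));

    divergeDemo<<<1, n>>>(dev);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int count = 0;
    for (int i = 0; i < n; ++i) count += host[i];
    return count;
}

int main()
{
    printf("threads that saw the flag: %d\n", countThreadsThatSawFlag());
    return 0;
}
```

If every thread in the warp reports that it saw the flag, then none of them got past the convergence point before thread 0 finished the if block. The order of the printf lines themselves is not something to rely on.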