I'm using CUDA to develop an image processing program, and I found that the efficiency of memory reads and writes differs. According to the manual, global memory is read by 3