I am currently trying to parallelise the 'scatter' and 'connect' steps of a 2D Transmission Line Matrix (TLM) simulation in CUDA. I need to use only x-dimension threads.