I am trying to implement AES-256 in CTR mode using NVIDIA CUDA. I already have key expansion working on the CPU, and now I need to implement the actual AES-256 algorithm.
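The T tables are a straightforward description of the AES round transformation in matrix form: they fold SubBytes, ShiftRows and MixColumns into four 256-entry lookup tables of 32-bit words, so each output column of a round becomes four table lookups XORed together with the round key. To build them, see the original Rijndael proposal submitted to NIST, section 5.2.1.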
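As a rough illustration of how the tables can be generated rather than typed in, here is a minimal host-side sketch in plain C. The function names (`build_sbox`, `build_t_tables`) are made up for this example, and the byte packing is an assumption: the first byte of each column goes in the most significant byte of the word, so `T0[0]` comes out as `0xC66363A5`. If your kernel loads the state in the other byte order, the packing has to be mirrored accordingly.

```c
#include <stdint.h>

/* Multiply by {02} in GF(2^8), reducing by the AES polynomial x^8+x^4+x^3+x+1. */
static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* General GF(2^8) multiplication (shift-and-add); only used to find inverses. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* Rotate an 8-bit value left by n bits (1 <= n <= 7). */
static uint8_t rotl8(uint8_t v, int n) {
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* S-box: multiplicative inverse (0 maps to 0) followed by the affine transform.
   The inverse is found by brute force, which is fine for a one-time setup. */
void build_sbox(uint8_t sbox[256]) {
    for (int a = 0; a < 256; a++) {
        uint8_t inv = 0;
        for (int b = 1; b < 256 && a != 0; b++) {
            if (gf_mul((uint8_t)a, (uint8_t)b) == 1) { inv = (uint8_t)b; break; }
        }
        sbox[a] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* Encryption T tables. Column for T0 is (02*S, S, S, 03*S); T1..T3 are the
   same column rotated by one, two and three bytes.
   ASSUMPTION: first column byte is packed into the most significant byte. */
void build_t_tables(uint32_t T0[256], uint32_t T1[256],
                    uint32_t T2[256], uint32_t T3[256]) {
    uint8_t sbox[256];
    build_sbox(sbox);
    for (int a = 0; a < 256; a++) {
        uint8_t s  = sbox[a];
        uint8_t s2 = xtime(s);            /* {02} * S[a] */
        uint8_t s3 = (uint8_t)(s2 ^ s);   /* {03} * S[a] */
        uint32_t w = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) |
                     ((uint32_t)s  <<  8) |  (uint32_t)s3;
        T0[a] = w;
        T1[a] = (w >>  8) | (w << 24);
        T2[a] = (w >> 16) | (w << 16);
        T3[a] = (w >> 24) | (w <<  8);
    }
}
```

Since the tables are built once on the host, one option is to copy them to the device afterwards (for example into `__constant__` or shared memory). Which memory space performs best depends on the access pattern and the GPU, so it is worth measuring.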