I found some code that would solve my problem. It looks like this:
self.conv_diag(input_tensor.diagonal(dim1=2, dim2=3)).diag_embed(dim1=2, dim2=3)
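
To make the shapes concrete, here is a minimal sketch of what that one-liner appears to do. It assumes `input_tensor` has shape `(batch, channels, N, N)` and that `conv_diag` is a length-preserving `nn.Conv1d`. Neither is stated in the snippet, and the `DiagConv` class name and its parameters are hypothetical, used only for illustration.

```python
import torch
import torch.nn as nn


class DiagConv(nn.Module):
    """Hypothetical module: convolve only the main diagonal of each N x N map."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Assumption: conv_diag is a Conv1d whose padding keeps the length N unchanged.
        self.conv_diag = nn.Conv1d(
            channels, channels, kernel_size, padding=kernel_size // 2
        )

    def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
        # Extract the main diagonal over dims 2 and 3: (B, C, N, N) -> (B, C, N).
        diag = input_tensor.diagonal(dim1=2, dim2=3)
        # Run the 1-D convolution along the diagonal: (B, C, N) -> (B, C, N).
        out = self.conv_diag(diag)
        # Put the result back on the diagonal of a zero-filled matrix:
        # (B, C, N) -> (B, C, N, N).
        return out.diag_embed(dim1=2, dim2=3)


x = torch.randn(2, 4, 5, 5)
layer = DiagConv(channels=4)
y = layer(x)
print(y.shape)
```

Under these assumptions the output has the same shape as the input, with the convolved values on the diagonal and zeros everywhere else. The off-diagonal entries of the input are discarded.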