In a paper dealing with clean speech and noisy audio, I read the following line:
In detail, at each training epoch, we first convolve speech and noise wi