What is the most efficient way to implement a convolution filter within a pixel shader?
Implementing convolution in a pixel shader is somewhat costly due to the very high number of texture fetches. A direct way of implementing a convolution filter is to make N x N lookups per fragment using two nested for loops in the fragment shader. A simple calculation shows that a 1024x1024 image blurred with a 4x4 Gaussian kernel would need 1024 x 1024 x 4 x 4 ≈ 16M lookups. What can one do about this? Is there some optimization that would need fewer lookups? I am not interested in kernel-specific
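For reference, the direct N x N approach described above looks roughly like this in a GLSL fragment shader (a minimal sketch; the uniform names `u_texture`, `u_texelSize`, and `u_kernel` are made up for illustration):

```glsl
// Direct (naive) NxN convolution: one texture fetch per kernel tap,
// i.e. KERNEL_SIZE * KERNEL_SIZE fetches per fragment.
#define KERNEL_SIZE 4

uniform sampler2D u_texture;
uniform vec2 u_texelSize;   // 1.0 / texture dimensions, to step one texel
uniform float u_kernel[KERNEL_SIZE * KERNEL_SIZE];

varying vec2 v_texCoord;

void main() {
    vec4 sum = vec4(0.0);
    for (int i = 0; i < KERNEL_SIZE; i++) {
        for (int j = 0; j < KERNEL_SIZE; j++) {
            vec2 offset = vec2(float(i), float(j)) * u_texelSize;
            sum += texture2D(u_texture, v_texCoord + offset)
                 * u_kernel[i * KERNEL_SIZE + j];
        }
    }
    gl_FragColor = sum;
}
```

This is exactly the per-fragment double loop whose fetch count the calculation above adds up.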