Suppose we\'re given a prior on X (e.g. X ~ Gaussian) and a forward operator y = f(x). Suppose further we have observed y
The problem is that your function, $y = x^2$, is not one-to-one. Specifically, because you lose all information about the sign of X when you square it, there is no way to tell from your Y values whether you originally used 2 or -2 to generate the data. If you create a histogram of your trace for X after just the first iteration, you will see this:
This distribution has 2 modes, one at 2 (your true value) and one at -2. At the next iteration, x.mean() will be close to zero (averaging over the bimodal distribution), which is obviously not what you want.