The concept of a function originated in mathematics. The mathematical concept of a function is more-or-less a mapping from one set onto another. In this sense it's impossible for functions to have side effects; not because they're "better" that way or because they're specifically defined as to not have side effects, but because the concept of "having side effects" doesn't make any sense with this definition of a function. Mathematical functions aren't a series of steps that execute, so how could any of those steps somehow "affect" other mathematical objects you're talking about?
When people started studying computation, they became interested in machine-implementable algorithms for computing the values of mathematical functions given their inputs. People started talking about computable functions. But functions as implemented in a computer (in imperative languages at least, which are what programmers first worked with) are a series of executable steps, which obviously can have side effects.
So it became natural for programmers to think about functions as algorithms, not as mathematical functions. So then a pure function is one that is purely a mathematical function, to which all the hundreds of years of theory about functions applies, as opposed to the generalised programmer's function, which can't be reasoned about that way.