It's not at all a requirement. In fact, the rectified linear activation function (ReLU) is very useful in large neural networks. Computing its gradient is much cheaper, and it induces sparsity by clamping negative inputs to zero.
See the following for more details: https://www.academia.edu/7826776/Mathematical_Intuition_for_Performance_of_Rectified_Linear_Unit_in_Deep_Neural_Networks
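As a rough illustration (a minimal NumPy sketch, not tied to any particular framework or to the linked paper), both the forward pass and the gradient reduce to a comparison against zero, which is why the gradient is cheap and why negative inputs produce exact zeros:

    import numpy as np

    def relu(x):
        # Clamp negative inputs to zero; positive inputs pass through unchanged.
        return np.maximum(0, x)

    def relu_grad(x):
        # The gradient is just an indicator: 0 for x < 0, 1 for x > 0
        # (the value at exactly x = 0 is a matter of convention).
        return (x > 0).astype(x.dtype)

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(relu(x))       # [0.  0.  0.  1.5 3. ] -- negatives become exact zeros (sparsity)
    print(relu_grad(x))  # [0. 0. 0. 1. 1.]      -- no exponentials, just a comparison
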
Edit:
There has been some discussion over whether the rectified linear activation function can be called a linear function.
Yes, it is technically a nonlinear function, because it is not linear at the point x = 0; however, it is linear on each side of that point (piecewise linear), so I don't think it's that useful to nitpick here.
I could have chosen the identity function and the claim would still be true, but I chose ReLU as an example because of its recent popularity.
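To make that distinction concrete, here is a small check (just illustrative Python, my own example): ReLU fails the additivity property of a linear map once the inputs straddle zero, but behaves linearly on either side of it.

    import numpy as np

    def relu(x):
        return np.maximum(0, x)

    # A linear map f must satisfy f(a + b) == f(a) + f(b) for all inputs.
    a, b = 1.0, -2.0
    print(relu(a + b), relu(a) + relu(b))  # 0.0 vs 1.0 -> additivity fails, so ReLU is nonlinear overall

    # Restricted to one side of zero, it behaves linearly:
    a, b = 1.0, 2.0
    print(relu(a + b), relu(a) + relu(b))  # 3.0 vs 3.0 -> linear on x >= 0
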