I am currently reading the Machine Learning book by Tom Mitchell. When talking about neural networks, Mitchell states:
"Although the perceptron rule
Look at the following two data sets:
    ^                ^
    |  X     O       |  AA      /
    |                |  A      /
    |                |        /   B
    |  O     X       |  A    /   BB
    |                |      /   B
    +----------->    +----------->
The left data set is not linearly separable (without using a kernel). The right one is separable into two parts for `A` and `B` by the indicated line.
That is, you cannot draw a straight line in the left image such that all the `X` are on one side and all the `O` are on the other. That is why it is called "not linearly separable": there exists no linear manifold (a line in 2D, a hyperplane in general) separating the two classes.
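To make the difference concrete, here is a minimal sketch of a mistake-driven perceptron run on both pictures. The coordinates are made up to resemble the drawings, and `train_perceptron` is a hypothetical helper name, not something from the book. It uses the common `+1`/`-1` label form of the update, which may differ in notation from the rule as the book states it.

```python
def train_perceptron(points, labels, epochs=1000):
    """Return (w1, w2, b) once a full pass makes no mistakes, else None."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):   # y is +1 or -1
            if y * (w1 * x1 + w2 * x2 + b) <= 0:  # wrong side, or on the line
                w1 += y * x1
                w2 += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return (w1, w2, b)
    return None  # no separating line found within the epoch budget

# Left picture: X on one diagonal, O on the other (assumed coordinates).
xor_points = [(0, 1), (1, 0), (0, 0), (1, 1)]
xor_labels = [+1, +1, -1, -1]

# Right picture: A on one side of the line, B on the other (assumed coordinates).
ab_points = [(1, 5), (2, 5), (1, 4), (1, 2), (5, 3), (5, 2), (6, 2), (5, 1)]
ab_labels = [+1, +1, +1, +1, -1, -1, -1, -1]

left_result = train_perceptron(xor_points, xor_labels)
right_result = train_perceptron(ab_points, ab_labels)

print("left :", left_result)   # None: no line can classify all four points
print("right:", right_result)  # some (w1, w2, b) describing a separating line
```

For the left data set no number of epochs would help, since a pass without mistakes would require a separating line that does not exist. For the right one the loop stops as soon as it has found one of the many lines that work.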
Now the famous kernel trick (which will certainly be discussed in the book next) allows many linear methods to be applied to non-linear problems: it implicitly adds dimensions in which a problem that is not linearly separable becomes linearly separable.
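As a rough illustration of the "additional dimensions" idea, the sketch below lifts the four points of the left picture into 3D by hand, using the extra coordinate `x1 * x2`. Note that this is an explicit feature map rather than the kernel trick itself: a kernel method would get the same effect through inner products without ever computing the new coordinates. The coordinates, the names `lift` and `side`, and the plane weights are all assumptions chosen for this example.

```python
def lift(point):
    """Map (x1, x2) to (x1, x2, x1 * x2): one hand-picked extra dimension."""
    x1, x2 = point
    return (x1, x2, x1 * x2)

def side(point3d, w=(1.0, 1.0, -2.0), b=-0.5):
    """Return +1 or -1 for the side of the plane w . z + b = 0 the point is on."""
    score = sum(wi * zi for wi, zi in zip(w, point3d)) + b
    return 1 if score > 0 else -1

# Left picture: X on one diagonal (+1), O on the other (-1); assumed coordinates.
xor_points = [(0, 1), (1, 0), (0, 0), (1, 1)]
xor_labels = [+1, +1, -1, -1]

predictions = [side(lift(p)) for p in xor_points]
print(predictions == xor_labels)  # True: a single plane separates the lifted points
```

In the original two dimensions no line does the job, but after adding the third coordinate the plane `x1 + x2 - 2 * x3 = 0.5` puts both `X` on one side and both `O` on the other.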