Difference between a linear problem and a non-linear problem? Essence of Dot-Product and Kernel trick

爱一瞬间的悲伤 2021-01-30 05:30

The kernel trick maps a non-linear problem into a linear problem.

My questions are:
1. What is the main difference between a linear and a non-linear problem? What i

5 answers
  •  清歌不尽
    2021-01-30 06:23

    When people call a classification problem linear, they usually mean that it is linearly separable. Linearly separable means that the two classes can be separated by a function that is a linear combination of the input variables. For example, with two input variables, x1 and x2, there are numbers theta1 and theta2 such that the function theta1*x1 + theta2*x2 (compared against a threshold) is sufficient to predict the output. In two dimensions the decision boundary is a straight line, in 3D it is a plane, and in higher-dimensional spaces it is a hyperplane.
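As a minimal sketch of such a linear decision function, the snippet below classifies a point by the sign of theta1*x1 + theta2*x2. The toy points, the weights `theta`, and the helper names `linear_score` and `predict` are assumptions made up for illustration; they do not come from the answer.

```python
def linear_score(theta, x, bias=0.0):
    """Linear combination theta1*x1 + theta2*x2 + ... + bias."""
    return sum(t * xi for t, xi in zip(theta, x)) + bias


def predict(theta, x, bias=0.0):
    """Return class 1 on the positive side of the hyperplane, else class 0."""
    return 1 if linear_score(theta, x, bias) > 0 else 0


# Hypothetical toy data: class 1 lies above the line x2 = x1, class 0 below it.
points = [((0.0, 1.0), 1), ((1.0, 3.0), 1), ((1.0, 0.0), 0), ((3.0, 1.0), 0)]

# Assumed weights: score = -x1 + x2, so the boundary is the line x2 = x1.
theta = (-1.0, 1.0)

predictions = [predict(theta, x) for x, _ in points]
labels = [label for _, label in points]
print(predictions == labels)
```

Because a single straight line reproduces every label, this toy data set is linearly separable in the sense described above.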

    You can get some kind of intuition about these concepts by thinking about points and lines in 2D/3D. Here's a very contrived pair of examples...

    [Figure: 2D scatter plot]

    This is a plot of a linearly inseparable problem. There is no straight line that can separate the red and blue points.

    [Figure: 3D scatter plot]

    However, if we give each point an extra coordinate, specifically z = 1 - sqrt(x*x + y*y) (I told you it was contrived), the problem becomes linearly separable: the red and blue points can be separated by the two-dimensional plane z = 0.
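The lifting step can be sketched as follows. The data behind the plots is not available, so this assumes the red points lie inside the unit circle and the blue points outside it; the `ring` and `lift` helpers and the radii 0.5 and 1.5 are hypothetical choices for illustration only.

```python
import math


def ring(radius, n):
    """Return n evenly spaced 2D points on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def lift(point):
    """Add the extra coordinate z = 1 - sqrt(x*x + y*y) to a 2D point."""
    x, y = point
    return (x, y, 1.0 - math.sqrt(x * x + y * y))


inner = ring(0.5, 8)  # assumed "red" class, inside the unit circle
outer = ring(1.5, 8)  # assumed "blue" class, outside the unit circle

inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]

# After lifting, the plane z = 0 separates the two classes.
print(all(z > 0 for z in inner_z), all(z < 0 for z in outer_z))
```

In the original two dimensions the inner ring is surrounded by the outer one, so no straight line separates them; in the lifted space the sign of z alone is enough.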

    Hopefully, these examples demonstrate part of the idea behind the kernel trick:

    Mapping a problem into a space with a larger number of dimensions makes it more likely that the problem will become linearly separable.

    The second idea behind the kernel trick (and the reason it is so tricky) is that working in a very high-dimensional space is usually awkward and computationally expensive. However, if an algorithm only uses the dot products between points (which you can think of as a measure of similarity, closely related to distances), then you only have to work with a matrix of scalars, one entry per pair of points. You can perform the calculations implicitly in the higher-dimensional space without ever doing the mapping or handling the higher-dimensional data.
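To make the "implicit" part concrete, here is a small sketch using a degree-2 polynomial kernel, k(a, b) = (a . b)^2. This kernel is not the mapping from the contrived example above; it is swapped in because its explicit feature map is easy to write down. The names `phi` and `poly_kernel` and the sample points are assumptions for illustration.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(p):
    """Explicit map from 2D to 3D: (x, y) -> (x^2, sqrt(2)*x*y, y^2)."""
    x, y = p
    return (x * x, math.sqrt(2.0) * x * y, y * y)


def poly_kernel(a, b):
    """Degree-2 polynomial kernel, computed entirely in the original 2D space."""
    return dot(a, b) ** 2


points = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]

# Matrix of scalars from the kernel: no higher-dimensional vectors are built.
gram_implicit = [[poly_kernel(a, b) for b in points] for a in points]

# The same matrix, computed the expensive way via the explicit mapping.
gram_explicit = [[dot(phi(a), phi(b)) for b in points] for a in points]

same = all(
    math.isclose(gi, ge, rel_tol=1e-9, abs_tol=1e-9)
    for row_i, row_e in zip(gram_implicit, gram_explicit)
    for gi, ge in zip(row_i, row_e)
)
print(same)
```

Both matrices agree up to floating-point rounding, so an algorithm that only needs pairwise dot products can use `poly_kernel` and never construct `phi(p)`. For kernels whose feature space is very large or infinite-dimensional, that shortcut is what makes the computation feasible.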
