Difference between a linear problem and a non-linear problem? Essence of Dot-Product and Kernel trick

爱一瞬间的悲伤 2021-01-30 05:30

The kernel trick maps a non-linear problem into a linear problem.

My questions are:
1. What is the main difference between a linear and a non-linear problem? What i

5 answers

清歌不尽 (original poster)

2021-01-30 06:23

When people call a classification problem linear, they usually mean that it is linearly separable. Linearly separable means that the two classes can be separated by a function that is a linear combination of the input variables. For example, with two input variables x1 and x2, there are numbers theta1 and theta2 (plus, in general, a constant offset b) such that the sign of theta1*x1 + theta2*x2 + b is sufficient to predict the class. In two dimensions the decision boundary is a straight line, in 3D it is a plane, and in higher-dimensional spaces it is a hyperplane.
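As a minimal sketch of what "predicting from a linear combination" means, the snippet below classifies a few 2D points by the sign of a weighted sum. The weights, the offset, the points, and the function name `linear_classify` are all made up for illustration; they are not from the question.

```python
import numpy as np

def linear_classify(points, theta, bias=0.0):
    """Label each row of `points` +1 or -1 by the side of the hyperplane it falls on."""
    scores = points @ theta + bias          # theta1*x1 + theta2*x2 + bias for every point
    return np.where(scores >= 0, 1, -1)

# Hypothetical weights: the decision boundary is the straight line x1 + x2 = 1.
theta = np.array([1.0, 1.0])
bias = -1.0

# Hypothetical points, two on each side of that line.
points = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 3.0], [-1.0, 0.5]])
labels = linear_classify(points, theta, bias)
print(labels)  # labels are 1, -1, 1, -1
```

A problem is linearly separable exactly when some choice of `theta` and `bias` labels every training point correctly.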

You can get some kind of intuition about these concepts by thinking about points and lines in 2D/3D. Here's a very contrived pair of examples...

This is a plot of a linearly inseparable problem. There is no straight line that can separate the red and blue points.

However, if we give each point an extra coordinate z (specifically z = 1 - sqrt(x*x + y*y)... I told you it was contrived), the problem becomes linearly separable: the red and blue points can be separated by the plane z = 0. The new coordinate is positive for points less than distance 1 from the origin and negative for points farther away.
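The lifting step can be sketched in a few lines of NumPy. The data here is synthetic and only assumes the layout that the z = 0 plane implies, namely one class inside the unit circle and the other outside it; the helper names `ring_points` and `lift`, the radii, and the sample sizes are illustrative choices, not values from the original plots.

```python
import numpy as np

rng = np.random.default_rng(0)

def ring_points(n, r_min, r_max):
    """Sample n 2D points whose distance from the origin lies in [r_min, r_max)."""
    radius = rng.uniform(r_min, r_max, size=n)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

def lift(points):
    """Append the extra coordinate z = 1 - sqrt(x*x + y*y) to each 2D point."""
    z = 1.0 - np.sqrt(points[:, 0] ** 2 + points[:, 1] ** 2)
    return np.column_stack([points, z])

# Assumed layout: one class near the origin, the other in a ring around it.
inner = ring_points(50, 0.0, 0.8)
outer = ring_points(50, 1.2, 2.0)

inner_3d = lift(inner)
outer_3d = lift(outer)

# In 3D the plane z = 0 separates the two classes.
print(np.all(inner_3d[:, 2] > 0), np.all(outer_3d[:, 2] < 0))
```

No straight line in the original (x, y) plane separates `inner` from `outer`, but a single threshold on the new coordinate does.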

Hopefully, these examples demonstrate part of the idea behind the kernel trick:

Mapping a problem into a space with a larger number of dimensions makes it more likely that the problem will become linearly separable.

The second idea behind the kernel trick (and the reason why it is so tricky) is that working directly in a very high-dimensional space is usually awkward and computationally expensive. However, if an algorithm only uses the dot products between points (which you can think of as a measure of similarity, and from which distances can be derived), then you only have to work with a matrix of scalars: one dot product per pair of points. A kernel function computes those dot products directly from the original coordinates, so you can implicitly perform the calculations in the higher-dimensional space without ever actually doing the mapping or handling the higher-dimensional data.
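To make the "implicit" part concrete, here is a small sketch using the degree-2 polynomial kernel k(a, b) = (a . b)^2, chosen only as a simple example (it is not the mapping from the plots above). For 2D inputs it equals the ordinary dot product after the explicit mapping (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2). The function names and the sample points are illustrative.

```python
import numpy as np

def phi(points):
    """Explicit feature map whose dot product equals (a . b)^2 for 2D inputs."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Degree-2 polynomial kernel, computed in the original 2D space."""
    return (a @ b.T) ** 2

# Hypothetical sample points.
points = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0]])

# Route 1: map every point into 3D, then take all pairwise dot products.
gram_explicit = phi(points) @ phi(points).T

# Route 2: never leave 2D; apply the kernel to the original dot products.
gram_kernel = poly2_kernel(points, points)

print(np.allclose(gram_explicit, gram_kernel))
```

Both routes produce the same matrix of scalars, but the second never constructs the higher-dimensional vectors. The saving grows with the dimension of the feature space, and for some kernels that space is infinite-dimensional, so the explicit route is not available at all.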
