With scipy.stats.linregress I am performing a simple linear regression on some sets of highly correlated x,y experimental data, and initially visually inspecting each x,y scatter plot for outliers.
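A minimal sketch of that baseline fit (the data here is invented purely for illustration):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

result = stats.linregress(x, y)
print(result.slope, result.intercept, result.rvalue)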
It is also possible to limit the effect of outliers using scipy.optimize.least_squares. In particular, take a look at the f_scale parameter:
Value of soft margin between inlier and outlier residuals, default is 1.0. ... This parameter has no effect with loss='linear', but for other loss values it is of crucial importance.
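To see what that soft margin does: per the least_squares docstring, the loss is evaluated as C**2 * rho(residual**2 / C**2) with C = f_scale, so residuals below f_scale are penalized roughly quadratically while larger ones contribute only roughly linearly (for soft_l1). A rough numeric sketch of that behaviour (my own illustration, not from the docs):

import numpy as np

# soft_l1 uses rho(z) = 2*(sqrt(1 + z) - 1); f_scale enters as the scaling C
def soft_l1_cost(r, f_scale):
    z = (r / f_scale) ** 2
    return f_scale ** 2 * 2.0 * (np.sqrt(1.0 + z) - 1.0)

r = np.array([0.05, 0.1, 1.0, 10.0])
print(soft_l1_cost(r, f_scale=0.1))  # ~r**2 below f_scale, ~2*f_scale*r above it
print(r ** 2)                        # plain least-squares cost, for comparison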
The scipy cookbook tutorial linked at the end of this answer compares three fits: plain least_squares, and two robust variants that use f_scale:
from scipy.optimize import least_squares

# fun(x, t, y) returns the model residuals; x0 is the initial parameter guess
res_lsq = least_squares(fun, x0, args=(t_train, y_train))
res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(t_train, y_train))
res_log = least_squares(fun, x0, loss='cauchy', f_scale=0.1, args=(t_train, y_train))
As can be seen, the plain least-squares fit is affected far more strongly by outliers, so it can be worth experimenting with different loss functions in combination with different values of f_scale. The possible loss functions are (taken from the documentation; a self-contained sketch comparing ‘linear’ and ‘soft_l1’ follows the list):
‘linear’ : Gives a standard least-squares problem.
‘soft_l1’: The smooth approximation of l1 (absolute value) loss. Usually a good choice for robust least squares.
‘huber’ : Works similarly to ‘soft_l1’.
‘cauchy’ : Severely weakens outliers influence, but may cause difficulties in optimization process.
‘arctan’ : Limits a maximum loss on a single residual, has properties similar to ‘cauchy’.
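As a self-contained illustration of that difference, here is a sketch fitting a straight line to data with a few gross outliers (the data, model, and f_scale value are my own choices, not from the tutorial):

import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)
y = 2.0 * t + 1.0 + rng.normal(scale=0.2, size=t.size)
y[[5, 20, 35]] += 15.0  # inject three gross outliers

def residuals(p, t, y):
    # p = (slope, intercept); least_squares minimizes the robust cost of these
    return p[0] * t + p[1] - y

p0 = np.array([1.0, 0.0])
fit_plain = least_squares(residuals, p0, args=(t, y))
fit_robust = least_squares(residuals, p0, loss='soft_l1', f_scale=0.5, args=(t, y))

print('linear :', fit_plain.x)   # pulled noticeably toward the outliers
print('soft_l1:', fit_robust.x)  # close to the true (2.0, 1.0)

Note that f_scale should be on the order of the inlier noise level (here roughly 0.2-0.5); if it is set much larger than the residuals, the robust losses reduce to plain least squares again.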
The scipy cookbook has a neat tutorial on robust nonlinear regression.