Custom transformer for sklearn Pipeline that alters both X and y

前端未结


I want to create my own transformer for use with the sklearn Pipeline. Hence I am creating a class that implements both fit and transform methods. The purpose of the transfo


3 Answers

抹茶落季

2020-12-15 07:20

Modifying the sample axis, e.g. removing samples, does not (yet?) comply with the scikit-learn transformer API. So if you need to do this, do it as preprocessing, outside any calls to scikit-learn.

As it is now, the transformer API is used to transform the features of a given sample into something new. This can implicitly contain information from other samples, but samples are never deleted.

Another option is to impute the missing values instead. But again, if you need to delete samples, treat that as preprocessing before using scikit-learn.
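
As a minimal sketch of the preprocessing approach described above, the rows can be filtered from X and y together before anything is handed to scikit-learn. The helper name `drop_sparse_rows` and its `max_nans` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def drop_sparse_rows(X, y, max_nans=0):
    """Return copies of X and y without the rows of X holding more than max_nans NaNs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.isnan(X).sum(axis=1) <= max_nans  # boolean mask over samples
    return X[keep], y[keep]

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])

# X_clean and y_clean stay aligned and can then be passed to Pipeline.fit
X_clean, y_clean = drop_sparse_rows(X, y)
```

Because the same mask is applied to both arrays, the samples and their targets cannot drift out of sync.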

伪装坚强ぢ

2020-12-15 07:22
You can solve this easily by using the sklearn.preprocessing.FunctionTransformer class (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html)

You just need to put your alterations to X in a function
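```
import pandas as pd

thresh = 1  # example value: maximum number of NaNs allowed per row, set outside the function

def drop_nans(X, y=None):
    total = X.shape[1]           # number of columns
    new_thresh = total - thresh  # minimum number of non-NaN values a row must have
    df = pd.DataFrame(X)
    df.dropna(thresh=new_thresh, inplace=True)
    return df.values
```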
then you get your transformer by calling
```
from sklearn.preprocessing import FunctionTransformer
transformer = FunctionTransformer(drop_nans, validate=False)
```
which you can use in the pipeline. The threshold can be set outside the drop_nans function.
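
For illustration, here is a hedged usage sketch in which the threshold is passed through the `kw_args` parameter of FunctionTransformer instead of a global. The `max_nans` argument name is an assumption made here, not part of the answer above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def drop_nans(X, y=None, max_nans=0):
    # Keep rows with at most max_nans missing values (max_nans is a hypothetical name).
    df = pd.DataFrame(X)
    return df.dropna(thresh=df.shape[1] - max_nans).values

# kw_args forwards extra keyword arguments to drop_nans on every transform call
transformer = FunctionTransformer(drop_nans, validate=False, kw_args={"max_nans": 1})

X = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

# The middle row has two NaNs and is dropped; the other two rows are kept
X_new = transformer.fit_transform(X)
```

Note that only X is returned here, so y keeps its original length. If such a step sits in front of a supervised estimator in a Pipeline, the row counts of X and y no longer match at fit time.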
挽巷

2020-12-15 07:26
Use deep copies further down the pipeline, so that the original X and y remain protected.

On each call, .fit() can first assign deep copies to new instance attributes
```
self.X_without_NaNs = X.copy()
self.y_without_NaNs = y.copy()
```
and then reduce / transform these copies so that they contain no more NaNs than self.threshold allows.
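
A minimal sketch of this idea, under stated assumptions: the class name `NaNRowFilter` is hypothetical, `threshold` is taken to mean the maximum number of NaNs allowed per row, and the answer does not say how later steps would read the stored copies, so this only shows the storing part.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class NaNRowFilter(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that keeps NaN-filtered copies of X and y."""

    def __init__(self, threshold=0):
        self.threshold = threshold

    def fit(self, X, y=None):
        X_copy = np.array(X, dtype=float)  # np.array copies by default
        keep = np.isnan(X_copy).sum(axis=1) <= self.threshold
        self.X_without_NaNs = X_copy[keep]
        self.y_without_NaNs = None if y is None else np.array(y)[keep]
        return self

    def transform(self, X):
        # X is passed through unchanged, so later steps see the same sample count
        return X

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])
nan_filter = NaNRowFilter(threshold=0).fit(X, y)
```

After fitting, `nan_filter.X_without_NaNs` and `nan_filter.y_without_NaNs` hold the reduced copies, while the X and y passed in are left untouched.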