I want to create my own transformer for use with the sklearn Pipeline. Hence I am creating a class that implements both fit and transform methods. The purpose of the transfo
You can solve this easily by using the sklearn.preprocessing.FunctionTransformer method (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html)
You just need to put your alternations to X in a function
def drop_nans(X, y=None):
total = X.shape[1]
new_thresh = total - thresh
df = pd.DataFrame(X)
df.dropna(thresh=new_thresh, inplace=True)
return df.values
then you get your transformer by calling
transformer = FunctionTransformer(drop_nans, validate=False)
which you can use in the pipeline. The threshold can be set outside the drop_nans function.