fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer


时光取名叫无心

I am totally new to machine learning and have been working with an unsupervised learning technique.

The image shows my sample data (after all cleaning). Screenshot: Sample


13 answers

南旧

2020-12-07 16:45
I believe your example is from the book Hands-On Machine Learning with Scikit-Learn & TensorFlow. I ran into this problem as well. A change in scikit-learn 0.19.0 altered LabelBinarizer's fit_transform method, and LabelBinarizer was never intended to work the way that example uses it. You can see information about the change here and here.

Until they come up with a solution for this, you can install the previous version (0.18.0) as follows:
```
$ pip install scikit-learn==0.18.0
```
After running that, your code should run without issue.

In the future, it looks like the correct solution may be to use a CategoricalEncoder class or something similar to that. They have been trying to solve this problem for years apparently. You can see the new class here and further discussion of the problem here.
感动是毒

2020-12-07 16:45

Forget LabelBinarizer and use OneHotEncoder instead.

If you were using a LabelEncoder before OneHotEncoder to convert categories to integers, you can now use OneHotEncoder directly.
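
As a minimal sketch of that suggestion (assuming scikit-learn 0.20 or later, where OneHotEncoder accepts string categories; the sample array `X_cat` is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column; OneHotEncoder expects a 2D array.
X_cat = np.array([["male"], ["female"], ["male"]])

encoder = OneHotEncoder()
# The default output is a SciPy sparse matrix; toarray() makes it dense.
one_hot = encoder.fit_transform(X_cat).toarray()

print(encoder.categories_)  # categories learned for each column, sorted
print(one_hot)
```

No LabelEncoder step is needed here: the encoder learns the string categories itself and exposes them through `categories_`.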


无人共我

2020-12-07 16:46

I ended up rolling my own:

```
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LabelBinarizer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        X = self.prep(X)
        # Remember the sorted unique values of every column.
        self.unique_vals = [np.unique(column) for column in X.T]
        return self  # scikit-learn convention: fit returns the estimator

    def transform(self, X, y=None):
        X = self.prep(X)
        unique_vals = self.unique_vals
        new_columns = []
        for i, column in enumerate(X.T):
            num_uniq_vals = len(unique_vals[i])
            # Map each category to its integer index.
            encoder_ring = dict(zip(unique_vals[i], range(num_uniq_vals)))
            # np.int was removed from NumPy; the builtin int replaces it.
            f = np.vectorize(lambda val: encoder_ring[val], otypes=[int])
            new_column = np.array([f(column)])
            if num_uniq_vals <= 2:
                # Binary columns stay a single 0/1 column.
                new_columns.append(new_column)
            else:
                one_hots = np.zeros([num_uniq_vals, len(column)], int)
                one_hots[new_column, range(len(column))] = 1
                new_columns.append(one_hots)
        return np.concatenate(new_columns, axis=0).T

    def fit_transform(self, X, y=None):
        self.fit(X)
        return self.transform(X)

    @staticmethod
    def prep(X):
        # Accept arrays, Series, or DataFrames; 1D input becomes one column.
        X = np.asarray(X)
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        return X
```

It seems to work:

```
lbn = LabelBinarizer()
thingy = np.array([['male','male','female', 'male'], ['A', 'B', 'A', 'C']]).T
lbn.fit(thingy)
lbn.transform(thingy)
```

returns

```
array([[1, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 1]])
```


别那么骄傲

2020-12-07 16:50

Simply define the following class just before your pipeline:

```
from sklearn.preprocessing import LabelBinarizer

class NewLabelBinarizer(LabelBinarizer):
    def fit(self, X, y=None):
        return super(NewLabelBinarizer, self).fit(X)
    def transform(self, X, y=None):
        return super(NewLabelBinarizer, self).transform(X)
    def fit_transform(self, X, y=None):
        return super(NewLabelBinarizer, self).fit(X).transform(X)
```

The rest of the code stays as in the book, with one small modification to cat_pipeline before the pipelines are concatenated:

```
cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attribs)),
    ("label_binarizer", NewLabelBinarizer())])
```

You're done!


离开以前

2020-12-07 16:52
I think you are going through the examples from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow. I ran into the same problem when going through the example in Chapter 2.

As other people have mentioned, the problem lies with sklearn's LabelBinarizer. Its fit_transform method takes fewer arguments than those of the other transformers in the pipeline: it accepts only y, whereas other transformers normally take both X and y (see here for details). That is why, when we run pipeline.fit_transform, we feed more arguments into this transformer than it accepts.
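
The mismatch can be checked directly. The following is only a sketch, assuming scikit-learn 0.19 or later; the array `X_cat` is a made-up stand-in for the selected categorical column, and the direct call imitates how a pipeline passes both X and y to each step:

```python
import inspect

import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# LabelBinarizer targets labels, so its fit_transform only accepts y.
label_params = list(inspect.signature(LabelBinarizer.fit_transform).parameters)
# Feature transformers such as OneHotEncoder accept X and an optional y.
onehot_params = list(inspect.signature(OneHotEncoder.fit_transform).parameters)
print(label_params)
print(onehot_params)

# Hypothetical stand-in for the selected categorical column.
X_cat = np.array([["INLAND"], ["NEAR BAY"], ["INLAND"]])

# A pipeline calls step.fit_transform(X, y); imitate that call here.
try:
    LabelBinarizer().fit_transform(X_cat, None)
    raised = False
except TypeError as err:
    raised = True
    print(err)
```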

An easy fix I used is to switch to OneHotEncoder and set "sparse" to False, so that the output is a NumPy array like the num_pipeline output (this way you don't need to write your own custom encoder). Note that scikit-learn 1.2 renamed this parameter to "sparse_output".

Your original cat_pipeline:
```
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])
```
You can simply change this part to:
```
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('one_hot_encoder', OneHotEncoder(sparse=False))
])
```
You can go from here and everything should work.
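
Because newer scikit-learn releases (1.2 and later) use the keyword `sparse_output` instead of `sparse`, a version-tolerant variant might look like the sketch below. The helper `make_dense_one_hot_encoder` and the sample array are hypothetical, and the selector step is left out so the snippet runs on its own:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_dense_one_hot_encoder():
    """Hypothetical helper: a dense-output OneHotEncoder on old and new versions."""
    try:
        # scikit-learn 1.2 and later
        return OneHotEncoder(sparse_output=False)
    except TypeError:
        # older releases only know the `sparse` keyword
        return OneHotEncoder(sparse=False)


cat_pipeline = Pipeline([
    ("one_hot_encoder", make_dense_one_hot_encoder()),
])

# Made-up categorical column standing in for the selected attributes.
X_cat = np.array([["A"], ["B"], ["A"], ["C"]])
encoded = cat_pipeline.fit_transform(X_cat)
print(encoded)
```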
小蘑菇

2020-12-07 16:52
I had the same issue and resolved it by using DataFrameMapper (this requires installing sklearn_pandas):
```
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer
from sklearn_pandas import DataFrameMapper

cat_pipeline = Pipeline([
    ('label_binarizer', DataFrameMapper([(cat_attribs, LabelBinarizer())])),
])
```