ValueError: Number of features of the model must match the input

谎友^ 2020-12-14 12:11

I'm getting this error when trying to predict using a model I built in scikit-learn. I know that there are a bunch of questions about this but mine seems different from the

5 answers
  •  时光取名叫无心
    2020-12-14 12:57

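    You can use the pandas CategoricalDtype to fix the set of categories from the training data. Any value in the test data that was not seen during training then becomes null, so the dummy columns of the test set match those of the training set.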

    Input:

    import pandas as pd
    import numpy as np
    from pandas.api.types import CategoricalDtype
    
    # Create Example Data
    train = pd.DataFrame({"text":["A", "B", "C", "D", 'F', np.nan]})
    test = pd.DataFrame({"text":["D", "D", np.nan,"B", "E", "T"]})
    
    # Convert columns to category dtype and specify categories for test set
    train['text'] = train['text'].astype('category')
    test['text'] = test['text'].astype(CategoricalDtype(categories=train['text'].cat.categories))
    
    # Create Dummies
    # dtype=int keeps the 0/1 output shown below; pandas 2.0+ returns booleans by default
    pd.get_dummies(test['text'], dummy_na=True, dtype=int)
    

    Output:

    | A | B | C | D | F | nan |
    |---|---|---|---|---|-----|
    | 0 | 0 | 0 | 1 | 0 | 0   |
    | 0 | 0 | 0 | 1 | 0 | 0   |
    | 0 | 0 | 0 | 0 | 0 | 1   |
    | 0 | 1 | 0 | 0 | 0 | 0   |
    | 0 | 0 | 0 | 0 | 0 | 1   |
    | 0 | 0 | 0 | 0 | 0 | 1   |
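
To connect this back to the original error, here is a minimal sketch that applies the same fixed categories to both the training and the test frame, so the two encoded matrices end up with the same number of features. The helper name `encode` and the variable `text_dtype` are hypothetical and only for illustration. It assumes a single categorical column named `text`, as in the example data above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Same example data as above
train = pd.DataFrame({"text": ["A", "B", "C", "D", "F", np.nan]})
test = pd.DataFrame({"text": ["D", "D", np.nan, "B", "E", "T"]})

# Build the dtype once from the training data and reuse it everywhere
text_dtype = CategoricalDtype(categories=sorted(train["text"].dropna().unique()))


def encode(frame):
    """Hypothetical helper: one-hot encode `text` using the training categories."""
    col = frame["text"].astype(text_dtype)  # unseen values become NaN
    return pd.get_dummies(col, dummy_na=True, dtype=int)


X_train = encode(train)
X_test = encode(test)

# Both frames now have one column per training category plus a NaN column
print(X_train.shape, X_test.shape)
print(X_train.columns.equals(X_test.columns))
```

Because the column layout is decided by the training categories alone, a model fitted on `X_train` should receive the same number of features from `X_test`, whatever values appear at prediction time.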
    
