ValueError: Number of features of the model must match the input


谎友^ 2020-12-14 12:11

I'm getting this error when trying to predict using a model I built in scikit-learn. I know that there are a bunch of questions about this but mine seems different from the

5 Answers

时光取名叫无心 (OP)

2020-12-14 12:57

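You can use pandas' CategoricalDtype to fix the set of categories to those seen in training. Any value in the test set that is not one of those categories becomes NaN, so get_dummies produces the same columns for both sets.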

Input:

import pandas as pd
import numpy as np
from pandas.api.types import CategoricalDtype

# Create Example Data
train = pd.DataFrame({"text":["A", "B", "C", "D", 'F', np.nan]})
test = pd.DataFrame({"text":["D", "D", np.nan,"B", "E", "T"]})

# Convert columns to category dtype and specify categories for test set
train['text'] = train['text'].astype('category')
test['text'] = test['text'].astype(CategoricalDtype(categories=train['text'].cat.categories))

# Create dummies; dtype=int gives the 0/1 output shown below (pandas >= 2.0 defaults to bool)
pd.get_dummies(test['text'], dummy_na=True, dtype=int)

Output:

| A | B | C | D | F | nan |
|---|---|---|---|---|-----|
| 0 | 0 | 0 | 1 | 0 | 0   |
| 0 | 0 | 0 | 1 | 0 | 0   |
| 0 | 0 | 0 | 0 | 0 | 1   |
| 0 | 1 | 0 | 0 | 0 | 0   |
| 0 | 0 | 0 | 0 | 0 | 1   |
| 0 | 0 | 0 | 0 | 0 | 1   |
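
To tie this back to the ValueError in the title, here is a minimal sketch that encodes both sets with the same dtype and checks that the feature counts agree. The names cat_dtype, X_train and X_test are made up for illustration, and this assumes the mismatch comes from one-hot columns that differ between train and test, which may not be the only cause in your case.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

train = pd.DataFrame({"text": ["A", "B", "C", "D", "F", np.nan]})
test = pd.DataFrame({"text": ["D", "D", np.nan, "B", "E", "T"]})

# Build one dtype from the training categories and reuse it for both sets
# (cat_dtype is an illustrative name, not part of any API)
cat_dtype = CategoricalDtype(categories=train["text"].astype("category").cat.categories)

# Encode train and test the same way; unseen test values ("E", "T") become NaN
X_train = pd.get_dummies(train["text"].astype(cat_dtype), dummy_na=True, dtype=int)
X_test = pd.get_dummies(test["text"].astype(cat_dtype), dummy_na=True, dtype=int)

# The model needs the same number of features at fit and predict time
print(X_train.shape, X_test.shape)
assert X_train.shape[1] == X_test.shape[1]
```

You would then fit the model on X_train and call predict on X_test, so both calls see the same columns in the same order.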
