Number of features of the model must match the input. Model n_features is 40 and input n_features is 38

Question

I am getting this error. Please give me any suggestion to resolve it. Here is my code. I am taking the training data from train.csv and the testing data from another file, test.csv. I am new to machine learning, so I could not understand what the problem is.

import quandl, math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
train = pd.read_csv("train.csv", index_col=None)
test = pd.read_csv("test.csv", index_col=None)
vectorizer = CountVectorizer(min_df=1)
X1 = vectorizer.fit_transform(train['question'])
Y1 = vectorizer.fit_transform(test['testing'])
X = X1.toarray()
Y = Y1.toarray()
#print(Y.shape)
number = LabelEncoder()
train['answer'] = number.fit_transform(train['answer'].astype('str'))
features = ['question', 'answer']
y = train['answer']
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X[:25], y)
predicted_result = clf.predict(Y[17])
p_result = number.inverse_transform(predicted_result)
f = open('output.txt', 'w')
t = str(p_result)
f.write(t)
print(p_result)

Answer 1:

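There are multiple problems with your code, but the one related to this question is that you call fit_transform on the CountVectorizer (vectorizer) for both the train and the test data. The second call refits the vocabulary on the test text, so the test matrix ends up with a different set of features (38 columns) from the 40 the model was trained on.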

What you should do is:

X1 = vectorizer.fit_transform(train['question'])

# The following line is changed
Y1 = vectorizer.transform(test['testing'])
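
To make the difference concrete, here is a minimal, self-contained sketch of the same idea. The three small lists are hypothetical stand-ins for train['question'], train['answer'] and test['testing'], since the real CSV files are not shown. It also passes a 2D slice to predict, on the assumption that a single row such as Y[17] would otherwise be rejected as 1D input by recent scikit-learn versions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for train['question'], train['answer'], test['testing'].
train_questions = [
    "what is your name",
    "how old are you",
    "where do you live",
    "what is your age",
]
train_answers = ["name", "age", "place", "age"]
test_questions = ["tell me your name please", "where is your home"]

vectorizer = CountVectorizer(min_df=1)
# fit_transform learns the vocabulary from the training text only.
X = vectorizer.fit_transform(train_questions)
# transform reuses that vocabulary; words unseen in training are ignored.
Y = vectorizer.transform(test_questions)

# Refitting on the test text (the original mistake) gives a different width.
refit_width = CountVectorizer(min_df=1).fit_transform(test_questions).shape[1]
print(X.shape[1], Y.shape[1], refit_width)

number = LabelEncoder()
y = number.fit_transform(train_answers)

# One label per training row, so X and y have the same length.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Y[0:1] keeps the input 2D (one sample, n_features columns).
predicted_result = clf.predict(Y[0:1])
p_result = number.inverse_transform(predicted_result)
print(p_result)
```

With this arrangement the train and test matrices always have the same number of columns, because both are built from the vocabulary learned on the training questions.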

Source: https://stackoverflow.com/questions/44363682/number-of-features-of-the-model-must-match-the-input-model-n-features-is-40-and

Tags

python

machine-learning

scikit-learn

random-forest

sklearn-pandas