plot_decision_regions with error “Filler values must be provided when X has more than 2 training features.”

Submitted by 纵然是瞬间 on 2019-12-21 21:29:56

Question


I am trying to draw a 2D plot for an SVC with Bernoulli output.

I converted the text to vectors with average word2vec, standardised the data, and split it into train and test sets. Through grid search I found the best C and gamma (RBF kernel).

from sklearn.svm import SVC
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions

clf = SVC(C=100, gamma=0.0001)
clf.fit(X_train1, y_train)

plot_decision_regions(X_train, y_train, clf=clf, legend=2)

plt.xlabel(X.columns[0], size=14)
plt.ylabel(X.columns[1], size=14)
plt.title('SVM Decision Region Boundary', size=16)

This raises the error: ValueError: y must be a NumPy array. Found

I also tried converting y to a NumPy array. Then it raises: ValueError: y must be an integer array. Found object. Try passing the array as y.astype(np.integer)

Finally, I converted it to an integer array. Now it raises: ValueError: Filler values must be provided when X has more than 2 training features.


Answer 1:


I spent some time on this too, because plot_decision_regions then complained with ValueError: Column(s) [2] need to be accounted for in either feature_index or filler_feature_values. One more parameter is needed to avoid this.

So, say you have 4 features and they come unnamed:

X_train_std.shape[1] == 4

We can refer to each feature by its index: 0, 1, 2, 3. You can only plot 2 features at a time; say you want 0 and 2.

You'll need to specify one additional parameter (on top of those specified in @sos.cott's answer), feature_index, and cover the remaining features with fillers:

value = 1.5
width = 0.75

fig = plot_decision_regions(X_train.values, y_train.values, clf=clf,
              feature_index=[0, 2],                        # these two will be plotted
              filler_feature_values={1: value, 3: value},  # the others are held fixed at this value
              filler_feature_ranges={1: width, 3: width})  # ranges used to select which samples are drawn
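
With more features, writing the two filler dictionaries by hand gets tedious. A minimal sketch of a helper that builds them for every feature not listed in feature_index; make_fillers is a hypothetical name and is not part of mlxtend, and using one shared value and width for all filler features is an assumption carried over from the snippet above:

```python
def make_fillers(n_features, feature_index, value=0.0, width=1.0):
    """Build filler dicts for every feature that is not plotted.

    Hypothetical helper for illustration; not part of mlxtend.
    """
    if len(feature_index) != 2:
        raise ValueError("exactly two features can be plotted")
    # indices of the features that are not on the plot axes
    rest = [i for i in range(n_features) if i not in feature_index]
    filler_values = {i: value for i in rest}
    filler_ranges = {i: width for i in rest}
    return filler_values, filler_ranges


values, ranges = make_fillers(4, feature_index=[0, 2], value=1.5, width=0.75)
print(values)  # {1: 1.5, 3: 1.5}
print(ranges)  # {1: 0.75, 3: 0.75}
```

The two results can then be passed as filler_feature_values=values and filler_feature_ranges=ranges, together with feature_index=[0, 2].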



Answer 2:


You can use PCA to reduce your multi-dimensional data to two dimensions. Then pass the result to plot_decision_regions, and there will be no need for filler values.

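from sklearn.decomposition import PCA
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions

clf = SVC(C=100, gamma=0.0001)
pca = PCA(n_components=2)
X_train2 = pca.fit_transform(X_train)  # project onto the first two principal components
clf.fit(X_train2, y_train)
plot_decision_regions(X_train2, y_train, clf=clf, legend=2)

# the axes are now principal components, not the original columns
plt.xlabel('Principal component 1', size=14)
plt.ylabel('Principal component 2', size=14)
plt.title('SVM Decision Region Boundary', size=16)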



Answer 3:


For the NumPy array problem, you can just do the following (assuming X_train and y_train are still pandas DataFrames):

plot_decision_regions(X_train.values, y_train.values, clf=clf, legend=2)
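
If y_train.values still has object dtype (for example string labels), the "y must be an integer array" error from the question will remain. A minimal sketch of one way to get integer codes, assuming string labels; y_raw and its contents are made up for illustration:

```python
import numpy as np

# Hypothetical labels stored with object dtype, as in the error message
y_raw = np.array(["neg", "pos", "pos", "neg"], dtype=object)

# np.unique returns the sorted class names and, with return_inverse=True,
# an integer code for each sample that indexes into those class names
classes, y_int = np.unique(y_raw, return_inverse=True)

print(classes)      # class names; position = integer code
print(y_int.dtype)  # an integer dtype
```

If the labels are already whole numbers that merely ended up stored as objects, y_train.values.astype(int) should be enough.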

For the filler_feature issue, you have to provide a filler for every feature beyond the two being plotted, so you do the following:

value = 1.5
width = 0.75

fig, ax = plt.subplots()  # create the axes to draw on
plot_decision_regions(X_train.values, y_train.values, clf=clf,
                  filler_feature_values={2: value, 3: value, 4: value},
                  filler_feature_ranges={2: width, 3: width, 4: width},
                  legend=2, ax=ax)

You need to add one filler entry for each feature that is not plotted.



Source: https://stackoverflow.com/questions/52952310/plot-decision-regions-with-error-filler-values-must-be-provided-when-x-has-more
