Question
I have an encoded dataframe. I encoded it with the LabelEncoder from scikit-learn, created a machine learning model and made some predictions. But now I cannot decode the values in the pandas dataframe for the outputs. I tried several times with inverse_transform from the docs, but I keep getting errors like
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is what my dataframe looks like:
0 147 14931 9 0 0 1 0 0 0 4 ... 0 0 242 677 0 94 192 27 169 20
1 146 14955 15 1 0 0 0 0 0 0 ... 0 1 63 42 0 94 192 27 169 20
2 145 15161 25 1 0 0 0 1 0 5 ... 0 0 242 677 0 94 192 27 169 20
This is the code I use to encode it, in case it is needed:
labelEncoder = preprocessing.LabelEncoder()
for col in b.columns:
    b[col] = labelEncoder.fit_transform(b[col])
The column names are not important. I also tried the lambda function shown in another question here, but it still doesn't work. What am I doing wrong? Thanks for your help!
Edit: After implementing Vivek Kumar's code, I get the following error:
KeyError: 'Predicted_Values'
That is a column I added to the dataframe just to hold the predicted values. I add it in the following way:
b = pd.concat([X_test, y_test], axis=1) # features and actual predicted values
b['Predicted_Values'] = y_predict
This is how I drop the target column (y) from the dataframe and fit the estimator:
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X = b.drop(['Activity_Profile'],axis=1)
y = b['Activity_Profile']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state=0)
model = tree.DecisionTreeClassifier()
model = model.fit(X_train, y_train)
Answer 1:
You can look at my answer here to see the proper usage of LabelEncoder for multiple columns:
Why does sklearn preprocessing LabelEncoder inverse_transform apply from only one column?
The explanation is that LabelEncoder only supports one-dimensional input, and every call to fit_transform refits it, so an encoder reused in a loop only remembers the classes of the last column. Each column therefore needs its own LabelEncoder object, which can then be used to inverse-transform that particular column only.
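To make the problem concrete, here is a minimal sketch of what happens when one encoder is reused across columns. The dataframe `df` and its column names are made up for illustration and are not from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-column frame, used only for illustration.
df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size": ["S", "M", "L"]})

shared = LabelEncoder()
for col in df.columns:
    # Each fit_transform call refits the encoder and replaces its classes_.
    df[col] = shared.fit_transform(df[col])

# Only the classes of the last fitted column ("size") are left,
# so the "color" column can no longer be decoded with this encoder.
remembered_classes = [str(c) for c in shared.classes_]
print(remembered_classes)
```

Since the labels of "color" are gone from the encoder, calling inverse_transform on that column would either fail or silently return labels from "size".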
You can use a dictionary of LabelEncoder objects to convert multiple columns. Something like this:
labelencoder_dict = {}
for col in b.columns:
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col] = labelEncoder
While decoding, you can just use:
for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])
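The two loops above can also be wrapped into a pair of helpers and checked with a round trip. This is only a sketch: `encode_columns`, `decode_columns` and the sample data are hypothetical names introduced here, not part of scikit-learn or of the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_columns(frame):
    """Return an encoded copy of frame plus one fitted encoder per column."""
    encoded = frame.copy()
    encoders = {}
    for col in encoded.columns:
        encoders[col] = LabelEncoder()
        encoded[col] = encoders[col].fit_transform(encoded[col])
    return encoded, encoders


def decode_columns(frame, encoders):
    """Return a copy of frame with every column decoded by its own encoder."""
    decoded = frame.copy()
    for col in decoded.columns:
        decoded[col] = encoders[col].inverse_transform(decoded[col])
    return decoded


# Hypothetical data; the column names are placeholders.
original = pd.DataFrame({"color": ["red", "blue", "red"],
                         "size": ["S", "M", "L"]})

encoded, encoders = encode_columns(original)
decoded = decode_columns(encoded, encoders)

# The decoded frame should hold the same labels as the original one.
round_trip_ok = decoded.values.tolist() == original.values.tolist()
print(round_trip_ok)
```

Working on copies keeps the original frame intact, which makes it easy to compare the decoded result against it.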
Update:
Now that you have added the column you are using as y, here's how you can decode it (assuming you have added the 'Predicted_Values' column to the dataframe):
for col in b.columns:
    # Skip the predicted column here; it has no encoder of its own
    if col != 'Predicted_Values':
        b[col] = labelencoder_dict[col].inverse_transform(b[col])

# Use the original `y` (Activity_Profile) encoder on the predicted data
b['Predicted_Values'] = labelencoder_dict['Activity_Profile'].inverse_transform(
    b['Predicted_Values'])
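For reference, here is a small end-to-end sketch of the whole flow: encode each column, train, predict, then decode. Only the names 'Activity_Profile' and 'Predicted_Values' come from the question; the "weather" feature, the sample data and the variable names `raw`, `encoders` and `result` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: one feature column and the target column.
raw = pd.DataFrame({
    "weather": ["sun", "rain"] * 4,
    "Activity_Profile": ["walk", "rest"] * 4,
})

# One encoder per column, kept in a dictionary for later decoding.
encoders = {}
b = raw.copy()
for col in b.columns:
    encoders[col] = LabelEncoder()
    b[col] = encoders[col].fit_transform(b[col])

X = b.drop(["Activity_Profile"], axis=1)
y = b["Activity_Profile"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Features, true target and predictions side by side, still encoded.
result = pd.concat([X_test, y_test], axis=1)
result["Predicted_Values"] = model.predict(X_test)

for col in result.columns:
    # Predictions share the label space of the target, so reuse its encoder.
    key = "Activity_Profile" if col == "Predicted_Values" else col
    result[col] = encoders[key].inverse_transform(result[col])

print(result)
```

The predicted column never gets an encoder of its own, because its values are encoded labels of 'Activity_Profile'.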
Source: https://stackoverflow.com/questions/47217821/decode-pandas-dataframe