Decode pandas dataframe

Question

I have an encoded dataframe. I encoded it with LabelEncoder from scikit-learn, built a machine learning model, and made some predictions. But now I cannot decode the values in the pandas dataframe for the outputs. I tried several times with inverse_transform from the docs, but every time I get errors like:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is what my dataframe looks like:

    0   147 14931   9   0   0   1   0   0   0   4   ... 0   0   242 677 0   94  192 27  169 20
    1   146 14955   15  1   0   0   0   0   0   0   ... 0   1   63  42  0   94  192 27  169 20
    2   145 15161   25  1   0   0   0   1   0   5   ... 0   0   242 677 0   94  192 27  169 20

This is the code I use to encode it, in case it is needed:

labelEncoder = preprocessing.LabelEncoder()
for col in b.columns:
    b[col] = labelEncoder.fit_transform(b[col])

The column names are unnecessary. I also tried it with the lambda function shown in another question here, but it still doesn't work. What am I doing wrong? Thanks for your help!

Edit: After implementing Vivek Kumar's code, I get the following error:

KeyError: 'Predicted_Values'

That's a column I added to the dataframe just to hold the predicted values. I add it in the following way:

b = pd.concat([X_test, y_test], axis=1)  # features and actual target values
b['Predicted_Values'] = y_predict

This is how I drop the target column (the one used as y) from the dataframe and fit the estimator:

from sklearn import tree
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X = b.drop(['Activity_Profile'],axis=1)
y = b['Activity_Profile']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state=0)
model = tree.DecisionTreeClassifier()
model = model.fit(X_train, y_train)

Answer 1:

See my answer to the following question for the proper usage of LabelEncoder with multiple columns:

Why does sklearn preprocessing LabelEncoder inverse_transform apply from only one column?

The explanation is that LabelEncoder only accepts one-dimensional input, and every call to fit or fit_transform replaces the classes it learned before. So each column needs its own LabelEncoder object, which can then be used to inverse-transform that particular column only.
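To see why a single shared encoder fails, here is a minimal sketch on a toy dataframe (the column names `color` and `size` and their values are made up for illustration, not taken from your data). After the loop, the encoder only remembers the classes of the last column, so decoding an earlier column silently returns the wrong labels or raises an error:

```python
import pandas as pd
from sklearn import preprocessing

# Toy data; column names and values are illustrative assumptions
df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size": ["S", "M", "L"]})

shared = preprocessing.LabelEncoder()
encoded = df.copy()
for col in df.columns:
    # Each fit_transform call discards the previously learned classes
    encoded[col] = shared.fit_transform(df[col])

# Only the classes of the last fitted column ("size") survive
last_classes = [str(c) for c in shared.classes_]

# Decoding "color" with the shared encoder yields "size" labels instead
wrong_colors = [str(c) for c in shared.inverse_transform(encoded["color"])]
print(last_classes, wrong_colors)
```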

You can use a dictionary of LabelEncoder objects, keyed by column name, to convert multiple columns. Something like this:

labelencoder_dict = {}
for col in b.columns:
    labelEncoder = preprocessing.LabelEncoder()
    b[col] = labelEncoder.fit_transform(b[col])
    labelencoder_dict[col]=labelEncoder

To decode, you can then use:

for col in b.columns:
    b[col] = labelencoder_dict[col].inverse_transform(b[col])
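As a quick check that this round-trips, here is a small self-contained sketch. The toy dataframe and its column names are assumptions for illustration, and decoding into a separate `decoded` frame rather than overwriting `b` is just one possible choice:

```python
import pandas as pd
from sklearn import preprocessing

# Toy data; column names and values are illustrative assumptions
original = pd.DataFrame({"color": ["red", "blue", "red"],
                         "size": ["S", "M", "L"]})

# Encode: one fitted encoder per column, stored by column name
b = original.copy()
labelencoder_dict = {}
for col in b.columns:
    encoder = preprocessing.LabelEncoder()
    b[col] = encoder.fit_transform(b[col])
    labelencoder_dict[col] = encoder

# Decode into a new frame, so the encoded frame stays available
decoded = pd.DataFrame(
    {col: labelencoder_dict[col].inverse_transform(b[col]) for col in b.columns},
    index=b.index,
)

# Compare plain Python values to sidestep dtype differences
round_trip_ok = decoded.values.tolist() == original.values.tolist()
print(round_trip_ok)
```

Keeping the dictionary around (for example by pickling it next to the model) is what makes decoding possible later.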

Update:-

Now that you have added the column which you are using as y, here's how you can decode it (assuming you have added the 'Predicted_Values' column to the dataframe):

for col in b.columns:
    # Skip the predicted column here; it has no encoder of its own
    if col != 'Predicted_Values':
        b[col] = labelencoder_dict[col].inverse_transform(b[col])

# Use the original `y (Activity_Profile)` encoder on predicted data
b['Predicted_Values'] = labelencoder_dict['Activity_Profile'].inverse_transform(
    b['Predicted_Values'])
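Putting the pieces together, an end-to-end sketch could look like the following. Only `Activity_Profile` and `Predicted_Values` come from the question; the feature columns, the toy values, and the `decode_with` mapping are hypothetical and only there to make the example runnable:

```python
import pandas as pd
from sklearn import preprocessing, tree
from sklearn.model_selection import train_test_split

# Toy data; every name except Activity_Profile and Predicted_Values is made up
raw = pd.DataFrame({
    "device": ["phone", "watch", "phone", "watch", "phone", "watch", "phone", "watch"],
    "place": ["home", "gym", "home", "gym", "home", "gym", "home", "gym"],
    "Activity_Profile": ["rest", "run", "rest", "run", "rest", "run", "rest", "run"],
})

# Encode with one LabelEncoder per column
enc = raw.copy()
labelencoder_dict = {}
for col in enc.columns:
    labelencoder_dict[col] = preprocessing.LabelEncoder()
    enc[col] = labelencoder_dict[col].fit_transform(enc[col])

X = enc.drop(["Activity_Profile"], axis=1)
y = enc["Activity_Profile"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

b = pd.concat([X_test, y_test], axis=1)
b["Predicted_Values"] = model.predict(X_test)

# Predictions live in the same label space as y, so reuse y's encoder
decode_with = {col: col for col in b.columns if col != "Predicted_Values"}
decode_with["Predicted_Values"] = "Activity_Profile"
for col, source in decode_with.items():
    b[col] = labelencoder_dict[source].inverse_transform(b[col])

print(b)
```

The mapping from column to encoder name makes it explicit that `Predicted_Values` has no encoder of its own, which is exactly what the KeyError was pointing at.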

Source: https://stackoverflow.com/questions/47217821/decode-pandas-dataframe

Tags

python

pandas

scikit-learn

decode