One-Hot-Encode categorical variables and scale continuous ones simultaneously


情书的邮戳 2020-12-24 14:04

I'm confused because it's going to be a problem if you first do OneHotEncoder and then StandardScaler because the scaler will also scale the colu

4 answers

心在旅途 (original poster)

2020-12-24 14:21

Sure thing. Just separately scale and one-hot-encode the separate columns as needed:

# Import libraries and download example data
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder

dataset = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
print(dataset.head(5))

# Define which columns should be encoded vs scaled
columns_to_encode = ['rank']
columns_to_scale  = ['gre', 'gpa']

# Instantiate encoder/scaler
scaler = StandardScaler()
ohe    = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older versions use sparse=False

# Scale and Encode Separate Columns
scaled_columns  = scaler.fit_transform(dataset[columns_to_scale]) 
encoded_columns =    ohe.fit_transform(dataset[columns_to_encode])

# Concatenate (Column-Bind) Processed Columns Back Together
processed_data = np.concatenate([scaled_columns, encoded_columns], axis=1)
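
The same split can also be expressed with scikit-learn's ColumnTransformer, which applies each transformer to its own columns and concatenates the results in one step. The following is only a minimal sketch, not part of the original answer: the small inline DataFrame is made-up stand-in data that reuses the column names above, and the step names "scale" and "encode" are arbitrary labels chosen for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up stand-in data (assumption) with the same column names as above
dataset = pd.DataFrame({
    "admit": [0, 1, 1, 0],
    "gre": [380, 660, 800, 640],
    "gpa": [3.61, 3.67, 4.00, 3.19],
    "rank": [3, 3, 1, 4],
})

columns_to_encode = ["rank"]
columns_to_scale = ["gre", "gpa"]

# Each transformer only sees the columns listed for it
preprocessor = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), columns_to_scale),
        ("encode", OneHotEncoder(handle_unknown="ignore"), columns_to_encode),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Output columns: scaled gre and gpa first, then the one-hot columns for rank
processed_data = preprocessor.fit_transform(dataset)
print(processed_data.shape)
```

Columns not listed in any transformer (here "admit") are dropped by default; passing remainder="passthrough" would keep them unchanged. Because the fitted preprocessor is a single object, it can be reused on new data with preprocessor.transform(...) or placed inside a Pipeline.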
