One-Hot-Encode categorical variables and scale continuous ones simultaneously

情书的邮戳 2020-12-24 14:04

I'm confused because it's going to be a problem if you first do OneHotEncoder and then StandardScaler because the scaler will also scale the colu

4 Answers
  •  心在旅途
    2020-12-24 14:21

    Sure thing. Just separately scale and one-hot-encode the separate columns as needed:

    # Import libraries and download example data
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import StandardScaler, OneHotEncoder
    
    dataset = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
    print(dataset.head(5))
    
    # Define which columns should be encoded vs scaled
    columns_to_encode = ['rank']
    columns_to_scale  = ['gre', 'gpa']
    
    # Instantiate encoder/scaler
    scaler = StandardScaler()
    ohe    = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older versions use sparse=False
    
    # Scale and Encode Separate Columns
    scaled_columns  = scaler.fit_transform(dataset[columns_to_scale]) 
    encoded_columns =    ohe.fit_transform(dataset[columns_to_encode])
    
    # Concatenate (Column-Bind) Processed Columns Back Together
    processed_data = np.concatenate([scaled_columns, encoded_columns], axis=1)
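
The same split can also be written with scikit-learn's `ColumnTransformer`, which applies a different transformer to each group of columns in a single `fit_transform` call. This is a different technique from the manual concatenation above, shown only as a minimal sketch. It assumes scikit-learn 0.20 or later, and the small DataFrame is made-up stand-in data that reuses the column names from binary.csv so the example runs without a download.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in rows (NOT the real binary.csv), same column names
dataset = pd.DataFrame({
    "admit": [0, 1, 1, 1, 0, 1],
    "gre":   [380, 660, 800, 640, 520, 760],
    "gpa":   [3.61, 3.67, 4.00, 3.19, 2.93, 3.00],
    "rank":  [3, 3, 1, 4, 4, 2],
})

columns_to_encode = ["rank"]
columns_to_scale = ["gre", "gpa"]

# Each entry is (name, transformer, columns); the names are arbitrary labels.
# Columns not listed (here "admit") are dropped by default (remainder="drop").
preprocessor = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), columns_to_scale),
        ("encode", OneHotEncoder(), columns_to_encode),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Scaled columns come first, then the one-hot columns, in the order listed
processed_data = preprocessor.fit_transform(dataset)
print(processed_data.shape)
```

Because the fitted `preprocessor` remembers both the scaling statistics and the category list, calling `preprocessor.transform(new_rows)` on later data applies the same scaling and encoding without refitting.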
    
