Python: Scaling numbers column by column with pandas

不知归路 2020-12-13 08:54

I have a Pandas data frame 'df' in which I'd like to perform some scalings column by column.

  • In column 'a', I need the maximum number to be 1, the minimum
6 Answers
  • 2020-12-13 09:36

    This is not very elegant, but the following works for this two-column case:

    import pandas as pd
    
    # Create the dataframe
    df = pd.DataFrame({'A': [14, 90, 90, 96, 91], 'B': [103, 107, 110, 114, 114]})
    
    # apply() calls the lambda once per column (axis=0) or once per row (axis=1);
    # x is the whole column or row as a Series.
    # This line scales each column so that its minimum is 0 and its maximum is 1.
    df2 = df.apply(lambda x: (x.astype(float) - x.min()) / (x.max() - x.min()), axis=0)
    
    # Now invert the order of column 'B':
    # use apply again to reverse the scaled numbers, select column 'B' only,
    # and assign it back to column 'B' of the scaled dataframe.
    df2['B'] = df2.apply(lambda x: 1 - x, axis=1)['B']
    

    If I find a more elegant way (for example, using the column index, (0 or 1) mod 2 - 1, to select the sign in the apply operation so it can be done with just one apply command), I'll let you know.
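
    One possible way to avoid the second apply, as a minimal sketch: scale every column with plain DataFrame arithmetic, then flip only the columns that should run from 1 down to 0. The list name `inverted` is made up for illustration and is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'A': [14, 90, 90, 96, 91], 'B': [103, 107, 110, 114, 114]})

# Hypothetical choice for illustration: columns whose scale should run from 1 down to 0.
inverted = ['B']

# Min-max scale every column at once; pandas aligns df.min() and df.max() on column labels.
scaled = (df - df.min()) / (df.max() - df.min())

# Flip only the selected columns so their minimum maps to 1 and their maximum to 0.
scaled[inverted] = 1 - scaled[inverted]
```

    This keeps the per-column decision in one list instead of encoding it in the column position.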

  • 2020-12-13 09:37

    This is how you can do it using sklearn and its preprocessing module. scikit-learn has many preprocessing functions for scaling and centering data.

    In [0]: import pandas as pd
       ...: from sklearn.preprocessing import MinMaxScaler
    
    In [1]: df = pd.DataFrame({'A':[14,90,90,96,91],
                               'B':[103,107,110,114,114]}).astype(float)
    
    In [2]: df
    Out[2]:
          A      B
    0  14.0  103.0
    1  90.0  107.0
    2  90.0  110.0
    3  96.0  114.0
    4  91.0  114.0
    
    In [3]: scaler = MinMaxScaler()
    
    In [4]: df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
    
    In [5]: df_scaled
    Out[5]:
              A         B
    0  0.000000  0.000000
    1  0.926829  0.363636
    2  0.926829  0.636364
    3  1.000000  1.000000
    4  0.939024  1.000000
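
    A fitted MinMaxScaler stores each column's minimum and maximum, so the scaling can be reversed later with `inverse_transform`. The following is a minimal sketch of that round trip on the same data; the variable name `restored` is only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'A': [14, 90, 90, 96, 91],
                   'B': [103, 107, 110, 114, 114]}).astype(float)

scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), index=df.index, columns=df.columns)

# The fitted scaler remembers each column's min and max, so the scaling can be undone.
restored = pd.DataFrame(scaler.inverse_transform(df_scaled),
                        index=df.index, columns=df.columns)
```

    Passing `index=df.index` as well as `columns=df.columns` keeps the row labels when the dataframe does not use the default integer index.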
    
  • 2020-12-13 09:41

    I think Acumenus' comment on this answer should be mentioned explicitly as an answer, as it is a one-liner.

    >>> import pandas as pd
    >>> from sklearn.preprocessing import minmax_scale
    >>> df = pd.DataFrame({'A':[14,90,90,96,91], 'B':[103,107,110,114,114]})
    >>> minmax_scale(df)
    array([[0.        , 0.        ],
           [0.92682927, 0.36363636],
           [0.92682927, 0.63636364],
           [1.        , 1.        ],
           [0.93902439, 1.        ]])
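
    Note that `minmax_scale` returns a NumPy array here, so the row and column labels are lost. If a dataframe is wanted, one option (a sketch, not the only way) is to wrap the result again:

```python
import pandas as pd
from sklearn.preprocessing import minmax_scale

df = pd.DataFrame({'A': [14, 90, 90, 96, 91], 'B': [103, 107, 110, 114, 114]})

# minmax_scale returns a NumPy array, so rebuild a DataFrame with the original labels.
df_scaled = pd.DataFrame(minmax_scale(df), index=df.index, columns=df.columns)
```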
    
  • 2020-12-13 09:44

    You could subtract the min, then divide by the max (beware 0/0, which occurs when a column is constant). Note that after subtracting the min, the new max is the original max - min.

    In [11]: df
    Out[11]:
        a    b
    A  14  103
    B  90  107
    C  90  110
    D  96  114
    E  91  114
    
    In [12]: df -= df.min()  # equivalent to df = df - df.min()
    
    In [13]: df /= df.max()  # equivalent to df = df / df.max()
    
    In [14]: df
    Out[14]:
              a         b
    A  0.000000  0.000000
    B  0.926829  0.363636
    C  0.926829  0.636364
    D  1.000000  1.000000
    E  0.939024  1.000000
    

    To switch the order of a column (from 1 to 0 rather than 0 to 1):

    In [15]: df['b'] = 1 - df['b']
    

    An alternative method is to negate the b column first (df['b'] = -df['b']).
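
    The 0/0 case mentioned above shows up when a column is constant, because its range (max - min) is 0. A minimal sketch of one way to guard against it follows; column 'c' is a hypothetical constant column added for illustration, and mapping it to 0 is a design choice rather than the only sensible result.

```python
import pandas as pd

# Column 'c' is a hypothetical constant column added to trigger the 0/0 case.
df = pd.DataFrame({'a': [14, 90, 96], 'b': [103, 107, 114], 'c': [5, 5, 5]})

shifted = df - df.min()
col_range = df.max() - df.min()

# Replace a zero range with 1 so constant columns become 0 instead of NaN.
scaled = shifted / col_range.replace(0, 1)
```

    Without the `replace(0, 1)` step, column 'c' would come out as NaN.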

  • 2020-12-13 09:44

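    Given a data frame:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'A':[14,90,90,96,91], 'B':[103,107,110,114,114]})
    

    Scale each column to mean 0 and variance 1:

    df.apply(lambda x: (x - np.mean(x)) / np.std(x), axis=0)
    

    Scale each column to the range between 0 and 1 (subtract the min first; dividing by the max alone only sets the maximum to 1):

    df.apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x)), axis=0)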
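
    One detail worth checking for the mean 0 / variance 1 scaling: NumPy's `std` defaults to the population standard deviation (ddof=0), while pandas' `DataFrame.std()` defaults to the sample standard deviation (ddof=1), so the two give slightly different results. A small sketch of the same scaling without apply, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [14, 90, 90, 96, 91], 'B': [103, 107, 110, 114, 114]})

# Population standard deviation (ddof=0), matching NumPy's default.
z_pop = (df - df.mean()) / df.std(ddof=0)

# Sample standard deviation (ddof=1), which is the pandas default for DataFrame.std().
z_sample = (df - df.mean()) / df.std()
```

    Which one is appropriate depends on the use case; the point is only to pick one deliberately.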
    
  • 2020-12-13 09:52

    In case you want to scale only one column in the dataframe, you can do the following:

    from sklearn.preprocessing import MinMaxScaler
    
    scaler = MinMaxScaler()
    # fit_transform expects 2-D input and returns an (n, 1) array; flatten it before assigning
    df['Col1_scaled'] = scaler.fit_transform(df[['Col1']]).ravel()
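
    The same idea extends to a handful of columns while leaving the rest untouched. This is only a sketch: the column names 'Col2' and 'label' and the '_scaled' suffix are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# 'Col1', 'Col2' and 'label' are hypothetical column names used for illustration.
df = pd.DataFrame({'Col1': [14, 90, 96],
                   'Col2': [103, 107, 114],
                   'label': ['x', 'y', 'z']})

cols = ['Col1', 'Col2']
scaler = MinMaxScaler()

# Fit on the selected columns only; the result is an array with one column per entry in cols.
scaled = scaler.fit_transform(df[cols])

# Write each scaled column back under a new name, leaving the originals in place.
for i, col in enumerate(cols):
    df[col + '_scaled'] = scaled[:, i]
```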
    