Normalise between 0 and 1 ignoring NaN

前端 未结 3 821
一生所求
一生所求 2020-12-09 16:25

For a list of numbers ranging from x to y that may contain NaN, how can I normalise between 0 and 1, ignoring the NaN val

相关标签:
3条回答
  • 2020-12-09 16:36

    Here's a different approach and one that I believe answers the OP correctly, the only difference is this works for a dataframe instead of a list, you can easily put your list in a dataframe as done below. The other options didn't work for me because I needed to store the MinMaxScaler in order to reverse transform after a prediction was made. So instead of passing the entire column to the MinMaxScaler you can filter out NaNs for both the target and the input.

    Solution Example

    import pandas as pd

    import numpy as np

    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler(feature_range=(0, 1))

    d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2]})

    null_index = d['A'].isnull()

    d.loc[~null_index, ['A']] = scaler.fit_transform(d.loc[~null_index, ['A']])

    0 讨论(0)
  • 2020-12-09 16:37

    consider pd.Series s

    s = pd.Series(np.random.choice([3, 4, 5, 6, np.nan], 100))
    s.hist()
    


    Option 1
    Min Max Scaling

    new = s.sub(s.min()).div((s.max() - s.min()))
    new.hist()
    


    NOT WHAT OP ASKED FOR
    I put these in because I wanted to

    Option 2
    sigmoid

    sigmoid = lambda x: 1 / (1 + np.exp(-x))
    
    new = sigmoid(s.sub(s.mean()))
    new.hist()
    


    Option 3
    tanh (hyperbolic tangent)

    new = np.tanh(s.sub(s.mean())).add(1).div(2)
    new.hist()
    

    0 讨论(0)
  • 2020-12-09 16:51

    It seems that sklearn now (June 2020) behaves as you (and me) desire: np.nan is left untouched. (mainly copy pasted from sklearn docs)

    import sklearn
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    sklearn.__version__
    # '0.23.1'
    data = np.array([[-1, 2, 3], [-0.5, 6,3 ], [np.nan, 18, 3 ]])
    print(data)
    #[[-1.   2.   3. ]
    # [-0.5  6.   3. ]
    # [ nan 18.   3. ]]
    scaler = MinMaxScaler()
    data = scaler.fit_transform(data)
    print(data)
    #[[0.   0.   0.  ]
    # [1.   0.25 0.  ]
    # [ nan 1.   0.  ]]
    
    0 讨论(0)
提交回复
热议问题