Normalise between 0 and 1 ignoring NaN

前端未结

关注

 3  825

For a list of numbers ranging from x to y that may contain NaN, how can I normalise between 0 and 1, ignoring the NaN val

相关标签:

3条回答

深忆病人

2020-12-09 16:36

Here's a different approach and one that I believe answers the OP correctly, the only difference is this works for a dataframe instead of a list, you can easily put your list in a dataframe as done below. The other options didn't work for me because I needed to store the MinMaxScaler in order to reverse transform after a prediction was made. So instead of passing the entire column to the MinMaxScaler you can filter out NaNs for both the target and the input.

Solution Example

import pandas as pd

import numpy as np

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))

d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2]})

null_index = d['A'].isnull()

d.loc[~null_index, ['A']] = scaler.fit_transform(d.loc[~null_index, ['A']])

0 讨论(0)
发布评论:

提交评论
- 加载中...

渐次进展

2020-12-09 16:37

consider pd.Series s

s = pd.Series(np.random.choice([3, 4, 5, 6, np.nan], 100))
s.hist()

Option 1
Min Max Scaling

new = s.sub(s.min()).div((s.max() - s.min()))
new.hist()

NOT WHAT OP ASKED FOR
I put these in because I wanted to

Option 2
sigmoid

sigmoid = lambda x: 1 / (1 + np.exp(-x))

new = sigmoid(s.sub(s.mean()))
new.hist()

Option 3
tanh (hyperbolic tangent)

new = np.tanh(s.sub(s.mean())).add(1).div(2)
new.hist()

0 讨论(0)

天命终不由人

2020-12-09 16:51

It seems that sklearn now (June 2020) behaves as you (and me) desire: np.nan is left untouched. (mainly copy pasted from sklearn docs)

import sklearn
import numpy as np
from sklearn.preprocessing import MinMaxScaler
sklearn.__version__
# '0.23.1'
data = np.array([[-1, 2, 3], [-0.5, 6,3 ], [np.nan, 18, 3 ]])
print(data)
#[[-1.   2.   3. ]
# [-0.5  6.   3. ]
# [ nan 18.   3. ]]
scaler = MinMaxScaler()
data = scaler.fit_transform(data)
print(data)
#[[0.   0.   0.  ]
# [1.   0.25 0.  ]
# [ nan 1.   0.  ]]

0 讨论(0)