I am using numpy.log10 to calculate the log of an array of probability values. There are some zeros in the array, and I am trying to get around it using
resu
I solved this by finding the lowest non-zero number in the array and replacing all zeroes with that number.
The resulting code looks like this:
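import numpy as np

def replaceZeroes(data):
    # Smallest non-zero entry of the array
    min_nonzero = np.min(data[np.nonzero(data)])
    # Overwrite the zeros in place (this modifies the caller's array)
    data[data == 0] = min_nonzero
    return data
...
prob = replaceZeroes(prob)
result = np.where(prob > 0.0000000001, np.log10(prob), -10)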
Note that the zeros are replaced with the smallest non-zero value, and that the array is modified in place.
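The replaceZeroes function above writes into the array it is given. If the original values are still needed afterwards, a non-mutating variant might look like the sketch below. The name replace_zeroes_copy and the sample array are made up for illustration, and the sketch assumes the array contains at least one non-zero entry.

```python
import numpy as np

def replace_zeroes_copy(data):
    """Return a new array with zeros replaced by the smallest non-zero value."""
    data = np.asarray(data, dtype=float)
    # Assumes at least one non-zero entry; np.min raises on an empty selection
    min_nonzero = np.min(data[data != 0])
    # np.where builds a new array, so the input is left untouched
    return np.where(data == 0, min_nonzero, data)

prob = np.array([1.0, 0.0, 0.75, 0.75])  # sample data, for illustration only
filled = replace_zeroes_copy(prob)
result = np.log10(filled)  # no zeros left, so no divide-by-zero warning

print(prob)    # the original array still contains its zero
print(filled)
print(result)
```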
You can turn the divide-by-zero warning off with numpy.seterr:
numpy.seterr(divide = 'ignore')
and back on with
numpy.seterr(divide = 'warn')
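If toggling the global setting by hand feels fragile, NumPy also offers the numpy.errstate context manager, which restores the previous settings when the block exits. A minimal sketch, with a sample array assumed for illustration:

```python
import numpy as np

prob = np.array([1.0, 0.0, 0.75, 0.75])  # sample data, for illustration only

# The divide-by-zero warning is silenced only inside the with-block;
# the previous error settings come back automatically on exit.
with np.errstate(divide='ignore'):
    logs = np.log10(prob)  # log10(0) quietly becomes -inf here

# Swap the -inf entries for the desired placeholder value
result = np.where(prob > 0, logs, -10)
print(result)
```

This keeps the suppression local to the one computation that needs it, so genuine divide-by-zero problems elsewhere in the program still produce a warning.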
This solution worked for me: use numpy.seterr to turn the warning off, followed by where.
numpy.seterr(divide = 'ignore')
df_train['feature_log'] = np.where(df_train['feature']>0, np.log(df_train['feature']), 0)
Just use the where argument in np.log10
import numpy as np
np.random.seed(0)
prob = np.random.randint(5, size=4) /4
print(prob)
result = np.where(prob > 0.0000000001, prob, -10)
# print(result)
np.log10(result, out=result, where=result > 0)
print(result)
Output
[1. 0. 0.75 0.75]
[ 0. -10. -0.12493874 -0.12493874]
numpy.log10(prob) calculates the base 10 logarithm for all elements of prob, even the ones that aren't selected by numpy.where, which is why the warning still appears. If you want, you can fill the zeros of prob with 10**-10 or some other dummy value before taking the logarithm to get rid of the problem. (Make sure you evaluate prob > 0.0000000001 on the original values, not on the array with the dummy values filled in, though.)
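The fill-then-log idea described above could be sketched as follows. The variable names mask and safe, the sample array, and the choice of 1e-10 as the dummy value are assumptions made for illustration.

```python
import numpy as np

prob = np.array([1.0, 0.0, 0.75, 0.75])  # sample data, for illustration only

# Build the condition from the ORIGINAL values, before any dummy fill
mask = prob > 0.0000000001

# Replace the rejected entries with a harmless positive dummy value
safe = np.where(mask, prob, 1e-10)

# log10 now only ever sees positive inputs, so no warning is raised;
# the entries that failed the condition are set to -10 explicitly
result = np.where(mask, np.log10(safe), -10)
print(result)
```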