nan | 易学教程

Count number of non-NaN entries in every column of Dataframe

I have a really big DataFrame, and I was wondering if there is a short (one- or two-line) way to get a count of the non-NaN entries in each column. I don't want to do this one column at a time, as I have close to 1000 columns.

    df1 = pd.DataFrame([(1,2,None),(None,4,None),(5,None,7),(5,None,None)],
                       columns=['a','b','d'], index=['A','B','C','D'])

         a    b    d
    A    1    2  NaN
    B  NaN    4  NaN
    C    5  NaN    7
    D    5  NaN  NaN

Output:

    a: 3
    b: 2
    d: 1

The count() method returns the number of non-NaN values in each column:

    >>> df1.count()
    a    3
    b    2
    d    1
    dtype: int64

Similarly, count(axis=1) returns the number of non-NaN values in each row.
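The sketch below rebuilds the example frame and checks the per-column count against the expected output. It also shows count(axis=1) for rows. The notna().sum() line is an added equivalent formulation and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
    columns=["a", "b", "d"],
    index=["A", "B", "C", "D"],
)

per_column = df1.count()        # non-NaN values in each column
per_row = df1.count(axis=1)     # non-NaN values in each row

# Equivalent: build a boolean "is present" mask and sum it column-wise
per_column_mask = df1.notna().sum()

print(per_column.to_dict())
print(per_row.to_dict())
```

Both forms are vectorised over all columns at once, so about 1000 columns needs no explicit loop.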

Fill in missing pandas data with previous non-missing value, grouped by key

I am dealing with pandas DataFrames like this:

       id    x
    0   1   10
    1   1   20
    2   2  100
    3   2  200
    4   1  NaN
    5   2  NaN
    6   1  300
    7   1  NaN

I would like to replace each NaN 'x' with the previous non-NaN 'x' from a row with the same 'id' value:

       id    x
    0   1   10
    1   1   20
    2   2  100
    3   2  200
    4   1   20
    5   2  200
    6   1  300
    7   1  300

Is there some slick way to do this without manually looping over rows?

You could perform a groupby/forward-fill operation on each group:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1],
                       'x': [10,20,100,200,np.nan,np.nan,300,np.nan]})
    # forward-fill 'x' separately within each 'id' group
    df['x'] = df.groupby(['id'])['x'].ffill()
    print(df)
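The following is a small check of the groupby/forward-fill approach on the question's data. It adds one edge case that is not in the original: a hypothetical group whose first value is NaN. That NaN stays NaN, because there is no earlier value in the group to copy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 1, 2, 1, 1],
    "x": [10, 20, 100, 200, np.nan, np.nan, 300, np.nan],
})

# Forward-fill within each id group; the result keeps the original row order
df["x"] = df.groupby("id")["x"].ffill()

# Hypothetical edge case: a leading NaN in a group has nothing to copy from
edge = pd.DataFrame({"id": [3, 3], "x": [np.nan, 5.0]})
edge["x"] = edge.groupby("id")["x"].ffill()

print(df["x"].tolist())
print(edge["x"].tolist())
```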

remove row with nan value

Let's say, for example, I have this data:

    data <- c(1,2,3,4,5,6,NaN,5,9,NaN,23,9)
    attr(data,"dim") <- c(6,2)
    data
         [,1] [,2]
    [1,]    1  NaN
    [2,]    2    5
    [3,]    3    9
    [4,]    4  NaN
    [5,]    5   23
    [6,]    6    9

Now I want to remove the rows with NaN values in them: rows 1 and 4. But in a dataset of 100,000+ rows I won't know where these rows are, so I need a function that finds them and removes each complete row. Can anybody point me in the right direction?

The function complete.cases tells you which rows contain no missing values, so you can index with it directly:

    data <- matrix(c(1,2,3,4,5,6,NaN,5,9,NaN,23,9), ncol=2)
    data[complete.cases(data), ]
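For comparison, here is a Python/NumPy analogue of the R answer. It is a swapped-in language, not complete.cases itself. It builds the same 6×2 matrix column by column and keeps only the rows with no NaN.

```python
import numpy as np

# Same values as the R example; order="F" fills column by column like R does
data = np.array([1, 2, 3, 4, 5, 6, np.nan, 5, 9, np.nan, 23, 9], dtype=float)
data = data.reshape((6, 2), order="F")

# A row is "complete" when none of its entries is NaN
complete = ~np.isnan(data).any(axis=1)
clean = data[complete]

print(clean)
```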

classifiers in scikit-learn that handle nan/null

I was wondering if there are classifiers in scikit-learn that handle NaN/null values. I thought the random forest regressor handled this, but I got an error when I called predict.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    X_train = np.array([[1, np.nan, 3], [np.nan, 5, 6]])
    y_train = np.array([1, 2])
    clf = RandomForestRegressor().fit(X_train, y_train)  # data goes to fit(), not the constructor
    X_test = np.array([[7, 8, np.nan]])   # predict expects a 2-D array (n_samples, n_features)
    y_pred = clf.predict(X_test)          # Fails!

Can I not call predict with missing values in any scikit-learn algorithm?

Edit: Now that I think about this, it makes sense. It's not an issue during training, but when you predict, how do you branch when the variable is null?
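A common workaround is to impute the missing values before the model sees them. This technique is swapped in here and is not taken from the excerpt. The sketch below chains SimpleImputer (column means learned on the training data) with a RandomForestRegressor in one pipeline, so predict applies the same fill to X_test. The n_estimators and random_state values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

X_train = np.array([[1, np.nan, 3], [np.nan, 5, 6]])
y_train = np.array([1, 2])
X_test = np.array([[7, 8, np.nan]])  # one sample, shape (1, 3)

# Replace each NaN with its training-column mean, then fit the forest
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestRegressor(n_estimators=10, random_state=0),
)
model.fit(X_train, y_train)

# The imputer's learned means are reused on X_test before predicting
y_pred = model.predict(X_test)
imputer_means = model.named_steps["simpleimputer"].statistics_
```

Some scikit-learn estimators, such as HistGradientBoostingRegressor and HistGradientBoostingClassifier, accept NaN directly. During training they learn which side of each split the missing values should go to, which answers the "how do you branch" question for those models.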

Force gfortran to stop program at first NaN

To debug my application (Fortran 90), I want to turn all NaNs into signalling NaNs. With default settings, my program runs without any signals and just writes NaN data to a file. I want to find the point where the NaN is generated. If I can recompile the program with signalling NaNs, I will get a SIGFPE signal at the first point where an invalid floating-point operation occurs.

The flag you're looking for is -ffpe-trap=invalid; I usually add ,zero,overflow to check for related floating-point exceptions.

    program nantest
      real :: a, b, c
      a = 1.
      b = 2.
      c = a/b
      print *, c, a, b
      a = 0.
      b = 0.
      c = a/b
      print *, c, a, b
      a = 2.
      b …
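The same idea, turning a quiet NaN into an immediate error at the offending operation, can be sketched in Python with NumPy's np.errstate. This is an analogue chosen for illustration, not gfortran behaviour. The block repeats the 1/2 and 0/0 divisions from the Fortran program.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])

# Quiet behaviour: 0/0 silently yields NaN (warnings suppressed here)
with np.errstate(invalid="ignore"):
    quiet = a / b

# Trapping behaviour, similar in spirit to -ffpe-trap=invalid:
# the invalid 0/0 operation raises instead of producing NaN
trapped = False
try:
    with np.errstate(invalid="raise"):
        a / b
except FloatingPointError:
    trapped = True

print(quiet, trapped)
```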

Why does GCC implement isnan() more efficiently for C++ <cmath> than C <math.h>?

Here's my code:

    int f(double x)
    {
        return isnan(x);
    }

If I #include <cmath>, I get this assembly:

    xorl    %eax, %eax
    ucomisd %xmm0, %xmm0
    setp    %al

This is reasonably clever: ucomisd sets the parity flag if the comparison of x with itself is unordered, meaning x is NaN. Then setp copies the parity flag into the result (only a single byte, hence the initial clear of %eax).

But if I #include <math.h>, I get this assembly:

    jmp     __isnan

Now the code is not inlined, and the __isnan function is certainly no faster than the ucomisd instruction, so we have incurred a jump for no benefit. I get the same thing
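To make the inlined trick concrete, the sketch below compares three Python NaN tests. The first is self-comparison, the property that ucomisd/setp exploits. The second inspects the IEEE 754 binary64 bit pattern, where NaN has all exponent bits set and a nonzero fraction. The third is the library math.isnan. These are illustrative Python stand-ins; this is not how GCC or glibc implement isnan.

```python
import math
import struct

def isnan_by_self_compare(x: float) -> bool:
    # NaN compares unordered with itself, so x != x is True only for NaN
    return x != x

def isnan_by_bits(x: float) -> bool:
    # binary64 layout: 1 sign bit, 11 exponent bits, 52 fraction bits
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and fraction != 0  # inf has fraction == 0

samples = [0.0, -1.5, math.inf, -math.inf, math.nan]
results = [(isnan_by_self_compare(v), isnan_by_bits(v), math.isnan(v)) for v in samples]
print(results)
```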

Comparing NaN values for equality in Javascript

I need to compare two numeric values for equality in JavaScript. The values may also be NaN. I've come up with this code:

    if (val1 == val2 || isNaN(val1) && isNaN(val2)) ...

It works fine, but it looks bloated to me, and I would like to make it more concise. Any ideas?

Anant

Try using Object.is(); it determines whether two values are the same value. Two values are the same if one of the following holds: both undefined; both null; both true or both false; both strings of the same length with the same characters in the same order; both the same object; both numbers and both +0, both -0, both
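The two comparison rules differ on signed zero. The sketch below uses Python as a stand-in for JavaScript; the function names are hypothetical. same_number mirrors the asker's expression, which treats +0 and -0 as equal. same_value mimics Object.is semantics, where NaN equals NaN but +0 and -0 differ.

```python
import math

def same_number(a: float, b: float) -> bool:
    # Mirrors: val1 == val2 || isNaN(val1) && isNaN(val2)
    return a == b or (math.isnan(a) and math.isnan(b))

def same_value(a: float, b: float) -> bool:
    # Object.is-style: NaN equals NaN, but +0.0 and -0.0 are distinct
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

print(same_number(0.0, -0.0), same_value(0.0, -0.0))
```

If the original `==` behaviour for zeros matters, Object.is is not a drop-in replacement.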

Elegant way to create empty pandas DataFrame with NaN of type float

I want to create a pandas DataFrame filled with NaNs. During my research I found an answer:

    import pandas as pd
    df = pd.DataFrame(index=range(0, 4), columns=['A'])

This code results in a DataFrame filled with NaNs of type "object", so they cannot be used later on, for example with the interpolate() method. Therefore, I created the DataFrame with this complicated code (inspired by this answer):

    import pandas as pd
    import numpy as np
    dummyarray = np.empty((4, 1))
    dummyarray[:] = np.nan
    df = pd.DataFrame(dummyarray)

This results in a DataFrame filled with NaN of type "float", so it can be used
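Two shorter constructions should give a float64 frame of NaNs. The first broadcasts the scalar np.nan over the index and columns. The second passes dtype=float explicitly. The sketch then sets two endpoints and interpolates to confirm the column behaves numerically.

```python
import numpy as np
import pandas as pd

# Broadcast a scalar NaN (a float) over the index/columns -> float64 column
df_a = pd.DataFrame(np.nan, index=range(4), columns=["A"])

# Or ask for the dtype explicitly on an otherwise empty frame
df_b = pd.DataFrame(index=range(4), columns=["A"], dtype=float)

# Float NaNs work with numeric methods such as interpolate()
df_a.loc[0, "A"] = 1.0
df_a.loc[3, "A"] = 4.0
filled = df_a.interpolate()

print(df_b.dtypes, filled["A"].tolist())
```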

Remove NaN/NULL columns in a Pandas dataframe?

Question: I have a DataFrame in pandas, and several of the columns have all null values. Is there a built-in function that will let me remove those columns?

Answer 1: Yes, dropna. See http://pandas.pydata.org/pandas-docs/stable/missing_data.html and the DataFrame.dropna docstring:

    Definition: DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None)
    Docstring:
    Return object with labels on given axis omitted where alternately any
    or all of the data are missing

    Parameters
    ----------
    axis : {0, 1}
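Applied to the question, axis=1 targets columns and how="all" drops a column only when every value in it is missing. The sketch uses a small made-up frame. It also shows that the default how="any" behaves very differently: it removes every column that contains even one NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "keep": [1.0, np.nan, 3.0],
    "all_null": [np.nan, np.nan, np.nan],
    "also_keep": [np.nan, np.nan, 5.0],
})

# Drop only the columns in which every value is missing
cleaned = df.dropna(axis=1, how="all")

# Default how="any": drops every column containing at least one NaN
strict = df.dropna(axis=1)

print(cleaned.columns.tolist(), strict.columns.tolist())
```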

What is the difference between (NaN != NaN) and (NaN !== NaN)?

First of all, I want to mention that I know how isNaN() and Number.isNaN() work. I am reading JavaScript: The Definitive Guide by David Flanagan, and he gives an example of how to check whether a value is NaN:

    x !== x

This will result in true if and only if x is NaN.

But now I have a question: why does he use strict comparison? It seems that x != x behaves the same way. Is it safe to use both versions, or am I missing some value(s) in JavaScript that will return true for x !== x and false for x != x?

First, let me point out that NaN is a very special value: by definition, it's not equal to itself.
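The x !== x idiom has a practical advantage: it works on values of any type. The sketch below uses Python as a stand-in for JavaScript, with a hypothetical helper name. A self-inequality test returns False for strings and None without raising, whereas math.isnan raises TypeError on non-numbers.

```python
import math

def is_nan_value(x) -> bool:
    # Only NaN is unequal to itself; no type check is needed first
    return x != x

values = [math.nan, 0.0, "NaN", None, float("inf")]
flags = [is_nan_value(v) for v in values]

print(flags)
```

This assumes ordinary built-in values. Objects that override equality, such as NumPy arrays, do not return a single boolean here.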