How to select rows with one or more nulls from a pandas DataFrame without listing columns explicitly?

后端未结

关注

 5  2174

滥情空心 2020-11-28 00:37

I have a dataframe with ~300K rows and ~40 columns. I want to find out if any rows contain null values - and put these \'null\'-rows into a separate dataframe so that I coul

5条回答

自闭症患者 (楼主)

2020-11-28 01:06

.any() and .all() are great for the extreme cases, but not when you're looking for a specific number of null values. Here's an extremely simple way to do what I believe you're asking. It's pretty verbose, but functional.

import pandas as pd
import numpy as np

# Some test data frame
df = pd.DataFrame({'num_legs':          [2, 4,      np.nan, 0, np.nan],
                   'num_wings':         [2, 0,      np.nan, 0, 9],
                   'num_specimen_seen': [10, np.nan, 1,     8, np.nan]})

# Helper : Gets NaNs for some row
def row_nan_sums(df):
    sums = []
    for row in df.values:
        sum = 0
        for el in row:
            if el != el: # np.nan is never equal to itself. This is "hacky", but complete.
                sum+=1
        sums.append(sum)
    return sums

# Returns a list of indices for rows with k+ NaNs
def query_k_plus_sums(df, k):
    sums = row_nan_sums(df)
    indices = []
    i = 0
    for sum in sums:
        if (sum >= k):
            indices.append(i)
        i += 1
    return indices

# test
print(df)
print(query_k_plus_sums(df, 2))

Output

   num_legs  num_wings  num_specimen_seen
0       2.0        2.0               10.0
1       4.0        0.0                NaN
2       NaN        NaN                1.0
3       0.0        0.0                8.0
4       NaN        9.0                NaN
[2, 4]

Then, if you're like me and want to clear those rows out, you just write this:

# drop the rows from the data frame
df.drop(query_k_plus_sums(df, 2),inplace=True)
# Reshuffle up data (if you don't do this, the indices won't reset)
df = df.sample(frac=1).reset_index(drop=True)
# print data frame
print(df)

Output:

   num_legs  num_wings  num_specimen_seen
0       4.0        0.0                NaN
1       0.0        0.0                8.0
2       2.0        2.0               10.0

0 讨论(0)

查看其它5个回答