Python Pandas dataframe find missing values

余生颓废 提交于 2021-02-16 20:18:15

问题


I'm trying to find missing values and then drop off missing values. Tried looking for the data online but can't seem to find the answer.

Extracted Dataframe:

In the df, for 1981 and 1982, it should be '-', i.e. missing values. I would like to find the missing values then drop off the missing values.

Exported Dataframe using isnull:

I used df.isnull() but in 1981 and 1982, it's detected as 'False' which means there's data. But it should be '-', therefore considered as missing values.

I had pasted my code below. What am I missing out?

import pandas as pd

mydf = pd.read_excel('abc.xlsx', sep='\t')

df1 = mydf.set_index('Variables')
df = df1[0:10]
print(df)
print(df.isnull())

回答1:


Missing values are not -.

So for missing values use na_values parameter in read_excel for converting - to missing values NaNs:

mydf = pd.read_excel('abc.xlsx', sep='\t', na_values='-')

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.




回答2:


The question has two points: finding which columns have missing values and drop those values.

To find the missing values on a dataframe df

missing = df.isnull().sum()
print(missing)

To drop those missing values, apart from @jezrael's consideration, if that doesn't help, I suggest you to use dropna:

Drop the rows where all elements are missing.

df.dropna(how='all')

Drop the columns where at least one element is missing.

df.dropna(axis='columns')


来源:https://stackoverflow.com/questions/59694988/python-pandas-dataframe-find-missing-values

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!