I am trying to find the count of distinct values in each column using Pandas. This is what I did.
import pandas as pd
import numpy as np
# Generate data.
NR
Need to segregate only the columns with more than 20 unique values for all the columns in pandas_python:
enter code here
col_with_morethan_20_unique_values_cat=[]
for col in data.columns:
if data[col].dtype =='O':
if len(data[col].unique()) >20:
....col_with_morethan_20_unique_values_cat.append(data[col].name)
else:
continue
print(col_with_morethan_20_unique_values_cat)
print('total number of columns with more than 20 number of unique value is',len(col_with_morethan_20_unique_values_cat))
# The o/p will be as:
['CONTRACT NO', 'X2','X3',,,,,,,..]
total number of columns with more than 20 number of unique value is 25
Already some great answers here :) but this one seems to be missing:
df.apply(lambda x: x.nunique())
As of pandas 0.20.0, DataFrame.nunique()
is also available.