Suppose I have the following data frame:
     0    1    2
0  new  NaN  NaN
1  new  one  one
2    a    b    c
3  NaN  NaN  NaN
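For anyone who wants to follow along, the frame can be rebuilt roughly like this. This is a minimal sketch that assumes the missing cells are np.nan and the column labels are the default integers 0, 1, 2; the original construction code is not shown in the post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; np.nan marks the missing cells (an assumption,
# since the post only shows the printed frame).
df = pd.DataFrame([
    ["new", np.nan, np.nan],
    ["new", "one", "one"],
    ["a", "b", "c"],
    [np.nan, np.nan, np.nan],
])
print(df)
```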
How would I count the number of unique non-NaN values in each row?
Use a list comprehension with set:
df['num_uniq'] = [len(set(v[pd.notna(v)].tolist())) for v in df.values]
df
     0    1    2  num_uniq
0  new  NaN  NaN         1
1  new  one  one         2
2    a    b    c         3
3  NaN  NaN  NaN         0
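To see why the comprehension masks with pd.notna before building the set, it may help to walk through a single row by hand. The sketch below uses a standalone array standing in for one row of df.values; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# One row as it would appear in df.values (object dtype).
row = np.array(["new", np.nan, np.nan], dtype=object)

mask = pd.notna(row)        # True where the cell is not missing
kept = row[mask].tolist()   # keep only the non-missing values
count = len(set(kept))      # set() collapses duplicates

print(mask.tolist(), kept, count)
```

Without the mask, NaN itself would land in the set and be counted as a value.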
You could do this with stack, groupby, and nunique.
# df.join(df.stack().groupby(level=0).nunique().to_frame('num_uniq'))
df['num_uniq'] = df.stack().groupby(level=0).nunique()
df
     0    1    2  num_uniq
0  new  NaN  NaN       1.0
1  new  one  one       2.0
2    a    b    c       3.0
3  NaN  NaN  NaN       NaN
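The NaN in the last row (and the float column) in the output above comes from stack dropping missing values, so the all-NaN row never reaches groupby. If you would rather get 0 there, one option is to reindex the counts against the original index. This is a sketch rather than part of the original answer, and stack's handling of NaN may differ between pandas versions, so the reindex is written to be harmless either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ["new", np.nan, np.nan],
    ["new", "one", "one"],
    ["a", "b", "c"],
    [np.nan, np.nan, np.nan],
])

# Count unique values per original row label.
counts = df.stack().groupby(level=0).nunique()

# Rows missing from the counts (all-NaN rows) are filled with 0,
# which also keeps the column as integers.
df["num_uniq"] = counts.reindex(df.index, fill_value=0)
print(df)
```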
Yet another option is apply and nunique:
df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
df
     0    1    2  num_uniq
0  new  NaN  NaN         1
1  new  one  one         2
2    a    b    c         3
3  NaN  NaN  NaN         0
Performance
df_ = df
df = pd.concat([df_] * 1000, ignore_index=True)
%timeit df['num_uniq'] = [len(set(v[pd.notna(v)])) for v in df.values]
%timeit df['num_uniq'] = df.stack().groupby(level=0).nunique()
%timeit df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
%timeit df['num_uniq'] = df.nunique(1)
196 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
6.34 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
679 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.21 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
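The last timing uses df.nunique(1), which is DataFrame.nunique called with axis=1. Spelled out on the small example frame, it would look something like the sketch below (the frame is rebuilt here so the snippet runs on its own).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ["new", np.nan, np.nan],
    ["new", "one", "one"],
    ["a", "b", "c"],
    [np.nan, np.nan, np.nan],
])

# axis=1 counts distinct values across each row; NaN is ignored by default.
df["num_uniq"] = df.nunique(axis=1)
print(df)
```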