Factorize a column of strings in pandas

北城以北 submitted on 2019-11-27 05:39:02
v

RowX    yes
RowY     no
RowW    yes
RowJ     no
RowA    yes
RowR     no
RowX    yes
RowY    yes
RowW    yes
RowJ    yes
RowA    yes
RowR     no
Name: Column 3, dtype: object
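
The snippets that follow need `v` to exist. The original construction is not shown, so this is only a minimal sketch that rebuilds it from the printed output above, assuming a plain object-dtype Series:

```python
import pandas as pd

# Row labels and values copied from the printed Series (the labels repeat).
labels = ['RowX', 'RowY', 'RowW', 'RowJ', 'RowA', 'RowR'] * 2
values = ['yes', 'no', 'yes', 'no', 'yes', 'no',
          'yes', 'yes', 'yes', 'yes', 'yes', 'no']

# dtype=object is an assumption, chosen to match the "dtype: object" line above.
v = pd.Series(values, index=labels, name='Column 3', dtype=object)
print(v)
```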

pd.factorize

1 - pd.factorize(v)[0]
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
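
The `1 -` is there because `pd.factorize` numbers values in order of first appearance, so 'yes' (seen first) gets code 0 and 'no' gets code 1. A small sketch on a shortened, made-up sample (`s` is hypothetical, not the data above) that shows the raw codes before they are flipped:

```python
import pandas as pd

s = pd.Series(['yes', 'no', 'yes', 'no'], dtype=object)  # hypothetical sample

codes, uniques = pd.factorize(s)
print(codes)          # integer codes, assigned in order of first appearance
print(list(uniques))  # distinct values, in the order they were first seen

flipped = 1 - codes   # invert the codes so that 'yes' maps to 1
print(flipped)
```

The flip only gives a yes=1/no=0 encoding when there are exactly two distinct values and 'yes' occurs first; with a different first row the codes would come out the other way round.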

np.where

np.where(v == 'yes', 1, 0)
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
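
Note that `np.where` returns a plain NumPy array, so the row labels are gone. If they are needed, one possible way to keep them is to wrap the result back into a Series; the sample `s` and the name `labelled` below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample with the same kind of labels as above.
s = pd.Series(['yes', 'no', 'yes'], index=['RowX', 'RowY', 'RowW'], dtype=object)

arr = np.where(s == 'yes', 1, 0)        # plain ndarray, no index
labelled = pd.Series(arr, index=s.index, name=s.name)  # reattach the labels
print(labelled)
```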

pd.Categorical/astype('category')

pd.Categorical(v).codes
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0], dtype=int8)
v.astype('category').cat.codes

RowX    1
RowY    0
RowW    1
RowJ    0
RowA    1
RowR    0
RowX    1
RowY    1
RowW    1
RowJ    1
RowA    1
RowR    0
dtype: int8
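
Here no flip is needed: when the categories are inferred they are sorted, and 'no' sorts before 'yes', so 'no' is 0 and 'yes' is 1. A short sketch on a made-up sample (`s` is hypothetical) that shows the inferred order and, as an option, states it explicitly through the `categories` argument:

```python
import pandas as pd

s = pd.Series(['yes', 'no', 'yes', 'no'], dtype=object)  # hypothetical sample

cat = pd.Categorical(s)
print(list(cat.categories))   # inferred categories, in sorted order
default_codes = cat.codes     # position of each value within those categories

# Spelling out the order makes the mapping independent of how the labels sort.
explicit_codes = pd.Categorical(s, categories=['no', 'yes']).codes
print(explicit_codes)
```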

pd.Series.replace

v.replace({'yes' : 1, 'no' : 0})

RowX    1
RowY    0
RowW    1
RowJ    0
RowA    1
RowR    0
RowX    1
RowY    1
RowW    1
RowJ    1
RowA    1
RowR    0
Name: Column 3, dtype: int64

A fun, generalised version of the above:

v.replace({r'^(?!yes).*$' : 0}, regex=True).astype(bool).astype(int)

RowX    1
RowY    0
RowW    1
RowJ    0
RowA    1
RowR    0
RowX    1
RowY    1
RowW    1
RowJ    1
RowA    1
RowR    0
Name: Column 3, dtype: int64

Anything that is not "yes" is 0.
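
To see this on values other than 'yes' and 'no', here is a small sketch with a made-up sample (`s` is hypothetical). One caveat worth knowing: the pattern only rejects strings that do not start with "yes", so a value such as 'yesterday' is left alone and also ends up as 1.

```python
import pandas as pd

s = pd.Series(['yes', 'no', 'maybe', 'yesterday'], dtype=object)  # hypothetical sample

# Strings that do not start with "yes" are replaced by 0; the remaining
# non-empty strings are truthy, so the bool/int casts turn them into 1.
out = s.replace({r'^(?!yes).*$': 0}, regex=True).astype(bool).astype(int)
print(out.tolist())
```

If only an exact "yes" should count, the lookahead could be tightened to `^(?!yes$).*$`.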
