import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})
In addition to the answers above: if you have a big dataset with many attributes and don't want to list by hand every column you want dummies for, you can use a set difference:
# Suppose len(df.columns) == 50
non_dummy_cols = ['A', 'B', 'C']
# Takes all 47 other columns (note: a set does not preserve column order)
dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
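As a rough sketch of the same idea on a small frame: the four-column df below is only a stand-in for a wide dataset, and the choice of B and C as the columns to leave alone is an assumption for illustration. A list comprehension is used instead of a set so the dummy columns come out in the original column order.

```python
import pandas as pd

# Small stand-in for a wide frame; column names are illustrative only.
df = pd.DataFrame({
    "A": ["x", "y", "x"],
    "B": ["z", "u", "z"],
    "C": ["1", "2", "3"],
    "D": ["j", "l", "j"],
})

# Columns to leave untouched (assumed for this example).
non_dummy_cols = ["B", "C"]

# Everything else gets encoded; iterating df.columns keeps the original order.
dummy_cols = [col for col in df.columns if col not in non_dummy_cols]

encoded = pd.get_dummies(df, columns=dummy_cols)
print(list(encoded.columns))
```

The untouched columns come first in the result, followed by the dummy columns in the order given in dummy_cols.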
Just select the two columns you want to encode with pd.get_dummies() (each new column name combines the source column and the value it represents as a binary variable), then pd.concat() the result with the original columns you want unchanged:
pd.concat([pd.get_dummies(df[['A', 'D']]), df[['B', 'C']]], axis=1)
   A_x  A_y  D_j  D_l  B  C
0  1.0  0.0  1.0  0.0  z  1
1  0.0  1.0  0.0  1.0  u  2
2  1.0  0.0  1.0  0.0  z  3
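A self-contained variant of the concat approach, as a sketch: the name to_encode is introduced here for illustration, and df.drop(columns=...) is used so the unchanged columns don't have to be listed by hand. Depending on your pandas version, the dummy columns may print as booleans or integers rather than the floats shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "y", "x"],
    "B": ["z", "u", "z"],
    "C": ["1", "2", "3"],
    "D": ["j", "l", "j"],
})

# Columns to turn into dummies (hypothetical name, for illustration).
to_encode = ["A", "D"]

# Encode only the selected columns, then reattach everything else untouched.
dummies = pd.get_dummies(df[to_encode])
result = pd.concat([dummies, df.drop(columns=to_encode)], axis=1)
print(result)
```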
It can be done without concatenation by passing the columns to encode to get_dummies() through the columns parameter:
In [294]: pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D'])
Out[294]:
   B  C  A_x  A_y  D_j  D_l
0  z  1  1.0  0.0  1.0  0.0
1  u  2  0.0  1.0  0.0  1.0
2  z  3  1.0  0.0  1.0  0.0
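A minimal runnable version of the call above, with two adjustments that are my assumptions rather than part of the answer: prefix is left out, since with columns= it defaults to the source column names, and dtype=float is passed explicitly so the result matches the 1.0/0.0 output shown regardless of the default dummy dtype in your pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "y", "x"],
    "B": ["z", "u", "z"],
    "C": ["1", "2", "3"],
    "D": ["j", "l", "j"],
})

# prefix is omitted: with columns=..., it defaults to the source column names.
# dtype=float is an assumption made here to reproduce the float output above.
encoded = pd.get_dummies(df, columns=["A", "D"], dtype=float)
print(encoded)
```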