How to use dummy variables to represent categorical data in a Python scikit-learn random forest
Question: I'm generating feature vectors for scikit-learn's random forest classifier. Each feature vector represents the names of 9 protein amino acid residues. There are 20 possible residue names, so I use 20 dummy variables to represent one residue name; for 9 residues, that gives 180 dummy variables. For example, if the 9 residues in the sliding window are ARNDCQEGH (each letter is the one-letter name of a protein residue), my feature vector will be: "True\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse\tFalse
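
The encoding described in the question can be sketched in a few lines of plain Python. This is only an illustration, not the asker's actual code: the names `RESIDUES`, `INDEX`, and `encode_window` are hypothetical, and the residue ordering is an assumption (the post does not state one), chosen so that `A` maps to the first dummy variable as in the example.

```python
# Assumed ordering of the 20 one-letter residue codes; the original post
# does not specify one, so treat this as a placeholder.
RESIDUES = "ARNDCQEGHILKMFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(RESIDUES)}


def encode_window(window):
    """Return a flat list of len(window) * 20 booleans, one block of 20 per residue."""
    features = []
    for aa in window:
        block = [False] * len(RESIDUES)
        block[INDEX[aa]] = True  # raises KeyError for a letter outside RESIDUES
        features.extend(block)
    return features


vector = encode_window("ARNDCQEGH")
print(len(vector))  # 9 residues * 20 dummy variables each
print("\t".join(str(v) for v in vector[:8]))  # tab-separated, as in the question
```

A list of such vectors (one per sliding window) could then be passed as the feature matrix to a scikit-learn classifier's `fit` method; booleans are treated as 0/1 values.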