Scikit-Learn One-hot-encode before or after train/test split


I am looking at two scenarios for building a model with scikit-learn, and I cannot figure out why one of them returns a result that is so fundamentally different from the

2 Answers
  •  遥遥无期
    2020-12-28 20:21

    I can't get your code to run, but my guess is that in the test dataset either

    • you're not seeing all the levels of some of the categorical variables, so if you compute the dummy variables on the test data alone, you end up with different columns than in training; or
    • you have the same columns, but they're in a different order.
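Since I can't run the original code, here is a minimal sketch of the first case using made-up data. The `color` column and its values are purely hypothetical, and `pd.get_dummies` stands in for whatever encoding the question actually uses. It shows the test split producing fewer dummy columns than the training split, and one way to line the test columns up with the training ones.

```python
import pandas as pd

# Hypothetical toy data: the test split never contains the "green" level.
train = pd.DataFrame({"color": ["red", "green", "blue", "red"]})
test = pd.DataFrame({"color": ["blue", "red"]})

# Encoding each split on its own yields different column sets.
train_dummies = pd.get_dummies(train, columns=["color"], dtype=int)
test_dummies = pd.get_dummies(test, columns=["color"], dtype=int)
print(list(train_dummies.columns))
print(list(test_dummies.columns))

# Align the test columns to the training columns (same set, same order),
# filling levels that never appear in the test split with 0.
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(list(test_aligned.columns))
```

The `reindex` call also covers the second case, because it puts the test columns in the training order. Another option is to fit scikit-learn's `OneHotEncoder` on the training split only, with `handle_unknown="ignore"`, and reuse the fitted encoder to transform the test split.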
