Over-sampling for class imbalance with a train/test split: “Found input variables with inconsistent numbers of samples”. Solution?
Question
I am trying to follow this article to perform over-sampling for imbalanced classification. My class ratio is about 8:1. https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets/notebook I am confused about the pipeline and code structure. Should you over-sample after the train/test split? If so, how do you deal with the fact that the target label has been dropped from X? I tried keeping it, then performed the over-sampling, then dropped the labels from X_train/X_test and replaced the new
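
For reference, here is a minimal sketch of one common ordering: split first, then over-sample only the training rows. This is not taken from the linked article. The toy data, the column names (`f1`, `f2`, `target`) and the variable names are assumptions made for illustration. It uses plain pandas random over-sampling with replacement. The error in the title is raised by scikit-learn when X and y have different lengths, which can happen if only one of them is resampled. Rejoining the labels to the training features before resampling, and deriving both X and y from the same resampled frame, keeps the lengths in sync.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with an 8:1 class ratio (all names are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=900),
    "f2": rng.normal(size=900),
    "target": np.array([0] * 800 + [1] * 100),
})

X = df.drop(columns="target")
y = df["target"]

# Split first, so the test set keeps the original class distribution
# and contains no duplicated copies of training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Rejoin features and labels for the training rows only,
# so that both are resampled together.
train = X_train.assign(target=y_train)
majority = train[train["target"] == 0]
minority = train[train["target"] == 1]

# Random over-sampling: draw minority rows with replacement
# until they match the majority count, then shuffle.
minority_over = minority.sample(n=len(majority), replace=True, random_state=0)
train_over = pd.concat([majority, minority_over]).sample(frac=1, random_state=0)

# Derive X and y from the same resampled frame so their lengths always match.
X_train_over = train_over.drop(columns="target")
y_train_over = train_over["target"]
```

A model would then be fitted on `X_train_over` and `y_train_over` and evaluated on the untouched `X_test` and `y_test`. The test set is deliberately left imbalanced so that the evaluation reflects the real class ratio.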