Classification: PCA and logistic regression using sklearn
Step 0: Problem description

I have a classification problem, i.e. I want to predict a binary target from a collection of numerical features, using logistic regression after running a Principal Components Analysis (PCA).

I have two datasets, df_train and df_valid (the training set and the validation set respectively), as pandas DataFrames containing the features and the target. As a first step, I used the pandas get_dummies function to convert all the categorical variables to boolean columns. For example, I would have:

n_train = 10
np.random.seed(0)
df_train = pd.DataFrame({"f1":np.random.random(n