import numpy as np
import pandas as pd
import matplotlib.pyplot as pt
data1 = pd.read_csv('stage1_labels.csv')
X = data1.iloc[:, :-1].values
y = data1.iloc[:, -1].values  # label = last column, matching X = every column except the last
In my case the problem was that the number of x-values in the scatter plot did not match the size of the test set. The x-range of the scatter plot must have exactly as many points as the test set, which is 40% of the total observations (test_size = 0.4 in your code). So size the range of your scatter plot to the number of test observations, ideally by deriving it from the length of the test set, rather than to the total number of observations your model processes.
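A minimal sketch of the fix, under some assumptions: the CSV is replaced by synthetic data with one feature and one target, the model is a simple least-squares line from `np.polyfit`, and the 40% hold-out is done with a plain NumPy permutation instead of scikit-learn's `train_test_split` so the example has fewer dependencies. The names `x_range`, `n_test` and the synthetic data are illustrative only. The key line is `x_range = np.arange(len(y_test))`: if you instead build the x-values for the full dataset while `y` holds only the test rows, `Axes.scatter` raises a `ValueError` because x and y differ in size.

```python
import math
from io import BytesIO

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

# Synthetic stand-in for the CSV data (assumption: one feature, one target)
rng = np.random.default_rng(0)
n_samples = 100
X = rng.normal(size=(n_samples, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=n_samples)

# Hold out 40% of the rows as the test set, like test_size=0.4
test_size = 0.4
n_test = math.ceil(test_size * n_samples)  # 40 rows here
indices = rng.permutation(n_samples)
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Fit a least-squares line on the training rows, then predict the test rows
slope, intercept = np.polyfit(X_train[:, 0], y_train, deg=1)
y_pred = slope * X_test[:, 0] + intercept

# One x position per TEST observation; deriving it from len(y_test) keeps it
# in sync with test_size instead of hard-coding a range for the whole dataset
x_range = np.arange(len(y_test))

fig, ax = plt.subplots()
ax.scatter(x_range, y_test, label="actual")
ax.scatter(x_range, y_pred, label="predicted")
ax.set_xlabel("test observation index")
ax.legend()
fig.savefig(BytesIO(), format="png")  # render in memory; use plt.show() interactively
plt.close(fig)
```

Because the range is computed from the test set itself, changing `test_size` later (say to 0.2) needs no matching edit in the plotting code.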