ValueError: x and y must be the same size

抹茶落季 2020-12-11 03:16
import numpy as np
import pandas as pd
import matplotlib.pyplot as pt

# Load the labels file
data1 = pd.read_csv('stage1_labels.csv')

X = data1.iloc[:, :-1].values  # every column except the last (2-D array)
y = data1.iloc[:, 1]           # second column only (1-D)


        
4 Answers
  • 2020-12-11 03:49

    Slicing with [:, :-1] gives you a 2-dimensional array (all rows, and every column except the last).

    Slicing with [:, 1] gives you a 1-dimensional array (all rows of the second column). To make this array 2-dimensional as well, use [:, 1:2], [:, 1].reshape(-1, 1), or [:, 1][:, None] instead of [:, 1]. The last two forms work on NumPy arrays, so on a pandas selection call .values first. This gives x and y matching shapes.

    An alternative to making both arrays 2-dimensional is to make them both 1-dimensional. For that, use [:, 0] (instead of a 2-dimensional slice such as [:, :1]) to select the first column and [:, 1] to select the second column.
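The slicing options above can be checked on a small in-memory table. This is only a sketch: the DataFrame below is a made-up stand-in for stage1_labels.csv, and the column names `id` and `label` are assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for stage1_labels.csv: two columns, three rows.
df = pd.DataFrame({"id": [10, 20, 30], "label": [0, 1, 0]})

X_2d = df.iloc[:, :-1].values   # every column except the last
y_1d = df.iloc[:, 1].values     # second column only

# Option A: make y 2-dimensional so it matches X_2d.
y_slice = df.iloc[:, 1:2].values
y_reshape = y_1d.reshape(-1, 1)
y_newaxis = y_1d[:, None]

# Option B: make X 1-dimensional so it matches y_1d.
X_1d = df.iloc[:, 0].values

print(X_2d.shape, y_1d.shape)                          # (3, 1) (3,)
print(y_slice.shape, y_reshape.shape, y_newaxis.shape) # (3, 1) (3, 1) (3, 1)
print(X_1d.shape)                                      # (3,)
```

Either option works; which one to pick depends on what the next step expects (an estimator usually wants a 2-D X, a plot is happy with two 1-D arrays).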

  • 2020-12-11 03:54

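    Try this:

    x_train = np.arange(0, len(x_train), 1)

    This builds an evenly spaced array (0, 1, 2, ...) with one entry per row, so the sizes match and the error goes away. Note that it replaces the original values of x_train with row indices, so the plot no longer shows the real feature values.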

  • 2020-12-11 04:00

    Print X_train.shape. What do you see? I'd bet X_train is 2-D (a matrix with a single column) while y_train is 1-D (a vector), which is why you get different sizes.

    I think using X_train[:, 0] for plotting (which is where the error originates) should solve the problem.
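A quick way to follow this advice is to print both the shape and the total element count before plotting. The sketch below uses made-up data (5 samples, 2 feature columns), which is an assumption for illustration and not the data from the question; it only shows how selecting a single column brings the sizes in line.

```python
import numpy as np

# Hypothetical data: 5 samples with 2 feature columns, and 5 targets.
X_train = np.arange(10).reshape(5, 2)
y_train = np.arange(5)

print(X_train.shape, y_train.shape)  # (5, 2) (5,)
print(X_train.size, y_train.size)    # 10 5

# Select one column so that x has exactly one value per sample.
x_col = X_train[:, 0]
print(x_col.shape, x_col.size)       # (5,) 5
```

With `x_col` and `y_train` holding the same number of elements, they can be passed together to a scatter plot.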

  • 2020-12-11 04:06

    In my case, the problem was that the number of test samples (set by test_size) did not match the range I used for the scatter plot. The range has to contain as many points as the test set, which is 40% of the total observations in your code. So set the range of your scatter plot to 40% of the observations your model is processing.
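One way to avoid this mismatch is to derive the plotting range from the test set itself instead of hard-coding it. The sketch below does a manual 40% split with NumPy rather than calling any library splitter; the sample count of 100 and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_total = 100                  # hypothetical number of observations
test_size = 0.4                # 40% held out, as in the question
y = rng.normal(size=n_total)   # made-up target values

# Manual split: shuffle the indices and take the first 40% as the test set.
n_test = int(round(test_size * n_total))
indices = rng.permutation(n_total)
y_test = y[indices[:n_test]]

# Build the x-axis range from the test set, so the sizes always agree.
x_axis = np.arange(len(y_test))

print(n_test, x_axis.size, y_test.size)  # 40 40 40
```

Because `x_axis` is computed from `len(y_test)`, changing `test_size` later cannot bring the size mismatch back.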
