Golearn models are implicit about independent variables (predictors) and targets (predicted)

Question

I am learning ML in Go and was exploring the GoLearn package for ML support. I am very confused by the way the Fit and Predict methods of its models are implemented.

For example, take this example implementation of the KNN classifier from the GoLearn repo:

    rawData, err := base.ParseCSVToInstances("../datasets/iris_headers.csv", true)
    if err != nil {
        panic(err)
    }

    cls := knn.NewKnnClassifier("euclidean", "linear", 2)

    trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
    cls.Fit(trainData)

    predictions, err := cls.Predict(testData)
    if err != nil {
        panic(err)
    }

I am confused about which parts of the data are X and y for the model. How do I selectively pass in the predictors and the predicted variable? I am stuck, because the articles I found online give no clues about it.

I am new to ML development in Go. I have previous experience with web and database work in Go, and I write ML models in Python. Recently I found that Go is faster than Python at data processing and well suited to ML applications. I would appreciate an explanation of how this works in GoLearn. Failing that, a suggestion for a Go library with simpler but sufficient ML support would also help.

Answer 1:

golearn's knn package implements the k-nearest neighbors algorithm. It works by:

- parsing a CSV file into a matrix;
- (in the Predict function) calculating the distance between vectors with one of several metrics:
  - euclidean
  - manhattan
  - cosine

During the distance step, all non-numerical fields are dropped. The non-numerical field is assumed to be the label the model is training on (more precisely, the label is the dataset's class attribute; in the iris example that is the non-numerical species column).
The categories/labels (attributes) defined in the CSV are returned in the prediction list as pairs of the form (index, predicted attribute).

"How do I selectively pass in the predictors and predicted?"

In KNN you can do that by making your prediction target in the CSV a non-numeric value, for example Iris-setosa or Iris-versicolor.
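
A rough sketch of that rule in plain Go, using only the standard library: every column whose values all parse as numbers becomes a predictor, and the one non-numeric column becomes the label. `splitNumericAndLabel` is a hypothetical helper, not part of GoLearn (which infers attribute types in its own CSV parser), and the column names in `main` are made up.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// splitNumericAndLabel parses CSV text whose first line holds the headers.
// Columns whose every value parses as a number become predictors (X); the
// single non-numeric column becomes the label (y). It returns an error if
// there is no non-numeric column or more than one.
func splitNumericAndLabel(data string) (X [][]float64, y []string, labelName string, err error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, nil, "", err
	}
	if len(records) < 2 {
		return nil, nil, "", fmt.Errorf("need a header row and at least one data row")
	}
	header, rows := records[0], records[1:]

	labelCol := -1
	for c := range header {
		for _, r := range rows {
			if _, perr := strconv.ParseFloat(r[c], 64); perr != nil {
				if labelCol != -1 {
					return nil, nil, "", fmt.Errorf("more than one non-numeric column: %q and %q", header[labelCol], header[c])
				}
				labelCol = c
				break // one non-numeric value is enough to mark the column
			}
		}
	}
	if labelCol == -1 {
		return nil, nil, "", fmt.Errorf("no non-numeric column to use as the label")
	}

	for _, r := range rows {
		features := make([]float64, 0, len(r)-1)
		for c, v := range r {
			if c == labelCol {
				continue
			}
			f, _ := strconv.ParseFloat(v, 64) // cannot fail: checked above
			features = append(features, f)
		}
		X = append(X, features)
		y = append(y, r[labelCol])
	}
	return X, y, header[labelCol], nil
}

func main() {
	// Hypothetical column names and toy values.
	data := "sepal_length,sepal_width,species\n5.1,3.5,Iris-setosa\n7.0,3.2,Iris-versicolor\n"
	X, y, label, err := splitNumericAndLabel(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("predictors:", X)
	fmt.Println(label+":", y)
}
```

Note that a numeric target (say a class coded as 0/1/2) would be treated as a predictor under this rule, which is why the label has to be non-numeric for it to be picked up.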

Linear regression

You can use AddClassAttribute(), a method defined on the DenseInstances struct, which is what base.ParseCSVToInstances() returns.

The code to do that would look like this (using golearn's base and linear_models packages):

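    // true: the first line of the CSV holds the column headers.
    instances, err := base.ParseCSVToInstances("../examples/datasets/exams.csv", true)
    if err != nil {
        // error handling
    }

    // Clear any class attribute that is already set, because linear
    // regression accepts exactly one class attribute (the target).
    for _, a := range instances.AllClassAttributes() {
        instances.RemoveClassAttribute(a)
    }
    // Mark the final column as the class attribute, i.e. the regression target.
    attrArray := instances.AllAttributes()
    if err := instances.AddClassAttribute(attrArray[len(attrArray)-1]); err != nil {
        // error handling
    }

    trainData, testData := base.InstancesTrainTestSplit(instances, 0.1)
    lr := linear_models.NewLinearRegression()
    if err := lr.Fit(trainData); err != nil { // fit on the training split only
        // error handling
    }
    predictions, err := lr.Predict(testData)
    if err != nil {
        // error handling
    }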

Caveat: the linear regression test file in the repo does none of this. I would not claim that the method above is the correct or the optimal way of assigning the regression target.

It is one possible way: it prepares a suitable input for the linear regression's Fit() method, which is where the computation for this model takes place. Predict() merely multiplies each row's attribute values by the fitted regression coefficients, adds the intercept, and stores that value as the prediction.

Source: https://stackoverflow.com/questions/63534929/golearn-models-are-implicit-about-independent-variables-predictors-and-targets

Tags

machine-learning