Question
I'm using ML.NET to predict a series of values using a regression model. I am only interested in one column being predicted (the score column). However, the values of some of the other columns are not available for the prediction class. I can't leave them at 0 as this would upset the prediction, so I guess they would have to also be predicted.
I saw a similar question here on predicting multiple values. The answer suggests creating two models, but I can see that the feature columns specified in each model do not include the label column of the other model. So this implies that those columns would not be used when making the prediction. Am I wrong, or should the label column of each model also be included in the feature column of the other model?
Here's some example code to illustrate what I mean:
using System.Collections.Generic;
using Microsoft.ML; // FastTree also needs the Microsoft.ML.FastTree package

public class FooInput
{
    public float Feature1 { get; set; }
    public float Feature2 { get; set; }
    public float Bar { get; set; }
    public float Baz { get; set; }
}

public class FooPrediction : FooInput
{
    public float BarPrediction { get; set; }
    public float BazPrediction { get; set; }
}

public ITransformer Train(IEnumerable<FooInput> data)
{
    var mlContext = new MLContext(0);
    var trainTestData = mlContext.Data.TrainTestSplit(mlContext.Data.LoadFromEnumerable(data));

    // Model for Bar: Baz is included as a feature.
    var pipelineBar = mlContext.Transforms.CopyColumns("Label", "Bar")
        .Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Baz"))
        .Append(mlContext.Regression.Trainers.FastTree())
        // CopyColumns takes (outputColumnName, inputColumnName), and "Score"
        // only exists after the trainer, so the copy has to come last.
        .Append(mlContext.Transforms.CopyColumns("BarPrediction", "Score"));

    // Model for Baz: Bar is included as a feature.
    var pipelineBaz = mlContext.Transforms.CopyColumns("Label", "Baz")
        .Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Bar"))
        .Append(mlContext.Regression.Trainers.FastTree())
        .Append(mlContext.Transforms.CopyColumns("BazPrediction", "Score"));

    // Fit on the training split rather than the test split.
    return pipelineBar.Append(pipelineBaz).Fit(trainTestData.TrainSet);
}
This is effectively the same as the aforementioned answer, but with the addition of Baz as a feature for the model where Bar is to be predicted, and conversely the addition of Bar as a feature for the model where Baz is to be predicted.
Is this the correct approach, or does the answer to the other question already achieve the desired result, namely that the prediction of each column uses the values of the other predicted column from the loaded dataset?
Answer 1:
One technique you can use is called "imputation": the process of substituting the missing values in a dataset with some "guessed" value.
In ML.NET, what you're looking for is the ReplaceMissingValues transform. You can find samples on docs.microsoft.com.
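As a rough illustration, here is a minimal sketch of how that transform might be wired into a pipeline. It assumes (this is not stated in the question) that Baz is the column you actually want to predict and that Bar is the column that is unavailable at prediction time, in which case you would set Bar to float.NaN so that it counts as missing. The ImputationSketch class, the BazOnlyPrediction class, and the choice of ReplacementMode.Mean are illustrative only; FastTree requires the Microsoft.ML.FastTree package.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms;

public class FooInput
{
    public float Feature1 { get; set; }
    public float Feature2 { get; set; }
    public float Bar { get; set; }  // assumption: float.NaN marks "unknown"
    public float Baz { get; set; }  // assumption: the column to predict
}

// Hypothetical output class; Score is the trainer's default output column.
public class BazOnlyPrediction
{
    public float Score { get; set; }
}

public static class ImputationSketch
{
    public static ITransformer Train(MLContext mlContext, IEnumerable<FooInput> data)
    {
        IDataView trainData = mlContext.Data.LoadFromEnumerable(data);

        // Replace NaN values in Bar with the mean of Bar seen during Fit.
        // The fitted transform reuses that same mean at prediction time.
        var pipeline = mlContext.Transforms.ReplaceMissingValues(
                "Bar",
                replacementMode: MissingValueReplacingEstimator.ReplacementMode.Mean)
            .Append(mlContext.Transforms.CopyColumns("Label", "Baz"))
            .Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Bar"))
            .Append(mlContext.Regression.Trainers.FastTree());

        return pipeline.Fit(trainData);
    }

    public static float PredictWithUnknownBar(
        MLContext mlContext, ITransformer model, float feature1, float feature2)
    {
        var engine = mlContext.Model.CreatePredictionEngine<FooInput, BazOnlyPrediction>(model);

        // Bar is unknown here, so pass NaN instead of 0 and let the
        // pipeline impute it.
        var input = new FooInput { Feature1 = feature1, Feature2 = feature2, Bar = float.NaN };
        return engine.Predict(input).Score;
    }
}
```

Passing NaN rather than 0 is the point of the sketch: 0 would be treated as a real observation, while NaN is replaced by the imputed value before the features are concatenated.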
The technique you describe above is also a form of imputation, where the unknowns are replaced by predicting them from the other known values. This can work as well. I would try both approaches and see which works best for your dataset.
Source: https://stackoverflow.com/questions/58367755/specify-columns-as-both-feature-and-label-in-multiple-combined-regression-models