Designing a classification problem for weather data

Question

In an ordinary two-class or multi-class classification problem, we can use any well-known machine learning algorithm, such as Naive Bayes or SVM, to train and test the model. My problem is that I have been given weather data where the label variable is in a format like "20 % rain, 80 % dry" or "30% cloudy, 70% rain". How should I approach this problem? Will I need to convert the problem into regression somehow? In that case, if there are three labels (rain, dry, cloudy) in the data, what would be the right way to convert the percentage information to continuous values? Thanks for your time.

Answer 1:

Assuming that the expressions "20 % rain, 80 % dry" and "30% cloudy, 70% rain" represent probabilities, that the classes are mutually exclusive, and that we may ignore a possible ordinal relationship (such as "dry > cloudy > rain") among them, models such as polychotomous (multinomial) logistic regression may be fit to these values, as though the observations were grouped or replicated.
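As a rough illustration of fitting a multinomial logistic model directly to probability labels, here is a minimal sketch in plain Python. It uses batch gradient descent on the cross-entropy between the stated probabilities and the model's softmax output, which has the same effect as replicating each observation in proportion to its class probabilities. The single feature, the toy data, and the names `fit` and `predict` are assumptions made for the example, not part of the question.

```python
import math

CLASSES = ["rain", "dry", "cloudy"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """weights holds one row per class; the last entry of each row is the bias."""
    scores = [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in weights]
    return softmax(scores)

def fit(X, P, lr=0.5, epochs=2000):
    """Batch gradient descent on cross-entropy with soft (probability) targets."""
    n_features = len(X[0])
    weights = [[0.0] * (n_features + 1) for _ in CLASSES]
    for _ in range(epochs):
        grads = [[0.0] * (n_features + 1) for _ in CLASSES]
        for x, p in zip(X, P):
            q = predict(weights, x)
            for k in range(len(CLASSES)):
                err = q[k] - p[k]  # gradient of the loss w.r.t. the class score
                for j, v in enumerate(x):
                    grads[k][j] += err * v
                grads[k][-1] += err
        for k in range(len(CLASSES)):
            for j in range(n_features + 1):
                weights[k][j] -= lr * grads[k][j] / len(X)
    return weights

# Hypothetical toy data: one feature (say, scaled humidity) per observation,
# with targets ordered as in CLASSES: [rain, dry, cloudy].
X = [[0.1], [0.5], [0.9]]
P = [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.7, 0.0, 0.3]]

weights = fit(X, P)
probs = predict(weights, [0.9])
print(dict(zip(CLASSES, probs)))
```

The model outputs a full probability vector for each new observation, so its predictions have the same form as the original labels.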

I suppose other, ad hoc procedures could be employed as well, which would minimize, for example, the Kullback–Leibler divergence.
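For reference, the Kullback–Leibler divergence between a label distribution p and a predicted distribution q is the sum over classes of p_k * log(p_k / q_k). A minimal sketch follows; the function name `kl_divergence` and the `eps` guard against zero predictions are assumptions for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes.

    Terms with p_k == 0 contribute nothing; q_k is clipped at eps
    so that a zero prediction does not cause a division by zero.
    """
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)

label = [0.2, 0.8, 0.0]       # "20 % rain, 80 % dry" as [rain, dry, cloudy]
prediction = [0.3, 0.6, 0.1]  # hypothetical model output

print(kl_divergence(label, prediction))
```

Because the entropy of the labels does not depend on the model, minimizing this divergence over the model parameters is equivalent to minimizing the cross-entropy against the probability labels.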

Answer 2:

I would recommend a neural network with three outputs, labeled Rain, Dry and Cloudy.

If an instance has the label "20 % rain", then its value (weight) for the Rain output will be 0.2. If the label does not mention rain at all, the Rain output should be "false" (0). Another approach is to train three separate regression models using the same conversion convention. I think regression would work better.
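The conversion convention described above could be sketched as follows. This assumes the labels always look like the two examples in the question; the function name `parse_label` and the fixed class order are hypothetical choices for the example.

```python
import re

CLASSES = ("rain", "dry", "cloudy")

def parse_label(text):
    """Convert e.g. "20 % rain, 80 % dry" into [rain, dry, cloudy] targets.

    Classes that the label does not mention get a target of 0.0.
    """
    targets = {name: 0.0 for name in CLASSES}
    for percent, name in re.findall(r"(\d+(?:\.\d+)?)\s*%\s*([A-Za-z]+)", text):
        name = name.lower()
        if name not in targets:
            raise ValueError(f"unknown class in label: {name!r}")
        targets[name] = float(percent) / 100.0
    return [targets[name] for name in CLASSES]

print(parse_label("20 % rain, 80 % dry"))   # [0.2, 0.8, 0.0]
print(parse_label("30% cloudy, 70% rain"))  # [0.7, 0.0, 0.3]
```

Each resulting vector can be used either as the target of a three-output network or column by column as the targets of three separate regressions.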

A neural network is a good choice because it can perform all three regressions/classifications at once, and the outputs can influence each other. Additionally, the training algorithm is straightforward.

Source: https://stackoverflow.com/questions/5055112/designing-classification-problem-of-weather-data

Tags

machine-learning

data-modeling

classification

regression