Finding the right features in a multiclass SVM without PCA

Submitted by 走远了吗 on 2019-12-22 01:15:11

Question


I'm classifying users with a multiclass SVM (one-against-one), 3 classes. In the binary case, I can plot the distribution of each feature's weight in the hyperplane equation across different training sets, so I don't really need a PCA to see the stability of the hyperplane and the relative importance of the features (which are reduced and centered, by the way). What would the alternative be in a multiclass SVM, given that for each training set you have 3 classifiers and you choose one class according to the result of the three classifiers? (How is that chosen again: the class that appears the most times, or the one with the largest discriminant? Whichever it is, it doesn't really matter here.) Does anyone have an idea?

And if it matters, I am writing in C# with Accord. Thank you!


Answer 1:


In a multi-class SVM that uses the one-vs-one strategy, the problem is divided into a set of smaller binary problems. For example, if you have n possible classes, the one-vs-one strategy requires the creation of n(n-1)/2 binary classifiers. In your example, this would be

(n(n-1))/2 = (3(3-1))/2 = (3*2)/2 = 3

Each of them will be specialized in one of the following problems (see the sketch after the list):

  • Distinguishing between class 1 and class 2 (let's call it svma).
  • Distinguishing between class 1 and class 3 (let's call it svmb).
  • Distinguishing between class 2 and class 3 (let's call it svmc).
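
For illustration, here is a minimal sketch of how such a one-vs-one machine can be created and trained with Accord.NET's classic API; the training arrays below are placeholders, not data from the original question:

// Placeholder training data: two inputs, three classes
double[][] inputs =
{
    new double[] { 0, 0 }, new double[] { 0, 1 },
    new double[] { 1, 0 }, new double[] { 1, 1 },
    new double[] { 2, 2 }, new double[] { 2, 3 }
};
int[] outputs = { 0, 0, 1, 1, 2, 2 };

// Internally creates n(n-1)/2 = 3 binary machines (svma, svmb, svmc)
var machine = new MulticlassSupportVectorMachine(inputs: 2,
    kernel: new Linear(), classes: 3);

var teacher = new MulticlassSupportVectorLearning(machine, inputs, outputs)
{
    // Teach each binary subproblem with SMO
    Algorithm = (svm, classInputs, classOutputs, i, j) =>
        new SequentialMinimalOptimization(svm, classInputs, classOutputs)
};

double error = teacher.Run();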

Now, I see that you have actually asked multiple questions in your original post, so I will answer them separately. First I will clarify how the decision process works, and then explain how you could detect which features are the most important.

Since you mentioned Accord.NET, there are two ways this framework might compute the multi-class decision. The default is to use a Decision Directed Acyclic Graph (DDAG), which is nothing more than a sequential elimination of classes. The other is to solve all binary problems and take the class that won most of the time. You can choose between them at the moment you classify a new sample, by setting the method parameter of the SVM's Compute method.

Since the winning-most-of-the-time (voting) version is straightforward to understand, I will explain the default approach, the DDAG, in a little more detail.
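
For reference, here is a minimal sketch of how the two strategies can be selected through the classic Accord.NET API (assuming machine is the trained MulticlassSupportVectorMachine and x is the sample to classify):

// Default: sequential elimination along the DDAG
int ddagAnswer = machine.Compute(x);

// Explicitly use pairwise voting instead
int votingAnswer = machine.Compute(x, MulticlassComputeMethod.Voting);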

Decision using directed acyclic graphs

In this algorithm, we test each of the SVMs and eliminate the class that lost at each round. So for example, the algorithm starts with all possible classes:

Candidate classes: [1, 2, 3]

Now it asks svma to classify x, and suppose it decides for class 2. Therefore, class 1 lost and is no longer considered in further tests:

Candidate classes: [2, 3]

Now it asks svmc (the machine for the remaining pair, classes 2 and 3) to classify x, and suppose it again decides for class 2. Therefore, class 3 lost and is no longer considered in further tests:

Candidate classes: [2]

The final answer is thus 2.
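
To make the elimination procedure concrete, here is a small generic sketch of a DDAG decision loop (an illustration of the idea, not Accord's actual internal implementation; the binary delegate stands for whichever pairwise machine was trained on the two classes it receives):

using System;
using System.Collections.Generic;

static class DdagSketch
{
    // binary(a, b, x) runs the machine trained on classes a vs b
    // and returns the winning class; the loser is eliminated.
    public static int Decide(Func<int, int, double[], int> binary,
                             List<int> candidates, double[] x)
    {
        // Keep testing two remaining candidates, dropping the loser,
        // until a single class survives
        while (candidates.Count > 1)
        {
            int winner = binary(candidates[0], candidates[1], x);
            int loser = winner == candidates[0] ? candidates[1] : candidates[0];
            candidates.Remove(loser);
        }
        return candidates[0];
    }
}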

Detecting which features are the most useful

Now, since the one-vs-one SVM is decomposed into n(n-1)/2 binary problems, the most straightforward way to analyse which features are the most important is to consider each binary problem separately. Unfortunately it might be tricky to determine globally which features are the most important for the entire problem, but it is possible to detect which ones are the most important to discriminate between classes 1 and 2, 1 and 3, or 2 and 3.

However, here I can offer a suggestion if you are using DDAGs. With a DDAG, it is possible to extract the decision path that led to a particular decision, which means it is possible to count how many times each binary machine was used when classifying your entire database. If you can estimate the importance of a feature for each binary machine, and estimate how often each machine is used during the decision process on your database, you could take their weighted sum as an indicator of how useful a feature is in your decision process (a sketch follows).
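
As a rough illustration of that weighted sum (everything below is hypothetical: importance[m][f] stands for your per-machine importance estimate of feature f, e.g. the absolute hyperplane weight, and usage[m] for how often machine m appeared on the decision paths):

// Hypothetical sketch: global feature scores as a usage-weighted
// sum of per-binary-machine feature importances.
static double[] GlobalImportance(double[][] importance, int[] usage)
{
    int machines = importance.Length;
    int features = importance[0].Length;

    double totalUsage = 0;
    foreach (int u in usage) totalUsage += u;

    var global = new double[features];
    for (int m = 0; m < machines; m++)
        for (int f = 0; f < features; f++)
            global[f] += (usage[m] / totalUsage) * importance[m][f];

    return global;
}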

By the way, you might also be interested in trying one of the logistic regression support vector machines with L1-regularization and a high C to perform sparse feature selection:

// Assumption: `inputs` (double[][]) and `outputs` (int[] of -1/+1 labels)
// already hold the training data for one binary problem.

// Create a new linear machine
var svm = new SupportVectorMachine(inputs: 2);

// Create a new instance of the sparse logistic learning algorithm
var smo = new ProbabilisticCoordinateDescent(svm, inputs, outputs)
{
    // A high complexity (C) combined with L1 regularization drives
    // the weights of uninformative features towards zero
    Complexity = 100,
};

// Run the learning algorithm; Run() returns the training error
double error = smo.Run();



Answer 2:


I'm not an expert in ML or SVMs; I am self-taught. However, my prototype outperformed some similar commercial and academic software in accuracy, while its training time is about 2 hours, in contrast to the days and weeks(!) of some competitors.

My recognition system (patterns in bio-cells) uses the following approach to select the best features:

1. Extract features and calculate the mean and variance for all classes.
2. Select the features whose class means are most distant and whose variances are minimal (as sketched below).
3. Remove redundant features: those whose per-class mean histograms are similar.
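
Step 2 amounts to a Fisher-like separability score. A minimal sketch, assuming the per-class means and variances of a single feature have already been computed (the function name and signature are mine, not from the original post):

// Hypothetical sketch: score one feature by how far apart the class
// means are relative to the class variances (higher = more useful).
static double SeparabilityScore(double[] classMeans, double[] classVars)
{
    double score = 0;
    for (int a = 0; a < classMeans.Length; a++)
        for (int b = a + 1; b < classMeans.Length; b++)
        {
            double d = classMeans[a] - classMeans[b];
            // Small epsilon guards against division by zero
            score += (d * d) / (classVars[a] + classVars[b] + 1e-12);
        }
    return score;
}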

In my prototype I use parametric features, e.g. a feature "circle" with parameters such as diameter and threshold. Training is controlled by scripts defining which features, with which argument ranges, are to be used, so the software tests all possible combinations.

As a training-time optimization, the software begins with 5 instances per class for extracting the features and increases the number once the second condition is met.

There are probably established academic names for some of these steps; unfortunately I'm not aware of them, as I "reinvented the wheel" myself.



Source: https://stackoverflow.com/questions/29566309/find-right-features-in-multiclass-svm-without-pca
