SVM with dynamic time warping kernel returns error rate greater than 0

Question

I'm using Accord.NET in my research. My inputs are sequences of vectors with variable length, so I use DynamicTimeWarping as the kernel for a MulticlassSupportVectorMachine.

IKernel kernel = new DynamicTimeWarping(dimension);
var machine = new MulticlassSupportVectorMachine(0, kernel, 2);

// Create the multi-class learning algorithm for the machine
var teacher = new MulticlassSupportVectorLearning(machine, inputs.ToArray(), outputs.ToArray());

// Configure the learning algorithm to use SMO to train the
// underlying SVMs in each of the binary class subproblems.
teacher.Algorithm = (svm, classInputs, classOutputs, i, j) =>
    new SequentialMinimalOptimization(svm, classInputs, classOutputs)
    {
        Complexity = 1.5
    };

// Run the learning algorithm
double error = teacher.Run();

The inputs and outputs look like this:

?inputs.ToArray()
{double[22][]}
    [0]: {double[10656]}
    [1]: {double[9360]}
    [2]: {double[9216]}
    [3]: {double[9864]}
    [4]: {double[10296]}
    [5]: {double[10152]}
    [6]: {double[9936]}
    [7]: {double[9216]}
    [8]: {double[10944]}
    [9]: {double[9504]}
    [10]: {double[11880]}
    [11]: {double[22752]}
    [12]: {double[23688]}
    [13]: {double[29880]}
    [14]: {double[32328]}
    [15]: {double[37224]}
    [16]: {double[30024]}
    [17]: {double[27288]}
    [18]: {double[26064]}
    [19]: {double[22032]}
    [20]: {double[21672]}
    [21]: {double[22680]}
?inputs[0]
{double[10656]}
    [0]: 7.6413027545068823
    [1]: -61.607203372756942
    [2]: 7.7375128997886513
    [3]: -25.704529598536471
    [4]: -0.4124927191531238
    [5]: 9.6820255661415011
    [6]: 3.0674374003781861
    [7]: 4.6364653722537668
    [8]: 3.3559314278499177
    [9]: 0.93969394152714925
    [10]: -6.3800159552064146
    [11]: 1.4239779356781062
    [12]: -2.25349154655782
    [13]: -1.5457194406236221
    [14]: -0.7612541874802764
    [15]: -3.3364791133985348
    [16]: 0.67816801816804861
    [17]: -3.4117217877592343
    [18]: 1.5785492543017225
    [19]: 0.31091690789261689
    [20]: -2.5526646739208712
    [21]: -1.0550268575680164
    [22]: -0.9598271201088191
    [23]: -1.1797916101998056
    [24]: 0.56157735657438412
    [25]: -0.16309890421998655
    [26]: 0.29765136770064271
    [27]: -0.35684735108472643
    [28]: -0.52382117896006564
    [29]: -0.052087258844925849
    [30]: -0.45363669419489172
    [31]: -0.16216259086709361
    [32]: -0.25958480481802632
    [33]: 0.081248839173330589
    [34]: -0.019783293216956807
    [35]: 0.14139773316964666
    [36]: 0.088466551256948273
    [37]: -0.019528343614348152
    [38]: 0.087073343332064762
    [39]: 0.048432068369313144
    [40]: -0.0069171431858626713
    [41]: -0.0095272766950126042
    [42]: 0.016639887499893875
    [43]: -0.009108847017642599
    [44]: 0.0017424263597747487
    [45]: 0.0042160613810267641
    [46]: -0.002793626734247919
    [47]: 0.00092130299196750763
    [48]: 0.0024488939699103319
    [49]: 0.0021684669072286468
    [50]: 0.000000000000010673294119695543
    [51]: -0.000000000000014072530108313123
    [52]: 0.000000000000000069063495074940116
    [53]: 8.73342598612937E-17
    [54]: 0.000000000000000030048643853749834
    [55]: -6.95380121971215E-17
    [56]: 0.00000000000000010093927767292201
    [57]: 0.000000000000000046158366228268829
    [58]: 0.000000000000000039070100378142324
    [59]: 0.00000000000000010492059540665321
    [60]: -0.000000000000000014254591247067773
    [61]: -0.0000000000000000015902697756329909
    [62]: 0.0000000000000000017024249964704589
    [63]: 0.0000000000000000010277956708903136
    [64]: 3.5875442986020568E-28
    [65]: -2.215158998843094E-31
    [66]: 1.041379591973569E-31
    [67]: -4.3897715186113276E-31
    [68]: 4.248432864156974E-34
    [69]: 4.3718530099471368E-47
    [70]: 1.4551856970655856E-50
    [71]: 0.0
    [72]: 11.031182384920639
    [73]: -63.434486026723626
    [74]: 1.7731679007864651
    [75]: -23.968196466652273
    [76]: 2.2753564408666507
    [77]: 9.5492641110324534
    [78]: 3.4465209481281054
    [79]: 4.7979691924966161
    [80]: 2.0147801482840508
    [81]: 1.1858337013571998
    [82]: -4.607944757859336
    [83]: 0.75637871318664485
    [84]: -3.8397810581420115
    [85]: -2.1276086210477514
    [86]: -0.4060620782117581
    [87]: -2.9313848427777227
    [88]: 0.052605148372525556
    [89]: -1.5948208186863277
    [90]: 0.36061926783486992
    [91]: -0.12623742266247567
    [92]: -1.1889713301479885
    [93]: -0.33299631607409635
    [94]: -0.00912650336180437
    [95]: -0.52707950657313729
    [96]: 0.52115933681848092
    [97]: 0.46870463636533816
    [98]: -0.18482093982467213
    [99]: -0.49350561475314514
    < More... (The first 100 of 10656 items were displayed.) >
?outputs
Count = 22
    [0]: 0
    [1]: 0
    [2]: 0
    [3]: 0
    [4]: 0
    [5]: 0
    [6]: 0
    [7]: 0
    [8]: 0
    [9]: 0
    [10]: 0
    [11]: 1
    [12]: 1
    [13]: 1
    [14]: 1
    [15]: 1
    [16]: 1
    [17]: 1
    [18]: 1
    [19]: 1
    [20]: 1
    [21]: 1

With that code, the returned error is 0.5.

Questions:

Does that mean there are problems with my training data?
Is there any other kernel that I can use with my variable size sequences?

Thanks.

Answer 1:

Here is an example of how to perform sequence classification using the DynamicTimeWarping kernel combined with a Gaussian kernel, which should hopefully give better results.

The first task in a sequence classification problem is to organize the sequences properly to feed the learning algorithms. Each sequence can be composed of multivariate vectors, and as such, the input data must be organized accordingly (this is the most likely source of error when using the sequence machines, so please take a moment or two to understand what the following code is doing).

// Suppose you have sequences of multivariate observations, and that
// those sequences could be of arbitrary length. On the other hand, 
// each observation has a fixed number of dimensions.

// In this example, we have sequences of 3-dimensional observations. 
// Each sequence can have an arbitrary length, but each observation
// will always have length 3:

double[][][] sequences =
{
    new double[][] // first sequence
    {
        new double[] { 1, 1, 1 }, // first observation of the first sequence
        new double[] { 1, 2, 1 }, // second observation of the first sequence
        new double[] { 1, 4, 2 }, // third observation of the first sequence
        new double[] { 2, 2, 2 }, // fourth observation of the first sequence
    },

    new double[][] // second sequence (note that this sequence has a different length)
    {
        new double[] { 1, 1, 1 }, // first observation of the second sequence
        new double[] { 1, 5, 6 }, // second observation of the second sequence
        new double[] { 2, 7, 1 }, // third observation of the second sequence
    },

    new double[][] // third sequence 
    {
        new double[] { 8, 2, 1 }, // first observation of the third sequence
    },

    new double[][] // fourth sequence 
    {
        new double[] { 8, 2, 5 }, // first observation of the fourth sequence
        new double[] { 1, 5, 4 }, // second observation of the fourth sequence
    }
};

Those are our input sequences. Now, since we are trying to perform a classification problem, we must have an output class label associated with each of those sequences. If we have 4 sequences, then we will need 4 class labels:

// Now, we will also have different class labels associated with each 
// sequence. We will assign -1 to sequences whose observations start 
// with { 1, 1, 1 } and +1 to those that do not:

int[] outputs =
{
    -1, -1, // First two sequences are of class -1 (those start with {1,1,1})
     1,  1, // Last two sequences are of class +1 (don't start with {1,1,1})
};

Now that the problem has been defined, it has to be transformed a little so it can be processed by the DTW-SVMs:

// At this point, we have to flatten the input sequences from double[][][]
// to double[][] so they can be properly understood by the SVMs. The problem is
// that SVMs usually expect the data to consist of fixed-length input vectors,
// each with an associated class label. In this case, we will instead be feeding
// them arbitrary-length sequences of input vectors, with one class label per
// sequence rather than per vector.

double[][] inputs = new double[sequences.Length][];
for (int i = 0; i < sequences.Length; i++)
    inputs[i] = Matrix.Concatenate(sequences[i]);


// Now we have to set up the Dynamic Time Warping kernel. We have to
// specify the length of the fixed-length observations contained in each
// arbitrary-length sequence:
// 
var kernel = new Gaussian<DynamicTimeWarping>(new DynamicTimeWarping(length: 3));

// Now we can create the machine. When using variable-length
// kernels, we will need to pass zero as the input length:
var svm = new KernelSupportVectorMachine(kernel, inputs: 0);

// Create the Sequential Minimal Optimization learning algorithm
var smo = new SequentialMinimalOptimization(svm, inputs, outputs);

// And start learning it!
double error = smo.Run(); // error will be 0.0
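
Since the flattening step is where the layout most often goes wrong, here is a minimal, self-contained sketch of it that uses only LINQ. It assumes that Matrix.Concatenate joins the observations end to end, as its usage above implies. The names SequenceLayout, Flatten and CountObservations are hypothetical helpers for illustration and are not part of Accord.NET. The second method checks that a flat vector really splits into whole observations of the dimension you pass to the kernel.

```csharp
using System;
using System.Linq;

public static class SequenceLayout
{
    // Concatenates the observations of one sequence into a single flat
    // vector, observation after observation (hypothetical helper).
    public static double[] Flatten(double[][] sequence)
    {
        return sequence.SelectMany(observation => observation).ToArray();
    }

    // Returns how many observations of the given dimension a flat vector
    // holds, and throws if the vector cannot be split evenly.
    public static int CountObservations(double[] flat, int dimension)
    {
        if (dimension <= 0)
            throw new ArgumentOutOfRangeException(nameof(dimension));
        if (flat.Length % dimension != 0)
            throw new ArgumentException(
                $"Length {flat.Length} is not a multiple of dimension {dimension}.");
        return flat.Length / dimension;
    }

    public static void Main()
    {
        // The second sequence from the example above.
        double[][] second =
        {
            new double[] { 1, 1, 1 },
            new double[] { 1, 5, 6 },
            new double[] { 2, 7, 1 },
        };

        double[] flat = Flatten(second);
        Console.WriteLine(string.Join(", ", flat));    // 1, 1, 1, 1, 5, 6, 2, 7, 1
        Console.WriteLine(CountObservations(flat, 3)); // 3
    }
}
```

Running a check like CountObservations over every input before training is a cheap way to catch a dimension that does not match the data.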

After this point, the machine should have been trained. The parameter C should have been estimated automatically, but you can fine-tune it afterwards to try to improve the generalization performance of the machine. In any case, we can now test it on new sequences:

// At this point, we should have obtained a useful machine. Let's
// see if it can understand a few examples it hasn't seen before:

double[][] a = 
{ 
    new double[] { 1, 1, 1 },
    new double[] { 7, 2, 5 },
    new double[] { 2, 5, 1 },
};

double[][] b =
{
    new double[] { 7, 5, 2 },
    new double[] { 4, 2, 5 },
    new double[] { 1, 1, 1 },
};

// Following the aforementioned logic, sequence (a) should be
// classified as -1, and sequence (b) should be classified as +1.

int resultA = System.Math.Sign(svm.Compute(Matrix.Concatenate(a))); // -1
int resultB = System.Math.Sign(svm.Compute(Matrix.Concatenate(b))); // +1
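
For intuition about what a Gaussian kernel over a time-warping measure compares, the sketch below implements the textbook dynamic time warping distance on flattened sequences and passes it through a Gaussian. This is an illustration only: it is not Accord.NET's DynamicTimeWarping kernel, which may compute a different quantity, and the names DtwSketch, Distance and GaussianSimilarity as well as the sigma parameter are assumptions made for this example.

```csharp
using System;

public static class DtwSketch
{
    // Textbook DTW distance between two flattened sequences whose
    // observations each have `dimension` components.
    public static double Distance(double[] x, double[] y, int dimension)
    {
        int n = x.Length / dimension; // observations in x
        int m = y.Length / dimension; // observations in y

        // cost[i, j] = cheapest alignment of the first i observations of x
        // with the first j observations of y.
        var cost = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                cost[i, j] = double.PositiveInfinity;
        cost[0, 0] = 0.0;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                // Euclidean distance between observation i-1 of x and j-1 of y.
                double sum = 0.0;
                for (int k = 0; k < dimension; k++)
                {
                    double d = x[(i - 1) * dimension + k] - y[(j - 1) * dimension + k];
                    sum += d * d;
                }

                double best = Math.Min(cost[i - 1, j - 1],
                              Math.Min(cost[i - 1, j], cost[i, j - 1]));
                cost[i, j] = Math.Sqrt(sum) + best;
            }
        }

        return cost[n, m];
    }

    // Maps the DTW distance to a similarity in (0, 1]; sigma sets the scale.
    public static double GaussianSimilarity(double[] x, double[] y, int dimension, double sigma)
    {
        double d = Distance(x, y, dimension);
        return Math.Exp(-(d * d) / (2.0 * sigma * sigma));
    }

    public static void Main()
    {
        // A sequence and a time-stretched copy of it align at zero cost.
        double[] once = { 1, 1, 1 };
        double[] twice = { 1, 1, 1, 1, 1, 1 };
        Console.WriteLine(Distance(once, twice, 3));
    }
}
```

Two sequences that differ only in how long each observation is held end up at distance zero, which is why this kind of measure suits inputs of different lengths. Note that a Gaussian of the plain DTW distance is not guaranteed to be a valid positive definite kernel.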

Answer 2:

I don't have your original data to test with, but two things come to mind that you could try. First, your inputs don't have a constant length: for example, the first one has 10656 elements and the second has 9360. As far as I know, all feature vectors must have the same length. Second, you pass 0 as the first argument of MulticlassSupportVectorMachine. This is the length of your inputs, and you should set it correctly. I also highly recommend scaling your data before the training phase.
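
As a rough sketch of the scaling suggestion, the code below standardizes each component of the observations to zero mean and unit variance, pooling the statistics over all observations of all flattened sequences. The names SequenceScaling and Standardize and the minStd threshold are assumptions made for this example, not Accord.NET APIs. Components whose standard deviation falls below minStd are set to zero instead of being divided by a tiny number, which matters for data like the dump in the question, where some components are exactly 0.0 or around 1e-17.

```csharp
using System;

public static class SequenceScaling
{
    // Standardizes each of the `dimension` components using the mean and
    // standard deviation pooled over every observation of every sequence.
    public static double[][] Standardize(double[][] flatSequences, int dimension,
                                         double minStd = 1e-12)
    {
        var mean = new double[dimension];
        var std = new double[dimension];
        long count = 0;

        // First pass: per-component sums, then means.
        foreach (double[] seq in flatSequences)
        {
            if (seq.Length % dimension != 0)
                throw new ArgumentException("Sequence length is not a multiple of the dimension.");
            for (int i = 0; i < seq.Length; i++)
                mean[i % dimension] += seq[i];
            count += seq.Length / dimension;
        }
        if (count == 0)
            throw new ArgumentException("No observations to scale.");
        for (int k = 0; k < dimension; k++)
            mean[k] /= count;

        // Second pass: per-component (population) standard deviations.
        foreach (double[] seq in flatSequences)
            for (int i = 0; i < seq.Length; i++)
            {
                double d = seq[i] - mean[i % dimension];
                std[i % dimension] += d * d;
            }
        for (int k = 0; k < dimension; k++)
            std[k] = Math.Sqrt(std[k] / count);

        // Third pass: scale, mapping (near-)constant components to zero.
        var result = new double[flatSequences.Length][];
        for (int s = 0; s < flatSequences.Length; s++)
        {
            double[] seq = flatSequences[s];
            result[s] = new double[seq.Length];
            for (int i = 0; i < seq.Length; i++)
            {
                int k = i % dimension;
                result[s][i] = std[k] > minStd ? (seq[i] - mean[k]) / std[k] : 0.0;
            }
        }
        return result;
    }

    public static void Main()
    {
        // One sequence of two 2-dimensional observations: (1, 5) and (3, 5).
        double[][] scaled = Standardize(new[] { new double[] { 1, 5, 3, 5 } }, 2);
        Console.WriteLine(string.Join(", ", scaled[0])); // -1, 0, 1, 0
    }
}
```

In practice the means and standard deviations would be computed on the training sequences only and then reused to scale any sequence passed to the trained machine.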

Source: https://stackoverflow.com/questions/29887010/svm-with-dynamic-time-warping-kernel-return-error-rate-greater-than-0

Tags

machine-learning

svm

accord.net