问题
I am performing logistic regression in MATLAB with L2 regularization on text data. My program works well for small datasets. For larger sets, it keeps running infinitely.
I have seen the potentially duplicate question (matlab fminunc not quitting (running indefinitely)). In that question, the cost for initial theta was NaN and there was an error printed in the console. For my implementation, I am getting a real valued cost and there is no error even with verbose parameters being passed to fminunc(). Hence I believe this question might not be a duplicate.
I need help in scaling it to larger sets. The size of the training data I am currently working on is roughly 10k*12k (10k text files cumulatively containing 12k words). Thus, I have m=10k training examples and n=12k features.
My cost function is defined as follows:
function [J gradient] = costFunction(X, y, lambda, theta)
[m n] = size(X);
g = inline('1.0 ./ (1.0 + exp(-z))');
h = g(X*theta);
J =(1/m)*sum(-y.*log(h) - (1-y).*log(1-h))+ (lambda/(2*m))*norm(theta(2:end))^2;
gradient(1) = (1/m)*sum((h-y) .* X(:,1));
for i = 2:n
gradient(i) = (1/m)*sum((h-y) .* X(:,i)) - (lambda/m)*theta(i);
end
end
I am performing optimization using MATLAB's fminunc() function. The parameters I pass to fminunc() are:
options = optimset('LargeScale', 'on', 'GradObj', 'on', 'MaxIter', MAX_ITR);
theta0 = zeros(n, 1);
[optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(X, y, lambda, t), theta0, options);
I am running this code on a machine with these specifications:
Macbook Pro i7 2.8GHz / 8GB RAM / MATLAB R2011b
The cost function seems to behave correctly. For initial theta, I get acceptable values of J and gradient.
K>> theta0 = zeros(n, 1);
K>> [j g] = costFunction(X, y, lambda, theta0);
K>> j
j =
0.6931
K>> max(g)
ans =
0.4082
K>> min(g)
ans =
-2.7021e-05
The program takes incredibly long to run. I started profiling keeping MAX_ITR = 1 for fminunc(). With a single iteration, the program did not complete execution even after a couple of hours had elapsed. My questions are:
Am I doing something wrong mathematically?
Should I use any other optimizer instead of fminunc()? With LargeScale=on, fminunc() uses trust-region algorithms.
Is this problem cluster-scale and should not be run on a single machine?
Any other general tips will be appreciated. Thanks!
This helped solve the problem: I was able to get this working by setting the LargeScale flag to 'off' in fminunc(). From what I gather, LargeScale = 'on' uses trust region algorithms, while keeping it 'off' uses quasi-newton methods. Using quasi-newton methods and passing the gradient worked a lot faster for this particular problem and gave very nice results.
回答1:
I was able to get this working by setting the LargeScale flag to 'off' in fminunc(). From what I gather, LargeScale = 'on' uses trust region algorithms, while keeping it 'off' uses quasi-newton methods. Using quasi-newton methods and passing the gradient worked a lot faster for this particular problem and gave very nice results.
回答2:
Here is my advise:
-Set the Matlab flag to show debug output during run. If not just print out in your cost function the cost, which will allow you to monitor iteration count and error.
And second, which is very important:
Your problem is illposed, or so to say underdetermined. You have a 12k feature space and provide only 10k examples, which means for an unconstrained optimization the answer is -Inf. To make a quick example why this is, your problem is like: Minimize x+y+z given that x+y-z = 2. Feature space dim 3, spanned vector space - 1d. I suggest use PCA or CCA to reduce the dimensionality of the the text files by retaining their variation up to 99%. This will probably give you a feature space ~100-200dim.
PS: Just to point out that the problem is very fram from cluster size requirement, which usually is 1kk+ data points and that fminunc is not at all an overkill, and LIBSVM has nothing to do with it because fminunc is just a quadratic optimizer, while LIBSVM is a classifier. To clear out LIBSVM uses something similar to fminunc just with different objective function.
回答3:
Here's what I suspect to be the issue, based on my experience with this type of problem. You're using a dense representation for X
instead of a sparse one. You're also seeing the typical effect in text classification that the number of terms increasing roughly linearly with the number of samples. Effectively, the cost of the matrix multiplication X*theta
goes up quadratically with the number of samples.
By contrast, a good sparse matrix representation only iterates over the non-zero elements to do a matrix multiplication, which tends to be roughly constant per document if they're of appropriately constant length, causing linear instead of quadratic slowdown in the number of samples.
I'm not a Matlab guru, but I know it has a sparse matrix package, so try to use that.
来源:https://stackoverflow.com/questions/16696459/matlab-fminunc-not-completing-for-large-datasets-works-for-smaller-ones