Topics
Let A be a given real symmetric, i.e. A=A', matrix. Then, its minimal eigenvalue is real and can be found as a minimum of the following function:
R(x)=(Ax,x)/(x,x),
where x is an arbitrary nonzero real column-vector, and (u,v)=u'v is the standard scalar product of real column vectors. The gradient of this function is
grad R(x)=Ax - R(x)x,
up to a scalar multiplier, which is not important for us.
Write and test a MATLAB program of the gradient descent method
x(k+1)=xk - alpha(Axk - R(x)xk)
to find the minimum of function R(x), thus, to find the smallest eigenvalue of A. Use algorithm 10.3, pp. 716-618 from the text. Test it for different matrices. Try to use it to find the smallest eigenvalue of the matrix NOS6:from set LANPRO, from the Harwell-Boeing Collection. use a random vector as the initial guess. use MATLAB built in functions eig, and eigs, to find the eigenvalue, too. Discuss numerical results and performance. Write a report and publish it on the Internet.
The advantage of the newton and quasi-newton methods for solving systems of nonlinear equations is their speed of convergence once a sufficiently accurate approximation is known. A weakness of these methods is that an accurate initial approximation to the solution is needed to ensure convergence. The Steepest Descent method implemented in this project converges linearly to the solution, but it will usually converge even for poor initial approximations. As a consequence, this method is used to find sufficiently accurate starting approximations for the Newton-based techniques, in the same way the Bisection method is used for a single equation.
The method of Steepest Descent determines a local minimum for a multivariable
function of the form
The connection between
the minimization of a function from
and the
solution of a system of nonlinear equations is due to the fact that a system
of the form
f1(x1,x2,...,xn)=0, f2(x1,x2,...,xn)=0, . . . fn(x1,x2,...,xn)=0,has a solution at x = (x1, x2,...xn)t precisely when the function g defined by
has the minimal value.
The method of Steepest Descent for finding a local minimum for an arbitrary
function g from
can be intuitively
described as follows:
1. Evaluate g at an initial approximation x(0) = (x1(0), x2(0),...xn(0))t.
2. Determine a direction from x(0) that results in a decrease in the value of g.
3. Move an appropriate amount in this direction and call the new vector x(1).
4. Repeat steps 1 through 3 with x(0) replaced by x(1).
There are many variations to the method of Steepest Decent, some of which involve more intricate methods for determining the value of alpha that will produce a minimum for the single variable function h, this method will be done as extra credit. Other techniques use a multidimensional Taylor polynomial to replace the original multivariable function g and minimize the polynomial instead of g. In general the Steepest Decent methods are linearly convergent and converge independent of the starting approximation. In some instances, however, the methods may converge to something other than an absolute minimum of the function g.
Method:
The following program was written in MATLAB to incorporate Gradient Descent Method:
Main program body:
function y=descent(x_0,A,N) %This function performs the gradient descent technique %on a system g(p)=min g(x), where x is the initial %approximation. x=x_0; for k=1:N if alpha(x,A)==0 break else x=x-alpha(x,A)*((A*x)-(rayquo(A,x)*x)); end end y=x;Calculation of Alpha:
function y=alpha(x_vect,A) %This function calculates alpha for the gradient descent %technique for minimizing functions. g_1=rayquo(A,x_vect); z=((A*x_vect)-rayquo(A,x_vect)*x_vect); if norm(z,2)==0 y=0; else z=z/norm(z,2); alpha_1=0; alpha_3=1; g_3=rayquo(A,x_vect-alpha_3*z); while g_3>=g_1 alpha_3=(alpha_3)/2; g_3=rayquo(A,x_vect-alpha_3*z); end alpha_2=alpha_3/2; g_2=rayquo(A,x_vect-alpha_2*z); h_1=(g_2-g_1)/alpha_2; h_2=(g_3-g_2)/(alpha_3-alpha_2); h_3=(h_2-h_1)/alpha_3; alpha_0=0.5*(alpha_2-(h_1/h_3)); g_0=rayquo(A,x_vect-alpha_0*z); if min(g_0,g_3)==g_0 y=alpha_0; end if min(g_0,g_3)==g_3 y=alpha_3; end endCalculation of the Rayleigh quotient:
function y=rayquo(A,x_vect) %This function is the Rayleigh quotient. y=dot((A*x_vect),x_vect)/dot(x_vect,x_vect);
In writing and testing the program to perform the gradient
descent method for the Rayleigh Quotient, R(x), where
R(x)=(Ax,x)/(x,x) ; (A is a real symmetric matrix, x is
an arbitrary nonzero vector.
Minimizing R(x) will give the minimum
eigenvalue for A.)
there were two difficulties. One is to construct a program that
takes advantage of the sparse matrix format. The second is to
find a program for alpha.
In order to use sparse matrix format, the descent method was coded
as a function that took as one of its arguments a matrix, A. If
the matrix is in Matlab's sparse format, then Matlab will perform
all operations on it as a sparse matrix. The m-file used for this
function was descent.m, which performs iterations on the equation:
x{k+1} = x{k} - alpha*(A*x{k} - R(x{k})*x{k} .
The most difficult part was writing a function for alpha. The
function coded was based on the algorithm from the text, pg 617-
618. After much debugging, rechecking, and checking again, this
program was bug free. Testing it on different matrices, 4 by 4,
16 by 16 for example, produced favorable results. Running on the
nos6_mtx matrix did not produce decent results. Before running
the program, the true minimum eigenvalue for the matrix was found
using min(eig(A)). This eigenvalue was 1.000015259814004e+00.
The program was run using a random vector as the initial guess.
This was done with x=rand(size(A,1),1) to insure that the vector
had the correct dimensions. Then the program was run, varying the
number of iterations.
The program, with the nos6_mtx matrix, works well for 1, 2, and 3
iterations. It appears to be converging quite rapidly (the initial
approximations are quite far off, but rapidly approach the correct
value). At four iterations, however, the program produces a NaN
result, meaning that somewhere it is dividing by zero. How can
this be happening? The only possible places where a division by
zero could occur are in the Rayleigh quotient, in the alpha
subroutine. This means that at one point the zero vector is being
passed into R(x).
It is possible that their is a glitch in one of the subroutines,
but I was unable to find it. It is also possible that the random
choice of the initial vector led to these results. The only
favorable result obtained from running the program was the initial
appearance of rapid convergence.
Extra Credit:
Extra Credit #1:
Prove, that the formula for the gradient is correct, using the directional derivative, cf. 6664 Final, 1997.
Ñ f(x1, x2,…, xn) = fx1(x1, x2,…, xn) + fx2(x1, x2,…, xn) + … + fxn(x1, x2,…, xn)
Du f(x1, x2,…, xn) = fx1(x1, x2,…, xn) cosa + fx2(x1, x2,…, xn) cosb + … + fxn(x1, x2,…, xn) cosd
= v . Ñ f(x1, x2,…, xn)
If x is an eigenvector associated with the eigenvalue l , then Ax = l x, so the matrix A takes the vector x into a scalar multiple of itself.
So, we know that if R(x) is an eigenvalue and x is an eigenvector, then R(x)x = Ax. Then Ax - R(x)x = 0. This supports the idea that Ñ R(x) = Ax – R(x)x, in that if R(x) is an eigenvalue and x is an eigenvector, the function is at a local minimum, and we would not want R(x) or x to change.
However, because we are dealing with an arbitrary nonzero real column-vector and not necessarily an eigenvector, this does not prove the above statement to be true.
From the Axelsson text, Thm. 11.3 states: Let A be a selfadjoint and positive w.r.t. the iner product ( . , . ). Let x0 be an arbitrary vector and let d0 = - r0, where r0 = A x0 – b. For k = 0, 1, …, let
xk+1 = xk + t kdk
rk+1 = rk + t k Adk
dk+1 = - rk+1 + b k dk
where t k and b k are computed by (11.15) and (11.22), respectively, then:…
( c) If the inner product (u,v) = uTv, then the conjugate gradient method minimizes
where ek+1 = xk+1 – x*, where x* = A-1 b is the iteration error. Here,
(dk+1) TA -1 dj = 0, 0£ j £ k, and
(rk+1) T rj = 0, 0 £ j £ k.
Extra Credit #2:
Find the following paper in the library: [1] Knyazev, A.V.; Skorokhodov, A.L. On exact estimates of the convergence rate of the steepest ascent method in symmetric eigenvalue problem. Linear Algebra Appl. 154/156 (1991), 245-257. Take the optimal formula for alpha, which makes the method to be the steepest decent, from the paper and implement it in your program.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function alpha = f(A,x)
% computes alpha for steepest descent
% method
if size(x,2)~=1
x = x';
end
r = R(A,x);
ggrad = grad(A,x);
rgrad = R(A,ggrad);
alpha = 2*(r-rgrad+((r-rgrad)^2+4*(norm(ggrad))^2/(norm(x))^2)^(1/2))^(-1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
R is the Raleigh quotient, as follows:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function R = f(A,x)
% Raleigh Quotient computation
if size(x,2)~=1
x = x';
end
R = ((A*x)'*x)/(x'*x);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The gradient of the Rayleigh quotient
function is necessary, in order to
minimize R(x), and was computed
using the following code:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function grad = f(A,x)
% computes the gradient
if size(x,2)~=1
x = x';
end
grad = A*x - R(A,x)*x;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Starting with a matrix A and an initial guess
x, the minimal eigenvaulue is computed using the
steepest descent method as the minimum of the
Rayleigh quotient. The main routine for the steepest
descent method is as follows:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function eigmin= f(A,x)
% main routine for computing the
% minimum eigenvalue for a matrix
% A with nonzero initial guess x.
% returns a list of the errors by
% iterate, followed by minimum
% eigenvalue estimate
if size(x,2)~=1
x = x';
end
for i = 1:20
if (x'*x) < 10^(-6)
break
elseif norm(grad(A,x)) < 10^(-6)
break
end
x = x - alpha(A,x)*grad(A,x);
er(i) = abs(R(A,x)-min(eig(A)));
end
fprintf('errors are:\n')
fprintf('% .8f\n',er')
eigmin = R(A,x);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The following are results of running the
code using various test matrices:
Trial #1-
>> A = [2,0;0,1];
>> x = [1,1];
Run routine:
>> eigmin(A,x)
errors are:
0.00000000
ans =
1
Routine converged in one iteration
Trial #2-
>> A = [2,1;1,4];
>> x = [2,2]
Run routine:
>> eigmin(A,x)
errors are:
1.41421356
0.00000000
ans =
1.5858
Check:
>> eig(A)
ans =
1.5858
4.4142
For this second trial, the routine converged
in two iterations.
Trial #3-
>> A = [45,2;2,80];
>> x = [1,1];
Run routine:
>> eigmin(A,x)
errors are:
0.45418636
33.43450744
28.41970648
13.25828254
2.15415298
27.13811675
10.29985897
6.07422201
15.12037653
0.70600248
32.46041608
25.02778017
6.24113351
14.68613402
0.97331016
31.44215566
21.71240653
1.90731399
28.01163904
12.27582471
ans =
57.1619
Check:
>> eig(A)
ans =
44.8861
80.1139
For this trial, the routine did not converge within the twenty allowed
iterations. In fact, convergence does not happen in 1000 iterations. The
errors fluctuate wildly, but seem to be bounded - they do not appear to
grow beyond 35. The reason for this behavior is unclear. Since Steepest
Descent methods are generally linearally convergent, the errors should
be decreasing. I can only guess that there may be some bug in the code.