MATH 4660 - Numerical Analysis II

    Final Project : Iterative Methods in Nonquadratic Optimization
    Christopher Mehl, Margo Martinez, Elizabeth Pflum, and Robert VandenBosch

    Topics

    • Project Description
    • Background
    • Method
    • Results
    • Extra Credits

    Project Description:

    Let A be a given real symmetric, i.e. A=A', matrix. Then, its minimal eigenvalue is real and can be found as a minimum of the following function:

    R(x)=(Ax,x)/(x,x),

    where x is an arbitrary nonzero real column-vector, and (u,v)=u'v is the standard scalar product of real column vectors. The gradient of this function is

    grad R(x)=Ax - R(x)x,

    up to a scalar multiplier, which is not important for us.

    Write and test a MATLAB program of the gradient descent method

    x(k+1)=xk - alpha(Axk - R(x)xk)

    to find the minimum of function R(x), thus, to find the smallest eigenvalue of A. Use algorithm 10.3, pp. 716-618 from the text. Test it for different matrices. Try to use it to find the smallest eigenvalue of the matrix NOS6:from set LANPRO, from the Harwell-Boeing Collection. use a random vector as the initial guess. use MATLAB built in functions eig, and eigs, to find the eigenvalue, too. Discuss numerical results and performance. Write a report and publish it on the Internet.

    Background:

    The advantage of the newton and quasi-newton methods for solving systems of nonlinear equations is their speed of convergence once a sufficiently accurate approximation is known. A weakness of these methods is that an accurate initial approximation to the solution is needed to ensure convergence. The Steepest Descent method implemented in this project converges linearly to the solution, but it will usually converge even for poor initial approximations. As a consequence, this method is used to find sufficiently accurate starting approximations for the Newton-based techniques, in the same way the Bisection method is used for a single equation.

    The method of Steepest Descent determines a local minimum for a multivariable function of the form  The connection between the minimization of a function from  and the solution of a system of nonlinear equations is due to the fact that a system of the form

                            f1(x1,x2,...,xn)=0,
                            f2(x1,x2,...,xn)=0,
                                    .
                                    .
                                    .
                            fn(x1,x2,...,xn)=0,
    has a solution at x = (x1, x2,...xn)t precisely when the function g defined by

    has the minimal value.

    The method of Steepest Descent for finding a local minimum for an arbitrary function g from  can be intuitively described as follows:

    1. Evaluate g at an initial approximation x(0) = (x1(0), x2(0),...xn(0))t.

    2. Determine a direction from x(0) that results in a decrease in the value of g.

    3. Move an appropriate amount in this direction and call the new vector x(1).

    4. Repeat steps 1 through 3 with x(0) replaced by x(1).

    There are many variations to the method of Steepest Decent, some of which involve more intricate methods for determining the value of alpha that will produce a minimum for the single variable function h, this method will be done as extra credit. Other techniques use a multidimensional Taylor polynomial to replace the original multivariable function g and minimize the polynomial instead of g. In general the Steepest Decent methods are linearly convergent and converge independent of the starting approximation. In some instances, however, the methods may converge to something other than an absolute minimum of the function g.

    Method:

    The following program was written in MATLAB to incorporate Gradient Descent Method:

    Main program body:

    function y=descent(x_0,A,N)
    %This function performs the gradient descent technique
    %on a system g(p)=min g(x), where x is the initial
    %approximation.
    
    x=x_0;
    for k=1:N
      if alpha(x,A)==0
        break
      else
        x=x-alpha(x,A)*((A*x)-(rayquo(A,x)*x));
      end
    end
    y=x;
    Calculation of Alpha:
    function y=alpha(x_vect,A)
    %This function calculates alpha for the gradient descent
    %technique for minimizing functions.
    
    g_1=rayquo(A,x_vect);
    z=((A*x_vect)-rayquo(A,x_vect)*x_vect);
    if norm(z,2)==0
      y=0;
    else
      z=z/norm(z,2);
      alpha_1=0;
      alpha_3=1;
      g_3=rayquo(A,x_vect-alpha_3*z);
    
      while g_3>=g_1
        alpha_3=(alpha_3)/2;
        g_3=rayquo(A,x_vect-alpha_3*z);
      end
      alpha_2=alpha_3/2;
      g_2=rayquo(A,x_vect-alpha_2*z);
      h_1=(g_2-g_1)/alpha_2;
      h_2=(g_3-g_2)/(alpha_3-alpha_2);
      h_3=(h_2-h_1)/alpha_3;
      alpha_0=0.5*(alpha_2-(h_1/h_3));
      g_0=rayquo(A,x_vect-alpha_0*z);
      if min(g_0,g_3)==g_0
        y=alpha_0;
      end
      if min(g_0,g_3)==g_3
        y=alpha_3;
      end
    end
    Calculation of the Rayleigh quotient:
    function y=rayquo(A,x_vect)
    %This function is the Rayleigh quotient.
    
    y=dot((A*x_vect),x_vect)/dot(x_vect,x_vect);

    Results:
    In writing and testing the program to perform the gradient 
    descent method for the Rayleigh Quotient, R(x), where
    R(x)=(Ax,x)/(x,x)   ; (A is a real symmetric matrix, x is 
                           an arbitrary nonzero vector.  
                           Minimizing R(x) will give the minimum 
                           eigenvalue for A.)
    
    there were two difficulties.  One is to construct a program that 
    takes advantage of the sparse matrix format.  The second is to 
    find a program for alpha.
    
    In order to use sparse matrix format, the descent method was coded
    as a function that took as one of its arguments a matrix, A.  If 
    the matrix is in Matlab's sparse format, then Matlab will perform 
    all operations on it as a sparse matrix.  The m-file used for this 
    function was descent.m, which performs iterations on the equation:
    
    x{k+1} = x{k} - alpha*(A*x{k} - R(x{k})*x{k} .
    
    The most difficult part was writing a function for alpha.  The 
    function coded was based on the algorithm from the text, pg 617-
    618.  After much debugging, rechecking, and checking again, this 
    program was bug free.  Testing it on different matrices, 4 by 4, 
    16 by 16 for example, produced favorable results.  Running on the 
    nos6_mtx matrix did not produce decent results.  Before running 
    the program, the true minimum eigenvalue for the matrix was found 
    using min(eig(A)).  This eigenvalue was 1.000015259814004e+00.  
    The program was run using a random vector as the initial guess.  
    This was done with x=rand(size(A,1),1) to insure that the vector 
    had the correct dimensions.  Then the program was run, varying the 
    number of iterations.  
    
    The program, with the nos6_mtx matrix, works well for 1, 2, and 3 
    iterations.  It appears to be converging quite rapidly (the initial
    approximations are quite far off, but rapidly approach the correct 
    value).  At four iterations, however, the program produces a NaN 
    result, meaning that somewhere it is dividing by zero.  How can 
    this be happening?  The only possible places where a division by 
    zero could occur are in the Rayleigh quotient, in the alpha 
    subroutine.  This means that at one point the zero vector is being
    passed into R(x). 
    
    It is possible that their is a glitch in one of the subroutines, 
    but I was unable to find it.  It is also possible that the random 
    choice of the initial vector led to these results.  The only 
    favorable result obtained from running the program was the initial
    appearance of rapid convergence.

    Extra Credit:

    • Extra Credit #1
    • Extra Credit #2

    Extra Credit #1:

    Prove, that the formula for the gradient is correct, using the directional derivative, cf. 6664 Final, 1997.

    Proof

    Definitions:
    1. gradient: z = f(x1, x2,…, xn), then the gradient of f, denoted by Ñ f(x1, x2,…, xn) is the vector
    2. Ñ f(x1, x2,…, xn) = fx1(x1, x2,…, xn) + fx2(x1, x2,…, xn) + … + fxn(x1, x2,…, xn)

       

    3. directional derivative: Let f be a function of x1, x2,…, xn, with continuous first partial derivatives. The directional derivative of f in the direction of the unit vector v = cosa i + cosb j + cosd k is given by
    4. Du f(x1, x2,…, xn) = fx1(x1, x2,…, xn) cosa + fx2(x1, x2,…, xn) cosb + … + fxn(x1, x2,…, xn) cosd

      = v . Ñ f(x1, x2,…, xn)

       

    5. characteristic polynomial: If A is a square matrix, the polynomial defined by p(l ) = det(A-l I) is called the characteristic polynomial of A.
    6. eigenvalue and eigenvector: If p is the characteristic polynomial of the matrix A, the zeros of p are called eigenvalues,, or characteristic values, of the matrix A. If l is an eigenvalue of A and x¹ 0 has the property that (A - l I)x = 0, then x is called an eigenvector, or characteristic vector, of A corresponding to the eigenvalue l .
     

    If x is an eigenvector associated with the eigenvalue l , then Ax = l x, so the matrix A takes the vector x into a scalar multiple of itself.

     

    So, we know that if R(x) is an eigenvalue and x is an eigenvector, then R(x)x = Ax. Then Ax - R(x)x = 0. This supports the idea that Ñ R(x) = Ax – R(x)x, in that if R(x) is an eigenvalue and x is an eigenvector, the function is at a local minimum, and we would not want R(x) or x to change.

     

    However, because we are dealing with an arbitrary nonzero real column-vector and not necessarily an eigenvector, this does not prove the above statement to be true.

     

    From the Axelsson text, Thm. 11.3 states: Let A be a selfadjoint and positive w.r.t. the iner product ( . , . ). Let x0 be an arbitrary vector and let d0 = - r0, where r0 = A x0 – b. For k = 0, 1, …, let

    xk+1 = xk + t kdk

    rk+1 = rk + t k Adk

    dk+1 = - rk+1 + b k dk

    where t k and b k are computed by (11.15) and (11.22), respectively, then:…

    ( c) If the inner product (u,v) = uTv, then the conjugate gradient method minimizes

    (rk+1) TA -1 r k+1 = (ek+1) TA ek+1

    where ek+1 = xk+1 – x*, where x* = A-1 b is the iteration error. Here,

    (dk+1) TA -1 dj = 0, 0£ j £ k, and

    (rk+1) T rj = 0, 0 £ j £ k.

    Extra Credit #2:

    Find the following paper in the library: [1] Knyazev, A.V.; Skorokhodov, A.L. On exact estimates of the convergence rate of the steepest ascent method in symmetric eigenvalue problem. Linear Algebra Appl. 154/156 (1991), 245-257. Take the optimal formula for alpha, which makes the method to be the steepest decent, from the paper and implement it in your program.

    Implemenation of Steepest Descent Method with optimal alpha

    The following Matlab function was used to calclate alpha as prescribed in [1]:
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function alpha = f(A,x)
    
    % computes alpha for steepest descent 
    % method  
    
    if size(x,2)~=1
    x = x';
    end
    r = R(A,x);
    ggrad = grad(A,x);
    rgrad = R(A,ggrad);
    
    alpha = 2*(r-rgrad+((r-rgrad)^2+4*(norm(ggrad))^2/(norm(x))^2)^(1/2))^(-1);
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    R is the Raleigh quotient, as follows:
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function R = f(A,x)
    
    % Raleigh Quotient computation
    
    if size(x,2)~=1
    x = x';
    end
    
    R = ((A*x)'*x)/(x'*x);
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    The gradient of the Rayleigh quotient 
    function is necessary, in order to 
    minimize R(x), and was computed
    using the following code:
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function grad = f(A,x)
    
    % computes the gradient 
    
    if size(x,2)~=1
    x = x';
    end
    
    grad = A*x - R(A,x)*x;
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    Starting with a matrix A and an initial guess
    x, the minimal eigenvaulue is computed using the
    steepest descent method as the minimum of the
    Rayleigh quotient.  The main routine for the steepest
    descent method is as follows:
    
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function eigmin= f(A,x)
    
    % main routine for computing the
    % minimum eigenvalue for a matrix
    % A with nonzero initial guess x.
    
    % returns a list of the errors by
    % iterate, followed by minimum 
    % eigenvalue estimate
    
    if size(x,2)~=1
    x = x';
    end
    
    for i = 1:20
    if (x'*x) < 10^(-6)
    break
    elseif norm(grad(A,x)) < 10^(-6)
    break
    end
    x = x - alpha(A,x)*grad(A,x);
    er(i) = abs(R(A,x)-min(eig(A)));
    end
    fprintf('errors are:\n')
    fprintf('% .8f\n',er')
    eigmin = R(A,x);
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    The following are results of running the
    code using various test matrices:
    
    Trial #1-
    >> A = [2,0;0,1];
    >> x = [1,1];
    
    Run routine:
    >> eigmin(A,x)
    errors are:
     0.00000000
    
    ans =
    
         1
    
    Routine converged in one iteration
    
    Trial #2-
    >> A = [2,1;1,4];
    >> x = [2,2]
    
    Run routine:
    >> eigmin(A,x)
    errors are:
     1.41421356
     0.00000000
    
    ans =
    
        1.5858
    
    Check:
    >> eig(A)
    
    ans =
    
        1.5858
        4.4142
    
    For this second trial, the routine converged
    in two iterations.
    
    Trial #3-
    >> A = [45,2;2,80];
    >> x = [1,1];
    
    Run routine:
    >> eigmin(A,x)
    errors are:
     0.45418636
     33.43450744
     28.41970648
     13.25828254
     2.15415298
     27.13811675
     10.29985897
     6.07422201
     15.12037653
     0.70600248
     32.46041608
     25.02778017
     6.24113351
     14.68613402
     0.97331016
     31.44215566
     21.71240653
     1.90731399
     28.01163904
     12.27582471
    
    ans =
    
       57.1619
    
    Check:
    >> eig(A)
    
    ans =
    
       44.8861
       80.1139
    For this trial, the routine did not converge within the twenty allowed iterations. In fact, convergence does not happen in 1000 iterations. The errors fluctuate wildly, but seem to be bounded - they do not appear to grow beyond 35. The reason for this behavior is unclear. Since Steepest Descent methods are generally linearally convergent, the errors should be decreasing. I can only guess that there may be some bug in the code.