next up previous
Next: Krylov Space Methods Up: No Title Previous: Domain Decomposition

Conjugate Gradient Method

There are many techniques for solving systems of linear equations, i.e., solving for x in Ax = b. When the system is dense, the routines in the LAPACK or ScaLAPACK libraries are appropriate. It is often the case that the system is sparse, the case of most systems that arise as discritixations of PDEs. The Conjugate Gradient method is suitable for solving any linear system where the coefficient matrix A is both symmetric, and positive definite. A matrix A is positive definite if (x,Ax) ;SPMgt; 0 for all nonzero vectors x. If you think about it for a moment, you will realize this condition is equivalent to all the eigenvalues of A being positive. CG is often used for sparse systems, because it only requires a single operation, matrix-vector multiplication. In fact, a "matrix free" formulation does not even require that you actually have A, but only that you have a function for computing the product of A and a vector, say v.

Roughly speaking, CG is a minimization procedure for the equation

displaymath297

A sketch of the CG algorithm:

     x =          initial guess for solution of Ax=b
     r = b - Ax = residual, to be made small
     p = r      = initial "search direction"
     
     do while ( new_r , new_r ) not small
        v = Ap                        ... matrix-vector multiply
        a = ( r , r ) / ( p , v )     ... dot product 
        x = x + a*p                   ... updated approximate solution
        r_new = r - a*v               ... update the residual
        g = ( r_new , r_new ) / ( r , r )
        p = r_new + g*p               ... update search direction
        r = r_new
     enddo

Here is a rough motivation for this algorithm.

CG maintains 3 vectors at each step, the approximate solution x, its residual r=Ax-b, and a search direction p, which is also called a conjugate gradient. At each step x is improved by searching for a better solution in the direction p, yielding an improved solution x+a*p. This direction p is called a gradient because we are in fact doing gradient descent on a certain measure of the error (namely sqrt(r , A^-1 r)). The directions p_i and p_j from steps i and j of the algorithm are called conjugate, or more precisely A-conjugate, because they satisfy ( p_i , Ap_j ) = 0 if i!=j. [Prove this fact!] One can also show that after i iterations x_i is the "optimal" solution among all possible linear combinations of the form:

displaymath299

For most matrices, the majority of work is in the sparse matrix-vector multiplication v=Ap in the first step. all matrix-vector and vector-vector operations should use BLAS routine. Operations in CG are easy to parallelize.

The rate of convergence of CG depends on the condition number of A. The condition number is ratio of the largest to the smallest eigenvalue of A. A roughly equivalent quantity is ||A||*||A^-1|| where the norm of a matrix is the magnitude of the largest entry. The larger the condition number, the slower the convergence. One can show that the number of CG iterations required to reduce the error by a constant g<1 is proportional to the square root of the condition number. One can speed up convergence by "preconditioning" - at an intermediate stage of the calculatin, solving a related system Mz=q, where M is an approximate inverse of A and easy to invert. Choosing a good preconditioner^ is something of an art. Among the options: point- or block- Jacobi, ADI, ILU [incomplete LU]. There are methods similar to CG that are suitable for matrices A that are not symmetric or positive definite [CG squared, Bi-CG, MINRES, GMRES] Like CG, most of their work involves computing the matrix vector product Ap or A^tp, as well as dot-products. See the Templates for the Solution of Sparse Linear Systems webpage.


next up previous
Next: Krylov Space Methods Up: No Title Previous: Domain Decomposition

E. Bruce Pitman
Thu Feb 10 18:57:30 EST 2000