On the nonmonotone behavior of the Newton‐GMBACK method

Abstract

GMBACK is a Krylov solver for large linear systems, based on backward error minimization properties. The minimum backward error is guaranteed (in exact arithmetic) to decrease when the subspace dimension is increased. In this paper we consider two test problems which lead to nonlinear systems, which we solve by Newton-GMBACK. We notice that in floating point arithmetic the mentioned property no longer holds; this leads to nonmonotone behavior of the errors, as reported in a previous paper. We also propose a remedy which removes this drawback.

Authors

Emil Cătinaș
Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy

Keywords

linear systems; Krylov subspace method; backward error; GMBACK.


Paper coordinates

E. Cătinaş, On the nonmonotone behavior of the Newton-GMBACK method, AIP Conf. Proc. 1046, pp. 87-90, 2008.
https://doi.org/10.1063/1.2997323

PDF

About this paper

Journal

AIP Conference Proceedings

Publisher Name

AIP


Paper (preprint) in HTML form


On the Nonmonotone Behavior of the Newton-GMBACK Method

Emil Cătinaş, Romanian Academy, "T. Popoviciu" Institute of Numerical Analysis, P.O. Box 68-1, Cluj-Napoca, Romania, e-mail: ecatinas@ictp.acad.ro

Abstract

GMBACK is a Krylov solver for large linear systems, based on backward error minimization properties. The minimum backward error is guaranteed (in exact arithmetic) to decrease when the subspace dimension is increased. In this paper we consider two test problems which lead to nonlinear systems, which we solve by Newton-GMBACK. We notice that in floating point arithmetic the mentioned property no longer holds; this leads to nonmonotone behavior of the errors, as reported in a previous paper. We also propose a remedy which removes this drawback.

Keywords: linear/nonlinear systems, Krylov solvers, Newton method.
PACS: 02.60.Cb

NONLINEAR SYSTEMS

The Newton method for solving a nonlinear system $F(x)=0$, $F: D \subseteq \mathbb{R}^{N} \rightarrow \mathbb{R}^{N}$, requires the solution of a linear system at each iteration step:
$$\begin{aligned} & F^{\prime}(x_{k})\, s_{k} = -F(x_{k}) \\ & x_{k+1} = x_{k} + s_{k}, \quad k=0,1,\ldots, \quad x_{0} \in D. \end{aligned}$$
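As an illustration, the iteration above can be sketched in a few lines (Python rather than the MATLAB used for the experiments below; the functions `F`, `J` and the stopping tolerance are illustrative placeholders):

```python
import numpy as np

def newton(F, J, x0, tol=1e-12, max_iter=50):
    """Minimal Newton iteration sketch: solve F'(x_k) s_k = -F(x_k),
    then update x_{k+1} = x_k + s_k."""
    x = x0.astype(float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(J(x), -Fx)   # F'(x_k) s_k = -F(x_k)
        x = x + s                        # x_{k+1} = x_k + s_k
    return x

# Example: F(x, y) = (x^2 + y^2 - 4, x - y), with root (sqrt(2), sqrt(2))
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
root = newton(F, J, np.array([1.0, 2.0]))
```

Started close enough to the root, the iterates display the fast (here quadratic) local convergence discussed below.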
Under the following conditions (which will be implicitly assumed throughout this paper) the Newton method converges locally at a $q$-superlinear rate (see [7]):
  • there exists $x^{*} \in \operatorname{int} D$ such that $F(x^{*})=0$;
  • the mapping $F$ is Fréchet differentiable on $\operatorname{int} D$, with $F^{\prime}$ continuous at $x^{*}$;
  • the Jacobian $F^{\prime}(x^{*})$ is invertible.
When one considers approximate Jacobians at each step, $F^{\prime}(x_{k})+\Delta_{k} \in \mathbb{R}^{N \times N}$, we are led to the quasi-Newton (QN) iterates
$$\begin{aligned} & \left(F^{\prime}(x_{k})+\Delta_{k}\right) s_{k} = -F(x_{k}) \\ & x_{k+1} = x_{k} + s_{k}, \quad k=0,1,\ldots, \quad x_{0} \in D. \end{aligned}$$
We have characterized the superlinear convergence of these iterates in the following result:
Theorem 1. [3] Assume that the QN iterates converge to $x^{*}$. Then the convergence is superlinear if and only if
$$\|\Delta_{k} s_{k}\| = o\left(\|F(x_{k})\|\right), \quad \text{as } k \rightarrow \infty. \tag{1}$$

THE GMBACK METHOD

When the dimension $N$ is large, the numerical solution of a linear system
$$A u = b, \quad A \in \mathbb{R}^{N \times N} \text{ nonsingular}, \quad b \in \mathbb{R}^{N},$$
becomes a difficult task. The Krylov solvers are popular choices for accomplishing it, since they may offer good approximations at low computational cost.
We shall consider here the GMBACK solver introduced by Kasenally in [6]. For a given subspace dimension $m \in \{1,\ldots,N\}$ and an initial approximation $u_{0} \in \mathbb{R}^{N}$ with residual $r_{0} = b - A u_{0}$, it finds $u_{m}^{GB} \in u_{0} + \mathscr{K}_{m} = u_{0} + \operatorname{span}\{r_{0}, A r_{0}, \ldots, A^{m-1} r_{0}\}$ by the following minimization property:
$$\|\Delta_{m}^{GB}\|_{F} = \min_{u_{m} \in u_{0}+\mathscr{K}_{m}} \|\Delta_{m}\|_{F} \quad \text{w.r.t. } (A-\Delta_{m})\, u_{m} = b.$$
Here $\|\cdot\|_{F}$ denotes the Frobenius norm of a matrix, $\|Z\|_{F} = \operatorname{tr}(Z Z^{t})^{1/2}$, while $\|\cdot\|_{2}$ will denote the Euclidean norm on $\mathbb{R}^{N}$ and its induced operator norm.
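Although not stated explicitly above, the constraint has a simple closed form for each fixed $u$: the minimal-Frobenius-norm perturbation satisfying $(A-\Delta)u=b$ is the rank-one matrix $(Au-b)u^{t}/(u^{t}u)$, with $\|\Delta\|_{F} = \|Au-b\|_{2}/\|u\|_{2}$ (the classical normwise backward error). A quick numerical check of this fact (illustrative Python, random data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
u = rng.standard_normal(5)         # an arbitrary trial approximation

r = A @ u - b
Delta = np.outer(r, u) / (u @ u)   # rank-one backward perturbation
# (A - Delta) u = b holds exactly, and ||Delta||_F = ||r||_2 / ||u||_2
```

GMBACK thus minimizes the quantity $\|b-Au\|_{2}/\|u\|_{2}$ over the affine Krylov subspace.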
The following steps are performed for determining $u_{m}^{GB}$:

Arnoldi
  • Let $r_{0} = b - A u_{0}$, $\beta = \|r_{0}\|_{2}$ and $v_{1} = \frac{1}{\beta} r_{0}$;
  • For $j = 1,\ldots,m$ do
$$\begin{aligned} & h_{ij} = (A v_{j}, v_{i}), \quad \text{for } i=1,\ldots,j \\ & \hat{v}_{j+1} = A v_{j} - \sum_{i=1}^{j} h_{ij} v_{i} \\ & h_{j+1,j} = \|\hat{v}_{j+1}\|_{2} \\ & v_{j+1} = \frac{1}{h_{j+1,j}} \hat{v}_{j+1} \end{aligned}$$
  • Form the Hessenberg matrix $\bar{H}_{m} \in \mathbb{R}^{(m+1) \times m}$ with the (possibly) nonzero elements $h_{ij}$ determined above, and the matrix $V_{m} \in \mathbb{R}^{N \times m}$ having as columns the vectors $v_{j}$: $V_{m} = [v_{1} \ldots v_{m}]$.

GMBACK
  • Let $\beta = \|r_{0}\|_{2}$,
$$\hat{H}_{m} = \begin{bmatrix} -\beta e_{1} & \bar{H}_{m} \end{bmatrix} \in \mathbb{R}^{(m+1) \times (m+1)}, \quad \hat{G}_{m} = \begin{bmatrix} u_{0} & V_{m} \end{bmatrix} \in \mathbb{R}^{N \times (m+1)},$$
$$P = \hat{H}_{m}^{t} \hat{H}_{m} \in \mathbb{R}^{(m+1) \times (m+1)} \quad \text{and} \quad Q = \hat{G}_{m}^{t} \hat{G}_{m} \in \mathbb{R}^{(m+1) \times (m+1)};$$
  • Determine an eigenvector $v_{m+1}$ corresponding to the smallest eigenvalue $\lambda_{m+1}^{GB}$ of the generalized eigenproblem $P v = \lambda Q v$;
  • If the first component $v_{m+1}^{(1)}$ is nonzero, compute the vector $y_{m}^{GB} \in \mathbb{R}^{m}$ by scaling $v_{m+1}$ such that
$$\begin{bmatrix} 1 \\ y_{m}^{GB} \end{bmatrix} = \frac{1}{v_{m+1}^{(1)}} v_{m+1};$$
  • Set $u_{m}^{GB} = u_{0} + V_{m} y_{m}^{GB}$.
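The steps above can be sketched as follows (a minimal dense Python/NumPy sketch, not the MATLAB code used for the experiments; it relies on SciPy's symmetric-definite generalized eigensolver and assumes no Arnoldi breakdown and a nonzero first eigenvector component):

```python
import numpy as np
from scipy.linalg import eigh

def gmback(A, b, u0, m):
    """GMBACK sketch: Arnoldi process, then the generalized
    eigenproblem P v = lambda Q v for the backward-error minimizer."""
    N = len(b)
    r0 = b - A @ u0
    beta = np.linalg.norm(r0)
    V = np.zeros((N, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):                      # Arnoldi (modified Gram-Schmidt)
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1); e1[0] = 1.0
    H_hat = np.hstack([(-beta * e1)[:, None], H])    # [-beta*e1  Hbar_m]
    G_hat = np.hstack([u0[:, None], V[:, :m]])       # [u0  V_m]
    P, Q = H_hat.T @ H_hat, G_hat.T @ G_hat
    lam, W = eigh(P, Q)          # eigenvalues in increasing order, Q > 0
    v = W[:, 0]                  # eigenvector of the smallest eigenvalue
    y = (v / v[0])[1:]           # scale so that the first component is 1
    return u0 + V[:, :m] @ y, lam[0]

rng = np.random.default_rng(0)
N = 30
A = np.eye(N) + 0.1 * rng.standard_normal((N, N))
b, u0 = rng.standard_normal(N), rng.standard_normal(N)
u, lam = gmback(A, b, u0, 10)
# formula (2): ||Delta||_F = sqrt(lambda), which for the computed u
# equals the normwise backward error ||b - A u||_2 / ||u||_2
```

The returned $\sqrt{\lambda}$ can be checked directly against $\|b-Au\|_{2}/\|u\|_{2}$, which is how formula (2) below is verified numerically.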
We shall assume in the following analysis that $u_{m}^{GB}$ exists (this may not be the case when all the eigenvectors corresponding to the smallest eigenvalue have the first component equal to zero).
Kasenally proved that for any $u_{0} \in \mathbb{R}^{N}$ and $m \in \{1,\ldots,N\}$, the backward error $\Delta_{m}^{GB}$ corresponding to the GMBACK solution satisfies
$$\|\Delta_{m}^{GB}\|_{F} = \sqrt{\lambda_{m+1}^{GB}}. \tag{2}$$
Regarding the induced Euclidean operator norm, the following inequality is known:
$$\|Z\|_{2} \leq \|Z\|_{F}, \quad \text{for all } Z \in \mathbb{R}^{N \times N}. \tag{3}$$
The eigenvalues at steps $m$ and $m+1$ of the Arnoldi algorithm interlace as follows.
Lemma 2. [6] Suppose that $m+1$ steps of the Arnoldi process have been taken and $h_{m+2,m+1} \neq 0$. Furthermore, assume that $\{\lambda_{i}^{GB}\}_{i=1,\ldots,m+1}$ and $\{\hat{\lambda}_{i}^{GB}\}_{i=1,\ldots,m+2}$ are, respectively, the eigenvalues of the matrix pairs $(P_{m}, Q_{m})$ and $(P_{m+1}, Q_{m+1})$, arranged in decreasing order. Then, for any $i \leq m+1$,
$$\hat{\lambda}_{i}^{GB} \geq \lambda_{i}^{GB} \geq \hat{\lambda}_{i+1}^{GB}.$$
In particular, $\lambda_{m+1}^{GB} \geq \hat{\lambda}_{m+2}^{GB}$, so by (2) the minimum backward error cannot increase when the subspace dimension grows.

THE CONVERGENCE OF THE NEWTON-GMBACK METHOD

The Newton-GMBACK iterates may be written as
$$\left(F^{\prime}(y_{k}) - \Delta_{k}^{GB}\right) s_{k}^{GB} = -F(y_{k}), \tag{4}$$
and we may control the convergence of the iterates by Theorem 1 with the aid of the backward error. It is worth noting that we may use formulas (2) and (3) to evaluate the magnitude of the backward error in the Euclidean norm.
We obtain:
Theorem 3. Assume that the Newton-GMBACK iterates are well defined and converge to $x^{*}$. If
$$\lambda_{k}^{GB} \rightarrow 0, \quad \text{as } k \rightarrow \infty, \tag{5}$$
then they converge superlinearly.
Of course, we may also use the inexact Newton model, and control the convergence of the iterates with the aid of residuals, see [4].
FIGURE 1. Newton-GMBACK errors for the Bratu problem.

Bratu problem

Consider the nonlinear partial differential equation
$$-\Delta u + \alpha u_{x} + \lambda e^{u} = f$$
over the unit square of $\mathbb{R}^{2}$, with Dirichlet boundary conditions. As mentioned in [1], this is a standard problem, a simplified form of which is known as the Bratu problem. We have discretized the Laplacian by 5-point finite differences and the first-order term by central finite differences, on a uniform mesh, obtaining a system of nonlinear equations of size $N=(n-2)^{2}$, where $n$ is the number of mesh points in each direction. As in [1], we took $f$ such that the solution of the discretized problem is the constant unity, with $\alpha=10$, $\lambda=1$; the initial approximations in the inner iterations were zero. The runs were made on an HP ProLiant 570 G4 server, using MATLAB 2007a.
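The discrete residual can be sketched as follows (Python rather than the MATLAB used for the runs; the boundary values, the orientation of the $u_x$ difference, and the choice $f=\lambda e$ that makes $u \equiv 1$ an exact discrete solution are illustrative assumptions):

```python
import numpy as np

def bratu_F(U, n, alpha=10.0, lam=1.0, f=0.0):
    """Residual of the discretized -Lap(u) + alpha*u_x + lam*e^u = f on the
    unit square, with u = 1 on the boundary; U holds the (n-2)^2 interior
    unknowns, row by row (illustrative sketch)."""
    h = 1.0 / (n - 1)
    u = np.ones((n, n))                     # Dirichlet boundary values = 1
    u[1:-1, 1:-1] = U.reshape(n - 2, n - 2)
    lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
           - 4.0 * u[1:-1, 1:-1]) / h**2    # 5-point Laplacian
    ux = (u[1:-1, 2:] - u[1:-1, :-2]) / (2.0 * h)   # central difference
    return (-lap + alpha * ux + lam * np.exp(u[1:-1, 1:-1]) - f).ravel()

# choosing f = lam * e makes the constant unity an exact discrete solution,
# as in the experiments (here n = 6, i.e. N = 16, for illustration)
n = 6
res = bratu_F(np.ones((n - 2) ** 2), n, f=np.e)
```

The resulting system $F(U)=0$ is what the Newton-GMBACK iteration is applied to.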
We took $N=16{,}384$ and considered first some standard iterations, with fixed subspace dimension $m=40$. We noticed that, starting from outer iteration $k=9$, where $\|F\|$ was of magnitude $10^{-9}$, Lemma 2 did not hold in floating point arithmetic. As we can see in Figure 1, the consequence is that the convergence of the errors of the iterates is no longer monotone.
The remedy we propose is to check at each inner iteration step in GMBACK whether $\lambda$ is still decreasing, and to stop the inner iterations as soon as it increases. We then obtain monotone behavior of the errors; the plot in Figure 1 is representative of the runs we made.
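The safeguard can be sketched generically (illustrative Python; `lambdas` stands for the sequence of smallest generalized eigenvalues computed during the inner GMBACK iterations, however they are produced):

```python
def safeguarded_dimension(lambdas):
    """Stop the inner iteration once the computed smallest eigenvalue
    lambda_m stops decreasing; return the last dimension m with monotone
    decrease and the corresponding lambda (illustrative sketch)."""
    best_m, best_lam = None, float("inf")
    for m, lam in enumerate(lambdas, start=1):
        if lam >= best_lam:        # monotone decrease lost in floating point
            break
        best_m, best_lam = m, lam
    return best_m, best_lam

# e.g. eigenvalues that decrease, then drift upward due to rounding
seq = [1e-2, 3e-3, 5e-4, 2e-4, 9e-5, 1.3e-4, 2e-4]
m_stop, lam_stop = safeguarded_dimension(seq)
```

With the sample sequence above, the inner iteration is cut off at the fifth dimension, just before the eigenvalues start drifting upward.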

ACKNOWLEDGMENTS

This research has been supported by MEdC under grant 2CEx06-11-96.

REFERENCES

  1. P. N. Brown and Y. Saad, Hybrid Krylov methods for nonlinear systems of equations, SIAM J. Sci. Stat. Comput. 11, pp. 450-481 (1990).
  2. E. Cătinaş, Inexact perturbed Newton methods and applications to a class of Krylov solvers, J. Optim. Theory Appl. 108, pp. 543-570 (2001).
  3. E. Cătinaş, The inexact, inexact perturbed and quasi-Newton methods are equivalent models, Math. Comp. 74, no. 249, pp. 291-301 (2005).
  4. R. S. Dembo, S. C. Eisenstat and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal. 19, pp. 400-408 (1982).
  5. J. E. Dennis, Jr. and J. J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comp. 28, pp. 549-560 (1974).
  6. E. M. Kasenally, GMBACK: a generalised minimum backward error algorithm for nonsymmetric linear systems, SIAM J. Sci. Comput. 16, pp. 698-719 (1995).
  7. J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
