GMBACK is a Krylov solver for linear systems in \(\mathbb{R}^n\). We analyze here the high convergence orders (superlinear convergence) of the Newton-GMBACK methods, which can be characterized by applying three different existing results. In this note we show by some direct computations that these characterizations are equivalent.
Authors
Emil Cătinaş
(Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy)
Keywords
nonlinear systems of equations in \(\mathbb{R}^n\); inexact Newton method; Krylov methods; linear systems of equations in \(\mathbb{R}^n\); residual; local convergence; superlinear convergence.
Cite this paper as:
E. Cătinaş, On the high convergence orders of the Newton-GMBACK methods, Rev. Anal. Numér. Théor. Approx., 28 (1999) no. 2, pp. 125–132.
References
[1] P.N. Brown, A theoretical comparison of the Arnoldi and GMRES algorithms, SIAM J. Sci. Stat. Comput., 12 (1991) no. 1, pp. 58–78.
[2] P.N. Brown and H.F. Walker, GMRES on (nearly) singular systems, SIAM J. Matrix Anal. Appl., 18 (1997) no. 1, pp. 37–51.
[3] E. Cătinaş, A note on inexact secant methods, Rev. Anal. Numér. Théor. Approx., 25 (1996) nos. 1–2, pp. 33–41.
[4] E. Cătinaş, Newton and Newton-Krylov methods for solving nonlinear systems in \(\mathbb{R}^n\), Ph.D. Thesis, Cluj-Napoca, Romania (to be defended).
[5] E. Cătinaş, Inexact perturbed Newton methods and some applications for a class of Krylov solvers, J. Optim. Theory Appl., submitted.
[6] R.S. Dembo, S.C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982) no. 2, pp. 400–408.
[7] J.E. Dennis, Jr. and J.J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comput., 28 (1974) no. 126, pp. 549–560.
[8] J.E. Dennis, Jr. and J.J. Moré, Quasi-Newton methods, motivation and theory, SIAM Review, 19 (1977) no. 1, pp. 46–89.
[9] J.E. Dennis, Jr. and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall Series in Computational Mathematics, Englewood Cliffs, NJ, 1983.
[10] S.C. Eisenstat and H.F. Walker, Choosing the forcing terms in an inexact Newton method, SIAM J. Sci. Comput., 17 (1996) no.1, pp. 16–32.
[11] E.M. Kasenally, GMBACK: a generalised minimum backward error algorithm for nonsymmetric linear systems, SIAM J. Sci. Comput., 16 (1995) no. 3, pp. 698–719.
[12] E.M. Kasenally and V. Simoncini, Analysis of a minimum perturbation algorithm for nonsymmetric linear systems, SIAM J. Numer. Anal., 34 (1997) no. 1, pp. 48–66.
[13] C.T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, 1995.
[14] H.J. Martinez, Z. Parada and R.A. Tapia, On the characterization of Q-superlinear convergence of quasi-Newton interior-point methods for nonlinear programming, Bol. Soc. Mat. Mexicana, 1 (1995) no. 3, pp. 137–148.
[15] J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[16] I. Păvăloiu, Introduction to the Theory of Approximating the Solutions of Equations, Ed. Dacia, Cluj-Napoca, 1976 (in Romanian).
[17] F.A. Potra and V. Pták, Nondiscrete Induction and Iterative Processes, Pitman, London, 1984.
[18] F.A. Potra, On Q-order and R-order of convergence, J. Optim. Theory Appl., 63 (1989) no. 3, pp. 415–431.
[19] W.C. Rheinboldt, Methods for Solving Systems of Nonlinear Equations, SIAM, Philadelphia, 1996.
[20] Y. Saad, Krylov subspace methods for solving large unsymmetric linear systems, Math. Comp., 37 (1981) no. 155, pp. 105–126.
[21] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Pub. Co., Boston, 1996.
[22] Y. Saad and M.H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986) no. 3, pp. 856–869.
[23] G.W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, New York, 1990.
[24] H.F. Walker, An approach to continuation using Krylov subspace methods, Research Report 1/97/89, Dept. of Math., Utah State University, submitted to Computational Science in the 21st Century.
ON THE HIGH CONVERGENCE ORDERS OF THE NEWTON-GMBACK METHODS*
EMIL CĂTINAŞ
Abstract
The high convergence orders of the Newton-GMBACK methods can be characterized by applying three different existing results. In this note we show by some direct computations that these characterizations are equivalent.
The GMBACK method, introduced by Kasenally [11] and described in the following section, is a Krylov method for solving large linear systems. When it is used within a Newton method, the convergence orders of the resulting iterations may be characterized by applying three existing results.
The first result was given by Dennis and Moré in [7], but before stating it we review the common setting. Given \(F: D \subseteq \mathbb{R}^{N} \rightarrow \mathbb{R}^{N}\), the local convergence of different Newton-type methods is usually studied under the following standard assumptions:
(C1) there exists \(y^{*} \in D\) such that \(F(y^{*})=0\);
(C2) the mapping \(F\) is differentiable in a neighborhood of \(y^{*}\), with the derivative \(F^{\prime}\) continuous at \(y^{*}\);
(C3) the Jacobian \(F^{\prime}(y^{*})\) is nonsingular: \(\exists F^{\prime}(y^{*})^{-1} \in \mathbb{R}^{N \times N}\).
We shall denote hereafter by \(\|\cdot\|\) an arbitrary fixed norm on \(\mathbb{R}^{N}\) or its induced operator norm. The symbol \(\|\cdot\|_{2}\) stands for the Euclidean norm and \(\|\cdot\|_{F}\) denotes the Frobenius norm. For definitions and results concerning the convergence orders we refer to [15, ch. 9] (see also [19], [18]).
Theorem 1.1. [7] Let \(F: D \rightarrow \mathbb{R}^{N}\) be differentiable in the open convex set \(D \subseteq \mathbb{R}^{N}\) and assume that for some \(y^{*} \in D\) the mapping \(F\) satisfies (C2) and (C3). Let \((B_{k})_{k \geq 0} \subset \mathbb{R}^{N \times N}\) be a sequence of nonsingular matrices and suppose that for some \(y_{0} \in D\) the quasi-Newton iterates given by
\[y_{k+1}=y_{k}-B_{k}^{-1} F(y_{k}), \quad k=0,1, \ldots\]
remain in \(D\) and converge to \(y^{*}\). Then \((y_{k})_{k \geq 0}\) converges \(q\)-superlinearly to \(y^{*}\) and \(F(y^{*})=0\) if and only if
\[\lim _{k \rightarrow \infty} \frac{\left\|\left(B_{k}-F^{\prime}(y^{*})\right)(y_{k+1}-y_{k})\right\|}{\left\|y_{k+1}-y_{k}\right\|}=0 .\]
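As a concrete numerical illustration (added here; not part of the original paper), the Dennis–Moré condition explains why Broyden-type quasi-Newton methods converge \(q\)-superlinearly even though the matrices \(B_k\) need not converge to \(F'(y^*)\). The following sketch uses a hypothetical toy system with root \(y^*=(0,0)\) and Broyden's rank-one update; all names and data are assumptions for the demo.

```python
import numpy as np

def F(y):
    # hypothetical toy system with root y* = (0, 0)
    return np.array([y[0] + y[1] + y[0]**2,
                     y[0] - y[1] + y[1]**2])

def broyden(y0, steps=15):
    """Quasi-Newton iteration y_{k+1} = y_k - B_k^{-1} F(y_k)
    with Broyden's rank-one update of B_k."""
    y = np.asarray(y0, dtype=float)
    B = np.array([[1.0, 1.0],
                  [1.0, -1.0]])             # B_0 = F'(y*) for this toy system
    errors = []
    for _ in range(steps):
        Fy = F(y)
        if np.linalg.norm(Fy) < 1e-13:      # converged; avoid dividing by s @ s ~ 0
            break
        s = np.linalg.solve(B, -Fy)         # quasi-Newton step s_k
        y_new = y + s
        dF = F(y_new) - Fy
        B = B + np.outer(dF - B @ s, s) / (s @ s)   # Broyden update
        y = y_new
        errors.append(np.linalg.norm(y))    # ||y_k - y*||, since y* = 0
    return errors

errors = broyden([0.1, 0.1])
```

The error sequence decays far faster than linearly, consistent with the characterization above.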
The second result was obtained by Dembo, Eisenstat and Steihaug in [6].
Theorem 1.2. [6] Assume the mapping \(F\) satisfies the standard assumptions and suppose that for some \(y_{0} \in D\) the sequence of the inexact Newton iterates given by
\[y_{k+1}=y_{k}+s_{k}, \quad \text{where } F^{\prime}(y_{k}) s_{k}=-F(y_{k})+r_{k}, \quad k=0,1, \ldots,\]
remains in \(D\) and converges to \(y^{*}\). Then \((y_{k})_{k \geq 0}\) converges \(q\)-superlinearly if and only if the residuals \(r_{k}\) satisfy
\[\|r_{k}\|=o(\|F(y_{k})\|), \quad \text{as } k \rightarrow \infty .\]
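To see Theorem 1.2 at work numerically (an added sketch, not from the paper), one can run an inexact Newton iteration in which the inner residual is forced to satisfy \(\|r_k\|=\eta_k\|F(y_k)\|\) with forcing terms \(\eta_k\to 0\). The toy system, the forcing sequence \(\eta_k=2^{-(k+1)}\) and the residual direction below are assumed choices for the demo.

```python
import numpy as np

def F(y):
    # hypothetical toy system with root y* = (0, 0)
    return np.array([y[0] + y[1] + y[0]**2,
                     y[0] - y[1] + y[1]**2])

def Jac(y):
    # Jacobian F'(y)
    return np.array([[1 + 2*y[0], 1.0],
                     [1.0, -1 + 2*y[1]]])

def inexact_newton(y0, steps=10):
    """y_{k+1} = y_k + s_k with F'(y_k) s_k = -F(y_k) + r_k,
    where ||r_k||_2 = eta_k ||F(y_k)||_2 and eta_k -> 0."""
    y = np.asarray(y0, dtype=float)
    errors = []
    for k in range(steps):
        Fy = F(y)
        eta = 2.0 ** (-(k + 1))                               # forcing term eta_k
        r = eta * np.linalg.norm(Fy) * np.array([1.0, 0.0])   # residual of prescribed size
        s = np.linalg.solve(Jac(y), -Fy + r)                  # inner solve stopped with residual r
        y = y + s
        errors.append(np.linalg.norm(y))                      # ||y_k - y*||, since y* = 0
    return errors

errors = inexact_newton([0.1, 0.1])
```

Since \(\eta_k\to 0\), the residual condition of the theorem holds and the iteration is \(q\)-superlinear.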
Martinez, Parada and Tapia [14] obtained some results for the sequences of damped and perturbed quasi-Newton methods, with damping factors \(0<\alpha_{k} \leq 1\), perturbations \(r_{k} \in \mathbb{R}^{N}\), \(k=0,1, \ldots\), and initial approximation \(y_{0} \in \mathbb{R}^{N}\). However, those results do not fully characterize the convergence orders of the resulting sequences. On the other hand, we shall see that if we want to fit the Newton-GMBACK iterates into this frame in a direct manner, we must reduce the above iterations either to the classical quasi-Newton iterations or to the IN ones.
We have recently extended Theorem 1.2:
Theorem 1.3. [5] Assume the mapping \(F\) satisfies the standard assumptions and consider the following elements: \((\Delta_{k})_{k \geq 0} \subset \mathbb{R}^{N \times N}\) (perturbations in the Jacobians), \((\delta_{k})_{k \geq 0} \subset \mathbb{R}^{N}\) (perturbations in the function evaluations) and \((\hat{r}_{k})_{k \geq 0} \subset \mathbb{R}^{N}\) (residuals of the approximate solutions \(s_{k}\) to the perturbed linear systems \((F^{\prime}(y_{k})+\Delta_{k}) s=-F(y_{k})+\delta_{k}\)). If for some \(y_{0} \in D\) the sequence of the inexact perturbed Newton iterates given by
\[y_{k+1}=y_{k}+s_{k}, \quad \text{where } \left(F^{\prime}(y_{k})+\Delta_{k}\right) s_{k}=-F(y_{k})+\delta_{k}+\hat{r}_{k}, \quad k=0,1, \ldots,\]
is well defined (i.e., the matrices \(F^{\prime}(y_{k})+\Delta_{k}\) are nonsingular and the iterates \(y_{k}\) remain in \(D\)) and converges to \(y^{*}\), then the convergence is \(q\)-superlinear if and only if
\[\left\|\Delta_{k}\left(F^{\prime}(y_{k})+\Delta_{k}\right)^{-1} F(y_{k})+\left(I-\Delta_{k}\left(F^{\prime}(y_{k})+\Delta_{k}\right)^{-1}\right)\left(\delta_{k}+\hat{r}_{k}\right)\right\|=o\left(\left\|F(y_{k})\right\|\right),\]
as \(k \rightarrow \infty\).
Remark. Modest additional continuity conditions imposed on the derivative \(F^{\prime}\) allow the characterization of the \(q\)-convergence orders \(1+p\), \(p \in(0,1]\), of the three methods considered above. For these extensions we refer the reader to [8], [6] and [5], respectively.
In Section 2 we briefly describe the GMBACK method and deduce some results, while in Section 3 we show that, when applying the above three theorems to characterize the \(q\)-superlinear convergence of the Newton-GMBACK method, we are led to equivalent results.
2. THE GMBACK METHOD
Consider the nonsingular linear system
\[A x=b\]
with \(A \in \mathbb{R}^{N \times N}\) and \(b \in \mathbb{R}^{N}\). The GMBACK algorithm introduced by Kasenally in [11] belongs to the class of Krylov methods for solving such systems when the dimension \(N\) is large. Given an initial approximation \(x_{0} \in \mathbb{R}^{N}\) of the true solution \(x^{*}\) and a number \(m \in\{1, \ldots, N\}\), a modified Gram–Schmidt procedure constructs an orthonormal basis \(\{v_{1}, \ldots, v_{m}\}\) of the Krylov subspace \(\mathcal{K}_{m}=\mathcal{K}_{m}(A, r_{0})=\operatorname{span}\{r_{0}, A r_{0}, \ldots, A^{m-1} r_{0}\}\), where \(r_{0}=b-A x_{0}\) is the residual of the initial approximation. Finally, GMBACK determines an approximation \(x_{m}^{GB} \in x_{0}+\mathcal{K}_{m}\) which solves the following minimization problem:
\[\min _{x_{m} \in x_{0}+\mathcal{K}_{m}}\|\Delta_{A}\|_{F} \quad \text{subject to}\quad (A-\Delta_{A}) x_{m}=b .\]
The following steps are performed for determining \(x_{m}^{GB}\):
Arnoldi step: let \(r_{0}=b-A x_{0}\), \(\beta=\|r_{0}\|_{2}\) and \(v_{1}=\frac{1}{\beta} r_{0}\); for \(j=1, \ldots, m\), compute \(w=A v_{j}\), orthogonalize \(w\) against \(v_{1}, \ldots, v_{j}\) by modified Gram–Schmidt (yielding the coefficients \(h_{i j}\), \(i=1, \ldots, j\)), set \(h_{j+1, j}=\|w\|_{2}\) and, if \(h_{j+1, j} \neq 0\), take \(v_{j+1}=\frac{1}{h_{j+1, j}} w\);
Form the Hessenberg matrix \(\bar{H}_{m} \in \mathbb{R}^{(m+1) \times m}\) with the (possibly) nonzero elements \(h_{i j}\) computed above and the matrix \(V_{m} \in \mathbb{R}^{N \times m}\) having as columns the vectors \(v_{j}\): \(V_{m}=[v_{1} \ldots v_{m}]\);
Determine an eigenvector \(u_{m+1}\) corresponding to the smallest eigenvalue \(\lambda_{m+1}^{GB}\) of the generalized eigenproblem \(P u=\lambda Q u\), where \(P, Q \in \mathbb{R}^{(m+1) \times(m+1)}\) are formed from \(\beta\), \(\bar{H}_{m}\) and \(x_{0}\) (see [11]);
If the first component \(u_{m+1}^{(1)}\) is nonzero, compute the vector \(y_{m}^{GB} \in \mathbb{R}^{m}\) from the last \(m\) components of \(u_{m+1}\), suitably scaled by \(u_{m+1}^{(1)}\) (see [11] for the precise normalization);
Set \(x_{m}^{GB}=x_{0}+V_{m} y_{m}^{GB}\).
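The Arnoldi stage above can be sketched as follows (a minimal illustration; the matrix size and the random test data are assumptions for the demo, and only the basic breakdown test is included). It builds \(V_{m+1}\) with orthonormal columns and the Hessenberg matrix \(\bar{H}_{m}\) satisfying \(A V_{m}=V_{m+1} \bar{H}_{m}\).

```python
import numpy as np

def arnoldi(A, r0, m):
    """Modified Gram-Schmidt Arnoldi: returns V (N x (m+1), orthonormal columns
    spanning K_{m+1}(A, r0)) and the (m+1) x m Hessenberg matrix H_bar,
    with A @ V[:, :m] == V @ H_bar (assuming no breakdown)."""
    N = len(r0)
    V = np.zeros((N, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # orthogonalize against v_1, ..., v_{j+1}
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0:                   # "happy" breakdown
            raise ValueError("breakdown at step %d" % (j + 1))
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
r0 = rng.standard_normal(8)
V, H = arnoldi(A, r0, 4)
```

The Arnoldi relation and the orthonormality of the basis can be verified directly on \(V\) and \(H\).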
This algorithm may suffer two possible breakdowns, either in the Arnoldi process or in the scaling of \(u_{m+1}\). The first one is, as for GMRES, a happy breakdown, because the solution may then be determined exactly using \(\bar{H}_{m}\) and \(V_{m}\). The second one appears when all the eigenvectors associated to \(\lambda_{m+1}^{GB}\) have the first component zero, the unavoidable divisions by zero leading to breakdowns that cannot be circumvented. In such a case either \(m\) is increased or the algorithm is restarted with a different initial approximation \(x_{0}\). We shall assume in the following analysis that \(x_{m}^{GB}\) exists.
It is worth noting that the algorithm may be used in a restarted version, by taking after the \(m\) steps the computed solution \(x_{m}^{GB}\) as the new initial approximation.
We shall prove the following result:
Proposition 2.1. Consider some arbitrary elements \(x_{0} \in \mathbb{R}^{N}\) and \(m \in\{1, \ldots, N\}\). If there exists a GMBACK solution \(x_{m}^{GB}\), then its corresponding backward error \(\Delta_{A, m}^{GB} \in \mathbb{R}^{N \times N}\) satisfies
\[\Delta_{A, m}^{GB}\, x_{m}^{GB}=-r_{m}^{GB}, \quad \text{where } r_{m}^{GB}=b-A x_{m}^{GB}.\]
Proof. The constraint \((A-\Delta_{A}) x_{m}=b\) in the minimization problem defining GMBACK gives \(\Delta_{A, m}^{GB} x_{m}^{GB}=A x_{m}^{GB}-b=-r_{m}^{GB}\).

3. THE HIGH CONVERGENCE ORDERS OF THE NEWTON-GMBACK METHODS

Assume now that GMBACK is used to solve the linear systems arising at each step of the Newton method. The resulting Newton-GMBACK iterates may be written either as the quasi-Newton iterates
\[y_{k+1}=y_{k}-\left(F^{\prime}(y_{k})-\Delta_{A_{k}}^{GB}\right)^{-1} F(y_{k}), \quad k=0,1, \ldots, \tag{3.1}\]
or as the inexact Newton iterates
\[y_{k+1}=y_{k}+s_{k}^{GB}, \quad F^{\prime}(y_{k}) s_{k}^{GB}=-F(y_{k})+r_{k}^{GB}, \quad k=0,1, \ldots, \tag{3.2}\]
where for the first writing we have considered the linear systems \(A_{k} s=b_{k}\) with \(A_{k}=F^{\prime}(y_{k})\) and \(b_{k}=-F(y_{k})\).
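The backward error can also be checked numerically: for any approximate solution \(x\) of \(Ax=b\) with residual \(r=b-Ax\), the rank-one matrix \(\Delta=-r x^{T}/\|x\|_{2}^{2}\) satisfies \((A-\Delta)x=b\) and \(\|\Delta\|_{F}=\|r\|_{2}/\|x\|_{2}\) (the classical minimal backward-error construction, cf. [23]). The random data below is an assumed demo, not tied to a particular GMBACK run.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)
x = rng.standard_normal(N)        # an arbitrary approximate solution
r = b - A @ x                     # its residual

# rank-one perturbation making x an exact solution of the perturbed system:
# Delta @ x = -r, hence (A - Delta) @ x = A @ x + r = b
Delta = -np.outer(r, x) / (x @ x)

residual_check = np.linalg.norm((A - Delta) @ x - b)
norm_ratio = np.linalg.norm(r) / np.linalg.norm(x)
```

The Frobenius norm of this rank-one perturbation equals \(\|r\|_{2}/\|x\|_{2}\), which is what GMBACK minimizes over \(x_{m}\in x_{0}+\mathcal{K}_{m}\).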
Applying Theorem 1.1 of Dennis and Moré to the first writing of the iterates, we get:
Theorem 3.1. Assume the standard conditions hold and that for a given element \(y_{0} \in D\) the sequence of Newton-GMBACK iterates is well defined and converges to \(y^{*}\). Then the convergence is \(q\)-superlinear if and only if
\[\|r_{k}^{GB}\|_{2}=o(\|s_{k}^{GB}\|_{2}), \quad \text{as } k \rightarrow \infty .\]
Proof. We first observe that, in the notation of formula (3.1), \(B_{k}=F^{\prime}(y_{k})-\Delta_{A_{k}}^{GB}\), so that \(B_{k}-F^{\prime}(y_{k})=-\Delta_{A_{k}}^{GB}\). Next, Theorem 1.1 and Proposition 2.1 lead to the stated assertion.
Theorem 1.2 of Dembo, Eisenstat and Steihaug applied to the writing (3.2) yields:
Theorem 3.2. Under the same assumptions as in Theorem 3.1, the convergence of the Newton-GMBACK iterates is \(q\)-superlinear if and only if
\[\|r_{k}^{GB}\|=o(\|F(y_{k})\|), \quad \text{as } k \rightarrow \infty .\]
One can see that the damped and perturbed quasi-Newton method of Martinez, Parada and Tapia must be reduced either to the quasi-Newton or to the IN method if we want to fit the Newton-GMBACK iterates into this frame.
Regarding the Newton-GMBACK iterates as IPN iterates, we get:
Theorem 3.3. Under the same assumptions as in Theorem 3.1, the convergence of the Newton-GMBACK iterates is \(q\)-superlinear if and only if
\[\|r_{k}^{GB}\|=o(\|F(y_{k})\|), \quad \text{as } k \rightarrow \infty .\]
Proof. Taking \(\Delta_{k}=-\Delta_{A_{k}}^{GB}\), \(\delta_{k}=0\) and \(\hat{r}_{k}=0\), \(k=0,1, \ldots\), in Theorem 1.3, we are led by the writing (3.1) and by Proposition 2.1 to the stated assertion.
Remark. Since the proofs in our IPN model relied on the IN one, Theorems 3.2 and 3.3 were expected to yield the same result.
In order to complete our initial assertion, it suffices to show that \((\|F(y_{k})\|)_{k \geq 0}\) and \((\|s_{k}\|)_{k \geq 0}\) have the same rate of convergence. This is true by the following considerations.
Walker proved that an arbitrary sequence \((y_{k})_{k \geq 0} \subset \mathbb{R}^{N}\) converges \(q\)-superlinearly if and only if the sequence of corrections \((y_{k+1}-y_{k})_{k \geq 0}\) does.
Lemma 3.4. [24] Let \((y_{k})_{k \geq 0} \subset \mathbb{R}^{N}\) be a convergent sequence and denote \(s_{k}=y_{k+1}-y_{k}\), \(k=0,1, \ldots\). Then \((y_{k})_{k \geq 0}\) converges \(q\)-superlinearly if and only if \((s_{k})_{k \geq 0}\) converges \(q\)-superlinearly.
The fact that \((y_{k})_{k \geq 0}\) and \((s_{k})_{k \geq 0}\) have precisely the same rate when converging \(q\)-superlinearly has been known for a longer time, by a result of Dennis and Moré.
Lemma 3.5. [7] In the hypotheses of the above Lemma, if \((y_{k})_{k \geq 0}\) converges \(q\)-superlinearly to \(y^{*}\), then
\[\lim _{k \rightarrow \infty} \frac{\|s_{k}\|}{\|y^{*}-y_{k}\|}=1 .\]
The connection between the rates of \(\|y^{*}-y_{k}\|\) and \(\|F(y_{k})\|\) is expressed by the following result obtained by Dembo, Eisenstat and Steihaug:
Lemma 3.6. [6] Under the standard assumptions on \(F\) there exist \(\varepsilon, \beta>0\) such that
\[\frac{1}{\beta}\|y^{*}-y\| \leq\|F(y)\| \leq \beta\|y^{*}-y\| \quad \text{for all } \|y-y^{*}\| \leq \varepsilon .\]
The equivalence of the characterizations is now complete.