On the superlinear convergence of the successive approximations method

Abstract

The Ostrowski theorem is a classical result which ensures the attraction of all the successive approximations xk+1 = G(xk) near a fixed point x*. Different conditions [ultimately on the magnitude of G′(x*)] provide lower bounds for the convergence order of the process as a whole.

In this paper, we consider only one such sequence and we characterize its high q-convergence orders in terms of some spectral elements of G′(x*); we obtain that the set of trajectories with high q-convergence orders is restricted to some affine subspaces, regardless of the nonlinearity of G.

We analyze also the stability of the successive approximations under perturbation assumptions.

Authors

Emil Cătinaş
(Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy)

Keywords

fixed point; successive approximations; nonlinear system of equations in Rn; inexact Newton method; perturbed Newton method; linear systems of equation in Rn; residual; local convergence; q-convergence orders.

Cite this paper as:

E. Cătinaş, On the superlinear convergence of the successive approximations method, J. Optim. Theory Appl., 113 (2002) no. 3, pp. 473-485
doi: 10.1023/A:1015304720071

PDF

About this paper

Publisher Name

Kluwer Academic Publishers-Plenum Publishers

Print ISSN

0022-3239

Online ISSN

1573-2878

MR

0022-3239

Online ISSN

1573-2878

Google Scholar citations

[1] Ortega, J. M., and Rheinboldt, W. C., Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, 1970. 484 JOTA: VOL. 113, NO. 3, JUNE 2002

[2] Potra, F. A., On Q-Order and R-Order of Convergence, Journal of Optimization Theory and Applications, Vol. 63, pp. 415–431, 1989.

[3] Rheinboldt, W. C., Methods for Solûing Systems of Nonlinear Equations, SIAM, Philadelphia, Pennsylvania, 1998.

[4] Potra, F. A., Q-Superlinear Convergence of the Iterates in Primal–Dual InteriorPoint Methods, Mathematical Programming (to appear).

[5] Ostrowski, A. M., Solution of Equations and Systems of Equations, Academic Press, New York, NY, 1966.

[6] Householder, A. S., The Theory of Matrices in Numerical Analysis, Dover, New York, NY, 1974.

[7] Argyros, I., and Szidarovszky, F., The Theory and Applications of Iteration Methods, CRC Press, Boca Raton, Florida, 1993.

[8] Argyros, I., On the Convergence of the Modified Contractions, Journal of Computational and Applied Mathematics, Vol. 55, pp. 183–189, 1994.

[9] Brown, P. N., A Local Convergence Theory for Combined Inexact-Newton Finite-Difference Projection Methods, SIAM Journal on Numerical Analysis, Vol. 24, pp. 407–434, 1987.

[10] Brown, P. N., and Saad, Y., Convergence Theory of Nonlinear Newton–Krylov Algorithms, SIAM Journal on Optimization, Vol. 4, pp. 297–330, 1994.

[11] Catinas, E., Newton and Newton–Krylov Methods for Solving Nonlinear Systems in n , PhD Thesis, Babes–Bolyai University of Cluj–Napoca, Cluj– Napoca, Romania, 1999.

[12] Catinas, E., On the High Convergence Orders of the Newton–GMBACK Methods, Revue d’Analyse Numerique et de Theorie de l’Approximation, Vol. 28, pp. 125–132, 1999.

[13] Catinas, E., A Note on the Quadratic Convergence of the Inexact Newton Methods, Revue d’Analyse Numerique et de Theorie de l’Approximation, Vol. 29, pp. 129–134, 2000.

[14] Catinas, E., Inexact Perturbed Newton Methods and Applications to a Class of Krylov Solvers, Journal of Optimization Theory and Applications, Vol. 108, pp. 543–570, 2001.

[15] Catinas, E., The Inexact, Inexact Perturbed, and Quasi-Newton Methods are Equivalent Models, Mathematics of Computation (to appear).

[16] Dembo, R. S., Eisenstat, S. C., and Steihaug, T., Inexact Newton Methods, SIAM Journal on Numerical Analysis, Vol. 19, pp. 400–408, 1982.

[17] DENNIS, J. E., JR., and MORE´ , J. J., A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods, Mathematics of Computation, Vol. 28, pp. 549–560, 1974.

[18] Dennis, J. E., JR., and More , J. J., Quasi-Newton Methods, Motivation and Theory, SIAM Review, Vol. 19, pp. 46–89, 1977.

[19] Dennis, J. E., JR., and Schnabel, R. B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Series in Computational Mathematics, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

[20] Deuflhard, P., and Potra, F. A., Asymptotic Mesh Independence of Newton– Galerkin Methods via a Refined Mysovskii Theorem, SIAM Journal on Numerical Analysis, Vol. 29, pp. 1395–1412, 1992.

Paper (preprint) in HTML form

On the Superlinear Convergence of the Successive Approximations Method 1

E. CĂtinaş 2
Communicated by F. A. Potra
Abstract

The Ostrowski theorem is a classical result which ensures the attraction of all the successive approximations xk+1=G(xk)x_{k+1}=G\left(x_{k}\right) near a fixed point xx^{*}. Different conditions [ultimately on the magnitude of G(x)G^{\prime}\left(x^{*}\right) ] provide lower bounds for the convergence order of the process as a whole. In this paper, we consider only one such sequence and we characterize its high convergence orders in terms of some spectral elements of G(x)G^{\prime}\left(x^{*}\right); we obtain that the set of trajectories with high convergence orders is restricted to some affine subspaces, regardless of the nonlinearity of GG. We analyze also the stability of the successive approximations under perturbation assumptions.

Key Words. Successive approximations, convergence orders, inexact Newton iterates.

1. Introduction

Consider a subset DnD\subseteq\mathbb{R}^{n} and a mapping GG : DDD\rightarrow D which has a fixed point xint(D)x^{*}\in\operatorname{int}(D),

G(x)=x.G\left(x^{*}\right)=x^{*}.

We are interested in the convergence to xx^{*} of the successive approximations (xk)k0\left(x_{k}\right)_{k\geq 0}, given for some x0Dx_{0}\in D by

xk+1=G(xk),k=0,1,x_{k+1}=G\left(x_{k}\right),\quad k=0,1,\ldots (1)

First, we recall briefly the definitions of the convergence orders. The symbol \|\cdot\| stands for a given norm in n\mathbb{R}^{n} and for its induced operator norm.

00footnotetext: 1 This research was supported by the Romanian Academy of Sciences under Grant GAR 64/ 2001.
2 Research Fellow, T. Popoviciu Institute of Numerical Analysis, Romanian Academy of Sciences, Cluj-Napoca, Romania.

Definition 1.1. See Ref. 1, Chapter 9. Let (xk)k0n\left(x_{k}\right)_{k\geq 0}\subset\mathbb{R}^{n} be an arbitrary sequence converging to some xnx^{*}\in\mathbb{R}^{n}. For each α[1,+)\alpha\in[1,+\infty), the quotient factor and the root convergence factor are defined by
Qα{xk}Q_{\alpha}\left\{x_{k}\right\}
={0, if xk=x, for all but finitely many k,lim supk+,xk+1x/xkxα, if xkx, for all but finitely many k, otherwise ,=\begin{cases}0,&\text{ if }x_{k}=x^{*},\text{ for all but finitely many }k,\\ \limsup_{\begin{subarray}{c}k\rightarrow\infty\\ +\infty,\end{subarray}}\left\|x_{k+1}-x^{*}\right\|/\left\|x_{k}-x^{*}\right\|^{\alpha},&\text{ if }x_{k}\neq x^{*},\text{ for all but finitely many }k,\\ &\text{ otherwise },\end{cases}
Rα{xk}={lim supkxkx1/αk, when α>1,lim supkxkx1/k, when α=1.R_{\alpha}\left\{x_{k}\right\}=\begin{cases}\limsup_{k\rightarrow\infty}\left\|x_{k}-x^{*}\right\|^{1/\alpha^{k}},&\text{ when }\alpha>1,\\ \limsup_{k\rightarrow\infty}\left\|x_{k}-x^{*}\right\|^{1/k},&\text{ when }\alpha=1.\end{cases}
The qq-convergence and rr-convergence orders are defined by

OQ{xk}={+, if Qα{xk}=0,α[1,+),inf{α[1,+):Qα{xk}=+}, otherwise,\displaystyle O_{Q}\left\{x_{k}\right\}=
OR{xk}={+, if Rα{xk}=0,α[1,+),inf{α[1,+):Rα{xk}=1}, otherwise.\displaystyle O_{R}\left\{x_{k}\right\}=

Remark 1.1. When Q1{xk}=0Q_{1}\left\{x_{k}\right\}=0, it is said that the sequence converges qq-superlinearly; this may be written as

xk+1x=o(xkx), as k.\left\|x_{k+1}-x^{*}\right\|=o\left(\left\|x_{k}-x^{*}\right\|\right),\quad\text{ as }k\rightarrow\infty.

When Qα0{xk}<+Q_{\alpha_{0}}\left\{x_{k}\right\}<+\infty for some α0>1\alpha_{0}>1, one may write

xk+1x=𝒪(xkxα0), as k.\left\|x_{k+1}-x^{*}\right\|=\mathscr{O}\left(\left\|x_{k}-x^{*}\right\|^{\alpha_{0}}\right),\quad\text{ as }k\rightarrow\infty.

We recall also that qq-convergence with a certain order implies rr-convergence with at least the same order, the converse being false; for related results, we refer the reader to Ref. 1, Chapter 9 and Ref. 2; see also Ref. 3, Chapter 3 and Ref. 4.

When considering a whole iterative process, its convergence order measures the worst convergence among the sequences with the same limit. In this paper, we shall deal with the conditions ensuring that xx^{*} is an attraction point; i.e., there exists an open ball with center at xx^{*} such that all the sequences given by (1), with the initial approximation x0x_{0} from that ball, converge to xx^{*}. The set of all such sequences will be denoted by 𝒮\mathscr{S}. The
qq-factor and rr-factor of the iterative process 𝒮\mathscr{S} are then defined as

Qα(𝒮)=sup{Qα{xk}:(xk)k0𝒮}\displaystyle Q_{\alpha}(\mathscr{S})=\sup\left\{Q_{\alpha}\left\{x_{k}\right\}:\left(x_{k}\right)_{k\geq 0}\in\mathscr{S}\right\}
Rα(𝒮)=sup{Rα{xk}:(xk)k0𝒮}\displaystyle R_{\alpha}(\mathscr{S})=\sup\left\{R_{\alpha}\left\{x_{k}\right\}:\left(x_{k}\right)_{k\geq 0}\in\mathscr{S}\right\}

the convergence orders being defined in the same fashion as for a single sequence.

The following attraction theorem is well known; see also Ref. 3, Theorem 3.5.

Theorem 1.1. See Ostrowski (Ref. 5, Theorem 22.1) and see Ref. 1, Theorems 10.1.3 and 10.1.4. Assume that the mapping GG is differentiable at the fixed point xint(D)x^{*}\in\operatorname{int}(D). If the spectral radius of G(x)G^{\prime}\left(x^{*}\right) satisfies

ρ(G(x))=σ<1\rho\left(G^{\prime}\left(x^{*}\right)\right)=\sigma<1

then xx^{*} is an attraction point for the successive approximations. Moreover,

R1(𝒪)=σ,R_{1}(\mathscr{O})=\sigma,

and if σ>0\sigma>0, then

OR(𝒮)=OQ(𝒮)=1O_{R}(\mathscr{S})=O_{Q}(\mathscr{S})=1

The condition σ<1\sigma<1 is sharp. See the following example.
Example 1.1. See Ref. 1, Exercise 10.1-2. For GG : \mathbb{R}\rightarrow\mathbb{R},

G(x)=xx3G(x)=x-x^{3}

x=0x^{*}=0 is an attraction point, while for

G(x)=x+x3G(x)=x+x^{3}

the same fixed point is no longer an attraction point; in both cases, σ=1\sigma=1.
It is worth noting that the rr-superlinear convergence of 𝒮\mathscr{S} does not generally imply the qq-superlinear convergence; see also Ref. 3, p. 30.

Example 1.2. See Ref. 1, Exercise 10.1-6. For GG : 22\mathbb{R}^{2}\rightarrow\mathbb{R}^{2},

G(u,v)=(u2v,v2)G(u,v)=\left(u^{2}-v,v^{2}\right)

with x=0x^{*}=0, one obtains R1(𝒮)=0R_{1}(\mathscr{S})=0, but Q1(𝒮)>0Q_{1}(\mathscr{S})>0 in any norm.
A sufficient condition for R1(𝒮)=Q1(𝒮)=σ[0,1)R_{1}(\mathscr{S})=Q_{1}(\mathscr{S})=\sigma\in[0,1) is that G(x)G^{\prime}\left(x^{*}\right) is an M-matrix (see Ref. 3, p. 30); i.e., there exists a norm \|\cdot\| in n\mathbb{R}^{n} such that
G(x)=σ\left\|G^{\prime}\left(x^{*}\right)\right\|=\sigma [equivalently, for any eigenvalue λ\lambda of G(x)G^{\prime}\left(x^{*}\right) with |λ|=σ|\lambda|=\sigma, all Jordan blocks containing λ\lambda are one-dimensional; see e.g. Ref. 6, p. 46]. As a limiting situation, we are led to the following result, which was proved in a direct manner in Ref. 1 (see also Ref. 3, p. 30).

Theorem 1.2. See Ref. 1, Theorem 10.1.6. Under the assumptions of the Ostrowski theorem, if G(x)=0G^{\prime}\left(x^{*}\right)=0, then R1(𝒮)=Q1(𝒮)=0R_{1}(\mathscr{S})=Q_{1}(\mathscr{S})=0; i.e. 𝒮\mathscr{S} has qq-superlinear and rr-superlinear convergence.

The convergence orders in the above theorem are actually higher if GG is smoother; see also Ref. 3, Theorem 3.6.

Theorem 1.3. See Ref. 1, Theorem 10.1.7. Assume that the mapping GG is continuously differentiable on an open neighborhood of the fixed point xint(D)x^{*}\in\operatorname{int}(D). If G(x)=0G^{\prime}\left(x^{*}\right)=0 and if GG is twice differentiable at xx^{*}, then

OR(𝒮)OQ(𝒮)2O_{R}(\mathscr{S})\geq O_{Q}(\mathscr{S})\geq 2

i.e the process has qq-convergence and rr-convergence orders of at least two. If additionally,

G′′(x)(h,h)0, for all h0 in n,G^{\prime\prime}\left(x^{*}\right)(h,h)\neq 0,\quad\text{ for all }h\neq 0\text{ in }\mathbb{R}^{n},

then the convergence orders are exactly equal to two,

OR(𝒮)=OQ(𝒮)=2.O_{R}(\mathscr{S})=O_{Q}(\mathscr{S})=2.

Ortega and Rheinboldt have noticed that the conditions in the previous two results are not necessary.

Example 1.3. See Ref. 1, Exercise 10.1-12. For GG : 22\mathbb{R}^{2}\rightarrow\mathbb{R}^{2},

G(u,v)=(0,u+uv+vα),G(u,v)=\left(0,u+uv+v^{\alpha}\right),

arbitrarily high qq-convergence orders α>1\alpha>1 may be attained at x=0x^{*}=0, even if G(x)0G^{\prime}\left(x^{*}\right)\neq 0.

The above results on sufficiency are global, in the sense that they provide lower bounds for the convergence orders of all the sequences from 𝒮\mathscr{S}. However, it is possible for some sequences to exhibit higher convergence orders than the lowest bound ensured for some α01\alpha_{0}\geq 1 by Qα0(𝒮)Q_{\alpha_{0}}(\mathscr{S}) or Rα0(S)R_{\alpha_{0}}(S).

Example 1.4. Consider some α>1\alpha>1 and G:22\mathrm{G}:\mathbb{R}^{2}\rightarrow\mathbb{R}^{2},

G(u,v)=(1/2u,vα)G(u,v)=\left(1/2u,v^{\alpha}\right)

with x=0x^{*}=0 and σ=1/2\sigma=1/2. Then,
(a) for x0=(u0,v0),u00,0|v0|<1,(xk)k0x_{0}=\left(u_{0},v_{0}\right),u_{0}\neq 0,0\leq\left|v_{0}\right|<1,\left(x_{k}\right)_{k\geq 0} converges only linearly;
(b) for x0=(0,v0),0<|v0|<1,(xk)k0x_{0}=\left(0,v_{0}\right),0<\left|v_{0}\right|<1,\left(x_{k}\right)_{k\geq 0} converges with qq-order α>1\alpha>1.

The aim of this paper is to characterize the high convergence orders of the sequence (1). This will be done in Section 2, while in Section 3 we shall analyze the stability of these iterations under some perturbation assumptions.

2. Convergence Orders of the Successive Approximations

Given a subset DnD^{\prime}\subseteq\mathbb{R}^{n} and a nonlinear mapping F:DnF:D^{\prime}\rightarrow\mathbb{R}^{n}, the Newton method for approximating a solution of the nonlinear system F(x)=0F(x)=0 is given by

xk+1=xkF(xk)1F(xk),k=0,1,,x0D.x_{k+1}=x_{k}-F^{\prime}\left(x_{k}\right)^{-1}F\left(x_{k}\right),\quad k=0,1,\ldots,\quad x_{0}\in D^{\prime}.

Several results have revealed the local convergence properties of this method and of other Newton-type iterations; see e.g. Refs. 1, 3, 5, and 736. We shall recall the results of Dembo, Eisenstat, and Steihaug on inexact Newton methods (Ref. 16), which will allow us to analyze the local behavior of the successive approximations.

Consider the following (standard) assumptions on FF :
(a) there exists xDx^{*}\in D^{\prime} such that F(x)=0F\left(x^{*}\right)=0;
(b) the mapping FF is differentiable on an open neighborhood of xx^{*}, with FF^{\prime} continuous at xx^{*};
(c) the Jacobian F(x)F^{\prime}\left(x^{*}\right) is nonsingular.

The derivative FF^{\prime} is said to be Hölder continuous at xx^{*} with exponent p,p(0,1]p,p\in(0,1], if there exist L,ϵ>0L,\epsilon>0 such that

F(x)F(x)Lxxp, when xx<ϵ\left\|F^{\prime}(x)-F^{\prime}\left(x^{*}\right)\right\|\leq L\left\|x-x^{*}\right\|^{p},\quad\text{ when }\left\|x-x^{*}\right\|<\epsilon

Given an initial approximation x0Dx_{0}\in D^{\prime}, the inexact Newton (IN) method for approximating the solution xx^{*} is given by the following iterations:

For k=0,1,k=0,1,\ldots, until convergence do the following steps:
Step 1. Find sks_{k} such that F(xk)sk=F(xk)+rkF^{\prime}\left(x_{k}\right)s_{k}=-F\left(x_{k}\right)+r_{k}.
Step 2. Set xk+1=xk+skx_{k+1}=x_{k}+s_{k}.
The residuals rkr_{k} are the amounts by which the approximate solutions sks_{k} fail to satisfy the exact linear systems.

The following result was obtained in Ref. 16.

Theorem 2.1. See Ref. 16. Assume that FF satisfies the standard assumptions and that, for an initial approximation x0Dx_{0}\in D^{\prime}, the IN iterates converge to xx^{*}. Then, the convergence is qq-superlinear iff

rk=o(F(xk)), as k.\left\|r_{k}\right\|=o\left(\left\|F\left(x_{k}\right)\right\|\right),\quad\text{ as }k\rightarrow\infty.

Additionally, if FF^{\prime} is Hölder continuous at xx^{*} with exponent p,p(0,1]p,p\in(0,1], then the convergence is with qq-order 1+p1+p iff

rk=𝒪(F(xk)1+p), as k,\left\|r_{k}\right\|=\mathscr{O}\left(\left\|F\left(x_{k}\right)\right\|^{1+p}\right),\quad\text{ as }k\rightarrow\infty,

while it has rr-order 1+p1+p iff

rk0, with r-order 1+p, as kr_{k}\rightarrow 0,\quad\text{ with }r\text{-order }1+p\text{, as }k\rightarrow\infty\text{. }

We obtain the following result concerning the successive approximations.

Theorem 2.2. Assume that the mapping GG is differentiable on an open neighborhood of the fixed point xx^{*}, with G\mathrm{G}^{\prime} continuous at xx^{*} and ρ(G(x))=σ<1\rho\left(G^{\prime}\left(x^{*}\right)\right)=\sigma<1. Let x0Dx_{0}\in D be an initial approximation such that the sequence of successive approximations converges to xx^{*}. Then (xk)k0\left(x_{k}\right)_{k\geq 0} converges qq-superlinearly iff

G(xk)(xkG(xk))=o(xkG(xk)), as k.\left\|G^{\prime}\left(x_{k}\right)\left(x_{k}-G\left(x_{k}\right)\right)\right\|=o\left(\left\|x_{k}-G\left(x_{k}\right)\right\|\right),\quad\text{ as }k\rightarrow\infty. (2)

Additionally, suppose that G\mathrm{G}^{\prime} is Hölder continuous at xx^{*} with exponent pp, p(0,1]p\in(0,1]. Then (xk)k0\left(x_{k}\right)_{k\geq 0} converges with qq-order 1+p1+p iff

G(xk)[xkG(xk)]=𝒪(xkG(xk)1+p), as k,\left\|G^{\prime}\left(x_{k}\right)\left[x_{k}-G\left(x_{k}\right)\right]\right\|=\mathscr{O}\left(\left\|x_{k}-G\left(x_{k}\right)\right\|^{1+p}\right),\quad\text{ as }k\rightarrow\infty, (3)

while the convergence is with rr-order 1+p1+p iff

G(xk)(xkG(xk))0, with r-order 1+p, as k.G^{\prime}\left(x_{k}\right)\left(x_{k}-G\left(x_{k}\right)\right)\rightarrow 0,\quad\text{ with }r\text{-order }1+p,\text{ as }k\rightarrow\infty.

Proof. The successive approximations may be regarded as IN iterates for solving

F(x)=xG(x)=0,F(x)=x-G(x)=0,

namely,

[IG(xk)][G(xk)xk]=[xkG(xk)]+G(xk)[xkG(xk)].\left[I-G^{\prime}\left(x_{k}\right)\right]\left[G\left(x_{k}\right)-x_{k}\right]=-\left[x_{k}-G\left(x_{k}\right)\right]+G^{\prime}\left(x_{k}\right)\left[x_{k}-G\left(x_{k}\right)\right].

The standard assumptions on F=IGF=I-G are obviously satisfied, the invertibility of F(x)=IG(x)F^{\prime}\left(x^{*}\right)=I-G^{\prime}\left(x^{*}\right) being ensured by hypothesis σ<1\sigma<1. Next, the Hölder continuity of GG^{\prime} at xx^{*} implies the same property for FF^{\prime},

F(x)F(x)(1+L)xxp, when xx<ϵ<1.\left\|F^{\prime}(x)-F^{\prime}\left(x^{*}\right)\right\|\leq(1+L)\left\|x-x^{*}\right\|^{p},\quad\text{ when }\left\|x-x^{*}\right\|<\epsilon<1.

Denoting

rk=G(xk)[xkG(xk)]r_{k}=G^{\prime}\left(x_{k}\right)\left[x_{k}-G\left(x_{k}\right)\right]

the conclusions are now straightforward from the previous theorem.

Remark 2.1.

(a) The superlinear convergence of 𝒮\mathscr{S} when G(x)=0G^{\prime}\left(x^{*}\right)=0, assured by Theorem 1.2, is retrieved under the hypotheses of this result:

G(xk)(xkG(xk))\displaystyle\left\|G^{\prime}\left(x_{k}\right)\left(x_{k}-G\left(x_{k}\right)\right)\right\| G(xk)xkG(xk)\displaystyle\leq\left\|G^{\prime}\left(x_{k}\right)\right\|\left\|x_{k}-G\left(x_{k}\right)\right\|
=o(xkG(xk)), as k.\displaystyle=o\left(\left\|x_{k}-G\left(x_{k}\right)\right\|\right),\quad\text{ as }k\rightarrow\infty.

(b) The conclusions of Theorem 1.3 may be obtained in the same fashion,

G(xk)(xkG(xk))\displaystyle\left\|G^{\prime}\left(x_{k}\right)\left(x_{k}-G\left(x_{k}\right)\right)\right\|
=(G(xk)G(x))(xkG(xk))\displaystyle=\left\|\left(G^{\prime}\left(x_{k}\right)-G^{\prime}\left(x^{*}\right)\right)\left(x_{k}-G\left(x_{k}\right)\right)\right\|
(G′′(x)xkx+o(xkx))xkG(xk)\displaystyle\leq\left(\left\|G^{\prime\prime}\left(x^{*}\right)\right\|\left\|x_{k}-x^{*}\right\|+o\left(\left\|x_{k}-x^{*}\right\|\right)\right)\left\|x_{k}-G\left(x_{k}\right)\right\|
=(xkG(xk)2), as k\displaystyle=\left(\left\|x_{k}-G\left(x_{k}\right)\right\|^{2}\right),\quad\text{ as }k\rightarrow\infty

since the standard hypotheses on FF ensure the existence of α,ϵ>0\alpha,\epsilon>0 such that (see Ref. 16)

(1/α)xxF(x)αxx, for xx<ϵ(1/\alpha)\left\|x-x^{*}\right\|\leq\|F(x)\|\leq\alpha\left\|x-x^{*}\right\|,\quad\text{ for }\left\|x-x^{*}\right\|<\epsilon

(c) The same conclusions could be obtained in the above theorem by writing the successive approximations as either quasi-Newton iterates or inexact perturbed Newton iterates; see Refs. 11, 14, 15.
(d) Instead of σ<1\sigma<1, one may assume a more general condition, namely that IG(x)I-G^{\prime}\left(x^{*}\right) is invertible [which holds iff G(x)G^{\prime}\left(x^{*}\right) has no eigenvalue equal to one] and that (xk)k0\left(x_{k}\right)_{k\geq 0} converges to xx^{*}.
(e) The Ostrowski theorem holds also in Banach spaces (see e.g. Ref. 37, Theorem 4C) as well as the corresponding characterizations for the IN iterations, so our result may be restated in this more general frame.

Theorem 2.2 characterizes the convergence orders of the successive approximations in terms of the iterates alone. In the following two results, we shall show that the convergence orders are intimately related to some spectral elements of G(x)G^{\prime}\left(x^{*}\right).

Theorem 2.3. Under the assumptions of Theorem 2.2, the sequence (xk)k0\left(x_{k}\right)_{k\geq 0} converges qq-superlinearly if and only if G(x)G^{\prime}\left(x^{*}\right) has a zero eigenvalue and, starting from a certain step, the corrections xk+1xkx_{k+1}-x_{k} are the corresponding eigenvectors,

G(x)(xk+1xk)=0,kk0.G^{\prime}\left(x^{*}\right)\left(x_{k+1}-x_{k}\right)=0,\quad\forall k\geq k_{0}. (4)

Provided that GG^{\prime} is Hölder continuous at xx^{*} with exponent p,p(0,1]p,p\in(0,1], the above condition characterizes in fact the qq-convergence orders 1+p1+p.

Proof. The residuals of the IN iterates may be written as the sums of two terms,

rk=[G(xk)G(x)](xk+1xk)+G(x)(xk+1xk),k=0,1,,-r_{k}=\left[G^{\prime}\left(x_{k}\right)-G^{\prime}\left(x^{*}\right)\right]\left(x_{k+1}-x_{k}\right)+G^{\prime}\left(x^{*}\right)\left(x_{k+1}-x_{k}\right),\quad k=0,1,\ldots,

so the sufficiency of condition (4) is obvious. For necessity, we notice that the residuals and their first terms converge to zero with rate at least o(xkG(xk))o\left(\left\|x_{k}-G\left(x_{k}\right)\right\|\right) as kk\rightarrow\infty, which requires (4).

Remark 2.2.

(a) The above result implies that no qq-superlinear convergence to xx^{*} of any sequence of successive approximations may occur when all the eigenvalues of G(x)G^{\prime}\left(x^{*}\right) are nonzero.
(b) As Example 1.3 shows, 𝒮\mathscr{S} may attain qq-superlinear convergence even when not all the nonzero vectors in n\mathbb{R}^{n} are eigenvectors of the eigenvalue 0 , i.e., when G(x)G^{\prime}\left(x^{*}\right) has only the zero eigenvalue but is defective. However, in such a case, the set of the possible trajectories is restricted.
(c) We notice that, when OQ(𝒮)=1O_{Q}(\mathscr{S})=1, the eventual sequences with qq superlinear convergence are highly sensitive to perturbations, which is not good news for the floating-point arithmetic context. We shall analyze the convergence of the perturbed sequences in Section 3.

The other characterization may be stated in terms of errors.
Theorem 2.4. Under the assumptions of Theorem 2.2, (xk)k0\left(x_{k}\right)_{k\geq 0} converges qq-superlinearly if and only if G(x)G^{\prime}\left(x^{*}\right) has a zero eigenvalue and,
starting from a certain step, the errors xkxx_{k}-x^{*} are corresponding eigenvectors,

G(x)(xkx)=0,kk0G^{\prime}\left(x^{*}\right)\left(x_{k}-x^{*}\right)=0,\quad\forall k\geq k_{0} (5)

Provided that GG^{\prime} is Hölder continuous at xx^{*} with exponent p,p(0,1]p,p\in(0,1], the above condition characterizes in fact the qq-convergence orders 1+p1+p.

Proof. The sufficiency is again obvious. For necessity, using (4), we get that

G(x)(xkx)=G(x)(xk0x),kk0+1,G^{\prime}\left(x^{*}\right)\left(x_{k}-x^{*}\right)=G^{\prime}\left(x^{*}\right)\left(x_{k_{0}}-x^{*}\right),\quad\forall k\geq k_{0}+1,

and therefore we get the conclusions, since these constant terms have the limit zero.

It is interesting to note that when (xk)k0\left(x_{k}\right)_{k\geq 0} converges qq-superlinearly to xx^{*} and the zero eigenvalue of G(x)G^{\prime}\left(x^{*}\right) is simple, then its trajectory is restricted from a certain step to a line containing xx^{*}, regardless of the nonlinearity of GG; when zero is a double eigenvalue, the trajectory is restricted from a certain step to a plane containing xx^{*}, etc. Theoretically, the trajectory may be arbitrary only when G(x)=0G^{\prime}\left(x^{*}\right)=0.

Consider now the affine mapping

G(x)=Bx+c,Bn×n,cn given ,G(x)=Bx+c,\quad B\in\mathbb{R}^{n\times n},\quad c\in\mathbb{R}^{n}\text{ given },

and for some initial approximation x0nx_{0}\in\mathbb{R}^{n} the iterations

xk+1=Bxk+c,k=0,1,x_{k+1}=Bx_{k}+c,\quad k=0,1,\ldots (6)

The condition ρ(B)<1\rho(B)<1 in the Ostrowski theorem becomes necessary and sufficient for these iterates to converge for any initial approximation x0nx_{0}\in\mathbb{R}^{n} to the unique fixed point xx^{*} in n\mathbb{R}^{n}; see e.g. Ref. 1, Theorem 10.1.5. Our results can be refined in this case.

Theorem 2.5. If ρ(B)=0\rho(B)=0, then the sequence given by (6) converges to xx^{*} in less than nn steps, for any initial approximation x0nx_{0}\in\mathbb{R}^{n}. If 0<ρ(B)<10<\rho(B)<1, then (xk)k0\left(x_{k}\right)_{k\geq 0} converges qq-superlinearly if and only if there exists k0k_{0}\in\mathbb{N} such that

Bk+1[(IB)x0c]=0,kk0,B^{k+1}\left[(I-B)x_{0}-c\right]=0,\quad\forall k\geq k_{0}, (7)

in which case xk0+1=xx_{k_{0}+1}=x^{*}.
Proof. It is known that a matrix has a spectral radius zero iff it is nilpotent; i.e., there exists l0l_{0}\in\mathbb{N} such that Bl0=0B^{l_{0}}=0, in which case l0l_{0} may be
taken smaller than nn (see Ref. 38, Problem 159). The relation

xk+1xk=Bk(x1x0),k=2,3,,x0n,x_{k+1}-x_{k}=B^{k}\left(x_{1}-x_{0}\right),\quad k=2,3,\ldots,\quad\forall x_{0}\in\mathbb{R}^{n}, (8)

completes the proof of the first affirmation.
The relation (7) is obtained immediately from (4) and (8).
When the initial approximation is taken to be zero, we obtain the following result.

Corollary 2.1. If ρ(B)<1\rho(B)<1 and x0=0x_{0}=0, then the sequence given by (6) converges in a finite number of steps if and only if ρ(B)=0\rho(B)=0 or there exists k0k_{0}\in\mathbb{N} such that

Bk0c=0.B^{k_{0}}c=0.

It is worth noting that, in the affine case, the qq-superlinear convergence reduces to convergence in a finite number of steps, i.e., to convergence with infinite order.

3. Stability of the Successive Approximations

Assume that the evaluation of GG at each step is performed only approximately,

xk+1=G(xk)+δk,k=0,1,x_{k+1}=G\left(x_{k}\right)+\delta_{k},\quad k=0,1,\ldots (9)

These iterates may be viewed again as IN iterations for solving

F(x)=xG(x)=0,F(x)=x-G(x)=0,

since

[IG(xk)][G(xk)+δkxk]\displaystyle{\left[I-G^{\prime}\left(x_{k}\right)\right]\left[G\left(x_{k}\right)+\delta_{k}-x_{k}\right]}
=[xkG(xk)]+G(xk)[xkG(xk)]+[IG(xk)]δk\displaystyle=-\left[x_{k}-G\left(x_{k}\right)\right]+G^{\prime}\left(x_{k}\right)\left[x_{k}-G\left(x_{k}\right)\right]+\left[I-G^{\prime}\left(x_{k}\right)\right]\delta_{k}

We obtain the following result.
Theorem 3.1. Assume that GG satisfies the assumptions of Theorem 2.2, and that the sequence (9) of perturbed successive approximations converges to xx^{*}. Then the convergence is qq-superlinear iff

G(xk)[xkG(xk)]+[IG(xk)]δk\displaystyle\left\|G^{\prime}\left(x_{k}\right)\left[x_{k}-G\left(x_{k}\right)\right]+\left[I-G^{\prime}\left(x_{k}\right)\right]\delta_{k}\right\|
=(xkG(xk)), as k.\displaystyle=\left(\left\|x_{k}-G\left(x_{k}\right)\right\|\right),\quad\text{ as }k\rightarrow\infty.

Additionally, if GG^{\prime} is Hölder continuous at xx^{*} with exponent pp, the convergence is with qq-order 1+p1+p iff

G(xk)[xkG(xk)]+[IG(xk)]δk\displaystyle\left\|G^{\prime}\left(x_{k}\right)\left[x_{k}-G\left(x_{k}\right)\right]+\left[I-G^{\prime}\left(x_{k}\right)\right]\delta_{k}\right\|
=O(xkG(xk)1+p), as k\displaystyle=O\left(\left\|x_{k}-G\left(x_{k}\right)\right\|^{1+p}\right),\quad\text{ as }k\rightarrow\infty

and with rr-order 1+p1+p iff
G(xk)[xkG(xk)]+[IG(xk)]δk0,G^{\prime}\left(x_{k}\right)\left[x_{k}-G\left(x_{k}\right)\right]+\left[I-G^{\prime}\left(x_{k}\right)\right]\delta_{k}\rightarrow 0,\quad with rr-order 1+p1+p, as kk\rightarrow\infty.
Remark 3.1. We notice that sufficient conditions for the high convergence orders of the perturbed successive approximations are obtained when xkG(xk)x_{k}-G\left(x_{k}\right) and δk\delta_{k} are eigenvectors corresponding to the eigenvalue 0 of G(x)G^{\prime}\left(x^{*}\right) and δk\delta_{k} converges to zero with a certain speed. In such a case, the trajectory remains in the set of the high convergence trajectories corresponding to the unperturbed iterations.

4. Conclusions

The condition ρ(G(x))<1\rho\left(G^{\prime}\left(x^{*}\right)\right)<1 in the Ostrowski theorem ensures that the fixed point xx^{*} is an attraction point and yields the lowest rr-convergence order attained by all the sequences of successive approximations. Our results characterize the high qq-convergence orders of a single such sequence [which may be attained even when ρ(G(x))0\rho\left(G^{\prime}\left(x^{*}\right)\right)\neq 0 ], indicating the set of all possible trajectories with convergence orders up to two (in this sense, the study of the convergence orders higher than two may be a direction of future research). The Ostrowski theorem requires the knowledge of the spectral radius of G(x)G^{\prime}\left(x^{*}\right), while ours require the spectral structure of G(x)G^{\prime}\left(x^{*}\right).

The results obtained may be applied to the study of the iterative methods used in practice and which may be written as fixed-point problems with known mappings GG and GG^{\prime}; they may be applied also to the study of the fixed-point problems in the more abstract setting of Banach spaces (differential and integral equations, dynamical systems, etc). Further, we shall study the existence and estimates for the radius of the attraction balls for the successive approximations with high convergence orders.

References

  1. 1.

    Ortega, J. M., and Rheinboldt, W. C., Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, 1970.

  2. 2.

    Potra, F. A., On Q-Order and R-Order of Convergence, Journal of Optimization Theory and Applications, Vol. 63, pp. 415-431, 1989.

  3. 3.

    Rheinboldt, W. C., Methods for Solving Systems of Nonlinear Equations, SIAM, Philadelphia, Pennsylvania, 1998.

  4. 4.

    Potra, F. A., Q-Superlinear Convergence of the Iterates in Primal-Dual InteriorPoint Methods, Mathematical Programming (to appear).

  5. 5.

    Ostrowski, A. M., Solution of Equations and Systems of Equations, Academic Press, New York, NY, 1966.

  6. 6.

    Householder, A. S., The Theory of Matrices in Numerical Analysis, Dover, New York, NY, 1974.

  7. 7.

    Argyros, I., and Szidarovszky, F., The Theory and Applications of Iteration Methods, CRC Press, Boca Raton, Florida, 1993.

  8. 8.

    Argyros, I., On the Convergence of the Modified Contractions, Journal of Computational and Applied Mathematics, Vol. 55, pp. 183-189, 1994.

  9. 9.

    Brown, P. N., A Local Convergence Theory for Combined Inexact-Newton/ Finite-Difference Projection Methods, SIAM Journal on Numerical Analysis, Vol. 24, pp. 407-434, 1987.

  10. 10.

    Brown, P. N., and Saad, Y., Convergence Theory of Nonlinear Newton-Krylov Algorithms, SIAM Journal on Optimization, Vol. 4, pp. 297-330, 1994.

  11. 11.

    Cătinaș, E., Newton and Newton-Krylov Methods for Solving Nonlinear Systems in n,PhD\mathbb{R}^{n},\mathrm{PhD} Thesis, Babes-Bolyai University of Cluj-Napoca, ClujNapoca, Romania, 1999.

  12. 12.

    Cătinas, E., On the High Convergence Orders of the Newton-GMBACK Methods, Revue d’Analyse Numérique et de Théorie de l’Approximation, Vol. 28, pp. 125-132, 1999.

  13. 13.

    Cătinaş, E., A Note on the Quadratic Convergence of the Inexact Newton Methods, Revue d’Analyse Numérique et de Théorie de l’Approximation, Vol. 29, pp. 129-134, 2000.

  14. 14.

    Cătinaș, E., Inexact Perturbed Newton Methods and Applications to a Class of Krylov Solvers, Journal of Optimization Theory and Applications, Vol. 108, pp. 543-570, 2001.

  15. 15.

    Cătinaş, E., The Inexact, Inexact Perturbed, and Quasi-Newton Methods are Equivalent Models, Mathematics of Computation (to appear).

  16. 16.

    Dembo, R. S., Eisenstat, S. C., and Steihaug, T., Inexact Newton Methods, SIAM Journal on Numerical Analysis, Vol. 19, pp. 400-408, 1982.

  17. 17.

    Dennis, J. E., Jr., and Moré, J. J., A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods, Mathematics of Computation, Vol. 28, pp. 549-560, 1974.

  18. 18.

    Dennis, J. E., Jr., and Moré, J. J., Quasi-Newton Methods, Motivation and Theory, SIAM Review, Vol. 19, pp. 46-89, 1977.

  19. 19.

    Dennis, J. E., Jr., and Schnabel, R. B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Series in Computational Mathematics, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

  20. 20.

    Deuflhard, P., and Potra, F. A., Asymptotic Mesh Independence of NewtonGalerkin Methods via a Refined Mysovskii Theorem, SIAM Journal on Numerical Analysis, Vol. 29, pp. 1395-1412, 1992.

  21. 21.

    Eisenstat, S. C., and Walker, H. F., Globally Convergent Inexact Newton Methods, SIAM Journal on Optimization, Vol. 4, pp. 393-422, 1994.

  22. 22.

    Eisenstat, S. C., and Walker, H. F., Choosing the Forcing Terms in an Inexact Newton Method, SIAM Journal on Scientific Computing, Vol. 17, pp. 16-32, 1996.

  23. 23.

    Istrătescu, V. I., Introduction to the Fixed-Point Theory, Editura Academiei RSR, Bucharest, Romania, 1973 (in Romanian).

  24. 24.

    Kelley, C. T., Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, Pennsylvania, 1995.

  25. 25.

    Kelley, C. T., and Sachs, E. W., Truncated Newton Methods for Optimization with Inaccurate Functions and Gradients, Report CRSC-TR-99-20, Center for Research in Scientific Computation, North Carolina State University, 1999.

  26. 26.

    Kerkhoven, T., and Saad, Y., On Acceleration Methods for Coupled Nonlinear Elliptic Systems, Numerische Mathematik, Vol. 60, pp. 525-548, 1992.

  27. 27.

    Mărușter, Șt., Quasi-Nonexpansivity and Two Classical Methods for Solving Nonlinear Equations, Proceedings of the American Mathematical Society, Vol. 62, pp. 119-123, 1977.

  28. 28.

    Mărușter, Șt.,Numerical Methods for Solving Nonlinear Equations, Editura Tehnică, Bucharest, Romania, 1981 (in Romanian).

  29. 29.

    Păvăloiu, I., Introduction to the Theory of Approximating the Solutions of Equations, Editura Dacia, Cluj-Napoca, Romania, 1976 (in Romanian).

  30. 30.

    Pãvăloiu, I., The Solving of Nonlinear Equations by Interpolation, Editura Dacia, Cluj-Napoca, Romania, 1981 (in Romanian).

  31. 31.

    Potra, F. A., On the Numerical Stability of a Class of Iterative Processes, Itinerant Seminar of Functional Equations, Approximation and Convexity, Timişoara, Romania, pp. 51-54, 1980 (in Romanian).

  32. 32.

    Potra, F. A., On a Class of Iterative Procedures for Solving Nonlinear Equations in Banach Spaces, Computational Mathematics, Banach Center Publications, PWN-Polish Scientific Publishers, Warsaw, Poland, Vol. 13, pp. 607-621, 1984.

  33. 33.

    Potra, F. A., and Pták, V., Sharp Error Bounds for Newton’s Process, Numerische Mathematik, Vol. 34, pp. 63-72, 1980.

  34. 34.

    Potra, F. A., and Pták, V., Nondiscrete Induction and Iterative Processes, Pitman, London, England, 1984.

  35. 35.

    Walker, H. F., An Approach to Continuation Using Krylov Subspace Methods, Computational Science in the 21st Century, Edited by M. O. Bristeau, G. Etgen, W. Fitzgibbon, J. L. Lions, J. Periaux, and M. F. Wheeler, John Wiley and Sons, New York, NY, pp. 72-82, 1997.

  36. 36.

    Wilkinson, J. H., The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, England, 1965.

  37. 37.

    Zeidler, E., Nonlinear Functional Analysis and Its Applications, I: Fixed-Point Theorems, Springer Verlag, New York, NY, 1986.

  38. 38.

    Glazman, I. and Liubitch, Y., Analyse Linéaire dans les Espaces de Dimensions Finies, Manuel en Problèmes, Mir, Moscow, Russia, 1972.

2002

Related Posts