On the convergence of some quasi-Newton iterates
studied by I. Păvăloiu

Emil Cătinaş$^\ast $

October 3, 2015.

$^\ast $“T. Popoviciu” Institute of Numerical Analysis, P.O. Box 68-1, Cluj-Napoca, Romania, e-mail: ecatinas@ictp.acad.ro.

Dedicated to prof. I. Păvăloiu on the occasion of his 75th anniversary

I. Păvăloiu has considered a Banach space $X$ and the problem

\begin{equation*} x=\lambda D\left( x\right) +y \qquad (D:X\rightarrow X,\ \lambda \in {\mathbb R}, \ y \in X \ {\rm given}) \label{f.1}\end{equation*}

written in the equivalent form $F(x):=x -\lambda D\left( x\right) -y=0$ and solved by the general quasi-Newton method

\begin{equation*} x_{k+1}=x_{k}-A\left( x_{k}\right) \left[ x_{k}-\lambda D\left( x_{k}\right) -y\right] ,\qquad k=0,1,\ldots \end{equation*}

Semilocal convergence results were obtained, ensuring linear convergence of this method. Further results were obtained for the iterates:

\[ x_{k+1}=x_{k}-[I+\lambda D^\prime \left( x_{k}\right)] \left[ x_{k}-\lambda D\left( x_{k}\right) -y\right] ,\qquad k=0,1,\ldots \]

In this note, we analyze the local convergence of these iterates, and, using the Ostrowski local attraction theorem, we give some sufficient conditions such that the iterates converge locally either linearly or with higher convergence orders. The local convergence results require fewer differentiability assumptions for $D$.

MSC. 65H10.

Keywords. quasi-Newton methods, Ostrowski local attraction theorem, local convergence.

1 Introduction

In [ 6 ] , Păvăloiu has considered a Banach space $(X,\| \cdot \| )$, a nonlinear mapping $D:X\rightarrow X$, a parameter $\lambda \in {\mathbb R}$, an element $y \in X$ and the equation (arising from certain integral equations)

\begin{equation} x=\lambda D\left( x\right) +y, \label{f.1 eq x=lam D(x)+y}\end{equation}

solved by the following iterations:

\begin{equation} x_{k+1}=x_{k}-A\left( x_{k}\right) \left[ x_{k}-\lambda D\left( x_{k}\right) -y\right] ,\qquad k=0,1,\ldots ,x_{0}\in E\subseteq X, \label{f.2 xn+1=xn-An(xn-lam D(xn)-y)}\end{equation}

where, for each $x\in E$, $A(x) :X\rightarrow X$ is a continuous linear mapping (i.e., $A(x)\in {\mathcal L}(X)$).

Denoting

\begin{equation} \label{f F(x)=x-lam D(x)+y} F(x)=x-\lambda D(x) - y, \end{equation}

the above iterations can be written as

\[ x_{k+1}=x_{k}-A\left( x_{k}\right) F(x_{k}) ,\qquad k=0,1,\ldots ; \]

in a subsequent paper, Păvăloiu [ 7 ] has analyzed the above iterations for general mappings $F$, not necessarily given by (3).
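The general scheme can be sketched in a few lines. The following is a minimal illustration in the scalar case $X={\mathbb R}$ only; the function name `quasi_newton` and the data $D=\cos$, $\lambda =0.5$, $y=0$ are assumptions chosen for the example and do not come from [ 6 ] or [ 7 ].

```python
import math
from typing import Callable


def quasi_newton(F: Callable[[float], float],
                 A: Callable[[float], float],
                 x0: float,
                 steps: int = 50) -> float:
    """Run x_{k+1} = x_k - A(x_k) F(x_k) for a fixed number of steps (scalar case)."""
    x = x0
    for _ in range(steps):
        x = x - A(x) * F(x)
    return x


# Illustrative data (assumptions, not from the paper): D = cos, lambda = 0.5, y = 0.
lam, y = 0.5, 0.0


def F(x: float) -> float:
    return x - lam * math.cos(x) - y


# A(x) = 1 gives the successive approximations x_{k+1} = lam * D(x_k) + y.
x_picard = quasi_newton(F, lambda x: 1.0, x0=0.0)

# A(x) = 1 / F'(x) gives Newton's method; here F'(x) = 1 + lam * sin(x).
x_newton = quasi_newton(F, lambda x: 1.0 / (1.0 + lam * math.sin(x)), x0=0.0)

print(x_picard, x_newton, abs(F(x_picard)), abs(F(x_newton)))
```

Both choices of $A$ reach the same solution here; they differ only in how fast the residual $F(x_k)$ decreases.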

The following semilocal convergence results were obtained.

Theorem 1

[ 6 ] If the mappings $D$ and $A\left( x\right) $, the initial approximation $x_{0}$ and the real number $r>0$ satisfy the following conditions:

the mapping $D$ admits Fréchet derivatives of order one and two on the ball $S=S\left( x_{0},r\right) ;$
$\left\Vert A\left( x\right) \right\Vert \leq \beta ,$ for each $x\in S,$ and some $\beta >0;$
$\left\Vert I-F^{\prime }\left( x\right) A\left( x\right) \right\Vert \leq \alpha ,$ for each $x\in S ,$ and some $\alpha >0;$
$\left\Vert D^{\prime \prime }\left( x\right) \right\Vert \leq M/\left\vert \lambda \right\vert ,$ for each $x\in S ,$ and some $M>0;$
$\frac{\beta \rho _{0}}{1-d_{0}} \leq r,$ where $\rho _{0}=\left\Vert F\left( x_{0}\right) \right\Vert ,\ d_{0}=\frac{M\beta ^{2}\rho _{0}}{2}+\alpha ;$
$d_{0}<1,$

then the sequence $\left( x_{k}\right) _{k\geq 0}$ given by (2) converges: $x^\ast =\lim _{k\rightarrow \infty }x_{k},$ with $F\left( x^\ast \right) =0$. The following estimates hold:

\begin{equation*} \left\Vert x^\ast -x_{k}\right\Vert \leq \frac{\beta d_{0}^{k}\rho _{0}}{1-d_{0}},\qquad k=0,1,\ldots \label{f.3}\end{equation*}

When $\| \lambda D^\prime (x)\| <1$, it is known that the operator $I-\lambda D^\prime \left( x\right) $ is invertible, with $\big(I-\lambda D^\prime (x) \big)^{-1}= I+\lambda D^\prime (x) + \lambda ^2 D^\prime (x)^2 + \ldots $. Păvăloiu has considered the operator $A(x)$ given by the first two terms of this expansion, obtaining the following iterates

\begin{equation} x_{k+1}=x_{k}-\big(I+\lambda D^\prime (x_{k}) \big) \left[ x_{k}-\lambda D\left( x_{k}\right) -y\right] ,\qquad k=0,1,\ldots , \label{f. xn+1=xn-(I+D'xn)(xn-lam D(xn)-y)}\end{equation}

and the following result.

Theorem 2

[ 6 ] If the mapping $D,$ the initial approximation $x_{0}$ and the real number $r>0$ satisfy the following assumptions:

the mapping $D$ admits Fréchet derivatives of order one and two for each $x\in S=S\left( x_{0},r\right) ;$
$\left\Vert D^{\prime }\left( x\right) \right\Vert \leq b,$ for each $x\in S;$
$\left\Vert D^{\prime \prime }\left( x\right) \right\Vert \leq M/\left\vert \lambda \right\vert ,$ for each $x\in S;$
$2-M\rho _{0}>0,$ where $\rho _{0}=\left\Vert x_{0}-\lambda D\left( x_{0}\right) -y\right\Vert ;$
$ \tfrac {\rho _{0}\left( 1+\left\vert \lambda \right\vert b\right) }{1-d_{0}}\leq r ,$ where $d_{0}=M\tfrac {(1+\left\vert \lambda \right\vert b)^{2}}{2} \rho _{0} +\lambda ^{2} b^{2}; $
$\left\vert \lambda \right\vert \leq \displaystyle \tfrac { 2-M\rho _{0}}{ b( 2+M\rho _{0})} ,$

then the sequence given by (4) converges to a solution $x^\ast $ of equation (1) and the following estimates hold:

\[ \left\Vert x^\ast -x_{k}\right\Vert \leq \frac{\left( 1+\left\vert \lambda \right\vert b\right) d_{0}^{k}\rho _{0}}{1-d_{0}},\qquad k=0,1,\ldots \]

Remark 3

We note that the assumptions of the above results require the existence of the second derivative of $D$, and also that the smaller $d_0$ is (i.e., the smaller $|\lambda |$, $b$, $M$ and $\rho _0$ are), the faster sequence (4) converges. □
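To make the constants of Theorem 2 concrete, here is a small numerical sketch in the scalar case. The data $D=\cos $, $\lambda =0.5$, $y=0$, $x_0=0$ are assumptions made for illustration only; for this $D$ the bounds $|D^\prime |\leq 1$ and $|D^{\prime \prime }|\leq 1$ hold on all of ${\mathbb R}$, so $b=1$, $M=|\lambda |$, and any radius $r$ at least as large as the printed lower bound is admissible.

```python
import math

# Illustrative data (assumptions, not from the paper): X = R, D = cos, lambda = 0.5, y = 0.
lam, y, x0 = 0.5, 0.0, 0.0


def D(x: float) -> float:
    return math.cos(x)


def dD(x: float) -> float:
    return -math.sin(x)  # D'(x)


# Global bounds on R: |D'(x)| <= b = 1 and |D''(x)| <= 1 = M / |lam|.
b = 1.0
M = abs(lam) * 1.0

rho0 = abs(x0 - lam * D(x0) - y)
d0 = M * (1 + abs(lam) * b) ** 2 / 2 * rho0 + lam ** 2 * b ** 2
lam_max = (2 - M * rho0) / (b * (2 + M * rho0))  # upper bound required for |lam|
r_min = rho0 * (1 + abs(lam) * b) / (1 - d0)     # smallest admissible radius r


def step(x: float) -> float:
    """One step of iteration (4) in the scalar case."""
    return x - (1 + lam * dD(x)) * (x - lam * D(x) - y)


# Reference solution: iterate well past convergence.
x_star = x0
for _ in range(100):
    x_star = step(x_star)

# Compare the actual errors with the a priori bound of Theorem 2.
x, errors, bounds = x0, [], []
for k in range(10):
    errors.append(abs(x_star - x))
    bounds.append((1 + abs(lam) * b) * d0 ** k * rho0 / (1 - d0))
    x = step(x)

print(f"d0 = {d0}, |lam| must be <= {lam_max:.4f}, r must be >= {r_min:.4f}")
for k, (err, bound) in enumerate(zip(errors, bounds)):
    print(k, err, bound)
```

In this example the observed errors decrease considerably faster than the bound $d_0^k$ suggests, which is consistent with the bound being only a sufficient, a priori estimate.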

2 Local convergence

In order to analyze the local convergence of the considered iterates, we shall use the Ostrowski local attraction theorem, which offers sharp general conditions ensuring the local convergence. We shall consider for simplicity that $X={\mathbb R}^n$, with $\| \cdot \| $ an arbitrary given norm, though the results hold in Banach spaces (see, e.g., [ 5 , NR 10.1-3. ] ).

Theorem 4 Ostrowski local attraction theorem

[ 5 , Th. 10.1.3 ] Suppose that $G:\Omega \subset {\mathbb R}^n \rightarrow {\mathbb R}^n$ has a fixed point $x^\ast \in \operatorname {int}(\Omega )$ and is differentiable at $x^\ast $. If the spectral radius of $G^\prime (x^\ast )$ satisfies

\[ \rho (G^\prime (x^\ast ))=\sigma <1, \]

then $x^\ast $ is a point of attraction of the successive approximations $x_{k+1}=G(x_k),$ $k\geq 0,$ i.e., there exists an open neighborhood $V\subseteq \Omega $ of $x^\ast $ such that $\forall x_0 \in V$, the successive approximations given above all lie in $\Omega $ and converge to $x^\ast $.

Remark 5

The classical book of Ortega and Rheinboldt also contains completions to this result (see [ 5 , Ch. 10 ] ), in the sense that the spectral radius $\sigma $ yields the “worst” ($r$-)convergence factor among the sequences converging to the fixed point: when $\sigma \neq 0$, the convergence of the (whole) process is not faster than linear (though, theoretically, there may exist sequences converging at least $r$-superlinearly), while when $\sigma = 0$, all the sequences converge at least $r$-superlinearly. This result was refined by us in [ 1 ] , where we have shown that $x_k \rightarrow x^\ast $ $q$-superlinearly iff $G^\prime (x^\ast )$ has a zero eigenvalue and, starting from a certain step, $x^\ast - x_k$ are corresponding eigenvectors. This implies that no $q$-superlinear convergence may occur when $G^\prime (x^\ast )$ has no zero eigenvalue. □
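The hypothesis of Theorem 4 concerns the spectral radius, not a norm, of $G^\prime (x^\ast )$. The sketch below uses a hypothetical map on ${\mathbb R}^2$ (chosen for illustration, not taken from [ 5 ]) with fixed point $(0,0)$, whose derivative there has spectral radius $0.5$ but max-norm $2.5$: the iterates still converge, although the error may grow during the first step.

```python
from typing import Tuple

Vec = Tuple[float, float]


def G(x: Vec) -> Vec:
    """Hypothetical map with fixed point (0, 0) and G'(0, 0) = [[0.5, 2], [0, 0.5]]."""
    u, v = x
    return (0.5 * u + 2.0 * v + u * u, 0.5 * v)


def max_norm(x: Vec) -> float:
    return max(abs(x[0]), abs(x[1]))


# G'(0, 0) is upper triangular, so its eigenvalues are the diagonal entries:
# the spectral radius is 0.5 < 1, although the max-norm (largest row sum) is 2.5 > 1.
x = (0.01, 0.01)
norms = [max_norm(x)]
for _ in range(200):
    x = G(x)
    norms.append(max_norm(x))

# Error norm at the start, after one step, and after 200 steps.
print(norms[0], norms[1], norms[-1])
```

The starting point $(0.01, 0.01)$ was picked close to the fixed point; the theorem guarantees only that some neighborhood $V$ of attraction exists, not how large it is.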

The above result can be applied to method (2) if we notice that the derivative of $x-A(x)F(x)$ has a simple form at the fixed point $x^\ast $, the following auxiliary result being similar to (Lemma) 10.2.1 in [ 5 ] .

Lemma 6

Suppose that $F:\Omega \subset {\mathbb R}^n \rightarrow {\mathbb R}^n$ is differentiable at a point $x^\ast \in \operatorname {int}(\Omega )$ for which $F(x^\ast )=0$. Let $A:\Omega _0 \rightarrow {\mathcal L}({\mathbb R}^n)$ be defined on an open neighborhood $\Omega _0 \subseteq \Omega $ of $x^\ast $ and continuous at $x^\ast $. Then the mapping $G:\Omega _0\rightarrow {\mathbb R}^n,$

\[ G(x)=x-A(x)F(x) \]

is differentiable at $x^\ast $ and

\[ G^\prime (x^\ast )=I-A(x^\ast )F^\prime (x^\ast ). \]

Proof

The proof is elementary:

\begin{align*} & \| G(x)-G(x^\ast ) - [I-A(x^\ast )F^\prime (x^\ast )](x-x^\ast )\| \\ & =\| (A(x)-A(x^\ast ))F(x) + A(x^\ast )[F(x)-F(x^\ast )-F^\prime (x^\ast )(x-x^\ast )]\| \\ & =o(\| x-x^\ast \| ), \qquad {\rm as\ } x \rightarrow x^\ast . \end{align*}
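For completeness, the first equality can be checked by a routine expansion, using only $G(x^\ast )=x^\ast $ (which holds since $F(x^\ast )=0$):

\begin{align*} G(x)-G(x^\ast ) - [I-A(x^\ast )F^\prime (x^\ast )](x-x^\ast ) & = -A(x)F(x) + A(x^\ast )F^\prime (x^\ast )(x-x^\ast ) \\ & = -\big(A(x)-A(x^\ast )\big)F(x) - A(x^\ast )\big[F(x)-F(x^\ast )-F^\prime (x^\ast )(x-x^\ast )\big]. \end{align*}

The first term is $o(\| x-x^\ast \| )$ because $\| A(x)-A(x^\ast )\| \rightarrow 0$ by the continuity of $A$ at $x^\ast $, while $\| F(x)\| =\| F(x)-F(x^\ast )\| =O(\| x-x^\ast \| )$ by the differentiability of $F$ at $x^\ast $. The second term is $o(\| x-x^\ast \| )$ by the definition of $F^\prime (x^\ast )$.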

Now we can state the main results of this note. First, consider iterations (2).

Theorem 7

Let $D:{\mathbb R}^n\rightarrow {\mathbb R}^n$, $y \in {\mathbb R}^n$, $\lambda \in {\mathbb R}$, let $x^\ast $ be a solution of $F(x):=x-\lambda D(x)-y=0$, and let the mapping $A$ be defined on an open neighborhood $E$ of $x^\ast $, $A:E \rightarrow {\mathcal L}({\mathbb R}^n)$. If $D$ is differentiable at $x^\ast $, $A$ is continuous at $x^\ast $ and

\[ \rho \big(I-A(x^\ast )(I-\lambda D^\prime (x^\ast ))\big) < 1, \]

then $x^\ast $ is a point of attraction for the method (2).

The proof is an immediate application of Lemma 6 and Theorem 4.
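As an illustration of Theorem 7, consider a chord-type choice in which $A$ is frozen at the inverse of $F^\prime (x_0)$. The sketch below is scalar, with data $D=\cos $, $\lambda =0.5$, $y=0$, $x_0=0.3$ assumed purely for the example; it compares the quantity $|1-A\, F^\prime (x^\ast )|$, which plays the role of the spectral radius for $n=1$, with the observed ratio of successive errors.

```python
import math

# Illustrative data (assumptions, not from the paper): D = cos, lambda = 0.5, y = 0.
lam, y = 0.5, 0.0


def F(x: float) -> float:
    return x - lam * math.cos(x) - y


def dF(x: float) -> float:
    return 1.0 + lam * math.sin(x)  # F'(x) = 1 - lam * D'(x)


x0 = 0.3
A = 1.0 / dF(x0)  # frozen (chord) choice A(x) = F'(x0)^{-1}, trivially continuous


def iterate(steps: int) -> float:
    """Return x_steps for the iteration x_{k+1} = x_k - A F(x_k) started at x0."""
    x = x0
    for _ in range(steps):
        x = x - A * F(x)
    return x


x_star = iterate(100)               # reference solution
factor = abs(1.0 - A * dF(x_star))  # rho(I - A(x*) F'(x*)) for n = 1

e3 = abs(iterate(3) - x_star)
e4 = abs(iterate(4) - x_star)
print(factor, e4 / e3)
```

Since the factor is nonzero here, the convergence is linear, in agreement with Remark 5.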

The conditions are much simpler for the case of the second method.

Theorem 8

Let $D:{\mathbb R}^n\rightarrow {\mathbb R}^n$, $y \in {\mathbb R}^n$, $\lambda \in {\mathbb R}$, $x^\ast $ a solution of $F(x):=x-\lambda D(x)-y=0$. If the mapping $D$ is differentiable on an open neighborhood of $x^\ast $, with $D^\prime $ continuous at $x^\ast $, and

\[ |\lambda | \rho \big(D^\prime (x^\ast )\big) < 1, \]

then $x^\ast $ is a point of attraction for the method (4).

Proof

By Lemma 6 we get

\[ G^\prime (x^\ast )= I - \big(I+\lambda D^\prime (x^\ast )\big)\big(I-\lambda D^\prime (x^\ast )\big)= \lambda ^2 D^\prime (x^\ast )^2, \]

whence, by Theorem 4, the conclusion follows.
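The link between the formula for $G^\prime (x^\ast )$ and the hypothesis of Theorem 8 is a standard spectral identity (the eigenvalues of $B^2$ are the squares of those of $B$), spelled out here for convenience:

\[ \rho \big(G^\prime (x^\ast )\big)=\rho \big(\lambda ^2 D^\prime (x^\ast )^2\big)=\lambda ^2 \rho \big(D^\prime (x^\ast )\big)^2=\big(|\lambda | \rho (D^\prime (x^\ast ))\big)^2, \]

so that $\rho (G^\prime (x^\ast ))<1$ exactly when $|\lambda | \rho (D^\prime (x^\ast ))<1$. For instance, $|\lambda | \rho (D^\prime (x^\ast ))=1/2$ gives $\rho (G^\prime (x^\ast ))=1/4$. Note also that Lemma 6 is applicable because $A(x)=I+\lambda D^\prime (x)$ is continuous at $x^\ast $, which is where the continuity of $D^\prime $ at $x^\ast $ is used.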

The same observations as in Remark 5 apply.

Bibliography

1: E. Cătinaş, On the superlinear convergence of the successive approximations method, J. Optim. Theory Appl., 113 (2002) no. 3, pp. 473–485.
2: E. Cătinaş, The inexact, inexact perturbed and quasi-Newton methods are equivalent models, Math. Comp., 74 (2005) no. 249, pp. 291–301.
3: E. Cătinaş, On the convergence orders, manuscript.
4: A. Diaconu, I. Păvăloiu, Sur quelque méthodes itératives pour la resolution des équations opérationnelles, Rev. Anal. Numér. Theor. Approx., 1 (1972), pp. 45–61.
5: J.M. Ortega, W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables, Academic Press, New York, 1970.
6: I. Păvăloiu, La convergence de certaines méthodes itératives pour résoudre certaines equations operationnelles, Seminar on functional analysis and numerical methods, Preprint no. 1 (1986), pp. 127–132 (in French).
7: I. Păvăloiu, A unified treatment of the modified Newton and chord methods, Carpathian J. Math. 25 (2009) no. 2, pp. 192–196.
