# Expanding the applicability of the Gauss-Newton method for a certain class of systems of equations

November 28, 2013.

\(^1\)Department of Mathematical Sciences, Cameron University, Lawton, OK 73505, USA, Email: ioannisa@cameron.edu.

\(^2\)Department of Mathematical and Computational Sciences, National Institute of Technology Karnataka, India-575 025, Email: sgeorge@nitk.ac.in.

We present a new semi-local convergence analysis of the Gauss-Newton method for solving a certain class of systems of equations under a majorant condition. Using a center majorant function as well as a majorant function, and under the same computational cost as in earlier studies such as [ 11 ] – [ 13 ] , we present a semilocal convergence analysis with the following advantages: weaker sufficient convergence conditions; tighter error estimates on the distances involved; and at least as precise information on the location of the solution. Special cases and applications complete this study.

**MSC.** 65H10, 65G99, 65K10, 47H17, 49M15.

**Keywords.** Gauss-Newton method, Newton’s method, semilocal convergence, least squares problem.

# 1 Introduction

Let \(D\subseteq \mathbb {R}^n\) be open and let \(F:D\rightarrow \mathbb {R}^m\) be continuously Fréchet-differentiable. We are concerned with the problem of finding least squares solutions \(x^\ast \) of the nonlinear least squares problem

\[\min _{x\in D}\| F(x)\| ^2. \tag{1.1}\]

The least squares solutions of (1.1) are stationary points of \(G(x)=\| F(x)\| ^2.\) Many diverse problems arising in applied sciences and in engineering can be expressed in the form (1.1). For example, in data fitting \(n\) is the number of parameters and \(m\) is the number of observations. Other examples can be found in [ 5 , 15 , 18 ] and the references therein. The famous Gauss-Newton method, defined by

\[x_{k+1}=x_k-F'(x_k)^\dagger F(x_k),\quad k=0,1,2,\ldots , \tag{1.2}\]

where \(x_0\) is an initial point and \(F'(x_k)^\dagger \) is the Moore-Penrose inverse of the linear operator \(F'(x_k),\) has been used extensively to generate a sequence \(\{ x_k\} \) converging to \(x^\ast \) [ 1 ] – [ 5 ] , [ 7 , 9 , 13 , 14 , 16 , 19 ] .
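For illustration, the iteration (1.2) can be sketched in a few lines of Python. This is our own minimal sketch, not part of the analysis: `numpy.linalg.pinv` computes the Moore-Penrose inverse, and the residual, Jacobian, and data below are hypothetical, chosen for a small data-fitting problem with \(n=2\) parameters and \(m=5\) observations.

```python
import numpy as np

def gauss_newton(F, J, x0, max_iter=50, tol=1e-12):
    """Gauss-Newton iteration x_{k+1} = x_k - F'(x_k)^+ F(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.pinv(J(x)) @ F(x)   # Moore-Penrose inverse of the Jacobian
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Hypothetical data-fitting example: model y = a*exp(b*t), with data generated
# from the parameters a = 2, b = -1 (a zero-residual problem).
t_data = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y_data = 2.0 * np.exp(-t_data)

F = lambda x: x[0] * np.exp(x[1] * t_data) - y_data          # residual in R^m
J = lambda x: np.column_stack((np.exp(x[1] * t_data),        # Jacobian in R^{m x n}
                               x[0] * t_data * np.exp(x[1] * t_data)))

x_sol = gauss_newton(F, J, x0=[1.0, 0.0])
```

Since this problem has zero residual at the exact fit \((a, b)=(2, -1),\) the iterates approach that point.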

In the present paper, we are motivated by the work of Goncalves and Oliveira in [ 13 ] (see also [ 11 ] , [ 12 ] ) and optimization considerations. They provided a semilocal convergence analysis for the Gauss-Newton method (1.2) for systems of nonlinear equations where the function \(F\) satisfies

where \(k\in [0, 1)\) and \(I_{\mathbb {R}^m}\) denotes the identity operator on \({\mathbb {R}^m}.\) Their semilocal convergence analysis is based on the construction of a majorant function (see \((h_3)\)). Their results unify the classical results for functions involving Lipschitz derivative [ 5 , 6 , 15 , 17 ] with results for analytical functions (\(\alpha \)-theory or \(\gamma \)-theory) [ 8 , 10 , 14 , 16 , 18 , 19 ] .

We introduce a center majorant function (see \((h_3)\)), which is a special case of the majorant function, that can provide more precise estimates on the quantities \(\| F'(x)^\dagger \| .\) This modification leads to: weaker sufficient convergence conditions; more precise error estimates on the distances \(\| x_{k+1}-x_k\| , \| x_k-x^\ast \| ;\) and at least as precise information on the location of the solution.

The paper is organized as follows: Section 2 contains standard information on Moore-Penrose inverses, added so that the paper is as self-contained as possible. The semilocal convergence analysis of the Gauss-Newton method is presented in Section 3. Special cases and applications are given in the concluding Section 4.

# 2 Background

Let \(U(x, r)\) denote the open ball with center \(x\in D\) and radius \(r {\gt} 0,\) and let \(\overline{U(x, r)}\) denote its closure. The Moore-Penrose inverse of a linear operator \(M: \mathbb {R}^n\longrightarrow \mathbb {R}^m\) is the linear operator \(M^\dagger :\mathbb {R}^m\longrightarrow \mathbb {R}^n\) which satisfies

\[M M^\dagger M=M,\quad M^\dagger M M^\dagger =M^\dagger ,\quad (M M^\dagger )^\ast =M M^\dagger ,\quad (M^\dagger M)^\ast =M^\dagger M, \tag{2.1}\]

where \(M^\ast \) denotes the adjoint of \(M.\) We denote by \(\operatorname {Ker}(M)\) and \(\operatorname {Im}(M)\) the kernel and image of \(M,\) respectively. It follows from (2.1) that

\[M^\dagger M=\Pi _{\operatorname {Ker}(M)^\perp },\qquad M M^\dagger =\Pi _{\operatorname {Im}(M)},\]

where \(\Pi _S\) denotes the orthogonal projection onto the subspace \(S.\) Moreover, if \(M\) is surjective, we have

\[M M^\dagger =I_{\mathbb {R}^m}.\]

We refer the reader to [ 7 ] for more properties and results on Moore-Penrose inverses.
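As a quick numerical sanity check of the above (our sketch; `numpy.linalg.pinv` computes \(M^\dagger \)), the four defining equations and the projection and surjectivity properties can be verified on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 5))   # M : R^5 -> R^3, full row rank (surjective) a.s.
Md = np.linalg.pinv(M)            # the Moore-Penrose inverse M^+ : R^3 -> R^5

# The four defining (Penrose) equations:
assert np.allclose(M @ Md @ M, M)
assert np.allclose(Md @ M @ Md, Md)
assert np.allclose((M @ Md).T, M @ Md)   # M M^+ is self-adjoint
assert np.allclose((Md @ M).T, Md @ M)   # M^+ M is self-adjoint

# M^+ M is an orthogonal projection (idempotent and self-adjoint):
P = Md @ M
assert np.allclose(P @ P, P)

# Since this M is surjective, M M^+ is the identity on R^3:
assert np.allclose(M @ Md, np.eye(3))
```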

Next, we list a number of useful auxiliary results, starting with Banach’s lemma on invertible operators [ 5 , 7 , 15 ] .

[ 7 , 13 ] Let \(M_1, M_2 : \mathbb {R}^n\longrightarrow \mathbb {R}^m\) be linear and continuous. Suppose that \(1\leq \operatorname {rank}(M_2)\leq \operatorname {rank}(M_1)\) and \(\| M_1^\dagger \| \| M_1-M_2\| {\lt} 1.\) Then the following hold
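The conclusions of this lemma are stated in full in [ 7 , 13 ] ; Banach-type lemmas of this kind typically conclude that \(\operatorname {rank}(M_2)=\operatorname {rank}(M_1)\) and that \(\| M_2^\dagger \| \leq \| M_1^\dagger \| /(1-\| M_1^\dagger \| \| M_1-M_2\| ).\) The following Python sketch (our illustration, assuming that form of the conclusion) spot-checks the bound on a random full-rank pair:

```python
import numpy as np

rng = np.random.default_rng(1)
spec = lambda A: np.linalg.norm(A, 2)        # spectral (operator) norm

M1 = rng.standard_normal((5, 3))             # full column rank almost surely
M1d = np.linalg.pinv(M1)

# Build a perturbation with ||M1^+|| * ||M1 - M2|| = 0.5 < 1:
E = rng.standard_normal((5, 3))
E *= 0.5 / (spec(M1d) * spec(E))
M2 = M1 + E

assert spec(M1d) * spec(M1 - M2) < 1
assert np.linalg.matrix_rank(M2) == np.linalg.matrix_rank(M1)

# The Banach-type perturbation bound for the Moore-Penrose inverse:
bound = spec(M1d) / (1 - spec(M1d) * spec(M1 - M2))
assert spec(np.linalg.pinv(M2)) <= bound + 1e-12
```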

# 3 Semi-local convergence analysis

In this section we present the semi-local convergence analysis of the Gauss-Newton method. We shall use the hypotheses given by:

\((H)\) Let \(D\subseteq \mathbb {R}^n\) be open and \(F:D\rightarrow \mathbb {R}^m\) be continuously Fréchet-differentiable.

\((h_1)\) Suppose that

where \(\kappa \in [0, 1).\) Let \(x_0\in D\) with \(F'(x_0) \neq 0\) and set \(\beta = \| F'(x_0)^\dagger F(x_0)\| {\gt} 0.\)

\((h_2)\) Suppose that

\((h_3)\) Suppose that there exist \(R {\gt} 0\) and \(f_0, f : [0, R)\rightarrow \mathbb {R}\) such that \(U(x_0, R)\subseteq D,\)

and

with \(\| y-x\| +\| x-x_0\| {\lt} R.\)

\((h_4)\)

\((h_5)\) \(f_0', f'\) are convex and strictly increasing.

Let \(\lambda \geq 0\) be such that \(\lambda \geq -\kappa f'(\beta )\) and define \(h_{\beta ,\lambda }:[0, R)\rightarrow \mathbb {R}\) by

\[h_{\beta ,\lambda }(t)=\beta +\lambda t+f(t).\]

\((h_6)\) \(h_{\beta ,\lambda }(t)=0\) for some \(t\in [0, R).\)

\((h_7)\) For each \(s, t, u \in [0,R)\) with \(s \leq t \leq u\)

If \(f_0=f,\) then hypotheses \((H)\) and our results reduce to the ones given in [ 13 ] . Notice that the second hypothesis in \((h_3)\) implies the first one, but not necessarily vice versa; that is, the first hypothesis is a special case of the second. From the computational point of view, the computation of function \(f\) involves the computation of function \(f_0.\) Hence, the results in this study are obtained under the same computational cost as in [ 13 ] (since the rest of the \((H)\) hypotheses are the same as in [ 13 ] ). From now on we assume that the \((H)\) conditions hold.

The majorizing iteration \(\{ s_k\} \) for \(\{ x_k\} \) is given by

\[s_0=0,\qquad s_{k+1}=s_k-\frac{h_{\beta ,\lambda }(s_k)}{f_0'(s_k)},\quad k=0,1,2,\ldots \tag{3.1}\]

The corresponding iteration \(\{ t_k\} \) used in [ 13 ] (for \(f_0 = f\)) is given by

\[t_0=0,\qquad t_{k+1}=t_k-\frac{h_{\beta ,\lambda }(t_k)}{f'(t_k)},\quad k=0,1,2,\ldots \]

The proofs in this study (see Lemma 11 in [ 13 ] and our Lemma 3.8) shall also show that the following iterations \(\{ r_k\} \) and \(\{ q_k\} \) are majorizing for \(\{ x_k\} \) for each \(k=0,1,2,\ldots :\)

and, for \(g_{\beta ,\lambda }(t)=\beta +\lambda t+f_0(t),\)

The new majorizing sequences \(\{ s_k\} , \{ r_k\} , \{ q_k\} \) can not only be tighter than \(\{ t_k\} ,\) but their sufficient convergence conditions can also be weaker than the corresponding ones for \(\{ t_k\} .\) □

Next, we study the convergence of the iteration \(\{ s_k\} \) and the properties of the majorizing functions. The proofs are analogous to the corresponding ones in [ 13 ] : we simply replace \(f'(t)\) by \(f_0'(t)\) in the proofs and use \(f_0'(t) \leq f'(t).\) However, there are some differences in the proofs which are not immediate to the reader, and in what follows we point out those differences. For the remaining details, we refer the reader to [ 13 ] .

[ 13 ] The following hold

\(h_{\beta ,\lambda }(0)=\beta ,\, \, h_{\beta ,\lambda }'(0)= \lambda -1;\)

\(h_{\beta ,\lambda }'\) is convex and strictly increasing.

The function \(h_{\beta ,\lambda }\) has a smallest zero \(t^\ast \in (0, R),\) is strictly convex, and the following hold:

Moreover, \(f_0'(t^\ast ) \leq f'(t^\ast ) \leq h_{\beta ,\lambda }'(t^\ast )\leq 0.\)

Then, the preceding inequality can be written as

(since \(-h_{\beta ,\lambda }'(t) {\gt} 0\)). We also have that

which together with the preceding estimates shows the third estimate in (3.5). Moreover, we have that \(h_{\beta ,\lambda }(t) {\gt} 0\) for each \(t\in [0, t^\ast )\) and \(h_{\beta ,\lambda }(t^\ast )=0.\) Then, we must have \(h_{\beta ,\lambda }'(t)\leq 0\) for each \(t\in [0, t^\ast ).\) Hence, the last estimate follows from

It follows from the second estimate in (3.5) that the map \(\psi _{h_{\beta ,\lambda }}\) given by

\[\psi _{h_{\beta ,\lambda }}(t)=t-\frac{h_{\beta ,\lambda }(t)}{f_0'(t)},\quad t\in [0, t^\ast ),\]

is well defined. Notice that if \(f_0 = f\) this map reduces to the one used in [ 13 ] . Moreover, if \(\lambda = 0\) this map reduces to the one given in [ 11 ] .

We can define the sequence \(\{ s_k\} \) given in (3.1) by

\[s_0=0,\qquad s_{k+1}=\psi _{h_{\beta ,\lambda }}(s_k),\quad k=0,1,2,\ldots \tag{3.2}\]

The following hold

\(\beta \leq \psi _{h_{\beta ,\lambda }}(t) {\lt} t^\ast , \forall t\in [0, t^\ast ).\)

The map \( \psi _{h_{\beta ,\lambda }}\) maps \([0, t^\ast )\) into \([0, t^\ast )\) and \(t {\lt} \psi _{h_{\beta ,\lambda }}(t)\) for each \(t\in [0, t^\ast ).\) If \(\lambda =0,\) or \(\lambda =0\) and \(f_0'(t^\ast ) {\lt} 0,\) then the following hold, respectively:

\begin{eqnarray*} \nonumber t^\ast -\psi _{h_{\beta ,\lambda }}(t)& \leq & \tfrac {1}{2}(t^\ast -t),\\ t^\ast -\psi _{h_{\beta ,\lambda }}(t)& \leq & \frac{D^-f_0'(t^\ast )}{-2f_0'(t^\ast )}(t^\ast -t)^2, \, \, \, \textnormal{for each}\, \, t\in [0, t^\ast ) \end{eqnarray*}where \(D^-\) stands for the left directional derivative [ 5 , 13 ] .

The sequence \(\{ s_k\} \) is: well defined; strictly increasing; contained in \([0, t^\ast )\) and converges to \(t^\ast .\)

If \(\lambda =0,\) or \(\lambda =0\) and \(f_0'(t^\ast ) {\lt} 0,\) the sequence \(\{ s_k\} \) converges \(Q\)-linearly or \(Q\)-quadratically to \(t^\ast ,\) respectively, and

\begin{eqnarray*} \nonumber t^\ast -s_{k+1}& \leq & \tfrac {1}{2}(t^\ast -s_k),\\ t^\ast -s_{k+1}& \leq & \frac{D^-f_0'(t^\ast )}{-2f_0'(t^\ast )}(t^\ast -s_k)^2, \, \, \, \textnormal{for each}\, \, k=0,1,2,\ldots . \end{eqnarray*}
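The convergence claims for \(\{ s_k\} \) can be illustrated numerically. The Python sketch below (our illustration) uses the quadratic majorants of Section 4, \(f_0(t)=\frac{L_0}{2}t^2-t\) and \(f(t)=\frac{L}{2}t^2-t\) with \(\lambda =0,\) and iterates (3.1); the numerical values of \(\beta , L_0, L\) are assumptions chosen so that the smallest zero \(t^\ast \) of \(h_{\beta ,0}\) exists.

```python
import numpy as np

# Assumed data: quadratic majorants from Section 4, lambda = 0.
beta, L0, L = 0.3, 0.8, 1.0
h   = lambda t: beta - t + 0.5 * L * t**2     # h_{beta,0}(t) = beta + f(t)
f0p = lambda t: L0 * t - 1.0                  # f_0'(t)

# Smallest zero of h (real since 2*L*beta <= 1):
t_star = (1.0 - np.sqrt(1.0 - 2.0 * L * beta)) / L

s = [0.0]
for _ in range(30):
    s.append(s[-1] - h(s[-1]) / f0p(s[-1]))   # iteration (3.1)
s = np.array(s)

# {s_k} is strictly increasing (early on, before it saturates numerically),
# stays in [0, t*], and converges to t*:
assert np.all(np.diff(s[:8]) > 0)
assert np.all(s <= t_star + 1e-12)
assert np.isclose(s[-1], t_star)
```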

In what follows we prove the convergence of the Gauss-Newton method (1.2). We need the following Banach-type lemma:

Suppose that \(x\in U(x_0, t)\) for \(t\in [0, t^\ast ).\) Then, the following hold \(\operatorname {rank}(F'(x)) = \operatorname {rank}(F'(x_0)) \geq 1\) and

In view of the preceding estimate and Lemma 2.2 we deduce that \(\operatorname {rank}(F'(x)) = \operatorname {rank}(F'(x_0)) \geq 1\) and

It is worth noticing that the second inequality in \((h_3)\) is used in [ 13 ] to obtain, instead of (3.5), that

which is more expensive to arrive at, and less precise than (3.5), if \(f_0'(t) {\lt} f'(t)\) for each \(t\in [0, t^\ast ).\) This important observation constitutes the main difference between our approach and the one used in [ 13 ] and the earlier studies [ 11 , 12 , 14 , 16 , 19 ] .

As in earlier studies of this type [ 4 , 5 , 6 , 9 , 10 , 11 , 13 ] we study the linearization error of \(F\) at each point in \(D\) by defining

and the error in the linearization on majorant function \(f\) by

□

[ 11 ] Let \(x, y \in U(x_0, R)\) and \(0 \leq t {\lt} r {\lt} R.\) Suppose that \(\| x-x_0\| \leq t\) and \(\| y-x\| \leq r-t.\) Then,

It follows from Proposition 3.5 that the Gauss-Newton map \(G_F\) for \(F,\)

\[G_F(x)=x-F'(x)^\dagger F(x),\]

is well-defined. In order to guarantee that the Gauss-Newton iteration can be repeated indefinitely we need to introduce certain sets:

The sets \(K(t)\) and \(K\) were defined in [ 11 , 13 ] . It follows from the estimate \(f_0'(t)\leq f'(t)\) for each \(t\in [0, t^\ast )\) and definitions (3.11)-(3.14) that

and

In view of (3.15) and (3.16) the next two results improve the corresponding ones in [ 13 ] . The proofs are given in exactly the same way but replacing \(K(t), K, \{ t_n\} , h'_{\beta ,0}\) by \(K_0(t), K_0, \{ s_n\} , f'_{0},\) respectively.

The following hold for each \(t\in [0, t^\ast ):\)

\(K_0(t) \subset U(x_0,t^\ast );\)

\(\| G_F(G_F(x))-G_F(x)\| \leq -\frac{h_{\beta ,\lambda }(\psi _{h_{\beta ,\lambda }}(t))}{f_0'(\psi _{h_{\beta ,\lambda }}(t))}(\frac{\| G_F(x)-x\| }{\psi _{h_{\beta ,\lambda }}(t)-t})^2\), \(\forall x\in K_0(t)\)

\(G_F(K_0(t))\subset K_0(\psi _{h_{\beta ,\lambda }}(t))\)

\(K_0 \subset U(x_0, t^\ast )\) and \(G_F(K_0)\subset K_0.\)

The Gauss-Newton method (1.2) can be written as

\[x_{k+1}=G_F(x_k),\quad k=0,1,2,\ldots \tag{3.17}\]

Next, the main semi-local convergence result for the Gauss-Newton method is presented.

Suppose that the \((H)\) conditions hold. Then, the following hold:

\(h_{\beta ,\lambda }(t)\) has a smallest zero \(t^\ast \in (0, R);\) the sequences \(\{ s_k\} \) and \(\{ x_k\} ,\) for solving \(h_{\beta ,\lambda }(t)=0\) and \(F(x)=0\) with starting points \(s_0=0\) and \(x_0,\) respectively given by (3.2) and (3.17), are well defined; \(\{ s_k\} \) is strictly increasing, remains in \([0, t^\ast ),\) and converges to \(t^\ast ;\) \(\{ x_k\} \) remains in \(U(x_0, t^\ast )\) and converges to a point \(x^\ast \in U(x_0, t^\ast )\) such that \(F'(x^\ast )^\dagger F(x^\ast )=0.\) Moreover, the following estimates hold:

and

Furthermore, if \(\lambda =0\) (respectively, \(\lambda =0\) and \(f_0'(t^\ast ) {\lt} 0\)), the sequences \(\{ s_k\} \) and \(\{ x_k\} \) converge \(Q\)-linearly and \(R\)-linearly (\(Q\)-quadratically and \(R\)-quadratically) to \(t^\ast \) and \(x^\ast ,\) respectively.

(i) As noted in [ 13 ] , the best choice for \(\lambda \) is given by \(\lambda =-\kappa f'(\beta ).\)

(ii) Sequence \(\{ r_k\} \) appears in the proof of Lemma 11 in [ 13 ] or our Lemma 3.8 (see also Theorem 4.1).

(iii) Sequence \(\{ q_k\} \) is derived from the proof of Lemma 11 in [ 13 ] or our Lemma 3.8 if we simply use \(x=x_0\) and notice that the first, instead of the second, hypothesis in \((h_3)\) can be used for the corresponding estimate. Notice that \(\{ q_k\} \) is the tightest among \(\{ s_k\} \) and \(\{ r_k\} .\) Moreover, \(\{ s_k\} ,\) \(\{ t_k\} \) and \(\{ q_k\} \) converge under the \((H)\) conditions. □

# 4 Special cases and applications

In this section, we present some special cases and applications. We arrive at:

Under the \((H)\) hypotheses, the conclusions of Theorem 3.9 hold. Moreover, the following estimates hold

and

Suppose that \(r_i\leq s_i\) and \(r_{i+1}-r_i\leq s_{i+1}-s_i\) for \(i=0, 1, 2, \ldots , k.\) We shall show that \(r_{k+2}\leq s_{k+2}\) and \(r_{k+2}-r_{k+1} \leq s_{k+2}-s_{k+1}.\) We have from the induction hypotheses and \((h_7)\) that

and

Similarly, we show that \(q_k \leq r_k\) and \(q_{k+1}-q_k\leq r_{k+1}-r_k.\)

So far, we have shown that under the \((H)\) and \((h_7)\) hypotheses more precise majorizing sequences can be obtained for \(\{ x_k\} .\) At this point, it is natural to ask whether a direct study of the convergence of the majorizing sequences \(\{ r_k\} , \{ s_k\} , \{ q_k\} \) leads to weaker sufficient convergence conditions than \((h_6).\) Let us examine this question in the interesting case of Newton’s method

for solving the nonlinear equation

That is, \(\lambda =\kappa =0.\) Let us define \(f_0\) and \(f\) on \([0, R)\) by \(f_0(t)=\frac{L_0}{2}t^2-t\) and \(f(t)=\frac{L}{2}t^2-t\) for \(L_0 {\gt} 0\) and \(L {\gt} 0.\) Then, the sequences \(\{ t_k\} , \{ s_k\} , \{ r_k\} \) and \(\{ q_k\} \) reduce to
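For these quadratic majorants and \(\lambda =0,\) the smallest zero of \(h_{\beta ,0}\) can be written in closed form; this short computation (ours, consistent with the Kantorovich theory in [ 5 , 15 ] ) also shows where the condition on \(\beta \) and \(L\) comes from:

```latex
h_{\beta ,0}(t)=\beta -t+\tfrac {L}{2}t^2=0
\quad \Longrightarrow \quad
t^\ast =\frac{1-\sqrt{1-2L\beta }}{L},
```

so \(t^\ast \) is real precisely when \(2L\beta \leq 1.\)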

Then, according to \((h_3)\) and Theorem 4.1, the sequences \(\{ t_k\} , \{ s_k\} , \{ r_k\} ,\{ q_k\} \) converge if the Kantorovich hypothesis [ 5 , 15 ] , famous for its simplicity and clarity,

\[2L\beta \leq 1\]

is satisfied. We also have that

However, a direct study of the sequence \(\{ q_k\} \) (see [ 5 , 6 ] ) shows that this sequence converges provided that

\[2\bar{L}\beta \leq 1,\]

where

Notice that since \(\bar{L}\leq L,\)

but not necessarily vice versa unless \(L_0 = L.\) Moreover, we have that

Implication (4.4) shows by how many times (at most) the applicability of Newton’s method (4.2) is extended under our approach. Notice also that if \(L_0 {\lt} L,\)

and \(q^\ast = \lim _{k\rightarrow \infty }q_k \leq t^\ast .\) Examples where \(L_0 {\lt} L\) can be found in [ 5 , 6 ] .
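To see the effect of \(L_0 {\lt} L\) numerically, the following Python sketch (our computation, with assumed values of \(\beta , L_0, L\) satisfying the Kantorovich condition) compares the majorizing sequence \(\{ t_k\} \) of [ 13 ] with the tighter sequence \(\{ s_k\} \) obtained from the center majorant; the explicit forms of \(\{ r_k\} \) and \(\{ q_k\} \) are omitted here.

```python
import numpy as np

# Assumed data with L0 < L and 2*L*beta <= 1 (Kantorovich condition):
beta, L0, L = 0.4, 0.5, 1.0
h = lambda t: beta - t + 0.5 * L * t**2       # h_{beta,0}(t) = beta + f(t)

t_seq, s_seq = [0.0], [0.0]
for _ in range(60):
    t_seq.append(t_seq[-1] + h(t_seq[-1]) / (1.0 - L  * t_seq[-1]))  # uses f'
    s_seq.append(s_seq[-1] + h(s_seq[-1]) / (1.0 - L0 * s_seq[-1]))  # uses f_0'

t_star = (1.0 - np.sqrt(1.0 - 2.0 * L * beta)) / L

# {s_k} is pointwise tighter than {t_k}, and both converge to t*:
assert all(s <= t + 1e-12 for s, t in zip(s_seq, t_seq))
assert s_seq[2] < t_seq[2]
assert abs(t_seq[-1] - t_star) < 1e-8 and abs(s_seq[-1] - t_star) < 1e-8
```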

# Bibliography

- 1 *I. Argyros*, *On the semilocal convergence of the Gauss-Newton method*, Adv. Nonlinear Var. Inequal., **8** (2005), no. 2, pp. 93–99.
- 2 *I. Argyros* and *S. Hilout*, *On the local convergence of the Gauss-Newton method*, Punjab Univ. J. Math., **41** (2009), pp. 23–33.
- 3
- 4
- 5
- 6 *I. Argyros* and *S. Hilout*, *Numerical methods in nonlinear analysis*, World Scientific Publ. Comp., New Jersey, USA, 2013.
- 7 *A. Ben-Israel* and *T.N.E. Greville*, *Generalized inverses: theory and applications*, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, 15, 2nd ed., Springer-Verlag, New York, 2003.
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15 *L.V. Kantorovich* and *G.P. Akilov*, *Functional Analysis*, Pergamon Press, Oxford, 1982.
- 16
- 17 *F.A. Potra* and *V. Pták*, *Nondiscrete induction and iterative processes*, Research Notes in Mathematics, 103, Pitman (Advanced Publishing Program), Boston, MA, 1984.
- 18 *S. Smale*, *Newton’s method estimates from data at one point*, in: The merging of disciplines: new directions in pure, applied, and computational mathematics (Laramie, Wyo., 1985), pp. 185–196, Springer, New York, 1986.
- 19