Relationship between the inexact Newton method and the continuous analogy of Newton’s method

T. Zhanlav\(^\ast \), O. Chuluunbaatar\(^{\ast \S }\), G. Ankhbayar\(^\ast \)

January 11, 2012

\(^\ast \)School of Mathematics and Computer Science, National University of Mongolia, Ulan-Bator, Mongolia, e-mail: tzhanlav@yahoo.com, chuka@jinr.ru, anxaa\(\_ \)w@yahoo.com.

\(^\S \)Joint Institute for Nuclear Research, Dubna, 141980 Moscow region, Russia.

In this paper we propose two new strategies to determine the forcing terms that allow one to improve the efficiency and robustness of the inexact Newton method. The choices are based on the relationship between the inexact Newton method and the continuous analogy of Newton’s method. With the new forcing terms, the inexact Newton method is locally Q-superlinearly and quadratically convergent. Numerical results are presented to demonstrate the effectiveness of the new forcing terms.

MSC. 97N40

Keywords. inexact Newton method, continuous analogy of Newton’s method

1 Introduction

Consider a nonlinear system

\[ F(x)=0, \eqno (1) \]

where \(F:D\subseteq \mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) is a continuously differentiable nonlinear mapping. Among all kinds of methods for solving the nonlinear equations (1), the Newton method is perhaps the most elementary, popular and important [ 1 , 2 , 3 , 4 , 5 , 6 ] . One of the advantages of the method is its local quadratic convergence. However, its computational cost is high, particularly when the size of the problem is very large, because the Newton equations

\[ F(x_{k})+F'(x_{k})s_{k}=0 \eqno (2) \]

should be solved at each iteration step. To reduce the computational cost of the Newton method, Dembo, Eisenstat and Steihaug [ 7 ] proposed an inexact Newton (IN) method,

\[ F'(x_{k})\bar s_{k}=-F(x_{k})+r_{k}, \]
\[ x_{k+1}=x_{k}+\bar s_{k},\quad k=0,1,\dots ,\quad x_{0}\in D. \eqno (3) \]

The terms \(r_{k}\in \mathbb {R}^{n}\) represent the residuals of the approximate solutions \(\bar s_{k}\) of the Newton equation (2); that is, one solves the Newton equation (2) inexactly and obtains a step \(\bar s_{k}\) such that

\[ \| r_{k}\| =\| F(x_{k})+F'(x_{k})\bar s_{k}\| \leq \eta _{k}\| F(x_{k})\| , \eqno (4) \]

where \(\eta _{k}\in [0,1) \) is the forcing term. In each iteration step of the IN method, a real number \(\eta _{k}\) is chosen first, and then an IN step \(\bar s_{k}\) is obtained by solving the Newton equations approximately with an efficient iterative solver for systems of linear equations. The forcing terms play an important role both in reducing the residuals of the Newton equations and in increasing the accuracy of the method. In particular, if \(\eta _{k}=0\) for all \(k\), then the IN method reduces to the Newton method. The IN method, like the Newton method, is locally convergent.
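As a concrete illustration (ours, not the authors’), the following minimal Python sketch performs one IN step using SciPy’s GMRES as the inner linear solver, passing \(\eta _{k}\) as the relative residual tolerance so that (4) holds on successful termination; the keyword rtol assumes SciPy 1.12 or newer (older releases call it tol).

```python
from scipy.sparse.linalg import gmres

def inexact_newton_step(F, J, x, eta):
    """One IN step: solve J(x) s = -F(x) inexactly so that
    ||F(x) + J(x) s|| <= eta * ||F(x)||, i.e. condition (4)."""
    Fx = F(x)
    # GMRES stops when ||b - A s|| <= rtol * ||b||, which with
    # b = -F(x) and A = J(x) is exactly condition (4).
    s, info = gmres(J(x), -Fx, rtol=eta, atol=0.0)
    if info != 0:
        raise RuntimeError("GMRES did not reach the requested tolerance")
    return x + s
```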

Theorem 1.1

[ 7 ] . Assume that the IN iterates converge to \(x^{*}\). Then the convergence is superlinear if and only if

\[ \| r_{k}\| =o(\| F(x_{k})\| )\quad \mbox{as}\quad k\rightarrow \infty . \eqno (5) \]
Theorem 1.2

[ 7 ] . Given \(\eta _{k}\leq \overline{\eta }{\lt}t{\lt}1\), \(k=0,1,\dots \), there exists \(\varepsilon {\gt}0\) such that for any initial approximation \(x_{0}\) with \(\| x_{0}-x^{*}\| \leq \varepsilon ,\) the sequence of IN iterates \(x_{k}\) satisfying (4) converges to \(x^{*}\). Moreover, the convergence is linear in the sense that

\[ \| x_{k+1}-x^{*}\| _{*} \leq t \| x_{k}-x^{*}\| _{*},~ k=0,1,\dots , \eqno (6) \]

where \(\| y\| _{*}=\| F'(x^{*})y\| .\)

Theorem 1.3

[ 7 ] . Assume that \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) is continuously differentiable and that \(x^{*}\in \mathbb {R}^{n}\) is such that \(F(x^{*})=0\) and \(F'(x^{*})\) is nonsingular. If the sequence \(x_{k}\) generated by the IN iteration converges to \(x^{*},\) then
(1) \(x_{k}\) converges to \(x^{*}\) superlinearly when \(\eta _{k}\rightarrow 0\);
(2) \(x_{k}\) converges to \(x^{*}\) quadratically if \(\eta _k=O(\| F(x_{k})\| )\) and \(F'(x)\) is Lipschitz continuous at \(x^{*}\).

In [ 8 ] the condition

\[ \tfrac {\| r_{k}\| }{\| F(x_{k})\| +\| \bar s_{k}\| }=O(\| F(x_{k})\| )\quad \mbox{as}\quad k\rightarrow \infty \eqno (7) \]

is proposed, which characterizes the quadratic convergence of the IN iterations.

In [ 8 ] it is shown that the IN, inexact perturbed and quasi-Newton methods are equivalent models. At present there are several strategies for choosing the forcing terms [ 9 , 10 , 11 ] .

2 The continuous analogy of Newton’s method (CANM)

One of the modifications of the Newton method is the well-known CANM, or damped Newton method [ 1 , 4 , 5 , 6 ]

\[ F'(x_{k})v_{k}=-F(x_{k}),~ ~ x_{k+1}=x_{k}+\tau _{k}v_{k},~ ~ k=0,1,\dots , \eqno (8) \]

where \(\tau _{k}{\gt}0\) is an iteration parameter. A suitable choice of this parameter allows one to speed up the convergence and to enlarge the convergence domain. If \(\tau _{k}\equiv 1\) for all \(k\), then the CANM reduces to Newton’s method. There exists a close relationship between the CANM and the IN iterates. Indeed, the CANM (8) can be considered as the IN iteration (3) with the residual

\[ r_{k}=F'(x_{k})\bar s_{k}+F(x_{k})=(1-\tau _{k})F(x_{k}). \eqno (9) \]

From here we get

\[ \| r_{k}\| =\eta _{k}\| F(x_{k})\| \eqno (10) \]

with

\[ \eta _{k}=|1-\tau _{k}|. \eqno (11) \]
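The identity (9)–(11) is easy to verify numerically. The following sketch (our illustration; the two-dimensional test system is an arbitrary assumption) performs one damped Newton step and checks that the resulting residual norm equals \(|1-\tau _{k}|\, \| F(x_{k})\| \).

```python
import numpy as np

F = lambda x: np.array([x[0]**2 - 2.0, x[0] * x[1] - 1.0])  # toy 2-D system
J = lambda x: np.array([[2.0 * x[0], 0.0], [x[1], x[0]]])   # its Jacobian

x, tau = np.array([1.5, 0.5]), 0.7
v = np.linalg.solve(J(x), -F(x))    # exact Newton direction, as in (8)
r = F(x) + J(x) @ (tau * v)         # residual of the damped step, eq. (9)
# (10)-(11): the two printed numbers coincide, ||r_k|| = |1 - tau_k| ||F(x_k)||
print(np.linalg.norm(r), abs(1.0 - tau) * np.linalg.norm(F(x)))
```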

We have the following local and semilocal convergence results:

Theorem 2.1

There exists \(\varepsilon {\gt}0\) such that for any initial approximation \(x_{0}\) with \(\| x_{0}-x^{*}\| \leq \varepsilon ,\) the sequence generated by (8) with parameter \(\tau _{k}\in (0,2)\) converges to \(x^{*}\).

Proof. As mentioned above, the CANM is equivalent to the IN iterates with residual (9), for which the residual satisfies inequality (4) with forcing term \(\eta _{k}\in [0,1)\) under the condition \(\tau _{k}\in (0,2)\). Then by Theorem 1.2 the sequence \(x_{k}\) generated by (3) or by (8) converges to \(x^{*}\). □

Let \(D_{0}\) be a convex set with \(\overline{D}_{0}\subseteq D\), and suppose that \((F'(x))^{-1}\) exists for all \(x\in D_{0}\).

Theorem 2.2

We assume that

  • (i) \(\| F''(x)\| \leq M,~ x\in D_{0}\),

  • (ii) \(\| F'(x_{0})^{-1}\| \leq \beta ,\)

  • (iii) \(\| F'(x_{0})^{-1}F(x_{0})\| \leq \eta ,~ a_{0}=M \beta \eta ,\)

  • (iv) \(M\| F'(x_{k})^{-1}\| \| F'(x_{k})^{-1}F(x_{k})\| \leq a_{k}{\lt}2,~ ~ k=0,1,\dots \)

and

\[ 0{\lt}\tau _{k}{\lt}\tfrac {-1+\sqrt{1+4a_{k}}}{a_{k}}. \eqno (12) \]

Then the sequence \(\{ x_{k}\} \) defined by (8) and starting at \(x_{0}\in D_{0}\) converges to a solution \(x^{*}\) of (1).

Proof. Since \(x_{k+1}-x_{k}=-\tau _{k}F'(x_{k})^{-1}F(x_{k})\) by (8), the Taylor expansion gives

\[ F(x_{k+1})=(1-\tau _{k})F(x_{k})+\tfrac {F''(\xi _{k})}{2}\tau ^{2}_{k}(F'(x_{k})^{-1} F(x_{k}))^{2}, \eqno (13) \]

where \(\xi _{k}=\theta x_{k}+(1-\theta )x_{k+1}\), \(\theta \in (0,1)\). If we use assumption (iv), then from (13) it follows that

\[ \| F(x_{k+1})\| {\lt}\left(|1-\tau _{k}|+\tfrac {a_{k}}{2}\tau ^{2}_{k} \right)\| F(x_{k})\| {\lt}\| F(x_{k})\| \]

under condition (12), since (12) together with \(a_{k}{\lt}2\) guarantees \(|1-\tau _{k}|+\tfrac {a_{k}}{2}\tau ^{2}_{k}{\lt}1\). Thus \(\{ \| F(x_{k})\| \} \) is a decreasing sequence provided the iteration parameter \(\tau _{k}\) is chosen in the interval \(\left( 0, \tfrac {-1+\sqrt{1+4a_{k}}}{a_{k}} \right)\). □

Remark 2.1

It is easy to show that

\[ 1{\lt} \tfrac {-1+\sqrt{1+4a_{k}}}{a_{k}}{\lt}2,~ ~ ~ \mbox{when}~ ~ 0{\lt}a_{k}{\lt}2. \]

This means that \(\tau _{k}\in (0,2).\) □
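A quick numerical check (our illustration) of the bound in Remark 2.1, sampling \(a_{k}\) across \((0,2)\):

```python
import math

for a in [0.05, 0.5, 1.0, 1.5, 1.95]:              # sample values in (0, 2)
    bound = (-1.0 + math.sqrt(1.0 + 4.0 * a)) / a  # upper limit in (12)
    print(f"a_k = {a:4.2f}:  tau upper bound = {bound:.4f}")  # always in (1, 2)
```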

Remark 2.2

Assume that the sequence \(x_{k}\) generated by CANM converges to \(x^{*}\). Then

  • \(\{ x_{k}\} \) converges to \(x^{*}\) superlinearly when \(\tau _{k}\rightarrow 1\),

  • \(\{ x_{k}\} \) converges to \(x^{*}\) quadratically if

\[ |1-\tau _{k}|=O(\| F(x_{k})\| ) ~ ~ {\rm or}~ ~ |1-\tau _{k}|=O(\| F(x_{k-1})\| ), \eqno (14) \]

because of Theorem 1.3. □

3 Some choices of iteration parameters

From (13) it follows that

\[ \tfrac {\| F(x_{k})\| -|1-\tau _{k-1}| \| F(x_{k-1})\| }{\| F(x_{k-1})\| }=O(\| F(x_{k-1})\| ) \]

and

\[ \tfrac {\| F(x_{k})\| -|1-\tau _{k-1}| \| F(x_{k-1})\| }{\| F(x_{k})\| }=O(\| F(x_{k-1})\| ). \]

Then by Remark 2.2, one can choose \(\tau _{k},\) such that

\[ |1-\tau _{k}|=\tfrac {\left|\| F(x_{k})\| -|1-\tau _{k-1}|\, \| F(x_{k-1})\| \right|}{\| F(x_{k-1})\| } \eqno (16a) \]

or

\[ |1-\tau _{k}|=\tfrac {|\| F(x_{k})\| -|1-\tau _{k-1}| \| F(x_{k-1})\| |}{\| F(x_{k})\| }, \eqno (16b) \]

which yields the quadratic convergence of the CANM. The formulas (16) can be rewritten in terms of the forcing term \(\eta _{k}\) as

\[ \eta _{k}=\tfrac {|\alpha _{k}\eta _{k-1}-1|}{\alpha _{k}} \eqno (17a) \]

and respectively

\[ \eta _{k}={|\alpha _{k}\eta _{k-1}-1|}, \eqno (17b) \]

where

\[ \alpha _{k}=\tfrac {\| F(x_{k-1})\| }{\| F(x_{k})\| }. \eqno (18) \]

Assume that

\[ \| F(x_{k})\| \leq \eta _{k-1}\| F(x_{k-1})\| ,~ ~ 0 \leq \eta _{k-1}{\lt}1 . \eqno (19) \]

Then \(\alpha _{k}{\gt}1\) and \(\alpha _{k}\eta _{k-1}\geq 1.\) As a consequence, the minimum of the two possible choices (17) is

\[ \eta _{k}=\tfrac {\alpha _{k}\eta _{k-1}-1}{\alpha _{k}}. \]

If the inequality (19) does not hold, then \(\alpha _{k}\eta _{k-1}{\lt}1\) and (17b) gives \(\eta _{k}=1-\eta _{k-1}\alpha _{k}.\) Thus we have

\[ \eta _{k}=\left\{ \begin{array}{ll} 1-\eta _{k-1}\alpha _{k},~ ~ \mbox{when}~ ~ \eta _{k-1}\alpha _{k}{\lt}1, \\ -\tfrac {1-\eta _{k-1}\alpha _{k}}{\alpha _{k}},~ ~ \mbox{when} ~ ~ \eta _{k-1}\alpha _{k} \geq 1. \end{array}\right. \eqno (20) \]

The second choice in (20) decreases \(\eta _{k}\), i.e., \(0{\lt}\eta _{k}{\lt}\eta _{k-1}\), while the first choice in (20) only guarantees \(0{\lt}\eta _{k}{\lt}1\); in both cases \(0{\lt}\eta _{k}{\lt}1.\) According to (4), the inequality (19) may hold, in which case the second choice in (20) keeps decreasing \(\eta _{k}\), i.e., \(\eta _{k}\rightarrow 0\) as \(k\rightarrow \infty \). In terms of \(\tau _{k}\) we have the following choice

\[ |1-\tau _{k}|=\eta _{k}. \eqno (21) \]

From this it follows that if \(0{\lt}\eta _{k}{\lt}1\) and \(\eta _{k}\rightarrow 0\) as \(k\rightarrow \infty \), then \(0{\lt}\tau _{k}{\lt}2\) and \(\tau _{k}\rightarrow 1\) as \(k\rightarrow \infty \). Thus, the choice of the iteration parameter given by (20) and (21) can be used in the CANM.
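As a minimal sketch (ours; the function name is hypothetical), strategy (20) reduces to a one-line update of the forcing term from two consecutive residual norms:

```python
def forcing_term_20(eta_prev, normF_prev, normF_curr):
    """Forcing-term update (20), with alpha_k = ||F(x_{k-1})|| / ||F(x_k)||."""
    alpha = normF_prev / normF_curr
    if eta_prev * alpha < 1.0:
        return 1.0 - eta_prev * alpha          # first case of (20)
    return (eta_prev * alpha - 1.0) / alpha    # second case of (20)
```

Note that (21) determines \(\tau _{k}\) only up to the sign of \(1-\tau _{k}\): both \(\tau _{k}=1-\eta _{k}\) and \(\tau _{k}=1+\eta _{k}\) satisfy it, and both lie in \((0,2)\) when \(0{\lt}\eta _{k}{\lt}1\).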

In [ 6 ] another choice for the CANM was proposed:

\[ \tau _k=\tfrac {2}{1+\sqrt{1+2b\| F(x_k)\| }}. \eqno (22) \]

According to (21), the formula (22) in terms of \(\eta _k\) reads as

\[ \eta _k=\tfrac {\sqrt{1+2b\| F(x_k)\| }-1}{\sqrt{1+2b\| F(x_k)\| }+1}, \eqno (23) \]

which can be used in the IN method as a forcing term. From (20) we see that the inexact Newton method with the forcing term given by (20) is locally Q-superlinearly convergent, while the IN method with the forcing term given by (23) converges quadratically because \(\eta _k=O(\| F(x_k)\| )\) (see Theorem 1.3).
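Strategy (23) is equally direct to implement; the sketch below (ours; the value \(b=0.1\) follows Table 2) also illustrates numerically that \(\eta _k\approx \tfrac {b}{2}\| F(x_k)\| \) for small residuals, which is the \(O(\| F(x_k)\| )\) behavior invoked above.

```python
import math

def forcing_term_23(normF, b=0.1):
    """Forcing term (23); behaves like (b/2) * ||F|| as ||F|| -> 0."""
    s = math.sqrt(1.0 + 2.0 * b * normF)
    return (s - 1.0) / (s + 1.0)

for nF in [1.0, 1e-3, 1e-6]:
    print(nF, forcing_term_23(nF), 0.05 * nF)  # compare with (b/2) * ||F||
```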

4 Numerical experiments

In this section we present numerical examples to demonstrate the efficiency of the new strategies for choosing the forcing terms, and we compare their numerical behavior with that of some known strategies.

We have used three test problems that are typical systems of nonlinear equations in the literature, each with its own name and standard initial guess, denoted \(x^s\). The problems are listed as follows [ 11 ] :

Problem 4.1 (Generalized function of Rosenbrock)

\[ \left\{ \begin{array}{ll} f_1(x)=-4c(x_2-x_1^2)x_1-2(1-x_1),\\ f_i(x)=2c(x_i-x_{i-1}^2)-4c(x_{i+1}-x_i^2)x_i-2(1-x_i),~ ~ i=2,3,\ldots ,n-1,\\ f_n(x)=2c(x_n-x_{n-1}^2), \end{array}\right. \]

with \(c=2\) and \(x^s=(1.2, 1.2,\ldots ,1.2)^T\).
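For reference, Problem 4.1 translates directly into code; the following NumPy sketch (our illustration) implements the residual exactly as written above and checks that \(e=(1,\ldots ,1)^T\) is a root.

```python
import numpy as np

def rosenbrock_F(x, c=2.0):
    """Generalized Rosenbrock residual from Problem 4.1."""
    n = len(x)
    f = np.empty(n)
    f[0] = -4.0 * c * (x[1] - x[0]**2) * x[0] - 2.0 * (1.0 - x[0])
    i = np.arange(1, n - 1)
    f[i] = (2.0 * c * (x[i] - x[i - 1]**2)
            - 4.0 * c * (x[i + 1] - x[i]**2) * x[i]
            - 2.0 * (1.0 - x[i]))
    f[n - 1] = 2.0 * c * (x[n - 1] - x[n - 2]**2)
    return f

x_s = np.full(100, 1.2)                            # standard guess, n = 100
print(np.linalg.norm(rosenbrock_F(np.ones(100))))  # e = (1,...,1) gives 0
```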

Problem 4.2 (Tridiagonal system)

\[ \left\{ \begin{array}{ll} f_1(x)=4(x_1-x_2^2),\\ f_i(x)=8x_i(x_i^2-x_{i-1})-2(1-x_i)+4(x_i-x_{i+1}^2),~ ~ i=2,3,\ldots ,n-1,\\ f_n(x)=8x_n(x_n^2-x_{n-1})-2(1-x_n), \end{array}\right. \]

with \(x^s=(12, 12,\ldots ,12)^T\).

Problem 4.3 (Five-diagonal system)

\[ \left\{ \begin{array}{l} f_1(x)=4(x_1-x_2^2)+x_2-x_3^2,\\ f_2(x)=8x_2(x_2^2-x_1)-2(1-x_2)+4(x_2-x_3^2)+x_3-x_4^2,\\ f_i(x)=8x_i(x_i^2-x_{i-1})-2(1-x_i)+4(x_i-x_{i+1}^2) \\ \qquad \qquad +x_{i-1}^2-x_{i-2}+x_{i+1}-x_{i+2}^2, \qquad \qquad i=3,\ldots ,n-2,\\ f_{n-1}(x)=8x_{n-1}(x_{n-1}^2-x_{n-2})-2(1-x_{n-1})+4(x_{n-1}-x_n^2) \\ \qquad \qquad +x_{n-2}^2-x_{n-3},\\ f_n(x)=8x_n(x_n^2-x_{n-1})-2(1-x_n)+x_{n-1}^2-x_{n-2}, \end{array}\right. \]

with \(x^s=(-2, -2,...,-2)^T\).

Table 1 Results for Problems 4.1–4.3.

                     GF of Rosenbrock             Tridiagonal system          Five-diagonal system
Choice          \(x^s\)  \(3x^s\)  \(-3x^s\)   \(x^s\)  \(2x^s\)  \(-2x^s\)   \(x^s\)  \(2x^s\)  \(-2x^s\)
EW1       NI        6      17       18         24      23       27         41      19       16
EW1       GI       60     165      122        193     203      251        386     121      125
EW1       CT     1.91    5.66     2.19       6.21    7.15     7.81      12.88    5.11     3.17
EW2       NI        6      16       35         31      40       43         19      47       17
EW2       GI       85     110      129        147     206      167         94     156       85
EW2       CT     2.47    3.91     3.81       5.85    7.21     6.25       3.66    5.21     2.21
APR       NI        9      11       21         12      20       46         15      17       16
APR       GI       81      93      116         88     128      216         97      98       95
APR       CT     2.66    3.13     4.94       3.25    4.35     8.28       4.03    2.58     2.02
Str (20)  NI       10      19       35         33      31       42         33      32       19
Str (20)  GI       81     105      140        141     128      145        128     116       88
Str (20)  CT     2.50    3.88     5.22       5.37    5.17     6.16       6.38    3.30     3.00
Str (23)  NI        6      14       31         19      28       30         25      17       12
Str (23)  GI       64      87      115        102     135      127        126      91       70
Str (23)  CT     1.94    3.19     4.65       3.69    4.96     3.34       5.78    2.11     1.91

We test the case \(n=100\). Besides the standard initial guess \(x^s\), we also test other initial guesses \(x^0=\pm jx^s\), \(j= 2, 3\). It is easy to see that \(e=(1, 1, \ldots ,1)^T\) is a solution to each of the above three problems. For convenience in reporting the results for the different forcing terms, the following notations are used in Table 1:

EW1 : the first strategy given by Eisenstat and Walker [ 10 ] ,

EW2 : the second strategy given by Eisenstat and Walker [ 10 ] ,

APR : the actual reduction and predicted reduction strategy given in [ 11 ] ,

Str (20) : strategy given by (20),

Str (23) : strategy given by (23),

NI : the total number of nonlinear iterations,

GI : the total number of GMRES iterations,

CT : \(10^2\times \)(CPU time).

The norm in our tests is the Euclidean norm \(\| \cdot \| _2\) and the stopping criterion is

\[ \| F(x^k)\| \leq \varepsilon =10^{-12}. \]

The initial forcing term is \(\eta _0=0.5\), and the additional parameters of \(\eta _k\) are chosen following [ 11 ] for all of the above strategies: EW1, EW2, APR, and strategies (20) and (23).
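Putting the pieces together, a minimal driver for one of these runs might look as follows (our illustration; inexact_newton_step, rosenbrock_F and forcing_term_23 are the hypothetical helpers sketched earlier, and a finite-difference Jacobian stands in for the analytic one used in the actual tests):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Simple forward-difference Jacobian approximation."""
    n = len(x)
    Fx = F(x)
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h
    return J

x, eta, k = np.full(100, 1.2), 0.5, 0          # x^s, eta_0 = 0.5, n = 100
while np.linalg.norm(rosenbrock_F(x)) > 1e-12:  # stopping criterion above
    x = inexact_newton_step(rosenbrock_F,
                            lambda y: fd_jacobian(rosenbrock_F, y), x, eta)
    eta = forcing_term_23(np.linalg.norm(rosenbrock_F(x)))  # strategy (23)
    k += 1
print(k, np.linalg.norm(rosenbrock_F(x)))
```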

The comparison of GMRES with the different forcing terms was made by CPU time. From Table 1 we see that GMRES with the forcing term given by (23) performs better than with the other forcing terms.

From Table 2 we see that the forcing term given by (23) decreases more rapidly than the forcing term APR, which is halved at each iteration. As a result, the residual norm \(\| F(x^k)\| \) in the first case decreases as rapidly as the forcing term.

Table 2 Iteration comparison between the APR and new forcing terms on the Rosenbrock system with initial guess \(x^s\), tolerance \(\varepsilon =10^{-14}\) and parameter \(b=0.1\) in strategy (23).

                         Str (23)                                        APR
 \(k\)   GI   \(\| F(x^k)\| \)          \(\eta _k\)               GI   \(\| F(x^k)\| \)           \(\eta _k\)
  0       1   \(17.5015\)               \(5.0000\cdot 10^{-1}\)    1   \(17.5015\)                \(5.0000\cdot 10^{-1}\)
  1       3   \(4.4680\)                \(1.5828\cdot 10^{-1}\)    2   \(4.4680\)                 \(2.5000\cdot 10^{-1}\)
  3       9   \(4.9646\cdot 10^{-1}\)   \(2.3662\cdot 10^{-2}\)    4   \(1.1195\)                 \(1.2500\cdot 10^{-1}\)
  4      11   \(1.0066\cdot 10^{-1}\)   \(4.9831\cdot 10^{-3}\)    8   \(1.0501\cdot 10^{-1}\)    \(6.250\cdot 10^{-2}\)
  5      18   \(5.4711\cdot 10^{-4}\)   \(2.7354\cdot 10^{-5}\)    8   \(7.589\cdot 10^{-2}\)     \(6.250\cdot 10^{-2}\)
  6      27   \(1.5473\cdot 10^{-7}\)   \(7.7363\cdot 10^{-9}\)    9   \(3.6494\cdot 10^{-2}\)    \(3.1250\cdot 10^{-2}\)
  7      24   \(1.4223\cdot 10^{-15}\)  \(5.2340\cdot 10^{-13}\)  11   \(1.0827\cdot 10^{-4}\)    \(1.5625\cdot 10^{-2}\)
  8       -   -                         -                         12   \(8.2737\cdot 10^{-7}\)    \(7.8125\cdot 10^{-3}\)
  9       -   -                         -                         13   \(2.7747\cdot 10^{-9}\)    \(3.9063\cdot 10^{-3}\)
 10       -   -                         -                         13   \(4.1806\cdot 10^{-12}\)   \(1.9531\cdot 10^{-3}\)
 11       -   -                         -                         13   \(1.1678\cdot 10^{-14}\)   \(9.7656\cdot 10^{-4}\)

Acknowledgement

O. Chuluunbaatar acknowledges financial support from the RFBR Grant No. 11-01-00523 and the theme 09-6-1060-2005/2013 “Mathematical support of experimental and theoretical studies conducted by JINR”.

This work was partially sponsored by Foundation for Science and Technology of Ministry of Education, Culture and Science (Mongolia).

Bibliography

1

J. Stoer and R. Bulirsch, Introduction to numerical analysis, Third edition, Springer, 2002.

2

E. Zeidler, Nonlinear functional analysis and its applications, part 1: Fixed-point theorems, Springer-Verlag, 1986.

3

L.V. Kantorovich, Functional analysis and applied mathematics, Uspekhi Mat. Nauk, 3:6(28), pp. 89–185, 1948, (in Russian).

4

M.K. Gavurin, Nonlinear functional equations and continuous analogues of iterative methods, Izv. Vyssh. Uchebn. Zaved. Mat., 5(6), pp. 18–31, 1958.

5

T. Zhanlav and I.V. Puzynin, The convergence of iterations based on the continuous analogy of Newton’s method, Comput. Math. Math. Phys., 32, pp. 846–856, 1992 (in Russian).

6

T. Zhanlav and O. Chuluunbaatar, A local and semilocal convergence of the continuous analogy of Newton’s method for solving nonlinear equations, Bulletin of PFUR Series Mathematics. Information Sciences. Physics, 1, pp. 34–43, 2012.

7

R.S. Dembo, S.C. Eisenstat and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19, pp. 400–408, 1982.

8

E. Cătinaş, The inexact, inexact perturbed, and quasi-Newton methods are equivalent models, Math. Comp., 74, pp. 291–301, 2004.

9

X.-C. Cai, W.D. Gropp, D.E. Keyes and M.D. Tidriri, Newton-Krylov-Schwarz methods in CFD, Proceedings of the International Workshop on Numerical Methods for the Navier-Stokes Equations, Vieweg, Braunschweig, pp. 17–30, 1995.

10

S.C. Eisenstat and H.F. Walker, Choosing the forcing term in an inexact Newton method, SIAM J. Sci. Comput., 17, pp. 16–32, 1996.

11

H.-B. An, Z.-Y. Mo and X.-P. Liu, A choice of forcing terms in inexact Newton method, J. Comput. Appl. Math., 200, pp. 47–60, 2007.