Enlarging the convergence ball
of Newton’s method on Lie groups
February 16, 2013.
\(^\ast \)Department of Mathematical Sciences, Cameron University, Lawton, OK 73505, USA, e-mail: ioannisa@cameron.edu.
\(^\S \)Poitiers University, Laboratoire de Mathématiques et Applications, Bd. Pierre et Marie Curie, Téléport 2, B.P. 30179, 86962 Futuroscope Chasseneuil Cedex, France, e-mail: said.hilout@math.univ-poitiers.fr.
Dedicated to Prof. I. Păvăloiu on the occasion of his 75th anniversary
We present a local convergence analysis of Newton’s method for approximating a zero of a mapping from a Lie group into its Lie algebra. Using more precise estimates than in earlier studies [55, 56], and under the same computational cost, we obtain a larger convergence ball and more precise error bounds on the distances involved. Some examples are presented to further validate the theoretical results.
MSC. 65G99, 65J15, 47H17, 49M15, 90C90.
Keywords. Newton’s method, Lie group, Lie algebra, Riemannian manifold, convergence ball, Kantorovich hypothesis.
1 Introduction
In this paper, we are concerned with the problem of approximating a zero \( x^\star \) of a \(\mathcal{C}^1\)-mapping \(F \, : \, G \longrightarrow Q\), where \(G\) is a Lie group and \(Q\) is the Lie algebra of \(G\), that is, the tangent space \(T_e \, G\) of \(G\) at the identity \(e\), equipped with the Lie bracket \([\cdot ,\cdot ] \, : \, Q\times Q \longrightarrow Q\) [7, 21, 28, 29, 53].
The study of numerical algorithms on manifolds for solving eigenvalue or optimization problems on Lie groups [1]–[12], [33]–[36], [54]–[57] is very important in Computational Mathematics. Newton-type methods are the most popular iterative procedures for solving equations involving differentiable operators. The convergence analysis of Newton’s method is usually of two types: semilocal and local. The semilocal analysis uses information around an initial point to give criteria ensuring the convergence of Newton’s method, while the local analysis uses information around a solution to estimate the radii of convergence balls. There is a plethora of studies on weakening and/or extending the hypotheses made on the underlying operators; see, for example, [7, 11, 16] and the references therein. Local as well as semilocal convergence results for Newton-type methods have been given by several authors under various conditions [2]–[58]. Recently, in [15], we presented a finer semilocal convergence analysis for Newton’s method than in earlier studies [7, 32, 34, 36, 54, 55, 56].
Newton’s method with initial point \(x_0 \in G\) was first introduced by Owren and Welfert [43] in the form
\[ x_{n+1} = x_n \cdot {\rm exp} \big( - dF_{x_n}^{-1} \, F(x_n) \big), \qquad n = 0, 1, 2, \ldots
\]
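For matrix Lie groups this iteration admits a direct numerical realization. The following minimal sketch is only illustrative and is not part of the analysis below: the finite-difference approximation of \(dF_{x_n}\), the use of a Frobenius-orthonormal basis of \(Q\) and all function names are our own choices.
\begin{verbatim}
import numpy as np
from scipy.linalg import expm


def newton_lie(F, x0, basis, tol=1e-12, max_iter=20, h=1e-7):
    # Newton's method x_{n+1} = x_n * exp(-dF_{x_n}^{-1} F(x_n)) on a matrix
    # Lie group; `basis` is a Frobenius-orthonormal basis of the Lie algebra Q
    # and dF_{x_n} is replaced by a finite-difference approximation.
    x = x0
    dim = len(basis)
    for _ in range(max_iter):
        Fx = F(x)                                        # an element of Q
        if np.linalg.norm(Fx) < tol:
            break
        b = np.array([np.sum(Fx * E) for E in basis])    # coordinates of F(x)
        A = np.zeros((dim, dim))
        for i, E in enumerate(basis):
            dFi = (F(x @ expm(h * E)) - Fx) / h          # approximates dF_x(E_i)
            A[:, i] = [np.sum(dFi * Ej) for Ej in basis]
        u = sum(c * E for c, E in zip(np.linalg.solve(A, -b), basis))
        x = x @ expm(u)                                  # update through exp
    return x
\end{verbatim}
For instance, one may take \(G = SO(3)\) and, for the basis, a Frobenius-orthonormal basis of the \(3 \times 3\) skew-symmetric matrices.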
Newton’s method is undoubtedly the most popular method for generating a sequence \(\{ x_n \} \) approximating \(x^\star \) [ 7 , 11 , 15 , 16 , 32 , 34 , 36 ] , [ 54 ] – [ 56 ] . In the present paper we establish a finer local convergence analysis with the advantages (\(\mathcal{A}_l\)):
Larger convergence ball;
and
Tighter error bounds on the distances involved.
The necessary background on Lie groups can be found in [ 7 , 11 , 15 , 16 ] and the references therein. The paper is organized as follows. In Section 2, we present the local convergence analysis of Newton’s method. Finally, numerical examples are given in the concluding Section 3.
2 Local convergence analysis for Newton’s method
We shall study the local convergence of Newton’s method. In the rest of the paper, \(\langle \cdot , \cdot \rangle \) denotes the inner product on \(Q\) and \(\parallel \cdot \parallel \) the corresponding norm. As in [56], we define a distance on \(G\) for \(x, y \in G\) as follows:
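A definition consistent with the usage below and, presumably, with [56], is
\[ m(x,y) := \inf \Big\{ \textstyle \sum \limits _{i=1}^{k} \parallel z_i \parallel \; : \; k \geq 1, \; z_1, \ldots , z_k \in Q, \; y = x \cdot {\rm exp} \, z_1 \cdots {\rm exp} \, z_k \Big\} .
\]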
By convention, \(\inf \, \emptyset = +\infty \). It is easy to see that \(m(\cdot , \cdot )\) is a distance on \(G\) and that the topology it induces is equivalent to the original one on \(G\). Let \(w \in G\) and \(r {\gt} 0\); we denote by
\[ U(w,r) = \{ \, y \in G \; : \; m(w,y) {\lt} r \, \}
\]
the open ball centered at \(w\) and of radius \(r\). Moreover, we denote the closure of \(U(w,r)\) by \(\overline{U} (w, r) \). Let also \(L(Q)\) denote the set of all linear operators on \(Q\).
We need the definition of Lipschitz continuity for a mapping.
Let \(M \, : \, G \longrightarrow L (Q)\), \(x^\star \in G\) and \(r{\gt}0\). We say that \(M\) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , r)\) if
holds for any \(u \in Q\) and \(x \in U (x^\star ,r)\) such that \(\parallel u \parallel + m (x , x^\star ) {\lt} r\) .
holds for any \(u\in Q\) and \(x \in U (x^\star ,r)\) such that \(\parallel u\parallel + m(x, x^\star ) {\lt} r\).
We then say \(M\) satisfies the \(L_{\star }\)-center Lipschitz condition at \(x^\star \in G\) on \(U(x^\star ,r)\). Note that if \(M\) satisfies the \(L\)-Lipschitz condition on \(U(x^\star ,r)\), then it also satisfies the \(L _{\star }\)-center Lipschitz condition at \(x^\star \in G\) on \(U (x^\star ,r)\). Clearly,
\(L_{\star } \leq L\) holds in general and \({L}/{L_{\star }}\) can be arbitrarily large [4]–[16].
Let us show that (2.3) indeed implies (2.4). If (2.3) holds, then for \(x=x^\star \) there exists \(K \in (0,L]\) such that
There exist \(k\geq 1\) and \(z_0, z_1, \cdots , z_k \in Q\) such that \(x=x^\star \cdot \exp \, z_0 \cdots \exp \, z_k \). Let \(y_0=x^\star \), \(y_{i+1} = y_i \cdot \exp \, z_i\), \(i=0,1, \cdots , k\). Then, we have that \(x=y_{k+1} \). We can write the identity
Using (2.3) and the triangle inequality, we obtain in turn that
which implies (2.4).
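In detail, the omitted identity and estimate are presumably
\[ M(x) - M(x^\star ) = \sum _{i=0}^{k} \big( M(y_{i+1}) - M(y_i) \big), \qquad \parallel M(x) - M(x^\star ) \parallel \leq L \sum _{i=1}^{k} \parallel z_i \parallel + K \, \parallel z_0 \parallel \leq L \sum _{i=0}^{k} \parallel z_i \parallel ;
\]
the same telescoping argument is displayed in the proof of Lemma 2.10 below.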
We need a Banach type lemma on invertible mappings.
Suppose that \(dF_{x^\star }^{-1}\) exists and let \(0 {\lt} r \leq \tfrac {1}{L_{\star }}\). Suppose \(dF_{x^\star }^{-1} \, dF\) satisfies the \(L_{\star }\)-center Lipschitz condition at \(x^\star \in G\) on \(U (x^\star ,r)\); for \(x \in U (x^\star ,r)\), there exist \(k \geq 1\) and \(z_0, z_1, \cdots , z_k \in Q\), such that \(x= x^\star \cdot {\rm exp} \, z_0 \cdots {\rm exp} \, z_k\) and \(\textstyle \sum \limits _{i=0}^{k} \parallel z_i \parallel {\lt} r\). Then, linear mapping \(dF_{x}^{-1}\) exists and
It follows from (2.7) and the Banach lemma on invertible operators [ 7 ] , [ 32 ] that \(dF_x^{-1}\) exists and (2.6) holds. That completes the proof of Lemma 2.2.
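In particular, the omitted estimate (2.6) can be expected to be of the same form as (2.20) below, namely
\[ \parallel dF_{x}^{-1} \, dF_{x^\star } \parallel \leq \bigg( 1 - L_{\star } \, \sum _{i=0}^{k} \parallel z_i \parallel \bigg)^{-1} .
\]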
Next, we study the convergence domain of Newton’s method around a zero \(x^\star \) of the mapping \(F\). First, from Lemma 2.3 through Corollary 2.9, we present the local results for Newton’s method when \(G\) is an Abelian group. Then, the corresponding local results follow when \(G\) is not necessarily an Abelian group.
Let \(G\) be an Abelian group and \(0{\lt} r \leq \tfrac {1}{L_{\star }}\). Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1}\) exists and let \( L_{\star }{\gt}0\); there exist \(j \geq 1\) and \(z_1, z_2, \cdots , z_j \in Q\) such that
\(dF_{x^\star }^{-1} \, dF\) satisfies the \(L_{\star }\)-center Lipschitz condition at \( x^\star \) on \(U (x^\star ,r )\). Then, linear mapping \(dF _{x_0} ^{-1}\) exists and
for all \(u \in Q\), \( x \in U (x^\star , r)\) such that \(\parallel u \parallel + m(x, x^\star ) {\lt} r\). Let \(z=z_1+z_2+ \cdots + z_j\). Then, since \(G\) is an Abelian group, \({\rm exp} \, z_1 \cdot {\rm exp} \, z_2 \cdots {\rm exp} \, z_j = {\rm exp} \, (z_1+z_2+ \cdots +z_j) = {\rm exp} \, z\), so, we can write \(x_0 = x^\star \cdot {\rm exp} \, z\). It then follows from Lemma 2.2 that \(dF_{x_0}^{-1}\) exists and
We also have the following identity
In view of (2.10) and (2.12), we get that
Moreover, by (2.11) and (2.13), we obtain that
That completes the proof of Lemma 2.3.
The proof of Lemma 2.3 reduces to [56, Lemma 3.2] if \(L_{\star } = L\). Otherwise (i.e., if \( L_{\star } {\lt} L\)), it constitutes an improvement. We have also included the proof, although it is similar to the corresponding one in [56], because it is not straightforward to see that \(L_{\star }\) can replace \(L\) in the derivation of the crucial estimate (2.9). Note also that (2.10) can hold for some \(L_{\star } ^1 \in (0, L_{\star } ]\). If \(L_{\star } ^1 {\lt} L_{\star } \), then, according to the proof of Lemma 2.3, \(L_{\star } ^1\) can replace \(L_{\star } \) in (2.9).
Then, it can easily be seen that
We have the following local convergence result for Newton’s method.
Let \(G\) be an Abelian group. Let \(0 {\lt} r \leq \displaystyle \tfrac {\alpha }{4 \, L}\), where \(\alpha \) is given in (2.14). Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Suppose there exists \(x^\star \in G\) such that \(F (x^\star )=0\), \(dF_{x^\star }^{-1}\) exists and \(dF_{x^\star }^{-1} \, dF\) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {3 \, r}{1 - L_{\star } \, r} )\). Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \in U (x^\star , r)\) is well defined, remains in \(U (x^\star , \displaystyle \tfrac {3 \, r}{1 - L_{\star } \, r} )\) for all \(n\geq 0\) and converges to a zero \(y^\star \) of mapping \(F\) such that
Set
We shall show that the mapping \(dF_{x_0}^{-1} \, dF\) satisfies the \(\overline{L} \)-Lipschitz condition on \(U (x_0 , \overline{r} )\). Indeed, let \(x \in U (x_0 , \overline{r} )\) and \(z \in Q\) be such that \(\parallel z \parallel + m (x_0 , x) {\lt} \overline{r}\). Then, we get that
Using Lemma 2.2, we have that
Set
Then, by (2.16), we obtain that
and
by the choice of \(r\) and \(\alpha \). By standard majorization techniques [ 7 ] , \(\{ x_k \} \) converges to some zero \(y^\star \) of mapping \(F\) and \(m(x_0 , y^\star ) \leq \overline{r_1}\). Furthermore, we obtain that
That completes the proof of Theorem 2.5.
The proofs of the remaining results in this section are omitted, since they can be obtained from the corresponding ones in [56].
Let \(G\) be an Abelian group. Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {1}{L_{\star } } )\). Let also \(x_0 \in U (x^\star , \displaystyle \tfrac {\alpha } {4\, L } )\). Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \) is well defined, remains in \(U (x^\star , \displaystyle \tfrac {\alpha } {4 \, L} )\) for all \(n\geq 0\) and converges to a zero \(y^\star \) of mapping \(F\) such that \( m( x^\star , y^\star ) {\lt} \displaystyle \tfrac {1}{L_{\star } } . \)
Let \(G\) be an Abelian group. Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {1}{L_{\star } } )\). Let also \(x_0 \in U (x^\star , \displaystyle \tfrac {1} {L_{\star } } )\). Let \(\sigma \) be the maximal number such that \(U (e, \sigma ) \subseteq {\rm exp} \, (B (0 , \displaystyle \tfrac {1}{L_{\star }} ))\). Set \( r = \min {\Big\{ } \displaystyle \tfrac {\sigma }{3 + L_{\star } \, \sigma }, \, \displaystyle \tfrac {\alpha }{4 \, L}{\Big\} } \) and \( N ( x^\star , r) = x^\star \, {\rm exp} (B (0, r)). \) Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \in N (x^\star ,r) \) converges to \(x^\star \).
Let \(G\) be a compact connected Abelian Lie group equipped with a bi-invariant Riemannian metric and let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\) and \(0 {\lt} r {\lt} \displaystyle \tfrac {\alpha }{4 \, L}\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {3 \, r}{1- L_{\star } \, r } )\). Let also \(x_0 \in U (x^\star , r )\). Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \), remains in \(\overline{U} (x^\star , \displaystyle \tfrac {3 \, r}{1- L_{\star } \, r } )\) for all \(n \geq 0\) and converges to \(x^\star \).
Let \(G\) be a compact connected Abelian Lie group equipped with a bi-invariant Riemannian metric and let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {1}{L_{\star } } )\). Let also \(x_0 \in U (x^\star , \displaystyle \tfrac {\alpha }{4 \, L} )\). Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \), remains in \(\overline{U} (x^\star , \displaystyle \tfrac {\alpha }{4 \, L } )\) for all \(n \geq 0\) and converges to \(x^\star \).
Let \(0{\lt} r \leq \displaystyle \tfrac {1}{L}\). Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1}\) exists; there exist \(j\geq 1\) and \(z_1, z_2, \cdots , z_j \in Q\) such that \( x_0 = x^\star \cdot {\rm exp} \, z_1 \cdots {\rm exp} \, z_j \) for \( \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel {\lt} r \) and \(dF_{x^\star }^{-1} \, dF\) satisfies the \(L\)-Lipschitz condition at \( x^\star \) on \(U (x^\star , r )\). Then, the following assertions hold
- \[ \parallel dF_{x^\star }^{-1} \, (dF_{x^\star \cdot {\rm exp} \, z }- dF_{x^\star } ) \parallel \leq K_{\star } \, \parallel z \parallel \]
for each \(z\in Q\) with \(\parallel z \parallel {\lt} r\) and some \(K_{\star } \in (0, L]\).
- \[ \parallel dF_{x^\star }^{-1} \, (dF_{x_0} - dF_{x^\star } ) \parallel \leq L_{\star } \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z _i \parallel \quad \text{for some} \quad L_\star \in (0, L]. \]
Linear mapping \(dF_{x_0}^{-1}\) exists and
\[ \parallel dF _{x_0} ^{-1} \, F(x_0) \parallel \leq \displaystyle \tfrac {\bigg(2 + L \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel \bigg) \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel }{2 \, \bigg(1- L_{\star } \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel \bigg)} . \]
This assertion follows from the hypothesis that \(dF_{x^\star }^{-1} \, dF\) satisfies the \(L\)-Lipschitz condition at \( x^\star \) on \(U (x^\star , r )\).
Let \(w_0= x^\star \), \(w_{i+1} = w_i \cdot {\rm exp} \, z_{i+1}\), \(i=0,1, \cdots , j-1\). Then, we have \(w_j = w_{j-1} \cdot {\rm exp} \, z_j= x_0\). Using the \(L\)-Lipschitz condition, we get in turn that
\[ \begin{array}{l} \parallel dF_{x^\star } ^{-1} \, (dF_{w_j} - dF_{x^\star } ) \parallel \\ \leq \parallel dF_{x^\star } ^{-1} \, (dF_{w_{j-1} \cdot {\rm exp} \, z_j} - dF_{w_{j-1}} ) \parallel + \cdots + \parallel dF_{x^\star } ^{-1} \, (dF_{w_{1} \cdot {\rm exp} \, z_2} - dF_{w_{1}} ) \parallel + \\ \parallel dF_{x^\star } ^{-1} \, (dF_{x^\star \cdot {\rm exp} \, z_1} - dF_{x^\star } ) \parallel \\ \leq L (\parallel z_j \parallel + \cdots + \parallel z_2 \parallel ) + K_{\star } \, \parallel z_1 \parallel \\ = L \, (\parallel z_j \parallel + \cdots + \parallel z_1 \parallel ) + (K_{\star } - L) \, \parallel z_1 \parallel \leq L \, (\parallel z_j \parallel + \cdots + \parallel z_1 \parallel ) \end{array} \]which shows (ii).
We have by (ii) that
\[ \parallel dF_{x^\star } ^{-1} \, (dF_{x_0} - dF_{x^\star } ) \parallel \leq L _{\star }\, (\parallel z_j \parallel + \cdots + \parallel z_1 \parallel ) {\lt} L_{\star } \, r \leq L\, r \leq 1 . \]
Hence, \(dF_{x_0}^{-1}\) exists and
\begin{equation} \label{3.31} \parallel dF _{x_0} ^{-1} \, dF_{x^\star } \parallel \leq \bigg(1- L_{\star } \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel \bigg) ^{-1} . \tag{2.20}\end{equation}
We have that
\[ \begin{array}{l} dF _{x^\star } ^{-1} \, (F(w_i) - F(w_{i-1})) = dF _{x^\star } ^{-1} \, \displaystyle \int _0 ^1 dF _{w_{i-1} \cdot {\rm exp} \, (t\, z_i)} \, z_i \, dt \\ = \displaystyle \int _0 ^1 dF _{x^\star } ^{-1} \, (dF _{w_{i-1} \cdot {\rm exp} \, (t\, z_i) } - dF_{x^\star } ) \, z _i \, dt + z_i. \end{array} \]Hence, we get that
\[ \parallel dF _{x^\star } ^{-1} \, (dF _{w_{k-1} \cdot {\rm exp} \, (z_k) } - dF_{w_{k-1} } ) \parallel \leq L\, \parallel z_k \parallel \quad {\rm for \, \, each} \quad k=1,2, \cdots , j . \]
Therefore, we obtain that
\[ \begin{array}{l} \| dF _{x^\star } ^{-1} \, (dF _{w_{i-1} \cdot {\rm exp} \, (t\, z_i) } - dF_{x^\star } ) \| \\ \leq \displaystyle \textstyle \sum \limits _{k=1}^{i-1} \| dF _{x^\star } ^{-1} (dF _{w_{k-1} \cdot {\rm exp} ( z_k) } - dF_{ w_{k-1} } ) \| + \| dF _{x^\star } ^{-1} (dF _{w_{i-1} \cdot {\rm exp} (t \, z_i) } - dF_{ w_{i-1} } ) \| \\ \leq L \displaystyle \textstyle \sum \limits _{k=1}^{i-1} \| z_k \| + L \, t \, \| z_i \| . \end{array} \]
We have that \(F(x_0) =\displaystyle \textstyle \sum \limits _{i=1}^{j} (F(w_i) - F(w_{i-1} )) \), since \(F(w_0) = F(x^\star ) = 0\). That is, we can get
\begin{equation} \label{3.32} \begin{array}{l} \parallel dF _{x^\star } ^{-1} \, F(x_0) \parallel \\ \leq \displaystyle \textstyle \sum \limits _{i=1}^{j} {\bigg(} \displaystyle \int _0 ^1 \parallel dF _{x^\star } ^{-1} \, (dF _{w_{i-1} \cdot {\rm exp} \, (t\, z_i) } - dF_{ x^\star } ) \parallel \, \parallel z_i \parallel \, dt + \parallel z_i \parallel {\bigg)} \\ \leq \displaystyle \textstyle \sum \limits _{i=1}^{j} {\bigg(} \displaystyle \int _0 ^1 L\, {\bigg(} \displaystyle \textstyle \sum \limits _{k=1}^{i-1} \parallel z_k \parallel + t\, \parallel z_i \parallel {\bigg)} \, \parallel z_i \parallel \, dt + \parallel z_i \parallel {\bigg)} \\ \leq {\bigg(} \displaystyle \tfrac {L}{2} \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel +1 {\bigg)} \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel . \end{array} \tag{2.21}\end{equation}
The result now follows from (2.20), (2.21) and
\[ \parallel dF _{x_0} ^{-1} \, F (x_0) \parallel \leq \parallel dF _{x_0} ^{-1} \, dF _{x^\star } \parallel \, \parallel dF _{x^\star }^{-1} \, F (x_0) \parallel \leq \displaystyle \tfrac {\bigg(2 + L \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel \bigg) \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel }{2 \, \bigg(1- L_{\star } \, \displaystyle \textstyle \sum \limits _{i=1}^{j} \parallel z_i \parallel \bigg)} . \]
The proof of Lemma 2.10 is complete.
Let us define parameter \(\delta \) by
We have that
Let \(0 {\lt} r \leq \displaystyle \tfrac {\delta }{4 \, L}\), where \(\delta \) is given in (2.22). Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Suppose there exists \(x^\star \in G\) such that \(F (x^\star )=0\), \(dF_{x^\star }^{-1}\) exists and \(dF_{x^\star }^{-1} \, dF\) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , R= \displaystyle \tfrac {3 + (L- L_{\star } ) \, r}{1 - L_{\star } \, r} \, r )\). Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \in U (x^\star , r)\) is well defined, remains in \(U (x^\star ,R )\) for all \(n\geq 0\) and converges to a zero \(y^\star \) of mapping \(F\) such that \(m( x^\star , y^\star ) {\lt} R\).
and \(\eta \) satisfies the new condition
The proof of Theorem 2.11 is complete.
Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {1}{L_{\star } } )\). Let also \(x_0 \in U (x^\star , \displaystyle \tfrac {\delta } {4\, L } )\). Then, the conclusions of Corollary 2.6 hold.
Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {1}{L_{\star } } )\). Let also \(x_0 \in U (x^\star , \displaystyle \tfrac {1} {L_{\star } } )\). Let \(\sigma \) and \(N(x^\star , r)\) be as in Corollary 2.7. Set \( r = \min {\Big\{ } r_0 , \, r_1 , \, \displaystyle \tfrac {\delta }{4 \, L} {\Big\} } \), where
and
Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \in N (x^\star ,r) \) converges to \(x^\star \).
Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping, where \(G\) is a compact connected Lie group equipped with a bi-invariant Riemannian metric. Let \(x^\star \in G\) be a zero of mapping \(F\) and \(0 {\lt} r {\lt} \displaystyle \tfrac {\delta }{4 \, L}\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , R )\), where \(R \) is defined in Theorem 2.11. Let also \(x_0 \in U (x^\star , r )\). Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \), remains in \(\overline{U} (x^\star , R )\) for all \(n \geq 0\) and converges to \(x^\star \).
Let \(F \, : \, G \longrightarrow Q\) be a \(\mathcal{C}^1 \)-mapping, where \(G\) is a compact connected Lie group equipped with a bi-invariant Riemannian metric. Let \(x^\star \in G\) be a zero of mapping \(F\). Suppose \(dF_{x^\star }^{-1} \) exists and \(dF_{x^\star }^{-1} \, dF \) satisfies the \(L\)-Lipschitz condition on \(U (x^\star , \displaystyle \tfrac {1}{L_{\star } } )\). Let also \(x_0 \in U (x^\star , \displaystyle \tfrac {\delta }{4 \, L} )\). Then, sequence \(\{ x _n\} \) (\(n \geq 0\)) generated by Newton’s method starting at \(x_0 \), remains in \(\overline{U} (x^\star , \displaystyle \tfrac {\delta }{4 \, L } )\) for all \(n \geq 0\) and converges to \(x^\star \).
(a) The local results reduce to the corresponding ones in [ 56 ] if \(L = L_{\star }\). Otherwise (i.e., if \(L_{\star } {\lt} L\)), they constitute an improvement under the same computational cost with advantages as already stated in the Introduction of this study. Note also that \(\alpha {\gt}1\), \(\delta {\gt}1\) if \(L_{\star } {\lt}L\) and \(\alpha \longrightarrow \infty \), \(\delta \longrightarrow \infty \) if \(\displaystyle \tfrac {L}{L_{\star }}\longrightarrow \infty \).
(b) The local results in the case where \(G\) is an Abelian group are obtained under weaker hypotheses and provide tighter estimates than the ones obtained when \(G\) is not necessarily an Abelian group. We have, for example, that \(\overline{r} {\lt} \overline{\overline{r}} \), \(\delta {\lt} \alpha \), \(\displaystyle \tfrac {3 \, r}{1- L_{\star } \, r} {\lt} R\) and the upper bound on \(\eta \) is smaller if \(L_{\star } {\lt} L\).
(c) It is obvious that finer results can immediately be obtained if conditions similar to those of the semilocal case (see [15, Section 1], with \(L_{\star }\) instead of \(L_0\)) are used instead of the Kantorovich condition for \(L_{\star } {\lt} L\). However, we decided to leave this part of the analysis to the motivated reader. We refer the reader to [14] for such results involving nonlinear equations in a Banach space setting. We also refer the reader to [7, 11, 15, 16] for examples.
3 Numerical examples
In this Section we present two numerical examples in the more general setting of a nonlinear equation on a Hilbert space where \(L_\star {\lt} L\).
Let \(\mathcal X= \mathcal Y= \mathbb {R}\), \(x^\star =0\). Define \(F\) by \(F(x)=-d_2 \sin 1 + d_1 \, x + d_2 \, \sin {\rm e}^{d_3 \, x} \), where \(d_1\), \(d_2\), \(d_3\) are given real numbers. We have that \(F(x^\star ) =0\). Moreover, if \(d_3\) is sufficiently large and \(d_2\) sufficiently small, \(L/ L_\star \) can be arbitrarily large.
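A direct computation gives
\[ F'(x) = d_1 + d_2 \, d_3 \, {\rm e}^{d_3 \, x} \, \cos \big( {\rm e}^{d_3 \, x} \big) ,
\]
so the variation of \(F'\) away from \(x^\star = 0\) is driven by the factor \(d_2 \, d_3 \, {\rm e}^{d_3 \, x}\). The following short script is only illustrative: the parameter values and the starting point are our own choices, not taken from the paper.
\begin{verbatim}
import math

d1, d2, d3 = 1.0, 0.1, 5.0    # illustrative values (assumptions)
F  = lambda x: -d2*math.sin(1.0) + d1*x + d2*math.sin(math.exp(d3*x))
dF = lambda x: d1 + d2*d3*math.exp(d3*x)*math.cos(math.exp(d3*x))

print(F(0.0))                 # the zero x* = 0
x = 0.1                       # starting point near x*
for n in range(6):
    x = x - F(x)/dF(x)        # classical Newton step
    print(n, x)
\end{verbatim}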
Let \(\mathcal X= \mathcal Y= \mathbb {R} ^3\), \(\mathcal D= \overline{U} (0,1)\) and \(x^\star = (0,0,0)\). Define function \(F\) on \(\mathcal D\) for \(w= (x,y,z)\) by
Then, the Fréchet derivative of \(F\) is given by
Notice that we have \(F(x^\star ) = 0\), \(F'( x^\star ) = F ' (x^\star ) ^{-1} = {\rm diag } \, \{ 1 , 1 , 1\} \) and \(L_{\star } = {\rm e}-1 {\lt} L= {\rm e}\).
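For the reader’s convenience, a choice of \(F\) consistent with all of the stated values is, presumably,
\[ F(w) = \Big( {\rm e}^{x} - 1, \; \tfrac {{\rm e}-1}{2} \, y^2 + y, \; z \Big)^{T}, \qquad F'(w) = \left( \begin{array}{ccc} {\rm e}^{x} & 0 & 0 \\ 0 & ({\rm e}-1)\, y + 1 & 0 \\ 0 & 0 & 1 \end{array} \right),
\]
which indeed satisfies \(F(x^\star ) = 0\), \(F'(x^\star ) = {\rm diag} \, \{ 1, 1, 1 \}\) and, on \(\mathcal{D} = \overline{U}(0,1)\), yields the values \(L_{\star } = {\rm e}-1\) and \(L = {\rm e}\) stated above.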
Bibliography
- 1
- 2
- 3
- 4
- 5
Argyros, I.K., On the local convergence of Newton’s method on Lie groups, Pan Amer. Math. J., 17 (2007), pp. 101–109.
- 6
Argyros, I.K., A Kantorovich analysis of Newton’s method on Lie groups, J. Concrete Appl. Anal., 6 (2008), pp. 21–32.
- 7
- 8
- 9
Argyros, I.K., Newton’s method on Lie groups, J. Appl. Math. Comput., 31 (2009), pp. 217–228.
- 10
- 11
Argyros, I.K., Cho, Y.J. and Hilout, S., Numerical methods for equations and its applications, CRC Press/Taylor and Francis Group, New-York, 2012.
- 12
- 13
- 14
- 15
Argyros, I.K. and Hilout, S., Extending the applicability of Newton’s method on Lie groups, Appl. Math. Comput., to appear.
- 16
Argyros, I.K. and Hilout, S., Numerical methods in nonlinear analysis, World Scientific Publ., ISBN-13: 978-9814405829, to appear.
- 17
- 18
Brockett, R.W., Differential geometry and the design of gradient algorithms. Differential geometry: partial differential equations on manifolds (Los Angeles, CA, 1990), pp. 69–92, Proc. Sympos. Pure Math., 54, Part 1, Amer. Math. Soc., Providence, RI, 1993.
- 19
- 20
- 21
Do Carmo, M.P., Riemannian Geometry, Boston, Birkhauser, 1992.
- 22
- 23
- 24
Ezquerro, J.A. and Hernández, M.A., On an application of Newton’s method to nonlinear operators with \(\omega \)-conditioned second derivative, BIT, 42 (2002), pp. 519–530.
- 25
- 26
- 27
- 28
Hall, B.C., Lie groups, Lie algebras and representations. An elementary introduction, Graduate Texts in Mathematics, 222, Springer-Verlag, New York, 2003.
- 29
Helgason, S., Differential geometry, Lie groups and symmetric spaces, Pure and Applied Mathematics, 80, Academic Press, Inc. Harcourt Brace Jovanovich, Publishers, New York-London, 1978.
- 30
- 31
Helmke, U. and Moore, J.B., Optimization and dynamical systems. With a foreword by R. Brockett, Communications and Control Engineering Series. Springer-Verlag London, Ltd., London, 1994.
- 32
Kantorovich, L.V. and Akilov, G.P., Functional analysis in normed spaces, Pergamon Press, New York, 1982.
- 33
- 34
- 35
- 36
- 37
- 38
Luenberger, D.G., The gradient projection method along geodesics, Management Sci., 18 (1972), pp. 620–631.
- 39
Mahony, R.E., Optimization algorithms on homogeneous spaces: with applications in linear systems theory, Ph.D. Thesis, Department of Systems Engineering, University of Canberra, Australia, 1994.
- 40
- 41
Mahony, R.E., Helmke, U. and Moore, J.B., Pole placement algorithms for symmetric realisations, Proceedings of 32nd IEEE Conference on Decision and Control, San Antonio, TX, 1993, pp. 1355–1358.
- 42
- 43
Owren, B. and Welfert, B., The Newton iteration on Lie groups, BIT, 40 (2000), pp. 121–145.
- 44
- 45
Rheinboldt, W.C., An adaptive continuation process for solving system of nonlinear equations, Banach Center Publ., 3 (1977), pp. 129–142.
- 46
- 47
Smale, S., Newton’s method estimates from data at a point, The merging of disciplines: New directions in Pure, Applied and Computational Mathematics, R. Ewing, K. Gross and C. Martin eds, New-York, Springer Verlag Publ., (1986), pp. 185–196.
- 48
- 49
Smith, S.T., Geometric optimization methods for adaptive filtering, Thesis (Ph.D.)–Harvard University, ProQuest LLC, Ann Arbor, MI, 1993.
- 50
Smith, S.T., Optimization techniques on Riemannian manifolds, Hamiltonian and gradient flows, algorithms and control, Fields Inst. Commun., 3, Amer. Math. Soc., Providence, RI, 1994, pp. 113–136.
- 51
- 52
Udrişte, C., Convex functions and optimization methods on Riemannian manifolds, Mathematics and its Applications, 297, Kluwer Academic Publishers Group, Dordrecht, 1994.
- 53
Varadarajan, V.S., Lie groups, Lie algebras and their representations, Reprint of the 1974 edition. Graduate Texts in Mathematics, 102, Springer-Verlag, New York, 1984.
- 54
- 55
- 56
- 57
- 58