Local convergence of a two-step Gauss-Newton Werner-type method for solving least squares problems

Ioannis K. Argyros1, Santhosh George2
(Received: October 19, 2018; accepted: July 9, 2024; published online: July 11, 2024.)
Abstract.

The aim of this paper is to extend the applicability of a two-step Gauss-Newton-Werner-type method (TGNWTM) for solving nonlinear least squares problems. The radius of convergence, error bounds and the information on the location of the solution are improved under the same information as in earlier studies. Numerical examples further validate the theoretical results.

Key words and phrases:
Gauss-Newton method, Werner’s method, local convergence, least squares problem, average Lipschitz condition.
2005 Mathematics Subject Classification:
65G99, 65J15, 65H10, 65N35, 47H17, 49M15.
1Department of Mathematical Sciences, Cameron University, Lawton, OK 73505, USA, e-mail: iargyros@cameron.edu.
2Department of Mathematical and Computational Sciences, National Institute of Technology Karnataka, India-575 025, e-mail: sgeorge@nitk.edu.in.

1. Introduction

Let i, j be natural numbers with i ≥ j. Let also Ω be an open and convex subset of ℝ^j. We are concerned with the solution p of the least squares problem [4, 5, 6, 7, 8, 9]:

(1.1) min_{x∈Ω} f(x) := (1/2)F(x)ᵀF(x),

where F : Ω → ℝ^i is a Fréchet-differentiable mapping. Numerous problems can be brought into the form (1.1) using mathematical modeling [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. Closed-form solutions can be found only in special cases, which explains why most solution methods for problem (1.1) are iterative. Let x₀, y₀ ∈ Ω and set z₀ = (x₀ + y₀)/2. In the present study, we provide the local convergence analysis of TGNWTM, defined for each n = 0, 1, 2, … by

(1.2)
xₙ₊₁ = xₙ − AₙF(xₙ),
yₙ₊₁ = xₙ₊₁ − AₙF(xₙ₊₁),
zₙ = (xₙ + yₙ)/2,

where Aₙ = [F′(zₙ)ᵀF′(zₙ)]⁻¹F′(zₙ)ᵀ. If i = j, TGNWTM reduces to a Gauss-Newton-Werner-type method [3, 8, 9]. Notice that in each iteration the matrix F′(zₙ)ᵀF′(zₙ) has to be inverted only once. Therefore, the computational cost is essentially the same as in the Gauss-Newton method. The LLᵀ decomposition of F′(zₙ)ᵀF′(zₙ) costs O(n³) floating-point operations (flops), leading to the computation of xₙ₊₁. It then follows from the second substep of method (1.2) that only O(n²) additional flops are needed for the computation of yₙ₊₁.
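For readers who want to experiment, one iteration of (1.2) can be sketched numerically as follows. This is a minimal Python sketch: the residual F, its Jacobian dF and the test problem are illustrative assumptions, not taken from the paper. The Cholesky factor of F′(zₙ)ᵀF′(zₙ) is computed once per iteration and reused for both substeps, reflecting the O(n³)/O(n²) cost split noted above.

```python
import numpy as np

def tgnwtm_step(F, dF, x, y):
    """One iteration of the two-step Gauss-Newton-Werner-type method (1.2)."""
    z = (x + y) / 2.0                      # z_n = (x_n + y_n)/2
    J = dF(z)                              # Jacobian F'(z_n), shape (i, j)
    C = np.linalg.cholesky(J.T @ J)        # LL^T factorization, done once
    solve = lambda r: np.linalg.solve(C.T, np.linalg.solve(C, r))
    x_new = x - solve(J.T @ F(x))          # first substep: x_{n+1}
    y_new = x_new - solve(J.T @ F(x_new))  # second substep reuses the factor
    return x_new, y_new

# Illustrative overdetermined residual (i = 3 > j = 2), an assumption for
# demonstration only; its least squares solution is p = (0, 0).
F = lambda v: np.array([v[0], v[1], v[0]**2 + v[1]**2])
dF = lambda v: np.array([[1.0, 0.0], [0.0, 1.0], [2*v[0], 2*v[1]]])

x, y = np.array([0.3, -0.2]), np.array([0.25, -0.15])
for _ in range(6):
    x, y = tgnwtm_step(F, dF, x, y)
```

Starting from the (assumed) initial pair above, both iterates converge rapidly to the solution p = (0, 0).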

The local convergence analysis of method (1.2) was given in the elegant paper by Shakhno et al. [9] (see also related work in [3, 8]). Their convergence analysis uses an average Lipschitz continuity condition as well as classical Lipschitz conditions.

Using the concept of average Lipschitz continuity [12] and our new idea of restricted convergence domains, we present a local convergence analysis with the following advantages (A) over works using the same information [3, 4, 8, 9, 11, 12, 13]:

(a) a larger radius of convergence;

(b) tighter error bounds on the distances ‖xₙ − p‖;

(c) at least as precise information on the location of the solution p.

Achieving (a)-(c) is very important in computational sciences, since: (a) We obtain a wider choice of initial guesses; (b) Fewer iterates are required to obtain a desired error tolerance; (c) Better information about the ball of convergence is obtained.

The rest of the paper is structured as follows: Section 2 contains the local convergence analysis of method (1.2), whereas special cases and applications are presented in the concluding Section 3.

2. Local convergence analysis

Set U(w,ρ) = {v ∈ ℝ^j : ‖v − w‖ < ρ} to be the open ball in ℝ^j and denote by U¯(w,ρ) its closure. Let R > 0. Define the parameter R₁ by R₁ := sup{t ∈ [0,R] : U¯(p,t) ⊆ Ω}. The convergence analysis of numerous iterative methods has been given using the following concept due to Wang [12]:

Definition 1.

The mapping F : U¯(p,R₁) → ℝ^i has a Fréchet derivative F′ satisfying the Lipschitz condition with L₁ average on U¯(p,R₁) if

‖F′(x) − F′(y)‖ ≤ ∫₀^{‖x−y‖} L₁(u) du for each x, y ∈ U¯(p,R₁),

where L1 is a positive non-decreasing function.

It turns out that the convergence analysis of iterative methods based on the preceding notion can be improved as follows:

Definition 2.

The mapping F : U¯(p,R₁) → ℝ^i has a Fréchet derivative F′ satisfying the center-Lipschitz condition with L₀ average on U¯(p,R₁) if

‖F′(x) − F′(p)‖ ≤ ∫₀^{‖x−p‖} L₀(u) du for each x ∈ U¯(p,R₁),

where L0 is a positive non-decreasing function.

Clearly, we have that

(2.1) L₀(u) ≤ L₁(u) for each u ∈ [0,R₁],

and L₀/L₁ can be arbitrarily small [2, 3, 4]. Let β > 0 be a parameter. Suppose that the equation

(2.2) β∫₀^{t} L₀(u) du = 1

has positive solutions. Denote by R₀ the smallest such solution. Notice, for example, that R₀ exists if

(2.3) β∫₀^{R₁} L₀(u) du ≥ 1.

Indeed, the function g(t) := β∫₀^{t} L₀(u) du − 1 satisfies g(0) = −1 < 0 and g(R₁) = β∫₀^{R₁} L₀(u) du − 1 ≥ 0. The existence of R₀ follows from the intermediate value theorem.
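The intermediate value argument above can be mimicked numerically by bisection on g. In the sketch below the choice L₀(u) = eᵘ with β = 1 is an illustrative assumption, for which the integral has the closed form eᵗ − 1 and (2.2) can be solved exactly: R₀ = ln 2.

```python
import math

# g(t) = beta * Integral_0^t L0(u) du - 1 for the assumed choice
# L0(u) = e^u, beta = 1, so g(t) = e^t - 1 - 1 and R0 = ln 2 exactly.
beta, R1 = 1.0, 1.0
g = lambda t: beta * (math.exp(t) - 1.0) - 1.0

lo, hi = 0.0, R1
assert g(lo) < 0 < g(hi)            # hypotheses of the IVT argument
for _ in range(60):                 # bisect the sign change down to ~1e-18
    mid = (lo + hi) / 2.0
    lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
R0 = (lo + hi) / 2.0                # smallest positive zero of g
```

The computed R0 agrees with the closed-form value ln 2 ≈ 0.6931 for this sample integrand.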

Definition 3.

The mapping F : U¯(p,R₁) → ℝ^i has a Fréchet derivative F′ satisfying the restricted Lipschitz condition with L average on U¯(p,R₀) if

‖F′(x) − F′(y)‖ ≤ ∫₀^{‖x−y‖} L(u) du for each x, y ∈ U¯(p,R₀),

where L is a positive non-decreasing function.

We have that

(2.4) L(u) ≤ L₁(u) for each u ∈ [0,R₀],

since R₀ ≤ R₁. Throughout this paper, we suppose that

(2.5) L₀(u) ≤ L(u) for each u ∈ [0,R₀],

unless otherwise stated. Otherwise, i.e., if

(2.6) L(u) ≤ L₀(u) for each u ∈ [0,R₀],

then the results that follow hold with L₀ replacing L. Moreover, we need the following definitions:

Definition 4 ([12]).

Let F : U¯(p,R₁) → ℝ^i be a twice Fréchet-differentiable mapping. We say that F″ satisfies the Lipschitz condition with M₁ average on U¯(p,R₁) if

‖F″(x) − F″(y)‖ ≤ ∫₀^{‖x−y‖} M₁(u) du for each x, y ∈ U¯(p,R₁),

where M1 is a positive nondecreasing function.

Definition 5.

Let F : U¯(p,R₀) → ℝ^i be a twice Fréchet-differentiable mapping. We say that F″ satisfies the restricted Lipschitz condition with M average on U¯(p,R₀) if

‖F″(x) − F″(y)‖ ≤ ∫₀^{‖x−y‖} M(u) du for each x, y ∈ U¯(p,R₀),

where M is a positive nondecreasing function.

We have that

(2.7) M(u) ≤ M₁(u) for each u ∈ [0,R₀].

It is worth noticing that the definition of the functions L and M (based on L₀ and R₀) was not possible in earlier studies using L₁ and M₁. That is, L = L(L₀,R₀,R₁) and M = M(L₀,R₀,R₁), whereas L₁ = L₁(R₁) and M₁ = M₁(R₁). It turns out that L₀ can replace the less precise L₁ in the computation of the upper bounds on the norms of the inverses of the operators involved, and U¯(p,R₀), L, M can replace U¯(p,R₁), L₁, M₁, respectively, in the proofs of such results. Moreover, notice that the iterates xₙ lie in U¯(p,R₀), which is a more precise location than the set U¯(p,R₁) used in earlier studies [2, 3, 4, 8, 11, 12, 13]. We shall make the paper as self-contained as possible by stating some standard auxiliary concepts and results.

Denote by ℝ^{i×j} the set of all i×j matrices. The Moore-Penrose pseudo-inverse of a full rank matrix A ∈ ℝ^{i×j} is defined by A⁺ = (AᵀA)⁻¹Aᵀ [6].
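As a quick sanity check of this formula, one can compare the normal-equations expression with the SVD-based pseudo-inverse on a full-rank matrix; the matrix below is an arbitrary illustrative choice, not taken from the paper.

```python
import numpy as np

# For a full-rank (rank = j) matrix A in R^{i x j}, the Moore-Penrose
# pseudo-inverse reduces to the normal-equations formula (A^T A)^{-1} A^T.
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])   # 3x2, rank 2
A_plus = np.linalg.inv(A.T @ A) @ A.T

ok_pinv = np.allclose(A_plus, np.linalg.pinv(A))      # agrees with SVD-based pinv
ok_left = np.allclose(A_plus @ A, np.eye(2))          # A^+ is a left inverse
```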

Lemma 2.1 ([2, 6]).

Let A, A₁ ∈ ℝ^{m×n}. Assume that A₂ = A + A₁, ‖A⁺‖‖A₁‖ < 1, and rank(A) = rank(A₂). Then,

‖A₂⁺‖ ≤ ‖A⁺‖/(1 − ‖A⁺‖‖A₁‖).

If rank(A) = rank(A₂) = min{m,n}, the following holds:

‖A₂⁺ − A⁺‖ ≤ √2‖A₂⁺‖‖A⁺‖‖A₁‖.
Lemma 2.2 ([5]).

Let A, A₁ ∈ ℝ^{m×n}. Assume that A₂ = A + A₁, ‖A₁‖‖A⁺‖ < 1, and rank(A) = n. Then rank(A₂) = n.

Lemma 2.3 ([12]).

Let φ(t) = (1/t)∫₀^{t} P(u) du, 0 < t ≤ ρ, where P(u) is a positive integrable function, monotonically non-decreasing on [0,ρ]. Then φ(t) is monotonically non-decreasing with respect to t.

Lemma 2.4 ([12]).

Let ψ(t) = (1/t³)∫₀^{t} Q(u)(t−u)² du, 0 < t ≤ ρ, where Q(u) is a positive integrable function, monotonically non-decreasing on [0,ρ]. Then ψ(t) is monotonically non-decreasing with respect to t.
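Lemmas 2.3 and 2.4 can be checked numerically for a concrete non-decreasing integrand. With the illustrative (assumed) choice P(u) = Q(u) = eᵘ, both integrals have closed forms: ∫₀ᵗ eᵘ du = eᵗ − 1 and ∫₀ᵗ eᵘ(t−u)² du = 2eᵗ − t² − 2t − 2.

```python
import math

# phi(t) = (1/t)   * Integral_0^t e^u du
# psi(t) = (1/t^3) * Integral_0^t e^u (t-u)^2 du, via the closed forms above
phi = lambda t: (math.exp(t) - 1.0) / t
psi = lambda t: (2.0 * math.exp(t) - t * t - 2.0 * t - 2.0) / t**3

ts = [0.1 * k for k in range(1, 11)]          # sample grid in (0, 1]
phi_monotone = all(phi(a) <= phi(b) for a, b in zip(ts, ts[1:]))
psi_monotone = all(psi(a) <= psi(b) for a, b in zip(ts, ts[1:]))
```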

As in [9], it is convenient for the local convergence analysis that follows to introduce some functions and parameters:

α = ‖F(p)‖,  β = ‖[F′(p)ᵀF′(p)]⁻¹F′(p)ᵀ‖,
d(x) = ‖x − p‖,  s₀ = max{d(x₀), d(y₀)},
μ(t) = μ(L₀,L,M)(t)
 = (β/8)∫₀^{t} M(u)(t−u)² du
 + βt(∫₀^{3t/2} L(u) du + ∫₀^{t} L₀(u) du) + √2αβ²∫₀^{t} L₀(u) du − t,
γ = γ(L₀,L,M)
 = β∫₀^{d(x₀)} M(u)(d(x₀)−u)² du / [8d(x₀)(1 − β∫₀^{d(x₀)} L₀(u) du)]
 + βd(x₀)∫₀^{(d(x₀)+d(y₀))/2} L(u) du / [((2d(x₀)+d(y₀))/3)(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + √2αβ²∫₀^{d(z₀)} L₀(u) du / [d(z₀)(1 − β∫₀^{d(z₀)} L₀(u) du)] < 1,
δ = β∫₀^{d(x₀)} M(u)(d(x₀)−u)² du / [8d(x₀)³(1 − β∫₀^{d(z₀)} L₀(u) du)],
τ = √2αβ²∫₀^{d(z₀)} L₀(u) du / [d(z₀)(1 − β∫₀^{d(z₀)} L₀(u) du)],
λ = β∫₀^{(d(x₀)+d(y₀))/2} L(u) du / [((2d(x₀)+d(y₀))/3)(1 − β∫₀^{d(z₀)} L₀(u) du)],
e¹ₙ₊₁ = δd(xₙ)³ + λd(xₙ)d(yₙ) + τd(zₙ),
e²ₙ₊₁ = δd(xₙ₊₁)³ + (λ/3)(d(xₙ) + d(yₙ) + d(xₙ₊₁))d(xₙ₊₁) + τd(zₙ),

and

sₙ₊₁ = max{d(xₙ₊₁), d(yₙ₊₁)}.

Notice that if L0=L=L1 and M=M1, then the preceding definitions reduce to the corresponding ones in [9].

The local convergence analysis is based on the conditions (𝒞):

(𝒞1) The mapping F : U¯(p,R₁) → ℝ^i is twice Fréchet-differentiable, F′(p) has full rank and p solves problem (1.1).

(𝒞2) F′ satisfies the center-Lipschitz condition with L₀ average on U¯(p,R₁) and the restricted Lipschitz condition with L average on U¯(p,R₀); F″ satisfies the restricted Lipschitz condition with M average on U¯(p,R₀), where L₀, L and M are positive non-decreasing functions on [0, 3R₀/2].

(𝒞3) The function μ has a minimal zero R in [0,R₀], which also satisfies

β∫₀^{R} L₀(u) du < 1.

Then, we can show the following local convergence result for TGNWTM under the conditions (𝒞) and the preceding notation.

Theorem 2.5.

Suppose that the conditions (𝒞) hold. Then, the sequences {xₙ}, {yₙ}, {zₙ} generated for x₀, y₀ ∈ U¯(p,R)∖{p} by TGNWTM are well defined, remain in U¯(p,R) for each n = 0, 1, 2, … and converge to p. Moreover, the following estimates hold:

(2.8) d(xₙ₊₁) ≤ e¹ₙ₊₁,
(2.9) d(yₙ₊₁) ≤ e²ₙ₊₁,

and

(2.10) sₙ₊₁ ≤ γsₙ ≤ γⁿ⁺¹s₀.

Proof. The proof follows the corresponding one in [9], but there are differences where we use (L₀, L) and M instead of L₁ and M₁, respectively, used in [9]. We shall use mathematical induction to show that the iterates {x_k}, {y_k}, {z_k} are well defined, converge to p, and satisfy the error estimates (2.8)–(2.10). Using TGNWTM for n = 0, we can write

x₁ − p = x₀ − p − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀF(x₀)
 = [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ[F′(z₀)(x₀ − p) − F(x₀) + F(p)]
 + ([F′(p)ᵀF′(p)]⁻¹F′(p)ᵀ − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ)F(p)
 = [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀJ(x₀)
 + ([F′(p)ᵀF′(p)]⁻¹F′(p)ᵀ − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ)F(p),

and

y₁ − p = x₁ − p − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀF(x₁)
 = [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ[F′(z₀)(x₁ − p) − F(x₁) + F(p)]
 + ([F′(p)ᵀF′(p)]⁻¹F′(p)ᵀ − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ)F(p)
 = [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀJ(x₁)
 + ([F′(p)ᵀF′(p)]⁻¹F′(p)ᵀ − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ)F(p),

where

J(x_i) = F′((x_i + p)/2)(x_i − p) − F(x_i) + F(p)
 + (F′(z₀) − F′((x_i + p)/2))(x_i − p), i = 0, 1.

In view of the estimate

F(x) − F(y) − F′((x+y)/2)(x−y) = (1/4)∫₀¹(1−t)[F″((x+y)/2 + (t/2)(x−y)) − F″((x+y)/2 + (t/2)(y−x))](x−y)² dt,

for x=p and y=x0, we obtain in turn

‖F(p) − F(x₀) − F′((x₀+p)/2)(p−x₀)‖
 = ‖(1/4)∫₀¹(1−t)[F″((x₀+p)/2 + (t/2)(p−x₀)) − F″((x₀+p)/2 + (t/2)(x₀−p))](p−x₀)² dt‖
 ≤ (1/4)∫₀¹(1−t)∫₀^{t‖x₀−p‖} M(u) du ‖x₀−p‖² dt
 = (1/8)∫₀^{d(x₀)} M(u)(1 − u/d(x₀))² du d(x₀)²
 = (1/8)∫₀^{d(x₀)} M(u)(d(x₀)−u)² du,

and

‖F′((x₀+y₀)/2) − F′((x₀+p)/2)‖ ≤ ∫₀^{d(y₀)/2} L(u) du.

By the center-Lipschitz condition, we have that

‖(F′(p)ᵀF′(p))⁻¹F′(p)ᵀ‖‖F′(x) − F′(p)‖ ≤ β∫₀^{d(x)} L₀(u) du.

Moreover, by Lemmas 2.1 and 2.2 and (𝒞1), for all x ∈ U(p,R), we get

‖(F′(x)ᵀF′(x))⁻¹F′(x)ᵀ‖ ≤ β/(1 − β∫₀^{d(x)} L₀(u) du)

and

‖(F′(x)ᵀF′(x))⁻¹F′(x)ᵀ − (F′(p)ᵀF′(p))⁻¹F′(p)ᵀ‖ ≤ √2β²∫₀^{d(x)} L₀(u) du / (1 − β∫₀^{d(x)} L₀(u) du).

By the monotonicity of L(u) and M(u), together with Lemmas 2.3 and 2.4, the functions (1/t)∫₀^{t} L(u) du and (1/t³)∫₀^{t} M(u)(t−u)² du are non-decreasing in t. Hence, since d(x₀), d(y₀), d(z₀) ≤ R, we get by (𝒞3) that

γ < (1/R)[β∫₀^{R} M(u)(R−u)² du/(8(1 − β∫₀^{R} L₀(u) du)) + βR∫₀^{3R/2} L(u) du/(1 − β∫₀^{R} L₀(u) du) + √2αβ²∫₀^{R} L₀(u) du/(1 − β∫₀^{R} L₀(u) du)]
 = 1.

Thus, by Lemmas 2.1–2.4, the preceding estimates and condition (𝒞2), we have in turn

‖x₁ − p‖ ≤ ‖[F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ‖ ‖(1/4)∫₀¹(1−t)[F″((x₀+p)/2 + (t/2)(p−x₀)) − F″((x₀+p)/2 + (t/2)(x₀−p))](p−x₀)² dt‖
 + ‖[F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ‖ ‖(F′(z₀) − F′((x₀+p)/2))(x₀−p)‖
 + ‖[F′(p)ᵀF′(p)]⁻¹F′(p)ᵀ − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ‖ ‖F(p)‖
 ≤ βd(x₀)³∫₀^{d(x₀)} M(u)(d(x₀)−u)² du / [8d(x₀)³(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + βd(x₀)d(y₀)∫₀^{d(y₀)/2} L(u) du / [d(y₀)(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + √2αβ²∫₀^{d(z₀)} L₀(u) du / (1 − β∫₀^{d(z₀)} L₀(u) du)
 ≤ δd(x₀)³ + λd(x₀)d(y₀) + τd(z₀) = e¹₁ ≤ γs₀ < R.

In an analogous way, we get in turn

‖y₁ − p‖ ≤ ‖[F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ‖(‖F′((x₁+p)/2)(x₁−p) − F(x₁) + F(p)‖
 + ‖(F′(z₀) − F′((x₁+p)/2))(x₁−p)‖)
 + ‖[F′(p)ᵀF′(p)]⁻¹F′(p)ᵀ − [F′(z₀)ᵀF′(z₀)]⁻¹F′(z₀)ᵀ‖ ‖F(p)‖
 ≤ βd(x₁)³∫₀^{d(x₁)} M(u)(d(x₁)−u)² du / [8d(x₁)³(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + βd(x₁)d(z₀)∫₀^{d(z₀)/2} L(u) du / [d(z₀)(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + √2αβ²∫₀^{d(z₀)} L₀(u) du / (1 − β∫₀^{d(z₀)} L₀(u) du)
 ≤ δd(x₁)³ + (λ/3)d(x₁)(d(x₀) + d(y₀) + d(x₁)) + τd(z₀)
 < δd(x₀)³ + (λ/3)d(x₀)(2d(x₀) + d(y₀)) + τd(z₀) ≤ γs₀ < R,

hold, where d(z₀) ≤ (d(x₀) + d(y₀) + d(x₁))/2, so x₁, y₁ ∈ U(p,R) and we also have that

s₁ = max{‖x₁ − p‖, ‖y₁ − p‖} ≤ γs₀,

so (2.10) is satisfied for n = 0. Suppose that x_k, y_k ∈ U(p,R) and (2.10) hold for some k > 0. By TGNWTM for k + 1, we get in turn that

‖x_{k+1} − p‖ ≤ βd(x_k)³∫₀^{d(x_k)} M(u)(d(x_k)−u)² du / [8d(x_k)³(1 − β∫₀^{d(x_k)} L₀(u) du)]
 + βd(x_k)d(y_k)∫₀^{d(y_k)/2} L(u) du / [d(y_k)(1 − β∫₀^{d(y_k)} L₀(u) du)]
 + √2αβ²d(z_k)∫₀^{d(z_k)} L₀(u) du / [d(z_k)(1 − β∫₀^{d(z_k)} L₀(u) du)]
 ≤ βd(x_k)³∫₀^{d(x₀)} M(u)(d(x₀)−u)² du / [8d(x₀)³(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + βd(x_k)d(y_k)∫₀^{d(y₀)/2} L(u) du / [d(y₀)(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + √2αβ²d(z_k)∫₀^{d(z₀)} L₀(u) du / [d(z₀)(1 − β∫₀^{d(z₀)} L₀(u) du)]
 ≤ δd(x_k)³ + λd(x_k)d(y_k) + τd(z_k) = e¹_{k+1} ≤ γs_k < R

and

‖y_{k+1} − p‖ ≤ βd(x_{k+1})³∫₀^{d(x_{k+1})} M(u)(d(x_{k+1})−u)² du / [8d(x_{k+1})³(1 − β∫₀^{d(z_k)} L₀(u) du)]
 + βd(x_{k+1})d(z_k)∫₀^{d(z_k)/2} L(u) du / [d(z_k)(1 − β∫₀^{d(z_k)} L₀(u) du)]
 + √2αβ²d(z_k)∫₀^{d(z_k)} L₀(u) du / [d(z_k)(1 − β∫₀^{d(z_k)} L₀(u) du)]
 ≤ βd(x_{k+1})³∫₀^{d(x₀)} M(u)(d(x₀)−u)² du / [8d(x₀)³(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + βd(x_{k+1})d(z_k)∫₀^{d(z₀)} L(u) du / [d(z₀)(1 − β∫₀^{d(z₀)} L₀(u) du)]
 + √2αβ²d(z_k)∫₀^{d(z₀)} L₀(u) du / [d(z₀)(1 − β∫₀^{d(z₀)} L₀(u) du)]
 ≤ δd(x_{k+1})³ + (λ/3)(d(x_k) + d(y_k) + d(x_{k+1}))d(x_{k+1}) + τd(z_k)
 = e²_{k+1} ≤ γs_k < R,

where d(z_k) ≤ (d(x_k) + d(y_k) + d(x_{k+1}))/2. Furthermore, we obtain

s_{k+1} = max{‖x_{k+1} − p‖, ‖y_{k+1} − p‖} ≤ γs_k ≤ γ²s_{k−1} ≤ ⋯ ≤ γ^{k+1}s₀,

so

x_{k+1}, y_{k+1} ∈ U(p,R), and the estimates (2.8)–(2.10) hold,

lim_{k→∞} x_k = p and lim_{k→∞} y_k = p. ∎

Concerning the uniqueness of the solution p we have:

Proposition 2.6.

Under the conditions (𝒞) further suppose that

(2.11) (β/R)∫₀^{R} L₀(u)(R − u) du + (αβ̄/R)∫₀^{R} L₀(u) du < 1,

where β̄ = ‖[F′(p)ᵀF′(p)]⁻¹‖. Then, the limit point p is the only solution of problem (1.1) in U¯(p,R).

The proof follows from the corresponding one in [5], but we use only the center-Lipschitz condition.

3. Special cases and applications

Remark 3.1.
(a) Set α = ‖F(p)‖ = 0 in Theorem 2.5 and Proposition 2.6 to obtain the results in the case of zero residual.

(b) If L₀, L, M are constant functions, then the results for the classical Lipschitz conditions are obtained as special cases.

(c) In the literature, the functions L₁ and M₁ are used instead of L and M, respectively [3, 5, 8, 9, 12]. Let us compare the ratios and balls of convergence. Notice that, in view of (2.1), (2.4), (2.5) and (2.7), we have

(3.1) μ(L₀,L,M)(t) ≤ μ(L₁,L₁,M₁)(t),

and

(3.2) γ(L₀,L,M) ≤ γ(L₁,L₁,M₁),

so

(3.3) R(L₁,L₁,M₁) ≤ R(L₀,L,M).

Therefore, our radius of convergence is larger and our ratio of convergence is smaller. Moreover, the information on the location of the solution p is more precise, since only L₀ is used in (2.11) [9]. Notice that these advantages are obtained under the same computational cost, since in practice the computation of L₁ and M₁ requires the computation of the rest of the functions L₀, L and M as special cases.

Remark 3.2.

In particular, using the error estimates, it follows that for α = 0 we have τ = 0 and

d(x_{k+1}) ≤ d(x_k)(δd(x_k)² + λd(y_k)),
d(y_{k+1}) ≤ d(x_{k+1})[δd(x_{k+1})² + (λ/3)(d(x_k) + d(x_{k+1}) + d(y_k))]
 ≤ d(x_{k+1})[(δd(x_k) + 2λ/3)d(x_k) + λd(y_k)/3]
 ≤ d(x_{k+1})d(x_k)(δR + λ)
 = λ₁d(x_{k+1})d(x_k), λ₁ := δR + λ.

Also, for sufficiently large k,

d(x_{k+1}) ≤ d(x_k)(δd(x_k)² + λd(y_k))
 ≤ d(x_k)(δd(x_k)² + λλ₁d(x_k)d(x_{k−1}))
 ≤ d(x_k)²d(x_{k−1})(δ + λλ₁)
 = λ₂d(x_k)²d(x_{k−1}), λ₂ := δ + λλ₁,

leading to the characteristic equation

x² − 2x − 1 = 0,

so the order of the iterative method (1.2) is the positive root of the preceding equation, namely 1 + √2.
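The order claim can be checked numerically: for a sequence obeying the (model) recurrence d_{k+1} = C d_k² d_{k−1}, the logarithms of the errors satisfy the linear recurrence behind x² − 2x − 1 = 0, so log d_{k+1}/log d_k should approach 1 + √2. The constant C = 1 and the two starting errors below are illustrative assumptions.

```python
import math

# Simulate the error recurrence d_{k+1} = d_k^2 * d_{k-1} and estimate the
# R-order from the ratio of successive log-errors.
d = [0.5, 0.4]
for _ in range(7):                  # stop before the errors underflow
    d.append(d[-1] ** 2 * d[-2])

order = math.log(d[-1]) / math.log(d[-2])
root = 1.0 + math.sqrt(2.0)         # positive root of x^2 - 2x - 1 = 0
```

After a handful of steps the estimated order agrees with 1 + √2 ≈ 2.4142 to about four digits.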

Next, we present an example to show that (3.1)–(3.3) hold as strict inequalities, justifying the advantages claimed in the introduction of this study.

Example 3.3.

Let X = ℝ³, D = U¯(0,1), p = (0,0,0)ᵀ. Define the function F on D for w = (x,y,z)ᵀ by

F(w) = (eˣ − 1, ((e−1)/2)y² + y, z)ᵀ.

Then, the Fréchet derivative is given by

F′(w) = diag(eˣ, (e−1)y + 1, 1).

Notice that, using the conditions (𝒞), we get L₀ = e − 1, L = M = e^{1/L₀}, L₁ = M₁ = e, β = 1, i = j = 3 and α = 0. Then

R(L₁,L₁,M₁) = 0.1468 < R(L₀,L,M) = 0.2263,

which justifies the improvements stated in the introduction of this paper.
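Both radii of the preceding example can be recomputed directly. Assuming that, for constant averages with zero residual (α = 0) and β = 1, the equation μ(t) = 0 reduces to the quadratic (M/24)t² + (3L/2 + L₀)t − 1 = 0 (a reconstruction used here for illustration), its positive root gives the radius of convergence:

```python
import math

# Positive root of (M/24) t^2 + (3L/2 + L0) t - 1 = 0 (constant averages,
# alpha = 0, beta = 1); an assumed reduction of mu(t) = 0.
def radius(L0, L, M):
    a, b = M / 24.0, 1.5 * L + L0
    return (-b + math.sqrt(b * b + 4.0 * a)) / (2.0 * a)

L0 = math.e - 1.0                         # center-Lipschitz average
L = M = math.exp(1.0 / L0)                # restricted averages
R_old = radius(math.e, math.e, math.e)    # classical choice L1 = M1 = e
R_new = radius(L0, L, M)                  # restricted choice
```

Rounded to four decimals this reproduces R(L₁,L₁,M₁) = 0.1468 and R(L₀,L,M) = 0.2263, with R_new > R_old as claimed.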

References