We survey the convergence orders of sequences and show the orders of the basic iterative sequences for solving nonlinear equations in \({\mathbb R}\): the Newton method, the secant method, and the successive approximations).

Abstract

The high speed of \(x_{k}\rightarrow x^\ast\in{\mathbb R}\) is usually measured using the C-, Q- or R-orders:
\begin{equation}\tag{$C$}
\lim \frac {|x^\ast – x_{k+1}|}{|x^\ast – x_k|^{p_0}}\in(0,+\infty),
\end{equation}
\begin{equation}\tag{$Q$}
\lim \frac {\ln |x^\ast – x_{k+1}|}{\ln |x^\ast – x_k|} =q_0,
\qquad \mbox{resp.}
\end{equation}
\begin{equation}\tag{$R$}
\lim \big\vert\ln |x^\ast – x_k| \big \vert ^\frac1k =r_0.
\end{equation}
By connecting them to the natural, term-by-term comparison of the errors of two sequences, we find that the C-orders — including (sub)linear — are in agreement. Weird relations may appear though for the Q-orders: we expect
\[
|x^\ast – x_k|=\mathcal{O}(|x^\ast – y_k|^\alpha), \qquad\forall\alpha>1,
\]
to attract “\(\geq\)” for the Q-orders of \(\{x_{k}\}\) vs. \(\{y_{k}\}\); the contrary is shown by an example providing no vs. infinite Q-orders. The R-orders seem even worse: an \(\{x_{k}\}\) with infinite R-order may have unbounded nonmonotone errors: \(\limsup \)\(|x^\ast – x_{k+1}|/|x^\ast – x_k|= +\infty\).

Such aspects motivate the study of equivalent definitions, computational variants, etc.

The orders are also the perspective from which we analyze the three basic iterative methods for nonlinear equations. The Newton method, widely known for its quadratic convergence, may in fact attain any C-order from \([1,+\infty]\) (including sublinear); we briefly recall such convergence results, along with connected aspects (historical notes, known asymptotic constants, floating point arithmetic issues, radius of attraction balls, etc.) and provide examples.

This approach leads to similar results for the successive approximations, while the secant method exhibits different behavior: it may not have high C-, but just Q-orders.

Authors

Emil Cătinaş
“Tiberiu Popoviciu” Institute of Numerical Analysis, Romanian Academy

Keywords

(computational) convergence orders, iterative methods, Newton method, secant method, successive approximations, asymptotic rates.

PDF

here you may find an annotated version, with some typos pointed out: https://ictp.acad.ro/wp-content/uploads/2023/12/Catinas-2021-SIREV-How-many-steps-still-left-to-x-annotated.pdf

also freely available at the publisher (without pointed typos): http://doi.org/10.1137/19M1244858

Paper coordinates:

E. Catinas, How many steps still left to x*?, SIAM Rev., 63 (2021) no. 3, pp. 585–624, doi: 10.1137/19M1244858

About this paper

Journal
Publisher Name
Print ISSN

0036-1445

Online ISSN

1095-7200

Github

Some small codes in LaTeX and Julia have been posted on Github in order to illustrate some aspects.

We will post soon in Python too.

The link is: https://github.com/ecatinas/conv-ord

Corrigendum

(introduced on February 10, 2023, modified on February 25, 2023, on August 27, 2023)

Corrigendum:

  • for the sequences with \(\bar{Q}_1=\infty\) we have proposed the terminology “sequences with unbounded nonmonotone errors” (see Remark 2.12.c). This is also termed in the beginning of Section 2.2 as “sequences with no Q-order” – which is not proper, as we have subsequently found sequences with \(\bar{Q}_1=\infty\) and with Q-order \(1\), see [Cătinaş, 2023];
  • in Remark 2.50.b, “the strict superlinear” should be removed. Indeed, this is not a C-order, but a Q-order, as is defined with the aid of \(\bar{Q}_p\). On the other hand, the strict superlinear order is not always faster than the C-order \(p>1\), see [Cătinaş, 2023].

Typos:

  • in the abstract, condition \(|x^\ast – x_{k+1}|/|x^\ast – x_{k}| \rightarrow \infty\) (i.e., \(Q_1=\infty\)), for the sequence with unbounded nonmonotone error, should read instead \(\limsup |x^\ast – x_{k+1}|/|x^\ast – x_{k}| = \infty\) (i.e., \(\bar{Q}_1=\infty\)); obviously, condition \(Q_1=\infty\) cannot be fulfilled by a convergent sequence. The matter is correctly dealt with inside the paper (see Remark 2.12.c).

Notes

Working with the SIAM staff was an amazing experience. I would like to thank to Louis Primus for his solicitude and professionalism.


Version: October 30, 2021.

Some references I have missed:

(these were included in the initial submission, but missing in the final version, as not cited in text, and we switched to BibTeX)

known to me, but still missing (nobody is perfect):

found after sending the revised version:

  • J. H. Wilkinson, The evaluation of the zeros of ill-conditioned polynomials. part ii, Numer. Math., 1 (1959), pp. 167–180, doi: 10.1007/BF01386382
  • D. F. Bailey and J. M. Borwein, Ancient indian square roots: an exercise in forensic paleo mathematics, Amer. Math. Monthly, 119 (2012), pp. 646–657, doi 10.4169/amer.math.monthly.119.08.646.
  • E. Halley, Methodus nova accurata & facilis inveniendi radices aeqnationum quarumcumque generaliter, sine praviae reductione, Phil. Trans. R. Soc., 18 (1694), pp. 136–148, https://doi.org/10.1098/rstl.1694.0029.
  • A. Ralston and P. Rabinowitz, A First Course in Numerical Analysis, 2nd ed., 1978.
  • M. A. Nordgaard, The origin and development of our present method of extracting the square and cube roots of numbers, The Mathematics Teacher, 17 (1924), pp. 223–238, https://doi.org/https://www.jstor.org/stable/27950616.

and others (to be completed)


Paper (preprint) in HTML form

How Many Steps Still Left to 𝒙*?Received by the editors February 14, 2019; accepted for publication (in revised form) September 17, 2020; published electronically August 5, 2021.

How Many Steps Still Left to 𝒙*?thanks: Received by the editors February 14, 2019; accepted for publication (in revised form) September 17, 2020; published electronically August 5, 2021.

Emil Cătinaş
“Tiberiu Popoviciu” Institute of Numerical Analysis, Cluj-Napoca, Romania
email: ecatinas@ictp.acad.ro

Abstract. The high speed of xkx is usually measured using the C-, Q-, or R-orders:

lim|xxk+1||xxk|p0(0,+),limln|xxk+1|ln|xxk|=q0,orlim|ln|xxk||1k=r0.

By connecting them to the natural, term-by-term comparison of the errors of two sequences, we find that the C-orders—including (sub)linear—are in agreement. Weird relations may appear though for the Q-orders: we expect |xxk|=𝒪(|xyk|α) α>1 to imply “” for the Q-orders of {xk} vs. {yk}; the contrary is shown by an example providing no vs. infinite Q-orders. The R-orders appear to be even worse: an {xk} with infinite R-order may have unbounded nonmonotone errors: limsup|xxk+1|/|xxk|=+ .
Such aspects motivate the study of equivalent definitions, computational variants, and so on.
These orders are also the perspective from which we analyze the three basic iterative methods for nonlinear equations in . The Newton method, widely known for its quadratic convergence, may in fact attain any C-order from [1,+] (including sublinear); we briefly recall such convergence results, along with connected aspects (such as historical notes, known asymptotic constants, floating point arithmetic issues, and radius of attraction balls), and provide examples.
This approach leads to similar results for the successive approximations method, while the secant method exhibits different behavior: it may not have high C-orders, but only Q-orders.

Keywords. (computational) convergence orders, iterative methods, Newton method, secant method, successive approximations, asymptotic rates

AMS subject classifications. 65-02, 65-03, 41A25, 40-02, 97-02, 97N40, 65H05

DOI. 10.1137/19M1244858

1 Introduction

The analysis of the convergence speed of sequences is an important task, since in numerical applications the aim is to use iterative methods with fast convergence, which provide good approximations in few steps.

The ideal setting for studying a given sequence {xk}:=(xk)k0 is that of error-based analysis, where the finite limit x is assumed to be known and we analyze the absolute values of the errors, ek:=|xkx|. The errors allow the comparison of the speeds of two sequences in the most natural way, even if their limits are distinct (though here we write both as x). Let us first define notation, for later reference.

Notation.

Given x˙k,x̊kx, denote their errors by {e˙k}, resp., {e̊k}, etc.

Comparison 1.1.

{x˙k} converges faster (not slower) than {x̊k} if

|xx˙k||xx̊k|,kk0 (e˙ke̊k)

(or, in brief, {x˙k} is eq. e˙ke̊k faster than {x̊k}) and strictly faster if

|xx˙k|<|xx̊k|,kk0. (e˙k<e̊k)

Furthermore, increasingly faster convergence holds if

- for some c(0,1) we have |xx˙k|c|xx̊k|,kk0;

- the constant c above is not fixed, but tends to zero (see, e.g., [9, p. 2], [53]),

|xx˙k|=o(|xx̊k|)as k; (e˙k=o(e̊k))
111This means that |xx˙k|ck|xx̊k|, kk0, with ck0, which allows a finite number of elements ck to be greater than one.

- given α>1, we have

|xx˙k|=𝒪(|xx̊k|α) as k. (e˙k=𝒪(e̊kα))
222That is, K>0 such that |xx˙k|K|xx̊k|α, kk0.

Let us first illustrate the convergence speed in an intuitive fashion, using some graphs.

Example 1.2.

Consider the following two groups of sequences ({xk}={ek}):

  • (a)

    {1k}, {1k}, {1k4};

  • (b)

    {k2k}, {12k}={(12)k}={2k}, {1k2k}, {14k}.

Handling the first terms of these sequences raises no difficulties, as all the common programming languages represent them as fl(xk) in standard double precision arithmetic (called binary64 by the IEEE 754-2008 standard); see 1.4 below.

The plotting of (k,ek) in fig. 1(a) does not help in analyzing the behavior of the errors, as for k10 we cannot see much, even if the graph is scaled.333The implicit fitting of the figures in the window usually results in different units for the two axes (scaled graph). In fig. 1(a) we would see even less if the graph were unscaled. All we can say is that {1k} has the slowest convergence, followed by {1k}, and also that the first terms of {1k4} are smaller than those of {12k}.

Refer to caption
(a) Plotting (k,ek) does not help much.
Refer to caption
(b) Plotting (k,lgek) is better.
Figure 1: (Scaled) Cartesian vs. semilog coordinates in visualizing the errors.

In order to compare the last terms we must use the semilog coordinates (k,lgek) in fig. 1(b) (see [43, p. 211]); for {12k}, they are (k,klg12), belonging to the line y=ax, a=lg12. As the graph is twice scaled,444The first scaling is by the use of an unequal number of units (50 on axis x vs. 36 on axis y), and the second one results from the fact that the final figure is a rectangle instead of a square. it actually shows another line, y=bx.

Figure 1(b) shows that, in reality, {12k} is eq. e˙k<e̊k faster than {1k4} (k0=17). We will see later that the convergence is sublinear in group (a) and linear in group (b).

Exercise 1.3.

Show that (a) {12k} is [eq. e˙k=𝒪(e̊kα) α>1] faster than {1k4};

(b) {12k} is eq. e˙k=o(e̊k) but not [eq. e˙k=𝒪(e̊kα) for some α>1] faster than {k2k}.

Exercise 1.4.

(a) The smallest positive binary64  number, which we denote by realmin( binary64 ), is 2.2250738585072014e308. Analyzing its binary representation, express its value as a power of 2 (see [69, p. 14], [28], or [45]).

(b) For each sequence from (a) and (b) in the example above, compute the largest k such that fl(xk)0.

(c) Give a formula for the largest index k for which all the elements fl(xk) of all the sequences from (a) and (b) in the example above are nonzero.

Next we analyze some increasingly faster classes of sequences.

Example 1.5.

Consider

  • (c)

    {12k2}, {12k3}, {1kk};

  • (d)

    {122kk}, {122k}={(12)2k}={22k}, {12k2k}, {132k};

  • (e)

    {123k};

  • (f)

    {122kk}, {122k2}.

In fig. 2 we plot {14k}, {12k2}, and {1kk}. Their graphs are on a line (the fastest from fig. 1(b)), on a parabola y=cx2, and, resp., in between.

Refer to caption
Figure 2: Linear vs. superlinear order.

The computation with what appears at first sight to be a “reasonable” number of terms (say, 10) becomes increasingly challenging as we successively consider {xk} from (c)–(f).

We leave as an exercise the calculation/computation of the largest index k for each {xk} in (c)–(f) such that fl(xk) is a nonzero binary64  number; we note though that all fl(1/22k2) and fl(1/22kk) are zero for k4, respectively, k5, and so binary64  numbers are not enough here.

There are plenty of choices for increased precision: the well-known MATLAB [58] (but with the Advanpix toolbox [1] providing arbitrary precision), the recent, freely available Julia [5], and many others (not to mention the programming languages for symbolic computation). All the graphs from this paper were obtained using the tikz/pgf package [92] for .555The initial  system was created by D. E. Knuth [54]. The figures are easy to customize and the instructions to generate them have a simple form, briefly,

    \addplot [domain=1:20, samples=20] {1/sqrt(x)};

Equally important, we will see that the tikz/pgf library allows the use of higher precision than binary64 (more precisely, smaller/larger numbers by increased representation of the exponent, e.g., realmin around 106500). The  sources for the figures are posted on https://github.com/ecatinas/conv-ord.

The sequence groups (c)–(f) are presented in fig. 3; {12k3} belongs to a cubic parabola, and {122k} to an exponential. The parabola of {12k2} appears flat, which shows how much faster the other sequences converge. The descent of the three terms of {1/22k2} is the steepest.

As will be seen later, the sequence groups (c)–(f) have increasing order: strict superlinear, quadratic, cubic, and infinite, respectively.

Refer to caption
Figure 3: Sequences with superlinear, quadratic, cubic, and infinite orders.
Exercise 1.6.

Show that {132k} is [eq. e˙k=𝒪(e̊kα), α(1,ln3ln2]] faster than {122k}.

Obviously, when comparing two sequences by eq. e˙k<e̊k, the faster one must have its graph below the other one (kk0) (cf. Polak [75, p. 47]). While the more concave the graph the better (cf. Kelley [50, Ex. 5.7.15]), fast convergence does not in fact necessarily require smooth graphs (see Jay [47]).

Exercise 1.7.

Prove that any convergence curve between {123k} and {132k} is at least [eq. e˙k=𝒪(e̊kα), α(1,ln3ln2]] faster than {122k} (hint: use 1.6).

Consider, for instance, the merging of certain sequences.

Example 1.8.

Let

xka={132k,kodd,123k,keven,xkb={122k,keven,132k,kodd,xkc={122k,keven,152k,kodd.

{123k}, {xka}, {132k}, {xkb}, {122k}, written in eq. e˙ke̊k order of decreasing speed, are plotted in Figure 4(a). The usual ranking by convergence orders is instead {123k}, {132k}, {122k}, {xkb}, {xka} (C-orders 3, 2, 2, R-order 2, no order). Though {xka} has the second fastest convergence, the orders actually rank it as the slowest. {xkb} does not have C- or Q-order (but just exact R-order 2), so is usually ranked below {122k}.

{152k}, {xkc}, {122k}, {11.32k} from fig. 4(b) have ranking {152k}, {122k}, {11.32k}, {xkc} (C-orders 2, 2, 2, R-order 2). Though nonmonotone, {xkc} has exact R-order 2 and is (at least) eq. e˙k<e̊k faster than the C-quadratic {11.32k}.

Refer to caption
(a)
Refer to caption
(b)
Figure 4: Ranking sequences visually by eq. e˙ke̊k.
Quiz 1.9.

Given {x˙k}, {x̊k} with errors shown in fig. 5, which one is faster?

Refer to caption
(a) The first 14 terms of {x˙k}.
Refer to caption
(b) The first 10 terms of {x̊k}.
Figure 5: 1.9 (the answers to quizzes are given at the end of the paper).

The study of errors using 1.1 seems666Of course, one cannot always compare all sequences by eq. e˙ke̊k: consider {xka} and {162k}, for example. perfect in theory, but has a notable disadvantage: it requires two sequences. The alternative is to use a “scale” for measuring any single sequence—the convergence orders. The (classical) C-orders, though they demand some strong conditions, appear in the convergence of the most often encountered iterative methods for nonlinear equations.777A notable exception: the higher orders of the secant method are Q- but not necessarily C-orders.

Still, some problems appear, for example, when one wants to check the usual predictions such as “in quadratic convergence, the error is squared at each step.” Indeed, as a sequence is an infinite set, its order refers to the asymptotic range (all sufficiently large indices), where statements such as

“if ε>0 is suff. small, k00 s.t. [some property] holds kk0

come to life. Clearly, the first k0 terms (k0=2,10, or perhaps 106) are not necessarily connected to the order: a finite number of terms does not contribute to the order of an abstract sequence, while, most painful, the high order of an iterative method may not be reflected in its first steps if the initial approximation is not good enough. Therefore, such predictions, instead of being applicable to the very first term, are usually applied only starting from the first terms from the asymptotic range.

When the stringent conditions of C-orders are not fulfilled, the Q-orders are considered instead, but unfortunately they can be in disagreement with the term-by-term comparison of the errors of two sequences; indeed, we show that convergence even with no Q- (or C)-order may in fact hide a fast overall speed, only some of the consecutive errors do not decrease fast enough. The R-orders are instead known for their limited usefulness, requiring the weakest conditions and allowing nonmonotone errors.

Consider now a further problem: given some iterations for which the calculation of the order turns out to be difficult or even impossible, can the computer be used to approximate it? A positive answer is given by the computational convergence orders.

Despite the fact that convergence orders have a long history, it is only recently that some final connections were made and a comprehensive picture was formed of the C-, Q-, and R-orders, together with their computational variants [20].

The understanding of these notions is important from both the theoretical and the practical viewpoints, the comments of Tapia, Dennis, and Schäfermeyer [93] being most relevant:

The distinction between Q- and R-convergence is quite meaningful and useful and is essentially sacred to workers in the area of computational optimization. However, for reasons not well understood, computational scientists who are not computational optimizers seem to be at best only tangentially aware of the distinction.

We devote section 2 to error-based analysis of the problem of measuring and comparing convergence speeds of abstract real sequences, using either the convergence orders we review, or the term-by-term comparison of errors.

In section 3 we deal with the computational versions of the convergence orders (based either on the corrections888Each xk+1 can be seen as being obtained from xk by adding the correction. xk+1xk or on the nonlinear residuals f(xk)) and show that in they are equivalent to the error-based ones.

In section 4 we deal with three main iterative methods (Newton, secant, and successive approximations), presenting results on their attainable convergence orders, as well as other connected aspects (asymptotic constants, estimation of the attraction balls, floating point arithmetic issues, brief history, etc.).

2 𝑪-, 𝑸-, and 𝑹-Orders 𝒑𝟎>𝟏 vs. Asymptotic Rates

Even though the roots of iterative methods trace back to the Babylonians and Egyptians (ca. 1800 B.C.) [3], the first comment on convergence orders seems to have been made by Newton (ca. 1669) on the doubling of digits in quadratic convergence (quoted by Ypma in [101]): “But if I desire to continue working merely to twice as many figures, less one,,” and then in 1675: “That is, the same Division, by wch you could finde the 6th decimal figure, if prosecuted, will give you all to the 11th decimal figure.”

In the Journal Book of the Royal Society, it is recorded “17 December 1690: Mr Ralphson’s Book was this day produced by E Halley, wherein he gives a Notable Improvemt of ye method […], which doubles the known figures of the Root known by each Operation….”

Halley noticed the tripling of digits in the method he introduced.

In his Tracts, Hutton showed that one of his schemes is of third order [3]. In 1818, Fourier [38] also noted that the iterates double the exact figures at each Newton step.

In 1870, Schröder [87] implicitly defined the high C-orders for some iterative methods by considering conditions on the nonlinear mapping.

Three types of convergence orders are used at present: the classical C-order (notation adopted from [4]), the Q-order (which contains the C-order as a particular case), and the R-order. Outstanding contributions to their study were successively made by Ortega and Rheinboldt (1970) in their fundamental book [67], Potra and Pták (1984) [79], Potra (1989) [76], and finally by Beyer, Ebanks, and Qualls (1990), in the less known but essential paper [4]. In [20] we connected and completed these results (in N).

We analyze here only the high orders, noting that the various equivalent definitions of the Q-orders lead to intricate implications for linear convergence [4].

2.1 𝑪-Order 𝒑𝟎>𝟏

The definition of C- and Q-orders is obtained by considering for p1 the quotient convergence factors [67, sect. 9.1]

Qp(k):=ek+1(ek)p=|xxk+1||xxk|p,k=0,1,.3

It is assumed that xkx, k0. If, though, xk0=x in an iterative method, then (hopefully) we get xk=x, kk0 (this holds for the three methods in section 4).

Let us briefly list the four slowest types of C-order:

  • no C-order, when Q1:=limkQ1(k) (e.g., xk=1k, k odd; xk=2k, k even);

  • C-sublinear, if Q1=1 (e.g., {1k});

  • C-linear, if 0<Q1<1 (e.g., {12k});

  • C-superlinear (defined later, usually called (strict) Q-superlinear);

notice that Q1>1 cannot hold, and that {xka},{xkc}, with no C-order, in fact are fast.

Remark 2.1.

(a) (see [65, p. 620]) C-linear order implies strict monotone errors:

ek+1<ek,kk0. (1)

(b) C-sublinear monotone errors (e.g., xk=1k2, k even, xk=1k, k odd, k3).

(c) {xka},{xkc}, both nonmonotone, have no C-order.

Exercise 2.2.

If {x̊k} has C-sublinear and {x˙k} C-linear order, show that {x˙k} is [eq. e˙k=𝒪(e̊kα) α>1] faster than {x̊k}.

The following definition of high orders is well known; p0>1 is implicitly assumed throughout this paper even if not explicitly mentioned.

Definition 2.3.

{xk} has C-order p0>1 if

Qp0:=limkQp0(k)(0,+). (C)
Remark 2.4.

It is important to stress that Qp0 above cannot be zero or infinite; these two cases will arise when analyzing the Q-orders.

By denoting (see [67, sect. 9.1], [88, p. 83])

Q¯p=lim infkQp(k),Q¯p=lim supkQp(k),p1, (2)

condition eq. C is equivalent either to relation

0<Q¯p0=Qp0=Q¯p0< (CQ)

or to requiring that ε>0,k00 such that

(Qp0ε)ekp0ek+1(Qp0+ε)ekp0kk0. (Cε)
Remark 2.5.

eq. C eq. 1 (by eq. Cε, or because eq. C is stronger than C-linear).

Example 2.6.

(a) xk=θp0k for some θ(0,1), p0>1, is the standard example for C-order p0 (Qp0=1). The usual instances are the C-quadratic {22k}, {32k} (θ=12, resp., θ=13, p0=2), the C-cubic {23k} (θ=12, p0=3), etc.

(b) {c22k} (c given, c0) also has C-quadratic convergence.

(c) The C-quadratic {(1)k22k} is nonmonotone, but with monotone {ek}.

(d) xkd={c(13)2k,kodd,1c(13)2k,keven, c>1, does not have C-order 2: Q¯2=1c3, Q¯2=c3.

Quiz 2.7 ([42, Ex. 5.6  and p. 245]).

What is the rate of {xk} if {ek} is

(a) 102,103,104,105,;

(b) 102,104,106,108,;

(c) 102,103,105,108,1013,1021,;

(d) 102,104,108,1016,?

Remark 2.8 (see, e.g., [4]).

If it exists, the C-order p0>1 of {xk} is unique. Indeed, if raising the power of the denominator in Qp0(k), the quotient tends to infinity, since Qp0+ε(k)=Qp0(k)1(ek)ε ε>0. Similarly, if lowering the power, it tends to zero.

Example 2.9.

Let {10522k}, {22k/k}, and {xkd} (c=10); in fig. 6 we plot {Q1.8(k)}, {Q2(k)}, and {Q2.2(k)} for each of them.

Refer to caption
Figure 6: Qp(k), k=1,13¯ (p=1.8, 2, 2.2).

The limiting values of Q¯p, Q¯p from eq. 2 may be regarded as “functions” of p1. Their graphical illustration leads to the so-called Q-convergence profile of {xk} [4] (or Q-profile here, for short).

For all C-orders p0>1, it holds that Q¯p=Q¯p=Qp p1, and Qp is a “function” given by Qp=0, p[1,p0), Qp0(0,+), and Qp=+, p>p0.

Refer to caption
Figure 7: Q-convergence profile for {122k}: the limit points of Qp(k) as “functions” of p. The vertical axis is not to scale.

In fig. 7 we plot the Q-profile of {22k}.

Exercise 2.10 ([4]).

Show that xkx cannot have C-order p0<1.

Exercise 2.11.

(a) Show that if {xk} has C-order p0>1, then so has {ek+1ek} and

Qp0{ek+1ek}=1, (3)

regardless of Qp0{ek}0 [20, Rem. 3.9]. Does the converse hold? (Hint: take {1k}.)

(b) [20, Rem. 3.9] Find a similar statement for {ek+1ek}.

(c) (Kelley [50, Ex. 5.7.15]) If {xk} has C-order p0>1, show that lgek is concave, i.e., lgek+1lgek is decreasing.

Not only convergence with any high order exists, but even the infinite C-order, defined either by Qp=0p>1 (take, e.g., {222k} [67, E 9.1-3(g)] or {22k2}) or by convention, when the convergence is in a finite number of steps.

“The higher the order, the better” is a well-known cliché, which we will express in section 2.5 in terms of the big Ohs.

2.2 𝑸-Order 𝒑𝟎>𝟏

We propose the following definitions of Q-order convergence:

  • with unbounded nonmonotone errors if Q¯1= (e.g., {xka},{xkc});

  • Q-sublinear if 1Q¯1<+ (e.g., xk=2k, k odd, xk=1k, k even);

  • exact Q-sublinear if 0<Q¯11Q¯1<+;

  • at least Q-linear if Q¯1<1;

  • Q-linear if 0<Q¯1<1;

  • exact Q-linear if 0<Q¯1Q¯1<1.

Remark 2.12.

(a) Obviously, when xkx, then Q¯11 and Q¯1Q¯1.

(b) Q¯1[0,+] always exist, while Q1 may not (i.e., when Q¯1<Q¯1).

(c) Q¯1<1 monotone, while Q¯1>1 nonmonotone errors; Q¯1= means unbounded nonmonotone errors. Unlike 0<Q11, 0<Q¯1+ alone does not necessarily imply slow speed (e.g., {xka},{xkc} with unbounded nonmonotone errors).

The convergence may be fast when Q¯1=0, even if Q¯1= (e.g., {xka},{xkc}), but is no longer fast when 0<Q¯1.

Strict Q-superlinear is an intermediate order between linear and p0>1.

Definition 2.13 ([67, p. 285]).

{xk} has Q-superlinear order999Or at least Q-superlinear, or even superlinear, as R-superlinear is seldom encountered. if Q¯1=0(=Q1):

limkek+1ek=0,(ck0 s.t. ek+1=ckek).

Strict Q-superlinear order holds when, moreover, Q¯p=+ p>1.

Remark 2.14 (cf. [75, (37b)]).

Q-superlinear order holds if θk>0 and c>0 such that θk0 and ek=ci=1kθi (c may be taken to be 1).

Example 2.15 (strict Q-superlinear).

(a) Let {1kk} [6, p. 22], {110k2} [11], [4], {1k!} [47], {1ck2}, c>1 [67, E 9.2-1(j)].

(b) [67, E 10.1-4], [79, p. 94] Given 0<c<1e, let xk+1=xklnxk, k0, x0=c.

Exercise 2.16.

{xkb} has Q-superlinear order, but not strict Q-superlinear order.

The following formulation has been commonly used for half a century, since the 1970 book of Ortega and Rheinboldt: “{xk} converges with Q-order at least p0>1[67], [79], [76], [85]. Here we deal, as in [4] and [20], with a more restrictive notion, with the classic notion corresponding to the lower Q-order ql from definition 2.30.

Definition 2.17 (see [20]; cf. [76], [4]).

{xk} has Q-order p0>1 if any of the following equivalent conditions hold:

limkQp(k)={0,p[1,p0),+,p(p0,+); (Q)
limkQL(k)=p0,QL(k):=lnek+1lnek; (QL)
limkQΛ(k)=p0,QΛ(k):=lnek+2ek+1lnek+1ek; (QΛ)

or ε>0, A,B>0 such that

Aekp0+εek+1Bekp0εkk0. (Qε)
Remark 2.18.

(a) If the Q-order p0 exists, it is unique (recall remark 2.8).

(b) If {xk} has Q-order p0>1, then the right-hand side inequalities in eq. Qε imply that the errors are strictly monotone (kk0); see remarks 2.5 and 2.1.

In [20] we proved the equivalence of the four conditions above for {xk}N, connecting and completing the independent, fundamental results of Potra [76] and Beyer, Ebanks, and Qualls [4]. This proof does not simplify for {xk}, but a natural connection between these conditions is found, e.g., by taking logarithms in eq. Cε to obtain eq. QL.101010p0 in eq. QL roughly means that the number of figures is multiplied by p0 at each step (kk0); see, e.g., [48] and [79, p. 91]. Also, eq. QL and eq. QΛ can be connected by 2.11 and as follows.

Exercise 2.19.

If {xk} has strict monotone errors for kk0, prove that eq. QΛ eq. QL (hint: use the Stolz–Cesàro Theorem).

Remark 2.20.

(a) One can always set k0=0 in eq. Qε (this is actually used in the definitions of Potra from [76]), by a proper choice of smaller A and larger B.

(b) In [20] we have denoted by (Qε) a condition which is trivially equivalent to eq. Q. Here we use (Qε) instead of (QI,ε) from [20] (“I” stood for “inequality” in that paper).

(c) eq. Q implies that the Q-profile of {xk} has a single jump at p0>1, but that the limit Qp0 is not required to exist, so

0Q¯p0Q¯p0, (4)

and so six cases result for the possible values of Q¯p0 and Q¯p0 (0, finite >0, +).

(d) Relations Q¯p0=0 or Q¯p0=+ might seem rather abstract, but they do actually occur (even simultaneously, as we shall see for the higher orders of the secant method; see also [99, p. 252 and Chap. 7], where Q¯2=Q¯2=+, for a problem in N).

(e) Q-order p0>1 implies Q¯1=0, but the converse is not true.

fig. 8 shows the Q-profiles of {22k/k} and {xkd} (c=54=1.25).

Refer to caption
(a) Q-profile for {122k/k}.
Refer to caption
(b) Q-profile for {xkd}, c=54.
Figure 8: Q-profiles: 8(a) Q-subquadratic; 8(b) exact Q-quadratic.

Superlinearity may also be defined in conjunction with higher Q-orders, but this is tricky.

Definition 2.21 (see [47], [20]).

The Q-order p0>1 is called

  • Q-superorder p0 if Q¯p0=0(=Qp0);

  • Q-suborder p0 if Q¯p0=+(=Qp0).111111In [20] this was defined by Q¯p0=+, which we believe is not strong enough.

The Q-order 2 with Q¯2=0 is Q-superquadratic (analogously, Q-supercubic, etc.).

Exercise 2.22.

(a) The Q-superquadratic {x˙k}={2k2k}, the Q- (and C-)quadratic {22k}, and the Q-subquadratic {x̊k}={22kk} are in [eq. e˙k=𝒪(e̊kα) α>1] decreasing order of speed. Study the Q-order of zk={x˙k,k odd,x̊k,k even, (2 is erroneous in [20]).

(b) Although the C-quadratic {22k} is eq. e˙k=o(e̊k) faster than {k22k}, the latter, considered in [47], is actually Q-superquadratic; similarly, {22k/k} is Q-subquadratic. The terminology “Q-sub/superquadratic” appears here to be in disagreement with the real speed.

(c) Determine the Q-orders from 2.7 by using eq. QL.

Example 2.23.

Let {1023(k1/4k)} (C-cubic), {1032(k+1/3k)} (C-quadratic), and {1kk} (Q-superlinear); in fig. 9 we see that QL(k),QΛ(k1) tend to 3,2, and 1, respectively.

Refer to caption
Figure 9: QL(k) and QΛ(k1) computed for three sequences.
Remark 2.24.

Plotting/comparing QL(k) and QΛ(k1) avoids conclusions based on QΛ(k) using information ahead of that from QL(k) (i.e., xk+2 vs.  xk+1).

The calculation of the order may be made easier by using logarithms.

Exercise 2.25.

Let xkx.

(a) If, for some given q1 and A,B>0, one has

Aekek1qek+1Bekek1q,kk0, (5)

show that {xk} has Q-order

λq:=1+1+4q2.

Thus, λ11.618 (the golden ratio), λ2=2, λ3=1+1322.3, etc. Inequality eq. 5 appears in the analysis of the secant method (see theorems 4.18 and 4.22), and it does not necessarily attract C- or even exact Q-orders, even if limkek+1ekek1q=c(0,).

(b) Calculate the Q-order of {xk} if it satisfies (see [94], [78])

Aek2ek1ek+1Bek2ek1.

(c) If {xk} verifies the general relation (see, e.g., [95, Chap. 3], [7], [76], [44], [72])

Aekα0ekqαqek+1Bekα0ekqαq,

determine the condition verified by the Q-order (see [76] for the exact Q-order).

A special case holds when ε=0 in eq. Qε (see also [9], [88], [79], [11], [76]).

Definition 2.26 (see, e.g., [8]).

{xk} has exact Q-order p0 if A,B,k00, s.t.

Aekp0ek+1Bekp0kk0. (Q¯¯)

This case can be characterized in terms of the asymptotic constants Q¯p0,Q¯p0.

Proposition 2.27.

{xk} has exact Q-order p0 iff Q¯p0,Q¯p0 are finite and nonzero,

0<Q¯p0Q¯p0<, (Q¯¯Q)

in which case the constants A,B in eq. Q¯¯ are bounded by

AQ¯p0Q¯p0B,

and these bounds are attained in some special restrictive circumstances.

Example 2.28.

(a) {xkd} in example 2.6 has exact Q-order 2: Q¯2=1c3Q¯2=c3.

(b) xk=22k, k odd, xk=322k133k, k even, has Q(2k)<Q¯2=19, i.e., A<Q¯2.

Remark 2.29.

The sequences {e̊k} and {ek} are asymptotically similar when e̊k=𝒪(ek) and ek=𝒪(e̊k), as k, denoted by ek=Θ(e̊k) (see, e.g., [53], [25, p. 50]).

The exact Q-order may also be expressed as ek+1=𝒪(ekp0) and ekp0=𝒪(ek+1), as k (see [8], [9, p. 2], and also [71]), i.e., ek+1=Θ(ekp0).

The classical Q-order of Ortega and Rheinboldt is the ql below.

Definition 2.30 ([4], [67], [88], [76]).

The lower/upper Q-orders of {xk} are

ql={, if Q¯p=0p1,inf{p[1,):Q¯p=+},resp., qu=sup{p[1,):Q¯p=0}. (6)

When {xk} has Q-order p0>1, the lower and upper orders coincide, ql=qu=p0; otherwise, ql<qu [4]; we will also see this in theorem 2.48 (relation eq. 13).

Next we analyze {xkb} from example 1.8, which has no Q-order, despite it being eq. e˙ke̊k faster than {122k}. We keep in mind that the Q-order does not measure the overall speed, but just compares the consecutive errors.

Example 2.31 ([20]).

Let xkb={22k,keven,32k,kodd, with the Q-profile in fig. 10(a).

{xkb} does not have a Q-order, as ql=log34=1.2<qu=4log43=3.1,, but still it has (exact) R-order 2 (see definitions 2.35 and 2.38). We note the usual statement in this case that “{xkb} converges with Q-order at least ql=1.2 and with R-order at least 2,” that no longer holds in the setting of this paper.

Refer to caption
(a) Q-profile of {xkb} (no Q-order).
Refer to caption
(b) R-profile of {xkb} (exact R-order 2).
Figure 10: 10(a) A Q-profile; 10(b) an R-profile.
Exercise 2.32.

Show that {xkc} has Q¯1=0,Q¯1=+ and determine ql,qu (here we find ql<1, and the Q-profile can be extended, as in [4], for p<1).

Determine the Q-profiles of {xka} and {xkc}.

The inequalities implied by the lower/upper Q-orders are as follows.

Remark 2.33.

If 1<ql<, then ε>0, with 1<qlε, B>0,k00, s.t. [80]

ek+1Bekqlεkk0 (7)

(and relation eq. 6 says that ql is the largest value satisfying the above inequalities).

When 1<qu<, then ε>0, A>0,k10, such that

Aekqu+εek+1kk1. (8)

One can say that the lower Q-order is small when the maximum error reduction w.r.t. the previous step is small (i.e., small exponent ql+ε for ek). Analogously, the upper order is large when the minimum error reduction per step is small.

The exact lower/upper Q-orders ql, respectively, qu are defined if ε=0 in eq. 7eq. 8:

Aekquek+1Bekqlkk2.

The role of qu will be seen in example 2.46.

The lower and upper Q-orders verify the relations (see [76], [20])

ql= Q¯L:=lim infklnek+1lnek, (9)
qu= Q¯L:=lim supklnek+1lnek, (10)

which will be completed in formulae eq. 19, respectively, eq. 20 by some computational variants.

2.3 𝑹-Order 𝒑𝟎>𝟏

The root factors consider some averaged quantities instead of relating the consecutive terms to one another; they are defined for p=1 by

R1(k)=ek1k,k1.

We propose the following terminology (see also [67], [63], [75], [79], [100]):

  • R-sublinear/with no R-order if R¯1=1 (e.g., {1k},{1k}; see [75, Rem. 1.2.40]);

  • R-linear if 0<R¯1<1;

  • at least R-linear if R¯1<1 (e.g., xk=12k, k odd, xk=14k, k even [75, Rem. 1.2.40], or xk=12k, k even, xk=122k, k odd);

  • exact R-linear if 0<R¯1R¯1<1;

  • (at least) R-superlinear if R¯1=0(=R1) (e.g., xk=1kk, k even, xk=xk1, k odd).

The R-orders are alternatively defined by requiring the errors to be bounded by sequences converging to zero with corresponding Q-orders [32, p. 21], [51, Def. 2.1.3], [43, p. 218]:

  • at least R-linear if θ(0,1) and c>0 s.t. ekcθk (k0) [75, (37a)];

  • R-superlinear if θk0 and c>0 s.t. ekci=1kθi [75, (37b)].

Remark 2.34.

(a) At least Q-linear at least R-linear [67, E 9.3-5], [75, Thm.1.2.41], [77], [100], [43, p. 213] (using, e.g., eq. 14); the converse is false (see above or [100]).

(b) Q-superlinear R-superlinear, with the converse again being false (see, e.g., the above example).

When p>1, the root factors are defined by (see [67], [79], [76], [85])

Rp(k)=ek1pk,k0.
Definition 2.35 (see [20]; cf. [76], [7]).

{xk} has R-order p0>1 if any of the equivalent relations hold:121212In order that relation eq. RΛ is properly defined and equivalent to the rest of the following definitions, an additional assumption is required (see [20]): 1<qlqu<+.

limkRp(k)={0,p[1,p0),1,p(p0,+); (R)
limkRL(k)=p0,RL:=|lnek|1k; (RL)
limkRΛ(k)=p0,RΛ:=|lnek+1ek|1k; (RΛ)

or ε>0, A,B>0, 0<η,θ<1, and k00 such that

Aη(p0+ε)kekBθ(p0ε)kkk0. (Rε)
Remark 2.36.

(a) If the R-order p0 exists, it is unique (see also [67, 9.2.3, p. 289]).

(b) We can always set k0=0 in eq. Rε, by choosing smaller A and larger B suitably (cf. [76], where k0=0 was used in definitions).

(c) We prefer here the notation (Rε) instead of (RI,ε) from [20].

For p1 one defines the asymptotic quantities (see [67, sect. 9.2] and [88, p. 85])

R¯p=lim infkRp(k),R¯p=lim supkRp(k)1. (11)

An example of an R-profile is shown in fig. 10(b) for {xkb}, with (exact) R-order 2.

Remark 2.37 ([67, p. 290], [20]).

If R¯p0<1, then ε>0 with R¯p0+ε<1, k00 s.t.

ek(R¯p0+ε)p0kkk0,

while if R¯p0>0, then ε>0 with 0<R¯p0ε, k10 s.t.

(R¯p0ε)p0kekkk1.

The following notion is defined similarly to the exact Q-order.

Definition 2.38 ([76]).

{xk} has exact R-order p0 if A,B>0,0<η,θ<1, s.t.

Aηp0kekBθp0kkk0. (R¯¯)
Example 2.39.

{xkb}, {xkc} have exact R-orders 2: R¯2=13, R¯2=12, and R¯2=15, R¯2=12, respectively.

Exercise 2.40.

The Q-sub-/superquadratic {22k/k}, {k22k} have R2=12, but not exact R-order 2. The Q-sub-/superquadratic {22k/k}, {2k2k} have R2=1, respectively, R2=0, i.e., they are R-sub-/superquadratic, again without having exact R-order 2.

Remark 2.41.

Perhaps a proper definition of Q-superorder p0 would be Q-order p0 with Qp0=0=Rp0, while Q-suborder p0 would be Q-order p0 with Qp0=, Rp0=1.

We characterize the exact R-order in the following result.

Proposition 2.42.

The exact R-order p0 of {xk} is equivalently defined by

0<R¯p0R¯p0<1, (R¯¯R)

which implies the following bounds for η,θ from eq. R¯¯:

ηR¯p0R¯p0θ;

these bounds are attained in some special restrictive circumstances.

Remark 2.43.

A particular instance from eq. R¯¯, i.e., ekBθ2k, was considered as a definition for (at least) R-quadratic convergence, and some computational scientists (who we suspect are not computational optimizers) have misled by simply calling it “quadratic convergence,” leading to the confusion noted in [93].

As the R-orders require weaker conditions than the Q-orders, they may allow nonmonotone errors. This aspect is perhaps not widely known; we have found it pointed out in [65, p. 620, and in the 1st ed.], [51, p. 14], [25, p. 51], and [47].

Clearly, iterative methods with nonmonotone errors are usually not desired.

We can easily find conditions for monotone errors in the case of exact R-order p0>1.

Theorem 2.44 (monotone errors in exact R-orders).

If {xk} obeys eq. R¯¯ and
p0>lnηlnθ(lnR¯p0lnR¯p0)or(p0=lnηlnθandB<A),
then it has strict monotone errors (kk0).

The lower and upper R-orders, rl, respectively, ru, and further notations from [20] are

rl= inf{p[1,):R¯p=1},ru=sup{p[1,):R¯p=0},
R¯L:= lim infk|lnek|1k,R¯L:=lim supk|lnek|1k,
R¯Λ:= lim infk|lnek+1ek|1k,R¯Λ:=lim supk|lnek+1ek|1k.
Remark 2.45.

The lower R-order rl of {xk} was also defined as follows: {x̊k} with C-order p̊0=rl s.t. {xk} is eq. e˙ke̊k faster than {x̊k} [29], [76], [50, Def. 4.1.3].

We now consider some sequences similar to those in example 1.8.

Example 2.46.

(a) Let x̊k=22k and take x˙k={22k,k𝑒𝑣𝑒𝑛,23k,k𝑜𝑑𝑑 (Jay [47]).

Then {x˙k} converges eq. e˙ke̊k faster than {x̊k} and though {x˙k} has neither C-order, nor Q-order, nor R-order (r˙l=2, r˙u=3), {x̊k} has C-order 2.

(b) Extending the above behavior, x˙k={24k,k𝑒𝑣𝑒𝑛,25k,k𝑜𝑑𝑑 has no C-, Q-, or R-order, but it converges [eq. e˙k=𝒪(e̊kα) α>1] faster than x̊k=22k.

(c) x˙k={232k,k𝑒𝑣𝑒𝑛,242k,k𝑜𝑑𝑑 is [eq. e˙k=𝒪(e̊kα) α>1] faster than x̊k=222k; {x˙k} has no C- or Q-order (but has infinite R-order), while {x̊k} has infinite C-order.

Here {x˙k} deserves a closer look, as it is a perfect candidate to support the comments from [93]: indeed, it attains infinite R-order, but its errors are unbounded nonmonotone. Nocedal and Wright [65, p. 620] also noted such possible behavior.

Remark 2.47.

Clearly, statements such as those in the above example can appear only for sequences {x˙k} with Q¯1=0, as this condition allows large upper orders qu. When Q¯1>0, the corresponding sequence cannot converge quickly.

2.4 Relations between the 𝑪-, 𝑸-, and 𝑹-Orders of a Sequence

C-order p0>1 of {xk} implies Q-order p0 and in turn R-order p0 (see 2.22 and 2.31 for converses). We state this below and, as in [4], we use curly braces for equivalent orders.

Theorem 2.48 (see [20]; cf. [76], [4]).

Let xkx and p0>1. Then (see footnote 12)

{C,CQ,Cε}{Q,QL,QΛ,Qε}{R,RL,RΛ,Rε}, (12)

or, in a more generic fashion (i.e., {C}:={C,CQ,Cε}),

{C}{Q}{R}.

Moreover, the following relation holds for the lower and upper orders:

ql=Q¯LR¯L=R¯Λ=rlru=R¯L=R¯ΛQ¯L=qu. (13)

Relation eq. 13 is obtained from the well-known inequalities for positive numbers,

lim infkak+1aklim infk|ak|1klim supk|ak|1klim supkak+1ak, (14)

taking ak=|lnek|; see [76] and [20].

Any inequality from eq. 13 may be strict. Now we see (e.g., in examples 2.31 and 2.49) that the inner inequality may be an equality (i.e., obtain R-order), while one of the outer inequalities may be strict (i.e., no Q-order).

An {xk} with exact R-order τ>1 arbitrarily large and 1<ql (but arbitrarily close) is again suitable for justifying the comments from [93].

Example 2.49 ([67, E 9.3-3], [76]).

Given any numbers 1<s<τ, take 0<θ<1, η=θq with q>1 such that qs>τ. Then xk={θτk,kodd,ητk,keven has exact R-order τ, while ql=Q¯L=τq and qu=Q¯L=τq(>τ), and thus it has no Q-order (in the classical sense from [67] {xk} has Q-order at least τq).

2.5 Comparing the Speeds of Two Sequences

We consider only C-orders.

Comparison 2.50 (higher C-order, faster speed).

If {x̊k},{x˙k} have C-orders 1<p̊0<p˙0, then {x˙k} is [eq. e˙k=𝒪(e̊kα) α>1] faster than {x̊k}.

For the proof, one may use eq. Cε. In [21] we show some simplified proofs.

Remark 2.51.

(a) 2.50 holds regardless of the magnitude of p˙0p̊0.

(b) The C-orders (i.e., sublinear, linear, strict superlinear, ( strict superlinear removed! ), 1<p̊0<p˙0, and infinite) form classes in [eq. e˙k=𝒪(e̊kα) α>1] increasing speed (cf. 2.2).

As seen in previous examples, comparing by [eq. e˙k=𝒪(e̊kα) α>1] is much stronger than by [eq. e˙k=𝒪(e̊kα), α(1,α0]] for some given α0>1.

How similarly do two sequences with the same C-order and identical Qp0 behave?

Comparison 2.52.

{132k} is [eq. e˙k=𝒪(e̊kα), α(1,ln3ln2]] faster than {122k} (recall 1.6). This shows that if {x̊k},{x˙k} have C-order p0 with the same Qp0, this does not necessarily mean that (in the asymptotic range) one has {e˙k}{e̊k}.

Brezinski [11] noted that both {1k} and {1k2} have the same Q1=1 (i.e., C-sublinear order), but quite different speeds.

Remark 2.53.

Assessing the asymptotic constant Qp0 may not make sense when {xk}N,N2, as its definition is norm-dependent [67, E 9.1-2], [20].

3 Computational Versions of the Convergence Orders

The error-based analysis of the orders above is similar in some ways to the local convergence theory for iterative methods from section 4: both assume the existence of the limit/solution x and infer essential properties, even if x is not known and in practice one needs to use quantities based solely on information available at each step k.

Next, we analyze two practical approaches, equivalent to error-based analysis: the replacing of |xxk| by (the absolute value of) either the corrections sk:=|xk+1xk| or the nonlinear residuals |fk|:=|f(xk)|.

We keep the notation from previous work and obtain a rather theoretical setting in this analysis (e.g., sk requiring xk+1); in numerical examples, however, we will use only information available at step k (i.e., xk,|fk|,sk1,xk1,|fk1|,sk2,, etc.).

3.1 Computational Convergence Orders Based on Corrections

When the corrections {sk} converge with lower Q-order ql, then {xk} also converges and attains at least lower R-order ql.

Theorem 3.1 (see [75, Thm. 1.2.42], [43, Lem. 4.5.6]).

Let {xk}. If c(0,1) and k0 with

|xk+1xk|c|xkxk1|(i.e., skcsk1),kk0+1,

then x such that xkx at least R-linearly.

If c>0,p0>1, and k0 s.t.

c1p01sk0<1

and

skcsk1p0,kk0+1,

then x such that xkx with lower R-order at least p0.

The errors and corrections are tightly connected when the convergence is fast.

Lemma 3.2 (Potra–Pták–Walker Lemma; [79, Prop. 6.4], [98]).

xkx Q-superlinearly iff xk+1xk0 Q-superlinearly.

In the case of Q-superlinear convergence, it holds (the Dennis–Moré Lemma [30]) that

limk|xk+1xk||xxk|=1. (15)
Remark 3.3.

(a) As pointed out by Dennis and Moré in [30], eq. 15 alone does not imply Q-superlinear convergence: take x2k1=1k!, x2k=2x2k1, k1, for example.

(b) While the statement and the proofs of this result consider multidimensional spaces and norms (or even metrics) in eq. 15, Brezinski [10] has noted that in the sufficiency can also be proved by using l’Hôpital’s rule for sequences.

As in [4], we use an apostrophe for the resulting quotient factors (p>1):

Qp(k):=sk+1skp=|xk+2xk+1||xk+1xk|p,k0,

and the above lemma shows that C-order p0 C-order p0 with Qp0=Qp0 (see also [40]). The same statement holds for the Q-orders and corresponding upper/lower limits of Qp0(k). The equivalence of the error-based and computational orders ({C,C}, etc.) is stated in section 3.3, which incorporates the nonlinear residuals as well.

The C-, Q-, and Qε-orders are semicomputational: they do not use x but still require p0. Instead, of much practical interest are the (full) computational expressions

limkQL(k)=p0,QL(k):=lnsk+1lnsk, (QL)
limkQΛ(k)=p0,QΛ(k):=lnsk+2sk+1lnsk+1sk, (QΛ)

as they are equivalent to the corresponding error-based orders (see corollary 3.5).

3.2 Computational Convergence Orders Based on Nonlinear Residuals

In section 4 we study the speed of some iterative methods toward a zero x of f. The nonlinear mapping f:D is assumed to be sufficiently many times differentiable and the solution is assumed simple:131313This terminology is used for equations in , while for systems in N the terminology is “nonsingular,” as (the Jacobian) F(x) is a matrix. f(x)0 (this means that f(x)=(xx)g(x) and g(x)0). However, we will consider multiple solutions as well.

The use of the nonlinear residuals for controlling the convergence to x is natural, as they can be seen as “proportional” to the errors: by the Lagrange theorem,

f(x)=f(x+θ(xx))(xx) for some θ(0,1), (16)

which in applied sciences is usually written as

f(x)f(x)(xx)(xx). (17)

However, instead of an asymptotic relation, this is often regarded as an approximate equality. Such an approach is as precise as the idiomatic “the higher the order, the fewer the iterations (needed)”: it may not refer to the asymptotic range.

Quiz 3.4.

If f is a polynomial of degree 2 with f(x)=1 at the root x=0, how large can |f(x)| be when |xx|=0.0001 holds (and then for |xx|=1016)?

Returning to the quotient factors, we use, as in [20], double prime marks,

Qp0′′(k):=|f(xk+1)||f(xk)|p0=:|fk+1||fk|p0,

and notice that if Qp0(0,+), then, by (16),

Qp0′′(k)Qp0|f(x)|p01. (18)

This leads us to the C′′- and Q′′-orders p0>1. For instance,

limkQL′′(k)=p0,QL′′(k):=ln|fk+1|ln|fk|, (QL′′)
limkQΛ′′(k)=p0,QΛ′′(k):=ln|fk+2/fk+1|ln|fk+1/fk|. (QΛ′′)

3.3 The Equivalence of the Error-Based and Computational Orders

This equivalence was given in N [20] (see also [76] and [4]). Here we formulate it as a corollary, since in the three C-orders are equivalent (see [20, Ex. 3.22] for {C′′}{C,C} in N).

Corollary 3.5 (cf. [20]).

Let f be smooth, x simple, xkx, p0>1. Then (see footnote 12)

{C,C,C′′}{Q,Q,Q′′}{R,R,R′′}.

Moreover, the lower/upper orders and asymptotic constants obey

Q¯L =Q¯L=Q¯L′′=qland Q¯L=Q¯L=Q¯L′′=qu,resp., (19)
Q¯qu =Q¯qu=|f(x)|p01Q¯qu′′and Q¯ql=Q¯ql=|f(x)|p01Q¯ql′′. (20)

Consequently, one has C-order p0>1 iff

0<Qp0=Qp0=|f(x)|p01Qp0′′<

and Q-order p0>1 iff

p0=Q¯L=Q¯L=Q¯L′′=ql=Q¯L=Q¯L=Q¯L′′=qu=QΛ=QΛ=QΛ′′.

Relation eq. 19 shows equalities of orders, while eq. 20 shows that of constants.

The equivalence of the computational orders based on the corrections follows from the Potra–Pták–Walker lemma, while eq. 19 and eq. 20 follow from the Dennis–Moré lemma.

The equivalence of the computational orders based on the nonlinear residuals is obtained using the Lagrange theorem, by eq. 16.

4 Iterative Methods for Nonlinear Equations

Unless otherwise stated, the nonlinear mapping f:D is sufficiently smooth and the solution x is simple.

Since an equation f(x)=0 might have a unique, several (even infinite), or no solution at all, the iterative methods usually need an initial approximation close to an x.

Definition 4.1 ([67, Def. 10.1.1]).

A method converges locally to x if VD (neighborhood of x) s.t. x0V, the generated iterations {xk}D and xkx.

4.1 The Newton Method

The first result on the local convergence of a method was given for the Newton iterations (see [67, NR 10.2-1], [101], [23, sect. 6.4]),

xk+1=xkf(xk)f(xk),k=0,1,,x0D, (21)

and it is to due to Cauchy in 1829 ([22], available on the Internet). These iterates are also known as the tangent method, though their initial devising did not consider this geometric interpretation.141414The first such interpretation was given by Mourraille (1768) [23, sect. 6.3].

4.1.1 History

This method can be traced back to ancient times (Babylon and Egypt, ca. 1800 B.C.), as it is conjectured that it was used in the computation (with five correct—equivalent—decimal digits) of 2 (see [3], [49, sect. 1.7], [74], with the references therein, and also [86, p. 39] for an annotated picture). While this is not certain, there is no doubt that Heron used these iterations for a [3].

“Heron of Alexandria (first century A.D.) seems to have been the first to explicitly propose an iterative method of approximation in which each new value is used to obtain the next value,” as noted by Chabert et al. [23, p. 200]. Indeed, Heron has approximated 720 in an iterative fashion by elements corresponding to the Newton method for solving x2720=0 [23, sect. 7.1].

We can also recognize the Newton method for x2A=0 used by Theon of Alexandria (ca. 370 A.D.) in approximating 4500, based entirely on geometric considerations (originating from Ptolemy; see [74]). See [23, sect. 7.2].

So efficient is this method in computing two of the basic floating point arithmetic operations—the square root and division (see 4.4)—that it is still a choice even in current (2021) codes (see, e.g., [62, pp. 138, 124]).

Let us recall some general comments on the subsequent history of this method, thoroughly analyzed by Ypma in [101].

A method algebraically equivalent to Newton’s method was known to the 12th century algebraist Sharaf al-Dīn al-Ṭūsī […], and the 15th century Arabic mathematician Al-Kāshī used a form of it in solving xpN=0 to find roots of N. In western Europe a similar method was used by Henry Briggs in his Trigonometria Britannica, published in 1633, though Newton appears to have been unaware of this […]. [101] (see also [82])

In solving nonlinear problems using such iterates, Newton (1669) and subsequently Raphson (1690) dealt only with polynomial equations,151515x32x5=0 is “the classical equation where the Newton method is applied” [90].,161616Actually, as noted by Ypma, Newton considered such iterations in solving a transcendent equation, namely, the Kepler equation xesinx=M. “However, given the geometrical obscurity of the argument, it seems unlikely that this passage exerted any influence on the historical development of the Newton-Raphson technique in general” [101]. described in [90]. Newton considered the process of successively computing the corrections, which are ultimately added together to form the final approximation (cf. [90]); see algorithm 1.

Algorithm 1 The algorithm devised by Newton (left), and his first example (right)
Let f(x), x0 [f(x)=x32x5, x0=2]

Let g0(s)=f(x0+s) [g0(s)=(2+s)32(2+s)5=s3+6s2+10s1]

For k=0:m1

Compute g~k(s) (the order-one approx. of gk(s)) [g~0(s)=10s1]

Solve for sk in g~k(s)=0 [s0=0.1]

Let gk+1(s)=gk(sk+s) [g1(s)=g0(s0+s)=s3+6.3s2+11.23s+0.061]

xn:=x0+k=0n1sk [xn=2+0.10.06111.23++sn1]

Raphson considered approximations updated at each step, a process equivalent to eq. 21 (see [55], [101], [90]). However, the derivatives of f (which could be calculated with the “fluxions” of that time) do not appear in their formulae, only the first-order approximations from the finite series development of polynomials (using the binomial formula). For more than a century, the belief was that these two variants represented two different methods;171717“Nearly all eighteenth century writers and most of the early writers of the nineteenth century carefully discriminated between the method of Newton and that of Raphson” (Cajori [13]). For a given input (n fixed), the result of LABEL:alg._devised_by_Newton may depend on how accurately the different fractions (e.g., 0.06111.23) are computed. it was Stewart (1745) who pointed out that they are in fact equivalent (see [90]). As noted by Steihaug [90],

It took an additional 50 years before it was generally accepted that the methods of Raphson and Newton were identical methods, but implemented differently. Joseph Lagrange in 1798 derives the Newton-Raphson method […] and writes that the Newton’s method and Raphson’s method are the same but presented differently and Raphson’s method is plus simple que celle de Newton.

Simpson (1740) was the first to apply the method to transcendent equations, using “fluxions.”181818The fluxions x˙ did not represent f, but are “essentially equivalent to dx/dt; implicit differentiation is used to obtain dy/dt, subsequently dividing through by dx/dt as instructed produces the derivative A=dy/dx of the function. […] Thus Simpson’s instructions closely resemble, and are mathematically equivalent to, the use of [eq. 21]. The formulation of the method using the now familiar f(x) calculus notation of [eq. 21] was published by Lagrange in 1798 […]” [101] (see also [90]).,16 Even more important, he extended it to the solving of nonlinear systems of two equations (see [23], [101], [90]),191919The first nonlinear system considered was y+y2x210=0, x+y2+x12=0, with (x0,y0)=(5,6), according to [101] and [90]. subsequently generalized to the usual form known today: F(x)=0 with F:NN, for which, denoting the Jacobian of F at xk by F(xk), one has to solve for sk at each step the linear system

F(xk)s=F(xk). (22)

Cajori [13] noted (see also [101]):

Then appear writers like Euler, Laplace, Lacroix and Legendre, who explain the Newton-Raphson process, but use no names. Finally, in a publication of Budan in 1807, in those of Fourier of 1818 and 1831, in that of Dandelin in 1826, the Newton-Raphson method is attributed to Newton. The immense popularity of Fourier’s writings led to the universal adoption of the misnomer “Newton’s method” for the Newton-Raphson process.

While some authors use “the Newton–Raphson method,”Add footnote: 202020“Who was Raphson?” (N. Bićanić and K. H. Johnson, 1979) has the popular answer “Raphson was Newton’s programmer” (S. Nash, 1992). the name Simpson—though appropriate212121“[…] it would seem that the Newton–Raphson–Simpson method is a designation more nearly representing the facts of history in reference to this method” (Ypma [101]). “I found no source which credited Simpson as being an inventor of the method. None the less, one is driven to conclude that neither Raphson, Halley nor anyone else prior to Simpson applied fluxions to an iterative approximation technique” (Kollerstrom [55]).—is only occasionally encountered (e.g., [59], [34], [41], [39]). In a few cases “Newton–Kantorovich” appears, but refers to the mid-1900 contributions of Kantorovich to semilocal convergence results when F is defined on normed spaces.

4.1.2 Attainable 𝑪-Orders

The method attains at least C-order 2 for simple solutions. Regarding rigorous convergence results, “Fourier appears to have been the first to approach this question in a note entitled Question d’analyse algébrique (1818) […], but an error was found in his formula,” as noted by Chabert et al. [23, sect. 6.4], who continue by noting that

Cauchy studied the subject from 1821 onwards […], but did not give a satisfactory formulation until 1829. […] The concern for clarity and rigour, which is found in all of Cauchy’s work, tidies up the question of the convergence of Newton’s method for the present.

Theorem 4.2 (cf. [22], [23, Thm. II, p. 184]).

If x is simple and f′′(x)0, then the Newton method converges locally, with C-quadratic order:

limk|xxk+1||xxk|2=|f′′(x)2f(x)|=:Q2. (23)
Remark 4.3.

The usual proofs for eq. 23 consider the Taylor developments of f(xk) and f(xk), assuming small errors and neglecting “higher-order terms”:

\[\varepsilon_{k+1}=\varepsilon_k-\frac{f(x^\ast)+f'(x^\ast)\varepsilon_k+\frac{f''(x^\ast)}{2}\varepsilon_k^2}{f'(x^\ast)+f''(x^\ast)\varepsilon_k}\approx\frac{f''(x^\ast)}{2f'(x^\ast)}\,\varepsilon_k^2\qquad(\varepsilon_k:=x_k-x^\ast,\ \text{i.e., signed}).\tag{24}\]
Exercise 4.4.

(a) Write the Newton iterates for computing \(\sqrt2\) (cf. [62, p. 138]).

(b) (Newton everywhere [96, sect. 17.9]) Apply eq. 21 to compute the division \(a/b\) of real numbers in terms of only the operations \(+,-,*\) (cf. [62, p. 124]).
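For illustration, here is a minimal Julia sketch of both parts (the starting values and the step counts are our own choices, not taken from [62]):

```julia
# (a) Newton for f(x) = x^2 - 2: x_{k+1} = (x_k + 2/x_k)/2 converges to sqrt(2).
function newton_sqrt2(x0; steps = 6)
    x = x0
    for k in 1:steps
        x = (x + 2/x) / 2
        println(k, "  ", x)
    end
    return x
end

# (b) Division a/b using only +, -, *: Newton applied to f(x) = 1/x - b gives
# x_{k+1} = x_k*(2 - b*x_k) -> 1/b (for 0 < x_0 < 2/b), and then a/b = a*(1/b).
function newton_div(a, b; x0 = 0.1, steps = 8)
    y = x0
    for k in 1:steps
        y = y * (2 - b*y)
    end
    return a * y
end

newton_sqrt2(1.0)                        # 1.5, 1.41666..., ..., 1.4142135623730951
println(newton_div(3.0, 7.0), " vs ", 3/7)
```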

Remark 4.5.

(a) The usual perception is that the smaller \(Q_2\), the faster the asymptotic speed; expression eq. 23 shows that for the Newton method this is realized with smaller \(|f''(x^\ast)|\) and larger \(|f'(x^\ast)|\), which means that near \(x^\ast\) the graph of \(f\) resembles a line (\(|f''|\) small) having a large angle with \(Ox\) (the line is not close to \(Ox\)). (Footnote 22: In the opposite situation, when the angle is zero (\(x^\ast\) is multiple), the convergence is only linear.) While this interpretation holds locally, for the asymptotic range of the Newton iterates, it is worth mentioning that the larger the region with such a property, the larger the attraction set (see also eq. 32), and the better conditioned the problem of solving \(f(x)=0\) (see section 4.1.6).

We note that taking \(2f\) instead of \(f\), despite doubling \(f'(x^\ast)\), does not reduce \(Q_2\), leading in fact to the same iterates (this property is called affine invariance; see [34]).

(b) Of course, it would be of major interest to state a conclusion like "the smaller \(Q_2\), the fewer the iterates needed." However, even if supported by a number of numerical examples, such a statement should be treated with great care. The reason is simple: we do not compare similar objects (\(Q_2\) is an asymptotic constant, referring to \(k\rightarrow\infty\), while the desired conclusion refers to a finite number of steps). Moreover, when we say "the smaller \(Q_2\)," this means that we consider different mappings \(f\); even if they have the same \(x^\ast\), the attraction sets and the dynamics of the iterates may be different.

The role of the asymptotic constants will be analyzed in the forthcoming work [21].

Let us analyze the expectation that this method behaves similarly for the same Q2.

Example 4.6.

(a) Let a be given and

\[f(x):=x+x^2+ax^3=0,\quad\text{with } x^\ast=0.\]

We get \(f'(0)=1\) and \(f''(0)=2\), so by eq. 23, \(Q_2=1\) (for any given \(a\)).

The full Taylor series gives the exact relation for the signed errors from eq. 24:

\[\varepsilon_{k+1}=\frac{1+2a\varepsilon_k}{1+2\varepsilon_k+3a\varepsilon_k^2}\,\varepsilon_k^2.\tag{25}\]

Take \(x_0=0.0001\,(=e_0)\) fixed. By eq. 24, we should obtain \(|\varepsilon_1|=e_1\approx e_0^2=10^{-8}\) for any \(a>0\). In table 1 we see this does not hold when \(a\) has large values.

As \(|a|\) grows, despite \(e_0\) being "small," the error is in fact not "squared," as suggested by eq. 24. Indeed, when \(|a|\) is very large, the terms neglected in eq. 24 (corresponding to \(f'''(x^\ast)=6a\)) are actually non-negligible, despite their containing higher powers of the error.

(b) Taking \(x_0=0.0001\) for \(f\) as above and then for \(f(x)=x+2x^2\) shows that \(Q_2\) and the number of iterates needed (for the same accuracy) are not in direct proportion.

Table 1: \(e_1=|\varepsilon_1|\) computed in two equivalent ways, for different \(a\)'s (binary64, Julia).

a       e_1 (= |x_1|)             e_1 by eq. 25
1       9.999999700077396e-9      9.999999700059996e-9
10      1.0017993395925492e-8     1.0017993395922797e-8
10^5    2.0933014354066327e-7     2.093301435406699e-7
10^6    1.9510774606872468e-6     1.951077460687245e-6
10^7    1.5389940009229362e-5     1.5389940009229348e-5
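Table 1 may be reproduced by a few lines of Julia (a sketch based on the formulas of example 4.6; the printing format differs from that of the table):

```julia
# One Newton step for f(x) = x + x^2 + a*x^3 from x0 = 1e-4, with e1 = |x1|
# computed directly and via the exact error relation (25).
f(x, a)  = x + x^2 + a*x^3
df(x, a) = 1 + 2*x + 3*a*x^2

x0 = 1e-4
for a in (1e0, 1e1, 1e5, 1e6, 1e7)
    x1  = x0 - f(x0, a) / df(x0, a)
    e25 = (1 + 2*a*x0) / (1 + 2*x0 + 3*a*x0^2) * x0^2   # eq. (25)
    println(a, "   ", abs(x1), "   ", abs(e25))
end
```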
Remark 4.7.

Igarashi and Ypma [46] noticed that a smaller Q2 does not necessarily attract fewer iterates for attaining a given accuracy. However, the numerical examples used to support this claim were based not on revealing the individual asymptotic regime of the sequences, but on the number of iterates required to attain a given ( binary64 ) accuracy.

If \(f''(x^\ast)=0\), then eq. 23 implies Q-superquadratic order, but actually the order is higher: it is given by the first index \(\geq2\) of a nonzero derivative at the solution.

Theorem 4.8 (see, e.g., [37]).

Let \(x^\ast\) be simple. The Newton method converges locally with C-order \(p_0\), \(p_0\geq2\), iff

\[f^{(p_0)}(x^\ast)\neq0\quad\text{and}\quad f''(x^\ast)=\cdots=f^{(p_0-1)}(x^\ast)=0,\]

in which case

\[Q_{p_0}=\frac{p_0-1}{p_0!}\,\Big|\frac{f^{(p_0)}(x^\ast)}{f'(x^\ast)}\Big|.\]

4.1.3 Examples for the Attained C-Orders

We consider decreasing orders.

(a) Infinite C-order. If \(f(x)=ax+b\), \(a\neq0\), then \(x_1=x^\ast\) \(\forall x_0\).

(b) Arbitrary natural C-order \(p_0\geq2\) (see remark 4.13 for noninteger \(p_0\)).

Example 4.9.

Let \(p\geq2\) be a given natural number and consider

\[f(x)=x+x^p,\quad x^\ast=0,\quad p=2,3,4\ \text{[83]}.\]

By theorem 4.8, we obtain local convergence with C-order \(p\) and \(Q_p=p-1\).

Let us first deal with \(p=2\). In double precision, for \(x_0=0.46\) we obtain six nonzero iterates (shown truncated in table 2). In analyzing \(Q_L(k-1)\) (shown with the last six decimal digits truncated), we note that \(Q_L(4)\) is a better approximation of the order than the last one, \(Q_L(5)\). This phenomenon may appear even if we increase the precision (see Figure 12).

When \(x_0=0.47\) and \(0.45\) we obtain five nonzero iterates (see exercise 4.11), and \(Q_L(k)\) is increasingly better.

Table 2: Newton iterates for \(x+x^2=0\) (binary64, Julia).

k   x_k (x0=0.47)  Q_L(k-1)       x_k (x0=0.46)  Q_L(k-1)       x_k (x0=0.45)  Q_L(k-1)
0   0.47                          0.46                          0.45
1   1.1e-1         2.877706159    1.1e-1         2.840052802    1.0e-1         2.803816781
2   1.0e-2         2.094428776    9.9e-3         2.090320979    9.3e-3         2.086305525
3   1.0e-4         2.004592994    9.7e-5         2.004275304    8.6e-5         2.003972042
4   1.1e-8         2.000023942    9.4e-9         2.000021019    7.4e-9         2.000018386
5   1.4e-16        2.000000001    8.8e-17        2.000000001    5.4e-17        2.000000001
6   0                             1.2e-32        1.987981962    0
7                                 0

For \(p=3\) and \(4\), we obtain four nonzero iterates for the three choices of \(x_0\), and therefore a single meaningful term in computing \(\{Q_\Lambda(k)\}\).

Remark 4.10.

The higher the convergence order, the faster \(\mathrm{fl}(x^\ast)\) may be reached, and we might end up with fewer distinct terms.

Higher precision is needed in this study. We have chosen to use the Julia language here, which allows not only quadruple precision (128-bit, compliant with IEEE 754-2008) but arbitrary precision as well (by using the setprecision command). Some examples are posted at https://github.com/ecatinas/conv-ord and may be easily run online (e.g., by IJulia notebooks running on Binder). (Footnote 23: Such examples may also be performed in MATLAB [58] (by choosing the Advanpix [1] toolbox, which handles arbitrary precision), or in other languages.)
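As a sketch of such a study (assuming, as our reading of the notation, the error-based computational orders \(Q_L(k-1)=\ln e_k/\ln e_{k-1}\) and \(Q_\Lambda(k-2)=\ln(e_k/e_{k-1})/\ln(e_{k-1}/e_{k-2})\), which use only information available at step \(k\)):

```julia
# Newton iterates for f(x) = x + x^p in 500-bit arithmetic, together with the
# computational convergence orders; both sequences approach p = 2.
setprecision(500) do
    p = 2
    x = BigFloat("0.46")
    e = BigFloat[abs(x)]                      # e[k+1] stores e_k = |x_k - 0|
    for k in 1:10
        x -= (x + x^p) / (1 + p*x^(p - 1))    # Newton step
        push!(e, abs(x))
    end
    for k in 3:length(e)
        QL = log(e[k]) / log(e[k-1])
        QA = log(e[k] / e[k-1]) / log(e[k-1] / e[k-2])   # Q_Lambda
        println(k - 1, "   ", Float64(QL), "   ", Float64(QA))
    end
end
```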

In fig. 11 we show the resulting \(Q_L(k-1)\), \(Q_\Lambda(k-2)\), \(Q'_L(k-2)\), \(Q'_\Lambda(k-3)\) (in order to use only the information available at step \(k\)).

We see that all four (computational) convergence orders approach the corresponding p’s, but we cannot clearly distinguish their speed.

In fig. 12 we present \(|Q_\Lambda(k-2)-p|\) and \(|Q_L(k-1)-p|\).

Figure 11: Convergence orders: the Newton iterates for \(f_{[p]}(x)=x+x^p=0\), \(p=2,3,4\).
Figure 12: \(|Q_\Lambda(k-2)-p|\), \(|Q_L(k-1)-p|\) (Newton iterates, \(f_{[p]}(x)=x+x^p=0\)).
Exercise 4.11.

Examining the numerator and the denominator in the Newton corrections, give the two different reasons why \(x_7\) is \(0\) in table 2 (when \(x_0=0.47\) and \(0.45\)). When is realmin or the machine epsilon (or both) too large?

(c) C-order \(p\in(1,2)\). The C-order 2 is lost with the smoothness of \(f\) (when \(f''\) does not exist).

Example 4.12 ([67, E 10.2-4], [31]).

The Newton method has C-order 1+α for

\[f(x)=x+|x|^{1+\alpha},\quad\alpha\in(0,1)\ \text{given},\quad x^\ast=0.\]
Remark 4.13.

Modifying the above example by taking \(f(x)=x+|x|^{p_0}\) shows that the Newton method may in fact attain any real nonnatural order \(p_0\in(2,+\infty)\).

(d) C-linear order. This appears for a multiple solution \(x^\ast\), which is implicitly assumed to be isolated (\(f(x)\neq0\) \(\forall x\neq x^\ast\) in a neighborhood of \(x^\ast\)). (Footnote 24: Otherwise, an iterative method would not be able to converge to a specific solution we were interested in (take, e.g., the constant function \(f(x)=0\), with \({\mathbb R}\) as the solution set, or \(f(x)=x\sin\frac1x\), \(x\neq0\), \(f(0)=0\).)) Indeed, if the multiplicity of \(x^\ast\) is \(q\),

\[f(x^\ast)=f'(x^\ast)=\cdots=f^{(q-1)}(x^\ast)=0,\quad f^{(q)}(x^\ast)\neq0\tag{26}\]

(i.e., \(f(x)=(x-x^\ast)^q g(x)\), \(g(x^\ast)\neq0\)), then (see, e.g., [67, E 10.2-8], [36], [41])

\[Q_1=1-\frac1q.\tag{27}\]

If q is known, one may use the Schröder formula [68] to restore the C-order 2:

\[x_{k+1}=x_k-q\,\frac{f(x_k)}{f'(x_k)},\quad\text{with } Q_2=\frac{|f^{(q+1)}(x^\ast)|}{q(q+1)\,|f^{(q)}(x^\ast)|}\ \text{(see, e.g., [68, (8.13)])}.\]
Exercise 4.14.

(a) Calculate the Newton and the Schröder iterates for \(f(x)=x^2\).

(b) The Schröder iterates are the Newton iterates for \(f(x)^{1/q}=(x-x^\ast)\,g(x)^{1/q}\).

Approximation of the unknown q is surveyed in [60, sect. 7.9]; see also [81, sect. 6.6.2].
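A minimal Julia sketch contrasting the two iterations on a double root (the test function \(f(x)=x^2(1+x)\) is our own choice):

```julia
# x* = 0 has multiplicity q = 2: Newton is only linear (Q_1 = 1/2), while the
# Schroder formula x_{k+1} = x_k - q*f(x_k)/f'(x_k) restores the C-order 2.
f(x)  = x^2 * (1 + x)
df(x) = 2*x + 3*x^2

let x = 0.4, y = 0.4
    for k in 1:8
        x -= f(x) / df(x)           # Newton: errors roughly halve
        y -= 2 * f(y) / df(y)       # Schroder with q = 2: errors square
        println(k, "   ", x, "   ", y)
    end
end
```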

(e) C-sublinear order.

Exercise 4.15.

Let \(f(x)=e^{-1/x^2}\), \(x\neq0\), \(f(0)=0\). Show that the Newton method converges locally to \(x^\ast=0\), with \(Q_1(\{x_k\})=1\), i.e., C-sublinearly.

4.1.4 Convexity

The influence of x0 and the convexity of f were first analyzed by Mourraille (1768) [23, sect. 6.3], in an intuitive fashion.

In 1818, Fourier ([38], available on the Internet) gave the condition

\[f(x_0)\,f''(x_0)>0,\tag{28}\]

which ensures the convergence of the Newton method, provided \(f\) maintains its monotonicity and convexity on the interval determined by \(x^\ast\) and \(x_0\). This condition can lead to one-sided convergence intervals (i.e., not necessarily centered at \(x^\ast\)).

Clearly, the method may converge even on the whole axis, a result which is more interesting to consider in \({\mathbb R}^N\) (cf. the Baluev theorem [66, Thm. 8.3.4]).

Remark 4.16.

(a) By eq. 24, the signs of the errors are periodic, with period either 1 (i.e., monotone iterates) or 2 (\(\forall k\geq k_0\)).

(b) The Fourier condition is sufficient for convergence but not also necessary; the iterates may converge locally (by theorem 4.2) even when eq. 28 does not hold: take, e.g., \(f(x)=\sin x\), \(x^\ast=0\), with the signs of the errors alternating at each step (\(\forall k\geq k_0\)).

4.1.5 Attraction Balls

Estimations for \(B_r(x^\ast)=\{x\in D:|x-x^\ast|<r\}=(x^\ast-r,\,x^\ast+r)\) as \(V\) in definition 4.1 are usually given using the condition

\[|f'(x)-f'(y)|\leq L|x-y|\quad\forall x,y\in D.\tag{29}\]

The Lipschitz constant \(L\) of \(f'\) measures the nonlinearity of \(f\) (i.e., how much the graph of \(f\) is bent); its exact bound shows this is in accordance with remark 4.5, since

\[L\geq\sup_{x\in D}|f''(x)|.\tag{30}\]

The usual assumptions also require

\[|f'(x)^{-1}|\leq\beta,\tag{31}\]

and the Dennis–Schnabel [32], Rheinboldt [84], and Deuflhard–Potra [35] estimations are, respectively,

\[r=\frac1{2\beta L},\quad r=\frac2{3\beta L},\quad\text{and}\quad r=\frac2\omega,\tag{32}\]

where \(\omega\) satisfies \(|f'(x)^{-1}|\,|f'(x+t(y-x))-f'(x)|\leq t\,\omega\,|y-x|\) \(\forall t\in(0,1)\), \(\forall x,y\in D\), i.e., it can be viewed as a combination of both \(L\) and \(\beta\) (see also [2]).

We note that eq. 32 estimates a small r for example 4.6, as L is large.

4.1.6 Floating Point Arithmetic

Curious behavior may arise in this setting.

The derivative \(f'\) measures the sensitivity in computing \(f(x)\) when using \(\tilde x\approx x\) instead: \(|f(x)-f(\tilde x)|\approx|f'(x)|\,|x-\tilde x|\). This shows that for large \(|f'(x)|\), the absolute error in \(x\) may be amplified. When the relative error in \(x\) is of interest, the condition number becomes \(|f'(x)|\,|x|\): \(|f(x)-f(\tilde x)|\approx|f'(x)|\,|x|\cdot\frac{|x-\tilde x|}{|x|}\).

For the relative errors of f, the condition numbers are given by [69], [42, sects. 1.2.6 and 5.3]

\[\frac{|f(x)-f(\tilde x)|}{|f(x)|}\approx\frac{|x\,f'(x)|}{|f(x)|}\cdot\frac{|x-\tilde x|}{|x|}\quad\text{or}\quad\frac{|f(x)-f(\tilde x)|}{|f(x)|}\approx\frac{|f'(x)|}{|f(x)|}\,|x-\tilde x|,\]

which show that the relative errors in \(f(x)\) may be large near a solution \(x^\ast\) (where \(f(x^\ast)=0\)).

However, as shown in [42, sects. 1.2.6 and 5.3] and [81, Ex. 2.4, sect. 6.1], the problem of solving an equation has the condition number \(1/|f'(x^\ast)|\), which is the inverse of the condition number for computing \(f(x^\ast)\).

When \(|f'(x^\ast)|\) (i.e., \(|f'(x_k)|\)) is not small, Shampine, Allen, and Pruess argue that the Newton method is "stable at limiting precision" [89, p. 147], as the Newton corrections \(f(x_k)/f'(x_k)\) are small.

When \(|f'(x^\ast)|\) is small (e.g., \(x^\ast\) is multiple), notable cancellations may appear in computing \(f\) and prevent the usual graphical interpretation. The (expanded) polynomial \((x-2)^9\) of Demmel [28, p. 8] computed in double precision is well known.

Even simpler examples exist: \(f(x)=x^3-2x^2+\frac43x-\frac8{27}\) \(\big(=(x-\frac23)^3\big)\) (Sauer [86, p. 45]) and \(f(x)=x^3e^{3x}-3x^2e^{2x}+3xe^x-1\) \(\big(=(xe^x-1)^3\big)\), \(x^\ast\approx0.56\) (Shampine, Allen, and Pruess [89, Fig. 1.2, p. 23]).
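The cancellation is easy to observe in a few lines of Julia (a minimal sketch for the first example; the evaluation points are our own choices):

```julia
# Near the triple root x* = 2/3, the expanded form loses all correct digits in
# binary64, while the factored form behaves as expected.
expanded(x) = x^3 - 2*x^2 + (4/3)*x - 8/27
factored(x) = (x - 2/3)^3

for x in (2/3 - 1e-5, 2/3, 2/3 + 1e-5)
    println(expanded(x), "   ", factored(x))
end
```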

\(|f'|\) not being small is no guarantee that things won't get weird for other reasons.

Quiz 4.17 (Wilkinson; see [89, Ex. 4.3]).

Let \(f(x)=x^{20}-1\), \(x^\ast=1\), and (here) \(x_0=10\). Why, in double precision, are the (rounded) Newton iterates \(x_1=9.5\), \(x_2=9.0\), \(x_3=8.6\), \(x_4=8.1\), etc.? Take \(\varepsilon_0=\pm0.001\) and compute \(x_1\) and \(e_1\).

4.1.7 Nonlinear Systems in \({\mathbb R}^N\)

When Newton-type methods are used in practice to solve nonlinear systems in \({\mathbb R}^N\), obtaining high Q-orders becomes challenging. As the dimension may be huge (\(N=10^6\) and even higher), the main difficulty resides—once the representation of the data in memory is solved (Footnote 25: One can use Newton–Krylov methods, which do not require storage of the matrices \(F'(x)\), but instead the computation of matrix-vector products \(F'(x)v\), further approximated by finite differences [12].)—in accurately solving the resulting linear systems eq. 22 at each step. In many practical situations, superlinear convergence may be considered satisfactory.

The most common setting allows the systems eq. 22 to be only approximately solved (the corrections verify \(F'(x_k)s_k=-F(x_k)+r_k\), \(r_k\neq0\)), leading to the inexact Newton method; its superlinear convergence and the orders \(p\in(1,2]\) were characterized by Dembo, Eisenstat, and Steihaug in [27], the approach being easily extended by us to further sources of perturbation in [14], [17].
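A minimal Julia sketch of this setting (the 2-by-2 system and the residual model are our own illustrative choices, not taken from [27]):

```julia
# Inexact Newton: F'(x_k) s_k = -F(x_k) + r_k, with ||r_k|| <= eta*||F(x_k)||.
using LinearAlgebra

F(x) = [x[1] + x[1]^2 - x[2], x[2] + x[1]*x[2]]     # hypothetical F, solution x* = 0
J(x) = [1+2*x[1] -1.0; x[2] 1+x[1]]                 # its Jacobian F'(x)

let x = [0.1, 0.2], eta = 1e-4
    for k in 0:7
        s = -(J(x) \ F(x))                           # exact Newton step
        s += eta * norm(F(x)) * normalize(randn(2))  # residual-sized perturbation r_k
        x += s
        println(k, "   ", norm(x))                   # the error, since x* = 0
    end
end
```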

The quasi-Newton methods form another important class of Newton-type iterations; they consider approximate Jacobians, in order to produce linear systems that are easier to solve [31]. Their superlinear convergence was characterized by Dennis and Moré [30].

The inexact and quasi-Newton methods are in fact tightly connected models [17].

4.2 The Secant Method

This is a two-step method, requiring \(x_0\neq x_1\in D\),

\[x_{k+1}=x_k-\frac{f(x_k)}{\frac{f(x_k)-f(x_{k-1})}{x_k-x_{k-1}}}=\frac{x_{k-1}f(x_k)-x_kf(x_{k-1})}{f(x_k)-f(x_{k-1})},\quad k=1,2,\ldots,\tag{33}\]

but neither of the above formulae is recommended for programming (see section 4.2.5).

4.2.1 History

Though nowadays the secant method is usually seen as a Newton-type method with the derivatives replaced by finite differences (i.e., a quasi-Newton method), or as an inverse interpolatory method, this is not the way it first appeared. The roots of this method can be traced back to approximately the same time as those of the Newton method, the 18th century B.C., being found on Babylonian clay tablets and on the Egyptian Rhind Papyrus [23], [70], when it was used as a single iteration for solving linear equations. (Footnote 26: Such equations appear trivial these days, but their solution required ingenuity back then: the decimal system and the zero symbol were used much later, in the 800s (see, e.g., [61, pp. 192–193]).) During those times and even later, the terminology for such a noniterated form was "the rule of double false position" (see, e.g., [70]); indeed, two initial "false positions" \(x_0,x_1\) yield in one iteration the true solution of a linear equation.

Heron of Alexandria [33] approximated the cubic root of 100 by this method (as revealed by Luca and Păvăloiu [57], it turns out that it can be seen as applied equivalently to \(x^2-\frac{100}x=0\)).

While reinvented in several cultures (in China, (Footnote 27: Jiuzhang Suanshu (Nine Chapters on the Mathematical Art) [23, sect. 3.3].) India, (Footnote 28: For example, in the work of Mādhava's student Parameśvara [73], [74], [23, sect. 3.4].) Arab countries, and Africa, (Footnote 29: al-Ṭūsī, al-Kāshī, al-Khayyām (see [82], [101], [23, sect. 3.5]).) sometimes even for 2-by-2 systems of equations), its use as an iterative method was first considered by Cardano (1545), who called it Regula Liberae Positionis [70] (or regula aurea [82, note 5, p. 200]).

Viète also used a secant-type method, and it appears that Newton (around 1665) independently rediscovered the secant method [70], [101].

4.2.2 Attainable Q-Orders

The standard convergence result shows the usual convergence with C-order given by the golden ratio \(\lambda_1:=\frac{1+\sqrt5}2\approx1.618\); as noted by Papakonstantinou and Tapia [70], this was first obtained by Jeeves in 1958 (in a technical report, published as [48]).

Theorem 4.18 ([68, Chap. 12, sects. 11–12]).

If \(x^\ast\) is simple and \(f''(x^\ast)\neq0\), then the secant method has local convergence with C-order \(\lambda_1=\frac{1+\sqrt5}2\):

\[\lim_{k\rightarrow\infty}\frac{|x^\ast-x_{k+1}|}{|x^\ast-x_k|^{\lambda_1}}=\Big|\frac{f''(x^\ast)}{2f'(x^\ast)}\Big|^{\lambda_1-1}=:Q_{\lambda_1}.\tag{34}\]
Remark 4.19.

The Q-order \(\lambda_1\) follows by considering the signed errors (see, e.g., [48])

\[\varepsilon_{k+1}=\frac{[x_{k-1},x_k,x^\ast;f]}{[x_{k-1},x_k;f]}\,\varepsilon_k\varepsilon_{k-1},\quad k\geq1\quad(\varepsilon_k:=x_k-x^\ast),\tag{35}\]

and then 2.25. Indeed, the divided differences converge, \([x_{k-1},x_k;f]\rightarrow f'(x^\ast)\), \([x_{k-1},x_k,x^\ast;f]\rightarrow\frac{f''(x^\ast)}2\), and, denoting \(l=\big|\frac{f''(x^\ast)}{2f'(x^\ast)}\big|\), it follows that in eq. 5 we may take \(A=l-\varepsilon\) and \(B=l+\varepsilon\) for some small \(\varepsilon\).

Quiz 4.20.

Find the incompleteness in proving the C-order λ1 in eq. 34 by arguing that

\[(l-\varepsilon)\Big(\frac{e_{k-1}^{\lambda_1}}{e_k}\Big)^{\frac1{\lambda_1}}=(l-\varepsilon)\,\frac{e_ke_{k-1}}{e_k^{\lambda_1}}\leq\frac{e_{k+1}}{e_k^{\lambda_1}}\leq(l+\varepsilon)\,\frac{e_ke_{k-1}}{e_k^{\lambda_1}}=(l+\varepsilon)\Big(\frac{e_{k-1}^{\lambda_1}}{e_k}\Big)^{\frac1{\lambda_1}},\]

using \(\frac1{\lambda_1}+1-\lambda_1=0\), and that therefore \(Q_{\lambda_1}\) satisfies \(Q_{\lambda_1}=l\,Q_{\lambda_1}^{1-\lambda_1}\), i.e., is given by eq. 34.

Remark 4.21.

By eq. 35, the signs of the errors (and of \(f(x_k)\) as well) are periodic, with period at most 3 (\(\forall k\geq k_0\)), as noted by Neumaier [64, Cor. 5.1.3].

Higher Q- (but not necessarily C-)orders may be attained.

Theorem 4.22 (Raydan [83]).

If \(x^\ast\) is simple and \(q\geq1\) is the first index such that \(f^{(q+1)}(x^\ast)\neq0\), then the secant method converges locally with Q-order given by

\[\lambda_q=\frac{1+\sqrt{1+4q}}2\]

and obeys

\[\lim_{k\rightarrow\infty}\frac{e_{k+1}}{e_ke_{k-1}^q}=\frac{|f^{(q+1)}(x^\ast)|}{(q+1)!\,|f'(x^\ast)|}.\]

Moreover, specific results hold in the following cases, noting the attained orders:

  • q=1: C-order λ1;

  • q=2: exact Q-order λ2=2 (but not necessarily C-order 2);

  • q=3: Q-order \(\lambda_3\approx2.3\) (but not necessarily exact Q-order \(\lambda_3\)).

Remark 4.23.

When \(q\geq2\), the secant iterates may have only a Q- but no C-order, despite \(\frac{e_{k+1}}{e_ke_{k-1}^q}\) converging to a finite nonzero limit (see 2.25).

Example 4.24 (cf. [83]).

Let \(f_{[q+1]}(x)=x+x^{q+1}\), \(x_0=0.5\), \(x_1=\frac1{(q+1)\lambda_q}\), \(q=1,2,3\), and compute the secant iterates (by eq. 36).

The assertions of theorem 4.22 are verified in this case, since \(|Q_L(k-1)-\lambda_q|\) and \(|Q_\Lambda(k-2)-\lambda_q|\), computed using setprecision(500), tend to \(0\) (see fig. 13).

Figure 13: \(|Q_\Lambda(k-2)-\lambda_q|\), \(|Q_L(k-1)-\lambda_q|\) (secant iterates, \(f(x)=x+x^{1+q}=0\)).

In fig. 14 we plot \(Q_{\lambda_q}(k)\) for \(q=1,2\), and \(3\).

We see that \(Q_{\lambda_1}(k)\) tends to \(1\).

For \(q=2\), \(Q_{\lambda_2}(k)\) oscillates (suggesting \(\underline{Q}_{\lambda_2}\approx0.57\), \(\bar Q_{\lambda_2}\approx1.74\)).

For \(q=3\), \(\bar Q_{\lambda_3}(k)\) tends to infinity (as rigorously proved by Raydan for \(q\geq3\)). For this particular data it appears that \(\underline{Q}_{\lambda_3}=0\) (which remains to be rigorously proved). In order to see this more clearly, we have also used further increased precision (setprecision(5000)), providing confirmation.

When \(q=3\), \(\bar Q_{\lambda_3}=\infty\) or \(\underline{Q}_{\lambda_3}=0\) do not necessarily hold for all \(x_0,x_1\): they do not hold (numerically) for \(x_0=0.5\), \(x_1=0.25\) (but they do hold, e.g., for \(x_0=1\), \(x_1=0.5\)).

The higher the order, the faster the iterates attain \(\mathrm{fl}(x^\ast)\). We see in fig. 14 that setprecision(500) allows 14 iterates when \(q=1\) and only 7 iterates when \(q=3\) (increased to 11 for setprecision(5000)).

Figure 14: \(Q_{\lambda_q}(k)\) (secant iterates, \(f_{[1+q]}(x)\), setprecision(500), resp. (5000)).

4.2.3 Multiple Solutions

Not only the high orders, but even the very local convergence may be lost if \(x^\ast\) is not simple.

Example 4.25 (cf. [41]).

If \(f(x)=x^2\), \(x_0=-\varepsilon\), \(x_1=\varepsilon\), then \(x_2\) cannot be computed by eq. 36 (\(\forall\varepsilon>0\)).

Assuming that the secant method converges, only a linear rate is obtained [95]; Díez [36] showed that if the multiplicity of \(x^\ast\) in eq. 26 is

  • \(q=2\), then \(Q_1=\frac1{\lambda_1}=\lambda_1-1\approx0.618\);

  • \(q\geq3\), then \(Q_1\) is the solution \(0<\lambda<1\) of \(\lambda^q+\lambda^{q-1}-1=0\).

Also, if the iterates verify at each step \(|f(x_k)|<|f(x_{k-1})|\) (by swapping), one obtains superlinear convergence for the root secant method of Neumaier [64, p. 241]:

\[x_{k+1}=x_k-\frac{x_k-x_{k-1}}{1-\sqrt{f(x_{k-1})/f(x_k)}}.\]

For other references on this topic, see [60, sect. 7.9] and [91, Ex. 1.9].

4.2.4 Attraction Balls

Estimates for the radius of an attraction ball were given by Liang [56]: if eq. 31 and \(|[x,y,z;f]|\leq K\) \(\forall x,y,z\in D\) hold, then

\[r=\frac1{3\beta K}.\]

When using a specific \(x_1\) in the secant method (\(x_0\in B_r\) arbitrary, \(x_1=x_0-\frac{f(x_0)}{f'(x_0)}\)), estimates similar to eq. 32 were obtained: if eqs. 30 and 31 hold, then

\[r=\frac2{3\beta L}\quad\text{(see [56])}.\]

4.2.5 Floating Point Arithmetic

The two expressions in eq. 33 are not recommended for programming, the first in an ad litteram fashion and the second at all; indeed, the first one should be written instead as

\[x_{k+1}=x_k-\frac{f(x_k)(x_k-x_{k-1})}{f(x_k)-f(x_{k-1})},\quad k=1,2,\ldots,\tag{36}\]

while the second one can lead, close to \(x^\ast\), to cancellations when \(f(x_k)\) and \(f(x_{k-1})\) have the same sign [26, p. 228], [60, p. 4].
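For illustration, eq. 36 programmed in Julia (a minimal sketch; the test function and the starting points are our own choices):

```julia
# The secant method in the form (36).
function secant(f, x0, x1; steps = 8)
    for k in 1:steps
        x0, x1 = x1, x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        println(k, "   ", x1)
    end
    return x1
end

secant(x -> x + x^2, 0.47, 0.35)    # tends to x* = 0 with order about 1.618
```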

Further equivalent formulae are

\[x_{k+1}=x_k-\frac{x_k-x_{k-1}}{1-\frac{f(x_{k-1})}{f(x_k)}},\quad k=1,2,\ldots,\]

and (Vandergraft [97, p. 265])

\[x_{k+1}=x_k-\Big[(x_{k-1}-x_k)\,\frac{f(x_k)}{f(x_{k-1})}\Big]\Big/\Big(1-\frac{f(x_k)}{f(x_{k-1})}\Big),\]

which requires \(|f(x_k)/f(x_{k-1})|<1\) at each step (ensured by swapping the iterates).

It is worth noting, however, that swapping the consecutive iterates in order to obtain \(|f(x_k)|<|f(x_{k-1})|\) leads to iterates that may have a different dynamic than the standard ones.

The secant method may not be stable at limiting precision, even for large \(|f'(x^\ast)|\) (Shampine, Allen, and Pruess [89, p. 147]), and higher precision may be needed.

4.3 Successive Approximations for Fixed Point Problems

We consider a sufficiently differentiable mapping \(g:D\rightarrow D\), \(D\subseteq{\mathbb R}\), and the fixed point problem

g(x)=x.

A solution \(x^\ast\) is called a fixed point of \(g\); \(x^\ast\) is further called an attraction fixed point when one has local convergence of the successive approximations

\[x_{k+1}=g(x_k),\quad k\geq0.\]

4.3.1 History

Assuming that the computation of \(\sqrt2\) with five (decimal) digits would have been highly improbable in the absence of an iterative method, it is likely that the successive approximations date back to at least 1800 B.C. (considering the Newton iterates as particular instances of successive approximations).

Regarding the first use of iterates not connected to Newton- or secant-type iterates, they seem to have appeared in the fifth and sixth centuries in India (see Plofker [74]; in this paper the author describes as an example the Brahmasphutasiddhanta of Brahmagupta from the seventh century).

Later occurrences are mentioned for Ibn al-Banna as cited in [23, sect. 7.3], al-Ṭūsī [23, sect. 7.4], Viète [23, sect. 7.5], Parameśvara (aviśeṣa) [73], al-Kāshī [23, sect. 7.5], and Ḥabash al-Ḥāsib al-Marwazi [52], [3].

Further occurrences of such iterates appear for Kepler (1618–1622) in [23, sect. 7.6], Gregory (1672 [101]; 1674 [3]), Dary (1674) [3], and Newton (1674) [101].

4.3.2 Local Convergence, Attainable C-Orders

The first part of the following local convergence result was first obtained by Schröder in 1870 [87]. The extension to mappings in \({\mathbb R}^N\) implied the analysis of the eigenvalues of the Jacobian at the fixed point, which is an important topic in numerical analysis. (Footnote 30: Given \(H\in{\mathbb R}^{N\times N}\) and the linear system \(x=Hx+c\), the following result was called by Ortega [66, p. 118] the fundamental theorem of linear iterative methods: If \(x=Hx+c\) has a unique solution \(x^\ast\), then the successive approximations converge to \(x^\ast\) \(\forall x_0\in{\mathbb R}^N\) iff the spectral radius \(\rho(H)\) is \(<1\). The standard formulation of the result was first given by Ostrowski in 1957, but Perron had essentially already obtained it in 1929 (see [67, NR 10.1-1]).)

Theorem 4.26.
  • (a)

    ([67, Thm. 10.1.3], [68]) Let \(g\) be continuous on \(D\) and differentiable at the fixed point \(x^\ast\in D\). If

\[|g'(x^\ast)|<1,\]

    then \(x^\ast\) is an attraction fixed point.

  • (b)

    ([37]) Conversely, if \(x^\ast\) is an attraction fixed point, then \(|g'(x^\ast)|\leq1\).

Remark 4.27 (see [67, E 10.1-2]).

The condition \(|g'(x^\ast)|\leq1\) is sharp: considering \(|g'(x^\ast)|=1\) and taking \(g(x)=x-x^3\), \(x^\ast=0\) is an attraction fixed point, while for \(g(x)=x+x^3\) the same fixed point \(x^\ast\) is no longer of attraction.
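The two behaviors are easy to observe numerically (a small Julia sketch; the starting value and the step counts are our own choices):

```julia
# g(x) = x - x^3: x* = 0 attracts, but only sublinearly (x_k ~ 1/sqrt(2k)).
let x = 0.5
    for k in 1:5000
        x -= x^3
    end
    println("x - x^3, after 5000 steps: ", x)   # about 0.01
end

# g(x) = x + x^3: the iterates leave any neighborhood of 0.
let y = 0.5
    for k in 1:8
        y += y^3
        println(k, "   ", y)
    end
end
```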

The higher orders attained are characterized as follows.

Theorem 4.28 (see, e.g., [68, Chap. 4, sect. 10, p. 44], [95], [37]).

Let \(p_0\geq2\) be an integer and \(x^\ast\) a fixed point of \(g\). Then \(x_{k+1}=g(x_k)\rightarrow x^\ast\) with C-order \(p_0\) iff

\[g'(x^\ast)=g''(x^\ast)=\cdots=g^{(p_0-1)}(x^\ast)=0\quad\text{and}\quad g^{(p_0)}(x^\ast)\neq0,\]

in which case

\[\lim_{k\rightarrow\infty}\frac{|x^\ast-x_{k+1}|}{|x^\ast-x_k|^{p_0}}=\frac{|g^{(p_0)}(x^\ast)|}{p_0!}=:Q_{p_0}.\tag{37}\]
Example 4.29 ([67, E 10.1-10]).

Any C-order \(p_0>1\) is possible: let \(g(x)=x^{p_0}\).

The C-sublinear order may be attained too.

Exercise 4.30.

Let \(g(x)=x-x^3\) and \(x_{k+1}=g(x_k)\rightarrow0\); then \(Q_1(\{x_k\})=1\).

The successive approximations are usually regarded as the most general iterations, but they may actually also be seen as instances of an inexact Newton method [15]; such an approach allows the study of acceleration techniques in a different setting.

4.3.3 Attraction Balls

The following result holds.

Theorem 4.31 ([19]).

Let \(x^\ast\) be an attraction fixed point of \(g\) for which

\[|g'(x^\ast)|\leq q<1.\]

Suppose there exist \(r_1>0\), \(L>0\) such that \(g\) is differentiable on \(B_{r_1}\), with \(g'\) Lipschitz continuous of constant \(L\), and denote

\[r=\min\Big\{r_1,\frac{2(1-q)}L\Big\}.\tag{38}\]

Then, \(\forall x_0\in B_r\), letting \(t=\frac L2|x_0-x^\ast|+q<1\), we get

\[|x_{k+1}-x^\ast|\leq t\,|x_k-x^\ast|,\quad\forall k\geq0,\]

which implies that \(x_{k+1}=g(x_k)\in B_r\) and \(x_{k+1}=g(x_k)\rightarrow x^\ast\).

Remark 4.32.

It is important to note that this result does not require \(g\) to be a contraction on \(D\) (i.e., Lipschitz with a constant \(<1\)), in contrast to the classical Banach contraction theorem.

The center-Lipschitz condition (i.e., with \(x^\ast\) instead of \(y\) in eq. 29) brings an improvement [24].

The estimate eq. 38 may be sharp in certain cases.

Example 4.33 ([19]).

Let \(g(x)=x^2\) with \(x^\ast=0\), \(g'(x^\ast)=0\), \(r_1>0\) arbitrary, and \(L=2\). Then eq. 38 gives \(r=\frac{2(1-q)}L=1\), which is sharp, since \(x=1\) is another fixed point.

We observe that \(|g'(x)|=2\) at \(|x-x^\ast|=r=1\), which upholds remark 4.32.

Regarding the general case of several dimensions, with \(G:D\subseteq{\mathbb R}^N\rightarrow D\), we note that the trajectories with high convergence orders are characterized by the zero eigenvalue of \(G'(x^\ast)\) and its corresponding eigenvectors (see [16]).

Remark 4.34 ([18, Ex. 2.8]).

Definition 4.1 allows a finite number of \(x_k\notin V\) (regardless of how "small" \(V\) is chosen to be): take, e.g., \(G(x)=Ax\), \(A=\begin{pmatrix}0&2\\\frac18&0\end{pmatrix}\), \(V=B_r(0)\).

5 Conclusions

Various aspects of convergence orders have been analyzed here, but rather than providing a final view, we see this work as a fresh start; in [21] we pursue the study of the following themes: the characterizations of the C-order, the connections between the Q- and R-orders and the big Oh rates, the asymptotic constants (with their immediate computational variants), and the convergence orders of discretization schemes (to mention a few).

6 Answers to Quizzes

1.9: \(\{\dot x_k\}=\{\frac1{2^{2^k}}\}=\{\mathring x_k\}\), viewed in different windows.

2.7: (a) \(Q_1=10^{-1}\); (b) \(Q_1=10^{-2}\); (c) \(10^{-x_k}\), with \(\{x_k\}\) the Fibonacci sequence; (d) \(Q_2=1\).

3.4: unbounded (take \(f(x)=x+ax^2\)).

4.17: \(\mathrm{fl}(10^{20}-1)=\mathrm{fl}(10^{20})\) in double precision.

4.20: the limit \(Q_{\lambda_1}\) is not proved to exist; see also theorem 4.22.

Acknowledgments

I am grateful to two referees for constructive remarks which helped to improve the manuscript, particularly the one who brought the Julia language to my attention; also, I am grateful to my colleagues M. Nechita and I. Boros, as well as to S. Filip for useful suggestions regarding Julia.

References

  • [1] Advanpix team, Advanpix, Version 4.7.0.13642, 2020, https://www.advanpix.com/ (accessed 2021/04/07).
  • [2] I. K. Argyros and S. George, On a result by Dennis and Schnabel for Newton’s method: Further improvements, Appl. Math. Lett., 55 (2016), pp. 49–53, https://doi.org/10.1016/j.aml.2015.12.003.
  • [3] D. F. Bailey, A historical survey of solution by functional iteration, Math. Mag., 62 (1989), pp. 155–166, https://doi.org/10.1080/0025570X.1989.11977428.
  • [4] W. A. Beyer, B. R. Ebanks, and C. R. Qualls, Convergence rates and convergence-order profiles for sequences, Acta Appl. Math., 20 (1990), pp. 267–284, https://doi.org/10.1007/bf00049571.
  • [5] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Rev., 59 (2017), pp. 65–98, https://doi.org/10.1137/141000671.
  • [6] R. P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, NJ, 1973.
  • [7] R. P. Brent, S. Winograd, and P. Wolfe, Optimal iterative processes for root-finding, Numer. Math., 20 (1973), pp. 327–341, https://doi.org/10.1007/bf01402555.
  • [8] C. Brezinski, Comparaison des suites convergentes, Rev. Française Inform. Rech. Opér., 5 (1971), pp. 95–99.
  • [9] C. Brezinski, Accélération de la Convergence en Analyse Numérique, Springer-Verlag, Berlin, 1977.
  • [10] C. Brezinski, Limiting relationships and comparison theorems for sequences, Rend. Circolo Mat. Palermo Ser. II, 28 (1979), pp. 273–280, https://doi.org/10.1007/bf02844100.
  • [11] C. Brezinski, Vitesse de convergence d’une suite, Rev. Roumaine Math. Pures Appl., 30 (1985), pp. 403–417.
  • [12] P. N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp. 407–434, https://doi.org/10.1137/0724031.
  • [13] F. Cajori, Historical note on the Newton-Raphson method of approximation, Amer. Math. Monthly, 18 (1911), pp. 29–32, https://doi.org/10.2307/2973939.
  • [14] E. Cătinaş, Inexact perturbed Newton methods and applications to a class of Krylov solvers, J. Optim. Theory Appl., 108 (2001), pp. 543–570, https://doi.org/10.1023/a:1017583307974.
  • [15] E. Cătinaş, On accelerating the convergence of the successive approximations method, Rev. Anal. Numér. Théor. Approx., 30 (2001), pp. 3–8, https://ictp.acad.ro/accelerating-convergence-successive-approximations-method/ (accessed 2021/04/07).
  • [16] E. Cătinaş, On the superlinear convergence of the successive approximations method, J. Optim. Theory Appl., 113 (2002), pp. 473–485, https://doi.org/10.1023/a:1015304720071.
  • [17] E. Cătinaş, The inexact, inexact perturbed and quasi-Newton methods are equivalent models, Math. Comp., 74 (2005), pp. 291–301, https://doi.org/10.1090/s0025-5718-04-01646-1.
  • [18] E. Cătinaş, Newton and Newton-type Methods in Solving Nonlinear Systems in \({\mathbb R}^N\), Risoprint, Cluj-Napoca, Romania, 2007, https://ictp.acad.ro/methods-of-newton-and-newton-krylov-type/ (accessed on 2021/6/9).
  • [19] E. Cătinaş, Estimating the radius of an attraction ball, Appl. Math. Lett., 22 (2009), pp. 712–714, https://doi.org/10.1016/j.aml.2008.08.007.
  • [20] E. Cătinaş, A survey on the high convergence orders and computational convergence orders of sequences, Appl. Math. Comput., 343 (2019), pp. 1–20, https://doi.org/10.1016/j.amc.2018.08.006.
  • [21] E. Cătinaş, How Many Steps Still Left to x*? Part II. Newton Iterates without \(f\) and \(f'\), manuscript, 2020.
  • [22] A. Cauchy, Sur la détermination approximative des racines d'une équation algébrique ou transcendante, Oeuvres Complètes (II) 4, Gauthier-Villars, Paris, 1899, 23 (1829), pp. 573–609, http://iris.univ-lille.fr/pdfpreview/bitstream/handle/1908/4026/41077-2-4.pdf?sequence=1 (accessed 2021/04/07).
  • [23] J.-L. Chabert, ed., A History of Algorithms. From the Pebble to the Microchip, Springer-Verlag, Berlin, 1999.
  • [24] J. Chen and I. K. Argyros, Improved results on estimating and extending the radius of an attraction ball, Appl. Math. Lett., 23 (2010), pp. 404–408, https://doi.org/10.1016/j.aml.2009.11.007.
  • [25] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719857.
  • [26] G. Dahlquist and Å. Björk, Numerical Methods, Dover, Mineola, NY, 1974.
  • [27] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400–408, https://doi.org/10.1137/0719025.
  • [28] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971446.
  • [29] J. E. Dennis, On Newton-like methods, Numer. Math., 11 (1968), pp. 324–330, https://doi.org/10.1007/bf02166685.
  • [30] J. E. Dennis, Jr., and J. J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comp., 28 (1974), pp. 549–560, https://doi.org/10.1090/s0025-5718-1974-0343581-1.
  • [31] J. E. Dennis, Jr., and J. J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46–89, https://doi.org/10.1137/1019005.
  • [32] J. E. Dennis, Jr., and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, PA, 1996, https://doi.org/10.1137/1.9781611971200.
  • [33] G. Deslauries and S. Dubuc, Le calcul de la racine cubique selon Héron, Elem. Math., 51 (1996), pp. 28–34, http://eudml.org/doc/141587 (accessed 2020/02/18).
  • [34] P. Deuflhard, Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms, Springer-Verlag, Heidelberg, 2004.
  • [35] P. Deuflhard and F. A. Potra, Asymptotic mesh independence of Newton–Galerkin methods via a refined Mysovskii theorem, SIAM J. Numer. Anal., 29 (1992), pp. 1395–1412, https://doi.org/10.1137/0729080.
  • [36] P. Díez, A note on the convergence of the secant method for simple and multiple roots, Appl. Math. Lett., 16 (2003), pp. 1211–1215, https://doi.org/10.1016/s0893-9659(03)90119-4.
  • [37] F. Dubeau and C. Gnang, Fixed point and Newton’s methods for solving a nonlinear equation: From linear to high-order convergence, SIAM Rev., 56 (2014), pp. 691–708, https://doi.org/10.1137/130934799.
  • [38] J. Fourier, Question d’analyse algébrique, Bull. Sci. Soc. Philo., 67 (1818), pp. 61–67, http://gallica.bnf.fr/ark:/12148/bpt6k33707/f248.item (accessed 2021-04-07).
  • [39] W. Gautschi, Numerical Analysis, 2nd ed., Birkhäuser/Springer, New York, 2011, https://doi.org/10.1007/978-0-8176-8259-0.
  • [40] M. Grau-Sánchez, M. Noguera, and J. M. Gutiérrez, On some computational orders of convergence, Appl. Math. Lett., 23 (2010), pp. 472–478, https://doi.org/10.1016/j.aml.2009.12.006.
  • [41] A. Greenbaum and T. P. Chartier, Numerical Methods. Design, Analysis, and Computer Implementation of Algorithms, Princeton University Press, Princeton, NJ, 2012.
  • [42] M. T. Heath, Scientific Computing. An Introductory Survey, 2nd ed., SIAM, Philadelphia, PA, 2018, https://doi.org/10.1137/1.9781611975581.
  • [43] M. Heinkenschloss, Scientific Computing. An Introductory Survey, Rice University, Houston, TX, 2018.
  • [44] J. Herzberger and L. Metzner, On the Q-order of convergence for coupled sequences arising in iterative numerical processes, Computing, 57 (1996), pp. 357–363, https://doi.org/10.1007/BF02252254.
  • [45] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, Philadelphia, PA, 2002, https://doi.org/10.1137/1.9780898718027.
  • [46] M. Igarashi and T. J. Ypma, Empirical versus asymptotic rate of convergence of a class of methods for solving a polynomial equation, J. Comput. Appl. Math., 82 (1997), pp. 229–237, https://doi.org/10.1016/s0377-0427(97)00077-0.
  • [47] L. O. Jay, A note on Q-order of convergence, BIT, 41 (2001), pp. 422–429, https://doi.org/10.1023/A:1021902825707.
  • [48] T. A. Jeeves, Secant modification of Newton’s method, Commun. ACM, 1 (1958), pp. 9–10, https://doi.org/10.1145/368892.368913.
  • [49] V. J. Katz, A History of Mathematics. An Introduction, 2nd ed., Addison-Wesley, 1998.
  • [50] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, PA, 1995, https://doi.org/10.1137/1.9781611970944.
  • [51] C. T. Kelley, Iterative Methods for Optimization, SIAM, Philadelphia, PA, 1999, https://doi.org/10.1137/1.9781611970920.
  • [52] E. S. Kennedy and W. R. Transue, A medieval iterative algorism, Amer. Math. Monthly, 63 (1956), pp. 80–83, https://doi.org/10.1080/00029890.1956.11988762.
  • [53] D. E. Knuth, Big Omicron and big Omega and big Theta, ACM SIGACT News, 8 (1976), pp. 18–24, https://doi.org/10.1145/1008328.1008329.
  • [54] D. E. Knuth, The TeXbook, Addison-Wesley, 1984.
  • [55] N. Kollerstrom, Thomas Simpson and Newton’s method of approximation: An enduring myth, British J. History Sci., 25 (1992), pp. 347–354, https://doi.org/10.1017/s0007087400029150.
  • [56] K. Liang, Homocentric convergence ball of the secant method, Appl. Math. Chin. Univ., 22 (2007), pp. 353–365, https://doi.org/10.1007/s11766-007-0313-3.
  • [57] D. Luca and I. Păvăloiu, On the Heron’s method for approximating the cubic root of a real number, Rev. Anal. Numér. Théor. Approx., 26 (1997), pp. 103–108, https://ictp.acad.ro/jnaat/journal/article/view/1997-vol26-nos1-2-art15 (accessed 2021-04-07).
  • [58] Mathworks, MATLAB, Version 2019b, 2019, http://www.mathworks.com (accessed 2021/04/07).
  • [59] J. M. McNamee, Numerical Methods for Roots of Polynomials, Part I, Elsevier, Amsterdam, 2007.
  • [60] J. M. McNamee and V. Y. Pan, Numerical Methods for Roots of Polynomials, Part II, Elsevier, Amsterdam, 2013.
  • [61] U. C. Merzbach and C. B. Boyer, A History of Mathematics, 3rd ed., John Wiley & Sons, 2011.
  • [62] J.-M. Muller, N. Brunie, F. de Dinechin, C.-P. Jeannerod, M. Joldes, V. Lefèvre, G. Melquiond, N. Revol, and S. Torres, Handbook of Floating-Point Arithmetic, 2nd ed., Springer, New York, 2018, https://doi.org/10.1007/978-3-319-76526-6.
  • [63] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, 1983.
  • [64] A. Neumaier, Introduction to Numerical Analysis, Cambridge University Press, 2001.
  • [65] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006, https://doi.org/10.1007/978-0-387-40065-5.
  • [66] J. M. Ortega, Numerical Analysis. A Second Course, SIAM, Philadelphia, PA, 1990, https://doi.org/10.1137/1.9781611971323.
  • [67] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719468.
  • [68] A. M. Ostrowski, Solution of Equations and Systems of Equations, 2nd ed., Academic Press, New York, 1966; 3rd ed., 1972.
  • [69] M. L. Overton, Numerical Computing with IEEE Floating Point Arithmetic, SIAM, Philadelphia, PA, 2001, https://doi.org/10.1137/1.9780898718072.
  • [70] J. M. Papakonstantinou and R. A. Tapia, Origin and evolution of the secant method in one dimension, Amer. Math. Monthly, 120 (2013), pp. 500–518, https://doi.org/10.4169/amer.math.monthly.120.06.500.
  • [71] I. Păvăloiu, Sur l’ordre de convergence des méthodes d’itération, Mathematica, 23 (46) (1981), pp. 261–265, https://ictp.acad.ro/convergence-order-iterative-methods/.
  • [72] I. Păvăloiu, Optimal algorithms concerning the solving of equations by interpolation, in Research on Theory of Allure, Approximation, Convexity and Optimization, Editura Srima, 1999, pp. 222–248, http://ictp.acad.ro/optimal-algorithms-concerning-solving-equations-interpolation/ (accessed 2021-04-07).
  • [73] K. Plofker, An example of the secant method of iterative approximation in a fifteenth-century Sanskrit text, Historia Math., 23 (1996), pp. 246–256, https://doi.org/10.1006/hmat.1996.0026.
  • [74] K. Plofker, Use and transmission of iterative approximations in India and the Islamic World, in From China to Paris: 2000 Years Transmission of Mathematical Ideas, Y. Dold-Samplonius, J. W. Dauben, M. Folkerts, and B. Van Dalen, eds., Franz Steiner Verlag Wiesbaden GmbH, Stuttgart, 2002, pp. 167–186.
  • [75] E. Polak, Optimization. Algorithms and Consistent Approximations, Springer-Verlag, New York, 1997.
  • [76] F. A. Potra, On Q-order and R-order of convergence, J. Optim. Theory Appl., 63 (1989), pp. 415–431, https://doi.org/10.1007/bf00939805.
  • [77] F. A. Potra, Q-superlinear convergence of the iterates in primal-dual interior-point methods, Math. Program. Ser. A, 91 (2001), pp. 99–115, https://doi.org/10.1007/s101070100230.
  • [78] F. A. Potra, A superquadratic version of Newton’s method, SIAM J. Numer. Anal., 55 (2017), pp. 2863–2884, https://doi.org/10.1137/17M1121056.
  • [79] F. A. Potra and V. Pták, Nondiscrete Induction and Iterative Processes, Pitman, Boston, MA, 1984.
  • [80] F. A. Potra and R. Sheng, A path following method for LCP with superlinearly convergent iteration sequence, Ann. Oper. Res., 81 (1998), pp. 97–114, https://doi.org/10.1023/A:1018942131812.
  • [81] A. Quarteroni, R. Sacco, and F. Saleri, Numerical Mathematics, Springer, Berlin, 2007.
  • [82] R. Rashed, The Development of Arabic Mathematics: Between Arithmetic and Algebra, Stud. Philos. Sci. 156, Springer, Boston, MA, 1994.
  • [83] M. Raydan, Exact order of convergence of the secant method, J. Optim. Theory Appl., 78 (1993), pp. 541–551, https://doi.org/10.1007/BF00939881.
  • [84] W. C. Rheinboldt, An adaptive continuation process for solving systems of nonlinear equations, in Mathematical Models and Numerical Methods, Banach Ctr. Publ. 3, Polish Academy of Science, 1977, pp. 129–142, https://doi.org/10.4064/-3-1-129-142.
  • [85] W. C. Rheinboldt, Methods for Solving Systems of Nonlinear Equations, 2nd ed., SIAM, Philadelphia, PA, 1998, https://doi.org/10.1137/1.9781611970012.
  • [86] T. Sauer, Numerical Analysis, 2nd ed., Pearson, Boston, MA, 2012.
  • [87] E. Schröder, Ueber unendlich viele Algorithmen zur Auflösung der Gleichungen, Math. Ann., 2 (1870), pp. 317–365, https://doi.org/10.1007/bf01444024; available as On Infinitely Many Algorithms for Solving Equations, Tech reports TR–92–121 and TR–2990, Institute for Advanced Computer Studies, University of Maryland, 1992; translated by G. W. Stewart.
  • [88] H. Schwetlick, Numerische Lösung Nichtlinearer Gleichungen, VEB, Berlin, 1979.
  • [89] L. F. Shampine, R. C. Allen, Jr., and S. Pruess, Fundamentals of Numerical Computing, John Wiley and Sons, New York, 1997.
  • [90] T. Steihaug, Computational science in the eighteenth century. Test cases for the methods of Newton, Raphson, and Halley: 1685 to 1745, Numer. Algorithms, 83 (2020), pp. 1259–1275, https://doi.org/10.1007/s11075-019-00724-8.
  • [91] E. Süli and D. Mayers, An Introduction to Numerical Analysis, Cambridge University Press, Cambridge, UK, 2003.
  • [92] T. Tantau, The tikz and pgf Packages. Manual for Version 3.1.5b-34-gff02ccd1, https://github.com/pgf-tikz/pgf.
  • [93] R. A. Tapia, J. E. Dennis, Jr., and J. P. Schäfermeyer, Inverse, shifted inverse, and Rayleigh quotient iteration as Newton’s method, SIAM Rev., 60 (2018), pp. 3–55, https://doi.org/10.1137/15M1049956.
  • [94] R. A. Tapia and D. L. Whitley, The projected Newton method has order 1+2 for the symmetric eigenvalue problem, SIAM J. Numer. Anal., 25 (1988), pp. 1376–1382, https://doi.org/10.1137/0725079.
  • [95] J. F. Traub, Iterative Methods for the Solutions of Equations, Prentice Hall, Englewood Cliffs, NJ, 1964.
  • [96] E. E. Tyrtyshnikov, A Brief Introduction to Numerical Analysis, Birkhäuser/Springer, New York, 1997.
  • [97] J. S. Vandergraft, Introduction to Numerical Computations, 2nd ed., Academic Press, New York, 1983.
  • [98] H. F. Walker, An approach to continuation using Krylov subspace methods, in Computational Science in the 21st Century, J. Periaux, ed., John Wiley and Sons, 1997, pp. 72–81.
  • [99] S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971453.
  • [100] S. J. Wright, Optimization algorithms for data analysis, in The Mathematics of Data, IAS/Park City Math. Ser. 25, AMS, Providence, RI, 2018, pp. 49–97.
  • [101] T. J. Ypma, Historical development of the Newton–Raphson method, SIAM Rev., 37 (1995), pp. 531–551, https://doi.org/10.1137/1037125.
