# How many steps still left to x*?

We survey the convergence orders of sequences and show the orders of the basic iterative sequences for solving nonlinear equations in $${\mathbb R}$$: the Newton method, the secant method, and the successive approximations).

## Abstract

The high speed of $$x_{k}\rightarrow x^\ast\in{\mathbb R}$$ is usually measured using the C-, Q- or R-orders:
\tag{$C$}
\lim \frac {|x^\ast – x_{k+1}|}{|x^\ast – x_k|^{p_0}}\in(0,+\infty),

\tag{$Q$}
\lim \frac {\ln |x^\ast – x_{k+1}|}{\ln |x^\ast – x_k|} =q_0,

\tag{$R$}
\lim \big\vert\ln |x^\ast – x_k| \big \vert ^\frac1k =r_0.

By connecting them to the natural, term-by-term comparison of the errors of two sequences, we find that the C-orders — including (sub)linear — are in agreement. Weird relations may appear though for the Q-orders: we expect
$|x^\ast – x_k|=\mathcal{O}(|x^\ast – y_k|^\alpha), \qquad\forall\alpha>1,$
to attract “$$\geq$$” for the Q-orders of $$\{x_{k}\}$$ vs. $$\{y_{k}\}$$; the contrary is shown by an example providing no vs. infinite Q-orders. The R-orders seem even worse: an $$\{x_{k}\}$$ with infinite R-order may have unbounded nonmonotone errors: $$\limsup$$$$|x^\ast – x_{k+1}|/|x^\ast – x_k|= +\infty$$.

Such aspects motivate the study of equivalent definitions, computational variants, etc.

The orders are also the perspective from which we analyze the three basic iterative methods for nonlinear equations. The Newton method, widely known for its quadratic convergence, may in fact attain any C-order from $$[1,+\infty]$$ (including sublinear); we briefly recall such convergence results, along with connected aspects (historical notes, known asymptotic constants, floating point arithmetic issues, radius of attraction balls, etc.) and provide examples.

This approach leads to similar results for the successive approximations, while the secant method exhibits different behavior: it may not have high C-, but just Q-orders.

## Authors

Emil Cătinaş
“Tiberiu Popoviciu” Institute of Numerical Analysis, Romanian Academy

## Keywords

(computational) convergence orders, iterative methods, Newton method, secant method, successive approximations, asymptotic rates.

## PDF

here you may find an annotated version, with some typos pointed out: https://ictp.acad.ro/wp-content/uploads/2023/12/Catinas-2021-SIREV-How-many-steps-still-left-to-x-annotated.pdf

also freely available at the publisher (without pointed typos): http://doi.org/10.1137/19M1244858

##### Paper coordinates:

E. Catinas, How many steps still left to x*?, SIAM Rev., 63 (2021) no. 3, pp. 585–624, doi: 10.1137/19M1244858

0036-1445

1095-7200

## Github

Some small codes in LaTeX and Julia have been posted on Github in order to illustrate some aspects.

We will post soon in Python too.

## Corrigendum

(introduced on February 10, 2023, modified on February 25, 2023, on August 27, 2023)

Corrigendum:

• for the sequences with $$\bar{Q}_1=\infty$$ we have proposed the terminology “sequences with unbounded nonmonotone errors” (see Remark 2.12.c). This is also termed in the beginning of Section 2.2 as “sequences with no Q-order” – which is not proper, as we have subsequently found sequences with $$\bar{Q}_1=\infty$$ and with Q-order $$1$$, see [Cătinaş, 2023];
• in Remark 2.50.b, “the strict superlinear” should be removed. Indeed, this is not a C-order, but a Q-order, as is defined with the aid of $$\bar{Q}_p$$. On the other hand, the strict superlinear order is not always faster than the C-order $$p>1$$, see [Cătinaş, 2023].

Typos:

• in the abstract, condition $$|x^\ast – x_{k+1}|/|x^\ast – x_{k}| \rightarrow \infty$$ (i.e., $$Q_1=\infty$$), for the sequence with unbounded nonmonotone error, should read instead $$\limsup |x^\ast – x_{k+1}|/|x^\ast – x_{k}| = \infty$$ (i.e., $$\bar{Q}_1=\infty$$); obviously, condition $$Q_1=\infty$$ cannot be fulfilled by a convergent sequence. The matter is correctly dealt with inside the paper (see Remark 2.12.c).

## Notes

Working with the SIAM staff was an amazing experience. I would like to thank to Louis Primus for his solicitude and professionalism.

Version: October 30, 2021.

Some references I have missed:

(these were included in the initial submission, but missing in the final version, as not cited in text, and we switched to BibTeX)

known to me, but still missing (nobody is perfect):

found after sending the revised version:

• J. H. Wilkinson, The evaluation of the zeros of ill-conditioned polynomials. part ii, Numer. Math., 1 (1959), pp. 167–180, doi: 10.1007/BF01386382
• D. F. Bailey and J. M. Borwein, Ancient indian square roots: an exercise in forensic paleo mathematics, Amer. Math. Monthly, 119 (2012), pp. 646–657, doi 10.4169/amer.math.monthly.119.08.646.
• E. Halley, Methodus nova accurata & facilis inveniendi radices aeqnationum quarumcumque generaliter, sine praviae reductione, Phil. Trans. R. Soc., 18 (1694), pp. 136–148, https://doi.org/10.1098/rstl.1694.0029.
• A. Ralston and P. Rabinowitz, A First Course in Numerical Analysis, 2nd ed., 1978.
• M. A. Nordgaard, The origin and development of our present method of extracting the square and cube roots of numbers, The Mathematics Teacher, 17 (1924), pp. 223–238, https://doi.org/https://www.jstor.org/stable/27950616.

and others (to be completed)

## Paper (preprint) in HTML form

How Many Steps Still Left to 𝒙*?Received by the editors February 14, 2019; accepted for publication (in revised form) September 17, 2020; published electronically August 5, 2021.

# How Many Steps Still Left to $x$*?††thanks: Received by the editors February 14, 2019; accepted for publication (in revised form) September 17, 2020; published electronically August 5, 2021.

Emil Cătinaş
“Tiberiu Popoviciu” Institute of Numerical Analysis, Cluj-Napoca, Romania

Abstract. The high speed of $x_{k}\rightarrow x^{\ast}\in{\mathbb{R}}$ is usually measured using the $C$-, $Q$-, or $R$-orders:

 $\lim\frac{|x^{\ast}-x_{k+1}|}{|x^{\ast}-x_{k}|^{p_{0}}}\in(0,+\infty),\ \lim% \frac{\ln|x^{\ast}-x_{k+1}|}{\ln|x^{\ast}-x_{k}|}=q_{0},\ \mbox{or}\ \lim\big{% |}\ln|x^{\ast}-x_{k}|\big{|}^{\frac{1}{k}}=r_{0}.$

By connecting them to the natural, term-by-term comparison of the errors of two sequences, we find that the $C$-orders—including (sub)linear—are in agreement. Weird relations may appear though for the $Q$-orders: we expect $|x^{\ast}-x_{k}|=\mathcal{O}(|x^{\ast}-y_{k}|^{\alpha})$ $\forall\alpha>1$ to imply “$\geq$” for the $Q$-orders of $\{x_{k}\}$ vs. $\{y_{k}\}$; the contrary is shown by an example providing no vs. infinite $Q$-orders. The $R$-orders appear to be even worse: an $\{x_{k}\}$ with infinite $R$-order may have unbounded nonmonotone errors: $\operatorname{limsup}|x^{\ast}-x_{k+1}|/|x^{\ast}-x_{k}|=+\infty$ .
Such aspects motivate the study of equivalent definitions, computational variants, and so on.
These orders are also the perspective from which we analyze the three basic iterative methods for nonlinear equations in $\mathbb{R}$. The Newton method, widely known for its quadratic convergence, may in fact attain any $C$-order from $[1,+\infty]$ (including sublinear); we briefly recall such convergence results, along with connected aspects (such as historical notes, known asymptotic constants, floating point arithmetic issues, and radius of attraction balls), and provide examples.
This approach leads to similar results for the successive approximations method, while the secant method exhibits different behavior: it may not have high $C$-orders, but only $Q$-orders.

Keywords. (computational) convergence orders, iterative methods, Newton method, secant method, successive approximations, asymptotic rates

AMS subject classifications. 65-02, 65-03, 41A25, 40-02, 97-02, 97N40, 65H05

DOI. 10.1137/19M1244858

## 1 Introduction

The analysis of the convergence speed of sequences is an important task, since in numerical applications the aim is to use iterative methods with fast convergence, which provide good approximations in few steps.

The ideal setting for studying a given sequence $\{x_{k}\}:=(x_{k})_{k\geq 0}\subset\mathbb{R}$ is that of error-based analysis, where the finite limit $x^{\ast}$ is assumed to be known and we analyze the absolute values of the errors, $e_{k}:=|x_{k}-x^{\ast}|$. The errors allow the comparison of the speeds of two sequences in the most natural way, even if their limits are distinct (though here we write both as $x^{\ast}$). Let us first define notation, for later reference.

###### Notation.

Given $\dot{x}_{k},\mathring{x}_{k}\rightarrow x^{\ast}\in\mathbb{R}$, denote their errors by $\{\dot{e}_{k}\}$, resp., $\{\mathring{e}_{k}\}$, etc.

###### Comparison 1.1.

$\{\dot{x}_{k}\}$ converges faster (not slower) than $\{\mathring{x}_{k}\}$ if

 $|x^{\ast}-\dot{x}_{k}|\leq|x^{\ast}-\mathring{x}_{k}|,\quad k\geq k_{0}$ ($\dot{e}_{k}\leq\mathring{e}_{k}$)

(or, in brief, $\{\dot{x}_{k}\}$ is eq. $\dot{e}_{k}\leq\mathring{e}_{k}$ faster than $\{\mathring{x}_{k}\}$) and strictly faster if

 $|x^{\ast}-\dot{x}_{k}|<|x^{\ast}-\mathring{x}_{k}|,\quad k\geq k_{0}.$ ($\dot{e}_{k}<\mathring{e}_{k}$)

Furthermore, increasingly faster convergence holds if

- for some $c\in(0,1)$ we have $|x^{\ast}-\dot{x}_{k}|\leq c|x^{\ast}-\mathring{x}_{k}|,k\geq k_{0};$

- the constant $c$ above is not fixed, but tends to zero (see, e.g., [9, p. 2], [53]),

 $|x^{\ast}-\dot{x}_{k}|=o(|x^{\ast}-\mathring{x}_{k}|)\quad\mbox{as }k% \rightarrow\infty;$ ($\dot{e}_{k}=o(\mathring{e}_{k})$)
111This means that $|x^{\ast}-\dot{x}_{k}|\leq c_{k}|x^{\ast}-\mathring{x}_{k}|$, $k\geq k_{0}$, with $c_{k}\rightarrow 0$, which allows a finite number of elements $c_{k}$ to be greater than one.

- given $\alpha>1$, we have

 $|x^{\ast}-\dot{x}_{k}|=\mathcal{O}(|x^{\ast}-\mathring{x}_{k}|^{\alpha})\text{% as }k\rightarrow\infty.$ ($\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$)
222That is, $\exists K>0$ such that $|x^{\ast}-\dot{x}_{k}|\leq K|x^{\ast}-\mathring{x}_{k}|^{\alpha}$, $k\geq k_{0}$.

Let us first illustrate the convergence speed in an intuitive fashion, using some graphs.

###### Example 1.2.

Consider the following two groups of sequences $(\{x_{k}\}=\{e_{k}\})$:

• (a)

$\{\frac{1}{\sqrt{k}}\}$, $\{\frac{1}{k}\}$, $\{\frac{1}{k^{4}}\}$;

• (b)

$\{\frac{k}{2^{k}}\}$, $\{\frac{1}{2^{k}}\}=\{(\frac{1}{2})^{k}\}=\{2^{-k}\}$, $\{\frac{1}{k2^{k}}\}$, $\{\frac{1}{4^{k}}\}$.

Handling the first terms of these sequences raises no difficulties, as all the common programming languages represent them as $\operatorname{fl}(x_{k})$ in standard double precision arithmetic (called binary64 by the IEEE 754-2008 standard); see 1.4 below.

The plotting of $(k,e_{k})$ in fig. 1(a) does not help in analyzing the behavior of the errors, as for $k\geq 10$ we cannot see much, even if the graph is scaled.333The implicit fitting of the figures in the window usually results in different units for the two axes (scaled graph). In fig. 1(a) we would see even less if the graph were unscaled. All we can say is that $\{\frac{1}{\sqrt{k}}\}$ has the slowest convergence, followed by $\{\frac{1}{k}\}$, and also that the first terms of $\{\frac{1}{k^{4}}\}$ are smaller than those of $\{\frac{1}{2^{k}}\}$.

In order to compare the last terms we must use the semilog coordinates $(k,\lg e_{k})$ in fig. 1(b) (see [43, p. 211]); for $\{\frac{1}{2^{k}}\}$, they are $(k,k\operatorname{lg}\frac{1}{2})$, belonging to the line $y=ax$, $a=\lg\frac{1}{2}$. As the graph is twice scaled,444The first scaling is by the use of an unequal number of units ($50$ on axis $x$ vs. $36$ on axis $y$), and the second one results from the fact that the final figure is a rectangle instead of a square. it actually shows another line, $y=bx$.

Figure 1(b) shows that, in reality, $\{\frac{1}{2^{k}}\}$ is eq. $\dot{e}_{k}<\mathring{e}_{k}$ faster than $\{\frac{1}{k^{4}}\}$ ($k_{0}=17$). We will see later that the convergence is sublinear in group (a) and linear in group (b).

###### Exercise 1.3.

Show that (a) $\{\frac{1}{2^{k}}\}$ is $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ faster than $\{\frac{1}{k^{4}}\}$;

(b) $\{\frac{1}{2^{k}}\}$ is eq. $\dot{e}_{k}=o(\mathring{e}_{k})$ but not $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ for some $\alpha>1]$ faster than $\{\frac{k}{2^{k}}\}$.

###### Exercise 1.4.

(a) The smallest positive binary64  number, which we denote by realmin( binary64 ), is ${2.2250738585072014e-308}$. Analyzing its binary representation, express its value as a power of $2$ (see [69, p. 14], [28], or [45]).

(b) For each sequence from (a) and (b) in the example above, compute the largest $k$ such that $\operatorname{fl}(x_{k})\neq 0$.

(c) Give a formula for the largest index $k$ for which all the elements $\operatorname{fl}(x_{k})$ of all the sequences from (a) and (b) in the example above are nonzero.

Next we analyze some increasingly faster classes of sequences.

###### Example 1.5.

Consider

• (c)

$\big{\{}\frac{1}{2^{k^{2}}}\big{\}}$, $\big{\{}\frac{1}{2^{k^{3}}}\big{\}}$, $\{\frac{1}{k^{k}}\}$;

• (d)

$\Big{\{}\frac{1}{2^{\frac{2^{k}}{k}}}\Big{\}}$, $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}=\{(\frac{1}{2})^{2^{k}}\}=\{2^{-2^{k}}\}$, $\big{\{}\frac{1}{2^{k2^{k}}}\big{\}}$, $\big{\{}\frac{1}{3^{2^{k}}}\big{\}}$;

• (e)

$\big{\{}\frac{1}{2^{3^{k}}}\big{\}}$;

• (f)

$\Big{\{}\frac{1}{2^{2^{k\sqrt{k}}}}\Big{\}}$, $\Big{\{}\frac{1}{2^{2^{k^{2}}}}\Big{\}}$.

In fig. 2 we plot $\{\frac{1}{4^{k}}\}$, $\big{\{}\frac{1}{2^{k^{2}}}\big{\}}$, and $\{\frac{1}{k^{k}}\}$. Their graphs are on a line (the fastest from fig. 1(b)), on a parabola $y=cx^{2}$, and, resp., in between.

The computation with what appears at first sight to be a “reasonable” number of terms (say, $10$) becomes increasingly challenging as we successively consider $\{x_{k}\}$ from (c)–(f).

We leave as an exercise the calculation/computation of the largest index $k$ for each $\{x_{k}\}$ in (c)–(f) such that $\operatorname{fl}(x_{k})$ is a nonzero binary64  number; we note though that all $\operatorname{fl}(1/{2^{2^{k^{2}}}})$ and $\operatorname{fl}(1/2^{2^{k\sqrt{k}}})$ are zero for $k\geq 4$, respectively, $k\geq 5$, and so binary64  numbers are not enough here.

There are plenty of choices for increased precision: the well-known MATLAB [58] (but with the Advanpix toolbox [1] providing arbitrary precision), the recent, freely available Julia [5], and many others (not to mention the programming languages for symbolic computation). All the graphs from this paper were obtained using the tikz/pgf package [92] for .555The initial  system was created by D. E. Knuth [54]. The figures are easy to customize and the instructions to generate them have a simple form, briefly,

    \addplot [domain=1:20, samples=20] {1/sqrt(x)};


Equally important, we will see that the tikz/pgf library allows the use of higher precision than binary64 (more precisely, smaller/larger numbers by increased representation of the exponent, e.g., realmin around $10^{-6500}$). The  sources for the figures are posted on https://github.com/ecatinas/conv-ord.

The sequence groups (c)–(f) are presented in fig. 3; $\big{\{}\frac{1}{2^{k^{3}}}\big{\}}$ belongs to a cubic parabola, and $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$ to an exponential. The parabola of $\big{\{}\frac{1}{2^{k^{2}}}\big{\}}$ appears flat, which shows how much faster the other sequences converge. The descent of the three terms of $\big{\{}1/{2^{2^{k^{2}}}}\big{\}}$ is the steepest.

As will be seen later, the sequence groups (c)–(f) have increasing order: strict superlinear, quadratic, cubic, and infinite, respectively.

###### Exercise 1.6.

Show that $\big{\{}\frac{1}{3^{2^{k}}}\big{\}}$ is $\big{[}$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$, $\alpha\in(1,\frac{\ln 3}{\ln 2}]\big{]}$ faster than $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$.

Obviously, when comparing two sequences by eq. $\dot{e}_{k}<\mathring{e}_{k}$, the faster one must have its graph below the other one ($k\geq k_{0}$) (cf. Polak [75, p. 47]). While the more concave the graph the better (cf. Kelley [50, Ex. 5.7.15]), fast convergence does not in fact necessarily require smooth graphs (see Jay [47]).

###### Exercise 1.7.

Prove that any convergence curve between $\big{\{}\frac{1}{2^{3^{k}}}\big{\}}$ and $\big{\{}\frac{1}{3^{2^{k}}}\big{\}}$ is at least $\big{[}$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$, $\alpha\in(1,\frac{\ln 3}{\ln 2}]\big{]}$ faster than $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$ (hint: use 1.6).

Consider, for instance, the merging of certain sequences.

###### Example 1.8.

Let

 $x^{a}_{k}=\bigg{\{}\begin{array}[]{cc}\frac{1}{3^{2^{k}}},&k\ \text{odd},\\[7.% 0pt] \frac{1}{2^{3^{k}}},&k\ \text{even},\end{array}\qquad x^{b}_{k}=\bigg{\{}% \begin{array}[]{cc}\frac{1}{2^{2^{k}}},&k\ \text{even},\\[7.0pt] \frac{1}{3^{2^{k}}},&k\ \text{odd},\end{array}\qquad x^{c}_{k}=\bigg{\{}\begin% {array}[]{cc}\frac{1}{2^{2^{k}}},&k\ \text{even},\\[7.0pt] \frac{1}{5^{2^{k}}},&k\ \text{odd}.\end{array}$

$\big{\{}\frac{1}{2^{3^{k}}}\big{\}}$, $\{x^{a}_{k}\}$, $\big{\{}\frac{1}{3^{2^{k}}}\big{\}}$, $\{x^{b}_{k}\}$, $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$, written in eq. $\dot{e}_{k}\leq\mathring{e}_{k}$ order of decreasing speed, are plotted in Figure 4(a). The usual ranking by convergence orders is instead $\big{\{}\frac{1}{2^{3^{k}}}\big{\}}$, $\big{\{}\frac{1}{3^{2^{k}}}\big{\}}$, $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$, $\{x^{b}_{k}\}$, $\{x^{a}_{k}\}$ ($C$-orders $3$, $2$, $2$, $R$-order $2$, no order). Though $\{x^{a}_{k}\}$ has the second fastest convergence, the orders actually rank it as the slowest. $\{x^{b}_{k}\}$ does not have $C$- or $Q$-order (but just exact $R$-order $2$), so is usually ranked below $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$.

$\big{\{}\frac{1}{5^{2^{k}}}\big{\}}$, $\{x^{c}_{k}\}$, $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$, $\big{\{}\frac{1}{1.3^{2^{k}}}\big{\}}$ from fig. 4(b) have ranking $\big{\{}\frac{1}{5^{2^{k}}}\big{\}}$, $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$, $\big{\{}\frac{1}{1.3^{2^{k}}}\big{\}}$, $\{x^{c}_{k}\}$ ($C$-orders $2$, $2$, $2$, $R$-order $2$). Though nonmonotone, $\{x^{c}_{k}\}$ has exact $R$-order $2$ and is (at least) eq. $\dot{e}_{k}<\mathring{e}_{k}$ faster than the $C$-quadratic $\big{\{}\frac{1}{1.3^{2^{k}}}\big{\}}$.

###### Quiz 1.9.

Given $\{\dot{x}_{k}\}$, $\{\mathring{x}_{k}\}$ with errors shown in fig. 5, which one is faster?

The study of errors using 1.1 seems666Of course, one cannot always compare all sequences by eq. $\dot{e}_{k}\leq\mathring{e}_{k}$: consider $\{x^{a}_{k}\}$ and $\{\frac{1}{6^{2^{k}}}\}$, for example. perfect in theory, but has a notable disadvantage: it requires two sequences. The alternative is to use a “scale” for measuring any single sequence—the convergence orders. The (classical) $C$-orders, though they demand some strong conditions, appear in the convergence of the most often encountered iterative methods for nonlinear equations.777A notable exception: the higher orders of the secant method are $Q$- but not necessarily $C$-orders.

Still, some problems appear, for example, when one wants to check the usual predictions such as “in quadratic convergence, the error is squared at each step.” Indeed, as a sequence is an infinite set, its order refers to the asymptotic range (all sufficiently large indices), where statements such as

“if $\varepsilon>0$ is suff. small, $\exists k_{0}\geq 0$ s.t. [some property] holds $\forall k\geq k_{0}$

come to life. Clearly, the first $k_{0}$ terms ($k_{0}=2,10$, or perhaps $10^{6}$) are not necessarily connected to the order: a finite number of terms does not contribute to the order of an abstract sequence, while, most painful, the high order of an iterative method may not be reflected in its first steps if the initial approximation is not good enough. Therefore, such predictions, instead of being applicable to the very first term, are usually applied only starting from the first terms from the asymptotic range.

When the stringent conditions of $C$-orders are not fulfilled, the $Q$-orders are considered instead, but unfortunately they can be in disagreement with the term-by-term comparison of the errors of two sequences; indeed, we show that convergence even with no $Q$- (or $C$)-order may in fact hide a fast overall speed, only some of the consecutive errors do not decrease fast enough. The $R$-orders are instead known for their limited usefulness, requiring the weakest conditions and allowing nonmonotone errors.

Consider now a further problem: given some iterations for which the calculation of the order turns out to be difficult or even impossible, can the computer be used to approximate it? A positive answer is given by the computational convergence orders.

Despite the fact that convergence orders have a long history, it is only recently that some final connections were made and a comprehensive picture was formed of the $C$-, $Q$-, and $R$-orders, together with their computational variants [20].

The understanding of these notions is important from both the theoretical and the practical viewpoints, the comments of Tapia, Dennis, and Schäfermeyer [93] being most relevant:

The distinction between $Q$- and $R$-convergence is quite meaningful and useful and is essentially sacred to workers in the area of computational optimization. However, for reasons not well understood, computational scientists who are not computational optimizers seem to be at best only tangentially aware of the distinction.

We devote section 2 to error-based analysis of the problem of measuring and comparing convergence speeds of abstract real sequences, using either the convergence orders we review, or the term-by-term comparison of errors.

In section 3 we deal with the computational versions of the convergence orders (based either on the corrections888Each $x_{k+1}$ can be seen as being obtained from $x_{k}$ by adding the correction. $x_{k+1}-x_{k}$ or on the nonlinear residuals $f(x_{k})$) and show that in $\mathbb{R}$ they are equivalent to the error-based ones.

In section 4 we deal with three main iterative methods (Newton, secant, and successive approximations), presenting results on their attainable convergence orders, as well as other connected aspects (asymptotic constants, estimation of the attraction balls, floating point arithmetic issues, brief history, etc.).

## 2 $C$-, $Q$-, and $R$-Orders $p_{0}>1$ vs. Asymptotic Rates

Even though the roots of iterative methods trace back to the Babylonians and Egyptians (ca. 1800 B.C.) [3], the first comment on convergence orders seems to have been made by Newton (ca. 1669) on the doubling of digits in quadratic convergence (quoted by Ypma in [101]): “But if I desire to continue working merely to twice as many figures, less one$,\ldots,$” and then in 1675: “That is, the same Division, by wch you could finde the 6th decimal figure, if prosecuted, will give you all to the 11th decimal figure.”

In the Journal Book of the Royal Society, it is recorded “17 December 1690: Mr Ralphson’s Book was this day produced by E Halley, wherein he gives a Notable Improvemt of ye method […], which doubles the known figures of the Root known by each Operation….”

Halley noticed the tripling of digits in the method he introduced.

In his Tracts, Hutton showed that one of his schemes is of third order [3]. In 1818, Fourier [38] also noted that the iterates double the exact figures at each Newton step.

In 1870, Schröder [87] implicitly defined the high $C$-orders for some iterative methods by considering conditions on the nonlinear mapping.

Three types of convergence orders are used at present: the classical $C$-order (notation adopted from [4]), the $Q$-order (which contains the $C$-order as a particular case), and the $R$-order. Outstanding contributions to their study were successively made by Ortega and Rheinboldt (1970) in their fundamental book [67], Potra and Pták (1984) [79], Potra (1989) [76], and finally by Beyer, Ebanks, and Qualls (1990), in the less known but essential paper [4]. In [20] we connected and completed these results (in $\mathbb{R}^{N}$).

We analyze here only the high orders, noting that the various equivalent definitions of the $Q$-orders lead to intricate implications for linear convergence [4].

### 2.1 $C$-Order $p_{0}>1$

The definition of $C$- and $Q$-orders is obtained by considering for $p\geq 1$ the quotient convergence factors [67, sect. 9.1]

 $Q_{p}(k):=\frac{e_{k+1}}{(e_{k})^{p}}=\frac{|x^{\ast}-x_{k+1}|}{|x^{\ast}-x_{k% }|^{p}},\quad k=0,1,\ldots.3$

It is assumed that $x_{k}\neq x^{\ast},$ $k\geq 0$. If, though, $x_{k_{0}}=x^{\ast}$ in an iterative method, then (hopefully) we get $x_{k}=x^{\ast}$, $k\geq k_{0}$ (this holds for the three methods in section 4).

Let us briefly list the four slowest types of $C$-order:

• no $C$-order, when $\nexists Q_{1}:=\lim_{k\rightarrow\infty}Q_{1}(k)$ (e.g., $x_{k}=\frac{1}{\sqrt{k}}$, $k$ odd; $x_{k}=\frac{2}{\sqrt{k}}$, $k$ even);

• $C$-sublinear, if $Q_{1}=1$ (e.g., $\{\frac{1}{k}\}$);

• $C$-linear, if $0 (e.g., $\{\frac{1}{2^{k}}\}$);

• $C$-superlinear (defined later, usually called (strict) $Q$-superlinear);

notice that $Q_{1}>1$ cannot hold, and that $\{x_{k}^{a}\},\{x_{k}^{c}\}$, with no $C$-order, in fact are fast.

###### Remark 2.1.

(a) (see [65, p. 620]) $C$-linear order implies strict monotone errors:

 $e_{k+1} (1)

(b) $C$-sublinear $\nRightarrow$ monotone errors (e.g., $x_{k}=\frac{1}{k-2}$, $k$ even, $x_{k}=\frac{1}{k}$, $k$ odd, $k\geq 3$).

(c) $\{x^{a}_{k}\},\{x^{c}_{k}\}$, both nonmonotone, have no $C$-order.

###### Exercise 2.2.

If $\{\mathring{x}_{k}\}$ has $C$-sublinear and $\{\dot{x}_{k}\}$ $C$-linear order, show that $\{\dot{x}_{k}\}$ is $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ faster than $\{\mathring{x}_{k}\}$.

The following definition of high orders is well known; $p_{0}>1$ is implicitly assumed throughout this paper even if not explicitly mentioned.

###### Definition 2.3.

$\{x_{k}\}$ has $C$-order $p_{0}>1$ if

 $Q_{p_{0}}:=\lim_{k\rightarrow\infty}Q_{p_{0}}(k)\in(0,+\infty).$ ($C$)
###### Remark 2.4.

It is important to stress that $Q_{p_{0}}$ above cannot be zero or infinite; these two cases will arise when analyzing the $Q$-orders.

By denoting (see [67, sect. 9.1], [88, p. 83])

 $\underline{Q}_{p}=\liminf\limits_{k\rightarrow\infty}Q_{p}(k),\quad\bar{Q}_{p}% =\limsup\limits_{k\rightarrow\infty}Q_{p}(k),\quad p\geq 1,$ (2)

condition eq. $C$ is equivalent either to relation

 $0<\underline{Q}_{p_{0}}=Q_{p_{0}}=\bar{Q}_{p_{0}}<\infty$ ($C_{Q}$)

or to requiring that $\forall\varepsilon>0,\ \exists k_{0}\geq 0$ such that

 $(Q_{p_{0}}-\varepsilon)e_{k}^{p_{0}}\leq e_{k+1}\leq(Q_{p_{0}}+\varepsilon)e_{% k}^{p_{0}}\qquad\forall k\geq k_{0}.$ ($C_{\varepsilon}$)
###### Remark 2.5.

eq. $C$ $\Rightarrow$ eq. 1 (by eq. $C_{\varepsilon}$, or because eq. $C$ is stronger than $C$-linear).

###### Example 2.6.

(a) $x_{k}=\theta^{p_{0}^{k}}$ for some $\theta\in(0,1)$, $p_{0}>1$, is the standard example for $C$-order $p_{0}$ ($Q_{p_{0}}=1$). The usual instances are the $C$-quadratic $\{2^{-2^{k}}\}$, $\{3^{-2^{k}}\}$ ($\theta=\frac{1}{2}$, resp., $\theta=\frac{1}{3}$, $p_{0}=2$), the $C$-cubic $\{2^{-3^{k}}\}$ ($\theta=\frac{1}{2}$, $p_{0}=3$), etc.

(b) $\{c\cdot 2^{-2^{k}}\}$ ($c\in\mathbb{R}$ given, $c\neq 0$) also has $C$-quadratic convergence.

(c) The $C$-quadratic $\{(-1)^{k}\cdot 2^{-2^{k}}\}$ is nonmonotone, but with monotone $\{e_{k}\}$.

(d) $x^{d}_{k}=\Big{\{}\begin{smallmatrix}c(\frac{1}{3})^{2^{k}},&k\;\text{odd,}\\ \frac{1}{c}(\frac{1}{3})^{2^{k}},&k\;\text{even}\end{smallmatrix}$, $c>1$, does not have $C$-order $2$: $\underline{Q}_{2}=\frac{1}{c^{3}}$, $\bar{Q}_{2}=c^{3}$.

###### Quiz 2.7([42, Ex. 5.6  and p. 245]).

What is the rate of $\{x_{k}\}$ if $\{e_{k}\}$ is

(a) $10^{-2},10^{-3},10^{-4},10^{-5},\ldots;$

(b) $10^{-2},10^{-4},10^{-6},10^{-8},\ldots;$

(c) $10^{-2},10^{-3},10^{-5},10^{-8},10^{-13},10^{-21},\ldots;$

(d) $10^{-2},10^{-4},10^{-8},10^{-16},\ldots$?

###### Remark 2.8(see, e.g., [4]).

If it exists, the $C$-order $p_{0}>1$ of $\{x_{k}\}$ is unique. Indeed, if raising the power of the denominator in ${Q}_{p_{0}}(k)$, the quotient tends to infinity, since $Q_{p_{0}+\varepsilon}(k)={Q}_{p_{0}}(k)\frac{1}{(e_{k})^{\varepsilon}}$ $\forall\varepsilon>0$. Similarly, if lowering the power, it tends to zero.

###### Example 2.9.

Let $\{10^{-5}\cdot 2^{-2^{k}}\}$, $\{2^{-2^{k}/k}\}$, and $\{x^{d}_{k}\}$ ($c=10$); in fig. 6 we plot $\{Q_{1.8}(k)\}$, $\{Q_{2}(k)\}$, and $\{Q_{2.2}(k)\}$ for each of them.

The limiting values of $\underline{Q}_{p}$, $\bar{Q}_{p}$ from eq. 2 may be regarded as “functions” of $p\geq 1$. Their graphical illustration leads to the so-called $Q$-convergence profile of $\{x_{k}\}$ [4] (or $Q$-profile here, for short).

For all $C$-orders $p_{0}>1$, it holds that $\bar{Q}_{p}=\underline{Q}_{p}=Q_{p}$ $\forall p\geq 1$, and $Q_{p}$ is a “function” given by $Q_{p}=0$, $p\in[1,p_{0})$, $Q_{p_{0}}\in(0,+\infty),$ and $Q_{p}=+\infty$, $p>p_{0}$.

In fig. 7 we plot the $Q$-profile of $\{2^{-2^{k}}\}$.

###### Exercise 2.10([4]).

Show that $x_{k}\rightarrow x^{\ast}$ cannot have $C$-order $p_{0}<1$.

###### Exercise 2.11.

(a) Show that if $\{x_{k}\}$ has $C$-order $p_{0}>1$, then so has $\{\frac{e_{k+1}}{e_{k}}\}$ and

 $Q_{p_{0}}\Big{\{}\frac{e_{k+1}}{e_{k}}\Big{\}}=1,$ (3)

regardless of $Q_{p_{0}}\{e_{k}\}\neq 0$ [20, Rem. 3.9]. Does the converse hold? (Hint: take $\{\frac{1}{k}\}$.)

(b) [20, Rem. 3.9] Find a similar statement for $\{e_{k+1}e_{k}\}$.

(c) (Kelley [50, Ex. 5.7.15]) If $\{x_{k}\}$ has $C$-order $p_{0}>1$, show that $\lg e_{k}$ is concave, i.e., $\lg{e_{k+1}}-\lg{e_{k}}$ is decreasing.

Not only convergence with any high order exists, but even the infinite $C$-order, defined either by $Q_{p}=0\;\forall p>1$ (take, e.g., $\{2^{-2^{2^{k}}}\}$ [67, E 9.1-3(g)] or $\{2^{-2^{k^{2}}}\})$ or by convention, when the convergence is in a finite number of steps.

“The higher the order, the better” is a well-known cliché, which we will express in section 2.5 in terms of the big Ohs.

### 2.2 $Q$-Order $p_{0}>1$

We propose the following definitions of $Q$-order convergence:

• with unbounded nonmonotone errors if $\bar{Q}_{1}=\infty$ (e.g., $\{x^{a}_{k}\},\{x^{c}_{k}\}$);

• $Q$-sublinear if $1\leq\bar{Q}_{1}<+\infty$ (e.g., $x_{k}=\frac{2}{k}$, $k$ odd, $x_{k}=\frac{1}{k}$, $k$ even);

• exact $Q$-sublinear if $0<\underline{Q}_{1}\leq 1\leq\bar{Q}_{1}<+\infty$;

• at least $Q$-linear if $\bar{Q}_{1}<1$;

• $Q$-linear if $0<\bar{Q}_{1}<1$;

• exact $Q$-linear if $0<\underline{Q}_{1}\leq\bar{Q}_{1}<1$.

###### Remark 2.12.

(a) Obviously, when $x_{k}\rightarrow x^{\ast}$, then $\underline{Q}_{1}\leq 1$ and $\underline{Q}_{1}\leq\bar{Q}_{1}$.

(b) $\bar{Q}_{1}\in[0,+\infty]$ always exist, while $Q_{1}$ may not (i.e., when $\underline{Q}_{1}<\bar{Q}_{1}$).

(c) $\bar{Q}_{1}<1$ $\Rightarrow$ monotone, while $\bar{Q}_{1}>1$ $\Rightarrow$ nonmonotone errors; $\bar{Q}_{1}=\infty$ means unbounded nonmonotone errors. Unlike $0, $0<\bar{Q}_{1}\leq+\infty$ alone does not necessarily imply slow speed (e.g., $\{x^{a}_{k}\},\{x^{c}_{k}\}$ with unbounded nonmonotone errors).

The convergence may be fast when $\underline{Q}_{1}=0$, even if $\bar{Q}_{1}=\infty$ (e.g., $\{x^{a}_{k}\},\{x^{c}_{k}\}$), but is no longer fast when $0<\bar{Q}_{1}$.

Strict $Q$-superlinear is an intermediate order between linear and $p_{0}>1$.

###### Definition 2.13([67, p. 285]).

$\{x_{k}\}$ has $Q$-superlinear order999Or at least $Q$-superlinear, or even superlinear, as $R$-superlinear is seldom encountered. if $\bar{Q}_{1}=0(=Q_{1})$:

 $\lim_{k\rightarrow\infty}\frac{e_{k+1}}{e_{k}}=0,\qquad(\Leftrightarrow\exists c% _{k}\rightarrow 0\mbox{ s.t. }e_{k+1}=c_{k}e_{k}).$

Strict $Q$-superlinear order holds when, moreover, $\bar{Q}_{p}=+\infty$ $\forall p>1$.

###### Remark 2.14(cf. [75, (37b)]).

$Q$-superlinear order holds if $\exists\theta_{k}>0$ and $c>0$ such that $\theta_{k}\rightarrow 0$ and $e_{k}=c\prod_{i=1}^{k}\theta_{i}$ ($c$ may be taken to be $1$).

###### Example 2.15(strict $Q$-superlinear).

(a) Let $\big{\{}\frac{1}{k^{k}}\big{\}}$ [6, p. 22], $\big{\{}\frac{1}{10^{k^{2}}}\big{\}}$ [11], [4], $\big{\{}\frac{1}{k!}\big{\}}$ [47], $\big{\{}\frac{1}{c^{k^{2}}}\big{\}}$, $c>1$ [67, E 9.2-1(j)].

(b) [67, E 10.1-4], [79, p. 94] Given $0, let $x_{k+1}=-\frac{x_{k}}{\ln x_{k}}$, $k\geq 0$, $x_{0}=c.$

###### Exercise 2.16.

$\{x^{b}_{k}\}$ has $Q$-superlinear order, but not strict $Q$-superlinear order.

The following formulation has been commonly used for half a century, since the 1970 book of Ortega and Rheinboldt: “$\{x_{k}\}$ converges with $Q$-order at least $p_{0}>1$[67], [79], [76], [85]. Here we deal, as in [4] and [20], with a more restrictive notion, with the classic notion corresponding to the lower $Q$-order $q_{l}$ from definition 2.30.

###### Definition 2.17(see [20]; cf. [76], [4]).

$\{x_{k}\}$ has $Q$-order $p_{0}>1$ if any of the following equivalent conditions hold:

 $\displaystyle\lim_{k\rightarrow\infty}Q_{p}(k)=\bigg{\{}\begin{matrix}0,&p\in[% 1,p_{0}),\\[2.84526pt] +\infty,&p\in(p_{0},+\infty);\end{matrix}$ ($Q$) $\displaystyle\lim_{k\rightarrow\infty}Q_{L}(k)=p_{0},\qquad Q_{L}(k):=\frac{% \ln e_{k+1}}{\ln e_{k}};$ ($Q_{L}$) $\displaystyle\lim_{k\rightarrow\infty}Q_{\Lambda}(k)=p_{0},\qquad Q_{\Lambda}(% k):=\frac{\ln\frac{e_{k+2}}{e_{k+1}}}{\ln\frac{e_{k+1}}{e_{k}}};$ ($Q_{\Lambda}$)

or $\forall\varepsilon>0,$ $\exists A,B>0$ such that

 $Ae_{k}^{p_{0}+\varepsilon}\leq e_{k+1}\leq Be_{k}^{p_{0}-\varepsilon}\ \;\;% \forall k\geq k_{0}.$ ($Q_{\varepsilon}$)
###### Remark 2.18.

(a) If the $Q$-order $p_{0}$ exists, it is unique (recall remark 2.8).

(b) If $\{x_{k}\}$ has $Q$-order $p_{0}>1$, then the right-hand side inequalities in eq. $Q_{\varepsilon}$ imply that the errors are strictly monotone $(k\geq k_{0})$; see remarks 2.5 and 2.1.

In [20] we proved the equivalence of the four conditions above for $\{x_{k}\}\subset\mathbb{R}^{N}$, connecting and completing the independent, fundamental results of Potra [76] and Beyer, Ebanks, and Qualls [4]. This proof does not simplify for $\{x_{k}\}\subset\mathbb{R}$, but a natural connection between these conditions is found, e.g., by taking logarithms in eq. $C_{\varepsilon}$ to obtain eq. $Q_{L}$.101010$p_{0}$ in eq. $Q_{L}$ roughly means that the number of figures is multiplied by $p_{0}$ at each step ($k\geq k_{0}$); see, e.g., [48] and [79, p. 91]. Also, eq. $Q_{L}$ and eq. $Q_{\Lambda}$ can be connected by 2.11 and as follows.

###### Exercise 2.19.

If $\{x_{k}\}$ has strict monotone errors for $k\geq k_{0}$, prove that eq. $Q_{\Lambda}$ $\Rightarrow$ eq. $Q_{L}$ (hint: use the Stolz–Cesàro Theorem).

###### Remark 2.20.

(a) One can always set $k_{0}=0$ in eq. $Q_{\varepsilon}$ (this is actually used in the definitions of Potra from [76]), by a proper choice of smaller $A$ and larger $B$.

(b) In [20] we have denoted by $(Q_{\varepsilon})$ a condition which is trivially equivalent to eq. $Q$. Here we use $(Q_{\varepsilon})$ instead of $(Q_{I,\varepsilon})$ from [20] (“$I$” stood for “inequality” in that paper).

(c) eq. $Q$ implies that the $Q$-profile of $\{x_{k}\}$ has a single jump at $p_{0}>1$, but that the limit $Q_{p_{0}}$ is not required to exist, so

 $0\leq\underline{Q}_{p_{0}}\leq\bar{Q}_{p_{0}}\leq\infty,$ (4)

and so six cases result for the possible values of $\underline{Q}_{p_{0}}$ and $\bar{Q}_{p_{0}}$ ($0$, finite $>0$, $+\infty$).

(d) Relations $\underline{Q}_{p_{0}}=0$ or $\bar{Q}_{p_{0}}=+\infty$ might seem rather abstract, but they do actually occur (even simultaneously, as we shall see for the higher orders of the secant method; see also [99, p. 252 and Chap. 7], where $\underline{Q}_{2}=\bar{Q}_{2}=+\infty$, for a problem in $\mathbb{R}^{N}$).

(e) $Q$-order $p_{0}>1$ implies $\bar{Q}_{1}=0$, but the converse is not true.

fig. 8 shows the $Q$-profiles of $\{2^{-2^{k}/k}\}$ and $\{x_{k}^{d}\}$ ($c=\frac{5}{4}=1.25$).

Superlinearity may also be defined in conjunction with higher $Q$-orders, but this is tricky.

###### Definition 2.21(see [47], [20]).

The $Q$-order $p_{0}>1$ is called

• $Q$-superorder $p_{0}$ if $\bar{Q}_{p_{0}}=0(=Q_{p_{0}})$;

• $Q$-suborder $p_{0}$ if $\underline{Q}_{p_{0}}=+\infty(=Q_{p_{0}})$.111111In [20] this was defined by $\bar{Q}_{p_{0}}=+\infty$, which we believe is not strong enough.

The $Q$-order $2$ with $\bar{Q}_{2}=0$ is $Q$-superquadratic (analogously, $Q$-supercubic, etc.).

###### Exercise 2.22.

(a) The $Q$-superquadratic $\{\dot{x}_{k}\}=\{2^{-k2^{k}}\}$, the $Q$- (and $C$-)quadratic $\{2^{-2^{k}}\}$, and the $Q$-subquadratic $\{\mathring{x}_{k}\}=\{2^{-\frac{2^{k}}{k}}\}$ are in $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ decreasing order of speed. Study the $Q$-order of $z_{k}=\Big{\{}\begin{smallmatrix}\dot{x}_{k},&k\text{ odd,}\\[2.84526pt] \mathring{x}_{k},&k\text{ even,}\end{smallmatrix}$ ($2$ is erroneous in [20]).

(b) Although the $C$-quadratic $\{2^{-2^{k}}\}$ is eq. $\dot{e}_{k}=o(\mathring{e}_{k})$ faster than $\{k2^{-2^{k}}\}$, the latter, considered in [47], is actually $Q$-superquadratic; similarly, $\{2^{-2^{k}}/k\}$ is $Q$-subquadratic. The terminology “$Q$-sub/superquadratic” appears here to be in disagreement with the real speed.

(c) Determine the $Q$-orders from 2.7 by using eq. $Q_{L}$.

###### Example 2.23.

Let $\big{\{}10\cdot 2^{-3^{(k-1/{4^{k}})}}\big{\}}$ ($C$-cubic), $\big{\{}10\cdot 3^{-2^{(k+1/{3^{k}})}}\big{\}}$ ($C$-quadratic), and $\{\frac{1}{k^{k}}\}$ ($Q$-superlinear); in fig. 9 we see that $Q_{L}(k),Q_{\Lambda}(k-1)$ tend to $3,2,$ and $1$, respectively.

###### Remark 2.24.

Plotting/comparing $Q_{L}(k)$ and $Q_{\Lambda}(k-1)$ avoids conclusions based on $Q_{\Lambda}(k)$ using information ahead of that from $Q_{L}(k)$ (i.e., $x_{k+2}$ vs.  $x_{k+1}$).

The calculation of the order may be made easier by using logarithms.

###### Exercise 2.25.

Let $x_{k}\rightarrow x^{\ast}\in\mathbb{R}$.

(a) If, for some given $q\geq 1$ and $A,B>0$, one has

 $Ae_{k}e_{k-1}^{q}\leq e_{k+1}\leq Be_{k}e_{k-1}^{q},\quad k\geq k_{0},$ (5)

show that $\{x_{k}\}$ has $Q$-order

 $\lambda_{q}:=\frac{1+\sqrt{1+4q}}{2}.$

Thus, $\lambda_{1}\approx 1.618$ (the golden ratio), $\lambda_{2}=2$, $\lambda_{3}=\frac{1+\sqrt{13}}{2}\approx 2.3$, etc. Inequality eq. 5 appears in the analysis of the secant method (see theorems 4.18 and 4.22), and it does not necessarily attract $C$- or even exact $Q$-orders, even if $\exists\lim_{k\rightarrow\infty}\frac{e_{k+1}}{e_{k}e_{k-1}^{q}}=c\in(0,\infty)$.

(b) Calculate the $Q$-order of $\{x_{k}\}$ if it satisfies (see [94], [78])

 $Ae_{k}^{2}e_{k-1}\leq e_{k+1}\leq Be_{k}^{2}e_{k-1}.$

(c) If $\{x_{k}\}$ verifies the general relation (see, e.g., [95, Chap. 3], [7], [76], [44], [72])

 $Ae_{k}^{\alpha_{0}}\cdots e_{k-q}^{\alpha_{q}}\leq e_{k+1}\leq Be_{k}^{\alpha_% {0}}\cdots e_{k-q}^{\alpha_{q}},$

determine the condition verified by the $Q$-order (see [76] for the exact $Q$-order).

A special case holds when $\varepsilon=0$ in eq. $Q_{\varepsilon}$ (see also [9], [88], [79], [11], [76]).

###### Definition 2.26(see, e.g., [8]).

$\{x_{k}\}$ has exact $Q$-order $p_{0}$ if $\exists A,B,k_{0}\geq 0$, s.t.

 $A\cdot e_{k}^{p_{0}}\leq e_{k+1}\leq B\cdot e_{k}^{p_{0}}\ \;\;\forall k\geq k% _{0}.$ ($\bar{\underline{Q}}$)

This case can be characterized in terms of the asymptotic constants $\underline{Q}_{p_{0}},\bar{Q}_{p_{0}}$.

###### Proposition 2.27.

$\{x_{k}\}$ has exact $Q$-order $p_{0}$ iff $\underline{Q}_{p_{0}},\bar{Q}_{p_{0}}$ are finite and nonzero,

 $0<\underline{Q}_{p_{0}}\leq\bar{Q}_{p_{0}}<\infty,$ ($\bar{\underline{Q}}_{Q}$)

in which case the constants $A,B$ in eq. $\bar{\underline{Q}}$ are bounded by

 $A\leq\underline{Q}_{p_{0}}\leq\bar{Q}_{p_{0}}\leq B,$

and these bounds are attained in some special restrictive circumstances.

###### Example 2.28.

(a) $\{x^{d}_{k}\}$ in example 2.6 has exact $Q$-order $2$: $\underline{Q}_{2}=\frac{1}{c^{3}}\neq\bar{Q}_{2}=c^{3}$.

(b) $x_{k}=2^{-2^{k}}$, $k$ odd, $x_{k}=3\cdot 2^{-2^{k-\frac{1}{3^{3^{k}}}}}$, $k$ even, has $Q(2k)<\underline{Q}_{2}=\frac{1}{9}$, i.e., $A<\underline{Q}_{2}$.

###### Remark 2.29.

The sequences $\{\mathring{e}_{k}\}$ and $\{\d{e}_{k}\}$ are asymptotically similar when $\mathring{e}_{k}={\mathcal{O}}(\d{e}_{k})$ and $\d{e}_{k}={\mathcal{O}}(\mathring{e}_{k})$, as $k\rightarrow\infty$, denoted by $\d{e}_{k}=\Theta(\mathring{e}_{k})$ (see, e.g., [53], [25, p. 50]).

The exact $Q$-order may also be expressed as $e_{k+1}={\mathcal{O}}(e_{k}^{p_{0}})$ and $e_{k}^{p_{0}}={\mathcal{O}}(e_{k+1})$, as $k\rightarrow\infty$ (see [8], [9, p. 2], and also [71]), i.e., $e_{k+1}=\Theta(e_{k}^{p_{0}})$.

The classical $Q$-order of Ortega and Rheinboldt is the $q_{l}$ below.

###### Definition 2.30([4], [67], [88], [76]).

The lower/upper $Q$-orders of $\{x_{k}\}$ are

 $q_{l}=\Big{\{}\begin{matrix}\infty,\text{\ \ \ if \ }\bar{Q}_{p}=0\ \forall p% \geq 1,\\[2.84526pt] \inf\{p\in[1,\infty)\,:\,\bar{Q}_{p}=+\infty\},\end{matrix}\quad\mbox{resp., % \quad}q_{u}=\sup\big{\{}p\in[1,\infty):\underline{Q}_{p}=0\big{\}}.$ (6)

When $\{x_{k}\}$ has $Q$-order $p_{0}>1$, the lower and upper orders coincide, $q_{l}=q_{u}=p_{0}$; otherwise, $q_{l} [4]; we will also see this in theorem 2.48 (relation eq. 13).

Next we analyze $\{x^{b}_{k}\}$ from example 1.8, which has no $Q$-order, despite it being eq. $\dot{e}_{k}\leq\mathring{e}_{k}$ faster than $\big{\{}\frac{1}{2^{2^{k}}}\big{\}}$. We keep in mind that the $Q$-order does not measure the overall speed, but just compares the consecutive errors.

###### Example 2.31([20]).

Let $x^{b}_{k}=\Big{\{}\begin{smallmatrix}2^{-2^{k}},&k\ \text{even,}\\ 3^{-2^{k}},&k\ \text{odd,}\end{smallmatrix}$ with the $Q$-profile in fig. 10(a).

$\{x^{b}_{k}\}$ does not have a $Q$-order, as $q_{l}=\log_{3}4=1.2\ldots but still it has (exact) $R$-order $2$ (see definitions 2.35 and 2.38). We note the usual statement in this case that “$\{x^{b}_{k}\}$ converges with $Q$-order at least $q_{l}=1.2\ldots$ and with $R$-order at least $2$,” that no longer holds in the setting of this paper.

###### Exercise 2.32.

Show that $\{x^{c}_{k}\}$ has $\underline{Q}_{1}=0,\bar{Q}_{1}=+\infty$ and determine $q_{l},q_{u}$ (here we find $q_{l}<1$, and the $Q$-profile can be extended, as in [4], for $p<1$).

Determine the $Q$-profiles of $\{x^{a}_{k}\}$ and $\{x^{c}_{k}\}$.

The inequalities implied by the lower/upper $Q$-orders are as follows.

###### Remark 2.33.

If $1, then $\forall\varepsilon>0$, with $1, $\exists B>0,k_{0}\geq 0$, s.t. [80]

 $e_{k+1}\leq B\cdot e_{k}^{q_{l}-\varepsilon}\ \;\;\forall k\geq k_{0}$ (7)

(and relation eq. 6 says that $q_{l}$ is the largest value satisfying the above inequalities).

When $1, then $\forall\varepsilon>0,$ $\exists A>0,k_{1}\geq 0,$ such that

 $A\cdot e_{k}^{q_{u}+\varepsilon}\leq e_{k+1}\ \;\;\forall k\geq k_{1}.$ (8)

One can say that the lower $Q$-order is small when the maximum error reduction w.r.t. the previous step is small (i.e., small exponent $q_{l}+\varepsilon$ for $e_{k}$). Analogously, the upper order is large when the minimum error reduction per step is small.

The exact lower/upper $Q$-orders $q_{l}$, respectively, $q_{u}$ are defined if $\varepsilon=0$ in eq. 7eq. 8:

 $A\cdot e_{k}^{q_{u}}\leq e_{k+1}\leq B\cdot e_{k}^{q_{l}}\;\;\forall k\geq k_{% 2}.$

The role of $q_{u}$ will be seen in example 2.46.

The lower and upper $Q$-orders verify the relations (see [76], [20])

 $\displaystyle q_{l}=$ $\displaystyle\underline{Q}_{L}:=\liminf_{k\rightarrow\infty}\frac{\ln e_{k+1}}% {\ln e_{k}},$ (9) $\displaystyle q_{u}=$ $\displaystyle\bar{Q}_{L}:=\limsup_{k\rightarrow\infty}\frac{\ln e_{k+1}}{\ln e% _{k}},$ (10)

which will be completed in formulae eq. 19, respectively, eq. 20 by some computational variants.

### 2.3 $R$-Order $p_{0}>1$

The root factors consider some averaged quantities instead of relating the consecutive terms to one another; they are defined for $p=1$ by

 $R_{1}(k)=e_{k}^{\frac{1}{k}},\quad k\geq 1.$

We propose the following terminology (see also [67], [63], [75], [79], [100]):

• $R$-sublinear/with no $R$-order if $\bar{R}_{1}=1$ (e.g., $\{\frac{1}{k}\},\{\frac{1}{\sqrt{k}}\}$; see [75, Rem. 1.2.40]);

• $R$-linear if $0<\bar{R}_{1}<1$;

• at least $R$-linear if $\bar{R}_{1}<1$ (e.g., $x_{k}=\frac{1}{2^{k}}$, $k$ odd, $x_{k}=\frac{1}{4^{k}}$, $k$ even [75, Rem. 1.2.40], or $x_{k}=\frac{1}{2^{k}}$, $k$ even, $x_{k}=\frac{1}{2^{2^{k}}}$, $k$ odd);

• exact $R$-linear if $0<\underline{R}_{1}\leq\bar{R}_{1}<1$;

• (at least) $R$-superlinear if $\bar{R}_{1}=0(=R_{1})$ (e.g., $x_{k}=\frac{1}{k^{k}}$, $k$ even, $x_{k}=x_{k-1}$, $k$ odd).

The $R$-orders are alternatively defined by requiring the errors to be bounded by sequences converging to zero with corresponding $Q$-orders [32, p. 21], [51, Def. 2.1.3], [43, p. 218]:

• at least $R$-linear if $\exists\theta\in(0,1)$ and $c>0$ s.t. $e_{k}\leq c\cdot\theta^{k}$ ($k\geq 0$) [75, (37a)];

• $R$-superlinear if $\exists\theta_{k}\rightarrow 0$ and $c>0$ s.t. $e_{k}\leq c\prod_{i=1}^{k}\theta_{i}$ [75, (37b)].

###### Remark 2.34.

(a) At least $Q$-linear $\Rightarrow$ at least $R$-linear [67, E 9.3-5], [75, Thm.1.2.41], [77], [100], [43, p. 213] (using, e.g., eq. 14); the converse is false (see above or [100]).

(b) $Q$-superlinear $\Rightarrow$ $R$-superlinear, with the converse again being false (see, e.g., the above example).

When $p>1$, the root factors are defined by (see [67], [79], [76], [85])

 $R_{p}(k)={e_{k}}^{\frac{1}{p^{k}}},\quad k\geq 0.$
###### Definition 2.35(see [20]; cf. [76], [7]).

$\{x_{k}\}$ has $R$-order $p_{0}>1$ if any of the equivalent relations hold:121212In order that relation eq. $R_{\Lambda}$ is properly defined and equivalent to the rest of the following definitions, an additional assumption is required (see [20]): $1

 $\displaystyle\lim_{k\rightarrow\infty}R_{p}(k)=\bigg{\{}\begin{matrix}0,&p\in[% 1,p_{0}),\\[2.84526pt] 1,&p\in(p_{0},+\infty);\end{matrix}$ ($R$) $\displaystyle\lim_{k\rightarrow\infty}R_{L}(k)=p_{0},\quad R_{L}:=\big{|}\ln e% _{k}\big{|}^{\frac{1}{k}};$ ($R_{L}$) $\displaystyle\lim_{k\rightarrow\infty}R_{\Lambda}(k)=p_{0},\quad R_{\Lambda}:=% \bigg{|}\ln\frac{e_{k+1}}{e_{k}}\bigg{|}^{\frac{1}{k}};$ ($R_{\Lambda}$)

or $\forall\varepsilon>0,$ $\exists A,B>0$, $0<\eta,\theta<1$, and $k_{0}\geq 0$ such that

 $A\cdot\eta^{\left(p_{0}+\varepsilon\right)^{k}}\leq e_{k}\leq B\cdot\theta^{% \left(p_{0}-\varepsilon\right)^{k}}\;\;\forall k\geq k_{0}.$ ($R_{\varepsilon}$)
###### Remark 2.36.

(a) If the $R$-order $p_{0}$ exists, it is unique (see also [67, 9.2.3, p. 289]).

(b) We can always set $k_{0}=0$ in eq. $R_{\varepsilon}$, by choosing smaller $A$ and larger $B$ suitably (cf. [76], where $k_{0}=0$ was used in definitions).

(c) We prefer here the notation $(R_{\varepsilon})$ instead of $(R_{I,\varepsilon})$ from [20].

For $p\geq 1$ one defines the asymptotic quantities (see [67, sect. 9.2] and [88, p. 85])

 $\displaystyle\underline{R}_{p}=\liminf\limits_{k\rightarrow\infty}\,R_{p}(k),% \quad\bar{R}_{p}=\limsup\limits_{k\rightarrow\infty}\,R_{p}(k)\leq 1.$ (11)

An example of an $R$-profile is shown in fig. 10(b) for $\{x^{b}_{k}\}$, with (exact) $R$-order $2$.

###### Remark 2.37([67, p. 290], [20]).

If $\bar{R}_{p_{0}}<1$, then $\forall\varepsilon>0$ with $\bar{R}_{p_{0}}+\varepsilon<1,$ $\exists k_{0}\geq 0$ s.t.

 $e_{k}\leq\left(\bar{R}_{p_{0}}+\varepsilon\right)^{p_{0}^{k}}\;\;\;\forall k% \geq k_{0},$

while if $\underline{R}_{p_{0}}>0$, then $\forall\varepsilon>0$ with $0<\underline{R}_{p_{0}}-\varepsilon,$ $\exists k_{1}\geq 0$ s.t.

 $\left(\underline{R}_{p_{0}}-\varepsilon\right)^{p_{0}^{k}}\leq e_{k}\quad% \forall k\geq k_{1}.$

The following notion is defined similarly to the exact $Q$-order.

###### Definition 2.38([76]).

$\{x_{k}\}$ has exact $R$-order $p_{0}$ if $\exists A,B>0,0<\eta,\theta<1$, s.t.

 $A\cdot\eta^{p_{0}^{k}}\leq e_{k}\leq B\cdot\theta^{p_{0}^{k}}\quad\forall k% \geq k_{0}.$ ($\bar{\underline{R}}$)
###### Example 2.39.

$\{x^{b}_{k}\}$, $\{x^{c}_{k}\}$ have exact $R$-orders $2$: $\underline{R}_{2}=\frac{1}{3}$, $\bar{R}_{2}=\frac{1}{2}$, and $\underline{R}_{2}=\frac{1}{5}$, $\bar{R}_{2}=\frac{1}{2}$, respectively.

###### Exercise 2.40.

The $Q$-sub-/superquadratic $\{2^{-2^{k}}/k\}$, $\{k2^{-2^{k}}\}$ have $R_{2}=\frac{1}{2}$, but not exact $R$-order $2$. The $Q$-sub-/superquadratic $\{2^{-2^{k}/k}\}$, $\{2^{-k2^{k}}\}$ have $R_{2}=1$, respectively, $R_{2}=0$, i.e., they are $R$-sub-/superquadratic, again without having exact $R$-order $2$.

###### Remark 2.41.

Perhaps a proper definition of $Q$-superorder $p_{0}$ would be $Q$-order $p_{0}$ with $Q_{p_{0}}=0=R_{p_{0}}$, while $Q$-suborder $p_{0}$ would be $Q$-order $p_{0}$ with $Q_{p_{0}}=\infty$, $R_{p_{0}}=1$.

We characterize the exact $R$-order in the following result.

###### Proposition 2.42.

The exact $R$-order $p_{0}$ of $\{x_{k}\}$ is equivalently defined by

 $0<\underline{R}_{p_{0}}\leq\bar{R}_{p_{0}}<1,$ ($\bar{\underline{R}}_{R}$)

which implies the following bounds for $\eta,\theta$ from eq. $\bar{\underline{R}}$:

 $\eta\leq\underline{R}_{p_{0}}\leq\bar{R}_{p_{0}}\leq\theta;$

these bounds are attained in some special restrictive circumstances.

###### Remark 2.43.

A particular instance from eq. $\bar{\underline{R}}$, i.e., $e_{k}\leq B\cdot\theta^{2^{k}}$, was considered as a definition for (at least) $R$-quadratic convergence, and some computational scientists (who we suspect are not computational optimizers) have misled by simply calling it “quadratic convergence,” leading to the confusion noted in [93].

As the $R$-orders require weaker conditions than the $Q$-orders, they may allow nonmonotone errors. This aspect is perhaps not widely known; we have found it pointed out in [65, p. 620, and in the 1st ed.], [51, p. 14], [25, p. 51], and [47].

Clearly, iterative methods with nonmonotone errors are usually not desired.

We can easily find conditions for monotone errors in the case of exact $R$-order $p_{0}>1$.

###### Theorem 2.44(monotone errors in exact $R$-orders).

If $\{x_{k}\}$ obeys eq. $\bar{\underline{R}}$ and
$p_{0}>\frac{\ln\eta}{\ln\theta}\left(\geq\frac{\ln\bar{R}_{p_{0}}}{\ln% \underline{R}_{p_{0}}}\right)\quad\mbox{or}\quad\left(p_{0}=\frac{\ln\eta}{\ln% \theta}\ \mbox{and}\ B
then it has strict monotone errors ($k\geq k_{0}$).

The lower and upper $R$-orders, $r_{l}$, respectively, $r_{u}$, and further notations from [20] are

 $\displaystyle r_{l}=$ $\displaystyle\inf\big{\{}p\in[1,\infty):\bar{R}_{p}=1\big{\}},\qquad r_{u}=% \sup\big{\{}p\in[1,\infty):\underline{R}_{p}=0\big{\}},$ $\displaystyle\underline{R}_{L}:=$ $\displaystyle\liminf\limits_{k\rightarrow\infty}\big{|}\ln e_{k}\big{|}^{\frac% {1}{k}},\quad\bar{R}_{L}:=\limsup\limits_{k\rightarrow\infty}\big{|}\ln e_{k}% \big{|}^{\frac{1}{k}},$ $\displaystyle\underline{R}_{\Lambda}:=$ $\displaystyle\liminf\limits_{k\rightarrow\infty}\bigg{|}\ln\frac{e_{k+1}}{e_{k% }}\bigg{|}^{\frac{1}{k}},\quad\bar{R}_{\Lambda}:=\limsup\limits_{k\rightarrow% \infty}\bigg{|}\ln\frac{e_{k+1}}{e_{k}}\bigg{|}^{\frac{1}{k}}.$
###### Remark 2.45.

The lower $R$-order $r_{l}$ of $\{x_{k}\}$ was also defined as follows: $\exists\{\mathring{x}_{k}\}$ with $C$-order $\mathring{p}_{0}=r_{l}$ s.t. $\{x_{k}\}$ is eq. $\dot{e}_{k}\leq\mathring{e}_{k}$ faster than $\{\mathring{x}_{k}\}$ [29], [76], [50, Def. 4.1.3].

We now consider some sequences similar to those in example 1.8.

###### Example 2.46.

(a) Let $\mathring{x}_{k}=2^{-2^{k}}$ and take $\dot{x}_{k}=\Big{\{}\begin{smallmatrix}2^{-2^{k}},&k\ {\it even},\\ 2^{-3^{k}},&k\ {\it odd}\end{smallmatrix}$ (Jay [47]).

Then $\{\dot{x}_{k}\}$ converges eq. $\dot{e}_{k}\leq\mathring{e}_{k}$ faster than $\{\mathring{x}_{k}\}$ and though $\{\dot{x}_{k}\}$ has neither $C$-order, nor $Q$-order, nor $R$-order ($\dot{r}_{l}=2$, $\dot{r}_{u}=3$), $\{\mathring{x}_{k}\}$ has $C$-order $2$.

(b) Extending the above behavior, $\dot{x}_{k}=\Big{\{}\begin{smallmatrix}2^{-4^{k}},&k\ {\it even},\\ 2^{-5^{k}},&k\ {\it odd}\end{smallmatrix}$ has no $C$-, $Q$-, or $R$-order, but it converges $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ faster than $\mathring{x}_{k}=2^{-2^{k}}$.

(c) $\dot{x}_{k}=\Big{\{}\begin{smallmatrix}2^{-3^{2^{k}}},&k\ {\it even},\\ 2^{-4^{2^{k}}},&k\ {\it odd}\end{smallmatrix}$ is $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ faster than $\mathring{x}_{k}=2^{-2^{2^{k}}}$; $\{\dot{x}_{k}\}$ has no $C$- or $Q$-order (but has infinite $R$-order), while $\{\mathring{x}_{k}\}$ has infinite $C$-order.

Here $\{\dot{x}_{k}\}$ deserves a closer look, as it is a perfect candidate to support the comments from [93]: indeed, it attains infinite $R$-order, but its errors are unbounded nonmonotone. Nocedal and Wright [65, p. 620] also noted such possible behavior.

###### Remark 2.47.

Clearly, statements such as those in the above example can appear only for sequences $\{\dot{x}_{k}\}$ with $\underline{Q}_{1}=0$, as this condition allows large upper orders $q_{u}$. When $\underline{Q}_{1}>0$, the corresponding sequence cannot converge quickly.

### 2.4 Relations between the $C$-, $Q$-, and $R$-Orders of a Sequence

$C$-order $p_{0}>1$ of $\{x_{k}\}$ implies $Q$-order $p_{0}$ and in turn $R$-order $p_{0}$ (see 2.22 and 2.31 for converses). We state this below and, as in [4], we use curly braces for equivalent orders.

###### Theorem 2.48(see [20]; cf. [76], [4]).

Let $x_{k}\rightarrow x^{\ast}$ and $p_{0}>1$. Then (see footnote 12)

 $\{C,C_{Q},C_{\varepsilon}\}\underset{\nLeftarrow}{\Rightarrow}\{Q,Q_{L},Q_{% \Lambda},Q_{\varepsilon}\}\underset{\nLeftarrow}{\Rightarrow}\{R,R_{L},R_{% \Lambda},R_{\varepsilon}\},$ (12)

or, in a more generic fashion (i.e., $\{C\}:=\{C,C_{Q},C_{\varepsilon}\}$),

 $\{C\}\underset{\nLeftarrow}{\Rightarrow}\{Q\}\underset{\nLeftarrow}{% \Rightarrow}\{R\}.$

Moreover, the following relation holds for the lower and upper orders:

 $q_{l}=\underline{Q}_{L}\leq\underline{R}_{L}=\underline{R}_{\Lambda}=r_{l}\leq r% _{u}=\bar{R}_{L}=\bar{R}_{\Lambda}\leq\bar{Q}_{L}=q_{u}.$ (13)

Relation eq. 13 is obtained from the well-known inequalities for positive numbers,

 $\liminf\limits_{k\rightarrow\infty}\frac{a_{k+1}}{a_{k}}\leq\liminf\limits_{k% \rightarrow\infty}\left|a_{k}\right|^{\frac{1}{k}}\leq\limsup_{k\rightarrow% \infty}\left|a_{k}\right|^{\frac{1}{k}}\leq\limsup_{k\rightarrow\infty}\frac{a% _{k+1}}{a_{k}},$ (14)

taking $a_{k}=\big{|}\ln e_{k}\big{|}$; see [76] and [20].

Any inequality from eq. 13 may be strict. Now we see (e.g., in examples 2.31 and 2.49) that the inner inequality may be an equality (i.e., obtain $R$-order), while one of the outer inequalities may be strict (i.e., no $Q$-order).

An $\{x_{k}\}$ with exact $R$-order $\tau>1$ arbitrarily large and $1 (but arbitrarily close) is again suitable for justifying the comments from [93].

###### Example 2.49([67, E 9.3-3], [76]).

Given any numbers $1, take $0<\theta<1$, $\eta=\theta^{q}$ with $q>1$ such that $qs>\tau.$ Then $x_{k}=\Big{\{}\begin{smallmatrix}\theta^{\tau^{k}},&k\;\text{odd,}\\ \eta^{\tau^{k}},&k\;\text{even}\end{smallmatrix}$ has exact $R$-order $\tau$, while $q_{l}=\underline{Q}_{L}=\frac{\tau}{q}$ and $q_{u}=\bar{Q}_{L}=\tau q$($>\tau$), and thus it has no $Q$-order (in the classical sense from [67] $\{x_{k}\}$ has $Q$-order at least $\frac{\tau}{q}$).

### 2.5 Comparing the Speeds of Two Sequences

We consider only $C$-orders.

###### Comparison 2.50(higher $C$-order, faster speed).

If $\{\mathring{x}_{k}\},\{\dot{x}_{k}\}$ have $C$-orders $1<\mathring{p}_{0}<\dot{p}_{0}$, then $\{\dot{x}_{k}\}$ is $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ faster than $\{\mathring{x}_{k}\}$.

For the proof, one may use eq. $C_{\varepsilon}$. In [21] we show some simplified proofs.

###### Remark 2.51.

(a) 2.50 holds regardless of the magnitude of $\dot{p}_{0}-\mathring{p}_{0}$.

(b) The $C$-orders (i.e., sublinear, linear, strict superlinear, ( strict superlinear removed! ), $1<\mathring{p}_{0}<\dot{p}_{0}$, and infinite) form classes in $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ increasing speed (cf. 2.2).

As seen in previous examples, comparing by $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$ $\forall\alpha>1]$ is much stronger than by $[$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$, $\alpha\in(1,\alpha_{0}]]$ for some given $\alpha_{0}>1$.

How similarly do two sequences with the same $C$-order and identical $Q_{p_{0}}$ behave?

###### Comparison 2.52.

$\{\frac{1}{3^{2^{k}}}\}$ is $\big{[}$eq. $\dot{e}_{k}={\mathcal{O}}(\mathring{e}_{k}^{\alpha})$, $\alpha\in(1,\frac{\ln 3}{\ln 2}]\big{]}$ faster than $\{\frac{1}{2^{2^{k}}}\}$ (recall 1.6). This shows that if $\{\mathring{x}_{k}\},\{\dot{x}_{k}\}$ have $C$-order $p_{0}$ with the same $Q_{p_{0}}$, this does not necessarily mean that (in the asymptotic range) one has $\{\dot{e}_{k}\}\approx\{\mathring{e}_{k}\}$.

Brezinski [11] noted that both $\{\frac{1}{k}\}$ and $\{\frac{1}{k^{2}}\}$ have the same $Q_{1}=1$ (i.e., $C$-sublinear order), but quite different speeds.

###### Remark 2.53.

Assessing the asymptotic constant $Q_{p_{0}}$ may not make sense when $\{x_{k}\}\subset\mathbb{R}^{N},N\geq 2$, as its definition is norm-dependent [67, E 9.1-2], [20].

## 3 Computational Versions of the Convergence Orders

The error-based analysis of the orders above is similar in some ways to the local convergence theory for iterative methods from section 4: both assume the existence of the limit/solution $x^{\ast}$ and infer essential properties, even if $x^{\ast}$ is not known and in practice one needs to use quantities based solely on information available at each step $k$.

Next, we analyze two practical approaches, equivalent to error-based analysis: the replacing of $|x^{\ast}-x_{k}|$ by (the absolute value of) either the corrections $s_{k}:=|x_{k+1}-x_{k}|$ or the nonlinear residuals $|f_{k}|:=|f(x_{k})|$.

We keep the notation from previous work and obtain a rather theoretical setting in this analysis (e.g., $s_{k}$ requiring $x_{k+1}$); in numerical examples, however, we will use only information available at step $k$ (i.e., $x_{k},|f_{k}|,s_{k-1},x_{k-1},|f_{k-1}|,s_{k-2},\ldots,$ etc.).

### 3.1 Computational Convergence Orders Based on Corrections

When the corrections $\{s_{k}\}$ converge with lower $Q$-order $q_{l}$, then $\{x_{k}\}$ also converges and attains at least lower $R$-order $q_{l}$.

###### Theorem 3.1(see [75, Thm. 1.2.42], [43, Lem. 4.5.6]).

Let $\{x_{k}\}\subset\mathbb{R}$. If $\exists c\in(0,1)$ and $k_{0}\in\mathbb{N}$ with

 $|x_{k+1}-x_{k}|\leq c\cdot|x_{k}-x_{k-1}|\quad(\mbox{i.e., }s_{k}\leq c\cdot s% _{k-1}),\quad k\geq k_{0}+1,$

then $\exists x^{\ast}\in\mathbb{R}$ such that $x_{k}\rightarrow x^{\ast}$ at least $R$-linearly.

If $\exists c>0,p_{0}>1$, and $k_{0}\in\mathbb{N}$ s.t.

 $c^{\frac{1}{p_{0}-1}}s_{k_{0}}<1$

and

 $s_{k}\leq c\cdot s_{k-1}^{p_{0}},\quad k\geq k_{0}+1,$

then $\exists x^{\ast}\in\mathbb{R}$ such that $x_{k}\rightarrow x^{\ast}$ with lower $R$-order at least $p_{0}$.

The errors and corrections are tightly connected when the convergence is fast.

###### Lemma 3.2(Potra–Pták–Walker Lemma; [79, Prop. 6.4], [98]).

$x_{k}\rightarrow x^{\ast}$ $Q$-superlinearly iff $x_{k+1}-x_{k}\rightarrow 0$ $Q$-superlinearly.

In the case of $Q$-superlinear convergence, it holds (the Dennis–Moré Lemma [30]) that

 $\lim_{k\rightarrow\infty}\frac{|x_{k+1}-x_{k}|}{|x^{\ast}-x_{k}|}=1.$ (15)
###### Remark 3.3.

(a) As pointed out by Dennis and Moré in [30], eq. 15 alone does not imply $Q$-superlinear convergence: take $x_{2k-1}=\frac{1}{k!}$, $x_{2k}=2x_{2k-1}$, $k\geq 1$, for example.

(b) While the statement and the proofs of this result consider multidimensional spaces and norms (or even metrics) in eq. 15, Brezinski [10] has noted that in $\mathbb{R}$ the sufficiency can also be proved by using l’Hôpital’s rule for sequences.

As in [4], we use an apostrophe for the resulting quotient factors ($p>1$):

 $Q_{p}^{\prime}(k):=\frac{s_{k+1}}{s_{k}^{p}}=\frac{|x_{k+2}-x_{k+1}|}{|x_{k+1}% -x_{k}|^{p}},\quad k\geq 0,$

and the above lemma shows that $C$-order $p_{0}$ $\Rightarrow$ $C^{\prime}$-order $p_{0}$ with $Q_{p_{0}}=Q^{\prime}_{p_{0}}$ (see also [40]). The same statement holds for the $Q$-orders and corresponding upper/lower limits of $Q_{p_{0}}(k)$. The equivalence of the error-based and computational orders ($\{C,C^{\prime}\}$, etc.) is stated in section 3.3, which incorporates the nonlinear residuals as well.

The $C^{\prime}$-, $Q^{\prime}$-, and $Q^{\prime}_{\varepsilon}$-orders are semicomputational: they do not use $x^{\ast}$ but still require $p_{0}$. Instead, of much practical interest are the (full) computational expressions

 $\displaystyle\lim_{k\rightarrow\infty}Q^{\prime}_{L}(k)=p_{0},\qquad Q^{\prime% }_{L}(k):=\frac{\ln s_{k+1}}{\ln s_{k}},$ ($Q^{\prime}_{L}$) $\displaystyle\lim_{k\rightarrow\infty}Q^{\prime}_{\Lambda}(k)=p_{0},\qquad Q^{% \prime}_{\Lambda}(k):=\frac{\ln\frac{s_{k+2}}{s_{k+1}}}{\ln\frac{s_{k+1}}{s_{k% }}},$ ($Q^{\prime}_{\Lambda}$)

as they are equivalent to the corresponding error-based orders (see corollary 3.5).

### 3.2 Computational Convergence Orders Based on Nonlinear Residuals

In section 4 we study the speed of some iterative methods toward a zero $x^{\ast}$ of $f$. The nonlinear mapping $f:D\subseteq\mathbb{R}\rightarrow\mathbb{R}$ is assumed to be sufficiently many times differentiable and the solution is assumed simple:131313This terminology is used for equations in $\mathbb{R}$, while for systems in $\mathbb{R}^{N}$ the terminology is “nonsingular,” as (the Jacobian) $F^{\prime}(x^{\ast})$ is a matrix. $f^{\prime}(x^{\ast})\neq 0$ (this means that $f(x)=(x-x^{\ast})g(x)$ and $g(x^{\ast})\neq 0$). However, we will consider multiple solutions as well.

The use of the nonlinear residuals for controlling the convergence to $x^{\ast}$ is natural, as they can be seen as “proportional” to the errors: by the Lagrange theorem,

 $f(x)=f^{\prime}(x+\theta(x^{\ast}-x))(x-x^{\ast})\quad\mbox{ for some }\theta% \in(0,1),$ (16)

which in applied sciences is usually written as

 $f(x)\approx f^{\prime}(x^{\ast})(x-x^{\ast})\quad(x\approx x^{\ast}).$ (17)

However, instead of an asymptotic relation, this is often regarded as an approximate equality. Such an approach is as precise as the idiomatic “the higher the order, the fewer the iterations (needed)”: it may not refer to the asymptotic range.

###### Quiz 3.4.

If $f$ is a polynomial of degree $2$ with $f^{\prime}(x^{\ast})=1$ at the root $x^{\ast}=0$, how large can $|f(x)|$ be when $|x-x^{\ast}|=0.0001$ holds (and then for $|x-x^{\ast}|=10^{-16}$)?

Returning to the quotient factors, we use, as in [20], double prime marks,

 $Q_{p_{0}}^{\prime\prime}(k):=\frac{|f(x_{k+1})|}{|f(x_{k})|^{p_{0}}}=:\frac{|f% _{k+1}|}{|f_{k}|^{p_{0}}},$

and notice that if $Q_{p_{0}}\in(0,+\infty)$, then, by (16),

 $Q_{p_{0}}^{\prime\prime}(k)\rightarrow\frac{Q_{p_{0}}}{|f^{\prime}(x^{\ast})|^% {p_{0}-1}}.$ (18)

This leads us to the $C^{\prime\prime}$- and $Q^{\prime\prime}$-orders $p_{0}>1$. For instance,

 $\displaystyle\lim_{k\rightarrow\infty}Q^{\prime\prime}_{L}(k)=p_{0},\qquad Q^{% \prime\prime}_{L}(k):=\frac{\ln|f_{k+1}|}{\ln|f_{k}|},$ ($Q^{\prime\prime}_{L}$) $\displaystyle\lim_{k\rightarrow\infty}Q^{\prime\prime}_{\Lambda}(k)=p_{0},% \qquad Q^{\prime\prime}_{\Lambda}(k):=\frac{\ln|f_{k+2}/f_{k+1}|}{\ln|f_{k+1}/% f_{k}|}.$ ($Q^{\prime\prime}_{\Lambda}$)

### 3.3 The Equivalence of the Error-Based and Computational Orders

This equivalence was given in $\mathbb{R}^{N}$ [20] (see also [76] and [4]). Here we formulate it as a corollary, since in $\mathbb{R}$ the three $C$-orders are equivalent (see [20, Ex. 3.22] for $\{C^{\prime\prime}\}\nRightarrow\{C,C^{\prime}\}$ in $\mathbb{R}^{N}$).

###### Corollary 3.5(cf. [20]).

Let $f$ be smooth, $x^{\ast}$ simple, $x_{k}\rightarrow x^{\ast}$, $p_{0}>1$. Then (see footnote 12)

 $\{C,C^{\prime},C^{\prime\prime}\}\underset{\nLeftarrow}{\Rightarrow}\{Q,Q^{% \prime},Q^{\prime\prime}\}\underset{\nLeftarrow}{\Rightarrow}\{R,R^{\prime},R^% {\prime\prime}\}.$

Moreover, the lower/upper orders and asymptotic constants obey

 $\displaystyle\underline{Q}_{L}$ $\displaystyle=\underline{Q}_{L}^{\prime}=\underline{Q}_{L}^{\prime\prime}=q_{l% }\quad\textrm{and\quad}\bar{Q}_{L}=\bar{Q}_{L}^{\prime}=\bar{Q}_{L}^{\prime% \prime}=q_{u},\quad\mbox{resp.,}$ (19) $\displaystyle\underline{Q}_{q_{u}}$ $\displaystyle=\underline{Q}_{q_{u}}^{\prime}=|f^{\prime}(x^{\ast})|^{{p_{0}}-1% }\underline{Q}_{q_{u}}^{\prime\prime}\quad\textrm{and\quad}\bar{Q}_{q_{l}}=% \bar{Q}_{q_{l}}^{\prime}=|f^{\prime}(x^{\ast})|^{{p_{0}}-1}\bar{Q}_{q_{l}}^{% \prime\prime}.$ (20)

Consequently, one has $C$-order $p_{0}>1$ iff

 $0

and $Q$-order $p_{0}>1$ iff

 $p_{0}=\underline{Q}_{L}=\underline{Q}_{L}^{\prime}=\underline{Q}_{L}^{\prime% \prime}=q_{l}=\bar{Q}_{L}=\bar{Q}_{L}^{\prime}=\bar{Q}_{L}^{\prime\prime}=q_{u% }=Q_{\Lambda}=Q^{\prime}_{\Lambda}=Q^{\prime\prime}_{\Lambda}.$

Relation eq. 19 shows equalities of orders, while eq. 20 shows that of constants.

The equivalence of the computational orders based on the corrections follows from the Potra–Pták–Walker lemma, while eq. 19 and eq. 20 follow from the Dennis–Moré lemma.

The equivalence of the computational orders based on the nonlinear residuals is obtained using the Lagrange theorem, by eq. 16.

## 4 Iterative Methods for Nonlinear Equations

Unless otherwise stated, the nonlinear mapping $f:D\subseteq\mathbb{R}\rightarrow\mathbb{R}$ is sufficiently smooth and the solution $x^{\ast}$ is simple.

Since an equation $f(x)=0$ might have a unique, several (even infinite), or no solution at all, the iterative methods usually need an initial approximation close to an $x^{\ast}$.

###### Definition 4.1([67, Def. 10.1.1]).

A method converges locally to $x^{\ast}$ if $\exists V\subseteq D$ (neighborhood of $x^{\ast}$) s.t. $\forall x_{0}\in V$, the generated iterations $\{x_{k}\}\subset D$ and $x_{k}\rightarrow x^{\ast}$.

### 4.1 The Newton Method

The first result on the local convergence of a method was given for the Newton iterations (see [67, NR 10.2-1], [101], [23, sect. 6.4]),

 $x_{k+1}=x_{k}-\frac{f(x_{k})}{f^{\prime}(x_{k})},\quad k=0,1,\ldots,x_{0}\in D,$ (21)

and it is to due to Cauchy in 1829 ([22], available on the Internet). These iterates are also known as the tangent method, though their initial devising did not consider this geometric interpretation.141414The first such interpretation was given by Mourraille (1768) [23, sect. 6.3].

#### 4.1.1 History

This method can be traced back to ancient times (Babylon and Egypt, ca. 1800 B.C.), as it is conjectured that it was used in the computation (with five correct—equivalent—decimal digits) of $\sqrt{2}$ (see [3], [49, sect. 1.7], [74], with the references therein, and also [86, p. 39] for an annotated picture). While this is not certain, there is no doubt that Heron used these iterations for $\sqrt{a}$ [3].

“Heron of Alexandria (first century A.D.) seems to have been the first to explicitly propose an iterative method of approximation in which each new value is used to obtain the next value,” as noted by Chabert et al. [23, p. 200]. Indeed, Heron has approximated $\sqrt{720}$ in an iterative fashion by elements corresponding to the Newton method for solving $x^{2}-720=0$ [23, sect. 7.1].

We can also recognize the Newton method for $x^{2}-A=0$ used by Theon of Alexandria (ca. 370 A.D.) in approximating $\sqrt{4500}$, based entirely on geometric considerations (originating from Ptolemy; see [74]). See [23, sect. 7.2].

So efficient is this method in computing two of the basic floating point arithmetic operations—the square root and division (see 4.4)—that it is still a choice even in current (2021) codes (see, e.g., [62, pp. 138, 124]).

Let us recall some general comments on the subsequent history of this method, thoroughly analyzed by Ypma in [101].

A method algebraically equivalent to Newton’s method was known to the 12th century algebraist Sharaf al-Dīn al-Ṭūsī […], and the 15th century Arabic mathematician Al-Kāshī used a form of it in solving $x^{p}-N=0$ to find roots of $N$. In western Europe a similar method was used by Henry Briggs in his Trigonometria Britannica, published in 1633, though Newton appears to have been unaware of this […]. [101] (see also [82])

In solving nonlinear problems using such iterates, Newton ($\approx$$1669$) and subsequently Raphson (1690) dealt only with polynomial equations,151515$x^{3}-2x-5=0$ is “the classical equation where the Newton method is applied” [90].,161616Actually, as noted by Ypma, Newton considered such iterations in solving a transcendent equation, namely, the Kepler equation $x-e\sin{x}=M$. “However, given the geometrical obscurity of the argument, it seems unlikely that this passage exerted any influence on the historical development of the Newton-Raphson technique in general” [101]. described in [90]. Newton considered the process of successively computing the corrections, which are ultimately added together to form the final approximation (cf. [90]); see algorithm 1.

Algorithm 1 The algorithm devised by Newton (left), and his first example (right)
Let $f(x)$, $x_{0}$ [$f(x)=x^{3}-2x-5$, $x_{0}=2$]

Let $g_{0}(s)=f(x_{0}+s)$ $[g_{0}(s)=(2+s)^{3}-2(2+s)-5=s^{3}+6s^{2}+10s-1$]

For $k=0:m-1$

Compute $\tilde{g}_{k}(s)$ (the order-one approx. of $g_{k}(s)$) [$\tilde{g}_{0}(s)=10s-1$]

Solve for $s_{k}$ in $\tilde{g}_{k}(s)=0$ [$s_{0}=0.1$]

Let $g_{k+1}(s)=g_{k}(s_{k}+s)$ [$g_{1}(s)=g_{0}(s_{0}+s)=s^{3}+6.3s^{2}+11.23s+0.061$]

$x_{n}:=x_{0}+\sum_{k=0}^{n-1}s_{k}$ [$x_{n}=2+0.1-\frac{0.061}{11.23}+\cdots+s_{n-1}$]

Raphson considered approximations updated at each step, a process equivalent to eq. 21 (see [55], [101], [90]). However, the derivatives of $f$ (which could be calculated with the “fluxions” of that time) do not appear in their formulae, only the first-order approximations from the finite series development of polynomials (using the binomial formula). For more than a century, the belief was that these two variants represented two different methods;171717“Nearly all eighteenth century writers and most of the early writers of the nineteenth century carefully discriminated between the method of Newton and that of Raphson” (Cajori [13]). For a given input ($n$ fixed), the result of LABEL:alg._devised_by_Newton may depend on how accurately the different fractions (e.g., $\frac{0.061}{11.23}$) are computed. it was Stewart (1745) who pointed out that they are in fact equivalent (see [90]). As noted by Steihaug [90],

It took an additional 50 years before it was generally accepted that the methods of Raphson and Newton were identical methods, but implemented differently. Joseph Lagrange in 1798 derives the Newton-Raphson method […] and writes that the Newton’s method and Raphson’s method are the same but presented differently and Raphson’s method is plus simple que celle de Newton.

Simpson (1740) was the first to apply the method to transcendent equations, using “fluxions.”181818The fluxions $\dot{x}$ did not represent $f^{\prime}$, but are “essentially equivalent to $dx/dt$; implicit differentiation is used to obtain $dy/dt$, subsequently dividing through by $dx/dt$ as instructed produces the derivative $A=dy/dx$ of the function. […] Thus Simpson’s instructions closely resemble, and are mathematically equivalent to, the use of [eq. 21]. The formulation of the method using the now familiar $f^{\prime}(x)$ calculus notation of [eq. 21] was published by Lagrange in 1798 […]” [101] (see also [90]).${}^{,}\,$${}^{\ref{ftnt: Newton solved iter Kepler eq}}$ Even more important, he extended it to the solving of nonlinear systems of two equations (see [23], [101], [90]),191919The first nonlinear system considered was $y+\sqrt{y^{2}-x^{2}}-10=0$, $x+\sqrt{y^{2}+x}-12=0$, with $(x_{0},y_{0})=(5,6)$, according to [101] and [90]. subsequently generalized to the usual form known today: $F(x)=0$ with $F:\mathbb{R}^{N}\rightarrow\mathbb{R}^{N}$, for which, denoting the Jacobian of $F$ at $x_{k}$ by $F^{\prime}(x_{k})$, one has to solve for $s_{k}$ at each step the linear system

 $F^{\prime}(x_{k})s=-F(x_{k}).$ (22)

Then appear writers like Euler, Laplace, Lacroix and Legendre, who explain the Newton-Raphson process, but use no names. Finally, in a publication of Budan in 1807, in those of Fourier of 1818 and 1831, in that of Dandelin in 1826, the Newton-Raphson method is attributed to Newton. The immense popularity of Fourier’s writings led to the universal adoption of the misnomer “Newton’s method” for the Newton-Raphson process.

While some authors use “the Newton–Raphson method,”Add footnote: 202020“Who was Raphson?” (N. Bićanić and K. H. Johnson, 1979) has the popular answer “Raphson was Newton’s programmer” (S. Nash, 1992). the name Simpson—though appropriate212121“[…] it would seem that the Newton–Raphson–Simpson method is a designation more nearly representing the facts of history in reference to this method” (Ypma [101]). “I found no source which credited Simpson as being an inventor of the method. None the less, one is driven to conclude that neither Raphson, Halley nor anyone else prior to Simpson applied fluxions to an iterative approximation technique” (Kollerstrom [55]).—is only occasionally encountered (e.g., [59], [34], [41], [39]). In a few cases “Newton–Kantorovich” appears, but refers to the mid-1900 contributions of Kantorovich to semilocal convergence results when $F$ is defined on normed spaces.

#### 4.1.2 Attainable $C$-Orders

The method attains at least $C$-order $2$ for simple solutions. Regarding rigorous convergence results, “Fourier appears to have been the first to approach this question in a note entitled Question d’analyse algébrique (1818) […], but an error was found in his formula,” as noted by Chabert et al. [23, sect. 6.4], who continue by noting that

Cauchy studied the subject from 1821 onwards […], but did not give a satisfactory formulation until 1829. […] The concern for clarity and rigour, which is found in all of Cauchy’s work, tidies up the question of the convergence of Newton’s method for the present.

###### Theorem 4.2(cf. [22], [23, Thm. II, p. 184]).

If $x^{\ast}$ is simple and $f^{\prime\prime}(x^{\ast})\neq 0$, then the Newton method converges locally, with $C$-quadratic order:

 $\lim_{k\rightarrow\infty}\frac{|x^{\ast}-x_{k+1}|}{|x^{\ast}-x_{k}|^{2}}=\bigg% {|}\frac{f^{\prime\prime}(x^{\ast})}{2f^{\prime}(x^{\ast})}\bigg{|}=:Q_{2}.$ (23)
###### Remark 4.3.

The usual proofs for eq. 23 consider the Taylor developments of $f(x_{k})$ and $f^{\prime}(x_{k})$, assuming small errors and neglecting “higher-order terms”:

 $\ell_{k+1}=\ell_{k}-\frac{f(x^{\ast})+f^{\prime}(x^{\ast})\ell_{k}+\cdots}{f^{% \prime}(x^{\ast})+f^{\prime\prime}(x^{\ast})\ell_{k}+\cdots}\approx\frac{f^{% \prime\prime}(x^{\ast})}{2f^{\prime}(x^{\ast})}\ell^{2}_{k}\qquad(\ell_{k}:=x_% {k}-x^{\ast},\mbox{ i.e., signed}).$ (24)
###### Exercise 4.4.

(a) Write the Newton iterates for computing $\sqrt{2}$ (cf. [62, p. 138]).

(b) (Newton everywhere [96, sect. 17.9]) Apply eq. 21 to compute the division $a/b$ of real numbers in terms of only the operations $+,-,*$ (cf. [62, p. 124]).

###### Remark 4.5.

(a) The usual perception is that the smaller $Q_{2}$, the faster the asymptotic speed; expression eq. 23 shows that for the Newton method this is realized with smaller $|f^{\prime\prime}|$ and larger $|f^{\prime}|$, and this means that near $x^{\ast}$, the graph of $f$ resembles a line ($|f^{\prime\prime}|$ small) having large angle with $Ox$ (the line is not close to $Ox$).222222In the opposite situation, when the angle is zero ($x^{\ast}$ is multiple), the convergence is only linear. While this interpretation holds locally for the asymptotic range of the Newton iterates, it is worth mentioning that the larger the region with such a property, the larger the attraction set (see also eq. 32), and the better conditioned the problem of solving $f(x)=0$ (see section 4.1.6).

We note that taking $2f(x)$ instead of $f(x)$, despite doubling $f^{\prime}(x)$, does not reduce $Q_{2}$, leading in fact to the same iterates (this property is called affine invariance; see [34]).

(b) Of course, it would be of major interest to state a conclusion like “the smaller $Q_{2}$, the fewer the iterates needed.” However, even if supported by a number of numerical examples, such a statement should be treated with great care. The reason is simple: we do not address similar objects ($Q_{2}$ is an asymptotic constant, referring to $k\rightarrow\infty$, while the desired conclusion refers to a finite number of steps). Moreover, when we say “the smaller $Q_{2}$,” this means that we consider different mappings $f$; even if they have the same $x^{\ast}$, the attraction set and the dynamics of the iterates may be different.

The role of the asymptotic constants will be analyzed in the forthcoming work [21].

Let us analyze the expectation that this method behaves similarly for the same $Q_{2}$.

###### Example 4.6.

(a) Let $a\in\mathbb{R}$ be given and

 $f(x):=x+x^{2}+ax^{3}=0,\quad\mbox{ with }x^{\ast}=0.$

We get $f^{\prime}(0)=1$ and $f^{\prime\prime}(0)=2$, so by eq. 23, $Q_{2}=1$ ($\forall a\in\mathbb{R}$ given).

The full Taylor series gives the exact relation for the signed errors from eq. 24:

 $\ell_{k+1}=\frac{1+2a\ell_{k}}{1+2\ell_{k}+3a\ell^{2}_{k}}\ell^{2}_{k}.$ (25)

Take $x_{0}=0.0001(=e_{0})$ fixed. By eq. 24, we should obtain $|\ell_{1}|=e_{1}\approx e_{0}^{2}=10^{-8}$ for any $a>0$. In table 1 we see this does not hold when $a$ has large values.

As $|a|$ grows, despite being “small,” the error $e_{0}$ in fact is not “squared,” as suggested by eq. 24. Indeed, when $|a|$ is very large, in eq. 24 the terms with larger errors (corresponding to $f^{\prime\prime\prime}(x)=6a$) are actually neglected, despite their having higher powers.

(b) Taking $x_{0}=0.0001$ for $f$ as above and then for $f(x)=x+2x^{2}$ shows that $Q_{2}$ and the number of iterates needed (for the same accuracy) are not in direct proportion.

###### Remark 4.7.

Igarashi and Ypma [46] noticed that a smaller $Q_{2}$ does not necessarily attract fewer iterates for attaining a given accuracy. However, the numerical examples used to support this claim were based not on revealing the individual asymptotic regime of the sequences, but on the number of iterates required to attain a given ( binary64 ) accuracy.

If $f^{\prime\prime}(x^{\ast})=0$, then eq. 23 implies $Q$-superquadratic order, but actually the order is higher: it is given by the first index$\geq 2$ of the nonzero derivative at the solution.

###### Theorem 4.8(see, e.g., [37]).

Let $x^{\ast}$ be simple. The Newton method converges locally with $C$-order $p_{0}\in\mathbb{N}$, $p_{0}\geq 2$, iff

 $f^{(p_{0})}(x^{\ast})\neq 0\quad and\quad f^{\prime\prime}(x^{\ast})=\cdots=f^% {(p_{0}-1)}(x^{\ast})=0,$

in which case

 $Q_{p_{0}}=\frac{p_{0}-1}{p_{0}!}\bigg{|}\frac{f^{(p_{0})}(x^{\ast})}{f^{\prime% }(x^{\ast})}\bigg{|}.$

#### 4.1.3 Examples for the Attained $C$-Orders

We consider decreasing orders.

(a) Infinite $C$-order. If $f(x)=ax+b$, $a\neq 0$, then $x_{1}=x^{\ast}$ $\forall x_{0}\in{\mathbb{R}}$.

(b) Arbitrary natural $C$-order $p_{0}\geq 2$ (see remark 4.13 for $p_{0}\notin{\mathbb{N}}$).

###### Example 4.9.

Let $p\geq 2$ be a natural given number and consider

 $f(x)=x+x^{p},\quad x^{\ast}=0,\ p=2,3,4\ \mbox{\rm\cite[cite]{[\@@bibref{}{% Raydan-1993}{}{}]}}.$

By theorem 4.8, we obtain local convergence with $C$-order $p$ and $Q_{p}=p-1$.

Let us first deal with $p=2$. In double precision, for $x_{0}=0.46$ we obtain six nonzero iterates (shown truncated in table 2). In analyzing $Q_{L}(k-1)$ (shown with the last six decimal digits truncated), we note that $Q_{L}(4)$ is a better approximation than the last one, $Q_{L}(5)$. This phenomenon may appear even if we increase the precision (see Figure 12).

When $x_{0}=0.47$ and $0.45$ we obtain five nonzero iterates (see 4.11), and $Q_{L}(k)$ is increasingly better.

For $p=3$ and $4$, we obtain four nonzero iterations for the three choices of $x_{0}$, and therefore a single meaningful term in computing $\{Q^{\prime}_{\Lambda}(k)\}$.

###### Remark 4.10.

The higher the convergence order, the faster $\operatorname{fl}(x^{\ast})$ may be reached, and we might end up with fewer distinct terms.

Higher precision is needed in this study. We have chosen to use the Julia language here, which allows not only quadruple precision (digits128, compliant with IEEE 754-2008) but arbitrary precision as well (by using the setprecision command). Some examples are posted at https://github.com/ecatinas/conv-ord and may be easily run online (e.g., by IJulia notebooks running on Binder).232323Such examples may be also performed in MATLAB [58] (by choosing the Advanpix [1] toolbox, which handles arbitrary precision), or in other languages.

In fig. 11 we show the resulting $Q_{L}(k-1)$, $Q_{\Lambda}(k-2)$, $Q^{\prime}_{L}(k-2)$, $Q^{\prime}_{\Lambda}(k-3)$ (in order to use only the information available at step $k$).

We see that all four (computational) convergence orders approach the corresponding $p$’s, but we cannot clearly distinguish their speed.

In fig. 12 we present $|Q_{\Lambda}(k-2)-p|$ and $|Q_{L}(k-1)-p|$.

###### Exercise 4.11.

Examining the numerator and denominator in the Newton corrections, give the two different reasons why $x_{7}$ is $0$ in table 2 (when $x_{0}=0.47$ and $0.45$). When is realmin or machine epsilon (or both) too large?

(c) $C$-order $p\in(1,2)$. The $C$-order $2$ is lost with the smoothness of $f$ (when $\nexists f^{\prime\prime}$).

###### Example 4.12([67, E 10.2-4], [31]).

The Newton method has $C$-order $1+\alpha$ for

 $f(x)=x+|x|^{1+\alpha},\quad\alpha\in(0,1)\text{ given},\quad x^{\ast}=0.$
###### Remark 4.13.

Modifying the above example by taking $f(x)=x+|x|^{p_{0}}$ shows that the Newton method may in fact attain any real nonnatural order $p_{0}\in(2,+\infty)$.

(d) $C$-linear order. This appears for a multiple solution $x^{\ast}$, which is implicitly assumed to be isolated ($f^{\prime}(x)\neq 0$ $\forall x\neq x^{\ast}$ in a neighborhood of $x^{\ast}$).242424Otherwise, an iterative method would not be able to converge to a specific solution we were interested in (take, e.g., the constant function $f(x)=0$ with $\mathbb{R}$ as the solution set, or $f(x)=x\sin\frac{1}{x}$, $x\neq 0$, $f(0)=0$). Indeed, if the multiplicity of $x^{\ast}$ is $q\in\mathbb{N}$,

 $f(x^{\ast})=f^{\prime}(x^{\ast})=\cdots=f^{(q-1)}(x^{\ast})=0,\quad f^{(q)}(x^% {\ast})\neq 0$ (26)

(i.e., $f(x)=(x-x^{\ast})^{q}g(x)$, $g(x^{\ast})\neq 0$), then (see, e.g., [67, E 10.2-8], [36], [41])

 $Q_{1}=1-\frac{1}{q}.$ (27)

If $q$ is known, one may use the Schröder formula [68] to restore the $C$-order $2$:

 $x_{k+1}=x_{k}-q\frac{f(x_{k})}{f^{\prime}(x_{k})},\qquad\mbox{with }Q_{2}=% \frac{|f^{(q+1)}(x^{\ast})|}{q(q+1)|f^{(q)}(x^{\ast})|}\mbox{ (see, e.g., % \cite[cite]{[\@@bibref{}{Ostrowski-1966}{}{}, (8.13)]})}.$
###### Exercise 4.14.

(a) Calculate the Newton and Schröder iterates for $f(x)=x^{2}$.

(b) The Schröder iterates are the Newton iterates for $\sqrt[q]{f(x)}=(x-x^{\ast})\sqrt[q]{g(x)}$.

Approximation of the unknown $q$ is surveyed in [60, sect. 7.9]; see also [81, sect. 6.6.2].

(e) $C$-sublinear order.

###### Exercise 4.15.

Let $f(x)=e^{-1/{x^{2}}}$, $x\neq 0$, $f(0)=0$. Show that the Newton method converges locally to $x^{\ast}=0$, with $Q_{1}(\{x_{k}\})=1$, i.e., $C$-sublinearly.

#### 4.1.4 Convexity

The influence of $x_{0}$ and the convexity of $f$ were first analyzed by Mourraille (1768) [23, sect. 6.3], in an intuitive fashion.

In 1818, Fourier ([38], available on the Internet) gave the condition

 $f(x_{0})f^{\prime\prime}(x_{0})>0,$ (28)

which ensures the convergence of the Newton method, provided $f$ maintains the monotony and convexity in the interval determined by $x^{\ast}$ and $x_{0}$. This condition can lead to sided convergence intervals (i.e., not necessarily centered at $x^{\ast}$).

Clearly, the method may converge even on the whole axis, a result which is more interesting to consider in $\mathbb{R}^{n}$ (cf. the Baluev theorem [66, Thm. 8.3.4]).

###### Remark 4.16.

(a) By eq. 24, the signs of the errors are periodic with period either $1$ (i.e., monotone iterates) or $2$ ($k\geq k_{0}$).

(b) The Fourier condition is sufficient for convergence but not also necessary; the iterates may converge locally (by theorem 4.2) even when eq. 28 does not hold: take, e.g., $f(x)=\sin(x)$, $x^{\ast}=0$, with the signs of errors alternating at each step ($k\geq k_{0}$).

#### 4.1.5 Attraction Balls

Estimations for $B_{r}(x^{\ast})=\{x\in D:|x^{\ast}-x| $=(x^{\ast}-r,x^{\ast}+r)$ as $V$ in definition 4.1 are usually given using the condition

 $|f^{\prime}(x)-f^{\prime}(y)|\leq L|x-y|\quad\forall x,y\in D.$ (29)

The Lipschitz constant $L$ of $f^{\prime}$ measures the nonlinearity of $f$ (i.e., how much the graph of $f$ is bent); its exact bound shows this is in accordance with remark 4.5, since

 $L\geq\sup_{x\in D}|f^{\prime\prime}(x)|.$ (30)

The usual assumptions also require

 $|f^{\prime}(x^{\ast})|\geq\frac{1}{\beta},$ (31)

and the Dennis–Schnabel [32], Rheinboldt [84], and Deuflhard–Potra [35] estimations are, respectively,

 $r=\frac{1}{2\beta L},\quad r=\frac{2}{3\beta L},\quad\mbox{and }r=\frac{2}{% \omega},$ (32)

where $\omega$ satisfies $|f^{\prime}(x)^{-1}|\cdot|f^{\prime}(x+t(y-x))-f^{\prime}(x)|\leq t\omega|y-x|$ $\forall t\in(0,1)$, $\forall x,y\in D$, i.e., it can be viewed as a combination of both $L$ and $\beta$ (see also [2]).

We note that eq. 32 estimates a small $r$ for example 4.6, as $L$ is large.

#### 4.1.6 Floating Point Arithmetic

Curious behavior may arise in this setting.

The derivative $f^{\prime}$ measures the sensitivity in computing $f(x)$ using $\tilde{x}\approx x$ instead: $|f(x)-f(\tilde{x})|\approx|f^{\prime}(x)|\cdot|x-\tilde{x}|$. This shows that for large $|f^{\prime}(x)|$, the absolute error in $x$ may be amplified. When the relative error in $x$ is of interest, the condition number becomes $|f^{\prime}(x)|\cdot|x|$: $|f(x)-f(\tilde{x})|\approx|f^{\prime}(x)|\cdot|x|\frac{|x-\tilde{x}|}{|x|}$.

For the relative errors of $f$, the condition numbers are given by [69], [42, sects. 1.2.6 and 5.3]

 $\frac{|f(x)-f(\tilde{x})|}{|f(x)|}\approx\frac{|xf^{\prime}(x)|}{|f(x)|}\frac{% |x-\tilde{x}|}{|x|}\quad\mbox{or}\quad\frac{|f(x)-f(\tilde{x})|}{|f(x)|}% \approx\frac{|f^{\prime}(x)|}{|f(x)|}|x-\tilde{x}|,$

which show that the relative errors in $f(x)$ may be large near a solution $x^{\ast}$.

However, as shown in [42, sects. 1.2.6 and 5.3] and [81, Ex. 2.4, sect. 6.1], the problem of solving an equation has the conditioning number $1/|f^{\prime}(x^{\ast})|$, which is equal to the inverse of the conditioning number for computing $f(x)$.

When $|f^{\prime}(x)|$ (i.e., $|f^{\prime}(x^{\ast})|$) is not small, Shampine, Allen, and Pruess argue that the Newton method is “stable at limiting precision” [89, p. 147], as the Newton corrections $\frac{f(x)}{f^{\prime}(x)}$ are small.

When $|f^{\prime}(x^{\ast})|$ is small (e.g., $x^{\ast}$ is multiple), notable cancellations may appear in computing $f$ and prevent the usual graphical interpretation. The (expanded) polynomial $(x-2)^{9}$ of Demmel [28, p. 8] computed in double precision is well known.

Even simpler examples exist: $f(x)=x^{3}-2x^{2}+\frac{4}{3}x-\frac{8}{27}(=(x-\frac{2}{3})^{3})$ (Sauer [86, p. 45]) and $f(x)=x^{3}e^{3x}-3x^{2}e^{2x}+3xe^{x}-1(=(e^{x}-1)^{3})$, $x^{\ast}\approx 0.56$ (Shampine, Allen, and Pruess [89, Fig. 1.2, p. 23]).

$|f^{\prime}|$ not being small is no guarantee that things won’t get weird due to other reasons.

###### Quiz 4.17(Wilkinson; see [89, Ex. 4.3]).

Let $f(x)=x^{20}-1$, $x^{\ast}=1$, and (here) $x_{0}=10$. Why, in double precision, are the (rounded) Newton iterates: $x_{1}=9.5,x_{2}=9.0,x_{3}=8.6,x_{4}=8.1$, etc.? Take $\ell_{0}=\pm 0.001$ and compute $x_{1}$ and $e_{1}$.

#### 4.1.7 Nonlinear Systems in $\mathbb{R}^{N}$

When Newton-type methods are used in practice to solve nonlinear systems in $\mathbb{R}^{N}$, obtaining high $Q$-orders becomes challenging. As the dimension may be huge ($N=10^{6}$ and even higher) the main difficulty resides—having solved the representation of the data in memory252525One can use Newton–Krylov methods, which do not require storage of the matrices $F^{\prime}(x)$, but instead computation of matrix-vector products $F^{\prime}(x)v$, further approximated by finite differences [12].—in accurately solving the resulting linear systems eq. 22 at each step. In many practical situations, superlinear convergence may be considered satisfactory.

The most common setting allows the systems eq. 22 to be only approximately solved (the corrections verify $F^{\prime}(x_{k})s_{k}=-F(x_{k})+r_{k}$, $r_{k}\neq 0$), leading to the inexact Newton method; its superlinear convergence and the orders $p\in(1,2]$ were characterized by Dembo, Eisenstat, and Steihaug in [27], the approach being easily extended by us for further sources of perturbation in [14], [17].

The quasi-Newton methods form another important class of Newton-type iterations; they consider approximate Jacobians, in order to produce linear systems that are easier to solve [31]. Their superlinear convergence was characterized by Dennis and Moré [30].

The inexact and quasi-Newton methods are in fact tightly connected models [17].

### 4.2 The Secant Method

This is a two-step method, requiring $x_{0}\neq x_{1}\in D$,

 $x_{k+1}=x_{k}-\frac{f(x_{k})}{\frac{f(x_{k})-f(x_{k-1})}{x_{k}-x_{k-1}}}=\frac% {x_{k-1}f(x_{k})-x_{k}f(x_{k-1})}{f(x_{k})-f(x_{k-1})},\quad k=1,2,\ldots,$ (33)

but none of the above formulae is recommended for programming (see section 4.2.5).

#### 4.2.1 History

Though nowadays the secant method is usually seen as a Newton-type method with derivatives replaced by finite differences (i.e., a quasi-Newton method) or an inverse interpolatory method, this is not the way it first appeared. The roots of this method can be traced back to approximately the same time as that of the Newton method, the 18th century B.C., found on Babylonian clay tablets and on the Egyptian Rhind Papyrus [23], [70], when it was used as a single iteration for solving linear equations.262626Such equations appear trivial these days, but their solution required ingenuity back then: the decimal system and the zero symbol were used much later, in the 800s (see, e.g., [61, pp. 192–193]). During these times and even later, the terminology for such a noniterated form was “the rule of double false position” (see, e.g., [70]); indeed, two initial “false positions” $x_{0},x_{1}$ yield in one iteration the true solution of the linear equation.

Heron of Alexandria [33] approximated the cubic root of $100$ by this method (as revealed by Luca and Păvăloiu [57], it turns out that it can be seen as applied equivalently to $x^{2}-\frac{100}{x}=0$).

While reinvented in several cultures (in China,272727Jiuzhang Suanshu (Nine Chapters on the Mathematical Art) [23, sect. 3.3]. India,282828For example, in the work of Madhava’s student Paramesvara [73], [74], [23, sect. 3.4]. Arab countries, and Africa,292929al-Ṭūsī, al-Kāshī, al-Khayyām (see [82], [101], [23, sect. 3.5]). sometimes even for $2$-by-$2$ systems of equations), its use as an iterative method was first considered by Cardano (1545), who called it Regula Liberae Positionis [70] (or regula aurea [82, note 5, p. 200]).

Viète also used a secant-type method, and it appears that Newton (around 1665) independently rediscovered the secant method [70], [101].

#### 4.2.2 Attainable $Q$-Orders

The standard convergence result shows the usual convergence with $C$-order given by the golden ratio $\lambda_{1}:=\frac{1+\sqrt{5}}{2}\approx 1.618$; as noted by Papakonstantinou and Tapia [70], this was first obtained by Jeeves in 1958 (in a technical report, published as [48]).

###### Theorem 4.18([68, Chap. 12, sects. 11–12]).

If $x^{\ast}$ is simple and $f^{\prime\prime}(x^{\ast})\neq 0$, then the secant method has local convergence with $C$-order $\lambda_{1}=\frac{1+\sqrt{5}}{2}$:

 $\lim_{k\rightarrow\infty}\frac{|x^{\ast}-x_{k+1}|}{|x^{\ast}-x_{k}|^{\lambda_{% 1}}}=\bigg{|}\frac{f^{\prime\prime}(x^{\ast})}{2f^{\prime}(x^{\ast})}\bigg{|}^% {\lambda_{1}-1}=:Q_{\lambda_{1}}.$ (34)
###### Remark 4.19.

The $Q$-order $\lambda_{1}$ follows by considering signed errors (see, e.g., [48])

 $\ell_{k+1}=\frac{[x_{k-1},x_{k},x^{\ast};f]}{[x_{k-1},x_{k};f]}\ell_{k}\ell_{k% -1},\quad k\geq 1\quad(\ell_{k}:=x_{k}-x^{\ast}),$ (35)

and then 2.25. Indeed, the divided differences converge, $[x_{k-1},x_{k};f]\rightarrow f^{\prime}(x^{\ast})$, $[x_{k-1},x_{k},x^{\ast};f]\rightarrow\frac{f^{\prime\prime}(x^{\ast})}{2}$, and denoting $l=\big{|}\frac{f^{\prime\prime}(x^{\ast})}{2f^{\prime}(x^{\ast})}\big{|}$, it follows that in eq. 5 we may take $A=l-\varepsilon$ and $B=l+\varepsilon$ for some small $\varepsilon$.

###### Quiz 4.20.

Find the incompleteness in proving the $C$-order $\lambda_{1}$ in eq. 34 by arguing that

 $(l-\varepsilon)\Big{(}\frac{e_{k-1}^{\lambda_{1}}}{e_{k}}\Big{)}^{\frac{1}{% \lambda_{1}}}e_{k}^{\frac{1}{\lambda_{1}}+1-\lambda_{1}}\leq(l-\varepsilon)% \frac{e_{k}e_{k-1}}{e_{k}^{\lambda_{1}}}\leq\frac{e_{k+1}}{e_{k}^{\lambda_{1}}% }\leq(l+\varepsilon)\frac{e_{k}e_{k-1}}{e_{k}^{\lambda_{1}}}=(l+\varepsilon)% \Big{(}\frac{e_{k-1}^{\lambda_{1}}}{e_{k}}\Big{)}^{\frac{1}{\lambda_{1}}},$

$\frac{1}{\lambda_{1}}+1-\lambda_{1}=0$, and therefore $Q_{\lambda_{1}}$ satisfies $Q_{\lambda_{1}}=lQ_{\lambda_{1}}^{-\frac{1}{\lambda_{1}}}$, i.e., is given by eq. 34.

###### Remark 4.21.

By eq. 35, the signs of the errors (and of $f(x_{k})$ as well) are periodic with period at most $3$ ($k\geq k_{0}$), as noted by Neumaier [64, Cor. 5.1.3].

Higher $Q$- (but not necessarily $C$-)orders may be attained.

###### Theorem 4.22(Raydan [83]).

If $x^{\ast}$ is simple and $q\geq 1$ is the first index such that $f^{(q+1)}(x^{\ast})\neq 0$, then the secant method converges locally with $Q$-order given by

 $\lambda_{q}=\frac{1+\sqrt{1+4q}}{2}$

and obeys

 $\lim\frac{e_{k+1}}{e_{k}e_{k-1}^{q}}=\frac{|f^{(q+1)}(x^{\ast})|}{(q+1)!|f^{% \prime}(x^{\ast})|}.$

Moreover, specific results hold in the following cases, noting the attained orders:

• $q=1$: $C$-order $\lambda_{1}$;

• $q=2$: exact $Q$-order $\lambda_{2}=2$ (but not necessarily $C$-order $2$);

• $q=3$: $Q$-order $\lambda_{3}\approx 2.3$ (but not necessarily exact $Q$-order $\lambda_{3}$).

###### Remark 4.23.

When $q\geq 2$, the secant iterates may have only $Q$- but no $C$-order, despite $\frac{e_{k+1}}{e_{k}e_{k-1}^{q}}$ converging to a finite nonzero limit (see 2.25).

###### Example 4.24(cf. [83]).

Let $f_{[q+1]}(x)=x+x^{q+1}$, $x_{0}=0.5$, $x_{1}=\frac{1}{(q+1)^{\lambda_{q}}}$, $q=1,2,3$, and compute the secant iterates (by eq. 36).

The assertions of theorem 4.22 are verified in this case, since $|Q_{L}(k-1)-\lambda_{q}|$ and $|Q_{\Lambda}(k-2)-\lambda_{q}|$, computed using setprecision(500), tend to $0$ (see fig. 13).

In fig. 14 we plot $Q_{\lambda_{q}}(k)$ when $q=1,2,$ and $3$.

We see that $Q_{\lambda_{1}}(k)$ tends to $1$.

For $q=2,$ $Q_{\lambda_{2}}(k)$ oscillates (suggesting $\underline{Q}_{\lambda_{2}}(k)\approx 0.57$, $\bar{Q}_{\lambda_{2}}(k)\approx 1.74$).

For $q=3$, $\bar{Q}_{\lambda_{3}}(k)$ tends to infinity (as rigorously proved by Raydan for $q\geq 3$). For this particular data it appears that $\underline{Q}_{\lambda_{3}}=0$ (which remains to be rigorously proved). In order to see it more clearly, we have also used further increased precision (setprecision(5000)), providing confirmation.

When $q=3$, $\bar{Q}_{\lambda_{3}}=\infty$ or $\underline{Q}_{\lambda_{3}}=0$ do not necessarily hold for any $x_{0},x_{1}$: they do not hold (numerically) for $x_{0}=0.5,x_{1}=0.25$ (but they do hold, e.g., for $x_{0}=1,x_{1}=0.5$).

The higher the order, the faster the iterates attain $\operatorname{fl}(x^{\ast})$. We see in fig. 14 that setprecision(500) allows $14$ iterates when $q=1$ and only $7$ iterates when $q=3$ (increased to $11$ for setprecision(5000)).

#### 4.2.3 Multiple Solutions

Not only the high orders, but even very local convergence may be lost if $x^{\ast}$ is not simple.

###### Example 4.25(cf. [41]).

If $f(x)=x^{2}$, $x_{0}=-\varepsilon,x_{1}=\varepsilon$, $\nexists x_{2}$ in eq. 36 $(\forall\varepsilon>0)$.

Assuming that the secant method converges, only a linear rate is obtained [95]; Díez [36] showed that if the multiplicity of $x^{\ast}$ in eq. 26 is

• $q=2$, then $Q_{1}=\frac{1}{\lambda_{1}}=\lambda_{1}-1\approx 0.618$;

• $q\geq 3$, then $Q_{1}$ is the $0<\lambda^{\ast}<1$ solution of $x^{q}+x^{q-1}-1=0$.

Also, if the iterates verify at each step $|f(x_{k})|<|f(x_{k-1})|$ (by swapping), one obtains superlinear convergence for the root secant method of Neumaier [64, p. 241]:

 $x_{k+1}=x_{k}-\frac{x_{k}-x_{k-1}}{1-\sqrt{f(x_{k-1})/f(x_{k})}}.$

For other references on this topic, see [60, sect. 7.9] and [91, Ex. 1.9].

#### 4.2.4 Attraction Balls

Estimates for the radius of an attraction ball were given by Liang [56]: if eq. 29 and $\big{|}[x,y,z;f]\big{|}\leq K$ $\forall x,y,z\in D$ hold, then

 $r=\frac{1}{3\beta K}.$

When using a specific $x_{1}$ in the secant method ($x_{0}\in B_{r}$ arbitrary, $x_{1}=x_{0}-\frac{f(x_{0})}{f^{\prime}(x_{0})}$), estimates similar to eq. 32 were obtained: if eq. 31 and eq. 30, then

 $r=\frac{2}{3\beta L}\quad\mbox{(see \cite[cite]{[\@@bibref{}{Liang-2007}{}{}]}% )}.$

#### 4.2.5 Floating Point Arithmetic

The two expressions in eq. 33 are not recommended for programming in an ad literam fashion, respectively, at all; indeed, the first one should be written instead as

 $x_{k+1}=x_{k}-\frac{f(x_{k})(x_{k}-x_{k-1})}{f(x_{k})-f(x_{k-1})},\quad k=1,2,\ldots,$ (36)

while the second one can lead, close to $x^{\ast}$, to cancellations when $f(x_{k})$ and $f(x_{k-1})$ have the same sign [26, p. 228], [60, p. 4].

Further equivalent formulae are

 $x_{k+1}=x_{k}-\frac{x_{k}-x_{k-1}}{1-\frac{f(x_{k-1})}{f(x_{k})}},\quad k=1,2,\ldots,$

and (Vandergraft [97, p. 265])

 $x_{k+1}=x_{k}-\left[(x_{k-1}-x_{k})\left(\frac{f(x_{k})}{f(x_{k-1})}\right)% \right]\left/\left(1-\frac{f(x_{k})}{f(x_{k-1})}\right),\right.$

which requires $|\frac{f(x_{k})}{f(x_{k-1})}|<1$ at each step (by swapping the iterates).

It is worth noting, however, that swapping the consecutive iterates in order to obtain $|f(x_{k})|<|f(x_{k-1})|$ leads to iterates that may have a different dynamic than is standard.

The secant method may not be stable at limiting precision, even for large $|f^{\prime}(x^{\ast})|$ (Shampine, Allen, and Pruess [89, p. 147]), and higher precision may be needed.

### 4.3 Successive Approximations for Fixed Point Problems

We consider a sufficiently differentiable mapping $g:D\rightarrow D$, $D\subseteq\mathbb{R}$, and the fixed point problem

 $g(x)=x.$

A solution $x^{\ast}$ is called a fixed point of $g$; $x^{\ast}$ is further called an attraction fixed point when one has local convergence for the successive approximations

 $x_{k+1}=g(x_{k}),\quad k\geq 0.$

#### 4.3.1 History

Assuming that the computation of $\sqrt{2}$ with five (decimal) digits is highly improbable in the absence of an iterative method, it is likely that successive approximations date back to at least 1800 B.C. (considering the Newton iterates as particular instances of successive approximations).

Regarding the first use of iterates not connected to the Newton or secant-type iterates, they seem have appeared in the fifth and sixth centuries in India (see Plofker [74]; in this paper the author describes as an example the Brahmasphutasiddhanta of Brahmagupta from the seventh century).

Later occurrences are mentioned for Ibn al-Banna as cited in [23, sect. 7.3], al-Ṭūsī [23, sect. 7.4], Viète [23, sect. 7.5], Parameśvara (aviśeṣa) [73], al-Kāshī [23, sect. 7.5], and Ḥabash al-Ḥāsib al-Marwazi [52], [3].

Further occurrences of such iterates appear for Kepler (1618–1622) in [23, sect. 7.6], Gregory (1672 [101]; 1674 [3]), Dary (1674) [3], and Newton (1674) [101].

#### 4.3.2 Local Convergence, Attainable $C$-Orders

The first part of the following local convergence result was first obtained by Schröder in 1870 [87]. The extension to mappings in $\mathbb{R}^{N}$ implied the analysis of the eigenvalues of the Jacobian at the fixed point, which is an important topic in numerical analysis.303030Given $H\in\mathbb{R}^{N\times N}$ and the linear system $x=Hx+c$, the following result was called by Ortega [66, p. 118] the fundamental theorem of linear iterative methods: If $x=Hx+c$ has a unique solution $x^{\ast}$, then the successive approximations converge to $x^{\ast}$ $\forall x_{0}\in\mathbb{R}^{N}$ iff the spectral radius $\rho(H)$ is $<1$. The standard formulation of the result was first given by Ostrowski in 1957, but Perron had essentially already obtained it in 1929 (see [67, NR 10.1-1]).

###### Theorem 4.26.
• (a)

([67, Thm. 10.1.3], [68]) Let $g$ be continuous on $D\subseteq{\mathbb{R}}$ and differentiable at the fixed point $x^{\ast}\in D$. If

 $|g^{\prime}(x^{\ast})|<1,$

then $x^{\ast}$ is an attraction fixed point.

• (b)

([37]) Conversely, if $x^{\ast}$ is an attraction fixed point, then $|g^{\prime}(x^{\ast})|\leq 1$.

###### Remark 4.27(see [67, E 10.1-2]).

Condition $|g^{\prime}(x^{\ast})|\leq 1$ is sharp: by considering $|g^{\prime}(x^{\ast})|=1$ and taking $g(x)=x-x^{3}$, $x^{\ast}=0$ is an attraction fixed point, while if $g(x)=x+x^{3}$, the same fixed point $x^{\ast}$ is no longer of attraction.

The higher orders attained are characterized as follows.

###### Theorem 4.28(see, e.g., [68, Chap. 4, sect. 10, p. 44], [95], [37]).

Let $p_{0}\geq 2$ be an integer and $x^{\ast}$ a fixed point of $g$. Then $x_{k+1}=g(x_{k})\rightarrow x^{\ast}$ with $C$-order $p_{0}$ iff

 $g^{\prime}(x^{\ast})=g^{\prime\prime}(x^{\ast})=\cdots=g^{(p_{0}-1)}(x^{\ast})% =0\quad and\quad g^{(p_{0})}(x^{\ast})\neq 0,$

in which case

 $\lim_{k\rightarrow\infty}\frac{|x^{\ast}-x_{k+1}|}{|x^{\ast}-x_{k}|^{p_{0}}}=% \frac{|g^{(p_{0})}(x^{\ast})|}{p_{0}!}=:Q_{p_{0}}.$ (37)
###### Example 4.29([67, E 10.1-10]).

Any $C$-order $p_{0}>1$ is possible: let $g(x)=x^{p_{0}}$.

The $C$-sublinear order may be attained too.

###### Exercise 4.30.

Let $g(x)=x-x^{3}$ and $x_{k+1}=g(x_{k})\rightarrow 0$; then $Q_{1}(\{x_{k}\})=1$.

The successive approximations are usually regarded as the most general iterations, but actually, they may also be seen as instances of an inexact Newton method [15]; such an approach allows us to study acceleration techniques in a different setting.

#### 4.3.3 Attraction Balls

The following result holds.

###### Theorem 4.31([19]).

Let $x^{\ast}$ be an attraction fixed point of $g$ for which

 $|g^{\prime}(x^{\ast})|\leq q<1.$

Suppose there exist $r_{1}>0,$ $L>0$ such that $g$ is differentiable on $B_{r_{1}},$ with $g^{\prime}$ Lipschitz continuous of constant $L$, and denote

 $r=\min\bigg{\{}r_{1},\sqrt{\frac{2(1-q)}{L}}\bigg{\}}.$ (38)

Then, $\forall x_{0}\in B_{r}$, letting $t=\frac{L}{2}|x_{0}-x^{\ast}|+q<1$ we get

 $|x_{k+1}-x^{\ast}|\leq t|x_{k}-x^{\ast}|,\quad k\geq 0,$

which implies that $x_{k+1}=g(x_{k})\in B_{r}$ and $x_{k+1}=g(x_{k})\rightarrow x^{\ast}$.

###### Remark 4.32.

It is important to note that this result does not require $g$ to be a contraction on $D$ (i.e., $L<1$), in contrast to the classical Banach contraction theorem.

The center-Lipschitz condition (i.e., $x^{\ast}$ instead of $y$ in eq. 29) is an improvement [24].

The estimate eq. 38 may be sharp in certain cases.

###### Example 4.33([19]).

Let $g(x)=x^{2}$ with $x^{\ast}=0$, $g^{\prime}(x^{\ast})=0$, $r_{1}>0$ arbitrary, and $L=2$. Then eq. 38 gives the sharp $r=r_{2}=1$, since $x^{\ast}=1$ is another fixed point.

We observe that $|g^{\prime}(x)|=2$ at $|x-x^{\ast}|=r=1$, which upholds remark 4.32.

Regarding the general case of several dimensions with $G:D\subseteq{\mathbb{R}^{N}}\rightarrow D$, we note that the trajectories with high convergence orders are characterized by the zero eigenvalue of $G^{\prime}(x^{\ast})$ and its corresponding eigenvectors (see [16]).

###### Remark 4.34([18, Ex. 2.8]).

definition 4.1 allows a finite number of $x_{k}\notin V$ (regardless of how “small” $V$ is chosen to be): take, e.g., $G(x)=Ax$, $A=\big{(}\begin{smallmatrix}0&2\\ \frac{1}{8}&0\end{smallmatrix}\big{)}$, $V=B_{r}(0)$.

## 5 Conclusions

Various aspects of convergence orders have been analyzed here, but rather than providing a final view, we see this work as a fresh start; in [21] we pursue the study of the following themes: the characterizations of the $C$-order, the connections between the $Q$- and $R$-orders and the big Oh rates, the asymptotic constants (with their immediate computational variants), the convergence orders of the discretization schemes (to mention a few).

1.9: $\{\dot{x}_{k}\}=\{\frac{1}{2^{2^{k}}}\}=\{\mathring{x}_{k}\}$ in different windows.

2.7: (a) $Q_{1}=10^{-1}$; (b) $Q_{1}=10^{-2}$; (c) $10^{-x_{k}}$, $\{x_{k}\}$ the Fibonacci sequence; (d) $Q_{2}=1$.

3.4: unbounded (take $f(x)=x+ax^{2}$).

4.17: $\operatorname{fl}(10^{20}-1)=\operatorname{fl}(10^{20})$, in double precision.

4.20: the limit $Q_{\lambda_{1}}$ is not proved to exist; see also theorem 4.22.

## Acknowledgments

I am grateful to two referees for constructive remarks which helped to improve the manuscript, particularly the one who brought the Julia language to my attention; also, I am grateful to my colleagues M. Nechita and I. Boros, as well as to S. Filip for useful suggestions regarding Julia.

## References

• [2] I. K. Argyros and S. George, On a result by Dennis and Schnabel for Newton’s method: Further improvements, Appl. Math. Lett., 55 (2016), pp. 49–53, https://doi.org/10.1016/j.aml.2015.12.003.
• [3] D. F. Bailey, A historical survey of solution by functional iteration, Math. Mag., 62 (1989), pp. 155–166, https://doi.org/10.1080/0025570X.1989.11977428.
• [4] W. A. Beyer, B. R. Ebanks, and C. R. Qualls, Convergence rates and convergence-order profiles for sequences, Acta Appl. Math., 20 (1990), pp. 267–284, https://doi.org/10.1007/bf00049571.
• [5] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Rev., 59 (2017), pp. 65–98, https://doi.org/10.1137/141000671.
• [6] R. P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, NJ, 1973.
• [7] R. P. Brent, S. Winograd, and P. Wolfe, Optimal iterative processes for root-finding, Numer. Math., 20 (1973), pp. 327–341, https://doi.org/10.1007/bf01402555.
• [8] C. Brezinski, Comparaison des suites convergentes, Rev. Française Inform. Rech. Opér., 5 (1971), pp. 95–99.
• [9] C. Brezinski, Accélération de la Convergence en Analyse Numérique, Springer-Verlag, Berlin, 1977.
• [10] C. Brezinski, Limiting relationships and comparison theorems for sequences, Rend. Circolo Mat. Palermo Ser. II, 28 (1979), pp. 273–280, https://doi.org/10.1007/bf02844100.
• [11] C. Brezinski, Vitesse de convergence d’une suite, Rev. Roumaine Math. Pures Appl., 30 (1985), pp. 403–417.
• [12] P. N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp. 407–434, https://doi.org/10.1137/0724031.
• [13] F. Cajori, Historical note on the Newton-Raphson method of approximation, Amer. Math. Monthly, 18 (1911), pp. 29–32, https://doi.org/10.2307/2973939.
• [14] E. Cătinaş, Inexact perturbed Newton methods and applications to a class of Krylov solvers, J. Optim. Theory Appl., 108 (2001), pp. 543–570, https://doi.org/10.1023/a:1017583307974.
• [15] E. Cătinaş, On accelerating the convergence of the successive approximations method, Rev. Anal. Numér. Théor. Approx., 30 (2001), pp. 3–8, https://ictp.acad.ro/accelerating-convergence-successive-approximations-method/ (accessed 2021/04/07).
• [16] E. Cătinaş, On the superlinear convergence of the successive approximations method, J. Optim. Theory Appl., 113 (2002), pp. 473–485, https://doi.org/10.1023/a:1015304720071.
• [17] E. Cătinaş, The inexact, inexact perturbed and quasi-Newton methods are equivalent models, Math. Comp., 74 (2005), pp. 291–301, https://doi.org/10.1090/s0025-5718-04-01646-1.
• [18] E. Cătinaş, Newton and Newton-type Methods in Solving Nonlinear Systems in $\mathbb{R}^{N}$, Risoprint, Cluj-Napoca, Romania, 2007, https://ictp.acad.ro/methods-of-newton-and-newton-krylov-type/ (accessed on 2021/6/9).
• [19] E. Cătinaş, Estimating the radius of an attraction ball, Appl. Math. Lett., 22 (2009), pp. 712–714, https://doi.org/10.1016/j.aml.2008.08.007.
• [20] E. Cătinaş, A survey on the high convergence orders and computational convergence orders of sequences, Appl. Math. Comput., 343 (2019), pp. 1–20, https://doi.org/10.1016/j.amc.2018.08.006.
• [21] E. Cătinaş, How Many Steps Still Left to $x$*? Part II. Newton Iterates without $f$ and $f^{\prime}$, manuscript, 2020.
• [22] A. Cauchy, Sur la determination approximative des racines d’une equation algebrique ou transcendante, Oeuvres Complete (II) 4, Gauthier-Villars, Paris, 1899, 23 (1829), pp. 573–609, http://iris.univ-lille.fr/pdfpreview/bitstream/handle/1908/4026/41077-2-4.pdf?sequence=1 (accessed 2021/04/07).
• [23] J.-L. Chabert, ed., A History of Algorithms. From the Pebble to the Microchip, Springer-Verlag, Berlin, 1999.
• [24] J. Chen and I. K. Argyros, Improved results on estimating and extending the radius of an attraction ball, Appl. Math. Lett., 23 (2010), pp. 404–408, https://doi.org/10.1016/j.aml.2009.11.007.
• [25] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719857.
• [26] G. Dahlquist and Å. Björk, Numerical Methods, Dover, Mineola, NY, 1974.
• [27] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400–408, https://doi.org/10.1137/0719025.
• [28] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971446.
• [29] J. E. Dennis, On Newton-like methods, Numer. Math., 11 (1968), pp. 324–330, https://doi.org/10.1007/bf02166685.
• [30] J. E. Dennis, Jr., and J. J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comp., 28 (1974), pp. 549–560, https://doi.org/10.1090/s0025-5718-1974-0343581-1.
• [31] J. E. Dennis, Jr., and J. J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46–89, https://doi.org/10.1137/1019005.
• [32] J. E. Dennis, Jr., and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, PA, 1996, https://doi.org/10.1137/1.9781611971200.
• [33] G. Deslauries and S. Dubuc, Le calcul de la racine cubique selon Héron, Elem. Math., 51 (1996), pp. 28–34, http://eudml.org/doc/141587 (accessed 2020/02/18).
• [34] P. Deuflhard, Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms, Springer-Verlag, Heidelberg, 2004.
• [35] P. Deuflhard and F. A. Potra, Asymptotic mesh independence of Newton–Galerkin methods via a refined Mysovskii theorem, SIAM J. Numer. Anal., 29 (1992), pp. 1395–1412, https://doi.org/10.1137/0729080.
• [36] P. Díez, A note on the convergence of the secant method for simple and multiple roots, Appl. Math. Lett., 16 (2003), pp. 1211–1215, https://doi.org/10.1016/s0893-9659(03)90119-4.
• [37] F. Dubeau and C. Gnang, Fixed point and Newton’s methods for solving a nonlinear equation: From linear to high-order convergence, SIAM Rev., 56 (2014), pp. 691–708, https://doi.org/10.1137/130934799.
• [38] J. Fourier, Question d’analyse algébrique, Bull. Sci. Soc. Philo., 67 (1818), pp. 61–67, http://gallica.bnf.fr/ark:/12148/bpt6k33707/f248.item (accessed 2021-04-07).
• [39] W. Gautschi, Numerical Analysis, 2nd ed., Birkhäuser/Springer, New York, 2011, https://doi.org/10.1007/978-0-8176-8259-0.
• [40] M. Grau-Sánchez, M. Noguera, and J. M. Gutiérrez, On some computational orders of convergence, Appl. Math. Lett., 23 (2010), pp. 472–478, https://doi.org/10.1016/j.aml.2009.12.006.
• [41] A. Greenbaum and T. P. Chartier, Numerical Methods. Design, Analysis, and Computer Implementation of Algorithms, Princeton University Press, Princeton, NJ, 2012.
• [42] M. T. Heath, Scientific Computing. An Introductory Survey, 2nd ed., SIAM, Philadelphia, PA, 2018, https://doi.org/10.1137/1.9781611975581.
• [43] M. Heinkenschloss, Scientific Computing. An Introductory Survey, Rice University, Houston, TX, 2018.
• [44] J. Herzberger and L. Metzner, On the Q-order of convergence for coupled sequences arising in iterative numerical processes, Computing, 57 (1996), pp. 357–363, https://doi.org/10.1007/BF02252254.
• [45] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, Philadelphia, PA, 2002, https://doi.org/10.1137/1.9780898718027.
• [46] M. Igarashi and T. J. Ypma, Empirical versus asymptotic rate of convergence of a class of methods for solving a polynomial equation, J. Comput. Appl. Math., 82 (1997), pp. 229–237, https://doi.org/10.1016/s0377-0427(97)00077-0.
• [47] L. O. Jay, A note on Q-order of convergence, BIT, 41 (2001), pp. 422–429, https://doi.org/10.1023/A:1021902825707.
• [48] T. A. Jeeves, Secant modification of Newton’s method, Commun. ACM, 1 (1958), pp. 9–10, https://doi.org/10.1145/368892.368913.
• [49] V. J. Katz, A History of Mathematics. An Introduction, 2nd ed., Addison-Wesley, 1998.
• [50] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, PA, 1995, https://doi.org/10.1137/1.9781611970944.
• [51] C. T. Kelley, Iterative Methods for Optimization, SIAM, Philadelphia, PA, 1999, https://doi.org/10.1137/1.9781611970920.
• [52] E. S. Kennedy and W. R. Transue, A medieval iterative algorism, Amer. Math. Monthly, 63 (1956), pp. 80–83, https://doi.org/10.1080/00029890.1956.11988762.
• [53] D. E. Knuth, Big Omicron and big Omega and big Theta, ACM SIGACT News, 8 (1976), pp. 18–24, https://doi.org/10.1145/1008328.1008329.
• [54] D. E. Knuth, The TeXbook, Addison-Wesley, 1984.
• [55] N. Kollerstrom, Thomas Simpson and Newton’s method of approximation: An enduring myth, British J. History Sci., 25 (1992), pp. 347–354, https://doi.org/10.1017/s0007087400029150.
• [56] K. Liang, Homocentric convergence ball of the secant method, Appl. Math. Chin. Univ., 22 (2007), pp. 353–365, https://doi.org/10.1007/s11766-007-0313-3.
• [57] D. Luca and I. Păvăloiu, On the Heron’s method for approximating the cubic root of a real number, Rev. Anal. Numér. Théor. Approx., 26 (1997), pp. 103–108, https://ictp.acad.ro/jnaat/journal/article/view/1997-vol26-nos1-2-art15 (accessed 2021-04-07).
• [58] Mathworks, MATLAB, Version 2019b, 2019, http://www.mathworks.com (accessed 2021/04/07).
• [59] J. M. McNamee, Numerical Methods for Roots of Polynomials, Part I, Elsevier, Amsterdam, 2007.
• [60] J. M. McNamee and V. Y. Pan, Numerical Methods for Roots of Polynomials, Part II, Elsevier, Amsterdam, 2013.
• [61] U. C. Merzbach and C. B. Boyer, A History of Mathematics, 3rd ed., John Wiley & Sons, 2011.
• [62] J.-M. Muller, N. Brunie, F. de Dinechin, C.-P. Jeannerod, M. Joldes, V. Lefèvre, G. Melquiond, N. Revol, and S. Torres, Handbook of Floating-Point Arithmetic, 2nd ed., Springer, New York, 2018, https://doi.org/10.1007/978-3-319-76526-6.
• [63] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, 1983.
• [64] A. Neumaier, Introduction to Numerical Analysis, Cambridge University Press, 2001.
• [65] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006, https://doi.org/10.1007/978-0-387-40065-5.
• [66] J. M. Ortega, Numerical Analysis. A Second Course, SIAM, Philadelphia, PA, 1990, https://doi.org/10.1137/1.9781611971323.
• [67] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719468.
• [68] A. M. Ostrowski, Solution of Equations and Systems of Equations, 2nd ed., Academic Press, New York, 1966; 3rd ed., 1972.
• [69] M. L. Overton, Numerical Computing with IEEE Floating Point Arithmetic, SIAM, Philadelphia, PA, 2001, https://doi.org/10.1137/1.9780898718072.
• [70] J. M. Papakonstantinou and R. A. Tapia, Origin and evolution of the secant method in one dimension, Amer. Math. Monthly, 120 (2013), pp. 500–518, https://doi.org/10.4169/amer.math.monthly.120.06.500.
• [71] I. Păvăloiu, Sur l’ordre de convergence des méthodes d’itération, Mathematica, 23 (46) (1981), pp. 261–265, https://ictp.acad.ro/convergence-order-iterative-methods/.
• [72] I. Păvăloiu, Optimal algorithms concerning the solving of equations by interpolation, in Research on Theory of Allure, Approximation, Convexity and Optimization, Editura Srima, 1999, pp. 222–248, http://ictp.acad.ro/optimal-algorithms-concerning-solving-equations-interpolation/ (accessed 2021-04-07).
• [73] K. Plofker, An example of the secant method of iterative approximation in a fifteenth-century Sanskrit text, Historia Math., 23 (1996), pp. 246–256, https://doi.org/10.1006/hmat.1996.0026.
• [74] K. Plofker, Use and transmission of iterative approximations in India and the Islamic World, in From China to Paris: 2000 Years Transmission of Mathematical Ideas, Y. Dold-Samplonius, J. W. Dauben, M. Folkerts, and B. Van Dalen, eds., Franz Steiner Verlag Wiesbaden GmbH, Stuttgart, 2002, pp. 167–186.
• [75] E. Polak, Optimization. Algorithms and Consistent Approximations, Springer-Verlag, New York, 1997.
• [76] F. A. Potra, On Q-order and R-order of convergence, J. Optim. Theory Appl., 63 (1989), pp. 415–431, https://doi.org/10.1007/bf00939805.
• [77] F. A. Potra, Q-superlinear convergence of the iterates in primal-dual interior-point methods, Math. Program. Ser. A, 91 (2001), pp. 99–115, https://doi.org/10.1007/s101070100230.
• [78] F. A. Potra, A superquadratic version of Newton’s method, SIAM J. Numer. Anal., 55 (2017), pp. 2863–2884, https://doi.org/10.1137/17M1121056.
• [79] F. A. Potra and V. Pták, Nondiscrete Induction and Iterative Processes, Pitman, Boston, MA, 1984.
• [80] F. A. Potra and R. Sheng, A path following method for LCP with superlinearly convergent iteration sequence, Ann. Oper. Res., 81 (1998), pp. 97–114, https://doi.org/10.1023/A:1018942131812.
• [81] A. Quarteroni, R. Sacco, and F. Saleri, Numerical Mathematics, Springer, Berlin, 2007.
• [82] R. Rashed, The Development of Arabic Mathematics: Between Arithmetic and Algebra, Stud. Philos. Sci. 156, Springer, Boston, MA, 1994.
• [83] M. Raydan, Exact order of convergence of the secant method, J. Optim. Theory Appl., 78 (1993), pp. 541–551, https://doi.org/10.1007/BF00939881.
• [84] W. C. Rheinboldt, An adaptive continuation process for solving systems of nonlinear equations, in Mathematical Models and Numerical Methods, Banach Ctr. Publ. 3, Polish Academy of Science, 1977, pp. 129–142, https://doi.org/10.4064/-3-1-129-142.
• [85] W. C. Rheinboldt, Methods for Solving Systems of Nonlinear Equations, 2nd ed., SIAM, Philadelphia, PA, 1998, https://doi.org/10.1137/1.9781611970012.
• [86] T. Sauer, Numerical Analysis, 2nd ed., Pearson, Boston, MA, 2012.
• [87] E. Schröder, Ueber unendlich viele Algorithmen zur Auflösung der Gleichungen, Math. Ann., 2 (1870), pp. 317–365, https://doi.org/10.1007/bf01444024; available as On Infinitely Many Algorithms for Solving Equations, Tech reports TR–92–121 and TR–2990, Institute for Advanced Computer Studies, University of Maryland, 1992; translated by G. W. Stewart.
• [88] H. Schwetlick, Numerische Lösung Nichtlinearer Gleichungen, VEB, Berlin, 1979.
• [89] L. F. Shampine, R. C. Allen, Jr., and S. Pruess, Fundamentals of Numerical Computing, John Wiley and Sons, New York, 1997.
• [90] T. Steihaug, Computational science in the eighteenth century. Test cases for the methods of Newton, Raphson, and Halley: $1685$ to $1745$, Numer. Algorithms, 83 (2020), pp. 1259–1275, https://doi.org/10.1007/s11075-019-00724-8.
• [91] E. Süli and D. Mayers, An Introduction to Numerical Analysis, Cambridge University Press, Cambridge, UK, 2003.
• [92] T. Tantau, The tikz and pgf Packages. Manual for Version $3.1.5$b-$34$-gff$02$ccd$1$, https://github.com/pgf-tikz/pgf.
• [93] R. A. Tapia, J. E. Dennis, Jr., and J. P. Schäfermeyer, Inverse, shifted inverse, and Rayleigh quotient iteration as Newton’s method, SIAM Rev., 60 (2018), pp. 3–55, https://doi.org/10.1137/15M1049956.
• [94] R. A. Tapia and D. L. Whitley, The projected Newton method has order $1+\sqrt{2}$ for the symmetric eigenvalue problem, SIAM J. Numer. Anal., 25 (1988), pp. 1376–1382, https://doi.org/10.1137/0725079.
• [95] J. F. Traub, Iterative Methods for the Solutions of Equations, Prentice Hall, Englewood Cliffs, NJ, 1964.
• [96] E. E. Tyrtyshnikov, A Brief Introduction to Numerical Analysis, Birkhäuser/Springer, New York, 1997.
• [97] J. S. Vandergraft, Introduction to Numerical Computations, 2nd ed., Academic Press, New York, 1983.
• [98] H. F. Walker, An approach to continuation using Krylov subspace methods, in Computational Science in the 21st Century, J. Periaux, ed., John Wiley and Sons, 1997, pp. 72–81.
• [99] S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971453.
• [100] S. J. Wright, Optimization algorithms for data analysis, in The Mathematics of Data, IAS/Park City Math. Ser. 25, AMS, Providence, RI, 2018, pp. 49–97.
• [101] T. J. Ypma, Historical development of the Newton–Raphson method, SIAM Rev., 37 (1995), pp. 531–551, https://doi.org/10.1137/1037125.

[2] I. K. Argyros and S. George, On a result by Dennis and Schnabel for Newton’s method: Further improvements, Appl. Math. Lett., 55 (2016), pp. 49-53, https://doi.org/10.1016/j.aml.2015.12.003. (Cited on p. 28)
[3] D. F. Bailey, A historical survey of solution by functional iteration, Math. Mag., 62 (1989), pp. 155-166, https://doi.org/10.1080/0025570X.1989.11977428. (Cited on pp. 7, 21, 34)
[4] W. A. Beyer, B. R. Ebanks, and C. R. Qualls, Convergence rates and convergence-order profiles for sequences, Acta Appl. Math., 20 (1990), pp. 267-284, https://doi.org/10.1007/bf00049571. (Cited on pp. 8, 9, 11, 14, 17, 19, 21)
[5] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Rev., 59 (2017), pp. 65-98, https://doi.org/10.1137/141000671. (Cited on p. 4)
[6] R. P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, NJ, 1973. (Cited on p. 11)
[7] R. P. Brent, S. Winograd, and P. Wolfe, Optimal iterative processes for root-finding, Numer. Math., 20 (1973), pp. 327-341, https://doi.org/10.1007/bf01402555. (Cited on pp. 13, 15)
[8] C. Brezinski, Comparaison des suites convergentes, Rev. Franҫaise Inform. Rech. Oper., 5 (1971), pp. 95-99. (Cited on pp. 13, 14)
[9] C. Brezinski, Acceleration de la Convergence en Analyse Numerique, Springer-Verlag, Berlin, 1977. (Cited on pp. 2, 13, 14)
[10] C. Brezinski, Limiting relationships and comparison theorems for sequences, Rend. Circolo Mat. Palermo Ser. II, 28 (1979), pp. 273-280, https://doi.org/10.1007/bf02844100. (Cited on p. 19)
[11] C. Brezinski, Vitesse de convergence d’une suite, Rev. Roumaine Math. Pures Appl., 30 (1985), pp. 403-417. (Cited on pp. 11, 13, 18)
[12] P. N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp. 407-434, https://doi.org/10.1137/0724031. (Cited on p. 29)
[13] F. Cajori, Historical note on the Newton-Raphson method of approximation, Amer. Math. Monthly, 18 (1911), pp. 29-32, https://doi.org/10.2307/2973939. (Cited on pp. 22, 23)
[14] A. Cauchy, Sur la determination approximative des racines d’une equation algebrique ou transcendante, Oeuvres Complete (II) 4, Gauthier-Villars, Paris, 1899, 23 (1829), pp. 573-609, http://iris.univ-lille1.fr/pdfpreview/bitstream/handle/1908/4026/41077-2-4.pdf?sequence=1 (accessed 2019/02/01). (Cited on pp. 21, 23)
[15] J.-L. Chabert, ed., A History of Algorithms. From the Pebble to the Microchip, Springer Verlag, Berlin, 1999. (Cited on pp. 21, 22, 23, 28, 30, 34)
[16] J. Chen and I. K. Argyros, Improved results on estimating and extending the radius of an attraction ball, Appl. Math. Lett., 23 (2010), pp. 404-408, https://doi.org/10.1016/j.aml.2009.11.007. (Cited on p. 35)
[17] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719857. (Cited on pp. 14, 17)
[18] E. Catinas, Inexact perturbed Newton methods and applications to a class of Krylov solvers, J. Optim. Theory Appl., 108 (2001), pp. 543-570, https://doi.org/10.1023/a:1017583307974. (Cited on p. 29)
[19] E. Catinas, On accelerating the convergence of the successive approximations method, Rev. Anal. Numer. Theor. Approx., 30 (2001), pp. 3-8, https://ictp.acad.ro/acceleratingconvergence-successive-approximations-method/ (accessed 20/10/10). (Cited on p. 34)
[20] E. Catinas, On the superlinear convergence of the successive approximations method, J. Optim. Theory Appl., 113 (2002), pp. 473-485, https://doi.org/10.1023/a:1015304720071. (Cited on p. 35)
[21] E. Catinas, The inexact, inexact perturbed and quasi-Newton methods are equivalent models, Math. Comp., 74 (2005), pp. 291-301, https://doi.org/10.1090/s0025-5718-04-01646-1. (Cited on p. 29)
[22] E. Catinas, Newton and Newton-type Methods in Solving Nonlinear Systems in RN , Risoprint, Cluj-Napoca, 2007. (Cited on p. 35)
[23] E. Catinas, Estimating the radius of an attraction ball, Appl. Math. Lett., 22 (2009), pp. 712-714, https://doi.org/10.1016/j.aml.2008.08.007. (Cited on p. 35)
[24] E. Catinas, A survey on the high convergence orders and computational convergence orders of sequences, Appl. Math. Comput., 343 (2019), pp. 1-20, https://doi.org/10.1016/j.amc.2018.08.006. (Cited on pp. 7, 8, 10, 11, 12, 14, 15, 16, 17, 18, 20, 21)
[25] E. Catinas, How Many Steps Still Left to x*? Part II. Newton iterates without f and f’, manuscript, 2020. (Cited on pp. 24, 35)
[26] G. Dahlquist and A. Bjork, Numerical Methods, Dover, Mineola, NY, 1974. (Cited on p. 33)
[27] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400-408, https://doi.org/10.1137/0719025. (Cited on p. 29)
[28] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971446. (Cited on pp. 3, 29)
[29] J. E. Dennis, On Newton-like methods, Numer. Math., 11 (1968), pp. 324-330, https://doi.org/10.1007/bf02166685. (Cited on p. 17)
[30] J. E. Dennis, Jr., and J. J. More, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comp., 28 (1974), pp. 549-560, https://doi.org/10.1090/s0025-5718-1974-0343581-1. (Cited on pp. 19, 29)
[31] J. E. Dennis, Jr., and J. J. More, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46-89, https://doi.org/10.1137/1019005. (Cited on pp. 27, 29)
[32] J. E. Dennis, Jr., and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, PA, 1996, https://doi.org/10.1137/1.9781611971200. (Cited on pp. 15, 28)
[33] G. Deslauries and S. Dubuc, Le calcul de la racine cubique selon Heron, Elem. Math., 51 (1996), pp. 28-34, http://eudml.org/doc/141587 (accessed 2020/02/18). (Cited on p. 30)
[34] P. Deuflhard, Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms, Springer-Verlag, Heidelberg, 2004. (Cited on pp. 23, 24)
[35] P. Deuflhard and F. A. Potra, Asymptotic mesh independence of Newton-Galerkin methods via a refined Mysovskii theorem, SIAM J. Numer. Anal., 29 (1992), pp. 1395-1412, https://doi.org/10.1137/0729080. (Cited on p. 28)
[36] P. Diez, A note on the convergence of the secant method for simple and multiple roots, Appl. Math. Lett., 16 (2003), pp. 1211-1215, https://doi.org/10.1016/s0893-9659(03)90119-4. (Cited on pp. 27, 31)
[37] F. Dubeau and C. Gnang, Fixed point and Newton’s methods for solving a nonlinear equation: From linear to high-order convergence, SIAM Rev., 56 (2014), pp. 691-708, https://doi.org/10.1137/130934799. (Cited on pp. 25, 34)
[38] J. Fourier, Question d’analyse algebrique, Bull. Sci. Soc. Philo., 67 (1818), pp. 61-67, http://gallica.bnf.fr/ark:/12148/bpt6k33707/f248.item (accessed 2018-02-23). (Cited on pp. 7, 28)
[39] W. Gautschi, Numerical Analysis, 2nd ed., Birkhauser/Springer, New York, 2011, https://doi.org/10.1007/978-0-8176-8259-0. (Cited on p. 23)
[40] A. Greenbaum and T. P. Chartier, Numerical Methods. Design, Analysis, and Computer Implementation of Algorithms, Princeton University Press, Princeton, NJ, 2012. (Cited on pp. 23, 27, 31)
[41] M. T. Heath, Scientific Computing. An Introductory Survey, 2nd ed., SIAM, Philadelphia, PA, 2018, https://doi.org/10.1137/1.9781611975581. (Cited on pp. 9, 29)
[42] M. Heinkenschloss, Scientific Computing. An Introductory Survey, Rice University, Houston, TX, 2018. (Cited on pp. 3, 15, 19)
[43] J. Herzberger and L. Metzner, On the Q-order of convergence for coupled sequences arising in iterative numerical processes, Computing, 57 (1996), pp. 357-363, https://doi.org/10.1007/BF02252254. (Cited on p. 13)
[44] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, Philadelphia, PA, 2002, https://doi.org/10.1137/1.9780898718027. (Cited on p. 3)
[45] M. Igarashi and T. J. Ypma, Empirical versus asymptotic rate of convergence of a class of methods for solving a polynomial equation, J. Comput. Appl. Math., 82 (1997), pp. 229-237, https://doi.org/10.1016/s0377-0427(97)00077-0. (Cited on p. 25)
[46] L. O. Jay, A note on Q-order of convergence, BIT, 41 (2001), pp. 422-429, https://doi.org/10.1023/A:1021902825707. (Cited on pp. 5, 11, 12, 17)
[47] T. A. Jeeves, Secant modification of Newton’s method, Commun. ACM, 1 (1958), pp. 9-10, https://doi.org/10.1145/368892.368913. (Cited on pp. 11, 30)
[48] V. J. Katz, A History of Mathematics. An Introduction, 2nd ed., Addison-Wesley, 1998. (Cited on p. 21)
[49] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, PA, 1995, https://doi.org/10.1137/1.9781611970944. (Cited on pp. 5, 10, 17)
[50] C. T. Kelley, Iterative Methods for Optimization, SIAM, Philadelphia, PA, 1999, https://doi.org/10.1137/1.9781611970920. (Cited on pp. 15, 17)
[51] E. S. Kennedy and W. R. Transue, A medieval iterative algorism, Amer. Math. Monthly, 63 (1956), pp. 80-83, https://doi.org/10.1080/00029890.1956.11988762. (Cited on p. 34)
[52] D. E. Knuth, Big Omicron and big Omega and big Theta, ACM SIGACT News, 8 (1976), pp. 18-24, https://doi.org/10.1145/1008328.1008329. (Cited on pp. 2, 14)
[53] D. E. Knuth, The TeXbook, Addison-Wesley, 1984. (Cited on p. 4)
[54] N. Kollerstrom, Thomas Simpson and Newton’s method of approximation: an enduring myth, British J. History Sci., 25 (1992), pp. 347-354, https://doi.org/10.1017/s0007087400029150. (Cited on pp. 22, 23)
[55] K. Liang, Homocentric convergence ball of the secant method, Appl. Math. Chin. Univ., 22 (2007), pp. 353–365, https://doi.org/10.1007/s11766-007-0313-3. (Cited on p. 33)
[56] D. Luca and I. Pavaloiu, On the Heron’s method for approximating the cubic root of a real number, Rev. Anal. Numer. Theor. Approx., 26 (1997), pp. 103-108, https://ictp.acad.ro/jnaat/journal/article/view/1997-vol26-nos1-2-art15 (accessed 2020-03-21). (Cited on p. 30)
[57] M. Grau-Sanchez, M. Noguera, and J. M. Gutierrez, On some computational orders of convergence, Appl. Math. Lett., 23 (2010), pp. 472-478, https://doi.org/10.1016/j.aml.2009.12.006. (Cited on p. 20)
[58] Mathworks, MATLAB, Version 2019b, 2019, http://www.mathworks.com (accessed 2019/12/12). (Cited on pp. 4, 26)
[59] J. M. McNamee, Numerical Methods for Roots of Polynomials, Part I, Elsevier, Amsterdam, 2007. (Cited on p. 23)
[60] J. M. McNamee and V. Y. Pan, Numerical Methods for Roots of Polynomials, Part II, Elsevier, Amsterdam, 2013. (Cited on pp. 28, 33)
[61] U. C. Merzbach and C. B. Boyer, A History of Mathematics, 3rd ed., John Wiley & Sons, 2011. (Cited on p. 30)
[62] J.-M. Muller, N. Brunie, F. de Dinechin, C.-P. Jeannerod, M. Joldes, V. Lefevre, G. Melquiond, N. Revol, and S. Torres, Handbook of Floating-Point Arithmetic, 2nd ed., Springer, New York, 2018, https://doi.org/10.1007/978-3-319-76526-6. (Cited on pp. 22, 24)
[63] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, 1983. (Cited on p. 15)
[64] A. Neumaier, Introduction to Numerical Analysis, Cambridge University Press, 2001. (Cited on pp. 31, 33)
[65] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006, https://doi.org/10.1007/978-0-387-40065-5. (Cited on pp. 8, 17)
[66] J. M. Ortega, Numerical Analysis. A Second Course, SIAM, Philadelphia, PA, 1990, https://doi.org/10.1137/1.9781611971323. (Cited on pp. 28, 34)
[67] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719468. (Cited on pp. 8, 10, 11, 14, 15, 16, 18, 21, 27, 34)
[68] A. M. Ostrowski, Solution of Equations and Systems of Equations, 2nd ed., Academic Press, New York, 1966; 3rd ed., 1972. (Cited on pp. 27, 30, 34)
[69] M. L. Overton, Numerical Computing with IEEE Floating Point Arithmetic, SIAM, Philadelphia, PA, 2001, https://doi.org/10.1137/1.9780898718072. (Cited on pp. 3, 29)
[70] J. M. Papakonstantinou and R. A. Tapia, Origin and evolution of the secant method in one dimension, Amer. Math. Monthly, 120 (2013), pp. 500-518, https://doi.org/10.4169/amer.math.monthly.120.06.500. (Cited on p. 30)
[71] K. Plofker, An example of the secant method of iterative approximation in a fifteenth century sanskrit text, Historia Math., 23 (1996), pp. 246-256, https://doi.org/10.1006/hmat.1996.0026. (Cited on pp. 30, 34)
[72] K. Plofker, Use and transmission of iterative approximations in India and the Islamic World, in From China to Paris: 2000 Years Transmission of Mathematical Ideas, Y. Dold-Samplonius, J. W. Dauben, M. Folkerts, and B. Van Dalen, eds., Franz Steiner Verlag Wiesbaden GmbH, Stuttgart, 2002, pp. 167-186. (Cited on pp. 21, 22, 30, 34)
[73] E. Polak, Optimization. Algorithms and Consistent Approximations, Springer-Verlag, New York, 1997. (Cited on pp. 5, 11, 15, 19)
[74] F. A. Potra, On Q-order and R-order of convergence, J. Optim. Theory Appl., 63 (1989), pp. 415-431, https://doi.org/10.1007/bf00939805. (Cited on pp. 8, 11, 13, 14, 15, 16, 17, 18, 21) [75] F. A. Potra, Q-superlinear convergence of the iterates in primal-dual interior-point methods, Math. Program. Ser. A, 91 (2001), pp. 99-115, https://doi.org/10.1007/s101070100230. (Cited on p. 15)
[76] F. A. Potra, A superquadratic version of Newton’s method, SIAM J. Numer. Anal., 55 (2017), pp. 2863-2884, https://doi.org/10.1137/17M1121056. (Cited on p. 13)
[77] F. A. Potra and V. Ptak, Nondiscrete Induction and Iterative Processes, Pitman, Boston, MA, 1984. (Cited on pp. 8, 11, 13, 15, 19)
[78] F. A. Potra and R. Sheng, A path following method for LCP with superlinearly convergent iteration sequence, Ann. Oper. Res., 81 (1998), pp. 97-11, https://doi.org/10.1023/A:1018942131812. (Cited on p. 14)
[79] I. Pavaloiu, Sur l’ordre de convergence des methodes d’iteration, Mathematica, 23(46) (1981), pp. 261-272, https://doi.org/10.1007/BF02576543. (Cited on p. 14)
[80] I. Pavaloiu, Optimal algorithms concerning the solving of equations by interpolation, Ed. Srima, (1999), pp. 222-248, https://ictp.acad.ro/optimal-algorithms-concerning-solvingequations-interpolation/ (accessed 2018-03-22). (Cited on p. 13)
[81] A. Quarteroni, R. Sacco, and F. Saleri, Numerical Mathematics, Springer, Berlin, 2007. (Cited on pp. 28, 29)
[82] R. Rashed, The Development of Arabic Mathematics: Between Arithmetic and Algebra, Stud. Philos. Sci. 156, Springer, Boston, MA, 1994. (Cited on pp. 22, 30)
[83] M. Raydan, Exact order of convergence of the secant method, J. Optim. Theory Appl., 78 (1993), pp. 541-551, https://doi.org/10.1137/060670341. (Cited on pp. 25, 31)
[84] W. C. Rheinboldt, An adaptive continuation process for solving systems of nonlinear equations, in Mathematical Models and Numerical Methods, Banach Ctr. Publ. 3, Polish Academy of Science, 1977, pp. 129-142, https://doi.org/10.4064/-3-1-129-142. (Cited on p. 28)
[85] W. C. Rheinboldt, Methods for Solving Systems of Nonlinear Equations, 2nd ed., SIAM, Philadelphia, PA, 1998, https://doi.org/10.1137/1.9781611970012. (Cited on pp. 11, 15)
[86] T. Sauer, Numerical Analysis, 2nd ed., Pearson, Boston, MA, 2012. (Cited on pp. 21, 29)
[87] E. Schroder, Ueber unendlich viele algorithmen zur auflosung der gleichungen, Math. Ann., 2 (1870), pp. 317-365, https://doi.org/10.1007/bf01444024; available as On Infinitely Many Algorithms for Solving Equations, Tech reports TR-92-121 and TR-2990, Institute for Advanced Computer Studies, University of Maryland, 1992; translated by G. W. Stewart. (Cited on pp. 7, 34)
[88] H. Schwetlick, Numerische Losung Nichtlinearer Gleichungen, VEB, Berlin, 1979. (Cited on pp. 8, 13, 14, 16)
[89] L. F. Shampine, R. C. Allen, Jr., and S. Pruess, Fundamentals of Numerical Computing, John Wiley and Sons, New York, 1997. (Cited on pp. 29, 33)
[90] T. Steihaug, Computational science in the eighteenth century. Test cases for the methods of Newton, Raphson, and Halley: 1685 to 1745, Numer. Algor., 83 (2020), pp. 1259-1275, https://doi.org/10.1007/s11075-019-00724-8. (Cited on pp. 22, 23)
[91] E. Suli and D. Mayers, An Introduction to Numerical Analysis, Cambridge University Press, Cambridge, UK, 2003. (Cited on p. 33)
[92] T. Tantau, The tikz and pgf Packages. Manual for Version 3.1.5b-34-gff02ccd1, https://github.com/pgf-tikz/pgf. (Cited on p. 4)
[93] R. A. Tapia, J. E. Dennis, Jr., and J. P. Schafermeyer, Inverse, shifted inverse, and Rayleigh quotient iteration as Newton’s method, SIAM Rev., 60 (2018), pp. 3-55, https://doi.org/10.1137/15M1049956. (Cited on pp. 7, 16, 17, 18)
[94] R. A. Tapia and D. L. Whitley, The projected Newton method has order 1 + √2 for the symmetric eigenvalue problem, SIAM J. Numer. Anal., 25 (1989), pp. 1376-1382, https://doi.org/10.1137/0725079. (Cited on p. 13)
[95] J. F. Traub, Iterative Methods for the Solutions of Equations, Prentice Hall, Englewood Cliffs, NJ, 1964. (Cited on pp. 13, 31, 34)
[96] E. E. Tyrtyshnikov, A Brief Introduction to Numerical Analysis, Birkhauser/Springer, New York, 1997. (Cited on p. 24)
[97] J. S. Vandergraft, Introduction to Numerical Computations, 2nd ed., Academic Press, New York, 1983. (Cited on p. 33)
[98] H. F. Walker, An approach to continuation using Krylov subspace methods, in Computational Science in the 21st Century, J. Periaux, ed., John Wiley and Sons, 1997, pp. 72-81. (Cited on p. 19)
[99] S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971453. (Cited on p. 12)
[100] S. J. Wright, Optimization algorithms for data analysis, in The Mathematics of Data, IAS/Park City Math. Ser. 25, AMS, Providence, RI, 2018, pp. 49-97. (Cited on p. 15)
[101] T. J. Ypma, Historical development of the Newton-Raphson method, SIAM Rev., 37 (1995), pp. 531–551, https://doi.org/10.1137/1037125. (Cited on pp. 7, 21, 22, 23, 30, 34)