# How many steps still left to x*?

We survey the convergence orders of sequences and present the orders of the basic iterative methods for solving nonlinear equations in $${\mathbb R}$$: the Newton method, the secant method, and the successive approximations.

## Abstract

The high speed of $$x_{k}\rightarrow x^\ast\in{\mathbb R}$$ is usually measured using the C-, Q- or R-orders:
\begin{equation}\tag{$C$}
\lim \frac {|x^\ast - x_{k+1}|}{|x^\ast - x_k|^{p_0}}\in(0,+\infty),
\end{equation}
\begin{equation}\tag{$Q$}
\lim \frac {\ln |x^\ast - x_{k+1}|}{\ln |x^\ast - x_k|} =q_0,
\end{equation}
\begin{equation}\tag{$R$}
\lim \big\vert\ln |x^\ast - x_k| \big \vert ^\frac1k =r_0.
\end{equation}
By connecting them to the natural, term-by-term comparison of the errors of two sequences, we find that the C-orders, including the (sub)linear ones, are in agreement. Weird relations may appear though for the Q-orders: we expect
$|x^\ast - x_k|=\mathcal{O}(|x^\ast - y_k|^\alpha), \qquad\forall\alpha>1,$
to imply “$$\geq$$” for the Q-orders of $$\{x_{k}\}$$ vs. $$\{y_{k}\}$$; the contrary is shown by an example providing no vs. infinite Q-orders. The R-orders seem even worse: an $$\{x_{k}\}$$ with infinite R-order may have unbounded nonmonotone errors: $$\limsup |x^\ast - x_{k+1}|/|x^\ast - x_k| = +\infty$$.
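As a quick illustration (our own sketch, not from the paper), the three limits can be checked numerically on a model sequence with errors $$e_k = 2^{-2^k}$$, for which the C-, Q-, and R-orders all equal $$2$$; working with $$\ln e_k$$ avoids floating point underflow:

```python
import math

# Model errors e_k = |x* - x_k| = 2^(-2^k); all three orders equal 2.
# We store ln e_k to avoid floating point underflow for large k.
ks = list(range(1, 8))
log_e = [-(2 ** k) * math.log(2.0) for k in ks]

# C-limit: e_{k+1} / e_k^2 (computed in logs) -> 1, a finite nonzero value
c_est = [math.exp(log_e[i + 1] - 2.0 * log_e[i]) for i in range(len(ks) - 1)]

# Q-order: ln e_{k+1} / ln e_k -> q0 = 2
q_est = [log_e[i + 1] / log_e[i] for i in range(len(ks) - 1)]

# R-order: |ln e_k|^(1/k) -> r0 = 2 (slowly)
r_est = [abs(log_e[i]) ** (1.0 / k) for i, k in enumerate(ks)]

print(c_est[-1], q_est[-1], r_est[-1])
```

Note how slowly the R-order estimate approaches its limit, in contrast with the Q-order one.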

Such aspects motivate the study of equivalent definitions, computational variants, etc.

The orders are also the perspective from which we analyze the three basic iterative methods for nonlinear equations. The Newton method, widely known for its quadratic convergence, may in fact attain any C-order from $$[1,+\infty]$$ (including sublinear); we briefly recall such convergence results, along with connected aspects (historical notes, known asymptotic constants, floating point arithmetic issues, radius of attraction balls, etc.) and provide examples.
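To make the quadratic case concrete, here is a minimal sketch (ours, not from the paper): Newton's method for $$f(x)=x^2-2$$, where the empirical Q-order $$\ln e_{k+1}/\ln e_k$$ decreases toward $$2$$.

```python
import math

# Newton's method x_{k+1} = x_k - f(x_k)/f'(x_k) for f(x) = x^2 - 2.
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
root = math.sqrt(2.0)

x, errs = 2.0, []
for _ in range(4):          # stop before rounding errors dominate
    x -= f(x) / df(x)
    errs.append(abs(x - root))

# Empirical Q-order: ln e_{k+1} / ln e_k
orders = [math.log(errs[i + 1]) / math.log(errs[i])
          for i in range(len(errs) - 1)]
print(orders)  # decreasing toward 2
```

Only a few steps are shown, since in double precision the errors soon hit the rounding level and the estimates become meaningless.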

This approach leads to similar results for the successive approximations, while the secant method exhibits different behavior: it may not have high C-orders, but only Q-orders.
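A companion sketch (again ours, not from the paper) for the secant method on the same equation; here the empirical Q-order tends to $$(1+\sqrt{5})/2 \approx 1.618$$ rather than $$2$$.

```python
import math

# Secant method on f(x) = x^2 - 2, with x* = sqrt(2).
f = lambda x: x * x - 2.0
root = math.sqrt(2.0)

x0, x1, errs = 2.0, 1.5, []
for _ in range(4):          # few steps, before rounding errors dominate
    x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
    x0, x1 = x1, x2
    errs.append(abs(x2 - root))

# Empirical Q-order: still above the limit (1 + sqrt(5))/2 at these indices
orders = [math.log(errs[i + 1]) / math.log(errs[i])
          for i in range(len(errs) - 1)]
print(orders[-1])
```

As with the Newton sketch, the estimates approach the theoretical order only as the iterates enter the asymptotic range.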

## Authors

Emil Cătinaş
“Tiberiu Popoviciu” Institute of Numerical Analysis, Romanian Academy

## Keywords

(computational) convergence orders, iterative methods, Newton method, secant method, successive approximations, asymptotic rates.

## PDF

freely available at the publisher: http://doi.org/10.1137/19M1244858

## Paper coordinates

E. Catinas, How many steps still left to x*?, SIAM Rev., 63 (2021) no. 3, pp. 585–624, doi: 10.1137/19M1244858

ISSN: 0036-1445 (print), 1095-7200 (online)

## Github

Small codes in LaTeX and Julia have been posted on GitHub to illustrate some aspects: https://github.com/ecatinas/conv-ord.

We will post Python versions soon.

## Corrigendum

(introduced on February 10, 2023; modified on February 25, 2023, and on August 27, 2023)

Corrigendum:

• for the sequences with $$\bar{Q}_1=\infty$$ we have proposed the terminology “sequences with unbounded nonmonotone errors” (see Remark 2.12.c). At the beginning of Section 2.2 this is also termed “sequences with no Q-order”, which is not proper, as we have subsequently found sequences with $$\bar{Q}_1=\infty$$ that do have Q-order $$1$$; see [Cătinaş, 2023];
• in Remark 2.50.b, “the strict superlinear” should be removed. Indeed, this is not a C-order but a Q-order, as it is defined with the aid of $$\bar{Q}_p$$. On the other hand, the strict superlinear order is not always faster than the C-order $$p>1$$; see [Cătinaş, 2023].
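To illustrate the corrected terminology numerically (our own construction, in the spirit of the paper's merged-sequence examples), take $$e_k = 2^{-2^k}$$ for even $$k$$ and $$e_k = 2^{-k}$$ for odd $$k$$: the errors converge to zero, yet $$\limsup e_{k+1}/e_k = \infty$$, i.e., $$\bar{Q}_1 = \infty$$.

```python
# Errors of a merged sequence: e_k = 2^(-2^k) for even k, 2^(-k) for odd k.
# Both subsequences tend to 0, but consecutive ratios e_{k+1}/e_k blow up.
def log2_err(k):
    return -(2 ** k) if k % 2 == 0 else -k

# log2(e_{k+1}/e_k): on even -> odd steps this is large and positive,
# so the ratios e_{k+1}/e_k are unbounded above (i.e., Q1-bar is infinite),
# while on odd -> even steps it is large and negative (errors nonmonotone).
log_ratios = [log2_err(k + 1) - log2_err(k) for k in range(2, 11)]
print(max(log_ratios))
```

Working with $$\log_2 e_k$$ keeps the computation exact, since $$2^{-2^{10}}$$ is far below the double precision underflow threshold.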

Typos:

• in the abstract, the condition $$|x^\ast - x_{k+1}|/|x^\ast - x_{k}| \rightarrow \infty$$ (i.e., $$Q_1=\infty$$), for the sequences with unbounded nonmonotone errors, should read instead $$\limsup |x^\ast - x_{k+1}|/|x^\ast - x_{k}| = \infty$$ (i.e., $$\bar{Q}_1=\infty$$); obviously, the condition $$Q_1=\infty$$ cannot be fulfilled by a convergent sequence. The matter is correctly dealt with inside the paper (see Remark 2.12.c).

## Notes

Working with the SIAM staff was an amazing experience. I would like to thank Louis Primus for his solicitude and professionalism.

Version: October 30, 2021.

Some references I have missed:

(these were included in the initial submission, but are missing in the final version, as they were not cited in the text and we switched to BibTeX)

known to me, but still missing (nobody is perfect):

found after sending the revised version:

• J. H. Wilkinson, The evaluation of the zeros of ill-conditioned polynomials. Part II, Numer. Math., 1 (1959), pp. 167–180, doi: 10.1007/BF01386382
• D. F. Bailey and J. M. Borwein, Ancient Indian square roots: an exercise in forensic paleo-mathematics, Amer. Math. Monthly, 119 (2012), pp. 646–657, doi: 10.4169/amer.math.monthly.119.08.646
• E. Halley, Methodus nova accurata & facilis inveniendi radices aequationum quarumcumque generaliter, sine praevia reductione, Phil. Trans. R. Soc., 18 (1694), pp. 136–148, doi: 10.1098/rstl.1694.0029
• A. Ralston and P. Rabinowitz, A First Course in Numerical Analysis, 2nd ed., McGraw-Hill, New York, 1978
• M. A. Nordgaard, The origin and development of our present method of extracting the square and cube roots of numbers, The Mathematics Teacher, 17 (1924), pp. 223–238, https://www.jstor.org/stable/27950616

and others (to be completed)


(computational) convergence orders, iterative methods, Newton method, secant method, successive approximations, asymptotic rates 65-02, 65-03, 41A25, 40-02, 97-02, 97N40, 65H05 10.1137/19M1244858 \bfseriesContents \setcounter{tocdepth}{3} Introduction The analysis of the convergence speed of sequences is an important task, since in numerical applications the aim is to use iterative methods with fast convergence, which provide good approximations in few steps. The ideal setting for studying a given sequence { x k } := ( x k ) k â‰¥ 0 âŠ‚ R is that of error-based analysis, where the finite limit x * is assumed to be known and we analyze the absolute values of the errors, e k := | x k âˆ’ x * | . The errors allow the comparison of the speeds of two sequences in the most natural way, even if their limits are distinct (though here we write both as x * ). Let us first define notation, for later reference. Given x Ë™ k , k â†’ x * âˆˆ R , denote their errors by { e Ë™ k } , resp., { k } , etc. { x Ë™ k } converges faster (not slower) than { k } if | x * âˆ’ x Ë™ k | â‰¤ | x * âˆ’ k | , â€ƒ k â‰¥ k 0 (or, in brief, { x Ë™ k } is \cref{eq: |x*-xk| <= |x*-yk|} faster than { k } ) and strictly faster if | x * âˆ’ x Ë™ k | < | x * âˆ’ k | , â€ƒ k â‰¥ k 0 . Furthermore, increasingly faster convergence holds if – for some c âˆˆ ( 0 , 1 ) we have | x * âˆ’ x Ë™ k | â‰¤ c â¢ | x * âˆ’ k | , k â‰¥ k 0 ; – the constant c above is not fixed, but tends to zero (see, e.g., p.~2, ), | x * âˆ’ x Ë™ k | = o â¡ ( | x * âˆ’ k | ) â€ƒ asâ€‚ k â†’ âˆž ; This means that | x * âˆ’ x Ë™ k | â‰¤ c k â¢ | x * âˆ’ k | , k â‰¥ k 0 , with c k â†’ 0 , which allows a finite number of elements c k to be greater than one. – given Î± > 1 , we have | x * âˆ’ x Ë™ k | = O â¡ ( | x * âˆ’ k | Î± ) as k â†’ âˆž . That is, âˆƒ K > 0 such that | x * âˆ’ x Ë™ k | â‰¤ K â¢ | x * âˆ’ k | Î± , k â‰¥ k 0 . 
Obviously, \cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} â‡’ \cref{eq: |x*-xk| = o(|x*-yk|)} â‡’ \cref{eq: |x*-xk| < |x*-yk|} â‡’ \cref{eq: |x*-xk| <= |x*-yk|}. Let us first illustrate the convergence speed in an intuitive fashion, using some graphs. Consider the following two groups of sequences ( { x k } = { e k } ) : (a) { 1 k } , { 1 k } , { 1 k 4 } ; (b) { k 2 k } , { 1 2 k } = { ( 1 2 ) k } = { 2 âˆ’ k } , { 1 k â¢ 2 k } , { 1 4 k } . Handling the first terms of these sequences raises no difficulties, as all the common programming languages represent them as fl â¡ ( x k ) in standard double precision arithmetic (called \ttdigits64 by the IEEE 754-2008 standard); see \cref{exe. smallest k for xk in a)-b)} below. The plotting of ( k , e k ) in \cref{fig: plot errs linear conv} does not help in analyzing the behavior of the errors, as for k â‰¥ 10 we cannot see much, even if the graph is scaled. The implicit fitting of the figures in the window usually results in different units for the two axes (scaled graph). In \cref{fig: plot errs linear conv} we would see even less if the graph were unscaled. All we can say is that { 1 k } has the slowest convergence, followed by { 1 k } , and also that the first terms of { 1 k 4 } are smaller than those of { 1 2 k } . In order to compare the last terms we must use the semilog coordinates ( k , lg â¡ e k ) in \cref{fig: semilog errs linear conv} (see p.~211); for { 1 2 k } , they are ( k , k â¢ lg â¡ 1 2 ) , belonging to the line y = a â¢ x , a = lg â¡ 1 2 . As the graph is twice scaled, The first scaling is by the use of an unequal number of units ( 50 on axis x vs.Â  36 on axis y ), and the second one results from the fact that the final figure is a rectangle instead of a square. it actually shows another line, y = b â¢ x . \Cref{fig: semilog errs linear conv} shows that, in reality, { 1 2 k } is \cref{eq: |x*-xk| < |x*-yk|} faster than { 1 k 4 } ( k 0 = 17 ). 
We will see later that the convergence is sublinear in group (a) and linear in group (b). Show that (a) { 1 2 k } is [ \cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} âˆ€ Î± > 1 ] faster than { 1 k 4 } ; (b) { 1 2 k } is \cref{eq: |x*-xk| = o(|x*-yk|)} but not [ \cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} for some Î± > 1 ] faster than { k 2 k } . (a) The smallest positive \ttdigits64 number, which we denote by \ttrealmin(digits64), is 2.2250738585072014 â¢ e âˆ’ 308 . Analyzing its binary representation, express its value as a power of 2 (see p.~14, , or ). (b) For each sequence from (a) and (b) in the example above, compute the largest k such that fl â¡ ( x k ) â‰  0 . (c) Give a formula for the largest index k for which all the elements fl â¡ ( x k ) of all the sequences from (a) and (b) in the example above are nonzero. Next we analyze some increasingly faster classes of sequences. Consider (c) { 1 2 k 2 } , { 1 2 k 3 } , { 1 k k } ; (d) { 1 2 2 k k } , { 1 2 2 k } = { ( 1 2 ) 2 k } = { 2 âˆ’ 2 k } , { 1 2 k â¢ 2 k } , { 1 3 2 k } ; (e) { 1 2 3 k } ; (f) { 1 2 2 k â¢ k } , { 1 2 2 k 2 } . In \cref{fig: semilogy errs ord 1 and sl – line parabola and betw} we plot { 1 4 k } , { 1 2 k 2 } , and { 1 k k } . Their graphs are on a line (the fastest from \cref{fig: semilog errs linear conv}), on a parabola y = c â¢ x 2 , and, resp., in between. The computation with what appears at first sight to be a “reasonable” number of terms (say, 10 ) becomes increasingly challenging as we successively consider { x k } from (c)–(f). We leave as an exercise the calculation/computation of the largest index k for each { x k } in (c)–(f) such that fl â¡ ( x k ) is a nonzero \ttdigits64 number; we note though that all fl â¡ ( 1 / 2 2 k 2 ) and fl â¡ ( 1 / 2 2 k â¢ k ) are zero for k â‰¥ 4 , respectively, k â‰¥ 5 , and so \ttdigits64 numbers are not enough here. 
There are plenty of choices for increased precision: the well-known MATLAB (but with the Advanpix toolbox providing arbitrary precision), the recent, freely available Julia , and many others (not to mention the programming languages for symbolic computation). All the graphs from this paper were obtained using the tikz/pgf package for LATEX. The initial TEXÂ system was created by D.~E.Â Knuth . The figures are easy to customize and the instructions to generate them have a simple form, briefly, \tt \addplot [domain=1:20, samples=20] {1/sqrt(x)}; Equally important, we will see that the tikz/pgf library allows the use of higher precision than \ttdigits64 (more precisely, smaller/larger numbers by increased representation of the exponent, e.g., \ttrealmin around 10 âˆ’ 6500 ). The LATEXÂ sources for the figures are posted on \url{https://github.com/ecatinas/conv-ord}. The sequence groups (c)–(f) are presented in \cref{fig: semilogy errs std conv sl 1 2 3 inf}; { 1 2 k 3 } belongs to a cubic parabola, and { 1 2 2 k } to an exponential. The parabola of { 1 2 k 2 } appears flat, which shows how much faster the other sequences converge. The descent of the three terms of { 1 / 2 2 k 2 } is the steepest. As will be seen later, the sequence groups (c)–(f) have increasing order: strict superlinear, quadratic, cubic, and infinite, respectively. Show that { 1 3 2 k } is [ \cref{eq: |x*-yk| = O(|x*-xk|^(1+al))}, Î± âˆˆ ( 1 , ln â¡ 3 ln â¡ 2 ] ] faster than { 1 2 2 k } . Obviously, when comparing two sequences by \cref{eq: |x*-xk| < |x*-yk|}, the faster one must have its graph below the other one ( k â‰¥ k 0 ) (cf.~Polak p.~47). While the more concave the graph the better (cf.~Kelley Ex.~5.7.15), fast convergence does not in fact necessarily require smooth graphs (see Jay ). Prove that any convergence curve between { 1 2 3 k } and { 1 3 2 k } is at least [ \cref{eq: |x*-yk| = O(|x*-xk|^(1+al))}, Î± âˆˆ ( 1 , ln â¡ 3 ln â¡ 2 ] ] faster than { 1 2 2 k } (hint: use \cref{exe. 
1/3^2^k = O((1/2^2^k)^1+alpha)}). Consider, for instance, the merging of certain sequences. Let x k a = { 1 3 2 k , k odd , 1 2 3 k , k even , â€ƒâ€ƒ x k b = { 1 2 2 k , k even , 1 3 2 k , k odd , â€ƒâ€ƒ x k c = { 1 2 2 k , k even , 1 5 2 k , k odd . { 1 2 3 k } , { x k a } , { 1 3 2 k } , { x k b } , { 1 2 2 k } , written in \cref{eq: |x*-xk| <= |x*-yk|} order of decreasing speed, are plotted in \Cref{fig: semilogy errs merged seq ords 3 2}. The usual ranking by convergence orders is instead { 1 2 3 k } , { 1 3 2 k } , { 1 2 2 k } , { x k b } , { x k a } ( C -orders 3 , 2 , 2 , R -order 2 , no order). Though { x k a } has the second fastest convergence, the orders actually rank it as the slowest. { x k b } does not have C – or Q -order (but just exact R -order 2 ), so is usually ranked below { 1 2 2 k } . { 1 5 2 k } , { x k c } , { 1 2 2 k } , { 1 1.3 2 k } from \cref{fig: semilogy errs seq ords 3 2 xck 2} have ranking { 1 5 2 k } , { 1 2 2 k } , { 1 1.3 2 k } , { x k c } ( C -orders 2 , 2 , 2 , R -order 2 ). Though nonmonotone, { x k c } has exact R -order 2 and is (at least) \cref{eq: |x*-xk| < |x*-yk|} faster than the C -quadratic { 1 1.3 2 k } . Given { x Ë™ k } , { k } with errors shown in \cref{fig: semilogy errs barbar xk= bar xk = 1/2^2^k in 2 different windows}, which one is faster? The study of errors using \cref{comp: def term by term} seems Of course, one cannot always compare all sequences by \cref{eq: |x*-xk| <= |x*-yk|}: consider { x k a } and { 1 6 2 k } , for example. perfect in theory, but has a notable disadvantage: it requires two sequences. The alternative is to use a “scale” for measuring any single sequence—the convergence orders. The (classical) C -orders, though they demand some strong conditions, appear in the convergence of the most often encountered iterative methods for nonlinear equations. A notable exception: the higher orders of the secant method are Q – but not necessarily C -orders. 
Still, some problems appear, for example, when one wants to check the usual predictions such as "in quadratic convergence, the error is squared at each step." Indeed, as a sequence has infinitely many terms, its order refers to the asymptotic range (all sufficiently large indices), where statements such as "if $\varepsilon>0$ is sufficiently small, $\exists k_0\geq0$ s.t. [some property] holds $\forall k\geq k_0$" come to life. Clearly, the first $k_0$ terms ($k_0=2$, $10$, or perhaps $10^6$) are not necessarily connected to the order: a finite number of terms does not contribute to the order of an abstract sequence, while, most painful, the high order of an iterative method may not be reflected in its first steps if the initial approximation is not good enough. Therefore, such predictions, instead of being applied from the very first term, are usually applied only starting from the first terms of the asymptotic range.

When the stringent conditions of the $C$-orders are not fulfilled, the $Q$-orders are considered instead, but unfortunately they can be in disagreement with the term-by-term comparison of the errors of two sequences; indeed, we show that convergence even with no $Q$- (or $C$-)order may in fact hide a fast overall speed, where only some of the consecutive errors do not decrease fast enough. The $R$-orders are instead known for their limited usefulness, as they require the weakest conditions and allow nonmonotone errors.

Consider now a further problem: given some iterations for which the calculation of the order turns out to be difficult or even impossible, can the computer be used to approximate it? A positive answer is given by the computational convergence orders.

Despite the fact that convergence orders have a long history, it is only recently that some final connections were made and a comprehensive picture was formed of the $C$-, $Q$-, and $R$-orders, together with their computational variants.
The understanding of these notions is important from both the theoretical and the practical viewpoints, the comments of Tapia, Dennis, and Schäfermeyer being most relevant:

> The distinction between $Q$- and $R$-convergence is quite meaningful and useful and is essentially sacred to workers in the area of computational optimization. However, for reasons not well understood, computational scientists who are not computational optimizers seem to be at best only tangentially aware of the distinction.

We devote \cref{sec: C- Q- and R-orders + comparisons} to an error-based analysis of the problem of measuring and comparing the convergence speeds of abstract real sequences, using either the convergence orders we review or the term-by-term comparison of errors. In \cref{sec: Comput ver of conv orders} we deal with the computational versions of the convergence orders (based either on the corrections $x_{k+1}-x_k$ or on the nonlinear residuals $f(x_k)$) and show that in $\mathbb{R}$ they are equivalent to the error-based ones. Each $x_{k+1}$ can be seen as being obtained from $x_k$ by adding the correction $x_{k+1}-x_k$. In \cref{sec: iterative meth for nonlin eq} we deal with three main iterative methods (Newton, secant, and successive approximations), presenting results on their attainable convergence orders, as well as other connected aspects (asymptotic constants, estimation of the attraction balls, floating point arithmetic issues, brief history, etc.).

## $C$-, $Q$-, and $R$-Orders $p_0>1$ vs. Asymptotic Rates

Even though the roots of iterative methods trace back to the Babylonians and Egyptians (ca.~1800 B.C.)
, the first comment on convergence orders seems to have been made by Newton (ca.~1669) on the doubling of digits in quadratic convergence (quoted by Ypma in): "But if I desire to continue working merely to twice as many figures, less one, …," and then in 1675: "That is, the same Division, by wch you could finde the 6th decimal figure, if prosecuted, will give you all to the 11th decimal figure." In the Journal Book of the Royal Society it is recorded: "17 December 1690: Mr Ralphson's Book was this day produced by E Halley, wherein he gives a Notable Improvemt of ye method [...], which doubles the known figures of the Root known by each Operation…." Halley noticed the tripling of digits in the method he introduced. In his Tracts, Hutton showed that one of his schemes is of third order. In 1818, Fourier also noted that the iterates double the exact figures at each Newton step. In 1870, Schröder implicitly defined the high $C$-orders for some iterative methods by considering conditions on the nonlinear mapping.

Three types of convergence orders are used at present: the classical $C$-order (notation adopted from), the $Q$-order (which contains the $C$-order as a particular case), and the $R$-order. Outstanding contributions to their study were successively made by Ortega and Rheinboldt (1970) in their fundamental book, by Potra and Pták (1984), by Potra (1989), and finally by Beyer, Ebanks, and Qualls (1990), in the less known but essential paper. In we connected and completed these results (in $\mathbb{R}^N$). We analyze here only the high orders, noting that the various equivalent definitions of the $Q$-orders lead to intricate implications for linear convergence.

## $C$-Order $p_0>1$

The definition of the $C$- and $Q$-orders is obtained by considering for $p\geq1$ the \emph{quotient convergence factors} (sect.~9.1)
$$Q_p(k):=\frac{e_{k+1}}{(e_k)^p}=\frac{|x^*-x_{k+1}|}{|x^*-x_k|^p},\qquad k=0,1,\ldots$$
It is assumed that $x_k\neq x^*$, $k\geq0$.
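The quotient factors are conveniently evaluated in logarithmic form once the errors underflow in double precision; a minimal sketch (the helper `ln_Qp` is ours) for $e_k=2^{-2^k}$:

```python
import math

def ln_Qp(ln_errs, p):
    """Natural log of the quotient factors Q_p(k) = e_{k+1}/e_k^p,
    computed from ln(e_k) to avoid floating point underflow."""
    return [ln_errs[k + 1] - p * ln_errs[k] for k in range(len(ln_errs) - 1)]

ln_e = [-(2 ** k) * math.log(2) for k in range(1, 14)]  # e_k = 2^{-2^k}

Q2 = [math.exp(v) for v in ln_Qp(ln_e, 2)]  # Q_2(k) = 1 for every k
lnQ18 = ln_Qp(ln_e, 1.8)                    # ln Q_1.8(k) -> -inf, i.e., Q_1.8 -> 0
lnQ22 = ln_Qp(ln_e, 2.2)                    # ln Q_2.2(k) -> +inf, i.e., Q_2.2 -> +inf
```

The three behaviors (zero, finite nonzero, infinite) anticipate the $Q$-profile discussed below.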
If, though, $x_{k_0}=x^*$ in an iterative method, then (hopefully) we get $x_k=x^*$, $k\geq k_0$ (this holds for the three methods in \cref{sec: iterative meth for nonlin eq}).

Let us briefly list the four slowest types of $C$-order:

- no $C$-order, when $\nexists\,Q_1:=\lim_{k\to\infty}Q_1(k)$ (e.g., $x_k=\frac1k$, $k$ odd; $x_k=\frac2k$, $k$ even);
- $C$-sublinear, if $Q_1=1$ (e.g., $\{\frac1k\}$);
- $C$-linear, if $0<Q_1<1$ (e.g., $\{\frac1{2^k}\}$);
- $C$-superlinear (defined later, usually called (strict) $Q$-superlinear);

notice that $Q_1>1$ cannot hold, and that $\{x_k^a\}$, $\{x_k^c\}$, with no $C$-order, are in fact fast.

(a) (see p.~620) $C$-linear order implies strict monotone errors: $e_{k+1}<e_k$, $k\geq k_0$.
(b) $C$-sublinear $\nRightarrow$ monotone errors (e.g., $x_k=\frac1{k-2}$, $k$ even, $x_k=\frac1k$, $k$ odd, $k\geq3$).
(c) $\{x_k^a\}$, $\{x_k^c\}$, both nonmonotone, have no $C$-order.

If $\{\ddot x_k\}$ has $C$-sublinear and $\{\dot x_k\}$ $C$-linear order, show that $\{\dot x_k\}$ is [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} $\forall\alpha>1$] faster than $\{\ddot x_k\}$.

The following definition of high orders is well known; $p_0>1$ is implicitly assumed throughout this paper even if not explicitly mentioned. $\{x_k\}$ has $C$-order $p_0>1$ if
$$Q_{p_0}:=\lim_{k\to\infty}Q_{p_0}(k)\in(0,+\infty).$$
It is important to stress that $Q_{p_0}$ above cannot be zero or infinite; these two cases will arise when analyzing the $Q$-orders. By denoting (see sect.~9.1, p.~83)
$$\barbelow{Q}_p=\liminf_{k\to\infty}Q_p(k),\qquad\bar{Q}_p=\limsup_{k\to\infty}Q_p(k),\qquad p\geq1,$$
condition \cref{f. def C-order} is equivalent either to the relation $0<\barbelow{Q}_{p_0}=Q_{p_0}=\bar{Q}_{p_0}<\infty$ or to requiring that $\forall\varepsilon>0$, $\exists k_0\geq0$ such that
$$(Q_{p_0}-\varepsilon)\,e_k^{p_0}\leq e_{k+1}\leq(Q_{p_0}+\varepsilon)\,e_k^{p_0}\qquad\forall k\geq k_0.$$
\cref{f. def C-order} $\Rightarrow$ \cref{f. strict monotone errors} (by \cref{eq. C-order p0 <=> (Qp0-eps).ek^p0 <= ekp1 <= (Qp0+eps).ek^p0}, or because \cref{f. def C-order} is stronger than $C$-linear).
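A quick numerical illustration of these slowest types (the helper `q1` is ours):

```python
q1 = lambda e: [e[k + 1] / e[k] for k in range(len(e) - 1)]

sub = [1 / k for k in range(1, 200)]                    # {1/k}: Q_1 -> 1 (C-sublinear)
lin = [1 / 2 ** k for k in range(1, 60)]                # {1/2^k}: Q_1 = 1/2 (C-linear)
osc = [(1 if k % 2 else 2) / k for k in range(1, 200)]  # 1/k odd, 2/k even: no C-order

# the quotients of `osc` accumulate near both 2 and 1/2, so Q_1 does not exist
```

The third sequence converges essentially as fast as $\{1/k\}$, yet no $C$-order can be assigned to it.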
(a) $x_k=\theta^{p_0^k}$, for some $\theta\in(0,1)$, $p_0>1$, is the standard example of $C$-order $p_0$ ($Q_{p_0}=1$). The usual instances are the $C$-quadratic $\{2^{-2^k}\}$, $\{3^{-2^k}\}$ ($\theta=\frac12$, resp., $\theta=\frac13$, $p_0=2$), the $C$-cubic $\{2^{-3^k}\}$ ($\theta=\frac12$, $p_0=3$), etc.
(b) $\{c\cdot2^{-2^k}\}$ ($c\in\mathbb{R}$ given, $c\neq0$) also has $C$-quadratic convergence.
(c) The $C$-quadratic $\{(-1)^k\cdot2^{-2^k}\}$ is nonmonotone, but with monotone $\{e_k\}$.
(d) $x_k^d=\begin{cases}c\,(\frac13)^{2^k}, & k \text{ odd},\\[2pt] \frac1c\,(\frac13)^{2^k}, & k \text{ even},\end{cases}$ $c>1$, does not have $C$-order $2$: $\barbelow{Q}_2=\frac1{c^3}$, $\bar{Q}_2=c^3$.

[Ex.~5.6 and p.~245] What is the rate of $\{x_k\}$ if $\{e_k\}$ is (a) $10^{-2},10^{-3},10^{-4},10^{-5},\ldots$; (b) $10^{-2},10^{-4},10^{-6},10^{-8},\ldots$; (c) $10^{-2},10^{-3},10^{-5},10^{-8},10^{-13},10^{-21},\ldots$; (d) $10^{-2},10^{-4},10^{-8},10^{-16},\ldots$?

[see, e.g.,] If it exists, the $C$-order $p_0>1$ of $\{x_k\}$ is unique. Indeed, raising the power of the denominator in $Q_{p_0}(k)$ makes the quotient tend to infinity, since
$$Q_{p_0+\varepsilon}(k)=Q_{p_0}(k)\cdot\frac1{(e_k)^\varepsilon}\qquad\forall\varepsilon>0.$$
Similarly, lowering the power makes it tend to zero.

Let $\{10^{-5}\cdot2^{-2^k}\}$, $\{2^{-2^k/k}\}$, and $\{x_k^d\}$ ($c=10$); in \cref{fig: Qp(k) k=1-13 p=1.8 2 2.2} we plot $\{Q_{1.8}(k)\}$, $\{Q_2(k)\}$, and $\{Q_{2.2}(k)\}$ for each of them.

The limiting values $\barbelow{Q}_p$, $\bar{Q}_p$ from \cref{eq: def barbel Q & bar Q} may be regarded as "functions" of $p\geq1$. Their graphical illustration leads to the so-called $Q$-convergence profile of $\{x_k\}$ (or $Q$-profile here, for short). For all $C$-orders $p_0>1$ it holds that $\bar{Q}_p=\barbelow{Q}_p=Q_p$ $\forall p\geq1$, and $Q_p$ is a "function" given by $Q_p=0$, $p\in[1,p_0)$, $Q_{p_0}\in(0,+\infty)$, and $Q_p=+\infty$, $p>p_0$. In \cref{fig: C-order 2} we plot the $Q$-profile of $\{2^{-2^k}\}$.
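For $\{x_k^d\}$ the two subsequences of $Q_2(k)$ can be evaluated exactly in log form; a minimal sketch with $c=10$ (the helper name is ours):

```python
import math

c = 10.0
def ln_e(k):
    """ln of the errors of x^d: c*(1/3)^(2^k) for odd k, (1/c)*(1/3)^(2^k) for even k."""
    return (math.log(c) if k % 2 else -math.log(c)) - (2 ** k) * math.log(3)

Q2 = [math.exp(ln_e(k + 1) - 2 * ln_e(k)) for k in range(1, 12)]
# odd k give Q_2(k) = 1/c^3 = 1e-3, even k give Q_2(k) = c^3 = 1e3:
# liminf = 1/c^3 < limsup = c^3, so there is no C-order 2
```

The oscillation between the two constants is exactly what the corresponding figure displays.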
[] Show that $x_k\to x^*$ cannot have $C$-order $p_0<1$.

(a) Show that if $\{x_k\}$ has $C$-order $p_0>1$, then so has $\{\frac{e_{k+1}}{e_k}\}$, and $Q_{p_0}\{\frac{e_{k+1}}{e_k}\}=1$, regardless of $Q_{p_0}\{e_k\}\neq0$ (Rem.~3.9). Does the converse hold? (Hint: take $\{\frac1k\}$.)
(b) (Rem.~3.9) Find a similar statement for $\{e_{k+1}e_k\}$.
(c) (Kelley Ex.~5.7.15) If $\{x_k\}$ has $C$-order $p_0>1$, show that $\lg e_k$ is concave, i.e., $\lg e_{k+1}-\lg e_k$ is decreasing.

Not only does convergence with any high order exist, but even the infinite $C$-order, defined either by $Q_p=0$ $\forall p>1$ (take, e.g., $\{2^{-2^{2^k}}\}$ (E~9.1-3(g)) or $\{2^{-2^{k^2}}\}$) or, by convention, when the convergence is in a finite number of steps. "The higher the order, the better" is a well-known cliché, which we will express in \cref{subs: relating rates and orders} in terms of the big Ohs.

## $Q$-Order $p_0>1$

We propose the following definitions of $Q$-order convergence:

- no $Q$-order if $\bar{Q}_1=\infty$ (e.g., $\{x_k^a\}$, $\{x_k^c\}$);
- $Q$-sublinear if $1\leq\bar{Q}_1<+\infty$ (e.g., $x_k=\frac2k$, $k$ odd, $x_k=\frac1k$, $k$ even);
- exact $Q$-sublinear if $0<\barbelow{Q}_1\leq1\leq\bar{Q}_1<+\infty$;
- at least $Q$-linear if $\bar{Q}_1<1$;
- $Q$-linear if $0<\bar{Q}_1<1$;
- exact $Q$-linear if $0<\barbelow{Q}_1\leq\bar{Q}_1<1$.

(a) Obviously, when $x_k\to x^*$, then $\barbelow{Q}_1\leq1$ and $\barbelow{Q}_1\leq\bar{Q}_1$.
(b) $\bar{Q}_1\in[0,+\infty]$ always exists, while $Q_1$ may not (i.e., when $\barbelow{Q}_1<\bar{Q}_1$).
(c) $\bar{Q}_1<1$ $\Rightarrow$ monotone errors, while $\bar{Q}_1>1$ $\Rightarrow$ nonmonotone errors; $\bar{Q}_1=\infty$ means unbounded nonmonotone errors.

Unlike $0<Q_1\leq1$, $0<\bar{Q}_1\leq+\infty$ alone does not necessarily imply slow speed (e.g., $\{x_k^a\}$, $\{x_k^c\}$, with unbounded nonmonotone errors). The convergence may be fast when $\barbelow{Q}_1=0$, even if $\bar{Q}_1=\infty$ (e.g., $\{x_k^a\}$, $\{x_k^c\}$), but is no longer fast when $0<\barbelow{Q}_1$.

Strict $Q$-superlinear is an intermediate order between linear and $p_0>1$.
[p.~285] $\{x_k\}$ has $Q$-superlinear order (or at least $Q$-superlinear, or even superlinear, as $R$-superlinear is seldom encountered) if $\bar{Q}_1=0$ $(=Q_1)$:
$$\lim_{k\to\infty}\frac{e_{k+1}}{e_k}=0\qquad(\Leftrightarrow\ \exists\,c_k\to0\ \text{s.t.}\ e_{k+1}=c_k\,e_k).$$
Strict $Q$-superlinear order holds when, moreover, $\bar{Q}_p=+\infty$ $\forall p>1$.

[cf.~(37b)] $Q$-superlinear order holds if $\exists\theta_k>0$ and $c>0$ such that $\theta_k\to0$ and $e_k=c\prod_{i=1}^k\theta_i$ ($c$ may be taken to be $1$).

[strict $Q$-superlinear] (a) Let $\{\frac1{k^k}\}$ (p.~22), $\{\frac1{10^{k^2}}\}$, $\{\frac1{k!}\}$, $\{\frac1{c^{k^2}}\}$, $c>1$ (E~9.2-1(j)). (b) (E~10.1-4, p.~94) Given $0<c<\frac1e$, let $x_{k+1}=-x_k\ln x_k$, $k\geq0$, $x_0=c$.

$\{x_k^b\}$ has $Q$-superlinear order, but not strict $Q$-superlinear order.

The following formulation has been commonly used for half a century, since the 1970 book of Ortega and Rheinboldt: "$\{x_k\}$ converges with $Q$-order at least $p_0>1$." Here we deal, as in and, with a more restrictive notion, the classic notion corresponding to the lower $Q$-order $q_l$ from \cref{def: lower/upper Q-orders ql & qu}.

[see; cf.~,] $\{x_k\}$ has $Q$-order $p_0>1$ if any of the following equivalent conditions holds:
$$\lim_{k\to\infty}Q_p(k)=\begin{cases}0, & p\in[1,p_0),\\ +\infty, & p\in(p_0,+\infty);\end{cases}$$
$$\lim_{k\to\infty}Q_L(k)=p_0,\qquad Q_L(k):=\frac{\ln e_{k+1}}{\ln e_k};$$
$$\lim_{k\to\infty}Q_\Lambda(k)=p_0,\qquad Q_\Lambda(k):=\frac{\ln\frac{e_{k+2}}{e_{k+1}}}{\ln\frac{e_{k+1}}{e_k}};$$
or $\forall\varepsilon>0$, $\exists A,B>0$ such that
$$A\,e_k^{p_0+\varepsilon}\leq e_{k+1}\leq B\,e_k^{p_0-\varepsilon}\qquad\forall k\geq k_0.$$

(a) If the $Q$-order $p_0$ exists, it is unique (recall \cref{rem: C-order def conv order profile}).
(b) If $\{x_k\}$ has $Q$-order $p_0>1$, then the right-hand side inequalities in \cref{f. def Q_Ieps} imply that the errors are strictly monotone ($k\geq k_0$); see \cref{rem C-order p0>1 => ekp1 < ek,rem. C-linear => strict monot errs}.
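The two logarithm-based conditions are easy to monitor numerically; a minimal sketch for the $Q$-superquadratic errors $e_k=k\,2^{-2^k}$ (an example discussed below, which has $Q$-order $2$):

```python
import math

ln_e = [math.log(k) - (2 ** k) * math.log(2) for k in range(1, 14)]  # e_k = k*2^{-2^k}

QL = [ln_e[k + 1] / ln_e[k] for k in range(len(ln_e) - 1)]
QLam = [(ln_e[k + 2] - ln_e[k + 1]) / (ln_e[k + 1] - ln_e[k])
        for k in range(len(ln_e) - 2)]
# both Q_L(k) and Q_Lambda(k) tend to the Q-order p0 = 2
```

Note that no $C$-order 2 exists here ($Q_2(k)\to0$), yet both indicators settle at $2$ within a few steps.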
In we proved the equivalence of the four conditions above for $\{x_k\}\subset\mathbb{R}^N$, connecting and completing the independent, fundamental results of Potra and of Beyer, Ebanks, and Qualls. This proof does not simplify for $\{x_k\}\subset\mathbb{R}$, but a natural connection between these conditions is found, e.g., by taking logarithms in \cref{eq. C-order p0 <=> (Qp0-eps).ek^p0 <= ekp1 <= (Qp0+eps).ek^p0} to obtain \cref{f. def Q_L}.

The $p_0$ in \cref{f. def Q_L} roughly means that the number of exact figures is multiplied by $p_0$ at each step ($k\geq k_0$); see, e.g., and p.~91. Also, \cref{f. def Q_L} and \cref{f. def Q_Lam} can be connected by \cref{exe. xk C-order p0 => ekp1/ek C-order p0 and Qp0=1} and as follows. If $\{x_k\}$ has strict monotone errors for $k\geq k_0$, prove that \cref{f. def Q_Lam} $\Rightarrow$ \cref{f. def Q_L} (hint: use the Stolz–Cesàro theorem).

(a) One can always set $k_0=0$ in \cref{f. def Q_Ieps} (this is actually used in the definitions of Potra from), by a proper choice of a smaller $A$ and a larger $B$.
(b) In we denoted by $(Q_\varepsilon)$ a condition which is trivially equivalent to \cref{f. def Q-order}. Here we use $(Q_\varepsilon)$ instead of $(Q_{I,\varepsilon})$ from ("$I$" stood for "inequality" in that paper).
(c) \cref{f. def Q-order} implies that the $Q$-profile of $\{x_k\}$ has a single jump, at $p_0>1$, but the limit $Q_{p_0}$ is not required to exist, so $0\leq\barbelow{Q}_{p_0}\leq\bar{Q}_{p_0}\leq\infty$, and six cases result for the possible values of $\barbelow{Q}_{p_0}$ and $\bar{Q}_{p_0}$ ($0$, finite $>0$, $+\infty$).
(d) The relations $\barbelow{Q}_{p_0}=0$ or $\bar{Q}_{p_0}=+\infty$ might seem rather abstract, but they do actually occur (even simultaneously, as we shall see for the higher orders of the secant method; see also p.~252 and Chap.~7, where $\barbelow{Q}_2=\bar{Q}_2=+\infty$, for a problem in $\mathbb{R}^N$).
(e) $Q$-order $p_0>1$ implies $\bar{Q}_1=0$, but the converse is not true.
\cref{fig: Q-profiles Q-subquadratic and exact Q-quadratic} shows the $Q$-profiles of $\{2^{-2^k/k}\}$ and $\{x_k^d\}$ ($c=\frac54=1.25$).

Superlinearity may also be defined in conjunction with higher $Q$-orders, but this is tricky. [see ,] The $Q$-order $p_0>1$ is called $Q$-superorder $p_0$ if $\bar{Q}_{p_0}=0$ $(=Q_{p_0})$; $Q$-suborder $p_0$ if $\barbelow{Q}_{p_0}=+\infty$ $(=Q_{p_0})$. In this was defined by $\bar{Q}_{p_0}=+\infty$, which we believe is not strong enough. The $Q$-order $2$ with $\bar{Q}_2=0$ is $Q$-superquadratic (analogously, $Q$-supercubic, etc.).

(a) The $Q$-superquadratic $\{\dot x_k\}=\{2^{-k2^k}\}$, the $Q$- (and $C$-)quadratic $\{2^{-2^k}\}$, and the $Q$-subquadratic $\{\mathring x_k\}=\{2^{-2^k/k}\}$ are in [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} $\forall\alpha>1$] decreasing order of speed. Study the $Q$-order of $z_k=\begin{cases}\dot x_k, & k \text{ odd},\\ \mathring x_k, & k \text{ even}\end{cases}$ ($2$ is erroneous in).
(b) Although the $C$-quadratic $\{2^{-2^k}\}$ is \cref{eq: |x*-xk| = o(|x*-yk|)} faster than $\{k\,2^{-2^k}\}$, the latter, considered in, is actually $Q$-superquadratic; similarly, $\{2^{-2^k}/k\}$ is $Q$-subquadratic. The terminology "$Q$-sub-/superquadratic" appears here to be in disagreement with the real speed.
(c) Determine the $Q$-orders from \cref{quiz: the rate of 10-1 10-2 10-3} by using \cref{f. def Q_L}.

Let $\{10\cdot2^{-3^{(k-1/4^k)}}\}$ ($C$-cubic), $\{10\cdot3^{-2^{(k+1/3^k)}}\}$ ($C$-quadratic), and $\{\frac1{k^k}\}$ ($Q$-superlinear); in \cref{fig: Q_l(k) Q_Lam(k-1)} we see that $Q_L(k)$ and $Q_\Lambda(k-1)$ tend to $3$, $2$, and $1$, respectively. Plotting/comparing $Q_L(k)$ and $Q_\Lambda(k-1)$ avoids conclusions based on $Q_\Lambda(k)$ using information ahead of that from $Q_L(k)$ (i.e., $x_{k+2}$ vs.\ $x_{k+1}$).

The calculation of the order may be made easier by using logarithms. Let $x_k\to x^*\in\mathbb{R}$.
(a) If, for some given $q\geq1$ and $A,B>0$, one has
$$A\,e_k\,e_{k-1}^q\leq e_{k+1}\leq B\,e_k\,e_{k-1}^q,\qquad k\geq k_0,$$
show that $\{x_k\}$ has $Q$-order $\lambda_q:=\frac{1+\sqrt{1+4q}}{2}$. Thus, $\lambda_1\approx1.618$ (the golden ratio), $\lambda_2=2$, $\lambda_3=\frac{1+\sqrt{13}}{2}\approx2.3$, etc. Inequality \cref{f. Aekekm1^p <= ekp1 <= Bekekm1^p} appears in the analysis of the secant method (see \cref{th. secant method – standard convergence,th: secant method Raydan higher orders}), and it does not necessarily attract $C$- or even exact $Q$-orders, even if $\exists\lim_{k\to\infty}\frac{e_{k+1}}{e_k e_{k-1}^q}=c\in(0,\infty)$.
(b) Calculate the $Q$-order of $\{x_k\}$ if it satisfies (see ,)
$$A\,e_k^2\,e_{k-1}\leq e_{k+1}\leq B\,e_k^2\,e_{k-1}.$$
(c) If $\{x_k\}$ verifies the general relation (see, e.g., Chap.~3, , , ,)
$$A\,e_k^{\alpha_0}\cdots e_{k-q}^{\alpha_q}\leq e_{k+1}\leq B\,e_k^{\alpha_0}\cdots e_{k-q}^{\alpha_q},$$
determine the condition verified by the $Q$-order (see for the exact $Q$-order).

A special case holds when $\varepsilon=0$ in \cref{f. def Q_Ieps} (see also , , , ,). [see, e.g.,] $\{x_k\}$ has exact $Q$-order $p_0$ if $\exists A,B,k_0\geq0$ s.t.
$$A\cdot e_k^{p_0}\leq e_{k+1}\leq B\cdot e_k^{p_0}\qquad\forall k\geq k_0.$$
This case can be characterized in terms of the asymptotic constants $\barbelow{Q}_{p_0}$, $\bar{Q}_{p_0}$: $\{x_k\}$ has exact $Q$-order $p_0$ iff $\barbelow{Q}_{p_0}$, $\bar{Q}_{p_0}$ are finite and nonzero, $0<\barbelow{Q}_{p_0}\leq\bar{Q}_{p_0}<\infty$, in which case the constants $A$, $B$ in \cref{eq: exact Q-order Aek^p < ekp1 < Bek^p} are bounded by $A\leq\barbelow{Q}_{p_0}\leq\bar{Q}_{p_0}\leq B$; these bounds are attained only in some special, restrictive circumstances.

(a) $\{x_k^d\}$ in \cref{ex. C-order examples} has exact $Q$-order $2$: $\barbelow{Q}_2=\frac1{c^3}\neq\bar{Q}_2=c^3$.
(b) $x_k=2^{-2^k}$, $k$ odd, $x_k=3\cdot2^{-2^k}+3^{-3^k}$, $k$ even, has $Q_2(2k)<\barbelow{Q}_2=\frac19$, i.e., $A<\barbelow{Q}_2$.
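The claimed order $\lambda_q$ can be checked by simulating the boundary case $e_{k+1}=e_k\,e_{k-1}^q$ in log space and estimating the order via $Q_L$; a minimal sketch (the helper names and starting errors are ours):

```python
import math

def estimated_order(q, steps=40):
    """Iterate ln e_{k+1} = ln e_k + q * ln e_{k-1} and return Q_L = ln e_{k+1}/ln e_k."""
    ln_e = [math.log(0.5), math.log(0.4)]
    for _ in range(steps):
        ln_e.append(ln_e[-1] + q * ln_e[-2])
    return ln_e[-1] / ln_e[-2]

lam = lambda q: (1 + math.sqrt(1 + 4 * q)) / 2
# estimated_order(1) -> golden ratio 1.618..., estimated_order(2) -> 2,
# estimated_order(3) -> (1 + sqrt(13))/2 = 2.302...
```

The ratio $r_k=\ln e_{k+1}/\ln e_k$ satisfies $r_{k+1}=1+q/r_k$, whose fixed point is exactly $\lambda_q$, so the iteration settles quickly.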
The sequences $\{\ddot e_k\}$ and $\{\mathring e_k\}$ are asymptotically similar when $\ddot e_k=\mathcal{O}(\mathring e_k)$ and $\mathring e_k=\mathcal{O}(\ddot e_k)$, as $k\to\infty$, denoted by $\mathring e_k=\Theta(\ddot e_k)$ (see, e.g., , p.~50). The exact $Q$-order may also be expressed as $e_{k+1}=\mathcal{O}(e_k^{p_0})$ and $e_k^{p_0}=\mathcal{O}(e_{k+1})$, as $k\to\infty$ (see , p.~2, and also), i.e., $e_{k+1}=\Theta(e_k^{p_0})$.

The classical $Q$-order of Ortega and Rheinboldt is the $q_l$ below. [, , ,] The lower/upper $Q$-orders of $\{x_k\}$ are
$$q_l=\begin{cases}\infty, & \text{if }\bar{Q}_p=0\ \forall p\geq1,\\ \inf\{p\in[1,\infty):\bar{Q}_p=+\infty\}, & \text{otherwise},\end{cases}\qquad\text{resp.,}\qquad q_u=\sup\{p\in[1,\infty):\barbelow{Q}_p=0\}.$$
When $\{x_k\}$ has $Q$-order $p_0>1$, the lower and upper orders coincide, $q_l=q_u=p_0$; otherwise, $q_l<q_u$; we will also see this in \cref{th. Cp0 => Qp0 => Rp0} (relation \cref{f. q_l <= r_l <= r_u <= q_u}).

Next we analyze $\{x_k^b\}$ from \cref{ex. xk w err betw ord 2 3}, which has no $Q$-order, despite being \cref{eq: |x*-xk| <= |x*-yk|} faster than $\{\frac1{2^{2^k}}\}$. We keep in mind that the $Q$-order does not measure the overall speed, but just compares the consecutive errors.

[] Let $x_k^b=\begin{cases}2^{-2^k}, & k \text{ even},\\ 3^{-2^k}, & k \text{ odd},\end{cases}$ with the $Q$-profile in \cref{fig: no Q-order}. $\{x_k^b\}$ does not have a $Q$-order, as $q_l=\log_34=1.2\ldots<q_u=4\log_43=3.1\ldots$, but it still has (exact) $R$-order $2$ (see \cref{def: R-order by four equiv conds,def. exact R-order}). We note that the usual statement in this case, "$\{x_k^b\}$ converges with $Q$-order at least $q_l=1.2\ldots$ and with $R$-order at least $2$," no longer holds in the setting of this paper.

Show that $\{x_k^c\}$ has $\barbelow{Q}_1=0$, $\bar{Q}_1=+\infty$ and determine $q_l$, $q_u$ (here we find $q_l<1$, and the $Q$-profile can be extended, as in, for $p<1$). Determine the $Q$-profiles of $\{x_k^a\}$ and $\{x_k^c\}$.
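For $\{x_k^b\}$ the values $q_l$ and $q_u$ can be read off numerically from the two subsequences of $Q_L(k)$; a minimal sketch (the helper name is ours):

```python
import math

def ln_e(k):  # errors of x^b: 2^{-2^k} for even k, 3^{-2^k} for odd k
    return -(2 ** k) * (math.log(2) if k % 2 == 0 else math.log(3))

QL = [ln_e(k + 1) / ln_e(k) for k in range(1, 20)]
q_l, q_u = min(QL), max(QL)
# q_l = log_3(4) = 1.26..., q_u = 4*log_4(3) = 2*log_2(3) = 3.16...: no Q-order
```

The consecutive quotients $Q_L(k)$ alternate between the two constants, so liminf and limsup never meet.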
The inequalities implied by the lower/upper $Q$-orders are as follows. If $1<q_l<\infty$, then $\forall\varepsilon>0$ with $1<q_l-\varepsilon$, $\exists B>0$, $k_0\geq0$, s.t.
$$e_{k+1}\leq B\cdot e_k^{q_l-\varepsilon}\qquad\forall k\geq k_0$$
(and relation \cref{eq: def ql} says that $q_l$ is the largest value satisfying the above inequalities). When $1<q_u<\infty$, then $\forall\varepsilon>0$, $\exists A>0$, $k_1\geq0$, such that
$$A\cdot e_k^{q_u+\varepsilon}\leq e_{k+1}\qquad\forall k\geq k_1.$$
One can say that the lower $Q$-order is small when the maximum error reduction w.r.t.\ the previous step is small (i.e., a small exponent $q_l-\varepsilon$ for $e_k$). Analogously, the upper order is large when the minimum error reduction per step is small.

The exact lower/upper $Q$-orders $q_l$, respectively, $q_u$ are defined if $\varepsilon=0$ in \cref{f. Q-order def lower ql ekp1<=Bek^ql-ep}–\cref{f. Q-order def upper qu Aek^qu+ep <= ekp1}:
$$A\cdot e_k^{q_u}\leq e_{k+1}\leq B\cdot e_k^{q_l}\qquad\forall k\geq k_2.$$
The role of $q_u$ will be seen in \cref{ex. Jay xk = 2^-2^k k even 2^-3^k k odd}. The lower and upper $Q$-orders verify the relations (see ,)
$$q_l=\barbelow{Q}_L:=\liminf_{k\to\infty}\frac{\ln e_{k+1}}{\ln e_k},\qquad q_u=\bar{Q}_L:=\limsup_{k\to\infty}\frac{\ln e_{k+1}}{\ln e_k},$$
which will be completed in formulae \cref{f. Q_L=Q'_L=q_l and ^}, respectively, \cref{f. barbelow Q_q_u = barbelow Q_q_u^prime} by some computational variants.

## $R$-Order $p_0>1$

The root factors consider some averaged quantities instead of relating the consecutive terms to one another; they are defined for $p=1$ by
$$R_1(k)=e_k^{\frac1k},\qquad k\geq1.$$
We propose the following terminology (see also , , , ,):

- $R$-sublinear/with no $R$-order if $\bar{R}_1=1$ (e.g., $\{\frac1k\}$, $\{\frac1{\sqrt k}\}$; see Rem.~1.2.40);
- $R$-linear if $0<\bar{R}_1<1$;
- at least $R$-linear if $\bar{R}_1<1$ (e.g., $x_k=\frac1{2^k}$, $k$ odd, $x_k=\frac1{4^k}$, $k$ even (Rem.~1.2.40), or $x_k=\frac1{2^k}$, $k$ even, $x_k=\frac1{2^{2k}}$, $k$ odd);
- exact $R$-linear if $0<\barbelow{R}_1\leq\bar{R}_1<1$;
- (at least) $R$-superlinear if $\bar{R}_1=0$ $(=R_1)$ (e.g., $x_k=\frac1{k^k}$, $k$ even, $x_k=x_{k-1}$, $k$ odd).

The $R$-orders are alternatively defined by requiring the errors to be bounded by sequences converging to zero with corresponding $Q$-orders (p.~21, Def.~2.1.3, p.~218):

- at least $R$-linear if $\exists\theta\in(0,1)$ and $c>0$ s.t. $e_k\leq c\cdot\theta^k$ ($k\geq0$) (37a);
- $R$-superlinear if $\exists\theta_k\to0$ and $c>0$ s.t. $e_k\leq c\prod_{i=1}^k\theta_i$ (37b).

(a) At least $Q$-linear $\Rightarrow$ at least $R$-linear (E~9.3-5, Thm.~1.2.41, , , p.~213) (using, e.g., \cref{f. ineq liminf akp1/ak liminf ak^1/k etc}); the converse is false (see above or).
(b) $Q$-superlinear $\Rightarrow$ $R$-superlinear, with the converse again being false (see, e.g., the above example).

When $p>1$, the root factors are defined by (see , , ,)
$$R_p(k)=e_k^{\frac1{p^k}},\qquad k\geq0.$$

[see; cf.~,] $\{x_k\}$ has $R$-order $p_0>1$ if any of the following equivalent relations holds (in order that relation \cref{f. def R_Lam} be properly defined and equivalent to the rest, an additional assumption is required (see): $1<q_l\leq q_u<+\infty$):
$$\lim_{k\to\infty}R_p(k)=\begin{cases}0, & p\in[1,p_0),\\ 1, & p\in(p_0,+\infty);\end{cases}$$
$$\lim_{k\to\infty}R_L(k)=p_0,\qquad R_L(k):=\big|\ln e_k\big|^{\frac1k};$$
$$\lim_{k\to\infty}R_\Lambda(k)=p_0,\qquad R_\Lambda(k):=\Big|\ln\frac{e_{k+1}}{e_k}\Big|^{\frac1k};$$
or $\forall\varepsilon>0$, $\exists A,B>0$, $0<\eta,\theta<1$, and $k_0\geq0$ such that
$$A\cdot\eta^{(p_0+\varepsilon)^k}\leq e_k\leq B\cdot\theta^{(p_0-\varepsilon)^k}\qquad\forall k\geq k_0.$$

(a) If the $R$-order $p_0$ exists, it is unique (see also 9.2.3, p.~289).
(b) We can always set $k_0=0$ in \cref{f. def R_eps}, by choosing a smaller $A$ and a larger $B$ suitably (cf.~, where $k_0=0$ was used in the definitions).
(c) We prefer here the notation $(R_\varepsilon)$ instead of $(R_{I,\varepsilon})$ from.

For $p\geq1$ one defines the asymptotic quantities (see sect.~9.2 and p.~85)
$$\barbelow{R}_p=\liminf_{k\to\infty}R_p(k),\qquad\bar{R}_p=\limsup_{k\to\infty}R_p(k)\leq1.$$
An example of an $R$-profile is shown in \cref{fig: conv profile exact R-order 2 for xak} for $\{x_k^b\}$, with (exact) $R$-order $2$.

[p.~290,] If $\bar{R}_{p_0}<1$, then $\forall\varepsilon>0$ with $\bar{R}_{p_0}+\varepsilon<1$, $\exists k_0\geq0$ s.t.
$$e_k\leq(\bar{R}_{p_0}+\varepsilon)^{p_0^k}\qquad\forall k\geq k_0,$$
while if $\barbelow{R}_{p_0}>0$, then $\forall\varepsilon>0$ with $0<\barbelow{R}_{p_0}-\varepsilon$, $\exists k_1\geq0$ s.t.
$$(\barbelow{R}_{p_0}-\varepsilon)^{p_0^k}\leq e_k\qquad\forall k\geq k_1.$$

The following notion is defined similarly to the exact $Q$-order. [] $\{x_k\}$ has exact $R$-order $p_0$ if $\exists A,B>0$, $0<\eta,\theta<1$, s.t.
$$A\cdot\eta^{p_0^k}\leq e_k\leq B\cdot\theta^{p_0^k}\qquad\forall k\geq k_0.$$

$\{x_k^b\}$, $\{x_k^c\}$ have exact $R$-order $2$: $\barbelow{R}_2=\frac13$, $\bar{R}_2=\frac12$, and $\barbelow{R}_2=\frac15$, $\bar{R}_2=\frac12$, respectively. The $Q$-sub-/superquadratic $\{2^{-2^k}/k\}$, $\{k\,2^{-2^k}\}$ have $R_2=\frac12$, but not exact $Q$-order $2$. The $Q$-sub-/superquadratic $\{2^{-2^k/k}\}$, $\{2^{-k2^k}\}$ have $R_2=1$, respectively, $R_2=0$, i.e., they are $R$-sub-/superquadratic, again without having exact $R$-order $2$. Perhaps a proper definition of $Q$-superorder $p_0$ would be $Q$-order $p_0$ with $Q_{p_0}=0=R_{p_0}$, while $Q$-suborder $p_0$ would be $Q$-order $p_0$ with $Q_{p_0}=\infty$, $R_{p_0}=1$.

We characterize the exact $R$-order in the following result.
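The root-factor quantities for $\{x_k^b\}$ can be verified directly; a minimal sketch computing $R_2(k)=e_k^{1/2^k}$ and $R_L(k)=|\ln e_k|^{1/k}$ (the helper name is ours):

```python
import math

def ln_e(k):  # errors of x^b: 2^{-2^k} for even k, 3^{-2^k} for odd k
    return -(2 ** k) * (math.log(2) if k % 2 == 0 else math.log(3))

R2 = [math.exp(ln_e(k) / 2 ** k) for k in range(1, 30)]  # 1/2 (even k), 1/3 (odd k)
RL = [abs(ln_e(k)) ** (1 / k) for k in range(1, 40)]     # R_L(k) -> 2

# liminf R_2 = 1/3 and limsup R_2 = 1/2: exact R-order 2, although no Q-order exists
```

The averaging effect of the root factors is visible: the oscillation that destroys the $Q$-order leaves $R_L(k)$ converging to $2$.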
The exact $R$-order $p_0$ of $\{x_k\}$ is equivalently defined by $0<\barbelow{R}_{p_0}\leq\bar{R}_{p_0}<1$, which implies the following bounds for $\eta$, $\theta$ from \cref{eq: def exact R-order A eta^p^k < ek < B theta^p^k}: $\eta\leq\barbelow{R}_{p_0}\leq\bar{R}_{p_0}\leq\theta$; these bounds are attained only in some special, restrictive circumstances.

A particular instance of \cref{eq: def exact R-order A eta^p^k < ek < B theta^p^k}, i.e., $e_k\leq B\cdot\theta^{2^k}$, was considered as a definition of (at least) $R$-quadratic convergence, and some computational scientists (who we suspect are not computational optimizers) have misled by simply calling it "quadratic convergence," leading to the confusion noted in.

As the $R$-orders require weaker conditions than the $Q$-orders, they may allow nonmonotone errors. This aspect is perhaps not widely known; we have found it pointed out in p.~620, in the 1st ed., p.~14, p.~51, and. Clearly, iterative methods with nonmonotone errors are usually not desired. We can easily find conditions for monotone errors in the case of exact $R$-order $p_0>1$.

[monotone errors in exact $R$-orders] If $\{x_k\}$ obeys \cref{eq: def exact R-order A eta^p^k < ek < B theta^p^k} and
$$p_0>\frac{\ln\eta}{\ln\theta}\ \Big(\geq\frac{\ln\bar{R}_{p_0}}{\ln\barbelow{R}_{p_0}}\Big)\qquad\text{or}\qquad\Big(p_0=\frac{\ln\eta}{\ln\theta}\ \text{and}\ B<A\Big),$$
then it has strict monotone errors ($k\geq k_0$).

The lower and upper $R$-orders, $r_l$, respectively, $r_u$, and further notations from are
$$r_l=\inf\{p\in[1,\infty):\bar{R}_p=1\},\qquad r_u=\sup\{p\in[1,\infty):\barbelow{R}_p=0\},$$
$$\barbelow{R}_L:=\liminf_{k\to\infty}|\ln e_k|^{\frac1k},\qquad\bar{R}_L:=\limsup_{k\to\infty}|\ln e_k|^{\frac1k},$$
$$\barbelow{R}_\Lambda:=\liminf_{k\to\infty}\Big|\ln\frac{e_{k+1}}{e_k}\Big|^{\frac1k},\qquad\bar{R}_\Lambda:=\limsup_{k\to\infty}\Big|\ln\frac{e_{k+1}}{e_k}\Big|^{\frac1k}.$$
The lower $R$-order $r_l$ of $\{x_k\}$ was also defined as follows: $\exists\{\ddot x_k\}$ with $C$-order $\ddot p_0=r_l$ s.t. $\{x_k\}$ is \cref{eq: |x*-xk| <= |x*-yk|} faster than $\{\ddot x_k\}$ (, , Def.~4.1.3).
We now consider some sequences similar to those in \cref{ex. xk w err betw ord 2 3}.

(a) Let $\ddot x_k=2^{-2^k}$ and take $\dot x_k=\begin{cases}2^{-2^k}, & k \text{ even},\\ 2^{-3^k}, & k \text{ odd}\end{cases}$ (Jay). Then $\{\dot x_k\}$ converges \cref{eq: |x*-xk| <= |x*-yk|} faster than $\{\ddot x_k\}$ and, though $\{\dot x_k\}$ has neither $C$-order, nor $Q$-order, nor $R$-order ($\dot r_l=2$, $\dot r_u=3$), $\{\ddot x_k\}$ has $C$-order $2$.
(b) Extending the above behavior, $\dot x_k=\begin{cases}2^{-4^k}, & k \text{ even},\\ 2^{-5^k}, & k \text{ odd}\end{cases}$ has no $C$-, $Q$-, or $R$-order, but it converges [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} $\forall\alpha>1$] faster than $\ddot x_k=2^{-2^k}$.
(c) $\dot x_k=\begin{cases}2^{-3^{2^k}}, & k \text{ even},\\ 2^{-4^{2^k}}, & k \text{ odd}\end{cases}$ is [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} $\forall\alpha>1$] faster than $\ddot x_k=2^{-2^{2^k}}$; $\{\dot x_k\}$ has no $C$- or $Q$-order (but has infinite $R$-order), while $\{\ddot x_k\}$ has infinite $C$-order.

Here $\{\dot x_k\}$ deserves a closer look, as it is a perfect candidate to support the comments from: indeed, it attains infinite $R$-order, but its errors are unbounded nonmonotone. Nocedal and Wright (p.~620) also noted such possible behavior. Clearly, statements such as those in the above example can appear only for sequences $\{\dot x_k\}$ with $\barbelow{Q}_1=0$, as this condition allows large upper orders $q_u$. When $\barbelow{Q}_1>0$, the corresponding sequence cannot converge quickly.

## Relations between the $C$-, $Q$-, and $R$-Orders of a Sequence

$C$-order $p_0>1$ of $\{x_k\}$ implies $Q$-order $p_0$, which in turn implies $R$-order $p_0$ (see \cref{exe. xk has Q-order 2 but no C-order,ex. xk = 2^-2^k k even =3^-2^k k odd} for the converses). We state this below and, as in, use curly braces for equivalent orders.

[see; cf.~,] Let $x_k\to x^*$ and $p_0>1$. Then (see footnote)
$$\{C,C_Q,C_\varepsilon\}\Rightarrow\{Q,Q_L,Q_\Lambda,Q_\varepsilon\}\Rightarrow\{R,R_L,R_\Lambda,R_\varepsilon\},$$
the converse implications being false, or, in a more generic fashion (i.e., $\{C\}:=\{C,C_Q,C_\varepsilon\}$), $\{C\}\Rightarrow\{Q\}\Rightarrow\{R\}$.
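Jay's example (a) can be probed numerically: the two subsequences of $R_L(k)$ approach the lower and upper $R$-orders $\dot r_l=2$ and $\dot r_u=3$; a minimal sketch (the helper name is ours):

```python
import math

def ln_e(k):  # Jay's sequence: 2^{-2^k} for even k, 2^{-3^k} for odd k
    return -((2 ** k) if k % 2 == 0 else (3 ** k)) * math.log(2)

RL = [abs(ln_e(k)) ** (1 / k) for k in range(1, 60)]
r_l = min(RL[-10:])  # tail minimum, approaching r_l = 2 (even indices)
r_u = max(RL[-10:])  # tail maximum, approaching r_u = 3 (odd indices)
```

Since $r_l<r_u$, no $R$-order exists, even though every error of the sequence is below the corresponding error of the $C$-quadratic $\{2^{-2^k}\}$.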
Moreover, the following relation holds for the lower and upper orders:
$$q_l=\barbelow{Q}_L\leq\barbelow{R}_L=\barbelow{R}_\Lambda=r_l\leq r_u=\bar{R}_L=\bar{R}_\Lambda\leq\bar{Q}_L=q_u.$$
Relation \cref{f. q_l <= r_l <= r_u <= q_u} is obtained from the well-known inequalities for positive numbers,
$$\liminf_{k\to\infty}\frac{a_{k+1}}{a_k}\leq\liminf_{k\to\infty}|a_k|^{\frac1k}\leq\limsup_{k\to\infty}|a_k|^{\frac1k}\leq\limsup_{k\to\infty}\frac{a_{k+1}}{a_k},$$
taking $a_k=|\ln e_k|$; see and.

Any inequality from \cref{f. q_l <= r_l <= r_u <= q_u} may be strict. Now we see (e.g., in \cref{ex. xk = 2^-2^k k even =3^-2^k k odd,ex Potra-Ptak xk w r-order large and q-order small}) that the inner inequality may be an equality (i.e., an $R$-order is obtained), while one of the outer inequalities may be strict (i.e., no $Q$-order).

An $\{x_k\}$ with exact $R$-order $\tau>1$ arbitrarily large and with $q_l>1$ (but arbitrarily close to $1$) is again suitable for justifying the comments from. [E~9.3-3,] Given any numbers $1<s<\tau$, take $0<\theta<1$, $\eta=\theta^q$ with $q>1$ such that $qs>\tau$. Then
$$x_k=\begin{cases}\theta^{\tau^k}, & k \text{ odd},\\ \eta^{\tau^k}, & k \text{ even}\end{cases}$$
has exact $R$-order $\tau$, while $q_l=\barbelow{Q}_L=\frac{\tau}{q}$ and $q_u=\bar{Q}_L=\tau q$ ($>\tau$), and thus it has no $Q$-order (in the classical sense from, $\{x_k\}$ has $Q$-order at least $\frac{\tau}{q}$).

## Comparing the Speeds of Two Sequences

We consider only $C$-orders. [higher $C$-order, faster speed] If $\{\ddot x_k\}$, $\{\dot x_k\}$ have $C$-orders $1<\ddot p_0<\dot p_0$, then $\{\dot x_k\}$ is [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} $\forall\alpha>1$] faster than $\{\ddot x_k\}$. For the proof, one may use \cref{eq. C-order p0 <=> (Qp0-eps).ek^p0 <= ekp1 <= (Qp0+eps).ek^p0}. In we show some simplified proofs.

(a) \cref{comp: C-order xk < yk} holds regardless of the magnitude of $\dot p_0-\ddot p_0$.
(b) The $C$-orders (i.e., sublinear, linear, strict superlinear, $1<\ddot p_0<\dot p_0$, and infinite) form classes of [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} $\forall\alpha>1$] increasing speed (cf.~\cref{exe. C-sublinear is ek<=Oek^a faster than C-linear}).
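The chain of inequalities can be visualized on $\{x_k^b\}$ by taking $a_k=|\ln e_k|$: the ratio test produces the $Q$-type bounds and the root test the $R$-type ones; a minimal sketch (the helper name is ours):

```python
import math

def a(k):  # a_k = |ln e_k| for x^b: 2^{-2^k} (k even), 3^{-2^k} (k odd)
    return (2 ** k) * (math.log(2) if k % 2 == 0 else math.log(3))

ratios = [a(k + 1) / a(k) for k in range(1, 30)]  # liminf = q_l, limsup = q_u
roots = [a(k) ** (1 / k) for k in range(1, 30)]   # liminf = limsup -> 2 (= r_l = r_u)

# q_l = log_3(4) <= r_l = r_u = 2 <= q_u = 4*log_4(3): strict outer inequalities
```

This is precisely the mechanism behind the theorem: the averaged (root) quantities agree even when the consecutive (ratio) quantities oscillate.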
As seen in the previous examples, comparing by [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} $$\forall \alpha>1$$] is much stronger than comparing by [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))}, $$\alpha \in (1, \alpha_0]$$] for some given $$\alpha_0 > 1$$.

How similarly do two sequences with the same C-order and identical $$Q_{p_0}$$ behave? $$\{1/3^{2^k}\}$$ is [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))}, $$\alpha \in (1, \frac{\ln 3}{\ln 2}]$$] faster than $$\{1/2^{2^k}\}$$ (recall \cref{exe. 1/3^2^k = O((1/2^2^k)^1+alpha)}). This shows that if $$\{x_k\}$$, $$\{\dot{x}_k\}$$ have C-order $$p_0$$ with the same $$Q_{p_0}$$, this does not necessarily mean that (in the asymptotic range) $$\{\dot{e}_k\} \approx \{e_k\}$$. Brezinski noted that both $$\{1/k\}$$ and $$\{1/k^2\}$$ have the same $$Q_1 = 1$$ (i.e., C-sublinear order), but quite different speeds.

Assessing the asymptotic constant $$Q_{p_0}$$ may not make sense when $$\{x_k\} \subset \mathbb{R}^N$$, $$N \geq 2$$, as its definition is norm-dependent [E 9.1-2].

## Computational Versions of the Convergence Orders

The error-based analysis of the orders above is similar in some ways to the local convergence theory for iterative methods from \cref{sec: iterative meth for nonlin eq}: both assume the existence of the limit/solution $$x^\ast$$ and infer essential properties, even if $$x^\ast$$ is not known and in practice one needs quantities based solely on the information available at each step $$k$$.

Next, we analyze two practical approaches, equivalent to the error-based analysis: replacing the errors $$|x^\ast - x_k|$$ by (the absolute value of) either the corrections $$s_k := |x_{k+1} - x_k|$$ or the nonlinear residuals $$|f_k| := |f(x_k)|$$. We keep the notation from previous work and thus obtain a rather theoretical setting in this analysis (e.g., $$s_k$$ requires $$x_{k+1}$$); in the numerical examples, however, we will use only information available at step $$k$$ (i.e., $$x_k$$, $$|f_k|$$, $$s_{k-1}$$, $$x_{k-1}$$, $$|f_{k-1}|$$, $$s_{k-2}$$, etc.).
## Computational Convergence Orders Based on Corrections

When the corrections $$\{s_k\}$$ converge with lower Q-order $$q_l$$, then $$\{x_k\}$$ converges too, attaining at least lower R-order $$q_l$$.

[see Thm. 1.2.42, Lem. 4.5.6] Let $$\{x_k\} \subset \mathbb{R}$$. If $$\exists\, c \in (0,1)$$ and $$k_0 \in \mathbb{N}$$ with
\begin{equation}
|x_{k+1} - x_k| \leq c \cdot |x_k - x_{k-1}| \quad (\text{i.e., } s_k \leq c \cdot s_{k-1}), \qquad k \geq k_0 + 1,
\end{equation}
then $$\exists\, x^\ast \in \mathbb{R}$$ such that $$x_k \rightarrow x^\ast$$ at least R-linearly. If $$\exists\, c > 0$$, $$p_0 > 1$$, and $$k_0 \in \mathbb{N}$$ s.t. $$c^{\frac{1}{p_0 - 1}} s_{k_0} < 1$$ and
\begin{equation}
s_k \leq c \cdot s_{k-1}^{p_0}, \qquad k \geq k_0 + 1,
\end{equation}
then $$\exists\, x^\ast \in \mathbb{R}$$ such that $$x_k \rightarrow x^\ast$$ with lower R-order at least $$p_0$$.

The errors and the corrections are tightly connected when the convergence is fast.

[Potra–Pták–Walker lemma; Prop. 6.4] $$x_k \rightarrow x^\ast$$ Q-superlinearly iff $$x_{k+1} - x_k \rightarrow 0$$ Q-superlinearly. In the case of Q-superlinear convergence, it holds (the Dennis–Moré lemma) that
\begin{equation}
\lim_{k\rightarrow\infty} \frac{|x_{k+1} - x_k|}{|x^\ast - x_k|} = 1.
\end{equation}

(a) As pointed out by Dennis and Moré, \cref{eq: lim sk/ek = 1 Dennis-More} alone does not imply Q-superlinear convergence: take, for example, $$x_{2k-1} = \frac{1}{k!}$$, $$x_{2k} = 2 x_{2k-1}$$, $$k \geq 1$$. (b) While the statement and the proofs of this result consider multidimensional spaces and norms (or even metrics) in \cref{eq: lim sk/ek = 1 Dennis-More}, Brezinski has noted that in $$\mathbb{R}$$ the sufficiency can also be proved by using l'Hôpital's rule for sequences.

As in the cited work, we mark the resulting quotient factors with a prime ($$p > 1$$):
\begin{equation}
Q'_p(k) := \frac{s_{k+1}}{s_k^p} = \frac{|x_{k+2} - x_{k+1}|}{|x_{k+1} - x_k|^p}, \qquad k \geq 0,
\end{equation}
and the above lemma shows that C-order $$p_0$$ $$\Rightarrow$$ C$$'$$-order $$p_0$$, with $$Q_{p_0} = Q'_{p_0}$$. The same statement holds for the Q-orders and the corresponding upper/lower limits of $$Q_{p_0}(k)$$.
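The counterexample in remark (a) is easy to verify numerically. The following sketch (ours, not from the source) takes $$x^\ast = 0$$ and checks that $$s_k/e_k \rightarrow 1$$, while the ratios $$e_{k+1}/e_k$$ keep returning to the value 2, so the convergence is not Q-superlinear:

```python
import math

# x_{2k-1} = 1/k!, x_{2k} = 2/k! (k >= 1); the limit is x* = 0, so e_k = |x_k|
xs = []
for k in range(1, 12):
    xs.append(1.0 / math.factorial(k))
    xs.append(2.0 / math.factorial(k))

ratios_sk_ek = [abs(xs[i + 1] - xs[i]) / abs(xs[i]) for i in range(len(xs) - 1)]
q1 = [abs(xs[i + 1]) / abs(xs[i]) for i in range(len(xs) - 1)]
print(ratios_sk_ek[-4:])  # the ratios s_k / e_k approach 1
print(max(q1))            # e_{k+1}/e_k takes the value 2 infinitely often
```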
The equivalence of the error-based and the computational orders ($$\{C, C'\}$$, etc.) is stated in \cref{subsec: Equiv of conv orders and computational versions}, which incorporates the nonlinear residuals as well.

The C$$'$$-, Q$$'$$-, and Q$$'_\varepsilon$$-orders are semicomputational: they do not use $$x^\ast$$, but they still require $$p_0$$. Instead, of much practical interest are the (fully) computational expressions
\begin{equation}
\lim_{k\rightarrow\infty} Q'_L(k) = p_0, \qquad Q'_L(k) := \frac{\ln s_{k+1}}{\ln s_k},
\end{equation}
\begin{equation}
\lim_{k\rightarrow\infty} Q'_\Lambda(k) = p_0, \qquad Q'_\Lambda(k) := \frac{\ln \frac{s_{k+2}}{s_{k+1}}}{\ln \frac{s_{k+1}}{s_k}},
\end{equation}
as they are equivalent to the corresponding error-based orders (see \cref{cor. equiv of all C C' C'' resp Q and R-orders}).

## Computational Convergence Orders Based on Nonlinear Residuals

In \cref{sec: iterative meth for nonlin eq} we study the speed of some iterative methods toward a zero $$x^\ast$$ of $$f$$. The nonlinear mapping $$f : D \subseteq \mathbb{R} \rightarrow \mathbb{R}$$ is assumed to be sufficiently many times differentiable, and the solution is assumed simple: $$f'(x^\ast) \neq 0$$ (this means that $$f(x) = (x - x^\ast)\, g(x)$$ with $$g(x^\ast) \neq 0$$). This terminology is used for equations in $$\mathbb{R}$$, while for systems in $$\mathbb{R}^N$$ the terminology is "nonsingular," as (the Jacobian) $$F'(x^\ast)$$ is a matrix. However, we will consider multiple solutions as well.

The use of the nonlinear residuals for controlling the convergence to $$x^\ast$$ is natural, as they can be seen as "proportional" to the errors: by the Lagrange (mean value) theorem,
\begin{equation}
f(x) = f'\big(x + \theta(x^\ast - x)\big)(x - x^\ast) \quad \text{for some } \theta \in (0,1),
\end{equation}
which in the applied sciences is usually written as
\begin{equation}
f(x) \approx f'(x^\ast)(x - x^\ast) \qquad (x \approx x^\ast).
\end{equation}
However, instead of an asymptotic relation, this is often regarded as an approximate equality.
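As an illustration of the fully computational estimate (our sketch, not from the source), $$Q'_\Lambda(k)$$ can be evaluated along the Newton iterates for $$f(x) = x + x^2$$, a function used later in this survey, for which the C-order is 2; the estimates indeed approach 2:

```python
import math

def newton_seq(x0: float, steps: int = 10):
    """Newton iterates for f(x) = x + x^2 (root x* = 0)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        if x == 0.0:
            break
        xs.append(x - (x + x * x) / (1.0 + 2.0 * x))
    return xs

xs = newton_seq(0.46)
s = [abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]   # corrections s_k
QLam = [math.log(s[k + 2] / s[k + 1]) / math.log(s[k + 1] / s[k])
        for k in range(len(s) - 2)]
print(QLam)  # the estimates approach the C-order 2
```

Note that no error $$|x^\ast - x_k|$$ appears here: only the corrections are used, which is exactly what makes $$Q'_\Lambda$$ practical.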
Such an approach is as precise as the idiomatic "the higher the order, the fewer the iterations (needed)": it may not refer to the asymptotic range. If $$f$$ is a polynomial of degree 2 with $$f'(x^\ast) = 1$$ at the root $$x^\ast = 0$$, how large can $$|f(x)|$$ be when $$|x - x^\ast| = 0.0001$$ (and then when $$|x - x^\ast| = 10^{-16}$$)?

Returning to the quotient factors, we use, as in the cited work, double primes,
\begin{equation}
Q''_{p_0}(k) := \frac{|f(x_{k+1})|}{|f(x_k)|^{p_0}} =: \frac{|f_{k+1}|}{|f_k|^{p_0}},
\end{equation}
and notice that if $$Q_{p_0} \in (0, +\infty)$$, then, by \cref{f. Lagrange f(x) approx f'(x*)(x-x*)},
\begin{equation}
Q''_{p_0}(k) \rightarrow \frac{Q_{p_0}}{|f'(x^\ast)|^{p_0 - 1}}.
\end{equation}
This leads us to the C$$''$$- and Q$$''$$-orders $$p_0 > 1$$.

## The Equivalence of the Error-Based and Computational Orders

This equivalence was given in $$\mathbb{R}^N$$. Here we formulate it as a corollary, since in $$\mathbb{R}$$ the three C-orders are equivalent (see Ex. 3.22 for $$\{C''\} \nLeftarrow \{C, C'\}$$ in $$\mathbb{R}^N$$).

Let $$f$$ be smooth, $$x^\ast$$ simple, $$x_k \rightarrow x^\ast$$, and $$p_0 > 1$$. Then
\begin{equation}
\{C, C', C''\} \;\overset{\Rightarrow}{\nLeftarrow}\; \{Q, Q', Q''\} \;\overset{\Rightarrow}{\nLeftarrow}\; \{R, R', R''\}.
\end{equation}
Moreover, the lower/upper orders and the asymptotic constants obey corresponding equalities. Consequently, one has C-order $$p_0 > 1$$ iff
\begin{equation}
0 < Q_{p_0} = Q'_{p_0} = |f'(x^\ast)|^{p_0 - 1}\, Q''_{p_0} < \infty,
\end{equation}
and an analogous characterization holds for the Q-order $$p_0 > 1$$.

Relation \cref{f. Q_L=Q'_L=q_l and ^} shows the equalities of the orders, while \cref{f. barbelow Q_q_u = barbelow Q_q_u^prime} shows those of the constants. The equivalence of the computational orders based on the corrections follows from the Potra–Pták–Walker lemma, while \cref{f. Q_L=Q'_L=q_l and ^} and \cref{f. barbelow Q_q_u = barbelow Q_q_u^prime} follow from the Dennis–Moré lemma. The equivalence of the computational orders based on the nonlinear residuals is obtained using the Lagrange theorem, by \cref{f. Lagrange f(x) approx f'(x*)(x-x*)}.
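The exercise about the degree-2 polynomial above can be explored numerically; in the sketch below (ours), we take $$f(x) = x + c x^2$$, so that $$f'(0) = 1$$, and observe that for the fixed error $$|x - x^\ast| = 10^{-4}$$ the residual $$|f(x)|$$ grows without bound as $$|c|$$ grows:

```python
x = 1e-4   # the error |x - x*|, with x* = 0
for c in [1.0, 1e4, 1e8, 1e12]:
    f = x + c * x * x      # f(x) = x + c x^2, with f'(0) = 1
    print(c, f)
# for c = 1e12 the residual is about 1e4, although the error is only 1e-4
```

Thus a small error does not by itself imply a small residual; the constant hidden in the "proportionality" matters outside the asymptotic range.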
## Iterative Methods for Nonlinear Equations

Unless otherwise stated, the nonlinear mapping $$f : D \subseteq \mathbb{R} \rightarrow \mathbb{R}$$ is sufficiently smooth and the solution $$x^\ast$$ is simple. Since an equation $$f(x) = 0$$ might have a unique solution, several (even infinitely many) solutions, or none at all, iterative methods usually need an initial approximation close to an $$x^\ast$$.

[Def. 10.1.1] A method converges locally to $$x^\ast$$ if there exists a neighborhood $$V \subseteq D$$ of $$x^\ast$$ s.t. for all $$x_0 \in V$$, the generated iterates $$\{x_k\} \subset D$$ and $$x_k \rightarrow x^\ast$$.

## The Newton Method

The first result on the local convergence of a method was given for the Newton iterations (see NR 10.2-1, sect. 6.4),
\begin{equation}
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \qquad k = 0, 1, \ldots, \quad x_0 \in D,
\end{equation}
and it is due to Cauchy in 1829 (the paper is available on the Internet). These iterations are also known as the tangent method, though their initial devising did not rely on this geometric interpretation; the first such interpretation was given by Mourraille (1768) [sect. 6.3].

## History

This method can be traced back to ancient times (Babylon and Egypt, ca. 1800 B.C.), as it is conjectured that it was used in the computation (with five correct, equivalent, decimal digits) of $$\sqrt{2}$$ (see sect. 1.7, with the references therein, and also p. 39 for an annotated picture). While this is not certain, there is no doubt that Heron used these iterations for $$\sqrt{720}$$. "Heron of Alexandria (first century A.D.) seems to have been the first to explicitly propose an iterative method of approximation in which each new value is used to obtain the next value," as noted by Chabert et al. (p. 200). Indeed, Heron approximated $$\sqrt{720}$$ in an iterative fashion by elements corresponding to the Newton method for solving $$x^2 - 720 = 0$$ [sect. 7.1]. We can also recognize the Newton method for $$x^2 - A = 0$$ as used by Theon of Alexandria (ca. 370 A.D.) in approximating $$\sqrt{4500}$$, based entirely on geometric considerations (originating from Ptolemy); see sect. 7.2.
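Heron's procedure coincides with the Newton iteration for $$x^2 - 720 = 0$$, i.e., $$x_{k+1} = (x_k + 720/x_k)/2$$. A quick sketch (ours; the starting guess 27 is just convenient, since $$27^2 = 729$$):

```python
x = 27.0                       # starting guess: 27^2 = 729 is close to 720
for _ in range(5):
    x = (x + 720.0 / x) / 2.0  # Newton step for f(x) = x^2 - 720
print(x, x * x)                # x is sqrt(720) to machine precision
```

The quadratic convergence is visible here: a handful of steps already exhausts double precision.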
So efficient is this method in computing two of the basic floating point arithmetic operations, the square root and the division (see \cref{exe: Newton iterates for sqrt and division}), that it is still a choice even in current (2021) codes (see, e.g., pp. 138, 124).

Let us recall some general comments on the subsequent history of this method, thoroughly analyzed by Ypma:

> A method algebraically equivalent to Newton's method was known to the 12th century algebraist Sharaf al-Dīn al-Ṭūsī [...], and the 15th century Arabic mathematician Al-Kāshī used a form of it in solving $$x^p - N = 0$$ to find roots of $$N$$. In western Europe a similar method was used by Henry Briggs in his Trigonometria Britannica, published in 1633, though Newton appears to have been unaware of this [...].

In solving nonlinear problems using such iterates, Newton ($$\approx 1669$$) and subsequently Raphson (1690) dealt only with polynomial equations; $$x^3 - 2x - 5 = 0$$ is "the classical equation where the Newton method is applied." Actually, as noted by Ypma, Newton also considered such iterations in solving a transcendental equation, namely, the Kepler equation $$x - e \sin x = M$$. "However, given the geometrical obscurity of the argument, it seems unlikely that this passage exerted any influence on the historical development of the Newton-Raphson technique in general."

Newton considered the process of successively computing the corrections, which are ultimately added together to form the final approximation; see \cref{alg. devised by Newton}.

The algorithm devised by Newton (left), and his first example (right):

\STATE{Let $f(x)$, $x_0$ \hfill [$f(x)=x^3 -2x -5$, $x_0=2$]}
\STATE{Let $g_0(s)=f(x_0+s)$ \hfill [$g_0(s)=(2+s)^3 - 2(2+s) - 5 = s^3 + 6s^2 + 10s - 1$]}
\STATE{For $k=0:m-1$}
\STATE{\qquad Compute $\Tilde{g}_k(s)$ (the order-one approx. of $g_k(s)$) \hfill [$\Tilde{g}_0(s) = 10s - 1$]}
\STATE{\qquad Solve for $s_k$ in $\Tilde{g}_k(s)=0$ \hfill [$s_0=0.1$]}
\STATE{\qquad Let $g_{k+1}(s)=g_k(s_k+s)$ \hfill [$g_1(s)=g_0(s_0+s)=s^3 + 6.3s^2 + 11.23s + 0.061$]}
\STATE{$x_n:=x_0 + \sum_{k=0}^{n-1} s_k$ \hfill [$x_n = 2 + 0.1 - \frac{0.061}{11.23}+\cdots + s_{n-1}$]}

Raphson considered approximations updated at each step, a process equivalent to \cref{f. Newton method}. However, the derivatives of $$f$$ (which could have been calculated with the "fluxions" of that time) do not appear in their formulae, only the first-order approximations from the finite series development of polynomials (using the binomial formula). For more than a century, the belief was that these two variants represented two different methods: "Nearly all eighteenth century writers and most of the early writers of the nineteenth century carefully discriminated between the method of Newton and that of Raphson" (Cajori). Note also that, for a given input ($$n$$ fixed), the result of \cref{alg. devised by Newton} may depend on how accurately the different fractions (e.g., $$\frac{0.061}{11.23}$$) are computed. It was Stewart (1745) who pointed out that they are in fact equivalent. As noted by Steihaug,

> It took an additional 50 years before it was generally accepted that the methods of Raphson and Newton were identical methods, but implemented differently. Joseph Lagrange in 1798 derives the Newton-Raphson method [...] and writes that the Newton's method and Raphson's method are the same but presented differently and Raphson's method is plus simple que celle de Newton.
Simpson (1740) was the first to apply the method to transcendental equations, using "fluxions." The fluxions $$\dot{x}$$ did not represent $$f'$$, but are "essentially equivalent to $$dx/dt$$; implicit differentiation is used to obtain $$dy/dt$$, subsequently dividing through by $$dx/dt$$ as instructed produces the derivative $$A = dy/dx$$ of the function. [...] Thus Simpson's instructions closely resemble, and are mathematically equivalent to, the use of [\cref{f. Newton method}]. The formulation of the method using the now familiar $$f'(x)$$ calculus notation of [\cref{f. Newton method}] was published by Lagrange in 1798 [...]."

Even more important, Simpson extended the method to the solving of nonlinear systems of two equations. The first nonlinear system considered was
\begin{equation}
y + y^2 - x^2 - 10 = 0, \qquad x + y^2 + x - 12 = 0,
\end{equation}
with $$(x_0, y_0) = (5, 6)$$. This was subsequently generalized to the usual form known today: $$F(x) = 0$$ with $$F : \mathbb{R}^N \rightarrow \mathbb{R}^N$$, for which, denoting the Jacobian of $$F$$ at $$x_k$$ by $$F'(x_k)$$, one has to solve for $$s_k$$ at each step the linear system
\begin{equation}
F'(x_k)\, s = -F(x_k).
\end{equation}

Cajori noted:

> Then appear writers like Euler, Laplace, Lacroix and Legendre, who explain the Newton-Raphson process, but use no names. Finally, in a publication of Budan in 1807, in those of Fourier of 1818 and 1831, in that of Dandelin in 1826, the Newton-Raphson method is attributed to Newton. The immense popularity of Fourier's writings led to the universal adoption of the misnomer "Newton's method" for the Newton-Raphson process.

While some authors use "the Newton–Raphson method," the question "Who was '–Raphson'?" (N. Bićanić and K. H. Johnson, 1979) has the popular answer "Raphson was Newton's programmer" (S. Nash, 1992).
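For a system, each Newton step solves the linear system $$F'(x_k)s = -F(x_k)$$ and updates $$x_{k+1} = x_k + s_k$$. A minimal sketch (ours), using the two-equation system and the starting point quoted above, with the 2-by-2 linear system solved by Cramer's rule:

```python
def F(x, y):
    """The historical two-equation system quoted in the text."""
    return (y + y**2 - x**2 - 10.0,
            x + y**2 + x - 12.0)

def newton_step(x, y):
    # Jacobian entries of F at (x, y)
    a, b = -2.0 * x, 1.0 + 2.0 * y
    c, d = 2.0, 2.0 * y
    f1, f2 = F(x, y)
    det = a * d - b * c
    # Cramer's rule for F'(x_k) s = -F(x_k)
    s1 = (-f1 * d + f2 * b) / det
    s2 = (-a * f2 + c * f1) / det
    return x + s1, y + s2

x, y = 5.0, 6.0          # (x_0, y_0) = (5, 6)
for _ in range(20):
    x, y = newton_step(x, y)
print(x, y, F(x, y))     # the residuals F(x, y) vanish at the limit
```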
The name of Simpson, though appropriate ("[...] it would seem that the Newton–Raphson–Simpson method is a designation more nearly representing the facts of history in reference to this method" (Ypma); "I found no source which credited Simpson as being an inventor of the method. None the less, one is driven to conclude that neither Raphson, Halley nor anyone else prior to Simpson applied fluxions to an iterative approximation technique" (Kollerstrom)), is only occasionally encountered. In a few cases "Newton–Kantorovich" appears, but it refers to the mid-1900s contributions of Kantorovich to semilocal convergence results, when $$F$$ is defined on normed spaces.

## Attainable C-Orders

The method attains at least C-order 2 for simple solutions. Regarding rigorous convergence results, "Fourier appears to have been the first to approach this question in a note entitled Question d'analyse algébrique (1818) [...], but an error was found in his formula," as noted by Chabert et al. [sect. 6.4], who continue by noting that

> Cauchy studied the subject from 1821 onwards [...], but did not give a satisfactory formulation until 1829. [...] The concern for clarity and rigour, which is found in all of Cauchy's work, tidies up the question of the convergence of Newton's method for the present.

[cf. Thm. II, p. 184] If $$x^\ast$$ is simple and $$f''(x^\ast) \neq 0$$, then the Newton method converges locally, with C-quadratic order:
\begin{equation}
\lim_{k\rightarrow\infty} \frac{|x^\ast - x_{k+1}|}{|x^\ast - x_k|^2} = \left|\frac{f''(x^\ast)}{2 f'(x^\ast)}\right| =: Q_2.
\end{equation}

The usual proofs for \cref{f. Newton method C2=|f''/2f'| for x* simple} consider the Taylor developments of $$f(x_k)$$ and $$f'(x_k)$$, assuming small errors and neglecting the "higher-order terms":
\begin{equation}
\ell_{k+1} = \ell_k - \frac{f(x^\ast) + f'(x^\ast)\ell_k + \cdots}{f'(x^\ast) + f''(x^\ast)\ell_k + \cdots} \approx \frac{f''(x^\ast)}{2 f'(x^\ast)}\, \ell_k^2 \qquad (\ell_k := x_k - x^\ast, \text{ i.e., signed}).
\end{equation}

(a) Write the Newton iterates for computing $$\sqrt{2}$$ (cf. p. 138). (b) (Newton everywhere, sect. 17.9) Apply \cref{f. Newton method} to compute the division $$a/b$$ of real numbers in terms of only the operations $$+, -, \ast$$ (cf. p. 124).

(a) The usual perception is that the smaller $$Q_2$$, the faster the asymptotic speed; expression \cref{f. Newton method C2=|f''/2f'| for x* simple} shows that for the Newton method this is realized with smaller $$|f''|$$ and larger $$|f'|$$, which means that near $$x^\ast$$ the graph of $$f$$ resembles a line ($$|f''|$$ small) forming a large angle with $$Ox$$ (the line is not close to $$Ox$$). In the opposite situation, when the angle is zero ($$x^\ast$$ is multiple), the convergence is only linear. While this interpretation holds locally, in the asymptotic range of the Newton iterates, it is worth mentioning that the larger the region with such a property, the larger the attraction set (see also \cref{f. Newton meth estimation radius attr ball}) and the better conditioned the problem of solving $$f(x) = 0$$ (see \cref{subsec: Newton method floating point arithmetic}). We note that taking $$2f(x)$$ instead of $$f(x)$$, despite doubling $$f'(x)$$, does not reduce $$Q_2$$, leading in fact to the same iterates (this property is called affine invariance). (b) Of course, it would be of major interest to state a conclusion like "the smaller $$Q_2$$, the fewer the iterates needed." However, even if supported by a number of numerical examples, such a statement should be treated with great care.
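A possible solution sketch (ours) for the exercise above on $$\sqrt{2}$$ and division: for (a), Newton applied to $$f(x) = x^2 - 2$$ gives $$x_{k+1} = (x_k + 2/x_k)/2$$; for (b), applying \cref{f. Newton method} to $$f(x) = 1/x - b$$ gives $$x_{k+1} = x_k(2 - b x_k)$$, which uses only $$+$$, $$-$$, $$\ast$$, after which $$a/b = a \cdot x$$:

```python
# (a) Newton for f(x) = x^2 - 2: iterates converge to sqrt(2)
x = 1.5
for _ in range(5):
    x = (x + 2.0 / x) / 2.0
print(x)            # ~1.41421356...

# (b) division a/b via f(x) = 1/x - b: the step x(2 - b*x) needs only +, -, *
a, b = 3.0, 7.0
y = 0.1             # initial guess for 1/b; must satisfy 0 < y < 2/b
for _ in range(8):
    y = y * (2.0 - b * y)
print(a * y)        # ~0.42857142..., i.e., a/b
```

In (b), the relative error obeys $$1 - b y_{k+1} = (1 - b y_k)^2$$, so the reciprocal is also obtained with quadratic speed; this is why such iterations are used in floating point division units.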
The reason is simple: we do not compare similar objects ($$Q_2$$ is an asymptotic constant, referring to $$k \rightarrow \infty$$, while the desired conclusion refers to a finite number of steps). Moreover, when we say "the smaller $$Q_2$$," we in fact consider different mappings $$f$$; even if they have the same $$x^\ast$$, the attraction sets and the dynamics of the iterates may be different. The role of the asymptotic constants will be analyzed in forthcoming work.

Let us analyze the expectation that this method behaves similarly for the same $$Q_2$$.

(a) Let $$a \in \mathbb{R}$$ be given and
\begin{equation}
f(x) := x + x^2 + a x^3 = 0, \qquad \text{with } x^\ast = 0.
\end{equation}
We get $$f'(0) = 1$$ and $$f''(0) = 2$$, so by \cref{f. Newton method C2=|f''/2f'| for x* simple}, $$Q_2 = 1$$ (for any given $$a \in \mathbb{R}$$). The full Taylor series gives the exact relation for the signed errors from \cref{f. Newton method ekp1=-f''/f' ek^2}:
\begin{equation}
\ell_{k+1} = \frac{1 + 2 a \ell_k}{1 + 2 \ell_k + 3 a \ell_k^2}\, \ell_k^2.
\end{equation}
Take $$x_0 = 0.0001 \,(= e_0)$$ fixed. By \cref{f. Newton method ekp1=-f''/f' ek^2}, we should obtain $$|\ell_1| = e_1 \approx e_0^2 = 10^{-8}$$ for any $$a > 0$$. In \cref{tab. errors in one Newton step for a} we see that this does not hold when $$a$$ takes large values. As $$|a|$$ grows, the error $$e_0$$, despite being "small," is in fact not "squared," as suggested by \cref{f. Newton method ekp1=-f''/f' ek^2}. Indeed, when $$|a|$$ is very large, the terms in \cref{f. Newton method ekp1=-f''/f' ek^2} with larger errors (corresponding to $$f'''(x) = 6a$$) are actually neglected, despite their having higher powers. (b) Taking $$x_0 = 0.0001$$ for $$f$$ as above and then for $$f(x) = x + 2x^2$$ shows that $$Q_2$$ and the number of iterates needed (for the same accuracy) are not in direct proportion.

$$e_1 = |\ell_1|$$ computed in two equivalent ways, for different values of $$a$$ (Float64, Julia):
| $$a$$ | $$\ell_1 \,(= x_1)$$ | $$\ell_1$$ by \cref{f. ekp by frac of full Taylor series} |
|---|---|---|
| $$1$$ | 9.999999700077396e-9 | 9.999999700059996e-9 |
| $$10$$ | 1.0017993395925492e-8 | 1.0017993395922797e-8 |
| $$10^5$$ | 2.0933014354066327e-7 | 2.093301435406699e-7 |
| $$10^6$$ | 1.9510774606872468e-6 | 1.951077460687245e-6 |
| $$10^7$$ | 1.5389940009229362e-5 | 1.5389940009229348e-5 |
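The two columns above can be reproduced directly; the sketch below is in Python rather than the Julia used for the table, but both work in the same IEEE double precision:

```python
def newton_step(x, a):
    """One Newton step for f(x) = x + x^2 + a x^3 (root x* = 0)."""
    return x - (x + x * x + a * x**3) / (1.0 + 2.0 * x + 3.0 * a * x * x)

def error_formula(l, a):
    """Signed-error relation l_{k+1} = (1 + 2 a l)/(1 + 2 l + 3 a l^2) * l^2."""
    return (1.0 + 2.0 * a * l) / (1.0 + 2.0 * l + 3.0 * a * l * l) * l * l

x0 = 0.0001
for a in [1.0, 10.0, 1e5, 1e6, 1e7]:
    print(a, newton_step(x0, a), error_formula(x0, a))
```

The two ways agree to roundoff, and both show the departure of $$e_1$$ from $$e_0^2 = 10^{-8}$$ as $$a$$ grows.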
Igarashi and Ypma noticed that a smaller $$Q_2$$ does not necessarily attract fewer iterates for attaining a given accuracy. However, the numerical examples used to support this claim were based not on revealing the individual asymptotic regime of the sequences, but on the number of iterates required to attain a given (Float64) accuracy.

If $$f''(x^\ast) = 0$$, then \cref{f. Newton method C2=|f''/2f'| for x* simple} implies Q-superquadratic order, but actually the order is higher: it is given by the first index $$\geq 2$$ of the nonzero derivative at the solution.

[see, e.g.,] Let $$x^\ast$$ be simple. The Newton method converges locally with C-order $$p_0 \in \mathbb{N}$$, $$p_0 \geq 2$$, iff
\begin{equation}
f^{(p_0)}(x^\ast) \neq 0 \quad \text{and} \quad f''(x^\ast) = \cdots = f^{(p_0 - 1)}(x^\ast) = 0,
\end{equation}
in which case
\begin{equation}
Q_{p_0} = \frac{p_0 - 1}{p_0!} \left| \frac{f^{(p_0)}(x^\ast)}{f'(x^\ast)} \right|.
\end{equation}

## Examples for the Attained C-Orders

We consider decreasing orders.

(a) *Infinite C-order.* If $$f(x) = ax + b$$, $$a \neq 0$$, then $$x_1 = x^\ast$$ for all $$x_0 \in \mathbb{R}$$.

(b) *Arbitrary natural C-order $$p_0 \geq 2$$* (see \cref{rem: real order p0>=2 for Newton method} for $$p_0 \notin \mathbb{N}$$). Let $$p \geq 2$$ be a given natural number and consider
\begin{equation}
f(x) = x + x^p, \qquad x^\ast = 0, \quad p = 2, 3, 4.
\end{equation}
By \cref{th: Dubeau-Gnang characterize C-conv orders p>2 Newton method}, we obtain local convergence with C-order $$p$$ and $$Q_p = p - 1$$.

Let us first deal with $$p = 2$$. In double precision, for $$x_0 = 0.46$$ we obtain six nonzero iterates (shown truncated in \cref{tab. Newton iterates double prec}). In analyzing $$Q_L(k-1)$$ (shown with the last six decimal digits truncated), we note that $$Q_L(4)$$ is a better approximation than the last one, $$Q_L(5)$$. This phenomenon may appear even if we increase the precision (see the accompanying figure). When $$x_0 = 0.47$$ and $$0.45$$ we obtain five nonzero iterates (see \cref{exe. five Newton iterates for x0=0.47 and 0.45}), and $$Q_L(k)$$ is increasingly better.
Newton iterates for $$x + x^2 = 0$$ (Float64, Julia):

| $$k$$ | $$x_k$$ ($$x_0=0.47$$) | $$Q_L(k-1)$$ | $$x_k$$ ($$x_0=0.46$$) | $$Q_L(k-1)$$ | $$x_k$$ ($$x_0=0.45$$) | $$Q_L(k-1)$$ |
|---|---|---|---|---|---|---|
| 0 | 0.47 | | 0.46 | | 0.45 | |
| 1 | 1.1e-1 | 2.877706159 | 1.1e-01 | 2.840052802 | 1.0e-1 | 2.803816781 |
| 2 | 1.0e-2 | 2.094428776 | 9.9e-03 | 2.090320979 | 9.3e-3 | 2.086305525 |
| 3 | 1.0e-4 | 2.004592994 | 9.7e-05 | 2.004275304 | 8.6e-5 | 2.003972042 |
| 4 | 1.1e-8 | 2.000023942 | 9.4e-09 | 2.000021019 | 7.4e-9 | 2.000018386 |
| 5 | 1.4e-16 | 2.000000001 | 8.8e-17 | 2.000000001 | 5.4e-17 | 2.000000001 |
| 6 | 0 | | 1.2e-32 | 1.987981962 | 0 | |
| 7 | 0 | | 0 | | 0 | |
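The table can be reproduced with a few lines; the sketch below is in Python rather than the Julia used for the table, but both run in IEEE double precision, so the floating point effects near the end of the sequence (including the degraded final estimate $$Q_L(5)$$) appear there as well:

```python
import math

def newton_table(x0, steps=8):
    """Newton iterates x_k for f(x) = x + x^2 and Q_L(k-1) = ln e_k / ln e_{k-1}."""
    rows, x = [], x0
    for _ in range(steps):
        rows.append(x)
        if x == 0.0:
            break
        x = x - (x + x * x) / (1.0 + 2.0 * x)
    QL = [math.log(rows[k]) / math.log(rows[k - 1])
          for k in range(1, len(rows)) if rows[k] > 0.0]
    return rows, QL

rows, QL = newton_table(0.46)
print(QL)  # 2.84005..., 2.09032..., ..., then a worse value at the last step
```

Since $$x^\ast = 0$$, the errors are simply $$e_k = x_k$$ here, so $$Q_L$$ can be formed directly from the iterates.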