Abstract
The high speed of \(x_{k}\rightarrow x^\ast\in{\mathbb R}\) is usually measured using the C-, Q- or R-orders:
\begin{equation}\tag{$C$}
\lim \frac {|x^\ast - x_{k+1}|}{|x^\ast - x_k|^{p_0}}\in(0,+\infty),
\end{equation}
\begin{equation}\tag{$Q$}
\lim \frac {\ln |x^\ast - x_{k+1}|}{\ln |x^\ast - x_k|} =q_0,
\qquad \mbox{resp.}
\end{equation}
\begin{equation}\tag{$R$}
\lim \big\vert\ln |x^\ast - x_k| \big \vert ^\frac1k =r_0.
\end{equation}
By connecting them to the natural, term-by-term comparison of the errors of two sequences, we find that the C-orders — including (sub)linear — are in agreement. Weird relations may appear though for the Q-orders: we expect
\[
|x^\ast - x_k|=\mathcal{O}(|x^\ast - y_k|^\alpha), \qquad\forall\alpha>1,
\]
to imply “\(\geq\)” for the Q-orders of \(\{x_{k}\}\) vs. \(\{y_{k}\}\); the contrary is shown by an example providing no vs. infinite Q-orders. The R-orders seem even worse: an \(\{x_{k}\}\) with infinite R-order may have unbounded nonmonotone errors: \(\limsup |x^\ast - x_{k+1}|/|x^\ast - x_k| = +\infty\).
Such aspects motivate the study of equivalent definitions, computational variants, etc.
The orders are also the perspective from which we analyze the three basic iterative methods for nonlinear equations. The Newton method, widely known for its quadratic convergence, may in fact attain any C-order from \([1,+\infty]\) (including sublinear); we briefly recall such convergence results, along with connected aspects (historical notes, known asymptotic constants, floating point arithmetic issues, radius of attraction balls, etc.) and provide examples.
This approach leads to similar results for the successive approximations method, while the secant method exhibits different behavior: it may not have high C-orders, but only Q-orders.
Authors
E. Catinas
Keywords
(computational) convergence orders, iterative methods, Newton method, secant method, successive approximations, asymptotic rates.
Freely available at the publisher: https://doi.org/10.1137/19M1244858
Paper coordinates:
E. Catinas, How many steps still left to x*?, SIAM Rev., 63 (2021) no. 3, pp. 585–624, doi: 10.1137/19M1244858
About this paper
Print ISSN
0036-1445
Online ISSN
1095-7200
Github
Some small codes in LaTeX and Julia have been posted on GitHub to illustrate some aspects.
Python versions will be posted soon.
The link is: https://github.com/ecatinas/conv-ord
Corrigendum
(introduced on February 10, 2023, modified on February 25, 2023)
- for the sequences with \(\bar{Q}_1=\infty\) we have proposed the terminology “sequences with unbounded nonmonotone errors” (see Remark 2.12.c). At the beginning of Section 2.2 these are also termed “sequences with no Q-order,” which is not proper, as we have subsequently found sequences with \(\bar{Q}_1=\infty\) that do have Q-order \(1\);
- in Remark 2.50.b, “the strict superlinear” should be removed. Indeed, this is not a C-order but a Q-order, as it is defined with the aid of \(\bar{Q}_p\). On the other hand, the strict superlinear order is not always faster than the C-order \(p>1\).
Typos:
- in the abstract, the condition \(|x^\ast - x_{k+1}|/|x^\ast - x_{k}| \rightarrow \infty\) (i.e., \(Q_1=\infty\)), for the sequence with unbounded nonmonotone errors, should instead read \(\limsup |x^\ast - x_{k+1}|/|x^\ast - x_{k}| = \infty\) (i.e., \(\bar{Q}_1=\infty\)); obviously, the condition \(Q_1=\infty\) cannot be fulfilled by a convergent sequence. The matter is correctly dealt with inside the paper (see Remark 2.12.c).
Notes
Working with the SIAM staff was an amazing experience. I would like to thank Louis Primus for his solicitude and professionalism.
Version: October 30, 2021.
Some references I have missed:
(these were included in the initial submission, but are missing from the final version, as they were not cited in the text and we switched to BibTeX)
- M. Grau-Sánchez, M. Noguera, A. Grau, J.R. Herrero, On new computational local orders of convergence, Appl. Math. Lett. 25 (2012) 2023–2030, doi: 10.1016/j.aml.2012.04.012
- I. Pavaloiu, Approximation of the roots of equations by Aitken-Steffensen-type monotonic sequences, Calcolo 32 (1–2) (1995) 69–82, doi: 10.1007/BF02576543.
- I. Pavaloiu, On the convergence of one step and multistep iterative methods, Unpublished manuscript, http://ictp.acad.ro/convergence-one-step-multistep-iterative-methods/.
known to me, but still missing (nobody is perfect):
- N. Bicanic and K. H. Johnson, Who was ’-Raphson’?, Intl. J. Numerical Methods Engrg., 14 (1979), pp. 148–152, https://doi.org/10.1002/nme.1620140112.
- A. Melman, Geometry and convergence of Euler’s and Halley’s methods, SIAM Rev., 39 (1997), pp. 728–735, https://doi.org/10.1137/S0036144595301140.
- S. Nash, Who was -Raphson? (an answer), NA Digest, 92 (1992).
found after sending the revised version:
- J. H. Wilkinson, The evaluation of the zeros of ill-conditioned polynomials. part ii, Numer. Math., 1 (1959), pp. 167–180, doi: 10.1007/BF01386382
- D. F. Bailey and J. M. Borwein, Ancient Indian square roots: an exercise in forensic paleo-mathematics, Amer. Math. Monthly, 119 (2012), pp. 646–657, doi: 10.4169/amer.math.monthly.119.08.646.
- E. Halley, Methodus nova accurata & facilis inveniendi radices aequationum quarumcumque generaliter, sine praevia reductione, Phil. Trans. R. Soc., 18 (1694), pp. 136–148, https://doi.org/10.1098/rstl.1694.0029.
- A. Ralston and P. Rabinowitz, A First Course in Numerical Analysis, 2nd ed., McGraw-Hill, New York, 1978.
- M. A. Nordgaard, The origin and development of our present method of extracting the square and cube roots of numbers, The Mathematics Teacher, 17 (1924), pp. 223–238, https://www.jstor.org/stable/27950616.
…
and others (to be completed)
[1] Advanpix team, Advanpix, Version 4.7.0.13642, 2020, https://www.advanpix.com/ (accessed 2020/03/22). (Cited on pp. 4, 26)
[2] I. K. Argyros and S. George, On a result by Dennis and Schnabel for Newton’s method: Further improvements, Appl. Math. Lett., 55 (2016), pp. 49-53, https://doi.org/10.1016/j.aml.2015.12.003. (Cited on p. 28)
[3] D. F. Bailey, A historical survey of solution by functional iteration, Math. Mag., 62 (1989), pp. 155-166, https://doi.org/10.1080/0025570X.1989.11977428. (Cited on pp. 7, 21, 34)
[4] W. A. Beyer, B. R. Ebanks, and C. R. Qualls, Convergence rates and convergence-order profiles for sequences, Acta Appl. Math., 20 (1990), pp. 267-284, https://doi.org/10.1007/bf00049571. (Cited on pp. 8, 9, 11, 14, 17, 19, 21)
[5] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Rev., 59 (2017), pp. 65-98, https://doi.org/10.1137/141000671. (Cited on p. 4)
[6] R. P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, NJ, 1973. (Cited on p. 11)
[7] R. P. Brent, S. Winograd, and P. Wolfe, Optimal iterative processes for root-finding, Numer. Math., 20 (1973), pp. 327-341, https://doi.org/10.1007/bf01402555. (Cited on pp. 13, 15)
[8] C. Brezinski, Comparaison des suites convergentes, Rev. Française Inform. Rech. Opér., 5 (1971), pp. 95-99. (Cited on pp. 13, 14)
[9] C. Brezinski, Acceleration de la Convergence en Analyse Numerique, Springer-Verlag, Berlin, 1977. (Cited on pp. 2, 13, 14)
[10] C. Brezinski, Limiting relationships and comparison theorems for sequences, Rend. Circolo Mat. Palermo Ser. II, 28 (1979), pp. 273-280, https://doi.org/10.1007/bf02844100. (Cited on p. 19)
[11] C. Brezinski, Vitesse de convergence d’une suite, Rev. Roumaine Math. Pures Appl., 30 (1985), pp. 403-417. (Cited on pp. 11, 13, 18)
[12] P. N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp. 407-434, https://doi.org/10.1137/0724031. (Cited on p. 29)
[13] F. Cajori, Historical note on the Newton-Raphson method of approximation, Amer. Math. Monthly, 18 (1911), pp. 29-32, https://doi.org/10.2307/2973939. (Cited on pp. 22, 23)
[14] A. Cauchy, Sur la determination approximative des racines d’une equation algebrique ou transcendante, Oeuvres Complete (II) 4, Gauthier-Villars, Paris, 1899, 23 (1829), pp. 573-609, http://iris.univ-lille1.fr/pdfpreview/bitstream/handle/1908/4026/41077-2-4.pdf?sequence=1 (accessed 2019/02/01). (Cited on pp. 21, 23)
[15] J.-L. Chabert, ed., A History of Algorithms. From the Pebble to the Microchip, Springer Verlag, Berlin, 1999. (Cited on pp. 21, 22, 23, 28, 30, 34)
[16] J. Chen and I. K. Argyros, Improved results on estimating and extending the radius of an attraction ball, Appl. Math. Lett., 23 (2010), pp. 404-408, https://doi.org/10.1016/j.aml.2009.11.007. (Cited on p. 35)
[17] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719857. (Cited on pp. 14, 17)
[18] E. Catinas, Inexact perturbed Newton methods and applications to a class of Krylov solvers, J. Optim. Theory Appl., 108 (2001), pp. 543-570, https://doi.org/10.1023/a:1017583307974. (Cited on p. 29)
[19] E. Catinas, On accelerating the convergence of the successive approximations method, Rev. Anal. Numer. Theor. Approx., 30 (2001), pp. 3-8, https://ictp.acad.ro/acceleratingconvergence-successive-approximations-method/ (accessed 20/10/10). (Cited on p. 34)
[20] E. Catinas, On the superlinear convergence of the successive approximations method, J. Optim. Theory Appl., 113 (2002), pp. 473-485, https://doi.org/10.1023/a:1015304720071. (Cited on p. 35)
[21] E. Catinas, The inexact, inexact perturbed and quasi-Newton methods are equivalent models, Math. Comp., 74 (2005), pp. 291-301, https://doi.org/10.1090/s0025-5718-04-01646-1. (Cited on p. 29)
[22] E. Catinas, Newton and Newton-type Methods in Solving Nonlinear Systems in RN , Risoprint, Cluj-Napoca, 2007. (Cited on p. 35)
[23] E. Catinas, Estimating the radius of an attraction ball, Appl. Math. Lett., 22 (2009), pp. 712-714, https://doi.org/10.1016/j.aml.2008.08.007. (Cited on p. 35)
[24] E. Catinas, A survey on the high convergence orders and computational convergence orders of sequences, Appl. Math. Comput., 343 (2019), pp. 1-20, https://doi.org/10.1016/j.amc.2018.08.006. (Cited on pp. 7, 8, 10, 11, 12, 14, 15, 16, 17, 18, 20, 21)
[25] E. Catinas, How Many Steps Still Left to x*? Part II. Newton iterates without f and f’, manuscript, 2020. (Cited on pp. 24, 35)
[26] G. Dahlquist and A. Bjork, Numerical Methods, Dover, Mineola, NY, 1974. (Cited on p. 33)
[27] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400-408, https://doi.org/10.1137/0719025. (Cited on p. 29)
[28] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971446. (Cited on pp. 3, 29)
[29] J. E. Dennis, On Newton-like methods, Numer. Math., 11 (1968), pp. 324-330, https://doi.org/10.1007/bf02166685. (Cited on p. 17)
[30] J. E. Dennis, Jr., and J. J. More, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comp., 28 (1974), pp. 549-560, https://doi.org/10.1090/s0025-5718-1974-0343581-1. (Cited on pp. 19, 29)
[31] J. E. Dennis, Jr., and J. J. More, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46-89, https://doi.org/10.1137/1019005. (Cited on pp. 27, 29)
[32] J. E. Dennis, Jr., and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, PA, 1996, https://doi.org/10.1137/1.9781611971200. (Cited on pp. 15, 28)
[33] G. Deslauries and S. Dubuc, Le calcul de la racine cubique selon Heron, Elem. Math., 51 (1996), pp. 28-34, http://eudml.org/doc/141587 (accessed 2020/02/18). (Cited on p. 30)
[34] P. Deuflhard, Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms, Springer-Verlag, Heidelberg, 2004. (Cited on pp. 23, 24)
[35] P. Deuflhard and F. A. Potra, Asymptotic mesh independence of Newton-Galerkin methods via a refined Mysovskii theorem, SIAM J. Numer. Anal., 29 (1992), pp. 1395-1412, https://doi.org/10.1137/0729080. (Cited on p. 28)
[36] P. Diez, A note on the convergence of the secant method for simple and multiple roots, Appl. Math. Lett., 16 (2003), pp. 1211-1215, https://doi.org/10.1016/s0893-9659(03)90119-4. (Cited on pp. 27, 31)
[37] F. Dubeau and C. Gnang, Fixed point and Newton’s methods for solving a nonlinear equation: From linear to high-order convergence, SIAM Rev., 56 (2014), pp. 691-708, https://doi.org/10.1137/130934799. (Cited on pp. 25, 34)
[38] J. Fourier, Question d’analyse algebrique, Bull. Sci. Soc. Philo., 67 (1818), pp. 61-67, http://gallica.bnf.fr/ark:/12148/bpt6k33707/f248.item (accessed 2018-02-23). (Cited on pp. 7, 28)
[39] W. Gautschi, Numerical Analysis, 2nd ed., Birkhauser/Springer, New York, 2011, https://doi.org/10.1007/978-0-8176-8259-0. (Cited on p. 23)
[40] A. Greenbaum and T. P. Chartier, Numerical Methods. Design, Analysis, and Computer Implementation of Algorithms, Princeton University Press, Princeton, NJ, 2012. (Cited on pp. 23, 27, 31)
[41] M. T. Heath, Scientific Computing. An Introductory Survey, 2nd ed., SIAM, Philadelphia, PA, 2018, https://doi.org/10.1137/1.9781611975581. (Cited on pp. 9, 29)
[42] M. Heinkenschloss, Scientific Computing. An Introductory Survey, Rice University, Houston, TX, 2018. (Cited on pp. 3, 15, 19)
[43] J. Herzberger and L. Metzner, On the Q-order of convergence for coupled sequences arising in iterative numerical processes, Computing, 57 (1996), pp. 357-363, https://doi.org/10.1007/BF02252254. (Cited on p. 13)
[44] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, Philadelphia, PA, 2002, https://doi.org/10.1137/1.9780898718027. (Cited on p. 3)
[45] M. Igarashi and T. J. Ypma, Empirical versus asymptotic rate of convergence of a class of methods for solving a polynomial equation, J. Comput. Appl. Math., 82 (1997), pp. 229-237, https://doi.org/10.1016/s0377-0427(97)00077-0. (Cited on p. 25)
[46] L. O. Jay, A note on Q-order of convergence, BIT, 41 (2001), pp. 422-429, https://doi.org/10.1023/A:1021902825707. (Cited on pp. 5, 11, 12, 17)
[47] T. A. Jeeves, Secant modification of Newton’s method, Commun. ACM, 1 (1958), pp. 9-10, https://doi.org/10.1145/368892.368913. (Cited on pp. 11, 30)
[48] V. J. Katz, A History of Mathematics. An Introduction, 2nd ed., Addison-Wesley, 1998. (Cited on p. 21)
[49] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, PA, 1995, https://doi.org/10.1137/1.9781611970944. (Cited on pp. 5, 10, 17)
[50] C. T. Kelley, Iterative Methods for Optimization, SIAM, Philadelphia, PA, 1999, https://doi.org/10.1137/1.9781611970920. (Cited on pp. 15, 17)
[51] E. S. Kennedy and W. R. Transue, A medieval iterative algorism, Amer. Math. Monthly, 63 (1956), pp. 80-83, https://doi.org/10.1080/00029890.1956.11988762. (Cited on p. 34)
[52] D. E. Knuth, Big Omicron and big Omega and big Theta, ACM SIGACT News, 8 (1976), pp. 18-24, https://doi.org/10.1145/1008328.1008329. (Cited on pp. 2, 14)
[53] D. E. Knuth, The TeXbook, Addison-Wesley, 1984. (Cited on p. 4)
[54] N. Kollerstrom, Thomas Simpson and Newton’s method of approximation: an enduring myth, British J. History Sci., 25 (1992), pp. 347-354, https://doi.org/10.1017/s0007087400029150. (Cited on pp. 22, 23)
[55] K. Liang, Homocentric convergence ball of the secant method, Appl. Math. Chin. Univ., 22 (2007), pp. 353–365, https://doi.org/10.1007/s11766-007-0313-3. (Cited on p. 33)
[56] D. Luca and I. Pavaloiu, On the Heron’s method for approximating the cubic root of a real number, Rev. Anal. Numer. Theor. Approx., 26 (1997), pp. 103-108, https://ictp.acad.ro/jnaat/journal/article/view/1997-vol26-nos1-2-art15 (accessed 2020-03-21). (Cited on p. 30)
[57] M. Grau-Sanchez, M. Noguera, and J. M. Gutierrez, On some computational orders of convergence, Appl. Math. Lett., 23 (2010), pp. 472-478, https://doi.org/10.1016/j.aml.2009.12.006. (Cited on p. 20)
[58] Mathworks, MATLAB, Version 2019b, 2019, http://www.mathworks.com (accessed 2019/12/12). (Cited on pp. 4, 26)
[59] J. M. McNamee, Numerical Methods for Roots of Polynomials, Part I, Elsevier, Amsterdam, 2007. (Cited on p. 23)
[60] J. M. McNamee and V. Y. Pan, Numerical Methods for Roots of Polynomials, Part II, Elsevier, Amsterdam, 2013. (Cited on pp. 28, 33)
[61] U. C. Merzbach and C. B. Boyer, A History of Mathematics, 3rd ed., John Wiley & Sons, 2011. (Cited on p. 30)
[62] J.-M. Muller, N. Brunie, F. de Dinechin, C.-P. Jeannerod, M. Joldes, V. Lefevre, G. Melquiond, N. Revol, and S. Torres, Handbook of Floating-Point Arithmetic, 2nd ed., Springer, New York, 2018, https://doi.org/10.1007/978-3-319-76526-6. (Cited on pp. 22, 24)
[63] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, 1983. (Cited on p. 15)
[64] A. Neumaier, Introduction to Numerical Analysis, Cambridge University Press, 2001. (Cited on pp. 31, 33)
[65] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006, https://doi.org/10.1007/978-0-387-40065-5. (Cited on pp. 8, 17)
[66] J. M. Ortega, Numerical Analysis. A Second Course, SIAM, Philadelphia, PA, 1990, https://doi.org/10.1137/1.9781611971323. (Cited on pp. 28, 34)
[67] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, SIAM, Philadelphia, PA, 2000, https://doi.org/10.1137/1.9780898719468. (Cited on pp. 8, 10, 11, 14, 15, 16, 18, 21, 27, 34)
[68] A. M. Ostrowski, Solution of Equations and Systems of Equations, 2nd ed., Academic Press, New York, 1966; 3rd ed., 1972. (Cited on pp. 27, 30, 34)
[69] M. L. Overton, Numerical Computing with IEEE Floating Point Arithmetic, SIAM, Philadelphia, PA, 2001, https://doi.org/10.1137/1.9780898718072. (Cited on pp. 3, 29)
[70] J. M. Papakonstantinou and R. A. Tapia, Origin and evolution of the secant method in one dimension, Amer. Math. Monthly, 120 (2013), pp. 500-518, https://doi.org/10.4169/amer.math.monthly.120.06.500. (Cited on p. 30)
[71] K. Plofker, An example of the secant method of iterative approximation in a fifteenth century sanskrit text, Historia Math., 23 (1996), pp. 246-256, https://doi.org/10.1006/hmat.1996.0026. (Cited on pp. 30, 34)
[72] K. Plofker, Use and transmission of iterative approximations in India and the Islamic World, in From China to Paris: 2000 Years Transmission of Mathematical Ideas, Y. Dold-Samplonius, J. W. Dauben, M. Folkerts, and B. Van Dalen, eds., Franz Steiner Verlag Wiesbaden GmbH, Stuttgart, 2002, pp. 167-186. (Cited on pp. 21, 22, 30, 34)
[73] E. Polak, Optimization. Algorithms and Consistent Approximations, Springer-Verlag, New York, 1997. (Cited on pp. 5, 11, 15, 19)
[74] F. A. Potra, On Q-order and R-order of convergence, J. Optim. Theory Appl., 63 (1989), pp. 415-431, https://doi.org/10.1007/bf00939805. (Cited on pp. 8, 11, 13, 14, 15, 16, 17, 18, 21)
[75] F. A. Potra, Q-superlinear convergence of the iterates in primal-dual interior-point methods, Math. Program. Ser. A, 91 (2001), pp. 99-115, https://doi.org/10.1007/s101070100230. (Cited on p. 15)
[76] F. A. Potra, A superquadratic version of Newton’s method, SIAM J. Numer. Anal., 55 (2017), pp. 2863-2884, https://doi.org/10.1137/17M1121056. (Cited on p. 13)
[77] F. A. Potra and V. Ptak, Nondiscrete Induction and Iterative Processes, Pitman, Boston, MA, 1984. (Cited on pp. 8, 11, 13, 15, 19)
[78] F. A. Potra and R. Sheng, A path following method for LCP with superlinearly convergent iteration sequence, Ann. Oper. Res., 81 (1998), pp. 97-11, https://doi.org/10.1023/A:1018942131812. (Cited on p. 14)
[79] I. Pavaloiu, Sur l’ordre de convergence des methodes d’iteration, Mathematica, 23(46) (1981), pp. 261-272, https://doi.org/10.1007/BF02576543. (Cited on p. 14)
[80] I. Pavaloiu, Optimal algorithms concerning the solving of equations by interpolation, Ed. Srima, (1999), pp. 222-248, http://ictp.acad.ro/optimal-algorithms-concerning-solvingequations-interpolation/ (accessed 2018-03-22). (Cited on p. 13)
[81] A. Quarteroni, R. Sacco, and F. Saleri, Numerical Mathematics, Springer, Berlin, 2007. (Cited on pp. 28, 29)
[82] R. Rashed, The Development of Arabic Mathematics: Between Arithmetic and Algebra, Stud. Philos. Sci. 156, Springer, Boston, MA, 1994. (Cited on pp. 22, 30)
[83] M. Raydan, Exact order of convergence of the secant method, J. Optim. Theory Appl., 78 (1993), pp. 541-551, https://doi.org/10.1137/060670341. (Cited on pp. 25, 31)
[84] W. C. Rheinboldt, An adaptive continuation process for solving systems of nonlinear equations, in Mathematical Models and Numerical Methods, Banach Ctr. Publ. 3, Polish Academy of Science, 1977, pp. 129-142, https://doi.org/10.4064/-3-1-129-142. (Cited on p. 28)
[85] W. C. Rheinboldt, Methods for Solving Systems of Nonlinear Equations, 2nd ed., SIAM, Philadelphia, PA, 1998, https://doi.org/10.1137/1.9781611970012. (Cited on pp. 11, 15)
[86] T. Sauer, Numerical Analysis, 2nd ed., Pearson, Boston, MA, 2012. (Cited on pp. 21, 29)
[87] E. Schroder, Ueber unendlich viele algorithmen zur auflosung der gleichungen, Math. Ann., 2 (1870), pp. 317-365, https://doi.org/10.1007/bf01444024; available as On Infinitely Many Algorithms for Solving Equations, Tech reports TR-92-121 and TR-2990, Institute for Advanced Computer Studies, University of Maryland, 1992; translated by G. W. Stewart. (Cited on pp. 7, 34)
[88] H. Schwetlick, Numerische Losung Nichtlinearer Gleichungen, VEB, Berlin, 1979. (Cited on pp. 8, 13, 14, 16)
[89] L. F. Shampine, R. C. Allen, Jr., and S. Pruess, Fundamentals of Numerical Computing, John Wiley and Sons, New York, 1997. (Cited on pp. 29, 33)
[90] T. Steihaug, Computational science in the eighteenth century. Test cases for the methods of Newton, Raphson, and Halley: 1685 to 1745, Numer. Algor., 83 (2020), pp. 1259-1275, https://doi.org/10.1007/s11075-019-00724-8. (Cited on pp. 22, 23)
[91] E. Suli and D. Mayers, An Introduction to Numerical Analysis, Cambridge University Press, Cambridge, UK, 2003. (Cited on p. 33)
[92] T. Tantau, The tikz and pgf Packages. Manual for Version 3.1.5b-34-gff02ccd1, https://github.com/pgf-tikz/pgf. (Cited on p. 4)
[93] R. A. Tapia, J. E. Dennis, Jr., and J. P. Schafermeyer, Inverse, shifted inverse, and Rayleigh quotient iteration as Newton’s method, SIAM Rev., 60 (2018), pp. 3-55, https://doi.org/10.1137/15M1049956. (Cited on pp. 7, 16, 17, 18)
[94] R. A. Tapia and D. L. Whitley, The projected Newton method has order 1 + √2 for the symmetric eigenvalue problem, SIAM J. Numer. Anal., 25 (1989), pp. 1376-1382, https://doi.org/10.1137/0725079. (Cited on p. 13)
[95] J. F. Traub, Iterative Methods for the Solutions of Equations, Prentice Hall, Englewood Cliffs, NJ, 1964. (Cited on pp. 13, 31, 34)
[96] E. E. Tyrtyshnikov, A Brief Introduction to Numerical Analysis, Birkhauser/Springer, New York, 1997. (Cited on p. 24)
[97] J. S. Vandergraft, Introduction to Numerical Computations, 2nd ed., Academic Press, New York, 1983. (Cited on p. 33)
[98] H. F. Walker, An approach to continuation using Krylov subspace methods, in Computational Science in the 21st Century, J. Periaux, ed., John Wiley and Sons, 1997, pp. 72-81. (Cited on p. 19)
[99] S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, PA, 1997, https://doi.org/10.1137/1.9781611971453. (Cited on p. 12)
[100] S. J. Wright, Optimization algorithms for data analysis, in The Mathematics of Data, IAS/Park City Math. Ser. 25, AMS, Providence, RI, 2018, pp. 49-97. (Cited on p. 15)
[101] T. J. Ypma, Historical development of the Newton-Raphson method, SIAM Rev., 37 (1995), pp. 531–551, https://doi.org/10.1137/1037125. (Cited on pp. 7, 21, 22, 23, 30, 34)
MSC codes
65-02, 65-03, 41A25, 40-02, 97-02, 97N40, 65H05

Introduction

The analysis of the convergence speed of sequences is an important task, since in numerical applications the aim is to use iterative methods with fast convergence, which provide good approximations in few steps. The ideal setting for studying a given sequence \(\{x_k\}:=(x_k)_{k\geq0}\subset\mathbb{R}\) is that of error-based analysis, where the finite limit \(x^\ast\) is assumed to be known and one analyzes the absolute values of the errors, \(e_k:=|x_k-x^\ast|\). The errors allow the comparison of the speeds of two sequences in the most natural way, even if their limits are distinct (though here we write both as \(x^\ast\)).

Let us first define notation, for later reference. Given \(\dot{x}_k,\ddot{x}_k\rightarrow x^\ast\in\mathbb{R}\), denote their errors by \(\{\dot{e}_k\}\), resp. \(\{\ddot{e}_k\}\). \(\{\dot{x}_k\}\) converges faster (not slower) than \(\{\ddot{x}_k\}\) if
\[ |x^\ast-\dot{x}_k| \leq |x^\ast-\ddot{x}_k|, \quad k\geq k_0 \tag{$\leq$} \]
(in brief, \(\{\dot{x}_k\}\) is (\(\leq\))-faster than \(\{\ddot{x}_k\}\)), and strictly faster if
\[ |x^\ast-\dot{x}_k| < |x^\ast-\ddot{x}_k|, \quad k\geq k_0. \tag{$<$} \]
Furthermore, increasingly faster convergence holds if:
- for some \(c\in(0,1)\) we have \(|x^\ast-\dot{x}_k|\leq c\,|x^\ast-\ddot{x}_k|\), \(k\geq k_0\);
- the constant \(c\) above is not fixed, but tends to zero (see, e.g., [9, p. 2]),
\[ |x^\ast-\dot{x}_k| = o(|x^\ast-\ddot{x}_k|) \quad \text{as } k\rightarrow\infty, \tag{$o$} \]
which means that \(|x^\ast-\dot{x}_k|\leq c_k|x^\ast-\ddot{x}_k|\), \(k\geq k_0\), with \(c_k\rightarrow0\) (a finite number of elements \(c_k\) may even be greater than one);
- given \(\alpha>1\), we have
\[ |x^\ast-\dot{x}_k| = \mathcal{O}(|x^\ast-\ddot{x}_k|^{\alpha}) \quad \text{as } k\rightarrow\infty, \tag{$O_\alpha$} \]
that is, \(\exists K>0\) such that \(|x^\ast-\dot{x}_k|\leq K|x^\ast-\ddot{x}_k|^{\alpha}\), \(k\geq k_0\).
Obviously, (\(O_\alpha\)) \(\Rightarrow\) (\(o\)) \(\Rightarrow\) (\(<\)) \(\Rightarrow\) (\(\leq\)).
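As a quick numerical illustration (my sketch, on two model sequences not tied to the paper's figures), these comparisons can be tested directly on the errors:

```julia
# Minimal sketch (not from the paper): test the term-by-term comparison (<=)
# and the vanishing ratio in (o) for edot_k = 2^(-2^k) vs. eddot_k = 2^(-k^2).
edot(k)  = inv(big(2.0)^(big(2)^k))   # C-quadratic errors
eddot(k) = inv(big(2.0)^(k^2))        # strict superlinear errors

for k in 1:8
    println(k, "  (<=): ", edot(k) <= eddot(k),
               "  ratio: ", Float64(edot(k) / eddot(k)))
end
# note that k_0 > 1 here: the comparison (<=) fails at k = 3 and holds
# for k >= 4, while the ratio tends to 0, illustrating (o)
```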
Let us first illustrate the convergence speed in an intuitive fashion, using some graphs. Consider the following two groups of sequences (\(\{x_k\}=\{e_k\}\)):
(a) \(\{\frac{1}{\sqrt k}\}\), \(\{\frac1k\}\), \(\{\frac{1}{k^4}\}\);
(b) \(\{\frac{k}{2^k}\}\), \(\{\frac{1}{2^k}\}=\{(\frac12)^k\}=\{2^{-k}\}\), \(\{\frac{1}{k\,2^k}\}\), \(\{\frac{1}{4^k}\}\).
Handling the first terms of these sequences raises no difficulties, as all the common programming languages represent them as \(fl(x_k)\) in standard double precision arithmetic (called binary64 by the IEEE 754-2008 standard); see the exercise below. Plotting \((k,e_k)\) does not help in analyzing the behavior of the errors, as for \(k\geq10\) we cannot see much, even if the graph is scaled (the implicit fitting of the figures in the window usually results in different units for the two axes); we would see even less if the graph were unscaled. All we can say is that \(\{\frac{1}{\sqrt k}\}\) has the slowest convergence, followed by \(\{\frac1k\}\), and also that the first terms of \(\{\frac{1}{k^4}\}\) are smaller than those of \(\{\frac{1}{2^k}\}\). In order to compare the last terms we must use the semilog coordinates \((k,\lg e_k)\); for \(\{\frac{1}{2^k}\}\) they are \((k,k\lg\frac12)\), belonging to the line \(y=ax\), \(a=\lg\frac12\) (as the graph is twice scaled, first by the use of an unequal number of units on the two axes and second because the final figure is a rectangle instead of a square, it actually shows another line, \(y=bx\)). The semilog plot shows that, in reality, \(\{\frac{1}{2^k}\}\) is (\(<\))-faster than \(\{\frac{1}{k^4}\}\) (\(k_0=17\)). We will see later that the convergence is sublinear in group (a) and linear in group (b).

Exercise. Show that (a) \(\{\frac{1}{2^k}\}\) is [(\(O_\alpha\)) \(\forall\alpha>1\)]-faster than \(\{\frac{1}{k^4}\}\); (b) \(\{\frac{1}{2^k}\}\) is (\(o\))- but not [(\(O_\alpha\)) for some \(\alpha>1\)]-faster than \(\{\frac{k}{2^k}\}\).

Exercise. (a) The smallest positive normal binary64 number, realmin, is \(2.2250738585072014e{-}308\); analyzing its binary representation, express its value as a power of \(2\). (b) For each sequence from (a) and (b) in the example above, compute the largest \(k\) such that \(fl(x_k)\neq0\) (see also the sketch below). (c) Give a formula for the largest index \(k\) for which all the elements \(fl(x_k)\) of all the sequences from (a) and (b) in the example above are nonzero.

Next we analyze some increasingly faster classes of sequences. Consider
(c) \(\{\frac{1}{2^{k^2}}\}\), \(\{\frac{1}{2^{k^3}}\}\), \(\{\frac{1}{k^k}\}\);
(d) \(\{\frac{1}{2^{2^k/k}}\}\), \(\{\frac{1}{2^{2^k}}\}=\{(\frac12)^{2^k}\}=\{2^{-2^k}\}\), \(\{\frac{1}{2^{k\,2^k}}\}\), \(\{\frac{1}{3^{2^k}}\}\);
(e) \(\{\frac{1}{2^{3^k}}\}\);
(f) \(\{\frac{1}{2^{2^k k}}\}\), \(\{\frac{1}{2^{2^{k^2}}}\}\).
Plotting \(\{\frac{1}{4^k}\}\), \(\{\frac{1}{2^{k^2}}\}\), and \(\{\frac{1}{k^k}\}\) in semilog coordinates, their graphs are on a line (the fastest from the previous plot), on a parabola \(y=cx^2\), and, resp., in between. The computation with what appears at first sight to be a “reasonable” number of terms (say, \(10\)) becomes increasingly challenging as we successively consider \(\{x_k\}\) from (c)–(f).
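For part (b) of the exercise above, a minimal sketch (my addition, not the paper's posted code):

```julia
# Find the largest k such that fl(x_k) is a nonzero binary64 number.
println(floatmin(Float64))   # 2.2250738585072014e-308, i.e., 2^-1022

# ldexp forms x * 2^-k directly, avoiding spurious intermediate
# overflow/underflow of 2.0^k.
function last_nonzero(x; kmax = 10_000)
    klast = 0
    for k in 1:kmax
        x(k) == 0.0 && break
        klast = k
    end
    return klast
end

println(last_nonzero(k -> ldexp(1.0, -k)))          # 1074: 2^-1074 is the smallest subnormal
println(last_nonzero(k -> ldexp(Float64(k), -k)))   # k/2^k stays nonzero a few steps longer
```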
We leave as an exercise the calculation/computation of the largest index \(k\) for each \(\{x_k\}\) in (c)–(f) such that \(fl(x_k)\) is a nonzero binary64 number; we note though that all \(fl(1/2^{2^{k^2}})\) and \(fl(1/2^{2^k k})\) are zero for \(k\geq4\), respectively, \(k\geq5\), and so binary64 numbers are not enough here. There are plenty of choices for increased precision: the well-known MATLAB [58] (with the Advanpix toolbox [1] providing arbitrary precision), the recent, freely available Julia [5], and many others (not to mention the programming languages for symbolic computation).

All the graphs from this paper were obtained using the tikz/pgf package [92] for LaTeX (the initial TeX system was created by D. E. Knuth [53]). The figures are easy to customize and the instructions to generate them have a simple form, briefly,
\addplot [domain=1:20, samples=20] {1/sqrt(x)};
Equally important, the tikz/pgf library allows the use of higher precision than binary64 (more precisely, smaller/larger numbers by an increased representation of the exponent, e.g., realmin around \(10^{-6500}\)). The LaTeX sources for the figures are posted on https://github.com/ecatinas/conv-ord.

Plotted in semilog coordinates, \(\{\frac{1}{2^{k^3}}\}\) belongs to a cubic parabola and \(\{\frac{1}{2^{2^k}}\}\) to an exponential; the parabola of \(\{\frac{1}{2^{k^2}}\}\) appears flat, which shows how much faster the other sequences converge, while the descent of the three representable terms of \(\{1/2^{2^{k^2}}\}\) is the steepest. As will be seen later, the sequence groups (c)–(f) have increasing orders: strict superlinear, quadratic, cubic, and infinite, respectively.

Exercise. Show that \(\{\frac{1}{3^{2^k}}\}\) is [(\(O_\alpha\)), \(\alpha\in(1,\frac{\ln3}{\ln2}]\)]-faster than \(\{\frac{1}{2^{2^k}}\}\).

Obviously, when comparing two sequences by (\(<\)), the faster one must have its graph below the other one (\(k\geq k_0\)) (cf. Polak [73, p. 47]). While the more concave the graph the better (cf. Kelley [49, Ex. 5.7.15]), fast convergence does not in fact necessarily require smooth graphs (see Jay [46]).

Exercise. Prove that any convergence curve between \(\{\frac{1}{2^{3^k}}\}\) and \(\{\frac{1}{3^{2^k}}\}\) is at least [(\(O_\alpha\)), \(\alpha\in(1,\frac{\ln3}{\ln2}]\)]-faster than \(\{\frac{1}{2^{2^k}}\}\) (hint: use the previous exercise).

Consider, for instance, the merging of certain sequences. Let
\[ x_k^a=\begin{cases}\frac{1}{3^{2^k}}, & k \text{ odd},\\[2pt] \frac{1}{2^{3^k}}, & k \text{ even},\end{cases} \qquad x_k^b=\begin{cases}\frac{1}{2^{2^k}}, & k \text{ even},\\[2pt] \frac{1}{3^{2^k}}, & k \text{ odd},\end{cases} \qquad x_k^c=\begin{cases}\frac{1}{2^{2^k}}, & k \text{ even},\\[2pt] \frac{1}{5^{2^k}}, & k \text{ odd}.\end{cases} \]
\(\{\frac{1}{2^{3^k}}\}\), \(\{x_k^a\}\), \(\{\frac{1}{3^{2^k}}\}\), \(\{x_k^b\}\), \(\{\frac{1}{2^{2^k}}\}\) are written here in (\(\leq\)) order of decreasing speed. The usual ranking by convergence orders is instead \(\{\frac{1}{2^{3^k}}\}\), \(\{\frac{1}{3^{2^k}}\}\), \(\{\frac{1}{2^{2^k}}\}\), \(\{x_k^b\}\), \(\{x_k^a\}\) (C-orders \(3\), \(2\), \(2\), R-order \(2\), no order). Though \(\{x_k^a\}\) has the second fastest convergence, the orders actually rank it as the slowest; \(\{x_k^b\}\) does not have C- or Q-order (but just exact R-order \(2\)), so it is usually ranked below \(\{2^{-2^k}\}\). Similarly, \(\{\frac{1}{5^{2^k}}\}\), \(\{x_k^c\}\), \(\{\frac{1}{2^{2^k}}\}\), \(\{\frac{1}{1.3^{2^k}}\}\) have the ranking \(\{\frac{1}{5^{2^k}}\}\), \(\{\frac{1}{2^{2^k}}\}\), \(\{\frac{1}{1.3^{2^k}}\}\), \(\{x_k^c\}\) (C-orders \(2\), \(2\), \(2\), R-order \(2\)). Though nonmonotone, \(\{x_k^c\}\) has exact R-order \(2\) and is (at least) (\(<\))-faster than the C-quadratic \(\{\frac{1}{1.3^{2^k}}\}\).
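A small sketch of the precision issue (my addition, with BigFloat's extended exponent range standing in for the higher-precision tools mentioned above):

```julia
# Group (f) underflows in binary64, but arbitrary-precision BigFloat
# (with a much larger exponent range) still resolves the terms.
setprecision(256) do
    for k in 1:5
        ek = inv(big(2.0)^(big(2)^(k^2)))   # e_k = 1/2^(2^(k^2))
        println(k, "  binary64: ", Float64(ek), "  BigFloat: ", ek)
    end
end
```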
Given \(\{\dot{x}_k\}\), \(\{\ddot{x}_k\}\) with errors plotted in two differently scaled windows, which one is faster? The study of errors by the term-by-term comparisons above seems perfect in theory, but it has a notable disadvantage: it requires two sequences (and, of course, one cannot always compare all sequences by (\(\leq\)): consider \(\{x_k^a\}\) and \(\{\frac{1}{6^{2^k}}\}\), for example). The alternative is to use a “scale” for measuring any single sequence: the convergence orders.

The (classical) C-orders, though they demand some strong conditions, appear in the convergence of the most often encountered iterative methods for nonlinear equations; a notable exception is that the higher orders of the secant method are Q- but not necessarily C-orders. Still, some problems appear, for example, when one wants to check the usual predictions such as “in quadratic convergence, the error is squared at each step.” Indeed, as a sequence is an infinite set, its order refers to the asymptotic range (all sufficiently large indices), where statements such as “if \(\varepsilon>0\) is sufficiently small, \(\exists k_0\geq0\) s.t. [some property] holds \(\forall k\geq k_0\)” come to life. Clearly, the first \(k_0\) terms (\(k_0=2\), \(10\), or perhaps \(10^6\)) are not necessarily connected to the order: a finite number of terms does not contribute to the order of an abstract sequence, while, most painful, the high order of an iterative method may not be reflected in its first steps if the initial approximation is not good enough. Therefore, such predictions, instead of being applicable from the very first term, usually apply only starting with the first terms of the asymptotic range.

When the stringent conditions of the C-orders are not fulfilled, the Q-orders are considered instead, but unfortunately they can be in disagreement with the term-by-term comparison of the errors of two sequences; indeed, we show that convergence even with no Q- (or C-)order may in fact hide a fast overall speed, where only some of the consecutive errors do not decrease fast enough. The R-orders are instead known for their limited usefulness, as they require the weakest conditions and allow nonmonotone errors.

Consider now a further problem: given some iterations for which the calculation of the order turns out to be difficult or even impossible, can the computer be used to approximate it? A positive answer is given by the computational convergence orders.

Despite the fact that convergence orders have a long history, it is only recently that some final connections were made and a comprehensive picture was formed of the C-, Q-, and R-orders, together with their computational variants [24]. The understanding of these notions is important from both the theoretical and the practical viewpoints, the comments of Tapia, Dennis, and Schafermeyer [93] being most relevant: “The distinction between Q- and R-convergence is quite meaningful and useful and is essentially sacred to workers in the area of computational optimization. However, for reasons not well understood, computational scientists who are not computational optimizers seem to be at best only tangentially aware of the distinction.”

We devote Section 2 to the error-based analysis of the problem of measuring and comparing the convergence speeds of abstract real sequences, using either the convergence orders we review or the term-by-term comparison of errors.
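A small numerical check (my sketch) of how a sequence with no Q-order may still be very fast, using \(\{x_k^a\}\) from the merging example above:

```julia
# x^a is extremely fast overall, yet its consecutive error ratios are
# unbounded (limsup e_{k+1}/e_k = +inf), so it has no Q-order.
ea(k) = isodd(k) ? inv(big(3.0)^(big(2)^k)) : inv(big(2.0)^(big(3)^k))

for k in 2:7
    println(k, "  e_k = ", Float64(ea(k)),
               "  e_{k+1}/e_k = ", Float64(ea(k+1) / ea(k)))
end
# the ratios alternate between ~0 and huge values (shown as Inf in Float64)
```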
In Section 3 we deal with the computational versions of the convergence orders, based either on the corrections \(x_{k+1}-x_k\) (each \(x_{k+1}\) can be seen as being obtained from \(x_k\) by adding the correction) or on the nonlinear residuals \(f(x_k)\), and show that in \(\mathbb{R}\) they are equivalent to the error-based ones. In Section 4 we deal with three main iterative methods (Newton, secant, and successive approximations), presenting results on their attainable convergence orders, as well as other connected aspects (asymptotic constants, estimation of the attraction balls, floating point arithmetic issues, brief history, etc.).

C-, Q-, and R-Orders \(p_0>1\) vs. Asymptotic Rates

Even though the roots of iterative methods trace back to the Babylonians and Egyptians (ca. 1800 B.C.) [15], the first comment on convergence orders seems to have been made by Newton (ca. 1669), on the doubling of digits in quadratic convergence (quoted by Ypma in [101]): “But if I desire to continue working merely to twice as many figures, less one, …,” and then in 1675: “That is, the same Division, by wch you could finde the 6th decimal figure, if prosecuted, will give you all to the 11th decimal figure.” In the Journal Book of the Royal Society it is recorded: “17 December 1690: Mr Ralphson’s Book was this day produced by E Halley, wherein he gives a Notable Improvemt of ye method […], which doubles the known figures of the Root known by each Operation….” Halley noticed the tripling of digits in the method he introduced. In his Tracts, Hutton showed that one of his schemes is of third order. In 1818, Fourier [38] also noted that the iterates double the exact figures at each Newton step. In 1870, Schröder [87] implicitly defined the high C-orders for some iterative methods by considering conditions on the nonlinear mapping.

Three types of convergence orders are used at present: the classical C-order (notation adopted from [4]), the Q-order (which contains the C-order as a particular case), and the R-order. Outstanding contributions to their study were successively made by Ortega and Rheinboldt (1970) in their fundamental book [67], Potra and Ptak (1984) [77], Potra (1989) [74], and finally Beyer, Ebanks, and Qualls (1990), in the less known but essential paper [4]. In [24] we connected and completed these results (in \(\mathbb{R}^N\)). We analyze here only the high orders, noting that the various equivalent definitions of the Q-orders lead to intricate implications in the case of linear convergence.

C-Order \(p_0>1\)

The definition of the C- and Q-orders is obtained by considering, for \(p\geq1\), the quotient convergence factors [67, sect. 9.1]
\[ Q_p(k):=\frac{e_{k+1}}{(e_k)^p}=\frac{|x^\ast-x_{k+1}|}{|x^\ast-x_k|^p}, \quad k=0,1,\ldots \]
(it is assumed that \(x_k\neq x^\ast\), \(k\geq0\); if, though, \(x_{k_0}=x^\ast\) in an iterative method, then hopefully we get \(x_k=x^\ast\), \(k\geq k_0\), as holds for the three methods analyzed in Section 4). Let us briefly list the four slowest types of C-order:
- no C-order, when \(Q_1:=\lim_{k\rightarrow\infty}Q_1(k)\) does not exist (e.g., \(x_k=\frac1k\), \(k\) odd; \(x_k=\frac2k\), \(k\) even);
- C-sublinear, if \(Q_1=1\) (e.g., \(\{\frac1k\}\));
- C-linear, if \(0<Q_1<1\) (e.g., \(\{\frac{1}{2^k}\}\));
- C-superlinear (defined later, usually called (strict) Q-superlinear);
notice that \(Q_1>1\) cannot hold, and that \(\{x_k^a\}\), \(\{x_k^c\}\), with no C-order, are in fact fast.
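As a small numerical companion (my sketch), the quotient factors \(Q_p(k)\) for the standard C-quadratic example behave exactly as in the definitions below:

```julia
# Quotient convergence factors Q_p(k) = e_{k+1}/e_k^p for e_k = 2^(-2^k).
Qp(e, p) = [e[k+1] / e[k]^p for k in 1:length(e)-1]

e = [inv(big(2.0)^(big(2)^k)) for k in 0:10]
println(Float64.(Qp(e, 2.0)))   # constant 1.0: Q_2 = 1, i.e., C-order 2
println(Float64.(Qp(e, 1.8)))   # tends to 0   (p below the order)
println(Float64.(Qp(e, 2.2)))   # grows unboundedly (p above the order)
```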
(b) C -sublinear ⇠monotone errors (e.g., x k = 1 k − 2 , k even, x k = 1 k , k odd, k ≥ 3 ). (c) { x k a } , { x k c } , both nonmonotone, have no C -order. If { k } has C -sublinear and { x Ë™ k } C -linear order, show that { x Ë™ k } is [ \cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} ∀ α > 1 ] faster than { k } . The following definition of high orders is well known; p 0 > 1 is implicitly assumed throughout this paper even if not explicitly mentioned. { x k } has C -order p 0 > 1 if Q p 0 := lim k → ∞ Q p 0 â¡ ( k ) ∈ ( 0 , + ∞ ) . It is important to stress that Q p 0 above cannot be zero or infinite; these two cases will arise when analyzing the Q -orders. By denoting (see sect. 9.1, p.~83) \barbelow {Q} p = liminf k → ∞ Q p â¡ ( k ) ,   Q ¯ p = limsup k → ∞ Q p â¡ ( k ) ,   p ≥ 1 , condition \cref{f. def C-order} is equivalent either to relation 0 < \barbelow {Q} p 0 = Q p 0 = Q ¯ p 0 < ∞ or to requiring that ∀ ε > 0 , ∃ k 0 ≥ 0 such that ( Q p 0 − ε ) ⢠e k p 0 ≤ e k + 1 ≤ ( Q p 0 + ε ) ⢠e k p 0    ∀ k ≥ k 0 . \cref{f. def C-order} ⇒ \cref{f. strict monotone errors} (by \cref{eq. C-order p0 <=> (Qp0-eps).ek^p0 <= ekp1 <= (Qp0+eps).ek^p0}, or because \cref{f. def C-order} is stronger than C -linear). (a) x k = θ p 0 k for some θ ∈ ( 0 , 1 ) , p 0 > 1 , is the standard example for C -order p 0 ( Q p 0 = 1 ). The usual instances are the C -quadratic { 2 − 2 k } , { 3 − 2 k } ( θ = 1 2 , resp., θ = 1 3 , p 0 = 2 ), the C -cubic { 2 − 3 k } ( θ = 1 2 , p 0 = 3 ), etc. (b) { c â‹… 2 − 2 k } ( c ∈ R given, c ≠0 ) also has C -quadratic convergence. (c) The C -quadratic { ( − 1 ) k â‹… 2 − 2 k } is nonmonotone, but with monotone { e k } . (d) x k d = { c ⢠( 1 3 ) 2 k , k odd, 1 c ⢠( 1 3 ) 2 k , k even , c > 1 , does not have C -order 2 : \barbelow {Q} 2 = 1 c 3 , Q ¯ 2 = c 3 . [Ex.~5.6~ and~p.~245] What is the rate of { x k } if { e k } is (a) 10 − 2 , 10 − 3 , 10 − 4 , 10 − 5 , … ; (b) 10 − 2 , 10 − 4 , 10 − 6 , 10 − 8 , … ; (c) 10 − 2 , 10 − 3 , 10 − 5 , 10 − 8 , 10 − 13 , 10 − 21 , … ; (d) 10 − 2 , 10 − 4 , 10 − 8 , 10 − 16 , … ? [see, e.g., ] If it exists, the C -order p 0 > 1 of { x k } is unique. Indeed, if raising the power of the denominator in Q p 0 â¡ ( k ) , the quotient tends to infinity, since Q p 0 + ε â¡ ( k ) = Q p 0 â¡ ( k ) ⢠1 ( e k ) ε ∀ ε > 0 . Similarly, if lowering the power, it tends to zero. Let { 10 − 5 â‹… 2 − 2 k } , { 2 − 2 k / k } , and { x k d } ( c = 10 ); in \cref{fig: Qp(k) k=1-13 p=1.8 2 2.2} we plot { Q 1.8 â¡ ( k ) } , { Q 2 â¡ ( k ) } , and { Q 2.2 â¡ ( k ) } for each of them. The limiting values of \barbelow {Q} p , Q ¯ p from \cref{eq: def barbel Q & bar Q} may be regarded as “functions” of p ≥ 1 . Their graphical illustration leads to the so-called Q -convergence profile of { x k } (or Q -profile here, for short). For all C -orders p 0 > 1 , it holds that Q ¯ p = \barbelow {Q} p = Q p ∀ p ≥ 1 , and Q p is a “function” given by Q p = 0 , p ∈ [ 1 , p 0 ) , Q p 0 ∈ ( 0 , + ∞ ) , and Q p = + ∞ , p > p 0 . In \cref{fig: C-order 2} we plot the Q -profile of { 2 − 2 k } . [] Show that x k → x * cannot have C -order p 0 < 1 . (a) Show that if { x k } has C -order p 0 > 1 , then so has { e k + 1 e k } and Q p 0 ⢠{ e k + 1 e k } = 1 , regardless of Q p 0 ⢠{ e k } ≠0 Rem.~3.9. Does the converse hold? (Hint: take { 1 k } .) (b) Rem.~3.9 Find a similar statement for { e k + 1 ⢠e k } . (c) (Kelley Ex.~5.7.15) If { x k } has C -order p 0 > 1 , show that lg â¡ e k is concave, i.e., lg â¡ e k + 1 − lg â¡ e k is decreasing. 
Not only does convergence with any high order exist, but even the infinite C-order, defined either by \(Q_p=0\) \(\forall p>1\) (take, e.g., \(\{2^{-2^{2^k}}\}\) [67, E 9.1-3(g)] or \(\{2^{-2^{k^2}}\}\)), or, by convention, when the convergence is attained in a finite number of steps. “The higher the order, the better” is a well-known cliché, made precise later in terms of the big Ohs.

Q-Order \(p_0>1\)

We propose the following definitions of Q-order convergence:
- no Q-order, if \(\bar{Q}_1=\infty\) (e.g., \(\{x_k^a\}\), \(\{x_k^c\}\));
- Q-sublinear, if \(1\leq\bar{Q}_1<+\infty\) (e.g., \(x_k=\frac2k\), \(k\) odd, \(x_k=\frac1k\), \(k\) even);
- exact Q-sublinear, if \(0<\underline{Q}_1\leq1\leq\bar{Q}_1<+\infty\);
- at least Q-linear, if \(\bar{Q}_1<1\);
- Q-linear, if \(0<\bar{Q}_1<1\);
- exact Q-linear, if \(0<\underline{Q}_1\leq\bar{Q}_1<1\).

Remark. (a) Obviously, when \(x_k\rightarrow x^\ast\), then \(\underline{Q}_1\leq1\) and \(\underline{Q}_1\leq\bar{Q}_1\). (b) \(\bar{Q}_1\in[0,+\infty]\) always exists, while \(Q_1\) may not (i.e., when \(\underline{Q}_1<\bar{Q}_1\)). (c) \(\bar{Q}_1<1\) implies monotone errors, while \(\bar{Q}_1>1\) implies nonmonotone errors; \(\bar{Q}_1=\infty\) means unbounded nonmonotone errors. Unlike \(0<Q_1\leq1\), the condition \(0<\bar{Q}_1\leq+\infty\) alone does not necessarily imply slow speed (e.g., \(\{x_k^a\}\), \(\{x_k^c\}\), with unbounded nonmonotone errors). The convergence may be fast when \(\underline{Q}_1=0\), even if \(\bar{Q}_1=\infty\) (again \(\{x_k^a\}\), \(\{x_k^c\}\)), but it is no longer fast when \(0<\bar{Q}_1\).

Strict Q-superlinear order is an intermediate order between linear and \(p_0>1\).

Definition [77, p. 285]. \(\{x_k\}\) has Q-superlinear order (or at least Q-superlinear, or even just superlinear, as R-superlinear is seldom encountered) if \(\bar{Q}_1=0\) \((=Q_1)\):
\[ \lim_{k\rightarrow\infty}\frac{e_{k+1}}{e_k}=0 \quad (\Leftrightarrow\ \exists c_k\rightarrow0 \text{ s.t. } e_{k+1}=c_k e_k). \]
Strict Q-superlinear order holds when, moreover, \(\bar{Q}_p=+\infty\) \(\forall p>1\).

Proposition [cf. 67]. Q-superlinear order holds iff \(\exists\theta_k>0\) and \(c>0\) such that \(\theta_k\rightarrow0\) and \(e_k=c\prod_{i=1}^{k}\theta_i\) (\(c\) may be taken to be \(1\)).

Example (strict Q-superlinear). (a) \(\{\frac{1}{k^k}\}\), \(\{\frac{1}{10^{k^2}}\}\), \(\{\frac{1}{k!}\}\), \(\{\frac{1}{c^{k^2}}\}\), \(c>1\) [67, E 9.2-1(j)]. (b) [67, E 10.1-4] Given \(0<c<\frac1e\), let \(x_{k+1}=-x_k\ln x_k\), \(k\geq0\), \(x_0=c\). On the other hand, \(\{x_k^b\}\) has Q-superlinear order, but not strict Q-superlinear order.

The following formulation has been commonly used for half a century, since the 1970 book of Ortega and Rheinboldt [67]: “\(\{x_k\}\) converges with Q-order at least \(p_0>1\).” Here we deal, as in [4] and [24], with a more restrictive notion, the classical notion corresponding to the lower Q-order \(q_l\) defined further below.

Definition [see 24; cf. 4, 46]. \(\{x_k\}\) has Q-order \(p_0>1\) if any of the following equivalent conditions holds:
\[ \lim_{k\rightarrow\infty}Q_p(k)=\begin{cases}0, & p\in[1,p_0),\\ +\infty, & p\in(p_0,+\infty);\end{cases} \tag{$Q$} \]
\[ \lim_{k\rightarrow\infty}Q_L(k)=p_0, \qquad Q_L(k):=\frac{\ln e_{k+1}}{\ln e_k}; \tag{$Q_L$} \]
\[ \lim_{k\rightarrow\infty}Q_\Lambda(k)=p_0, \qquad Q_\Lambda(k):=\frac{\ln\frac{e_{k+2}}{e_{k+1}}}{\ln\frac{e_{k+1}}{e_k}}; \tag{$Q_\Lambda$} \]
\[ \forall\varepsilon>0\ \exists A,B>0 \text{ s.t. } A\,e_k^{p_0+\varepsilon}\leq e_{k+1}\leq B\,e_k^{p_0-\varepsilon} \quad \forall k\geq k_0. \tag{$Q_\varepsilon$} \]

Remark. (a) If the Q-order \(p_0\) exists, it is unique (recall the uniqueness remark for the C-order). (b) If \(\{x_k\}\) has Q-order \(p_0>1\), then the right-hand inequalities in (\(Q_\varepsilon\)) imply that the errors are strictly monotone (\(k\geq k_0\)).

In [24] we proved the equivalence of the four conditions above for \(\{x_k\}\subset\mathbb{R}^N\), connecting and completing the independent, fundamental results of Potra [74] and of Beyer, Ebanks, and Qualls [4].
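A small numerical check (my sketch): both computational expressions above settle at the order for the model C-quadratic sequence:

```julia
# Q_L(k) and Q_Lambda(k) both tend to the Q-order p0 = 2 for e_k = 2^(-2^k).
QL(e, k)   = log(e[k+1]) / log(e[k])
QLam(e, k) = log(e[k+2] / e[k+1]) / log(e[k+1] / e[k])

e = [inv(big(2.0)^(big(2)^k)) for k in 0:12]
println([Float64(QL(e, k))   for k in 1:11])   # -> 2.0
println([Float64(QLam(e, k)) for k in 1:11])   # -> 2.0
```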
The proof in [24] does not simplify for \(\{x_k\}\subset\mathbb{R}\), but a natural connection between these conditions is found, e.g., by taking logarithms in the two-sided C-order inequalities in order to obtain (\(Q_L\)). The value \(p_0\) in (\(Q_L\)) roughly means that the number of exact figures is multiplied by \(p_0\) at each step (\(k\geq k_0\)). Conditions (\(Q_L\)) and (\(Q_\Lambda\)) can be further connected as follows.

Exercise. If \(\{x_k\}\) has strict monotone errors for \(k\geq k_0\), prove that (\(Q_\Lambda\)) \(\Rightarrow\) (\(Q_L\)) (hint: use the Stolz-Cesaro theorem).

Remark. (a) One can always set \(k_0=0\) in (\(Q_\varepsilon\)) (as actually used in the definitions of Potra [74]), by a proper choice of a smaller \(A\) and a larger \(B\). (b) Here we use the notation (\(Q_\varepsilon\)) instead of (\(Q_{I,\varepsilon}\)) from [24] (“I” stood for “inequality” in that paper). (c) Condition (\(Q\)) implies that the Q-profile of \(\{x_k\}\) has a single jump, at \(p_0>1\), but the limit \(Q_{p_0}\) is not required to exist, so \(0\leq\underline{Q}_{p_0}\leq\bar{Q}_{p_0}\leq\infty\), and six cases result for the possible values of \(\underline{Q}_{p_0}\) and \(\bar{Q}_{p_0}\) (zero, finite and positive, infinite). (d) The relations \(\underline{Q}_{p_0}=0\) or \(\bar{Q}_{p_0}=+\infty\) might seem rather abstract, but they do actually occur, even simultaneously, as we shall see for the higher orders of the secant method (such situations also arise for problems in \(\mathbb{R}^N\), with \(\underline{Q}_2=\bar{Q}_2=+\infty\)). (e) Q-order \(p_0>1\) implies \(\bar{Q}_1=0\), but the converse is not true. As illustrations, one may draw the Q-profiles of the Q-subquadratic \(\{2^{-2^k/k}\}\) and of the exact Q-quadratic \(\{x_k^d\}\) (\(c=\frac54=1.25\)).

Superlinearity may also be defined in conjunction with the higher Q-orders, but this is tricky.

Definition. The Q-order \(p_0>1\) is called Q-superorder \(p_0\) if \(\bar{Q}_{p_0}=0\) \((=Q_{p_0})\), and Q-suborder \(p_0\) if \(\underline{Q}_{p_0}=+\infty\) \((=Q_{p_0})\). (An earlier definition required only \(\bar{Q}_{p_0}=+\infty\), which we believe is not strong enough.) The Q-order \(2\) with \(\bar{Q}_2=0\) is called Q-superquadratic (and analogously Q-supercubic, etc.).

Example. (a) The Q-superquadratic \(\{\dot{x}_k\}=\{2^{-k2^k}\}\), the Q- (and C-)quadratic \(\{2^{-2^k}\}\), and the Q-subquadratic \(\{\ddot{x}_k\}=\{2^{-2^k/k}\}\) are in [(\(O_\alpha\)) \(\forall\alpha>1\)] decreasing order of speed. Study the Q-order of \(z_k=\dot{x}_k\), \(k\) odd, \(z_k=\ddot{x}_k\), \(k\) even (the value \(2\) given for it in [24] is erroneous). (b) Although the C-quadratic \(\{2^{-2^k}\}\) is (\(o\))-faster than \(\{k\,2^{-2^k}\}\), the latter, considered in [4], is actually Q-superquadratic; similarly, \(\{2^{-2^k/k}\}\) is Q-subquadratic. The terminology “Q-sub/superquadratic” appears here to be in disagreement with the real speed. (c) Determine the Q-orders from the quiz above by using (\(Q_L\)).

Example. Let \(\{10\cdot2^{-3^{(k-1/4^k)}}\}\) (C-cubic), \(\{10\cdot3^{-2^{(k+1/3^k)}}\}\) (C-quadratic), and \(\{\frac{1}{k^k}\}\) (Q-superlinear); the plotted values \(Q_L(k)\) and \(Q_\Lambda(k-1)\) tend to \(3\), \(2\), and \(1\), respectively. Plotting/comparing \(Q_L(k)\) against \(Q_\Lambda(k-1)\) avoids conclusions based on \(Q_\Lambda(k)\) using information ahead of that from \(Q_L(k)\) (i.e., \(x_{k+2}\) vs. \(x_{k+1}\)).

The calculation of the order may be made easier by using logarithms. Let \(x_k\rightarrow x^\ast\in\mathbb{R}\).
Exercise. (a) If, for some given \(q\geq1\) and \(A,B>0\), one has
\[ A\,e_k e_{k-1}^q\leq e_{k+1}\leq B\,e_k e_{k-1}^q, \quad k\geq k_0, \]
show that \(\{x_k\}\) has Q-order \(\lambda_q:=\frac{1+\sqrt{1+4q}}{2}\) (a derivation sketch is given at the end of this subsection). Thus \(\lambda_1\approx1.618\) (the golden ratio), \(\lambda_2=2\), \(\lambda_3=\frac{1+\sqrt{13}}{2}\approx2.3\), etc. Such inequalities appear in the analysis of the secant method, and they do not necessarily attract C- or even exact Q-orders, even if \(\exists\lim_{k\rightarrow\infty}\frac{e_{k+1}}{e_k e_{k-1}^q}=c\in(0,\infty)\). (b) Calculate the Q-order of \(\{x_k\}\) if it satisfies
\[ A\,e_k^2 e_{k-1}\leq e_{k+1}\leq B\,e_k^2 e_{k-1}. \]
(c) If \(\{x_k\}\) verifies the general relation
\[ A\,e_k^{\alpha_0}\cdots e_{k-q}^{\alpha_q}\leq e_{k+1}\leq B\,e_k^{\alpha_0}\cdots e_{k-q}^{\alpha_q}, \]
determine the condition verified by the Q-order.

A special case holds when \(\varepsilon=0\) in (\(Q_\varepsilon\)).

Definition. \(\{x_k\}\) has exact Q-order \(p_0\) if \(\exists A,B>0\), \(k_0\geq0\), s.t.
\[ A\,e_k^{p_0}\leq e_{k+1}\leq B\,e_k^{p_0} \quad \forall k\geq k_0. \]
This case can be characterized in terms of the asymptotic constants \(\underline{Q}_{p_0}\), \(\bar{Q}_{p_0}\).

Proposition. \(\{x_k\}\) has exact Q-order \(p_0\) iff these constants are finite and nonzero, \(0<\underline{Q}_{p_0}\leq\bar{Q}_{p_0}<\infty\), in which case the constants \(A,B\) above satisfy \(A\leq\underline{Q}_{p_0}\leq\bar{Q}_{p_0}\leq B\); these bounds are attained only in some special, restrictive circumstances.

Example. (a) \(\{x_k^d\}\) has exact Q-order \(2\): \(\underline{Q}_2=\frac{1}{c^3}\neq\bar{Q}_2=c^3\). (b) \(x_k=2^{-2^k}\), \(k\) odd, \(x_k=3\cdot2^{-2^k+3^{-3k}}\), \(k\) even, has \(Q_2(2k)<\underline{Q}_2=\frac19\), i.e., \(A<\underline{Q}_2\).

Two error sequences \(\{\ddot{e}_k\}\) and \(\{\mathring{e}_k\}\) are asymptotically similar when \(\ddot{e}_k=\mathcal{O}(\mathring{e}_k)\) and \(\mathring{e}_k=\mathcal{O}(\ddot{e}_k)\) as \(k\rightarrow\infty\), denoted \(\mathring{e}_k=\Theta(\ddot{e}_k)\) (see, e.g., Knuth [52]). The exact Q-order may thus also be expressed as \(e_{k+1}=\mathcal{O}(e_k^{p_0})\) and \(e_k^{p_0}=\mathcal{O}(e_{k+1})\) as \(k\rightarrow\infty\), i.e., \(e_{k+1}=\Theta(e_k^{p_0})\).

The classical Q-order of Ortega and Rheinboldt [67] is the \(q_l\) below.

Definition [67], [74], [4], [24]. The lower and upper Q-orders of \(\{x_k\}\) are
\[ q_l=\begin{cases}\infty, & \text{if } \bar{Q}_p=0\ \forall p\geq1,\\ \inf\{p\in[1,\infty):\bar{Q}_p=+\infty\} & \text{otherwise},\end{cases} \qquad q_u=\sup\{p\in[1,\infty):\underline{Q}_p=0\}. \]
When \(\{x_k\}\) has Q-order \(p_0>1\), the lower and upper orders coincide, \(q_l=q_u=p_0\); otherwise \(q_l<q_u\) (this will also be seen in the theorem relating the orders, below).

Next we analyze \(\{x_k^b\}\), which has no Q-order, despite being (\(\leq\))-faster than \(\{2^{-2^k}\}\). We keep in mind that the Q-order does not measure the overall speed, but just compares the consecutive errors.

Example [24]. Let \(x_k^b=2^{-2^k}\), \(k\) even, \(x_k^b=3^{-2^k}\), \(k\) odd. \(\{x_k^b\}\) does not have a Q-order, as \(q_l=\log_34=1.2\ldots<q_u=4\log_43=3.1\ldots\), but it still has (exact) R-order \(2\). We note that the usual statement in this case, “\(\{x_k^b\}\) converges with Q-order at least \(q_l=1.2\ldots\) and with R-order at least \(2\),” no longer holds in the setting of this paper.
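The announced derivation sketch for part (a) of the exercise above (my addition; a heuristic only, the rigorous proof being the object of the exercise):

```latex
% If e_{k+1} \approx c\, e_k e_{k-1}^q and the errors behave like
% e_{k+1} \approx e_k^p, then substituting e_k \approx e_{k-1}^p gives
% e_{k-1}^{p^2} \approx e_{k-1}^{p+q}, so the order p must satisfy
\[
  p^2 = p + q, \qquad p = \lambda_q = \frac{1+\sqrt{1+4q}}{2},
\]
% e.g., q = 1 yields the golden ratio (secant method) and q = 2 yields p = 2.
```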
Exercise. Show that \(\{x_k^c\}\) has \(\underline{Q}_1=0\), \(\bar{Q}_1=+\infty\), and determine \(q_l\), \(q_u\) (here we find \(q_l<1\), and the Q-profile can be extended accordingly for \(p<1\)). Determine the Q-profiles of \(\{x_k^a\}\) and \(\{x_k^c\}\).

The inequalities implied by the lower/upper Q-orders are as follows. If \(1<q_l<\infty\), then \(\forall\varepsilon>0\) with \(1<q_l-\varepsilon\), \(\exists B>0\), \(k_0\geq0\), s.t.
\[ e_{k+1}\leq B\,e_k^{q_l-\varepsilon} \quad \forall k\geq k_0 \]
(and \(q_l\) is the largest value satisfying such inequalities). When \(1<q_u<\infty\), then \(\forall\varepsilon>0\) \(\exists A>0\), \(k_1\geq0\), such that
\[ A\,e_k^{q_u+\varepsilon}\leq e_{k+1} \quad \forall k\geq k_1. \]
One can say that the lower Q-order is small when the maximum error reduction w.r.t. the previous step is small (i.e., a small exponent of \(e_k\)); analogously, the upper order is large when the minimum error reduction per step is small. The exact lower/upper Q-orders \(q_l\), resp. \(q_u\), are obtained for \(\varepsilon=0\) above:
\[ A\,e_k^{q_u}\leq e_{k+1}\leq B\,e_k^{q_l} \quad \forall k\geq k_2. \]
The role of \(q_u\) will be seen in the example of Jay below. The lower and upper Q-orders verify the relations (see [24])
\[ q_l=\underline{Q}_L:=\liminf_{k\rightarrow\infty}\frac{\ln e_{k+1}}{\ln e_k}, \qquad q_u=\bar{Q}_L:=\limsup_{k\rightarrow\infty}\frac{\ln e_{k+1}}{\ln e_k}, \]
which are completed in the paper by some computational variants.

R-Order \(p_0>1\)

The root factors consider some averaged quantities instead of relating the consecutive terms to one another; they are defined for \(p=1\) by
\[ R_1(k)=e_k^{\frac1k}, \quad k\geq1. \]
We propose the following terminology:
- R-sublinear/with no R-order, if \(\bar{R}_1=1\) (e.g., \(\{\frac{1}{\sqrt k}\}\), \(\{\frac1k\}\));
- R-linear, if \(0<\bar{R}_1<1\);
- at least R-linear, if \(\bar{R}_1<1\) (e.g., \(x_k=\frac{1}{2^k}\), \(k\) odd, \(x_k=\frac{1}{4^k}\), \(k\) even; or \(x_k=\frac{1}{2^k}\), \(k\) even, \(x_k=\frac{1}{2^{2k}}\), \(k\) odd);
- exact R-linear, if \(0<\underline{R}_1\leq\bar{R}_1<1\);
- (at least) R-superlinear, if \(\bar{R}_1=0\) \((=R_1)\) (e.g., \(x_k=\frac{1}{k^k}\), \(k\) even, \(x_k=x_{k-1}\), \(k\) odd).
The R-orders may alternatively be defined by requiring the errors to be bounded by sequences converging to zero with corresponding Q-orders [67]:
- at least R-linear, if \(\exists\theta\in(0,1)\) and \(c>0\) s.t. \(e_k\leq c\,\theta^k\), \(k\geq0\);
- R-superlinear, if \(\exists\theta_k\rightarrow0\) and \(c>0\) s.t. \(e_k\leq c\prod_{i=1}^{k}\theta_i\).

Remark. (a) At least Q-linear implies at least R-linear (using, e.g., the well-known liminf/limsup inequalities recalled in the theorem below); the converse is false (see the examples above). (b) Q-superlinear implies R-superlinear, the converse again being false (see, e.g., the example above).

When \(p>1\), the root factors are defined by [67], [74]
\[ R_p(k)=e_k^{\frac{1}{p^k}}, \quad k\geq0. \]

Definition [see 24; cf. 67, 74]. \(\{x_k\}\) has R-order \(p_0>1\) if any of the following equivalent relations holds (for (\(R_\Lambda\)) to be properly defined and equivalent to the rest, an additional assumption is required: \(1<q_l\leq q_u<+\infty\)):
\[ \lim_{k\rightarrow\infty}R_p(k)=\begin{cases}0, & p\in[1,p_0),\\ 1, & p\in(p_0,+\infty);\end{cases} \tag{$R$} \]
\[ \lim_{k\rightarrow\infty}R_L(k)=p_0, \qquad R_L(k):=\big|\ln e_k\big|^{\frac1k}; \tag{$R_L$} \]
\[ \lim_{k\rightarrow\infty}R_\Lambda(k)=p_0, \qquad R_\Lambda(k):=\Big|\ln\tfrac{e_{k+1}}{e_k}\Big|^{\frac1k}; \tag{$R_\Lambda$} \]
\[ \forall\varepsilon>0\ \exists A,B>0,\ 0<\eta,\theta<1,\ k_0\geq0, \text{ s.t. } A\,\eta^{(p_0+\varepsilon)^k}\leq e_k\leq B\,\theta^{(p_0-\varepsilon)^k} \quad \forall k\geq k_0. \tag{$R_\varepsilon$} \]

Remark. (a) If the R-order \(p_0\) exists, it is unique (see also [67, 9.2.3]).
(b) We can always set \(k_0=0\) in \cref{f. def R_eps}, by suitably choosing a smaller \(A\) and a larger \(B\) (cf. the works where \(k_0=0\) was used in the definitions). (c) We prefer here the notation \((R_\varepsilon)\) instead of \((R_{I,\varepsilon})\).
For \(p\ge1\) one defines the asymptotic quantities (see sect.~9.2 and p.~85)
\[
\underline{R}_p=\liminf_{k\to\infty}R_p(k),\qquad \bar{R}_p=\limsup_{k\to\infty}R_p(k)\le1.
\]
An example of an R-profile is shown in \cref{fig: conv profile exact R-order 2 for xak} for \(\{x_k^b\}\), with (exact) R-order \(2\).
[p.~290] If \(\bar{R}_{p_0}<1\), then \(\forall\varepsilon>0\) with \(\bar{R}_{p_0}+\varepsilon<1\), \(\exists k_0\ge0\) s.t.
\[
e_k\le(\bar{R}_{p_0}+\varepsilon)^{p_0^k}\qquad\forall k\ge k_0,
\]
while if \(\underline{R}_{p_0}>0\), then \(\forall\varepsilon>0\) with \(0<\underline{R}_{p_0}-\varepsilon\), \(\exists k_1\ge0\) s.t.
\[
(\underline{R}_{p_0}-\varepsilon)^{p_0^k}\le e_k\qquad\forall k\ge k_1.
\]
The following notion is defined similarly to the exact Q-order: \(\{x_k\}\) has exact R-order \(p_0\) if \(\exists A,B>0\), \(0<\eta,\theta<1\), s.t.
\[
A\cdot\eta^{p_0^k}\le e_k\le B\cdot\theta^{p_0^k}\qquad\forall k\ge k_0.
\]
\(\{x_k^b\},\{x_k^c\}\) have exact R-order \(2\): \(\underline{R}_2=\frac13\), \(\bar{R}_2=\frac12\), respectively \(\underline{R}_2=\frac15\), \(\bar{R}_2=\frac12\). The Q-sub-/superquadratic \(\{2^{-2^k}/k\}\), \(\{k\,2^{-2^k}\}\) have \(R_2=\frac12\), and thus exact R-order \(2\), though not exact Q-order \(2\). The Q-sub-/superquadratic \(\{2^{-2^k/k}\}\), \(\{2^{-k\,2^k}\}\) have \(R_2=1\), respectively \(R_2=0\), i.e., they are R-sub-/superquadratic, again without having exact R-order \(2\). Perhaps a proper definition of Q-superorder \(p_0\) would be Q-order \(p_0\) with \(Q_{p_0}=0=R_{p_0}\), while Q-suborder \(p_0\) would be Q-order \(p_0\) with \(Q_{p_0}=\infty\), \(R_{p_0}=1\).
We characterize the exact R-order in the following result: the exact R-order \(p_0\) of \(\{x_k\}\) is equivalently defined by
\[
0<\underline{R}_{p_0}\le\bar{R}_{p_0}<1,
\]
which implies the following bounds for \(\eta,\theta\) from \cref{eq: def exact R-order A eta^p^k < ek < B theta^p^k}:
\[
\eta\le\underline{R}_{p_0}\le\bar{R}_{p_0}\le\theta;
\]
these bounds are attained only in some special, restrictive circumstances.
A particular instance of \cref{eq: def exact R-order A eta^p^k < ek < B theta^p^k}, namely \(e_k\le B\cdot\theta^{2^k}\), was considered as a definition of (at least) R-quadratic convergence, and some computational scientists (who we suspect are not computational optimizers) have misled by simply calling it "quadratic convergence," leading to the confusion noted in the cited work.
As the R-orders require weaker conditions than the Q-orders, they may allow nonmonotone errors. This aspect is perhaps not widely known; we have found it pointed out on p.~620, in the 1st ed., p.~14, and on p.~51. Clearly, iterative methods with nonmonotone errors are usually not desired. We can easily find conditions for monotone errors in the case of exact R-order \(p_0>1\): if \(\{x_k\}\) obeys \cref{eq: def exact R-order A eta^p^k < ek < B theta^p^k} and
\[
p_0>\frac{\ln\eta}{\ln\theta}\ \Big(\ge\frac{\ln\bar{R}_{p_0}}{\ln\underline{R}_{p_0}}\Big)
\quad\text{or}\quad
\Big(p_0=\frac{\ln\eta}{\ln\theta}\ \text{and}\ B<A\Big),
\]
then it has strictly monotone errors (\(k\ge k_0\)).
The lower and upper R-orders, \(r_l\), respectively \(r_u\), and some further notation, are
\[
r_l=\inf\{p\in[1,\infty):\bar{R}_p=1\},\qquad r_u=\sup\{p\in[1,\infty):\underline{R}_p=0\},
\]
\[
\underline{R}_L:=\liminf_{k\to\infty}\big|\ln e_k\big|^{\frac1k},\qquad
\bar{R}_L:=\limsup_{k\to\infty}\big|\ln e_k\big|^{\frac1k},
\]
\[
\underline{R}_\Lambda:=\liminf_{k\to\infty}\Big|\ln\frac{e_{k+1}}{e_k}\Big|^{\frac1k},\qquad
\bar{R}_\Lambda:=\limsup_{k\to\infty}\Big|\ln\frac{e_{k+1}}{e_k}\Big|^{\frac1k}.
\]
The lower R-order \(r_l\) of \(\{x_k\}\) was also defined as follows: \(\exists\{\mathring{x}_k\}\) with C-order \(\mathring{p}_0=r_l\) s.t. \(\{x_k\}\) is \cref{eq: |x*-xk| <= |x*-yk|} faster than \(\{\mathring{x}_k\}\) (Def.~4.1.3).
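The distinct behavior of the Q- and R-quantities for \(\{x_k^b\}\) can be observed numerically; the following is a minimal Julia sketch (our illustration, not code from the paper's repository):
\begin{verbatim}
# Sketch: x_k = 2^(-2^k) (k even), 3^(-2^k) (k odd) has no Q-order
# (Q_L(k) oscillates between q_l = log_3(4) and q_u = 4 log_4(3)),
# yet R_L(k) = |ln e_k|^(1/k) -> 2, i.e., R-order 2.
setprecision(BigFloat, 256)
e(k) = iseven(k) ? big(2.0)^(-big(2.0)^k) : big(3.0)^(-big(2.0)^k)
for k in 2:12
    println("k = $k   Q_L(k) = ", Float64(log(e(k + 1)) / log(e(k))),
            "   R_L(k) = ", Float64(abs(log(e(k)))^(1 / k)))
end
\end{verbatim}
The printed \(Q_L\) values alternate between approximately \(1.26\) and \(3.17\), while \(R_L\) steadily approaches \(2\).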
We now consider some sequences similar to those in \cref{ex. xk w err betw ord 2 3}.
(a) Let \(\mathring{x}_k=2^{-2^k}\) and take
\[
\dot{x}_k=\begin{cases}2^{-2^k},&k\text{ even},\\ 2^{-3^k},&k\text{ odd}\end{cases}
\]
(Jay). Then \(\{\dot{x}_k\}\) converges \cref{eq: |x*-xk| <= |x*-yk|} faster than \(\{\mathring{x}_k\}\), and though \(\{\dot{x}_k\}\) has neither C-order, nor Q-order, nor R-order (\(\dot{r}_l=2\), \(\dot{r}_u=3\)), \(\{\mathring{x}_k\}\) has C-order \(2\).
(b) Extending the above behavior,
\[
\dot{x}_k=\begin{cases}2^{-4^k},&k\text{ even},\\ 2^{-5^k},&k\text{ odd}\end{cases}
\]
has no C-, Q-, or R-order, but it converges [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} \(\forall\alpha>1\)] faster than \(\mathring{x}_k=2^{-2^k}\).
(c) The sequence
\[
\dot{x}_k=\begin{cases}2^{-3^{k^2}},&k\text{ even},\\ 2^{-4^{k^2}},&k\text{ odd}\end{cases}
\]
is [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} \(\forall\alpha>1\)] faster than \(\mathring{x}_k=2^{-2^{k^2}}\); \(\{\dot{x}_k\}\) has no C- or Q-order (but has infinite R-order), while \(\{\mathring{x}_k\}\) has infinite C-order. Here \(\{\dot{x}_k\}\) deserves a closer look, as it is a perfect candidate to support the comments mentioned earlier: indeed, it attains infinite R-order, but its errors are unbounded nonmonotone. Nocedal and Wright (p.~620) also noted such possible behavior.
Clearly, statements such as those in the above example can appear only for sequences \(\{\dot{x}_k\}\) with \(\underline{Q}_1=0\), as this condition allows large upper orders \(q_u\). When \(\underline{Q}_1>0\), the corresponding sequence cannot converge quickly.
Relations between the C-, Q-, and R-Orders of a Sequence. C-order \(p_0>1\) of \(\{x_k\}\) implies Q-order \(p_0\), which in turn implies R-order \(p_0\) (see \cref{exe. xk has Q-order 2 but no C-order,ex. xk = 2^-2^k k even =3^-2^k k odd} for the converses). We state this below and use curly braces for equivalent orders. Let \(x_k\rightarrow x^*\) and \(p_0>1\). Then
\[
\{C,C_Q,C_\varepsilon\}\ \Rightarrow\kern-1.2em\not\Leftarrow\ \{Q,Q_L,Q_\Lambda,Q_\varepsilon\}\ \Rightarrow\kern-1.2em\not\Leftarrow\ \{R,R_L,R_\Lambda,R_\varepsilon\},
\]
or, in a more generic fashion (i.e., \(\{C\}:=\{C,C_Q,C_\varepsilon\}\)),
\[
\{C\}\ \Rightarrow\kern-1.2em\not\Leftarrow\ \{Q\}\ \Rightarrow\kern-1.2em\not\Leftarrow\ \{R\}.
\]
Moreover, the following relation holds for the lower and upper orders:
\[
q_l=\underline{Q}_L\le\underline{R}_L=\underline{R}_\Lambda=r_l\le r_u=\bar{R}_L=\bar{R}_\Lambda\le\bar{Q}_L=q_u.
\]
Relation \cref{f. q_l <= r_l <= r_u <= q_u} is obtained from the well-known inequalities for positive numbers,
\[
\liminf_{k\to\infty}\frac{a_{k+1}}{a_k}\le\liminf_{k\to\infty}|a_k|^{\frac1k}\le\limsup_{k\to\infty}|a_k|^{\frac1k}\le\limsup_{k\to\infty}\frac{a_{k+1}}{a_k},
\]
taking \(a_k=|\ln e_k|\). Any inequality from \cref{f. q_l <= r_l <= r_u <= q_u} may be strict. Now we see (e.g., in \cref{ex. xk = 2^-2^k k even =3^-2^k k odd,ex Potra-Ptak xk w r-order large and q-order small}) that the inner inequality may be an equality (i.e., an R-order is obtained), while one of the outer inequalities may be strict (i.e., no Q-order). An \(\{x_k\}\) with exact R-order \(\tau>1\) arbitrarily large and with \(q_l>1\) arbitrarily close to \(1\) is again suitable for justifying the comments above.
[E~9.3-3] Given any numbers \(1<s<\tau\), take \(0<\theta<1\), \(\eta=\theta^q\) with \(q>1\) such that \(qs>\tau\). Then
\[
x_k=\begin{cases}\theta^{\tau^k},&k\text{ odd},\\ \eta^{\tau^k},&k\text{ even}\end{cases}
\]
has exact R-order \(\tau\), while \(q_l=\underline{Q}_L=\frac{\tau}{q}\) and \(q_u=\bar{Q}_L=\tau q\,(>\tau)\), and thus it has no Q-order (in the classical sense, \(\{x_k\}\) has Q-order at least \(\frac{\tau}{q}\)).
Comparing the Speeds of Two Sequences. We consider only C-orders. [higher C-order, faster speed] If \(\{\mathring{x}_k\},\{\dot{x}_k\}\) have C-orders \(1<\mathring{p}_0<\dot{p}_0\), then \(\{\dot{x}_k\}\) is [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} \(\forall\alpha>1\)] faster than \(\{\mathring{x}_k\}\). For the proof, one may use \cref{eq. C-order p0 <=> (Qp0-eps).ek^p0 <= ekp1 <= (Qp0+eps).ek^p0}; we show some simplified proofs elsewhere.
(a) \cref{comp: C-order xk < yk} holds regardless of the magnitude of \(\dot{p}_0-\mathring{p}_0\).
(b) The C-orders (i.e., sublinear, linear, strict superlinear, \(1<\mathring{p}_0<\dot{p}_0\), and infinite) form classes of [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} \(\forall\alpha>1\)] increasing speed (cf.~\cref{exe. C-sublinear is ek<=Oek^a faster than C-linear}).
As seen in previous examples, comparing by [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))} \(\forall\alpha>1\)] is much stronger than comparing by [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))}, \(\alpha\in(1,\alpha_0]\)] for some given \(\alpha_0>1\).
How similarly do two sequences with the same C-order and identical \(Q_{p_0}\) behave? \(\{\frac{1}{3^{2^k}}\}\) is [\cref{eq: |x*-yk| = O(|x*-xk|^(1+al))}, \(\alpha\in(1,\frac{\ln3}{\ln2}]\)] faster than \(\{\frac{1}{2^{2^k}}\}\) (recall \cref{exe. 1/3^2^k = O((1/2^2^k)^1+alpha)}). This shows that if \(\{\mathring{x}_k\},\{\dot{x}_k\}\) have C-order \(p_0\) with the same \(Q_{p_0}\), this does not necessarily mean that (in the asymptotic range) one has \(\{\dot{e}_k\}\approx\{\mathring{e}_k\}\). Brezinski noted that both \(\{\frac1k\}\) and \(\{\frac{1}{k^2}\}\) have the same \(Q_1=1\) (i.e., C-sublinear order), but quite different speeds. Assessing the asymptotic constant \(Q_{p_0}\) may not even make sense when \(\{x_k\}\subset\mathbb{R}^N\), \(N\ge2\), as its definition is norm-dependent (E~9.1-2).
Computational Versions of the Convergence Orders. The error-based analysis of the orders above is similar in some ways to the local convergence theory for iterative methods from \cref{sec: iterative meth for nonlin eq}: both assume the existence of the limit/solution \(x^*\) and infer essential properties, even though \(x^*\) is not known and in practice one needs quantities based solely on the information available at each step \(k\). Next we analyze two practical approaches, equivalent to the error-based analysis: replacing \(|x^*-x_k|\) by (the absolute value of) either the corrections \(s_k:=|x_{k+1}-x_k|\) or the nonlinear residuals \(|f_k|:=|f(x_k)|\). We keep the notation from previous work and obtain a rather theoretical setting in this analysis (e.g., \(s_k\) requires \(x_{k+1}\)); in the numerical examples, however, we will use only information available at step \(k\) (i.e., \(x_k\), \(|f_k|\), \(s_{k-1}\), \(x_{k-1}\), \(|f_{k-1}|\), \(s_{k-2}\), etc.).
Computational Convergence Orders Based on Corrections. When the corrections \(\{s_k\}\) converge with lower Q-order \(q_l\), then \(\{x_k\}\) also converges, and it attains at least lower R-order \(q_l\). [see Thm.~1.2.42, Lem.~4.5.6] Let \(\{x_k\}\subset\mathbb{R}\). If \(\exists c\in(0,1)\) and \(k_0\in\mathbb{N}\) with
\[
|x_{k+1}-x_k|\le c\cdot|x_k-x_{k-1}|\quad(\text{i.e., } s_k\le c\cdot s_{k-1}),\qquad k\ge k_0+1,
\]
then \(\exists x^*\in\mathbb{R}\) such that \(x_k\rightarrow x^*\) at least R-linearly. If \(\exists c>0\), \(p_0>1\), and \(k_0\in\mathbb{N}\) s.t. \(c^{\frac{1}{p_0-1}}s_{k_0}<1\) and
\[
s_k\le c\cdot s_{k-1}^{p_0},\qquad k\ge k_0+1,
\]
then \(\exists x^*\in\mathbb{R}\) such that \(x_k\rightarrow x^*\) with lower R-order at least \(p_0\).
The errors and corrections are tightly connected when the convergence is fast. [Potra--Pt\'ak--Walker lemma; Prop.~6.4] \(x_k\rightarrow x^*\) Q-superlinearly iff \(x_{k+1}-x_k\rightarrow0\) Q-superlinearly. In the case of Q-superlinear convergence, it holds (the Dennis--Mor\'e lemma) that
\[
\lim_{k\to\infty}\frac{|x_{k+1}-x_k|}{|x^*-x_k|}=1.
\]
(a) As pointed out by Dennis and Mor\'e, \cref{eq: lim sk/ek = 1 Dennis-More} alone does not imply Q-superlinear convergence: take, for example, \(x_{2k-1}=\frac{1}{k!}\), \(x_{2k}=2x_{2k-1}\), \(k\ge1\). (b) While the statement and the proofs of this result consider multidimensional spaces and norms (or even metrics) in \cref{eq: lim sk/ek = 1 Dennis-More}, Brezinski has noted that in \(\mathbb{R}\) the sufficiency can also be proved using l'H\^opital's rule for sequences.
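Remark (a) is easily checked numerically; a minimal sketch (our illustration, with \(x^*=0\)):
\begin{verbatim}
# Sketch of the Dennis-More remark: x_{2k-1} = 1/k!, x_{2k} = 2/k! gives
# s_k/e_k = |x_{k+1}-x_k| / |x_k|  ->  1, yet e_{k+1}/e_k does not tend to 0
# (it oscillates around 2), so the convergence is not Q-superlinear.
x = Float64[]
for k in 1:8
    push!(x, 1 / factorial(k))     # x_{2k-1} = 1/k!
    push!(x, 2 / factorial(k))     # x_{2k}   = 2/k!
end
for k in 1:length(x)-1
    println("k = $k   s_k/e_k = ", abs(x[k+1] - x[k]) / abs(x[k]),
            "   e_{k+1}/e_k = ", abs(x[k+1]) / abs(x[k]))
end
\end{verbatim}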
As in previous work, we use an apostrophe for the resulting quotient factors (\(p>1\)):
\[
Q_p'(k):=\frac{s_{k+1}}{s_k^p}=\frac{|x_{k+2}-x_{k+1}|}{|x_{k+1}-x_k|^p},\qquad k\ge0,
\]
and the above lemma shows that C-order \(p_0\) \(\Rightarrow\) C'-order \(p_0\) with \(Q_{p_0}=Q_{p_0}'\). The same statement holds for the Q-orders and the corresponding upper/lower limits of \(Q_{p_0}(k)\). The equivalence of the error-based and computational orders (\(\{C,C'\}\), etc.) is stated in \cref{subsec: Equiv of conv orders and computational versions}, which incorporates the nonlinear residuals as well. The C'-, Q'-, and \(Q_\varepsilon'\)-orders are semicomputational: they do not use \(x^*\), but they still require \(p_0\). Instead, of much practical interest are the (fully) computational expressions
\[
\lim_{k\to\infty}Q_L'(k)=p_0,\qquad Q_L'(k):=\frac{\ln s_{k+1}}{\ln s_k},
\]
\[
\lim_{k\to\infty}Q_\Lambda'(k)=p_0,\qquad Q_\Lambda'(k):=\frac{\ln\frac{s_{k+2}}{s_{k+1}}}{\ln\frac{s_{k+1}}{s_k}},
\]
as they are equivalent to the corresponding error-based orders (see \cref{cor. equiv of all C C' C'' resp Q and R-orders}).
Computational Convergence Orders Based on Nonlinear Residuals. In \cref{sec: iterative meth for nonlin eq} we study the speed of some iterative methods toward a zero \(x^*\) of \(f\). The nonlinear mapping \(f:D\subseteq\mathbb{R}\rightarrow\mathbb{R}\) is assumed to be sufficiently many times differentiable, and the solution is assumed to be simple: \(f'(x^*)\ne0\) (this means that \(f(x)=(x-x^*)g(x)\) with \(g(x^*)\ne0\)). (This terminology is used for equations in \(\mathbb{R}\); for systems in \(\mathbb{R}^N\) the terminology is "nonsingular," as the Jacobian \(F'(x^*)\) is a matrix.) However, we will consider multiple solutions as well.
The use of the nonlinear residuals for controlling the convergence to \(x^*\) is natural, as they can be seen as "proportional" to the errors: by the Lagrange theorem,
\[
f(x)=f'\big(x+\theta(x^*-x)\big)(x-x^*)\qquad\text{for some }\theta\in(0,1),
\]
which in the applied sciences is usually written as
\[
f(x)\approx f'(x^*)(x-x^*)\qquad(x\approx x^*).
\]
However, instead of an asymptotic relation, this is often regarded as an approximate equality. Such an approach is as precise as the idiomatic "the higher the order, the fewer the iterations (needed)": it may not refer to the asymptotic range. If \(f\) is a polynomial of degree \(2\) with \(f'(x^*)=1\) at the root \(x^*=0\), how large can \(|f(x)|\) be when \(|x-x^*|=0.0001\) (and then when \(|x-x^*|=10^{-16}\))?
Returning to the quotient factors, we use double prime marks,
\[
Q_{p_0}''(k):=\frac{|f(x_{k+1})|}{|f(x_k)|^{p_0}}=:\frac{|f_{k+1}|}{|f_k|^{p_0}},
\]
and notice that if \(Q_{p_0}\in(0,+\infty)\), then, by \cref{f. Lagrange f(x) approx f'(x*)(x-x*)},
\[
Q_{p_0}''(k)\rightarrow\frac{Q_{p_0}}{|f'(x^*)|^{p_0-1}}.
\]
This leads us to the C''- and Q''-orders \(p_0>1\).
The Equivalence of the Error-Based and Computational Orders. This equivalence was given in \(\mathbb{R}^N\); here we formulate it as a corollary, since in \(\mathbb{R}\) the three C-orders are equivalent (see Ex.~3.22 for \(\{C''\}\not\Rightarrow\{C,C'\}\) in \(\mathbb{R}^N\)). Let \(f\) be smooth, \(x^*\) simple, \(x_k\rightarrow x^*\), \(p_0>1\). Then
\[
\{C,C',C''\}\ \Rightarrow\kern-1.2em\not\Leftarrow\ \{Q,Q',Q''\}\ \Rightarrow\kern-1.2em\not\Leftarrow\ \{R,R',R''\}.
\]
Moreover, the lower/upper orders and the asymptotic constants obey corresponding equalities. Consequently, one has C-order \(p_0>1\) iff
\[
0<Q_{p_0}=Q_{p_0}'=|f'(x^*)|^{p_0-1}Q_{p_0}''<\infty,
\]
and similarly for the Q-order \(p_0>1\). Relation \cref{f. Q_L=Q'_L=q_l and ^} shows the equalities of the orders, while \cref{f. barbelow Q_q_u = barbelow Q_q_u^prime} shows those of the constants.
The equivalence of the computational orders based on the corrections follows from the Potra--Pt\'ak--Walker lemma, while \cref{f. Q_L=Q'_L=q_l and ^} and \cref{f. barbelow Q_q_u = barbelow Q_q_u^prime} follow from the Dennis--Mor\'e lemma. The equivalence of the computational orders based on the nonlinear residuals is obtained using the Lagrange theorem, by \cref{f. Lagrange f(x) approx f'(x*)(x-x*)}.
Iterative Methods for Nonlinear Equations. Unless otherwise stated, the nonlinear mapping \(f:D\subseteq\mathbb{R}\rightarrow\mathbb{R}\) is sufficiently smooth and the solution \(x^*\) is simple. Since an equation \(f(x)=0\) may have a unique solution, several (even infinitely many) solutions, or no solution at all, the iterative methods usually need an initial approximation close to the chosen \(x^*\). [Def.~10.1.1] A method converges locally to \(x^*\) if \(\exists V\subseteq D\) (a neighborhood of \(x^*\)) s.t. \(\forall x_0\in V\), the generated iterates \(\{x_k\}\subset D\) and \(x_k\rightarrow x^*\).
The Newton Method. The first result on the local convergence of a method was given for the Newton iterations (see NR~10.2-1, sect.~6.4),
\[
x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)},\qquad k=0,1,\ldots,\ x_0\in D,
\]
and it is due to Cauchy in 1829 (available on the Internet). These iterations are also known as the tangent method, though their initial devising did not consider this geometric interpretation; the first such interpretation was given by Mourraille (1768) (sect.~6.3).
History. This method can be traced back to ancient times (Babylon and Egypt, ca.~1800 B.C.), as it is conjectured that it was used in the computation (with five correct, equivalent, decimal digits) of \(\sqrt2\) (see sect.~1.7 with the references therein, and also p.~39 for an annotated picture). While this is not certain, there is no doubt that Heron used these iterations for a square root: "Heron of Alexandria (first century A.D.) seems to have been the first to explicitly propose an iterative method of approximation in which each new value is used to obtain the next value," as noted by Chabert et al. (p.~200). Indeed, Heron approximated \(\sqrt{720}\) in an iterative fashion by elements corresponding to the Newton method for solving \(x^2-720=0\) (sect.~7.1). We can also recognize the Newton method for \(x^2-A=0\) in the computation used by Theon of Alexandria (ca.~370 A.D.) to approximate \(\sqrt{4500}\), based entirely on geometric considerations (originating from Ptolemy; see sect.~7.2). So efficient is this method in computing two of the basic floating point arithmetic operations, the square root and division (see \cref{exe: Newton iterates for sqrt and division}), that it is still a choice even in current (2021) codes (see, e.g., pp.~138, 124).
Let us recall some general comments on the subsequent history of this method, thoroughly analyzed by Ypma: "A method algebraically equivalent to Newton's method was known to the 12th century algebraist Sharaf al-Din al-Tusi [...], and the 15th century Arabic mathematician Al-Kashi used a form of it in solving \(x^p-N=0\) to find roots of \(N\). In western Europe a similar method was used by Henry Briggs in his Trigonometria Britannica, published in 1633, though Newton appears to have been unaware of this [...]."
In solving nonlinear problems using such iterates, Newton (\(\approx1669\)) and subsequently Raphson (1690) dealt only with polynomial equations; \(x^3-2x-5=0\) is "the classical equation where the Newton method is applied." Actually, as noted by Ypma, Newton also considered such iterations in solving a transcendental equation, namely, the Kepler equation \(x-e\sin x=M\).
"However, given the geometrical obscurity of the argument, it seems unlikely that this passage exerted any influence on the historical development of the Newton-Raphson technique in general," as described by Ypma. Newton considered the process of successively computing the corrections, which are ultimately added together to form the final approximation; see \cref{alg. devised by Newton}.
The algorithm devised by Newton (left), and his first example (right):
Let \(f(x)\), \(x_0\) \hfill [\(f(x)=x^3-2x-5\), \(x_0=2\)]
Let \(g_0(s)=f(x_0+s)\) \hfill [\(g_0(s)=(2+s)^3-2(2+s)-5=s^3+6s^2+10s-1\)]
For \(k=0:m-1\)
\quad Compute \(\tilde{g}_k(s)\) (the order-one approximation of \(g_k(s)\)) \hfill [\(\tilde{g}_0(s)=10s-1\)]
\quad Solve for \(s_k\) in \(\tilde{g}_k(s)=0\) \hfill [\(s_0=0.1\)]
\quad Let \(g_{k+1}(s)=g_k(s_k+s)\) \hfill [\(g_1(s)=g_0(s_0+s)=s^3+6.3s^2+11.23s+0.061\)]
\(x_n:=x_0+\sum_{k=0}^{n-1}s_k\) \hfill [\(x_n=2+0.1-\frac{0.061}{11.23}+\cdots+s_{n-1}\)]
Raphson considered approximations updated at each step, a process equivalent to \cref{f. Newton method}. However, the derivatives of \(f\) (which could have been calculated with the "fluxions" of that time) do not appear in their formulae, only the first-order approximations from the finite series expansion of polynomials (using the binomial formula). For more than a century, the belief was that these two variants represented two different methods: "Nearly all eighteenth century writers and most of the early writers of the nineteenth century carefully discriminated between the method of Newton and that of Raphson" (Cajori). For a given input (\(n\) fixed), the result of \cref{alg. devised by Newton} may depend on how accurately the different fractions (e.g., \(\frac{0.061}{11.23}\)) are computed. It was Stewart (1745) who pointed out that they are in fact equivalent. As noted by Steihaug, "It took an additional 50 years before it was generally accepted that the methods of Raphson and Newton were identical methods, but implemented differently. Joseph Lagrange in 1798 derives the Newton-Raphson method [...] and writes that the Newton's method and Raphson's method are the same but presented differently and Raphson's method is plus simple que celle de Newton [simpler than Newton's]."
Simpson (1740) was the first to apply the method to transcendental equations, using "fluxions." The fluxions \(\dot{x}\) did not represent \(f'\), but are "essentially equivalent to \(dx/dt\); implicit differentiation is used to obtain \(dy/dt\), subsequently dividing through by \(dx/dt\) as instructed produces the derivative \(A=dy/dx\) of the function. [...] Thus Simpson's instructions closely resemble, and are mathematically equivalent to, the use of [\cref{f. Newton method}]. The formulation of the method using the now familiar \(f'(x)\) calculus notation of [\cref{f. Newton method}] was published by Lagrange in 1798 [...]" (Ypma). Even more important, Simpson extended the method to the solving of nonlinear systems of two equations; the first nonlinear system considered was
\[
y+y^2-x^2-10=0,\qquad x+y^2+x-12=0,
\]
with \((x_0,y_0)=(5,6)\). The method was subsequently generalized to the usual form known today: \(F(x)=0\) with \(F:\mathbb{R}^N\rightarrow\mathbb{R}^N\), for which, denoting the Jacobian of \(F\) at \(x_k\) by \(F'(x_k)\), one has to solve for \(s_k\) at each step the linear system
\[
F'(x_k)\,s=-F(x_k).
\]
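For concreteness, here is a Julia transcription of \cref{alg. devised by Newton} (our minimal sketch, not code from the paper's repository; the Taylor shift of the polynomial is hard-coded for cubics):
\begin{verbatim}
# Newton's original scheme for f(x) = x^3 - 2x - 5, x0 = 2: shift
# g_{k+1}(s) = g_k(s_k + s), solve the order-one part, sum the corrections.
# A cubic c0 + c1*s + c2*s^2 + c3*s^3 is stored as the tuple (c0,c1,c2,c3).
shift(c, a) = (c[1] + c[2]*a + c[3]*a^2 + c[4]*a^3,   # coefficients of g(a+s)
               c[2] + 2c[3]*a + 3c[4]*a^2,
               c[3] + 3c[4]*a,
               c[4])
let g = shift((-5.0, -2.0, 0.0, 1.0), 2.0),  # g0(s) = f(2+s) = s^3+6s^2+10s-1
    x = 2.0
    for k in 1:5
        s = -g[1] / g[2]      # solve the order-one part: g(0) + g'(0)*s = 0
        x += s                # the corrections are summed, as in Newton's scheme
        g = shift(g, s)
        println("x = $x")
    end
end
\end{verbatim}
The first two shifts reproduce Newton's own numbers: \(g_0(s)=s^3+6s^2+10s-1\), \(s_0=0.1\), and \(g_1(s)=s^3+6.3s^2+11.23s+0.061\).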
Cajori noted: "Then appear writers like Euler, Laplace, Lacroix and Legendre, who explain the Newton-Raphson process, but use no names. Finally, in a publication of Budan in 1807, in those of Fourier of 1818 and 1831, in that of Dandelin in 1826, the Newton-Raphson method is attributed to Newton. The immense popularity of Fourier's writings led to the universal adoption of the misnomer 'Newton's method' for the Newton-Raphson process." Some authors use "the Newton--Raphson method": "Who was '--Raphson'?" (N.~Bi\'cani\'c and K.~H.~Johnson, 1979) has the popular answer "Raphson was Newton's programmer" (S.~Nash, 1992). The name Simpson, though appropriate ("[...] it would seem that the Newton--Raphson--Simpson method is a designation more nearly representing the facts of history in reference to this method" (Ypma); "I found no source which credited Simpson as being an inventor of the method. None the less, one is driven to conclude that neither Raphson, Halley nor anyone else prior to Simpson applied fluxions to an iterative approximation technique" (Kollerstrom)), is only occasionally encountered. In a few cases "Newton--Kantorovich" appears, but this refers to the mid-1900s contributions of Kantorovich to semilocal convergence results, when \(F\) is defined on normed spaces.
Attainable C-Orders. The method attains at least C-order \(2\) for simple solutions. Regarding rigorous convergence results, "Fourier appears to have been the first to approach this question in a note entitled Question d'analyse alg\'ebrique (1818) [...], but an error was found in his formula," as noted by Chabert et al. (sect.~6.4), who continue by noting that Cauchy "studied the subject from 1821 onwards [...], but did not give a satisfactory formulation until 1829. [...] The concern for clarity and rigour, which is found in all of Cauchy's work, tidies up the question of the convergence of Newton's method for the present."
[cf.~Thm.~II, p.~184] If \(x^*\) is simple and \(f''(x^*)\ne0\), then the Newton method converges locally, with C-quadratic order:
\[
\lim_{k\to\infty}\frac{|x^*-x_{k+1}|}{|x^*-x_k|^2}=\Big|\frac{f''(x^*)}{2f'(x^*)}\Big|=:Q_2.
\]
The usual proofs of \cref{f. Newton method C2=|f''/2f'| for x* simple} consider the Taylor expansions of \(f(x_k)\) and \(f'(x_k)\), assuming small errors and neglecting the "higher-order terms":
\[
\ell_{k+1}=\ell_k-\frac{f(x^*)+f'(x^*)\ell_k+\cdots}{f'(x^*)+f''(x^*)\ell_k+\cdots}\approx\frac{f''(x^*)}{2f'(x^*)}\,\ell_k^2\qquad(\ell_k:=x_k-x^*,\ \text{i.e., signed}).
\]
(a) Write the Newton iterates for computing \(\sqrt2\) (cf.~p.~138). (b) (Newton everywhere, sect.~17.9) Apply \cref{f. Newton method} to compute the division \(a/b\) of real numbers using only the operations \(+,-,*\) (cf.~p.~124).
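A minimal sketch for this exercise, under the standard derivations (our illustration): Newton for \(x^2-2=0\) gives the Babylonian rule, while Newton for \(f(y)=\frac1y-b\) yields a division-free iteration converging to \(\frac1b\):
\begin{verbatim}
# (a) Newton for x^2 - 2 = 0:  x_{k+1} = (x_k + 2/x_k)/2.
let x = 1.5
    for k in 1:5
        x = (x + 2/x) / 2
        println("sqrt(2) = $x")
    end
end
# (b) Newton for f(y) = 1/y - b uses only +, -, *:
#     y_{k+1} = y_k - (1/y_k - b)/(-1/y_k^2) = y_k*(2 - b*y_k)  ->  1/b,
#     provided |1 - b*y_0| < 1; then a/b = a*y.
let a = 3.0, b = 7.0, y = 0.1
    for k in 1:6
        y = y * (2 - b*y)
        println("a/b = ", a * y)
    end
end
\end{verbatim}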
(a) The usual perception is that the smaller \(Q_2\), the faster the asymptotic speed; expression \cref{f. Newton method C2=|f''/2f'| for x* simple} shows that for the Newton method this is realized with smaller \(|f''|\) and larger \(|f'|\), which means that near \(x^*\) the graph of \(f\) resembles a line (\(|f''|\) small) making a large angle with \(Ox\) (the line is not close to \(Ox\)). In the opposite situation, when the angle is zero (\(x^*\) is multiple), the convergence is only linear. While this interpretation holds locally, in the asymptotic range of the Newton iterates, it is worth mentioning that the larger the region with such a property, the larger the attraction set (see also \cref{f. Newton meth estimation radius attr ball}), and the better conditioned the problem of solving \(f(x)=0\) (see \cref{subsec: Newton method floating point arithmetic}). We note that taking \(2f(x)\) instead of \(f(x)\), despite doubling \(f'(x)\), does not reduce \(Q_2\), leading in fact to the same iterates (this property is called affine invariance).
(b) Of course, it would be of major interest to state a conclusion like "the smaller \(Q_2\), the fewer the iterates needed." However, even if supported by a number of numerical examples, such a statement should be treated with great care. The reason is simple: we do not compare similar objects (\(Q_2\) is an asymptotic constant, referring to \(k\to\infty\), while the desired conclusion refers to a finite number of steps). Moreover, when we say "the smaller \(Q_2\)," this means that we consider different mappings \(f\); even if they have the same \(x^*\), the attraction sets and the dynamics of the iterates may be different. The role of the asymptotic constants will be analyzed in forthcoming work.
Let us analyze the expectation that this method behaves similarly for the same \(Q_2\).
(a) Let \(a\in\mathbb{R}\) be given and
\[
f(x):=x+x^2+ax^3=0,\qquad\text{with }x^*=0.
\]
We get \(f'(0)=1\) and \(f''(0)=2\), so by \cref{f. Newton method C2=|f''/2f'| for x* simple}, \(Q_2=1\) (for any given \(a\in\mathbb{R}\)). The full Taylor series gives the exact relation for the signed errors in \cref{f. Newton method ekp1=-f''/f' ek^2}:
\[
\ell_{k+1}=\frac{1+2a\ell_k}{1+2\ell_k+3a\ell_k^2}\,\ell_k^2.
\]
Take \(x_0=0.0001\,(=e_0)\) fixed. By \cref{f. Newton method ekp1=-f''/f' ek^2}, we should obtain \(|\ell_1|=e_1\approx e_0^2=10^{-8}\) for any \(a>0\). In \cref{tab. errors in one Newton step for a} we see that this does not hold when \(a\) has large values. As \(|a|\) grows, the error \(e_0\), despite being "small," is in fact not "squared," as suggested by \cref{f. Newton method ekp1=-f''/f' ek^2}. Indeed, when \(|a|\) is very large, the terms in \cref{f. Newton method ekp1=-f''/f' ek^2} corresponding to the larger errors (i.e., to \(f'''(x)=6a\)) are actually neglected, despite their having higher powers.
(b) Taking \(x_0=0.0001\) for \(f\) as above and then for \(f(x)=x+2x^2\) shows that \(Q_2\) and the number of iterates needed (for the same accuracy) are not in direct proportion.
Table: \(e_1=|\ell_1|\) computed in two equivalent ways, for different \(a\)'s (\texttt{Float64}, Julia).
\(a\)       \(\ell_1\,(=x_1)\)              \(\ell_1\) by \cref{f. ekp by frac of full Taylor series}
\(1\)       9.999999700077396e-9     9.999999700059996e-9
\(10\)      1.0017993395925492e-8    1.0017993395922797e-8
\(10^5\)    2.0933014354066327e-7    2.093301435406699e-7
\(10^6\)    1.9510774606872468e-6    1.951077460687245e-6
\(10^7\)    1.5389940009229362e-5    1.5389940009229348e-5
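The table can be reproduced with a few lines of Julia (a minimal sketch, not the repository code), comparing one Newton step with the exact signed-error formula \cref{f. Newton method ekp1=-f''/f' ek^2}:
\begin{verbatim}
# One Newton step for f(x) = x + x^2 + a*x^3 from x0 = 1e-4 (x* = 0, so
# l0 = x0), versus l1 = (1 + 2a*l0) / (1 + 2*l0 + 3a*l0^2) * l0^2.
f(x, a)  = x + x^2 + a*x^3
fp(x, a) = 1 + 2x + 3a*x^2
l0 = 1e-4
for a in (1.0, 10.0, 1e5, 1e6, 1e7)
    x1 = l0 - f(l0, a) / fp(l0, a)                   # the Newton step
    l1 = (1 + 2a*l0) / (1 + 2*l0 + 3a*l0^2) * l0^2   # full-Taylor error formula
    println("a = $a   x1 = $x1   formula: $l1")
end
\end{verbatim}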
Igarashi and Ypma noticed that a smaller \(Q_2\) does not necessarily attract fewer iterates for attaining a given accuracy. However, the numerical examples used to support this claim were based not on revealing the individual asymptotic regime of the sequences, but on the number of iterates required to attain a given (\texttt{Float64}) accuracy.
If \(f''(x^*)=0\), then \cref{f. Newton method C2=|f''/2f'| for x* simple} implies Q-superquadratic order, but actually the order is higher: it is given by the first index \(\ge2\) of the nonzero derivative at the solution. Let \(x^*\) be simple. The Newton method converges locally with C-order \(p_0\in\mathbb{N}\), \(p_0\ge2\), iff
\[
f^{(p_0)}(x^*)\ne0\quad\text{and}\quad f''(x^*)=\cdots=f^{(p_0-1)}(x^*)=0,
\]
in which case
\[
Q_{p_0}=\frac{p_0-1}{p_0!}\,\Big|\frac{f^{(p_0)}(x^*)}{f'(x^*)}\Big|.
\]
Examples for the Attained C-Orders. We consider decreasing orders.
(a) Infinite C-order. If \(f(x)=ax+b\), \(a\ne0\), then \(x_1=x^*\) \(\forall x_0\in\mathbb{R}\).
(b) Arbitrary natural C-order \(p_0\ge2\) (see \cref{rem: real order p0>=2 for Newton method} for \(p_0\notin\mathbb{N}\)). Let \(p\ge2\) be a given natural number and consider
\[
f(x)=x+x^p,\qquad x^*=0,\ p=2,3,4.
\]
By \cref{th: Dubeau-Gnang characterize C-conv orders p>2 Newton method}, we obtain local convergence with C-order \(p\) and \(Q_p=p-1\). Let us first deal with \(p=2\). In double precision, for \(x_0=0.46\) we obtain six nonzero iterates (shown truncated in \cref{tab. Newton iterates double prec}). In analyzing \(Q_L(k-1)\) (shown with the last six decimal digits truncated), we note that \(Q_L(4)\) is a better approximation than the last one, \(Q_L(5)\); this phenomenon may appear even if we increase the precision. When \(x_0=0.47\) and \(0.45\) we obtain five nonzero iterates (see \cref{exe. five Newton iterates for x0=0.47 and 0.45}), and \(Q_L(k)\) is increasingly better.
Table: Newton iterates for \(x+x^2=0\) (\texttt{Float64}, Julia).
          \(x_0=0.47\)                \(x_0=0.46\)                \(x_0=0.45\)
\(k\)     \(x_k\)      \(Q_L(k-1)\)     \(x_k\)      \(Q_L(k-1)\)     \(x_k\)      \(Q_L(k-1)\)
\(0\)     0.47                      0.46                      0.45
\(1\)     1.1e-1    2.877706159     1.1e-1    2.840052802     1.0e-1    2.803816781
\(2\)     1.0e-2    2.094428776     9.9e-3    2.090320979     9.3e-3    2.086305525
\(3\)     1.0e-4    2.004592994     9.7e-5    2.004275304     8.6e-5    2.003972042
\(4\)     1.1e-8    2.000023942     9.4e-9    2.000021019     7.4e-9    2.000018386
\(5\)     1.4e-16   2.000000001     8.8e-17   2.000000001     5.4e-17   2.000000001
\(6\)     0                         1.2e-32   1.987981962     0
\(7\)     0                         0                         0
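The table may be reproduced with the following minimal Julia sketch (our illustration, not the repository code; here \(x^*=0\), so \(e_k=|x_k|\)):
\begin{verbatim}
# Newton for f(x) = x + x^2 in Float64, printing x_k and the computational
# order Q_L(k-1) = ln|x_k| / ln|x_{k-1}|.
f(x)  = x + x^2
fp(x) = 1 + 2x
function newton_run(x0; kmax = 20)
    xs = [x0]
    while xs[end] != 0.0 && length(xs) <= kmax   # the iterates reach fl(0) exactly
        x = xs[end]
        push!(xs, x - f(x) / fp(x))
    end
    return xs
end
for x0 in (0.47, 0.46, 0.45)
    xs = newton_run(x0)
    for k in 2:length(xs)
        QL = xs[k] == 0.0 ? NaN : log(abs(xs[k])) / log(abs(xs[k-1]))
        println("x0 = $x0   k = $(k-1)   x_k = $(xs[k])   Q_L = $QL")
    end
end
\end{verbatim}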
For \(p=3\) and \(4\), we obtain four nonzero iterates for each of the three choices of \(x_0\), and therefore a single meaningful term in computing \(\{Q_\Lambda'(k)\}\). The higher the convergence order, the faster \(\mathrm{fl}(x^*)\) may be reached, and we may end up with fewer distinct terms. Higher precision is needed in this study. We have chosen to use the Julia language here, which allows not only quadruple precision (128-bit, compliant with IEEE 754-2008) but arbitrary precision as well (by using the \texttt{setprecision} command). Some examples are posted at \url{https://github.com/ecatinas/conv-ord} and may easily be run online (e.g., by IJulia notebooks running on Binder). Such examples may also be performed in MATLAB (by choosing the Advanpix toolbox, which handles arbitrary precision) or in other languages.
In \cref{fig: QL and QLam for f(x)=x+x^q q=4 3 2} we show the resulting \(Q_L(k-1)\), \(Q_\Lambda(k-2)\), \(Q_L'(k-2)\), \(Q_\Lambda'(k-3)\) (written so as to use only the information available at step \(k\)). We see that all four (computational) convergence orders approach the corresponding \(p\)'s, but we cannot clearly distinguish their speeds. In \cref{fig. Newton |p-QL| and |p-QLam|} we present \(|Q_\Lambda(k-2)-p|\) and \(|Q_L(k-1)-p|\).
Examining the numerator and denominator of the Newton corrections, give the two different reasons why \(x_7\) is \(0\) in \cref{tab. Newton iterates double prec} (when \(x_0=0.47\) and \(0.45\)). When is \texttt{realmin} or the machine epsilon (or both) too large?
(c) C-order \(p\in(1,2)\). The C-order \(2\) is lost with the smoothness of \(f\) (when \(f''\) does not exist). [E~10.2-4] The Newton method has C-order \(1+\alpha\) for
\[
f(x)=x+|x|^{1+\alpha},\qquad\alpha\in(0,1)\text{ given},\quad x^*=0.
\]
Modifying the above example by taking \(f(x)=x+|x|^{p_0}\) shows that the Newton method may in fact attain any real nonnatural order \(p_0\in(2,+\infty)\).
(d) C-linear order. This appears for a multiple solution \(x^*\), which is implicitly assumed to be isolated (\(f'(x)\ne0\) \(\forall x\ne x^*\) in a neighborhood of \(x^*\)). Otherwise, an iterative method would not be able to converge to a specific solution we were interested in (take, e.g., the constant function \(f(x)=0\), with \(\mathbb{R}\) as the solution set, or \(f(x)=x\sin\frac1x\), \(x\ne0\), \(f(0)=0\)). Indeed, if the multiplicity of \(x^*\) is \(q\in\mathbb{N}\),
\[
f(x^*)=f'(x^*)=\cdots=f^{(q-1)}(x^*)=0,\qquad f^{(q)}(x^*)\ne0
\]
(i.e., \(f(x)=(x-x^*)^q g(x)\), \(g(x^*)\ne0\)), then (see, e.g., E~10.2-8)
\[
Q_1=1-\frac1q.
\]
If \(q\) is known, one may use the Schr\"oder formula to restore the C-order \(2\):
\[
x_{k+1}=x_k-q\,\frac{f(x_k)}{f'(x_k)},\qquad\text{with }Q_2=\frac{|f^{(q+1)}(x^*)|}{q(q+1)|f^{(q)}(x^*)|}.
\]
(a) Calculate the Newton and Schr\"oder iterates for \(f(x)=x^2\). (b) The Schr\"oder iterates are the Newton iterates for \(\sqrt[q]{f(x)}=(x-x^*)\sqrt[q]{g(x)}\). The approximation of the unknown \(q\) is surveyed in sect.~7.9; see also sect.~6.6.2.
(e) C-sublinear order. Let \(f(x)=e^{-1/x^2}\), \(x\ne0\), \(f(0)=0\). Show that the Newton method converges locally to \(x^*=0\) with \(Q_1(\{x_k\})=1\), i.e., C-sublinearly.
Convexity. The influence of \(x_0\) and of the convexity of \(f\) was first analyzed by Mourraille (1768) (sect.~6.3), in an intuitive fashion.
In 1818, Fourier (available on the Internet) gave the condition
\[
f(x_0)f''(x_0)>0,
\]
which ensures the convergence of the Newton method, provided \(f\) maintains its monotonicity and convexity on the interval determined by \(x^*\) and \(x_0\). This condition can lead to one-sided convergence intervals (i.e., not necessarily centered at \(x^*\)). Clearly, the method may converge even on the whole axis, a result which is more interesting to consider in \(\mathbb{R}^N\) (cf. the Baluev theorem, Thm.~8.3.4).
(a) By \cref{f. Newton method ekp1=-f''/f' ek^2}, the signs of the errors are periodic with period either \(1\) (i.e., monotone iterates) or \(2\) (\(k\ge k_0\)).
(b) The Fourier condition is sufficient for convergence, but not necessary; the iterates may converge locally (by \cref{th: standard C-quadratic conv of Newton method}) even when \cref{f. Fourier condition f(x0)f''(x0)} does not hold: take, e.g., \(f(x)=\sin(x)\), \(x^*=0\), with the signs of the errors alternating at each step (\(k\ge k_0\)).
Attraction Balls. Estimations for \(B_r(x^*)=\{x\in D:|x^*-x|<r\}=(x^*-r,x^*+r)\) as the \(V\) in \cref{def: local conv of an iter method} are usually given using the condition
\[
|f'(x)-f'(y)|\le L|x-y|\qquad\forall x,y\in D.
\]
The Lipschitz constant \(L\) of \(f'\) measures the nonlinearity of \(f\) (i.e., how much the graph of \(f\) is bent); its exact bound shows that this is in accordance with \cref{rem: Newton method best Q2 w large f' small f''}, since
\[
L\ge\sup_{x\in D}|f''(x)|.
\]
The usual assumptions also require
\[
|f'(x^*)|\ge\frac1\beta,
\]
and the Dennis--Schnabel, Rheinboldt, and Deuflhard--Potra estimations are, respectively,
\[
r=\frac{1}{2\beta L},\qquad r=\frac{2}{3\beta L},\qquad\text{and}\qquad r=\frac{2}{\omega},
\]
where \(\omega\) satisfies
\[
|f'(x)^{-1}|\cdot|f'(x+t(y-x))-f'(x)|\le t\,\omega\,|y-x|\qquad\forall t\in(0,1),\ \forall x,y\in D,
\]
i.e., it can be viewed as a combination of both \(L\) and \(\beta\). We note that \cref{f. Newton meth estimation radius attr ball} estimates a small \(r\) for \cref{ex: Newton method for x + x^2 + ax^3 =0}, as \(L\) is large there.
Floating Point Arithmetic. Curious behavior may arise in this setting. The derivative \(f'\) measures the sensitivity of computing \(f(x)\) when using \(\tilde{x}\approx x\) instead:
\[
|f(x)-f(\tilde{x})|\approx|f'(x)|\cdot|x-\tilde{x}|.
\]
This shows that for large \(|f'(x)|\), the absolute error in \(x\) may be amplified. When the relative error in \(x\) is of interest, the condition number becomes \(|f'(x)|\cdot|x|\):
\[
|f(x)-f(\tilde{x})|\approx|f'(x)|\cdot|x|\,\frac{|x-\tilde{x}|}{|x|}.
\]
For the relative errors of \(f\), the condition numbers are given by (sects.~1.2.6 and 5.3)
\[
\frac{|f(x)-f(\tilde{x})|}{|f(x)|}\approx\frac{|xf'(x)|}{|f(x)|}\,\frac{|x-\tilde{x}|}{|x|}
\qquad\text{or}\qquad
\frac{|f(x)-f(\tilde{x})|}{|f(x)|}\approx\frac{|f'(x)|}{|f(x)|}\,|x-\tilde{x}|,
\]
which show that the relative errors in \(f(x)\) may be large near a solution \(x^*\). However, as shown in sects.~1.2.6 and 5.3, Ex.~2.4, and sect.~6.1, the problem of solving an equation has condition number \(1/|f'(x^*)|\), which is the inverse of the condition number of computing \(f(x)\). When \(|f'(x)|\) (i.e., \(|f'(x^*)|\)) is not small, Shampine, Allen, and Pruess argue that the Newton method is "stable at limiting precision" (p.~147), as the Newton corrections \(\frac{f(x)}{f'(x)}\) are then small.
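As a small illustration of this stability remark (our sketch; we take \(f(x)=x^2-2\), \(x^*=\sqrt2\), as an assumed example):
\begin{verbatim}
# Near x*, the computed residual f(x) is tiny (close to rounding level),
# and the Newton correction f(x)/f'(x) has the size of the error itself,
# so the iterates do not drift away at limiting precision.
x = sqrt(2.0) + 1e-14           # a point at limiting-precision distance from x*
println("f(x)       = ", x^2 - 2)
println("f(x)/f'(x) = ", (x^2 - 2) / (2x))   # approximately 1e-14, the error
\end{verbatim}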
When \(|f'(x^*)|\) is small (e.g., \(x^*\) is multiple), notable cancellations may appear in computing \(f\), preventing the usual graphical interpretation. The (expanded) polynomial \((x-2)^9\) of Demmel (p.~8), computed in double precision, is well known. Even simpler examples exist: \(f(x)=x^3-2x^2+\frac43x-\frac{8}{27}\ \big(=(x-\frac23)^3\big)\) (Sauer, p.~45) and \(f(x)=x^3e^{3x}-3x^2e^{2x}+3xe^x-1\ \big(=(e^x-1)^3\big)\), \(x^*\approx0.56\) (Shampine, Allen, and Pruess, Fig.~1.2, p.~23).
\(|f'|\) not being small is no guarantee that things won't get weird for other reasons. [Wilkinson; see Ex.~4.3] Let \(f(x)=x^{20}-1\), \(x^*=1\), and (here) \(x_0=10\). Why, in double precision, are the (rounded) Newton iterates \(x_1=9.5\), \(x_2=9.0\), \(x_3=8.6\), \(x_4=8.1\), etc.? Take \(\ell_0=\pm0.001\) and compute \(x_1\) and \(e_1\).
Nonlinear Systems in \(\mathbb{R}^N\). When Newton-type methods are used in practice to solve nonlinear systems in \(\mathbb{R}^N\), obtaining high Q-orders becomes challenging. As the dimension may be huge (\(N=10^6\) and even higher), the main difficulty resides, once the representation of the data in memory is settled (one can use Newton--Krylov methods, which do not require storage of the matrices \(F'(x)\), but only the computation of matrix-vector products \(F'(x)v\), further approximated by finite differences), in accurately solving the resulting linear systems \cref{f. Newton method for systems in Rn} at each step. In many practical situations, superlinear convergence may be considered satisfactory. The most common setting allows the systems \cref{f. Newton method for systems in Rn} to be solved only approximately (the corrections verify \(F'(x_k)s_k=-F(x_k)+r_k\), \(r_k\ne0\)), leading to the inexact Newton method; its superlinear convergence and the orders \(p\in(1,2]\) were characterized by Dembo, Eisenstat, and Steihaug, the approach being easily extended by us to further sources of perturbation. The quasi-Newton methods form another important class of Newton-type iterations; they consider approximate Jacobians, in order to produce linear systems that are easier to solve. Their superlinear convergence was characterized by Dennis and Mor\'e. The inexact and quasi-Newton methods are in fact tightly connected models.
The Secant Method. This is a two-step method, requiring \(x_0\ne x_1\in D\):
\[
x_{k+1}=x_k-\frac{f(x_k)}{\dfrac{f(x_k)-f(x_{k-1})}{x_k-x_{k-1}}}
=\frac{x_{k-1}f(x_k)-x_kf(x_{k-1})}{f(x_k)-f(x_{k-1})},\qquad k=1,2,\ldots,
\]
but neither of the above formulae is recommended for programming (see \cref{subsec: secant method - floating point arithmetic}).
History. Though nowadays the secant method is usually seen as a Newton-type method with the derivatives replaced by finite differences (i.e., a quasi-Newton method), or as an inverse interpolatory method, this is not the way it first appeared. The roots of this method can be traced back to approximately the same time as those of the Newton method, the 18th century B.C., being found on Babylonian clay tablets and on the Egyptian Rhind Papyrus, where it was used as a single iteration for solving linear equations. Such equations appear trivial these days, but their solution required ingenuity back then: the decimal system and the zero symbol were only used much later, in the 800s (see, e.g., pp.~192--193).
During those times and even later, the terminology for such a noniterated form was "the rule of double false position"; indeed, two initial "false positions" \(x_0,x_1\) yield in one iteration the true solution of a linear equation. Heron of Alexandria approximated the cubic root of \(100\) by this method (as revealed by Luca and P\u{a}v\u{a}loiu, it turns out that it can be seen as applied equivalently to \(x^2-\frac{100}{x}=0\)). While reinvented in several cultures (in China, the Jiuzhang Suanshu (Nine Chapters on the Mathematical Art), sect.~3.3; in India, for example, in the work of Madhava's student Parame\'svara, sect.~3.4; in the Arab countries, by al-Tusi, al-Kashi, al-Khayyam, sect.~3.5; and in Africa; sometimes even for \(2\)-by-\(2\) systems of equations), its use as an iterative method was first considered by Cardano (1545), who called it Regula Liberae Positionis (or regula aurea, note~5, p.~200). Vi\`ete also used a secant-type method, and it appears that Newton (around 1665) independently rediscovered the secant method.
Attainable Q-Orders. The standard convergence result shows the usual convergence with C-order given by the golden ratio \(\lambda_1:=\frac{1+\sqrt5}{2}\approx1.618\); as noted by Papakonstantinou and Tapia, this was first obtained by Jeeves in 1958 (in a technical report, subsequently published). [Chap.~12, sects.~11--12] If \(x^*\) is simple and \(f''(x^*)\ne0\), then the secant method has local convergence with C-order \(\lambda_1=\frac{1+\sqrt5}{2}\):
\[
\lim_{k\to\infty}\frac{|x^*-x_{k+1}|}{|x^*-x_k|^{\lambda_1}}=\Big|\frac{f''(x^*)}{2f'(x^*)}\Big|^{\lambda_1-1}=:Q_{\lambda_1}.
\]
The Q-order \(\lambda_1\) follows by considering the signed errors,
\[
\ell_{k+1}=\frac{[x_{k-1},x_k,x^*;f]}{[x_{k-1},x_k;f]}\,\ell_k\ell_{k-1},\qquad k\ge1\quad(\ell_k:=x_k-x^*),
\]
and then \cref{ex. Q-order lam for ineq ekp1ekekm1}. Indeed, the divided differences converge,
\[
[x_{k-1},x_k;f]\rightarrow f'(x^*),\qquad [x_{k-1},x_k,x^*;f]\rightarrow\frac{f''(x^*)}{2},
\]
and, denoting \(l=\big|\frac{f''(x^*)}{2f'(x^*)}\big|\), it follows that in \cref{f. Aekekm1^p <= ekp1 <= Bekekm1^p} we may take \(A=l-\varepsilon\) and \(B=l+\varepsilon\) for some small \(\varepsilon\). Find the incompleteness in proving the C-order \(\lambda_1\) in \cref{f. secant simple sol Q_lam_1 formula} by arguing that
\[
(l-\varepsilon)\Big(\frac{e_{k-1}^{\lambda_1}}{e_k}\Big)^{\frac{1}{\lambda_1}}
=(l-\varepsilon)\frac{e_k e_{k-1}}{e_k^{\lambda_1}}
\le\frac{e_{k+1}}{e_k^{\lambda_1}}
\le(l+\varepsilon)\frac{e_k e_{k-1}}{e_k^{\lambda_1}}
=(l+\varepsilon)\Big(\frac{e_{k-1}^{\lambda_1}}{e_k}\Big)^{\frac{1}{\lambda_1}},
\]
using \(\frac{1}{\lambda_1}+1-\lambda_1=0\), and therefore \(Q_{\lambda_1}\) satisfies \(Q_{\lambda_1}=l\,Q_{\lambda_1}^{-\frac{1}{\lambda_1}}\), i.e., it is given by \cref{f. secant simple sol Q_lam_1 formula}.
By \cref{f. secant meth simple sol signed errors formula w div diff}, the signs of the errors (and of \(f(x_k)\) as well) are periodic with period at most \(3\) (\(k\ge k_0\)), as noted by Neumaier (Cor.~5.1.3). Higher Q- (but not necessarily C-)orders may be attained. [Raydan] If \(x^*\) is simple and \(q\ge1\) is the first index such that \(f^{(q+1)}(x^*)\ne0\), then the secant method converges locally with Q-order \(\lambda_q=\frac{1+\sqrt{1+4q}}{2}\) and obeys
\[
\lim\frac{e_{k+1}}{e_k e_{k-1}^q}=\frac{|f^{(q+1)}(x^*)|}{(q+1)!\,|f'(x^*)|}.
\]
Moreover, specific results hold in the following cases, noting the attained orders: \(q=1\): C-order \(\lambda_1\); \(q=2\): exact Q-order \(\lambda_2=2\) (but not necessarily C-order \(2\)); \(q=3\): Q-order \(\lambda_3\approx2.3\) (but not necessarily exact Q-order \(\lambda_3\)). When \(q\ge2\), the secant iterates may have only a Q- but no C-order, despite \(\frac{e_{k+1}}{e_k e_{k-1}^q}\) converging to a finite nonzero limit (see \cref{ex. Q-order lam for ineq ekp1ekekm1}).
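A minimal Julia sketch of the secant iteration (our illustration, written in the numerically recommended form discussed in the floating-point paragraphs below), applied to \(f(x)=x+x^{q+1}\) with the starting points used in the next example:
\begin{verbatim}
# Secant method x_{k+1} = x_k - f(x_k)(x_k - x_{k-1}) / (f(x_k) - f(x_{k-1}))
# for f(x) = x + x^(q+1), x* = 0; Q_L should tend to lambda_q (= 2 for q = 2).
setprecision(BigFloat, 500)
let q = 2,
    f = x -> x + x^(q + 1)
    lam = (1 + sqrt(1 + 4q)) / 2
    x0, x1 = big"0.5", big(1.0) / (q + 1)^lam
    for k in 2:12
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        x2 == 0 && break                       # fl(x*) reached
        println("k = $k   Q_L = ", Float64(log(abs(x2)) / log(abs(x1))))
        x0, x1 = x1, x2
    end
end
\end{verbatim}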
[cf.] Let \(f_{[q+1]}(x)=x+x^{q+1}\), \(x_0=0.5\), \(x_1=\frac{1}{(q+1)^{\lambda_q}}\), \(q=1,2,3\), and compute the secant iterates (by \cref{eq. secant method used in programs}). The assertions of \cref{th: secant method Raydan higher orders} are verified in this case, since \(|Q_L(k-1)-\lambda_q|\) and \(|Q_\Lambda(k-2)-\lambda_q|\), computed using \texttt{setprecision(500)}, tend to \(0\) (see \cref{fig. secant |p-QL| and |p-QLam|}). In \cref{fig. secant method Q_q(k)} we plot \(Q_{\lambda_q}(k)\) for \(q=1,2,3\). We see that \(Q_{\lambda_1}(k)\) tends to \(1\). For \(q=2\), \(Q_{\lambda_2}(k)\) oscillates (suggesting \(\underline{Q}_{\lambda_2}\approx0.57\), \(\bar{Q}_{\lambda_2}\approx1.74\)). For \(q=3\), \(\bar{Q}_{\lambda_3}(k)\) tends to infinity (as rigorously proved by Raydan for \(q\ge3\)). For this particular data it appears that \(\underline{Q}_{\lambda_3}=0\) (which remains to be rigorously proved); in order to see this more clearly, we have also used further increased precision (\texttt{setprecision(5000)}), providing confirmation. When \(q=3\), \(\bar{Q}_{\lambda_3}=\infty\) or \(\underline{Q}_{\lambda_3}=0\) do not necessarily hold for all \(x_0,x_1\): they do not hold (numerically) for \(x_0=0.5\), \(x_1=0.25\) (but they do hold, e.g., for \(x_0=1\), \(x_1=0.5\)). The higher the order, the faster the iterates attain \(\mathrm{fl}(x^*)\): we see in \cref{fig. secant method Q_q(k)} that \texttt{setprecision(500)} allows \(14\) iterates when \(q=1\) and only \(7\) iterates when \(q=3\) (increased to \(11\) for \texttt{setprecision(5000)}).
Multiple Solutions. Not only the high orders, but even the very local convergence may be lost if \(x^*\) is not simple. [cf.] If \(f(x)=x^2\), \(x_0=-\varepsilon\), \(x_1=\varepsilon\), then \(x_2\) does not exist in \cref{eq. secant method used in programs} (\(\forall\varepsilon>0\)). Assuming that the secant method converges, only a linear rate is obtained; D\'iez showed that if the multiplicity of \(x^*\) in \cref{f. def x* multiple order q} is \(q=2\), then \(Q_1=\frac{1}{\lambda_1}=\lambda_1-1\approx0.618\), while if \(q\ge3\), then \(Q_1\) is the solution \(0<\lambda^*<1\) of \(x^q+x^{q-1}-1=0\). Also, if the iterates verify at each step \(|f(x_k)|<|f(x_{k-1})|\) (by swapping), one obtains superlinear convergence for the root secant method of Neumaier (p.~241):
\[
x_{k+1}=x_k-\frac{x_k-x_{k-1}}{1-\sqrt{f(x_{k-1})/f(x_k)}}.
\]
For other references on this topic, see sect.~7.9 and Ex.~1.9.
Attraction Balls. Estimates for the radius of an attraction ball were given by Liang: if \cref{eq: def Lipschitz condition} and \(|[x,y,z;f]|\le K\) \(\forall x,y,z\in D\) hold, then
\[
r=\frac{1}{3\beta K}.
\]
When using a specific \(x_1\) in the secant method (\(x_0\in B_r\) arbitrary, \(x_1=x_0-\frac{f(x_0)}{f'(x_0)}\)), estimates similar to \cref{f. Newton meth estimation radius attr ball} were obtained: if \cref{f. |f'(x*)|>1/beta} and \cref{f. L <= sup |f''|} hold, then
\[
r=\frac{2}{3\beta L}.
\]
Floating Point Arithmetic. The two expressions in \cref{eq. secant method two defs not recommended} are not recommended for programming, the first in an ad litteram fashion and the second at all; indeed, the first one should instead be written as
\[
x_{k+1}=x_k-\frac{f(x_k)(x_k-x_{k-1})}{f(x_k)-f(x_{k-1})},\qquad k=1,2,\ldots,
\]
while the second one can lead, close to \(x^*\), to cancellations when \(f(x_k)\) and \(f(x_{k-1})\) have the same sign (p.~228, p.~4). Further equivalent formulae are
\[
x_{k+1}=x_k-\frac{x_k-x_{k-1}}{1-\frac{f(x_{k-1})}{f(x_k)}},\qquad k=1,2,\ldots,
\]
and (Vandergraft, p.~265)
\[
x_{k+1}=x_k-\Big[(x_{k-1}-x_k)\,\frac{f(x_k)}{f(x_{k-1})}\Big]\Big/\Big(1-\frac{f(x_k)}{f(x_{k-1})}\Big),
\]
which requires \(\big|\frac{f(x_k)}{f(x_{k-1})}\big|<1\) at each step (ensured by swapping the iterates).
It is worth noting, however, that swapping the consecutive iterates in order to obtain \(|f(x_k)|<|f(x_{k-1})|\) leads to iterates that may have a different dynamic than the standard ones. The secant method may not be stable at limiting precision, even for large \(|f'(x^*)|\) (Shampine, Allen, and Pruess, p.~147), and higher precision may be needed.
Successive Approximations for Fixed Point Problems. We consider a sufficiently differentiable mapping \(g:D\rightarrow D\), \(D\subseteq\mathbb{R}\), and the fixed point problem
\[
g(x)=x.
\]
A solution \(x^*\) is called a fixed point of \(g\); \(x^*\) is further called an attraction fixed point when one has local convergence of the successive approximations
\[
x_{k+1}=g(x_k),\qquad k\ge0.
\]
History. Assuming that the computation of \(\sqrt2\) with five (decimal) digits is highly improbable in the absence of an iterative method, it is likely that the successive approximations date back to at least 1800 B.C. (considering the Newton iterates as particular instances of successive approximations). Regarding the first use of iterates not connected to Newton or secant-type iterations, they seem to have appeared in the fifth and sixth centuries in India (see Plofker, who describes as an example the Brahmasphutasiddhanta of Brahmagupta from the seventh century). Later occurrences are mentioned for Ibn al-Banna (sect.~7.3), al-Tusi (sect.~7.4), Vi\`ete (sect.~7.5), Parame\'svara (avi\'ses\.a), al-Kashi (sect.~7.5), and Habash al-Hasib al-Marwazi. Further occurrences of such iterates appear for Kepler (1618--1622) (sect.~7.6), Gregory (1672; 1674), Dary (1674), and Newton (1674).
Local Convergence, Attainable C-Orders. The first part of the following local convergence result was first obtained by Schr\"oder in 1870. The extension to mappings in \(\mathbb{R}^N\) implied the analysis of the eigenvalues of the Jacobian at the fixed point, which is an important topic in numerical analysis. Given \(H\in\mathbb{R}^{N\times N}\) and the linear system \(x=Hx+c\), the following result was called by Ortega (p.~118) the fundamental theorem of linear iterative methods: if \(x=Hx+c\) has a unique solution \(x^*\), then the successive approximations converge to \(x^*\) \(\forall x_0\in\mathbb{R}^N\) iff the spectral radius satisfies \(\rho(H)<1\). The standard formulation of the result was first given by Ostrowski in 1957, but Perron had essentially already obtained it in 1929 (see NR~10.1-1).
(a) (Thm.~10.1.3) Let \(g\) be continuous on \(D\subseteq\mathbb{R}\) and differentiable at the fixed point \(x^*\in D\). If \(|g'(x^*)|<1\), then \(x^*\) is an attraction fixed point. (b) Conversely, if \(x^*\) is an attraction fixed point, then \(|g'(x^*)|\le1\).
[see E~10.1-2] Condition \(|g'(x^*)|\le1\) is sharp: considering \(|g'(x^*)|=1\) and taking \(g(x)=x-x^3\), \(x^*=0\) is an attraction fixed point, while for \(g(x)=x+x^3\), the same fixed point \(x^*\) is no longer one of attraction.
The higher attained orders are characterized as follows. [see, e.g., Chap.~4, sect.~10, p.~44] Let \(p_0\ge2\) be an integer and \(x^*\) a fixed point of \(g\). Then \(x_{k+1}=g(x_k)\rightarrow x^*\) with C-order \(p_0\) iff
\[
g'(x^*)=g''(x^*)=\cdots=g^{(p_0-1)}(x^*)=0\quad\text{and}\quad g^{(p_0)}(x^*)\ne0,
\]
in which case
\[
\lim_{k\to\infty}\frac{|x^*-x_{k+1}|}{|x^*-x_k|^{p_0}}=\frac{|g^{(p_0)}(x^*)|}{p_0!}=:Q_{p_0}.
\]
[E~10.1-10] Any C-order \(p_0>1\) is possible: let \(g(x)=x^{p_0}\). The C-sublinear order may be attained too: let \(g(x)=x-x^3\) and \(x_{k+1}=g(x_k)\rightarrow0\); then \(Q_1(\{x_k\})=1\).
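A minimal sketch illustrating the characterization (our illustration, taking \(g(x)=x^3\), so \(p_0=3\) and \(Q_3=\frac{3!}{3!}=1\); here \(x^*=0\)):
\begin{verbatim}
# Successive approximations x_{k+1} = g(x_k) for g(x) = x^3: the C-order is 3,
# and Q_L(k) = ln e_{k+1} / ln e_k equals 3 exactly for this g.
setprecision(BigFloat, 256)
g(x) = x^3
let x = big"0.5"
    for k in 1:8
        xnew = g(x)
        println("k = $k   Q_L = ", Float64(log(xnew) / log(x)))
        x = xnew
    end
end
\end{verbatim}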
The successive approximations are usually regarded as the most general iterations, but they may actually also be seen as instances of an inexact Newton method; such an approach allows the study of acceleration techniques in a different setting.
Attraction Balls. The following result holds. Let \(x^*\) be an attraction fixed point of \(g\) for which
\[
|g'(x^*)|\le q<1.
\]
Suppose there exist \(r_1>0\), \(L>0\) such that \(g\) is differentiable on \(B_{r_1}\), with \(g'\) Lipschitz continuous of constant \(L\), and denote
\[
r=\min\Big\{r_1,\frac{2(1-q)}{L}\Big\}.
\]
Then, \(\forall x_0\in B_r\), letting
\[
t=\frac{L}{2}|x_0-x^*|+q<1,
\]
we get
\[
|x_{k+1}-x^*|\le t\,|x_k-x^*|,\qquad k\ge0,
\]
which implies that \(x_{k+1}=g(x_k)\in B_r\) and \(x_{k+1}=g(x_k)\rightarrow x^*\).
It is important to note that this result does not require \(g\) to be a contraction on \(D\) (i.e., \(L<1\)), in contrast to the classical Banach contraction theorem. The center-Lipschitz condition (i.e., with \(x^*\) instead of \(y\) in \cref{eq: def Lipschitz condition}) is an improvement. The estimate \cref{f. succ approx - est r attr ball} may be sharp in certain cases: let \(g(x)=x^2\) with \(x^*=0\), \(g'(x^*)=0\), \(r_1>0\) arbitrary, and \(L=2\). Then \cref{f. succ approx - est r attr ball} gives the sharp \(r=\frac{2(1-0)}{2}=1\), since \(x^*=1\) is another fixed point. We observe that \(|g'(x)|=2\) at \(|x-x^*|=r=1\), which upholds \cref{rem: succ approx - attr ball without contraction}.
Regarding the general case of several dimensions, with \(G:D\subseteq\mathbb{R}^N\rightarrow D\), we note that the trajectories with high convergence orders are characterized by the zero eigenvalue of \(G'(x^*)\) and its corresponding eigenvectors. [Ex.~2.8] \cref{def: local conv of an iter method} allows a finite number of \(x_k\notin V\) (regardless of how "small" \(V\) is chosen to be): take, e.g., \(G(x)=Ax\),
\[
A=\begin{pmatrix}0&2\\ \frac18&0\end{pmatrix},\qquad V=B_r(0).
\]
Conclusions. Various aspects of the convergence orders have been analyzed here, but rather than providing a final view, we see this work as a fresh start; we further pursue the study of the following themes: the characterizations of the C-order, the connections between the Q- and R-orders and the big Oh rates, the asymptotic constants (with their immediate computational variants), and the convergence orders of discretization schemes (to mention a few).
Answers to Quizzes. \cref{quiz: given graph errors which is faster?}: \(\{\dot{x}_k\}=\{\frac{1}{2^{2^k}}\}=\{\mathring{x}_k\}\), in different windows. \cref{quiz: the rate of 10-1 10-2 10-3}: (a) \(Q_1=10^{-1}\); (b) \(Q_1=10^{-2}\); (c) \(10^{-x_k}\), with \(\{x_k\}\) the Fibonacci sequence; (d) \(Q_2=1\). \cref{quiz: if f is second deg polyn how large |f| can be?}: unbounded (take \(f(x)=x+ax^2\)). \cref{quiz: Newton method for x^20-1=0}: \(\mathrm{fl}(10^{20}-1)=\mathrm{fl}(10^{20})\) in double precision. \cref{quiz: secant method why C_lam1 nexists}: the limit \(Q_{\lambda_1}\) is not proved to exist; see also \cref{th: secant method Raydan higher orders}.
Acknowledgments. I am grateful to two referees for constructive remarks which helped improve the manuscript, particularly the one who brought the Julia language to my attention; I am also grateful to my colleagues M.~Nechita and I.~Boros, as well as to S.~Filip, for useful suggestions regarding Julia.
References
[Advanpix] Advanpix team, Advanpix, Version 4.7.0.13642, 2020, https://www.advanpix.com/ (accessed 2021/04/07).
[Argyros-George-2016] I.~K. Argyros and S.~George, On a result by Dennis and Schnabel for Newton's method: Further improvements, Appl. Math.
Lett., 55 (2016), pp.~49--53, https://doi.org/10.1016/j.aml.2015.12.003.
[Bailey-1989] D.~F. Bailey, A historical survey of solution by functional iteration, Math. Mag., 62 (1989), pp.~155--166, https://doi.org/10.1080/0025570X.1989.11977428.
[BEQ90] W.~A. Beyer, B.~R. Ebanks, and C.~R. Qualls, Convergence rates and convergence-order profiles for sequences, Acta Appl. Math., 20 (1990), pp.~267--284, https://doi.org/10.1007/bf00049571.
[Bezanson-Edelman-Karpinski-Shah-2017] J.~Bezanson, A.~Edelman, S.~Karpinski, and V.~B. Shah, Julia: A fresh approach to numerical computing, SIAM Rev., 59 (2017), pp.~65--98, https://doi.org/10.1137/141000671.
[Brent-1973] R.~P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall, Englewood Cliffs, NJ, 1973.
[Brent-Winograd-Wolfe-1973] R.~P. Brent, S.~Winograd, and P.~Wolfe, Optimal iterative processes for root-finding, Numer. Math., 20 (1973), pp.~327--341, https://doi.org/10.1007/bf01402555.
[Bre-1971] C.~Brezinski, Comparaison des suites convergentes, Rev. Fran\c{c}aise Inform. Rech. Op\'er., 5 (1971), pp.~95--99.
[Brez-1977] C.~Brezinski, Acc\'el\'eration de la Convergence en Analyse Num\'erique, Springer-Verlag, Berlin, 1977.
[Brez-1979] C.~Brezinski, Limiting relationships and comparison theorems for sequences, Rend. Circolo Mat. Palermo Ser. II, 28 (1979), pp.~273--280, https://doi.org/10.1007/bf02844100.
[Brez-1985] C.~Brezinski, Vitesse de convergence d'une suite, Rev. Roumaine Math. Pures Appl., 30 (1985), pp.~403--417.
[Brown-1987] P.~N. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp.~407--434, https://doi.org/10.1137/0724031.
[Cajori-1911] F.~Cajori, Historical note on the Newton-Raphson method of approximation, Amer. Math. Monthly, 18 (1911), pp.~29--32, https://doi.org/10.2307/2973939.
[Catinas-2001] E.~C\u{a}tina\c{s}, Inexact perturbed Newton methods and applications to a class of Krylov solvers, J. Optim. Theory Appl., 108 (2001), pp.~543--570, https://doi.org/10.1023/a:1017583307974.
[Catinas-2001-accelerating] E.~C\u{a}tina\c{s}, On accelerating the convergence of the successive approximations method, Rev. Anal. Num\'er. Th\'eor. Approx., 30 (2001), pp.~3--8, https://ictp.acad.ro/accelerating-convergence-successive-approximations-method/ (accessed 2021/04/07).
[Catinas-2002] E.~C\u{a}tina\c{s}, On the superlinear convergence of the successive approximations method, J. Optim. Theory Appl., 113 (2002), pp.~473--485, https://doi.org/10.1023/a:1015304720071.
[Catinas-2005] E.~C\u{a}tina\c{s}, The inexact, inexact perturbed and quasi-Newton methods are equivalent models, Math. Comp., 74 (2005), pp.~291--301, https://doi.org/10.1090/s0025-5718-04-01646-1.
[Catinas-2007] E.~C\u{a}tina\c{s}, Newton and Newton-type Methods in Solving Nonlinear Systems in \(\mathbb{R}^N\), Risoprint, Cluj-Napoca, Romania, 2007, https://ictp.acad.ro/methods-of-newton-and-newton-krylov-type/ (accessed 2021/6/9).
[Catinas-2009-ball] E.~C\u{a}tina\c{s}, Estimating the radius of an attraction ball, Appl. Math. Lett., 22 (2009), pp.~712--714, https://doi.org/10.1016/j.aml.2008.08.007.
[Cat2019-survey] E.~C\u{a}tina\c{s}, A survey on the high convergence orders and computational convergence orders of sequences, Appl. Math. Comput., 343 (2019), pp.~1--20, https://doi.org/10.1016/j.amc.2018.08.006.
[Catinas-manuscript-1] E.~C\u{a}tina\c{s}, How Many Steps Still Left to x*? Part II.
Acknowledgments

I am grateful to two referees for constructive remarks which helped to improve the manuscript, particularly the one who brought the Julia language to my attention; also, I am grateful to my colleagues M.~Nechita and I.~Boros, as well as to S.~Filip, for useful suggestions regarding Julia.

References

Advanpix \textsc{Advanpix team}, \emph{Advanpix}, Version 4.7.0.13642, 2020, \url{https://www.advanpix.com/} (accessed 2021/04/07).
Argyros-George-2016 \textsc{I.~K. Argyros and S.~George}, \emph{On a result by Dennis and Schnabel for Newton's method: Further improvements}, Appl. Math. Lett., 55 (2016), pp.~49–53, \url{https://doi.org/10.1016/j.aml.2015.12.003}.
Bailey-1989 \textsc{D.~F. Bailey}, \emph{A historical survey of solution by functional iteration}, Math. Mag., 62 (1989), pp.~155–166, \url{https://doi.org/10.1080/0025570X.1989.11977428}.
BEQ90 \textsc{W.~A. Beyer, B.~R. Ebanks, and C.~R. Qualls}, \emph{Convergence rates and convergence-order profiles for sequences}, Acta Appl. Math., 20 (1990), pp.~267–284, \url{https://doi.org/10.1007/bf00049571}.
Bezanson-Edelman-Karpinski-Shah-2017 \textsc{J.~Bezanson, A.~Edelman, S.~Karpinski, and V.~B. Shah}, \emph{Julia: A fresh approach to numerical computing}, SIAM Rev., 59 (2017), pp.~65–98, \url{https://doi.org/10.1137/141000671}.
Brent-1973 \textsc{R.~P. Brent}, \emph{Algorithms for Minimization without Derivatives}, Prentice-Hall, Englewood Cliffs, NJ, 1973.
Brent-Winograd-Wolfe-1973 \textsc{R.~P. Brent, S.~Winograd, and P.~Wolfe}, \emph{Optimal iterative processes for root-finding}, Numer. Math., 20 (1973), pp.~327–341, \url{https://doi.org/10.1007/bf01402555}.
Bre-1971 \textsc{C.~Brezinski}, \emph{Comparaison des suites convergentes}, Rev. Française Inform. Rech. Opér., 5 (1971), pp.~95–99.
Brez-1977 \textsc{C.~Brezinski}, \emph{Accélération de la Convergence en Analyse Numérique}, Springer-Verlag, Berlin, 1977.
Brez-1979 \textsc{C.~Brezinski}, \emph{Limiting relationships and comparison theorems for sequences}, Rend. Circolo Mat. Palermo Ser. II, 28 (1979), pp.~273–280, \url{https://doi.org/10.1007/bf02844100}.
Brez-1985 \textsc{C.~Brezinski}, \emph{Vitesse de convergence d'une suite}, Rev. Roumaine Math. Pures Appl., 30 (1985), pp.~403–417.
Brown-1987 \textsc{P.~N. Brown}, \emph{A local convergence theory for combined inexact-Newton/finite-difference projection methods}, SIAM J. Numer. Anal., 24 (1987), pp.~407–434, \url{https://doi.org/10.1137/0724031}.
Cajori-1911 \textsc{F.~Cajori}, \emph{Historical note on the Newton-Raphson method of approximation}, Amer. Math. Monthly, 18 (1911), pp.~29–32, \url{https://doi.org/10.2307/2973939}.
Catinas-2001 \textsc{E.~Cătinaş}, \emph{Inexact perturbed Newton methods and applications to a class of Krylov solvers}, J. Optim. Theory Appl., 108 (2001), pp.~543–570, \url{https://doi.org/10.1023/a:1017583307974}.
Catinas-2001-accelerating \textsc{E.~Cătinaş}, \emph{On accelerating the convergence of the successive approximations method}, Rev. Anal. Numér. Théor. Approx., 30 (2001), pp.~3–8, \url{https://ictp.acad.ro/accelerating-convergence-successive-approximations-method/} (accessed 2021/04/07).
Catinas-2002 \textsc{E.~Cătinaş}, \emph{On the superlinear convergence of the successive approximations method}, J. Optim. Theory Appl., 113 (2002), pp.~473–485, \url{https://doi.org/10.1023/a:1015304720071}.
Catinas-2005 \textsc{E.~Cătinaş}, \emph{The inexact, inexact perturbed and quasi-Newton methods are equivalent models}, Math. Comp., 74 (2005), pp.~291–301, \url{https://doi.org/10.1090/s0025-5718-04-01646-1}.
Catinas-2007 \textsc{E.~Cătinaş}, \emph{Newton and Newton-type Methods in Solving Nonlinear Systems in \(\mathbb{R}^N\)}, Risoprint, Cluj-Napoca, Romania, 2007, \url{https://ictp.acad.ro/methods-of-newton-and-newton-krylov-type/} (accessed 2021/06/09).
Catinas-2009-ball \textsc{E.~Cătinaş}, \emph{Estimating the radius of an attraction ball}, Appl. Math. Lett., 22 (2009), pp.~712–714, \url{https://doi.org/10.1016/j.aml.2008.08.007}.
Cat2019-survey \textsc{E.~Cătinaş}, \emph{A survey on the high convergence orders and computational convergence orders of sequences}, Appl. Math. Comput., 343 (2019), pp.~1–20, \url{https://doi.org/10.1016/j.amc.2018.08.006}.
Catinas-manuscript-1 \textsc{E.~Cătinaş}, \emph{How Many Steps Still Left to \(x^\ast\)? Part II. Newton Iterates without \(f\) and \(f'\)}, manuscript, 2020.
Cauchy-1829 \textsc{A.~Cauchy}, \emph{Sur la détermination approximative des racines d'une équation algébrique ou transcendante}, Oeuvres Complètes (II) 4, Gauthier-Villars, Paris, 1899, 23 (1829), pp.~573–609, \url{http://iris.univ-lille.fr/pdfpreview/bitstream/handle/1908/4026/41077-2-4.pdf?sequence=1} (accessed 2021/04/07).
Chabert-1999 \textsc{J.-L. Chabert}, ed., \emph{A History of Algorithms. From the Pebble to the Microchip}, Springer-Verlag, Berlin, 1999.
Chen-Argyros-2010 \textsc{J.~Chen and I.~K. Argyros}, \emph{Improved results on estimating and extending the radius of an attraction ball}, Appl. Math. Lett., 23 (2010), pp.~404–408, \url{https://doi.org/10.1016/j.aml.2009.11.007}.
Conn-Gould-Toint-2000 \textsc{A.~R. Conn, N.~I.~M. Gould, and P.~L. Toint}, \emph{Trust-Region Methods}, SIAM, Philadelphia, PA, 2000, \url{https://doi.org/10.1137/1.9780898719857}.
Dahlquist-Bjork-1974 \textsc{G.~Dahlquist and Å.~Björck}, \emph{Numerical Methods}, Dover, Mineola, NY, 1974.
DemEisSte82 \textsc{R.~S. Dembo, S.~C. Eisenstat, and T.~Steihaug}, \emph{Inexact Newton methods}, SIAM J. Numer. Anal., 19 (1982), pp.~400–408, \url{https://doi.org/10.1137/0719025}.
Demmel-1997 \textsc{J.~W. Demmel}, \emph{Applied Numerical Linear Algebra}, SIAM, Philadelphia, PA, 1997, \url{https://doi.org/10.1137/1.9781611971446}.
Dennis-1968 \textsc{J.~E. Dennis}, \emph{On Newton-like methods}, Numer. Math., 11 (1968), pp.~324–330, \url{https://doi.org/10.1007/bf02166685}.
Dennis-More-1974 \textsc{J.~E. Dennis, Jr., and J.~J. Moré}, \emph{A characterization of superlinear convergence and its application to quasi-Newton methods}, Math. Comp., 28 (1974), pp.~549–560, \url{https://doi.org/10.1090/s0025-5718-1974-0343581-1}.
Dennis-More-1977 \textsc{J.~E. Dennis, Jr., and J.~J. Moré}, \emph{Quasi-Newton methods, motivation and theory}, SIAM Rev., 19 (1977), pp.~46–89, \url{https://doi.org/10.1137/1019005}.
Dennis-Schnabel-1983 \textsc{J.~E. Dennis, Jr., and R.~B. Schnabel}, \emph{Numerical Methods for Unconstrained Optimization and Nonlinear Equations}, SIAM, Philadelphia, PA, 1996, \url{https://doi.org/10.1137/1.9781611971200}.
Deslauries-Dubuc-1996 \textsc{G.~Deslauriers and S.~Dubuc}, \emph{Le calcul de la racine cubique selon Héron}, Elem. Math., 51 (1996), pp.~28–34, \url{http://eudml.org/doc/141587} (accessed 2020/02/18).
Deuflhard-2011 \textsc{P.~Deuflhard}, \emph{Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms}, Springer-Verlag, Heidelberg, 2004.
Deuflhard-Potra-1992 \textsc{P.~Deuflhard and F.~A. Potra}, \emph{Asymptotic mesh independence of Newton–Galerkin methods via a refined Mysovskii theorem}, SIAM J. Numer. Anal., 29 (1992), pp.~1395–1412, \url{https://doi.org/10.1137/0729080}.
Diez-2003 \textsc{P.~Díez}, \emph{A note on the convergence of the secant method for simple and multiple roots}, Appl. Math. Lett., 16 (2003), pp.~1211–1215, \url{https://doi.org/10.1016/s0893-9659(03)90119-4}.
Dubeau-Gnang-2014 \textsc{F.~Dubeau and C.~Gnang}, \emph{Fixed point and Newton's methods for solving a nonlinear equation: From linear to high-order convergence}, SIAM Rev., 56 (2014), pp.~691–708, \url{https://doi.org/10.1137/130934799}.
Fourier-1818 \textsc{J.~Fourier}, \emph{Question d'analyse algébrique}, Bull. Sci. Soc. Philo., 67 (1818), pp.~61–67, \url{http://gallica.bnf.fr/ark:/12148/bpt6k33707/f248.item} (accessed 2021/04/07).
Gautschi-2011 \textsc{W.~Gautschi}, \emph{Numerical Analysis}, 2nd ed., Birkhäuser/Springer, New York, 2011, \url{https://doi.org/10.1007/978-0-8176-8259-0}.
GrauSanchez-Noguera-Gutierrez-2010 \textsc{M.~Grau-Sánchez, M.~Noguera, and J.~M. Gutiérrez}, \emph{On some computational orders of convergence}, Appl. Math. Lett., 23 (2010), pp.~472–478, \url{https://doi.org/10.1016/j.aml.2009.12.006}.
Greenbaum-Chartier-2012 \textsc{A.~Greenbaum and T.~P. Chartier}, \emph{Numerical Methods. Design, Analysis, and Computer Implementation of Algorithms}, Princeton University Press, Princeton, NJ, 2012.
Heath-2018 \textsc{M.~T. Heath}, \emph{Scientific Computing. An Introductory Survey}, 2nd ed., SIAM, Philadelphia, PA, 2018, \url{https://doi.org/10.1137/1.9781611975581}.
Heinkenschloss-2018 \textsc{M.~Heinkenschloss}, \emph{Scientific Computing. An Introductory Survey}, Rice University, Houston, TX, 2018.
Herzberger-Metzner-1996 \textsc{J.~Herzberger and L.~Metzner}, \emph{On the Q-order of convergence for coupled sequences arising in iterative numerical processes}, Computing, 57 (1996), pp.~357–363, \url{https://doi.org/10.1007/BF02252254}.
Higham-2002 \textsc{N.~J. Higham}, \emph{Accuracy and Stability of Numerical Algorithms}, 2nd ed., SIAM, Philadelphia, PA, 2002, \url{https://doi.org/10.1137/1.9780898718027}.
IgaYpm97 \textsc{M.~Igarashi and T.~J. Ypma}, \emph{Empirical versus asymptotic rate of convergence of a class of methods for solving a polynomial equation}, J. Comput. Appl. Math., 82 (1997), pp.~229–237, \url{https://doi.org/10.1016/s0377-0427(97)00077-0}.
Jay-2001 \textsc{L.~O. Jay}, \emph{A note on Q-order of convergence}, BIT, 41 (2001), pp.~422–429, \url{https://doi.org/10.1023/A:1021902825707}.
Jeeves-1958 \textsc{T.~A. Jeeves}, \emph{Secant modification of Newton's method}, Commun. ACM, 1 (1958), pp.~9–10, \url{https://doi.org/10.1145/368892.368913}.
Katz-1998 \textsc{V.~J. Katz}, \emph{A History of Mathematics. An Introduction}, 2nd ed., Addison-Wesley, 1998.
Kelley-1995 \textsc{C.~T. Kelley}, \emph{Iterative Methods for Linear and Nonlinear Equations}, SIAM, Philadelphia, PA, 1995, \url{https://doi.org/10.1137/1.9781611970944}.
Kelley-1999 \textsc{C.~T. Kelley}, \emph{Iterative Methods for Optimization}, SIAM, Philadelphia, PA, 1999, \url{https://doi.org/10.1137/1.9781611970920}.
Kennedy-Transue-1956 \textsc{E.~S. Kennedy and W.~R. Transue}, \emph{A medieval iterative algorism}, Amer. Math. Monthly, 63 (1956), pp.~80–83, \url{https://doi.org/10.1080/00029890.1956.11988762}.
Knuth-1976 \textsc{D.~E. Knuth}, \emph{Big Omicron and big Omega and big Theta}, ACM SIGACT News, 8 (1976), pp.~18–24, \url{https://doi.org/10.1145/1008328.1008329}.
Knuth-1984 \textsc{D.~E. Knuth}, \emph{The TeXbook}, Addison-Wesley, 1984.
Kollerstrom-1992 \textsc{N.~Kollerstrom}, \emph{Thomas Simpson and Newton's method of approximation: An enduring myth}, British J. History Sci., 25 (1992), pp.~347–354, \url{https://doi.org/10.1017/s0007087400029150}.
Liang-2007 \textsc{K.~Liang}, \emph{Homocentric convergence ball of the secant method}, Appl. Math. Chin. Univ., 22 (2007), pp.~353–365, \url{https://doi.org/10.1007/s11766-007-0313-3}.
Luca-Pavaloiu-1997 \textsc{D.~Luca and I.~Păvăloiu}, \emph{On the Heron's method for approximating the cubic root of a real number}, Rev. Anal. Numér. Théor. Approx., 26 (1997), pp.~103–108, \url{https://ictp.acad.ro/jnaat/journal/article/view/1997-vol26-nos1-2-art15} (accessed 2021/04/07).
MatLab \textsc{Mathworks}, \emph{MATLAB}, Version 2019b, 2019, \url{http://www.mathworks.com} (accessed 2021/04/07).
McNamee-2007 \textsc{J.~M. McNamee}, \emph{Numerical Methods for Roots of Polynomials, Part I}, Elsevier, Amsterdam, 2007.
McNamee-Pan-2013 \textsc{J.~M. McNamee and V.~Y. Pan}, \emph{Numerical Methods for Roots of Polynomials, Part II}, Elsevier, Amsterdam, 2013.
Merzbach-Boyer-2011 \textsc{U.~C. Merzbach and C.~B. Boyer}, \emph{A History of Mathematics}, 3rd ed., John Wiley & Sons, 2011.
Muller-et-al-2018 \textsc{J.-M. Muller, N.~Brunie, F.~de Dinechin, C.-P. Jeannerod, M.~Joldes, V.~Lefèvre, G.~Melquiond, N.~Revol, and S.~Torres}, \emph{Handbook of Floating-Point Arithmetic}, 2nd ed., Springer, New York, 2018, \url{https://doi.org/10.1007/978-3-319-76526-6}.
Nemirovsky-Yudin-1983 \textsc{A.~S. Nemirovsky and D.~B. Yudin}, \emph{Problem Complexity and Method Efficiency in Optimization}, Wiley, 1983.
Neumaier-2001 \textsc{A.~Neumaier}, \emph{Introduction to Numerical Analysis}, Cambridge University Press, 2001.
Nocedal-Wright-2006 \textsc{J.~Nocedal and S.~J. Wright}, \emph{Numerical Optimization}, 2nd ed., Springer, New York, 2006, \url{https://doi.org/10.1007/978-0-387-40065-5}.
Ortega-1987 \textsc{J.~M. Ortega}, \emph{Numerical Analysis. A Second Course}, SIAM, Philadelphia, PA, 1990, \url{https://doi.org/10.1137/1.9781611971323}.
OR70 \textsc{J.~M. Ortega and W.~C. Rheinboldt}, \emph{Iterative Solution of Nonlinear Equations in Several Variables}, SIAM, Philadelphia, PA, 2000, \url{https://doi.org/10.1137/1.9780898719468}.
Ostrowski-1966 \textsc{A.~M. Ostrowski}, \emph{Solution of Equations and Systems of Equations}, 2nd ed., Academic Press, New York, 1966; 3rd ed., 1972.
Overton-2001 \textsc{M.~L. Overton}, \emph{Numerical Computing with IEEE Floating Point Arithmetic}, SIAM, Philadelphia, PA, 2001, \url{https://doi.org/10.1137/1.9780898718072}.
Papakonstantinou-Tapia-2013 \textsc{J.~M. Papakonstantinou and R.~A. Tapia}, \emph{Origin and evolution of the secant method in one dimension}, Amer. Math. Monthly, 120 (2013), pp.~500–518, \url{https://doi.org/10.4169/amer.math.monthly.120.06.500}.
Pavaloiu-1981 \textsc{I.~Păvăloiu}, \emph{Sur l'ordre de convergence des méthodes d'itération}, Mathematica, 23 (46) (1981), pp.~261–265, \url{https://ictp.acad.ro/convergence-order-iterative-methods/}.
Pavaloiu-1999 \textsc{I.~Păvăloiu}, \emph{Optimal algorithms concerning the solving of equations by interpolation}, in Research on Theory of Allure, Approximation, Convexity and Optimization, Editura Srima, 1999, pp.~222–248, \url{http://ictp.acad.ro/optimal-algorithms-concerning-solving-equations-interpolation/} (accessed 2021/04/07).
Plofker-1996 \textsc{K.~Plofker}, \emph{An example of the secant method of iterative approximation in a fifteenth-century Sanskrit text}, Historia Math., 23 (1996), pp.~246–256, \url{https://doi.org/10.1006/hmat.1996.0026}.
Plofker-2002 \textsc{K.~Plofker}, \emph{Use and transmission of iterative approximations in India and the Islamic World}, in From China to Paris: 2000 Years Transmission of Mathematical Ideas, Y.~Dold-Samplonius, J.~W. Dauben, M.~Folkerts, and B.~Van Dalen, eds., Franz Steiner Verlag Wiesbaden GmbH, Stuttgart, 2002, pp.~167–186.
Polak-1997 \textsc{E.~Polak}, \emph{Optimization. Algorithms and Consistent Approximations}, Springer-Verlag, New York, 1997.
Potra89 \textsc{F.~A. Potra}, \emph{On Q-order and R-order of convergence}, J. Optim. Theory Appl., 63 (1989), pp.~415–431, \url{https://doi.org/10.1007/bf00939805}.
Potra-2001 \textsc{F.~A. Potra}, \emph{Q-superlinear convergence of the iterates in primal-dual interior-point methods}, Math. Program. Ser. A, 91 (2001), pp.~99–115, \url{https://doi.org/10.1007/s101070100230}.
Potra-2017 \textsc{F.~A. Potra}, \emph{A superquadratic version of Newton's method}, SIAM J. Numer. Anal., 55 (2017), pp.~2863–2884, \url{https://doi.org/10.1137/17M1121056}.
Potra-Ptak-1984 \textsc{F.~A. Potra and V.~Pták}, \emph{Nondiscrete Induction and Iterative Processes}, Pitman, Boston, MA, 1984.
Potra-Sheng-1998 \textsc{F.~A. Potra and R.~Sheng}, \emph{A path following method for LCP with superlinearly convergent iteration sequence}, Ann. Oper. Res., 81 (1998), pp.~97–114, \url{https://doi.org/10.1023/A:1018942131812}.
Quarteroni-Sacco-Saleri-2007 \textsc{A.~Quarteroni, R.~Sacco, and F.~Saleri}, \emph{Numerical Mathematics}, Springer, Berlin, 2007.
Rashed-1994 \textsc{R.~Rashed}, \emph{The Development of Arabic Mathematics: Between Arithmetic and Algebra}, Stud. Philos. Sci.~156, Springer, Boston, MA, 1994.
Raydan-1993 \textsc{M.~Raydan}, \emph{Exact order of convergence of the secant method}, J. Optim. Theory Appl., 78 (1993), pp.~541–551, \url{https://doi.org/10.1007/BF00939881}.
Rhe77 \textsc{W.~C. Rheinboldt}, \emph{An adaptive continuation process for solving systems of nonlinear equations}, in Mathematical Models and Numerical Methods, Banach Ctr. Publ.~3, Polish Academy of Science, 1977, pp.~129–142, \url{https://doi.org/10.4064/-3-1-129-142}.
Rhe98 \textsc{W.~C. Rheinboldt}, \emph{Methods for Solving Systems of Nonlinear Equations}, 2nd ed., SIAM, Philadelphia, PA, 1998, \url{https://doi.org/10.1137/1.9781611970012}.
Sauer-2012 \textsc{T.~Sauer}, \emph{Numerical Analysis}, 2nd ed., Pearson, Boston, MA, 2012.
Schroder-1870 \textsc{E.~Schröder}, \emph{Ueber unendlich viele Algorithmen zur Auflösung der Gleichungen}, Math. Ann., 2 (1870), pp.~317–365, \url{https://doi.org/10.1007/bf01444024}; available as On Infinitely Many Algorithms for Solving Equations, Tech. Reports TR-92-121 and TR-2990, Institute for Advanced Computer Studies, University of Maryland, 1992; translated by G.~W. Stewart.
Schwetlick-1979 \textsc{H.~Schwetlick}, \emph{Numerische Lösung Nichtlinearer Gleichungen}, VEB, Berlin, 1979.
Shampine-Allen-Pruess-1997 \textsc{L.~F. Shampine, R.~C. Allen, Jr., and S.~Pruess}, \emph{Fundamentals of Numerical Computing}, John Wiley and Sons, New York, 1997.
Steihaug-2019 \textsc{T.~Steihaug}, \emph{Computational science in the eighteenth century. Test cases for the methods of Newton, Raphson, and Halley: 1685 to 1745}, Numer. Algorithms, 83 (2020), pp.~1259–1275, \url{https://doi.org/10.1007/s11075-019-00724-8}.
Suli-Mayers-2003 \textsc{E.~Süli and D.~Mayers}, \emph{An Introduction to Numerical Analysis}, Cambridge University Press, Cambridge, UK, 2003.
Tantau-2020 \textsc{T.~Tantau}, \emph{The tikz and pgf Packages. Manual for Version 3.1.5b-34-gff02ccd1}, \url{https://github.com/pgf-tikz/pgf}.
Tapia-Dennis-Schafermeyer-2018 \textsc{R.~A. Tapia, J.~E. Dennis, Jr., and J.~P. Schäfermeyer}, \emph{Inverse, shifted inverse, and Rayleigh quotient iteration as Newton's method}, SIAM Rev., 60 (2018), pp.~3–55, \url{https://doi.org/10.1137/15M1049956}.
Tapia-Whitley-1988 \textsc{R.~A. Tapia and D.~L. Whitley}, \emph{The projected Newton method has order \(1+\sqrt{2}\) for the symmetric eigenvalue problem}, SIAM J. Numer. Anal., 25 (1988), pp.~1376–1382, \url{https://doi.org/10.1137/0725079}.
Traub-1964 \textsc{J.~F. Traub}, \emph{Iterative Methods for the Solution of Equations}, Prentice Hall, Englewood Cliffs, NJ, 1964.
Tyrtyshnikov-1997 \textsc{E.~E. Tyrtyshnikov}, \emph{A Brief Introduction to Numerical Analysis}, Birkhäuser/Springer, New York, 1997.
Vandergraft-1983 \textsc{J.~S. Vandergraft}, \emph{Introduction to Numerical Computations}, 2nd ed., Academic Press, New York, 1983.
Walker1997 \textsc{H.~F. Walker}, \emph{An approach to continuation using Krylov subspace methods}, in Computational Science in the 21st Century, J.~Periaux, ed., John Wiley and Sons, 1997, pp.~72–81.
Wright-1997 \textsc{S.~J. Wright}, \emph{Primal-Dual Interior-Point Methods}, SIAM, Philadelphia, PA, 1997, \url{https://doi.org/10.1137/1.9781611971453}.
Wright-2018 \textsc{S.~J. Wright}, \emph{Optimization algorithms for data analysis}, in The Mathematics of Data, IAS/Park City Math. Ser.~25, AMS, Providence, RI, 2018, pp.~49–97.
Ypma-1995 \textsc{T.~J. Ypma}, \emph{Historical development of the Newton–Raphson method}, SIAM Rev., 37 (1995), pp.~531–551, \url{https://doi.org/10.1137/1037125}.