Abstract
The strict superlinear order (superlinear convergence) was usually regarded as having an intermediate speed between between linear and with order \(p>1\).
In this paper we analyze the superlinear convergence (superlinear order/rate) of sequences and we show that actually there are four distinct classes of strict superlinear order: “weak”, “medium”, “strong” and “mixed”. The speed of the sequences from the first three classes is increasingly much faster, term-by-term big Oh, i.e.,
\[|x^\ast-x_k|=\mathcal{O}(|y^\ast-y_k|^{\alpha}), \mbox{ as } k\rightarrow \infty, \forall \alpha>1 \mbox{ given},
\]whereas the speed of the “mixed” class cannot be assessed.
We prove that the speed of the sequences from the “medium” and “weak” classes is term-by-term slower than the speed of the sequences with high classical C-orders \(p>1\) (in the sense of big Oh above), while an example shows that certain sequences from the “mixed” class may be term-by-term faster than some sequences with infinite C-order.
We also show that for a given sequence with strict superlinear convergence, one can evaluate numerically to which class it belongs.
Some recent results of Rodomanov and Nesterov (Math. Program., 2022), resp. Ye et al. (Math. Program., 2023) show that certain classical quasi-Newton methods (DFP, BFGS and SR1) belong to the “weak” class.
Authors
Emil Cătinaş
“Tiberiu Popoviciu” Institute of Numerical Analysis
Keywords
convergent sequence, convergence order, convergence speed, superlinear order, Q-order, convergence rate, big Oh, quasi-Newton method, DFP, BFGS, SR1.
Paper coordinates
E. Cătinaş, The strict superlinear order can be faster than the infinite order, Numer. Algor., 95 (2024), pp. 1177–1186. https://doi.org/10.1007/s11075-023-01604-y
About this paper
Journal
Numerical Algorithms
Publisher Name
Springer Nature
Print ISSN
1017-1398
Online ISSN
1572-9265
google scholar link
Paper (preprint) in HTML version
The strict superlinear order
can be faster than the infinite order
Abstract
The sequences with strict superlinear convergence are the output of numerous algorithms; such speed is clearly much faster than linear, but is it also slower than, say, quadratic?
In this note we show that actually there are four distinct classes of strict superlinear order: “weak”, “medium”, “strong” and “mixed”. The speed of the sequences from the first three classes is increasingly much faster (term-by-term big Oh, i.e., , as given), whereas the speed of the “mixed” class cannot be assessed.
We prove that the speed of the sequences from the “medium” and “weak” classes is term-by-term slower than the speed of the sequences with high classical -orders (in the sense of big Oh above), while an example shows that certain sequences from the “mixed” class may be term-by-term faster than some sequences with infinite -order.
We also show that for a given sequence with strict superlinear convergence, one can evaluate numerically to which class it belongs.
Some recent results of Rodomanov and Nesterov (2022), resp. Qi et al. (2023) show that certain classical quasi-Newton methods (DFP, BFGS and SR1) belong to the “weak” class.
Keywords: convergent sequence, convergence order, convergence speed, superlinear order, -order, convergence rate, big Oh, quasi-Newton method, DFP, BFGS, SR1.
MSC: 40A05.
1 Introduction
We consider here real sequences converging to finite limits, and, since we analyze the behavior of the absolute value of their errors, we assume without loss of generality that and (i.e., ). A more general frame may be considered (normed or metric spaces), but the one-dimensional setting is enough to analyze some essential properties.
First we recall some definitions and notations regarding orders other than superlinear. The quotient convergence factors are defined by [12, sect. 9.1]
with their lower/upper limits given by , resp. [1].
The sequence has -order if (i.e., exists and is finite and nonzero), the linear -order is attained when , while the sublinear -order requires .
For the -order there are four equivalent definitions (see [5], [15], [1], [7], [10]), but we consider here only two convenient ones
() | ||||
() |
The lower/upper -orders, denoted by resp. , are (see [1], [7] and [10])
and it holds:
-
•
, (retrieving in another way),
- •
the condition providing the third equivalent definition of the -order . When , the four definitions are no longer equivalent, as shown in [1].
Remark 1
, when the limit exists (i.e., cannot hold). When (and therefore ) the “global speed” of may be difficult to assess.
The sequences with , called in [10] “with unbounded nonmonotone errors”333Recall that one cannot have (as suggested by a typo in the abstract of [10]), but just . despite not attaining -linear order, were shown that sometimes may have fast speed, as, e.g., (see [10, Ex. 2.46]). Such sequences were also called subsequently there “with no -order” [10, §2.2] but we notice here that such terminology is not suitable: shows that one may have and ()-order one.
As regarding the fast speed, it cannot be attained when (due to the obvious lower bounds of the errors) and one necessarily needs ; however, condition alone is not also sufficient for fast speed: take above or
The -superlinear convergence—which we call here simply as superlinear—requires (see [12, ch. 9], [4], [14], [7], [10])
(1) |
and it is usually understood as “at least superlinear order”: indeed, all with -order or just with lower -order are also superlinear ( implies s.t. , [12, sect. 9.1], [10] and therefore ).
Remark 2
There is another type of superlinear speed, the -superlinear order, usually defined by using root-convergence factors (see [12, ch. 9], [14]), or (as for higher orders), simply by requiring that the errors are bounded by another sequence, converging to zero with -superlinear order [19], [13, (37b)].
Tapia [19] noticed that convergence with (lower) -order arbitrary high does not imply -superlinear convergence, while Boggs, Tolle and Wang [3] concluded that “because an -superlinearly convergent sequence need not be even -linearly convergent, -superlinear convergence by itself is computationally meaningless”. We have seen in the previous Example that the fast sequence with -superlinear convergence (in fact with infinite -order) not only that does not attain some type of -linear order, but even has unbounded nonmonotone errors ().
The above Example therefore shows that the -superlinear convergence is not necessary for high speed: there converges pretty fast despite .
2 The strict superlinear (SSL) convergence
The strict superlinear (SSL) convergence requires additionally that [10]
(2) |
i.e., does attain neither some -order nor some lower -order .
The SSL was usually seen as a fast convergence, with an intermediate speed between the linear and high orders. Indeed, the SSL is (much) faster than the linear -order, in the sense that
() |
For the proof we use here (and later too) the well known ratio test for sequences.
Lemma 4
Given a sequence with , if then . The same conclusion holds if .
However, the SSL convergence is not always slower than that with some -order , as we will show in Example 10. Before that, we further analyze the SSL order.
Example 5
The convergence of and of is SSL—with the “strictness” implied by their -order one ().
By denoting , , 444When is starting from , is starting from (as, e.g., , , etc.). condition (1) may be equivalently written as (see, e.g., [13, (37b)] and [10])
(3) |
For the SSL convergence of , the -order of cannot be greater than one.
Theorem 6
Let with as . If has -order then from (3) has SSL order. If has -order then has -order
Now, consider , or just simply ask ourselves: how fast does converge to zero? We obtain the following classification of the SSL order:
- (a) (strong SSL)
-
strict superlinearly:
- (b) (medium SSL)
-
with -order :
- (c) (weak SSL)
-
-sublinearly:
- (d) (mixed SSL)
-
otherwise.
Remark 7
We note that the weak SSL order may be defined by requiring only that (instead of requiring -linear order).
Example 8
One can show that has medium SSL while has strong SSL order.
Example 9
has weak SSL order (hint: ).
Example 10
a) (strong SSL) so ; one may take further and by theorem 6 is again SSL, etc. For another instance, take , so
b) (medium SSL) for fixed, take so
c) (weak SSL) , so
For the mixed class, we may take (similar) sequences from Remark 1 as .
d) (mixed SSL) etc., resulting the corresponding sequences , , , resp. .
We notice that , , i.e., the SSL is term-by-term faster than (with no -order) which is in turn term-by-term faster than (with infinite - and -order).
In Figure 1 we have plotted some of these sequences (), computed using the Tikz/Pgf LaTeX package [18] (the cases , from (a), from (c) and , from (d), were first computed in Julia [2], using BigFloat with setprecision(10^4), and then passed to Tikz/Pgf).
The first values of the fastest sequence (in red), written truncated, are (not plotted), , , , , and cannot therefore be plotted, being below the realmin from Tikz/Pgf.
The second fastest sequence (violet, with infinite -order) has the following iterates (not plotted), , , , (not plotted).
The strong SSL appears close to the quadratic , but their speed is quite different (as also shown by the next Theorem): indeed, at , has approximately the same value as at ; one more step of needs four steps of to get the same magnitude.
With the scales of Figure 1, the weak SSL of appears only slightly faster than that of the linear , but the conclusions of Proposition 3 hold.
The following result justifies the terminology of strong, medium and weak SSL.
Theorem 11
Proof. Let with and .
As , by Lemma 4 we get and therefore . The second part is similar.
Remark 12
In [10, Rem. 2.50(b)] we noted (without proof) that “the -orders (i.e., sublinear, linear, strict superlinear, and infinite) form classes in (), increasing speed”. However, the SSL order was erroneously mentioned, as it is not a -order (its definition requires , as noted in (2)) and moreover such a statement does not hold, as we have already seen in Example 10.
Here we prove that the weak and the medium SSL are (much) slower than the -orders .
Proof. Let , . As , by Lemma 4 we successively get .
In some particular cases, the strong SSL is also (much) slower than -order .
Theorem 14
Proof. The proof is similar to the one of Theorem 13.
For the remaining part of this paper we denote the absolute value of errors by (instead of simply by ).
In case of at least superlinear convergence, it was shown that the absolute value of the errors and of the corrections behave asymptotically in the same way (the result also holds in ).
Lemma 15 (Potra–Pták Lemma, [14, Prop. 6.4]; see also [20])
at least -superlinearly iff at least -superlinearly.
If has (at least) -superlinear convergence, then it holds that (the Dennis–Moré Lemma [11])
(4) |
We obtain the following immediate result, which holds in as well.
Corollary 16
Given a sequence s.t. with SSL convergence, let and take
(5) |
-
•
If then has weak SSL;
-
•
if then has medium SSL;
-
•
if then has strong SSL;
-
•
if does not converge at all then has mixed SSL.
The proof is obvious, as the errors from the expressions by (4) may be replaced by corrections.
Example 17
We consider the SSL sequences from Example 10 and for each of them we represent graphically in Figure 2 the expressions (5), using the same corresponding colors as in Figure 1 (here the limit is known, but formula (5) does not require its knowledge).
For the same precision as in Example 10 (setprecision(10^4)), we have obtained nonzero results for ; the (truncated) values of for are, for the following ’s: (true limit ); (true limit ); (true limit ); (true limit ); (true limit , but very slow convergence); (true limit , but very slow convergence); (true limit ).
In the last two cases (Example 10d, , resp. ), much fewer relevant terms can be computed with this precision, and one clearly see we do not have convergence.
In each case it is confirmed numerically that the sequence belongs to the stated SSL class. However, it is worth noting that in some cases (e.g., and ) some more terms are needed to conclude that the sequence converges.
In certain applications it may be difficult to verify numerically whether converges to , to or to some value in (if at all), especially when the convergence is slow.
Numerous methods from Computational Optimization were proved to be with (at least) superlinear order. Three classical methods have explicit expressions for their errors, and now their speed can be assessed.
Example 18
The quasi-Newton methods for smooth unconstrained optimization are known for a long time that attain superlinear order, but it is only recently that some explicit expressions for the errors have been obtained.
Rodomanov and Nesterov [16] have shown for the Davidon–Fletcher–Powell (DFP) method that the errors are of the form ( depending on the dimension problem, the Lipschitz constant and the strong convexity parameter) and for the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method that similarly .
Subsequently, Ye et al. [21] have shown for the Symmetric-Rank 1 (SR1) method that .
We see that all the three methods attain weak SSL order, and it becomes interesting to find methods with medium or perhaps strong SSL order, for the reason that their faster speed may appear from the first terms.
3 Conclusions
The main role of the convergence orders is to allow the comparison of the speed of different convergent sequences: we usually expect that the higher the order, the faster the speed; while this is in general true (as shown in [8] for orders ), these notions are imperfect and there are some notable exceptions. Indeed, a sequence with no -order was shown in [10] that may have fast speed (term-by-term faster than another sequence, with infinite order).
Here we have seen that a similar phenomenon holds in the case of the (mixed) superlinear convergence, as such a sequence may converge faster than another sequence, with infinite order (though, it is worth noting that not all sequences with mixed SSL order have such fast speed).
The explanation for such a behavior resides in the fact that both the superlinear and the -orders rely on conditions on consecutive errors, not on “global” relations involving all the errors. The graph of the sequence from Example 10(d) () which was plotted in Figure 1 in olive color is perhaps most relevant: while its speed is high (steep slope), some of the consecutive errors do not have a high relative reduction.
A similar analysis may be taken for the linear convergence [9], and the higher orders [8], but with notable different particularities; we have also started to analyze the sublinear convergence.
Data availability. All data generated or analysed during this study are included in this published article.
Conflict of interest. The author has no conflict of interests to declare that are relevant to the content of this article.
Ethical Approval. Not Applicable.
Availability of supporting data. Not Applicable.
Competing interests. Not Applicable.
Funding. Not Applicable.
Authors’ contributions. Not Applicable.
Acknowledgments. Not Applicable.
References
- [1] (1990) Convergence rates and convergence-order profiles for sequences. Acta Appl. Math. 20 (), pp. 267–284. External Links: Document Cited by: §1, §1, §1.
- [2] (2017) Julia: a fresh approach to numerical computing. SIAM Review 59 (), pp. 65–98. External Links: Document Cited by: Example 10.
- [3] (1982) On the local convergence of quasi-newton methods for constrained optimization. SIAM Journal on Control and Optimization 20 (2), pp. 161–171. External Links: Document, Link, https://doi.org/10.1137/0320014 Cited by: Remark 2.
- [4] (1977) Accélération de la convergence en analyse numérique. Springer-Verlag, Berlin. Cited by: §1.
- [5] (1985) Vitesse de convergence d’une suite. Rev. Roumaine Math. Pures Appl. 30, pp. 403–417. Cited by: §1.
- [6] (2021) Measuring the measures. Note: (manuscript) Cited by: §3.
- [7] (2019) A survey on the high convergence orders and computational convergence orders of sequences. Appl. Math. Comput. 343 (), pp. 1–20. External Links: Document Cited by: §1, §1, §1.
- [8] (2021) Characterizing the classical convergence orders. Note: manuscript, submitted Cited by: §3, §3.
- [9] (2021) Characterizing the classical linear convergence order. Note: manuscript Cited by: §3.
- [10] (2021) How many steps still left to *?. SIAM Rev. 63 (3), pp. 585–624. External Links: Document Cited by: §1, §1, §1, §2, §2, §3, Remark 1, Remark 12, footnote 3.
- [11] (1974) A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comp. 28 (126), pp. 549–560. External Links: Document Cited by: Lemma 15.
- [12] (2000) Iterative solution of nonlinear equations in several variables. SIAM, PA. External Links: Document Cited by: §1, §1, Remark 2.
- [13] (1997) Optimization. algorithms and consistent approximations. Springer-Verlag, New York. Cited by: §2, Remark 2.
- [14] (1984) Nondiscrete induction and iterative processes. Pitman, Boston, Massachusetts. Cited by: §1, Lemma 15, Remark 2.
- [15] (1989) On Q-order and R-order of convergence. J. Optim. Theory Appl. 63 (3), pp. 415–431. External Links: Document Cited by: §1.
- [16] (2022) Rates of superlinear convergence for classical quasi-Newton methods. Math. Program. 194 (1-2, Ser. A), pp. 159–190. External Links: ISSN 0025-5610, Document, Link Cited by: Example 18.
- [17] (2021) Simpler proofs for the Q-order. Note: manuscript Cited by: §3.
- [18] The tikz and pgf packages. Manual for version 3.1.5b-34-gff02ccd1. External Links: Link Cited by: Example 10.
- [19] (1977) Diagonalized multiplier methods and quasi-Newton methods for constrained optimization. J. Optim. Theory Appl. 22 (2), pp. 135–194. External Links: ISSN 0022-3239, Document, Link Cited by: Remark 2, Remark 2.
- [20] (1997) An approach to continuation using Krylov subspace methods. Computational Science in the 21st Century, J. Periaux, ed., John Wiley and Sons, Ltd., pp. 72–81. Cited by: Lemma 15.
- [21] (2023) Towards explicit superlinear convergence rate for SR1. Math. Program. 199 (1-2, Ser. A), pp. 1273–1303. External Links: ISSN 0025-5610, Document, Link, MathReview Entry Cited by: Example 18.
[1] Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. SIAM, PA (2000)
[2] Beyer, W.A., Ebanks, B.R., Qualls, C.R.: Convergence rates and convergence-order profiles for sequences. Acta Appl. Math. 20, 267–284 (1990)
[3] Brezinski, C., Vitesse de convergence d’une suite. Rev. Roumaine Math. Pures Appl. 30, 403–417 (1985)
[4] Potra, F.A., On Q-order and R-order of convergence. J. Optim. Theory Appl. 63(3), 415–431 (1989)
[5] Catinas, E., A survey on the high convergence orders and computational convergence orders of sequences. Appl. Math. Comput. 343, 1–20 (2019)
[6] Catinas, E., How many steps still left to x*? SIAM Rev. 63(3), 585–624 (2021)
[7] Brezinski, C., Accélération de la Convergence en Analyse Numérique. Springer-Verlag, Berlin (1977)
[8] Potra, F.A., Pták, V., Nondiscrete Induction and Iterative Processes. Pitman, Boston, Massachusetts (1984)
[9] Tapia, R.A., Diagonalized multiplier methods and quasi-Newton methods for constrained optimization. J. Optim. Theory Appl. 22(2), 135–194 (1977)
[10] Polak, E., Optimization. Algorithms and Consistent Approximations. Springer-Verlag, New York (1997)
[11] Boggs, Paul T., Tolle, Jon W., Wang, Pyng, On the local convergence of Quasi-Newton methods for constrained optimization. SIAM Journal on Control and Optimization 20(2), 161–171 (1982)
[12] Tantau, T., The tikz and pgf packages. Manual for version 3.1.5b-34-gff02ccd1
[13] Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: a fresh approach to numerical computing. SIAM Review 59, 65–98 (2017)
[14] Walker, H.F., An approach to continuation using Krylov subspace methods. In: Periaux, J. (ed.) Computational Science in the 21st Century, pp. 72–81. John Wiley and Sons, Ltd. (1997)
[15] Dennis, J.E., Jr., Moré, J.J., A characterization of superlinear convergence and its application to quasiNewton methods. Math. Comp. 28(126), 549–560 (1974)
[16] Rodomanov, A., Nesterov, Y., Rates of superlinear convergence for classical quasi-Newton methods. Math. Program. 194(1–2, Ser. A), 159–190 (2022)
[17] Ye, H., Lin, D., Chang, X., Zhang, Z., Towards explicit superlinear convergence rate for SR1. Math. Program. 199(1–2, Ser. A), 1273–1303 (2023)
[18] Catinas, E., Characterizing the classical convergence orders. Manuscript, submitted (2021)
[19] Catinas, E., Characterizing the classical linear convergence order Manuscript. (2021)
[20] Catinas, E., Stan, A., Measuring the measures. (2021). (manuscript)
[21] Stan, A., Catinas, E., Simpler proofs for the Q-order. Manuscript (2021)