Tikhonov regularization of a perturbed heavy ball system with vanishing damping

Abstract

This paper deals with a perturbed heavy ball system with vanishing damping that contains a Tikhonov regularization term, in connection to the minimization problem of a convex Fréchet differentiable function. We show that the value of the objective function along a generated trajectory converges in order $o(1/t^2)$ to the global minimum of the objective function. We also obtain the fast convergence of the velocities towards zero. Moreover, we obtain that a trajectory generated by the dynamical system converges weakly to a minimizer of the objective function. Finally, we show that the presence of the Tikhonov regularization term assures the strong convergence of the generated trajectories to the element of minimal norm from the argmin set of the objective function.

Authors

Cristian Daniel Alecsa,
(“Tiberiu Popoviciu” Institute of Numerical Analysis, Romanian Academy,
Romanian Institute of Science and Technology)

Szilárd Csaba László
(Technical University of Cluj-Napoca)

Note on author affiliation

According to the version of the preprint posted on ResearchGate (http://doi.org/10.13140/RG.2.2.29212.92801), the paper was elaborated by the first author while he was working at ICTP.
For the final/published version of the paper, however, the first author removed his ICTP affiliation.

Keywords

convex optimization; heavy ball method; continuous second order dynamical system; Tikhonov regularization; convergence rate; strong convergence


Cite this paper as:

C.D. Alecsa, S.C. László, Tikhonov regularization of a perturbed heavy ball system with vanishing damping, SIAM J. Optim., 31 (2021) no. 4, pp. 2921-2954, http://doi.org/10.1137/20M1382027

About this paper

Journal

SIAM J. Optim.

Publisher Name

SIAM

Print ISSN

1052-6234

Online ISSN

1095-7189



Paper (preprint) in HTML form


TIKHONOV REGULARIZATION OF A PERTURBED HEAVY BALL SYSTEM WITH VANISHING DAMPING

CRISTIAN DANIEL ALECSA * AND SZILÁRD CSABA LÁSZLÓ †

Abstract

This paper deals with a perturbed heavy ball system with vanishing damping that contains a Tikhonov regularization term, in connection to the minimization problem of a convex Fréchet differentiable function. We show that the value of the objective function in a generated trajectory converges in order $o(1/t^{2})$ to the global minimum of the objective function. We also obtain the fast convergence of the velocities towards zero. Moreover, we obtain that a trajectory generated by the dynamical system converges weakly to a minimizer of the objective function. Finally, we show that the presence of the Tikhonov regularization term assures the strong convergence of the generated trajectories to an element of minimal norm from the argmin set of the objective function.

Key words. convex optimization; heavy ball method; continuous second order dynamical system; Tikhonov regularization; convergence rate; strong convergence.
AMS subject classifications. 34G20, 47J25, 90C25, 90C30, 65K10.
1. Introduction. Let $\mathcal{H}$ be a real Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$, and let $g:\mathcal{H}\longrightarrow\mathbb{R}$ be a convex Fréchet differentiable function. Consider the minimization problem
$$(P)\qquad \inf_{x\in\mathcal{H}} g(x)$$
in connection to the second order dynamical system
$$\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)+\epsilon(t)x(t)=0,\qquad x(t_0)=u_0,\ \dot{x}(t_0)=v_0, \tag{1.1}$$
where $t_0>0$, $(u_0,v_0)\in\mathcal{H}\times\mathcal{H}$, $\alpha\ge 3$, $\gamma>0,\ \beta\in\mathbb{R}$ or $\gamma=0,\ \beta\ge 0$, and $\epsilon:[t_0,+\infty)\longrightarrow\mathbb{R}_+$ is a nonincreasing function of class $C^{1}$ such that $\lim_{t\to+\infty}\epsilon(t)=0$. The starting time $t_0$ is taken strictly greater than zero because the coefficients $\frac{\alpha}{t}$ and $\frac{\beta}{t}$ have singularities at $0$. This is not a limitation of the generality of the proposed approach, since we will focus on the asymptotic behaviour of the generated trajectories.
First of all, note that the dynamical system (1.1) is the Tikhonov regularized version of the perturbed heavy ball system with vanishing damping considered in connection to the optimization problem (P) by Alecsa-László-Pinţa in [3]. The dynamical system considered in [3] can be seen as an intermediate system between the heavy ball system with vanishing damping [33] and the heavy ball system with Hessian driven damping [19], and possesses all the valuable properties of the latter ones. Indeed, according to [3], in case $\gamma>0,\ \beta\in\mathbb{R}$, or $\gamma=0,\ \beta\ge 0$, the objective function value along a trajectory generated by the perturbed heavy ball system converges in order $\mathcal{O}(1/t^{2})$ to the global minimum of the objective function and the trajectory converges weakly to a minimizer of the objective function. Further, according to [3, Remark 2], in case $\gamma=0$ and $\beta<0$ the perturbed heavy ball system can generate periodic solutions, therefore in this case the convergence of a generated trajectory to a minimizer of the objective is hopeless.
Throughout the paper we assume that $\nabla g$ is Lipschitz continuous on bounded sets and $\operatorname{argmin} g\neq\emptyset$. Further, the Tikhonov regularization parameter $\epsilon(t)$ satisfies one of the following assumptions (see also Remark 1).
(C1) There exist $K>1$ and $t_1\ge t_0$ such that
$$\dot{\epsilon}(t)\le-\frac{K}{2}\left|\gamma+\frac{\beta}{t}\right|\epsilon^{2}(t)\quad\text{for every } t\ge t_1.$$
(C2) There exist $K>0$ and $t_1\ge t_0$ such that
$$\epsilon(t)\le\frac{K}{t}\quad\text{for every } t\ge t_1.$$
Natural candidates for the Tikhonov regularization parameter that satisfy the above conditions are $\epsilon(t)=\frac{a}{t^{r}},\ a>0,\ r\ge 1$. In order to give a better perspective on the results obtained in this paper, for this special case of the Tikhonov regularization parameter we can conclude the following.
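For instance (a short verification added for the reader's convenience, not part of the original text), for $\epsilon(t)=a t^{-r}$ one has $\dot{\epsilon}(t)=-ar\,t^{-r-1}$, so (C1) is equivalent to
$$r\,t^{r-1}\ \ge\ \frac{Ka}{2}\left|\gamma+\frac{\beta}{t}\right|\quad\text{for every } t\ge t_1.$$
For $r>1$ the left-hand side tends to $+\infty$ while the right-hand side stays bounded, so (C1) holds for $t_1$ large enough; for $r=1$ the left-hand side equals $1$, so (C1) holds for some $K>1$ provided $a\gamma<2$ (and for any $K>1$ when $\gamma=0$). Condition (C2) is immediate: $a t^{-r}\le K t^{-1}$ holds for every $t\ge t_1\ge\max\{t_0,1\}$ with $K=a$, since $r\ge 1$.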
1. According to Theorem 2.1 and Theorem 2.3, if the Tikhonov regularization parameter is $\epsilon(t)=\frac{a}{t^{r}},\ a>0,\ r>2$, the following statements hold. When $\alpha\ge 3$, one has $g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g=\mathcal{O}(t^{-2})$ as $t\to+\infty$. Further, one has the integral estimates $\int_{t_0}^{+\infty} t^{2}\left\|\nabla g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)\right\|^{2}dt<+\infty$ whenever $\gamma>0,\ \beta\in\mathbb{R}$, and $\int_{t_0}^{+\infty} t\left\|\nabla g\left(x(t)+\frac{\beta}{t}\dot{x}(t)\right)\right\|^{2}dt<+\infty$ whenever $\gamma=0,\ \beta>0$. When $\alpha>3$ one has $g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g=o(t^{-2})$ and $\|\dot{x}(t)\|=o(t^{-1})$ as $t\to+\infty$, and $x(t)$ converges weakly to a minimizer of $g$. Further, $\int_{t_0}^{+\infty} t\|\dot{x}(t)\|^{2}dt<+\infty$ and $\int_{t_0}^{+\infty} t\left(g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g\right)dt<+\infty$.
2. According to Theorem 3.2, in the case $\epsilon(t)=\frac{a}{t^{r}},\ a>0,\ 1<r<2$, for $\alpha>3$ and for $\alpha=3,\ \gamma=0,\ \beta\ge 0$ one has $\liminf_{t\to\infty}\|x(t)-x^{*}\|=0$, where $x^{*}$ is the element of minimum norm of $\operatorname{argmin} g$. In addition, $x(t)$ converges strongly to $x^{*}$ when either the trajectory $\{x(t):t\ge T\}$ remains in the ball $B(0,\|x^{*}\|)$, or in its complement, for $T$ large enough.
3. In the case $\epsilon(t)=\frac{a}{t^{2}},\ a>0$, and $\alpha>3$, according to Theorem 2.2 one has that $x$ is bounded, $g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g=\mathcal{O}\left(\frac{1}{t^{2}}\right)$ and $\|\dot{x}(t)\|=\mathcal{O}\left(\frac{1}{t}\right)$ as $t\to+\infty$. Further, if $a>\frac{2}{9}\alpha(\alpha-3)$, according to Theorem 3.2 the conclusions stated at 2. hold.
Observe that according to 3. we are able to obtain both fast convergence of the function values and strong convergence of the trajectories for the same Tikhonov regularization parameter. For a long time this was an unsolved problem in the literature. However, recently Attouch and László [16] studied a dynamical system in connection to the optimization problem (P) and succeeded in obtaining rapid convergence towards the infimal value of $g$ together with strong convergence of the trajectories towards the element of minimum norm of the set of minimizers of $g$. In our context, one can observe that the case $r=2$ is critical, in the sense that it separates the two regimes: the case when we obtain fast convergence of the function values and weak convergence of the trajectories to a minimizer, and the case when the strong convergence of the trajectories to the minimizer of minimum norm is assured. These results are in concordance with the results obtained in [10] and [20], since, as will be shown in what follows, the dynamical system studied in this paper can be thought of as an intermediate system between the dynamical system studied in [10] and the dynamical system considered in [20]. Before we give a more enlightening discussion about the connection between the dynamical system (1.1) and the Tikhonov regularized dynamical systems studied in [10] and [20], we underline two new features of our analysis. Firstly, we show fast convergence of the velocity to zero, a property which is also obtained for the Tikhonov regularized system studied in [10], but is not shown for the Tikhonov regularized system considered in [20]. Secondly, we obtain some integral estimates for the gradient of the objective function; such estimates also appear for the Tikhonov regularized system studied in [20], but are not shown for the Tikhonov regularized system studied in [10].
For further insight into Tikhonov regularization techniques we refer to [10, 12, 13, 16, 20, 22, 25].
1.1. Connection with second order dynamical systems with asymptotically vanishing damping. The dynamical system (1.1) is strongly related to the second order dynamical system with an asymptotically vanishing damping term, studied by Su-Boyd-Candès in [33] in connection to the optimization problem (P), that is,
$$(AVD)_{\alpha}\qquad \ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla g(x(t))=0,\qquad x(t_0)=u_0,\ \dot{x}(t_0)=v_0,\quad u_0,v_0\in\mathcal{H}.$$
It is obvious that the latter system can be obtained from (1.1) by taking $\gamma=\beta=0$ and $\epsilon\equiv 0$. According to [33], the trajectories generated by $(AVD)_{\alpha}$ assure a fast minimization property of order $\mathcal{O}(1/t^{2})$ for the decay $g(x(t))-\min g$, provided $\alpha\ge 3$. For $\alpha>3$, it has been shown by Attouch-Chbani-Peypouquet-Redont [9] that each trajectory generated by $(AVD)_{\alpha}$ converges weakly to a minimizer of the objective function $g$. Further, it is shown in [17] and [30] that the asymptotic convergence rate of the values is actually $o(1/t^{2})$.
An appropriate discretization of $(AVD)_{\alpha}$ with $\alpha=3$ corresponds to Nesterov's historical algorithm [31]. Therefore, as was emphasized in [33], for $\alpha=3$ the system $(AVD)_{\alpha}$ can be seen as a continuous version of the accelerated gradient method of Nesterov.
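To make this correspondence concrete, the following minimal Python sketch (our own illustration, not code from the paper) implements the classical Nesterov scheme; its momentum coefficient $(k-1)/(k+2)\approx 1-3/k$ plays the role of the vanishing damping $\alpha/t$ with $\alpha=3$. The quadratic objective and the step size are assumptions chosen for the example.

```python
import numpy as np

def nesterov(grad_g, x0, step, iters):
    # Nesterov's accelerated gradient method, read as a discretization
    # of (AVD)_alpha with alpha = 3
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, iters + 1):
        # vanishing momentum (k - 1)/(k + 2) mirrors the damping 1 - 3/k
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        x_prev, x = x, y - step * grad_g(y)
    return x

# toy usage on an assumed objective g(x) = 0.5 * ||A x||^2
A = np.array([[1.0, 2.0]])
grad_g = lambda x: A.T @ (A @ x)
print(nesterov(grad_g, np.array([1.0, -1.0]), step=0.1, iters=500))
```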
However, the case $\alpha=3$ is critical: the convergence of the trajectories generated by $(AVD)_{\alpha}$ remains an open problem. The subcritical case $\alpha\le 3$ has been examined by Apidopoulos-Aujol-Dossal [5] and Attouch-Chbani-Riahi [11], with the convergence rate of the objective values $\mathcal{O}\left(t^{-\frac{2\alpha}{3}}\right)$.
When the objective function $g$ is not convex, the convergence of the trajectories generated by $(AVD)_{\alpha}$ is a largely open question. Recent progress has been made in [21], where the convergence of the trajectories of a system which can be considered as a perturbation of $(AVD)_{\alpha}$ has been obtained in a non-convex setting.
The corresponding inertial algorithms obtained from $(AVD)_{\alpha}$ via discretization are in line with the Nesterov accelerated gradient method and enjoy properties similar to the continuous case; see [24], and also [4, 7, 9, 27, 28] for further results and extensions to proximal-gradient algorithms for structured optimization. For other results concerning the system $(AVD)_{\alpha}$ and its extensions we refer to [11, 23, 26].
A version of $(AVD)_{\alpha}$ containing a Tikhonov regularization term, strongly related to (1.1), was considered by Attouch-Chbani-Riahi in [10]. According to [10], the presence of the Tikhonov regularization term $\epsilon(t)x(t)$ provides the strong convergence of the trajectories to the element of minimum norm of the set of minimizers of $g$, when $\epsilon(t)$ tends slowly to zero. We emphasize that (1.1), for the case $\beta=\gamma=0$, reduces to the system studied in [10].
1.2. Connection with second order dynamical systems with Hessian driven damping. The dynamical system (1.1) is also related to the second order dynamical system with a Hessian driven damping term, studied by Attouch-Peypouquet-Redont in [19], that is,
$$(DIN\text{-}AVD)_{\alpha,\beta}\qquad \ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\beta\nabla^{2}g(x(t))\dot{x}(t)+\nabla g(x(t))=0,\qquad x(t_0)=u_0,\ \dot{x}(t_0)=v_0,\quad u_0,v_0\in\mathcal{H},\ \beta\ge 0.$$
In [19], for the case $\alpha>3,\ \beta>0$, the authors showed the weak convergence of a generated trajectory to a minimizer of $g$ and obtained a convergence rate of order $o(1/t^{2})$ for the objective function along the trajectory. The temporal discretization of this dynamical system provides first-order algorithms which, beside the extrapolation term, contain a correction term equal to the difference of the gradients at two consecutive steps [8, 2]. Several recent studies have been devoted to this subject, see [6, 14, 15, 29, 32]. Further, Boţ-Csetnek-László considered in [20] the Tikhonov regularization of $(DIN\text{-}AVD)_{\alpha,\beta}$. They obtained fast convergence results for the function values along the trajectories and strong convergence results of the trajectory to the minimizer of minimum norm of the objective function. Now, by using the Taylor expansion of $\nabla g(\cdot)$ we get
$$\nabla g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)\approx\nabla g(x(t))+\left(\gamma+\frac{\beta}{t}\right)\nabla^{2}g(x(t))\dot{x}(t), \tag{1.2}$$
which shows that the system $(DIN\text{-}AVD)_{\alpha,\beta}$ with Tikhonov regularization term $\epsilon(t)x(t)$ considered in [20] and the dynamical system (1.1) are strongly related. In this paper, we aim to obtain fast convergence results for the function values along the trajectories generated by the dynamical system (1.1) and strong convergence results of the trajectories to the minimizer of minimum norm of the objective function, under assumptions similar to those considered in [20]. However, we emphasize that our objective function $g$ is of class $C^{1}$, while the objective function considered in [20] is of class $C^{2}$. Further, as we mentioned before, we are also able to show the rate $o(1/t)$ for the velocity. Moreover, according to [3], the dynamical system (1.1) (with $\epsilon\equiv 0$) leads via explicit Euler discretization to inertial algorithms; a sketch of such a discretization is given below. In particular, the Nesterov accelerated convex gradient method can be obtained from (1.1) via natural explicit discretization.
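To make this remark concrete, here is one possible explicit Euler discretization of (1.1), written as a minimal Python sketch (our own illustration; the schemes analysed in [3] may use a different time grid and normalization). Discretizing $\ddot{x}$ by central and $\dot{x}$ by backward differences on the grid $t_k=t_0+kh$, and setting $y_k=x_k+\frac{\gamma+\beta/t_k}{h}(x_k-x_{k-1})$, one arrives at $x_{k+1}=x_k+\left(1-\frac{\alpha h}{t_k}\right)(x_k-x_{k-1})-h^{2}\left(\nabla g(y_k)+\epsilon(t_k)x_k\right)$; note the two different inertial coefficients multiplying $x_k-x_{k-1}$.

```python
def inertial_step(x, x_prev, k, grad_g, h=0.1, t0=1.0,
                  alpha=4.0, gamma=1.0, beta=1.0,
                  eps=lambda t: t ** -1.5):
    # one explicit-Euler step for (1.1) on the grid t_k = t0 + k*h;
    # all parameter values here are illustrative assumptions
    t = t0 + k * h
    d = x - x_prev
    y = x + (gamma + beta / t) / h * d   # inertial term inside the gradient
    return x + (1.0 - alpha * h / t) * d - h ** 2 * (grad_g(y) + eps(t) * x)

# usage with numpy arrays: x_next = inertial_step(x, x_prev, k, grad_g)
```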
The following numerical experiments reveal that a trajectory generated by (1.1), and the objective function value along this trajectory, have a better convergence behaviour than a trajectory generated by the system $(AVD)_{\alpha}$ with a Tikhonov regularization term studied in [10], and a behaviour similar to the trajectories generated by the dynamical system considered in [20]. At the same time, the perturbation term $\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)$ in the argument of the gradient of the objective function $g$ has a smoothing effect, just as in the case of $(DIN\text{-}AVD)_{\alpha,\beta}$. This also confirms our conclusion that (1.1) can be thought of as an intermediate system between the system $(AVD)_{\alpha}$ with a Tikhonov regularization term considered in [10] and $(DIN\text{-}AVD)_{\alpha,\beta}$ with a Tikhonov regularization term studied in [20], which inherits the best properties of the latter systems.
1.3. Some numerical experiments. In this section we consider two numerical experiments for the trajectories generated by the dynamical system (1.1) for a convex but not strongly convex objective function
$$g:\mathbb{R}^{2}\to\mathbb{R},\quad g(x,y)=(ax+by)^{2},\ \text{where } a,b\in\mathbb{R}\setminus\{0\}.$$
Observe that $\operatorname{argmin} g=\left\{\left(x,-\frac{a}{b}x\right):x\in\mathbb{R}\right\}$ and $\min g=0$. Obviously, the minimizer of minimal norm of $g$ is $x^{*}=(0,0)$.
Everywhere in the following numerical experiments we consider the continuous time dynamical system (1.1) and the dynamical systems $(AVD)_{\alpha}$ and $(DIN\text{-}AVD)_{\alpha,\beta}$, with or without the regularization term $\epsilon(t)x(t)$, solved numerically with the ode45 adaptive method in MATLAB on the interval $[1,10]$. For the Tikhonov regularization parameter we take $\epsilon(t)=\frac{1}{t^{1.5}}$ and we consider the starting points $x(1)=(1,-1),\ \dot{x}(1)=(-1,1)$.
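The same experiment can be reproduced outside MATLAB. The sketch below is our own reproduction using SciPy's adaptive RK45 integrator (comparable to ode45); the values $a=b=1$ and $\alpha=4,\ \gamma=\beta=1$ are assumptions made for the example, since the text above does not fix them.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b = 1.0, 1.0                     # assumed values; the text only requires a, b != 0
alpha, gamma, beta = 4.0, 1.0, 1.0  # assumed parameters with alpha > 3, gamma > 0
eps = lambda t: t ** -1.5           # the Tikhonov parameter used in the experiments

def grad_g(u):
    # gradient of g(x, y) = (a*x + b*y)^2
    return 2.0 * (a * u[0] + b * u[1]) * np.array([a, b])

def rhs(t, X):
    # first-order reformulation of (1.1): X = (x, x_dot)
    x, v = X[:2], X[2:]
    acc = -alpha / t * v - grad_g(x + (gamma + beta / t) * v) - eps(t) * x
    return np.concatenate([v, acc])

# starting points x(1) = (1, -1), x_dot(1) = (-1, 1), integrated on [1, 10]
sol = solve_ivp(rhs, (1.0, 10.0), [1.0, -1.0, -1.0, 1.0], rtol=1e-8, atol=1e-8)
print(sol.y[:2, -1])  # x(10), expected close to the minimum norm solution (0, 0)
```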
$$g(\bar{x})\le\liminf_{n\to+\infty} g(x(t_n))=\min g,$$
hence $\bar{x}\in\operatorname{argmin} g$. Now, since the norm is weakly lower semicontinuous, one has that
$$\|\bar{x}\|\le\liminf_{n\to+\infty}\|x(t_n)\|\le\|x^{*}\|,$$
which, from the definition of $x^{*}$, implies that $\bar{x}=x^{*}$. This shows that the trajectory $x(\cdot)$ converges weakly to $x^{*}$. So
$$\|x^{*}\|\le\liminf_{t\to+\infty}\|x(t)\|\le\limsup_{t\to+\infty}\|x(t)\|\le\|x^{*}\|,$$
hence we have
$$\lim_{t\to+\infty}\|x(t)\|=\|x^{*}\|.$$
From the previous relation and the fact that $x(t)\rightharpoonup x^{*}$ as $t\to+\infty$, we obtain the strong convergence, i.e.,
$$\lim_{t\to+\infty} x(t)=x^{*}.$$
Finally, the last case reads as follows.
Case III: We suppose that for every $T\ge t_0$ there exists $t\ge T$ such that $\|x^{*}\|>\|x(t)\|$, and also there exists $s\ge T$ such that $\|x^{*}\|\le\|x(s)\|$. From the continuity of the unique strong global solution $x(\cdot)$, we find that there exists a sequence $(t_n)_{n\in\mathbb{N}}\subseteq[t_0,+\infty)$ such that $t_n\to+\infty$ as $n\to+\infty$ and, for all $n\in\mathbb{N}$, we have
$$\|x(t_n)\|=\|x^{*}\|.$$
In order to show that $x(t_n)\to x^{*}$ as $n\to+\infty$, we let $\bar{x}\in\mathcal{H}$ be a weak sequential cluster point of $(x(t_n))_{n\in\mathbb{N}}$. By using that the sequence is bounded and by employing arguments similar to those of the previous case, we eventually find that $(x(t_n))_{n\in\mathbb{N}}$ converges weakly to $x^{*}$ as $n\to+\infty$. Obviously $\|x(t_n)\|\to\|x^{*}\|$ as $n\to+\infty$, so it follows that $\|x(t_n)-x^{*}\|\to 0$ as $n\to+\infty$. This leads to
$$\liminf_{t\to+\infty}\|x(t)-x^{*}\|=0$$
and the proof is complete.
Remark 4. Observe that according to (3.23), in case $\alpha>3,\ \gamma=0$ it is enough to assume that
$$\lim_{t\to\infty}\frac{1}{\epsilon(t)\,t^{\frac{\alpha}{3}+1}}\int_{t_0}^{t}\epsilon^{2}(s)\,s^{\frac{\alpha}{3}}\,ds=0.$$
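As a numerical sanity check (our own addition, not part of the paper), one can evaluate the quantity in Remark 4 for $\epsilon(t)=a/t^{r}$ with $1<r<2$ and $\alpha>3$ and observe its decay to zero:

```python
from scipy.integrate import quad

a, r, alpha, t0 = 1.0, 1.5, 4.0, 1.0   # assumed values with 1 < r < 2 and alpha > 3
eps = lambda t: a * t ** -r

def ratio(t):
    # the quantity whose limit Remark 4 requires to vanish
    integral, _ = quad(lambda s: eps(s) ** 2 * s ** (alpha / 3), t0, t, limit=200)
    return integral / (eps(t) * t ** (alpha / 3 + 1))

for t in (1e2, 1e4, 1e6):
    print(f"t = {t:.0e}, ratio = {ratio(t):.3e}")  # decays towards 0
```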
Conclusion, perspectives. The dynamical system (1.1) studied in this paper can be seen as a second order system with implicit Hessian driven damping, therefore it is in strong connection with the dynamical system studied in [20]. At the same time, (1.1) is a perturbed version of the dynamical system with asymptotically vanishing damping considered in [10]. We have shown that (1.1) possesses all the valuable properties of the two related systems, as we obtained fast convergence of the velocities to zero, integral estimates for the gradient of the objective function, and fast convergence of the objective function values to the minimum of the objective function. Further, depending on whether the Tikhonov regularization parameter $\epsilon(t)$ goes to zero fast or slowly, we obtained weak convergence of the trajectories to a minimizer of the objective function and strong convergence of the trajectories to the minimizer of minimum norm, respectively. Even more, by using the techniques from [16], we were able to obtain both strong convergence of the trajectories to the minimizer of minimum norm and fast convergence of the function values for the same dynamics.
The article presents the basic analysis of the dynamical system (1.1), many aspects of which have yet to be developed. We aim to enlarge the framework by considering optimization problems with a nonsmooth convex objective function and, in the corresponding dynamical system, with or without Tikhonov regularization term, to replace the function by its Moreau envelope (see [15]). Further, we intend to study the inertial algorithms obtained from (1.1) via explicit discretization and, by taking advantage of the fact that these algorithms may have different inertial terms (see [3]), to obtain convergence of the generated sequences to a minimizer of the objective function. The Tikhonov regularization of these algorithms may allow us to obtain strong convergence of the generated sequences to the minimizer of minimal norm of the objective function (see [16]). Some recent results show that, for a non-convex objective, considering different inertial terms in the corresponding algorithms brings some improvements (see [28]). It would be interesting to show that similar results also hold in the convex case. However, if we discretize the dynamical system (1.1) by making use of the Taylor expansion of the gradient, we obtain inertial algorithms similar to the algorithms considered in [8] and [2]. Further, in case the objective function is non-smooth, the above described discretization techniques lead to algorithms related to the celebrated algorithms (RIPA) [18] and (PRINAM) [14].
Acknowledgements. The authors are thankful to three anonymous reviewers for their comments and suggestions which improved the quality of the paper.
Appendix A. Existence and uniqueness of the trajectories generated by the dynamical system (1.1). In what follows we show the existence and uniqueness of a classical $C^{2}$ solution $x$ of the dynamical system (1.1). To this purpose we rewrite (1.1) as a first order system relevant for the Cauchy-Lipschitz-Picard theorem.
Theorem A.1. Let $(u_0,v_0)\in\mathcal{H}\times\mathcal{H}$. Then, the dynamical system (1.1) admits a unique global $C^{2}((t_0,+\infty),\mathcal{H})$ solution.
Proof. Indeed, by using the notation $X(t):=(x(t),\dot{x}(t))$, the dynamical system (1.1) can be put in the form
$$\dot{X}(t)=F(t,X(t)),\qquad X(t_0)=(u_0,v_0), \tag{A.1}$$
where $F:[t_0,\infty)\times\mathcal{H}\times\mathcal{H}\longrightarrow\mathcal{H}\times\mathcal{H}$, $F(t,u,v)=\left(v,-\frac{\alpha}{t}v-\epsilon(t)u-\nabla g\left(u+\left(\gamma+\frac{\beta}{t}\right)v\right)\right)$.
Our proof is inspired by [12]. Since $\nabla g$ is Lipschitz on bounded sets, it is obvious that the classical Cauchy-Picard theorem can be applied to (A.1); hence, there exists a unique $C^{1}$ local solution $X$. Consequently, (1.1) has a unique $C^{2}$ local solution. Let $x$ be a maximal solution of (1.1), defined on an interval $[t_0,T_{\max}),\ T_{\max}\le+\infty$. In order to prove that $\dot{x}$ is bounded on $[t_0,T_{\max})$ one can use the same arguments as in the proof of Lemma 3.1.
Let $\|\dot{x}\|_{\infty}=\sup_{t\in[t_0,T_{\max})}\|\dot{x}(t)\|$ and assume that $T_{\max}<+\infty$. Since $\|x(t)-x(t')\|\le\|\dot{x}\|_{\infty}|t-t'|$, we get that $\lim_{t\to T_{\max}}x(t):=x_{\infty}\in\mathcal{H}$ exists. By (1.1) the map $\ddot{x}$ is also bounded on the interval $[t_0,T_{\max})$ and, by the same argument as before, $\lim_{t\to T_{\max}}\dot{x}(t):=\dot{x}_{\infty}$ exists. Applying the local existence theorem with initial data $(x_{\infty},\dot{x}_{\infty})$, we can extend the maximal solution to a strictly larger interval, a clear contradiction. Hence $T_{\max}=+\infty$, which completes the proof.
Appendix B. Auxiliary results. In this appendix we collect some lemmas and technical results which we will use in the analysis of the dynamical system (1.1). The following lemma was stated for instance in [10, Lemma A.3] and is used to prove the convergence of the objective function along the trajectory to its minimal value.
Lemma B.1. Let $\delta>0$ and let $f\in L^{1}((\delta,+\infty),\mathbb{R})$ be a nonnegative and continuous function. Let $\varphi:[\delta,+\infty)\longrightarrow[0,+\infty)$ be a nondecreasing function such that $\lim_{t\to+\infty}\varphi(t)=+\infty$. Then it holds that
$$\lim_{t\to+\infty}\frac{1}{\varphi(t)}\int_{\delta}^{t}\varphi(s)f(s)\,ds=0.$$
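For instance (an illustration added here, not in the original), take $\delta=1$, $\varphi(t)=t^{2}$ and $f(s)=s^{-2}\in L^{1}((1,+\infty),\mathbb{R})$; then
$$\frac{1}{\varphi(t)}\int_{1}^{t}\varphi(s)f(s)\,ds=\frac{t-1}{t^{2}}\longrightarrow 0,$$
as the lemma predicts, even though $\int_{1}^{t}\varphi(s)f(s)\,ds$ itself diverges.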
The following statement is the continuous counterpart of a convergence result for quasi-Fejér monotone sequences. For its proof we refer to [1, Lemma 5.1].
Lemma B.2. Suppose that $F:[t_0,+\infty)\longrightarrow\mathbb{R}$ is locally absolutely continuous and bounded from below, and that there exists $G\in L^{1}([t_0,+\infty),\mathbb{R})$ such that
$$\frac{d}{dt}F(t)\le G(t)$$
for almost every $t\in[t_0,+\infty)$. Then $\lim_{t\to+\infty}F(t)\in\mathbb{R}$ exists.
The following technical result is [19, Lemma 2].
Lemma B.3. Let $u:[t_0,+\infty)\longrightarrow\mathcal{H}$ be a continuously differentiable function satisfying $u(t)+\frac{t}{\alpha}\dot{u}(t)\to u\in\mathcal{H}$ as $t\to+\infty$, where $\alpha>0$. Then $u(t)\to u$ as $t\to+\infty$.
The continuous version of the Opial Lemma (see [9]) is the main tool for proving weak convergence of the generated trajectory.
Lemma B.4. Let $S\subseteq\mathcal{H}$ be a nonempty set and $x:[t_0,+\infty)\longrightarrow\mathcal{H}$ a given map such that:
(i) for every $z\in S$ the limit $\lim_{t\to+\infty}\|x(t)-z\|$ exists;
(ii) every weak sequential limit point of $x(t)$ belongs to the set $S$.
Then the trajectory $x(t)$ converges weakly to an element of $S$ as $t\to+\infty$.
Lemma B.5. (Lemma A.6 [18]) Let $t_0>0$ and let $w:[t_0,+\infty)\longrightarrow\mathbb{R}$ be a continuously differentiable function which is bounded from below. Given a nonnegative function $\theta$, let us assume that
$$t\ddot{w}(t)+\alpha\dot{w}(t)+\theta(t)\le k(t)$$
for some $\alpha>1$, almost every $t>t_0$, and some nonnegative function $k\in L^{1}((t_0,+\infty),\mathbb{R})$.
Then, the positive part $[\dot{w}]_{+}$ of $\dot{w}$ belongs to $L^{1}((t_0,+\infty),\mathbb{R})$ and $\lim_{t\to+\infty}w(t)$ exists. Moreover, we have
$$\int_{t_0}^{+\infty}\theta(t)\,dt<+\infty.$$

REFERENCES

[1] B. Abbas, H. Attouch, B.F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, Journal of Optimization Theory and its Applications, 161(2) (2014), 331-360
[2] C.D. Alecsa, The long time behavior and the rate of convergence of symplectic convex algorithms obtained via splitting discretizations of inertial damping systems, (2020), arxiv.org/abs/2001.10831
[3] C.D. Alecsa, S.C. László, T. Pinţa, An Extension of the Second Order Dynamical System that Models Nesterov's Convex Gradient Method, Applied Mathematics and Optimization, (2020), https://doi.org/10.1007/s00245-020-09692-1
[4] C.D. Alecsa, S.C. László, A. Viorel, A gradient type algorithm with backward inertial steps associated to a nonconvex minimization problem, Numerical Algorithms, 84(2) (2020), 485-512
[5] V. Apidopoulos, J.F. Aujol, Ch. Dossal, Convergence rate of inertial Forward-Backward algorithm beyond Nesterov's rule, Mathematical Programming, 180 (2020), 137-156
[6] H. Attouch, A. Balhag, Z. Chbani, H. Riahi, Fast convex optimization via inertial dynamics combining viscous and Hessian-driven damping with time rescaling, Evolution Equations and Control Theory, (2021), https://doi.org/10.3934/eect.202101
[7] H. Attouch, A. Cabot, Convergence rates of inertial forward-backward algorithms, SIAM J. Optim., 28(1) (2018), 849-874
[8] H. Attouch, Z. Chbani, J. Fadili, H. Riahi, First-order algorithms via inertial systems with Hessian driven damping, Math. Program., (2020), https://doi.org/10.1007/s10107-020-01591-1
[9] H. Attouch, Z. Chbani, J. Peypouquet, P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Math. Program. Ser. B, 168 (2018), 123-175
[10] H. Attouch, Z. Chbani, H. Riahi, Combining fast inertial dynamics for convex optimization with Tikhonov regularization, Journal of Mathematical Analysis and Applications, 457(2) (2018), 1065-1094
[11] H. Attouch, Z. Chbani, H. Riahi, Rate of convergence of the Nesterov accelerated gradient method in the subcritical case $\alpha\le 3$, ESAIM-COCV, 25 (2019), Article number 2
[12] H. Attouch, M.-O. Czarnecki, Asymptotic Control and Stabilization of Nonlinear Oscillators with Non-isolated Equilibria, J. Differential Equations, 179 (2002), 278-310
[13] H. Attouch, R. Cominetti, A dynamical approach to convex minimization coupling approximation with the steepest descent method, J. Differential Equations, 128 (1996), 519-540
[14] H. Attouch, S.C. László, Newton-like Inertial Dynamics and Proximal Algorithms Governed by Maximally Monotone Operators, SIAM Journal on Optimization, 30(4) (2020), 3252-3283
[15] H. Attouch, S.C. László, Continuous Newton-like Inertial Dynamics for Monotone Inclusions, Set-Valued and Variational Analysis, (2020), doi:10.1007/s11228-020-00564-y
[16] H. Attouch, S.C. László, Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution, (2021), https://arxiv.org/abs/2104.11987
[17] H. Attouch, J. Peypouquet, The rate of convergence of Nesterov's accelerated forward-backward method is actually faster than $1/k^{2}$, SIAM J. Optim., 26(3) (2016), 1824-1834
[18] H. Attouch, J. Peypouquet, Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators, Mathematical Programming, 174(1-2) (2019), 391-432
[19] H. Attouch, J. Peypouquet, P. Redont, Fast convex optimization via inertial dynamics with Hessian driven damping, Journal of Differential Equations, 261(10) (2016), 5734-5783
[20] R.I. Boţ, E.R. Csetnek, S.C. László, Tikhonov regularization of a second order dynamical system with Hessian damping, Math. Program., (2020), https://doi.org/10.1007/s10107-020-01528-8
[21] R.I. Boţ, E.R. Csetnek, S.C. László, A second-order dynamical approach with variable damping to nonconvex smooth minimization, Applicable Analysis, 99(3) (2020), 361-378
[22] R.I. Boţ, S.M. Grad, D. Meier, M. Staudigl, Inducing strong convergence of trajectories in dynamical systems associated to monotone inclusions with composite structure, Adv. Nonlinear Anal., 10 (2021), 450-476
[23] A. Cabot, H. Engler, S. Gadat, On the long time behavior of second order differential equations with asymptotically small dissipation, Transactions of the American Mathematical Society, 361 (2009), 5983-6017
[24] A. Chambolle, Ch. Dossal, On the convergence of the iterates of the Fast Iterative Shrinkage Thresholding Algorithm, Journal of Optimization Theory and Applications, 166 (2015), 968-982
[25] R. Cominetti, J. Peypouquet, S. Sorin, Strong asymptotic convergence of evolution equations governed by maximal monotone operators with Tikhonov regularization, J. Differential Equations, 245 (2008), 3753-3763
[26] M.A. Jendoubi, R. May, Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source term, Appl. Anal., 94(2) (2015), 435-443
[27] S.C. László, Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization, Mathematical Programming, (2020), https://doi.org/10.1007/s10107-020-01534-w
[28] S.C. László, Forward-backward algorithms with different inertial terms for structured non-convex minimization problems, (2020), arxiv.org/abs/2002.07154
[29] T. Lin, M.I. Jordan, A Control-Theoretic Perspective on Optimal High-Order Optimization, (2019), arXiv:1912.07168v1
[30] R. May, Asymptotic for a second-order evolution equation with convex potential and vanishing damping term, Turkish Journal of Math., 41(3) (2017), 681-685
[31] Y. Nesterov, A method of solving a convex programming problem with convergence rate $O(1/k^{2})$, Soviet Mathematics Doklady, 27 (1983), 372-376
[32] B. Shi, S.S. Du, M.I. Jordan, W.J. Su, Understanding the acceleration phenomenon via high-resolution differential equations, (2018), arXiv:1810.08907v3
[33] W. Su, S. Boyd, E.J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: theory and insights, Journal of Machine Learning Research 17(153) (2016), 1-43

* Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 28, 400114 Cluj-Napoca, Romania and Romanian Institute of Science and Technology, Cluj-Napoca, Romania, e-mail: alecsa@rist.ro.
† Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 28, 400114 Cluj-Napoca, Romania, e-mail: szilard.laszlo@math.utcluj.ro
