This paper deals with a perturbed heavy ball system with vanishing damping that contains a Tikhonov regularization term, in connection to the minimization problem of a convex Fréchet differentiable function. We show that the value of the objective function along a generated trajectory converges in order $o(1/t^{2})$ to the global minimum of the objective function. We also obtain the fast convergence of the velocities towards zero. Moreover, we obtain that a trajectory generated by the dynamical system converges weakly to a minimizer of the objective function. Finally, we show that the presence of the Tikhonov regularization term assures the strong convergence of the generated trajectories to an element of minimal norm from the argmin set of the objective function.
Authors
Cristian Daniel Alecsa (“Tiberiu Popoviciu” Institute of Numerical Analysis, Romanian Academy; Romanian Institute of Science and Technology)
Szilárd Csaba László (Technical University of Cluj-Napoca)
Note on author affiliation
According to the version of the preprint posted on ResearchGate at http://doi.org/10.13140/RG.2.2.29212.92801, the paper was written by the first author while he was affiliated with ICTP. However, for the final/published version of the paper, the first author removed his ICTP affiliation.
Keywords
convex optimization; heavy ball method; continuous second order dynamical system; Tikhonov regularization; convergence rate; strong convergence
C.D. Alecsa, S.C. László, Tikhonov regularization of a perturbed heavy ball system with vanishing damping, SIAM J. Optim., 31 (2021) no. 4, pp. 2921-2954, http://doi.org/10.1137/20M1382027
Paper (preprint) in HTML form
TIKHONOV REGULARIZATION OF A PERTURBED HEAVY BALL SYSTEM WITH VANISHING DAMPING
CRISTIAN DANIEL ALECSA * AND SZILÁRD CSABA LÁSZLÓ †
Abstract
This paper deals with a perturbed heavy ball system with vanishing damping that contains a Tikhonov regularization term, in connection to the minimization problem of a convex Fréchet differentiable function. We show that the value of the objective function along a generated trajectory converges in order $o(1/t^{2})$ to the global minimum of the objective function. We also obtain the fast convergence of the velocities towards zero. Moreover, we obtain that a trajectory generated by the dynamical system converges weakly to a minimizer of the objective function. Finally, we show that the presence of the Tikhonov regularization term assures the strong convergence of the generated trajectories to an element of minimal norm from the argmin set of the objective function.
Key words. convex optimization; heavy ball method; continuous second order dynamical system; Tikhonov regularization; convergence rate; strong convergence.
1. Introduction. Let $\mathcal{H}$ be a real Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$, and let $g:\mathcal{H}\longrightarrow\mathbb{R}$ be a convex Fréchet differentiable function. Consider the minimization problem
$$(\mathrm{P})\qquad\inf_{x\in\mathcal{H}}g(x)$$
in connection to the second order dynamical system
$$(1.1)\qquad \ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)+\epsilon(t)x(t)=0,\quad x(t_0)=u_0,\ \dot{x}(t_0)=v_0,$$
where $t_0>0$, $(u_0,v_0)\in\mathcal{H}\times\mathcal{H}$, $\alpha\ge 3$, $\gamma>0$, $\beta\in\mathbb{R}$ or $\gamma=0$, $\beta\ge 0$, and $\epsilon:[t_0,+\infty)\longrightarrow\mathbb{R}_{+}$ is a nonincreasing function of class $C^{1}$ such that $\lim_{t\to+\infty}\epsilon(t)=0$. The starting time $t_0$ is taken strictly greater than zero whenever the coefficients $\frac{\alpha}{t}$ and $\frac{\beta}{t}$ have singularities at $0$. This is not a limitation of the generality of the proposed approach, since we will focus on the asymptotic behaviour of the generated trajectories.
First of all, note that the dynamical system (1.1) is the Tikhonov regularized version of the perturbed heavy ball system with vanishing damping considered in connection to the optimization problem (P) by Alecsa-László-Pinţa in [3]. The dynamical system considered in [3] can be seen as an intermediate system between the heavy ball system with vanishing damping [33] and the heavy ball system with Hessian driven damping [19] and possesses all the valuable properties of the latter ones. Indeed, according to [3], in case $\gamma>0$, $\beta\in\mathbb{R}$, or $\gamma=0$, $\beta\ge 0$, the objective function value along a trajectory generated by the perturbed heavy ball system converges in order $\mathcal{O}(1/t^{2})$ to the global minimum of the objective function and the trajectory converges weakly to a minimizer of the objective function. Further, according to [3, Remark 2], in case $\gamma=0$ and $\beta<0$ the perturbed heavy ball system can generate periodic solutions, therefore in this case the convergence of a generated trajectory to a minimizer of the objective is hopeless.
Throughout the paper we assume that $\nabla g$ is Lipschitz continuous on bounded sets and $\operatorname{argmin} g\neq\emptyset$. Further, the Tikhonov regularization parameter $\epsilon(t)$ satisfies one of the following assumptions (see also Remark 1).
(C1) There exist $K>1$ and $t_1\ge t_0$ such that
$$\dot{\epsilon}(t)\le-\frac{K}{2}\left|\gamma+\frac{\beta}{t}\right|\epsilon^{2}(t)\quad\text{for every } t\ge t_1.$$
(C2) There exist $K>0$ and $t_1\ge t_0$ such that
$$\epsilon(t)\le\frac{K}{t}\quad\text{for every } t\ge t_1.$$
Natural candidates for the Tikhonov regularization parameter that satisfy the above conditions are $\epsilon(t)=\frac{a}{t^{r}}$, $a>0$, $r\ge 1$ (a minimal verification is sketched after the list below). In order to give a better perspective on the results obtained in this paper, for this special case of the Tikhonov regularization parameter we can conclude the following.
1. According to Theorem 2.1 and Theorem 2.3, if the Tikhonov regularization parameter is $\epsilon(t)=\frac{a}{t^{r}}$, $a>0$, $r>2$, the following statements hold. When $\alpha\ge 3$, one has $g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g=\mathcal{O}\left(t^{-2}\right)$ as $t\to+\infty$. Further, one has the integral estimates $\int_{t_0}^{+\infty}t^{2}\left\|\nabla g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)\right\|^{2}dt<+\infty$ whenever $\gamma>0$, $\beta\in\mathbb{R}$, and $\int_{t_0}^{+\infty}t\left\|\nabla g\left(x(t)+\frac{\beta}{t}\dot{x}(t)\right)\right\|^{2}dt<+\infty$ whenever $\gamma=0$, $\beta>0$. When $\alpha>3$ one has $g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g=o\left(t^{-2}\right)$ and $\|\dot{x}(t)\|=o\left(t^{-1}\right)$ as $t\to+\infty$, and $x(t)$ converges weakly to a minimizer of $g$. Further, $\int_{t_0}^{+\infty}t\|\dot{x}(t)\|^{2}dt<+\infty$ and $\int_{t_0}^{+\infty}t\left(g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g\right)dt<+\infty$.
2. According to Theorem 3.2, in the case $\epsilon(t)=\frac{a}{t^{r}}$, $a>0$, $1<r<2$, for $\alpha>3$ and for $\alpha=3$, $\gamma=0$, $\beta\ge 0$, one has $\liminf_{t\to\infty}\left\|x(t)-x^{*}\right\|=0$, where $x^{*}$ is the element of minimum norm of $\operatorname{argmin} g$. In addition, $x(t)$ converges strongly to $x^{*}$ when either the trajectory $\{x(t):t\ge T\}$ remains in the ball $B\left(0,\left\|x^{*}\right\|\right)$, or in its complement, for $T$ large enough.
3. In the case $\epsilon(t)=\frac{a}{t^{2}}$, $a>0$, and $\alpha>3$, according to Theorem 2.2 one has that $x$ is bounded, $g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)-\min g=\mathcal{O}\left(\frac{1}{t^{2}}\right)$ and $\|\dot{x}(t)\|=\mathcal{O}\left(\frac{1}{t}\right)$ as $t\to+\infty$. Further, if $a>\frac{2}{9}\alpha(\alpha-3)$, according to Theorem 3.2 the conclusions stated at 2. hold.
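As announced above, here is a minimal verification that the candidates $\epsilon(t)=\frac{a}{t^{r}}$, $a>0$, $r\ge 1$, fit the assumptions; the choice $t_1\ge\max(t_0,1)$ is made here for convenience. Condition (C2) holds with $K=a$, since
$$\epsilon(t)=\frac{a}{t^{r}}\le\frac{a}{t}\quad\text{for every } t\ge t_1.$$
Concerning (C1) with $r>1$, one has $\dot{\epsilon}(t)=-\frac{ar}{t^{r+1}}$ and $\epsilon^{2}(t)=\frac{a^{2}}{t^{2r}}$, so the inequality in (C1) reduces to
$$rt^{r-1}\ge\frac{Ka}{2}\left|\gamma+\frac{\beta}{t}\right|,$$
whose right hand side stays bounded as $t\to+\infty$ while the left hand side tends to $+\infty$; hence (C1) holds for $t_1$ large enough.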
Observe that according to 3. we are able to obtain both fast convergence of the function values and strong convergence of the trajectories for the same Tikhonov regularization parameter. For a long time this was an unsolved problem in the literature. However, recently Attouch and László [16] studied a dynamical system in connection to the optimization problem (P) and succeeded in obtaining rapid convergence towards the infimal value of $g$ and strong convergence of the trajectories towards the element of minimum norm of the set of minimizers of $g$. In our context, one can observe that the case $r=2$ is critical, in the sense that it separates the two cases: the case when we obtain fast convergence of the function values and weak convergence of the trajectories to a minimizer, and the case when the strong convergence of the trajectories to the minimizer of minimum norm is assured. These results are in concordance with the results obtained in [10] and [20], since, as will be shown in what follows, the dynamical system studied in this paper can be thought of as an intermediate system between the dynamical system studied in [10] and the dynamical system considered in [20]. Before we give a more enlightening discussion about the connection between the dynamical system (1.1) and the Tikhonov regularized dynamical systems studied in [10] and [20], we underline two new features of our analysis. Firstly, we show fast convergence of the velocity to zero, a property which is also obtained for the Tikhonov regularized system studied in [10], but which is not shown for the Tikhonov regularized system considered in [20]. Secondly, we obtain some integral estimates for the gradient of the objective function; such estimates also appear for the Tikhonov regularized system studied in [20], but are not shown for the Tikhonov regularized system studied in [10].
For further insight into Tikhonov regularization techniques we refer to [10, 12, 13, 16, 20, 22, 25].
1.1. Connection with second order dynamical systems with asymptotically vanishing damping. The dynamical system (1.1) is strongly related to the second order dynamical system with an asymptotically vanishing damping term studied by Su-Boyd-Candès in [33] in connection to the optimization problem (P), that is,
$$(\mathrm{AVD})_{\alpha}\qquad \ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla g(x(t))=0,\quad x(t_0)=u_0,\ \dot{x}(t_0)=v_0.$$
It is obvious that the latter system can be obtained from (1.1) by taking $\gamma=\beta=0$ and $\epsilon\equiv 0$. According to [33], the trajectories generated by $(\mathrm{AVD})_{\alpha}$ assure the fast minimization property of order $\mathcal{O}(1/t^{2})$ for the decay $g(x(t))-\min g$, provided $\alpha\ge 3$. For $\alpha>3$, it has been shown by Attouch-Chbani-Peypouquet-Redont [9] that each trajectory generated by $(\mathrm{AVD})_{\alpha}$ converges weakly to a minimizer of the objective function $g$. Further, it is shown in [17] and [30] that the asymptotic convergence rate of the values is actually $o(1/t^{2})$.
An appropriate discretization of $(\mathrm{AVD})_{\alpha}$ with $\alpha=3$ corresponds to Nesterov's historical algorithm [31]. Therefore, as was emphasized in [33], for $\alpha=3$ the system $(\mathrm{AVD})_{\alpha}$ can be seen as a continuous version of the accelerated gradient method of Nesterov.
However, the case $\alpha=3$ is critical, i.e., the convergence of the trajectories generated by $(\mathrm{AVD})_{\alpha}$ remains an open problem. The subcritical case $\alpha\le 3$ has been examined by Apidopoulos-Aujol-Dossal [5] and Attouch-Chbani-Riahi [11], with the convergence rate of the objective values $\mathcal{O}\left(t^{-\frac{2\alpha}{3}}\right)$.
When the objective function $g$ is not convex, the convergence of the trajectories generated by $(\mathrm{AVD})_{\alpha}$ is a largely open question. Recent progress has been made in [21], where the convergence of the trajectories of a system, which can be considered a perturbation of $(\mathrm{AVD})_{\alpha}$, has been obtained in a non-convex setting.
The corresponding inertial algorithms obtained from $(\mathrm{AVD})_{\alpha}$ via discretization are in line with the Nesterov accelerated gradient method and enjoy similar properties to the continuous case; see [24] and also [4, 7, 9, 27, 28] for further results and extensions to proximal-gradient algorithms for structured optimization. For other results concerning the system $(\mathrm{AVD})_{\alpha}$ and its extensions we refer to [11, 23, 26].
A version of $(\mathrm{AVD})_{\alpha}$ containing a Tikhonov regularization term, strongly related to (1.1), was considered by Attouch-Chbani-Riahi in [10]. According to [10], the presence of the Tikhonov regularization term $\epsilon(t)x(t)$ provides the strong convergence of the trajectories to the element of minimum norm of the set of minimizers of $g$, when $\epsilon(t)$ tends slowly to zero. We emphasize that (1.1), for the case $\beta=\gamma=0$, reduces to the system studied in [10].
1.2. Connection with second order dynamical systems with Hessian driven damping. The dynamical system (1.1) is also related to the second order dynamical system with Hessian driven damping term, studied by Attouch-Peypouquet-Redont in [19], that is,
$$(\mathrm{DIN\text{-}AVD})_{\alpha,\beta}\qquad \ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\beta\nabla^{2}g(x(t))\dot{x}(t)+\nabla g(x(t))=0,\quad x(t_0)=u_0,\ \dot{x}(t_0)=v_0,$$
where $u_0,v_0\in\mathcal{H}$ and $\beta\ge 0$.
In [19], for the case $\alpha>3$, $\beta>0$, the authors showed the weak convergence of a generated trajectory to a minimizer of $g$ and they obtained a convergence rate of order $o(1/t^{2})$ for the objective function along the trajectory. The temporal discretization of this dynamical system provides first-order algorithms which, besides the extrapolation term, contain a correction term equal to the difference of the gradients at two consecutive steps [8, 2]. Several recent studies have been devoted to this subject; see [6, 14, 15, 29, 32]. Further, Boţ-Csetnek-László considered in [20] the Tikhonov regularization of $(\mathrm{DIN\text{-}AVD})_{\alpha,\beta}$. They obtained fast convergence results for the function values along the trajectories and strong convergence results of the trajectory to the minimizer of the objective function of minimum norm. Now, by using the Taylor expansion of $\nabla g(\cdot)$ we get
$$\nabla g\left(x(t)+\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)\right)\approx\nabla g(x(t))+\left(\gamma+\frac{\beta}{t}\right)\nabla^{2}g(x(t))\dot{x}(t),$$
which shows that the system $(\mathrm{DIN\text{-}AVD})_{\alpha,\beta}$ with Tikhonov regularization term $\epsilon(t)x(t)$ considered in [20] and the dynamical system (1.1) are strongly related. In this paper, we aim to obtain fast convergence results for the function values along the trajectories generated by the dynamical system (1.1) and strong convergence results of the trajectories to the minimizer of the objective function of minimum norm, under assumptions similar to those considered in [20]. However, we emphasize that our objective function $g$ is of class $C^{1}$, whereas the objective function considered in [20] is of class $C^{2}$. Further, as we mentioned before, we are also able to show the rate $o(1/t)$ for the velocity. Moreover, according to [3], the dynamical system (1.1) (with $\epsilon\equiv 0$) leads via explicit Euler discretization to inertial algorithms. In particular, the Nesterov accelerated convex gradient method can be obtained from (1.1) via natural explicit discretization.
The following numerical experiments reveal that a trajectory generated by (1.1), and the objective function value along this trajectory, have a better convergence behaviour than a trajectory generated by the system $(\mathrm{AVD})_{\alpha}$ with a Tikhonov regularization term studied in [10], and a behaviour similar to the trajectories generated by the dynamical system considered in [20]. At the same time, the perturbation term $\left(\gamma+\frac{\beta}{t}\right)\dot{x}(t)$ in the argument of the gradient of the objective function $g$ has a smoothing effect, just as in the case of $(\mathrm{DIN\text{-}AVD})_{\alpha,\beta}$. This also confirms our conclusion that (1.1) can be thought of as an intermediate system between the system $(\mathrm{AVD})_{\alpha}$ with a Tikhonov regularization term considered in [10] and $(\mathrm{DIN\text{-}AVD})_{\alpha,\beta}$ with a Tikhonov regularization term studied in [20], which inherits the best properties of the latter systems.
1.3. Some numerical experiments. In this section we consider two numerical experiments for the trajectories generated by the dynamical system (1.1) for a convex but not strongly convex objective function
$$g:\mathbb{R}^{2}\to\mathbb{R},\quad g(x,y)=(ax+by)^{2},\quad\text{where } a,b\in\mathbb{R}\setminus\{0\}.$$
Observe that $\operatorname{argmin} g=\left\{\left(x,-\frac{a}{b}x\right):x\in\mathbb{R}\right\}$ and $\min g=0$. Obviously, the minimizer of minimal norm of $g$ is $x^{*}=(0,0)$.
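Indeed, a one-line check: $\nabla g(x,y)=2(ax+by)(a,b)$, so $\nabla g(x,y)=(0,0)$ if and only if $ax+by=0$, which gives the above description of $\operatorname{argmin} g$ and $\min g=0$. Moreover, along the line $y=-\frac{a}{b}x$ the norm equals $|x|\sqrt{1+\frac{a^{2}}{b^{2}}}$, which is minimal at $x=0$, hence $x^{*}=(0,0)$.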
Everywhere in the following numerical experiments we consider the continuous time dynamical system (1.1) and the dynamical systems $(\mathrm{AVD})_{\alpha}$ and $(\mathrm{DIN\text{-}AVD})_{\alpha,\beta}$, with or without the regularization term $\epsilon(t)x(t)$, solved numerically with the ode45 adaptive method in MATLAB on the interval $[1,10]$. For the Tikhonov regularization parameter we take $\epsilon(t)=\frac{1}{t^{1.5}}$ and we consider the starting points $x(1)=(1,-1)$, $\dot{x}(1)=(-1,1)$.
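To make the setup concrete, here is a minimal MATLAB sketch of such an experiment, assuming the first order reformulation from Appendix A and the gradient computed above; the concrete values of $a$, $b$, $\alpha$, $\gamma$, $\beta$ and the variable names are illustrative assumptions, not the authors' code.

```matlab
% Minimal sketch: system (1.1) with g(x,y) = (a*x + b*y)^2, solved by
% ode45 on [1, 10], as described above. Parameter values are illustrative.
a = 1; b = 2;                              % objective g(x,y) = (a*x + b*y)^2
alpha = 4; gamma = 1; beta = 1;            % damping/perturbation parameters
epst  = @(t) 1 ./ t.^1.5;                  % Tikhonov parameter eps(t) = 1/t^{1.5}
gradg = @(z) 2*(a*z(1) + b*z(2))*[a; b];   % grad g(x,y) = 2(ax+by)*(a,b)^T

% First order reformulation X = (x, xdot), as in Appendix A:
% F(t,u,v) = (v, -(alpha/t) v - eps(t) u - grad g(u + (gamma + beta/t) v)).
F = @(t, X) [X(3:4);
             -(alpha/t)*X(3:4) - epst(t)*X(1:2) ...
               - gradg(X(1:2) + (gamma + beta/t)*X(3:4))];

X0 = [1; -1; -1; 1];                       % x(1) = (1,-1), xdot(1) = (-1,1)
[t, X] = ode45(F, [1 10], X0);

gvals = (a*X(:,1) + b*X(:,2)).^2;          % g along the trajectory (min g = 0)
semilogy(t, gvals), xlabel('t'), ylabel('g(x(t))')
```

Replacing epst by @(t) 0*t recovers the unregularized system of [3], while additionally setting gamma = 0 and beta = 0 recovers the regularized $(\mathrm{AVD})_{\alpha}$ system of [10].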
$$g(\bar{x})\le\liminf_{n\to+\infty}g\left(x(t_n)\right)=\min g,$$
hence $\bar{x}\in\operatorname{argmin} g$. Now, since the norm is weakly lower semicontinuous, one has that
$$\|\bar{x}\|\le\liminf_{n\to+\infty}\left\|x(t_n)\right\|\le\left\|x^{*}\right\|,$$
which, from the definition of $x^{*}$, implies that $\bar{x}=x^{*}$. This shows that the trajectory $x(\cdot)$ converges weakly to $x^{*}$. So
$$\left\|x^{*}\right\|\le\liminf_{t\to+\infty}\|x(t)\|\le\limsup_{t\to+\infty}\|x(t)\|\le\left\|x^{*}\right\|.$$
From the previous relation and the fact that $x(t)\rightharpoonup x^{*}$ as $t\to+\infty$, we obtain the strong convergence, i.e.,
$$\lim_{t\to+\infty}\left\|x(t)-x^{*}\right\|=0.$$
Finally, the last case reads as follows.
Case III: We suppose that for every $T\ge t_0$ there exists $t\ge T$ such that $\left\|x^{*}\right\|>\|x(t)\|$, and also there exists $s\ge T$ such that $\left\|x^{*}\right\|\le\|x(s)\|$. From the continuity of the unique strong global solution $x(\cdot)$, we find that there exists a sequence $\left(t_n\right)_{n\in\mathbb{N}}\subseteq[t_0,+\infty)$ such that $t_n\to+\infty$ as $n\to+\infty$ and, for all $n\in\mathbb{N}$, we have
$$\left\|x(t_n)\right\|=\left\|x^{*}\right\|.$$
In order to show that $x(t_n)\to x^{*}$ as $n\to+\infty$, we let $\bar{x}\in\mathcal{H}$ be a weak sequential cluster point of $\left(x(t_n)\right)_{n\in\mathbb{N}}$. By using that the sequence is bounded and by employing arguments similar to the previous case, we eventually find that $\left(x(t_n)\right)_{n\in\mathbb{N}}$ converges weakly to $x^{*}$ as $n\to+\infty$. Obviously $\left\|x(t_n)\right\|\to\left\|x^{*}\right\|$ as $n\to+\infty$. So, it follows that $\left\|x(t_n)-x^{*}\right\|\to 0$ as $n\to+\infty$. This leads to
$$\liminf_{t\to+\infty}\left\|x(t)-x^{*}\right\|=0,$$
and the proof is complete.
Remark 4. Observe that according to (3.23), in case $\alpha>3$, $\gamma=0$, it is enough to assume that
Conclusion, perspectives. The dynamical system (1.1) studied in this paper can be seen as a second order system with implicit Hessian driven damping, therefore it is in strong connection with the dynamical system studied in [20]. At the same time, (1.1) is a perturbed version of the dynamical system with asymptotically vanishing damping considered in [10]. We have shown that (1.1) possesses all the valuable properties of the two related systems, as we obtained fast convergence of the velocities to zero, integral estimates for the gradient of the objective function, and fast convergence of the objective function values to the minimum of the objective function. Further, depending on whether the Tikhonov regularization parameter $\epsilon(t)$ goes fast or slowly to zero, we obtained weak convergence of the trajectories to a minimizer of the objective function and strong convergence of the trajectories to the minimizer of minimum norm, respectively. Moreover, by using the techniques from [16], we were able to obtain both strong convergence of the trajectories to the minimizer of minimum norm and fast convergence of the function values for the same dynamics.
The article presents the basic analysis of the dynamical system (1.1), many aspects of which have yet to be developed. We aim to enlarge the framework by considering optimization problems with a nonsmooth convex objective function and, in the corresponding dynamical system, with or without Tikhonov regularization term, to replace the function by its Moreau envelope (see [15]). Further, we intend to study the inertial algorithms obtained from (1.1) via explicit discretization and, by taking advantage of the fact that these algorithms may have different inertial terms (see [3]), to obtain convergence of the generated sequences to a minimizer of the objective function. The Tikhonov regularization of these algorithms may allow us to obtain strong convergence of the generated sequences to the minimizer of minimal norm of the objective function (see [16]). Some recent results show that for a non-convex objective, considering different inertial terms in the corresponding algorithms brings some improvements (see [28]). It would be interesting to show that similar results also hold in the convex case. However, if we discretize the dynamical system (1.1) by making use of the Taylor expansion of the gradient, we obtain inertial algorithms similar to the algorithms considered in [8] and [2]. Further, in case the objective function is non-smooth, the above described discretization techniques lead to algorithms which are related to the celebrated algorithms (RIPA) [18] and (PRINAM) [14].
Acknowledgements. The authors are thankful to three anonymous reviewers for their comments and suggestions, which improved the quality of the paper.
Appendix A. Existence and uniqueness of the trajectories generated by the dynamical system (1.1). In what follows we show the existence and uniqueness of a classical $C^{2}$ solution $x$ of the dynamical system (1.1). To this purpose we rewrite (1.1) as a first order system relevant for the Cauchy-Lipschitz-Picard theorem.
Theorem A.1. Let $(u_0,v_0)\in\mathcal{H}\times\mathcal{H}$. Then the dynamical system (1.1) admits a unique global $C^{2}\left((t_0,+\infty),\mathcal{H}\right)$ solution.
Proof. Indeed, by using the notation $X(t):=(x(t),\dot{x}(t))$, the dynamical system (1.1) can be put in the form
$$(\mathrm{A.1})\qquad \begin{cases}\dot{X}(t)=F(t,X(t))\\ X(t_0)=(u_0,v_0),\end{cases}$$
where $F:[t_0,+\infty)\times\mathcal{H}\times\mathcal{H}\longrightarrow\mathcal{H}\times\mathcal{H}$, $F(t,u,v)=\left(v,-\frac{\alpha}{t}v-\epsilon(t)u-\nabla g\left(u+\left(\gamma+\frac{\beta}{t}\right)v\right)\right)$.
Our proof is inspired by [12]. Since $\nabla g$ is Lipschitz continuous on bounded sets, it is obvious that the classical Cauchy-Picard theorem can be applied to (A.1); hence, there exists a unique $C^{1}$ local solution $X$. Consequently, (1.1) has a unique $C^{2}$ local solution. Let $x$ be a maximal solution of (1.1), defined on an interval $[t_0,T_{\max})$, $T_{\max}\le+\infty$. In order to prove that $\dot{x}$ is bounded on $[t_0,T_{\max})$ one can use the same arguments as in the proof of Lemma 3.1.
Let $\|\dot{x}\|_{\infty}=\sup_{t\in[t_0,T_{\max})}\|\dot{x}(t)\|$ and assume that $T_{\max}<+\infty$. Since $\left\|x(t)-x(t')\right\|\le\|\dot{x}\|_{\infty}\,|t-t'|$, we get that $\lim_{t\to T_{\max}}x(t):=x_{\infty}\in\mathcal{H}$ exists. By (1.1), the map $\ddot{x}$ is also bounded on the interval $[t_0,T_{\max})$ and, by the same argument as before, $\lim_{t\to T_{\max}}\dot{x}(t):=\dot{x}_{\infty}$ exists. Applying the local existence theorem with initial data $\left(x_{\infty},\dot{x}_{\infty}\right)$, we can extend the maximal solution to a strictly larger interval, a clear contradiction. Hence $T_{\max}=+\infty$, which completes the proof.
Appendix B. Auxiliary results. In this appendix we collect some lemmas and technical results which we will use in the analysis of the dynamical system (1.1). The following lemma was stated, for instance, in [10, Lemma A.3] and is used to prove the convergence of the objective function along the trajectory to its minimal value.
Lemma B.1. Let $\delta>0$ and let $f\in L^{1}((\delta,+\infty),\mathbb{R})$ be a nonnegative and continuous function. Let $\varphi:[\delta,+\infty)\longrightarrow[0,+\infty)$ be a nondecreasing function such that $\lim_{t\to+\infty}\varphi(t)=+\infty$. Then it holds
$$\lim_{t\to+\infty}\frac{1}{\varphi(t)}\int_{\delta}^{t}\varphi(s)f(s)\,ds=0.$$
The following statement is the continuous counterpart of a convergence result for quasi-Fejér monotone sequences. For its proof we refer to [1, Lemma 5.1].
Lemma B.2. Suppose that $F:[t_0,+\infty)\longrightarrow\mathbb{R}$ is locally absolutely continuous and bounded from below, and that there exists $G\in L^{1}\left([t_0,+\infty),\mathbb{R}\right)$ such that
$$\frac{d}{dt}F(t)\le G(t)$$
for almost every $t\in[t_0,+\infty)$. Then there exists $\lim_{t\to+\infty}F(t)\in\mathbb{R}$.
The following technical result is [19, Lemma 2].
Lemma B.3. Let $u:[t_0,+\infty)\longrightarrow\mathcal{H}$ be a continuously differentiable function satisfying $u(t)+\frac{t}{\alpha}\dot{u}(t)\to u\in\mathcal{H}$ as $t\to+\infty$, where $\alpha>0$. Then $u(t)\to u$ as $t\to+\infty$.
The continuous version of the Opial Lemma (see [9]) is the main tool for proving weak convergence for the generated trajectory.
Lemma B.4. Let $S\subseteq\mathcal{H}$ be a nonempty set and $x:[t_0,+\infty)\longrightarrow\mathcal{H}$ a given map such that:
(i) for every $z\in S$ the limit $\lim_{t\to+\infty}\|x(t)-z\|$ exists;
(ii) every weak sequential limit point of $x(t)$ belongs to the set $S$.
Then the trajectory $x(t)$ converges weakly to an element of $S$ as $t\to+\infty$.
Lemma B.5. ([18, Lemma A.6]) Let $t_0>0$ and let $w:[t_0,+\infty)\longrightarrow\mathbb{R}$ be a continuously differentiable function which is bounded from below. Given a nonnegative function $\theta$, let us assume that
for some $\alpha>1$, almost every $t>t_0$, and some nonnegative function $k\in L^{1}\left((t_0,+\infty),\mathbb{R}\right)$.
Then the positive part $[\dot{w}]_{+}$ of $\dot{w}$ belongs to $L^{1}\left((t_0,+\infty),\mathbb{R}\right)$ and $\lim_{t\to+\infty}w(t)$ exists. Moreover, we have
$$\int_{t_0}^{+\infty}\theta(t)\,dt<+\infty.$$
REFERENCES
[1] B. Abbas, H. Attouch, B.F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, Journal of Optimization Theory and Applications, 161(2) (2014), 331-360
[2] C.D. Alecsa, The long time behavior and the rate of convergence of symplectic convex algorithms obtained via splitting discretizations of inertial damping systems, (2020), arxiv.org/abs/2001.10831
[3] C.D. Alecsa, S.C. László, T. Pinţa, An Extension of the Second Order Dynamical System that Models Nesterov's Convex Gradient Method, Applied Mathematics and Optimization, (2020), https://doi.org/10.1007/s00245-020-09692-1
[4] C.D. Alecsa, S.C. László, A. Viorel, A gradient type algorithm with backward inertial steps associated to a nonconvex minimization problem, Numerical Algorithms, 84(2) (2020), 485-512
[5] V. Apidopoulos, J.F. Aujol, Ch. Dossal, Convergence rate of inertial Forward-Backward algorithm beyond Nesterov's rule, Mathematical Programming, 180 (2020), 137-156
[6] H. Attouch, A. Balhag, Z. Chbani, H. Riahi, Fast convex optimization via inertial dynamics combining viscous and Hessian-driven damping with time rescaling, Evolution Equations and Control Theory, (2021), https://doi.org/10.3934/eect.202101
[7] H. Attouch, A. Cabot, Convergence rates of inertial forward-backward algorithms, SIAM J. Optim., 28(1) (2018), 849-874
[8] H. Attouch, Z. Chbani, J. Fadili, H. Riahi, First-order algorithms via inertial systems with Hessian driven damping, Math. Program., (2020), https://doi.org/10.1007/s10107-020-01591-1
[9] H. Attouch, Z. Chbani, J. Peypouquet, P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Math. Program. Ser. B, 168 (2018), 123-175
[10] H. Attouch, Z. Chbani, H. Riahi, Combining fast inertial dynamics for convex optimization with Tikhonov regularization, Journal of Mathematical Analysis and Applications, 457(2) (2018), 1065-1094
[11] H. Attouch, Z. Chbani, H. Riahi, Rate of convergence of the Nesterov accelerated gradient method in the subcritical case $\alpha\le 3$, ESAIM-COCV, 25 (2019), Article number 2
[12] H. Attouch, M.-O. Czarnecki, Asymptotic Control and Stabilization of Nonlinear Oscillators with Non-isolated Equilibria, J. Differential Equations, 179 (2002), 278-310
[13] H. Attouch, R. Cominetti, A dynamical approach to convex minimization coupling approximation with the steepest descent method, J. Differential Equations, 128 (1996), 519-540
[14] H. Attouch, S.C. László, Newton-like Inertial Dynamics and Proximal Algorithms Governed by Maximally Monotone Operators, SIAM Journal on Optimization, 30(4) (2020), 3252-3283
[15] H. Attouch, S.C. László, Continuous Newton-like Inertial Dynamics for Monotone Inclusions, Set-Valued and Variational Analysis, (2020), https://doi.org/10.1007/s11228-020-00564-y
[16] H. Attouch, S.C. László, Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution, (2021), https://arxiv.org/abs/2104.11987
[17] H. Attouch, J. Peypouquet, The rate of convergence of Nesterov's accelerated forward-backward method is actually faster than $1/k^{2}$, SIAM J. Optim., 26(3) (2016), 1824-1834
[18] H. Attouch, J. Peypouquet, Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators, Mathematical Programming, 174(1-2) (2019), 391-432
[19] H. Attouch, J. Peypouquet, P. Redont, Fast convex optimization via inertial dynamics with Hessian driven damping, Journal of Differential Equations, 261(10) (2016), 5734-5783
[20] R.I. Boţ, E.R. Csetnek, S.C. László, Tikhonov regularization of a second order dynamical system with Hessian damping, Math. Program., (2020), https://doi.org/10.1007/s10107-020-01528-8
[21] R.I. Boţ, E.R. Csetnek, S.C. László, A second-order dynamical approach with variable damping to nonconvex smooth minimization, Applicable Analysis, 99(3) (2020), 361-378
[22] R.I. Boţ, S.M. Grad, D. Meier, M. Staudigl, Inducing strong convergence of trajectories in dynamical systems associated to monotone inclusions with composite structure, Adv. Nonlinear Anal., 10 (2021), 450-476
[23] A. Cabot, H. Engler, S. Gadat, On the long time behavior of second order differential equations with asymptotically small dissipation, Transactions of the American Mathematical Society, 361 (2009), 5983-6017
[24] A. Chambolle, Ch. Dossal, On the convergence of the iterates of the Fast Iterative Shrinkage Thresholding Algorithm, Journal of Optimization Theory and Applications, 166 (2015), 968-982
[25] R. Cominetti, J. Peypouquet, S. Sorin, Strong asymptotic convergence of evolution equations governed by maximal monotone operators with Tikhonov regularization, J. Differential Equations, 245 (2008), 3753-3763
[26] M.A. Jendoubi, R. May, Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source term, Appl. Anal., 94(2) (2015), 435-443
[27] S.C. László, Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization, Mathematical Programming, (2020), https://doi.org/10.1007/s10107-020-01534-w
[28] S.C. László, Forward-backward algorithms with different inertial terms for structured non-convex minimization problems, (2020), arxiv.org/abs/2002.07154
[29] T. Lin, M.I. Jordan, A Control-Theoretic Perspective on Optimal High-Order Optimization, (2019), arXiv:1912.07168v1
[30] R. May, Asymptotic for a second-order evolution equation with convex potential and vanishing damping term, Turkish Journal of Math., 41(3) (2017), 681-685
[31] Y. Nesterov, A method of solving a convex programming problem with convergence rate $O(1/k^{2})$, Soviet Mathematics Doklady, 27 (1983), 372-376
[32] B. Shi, S.S. Du, M.I. Jordan, W.J. Su, Understanding the acceleration phenomenon via high-resolution differential equations, (2018), arXiv:1810.08907v3
[33] W. Su, S. Boyd, E.J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: theory and insights, Journal of Machine Learning Research 17(153) (2016), 1-43
* Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 28, 400114 Cluj-Napoca, Romania and Romanian Institute of Science and Technology, Cluj-Napoca, Romania, e-mail: alecsa@rist.ro.
† Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 28, 400114 Cluj-Napoca, Romania, e-mail: szilard.laszlo@math.utcluj.ro