General multivariate arctangent function activated neural network approximations
received: March 22, 2022; accepted: May 28, 2022; published online: August 25, 2022.
Here we present multivariate quantitative approximations of Banach space valued continuous multivariate functions on a box or
MSC. 41A17, 41A25, 41A30, 41A36.
Keywords. arctangent function, multivariate neural network approximation, quasi-interpolation operator, Kantorovich type operator, quadrature type operator, multivariate modulus of continuity, abstract approximation, iterated approximation.
1 Introduction
The author in [2] and [3], see chapters 2–5, was the first to establish neural network approximations to continuous functions with rates, by very specifically defined neural network operators of Cardaliaguet-Euvrard and “Squashing” types, by employing the modulus of continuity of the engaged function or its high order derivative, and producing very tight Jackson type inequalities. He treats there both the univariate and multivariate cases. The “bell-shaped” and “squashing” functions defining these operators are assumed to be of compact support. Also in [3] he gives the
For this article the author is motivated by the article [13] of Z. Chen and F. Cao, also by [4], [5], [6], [7], [8], [9], [10], [11], [14], [15].
The author here performs multivariate arctangent function based neural network approximations to continuous functions over boxes or over the whole
The author here comes up with the “right” precisely defined multivariate normalized, quasi-interpolation neural network operators related to boxes or
Feed-forward neural networks (FNNs) with one hidden layer, the only type of networks we deal with in this article, are mathematically expressed as
where for
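In the standard notation of this line of work (see, e.g., [13], [16]), such a network takes a form like

\[
N_n(x) = \sum_{j=0}^{n} c_j\, \sigma\big(\langle a_j, x \rangle + b_j\big), \qquad x \in \mathbb{R}^{s},\ s \in \mathbb{N},
\]

where the $a_j \in \mathbb{R}^{s}$ are the connection weights, the $b_j \in \mathbb{R}$ are the thresholds, the $c_j$ are the coefficients, $\langle a_j, x \rangle$ denotes the inner product, and $\sigma$ is the activation function; this display is supplied only for orientation and uses generic symbols rather than the precise ones fixed below.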
2 Auxiliary Notions
We consider the
We will be using
which is a sigmoid type function and is strictly increasing. We have that
and
We consider the activation function
and we notice that
it is an even function.
Since
We see that
Let
That is
That is
Observe that
That is the
All in all,
We need
We have that
It holds
So that
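As a quick numerical illustration of an arctangent-based sigmoid and of an even, bell-shaped density built from it, the following minimal Python sketch may help; the specific normalization (2/pi)*arctan(pi*x/2) and the symmetric-difference construction of psi are assumptions modeled on the author's related hyperbolic tangent constructions (cf. [5], [6]) and are not meant to reproduce the exact definitions above.

    import numpy as np

    def sigma(x):
        # arctangent-based sigmoid (assumed normalization): odd, strictly increasing,
        # with limits -1 and +1 at -infinity and +infinity
        return (2.0 / np.pi) * np.arctan(np.pi * x / 2.0)

    def psi(x):
        # even density function obtained from sigma by a symmetric difference
        # (assumed construction, for illustration only)
        return 0.25 * (sigma(x + 1.0) - sigma(x - 1.0))

    x = np.linspace(-5.0, 5.0, 11)
    print(np.allclose(psi(x), psi(-x)))        # evenness: psi(-x) = psi(x)
    print(sigma(np.array([-1e6, 0.0, 1e6])))   # approximately [-1, 0, 1]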
We mention
Let
Denote by
We need
Let
i) We have that
for at least some
ii) For large enough
In general, by Theorem 1, it holds
We introduce
It has the properties:
(i)
(ii)
where
hence
(iii)
and
(iv)
that is
Here denote
where
We obviously see that
For
In the last two sums the counting is over disjoint vector sets of
(v) As in [10, pp. 379–380], we derive that
with
(vi) By Theorem 4 we get that
It is also clear that
(vii)
Furthermore it holds
for at least some
Here
Let
We introduce and define the following multivariate linear normalized neural network operator (
For large enough
When
Clearly
Notice that
Furthermore it holds
Clearly
So, we have that
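To make the construction concrete, here is a minimal univariate Python sketch of a normalized quasi-interpolation operator of this general shape, using the density psi from the sketch above; the sampling nodes k/n, the summation range from the ceiling of na to the floor of nb, and the restriction to one variable are illustrative assumptions and do not reproduce the multivariate definition verbatim.

    import numpy as np

    def G_n(f, x, n, a=0.0, b=1.0):
        # normalized quasi-interpolation operator (univariate illustration):
        # a weighted average of the samples f(k/n) with bell-shaped weights
        # psi(n*x - k), normalized by the sum of the weights
        k = np.arange(np.ceil(n * a), np.floor(n * b) + 1)
        w = psi(n * x - k)
        return np.sum(f(k / n) * w) / np.sum(w)

    # usage: the pointwise error at x = 0.5 for f = cos shrinks as n grows
    for n in (10, 100, 1000):
        print(n, abs(G_n(np.cos, 0.5, n) - np.cos(0.5)))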
Let
Furthermore it holds
Since
We call
For convenience we call
That is
Hence
Consequently we derive
We will estimate the right hand side of (36).
For the last and others we need
Let
If
Notice
We have
Clearly we have also:
When
Also for
Again for
Let
where
We set
In this article we study the approximation properties of
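For comparison with the point-evaluation operator sketched earlier, a Kantorovich-type variant replaces each sample f(k/n) by the average of f over the cell from k/n to (k+1)/n; the following univariate Python sketch, which reuses sigma and psi from the earlier sketches, is again only an assumed illustrative form (cf. [10]) and not the precise definition given above.

    import numpy as np
    from scipy.integrate import quad

    def K_n(f, x, n, a=0.0, b=1.0):
        # Kantorovich-type variant (univariate illustration): local averages of f
        # replace the point samples, with the same normalized bell-shaped weights
        k = np.arange(np.ceil(n * a), np.floor(n * b) + 1)
        w = psi(n * x - k)
        avgs = np.array([n * quad(f, kk / n, (kk + 1) / n)[0] for kk in k])
        return np.sum(avgs * w) / np.sum(w)

    print(abs(K_n(np.cos, 0.5, 200) - np.cos(0.5)))  # small for moderately large n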
3 Multivariate general Neural Network Approximations
Here we present several vectorial neural network approximations to Banach space valued functions given with rates.
We give
Let
1)
and
2)
We notice that
Above
Thus
So that
Now using (36) we finish the proof.
We make
Let
Let
Let
Let
Call
We will work with
Then, by Taylor’s formula [12], [19, p. 124], we get
where the remainder is the Riemann integral
here we set
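For the reader's convenience we recall the one-variable prototype of this expansion, for a function $g \in C^{N}([0,1], X)$ with values in a Banach space $(X, \|\cdot\|)$; up to the multivariate bookkeeping of [12], [19], this is, in essence, the form being invoked along the segment joining the two points:

\[
g(1) = \sum_{j=0}^{N-1} \frac{g^{(j)}(0)}{j!} + \int_{0}^{1} \frac{(1-\theta)^{N-1}}{(N-1)!}\, g^{(N)}(\theta)\, d\theta .
\]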
We consider
We obtain
by Lemma 7.1.1, [1, p. 208], where
Therefore for all
by a change of variable, where
is a (polynomial) spline function, see [1, pp. 210–211].
Also from there we get
with equality true only at
Therefore it holds
We have found that
Here
One can rewrite (60) as follows:
a pointwise functional inequality on
Here
Clearly
Let
Therefore we obtain
Clearly (62) is valid when
All the above is preparation for the following theorem, where we assume Fréchet differentiability of functions.
This will be a direct application of Theorem 10.2, [11, pp. 268–270]. The operators
We present the following high order approximation results.
Let
1)
2) additionally if
3)
and
4)
We need
The function
We make
We give
We make
We estimate
(where
We have proved that (
(
And, consequently it holds
So, we have that
Next we estimate
We have that
When
We further have that
That is
Therefore when
and converges to zero, as
We conclude:
In Theorem 6, the right hand sides of (65) and (66) converge to zero as
Also in Corollary 1, the right hand sides of (68) and (69) converge to zero as
We have proved that the left hand sides of (63), (64), (65), (66) and (68), (69) converge to zero as
We give
We continue with
Let
1)
2)
Given that
Hence
proving the claim.
We give
Let
1)
2)
Given that
Thus it holds (by (40))
We observe that
proving the claim.
We also present
Let
1)
2)
Given that
proving the claim.
We make
Let
Clearly
Hence
We need
Let
Next we prove the continuity of
We will use the generalized Weierstrass
(a)
(b)
then
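In its standard form this criterion (the Weierstrass M-test, stated here for Banach space valued terms) reads: if the functions $f_k : S \to (X, \|\cdot\|)$ satisfy

\[
\| f_k(s) \| \le M_k \ \text{ for all } s \in S, \qquad \sum_{k=1}^{\infty} M_k < \infty ,
\]

then $\sum_{k=1}^{\infty} f_k$ converges uniformly on $S$, and if each $f_k$ is continuous the sum is continuous on $S$; we record it only as a reminder of the shape of conditions (a), (b) and of the conclusion.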
Also we will use:
If
We will prove that
There always exists
Since
So for
and
Hence by the generalized Weierstrass
Since
Because
So for
and
Hence by Weierstrass
Since
So we proved that
Writing
we have it as a continuous function on
When
(there always exist
(For convenience call
Thus
Notice that the finite sum of continuous functions
The rest of the summands of
We will prove that
The continuous function
and
So by the Weierstrass
Next we prove continuity on
Notice here that
and
So the double series under consideration is uniformly convergent and continuous. Clearly
Reasoning similarly, one can easily prove now, but with more tedious work, that
By (27) it is obvious that
Call
Clearly then
etc.
Therefore we get
the contraction property.
Also we see that
Here
Here
We give the condensed
Let
(i)
where
and
(ii)
For
pointwise and uniformly.
Next we do iterated neural network approximation (see also [9]).
We make
Let
Then
That is
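A standard mechanism behind such iterated estimates (cf. [9]) combines linearity with the contraction property noted earlier; in outline, and only as an indication of the structure of the argument, writing $T$ for any one of our operators,

\[
T^{r} f - f = \sum_{k=1}^{r} \big( T^{k} f - T^{k-1} f \big) = \sum_{k=1}^{r} T^{k-1} \big( T f - f \big),
\]

so that $\| T^{r} f - f \|_{\infty} \le r\, \| T f - f \|_{\infty}$.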
We give
All here as in Theorem 11 and
So that the speed of convergence to the unit operator of
We make
We give
Let
Clearly, we notice that the speed of convergence to the unit operator of the multiply iterated operator is not worse than the speed of
We continue with
A typical application of all of our results is when
Bibliography
- 1
G.A. Anastassiou, Moments in Probability and Approximation Theory, Pitman Research Notes in Math., Vol. 287, Longman Sci. & Tech., Harlow, U.K., 1993.
- 2
G.A. Anastassiou, Rate of convergence of some neural network operators to the unit-univariate case, J. Math. Anal. Appl. 212 (1997), pp. 237–262. https://doi.org/10.1006/jmaa.1997.5494
- 3
G.A. Anastassiou, Quantitative Approximations, Chapman & Hall/CRC, Boca Raton, New York, 2001.
- 4
G.A. Anastassiou, Intelligent Systems: Approximation by Artificial Neural Networks, Intelligent Systems Reference Library, Vol. 19, Springer, Heidelberg, 2011. https://doi.org/10.1007/978-3-642-21431-8
- 5
G.A. Anastassiou, Univariate hyperbolic tangent neural network approximation, Mathematical and Computer Modelling, 53 (2011), pp. 1111–1132. https://doi.org/10.1016/j.mcm.2010.11.072
- 6
G.A. Anastassiou, Multivariate hyperbolic tangent neural network approximation, Computers and Mathematics with Applications, 61 (2011), pp. 809–821. https://doi.org/10.1016/j.camwa.2010.12.029
- 7
G.A. Anastassiou, Multivariate sigmoidal neural network approximation, Neural Networks 24 (2011), pp. 378–386. https://doi.org/10.1016/j.neunet.2011.01.003
- 8
G.A. Anastassiou, Univariate sigmoidal neural network approximation, Journal of Computational Analysis and Applications, 14 (2012), no. 4, pp. 659–690.
- 9
G.A. Anastassiou, Approximation by neural networks iterates, Advances in Applied Mathematics and Approximation Theory, pp. 1–20, Springer Proceedings in Math. & Stat., Springer, New York, 2013, Eds. G. Anastassiou, O. Duman.
- 10
G. Anastassiou, Intelligent Systems II: Complete Approximation by Neural Network Operators, Springer, Heidelberg, New York, 2016.
- 11
G. Anastassiou, Intelligent Computations: Abstract Fractional Calculus, Inequalities, Approximations, Springer, Heidelberg, New York, 2018.
- 12
H. Cartan, Differential Calculus, Hermann, Paris, 1971.
- 13
Z. Chen and F. Cao, The approximation operators with sigmoidal functions, Computers and Mathematics with Applications, 58 (2009), pp. 758–765. https://doi.org/10.1016/j.camwa.2009.05.001
- 14
D. Costarelli, R. Spigler, Approximation results for neural network operators activated by sigmoidal functions, Neural Networks 44 (2013), pp. 101–106. https://doi.org/10.1016/j.neunet.2013.03.015
- 15
D. Costarelli, R. Spigler, Multivariate neural network operators with sigmoidal activation functions, Neural Networks 48 (2013), pp. 72–77. https://doi.org/10.1016/j.neunet.2013.07.009
- 16
S. Haykin, Neural Networks: A Comprehensive Foundation (2nd ed.), Prentice Hall, New York, 1998.
- 17
W. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 7 (1943), pp. 115–133. https://doi.org/10.1007/bf02478259
- 18
T.M. Mitchell, Machine Learning, WCB-McGraw-Hill, New York, 1997.
- 19
L.B. Rall, Computational Solution of Nonlinear Operator Equations, John Wiley & Sons, New York, 1969.