General multivariate arctangent function activated neural network approximations
received: March 22, 2022; accepted: May 28, 2022; published online: August 25, 2022.
Here we expose multivariate quantitative approximations of Banach space valued continuous multivariate functions on a box or \(\mathbb {R}^{N},\) \(N\in \mathbb {N}\), by the multivariate normalized, quasi-interpolation, Kantorovich type and quadrature type neural network operators. We also treat the case of approximation by iterates of these four types of operators. These approximations are derived by establishing multidimensional Jackson type inequalities involving the multivariate modulus of continuity of the engaged function or its high order Fréchet derivatives. Our multivariate operators are defined by using a multidimensional density function induced by the arctangent function. The approximations are pointwise and uniform. The related feed-forward neural networks have one hidden layer.
MSC. 41A17, 41A25, 41A30, 41A36.
Keywords. arctangent function, multivariate neural network approximation, quasi-interpolation operator, Kantorovich type operator, quadrature type operator, multivariate modulus of continuity, abstract approximation, iterated approximation.
\(^\ast \)Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, U.S.A., e-mail: ganastss@memphis.edu.
1 Introduction
The author in [ 2 ] and [ 3 ] , see chapters 2–5, was the first to establish neural network approximations to continuous functions with rates by very specifically defined neural network operators of Cardaliaguet-Euvrard and “Squashing” types, by employing the modulus of continuity of the engaged function or its high order derivative, and producing very tight Jackson type inequalities. He treats there both the univariate and multivariate cases. The “bell-shaped” and “squashing” functions defining these operators are assumed to be of compact support. Also in [ 3 ] he gives the \(N\)th order asymptotic expansion for the error of weak approximation of these two operators to a special natural class of smooth functions, see chapters 4–5 there.
For this article the author is motivated by the article [ 13 ] of Z. Chen and F. Cao, also by [ 4 ] , [ 5 ] , [ 6 ] , [ 7 ] , [ 8 ] , [ 9 ] , [ 10 ] , [ 11 ] , [ 14 ] , [ 15 ] .
The author here performs multivariate arctangent function based neural network approximations to continuous functions over boxes or over the whole \(\mathbb {R}^{N}\), \(N\in \mathbb {N}\). He also treats iterated approximation. All convergences here are with rates expressed via the multivariate modulus of continuity of the involved function or its high order Fréchet derivative and given by very tight multidimensional Jackson type inequalities.
The author here comes up with the “right” precisely defined multivariate normalized, quasi-interpolation neural network operators related to boxes or \(\mathbb {R}^{N}\), as well as Kantorovich type and quadrature type related operators on \(\mathbb {R}^{N}\). Our boxes are not necessarily symmetric to the origin. In preparation for proving our results, we establish important properties of the basic multivariate density function that is induced by the arctangent function and defines our operators.
Feed-forward neural networks (FNNs) with one hidden layer, the only type of networks we deal with in this article, are mathematically expressed as
\[ N_{n}\left( x\right) =\sum _{j=0}^{n}c_{j}\sigma \left( \left\langle a_{j}\cdot x\right\rangle +b_{j}\right) ,\ \ x\in \mathbb {R}^{s},\ s\in \mathbb {N}, \]
where for \(0\leq j\leq n\), \(b_{j}\in \mathbb {R}\) are the thresholds, \(a_{j}\in \mathbb {R}^{s}\) are the connection weights, \(c_{j}\in \mathbb {R}\) are the coefficients, \(\left\langle a_{j}\cdot x\right\rangle \) is the inner product of \(a_{j}\) and \(x\), and \(\sigma \) is the activation function of the network. In many fundamental network models, the activation function is the arctangent function. For background on neural networks see [ 16 ] , [ 17 ] , [ 18 ] .
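For illustration, here is a minimal numerical sketch of such a one-hidden-layer network with \(\sigma =\arctan \); the weights \(a_{j}\), thresholds \(b_{j}\) and coefficients \(c_{j}\) below are random placeholders, chosen only to make the example runnable, not quantities from this article.

```python
import numpy as np

# A minimal sketch (illustration only) of a one-hidden-layer FNN with
# sigma = arctan; all weights are random placeholders, not from the paper.

rng = np.random.default_rng(0)
n, s = 10, 3                      # n + 1 hidden units, input dimension s
a = rng.normal(size=(n + 1, s))   # connection weights a_j in R^s
b = rng.normal(size=n + 1)        # thresholds b_j
c = rng.normal(size=n + 1)        # coefficients c_j

def network(x, sigma=np.arctan):
    """N_n(x) = sum_{j=0}^{n} c_j * sigma(<a_j . x> + b_j)."""
    return c @ sigma(a @ x + b)

print(network(np.array([0.5, -1.0, 2.0])))
```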
2 Auxiliary Notions
We consider the arctangent function
\[ \arctan x=\int _{0}^{x}\frac{dz}{1+z^{2}},\ \ x\in \mathbb {R}. \]
We will be using
\[ h\left( x\right) :=\frac{2}{\pi }\arctan \left( \frac{\pi }{2}x\right) ,\ \ x\in \mathbb {R}, \]
which is a sigmoid type function and is strictly increasing. We have that
\[ h\left( 0\right) =0,\quad h\left( -x\right) =-h\left( x\right) ,\quad h\left( \pm \infty \right) =\pm 1, \]
and
\[ h^{\prime }\left( x\right) =\frac{4}{4+\pi ^{2}x^{2}}{\gt}0,\ \ \text{all }x\in \mathbb {R}. \]
We consider the activation function
\[ \psi \left( x\right) :=\frac{1}{4}\left[ h\left( x+1\right) -h\left( x-1\right) \right] ,\ \ x\in \mathbb {R}, \]
and we notice that
\[ \psi \left( -x\right) =\psi \left( x\right) , \]
that is, \(\psi \) is an even function.
Since \(x+1{\gt}x-1\) and \(h\) is strictly increasing, we have \(h\left( x+1\right) {\gt}h\left( x-1\right) \), hence \(\psi \left( x\right) {\gt}0\) for all \(x\in \mathbb {R}\).
We see that
\[ \psi \left( 0\right) =\frac{1}{2}h\left( 1\right) =\frac{1}{\pi }\arctan \left( \frac{\pi }{2}\right) \cong 0.319. \]
Let \(x{\gt}0\); we have that
\[ \psi ^{\prime }\left( x\right) =\frac{1}{4}\left[ h^{\prime }\left( x+1\right) -h^{\prime }\left( x-1\right) \right] {\lt}0, \]
since \(h^{\prime }\) is even and strictly decreasing on \([0,\infty )\), and \(x+1{\gt}\left\vert x-1\right\vert \) for \(x{\gt}0\).
That is, \(\psi \) is strictly decreasing on \([0,\infty )\), clearly strictly increasing on \((-\infty ,0]\), and \(\psi ^{\prime }\left( 0\right) =0.\)
Observe that
\[ \lim _{x\rightarrow \pm \infty }\psi \left( x\right) =0. \]
That is, the \(x\)-axis is the horizontal asymptote of \(\psi \).
All in all, \(\psi \) is a bell-shaped symmetric function with maximum \(\psi \left( 0\right) \cong 0.319.\)
We need
We have that
\[ \sum _{i=-\infty }^{\infty }\psi \left( x-i\right) =1,\ \ \forall \ x\in \mathbb {R}. \]
It holds
\[ \int _{-\infty }^{\infty }\psi \left( x\right) dx=1. \]
So \(\psi \left( x\right) \) is a density function on \(\mathbb {R}.\)
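These properties are easy to probe numerically. Here is a short sketch, assuming the definitions of \(h\) and \(\psi \) above, checking the maximum value \(\psi \left( 0\right) \cong 0.319\), the evenness, the partition of unity over integer shifts, and the density property:

```python
import numpy as np
from scipy.integrate import quad

# Numerical sanity checks, assuming h(x) = (2/pi) arctan(pi x / 2)
# and psi(x) = (1/4) (h(x+1) - h(x-1)) as defined above.

def h(x): return (2.0 / np.pi) * np.arctan(np.pi * x / 2.0)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

print(psi(0.0))                        # ~0.3195, the maximum value
print(psi(2.5), psi(-2.5))             # evenness: equal values
ks = np.arange(-10_000, 10_001)
print(np.sum(psi(0.37 - ks)))          # ~1: partition of unity over integer shifts
print(quad(psi, -np.inf, np.inf)[0])   # ~1: psi is a density on R
```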
We mention
Let \(0{\lt}\alpha {\lt}1\), and \(n\in \mathbb {N}\) with \(n^{1-\alpha }{\gt}2\). It holds
Denote by \(\left\lfloor \cdot \right\rfloor \) the integral part of the number and by \(\left\lceil \cdot \right\rceil \) the ceiling of the number.
We need
Let \(x\in \left[ a,b\right] \subset \mathbb {R}\) and \(n\in \mathbb {N}\) so that \(\left\lceil na\right\rceil \leq \left\lfloor nb\right\rfloor \). It holds
i) We have that
for at least some \(x\in \left[ a,b\right] .\)
ii) For large enough \(n\in \mathbb {N}\) we always obtain \(\left\lceil na\right\rceil \leq \left\lfloor nb\right\rfloor \). Also \(a\leq \frac{k}{n}\leq b\), iff \(\left\lceil na\right\rceil \leq k\leq \left\lfloor nb\right\rfloor \).
In general, by Theorem 1, it holds
We introduce
\[ Z\left( x_{1},...,x_{N}\right) :=Z\left( x\right) :=\prod _{i=1}^{N}\psi \left( x_{i}\right) ,\ \ x=\left( x_{1},...,x_{N}\right) \in \mathbb {R}^{N},\ N\in \mathbb {N}. \]
It has the properties:
(i) \(Z\left( x\right) {\gt}0\), \(\forall \) \(x\in \mathbb {R}^{N},\)
(ii)
\[ \sum _{k=-\infty }^{\infty }Z\left( x-k\right) :=\sum _{k_{1}=-\infty }^{\infty }\sum _{k_{2}=-\infty }^{\infty }\cdots \sum _{k_{N}=-\infty }^{\infty }Z\left( x_{1}-k_{1},...,x_{N}-k_{N}\right) =1, \]
where \(k:=\left( k_{1},...,k_{N}\right) \in \mathbb {Z}^{N}\), \(\forall \) \(x\in \mathbb {R}^{N},\)
hence
(iii)
\[ \sum _{k=-\infty }^{\infty }Z\left( nx-k\right) =1, \]
\(\forall \) \(x\in \mathbb {R}^{N};\) \(n\in \mathbb {N}\),
and
(iv)
\[ \int _{\mathbb {R}^{N}}Z\left( x\right) dx=1, \]
that is \(Z\) is a multivariate density function.
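A small numerical sketch, again assuming the univariate \(\psi \) above, illustrates property (ii) for \(N=2\); the infinite sum over \(\mathbb {Z}^{2}\) is truncated, so the result is \(1\) only up to a small tail error:

```python
import numpy as np
from itertools import product

# Sketch of Z(x) = prod_i psi(x_i) for N = 2 (assuming psi as above), with a
# truncated check of property (ii): sum over k in Z^2 of Z(x - k) = 1.

def h(x): return (2.0 / np.pi) * np.arctan(np.pi * x / 2.0)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def Z(x):
    return float(np.prod(psi(np.asarray(x, dtype=float))))

x = np.array([0.3, -0.7])
ks = range(-200, 201)
total = sum(Z(x - np.array(k)) for k in product(ks, repeat=2))
print(total)   # ~1, up to the truncated tail (psi has heavy 1/x^2 tails)
```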
Here denote \(\left\Vert x\right\Vert _{\infty }:=\max \left\{ \left\vert x_{1}\right\vert ,...,\left\vert x_{N}\right\vert \right\} \), \(x\in \mathbb {R}^{N}\), also set \(\infty :=\left( \infty ,...,\infty \right) \), \(-\infty :=\left( -\infty ,...,-\infty \right) \) in the multivariate context, and
where \(a:=\left( a_{1},...,a_{N}\right) \), \(b:=\left( b_{1},...,b_{N}\right) .\)
We obviously see that
For \(0{\lt}\beta {\lt}1\) and \(n\in \mathbb {N}\), a fixed \(x\in \mathbb {R}^{N}\), we have that
In the last two sums the counting is over disjoint vector sets of \(k\)’s, because the condition \(\left\Vert \frac{k}{n}-x\right\Vert _{\infty }{\gt}\frac{1}{n^{\beta }}\) implies that there exists at least one \(\left\vert \frac{k_{r}}{n}-x_{r}\right\vert {\gt}\frac{1}{n^{\beta }}\), where \(r\in \left\{ 1,...,N\right\} .\)
(v) As in [ 10 , pp. 379–380 ] , we derive that
with \(n\in \mathbb {N}:n^{1-\beta }{\gt}2\), \(x\in \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] .\)
(vi) By Theorem 4 we get that
\(\forall \) \(x\in \left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \), \(n\in \mathbb {N}\).
It is also clear that
(vii)
\(0{\lt}\beta {\lt}1\), \(n\in \mathbb {N}:n^{1-\beta }{\gt}2\), \(x\in \mathbb {R}^{N}.\)
Furthermore it holds
for at least some \(x\in \left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) .\)
Here \(\left( X,\left\Vert \cdot \right\Vert _{\gamma }\right) \) is a Banach space.
Let \(f\in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] ,X\right) ,\) \(x=\left( x_{1},...,x_{N}\right) \in \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] ,\) \(n\in \mathbb {N}\) such that \(\left\lceil na_{i}\right\rceil \leq \left\lfloor nb_{i}\right\rfloor \), \(i=1,...,N.\)
We introduce and define the following multivariate linear normalized neural network operator (\(x:=\left( x_{1},...,x_{N}\right) \in \left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \)):
\[ A_{n}\left( f,x_{1},...,x_{N}\right) :=A_{n}\left( f,x\right) :=\frac{\sum _{k=\left\lceil na\right\rceil }^{\left\lfloor nb\right\rfloor }f\left( \frac{k}{n}\right) Z\left( nx-k\right) }{\sum _{k=\left\lceil na\right\rceil }^{\left\lfloor nb\right\rfloor }Z\left( nx-k\right) }. \]
For large enough \(n\in \mathbb {N}\) we always obtain \(\left\lceil na_{i}\right\rceil \leq \left\lfloor nb_{i}\right\rfloor \), \(i=1,...,N\). Also \(a_{i}\leq \frac{k_{i}}{n}\leq b_{i}\), iff \(\left\lceil na_{i}\right\rceil \leq k_{i}\leq \left\lfloor nb_{i}\right\rfloor \), \(i=1,...,N\).
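As a numerical illustration, the following sketch implements the case \(N=1\) (real-valued \(f\), i.e. \(X=\mathbb {R}\)) of the normalized operator \(A_{n}\) in the standard form displayed above; \(f=\sin \) and the interval \([0,2]\) are placeholders for the example:

```python
import numpy as np

# Univariate (N = 1, X = R) sketch of the normalized operator
# A_n(f, x) = sum_k f(k/n) psi(nx - k) / sum_k psi(nx - k),
# with k running from ceil(na) to floor(nb).

def h(x): return (2.0 / np.pi) * np.arctan(np.pi * x / 2.0)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def A_n(f, x, n, a, b):
    k = np.arange(np.ceil(n * a), np.floor(n * b) + 1)
    w = psi(n * x - k)
    return np.dot(f(k / n), w) / np.sum(w)

f, x, a, b = np.sin, 0.8, 0.0, 2.0
for n in (10, 100, 1000):
    print(n, abs(A_n(f, x, n, a, b) - f(x)))   # error shrinks as n grows
```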
When \(g\in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \) we define the companion operator
\[ \widetilde{A}_{n}\left( g,x\right) :=\frac{\sum _{k=\left\lceil na\right\rceil }^{\left\lfloor nb\right\rfloor }g\left( \frac{k}{n}\right) Z\left( nx-k\right) }{\sum _{k=\left\lceil na\right\rceil }^{\left\lfloor nb\right\rfloor }Z\left( nx-k\right) }. \]
Clearly \(\widetilde{A}_{n}\) is a positive linear operator. We have that
\[ \widetilde{A}_{n}\left( 1,x\right) =1,\ \ \forall \ x\in \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] . \]
Notice that \(A_{n}\left( f\right) \in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] ,X\right) \) and \(\widetilde{A}_{n}\left( g\right) \in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) .\)
Furthermore it holds
\(\forall \) \(x\in \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] .\)
Clearly \(\left\Vert f\right\Vert _{\gamma }\in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) .\)
So, we have that
\(\forall \) \(x\in \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \), \(\forall \) \(n\in \mathbb {N}\), \(\forall \) \(f\in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] ,X\right) .\)
Let \(c\in X\) and \(g\in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \), then \(cg\in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] ,X\right) .\)
Furthermore it holds
Since \(\widetilde{A}_{n}\left( 1\right) =1\), we get that
We call \(\widetilde{A}_{n}\) the companion operator of \(A_{n}\).
For convenience we call
\(\forall \) \(x\in \left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) .\)
That is
\(\forall \) \(x\in \left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \), \(n\in \mathbb {N}.\)
Hence
Consequently we derive
\(\forall \) \(x\in \left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) .\)
We will estimate the right hand side of (36).
For the last estimate, and for what follows, we need
Let \(M\) be a convex and compact subset of \(\left( \mathbb {R}^{N},\left\Vert \cdot \right\Vert _{p}\right) \), \(p\in \left[ 1,\infty \right] \), and \(\left( X,\left\Vert \cdot \right\Vert _{\gamma }\right) \) be a Banach space. Let \(f\in C\left( M,X\right) .\) We define the first modulus of continuity of \(f\) as
\[ \omega _{1}\left( f,\delta \right) :=\sup _{\substack {x,y\in M \\ \left\Vert x-y\right\Vert _{p}\leq \delta }}\left\Vert f\left( x\right) -f\left( y\right) \right\Vert _{\gamma },\ \ \delta {\gt}0. \]
If \(\delta {\gt}\mathrm{diam}\left( M\right) \), then
Notice \(\omega _{1}\left( f,\delta \right) \) is increasing in \(\delta {\gt}0\). For \(f\in C_{B}\left( M,X\right) \) (continuous and bounded functions) \(\omega _{1}\left( f,\delta \right) \) is defined similarly.
We have \(\omega _{1}\left( f,\delta \right) \rightarrow 0 \) as \(\delta \downarrow 0\), iff \(f\in C\left( M,X\right) \), where \(M\) is a convex compact subset of \(\left( \mathbb {R}^{N},\left\Vert \cdot \right\Vert _{p}\right) \), \(p\in \left[ 1,\infty \right] .\)
Clearly we have also: \(f\in C_{U}\left( \mathbb {R}^{N},X\right) \) (uniformly continuous functions), iff \(\omega _{1}\left( f,\delta \right) \rightarrow 0\) as \(\delta \downarrow 0\), where \(\omega _{1}\) is defined similarly to (37). The space \(C_{B}\left( \mathbb {R}^{N},X\right) \) denotes the continuous and bounded functions on \(\mathbb {R}^{N}.\)
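For a concrete feel of \(\omega _{1}\), here is a crude Monte Carlo sketch estimating the first modulus of continuity for a sample function on \(M=\left[ 0,1\right] ^{2}\) with \(p=\infty \); the function and all parameters are placeholders for illustration only:

```python
import numpy as np

# Crude Monte Carlo lower estimate of
# omega_1(f, delta) = sup{ |f(x) - f(y)| : x, y in M, ||x - y||_inf <= delta }
# on M = [0, 1]^2 (p = infinity); illustration only.

rng = np.random.default_rng(1)

def omega1(f, delta, dim=2, trials=100_000):
    x = rng.uniform(0.0, 1.0, size=(trials, dim))
    y = np.clip(x + rng.uniform(-delta, delta, size=(trials, dim)), 0.0, 1.0)
    return np.max(np.abs(f(x) - f(y)))     # lower estimate of the supremum

f = lambda v: np.sin(v[:, 0]) * np.cos(3.0 * v[:, 1])
for d in (0.5, 0.1, 0.01):
    print(d, omega1(f, d))                 # decreases to 0 as delta -> 0
```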
When \(f\in C_{B}\left( \mathbb {R}^{N},X\right) \) we define,
\[ B_{n}\left( f,x\right) :=B_{n}\left( f,x_{1},...,x_{N}\right) :=\sum _{k=-\infty }^{\infty }f\left( \tfrac {k}{n}\right) Z\left( nx-k\right) , \]
\(n\in \mathbb {N}\), \(\forall \) \(x\in \mathbb {R}^{N},\) \(N\in \mathbb {N}\), the multivariate quasi-interpolation neural network operator.
Also for \(f\in C_{B}\left( \mathbb {R}^{N},X\right) \) we define the multivariate Kantorovich type neural network operator
\[ C_{n}\left( f,x\right) :=\sum _{k=-\infty }^{\infty }\left( n^{N}\int _{\frac{k}{n}}^{\frac{k+1}{n}}f\left( t\right) dt\right) Z\left( nx-k\right) , \]
\(n\in \mathbb {N},\ \forall \) \(x\in \mathbb {R}^{N}.\)
Again for \(f\in C_{B}\left( \mathbb {R}^{N},X\right) ,\) \(N\in \mathbb {N},\) we define the multivariate neural network operator of quadrature type \(D_{n}\left( f,x\right) \), \(n\in \mathbb {N},\) as follows.
Let \(\theta =\left( \theta _{1},...,\theta _{N}\right) \in \mathbb {N}^{N},\) \(r=\left( r_{1},...,r_{N}\right) \in \mathbb {Z}_{+}^{N}\), \(w_{r}=w_{r_{1},r_{2},...,r_{N}}\geq 0\), such that \(\sum \limits _{r=0}^{\theta }w_{r}=\sum \limits _{r_{1}=0}^{\theta _{1}}\sum \limits _{r_{2}=0}^{\theta _{2}}...\sum \limits _{r_{N}=0}^{\theta _{N}}w_{r_{1},r_{2},...,r_{N}}=1;\) \(k\in \mathbb {Z}^{N}\) and
\[ \delta _{nk}\left( f\right) :=\sum _{r=0}^{\theta }w_{r}f\left( \tfrac {k}{n}+\tfrac {r}{n\theta }\right) , \]
where \(\tfrac {r}{\theta }:=\left( \tfrac {r_{1}}{\theta _{1}},\frac{r_{2}}{\theta _{2}},...,\frac{r_{N}}{\theta _{N}}\right) .\)
We set
\[ D_{n}\left( f,x\right) :=\sum _{k=-\infty }^{\infty }\delta _{nk}\left( f\right) Z\left( nx-k\right) , \]
\(\forall \) \(x\in \mathbb {R}^{N}.\)
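The next sketch implements univariate (\(N=1\), \(X=\mathbb {R}\)) versions of \(B_{n}\), \(C_{n}\) and \(D_{n}\), assuming the standard forms above; the infinite sums over \(k\) are truncated to a window around \(nx\), the integral in \(C_{n}\) is approximated by a midpoint rule, and the quadrature weights \(w_{r}\) are taken uniform purely as an example:

```python
import numpy as np

# Univariate (N = 1, X = R) sketches of B_n (quasi-interpolation),
# C_n (Kantorovich) and D_n (quadrature); sums over k truncated to a window.

def h(x): return (2.0 / np.pi) * np.arctan(np.pi * x / 2.0)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def window(x, n, half=500):
    k0 = int(np.floor(n * x))
    return np.arange(k0 - half, k0 + half + 1)

def B_n(f, x, n):                      # sum_k f(k/n) psi(nx - k)
    k = window(x, n)
    return np.dot(f(k / n), psi(n * x - k))

def C_n(f, x, n, m=20):                # n * integral of f over [k/n, (k+1)/n]
    k = window(x, n)                   # approximated by an m-point midpoint rule
    t = (k[:, None] + (np.arange(m) + 0.5)[None, :] / m) / n
    return np.dot(f(t).mean(axis=1), psi(n * x - k))

def D_n(f, x, n, theta=4):             # delta_nk(f) with uniform example weights
    k = window(x, n)
    w = np.full(theta + 1, 1.0 / (theta + 1))          # w_r >= 0, sum = 1
    vals = sum(w[r] * f(k / n + r / (n * theta)) for r in range(theta + 1))
    return np.dot(vals, psi(n * x - k))

f, x, n = np.sin, 0.8, 100
for op in (B_n, C_n, D_n):
    print(op.__name__, abs(op(f, x, n) - f(x)))        # small errors at n = 100
```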
In this article we study the approximation properties of the \(A_{n},B_{n},C_{n},D_{n}\) neural network operators, as well as of their iterates. That is, the quantitative pointwise and uniform convergence of these operators to the unit operator \(I\).
3 Multivariate General Neural Network Approximations
Here we present several vectorial neural network approximations to Banach space valued functions given with rates.
We give
Let \(f\in C\left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] ,X\right) ,\) \(0{\lt}\beta {\lt}1\), \(x\in \left( \prod _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) ,\) \(N,n\in \mathbb {N}\) with \(n^{1-\beta }{\gt}2\). Then
1)
and
2)
We notice that \(\underset {n\rightarrow \infty }{\lim }A_{n}\left( f\right) \overset {\left\Vert \cdot \right\Vert _{\gamma }}{=}f\), pointwise and uniformly.
Above \(\omega _{1}\) is with respect to \(p=\infty .\)
Thus
So that
Now using (36) we finish the proof.
We make
Consider \(\left( \mathbb {R}^{N},\left\Vert \cdot \right\Vert _{p}\right) \), \(N\in \mathbb {N}\), where \(\left\Vert \cdot \right\Vert _{p}\) is the \(L_{p}\)-norm, \(1\leq p\leq \infty \). \(\mathbb {R}^{N} \) is a Banach space, and \(\left( \mathbb {R}^{N}\right) ^{j}\) denotes the \(j\)-fold product space \(\mathbb {R}^{N}\times ...\times \mathbb {R}^{N}\) endowed with the max-norm \(\left\Vert x\right\Vert _{\left( \mathbb {R}^{N}\right) ^{j}}:=\underset {1\leq \lambda \leq j}{\max }\left\Vert x_{\lambda }\right\Vert _{p}\), where \(x:=\left( x_{1},...,x_{j}\right) \in \left( \mathbb {R}^{N}\right) ^{j}.\)
Let \(\left( X,\left\Vert \cdot \right\Vert _{\gamma }\right) \) be a general Banach space. Then the space \(L_{j}:=L_{j}\Big( \big( \mathbb {R}^{N}\big) ^{j};X\Big) \) of all \(j\)-multilinear continuous maps \(g:\big( \mathbb {R}^{N}\big) ^{j}\rightarrow X\), \(j=1,...,m\), is a Banach space with norm
Let \(M\) be a non-empty convex and compact subset of \(\mathbb {R}^{N}\) and let \(x_{0}\in M\) be fixed.
Let \(O\) be an open subset of \(\mathbb {R}^{N}:M\subset O\). Let \(f:O\rightarrow X\) be a continuous function, whose Fréchet derivatives (see [ 19 ] ) \(f^{\left( j\right) }:O\rightarrow L_{j}=L_{j}\left( \left( \mathbb {R}^{N}\right) ^{j};X\right) \) exist and are continuous for \(1\leq j\leq m\), \(m\in \mathbb {N}\).
Call \(\left( x-x_{0}\right) ^{j}:=\left( x-x_{0},...,x-x_{0}\right) \in \left( \mathbb {R}^{N}\right) ^{j}\), \(x\in M\).
We will work with \(f|_{M}.\)
Then, by Taylor’s formula [ 12 ] , [ 19 , p. 124 ] , we get
\[ f\left( x\right) =\sum _{j=0}^{m}\frac{f^{\left( j\right) }\left( x_{0}\right) \left( x-x_{0}\right) ^{j}}{j!}+R_{m}\left( x,x_{0}\right) ,\ \ x\in M, \]
where the remainder is the Riemann integral
\[ R_{m}\left( x,x_{0}\right) :=\int _{0}^{1}\frac{\left( 1-u\right) ^{m-1}}{\left( m-1\right) !}\left[ f^{\left( m\right) }\left( x_{0}+u\left( x-x_{0}\right) \right) -f^{\left( m\right) }\left( x_{0}\right) \right] \left( x-x_{0}\right) ^{m}du; \]
here we set \(f^{\left( 0\right) }\left( x_{0}\right) \left( x-x_{0}\right) ^{0}=f\left( x_{0}\right) .\)
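For the scalar case (\(N=1\), \(X=\mathbb {R}\)) this identity is easy to verify numerically; here is a sketch with the placeholder choices \(f=\exp \) and \(m=2\):

```python
import math
import numpy as np
from scipy.integrate import quad

# Scalar sanity check (N = 1, X = R) of Taylor's formula with the integral
# remainder above, for f = exp (all derivatives equal exp), m = 2.

f = np.exp
m, x0, x = 2, 0.3, 1.1
step = x - x0

taylor = sum(f(x0) * step**j / math.factorial(j) for j in range(m + 1))
rem = quad(lambda u: (1 - u)**(m - 1) / math.factorial(m - 1)
           * (f(x0 + u * step) - f(x0)) * step**m, 0.0, 1.0)[0]
print(f(x), taylor + rem)   # the two values agree up to quadrature error
```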
We consider
\(h{\gt}0.\)
We obtain
by Lemma 7.1.1, [ 1 , p. 208 ] , where \(\left\lceil \cdot \right\rceil \) is the ceiling.
Therefore for all \(x\in M\) (see [ 1 , pp. 121–122 ] ):
by a change of variable, where
is a (polynomial) spline function, see [ 1 , pp. 210–211 ] .
Also from there we get
with equality true only at \(t=0\).
Therefore it holds
We have found that
\(\forall \) \(x,x_{0}\in M.\)
Here \(0{\lt}\omega _{1}\left( f^{\left( m\right) },h\right) {\lt}\infty \), by \(M\) being compact and \(f^{\left( m\right) }\) being continuous on \(M\).
One can rewrite (60) as follows:
a pointwise functional inequality on \(M\).
Here \(\left( \cdot -x_{0}\right) ^{j}\) maps \(M\) into \(\left( \mathbb {R}^{N}\right) ^{j}\) and it is continuous, also \(f^{\left( j\right) }\left( x_{0}\right) \) maps \(\left( \mathbb {R}^{N}\right) ^{j}\) into \(X\) and it is continuous. Hence their composition \(f^{\left( j\right) }\left( x_{0}\right) \left( \cdot -x_{0}\right) ^{j}\) is continuous from \(M\) into \(X\).
Clearly \(f\left( \cdot \right) -\sum \limits _{j=0}^{m}\frac{f^{\left( j\right) }\left( x_{0}\right) \left( \cdot -x_{0}\right) ^{j}}{j!}\in C\left( M,X\right) \), hence \(\left\Vert f\left( \cdot \right) -\sum \limits _{j=0}^{m}\frac{f^{\left( j\right) }\left( x_{0}\right) \left( \cdot -x_{0}\right) ^{j}}{j!}\right\Vert _{\gamma }\in C\left( M\right) \).
Let \(\left\{ \widetilde{L}_{N}\right\} _{N\in \mathbb {N}}\) be a sequence of positive linear operators mapping \(C\left( M\right) \) into \(C\left( M\right) .\)
Therefore we obtain
\(\forall \) \(N\in \mathbb {N}\), \(\forall \) \(x_{0}\in M\). □
Clearly (62) is valid when \(M=\prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] \) and \(\widetilde{L}_{n}=\widetilde{A}_{n}\), see (28).
All the above is preparation for the following theorem, where we assume Fréchet differentiability of functions.
This will be a direct application of Theorem 10.2, [ 11 , pp. 268–270 ] . The operators \(A_{n},\) \(\widetilde{A}_{n}\) fulfill its assumptions, see (27), (28), (30), (31) and (32).
We present the following high order approximation results.
Let \(O\) be an open subset of \(\left( \mathbb {R}^{N},\left\Vert \cdot \right\Vert _{p}\right) \), \(p\in \left[ 1,\infty \right] \), such that \(\prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] \subset O\subseteq \mathbb {R}^{N}\), and let \(\left( X,\left\Vert \cdot \right\Vert _{\gamma }\right) \) be a general Banach space. Let \(m\in \mathbb {N}\) and \(f\in C^{m}\left( O,X\right) \), the space of \(m\)-times continuously Fréchet differentiable functions from \(O\) into \(X\). We study the approximation of \(f|_{\prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] }.\) Let \(x_{0}\in \left( \prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \) and \(r{\gt}0\). Then
1)
2) additionally if \(f^{\left( j\right) }\left( x_{0}\right) =0\), \(j=1,...,m\), we have
3)
and
4)
We need
The function \(\left( \widetilde{A}_{n}\left( \left\Vert \cdot -x_{0}\right\Vert _{p}^{m}\right) \right) \left( x_{0}\right) \) is continuous in \(x_{0}\in \left( \prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \), \(m\in \mathbb {N}\).
We make
By Remark 10.4, [ 11 , p. 273 ] , we get that
for all \(k=1,...,m.\) □
We give
(to Theorem 6, case of \(m=1\)) Then
1)
and
2)
\(r{\gt}0.\)
We make
We estimate, for \(0{\lt}\alpha {\lt}1\) and \(m,n\in \mathbb {N}\) with \(n^{1-\alpha }{\gt}2\),
(where \(b-a=\left( b_{1}-a_{1},...,b_{N}-a_{N}\right) \)).
We have proved that (\(\forall \) \(x_{0}\in \prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] \))
(\(0{\lt}\alpha {\lt}1\), \(m,n\in \mathbb {N}:n^{1-\alpha }{\gt}2\)).
And, consequently it holds
So we have that \(\varphi _{1}\left( n\right) \rightarrow 0\), as \(n\rightarrow +\infty \). Thus, when \(p\in \left[ 1,\infty \right] \), from Theorem 6 we obtain the convergence to zero of the right hand sides of parts (1) and (2).
Next we estimate \(\left\Vert \left( \widetilde{A}_{n}\left( f^{\left( j\right) }\left( x_{0}\right) \left( \cdot -x_{0}\right) ^{j}\right) \right) \left( x_{0}\right) \right\Vert _{\gamma }.\)
We have that
When \(p=\infty \), \(j=1,...,m,\) we obtain
We further have that
That is
Therefore when \(p=\infty \), for \(j=1,...,m\), we have proved:
and converges to zero, as \(n\rightarrow \infty .\) □
We conclude:
In Theorem 6, the right hand sides of (65) and (66) converge to zero as \(n\rightarrow \infty \), for any \(p\in \left[ 1,\infty \right] \).
Also in Corollary 1, the right hand sides of (68) and (69) converge to zero as \(n\rightarrow \infty \), for any \(p\in \left[ 1,\infty \right] .\)
We have proved that the left hand sides of (63), (64), (65), (66) and of (68), (69) converge to zero as \(n\rightarrow \infty \), for \(p\in \left[ 1,\infty \right] \). Consequently \(A_{n}\rightarrow I\) (the unit operator) pointwise and uniformly, as \(n\rightarrow \infty \), where \(p\in \left[ 1,\infty \right] \). In the presence of initial conditions we achieve a higher speed of convergence, see (64). A higher speed of convergence also occurs on the left hand side of (63).
We give
(to Theorem 6) Let \(O\) be an open subset of \(\left( \mathbb {R}^{N},\left\Vert \cdot \right\Vert _{\infty }\right) \), such that
\(\prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] \subset O\subseteq \mathbb {R}^{N}\), and let \(\left( X,\left\Vert \cdot \right\Vert _{\gamma }\right) \) be a general Banach space. Let \(m\in \mathbb {N}\) and \(f\in C^{m}\left( O,X\right) \), the space of \(m\)-times continuously Fréchet differentiable functions from \(O\) into \(X\). We study the approximation of \(f|_{\prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] }.\) Let \(x_{0}\in \left( \prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] \right) \) and \(r{\gt}0\). Here \(\varphi _{1}\left( n\right) \) is as in (74) and \(\varphi _{2j}\left( n\right) \) as in (82), where \(n\in \mathbb {N}:n^{1-\alpha }{\gt}2\), \(0{\lt}\alpha {\lt}1\), \(j=1,...,m.\) Then
1)
2) additionally, if \(f^{\left( j\right) }\left( x_{0}\right) =0\), \(j=1,...,m\), we have
3)
We continue with
Let \(f\in C_{B}\left( \mathbb {R}^{N},X\right) ,\) \(0{\lt}\beta {\lt}1\), \(x\in \mathbb {R}^{N},\) \(N,n\in \mathbb {N}\) with \(n^{1-\beta }{\gt}2\), \(\omega _{1} \) is for \(p=\infty \). Then
1)
2)
Given that \(f\in \left( C_{U}\left( \mathbb {R}^{N},X\right) \cap C_{B}\left( \mathbb {R}^{N},X\right) \right) \), we obtain \(\underset {n\rightarrow \infty }{\lim }B_{n}\left( f\right) =f\), uniformly.
Hence
proving the claim.
We give
Let \(f\in C_{B}\left( \mathbb {R}^{N},X\right) ,\) \(0{\lt}\beta {\lt}1\), \(x\in \mathbb {R}^{N},\) \(N,n\in \mathbb {N}\) with \(n^{1-\beta }{\gt}2\), \(\omega _{1} \) is for \(p=\infty \). Then
1)
2)
Given that \(f\in \left( C_{U}\left( \mathbb {R}^{N},X\right) \cap C_{B}\left( \mathbb {R}^{N},X\right) \right) ,\) we obtain \(\underset {n\rightarrow \infty }{\lim }C_{n}\left( f\right) =f\), uniformly.
Thus it holds (by (40))
We observe that
proving the claim.
We also present
Let \(f\in C_{B}\left( \mathbb {R}^{N},X\right) ,\) \(0{\lt}\beta {\lt}1\), \(x\in \mathbb {R}^{N},\) \(N,n\in \mathbb {N}\) with \(n^{1-\beta }{\gt}2\), \(\omega _{1} \) is for \(p=\infty .\) Then
1)
2)
Given that \(f\in \left( C_{U}\left( \mathbb {R}^{N},X\right) \cap C_{B}\left( \mathbb {R}^{N},X\right) \right) ,\) we obtain \(\underset {n\rightarrow \infty }{\lim }D_{n}\left( f\right) =f\), uniformly.
proving the claim.
We make
Let \(f\in C_{B}\left( \mathbb {R}^{N},X\right) \), \(N\in \mathbb {N}\), where \(\left( X,\left\Vert \cdot \right\Vert _{\gamma }\right) \) is a Banach space. We define the general neural network operator
Clearly \(l_{nk}\left( f\right) \) is an \(X\)-valued bounded linear functional such that \(\left\Vert l_{nk}\left( f\right) \right\Vert _{\gamma }\leq \left\Vert \left\Vert f\right\Vert _{\gamma }\right\Vert _{\infty }.\)
Hence \(F_{n}\left( f\right) \) is a bounded linear operator with \(\left\Vert \left\Vert F_{n}\left( f\right) \right\Vert _{\gamma }\right\Vert _{\infty }\leq \left\Vert \left\Vert f\right\Vert _{\gamma }\right\Vert _{\infty }\).
We need
Let \(f\in C_{B}\left( \mathbb {R}^{N},X\right) \), \(N\geq 1\). Then \(F_{n}\left( f\right) \in C_{B}\left( \mathbb {R}^{N},X\right) .\)
Next we prove the continuity of \(F_{n}\left( f\right) \). Notice for \(N=1\), \(Z=\psi \) by (16).
We will use the generalized Weierstrass \(M\) test: If a sequence of positive constants \(M_{1},M_{2},M_{3},...\) can be found such that in some interval
(a) \(\left\Vert u_{n}\left( x\right) \right\Vert _{\gamma }\leq M_{n}\), \(n=1,2,3,...\)
(b) \(\sum M_{n}\) converges,
then \(\sum u_{n}\left( x\right) \) is uniformly and absolutely convergent in the interval.
Also we will use:
If \(\{ u_{n}\left( x\right) \} \), \(n=1,2,3,...\) are continuous in \(\left[ a,b\right] \) and if \(\sum u_{n}\left( x\right) \) converges uniformly to the sum \(S\left( x\right) \) in \(\left[ a,b\right] \), then \(S\left( x\right) \) is continuous in \(\left[ a,b\right] \). That is, a uniformly convergent series of continuous functions has a continuous sum. First we prove the claim for \(N=1\).
We will prove that \(\sum _{k=-\infty }^{\infty }l_{nk}\left( f\right) \psi \left( nx-k\right) \) is continuous in \(x\in \mathbb {R}\).
There always exists \(\lambda \in \mathbb {N}\) such that \(nx\in \left[ -\lambda ,\lambda \right] .\)
Since \(nx\leq \lambda \), we have \(-nx\geq -\lambda \) and \(k-nx\geq k-\lambda \geq 0\) when \(k\geq \lambda \). Therefore
So for \(k\geq \lambda \) we get
and
Hence by the generalized Weierstrass \(M\) test we obtain that \(\sum \limits _{k=\lambda }^{\infty }l_{nk}\left( f\right) \psi \left( nx-k\right) \) is uniformly and absolutely convergent on \(\left[ -\frac{\lambda }{n},\frac{\lambda }{n}\right] .\)
Since \(l_{nk}\left( f\right) \psi \left( nx-k\right) \) is continuous in \(x\), then \(\sum _{k=\lambda }^{\infty }l_{nk}\left( f\right) \psi \left( nx-k\right) \) is continuous on \(\left[ -\frac{\lambda }{n},\frac{\lambda }{n}\right] .\)
Because \(nx\geq -\lambda \), we have \(-nx\leq \lambda \), hence \(k-nx\leq k+\lambda \leq 0\) when \(k\leq -\lambda \). Therefore
So for \(k\leq -\lambda \) we get
and
Hence by the Weierstrass \(M\) test we obtain that \(\sum _{k=-\infty }^{-\lambda }l_{nk}\left( f\right) \psi \left( nx-k\right) \) is uniformly and absolutely convergent on \(\left[ -\frac{\lambda }{n},\frac{\lambda }{n}\right] .\)
Since \(l_{nk}\left( f\right) \psi \left( nx-k\right) \) is continuous in \(x\), then \(\sum _{k=-\infty }^{-\lambda }l_{nk}\left( f\right) \psi \left( nx-k\right) \) is continuous on \(\left[ -\frac{\lambda }{n},\frac{\lambda }{n}\right] .\)
So we proved that \(\sum _{k=\lambda }^{\infty }l_{nk}\left( f\right) \psi \left( nx-k\right) \) and \(\sum _{k=-\infty }^{-\lambda }l_{nk}\left( f\right) \psi \left( nx-k\right) \) are continuous on \(\mathbb {R}\). Since \(\sum _{k=-\lambda +1}^{\lambda -1}l_{nk}\left( f\right) \psi \left( nx-k\right) \) is a finite sum of continuous functions on \(\mathbb {R}\), it is also a continuous function on \(\mathbb {R}\).
Writing
we have it as a continuous function on \(\mathbb {R}\). Therefore \(F_{n}\left( f\right) \), when \(N=1\), is a continuous function on \(\mathbb {R}\).
When \(N=2\) we have
(there always exist \(\lambda _{1},\lambda _{2}\in \mathbb {N}\) such that \(nx_{1}\in \left[ -\lambda _{1},\lambda _{1}\right] \) and \(nx_{2}\in \left[ -\lambda _{2},\lambda _{2}\right] \))
(For convenience call
Thus
Notice that the finite sum of continuous functions \(F\left( k_{1},k_{2},x_{1},x_{2}\right) \),
\(\sum _{k_{1}=-\lambda _{1}+1}^{\lambda _{1}-1}\sum _{k_{2}=-\lambda _{2}+1}^{\lambda _{2}-1}F\left( k_{1},k_{2},x_{1},x_{2}\right) \) is a continuous function.
The rest of the summands of \(F_{n}\left( f,x_{1},x_{2}\right) \) are all treated the same way, similarly to the case \(N=1\). The method is demonstrated as follows.
We will prove that \(\sum _{k_{1}=\lambda _{1}}^{\infty }\sum _{k_{2}=-\infty }^{-\lambda _{2}}l_{nk}\left( f\right) \psi \left( nx_{1}-k_{1}\right) \psi \left( nx_{2}-k_{2}\right) \) is continuous in \(\left( x_{1},x_{2}\right) \in \mathbb {R}^{2}\).
The continuous function
and
So by the Weierstrass \(M\) test we get that
\(\sum _{k_{1}=\lambda _{1}}^{\infty }\sum _{k_{2}=-\infty }^{-\lambda _{2}}l_{nk}\left( f\right) \psi \left( nx_{1}-k_{1}\right) \psi \left( nx_{2}-k_{2}\right) \) is uniformly and absolutely convergent. Therefore it is continuous on \(\mathbb {R}^{2}.\)
Next we prove continuity on \(\mathbb {R}^{2}\) of
\(\sum _{k_{1}=-\lambda _{1}+1}^{\lambda _{1}-1}\sum _{k_{2}=-\infty }^{-\lambda _{2}}l_{nk}\left( f\right) \psi \left( nx_{1}-k_{1}\right) \psi \left( nx_{2}-k_{2}\right) \).
Notice here that
and
So the double series under consideration is uniformly convergent and continuous. Clearly \(F_{n}\left( f,x_{1},x_{2}\right) \) is proved to be continuous on \(\mathbb {R}^{2}.\)
Reasoning similarly, one can now prove, with more tedious work, that \(F_{n}\left( f,x_{1},...,x_{N}\right) \) is continuous on \(\mathbb {R}^{N}\), for any \(N\geq 1\). We choose to omit this similar extra work.
By (27) it is obvious that \(\left\Vert \left\Vert A_{n}\left( f\right) \right\Vert _{\gamma }\right\Vert _{\infty }\leq \left\Vert \left\Vert f\right\Vert _{\gamma }\right\Vert _{\infty }{\lt}\infty \), and \(A_{n}\left( f\right) \in C\left( \prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] ,X\right) \), given that \(f\in C\left( \prod \limits _{i=1}^{N}\left[ a_{i},b_{i}\right] ,X\right) .\)
Call \(L_{n}\) any of the operators \(A_{n},B_{n},C_{n},D_{n}.\)
Clearly then
etc.
Therefore we get
the contraction property.
Also we see that
Here \(L_{n}^{k}\) are bounded linear operators. □
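The contraction property is easy to see numerically; a quick sketch, with \(L_{n}\) taken as the univariate \(B_{n}\) sketch from earlier (truncated sum, real-valued \(f\)):

```python
import numpy as np

# Numerical illustration of the contraction property ||L_n f||_inf <= ||f||_inf,
# with L_n the univariate quasi-interpolation sketch B_n (truncated sum).

def h(x): return (2.0 / np.pi) * np.arctan(np.pi * x / 2.0)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def B_n(f, x, n, half=500):
    k0 = int(np.floor(n * x))
    k = np.arange(k0 - half, k0 + half + 1)
    return np.dot(f(k / n), psi(n * x - k))

f = np.sin                                            # ||f||_inf = 1
vals = [B_n(f, x, 50) for x in np.linspace(-3.0, 3.0, 121)]
print(max(abs(v) for v in vals))                      # <= 1: weights sum to at most 1
```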
Here \(N\in \mathbb {N}\), \(0{\lt}\beta {\lt}1.\) Denote by
We give the condensed
Let \(f\in \Omega \), \(0{\lt}\beta {\lt}1\), \(x\in Y;\) \(n,\) \(N\in \mathbb {N}\) with \(n^{1-\beta }{\gt}2\). Then
(i)
where \(\omega _{1}\) is for \(p=\infty ,\)
and
(ii)
For \(f\) uniformly continuous and in \(\Omega \) we obtain
pointwise and uniformly.
Next we do iterated neural network approximation (see also [ 9 ] ).
We make
Let \(r\in \mathbb {N}\) and \(L_{n}\) as above. We observe that
Then
That is
We give
All here is as in Theorem 11, with \(r\in \mathbb {N}\) and \(\tau \left( n\right) \) as in (124). Then
So that the speed of convergence to the unit operator of \(L_{n}^{r}\) is not worse than of \(L_{n}.\)
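This iterate bound can be checked numerically on grid points \(x=\frac{j}{n}\), where (for \(N=1\)) \(B_{n}\) acts as the matrix \(M\left[ j,k\right] =\psi \left( j-k\right) \) on the sample vector \(\left( f\left( \frac{k}{n}\right) \right) _{k}\); the index window is truncated and errors are measured away from its edges. A sketch:

```python
import numpy as np

# Discrete sketch of the iterate bound ||L_n^r f - f|| <= r ||L_n f - f||,
# checked at grid points x = j/n with L_n the univariate B_n sketch; on the
# grid, B_n acts as the matrix M[j, k] = psi(j - k). Window truncated.

def h(x): return (2.0 / np.pi) * np.arctan(np.pi * x / 2.0)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

n, r = 10, 3
k = np.arange(-1000, 1001)                 # truncated index window
fvals = np.sin(k / n)
M = psi(k[:, None] - k[None, :])           # row j holds the weights psi(j - k)

one_step = M @ fvals                       # samples of B_n f on the grid
r_step = fvals.copy()
for _ in range(r):                         # samples of B_n^r f on the grid
    r_step = M @ r_step

mid = slice(800, 1201)                     # keep away from truncation edges
e1 = np.max(np.abs(one_step - fvals)[mid])
er = np.max(np.abs(r_step - fvals)[mid])
print(er, r * e1, er <= r * e1)            # iterated error <= r-fold one-step error
```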
We make
Let \(m_{1},...,m_{r}\in \mathbb {N}:m_{1}\leq m_{2}\leq ...\leq m_{r}\), \(0{\lt}\beta {\lt}1\), \(f\in \Omega \). Then \(\varphi \left( m_{1}\right) \geq \varphi \left( m_{2}\right) \geq ...\geq \varphi \left( m_{r}\right) \), \(\varphi \) as in (121).
Therefore
Assume further that \(m_{i}^{1-\beta }{\gt}2\), \(i=1,...,r\). Then
Let \(L_{m_{i}}\) as above, \(i=1,...,r,\) all of the same kind.
We write
Hence by the triangle inequality property of \(\left\Vert \left\Vert \cdot \right\Vert _{\gamma }\right\Vert _{\infty }\) we get
(repeatedly applying (117))
That is, we proved
We give
Let \(f\in \Omega \); \(N,\) \(m_{1},m_{2},...,m_{r}\in \mathbb {N}:m_{1}\leq m_{2}\leq ...\leq m_{r},\) \(0{\lt}\beta {\lt}1;\) \(m_{i}^{1-\beta }{\gt}2\), \(i=1,...,r,\) \(x\in Y,\) and let \(\left( L_{m_{1}},...,L_{m_{r}}\right) \) as \(\left( A_{m_{1}},...,A_{m_{r}}\right) \) or \(\left( B_{m_{1}},...,B_{m_{r}}\right) \) or \(\left( C_{m_{1}},...,C_{m_{r}}\right) \) or \(\left( D_{m_{1}},...,D_{m_{r}}\right) \), \(p=\infty .\) Then
Clearly, we notice that the speed of convergence to the unit operator of the multiply iterated operator is not worse than the speed of \(L_{m_{1}}.\)
We continue with
Let all be as in Corollary 2, with \(r\in \mathbb {N}\). Here \(\varphi _{3}\left( n\right) \) is as in (85). Then
A typical application of all of our results is when \(\left( X,\left\Vert \cdot \right\Vert _{\gamma }\right) =\left( \mathbb {C},\left\vert \cdot \right\vert \right) \), where \(\mathbb {C}\) denotes the complex numbers.
Bibliography
- 1
G.A. Anastassiou, Moments in Probability and Approximation Theory, Pitman Research Notes in Math., Vol. 287, Longman Sci. & Tech., Harlow, U.K., 1993.
- 2
G.A. Anastassiou, Rate of convergence of some neural network operators to the unit-univariate case, J. Math. Anal. Appl. 212 (1997), pp. 237–262. https://doi.org/10.1006/jmaa.1997.5494
- 3
G.A. Anastassiou, Quantitative Approximations, Chapman & Hall/CRC, Boca Raton, New York, 2001.
- 4
G.A. Anastassiou, Intelligent Systems: Approximation by Artificial Neural Networks, Intelligent Systems Reference Library, Vol. 19, Springer, Heidelberg, 2011. https://doi.org/10.1007/978-3-642-21431-8
- 5
G.A. Anastassiou, Univariate hyperbolic tangent neural network approximation, Mathematical and Computer Modelling 53 (2011), pp. 1111–1132. https://doi.org/10.1016/j.mcm.2010.11.072
- 6
G.A. Anastassiou, Multivariate hyperbolic tangent neural network approximation, Computers & Mathematics with Applications 61 (2011), pp. 809–821. https://doi.org/10.1016/j.camwa.2010.12.029
- 7
G.A. Anastassiou, Multivariate sigmoidal neural network approximation, Neural Networks 24 (2011), pp. 378–386. https://doi.org/10.1016/j.neunet.2011.01.003
- 8
G.A. Anastassiou, Univariate sigmoidal neural network approximation, J. Computational Analysis and Applications 14 (2012) no. 4, pp. 659–690.
- 9
G.A. Anastassiou, Approximation by neural networks iterates, in: Advances in Applied Mathematics and Approximation Theory (G. Anastassiou, O. Duman, eds.), Springer Proceedings in Math. & Stat., Springer, New York, 2013, pp. 1–20.
- 10
G. Anastassiou, Intelligent Systems II: Complete Approximation by Neural Network Operators, Springer, Heidelberg, New York, 2016.
- 11
G. Anastassiou, Intelligent Computations: Abstract Fractional Calculus, Inequalities, Approximations, Springer, Heidelberg, New York, 2018.
- 12
H. Cartan, Differential Calculus, Hermann, Paris, 1971.
- 13
Z. Chen and F. Cao, The approximation operators with sigmoidal functions, Computers and Mathematics with Applications, 58 (2009), pp. 758–765. https://doi.org/10.1016/j.camwa.2009.05.001
- 14
D. Costarelli, R. Spigler, Approximation results for neural network operators activated by sigmoidal functions, Neural Networks 44 (2013), pp. 101–106. https://doi.org/10.1016/j.neunet.2013.03.015
- 15
D. Costarelli, R. Spigler, Multivariate neural network operators with sigmoidal activation functions, Neural Networks 48 (2013), pp. 72–77. https://doi.org/10.1016/j.neunet.2013.07.009
- 16
S. Haykin, Neural Networks: A Comprehensive Foundation (2nd ed.), Prentice Hall, New York, 1998.
- 17
W. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 7 (1943), pp. 115–133. https://doi.org/10.1007/bf02478259
- 18
T.M. Mitchell, Machine Learning, WCB-McGraw-Hill, New York, 1997.
- 19
L.B. Rall, Computational Solution of Nonlinear Operator Equations, John Wiley & Sons, New York, 1969.