
General multivariate arctangent function activated neural network approximations

George A. Anastassiou

received: March 22, 2022; accepted: May 28, 2022; published online: August 25, 2022.

Here we expose multivariate quantitative approximations of Banach space valued continuous multivariate functions on a box or on $\mathbb{R}^N$, $N \in \mathbb{N}$, by the multivariate normalized, quasi-interpolation, Kantorovich type and quadrature type neural network operators. We also treat the case of approximation by iterated operators of the last four types. These approximations are derived by establishing multidimensional Jackson type inequalities involving the multivariate modulus of continuity of the engaged function or its high order Fréchet derivatives. Our multivariate operators are defined by using a multidimensional density function induced by the arctangent function. The approximations are pointwise and uniform. The related feed-forward neural network has one hidden layer.

MSC. 41A17, 41A25, 41A30, 41A36.

Keywords. arctangent function, multivariate neural network approximation, quasi-interpolation operator, Kantorovich type operator, quadrature type operator, multivariate modulus of continuity, abstract approximation, iterated approximation.

Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, U.S.A., e-mail: ganastss@memphis.edu.

1 Introduction

The author in [2] and [3], see chapters 2–5, was the first to establish neural network approximations to continuous functions with rates, by very specifically defined neural network operators of Cardaliaguet–Euvrard and “Squashing” types, by employing the modulus of continuity of the engaged function or its high order derivative, and producing very tight Jackson type inequalities. He treats there both the univariate and multivariate cases. The “bell-shaped” and “squashing” functions defining these operators are assumed to be of compact support. Also in [3] he gives the Nth order asymptotic expansion for the error of weak approximation of these two operators to a special natural class of smooth functions, see chapters 4–5 there.

For this article the author is motivated by the article [ 13 ] of Z. Chen and F. Cao, also by [ 4 ] , [ 5 ] , [ 6 ] , [ 7 ] , [ 8 ] , [ 9 ] , [ 10 ] , [ 11 ] , [ 14 ] , [ 15 ] .

The author here performs multivariate arctangent function based neural network approximations to continuous functions over boxes or over the whole $\mathbb{R}^N$, $N \in \mathbb{N}$. He also treats iterated approximation. All convergences here are with rates expressed via the multivariate modulus of continuity of the involved function or its high order Fréchet derivative and given by very tight multidimensional Jackson type inequalities.

The author here comes up with the “right” precisely defined multivariate normalized, quasi-interpolation neural network operators related to boxes or to $\mathbb{R}^N$, as well as Kantorovich type and quadrature type related operators on $\mathbb{R}^N$. Our boxes are not necessarily symmetric to the origin. In preparation for proving our results we establish important properties of the basic multivariate density function induced by the arctangent function and defining our operators.

Feed-forward neural networks (FNNs) with one hidden layer, the only type of networks we deal with in this article, are mathematically expressed as

\[ N_n(x) = \sum_{j=0}^{n} c_j\,\sigma(\langle a_j \cdot x\rangle + b_j), \quad x \in \mathbb{R}^s, \ s \in \mathbb{N}, \]

where for $0 \le j \le n$, $b_j \in \mathbb{R}$ are the thresholds, $a_j \in \mathbb{R}^s$ are the connection weights, $c_j \in \mathbb{R}$ are the coefficients, $\langle a_j \cdot x\rangle$ is the inner product of $a_j$ and $x$, and $\sigma$ is the activation function of the network. In many fundamental network models, the activation function is the arctangent function. About neural networks read [16], [17], [18].
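The following is a minimal NumPy sketch (not taken from the paper) of such a one-hidden-layer network; the weights $a_j$, thresholds $b_j$ and coefficients $c_j$ are arbitrary illustrative values, and the activation is the rescaled arctangent of (2) below.

```python
# Sketch of N_n(x) = sum_{j=0}^{n} c_j * sigma(<a_j . x> + b_j) with an
# arctangent-type activation; all weights below are placeholder values.
import numpy as np

def sigma(t):
    # rescaled arctangent, a sigmoid with range (-1, 1)
    return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * t)

def network(x, a, b, c):
    # x: (s,) input, a: (n+1, s) connection weights,
    # b: (n+1,) thresholds, c: (n+1,) coefficients
    return np.sum(c * sigma(a @ x + b))

rng = np.random.default_rng(0)
s, n = 3, 10
x = rng.standard_normal(s)
a = rng.standard_normal((n + 1, s))
b = rng.standard_normal(n + 1)
c = rng.standard_normal(n + 1)
print(network(x, a, b, c))
```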

2 Auxiliary Notions

We consider

\[ \arctan x = \int_0^x \frac{dz}{1+z^2}, \quad x \in \mathbb{R}. \tag{1} \]

We will be using

\[ h(x) := \frac{2}{\pi}\arctan\Big(\frac{\pi}{2}x\Big) = \frac{2}{\pi}\int_0^{\frac{\pi x}{2}} \frac{dz}{1+z^2}, \quad x \in \mathbb{R}, \tag{2} \]

which is a sigmoid type function and it is strictly increasing. We have that

\[ h(0) = 0, \quad h(-x) = -h(x), \quad h(+\infty) = 1, \quad h(-\infty) = -1, \]

and

\[ h'(x) = \frac{4}{4+\pi^2 x^2} > 0, \quad \text{for all } x \in \mathbb{R}. \tag{3} \]

We consider the activation function

\[ \psi(x) := \frac{1}{4}\big(h(x+1) - h(x-1)\big), \quad x \in \mathbb{R}, \tag{4} \]

and we notice that

\[ \psi(-x) = \psi(x), \tag{5} \]

that is, $\psi$ is an even function.

Since $x+1 > x-1$, we have $h(x+1) > h(x-1)$, and hence $\psi(x) > 0$ for all $x \in \mathbb{R}$.

We see that

\[ \psi(0) = \frac{1}{\pi}\arctan\frac{\pi}{2} \approx 0.319. \tag{6} \]

Let x>0, we have that

\[ \psi'(x) = \frac{1}{4}\big(h'(x+1) - h'(x-1)\big) = \frac{-4\pi^2 x}{\big(4+\pi^2(x+1)^2\big)\big(4+\pi^2(x-1)^2\big)} < 0. \tag{7} \]

That is

\[ \psi'(x) < 0, \quad \text{for } x > 0. \tag{8} \]

That is, $\psi$ is strictly decreasing on $[0,\infty)$, clearly strictly increasing on $(-\infty,0]$, and $\psi'(0) = 0$.

Observe that

\[ \lim_{x\to+\infty}\psi(x) = \frac{1}{4}\big(h(+\infty)-h(+\infty)\big) = 0, \quad \text{and} \quad \lim_{x\to-\infty}\psi(x) = \frac{1}{4}\big(h(-\infty)-h(-\infty)\big) = 0. \tag{9} \]

That is, the $x$-axis is the horizontal asymptote of $\psi$.

All in all, $\psi$ is a symmetric bell-shaped function with maximum $\psi(0) \approx 0.319$.
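A quick numerical illustration (a direct NumPy transcription of (2) and (4); not part of the paper) of these properties of $\psi$: its value at zero, evenness, and strict decrease on $[0,\infty)$.

```python
import numpy as np

def h(x):
    # h(x) = (2/pi) * arctan((pi/2) x), see (2)
    return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)

def psi(x):
    # psi(x) = (1/4)(h(x+1) - h(x-1)), see (4)
    return 0.25 * (h(x + 1.0) - h(x - 1.0))

print(psi(0.0))                            # ~ 0.3195, cf. (6)
print(np.isclose(psi(2.7), psi(-2.7)))     # evenness, cf. (5)
xs = np.linspace(0.0, 10.0, 1001)
print(np.all(np.diff(psi(xs)) < 0))        # strictly decreasing on [0, inf), cf. (8)
```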

We need

Theorem 1 [ 11 , p. 286 ]

We have that

\[ \sum_{i=-\infty}^{\infty} \psi(x-i) = 1, \quad \forall\, x \in \mathbb{R}. \tag{10} \]

Theorem 2 [ 11 , p. 287 ]

It holds

\[ \int_{-\infty}^{\infty} \psi(x)\,dx = 1. \tag{11} \]

So $\psi(x)$ is a density function on $\mathbb{R}$.
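The partition-of-unity identity (10) and the unit integral (11) can be checked numerically; the sketch below (an illustration only, with truncated series and truncated integration range) uses the same $\psi$ as above.

```python
import numpy as np

def h(x): return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

x = 0.37
i = np.arange(-100000, 100001)     # truncated series; the tails decay like 1/i^2
print(np.sum(psi(x - i)))          # ~ 1.0, cf. (10)

t = np.linspace(-1000.0, 1000.0, 2_000_001)
print(np.trapz(psi(t), t))         # ~ 1.0 up to the truncated tails, cf. (11)
```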

We mention

Theorem 3 [ 11 , p. 288 ]

Let $0 < \alpha < 1$, and $n \in \mathbb{N}$ with $n^{1-\alpha} > 2$. It holds

\[ \sum_{\substack{k=-\infty \\ |nx-k|\ge n^{1-\alpha}}}^{\infty} \psi(nx-k) < \frac{2}{\pi^2\,(n^{1-\alpha}-2)}. \tag{12} \]

Denote by $\lfloor\cdot\rfloor$ the integral part of a number and by $\lceil\cdot\rceil$ its ceiling.

We need

Theorem 4 [ 11 , p. 289 ]

Let $x \in [a,b] \subset \mathbb{R}$ and $n \in \mathbb{N}$ so that $\lceil na\rceil \le \lfloor nb\rfloor$. It holds

\[ \frac{1}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\psi(nx-k)} < \frac{1}{\psi(1)} \approx 4.9737, \quad \forall\, x \in [a,b]. \tag{13} \]

Note 1 [ 11 , pp. 290–291 ]

i) We have that

\[ \lim_{n\to\infty}\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\psi(nx-k) \ne 1, \tag{14} \]

for at least some $x \in [a,b]$.

ii) For large enough $n \in \mathbb{N}$ we always obtain $\lceil na\rceil \le \lfloor nb\rfloor$. Also $a \le \frac{k}{n} \le b$, iff $\lceil na\rceil \le k \le \lfloor nb\rfloor$.

In general, by theorem 1, it holds

\[ \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\psi(nx-k) \le 1. \tag{15} \]

We introduce

\[ Z(x_1,\dots,x_N) := Z(x) := \prod_{i=1}^{N}\psi(x_i), \quad x = (x_1,\dots,x_N) \in \mathbb{R}^N, \ N \in \mathbb{N}. \tag{16} \]

It has the properties:

(i) $Z(x) > 0$, $\forall\, x \in \mathbb{R}^N$,

(ii)

\[ \sum_{k=-\infty}^{\infty} Z(x-k) := \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty}\cdots\sum_{k_N=-\infty}^{\infty} Z(x_1-k_1,\dots,x_N-k_N) = 1, \tag{17} \]

where $k := (k_1,\dots,k_N) \in \mathbb{Z}^N$, $x \in \mathbb{R}^N$,

hence

(iii)

\[ \sum_{k=-\infty}^{\infty} Z(nx-k) = 1, \tag{18} \]

$\forall\, x \in \mathbb{R}^N$; $n \in \mathbb{N}$,

and

(iv)

\[ \int_{\mathbb{R}^N} Z(x)\,dx = 1, \tag{19} \]

that is, $Z$ is a multivariate density function.

Here we denote $\|x\|_\infty := \max\{|x_1|,\dots,|x_N|\}$, $x \in \mathbb{R}^N$; we also set $\infty := (\infty,\dots,\infty)$, $-\infty := (-\infty,\dots,-\infty)$ in the multivariate context, and

\[ \lceil na\rceil := (\lceil na_1\rceil,\dots,\lceil na_N\rceil), \quad \lfloor nb\rfloor := (\lfloor nb_1\rfloor,\dots,\lfloor nb_N\rfloor), \tag{20} \]

where $a := (a_1,\dots,a_N)$, $b := (b_1,\dots,b_N)$.

We obviously see that

\[ \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k) = \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Big(\prod_{i=1}^{N}\psi(nx_i-k_i)\Big) = \sum_{k_1=\lceil na_1\rceil}^{\lfloor nb_1\rfloor}\cdots\sum_{k_N=\lceil na_N\rceil}^{\lfloor nb_N\rfloor}\Big(\prod_{i=1}^{N}\psi(nx_i-k_i)\Big) = \prod_{i=1}^{N}\Big(\sum_{k_i=\lceil na_i\rceil}^{\lfloor nb_i\rfloor}\psi(nx_i-k_i)\Big). \tag{21} \]

For $0 < \beta < 1$ and $n \in \mathbb{N}$, for a fixed $x \in \mathbb{R}^N$, we have that

\[ \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k) = \sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x\|_\infty\le\frac{1}{n^\beta}}}^{\lfloor nb\rfloor} Z(nx-k) + \sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\lfloor nb\rfloor} Z(nx-k). \tag{22} \]

In the last two sums the counting is over disjoint vector sets of $k$'s, because the condition $\big\|\frac{k}{n}-x\big\|_\infty > \frac{1}{n^\beta}$ implies that there exists at least one $\big|\frac{k_r}{n}-x_r\big| > \frac{1}{n^\beta}$, where $r \in \{1,\dots,N\}$.

(v) As in [ 10 , pp. 379–380 ] , we derive that

\[ \sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\lfloor nb\rfloor} Z(nx-k) < \frac{2}{\pi^2\,(n^{1-\beta}-2)}, \quad 0 < \beta < 1, \tag{23} \]

with $n \in \mathbb{N} : n^{1-\beta} > 2$, $x \in \prod_{i=1}^{N}[a_i,b_i]$.

(vi) By theorem 4 we get that

\[ 0 < \frac{1}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)} < \frac{1}{(\psi(1))^N} \approx (4.9737)^N, \tag{24} \]

$\forall\, x \in \prod_{i=1}^{N}[a_i,b_i]$, $n \in \mathbb{N}$.

It is also clear that

(vii)

\[ \sum_{\substack{k=-\infty \\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\infty} Z(nx-k) < \frac{2}{\pi^2\,(n^{1-\beta}-2)}, \tag{25} \]

$0 < \beta < 1$, $n \in \mathbb{N} : n^{1-\beta} > 2$, $x \in \mathbb{R}^N$.

Furthermore it holds

\[ \lim_{n\to\infty}\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k) \ne 1, \tag{26} \]

for at least some $x \in \prod_{i=1}^{N}[a_i,b_i]$.
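A small numerical check (an illustration under the stated notation, with arbitrarily chosen box, $x$ and $n$) of the product structure (21), the fact that the partial sum is at most 1 (cf. (18)), and the lower bound (24).

```python
import numpy as np
from math import ceil, floor

def h(x): return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def S(n, x, a, b):
    # S_n(x) = sum_{k=ceil(na)}^{floor(nb)} Z(nx - k), via the product form (21)
    total = 1.0
    for xi, ai, bi in zip(x, a, b):
        ks = np.arange(ceil(n * ai), floor(n * bi) + 1)
        total *= np.sum(psi(n * xi - ks))
    return total

a, b = np.array([-1.0, 0.0]), np.array([2.0, 3.0])
x, n = np.array([0.3, 1.7]), 50
s = S(n, x, a, b)
print(s, s <= 1.0, 1.0 / s < 4.9737 ** len(a))   # cf. (21), (18), (24)
```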

Here $(X, \|\cdot\|_\gamma)$ is a Banach space.

Let $f \in C\big(\prod_{i=1}^{N}[a_i,b_i], X\big)$, $x = (x_1,\dots,x_N) \in \prod_{i=1}^{N}[a_i,b_i]$, and $n \in \mathbb{N}$ such that $\lceil na_i\rceil \le \lfloor nb_i\rfloor$, $i = 1,\dots,N$.

We introduce and define the following multivariate linear normalized neural network operator ($x := (x_1,\dots,x_N) \in \prod_{i=1}^{N}[a_i,b_i]$):

\[ A_n(f,x_1,\dots,x_N) := A_n(f,x) := \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f\big(\frac{k}{n}\big) Z(nx-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)} = \frac{\sum_{k_1=\lceil na_1\rceil}^{\lfloor nb_1\rfloor}\sum_{k_2=\lceil na_2\rceil}^{\lfloor nb_2\rfloor}\cdots\sum_{k_N=\lceil na_N\rceil}^{\lfloor nb_N\rfloor} f\big(\frac{k_1}{n},\dots,\frac{k_N}{n}\big)\big(\prod_{i=1}^{N}\psi(nx_i-k_i)\big)}{\prod_{i=1}^{N}\big(\sum_{k_i=\lceil na_i\rceil}^{\lfloor nb_i\rfloor}\psi(nx_i-k_i)\big)}. \tag{27} \]

For large enough $n \in \mathbb{N}$ we always obtain $\lceil na_i\rceil \le \lfloor nb_i\rfloor$, $i = 1,\dots,N$. Also $a_i \le \frac{k_i}{n} \le b_i$, iff $\lceil na_i\rceil \le k_i \le \lfloor nb_i\rfloor$, $i = 1,\dots,N$.
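Below is a minimal sketch of $A_n$ for the real-valued case $X = \mathbb{R}$ (a direct transcription of (27); the test function, box, point and values of $n$ are arbitrary illustrative choices).

```python
import numpy as np
from itertools import product
from math import ceil, floor

def h(x): return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))
def Z(v): return np.prod(psi(v))          # multivariate density (16)

def A_n(f, x, a, b, n):
    # normalized operator (27): weighted average of f(k/n) with weights Z(nx - k)
    ranges = [range(ceil(n * ai), floor(n * bi) + 1) for ai, bi in zip(a, b)]
    num = den = 0.0
    for k in product(*ranges):
        k = np.array(k, dtype=float)
        w = Z(n * np.asarray(x) - k)
        num += f(k / n) * w
        den += w
    return num / den

f = lambda t: np.sin(t[0]) + t[1] ** 2
a, b, x = [0.0, 0.0], [1.0, 1.0], [0.4, 0.6]
for n in (10, 50, 200):
    print(n, A_n(f, x, a, b, n), f(np.array(x)))   # A_n(f, x) -> f(x) as n grows
```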

When $g \in C\big(\prod_{i=1}^{N}[a_i,b_i]\big)$ we define the companion operator

\[ \widetilde{A}_n(g,x) := \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} g\big(\frac{k}{n}\big) Z(nx-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)}. \tag{28} \]

Clearly $\widetilde{A}_n$ is a positive linear operator. We have that

\[ \widetilde{A}_n(1,x) = 1, \quad \forall\, x \in \prod_{i=1}^{N}[a_i,b_i]. \]

Notice that $A_n(f) \in C\big(\prod_{i=1}^{N}[a_i,b_i], X\big)$ and $\widetilde{A}_n(g) \in C\big(\prod_{i=1}^{N}[a_i,b_i]\big)$.

Furthermore it holds

\[ \|A_n(f,x)\|_\gamma \le \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\big\|f\big(\frac{k}{n}\big)\big\|_\gamma Z(nx-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)} = \widetilde{A}_n(\|f\|_\gamma, x), \tag{29} \]

$\forall\, x \in \prod_{i=1}^{N}[a_i,b_i]$.

Clearly $\|f\|_\gamma \in C\big(\prod_{i=1}^{N}[a_i,b_i]\big)$.

So, we have that

\[ \|A_n(f,x)\|_\gamma \le \widetilde{A}_n(\|f\|_\gamma, x), \tag{30} \]

$\forall\, x \in \prod_{i=1}^{N}[a_i,b_i]$, $n \in \mathbb{N}$, $f \in C\big(\prod_{i=1}^{N}[a_i,b_i], X\big)$.

Let $c \in X$ and $g \in C\big(\prod_{i=1}^{N}[a_i,b_i]\big)$; then $cg \in C\big(\prod_{i=1}^{N}[a_i,b_i], X\big)$.

Furthermore it holds

\[ A_n(cg,x) = c\,\widetilde{A}_n(g,x), \quad \forall\, x \in \prod_{i=1}^{N}[a_i,b_i]. \tag{31} \]

Since $\widetilde{A}_n(1) = 1$, we get that

\[ A_n(c) = c, \quad \forall\, c \in X. \tag{32} \]

We call $\widetilde{A}_n$ the companion operator of $A_n$.

For convenience we also call

\[ A_n^*(f,x) := \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f\big(\tfrac{k}{n}\big) Z(nx-k) = \sum_{k_1=\lceil na_1\rceil}^{\lfloor nb_1\rfloor}\sum_{k_2=\lceil na_2\rceil}^{\lfloor nb_2\rfloor}\cdots\sum_{k_N=\lceil na_N\rceil}^{\lfloor nb_N\rfloor} f\big(\tfrac{k_1}{n},\dots,\tfrac{k_N}{n}\big)\Big(\prod_{i=1}^{N}\psi(nx_i-k_i)\Big), \tag{33} \]

$\forall\, x \in \prod_{i=1}^{N}[a_i,b_i]$.

That is

\[ A_n(f,x) = \frac{A_n^*(f,x)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)}, \tag{34} \]

$\forall\, x \in \prod_{i=1}^{N}[a_i,b_i]$, $n \in \mathbb{N}$.

Hence

\[ A_n(f,x) - f(x) = \frac{A_n^*(f,x) - f(x)\Big(\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)\Big)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)}. \tag{35} \]

Consequently, by (24), we derive

\[ \|A_n(f,x) - f(x)\|_\gamma \le (4.9737)^N\,\Big\|A_n^*(f,x) - f(x)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k)\Big\|_\gamma, \tag{36} \]

$\forall\, x \in \prod_{i=1}^{N}[a_i,b_i]$.

We will estimate the right hand side of (36).

For the last and others we need

Definition 1 [ 11 , p. 274 ]

Let $M$ be a convex and compact subset of $(\mathbb{R}^N, \|\cdot\|_p)$, $p \in [1,\infty]$, and $(X, \|\cdot\|_\gamma)$ be a Banach space. Let $f \in C(M,X)$. We define the first modulus of continuity of $f$ as

\[ \omega_1(f,\delta) := \sup_{\substack{x,y\in M \\ \|x-y\|_p\le\delta}} \|f(x)-f(y)\|_\gamma, \quad 0 < \delta \le \operatorname{diam}(M). \tag{37} \]

If $\delta > \operatorname{diam}(M)$, then

\[ \omega_1(f,\delta) = \omega_1(f,\operatorname{diam}(M)). \tag{38} \]

Notice $\omega_1(f,\delta)$ is increasing in $\delta > 0$. For $f \in C_B(M,X)$ (continuous and bounded functions) $\omega_1(f,\delta)$ is defined similarly.

Lemma 1 [ 11 , p. 274 ]

We have $\omega_1(f,\delta) \to 0$ as $\delta \downarrow 0$, iff $f \in C(M,X)$, where $M$ is a convex compact subset of $(\mathbb{R}^N,\|\cdot\|_p)$, $p \in [1,\infty]$.

Clearly we also have: $f \in C_U(\mathbb{R}^N,X)$ (uniformly continuous functions), iff $\omega_1(f,\delta) \to 0$ as $\delta \downarrow 0$, where $\omega_1$ is defined similarly to (37). The space $C_B(\mathbb{R}^N,X)$ denotes the continuous and bounded functions on $\mathbb{R}^N$.
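For real-valued $f$ on a box, $\omega_1(f,\delta)$ of (37) can be estimated from below on a finite grid; the following sketch (an illustration only, with the sup-norm $p=\infty$ and a hypothetical test function) does this by brute force over all grid pairs.

```python
import numpy as np

def omega1_grid(f, delta, a, b, m=40):
    # coarse lower estimate of omega_1(f, delta) on [a1,b1] x [a2,b2], p = infinity
    g1 = np.linspace(a[0], b[0], m)
    g2 = np.linspace(a[1], b[1], m)
    P = np.array([(u, v) for u in g1 for v in g2])              # grid points of M
    F = np.array([f(p) for p in P])
    D = np.max(np.abs(P[:, None, :] - P[None, :, :]), axis=2)   # sup-norm distances
    return np.max(np.abs(F[:, None] - F[None, :])[D <= delta])

f = lambda p: np.sin(p[0]) + p[1] ** 2
print(omega1_grid(f, 0.1, [0.0, 0.0], [1.0, 1.0]))
```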

When $f \in C_B(\mathbb{R}^N,X)$ we define

\[ B_n(f,x) := B_n(f,x_1,\dots,x_N) := \sum_{k=-\infty}^{\infty} f\big(\tfrac{k}{n}\big) Z(nx-k) := \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty}\cdots\sum_{k_N=-\infty}^{\infty} f\big(\tfrac{k_1}{n},\tfrac{k_2}{n},\dots,\tfrac{k_N}{n}\big)\Big(\prod_{i=1}^{N}\psi(nx_i-k_i)\Big), \tag{39} \]

$n \in \mathbb{N}$, $\forall\, x \in \mathbb{R}^N$, $N \in \mathbb{N}$, the multivariate quasi-interpolation neural network operator.
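A univariate ($N=1$), real-valued and truncated sketch of $B_n$ follows; the truncation radius $K$ and the test function are illustrative choices (the neglected tail of the series is of order $1/K$ by the decay of $\psi$).

```python
import numpy as np

def h(x): return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def B_n(f, x, n, K=2000):
    # truncated version of (39): sum over |k - nx| <= K only
    k = np.arange(int(n * x) - K, int(n * x) + K + 1)
    return np.sum(f(k / n) * psi(n * x - k))

f = np.cos
for n in (10, 100, 1000):
    print(n, B_n(f, 0.5, n), f(0.5))      # B_n(f, x) -> f(x)
```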

Also for fCB(RN,X) we define the multivariate Kantorovich type neural network operator

\[ C_n(f,x) := C_n(f,x_1,\dots,x_N) := \sum_{k=-\infty}^{\infty}\Big(n^N\int_{\frac{k}{n}}^{\frac{k+1}{n}} f(t)\,dt\Big) Z(nx-k) = \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty}\cdots\sum_{k_N=-\infty}^{\infty}\Big(n^N\int_{\frac{k_1}{n}}^{\frac{k_1+1}{n}}\int_{\frac{k_2}{n}}^{\frac{k_2+1}{n}}\cdots\int_{\frac{k_N}{n}}^{\frac{k_N+1}{n}} f(t_1,\dots,t_N)\,dt_1\cdots dt_N\Big)\Big(\prod_{i=1}^{N}\psi(nx_i-k_i)\Big), \tag{40} \]

$n \in \mathbb{N}$, $\forall\, x \in \mathbb{R}^N$.

Again for $f \in C_B(\mathbb{R}^N,X)$, $N \in \mathbb{N}$, we define the multivariate neural network operator of quadrature type $D_n(f,x)$, $n \in \mathbb{N}$, as follows.

Let $\theta = (\theta_1,\dots,\theta_N) \in \mathbb{N}^N$, $r = (r_1,\dots,r_N) \in \mathbb{Z}_+^N$, $w_r = w_{r_1,r_2,\dots,r_N} \ge 0$, such that $\sum_{r=0}^{\theta} w_r = \sum_{r_1=0}^{\theta_1}\sum_{r_2=0}^{\theta_2}\cdots\sum_{r_N=0}^{\theta_N} w_{r_1,r_2,\dots,r_N} = 1$; $k \in \mathbb{Z}^N$ and

\[ \delta_{nk}(f) := \delta_{n,k_1,k_2,\dots,k_N}(f) := \sum_{r=0}^{\theta} w_r\, f\Big(\frac{k}{n}+\frac{r}{n\theta}\Big) = \sum_{r_1=0}^{\theta_1}\sum_{r_2=0}^{\theta_2}\cdots\sum_{r_N=0}^{\theta_N} w_{r_1,r_2,\dots,r_N}\, f\Big(\frac{k_1}{n}+\frac{r_1}{n\theta_1},\frac{k_2}{n}+\frac{r_2}{n\theta_2},\dots,\frac{k_N}{n}+\frac{r_N}{n\theta_N}\Big), \tag{41} \]

where $\frac{r}{\theta} := \big(\frac{r_1}{\theta_1},\frac{r_2}{\theta_2},\dots,\frac{r_N}{\theta_N}\big)$.

We set

\[ D_n(f,x) := D_n(f,x_1,\dots,x_N) := \sum_{k=-\infty}^{\infty}\delta_{nk}(f) Z(nx-k) = \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty}\cdots\sum_{k_N=-\infty}^{\infty}\delta_{n,k_1,k_2,\dots,k_N}(f)\Big(\prod_{i=1}^{N}\psi(nx_i-k_i)\Big), \tag{42} \]

$\forall\, x \in \mathbb{R}^N$.
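A univariate ($N=1$), real-valued and truncated sketch of $D_n$: here $\delta_{nk}(f)$ averages $f$ at the nodes $\frac{k}{n}+\frac{r}{n\theta}$; the choice $\theta=4$ with equal weights $w_r=\frac{1}{5}$, the truncation radius and the test function are purely illustrative.

```python
import numpy as np

def h(x): return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def delta_nk(f, n, k, theta=4):
    # weighted average (41) with equal weights summing to 1
    r = np.arange(theta + 1)
    w = np.full(theta + 1, 1.0 / (theta + 1))
    return np.sum(w * f(k / n + r / (n * theta)))

def D_n(f, x, n, K=2000, theta=4):
    # truncated version of (42)
    ks = np.arange(int(n * x) - K, int(n * x) + K + 1)
    return sum(delta_nk(f, n, k, theta) * psi(n * x - k) for k in ks)

f = np.cos
for n in (10, 100, 1000):
    print(n, D_n(f, 0.5, n), f(0.5))      # D_n(f, x) -> f(x)
```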

In this article we study the approximation properties of the $A_n$, $B_n$, $C_n$, $D_n$ neural network operators, as well as those of their iterates; that is, the quantitative pointwise and uniform convergence of these operators to the unit operator $I$.

3 Multivariate general Neural Network Approximations

Here we present several vectorial neural network approximations to Banach space valued functions given with rates.

We give

Theorem 5

Let $f \in C\big(\prod_{i=1}^{N}[a_i,b_i], X\big)$, $0 < \beta < 1$, $x \in \prod_{i=1}^{N}[a_i,b_i]$, $N, n \in \mathbb{N}$ with $n^{1-\beta} > 2$. Then

1)

\[ \|A_n(f,x)-f(x)\|_\gamma \le (4.9737)^N\Big[\omega_1\Big(f,\frac{1}{n^\beta}\Big)+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)}\Big] =: \lambda_1(n), \tag{43} \]

and

2)

\[ \big\|\|A_n(f)-f\|_\gamma\big\|_\infty \le \lambda_1(n). \tag{44} \]

We notice that $\lim_{n\to\infty} A_n(f) = f$, pointwise and uniformly.

Above $\omega_1$ is with respect to $p = \infty$.
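The bound of Theorem 5 can be evaluated numerically in the simplest setting; the sketch below (univariate case $N=1$, $f(t)=\sin t$ on $[0,1]$, $\beta=\frac12$, all illustrative choices) uses that this $f$ is Lipschitz with constant 1, so $\omega_1(f,\delta)\le\delta$, and compares the observed error of $A_n$ on a grid with $\lambda_1(n)$.

```python
import numpy as np
from math import ceil, floor

def h(x): return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def A_n(f, x, n, a=0.0, b=1.0):
    # univariate normalized operator (27)
    ks = np.arange(ceil(n * a), floor(n * b) + 1)
    w = psi(n * x - ks)
    return np.sum(f(ks / n) * w) / np.sum(w)

beta, f, fsup = 0.5, np.sin, np.sin(1.0)       # sup of |f| on [0, 1]
for n in (10, 100, 1000, 10000):               # need n^{1-beta} > 2
    lam1 = 4.9737 * (n ** (-beta) + 4.0 * fsup / (np.pi ** 2 * (n ** (1 - beta) - 2.0)))
    err = max(abs(A_n(f, x, n) - f(x)) for x in np.linspace(0.0, 1.0, 101))
    print(n, err, lam1)                         # observed error stays below lambda_1(n)
```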

Proof.
We observe that

\[ \Delta(x) := A_n^*(f,x) - f(x)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx-k) = \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f\big(\tfrac{k}{n}\big) Z(nx-k) - \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f(x) Z(nx-k) = \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Big(f\big(\tfrac{k}{n}\big)-f(x)\Big) Z(nx-k). \]

Thus

\[ \begin{aligned} \|\Delta(x)\|_\gamma &\le \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Big\|f\big(\tfrac{k}{n}\big)-f(x)\Big\|_\gamma Z(nx-k) \\ &= \sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x\|_\infty\le\frac{1}{n^\beta}}}^{\lfloor nb\rfloor}\Big\|f\big(\tfrac{k}{n}\big)-f(x)\Big\|_\gamma Z(nx-k) + \sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\lfloor nb\rfloor}\Big\|f\big(\tfrac{k}{n}\big)-f(x)\Big\|_\gamma Z(nx-k) \\ &\le \omega_1\Big(f,\frac{1}{n^\beta}\Big) + 2\,\big\|\|f\|_\gamma\big\|_\infty\sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\lfloor nb\rfloor} Z(nx-k) \overset{(23)}{\le} \omega_1\Big(f,\frac{1}{n^\beta}\Big) + \frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)}. \end{aligned} \]

So that

\[ \|\Delta(x)\|_\gamma \le \omega_1\Big(f,\frac{1}{n^\beta}\Big) + \frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)}. \tag{50} \]

Now using (36) we finish the proof.

□

We make

Remark 1 [ 11 , pp. 263–266 ]

Let $(\mathbb{R}^N, \|\cdot\|_p)$, $N \in \mathbb{N}$, where $\|\cdot\|_p$ is the $L_p$-norm, $1 \le p \le \infty$. $\mathbb{R}^N$ is a Banach space, and $(\mathbb{R}^N)^j$ denotes the $j$-fold product space $\mathbb{R}^N\times\cdots\times\mathbb{R}^N$ endowed with the max-norm $\|x\|_{(\mathbb{R}^N)^j} := \max_{1\le\lambda\le j}\|x_\lambda\|_p$, where $x := (x_1,\dots,x_j) \in (\mathbb{R}^N)^j$.

Let $(X,\|\cdot\|_\gamma)$ be a general Banach space. Then the space $L_j := L_j\big((\mathbb{R}^N)^j; X\big)$ of all $j$-multilinear continuous maps $g : (\mathbb{R}^N)^j \to X$, $j=1,\dots,m$, is a Banach space with norm

\[ \|g\| := \|g\|_{L_j} := \sup_{\|x\|_{(\mathbb{R}^N)^j}=1}\|g(x)\|_\gamma = \sup\frac{\|g(x)\|_\gamma}{\|x_1\|_p\cdots\|x_j\|_p}. \tag{51} \]

Let M be a non-empty convex and compact subset of Rk and x0M is fixed.

Let $O$ be an open subset of $\mathbb{R}^N$ with $M \subset O$. Let $f : O \to X$ be a continuous function, whose Fréchet derivatives (see [19]) $f^{(j)} : O \to L_j = L_j\big((\mathbb{R}^N)^j; X\big)$ exist and are continuous for $1 \le j \le m$, $m \in \mathbb{N}$.

Call $(x-x_0)^j := (x-x_0,\dots,x-x_0) \in (\mathbb{R}^N)^j$, $x \in M$.

We will work with f|M.

Then, by Taylor's formula [12], [19, p. 124], we get

\[ f(x) = \sum_{j=0}^{m}\frac{1}{j!} f^{(j)}(x_0)(x-x_0)^j + R_m(x,x_0), \quad \text{all } x \in M, \tag{52} \]

where the remainder is the Riemann integral

\[ R_m(x,x_0) := \int_0^1\frac{(1-u)^{m-1}}{(m-1)!}\Big(f^{(m)}\big(x_0+u(x-x_0)\big)-f^{(m)}(x_0)\Big)(x-x_0)^m\,du, \tag{53} \]

here we set f(0)(x0)(xx0)0=f(x0).

We consider

\[ w := \omega_1\big(f^{(m)},h\big) := \sup_{\substack{x,y\in M \\ \|x-y\|_p\le h}}\big\|f^{(m)}(x)-f^{(m)}(y)\big\|, \tag{54} \]

$h > 0$.

We obtain

\[ \Big\|\Big(f^{(m)}\big(x_0+u(x-x_0)\big)-f^{(m)}(x_0)\Big)(x-x_0)^m\Big\|_\gamma \le \big\|f^{(m)}\big(x_0+u(x-x_0)\big)-f^{(m)}(x_0)\big\|\,\|x-x_0\|_p^m \le w\,\|x-x_0\|_p^m\,\Big\lceil\frac{u\|x-x_0\|_p}{h}\Big\rceil, \]

by Lemma 7.1.1, [1, p. 208], where $\lceil\cdot\rceil$ is the ceiling.

Therefore for all $x \in M$ (see [1, pp. 121–122]):

\[ \|R_m(x,x_0)\|_\gamma \le w\,\|x-x_0\|_p^m\int_0^1\Big\lceil\frac{u\|x-x_0\|_p}{h}\Big\rceil\frac{(1-u)^{m-1}}{(m-1)!}\,du = w\,\Phi_m\big(\|x-x_0\|_p\big), \]

by a change of variable, where

\[ \Phi_m(t) := \int_0^{|t|}\Big\lceil\frac{s}{h}\Big\rceil\frac{(|t|-s)^{m-1}}{(m-1)!}\,ds = \frac{1}{m!}\Big(\sum_{j=0}^{\infty}\big(|t|-jh\big)_+^{m}\Big), \quad \forall\, t \in \mathbb{R}, \tag{57} \]

is a (polynomial) spline function, see [1, pp. 210–211].

Also from there we get

\[ \Phi_m(t) \le \Big(\frac{|t|^{m+1}}{(m+1)!\,h}+\frac{|t|^m}{2\,m!}+\frac{h\,|t|^{m-1}}{8\,(m-1)!}\Big), \quad \forall\, t \in \mathbb{R}, \tag{58} \]

with equality true only at $t = 0$.

Therefore it holds

\[ \|R_m(x,x_0)\|_\gamma \le w\Big(\frac{\|x-x_0\|_p^{m+1}}{(m+1)!\,h}+\frac{\|x-x_0\|_p^{m}}{2\,m!}+\frac{h\,\|x-x_0\|_p^{m-1}}{8\,(m-1)!}\Big), \quad \forall\, x \in M. \tag{59} \]

We have found that

\[ \Big\|f(x)-\sum_{j=0}^{m}\frac{1}{j!} f^{(j)}(x_0)(x-x_0)^j\Big\|_\gamma \le \omega_1\big(f^{(m)},h\big)\Big(\frac{\|x-x_0\|_p^{m+1}}{(m+1)!\,h}+\frac{\|x-x_0\|_p^{m}}{2\,m!}+\frac{h\,\|x-x_0\|_p^{m-1}}{8\,(m-1)!}\Big) < \infty, \tag{60} \]

$\forall\, x, x_0 \in M$.

Here $0 < \omega_1\big(f^{(m)},h\big) < \infty$, since $M$ is compact and $f^{(m)}$ is continuous on $M$.

One can rewrite (60) as follows:

\[ \Big\|f(\cdot)-\sum_{j=0}^{m}\frac{f^{(j)}(x_0)(\cdot-x_0)^j}{j!}\Big\|_\gamma \le \omega_1\big(f^{(m)},h\big)\Big(\frac{\|\cdot-x_0\|_p^{m+1}}{(m+1)!\,h}+\frac{\|\cdot-x_0\|_p^{m}}{2\,m!}+\frac{h\,\|\cdot-x_0\|_p^{m-1}}{8\,(m-1)!}\Big), \quad \forall\, x_0 \in M, \tag{61} \]

a pointwise functional inequality on $M$.

Here (x0)j maps M into (RN)j and it is continuous, also f(j)(x0) maps (RN)j into X and it is continuous. Hence their composition f(j)(x0)(x0)j is continuous from M into X.

Clearly f()j=0mf(j)(x0)(x0)jj!C(M,X), hence f()j=0mf(j)(x0)(x0)jj!γC(M).

Let $\{\widetilde{L}_N\}_{N\in\mathbb{N}}$ be a sequence of positive linear operators mapping $C(M)$ into $C(M)$.

Therefore we obtain

\[ \Big(\widetilde{L}_N\Big(\Big\|f(\cdot)-\sum_{j=0}^{m}\frac{f^{(j)}(x_0)(\cdot-x_0)^j}{j!}\Big\|_\gamma\Big)\Big)(x_0) \le \omega_1\big(f^{(m)},h\big)\Big[\frac{\big(\widetilde{L}_N(\|\cdot-x_0\|_p^{m+1})\big)(x_0)}{(m+1)!\,h}+\frac{\big(\widetilde{L}_N(\|\cdot-x_0\|_p^{m})\big)(x_0)}{2\,m!}+\frac{h\,\big(\widetilde{L}_N(\|\cdot-x_0\|_p^{m-1})\big)(x_0)}{8\,(m-1)!}\Big], \tag{62} \]

$\forall\, N \in \mathbb{N}$, $\forall\, x_0 \in M$. □

Clearly (62) is valid when $M = \prod_{i=1}^{N}[a_i,b_i]$ and $\widetilde{L}_n = \widetilde{A}_n$, see (28).

All the above is preparation for the following theorem, where we assume Fréchet differentiability of functions.

This will be a direct application of Theorem 10.2, [11, pp. 268–270]. The operators $A_n$, $\widetilde{A}_n$ fulfill its assumptions, see (27), (28), (30), (31) and (32).

We present the following high order approximation results.

Theorem 6

Let $O$ be an open subset of $(\mathbb{R}^N,\|\cdot\|_p)$, $p \in [1,\infty]$, such that $\prod_{i=1}^{N}[a_i,b_i] \subset O \subseteq \mathbb{R}^N$, and let $(X,\|\cdot\|_\gamma)$ be a general Banach space. Let $m \in \mathbb{N}$ and $f \in C^m(O,X)$, the space of $m$-times continuously Fréchet differentiable functions from $O$ into $X$. We study the approximation of $f|_{\prod_{i=1}^{N}[a_i,b_i]}$. Let $x_0 \in \prod_{i=1}^{N}[a_i,b_i]$ and $r > 0$. Then

1)

\[ \Big\|(A_n(f))(x_0)-\sum_{j=0}^{m}\frac{1}{j!}\big(A_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\Big\|_\gamma \le \frac{1}{r\,m!}\,\omega_1\Big(f^{(m)},r\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\big)^{\frac{1}{m+1}}\Big)\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\big)^{\frac{m}{m+1}}\Big[\frac{1}{(m+1)}+\frac{r}{2}+\frac{mr^2}{8}\Big], \tag{63} \]

2) additionally, if $f^{(j)}(x_0)=0$, $j=1,\dots,m$, we have

\[ \big\|(A_n(f))(x_0)-f(x_0)\big\|_\gamma \le \frac{1}{r\,m!}\,\omega_1\Big(f^{(m)},r\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\big)^{\frac{1}{m+1}}\Big)\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\big)^{\frac{m}{m+1}}\Big[\frac{1}{(m+1)}+\frac{r}{2}+\frac{mr^2}{8}\Big], \tag{64} \]

3)

\[ \big\|(A_n(f))(x_0)-f(x_0)\big\|_\gamma \le \sum_{j=1}^{m}\frac{1}{j!}\Big\|\big(A_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\Big\|_\gamma + \frac{1}{r\,m!}\,\omega_1\Big(f^{(m)},r\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\big)^{\frac{1}{m+1}}\Big)\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\big)^{\frac{m}{m+1}}\Big[\frac{1}{(m+1)}+\frac{r}{2}+\frac{mr^2}{8}\Big], \tag{65} \]

and

4)

\[ \begin{aligned} \big\|\|A_n(f)-f\|_\gamma\big\|_{\infty,\prod_{i=1}^{N}[a_i,b_i]} \le{}& \sum_{j=1}^{m}\frac{1}{j!}\Big\|\big\|\big(A_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\big\|_\gamma\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]} \\ &+\frac{1}{r\,m!}\,\omega_1\Big(f^{(m)},r\,\Big\|\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]}^{\frac{1}{m+1}}\Big)\Big\|\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]}^{\frac{m}{m+1}}\Big[\frac{1}{(m+1)}+\frac{r}{2}+\frac{mr^2}{8}\Big]. \end{aligned} \tag{66} \]

We need

Lemma 2

The function $\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m})\big)(x_0)$ is continuous in $x_0 \in \prod_{i=1}^{N}[a_i,b_i]$, $m \in \mathbb{N}$.

Proof.
By Lemma 10.3, [11, p. 272]. □

We make

Remark 2

By Remark 10.4, [ 11 , p. 273 ] , we get that

\[ \Big\|\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{k})\big)(x_0)\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]} \le \Big\|\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{m+1})\big)(x_0)\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]}^{\frac{k}{m+1}}, \tag{67} \]

for all $k = 1,\dots,m$. □

We give

Corollary 1

(to theorem 6, case of m=1) Then

1)

\[ \big\|(A_n(f))(x_0)-f(x_0)\big\|_\gamma \le \big\|\big(A_n\big(f^{(1)}(x_0)(\cdot-x_0)\big)\big)(x_0)\big\|_\gamma + \frac{1}{2r}\,\omega_1\Big(f^{(1)},r\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{2})\big)(x_0)\big)^{\frac{1}{2}}\Big)\big(\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{2})\big)(x_0)\big)^{\frac{1}{2}}\Big[1+r+\frac{r^2}{4}\Big], \tag{68} \]

and

2)

\[ \begin{aligned} \big\|\|A_n(f)-f\|_\gamma\big\|_{\infty,\prod_{i=1}^{N}[a_i,b_i]} \le{}& \Big\|\big\|\big(A_n\big(f^{(1)}(x_0)(\cdot-x_0)\big)\big)(x_0)\big\|_\gamma\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]} \\ &+\frac{1}{2r}\,\omega_1\Big(f^{(1)},r\,\Big\|\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{2})\big)(x_0)\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]}^{\frac{1}{2}}\Big)\Big\|\big(\widetilde{A}_n(\|\cdot-x_0\|_p^{2})\big)(x_0)\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]}^{\frac{1}{2}}\Big[1+r+\frac{r^2}{4}\Big], \end{aligned} \tag{69} \]

$\forall\, r > 0$.

We make

Remark 3

Let $0<\alpha<1$ and $m,n\in\mathbb{N}$ with $n^{1-\alpha}>2$. We estimate (here $p=\infty$)

\[ \begin{aligned} \big(\widetilde{A}_n(\|\cdot-x_0\|_\infty^{m+1})\big)(x_0) &= \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\big\|\frac{k}{n}-x_0\big\|_\infty^{m+1} Z(nx_0-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx_0-k)} \\ &\overset{(24)}{<} (4.9737)^N\Bigg[\sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x_0\|_\infty\le\frac{1}{n^\alpha}}}^{\lfloor nb\rfloor}\Big\|\frac{k}{n}-x_0\Big\|_\infty^{m+1} Z(nx_0-k) + \sum_{\substack{k=\lceil na\rceil \\ \|\frac{k}{n}-x_0\|_\infty>\frac{1}{n^\alpha}}}^{\lfloor nb\rfloor}\Big\|\frac{k}{n}-x_0\Big\|_\infty^{m+1} Z(nx_0-k)\Bigg] \\ &\overset{(23)}{\le} (4.9737)^N\Big[\frac{1}{n^{\alpha(m+1)}}+\|b-a\|_\infty^{m+1}\,\frac{2}{\pi^2(n^{1-\alpha}-2)}\Big] \end{aligned} \]

(where $b-a = (b_1-a_1,\dots,b_N-a_N)$).

We have proved that ($\forall\, x_0 \in \prod_{i=1}^{N}[a_i,b_i]$)

\[ \big(\widetilde{A}_n(\|\cdot-x_0\|_\infty^{m+1})\big)(x_0) < (4.9737)^N\Big\{\frac{1}{n^{\alpha(m+1)}}+\frac{2\,\|b-a\|_\infty^{m+1}}{\pi^2(n^{1-\alpha}-2)}\Big\} =: \varphi_1(n) \tag{74} \]

($0<\alpha<1$, $m,n\in\mathbb{N}:n^{1-\alpha}>2$).

And, consequently it holds

\[ \Big\|\big(\widetilde{A}_n(\|\cdot-x_0\|_\infty^{m+1})\big)(x_0)\Big\|_{\infty,x_0\in\prod_{i=1}^{N}[a_i,b_i]} < (4.9737)^N\Big\{\frac{1}{n^{\alpha(m+1)}}+\frac{2\,\|b-a\|_\infty^{m+1}}{\pi^2(n^{1-\alpha}-2)}\Big\} = \varphi_1(n) \to 0, \quad \text{as } n\to+\infty. \tag{75} \]

So, we have that $\varphi_1(n)\to 0$, as $n\to+\infty$. Thus, when $p\in[1,\infty]$, from theorem 6 we obtain the convergence to zero of the right hand sides of parts (1) and (2).

Next we estimate $\big\|\big(\widetilde{A}_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\big\|_\gamma$.

We have that

\[ \big(\widetilde{A}_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0) = \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f^{(j)}(x_0)\big(\frac{k}{n}-x_0\big)^j Z(nx_0-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx_0-k)}. \tag{76} \]

When $p=\infty$, for $j=1,\dots,m$, we obtain

\[ \Big\|f^{(j)}(x_0)\Big(\frac{k}{n}-x_0\Big)^j\Big\|_\gamma \le \big\|f^{(j)}(x_0)\big\|\,\Big\|\frac{k}{n}-x_0\Big\|_\infty^j. \tag{77} \]

We further have that

\[ \begin{aligned} \Big\|\big(\widetilde{A}_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\Big\|_\gamma &\le \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\big\|f^{(j)}(x_0)\big(\frac{k}{n}-x_0\big)^j\big\|_\gamma Z(nx_0-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} Z(nx_0-k)} \overset{(77)}{\le} \big\|f^{(j)}(x_0)\big\|\,\big(\widetilde{A}_n(\|\cdot-x_0\|_\infty^{j})\big)(x_0) \\ &< (4.9737)^N\,\big\|f^{(j)}(x_0)\big\|\Big\{\frac{1}{n^{\alpha j}}+\frac{2\,\|b-a\|_\infty^{j}}{\pi^2(n^{1-\alpha}-2)}\Big\} \to 0, \quad \text{as } n\to\infty. \end{aligned} \]

That is,

\[ \Big\|\big(\widetilde{A}_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\Big\|_\gamma \to 0, \quad \text{as } n\to\infty. \]

Therefore, when $p=\infty$, for $j=1,\dots,m$, we have proved:

\[ \Big\|\big(\widetilde{A}_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\Big\|_\gamma < (4.9737)^N\,\big\|f^{(j)}(x_0)\big\|\Big\{\frac{1}{n^{\alpha j}}+\frac{2\,\|b-a\|_\infty^{j}}{\pi^2(n^{1-\alpha}-2)}\Big\} \le (4.9737)^N\,\big\|f^{(j)}\big\|_\infty\Big\{\frac{1}{n^{\alpha j}}+\frac{2\,\|b-a\|_\infty^{j}}{\pi^2(n^{1-\alpha}-2)}\Big\} =: \varphi_{2j}(n) < \infty, \tag{82} \]

which converges to zero, as $n\to\infty$. □

We conclude:

In theorem 6, the right hand sides of (65) and (66) converge to zero as $n\to\infty$, for any $p\in[1,\infty]$.

Also in Corollary 1, the right hand sides of (68) and (69) converge to zero as $n\to\infty$, for any $p\in[1,\infty]$.

Conclusion 1

We have proved that the left hand sides of (63), (64), (65), (66) and (68), (69) converge to zero as $n\to\infty$, for $p\in[1,\infty]$. Consequently $A_n \to I$ (the unit operator) pointwise and uniformly, as $n\to\infty$, where $p\in[1,\infty]$. In the presence of initial conditions we achieve a higher speed of convergence, see (64). A higher speed of convergence happens also for the left hand side of (63).

We give

Corollary 2

(to theorem 6) Let $O$ be an open subset of $(\mathbb{R}^N,\|\cdot\|_\infty)$, such that $\prod_{i=1}^{N}[a_i,b_i] \subset O \subseteq \mathbb{R}^N$, and let $(X,\|\cdot\|_\gamma)$ be a general Banach space. Let $m\in\mathbb{N}$ and $f\in C^m(O,X)$, the space of $m$-times continuously Fréchet differentiable functions from $O$ into $X$. We study the approximation of $f|_{\prod_{i=1}^{N}[a_i,b_i]}$. Let $x_0\in\prod_{i=1}^{N}[a_i,b_i]$ and $r>0$. Here $\varphi_1(n)$ is as in (74) and $\varphi_{2j}(n)$ as in (82), where $n\in\mathbb{N}:n^{1-\alpha}>2$, $0<\alpha<1$, $j=1,\dots,m$. Then

1)

\[ \Big\|(A_n(f))(x_0)-\sum_{j=0}^{m}\frac{1}{j!}\big(A_n\big(f^{(j)}(x_0)(\cdot-x_0)^j\big)\big)(x_0)\Big\|_\gamma \le \frac{1}{r\,m!}\,\omega_1\Big(f^{(m)},r\,(\varphi_1(n))^{\frac{1}{m+1}}\Big)(\varphi_1(n))^{\frac{m}{m+1}}\Big[\frac{1}{(m+1)}+\frac{r}{2}+\frac{mr^2}{8}\Big], \tag{83} \]

2) additionally, if $f^{(j)}(x_0)=0$, $j=1,\dots,m$, we have

\[ \big\|(A_n(f))(x_0)-f(x_0)\big\|_\gamma \le \frac{1}{r\,m!}\,\omega_1\Big(f^{(m)},r\,(\varphi_1(n))^{\frac{1}{m+1}}\Big)(\varphi_1(n))^{\frac{m}{m+1}}\Big[\frac{1}{(m+1)}+\frac{r}{2}+\frac{mr^2}{8}\Big], \tag{84} \]

3)

\[ \big\|\|A_n(f)-f\|_\gamma\big\|_{\infty,\prod_{i=1}^{N}[a_i,b_i]} \le \sum_{j=1}^{m}\frac{\varphi_{2j}(n)}{j!} + \frac{1}{r\,m!}\,\omega_1\Big(f^{(m)},r\,(\varphi_1(n))^{\frac{1}{m+1}}\Big)(\varphi_1(n))^{\frac{m}{m+1}}\Big[\frac{1}{(m+1)}+\frac{r}{2}+\frac{mr^2}{8}\Big] =: \varphi_3(n) \to 0, \quad \text{as } n\to\infty. \tag{85} \]

We continue with

Theorem 7

Let $f\in C_B(\mathbb{R}^N,X)$, $0<\beta<1$, $x\in\mathbb{R}^N$, $N,n\in\mathbb{N}$ with $n^{1-\beta}>2$, and let $\omega_1$ be taken for $p=\infty$. Then

1)

\[ \|B_n(f,x)-f(x)\|_\gamma \le \omega_1\Big(f,\frac{1}{n^\beta}\Big)+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)} =: \lambda_2(n), \tag{86} \]

2)

\[ \big\|\|B_n(f)-f\|_\gamma\big\|_\infty \le \lambda_2(n). \tag{87} \]

Given that $f\in\big(C_U(\mathbb{R}^N,X)\cap C_B(\mathbb{R}^N,X)\big)$, we obtain $\lim_{n\to\infty}B_n(f)=f$, uniformly.

Proof.
We have that

\[ B_n(f,x)-f(x) \overset{(18)}{=} \sum_{k=-\infty}^{\infty}f\big(\tfrac{k}{n}\big)Z(nx-k)-f(x)\sum_{k=-\infty}^{\infty}Z(nx-k) = \sum_{k=-\infty}^{\infty}\Big(f\big(\tfrac{k}{n}\big)-f(x)\Big)Z(nx-k). \]

Hence

\[ \begin{aligned} \|B_n(f,x)-f(x)\|_\gamma &\le \sum_{k=-\infty}^{\infty}\Big\|f\big(\tfrac{k}{n}\big)-f(x)\Big\|_\gamma Z(nx-k) \\ &= \sum_{\substack{k=-\infty\\ \|\frac{k}{n}-x\|_\infty\le\frac{1}{n^\beta}}}^{\infty}\Big\|f\big(\tfrac{k}{n}\big)-f(x)\Big\|_\gamma Z(nx-k) + \sum_{\substack{k=-\infty\\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\infty}\Big\|f\big(\tfrac{k}{n}\big)-f(x)\Big\|_\gamma Z(nx-k) \\ &\overset{(25)}{\le} \omega_1\Big(f,\frac{1}{n^\beta}\Big)+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)}, \end{aligned} \]

proving the claim. □

We give

Theorem 8

Let $f\in C_B(\mathbb{R}^N,X)$, $0<\beta<1$, $x\in\mathbb{R}^N$, $N,n\in\mathbb{N}$ with $n^{1-\beta}>2$, and let $\omega_1$ be taken for $p=\infty$. Then

1)

\[ \|C_n(f,x)-f(x)\|_\gamma \le \omega_1\Big(f,\frac{1}{n}+\frac{1}{n^\beta}\Big)+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)} =: \lambda_3(n), \tag{93} \]

2)

\[ \big\|\|C_n(f)-f\|_\gamma\big\|_\infty \le \lambda_3(n). \tag{94} \]

Given that $f\in\big(C_U(\mathbb{R}^N,X)\cap C_B(\mathbb{R}^N,X)\big)$, we obtain $\lim_{n\to\infty}C_n(f)=f$, uniformly.

Proof.
We notice that

\[ \int_{\frac{k}{n}}^{\frac{k+1}{n}}f(t)\,dt = \int_{\frac{k_1}{n}}^{\frac{k_1+1}{n}}\int_{\frac{k_2}{n}}^{\frac{k_2+1}{n}}\cdots\int_{\frac{k_N}{n}}^{\frac{k_N+1}{n}}f(t_1,t_2,\dots,t_N)\,dt_1\,dt_2\cdots dt_N = \int_{0}^{\frac{1}{n}}\cdots\int_{0}^{\frac{1}{n}}f\Big(t_1+\frac{k_1}{n},\dots,t_N+\frac{k_N}{n}\Big)\,dt_1\cdots dt_N = \int_{0}^{\frac{1}{n}}f\Big(t+\frac{k}{n}\Big)\,dt. \tag{95} \]

Thus it holds (by (40))

\[ C_n(f,x) = \sum_{k=-\infty}^{\infty}\Big(n^N\int_{0}^{\frac{1}{n}}f\Big(t+\frac{k}{n}\Big)\,dt\Big)Z(nx-k). \tag{96} \]

We observe that

\[ \begin{aligned} \|C_n(f,x)-f(x)\|_\gamma &\overset{(18)}{=}\Big\|\sum_{k=-\infty}^{\infty}\Big(n^N\int_{0}^{\frac{1}{n}}f\Big(t+\frac{k}{n}\Big)\,dt\Big)Z(nx-k)-f(x)\sum_{k=-\infty}^{\infty}Z(nx-k)\Big\|_\gamma \\ &\le \sum_{k=-\infty}^{\infty}\Big(n^N\int_{0}^{\frac{1}{n}}\Big\|f\Big(t+\frac{k}{n}\Big)-f(x)\Big\|_\gamma\,dt\Big)Z(nx-k) \\ &= \sum_{\substack{k=-\infty\\ \|\frac{k}{n}-x\|_\infty\le\frac{1}{n^\beta}}}^{\infty}\Big(n^N\int_{0}^{\frac{1}{n}}\Big\|f\Big(t+\frac{k}{n}\Big)-f(x)\Big\|_\gamma\,dt\Big)Z(nx-k) + \sum_{\substack{k=-\infty\\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\infty}\Big(n^N\int_{0}^{\frac{1}{n}}\Big\|f\Big(t+\frac{k}{n}\Big)-f(x)\Big\|_\gamma\,dt\Big)Z(nx-k) \\ &\overset{(25)}{\le} \omega_1\Big(f,\frac{1}{n}+\frac{1}{n^\beta}\Big)+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)}, \end{aligned} \]

since in the first sum $\big\|t+\frac{k}{n}-x\big\|_\infty\le\|t\|_\infty+\big\|\frac{k}{n}-x\big\|_\infty\le\frac{1}{n}+\frac{1}{n^\beta}$, proving the claim. □

We also present

Theorem 9

Let $f\in C_B(\mathbb{R}^N,X)$, $0<\beta<1$, $x\in\mathbb{R}^N$, $N,n\in\mathbb{N}$ with $n^{1-\beta}>2$, and let $\omega_1$ be taken for $p=\infty$. Then

1)

\[ \|D_n(f,x)-f(x)\|_\gamma \le \omega_1\Big(f,\frac{1}{n}+\frac{1}{n^\beta}\Big)+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)} =: \lambda_4(n), \tag{103} \]

2)

\[ \big\|\|D_n(f)-f\|_\gamma\big\|_\infty \le \lambda_4(n). \tag{104} \]

Given that $f\in\big(C_U(\mathbb{R}^N,X)\cap C_B(\mathbb{R}^N,X)\big)$, we obtain $\lim_{n\to\infty}D_n(f)=f$, uniformly.

Proof.
We have that (by (42))

\[ \begin{aligned} \|D_n(f,x)-f(x)\|_\gamma &\overset{(18)}{=}\Big\|\sum_{k=-\infty}^{\infty}\delta_{nk}(f)Z(nx-k)-f(x)\sum_{k=-\infty}^{\infty}Z(nx-k)\Big\|_\gamma \\ &\le \sum_{k=-\infty}^{\infty}\Big(\sum_{r=0}^{\theta}w_r\Big\|f\Big(\frac{k}{n}+\frac{r}{n\theta}\Big)-f(x)\Big\|_\gamma\Big)Z(nx-k) \\ &= \sum_{\substack{k=-\infty\\ \|\frac{k}{n}-x\|_\infty\le\frac{1}{n^\beta}}}^{\infty}\Big(\sum_{r=0}^{\theta}w_r\Big\|f\Big(\frac{k}{n}+\frac{r}{n\theta}\Big)-f(x)\Big\|_\gamma\Big)Z(nx-k) + \sum_{\substack{k=-\infty\\ \|\frac{k}{n}-x\|_\infty>\frac{1}{n^\beta}}}^{\infty}\Big(\sum_{r=0}^{\theta}w_r\Big\|f\Big(\frac{k}{n}+\frac{r}{n\theta}\Big)-f(x)\Big\|_\gamma\Big)Z(nx-k) \\ &\overset{(25)}{\le} \omega_1\Big(f,\frac{1}{n}+\frac{1}{n^\beta}\Big)+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)}, \end{aligned} \]

since $\big\|\frac{k}{n}+\frac{r}{n\theta}-x\big\|_\infty\le\big\|\frac{r}{n\theta}\big\|_\infty+\big\|\frac{k}{n}-x\big\|_\infty\le\frac{1}{n}+\frac{1}{n^\beta}$ in the first sum, proving the claim. □

We make

Definition 2

Let $f\in C_B(\mathbb{R}^N,X)$, $N\in\mathbb{N}$, where $(X,\|\cdot\|_\gamma)$ is a Banach space. We define the general neural network operator

\[ F_n(f,x) := \sum_{k=-\infty}^{\infty} l_{nk}(f)\,Z(nx-k) = \begin{cases} B_n(f,x), & \text{if } l_{nk}(f) = f\big(\frac{k}{n}\big), \\[2pt] C_n(f,x), & \text{if } l_{nk}(f) = n^N\int_{\frac{k}{n}}^{\frac{k+1}{n}} f(t)\,dt, \\[2pt] D_n(f,x), & \text{if } l_{nk}(f) = \delta_{nk}(f). \end{cases} \tag{111} \]

Clearly $l_{nk}(f)$ is an $X$-valued bounded linear functional such that $\|l_{nk}(f)\|_\gamma \le \big\|\|f\|_\gamma\big\|_\infty$.

Hence $F_n(f)$ is a bounded linear operator with $\big\|\|F_n(f)\|_\gamma\big\|_\infty \le \big\|\|f\|_\gamma\big\|_\infty$.

We need

Theorem 10

Let $f\in C_B(\mathbb{R}^N,X)$, $N\ge1$. Then $F_n(f)\in C_B(\mathbb{R}^N,X)$.

Proof.
Clearly $F_n(f)$ is a bounded function.

Next we prove the continuity of $F_n(f)$. Notice that for $N=1$, $Z=\psi$ by (16).

We will use the generalized Weierstrass M test: if a sequence of positive constants $M_1,M_2,M_3,\dots$ can be found such that in some interval

(a) $\|u_n(x)\|_\gamma\le M_n$, $n=1,2,3,\dots$,

(b) $\sum M_n$ converges,

then $\sum u_n(x)$ is uniformly and absolutely convergent in the interval.

Also we will use:

If $\{u_n(x)\}$, $n=1,2,3,\dots$, are continuous in $[a,b]$ and if $\sum u_n(x)$ converges uniformly to the sum $S(x)$ in $[a,b]$, then $S(x)$ is continuous in $[a,b]$; i.e., a uniformly convergent series of continuous functions has a continuous sum. First we prove the claim for $N=1$.

We will prove that $\sum_{k=-\infty}^{\infty} l_{nk}(f)\,\psi(nx-k)$ is continuous in $x\in\mathbb{R}$.

There always exists $\lambda\in\mathbb{N}$ such that $nx\in[-\lambda,\lambda]$.

Since $nx\le\lambda$, then $-nx\ge-\lambda$ and $k-nx\ge k-\lambda\ge0$, when $k\ge\lambda$. Therefore

\[ \sum_{k=\lambda}^{\infty}\psi(nx-k) = \sum_{k=\lambda}^{\infty}\psi(k-nx) \le \sum_{k=\lambda}^{\infty}\psi(k-\lambda) = \sum_{k'=0}^{\infty}\psi(k') \le 1. \tag{112} \]

So for $k\ge\lambda$ we get

\[ \|l_{nk}(f)\|_\gamma\,\psi(nx-k) \le \big\|\|f\|_\gamma\big\|_\infty\,\psi(k-\lambda), \]

and

\[ \big\|\|f\|_\gamma\big\|_\infty\sum_{k=\lambda}^{\infty}\psi(k-\lambda) \le \big\|\|f\|_\gamma\big\|_\infty. \]

Hence, by the generalized Weierstrass M test, we obtain that $\sum_{k=\lambda}^{\infty} l_{nk}(f)\,\psi(nx-k)$ is uniformly and absolutely convergent on $\big[-\frac{\lambda}{n},\frac{\lambda}{n}\big]$.

Since $l_{nk}(f)\,\psi(nx-k)$ is continuous in $x$, then $\sum_{k=\lambda}^{\infty} l_{nk}(f)\,\psi(nx-k)$ is continuous on $\big[-\frac{\lambda}{n},\frac{\lambda}{n}\big]$.

Because $nx\ge-\lambda$, then $-nx\le\lambda$, and $k-nx\le k+\lambda\le0$, when $k\le-\lambda$. Therefore

\[ \sum_{k=-\infty}^{-\lambda}\psi(nx-k) = \sum_{k=-\infty}^{-\lambda}\psi(k-nx) \le \sum_{k=-\infty}^{-\lambda}\psi(k+\lambda) = \sum_{k'=0}^{\infty}\psi(-k') \le 1. \]

So for $k\le-\lambda$ we get

\[ \|l_{nk}(f)\|_\gamma\,\psi(nx-k) \le \big\|\|f\|_\gamma\big\|_\infty\,\psi(k+\lambda), \tag{113} \]

and

\[ \big\|\|f\|_\gamma\big\|_\infty\sum_{k=-\infty}^{-\lambda}\psi(k+\lambda) \le \big\|\|f\|_\gamma\big\|_\infty. \]

Hence, by the Weierstrass M test, we obtain that $\sum_{k=-\infty}^{-\lambda} l_{nk}(f)\,\psi(nx-k)$ is uniformly and absolutely convergent on $\big[-\frac{\lambda}{n},\frac{\lambda}{n}\big]$.

Since $l_{nk}(f)\,\psi(nx-k)$ is continuous in $x$, then $\sum_{k=-\infty}^{-\lambda} l_{nk}(f)\,\psi(nx-k)$ is continuous on $\big[-\frac{\lambda}{n},\frac{\lambda}{n}\big]$.

So we have proved that $\sum_{k=\lambda}^{\infty} l_{nk}(f)\,\psi(nx-k)$ and $\sum_{k=-\infty}^{-\lambda} l_{nk}(f)\,\psi(nx-k)$ are continuous on $\mathbb{R}$. Since $\sum_{k=-\lambda+1}^{\lambda-1} l_{nk}(f)\,\psi(nx-k)$ is a finite sum of continuous functions on $\mathbb{R}$, it is also a continuous function on $\mathbb{R}$.

Writing

\[ \sum_{k=-\infty}^{\infty} l_{nk}(f)\,\psi(nx-k) = \sum_{k=-\infty}^{-\lambda} l_{nk}(f)\,\psi(nx-k) + \sum_{k=-\lambda+1}^{\lambda-1} l_{nk}(f)\,\psi(nx-k) + \sum_{k=\lambda}^{\infty} l_{nk}(f)\,\psi(nx-k), \]

we have it as a continuous function on $\mathbb{R}$. Therefore $F_n(f)$, when $N=1$, is a continuous function on $\mathbb{R}$.

When $N=2$ we have

\[ F_n(f,x_1,x_2) = \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty} l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2) = \sum_{k_1=-\infty}^{\infty}\psi(nx_1-k_1)\Big(\sum_{k_2=-\infty}^{\infty} l_{nk}(f)\,\psi(nx_2-k_2)\Big) \]

(there always exist $\lambda_1,\lambda_2\in\mathbb{N}$ such that $nx_1\in[-\lambda_1,\lambda_1]$ and $nx_2\in[-\lambda_2,\lambda_2]$)

\[ \begin{aligned} &= \sum_{k_1=-\infty}^{\infty}\psi(nx_1-k_1)\Big[\sum_{k_2=-\infty}^{-\lambda_2} l_{nk}(f)\,\psi(nx_2-k_2) + \sum_{k_2=-\lambda_2+1}^{\lambda_2-1} l_{nk}(f)\,\psi(nx_2-k_2) + \sum_{k_2=\lambda_2}^{\infty} l_{nk}(f)\,\psi(nx_2-k_2)\Big] \\ &= \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{-\lambda_2} l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2) + \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\lambda_2+1}^{\lambda_2-1} l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2) + \sum_{k_1=-\infty}^{\infty}\sum_{k_2=\lambda_2}^{\infty} l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2) =: (*). \end{aligned} \]

(For convenience call $F(k_1,k_2,x_1,x_2) := l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2)$.)

Thus, splitting each sum over $k_1$ into the ranges $(-\infty,-\lambda_1]$, $[-\lambda_1+1,\lambda_1-1]$ and $[\lambda_1,\infty)$,

\[ \begin{aligned} (*) ={}& \sum_{k_1=-\infty}^{-\lambda_1}\sum_{k_2=-\infty}^{-\lambda_2} F + \sum_{k_1=-\lambda_1+1}^{\lambda_1-1}\sum_{k_2=-\infty}^{-\lambda_2} F + \sum_{k_1=\lambda_1}^{\infty}\sum_{k_2=-\infty}^{-\lambda_2} F \\ &+ \sum_{k_1=-\infty}^{-\lambda_1}\sum_{k_2=-\lambda_2+1}^{\lambda_2-1} F + \sum_{k_1=-\lambda_1+1}^{\lambda_1-1}\sum_{k_2=-\lambda_2+1}^{\lambda_2-1} F + \sum_{k_1=\lambda_1}^{\infty}\sum_{k_2=-\lambda_2+1}^{\lambda_2-1} F \\ &+ \sum_{k_1=-\infty}^{-\lambda_1}\sum_{k_2=\lambda_2}^{\infty} F + \sum_{k_1=-\lambda_1+1}^{\lambda_1-1}\sum_{k_2=\lambda_2}^{\infty} F + \sum_{k_1=\lambda_1}^{\infty}\sum_{k_2=\lambda_2}^{\infty} F, \end{aligned} \]

where $F = F(k_1,k_2,x_1,x_2)$.

Notice that the finite sum of continuous functions $\sum_{k_1=-\lambda_1+1}^{\lambda_1-1}\sum_{k_2=-\lambda_2+1}^{\lambda_2-1} F(k_1,k_2,x_1,x_2)$ is a continuous function.

The rest of the summands of Fn(f,x1,x2) are treated all the same way and similarly to the case of N=1. The method is demonstrated as follows.

We will prove that $\sum_{k_1=\lambda_1}^{\infty}\sum_{k_2=-\infty}^{-\lambda_2} l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2)$ is continuous in $(x_1,x_2)\in\mathbb{R}^2$.

The continuous terms satisfy

\[ \|l_{nk}(f)\|_\gamma\,\psi(nx_1-k_1)\,\psi(nx_2-k_2) \le \big\|\|f\|_\gamma\big\|_\infty\,\psi(k_1-\lambda_1)\,\psi(k_2+\lambda_2), \]

and

\[ \big\|\|f\|_\gamma\big\|_\infty\sum_{k_1=\lambda_1}^{\infty}\sum_{k_2=-\infty}^{-\lambda_2}\psi(k_1-\lambda_1)\,\psi(k_2+\lambda_2) = \big\|\|f\|_\gamma\big\|_\infty\Big(\sum_{k_1=0}^{\infty}\psi(k_1)\Big)\Big(\sum_{k_2=0}^{\infty}\psi(-k_2)\Big) \le \big\|\|f\|_\gamma\big\|_\infty. \]

So by the Weierstrass M test we get that $\sum_{k_1=\lambda_1}^{\infty}\sum_{k_2=-\infty}^{-\lambda_2} l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2)$ is uniformly and absolutely convergent. Therefore it is continuous on $\mathbb{R}^2$.

Next we prove continuity on $\mathbb{R}^2$ of $\sum_{k_1=-\lambda_1+1}^{\lambda_1-1}\sum_{k_2=-\infty}^{-\lambda_2} l_{nk}(f)\,\psi(nx_1-k_1)\,\psi(nx_2-k_2)$.

Notice here that

\[ \|l_{nk}(f)\|_\gamma\,\psi(nx_1-k_1)\,\psi(nx_2-k_2) \le \big\|\|f\|_\gamma\big\|_\infty\,\psi(nx_1-k_1)\,\psi(k_2+\lambda_2) \le \big\|\|f\|_\gamma\big\|_\infty\,\psi(0)\,\psi(k_2+\lambda_2) \approx 0.319\,\big\|\|f\|_\gamma\big\|_\infty\,\psi(k_2+\lambda_2), \]

and

\[ 0.319\,\big\|\|f\|_\gamma\big\|_\infty\Big(\sum_{k_1=-\lambda_1+1}^{\lambda_1-1}1\Big)\Big(\sum_{k_2=-\infty}^{-\lambda_2}\psi(k_2+\lambda_2)\Big) = 0.319\,\big\|\|f\|_\gamma\big\|_\infty\,(2\lambda_1-1)\Big(\sum_{k_2=0}^{\infty}\psi(-k_2)\Big) \le 0.319\,(2\lambda_1-1)\,\big\|\|f\|_\gamma\big\|_\infty. \]

So the double series under consideration is uniformly convergent and its sum is continuous. Clearly, then, $F_n(f,x_1,x_2)$ is proved to be continuous on $\mathbb{R}^2$.

Reasoning similarly, one can now prove, with more tedious work, that $F_n(f,x_1,\dots,x_N)$ is continuous on $\mathbb{R}^N$, for any $N\ge1$. We choose to omit this similar extra work.

□

Remark 4

By (27) it is obvious that $\big\|\|A_n(f)\|_\gamma\big\|_\infty \le \big\|\|f\|_\gamma\big\|_\infty < \infty$, and $A_n(f)\in C\big(\prod_{i=1}^{N}[a_i,b_i],X\big)$, given that $f\in C\big(\prod_{i=1}^{N}[a_i,b_i],X\big)$.

Call $L_n$ any of the operators $A_n$, $B_n$, $C_n$, $D_n$.

Clearly then

\[ \big\|\|L_n^2(f)\|_\gamma\big\|_\infty = \big\|\|L_n(L_n(f))\|_\gamma\big\|_\infty \le \big\|\|L_n(f)\|_\gamma\big\|_\infty \le \big\|\|f\|_\gamma\big\|_\infty, \tag{117} \]

etc.

Therefore we get

\[ \big\|\|L_n^k(f)\|_\gamma\big\|_\infty \le \big\|\|f\|_\gamma\big\|_\infty, \quad \forall\, k\in\mathbb{N}, \tag{118} \]

the contraction property.

Also we see that

\[ \big\|\|L_n^k(f)\|_\gamma\big\|_\infty \le \big\|\|L_n^{k-1}(f)\|_\gamma\big\|_\infty \le \dots \le \big\|\|L_n(f)\|_\gamma\big\|_\infty \le \big\|\|f\|_\gamma\big\|_\infty. \tag{119} \]

Here $L_n^k$ are bounded linear operators. □

Notation 1

Here $N\in\mathbb{N}$, $0<\beta<1$. Denote

\[ c_N := \begin{cases}(4.9737)^N, & \text{if } L_n=A_n,\\ 1, & \text{if } L_n=B_n,C_n,D_n,\end{cases} \tag{120} \]

\[ \varphi(n) := \begin{cases}\frac{1}{n^\beta}, & \text{if } L_n=A_n,B_n,\\ \frac{1}{n}+\frac{1}{n^\beta}, & \text{if } L_n=C_n,D_n,\end{cases} \tag{121} \]

\[ \Omega := \begin{cases}C\big(\prod_{i=1}^{N}[a_i,b_i],X\big), & \text{if } L_n=A_n,\\ C_B(\mathbb{R}^N,X), & \text{if } L_n=B_n,C_n,D_n,\end{cases} \tag{122} \]

and

\[ Y := \begin{cases}\prod_{i=1}^{N}[a_i,b_i], & \text{if } L_n=A_n,\\ \mathbb{R}^N, & \text{if } L_n=B_n,C_n,D_n.\end{cases} \tag{123} \]

We give the condensed

Theorem 11

Let $f\in\Omega$, $0<\beta<1$, $x\in Y$; $n,N\in\mathbb{N}$ with $n^{1-\beta}>2$. Then

(i)

\[ \|L_n(f,x)-f(x)\|_\gamma \le c_N\Big[\omega_1(f,\varphi(n))+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(n^{1-\beta}-2)}\Big] =: \tau(n), \tag{124} \]

where $\omega_1$ is for $p=\infty$,

and

(ii)

\[ \big\|\|L_n(f)-f\|_\gamma\big\|_\infty \le \tau(n) \to 0, \quad \text{as } n\to\infty. \tag{125} \]

For f uniformly continuous and in Ω we obtain

\[ \lim_{n\to\infty} L_n(f) = f, \]

pointwise and uniformly.

Proof.
By theorems 5, 7, 8 and 9. □

Next we do iterated neural network approximation (see also [ 9 ] ).

We make

Remark 5

Let $r\in\mathbb{N}$ and $L_n$ be as above. We observe that

\[ L_n^r f - f = \big(L_n^r f - L_n^{r-1} f\big) + \big(L_n^{r-1} f - L_n^{r-2} f\big) + \big(L_n^{r-2} f - L_n^{r-3} f\big) + \dots + \big(L_n^2 f - L_n f\big) + \big(L_n f - f\big). \tag{126} \]

Then

\[ \begin{aligned} \big\|\|L_n^r f - f\|_\gamma\big\|_\infty &\le \big\|\|L_n^r f - L_n^{r-1} f\|_\gamma\big\|_\infty + \big\|\|L_n^{r-1} f - L_n^{r-2} f\|_\gamma\big\|_\infty + \big\|\|L_n^{r-2} f - L_n^{r-3} f\|_\gamma\big\|_\infty + \dots + \big\|\|L_n^2 f - L_n f\|_\gamma\big\|_\infty + \big\|\|L_n f - f\|_\gamma\big\|_\infty \\ &= \big\|\|L_n^{r-1}(L_n f - f)\|_\gamma\big\|_\infty + \big\|\|L_n^{r-2}(L_n f - f)\|_\gamma\big\|_\infty + \big\|\|L_n^{r-3}(L_n f - f)\|_\gamma\big\|_\infty + \dots + \big\|\|L_n(L_n f - f)\|_\gamma\big\|_\infty + \big\|\|L_n f - f\|_\gamma\big\|_\infty \\ &\le r\,\big\|\|L_n f - f\|_\gamma\big\|_\infty. \end{aligned} \]

That is

\[ \big\|\|L_n^r f - f\|_\gamma\big\|_\infty \le r\,\big\|\|L_n f - f\|_\gamma\big\|_\infty. \tag{127} \]

We give

Theorem 12

All here is as in theorem 11, $r\in\mathbb{N}$, and $\tau(n)$ is as in (124). Then

\[ \big\|\|L_n^r f - f\|_\gamma\big\|_\infty \le r\,\tau(n). \tag{128} \]

So that the speed of convergence to the unit operator of Lnr is not worse than of Ln.

Proof.
By (127) and (125). □
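An illustrative numerical check of Theorem 12 in the simplest setting (univariate, real-valued $A_n$ on $[0,1]$; the test function, $n$, $r$ and the evaluation grid are arbitrary choices). Since $A_n(g)$ depends only on the values $g(k/n)$, the iterate $A_n^r f$ can be computed exactly by resampling at the nodes.

```python
import numpy as np
from math import ceil, floor

def h(x): return (2.0 / np.pi) * np.arctan((np.pi / 2.0) * x)
def psi(x): return 0.25 * (h(x + 1.0) - h(x - 1.0))

def A_n(g, n, a=0.0, b=1.0):
    # returns the function A_n(g, .) of (27) in the univariate case
    ks = np.arange(ceil(n * a), floor(n * b) + 1)
    gv = np.array([g(k / n) for k in ks])        # g sampled at the nodes k/n
    def Ag(x):
        w = psi(n * x - ks)
        return np.sum(gv * w) / np.sum(w)
    return Ag

f = lambda t: np.sin(3.0 * t)
n, r = 30, 3
xs = np.linspace(0.0, 1.0, 11)

g = f
for _ in range(r):                               # g <- A_n g, r times
    g = A_n(g, n)
err_r = max(abs(g(x) - f(x)) for x in xs)
err_1 = max(abs(A_n(f, n)(x) - f(x)) for x in xs)
print(err_r, err_1, r * err_1)                   # typically err_r <= r * err_1, cf. (128)
```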

We make

Remark 6

Let $m_1,\dots,m_r\in\mathbb{N}$ with $m_1\le m_2\le\dots\le m_r$, $0<\beta<1$, $f\in\Omega$. Then $\varphi(m_1)\ge\varphi(m_2)\ge\dots\ge\varphi(m_r)$, with $\varphi$ as in (121).

Therefore

\[ \omega_1(f,\varphi(m_1)) \ge \omega_1(f,\varphi(m_2)) \ge \dots \ge \omega_1(f,\varphi(m_r)). \tag{129} \]

Assume further that $m_i^{1-\beta}>2$, $i=1,\dots,r$. Then

\[ \frac{2}{\pi^2(m_1^{1-\beta}-2)} \ge \frac{2}{\pi^2(m_2^{1-\beta}-2)} \ge \dots \ge \frac{2}{\pi^2(m_r^{1-\beta}-2)}. \tag{130} \]

Let $L_{m_i}$ be as above, $i=1,\dots,r$, all of the same kind.

We write

\[ \begin{aligned} L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}(L_{m_1}f)))-f ={}& L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}(L_{m_1}f))) - L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}f)) \\ &+ L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}f)) - L_{m_r}(L_{m_{r-1}}(\cdots L_{m_3}f)) \\ &+ L_{m_r}(L_{m_{r-1}}(\cdots L_{m_3}f)) - L_{m_r}(L_{m_{r-1}}(\cdots L_{m_4}f)) \\ &+ \dots + L_{m_r}(L_{m_{r-1}}f) - L_{m_r}f + L_{m_r}f - f \\ ={}& L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}))(L_{m_1}f-f) + L_{m_r}(L_{m_{r-1}}(\cdots L_{m_3}))(L_{m_2}f-f) \\ &+ L_{m_r}(L_{m_{r-1}}(\cdots L_{m_4}))(L_{m_3}f-f) + \dots + L_{m_r}(L_{m_{r-1}}f-f) + L_{m_r}f-f. \end{aligned} \]

Hence by the triangle inequality property of $\big\|\|\cdot\|_\gamma\big\|_\infty$ we get

\[ \begin{aligned} \big\|\|L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}(L_{m_1}f)))-f\|_\gamma\big\|_\infty \le{}& \big\|\|L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}))(L_{m_1}f-f)\|_\gamma\big\|_\infty + \big\|\|L_{m_r}(L_{m_{r-1}}(\cdots L_{m_3}))(L_{m_2}f-f)\|_\gamma\big\|_\infty \\ &+ \big\|\|L_{m_r}(L_{m_{r-1}}(\cdots L_{m_4}))(L_{m_3}f-f)\|_\gamma\big\|_\infty + \dots + \big\|\|L_{m_r}(L_{m_{r-1}}f-f)\|_\gamma\big\|_\infty + \big\|\|L_{m_r}f-f\|_\gamma\big\|_\infty \end{aligned} \]

(repeatedly applying (117))

\[ \le \big\|\|L_{m_1}f-f\|_\gamma\big\|_\infty + \big\|\|L_{m_2}f-f\|_\gamma\big\|_\infty + \big\|\|L_{m_3}f-f\|_\gamma\big\|_\infty + \dots + \big\|\|L_{m_{r-1}}f-f\|_\gamma\big\|_\infty + \big\|\|L_{m_r}f-f\|_\gamma\big\|_\infty = \sum_{i=1}^{r}\big\|\|L_{m_i}f-f\|_\gamma\big\|_\infty. \]

That is, we proved

\[ \big\|\|L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}(L_{m_1}f)))-f\|_\gamma\big\|_\infty \le \sum_{i=1}^{r}\big\|\|L_{m_i}f-f\|_\gamma\big\|_\infty. \tag{134} \]

We give

Theorem 13

Let $f\in\Omega$; $N$, $m_1,m_2,\dots,m_r\in\mathbb{N}$ with $m_1\le m_2\le\dots\le m_r$, $0<\beta<1$; $m_i^{1-\beta}>2$, $i=1,\dots,r$; $x\in Y$, and let $(L_{m_1},\dots,L_{m_r})$ be as $(A_{m_1},\dots,A_{m_r})$ or $(B_{m_1},\dots,B_{m_r})$ or $(C_{m_1},\dots,C_{m_r})$ or $(D_{m_1},\dots,D_{m_r})$, with $p=\infty$. Then

\[ \begin{aligned} \|L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}(L_{m_1}f)))(x)-f(x)\|_\gamma &\le \big\|\|L_{m_r}(L_{m_{r-1}}(\cdots L_{m_2}(L_{m_1}f)))-f\|_\gamma\big\|_\infty \le \sum_{i=1}^{r}\big\|\|L_{m_i}f-f\|_\gamma\big\|_\infty \\ &\le c_N\sum_{i=1}^{r}\Big[\omega_1(f,\varphi(m_i))+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(m_i^{1-\beta}-2)}\Big] \le r\,c_N\Big[\omega_1(f,\varphi(m_1))+\frac{4\big\|\|f\|_\gamma\big\|_\infty}{\pi^2(m_1^{1-\beta}-2)}\Big]. \end{aligned} \tag{135} \]

Clearly, we notice that the speed of convergence to the unit operator of the multiply iterated operator is not worse than the speed of $L_{m_1}$.

Proof.
Using (134), (129), (130) and (124), (125). □

We continue with

Theorem 14

Let all be as in Corollary 2, and $r\in\mathbb{N}$. Here $\varphi_3(n)$ is as in (85). Then

\[ \big\|\|A_n^r f - f\|_\gamma\big\|_\infty \le r\,\big\|\|A_n f - f\|_\gamma\big\|_\infty \le r\,\varphi_3(n). \tag{136} \]

Proof.
By (127) and (85). □

Application 1

A typical application of all of our results is when $(X,\|\cdot\|_\gamma) = (\mathbb{C},|\cdot|)$, where $\mathbb{C}$ denotes the complex numbers.

Bibliography

[1] G.A. Anastassiou, Moments in Probability and Approximation Theory, Pitman Research Notes in Mathematics, vol. 287, Longman Sci. & Tech., Harlow, U.K., 1993.

[2] G.A. Anastassiou, Rate of convergence of some neural network operators to the unit-univariate case, J. Math. Anal. Appl. 212 (1997), pp. 237–262. https://doi.org/10.1006/jmaa.1997.5494

[3] G.A. Anastassiou, Quantitative Approximations, Chapman & Hall/CRC, Boca Raton, New York, 2001.

[4] G.A. Anastassiou, Intelligent Systems: Approximation by Artificial Neural Networks, Intelligent Systems Reference Library, vol. 19, Springer, Heidelberg, 2011. https://doi.org/10.1007/978-3-642-21431-8

[5] G.A. Anastassiou, Univariate hyperbolic tangent neural network approximation, Mathematical and Computer Modelling 53 (2011), pp. 1111–1132. https://doi.org/10.1016/j.mcm.2010.11.072

[6] G.A. Anastassiou, Multivariate hyperbolic tangent neural network approximation, Computers and Mathematics with Applications 61 (2011), pp. 809–821. https://doi.org/10.1016/j.camwa.2010.12.029

[7] G.A. Anastassiou, Multivariate sigmoidal neural network approximation, Neural Networks 24 (2011), pp. 378–386. https://doi.org/10.1016/j.neunet.2011.01.003

[8] G.A. Anastassiou, Univariate sigmoidal neural network approximation, J. Computational Analysis and Applications 14 (2012) no. 4, pp. 659–690.

[9] G.A. Anastassiou, Approximation by neural networks iterates, in: Advances in Applied Mathematics and Approximation Theory (G. Anastassiou, O. Duman, eds.), Springer Proceedings in Mathematics & Statistics, Springer, New York, 2013, pp. 1–20.

[10] G.A. Anastassiou, Intelligent Systems II: Complete Approximation by Neural Network Operators, Springer, Heidelberg, New York, 2016.

[11] G.A. Anastassiou, Intelligent Computations: Abstract Fractional Calculus, Inequalities, Approximations, Springer, Heidelberg, New York, 2018.

[12] H. Cartan, Differential Calculus, Hermann, Paris, 1971.

[13] Z. Chen, F. Cao, The approximation operators with sigmoidal functions, Computers and Mathematics with Applications 58 (2009), pp. 758–765. https://doi.org/10.1016/j.camwa.2009.05.001

[14] D. Costarelli, R. Spigler, Approximation results for neural network operators activated by sigmoidal functions, Neural Networks 44 (2013), pp. 101–106. https://doi.org/10.1016/j.neunet.2013.03.015

[15] D. Costarelli, R. Spigler, Multivariate neural network operators with sigmoidal activation functions, Neural Networks 48 (2013), pp. 72–77. https://doi.org/10.1016/j.neunet.2013.07.009

[16] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, New York, 1998.

[17] W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics 7 (1943), pp. 115–133. https://doi.org/10.1007/bf02478259

[18] T.M. Mitchell, Machine Learning, WCB-McGraw-Hill, New York, 1997.

[19] L.B. Rall, Computational Solution of Nonlinear Operator Equations, John Wiley & Sons, New York, 1969.