
ATM Transaction Simulation:
Combination of ACDs and Cox Process

Reza Habibi
(Date: June 03, 2024; accepted: February 27, 2025; published online: June 30, 2025.)
Abstract.

Two main approaches for analyzing ultra-high-frequency data, such as ATM (automated teller machine) transactions, are the Cox process and autoregressive conditional duration (ACD) models. This paper combines both models and discusses the advantages of doing so. Functional data analysis provides a useful method for modeling the intensity of the counting process. The results are verified in two simulated cases. A real data set is analyzed, and conclusions are given.

Key words and phrases:
ATM transaction, ACD model, Cox process, functional data analysis, intensity function.
2005 Mathematics Subject Classification:
62M10.
Iran Banking Institute, Central Bank of Iran, Tehran, Iran

1. Introduction

Irregularly spaced financial time series have received considerable attention in the high-frequency data literature; see [6]. High-frequency time series forecasting deals with data recorded at very short intervals, from seconds down to fractions of a second. It is fundamental in various sectors, from meteorology to finance, energy management, and quality control in manufacturing. Intraday transactions of ATM and POS devices [8], trading in stock markets [11], and volatility patterns in high-frequency trading [9] are well-known examples of such time series. The current paper studies ATM transactions. To this end, suppose that $N(t)$ counts the number of intraday transactions of an ATM of a specific bank recorded up to time $0<t<1$. Thus, $N(t)$ is a counting process with cumulative intensity function $\Lambda(t)$.

There are two main, independent approaches for analyzing these types of time series: the Cox process from [11] (referred to as approach a in this paper) and the ACD models from [4] (approach b). The current paper proposes a combined approach (approach c), in which both the Cox and ACD models are assumed to govern the data simultaneously. To the best of the author's knowledge, this model has not been applied before in the literature, and it has several advantages, which are discussed in Section 2. Approaches a and b are first adapted to the problem at hand.

Approach a: Cox process. Following [11], assume that $N(t)$ is modeled as a Cox process, that is, a Poisson process with a random cumulative intensity $\Lambda(t)$, where

$$\Lambda(t)=\int_0^t\lambda(s)\,ds.$$

The Cox process is a type of point process. For a comprehensive review of Cox processes and of point processes in general, see [3].

Often, functional data analysis (FDA) is used to find the functional forms of $\lambda$ and $\Lambda$; it gives an approximation for $\lambda(t)$.

Approach b: ACD models. The ACD model mainly uses the stopping times of $N(t)$ and the properties of the stopped process, without assuming that $N(t)$ is a Poisson process, as the Cox process does. In more detail, let $\tau_k$ be the time of the $k$-th transaction (in a day),

$$\tau_k=\inf\{t : N(t)=k\}.$$

The related duration $L_k$ is defined by

$$L_k=\tau_k-\tau_{k-1}.$$

Let $L_k$ be modeled by the ACD model (with intercept $c$) from [4]; i.e.,

$$L_k=c+\vartheta_k e_k,$$

where the errors $e_k$ are independent, positive random variables with $E(e_k)=1$ and

$$\vartheta_k=\alpha+\sum_{j=1}^{p}\gamma_j L_{k-j}+\sum_{j=1}^{q}\beta_j\vartheta_{k-j},$$

where $p$ and $q$ are the unknown orders of the model, which are selected while solving the case study problem, and $\alpha$, $\gamma_j$ and $\beta_j$ are unknown parameters to be estimated. The authors of [4] pointed out a close relationship between ACD and GARCH models. The R package ACDm estimates these parameters accurately and quickly; see [1].
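The recursion above can be illustrated with a minimal Python sketch of an ACD(1,1) model with intercept, assuming exponential innovations with mean 1. The function name `simulate_acd11`, the start value of the recursion and any parameter values passed to it are illustrative assumptions, not estimates from this paper.

```python
import random


def simulate_acd11(n, c, alpha, gamma, beta, seed=0):
    """Simulate n durations L_k = c + theta_k * e_k, where
    theta_k = alpha + gamma * L_{k-1} + beta * theta_{k-1} and e_k ~ Exp(1)."""
    rng = random.Random(seed)
    # Assumed start: the stationary mean of theta, which solves
    # E[theta] = alpha + gamma * (c + E[theta]) + beta * E[theta]
    # and is valid only when gamma + beta < 1.
    theta = (alpha + gamma * c) / (1.0 - gamma - beta)
    prev_duration = c + theta
    durations = []
    for _ in range(n):
        theta = alpha + gamma * prev_duration + beta * theta
        e = rng.expovariate(1.0)  # positive innovation with mean 1
        prev_duration = c + theta * e
        durations.append(prev_duration)
    return durations
```

With nonnegative `alpha`, `gamma` and `beta`, every simulated duration is at least `c`, and the running sums of the output give simulated transaction times.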

Approach c: Combined model. Here, it is assumed that the Cox and ACD models hold simultaneously. Some advantages of this approach are:

  1. Under this setting, an exact Monte Carlo simulation for the dynamics of $\lambda(t)$ is derived.

  2. Estimating $\lambda(t)$ by FDA is often time-consuming, because the number of terms in the linear combination of orthogonal basis functions has to be chosen; the combined model avoids this step.

  3. The choice of the basis functions and of their number is somewhat subjective, which is critical in applied problems.

  4. With the combined model, $N(t)$ is also simulated directly from the Binomial distribution, which approximates the Poisson distribution; see Section 2.3.

The rest of the paper is organized as follows. In the next section, two methods are proposed to derive the dynamics of $\lambda(t)$. Section 3 gives the results of simulations. A real data set is analyzed in Section 4. Concluding remarks are given in Section 5.

2. Dynamics of 𝝀(𝒕)

Here, the dynamics of $\Lambda(t)$ and $\lambda(t)$ are derived. First, the FDA method is presented, which gives an approximation for $\lambda(t)$. Then, under the combined model setting, $\lambda(t)$ is derived by Monte Carlo simulation, which uses the exact distribution of the partial sums of the $L_k$.

2.1. FDA 𝝀(𝒕)

In practice, FDA is used to model the intensity function $\lambda(t)$ of a Poisson process as a random element of a Hilbert space; see [7]. To this end, considering $n$ days, let $N_i(t)$ be the number of transactions throughout the $i$-th day, with intensity function $\lambda_i(t)$. Following [8], to remove the periodic effects across days, let

$$\delta_i(t)=\lambda_i(t)-\lambda_{i-7}(t),$$

and consider the following functional autoregressive model for $\delta_i(t)$:

$$\delta_i(t)=\int_0^1\rho(s,t)\,\delta_{i-1}(s)\,ds+\varepsilon_i(t),$$

where the kernel $\rho$ is estimated using functional principal components and the error terms

$$\{\varepsilon_i(t);\ t\in[0,1]\}$$

are supposed to be independent functions such that $E\{\varepsilon_i(t)\}=0$ for each $t\in[0,1]$ and

$$E\int_0^1\varepsilon_i^2(t)\,dt=\sigma^2<\infty.$$

For a comprehensive review of functional data analysis, see [10]. The R package fda.usc is a useful tool for studying functional time series data; see [5]. Next, use a basis representation for $\lambda(t)$ such as

$$\lambda(t)=\sum_{k=1}^{m}b_k\phi_k(t)$$

(see [10]), with basis functions $\phi_k(t)$, $k=1,\ldots,m$, for example Fourier basis functions. Therefore,

$$\Lambda(t)=\int_0^t\sum_{k=1}^{m}b_k\phi_k(s)\,ds=\sum_{k=1}^{m}b_k\int_0^t\phi_k(s)\,ds.$$

In the literature, widely used choices for $\phi_k(t)$ are

$$\phi_k(t)=\begin{cases}\sin(0.5(k+1)t)/\pi, & k \text{ an odd integer},\\ \cos(0.5kt)/\pi, & k \text{ an even integer}.\end{cases}$$

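However, in practice, only an estimate $\hat\Lambda(t)$ is available, for which

$$\hat\Lambda(t)=\Lambda(t)+\zeta(t),$$

for some error term $\zeta(t)$. Therefore,

$$\hat\Lambda(t)=\sum_{k=1}^{m}b_k\int_0^t\phi_k(s)\,ds+\zeta(t).$$

Following [2], this is a type of functional regression analysis. To this end, the variables $\hat\Lambda(t_j)$ and $\int_0^{t_j}\phi_k(s)\,ds$ are computed for $t=t_j$, $j=1,\ldots,M$ (with $m<M$), and the coefficients $b_k$ are estimated as the parameters of a multiple regression model.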
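The regression step can be sketched in Python as an ordinary least-squares fit of the observed cumulative intensity on the integrated basis functions. This is a minimal illustration under stated assumptions: the basis is taken as sine and cosine terms divided by $\pi$, the normal equations are solved by plain Gaussian elimination, and the names `basis_integral`, `solve_linear` and `fit_coefficients` are hypothetical helpers rather than functions of any package.

```python
import math


def basis_integral(k, t):
    """Integral over [0, t] of phi_k, assuming phi_k(s) = sin(w s)/pi for odd k
    (w = 0.5 (k + 1)) and phi_k(s) = cos(w s)/pi for even k (w = 0.5 k)."""
    if k % 2 == 1:
        w = 0.5 * (k + 1)
        return (1.0 - math.cos(w * t)) / (w * math.pi)
    w = 0.5 * k
    return math.sin(w * t) / (w * math.pi)


def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_coefficients(times, lambda_hat, m):
    """Least-squares estimate of b_1..b_m from observed cumulative intensities."""
    X = [[basis_integral(k, t) for k in range(1, m + 1)] for t in times]
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    Xty = [sum(row[a] * v for row, v in zip(X, lambda_hat)) for a in range(m)]
    return solve_linear(XtX, Xty)
```

The design matrix has one row per grid point $t_j$ and one column per basis function, so the requirement $m<M$ simply keeps the regression overdetermined.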

2.2. 𝝀(𝒕) in combined model

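Here, the exact functional form of $\lambda(t)$ is derived. To this end, let $L_0=\tau_0=0$ and

$$\tau_k=\sum_{j=1}^{k}L_j.$$

Since $\tau_k$ is the time of the $k$-th transaction, at least $k$ transactions have occurred by time $t$ exactly when $\tau_k\le t$. Notice that

$$P(\tau_k\le t)=P(N(t)\ge k).$$

Hence,

$$P(\tau_{k+1}\le t)=P(N(t)\ge k+1).$$

Let $p_k(t):=P(N(t)=k)$. One can see that

$$p_k(t)=P(N(t)\ge k)-P(N(t)\ge k+1)=P(\tau_k\le t)-P(\tau_{k+1}\le t).$$

It is concluded that

$$p_k(t)=P\Big(\sum_{j=1}^{k}L_j\le t\Big)-P\Big(\sum_{j=1}^{k+1}L_j\le t\Big).$$

This relation leads to the computation of the exact probabilities. Notice that

$$p_k(t)=P(\tau_k\le t<\tau_{k+1})=P\Big(\sum_{j=1}^{k}L_j\le t<\sum_{j=1}^{k+1}L_j\Big).$$

Therefore, the Monte Carlo estimate of $p_k(t)$ is the proportion of times (in $M$ repetitions of the Monte Carlo simulation) that the random interval

$$\Big[\sum_{j=1}^{k}L_j,\ \sum_{j=1}^{k+1}L_j\Big)$$

contains the constant $t$. In practice, after fitting an ACD($p$,$q$) model to $L_k$, Monte Carlo simulation approximates the exact distribution of $\sum_{j=1}^{k}L_j$ and $\sum_{j=1}^{k+1}L_j$, and hence $P(N(t)=k)$ is computed. Notice that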

$$k!\,p_k(t)=\exp(-\Lambda(t))\,\Lambda^k(t).$$

Therefore,

$$\Lambda(t)-k\log(\Lambda(t))+b_k(t)=0,$$

where

$$b_k(t)=\log(k!\,p_k(t)).$$

This is a nonlinear equation, and root-finding methods such as Newton-Raphson are applicable, as follows: at the $r$-th iteration, let

$$\Lambda_r(t)=\Lambda_{r-1}(t)-\frac{\Lambda_{r-1}(t)-k\log(\Lambda_{r-1}(t))+b_k(t)}{1-k/\Lambda_{r-1}(t)}.$$
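A minimal Python sketch of this root-finding step is given here, assuming the standard Newton-Raphson update (current value minus function over derivative). The function name, the `start` argument and the stopping rule are illustrative assumptions. For $k\ge 1$ the equation generally has two positive roots, one on each side of $\Lambda=k$, so the start value decides which root is found.

```python
import math


def solve_cumulative_intensity(k, p_k, start, tol=1e-12, max_iter=100):
    """Solve Lambda - k*log(Lambda) + log(k! * p_k) = 0 by Newton-Raphson."""
    b = math.log(math.factorial(k) * p_k)
    lam = start
    for _ in range(max_iter):
        f = lam - k * math.log(lam) + b   # function value at the current iterate
        df = 1.0 - k / lam                # derivative with respect to Lambda
        step = f / df
        lam -= step
        if abs(step) < tol:
            break
    return lam
```

Starting below the smaller root keeps the iterates positive, because the function is convex in $\Lambda$ and the iteration then approaches the root from the left.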

To find $\lambda(t)$, after finding $\Lambda(t)$ on a discrete grid of values of $t$, numerical differentiation is applied to obtain the corresponding discrete values of $\lambda(t)$. Then, a smoothing method such as splines or smoothing polynomials is used to derive the functional form of $\lambda(t)$.
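Two numerical steps of this subsection, the Monte Carlo estimate of $p_k(t)$ and the numerical differentiation of $\Lambda(t)$, can be sketched in Python as follows. The sketch counts the partial sums of simulated durations that do not exceed $t$, which is the same event as $t$ falling between two consecutive partial sums. The duration sampler is passed in as a function; the exponential sampler used here is only a stand-in for a fitted ACD model, and all names are hypothetical.

```python
import math
import random


def count_events(durations, t):
    """Number of partial sums of the durations that are <= t, i.e. N(t)."""
    total, n = 0.0, 0
    for duration in durations:
        total += duration
        if total > t:
            break
        n += 1
    return n


def mc_count_probability(sample_durations, t, k, reps=20000, seed=1):
    """Fraction of Monte Carlo runs in which exactly k events occur by time t."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(reps) if count_events(sample_durations(rng), t) == k)
    return hits / reps


def exponential_durations(rate):
    """Stand-in sampler: an endless stream of i.i.d. Exp(rate) durations."""
    def sampler(rng):
        while True:
            yield rng.expovariate(rate)
    return sampler


def central_differences(times, values):
    """Derivative on a grid: central differences inside, one-sided at the ends."""
    n = len(times)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return out
```

With exponential durations the count is Poisson, which gives a simple check of the estimator against the closed-form probability before a fitted duration model is plugged in.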

2.3. Another combined 𝝀(𝒕)

In Sections 2.1 and 2.2, methods were proposed for obtaining the functional form of $\lambda(t)$ based on the Cox process and ACD models. In practice, however, what is recorded is the series of numbers of events occurring in small fractions of time, and it is necessary to simulate $N(t)$ itself directly.

The idea behind this method is that the Binomial distribution approximates the Poisson distribution. To describe the method, consider a busy business day on which the number $n$ of time points in $(t-h,t)$ at which a transaction may occur (the number of Bernoulli trials) is expected to be large. However, because of some political or social event on the previous day, the probability of a transaction $p_t$ is small, such that

$$np_t\approx\int_{t-h}^{t}\lambda(s)\,ds.$$

Following [8] and noticing that

$$\int_{t-h}^{t}\lambda(s)\,ds\approx h\lambda(t),$$

it is seen that $\frac{1}{h}np_t$ is a good estimate of $\lambda(t)$. Therefore, $N(t)$ is simulated directly by sampling from the binomial distribution with parameters $(n,p_t)$, and

$$\lambda(t)=\frac{1}{h}np_t.$$

Then, by collecting the numbers of transactions and fitting an ACD model, the empirical estimate of $N(t)$, and consequently that of $p_t$, is obtained.
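The binomial simulation can be sketched in Python as follows, assuming that the intensity is supplied as a function and that one binomial count is drawn per window of length $h$. The function name and its arguments are illustrative; the success probability is obtained by inverting $\lambda(t)=np_t/h$.

```python
import random


def simulate_counts(intensity, h, n, steps, seed=0):
    """Draw one Binomial(n, p_t) count per window (t - h, t), t = h, 2h, ...,
    with p_t = h * intensity(t) / n."""
    rng = random.Random(seed)
    counts = []
    for i in range(1, steps + 1):
        t = i * h
        p_t = min(1.0, h * intensity(t) / n)  # capped so it stays a probability
        # Sum of n Bernoulli(p_t) trials, i.e. one binomial draw.
        counts.append(sum(1 for _ in range(n) if rng.random() < p_t))
    return counts
```

The cumulative sums of the returned counts give a simulated path of $N(t)$ on the grid $h, 2h, \ldots$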

Suppose now that the dynamics of $p_t$ are specified. For example, let the dynamics of $p_t$ be given by the Itô stochastic differential equation

$$dp_t=\alpha p_t\,dt+\delta p_t\,dB_t,$$

where $B_t$ is a standard Brownian motion on $(0,1)$. To obtain the parameters $\alpha$ and $\delta$, notice that they are the mean and standard deviation of $\frac{dp_t}{p_t\,dt}$, which are estimated by the corresponding sample values. This specifies the dynamics of $\lambda(t)$.
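A minimal Python sketch of this moment-based estimation is given here for a path of $p_t$ observed on an equally spaced grid with step `dt`. It is an assumption of the sketch that the sample mean of the relative increments is divided by `dt` and their sample standard deviation by the square root of `dt`, which is the usual scaling for discretely sampled data. The path simulator is an Euler scheme included only to produce test data, and all names are hypothetical.

```python
import math
import random
import statistics


def estimate_drift_volatility(path, dt):
    """Estimate alpha and delta of dp = alpha*p*dt + delta*p*dB from a sampled path."""
    rel = [(b - a) / a for a, b in zip(path, path[1:])]  # relative increments
    alpha = statistics.mean(rel) / dt
    delta = statistics.stdev(rel) / math.sqrt(dt)
    return alpha, delta


def simulate_path(p0, alpha, delta, dt, steps, seed=0):
    """Euler scheme for the same equation, used here only to generate data."""
    rng = random.Random(seed)
    path = [p0]
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        path.append(path[-1] * (1.0 + alpha * dt + delta * dB))
    return path
```

Applying the estimator to a simulated path with known parameters is a quick way to check how many observations are needed for a stable estimate of $\delta$.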

3. Simulations

In this section, some simulated cases are analyzed. Banks often refuse to provide tick-by-tick ATM transaction data and do not make this type of data available to the public (or make it hard to obtain), for network security reasons and to protect customer confidentiality. However, a small part of the database is usually given to researchers from the banks' core systems. This is why the current paper only surveys simulated situations that correspond to real data. To use simulated data instead of real data, one must be sure that the data simulated with the combined model are good approximations of the real data. Since both cases use the data set in [11], these considerations are taken into account. Case 1 studies the Cox process with known dynamics for $\lambda$. Case 2 gives simulation results under the combined model setting.

Case 1: $\lambda(t)$ as an OU process. In the Cox process, motivated by [11], suppose that $\lambda(t)$ is an Ornstein-Uhlenbeck (OU) process defined by

$$d\lambda(t)=-\beta\lambda(t)\,dt+\beta t\,dz,$$

where $dz$ is the increment of a Brownian motion, $t\in(0,1)$ and $0<\beta<\infty$. Here, it is supposed that $\beta=0.2$, which corresponds to the empirical results from [11]. Consider $t_i=\frac{i}{100}$, $i=0,\ldots,99$, and let $\Lambda(t_i)=\frac{1}{100}\sum_{u=1}^{b}\lambda(u)$, where $b=[100t_i]$. To simulate $N(t)$ at the $t_i$, the increments $N(t_i)-N(t_{i-1})$ are sampled from the Poisson distribution with rate $\frac{1}{100}\sum_{u=a}^{b}\lambda(u)$, where $a=[100t_{i-1}]$. In this way, the partial sums of the increments generate paths of the Poisson process, so the $L_k$ are computable. It turns out that $L_k$ follows an ACD(1,1) process with intercept, as follows:

$$L_k=0.0898+\vartheta_k e_k,$$

where $e_k$ has an exponential distribution with mean 1 and

$$\vartheta_k=0.0017+0.885\,e_{k-1}-1.58\,\vartheta_{k-1}.$$
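The simulation scheme of Case 1 can be sketched in Python as follows. Several choices are assumptions of this sketch and not taken from the paper: the intensity follows an Euler discretization of a mean-reverting equation with a constant diffusion coefficient `sigma`, the initial value `lam0` is a free argument, negative intensities are clipped at zero before a Poisson draw, and Poisson variates are generated by Knuth's multiplication method.

```python
import math
import random
from itertools import accumulate


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_cox_counts(beta, sigma, lam0, steps=100, seed=0):
    """Poisson increments of N(t) on a grid of `steps` cells of (0, 1),
    driven by a discretized mean-reverting intensity."""
    rng = random.Random(seed)
    dt = 1.0 / steps
    lam, increments = lam0, []
    for _ in range(steps):
        rate = max(lam, 0.0) * dt  # integrated intensity over one grid cell
        increments.append(poisson_draw(rng, rate))
        # Euler step of d(lam) = -beta * lam * dt + sigma * dz (assumed form).
        lam += -beta * lam * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return increments


def cumulative_path(increments):
    """Partial sums of the increments, i.e. N(t_i) on the grid."""
    return list(accumulate(increments))
```

The grid points at which the cumulative path jumps give the simulated transaction times on the grid, and their differences give grid-resolution durations to which an ACD model can be fitted.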

Case 2: Combined $\lambda(t)$. Here, in addition to the Cox process assumption, assume that the $L_k$ come from an ACD(1,1) process. For the weekly data from [11], the ACD model is given by

$$L_k=0.01+\vartheta_k e_k,$$
$$\vartheta_k=0.002+0.65\,e_{k-1}-2\,\vartheta_{k-1},$$

with $e_k$ an exponentially distributed random variable with rate 1. Hence, the empirical distributions of $L_k$, $k=1,2$, are computed. Next, using the Monte Carlo method proposed in subsection 2.2, $P(N(t)=2)$ is computed for various values $t_i$. Then, Newton-Raphson and numerical differentiation give the values of $\Lambda(t)$ and $\lambda(t)$, respectively. Figure 1 shows the plot of $\lambda(t)$. Smoothing $\lambda(t)$ with Fourier basis functions gives

$$\lambda(t)=(1+\sin(t)+\cos(t))/\pi.$$
Figure 1. Plot of λ(t)

Although it is difficult to obtain real tick-by-tick data in practice, Case 3 provides an alternative method for reconstructing $\lambda(t)$. For the weekly data from [11], the following smoothing results are obtained:

$$\lambda_i(t)=(c_{0i}+c_{1i}\sin(t)+c_{2i}\cos(t))/\pi,$$

where $\pi=3.141592$ and the $c_{ji}$, $j=0,1,2$, are periodic in $i$ with period 7, with values given as follows:

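| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $c_{0i}$ | 0.2 | 0.2 | 0.5 | 0.5 | 0.1 | 0.25 | 0.25 |
| $c_{1i}$ | 0.1 | 0.1 | 0.5 | 0.25 | 0.5 | 0.2 | 0.5 |
| $c_{2i}$ | 0.2 | 0.2 | 0.1 | 0.1 | 0.1 | 0.25 | 0.25 |

Table 1. Values of $c_{ji}$ for $j=0,1,2$ and $i=1,\ldots,7$.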
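The weekly intensity defined by Table 1 can be evaluated with a short Python sketch. The coefficients are copied from the table; the function name and the convention that day $i$ is mapped to column $((i-1) \bmod 7)+1$ are assumptions made for illustration.

```python
import math

# Coefficients copied from Table 1; position j holds the value for day i = j + 1.
C0 = [0.2, 0.2, 0.5, 0.5, 0.1, 0.25, 0.25]
C1 = [0.1, 0.1, 0.5, 0.25, 0.5, 0.2, 0.5]
C2 = [0.2, 0.2, 0.1, 0.1, 0.1, 0.25, 0.25]


def weekly_intensity(i, t):
    """lambda_i(t) = (c_0i + c_1i sin t + c_2i cos t) / pi, periodic in i with period 7."""
    j = (i - 1) % 7
    return (C0[j] + C1[j] * math.sin(t) + C2[j] * math.cos(t)) / math.pi
```

Because of the period-7 indexing, day 8 reuses the coefficients of day 1, so a single function covers any number of weeks.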

4. Real data sets

Here, the method of Section 2.3 is applied to three real time series.

Data set 1. The data set includes 11520 observations, which are the numbers of ATM transactions per 15-minute interval at a selected branch of an Iranian bank, ABC (not named for security reasons), from March 11, 2024 to October 11, 2024 (30 days), between 7 AM and 7 PM. Each day contains $n=48$ intervals of 15 minutes; therefore, $h=1/48$ and thus $\lambda(t)=p_t$. The time series plot of the first 5000 observations is given as follows:

Figure 2. (a) Plot of λ(t); (b) Time series of N(t).

It is found that $\alpha=0.641$ and $\delta=0.457$. The model is fitted to the real high-frequency data sets, the optimal parameters are obtained, the real data and the fitted model are plotted in the same figure, and finally the corresponding residuals are analyzed. The following plot shows the time series of the actual $\lambda(t)$ (blue line) together with its estimate (red line).

Figure 3. Plot of λ(t) and its estimate.

The following table gives the maximum, minimum, mean and standard deviation (SD) of the 5000 residuals

$$\left|\frac{\lambda(t)}{\mathrm{est}(t)}-1\right|.$$

The table shows that the errors are small (about 4% on average).

| Max | Min | Mean | SD |
| --- | --- | --- | --- |
| 0.0632 | 0.0024 | 0.0414 | 0.019 |

Table 2. Residuals properties: maximum, minimum, mean, and standard deviation.
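Summary statistics of this kind can be computed with a short Python sketch, reading the residual as the relative error between the actual value and its estimate, that is, the absolute value of their ratio minus one. This reading, the function name and the dictionary keys are assumptions made for illustration.

```python
import statistics


def relative_error_summary(actual, estimate):
    """Max, min, mean and sample SD of |actual / estimate - 1| over paired values."""
    errors = [abs(a / e - 1.0) for a, e in zip(actual, estimate)]
    return {
        "max": max(errors),
        "min": min(errors),
        "mean": statistics.mean(errors),
        "sd": statistics.stdev(errors),
    }
```

The estimates must be nonzero, since each residual divides by the estimated value.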

Data set 2. In the second data set, the transactions of an ATM along one day for 6336 days are recorded. First, the following ACD model is fitted to the durations between transactions:

$$L_k=0.045+\vartheta_k e_k,$$
$$\vartheta_k=0.001+0.34\,e_{k-1}-1.68\,\vartheta_{k-1},$$

$k=1,2,\ldots,6336$. In the following plot, the real $N(t)$ is plotted against its estimate, which is derived from the above ACD model by simulating a Poisson process from the partial sums of the simulated $L_k$. For clarity, only the first 3000 observations are shown: the actual $N(t)$ (blue line) and its estimate (red line). The figure shows that the two series are very close.

Figure 4. Plot of N(t) and its estimate.

Again, summaries of the errors are given in the following table.

| Max | Min | Mean | SD |
| --- | --- | --- | --- |
| 0.0475 | 0.0064 | 0.0325 | 0.032 |

Table 3. Differences between N(t) and its estimate: maximum, minimum, mean, and standard deviation.

Also, a Poisson process is fitted to the data, based on the functional estimate of $\lambda(t)$ given by

$$\lambda(t)=(0.01+0.002\sin(t)+0.25\cos(t))/(2\pi).$$

The following table gives the related errors.

| Max | Min | Mean | SD |
| --- | --- | --- | --- |
| 0.0734 | 0.0055 | 0.0455 | 0.043 |

Table 4. Differences between actual N(t) and its Poisson simulation: maximum, minimum, mean, and standard deviation.

Data set 3. Here, the functional form of $\lambda(t)$ from the previous data set is compared with its actual values. First, the errors between the actual $\lambda(t)$ and its functional estimate are plotted in Figure 5; these errors are negligible. Then, different scenarios are studied by applying shocks to $\lambda(t)$. The shocks are simulated from normal distributions with zero mean and standard deviation equal to one standard deviation of the errors.

Figure 5. Plot of errors between λ(t) and its estimate.
Figure 6. Plot of λ(t) and its shock estimate.

As seen, the shocks increase along the time series. However, their effect on the estimate of $\lambda(t)$ is negligible.

5. Concluding Remarks

The main contributions of this paper are as follows:

  1. The compatibility of the two models commonly used in the analysis of high-frequency data was examined, and the simulation section showed that each of the two models can be recovered from the other.

  2. Functional data analysis was used as a practical solution for modeling the intensity function of the Poisson process, and its performance was compared with that of the previous two methods.

  3. Mathematical models were developed that are useful for simulation analysis.

Acknowledgements.

The author thanks the referees and the editor for their useful remarks, which contributed to improving the quality of this article.

References