Autoregressive modeling of biological phenomena

7 years ago

Abstract

Many natural phenomena can be described by power-laws of temporal or spatial correlations. The equivalence in frequency domain is the 1/f spectrum. A closer look at various experimental data reveals more or less significant deviations from a 1/f characteristic. Such deviations are especially evident at low frequencies and less evident at high frequencies where spectra are very noisy. We exemplify such cases with four different types of phenomena offered by molecular biology (series of coding sequence lengths from microbial genomes, series of the atomic mobility of the protein main chain), cell biophysics (flickering of red blood cells), cognitive psychology (mentally generated series of apparent random numbers) and astrophysics (the X-ray flux variability of a galaxy). All these examples appear to be described by autoregressive models of the first-order AR(1) or higher-order models. This further shows that a spectrum needs to be first subjected to averaging as, long ago, suggested by Mandelbrot otherwise the spectra can be more or less easily confused and/or approximated by power-laws.

Authors

V.V. Morariu
National Institute of Research and Development for Isotopic and Molecular Technologies,
Department of Molecular and Biomolecular Physics, Cluj-Napoca, Romania

C. Vamoș
Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy

S.M. Soltuz
Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy

A. Pop
Astronomical Institute of the Romanian Academy, Astronomical Observatory Cluj-Napoca

L. Buimaga-Iarinca
National Institute of Research and Development for Isotopic and Molecular Technologies, Department of Molecular and Biomolecular Physics, Cluj-Napoca, Romania

Keywords

Autoregresive model; molecular biology; cell biophysics; cognitive psychology

Cite this paper as:

V.V. Morariu, C. Vamoş, Ş.M. Şoltuz, A. Pop, L. Buimaga-Iarinca, O. Zainea, Autoregressive modeling of biological phenomena, Biophysical Reviews and Letters, vol. 5 (2010) no. 3, pp. 109-128.

PDF

2008 Vamos Autoregressive modeling of biological phenomena

About this paper

Journal

Biophysical Reviews and Letters

Publisher Name

DOI

https://doi.org/10.1142/S1793048010001172

Print ISSN

1793-0480

Online ISSN

1793-7035

MR

ZBL

[1] H. E. Stanley and N. Ostrowsky ,Correlations and Connectivity: Geometric Aspects of Physics, Chemistry and Biology ( Kluwer , Dordrecht , 1990 ) . Crossref, Google Scholar

[2] A. Bunde and S. Havlin , Fractals in Science ( Springer , Berlin , 1994 ) . Google Scholar

[3] E. Milotti, 1/f noise: a pedagogical review , arXiv:physics/0204033 . Google Scholar

[4] B. B. Mandelbrot , Multifractals and 1/f Noise ( Springer , New York , 1998 ) . Google Scholar

[5] J. B. Bassingthwaighte , Fractal Physiology ( Oxford University Press , New York , 1994 ) Crossref, Google Scholar

[6] V. V. Morariu and A. Coza, Physica A 320, 461 (2003), DOI: 10.1016/S0378-4371(02)01661-8.Crossref, ISI, Google Scholar

[7] V. V. Morariu, A. Isvoran and O. Zainea, Chaos Soliton. Fract. 32, 1305 (2007), DOI: 10.1016/j.chaos.2005.12.023.Crossref, ISI, Google Scholar

[8] V. V. Morariu and A. Coza, Fluct. Noise Lett 1 (2001) p. L111.Google Scholar

[9] V. V. Morariuet al., Fractals 9, 379 (2001), DOI: 10.1142/S0218348X01000919.Link, ISI, Google Scholar

[10] M. König and J. Timmer, Astron. Astrophys. Suppl. Ser. 124, 589 (1997), DOI: 10.1051/aas:1997104.Crossref, Google Scholar

[11] J. Timmeret al., Phys. Rev. E 61, 1342 (2000), DOI: 10.1103/PhysRevE.61.1342.Crossref, ISI, Google Scholar

[12] Th. L. Thornton and D. L. Gilden, Psychon. Bull. Rev. 12, 409 (2005).Crossref, ISI, Google Scholar

[13] P. J. Brockwel and R. A. Davies , Time Series: Theory and Methods , 2nd edn. ( Springer , New York , 1991 ) . Crossref, Google Scholar

[14] J. D. Hamilton , Time Series Analysis ( Princeton University Press , 1994 ) . Google Scholar

[15] C. Vamoş, S. M. Şoltuz and M. Crăciun, Order 1 autoregressive process of finite length , arXiv:0709.2963 . Google Scholar

[16] P. Stoica and R. L. Moses ,Introduction to Spectral Analysis ( Prentice Hall , 1997 ) . Google Scholar

[17] W. S. Rasband, ImageJ, U.S. National Institutes of Health, Bethesda, Maryland, USA, http://rsb.info.nih.gov/ij, 1997-2010 . Google Scholar

[18] Y. P. Elsworth and J. F. James, Mon. Not. R. Astron. Soc. 198, 889 (1982).Crossref, ISI, Google Scholar

[19] J. C. Lochner, J. H. Swank and A. E. Szymkowiak, Astrophys. J. 76, 295 (1991).Google Scholar

[20] M. A. Smith and R. D. Robinson, ASP Conf. Ser. 292, 263 (2003).Google Scholar

[21] C. M. Gaskell and E. S. Klimek, Astron. Astrophys. Trans. 22, 661 (2003), DOI: 10.1080/1055679031000153851.Crossref, Google Scholar

[22] P. Uttley and I. M. McHardy, Mon. Not. R. Astron. Soc. 323, (2001), DOI: 10.1046/j.1365-8711.2001.04496.x.Google Scholar

[23] I. E. Papadakis and A. Lawrence, Mon. Not. R. Astron. Soc. 272, 161 (1995).Crossref, ISI, Google Scholar

[24] I. M. McHardyet al., Mon. Not. R. Astron. Soc. 348, 783 (2004), DOI: 10.1111/j.1365-2966.2004.07376.x.Crossref, ISI, Google Scholar

[25] C. K. Penget al., Phys. Rev. E 49, 1685 (1994), DOI: 10.1103/PhysRevE.49.1685.Crossref, ISI, Google Scholar

[26] C. Vamoş, Phys. Rev. E 75, 036705 (2007), DOI: 10.1103/PhysRevE.75.036705.Crossref, Google Scholar

[27] D. J. Li and S. Zhang , Prediction of Genomic Properties and Classification of Life by Protein Length Distribution ( ) , arXiv:0806.0205v1 . Google Scholar

[28] D. J. Li and S. Zhang, Mod. Phys. Lett. B 23, 3563 (2009), DOI: 10.1142/S0217984909021624.Link, ISI, Google Scholar

[29] D. J. Li and S. Zhang, Mod. Phys. Lett. B 23, 3471 (2009), DOI: 10.1142/S0217984909021533.Link, ISI, Google Scholar

[30] V. V. Morariu and L. Buimagă-Iarinca, Fluct. Noise Lett. 9, 47 (2010).Link, ISI, Google Scholar

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea AUTOREGRESSIVE DESCRIPTION OF BIOLOGICAL PHENOMENA VASILE V. MORARIU 1 , CĂLIN VAMOŞ 2 , ALEXANDRU POP 3 ,ŞTEFAN M. ŞOLTUZ 2,4 , LUIZA BUIMAGA-IARINCA 1 , OANA ZAINEA 1 1 National Institute of Research and Development for Isotopic and Molecular Technologies, Department of Molecular and Biomolecular Physics, 400293, Cluj-Napoca, Romania 2 ”T. Popoviciu” Institute of Numerical Analysis, Romanian Academy, 400110 Cluj-Napoca, Romania 3 Astromomical Institute of the Romanian Academy, Astronomical Observatory, 19 Ciresilor, 400487 Cluj- Napoca, Romania 4 Departamento Matematicas, Universidad de los Andes, Carrera 1 No. 18A-10, Bogota, Columbia Many natural phenomena can be described by power-laws of the temporal or spatial correlations. The equivalent in frequency domain is the 1/f spectrum. A closer look at various experimental data reveals more or less signiﬁcant deviations from a 1/f characteristic. Such deviations are especially evident at low frequencies and less evident at high frequencies where spectra are very noisy. We ex- emplify such cases with four different types of phenomena oﬀered by molecular biology (series con- sisting of the atomic mobility of the protein main chain), cell biophysics (ﬂickering of red blood cells), cognitive psychology (series of mentally generated series of apparent random numbers) and astrophysics (the X-ray ﬂux variability of a galaxy). Some of these cases can be better approximated by AR(1) – a ﬁrst order autoregressive model, which is a short-range memory model – than by a 1/f model (long-range memory). The same spectra can be more or less easily confused and/or approxi- mated by power-laws. On the other hand, an AR(1) model is only a zero approximation, which can be improved if more complex short-range correlation models, such as high order AR(p), ARMA, FARIMA models, are used. A key step to detect non-power laws in the spectra, already suggested by Mandelbrot, is to average out the spectra. Keywords: autoregressive model, short-range correlation, molecular biology, cell biophysics, cogni- tive psychology. 1. Introduction Many processes in nature exhibit temporal or spatial correlations that can be described by power-laws. Examples can be found in physical, biological, social and psychological systems [1, 2, 3, 4, 5]. In case of a spectral analysis, the power-law is of the type P = 1/f β , where f is frequency and β is the long-range correlation exponent. Its value is 0 < β < 2 for most of natural processes. A 1/f spectrum is diagnosed by looking at the double log plot which should be linear over the whole range of frequency scale. The slope of the linear ﬁt is the correlation exponent β. However, a general problem encountered in the spectral analysis of various ﬂuctuating systems is that the spectrum is quite noisy. Man- delbrot’s recommendation, that apparent 1/f spectra should be averaged before further interpretation [4], has often been overlooked including previous work [6, 7, 8, 9]. Man- delbrot considers that non-averaged spectra lead to ”unreliable and even meaningless results” [4]. An immediate consequence of the averaging procedure is that deviation from a power-law description of the spectra may be better disclosed. Sometimes, such a devia- tion can be obvious even if the spectrum is not averaged. On the other hand, the non-averaged spectrum can be often reasonably ﬁtted by a straight line, i.e. described by an apparent power-law. Such an example is illustrated in ﬁgure 1 for a heat shock protein (Protein Data Bank code: 1gme). The linear ﬁt of the power spectrum appears to be reasonable good, yet it can be easily noticed the plateau at 1

Autoregressive description of biological phenomena low frequencies (ﬁgure 1a). A simple averaging procedure can result in a clear non-1/f spectrum (ﬁgure 1b). Fig. 1. a) Non-averaged power spectrum for the atomic mobility in the backbone of a heat shock protein (PDB code: 1gme), which might suggest a long-range correlation interpretation. b) The averaged power spectrum with M =21 terms. It clearly indicates the short-range correlation (deviation from linearity). PDB stands for Protein Data Bank, available at http://www.rcsb.org . However, on many occasions the non-averaged spectrum can be easily confused with a linear dependence of the double log plot. In fact, the averaging of the spectrum can in- dicate the deviation form linearity of the double log plot of the spectrum. Consequently, two confusions may occur when dealing with non-averaged spectra: either a hidden non-power law remains undisclosed, and/or non-power laws are easily misinterpreted as power-laws. This work discusses the deviation from power-law behav- ior, which is characterized by two features: leveling of the spectrum at low frequencies, and a tendency for leveling at high frequencies, where the shape of the spectrum is blurred by the high level of noise (”the Spanish moss” as coined by [4]). The idea of this work is twofold: ﬁrst, to disclose cases of non-power laws by a sim- ple averaging of spectra from widely diﬀerent areas of science, and second, to oﬀer an interpretation of such spectra by using an autoregressive model AR(1). This is basically diﬀerent from a power-law description. While the meaning of a power-law is associated with long-range correlation or memory, the autoregressive model describes a system with short-range memory. The long-range memory is described in terms of a long-range corre- lation exponent β, while the short-range memory is characterized by the strength interac- tion φ among consecutive terms. The literature already reported cases where astrophysi- cal and psychological phenomena are described by autoregressive models rather than by power-laws [10, 11, 12]. Details of their approach diﬀer to some extent. The example from astrophysics is included in this work in order to check out our method of calculation against a different methodology. The spectral approach in the present work was done for two reasons: first, many re- sults in the literature are presented in spectral form and the present work was born from the observation that 1/f-like spectra show clear deviations from a power-low , and second, the spectral description can be easily done in an analytical manner. This paper is organized as following: ﬁrst, the main spectral features of an autoregres- sive model AR(1) are described for various interaction strengths among the terms of the 2

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea series. Then, four classes of phenomena are described by this model: a bio-molecular process (the series of atomic mobility in the main chain of some proteins), a cell biophys- ics case (ﬂuctuation of geometric parameters characterizing the ﬂickering phenomenon of human red blood cells), a cognitive process (series of ”random” integers generated by human subjects) and, ﬁnally, the procedure is checked against an astrophysical phenome- non (X-ray emission of a galaxy) where a more complex calculation based on an AR(1) model has already been published [10]. It will be shown that an identical result is ob- tained with a more simple procedure. All cases proved to be described by non-power laws, although they can be easily confused with 1/f spectra if they are not averaged. Many of these cases can be described by the AR(1) model, while others could be possibly described by higher order autoregressive models. 2. The spectral characteristics of the autoregressive model A discrete stochastic process {X n , n=0, ±1, ±2,…} is called autoregressive process of order p, denoted AR(p), if {X n } is stationary and for any n: n p n p n n Z X X X = − − − − − ϕ ϕ ... 1 1 (1) where {Z n } is a Gaussian white noise with zero mean and variance σ 2 .The real parame- ters φ i , i =1, .., p, can be interpreted as a measure of the inﬂuence of a stochastic process term on the next term after i time steps. The properties of AR(p) processes have been studied in detail and they are the basis of the linear stochastic theory of time series [13] and [14]. Equation 1 has a unique solution if the polynomial Φ(z)=1−φ 1 z−...−φ p z p has no roots z with |z| =1. If in addition Φ(z) ≠1 for all |z| > 1, then the process is causal, i.e. the random variable X n can be expressed only in terms of noise values at previous moments and at the same moment. The spectral density of an AR(p) process is: 5 . 0 5 . 0 , ( 1 2 ) ( 2 ) 2 2 ≤ < − = − υ Φ π σ υ υ πi e f , (2) where ν is the frequency. For an AR(1) process, the spectral density in equation 2 be- comes: 5 . 0 5 . 0 , 2 cos 2 1 1 2 ) ( 2 2 ≤ < − − + = υ πυ ϕ ϕ π σ υ f , (3) where φ is the only parameter φ i in this case. The above mentioned formulas are valid for ideal stochastic processes of ﬁnite length. The time series found in practice have a ﬁnite length and usually they are considered realizations of a ﬁnite sample of an AR(1) process of inﬁnite length. Therefore, the changes of the equations 2 and 3 have to be analyzed for a sample with ﬁnite length {X n , n=1, 2, ..., N} extracted from an inﬁnite process {X n , n=0, ±1, ±2, ...}. A detailed analysis of the power spectrum of the AR(1) process and the inﬂuence of the ﬁnite length is con- tained in [15]. In this paper some of the main conclusions are discussed. The sample estimator of the spectral density is the periodogram: 3

Autoregressive description of biological phenomena 2 ) ( ) ( υ υ N N A I = , (4) where A N (ν) is the discrete Fourier transform of the sample: ∑ − = N n in n N e X N A 1 2 1 ) ( υ π υ , (5) Since the sample contains a ﬁnite number of components, there are only N independent values of A N (ν) and I N (ν). Usually, these values are computed for the Fourier frequencies ν j = j/N, where j is a integer satisfying the condition −0.5 <ν j ≤ 0.5. The periodogram of an AR(p) process is an unbiased estimator of the spectral density: ) ( 2 ) ( lim υ π υ f I j N N = ∞ → , (6) where (ν j −0.5N) < ν ≤ (ν j +0.5N) [13]. Hence, increasing the sample length N, while the time stept is kept constant, the average periodogram becomes a better approximation of the spectral density (equation 6). However, a single periodogram is not a consistent esti- mator, because it does not converge in probability to the spectral density, i.e. the standard deviation of I N (ν j ) does not converge to zero, and two distinct values of the periodogram are uncorrelated, no matter how close the frequencies are when they are computed. Usually, the spectral density and the periodogram are plotted on a log-log scale. The logarithmic coordinates strongly distort the shape of the graphic because by applying the logarithm, any neighborhood of the origin is transformed into an inﬁnite length interval and the value of f(0) can not be plotted. For a sample with N terms, the ﬁrst value of the spectral density is obtained for the minimum frequency ν min =1/N. Figure 2a shows the spectral density in equation 3 for N = 1024, σ =1 and diﬀerent values of the parameter φ. One can observe that the AR(1) processes are fractal-like ones. For φ =0.90 and espe- cially for φ =0.99, a signiﬁcant part of the power spectrum is almost linear with a slope equal to −2, which corresponds to β =2. A signiﬁcant part of the spectrum could be re- garded as linear for smaller value of φ (for example φ=0.5 in ﬁgure 2a). Fig. 2. The spectral density (a) and the absolute value of its derivative (b) of an AR(1) process for N = 1024, σ =1 and diﬀerent values of the interaction factor among close terms φ. In order to verify this behavior of the AR(1) spectrum, ﬁgure 2b includes the derivative of the spectral density from equation 3 in log-log coordinates: 4

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea )) ( (ln ) ( υ υ υ υ f d d f − = ′ (7) One can notice that for φ ≥ 0.9 there is a region where f / ≅ −2. There is only a maximum value of f / for φ < 0.9 which corresponds to the center of the ”linear” (or ”fractal”) region of the power spectrum. For small frequencies, the AR(1) spectral density is strongly stretched in log-log co- ordinates such that a plateau appears (ﬁgure 2a) with a value given by: 2 2 ) 1 ( 2 ) 0 ( ϕ π σ − = f (8) From equation 3 it follows that the plateau corresponds to the small values of ν, when the variable term at the denominator can be neglected in comparison with the constant term. Using the quadratic approximation of the cosine function, the condition that the graph of the AR(1) power spectrum has a plateau ν < (1 − φ)/2π ϕ is obtained. If φ tends to 1, the plateau appears at smaller values of the frequency. Therefore, if N is large enough, the periodogram of an AR(1) sample has a plateau at small frequencies (if N is large, then ν min → 0). A time series {x n , n=1, 2, ..., N} as a realization of a sample {X n , n=1, 2, ..., N} from an AR(1) process is considered. Applying the discrete Fourier transform (equation 4) to the time series {x n }, and then computing the periodogram (equation 5), values randomly distributed around the spectral density (equation 3) of the AR(1) are obtained. Since the periodogram is not a consistent estimator, by increasing the length N of the sample, the periodogram ﬂuctuations around the theoretical spectral density are not reduced. Consis- tent estimation of the spectral density may be obtained using averaging of the periodo- gram on intervals with length of magnitude order of N [13]. Choosing the optimum weight function is a difficult task, because, if the periodogram is smoothed too much, then the bias with respect to the theoretical spectrum can become large. From various weight functions [16] the simplest one is used, i.e. the averaging with equal weights on symmetric intervals containing M Fourier frequencies, with M = 1, 3, 5, ..., 21. Then, the averaged periodogram contains N − M +1 values, because for the ﬁrst and last (M − 1)/2 values of the periodogram, the symmetric averaging can not be performed. Let us consider that an AR(1) model for an averaged periodogram is to be found, i.e. to ﬁnd the values of the parameters φ and σ. The minimum of the quadratic norm of the diﬀerence between the averaged periodogram and the theoretical spectral density of the AR(1) model has to be determined. The sample standard deviation of the time series and φ = 0 are used as initial values for the optimization algorithm. 3. Data and methods The data considered for analysis in this section belong to diﬀerent areas of life sciences: a) Molecular biology. A protein consists of a chain of atoms built upon a backbone (−N −Cα−C−) having various lateral groups of atoms. The structure and dynamics of a protein is characterized by the atomic coordinates and the temperature factor, or the Debye-Waller factor, which are determined from the X-ray diﬀraction of protein crystals. The temperature factor can 5

Autoregressive description of biological phenomena be regarded as a characteristic parameter for the atomic mobility. These data are available in Protein Data Bank (PDB), at http://www.rcsb.org, for a large number of proteins. We further consider the series of the temperature factors of the protein backbones, as they represent a series of data consisting of a natural succession of terms like in a time series. The organization in such structures can be characterized by looking at their correlation properties. In a previous article it has been claimed that long-range correlation is present in such series [9], and this has been further analyzed in terms of long-range correlation [10, 11, 12]. This interpretation is reconsidered in the next section. Twelve proteins were randomly selected (table 1). The .text ﬁle associated to each protein includes all the in- formation obtained from X-ray crystallography (atomic coordinates, temperature factor, resolution, etc.), from which the temperature factors of the atoms in the backbone main chains were extracted (for a detailed explanation see [7]). b) Cell biophysics – ﬂickering of human red blood cells (RBCs). Cell membrane undulation is a common phenomenon in the world of living cells. By far the best known is the red blood cell shape ﬂuctuations, also known as ﬂickering. They can easily be observed by optical microscopy. Such ﬂuctuations in RBCs consist of sub- micron, out-of-plane displacements of the cell membrane in the frequency range of 0.3-30 Hz. Previous studies performed investigations on RBCs adhering ﬁrmly and irreversibly to the glass substratum. In the present study free ﬂoating cells were investigated. This is an advantage as there are no mechanical restrictions imposed on cells, whereas the disad- vantage is due to various motions of the cell, that induce non-stationary contributions to the basic ﬂuctuations in the time series. Such motions either cause defocusing or an ap- parent change in the plan projection of various parameters. All these motions modify the geometrical parameters of the cell in an apparent manner for reasons others than the ﬂickering itself. As a general rule, these phenomena were minimized by allowing some time for the sample to settle down and by keeping the time of image recording at a short interval. However, these phenomena may persist to various degrees and appear to be non- stationary contributions in the series of data. They can be removed by the detrending pro- cedure mentioned below in this section. Venous blood samples for experiments were obtained from healthy adult donors, via withdrawal into sterile vacuum tubes containing 3.8% natrium citrate. The RBCs were separated from the blood by centrifugation at 1200g for 10 min and washed three times in an isotonic saline buﬀer (145mM NaCl, 5mM KCl, 5mM HEPES, pH7.4). Then, the washed RBCs were resuspended in isotonic saline buﬀer in much more diluted suspen- sions. An upright optical microscope (Olympus, Model BX 51) linked to a computer, via CMOS sensor monochrome camera (PixeLINK, Model PL-A741) with a resolution of 1280x1024 pixels @27fps, was used to capture images of RBCs. For each sample, 1024 images @10fps were captured and processed by Image J, free software [17]. The cell area for each RBC has been determined by time lapse imaging. The series were further sub- jected to spectral analysis. c) Cognitive psychology. It was previously reported an analysis of series of numbers purposely produced by various human subjects as ”random” series [8, 9]. In our experiments the human subjects have been asked to dictate in a random manner numbers from 0 to 9 or just 0 and 1 [9]. A general ”algorithm” used by all subjects was the non-repetition of numbers, while no 6

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea subject was aware that randomness involves some repetition of the numbers. The data were previously interpreted in terms of long-range correlation. However, we noticed that most of the series revealed non-power laws of the power spectra. Also, the detrended ﬂuctua tion analysis plots (not shown) proved that long-range correlation is not valid throughout the series. The authors were not aware at that time about the non-power law description of the process and the plots were approximated by 1/f spectra [9]. These ex- periments were reconsidered and new longer series of data were produced. The spectra were averaged and they easily revealed non-power law spectra. d) Astrophysics. Diﬀerent type of astrophysical objects display variability phenomena featured by power-law power spectra, e.g. the light curve of the 3C273 quasar. Also, the amplitude spectrum (in log-log plots) of the intermediate polar AE Aquarii shows a-1 slope [18], while the power spectrum of its radio emission time variability revealed the presence of a red noise described by a power-law. The X-ray variability of Cygbus X-1 system [19] and of Be star γ Cassiopeiae [20] is also featured by power spectra displaying 1/f seg- ments. The X-ray variability of active galactic nuclei is also known to show red noise spectra which could be quite well ﬁtted by power-laws [21]. Deviations from the simple power-law behavior are emphasized by several authors. Thus, there are mentioned cases [22, 20, 23, 24] in which diﬀerent frequencies domains are featured by diﬀerent slopes, or the power spectra gradually ﬂatten toward low frequencies. Autoregressive analysis has been reported on the variability of X-ray light curves of the active Galaxy NGC5506 [10]. This analysis is compared with the present more simple approach, by using the same data extracted form Hearc Exosat ME archive for the Seyfert galaxy NGC5506. Many series of data have non-stationary characteristics, so the application of Fourier transform to the data results in misleading spectra. A common procedure to avoid this complication is to use detrended ﬂuctuation analysis (DFA) [25]. This results in a correla- tion exponent free of the correlation introduced by the trend. However, in our case it is essential to obtain the corresponding spectrum, as the shape of the spectrum gives the relevant information (either a power-law or a non-power law is operative). Consequently, an important preliminary step is to remove non-stationary characteristics in the series. We performed detrending by subtracting a polynomial ﬁt from the original series. The prob- lem is to determine the right polynomial ﬁt. We performed 1 to 20 degree polynomial ﬁts and generally found that a polynomial degree around 10 gives the most reliable result for φ and σ. The values of φ and σ also depended on the averaging procedure of the spectra so that optimizing the values of φ and σ involved optimizing both the detrending and the averaging procedures. This will be further discussed in section 4. The above mentioned polynomial fitting was chosen as its accuracy is comparable to the moving average method and to an automatic method for the estimation of a monotone trend. The same work also showed that polynomial fitting for a 1/f noise proved to have the best performance [26]. The succession of operations can be summarized as following: i) Detrend the series of data by subtracting various degrees of polynomial ﬁts; ii) Discrete Fourier transform of the series; iii) Periodogram averaging using 1-21 terms; iv) Fit the spectrum to an AR(1) model. The resulting parameters are the interaction factor φ and the dispersion σ. Their values depend on the degree of the polynomial ﬁt used for the detrending procedure, and on the number of spectral terms used for averaging procedure. v) If the data can be de- 7

Autoregressive description of biological phenomena scribed by an AR(1) model, choose the ﬁnal values of φ and σ, by analyzing the plot of φ and σ against the polynomial degree and the number of averaging terms. 4. Averaged spectra and ﬁtting with AR(1) model An averaged periodogram for mobility of backbone atoms in human hemoglobin is shown in ﬁgure 3. The non-linearity of the periodogram is obvious; therefore, it can not be described by a power-law. On the other hand, the data can be better approximated by an AR(1) model, as shown in the same ﬁgure 3. Fig. 3. Averaged periodogram and its AR(1) ﬁt (continuous line) for the backbone atomic mobility of human hemoglobin (PDB code: 1a30). Detrending of the series was done by subtracting a 10 polynomial degree, and the number of averaged terms in the spectrum was M =21. The interaction factor is φ =0.93 and the noise dis- persion is σ =2.00. Further, an example of protein backbone atomic mobility for HIV-1 protease (PDB code: 4phv) is illustrated in ﬁgure 4. This example can not be described by a simple AR(1) model. The closest AR(1) model is included in ﬁgure 4 by a continuous line and it appears to be quite diﬀerent from AR(1). Clearly a diﬀerent model applies in this case. Fig. 4. Periodogram of the backbone atomic mobility for a HIV-1 protease (PDB code: 4phv). The series was preliminary detrended by subtracting a 10 polynomial degree ﬁt. The continuous line represents an AR(1) model with φ =0.99 and σ =1.13, which was the nearest ﬁt of the periodogram. 8

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea No Protein PDB code Classiﬁcation Protein class from SCOP AR(1) fit 1 4phv HIV-1 protease; All beta No (aspartic proteinase) 2 1k34 Virus/viral protein Coiled coil No 3 1iiq HIV-1 protease; All beta Yes; hydrolase inhibitor φ =0.927 4 1gbu Deoxy human All alpha No hemoglobin 5 1fbd Hydrolase Alpha Yes; (phosphoric monoester) and beta φ =0.981 6 1c9m Hydrolase Alpha No and beta 7 1c9n Hydrolase Alpha No and beta 8 1a8g HIV-1 protease complex All beta Yes; (acid proteinase/inhibitor) φ =0.965 9 1a30 HIV-1 protease complex All beta Yes; (aspartic protease/inhibitor) φ =0.923 10 1qbs HIV-1 protease; All beta Yes; aspartyl protease φ =0.964 11 1gme Eukaryotic small heat All beta No shock protein; chaperone 12 1atr Chaperone protein Alpha and beta No Table 1: Classiﬁcation and classes of proteins investigated by the AR(1) model. All the cases with no AR(1) ﬁt are described by higher order AR models. The results for φ were obtained for detrended series by using 10 degree polynomials and M =21 terms for averaging. Twelve randomly selected proteins, belonging to various protein classes, were ana- lyzed in order to ﬁnd out to what extend their spectra can be described by an autoregres- sive AR(1) model (table 1). Table 1 also includes the information about the autoregres- sive model in terms of ”yes” if there is a ﬁt to our data by AR(1) and the corresponding correlation coeﬃcient φ is indicated or ”no” if there is no such a fit. It is found that all cases do not have long-range characteristics, i.e. the double log plot of the averaged spectra deviates signiﬁcantly form linearity. About 40% of the cases can be described by an AR(1) model, while the rest of them can be described by higher order autoregressive models (table 1). The next case is taken from red blood cell ﬂickering. An example of a spectral analy- sis of the area ﬂuctuation is shown in ﬁgure 5. The spectrum remains noisy even after the averaging method, but it can be clearly seen that an AR(1) model describes the data better than a power-law. 9

Autoregressive description of biological phenomena Fig. 5. Averaged periodogram for M =21 terms of human RBCs area ﬂuctuations suspended in buﬀer solution and its ﬁtting by an AR(1) model (continuous line) with φ =0.52 and σ =17.92. The initial series was detrended by a 10 polynomial degree. The power spectrum also indicates some oscillations around the AR(1) curve in the higher frequency range. Similar to the case of proteins, it is suitable to use a 10 degree polynomial degree for detrending and 21 terms for the averaging procedure in order to obtain a reliable value for the interaction factor φ. This choice ensures the most constant value of φ for various averaging terms M. All series show that cell area ﬂuctuation is characterized by an interaction factor φ around 0.5 − 0.7. The data seem to be noisier than the protein examples, yet they can be deﬁnitely classiﬁed as short-range memory systems. Further, an example from a cognitive psychology experiment is given. This case can be well modeled by an AR(1) model with a negative φ (ﬁgure 6). From sixteen series gener- ated by diﬀerent subjects only four series could be ﬁtted by the AR(1) model. Other se- ries produced typical spectra of the type shown in ﬁgure 7, which can not be described by an AR(1) model. Although this article is limited to an illustration with AR(1) model of various classes of phenomena, an example of higher order autoregressive model (ﬁgure 8) is included, which resembles qualitatively the data in ﬁgure 7. Fig. 6. Periodogram of mental series generated by the ﬁrst human subject ﬁtted by an AR(1) model (continuous line) with φ = −0.53 and σ =0.41.Note the anti-correlation character of the spectrum. 10

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea Fig. 7. Periodogram of mental series generated by the second human subject. The continuous line is the AR(1) ﬁt, that gives φ =0.17 and σ =2.22. This plot illustrates that the mental series can not be described by an AR(1) model; higher order AR models should be used. Fig. 8. Periodogram of an AR(2) model series. This AR model simulation is quantitatively similar to the men- tal series in ﬁgure 7 (it is characterized by a higher order AR), with φ 1 =0.8and φ 4 =−0.3. This might suggests that the series are described by a strong interaction among neighbors and by a weaker anti-interaction at four terms distance. In other words, the numbers produced by a subject are strongly related to each other and it is quite the oppo- site at longer distances. In fact, all series which were not ﬁtted by AR(1) can be described by similar higher order AR models. These results show that mental series are generated according to diﬀerent ”algorithms” depending on the subject and therefore, this method of analysis may represent a promising tool of investigation for such cognitive experi- ments. 5. Checking the AR(1) ﬁtting procedure against an astrophysical example Our calculations have as a ﬁnal result the value of φ and σ of the AR(1) model ﬁtted to a particular series of data. The procedure was checked against a more complex system formed by series of astrophysical data. The variability of X-ray ﬂux from galaxies has 11

Autoregressive description of biological phenomena been previously described as ﬂickering or 1/f ﬂuctuation [10]. A speciﬁc problem in as- tronomical observations is the observational noise, as well as other misleading systematic eﬀects occurring in power spectra. As a result, a speciﬁc model was considered in order to generalize AR processes and to estimate the hidden autoregressive process [10]. The model is known as the Linear State Space Model (LSSM) in which the observational noise is explicitly modeled. The results reported by König and Timmer for the ﬁrst order process AR(1) are the parameter φ = 0.994 and the standard deviation σ = 0.722. König and Timmer analysis was compared with the present more simple approach and proved to be in a very good agreement. The data series are represented in ﬁgure 9a. The time series consist of N = 6948 values, with the sample interval dt = 60s. There are only 20 values which are not equidistant. The maximum value of the sampling interval is 3310s. However, the time series are considered as being almost equidistant and further precautions will be reduced to minimum. Fig. 9. a) The X-ray time series of galaxy NGC5506; b) Trends of the time series obtained for three different polynomial fittings (0, 2, 5 and 9). The trend reduces to a constant when q = 0. for q = 2, the polynomial trend describes the global shape of the signal. For q = 5, the polynomial trend describes the different behavior of the first and last half of the signal. The polynomial trend can follow the details of the signal when q = 9. The ﬁrst problem is to remove the deterministic trend that can be found by examining the shape of the signal. Timmer and König did not extract any deterministic trend in their analysis. Three diﬀerent trends can be obtained by diﬀerent polynomial ﬁtting degrees (ﬁgure 9b). The degree of these polynomials equals the minimum degree of a class of polynomial trends q, which has very similar shapes. When the degree of the polynomial trend increases, the shape of the trend does not change monotonically. At certain poly- nomial degrees, the trend has more signiﬁcant changes, while for the following polyno- mial degrees the shape remains practically unchanged. The trend reduces to a constant, which equals the average of the temporal series when q = 0. There is no signiﬁcant im- provement for a linear trend (q = 1), while for q = 2 the polynomial trend describes the global shape of the signal. Only for q = 5 the polynomial trend describes the diﬀerent behavior of the ﬁrst and last half of the signal. The polynomial trend can follow better the details of the signal when q = 9, however, numerical oscillations arise at the end, which can not be associated with real variations. These numerical oscillations increase with the degree of the polynomial, therefore they were not considered. The choice of these four representative values for the degree of the polynomial trend will be quantitatively conﬁrmed by the variation of parameters characterizing the autoregressive model. 12

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea The calculation of the periodogram can not be done using equation 5, since there are missing data in twenty regions. The other values of the series are separated by time inter- val dt = 30s. Therefore, the series can be considered as equidistant and they have T = 7532 values, with 584 missing values. A zero value is assigned to these data, so they do not contribute in equation 5 to the value of the periodogram. The periodogram is pre- sented in ﬁgure 10a for the signal where only the mean value (q = 0) was subtracted, and in ﬁgure 10b for averaged periodogram with M = 21 values. It can be seen that the aver- aging procedure results in loosing data for the lower frequencies which describe the pla- teau of an autoregressive model. Fig. 10. a) Periodogram of the time series for galaxy NGC5506. The mean value q = 0 was subtracted; b) The averaged periodogram using a rectangular window with M = 21 values. The averaging procedure results in loosing data for the low frequencies that describe the plateau of an autoregressive model. At frequencies higher than the cutting frequency ν 0 = 0.02, the shape of the periodogram changes to a white noise spectrum. Only one part of the spectrum, for ν > ν 0 , can be modeled with an AR(1) process. At frequencies higher than the cutting frequency ν 0 = 0.02, the shape of the periodo- gram changes to a white noise spectrum, as it can be seen in ﬁgure 10b. Then, only one part of the spectrum, for ν > ν 0 , can be modeled with an AR(1) process. The values φ = 0.991, σ = 0.757 and φ = 0.985, σ = 0.744 are obtained by ﬁtting the lower frequencies part of the periodogram to an AR(1) process for the non-averaged and the averaged pe- riodogram respectively. These values are very close to those reported by Timmer and König. As in the former example, the averaging interval of periodogram has a strong inﬂuence on the parameters of the AR(1) model. This dependence is shown in ﬁgure 11 as a function of M for diﬀerent degrees of the polynomial trend. For small values of M, the values of φ and σ for diﬀerent polynomial trends are quite diﬀerent. Their variability for M > 9 is signiﬁcantly reduced, therefore the averaging procedure can eliminate the ﬂuctuation of the periodogram. This is why M = 11 is used in this investigation. 13

Autoregressive description of biological phenomena Fig. 11. a) Dependence of the parameters (a) φ and (b) σ for the AR(1) model on the polynomial trend degree (from 0 to 9) for diﬀerent M values. Continuous lines represent the values reported by [10]. For small values of M, the values of φ and σ for different polynomial trends are quite different. For M > 9, their variability is sig- nificantly reduced, so the averaging procedure can eliminate the periodogram of fluctuation. Another parameter which has to be analyzed in order to ﬁt an autoregressive model is the cutting frequency ν 0 . It can be noticed that at frequencies smaller than 0.02 the perio- dogram presents oscillations, which can be caused by the white noise at higher frequen- cies. The dependence of the autoregressive parameters on the value of ν 0 is presented in ﬁgure 12. It shows that dispersion of the noise does not depend signiﬁcantly on the perio- dogram interval used for the ﬁtting procedure. However, for all the degrees of the poly- nomial trend eliminated from the initial signal, the value of φ decreases with the increase of ν 0 . Only the ﬁrst two values are almost equal. Such a decrease can be explained by the inﬂuence of the observational white noise in the modeled part of the periodogram. Con- sequently, the cutting frequency was established at ν 0 = 0.005. Fig. 12. The dependence of the autoregressive parameters (a) φ and (b) σ on the cutting frequency ν 0 . All these results show a signiﬁcant dependence of the autoregressive parameters on the degree of the polynomial trend which was eliminated from the initial signal. The de- pendence of the autoregressive parameters on the degree of polynomial trend for several values of the cutting frequency is presented in ﬁgure 13. 14

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea Fig. 13. The dependence of the autoregressive parameters (a) φ and (b) σ on the degree of polynomial trend q. First, the stepwise variation at q = 2, 5 and 9 can be noticed, which corresponds to the discussion mentioned at the beginning of this section. Another discontinuity at q =11 was no longer used, as the oscillations at the end of the interval become too large. Also, it can be noticed that the autoregressive parameters decrease as the degree of the polynomials increases. This is due to the fact that higher polynomial degrees can better describe the oscillations of the signal, while for lower degrees they are modeled by the autoregressive process. As Koning and Timmer did not remove any deterministic trend from the signal, a comparison with their results should consider the results for q =0. The relative error for φ is 0.2% and 5% for σ, which conﬁrms the spectral autoregressive modeling method of the time series proposed in this work. 6. Conclusions Four diﬀerent kind of ﬂuctuating phenomena originating from molecular biology, cell biology, cognitive psychology and astrophysics proved to be short-range memory sys- tems, and a signiﬁcant percentage of them could be described by the most simple autore- gressive model of ﬁrst order AR(1). The characteristic parameters for this model were φ – the strength of interaction among consecutive terms – and the dispersion σ of the data. Phenomena can be broadly classiﬁed into three categories according to φ: a) strong inter- acting systems with φ =0.80− 0.99; b) medium interacting systems with φ =0.5− 0.7, and c) weak interacting systems with φ =0.2− 0.4. This classiﬁcation is neither strictly delim- ited nor attributed to speciﬁc criteria apart from arbitrary numerical appreciation. The ﬁrst category includes the X-ray variability of galaxies and protein structures. In the sec- ond category falls the ﬂickering of blood cells and in the third category, some data for a cognitive process, such as the random generation of numbers, and DNA structure (not shown). The proportion of AR(1) models in each category may vary to some extend. However, around 1/4 to 1/3 of the cases is described by AR(1). It is also evident that higher degree autoregressive models are operative in other cases. The parameter φ proved to be sensitive and therefore can be proﬁtably exploited to investigate various eﬀects on the ﬂuctuating system. On the other hand, such processes can easily be confused and/or approximated by power-laws. The most important step in disclosing the nature of ﬂuctuations is to average out their spectra. Apparent 1/f spectra should be cautiously treated and averaging should be compulsory. It is suspected that 15

Autoregressive description of biological phenomena other phenomena are suitable for an autoregressive description, including higher order autoregressive models. Acknowledgements This work was funded by Romanian Academy for Scientiﬁc Research, Grant no. 2CEEX- 06-II-96. References [1] Stanley, H.E., Ostrowsky, N. Correlations and connectivity: geometric aspects of physics, chemistry and biology, Dordrecht: Kluwer, 1990. [2] Bunde, A., Havlin, S. Fractals in science, Berlin: Springer, 1994. [3] Milotti, E. 1/f noise: a pedagogical review. arxivpreprint,physics/0204033, 2002. [4] Mandelbrot, B.B. Multifractals and 1/f noise, New York: Springer, 1998. [5] Bassingthwaighte, J.B. et al. Fractal physiology, New York: Oxford University Press, 1994. [6] Morariu, V.V., Coza, A. Nonlinear properties of the atomic vibrations in protein backbones. Physica A, 320, 461-474, 2003. [7] Morariu, V.V., Isvoran, A., Zainea, O. A nonlinear approach to the structure- mobility relationship in protein main chains. Chaos, Solitons and Fractals, 32, 1305- 1315, 2007. [8] Morariu, V.V., Coza, A. 2001 Quarter power scaling in dynamics: experimental evidence from cell biology and cognitive psychology. Fluctuation and Noise Letters, 1, L111-L116, 2001. [9] Morariu, V.V., Coza, A., Chis, M. A., Isvoran, A., Morariu, L.C. Scaling in cogni- tion. Fractals, 9, 379-391, 2001. [10] König, M., Timmer, J. Analyzing X-ray variability by linear state space models. Suppl. Ser., 124, 589-596, 1997. [11] Timmer, J., Schwarz, U., Voss, H.U., Kurths, J. Linear and nonlinear time series analysis of the black hole candidate Cygnus X-1. Phys. Rev. E, 61, 1342-1352, 2000. [12] Thornton, Th.L., Gilden, D.L. Provenance of correlation in psychological data. Psychonomic Bulletin & Rev., 12, 409-441, 2005. [13] Brockwel, P.J., Davies, R.A. Time series: theory and methods, 2nd edn. New York: Springer, 1991. [14] Hamilton, J.D. Time series analysis, Princeton University Press, 1994. [15] Vamos, C., Soltuz, S.M., Craciun, M. Order 1 autoregressive process of ﬁnite length. arxiv:0709.2963, 2007. [16] Stoica, P., Moses, R.L. Introduction to spectral analysis, New Jersey: Prentice Hall, 1997. [17] Rasband, W.S.ImageJ, U. S. National Institutes of Health, Bethesda, Maryland, USA, http://rsb.info.nih.gov/ij, 1997-2007. [18] Elsworth, Y.P., James, J.F. The ﬂicker spectrum of AE Aquarii. Mon. Not. R. Astr. Soc., 198, 889-896, 1982. [19] Lochner, J.C., Swank, J.H., Szymkowiak, A.E. Shot model parameters for Cyg- nus X-1 through phase portrait ﬁtting. Astrophys. J., 376, 295-311, 1991. [20] Smith, M.A., Robinson, R.D. Interplay of periodic, cyclic and stochastic vari- ability in selected areas of the H-R diagram. ASP Conf. Ser., 292, 263-274, 2003. [21] Gaskell, C.M., Klimek, E.S. Variability of active galactic nuclei from the optical X-ray regions. Astron. Astrophys. Trans., 22, 661-679, 2003. 16

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea 17 [22] Uttley, P., McHardy, I.M. The ﬂux-dependent amplitude of broadband noise variability in X-ray binaries and active galaxies. Mon. Not. R. Astr. Soc., 323, L26L30, 2001. [23] Papadakis, I.E., Lawrence, A. A detailed X-ray variability study of the Seyfert galaxy NGC 4051. Mon. Not. R. Astr. Soc. 272, 161-183, 1995. [24] McHardy, I.M., Papadakis, I.E., Uttley, P., Page, M.J., Mason, K.O. Combined long and short time-scale X-ray variability of NGC 4051 with RXTE and XMM-Newton. Mon. Not. R. Astr. Soc., 348, 783-801, 2004. [25] Peng, C.-K., Buldyrev, S.V., Havlin, S., Simons, M., Stanley, H.E., Goldberger, A.L. Mosaic organization of DNA nucleotides. Phys. Rev. E, 49, 1685-1689, 1994. [26] Vamos, C. Automatic algorithm for monotone trend removal. Phys. Rev. E, 75, 036705, 2007

2010

Discrete Approximation Processes of King’s Type

February 17, 2023

Conjugate heat transfer of a nanofluid in a vertical channel adjacent to a heat generating solid domain

December 5, 2018

An asymptotic formula for a class of approximation processes of King’s type