Autoregressive modeling of biological phenomena

Abstract

Many natural phenomena can be described by power-laws of temporal or spatial correlations. The equivalence in frequency domain is the 1/f spectrum. A closer look at various experimental data reveals more or less significant deviations from a 1/f characteristic. Such deviations are especially evident at low frequencies and less evident at high frequencies where spectra are very noisy. We exemplify such cases with four different types of phenomena offered by molecular biology (series of coding sequence lengths from microbial genomes, series of the atomic mobility of the protein main chain), cell biophysics (flickering of red blood cells), cognitive psychology (mentally generated series of apparent random numbers) and astrophysics (the X-ray flux variability of a galaxy). All these examples appear to be described by autoregressive models of the first-order AR(1) or higher-order models. This further shows that a spectrum needs to be first subjected to averaging as, long ago, suggested by Mandelbrot otherwise the spectra can be more or less easily confused and/or approximated by power-laws.

Authors

V.V. Morariu
– National Institute of Research and Development for Isotopic and Molecular Technologies,
– Department of Molecular and Biomolecular Physics, Cluj-Napoca, Romania

C. Vamoș
-Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy

S.M. Soltuz
-Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy

A. Pop
Astronomical Institute of the Romanian Academy, Astronomical Observatory Cluj-Napoca

L. Buimaga-Iarinca
National Institute of Research and Development for Isotopic and Molecular Technologies, Department of Molecular and Biomolecular Physics, Cluj-Napoca, Romania

Keywords

Autoregresive model; molecular biology; cell biophysics; cognitive psychology

Cite this paper as:

V.V. Morariu, C. Vamoş, Ş.M. Şoltuz, A. Pop, L. Buimaga-Iarinca, O. Zainea, Autoregressive modeling of biological phenomena, Biophysical Reviews and Letters, vol. 5 (2010) no. 3, pp. 109-128.

PDF

About this paper

Journal

Biophysical Reviews and Letters

Publisher Name
Print ISSN

1793-0480

Online ISSN

1793-7035

MR

?

ZBL

?

[1] H. E.   Stanley and N.   Ostrowsky ,Correlations and Connectivity: Geometric Aspects of Physics, Chemistry and Biology ( Kluwer , Dordrecht , 1990 ) . CrossrefGoogle Scholar

[2] A.   Bunde and S.   Havlin , Fractals in Science ( Springer , Berlin , 1994 ) . Google Scholar

[3] E. Milotti, 1/f noise: a pedagogical review , arXiv:physics/0204033 . Google Scholar

[4] B. B.   Mandelbrot , Multifractals and 1/f Noise ( Springer , New York , 1998 ) . Google Scholar

[5] J. B.   Bassingthwaighte , Fractal Physiology ( Oxford University Press , New York , 1994 ) CrossrefGoogle Scholar

[6] V. V. Morariu and A. Coza, Physica A 320, 461 (2003), DOI: 10.1016/S0378-4371(02)01661-8.CrossrefISIGoogle Scholar

[7] V. V. Morariu, A. Isvoran and O. Zainea, Chaos Soliton. Fract. 32, 1305 (2007), DOI: 10.1016/j.chaos.2005.12.023.CrossrefISIGoogle Scholar

[8] V. V. Morariu and A. Coza, Fluct. Noise Lett 1 (2001) p. L111.Google Scholar

[9] V. V. Morariuet al., Fractals 9, 379 (2001), DOI: 10.1142/S0218348X01000919.LinkISIGoogle Scholar

[10] M. König and J. Timmer, Astron. Astrophys. Suppl. Ser. 124, 589 (1997), DOI: 10.1051/aas:1997104.CrossrefGoogle Scholar

[11] J. Timmeret al., Phys. Rev. E 61, 1342 (2000), DOI: 10.1103/PhysRevE.61.1342.CrossrefISIGoogle Scholar

[12] Th. L. Thornton and D. L. Gilden, Psychon. Bull. Rev. 12, 409 (2005).CrossrefISIGoogle Scholar

[13] P. J.   Brockwel and R. A.   Davies , Time Series: Theory and Methods , 2nd edn. ( Springer , New York , 1991 ) . CrossrefGoogle Scholar

[14] J. D.   Hamilton , Time Series Analysis ( Princeton University Press , 1994 ) . Google Scholar

[15] C. Vamoş, S. M. Şoltuz and M. Crăciun, Order 1 autoregressive process of finite length , arXiv:0709.2963 . Google Scholar

[16] P.   Stoica and R. L.   Moses ,Introduction to Spectral Analysis ( Prentice Hall , 1997 ) . Google Scholar

[17] W. S. Rasband, ImageJ, U.S. National Institutes of Health, Bethesda, Maryland, USA, http://rsb.info.nih.gov/ij, 1997-2010 . Google Scholar

[18] Y. P. Elsworth and J. F. James, Mon. Not. R. Astron. Soc. 198, 889 (1982).CrossrefISIGoogle Scholar

[19] J. C. Lochner, J. H. Swank and A. E. Szymkowiak, Astrophys. J. 76, 295 (1991).Google Scholar

[20] M. A. Smith and R. D. Robinson, ASP Conf. Ser. 292, 263 (2003).Google Scholar

[21] C. M. Gaskell and E. S. Klimek, Astron. Astrophys. Trans. 22, 661 (2003), DOI: 10.1080/1055679031000153851.CrossrefGoogle Scholar

[22] P. Uttley and I. M. McHardy, Mon. Not. R. Astron. Soc. 323, (2001), DOI: 10.1046/j.1365-8711.2001.04496.x.Google Scholar

[23] I. E. Papadakis and A. Lawrence, Mon. Not. R. Astron. Soc. 272, 161 (1995).CrossrefISIGoogle Scholar

[24] I. M. McHardyet al., Mon. Not. R. Astron. Soc. 348, 783 (2004), DOI: 10.1111/j.1365-2966.2004.07376.x.CrossrefISIGoogle Scholar

[25] C. K. Penget al., Phys. Rev. E 49, 1685 (1994), DOI: 10.1103/PhysRevE.49.1685.CrossrefISIGoogle Scholar

[26] C. Vamoş, Phys. Rev. E 75, 036705 (2007), DOI: 10.1103/PhysRevE.75.036705.CrossrefGoogle Scholar

[27] D. J.   Li and S.   Zhang , Prediction of Genomic Properties and Classification of Life by Protein Length Distribution ( ) ,   arXiv:0806.0205v1 . Google Scholar

[28] D. J. Li and S. Zhang, Mod. Phys. Lett. B 23, 3563 (2009), DOI: 10.1142/S0217984909021624.LinkISIGoogle Scholar

[29] D. J. Li and S. Zhang, Mod. Phys. Lett. B 23, 3471 (2009), DOI: 10.1142/S0217984909021533.LinkISIGoogle Scholar

[30] V. V. Morariu and L. Buimagă-Iarinca, Fluct. Noise Lett. 9, 47 (2010).LinkISIGoogle Scholar

Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea AUTOREGRESSIVE DESCRIPTION OF BIOLOGICAL PHENOMENA VASILE V. MORARIU 1 , CĂLIN VAMOŞ 2 , ALEXANDRU POP 3 ,ŞTEFAN M. ŞOLTUZ 2,4 , LUIZA BUIMAGA-IARINCA 1 , OANA ZAINEA 1 1 National Institute of Research and Development for Isotopic and Molecular Technologies, Department of Molecular and Biomolecular Physics, 400293, Cluj-Napoca, Romania 2 ”T. Popoviciu” Institute of Numerical Analysis, Romanian Academy, 400110 Cluj-Napoca, Romania 3 Astromomical Institute of the Romanian Academy, Astronomical Observatory, 19 Ciresilor, 400487 Cluj- Napoca, Romania 4 Departamento Matematicas, Universidad de los Andes, Carrera 1 No. 18A-10, Bogota, Columbia Many natural phenomena can be described by power-laws of the temporal or spatial correlations. The equivalent in frequency domain is the 1/f spectrum. A closer look at various experimental data reveals more or less signicant deviations from a 1/f characteristic. Such deviations are especially evident at low frequencies and less evident at high frequencies where spectra are very noisy. We ex- emplify such cases with four different types of phenomena oered by molecular biology (series con- sisting of the atomic mobility of the protein main chain), cell biophysics (ickering of red blood cells), cognitive psychology (series of mentally generated series of apparent random numbers) and astrophysics (the X-ray ux variability of a galaxy). Some of these cases can be better approximated by AR(1) – a rst order autoregressive model, which is a short-range memory model – than by a 1/f model (long-range memory). The same spectra can be more or less easily confused and/or approxi- mated by power-laws. On the other hand, an AR(1) model is only a zero approximation, which can be improved if more complex short-range correlation models, such as high order AR(p), ARMA, FARIMA models, are used. A key step to detect non-power laws in the spectra, already suggested by Mandelbrot, is to average out the spectra. Keywords: autoregressive model, short-range correlation, molecular biology, cell biophysics, cogni- tive psychology. 1. Introduction Many processes in nature exhibit temporal or spatial correlations that can be described by power-laws. Examples can be found in physical, biological, social and psychological systems [1, 2, 3, 4, 5]. In case of a spectral analysis, the power-law is of the type P = 1/f β , where f is frequency and β is the long-range correlation exponent. Its value is 0 < β < 2 for most of natural processes. A 1/f spectrum is diagnosed by looking at the double log plot which should be linear over the whole range of frequency scale. The slope of the linear t is the correlation exponent β. However, a general problem encountered in the spectral analysis of various uctuating systems is that the spectrum is quite noisy. Man- delbrot’s recommendation, that apparent 1/f spectra should be averaged before further interpretation [4], has often been overlooked including previous work [6, 7, 8, 9]. Man- delbrot considers that non-averaged spectra lead to ”unreliable and even meaningless results” [4]. An immediate consequence of the averaging procedure is that deviation from a power-law description of the spectra may be better disclosed. Sometimes, such a devia- tion can be obvious even if the spectrum is not averaged. On the other hand, the non-averaged spectrum can be often reasonably tted by a straight line, i.e. described by an apparent power-law. Such an example is illustrated in gure 1 for a heat shock protein (Protein Data Bank code: 1gme). The linear t of the power spectrum appears to be reasonable good, yet it can be easily noticed the plateau at 1
Autoregressive description of biological phenomena low frequencies (gure 1a). A simple averaging procedure can result in a clear non-1/f spectrum (gure 1b). Fig. 1. a) Non-averaged power spectrum for the atomic mobility in the backbone of a heat shock protein (PDB code: 1gme), which might suggest a long-range correlation interpretation. b) The averaged power spectrum with M =21 terms. It clearly indicates the short-range correlation (deviation from linearity). PDB stands for Protein Data Bank, available at http://www.rcsb.org . However, on many occasions the non-averaged spectrum can be easily confused with a linear dependence of the double log plot. In fact, the averaging of the spectrum can in- dicate the deviation form linearity of the double log plot of the spectrum. Consequently, two confusions may occur when dealing with non-averaged spectra: either a hidden non-power law remains undisclosed, and/or non-power laws are easily misinterpreted as power-laws. This work discusses the deviation from power-law behav- ior, which is characterized by two features: leveling of the spectrum at low frequencies, and a tendency for leveling at high frequencies, where the shape of the spectrum is blurred by the high level of noise (”the Spanish moss” as coined by [4]). The idea of this work is twofold: rst, to disclose cases of non-power laws by a sim- ple averaging of spectra from widely dierent areas of science, and second, to oer an interpretation of such spectra by using an autoregressive model AR(1). This is basically dierent from a power-law description. While the meaning of a power-law is associated with long-range correlation or memory, the autoregressive model describes a system with short-range memory. The long-range memory is described in terms of a long-range corre- lation exponent β, while the short-range memory is characterized by the strength interac- tion φ among consecutive terms. The literature already reported cases where astrophysi- cal and psychological phenomena are described by autoregressive models rather than by power-laws [10, 11, 12]. Details of their approach dier to some extent. The example from astrophysics is included in this work in order to check out our method of calculation against a different methodology. The spectral approach in the present work was done for two reasons: first, many re- sults in the literature are presented in spectral form and the present work was born from the observation that 1/f-like spectra show clear deviations from a power-low , and second, the spectral description can be easily done in an analytical manner. This paper is organized as following: rst, the main spectral features of an autoregres- sive model AR(1) are described for various interaction strengths among the terms of the 2
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea series. Then, four classes of phenomena are described by this model: a bio-molecular process (the series of atomic mobility in the main chain of some proteins), a cell biophys- ics case (uctuation of geometric parameters characterizing the ickering phenomenon of human red blood cells), a cognitive process (series of ”random” integers generated by human subjects) and, nally, the procedure is checked against an astrophysical phenome- non (X-ray emission of a galaxy) where a more complex calculation based on an AR(1) model has already been published [10]. It will be shown that an identical result is ob- tained with a more simple procedure. All cases proved to be described by non-power laws, although they can be easily confused with 1/f spectra if they are not averaged. Many of these cases can be described by the AR(1) model, while others could be possibly described by higher order autoregressive models. 2. The spectral characteristics of the autoregressive model A discrete stochastic process {X n , n=0, ±1, ±2,…} is called autoregressive process of order p, denoted AR(p), if {X n } is stationary and for any n: n p n p n n Z X X X = ϕ ϕ ... 1 1 (1) where {Z n } is a Gaussian white noise with zero mean and variance σ 2 .The real parame- ters φ i , i =1, .., p, can be interpreted as a measure of the inuence of a stochastic process term on the next term after i time steps. The properties of AR(p) processes have been studied in detail and they are the basis of the linear stochastic theory of time series [13] and [14]. Equation 1 has a unique solution if the polynomial Φ(z)=1φ 1 z...φ p z p has no roots z with |z| =1. If in addition Φ(z) 1 for all |z| > 1, then the process is causal, i.e. the random variable X n can be expressed only in terms of noise values at previous moments and at the same moment. The spectral density of an AR(p) process is: 5 . 0 5 . 0 , ( 1 2 ) ( 2 ) 2 2 < = υ Φ π σ υ υ πi e f , (2) where ν is the frequency. For an AR(1) process, the spectral density in equation 2 be- comes: 5 . 0 5 . 0 , 2 cos 2 1 1 2 ) ( 2 2 < + = υ πυ ϕ ϕ π σ υ f , (3) where φ is the only parameter φ i in this case. The above mentioned formulas are valid for ideal stochastic processes of nite length. The time series found in practice have a nite length and usually they are considered realizations of a nite sample of an AR(1) process of innite length. Therefore, the changes of the equations 2 and 3 have to be analyzed for a sample with nite length {X n , n=1, 2, ..., N} extracted from an innite process {X n , n=0, ±1, ±2, ...}. A detailed analysis of the power spectrum of the AR(1) process and the inuence of the nite length is con- tained in [15]. In this paper some of the main conclusions are discussed. The sample estimator of the spectral density is the periodogram: 3
Autoregressive description of biological phenomena 2 ) ( ) ( υ υ N N A I = , (4) where A N (ν) is the discrete Fourier transform of the sample: = N n in n N e X N A 1 2 1 ) ( υ π υ , (5) Since the sample contains a nite number of components, there are only N independent values of A N (ν) and I N (ν). Usually, these values are computed for the Fourier frequencies ν j = j/N, where j is a integer satisfying the condition 0.5 <ν j 0.5. The periodogram of an AR(p) process is an unbiased estimator of the spectral density: ) ( 2 ) ( lim υ π υ f I j N N = , (6) where (ν j 0.5N) < ν (ν j +0.5N) [13]. Hence, increasing the sample length N, while the time stept is kept constant, the average periodogram becomes a better approximation of the spectral density (equation 6). However, a single periodogram is not a consistent esti- mator, because it does not converge in probability to the spectral density, i.e. the standard deviation of I N (ν j ) does not converge to zero, and two distinct values of the periodogram are uncorrelated, no matter how close the frequencies are when they are computed. Usually, the spectral density and the periodogram are plotted on a log-log scale. The logarithmic coordinates strongly distort the shape of the graphic because by applying the logarithm, any neighborhood of the origin is transformed into an innite length interval and the value of f(0) can not be plotted. For a sample with N terms, the rst value of the spectral density is obtained for the minimum frequency ν min =1/N. Figure 2a shows the spectral density in equation 3 for N = 1024, σ =1 and dierent values of the parameter φ. One can observe that the AR(1) processes are fractal-like ones. For φ =0.90 and espe- cially for φ =0.99, a signicant part of the power spectrum is almost linear with a slope equal to 2, which corresponds to β =2. A signicant part of the spectrum could be re- garded as linear for smaller value of φ (for example φ=0.5 in gure 2a). Fig. 2. The spectral density (a) and the absolute value of its derivative (b) of an AR(1) process for N = 1024, σ =1 and dierent values of the interaction factor among close terms φ. In order to verify this behavior of the AR(1) spectrum, gure 2b includes the derivative of the spectral density from equation 3 in log-log coordinates: 4
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea )) ( (ln ) ( υ υ υ υ f d d f = (7) One can notice that for φ 0.9 there is a region where f / 2. There is only a maximum value of f / for φ < 0.9 which corresponds to the center of the ”linear” (or ”fractal”) region of the power spectrum. For small frequencies, the AR(1) spectral density is strongly stretched in log-log co- ordinates such that a plateau appears (gure 2a) with a value given by: 2 2 ) 1 ( 2 ) 0 ( ϕ π σ = f (8) From equation 3 it follows that the plateau corresponds to the small values of ν, when the variable term at the denominator can be neglected in comparison with the constant term. Using the quadratic approximation of the cosine function, the condition that the graph of the AR(1) power spectrum has a plateau ν < (1 φ)/2π ϕ is obtained. If φ tends to 1, the plateau appears at smaller values of the frequency. Therefore, if N is large enough, the periodogram of an AR(1) sample has a plateau at small frequencies (if N is large, then ν min 0). A time series {x n , n=1, 2, ..., N} as a realization of a sample {X n , n=1, 2, ..., N} from an AR(1) process is considered. Applying the discrete Fourier transform (equation 4) to the time series {x n }, and then computing the periodogram (equation 5), values randomly distributed around the spectral density (equation 3) of the AR(1) are obtained. Since the periodogram is not a consistent estimator, by increasing the length N of the sample, the periodogram uctuations around the theoretical spectral density are not reduced. Consis- tent estimation of the spectral density may be obtained using averaging of the periodo- gram on intervals with length of magnitude order of N [13]. Choosing the optimum weight function is a difficult task, because, if the periodogram is smoothed too much, then the bias with respect to the theoretical spectrum can become large. From various weight functions [16] the simplest one is used, i.e. the averaging with equal weights on symmetric intervals containing M Fourier frequencies, with M = 1, 3, 5, ..., 21. Then, the averaged periodogram contains N M +1 values, because for the rst and last (M 1)/2 values of the periodogram, the symmetric averaging can not be performed. Let us consider that an AR(1) model for an averaged periodogram is to be found, i.e. to nd the values of the parameters φ and σ. The minimum of the quadratic norm of the dierence between the averaged periodogram and the theoretical spectral density of the AR(1) model has to be determined. The sample standard deviation of the time series and φ = 0 are used as initial values for the optimization algorithm. 3. Data and methods The data considered for analysis in this section belong to dierent areas of life sciences: a) Molecular biology. A protein consists of a chain of atoms built upon a backbone (N Cα−C) having various lateral groups of atoms. The structure and dynamics of a protein is characterized by the atomic coordinates and the temperature factor, or the Debye-Waller factor, which are determined from the X-ray diraction of protein crystals. The temperature factor can 5
Autoregressive description of biological phenomena be regarded as a characteristic parameter for the atomic mobility. These data are available in Protein Data Bank (PDB), at http://www.rcsb.org, for a large number of proteins. We further consider the series of the temperature factors of the protein backbones, as they represent a series of data consisting of a natural succession of terms like in a time series. The organization in such structures can be characterized by looking at their correlation properties. In a previous article it has been claimed that long-range correlation is present in such series [9], and this has been further analyzed in terms of long-range correlation [10, 11, 12]. This interpretation is reconsidered in the next section. Twelve proteins were randomly selected (table 1). The .text le associated to each protein includes all the in- formation obtained from X-ray crystallography (atomic coordinates, temperature factor, resolution, etc.), from which the temperature factors of the atoms in the backbone main chains were extracted (for a detailed explanation see [7]). b) Cell biophysics – ickering of human red blood cells (RBCs). Cell membrane undulation is a common phenomenon in the world of living cells. By far the best known is the red blood cell shape uctuations, also known as ickering. They can easily be observed by optical microscopy. Such uctuations in RBCs consist of sub- micron, out-of-plane displacements of the cell membrane in the frequency range of 0.3-30 Hz. Previous studies performed investigations on RBCs adhering rmly and irreversibly to the glass substratum. In the present study free oating cells were investigated. This is an advantage as there are no mechanical restrictions imposed on cells, whereas the disad- vantage is due to various motions of the cell, that induce non-stationary contributions to the basic uctuations in the time series. Such motions either cause defocusing or an ap- parent change in the plan projection of various parameters. All these motions modify the geometrical parameters of the cell in an apparent manner for reasons others than the ickering itself. As a general rule, these phenomena were minimized by allowing some time for the sample to settle down and by keeping the time of image recording at a short interval. However, these phenomena may persist to various degrees and appear to be non- stationary contributions in the series of data. They can be removed by the detrending pro- cedure mentioned below in this section. Venous blood samples for experiments were obtained from healthy adult donors, via withdrawal into sterile vacuum tubes containing 3.8% natrium citrate. The RBCs were separated from the blood by centrifugation at 1200g for 10 min and washed three times in an isotonic saline buer (145mM NaCl, 5mM KCl, 5mM HEPES, pH7.4). Then, the washed RBCs were resuspended in isotonic saline buer in much more diluted suspen- sions. An upright optical microscope (Olympus, Model BX 51) linked to a computer, via CMOS sensor monochrome camera (PixeLINK, Model PL-A741) with a resolution of 1280x1024 pixels @27fps, was used to capture images of RBCs. For each sample, 1024 images @10fps were captured and processed by Image J, free software [17]. The cell area for each RBC has been determined by time lapse imaging. The series were further sub- jected to spectral analysis. c) Cognitive psychology. It was previously reported an analysis of series of numbers purposely produced by various human subjects as ”random” series [8, 9]. In our experiments the human subjects have been asked to dictate in a random manner numbers from 0 to 9 or just 0 and 1 [9]. A general ”algorithm” used by all subjects was the non-repetition of numbers, while no 6
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea subject was aware that randomness involves some repetition of the numbers. The data were previously interpreted in terms of long-range correlation. However, we noticed that most of the series revealed non-power laws of the power spectra. Also, the detrended uctua tion analysis plots (not shown) proved that long-range correlation is not valid throughout the series. The authors were not aware at that time about the non-power law description of the process and the plots were approximated by 1/f spectra [9]. These ex- periments were reconsidered and new longer series of data were produced. The spectra were averaged and they easily revealed non-power law spectra. d) Astrophysics. Dierent type of astrophysical objects display variability phenomena featured by power-law power spectra, e.g. the light curve of the 3C273 quasar. Also, the amplitude spectrum (in log-log plots) of the intermediate polar AE Aquarii shows a-1 slope [18], while the power spectrum of its radio emission time variability revealed the presence of a red noise described by a power-law. The X-ray variability of Cygbus X-1 system [19] and of Be star γ Cassiopeiae [20] is also featured by power spectra displaying 1/f seg- ments. The X-ray variability of active galactic nuclei is also known to show red noise spectra which could be quite well tted by power-laws [21]. Deviations from the simple power-law behavior are emphasized by several authors. Thus, there are mentioned cases [22, 20, 23, 24] in which dierent frequencies domains are featured by dierent slopes, or the power spectra gradually atten toward low frequencies. Autoregressive analysis has been reported on the variability of X-ray light curves of the active Galaxy NGC5506 [10]. This analysis is compared with the present more simple approach, by using the same data extracted form Hearc Exosat ME archive for the Seyfert galaxy NGC5506. Many series of data have non-stationary characteristics, so the application of Fourier transform to the data results in misleading spectra. A common procedure to avoid this complication is to use detrended uctuation analysis (DFA) [25]. This results in a correla- tion exponent free of the correlation introduced by the trend. However, in our case it is essential to obtain the corresponding spectrum, as the shape of the spectrum gives the relevant information (either a power-law or a non-power law is operative). Consequently, an important preliminary step is to remove non-stationary characteristics in the series. We performed detrending by subtracting a polynomial t from the original series. The prob- lem is to determine the right polynomial t. We performed 1 to 20 degree polynomial ts and generally found that a polynomial degree around 10 gives the most reliable result for φ and σ. The values of φ and σ also depended on the averaging procedure of the spectra so that optimizing the values of φ and σ involved optimizing both the detrending and the averaging procedures. This will be further discussed in section 4. The above mentioned polynomial fitting was chosen as its accuracy is comparable to the moving average method and to an automatic method for the estimation of a monotone trend. The same work also showed that polynomial fitting for a 1/f noise proved to have the best performance [26]. The succession of operations can be summarized as following: i) Detrend the series of data by subtracting various degrees of polynomial ts; ii) Discrete Fourier transform of the series; iii) Periodogram averaging using 1-21 terms; iv) Fit the spectrum to an AR(1) model. The resulting parameters are the interaction factor φ and the dispersion σ. Their values depend on the degree of the polynomial t used for the detrending procedure, and on the number of spectral terms used for averaging procedure. v) If the data can be de- 7
Autoregressive description of biological phenomena scribed by an AR(1) model, choose the nal values of φ and σ, by analyzing the plot of φ and σ against the polynomial degree and the number of averaging terms. 4. Averaged spectra and fitting with AR(1) model An averaged periodogram for mobility of backbone atoms in human hemoglobin is shown in gure 3. The non-linearity of the periodogram is obvious; therefore, it can not be described by a power-law. On the other hand, the data can be better approximated by an AR(1) model, as shown in the same gure 3. Fig. 3. Averaged periodogram and its AR(1) t (continuous line) for the backbone atomic mobility of human hemoglobin (PDB code: 1a30). Detrending of the series was done by subtracting a 10 polynomial degree, and the number of averaged terms in the spectrum was M =21. The interaction factor is φ =0.93 and the noise dis- persion is σ =2.00. Further, an example of protein backbone atomic mobility for HIV-1 protease (PDB code: 4phv) is illustrated in gure 4. This example can not be described by a simple AR(1) model. The closest AR(1) model is included in gure 4 by a continuous line and it appears to be quite dierent from AR(1). Clearly a dierent model applies in this case. Fig. 4. Periodogram of the backbone atomic mobility for a HIV-1 protease (PDB code: 4phv). The series was preliminary detrended by subtracting a 10 polynomial degree t. The continuous line represents an AR(1) model with φ =0.99 and σ =1.13, which was the nearest t of the periodogram. 8
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea No Protein PDB code Classication Protein class from SCOP AR(1) fit 1 4phv HIV-1 protease; All beta No (aspartic proteinase) 2 1k34 Virus/viral protein Coiled coil No 3 1iiq HIV-1 protease; All beta Yes; hydrolase inhibitor φ =0.927 4 1gbu Deoxy human All alpha No hemoglobin 5 1fbd Hydrolase Alpha Yes; (phosphoric monoester) and beta φ =0.981 6 1c9m Hydrolase Alpha No and beta 7 1c9n Hydrolase Alpha No and beta 8 1a8g HIV-1 protease complex All beta Yes; (acid proteinase/inhibitor) φ =0.965 9 1a30 HIV-1 protease complex All beta Yes; (aspartic protease/inhibitor) φ =0.923 10 1qbs HIV-1 protease; All beta Yes; aspartyl protease φ =0.964 11 1gme Eukaryotic small heat All beta No shock protein; chaperone 12 1atr Chaperone protein Alpha and beta No Table 1: Classication and classes of proteins investigated by the AR(1) model. All the cases with no AR(1) t are described by higher order AR models. The results for φ were obtained for detrended series by using 10 degree polynomials and M =21 terms for averaging. Twelve randomly selected proteins, belonging to various protein classes, were ana- lyzed in order to nd out to what extend their spectra can be described by an autoregres- sive AR(1) model (table 1). Table 1 also includes the information about the autoregres- sive model in terms of ”yes” if there is a t to our data by AR(1) and the corresponding correlation coecient φ is indicated or ”no” if there is no such a fit. It is found that all cases do not have long-range characteristics, i.e. the double log plot of the averaged spectra deviates signicantly form linearity. About 40% of the cases can be described by an AR(1) model, while the rest of them can be described by higher order autoregressive models (table 1). The next case is taken from red blood cell ickering. An example of a spectral analy- sis of the area uctuation is shown in gure 5. The spectrum remains noisy even after the averaging method, but it can be clearly seen that an AR(1) model describes the data better than a power-law. 9
Autoregressive description of biological phenomena Fig. 5. Averaged periodogram for M =21 terms of human RBCs area uctuations suspended in buer solution and its tting by an AR(1) model (continuous line) with φ =0.52 and σ =17.92. The initial series was detrended by a 10 polynomial degree. The power spectrum also indicates some oscillations around the AR(1) curve in the higher frequency range. Similar to the case of proteins, it is suitable to use a 10 degree polynomial degree for detrending and 21 terms for the averaging procedure in order to obtain a reliable value for the interaction factor φ. This choice ensures the most constant value of φ for various averaging terms M. All series show that cell area uctuation is characterized by an interaction factor φ around 0.5 0.7. The data seem to be noisier than the protein examples, yet they can be denitely classied as short-range memory systems. Further, an example from a cognitive psychology experiment is given. This case can be well modeled by an AR(1) model with a negative φ (gure 6). From sixteen series gener- ated by dierent subjects only four series could be tted by the AR(1) model. Other se- ries produced typical spectra of the type shown in gure 7, which can not be described by an AR(1) model. Although this article is limited to an illustration with AR(1) model of various classes of phenomena, an example of higher order autoregressive model (gure 8) is included, which resembles qualitatively the data in gure 7. Fig. 6. Periodogram of mental series generated by the rst human subject tted by an AR(1) model (continuous line) with φ = 0.53 and σ =0.41.Note the anti-correlation character of the spectrum. 10
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea Fig. 7. Periodogram of mental series generated by the second human subject. The continuous line is the AR(1) t, that gives φ =0.17 and σ =2.22. This plot illustrates that the mental series can not be described by an AR(1) model; higher order AR models should be used. Fig. 8. Periodogram of an AR(2) model series. This AR model simulation is quantitatively similar to the men- tal series in gure 7 (it is characterized by a higher order AR), with φ 1 =0.8and φ 4 =0.3. This might suggests that the series are described by a strong interaction among neighbors and by a weaker anti-interaction at four terms distance. In other words, the numbers produced by a subject are strongly related to each other and it is quite the oppo- site at longer distances. In fact, all series which were not tted by AR(1) can be described by similar higher order AR models. These results show that mental series are generated according to dierent ”algorithms” depending on the subject and therefore, this method of analysis may represent a promising tool of investigation for such cognitive experi- ments. 5. Checking the AR(1) fitting procedure against an astrophysical example Our calculations have as a nal result the value of φ and σ of the AR(1) model tted to a particular series of data. The procedure was checked against a more complex system formed by series of astrophysical data. The variability of X-ray ux from galaxies has 11
Autoregressive description of biological phenomena been previously described as ickering or 1/f uctuation [10]. A specic problem in as- tronomical observations is the observational noise, as well as other misleading systematic eects occurring in power spectra. As a result, a specic model was considered in order to generalize AR processes and to estimate the hidden autoregressive process [10]. The model is known as the Linear State Space Model (LSSM) in which the observational noise is explicitly modeled. The results reported by König and Timmer for the rst order process AR(1) are the parameter φ = 0.994 and the standard deviation σ = 0.722. König and Timmer analysis was compared with the present more simple approach and proved to be in a very good agreement. The data series are represented in gure 9a. The time series consist of N = 6948 values, with the sample interval dt = 60s. There are only 20 values which are not equidistant. The maximum value of the sampling interval is 3310s. However, the time series are considered as being almost equidistant and further precautions will be reduced to minimum. Fig. 9. a) The X-ray time series of galaxy NGC5506; b) Trends of the time series obtained for three different polynomial fittings (0, 2, 5 and 9). The trend reduces to a constant when q = 0. for q = 2, the polynomial trend describes the global shape of the signal. For q = 5, the polynomial trend describes the different behavior of the first and last half of the signal. The polynomial trend can follow the details of the signal when q = 9. The rst problem is to remove the deterministic trend that can be found by examining the shape of the signal. Timmer and König did not extract any deterministic trend in their analysis. Three dierent trends can be obtained by dierent polynomial tting degrees (gure 9b). The degree of these polynomials equals the minimum degree of a class of polynomial trends q, which has very similar shapes. When the degree of the polynomial trend increases, the shape of the trend does not change monotonically. At certain poly- nomial degrees, the trend has more signicant changes, while for the following polyno- mial degrees the shape remains practically unchanged. The trend reduces to a constant, which equals the average of the temporal series when q = 0. There is no signicant im- provement for a linear trend (q = 1), while for q = 2 the polynomial trend describes the global shape of the signal. Only for q = 5 the polynomial trend describes the dierent behavior of the rst and last half of the signal. The polynomial trend can follow better the details of the signal when q = 9, however, numerical oscillations arise at the end, which can not be associated with real variations. These numerical oscillations increase with the degree of the polynomial, therefore they were not considered. The choice of these four representative values for the degree of the polynomial trend will be quantitatively conrmed by the variation of parameters characterizing the autoregressive model. 12
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea The calculation of the periodogram can not be done using equation 5, since there are missing data in twenty regions. The other values of the series are separated by time inter- val dt = 30s. Therefore, the series can be considered as equidistant and they have T = 7532 values, with 584 missing values. A zero value is assigned to these data, so they do not contribute in equation 5 to the value of the periodogram. The periodogram is pre- sented in gure 10a for the signal where only the mean value (q = 0) was subtracted, and in gure 10b for averaged periodogram with M = 21 values. It can be seen that the aver- aging procedure results in loosing data for the lower frequencies which describe the pla- teau of an autoregressive model. Fig. 10. a) Periodogram of the time series for galaxy NGC5506. The mean value q = 0 was subtracted; b) The averaged periodogram using a rectangular window with M = 21 values. The averaging procedure results in loosing data for the low frequencies that describe the plateau of an autoregressive model. At frequencies higher than the cutting frequency ν 0 = 0.02, the shape of the periodogram changes to a white noise spectrum. Only one part of the spectrum, for ν > ν 0 , can be modeled with an AR(1) process. At frequencies higher than the cutting frequency ν 0 = 0.02, the shape of the periodo- gram changes to a white noise spectrum, as it can be seen in gure 10b. Then, only one part of the spectrum, for ν > ν 0 , can be modeled with an AR(1) process. The values φ = 0.991, σ = 0.757 and φ = 0.985, σ = 0.744 are obtained by tting the lower frequencies part of the periodogram to an AR(1) process for the non-averaged and the averaged pe- riodogram respectively. These values are very close to those reported by Timmer and König. As in the former example, the averaging interval of periodogram has a strong inuence on the parameters of the AR(1) model. This dependence is shown in gure 11 as a function of M for dierent degrees of the polynomial trend. For small values of M, the values of φ and σ for dierent polynomial trends are quite dierent. Their variability for M > 9 is signicantly reduced, therefore the averaging procedure can eliminate the uctuation of the periodogram. This is why M = 11 is used in this investigation. 13
Autoregressive description of biological phenomena Fig. 11. a) Dependence of the parameters (a) φ and (b) σ for the AR(1) model on the polynomial trend degree (from 0 to 9) for dierent M values. Continuous lines represent the values reported by [10]. For small values of M, the values of φ and σ for different polynomial trends are quite different. For M > 9, their variability is sig- nificantly reduced, so the averaging procedure can eliminate the periodogram of fluctuation. Another parameter which has to be analyzed in order to t an autoregressive model is the cutting frequency ν 0 . It can be noticed that at frequencies smaller than 0.02 the perio- dogram presents oscillations, which can be caused by the white noise at higher frequen- cies. The dependence of the autoregressive parameters on the value of ν 0 is presented in gure 12. It shows that dispersion of the noise does not depend signicantly on the perio- dogram interval used for the tting procedure. However, for all the degrees of the poly- nomial trend eliminated from the initial signal, the value of φ decreases with the increase of ν 0 . Only the rst two values are almost equal. Such a decrease can be explained by the inuence of the observational white noise in the modeled part of the periodogram. Con- sequently, the cutting frequency was established at ν 0 = 0.005. Fig. 12. The dependence of the autoregressive parameters (a) φ and (b) σ on the cutting frequency ν 0 . All these results show a signicant dependence of the autoregressive parameters on the degree of the polynomial trend which was eliminated from the initial signal. The de- pendence of the autoregressive parameters on the degree of polynomial trend for several values of the cutting frequency is presented in gure 13. 14
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea Fig. 13. The dependence of the autoregressive parameters (a) φ and (b) σ on the degree of polynomial trend q. First, the stepwise variation at q = 2, 5 and 9 can be noticed, which corresponds to the discussion mentioned at the beginning of this section. Another discontinuity at q =11 was no longer used, as the oscillations at the end of the interval become too large. Also, it can be noticed that the autoregressive parameters decrease as the degree of the polynomials increases. This is due to the fact that higher polynomial degrees can better describe the oscillations of the signal, while for lower degrees they are modeled by the autoregressive process. As Koning and Timmer did not remove any deterministic trend from the signal, a comparison with their results should consider the results for q =0. The relative error for φ is 0.2% and 5% for σ, which conrms the spectral autoregressive modeling method of the time series proposed in this work. 6. Conclusions Four dierent kind of uctuating phenomena originating from molecular biology, cell biology, cognitive psychology and astrophysics proved to be short-range memory sys- tems, and a signicant percentage of them could be described by the most simple autore- gressive model of rst order AR(1). The characteristic parameters for this model were φ the strength of interaction among consecutive terms – and the dispersion σ of the data. Phenomena can be broadly classied into three categories according to φ: a) strong inter- acting systems with φ =0.800.99; b) medium interacting systems with φ =0.50.7, and c) weak interacting systems with φ =0.20.4. This classication is neither strictly delim- ited nor attributed to specic criteria apart from arbitrary numerical appreciation. The rst category includes the X-ray variability of galaxies and protein structures. In the sec- ond category falls the ickering of blood cells and in the third category, some data for a cognitive process, such as the random generation of numbers, and DNA structure (not shown). The proportion of AR(1) models in each category may vary to some extend. However, around 1/4 to 1/3 of the cases is described by AR(1). It is also evident that higher degree autoregressive models are operative in other cases. The parameter φ proved to be sensitive and therefore can be protably exploited to investigate various eects on the uctuating system. On the other hand, such processes can easily be confused and/or approximated by power-laws. The most important step in disclosing the nature of uctuations is to average out their spectra. Apparent 1/f spectra should be cautiously treated and averaging should be compulsory. It is suspected that 15
Autoregressive description of biological phenomena other phenomena are suitable for an autoregressive description, including higher order autoregressive models. Acknowledgements This work was funded by Romanian Academy for Scientic Research, Grant no. 2CEEX- 06-II-96. References [1] Stanley, H.E., Ostrowsky, N. Correlations and connectivity: geometric aspects of physics, chemistry and biology, Dordrecht: Kluwer, 1990. [2] Bunde, A., Havlin, S. Fractals in science, Berlin: Springer, 1994. [3] Milotti, E. 1/f noise: a pedagogical review. arxivpreprint,physics/0204033, 2002. [4] Mandelbrot, B.B. Multifractals and 1/f noise, New York: Springer, 1998. [5] Bassingthwaighte, J.B. et al. Fractal physiology, New York: Oxford University Press, 1994. [6] Morariu, V.V., Coza, A. Nonlinear properties of the atomic vibrations in protein backbones. Physica A, 320, 461-474, 2003. [7] Morariu, V.V., Isvoran, A., Zainea, O. A nonlinear approach to the structure- mobility relationship in protein main chains. Chaos, Solitons and Fractals, 32, 1305- 1315, 2007. [8] Morariu, V.V., Coza, A. 2001 Quarter power scaling in dynamics: experimental evidence from cell biology and cognitive psychology. Fluctuation and Noise Letters, 1, L111-L116, 2001. [9] Morariu, V.V., Coza, A., Chis, M. A., Isvoran, A., Morariu, L.C. Scaling in cogni- tion. Fractals, 9, 379-391, 2001. [10] König, M., Timmer, J. Analyzing X-ray variability by linear state space models. Suppl. Ser., 124, 589-596, 1997. [11] Timmer, J., Schwarz, U., Voss, H.U., Kurths, J. Linear and nonlinear time series analysis of the black hole candidate Cygnus X-1. Phys. Rev. E, 61, 1342-1352, 2000. [12] Thornton, Th.L., Gilden, D.L. Provenance of correlation in psychological data. Psychonomic Bulletin & Rev., 12, 409-441, 2005. [13] Brockwel, P.J., Davies, R.A. Time series: theory and methods, 2nd edn. New York: Springer, 1991. [14] Hamilton, J.D. Time series analysis, Princeton University Press, 1994. [15] Vamos, C., Soltuz, S.M., Craciun, M. Order 1 autoregressive process of nite length. arxiv:0709.2963, 2007. [16] Stoica, P., Moses, R.L. Introduction to spectral analysis, New Jersey: Prentice Hall, 1997. [17] Rasband, W.S.ImageJ, U. S. National Institutes of Health, Bethesda, Maryland, USA, http://rsb.info.nih.gov/ij, 1997-2007. [18] Elsworth, Y.P., James, J.F. The icker spectrum of AE Aquarii. Mon. Not. R. Astr. Soc., 198, 889-896, 1982. [19] Lochner, J.C., Swank, J.H., Szymkowiak, A.E. Shot model parameters for Cyg- nus X-1 through phase portrait tting. Astrophys. J., 376, 295-311, 1991. [20] Smith, M.A., Robinson, R.D. Interplay of periodic, cyclic and stochastic vari- ability in selected areas of the H-R diagram. ASP Conf. Ser., 292, 263-274, 2003. [21] Gaskell, C.M., Klimek, E.S. Variability of active galactic nuclei from the optical X-ray regions. Astron. Astrophys. Trans., 22, 661-679, 2003. 16
Morariu, Vamos, Pop, Soltuz, Buimaga-Iarinca, Zainea 17 [22] Uttley, P., McHardy, I.M. The ux-dependent amplitude of broadband noise variability in X-ray binaries and active galaxies. Mon. Not. R. Astr. Soc., 323, L26L30, 2001. [23] Papadakis, I.E., Lawrence, A. A detailed X-ray variability study of the Seyfert galaxy NGC 4051. Mon. Not. R. Astr. Soc. 272, 161-183, 1995. [24] McHardy, I.M., Papadakis, I.E., Uttley, P., Page, M.J., Mason, K.O. Combined long and short time-scale X-ray variability of NGC 4051 with RXTE and XMM-Newton. Mon. Not. R. Astr. Soc., 348, 783-801, 2004. [25] Peng, C.-K., Buldyrev, S.V., Havlin, S., Simons, M., Stanley, H.E., Goldberger, A.L. Mosaic organization of DNA nucleotides. Phys. Rev. E, 49, 1685-1689, 1994. [26] Vamos, C. Automatic algorithm for monotone trend removal. Phys. Rev. E, 75, 036705, 2007
2010

Related Posts