Properties of the complexity function for finite words

Abstract

The subword complexity function \(p_{w}\) of a finite word \(w\) over a finite alphabet A with card \(A=q\geq1\) is defined by \(p_{w}(n)=\)card \((F(w)\cap A^{n})\) for \(n\in N\), where \(F(w)\) represents the set of all the subwords or factors of \(w\). The shape of the complexity function, especially its piecewise monotonicity, is studied in detail. \newline \qquad The function h defined as \(h(n)=min\{q^{n},N-n+1\}forn\in \{0,1,…,N\}\) has values greater than or equal to those of the complexity function pw for any \(w\in A^{N}\) , i.e., \(pw(n)\leq h(n)\) for all \(n\in \{0,1,…,N\}\). As a first result regarding \(h\), it is proved that for each \(N\in N\) there exist words of length N for which the maximum of their complexity function is equal to the maximum of the function \(h\); a way to construct such words is described. This result gives rise to a further question: for a given \(N\), is there a word of length \(N\) whose complexity function coincides with h for each \(n\in \{0,1,…,N\}\)? The problem is answered in affirmative, with different constructive proofs for binary alphabets \((q=2)\) and for those with \(q>2\). This means that for each \(N\in N\), there exist words w of length \(N\) whose complexity function is equal to the function h. Such words are constructed using the de Bruijn graphs.

Authors

Mira-Cristiana Anisiu
T. Popoviciu” Institute of Numerical Analysis, Romanian Academy, Romania

Julien Cassaigne
Institut de Mathematiques de Luminy, CNRS UPR 9016 / FRUMAM, Case 907, 13288 Marseille, France,

Keywords

Subword complexity function; finite words; de Bruijn graph.

Paper coordinates

M.-C. Anisiu, J. Cassaigne, Properties of the complexity function for finite words, Rev. Anal. Numér. Théor. Approx. 33 (2) (2004), 123-139 (pdf file here)

PDF

About this paper

Journal
Publisher Name
Print ISSN
Online ISSN

google scholar link

[1] Anisiu, M.-C., Blazsik, Z.  and Kasa, Z., Maximal complexity of finite words, Pure Math. Appl., 13, pp. 39–48, 2002.
[2] de Bruijn, N. G., A combinatorial problem, Nederl. Akad. Wetensch. Proc., 49, pp. 758–764, 1946 = Indag. Math., 8, pp. 461–467, 1946.
[3] de Bruijn, N. G., Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of 2n zeros and ones that show each n-letter word exactly once,
T. H. -Report 75-WSK-06, Technological University Eindhoven, the Netherlands, pp. 1–14, 1975.
[4] Champernowne, D. G., The construction of decimals normal in the scale of ten, J. London Math. Soc., 8, pp. 254–260, 1933.
[5] Cummings, L. J. and Wiedemann, D., Embedded de Bruijn sequences, Proceedings of the 7th Southeastern international conference on combinatorics, graph theory, and computing (Boca Raton, Florida, 1986), Congr. Numer., 53, pp. 155–160, 1986.
[6] Ehrenfeucht, A., Lee, K. P. and Rozenberg, G., Subword complexities of various classes of deterministic developmental languages without interactions, Theoret. Comput. Sci., 1, pp. 59–75, 1975.
[7] Flye Sainte-Marie, C., Solution to question nr. 48, l’ Intermediaire des Mathematiciens, 1, pp. 107–110, 1894.
[8] Fredericksen, H., A survey of full length nonlinear shift register cycle algorithms, SIAM Review, 24, pp. 195–221, 1982.
[9] Games, R. A., A generalized recursive construction for de Bruijn sequences, IEEE Trans. Inform. Theory, 29, pp. 843–850, 1983.
[10] Good, I. J., Normal recurring decimals, J. London Math. Soc., 21, pp. 167–169, 1946.
[11] Heinz, M., Zur Teilwortkomplexit¨at f¨ur W¨orter und Folgenuber einem endlichen Alphabet, EIK, 13, pp. 27–38, 1977.
[12] Hunyadvary, L. and Ivanyi, A., On some complexity measures of words, Dep. Math., Karl Marx Univ. Econ., Budapest 1984-2, pp. 67–82, 1984.
[13] Hunyadvary, L. and Ivanyi, A. , Construction of complex chains sequences and ring sequences, in Abstracts of Colloquium Theory of Algorithms, Pecs, p. 20, 1984.
[14] Leve, F. and Seebold, P. , Proof of a conjecture on word complexity, Journees Montoises d’Informatique Theorique (Marne-la-Vallee, 2000), Bull. Belg. Math. Soc. Simon Stevin, 8, pp. 277–291, 2001.
[15] de Luca, A., On the combinatorics of finite words, Theoret. Comput. Sci., 218, pp 13–39, 1999.
[16] Martin, M. H., A problem in arrangements, Bull. American Math. Soc., 40, pp. 859–864, 1934.
[17] Morse, M. and Hedlund, G. A., Symbolic dynamics, Amer. J. Math., 60, pp. 815–866, 1938.
[18] Ralston, A., A new memoryless algorithm for de Bruijn sequences, J. Algorithms, 2, pp. 50–62, 1981.
[19] Shallit, J., On the maximum number of distinct factors in a binary string, Graph Comb., 9, pp. 197–200, 1993.
[20] Voros, N. , On the complexity measures of symbol-sequences, in Proceedings of the Conference of Young Programmers and Mathematicians (ed. A. Iv´anyi), Eotvos Lorand University, Budapest, pp. 43–50, 1984.

soon

Paper (preprint) in HTML form

PROPERTIES OF THE COMPLEXITY FUNCTION FOR FINITE WORDS

MIRA-CRISTIANA ANISIU* and JULIEN CASSAIGNE
Dedicated to Professor Elena Popoviciu on the occasion of her 80th birthday
Abstract

The subword complexity function pwp_{w} of a finite word ww over a finite alphabet AA with cardA=q1\operatorname{card}A=q\geq 1 is defined by pw(n)=card(F(w)An)p_{w}(n)=\operatorname{card}\left(F(w)\cap A^{n}\right) for nn\in\mathbb{N}, where F(w)F(w) represents the set of all the subwords or factors of ww. The shape of the complexity function, especially its piecewise monotonicity, is studied in detail.
The function hh defined as h(n)=min{qn,Nn+1}h(n)=\min\left\{q^{n},N-n+1\right\} for n{0,1,,N}n\in\{0,1,\ldots,N\} has values greater than or equal to those of the complexity function pwp_{w} for any wANw\in A^{N}, i.e., pw(n)h(n)p_{w}(n)\leq h(n) for all n{0,1,,N}n\in\{0,1,\ldots,N\}. As a first result regarding hh, it is proved that for each NN\in\mathbb{N} there exist words of length NN for which the maximum of their complexity function is equal to the maximum of the function hh; a way to construct such words is described. This result gives rise to a further question: for a given NN, is there a word of length NN whose complexity function coincides with hh for each n{0,1,,N}n\in\{0,1,\ldots,N\} ? The problem is answered in affirmative, with different constructive proofs for binary alphabets (q=2)(q=2) and for those with q>2q>2. This means that for each NN\in\mathbb{N}, there exist words ww of length NN whose complexity function is equal to the function hh. Such words are constructed using the de Bruijn graphs.

MSC 2000. 68R15.
Keywords. Subword complexity function, finite words, de Bruijn graph.

1. DEFINITIONS

Let an alphabet AA with card A=q1A=q\geq 1 be given. A factor (subword) uu of an infinite sequence or finite word ww has the right valence jj if there are jj and only jj distinct letters xix_{i} such that uxi,1ijux_{i},1\leq i\leq j are also in F(w)F(w) (the set of all the subwords, or factors, of ww ); if a factor has the right valence jj it can be extended on the right in exactly jj ways. The left valence is defined in a similar way. A factor having the right (left) valence 2\geq 2 is called right (left) special; a factor which is both right and left special is called bispecial. The length of a word ww will be denoted by |w||w|.

00footnotetext: *"T. Popoviciu" Institute of Numerical Analysis, P.O. Box 68, 400110 Cluj-Napoca, Romania, e-mail: mira@math.ubbcluj.ro.
Institut de Mathématiques de Luminy, CNRS UPR 9016 / FRUMAM, Case 907, 13288 Marseille, France, e-mail: cassaign@iml.univ-mrs.fr.

For an infinite sequence UU any factor uu can always be extended on the right in a factor of UU. For a finite word ww there are subwords which cannot be extended on the right. Such words have to be suffixes of ww. Let us denote by w0w_{0} the suffix of ww of minimal length which cannot be extended on the right and by KK the length of w0w_{0}. Then any other subword λw0\lambda w_{0} also cannot be extended on the right. Considering the prefix of ww of minimal length which cannot be extended on the left, we shall denote its length by HH. The constants KK and HH were defined by de Luca [15].

Let us denote by S0(w)S_{0}(w) the set of all suffixes of ww which cannot be extended on the right in F(w)F(w), i.e., their right valence is 0 . If the length of ww is NN, then we set for any 0nN0\leq n\leq N

s0(n)=card(S0(w)An)=s(0,n)s_{0}(n)=\operatorname{card}\left(S_{0}(w)\cap A^{n}\right)=s(0,n)

For all 0nN0\leq n\leq N, one has s0(n)1s_{0}(n)\leq 1. Moreover, KK being the length of w0w_{0} (the suffix of ww of minimal length which cannot be extended on the right), s0s_{0} is given by

s0(n)={0,0nK11,KnNs_{0}(n)=\begin{cases}0,&0\leq n\leq K-1\\ 1,&K\leq n\leq N\end{cases}

It follows that the number of subwords which cannot be extended on the right is

card(S0(w))=NK+1.\operatorname{card}\left(S_{0}(w)\right)=N-K+1.

For an infinite sequence UU, the (subword) complexity function pU:p_{U}:\mathbb{N}\longrightarrow\mathbb{N} (defined in [17] as the block growth, then named subword complexity in [6]) is given by pU(n)=card(F(U)An)p_{U}(n)=\operatorname{card}\left(F(U)\cap A^{n}\right) for nn\in\mathbb{N}, so it maps each nonnegative number nn to the number of factors of length nn of UU; it verifies the iterative equation

pU(n+1)=pU(n)+j=2q(j1)s(j,n),p_{U}(n+1)=p_{U}(n)+\sum_{j=2}^{q}(j-1)s(j,n), (1)

s(j,n)s(j,n) being the cardinal of the set of the factors of UU having the length nn and the right valence jj.

For a finite word ww of length NN, the complexity function pw:p_{w}:\mathbb{N}\longrightarrow\mathbb{N} given by pw(n)=card(F(w)An),np_{w}(n)=\operatorname{card}\left(F(w)\cap A^{n}\right),n\in\mathbb{N}, has the property that pw(n)=0p_{w}(n)=0 for n>Nn>N. The corresponding iterative equation is

pw(n+1)=pw(n)+j=2q(j1)s(j,n)s0(n).p_{w}(n+1)=p_{w}(n)+\sum_{j=2}^{q}(j-1)s(j,n)-s_{0}(n). (2)

Since s0(n)=s(0,n)s_{0}(n)=s(0,n) we can write (2) in a condensed form

pw(n+1)=pw(n)+j=0q(j1)s(j,n)p_{w}(n+1)=p_{w}(n)+\sum_{j=0}^{q}(j-1)s(j,n) (3)

The above relations have their correspondents in terms of left extensions of the subwords.

For a finite word ww of length NN over the alphabet AA with card A=qA=q, the subword complexity pw(n)p_{w}(n) will be less than or equal to the number qnq^{n} of all the possible words of length nn over the qq-letter alphabet and also less than or equal to the number Nn+1N-n+1 of all occurrences of subwords of length nn in ww. The map h:{0,1,,N}h:\{0,1,\ldots,N\}\longrightarrow\mathbb{N} defined in [15]

h(n)=min{qn,Nn+1}h(n)=\min\left\{q^{n},N-n+1\right\} (4)

will have values greater than or equal to those of any complexity function pwp_{w} for wANw\in A^{N}, i.e., pw(n)h(n),n{0,1,,N}p_{w}(n)\leq h(n),n\in\{0,1,\ldots,N\}. The fact that pw(n)Nn+1p_{w}(n)\leq N-n+1 was stated in [11], while pw(n)h(n)p_{w}(n)\leq h(n) appeared in [20].

We recall that for infinite sequences UU one has

pU(n)qn,n,p_{U}(n)\leq q^{n},n\in\mathbb{N},

and that there exist sequences, called complete, for which the complexity is precisely qnq^{n} for all nn\in\mathbb{N}. An example is the Champernowne sequence
0.1.10.11.100.101.110.111.1000. …
containing successively all the nonnegative integers written in base 2 , and, more generally, in base qq (it was used in [4] to construct a normal number in base ten).

2. PROPERTIES OF THE FUNCTION hh

Remark 1. We mention at first the trivial case when q=1q=1, for which h(n)=1h(n)=1 for all n{0,1,,N}n\in\{0,1,\ldots,N\}; there is a word, namely w1=aNw_{1}=a^{N}, whose complexity satisfies pw1(n)=1p_{w_{1}}(n)=1 for all n{0,1,,N}n\in\{0,1,\ldots,N\}.

For n1,q2n\geq 1,q\geq 2 and NqN\leq q, we have

Nn+1qqnN-n+1\leq q\leq q^{n}

hence h(n)=Nn+1h(n)=N-n+1 for all n{1,,N}n\in\{1,\ldots,N\}. For each word w2w_{2} containing NqN\leq q distinct elements of AA the complexity function is pw2(n)=Nn+1p_{w_{2}}(n)=N-n+1 for all n{1,,N}n\in\{1,\ldots,N\}, and pw2(0)=h(0)=1p_{w_{2}}(0)=h(0)=1, hence in this case it also coincides with hh.

In what follows we shall consider q2q\geq 2 and N>qN>q. The values of the function hh are given by the minimum of the values of an increasing exponential and of a descending line, so at the beginning hh will follow the exponential, and then the descending line. The following result is presented without proof in [15]:

Proposition 1. If eNe_{N} denotes the first point where ( eN,h(eN)e_{N},h\left(e_{N}\right) ) is on the descending line, the maximum hmaxh_{\max} of the function hh is attained at eNe_{N}, and eNe_{N} is given by [logqN]\left[\log_{q}N\right] or [logqN]+1\left[\log_{q}N\right]+1 (for a real xx, [x][x] denotes the largest integer which is less than or equal to xx ).

We shall determine precisely the point (eN,h(eN))\left(e_{N},h\left(e_{N}\right)\right) where the maximum of the function hh is taken.

Proposition 2. Let eN=min{m{1,,N}:h(m)=Nm+1}e_{N}=\min\{m\in\{1,\ldots,N\}:h(m)=N-m+1\}. If NN\in\mathbb{N} is given so that qk+k<N<qk+1+k+1q^{k}+k<N<q^{k+1}+k+1, then eN=k+1e_{N}=k+1 and this is the unique point where hh attains its maximum NkN-k; for N=qk+kN=q^{k}+k, we have eN=k+1e_{N}=k+1 and the function hh attains its maximum NkN-k at both eNe_{N} and eN1e_{N}-1.

In fact, if qk+kN<qk+1+k+1q^{k}+k\leq N<q^{k+1}+k+1, the function hh is given by

h(n)={qn if nkNn+1 if nk+1h(n)=\begin{cases}q^{n}&\text{ if }n\leq k\\ N-n+1&\text{ if }n\geq k+1\end{cases}

and hmax=Nk=h(k+1)h_{\max}=N-k=h(k+1), for N=qk+kN=q^{k}+k the maximum being attained also at the point kk.

Proof. Let NN\in\mathbb{N} be given so that qk+kN<qk+1+k+1q^{k}+k\leq N<q^{k+1}+k+1. The function hh being increasing on {0,1,,eN1}\left\{0,1,\ldots,e_{N}-1\right\} and decreasing on {eN,,N}\left\{e_{N},\ldots,N\right\}, we have only to compare its values on eN1e_{N}-1 and eNe_{N}. From the definition of hh and of eNe_{N} we have

qeN1<N(eN1)+1q^{e_{N}-1}<N-\left(e_{N}-1\right)+1 (6)

and

qeNNeN+1q^{e_{N}}\geq N-e_{N}+1 (7)

which means that

qeN1+eN1N<qeN+eNq^{e_{N}-1}+e_{N}-1\leq N<q^{e_{N}}+e_{N} (8)

But h(eN1)=qeN1h\left(e_{N}-1\right)=q^{e_{N}-1} and h(eN)=NeN+1h\left(e_{N}\right)=N-e_{N}+1, hence from (6) it follows that the maximum of hh is taken at eNe_{N}. The function f(x)=qx+xf(x)=q^{x}+x being increasing, from (8) one obtains eN=k+1e_{N}=k+1 and h(eN)=Nkh\left(e_{N}\right)=N-k.

If qk+k<N<qk+1+k+1q^{k}+k<N<q^{k+1}+k+1 we have h(k+1)>h(k)h(k+1)>h(k), so eN=k+1e_{N}=k+1 is the unique point where hh attains its maximum which is equal to NkN-k; if N=qk+kN=q^{k}+k, we have h(k+1)=h(k)h(k+1)=h(k), so the maximum NkN-k of hh is taken at two points eN=k+1e_{N}=k+1 and eN1e_{N}-1.

The description of hh given in (5) was established in [12]. The value of eNe_{N} being related to the integer part of logqN\log_{q}N, we can give a more precise result than that in the above Proposition 1.

Remark 2. For qk+kN<qk+1+k+1q^{k}+k\leq N<q^{k+1}+k+1, we have k=[logqN]k=\left[\log_{q}N\right] for qk+kN<qk+1q^{k}+k\leq N<q^{k+1}, and k+1=[logqN]k+1=\left[\log_{q}N\right] for qk+1N<qk+1+k+1q^{k+1}\leq N<q^{k+1}+k+1; it follows that in the first case eN=[logqN]+1e_{N}=\left[\log_{q}N\right]+1, and in the second case eN=[logqN]e_{N}=\left[\log_{q}N\right].

Given the number NN, the maximum of the function hh can be easily determined.

Example 1. Let N=6,q=2N=6,q=2; we obtain eN=3e_{N}=3 and hmax =4=h(2)=h(3)h_{\text{max }}=4=h(2)=h(3). For N=7,q=2N=7,q=2, we have eN=3e_{N}=3 and hmax =5=h(3)h_{\text{max }}=5=h(3).

3. PROPERTIES OF THE COMPLEXITY FUNCTION pwp_{w}

For an infinite sequence UU, the complexity function pUp_{U} is nondecreasing; if there exists mm\in\mathbb{N} such that pU(m+1)=pU(m)p_{U}(m+1)=p_{U}(m), then pUp_{U} is constant for nmn\geq m. The complexity function for a finite word ww of length NN has obviously a different behaviour, because of pw(N)=1p_{w}(N)=1 (there is a unique factor of length NN, namely w)w). The study of the shape of pwp_{w} was considered by Heinz [11] and then by de Luca in [15], the results in [15] being briefly exposed in what follows. Then the piecewise monotonicity of pwp_{w} is established in Theorems 5 and 6.

Let us consider for n{0,,N}n\in\{0,\ldots,N\} the number Rw(n)R_{w}(n) of all right special factors of length nn. Any suffix of a right special factor is still a right special factor. Since Rw(N1)=Rw(N)=0R_{w}(N-1)=R_{w}(N)=0, one can define an integer RR by

R=min{n:Rw(n)=0}R=\min\left\{n\in\mathbb{N}:R_{w}(n)=0\right\}

One has 0RN10\leq R\leq N-1; thus R1R-1 represents the maximal length of a right special factor of ww (excepting the case of the word aNa^{N} which has no special factor and for which R=0R=0 ). If R=1R=1, in ww there are no right special factors of length n1n\geq 1; such an example is w=(ab)k,k1w=(ab)^{k},k\geq 1. Similarly, there exists a number 0LN10\leq L\leq N-1 so that L1L-1 represents the maximal length of a left special factor of ww (except if w=aNw=a^{N} ). Remember the number K(H)K(H) representing the minimal length of a suffix (prefix) of ww which cannot be extended on the right (left). The numbers KK and RR (or their duals HH and L)L) play an important role in the description of the shape of pwp_{w}.

Let us denote

r(n)=j=2q(j1)s(j,n),n{1,,N}r(n)=\sum_{j=2}^{q}(j-1)s(j,n),n\in\{1,\ldots,N\}

The function rr has the property that r(n)>0r(n)>0 for n[0,R1]n\in[0,R-1], and r(n)=0r(n)=0 for n[R,N]n\in[R,N]. The recurrence relation (2) can be written as

pw(n+1)=pw(n)+r(n)s0(n)p_{w}(n+1)=p_{w}(n)+r(n)-s_{0}(n) (9)

If R<KR<K, for n[0,R1]n\in[0,R-1] one has s0(n)=0s_{0}(n)=0 and r(n)>0r(n)>0. From (9) we obtain that pwp_{w} is strictly increasing on [0,R][0,R]. For n[R,K1],s0(n)=0n\in[R,K-1],s_{0}(n)=0 and r(n)=0r(n)=0, so pwp_{w} is constant on the interval [R,K][R,K]. For n[K,N1]n\in[K,N-1], s0(n)=1s_{0}(n)=1 and r(n)=0r(n)=0, so pwp_{w} is strictly decreasing on [K,N][K,N], and, moreover, for n[K,N1],pw(n+1)=pw(n)1n\in[K,N-1],p_{w}(n+1)=p_{w}(n)-1.

If RKR\geq K, for n[0,K1]n\in[0,K-1] we obtain s0(n)=0,r(n)>0s_{0}(n)=0,r(n)>0, so from (9) pwp_{w} will be strictly increasing on [0,K][0,K]. For n[K,R1],s0(n)=1n\in[K,R-1],s_{0}(n)=1 and r(n)>0r(n)>0, hence pwp_{w} is non-decreasing on [K,R][K,R]. For n[R,N1]n\in[R,N-1] one has s0(n)=1s_{0}(n)=1 and r(n)=0r(n)=0, which implies that pwp_{w} is strictly decreasing on [R,N][R,N] and, for n[R,N1],pw(n+1)=pw(n)1n\in[R,N-1],p_{w}(n+1)=p_{w}(n)-1.

It follows
Proposition 3. [15]. The subword complexity pwp_{w} takes its maximum at RR and, moreover,

pw(R)=Nmax{R,K}+1p_{w}(R)=N-\max\{R,K\}+1

Proof. In both cases analyzed above, pwp_{w} has its maximum at RR. If RKR\geq K, pw(n+1)=pw(n)1p_{w}(n+1)=p_{w}(n)-1 for all n[R,N1]n\in[R,N-1], so 1=pw(N)=pw(R)(NR)1=p_{w}(N)=p_{w}(R)-(N-R) and then pw(R)=NR+1p_{w}(R)=N-R+1. If R<K,pw(n+1)=pw(n)1R<K,p_{w}(n+1)=p_{w}(n)-1 for all nn in [K,N1][K,N-1] and 1=pw(N)=pw(K)(NK)1=p_{w}(N)=p_{w}(K)-(N-K). Since pw(K)=pw(R)p_{w}(K)=p_{w}(R) the result follows.

In a similar way, one can prove
Proposition 4. [15]. The subword complexity pwp_{w} takes its maximum at LL and, moreover,

pw(L)=Nmax{L,H}+1,p_{w}(L)=N-\max\{L,H\}+1,

hence max{R,K}=max{L,H}\max\{R,K\}=\max\{L,H\}.
From the analysis before Proposition 3, we have the following information on the shape of the function pwp_{w} [15]:

For R<KR<K, it is strictly increasing (starting from pw(0)=1p_{w}(0)=1 and pw(1)=q=cardAp_{w}(1)=q=\operatorname{card}A ), then constant, and then strictly decreasing (with pw(n+1)=pw(n)1p_{w}(n+1)=p_{w}(n)-1 on the last interval).

For RK,pwR\geq K,p_{w} is at first strictly increasing, then non-decreasing, and at last strictly decreasing also with pw(n+1)=pw(n)1p_{w}(n+1)=p_{w}(n)-1.

So in both cases, there is an interval on which pwp_{w} is increasing and one on which pwp_{w} is strictly decreasing. The only problem is that in the second case it could be that after becoming constant, pwp_{w} would increase again. We show that this is not the case.

Let us consider n[K,R1]n\in[K,R-1], so s0(n)=1,r(n)>0s_{0}(n)=1,r(n)>0 and pw(n+1)pw(n)p_{w}(n+1)\geq p_{w}(n). Suppose that there exists nn so that

Kn<n+1<n+2R,\displaystyle K\leq n<n+1<n+2\leq R,
pw(n+1)=pw(n),\displaystyle p_{w}(n+1)=p_{w}(n), (10)
pw(n+2)>pw(n+1).\displaystyle p_{w}(n+2)>p_{w}(n+1). (11)

From (10) one obtains that s(2,n)=1s(2,n)=1 and s(j,n)=0s(j,n)=0 for j3j\geq 3, i.e., there exists a unique right special factor having length nn, and its valence is 2 : let it be denoted vnv_{n}. From (11) it follows that

r(n+1)=j=2q(j1)s(j,n+1)>1,r(n+1)=\sum_{j=2}^{q}(j-1)s(j,n+1)>1,

which is possible for two situations:
I. s(2,n+1)=2s(2,n+1)=2;
II. j3,s(j,n+1)0\exists j\geq 3,s(j,n+1)\neq 0.

If II. is true, there will be a right special factor of length n+1n+1 having valence at least 3, and then the factor obtained by excluding its first letter will have length nn and valence at least 3 , contradicting the uniqueness of vnv_{n}.

If I. is true, there will exist two different right special factors of length n+1n+1 and valence at least 2 . They can differ only by their first letter, otherwise there would exist two different factors of length nn and valence 2 . So they will have the form

avn,bvn,abav_{n},bv_{n},a\neq b

i.e., vnv_{n} will be bispecial, and in the word ww there will be also

avnc,avnd,bvnc,bvnd,ab,cdav_{n}c,av_{n}d,bv_{n}c,bv_{n}d,a\neq b,c\neq d (12)

The subword vnv_{n} cannot be a suffix of ww since vnv_{n} is extendable to the right and there is no extendable suffix of length greater than or equal to KK. Let us consider the last occurrence of vnv_{n}, suppose it is followed by cc. Then

w=z1vncz2w=z_{1}v_{n}cz_{2} (13)

and, vncv_{n}c being left special, vncv_{n}c will have another occurrence in ww, so

w=z1vncz2,|z2|>|z2|w=z_{1}^{\prime}v_{n}cz_{2}^{\prime},\left|z_{2}^{\prime}\right|>\left|z_{2}\right|

Let uu be the longest common prefix of vncz2v_{n}cz_{2} and vncz2v_{n}cz_{2}^{\prime}, which will satisfy

n+1|u||vncz2|n+1\leq|u|\leq\left|v_{n}cz_{2}\right|

Since the subword uu is a proper prefix of vncz2,uv_{n}cz_{2}^{\prime},u is right extendable; then it cannot be a suffix of ww, hence it is also a proper prefix of vncz2v_{n}cz_{2}, and thus right special. The suffix of length nn of uu is then right special, in contradiction with the fact that the last occurrence of vnv_{n}, the unique right special factor of length nn, was chosen so that w=z1vncz2w=z_{1}v_{n}cz_{2}. It follows that in the case K<RK<R, if pw(n)=pw(n+1)p_{w}(n)=p_{w}(n+1) for a value nKn\geq K, then pwp_{w} will remain constant until it will begin (at RR ) to decrease to 1 (it cannot start increasing again). We mention that Heinz [11] proved that from pw(n)=pw(n+1)p_{w}(n)=p_{w}(n+1) and N>pw(n)+nN>p_{w}(n)+n it follows pw(n)=pw(n+2)p_{w}(n)=p_{w}(n+2).

Let us denote by JJ the smallest number greater than or equal to KK for which ww has precisely one right special factor of that length, with valence 2 (if this is not the case, take J=RJ=R ). We have established the following

Theorem 5. For a finite word of length NN, the complexity function is at first strictly increasing, then constant and at last decreasing having the slope -1 . If R<KR<K, the successive intervals are [0,R],[R,K][0,R],[R,K] and [K,N][K,N], while for RKR\geq K they are [0,J],[J,R][0,J],[J,R] and [R,N][R,N].

One can easily avoid to analyze two cases by simply considering instead of a word wANw\in A^{N}, a word W(A{,#})N+2W\in(A\cup\{*,\#\})^{N+2} obtained by adding two different symbols which are not in AA at the beginning and at the end of ww, i.e., W=w#W=*w\#. The complexity functions for ww and WW are related by pW(n)=pw(n)+2p_{W}(n)=p_{w}(n)+2 for n{1,,N+1}n\in\{1,\ldots,N+1\} (and obviously pW(N+2)=1p_{W}(N+2)=1 ). So the graph of pWp_{W} is the graph of pwp_{w} shifted by two units parallel to the yy-axis,
and the two functions have the same monotonicity. For WW we have KW=1K_{W}=1, RWRwR_{W}\geq R_{w} and, similarly, HW=1,LWLwH_{W}=1,L_{W}\geq L_{w}, hence in this case we have always RWKWR_{W}\geq K_{W}; from Proposition 4 it follows also RW=LWR_{W}=L_{W}. The advantage of considering the word WW is that instead of the four parameters K,H,R,LK,H,R,L we are left with only one, namely the common value MM of RW=LWR_{W}=L_{W}. Denoting by JJ the smallest positive number for which WW has precisely one right special factor of that length, with valence 2 (if there is not such a factor, J=MJ=M ), we obtain

Theorem 6. For a finite word ww of length NN, the intervals of monotonicity of pwp_{w} are [0,J],[J,M][0,J],[J,M] and [M,N][M,N], the function increasing at first, being constant and then decreasing with the slope -1 ; the maximum of pwp_{w} is pw(M)=NM+1p_{w}(M)=N-M+1. The numbers JJ and MM are those defined above for the word W=w#W=*w\#.

Example 2. Let w=babbabbbaaw=babbabbbaa, so N=10,K=2,R=5,H=5,L=4N=10,K=2,R=5,H=5,L=4, J=4,M=5J=4,M=5. In this case pw(0)=1,pw(1)=2,pw(2)=4,pw(3)=5p_{w}(0)=1,p_{w}(1)=2,p_{w}(2)=4,p_{w}(3)=5, pw(4)=6p_{w}(4)=6, and pw(n)=11np_{w}(n)=11-n for n{5,,10}n\in\{5,\ldots,10\}.

For w=aababbababw=aababbabab, we have N=10,K=5,R=4,H=2,L=5,J=4N=10,K=5,R=4,H=2,L=5,J=4, M=5M=5 and pw(0)=1,pw(1)=2,pw(2)=4,pw(3)=5,pw(4)=6p_{w}(0)=1,p_{w}(1)=2,p_{w}(2)=4,p_{w}(3)=5,p_{w}(4)=6, and pw(n)=11np_{w}(n)=11-n for n{5,,10}n\in\{5,\ldots,10\}.

REMARK 3. For the words in example 2, the function pwp_{w} is concave on [1,N][1,N], i.e.,

pw(k+2)pw(k+1)pw(k+1)pw(k),k{1,,N2}p_{w}(k+2)-p_{w}(k+1)\leq p_{w}(k+1)-p_{w}(k),k\in\{1,\ldots,N-2\}

However this is not the rule, as the following examples show for both K<RK<R and K>RK>R.

Example 3. Let w=abbabbbaabaw=abbabbbaaba, so N=11,K=3,R=4,H=4,L=4N=11,K=3,R=4,H=4,L=4, J=4,M=4J=4,M=4. In this case pw(0)=1,pw(1)=2,pw(2)=4,pw(3)=7p_{w}(0)=1,p_{w}(1)=2,p_{w}(2)=4,p_{w}(3)=7, and pw(n)=12np_{w}(n)=12-n for n{4,,11}n\in\{4,\ldots,11\}.

Let w=abbabbbaababbbaw=abbabbbaababbba, so N=15,K=7,R=4,H=4,L=7,J=4N=15,K=7,R=4,H=4,L=7,J=4, M=7M=7. In this case pw(0)=1,pw(1)=2,pw(2)=4,pw(3)=7,pw(4)=pw(5)=pw(6)=pw(7)=9p_{w}(0)=1,p_{w}(1)=2,p_{w}(2)=4,p_{w}(3)=7,p_{w}(4)=p_{w}(5)=p_{w}(6)=p_{w}(7)=9, and pw(n)=16np_{w}(n)=16-n for n{8,,15}n\in\{8,\ldots,15\}.

We mention that the refinement of de Luca’s result has been proved independently by Levé and Séébold [14] while studying kk-reachable integers.

4. THE FUNCTION hh AND RELATED WORDS

In section 2 we found the point where the function hh takes its maximum. A problem to be considered is the following: are there any words ww of length NN such that h(eN)=max{pw(n):n{0,,N}}h\left(e_{N}\right)=\max\left\{p_{w}(n):n\in\{0,\ldots,N\}\right\} ? If such words do exist, they have the property that the maximum of their complexity function cannot be exceeded by the maximum of the complexity function of any other word of length NN.

The answer to this problem is in affirmative and it relies on the following result which was stated by Good in [10] for q=2q=2. The enumeration of the words whose existence is proved was given by de Bruijn [2], who later [3] acknowledged the priority of C. Flye Sainte-Marie [7].

Lemma 7. Given an alphabet AA with card A=qA=q, for each kk\in\mathbb{N} the shortest word containing all the qkq^{k} words of length kk has qk+k1q^{k}+k-1 letters.

Proof. The existence of such a word (which is usually named de Bruijn word of order kk ) is proved by considering the de Bruijn graph Bk1B_{k-1} (which is strongly connected) with qk1q^{k-1} vertices labelled with the elements of Ak1A^{k-1}, and qkarcs(q^{k}\operatorname{arcs}( an arc\operatorname{arc} from uu to vv exists if and only if there exist two letters x,yAx,y\in A such that ux=yvAk)\left.ux=yv\in A^{k}\right). Each vertex has the same number qq of inward and outward arcs; therefore, there exists an Eulerian cycle, and each path, starting from any vertex and following the cycle until coming back to that vertex, will provide a word (obviously the shortest) of length qk+k1q^{k}+k-1 which contains exactly one occurrence of all the qkq^{k} words of length kk. The word of length qk+k1q^{k}+k-1 is often identified with the cycle formed by its first qkq^{k} letters.

Remark 4. For the de Bruijn word of order kk, whose existence was proved in Lemma 7, we have R=K=J=kR=K=J=k, and the maximum of its complexity function is attained at kk and equals qkq^{k}. Such a word can be represented in the form x1xqkxqk+k1x_{1}\ldots x_{q^{k}}\ldots x_{q^{k}+k-1} (with xqk+1xqk+k1=x1xk1x_{q^{k}+1}\ldots x_{q^{k}+k-1}=x_{1}\ldots x_{k-1} ), or as a cycle (x1xqk)\left(x_{1}\ldots x_{q^{k}}\right) or as an infinite periodic sequence with the period qkq^{k}.

The first algorithm which constructs such a word was given by Martin [16]. Considering the alphabet A={i1,,iq}A=\left\{i_{1},\ldots,i_{q}\right\}, the algorithm in question is built up out the following three rules.
I. Each of the first k1k-1 symbols is chosen equal to i1i_{1}.
II. The symbol ama_{m} to be added to the sequence a1a2akamk+1am1a_{1}a_{2}\ldots a_{k}\ldots a_{m-k+1}\ldots a_{m-1}, where a1==ak1=i1,mka_{1}=\ldots=a_{k-1}=i_{1},m\geq k and the aa ’s stand for the ii ’s in a certain order, is the iji_{j} with the greatest subscript consistent with the requirement that the section amk+1am1ama_{m-k+1}\ldots a_{m-1}a_{m} duplicate no previously occurring section of kk symbols in the above sequence.
III. Rule II. is first applied for m=km=k (in which case am=ak=iqa_{m}=a_{k}=i_{q} ) and is then applied repeatedly until a further application is impossible.

This algorithm needs a very large memory (for all the subwords of length kk which have already been obtained), but there exist also some memoryless algorithms exposed, for example, in [8], [9] and [18].

We can prove now
Theorem 8. For each NN\in\mathbb{N}, there exists a word of length NN over an alphabet AA with cardA=q\operatorname{card}A=q for which the maximum of the complexity function is equal to the maximum hmaxh_{\max} of hh; the maximum is taken at the same points for both functions. Such words can be easily constructed using the de Bruijn words.

Proof. Keeping in mind the considerations in Remark 1, which mean precisely that the theorem is true for q=1q=1 and for q2,Nqq\geq 2,N\leq q, we shall consider q2q\geq 2 and N>qN>q. Let kk be the unique natural number so that

qk+kNqk+1+kq^{k}+k\leq N\leq q^{k+1}+k

If N=qk+kN=q^{k}+k, we apply Lemma 7 for kk, obtaining a word of length N1N-1 containing as factors all the qkq^{k} words of kk letters, and qk1q^{k}-1 distinct words of length k+1k+1. The word vv obtained by adding a letter from AA at its end will contain qkq^{k} words of kk letters and qkq^{k} distinct words of length k+1k+1, hence pv(k)=pv(k+1)=Nkp_{v}(k)=p_{v}(k+1)=N-k. This is the maximum of the function pvp_{v}, it is equal to hmax h_{\text{max }} and it is attained at the same points as the maximum of hh given in Proposition 2. Actually, in this case we have pv=hp_{v}=h.

Let us now consider the case N=qk+1+km,m{0,1,,qk+1qk1}N=q^{k+1}+k-m,m\in\left\{0,1,\ldots,q^{k+1}-q^{k}-\right.1\}. Applying Lemma 7 for the number k+1k+1, we obtain a shortest word ww containing all the qk+1q^{k+1} words of length k+1k+1, having qk+1+kq^{k+1}+k letters. So for each m{0,1,,qk+1qk1}m\in\left\{0,1,\ldots,q^{k+1}-q^{k}-1\right\}, the prefix wmw_{m} of ww obtained by deleting mm final letters will satisfy pwm(k+1)=qk+1m>qkpwm(k)p_{w_{m}}(k+1)=q^{k+1}-m>q^{k}\geq p_{w_{m}}(k), this being the maximum of the complexity function for the considered word. The maximum is attained only for k+1k+1.

Applying Proposition 2 for N=qk+1+kmN=q^{k+1}+k-m we obtain hmax =h(k+1)=qk+1+kmk=qk+1mh_{\text{max }}=h(k+1)=q^{k+1}+k-m-k=q^{k+1}-m, which means that the maximum of the complexity function for wmw_{m} is equal to the maximum of the complexities of all possible words of length qk+1+kmq^{k+1}+k-m.

Example 4. Let us consider for the 2-letter alphabet A={a,b}A=\{a,b\} the values N=6N=6 and N=7N=7. For N=6=22+2N=6=2^{2}+2 we have k=2k=2 and, by adding a letter (for example aa ) to the Martin word of order 2, abbaa, we obtain v=abbaaav=abbaaa; the maximum of pvp_{v} is pv(2)=pv(3)=4p_{v}(2)=p_{v}(3)=4. For N=7=22+3N=7=2^{2}+3 we can consider the Martin word of order 3, aabbbabaaa, and delete three symbols from the end. The word w3=aabbbabw_{3}=aabbbab has the maximum of its complexity function given by pw3(3)=5p_{w_{3}}(3)=5.

The maximum of the function hh for N=6N=6 and N=7N=7 was calculated in Example 1 and it coincides with that of pvp_{v}, respectively pw3p_{w_{3}} and is taken at the same points.

5. THE REPRESENTATION OF hh AS A COMPLEXITY FUNCTION

An interesting problem is: Let q1q\geq 1 and NN\in\mathbb{N} be given and the function h:{0,1,,N}h:\{0,1,\ldots,N\}\longrightarrow\mathbb{N} defined as in (4). Is there a word ww of length NN over the qq-letter alphabet AA such that

h(n)=pw(n) for all n{0,1,,N}h(n)=p_{w}(n)\text{ for all }n\in\{0,1,\ldots,N\} (14)

i.e., hh is the complexity function for that word? If such a word does exist, how can it be constructed?

The question has an affirmative answer for the trivial cases q=1q=1 and q2q\geq 2, NqN\leq q, mentioned in Remark 1, so it has to be studied for q2,N>qq\geq 2,N>q. In the proof of Theorem 8 it was shown that, given the number N=qk+k,k1N=q^{k}+k,k\geq 1, there exists a word vv of length NN containing qkq^{k} distinct words of length k+1k+1, and also qkq^{k} words of length kk. This means that hh and pvp_{v} coincide on kk and k+1k+1. One the one hand, pv(k)=h(k)=qkp_{v}(k)=h(k)=q^{k} means that vv contains all possible words of length kk as factors, and this implies that it also contains all possible words of shorter lengths, hence h(n)=pv(n)=qnh(n)=p_{v}(n)=q^{n} for n{0,1,,k}n\in\{0,1,\ldots,k\}. On the other hand, pv(k+1)=h(k+1)=Nkp_{v}(k+1)=h(k+1)=N-k means that each of the NkN-k factors of length k+1k+1 of vv occurs exactly once, as there are precisely NkN-k available positions for a factor of this length, and this implies that longer factors occur only once too, hence h(n)=pv(n)=N+1nh(n)=p_{v}(n)=N+1-n for n{k+1,k+2,,N}n\in\{k+1,k+2,\ldots,N\}. We have shown that h(n)=pv(n)h(n)=p_{v}(n) for all n{0,1,,N}n\in\{0,1,\ldots,N\}, and the question is positively answered for N=qk+k,k1N=q^{k}+k,k\geq 1.

If we consider now N=qk+1+k,k1N=q^{k+1}+k,k\geq 1, case which corresponds to the choice m=0m=0 in the proof of Theorem 8, we obtain the existence of a word w=w0w=w_{0} of length NN containing all qk+1q^{k+1} words of length k+1k+1. The point ( k+1,pw(k+1)k+1,p_{w}(k+1) ) being on both the curves ( n,qnn,q^{n} ) and ( n,N+1nn,N+1-n ), it follows that h(n)=pw(n)h(n)=p_{w}(n) for all n{0,1,,N}n\in\{0,1,\ldots,N\}.

We mention at first a sufficient condition for the existence, for q2q\geq 2 and N>qN>q, of a word ww of length NN whose complexity function is equal to hh.

Lemma 9. Given an alphabet with cardA=q2\operatorname{card}A=q\geq 2, if for each k1k\geq 1 there exists a de Bruijn word vv of order k+1k+1 from which it is possible to obtain successively words shorter with one symbol so that the number of subwords of length k+1k+1 decreases by one, but the number of words of length kk remains qkq^{k}, until we are left with a word of length qk+kq^{k}+k, then for each N{qk+k,,qk+1+k}N\in\left\{q^{k}+k,\ldots,q^{k+1}+k\right\} there exists a word vNv_{N} with pvN=hp_{v_{N}}=h.

Proof. Let vNv_{N} be the word of length N{qk+k,,qk+1+k}N\in\left\{q^{k}+k,\ldots,q^{k+1}+k\right\} obtained from vv after having removed qk+1+kNq^{k+1}+k-N letters, at each step the number of subwords of length k+1k+1 being diminished by 1 , while the number of subwords of length kk remains constant. Then pvN(k+1)=Nk=h(k+1)p_{v_{N}}(k+1)=N-k=h(k+1) and pvN(k)=qk=h(k)p_{v_{N}}(k)=q^{k}=h(k), hence pvN(n)=h(n)p_{v_{N}}(n)=h(n) for each n{0,,N}n\in\{0,\ldots,N\}.

Remark 5. The condition in Lemma 9 is fulfilled if there exists a de Bruijn word of order k+1k+1 whose prefix is a de Bruijn word of order kk. In this case we can simply delete in turn one letter from the end of the word of order k+1k+1.

The existence of words which satisfy the conditions in Lemma 9 (in fact those in Remark 5) was proved for q3q\geq 3 by Vörös [20]. It follows also as a consequence of a stronger result obtained by Cummings and Wiedemann in Proposition 2 from [5]. In fact the overlap of the two de Bruijn sequences in [5] is even longer than it is needed in Remark 5. We remind that the de Bruijn graph BkB_{k} has as vertices the elements in AkA^{k} and an arc from any vertex x1xkx_{1}\ldots x_{k} to x2xkxk+1x_{2}\ldots x_{k}x_{k+1}, where xiAx_{i}\in A for i{1,,k+1}i\in\{1,\ldots,k+1\}. The graph
Bk+1B_{k+1} has as vertices the arcs of BkB_{k}, and the arcs in this graph are obtained by joining two consecutive arcs in BkB_{k}. An Eulerian circuit in BkB_{k} corresponds to a Hamiltonian one in Bk+1B_{k+1} and conversely. The result of Cummings and Wiedemann follows from the fact that if one removes from the Eulerian circuit in BkB_{k}, which corresponds to a de Bruijn sequence of order k+1k+1, the circuit corresponding to a de Bruijn sequence of order kk, the remaining graph is still Eulerian and connected (it is essential that q3q\geq 3 ).

Lemma 10. [5]. If q3q\geq 3 and k1k\geq 1 each de Bruijn sequence of order kk can be strongly embedded in a de Bruijn sequence of order k+tk+t with t1t\geq 1, i.e., the two sequences have the same symbols on the first qk+k+t1q^{k}+k+t-1 positions.

It follows that, for q3q\geq 3, there exist infinite sequences whose prefixes of length NN satisfy (14) for each NN\in\mathbb{N}. Such sequences were called in [13] and [20] supercomplex; similarly, a word of length MM was called supercomplex if all its prefixes of length NMN\leq M satisfied (14). In [13] and [20] it was shown that supercomplex sequences do not exist for binary alphabets, more precisely it was verified that a binary supercomplex word has the length at most 9 . This means that no de Bruijn sequence of order 2 can be embedded in a de Bruijn sequence of order 3. In [5] a general negative result is given for binary alphabets: in this case no de Bruijn sequence of order k2k\geq 2 ever embeds in a de Bruijn sequence of order k+1k+1 (even if we ask the coincidence to take place only for the first 2k+k12^{k}+k-1 positions, hence a weak embedding). It follows that for a binary alphabet we cannot obtain a word as that in the sufficient condition in Remark 5, unless k=1k=1. Nevertheless we can construct in this case a de Bruijn word of order k+1k+1 from which the sequences in Lemma 9 can be obtained, even if this word has not as a prefix a de Bruijn word of order kk.

Lemma 11. A finite number of cycles can be appended to any de Bruijn cycle of order kk over a binary alphabet in order to make it a de Bruijn cycle of order k+1k+1.

Proof. Let w=(x1x2k)w=\left(x_{1}\ldots x_{2^{k}}\right) be a de Bruijn cycle of order kk. It will be also a Hamiltonian circuit in the de Bruijn graph BkB_{k}. The graph GG, formed by all the vertices in AkA^{k} and the arcs in BkB_{k} which are not in the Hamiltonian circuit determined by ww, has each vertex of degree 2 (one outward and one inward arc). It follows that GG will be a union of vertex disjoint cycles, and ww and GG are arc disjoint. Each of these cycles will have common vertices with the Hamiltonian circuit determined by ww, hence they can be appended one by one to it to form finally an Eulerian circuit in BkB_{k}, that is a de Bruijn cycle of order k+1k+1.

Now we can state
Theorem 12. For each alphabet AA with cardA=q1\operatorname{card}A=q\geq 1 and for each NN\in\mathbb{N} there exists a word of length NN whose complexity function coincides with the function hh.

Proof. For q=1q=1, or q2q\geq 2 and NqN\leq q, the result was already proved in Remark 1.

Let q=2q=2 and k1k\geq 1 so that N{2k+k,,2k+1+k}N\in\left\{2^{k}+k,\ldots,2^{k+1}+k\right\}. Consider a de Bruijn cycle of order kk (constructed for example using Martin’s algorithm) and extend it as in Lemma 11, by adding vertex disjoint cycles, to a de Bruijn cycle of order k+1k+1. Write it as a de Bruijn word such that it ends with the letters of one of the appended cycles. When we remove one by one all the symbols in that cycle, the number of subwords of length k+1k+1 will decrease at each step by one, but the number of subwords of length kk will remain the same (all these subwords are included in the initial de Bruijn cycle). Write again the obtained cycle as a word which ends with another appended cycle and delete in turn the last symbol until the cycle disappears. Finally we are left with a word of length 2k+k2^{k}+k obtained from the initial de Bruijn cycle of order kk, which contains 2k2^{k} words of length k+1k+1 and 2k2^{k} words of length kk.

If we are not interested to obtain all the words of length N{qk+k,N\in\left\{q^{k}+k,\ldots\right., qk+1+k}\left.q^{k+1}+k\right\}, but only a specific one, we can apply a result of Shallit [19]: For each i{1,,2k}i\in\left\{1,\ldots,2^{k}\right\} the graph BkB_{k} contains a cycle of length ii that can be used to construct a closed chain of length 2k+i2^{k}+i which visits every vertex at least once.

Finally, let q3q\geq 3 and k1k\geq 1 so that N{qk+k,,qk+1+k}N\in\left\{q^{k}+k,\ldots,q^{k+1}+k\right\}. Applying Lemma 10 for t=1t=1, we obtain the existence of a de Bruijn word of order k+1k+1 which has as prefix a de Bruijn word of order kk, hence it satisfies the conditions in Lemma 9. It follows that for each N{qk+k,,qk+1+k}N\in\left\{q^{k}+k,\ldots,q^{k+1}+k\right\} there exists a word of length NN (obtained by successively deleting a symbol from the end of the de Bruijn word of order k+1k+1 ) whose complexity function is the function hh corresponding to that NN.

Example 5. Let us first consider the case of a binary alphabet A={a,b}A=\{a,b\}. We shall construct, as in the proof of Theorem 12, words uNu_{N} with N{1,2,,10}{37,,69}N\in\{1,2,\ldots,10\}\cup\{37,\ldots,69\} for which puN=hp_{u_{N}}=h. We can obviously consider u1=au_{1}=a and u2=abu_{2}=ab. We have a weak embedding, marked by a gap, of the de Bruijn word of order 1, abab, in the de Bruijn word of order 2, abbaaabbaa (situation which is no longer possible for words of order kk, respectively k+1k+1 for k2k\geq 2 ). We obtain in turn u5=abbaa,u4=abbau_{5}=abbaa,u_{4}=abba and u3=abbu_{3}=abb. Let us now consider for k=2k=2 the Martin cycle w=(abba)w=(abba), which corresponds to the word u5=abbaau_{5}=abbaa. The graph GG obtained from B2B_{2} by removing all the arcs of ww is the union of the cycles (a)(a), (b)(b) and (ab)(ab), i.e., aaaa,bbbbaa\rightarrow aa,bb\rightarrow bb, respectively abbaabab\rightarrow ba\rightarrow ab. Appending these to the cycle ww we obtain for instance the de Bruijn cycle of order 3u=(aa¯a¯b¯b¯b¯a)3u=(a\underline{a}\underline{a}\underline{b}\underline{b}\underline{b}a), where we underlined the appended cycles; we write it as

u10=abbb¯aaa¯bab¯;u_{10}=abb\underline{b}aa\underline{a}b\underline{ab};

deleting the symbols of ( abab ) from the end, and then those in the loops ( aa ) and (b) (which can be deleted without shifting the cycle) we obtain in turn

u9=abbb¯aaa¯ba,u8=abbb¯aaa¯b,u7=abbb¯aab,u6=abbaab.u_{9}=abb\underline{b}aa\underline{a}ba,u_{8}=abb\underline{b}aa\underline{a}b,u_{7}=abb\underline{b}aab,u_{6}=abbaab.

For all these words obtained from uu we have puN(2)=4p_{u_{N}}(2)=4 and puN(3)=N2p_{u_{N}}(3)=N-2, N{6,,10}N\in\{6,\ldots,10\}.

In general, besides the loops ( aa ) and ( bb ), for greater values of kk we can have in GG more than one cycle. We shall consider now k=5k=5 and we shall construct the words of length N{37,,69}N\in\{37,\ldots,69\}. To avoid too lengthy words we shall write xix^{i} for the concatenation of ii letters xx. For k=5k=5, the Martin cycle is w=(a4b5ab3a2b2abab2a3baba2ba)w=\left(a^{4}b^{5}ab^{3}a^{2}b^{2}abab^{2}a^{3}baba^{2}ba\right). The graph GG obtained from B5B_{5} by removing the arcs of ww is the union of the cycles

(a),(b),(ab),(ab2),(a4ba2b3aba3b2a2bab4),(a),(b),(ab),\left(ab^{2}\right),\left(a^{4}ba^{2}b^{3}aba^{3}b^{2}a^{2}bab^{4}\right), (15)

where, for example, ( aa ) represents a5a5a^{5}\rightarrow a^{5}, and ( ab2ab^{2} ) is ab2abb2ab2bab2aab2abab^{2}ab\rightarrow b^{2}ab^{2}\rightarrow bab^{2}a\rightarrow ab^{2}ab. A de Bruijn cycle of order 6 obtained by appending the five cycles (which will be underlined) is

u=(a4a¯b4a¯4ba2b3aba3b2a2bab4bb¯ab2ab2¯ba2b2ababab¯ba3baba2ba)u=\left(a^{4}\underline{a}b^{4}\underline{a}^{4}ba^{2}b^{3}aba^{3}b^{2}a^{2}bab^{4}b\underline{b}ab^{2}\underline{ab^{2}}ba^{2}b^{2}abab\underline{ab}ba^{3}baba^{2}ba\right)

and we can write

u69=ab5b¯ab2ab2¯ba2b2abababba3baba2ba5a¯b4a4ba2b3aba3b2a2bab4\displaystyle u_{69}=ab^{5}\underline{b}ab^{2}\underline{ab^{2}}ba^{2}b^{2}abababba^{3}baba^{2}ba^{5}\underline{a}b^{4}a^{4}ba^{2}b^{3}aba^{3}b^{2}a^{2}bab^{4}
u68=ab5b¯ab2ab2¯ba2b2ababab¯ba3baba2ba5a¯b4a4ba2b3aba3b2a2bab3\displaystyle u_{68}=ab^{5}\underline{b}ab^{2}\underline{ab^{2}}ba^{2}b^{2}abab\underline{ab}ba^{3}baba^{2}ba^{5}\underline{a}b^{4}a^{4}ba^{2}b^{3}aba^{3}b^{2}a^{2}bab^{3}
\displaystyle\cdots
u44=ab5b¯ab2ab2¯ba2b2ababab¯ba3baba2ba5a¯b4\displaystyle u_{44}=ab^{5}\underline{b}ab^{2}\underline{ab^{2}}ba^{2}b^{2}abab\underline{ab}ba^{3}baba^{2}ba^{5}\underline{a}b^{4}

the last one corresponding to the cycle (a¯b5b¯ab2a¯b2ba2b2ababab¯ba3baba2ba5)\left(\underline{a}b^{5}\underline{b}ab^{2}\underline{a}b^{2}ba^{2}b^{2}abab\underline{ab}ba^{3}baba^{2}ba^{5}\right). We write it as a word ending with the next cycle, (ab2)\left(ab^{2}\right), in the union (15)

u44=b2ab3a2b2ababab¯ba3baba2ba5a¯b5b¯ab2ab2¯,u_{44}^{\prime}=b^{2}ab^{3}a^{2}b^{2}abab\underline{ab}ba^{3}baba^{2}ba^{5}\underline{a}b^{5}\underline{b}ab^{2}\underline{ab^{2}},

and we obtain

u43=b2ab3a2b2abababba3baba2ba5a¯b5b¯ab2ab\displaystyle u_{43}=b^{2}ab^{3}a^{2}b^{2}abababba^{3}baba^{2}ba^{5}\underline{a}b^{5}\underline{b}ab^{2}ab
u42=b2ab3a2b2ababab¯ba3baba2ba5a¯b5b¯ab2a\displaystyle u_{42}=b^{2}ab^{3}a^{2}b^{2}abab\underline{ab}ba^{3}baba^{2}ba^{5}\underline{a}b^{5}\underline{b}ab^{2}a
u41=b2ab3a2b2ababab¯ba3baba2ba5a¯b5b¯ab2\displaystyle u_{41}=b^{2}ab^{3}a^{2}b^{2}abab\underline{ab}ba^{3}baba^{2}ba^{5}\underline{a}b^{5}\underline{b}ab^{2}

We write now the corresponding word ending with the cycle ( abab )

u41=babab2a3baba2ba5a¯b5b¯ab3a2b2ababab¯u_{41}^{\prime}=babab^{2}a^{3}baba^{2}ba^{5}\underline{a}b^{5}\underline{b}ab^{3}a^{2}b^{2}abab\underline{ab}

and from this we get

u40=babab2a3baba2ba5a¯b5b¯ab3a2b2ababa\displaystyle u_{40}=babab^{2}a^{3}baba^{2}ba^{5}\underline{a}b^{5}\underline{b}ab^{3}a^{2}b^{2}ababa
u39=babab2a3baba2ba5a¯b5b¯ab3a2b2abab\displaystyle u_{39}=babab^{2}a^{3}baba^{2}ba^{5}\underline{a}b^{5}\underline{b}ab^{3}a^{2}b^{2}abab

Deleting the loop ( bb ) and then the loop ( aa ) we obtain at last

u38=babab2a3baba2ba5ab5ab3a2b2abab\displaystyle u_{38}=babab^{2}a^{3}baba^{2}ba^{5}ab^{5}ab^{3}a^{2}b^{2}abab
u37=babab2a3baba2ba5b5ab3a2b2abab\displaystyle u_{37}=babab^{2}a^{3}baba^{2}ba^{5}b^{5}ab^{3}a^{2}b^{2}abab

We have puN(5)=32p_{u_{N}}(5)=32 and puN(6)=N5p_{u_{N}}(6)=N-5 for N{37,,69}N\in\{37,\ldots,69\}.
Let now a 3-letter alphabet A={a,b,c}A=\{a,b,c\} be given (the situation is similar for any q>3q>3 ). We have obviously w1=a,w2=ab,w3=abcw_{1}=a,w_{2}=ab,w_{3}=abc. From a de

Bruijn word of order 2 which contains a strongly embedded de Bruijn word of order 1
abca accbba,
the gap marking the end of the overlapping, we can obtain the words

w10=abcaaccbba\displaystyle w_{10}=abcaaccbba
\displaystyle\ldots
w4=abca\displaystyle w_{4}=abca

Similarly, from the de Bruijn word of order 3 which contains a strongly embedded de Bruijn word of order 2

aabbacbccaa cababcacccbbbcbaaa,

we can obtain successively the words

w29=aabbacbccaacababcacccbbbcbaaa\displaystyle w_{29}=aabbacbccaacababcacccbbbcbaaa
\displaystyle\ldots
w11=aabbacbccaa,\displaystyle w_{11}=aabbacbccaa,

which satisfy pwN(2)=9p_{w_{N}}(2)=9 and pwN(3)=N2p_{w_{N}}(3)=N-2 for N{11,,29}N\in\{11,\ldots,29\}. \square

Remark 6. It is clear now that Theorem 8 is a consequence of the stronger result from Theorem 12. However, if one is interested only in obtaining words ww with max{pw(n):1,,N}=hmax \max\left\{p_{w}(n):1,\ldots,N\right\}=h_{\text{max }}, the constructive methods in the proof of Theorem 8 are simpler and faster. \square

6. OTHER COMPLEXITY MEASURES FOR FINITE WORDS

The complexity function pwp_{w} which was used throughout the paper has the advantage that it can be defined in the same way both for infinite sequences and for finite words, as it was stated in the introduction. As far as finite words are concerned, the first measure of subword complexity seems to have been introduced by Heinz [11] as the total number of factors of ww,

K(w)=n=0Npw(n).K(w)=\sum_{n=0}^{N}p_{w}(n).

The problem of studying the maximum of K(w)K(w) over all the words of length NN over a finite alphabet with qq elements was a central one. It is easy to see that the maximum of K(w)K(w) over the words in ANA^{N} is attained at w0w_{0} if for each n{0,,N}n\in\{0,\ldots,N\} the maximum of pw(n)p_{w}(n) is attained at pw0(n)p_{w_{0}}(n). One has then the delimitation (obtained in [20])

K(w)n=0Nh(n),K(w)\leq\sum_{n=0}^{N}h(n),

with hh defined in (4), and, having in mind the explicit form of hh in (5),

K(w)qk+11q1+(Nk)(Nk+1)2,K(w)\leq\frac{q^{k+1}-1}{q-1}+\frac{(N-k)(N-k+1)}{2}, (16)

where kk is the unique natural number for which qk+kNqk+1+kq^{k}+k\leq N\leq q^{k+1}+k. The bound in (16) appeared in [12] and [19]. In view of Theorem 12, there are words of length NN whose total complexity K(w)K(w) equals the value in the right hand side of (16). The existence of such words, as it was already mentioned in the proof of Theorem 12, was established for binary alphabets in [19].

There are also other notions of complexity for finite words. The maximal complexity of a word wANw\in A^{N}, defined by Rauzy, is

𝒞(w)=max{pw(n):n{0,1,,N}}.\mathcal{C}(w)=\max\left\{p_{w}(n):n\in\{0,1,\ldots,N\}\right\}.

A notion of global complexity for finite words is given in [1], namely the global maximal complexity in ANA^{N}

𝒦(N)=max{𝒞(w):wAN}.\mathcal{K}(N)=\max\left\{\mathcal{C}(w):w\in A^{N}\right\}.

By (N)\mathcal{R}(N) it is denoted the set of the values ii for which there exists a word wANw\in A^{N} such that pw(i)=𝒦(N):p_{w}(i)=\mathcal{K}(N):

(n)={i{0,1,,N}: there exists wAN,pw(i)=𝒦(n)}.\mathcal{R}(n)=\left\{i\in\{0,1,\ldots,N\}:\text{ there exists }w\in A^{N},p_{w}(i)=\mathcal{K}(n)\right\}.

With these notations we obtain from Theorem 12
Corollary 13. For each NN\in\mathbb{N}, the global maximal complexity is given by 𝒦(N)=max{h(n):n{0,1,,N}}\mathcal{K}(N)=\max\{h(n):n\in\{0,1,\ldots,N\}\}, the function hh being defined by (4).

Proof. We have

𝒦(N)\displaystyle\mathcal{K}(N) =max{max{pw(n):n{0,1,,N}}:wAN}\displaystyle=\max\left\{\max\left\{p_{w}(n):n\in\{0,1,\ldots,N\}\right\}:w\in A^{N}\right\}
=max{max{pw(n):wAN}:n{0,1,,N}}\displaystyle=\max\left\{\max\left\{p_{w}(n):w\in A^{N}\right\}:n\in\{0,1,\ldots,N\}\right\}
=max{h(n):n{0,1,,N}},\displaystyle=\max\{h(n):n\in\{0,1,\ldots,N\}\},

the last equality following from the fact that hh coincides with the complexity function pwp_{w} for at least one word of length NN.

Applying the result in Proposition 2, the values for 𝒦(N)\mathcal{K}(N) and (N)\mathcal{R}(N) given in [1] can be easily obtained.

Corollary 14. For qk+kN<qk+1+k+1q^{k}+k\leq N<q^{k+1}+k+1, we have 𝒦(N)=Nk\mathcal{K}(N)=N-k. If N=qk+kN=q^{k}+k, then (N)={k,k+1}\mathcal{R}(N)=\{k,k+1\}; if qk+k<N<qk+1+k+1q^{k}+k<N<q^{k+1}+k+1, then (N)={k+1}\mathcal{R}(N)=\{k+1\}.

Acknowledgment. This research was partially supported by the program of academic exchange CNRS-Romanian Academy.

REFERENCES

[1] Anisiu, M.-C., Blázsik, Z. and Kása, Z., Maximal complexity of finite words, Pure Math. Appl., 13, pp. 39-48, 2002.
[2] de Bruijn, N. G., A combinatorial problem, Nederl. Akad. Wetensch. Proc., 49, pp. 758-764, 1946 = Indag. Math., 8, pp. 461-467, 1946.
[3] de Bruijn, N. G., Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of 2n2^{n} zeros and ones that show each nn-letter word exactly once, T. H. -Report 75-WSK-06, Technological University Eindhoven, the Netherlands, pp. 114, 1975.
[4] Champernowne, D. G., The construction of decimals normal in the scale of ten, J. London Math. Soc., 8, pp. 254-260, 1933.
[5] Cummings, L. J. and Wiedemann, D., Embedded de Bruijn sequences, Proceedings of the 7th Southeastern international conference on combinatorics, graph theory, and computing (Boca Raton, Florida, 1986), Congr. Numer., 53, pp. 155-160, 1986.
[6] Ehrenfeucht, A., Lee, K. P. and Rozenberg, G., Subword complexities of various classes of deterministic developmental languages without interactions, Theoret. Comput. Sci., 1, pp. 59-75, 1975.
[7] Flye Sainte-Marie, C., Solution to question nr. 48, l’Intermédiaire des Mathématiciens, 1, pp. 107-110, 1894.
[8] Fredericksen, H., A survey of full length nonlinear shift register cycle algorithms, SIAM Review, 24, pp. 195-221, 1982.
[9] Games, R. A., A generalized recursive construction for de Bruijn sequences, IEEE Trans. Inform. Theory, 29, pp. 843-850, 1983.
[10] Good, I. J., Normal recurring decimals, J. London Math. Soc., 21, pp. 167-169, 1946.
[11] Heinz, M., Zur Teilwortkomplexität für Wörter und Folgen über einem endlichen Alphabet, EIK, 13, pp. 27-38, 1977.
[12] Hunyadváry, L. and Iványi, A., On some complexity measures of words, Dep. Math., Karl Marx Univ. Econ., Budapest 1984-2, pp. 67-82, 1984.
[13] Hunyadváry, L. and Iványi, A., Construction of complex chains sequences and ring sequences, in Abstracts of Colloquium Theory of Algorithms, Pécs, p. 20, 1984.
[14] Levé, F. and Séébold, P., Proof of a conjecture on word complexity, Journées Montoises d’Informatique Théorique (Marne-la-Vallée, 2000), Bull. Belg. Math. Soc. Simon Stevin, 8, pp. 277-291, 2001.
[15] de Luca, A., On the combinatorics of finite words, Theoret. Comput. Sci., 218, pp. 13-39, 1999.
[16] Martin, M. H., A problem in arrangements, Bull. American Math. Soc., 40, pp. 859864, 1934.
[17] Morse, M. and Hedlund, G. A., Symbolic dynamics, Amer. J. Math., 60, pp. 815-866, 1938.
[18] Ralston, A., A new memoryless algorithm for de Bruijn sequences, J. Algorithms, 2, pp. 50-62, 1981.
[19] Shallit, J., On the maximum number of distinct factors in a binary string, Graph Comb., 9, pp. 197-200, 1993.
[20] Vörös, N., On the complexity measures of symbol-sequences, in Proceedings of the Conference of Young Programmers and Mathematicians (ed. A. Iványi), Eötvös Loránd University, Budapest, pp. 43-50, 1984.

Received by the editors: March 10, 2004.

2004

Related Posts