Maximal complexity of finite words

Abstract

The subword complexity of a finite \(\omega\) of length \(N\) is a function which associates to each \(n\leq N\) the number of all distinct subwords of \(\omega\) having the length \(n\). We define the maximal complexity \(C\left(\omega \right)\) as the maximum of the subword complexity for \(n\in\{1,2,\ldots,N\}\), and the global maximal complexity \(K\left( N\right)\) as the maximum of \(C\left( \omega \right)\) for all words \(\omega\) of a fixed length \(N\) over a finite alphabet. By \(R\left( N\right)\) we will denote the set of the values \(i\) for which there exists a word of length \(N\) having \(K\left( N\right)\) subwords of length \(i\). \(M\left( N\right)\) represents the number of words of length \(N\) whose maximal complexity is equal to the global maximal complexity.

The values of \(K\left( N\right)\) and \(R)N\) are obtained; methods to compute \(M\left( N\right)\) using the de Bruijn graphs and trees are given. An open problem is to find a formula for \(M\left( N\right)\).

Authors

Mira-Cristiana Anisiu
Tiberiu Popoviciu Institute of Numerical Analysis Cluj-Napoca, Romanian Academy

Zoltan Blazsik
Bolyai Institute of Mathematics, University of Szeged

Zoltan Kasa
Faculty of Mathematics and Informatics, Babes-Bolyai University of Cluj-Napoca

Keywords

complexity of words; de Bruijn graphs and trees.

Paper coordinates

M.-C. Anisiu, Z. Blázsik, Z. Kása, Maximal complexity of finite words, Pure Math. Appl. 13 (1-2) (2002), 39-48

PDF

About this paper

Journal

Pure Mathematics and Applications

Publisher Name

SciencePC

DOI
Print ISSN

2326-9790

Online ISSN

 2326-9812

google scholar link

[1] T. van Aardenne-Ehrenfest, N.G. de Bruijn, Circuits and trees in oriented linear graphs, Simon Stevin 28 (1951), 203-217.
[2] J. Bond, A. Ivanyi, Modelling of interconnection networks using de Bruijn graphs, Third Conference of Program Designer, Ed. A. Ivanyi, Budapest, 1987, 75-87.
[3] N.G. de Bruijn, A combinatorial problem, Nederl. Akad. Wetensch. Proc. 49(1946), 758-764.
[4] M. Heinz, Zur Teilwortkomplexitat Fur Worter und Folgen uber einem endlichen Alphabet, EIK 13 (1977), 27-38.
[5] M.Lothaire Combinatorics on words, Addison-Welsely, Reading, MA, 1983.
[6] A. de Luca, On the combinatorics of finite words, Theor. Comput. Sc. 218 (1999), 13-30.
[7] M.H. Martin, A problem in arrangements, Bull. A.M.S. 40 (1934), 859-864.

2002-Anisiu-MaxFinit

Maximal complexity of finite words

Mira-Cristiana Anisiu* Zoltán Blázsik ^(†)quad{ }^{\dagger} \quad Zoltán Kása ^(‡){ }^{\ddagger}

Abstract

The subword complexity of a finite word w w www of length N N NNN is a function which associates to each n N n N n <= Nn \leq NnN the number of all distinct subwords of w w www having the length n n nnn. We define the maximal complexity C ( w ) C ( w ) C(w)C(w)C(w) as the maximum of the subword complexity for n { 1 , 2 , , N } n { 1 , 2 , , N } n in{1,2,dots,N}n \in\{1,2, \ldots, N\}n{1,2,,N}, and the global maximal complexity K ( N ) K ( N ) K(N)K(N)K(N) as the maximum of C ( w ) C ( w ) C(w)C(w)C(w) for all words w w www of a fixed length N N NNN over a finite alphabet. By R ( N ) R ( N ) R(N)R(N)R(N) we will denote the set of the values i i iii for which there exits a word of length N N NNN having K ( N ) K ( N ) K(N)K(N)K(N) subwords of length i . M ( N ) i . M ( N ) i.M(N)i . M(N)i.M(N) represents the number of words of length N N NNN whose maximal complexity is equal to the global maximal complexity.

The values of K ( N ) K ( N ) K(N)K(N)K(N) and R ( N ) R ( N ) R(N)R(N)R(N) are obtained; methods to compute M ( N ) M ( N ) M(N)M(N)M(N) using the de Bruijn graphs and trees are given. An open problem is to find a formula for M ( N ) M ( N ) M(N)M(N)M(N).

Keywords: complexity of words, de Bruijn graphs and trees
AMS Classification: 68R15

1 Introduction

A finite word is a finite sequence of letters over a finite alphabet A A A\mathcal{A}A, and can be represented as a concatenation of its letters:
w = w 1 w 2 w N with w i A for 1 i N . w = w 1 w 2 w N  with  w i A  for  1 i N . w=w_(1)w_(2)dotsw_(N)quad" with "w_(i)inA" for "1 <= i <= N.w=w_{1} w_{2} \ldots w_{N} \quad \text { with } w_{i} \in \mathcal{A} \text { for } 1 \leq i \leq N .w=w1w2wN with wiA for 1iN.
The number N N NNN is the length of w w www and is denoted by | w | | w | |w||w||w|. A word with no letters (i.e. of length 0 ) is the empty word, denoted by ε ε epsi\varepsilonε. We denote by A + A + A^(+)\mathcal{A}^{+}A+ the set of nonempty words over A A A\mathcal{A}A, by A = A + { ε } A = A + { ε } A^(**)=A^(+)uu{epsi}\mathcal{A}^{*}=\mathcal{A}^{+} \cup\{\varepsilon\}A=A+{ε} the set of words over A A A\mathcal{A}A and by A n A n A^(n)\mathcal{A}^{n}An the set of words of length n n nnn over A A A\mathcal{A}A.
A word u u uuu is a factor (or subword) of w w www if there exist words x , y A x , y A x,y inA^(**)x, y \in \mathcal{A}^{*}x,yA such that w = x u y w = x u y w=xuyw=x u yw=xuy. If x ε x ε x!=epsix \neq \varepsilonxε and y ε y ε y!=epsiy \neq \varepsilonyε then u u uuu is a proper factor (proper subword) of w w www. If x = ε ( y = ε ) x = ε ( y = ε ) x=epsi(y=epsi)x=\varepsilon(y=\varepsilon)x=ε(y=ε) then u u uuu is a prefix (suffix) of w w www. Let us denote by F ( w ) F ( w ) F(w)F(w)F(w) the set of all nonempty factors of w w www, and by F n ( w ) F n ( w ) F_(n)(w)F_{n}(w)Fn(w) the set of all factors of w w www of length n n nnn (hence F n ( w ) = F ( w ) A n F n ( w ) = F ( w ) A n F_(n)(w)=F(w)nnA^(n)F_{n}(w)=F(w) \cap \mathcal{A}^{n}Fn(w)=F(w)An ).
The subword complexity of w w www counts the number of all distinct factors of a given length occurring in w w www and is defined as
f w ( n ) = Card ( F n ( w ) ) for 1 n | w | . f w ( n ) = Card F n ( w )  for  1 n | w | . f_(w)(n)=Card(F_(n)(w))quad" for "1 <= n <= |w|.f_{w}(n)=\operatorname{Card}\left(F_{n}(w)\right) \quad \text { for } 1 \leq n \leq|w| .fw(n)=Card(Fn(w)) for 1n|w|.
Clearly f w ( 1 ) Card ( A ) f w ( 1 ) Card ( A ) f_(w)(1) <= Card(A)f_{w}(1) \leq \operatorname{Card}(\mathcal{A})fw(1)Card(A) and we can consider f w ( n ) = 0 f w ( n ) = 0 f_(w)(n)=0f_{w}(n)=0fw(n)=0 for n > | w | n > | w | n > |w|n>|w|n>|w|. The subword complexity has been extensively studied in [4], [5] and [6].
The maximal value of the subword complexity f w ( n ) f w ( n ) f_(w)(n)f_{w}(n)fw(n) for 1 n | w | 1 n | w | 1 <= n <= |w|1 \leq n \leq|w|1n|w| is called the maximal complexity of w w www and is denoted by C ( w ) C ( w ) C(w)C(w)C(w) :
C ( w ) = max { f w ( n ) n 1 } C ( w ) = max f w ( n ) n 1 C(w)=max{f_(w)(n)∣n >= 1}C(w)=\max \left\{f_{w}(n) \mid n \geq 1\right\}C(w)=max{fw(n)n1}
The global maximal complexity in A N A N A^(N)\mathcal{A}^{N}AN is equal to
K ( N ) = max { C ( w ) w A N } K ( N ) = max C ( w ) w A N K(N)=max{C(w)∣w inA^(N)}K(N)=\max \left\{C(w) \mid w \in \mathcal{A}^{N}\right\}K(N)=max{C(w)wAN}
We shall denote by R ( N ) R ( N ) R(N)R(N)R(N) the set of values i i iii for which there exists a word w A N w A N w inA^(N)w \in \mathcal{A}^{N}wAN such that f w ( i ) = K ( N ) f w ( i ) = K ( N ) f_(w)(i)=K(N)f_{w}(i)=K(N)fw(i)=K(N) :
R ( N ) = { i { 1 , 2 , , N } w A N : f w ( i ) = K ( N ) } R ( N ) = i { 1 , 2 , , N } w A N : f w ( i ) = K ( N ) R(N)={i in{1,2,dots,N}∣EE w inA^(N):f_(w)(i)=K(N)}R(N)=\left\{i \in\{1,2, \ldots, N\} \mid \exists w \in \mathcal{A}^{N}: f_{w}(i)=K(N)\right\}R(N)={i{1,2,,N}wAN:fw(i)=K(N)}
The number of words in A N A N A^(N)\mathcal{A}^{N}AN with the maximal complexity equal to the global maximal complexity will be denoted by M ( N ) M ( N ) M(N)M(N)M(N) :
M ( N ) = Card ( { w A N : C ( w ) = K ( N ) } ) . M ( N ) = Card w A N : C ( w ) = K ( N ) . M(N)=Card({w inA^(N):C(w)=K(N)}).M(N)=\operatorname{Card}\left(\left\{w \in \mathcal{A}^{N}: C(w)=K(N)\right\}\right) .M(N)=Card({wAN:C(w)=K(N)}).
Remark 1. If Card ( A ) = q Card ( A ) = q Card(A)=q\operatorname{Card}(\mathcal{A})=qCard(A)=q, for q = 1 q = 1 q=1q=1q=1 the only word of length N N NNN is w 0 = 0 0 N w 0 = 0 0 N w_(0)=ubrace(0dots0ubrace)_(N)w_{0}= \underbrace{0 \ldots 0}_{N}w0=00N for which f w 0 ( i ) = 1 , i { 1 , 2 , , N } f w 0 ( i ) = 1 , i { 1 , 2 , , N } f_(w_(0))(i)=1,i in{1,2,dots,N}f_{w_{0}}(i)=1, i \in\{1,2, \ldots, N\}fw0(i)=1,i{1,2,,N}, hence C ( w ) = 1 = K ( N ) C ( w ) = 1 = K ( N ) C(w)=1=K(N)C(w)=1=K(N)C(w)=1=K(N), R ( N ) = { 1 , 2 , , N } R ( N ) = { 1 , 2 , , N } R(N)={1,2,dots,N}R(N)=\{1,2, \ldots, N\}R(N)={1,2,,N} and M ( N ) = 1 M ( N ) = 1 M(N)=1M(N)=1M(N)=1. For q 2 q 2 q >= 2q \geq 2q2, but N q N q N <= qN \leq qNq, for each word w 1 w 1 w_(1)w_{1}w1 which contains N N NNN distinct elements of A A A\mathcal{A}A we have C ( w 1 ) = f w 1 ( 1 ) = N = K ( N ) C w 1 = f w 1 ( 1 ) = N = K ( N ) C(w_(1))=f_(w_(1))(1)=N=K(N)C\left(w_{1}\right)=f_{w_{1}}(1)=N=K(N)C(w1)=fw1(1)=N=K(N), R ( N ) = { 1 } R ( N ) = { 1 } R(N)={1}R(N)=\{1\}R(N)={1} and M ( N ) = P q N M ( N ) = P q N M(N)=P_(q)^(N)M(N)=P_{q}^{N}M(N)=PqN (permutations of N N NNN elements taken from q q qqq ).
Some values for K ( N ) , R ( N ) K ( N ) , R ( N ) K(N),R(N)K(N), R(N)K(N),R(N) and M ( N ) M ( N ) M(N)M(N)M(N) in the case of an alphabet of 2 letters are given in Table 1. In the case N = 3 N = 3 N=3N=3N=3 the following six words have maximal complexity: 001 , 010 , 011 , 100 , 101 , 110 001 , 010 , 011 , 100 , 101 , 110 001,010,011,100,101,110001,010,011,100,101,110001,010,011,100,101,110. For each of them f w ( 1 ) = 2 , f w ( 2 ) = 2 , f w ( 3 ) = 1 f w ( 1 ) = 2 , f w ( 2 ) = 2 , f w ( 3 ) = 1 f_(w)(1)=2,f_(w)(2)=2,f_(w)(3)=1f_{w}(1)=2, f_{w}(2)=2, f_{w}(3)=1fw(1)=2,fw(2)=2,fw(3)=1, so K ( 3 ) = 2 , R ( 3 ) = { 1 , 2 } K ( 3 ) = 2 , R ( 3 ) = { 1 , 2 } K(3)=2,R(3)={1,2}K(3)=2, R(3)=\{1,2\}K(3)=2,R(3)={1,2} and M ( 3 ) = 6 M ( 3 ) = 6 M(3)=6M(3)=6M(3)=6.

2 Global maximal subword complexity of finite words

In this section we shall compute the values of the global maximal complexity K ( N ) K ( N ) K(N)K(N)K(N), as well as those of R ( N ) R ( N ) R(N)R(N)R(N), proving that they are in agreement with the values in Table 1. Some special cases being solved in Remark 1, in what follows we shall consider alphabets with Card ( A ) = q 2 Card ( A ) = q 2 Card(A)=q >= 2\operatorname{Card}(\mathcal{A})=q \geq 2Card(A)=q2 and words of length N > q N > q N > qN>qN>q.
We shall use the following result.
Lemma 1. [7] For each k N k N k inN^(**)k \in N^{*}kN, the shortest word containing all the q k q k q^(k)q^{k}qk words of length k k kkk has q k + k 1 q k + k 1 q^(k)+k-1q^{k}+k-1qk+k1 letters (hence in this word each of the q k q k q^(k)q^{k}qk words of length k k kkk appears only once).
N N NNN K ( N ) K ( N ) K(N)K(N)K(N) R ( N ) R ( N ) R(N)R(N)R(N) M ( N ) M ( N ) M(N)M(N)M(N)
2 2 1 2
3 2 1, 2 6
4 3 2 8
5 4 2 4
6 4 2, 3 36
7 5 3 42
8 6 3 48
9 7 3 40
10 8 3 16
11 8 3, 4 558
12 9 4 718
13 10 4 854
14 11 4 920
15 12 4 956
16 13 4 960
17 14 4 912
18 15 4 704
19 16 4 256
20 16 4, 5 79006
N K(N) R(N) M(N) 2 2 1 2 3 2 1, 2 6 4 3 2 8 5 4 2 4 6 4 2, 3 36 7 5 3 42 8 6 3 48 9 7 3 40 10 8 3 16 11 8 3, 4 558 12 9 4 718 13 10 4 854 14 11 4 920 15 12 4 956 16 13 4 960 17 14 4 912 18 15 4 704 19 16 4 256 20 16 4, 5 79006| $N$ | $K(N)$ | $R(N)$ | $M(N)$ | | :--- | :--- | :--- | :--- | | 2 | 2 | 1 | 2 | | 3 | 2 | 1, 2 | 6 | | 4 | 3 | 2 | 8 | | 5 | 4 | 2 | 4 | | 6 | 4 | 2, 3 | 36 | | 7 | 5 | 3 | 42 | | 8 | 6 | 3 | 48 | | 9 | 7 | 3 | 40 | | 10 | 8 | 3 | 16 | | 11 | 8 | 3, 4 | 558 | | 12 | 9 | 4 | 718 | | 13 | 10 | 4 | 854 | | 14 | 11 | 4 | 920 | | 15 | 12 | 4 | 956 | | 16 | 13 | 4 | 960 | | 17 | 14 | 4 | 912 | | 18 | 15 | 4 | 704 | | 19 | 16 | 4 | 256 | | 20 | 16 | 4, 5 | 79006 |
Table 1
An algorithm for obtaining such a word for A = { e 1 , e 2 , , e q } A = e 1 , e 2 , , e q A={e_(1),e_(2),dots,e_(q)}\mathcal{A}=\left\{e_{1}, e_{2}, \ldots, e_{q}\right\}A={e1,e2,,eq} is the following [7]:
i i iii. Each of the first k 1 k 1 k-1k-1k1 symbols is equal to e 1 e 1 e_(1)e_{1}e1.
i i i i iii iii. If the sequence a 1 a 2 a k a m k + 1 a m 1 a 1 a 2 a k a m k + 1 a m 1 a_(1)a_(2)dotsa_(k)dotsa_(m-k+1)dotsa_(m-1)a_{1} a_{2} \ldots a_{k} \ldots a_{m-k+1} \ldots a_{m-1}a1a2akamk+1am1 (with a 1 = = a k 1 = e 1 , m k a 1 = = a k 1 = e 1 , m k a_(1)=dots=a_(k-1)=e_(1),m >= ka_{1}=\ldots=a_{k-1}= e_{1}, m \geq ka1==ak1=e1,mk and the a a aaa 's representing the e e eee 's in a certain order) has been obtained, the symbol a m a m a_(m)a_{m}am to be added is the e i e i e_(i)e_{i}ei with the greatest subscript possible such that a m k + 1 a m 1 a m a m k + 1 a m 1 a m a_(m-k+1)dotsa_(m-1)a_(m)a_{m-k+1} \ldots a_{m-1} a_{m}amk+1am1am does not duplicate a previously occurring section of k k kkk symbols in the above sequence.
i i i i i i iiii i iiii. Rule i i i i iii iii is first applied for m = k m = k m=km=km=k (in which case a m = a k = e q a m = a k = e q a_(m)=a_(k)=e_(q)a_{m}=a_{k}=e_{q}am=ak=eq ) and then applied repeatedly until a further application is impossible.
Proposition 1. If Card ( A ) = q Card ( A ) = q Card(A)=q\operatorname{Card}(\mathcal{A})=qCard(A)=q and q k + k N q k + 1 + k q k + k N q k + 1 + k q^(k)+k <= N <= q^(k+1)+kq^{k}+k \leq N \leq q^{k+1}+kqk+kNqk+1+k then K ( N ) = N k K ( N ) = N k K(N)=N-kK(N)=N-kK(N)=Nk.
Proof. Let us consider at first the case N = q k + 1 + k , k 1 N = q k + 1 + k , k 1 N=q^(k+1)+k,k >= 1N=q^{k+1}+k, k \geq 1N=qk+1+k,k1.
From Lemma 1 we obtain the existence of a word W W WWW of length q k + 1 + k q k + 1 + k q^(k+1)+kq^{k+1}+kqk+1+k which contains all the q k + 1 q k + 1 q^(k+1)q^{k+1}qk+1 words of length k + 1 k + 1 k+1k+1k+1, hence f W ( k + 1 ) = q k + 1 f W ( k + 1 ) = q k + 1 f_(W)(k+1)=q^(k+1)f_{W}(k+1)=q^{k+1}fW(k+1)=qk+1. It is obvious that f W ( l ) = q l < f W ( k + 1 ) f W ( l ) = q l < f W ( k + 1 ) f_(W)(l)=q^(l) < f_(W)(k+1)f_{W}(l)=q^{l}<f_{W}(k+1)fW(l)=ql<fW(k+1) for l { 1 , 2 , , k } l { 1 , 2 , , k } l in{1,2,dots,k}l \in\{1,2, \ldots, k\}l{1,2,,k} and f W ( k + 1 + j ) = q k + 1 j < f W ( k + 1 ) f W ( k + 1 + j ) = q k + 1 j < f W ( k + 1 ) f_(W)(k+1+j)=q^(k+1)-j < f_(W)(k+1)f_{W}(k+1+j)=q^{k+1}-j< f_{W}(k+1)fW(k+1+j)=qk+1j<fW(k+1) for j { 1 , 2 , q k + 1 1 } j 1 , 2 , q k + 1 1 j in{1,2,dotsq^(k+1)-1}j \in\left\{1,2, \ldots q^{k+1}-1\right\}j{1,2,qk+11}. Any other word of length q k + 1 + k q k + 1 + k q^(k+1)+kq^{k+1}+kqk+1+k will have the maximal complexity less than or equal to C ( W ) = f W ( k + 1 ) C ( W ) = f W ( k + 1 ) C(W)=f_(W)(k+1)C(W)=f_{W}(k+1)C(W)=fW(k+1), hence we have K ( N ) = q k + 1 = N k K ( N ) = q k + 1 = N k K(N)=q^(k+1)=N-kK(N)=q^{k+1}=N-kK(N)=qk+1=Nk.
For k 1 k 1 k >= 1k \geq 1k1 we consider now the values of N N NNN of the form N = q k + 1 + k r N = q k + 1 + k r N=q^(k+1)+k-rN=q^{k+1}+k-rN=qk+1+kr with r { 1 , 2 , , q k + 1 q k } r 1 , 2 , , q k + 1 q k r in{1,2,dots,q^(k+1)-q^(k)}r \in\left\{1,2, \ldots, q^{k+1}-q^{k}\right\}r{1,2,,qk+1qk}, hence q k + k N < q k + 1 + k q k + k N < q k + 1 + k q^(k)+k <= N < q^(k+1)+kq^{k}+k \leq N<q^{k+1}+kqk+kN<qk+1+k. If from the word
W W WWW of length q k + 1 + k q k + 1 + k q^(k+1)+kq^{k+1}+kqk+1+k considered above we delete the last r r rrr letters, we obtain a word W N W N W_(N)W_{N}WN of length N = q k + 1 + k r N = q k + 1 + k r N=q^(k+1)+k-rN=q^{k+1}+k-rN=qk+1+kr with r { 1 , 2 , , q k + 1 q k } r 1 , 2 , , q k + 1 q k r in{1,2,dots,q^(k+1)-q^(k)}r \in\left\{1,2, \ldots, q^{k+1}-q^{k}\right\}r{1,2,,qk+1qk}. This word will have f W N ( k + 1 ) = q k + 1 r f W N ( k + 1 ) = q k + 1 r f_(W_(N))(k+1)=q^(k+1)-rf_{W_{N}}(k+1)=q^{k+1}-rfWN(k+1)=qk+1r and this value will be its maximal complexity. Indeed, it is obvious that f W N ( k + 1 + j ) = f W N ( k + 1 ) j < f W N ( k + 1 ) f W N ( k + 1 + j ) = f W N ( k + 1 ) j < f W N ( k + 1 ) f_(W_(N))(k+1+j)=f_(W_(N))(k+1)-j < f_(W_(N))(k+1)f_{W_{N}}(k+1+j)=f_{W_{N}}(k+1)-j<f_{W_{N}}(k+1)fWN(k+1+j)=fWN(k+1)j<fWN(k+1) for j { 1 , 2 , , N k 1 } j { 1 , 2 , , N k 1 } j in{1,2,dots,N-k-1}j \in\{1,2, \ldots, N-k-1\}j{1,2,,Nk1}; for l { 1 , 2 , , k } l { 1 , 2 , , k } l in{1,2,dots,k}l \in\{1,2, \ldots, k\}l{1,2,,k} it follows that f W N ( l ) q l q k q k + 1 r = f W N ( k + 1 ) f W N ( l ) q l q k q k + 1 r = f W N ( k + 1 ) f_(W_(N))(l) <= q^(l) <= q^(k) <= q^(k+1)-r=f_(W_(N))(k+1)f_{W_{N}}(l) \leq q^{l} \leq q^{k} \leq q^{k+1}-r=f_{W_{N}}(k+1)fWN(l)qlqkqk+1r=fWN(k+1), hence C ( W N ) = f W N ( k + 1 ) = q k + 1 r C W N = f W N ( k + 1 ) = q k + 1 r C(W_(N))=f_(W_(N))(k+1)=q^(k+1)-rC\left(W_{N}\right)=f_{W_{N}}(k+1)=q^{k+1}-rC(WN)=fWN(k+1)=qk+1r. Because it is not possible for a word of length N = q k + 1 + k r N = q k + 1 + k r N=q^(k+1)+k-rN=q^{k+1}+k-rN=qk+1+kr, with r { 1 , 2 , , q k + 1 q k } r 1 , 2 , , q k + 1 q k r in{1,2,dots,q^(k+1)-q^(k)}r \in\left\{1,2, \ldots, q^{k+1}-q^{k}\right\}r{1,2,,qk+1qk} to have the maximal complexity greater than q k + 1 r q k + 1 r q^(k+1)-rq^{k+1}-rqk+1r, it follows that K ( N ) = q k + 1 r = N k K ( N ) = q k + 1 r = N k K(N)=q^(k+1)-r=N-kK(N)= q^{k+1}-r=N-kK(N)=qk+1r=Nk.
Proposition 2. If Card ( A ) = q Card ( A ) = q Card(A)=q\operatorname{Card}(\mathcal{A})=qCard(A)=q and q k + k < N < q k + 1 + k + 1 q k + k < N < q k + 1 + k + 1 q^(k)+k < N < q^(k+1)+k+1q^{k}+k<N<q^{k+1}+k+1qk+k<N<qk+1+k+1 then R ( N ) = { k + 1 } R ( N ) = { k + 1 } R(N)={k+1}R(N)= \{k+1\}R(N)={k+1}; if N = q k + k N = q k + k N=q^(k)+kN=q^{k}+kN=qk+k then R ( N ) = { k , k + 1 } R ( N ) = { k , k + 1 } R(N)={k,k+1}R(N)=\{k, k+1\}R(N)={k,k+1}.
Proof. In the first part of the proof of Proposition 1, we proved for N = q k + 1 + k , k 1 N = q k + 1 + k , k 1 N=q^(k+1)+k,k >= 1N= q^{k+1}+k, k \geq 1N=qk+1+k,k1, the existence of a word W W WWW of length N N NNN for which K ( N ) = f W ( k + 1 ) = N k K ( N ) = f W ( k + 1 ) = N k K(N)=f_(W)(k+1)=N-kK(N)= f_{W}(k+1)=N-kK(N)=fW(k+1)=Nk. This means that k + 1 R ( N ) k + 1 R ( N ) k+1in R(N)k+1 \in R(N)k+1R(N). For the word W W WWW, as well as for any other word w w www of length N N NNN, we have f w ( l ) < f W ( k + 1 ) , l k + 1 f w ( l ) < f W ( k + 1 ) , l k + 1 f_(w)(l) < f_(W)(k+1),l!=k+1f_{w}(l)<f_{W}(k+1), l \neq k+1fw(l)<fW(k+1),lk+1, because of the special construction of W W WWW, which contains all the words of length k + 1 k + 1 k+1k+1k+1 in the most compact way. It follows that R ( N ) = { k + 1 } R ( N ) = { k + 1 } R(N)={k+1}R(N)=\{k+1\}R(N)={k+1}.
As in the second part of the proof of Proposition 1, we consider N = q k + 1 + k r N = q k + 1 + k r N=q^(k+1)+k-rN= q^{k+1}+k-rN=qk+1+kr with r { 1 , 2 , q k + 1 q k } r 1 , 2 , q k + 1 q k r in{1,2,dotsq^(k+1)-q^(k)}r \in\left\{1,2, \ldots q^{k+1}-q^{k}\right\}r{1,2,qk+1qk} and the word W N W N W_(N)W_{N}WN for which K ( N ) = f W N ( k + 1 ) = q k + 1 r K ( N ) = f W N ( k + 1 ) = q k + 1 r K(N)=f_(W_(N))(k+1)=q^(k+1)-rK(N)= f_{W_{N}}(k+1)=q^{k+1}-rK(N)=fWN(k+1)=qk+1r. We have again k + 1 R ( N ) k + 1 R ( N ) k+1in R(N)k+1 \in R(N)k+1R(N). For l > k + 1 l > k + 1 l > k+1l>k+1l>k+1, it is obvious that the complexity function of W N W N W_(N)W_{N}WN, or of any other word of length N N NNN, is strictly less than f W N ( k + 1 ) f W N ( k + 1 ) f_(W_(N))(k+1)f_{W_{N}}(k+1)fWN(k+1). We examine now the possibility of finding a word W W WWW with f W ( k + 1 ) = N k f W ( k + 1 ) = N k f_(W)(k+1)=N-kf_{W}(k+1)=N-kfW(k+1)=Nk for which f W ( l ) = N k f W ( l ) = N k f_(W)(l)=N-kf_{W}(l)=N-kfW(l)=Nk for l k l k l <= kl \leq klk. We have f W ( l ) q l q k q k + 1 r f W ( l ) q l q k q k + 1 r f_(W)(l) <= q^(l) <= q^(k) <= q^(k+1)-rf_{W}(l) \leq q^{l} \leq q^{k} \leq q^{k+1}-rfW(l)qlqkqk+1r, hence the equality f W ( l ) = N k = q k + 1 r f W ( l ) = N k = q k + 1 r f_(W)(l)=N-k=q^(k+1)-rf_{W}(l)=N-k=q^{k+1}-rfW(l)=Nk=qk+1r holds only for l = k l = k l=kl=kl=k and r = q k + 1 q k r = q k + 1 q k r=q^(k+1)-q^(k)r=q^{k+1}-q^{k}r=qk+1qk, that is for N = q k + k N = q k + k N=q^(k)+kN=q^{k}+kN=qk+k. We show that for N = q k + k N = q k + k N=q^(k)+kN=q^{k}+kN=qk+k we have indeed R ( N ) = { k , k + 1 } R ( N ) = { k , k + 1 } R(N)={k,k+1}R(N)=\{k, k+1\}R(N)={k,k+1}. If we start with Martin's word of length q k + k 1 q k + k 1 q^(k)+k-1q^{k}+k-1qk+k1 (or with another de Bruijn word) and add to this any letter from A A A\mathcal{A}A, we obtain obviously a word V V VVV of length N = q k + k N = q k + k N=q^(k)+kN=q^{k}+kN=qk+k, which contains all the q k q k q^(k)q^{k}qk words of length k k kkk and q k = N k q k = N k q^(k)=N-kq^{k}=N-kqk=Nk words of length k + 1 k + 1 k+1k+1k+1, hence f V ( k ) = f V ( k + 1 ) = K ( N ) f V ( k ) = f V ( k + 1 ) = K ( N ) f_(V)(k)=f_(V)(k+1)=K(N)f_{V}(k)=f_{V}(k+1)=K(N)fV(k)=fV(k+1)=K(N).
Remark 2. Having in mind the algorithm given by Martin [7] (or other more efficient algorithms), words w w www with maximal complexity C ( w ) = K ( N ) C ( w ) = K ( N ) C(w)=K(N)C(w)=K(N)C(w)=K(N) can be easily constructed for each N N NNN and for both situations in Proposition 2.

3 De Bruijn graphs and trees

In the previous section the global maximal complexity K ( N ) K ( N ) K(N)K(N)K(N) for words of length N N NNN was obtained, as well as the set of points R ( N ) R ( N ) R(N)R(N)R(N) where K ( N ) K ( N ) K(N)K(N)K(N) is equal to the maximal value of the subword complexity of certain words of length N N NNN. To this aim we used a special word constructed by Martin [7], which is one of the de Bruijn words. A de Bruijn word for given q q qqq and k k kkk is a word over an alphabet with q q qqq letters, containing all k k kkk-length words exactly once. The length of such a word is q k + k 1 q k + k 1 q^(k)+k-1q^{k}+k-1qk+k1.
In order to tackle the problem of finding the number of the words for which the global maximal complexity is attained, we shall use the de Bruijn graphs and trees.
For a q q qqq-letter alphabet A A A\mathcal{A}A the de Bruijn graph is defined as:
B ( q , k ) = ( V ( q , k ) , E ( q , k ) ) B ( q , k ) = ( V ( q , k ) , E ( q , k ) ) B(q,k)=(V(q,k),E(q,k))B(q, k)=(V(q, k), E(q, k))B(q,k)=(V(q,k),E(q,k))
with V ( q , k ) = A k V ( q , k ) = A k V(q,k)=A^(k)V(q, k)=\mathcal{A}^{k}V(q,k)=Ak as the set of vertices, and E ( q , k ) = A k + 1 E ( q , k ) = A k + 1 E(q,k)=A^(k+1)E(q, k)=\mathcal{A}^{k+1}E(q,k)=Ak+1 as the set of directed arcs. There is an arc from x 1 x 2 x k x 1 x 2 x k x_(1)x_(2)dotsx_(k)x_{1} x_{2} \ldots x_{k}x1x2xk to y 1 y 2 y k y 1 y 2 y k y_(1)y_(2)dotsy_(k)y_{1} y_{2} \ldots y_{k}y1y2yk if x 2 x 3 x k = y 1 y 2 y k 1 x 2 x 3 x k = y 1 y 2 y k 1 x_(2)x_(3)dotsx_(k)=y_(1)y_(2)dotsy_(k-1)x_{2} x_{3} \ldots x_{k}= y_{1} y_{2} \ldots y_{k-1}x2x3xk=y1y2yk1, and this arc is denoted by x 1 x 2 x k y k x 1 x 2 x k y k x_(1)x_(2)dotsx_(k)y_(k)x_{1} x_{2} \ldots x_{k} y_{k}x1x2xkyk. See Fig. 1 and 2 for B ( 2 , 2 ) B ( 2 , 2 ) B(2,2)B(2,2)B(2,2) and B ( 2 , 3 ) B ( 2 , 3 ) B(2,3)B(2,3)B(2,3). The de Bruijn graphs B ( q , k ) B ( q , k ) B(q,k)B(q, k)B(q,k) are nonplanar for k 4 k 4 k >= 4k \geq 4k4, q 2 q 2 q >= 2q \geq 2q2.
In the de Bruijn graph B ( q , k ) B ( q , k ) B(q,k)B(q, k)B(q,k) a path (i. e. a walk with distinct vertices)
a 1 a 2 a k , a 2 a 3 a k + 1 , , a r k + 1 a r k + 2 a r ( r > k ) a 1 a 2 a k , a 2 a 3 a k + 1 , , a r k + 1 a r k + 2 a r ( r > k ) a_(1)a_(2)dotsa_(k),quada_(2)a_(3)dotsa_(k+1),dots,a_(r-k+1)a_(r-k+2)dotsa_(r)quad(r > k)a_{1} a_{2} \ldots a_{k}, \quad a_{2} a_{3} \ldots a_{k+1}, \ldots, a_{r-k+1} a_{r-k+2} \ldots a_{r} \quad(r>k)a1a2ak,a2a3ak+1,,ark+1ark+2ar(r>k)
corresponds to an r r rrr-length word a 1 a 2 a k a k + 1 a r a 1 a 2 a k a k + 1 a r a_(1)a_(2)dotsa_(k)a_(k+1)dotsa_(r)a_{1} a_{2} \ldots a_{k} a_{k+1} \ldots a_{r}a1a2akak+1ar, which is obtained by adding, in turn, to the vertex a 1 a 2 a k a 1 a 2 a k a_(1)a_(2)dotsa_(k)a_{1} a_{2} \ldots a_{k}a1a2ak the last letter of the following vertices in the path. For example in B ( 2 , 3 ) B ( 2 , 3 ) B(2,3)B(2,3)B(2,3) the path 001 , 010 , 101 001 , 010 , 101 001,010,101001,010,101001,010,101 corresponds to the word 00101. Every maximal length path in the graph B ( q , k ) B ( q , k ) B(q,k)B(q, k)B(q,k) (which is a Hamiltonian one) corresponds to a de Bruijn word.
Fig. 1: The de Bruijn graph B ( 2 , 2 ) B ( 2 , 2 ) B(2,2)B(2,2)B(2,2).
In the directed graph B ( q , k ) B ( q , k ) B(q,k)B(q, k)B(q,k) there always exists an Eulerian circuit because it is connected and all its vertices have the same indegree and outdegree q q qqq. An Eulerian circuit in B ( q , k ) B ( q , k ) B(q,k)B(q, k)B(q,k) is a Hamiltonian path in B ( q , k + 1 ) B ( q , k + 1 ) B(q,k+1)B(q, k+1)B(q,k+1) (which always can be continued in a Hamiltonian cycle). For example in B ( 2 , 2 ) B ( 2 , 2 ) B(2,2)B(2,2)B(2,2) the following walk: 000 , 001 , 010 , 101 , 011 , 111 , 110 , 100 000 , 001 , 010 , 101 , 011 , 111 , 110 , 100 000,001,010,101,011,111,110,100000,001,010,101,011,111,110,100000,001,010,101,011,111,110,100 represents an Eulerian circuit, which in B ( 2 , 3 ) B ( 2 , 3 ) B(2,3)B(2,3)B(2,3) is a Hamiltonian path.
In order to study the number of words in A k A k A^(k)\mathcal{A}^{k}Ak which have the maximal complexity equal to the global maximal complexity K ( N ) K ( N ) K(N)K(N)K(N) we shall introduce the so-called de Bruijn trees. A de Bruijn tree T ( q , w ) T ( q , w ) T(q,w)T(q, w)T(q,w) with the root w A k w A k w inA^(k)w \in \mathcal{A}^{k}wAk is a q q qqq-ary tree defined recursively as follows:
i i iii. The k k kkk-length word w w www over the alphabet A = { e 1 , e 2 , . e q } A = e 1 , e 2 , . e q A={e_(1),e_(2),dots.e_(q)}\mathcal{A}=\left\{e_{1}, e_{2}, \ldots . e_{q}\right\}A={e1,e2,.eq} is the root of T ( q , w ) T ( q , w ) T(q,w)T(q, w)T(q,w).
Fig. 2: The de Bruijn graph B ( 2 , 3 ) B ( 2 , 3 ) B(2,3)B(2,3)B(2,3).
i i i i iii iii. If at any step of the recursive construction of the tree, x 1 x 2 x k x 1 x 2 x k x_(1)x_(2)dotsx_(k)x_{1} x_{2} \ldots x_{k}x1x2xk is a (temporary) leaf (a vertex with outdegree equal to 0 ), then each word among x 2 x 3 x k e 1 , x 2 x 3 x k e 2 , , x 2 x 3 x k e q x 2 x 3 x k e 1 , x 2 x 3 x k e 2 , , x 2 x 3 x k e q x_(2)x_(3)dotsx_(k)e_(1),x_(2)x_(3)dotsx_(k)e_(2),dots,x_(2)x_(3)dotsx_(k)e_(q)x_{2} x_{3} \ldots x_{k} e_{1}, x_{2} x_{3} \ldots x_{k} e_{2}, \ldots, x_{2} x_{3} \ldots x_{k} e_{q}x2x3xke1,x2x3xke2,,x2x3xkeq which is not in the path from the root to x 1 x 2 x k x 1 x 2 x k x_(1)x_(2)dotsx_(k)x_{1} x_{2} \ldots x_{k}x1x2xk will be a descendant of x 1 x 2 x k x 1 x 2 x k x_(1)x_(2)dotsx_(k)x_{1} x_{2} \ldots x_{k}x1x2xk.
i i i i i i iiii i iiii. The rule i i i i iii iii is applied as many times as it is possible.
A path is maximal if we cannot add an arc to its beginning or to its end without destroying the path property. If a maximal path is of maximal length then it is a Hamiltonian one. In any de Bruijn tree each branch is a maximal path in the de Bruijn graph B ( q , k ) B ( q , k ) B(q,k)B(q, k)B(q,k) which begins with the root, and all maximal paths beginning with the root occur. For the de Bruijn trees T ( 2 , 000 ) T ( 2 , 000 ) T(2,000)T(2,000)T(2,000), T ( 2 , 001 ) , T ( 2 , 010 ) T ( 2 , 001 ) , T ( 2 , 010 ) T(2,001),T(2,010)T(2,001), T(2,010)T(2,001),T(2,010) and T ( 2 , 100 ) T ( 2 , 100 ) T(2,100)T(2,100)T(2,100) see Fig. 3-6. The word obtained by Martin's algorithm corresponds to the branch of maximal length in the right side of the de Bruijn tree T ( 2 , 001 ) T ( 2 , 001 ) T(2,001)T(2,001)T(2,001).

4 Methods to compute M ( N ) M ( N ) M(N)M(N)M(N)

The number M ( N ) M ( N ) M(N)M(N)M(N) of the words of length N N NNN for which the maximal complexity is equal to the global maximal complexity K ( N ) K ( N ) K(N)K(N)K(N) can be expressed both in terms of certain paths in a de Bruijn graph and of some vertices in the de Bruijn trees.
Proposition 3. If Card ( A ) = q Card ( A ) = q Card(A)=q\operatorname{Card}(\mathcal{A})=qCard(A)=q and q k + k N q k + 1 + k q k + k N q k + 1 + k q^(k)+k <= N <= q^(k+1)+kq^{k}+k \leq N \leq q^{k+1}+kqk+kNqk+1+k then M ( N ) M ( N ) M(N)M(N)M(N) is equal to the number of different paths of length N k 1 N k 1 N-k-1N-k-1Nk1 in the de Bruijn graph B ( q , k + 1 ) B ( q , k + 1 ) B(q,k+1)B(q, k+1)B(q,k+1).
Proof. From Propositions 1 and 2 it follows that the number M ( N ) M ( N ) M(N)M(N)M(N) of the words of length N N NNN with global maximal complexity is given by the number of words w A N w A N w inA^(N)w \in \mathcal{A}^{N}wAN with f w ( k + 1 ) = N k f w ( k + 1 ) = N k f_(w)(k+1)=N-kf_{w}(k+1)=N-kfw(k+1)=Nk. It means that these words contain N k N k N-kN-kNk subwords of length k + 1 k + 1 k+1k+1k+1, all of them distinct. To enumerate all of them we start successively with each word of k + 1 k + 1 k+1k+1k+1 letters (hence with each vertex in B ( q , k + 1 ) B ( q , k + 1 ) B(q,k+1)B(q, k+1)B(q,k+1) ) and we add at each step, in turn, one of the symbols from A A A\mathcal{A}A which does not duplicate a word of length k + 1 k + 1 k+1k+1k+1 which has already appeared.
Fig. 3: De Bruijn tree T ( 2 , 000 ) T ( 2 , 000 ) T(2,000)T(2,000)T(2,000).
Fig. 4: De Bruijn tree T ( 2 , 001 ) T ( 2 , 001 ) T(2,001)T(2,001)T(2,001).
Fig. 5: De Bruijn tree T ( 2 , 100 ) T ( 2 , 100 ) T(2,100)T(2,100)T(2,100).
Fig. 6: De Bruijn tree T ( 2 , 010 ) T ( 2 , 010 ) T(2,010)T(2,010)T(2,010).
Of course, not all of the trials will finish in a word of length N N NNN, but those which do this, are precisely paths in B ( q , k + 1 ) B ( q , k + 1 ) B(q,k+1)B(q, k+1)B(q,k+1) starting with each vertex in turn and having the length N k 1 N k 1 N-k-1N-k-1Nk1. Hence to each word of length N N NNN with f w ( k + 1 ) = N k f w ( k + 1 ) = N k f_(w)(k+1)=N-kf_{w}(k+1)=N-kfw(k+1)=Nk we can associate a path and only one of length N k 1 N k 1 N-k-1N-k-1Nk1 starting from the vertex given by the first k + 1 k + 1 k+1k+1k+1 letters of the initial word; conversely, any path of length N k 1 N k 1 N-k-1N-k-1Nk1 will provide a word w w www of length N N NNN which contains N k N k N-kN-kNk distinct subwords of length k + 1 k + 1 k+1k+1k+1.
Remark 3. The number of words of length N N NNN having global maximal complexity can be also expressed by means of certain vertices in the de Bruijn trees. M ( N ) M ( N ) M(N)M(N)M(N) is equal to the number of vertices at the level N k 1 N k 1 N-k-1N-k-1Nk1 in the set { T ( q , w ) w A k + 1 } T ( q , w ) w A k + 1 {T(q,w)∣w inA^(k+1)}\left\{T(q, w) \mid w \in \mathcal{A}^{k+1}\right\}{T(q,w)wAk+1} of the de Bruijn trees. (The level of the root is considered to be 0 , its descendants are on level 1 etc.)
The other four trees corresponding to the de Bruijn graph B ( 2 , 3 ) B ( 2 , 3 ) B(2,3)B(2,3)B(2,3) are mirror images of those in Fig. 3-6; we obtain, for example, M ( 6 ) M ( 6 ) M(6)M(6)M(6) by doubling the number of vertices at level 3 in Fig. 3-6, i. e. M ( 6 ) = 2 18 = 36 M ( 6 ) = 2 18 = 36 M(6)=2*18=36M(6)=2 \cdot 18=36M(6)=218=36. Similarly M ( 7 ) = 2 21 = 42 M ( 7 ) = 2 21 = 42 M(7)=2*21=42M(7)=2 \cdot 21=42M(7)=221=42 is obtained by doubling the number of vertices at level 4, and so on up to M ( 10 ) = 2 8 = 16 M ( 10 ) = 2 8 = 16 M(10)=2*8=16M(10)=2 \cdot 8=16M(10)=28=16 (using the vertices at level 7). These results are in accordance with those given in Table 1 obtained by counting all possible words with maximal complexity.
A formula for the number M ( N ) M ( N ) M(N)M(N)M(N) of the words whose maximal complexity is equal to the global maximal complexity K ( N ) K ( N ) K(N)K(N)K(N) can be given for the special case of de Bruijn words.
Proposition 4. If N = 2 k + k 1 N = 2 k + k 1 N=2^(k)+k-1N=2^{k}+k-1N=2k+k1 then M ( N ) = 2 2 k 1 M ( N ) = 2 2 k 1 M(N)=2^(2^(k-1))M(N)=2^{2^{k-1}}M(N)=22k1.
Proof. The number of distinct Hamiltonian cycles in the de Bruijn graph B ( 2 , k ) B ( 2 , k ) B(2,k)B(2, k)B(2,k) is equal to 2 2 k 1 k 2 2 k 1 k 2^(2^(k-1)-k)2^{2^{k-1}-k}22k1k [2]. With each vertex of a Hamiltonian cycle a de Bruijn word (containing all the factors of length k k kkk ) begins, which has maximal complexity, so M ( N ) = 2 k 2 2 k 1 k M ( N ) = 2 k 2 2 k 1 k M(N)=2^(k)*2^(2^(k-1)-k)M(N)=2^{k} \cdot 2^{2^{k-1}-k}M(N)=2k22k1k, which proves the proposition. (In [3] the number of circular de Bruijn words is found, which corresponds to the number of Hamiltonian cycles in de Bruijn graphs).
A generalization for q 2 q 2 q >= 2q \geq 2q2 can be proved in a similar way using the results in [1].
Proposition 5. If N = q k + k 1 N = q k + k 1 N=q^(k)+k-1N=q^{k}+k-1N=qk+k1 then M ( N ) = ( q ! ) q k 1 M ( N ) = ( q ! ) q k 1 M(N)=(q!)^(q^(k-1))M(N)=(q!)^{q^{k-1}}M(N)=(q!)qk1.
In Proposition 1, respectively Proposition 2, we have determined for each natural number N N NNN the value of the global maximal complexity K ( N ) K ( N ) K(N)K(N)K(N), respectively the set of values i i iii for which there exists a word of length N N NNN with K ( N ) K ( N ) K(N)K(N)K(N) subwords of length i i iii. To obtain a general formula for M ( N ) M ( N ) M(N)M(N)M(N) for each natural number N N NNN is still an open problem.

References

[1] T. van Aardenne-Ehrenfest, N. G. de Bruijn, Circuits and trees in oriented linear graphs, Simon Stevin 28 (1951), 203-217.
[2] J. Bond, A. Iványi, Modelling of interconnection networks using de Bruijn graphs, Third Conference of Program Designer, Ed. A. Iványi, Budapest, 1987, 75-87.
[3] N. G. de Bruijn, A combinatorial problem, Nederl. Akad. Wetensch. Proc. 49 (1946), 758-764.
[4] M. Heinz, Zur Teilwortkomplexität für Wörter und Folgen über einem endlichen Alphabet, EIK 13 (1977), 27-38.
[5] M. Lothaire, Combinatorics on words, Addison-Wesley, Reading, MA, 1983.
[6] A. de Luca, On the combinatorics of finite words, Theor. Comput. Sci. 218 (1999), 13-39.
[7] M. H. Martin, A problem in arrangements, Bull. A. M. S. 40 (1934), 859864.

  1. *Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy, Cluj-Napoca, E-mail: mira@math.ubbcluj.ro Partially supported by grant CNCSIS 125/64-2001/2002
    ^(†){ }^{\dagger} Bolyai Institute of Mathematics, University of Szeged, E-mail: blazsik@sol.cc.u-szeged.hu
    ^(‡){ }^{\ddagger} Faculty of Mathematics and Informatics, Babes-Bolyai University of Cluj-Napoca, Email: kasa@cs.ubbcluj.roPartially supported by grant CNCSIS 125/64-2001/2002
2002

Related Posts