Abstract
The subword complexity function \(p_{w}\) of a finite word \(w\) over a finite alphabet A with card \(A=q\geq1\) is defined by \(p_{w}(n)=\)card \((F(w)\cap A^{n})\) for \(n\in N\), where \(F(w)\) represents the set of all the subwords or factors of \(w\). The shape of the complexity function, especially its piecewise monotonicity, is studied in detail. \newline \qquad The function h defined as \(h(n)=min\{q^{n},N-n+1\}forn\in \{0,1,…,N\}\) has values greater than or equal to those of the complexity function pw for any \(w\in A^{N}\) , i.e., \(pw(n)\leq h(n)\) for all \(n\in \{0,1,…,N\}\). As a first result regarding \(h\), it is proved that for each \(N\in N\) there exist words of length N for which the maximum of their complexity function is equal to the maximum of the function \(h\); a way to construct such words is described. This result gives rise to a further question: for a given \(N\), is there a word of length \(N\) whose complexity function coincides with h for each \(n\in \{0,1,…,N\}\)? The problem is answered in affirmative, with different constructive proofs for binary alphabets \((q=2)\) and for those with \(q>2\). This means that for each \(N\in N\), there exist words w of length \(N\) whose complexity function is equal to the function h. Such words are constructed using the de Bruijn graphs.
Authors
Mira-Cristiana Anisiu
T. Popoviciu” Institute of Numerical Analysis, Romanian Academy, Romania
Julien Cassaigne
Institut de Mathematiques de Luminy, CNRS UPR 9016 / FRUMAM, Case 907, 13288 Marseille, France,
Keywords
Subword complexity function; finite words; de Bruijn graph.
Paper coordinates
M.-C. Anisiu, J. Cassaigne, Properties of the complexity function for finite words, Rev. Anal. Numér. Théor. Approx. 33 (2) (2004), 123-139 (pdf file here)
About this paper
Print ISSN
Online ISSN
google scholar link
[1] Anisiu, M.-C., Blazsik, Z. and Kasa, Z., Maximal complexity of finite words, Pure Math. Appl., 13, pp. 39–48, 2002.
[2] de Bruijn, N. G., A combinatorial problem, Nederl. Akad. Wetensch. Proc., 49, pp. 758–764, 1946 = Indag. Math., 8, pp. 461–467, 1946.
[3] de Bruijn, N. G., Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of 2n zeros and ones that show each n-letter word exactly once,
T. H. -Report 75-WSK-06, Technological University Eindhoven, the Netherlands, pp. 1–14, 1975.
[4] Champernowne, D. G., The construction of decimals normal in the scale of ten, J. London Math. Soc., 8, pp. 254–260, 1933.
[5] Cummings, L. J. and Wiedemann, D., Embedded de Bruijn sequences, Proceedings of the 7th Southeastern international conference on combinatorics, graph theory, and computing (Boca Raton, Florida, 1986), Congr. Numer., 53, pp. 155–160, 1986.
[6] Ehrenfeucht, A., Lee, K. P. and Rozenberg, G., Subword complexities of various classes of deterministic developmental languages without interactions, Theoret. Comput. Sci., 1, pp. 59–75, 1975.
[7] Flye Sainte-Marie, C., Solution to question nr. 48, l’ Intermediaire des Mathematiciens, 1, pp. 107–110, 1894.
[8] Fredericksen, H., A survey of full length nonlinear shift register cycle algorithms, SIAM Review, 24, pp. 195–221, 1982.
[9] Games, R. A., A generalized recursive construction for de Bruijn sequences, IEEE Trans. Inform. Theory, 29, pp. 843–850, 1983.
[10] Good, I. J., Normal recurring decimals, J. London Math. Soc., 21, pp. 167–169, 1946.
[11] Heinz, M., Zur Teilwortkomplexit¨at f¨ur W¨orter und Folgenuber einem endlichen Alphabet, EIK, 13, pp. 27–38, 1977.
[12] Hunyadvary, L. and Ivanyi, A., On some complexity measures of words, Dep. Math., Karl Marx Univ. Econ., Budapest 1984-2, pp. 67–82, 1984.
[13] Hunyadvary, L. and Ivanyi, A. , Construction of complex chains sequences and ring sequences, in Abstracts of Colloquium Theory of Algorithms, Pecs, p. 20, 1984.
[14] Leve, F. and Seebold, P. , Proof of a conjecture on word complexity, Journees Montoises d’Informatique Theorique (Marne-la-Vallee, 2000), Bull. Belg. Math. Soc. Simon Stevin, 8, pp. 277–291, 2001.
[15] de Luca, A., On the combinatorics of finite words, Theoret. Comput. Sci., 218, pp 13–39, 1999.
[16] Martin, M. H., A problem in arrangements, Bull. American Math. Soc., 40, pp. 859–864, 1934.
[17] Morse, M. and Hedlund, G. A., Symbolic dynamics, Amer. J. Math., 60, pp. 815–866, 1938.
[18] Ralston, A., A new memoryless algorithm for de Bruijn sequences, J. Algorithms, 2, pp. 50–62, 1981.
[19] Shallit, J., On the maximum number of distinct factors in a binary string, Graph Comb., 9, pp. 197–200, 1993.
[20] Voros, N. , On the complexity measures of symbol-sequences, in Proceedings of the Conference of Young Programmers and Mathematicians (ed. A. Iv´anyi), Eotvos Lorand University, Budapest, pp. 43–50, 1984.
soon
Paper (preprint) in HTML form
PROPERTIES OF THE COMPLEXITY FUNCTION FOR FINITE WORDS
Abstract
The subword complexity function of a finite word over a finite alphabet with is defined by for , where represents the set of all the subwords or factors of . The shape of the complexity function, especially its piecewise monotonicity, is studied in detail.
The function defined as for has values greater than or equal to those of the complexity function for any , i.e., for all . As a first result regarding , it is proved that for each there exist words of length for which the maximum of their complexity function is equal to the maximum of the function ; a way to construct such words is described. This result gives rise to a further question: for a given , is there a word of length whose complexity function coincides with for each ? The problem is answered in affirmative, with different constructive proofs for binary alphabets and for those with . This means that for each , there exist words of length whose complexity function is equal to the function . Such words are constructed using the de Bruijn graphs.
MSC 2000. 68R15.
Keywords. Subword complexity function, finite words, de Bruijn graph.
1. DEFINITIONS
Let an alphabet with card be given. A factor (subword) of an infinite sequence or finite word has the right valence if there are and only distinct letters such that are also in (the set of all the subwords, or factors, of ); if a factor has the right valence it can be extended on the right in exactly ways. The left valence is defined in a similar way. A factor having the right (left) valence is called right (left) special; a factor which is both right and left special is called bispecial. The length of a word will be denoted by .
† Institut de Mathématiques de Luminy, CNRS UPR 9016 / FRUMAM, Case 907, 13288 Marseille, France, e-mail: cassaign@iml.univ-mrs.fr.
For an infinite sequence any factor can always be extended on the right in a factor of . For a finite word there are subwords which cannot be extended on the right. Such words have to be suffixes of . Let us denote by the suffix of of minimal length which cannot be extended on the right and by the length of . Then any other subword also cannot be extended on the right. Considering the prefix of of minimal length which cannot be extended on the left, we shall denote its length by . The constants and were defined by de Luca [15].
Let us denote by the set of all suffixes of which cannot be extended on the right in , i.e., their right valence is 0 . If the length of is , then we set for any
For all , one has . Moreover, being the length of (the suffix of of minimal length which cannot be extended on the right), is given by
It follows that the number of subwords which cannot be extended on the right is
For an infinite sequence , the (subword) complexity function (defined in [17] as the block growth, then named subword complexity in [6]) is given by for , so it maps each nonnegative number to the number of factors of length of ; it verifies the iterative equation
| (1) |
being the cardinal of the set of the factors of having the length and the right valence .
For a finite word of length , the complexity function given by , has the property that for . The corresponding iterative equation is
| (2) |
Since we can write (2) in a condensed form
| (3) |
The above relations have their correspondents in terms of left extensions of the subwords.
For a finite word of length over the alphabet with card , the subword complexity will be less than or equal to the number of all the possible words of length over the -letter alphabet and also less than or equal to the number of all occurrences of subwords of length in . The map defined in [15]
| (4) |
will have values greater than or equal to those of any complexity function for , i.e., . The fact that was stated in [11], while appeared in [20].
We recall that for infinite sequences one has
and that there exist sequences, called complete, for which the complexity is precisely for all . An example is the Champernowne sequence
0.1.10.11.100.101.110.111.1000. …
containing successively all the nonnegative integers written in base 2 , and, more generally, in base (it was used in [4] to construct a normal number in base ten).
2. PROPERTIES OF THE FUNCTION
Remark 1. We mention at first the trivial case when , for which for all ; there is a word, namely , whose complexity satisfies for all .
For and , we have
hence for all . For each word containing distinct elements of the complexity function is for all , and , hence in this case it also coincides with .
In what follows we shall consider and . The values of the function are given by the minimum of the values of an increasing exponential and of a descending line, so at the beginning will follow the exponential, and then the descending line. The following result is presented without proof in [15]:
Proposition 1. If denotes the first point where ( ) is on the descending line, the maximum of the function is attained at , and is given by or (for a real , denotes the largest integer which is less than or equal to ).
We shall determine precisely the point where the maximum of the function is taken.
Proposition 2. Let . If is given so that , then and this is the unique point where attains its maximum ; for , we have and the function attains its maximum at both and .
In fact, if , the function is given by
and , for the maximum being attained also at the point .
Proof. Let be given so that . The function being increasing on and decreasing on , we have only to compare its values on and . From the definition of and of we have
| (6) |
and
| (7) |
which means that
| (8) |
But and , hence from (6) it follows that the maximum of is taken at . The function being increasing, from (8) one obtains and .
If we have , so is the unique point where attains its maximum which is equal to ; if , we have , so the maximum of is taken at two points and .
The description of given in (5) was established in [12]. The value of being related to the integer part of , we can give a more precise result than that in the above Proposition 1.
Remark 2. For , we have for , and for ; it follows that in the first case , and in the second case .
Given the number , the maximum of the function can be easily determined.
Example 1. Let ; we obtain and . For , we have and .
3. PROPERTIES OF THE COMPLEXITY FUNCTION
For an infinite sequence , the complexity function is nondecreasing; if there exists such that , then is constant for . The complexity function for a finite word of length has obviously a different behaviour, because of (there is a unique factor of length , namely . The study of the shape of was considered by Heinz [11] and then by de Luca in [15], the results in [15] being briefly exposed in what follows. Then the piecewise monotonicity of is established in Theorems 5 and 6.
Let us consider for the number of all right special factors of length . Any suffix of a right special factor is still a right special factor. Since , one can define an integer by
One has ; thus represents the maximal length of a right special factor of (excepting the case of the word which has no special factor and for which ). If , in there are no right special factors of length ; such an example is . Similarly, there exists a number so that represents the maximal length of a left special factor of (except if ). Remember the number representing the minimal length of a suffix (prefix) of which cannot be extended on the right (left). The numbers and (or their duals and play an important role in the description of the shape of .
Let us denote
The function has the property that for , and for . The recurrence relation (2) can be written as
| (9) |
If , for one has and . From (9) we obtain that is strictly increasing on . For and , so is constant on the interval . For , and , so is strictly decreasing on , and, moreover, for .
If , for we obtain , so from (9) will be strictly increasing on . For and , hence is non-decreasing on . For one has and , which implies that is strictly decreasing on and, for .
It follows
Proposition 3. [15]. The subword complexity takes its maximum at and, moreover,
Proof. In both cases analyzed above, has its maximum at . If , for all , so and then . If for all in and . Since the result follows.
In a similar way, one can prove
Proposition 4. [15]. The subword complexity takes its maximum at and, moreover,
hence .
From the analysis before Proposition 3, we have the following information on the shape of the function [15]:
For , it is strictly increasing (starting from and ), then constant, and then strictly decreasing (with on the last interval).
For is at first strictly increasing, then non-decreasing, and at last strictly decreasing also with .
So in both cases, there is an interval on which is increasing and one on which is strictly decreasing. The only problem is that in the second case it could be that after becoming constant, would increase again. We show that this is not the case.
Let us consider , so and . Suppose that there exists so that
| (10) | |||
| (11) |
From (10) one obtains that and for , i.e., there exists a unique right special factor having length , and its valence is 2 : let it be denoted . From (11) it follows that
which is possible for two situations:
I. ;
II. .
If II. is true, there will be a right special factor of length having valence at least 3, and then the factor obtained by excluding its first letter will have length and valence at least 3 , contradicting the uniqueness of .
If I. is true, there will exist two different right special factors of length and valence at least 2 . They can differ only by their first letter, otherwise there would exist two different factors of length and valence 2 . So they will have the form
i.e., will be bispecial, and in the word there will be also
| (12) |
The subword cannot be a suffix of since is extendable to the right and there is no extendable suffix of length greater than or equal to . Let us consider the last occurrence of , suppose it is followed by . Then
| (13) |
and, being left special, will have another occurrence in , so
Let be the longest common prefix of and , which will satisfy
Since the subword is a proper prefix of is right extendable; then it cannot be a suffix of , hence it is also a proper prefix of , and thus right special. The suffix of length of is then right special, in contradiction with the fact that the last occurrence of , the unique right special factor of length , was chosen so that . It follows that in the case , if for a value , then will remain constant until it will begin (at ) to decrease to 1 (it cannot start increasing again). We mention that Heinz [11] proved that from and it follows .
Let us denote by the smallest number greater than or equal to for which has precisely one right special factor of that length, with valence 2 (if this is not the case, take ). We have established the following
Theorem 5. For a finite word of length , the complexity function is at first strictly increasing, then constant and at last decreasing having the slope -1 . If , the successive intervals are and , while for they are and .
One can easily avoid to analyze two cases by simply considering instead of a word , a word obtained by adding two different symbols which are not in at the beginning and at the end of , i.e., . The complexity functions for and are related by for (and obviously ). So the graph of is the graph of shifted by two units parallel to the -axis,
and the two functions have the same monotonicity. For we have , and, similarly, , hence in this case we have always ; from Proposition 4 it follows also . The advantage of considering the word is that instead of the four parameters we are left with only one, namely the common value of . Denoting by the smallest positive number for which has precisely one right special factor of that length, with valence 2 (if there is not such a factor, ), we obtain
Theorem 6. For a finite word of length , the intervals of monotonicity of are and , the function increasing at first, being constant and then decreasing with the slope -1 ; the maximum of is . The numbers and are those defined above for the word .
Example 2. Let , so , . In this case , , and for .
For , we have , and , and for .
REMARK 3. For the words in example 2, the function is concave on , i.e.,
However this is not the rule, as the following examples show for both and .
Example 3. Let , so , . In this case , and for .
Let , so , . In this case , and for .
We mention that the refinement of de Luca’s result has been proved independently by Levé and Séébold [14] while studying -reachable integers.
4. THE FUNCTION AND RELATED WORDS
In section 2 we found the point where the function takes its maximum. A problem to be considered is the following: are there any words of length such that ? If such words do exist, they have the property that the maximum of their complexity function cannot be exceeded by the maximum of the complexity function of any other word of length .
The answer to this problem is in affirmative and it relies on the following result which was stated by Good in [10] for . The enumeration of the words whose existence is proved was given by de Bruijn [2], who later [3] acknowledged the priority of C. Flye Sainte-Marie [7].
Lemma 7. Given an alphabet with card , for each the shortest word containing all the words of length has letters.
Proof. The existence of such a word (which is usually named de Bruijn word of order ) is proved by considering the de Bruijn graph (which is strongly connected) with vertices labelled with the elements of , and an from to exists if and only if there exist two letters such that . Each vertex has the same number of inward and outward arcs; therefore, there exists an Eulerian cycle, and each path, starting from any vertex and following the cycle until coming back to that vertex, will provide a word (obviously the shortest) of length which contains exactly one occurrence of all the words of length . The word of length is often identified with the cycle formed by its first letters.
Remark 4. For the de Bruijn word of order , whose existence was proved in Lemma 7, we have , and the maximum of its complexity function is attained at and equals . Such a word can be represented in the form (with ), or as a cycle or as an infinite periodic sequence with the period .
The first algorithm which constructs such a word was given by Martin [16]. Considering the alphabet , the algorithm in question is built up out the following three rules.
I. Each of the first symbols is chosen equal to .
II. The symbol to be added to the sequence , where and the ’s stand for the ’s in a certain order, is the with the greatest subscript consistent with the requirement that the section duplicate no previously occurring section of symbols in the above sequence.
III. Rule II. is first applied for (in which case ) and is then applied repeatedly until a further application is impossible.
This algorithm needs a very large memory (for all the subwords of length which have already been obtained), but there exist also some memoryless algorithms exposed, for example, in [8], [9] and [18].
We can prove now
Theorem 8. For each , there exists a word of length over an alphabet with for which the maximum of the complexity function is equal to the maximum of ; the maximum is taken at the same points for both functions. Such words can be easily constructed using the de Bruijn words.
Proof. Keeping in mind the considerations in Remark 1, which mean precisely that the theorem is true for and for , we shall consider and . Let be the unique natural number so that
If , we apply Lemma 7 for , obtaining a word of length containing as factors all the words of letters, and distinct words of length . The word obtained by adding a letter from at its end will contain words of letters and distinct words of length , hence . This is the maximum of the function , it is equal to and it is attained at the same points as the maximum of given in Proposition 2. Actually, in this case we have .
Let us now consider the case . Applying Lemma 7 for the number , we obtain a shortest word containing all the words of length , having letters. So for each , the prefix of obtained by deleting final letters will satisfy , this being the maximum of the complexity function for the considered word. The maximum is attained only for .
Applying Proposition 2 for we obtain , which means that the maximum of the complexity function for is equal to the maximum of the complexities of all possible words of length .
Example 4. Let us consider for the 2-letter alphabet the values and . For we have and, by adding a letter (for example ) to the Martin word of order 2, abbaa, we obtain ; the maximum of is . For we can consider the Martin word of order 3, aabbbabaaa, and delete three symbols from the end. The word has the maximum of its complexity function given by .
The maximum of the function for and was calculated in Example 1 and it coincides with that of , respectively and is taken at the same points.
5. THE REPRESENTATION OF AS A COMPLEXITY FUNCTION
An interesting problem is: Let and be given and the function defined as in (4). Is there a word of length over the -letter alphabet such that
| (14) |
i.e., is the complexity function for that word? If such a word does exist, how can it be constructed?
The question has an affirmative answer for the trivial cases and , , mentioned in Remark 1, so it has to be studied for . In the proof of Theorem 8 it was shown that, given the number , there exists a word of length containing distinct words of length , and also words of length . This means that and coincide on and . One the one hand, means that contains all possible words of length as factors, and this implies that it also contains all possible words of shorter lengths, hence for . On the other hand, means that each of the factors of length of occurs exactly once, as there are precisely available positions for a factor of this length, and this implies that longer factors occur only once too, hence for . We have shown that for all , and the question is positively answered for .
If we consider now , case which corresponds to the choice in the proof of Theorem 8, we obtain the existence of a word of length containing all words of length . The point ( ) being on both the curves ( ) and ( ), it follows that for all .
We mention at first a sufficient condition for the existence, for and , of a word of length whose complexity function is equal to .
Lemma 9. Given an alphabet with , if for each there exists a de Bruijn word of order from which it is possible to obtain successively words shorter with one symbol so that the number of subwords of length decreases by one, but the number of words of length remains , until we are left with a word of length , then for each there exists a word with .
Proof. Let be the word of length obtained from after having removed letters, at each step the number of subwords of length being diminished by 1 , while the number of subwords of length remains constant. Then and , hence for each .
Remark 5. The condition in Lemma 9 is fulfilled if there exists a de Bruijn word of order whose prefix is a de Bruijn word of order . In this case we can simply delete in turn one letter from the end of the word of order .
The existence of words which satisfy the conditions in Lemma 9 (in fact those in Remark 5) was proved for by Vörös [20]. It follows also as a consequence of a stronger result obtained by Cummings and Wiedemann in Proposition 2 from [5]. In fact the overlap of the two de Bruijn sequences in [5] is even longer than it is needed in Remark 5. We remind that the de Bruijn graph has as vertices the elements in and an arc from any vertex to , where for . The graph
has as vertices the arcs of , and the arcs in this graph are obtained by joining two consecutive arcs in . An Eulerian circuit in corresponds to a Hamiltonian one in and conversely. The result of Cummings and Wiedemann follows from the fact that if one removes from the Eulerian circuit in , which corresponds to a de Bruijn sequence of order , the circuit corresponding to a de Bruijn sequence of order , the remaining graph is still Eulerian and connected (it is essential that ).
Lemma 10. [5]. If and each de Bruijn sequence of order can be strongly embedded in a de Bruijn sequence of order with , i.e., the two sequences have the same symbols on the first positions.
It follows that, for , there exist infinite sequences whose prefixes of length satisfy (14) for each . Such sequences were called in [13] and [20] supercomplex; similarly, a word of length was called supercomplex if all its prefixes of length satisfied (14). In [13] and [20] it was shown that supercomplex sequences do not exist for binary alphabets, more precisely it was verified that a binary supercomplex word has the length at most 9 . This means that no de Bruijn sequence of order 2 can be embedded in a de Bruijn sequence of order 3. In [5] a general negative result is given for binary alphabets: in this case no de Bruijn sequence of order ever embeds in a de Bruijn sequence of order (even if we ask the coincidence to take place only for the first positions, hence a weak embedding). It follows that for a binary alphabet we cannot obtain a word as that in the sufficient condition in Remark 5, unless . Nevertheless we can construct in this case a de Bruijn word of order from which the sequences in Lemma 9 can be obtained, even if this word has not as a prefix a de Bruijn word of order .
Lemma 11. A finite number of cycles can be appended to any de Bruijn cycle of order over a binary alphabet in order to make it a de Bruijn cycle of order .
Proof. Let be a de Bruijn cycle of order . It will be also a Hamiltonian circuit in the de Bruijn graph . The graph , formed by all the vertices in and the arcs in which are not in the Hamiltonian circuit determined by , has each vertex of degree 2 (one outward and one inward arc). It follows that will be a union of vertex disjoint cycles, and and are arc disjoint. Each of these cycles will have common vertices with the Hamiltonian circuit determined by , hence they can be appended one by one to it to form finally an Eulerian circuit in , that is a de Bruijn cycle of order .
Now we can state
Theorem 12. For each alphabet with and for each there exists a word of length whose complexity function coincides with the function .
Proof. For , or and , the result was already proved in Remark 1.
Let and so that . Consider a de Bruijn cycle of order (constructed for example using Martin’s algorithm) and extend it as in Lemma 11, by adding vertex disjoint cycles, to a de Bruijn cycle of order . Write it as a de Bruijn word such that it ends with the letters of one of the appended cycles. When we remove one by one all the symbols in that cycle, the number of subwords of length will decrease at each step by one, but the number of subwords of length will remain the same (all these subwords are included in the initial de Bruijn cycle). Write again the obtained cycle as a word which ends with another appended cycle and delete in turn the last symbol until the cycle disappears. Finally we are left with a word of length obtained from the initial de Bruijn cycle of order , which contains words of length and words of length .
If we are not interested to obtain all the words of length , , but only a specific one, we can apply a result of Shallit [19]: For each the graph contains a cycle of length that can be used to construct a closed chain of length which visits every vertex at least once.
Finally, let and so that . Applying Lemma 10 for , we obtain the existence of a de Bruijn word of order which has as prefix a de Bruijn word of order , hence it satisfies the conditions in Lemma 9. It follows that for each there exists a word of length (obtained by successively deleting a symbol from the end of the de Bruijn word of order ) whose complexity function is the function corresponding to that .
Example 5. Let us first consider the case of a binary alphabet . We shall construct, as in the proof of Theorem 12, words with for which . We can obviously consider and . We have a weak embedding, marked by a gap, of the de Bruijn word of order 1, , in the de Bruijn word of order 2, (situation which is no longer possible for words of order , respectively for ). We obtain in turn and . Let us now consider for the Martin cycle , which corresponds to the word . The graph obtained from by removing all the arcs of is the union of the cycles , and , i.e., , respectively . Appending these to the cycle we obtain for instance the de Bruijn cycle of order , where we underlined the appended cycles; we write it as
deleting the symbols of ( ) from the end, and then those in the loops ( ) and (b) (which can be deleted without shifting the cycle) we obtain in turn
For all these words obtained from we have and , .
In general, besides the loops ( ) and ( ), for greater values of we can have in more than one cycle. We shall consider now and we shall construct the words of length . To avoid too lengthy words we shall write for the concatenation of letters . For , the Martin cycle is . The graph obtained from by removing the arcs of is the union of the cycles
| (15) |
where, for example, ( ) represents , and ( ) is . A de Bruijn cycle of order 6 obtained by appending the five cycles (which will be underlined) is
and we can write
the last one corresponding to the cycle . We write it as a word ending with the next cycle, , in the union (15)
and we obtain
We write now the corresponding word ending with the cycle ( )
and from this we get
Deleting the loop ( ) and then the loop ( ) we obtain at last
We have and for .
Let now a 3-letter alphabet be given (the situation is similar for any ). We have obviously . From a de
Bruijn word of order 2 which contains a strongly embedded de Bruijn word of order 1
abca accbba,
the gap marking the end of the overlapping, we can obtain the words
Similarly, from the de Bruijn word of order 3 which contains a strongly embedded de Bruijn word of order 2
aabbacbccaa cababcacccbbbcbaaa,
we can obtain successively the words
which satisfy and for .
Remark 6. It is clear now that Theorem 8 is a consequence of the stronger result from Theorem 12. However, if one is interested only in obtaining words with , the constructive methods in the proof of Theorem 8 are simpler and faster.
6. OTHER COMPLEXITY MEASURES FOR FINITE WORDS
The complexity function which was used throughout the paper has the advantage that it can be defined in the same way both for infinite sequences and for finite words, as it was stated in the introduction. As far as finite words are concerned, the first measure of subword complexity seems to have been introduced by Heinz [11] as the total number of factors of ,
The problem of studying the maximum of over all the words of length over a finite alphabet with elements was a central one. It is easy to see that the maximum of over the words in is attained at if for each the maximum of is attained at . One has then the delimitation (obtained in [20])
with defined in (4), and, having in mind the explicit form of in (5),
| (16) |
where is the unique natural number for which . The bound in (16) appeared in [12] and [19]. In view of Theorem 12, there are words of length whose total complexity equals the value in the right hand side of (16). The existence of such words, as it was already mentioned in the proof of Theorem 12, was established for binary alphabets in [19].
There are also other notions of complexity for finite words. The maximal complexity of a word , defined by Rauzy, is
A notion of global complexity for finite words is given in [1], namely the global maximal complexity in
By it is denoted the set of the values for which there exists a word such that
With these notations we obtain from Theorem 12
Corollary 13. For each , the global maximal complexity is given by , the function being defined by (4).
Proof. We have
the last equality following from the fact that coincides with the complexity function for at least one word of length .
Applying the result in Proposition 2, the values for and given in [1] can be easily obtained.
Corollary 14. For , we have . If , then ; if , then .
Acknowledgment. This research was partially supported by the program of academic exchange CNRS-Romanian Academy.
REFERENCES
[1] Anisiu, M.-C., Blázsik, Z. and Kása, Z., Maximal complexity of finite words, Pure Math. Appl., 13, pp. 39-48, 2002.
[2] de Bruijn, N. G., A combinatorial problem, Nederl. Akad. Wetensch. Proc., 49, pp. 758-764, 1946 = Indag. Math., 8, pp. 461-467, 1946.
[3] de Bruijn, N. G., Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of zeros and ones that show each -letter word exactly once, T. H. -Report 75-WSK-06, Technological University Eindhoven, the Netherlands, pp. 114, 1975.
[4] Champernowne, D. G., The construction of decimals normal in the scale of ten, J. London Math. Soc., 8, pp. 254-260, 1933.
[5] Cummings, L. J. and Wiedemann, D., Embedded de Bruijn sequences, Proceedings of the 7th Southeastern international conference on combinatorics, graph theory, and computing (Boca Raton, Florida, 1986), Congr. Numer., 53, pp. 155-160, 1986.
[6] Ehrenfeucht, A., Lee, K. P. and Rozenberg, G., Subword complexities of various classes of deterministic developmental languages without interactions, Theoret. Comput. Sci., 1, pp. 59-75, 1975.
[7] Flye Sainte-Marie, C., Solution to question nr. 48, l’Intermédiaire des Mathématiciens, 1, pp. 107-110, 1894.
[8] Fredericksen, H., A survey of full length nonlinear shift register cycle algorithms, SIAM Review, 24, pp. 195-221, 1982.
[9] Games, R. A., A generalized recursive construction for de Bruijn sequences, IEEE Trans. Inform. Theory, 29, pp. 843-850, 1983.
[10] Good, I. J., Normal recurring decimals, J. London Math. Soc., 21, pp. 167-169, 1946.
[11] Heinz, M., Zur Teilwortkomplexität für Wörter und Folgen über einem endlichen Alphabet, EIK, 13, pp. 27-38, 1977.
[12] Hunyadváry, L. and Iványi, A., On some complexity measures of words, Dep. Math., Karl Marx Univ. Econ., Budapest 1984-2, pp. 67-82, 1984.
[13] Hunyadváry, L. and Iványi, A., Construction of complex chains sequences and ring sequences, in Abstracts of Colloquium Theory of Algorithms, Pécs, p. 20, 1984.
[14] Levé, F. and Séébold, P., Proof of a conjecture on word complexity, Journées Montoises d’Informatique Théorique (Marne-la-Vallée, 2000), Bull. Belg. Math. Soc. Simon Stevin, 8, pp. 277-291, 2001.
[15] de Luca, A., On the combinatorics of finite words, Theoret. Comput. Sci., 218, pp. 13-39, 1999.
[16] Martin, M. H., A problem in arrangements, Bull. American Math. Soc., 40, pp. 859864, 1934.
[17] Morse, M. and Hedlund, G. A., Symbolic dynamics, Amer. J. Math., 60, pp. 815-866, 1938.
[18] Ralston, A., A new memoryless algorithm for de Bruijn sequences, J. Algorithms, 2, pp. 50-62, 1981.
[19] Shallit, J., On the maximum number of distinct factors in a binary string, Graph Comb., 9, pp. 197-200, 1993.
[20] Vörös, N., On the complexity measures of symbol-sequences, in Proceedings of the Conference of Young Programmers and Mathematicians (ed. A. Iványi), Eötvös Loránd University, Budapest, pp. 43-50, 1984.
Received by the editors: March 10, 2004.
