Properties of palindromes in finite words


Abstract

We present a method which displays all palindromes of a given length from De Bruijn words of a certain order, and also a recursive one which constructs all palindromes of length $n+1$ from the set of palindromes of length $n$. We show that the palindrome complexity function, which counts the number of palindromes of each length contained in a given word, has a different shape compared with the usual (subword) complexity function. We give upper bounds for the average number of palindromes contained in all words of length $n$, and obtain exact formulae for the number of palindromes of length 1 and 2 contained in all words of length $n$.

Authors

Mira-Cristiana Anisiu
Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy, Cluj-Napoca

Valeriu Anisiu
Department of Mathematics Faculty of Mathematics and Computer Science Babeş-Bolyai University of Cluj-Napoca

Zoltán Kása
Department of Computer Science Faculty of Mathematics and Computer Science Babeş-Bolyai University of Cluj-Napoca

Keywords

?

Paper coordinates

M.-C. Anisiu, V. Anisiu, Z. Kása, Properties of palindromes in finite words, Pure Math. Appl., 17 (2006) nos. 3-5, pp. 183-195.

PDF

https://ictp.acad.ro/anisiu/papers/2006-Anisiu-A-K-Properties.pdf

http://www.bke.hu/puma/17_3/AnisiuAnisiuKasa.pdf

About this paper

Journal

Pure Mathematics and Applications

Publisher Name

Romanian Academy

DOI

Print ISSN

Online ISSN

1788-800X



Mira-Cristiana Anisiu
Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy, Cluj-Napoca
e-mail: mira@math.ubbcluj.ro

Valeriu Anisiu
Department of Mathematics, Faculty of Mathematics and Computer Science, Babeş-Bolyai University of Cluj-Napoca
e-mail: anisiu@math.ubbcluj.ro

Zoltán Kása
Department of Computer Science, Faculty of Mathematics and Computer Science, Babeş-Bolyai University of Cluj-Napoca
e-mail: kasa@cs.ubbcluj.ro

(Received: July 12-15, 2006)


Mathematics Subject Classifications (2000). 68R15

1 Introduction

The palindrome complexity of infinite words has been studied by several authors (see [1], [3], [14] and the references therein). Similar problems related to the number of palindromes are important for finite words too. One of the reasons is that palindromes occur in DNA sequences (over 4 letters) as well as in protein description (over 20 letters), and their role is under research ([9]).

Let an alphabet $A$ with $\mathrm{card}(A) = q \geq 1$ be given. The set of the words of length $n$ over $A$ will be denoted by $A^{n}$.

Given a word $w = w_{1} w_{2} \dots w_{n}$, the reversal of $w$ is $\tilde{w} = w_{n} \dots w_{2} w_{1}$. Denoting by $\varepsilon$ the empty word, we put by convention $\tilde{\varepsilon} = \varepsilon$. The word $w$ is a palindrome if $\tilde{w} = w$. We denote by $a^{k}$ the word $\underbrace{a \dots a}_{k \text{ times}}$. The set of the subwords of a word $w$ which are nonempty palindromes will be denoted by $\mathrm{PAL}(w)$. The (infinite) set of all palindromes over the alphabet $A$ is denoted by $\mathrm{PAL}(A)$, while $\mathrm{PAL}_{n}(A) = \mathrm{PAL}(A) \cap A^{n}$.

2 Storing and generating palindromes

An old problem asks if, given an alphabet $A$ with $\mathrm{card}(A) = q$, there exists a word of length $q^{k} + k - 1$, the shortest possible, containing all the $q^{k}$ words of length $k$. The answer is affirmative and was given in [6], [10], [4]. For each $k \in \mathbb{N}$, these words are called De Bruijn words of order $k$. This property can be proved by means of the Eulerian cycles in the De Bruijn graph $B_{k-1}$. If a window of length $k$ is moved along a De Bruijn word, at each step a different word is seen, all the $q^{k}$ words being displayed.

We ask if it is possible to arrange all palindromes of length $k$ in a similar way. The answer is in general no, except for the case of the two palindromes $a b a \dots a$ and $b a b \dots b$ of odd length.

Proposition 1 Given a word $w \in A^{n}$ and $k \geq 2$, the following statements are equivalent:
(1) all the subwords of length $k$ are palindromes;
(2) $n$ is even, $k = n - 1$ and there exist $a, b \in A$, $a \neq b$, so that $w = (a b)^{n/2}$.

Furthermore, in this case the only palindromes of length $k$ in $w$ are $(a b)^{n/2 - 1} a$ and $(b a)^{n/2 - 1} b$.

Proof. Let us consider the first two palindromes $a_{1} a_{2} \dots a_{k}$ and $b_{1} b_{2} \dots b_{k}$ such that $a_{2} a_{3} \dots a_{k} = b_{1} b_{2} \dots b_{k-1}$, hence

$$a_{k-i+1} = a_{i} = b_{i-1} = b_{k-i+2}, \quad i = 2, \dots, k.$$

It follows that

$$\begin{array}{ll} i = 2 & a_{k-1} = a_{2} = b_{1} = b_{k} \\ i = 3 & a_{k-2} = a_{3} = b_{2} = b_{k-1} \\ i = 4 & a_{k-3} = a_{4} = b_{3} = b_{k-2} \\ \dots & \\ i = k-1 & a_{2} = a_{k-1} = b_{k-2} = b_{3} \\ i = k & a_{1} = a_{k} = b_{k-1} = b_{2} \end{array}$$

If $k = 2l$ ($l \geq 1$), we have $b_{2} = a_{1} = a_{3} = \dots = a_{k-1}$ and $b_{3} = a_{2} = \dots = a_{k}$, and $a_{1} a_{2} \dots a_{k}$ is a palindrome if and only if $a_{1} = a_{2} = \dots = a_{k}$, hence $a_{1} a_{2} \dots a_{k} = a^{k}$; it follows that $b_{1} b_{2} \dots b_{k} = a^{k}$ too, and the two palindromes are equal.

If $k = 2l + 1$, we have $b_{2} = a_{1} = a_{3} = \dots = a_{k}$ and $b_{3} = a_{2} = \dots = a_{k-1}$, hence $a_{1} a_{2} \dots a_{k} = a b a b \dots a$ ($a \neq b$) and $b_{1} b_{2} \dots b_{k} = b a b \dots b$. If another palindrome follows, it must again be $(a b)^{n/2}$ (equal to the first one).

Remark 1 For $k = 1$, the maximum length of a word containing all distinct palindromes of length 1 (i.e. letters) exactly once is $n = q$.

It is obvious that for $k \geq 2$ it is not possible to arrange all palindromes of length $k$ in the most compact way. But each palindrome is determined by the parity of its length and its first $\lceil k/2 \rceil$ letters, where $\lceil \cdot \rceil$ denotes the ceiling function (which returns the smallest integer greater than or equal to its argument).

Proposition 2 All palindromes of length $k$ can be obtained from a De Bruijn word of length $q^{\lceil k/2 \rceil} + \lceil k/2 \rceil - 1$.

Proof. The De Bruijn word contains all different words of length $\lceil k/2 \rceil$. Each such word $a_{1} \dots a_{\lceil k/2 \rceil}$ can be extended to a palindrome by symmetry, for $k$ even, and by taking $a_{\lceil k/2 \rceil + 1} = a_{\lceil k/2 \rceil - 1}, \dots, a_{k} = a_{1}$, for $k$ odd.

Example 1 Let $k = 3$, $q = 3$ and consider the De Bruijn word of order $\lceil k/2 \rceil = 2$, $w_{1} = 0221201100$. From each word of length 2 which appears in the given De Bruijn word, we obtain the corresponding palindrome of length $k = 3$:

$$\begin{aligned} 02 &\to 020 \\ 22 &\to 222 \\ 21 &\to 212 \\ 12 &\to 121 \\ 20 &\to 202 \\ 01 &\to 010 \\ 11 &\to 111 \\ 10 &\to 101 \\ 00 &\to 000. \end{aligned}$$

Let $k = 4$, $q = 2$ and consider the De Bruijn word of order $\lceil k/2 \rceil = 2$, $w_{2} = 01100$. From each word of length 2 contained in 01100 we obtain by symmetry the corresponding palindrome of length $k = 4$:

$$\begin{aligned} 01 &\to 0110 \\ 11 &\to 1111 \\ 10 &\to 1001 \\ 00 &\to 0000. \end{aligned}$$

There are several algorithms which construct De Bruijn words, for example, in [16], [18], [7] and [8].
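The construction in Proposition 2 can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the function name `palindromes_from_de_bruijn` is hypothetical, and the De Bruijn word is assumed to be given as a string.

```python
def palindromes_from_de_bruijn(word, k):
    """Mirror every window of length ceil(k/2) of `word` into a palindrome of length k.

    `word` is assumed to be a De Bruijn word of order ceil(k/2); the function
    itself only slides the window and does not check that assumption.
    """
    half = (k + 1) // 2  # ceil(k / 2)
    result = []
    for i in range(len(word) - half + 1):
        prefix = word[i:i + half]
        # The remaining k - half letters are the reversed beginning of the prefix;
        # for odd k this skips the middle letter, for even k it repeats it.
        tail = prefix[:k - half][::-1]
        result.append(prefix + tail)
    return result


if __name__ == "__main__":
    # The two cases of Example 1.
    print(palindromes_from_de_bruijn("0221201100", 3))
    print(palindromes_from_de_bruijn("01100", 4))
```

Because a De Bruijn word shows each window exactly once, the returned list contains every palindrome of length $k$ exactly once.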

We can generate recursively all palindromes of length $n$, $n \in \mathbb{N}$, using the difference representation. This is based on the following proposition.

Proposition 3 If $w_{1}, w_{2}, \dots, w_{p}$ are all binary ($A = \{0, 1\}$) palindromes of length $n$, where $p = 2^{\lceil n/2 \rceil}$, $n \geq 1$, then, identifying each word with the integer it represents in base 2,

$$2 w_{1}, 2 w_{2}, \dots, 2 w_{p}, \; 2^{n+1} + 1 + 2 w_{1}, 2^{n+1} + 1 + 2 w_{2}, \dots, 2^{n+1} + 1 + 2 w_{p}$$

are all palindromes of length $n + 2$.

Proof. If $w$ is a binary palindrome of length $n$, then $0 w 0$ and $1 w 1$ are palindromes too, and they are the only palindromes of length $n + 2$ which contain $w$ as their central subword. Their values are $2 w$ and $2^{n+1} + 1 + 2 w$ respectively, which proves the proposition.
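As a small illustration of Proposition 3, the following Python sketch builds the integer values of all binary palindromes of a given length by repeatedly applying the two maps $w \mapsto 2w$ and $w \mapsto 2^{n+1} + 1 + 2w$. The function names and the choice of starting lists (lengths 1 and 2) are assumptions made for this example.

```python
def extend_binary_palindromes(values, n):
    """Given the values of all binary palindromes of length n, return those of length n + 2."""
    with_zeros = [2 * w for w in values]                     # 0 w 0
    with_ones = [2 ** (n + 1) + 1 + 2 * w for w in values]   # 1 w 1
    # Every 0w0 value is below 2^(n+1), every 1w1 value is above it,
    # so the concatenation stays sorted when `values` is sorted.
    return with_zeros + with_ones


def binary_palindromes(n):
    """Values of all binary palindromes of length n >= 1, in increasing order."""
    if n % 2 == 1:
        values, length = [0, 1], 1   # palindromes 0 and 1
    else:
        values, length = [0, 3], 2   # palindromes 00 and 11
    while length < n:
        values = extend_binary_palindromes(values, length)
        length += 2
    return values


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, [format(v, "0{}b".format(n)) for v in binary_palindromes(n)])
```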

In order to generate all binary palindromes of a given length let us begin with an example considering all binary palindromes of length 3 and 4 and their decimal representation:

000	0	0000	0
010	2	0110	6
101	5	1001	9
111	7	1111	15

The sequence of palindromes of a given length, in increasing order of their decimal value, can be represented by their differences. The difference representation of the sequence $0, 2, 5, 7$ is $2, 3, 2$ ($2 - 0 = 2$, $5 - 2 = 3$, $7 - 5 = 2$), and the difference representation of the sequence $0, 6, 9, 15$ is $6, 3, 6$. A difference representation is always a symmetric sequence, and the corresponding sequence of palindromes in decimal can be obtained by successive addition beginning with 0: $0 + 6 = 6$, $6 + 3 = 9$, $9 + 6 = 15$. By direct computation we obtain the following difference representation of palindromes for length $n \leq 8$.

$n$		difference representation
1		1
2		3
3		2	3	2
4		6	3	6
5		4	6	4	3	4	6	4
6		12	6	12	3	12	6	12
7		8	12	8	6	8	12	8	3	8	12	8	6	8	12	8
8		24	12	24	6	24	12	24	3	24	12	24	6	24	12	24

We can easily generalize and prove by induction that the difference representations can be obtained as follows.

For $n = 2k$ we have the difference representation

$$a_{1}, a_{2}, \dots, a_{2^{k} - 1},$$

from which the difference representation for $2k + 1$ is

$$2^{k}, a_{1}, 2^{k}, a_{2}, 2^{k}, \dots, 2^{k}, a_{2^{k} - 1}, 2^{k}.$$

For $n = 2k + 1$ we have the difference representation

$$2^{k}, a_{1}, 2^{k}, a_{2}, 2^{k}, \dots, 2^{k}, a_{2^{k} - 1}, 2^{k},$$

from which the difference representation for $2k + 2$ is

$$3 \cdot 2^{k}, a_{1}, 3 \cdot 2^{k}, a_{2}, 3 \cdot 2^{k}, \dots, 3 \cdot 2^{k}, a_{2^{k} - 1}, 3 \cdot 2^{k}.$$
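The two binary rules above are enough to rebuild the table of difference representations row by row. The sketch below is one possible Python rendering; the helper names are hypothetical, and the starting row $n = 1$ (the single difference 1) is taken from the table.

```python
from itertools import accumulate


def next_difference_representation(current, n):
    """Binary difference representation for length n + 1, built from the one for length n."""
    k = n // 2
    if n % 2 == 0:
        # n = 2k -> 2k + 1: keep every a_i and separate them by 2^k.
        inner, separator = current, 2 ** k
    else:
        # n = 2k + 1 -> 2k + 2: the a_i sit at the odd positions of the
        # current row; the separators 2^k are replaced by 3 * 2^k.
        inner, separator = current[1::2], 3 * 2 ** k
    out = [separator]
    for a in inner:
        out += [a, separator]
    return out


def difference_table(max_n):
    """Rows 1..max_n of the table, as a dict {n: list of differences}."""
    rows = {1: [1]}
    for n in range(1, max_n):
        rows[n + 1] = next_difference_representation(rows[n], n)
    return rows


def values_from_differences(differences):
    """Recover the palindrome values by successive addition starting from 0."""
    return list(accumulate([0] + differences))


if __name__ == "__main__":
    for n, row in difference_table(8).items():
        print(n, row)
```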

This representation can be generalized for $q \geq 2$. The number of palindromes in this case is $q^{\lceil n/2 \rceil}$.

For $n = 2k$ we have the difference representation

$$a_{1}, a_{2}, \dots, a_{q^{k} - 1},$$

from which the difference representation for $2k + 1$ is

$$\underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}}, a_{1}, \underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}}, a_{2}, \underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}}, \dots, \underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}}, a_{q^{k} - 1}, \underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}}.$$

For $n = 2k + 1$ we have the difference representation

$$\underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}}, a_{1}, \underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}}, a_{2}, \dots, a_{q^{k} - 1}, \underbrace{q^{k}, \dots, q^{k}}_{q - 1 \text{ times}},$$

from which the difference representation for $2k + 2$ is

$$\begin{aligned} & \underbrace{(q + 1) q^{k}, \dots, (q + 1) q^{k}}_{q - 1 \text{ times}}, a_{1}, \underbrace{(q + 1) q^{k}, \dots, (q + 1) q^{k}}_{q - 1 \text{ times}}, a_{2}, \\ & \dots, \underbrace{(q + 1) q^{k}, \dots, (q + 1) q^{k}}_{q - 1 \text{ times}}, a_{q^{k} - 1}, \underbrace{(q + 1) q^{k}, \dots, (q + 1) q^{k}}_{q - 1 \text{ times}}. \end{aligned}$$
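For a general alphabet size the rules can be checked against a brute-force enumeration. The following Python sketch is only a sanity check for small cases: letters are assumed to be the digits $0, \dots, q-1$, words are read as base-$q$ integers, and the function names are hypothetical.

```python
from itertools import product


def difference_representation(n, q):
    """Differences between consecutive values of all palindromes of length n over q letters."""
    values = sorted(
        sum(d * q ** i for i, d in enumerate(reversed(letters)))
        for letters in product(range(q), repeat=n)
        if letters == letters[::-1]
    )
    return [b - a for a, b in zip(values, values[1:])]


def interleave(separator, inner, q):
    """Blocks of q - 1 copies of `separator` around and between the elements of `inner`."""
    block = [separator] * (q - 1)
    out = list(block)
    for a in inner:
        out.append(a)
        out.extend(block)
    return out


def check_rule(k, q):
    """Check both passages 2k -> 2k + 1 and 2k + 1 -> 2k + 2 by brute force."""
    even = difference_representation(2 * k, q)          # a_1, ..., a_{q^k - 1}
    odd = difference_representation(2 * k + 1, q)
    next_even = difference_representation(2 * k + 2, q)
    return (odd == interleave(q ** k, even, q)
            and next_even == interleave((q + 1) * q ** k, even, q))


if __name__ == "__main__":
    print(difference_representation(3, 3))
    print(all(check_rule(k, q) for k in (0, 1, 2) for q in (2, 3)))
```

The enumeration visits all $q^{n}$ words, so it is meant for small $n$ and $q$ only.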

3 The shape of the palindrome complexity functions

For an infinite sequence $U$, the (subword) complexity function $p_{U} : \mathbb{N} \to \mathbb{N}$ (defined in [17] as the block growth, then named subword complexity in [5]) is given by $p_{U}(n) = \mathrm{card}(F(U) \cap A^{n})$ for $n \in \mathbb{N}$, where $F(U)$ is the set of all finite subwords (factors) of $U$. Therefore the complexity function maps each nonnegative integer $n$ to the number of subwords of length $n$ of $U$; it satisfies the iterative equation

$$p_{U}(n + 1) = p_{U}(n) + \sum_{j = 2}^{q} (j - 1) s(j, n), \tag{1}$$

$s(j, n)$ being the cardinality of the set of the subwords of $U$ having length $n$ and right valence $j$. A subword $u \in F(U)$ has right valence $j$ if there are exactly $j$ distinct letters $x_{i}$ such that $u x_{i} \in F(U)$, $1 \leq i \leq j$.

For a finite word $w$ of length $n$, the complexity function $p_{w} : \mathbb{N} \to \mathbb{N}$ given by $p_{w}(k) = \mathrm{card}(F(w) \cap A^{k})$, $k \in \mathbb{N}$, has the property that $p_{w}(k) = 0$ for $k > n$. The corresponding iterative equation is

$$p_{w}(k + 1) = p_{w}(k) + \sum_{j = 2}^{q} (j - 1) s(j, k) - s_{0}(k), \tag{2}$$

where $s_{0}(k) = s(0, k) \in \{0, 1\}$ stands for the cardinality of the set of subwords $v$ (suffixes of $w$ of length $k$) which cannot be continued as $v x \in F(w)$, $x \in A$. We can write (2) in a condensed form

$$p_{w}(k + 1) = p_{w}(k) + \sum_{j = 0}^{q} (j - 1) s(j, k). \tag{3}$$

The above relations have their counterparts in terms of left extensions of the subwords.

For an infinite sequence $U$, the complexity function $p_{U}$ is nondecreasing; moreover, if there exists $m \in \mathbb{N}$ such that $p_{U}(m + 1) = p_{U}(m)$, then $p_{U}$ is constant for $n \geq m$.

The complexity function for a finite word $w$ of length $n$ has a different behaviour, because $p_{w}(n) = 1$ (there is a unique subword of length $n$, namely $w$). It was proved ([12], [13], [15], [2]) that the shape of the complexity function is trapezoidal:

Theorem 1 Given a finite word $w$ of length $n$, there are three intervals of monotonicity for $p_{w}$: $[0, J]$, $[J, M]$ and $[M, n]$; the function increases at first, is then constant, and finally decreases with slope $-1$.
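To make the trapezoidal shape and the iterative equation (3) concrete, here is a minimal Python sketch that computes $p_{w}$ and the valence counts $s(j, k)$ for a finite word. The function names are hypothetical, and the word is assumed to be given as a string.

```python
def subword_complexity(w):
    """Return [p_w(0), p_w(1), ..., p_w(n)] for the finite word w."""
    n = len(w)
    return [len({w[i:i + k] for i in range(n - k + 1)}) for k in range(n + 1)]


def right_valence_counts(w, k):
    """Return {j: s(j, k)}: the number of distinct subwords of length k with right valence j."""
    n = len(w)
    extensions = {}
    for i in range(n - k + 1):
        followers = extensions.setdefault(w[i:i + k], set())
        if i + k < n:
            followers.add(w[i + k])  # letter that continues this occurrence
    counts = {}
    for followers in extensions.values():
        j = len(followers)
        counts[j] = counts.get(j, 0) + 1
    return counts


if __name__ == "__main__":
    w = "00101"
    p = subword_complexity(w)
    print(p)  # increases, stays constant, then decreases by 1 per step
    for k in range(len(w)):
        step = sum((j - 1) * s for j, s in right_valence_counts(w, k).items())
        print(k, p[k + 1] == p[k] + step)  # equation (3)
```

A suffix that occurs nowhere else in $w$ has right valence 0 and contributes $-1$, which is what produces the decreasing part of the trapezoid.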

The palindrome complexity function of a finite or infinite word $w$ is given by $\mathrm{pal}_{w} : \mathbb{N} \to \mathbb{N}$, $\mathrm{pal}_{w}(k) = \mathrm{card}(\mathrm{PAL}(w) \cap A^{k})$, $k \in \mathbb{N}$. Obviously,

$$\mathrm{pal}_{w}(k) \leq p_{w}(k), \quad k \in \mathbb{N}, \tag{4}$$

and for finite words of length $|w| = n$,

$$\mathrm{pal}_{w}(k) \leq \min \{ q^{\lceil k/2 \rceil}, n - k + 1 \}, \quad k \in \{0, \dots, n\}. \tag{5}$$

The palindrome $u \in \mathrm{PAL}(w)$ has the palindrome valence $j$ if there are exactly $j$ distinct letters $x_{i}$ such that $x_{i} u x_{i} \in \mathrm{PAL}(w)$, $1 \leq i \leq j$. We denote by

$$s_{p}(j, k) = \mathrm{card} \{ u \in \mathrm{PAL}(w) \cap A^{k} : u \text{ has the palindrome valence } j \}, \tag{6}$$

and by $s_{p}(0, k)$ the cardinality of the set of subwords $v \in \mathrm{PAL}(w) \cap A^{k}$ (not necessarily suffixes or prefixes of $w$) which cannot be continued as $x v x \in \mathrm{PAL}(w)$, $x \in A$.

The palindrome complexity function of finite or infinite words satisfies the iterative equation

$$\mathrm{pal}_{w}(k + 2) = \mathrm{pal}_{w}(k) + \sum_{j = 0}^{q} (j - 1) s_{p}(j, k). \tag{7}$$

Since the number of even palindromes is not directly related to that of odd ones, we do not expect $\mathrm{pal}_{w}$ to be of trapezoidal shape, as was the case for the subword complexity function $p_{w}$.

Figure 1: Odd and even palindrome complexity function

For this reason we define the odd, respectively even, palindrome complexity function as the restrictions of $\mathrm{pal}_{w}$ to odd, respectively even, integers: $\mathrm{pal}_{w}^{o} : 2\mathbb{N} + 1 \to \mathbb{N}$, $\mathrm{pal}_{w}^{o}(k) = \mathrm{pal}_{w}(k)$; $\mathrm{pal}_{w}^{e} : 2\mathbb{N} \to \mathbb{N}$, $\mathrm{pal}_{w}^{e}(k) = \mathrm{pal}_{w}(k)$.

These functions have a trapezoidal form for short words; nevertheless, this is not true in general, as the following examples show.

Example 2 The word $w_{1} = 1\,0\,1\,0^{5}\,1^{2}\,0^{7}\,1\,0$ with $|w_{1}| = 19$ has $\mathrm{pal}_{w_{1}}^{o}(1) = 2$, $\mathrm{pal}_{w_{1}}^{o}(3) = 3$, $\mathrm{pal}_{w_{1}}^{o}(5) = 1$, $\mathrm{pal}_{w_{1}}^{o}(7) = 2$, $\mathrm{pal}_{w_{1}}^{o}(9) = 1$ (see Fig. 1).

Example 3 The word $w_{2} = 1^{4}\,0^{6}\,1\,0^{8}\,1^{2}\,0$ with $|w_{2}| = 22$ has $\mathrm{pal}_{w_{2}}^{e}(2) = 2$, $\mathrm{pal}_{w_{2}}^{e}(4) = 3$, $\mathrm{pal}_{w_{2}}^{e}(6) = 1$, $\mathrm{pal}_{w_{2}}^{e}(8) = 2$, $\mathrm{pal}_{w_{2}}^{e}(10) = 1$ (see Fig. 1).
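The values in Examples 2 and 3 can be reproduced with a direct computation of the palindrome complexity function. The Python sketch below is a brute-force illustration; the function name is hypothetical, and the entry for $k = 0$ is set to 0 on the assumption that only nonempty palindromes are counted, as in the definition of $\mathrm{PAL}(w)$.

```python
def palindrome_complexity(w):
    """Return [pal_w(0), ..., pal_w(n)]; pal_w(0) is 0 because PAL(w) holds nonempty palindromes only."""
    n = len(w)
    pal = [0] * (n + 1)
    for k in range(1, n + 1):
        factors = {w[i:i + k] for i in range(n - k + 1)}
        pal[k] = sum(1 for u in factors if u == u[::-1])
    return pal


if __name__ == "__main__":
    w1 = "101" + "0" * 5 + "11" + "0" * 7 + "10"            # Example 2, length 19
    w2 = "1" * 4 + "0" * 6 + "1" + "0" * 8 + "11" + "0"     # Example 3, length 22
    print([palindrome_complexity(w1)[k] for k in (1, 3, 5, 7, 9)])
    print([palindrome_complexity(w2)[k] for k in (2, 4, 6, 8, 10)])
```

In both words the sequence rises, drops to 1, rises again to 2 and drops again, which is the non-trapezoidal behaviour the examples are meant to exhibit.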

Remark 2 The palindrome complexity for infinite words need not be nondecreasing, unlike the usual complexity function. Indeed, we can continue the word in Example 2 with $11001100 \dots$, and its odd palindrome complexity function will be the same as that for $w_{1}$, and then equal to 0 for $k \geq 11$. Similarly, we can continue $w_{2}$ in Example 3 with $1010 \dots$ to obtain an infinite word with the even palindrome complexity of $w_{2}$ up to $k = 10$ and equal to 0 for $k \geq 12$.

4 Average number of palindromes

We consider an alphabet $A$ with $q \geq 2$ letters.

Definition 1 We define the total palindrome complexity $P$ by

$$P(w) = \sum_{n = 1}^{|w|} \mathrm{pal}_{w}(n), \tag{8}$$

where $w$ is a word of length $|w|$, and $\mathrm{pal}_{w}(n)$ denotes the number of distinct palindromes of length $n$ which are nonempty subwords of $w$.

Because the set of the nonempty palindromes in $w$ is denoted by $\mathrm{PAL}(w)$, we can also write $P(w) = \mathrm{card}(\mathrm{PAL}(w))$.

Definition 2 The average number of palindromes $M_{q}(n)$ contained in all words of length $n$ is defined by

$$M_{q}(n) = \frac{\sum_{w \in A^{n}} P(w)}{q^{n}}. \tag{9}$$
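Definitions 1 and 2 translate directly into an exhaustive computation for small $n$. The sketch below is an illustration in Python, not the authors' program; the function names are hypothetical and letters are assumed to be the integers $0, \dots, q-1$. Exact rational arithmetic is used so that no rounding enters the average.

```python
from fractions import Fraction
from itertools import product


def total_palindrome_complexity(w):
    """P(w): the number of distinct nonempty palindromic subwords of w."""
    n = len(w)
    return len({
        w[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if w[i:j] == w[i:j][::-1]
    })


def average_palindromes(n, q):
    """M_q(n): the average of P(w) over all q^n words of length n."""
    total = sum(total_palindrome_complexity(w) for w in product(range(q), repeat=n))
    return Fraction(total, q ** n)


if __name__ == "__main__":
    for n in range(1, 10):
        m = average_palindromes(n, 2)
        print(n, m, float(m / n))
```

The running time grows like $q^{n}$, so this approach is limited to short words.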

We can give the following upper estimate for $M_{q}(n)$.

Theorem 2 For $n \in \mathbb{N}$, the average number of palindromes contained in the words of length $n$ satisfies the inequalities

$$\begin{aligned} M_{q}(n) & \leq \frac{q^{-(n - 1)/2} (q + 3) + 2 n (q - 1) + q^{3} - 2 q^{2} - 2 q - 1}{(q - 1)^{2}}, && \text{for } n \text{ odd}, \\ M_{q}(n) & \leq \frac{q^{-n/2} (3 q + 1) + 2 n (q - 1) + q^{3} - 2 q^{2} - 2 q - 1}{(q - 1)^{2}}, && \text{for } n \text{ even}. \end{aligned} \tag{10}$$

Proof. We have

$$\begin{aligned} \sum_{w \in A^{n}} P(w) & = \sum_{w \in A^{n}} \sum_{\pi \in \mathrm{PAL}(w)} 1 = \sum_{w \in A^{n}} \sum_{k = 1}^{n} \sum_{\pi \in \mathrm{PAL}(w) \cap A^{k}} 1 \\ & = \sum_{w \in A^{n}} \sum_{\pi \in \mathrm{PAL}(w) \cap A^{1}} 1 + \sum_{k = 2}^{n} \sum_{\pi \in \mathrm{PAL}_{k}(A)} \sum_{\substack{w \in A^{n} \\ \pi \in \mathrm{PAL}(w) \cap A^{k}}} 1, \end{aligned}$$

and

$$\sum_{w \in A^{n}} \sum_{\pi \in \mathrm{PAL}(w) \cap A^{1}} 1 \leq q \cdot q^{n} = q^{n + 1}. \tag{11}$$

For a fixed palindrome $\pi$, with $|\pi| = k$, the number of words of length $n$ in which it appears as a subword at position $i$ ($1 \leq i \leq n - k + 1$) is $q^{n - k}$. But the position $i$ is arbitrary, so that there are at most $(n - k + 1) q^{n - k}$ words in which $\pi$ is a subword, these words being not necessarily distinct. It follows that

$$\sum_{w \in A^{n}} P(w) \leq q^{n + 1} + \sum_{k = 2}^{n} \sum_{\pi \in \mathrm{PAL}_{k}(A)} (n - k + 1) q^{n - k}.$$

The number of palindromes of length $k$ is $q^{\lceil k/2 \rceil}$, therefore

$$\sum_{w \in A^{n}} P(w) \leq q^{n + 1} + \sum_{k = 2}^{n} (n - k + 1) q^{n - k + \lceil k/2 \rceil}$$

and

$$M_{q}(n) \leq q + \sum_{k = 2}^{n} (n - k + 1) q^{-k + \lceil k/2 \rceil}.$$

We split the sum according to $k = 2j$, $j = 1, \dots, \lfloor n/2 \rfloor$, respectively $k = 2j + 1$, $j = 1, \dots, \lfloor (n - 1)/2 \rfloor$, and obtain

$$M_{q}(n) \leq q + \sum_{j = 1}^{\lfloor n/2 \rfloor} (n - 2j + 1) q^{-j} + \sum_{j = 1}^{\lfloor (n - 1)/2 \rfloor} (n - 2j) q^{-j}.$$

Making use of $\sum_{j = 1}^{s} q^{-j} = (1 - q^{-s}) / (q - 1)$ and $\sum_{j = 1}^{s} j q^{-j} = (q - q^{1 - s} (s + 1) + s q^{-s}) / (q - 1)^{2}$, it follows that $M_{q}(n)$ satisfies the inequalities in (10).
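The last step of the proof, passing from the sum over $k$ to the closed forms in (10), can be checked mechanically. The following Python sketch compares the two expressions with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def bound_as_sum(n, q):
    """q + sum_{k=2}^{n} (n - k + 1) * q^(-k + ceil(k/2)), as an exact fraction."""
    total = Fraction(q)
    for k in range(2, n + 1):
        half = (k + 1) // 2  # ceil(k / 2)
        total += (n - k + 1) * Fraction(1, q ** (k - half))
    return total


def bound_closed_form(n, q):
    """Right-hand side of (10), choosing the odd or even formula according to n."""
    if n % 2 == 1:
        lead = Fraction(q + 3, q ** ((n - 1) // 2))
    else:
        lead = Fraction(3 * q + 1, q ** (n // 2))
    numerator = lead + 2 * n * (q - 1) + q ** 3 - 2 * q ** 2 - 2 * q - 1
    return numerator / (q - 1) ** 2


if __name__ == "__main__":
    for q in (2, 3, 4):
        for n in (1, 2, 3, 10, 11):
            print(q, n, bound_as_sum(n, q), bound_as_sum(n, q) == bound_closed_form(n, q))
```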

Corollary 1 The following inequality holds

$$\limsup_{n \to \infty} \frac{M_{q}(n)}{n} \leq \frac{2}{q - 1}. \tag{12}$$

Proof.

$$\begin{aligned} \limsup_{n \to \infty} \frac{M_{q}(n)}{n} & = \max \left\{ \limsup_{n \to \infty} \frac{M_{q}(2n + 1)}{2n + 1}, \; \limsup_{n \to \infty} \frac{M_{q}(2n)}{2n} \right\} \\ & \leq \max \Big\{ \lim_{n \to \infty} \frac{q^{-n} (q + 3) + 2 (2n + 1)(q - 1) + q^{3} - 2 q^{2} - 2 q - 1}{(q - 1)^{2}} \cdot \frac{1}{2n + 1}, \\ & \qquad \lim_{n \to \infty} \frac{q^{-n} (3 q + 1) + 4 n (q - 1) + q^{3} - 2 q^{2} - 2 q - 1}{(q - 1)^{2}} \cdot \frac{1}{2n} \Big\} = \frac{2}{q - 1}. \end{aligned}$$

We are interested in how large the average number of palindromes contained in the words of length $n$ is compared to the length $n$. The numerical estimates for small values of $n$ show that $M_{q}(n)$ is comparable to $n$, but Corollary 1 shows that for $q \geq 4$ this does not hold asymptotically, since then $2/(q - 1) < 1$.

Corollary 2 For an alphabet with $q \geq 4$ letters,

$$\limsup_{n \to \infty} \frac{M_{q}(n)}{n} < 1. \tag{13}$$

In the proof of Theorem 2 we have used the rough inequality (11), which was sufficient to prove the result. In fact, it is not difficult to calculate exactly

$$S_{n, p} = \sum_{w \in A^{n}} \sum_{\pi \in \mathrm{PAL}(w) \cap A^{p}} 1 \quad \text{for } p = 1, 2. \tag{14}$$

This result is of intrinsic interest.

Theorem 3 The number of occurrences of the palindromes of length 1, respectively 2, in all words of length $n$ (counted once if a palindrome appears in a word, and once again if it appears in another one) is given by

$$S_{n, 1} = q^{n + 1} - q (q - 1)^{n}, \tag{15}$$

respectively by

$$S_{n, 2} = q^{n + 1} - \frac{q}{(q - 1) \sqrt{q^{2} + 2q - 3}} \left( \left( \frac{q - 1 + \sqrt{q^{2} + 2q - 3}}{2} \right)^{n + 2} - \left( \frac{q - 1 - \sqrt{q^{2} + 2q - 3}}{2} \right)^{n + 2} \right). \tag{16}$$

Proof. We use Iverson's convention [11]

$$[\alpha] = \begin{cases} 1, & \text{if } \alpha \text{ is true} \\ 0, & \text{if } \alpha \text{ is false} \end{cases}$$

and obtain

$$S_{n, 1} = \sum_{w \in A^{n}} \sum_{a \in A} [a \text{ in } w] = q \sum_{w \in A^{n}} [a_{1} \text{ in } w],$$

where $a_{1}$ is a fixed letter of the alphabet $A$. Then

$$S_{n, 1} = q \sum_{w \in A^{n}} [a_{1} \text{ in } w] = q \Big( q^{n} - \sum_{w \in A^{n}} [a_{1} \text{ not in } w] \Big) = q^{n + 1} - q (q - 1)^{n}.$$

We proceed similarly to calculate

$$S_{n, 2} = \sum_{w \in A^{n}} \sum_{\pi \in \mathrm{PAL}(w) \cap A^{2}} 1$$

and obtain

$$S_{n, 2} = \sum_{w \in A^{n}} \sum_{a \in A} [a a \text{ in } w] = q \sum_{w \in A^{n}} [a_{1} a_{1} \text{ in } w],$$

where $a_{1}$ is again a fixed letter of the alphabet $A$. We denote $\varphi(n) := \sum_{w \in A^{n}} [a_{1} a_{1} \text{ in } w]$, for which $\varphi(2) = 1$ and $\varphi(3) = 2q - 1$. It is easier to establish a recurrence formula for $\psi(n) = q^{n} - \varphi(n) = \sum_{w \in A^{n}} [a_{1} a_{1} \text{ not in } w]$. The number $\psi(n)$ is obtained from:

(a) the number $(q - 1) \psi(n - 1)$ of words which do not end in $a_{1}$ and do not contain $a_{1} a_{1}$ in their first $n - 1$ positions;
(b) the number $(q - 1) \psi(n - 2)$ of words which end in $a_{1}$, have position $n - 1$ occupied by one of the other $q - 1$ letters and do not contain $a_{1} a_{1}$ in the first $n - 2$ positions.

It follows that $\psi$ satisfies the recurrence formula

$$\psi(n) = (q - 1) (\psi(n - 1) + \psi(n - 2)) \tag{17}$$

with $\psi(2) = q^{2} - 1$ and $\psi(3) = q^{3} - 2q + 1$. Its characteristic equation $r^{2} = (q - 1)(r + 1)$ has the roots $(q - 1 \pm \sqrt{q^{2} + 2q - 3})/2$, and the solution is

$$\psi(n) = \frac{1}{(q - 1) \sqrt{q^{2} + 2q - 3}} \left( \left( \frac{q - 1 + \sqrt{q^{2} + 2q - 3}}{2} \right)^{n + 2} - \left( \frac{q - 1 - \sqrt{q^{2} + 2q - 3}}{2} \right)^{n + 2} \right),$$

and (16) follows from the fact that

$$S_{n, 2} = q (q^{n} - \psi(n)). \tag{18}$$
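Formula (15), the recurrence (17) and relation (18) can be compared with an exhaustive count for small $n$ and $q$. The Python sketch below does this; the function names are hypothetical, and the starting values $\psi(0) = 1$, $\psi(1) = q$ are an assumption chosen so that (17) reproduces $\psi(2) = q^{2} - 1$. The closed form uses the discriminant $q^{2} + 2q - 3$ of the characteristic equation $r^{2} = (q - 1)(r + 1)$ of (17) and is evaluated in floating point, so it is only compared after rounding.

```python
from itertools import product
from math import sqrt


def s_brute(n, q, p):
    """S_{n,p}: sum over all words of length n of the number of distinct palindromes of length p."""
    total = 0
    for w in product(range(q), repeat=n):
        found = {w[i:i + p] for i in range(n - p + 1) if w[i:i + p] == w[i:i + p][::-1]}
        total += len(found)
    return total


def psi(n, q):
    """Number of words of length n avoiding a fixed factor a1 a1, via recurrence (17)."""
    a, b = 1, q  # assumed starting values psi(0) = 1 and psi(1) = q
    for _ in range(n):
        a, b = b, (q - 1) * (a + b)
    return a


def psi_closed(n, q):
    """Closed-form solution of (17), evaluated in floating point."""
    d = sqrt(q * q + 2 * q - 3)
    r1, r2 = (q - 1 + d) / 2, (q - 1 - d) / 2
    return (r1 ** (n + 2) - r2 ** (n + 2)) / ((q - 1) * d)


def s1_formula(n, q):
    return q ** (n + 1) - q * (q - 1) ** n   # formula (15)


def s2_formula(n, q):
    return q * (q ** n - psi(n, q))          # relation (18)


if __name__ == "__main__":
    for q in (2, 3):
        for n in range(2, 7):
            print(q, n, s_brute(n, q, 1) == s1_formula(n, q), s_brute(n, q, 2) == s2_formula(n, q))
```

For $q = 2$ the recurrence is the Fibonacci recurrence, which gives a quick plausibility check of the square root $\sqrt{5}$ in the closed form.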

The expression of $S_{n, 2}$ from (16) allows us to improve Corollary 1.

Corollary 3 The following inequality holds

$$\limsup_{n \to \infty} \frac{M_{q}(n)}{n} \leq \frac{q + 1}{q (q - 1)}. \tag{19}$$

Proof. Taking into account the inequality

$$\sum_{w \in A^{n}} \sum_{\pi \in \mathrm{PAL}(w) \cap A^{1}} 1 \leq q \cdot q^{n} = q^{n + 1},$$

and (18), we get

$$\begin{aligned} M_{q}(n) & \leq \frac{1}{q^{n}} \Big( S_{n, 1} + S_{n, 2} + \sum_{k = 3}^{n} \sum_{\pi \in \mathrm{PAL}_{k}(A)} (n - k + 1) q^{n - k} \Big) \\ & \leq q \Big( 2 - \frac{\psi(n)}{q^{n}} \Big) + \sum_{k = 3}^{n} (n - k + 1) q^{-k + \lfloor (k + 1)/2 \rfloor}. \end{aligned}$$

But $0 < (q - 1 + \sqrt{q^{2} + 2q - 3})/2 < q$ and $-1 < (q - 1 - \sqrt{q^{2} + 2q - 3})/2 < 0$ for $q \geq 2$, hence $\lim_{n \to \infty} \psi(n)/q^{n} = 0$. Then

$$\begin{aligned} \limsup_{n \to \infty} \frac{M_{q}(n)}{n} & \leq \lim_{n \to \infty} \frac{1}{n} \sum_{k = 3}^{n} (n - k + 1) q^{-k + \lfloor (k + 1)/2 \rfloor} \\ & \leq \sum_{k = 3}^{\infty} q^{-k + \lfloor (k + 1)/2 \rfloor} = \sum_{i = 1}^{\infty} q^{-2i - 1 + i + 1} + \sum_{i = 2}^{\infty} q^{-2i + i} \\ & = - \frac{1}{q} + 2 \sum_{i = 1}^{\infty} q^{-i} = \frac{q + 1}{q (q - 1)}. \end{aligned}$$

Corollary 4 The inequality (13) holds for $q = 3$ too.

Indeed, for $q = 3$ the bound in (19) equals $2/3$. For $q = 2$ it equals $3/2$, so (19) does not settle this case; nevertheless, it seems that (13) also holds for $q = 2$. Using a computer program we obtained some values for the terms of the sequence $M^{*}(n) = M_{2}(n)/n$, $n \geq 2$. The first values are $M^{*}(n) = 1$ for $n = 2, \dots, 7$; $M^{*}(8) = 0.99750$; $M^{*}(9) = 0.98550$, which are close to 1. We tried greater values of $n$ and obtained

$$\begin{array}{lll} M^{*}(20) = 0.89975, & M^{*}(21) = 0.89002, & M^{*}(22) = 0.88043, \\ M^{*}(23) = 0.87101, & M^{*}(24) = 0.86177, \dots, & M^{*}(30) = 0.81064. \end{array}$$

The last value took a very long time to compute, so for greater values of $n$ we generated random words $w_{1}, w_{2}, \dots, w_{\ell}$ of length 100, respectively 200, 300, 400 and 500, over $A = \{0, 1\}$ and obtained rough approximate values $M^{*}(n) \simeq (P(w_{1}) + \dots + P(w_{\ell})) / (\ell n)$. For $\ell = 200$ we obtained

$$\begin{aligned} & M^{*}(100) \simeq 0.53, \quad M^{*}(200) \simeq 0.39, \quad M^{*}(300) \simeq 0.32, \\ & M^{*}(400) \simeq 0.29, \quad M^{*}(500) \simeq 0.26. \end{aligned}$$
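A sampling experiment of this kind might look as follows in Python. This is a sketch under stated assumptions rather than the program used for the values above: it assumes that the estimate averages the total palindrome complexity $P(w_{i})$ of the sampled words and divides by $n$, it uses Python's `random` module with a fixed seed for reproducibility, and all function names are hypothetical.

```python
import random


def count_palindromes(w):
    """P(w): number of distinct nonempty palindromic subwords, found by expanding around centers."""
    found = set()
    n = len(w)
    for center in range(2 * n - 1):
        lo, hi = center // 2, (center + 1) // 2  # lo == hi for odd lengths, hi == lo + 1 for even
        while lo >= 0 and hi < n and w[lo] == w[hi]:
            found.add(w[lo:hi + 1])
            lo -= 1
            hi += 1
    return len(found)


def estimate_ratio(n, samples, seed=0):
    """Monte Carlo estimate of M_2(n) / n from `samples` random binary words of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        w = "".join(rng.choice("01") for _ in range(n))
        total += count_palindromes(w)
    return total / (samples * n)


if __name__ == "__main__":
    for n in (100, 200, 300):
        print(n, round(estimate_ratio(n, samples=200), 2))
```

The estimates depend on the random seed and on the number of samples, so they should only be expected to agree with exhaustive values to a couple of digits.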

This method reproduces the previously computed exact values $M^{*}(20), \dots, M^{*}(30)$ to two correct digits. These numerical results allow us to formulate the following

Conjecture The sequence $M_{q}(n)/n$ is strictly decreasing for $n \geq 7$.

Acknowledgements. The first and the third author acknowledge the support of the Romanian Academy (Grant 13 GAR/2006) and the kind hospitality of the Rényi Institute in the frame of the cooperation program between the Romanian Academy and the Hungarian Academy of Sciences.

References

[1] J.-P. Allouche, M. Baake, J. Cassaigne and D. Damanik, Palindrome complexity, Theoret. Comput. Sci., 292 (2003), 9-31.
[2] M.-C. Anisiu and J. Cassaigne, Properties of the complexity function for finite words, Rev. Anal. Num. Théor. Approx., 33 (2004), 123-139.
[3] J.-P. Borel and C. Reutenauer, Palindromic factors of billiard words, Theoret. Comput. Sci., 340 (2005), 334-348.
[4] N.G. De Bruijn, A combinatorial problem, Nederl. Akad. Wetensch. Proc., 49 (1946), 758-764 = Indag. Math., 8 (1946), 461-467.
[5] A. Ehrenfeucht, K.P. Lee and G. Rozenberg, Subword complexities of various classes of deterministic developmental languages without interactions, Theoret. Comput. Sci., 1 (1975), 59-75.
[6] C. Flye Sainte-Marie, Solution to question nr. 48, l'Intermédiaire des Mathématiciens, 1 (1894), 107-110.
[7] H. Fredricksen, A survey of full length nonlinear shift register cycle algorithms, SIAM Review, 24 (1982), 195-221.
[8] R.A. Games, A generalized recursive construction for De Bruijn sequences, IEEE Trans. Inform. Theory, 29 (1983), 843-850.
[9] M. Giel-Pietraszuk, M. Hoffmann, S. Dolecka, J. Rychlewski and J. Barciszewski, Palindromes in proteins, J. Protein Chem., 22 (2003), 109-113.
[10] I.J. Good, Normal recurring decimals, J. London Math. Soc., 21 (1946), 167-169.
[11] R.L. Graham, D.E. Knuth and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, 2nd edition (Reading, Massachusetts: Addison-Wesley), 1994.
[12] M. Heinz, Zur Teilwortkomplexität für Wörter und Folgen über einem endlichen Alphabet, EIK, 13 (1977), 27-38.
[13] A. de Luca, On the combinatorics of finite words, Theoret. Comput. Sci., 218 (1999), 13-39.
[14] A. de Luca and Al. de Luca, Combinatorial properties of Sturmian palindromes, Int. J. Found. Comput. Sci., 17 (2006), 557-573.
[15] F. Levé and P. Séébold, Proof of a conjecture on word complexity, Bull. Belg. Math. Soc. Simon Stevin, 8 (2001), 277-291.
[16] M.H. Martin, A problem in arrangements, Bull. American Math. Soc., 40 (1934), 859-864.
[17] M. Morse and G.A. Hedlund, Symbolic dynamics, Amer. J. Math., 60 (1938), 815-866.
[18] A. Ralston, A new memoryless algorithm for De Bruijn sequences, J. Algorithms, 2 (1981), 50-62.
