AN APPROXIMATION ALGORITHM FOR THE AT LEAST VERSION OF THE GENERALIZED MINIMUM SPANNING TREE PROBLEM

We consider the at least version of the Generalized Minimum Spanning Tree Problem, denoted by L-GMSTP, which consists in finding a minimum cost tree spanning at least one node from each node set of a complete graph with the nodes partitioned into a given number of node sets called clusters. We assume that the cost function attached to the edges satisfies the triangle inequality and that the clusters have sizes bounded by ρ. Under these assumptions we present a 2ρ-approximation algorithm. The algorithm works by rounding an optimal fractional solution to a linear programming relaxation. Our technique is based on properties of optimal solutions to the linear programming formulation of the minimum spanning tree problem and the parsimonious property of Goemans and Bertsimas.


INTRODUCTION
The minimum spanning tree (MST) problem can be generalized in a natural way by considering node sets (clusters) instead of nodes and asking for a minimum cost tree spanning exactly one node from each cluster. This problem is called the generalized minimum spanning tree problem (GMSTP) and it was introduced by Myung et al. [6].
Two variants of the generalized minimum spanning tree problem have been considered in the literature. In the first, called the prize collecting generalized minimum spanning tree problem, costs are attached to the nodes in addition to the edges; see [9]. The second, denoted by L-GMSTP and introduced by Dror et al. [1], consists in finding a minimum cost tree spanning at least one node from each cluster. The same authors proved that the L-GMSTP is NP-hard.
The theory of NP-completeness has reduced hopes that NP-hard problems can be solved within polynomially bounded computation times. By relaxing some of the requirements, for example the requirement that the algorithm always finds an optimal solution, we can obtain a considerable speed-up at the expense of solution quality. Consequently, there is much interest in approximation, heuristic and metaheuristic algorithms.
An algorithm is an α-approximation algorithm for an optimization problem if
1. The algorithm runs in polynomial time.
2. The algorithm produces a solution whose value is within a factor of α of the value of the optimal solution.
Approximation algorithms have been around since 1966, but in recent years there has been a great deal of research, and two different strands have converged: complexity theorists have developed powerful tools for showing that no α-approximation algorithms can exist unless P = NP, and algorithm designers have developed techniques that apply to a wide range of problems.
The aim of this paper is to describe an approximation algorithm for the at least version of the generalized minimum spanning tree problem under some special assumptions.

DEFINITION OF THE PROBLEM
The at least version of the generalized minimum spanning tree problem (L-GMSTP) is defined on an undirected graph $G = (V, E)$ with nodes partitioned into $m$ clusters. Let $|V| = n$ and let $K = \{1, 2, \dots, m\}$ be the index set of the node sets (clusters). Then $V = V_1 \cup V_2 \cup \dots \cup V_m$ and $V_l \cap V_k = \emptyset$ for all $l, k \in K$ with $l \neq k$. We assume that the graph $G$ is complete and that each edge $e = \{i, j\} \in E$ has a nonnegative cost denoted by $c_{ij}$.
The L-GMSTP is the problem of finding a minimum-cost tree spanning a subset of nodes which includes at least one node from each cluster.
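To make the definition concrete, the following minimal sketch enumerates every node subset that meets all clusters and keeps the cheapest spanning tree. It is exponential and only meant for tiny instances; the function names and the 6-node instance (a shortest-path metric with two "hub" nodes 0 and 1 sharing a cluster) are hypothetical and not taken from the paper.

```python
from itertools import combinations


def mst_cost(nodes, cost):
    """Cost of a minimum spanning tree on `nodes` (Prim, complete graph)."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0.0
    # best[v] = cheapest known edge cost connecting v to the current tree
    best = {v: cost[nodes[0]][v] for v in nodes[1:]}
    total = 0.0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], cost[v][u])
    return total


def brute_force_lgmstp(clusters, cost):
    """Try every node subset containing at least one node of each cluster."""
    nodes = [v for cluster in clusters for v in cluster]
    best_cost, best_set = float("inf"), None
    for size in range(len(clusters), len(nodes) + 1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            if all(chosen & set(cluster) for cluster in clusters):
                c = mst_cost(subset, cost)
                if c < best_cost:
                    best_cost, best_set = c, chosen
    return best_cost, best_set


# Hypothetical instance: nodes 0 and 1 are hubs at distance 1; nodes 2, 3 hang
# off hub 0 and nodes 4, 5 hang off hub 1 (all other costs are path lengths).
cost = [
    [0, 1, 1, 1, 2, 2],
    [1, 0, 2, 2, 1, 1],
    [1, 2, 0, 2, 3, 3],
    [1, 2, 2, 0, 3, 3],
    [2, 1, 3, 3, 0, 2],
    [2, 1, 3, 3, 2, 0],
]
clusters = [[0, 1], [2], [3], [4], [5]]
best_cost, best_set = brute_force_lgmstp(clusters, cost)
```

On this instance, selecting both nodes of the first cluster gives a tree of cost 5, whereas any tree using exactly one node per cluster costs 6, which illustrates how the at least version can differ from the GMSTP.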

INTEGER PROGRAMMING FORMULATIONS
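The L-GMSTP can be formulated as an integer program in many different ways. For example, introduce the variables $x_e \in \{0, 1\}$, $e \in E$, and $y_i \in \{0, 1\}$, $i \in V$, to indicate whether an edge $e$, respectively a node $i$, is contained in the spanning tree. A feasible solution to the L-GMSTP can then be seen as a connected subgraph with at least one node selected from every cluster and connecting all the clusters.

Fig. 1. Example of a feasible solution of the L-GMSTP problem, where at least one node from each cluster is selected.

Therefore the L-GMSTP can be formulated as the following 0-1 integer programming problem:
\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & y(V_k) \ge 1, && \forall k \in K, && (1) \\
& x(\delta(S)) \ge y_i + y_j - 1, && \forall i \in S \subset V,\ j \notin S, && (2) \\
& x(E) = y(V) - 1, && && (3) \\
& x_e \in \{0, 1\}, && \forall e \in E, \\
& y_i \in \{0, 1\}, && \forall i \in V,
\end{aligned}
\]
where for $S \subseteq V$ the cutset, denoted by $\delta(S)$, is defined as usual:
\[
\delta(S) = \{ e = \{i, j\} \in E \mid i \in S,\ j \notin S \}.
\]
In the above formulation, we use the standard shorthand notations:
\[
x(F) = \sum_{e \in F} x_e \ \text{ for } F \subseteq E, \qquad y(S) = \sum_{i \in S} y_i \ \text{ for } S \subseteq V.
\]
In the integer programming formulation of the L-GMSTP, constraints (1) guarantee that from every cluster we select at least one node, constraints (2) guarantee that the selected subgraph is connected, and finally constraint (3) guarantees that the selected subgraph has $y(V) - 1$ edges.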
An equivalent integer programming formulation of the L-GMSTP, used in developing an approximation algorithm for the problem, is described in what follows: In this new integer programming formulation, constraint (3) is omitted, because it is redundant under the assumption that the edges have nonnegative costs.
We consider the linear programming (LP) relaxation of this integer program, obtained by replacing the integrality constraints $x_e \in \{0, 1\}$ for all $e \in E$ and $y_i \in \{0, 1\}$ for all $i \in V$ by the constraints $x_e \in [0, 1]$ for all $e \in E$ and $y_i \in [0, 1]$ for all $i \in V$.

AN APPROXIMATION ALGORITHM FOR THE L-GMSTP
In this section we present an approximation algorithm for the L-GMSTP under the following two assumptions:
1. The cost function $c : E \to \mathbb{R}_+$ attached to the edges of $G$ satisfies the triangle inequality: $c_{ij} + c_{jk} \ge c_{ik}$ for all $i, j, k \in V$.
2. The clusters are bounded: $|V_k| \le \rho$ for all $k \in K$, for some $\rho > 0$.
For this class of problem instances we can efficiently construct a solution with cost at most 2ρ times the optimum. The design of the approximation algorithm is based on solving the LP relaxation of the integer programming formulation of the L-GMSTP and rounding the fractional solution to a nearby integral one. Our rounding technique is based on properties of optimal solutions to the linear programming formulation of the minimum spanning tree problem and the parsimonious property of Goemans and Bertsimas, see [4].
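Both assumptions are easy to test on a concrete instance. The sketch below is only an illustration: the helper names and the 4-node cost matrix are hypothetical, and costs are assumed to be given as a symmetric matrix indexed by node.

```python
from itertools import permutations


def satisfies_triangle_inequality(cost, tol=1e-12):
    """True if cost[i][k] <= cost[i][j] + cost[j][k] for all distinct i, j, k."""
    n = len(cost)
    return all(
        cost[i][k] <= cost[i][j] + cost[j][k] + tol
        for i, j, k in permutations(range(n), 3)
    )


def cluster_size_bound(clusters):
    """Smallest rho such that every cluster has at most rho nodes."""
    return max(len(cluster) for cluster in clusters)


# Hypothetical instance with two clusters of two nodes each.
cost = [
    [0, 2, 3, 4],
    [2, 0, 2, 3],
    [3, 2, 0, 2],
    [4, 3, 2, 0],
]
clusters = [[0, 1], [2, 3]]
metric_ok = satisfies_triangle_inequality(cost)
rho = cluster_size_bound(clusters)
```

For this instance the guarantee discussed in this section would be a factor of 2ρ = 4.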
Let $(y^*, x^*, Z^*_{LP}) = ((y^*_i)_{i=1}^{n}, (x^*_e)_{e \in E}, Z^*_{LP})$ be the optimal fractional solution of the LP relaxation. Obviously, the following inequality between the value of the optimal fractional solution of the LP relaxation and the value of the optimal solution of the IP formulation holds: $Z^*_{LP} \le Z_{IP}$. The LP relaxation can be solved in polynomial time (relative to the input size of the problem) using the ellipsoid method [5] or interior point methods [12].
Because the cluster sizes are assumed to be bounded by ρ, i.e. $|V_k| \le \rho$ for all $k \in K$, and constraints (1) force the values $y^*_v$ in each cluster to sum to at least 1, there exists in each cluster $V_k$ at least one node $v \in V_k$ such that $y^*_v \ge 1/\rho$. Let $W = \{v_1, \dots, v_m, \dots, v_p\} \subseteq V$ denote the set of chosen nodes. We now compute the minimum cost tree spanning the nodes of $W$ and claim that this tree, denoted by $T(W)$, which is a feasible (approximate) solution of the L-GMSTP, has cost at most 2ρ times the optimum $Z_{IP}$ of the IP formulation of the L-GMSTP. More precisely, we show that $c(T(W)) \le 2\rho\, Z_{IP}$.

Theorem 1. The performance ratio for approximating the optimum solution to the L-GMSTP satisfies: $c(T(W)) / Z_{IP} \le 2\rho$.
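The rounding step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the fractional values y* are already available from some LP solver, assumes W consists of all nodes with y*_v ≥ 1/ρ, and uses a hypothetical 6-node instance and a hand-picked, hypothetical vector `y_star`.

```python
def prim_mst(nodes, cost):
    """Return (edges, total cost) of a minimum spanning tree on `nodes`."""
    nodes = list(nodes)
    root, rest = nodes[0], nodes[1:]
    # best[v] = (cheapest edge cost from v into the tree, tree endpoint)
    best = {v: (cost[root][v], root) for v in rest}
    edges, total = [], 0.0
    while best:
        v = min(best, key=lambda u: best[u][0])
        weight, parent = best.pop(v)
        edges.append((parent, v))
        total += weight
        for u in best:
            if cost[v][u] < best[u][0]:
                best[u] = (cost[v][u], v)
    return edges, total


def round_and_span(y_star, clusters, cost, rho, eps=1e-9):
    """Choose W = {v : y*_v >= 1/rho} and span it with a minimum cost tree."""
    threshold = 1.0 / rho - eps  # small tolerance for floating-point LP output
    for k, cluster in enumerate(clusters):
        if not any(y_star[v] >= threshold for v in cluster):
            raise ValueError(f"cluster {k} has no node with y* >= 1/rho")
    W = [v for cluster in clusters for v in cluster if y_star[v] >= threshold]
    edges, total = prim_mst(W, cost)
    return W, edges, total


# Hypothetical instance (symmetric costs satisfying the triangle inequality).
cost = [
    [0, 1, 1, 1, 2, 2],
    [1, 0, 2, 2, 1, 1],
    [1, 2, 0, 2, 3, 3],
    [1, 2, 2, 0, 3, 3],
    [2, 1, 3, 3, 0, 2],
    [2, 1, 3, 3, 2, 0],
]
clusters = [[0, 1], [2], [3], [4], [5]]
rho = 2
y_star = [0.5, 0.5, 1.0, 1.0, 1.0, 1.0]  # hypothetical, not computed by an LP solver
W, tree_edges, tree_cost = round_and_span(y_star, clusters, cost, rho)
```

The error check reflects the argument in the text: if every cluster has at most ρ nodes and its y* values sum to at least 1, some node in it must reach the threshold 1/ρ.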

PROOF OF CORRECTNESS
The crucial argument in providing the approximation algorithm is based on the parsimonious property, see [4].
Let $G = (V, E)$ be a complete undirected graph. We associate with each edge $(i, j) \in E$ a cost $c_{ij}$, and for any pair $(i, j)$ of vertices we let $r_{ij}$ be the connectivity requirement between $i$ and $j$ ($r_{ij}$ is assumed to be symmetric, i.e. $r_{ij} = r_{ji}$). A network is called survivable if it has at least $r_{ij}$ edge-disjoint paths between any pair $(i, j)$ of vertices.
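The survivable network design problem consists in finding the minimum cost survivable network. This problem can be formulated by the following integer program:
\[
(IP_\emptyset(r)) \qquad
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & x(\delta(S)) \ge \max_{(i,j) \in \delta(S)} r_{ij}, && S \subset V,\ S \neq \emptyset, && (6) \\
& x_e \ge 0 \text{ and integer}, && e \in E.
\end{aligned}
\]
We denote by $IZ_\emptyset(r)$ the optimal value of the above integer program. Let $(P_\emptyset(r))$ denote the linear programming relaxation of $(IP_\emptyset(r))$ obtained by dropping the integrality restrictions and let $Z_\emptyset(r)$ be its optimal value.

By definition, the degree of vertex $i \in V$ is $d_x(i) = x(\delta(i))$ for any feasible solution $x$, either to $(IP_\emptyset(r))$ or to $(P_\emptyset(r))$. Because of constraints (6) for $S = \{i\}$, the degree of vertex $i$ is at least equal to $\max_{j \in V \setminus \{i\}} r_{ij}$. If $d_x(i) = \max_{j \in V \setminus \{i\}} r_{ij}$, then we say that $x$ is parsimonious at vertex $i$. If we impose that the solution $x$ is parsimonious at all the vertices of a set $D \subseteq V$, we get some interesting variations of $(IP_\emptyset(r))$ and $(P_\emptyset(r))$, denoted by $(IP_D(r))$ and $(P_D(r))$, respectively. The formulation of $(IP_D(r))$ as an integer program is:
\[
(IP_D(r)) \qquad
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & x(\delta(S)) \ge \max_{(i,j) \in \delta(S)} r_{ij}, && S \subset V,\ S \neq \emptyset, \\
& x(\delta(i)) = \max_{j \in V \setminus \{i\}} r_{ij}, && i \in D, \\
& x_e \ge 0 \text{ and integer}, && e \in E.
\end{aligned}
\]
We denote by $IZ_D(r)$ the optimal value of the above integer program. Let $(P_D(r))$ denote the linear programming relaxation of $(IP_D(r))$ obtained by dropping the integrality restrictions and let $Z_D(r)$ be its optimal value.

Theorem 2. (parsimonious property, Goemans and Bertsimas [4]) If the costs $c_{ij}$ satisfy the triangle inequality, then $Z_\emptyset(r) = Z_D(r)$ for all subsets $D \subseteq V$.

The proof of this theorem is based on a result on connectivity properties of Eulerian multigraphs.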
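Let now $W \subseteq V$ and consider the following linear program.

Problem LP2:
\[
\begin{aligned}
Z^*_2 = \min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & x(\delta(S)) \ge 1, && S \subset V \text{ such that } W \cap S \neq \emptyset \neq W \setminus S, && (7) \\
& x(\delta(i)) = 0, && i \in V \setminus W, && (8) \\
& 0 \le x_e \le 1, && e \in E. && (9)
\end{aligned}
\]
Replacing constraints (9) with the integrality constraints $x_e \in \{0, 1\}$, we obtain the formulation of the minimum cost tree spanning the subset of nodes $W \subset V$.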
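Consider the following relaxation of the problem LP2.

Problem LP3:
\[
\begin{aligned}
Z^*_3 = \min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & x(\delta(S)) \ge 1, && S \subset V \text{ such that } W \cap S \neq \emptyset \neq W \setminus S, && (7) \\
& x_e \ge 0, && e \in E. && (10)
\end{aligned}
\]
Thus we omitted constraint (8) and relaxed constraint (9). The following result is a straightforward consequence of the parsimonious property, if we choose $r_{ij} = 1$ if $i, j \in W$ and $r_{ij} = 0$ otherwise, and $D = V \setminus W$.

Lemma 3. The optimal solution values of the linear programming problems LP2 and LP3 are the same, that is, $Z^*_2 = Z^*_3$, where $Z^*_2$ and $Z^*_3$ denote the optimal values of LP2 and LP3.

Consider the following integer programming problem.

Problem IP4:
\[
\begin{aligned}
\min\ & \sum_{e \in E} c_e x_e \\
\text{s.t.}\ & x(\delta(S)) \ge 1, && S \subset V,\ S \neq \emptyset, && (11) \\
& x_e \in \{0, 1\}, && e \in E. && (12)
\end{aligned}
\]
Clearly, it is the integer programming formulation of the MST (minimum spanning tree) problem. Let LP4 be the LP relaxation of this formulation, that is, we simply replace the constraint (12) by the constraint $0 \le x_e \le 1$ for all $e \in E$.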
Denote by $Z^*_4$ the value of the optimal solution of LP4. The following known result for minimum spanning trees holds:

Proposition 4. $c(T(V)) \le \left(2 - \frac{2}{|V|}\right) Z^*_4$, where $c(T(V))$ denotes the cost of the minimum spanning tree on $V$.

Proof. See for example [4].
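A small worked example suggests that a bound of the form $c(T(V)) \le (2 - 2/|V|)\, Z^*_4$ cannot be improved in general. The sketch assumes that LP4 is the cut relaxation (minimize $\sum_e c_e x_e$ subject to $x(\delta(S)) \ge 1$ for every nonempty proper subset $S \subset V$ and $0 \le x_e \le 1$), which is an assumption about the exact form of the program, and takes unit costs on the complete graph with $n = |V| \ge 3$ nodes:

```latex
\begin{align*}
&\text{Lower bound: } \sum_{i \in V} x(\delta(i)) = 2\,x(E) \ge n
  \;\Longrightarrow\; Z_4^* \ge \tfrac{n}{2}. \\
&\text{Feasible point: } x_e = \tfrac12 \text{ on a Hamiltonian cycle},\ x_e = 0 \text{ elsewhere}
  \;\Longrightarrow\; x(\delta(S)) \ge 2 \cdot \tfrac12 = 1,\quad Z_4^* = \tfrac{n}{2}. \\
&\text{Spanning tree: } c(T(V)) = n - 1 = \Bigl(2 - \tfrac{2}{n}\Bigr)\tfrac{n}{2}
  = \Bigl(2 - \tfrac{2}{|V|}\Bigr) Z_4^*.
\end{align*}
```

The feasible point works because every cut of the graph is crossed by at least two edges of a Hamiltonian cycle.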
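Let $W \subseteq V$; then Proposition 4 can be easily modified to obtain:

Proposition 5. $c(T(W)) \le \left(2 - \frac{2}{|W|}\right) Z^*_2$, where $Z^*_2$ denotes the optimal value of LP2.

Proof. Let $(x_e)$ be a feasible solution to the linear program LP2. Then $e \notin E(W)$, that is, $e$ has an endpoint outside $W$, implies that $x_e = 0$, and using Proposition 4 on the complete graph induced by $W$ we prove the inequality.

Let $(y^*, x^*, Z^*_{LP})$ be the optimal solution to the linear programming relaxation of the L-GMSTP, and define
\[
\hat{x}_e = \rho\, x^*_e \ \text{ for } e \in E, \qquad
\hat{y}_i = \begin{cases} 1 & \text{if } i \in W, \\ 0 & \text{otherwise.} \end{cases}
\]
Now, let us show that $(\hat{x}_e)_{e \in E}$ is a feasible solution to LP3. Indeed, $\hat{x}_e \ge 0$ for all $e \in E$, hence condition (10) is satisfied. Let $S \subset V$ be such that $W \cap S \neq \emptyset \neq W \setminus S$ and choose some $i \in W \cap S$. Hence $\hat{y}_i = 1$ and $y^*_i \ge \frac{1}{\rho}$. Then we have
\[
\hat{x}(\delta(S)) = \sum_{e \in \delta(S)} \hat{x}_e = \rho \sum_{e \in \delta(S)} x^*_e \ge \rho\, y^*_i \ge \rho \cdot \frac{1}{\rho} = 1,
\]
by the definition of $\hat{x}_e$ and the fact that the $(x^*_e)$ solve the linear programming relaxation of the L-GMSTP. Therefore $(\hat{x}_e)$ satisfy constraint (7) in LP3.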
Now we are able to prove the performance bounds for the approximation algorithm that we proposed for the L-GMSTP:
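The statements above can be chained into the claimed bound. The following is a sketch assembled from the preceding results, not a verbatim reproduction of the original proof. It writes $Z_2^*$ and $Z_3^*$ for the optimal values of LP2 and LP3 and $\hat{x}_e = \rho x_e^*$ for the scaled LP solution (notation assumed here), and it assumes that the analogue of Proposition 4 for $W$ reads $c(T(W)) \le (2 - 2/|W|)\, Z_2^*$:

```latex
\begin{align*}
c(T(W)) &\le \Bigl(2 - \tfrac{2}{|W|}\Bigr) Z_2^*
  && \text{(Proposition 4 applied to } W\text{)} \\
&= \Bigl(2 - \tfrac{2}{|W|}\Bigr) Z_3^*
  && \text{(Lemma 3)} \\
&\le \Bigl(2 - \tfrac{2}{|W|}\Bigr) \sum_{e \in E} c_e \hat{x}_e
  && (\hat{x} \text{ is feasible for LP3}) \\
&= \Bigl(2 - \tfrac{2}{|W|}\Bigr) \rho\, Z_{LP}^*
  && (\hat{x}_e = \rho\, x_e^*) \\
&\le \Bigl(2 - \tfrac{2}{|W|}\Bigr) \rho\, Z_{IP}
  && (Z_{LP}^* \le Z_{IP}) \\
&\le 2\rho\, Z_{IP}.
\end{align*}
```

Each inequality uses only one earlier ingredient, so the factor 2 comes from the spanning tree relaxation and the factor ρ from scaling the fractional solution.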

CONCLUSIONS
For a special class of L-GMSTP instances, namely graphs whose edge cost function satisfies the triangle inequality and whose cluster sizes are bounded by ρ, we provided an approximation algorithm which delivers a solution with cost at most 2ρ times the optimum.