52.219 Minimum Spanning Tree (MST)

The minimum spanning tree (MST) is a solution to the minimum connector problem. An example of the minimum connector problem might be as follows. We have to build an irrigation system (made of canals) connecting given locations. Some of these locations cannot be directly connected (maybe because of cost or geographical constraints). Design a system at minimum total cost.

We can take the locations as vertices of a graph, and the canals as the edges of the graph, where the edges are labelled with a weight, giving us the cost of constructing that canal. We must then find a subgraph that passes through each vertex. Clearly this subgraph must be a tree, because if it wasn't a tree it must contain a cycle, and the cycle can be broken by deleting an edge, reducing the cost, and not disconnecting the graph.

What follows are two algorithms that compute the MST, and a proof that they are correct.

Kruskal's Algorithm
This is a greedy algorithm, at each stage making the greediest possible choice (ie. locally best) with no concern to the implications of this choice on future choices. Such techniques generally do not deliver optimal solutions, but in this case it does, and we give a proof of this.

Kruskal's algorithm can be stated as follows:


    0. Create a minimum spanning tree T that initially contains no edges,
    1. Choose an edge e in G, where 
         (a) e is not in T and ...
         (b) e is of minimum weight and ...
         (c) e does not create a cycle in T
    2. If T does not contain all the vertices of G go to step 1.

The tree T, produced by the above algorithm, is a MST. We prove this by contradiction.

Proof of Kruskal's algorithm
Let T be a minimum spanning tree of G, and assume (wrongly, as we will show) that there exists another minimum spanning tree S, such that weight of S (ie. W(S)) is less than weight of T (ie. W(S) < W(T))


1. Let e be the edge of least weight in T that is not in S

2. Add edge e to S
   (a) This creates a cycle C, and C contains e
   (b) Cycle C contains an edge e', where e' is not in T. 
       Proof: This must be the case otherwise all edges in C - e are already 
              in T, and T would also contain a cycle, and would not be a tree.

   (c) If we replace e' in S by e we get a spanning tree S' where
       (i)   cost of e <= cost e'
             Proof: e and e' must be coincident on one vertex in
                    common, and the above algorithm would have chosen e in 
                    preference to e' to create T.
       (ii)  S' is now one edge closer to T than S' is to T.

3. W(S') <= W(S). We can now repeat from step 1 until S' = T

4. Process terminates with S' = T and W(T) <= W(S). This contradicts 
   initial assumption, that there can be another MST, S< of less cost
   than T. Consequently Kurskal's algorithm is sound.

Prim's algorithm
In Prim's algorithm we deal with vertices rather than edges (a crude differentiation between Kruskal's algorithm, but a starting point at least). The algorithm goes as follows.

    1. Put an arbitrary vertex into T
    2. Add an edge e of minimum weight to T, where e joins 
       a vertex in T to a vertex not in T.

More specifically,

   0. let U be the set of vertices not in the minimum spanning tree T, and
      initialise U to be the set {1}
      let V-U be the set of vertices not in the minimum spanning tree, and
      initialise V-U to be {2,3,4,...,n}
      let T be the set of edges in the minimum spanning tree, and
      initialise T to be the empty set {}

   1. let e be the lowest cost edge in G that joins a vertex in U to a
      vertex in V-U

   2. Add edge e to T

   3. let w be the (second e), ie w is the vertex in V-U that is 
      an end vertex of the edge e
   4. Add vertex w to U

   5. Remove vertex w from V-U

   6. If the set V-U is not empty then go to step 1 otherwise deliver T

The above algorithm can be shown to be sound using a similar proof to the one above.

Complexity of Prim
Assume the graph G is complete. At each iteration, step 1 compares the distance of each of the vertices in U to each of the vertices in V-U. Initially there will be 1 vertex in U and n-1 vertices in V-U, leading to n-1 looks at edge weights. Next time round there will be 2 vertices in U and n-2 vertices in V-U, and we examine 2(n-2) edges. Next time there will be 3 vertices in U, (n-3) vertices in V-U and 3(n-3) edges will be examined ... and so on. So we have


   Sn = 1.(n-1) + 2(n-2) + 3(n-3) ... n(n-(n-1))
      = n + 2n + 3n + 4n ... - (1 + 4 + 9 + 16 ...)
      = n(1 + 2 + 3 + 4 ...) - (1 + 4 + 9 + 16 ...)
      = (n*n*n - n*n - n)/6
      = O(n*n*n)

Therefore, for a complete graph, the above implementation of Prim's algorithm is of cubic complexity.

Prim Demo
A scheme encoding of the above algorithm can be found in /home/s7/pat/scheme/pub/prim.scm When you load the file it will give a small guide to using a demo for Prim's algorithm. This will create a complete graph of n vertices and will compute the MST. As it goes it will draw out the edges of the MST. I would suggest that you do not go beyond a graph with 100 vertices, as it take considerable time to create the graph.