52.219 Connected?

An important property of a (undirected) graph is whether it is connected or disconnected. That is, is there a path from every vertex to every other vertex? This could be important for a number of reasons. If the graph represents some service, then if it is diconnected, that service might not get to everyone. If the graph represents a problem, then if it is disconnected then we might have a collection of independent problems, and we can solve each one in turn. So, being connected might be good, or might be bad.

Random Graphs
Our graphs (undirected) are created using the function call (make-random-graph n p) where n is the number of vertices and p is the probability of an edge existing between a pair of vertices. That is, the generator selects two vertices i and j, and then generates a random number x in the range 0 to 1 (ie, via a call to (random 1.0)). If x is less than or equal to p the pair of directed arcs (i j) and (j i) are created.

If p is 0 no edges will exist in the graph, and if p is 1.0 then all possible edges will exist. That is, there is a maximum of n(n-1)/2 edges in a graph (each of the n vertices can be adjacent to at most n-1 other vertices, and since edges are symmetric, we divide this number by 2 (not using a Pentium chip I might add:)). In the random graph (make-random-graph n p) we will have in total p.n.(n-1)/2 edges.

For a given value of n, there will be a corresponding value of p where the random graphs become connected. For example if n=20 and p=0.05 it would not be possible for graphs to be connected (because we need at least n-1 edges), and not until p=0.1 would it be at all possible (and even then only remotely). So, for a given value of n, if we vary p from (say) 0.01 to 0.5, and at each value of p generate 100 random graphs, we could count the number of those graphs that are connected. We can then plot this, on the x axis we have p, and on the y axis the number of connected graphs. We should then see an abrupt change in the graphs at a critical value for p. This value of p will be dependent upon n.

How do we determine if a graph is connected?
There are at least 3 ways. The first is to use depth first search. If on termination all of the n elements of the vector mark are set to true, we will have visited all vertices and G must be connected. So, a 2 pass algorithm. Do dfs and then scan the mark vector.

An alternative is as follows. We have three sets, U, V-U, and S. U is the set of vertices that we have visited. V-U is the set of vertices that we have not yet visited. S is the set of vertices that are in V-U and are adjacent to vertices in U. Initially U contains an initial vertex from G, say {1}, and V-U is then the set of vertices in G, but excluding vertex 1, and S is empty.

The algorithm goes as follows:


   1. Let S be the set of vertices in V-U that are adjacent to the
      vertices in U

   2. If S is empty then go to step 6

   3. Add the elements in S to the set U

   4. Remove the elements of S from the set V-U

   5. Go to step 1

   6. If V-U is empty then the graph is connected, otherwise
      the graph is disconnected.

A question for us is then, what is the complexity of the above algorithm. Consider the worst case scenario, such that each time round the loop 1-5 a single value is added to the set S. The first time round the loop we will compare 1 vertex against n-1 others to see if they are adjacent (line 1). Consequently the set U has is of size 2, and set V-U is of size (n-2). Next time round the loop we compare each of the 2 elements in U against the n-2 vertices in V-U, and we find a single vertex. So we do 2.(n-2) tests for adjacency. Next time round we do 3.(n-3) tests, an so on ... so the series looks as follows:

   (n-1) + 2(n-2) + 3(n-3) + ... (n-1).(n-(n-1)) + n

An alternative approach, is to reuse something we already have, such as Floyd or Warshall's algorithm, algorithms that are O(n*n*n) (ie. cubic). If we compute the transitive closure using Warshall, and then examine any row of the resultant matrix, and discover that the row has 1's in all entries except the i'th entry, then it is connected.