52.219 Graphs

See source /home/s7/pat/scheme/pub/graphs.scm
Basic definition of a graph = (V,E), where the graph G is composed of a set of vertices V (also known as nodes) and a set of undirected edges E. Edges may be "labelled" with something, such as a cost, where cost may be a distance between vertices, time between vertices, cost in going between vertices, etc. Therefore, we may have a "labelled graph". Alternatively, edges maybe labelled with relations, such as before/after, < or >, and so on.

Representation
Basic representation, is typically as an adjacency matrix, or adjacency list In the adjacency matrix we have an n by n array such that if vertex i is adjacent to vertex j the array element will be 1, otherwise it will be zero. Another way of thinking of this is if the ith/jth array element is 1 then it is true that the edge (i j) exists in G.

With the adjacency list we associate with each vertex a list, containing the vertices adjacent to that vertex.

In the scheme implementation we will use a more general representation. We will have an object called a graph G, where the graph may be directed or undirected.

The (directed) graph object can be created using the function call:

(define G (make-random-digraph n p))
where n is the size of the graph (number of vertices) and p is the probability that an edge exists between vertices i and j, for all i and all j, where i is not equal to j. The object will have a number of attributes, most notably:


 (n G)       number of vertices in G. Think of n of G as the size of G
 (V G)       the vertices of G, as a set of the first n integers 
             (1 to n inclusive)
 (E G)       the edges of G, as a list of pairs
 (p G)       the probability that an edge (i j) exists. This can also be
             thought of as the density of G. if (= 1 (p G)) then G is a
             clique (fully connected) and if (= 0 (p G)) then G is the
             null graph
 (x-vect G)  A vector, where (vector-ref (x-vect G) i) gives the x
             coordinate of vertex i.
 (y-vect G)  A vector, where (vector-ref (y-vect G) i) gives the y
             coordinate of vertex i. Consequently x-vect and y-vect
             give the position of the vertices in the x-y plane
 (C G)       Is the cost matrix for the graph G. 
             (array-ref (C G) (list i j)) gives the cost from i to j
             in G, and is computed as the Euclidean distance using 
             x-vect and y-vect. If the edge (i j) does not exist then
             then (array-ref (C G) (list i j)) is *max*. Therefore,
             the diagonal (for the non-existent edges (i i)) will
             have all elements set to *max*

Making an undirected graph we use the function
(define G (make-random-graph n p))
There are functions to save graphs to disc (ie. (save-graph G fname)) and to load graphs from disc (ie. (load-graph fname)). There are also function for drawing a graph (ie. (draw-graph G)), drawing a vertex (ie. ((draw-vertex G) i)), drawing an edge (ie. ((draw-edge G) edge) where edge is a pair, such as (list i j)).

It should be noted that the functions draw-vertex and draw-edge are "partially applicable" and can therefore be "curried" thus allowing us to use the primitive ``for-each''. For example, given graph G we can draw all of the edges of G as follows:

(for-each (draw-edge G) (E G))
and we can draw all the vertices of G as follows:
(for-each (draw-vertex) (V G))
There are also functions (as you could expect) to determine if G is a graph (graph? G), test if vertex i is adjacent to vertex j in G (adjacent? i j G), and get the set of vertices that are adjacent to i in G (adjacent-to i G).

Single Source Shortest Paths (SSSP)
One of the first things we might want to enquire of a graph, is given a vertex i, give me all of the paths that start with i and end on all of the remaining vertices so that the length of each path is a minimum. For example, we might have a graph G where the vertices are cities, and the edges are the time to fly from one city to another. Furthermore, it may be the case that you cannot fly direct from any city to any other city, and you might have to go through some city or cities. We might then want to know, given a city called Source, what is the distance from Source to i, for all i not equal to Source. This will give us n-1 paths in G.

An algorithm that does this is Dijkstra's algorithm /home/s7/pat/scheme/pub/dijkstra.scm and is describe informally below. A demo of the algorithm can be run by loading the above file and making the function call

(demo 10 0.3)
This will produce a random graph with 10 vertices, edge probability of 0.3, and will display all of the shortest paths with their costs.


The algorithm uses the following variable

 S, the set of ``special'' vertices, such that all vertices in S 
    are already on shortest paths

 V-S (read it as V minus S) the set of vertices not in S

 D  a vector, such that D[i] is the distance from Source to i
    (and initially D is a copy of the Source'th row in (C G))

 P  a vector, such that P[i] is the vertex immediately before vertex
    i on the path from Source

The algorithm goes like this:

      1. let u be the closest vertex to Source in V-S
 
      2. add u to the set S, and remove u from the set V-S

      3. For all vertices w in V-S
         If the cost from Source -> u -> w is less than
            the cost from Source -> w (that is, it is less costly via u)
         Then set D[w] to be cost(Source -> u) + cost(u -> w)  and
              set P[w] to be u

      4. If there are still vertices remaining in the set V-S 
         Then go back to step 1 above

For example, if we have the following cost matrix



                    -------------------------------
                    |     |     |     |     |     |
                    |  -  | 10  |  -  |  30 | 100 |
                    |     |     |     |     |     |
                    -------------------------------
                    |     |     |     |     |     |
                    | 10  |  -  |  50 |  -  |  -  |
                    |     |     |     |     |     |
                    -------------------------------
                    |     |     |     |     |     |
                    |  -  | 50  |  -  |  20 |  10 |
                    |     |     |     |     |     |
                    -------------------------------
                    |     |     |     |     |     |
                    | 30  |  -  |  20 |  -  |  60 |
                    |     |     |     |     |     |
                    -------------------------------
                    |     |     |     |     |     |
                    | 100 |  -  |  10 |  60 |  -  |
                    |     |     |     |     |     |
                    -------------------------------


and Source = 1, the algorithm proceeds as follows

    S             V-S      u    D1  D2  D3  D4  D5   P1 P2 P3 P4 P5
   {1}         {2,3,4,5}        -   10  -   30  100  1  1  1  1  1

   {1,2}       {3,4,5}     2    -   10  60  30  100  1  1  2  1  1

   {1,2,4}     {3,5}       4    -   10  50  30   90  1  1  4  1  4

   {1,2,4,3}   {5}         3    -   10  50  30   60  1  1  4  1  3
 
   {1,2,4,3,5} {}

We then get the costs of the shortest paths in the vector D, and we can trace back the path from some vertex i to the Source using vector P. For example the path from vertex 1 to vertex 5 goes 5 <- 3 <- 4 <- 1 and has a cost of 60 (note that the function that does this is called (get-path-from i k P)

NOTE: Dijkstra's algorithm is of complexity O(n*n) and you might care to convince yourself of that. Furthermore, the algorithm only holds so long as all labels are positive, that is there is no negative costs to be found in the graph. We have an informal proof below.

We know that in step 1 (choosing the vertex u in V-S, such that u is closest to the source) that the vertex u MUST be special. If it were not special, then we should be able to find some other vertex w such that there is a path Source -> w -> u that has lower cost than Source -> u. But we know that u is the vertex closest to Source in V-S therefore for a shorter path to exist to u we must have a path from Source -> w that is greater than Source -> u and a path from w -> u that is negative ... and this is not allowed. So we can show that it would be absurd for this to exist, so u must be special