Quick Cliques: A package to efficiently compute all maximal cliques in sparse graphs

Author:: Darren Strash (first name DOT last name AT gmail DOT com)

Introduction

This package contains the code that was used to generate the results of Eppstein and Strash (2011). (See Recommended Reading for references I am using.)

This package contains code to generate all maximal cliques of graphs. It contains implementations of the following algorithms:

(tomita) The algorithm of Tomita et al. (2006), which is known to work well in practice, but consumes much memory because it uses an adjacency matrix representation of the input graph. For small sparse graphs, this algorithm is an excellent choice, and it is worst-case optimal. This algorithm also works very fast in practice for dense graphs.
See also:
tomita.c tomita_algorithm.c
(tomita-adjacency-list) The algorithm of Tomita et al. (2006), modified to use adjacency lists, therefore only requiring space linear in the size of the graph. This version is explained in the paper by Eppstein and Strash (2011), and is referred to in their paper as the "max-degree" algorithm. For sparse graphs, this algorithm can be quite fast on some inputs, but very slow on others. In particular, there are some sparse graphs that cause this algorithm to run for over 5 hours, when the hybrid and degeneracy algorithms described next takes 10 and 4 minutes, respectively. This behavior is likely because there are some high-degree vertices whose neighbor lists are iterated througha many times. Therefore, use caution when running this algorithm.
See also:
adjlist.c adjlist_algorithm.c
(hybrid) An implementation of the algorithm of Eppstein, Löffler, and Strash (2010), which computes a degeneracy ordering of the input graph, and uses this ordering to speed up computation without any fancy data structuring. This algorithm is called "hybrid" in Eppstein and Strash (2011). For sparse graphs, this algorithm is consistently fast in practice, it is especially fast when the degeneracy of the graph is very small (for example, less than 6), or with graphs whose vertices, on average, have very few later neighbors in the degeneracy ordering. However, with larger degeneracy graphs, and graphs whose vertices have, on average, many more later neighbors in the degeneracy ordering, using degeneracy, the algorithm described next, is typically faster. Why? It uses a more complex data structure that lets us gets us better speed in this case.
See also:
hybrid.c hybrid_algorithm.c
(degeneracy) The algorithm of Eppstein, Löffler, and Strash (2010), which computes a degeneracy ordering of the input graph, and builds a data structure to speed up computation. This algorithm is called "degeneracy" in Eppstein and Strash (2011). For sparse graphs, this algorithm is consistently fast in practice, it is particular fast when the degeneracy of the graph is higher than 6, or with graphs whose vertices, on average, have many later neighbors in the degeneracy ordering.
See also:
degeneracy.c degeneracy_algorithm.c

Compilation and Execution

Under Linux systems, or *nix, executing "make" or "make all" from the top-level directory should compile all the source files in the src directory, place all object files in the obj directory, and place all executables in the bin directory. The makefile is not very sophisticated, so you may want to always run "make clean" first. If you have gcc and gnu make, this should just work. This code has not been tested in other environments.

Executing the bash shell script "test.sh" in the top-level directory will execute every executable on the data sets in data and print statistics about the data sets.

Defines

You can modify the defines in the Makefile to alter the behavior of the executables.

DEBUG - print verbose debug information; may be annoying
MEMORY_DEBUG - check for malloc or calloc returning NULL; if so, exit gracefully (causes slightly slower code)
RETURN_CLIQUES_ONE_BY_ONE - add cliques to a linked list that is returned by the clique listing algorithms
PRINT_CLIQUES_ONE_BY_ONE - print cliques to standard output one per line
- (don't use with define PRINT_CLIQUES_TOMITA_STYLE)
PRINT_CLIQUES_TOMITA_STYLE - print cliques as described by Tomita et al. (2006)
- prints to standard output: print the vertex when it is added to the partial clique, print "b" when a vertex is removed, and print "c" when a maximal clique is found. (don't use with define PRINT_CLIQUES_ONE_BY_ONE)
ALLOW_ALLOC_ZERO_BYTES - some systems behave strangely when you allocate 0 bytes with either malloc or calloc, if this is not defined, then we always allocated at least one byte.

The Defines used to generate the results of Eppstein and Strash (2011) are -DALLOW_ALLOC_ZERO_BYTES and -DPRINT_CLIQUES_TOMITA_STYLE with redirecting the standard output to /dev/null.

Executables

Each executable reads in the input graph from standard input, and writes the main product to standard output, any statistical information or by-products are printed to standard error.

Statistics:
- bin/printnm - print the number of vertices and edges in the data set
- bin/compdegen - compute and print the degeneracy of the data set

Maximal clique listing: (print number of maximal cliques, and algorithm running time)
- bin/tomita - execute algorithm tomita on the data set
- bin/adjlist - execute algorithm tomita-adjacency-list on the data set
- bin/hybrid - execute algorithm hybrid on the data set
- bin/degeneracy - execute algorithm degeneracy on the data set

Data Sets

Several data sets are in the directory data, in a subdirectory named according to their origin.

Sources

I have included several data sets from BioGRID version 3.0.65 (http://thebiogrid.org/) in subdirectory biogrid.

See Eppstein and Strash (2011) for more sources

Format

Each file contains the following:

The number of vertices in the graph
Twice the number of edges in the graph.
a list of edges in x,y format, where 0<=x,y<number of vertices.
- the graph is symmetric, so x,y and y,x are in the list.