/* This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see */ #include #include #include #include #include"misc.h" #include"LinkedList.h" #include"MemoryManager.h" #include"degeneracy_algorithm.h" /*! \file degeneracy.c \brief Execute the algorithm in degeneracy_algorithm.c and print the number of cliques found and wall clock execution time. \author Darren Strash (first name DOT last name AT gmail DOT com) \copyright Copyright (c) 2011 Darren Strash. This code is released under the GNU Public License (GPL) 3.0. \image html gplv3-127x51.png \htmlonly See GPL 3.0 here \endhtmlonly */ int main() { #ifdef MEMORY_DEBUG fprintf(stderr, "WARNING: MEMORY_DEBUG is defined, timing will be off.\n"); #endif int n; // number of vertices int m; // 2x number of edges LinkedList** adjacencyList = readInGraphAdjList(&n,&m); int i; int** adjList; // = Calloc(n, sizeof(int*)); int* degree; // = Calloc(n, sizeof(int)); #ifdef RETURN_CLIQUES_ONE_BY_ONE LinkedList* cliques = createLinkedList(); #endif runAndPrintStatsListList( &listAllMaximalCliquesDegeneracy, "degeneracy", adjacencyList, adjList, #ifdef RETURN_CLIQUES_ONE_BY_ONE cliques, #endif degree, n ); #ifdef RETURN_CLIQUES_ONE_BY_ONE destroyCliqueResults(cliques); #endif // Free up memory from adjacency list. i = 0; while(i See GPL 3.0 here \endhtmlonly \section intro_sec Introduction This package contains the code that was used to generate the results of Eppstein and Strash (2011). (See \ref recommended_reading for references I am using.) This package contains code to generate all maximal cliques of graphs. It contains implementations of the following algorithms: -# (tomita) The algorithm of Tomita et al. (2006), which is known to work well in practice, but consumes much memory because it uses an adjacency matrix representation of the input graph. For small sparse graphs, this algorithm is an excellent choice, and it is worst-case optimal. This algorithm also works very fast in practice for dense graphs. \see tomita.c tomita_algorithm.c

-# (tomita-adjacency-list) The algorithm of Tomita et al. (2006), modified to use adjacency lists, therefore only requiring space linear in the size of the graph. This version is explained in the paper by Eppstein and Strash (2011), and is referred to in their paper as the "max-degree" algorithm. For sparse graphs, this algorithm can be quite fast on some inputs, but very slow on others. In particular, there are some sparse graphs that cause this algorithm to run for over 5 hours, when the hybrid and degeneracy algorithms described next takes 10 and 4 minutes, respectively. This behavior is likely because there are some high-degree vertices whose neighbor lists are iterated througha many times. Therefore, use caution when running this algorithm. \see adjlist.c adjlist_algorithm.c

-# (hybrid) An implementation of the algorithm of Eppstein, Löffler, and Strash (2010), which computes a degeneracy ordering of the input graph, and uses this ordering to speed up computation without any fancy data structuring. This algorithm is called "hybrid" in Eppstein and Strash (2011). For sparse graphs, this algorithm is consistently fast in practice, it is especially fast when the degeneracy of the graph is very small (for example, less than 6), or with graphs whose vertices, on average, have very few later neighbors in the degeneracy ordering. However, with larger degeneracy graphs, and graphs whose vertices have, on average, many more later neighbors in the degeneracy ordering, using degeneracy, the algorithm described next, is typically faster. Why? It uses a more complex data structure that lets us gets us better speed in this case. \see hybrid.c hybrid_algorithm.c

-# (degeneracy) The algorithm of Eppstein, Löffler, and Strash (2010), which computes a degeneracy ordering of the input graph, and builds a data structure to speed up computation. This algorithm is called "degeneracy" in Eppstein and Strash (2011). For sparse graphs, this algorithm is consistently fast in practice, it is particular fast when the degeneracy of the graph is higher than 6, or with graphs whose vertices, on average, have many later neighbors in the degeneracy ordering. \see degeneracy.c degeneracy_algorithm.c

\section install_sec Compilation and Execution Under Linux systems, or *nix, executing "make" or "make all" from the top-level directory should compile all the source files in the src directory, place all object files in the obj directory, and place all executables in the bin directory. The makefile is not very sophisticated, so you may want to always run "make clean" first. If you have gcc and gnu make, this should just work. This code has not been tested in other environments. Executing the bash shell script "test.sh" in the top-level directory will execute every executable on the data sets in data and print statistics about the data sets. \subsection Defines You can modify the defines in the Makefile to alter the behavior of the executables. - DEBUG - print verbose debug information; may be annoying - MEMORY_DEBUG - check for malloc or calloc returning NULL; if so, exit gracefully (causes slightly slower code) - RETURN_CLIQUES_ONE_BY_ONE - add cliques to a linked list that is returned by the clique listing algorithms - PRINT_CLIQUES_ONE_BY_ONE - print cliques to standard output one per line - (don't use with define PRINT_CLIQUES_TOMITA_STYLE) - PRINT_CLIQUES_TOMITA_STYLE - print cliques as described by Tomita et al. (2006) - prints to standard output: print the vertex when it is added to the partial clique, print "b" when a vertex is removed, and print "c" when a maximal clique is found. (don't use with define PRINT_CLIQUES_ONE_BY_ONE) - ALLOW_ALLOC_ZERO_BYTES - some systems behave strangely when you allocate 0 bytes with either malloc or calloc, if this is not defined, then we always allocated at least one byte. The Defines used to generate the results of Eppstein and Strash (2011) are -DALLOW_ALLOC_ZERO_BYTES and -DPRINT_CLIQUES_TOMITA_STYLE with redirecting the standard output to /dev/null. \subsection Executables Each executable reads in the input graph from standard input, and writes the main product to standard output, any statistical information or by-products are printed to standard error. - Statistics: - bin/printnm - print the number of vertices and edges in the data set - bin/compdegen - compute and print the degeneracy of the data set - Maximal clique listing: (print number of maximal cliques, and algorithm running time) - bin/tomita - execute algorithm tomita on the data set - bin/adjlist - execute algorithm tomita-adjacency-list on the data set - bin/hybrid - execute algorithm hybrid on the data set - bin/degeneracy - execute algorithm degeneracy on the data set \section data_sets Data Sets Several data sets are in the directory data, in a subdirectory named according to their origin. \subsection Sources I have included several data sets from BioGRID version 3.0.65 (http://thebiogrid.org/) in subdirectory biogrid. See Eppstein and Strash (2011) for more sources \subsection Format Each file contains the following: -# The number of vertices in the graph -# Twice the number of edges in the graph. -# a list of edges in x,y format, where 0<=x,yTheoretical Computer Science, 363(1), 2006. (http://dx.doi.org/10.1016/j.tcs.2006.06.015) - "Listing All Maximal Cliques in Near-Optimal Time" by David Eppstein, Maarten Löffler, and Darren Strash, ISAAC 2010, LNCS volume 6506, pp. 403-416, 2010. (http://dx.doi.org/10.1007/978-3-642-17517-6_36 or http://arxiv.org/abs/1006.5440) - "Listing All Maximal Cliques in Large Sparse Real-World Graphs" by David Eppstein and Darren Strash, SEA 2011, LNCS volume 6630, pp. 364-375, 2011. (http://dx.doi.org/10.1007/978-3-642-20662-7_31 or http://arxiv.org/abs/1103.0318) */