Static Program Analysis based on Virtual Register Renaming

My PhD dissertation has been published as a University of Cambridge Computer Laboratory Technical Report. The bibliography has been extended to become a general SSA bibliography.

I studied for my PhD at the Computer Laboratory in Cambridge between 2001 and 2004. I had my viva on 19 Jul 05, and I graduated on 22 Jul 06.

Abstract

Static single assignment form (SSA) is a popular program intermediate representation (IR) for static analysis. SSA programs differ from equivalent control flow graph (CFG) programs only in the names of virtual registers, which are systematically transformed to comply with the naming convention of SSA. Static single information form (SSI) is a recently proposed extension of SSA that enforces a greater degree of systematic virtual register renaming than SSA. This dissertation develops the principles, properties, and practice of SSI construction and data flow analysis. Further, it shows that SSA and SSI are two members of a larger family of related IRs, which are termed virtual register renaming schemes (VRRSs). SSA and SSI analyses can be generalized to operate on any VRRS family member. Analysis properties such as accuracy and efficiency depend on the underlying VRRS.

This dissertation makes four significant contributions to the field of static analysis research.

First, it develops the SSI representation. Although SSI was introduced five years ago, it has not yet received widespread recognition as an interesting IR in its own right. This dissertation presents a new SSI definition and an optimistic construction algorithm. It also sets SSI in context among the broad range of IRs for static analysis.

Second, it demonstrates how to reformulate existing data flow analyses using new sparse SSI-based techniques. Examples include liveness analysis, sparse type inference and program slicing. It presents algorithms, together with empirical results of these algorithms when implemented within a research compiler framework.

Third, it provides the only major comparative evaluation of the merits of SSI for data flow analysis. Several qualitative and quantitative studies in this dissertation compare SSI with other similar IRs.

Last, it identifies the family of VRRSs, which are all CFGs with different virtual register naming conventions. Many extant IRs are classified as VRRSs. Several new IRs are presented, based on a consideration of previously unspecified members of the VRRS family. General analyses can operate on any family member. The required level of accuracy or efficiency can be selected by working in terms of the appropriate family member.

Download

My dissertation is available for download in PDF format. It is 1.3MB. [pdf/1.3MB]
Note that it is also available in the Computer Laboratory technical report series as UCAM-CL-TR-660.

Thesisometer

My PhD dissertation took a long time to complete! The final version weighs in at 55,000 words, with over 200 pages. I began writing it up in Feb 2004. I started measuring its size daily on 22 May 2004. The measurements are taken using detex, with the help of perl and cron. The graph was processed using gnuplot. [progress graph]