(conference presentation of this paper)
(another version of earlier talk)
We introduce the Forseti system, a principled approach to holistic memory management. It permits a sysadmin to specify the total physical memory resource that may be shared among all concurrent virtual machines on a physical node, e.g. in a datacenter. Forseti seeks to maximize the combined throughput of the set of VMs, based on concepts from economic utility theory. Our results show that Forseti enables dramatic reductions (up to 5x) in VM memory footprint without compromising application execution times. This is a practice talk for ISMM.
(similar to talk below).
Garbage collection is no longer an esoteric research interest. Mainstream programming languages like Java and C# rely on high-performance memory-managed runtime systems. In this talk, I will motivate the need for rigorous models of memory management to enable more powerful analysis and optimisation techniques. I will draw on a diverse range of topics including thermodynamics, economics, machine learning and control theory.
AspectJ is a tool for aspect-oriented programming in Java. Aspects allow programmers to inject behaviour into code based on syntactic pattern matching on constructs from the original source program. In this talk, I will review the features of AspectJ - particularly its flexibility (good) and its user-friendliness (bad). I will report on a new project at Glasgow about modifying the behaviour of the Java Collections framework to enable graceful degradation of application performance when system memory is exhausted.
This is a roadmap talk about general compiler optimizations. I will start by giving a brief overview of how compiler optimizations can be classified, discussing ahead-of-time optimization, feedback-directed optimization and runtime optimization. I will discuss how information sharing enables more effective optimization. I will argue that this trend is taking us to a Big Brother optimization utopia.
In a future computational environment with terabytes of non-volatile RAM and petabytes of cloud-based backing store, can we avoid explicit delete/free() calls and implicit garbage collection altogether? Health warning: this is a highly speculative, low-content presentation.
In this talk, we show how economic utility theory can be applied to heap resource allocation for multiple co-located virtual machines.
A brief overview of low-level patterns and their applications in code analysis.
This talk presents a comparative study of programs written in Java, Clojure, JRuby, Jython and Scala that execute on the JVM platform.
We formulate heap sizing as a control problem, apply and tune a standard controller algorithm, and evaluate its performance on a set of well-known benchmarks. We find our controller adapts the heap size more responsively than existing mechanisms. This responsiveness allows tighter virtual machine memory footprints while preserving target application throughput, which is ideal for both embedded and utility computing domains. In short, we argue that formal, systematic approaches to memory management should replace ad-hoc heuristics as the discipline matures. Control-theoretic heap sizing is one such systematic approach.
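To make the control-theoretic framing concrete, here is a minimal sketch of a proportional controller for heap sizing. This is an illustration of the general idea only, not the tuned controller evaluated in the talk; the class name, target value and gain are all hypothetical.

```java
// Hypothetical sketch: a proportional controller that adjusts heap size
// based on the observed fraction of execution time spent in GC.
// (Not the controller from the talk; names and constants are illustrative.)
public class HeapController {
    private final double targetGcOverhead; // setpoint, e.g. 0.05 = 5% time in GC
    private final double gain;             // proportional gain Kp

    public HeapController(double targetGcOverhead, double gain) {
        this.targetGcOverhead = targetGcOverhead;
        this.gain = gain;
    }

    // Returns the next heap size, given the current size and the GC
    // overhead observed since the last adjustment. Too much GC (positive
    // error) grows the heap; too little shrinks it.
    public long nextHeapSize(long currentHeapBytes, double observedGcOverhead) {
        double error = observedGcOverhead - targetGcOverhead;
        double factor = 1.0 + gain * error;
        long next = (long) (currentHeapBytes * factor);
        return Math.max(next, 1L << 20); // never shrink below 1 MiB
    }
}
```

A real controller would also need integral/derivative terms and anti-windup to behave well on phase changes in allocation behaviour, which is where the tuning mentioned in the abstract comes in.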
(another version of above talk).
Typical theorem-proving workloads on the Poly/ML runtime system may execute for several hours, occupying multi-gigabyte heaps. The heap size may be fixed at execution startup time, or it may be allowed to vary dynamically. To date, runtime heap size growth has been implemented using simple hard-coded heuristics. In this talk, I will argue that a mathematically rigorous approach to heap sizing, based on control theory, is more appropriate.
(an earlier version of the above talk)
I will motivate the need for mathematical models for heap memory usage, and outline how control theory can be applied to adaptive heap sizing. I will report on our experiences in the Jikes RVM and Poly/ML runtime systems.
I discuss the dynamic resources (threads, heap) required for program execution, and how varying the amount of resource allocated affects execution characteristics. I show how control theory may be used to optimize execution time by controlling heap size. Then I go on to discuss how such tuneable resource allocation techniques can form the basis for holistic optimization of a cloud data center. Video: MP4 (15 mins)
In this talk, I review current heuristic approaches to setting the heap size in Java virtual machines. Then I outline two mathematical models for heap resizing, based on microeconomic theory and control theory. I will argue that these rigorous mathematical models give a more principled approach to heap sizing.
Our concept of a literate experiment is to have a single, rich, human-generated, text-based description of a particular experiment. We aim to weave together the description with the meta-description of the experimental process.
The success of Google's MapReduce parallel programming pattern has led to alternative implementations for various scenarios. In this talk we present MRJ, a MapReduce Java framework for multicore architectures. We also explore the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We advocate the use of machine-learning-based auto-tuning techniques to reduce the performance deficit caused by garbage collection.
Overview of my Java fork/join implementation of a planetary n-body simulation.
This project investigates a novel extension to the Java programming language, providing support for flexible parallelism on next-generation many-core processors. The flexibility of our system is based on: (1) encoding descriptions of the level of accuracy provided by alternative implementations of a program module, allowing trade-offs between computational cost and precision; and (2) enabling the runtime system to determine adaptively when new parallel threads should be spawned. I argue that this kind of programming extension fits naturally in a transactional memory system setting.
(earlier version of above talk)
In current compilers, data flow analysis and code optimization are generally implemented using sequential algorithms. Given the commoditization of many-core platforms, it should be possible to use parallel algorithms instead. This talk describes how the standard sequential algorithm for constructing static single assignment form (SSA) may be parallelized. Then we demonstrate how this parallel algorithm may be realized in an existing compiler infrastructure, using light-weight threading mechanisms for Java.
Overview of my Java fork/join implementation of a text-file concordance application. Full code at http://sicsaconcordance.googlecode.com
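The fork/join shape of such a concordance can be sketched as a divide-and-conquer word count. This is a simplified stand-in for the full application at the URL above; the class name, threshold and word-frequency-only scope are my illustrative assumptions, not the project's actual code.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative fork/join word-frequency concordance (a simplified sketch,
// not the code from the linked project). Large ranges are split and counted
// in parallel; small ranges are counted directly.
public class Concordance extends RecursiveTask<Map<String, Integer>> {
    private static final int THRESHOLD = 1000; // sequential cutoff (assumed)
    private final String[] words;
    private final int lo, hi;

    public Concordance(String[] words, int lo, int hi) {
        this.words = words; this.lo = lo; this.hi = hi;
    }

    @Override protected Map<String, Integer> compute() {
        if (hi - lo <= THRESHOLD) {            // small chunk: count directly
            Map<String, Integer> counts = new HashMap<>();
            for (int i = lo; i < hi; i++)
                counts.merge(words[i], 1, Integer::sum);
            return counts;
        }
        int mid = (lo + hi) / 2;               // large chunk: fork and merge
        Concordance left = new Concordance(words, lo, mid);
        Concordance right = new Concordance(words, mid, hi);
        left.fork();
        Map<String, Integer> result = right.compute();
        left.join().forEach((w, n) -> result.merge(w, n, Integer::sum));
        return result;
    }

    public static Map<String, Integer> count(String[] words) {
        return ForkJoinPool.commonPool()
                           .invoke(new Concordance(words, 0, words.length));
    }
}
```

A full concordance would also record word positions, but the fork/join split-then-merge structure is the same.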
Memory costs money and programs are customers. We draw an analogy between memory management and micro-economic supply-and-demand theory, which is hopefully entertaining, if not entirely practical.
Thread-level speculation (TLS) is a mechanism for improving the execution of sequential programs on multi-core processors. TLS relies on the insertion of potential thread-spawn points into sequential programs, either statically or dynamically. Spawned threads generate parallelism, but they are only effective if (a) the parallel execution saving outweighs the thread management overhead, and (b) there are few data dependence violations with other concurrent threads.
Until now, spawn points have been identified via (a) exhaustive profiling of programs, and/or (b) heuristics devised by human experts. We have recently begun a project that uses machine learning techniques to generate spawning heuristics automatically, with less program profiling. This presentation reports on the use of Java programs' static attributes and dynamic behaviour as features for learning effective TLS spawning policies.
(earlier version of above talk)
Over the past 20 years, SSA has risen to become the compiler intermediate representation of choice. Compiler developers cite many qualitative reasons for choosing SSA. However, in this study we present clear quantitative benefits of SSA by applying several standard software metrics to compiler intermediate code in both SSA and non-SSA forms. The average complexity reduction from using SSA ranges from 7% to 23% across our software metrics, over a set of standard SPEC benchmarks.
This talk describes simple, low-level patterns in Java methods that can be discovered via cheap static analysis of bytecode. We describe various nano-patterns, then present example applications for which these nano-patterns may be used.
We introduce a set of simple patterns that may be found in Java method source code. Examples include "method contains loop", "method makes recursive call", and "method throws exception". We can use the set of patterns that a method exhibits as an abstract summary of that method's source code characteristics. Next we perform runtime profiling to discover dynamic properties about each method as it executes. We use machine learning to link the simple source code patterns with the dynamic properties. This general approach has several exciting applications - at present we are concentrating on optimizing thread-level speculation.
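The "abstract summary" idea above can be sketched as a bit vector over a pattern enumeration. The pattern names below echo the examples in the abstract, but the encoding, class name and the stubbed-out detection are my illustrative assumptions, not the actual tool.

```java
import java.util.EnumSet;

// Illustrative nano-pattern summary: a method is abstracted as the set of
// simple patterns it exhibits, packed into a bit vector. Pattern detection
// itself (a bytecode analysis) is out of scope here and assumed elsewhere.
public class NanoPatterns {
    public enum Pattern { CONTAINS_LOOP, RECURSIVE, THROWS_EXCEPTION, LEAF }

    // Encode a method's pattern set as a compact bit vector,
    // suitable for use as a machine-learning feature vector.
    public static int encode(EnumSet<Pattern> patterns) {
        int bits = 0;
        for (Pattern p : patterns) bits |= 1 << p.ordinal();
        return bits;
    }

    // Query whether an encoded summary exhibits a given pattern.
    public static boolean exhibits(int bits, Pattern p) {
        return (bits & (1 << p.ordinal())) != 0;
    }
}
```

Such fixed-width summaries are cheap to compute per method and slot directly into standard classifiers as binary features.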
Many-core machines have arrived. Speculative execution is a popular technique for speeding up sequential programs on such parallel hardware. Value prediction enables more effective speculation. We will review several schemes for value prediction in Java programs, and evaluate these on standard benchmark suites.
This talk explains that extracting information from object type names can give a reliable and efficient way to make object lifetime predictions. We use this approach to optimize a generational garbage collector in the Jikes RVM system.
An investigation of how various object-oriented metrics correlate with object lifetime in Java benchmark programs.
A general overview of how machine learning can be applied to optimize generational garbage collection. Reports on preliminary work to identify objects with similar lifetimes based on similar class names.
Java program execution times vary greatly with different garbage collection algorithms. Until now, it has not been possible to determine the best GC algorithm for a particular program without exhaustively profiling that program for all available GC algorithms. Our new approach uses machine learning techniques to build a prediction model that, given a single profile run of a previously unseen Java program, can predict an optimal GC algorithm for that program.
Object pretenuring involves the identification of long-lived objects at or before their instantiation. It is a key optimization for generational garbage collection systems, which are standard in most high-performance Java virtual machines. This talk presents a new study of factors that are used to indicate object lifespans. We adopt the information-theoretic measure of normalized mutual information to compare these factors in a common framework. A study of garbage collection traces from four standard Java benchmark programs shows that object lifespan depends strongly on some of these factors, such as allocation site and object type. We also identify and measure new factors based on object-oriented metrics.
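For readers unfamiliar with the measure, normalized mutual information between two categorical factors (say, allocation site and lifespan bucket) can be computed as below. This is a generic sketch: the normalization by sqrt(H(X)·H(Y)) is one common convention, and the study's exact variant may differ.

```java
import java.util.*;

// Generic sketch of normalized mutual information (NMI) between two
// categorical variables, e.g. allocation site (x) vs. lifespan bucket (y).
// NMI = I(X;Y) / sqrt(H(X) * H(Y)); 1 = fully dependent, 0 = independent.
// (One common normalization; the talk's exact variant is not assumed.)
public class Nmi {
    public static double nmi(int[] x, int[] y) {
        int n = x.length;
        Map<Integer, Integer> cx = new HashMap<>(), cy = new HashMap<>();
        Map<Long, Integer> cxy = new HashMap<>();
        for (int i = 0; i < n; i++) {
            cx.merge(x[i], 1, Integer::sum);
            cy.merge(y[i], 1, Integer::sum);
            // pack the (x, y) pair into one long key for the joint counts
            cxy.merge(((long) x[i] << 32) | (y[i] & 0xffffffffL), 1, Integer::sum);
        }
        double hx = entropy(cx.values(), n), hy = entropy(cy.values(), n);
        double hxy = entropy(cxy.values(), n);
        double mi = hx + hy - hxy;   // I(X;Y) = H(X) + H(Y) - H(X,Y)
        if (hx == 0 || hy == 0) return 0;
        return mi / Math.sqrt(hx * hy);
    }

    // Shannon entropy (bits) from a histogram of counts summing to n.
    private static double entropy(Collection<Integer> counts, int n) {
        double h = 0;
        for (int c : counts) {
            double p = (double) c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }
}
```

A factor with NMI near 1 against lifespan is a strong pretenuring signal; one near 0 carries no lifetime information.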
(earlier version of above talk)
We study the architectural problem of branch prediction. We analyse the popular technique of two-level adaptive prediction, relating it to the state-of-the-art machine learning technique of Bayesian networks (BNs). We show that a two-level predictor is an approximation to the BN formalism. This link allows us to explore the wider family of BN predictors. We investigate how to adapt BN techniques to operate within realistic hardware constraints, using the same primitive components that are present in existing branch predictors. We systematically study how performance is affected by these simplifications. We aim to use these ideas to reduce the storage overhead of BN predictors without losing significant prediction accuracy. The key motivating factor is that storage required in two-level predictors grows exponentially with branch history length, whereas BNs provide a framework to reduce this overhead.
(earlier version of above talk)
I will give a high-level overview of some trends in JVM development practice. My basic premise is that JVMs are becoming increasingly complex software systems. At present, there are few effective mechanisms in place to manage this complexity. I think this problem presents an interesting research challenge!
Take the best techniques from two diverse areas of computer science research - (1) program comprehension and (2) machine learning. Mix together and apply to memory management. Program comprehension allows us to study why garbage collection happens. Machine learning allows us to control how garbage collection happens. Preliminary results have been carefully cooked up in the Jikes RVM system, and taste delicious!
Successful branch predictors have three properties: high accuracy, low latency and low implementation complexity. This talk shows how branch prediction can be solved using the machine learning technique of Bayesian networks. However, the vanilla Bayesian network classifier requires serious hackery to achieve the necessary levels of accuracy/latency/complexity for branch prediction.
Concept assignment identifies units of source code that are functionally related, even if this is not apparent from a syntactic point of view. Until now, the results of concept assignment have only been used for static analysis, mostly of program source code. This paper investigates the possibility of using concept information as part of dynamic analysis of programs. There are two case studies involving (i) a small Java program used in a previous research study; (ii) a large Java virtual machine (the popular Jikes RVM system). These studies demonstrate the usefulness of concept information for dynamic approaches to profiling, debugging and comprehension. This paper also introduces the new idea of feedback-directed concept assignment.
I will review basic concepts of branch prediction, then show how the Fano inequality can be used to bound predictor performance. I hope to describe two studies I have recently carried out. The first study involves optimal branch prediction for the quicksort algorithm. The second study uses trace files from the Championship Branch Prediction competition, a sort of World Cup for branch prediction designers.
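The Fano bound in question can be stated concretely. The statement below is the standard information-theoretic form, using the usual symbols rather than any notation from the talk itself.

```latex
% Fano's inequality: when predicting a branch outcome X from its
% observed history Y, any predictor's error probability P_e satisfies
H(P_e) + P_e \log_2\!\left( |\mathcal{X}| - 1 \right) \;\ge\; H(X \mid Y)
% For binary branches |X| = 2, so the second term vanishes, leaving
%   H(P_e) >= H(X | Y):
% the conditional entropy of the branch stream given its history
% lower-bounds the achievable misprediction rate of any predictor.
```

This is what makes entropy measurements on branch traces useful: they bound what even an ideal predictor could achieve.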
Virtualization is ubiquitous, with the global availability of the Java Virtual Machine and other similar virtual machine platforms. Higher-order virtualization involves building a stack of virtual machine layers. This provides obvious advantages such as: flexibility; separation of concerns; reuse of existing functionality; support for legacy platforms. However, the benefits of higher-order virtualization come at a price in terms of efficiency. This paper examines different techniques to increase the efficiency of higher-order virtualization on chip multiprocessor architectures. These techniques embrace hardware, software and virtual machine interaction. Some techniques (such as just-in-time compilation) are employed in existing virtual machine systems. Other techniques (such as hints-based speculative parallelism) have not been previously considered. We examine how to use these performance-enhancing techniques in the context of stacked virtual machine layers.
(earlier version of above talk)
Probabilistic program slicing is a proposal for a new form of slicing. In this talk we will see that other program analyses have been adapted to handle probabilities. We will provide motivation for a probabilistic form of slicing. Then we give a simple walk-through example of probabilistic slicing. We conclude with a survey of the future work required.
Highly predictable method return values provide tremendous scope for speculative parallelism. But how do we measure predictability? The most fundamental indicator is provided by the information-theoretic notion of entropy. In this talk, I will discuss entropy measurements for method return values from a set of standard Java benchmarks. The results have implications for both hardware architecture and virtual machine design.
Concept assignment is a program comprehension technique that relates high-level "human-oriented" concepts with low-level "implementation-oriented" source code. I will describe ongoing research that applies concept assignment to the analysis of code generators, such as those found in compiler backends. Anomalies in this analysis often indicate bugs in a code generator. I will show examples using my concept assignment tool operating on an experimental ARM code generator for the IBM Jikes RVM.
(earlier version of above talk)
Static single information form (SSI) is a recently proposed program intermediate representation that augments the classical control flow graph with dependence information. SSI encodes the dependence information in its variable naming scheme, in the same manner as static single assignment form (SSA). However, SSI encodes both control and data dependence, whereas SSA encodes only data dependence. In this talk, I will review SSI and show how it can form the basis for two efficient slicing algorithms.