1. Static Verification for Modern Software Systems Presenter: Craig Mclaughlin Summary: The current proposal addresses the task of verifying properties of software systems which may be concurrent, distributed and/or operating on heterogeneous architectures. Several projects have extended techniques for sequential programs to the concurrent setting to handle concurrency within GPGPU programming (principally OpenCL/CUDA). Others have taken ideas from type theory to improve static guarantees about communication systems (such as MPI). Recent work has explored combining separation logic and session types to support more powerful reasoning for distributed programs. The aim of the current proposal is to apply this hybrid logic in the context of concurrent and distributed programming models in an effort to enhance the properties one can verify statically about these systems. ------------------------------------- 2. MapReduce with CUDA to achieve inter- and intra-node parallelism Presenter: Dimitar Petrov Summary: As the scale of high-performance computing grows, new needs arise for increased parallelism, reduced complexity and improved programmability. The work proposed here combines MapReduce with CUDA to achieve both inter- and intra-node parallelism. Map and reduce tasks are combined into chunks and executed on the GPU with the aim of increased performance. Although similar solutions have been proposed, all of these rely on the developer's understanding of GPU programming, either CUDA or OpenCL. We propose using Rootbeer to alleviate the issue and make the system accessible to a wider developer audience. Rootbeer allows programmers to write code in Java and have the (de)serialization, kernel code generation and kernel launch done automatically. Rootbeer also provides additional abstractions so that it can run complex Java objects on a GPU. ------------------------------------- 3. Generating optimal performance portable OpenCL code from existing OpenCL code. Presenter: Gordon Reid Summary: OpenCL was developed as a platform-independent way of writing code for execution on a number of different device types. OpenCL guarantees functional portability, but not performance portability. There is a body of existing work in obtaining performance portability using a number of different methods. Some authors have opted for acceptable performance portability between some CPUs and GPUs, doing minimal changes to OpenCL code with some of the changes being manually written by the developer and selected using ifdefs or different source files. Other work takes a more dramatic approach, either requiring the program to be rewritten in another high-level language which is then compiled down to OpenCL code, or are attempting performance portability via a completely new compiler implementation. My plan is in two parts. The first involves using existing OpenCL code like in some previous work but going further and automatically tuning and changing more aspects of the device and kernel code for many device types, including the Intel Xeon Phi. The second involves a look into analysis and optimisations at the SPIR level to explore finer grained optimisations made possible by this new intermediate format.