

Project SuggestionsThe following projects are suggestions of PhD proposals, and indicate the area of expertise of the given supervisors.
If you are interested in pursuing a PhD on the prescribed project or
something similar, please contact the supervisor stated to discuss.
|
Evaluating the efficacy and impact of cloud-computing measurement mechanisms, Dr Dimitrios PezarosCloud-computing and other distributed systems rely on network measurement architectures and tools in order to provide crucial functionality such as node placement, quality of service and management. But there are various ways in order to measure the performance of these systems. Active measurement systems inject synthetic traffic, while passive measurement systems monitor network traffic and post-process its properties (e.g., packet inter-arrival times, queue lengths, etc.). Additionally, one might need to measure at other layers, for instance application performance.This project will look at evaluating these different measurement approaches, and crucially perform a systematic quantification of the requirements, the overhead and the efficacy of these various measurement techniques in a cloud-computing environment. As a SICSA project, this project will be jointly supervised with Dr Dimitrios Pezaros at the University of Glasgow and Dr Tristan Henderson at the University of St. Andrews. |
Memory Management in the Cloud, Jeremy SingerCloud computing involves networked machines that store and process massive data sets in a scalable and fault-tolerant manner. Individual computing nodes in a cloud-based system manage their own local memory via standard operating system (OS) or virtual machine (VM) interfaces. This project begins with an empirical analysis of per-node memory management activity, to determine whether there are distinctive usage patterns for cloud-based workloads. In particular, we hope to focus on Hadoop (an open-source MapReduce platform) running on top of the Java VM. Potentially interesting avenues for research include (i) tuning Java garbage collection for Hadoop workloads, and (ii) configuring local Java VM parameters to ensure that cloud-computing service-level agreements can be satisfied. |
Information Filtering, Leif Azzopardi ( Leif.Azzopardi@glasgow.ac.uk )The vast amount of existing information and the constant addition of new information from news feeds, blogs, and personal status updates like twitter can lead to the problem of information overload. While there is no grand de nition of the phrase, Bawden and Robinson defi ne information overload as "represent[ing] a state of aff airs where an individual's e fficiency in using information in their work is hampered by the amount of relevant, and potentially useful, information available to them." One solution to alleviating this problem for long-term and persistent information needs is to delegate part of the information seeking process to an information fi ltering system. Several applications of filtering systems include matching news stories with personal interests; nding journal and conference papers for academics; helping organizations nd consumer and press opinions on new products, and keeping people up-to-date with popular culture. In choosing to use an information ltering system, the user is seeking to maximise their exposure to information without overloading their cognitive capacity. Humans are capable of processing a limited amount of information in any given period of time and an information ltering system should seek to minimise the extraneous cognitive load of users while maximising the e ectiveness of the information presented. This places the Information Filtering system in a tricky position: too many irrelevant documents will make the system unusable; too many relevant documents will overload the cognitive capacity of users; and too few relevant documents means the system is not an appropriate replacement for the previous information seeking techniques employed by the user. For example, a product manager at an electronics company needs to be aware of what is being written about product releases. Reading through all the content available on the web for all the di erent products would not be feasible. Instead, the manager could rely upon an information ltering system to extract the relevant information from the streams of information being published on the Web. In choosing to use this system, he/she could defi ne a set of topics of interest and rely on the filtering system to automatically present information as it becomes available. He/she then runs this risk of missing potentially relevant information or being burdened by irrelevant information. What makes this task challenging is the number of potentially conflicting factors that a ffect the system's performance. |
High-Performance "Green" Computing through High-Level FPGA Programming, Wim VanderbauwhedeField-Programmable Gate Arrays (FPGAs) have been heralded as a "greener" alternative to existing processor platforms for a large variety of tasks, as they combine low power consumption with very high throughput. However, FPGAs are much more difficult to program than conventional microprocessors, and this is the biggest barrier to their mainstream adoption. This PhD research would explore approaches to high-level FPGA programming, either focusing on the language and compiler technology (for example leveraging our "1000 cores" framework [1,2]), or on specific application domains such as Information Retrieval or Machine Learning. The student will get access to the world's most largest and most performant FPGA supercomputer, the Novo-G, hosted by the NSF Center for High-Performance Reconfigurable Computing at the University of Florida [3]. This is an excellent opportunity to participate in world-class research in a highly relevant and exciting research field.
[1] http://www.eetimes.com/electronics-news/4211856/1-000-processors-on-a-Xilinx-FPGA |
Programming with large numbers of processor cores, William Paul CockshottRecent developments in chip design have led to processors with large numbers of cores per chip. Intel has produced a chip with 48 Pentium cores, and in this department a chip with 1000 cores has been designed. A hypothesis that is worth investigating is that as the number of cores grows > 1000 we will need programming languages that provide geometrical process structures analogous to two dimensional arrays of data in existing programming languages. The point being that cores adjacent to one another can intercommunicate faster than more remote ones. The proposed research would look into geometrical, type checking and sequencing problems with such geometrical programming languages for massive multi-core processors. For an initial investigation of some of the issues from which the research could take off see http://www.dcs.gla.ac.uk/~wpc/reports/linopaper.pdf Suitable for SICSA Complex Systems strand. Possible co supervisor Dr Hammond at St Andrews |
Using Constraint Satisfaction and Theoretical techniques to enumerate and classify combinatorial structures, Alice Miller and Patrick ProsserA major problem in combinatorics research is that of proving the existence of, and classifying structures with given combinatorial properties. Examples of such structures are linear spaces on a given number of points with given block sizes and graphs with specific numbers of vertices and forbidden subgraphs. We may wish to know how many structures exist with a given property, and to be able to classify them and categorise them with respect to additional properties (for example their block structure or edge structure, or their automorphism groups). If the size of the problem is defined to be the number of points/vertices in the structure, then the complexity increases exponentially with size. Classically, exhaustive computational search has been used to generate combinatorial structures with given properties, possibly with the use of a computational group theory tool like GAP to eliminate isomorphic solutions. However as the size of the structures increase, the search space becomes too large to feasibly check. Mathematical reasoning can be used to reduce the search space (by eliminating impossible scenarios). It can also be used to reduce the problem to a smaller one. For example, it can be shown that linear spaces with a certain property only exist if the complement of the space (a graph) is diamond-free (i.e. has no set of four vertices which share at least 5 edges). If no suitable diamond-free graph exists then the original linear space of interest can not exist. A technique which works particularly well at proving the existence (or non-existance) of structures with given properties is constraint programming. This is a technique for modelling and solving constraint satisfaction problems (such as timetabling problems, scheduling problems, routing problems, problems of design, etc). Problems are modelled as a collection of variables with domains of values, with constraints acting between variables disallowing combinations of values. The problem is then to find an assignment of value to variables, from their domains, that satisfy the constraints. Constraint programming toolkits aim to give us an easy and efficient way to model such problems and to solve these problems via a process of inference and search. The aim of this project is to classify combinatorial structures of interest and use a combination of mathematical reasoning and constraint programming to classify them, for problem size as large as possible. |
Technologies for Hard Problems, Patrick Prosser (Glasgow)Some problems are hard and some are easy. Why is that? For example, it can be really difficult to find a time and place to have a meeting with your friends. Trying to arrange a meeting with a couple of friends is usually quite easy. If we want to get all our friends together at once it will most likely be impossible (so we won’t even bother trying). But when our meeting has not too many people and not too few, the problem will hover between being easy and impossible, and it will be hard. The same thing happens with combinatorial problems, like colouring a map, or finding the largest independent set of a graph, or scheduling production in a factory while respecting customer due dates. That is, we see a phase transition in a decision problem, with hard problems occurring at the edge of solvability. But what technology should we use for solving these problems? A recent study on triangle packing used constraint programming (CP) and a state of the art SAT solver. For the hard instances the SAT solver was orders of magnitude faster than the CP solvers: in some cases the SAT solver was taking seconds and CP was taking hours. A Mixed Integer Programming (MIP) toolkit was also used: MIP was able to solve problems in an hour that took CP a week! We want to know why this happens. We also want to look at problems that have been modelled and solved with CP and see if we can annihilate them with MIP. At the very least we want to get a better understanding of what makes problems hard and when we should use one technology rather than another. |
Using Constraint Satisfaction to solve a model checking problem, Alice Miller and Patrick Prosser (Glasgow), Tom Kelsey and Ian Miguel (St Andrews)Model checking is a technique used to verify properties of concurrent, communicating systems. An abstract model of a system is created and an automatic tool - a model checker - used to search an associated graph (the state-space) to check the behaviour. One of the major problems with model checking however is that state-spaces can become very large. This is known as the state-space explosion problem, and a rich area for research is tackling this problem. When a property is not satisfied by a system, it is sufficient to terminate search when an appropriate error path (i.e. counter-example) has been found. This can significantly reduce the amount of the state-space that has to be generated. However, when a model is very large, even finding an error path can be intractable. Ideally one would like to direct the search so that an error path (if it exists) is found quickly. Constraint programming is a technique for modelling and solving constraint satisfaction problems (such as timetabling problems, scheduling problems, routing problems, problems of design, etc). Problems are modelled as a collection of variables with domains of values, with constraints acting between variables disallowing combinations of values. The problem is then to find an assignment of value to variables, from their domains, that satisfy the constraints. Constraint programming toolkits aim to give us an easy and efficient way to model such problems and to solve these problems via a process of inference and search. The challenges in constraint programming are to make inference powerful and efficient, to heuristically guide search, and to eliminate unwanted symmetries. The aim of this project is to investigate feasibility of reformatting model checking problems as constraint satisfaction problems, thereby allowing the exploitation of advanced constraint programming search techniques to efficiently identify a solution (error path). |
Counter Terrorism Simulations, Prof. Chris JohnsonOver the last five years we have worked with a range of organisations in the UK and USA to develop computer simulations of terrorist attacks using a range of mechanisms including improvised explosive devices (IEDs) and chemical, biological, radiological and nuclear (CBRN) attacks. The software models help counter terrorism agencies to assess the resilience of their plans to a range of potential scenarios. However, they raise a host of computational issues – it can be difficult to determine what should and what should not be modelled in a simulation, it can also be difficult to capture all of the multiple agencies that might be called on in the aftermath of a potential attack. A further set of challenges relate to the software engineering of complex multi agent environments. Existing solutions draw heavily on ideas and techniques that have originally been developed within the games industry. |
Configuration Management and Space Situation Awareness,Prof. Chris JohnsonOver the last decade, I have worked with NASA, the European Space Agency and the US Air Force to identify techniques that can be used to increase the resilience and dependability of space missions. One particular problem is in the configuration of software on satellites and manned missions. Many of these platforms are repeatedly reprogrammed to meet the changing requirements of complex operations. They are also, typically, constrained by the requirements of multiple stake holders including operational and scientific teams from many different countries. In consequence, poor communication and inadequate validation or verification has led to mishaps and the loss of space missions worth many millions of Euros. PhDs in this area would work closely with space agencies from around the globe to help ground teams control the successive reconfiguration of space missions, learning the lessons from previous failures. |
Parallelising model checking using FPGAs (Dr A. Miller, and Dr W .Vanderbauwhede)Model checking is a technique used to verify properties of concurrent, communicating systems. An abstract model of a system is created and an automatic tool - a model checker - is used to exhaustively search an associated graph (the state-space) to check the behaviour. One of the major problems with model checking however is that state-spaces can become very large. This is known as the state-space explosion problem, and a rich area for research is tackling this problem. FPGAs are a type of integrated circuit that can be programmed. There is a growing interest in the use of FPGAs for computational tasks as they are potentially much faster than a microprocessor, with a fraction of the power consumption. This project will investigate the use of FPGAs to alleviate the state-space explosion problem, by allowing parts of the model checking process to be handled in parallel. Different model checkers use different search algorithms and allow for a variety of different types of behaviour to be investigated (e.g. non-deterministic, probabilistic, real time). The aim of this project is to determine which model checkers, and which aspects of their implementation succumb to parallel techniques. |
Algorithms for matching problems with preferences (Dr David Manlove)Description: Matching problems involve a set of applicants (e.g., school-leavers) who seek to be collectively matched to a set of posts (e.g., places on university degree courses). The applicants may have preferences over a subset of the posts, and vice versa. Algorithms to solve matching problems must be efficient (i.e., run in polynomial time vital given the numbers of agents typically involved in applications) and optimal with respect to key properties of the matching, such as its size (in order to minimise the number of applicants who are unmatched) and the overall satisfaction of the agents involved (which will in general depend on the supplied preference lists). This project concerns the derivation of efficient optimal algorithms for matching problems with preferences. This a rewarding area of work, with many large-scale real-world applications, such as the assignment of medical students to hospital posts in many countries, the allocation of kidney donors to transplant patients, and also school, college and university placement. |
Investigations into Algorithmic Information Theory. Dr Paul Cockshott and Dr. Lewis Mackenzie (Glasgow), Dr. Greg Michaelson (Heriot-Watt)The original definition of entropy by Boltzmann is anthropomorphic because it does not base itself on a fundamental law of physics but on human ignorance about the possible microstates of a system. A similar objection can be levelled against Chaitins definition of algorithmic entropy as being conditional on the definition of the UTM used for defining the shortest algorithm to compute a number. The Boltzmann definition has been developed by physics since then because quantum mechanics gives a lower bound to the microstates that a system can occupy [2] these being anthropic in the Boltzmann case. The research problem is to investigate whether the computational definition of entropy can be appropriately renormalized to remove the anthropic decision on what TM implementation is to be used. The research project could involve large parallel computing task to approximate the first n digits of the omega number. The department can call on substantial parallel computing resources through SCOTGRID and via EPCC. Omega is the probability that a TM will halt on a randomly given input tape. For background see [1]
|
Analysing Large-Scale Deployments of Software for Mobile Phones (Dr. Matthew Chalmers, U. Glasgow)Repositories such as Apple's App Store, Google's Android Market and Sauriks Cydia allow researchers as well as commercial developers to deploy applications to large numbers of users around the world. Scale in terms of number of users, geographic distribution and contexts of use is only the beginning of the complexity faced by analysts, evaluators and developers: users may add plugins and connections to other software to the application, so that software structure varies between users and also varies over time. We can collect log data from the application, but are then faced with a challenge of how to analyse, understand and use it. The fundamental research question underlying this research is: how can we find patterns in this log data that are useful for evaluators, developers and indeed for the users themselves? |