DDMACS: Data-driven Modelling and Analysis of Complex Systems

- a SICSA Workshop -

Glasgow, 19 March 2018



Our aim is to bring together researchers from SICSA institutions interested in the combined application of data-driven techniques and computational modelling and analysis methods to real-world problems.


Venue and Date

Venue: Level 5, Sir Alwyn Williams Building
School of Computing Science, University of Glasgow
18 Lilybank Gardens, Glasgow, G12 8RZ
Date: 19 March 2018, 10:00 - 16:00, lunch included


Organisers and Sponsors

Chairs: Oana Andrei (University of Glasgow) and Vashti Galpin (University of Edinburgh)
Sponsors: SICSA and School of Computing Science, University of Glasgow



Our aim is to foster a cross-disciplinary community bringing together researchers from SICSA institutions interested in the combined application of data-driven techniques and computational modelling and analysis methods to real world problems. We will address current challenges and relevant questions, and share experiences on how to create sound data-driven models amenable to rigorous analysis and verification techniques that deliver reliable information about the data-generating process.

Invited speakers:

  • Leif Azzopardi, Department of Computer and Information Sciences, University of Strathclyde
  • Jane Hillston, School of Informatics, University of Edinburgh

Call for contributions:
We invite contributions on approaches to data-driven modelling and analysis of complex systems, in particular on the following topics:

  • use of modelling methods and notations in a knowledge management/discovery context,
  • development and use of common modelling and knowledge management/discovery frameworks to explore and understand complex systems from the application domains of interest.
Speakers are invited to present research results in any of the themes of interest for the workshop as well as application experiences, tools, and promising preliminary ideas.
If you would like to give a talk, please get in touch with Oana Andrei via email by 2 March 2018.

The lunch and the coffee breaks are free for registered participants. In order to help with catering arrangements, please register at our Eventbrite page.


10:00 - 10:30 Arrival, tea and coffee
10:30 - 11:30 Invited talk: Leif Azzopardi (University of Strathclyde) - Modelling How People Search
11:30 - 11:55 Tom Dalton (University of St Andrews) - Evaluating Data Linkage: Creating longitudinal synthetic population data to provide 'gold-standard' linked data sets for comprehensive linkage evaluation
11:55 - 12:20 Anthony Chapman (University of Aberdeen) - CLEMI - An Imputation Evaluation Framework
12:20 - 13:00 Lunch
13:00 - 14:00 Invited talk: Jane Hillston (University of Edinburgh) - Moment-based Availability Prediction for Bike-Sharing Systems
14:00 - 14:25 Tim Storer (University of Glasgow) - Modelling Realistic User Behaviour in Information Systems Simulation Models as Fuzzing Aspects
14:25 - 14:50 Chika Eucharia Ugwuanyi (University of Strathclyde) - Indoor CO2 prediction for healthy offices: Machine learning approach
14:50 - 15:15 Juan Afanador (University of Aberdeen) - Delegating via Quitting Games in Ad-hoc Multi-Agent Systems
15:15 - 15:45 Discussions (challenges and application areas) and wrapping up


Invited talks:

  • Leif Azzopardi: "Modelling How People Search"
    In this talk, I will describe my efforts in trying to understand how people interact with search systems. Over the years, I’ve used various data modelling techniques to describe interactions. While it has been possible to characterise how people are searching, explore how their behaviour impacts upon performance, and draw various insights, a key challenge remained: how can we explain the observed behaviours? In perhaps a move away from data-driven solutions, I’ve recently been developing economic models of search behaviour that not only describe but also explain people’s search behaviours, and that enable us to generate testable hypotheses about how their behaviour will change when factors in the environment (system and user) change. I argue that economic models can be developed for all sorts of human-computer interactions, and so are likely to provide many more insights into how people use systems and how we can design systems to encourage or dissuade particular behaviours and outcomes.
    Related Work: Book Chapter - A Tutorial on Economic Models of Interaction (2018), An Analysis of the Cost and Benefit of Search Interactions (2016), Modelling interaction with economic models of search (2014)
  • Jane Hillston: "Moment analysis, model reduction and the London Bike Sharing Scheme"
    User satisfaction with bike-sharing systems relies on users being able to find a bike, or conversely a parking slot, when they need one. I will present an approach to constructing fast but accurate availability predictions through stochastic modelling. Based on historical measurements, we construct a population Continuous Time Markov Chain (PCTMC) model with time-dependent rates to capture the flow of bikes between stations throughout the day. Given a target station for prediction, the moments of the number of available bikes in the station at a future time can be derived from a set of moment equations, with the initial condition given by a snapshot of the current state of all stations in the system. Typically this would be too time-consuming for the whole system given its size, so we construct a directed contribution graph, which allows us to prune the PCTMC so that it only contains stations which make a significant contribution to the journey flows to the target station. Once the moments have been derived, the underlying probability distribution of the available number of bikes is reconstructed through the maximum entropy approach. I will illustrate our approach on Santander Cycles, the bike-sharing system in London.
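
As a rough illustration of the moment-equation idea in the abstract above, the sketch below integrates only the first moment (the mean number of bikes per station) for a toy model. It is not the speaker's model: it assumes three stations, bikes that move independently of each other, no station capacities, no contribution-graph pruning and no maximum-entropy reconstruction. The rate values and the names `rate_matrix`, `mean_derivative` and `predict_means` are hypothetical and chosen purely for illustration.

```python
import math


def rate_matrix(t):
    """Hypothetical per-bike move rates q[i][j] (per hour) at time t (hours).

    The numbers are made up: flows towards station 2 peak around t = 8.
    """
    peak = math.exp(-((t - 8.0) ** 2) / 2.0)
    return [
        [0.0, 0.2, 0.1 + 0.5 * peak],
        [0.1, 0.0, 0.1 + 0.4 * peak],
        [0.1, 0.1, 0.0],
    ]


def mean_derivative(t, m):
    """Right-hand side of the mean equations for the toy model.

    Because each bike is assumed to move independently, the expected flow
    from station i to station j is q[i][j](t) * m[i].
    """
    q = rate_matrix(t)
    n = len(m)
    dm = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                flow = q[i][j] * m[i]
                dm[i] -= flow  # bikes leaving station i
                dm[j] += flow  # bikes arriving at station j
    return dm


def predict_means(m0, t0, t1, steps=1000):
    """Integrate the mean equations from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    m, t = list(m0), t0

    def shifted(base, k, scale):
        # Return base + scale * k, element by element.
        return [x + scale * y for x, y in zip(base, k)]

    for _ in range(steps):
        k1 = mean_derivative(t, m)
        k2 = mean_derivative(t + h / 2, shifted(m, k1, h / 2))
        k3 = mean_derivative(t + h / 2, shifted(m, k2, h / 2))
        k4 = mean_derivative(t + h, shifted(m, k3, h))
        m = [
            x + h / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(m, k1, k2, k3, k4)
        ]
        t += h
    return m


if __name__ == "__main__":
    # Snapshot of the current state at 07:00, prediction for 09:00.
    snapshot = [10.0, 5.0, 0.0]
    print(predict_means(snapshot, 7.0, 9.0))
```

In this toy setting the predicted means always sum to the total number of bikes in the snapshot, which is a simple sanity check on the equations. Higher moments would need further equations of the same kind.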

Contributed talks:

  • Juan Afanador: "Delegating via Quitting Games in Ad-hoc Multi-Agent Systems"
    Delegation allows an agent to request that another agent complete a task. In many situations the task may be delegated onwards, and this process can repeat until the task is eventually performed, successfully or unsuccessfully. We consider policies to guide an agent in choosing who to delegate to when such recursive interactions are possible. These policies, based on quitting games and multi-armed bandits, were empirically tested for effectiveness. Our results indicate that the policies based on quitting games outperform those which do not explicitly account for the recursive nature of delegation.
  • Anthony Chapman: "CLEMI - An Imputation Evaluation Framework"
    Missing data is challenging enough without the added complexities posed by a lack of research in evaluating imputation. Not only could we potentially increase the impact and validity of studies from many different sectors (research, public and private), we also believe that by creating evaluation software, more researchers may be willing to use and justify using imputation methods. Our work aims to encourage further research for efficient imputation evaluation by defining a framework which could be used to optimise the way we impute datasets prior to data analysis. We propose a framework which uses a prototypical approach to create testing data and machine learning methods to create a new metric for evaluation. Preliminary results are presented which show how, for our dataset, records with less than 40% missingness could be used for analysis, increasing the amount of available data.
  • Tom Dalton: "Evaluating Data Linkage: Creating longitudinal synthetic population data to provide 'gold-standard' linked data sets for comprehensive linkage evaluation"
    ‘Gold-standard’ data to evaluate linkage algorithms are rare. Synthetic data have the advantage that all the true links are known. In the domain of population reconstruction, the ability to synthesise populations on demand, with varying characteristics, allows a linkage approach to be evaluated across a wide range of data sets. This talk presents a micro-simulation model for generating such synthetic populations, taking as input a set of desired statistical properties. It then outlines how these desired properties are verified in the generated populations, and the intended approach to using generated populations to evaluate linkage algorithms. We envisage a sequence of experiments where a set of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered.
  • Tim Storer: "Modelling Realistic User Behaviour in Information Systems Simulation Models as Fuzzing Aspects"
    We contend that the engineering of information systems is hampered by a paucity of tools to tractably model, simulate and predict the impact of realistic user behaviours on the emergent properties of the wider socio-technical system, evidenced by the plethora of case studies of system failure in the literature. We address this gap by presenting a novel approach that models ideal user behaviour as workflows, and introduces irregularities in that behaviour as aspects which fuzz the model. We demonstrate the success of this approach through a case study of software development workflows, showing that the introduction of realistic user behaviour to idealised workflows better simulates outcomes reported in the empirical software engineering literature.
  • Chika Eucharia Ugwuanyi: "Indoor CO2 prediction for healthy offices: Machine learning approach"
    Making use of historical sensor data found in smart environments can offer occupants opportunities to manage their indoor environments for their own safety and well-being. This paper presents machine learning models of how indoor environmental sensor data can be used to predict indoor CO2 levels for occupants’ well-being and comfort. The approach presented here stems from the ensemble modelling paradigm, which holds that applying more than one model is better than applying a single model. We therefore examine how a commercially available sensor can be used to monitor, track and record indoor environmental variables, and how the results obtained from the analyses were improved with two main ensemble models compared with a single linear regression model. As the technique has been applied to a real-world example, we suggest that future empirical work can use it as a template for other sensor data involving either indoor or outdoor environmental variables.


We are grateful to the SICSA themes Theory, Modelling and Computation and Data Science for their generous support, as well as the School of Computing Science, University of Glasgow, for the venue.