DDMACS: Data-driven Modelling and Analysis of Complex Systems

- a SICSA Workshop -

Glasgow, 19 March 2018



About DDMACS

Our aim is to bring together researchers from SICSA institutions interested in the combined application of data-driven techniques and computational modelling and analysis methods to real-world problems.


Venue and Date

Venue: Level 5, Sir Alwyn Williams Building
School of Computing Science, University of Glasgow
18 Lilybank Gardens, Glasgow, G12 8RZ
Date: 19 March 2018, 10:00 -- 16:00 (lunch included)


Organisers and Sponsors

Chairs: Oana Andrei (University of Glasgow) and Vashti Galpin (University of Edinburgh)
Sponsors: SICSA and School of Computing Science, University of Glasgow


About DDMACS

Our aim is to foster a cross-disciplinary community bringing together researchers from SICSA institutions interested in the combined application of data-driven techniques and computational modelling and analysis methods to real-world problems. We will address current challenges and relevant questions, and share experiences on how to create sound data-driven models amenable to rigorous analysis and verification techniques that deliver reliable information about the data-generating process.

Invited speakers:

  • Leif Azzopardi, Department of Computer and Information Sciences, University of Strathclyde
  • Jane Hillston, School of Informatics, University of Edinburgh

Call for contributions:
We invite contributions on approaches to data-driven modelling and analysis of complex systems, including:

  • use of modelling methods and notations in a knowledge management/discovery context,
  • development and use of common modelling and knowledge management/discovery frameworks to explore and understand complex systems from the application domains of interest.
Speakers are invited to present research results on any of the workshop's themes of interest, as well as application experiences, tools, and promising preliminary ideas.
If you would like to give a talk, please get in touch with Oana Andrei via email by 2 March 2018.

Lunch and coffee breaks are free for registered participants. To help with catering arrangements, please register on our Eventbrite page.

Programme

10:00 - 10:30 Arrival, tea and coffee
10:30 - 11:30 Invited talk: Leif Azzopardi (University of Strathclyde) - Modelling How People Search
11:30 - 11:55 Tom Dalton (University of St Andrews) - Evaluating Data Linkage: Creating longitudinal synthetic population data to provide 'gold-standard' linked data sets for comprehensive linkage evaluation
11:55 - 12:20 Anthony Chapman (University of Aberdeen) - CLEMI - An Imputation Evaluation Framework
12:20 - 13:00 Lunch
13:00 - 14:00 Invited talk: Jane Hillston (University of Edinburgh) - Moment-based Availability Prediction for Bike-Sharing Systems
14:00 - 14:25 Tim Storer (University of Glasgow) - Modelling Realistic User Behaviour in Information Systems Simulation Models as Fuzzing Aspects
14:25 - 14:50 Chika Eucharia Ugwuanyi (University of Strathclyde) - Indoor CO2 prediction for healthy offices: Machine learning approach
14:50 - 15:15 Juan Afanador (University of Aberdeen) - Delegating via Quitting Games in Ad-hoc Multi-Agent Systems
15:15 - 15:45 Discussion (challenges and application areas) and wrap-up

Abstracts

Invited talks:

  • Leif Azzopardi: "Modelling How People Search"
    In this talk, I will describe my efforts in trying to understand how people interact with search systems. Over the years, I've used various data modelling techniques to describe interactions, and while it has been possible to characterise how people are searching, explore how their behaviour impacts upon performance, and draw various insights, a key challenge remained: how can we explain the observed behaviours? In perhaps a move away from data-driven solutions, I have recently been developing economic models of search behaviour that not only describe but also explain people's search behaviours, and that enable us to generate testable hypotheses about how behaviour will change when factors in the environment (system and user) change. I argue that economic models can be developed for all sorts of human-computer interactions, and so are likely to provide many more insights into how people use systems and how we can design systems to encourage or dissuade particular behaviours and outcomes. (A toy numerical instance of such a model is sketched after this list.)
    Related Work: Book Chapter - A Tutorial on Economic Models of Interaction (2018), An Analysis of the Cost and Benefit of Search Interactions (2016), Modelling interaction with economic models of search (2014)
  • Jane Hillston: "Moment analysis, model reduction and the London Bike Sharing Scheme"
    User satisfaction with bike-sharing systems relies on users being able to find a bike, or conversely a parking slot, when they need one. I will present an approach to constructing fast but accurate availability predictions through stochastic modelling. Based on historical measurements, we construct a population Continuous Time Markov Chain (PCTMC) model with time-dependent rates to capture the flow of bikes between stations throughout the day. Given a target station for prediction, the moments of the number of available bikes in the station at a future time can be derived from a set of moment equations whose initial conditions are given by a snapshot of the current state of all stations in the system. Doing this for the whole system would typically be too time-consuming given its size, so we construct a directed contribution graph, which allows us to prune the PCTMC so that it contains only stations that contribute significantly to the journey flows into the target station. Once the moments have been derived, the underlying probability distribution of the number of available bikes is reconstructed through the maximum entropy approach. I will illustrate our approach on Santander Cycles, the bike-sharing system in London. (A minimal sketch of the moment-equation and maximum-entropy steps appears after this list.)
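
For Leif Azzopardi's talk, the following is a toy numerical instance of an economic model of search: a searcher chooses how many queries to issue and how many documents to assess per query so as to maximise gain minus cost. The functional forms and parameters below are illustrative assumptions, not taken from the talk; they are meant only to show how such a model yields testable predictions.

    # Toy economic model of search. A searcher issues q queries and assesses
    # a documents per query; gain has diminishing returns (Cobb-Douglas form)
    # and costs are linear. All shapes and constants are illustrative.
    import numpy as np

    def gain(q, a, k=10.0, alpha=0.6, beta=0.4):
        return k * q**alpha * a**beta

    def cost(q, a, c_query, c_assess=0.5):
        return q * c_query + q * a * c_assess

    def optimal_strategy(c_query, grid_max=50):
        # brute-force search over an integer grid of strategies
        qs = np.arange(1, grid_max + 1)
        Q, A = np.meshgrid(qs, qs, indexing="ij")
        net = gain(Q, A) - cost(Q, A, c_query)
        i, j = np.unravel_index(net.argmax(), net.shape)
        return qs[i], qs[j]

    for c_query in (0.5, 2.0, 8.0):
        q, a = optimal_strategy(c_query)
        print(f"query cost {c_query}: issue {q} queries, assess {a} docs each")

As the cost of querying rises, the optimum shifts from many cheap queries with shallow assessment to fewer queries examined in more depth, the kind of testable comparative prediction the talk argues for.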
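
For Jane Hillston's talk, here is a minimal sketch of the moment-equation and maximum-entropy steps for a single station, modelled as a birth-death process with time-varying arrival rate lam(t) and per-bike departure rate mu(t). The rates, capacity, and snapshot below are illustrative assumptions; the full PCTMC and the contribution-graph pruning from the talk are not reproduced here.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import fsolve

    CAP = 30  # hypothetical station capacity

    def lam(t):   # bikes arriving per hour, with a morning peak (assumed)
        return 6.0 + 4.0 * np.exp(-0.5 * (t - 9.0) ** 2)

    def mu(t):    # per-bike departure rate (assumed)
        return 0.4 + 0.2 * np.exp(-0.5 * (t - 17.0) ** 2)

    def moment_odes(t, y):
        # exact closed moment equations for a birth-death process with
        # birth rate lam(t) and death rate mu(t)*n; m = E[N], s = E[N^2]
        m, s = y
        dm = lam(t) - mu(t) * m
        ds = 2.0 * lam(t) * m + lam(t) + mu(t) * m - 2.0 * mu(t) * s
        return [dm, ds]

    # snapshot at 08:00: exactly 10 bikes, so E[N] = 10 and E[N^2] = 100
    sol = solve_ivp(moment_odes, (8.0, 10.0), [10.0, 100.0])
    m, s = sol.y[0, -1], sol.y[1, -1]
    print(f"10:00 prediction: mean {m:.2f}, variance {s - m**2:.2f}")

    # maximum-entropy reconstruction on {0,...,CAP} from the two moments:
    # p_n proportional to exp(t1*n + t2*n^2)
    n = np.arange(CAP + 1)

    def dist(theta):
        logw = theta[0] * n + theta[1] * n ** 2
        w = np.exp(logw - logw.max())  # stabilised exponentials
        return w / w.sum()

    def residuals(theta):
        p = dist(theta)
        return [p @ n - m, p @ (n ** 2) - s]

    p = dist(fsolve(residuals, [0.0, 0.0]))
    print(f"P(no bikes available at 10:00) = {p[0]:.4f}")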

Contributed talks:

  • Juan Afanador: "Delegating via Quitting Games in Ad-hoc Multi-Agent Systems"
    Delegation allows an agent to request that another agent complete a task. In many situations the task may be delegated onwards, and this process can repeat until the task is eventually performed, successfully or unsuccessfully. We consider policies to guide an agent in choosing whom to delegate to when such recursive interactions are possible. These policies, based on quitting games and multi-armed bandits, were empirically tested for effectiveness. Our results indicate that the quitting-game-based policies outperform those which do not explicitly account for the recursive nature of delegation. (A toy recursive-delegation experiment is sketched after these abstracts.)
  • Anthony Chapman: "CLEMI - An Imputation Evaluation Framework"
    Missing data is challenging enough without the added complexity posed by a lack of research into evaluating imputation. Better evaluation could increase the impact and validity of studies across many sectors (research, public, and private), and we believe that evaluation software would make more researchers willing to use, and to justify using, imputation methods. Our work aims to encourage further research into efficient imputation evaluation by defining a framework that can be used to optimise the way we impute datasets prior to data analysis. The proposed framework uses a prototypical approach to create testing data and machine learning methods to create a new metric for evaluation. Preliminary results show that, for our dataset, records with less than 40% missingness could be used for analysis, increasing the amount of available data. (A minimal mask-and-score illustration appears after these abstracts.)
  • Tom Dalton: "Evaluating Data Linkage: Creating longitudinal synthetic population data to provide 'gold-standard' linked data sets for comprehensive linkage evaluation"
    ‘Gold-standard’ data to evaluate linkage algorithms are rare. Synthetic data have the advantage that all the true links are known. In the domain of population reconstruction, the ability to synthesise populations on demand, with varying characteristics, allows a linkage approach to be evaluated across a wide range of data sets. This talk presents a micro-simulation model for generating such synthetic populations, taking as input a set of desired statistical properties. It then outlines how these desired properties are verified in the generated populations, and the intended approach to using generated populations to evaluate linkage algorithms. We envisage a sequence of experiments in which sets of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered. (A toy example of scoring a linker against known true links appears after these abstracts.)
  • Tim Storer: "Modelling Realistic User Behaviour in Information Systems Simulation Models as Fuzzing Aspects"
    We contend that the engineering of information systems is hampered by a paucity of tools to tractably model, simulate, and predict the impact of realistic user behaviours on the emergent properties of the wider socio-technical system, as evidenced by the plethora of case studies of system failure in the literature. We address this gap by presenting a novel approach that models ideal user behaviour as workflows, and introduces irregularities in that behaviour as aspects which fuzz the model. We demonstrate the success of this approach through a case study of software development workflows, showing that introducing realistic user behaviour into idealised workflows better simulates outcomes reported in the empirical software engineering literature. (A minimal fuzzing-aspect sketch appears after these abstracts.)
  • Chika Eucharia Ugwuanyi: "Indoor CO2 prediction for healthy offices: Machine learning approach"
    Making use of historical sensor data found in smart environments can offer occupants opportunities to manage their indoor environments for their own safety and well-being. This paper presents machine learning models of how indoor environmental sensor data can be used to predict indoor CO2 levels for occupants' well-being and comfort. The approach draws on the ensemble modelling paradigm, in which combining more than one model is expected to outperform a single model. We show how a commercially available sensor can be used to monitor, track, and record indoor environmental variables, and how the results of the analyses were improved by two main ensemble models compared against a single linear regression model. As a technique applied to a real-world example, we suggest that future empirical work can use this study as a template for other sensor data involving either indoor or outdoor environmental variables. (A minimal ensemble-versus-linear comparison is sketched after these abstracts.)
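
For Juan Afanador's talk, the sketch below is a bandit-style baseline only: agents delegate a task recursively, choosing a neighbour by an epsilon-greedy rule over observed success rates. The quitting-game policies from the talk are not reproduced, and the agents, skills, and parameters are illustrative assumptions.

    import random

    class Agent:
        def __init__(self, name, skill):
            self.name, self.skill = name, skill  # skill = P(success) if attempted
            self.neighbours = []
            self.stats = {}  # neighbour name -> [successes, trials]

        def estimate(self, other):
            s, t = self.stats.get(other.name, [1, 2])  # mild optimistic prior
            return s / t

        def pick(self, eps=0.1):
            # epsilon-greedy choice over estimated neighbour success rates
            if random.random() < eps:
                return random.choice(self.neighbours)
            return max(self.neighbours, key=self.estimate)

        def handle(self, depth=0, max_depth=3):
            # attempt locally at the recursion limit, or when no neighbour
            # looks better than our own skill; otherwise delegate onwards
            if depth == max_depth or not self.neighbours:
                return random.random() < self.skill
            target = self.pick()
            if self.estimate(target) <= self.skill:
                return random.random() < self.skill
            ok = target.handle(depth + 1, max_depth)
            rec = self.stats.setdefault(target.name, [0, 0])
            rec[0] += ok
            rec[1] += 1
            return ok

    random.seed(1)
    agents = [Agent(f"a{i}", random.uniform(0.2, 0.9)) for i in range(6)]
    for ag in agents:
        ag.neighbours = [b for b in agents if b is not ag]
    wins = sum(agents[0].handle() for _ in range(2000))
    print(f"success rate with epsilon-greedy delegation: {wins / 2000:.2f}")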
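
For Anthony Chapman's talk, the following is a minimal mask-and-score illustration of imputation evaluation: hide a fraction of known values, impute them, and measure reconstruction error on the hidden cells. The synthetic data, the mean imputer, and the RMSE metric are illustrative assumptions, not CLEMI itself.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(50.0, 10.0, size=(500, 4))  # fully observed "ground truth"

    def evaluate(imputer, mask_rate=0.4):
        mask = rng.random(X.shape) < mask_rate  # artificially hide values
        X_missing = X.copy()
        X_missing[mask] = np.nan
        X_imputed = imputer(X_missing)
        # RMSE only on the cells that were hidden
        return np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2))

    def mean_impute(Xm):
        col_means = np.nanmean(Xm, axis=0)
        out = Xm.copy()
        idx = np.where(np.isnan(out))
        out[idx] = np.take(col_means, idx[1])
        return out

    print(f"mean imputation RMSE at 40% missingness: {evaluate(mean_impute):.2f}")

Plugging alternative imputers and missingness rates into this loop turns it into the kind of systematic evaluation the framework proposes.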
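
For Tom Dalton's talk, the toy example below shows why synthetic populations give gold-standard evaluations: every generated record carries its true person identifier, so after records are corrupted, any linkage algorithm can be scored exactly against the known links. The name pool, corruption model, and naive linker are illustrative assumptions.

    import random
    import string

    random.seed(0)
    NAMES = ["smith", "brown", "wilson", "campbell", "stewart", "macdonald"]

    def make_population(n):
        # each record keeps its true person id (pid): the gold standard
        return [{"pid": i, "surname": random.choice(NAMES),
                 "year": random.randint(1900, 1950)} for i in range(n)]

    def corrupt(rec, p=0.2):
        r = dict(rec)
        if random.random() < p:  # introduce a typo in the surname
            s = list(r["surname"])
            s[random.randrange(len(s))] = random.choice(string.ascii_lowercase)
            r["surname"] = "".join(s)
        return r

    def naive_link(a, b):
        # toy linker: exact match on surname and birth year
        index = {(r["surname"], r["year"]): r["pid"] for r in b}
        return {r["pid"]: index.get((r["surname"], r["year"])) for r in a}

    census_a = make_population(300)
    census_b = [corrupt(r) for r in census_a]  # same people, noisy records
    links = naive_link(census_a, census_b)
    correct = sum(pred == pid for pid, pred in links.items())
    print(f"naive linker recall against known truth: {correct / len(links):.2f}")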
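
For Tim Storer's talk, here is a minimal sketch of the fuzzing-aspect idea: an idealised workflow is written as plain functions, and irregular user behaviour is woven in by a wrapper that randomly skips or repeats steps without touching the workflow code. The workflow steps and fuzz operators are illustrative assumptions, not the case study from the talk.

    import random
    from functools import wraps

    def fuzz(p_skip=0.1, p_repeat=0.1):
        """Aspect that perturbs a workflow step without modifying its code."""
        def aspect(step):
            @wraps(step)
            def fuzzed(*args, **kwargs):
                if random.random() < p_skip:
                    print(f"  [fuzz] skipped {step.__name__}")
                    return None
                result = step(*args, **kwargs)
                if random.random() < p_repeat:
                    print(f"  [fuzz] repeated {step.__name__}")
                    result = step(*args, **kwargs)
                return result
            return fuzzed
        return aspect

    @fuzz(p_skip=0.2)
    def review_code(change):
        print(f"  reviewing {change}")
        return True

    @fuzz(p_repeat=0.3)
    def run_tests(change):
        print(f"  testing {change}")
        return True

    random.seed(4)
    for change in ("patch-1", "patch-2", "patch-3"):
        print(f"workflow for {change}:")
        review_code(change)
        run_tests(change)

Because the irregular behaviour lives entirely in the decorator, the idealised workflow stays untouched, which is the aspect-oriented point of the approach.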
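
For Chika Eucharia Ugwuanyi's talk, the sketch below compares two ensemble regressors against a single linear regression on a synthetic stand-in for indoor sensor data. The feature set and data-generating process are illustrative assumptions, not the office data from the talk.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 2000
    occupancy = rng.integers(0, 12, n)  # people in the room (assumed)
    temp = rng.normal(21.0, 1.5, n)     # degrees C (assumed)
    hour = rng.integers(0, 24, n)
    # synthetic CO2 with a nonlinear, partly periodic dependence
    co2 = (420 + 35 * occupancy + 5 * np.maximum(temp - 20, 0)
           + 30 * np.sin(hour / 24 * 2 * np.pi) + rng.normal(0, 20, n))

    X = np.column_stack([occupancy, temp, hour])
    X_tr, X_te, y_tr, y_te = train_test_split(X, co2, random_state=0)

    for model in (LinearRegression(),
                  RandomForestRegressor(random_state=0),
                  GradientBoostingRegressor(random_state=0)):
        model.fit(X_tr, y_tr)
        err = mean_absolute_error(y_te, model.predict(X_te))
        print(f"{type(model).__name__:>25}: MAE {err:.1f} ppm")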

Sponsors

We are grateful to the SICSA themes Theory, Modelling and Computation and Data Science for their generous support, as well as the School of Computing Science, University of Glasgow, for the venue.