What's on in Computing Science?
Date: Friday, 10 September, 2010
Sir Alwyn Williams Building, 423 Seminar Room
[ Inference Seminar ] Robust Methodologies for Partition Clustering
Paulo Lisboa (Liverpool John Moores University)
Clustering is fundamental to exploratory data analysis. While there are well-developed statistical clustering algorithms, these are typically model-dependent. Many practical applications still require solutions expressed as single partitions using standard computational algorithms, such as k-means, applied to data with non-standard distributions. Intriguingly, there does not appear to be any agreed methodology for identifying a 'best' solution, given the high prevalence of local minima that depend on the initial conditions. This talk presents a methodology to systematically map the landscape of cluster partitions, enabling the selection of suitable reference solutions based on stability as well as separability criteria. The measure of cluster separation also relates to the visualisation of high-dimensional data, in a relatively new result using a decades-old approach that could be more widely used.
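As a rough illustration of the general idea only, and not the speaker's actual method, the sketch below runs k-means from many random initialisations and records two numbers per run. The abstract does not say which criteria are used, so both are assumptions made here: the within-cluster sum of squares stands in for separation, and the mean Rand index against all other runs stands in for stability. The function names (kmeans, rand_index, map_landscape) are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd's k-means from one random initialisation."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial centres drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Assumed separation proxy: within-cluster sum of squared distances.
    sse = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, sse


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def map_landscape(points, k, n_runs=20):
    """Run k-means n_runs times; report SSE and mean agreement per run."""
    runs = [kmeans(points, k, seed) for seed in range(n_runs)]
    results = []
    for i, (labels, sse) in enumerate(runs):
        others = [
            rand_index(labels, other)
            for j, (other, _) in enumerate(runs)
            if j != i
        ]
        # Assumed stability proxy: mean Rand index against all other runs.
        results.append(
            {"seed": i, "sse": sse, "stability": sum(others) / len(others)}
        )
    return results


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Three synthetic 2-D blobs, purely illustrative data.
    data = [
        (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)]
        for _ in range(20)
    ]
    for row in sorted(map_landscape(data, k=3), key=lambda r: r["sse"]):
        print(row)
```

Under these assumptions, a candidate reference solution would be a run that combines low SSE with high stability; runs with noticeably higher SSE or lower agreement would correspond to other local minima reached from different starting points.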
The methodology is generic and is validated with a difficult synthetic data set, then applied to bioinformatics data, including the sub-typing of breast cancer from non-normal protein expression values measured from resected biopsies (n=1076). As an epilogue, a straightforward application is described which resulted in the maps recently featured in news items on local alcohol profiles: http://www.bbc.co.uk/news/health-11138535
Contact: Dr Rónán Daly (email@example.com)