Probabilistic assignments of formulas to mass peaks in metabolomic experiments
Simon Rogers (a), Richard A. Scheltema (b), Mark Girolami (a) and Rainer Breitling (b)
(a) Department of Computing Science, University of Glasgow, Glasgow UK
(b) Groningen Bioinformatics Centre, University of Groningen, The Netherlands
Motivation: High-accuracy mass spectrometry is a popular technology for high-throughput measurements of cellular metabolites (metabolomics). One of the major challenges is the correct identification of the observed mass peaks, including the assignment of their empirical formula, based on the measured mass.
Results: We propose a novel probabilistic method for the assignment of empirical formulas to mass peaks in high-throughput metabolomics mass spectrometry measurements. The method incorporates information about possible biochemical transformations between the compounds to assign higher weights to formulas that could be created from other metabolites in the sample. In a series of experiments we show that the method performs well and provides greater insight than assignments based on mass alone. In addition, we extend the model to incorporate isotope information to achieve even more reliable formula identification.
Contact: srogers@dcs.gla.ac.uk

Supplementary document
Available for download as a .pdf

Code
Matlab implementation and example scripts will be available soon.

Data
The measured masses from the Trypanosoma dataset reported by Breitling et al. (2006) were matched to KEGG metabolites, using a mass window of ±10 ppm. Matching entries were retrieved with the SOAP interface provided at the KEGG website. All unique molecular formulas were selected and stored together with their masses. The automated matching was performed by the MetabolomeExplorer software (Scheltema et al., in prep.), using a Java implementation based on the KEGG library (keggapi.jar). Two of its functions were used: search_compounds_by_mass, which retrieves all compound KEGG ids within the provided mass range, and bget, which retrieves all information in the KEGG database for a given id. Dedicated software was written to interpret the results from the bget function. The full annotation file is available for download.
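To illustrate the ±10 ppm matching step described above, here is a minimal Python sketch. It is not the MetabolomeExplorer code and does not call KEGG. The function names (ppm_window, match_formulas) and the small candidate list are hypothetical, and the candidate masses are approximate monoisotopic values used only for illustration. The sketch assumes that the window is taken relative to the measured mass, which the description above does not state explicitly.

```python
def ppm_window(mass, ppm=10.0):
    """Return the (low, high) mass range lying within +/- ppm of `mass`."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def match_formulas(measured_mass, candidates, ppm=10.0):
    """Return {formula: mass} for all candidates inside the ppm window.

    `candidates` is an iterable of (formula, mass) pairs. Using a dict
    keeps each molecular formula only once, even if several compounds
    (isomers) share it.
    """
    low, high = ppm_window(measured_mass, ppm)
    matches = {}
    for formula, mass in candidates:
        if low <= mass <= high:
            matches[formula] = mass
    return matches


# Hypothetical stand-in for the entries returned by a database lookup.
# Masses are approximate and for illustration only.
CANDIDATES = [
    ("C6H12O6", 180.0634),  # e.g. glucose
    ("C6H12O6", 180.0634),  # an isomer with the same formula
    ("C3H7NO2", 89.0477),   # e.g. alanine
    ("C3H4O3", 88.0160),    # e.g. pyruvate
]

if __name__ == "__main__":
    print(match_formulas(180.0640, CANDIDATES))
```

In the real pipeline the range computed by something like ppm_window would be passed to the database lookup, and the formula would be read from each returned entry. The dictionary here only mirrors the "all unique molecular formulas were selected" step.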
The full list of chemical transformations can be downloaded here.