Computing at Glasgow University
Paper ID: 8865
DCS Tech Report Number: TR-2008-271

An Evaluation of Gaussian Processes for Sentence Classification and Protein Interaction Detection
Polajnar, T.; Rogers, S.; Girolami, M.

Publication Type: Tech Report (internal)
Appeared in: DCS Technical Report Series
Publisher: Dept of Computing Science, University of Glasgow
Year: 2008

Classification methods are vital for efficient access to the knowledge hidden in biomedical publications. Non-parametric classifiers, such as support vector machines (SVMs), are popular because their performance is not limited by high-dimensional feature spaces, which reduces the need for feature engineering. SVMs have been widely used for text classification over the past decade. The Gaussian process (GP) classifier is an analogous probabilistic method that is rarely applied to text but offers the same non-parametric advantages as the SVM. In this paper we provide a much-needed comparison of the performance and properties of these classifiers for detecting sentences that describe protein interactions. We find no statistically significant difference between the performance of GPs and SVMs on these classification tasks. The natural ability to extend the basic GP model, together with the absence of costly margin parameter tuning, makes the GP an invaluable tool for text classification.

Keywords: text mining; Gaussian process; support vector machine; protein interaction; sentence classification
