Computing at Glasgow University
Paper ID: 9278

Text segmentation via topic modeling: An analytical study
Misra, H.; Yvon, F.; Jose, J.M.; Cappe, O.

Publication Type: Conference Proceedings
Appeared in: The 18th ACM Conference on Information and Knowledge Management
Page Numbers: 1553-1556
Publisher: N/A
Year: 2009

In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of the latent Dirichlet allocation (LDA) topic model to segment a text into semantically coherent segments. A major benefit of the proposed approach is that along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications like segment retrieval and discourse analysis. The new approach outperforms a standard baseline method and yields significantly better performance than most of the available unsupervised methods on a benchmark dataset.

Keywords: text segmentation, unsupervised topic modeling, latent Dirichlet allocation, dynamic programming
