UNIVERSITY of GLASGOW

Computing at Glasgow University
 
Paper ID: 5871

A Database Index to Large Biological Sequences
Hunt, E.; Atkinson, M.P.; Irving, R.W.

Publication Type: Conference Proceedings
Appeared in: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB)
Page Numbers: 139-148
Publisher: Morgan Kaufmann
Year: 2001
ISBN/ISSN: 1-55860-804-4
Abstract:

"We present an approach to searching genetic DNA sequences using an adaptation of the suffix tree data structure deployed on the general-purpose persistent Java platform, PJama. Our implementation technique is novel in that it allows us to build suffix trees on disk for arbitrarily large sequences, for instance, the longest human chromosome, which consists of 263 million letters. We propose to use such indexes as an alternative to the current practice of serial scanning. We describe our tree creation algorithm, analyse the performance of our index, and discuss the interplay of the data structure with object store architectures. Early measurements are presented."
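The abstract contrasts a suffix-tree index with serial scanning. As a rough illustration only, the Java sketch below builds a small in-memory suffix tree by naively inserting every suffix (quadratic time) and answers a pattern query by walking one path down from the root. It is not the authors' disk-based construction algorithm or their PJama code. The class and method names (`SuffixTreeSketch`, `occurrences`, `serialScan`) are hypothetical, and the input is assumed not to contain the `$` terminator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory suffix tree sketch (not PJama code).
 * Built by naive insertion of every suffix, O(n^2) time; edge labels are
 * stored as [start, end) offsets into the text instead of copied substrings.
 */
public class SuffixTreeSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int start, end;        // label of the edge entering this node: text[start, end)
        int suffixIndex = -1;  // leaves only: start position of the suffix they spell
        Node(int start, int end) { this.start = start; this.end = end; }
    }

    private final String text;
    private final Node root = new Node(0, 0);

    public SuffixTreeSketch(String sequence) {
        this.text = sequence + "$";  // assumed-unique terminator: every suffix ends at a leaf
        for (int i = 0; i < text.length(); i++) insertSuffix(i);
    }

    private void insertSuffix(int suffixStart) {
        Node node = root;
        int pos = suffixStart;
        while (true) {
            Node child = node.children.get(text.charAt(pos));
            if (child == null) {  // no edge starts with this character: attach a new leaf
                Node leaf = new Node(pos, text.length());
                leaf.suffixIndex = suffixStart;
                node.children.put(text.charAt(pos), leaf);
                return;
            }
            // Follow the child's edge label while characters match.
            int k = child.start;
            while (k < child.end && text.charAt(k) == text.charAt(pos)) { k++; pos++; }
            if (k == child.end) { node = child; continue; }
            // Mismatch inside the edge: split it with a new internal node.
            Node mid = new Node(child.start, k);
            node.children.put(text.charAt(child.start), mid);
            child.start = k;
            mid.children.put(text.charAt(k), child);
            Node leaf = new Node(pos, text.length());
            leaf.suffixIndex = suffixStart;
            mid.children.put(text.charAt(pos), leaf);
            return;
        }
    }

    /** Sorted start positions of every occurrence of a non-empty pattern. */
    public List<Integer> occurrences(String pattern) {
        List<Integer> result = new ArrayList<>();
        Node node = root;
        int i = 0;
        while (i < pattern.length()) {
            Node child = node.children.get(pattern.charAt(i));
            if (child == null) return result;  // pattern does not occur
            for (int k = child.start; k < child.end && i < pattern.length(); k++, i++) {
                if (text.charAt(k) != pattern.charAt(i)) return result;
            }
            node = child;
        }
        collectLeaves(node, result);  // each leaf below spells a suffix starting with pattern
        result.sort(null);            // natural ordering
        return result;
    }

    private static void collectLeaves(Node node, List<Integer> out) {
        if (node.suffixIndex >= 0) { out.add(node.suffixIndex); return; }
        for (Node child : node.children.values()) collectLeaves(child, out);
    }

    /** Serial-scanning baseline: test the pattern at every position of the sequence. */
    public static List<Integer> serialScan(String sequence, String pattern) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i + pattern.length() <= sequence.length(); i++) {
            if (sequence.startsWith(pattern, i)) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String dna = "GATTACAGATTACA";
        SuffixTreeSketch index = new SuffixTreeSketch(dna);
        System.out.println(index.occurrences("TACA"));  // [3, 10]
        System.out.println(serialScan(dna, "TACA"));    // [3, 10]
    }
}
```

A query walks a single root-to-node path no longer than the pattern and then reports the leaves beneath it, whereas the serial scan visits every position of the sequence. Building and storing such a tree for a sequence the size of a human chromosome is the on-disk problem the paper addresses; this recursive, in-memory sketch makes no attempt at that.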

Keywords: suffix tree, indexing, database

