<XML><RECORDS><RECORD><REFERENCE_TYPE>3</REFERENCE_TYPE><REFNUM>5871</REFNUM><AUTHORS><AUTHOR>Hunt,E.</AUTHOR><AUTHOR>Atkinson,M.P.</AUTHOR><AUTHOR>Irving,R.W.</AUTHOR></AUTHORS><YEAR>2001</YEAR><TITLE>A Database Index to Large Biological Sequences</TITLE><PLACE_PUBLISHED>Proceedings of the 27th Conference on Very Large Data Bases</PLACE_PUBLISHED><PUBLISHER>Morgan Kaufmann</PUBLISHER><PAGES>139-148</PAGES><ISBN>1-55860-804-4</ISBN><LABEL>Hunt:2001:5871</LABEL><KEYWORDS><KEYWORD>suffix tree</KEYWORD></KEYWORDS><ABSTRACT>"We present an approach to searching genetic DNA sequences using an adaptation of the suffix tree data structure deployed on the general purpose persistent Java platform, PJama. Our implementation technique is novel, in that it allows us to build suffix trees on disk for arbitrarily large sequences, for instance for the longest human chromosome consisting of 263 million letters. We propose to use such indexes as an alternative to the current practice of serial scanning. We describe our tree creation algorithm, analyse the performance of our index, and discuss the interplay of the data structure with object store architectures. Early measurements are presented."</ABSTRACT></RECORD></RECORDS></XML>