<XML><RECORDS><RECORD><REFERENCE_TYPE>3</REFERENCE_TYPE><REFNUM>5689</REFNUM><AUTHORS><AUTHOR>Hunt,E.</AUTHOR></AUTHORS><YEAR>2000</YEAR><TITLE>PJama Stores and Suffix Tree Indexing for Bioinformatics Applications</TITLE><PLACE_PUBLISHED> 10th PhD Workshop at ECOOP'00 </PLACE_PUBLISHED><PUBLISHER>N/A</PUBLISHER><PAGES>8</PAGES><LABEL>Hunt:2000:5689</LABEL><KEYWORDS><KEYWORD>persistence</KEYWORD></KEYWORDS><ABSTRACT> Motivation: The biggest public domain biological sequence archive exceeds 6 Gbases of DNA and much larger sequence amounts are held by industrial labs. The amount of data is growing exponentially but sequence search technologies still rely on flat file storage and high-throughput parallel computers reading all data sequentially to find sequence similarities or patterns. This issue is not addressed by existing database technologies. Results: We explored DNA and protein sequence indexing using transient and persistent suffix trees and tested our retrieval methods with human, worm and bacterial DNA, and protein data sets. Our index structure is designed in Java and takes advantage of orthogonal persistence for Java, PJama. Our exact sequence search methods deliver excellent performance and will complement our existing genome map applets by showing sequence query hits in genomic context. </ABSTRACT><URL>http://www.inf.elte.hu/~phdws/1-1-Hunt.ps</URL></RECORD></RECORDS></XML>