Tree Structures for Algorithmic Problems on Strings

EPSRC research project GR/L98640

(1st April 1998 - 31st March 2001)

[A suffix tree structure]

[ Top | Introduction | People | Publications | Software | Links ]

Introduction

Suffix trees [Wei73, McC76, Ukk95] and suffix arrays [MM93] are classical data structures that can be used to facilitate efficient on-line string searching. A comprehensive discussion of these structures appears in [Gus97]. The main aim of this project was to investigate and evaluate the use of binary search trees in this context.

[ Top | Introduction | People | Publications | Software | Links ]

People

The Principal Investigator of the Tree Structures research project was Rob Irving. Lorna Love was a research student on the project.

In addition, the following former students undertook MSc in IT projects involving suffix tree structures:

  • Sarah Cox, "Suffix Trees in Java", M.Sc. in I.T. dissertation 1999.
  • Brian Young, "Suffix Binary Search Trees in Java", MSc in I.T. dissertation 2000.

[ Top | Introduction | People | Publications | Software | Links ]

Publications

The following publications and reports are directly associated with the Tree Structures research project:

  • R.W. Irving and L. Love, "The suffix binary search tree and suffix AVL tree", Technical Report no. TR-2000-54 of the Computing Science Department of Glasgow University, February 2000. (postscript version)
  • R.W. Irving and L. Love, "The suffix binary search tree and suffix AVL tree", Journal of Discrete Algorithms, vol. 1 (2003), pp. 387-408. (postscript version)
  • R.W. Irving and L. Love, "Suffix arrays and suffix binary search trees", Technical Report no. TR-2001-82 of the Computing Science Department of Glasgow University, March 2001. (postscript version)
  • L. Love, "The suffix binary search tree", PhD thesis, Computing Science Department, University of Glasgow, 2001.
  • E. Hunt, R.W. Irving and M.P. Atkinson, "Persistent suffix trees and suffix binary search trees as DNA sequence indexes", Technical Report no. TR-2000-63 of the Computing Science Department of Glasgow University, October 2000. (postscript version)
  • E. Hunt, M.P. Atkinson and R.W. Irving, "A database index to large biological sequences", Technical Report no. TR-2000-82 of the Computing Science Department of Glasgow University, February 2001. (postscript version), in Proceedings of VLDB'01, Rome, September 2001, pp. 139-148.
  • E. Hunt, M.P. Atkinson and R.W. Irving, Database Indexing for large DNA and Protein Sequence Collections, VLDB Journal, vol. 11 (2002), pp. 256-271.

 

[ Top | Introduction | People | Publications | Software | Links ]

Software

Part of the Tree Structures research project has involved the implementation of algorithms for a variety of suffix-based structures. On this website, we have provided source code, executable files and documentation (in the form of READ_ME files) for the following:

  • Suffix Binary Search Trees (Ada95 and C). These files provide functions/procedures for constructing and exploiting suffix binary search trees, in various representations, together with demo programs.
  • Suffix Trees (Ada95 and C). These files provide functions/procedures for constructing and exploiting suffix trees, together with demo programs.
  • Suffix Arrays (Ada95 and C). These files provide functions/procedures for constructing and exploiting suffix arrays, together with demo programs.

 

[ Top | Introduction | People | Publications | Software | Links ]

Links

The following links are to pages giving further information on suffix structures.

 

[ Top | Introduction | People | Publications | Software | Links ]