Suffix Arrays in Ada 95

Rob Irving & Lorna Love, University of Glasgow

General

These comprise a number of Ada 95 packages that provide for construction and use of suffix arrays [1]. In each case, the suffix array is built using a suffix binary search tree - see [2, 3]. The four versions depend on whether the tree is in its standard version or its refined version (see [2]), and whether it is represented using dynamically created nodes or using an array of nodes. The refined versions are faster, but use a little more space than the standard versions. However the latter provide more flexibility - see operations list below. The array versions are more space-economical, and just as fast, as those that use dynamic storage.

The procedures provided by the packages are summarised as follows. (Further details can be found in the header comments in the package specifications.)

 

The demonstration programs

The demo programs have a simple text-based interface. They should be run from a command line with one or two file names as parameters. With a single file parameter, they construct the suffix array for the string contained in that file (replacing new-lines by blanks in a text file). They then find a longest repeated substring, and offer the user the option to input a sequence of search strings, reporting the number of occurrences in each case. With two files as parameters, they construct the generalised suffix array for these two strings and find a longest common substring.

 

Source code

The src folder contains the source code, written in Ada 95. It consists of the following:

These packages are portable, and may be compiled with any Ada 95 compiler.

 

Executables

The bin folder contains executable versions of demonstration programs for Windows and Unix (PC x86 and Sparc Solaris)

 

DISCLAIMER: this software is not supported, and although it has been subjected to extensive testing, the authors cannot take responsibility for any loss etc. resulting from errors or unexpected behaviour.

Copies of the source may be made freely for private study or educational use, provided that the program headers acknowledging the authors are retained.

 

References

  1. U. Manber & E.W. Myers, Suffix arrays: a new method for on-line string searches, SIAM Journal of Computing 22 (1993), 935-948.

2. R.W. Irving & L. Love, The suffix binary search tree and suffix AVL tree, University of Glasgow Computing Science Department Research Report TR-2000-54, 2000.

3. R.W. Irving & L. Love, Suffix binary search trees and suffix arrays, University of Glasgow Computing Science Department Research Report TR-2000-82, 2001.