Next: Indexing Up: Data Structures Used Previous: Dynamic and compressed

Dictionaries

We have oversimplified the dictionary representation by showing them in table 2.3 as just two dimensional arrays of characters. A dictionary must perform two mappings:

It must map fields to their encoded representation during the compression operation: encode(lexemetoken).
It must perform the reverse mapping from codes to literal values when parts of the relation are printed out: decode(tokenlexeme).
The mapping must be cyclic such that .

It should perform these operations fast, with minimal waste of space.

Figure 3.3: Dictionary implementation. It can be seen that the mapping is cyclic such that .

The implementation chosen in shown in figure 3.3. Consider first the string pool, this is a dynamic vector whose elements are 8 bit fields, The strings to be encoded are stored end to end in the pool. The offsets within the pool of the strings starting positions, would itself comprise map the strings onto a fairly dense range of integers. Since, however, the mean string length is likely to be greater than 1, let us say it is s, there will on average be bits wasted in each such code. Further compression becomes possible by using a table that maps a dense subrange of the integers onto the strings starting positions. Composed, these two table lookup operations allow efficient decoding from tokens to lexemes.

The reverse mapping is provided by a hash table that maps lexemes onto their tokens.

All tables use dynamic compressed vectors which ensures that their entries are stored in fields of minimal width, and that the data-structures are extensible.

W Paul Cockshott
Fri Sep 6 10:29:18 BST 1996