We have oversimplified the dictionary representation by showing them in table 2.3 as just two dimensional arrays of characters. A dictionary must perform two mappings:
Figure 3.3: Dictionary implementation. It can be seen that the
mapping is cyclic such that .
The implementation chosen in shown in figure 3.3. Consider first
the string pool, this is a dynamic vector whose elements are 8 bit fields,
The strings to be encoded are stored end to end in the pool.
The offsets within the pool of the strings starting positions, would
itself comprise map the strings onto a fairly dense range of integers.
Since, however, the mean string length is likely to be greater than 1,
let us say it is s, there will on average be bits wasted
in each such code. Further compression becomes possible by using
a table that maps a dense subrange of the integers onto the strings starting
positions. Composed, these two table lookup operations allow efficient
decoding from tokens to lexemes.
The reverse mapping is provided by a hash table that maps lexemes onto their tokens.
All tables use dynamic compressed vectors which ensures that their entries are stored in fields of minimal width, and that the data-structures are extensible.