Rapid traversal of hash tables

Next: Persistence Up: Indexing Previous: Multi-column hashing

Rapid traversal of hash tables

Since the query speed is limited by the traversal of the hash tables it is important to have an efficient method of computing successive hash indices. The hash table is logically a multi-dimensional array, i.e., a sequence of locations each of which is identified with a unique combination of the subscripts. Physically it is usually a sequence or one dimensional array of locations in the computer memory. In the HPDS the sequence of locations is provided by a compressed vector abstract type.

Associated with each complete set of indices to the multidimensional array there will be a number which, when used as an index to the underlying one dimensional array, will define the corresponding position in the one dimensional array. Thus, given A:Array[0..1,0..1,0..3], the element A[1,0,3] will be at location 11 in the underlying storage vector.

In the above example the location index 11 is obtained from the array indices 1,0,3 as shown in figure 3.4. The indices for the array are multiplied by appropriate constants and the results added together to form the location's number.

We shall restrict our attention to multidimensional arrays each of whose dimensions is some integer power of two in length, which occur in the HPDS relational indexing scheme, and look at the problem of how a program or hardware mechanism can step through such a multi-dimensional array, one or more of whose indices is to be held fixed whilst the others are varied. This operation constitutes the inner loop of most queries on the HPDS, and it is important that it be implemented efficiently.

Figure 3.4: Deriving a location index from the multidimensional array indices

Suppose we wish to step through the array , holding the value of j constant at 1 but going through all legitimate combinations of i and k. This corresponds to a query in which the second indexed column of the relation must take on a specified value.

This would involve accessing array elements:

A[0,1,0], A[0,1,1], A[0,1,2], A[0,1,3],

A[1,1,0], A[1,1,1], A[1,1,2], A[1,1,3]

And the corresponding locations: 4,5,6,7,12,13,14,15

The normal process by which this would be done would be to

reserve 3 variables i,j,k
initialise them to the values i=0,j=1,k=0
set up a pair of nested loops so that
1. the outer loop incremented i through all its allowed values
2. the inner loop incremented k through all its allowed values
3. the kernel of the inner loop used the process shown in figure 3.4 to compute the locations from the indices

The disadvantage of this process is that it involves multiplication or shift operations and nested loops, all of which are relatively slow. Performance can be improved if it is recognised that it is possible, by suitable use of mask bits to so arrange the inputs to the CPU's adder that the counting operations required to step through the array can be performed by simple addition.

Suppose that we wished to step through the array A as described above. Instead of maintaining 3 variables corresponding to the dimensions of the array to be traversed, we have a single counting variables c and two mask words, p and q. The following steps are followed:

a boolean or is peformed between c and the mask word p,
the result of the first step is then incremented and returned to register c,
the mask q is anded with the result of the first step and the result used as the location of the current array element

For this technique to work the masks p,q must be prepared so that p holds 1s in the positions corresponding to the dimensions of the array that are being held constant, whilst the mask q contains:

1s in the positions corresponding to the dimensions of the array that are being stepped through;
a pattern of 1s and 0s in the positions corresponding to the dimensions of the array that are being held constant, so that the patterns of 1s and 0s correspond to the appropriately multiplied constant indices of these dimensions;
the value 0 in all other bit positions

The effect of the ORing with p is to create a number with strings of 1s between the fields that correspond to the dimensions being stepped through. These strings of ones have the effect of allowing carrys to propagate through them to the fields in the words corresponding to the next varying dimension.

The illustration in figure 3.5 shows how this is performed in going from array position 0,0,3 to array position 1,0,0.

An implementation, similar to that currently used in the HPDS, is shown in the following sequence of instructions for an x86 class processor.

Figure 3.5: The process by which successive indices are calculated according to the new method.

Assume in what follows that register SI holds the mask p, register DI holds the mask q and that register cx holds a count of the number of elements to visit.

   
   twurzel= record
              garbroot:^tpdata;
              sels:array[0..$1fff] of tselsort;
              htb:array[smallest..bigest] of theapid;
              classes:array[firstper div 8 .. lastper div 8] of pclassid;
            end;

Again the loop requires no slow multiply or shift instructions, and only a single loop is required.

Next: Persistence Up: Indexing Previous: Multi-column hashing

W Paul Cockshott
Fri Sep 6 10:29:18 BST 1996