Next: Dynamic and compressed Up: Data Structures Used Previous: Data Structures Used

Relations

Logically a relation is a set of tuples, and it is conventional to implement the tuples as records with fields stored adjacent to one another --- a horizontal organisation of the data. It is equally valid however, to think of the relation as a sequence of columns, and implement these columns as vectors so that corresponding fields in succesive tuples are stored adjacently --- vertical data organisation. In disk resident database systems the horizontal organisation is, with good reason, more popular. It has the advantage that a single disk access fetches all fields of a tuple. Even for non-compressed RAM databases, where memory access speeds are much faster, the implementational simplicity of this approach commends it.

Operation with compressed data imposes additional constraints which tend to favour a vertical organisation. Unlike a conventional database system, the field widths are not known when the schema is created. The number of bits required to represent the domain of a field is proportional to the logarithm of the domain's cardinality. As data is loaded the cardinality grows.

Suppose we have a field, the chipname field in our examples, that requires 3 bits to encode its domain. If the number of unique strings used in the domain rises above 8, we will need 4 bits to represent them in encoded form. In consequence, it becomes necessary to widen all instances of fields belonging to this domain. With a horizontal data organisation one then has to shift other fields to the right, causing the record size to grow. Unless one wastes space by leaving unused bits at the end of the tuple, one is forced to allocate a complete new vector of tuples and copy the entire relation, tuple by tuple, into the new vector.

With a vertical organisation one still has a cost associated with the widening of fields, but it is less severe. If the fields each have their own column, then only the column that is being widened has to be altered. This reduces both the number of bits that have to be copied and reduces the load on the storage manager.

This load comes in two forms:

The amount of work done by the garbage collector will be proportional to the number of bits persecond being discarded. In shifting from a horizontal to a vertical organisation, the number of reorganisations due to widening does not change, but the number of bits of store discarded on each re-organisation does.
The peak storage occupancy of the system is reduced. During a widening operation, storage is needed both for the original ( narrower ) version of the column or relation and the new (wider ) version. In the worst case, during a widening of a database with a single horizontal relation, the database temporarily almost doubles in size. For a columnar organisation, the temporary store needed during widening is of the order of that occupied by the widest single column.

The HPDS currently uses the relation organisation shown in figure 3.1

Figure 3.1: The relations are stored in colunar rather than record form

The columns are organised as a linked list of fields, each of which points to the dictionary that compresses its domain. The vertical slices through the tuples are then stored in compressed, bit aligned vectors.

Next: Dynamic and compressed Up: Data Structures Used Previous: Data Structures Used

W Paul Cockshott
Fri Sep 6 10:29:18 BST 1996