Logically a relation is a set of tuples, and it is conventional to implement the tuples as records with fields stored adjacent to one another --- a horizontal organisation of the data. It is equally valid however, to think of the relation as a sequence of columns, and implement these columns as vectors so that corresponding fields in succesive tuples are stored adjacently --- vertical data organisation. In disk resident database systems the horizontal organisation is, with good reason, more popular. It has the advantage that a single disk access fetches all fields of a tuple. Even for non-compressed RAM databases, where memory access speeds are much faster, the implementational simplicity of this approach commends it.
Operation with compressed data imposes additional constraints which tend to favour a vertical organisation. Unlike a conventional database system, the field widths are not known when the schema is created. The number of bits required to represent the domain of a field is proportional to the logarithm of the domain's cardinality. As data is loaded the cardinality grows.
Suppose we have a field, the chipname field in our examples, that requires 3 bits to encode its domain. If the number of unique strings used in the domain rises above 8, we will need 4 bits to represent them in encoded form. In consequence, it becomes necessary to widen all instances of fields belonging to this domain. With a horizontal data organisation one then has to shift other fields to the right, causing the record size to grow. Unless one wastes space by leaving unused bits at the end of the tuple, one is forced to allocate a complete new vector of tuples and copy the entire relation, tuple by tuple, into the new vector.
With a vertical organisation one still has a cost associated with the widening of fields, but it is less severe. If the fields each have their own column, then only the column that is being widened has to be altered. This reduces both the number of bits that have to be copied and reduces the load on the storage manager.
This load comes in two forms:
Figure 3.1: The relations are stored in colunar rather than record form
The columns are organised as a linked list of fields, each of which points to the dictionary that compresses its domain. The vertical slices through the tuples are then stored in compressed, bit aligned vectors.