To predict the space occupancy of a transformed memory-resident database, it is necessary to have a mathematical storage model which provides a priori information about expected storage volumes. A particular feature of this system is its data sensitivity, as the performance of dictionary compression techniques relies heavily on contextual and repeated-value redundancy in the presented relations. Pre-processing candidate relational data, even on a sample basis, should clarify the situation and give a database administrator a better idea of the potential benefits provided by the system.
The rationale and nomenclature underlying the storage model are explained below.
We can now start to develop a predictive model of compression. First note that ,
given that dictionary structures may be shared between field
domains, we will apply a Kronecker delta flag for computational distinctions, i.e.
We will also define a common domain cardinality , as follows.
where represents the individual cardinality of separable common domains.
Thus, assuming that optimal hash table slack corresponds to half occupancy, we have
A measure of compression for a single column is
In terms of distinct types of data compression, we observe that the first term represents the elimination of contextual redundancy through the use of minimal-width tokens, whereas the composite second term represents the suppression of repeated values within the column through the use of a dictionary.
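The two contributions described above can be illustrated with a small numerical sketch. This is not the storage model's own formula, which is not reproduced here. It is a minimal illustration under stated assumptions: each field value is replaced by a fixed-width token of ceil(log2(cardinality)) bits, the dictionary stores each distinct value once at full field width, and the half-occupancy hash table is modelled as a slack factor of 2 on the dictionary. The names `token_bits`, `column_estimate`, `field_bytes` and `slack` are hypothetical.

```python
def token_bits(cardinality: int) -> int:
    """Smallest fixed token width (in bits) that can index `cardinality` values."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2; use at least 1 bit.
    return max(1, (cardinality - 1).bit_length())


def column_estimate(values, field_bytes: int, slack: float = 2.0) -> dict:
    """Estimate storage for one dictionary-compressed column.

    Assumptions (illustrative only, not taken from the source model):
      * the uncompressed column costs rows * field_bytes,
      * the tokenised column costs rows * token_bits / 8 bytes,
      * the dictionary costs slack * distinct * field_bytes, where
        slack = 2.0 stands in for a hash table kept at half occupancy.
    """
    rows = len(values)
    distinct = len(set(values))
    raw_bytes = rows * field_bytes
    token_bytes = rows * token_bits(distinct) / 8
    dict_bytes = slack * distinct * field_bytes
    compressed_bytes = token_bytes + dict_bytes
    return {
        "rows": rows,
        "distinct": distinct,
        "raw_bytes": raw_bytes,
        "token_bytes": token_bytes,   # cost left after removing contextual redundancy
        "dict_bytes": dict_bytes,     # cost of storing each repeated value only once
        "ratio": compressed_bytes / raw_bytes,
    }


if __name__ == "__main__":
    # A sample column with heavy repetition: 1000 rows, 3 distinct values.
    sample = ["red", "green", "blue", "red"] * 250
    print(column_estimate(sample, field_bytes=16))
```

Running the estimate on a sample of a candidate relation, rather than on the full relation, corresponds to the sample-based pre-processing mentioned earlier. The distinct count seen in a sample is only a lower bound on the column's true cardinality, so the dictionary term is underestimated for high-cardinality columns.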
The storage requirements for the ith table, including its indexing, are
The corresponding compression measure for the ith table is
The overall measure of compression is