This is the begining of an algebraic specification of a database type system that will combine relational with probabalistic and multi-dimensional database concepts. It is intended that it can be used to build a new compressed database management system as a successor to Hibase.
Why do we need an algebraic specification of the type system? As Hoare says1 :
"The construction of a single mathematical model obeying an elegant set of algebraic laws is a significant intellectual achievement; so is the formulation of a set of algebraic laws characterising an interesting and useful set of models.
"But neither of these achievements is enough. We need to build up a large collection of models and algebras, covering a wide range of computational paradigms, appropriate for implementation either in hardware or in software, either of the present day or of some possible future."
A failing of the original Hibase project was, to my mind, the fact that the stage of preparing a formal specification of the system was elided. The RA who would have had the mathematical training to do this left to become a lecturer elsewhere and those who succeeded him lacked the training in formal methods to do the task. A consequence of this is that we are left with a type system that is organised around the implementation rather than around the mathematical properties of the system that we are modeling. I think that this is undesireable on several counts:
This last point applies particularly to the position of the vector type in the heirarchy. I will argue in this paper that vectors should properly belong beneath relations in the hierarchy.
What I intend to try and do in what follows is to define a hierarchy of types, and for each type to define its signature that is to say a set of operators on it, and a set of axioms that must be valid for these operators.
Within this context, types will be implemented by Java classes, and operators on the types by the methods that operate on the classes.
The broad division is into operators on the one hand and the structured and atomic types on the other. The structured and functional types form sub-lattices of their own. Each type is a pair (D, S). D is a set of sets of concrete denotations on different media: the printed page, the video screen, tansmission form ( as sent over transputer links), the working store of the transputer. S is its signature or the set of functions that take it as arguments or return it as results.
2
This is the Superclass to which all other classes belong - we assume that basic arithmetic is available on all types The universal type is the supertype that contains all of the sub-types derived from atoms, tuples and sets.
Its S ( set of operations ) is { ¬ = > < £ ³ + - ׸ max min show # }. The class Universal has a single pre-given member undefinedValue, which is returned when any operation returns an undefined result.
These operations are inherited by or overriden by equivalent operators in all sub-types of universal. The reason for universal having so mUniversal operations is that we want to be able to construct tuples whose elements are drawn from the class universal. If we want to perform arithmetic on tuples, the basic arithmetic operation must be defined on all members of the class universal.
Beneath the universal class we have atomics, tuples and sets.
Operator | Type | Explanation
|
a < b | ( UniversalÄUniversal ® B) | Returns 1 if a is less than
b, see section
|
|
a = b | ( UniversalÄUniversal ® B) | Returns 1 if a equals
b
|
z¬ a + b | (UniversalÄ Universal ® Universal) | z is the algebraic
sum of
a and b
|
z¬ a ++ b | (UniversalÄ Universal ® Universal) | z is the concatenation of
a and b unlike addition, not a commutative operator
|
z¬ a - b | (UniversalÄ Universal ® Universal) | z is the algebraic
difference of a and b
|
z¬ a ×b | (UniversalÄ Universal ® Universal) | z is the
algebraic product of a and b
|
z¬ a ¸b | (UniversalÄ Universal ® Universal) | z is the quotient
of
a and b
|
z¬ Øa | ( Universal® B) | Negation:
z = 1 Û a = 0
|
z¬ - a | ( Universal®Universal) | Unary minus: z = -1×a
|
| z¬ ||x ( Universal®R) | z is the absolute value of x
| |
package strathclyde.cs.relational; import strathclyde.cs.relational.atomic.*; public interface Universal{ public static final Universal undefinedValue = new Numeric(Double.NaN);The implementaion of the undefined value makes use of the format NaN, short for Not a Number, which is defined in the IEEE floating point standard as the undefined value.
public abstract Universal plus (Universal b);implements +
public abstract Universal concat (Universal b);implements ++
public abstract Universal minus (Universal b);implements -
public abstract Universal times (Universal b);implements ×
public abstract Universal divide (Universal b);implements ¸
public Universal max (Universal b); public Universal min (Universal b); public abstract boolean lessthan (Universal b);implements < note that this returns a boolean not a universal, a boolean can be converted to a universal using the class Bool which implements universal.
}
Subscriptable 3 is an abstract interface that is inherited by all those classes that can be subscripted by universals - tuples, vectors, relations with primary keys.
package strathclyde.cs.relational; public interface Subscriptable { public Universal subscript(Universal i); public Universal subscript(int i); }
1 C.A.R. Hoare. Algebra and models. ACM Software Engineering Notes, 18(5):1-8, December 1993.
2 Part of Mbase implemented in Java Author Nikolaos Kotsis Started Aug 1997 Copyright (c) University of strathclyde
3 * Part of Mbase implemented in Java author Paul Cockshott Started Aug 1997 Copyright (c) University of strathclyde