Introduction

An extensible sparse functional array (ESFA) is a data structure that provides incremental updates and lookups.

The data structure is implemented through a combination of hardware and software; it relies on the massive parallelism available in a digital circuit. ESFA can be executed on a conventional von Neumann architecture, but not efficiently: for example, it can be done by simulating the circuit and runnint the algorithm on the simulated circuit. The intention, however, is to use massive parallelism to implement the abstract circuit at full speed, or at least without too bad a slowdown.

ESFA arrays have the following properties:

They are purely functional: when you update an array, the result is a completely new array, but the old one is unchanged and still available. Instead of making a side-effect to an array to change its value, you create new arrays, and you can't ever change the value of an old array.
They are extensible and sparse: you don't need to allocate space, or specify the minimum and maximum index values. You can traverse an array from one defined index to the next
All operations are unit time; that is, each operation takes a fixed number of clock cycles

Basic operations

There are two fundamental operations: update and lookup, and an empty array constant called empty. There is a delete operation, and there are several ways to use this to make ESFA work cleanly with the host system.

The only way to create an array is to use update, which takes an existing array, and index, and a value. Update returns a new array which is identical to the old one, except that it has the specified value at the given index. The old array is not modified.

You can fetch the value of an array at an index using lookup.

There are two interfaces to ESFA: a set of pure functions, and a monadic interface that makes the state explicit.

Here are the types of empty, update, and lookup, using the pure function interface.

AName is the type of an array
Idx is the type of an index
EltVal e is the type of an element

esfEmpty :: AName
esfUpdate :: AName -> Idx -> EltVal e -> AName
esfLookup :: AName -> Idx -> Maybe (EltVal e)

Laws for lookup and update

The behaviour of ESF arrays is defined by two laws.

Law 1 (Empty array)

If you look up any index at all in an empty array, you get nothing.

esfLookup esfEmpty i = Nothing

Law 2 (Nonempty array)

Suppose an array a2 is defined as update a j v. Now if you look up a2 at index j, the result is Just v. If you look up a2 at any index other than j, the result is the same as looking up a at that index.

esfLookup (esfUpdate a j v) i
  | i==j  = Just v
  | i/=j  = esfLookup a j

Some important points.

You don't allocate an array with a given size
There is only one way to create an array, and that's using update
When you update an array, you don't modify it - you create a new array
An array is immutable: once created, its value will never change
You can look up an array at any index; that's why the result is a Maybe type

An annotated example

This section gives an example program, interleaved with its output and some commentary explaining what is going on. This program is in the file examples/Example2.hs, along with some other example programs. You can experiment by changing the updates and lookups that are used. A more detailed explaination of how the array representation works appears in the papers (see the papers on the ESFA web page). The next section summarises the operations.

Module imports

The program is in the file Example2.hs. It's a main module, and it imports several modules.

-- Example of ESF Arrays showing sharing and deletion

module Main where

import Control.Monad.IO.Class
import Control.Monad.Trans.State.Strict
import Data.Array.ESFA.Spec
import Data.Array.ESFA.SimArch

Setting up the simulation

The main program uses runtree to simulate the circuit, and to run the ESFA algorithm on that simulated circuit. The first argument is the log of the number of cells; thus specifying k=4 gives a tiny machine with 16 cells. The second parameter () is the initial value of the users state, which is not used in this particular program. The script is the user program, running in the StateT monad. You can do IO using liftIO, and the ESF machine needs to be initialized, setting all the cells in the circuit to empty.

main :: IO ()
main = runtree 4 () script

Printing out lots of status information

There are several operations that print information contained in the system state. In this example program, several of those will be used after each update. For convenience, they are collected into one operation called showState. The output from this is described later, after the second update is performed.

showState =
  do esfmPrintState
     esfmPrintMachineArrays
     esfmPrintSpecArrays
     stats <- esfmAnalyzeMachineState
     liftIO (putStrLn (show stats))
     liftIO (putStrLn (take 75 (repeat '-')))

The script and initialization

The user program uses the StateT monad; StESFA is defined for convenience. The program needs to begin by initializing the system state. It also sets an option, EsfmTraceOperation, that will cause the system to print a message each time an update or other ESFA operation is performed.

script :: StESFA () Int ()
script =
  do liftIO (putStrLn "ESFA Example 2")
     esfmInitialize
     esfmSetOption EsfmTraceOperation

Creating an array using update

Now the program will create an array by doing an update. The empty array is called esfmEmpty (for now, at least). It's always present: the user doesn't need to create it, can't create it, and can't destroy it. The only way to get started is by updating esfmEmpty to get a new array. The following definition updates esfmEmpty at index 100 with the value 101. The string parameter "a1" isn't really part of the ESFA system, and isn't present at all in the machine representation. But the esfm API lets you give a string label to an array, which is used by esfmPrintMachineArrays and esfmPrintSpecArrays to make it easier to see which array is which. This is the only purpose of that label. It's possible for esfmUpdate to fail, so it returns a Maybe type. There are several ways to detect and handle errors, but in this demonstration program we just pattern match so that a1 is defined to be the actual array. The update would fail if the memory is completely full, or if the array being updated doesn't exist.

     Just a1 <- esfmUpdate "a1" esfmEmpty (100, mkInt 101)

No we can print the internal machine representation. This shows the fields in the cells. Naturally this representation takes some work to interpret, because it's designed to be used by the algorithm, not to be read by users. See the paper (and a later section in this document that's not written yet) for a detailed explanation of what the fields mean, and how to interpret the results.

     showState
-- a1 = 100-101

The output is below. Each array element is stored in a cell (a word in the ESFM memory). A cell has a CellID, which is essentially an address. Only the cells that contain anything are printed, and Cell 0 was allocated, so its contents are printed out.

*Main> main
ESFA Example 2
esf update: define a1 (name=0) = update -1 [100]=EsfUserOrdVal 101
    0.  *  >    1    1..1     [  100] = EsfUserOrdVal 101

There is a meta-operation called esfmPrintMachineArrays that examines the states of the cell circuits, works out the arrays that are represented, and prints them out. Each time you do an esfmUpdate, the correct result --- correct according to the executable specification --- is automatically saved in the state. This is using a conventional tree data structure in the RAM to represent the ESF arrays. The esfmPrintSpecArrays operation prints those out.

esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.

MachineStats {ncells = 16, narrays = 1, nelts = 1, nzombies = 0, nfree = 15}
---------------------------------------------------------------------------

By comparing the results of the two print operations, you can check that all is well. The results should always be identical (except that printSpecArrays currently shows esfmEmpty). At this point there is just one array, a1, and it contains the value 101 at index 100.

Now a2 is defined by updating a1.

     Just a2 <- esfmUpdate "a2" a1 (200, mkInt 201)
     showState
-- a2 = 200-201 100-101

Now there are two arrays in the system. The old array, a1, has not been modified, and the new array, a2, is just like a1 except that it has the new element.

esf update: define a2 (name=1) = update 0 [200]=EsfUserOrdVal 201
    0.     >    1    1..2     [  100] = EsfUserOrdVal 101
    1.  *  >    2    2..2     [  200] = EsfUserOrdVal 201
esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.

MachineStats {ncells = 16, narrays = 2, nelts = 2, nzombies = 0, nfree = 14}

A tree diagram (shown with indented text) illustrates the sharing. Array a1 contains the [100]=101 element. Array a2 contains [200]=201, and it also inherits [100]=101 from a1. Thus the element [100]=101 is shared between a1 and a2.

a1 [100]=101
  a2 [200]=201

Now consider how these arrays are represented in the machine. The crucial point is that each element is represented in just one cell, even if it is shared among many arrays. Here's a tree diagram showing the array codes and the inclusion sets.

a1 (0->1) [100]=101  1..2
  a2 (1->2) [200]=201  2..2

The notation a1 (0->1) means that array a1 has AName=0, and it currently has ACode=1. The AName is permanent; a1 will always have AName=0. When a1 was created, the update operation returned the int 0, and the value of a1 in the main program is 0. When an array is created, an arbitrary empty cell is chosen to hold its element, and if the CellId of that cell is i, then the name of the array is AName=i. Similarly, a2 has name 1 and code 2.

The notation i..j means an inclusion set consisting of arrays with codes c such that c <= i <= j. Thus the [100] element has inclusion set {ACode 1, ACode 2} which is the same as {AName 0, AName 1}, or {a1, a2}. This element is shared between two arrays. The [200] element belongs only to a2.

If you want to find the elements of an array, say a1, first find its AName (0), then look in cell with CellID=0 to get the ACode (it's 1), then look at all the cells where 1 lies within the low..high interval (in the state right now, the only element with that property is [100]. This identifies the candidates for the elements of the array, but if several of these elements have the same index value, then the most recent update has to be chosen; the lookup algorithm does this by finding the candidate with the specified index that has the narrowest inclusion set interval.

Multi-threaded update

Now we define a new array by updating a1 again. Notice that a1 has been updated twice, in two different ways, resulting in a2 and a3. This is a "multi-threaded" access, where a new update uses some older array, rather than just the most recently created array.

     Just a3 <- esfmUpdate "a3" a1 (300, mkInt 301)
     showState
-- a3 = 300-301 100-101

esf update: define a3 (name=2) = update 0 [300]=EsfUserOrdVal 301
    0.     >    1    1..3     [  100] = EsfUserOrdVal 101
    1.     >    3    3..3     [  200] = EsfUserOrdVal 201
    2.  *  >    2    2..2     [  300] = EsfUserOrdVal 301
esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.

MachineStats {ncells = 16, narrays = 3, nelts = 3, nzombies = 0, nfree = 13}

     Just a4 <- esfmUpdate "a4" a2 (400, mkInt 401)
     showState
-- a4 = 400-401 200-201 100-101

esf update: define a4 (name=3) = update 1 [400]=EsfUserOrdVal 401
    0.     >    1    1..4     [  100] = EsfUserOrdVal 101
    1.     >    3    3..4     [  200] = EsfUserOrdVal 201
    2.     >    2    2..2     [  300] = EsfUserOrdVal 301
    3.  *  >    4    4..4     [  400] = EsfUserOrdVal 401
esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.

MachineStats {ncells = 16, narrays = 4, nelts = 4, nzombies = 0, nfree = 12}

Shadowed elements

     Just a5 <- esfmUpdate "a5" a2 (200, mkInt 501)
     Just temp <- esfmLookup "a2" a2 200
     liftIO (putStrLn ("a2[200] = " ++ show temp)) -- expect 201
     Just temp <- esfmLookup "a5" a5 200
     liftIO (putStrLn ("a5[200] = " ++ show temp)) -- expect 501
     showState
-- a5 = 500-501 200-201 100-101

esf update: define a5 (name=4) = update 1 [200]=EsfUserOrdVal 501
esf lookup a2 (name=1)  [200] = Just (EsfUserOrdVal 201)
a2[200] = EsfUserOrdVal 201
esf lookup a5 (name=4)  [200] = Just (EsfUserOrdVal 501)
a5[200] = EsfUserOrdVal 501
    0.     >    1    1..5     [  100] = EsfUserOrdVal 101
    1.  *  >    3    3..5     [  200] = EsfUserOrdVal 201
    2.     >    2    2..2     [  300] = EsfUserOrdVal 301
    3.     >    5    5..5     [  400] = EsfUserOrdVal 401
    4.  *  >    4    4..4     [  200] = EsfUserOrdVal 501
esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 201.  [200] = 501.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 501.  [200] = 201.

MachineStats {ncells = 16, narrays = 5, nelts = 5, nzombies = 0, nfree = 11}

Another update

     Just a6 <- esfmUpdate "a6" a3 (600, mkInt 601)
     showState
-- a6 = 600-601 300-301 100-101

esf update: define a6 (name=5) = update 2 [600]=EsfUserOrdVal 601
    0.     >    1    1..6     [  100] = EsfUserOrdVal 101
    1.     >    4    4..6     [  200] = EsfUserOrdVal 201
    2.     >    2    2..3     [  300] = EsfUserOrdVal 301
    3.     >    6    6..6     [  400] = EsfUserOrdVal 401
    4.     >    5    5..5     [  200] = EsfUserOrdVal 501
    5.  *  >    3    3..3     [  600] = EsfUserOrdVal 601
esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 201.  [200] = 501.
a6 (5)  =   [100] = 101.  [300] = 301.  [600] = 601.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a3 (2)  =   [100] = 101.  [300] = 301.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 501.  [200] = 201.
a6 (5)  =   [100] = 101.  [300] = 301.  [600] = 601.

MachineStats {ncells = 16, narrays = 6, nelts = 6, nzombies = 0, nfree = 10}

Lookups

The next part of the program performs some lookups. In a real program, lookup (and the sparse operations described later) are the only way to fetch data from the machine. (Note: the string label parameter shouldn't be there and will be removed.) To do a lookup, you specify an array and index, and the result is the contents of the array at that index. Naturally this is a Maybe type.

     Just x1 <- esfmLookup "a4" a4 400   -- expect 401
     Just x2 <- esfmLookup "a5" a5 100   -- expect 101
     Just x3 <- esfmLookup "a5" a5 200   -- expect 201
     Just x4 <- esfmLookup "a6" a6 300   -- expect 301
     liftIO (putStrLn (show (map getInt [x1,x2,x3,x4])))

esf lookup a4 (name=3)  [400] = Just (EsfUserOrdVal 401)
esf lookup a5 (name=4)  [100] = Just (EsfUserOrdVal 101)
esf lookup a5 (name=4)  [200] = Just (EsfUserOrdVal 501)
esf lookup a6 (name=5)  [300] = Just (EsfUserOrdVal 301)
[401,101,501,301]

Deleting an array may not delete any elements

     esfmDelete a3
     showState

esf delete name=2
    0.     >    1    1..5     [  100] = EsfUserOrdVal 101
    1.     >    3    3..5     [  200] = EsfUserOrdVal 201
    2.  *            2..2     [  300] = EsfUserOrdVal 301
    3.     >    5    5..5     [  400] = EsfUserOrdVal 401
    4.     >    4    4..4     [  200] = EsfUserOrdVal 501
    5.     >    2    2..2     [  600] = EsfUserOrdVal 601
esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 201.  [200] = 501.
a6 (5)  =   [100] = 101.  [300] = 301.  [600] = 601.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 501.  [200] = 201.
a6 (5)  =   [100] = 101.  [300] = 301.  [600] = 601.

MachineStats {ncells = 16, narrays = 5, nelts = 6, nzombies = 0, nfree = 10}

Deleting an array may delete a cascade of elements

     esfmDelete a6
     showState

esf delete name=5
    0.     >    1    1..4     [  100] = EsfUserOrdVal 101
    1.     >    2    2..4     [  200] = EsfUserOrdVal 201
    3.     >    4    4..4     [  400] = EsfUserOrdVal 401
    4.     >    3    3..3     [  200] = EsfUserOrdVal 501
esfmPrintMachineArrays: arrays represented in machine state:
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 201.  [200] = 501.

esfmPrintSpecArrays: arrays according to specification:
esfmEmpty (-1)  =
a1 (0)  =   [100] = 101.
a2 (1)  =   [100] = 101.  [200] = 201.
a4 (3)  =   [100] = 101.  [200] = 201.  [400] = 401.
a5 (4)  =   [100] = 101.  [200] = 501.  [200] = 201.

MachineStats {ncells = 16, narrays = 4, nelts = 4, nzombies = 0, nfree = 12}

Tagging/untagging the element values

The program also contains a few utility functions that perform tagging and untagging of Int data, converting to and from the EsfEltVal type.

mkInt :: Int -> EsfEltVal Int
mkInt n = EsfUserOrdVal n
getInt :: EsfEltVal Int -> Int
getInt (EsfUserOrdVal n) = n
getInt _ = error "element should be EsfUserOrdVal but isn't"

Summary of ESFA operations using monadic interface

Several sets of operations are defined in the system. This section shows how to use the operations whose names begin with esfm. These aren't the lowest level of operation; they provide some extra diagnostic tools. Thus the esfmUpdate operation not only does the update, it also places some tracing information in the state (maintained using the StateT monad). The actual operation is performed by ptcUpdate, which executes the update algorithm on the circuit, which is simulated.

Initialization

esfmInitialize

Update and lookup

esfmUpdate. Example:

Just a2 <- esfmUpdate "a2" a1 (200, mkInt 201)
esfmLookup. Example:

Just x3 <- esfmLookup "a5" a5 200 -- expect 201

Printing machine state

esfmPrintState. Prints out the state of the circuit. Only the cells that contain useful information are printed; empty cells are skipped.
esfmPrintMachineArrays. This examines the state of the (simulated) circuit and works out the contents of the arrays that are represented in the machine. These arrays are printed out. This operation shows what is actually represented in the machine (cf. esfmPrintSpecArrays). This is a meta-operation that relies on the ability of the circuit simulator to gain access to the internal register states. Note: the MachineArrays aren't quite finished: if an array has been updated at an index where it was already defined, both index/value definitions will be printed. That isn't a bug in the ESFA system, it's just incomplete implementation of the meta-operation.
esfmPrintSpecArrays. Whenever an esfmUpdate is performed, the result according to the specification is calculated and stored in the tracing state. The Spec arrays are using an algebraic data type to calculate the correct values of the arrays; this is actually an executable specification. This value is completely independent of the machine state. By using esfmPrintSpecArrays, you can see what the correct arrays should be, according to the specification.
esfmAnalyzeMachineState. You can find out several statistics of the machine state. Also, you can extract these fields for yourself and process them. Here is an example, followed by the definition of the data type.

stats <- esfmAnalyzeMachineState liftIO (putStrLn (show stats))

data MachineStatistics = MachineStats {ncells, narrays, nelts, nzombies, nfree :: Int}

Controlling the behavior with options

esfmSetOption. You can turn on tracing of the operations:

esfmSetOption EsfmTraceOperation

Extensible Sparse Functional Arrays User's Guide (0.0.2)

John T. O'Donnell

10 April 2013