A foreign-language interface provides a way for software components written in one language to interact with components written in another. Programming languages that lack foreign-language interfaces die a lingering death.
This document describes Green Card, a foreign-language interface for the non-strict, purely functional language Haskell. We assume some knowledge of Haskell and C.
Our goals are limited. We do not set out to solve the foreign-language interface in general; rather we intend to profit from others' work in this area. Specifically, we aim to provide the following, in priority order:
The ability to call C from Haskell is an essential foundation. Through it we can access operating system services and mountains of other software libraries.
In the other direction, should we be able to write a Haskell library that a C program can use? In principle this makes sense but in practice there is zero demand for it. The exception is that the ability to support some sort of call-backs is essential, but that is a very limited form of C calling Haskell.
Should we support languages other than C? The trite answer is that pretty much everything available as a library is available as a C library. For other languages the right thing to do is to interface to a language-independent software component architecture, rather than to a raft of specific languages. For the moment we choose COM, but CORBA(2) might be another sensible choice.
While we do not propose to call Haskell from C, it does make sense to think of writing COM software components in Haskell that are used by clients. For example, one might write an animated component that sits in a Web page.
This document, however, describes only (1), the C interface mechanism.
Even after the scope is restricted to designing a foreign-language interface from Haskell to C, the task remains surprisingly tricky. At first, one might think that one could take the C header file describing a C procedure, and generate suitable interface code to make the procedure callable from Haskell.
Alas, there are numerous tiresome details that are simply not expressed by the C procedure prototype in the header file. For example, consider calling a C procedure that opens a file, passing a character string as argument. The C prototype might look like this:
int open( char *filename )
Our goal is to generate code that implements a Haskell procedure with type
open :: String -> IO FileDescriptor
open :: Maybe String -> IO FileDescriptor
so that we can model `NULL' by `Nothing'.
newtype FileDescriptor = FD Int
The file descriptor returned by `open' is just an integer, but Haskell programmers often use `newtype' declarations to create new distinct types isomorphic to existing ones. Now the type system will prevent, say, an attempt to add one to a `FileDescriptor'. Needless to say, the Haskell result type is not going to be described in the C header file.
sin :: Float -> Float
None of these details are mentioned in the C header file. Instead, many of them are in the manual page for the procedure, while others (such as parameter lifetimes) may not even be written down at all.
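The two Haskell-side refinements mentioned above, a `newtype' for the descriptor and `Maybe' for a possibly-`NULL' string, can be sketched as follows. This is only an illustration under our own assumptions: `openStub' and `rawDescriptor' are hypothetical names, and the stub fabricates a descriptor instead of calling C.

```haskell
-- A distinct type for descriptors; the constructor name FD follows the text.
newtype FileDescriptor = FD Int
  deriving (Eq, Show)

-- Hypothetical stand-in for the C side: Nothing plays the role of NULL.
-- A real binding would perform I/O; this one only fabricates a descriptor.
openStub :: Maybe String -> IO FileDescriptor
openStub Nothing     = return (FD 0)
openStub (Just name) = return (FD (length name))

-- Unwrapping must be explicit, so arithmetic on a descriptor is deliberate.
rawDescriptor :: FileDescriptor -> Int
rawDescriptor (FD n) = n

-- The following would be rejected by the type checker, because
-- FileDescriptor has no Num instance:
--   bad = FD 3 + 1
```

The point of the sketch is that neither the wrapper type nor the optional argument could have been derived from the C prototype alone.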
The previous section bodes ill for an automatic system that attempts to take C header files and automatically generate the "right" Haskell functions; C header files simply do not contain enough information.
The rest of this paper describes how we approach the problem. The general idea is to start from the Haskell type definition for the foreign function, rather than the C prototype. The Haskell type contains quite a bit more information; indeed, it is often enough to generate correct interface code. Sometimes, however, it is not, in which case we provide a way for the programmer to express more details of the interface. All of this is embodied in a program called "Green Card".
Green Card is a Haskell pre-processor. It takes a Haskell module as input, and scans it for Green-Card directives (which are lines prefixed by "`%'"). It produces a new Haskell module as output, and sometimes a C module as well (Figure 2 Foreign language interfaces are harder than they look).
Green Card's output depends on the particular Haskell implementation that is going to compile it. For the Glasgow Haskell Compiler (GHC), Green Card generates Haskell code that uses GHC's primitive `ccall'/`casm' construct to call C. All of the argument marshalling is done in Haskell. For Hugs, Green Card generates a C module to do most of the argument marshalling, while the generated Haskell code uses Hugs's `prim' construct to access the generated C stubs.
For example, consider the following Haskell module:
module M where
%fun sin :: Float -> Float
sin2 :: Float -> Float
sin2 x = sin (sin x)
Everything is standard Haskell except the `%fun' line, which asks Green Card to generate an interface to a (pure) C function `sin'. After the GHC-targeted version of Green Card processes the file, it looks like this(3):
module M where
sin :: Float -> Float
sin f = unsafePerformPrimIO (
case f of { F# f# ->
_casm_ "%r = sin(%0)" f# `thenPrimIO` \ r# ->
returnPrimIO (F# r#)})
sin2 :: Float -> Float
sin2 x = sin (sin x)
The `%fun' line has been expanded to a blob of gruesome boilerplate, while the rest of the module comes through unchanged.
If Hugs were the target, the Haskell source file would remain unchanged, but the Hugs variant of Green Card would generate output that uses Hugs's primitive mechanisms for calling C. Much of the Green-Card implementation is, however, shared between both variants. (We hope. The Hugs variant isn't even written.)
Green Card pays attention only to Green-Card directives, each of which starts with a "`%'" at the beginning of a line. All other lines are passed through to the output Haskell file unchanged.
The syntax of Green Card directives is given in Figure 3 Overview of Green Card. The syntax for DISs is given later (Figure 5.7 Prefixes). The form Any_x means any symbol except x. Green Card understands the following directives:
A directive can span more than one line, but the continuation lines must each start with a `%' followed by some whitespace. For example:
%fun draw :: Int           -- Length in pixels
%         -> Maybe Int     -- Width in pixels
%         -> IO ()
Haskell-style comments are permitted in Green-Card directives.
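The line-level rule can be sketched in a few lines of Haskell. This is an illustration only, not Green Card's actual scanner: the names `Chunk', `isDirectiveStart', `isContinuation' and `chunks' are our own, and we assume that a directive starts with `%' immediately followed by a non-space character, while a continuation line starts with `%' followed by whitespace.

```haskell
import Data.Char (isSpace)

-- A directive together with its continuation lines, or a pass-through line.
data Chunk = Directive [String] | Plain String
  deriving (Eq, Show)

-- '%' in column one followed by a non-space character starts a directive.
isDirectiveStart :: String -> Bool
isDirectiveStart ('%':c:_) = not (isSpace c)
isDirectiveStart _         = False

-- '%' in column one followed by whitespace continues the previous directive.
isContinuation :: String -> Bool
isContinuation ('%':c:_) = isSpace c
isContinuation _         = False

-- Group the lines of a module; everything that is not a directive is Plain.
chunks :: [String] -> [Chunk]
chunks [] = []
chunks (l:ls)
  | isDirectiveStart l =
      let (conts, rest) = span isContinuation ls
      in  Directive (l : conts) : chunks rest
  | otherwise = Plain l : chunks ls
```

In this sketch a stray continuation line with no directive before it is simply passed through; a real tool would more plausibly report an error.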
A general principle we have followed is to define a single, explicit (and hence long-winded) general mechanism, that should deal with just about anything, and then define convenient abbreviations that save the programmer from writing out the general mechanism in many common cases. We have erred on the conservative side in defining such abbreviations; that is, we have only defined an abbreviation where doing without it seemed unreasonably long-winded, and where there seemed to be a systematic way of defining an abbreviation.
The most common Green-Card directive is a procedure specification. It describes the interface to a C procedure. A procedure specification has four parts:
Any of these parts may be omitted except the type signature. If any part is missing, Green Card will fill in a suitable statement based on the type signature given in the `%fun' statement. For example, consider this procedure specification:
%fun sin :: Float -> Float
Green Card fills in the missing statements like this(4):
%fun sin :: Float -> Float
%call (float x1)
%code result = sin(x1);
%result (float result)
The rules that guide this automatic fill-in are described in Section 5.5 Automatic fill-in.
A procedure specification can define a procedure with no input parameter, or even a constant (a "procedure" with no input parameters and no side effects). In the following example, `printBang' is an example of the former, while `grey' is an example of the latter(5):
%fun printBang :: IO ()
%code printf( "!" );

%fun grey :: Colour
%code r = GREY;
%result (colour r)
All the C variables bound in the `%call' statement or mentioned in the `%result' statement, are declared by Green Card and in scope throughout the body. In the examples above, Green Card would have declared `x1', `result' and `r'.
The `%fun' statement starts a new procedure specification.
Green Card supports two sorts of C procedures: ones that may cause side effects (including I/O), and ones that are guaranteed to be pure functions. The two are distinguished by their type signatures. Side-effecting functions have the result type `IO t' for some type `t'. If the programmer specifies any result type other than `IO t', Green Card takes this as a promise that the C function is indeed pure, and will generate code that calls `unsafePerformIO'.
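The distinction can be sketched in ordinary Haskell. The code below is not Green Card output; `sinStub' is a hypothetical stand-in for a marshalled C call, and `unsafePerformIO' is taken from the `System.IO.Unsafe' module of current GHC libraries.

```haskell
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical stand-in for a marshalled C call; a real stub would call C.
sinStub :: Float -> IO Float
sinStub x = return (sin x)

-- Result type IO t: the possible side effect stays visible in the type.
sinIO :: Float -> IO Float
sinIO = sinStub

-- Any other result type is a promise of purity, discharged here with
-- unsafePerformIO. NOINLINE is the usual precaution for such wrappers.
{-# NOINLINE sinPure #-}
sinPure :: Float -> Float
sinPure x = unsafePerformIO (sinStub x)
```

If the promise is false, that is, if the underlying call does have side effects, the pure wrapper may run it zero, one or many times, which is why the choice of result type matters.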
The procedure specification will expand to the definition of a Haskell function, whose name is that given in the `%fun' statement, with two changes: the longest matching prefix specified with a `%prefix' statement (see Section 5.7 Prefixes) is removed from the name, and the first letter of the remaining function name is changed to lower case. Haskell requires all function names to start with a lower-case letter (upper case would indicate a data constructor), but when the C procedure name begins with an upper case letter it is convenient to still be able to make use of Green Card's automatic fill-in facilities. For example:
%fun OpenWindow :: Int -> IO Window
would expand to a Haskell function `openWindow' that is implemented by calling the C procedure `OpenWindow'.
%prefix Win32
%fun Win32OpenWindow :: Int -> IO Window
would expand to a Haskell function `openWindow' that is implemented by calling the C procedure `Win32OpenWindow'.
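The naming rule (strip the longest matching prefix, then lower-case the first letter) can be written down directly. The function `haskellName' below is a hypothetical helper of our own, not part of Green Card; it takes the declared prefixes and the C procedure name.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Derive the Haskell function name from a C procedure name.
haskellName :: [String] -> String -> String
haskellName prefixes cName = lowerFirst (drop longest cName)
  where
    -- Length of the longest declared prefix that matches, or 0 if none does.
    longest = maximum (0 : [length p | p <- prefixes, p `isPrefixOf` cName])
    lowerFirst (c:cs) = toLower c : cs
    lowerFirst []     = []
```

For instance, `haskellName ["Win32"] "Win32OpenWindow"' evaluates to `"openWindow"'. The sketch does not handle a prefix that consumes the whole name, which would leave an empty identifier.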
The `%call' statement tells Green Card how to translate the Haskell parameters into C values. Its syntax is designed to look rather like Haskell pattern matching, and consists of a sequence of zero or more Data Interface Schemes (DISs), one for each (curried) argument in the type signature. For example:
%fun foo :: Float -> (Int,Int) -> String -> IO ()
%call (float x) (int y, int z) (string s)
...
This `%call' statement binds the C variables `x', `y', `z', and `s', in much the same way that Haskell's pattern-matching binds variables to (parts of) a function's arguments. These bindings are in scope throughout the body and result-marshalling statements.
In the `%call' statement, "`float'", "`int'", and "`string'" are the names of the DISs that are used to translate between Haskell and C. The names of these DISs are deliberately chosen to be the same as the corresponding Haskell types (apart from changing the initial letter to lower case) so that in many cases, including `foo' above, Green Card can generate the `%call' line by itself (Section 5.5 Automatic fill-in). In fact there is a fourth DIS hiding in this example, the `(_,_)' pairing DIS. DISs are discussed in detail in Section 6 Data Interface Schemes.
The body consists of arbitrary C code, beginning with `%code'. The reason for allowing arbitrary C is that C procedures sometimes have complicated interfaces. They may return results through parameters passed by address, deposit error codes in global variables, require `#include''d constants to be passed as parameters, and so on. The body of a Green Card procedure specification allows the programmer to say exactly how to call the procedure, in its native language.
The C code starts a block, and may thus start with declarations that create local variables. For example:
%code int x, y;
%     x = foo( &y, GREY );
Here, `x' and `y' are declared as local variables. The local C variables declared at the start of the block scope over the rest of the body and the result-marshalling statements.
The C code may also mention constants from C header files, such as `GREY' above. Green Card's `%#include' directive tells it which header files to include (Section 8 Imports).
Functions return their results using a `%result' statement. Side-effecting functions -- ones whose result type is `IO t' -- can also use `%fail' to specify the failure value.
The `%result' statement takes a single DIS that describes how to translate one or more C values back into a single Haskell value. For example:
%fun sin :: Float -> Float
%call (float x)
%code ans = sin(x);
%result (float ans)
As in the case of the `%call' statement, the "`float'" in the `%result' statement is the name of a DIS, chosen as before to coincide with the name of the type. A single DIS, "`float'", is used to denote both the translation from Haskell to C and that from C to Haskell, just as a data constructor can be used both to construct a value and to take one apart (in pattern matching).
All the C variables bound in the `%call' statement, and all those bound in declarations at the start of the body, scope over all the result-marshalling statements (i.e. `%result' and `%fail').
In a result-marshalling statement an almost arbitrary C expression, enclosed in braces, can be used in place of a C variable name. The above example could be written more briefly like this(6):
%fun sin :: Float -> Float
%call (float x)
%result (float {sin(x)})
The C expression can neither have assignments nor nested braces as that could give rise to syntactic ambiguity (Section 6.1 Forms of DISs elaborates).
A side effecting function returns a result of type `IO t' for some type `t'. The `IO' monad supports exceptions, so Green Card allows them to be raised.
The result-marshalling statements for a side-effecting call consist of zero or more `%fail' statements, each of which conditionally raises an exception in the `IO' monad, followed by a single `%result' statement that returns successfully in the `IO' monad. Just as in Section 5.4 Result marshalling, the `%result' statement gives a single DIS that describes how to construct the result Haskell value, following successful completion of a side-effecting operation. For example:
%fun windowSize :: Window -> IO (Int,Int)
%call (window w)
%code struct WindowInfo wi;
% GetWindowInfo( w, &wi );
%result (int {wi.x}, int {wi.y})
Here, a pairing DIS is used, with two `int' DISs inside it. The arguments to the `int' DISs are C record selections, enclosed in braces; they extract the relevant information from the `WindowInfo' structure that was filled in by the `GetWindowInfo' call(7).
The `%fail' statement has two fields, each of which is either a C variable, or a C expression enclosed in braces. The first field is a boolean-valued expression that indicates when the call should fail; the second is a `(char *)'-valued expression that indicates what sort of failure occurred. If the boolean is true (i.e. non-zero) then the call fails with a `UserError' in the `IO' monad containing the specified string.
For example:
%fun fopen :: String -> IO FileHandle
%call (string s)
%code f = fopen( s );
%fail {f == NULL} {errstring(errno)}
%result (fileHandle f)
The assumption here is that `fopen' puts its error code in the global variable `errno', and `errstring' converts that error number to a string.
`UserError's can be caught with `catch', but exactly which error occurred must be encoded in the string, and parsed by the error-handling code. This is rather slow, but errors are meant to be exceptional.
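What this means for the caller can be sketched as follows. The code is not what Green Card generates: `failWhen', `openStub' and `errorCode' are hypothetical names, the stub never touches C, and the "code: message" layout of the error string is our own assumption. The error functions come from `System.IO.Error' in current Haskell libraries.

```haskell
import System.IO.Error (catchIOError, ioeGetErrorString, isUserError)

-- Hypothetical analogue of one %fail line: raise a UserError if the test holds.
failWhen :: Bool -> String -> IO ()
failWhen True  msg = ioError (userError msg)
failWhen False _   = return ()

-- Stand-in for a generated stub; 0 plays the role of a NULL handle.
openStub :: String -> IO Int
openStub name = do
  let f = if null name then 0 else 3
  failWhen (f == 0) "ENOENT: no such file"
  return f

-- The handler has to parse the string to learn which error occurred.
errorCode :: String -> IO (Either String Int)
errorCode name = fmap Right (openStub name) `catchIOError` handler
  where
    handler e
      | isUserError e = return (Left (takeWhile (/= ':') (ioeGetErrorString e)))
      | otherwise     = ioError e
```

The `takeWhile' in the handler is exactly the string parsing the text warns about: the error kind travels as text and must be recovered by inspection.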
Any or all of the parameter-marshalling, body, and result-marshalling statements may be omitted. If they are omitted, Green Card will "fill in" plausible statements instead, guided by the function's type signature. The rules by which Green Card does this filling in are as follows:
Some C header files define a large number of constants of a particular type. The `%const' statement provides a convenient abbreviation to allow these constants to be imported into Haskell. For example:
%const PosixError [EACCES, ENOENT]
This statement is equivalent to the following `%fun' statements:
%fun EACCES :: PosixError
%fun ENOENT :: PosixError
After the automatic fill-in has taken place we would obtain:
%fun EACCES :: PosixError
%result (posixError { EACCES })
%fun ENOENT :: PosixError
%result (posixError { ENOENT })
Each constant is made available as a Haskell value of the specified type, converted into Haskell by the DIS function for that type. (It is up to the programmer to write a `%dis' definition for the function -- see Section 6.2 DIS functions.)
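The expansion of a `%const' statement is mechanical enough to sketch as a small Haskell function that produces the filled-in directive lines as strings. The names `disName' and `expandConst' are our own, and we assume that the DIS name is the type name with its first letter in lower case, as in the examples in this document.

```haskell
import Data.Char (toLower)

-- Lower-case the first letter, giving the conventional DIS name for a type.
disName :: String -> String
disName (c:cs) = toLower c : cs
disName []     = []

-- One %fun/%result pair per constant, mirroring the filled-in form.
expandConst :: String -> [String] -> [String]
expandConst ty names = concat
  [ [ "%fun " ++ n ++ " :: " ++ ty
    , "%result (" ++ disName ty ++ " { " ++ n ++ " })" ]
  | n <- names ]
```

Applied to `"PosixError"' and `["EACCES", "ENOENT"]', the function yields the four directive lines of the filled-in `PosixError' example.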
In C it is common practice to give all function names in a library the same prefix, to minimize the impact on the common namespace. In Haskell we use qualified imports to achieve the same result. To simplify the conversion of C-style namespace management to Haskell, the `%prefix' statement specifies which prefixes to remove from the Haskell function names.
module OpenGL where
%prefix OpenGL
%prefix gl
%fun OpenGLInit :: Int -> IO Window
%fun glSphere :: Coord -> Int -> IO Object
This would define the two procedures `init' and `sphere' (the first letter of each remaining name is changed to lower case), which would be implemented by calling `OpenGLInit' and `glSphere' respectively.
A Data Interface Scheme, or DIS, tells Green Card how to translate from a Haskell data type to a C data type, and vice versa.
The syntax of DISs is given in Figure 5.7 Prefixes. It is designed to be similar to the syntax of Haskell patterns. A DIS takes one of the following forms:
%fun foo :: This -> Int -> That
%call (this x y) (int z)
%code r = c_foo( x, y, z );
%result (that r)
In this example `this' and `that' are DIS functions defined elsewhere.
newtype Age = Age Int
%fun foo :: (Age,Age) -> Age
%call (Age (int x), Age (int y))
%code r = foo(x,y);
%result (Age (int r))
As the `%call' line of this example illustrates, tuples are understood as data constructors, including their special syntax. Haskell record syntax is also supported. For example:
data Point = Point { px,py::Int }
%fun foo :: Point -> Point
%call (Point { px = int x, py = int y })
...
The use of records is also the reason for the restriction that
simple C expressions can't contain assignment. Without this
restriction examples like this would be ambiguous:
%result Foo { a = bar x, b = bar y }
Green Card does not attempt to perform type inference; it simply assumes
that any DIS starting with an upper case letter is a data constructor,
and that the number of argument DISs matches the arity of the constructor.
%fun foo :: Int# -> IO ()
%call ({int} x)
...
data T = MkT Int#
%fun baz :: T -> IO ()
%call (MkT ({int} x))
...
It would be unbearably tedious to have to write out complete DISs in every procedure specification, so Green Card supports DIS functions in much the same way that Haskell provides functions. (The big difference is that DIS functions can be used in "patterns" -- such as `%call' statements -- whereas Haskell functions cannot.)
Green Card supports two sorts of DIS function: DIS macros (Section 6.2.1 DIS macros) and user-defined DISs (Section 6.2.2 User-defined DISs).
DIS macros allow the programmer to define abbreviations for commonly-occurring DISs. For example:
newtype This = MkThis Int (Float, Float)
%dis this x y z = MkThis (int x) (float y, float z)
Along with the `newtype' declaration the programmer can write a `%dis' function definition that defines the DIS function `this' in the obvious manner.
DIS macros are simply expanded out by Green Card before it generates code. So for example, if we write:
%fun f :: This -> This
%call (this p q r)
...
Green Card will expand the call to `this':
%fun f :: This -> This
%call (MkThis (int p) (float q, float r))
...
(In fact, `int' and `float' are also DIS macros defined in Green Card's standard prelude, so the `%call' line is further expanded to:
%fun f :: This -> This
%call (MkThis (I# ({int} p)) (F# ({float} q), F# ({float} r)))
...
The fully expanded calls describe the marshalling code in full detail; you can see why it would be inconvenient to write them out literally on each occasion!)
Notice that DIS macros are automatically bidirectional; that is, they can be used to convert Haskell values to C and vice versa. For example, we can write:
%fun f :: This -> This
%call (MkThis (int p) (float q, float r))
%code int a; float b, c;
%     f( p, q, r, &a, &b, &c);
%result (this a b c)
The form of DIS macro definitions, given in Figure 5.7 Prefixes, is very simple. The formal parameters can only be variables (not patterns), and the right hand side is simply another DIS. Only first-order DIS macros are permitted.
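Macro expansion of this kind is plain first-order substitution, which the following Haskell sketch makes concrete. The data types `DIS' and `Macro' and the functions `subst' and `expand' are our own modelling for illustration; they are not Green Card's internal representation, and the sketch would loop on a recursive macro definition.

```haskell
-- A small model of DIS syntax.
data DIS
  = Var String          -- a C variable name
  | Apply String [DIS]  -- application of a DIS function, e.g. int x
  | Con String [DIS]    -- a Haskell data constructor with argument DISs
  | Tuple [DIS]         -- tuple syntax
  deriving (Eq, Show)

-- A first-order macro: its name, formal parameters and right-hand side.
data Macro = Macro String [String] DIS

-- Replace formal parameters by the actual argument DISs.
subst :: [(String, DIS)] -> DIS -> DIS
subst env (Var v)      = maybe (Var v) id (lookup v env)
subst env (Apply f as) = Apply f (map (subst env) as)
subst env (Con c as)   = Con c (map (subst env) as)
subst env (Tuple as)   = Tuple (map (subst env) as)

-- Expand every application of a known macro; other DISs are left alone.
expand :: [Macro] -> DIS -> DIS
expand ms (Apply f as) =
  case [m | m@(Macro n ps _) <- ms, n == f, length ps == length as] of
    (Macro _ ps rhs : _) -> expand ms (subst (zip ps as') rhs)
    []                   -> Apply f as'
  where as' = map (expand ms) as
expand ms (Con c as) = Con c (map (expand ms) as)
expand ms (Tuple as) = Tuple (map (expand ms) as)
expand _  d          = d
```

Because the expanded DIS is just a pattern-like term, the same result can be read in either direction, which is one way to see why macros are bidirectional.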
Sometimes Green Card's primitive DISs (data constructors) are insufficiently expressive. For recursive types, such as lists, it is obviously no good to write a single data constructor.
Green Card therefore provides a "trap door" to allow a sufficiently brave programmer to write his or her own marshalling functions. For example:
data T = Zero | Succ T
%fun square :: T -> T
%call (t (int x))
%code r = square( x );
%result (t (int r))
Use of `t' requires that the programmer define two ordinary Haskell functions, `marshall_t' to convert from Haskell to C, and `unmarshall_t' to convert in the other direction. In this example, these functions would have the types:
marshall_t :: T -> Int
unmarshall_t :: Int -> T
The functions must have precisely these names: "`marshall_'" followed by the name of the DIS, and similarly for unmarshall. Notice that these marshalling functions have pure types (e.g. `marshall_t' has type `T -> Int' rather than `T -> IO Int'). Sometimes one wants to write a marshalling function that is internally stateful. For example, it might pack a `[Char]' into a `ByteArray', by allocating a `MutableByteArray' and filling it in with the characters one at a time. This can be done using `runST', or even `unsafePerformIO'. (These are all GHC-specific comments; so far as Green Card is concerned it is simply up to the programmer to supply suitably-typed marshalling functions.)
Green Card distinguishes user-defined DISs from DIS macros by omission: if there is a DIS macro definition for a DIS function `f' then Green Card treats `f' as a macro, otherwise it assumes `f' is a user-defined DIS and generates calls to `marshall_f' and/or `unmarshall_f'.
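One plausible pair of marshalling functions for the unary-number type `T' is sketched below. The function names follow the convention required by Green Card; the bodies, and the stand-in `squareStub' for the C procedure, are our own assumptions for illustration.

```haskell
data T = Zero | Succ T
  deriving (Eq, Show)

-- Haskell to C: count the Succ constructors.
marshall_t :: T -> Int
marshall_t Zero     = 0
marshall_t (Succ n) = 1 + marshall_t n

-- C to Haskell: rebuild the unary number; negative inputs map to Zero.
unmarshall_t :: Int -> T
unmarshall_t n
  | n <= 0    = Zero
  | otherwise = Succ (unmarshall_t (n - 1))

-- Hypothetical stand-in for the C procedure square.
squareStub :: Int -> Int
squareStub x = x * x

-- The shape of the wrapper: marshal in, call, unmarshal out.
square :: T -> T
square = unmarshall_t . squareStub . marshall_t
```

Both functions are pure, as required, and `marshall_t . unmarshall_t' is the identity on non-negative integers.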
How does Green Card use these DISs to convert between Haskell values and C values? We give an informal algorithm here, although most programmers should be able to manage without knowing the details.
To convert from Haskell values to C values, guided by a DIS, Green Card does the following:
Much the same happens in the other direction, except that Green Card calls the `unmarshall' function in the user-defined DIS case.
Figure 6.3 Semantics of DISs gives the DIS functions that Green Card provides as a "standard prelude".
The "`T'" variants allow the programmer to specify what type is to be used as the C representation type. For example, the `int' DIS maps a Haskell `Int' to a C `int', whereas `intT {FD}' maps a Haskell `Int' onto a C value with type `FD'.
Several of the standard DISs involve types that go beyond standard Haskell:
Almost all DISs work on single-constructor data types. It is much less obvious how to translate values of multi-constructor data types to and from C. Nevertheless, Green Card does deal in an ad hoc fashion with the `Maybe' type, because it seems so important.
The syntax for the `maybeT' DIS is:
maybeT cexp dis
where `dis' is any DIS, and `cexp' is a C expression which represents the `Nothing' value in the C world.
In the following example, the function `foo' takes an argument of type `Maybe Int'. If the argument value is `Nothing' it will bind `x' to `0'; if it is `Just a' it will bind `x' to the value of `a'. The return value will be `Just r' unless `r == -1' in which case it will be `Nothing'.
%fun foo :: Maybe Int -> Maybe Int
%call (maybeT { 0 } (int x))
%code r = foo(x);
%result (maybeT { -1 } (int r))
There is also a `maybe' DIS which just takes the DIS and defaults to `0' as the `Nothing' value.
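The sentinel convention behind `maybeT' can be modelled in plain Haskell for the `Int' case. The functions `marshallMaybe' and `unmarshallMaybe' are hypothetical names of our own; each takes the C value that stands for `Nothing' as its first argument.

```haskell
-- Haskell to C: Nothing becomes the sentinel, Just a becomes a.
marshallMaybe :: Int -> Maybe Int -> Int
marshallMaybe sentinel Nothing  = sentinel
marshallMaybe _        (Just a) = a

-- C to Haskell: the sentinel becomes Nothing, anything else is wrapped in Just.
unmarshallMaybe :: Int -> Int -> Maybe Int
unmarshallMaybe sentinel r
  | r == sentinel = Nothing
  | otherwise     = Just r
```

The sketch also exposes a limitation of any sentinel scheme: with sentinel `0', the argument `Just 0' reaches C as the same value as `Nothing', so the sentinel should be a value that cannot otherwise occur.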
Green Card "connects" with code in other modules in two ways:
The general syntax for invoking Green Card is:
`green-card [options] [filename]'
Green Card reads from standard input if no filename is given. The options can be any of the following:
Here we summarise aspects of Green Card that are less than ideal, and indicate possible improvements.
Microsoft's Component Object Model (COM) is a language-independent software component architecture. It allows objects written in one language to create objects written in another, and to call their methods. The two objects may be in the same address space, in different address spaces on the same machine, or on separate machines connected by a network. OLE is a set of conventions for building components on top of COM.
CORBA is a vendor-independent competitor of COM.
Only GHC aficionados will understand this code. The whole point of Green Card is that Joe Programmer should not have to learn how to write this stuff.
The details of the filled-in statements will make more sense after reading the rest of Section 5 Procedure specifications.
When there are no parameters, the `%call' line can be omitted. The second example can also be shortened by writing a C expression in the `%result' statement; see Section 5.4 Result marshalling.
It can be written more briefly still by using automatic fill-in (Section 5.5 Automatic fill-in).
This example also shows one way to interface to C procedures that manipulate structures.
This document was generated on 21 March 1997 using the texi2html translator version 1.51.