Skip to main content
University of Glasgow
A-Z : ACADEMIC DEPARTMENTS | SERVICES | STAFF |
Faculty of Information and Mathematical Systems > Department of Computing Science

Languages and Machines

Page 1/1: 1

2010-02-19

Generating an expression grammar for SableCC

Yes, you've read it correctly: this post is about generating a grammar for SableCC, the Java-based parser generator.

Expression grammars in SableCC

An expression grammar is that part of a grammar that describes the syntax for operator-based expressions, e.g.

2*3-4>>5+6*7

The grammar for this expression must take into account the precedence, associativity and fixity of the operators. For example, the SableCC productions for a grammar for arithmetic expressions consisting of factors and terms (i.e. sums of products) look like this (omitting the transformation annotations):

Productions

grammar         = exp_list;

exp_list        =
                    {list}    exp_list separator exp 
                  | {single}  exp           
                  ;

exp             =  
                    {plus}    exp plus factor 
                  | {factor}  factor          
                  ;

factor          =
                    {mult}    factor mult term 
                  | {term}    term             
                  ;

term              {-> exp} =
                    {number}  number            
                  | {exp}     l_par exp r_par   
                  ;

separator       = {semicolon} semi;

The AST is quite simple:

Abstract Syntax Tree

grammar           = exp+ ;

exp               =
                    {plus}    [l]:exp  [r]:exp |
                    {mult}    [l]:exp  [r]:exp |
                    {number}  number
                  ;

This example illustrates an unfortunate aspect of SableCC: the complete expression grammar has to be written out as production rules for every level of precedence, with the correct associativity and fixity. Even the example for a simple two-operator grammar is already quite complicated. Also note that every operator (or to be precise every precedence level) requires a separate AST node. A real-life programming language such as C++ or Perl has typically about 30 different operators with more than 10 levels of precedence. Writing an expression grammar for such a set of operators is daunting to say the least.

An expression grammar generator

However, the production rules for expression grammars are quite regular, so it is possible to generate the expression grammar using a script. That means we will be creating a parser generator grammar generator.

Every level of precedence requires a separate production rule and as observed above, because of the limited way in which in SableCC the concrete syntax can be transformed into the abstract syntax, every one of these rules results in a different node in the AST. It is therefore essential that we can not only generate the grammar but also at least a skeleton for the visitor methods for every operator node in the AST. Each of the visitor methods should contain a switch/case statement to perform the correct action for each operator. Unfortunately, Java's switch/case does not allow matching of strings. Therefore we need to map the operators to a unique number.

Perl to the rescue

For jobs like this, I very much prefer Perl over any other language; however, the design of the expression grammar generator is not Perl-specific (nor indeed SableCC-specific), so you can easily port it to your language of choice.

To generate the production rules (and the corresponding tokens, AST and visitor methods), we start from a table that succinctly describes our expression grammar:

my %optable=(
    1 => ['Prefix',['++', '--'],'AssocNone'],
    2 => [' ',['**'],'AssocRight'],
    3 => ['Prefix',['!','~'],'AssocRight'], 
    4 => ['Infix',['*','/','%'],'AssocLeft'], 
    5 => ['Infix',['+','-'],'AssocLeft'], 
    6 => ['Infix',['<<','>>'],'AssocLeft'],
    7 => ['Infix',['<=','>=','>','<'],'AssocNone'],
    8 => ['Infix',['==','!='],'AssocNone'], 
    9 => ['Infix',['&','bitand'],'AssocLeft'],
    10 => ['Infix',['|','^','bitor','xor'],'AssocLeft'],
    11 => ['Infix',['&&','and'],'AssocLeft'],
    12 => ['Infix',['||','or'],'AssocLeft'],
    13 => ['Infix',['='],'AssocRight'],
);

Based on this table, the script will generate the grammar and the corresponding Java code for the nodes from templates (expressions_templ.sablecc3, ExpressionsVisitor_templ.java). The templates contain commented placeholders, e.g. //OP_EXPR_PRODUCTIONS, which are replaced by the code generated by the script.

We create the operator tokens from corresponding helpers as follows:

# Helpers
$line=~/OP_EXPR_HELPERS/ && do { 
    for my $lev (1..$nlevs) {
        my @ops = @{$optable{$lev}->[1]};    
        print $GRAM "ops_l$lev = '".join ("' | '",@ops)."';\n";
    }
};

# Tokens
$line=~/OP_EXPR_TOKENS/ && do { 
    for my $lev (1..$nlevs) {
        my $opname=($lev==$signop)?'sign':"op_l$lev"; # special case: sign
        print $GRAM  "$opname = ops_l$lev;\n";
    }
};

The actual production rules depend on the fixity and associativity. For example, the Infix AssocRight rule template:

                print $GRAM  "{op_l$lev} terml$lm1 op_l$lev term_l$lev  
                {-> New expr.l$lev (op_l$lev,term_l$lm1.expr,term_l$lev.expr) }\n";

where $lm1=$lev-1.

AST generation is equally straightforward:

# AST    
$line=~/OP_EXPR_AST/ && do {
    for my $lev (1..$nlevs) {
        my $opname=($lev==$signop)?'sign':"op_l$lev"; # special case: sign
        my $fixity=$optable{$lev}->[0];
        if ($fixity eq 'Prefix') {
            print $GRAM "                 {l$lev}     [op]:op_l$lev [l]:expr  |\n";
        } elsif ($fixity eq 'Postfix') {
            print $GRAM "                 {l$lev}     [op]:op_l$lev [r]:expr  |\n";
        } else  {
            print $GRAM "                 {l$lev}    [op]:$opname [l]:expr  [r]:expr |\n";
        }
    }
};

We then generate an operator table for the Java class for our AST walker:

# Java operator table

$line=~/OP_EXPR_OPTABLE_ENTRIES/ && do {
    for my $lev (1..$nlevs) {
        my @ops = @{$optable{$lev}->[1]}; 
        my $i=0;
        for my $op (@ops) {
            print $JAVA "\t\toptable.put(\"$op\",$lev$i);\n";
            $i++;
        }
    }
};

Each operator gets a unique number based on its level and its position in the list. The Java methods for each operator precedence level contain a switch/case based on this number.

        print $JAVA "
        public void outAL${lev}Expr(AL${lev}Expr node) {
            String opstr=node.getOp().toString().trim();
            switch (optable.get(opstr)) {
        ";        
        my @ops = @{$optable{$lev}->[1]}; 
        my $i=0;
        for my $op (@ops) {
                print $JAVA "
                case $lev$i: // '$op'
                setNodeInt (node, getNodeInt (node.getL()) $op getNodeInt (node.getR()));
                break;
                ";                
            $i++;                
        }                
        print $JAVA "\n}\n}\n";

For example, a tree walker which performs integer computation has following method for the level-4 node:

    public void outAL4Expr(AL4Expr node) {
        String opstr=node.getOp().toString().trim();
        switch (optable.get(opstr)) {

                case 40: // '*'
                setNodeInt (node, getNodeInt (node.getL()) * getNodeInt (node.getR()));
                break;

                case 41: // '/'
                setNodeInt (node, getNodeInt (node.getL()) / getNodeInt (node.getR()));
                break;

                case 42: // '%'
                setNodeInt (node, getNodeInt (node.getL()) % getNodeInt (node.getR()));
                break;

        }
    }

Trying it out

To try it out, you can run the example, all files are in the archive BlogExample-ExpressionGrammar.zip.

The script is not very generic, all the file names and tables are hardcoded. So you simply run

$ perl expression_parser_generator.pl

This will generate the grammar expressions_gen.sablecc3, so you run sablecc on it:

$ sablecc expression_parser_generator.pl

The example provides a Main.java and a Calculate.java, which is based on the generated ExpressionsVisitor_gen.java. Compile and run:

$ javac Main.java
$ java Main test_expressions.gc

Every expression in the test file should evaluate to 42.

Conclusion

One of the deficiencies of SableCC is the lack of an expression grammar generator. I turns out it is not hard to write one in Perl, you're welcome to it. For full details, see the source code.

Content Request - posted by gotomeeting - 2/10/2011 17:38:09
One of the deficiencies of SableCC is the lack of an expression grammar generator. I turns out it is not hard to write one in Perl, you're welcome to it. For full details, see the source code.

2010-02-11

Writing a compiler using SableCC

In their own words, "SableCC is a parser generator which generates fully featured object-oriented frameworks for building compilers, interpreters and other text parsers. In particular, generated frameworks include intuitive strictly-typed abstract syntax trees and tree walkers. SableCC also keeps a clean separation between machine-generated code and user-written code which leads to a shorter development cycle."

For the Compilers course I teach we use Java as the development language because it is the language our Honours students are most familiar with. For historical reasons, SableCC is used as the parser generator. SableCC 3.2 has many attractive features but good documentation is not really one of them. In this post I want to explain its use by example.

Writing a compiler using SableCC is relatively straightforward (insofar as writing a compiler is ever straightforward ): you specify the tokens and productions of the grammar in an extended BNF-like notation, and annotate the production rules with transformations from concrete to abstract syntax. The abstract syntax tree (AST) is defined using the same syntax as the concrete one, in the same file.

Example

As an example, we will create a parser and emitter for the following C-like function definition funcdef.gc:

unsigned int cmp(int x, int y) {
    if (x==y) { 1 } else { 0 }
}

To be able to parse this code, we need to create the Abstract Syntax Tree and the grammar. We will call our grammar CFundef.

Abstract Syntax Tree (AST)

A minimal AST for parsing this example can be defined in SableCC like this:

Abstract Syntax Tree

program = expr+ ;
expr =    {bind} bind_expr
        | {pure} pure_expr
        ;

bind_expr = {fundef} [ftype]:c_int_type [fname]:identifier [fargstypes]:funcarg+ 
                     [fbody]:expr+
        ;
pure_expr =
          {cmp}   [op]:cmp [args]:pure_expr+ 
        | {cond}  [pred]:pure_expr [iftrue]:expr+ [iffalse]:expr+  
        | {number} number
        | {var} identifier
        ;            

funcarg = {argtup} [argtype]:c_int_type [argvar]:identifier;

c_int_type       =  signedness? int;
signedness     = {sig} signed | {unsig} unsigned;

Briefly, + means 1 or more, * 0 or more, ? 0 or 1 and | separates alternatives of a rule. The labels enclosed in braces, e.g. {bind}, identify the alternatives; the labels in brackets, e.g. [ftype]:, identify the token after the colon. The AST defines a program as a list of expressions expr. An expr is either a binding expression bind or a pure expression. A binding expression binds something to a variable, in Gannet a named function is implemented by binding a lambda function (anonymous function) to a variable. A pure expression is any expression that is not a bind expression.
For this simple example, the only bind expression is fundef. The AST node contains the return type of the function (c_int_type), its name, the argument list which consist of typed identifiers (funcarg), and the function body, which consists of a list of expressions. A pure expression can be a comparison operation (x==y), an if-then-else block, a number or a variable.

Identifiers not defined in the AST (e.g. "cmp", "identifier" and "int" must be terminals of the grammar, i.e. tokens:

Tokens

comma = ',';    
semi = ';';
cmp = '==';
l_par = '(';
r_par = ')';
l_brace = '{';
r_brace = '}';
if = 'if';
else = 'else';
int = 'int';
unsigned = 'unsigned';    
number = decimal_constant;
identifier = noncap (digit | nondigit)*;
blank = blank;    
comment = comment;

Unquoted identifiers on the right-hand side of the token definitions are defined as regular expressions in the Helpers section of the grammar.
SableCC conveniently lets us specify which tokens can be ignored:

Ignored Tokens

blank,
comment;

Concrete syntax of the language

Assuming that our program will only exists of one or more function definitions, the production rules become

Productions

program         = expr_list separator?   {-> New program ([expr_list.expr])} ;                          

expr_list          {-> expr*} =
                    {list}    expr_list separator expr 
                    {-> [expr_list.expr, expr.expr] }
                  | {single}  expr      {-> [expr.expr] }
                  ;

pure_expr_list  {-> pure_expr*} =
                    {list} pure_expr_list separator pure_expr 
                    {-> [pure_expr_list.pure_expr,pure_expr.pure_expr]}
                |   {single} pure_expr {-> [pure_expr.pure_expr]}
                ;

expr = 
                    {bind} bind_expr 
                |   {pure} pure_expr
                ;

bind_expr  = {fundef} function_definition {-> function_definition.bind_expr};

function_definition {-> bind_expr} = 
{fundef} c_int_type identifier l_par arg_decl_list r_par 
         l_brace expr_list separator? r_brace 
{-> New bind_expr.fundef (c_int_type,identifier,[arg_decl_list.funcarg]
,[expr_list.expr]) };

arg_decl_list {-> funcarg* } =    
                             {alist}    arg_decl_list list_sep func_arg 
                             {-> [arg_decl_list.funcarg,func_arg.funcarg]}
                           | {asingle}  func_arg {-> [func_arg.funcarg]}                
                           ;

func_arg {-> funcarg } = c_int_type identifier 
                        {-> New funcarg.argtup (c_int_type,identifier)};

pure_expr {-> pure_expr} = 
      {cmp} [left]:atom_pure_expr [op]:cmp [right]:atom_pure_expr    
      {-> New pure_expr.cmp (op,[left.pure_expr,right.pure_expr]) }
    | {atom} atom_pure_expr {-> atom_pure_expr.pure_expr };             

atom_pure_expr {-> pure_expr} =
                  {cond} if l_par [pred]:pure_expr r_par [lbt]:l_brace 
                  [iftrue]:expr_list [s1]:separator? 
                  [rbt]:r_brace else [lbf]:l_brace 
                  [iffalse]:expr_list [s2]:separator? 
                  [rbf]:r_brace 
                {-> New pure_expr.cond (pred.pure_expr,[iftrue.expr],[iffalse.expr])}
                | {number}  number              {-> New pure_expr.number(number) }
                | {var} identifier               {-> New pure_expr.var(identifier) }
                  ;                      

c_int_type  =  signedness? int ;
signedness     = {sig} signed | {unsig} unsigned;

separator {-> } = semi {-> };                             
list_sep {-> } = comma  {-> };

Observe how lists are defined recursively: removing the annotations, an expression list is defined as:

expr_list = expr_list separator expr | expr;

The annotations {-> ...} define conversions between the concrete syntax of the grammar as expressed using the Productions and the abstract syntax as defined by the AST. Every production rule is on the left-hand side annotated with the corresponding rule in the AST, for example

arg_decl_list {-> funcarg* }

means that the list of argument declarations will be mapped to a list of funcarg. The annotations on the right-hand side express the conversion for every alternative of the production, for example

c_int_type identifier {-> New funcarg.argtup (c_int_type,identifier)};

creates a new leaf node in the AST. For productions that include other productions, the annotation does not explicity create a new node but calls the annotation of the nested production. For example, in the arg_decl_list production the alternative asingle

{asingle}  func_arg {-> [func_arg.funcarg]}

is mapped to a one-element list which contains a call to the right-hand side transformation rule of func_arg. When the mapping between the concrete syntax and AST is direct, e.g. for expr or c_int_type, the annotations can be omitted.

Generating the Java parser

Running sablecc on this grammar file

$ sablecc c_style_funcdef.sablecc3

results in generation of a set of Java classes implementing the lexer, parser, AST and tree walkers. The class hierarchy starts with the name defined using the Package declaration in the grammar file, in our case CFundef:

CFundef/
    analysis/
    lexer/
    node/
    parser/

In practice,the directories that contain classes of interest to us are analysis and node. The latter contains classes implementing all nodes of the AST. The names of the classes are composed as follows: 'T' for terminals (Tokens), 'P' for Productions, 'A' for alternatives of a rule; then the name of the token or AST node in camelcase, e.g. TCmp, PExpr or AFundefBindExpr.

The Main program

So how do we use these classes? First of all, here is some boilerplate for the Main program:

/*
 * WV: a parser for CFundef, emitting Gannet intermediate representation
 * */

import java.io.FileReader;
import java.io.PushbackReader;
import java.io.BufferedReader;

import CFundef.parser.*;
import CFundef.lexer.*;
import CFundef.node.*;


public class Main {

    public static void main(String arguments[]) {

        boolean verbose = false;
        if (arguments.length < 1) {
            System.out.println("usage: cfundefc [.gc file]");
            System.exit(1);
        }

        try {
            FileReader infile = new FileReader(arguments[0]);
            Lexer l = new Lexer(new PushbackReader(new BufferedReader(infile),1024));
            Parser p = new Parser(l);
            Start tree = p.parse();
            // The next two lines are the only non-generic ones
            Emitter emitter = new Emitter();
            tree.apply(emitter);

        } catch (Exception e) {
            throw new RuntimeException("\n"+e.getMessage());
        }
    }
}

The only non-generic lines in the above code are the instantiation of the Emitter class

Emitter emitter = new Emitter();

and the application of the emitter to the AST

tree.apply(emitter);

This means that all we have to do is create the Emitter class.

DepthFirstAdapter and ReversedDepthFirstAdapter

SableCC provides the DepthFirstAdapter and ReversedDepthFirstAdapter classes which perform a recursive descent on the AST. The DepthFirstAdapter traverses lists of child nodes in normal order, the ReversedDepthFirstAdapter in reverse order. The latter is useful e.g. when generating assembly code for pushing arguments onto a stack.

Now a word on the structure of the methods in the DepthFirstAdapter class. The class contains methods called in, out and case followed by the name of the AST node. Let's first look at a simple node: the ANumberPureExpr node, that is the node that represents an integer number in our AST.

public void inANumberPureExpr(ANumberPureExpr node)
{
    defaultIn(node);
}

public void outANumberPureExpr(ANumberPureExpr node)
{
    defaultOut(node);
}

public void caseANumberPureExpr(ANumberPureExpr node)
{
    inANumberPureExpr(node);
    if(node.getNumber() != null)
    {
        node.getNumber().apply(this);
    }
    outANumberPureExpr(node);
}

The case method is the one that performs the actual tree traversal by calling the apply() method on every child of the node. In the case of the ANumberPureExpr, the only child is of type TNumber, i.e. the token representing the number. We can see this from the class definition (in CFundef/node/ANumberPureExpr.java); the node TNumber represents a terminal, i.e. a leaf in the AST.

public final class ANumberPureExpr extends PPureExpr
{
    private TNumber _number_;


    public TNumber getNumber()
    {
        return this._number_;
    }

    // ...
}

The caseANumberPureExpr method will first call the inANumberPureExpr method, then descend into the child nodes, return and apply the outANumberPureExpr method. By default, the in and out methods call the class-global methods defaultIn and defaultOut, which do nothing. This makes it very easy to apply a particular operation to every node in the AST. For example, redefining the defaultIn method to print out the stringified representation of the node (using the toString method) will result in a print-out of the complete AST.

For a more complex node such as ACondPureExpr, which represents the if-then-else construct, the in and out nodes are quite the same as for the outANumberPureExpr;

public void inACondPureExpr(ACondPureExpr node)
{
    defaultIn(node);
}

public void outACondPureExpr(ACondPureExpr node)
{
    defaultOut(node);
}

The case method is only slightly more complicated. It walks throught he predicate (Pred), the Iftrue and Iffalse expressions in turn; both the latter are lists of expressions, and as we use the DepthFirstAdapter, the lists are traversed in normal order.

public void caseACondPureExpr(ACondPureExpr node)
{
    inACondPureExpr(node);
    if(node.getPred() != null)
    {
        node.getPred().apply(this);
    }
    {
        List<PExpr> copy = new ArrayList<PExpr>(node.getIftrue());
        for(PExpr e : copy)
        {
            e.apply(this);
        }
    }
    {
        List<PExpr> copy = new ArrayList<PExpr>(node.getIffalse());
        for(PExpr e : copy)
        {
            e.apply(this);
        }
    }
    outACondPureExpr(node);
}

Creating the Emitter class

For this example we want to generate Gannet code corresponding from the Gannet-C function definition, and we choose the DepthFirstAdapter. Consequently, our Emitter class looks like

import CFundef.node.*;
import CFundef.analysis.*;
import java.util.*;

public class Emitter extends DepthFirstAdapter {
    // emitter methods for each node    
}

The Emitter class simply overrides the methods for every node in the AST that needs to be emitted. Gannet has a Scheme-like syntax, which means that every expression is simply a whitespace-separated list enclosed in parentheses. Apart from that, expressions in Gannet can be quoted using a prefix single quote. This is the emitter code for the function definition node:

public void inAFundefBindExpr(AFundefBindExpr node)
{
    emit("'(assign '"+node.getFname().toString()+" (lambda ");

    for ( PFuncarg arg_type: node.getFargstypes()) {
        AArgtupFuncarg arg = (AArgtupFuncarg)arg_type;
        emitnnl("'"+arg.getArgvar().toString()+" ");    
    }            
    emitnnl("'");        
}

public void outAFundefBindExpr(AFundefBindExpr node)
{
    emitcp("FunDef-lambda");
    emitcp("Fundef-assign");        
}

The emit and emitnnlmethods prints the string with the correct indentation; emitnnl does not print a newline. The emitcp method prints the closing parenthesis. In the inAFundefBindExpr method we create the assignment expression "(assign '"+node.getFname().toString() and the lambda expression, which consists of the quoted list of the function arguments. Gannet is untyped so the type information is simply discarded. Obviously, in a proper compiler there will be a separate pass to perform type checking and catch type errors before the emitter pass.
A few things may need some clarification here: first of all, for every rule with alternatives SableCC generates a P-type class. Now this is an abstract class from which the concrete alternative classes are derived. For example the PExpr class:

public abstract class PExpr extends Node {}

Therefore we must cast the arg_type to the type of the concrete class, in casu AArgtupFuncarg:

AArgtupFuncarg arg = (AArgtupFuncarg)arg_type;

Furthermore, we emit the code for AArgtupFuncarg in the inAFundefBindExpr method, this means that we don't need a specialised inAArgtupFuncarg method. However, we don't generate any code for the function body in inAFundefBindExpr. The caseAFundefBindExpr inherited from the DepthFirstAdapter class will descend in the tree and emit the code for expression making up the function body. In our example this is an if-then expression, so we need to provide a specialised emitter method for ACondPureExpr:

public void inACondPureExpr(ACondPureExpr node)
{
    annotate("If-Else");
    emit("(if"); 
}

public void outACondPureExpr(ACondPureExpr node)
{
    emitcp("If-Else");
}

public void caseACondPureExpr(ACondPureExpr node)
{
    inACondPureExpr(node);
    if(node.getPred() != null)
    {
        node.getPred().apply(this);
    }
    {
        List<PExpr> copy = new ArrayList<PExpr>(node.getIftrue());
        boolean lett=false;
        if (copy.size()>1) {
            lett=true;
            emit("'(let");
        } else {
                emitnnl("'(return ");
        }           

        for(PExpr e : copy)
        {
            e.apply(this);
        }
        if (lett==true) {
            emitcp("let"); // for let
        } else {
                emitcp("If-return");
        }            
    }
    {
        List<PExpr> copy = new ArrayList<PExpr>(node.getIffalse());
        boolean letf=false;
        if (copy.size()>1) {
            letf=true;
            emit("'(let");
        } else {
                emitnnl("'(return ");
        }           

        for(PExpr e : copy)
        {
            e.apply(this);
        }
        if (letf==true) {
            emitcp("let"); // for let
        } else {
                emitcp("If-return");
        }            
    }
    outACondPureExpr(node);
}

This is an example of a case where we specialise the caseACondPureExpr method. We have to do so because we need to emit code in between the constituents of the if-then expression: if the Iftrue or Iffalse block consists of a single expression, we emit a (return ), otherwise we wrap the list of expressions in a (let ). This cannot be achieved through the inACondPureExpr and outACondPureExpr methods.

Building and running the compiler

Assuming your Main.java and Emitter.java are in the same directory as the generated CFundef, compiling is simple:

$ javac -cp . Main.java

Running the example

$ java Main funcdef.gc

results in the Gannet intermediate representation code

(let
    '(assign 'cmp  (lambda 
        'x  'y  '(if
                        (== x y )
                        '(return '1 )
                        '(return '0 )
                    )
                )
            )
        )

Conclusion

With this post I wanted to give an overview of the basic steps in writing a compiler using SableCC. Of course a full-blown compiler will have a much more complicated grammar and require several more passes. However, I hope this helps as a starter.

For full details, see the source code of the example, including the generated SableCC classes

2009-12-07

Building SystemC-2.2 on Snow Leopard

Upgrading to Snow Leopard on my MacBook Pro broke my SystemC builds. The reason is that Snow Leopard is by default a 64-bit OS, and the build tools default to 64 bits. But SystemC only works in 32-bit mode. The solution is simple enough, just add the following flag to your compilation and link commands:

-arch i386

I use SCons to build my SystemC apps, so I added this flag to CXXFLAGS and LINKFLAGS.

If you upgraded from Leopard and did not rebuild SystemC itself, that's it.

Building SystemC itself is slightly more complicated: running configure results in an error: cannot not guess build type. To fix that, specify --build=i386-pc-macosx. However, the build will then fail on the line

as -o qtmds.o qtmds.s -I. \ 
-I../../../../src/sysc/qt -I../../../../src

The QuickThreads library uses an assembly file i386.s but the assember assumes its x86_64. To fix this, you have to pass the arch flag to the assembler. To do so, the best way is to edit configure.in. Look for i386-pc-macosx and add the -arch i386 flags as follows:

--- configure.in.ORIG   2008-11-05 10:43:46.000000000 +0000
+++ configure.in    2009-12-06 11:21:10.000000000 +0000
@@ -148,7 +148,7 @@
     i386-pc-macosx*)
         case "$CXX_COMP" in
             c++ | g++)
-                EXTRA_CXXFLAGS="-Wall"
+                EXTRA_CXXFLAGS="-Wall -arch i386"
                 DEBUG_CXXFLAGS="-g"
                 OPT_CXXFLAGS="-O3"
                 TARGET_ARCH="macosx"
@@ -159,7 +159,7 @@
                 AC_MSG_ERROR("sorry...compiler not supported")
        ;;
         esac
-   AS="as"
+   AS="as -arch i386"
         QT_ARCH="iX86"
         ;;

Now run autoconf to generate a new configure file. Run configure in objdir:

../configure --build=i386-pc-macosx

I verified the build with make check and it passes all tests.

2009-10-29

Greener Search with FPGAs

Last year, the Austrian company MatrixWare and the Information Retrieval Facility (IRF) approached Leif Azzopardi and me to create a Proof-of-Concept of an FPGA-based information retrieval system. We built the system on SGI Altix 4700 with two RC-100 FPGA blades using a high-level programming solution from the Swedish company Mitrionics and demonstrated it successfully during the Information Retrieval Facility Symposium. As usual, the academic output lagged somewhat but this summer we had two highly succesful papers, one at SIGIR and one at FPL. There is clearly a huge potential in the use of FPGAs to reduce energy consumption of a wide range of systems and we are actively seeking collaborations to explore the possibilities.

2009-08-04

Ruby to SystemC Translator

The reference implementation of the Gannet Platform is written in Ruby. The choice of Ruby as a system-level prototyping language might seem odd, but its advantages are that it is clear and easy to learn, with a simple but powerfull object model.

The reference model is the specification for the Gannet Platform, and as a specification language Ruby has proven an excellent choice. By restricting the coding style to non-idiomatic and adding type annotations in comments, it was possible to translate the Ruby code to C++ and from there to SystemC, the de facto standard for system-level modelling, using a relatively straightforward translator written in Perl.

The very fact that it is possible to translate Ruby to C++ is of course the key justification for the choice. The current translator is far from perfect, and translation to SystemC will never be completely automatic. However, the system is already powerfull enough to create the expression component of the Gannet SystemC model (i.e. the actual implementation of all methods) from the Ruby specification. The coordination component (ports, exports) has to be created manually, but as this part of the code does not change often, the productivity gain is still considerable.

The translator is part of the Gannet source code, www.gannetcode.org.

2009-08-03

First Release of the Gannet SoC Platform

The source code and documentation for the alpha release of the Gannet System-on-Chip Platform is now available on the Gannet project web site. The release includes a compiler for the Gannet language (written in Haskell), a reference implementation of the platform in Ruby, a C++ Virtual Machine and a SystemC system-level model of the hardware platform.

powered by Blosxom