From: Will Partain <partain@dcs.gla.ac.uk>

(Up to Haskerl index.)
To: haskell, comp.lang.functional, comp.lang.perl
Subject: a note on the Haskerl extension to Haskell [long]
Date: Thu, 01 Apr 93 10:06:58 +0100
From: Will Partain <partain@dcs.gla.ac.uk>

Non-strict, purely-functional languages, such as Haskell [1], are
perceived to be inadequate for everyday, get-the-job-done tasks; in
particular, they are seen to be "bad at I/O".  Consequently, an
informal working group has been designing an extended variant of
Haskell to address these requirements, while remaining within a
non-strict, purely-functional framework.

The Perl language [2] is nothing if not "good for everyday,
get-the-job-done" tasks -- it puts UNIX at the programmer's
fingertips.  Furthermore, perl is widely known, so we have chosen to
follow its lead, rather than invent our extensions ex nihilo.  (For
our initial version, we do not mind a UNIX-biased design.)

What follows is an *informal* note about what we call the "Haskerl"
extension to Haskell.  We especially welcome "major-gripe" comments
now, so that we can consider them before the first complete Haskerl
definition (sometime after FPCA, we hope).

W. Partain, group scribe (partain@dcs.glasgow.ac.uk)
[full working group listed in the document]

.................................................................

Executive summary
~~~~~~~~~~~~~~~~~

For the disaffected Haskell programmer, we provide: regular
expressions, at-your-fingertips access to UNIX features, and
shorthand-laden figure-out-what-I-mean (FWIM) syntax.

For the disaffected Perl programmer, we provide: lazy evaluation,
referential transparency, Hindley-Milner type checking, polymorphism,
a rich set of builtin data types, and (recursive) user-defined data
types.

We achieve this, essentially, by adding Perl-like features to Haskell.

Background
~~~~~~~~~~

To keep matters short, we assume basic familiarity with both Haskell
and Perl.  One additional non-standard Haskell idea we use heavily
herein is "monadic I/O" [3].

You may think of something of type "IO a" as being "a sequenced action
which eventually returns something of type "a".  A main program is of
type "IO ()".  The main functions used are "thenIO", "thenIO", and
"returnIO", thusly (for example):

    main  = io_action_1	`thenIO` ( \ result_1 ->
	    io_action_2	`thenIO_`
	    io_action_3 `thenIO` ( \ (_, _, result_3) ->
	    exitIO ( result_1 + result_3 )
	    ))

`thenIO_` is just shorthand for "`thenIO` ( \ _ ->".

The above might be written in pseudo-C as:

    main () {
	result_1 = io_action_1;
	io_action_2;
	(_, _, result_3) /* yes, right... */ = io_action_3;
	exit (result_1 + result_3);
    }

The above is all fine, except the syntax leaves something to be
desired.  That is one of the major points that Haskerl addresses.

Syntax for monadic I/O
~~~~~~~~~~~~~~~~~~~~~~

We steal back the {, }, and ; characters from if-you-don't-use-layout,
and use them instead for monad comprehensions of type "IO a".  Braces
delimit such a comprehension.  Semi-colon (;) is now, roughly,
"thenIO".

The example in the previous section may be written as:

    main = {
	result_1 <- io_action_1 ;
	io_action_2 ;
	(_, _, result_3) <- io_action_3 ;
	exitIO (result_1 + result_3)
    }

Which brings us to our next point ...

FWIM syntax (IO monad comprehensions)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The careful Haskell-aware reader of the example in the previous
section will note that the types really don't "work out".  For
example, semi-colon ("thenIO") gives the appearance of working whether
or not an "io_action" gives a result or not.

Things do work because Haskerl takes Perl's approach of
figure-out-what-I-mean (FWIM) syntax.  So: the general form of an IO
monad comprehension is:

    { io_thing_1 ; ... ; io_thing_N | result_value }

If an io_thing is of the form "result <- io_action", then the
subsequent semi-colon really is equivalent to "thenIO"; however, if
the io_thing is of the form "io_action" (no result bound), then the
semi-colon is equivalent to "thenIO_" (ignore-the-result variant).

Further, if an io_thing turns out to have type "[IO a]" (list of I/O
actions), then suitable wrapping will be inserted to deal with that
("listIO io_action `thenIO` ( \ result -> ... ", for those who care).

Finally, if the comprehension does not have an explicit
"result_value", the result from the last io_thing will be returned; if
the last io_thing has no result, then () will be returned (giving the
comprehension the type "IO ()", which is the type of a main program,
incidentally).

FWIM syntax ("if", for example)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FWIM syntax for monad comprehensions isn't quite enough for Perl-like
programming convenience; in particular, a few major control functions
need some special pleading.

"if", for example, normally has the type "Bool -> a -> a -> a".
However, that is inadequate if we want to do something like ...

    if -f filename_string { -- check if a file exists; this is an IO action!
	handle <- open filename_string
    }
    -- (it would be tiresome if we had to put on an "else")

(Obviously, this example could be written in some longer, tedious way;
but the whole point of Haskerl is to remove lengthy tedium.)

Therefore, Haskerl recognises "if"s of type "IO Bool -> a -> a", and
makes the appropriate adjustments.  Also, if there is no `else' branch
but the `then' branch is of type "IO b", then a suitable no-op IO
action will be inserted as the `else'.

Similar manipulations are done to provide convenient "foreach",
"while", &&, and || constructs.  So, for example, both of these
expressions would typecheck and make sense:

    { (handle, pid) <- open "last |"
		    	|| die "couldn't start `last' pipe!\n" }

    if a == b || c == d then 1 else 0	-- as in straight Haskell

(Polymorphic) Regular expressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the greatest conveniences of Perl is the at-your-fingertips
access to regular-expression (regexp) manipulation and use.  Haskerl
goes one better, by having *polymorphic* regexp facilities, though it
retains special syntax for Strings-only regexp work.

Again, we proceed by example.  Suppose we have data type:

    data Foo a b = Foo1 a Int | Foo2 b b | Foo3 Bool (Foo a b)

We might then match against a list of Foos (type "[Foo]") as follows:

    case expr of
      /^{Foo1 _ {4}}({Foo2 _ _})+{Foo1 _ _}?.$/ -> ...

Explanation (which assumes you mostly know how regexps work in Perl):

* Braces are used to delimit the constructors being matched; note the
  braces around the {4}, which indicate a numeric match there (an
  unadorned 4 indicates the *character* '4').

* ^ matches the beginning of a list; $ matches the end; . matches any
  constructor; (<regexp>) [parens] for grouping; <regexp>+ for one or
  more <regexps>; <regexp>* for zero or more; <regexp>? for zero or
  one.  I.e., the usual stuff.

  Classes (e.g., /[{Foo1 _ _}-{Foo2 _ _}]/) work, provided the
  relevant types ("Foo" in the example) are in class "Ord".

* As indicated by the wildcards, you *cannot* bind names to the fields
  of the matched data constructors.

  HOWEVER, the parts of the list matched by the parenthesized regexps
  are bound to the names $1, $2, ... $n, where $i refers to the part
  matched by the i-th regexp.  (Again, the usual stuff).  For Perl
  groupies, the names $@, $`, and $' are bound to: the whole matched
  bit, the part of the list *before* the matched bit, and the part
  after the matched bit.

* Because we still expect String regexp-matching to be the predominant
  use, we provide special pleading for it.  Most notably, an unadorned
  letter 'x' in a regexp is short for {MkChar x#}, where x# is
  Haskell's idea of the unboxed-character bit-pattern.

  All the other String regexp magic you might expect also exists: \n,
  \b, \w, \S, etc., etc.

Interpolation
~~~~~~~~~~~~~

For convenience, Haskerl allows interpolation of values into strings,
lists, "backticked" IO actions, and regular expressions.

Strings: If "val" is of type "String", then it can be interpolated
into a literal string, as in:

	"The quick brown ${val} jumped over the lazy dog\n"

(Unlike in Perl, the braces around the variable name are *not*
optional.)

Lists: If "val" is of type "[a]", then it may be interpolated into a
list of the same type; e.g.:

	[1, 3, 5, ${val}, 11, 13]

Backticked I/O actions:  The expression

	`cat /etc/passwd`

has the type "IO String".  Such expressions can have strings
interpolated into them, e.g.,:

	`cat ${filename}`

Regexps: A regular expression may have a value interpolated into it,
as in (this regexp matches a list of numbers):

	/{0}*(${val})/

Unlike in Perl, Haskerl does *not* interpolate, then go looking for
metacharacters (parens, ., $, etc.).  While convenient, it just
doesn't make sense for polymorphic regexps.

Builtins for UNIX
~~~~~~~~~~~~~~~~~

Perl has a rich set of built-in operators for talking to UNIX -- open,
chmod, getpwent, print, etc., etc., etc.  Furthermore, there is quite
a bit of FWIM syntax associated with various of these.  So, for
example, it knows that

    print;

"really" means

    print STDOUT $_;

Haskerl will have a similar and similarly-rich set of UNIX-friendly
builtins (mostly of type "IO <something>").  And there will be FWIM
fiddling, as above, when merited.

We haven't finished designing the set of builtins, plus associated
rules, and the whole list would be too long for this note anyway.
However, our intention should be clear from the above example.  (We
will post a separate note on the topic, as soon as we can.)

(Associative) arrays
~~~~~~~~~~~~~~~~~~~~

Associative arrays (and normal Int-indexed arrays) are a powerful
feature of Perl.  Both are a non-big deal in Haskerl, because of the
data structuring facilities of Haskell (which already has Arrays as a
builtin type).

Therefore, the associative-array datatype in Haskerl looks like

    data (Hashable a) => AssocArray a b	-- abstract

(with a suitable set of operations to go with it).  Note that
associative arrays may have keys of any type in class "Hashable", not
just Strings.

The only special pleading for associative arrays in Haskerl is that
the syntax "assoc_array{key}" is provided for indexing.

Implementation
~~~~~~~~~~~~~~

There is obviously no implementation of Haskerl yet, but the major
Haskell implementors have expressed interest in providing some
support.  Simon Peyton Jones (Glasgow) says, "The next release of our
compiler will have experimental support; just use the -erl flag."
Lennart Augustsson (Chalmers) writes, "I haven't started adding
Haskerl to LML/HBC, but it shouldn't take long.  By FPCA definitely."
John Peterson (Yale) indicated that support in their system will
depend on your LISP implementation (which sits under Yale Haskell).

We are therefore optimistic that there will be timely opportunity to
try out the Haskerl ideas before they are completely set in concrete.

.................................................................

References:

[1] P. Hudak, S. Peyton Jones, and P. Wadler (editors), Report on the
    functional programming language Haskell, SIGPLAN Notices, 27, 5,
    May, 1992.

[2] L. Wall and R. L. Schwartz, Programming perl, O'Reilly and
    Associates, Sebastopol, California, 1991.

[3] S. Peyton Jones and P. Wadler, "Imperative functional
    programming", 20th ACM Symposium on Principles of Programming
    Languages (POPL), Charleston, South Carolina, January, 1993.

.................................................................

Working group participants:

E. Biagioni (CMU)
J. D. Brock (UNC-Asheville)
W. S. Gibson (HP Labs)
D. Howe (Imperial)
J. P. M. Hultquist (NASA, Ames)
D. Kolo (Harvard)
W. Partain (Glasgow) -- scribe
B. T. Smith (Quintus)
D. Wakeling (York) -- chair
(Back to Haskerl index.)
Will Partain, partain@dcs.gla.ac.uk; 1998-03-07.