To: haskell, comp.lang.functional, comp.lang.perl Subject: a note on the Haskerl extension to Haskell [long] Date: Thu, 01 Apr 93 10:06:58 +0100 From: Will Partain <partain@dcs.gla.ac.uk> Non-strict, purely-functional languages, such as Haskell [1], are perceived to be inadequate for everyday, get-the-job-done tasks; in particular, they are seen to be "bad at I/O". Consequently, an informal working group has been designing an extended variant of Haskell to address these requirements, while remaining within a non-strict, purely-functional framework. The Perl language [2] is nothing if not "good for everyday, get-the-job-done" tasks -- it puts UNIX at the programmer's fingertips. Furthermore, perl is widely known, so we have chosen to follow its lead, rather than invent our extensions ex nihilo. (For our initial version, we do not mind a UNIX-biased design.) What follows is an *informal* note about what we call the "Haskerl" extension to Haskell. We especially welcome "major-gripe" comments now, so that we can consider them before the first complete Haskerl definition (sometime after FPCA, we hope). W. Partain, group scribe (partain@dcs.glasgow.ac.uk) [full working group listed in the document] ................................................................. Executive summary ~~~~~~~~~~~~~~~~~ For the disaffected Haskell programmer, we provide: regular expressions, at-your-fingertips access to UNIX features, and shorthand-laden figure-out-what-I-mean (FWIM) syntax. For the disaffected Perl programmer, we provide: lazy evaluation, referential transparency, Hindley-Milner type checking, polymorphism, a rich set of builtin data types, and (recursive) user-defined data types. We achieve this, essentially, by adding Perl-like features to Haskell. Background ~~~~~~~~~~ To keep matters short, we assume basic familiarity with both Haskell and Perl. One additional non-standard Haskell idea we use heavily herein is "monadic I/O" [3]. You may think of something of type "IO a" as being "a sequenced action which eventually returns something of type "a". A main program is of type "IO ()". The main functions used are "thenIO", "thenIO", and "returnIO", thusly (for example): main = io_action_1 `thenIO` ( \ result_1 -> io_action_2 `thenIO_` io_action_3 `thenIO` ( \ (_, _, result_3) -> exitIO ( result_1 + result_3 ) )) `thenIO_` is just shorthand for "`thenIO` ( \ _ ->". The above might be written in pseudo-C as: main () { result_1 = io_action_1; io_action_2; (_, _, result_3) /* yes, right... */ = io_action_3; exit (result_1 + result_3); } The above is all fine, except the syntax leaves something to be desired. That is one of the major points that Haskerl addresses. Syntax for monadic I/O ~~~~~~~~~~~~~~~~~~~~~~ We steal back the {, }, and ; characters from if-you-don't-use-layout, and use them instead for monad comprehensions of type "IO a". Braces delimit such a comprehension. Semi-colon (;) is now, roughly, "thenIO". The example in the previous section may be written as: main = { result_1 <- io_action_1 ; io_action_2 ; (_, _, result_3) <- io_action_3 ; exitIO (result_1 + result_3) } Which brings us to our next point ... FWIM syntax (IO monad comprehensions) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The careful Haskell-aware reader of the example in the previous section will note that the types really don't "work out". For example, semi-colon ("thenIO") gives the appearance of working whether or not an "io_action" gives a result or not. Things do work because Haskerl takes Perl's approach of figure-out-what-I-mean (FWIM) syntax. So: the general form of an IO monad comprehension is: { io_thing_1 ; ... ; io_thing_N | result_value } If an io_thing is of the form "result <- io_action", then the subsequent semi-colon really is equivalent to "thenIO"; however, if the io_thing is of the form "io_action" (no result bound), then the semi-colon is equivalent to "thenIO_" (ignore-the-result variant). Further, if an io_thing turns out to have type "[IO a]" (list of I/O actions), then suitable wrapping will be inserted to deal with that ("listIO io_action `thenIO` ( \ result -> ... ", for those who care). Finally, if the comprehension does not have an explicit "result_value", the result from the last io_thing will be returned; if the last io_thing has no result, then () will be returned (giving the comprehension the type "IO ()", which is the type of a main program, incidentally). FWIM syntax ("if", for example) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FWIM syntax for monad comprehensions isn't quite enough for Perl-like programming convenience; in particular, a few major control functions need some special pleading. "if", for example, normally has the type "Bool -> a -> a -> a". However, that is inadequate if we want to do something like ... if -f filename_string { -- check if a file exists; this is an IO action! handle <- open filename_string } -- (it would be tiresome if we had to put on an "else") (Obviously, this example could be written in some longer, tedious way; but the whole point of Haskerl is to remove lengthy tedium.) Therefore, Haskerl recognises "if"s of type "IO Bool -> a -> a", and makes the appropriate adjustments. Also, if there is no `else' branch but the `then' branch is of type "IO b", then a suitable no-op IO action will be inserted as the `else'. Similar manipulations are done to provide convenient "foreach", "while", &&, and || constructs. So, for example, both of these expressions would typecheck and make sense: { (handle, pid) <- open "last |" || die "couldn't start `last' pipe!\n" } if a == b || c == d then 1 else 0 -- as in straight Haskell (Polymorphic) Regular expressions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ One of the greatest conveniences of Perl is the at-your-fingertips access to regular-expression (regexp) manipulation and use. Haskerl goes one better, by having *polymorphic* regexp facilities, though it retains special syntax for Strings-only regexp work. Again, we proceed by example. Suppose we have data type: data Foo a b = Foo1 a Int | Foo2 b b | Foo3 Bool (Foo a b) We might then match against a list of Foos (type "[Foo]") as follows: case expr of /^{Foo1 _ {4}}({Foo2 _ _})+{Foo1 _ _}?.$/ -> ... Explanation (which assumes you mostly know how regexps work in Perl): * Braces are used to delimit the constructors being matched; note the braces around the {4}, which indicate a numeric match there (an unadorned 4 indicates the *character* '4'). * ^ matches the beginning of a list; $ matches the end; . matches any constructor; (<regexp>) [parens] for grouping; <regexp>+ for one or more <regexps>; <regexp>* for zero or more; <regexp>? for zero or one. I.e., the usual stuff. Classes (e.g., /[{Foo1 _ _}-{Foo2 _ _}]/) work, provided the relevant types ("Foo" in the example) are in class "Ord". * As indicated by the wildcards, you *cannot* bind names to the fields of the matched data constructors. HOWEVER, the parts of the list matched by the parenthesized regexps are bound to the names $1, $2, ... $n, where $i refers to the part matched by the i-th regexp. (Again, the usual stuff). For Perl groupies, the names $@, $`, and $' are bound to: the whole matched bit, the part of the list *before* the matched bit, and the part after the matched bit. * Because we still expect String regexp-matching to be the predominant use, we provide special pleading for it. Most notably, an unadorned letter 'x' in a regexp is short for {MkChar x#}, where x# is Haskell's idea of the unboxed-character bit-pattern. All the other String regexp magic you might expect also exists: \n, \b, \w, \S, etc., etc. Interpolation ~~~~~~~~~~~~~ For convenience, Haskerl allows interpolation of values into strings, lists, "backticked" IO actions, and regular expressions. Strings: If "val" is of type "String", then it can be interpolated into a literal string, as in: "The quick brown ${val} jumped over the lazy dog\n" (Unlike in Perl, the braces around the variable name are *not* optional.) Lists: If "val" is of type "[a]", then it may be interpolated into a list of the same type; e.g.: [1, 3, 5, ${val}, 11, 13] Backticked I/O actions: The expression `cat /etc/passwd` has the type "IO String". Such expressions can have strings interpolated into them, e.g.,: `cat ${filename}` Regexps: A regular expression may have a value interpolated into it, as in (this regexp matches a list of numbers): /{0}*(${val})/ Unlike in Perl, Haskerl does *not* interpolate, then go looking for metacharacters (parens, ., $, etc.). While convenient, it just doesn't make sense for polymorphic regexps. Builtins for UNIX ~~~~~~~~~~~~~~~~~ Perl has a rich set of built-in operators for talking to UNIX -- open, chmod, getpwent, print, etc., etc., etc. Furthermore, there is quite a bit of FWIM syntax associated with various of these. So, for example, it knows that print; "really" means print STDOUT $_; Haskerl will have a similar and similarly-rich set of UNIX-friendly builtins (mostly of type "IO <something>"). And there will be FWIM fiddling, as above, when merited. We haven't finished designing the set of builtins, plus associated rules, and the whole list would be too long for this note anyway. However, our intention should be clear from the above example. (We will post a separate note on the topic, as soon as we can.) (Associative) arrays ~~~~~~~~~~~~~~~~~~~~ Associative arrays (and normal Int-indexed arrays) are a powerful feature of Perl. Both are a non-big deal in Haskerl, because of the data structuring facilities of Haskell (which already has Arrays as a builtin type). Therefore, the associative-array datatype in Haskerl looks like data (Hashable a) => AssocArray a b -- abstract (with a suitable set of operations to go with it). Note that associative arrays may have keys of any type in class "Hashable", not just Strings. The only special pleading for associative arrays in Haskerl is that the syntax "assoc_array{key}" is provided for indexing. Implementation ~~~~~~~~~~~~~~ There is obviously no implementation of Haskerl yet, but the major Haskell implementors have expressed interest in providing some support. Simon Peyton Jones (Glasgow) says, "The next release of our compiler will have experimental support; just use the -erl flag." Lennart Augustsson (Chalmers) writes, "I haven't started adding Haskerl to LML/HBC, but it shouldn't take long. By FPCA definitely." John Peterson (Yale) indicated that support in their system will depend on your LISP implementation (which sits under Yale Haskell). We are therefore optimistic that there will be timely opportunity to try out the Haskerl ideas before they are completely set in concrete. ................................................................. References: [1] P. Hudak, S. Peyton Jones, and P. Wadler (editors), Report on the functional programming language Haskell, SIGPLAN Notices, 27, 5, May, 1992. [2] L. Wall and R. L. Schwartz, Programming perl, O'Reilly and Associates, Sebastopol, California, 1991. [3] S. Peyton Jones and P. Wadler, "Imperative functional programming", 20th ACM Symposium on Principles of Programming Languages (POPL), Charleston, South Carolina, January, 1993. ................................................................. Working group participants: E. Biagioni (CMU) J. D. Brock (UNC-Asheville) W. S. Gibson (HP Labs) D. Howe (Imperial) J. P. M. Hultquist (NASA, Ames) D. Kolo (Harvard) W. Partain (Glasgow) -- scribe B. T. Smith (Quintus) D. Wakeling (York) -- chair(Back to Haskerl index.)