Matching regular expressions, observably correctly

James McKinna, University of St Andrews

We revisit Harper's JFP Pearl on the correctness and termination of a CPS regexp matcher. The twist here is to express matching on a string s against regexp r in *direct* style, but with a dependent type which encodes success, computing a parse tree p and a suffix s' such that

(render p) ++ s' = s

or failure, with less or more error reporting as required. That is, matching is expressed directly as the inverse of the observer function render, which flattens a parse tree back into a string. The render function is most cleanly expressed in tail-recursive form in the usual "fusion law" way, with continuations reified in the string suffices. Correctness of matching reduces to the (obvious) correctness of render. Termination is automatic, as we work in the strongly terminating setting of EPIGRAM.