Brains, Programming Languages, Disconnected Enthusiasms

Monday, May 14, 2007

Hyrax and The Parsnip

I've been working on and off on a programming language for over a year now. In the beginning it was a nameless C++ implementation of Paul Graham's illusory Arc. Under the name Hyrax my language kept the s-expression syntax and Lisp-like core but donned and shed many poorly implemented language features (type inference, prototype-based Objects, etc..). Finally I found some purpose for the beast as a functional matrix language. With the semantics of Scheme, the mathematical capabilities of GSL and GiNac and the syntax of Matlab I could give SciPy a run for its money.

The only obstacle in my way is parsing the new syntax. I have thus far been using a kludgy hand-written s-expression parser that is highly immune to improvement. I tried out ANTLR and Spirit but found them both cumbersome. In the true spirit of Not Invented Here I have been writing my own parser library in C++: Parsnip.

Parsnip is a backtracking recursive descent parser combinator library inspired by Parsec. Here's an example of Parsnip in action:



/*
PARSER types are declared Output, Input
so this parser accepts strings and return ints.
The reversal of the usual order seems a little clumsy
but it makes some code shorter and is easy to get used to.
*/

typedef PARSER(int, string) Parser;


/*
plus parses 0 or more spaces followed
by the '+' character followed by 0 or more
spaces
*/

Parser plus = spaces >> ch('+') >> spaces;


/*
The sepBy1 parse creates a sequence
of integer parsers from an input
of integer strings separated by
plus parsers.

The reduce parser works like
usual reduce function, it collects
pairs of values from a sequence and
applies a function to them. The second
argument passed to reduce is the default
in case of an empty sequence.
*/


Parser sum = reduce(add, 0, sepBy1(integer, plus));


//The Maybe struct is modeled after Haskell's Maybe a = Nothing | Just a
//but in this case the Just value can be accessed by Maybe::data

Maybe<int> result = parse("1 + 2 + 3", sum);
// prints "6"

if (result) cout << result.data;

To allow more natural expression of grammars I make heavy use of backtracking in parsnip parsers. All this backtracking comes at the cost of horrible algorithmic bounds, which I partially offset by memoizing the parse results. This makes Parsnip's parsers full-blown packrat parsers.

With this immensely powerful and easy to use parser I plan on turning Hyrax into an unstoppable juggernaut of scientific computing. Now all I need is a new name.

Blogorevolution: How I Faked The Moon Landing Using Web 3.0

I'm commandeering Spikebucket for unbearable programming language nerdiness. I'll say it here so you don't have to hear it at a party. Since neither Weissman nor Sergey are likely to ever check this page again, I don't think this change of course will be a problem.