Ben Laurie blathering

Writing Compilers Quickly

James Donald asks

How did you bring up a compiler so quickly – what environment, what development system is the compiler written in?

Firstly, I’ve had a lot of practice – I’ve been writing (small) compilers for at least 20 years now. I’ve been writing code for over 35 years. That helps. But to talk about the reproducible parts of what I did…

Firstly, I wrote the compiler in Perl. I like Perl for quick hacks, and I also like to write Perl code “properly” – that is, I use modules and object oriented programming, not great monolithic blocks of Perl gibberish.

Secondly, I used a compiler-compiler, in this case, Parse::Yapp. This uses a specification language rather like BNF, and exactly like its predecessors, yacc and bison, so there was zero learning curve for me there.

I do remember finding yacc extremely puzzling when I first started with it, so if you are new to this, I would highly recommend Antlr and Antlrworks. The only reason I did not use Antlr is its Perl support seems to be pretty experimental, and I didn’t want to spend time fighting the tools. Otherwise, Antlr is vastly superior to the whole yacc/bison/yapp thing. Antlrworks lets you interactively explore your parsing, complete with diagrams. It really is quite awesome and I love it.

Thirdly, I avoided using a lexer generator. In my experience, these are fiddly and more trouble than they’re worth, especially if you’re writing in Perl, where you have very nice pattern matching tools that allow you to write a lexer in not many lines of code (about 60 in the current version of Stupid).

Fourthly, I used monkey-patching to inject the language-dependent output part of the code. I’ve never really (consciously) used this technique before – I first came across it as a formal notion when we were wrestling with early versions of Caja, as it is very widely used in Javascript libraries. Although it is kinda evil and caused us untold pain with Caja, it does make for a very nice separation between components.

Fifthly, I kept the grammar of Stupid simple – I make life harder for the programmer in order to make the parser simpler. This is for two reasons, firstly, I was in a hurry, but more importantly, it is a design aim of Stupid that it should be clear to the programmer exactly what is happening. Getting clever with the grammar does not aid that process.

Sixthly, keeping your head clear is good! Parsers naturally produce parse trees with great ease, so building a parse tree and then later processing that is the way to go. Trying to do it all in one go rarely works well (single pass compilers are generally rather limited, though Stupid is technically mostly single pass at the moment). Getting used to recursion helps with processing parse trees.

Finally, I had a previous project, CaPerl, where I’d used Parse::Yapp, so I could crib from that to get off the ground rapidly.

As for the rest of the environment: FreeBSD for the operating system, emacs for the editor (wish there were a yapp/yacc mode – and even better, a Stupid mode!), Mercurial for version control.


  1. For anyone going the ANTLR route, consider using StringTemplate (written by the same guy) to generate code. I’ve used it on a few projects now and find it quite handy.

    Comment by Monty — 1 Feb 2010 @ 18:59

  2. I’m curious to hear more about your monkey-patching. Can you explain more about how this helps write a compiler?

    Comment by Christopher Lord — 2 Feb 2010 @ 1:42

  3. My Solution includes Lex, Yacc, Python and LLVM.

    Pretty quick and simple.

    Comment by Alcides — 6 Feb 2010 @ 0:56

  4. OMeta (available in a handful of languages and not difficult to port) is a handy tool for writing interpreters and compilers quickly.

    Comment by Patrick Logan — 6 Feb 2010 @ 4:29

  5. here’s a yacc-mode ->

    Comment by drhodes — 6 Feb 2010 @ 7:24

  6. Also related to ANTLR, Terence Parr has a book about creating your own domain specific languages.

    Comment by Monty — 24 Feb 2010 @ 17:02

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress