Toothpicks and Bubblegum, Software Edition, Iteration 326

There’s nothing like working with an old *nix utility to remind you how brittle software is. Case in point: I’m trying to use flex and bison to design a very simple grammar for extracting some information from plaintext. I’m going by the book and everything, and it just doesn’t work. The parser keeps telling me it caught a PERIOD as its lookahead token when it expected a WORD, and then dies with a syntax error. I killed a whole day trying to track this down before I realized one simple thing: if you define your tokens by hand in the lexer (your .l file), their numeric codes have to agree with the codes bison assigns to the tokens declared in the parser (that’s your .y file), which in my case meant the two lists had to be in the same order. If they don’t agree, neither bison nor flex will tell you about it, of course (and how could they, when neither program processes files intended for the other?). Your program will just stubbornly insist, against all indications to the contrary, that it has indeed caught a PERIOD when it expected a WORD, and refuse to validate perfectly grammatical text.


I was so angry when this was happening, and now I think I might be even angrier. Keep in mind that, as far as I could find, this fantastically pathological behavior is not documented anywhere, so I found myself completely baffled by what was happening. Where was PERIOD coming from? Why didn’t it just move on to the next valid token? Of course the correct thing is to include the generated .tab.h header (the one bison writes when you run it with -d) in the lexer, but I had written my token definitions down explicitly in the lexer file, so I didn’t think to do that.
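For the curious, here is a minimal sketch in plain C of what I think was going on. It does not use flex or bison at all; it only imitates the two sides disagreeing about numbers. Everything in it is made up for illustration: the enum names, the helper functions, and the values 258 and 259 (which stand in for whatever codes bison actually assigns).

```c
#include <stdio.h>
#include <string.h>

/* Token codes as the parser understands them. In a real build bison
   writes these into the generated .tab.h header; the values here are
   illustrative assumptions. */
enum parser_token { P_WORD = 258, P_PERIOD = 259 };

/* A hand-written copy in the lexer, declared in the opposite order,
   so the same names end up attached to different numbers. */
enum lexer_token { L_PERIOD = 258, L_WORD = 259 };

/* Name a numeric code the way the parser would interpret it. */
const char *parser_name(int code) {
    switch (code) {
    case P_WORD:   return "WORD";
    case P_PERIOD: return "PERIOD";
    default:       return "UNKNOWN";
    }
}

/* Toy lexer using its own private numbering (the broken setup). */
int broken_lex(const char *text) {
    return strcmp(text, ".") == 0 ? L_PERIOD : L_WORD;
}

/* Same toy lexer using the parser's numbering, which is what
   including the generated header gives you. */
int fixed_lex(const char *text) {
    return strcmp(text, ".") == 0 ? P_PERIOD : P_WORD;
}

/* Print how the parser reads each lexer's answer for one input. */
void show(const char *text) {
    printf("\"%s\": broken lexer -> %s, fixed lexer -> %s\n",
           text, parser_name(broken_lex(text)), parser_name(fixed_lex(text)));
}
```

Calling show("hello") from a main function reports that the broken lexer’s WORD arrives at the parser as PERIOD, while the fixed one arrives as WORD. Only integers cross the boundary between the two programs, so nothing can notice that the names disagree.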

What’s ludicrous about this is that the flex/bison toolchain has to go through yet another auxiliary tool, m4, just to do its thing. m4, if you don’t know, is a macro language with a terrible, incomprehensible syntax that was invented for the purposes of text transformation, thereby proving years before its formulation Greenspun’s 10th rule, according to which any sufficiently advanced C project will end up reimplementing, badly, some subset of Common Lisp.

I have the utmost respect for Dennis Ritchie, but m4 is a clusterfuck that should never have survived this long. Once a language like Lisp existed, which could actually give you code and DSL transformations at a high level of abstraction, m4 became superfluous. It has survived, like so many awful tools of its generation, through what I can only assume is inertia.
