Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Jison: Bison in JavaScript (zaa.ch)
31 points by mnemonik on Jan 3, 2010 | hide | past | favorite | 5 comments


I'm going to start a flame war here... I guess.

The thing about bottom up parser generators, is that they really aren't all that useful.

There's a dirty little secret in compiler development: most "real world" compilers use hand written top-down recursive decent parsers. They don't use Bison.

The classical argument in favor of using bottom up parsing is that it can recognize a larger class of grammars and is often more efficient than top down parsing.

This, however, ignores one huge point. Parser generators are leaky abstractions. They almost always suck at error recovery. This is particularly true for bottom up parsers because they don't have a lot of contextual information. Error recovery is extremely important for several reasons:

1. Parsers with bad error recovery give shitty error messages.

2. Things like intellisence, which always operate on syntactically incorrect code, won't work without good error recovery.

Some people have done research on automatically injecting error recovery into bottom-up parsers. These systems, however, are always based on heuristics, and supporting one set of scenarios well always leaves another set of scenarios not supported well. Those are exactly the kind of situations where you need humans to tune things.

The problem with parser generators is that their input languages are not turing complete (by obvious necessity). That means when the parser generator does something wrong, you can't always fix it. Most abstractions in computer science are not like this. They involve one turning complete programming language being compiled down into a lower level turning complete programming language. When abstractions leak, you can always code around the leakage.

But, with a parser generator, you can't do this. If you want to implement robust error recovery, then you have to add explicit productions to the grammar to handle them. That becomes extremely tedious extremely quickly.

That means to create a real parser, for use in interacting with humans, you need one of two things:

1. A customized parser generator, tuned for the exact parser you are generating.

2. A parser written by hand.

Item #1 isn't really practical. That leaves number 2 as the obvious best choice.

Writing bottom up parsers by hand is a giant PITA. That's why people invented parser generators, so they wouldn't have to spend an eternity hand crafting tables.

Writing top down parsers by hand, however, is really easy and intuitive. You also get full support of most code browsing tools and debuggers, so they are easy to change, even as the grammar gets large. They are also easy to read.

Finally, because they are written in a Turing complete programming language, you can easily (and with minimal code) inject what heuristics you need, in exactly the place where you need them.

Because of this, hand written recursive decent parsers are almost always the best engineering choice, despite all the other tradeoffs.

Despite all the proofs of the superiority of LR parsers over LL parsers, I would take a human, armed with a specific grammar and a for loop, over Bison any day.


You mention specifically that it this is true for compilers. What about smaller applications. From my browsing of the FreeBSD tree most programs that have a syntax use yacc/lex to parse the input. Awk for example, among others.

As you are much more experienced in this area, any opinions?


You should be weary of using Lex for unicode, because the tables can get large.

As far as Bison goes, I guess that's up to you. My preference is against it, mainly because I don't think it buys you much (if the grammar is small, the code for the parser should be small to).

The biggest driver should be user experience. If you care a lot about how errors are reported (i.e. you want to avoid cascading errors, while trying to report as many errors as possible up front) , then I say you should definitely not use bison.

If you don't care about error reporting much, then its really just based on your personal preference.


If only I found this yesterday! I was looking for a parser generator that output JavaScript last night. I settled on:

http://jscc.jmksf.com/

And started writing a small Lisp programming language this morning. I like this Jison grammar syntax a lot better so I think I'll have to start porting it.

Thanks for posting this!


The work here is actually useful in real world. For example, if a jison frontend deployed as a grammar checker for code snippet share site or more practical a SQL query web-interface, it can provide a more responsive interactive input area which eliminates the communications to sever-side for grammar errors.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: