Home Documentation Download Blog Forum Projects
Label: ♦english

[772] Think PEGuliarly!

Comment
I eventually decided to take on implementing PEG parser generator for Dao, as ybabel proposed some time ago. It's an unusual functionality, and some aspects of it would be better discussed beforehand, which is why I'm starting this topic.
First of all, a short intro about what PEG is and how it can be used.
PEG, Parsing Expression Grammar, is a simple BNF-like notation to define a parser. The syntax and semantics are explained well in the Wikipedia article , so I just give example from it which I'll use further. It's a simplest grammar for arithmetic expressions:
Value   ← [0-9]+ / '(' Expr ')'
Product ← Value (('*' / '/') Value)*
Sum     ← Product (('+' / '-') Product)*
Expr    ← Sum
When and why use PEG parser? PEG is much more flexible than regular expressions due to its recursion capability. Thus it becomes possible to parse complex-structured text data, code of existing and, most interesting, your own programming and markup languages etc. And that's not all mere theory: PEG is already adapted for most mainstream languages (see the incomplete list at the packrat/PEG page ).
Now, here is what I've come with regarding PEG support in Dao.
First, we need grammar mapping on Dao syntax. The original PEG syntax is certainly nice, but using it as text would require special parser and greatly complicate the implementation (higher error-proneness should also be taken into account). So it seems suitable to simply reuse ordinary language elements. The resulting Dao-based syntax I've thought up looks like this (same grammar as above):
grammar = (
  value => {'[0-9]+', ('(', $expr, ')')},
  product => ($value, ({'*', '/'}, $value), 0:),
  sum   => ($product, ({'+', '-'}, $product}), 0:),
  expr  => $sum
);
Here is description of the semantics:
  1. grammar => named tuple;
  2. terminal symbol => Dao regex string;
  3. non-terminal symbol (as a reference) => enum, "$id";
  4. grouping => tuple, "(e1, e2, ...)";
  5. ordered choice => list, "{e1, e2, ...}";
  6. quantifiers "*", "+" and "?" => tuples "0:", "1:" and "0:1";
  7. predicates "&" and "!" => integers 1 and 0.
That may be not the best variant, any other ideas regarding the syntax are welcomed.
BTW, there is one interesting aspect about the Dao-based syntax I propose -- the grammar is supposed to be used "as is", without translation into some internal structures. Thus it should be possible and safe to alter the grammar on the fly , after the corresponding parser has been created (but not add or remove rules). Not sure how it can be useful (and can it be at all), but that may be worth taking into account.
Regarding the actual implementation: for now I suppose it will be a usual C module with parser class. The class's constructor will take the grammar and pre-process it -- resolve references, compile regexes and check basic restrictions (I already wrote some code for that). No changes should be made to the grammar representation, so you'll be able to alter it any time like an ordinary composition of Dao values. Upon running the parser it's engine (packrat implementation, I suppose) will "walk" the grammar preserving the ability to detect new errors on the fly.
The question which interests me most is what output the parser should produce. Should it be just an AST mapped to Dao data structures, or perhaps it can be organized in more sophisticated and suitable way? There is probably a wide area of various possibilities to consider.
Should someone come around any interesting ideas about PEG in Dao, write down your thoughts here and thereby prevent me from deciding everything alone! :)
Comments
PEG seems very useful, I have been hoping to have something for this in Dao. It is really great that you are proposing to develop it:) Your proposal of grammar construction using named tuple looks very good, the only possible improvement I can imagine is to use $and and $not for the predicates.

Maybe we don't need to design and implement formats or data structures for output, if we can provide some functional (code section) methods to the PEG parser. But I am not sure yet how such methods should look like.
OK, I switched to $and , $not and $or . The latter is for ordered choice, "e1, $or, e2"; I felt it would make complex expressions easier to read (clearer variants boundaries, less braces, closer to the original notation).
Regarding functional methods, I see one feasible option. After the parser has processed the text and formed the AST, it can walk through the resulting tree using code section supplied by the user. Assuming the tree is traversed depth-first, the code section is provided with single node's data as parameter and is expected to return int value -- whether to traverse this node deeper or switch to the next node on same (or upper) tier. This way it becomes possible to process only the desired symbols without traversing all the tree if you don't have to.
BTW, another thing I plan is providing way for separate, grammar-independent definition of whitespace. By default, the parser will presume that terminal symbols may optionally be divided by "%s*" (any number of any whitespace characters). But if you need to parse something like Python or Yaml, with tabs being part of syntax, or automatically skip comments, you can provide the parser with your own regex to be used as whitespace pattern.

Change picture:

Choose file:

1234 5
67891011 12
131415161718 19
202122232425 26
2728293031

fu: A little bit game development in Dao! Thanks to ClangDao, it has become very easy to create bindings for C/ C++ libraries. The latest one i ... (May.14,07:08)

dao: Dao 1.2 Beta1 is released! After a very long time of development, the first beta release for Dao 1.2 is finally available ( http ... (May.06,23:37)

fu: ... Just to mention: a couple of demos (including the 2000 line one) has been successfully ported to IPho ... (May.19,02:43)

fu: ... Yes, it is getting mature, and more libraries and modules are coming out, hehe:) For GameKit, unfor ... (May.19,02:38)

Pompei2: ... This is cool news and really shows that ClangDao is getting mature, thumbs up. Too bad for this litt ... (May.18,09:17)

fu: ... Not completely, but mostly. New revisions will be regularly pushed to the repository on google code ( ... (May.08,22:38)

Pompei2: ... If I understand it correctly, you want to completely switch? If so then: (May.08,08:46)

This site is powered by Dao
Copyright (C) 2009-2012, daovm.net.
Webmaster: admin@daovm.net