The Parse Tree Module

The parser's principal role is to generate a parse tree. It does so by applying language-specific production rules to the lexical tokens supplied by a lexer.

Construction flags tell the lexer whether to accept, for example, 'class' as a keyword (C++) or as an identifier (C). Similarly, the parser can be configured to apply particular rules.

The parse tree itself is a lisp-like structure. All nodes derive from PTree::Atom (for terminals) or PTree::List (for non-terminals). A Visitor makes it possible to traverse the parse tree based on the run-time types of the individual nodes (there are about 120 different PTree::Node types).
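The traversal scheme described above can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins in a namespace called sketch; they are not the real PTree headers, and the actual Visitor interface may differ. The sketch only shows how double dispatch selects a handler by a node's run-time type.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the PTree classes; names and layout are
// illustrative only and do not mirror the actual headers.
namespace sketch
{
struct Atom;
struct List;

// The visitor declares one overload per concrete node type.
struct Visitor
{
  virtual ~Visitor() {}
  virtual void visit(Atom const &) = 0;
  virtual void visit(List const &) = 0;
};

struct Node
{
  virtual ~Node() {}
  // Double dispatch: each node calls the overload matching its own type.
  virtual void accept(Visitor &) const = 0;
};

// Terminal: holds the text of a single token.
struct Atom : Node
{
  explicit Atom(std::string const &t) : text(t) {}
  void accept(Visitor &v) const override { v.visit(*this); }
  std::string text;
};

// Non-terminal: holds an ordered sequence of child nodes.
struct List : Node
{
  void accept(Visitor &v) const override { v.visit(*this); }
  std::vector<std::unique_ptr<Node>> children;
};

// Example visitor: renders the tree in a lisp-like bracket notation.
struct Printer : Visitor
{
  void visit(Atom const &a) override { out += a.text; }
  void visit(List const &l) override
  {
    out += '[';
    for (std::size_t i = 0; i != l.children.size(); ++i)
    {
      if (i) out += ' ';
      l.children[i]->accept(*this);
    }
    out += ']';
  }
  std::string out;
};

// Builds a small tree for the statement "a = 1;" and returns its rendering.
inline std::string demo()
{
  std::unique_ptr<List> expr(new List);
  expr->children.emplace_back(new Atom("a"));
  expr->children.emplace_back(new Atom("="));
  expr->children.emplace_back(new Atom("1"));
  List stmt;
  stmt.children.push_back(std::move(expr));
  stmt.children.emplace_back(new Atom(";"));
  Printer printer;
  stmt.accept(printer);
  return printer.out;
}
}
```

Calling sketch::demo() yields the string [[a = 1] ;]. With roughly 120 node types, a real visitor would carry one overload per type, typically with default implementations that fall back to the handler of the base class.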

The Encoding class

The C++ grammar makes it quite hard to recover certain semantic information from syntactic structure. In a simple declaration, for instance, individual declarators may carry part of the type information for the variables they declare. For example,

            char *a, b, c[3];
          

contains three declarators: a, b, and c. The first has type char *, the second char, and the third char[3]. To avoid having to analyze the whole declaration to extract the type of a declarator, the parser attaches the type and name to each declarator.

A similar argument applies to other cases where non-local information is encoded into a node's encoded_name and encoded_type members.

The Encoding class needs to be able to represent full type names, so it seems sensible to use a mangling similar (or even identical) to the one developed as part of the C++ ABI standard (see C++ ABI).
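To make the idea concrete, here is a minimal sketch of what an ABI-style encoding of simple declarators could look like. It borrows a few type codes from the Itanium C++ ABI mangling (c for char, P for pointer, A<n>_ for an array of n elements, and <length><identifier> for names). The Declarator struct and both functions are hypothetical; this is not the format actually produced by the Encoding class.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch
{
// Hypothetical, simplified declarator description. It does not handle
// function types, references, qualifiers or parenthesised declarators.
struct Declarator
{
  std::string name;
  std::size_t pointers;      // number of '*' in front of the name
  std::vector<int> extents;  // array bounds, outermost first
};

// Combines the base type code of the declaration with the declarator's own
// modifiers, using Itanium-ABI-style codes: arrays first, then pointers.
inline std::string encode_type(std::string const &base, Declarator const &d)
{
  std::string code;
  for (int n : d.extents) code += "A" + std::to_string(n) + "_";
  code.append(d.pointers, 'P');
  code += base;
  return code;
}

// Encodes an unqualified name as <length><identifier>.
inline std::string encode_name(Declarator const &d)
{
  return std::to_string(d.name.size()) + d.name;
}
}
```

Under these assumptions, the declarators of char *a, b, c[3]; combined with the base code c encode to Pc, c and A3_c respectively, and the name a encodes to 1a. Each declarator then carries its complete type without any reference to the surrounding declaration.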

PTree::Display

Parse trees tend to grow quickly, and it soon becomes hard to debug them by simply traversing the list. The PTree module therefore provides a simple means to print a (sub-)tree to an output stream.

            PTree::display(node, std::cout, false, false);
          

will print the tree referred to by node to std::cout. The third parameter is a flag indicating whether the encodings should be printed as well. The fourth parameter indicates whether the actual C++ type of each node being printed should be included in the output.
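The effect of the two flags can be pictured with a small self-contained sketch. Everything here is an assumption made for illustration: the Node struct, the render helper and especially the output format are invented, and the real PTree::display output looks different. The sketch assumes a C++17 compiler, since Node holds a std::vector of its own type.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch
{
// Hypothetical node: a leaf when 'children' is empty, a list otherwise.
struct Node
{
  std::string kind;      // stand-in for the node's C++ type name
  std::string text;      // token text, used by leaves only
  std::string encoding;  // stand-in for an encoded type, may be empty
  std::vector<Node> children;
};

// Prints a (sub-)tree recursively; the two flags switch the extra
// annotations on and off. The notation is invented for this sketch.
inline void display(Node const &n, std::ostream &os, bool encoded, bool typeinfo)
{
  if (typeinfo) os << n.kind << ':';
  if (encoded && !n.encoding.empty()) os << '#' << n.encoding << ' ';
  if (n.children.empty())
  {
    os << n.text;
    return;
  }
  os << '[';
  for (std::size_t i = 0; i != n.children.size(); ++i)
  {
    if (i) os << ' ';
    display(n.children[i], os, encoded, typeinfo);
  }
  os << ']';
}

// Convenience wrapper that returns the rendering as a string.
inline std::string render(Node const &n, bool encoded, bool typeinfo)
{
  std::ostringstream os;
  display(n, os, encoded, typeinfo);
  return os.str();
}
}
```

For a hypothetical Declarator node with encoding i and a single Atom child x, render(node, false, false) gives [x], while render(node, true, true) gives Declarator:#i [Atom:x].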

Since this API turned out to be rather useful, there is a stand-alone applet that just generates a parse tree and then prints it out using the above function.

            display-ptree [-g <output>] [-d] [-r] [-e] <input>
          

The available options are:

-g filename

Generate a dot graph and write it to the given file.

-d

Print debug information (in particular traces) during parsing.

-r

Print the C++ type of the parse tree nodes.

-e

Print encoded names / types for nodes such as names, declarators, etc.