Config file format
Grammar file format
Symbol attributes
Getting started

Grammar file format

The grammar file contains a sequence of rules. Each rule can be preceded by a rule-name.

The pound sign (#) begins a comment; all characters remaining on the line are ignored. Except when terminating a comment, the newline character is a whitespace character.

Rule format

The rule syntax is simple BNF. An example rule is:
expr -> expr PLUS term
A rule is composed of a left-hand side symbol followed zero or more right-hand side symbols, separated by an arrow. Identifiers in upper case are tokens. Identifiers not in upper case are nonterminals. An identifier must begin with a letter; the characters - and _, and the digits 0 to 9 can be used after the first letter.

Any number of attributes can follow a symbol. The attributes following a symbol are said to be attached to the symbol. The following are the attributes.

Attribute Description
> Shift priority
+ Reduce priority
^ First priority
! Exclusive modifier on priority
< Sticky follow
* Accept
NUMBER Lex state
(attribs) NUMBER Repeat attribs NUMBER of times

Symbol attributes are described further in the next section.

The file must contain at least one start rule. A rule with the left-hand side symbol start indicates such a rule. For example, the rule below is a start rule.

start -> expr SEMI
Parsing can begin at any start rule.

Rule-name format

A rule-name must have one of the following forms. Let R be the rule it precedes.
Defines a node named ClassName. This node will represent R in the AST. basil generates the ClassName definition in the output nonterminal lzz file. The class will contain member functions that return the child nodes of R (the functions will be named after the right-hand side symbols). ClassName must be an acceptable C++ identifier.

[$ ClassName]
Like above, except ClassName is considered slippery. A slippery visitor that does not visit ClassName will automatically visit the child nonterminal nodes of R, from left to right. A visitor is slippery if true is passed as the slippery argument to the base visitor constructor. A visitor is by default slippery. A bottom-up visitor (a visitor that visits the AST as its created) is the only visitor you would likely not make slippery.

Slippery nodes are specially useful for sequence rules:

id-list -> id
[$ SeqNode] id-list -> id-list COMMA id

[IdNode] id -> IDENT
A visitor that collects the identifiers here would only have to visit IdNode:
// get identifer list
struct GetIdentList (IdentVector & ident_set)
  : NodeVisitor (true)
  // id -> IDENT
  void visit (IdNode & node) const
    ident_set.push_back (node.getIDENT ().getLexeme ());
(This is lzz code. You can also write visitors directly in C++.)

Names a shared node. ClassName will also represent R in the AST. ClassName must be defined elsewhere.

Use of shared nodes can reduce the amount of semantic code. Sequence rules are often candidates for shared nodes:

expr-list -> expr
[(SeqNode)] expr-list -> expr-list COMMA expr

[ ]
An empty rule-name. This form prevents single-rules from being bypassed (if the bypass option is enabled). If R is not a single rule, then this means nothing.
If a rule is preceded by an empty rule-name, or no rule-name at all, then:
  • If the rule is an epsilon rule (i.e.: foo ->), then an unset node represents the rule in the AST. Token and nonterminal nodes contain a member function to test if the node is set.

  • If the rule has one right-hand side symbol, then the child node represents the rule in the AST.

  • Otherwise, the child nodes are discarded, and an unset node represents the rule in the AST. In this manner, the AST can be pruned.