Basil

Contents:
Home
Synopsis
Config file format
Grammar file format
Symbol attributes
Getting started
Visitors
Simulator
Examples
Download

Symbol attributes

Symbol attributes can be attached to a left- or right-hand side symbol. The priority attributes (> + ^) and the exclusive attribute (!) can be repeated. The first exclusive attribute must follow a priority attribute.

Lexical state

The lexical state attribute sets the lexical state of the symbol if follows. The lexical state is a number, zero or greater.

A lexical state is calculated at each parsing state. When the parser requires the next token, the lexer is passed the lexical state of the current parsing state. The lexical state can be used by the lexer to tokenize context sensitive tokens.

If a token T has as lexical state L, then the lexer is passed L when T could be the next token. If a right-hand side nonterminal N has a lexical state L, then the lexer is passed L when any token in FIRST(N) could be the next token. If the left-hand side nonterminal N of a rule R has a lexical state L, then the lexer is passed L when any token in FIRST(R) could be the next token, where FIRST(R) is the set of tokens that can start R. The lexer is passed the lexical state 0 if none of these conditions are true. It is an error if at any time there could be two or more possible lexical states.

Consider the grammar below.

start -> a B 1 a B 2
a -> A
a ->
The lexer is passed the lexical state 1 when the first B could be the next token, and the lexical state 2 when the second B could be the next token.

The lexical state of a parsing state, if non-zero, is shown in the log file following the state number in parenthesis.

Priorities

basil generates a list of actions for each lookahead token in a parsing state. When the parser reaches a point where there are multiple actions for the next token it tries them in order. If a try leads to a syntax error, possibly caused by a semantic predicate, then the parser backtracks and tries the next one. If a syntax error occurs and no pending actions remain, then the parser reports a syntax error and possibly recovers.

The priority attributes are used to order the actions in a list. The exclusive attribute is used to exclude one or more actions from a list.

An action has two properties: a priority and an exclusive priority. These properties are computed from the attributes attached to the symbols (you will see how they are computed below). The actions in a list are ordered with respect to their priority; the action with the highest priority is first. Once the actions are ordered, an action and all remaining actions are discarded if the action has a lower exclusive priority than the one preceding it.

Here is an example. Suppose for some lookahead token T three actions are possible: ACTION_A, ACTION_B and ACTION_C. Let the priority and exclusive priority of these actions be the following:

Action Priority Exclusive Priority
ACTION_A 1 1
ACTION_B 2 1
ACTION_C 0 0
The action list on token T would be:
T - ACTION_B, ACTION_A
Before we can see how the priority of an action is computed, we need to define a few terms:
propagated shift priority
This is a property of a rule. It is computed for nonkernel rules only. This property stays with the rule as the dot moves to the right (as the rule becomes a kernel rule). A start rule has no propagated shift priority.

The propagated shift priority of a rule is the maximum effective shift priority of the left-hand side symbol, considering all contexts. For example, considering only the rules below, the propagated shift priority of the third rule is 2.

a -> d.e > f
b > -> d.e >
>> e ->.E 
The effective shift priority of e in the first and second rule is 1 and 2 respectively. The propagated shift priority of the third rule is 2, the maximum value. The propagated shift priority is shown preceding the rule.

If the command line option -ss is specified, then the propagated shift priorities are visible in the log file.

effective shift priority
This is a property of a right-hand side symbol. It is computed for the symbol following the dot in kernel and nonkernel rules. The effective shift priority of a symbol is computed using this procedure:

  • Let R be a rule.
  • Let L be the left-hand side symbol of R.
  • Let N be the symbol after the dot in R.

The effective shift priority of N is the sum of:

  • The propagated shift priority of R.
  • The shift priority of L.
  • The shift priority of N.

For example, the effective shift priority of A in the rule below is 3.

> a > ->.A >
effective first priority
This is a property of a token. It is computed for all tokens that can follow the symbol after the dot in kernel and nonkernel rules where the symbol after the dot is a nonterminal. For example, E is such a token in the rule below if FIRST(d) contains epsilon.
a -> b.c d E
The effective first priority of a token is computed using this procedure:

  • Let R be a rule.
  • Let L be the left-hand side symbol of R.
  • Let N be the symbol after the dot in R, and let N be a nonterminal.
  • Let T be a token that can follow N.

The effective first priority of T is the sum of:

  • The propagated shift priority of R.
  • The shift priority of L.
  • The reduce priority of N.
  • The first priority of T.

For example, the effective first priority of E in the rule below is 4 if FIRST(d) contains epsilon.

> a > -> b.c + d E ^
propagated first priority
This is a property of a follow item. A follow item is an item in the follow set of a rule. A follow item is composed of two symbols: a token and a nonterminal. It is printed in this form:
T.n
If T.n is an item in the follow set of a rule, then it indicates there is a reduction on T when the dot is after the last symbol in the rule. Furthermore, the nonterminal n is followed (via a GOTO) after the reduction. The nonterminal n is called the shortcut (n is not necessarily the left-hand side symbol of the rule).

The propagated first priority of a follow item is the maximum effective first priority of the token, considering all contexts. For example, considering only the rules below, the propagated first priority of E.d in the follow set of the third rule is 3.

> a -> b.d + E ^
> c -> b.d E ^
d -> D [E.d^^^]
The propagated first priority is shown following the follow item.

If the command line option -sf is specified, then the follow sets are visible in the log file.

Using these terms we can now see how the priority of an action is computed.

A SHIFT action on lookahead token T is present in a state when T follows the dot in some rule. The priority of the action is the maximum effective shift priority of T, considering all rules where T follows the dot.

A SHIFT action on lookahead token T is shown in the log file in the form

T - SHIFT S
where S is the state number.

A REDUCE action on lookahead token T is present in a state when T is the token in a follow item in the follow set of a rule, and the dot is after the last symbol on the right-hand side of the rule. The priority of the action is the reduce priority of the left-hand side symbol of the rule plus the propagated first priority of the follow item.

A REDUCE action on lookahead token T is shown in the log file in the form

T - REDUCE R n
where R is the number of the rule to reduce and n is the shortcut.

The exclusive priority of an action is computed in the same manner with the following understanding:

  • An exclusive attribute following a shift priority attribute is a shift exclusive priority.
  • An exclusive attribute following a reduce priority attribute is a reduce exclusive priority.
  • An exclusive attribute following a first priority attribute is a first exclusive priority.

Sticky follow

The sticky follow attribute sets the sticky property of the symbol it follows. If a symbol is sticky, then any REDUCE action with the symbol as the shortcut will be not be a candidate when computing the default action list in a state.

To put it another way, the parser follows a sticky symbol (via a GOTO) only when a valid token is next.

Consider this example:

start -> a b
a -> R
a -> S
b -> T
b -> U
As we can see in parsing states 1 and 2, the tokens R and S are reduced to a on any lookahead token:
1
-----
a -> R. (1) [T.a U.a]

* - REDUCE 1 a

2
-----
a -> S. (2) [T.a U.a]

* - REDUCE 2 a
However, by making a sticky,
start -> a < b
a -> R
a -> S
b -> T
b -> U
R and S are reduced to a only when a valid token is next:
1
-----
a -> R. (1) [T.a< U.a<]

T - REDUCE 1 a
U - REDUCE 1 a

2
-----
a -> S. (2) [T.a< U.a<]

T - REDUCE 2 a
U - REDUCE 2 a
The sticky attribute on a follow item indicates that the shortcut is sticky. The follow item EOT.start in a start rule is always sticky:
start ->.a b [EOT.start<]
Hence a start rule is reduced only when EOT (the end token) is next.

If the sticky attribute is attached to the left-hand side symbol in a rule, then the rule is reduced only when a valid token is next. For example, in the grammar below, R, but not S, is reduced to a only when a valid token is next:

start -> a b
a < -> R
a -> S
b -> T
b -> U
States 1 and 2 are:
1
-----
a < -> R. (1) [T.a U.a]

T - REDUCE 1 a
U - REDUCE 1 a

2
-----
a -> S. (2) [T.a U.a]

* - REDUCE 2 a

Accept

The accept attribute sets the accept property of the symbol if follows. If a rule is reduced and an accepting symbol is followed (via a GOTO), then all pending parsers formed during the parsing of the rule are discarded. A pending parser is formed before an action is executed if one or more alternate actions are possible. If the parser encounters a syntax error, possibly caused by a semantic predicate, the parser will backtrack to the last-formed pending parser.

An ACCEPT action indicates to the parser that an accepting symbol is being followed. In all other ways it is similar to a REDUCE action. An ACCEPT action on lookahead token T is shown in the log file in the form

T - ACCEPT R n
where R is the number of the rule to reduce and n is the shortcut.

If an accept action discards all pending parsers, then the event is called a full accept. This is important during syntax error recovery. Syntax recovery begins from the point of the last full accept. Since the accept action serves two purposes, a grammar will have accepting symbols even if it does not require backtracking.

The accept attribute if often used with the sticky follow attribute. The rule below, for example, will be reduced only if a valid token is next. The parser will also discard all pending parsers formed during the parsing of the rule.

a <* -> b c d