Add a tree.h file to your project, containing a
struct definition as per
HW#3, or your own preferred tree representation. Today's lab
is written up as if your tree type is
struct tree {
int prodrule;
char *symbolname;
int nkids;
struct tree *kids[10]; /* if nkids >0 */
struct token *leaf; /* if nkids == 0; NULL for ε productions */
};
The actual maximum number of children needed depends on your context
free grammar in your .y file, and possibly whether you represent all
punctuation tokens in your tree or not. The number may be smaller or larger
than 10.
You should write a full tree abstract data type in a tree.c and eventually it should provide functions to create, destroy, and traverse trees. Initially for this lab, you will need functions to allocate and initialize leaves and internal nodes.
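As a starting point, the two allocation functions might look something like the sketch below. It assumes the struct tree shown above and the alctree() calling convention used in the grammar examples later in this lab (production rule, symbol name, number of kids, then the kids themselves). The names MAXKIDS and alcleaf() are made up for illustration; use whatever fits your own tree.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

#define MAXKIDS 10                 /* assumed limit; size it for your grammar */

struct token;                      /* defined elsewhere, per HW#2 */

struct tree {
    int prodrule;
    char *symbolname;
    int nkids;
    struct tree *kids[MAXKIDS];    /* if nkids > 0 */
    struct token *leaf;            /* if nkids == 0 */
};

/* Allocate an internal node. The variable arguments are exactly
 * nkids pointers of type struct tree *. */
struct tree *alctree(int prodrule, char *symbolname, int nkids, ...)
{
    va_list ap;
    struct tree *t;
    int i;

    assert(nkids >= 0 && nkids <= MAXKIDS);
    t = calloc(1, sizeof(struct tree));   /* unused kids[] and leaf start out zeroed */
    if (t == NULL)
        return NULL;
    t->prodrule = prodrule;
    t->symbolname = symbolname;
    t->nkids = nkids;
    va_start(ap, nkids);
    for (i = 0; i < nkids; i++)
        t->kids[i] = va_arg(ap, struct tree *);
    va_end(ap);
    return t;
}

/* Allocate a leaf that wraps an already-allocated token
 * (hypothetical helper, not required by the lab). */
struct tree *alcleaf(int category, char *symbolname, struct token *tok)
{
    struct tree *t = alctree(category, symbolname, 0);
    if (t != NULL)
        t->leaf = tok;
    return t;
}
```

A call such as alctree(TU_TU_ED, "translation_unit", 2, $1, $2) then fills kids[0] and kids[1] and leaves the remaining slots NULL.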
Add #include "tree.h" directives to the header of
your lex and yacc specifications. Typically this is inside %{ ... %}:
%{
#include "tree.h"
%}
Whenever you add an #include directive, you need to add
a dependency to your
makefile that will trigger recompilation if the include
file is modified.
In this case, the
gcc compiles of lex.yy.o and
cgram.tab.o depend on
tree.h, not just lex.yy.c and
cgram.tab.c. Change rules like
lex.yy.o : lex.yy.c
	gcc -c -g -Wall lex.yy.c
to instead say
lex.yy.o : lex.yy.c tree.h
	gcc -c -g -Wall lex.yy.c
Once your Bison file has #include "tree.h" in its header section, you can add
%union {
struct tree *treeptr;
}
to the header of your Bison file. As shown in lecture notes, adding
this %union is one thing, and using it in your grammar rules is
another; you have to declare which union member (in this case treeptr)
each terminal and non-terminal symbol uses, with %token <treeptr> and
%type <treeptr> declarations, before you can refer to them in
actions as $$ or $1 etc.
With the %union declared, the output code that Bison writes will
have a variable named yylval of type YYSTYPE where
YYSTYPE is the union type given by %union.
On each shift action, the parser will copy (push) what is in yylval.treeptr
onto the value stack. So our lexer should allocate a tree leaf and assign
it to yylval.treeptr where we previously allocated a token
and assigned it to some global variable such as yytoken.
For some of you this will be a small tweak; for others it is a major change, depending on how you implemented your HW#2. If you have already performed a given step here, you may skip forward.
[0-9]+ { return ICON; }
You need to allocate your token inside your yylex() per HW#2 specs.
Something like the following.
[0-9]+ { return alctoken(ICON); }
where alctoken() is a helper function that you write to
allocate a token. You have to do this (duh) for all the regular expressions
that return a token.
Then change your alctoken() from something like
int alctoken(int category){
   yytoken = malloc(sizeof (struct token));
   yytoken->cat = category;
   /* ... initialize other token fields ... */
   return category;
}
to something like
int alctoken(int category){
   yylval.treeptr = malloc(sizeof (struct tree));
   /* ... initialize other tree fields: prodrule = category, nkids = 0 ... */
   yylval.treeptr->leaf = malloc(sizeof (struct token));
   yylval.treeptr->leaf->cat = category;
   /* ... initialize other token fields ... */
   return category;
}
Note: you may want to write additional helper functions.
yylex()
code, and from inside Bison-generated yyparse() code.
Write a function printnode(t) that takes a pointer
to a struct tree as a
parameter and prints a line of detailed information about it. For a
tree leaf, have it print the tree's token information (text, line #...)
as per HW#2.
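One possible shape for printnode() is sketched below. It assumes the struct tree from the top of this lab and a struct token with fields cat, text, and lineno; only cat appears in this write-up, so text and lineno are assumed names that should be replaced with whatever your HW#2 token uses. The describenode() helper is also made up for illustration. It formats into a buffer so the same text can go to stdout, a log file, or a test.

```c
#include <assert.h>
#include <stdio.h>

struct token {
    int cat;        /* token category, as set in alctoken() */
    char *text;     /* assumed field name: the lexeme */
    int lineno;     /* assumed field name: source line number */
};

struct tree {
    int prodrule;
    char *symbolname;
    int nkids;
    struct tree *kids[10];   /* if nkids > 0 */
    struct token *leaf;      /* if nkids == 0; NULL for epsilon productions */
};

/* Format a one-line description of t into buf.
 * Returns what snprintf returns: the length of the full description. */
int describenode(char *buf, size_t size, const struct tree *t)
{
    assert(buf != NULL && size > 0);
    if (t == NULL)
        return snprintf(buf, size, "(null)");
    if (t->nkids == 0 && t->leaf != NULL)      /* a leaf carrying a token */
        return snprintf(buf, size, "leaf %s: cat %d, text \"%s\", line %d",
                        t->symbolname, t->leaf->cat,
                        t->leaf->text, t->leaf->lineno);
    /* internal node, or a leaf without a token */
    return snprintf(buf, size, "node %s: rule %d, %d kids",
                    t->symbolname, t->prodrule, t->nkids);
}

/* Print one line of information about t on stdout. */
void printnode(const struct tree *t)
{
    char buf[256];
    describenode(buf, sizeof buf, t);
    puts(buf);
}
```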
Inside your yylex() function, for example,
inside an alctoken() function,
you can call printnode() right before you return,
to see all the tokens
as they come out of yylex().
Inside yyparse() things are a bit more hairy. Consider a
grammar rule like
identifier : IDENTIFIER ;
Once you have declared %token <treeptr> IDENTIFIER
in your .y file, you can print the token information for IDENTIFIER as follows:
identifier : IDENTIFIER { printnode($1); } ;
You could go ahead and print all your terminal symbols in all your rules
in this way, but for this lab, you are asked to just do a couple
categories to verify that the leaf information is present and usable.
$$ a value.
translation_unit: translation_unit external_declaration {
$$ = alctree(TU_TU_ED, "translation_unit", 2, $1, $2);
} ;
parameters : { $$ = NULL; /* empty parameter list */ } ;
Note that epsilon productions are a frequent cause of ambiguity
(e.g. reduce/reduce conflicts) and most grammar files go out of their way
to avoid doing any of these!
Bison uses $$=$1 as its default semantic action.
This works great when a non-terminal has only one child.
You don't have to write
this semantic action at all. No { } is required. Example:
parameters : parameter_list ;
Note that you can add a semantic action here, such as one that allocates a tree node with one child. Do so if it will help you understand your tree afterwards.
parameters : parameter_list { $$ = alctree(PS_PL, "parameters", 1, $1); };
yyparse() returns. Either:
file : translation_unit {
tree_print($1); /* can call rest of compiler from here */
} ;
or define a global variable for the tree root
file : translation_unit { root = $1; } ;
... and then, after yyparse() in main:
tree_print(root);
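A recursive tree_print() could look roughly like the sketch below. It assumes the struct tree from the top of this lab and a token with a text field (an assumed name; adapt it to your HW#2 struct). The tree_fprint() helper and its node-count return value are made up for illustration. NULL children, such as those produced by the empty parameters rule above, are skipped.

```c
#include <assert.h>
#include <stdio.h>

struct token {
    int cat;
    char *text;     /* assumed field name: the lexeme */
    int lineno;     /* assumed field name: source line number */
};

struct tree {
    int prodrule;
    char *symbolname;
    int nkids;
    struct tree *kids[10];   /* if nkids > 0 */
    struct token *leaf;      /* if nkids == 0; NULL for epsilon productions */
};

/* Print t and all of its subtrees to f, indenting two spaces per level.
 * Returns the number of nodes printed. */
int tree_fprint(FILE *f, const struct tree *t, int depth)
{
    int i, count = 1;

    assert(depth >= 0);
    if (t == NULL)                /* e.g. an action that set $$ = NULL */
        return 0;
    fprintf(f, "%*s", depth * 2, "");
    if (t->nkids == 0 && t->leaf != NULL)
        fprintf(f, "%s \"%s\"\n", t->symbolname, t->leaf->text);
    else
        fprintf(f, "%s (%d)\n", t->symbolname, t->nkids);
    for (i = 0; i < t->nkids; i++)
        count += tree_fprint(f, t->kids[i], depth + 1);
    return count;
}

/* Print the whole tree on stdout, starting at the root. */
void tree_print(const struct tree *t)
{
    tree_fprint(stdout, t, 0);
}
```

Printing each node before its children (a pre-order walk) keeps the output in the same top-down order as the grammar rules, which makes it easier to compare against your .y file.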