CSE 423 Lab #4: Trees

Turnin: on Canvas in a .zip file. Group Lab. Divide the work.

1. Adding a tree type to your compiler

The phrase "your compiler" in this case refers to the output of Lab #3 and/or preliminary work on HW#3, where you've got Bison and Flex talking together as a syntax checker.
  1. Add a tree.h file to your project, containing a struct definition as per HW#3, or your own preferred tree representation. Today's lab is written up as if your tree type is
    struct tree {
       int prodrule;
       char *symbolname;
       int nkids;
       struct tree *kids[10]; /* if nkids >0 */
       struct token *leaf;   /* if nkids == 0; NULL for ε productions */
    };
    
    The actual maximum number of children needed depends on the context-free grammar in your .y file, and possibly on whether you represent all punctuation tokens in your tree. The number may be smaller or larger than 10.

    You should write a full tree abstract data type in tree.c. Eventually it should provide functions to create, destroy, and traverse trees. Initially, for this lab, you will need functions to allocate and initialize leaves and internal nodes.

  2. Add #include "tree.h" directives to the header of your lex and yacc specifications. Typically this is inside %{ ... %}:
    %{
    #include "tree.h"
    %}
         
  3. Whenever you add an #include directive, you need to add a dependency to your makefile that will trigger recompilation if the include file is modified. In this case, the gcc compiles of lex.yy.o and cgram.tab.o depend on tree.h, not just on lex.yy.c and cgram.tab.c. Change rules like
    lex.yy.o : lex.yy.c
    	gcc -c -g -Wall lex.yy.c
    
    to instead say
    lex.yy.o : lex.yy.c tree.h
    	gcc -c -g -Wall lex.yy.c
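
The create and destroy parts of the tree abstract data type from step 1 might start out like the sketch below. It assumes the struct tree shown above. The struct token fields text and lineno, and the function names tree_alloc() and tree_free(), are hypothetical placeholders for whatever your HW#2 and HW#3 code actually uses.

```c
#include <stdlib.h>

/* Hypothetical token layout; substitute your HW#2 struct token. */
struct token {
   int cat;        /* integer category returned by yylex() */
   char *text;     /* heap-allocated copy of the lexeme (assumed) */
   int lineno;     /* source line number (assumed) */
};

struct tree {
   int prodrule;
   char *symbolname;
   int nkids;
   struct tree *kids[10]; /* if nkids > 0 */
   struct token *leaf;    /* if nkids == 0; NULL for epsilon productions */
};

/* Allocate a node with every field zeroed, then fill in the basics.
 * Returns NULL if memory is exhausted. */
struct tree *tree_alloc(int prodrule, char *symbolname)
{
   struct tree *t = calloc(1, sizeof (struct tree));
   if (t == NULL) return NULL;
   t->prodrule = prodrule;
   t->symbolname = symbolname;   /* not copied; assumed to be a string literal */
   return t;
}

/* Recursively free a tree. NULL children (epsilon productions) are skipped. */
void tree_free(struct tree *t)
{
   int i;
   if (t == NULL) return;
   for (i = 0; i < t->nkids; i++)
      tree_free(t->kids[i]);
   if (t->leaf != NULL) {
      free(t->leaf->text);
      free(t->leaf);
   }
   free(t);
}
```

Because tree_alloc() zeroes the node, a freshly allocated node is already a valid childless node, and tree_free() can be called on a partially built tree.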
    

2. Adding the %union to your bison

Once your Bison .y file knows about your tree type by including "tree.h" in its header section, you can add
%union {
   struct tree *treeptr;
}
to the header of your Bison file. As shown in lecture notes, adding this %union is one thing, and using it in your grammar rules is another: you have to declare which union member (in this case treeptr) each of your terminal and non-terminal symbols uses, with %token <treeptr> for terminals and %type <treeptr> for non-terminals, before you can refer to them in actions as $$ or $1 etc.

3. Adding the leaves to your lex

It is time to make the leaves available to Bison. With the %union declared, the output code that Bison writes will have a variable named yylval of type YYSTYPE where YYSTYPE is the union type given by %union.
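
For orientation, the declarations that Bison generates from this %union look roughly like the sketch below. The exact text varies between Bison versions, and hand_off_leaf() is a hypothetical helper that exists here only to show the assignment a lexer action performs.

```c
#include <stddef.h>

struct token;                      /* defined by your HW#2 code */

struct tree {
   int prodrule;
   char *symbolname;
   int nkids;
   struct tree *kids[10];
   struct token *leaf;
};

/* Roughly what Bison emits into its generated header for the %union. */
union YYSTYPE {
   struct tree *treeptr;
};
typedef union YYSTYPE YYSTYPE;

extern YYSTYPE yylval;   /* declaration, as seen by the lexer via the header */
YYSTYPE yylval;          /* definition; in a real build it lives in the generated parser */

/* A lexer action hands its leaf to the parser by storing it in yylval
 * before returning the token's integer category. */
int hand_off_leaf(struct tree *leaf, int category)   /* hypothetical helper */
{
   yylval.treeptr = leaf;
   return category;
}
```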

On each shift action, the parser will copy (push) what is in yylval.treeptr onto the value stack. So our lexer should allocate a tree leaf and assign it to yylval.treeptr where we previously allocated a token and assigned it to some global variable such as yytoken.

For some of you, this will be a small tweak; for others, it will be a major change, depending on how you implemented your HW#2. If you have already performed a given step here, you may skip forward.

  1. If you have raw return statements in your lex file like this:
    [0-9]+		{ return ICON; }
    
    You need to allocate your token inside your yylex() per HW#2 specs, with something like the following:
    [0-9]+		{ return alctoken(ICON); }
    
    where alctoken() is a helper function that you write to allocate a token. You have to do this for every regular expression that returns a token.
  2. In your code to allocate a token, change it to allocate a leaf that contains/wraps that token pointer. Change
    int alctoken(int category){
       yytoken = malloc(sizeof (struct token));
       yytoken->cat = category;
       ...
       return category;
    }
    
    to something like
    int alctoken(int category){
       yylval.treeptr = malloc(sizeof (struct tree));
       /* ... initialize other tree fields: prodrule = category, nkids = 0 ... */
       yylval.treeptr->leaf = malloc(sizeof (struct token));
       yylval.treeptr->leaf->cat = category;
       /* ... initialize other token fields ... */
       return category;
    }
    
    Note: you may want to write additional helper functions.
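
Filled in, the leaf-allocating alctoken() might look like the sketch below. The token fields text and lineno are assumptions about your HW#2 struct token. The variables yytext, yylineno, and yylval are declared locally only so that the sketch compiles by itself; in your project they come from the Flex- and Bison-generated code.

```c
#include <stdlib.h>
#include <string.h>

struct token {              /* hypothetical layout; use your HW#2 definition */
   int cat;
   char *text;
   int lineno;
};

struct tree {
   int prodrule;
   char *symbolname;
   int nkids;
   struct tree *kids[10];
   struct token *leaf;
};

/* Stand-ins so this sketch compiles alone. In your project, delete these:
 * Flex supplies yytext and yylineno, and Bison supplies yylval. */
union YYSTYPE { struct tree *treeptr; };
typedef union YYSTYPE YYSTYPE;
YYSTYPE yylval;
char *yytext;
int yylineno = 1;

int alctoken(int category)
{
   /* calloc zeroes both structs, so nkids starts at 0 and kids[] is empty */
   struct tree *t = calloc(1, sizeof (struct tree));
   struct token *tok = calloc(1, sizeof (struct token));
   if (t == NULL || tok == NULL) {
      free(t);
      free(tok);
      yylval.treeptr = NULL;
      return category;      /* out of memory; real code should report an error */
   }

   tok->cat = category;
   tok->lineno = yylineno;
   /* Copy the lexeme: Flex reuses the yytext buffer for the next token. */
   tok->text = malloc(strlen(yytext) + 1);
   if (tok->text != NULL)
      strcpy(tok->text, yytext);

   t->prodrule = category;  /* leaves reuse the token category as their rule */
   t->leaf = tok;
   yylval.treeptr = t;
   return category;
}
```

The lexeme is copied rather than pointed to, since a pointer into yytext would no longer refer to this token's text once the scanner moves on.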

4. Proving that you've got leaf information

There are two places where you could insert print statements to prove that you have leaf information: from inside the Flex-generated yylex() code, and from inside Bison-generated yyparse() code.
  1. Write a print function, perhaps printnode(t), that takes a pointer to a struct tree as a parameter and prints a line of detailed information about it. For a tree leaf, have it print the tree's token information (text, line #...) as per HW#2.
  2. Inside the yylex() function, for example, inside an alctoken() function, you can call printnode() right before you return, to see all the tokens as they come out of yylex().
  3. From inside yyparse() things are a bit more hairy. Consider a grammar rule like
         identifier : IDENTIFIER ;
    
    Once you have declared %token <treeptr> IDENTIFIER in your .y file, you can print the token information for IDENTIFIER as follows:
         identifier : IDENTIFIER { printnode($1); } ;
    
    You could go ahead and print all your terminal symbols in all your rules in this way, but for this lab, you are asked to just do a couple categories to verify that the leaf information is present and usable.
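
One possible shape for printnode() is sketched below. The token fields text and lineno are assumed, the output format is arbitrary, and formatnode() is a hypothetical helper that builds the line in a buffer so that it can be checked separately from printing.

```c
#include <stdio.h>

struct token {              /* hypothetical layout; use your HW#2 definition */
   int cat;
   char *text;
   int lineno;
};

struct tree {
   int prodrule;
   char *symbolname;
   int nkids;
   struct tree *kids[10];
   struct token *leaf;
};

/* Format one line describing t into buf; returns what snprintf returns. */
int formatnode(char *buf, size_t size, const struct tree *t)
{
   if (t == NULL)
      return snprintf(buf, size, "(null)");
   if (t->nkids == 0 && t->leaf != NULL)
      return snprintf(buf, size, "leaf: category %d, text \"%s\", line %d",
                      t->leaf->cat,
                      t->leaf->text ? t->leaf->text : "",
                      t->leaf->lineno);
   return snprintf(buf, size, "node: %s, rule %d, %d kids",
                   t->symbolname ? t->symbolname : "?", t->prodrule, t->nkids);
}

/* Print one line about t on standard output. */
void printnode(const struct tree *t)
{
   char buf[256];
   formatnode(buf, sizeof buf, t);
   puts(buf);
}
```

Handling a NULL pointer explicitly is useful here, because a print call placed in a grammar action can receive NULL if a symbol's value was never set.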

5. Adding internal nodes (not required for Lab 4, but in HW#)

Syntax tree construction consists of specifying, for every production rule in the grammar, what its corresponding syntax tree should be, by assigning $$ a value.

  1. For each production rule that has more than one child, allocate an internal node. Example:
    translation_unit: translation_unit external_declaration {
            $$ = alctree(TU_TU_ED, "translation_unit", 2, $1, $2);
    	} ;
    
  2. Set $$ to NULL for epsilon productions. Example:
       parameters : { $$ = NULL; /* empty parameter list */ } ;
    
    Note that epsilon productions are a frequent cause of ambiguity (e.g. reduce/reduce conflicts) and most grammar files go out of their way to avoid doing any of these!
  3. Bison uses $$=$1 as its default semantic action. This works great when a non-terminal has only one child. You don't have to write this semantic action at all. No { } is required. Example:
       parameters : parameter_list ;
    
    Note that you can add a semantic action here, such as one that allocates a tree node with one child. Do so if it will help you understand your tree afterwards.
       parameters : parameter_list { $$ = alctree(PS_PL, "parameters", 1, $1); };
    
  4. Write (or adapt from the HW#3 spec) a tree traverser to print your tree. You can either call it from the start symbol action, or set the root of the tree and call it after yyparse() returns. Either:
    file : translation_unit {
           tree_print($1); /* can call rest of compiler from here */
         } ;
    
    or define a global variable for the tree root
    file : translation_unit { root = $1; } ;
    
    ... and then, after yyparse() in main:
    
      tree_print(root);
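
The alctree() and tree_print() calls used in the steps above could be implemented along the lines of the following sketch. It is one possible design, not the HW#3 reference code: the variable-argument signature is inferred from the calls shown above, MAXKIDS is a hypothetical name for the size of kids[], and the token fields text and lineno are assumed.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define MAXKIDS 10          /* must match the size of kids[] */

struct token {              /* hypothetical layout; use your HW#2 definition */
   int cat;
   char *text;
   int lineno;
};

struct tree {
   int prodrule;
   char *symbolname;
   int nkids;
   struct tree *kids[MAXKIDS];
   struct token *leaf;
};

/* Allocate an internal node; the variable arguments are nkids child pointers. */
struct tree *alctree(int prodrule, char *symbolname, int nkids, ...)
{
   va_list ap;
   struct tree *t;
   int i;

   if (nkids < 0 || nkids > MAXKIDS) return NULL;
   t = calloc(1, sizeof (struct tree));
   if (t == NULL) return NULL;
   t->prodrule = prodrule;
   t->symbolname = symbolname;
   t->nkids = nkids;
   va_start(ap, nkids);
   for (i = 0; i < nkids; i++)
      t->kids[i] = va_arg(ap, struct tree *);
   va_end(ap);
   return t;
}

/* Print the subtree rooted at t, indenting two spaces per level. */
static void tree_print_depth(const struct tree *t, int depth)
{
   int i;
   if (t == NULL) return;         /* epsilon production */
   if (t->nkids == 0 && t->leaf != NULL)
      printf("%*sleaf %d: %s\n", depth * 2, "", t->leaf->cat,
             t->leaf->text ? t->leaf->text : "");
   else
      printf("%*s%s (rule %d)\n", depth * 2, "",
             t->symbolname ? t->symbolname : "?", t->prodrule);
   for (i = 0; i < t->nkids; i++)
      tree_print_depth(t->kids[i], depth + 1);
}

void tree_print(const struct tree *t)
{
   tree_print_depth(t, 0);
}
```

Production rule codes such as TU_TU_ED are integer constants that you define yourself. The nkids argument must match the number of child pointers actually passed, since a variable-argument function has no way to check this.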