CSE 423 Lab #5: Drawing Syntax Trees

Turnin:

an image of your hand-drawn syntax tree with your name on it (individual), and
a pretty PNG image produced from your compiler (group OK)

Due on Canvas, Sunday 3/2, 11:59pm

This week's lab asks you to practice drawing syntax trees by hand, and then implement a tree traversal that generates a graphical representation of your syntax trees or symbol tables. Turn in as far as you get by Sunday evening; preferably a .zip with some sample images and your compiler code base that produced them. You will be graded as having done the lab as long as it looks like you did two or more hours of work.

Part 1: Drawing Syntax Trees by Hand

You need a concrete mental model of the syntax trees upon which the entire rest of the course will be built. I will literally ask you to draw syntax trees on the midterm and final exam, possibly with aspects of semantic analysis or code generation to work out. So let's practice drawing some syntax trees. Below are three short Kotlin programs with which you can practice drawing syntax trees.

You will need a Bison grammar that approximates Kotlin, such as one you are writing for HW. It is possible to do the lab faking this part, but the closer your approximation is to reality, the more useful the lab is.
Draw a circle or oval for each non-leaf node with two or more children.
Draw a square or rectangle for each leaf node (token). You can omit punctuation that is implied by the production rule.
Draw the non-terminal name inside circles/ovals, and the terminal symbol name inside squares/rectangles.
Draw lines connecting parents with children.

fun main(args : Array<String>) {
    // Print text to the console.
    println("Hello World!")
}

fun main() {
    var x : Int = 5
    x + 1
}

fun main() {
    var n : Int = 5
    if (n < 0) {
        println("${n} is negative")
    } else {
        println("${n} is zero")
    }
}

Check your work by exchanging with a classmate, or do Part 2 and compare your work against the Machine.

Part 2: Dot

Dot is part of a package called Graphviz. It is available for Linux, Windows, and macOS. On login.cs.nmt.edu it is in /usr/bin/dot, which should already be on your path. If you want to do this lab on your own machine, you will have to download and install Graphviz.

The dot language is a human-readable, plain-text way of describing a graph, which dot then renders into formats such as a PNG image file.
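To make the format concrete, here is a minimal C sketch that writes a hand-built three-node graph, roughly the tree for the expression x + 1 with the + implied by the production rule. The N0/N1/N2 node names follow the N<id> convention used later in this lab; the labels and the write_demo_graph() name are made up for illustration.

```c
#include <stdio.h>

/* Write a tiny hand-built syntax tree for the expression x + 1 in Dot
 * format. Node names and labels are illustrative only. */
void write_demo_graph(FILE *f)
{
   fputs("digraph {\n", f);
   /* node statements: internal nodes keep Dot's default ellipse shape */
   fputs("  N0 [label=\"additive_expression\"];\n", f);
   fputs("  N1 [shape=box label=\"IDENTIFIER: x\"];\n", f);
   fputs("  N2 [shape=box label=\"INTEGER_LITERAL: 1\"];\n", f);
   /* edge statements: parent -> child */
   fputs("  N0 -> N1;\n", f);
   fputs("  N0 -> N2;\n", f);
   fputs("}\n", f);
}
```

Calling write_demo_graph(stdout) and redirecting the output to demo.dot gives a file dot can render: one ellipse with two boxed children, since Dot draws nodes as ellipses unless a shape attribute says otherwise.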

Here is the dot language reference

Prepping your Dot Syntax Trees

To use Dot, each tree node will require a unique identification. While we could maybe get away with using each node's address (pointer) as its id, you should probably go ahead and add a new field to your tree structure (translate as needed to fit your tree type):

struct tree {
  ...
  int id;     /* unique node number, used to name the node in Dot output */
  ...
};

And initialize this field in alctree() or whatever you are using to construct tree nodes (translate as needed):

int serial;   /* global node counter; file-scope ints start at 0 */
struct tree *alctree(int label, char *sname, int nkids, ...)
{
   ...
   ptr->id = serial++;   /* each new node takes the next unused id */
   ...
}

If you have additional functions that allocate tree nodes (alcleaf() or whatever), they should also initialize the id field the same way.
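As a self-contained illustration of the id bookkeeping, here is a stripped-down, hypothetical version of struct tree with alctree() and alcleaf() constructors that both draw from the one serial counter. The field names mirror the ones the printing code in this lab uses (id, leaf, symbolname, prodrule, nkids, kids), but the struct token layout and the calloc-based allocation are assumptions; translate to your own types.

```c
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical token type: only the fields this sketch needs. */
struct token { int category; char *text; int lineno; };

/* Hypothetical, simplified tree node. */
struct tree {
   int id;                 /* unique serial number, used as the Dot node name */
   int prodrule;
   char *symbolname;
   int nkids;
   struct tree **kids;
   struct token *leaf;     /* non-NULL only for leaves */
};

int serial;   /* shared by every allocator; file scope, so it starts at 0 */

/* Allocate an internal node whose nkids children are passed as varargs. */
struct tree *alctree(int prodrule, char *sname, int nkids, ...)
{
   struct tree *ptr = calloc(1, sizeof *ptr);
   if (ptr == NULL) return NULL;
   ptr->id = serial++;
   ptr->prodrule = prodrule;
   ptr->symbolname = sname;
   ptr->nkids = nkids;
   if (nkids > 0) {
      ptr->kids = calloc(nkids, sizeof *ptr->kids);
      if (ptr->kids == NULL) { free(ptr); return NULL; }
      va_list ap;
      va_start(ap, nkids);
      for (int i = 0; i < nkids; i++)
         ptr->kids[i] = va_arg(ap, struct tree *);
      va_end(ap);
   }
   return ptr;
}

/* Allocate a leaf wrapping a token; it gets its id from the same counter. */
struct tree *alcleaf(struct token *tok)
{
   struct tree *ptr = calloc(1, sizeof *ptr);
   if (ptr == NULL) return NULL;
   ptr->id = serial++;
   ptr->leaf = tok;
   return ptr;
}
```

Because the children arrive through `...`, pass an empty child as `(struct tree *)NULL` rather than a bare `NULL` or `0`, so that va_arg reads an argument of the right type.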

Adding names for token categories

You may already have written your own means of printing token categories, but if YYDEBUG is defined, Bison (though maybe not other yaccs, or not compatibly) writes out a static array of strings, yytname, from which you can get token (and non-terminal) names.

#if YYDEBUG || YYERROR_VERBOSE || 0
/* YYTNAME[SYMBOL-NUM] -- String name of the symbol SYMBOL-NUM.
   First, the terminals, then, starting at YYNTOKENS, nonterminals.  */
static const char *const yytname[] =
{
  "$end", "error", "$undefined", "BAD_TOKEN", "ICON", "CCON", "FCON",
  "ENUMERATION_CONSTANT", "IDENTIFIER", "STRING", "SIZEOF", "INCOP",

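You can't read this static array from other .c files, but you can add the following to the end of your .y file (if your .y file already has a second %% section, put just the function there instead of adding another %%):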

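%%

/* Map a token category (e.g. BAD_TOKEN) to its name in yytname.
   Assumes BAD_TOKEN is the first token declared, so it sits at index 3,
   right after "$end", "error", and "$undefined"; only valid for tokens. */
const char *yyname(int sym)
{
   return yytname[sym-BAD_TOKEN+3];
}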

If you are writing your compiler in a different language or using a different YACC implementation you may have to adapt or build-your-own substantially in order to perform the equivalent; feel free to ask for help and/or share your results if you figure it out for another YACC implementation.
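For the build-your-own route, one simple option is a hand-maintained table pairing each token code with its name. This is only a sketch: the token names and values below are hypothetical stand-ins for whatever your lexer or parser generator actually defines, and it reuses the yyname() name so the rest of the lab's code can call it unchanged.

```c
#include <stddef.h>

/* Hypothetical token codes; in a real compiler these come from the
 * header your parser generator emits. */
enum { IDENTIFIER = 258, INTEGER_LITERAL, STRING_LITERAL, FUN, VAR };

/* One entry per token category: its numeric code and printable name. */
static const struct {
   int code;
   const char *name;
} token_names[] = {
   { IDENTIFIER,      "IDENTIFIER" },
   { INTEGER_LITERAL, "INTEGER_LITERAL" },
   { STRING_LITERAL,  "STRING_LITERAL" },
   { FUN,             "FUN" },
   { VAR,             "VAR" },
};

/* Linear search is plenty fast for a few dozen token kinds. */
const char *yyname(int sym)
{
   for (size_t i = 0; i < sizeof token_names / sizeof token_names[0]; i++)
      if (token_names[i].code == sym)
         return token_names[i].name;
   return "UNKNOWN_TOKEN";
}
```

The main cost is keeping the table in sync with the token definitions by hand; an X-macro list that generates both the enum and the table is one way to avoid that.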

Writing your Dot Syntax Trees

Write a print_graph() function that traverses your tree and writes out a Dot file with extension .dot. You can base this on your textual tree-printing function from HW#3, and just modify your print statements to write things in Dot format.

Your solution might look something like the following (adapted from [Jeffery2021]). You may have to debug this code!


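#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* ...plus whatever header declares your struct tree and token types */

/* If s is a string literal, return a malloc'd copy with a \ added before
   the leading and trailing double quotes; otherwise return s unchanged. */
char *escape(char *s) {
   size_t len = strlen(s);
   char *s2;
   if (len == 0 || s[0] != '\"')
      return s;                       /* not a string literal */
   if (len < 2 || s[len-1] != '\"')
      fprintf(stderr, "escape: unterminated string literal: %s\n", s);
   s2 = malloc(len + 3);              /* room for two backslashes + NUL */
   if (s2 == NULL) return s;
   sprintf(s2, "\\%s", s);            /* "abc"  ->  \"abc"             */
   if (len >= 2 && s[len-1] == '\"')
      strcpy(s2 + len, "\\\"");       /* overwrite final quote: \"abc\" */
   return s2;
}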

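char *pretty_print_name(struct tree *t) {
   /* label is "symbolname#N" for internal nodes, "text:category" for leaves */
   char *text = (t->leaf == NULL) ? t->symbolname : escape(t->leaf->text);
   char *s2 = malloc(strlen(text) + 16);   /* room for separator, an int, NUL */
   if (s2 == NULL) {
      perror("pretty_print_name");
      exit(1);
      }
   if (t->leaf == NULL)
      sprintf(s2, "%s#%d", text, t->prodrule%10);
   else
      sprintf(s2, "%s:%d", text, t->leaf->category);
   return s2;
}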

void print_branch(struct tree *t, FILE *f) {
   char *name = pretty_print_name(t);   /* malloc'd label text */
   fprintf(f, "N%d [shape=box label=\"%s\"];\n", t->id, name);
   free(name);
}

const char *yyname(int);   /* defined at the end of the .y file */

void print_leaf(struct tree *t, FILE *f) {
   const char *s = yyname(t->leaf->category);
   fprintf(f, "N%d [shape=box style=dotted label=\" %s \\n ", t->id, s);
   fprintf(f, "text = %s \\l lineno = %d \\l\"];\n", escape(t->leaf->text),
           t->leaf->lineno);
}

extern int serial;   /* node counter from alctree(); larger than every id in use */

void print_graph2(struct tree *t, FILE *f) {
   int i;
   if (t->leaf != NULL) {
      print_leaf(t, f);
      return;
      }
   /* not a leaf ==> internal node */
   print_branch(t, f);
   for (i = 0; i < t->nkids; i++) {
      if (t->kids[i] != NULL) {
         fprintf(f, "N%d -> N%d;\n", t->id, t->kids[i]->id);
         print_graph2(t->kids[i], f);
         }
      else { /* NULL kid, epsilon production or something */
         /* use a fresh id; pasting two numbers together ("N%d%d") can collide */
         fprintf(f, "N%d -> N%d;\n", t->id, serial);
         fprintf(f, "N%d [label=\"%s\"];\n", serial, "Empty rule");
         serial++;
         }
      }
}

void print_graph(struct tree *t, char *filename){
   FILE *f = fopen(filename, "w");
   if (f == NULL) {
      perror(filename);
      return;
      }
   fprintf(f, "digraph {\n");
   print_graph2(t, f);
   fprintf(f, "}\n");
   fclose(f);
}
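The escape() function in the code above only adds backslashes before a leading and trailing quote, so some lexemes still misbehave: a Kotlin character literal '"' or a raw string """...""" puts an unescaped quote into the label and breaks the .dot file, and a \n inside a string literal turns into a line break in the rendered label. A more general alternative escapes every quote and backslash. This is a sketch that assumes Dot reads \" inside a quoted label as a literal quote and \\ as a literal backslash; dot_escape() is a hypothetical name you could call from print_leaf() in place of escape().

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'd copy of s in which every double quote and
 * backslash is preceded by a backslash, so the whole lexeme can sit
 * inside a Dot quoted-string label. Caller frees the result;
 * returns NULL if malloc fails. */
char *dot_escape(const char *s)
{
   size_t extra = 0;
   for (const char *p = s; *p; p++)
      if (*p == '"' || *p == '\\')
         extra++;                       /* one added backslash per special char */
   char *out = malloc(strlen(s) + extra + 1);
   if (out == NULL) return NULL;
   char *q = out;
   for (const char *p = s; *p; p++) {
      if (*p == '"' || *p == '\\')
         *q++ = '\\';
      *q++ = *p;
   }
   *q = '\0';
   return out;
}
```

Unlike escape(), this always returns fresh memory, so the caller should free() the result once it has been written to the .dot file.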

Adding a command line option

Modify your compiler to accept an optional -dot argument; when that option is present, have your compiler call your .dot-file generating code instead of performing its normal behavior.
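A minimal sketch of the option handling, assuming your compiler takes the source file name on the command line the way ./k0 -dot foo.kt does. has_dot_flag() and dot_filename() are hypothetical helper names, and the output name simply appends .dot to the source file name, matching foo.kt.dot.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if "-dot" appears among the command-line arguments, else 0. */
int has_dot_flag(int argc, char *argv[])
{
   for (int i = 1; i < argc; i++)
      if (strcmp(argv[i], "-dot") == 0)
         return 1;
   return 0;
}

/* Build the output name "<source>.dot", e.g. foo.kt -> foo.kt.dot.
 * Returns a malloc'd string (caller frees) or NULL on failure. */
char *dot_filename(const char *source)
{
   char *name = malloc(strlen(source) + sizeof ".dot");  /* sizeof counts the NUL */
   if (name == NULL) return NULL;
   strcpy(name, source);
   strcat(name, ".dot");
   return name;
}
```

In main(), after parsing succeeds, you might then write something like `if (has_dot_flag(argc, argv)) { char *out = dot_filename(filename); print_graph(root, out); free(out); }`, where root and filename stand for whatever your compiler already calls the tree root and the current source file.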

Running the Dot Program

Given a .dot file written by your compiler, such as hello.kt.dot, you can generate a PNG image by invoking dot as follows:

dot -Tpng hello.kt.dot >hello.png
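If you would rather have the compiler render the PNG itself after writing the .dot file, one possible sketch (an optional convenience, not something the lab requires) builds the same kind of command line and hands it to the C library's system() function. It assumes dot is on your PATH and that the file names contain no spaces or shell metacharacters; dot_command() and render_png() are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format the shell command that renders dotfile as a PNG.
 * Returns snprintf's result: the length the full command needs,
 * or a negative value on an encoding error. */
int dot_command(char *buf, size_t n, const char *dotfile, const char *pngfile)
{
   return snprintf(buf, n, "dot -Tpng %s >%s", dotfile, pngfile);
}

/* Run dot on dotfile; returns system()'s status (0 usually means success),
 * or -1 if the command would not fit in the buffer. */
int render_png(const char *dotfile, const char *pngfile)
{
   char cmd[1024];
   int len = dot_command(cmd, sizeof cmd, dotfile, pngfile);
   if (len < 0 || (size_t)len >= sizeof cmd)
      return -1;
   return system(cmd);
}
```

Calling render_png("foo.kt.dot", "foo.png") runs the same command you would otherwise type by hand; keeping the formatting in its own function makes the command string easy to check without actually invoking dot.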

View the generated image to make sure it looks understandable. For example, for a C subset language, a hello world program (hello.c) results in the following:

Of course, for some other language, your tree should look different.

Wrapping up

Try your .dot-file generation on the preceding examples, and the following (nonsensical) program:

// foo.kt
fun foo(x : Int, y : String) : Int {
   return x
   }
fun main() {
   var z : Int
   z = foo(5, ("funf").toString())
}

For this program, you might be running

./k0 -dot foo.kt
dot -Tpng foo.kt.dot >foo.png

When foo.png looks reasonable, you are done with the lab. Submit your image on Canvas.