CSE 423 Lab #2: Lexical Testing

For today's lab, we are going to work on HW#2 as much as we can, and think about how to test it sufficiently.

Lab Part 1: Team Up and Swap Notes (20 minutes)

Break into small groups, preferably of size three. Zoom folks, I guess we'll try one big zoom group.
Share with each other the non-trivial regexes that you have in your k0lex.l so far. Discuss each.
If you have questions about what regular expression is correct, write those down as part of your lab submission.
For each non-trivial regex, decide, either unanimously or by vote, which candidate regular expression is the most correct.

Lab Part 2: Lex Testing

Write test cases for each token category, especially the non-trivial ones.
Hello World is a test case (.kt file). It provides coverage for ~10 token categories.
A brute-force lexical coverage test would include at least one token from every category on the entire list. Prove that your lexical analyzer can generate all ~200 integer codes for the unique terminal symbols in your ytab.h file.
Subdivide this task among team members and share the results.
Add one or more tests for everything in Kotlin
Test the boundary cases ("weird" and "bad" tokens).
Unterminated strings
Random binary garbage characters
Nested comments
...etc.
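One way to check the brute-force coverage goal above is to have a test driver record every integer code that yylex() returns and then count the distinct ones. The following is only a sketch: the names coverage_record, coverage_count, and MAX_CODE are made up for illustration, and the upper bound on token codes is an assumption you would adjust to match your own ytab.h.

```c
#include <assert.h>

/* Assumed upper bound on token category codes; adjust to your ytab.h. */
#define MAX_CODE 1024

/* seen[c] is 1 once category code c has been returned at least once. */
static unsigned char seen[MAX_CODE];

/* Hypothetical helper: record one category code returned by yylex(). */
void coverage_record(int code)
{
    assert(code >= 0);
    if (code < MAX_CODE)
        seen[code] = 1;
}

/* Hypothetical helper: count the distinct codes in [lo, hi] seen so far. */
int coverage_count(int lo, int hi)
{
    int n = 0;
    for (int c = lo; c <= hi && c < MAX_CODE; c++)
        if (c >= 0 && seen[c])
            n++;
    return n;
}
```

A driver would call coverage_record() on each value returned by yylex() while running all the .kt test files, then compare coverage_count() over the range of codes in ytab.h against the number of terminal symbols defined there.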

What to Turn in

Lab submission: turn in a .zip file. For Part 1, include a PDF document named README.pdf that provides:

your list of team members and their participation scores (0 for no participation, 1 for partial participation, 2 for full participation)
your list of which Kotlin tokens deserve non-trivial regexes.
your group's (unanimous or vote-based) winners for these regexes
your list of questions (if any)

For Part 2: include in the .zip your set of lexical test cases, in .kt files.

You will receive full credit for the lab if it looks like you did at least three hours' worth of lab work. I figure each group should have no problem generating 30 or more test cases that collectively test over 100 of your tokens. Teams can probably get closer to 100% coverage than that. The tests do not have to "run" and they do not have to "compile"; they only have to provide some lexical challenges relevant to Kotlin.

Answers to Lab Questions

What do you mean, put the tokens in yylex()?: I mean malloc() an instance of each token inside yylex() and initialize its fields before you return the integer category. The type definition for tokens should be in a .h file.
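As a rough illustration of the answer above, here is a minimal sketch of a token type and an allocator that a yylex() action could call before returning the category. The struct layout and the names alloc_token and free_token are assumptions for illustration only; use whatever fields HW#2 actually specifies, and put the type definition in your .h file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token type; field names are illustrative, not the
 * required HW#2 layout. In a real project this goes in a .h file. */
struct token {
    int category;   /* integer code that yylex() returns */
    char *text;     /* private copy of the lexeme (yytext) */
    int lineno;     /* line number where the token was found */
};

/* Allocate a token and initialize its fields.
 * Returns NULL if memory runs out. */
struct token *alloc_token(int category, const char *text, int lineno)
{
    assert(text != NULL);
    struct token *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->text = malloc(strlen(text) + 1);
    if (t->text == NULL) {
        free(t);
        return NULL;
    }
    strcpy(t->text, text);   /* yytext is reused by the scanner, so copy it */
    t->category = category;
    t->lineno = lineno;
    return t;
}

/* Release a token created by alloc_token(). */
void free_token(struct token *t)
{
    if (t != NULL) {
        free(t->text);
        free(t);
    }
}
```

A semantic action would then do something like store the result of alloc_token(code, yytext, yylineno) in a global token pointer and return code. The global pointer is likewise an assumption here, not something this handout prescribes.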
How do I de-escape the string literals for the svals as shown in the HW#2 example output?: You would probably write a helper function that builds an sval from yytext by doing a "transforming copy": most yytext characters are copied into the sval unchanged, but each escape sequence in yytext maps 2+ characters down to a single character in the sval.
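The "transforming copy" described above might look roughly like the sketch below. The function name deescape is made up, and the sketch assumes that yytext still includes the surrounding double quotes and that your regex has already rejected unterminated literals. It handles the simple backslash escapes (\n, \t, \r, \b, \\, \", \', \$) and deliberately leaves out \uXXXX escapes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build an sval from the yytext of a string literal.
 * Assumes yytext starts and ends with a double quote.
 * Returns a malloc'ed string the caller must free, or NULL on failure. */
char *deescape(const char *yytext)
{
    assert(yytext != NULL);
    size_t len = strlen(yytext);
    if (len < 2)
        return NULL;                 /* not even a pair of quotes */

    /* The sval is never longer than the quoted body (len - 2) plus NUL. */
    char *sval = malloc(len - 1);
    if (sval == NULL)
        return NULL;

    size_t j = 0;
    /* Walk the characters between the opening and closing quotes. */
    for (size_t i = 1; i + 1 < len; i++) {
        char c = yytext[i];
        if (c == '\\' && i + 2 < len) {
            c = yytext[++i];         /* consume the escaped character */
            switch (c) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            case 'r': c = '\r'; break;
            case 'b': c = '\b'; break;
            default:  break;         /* \\ \" \' \$ map to the char itself;
                                        \uXXXX is NOT decoded in this sketch */
            }
        }
        sval[j++] = c;
    }
    sval[j] = '\0';
    return sval;
}
```

Allocating len - 1 bytes up front works because de-escaping can only shrink the text, so no second pass is needed to measure the result.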
It seems like you want a lot of stuff to be done inside yylex(): Yes, probably in 1+ helper functions called from the semantic action code fragments of yylex().
Multi-line strings?: Kotlin allows multi-line strings. It was decided k0 will do multi-line strings. However, tricky bits are negotiable.