In this assignment you will write a lexical analyzer in
flex(1) for a subset of Kotlin known as the k0
language.
Engineering Requirements
In this and all subsequent assignments in 423, please meet the following
engineering requirements. Points will be assigned in grading for them.
Turn in a single .zip file that unpacks with unzip.
It may not be a .tar or a .rar
or a .bzip or whatever, whether disguised or renamed or not.
The .zip must unpack into the current directory, not a
subdirectory. Subdirectories are fine, but there must be a
top-level makefile that builds an executable named
k0
in the top-level directory into which your files were unzipped.
That is what my test script will attempt to run.
k0.
No #include of .c files. No code (function
bodies) in .h files.
Use -Wall
on all compilation lines. If you are using another language,
which must be approved by Dr. J, you must likewise use all of
its warning options, or get any omissions approved.
Points will be lost if you don't fix warnings. Some common
lex/flex warnings, such as the one about input() being defined but not used,
are no big deal, but use
%option noinput
%option nounput
to silence them. See the instructor if you are unable to fix a warning.
Your program executable must be named k0. Your program
should read in source file(s) named on the command line and
write output with one line for each token, described below.
Your compiler must accept source files with the extension .kt. It should
automatically append .kt to the end of a filename if no other extension is given.
(In a later homework, the compiler will automatically give the output
executable the same name as the first argument.
For this assignment there is no output executable.)
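The default-extension rule can be handled before opening each file. A minimal sketch, assuming that "has an extension" means the last path component contains a '.'; the function name add_kt_extension() is hypothetical, not part of the required interface.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of name, with ".kt"
 * appended when the last path component contains no '.'.
 * Returns NULL if malloc fails; the caller must free() the result. */
char *add_kt_extension(const char *name)
{
    const char *base = strrchr(name, '/');    /* last '/' in the path, if any */
    size_t len = strlen(name);
    int has_ext;
    char *result;

    base = (base != NULL) ? base + 1 : name;  /* start of last component */
    has_ext = (strchr(base, '.') != NULL);

    result = malloc(len + (has_ext ? 1 : 4)); /* room for ".kt" and the NUL */
    if (result == NULL)
        return NULL;
    memcpy(result, name, len + 1);            /* copy including the NUL */
    if (!has_ext)
        strcat(result, ".kt");
    return result;
}
```

main() would call this on each argv[i] before fopen(). A name such as prog.txt is left unchanged, since the text does not say whether other extensions should be rejected.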
Compilers and related tools are used by programs such as make(1)
that read the process exit status to tell whether all is well. Your
program should exit with status 0 if there are no errors, and with a nonzero
status to indicate errors. For lexical errors, exit with status 1.
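One way to produce that status is to count lexical errors as they are reported and decide the exit code at the end. A minimal sketch; the counter lexical_errors and the functions lexical_error() and k0_exit_status() are hypothetical names, and continuing after an error (rather than stopping at the first one) is an assumption, since the assignment does not say which to do.

```c
#include <stdio.h>

/* Hypothetical global count of lexical errors seen so far. */
int lexical_errors = 0;

/* Hypothetical reporter, e.g. called from a flex catch-all rule:
 * print a diagnostic on stderr and count the error. */
void lexical_error(const char *filename, int lineno, const char *msg)
{
    fprintf(stderr, "%s:%d: lexical error: %s\n", filename, lineno, msg);
    lexical_errors++;
}

/* Value for main() to return: 0 if all is well, 1 after lexical errors. */
int k0_exit_status(void)
{
    return (lexical_errors > 0) ? 1 : 0;
}
```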
The k0 language is (not yet) described at
http://www.cs.nmt.edu/~jeffery/courses/423/k0.html. As this
is a new language this semester, these details will be filled in,
corrected, and amended as needed in response to student questions.
yylex() and main(),
below.
"hello
/* world
12e
Within yylex(), you should compute attributes for each token
and store them in a global variable named yytoken. Note that
this is not part of the lex/yacc public interface, although it is named so
as to be a recognizable extension of said interface. You should use the
following token type, or a compatible extension of it.
struct token {
int category; /* the integer code returned by yylex */
char *text; /* the actual string (lexeme) matched */
int lineno; /* the line number on which the token occurs */
char *filename; /* the source file in which the token occurs */
int ival; /* for integer constants, store binary value here */
double dval; /* for real constants, store binary value here */
char *sval; /* for string constants, malloc space, de-escape, store */
/* the string (less quotes and after escapes) here */
};
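The sval field calls for removing the quotes and de-escaping string literals. A minimal sketch, assuming k0 follows Kotlin's simple escapes (\t, \b, \n, \r, \', \", \\, \$); \uXXXX escapes and string templates are not handled, and the name deescape() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given a string-literal lexeme including its
 * double quotes, return a malloc'ed copy without the quotes and with
 * escape sequences replaced by the characters they denote.
 * Returns NULL on a malformed lexeme, an unknown escape, or malloc failure. */
char *deescape(const char *lexeme)
{
    size_t len = strlen(lexeme);
    const char *p, *end;
    char *out, *o;

    if (len < 2 || lexeme[0] != '"' || lexeme[len - 1] != '"')
        return NULL;
    out = malloc(len - 1);          /* body is len-2 chars, plus the NUL */
    if (out == NULL)
        return NULL;
    o = out;
    end = lexeme + len - 1;         /* points at the closing quote */
    for (p = lexeme + 1; p < end; p++) {
        if (*p != '\\') {           /* ordinary character: copy it */
            *o++ = *p;
            continue;
        }
        p++;                        /* skip the backslash */
        if (p >= end) {             /* backslash right before closing quote */
            free(out);
            return NULL;
        }
        switch (*p) {
        case 't':  *o++ = '\t'; break;
        case 'b':  *o++ = '\b'; break;
        case 'n':  *o++ = '\n'; break;
        case 'r':  *o++ = '\r'; break;
        case '\'': case '"': case '\\': case '$':
            *o++ = *p;              /* escaped character stands for itself */
            break;
        default:                    /* unknown escape: a lexical error */
            free(out);
            return NULL;
        }
    }
    *o = '\0';
    return out;
}
```

With this sketch, the lexeme "Hello,\tWorld!" from the example below yields Hello, and World! separated by a real tab character.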
In this homework your main() procedure should
build a LINKED LIST of all the token structs, each of which is created by
yylex(). In the next assignment, we will discard the linked
list and instead insert all these tokens into a tree.
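Each flex action can create a fresh token and record it in yytoken before returning the category. A minimal sketch, assuming yytoken is a struct token * pointing at the newest token (the text only says it is a global) and that integer literals are plain decimal; make_token(), copy_string(), and the category values are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct token {
    int category;   /* the integer code returned by yylex */
    char *text;     /* the actual string (lexeme) matched */
    int lineno;     /* the line number on which the token occurs */
    char *filename; /* the source file in which the token occurs */
    int ival;       /* for integer constants */
    double dval;    /* for real constants */
    char *sval;     /* for string constants: de-escaped, without quotes */
};

/* Illustrative category codes (hypothetical values). */
#define INTLIT  258
#define REALLIT 259

/* Assumption: yytoken points at the most recently created token. */
struct token *yytoken;

/* malloc'ed copy of s (strdup() is POSIX, not ISO C before C23). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Hypothetical helper for flex actions, for example
 *     return make_token(INTLIT, yytext, yylineno, current_file);
 * Allocates a token, stores it in yytoken, and returns its category.
 * Exits with status 2 (an arbitrary nonzero choice) if malloc fails. */
int make_token(int category, const char *text, int lineno, char *filename)
{
    struct token *t = malloc(sizeof *t);

    /* Copy the lexeme: flex overwrites yytext on later matches. */
    if (t == NULL || (t->text = copy_string(text)) == NULL) {
        perror("k0");
        exit(2);
    }
    t->category = category;
    t->lineno = lineno;
    t->filename = filename;   /* shared, not copied: must outlive tokens */
    t->ival = 0;
    t->dval = 0.0;
    t->sval = NULL;           /* string literals would set this */
    if (category == INTLIT)
        t->ival = (int)strtol(text, NULL, 10);  /* decimal only here */
    else if (category == REALLIT)
        t->dval = strtod(text, NULL);
    yytoken = t;
    return category;
}
```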
Example linked list structure:
struct tokenlist {
struct token *t;
struct tokenlist *next;
};
Use the malloc() function to allocate chunks of memory for
struct token and struct tokenlist.
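Appending to such a list in source order is simplest with a tail pointer. A minimal sketch; tokenlist_append() and tokenlist_free() are hypothetical names, and the struct token definition is repeated only so the sketch compiles on its own.

```c
#include <stdio.h>
#include <stdlib.h>

struct token {
    int category;
    char *text;
    int lineno;
    char *filename;
    int ival;
    double dval;
    char *sval;
};

struct tokenlist {
    struct token *t;
    struct tokenlist *next;
};

/* Append t to the list described by *head and *tail (both NULL when the
 * list is empty). The tail pointer makes each append O(1) and keeps the
 * tokens in source order. Exits with status 2 if malloc fails. */
void tokenlist_append(struct tokenlist **head, struct tokenlist **tail,
                      struct token *t)
{
    struct tokenlist *node = malloc(sizeof *node);

    if (node == NULL) {
        perror("k0");
        exit(2);
    }
    node->t = t;
    node->next = NULL;
    if (*tail == NULL)
        *head = node;           /* first token */
    else
        (*tail)->next = node;   /* link after the current last node */
    *tail = node;
}

/* Free the list nodes only; the tokens themselves are freed separately. */
void tokenlist_free(struct tokenlist *head)
{
    while (head != NULL) {
        struct tokenlist *next = head->next;
        free(head);
        head = next;
    }
}
```

In main(), the loop body would then be roughly tokenlist_append(&head, &tail, yytoken) after each call to yylex() that does not return -1.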
yylex() and main()
yylex() should return a different unique integer > 257
for each reserved word, and for each other token category (identifier,
integer literal constant, string literal constant, addition operator, etc).
Numbers > 257 are required for the sake of compatibility with the
parser generator tool. For each such number, you must #define
a symbol, as in
#define IDENTIFIER 260
This is required for the sake of readability. Your
yylex()
should return -1 when it hits end of file. In this homework, your
yylex() should recognize lines beginning with # and treat them
as comments, i.e. delete the line contents silently. In later homework,
treatment of preprocessor directives will become more interesting.
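A shared header, included by both the flex specification and the file containing main(), is one way to keep the #defines in a single place. The file name k0tok.h, the symbol names, and every value except IDENTIFIER 260 (taken from the text) are hypothetical; k0's reserved words are not specified yet, so this list is only illustrative.

```c
/* k0tok.h (hypothetical name): one #define per token category.
 * All codes are > 257, as the assignment requires. */
#ifndef K0TOK_H
#define K0TOK_H

/* Identifiers and literals */
#define INTLIT     258
#define REALLIT    259
#define IDENTIFIER 260   /* value from the example in the text */
#define STRINGLIT  261

/* Reserved words (illustrative subset) */
#define FUN        262
#define VAL        263
#define VAR        264
#define IF         265
#define ELSE       266
#define WHILE      267
#define RETURN     268

/* Operators and punctuation (illustrative subset) */
#define LPAREN     269
#define RPAREN     270
#define LCURLY     271
#define RCURLY     272
#define COLON      273
#define LESS       274
#define GREATER    275
#define PLUS       276

/* What yylex() returns at end of file, per the assignment. */
#define END_OF_FILE (-1)

#endif /* K0TOK_H */
```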
In this assignment, there should be (at least) two separately-compiled .c
files, a .h file and a makefile. The yylex() function must be
called by a main() function in a loop. For each token, the
main() function should
write out a line containing the token category (an integer
> 257) and lexical attributes.
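The required loop can be sketched as follows. The yylex() here is a stub standing in for the flex-generated scanner so the sketch compiles on its own; lex_file() and the category codes are hypothetical. With flex, one way to return -1 at end of file is a rule such as <<EOF>> { return -1; }, and main() would open each command-line file, point yyin at it (calling yyrestart(yyin) when switching files), and run the loop once per file.

```c
#include <stdio.h>

/* Stand-in for the flex-generated yylex(), so this sketch compiles alone.
 * It returns a few hypothetical category codes, then -1 for end of file
 * as the assignment requires. Delete it when linking with lex.yy.c. */
static int yylex(void)
{
    static const int fake[] = { 262, 260, 269, 270 };
    static size_t next = 0;

    if (next < sizeof fake / sizeof fake[0])
        return fake[next++];
    return -1;
}

/* Hypothetical driver: call yylex() until it returns -1, handling each
 * token, and return how many tokens were seen. */
int lex_file(void)
{
    int category;
    int count = 0;

    while ((category = yylex()) != -1) {
        /* Real code would append yytoken to the token list and print it. */
        printf("%d\n", category);
        count++;
    }
    return count;
}
```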
For an example input file named hello.kt that contains:
fun main(args : Array<String>) {
println("Hello,\tWorld!")
}
your output should look something like the following. Integer categories are for illustration purposes; your integer codes may be different.
Category  Text              Lineno  Filename  Ival/Sval
-------------------------------------------------------------------------
270       fun               1       hello.kt
271       main              1       hello.kt
290       (                 1       hello.kt
271       args              1       hello.kt
294       :                 1       hello.kt
271       Array             1       hello.kt
295       <                 1       hello.kt
271       String            1       hello.kt
296       >                 1       hello.kt
291       )                 1       hello.kt
292       {                 1       hello.kt
271       println           2       hello.kt
290       (                 2       hello.kt
272       "Hello,\tWorld!"  2       hello.kt  Hello, World!
291       )                 2       hello.kt
293       }                 3       hello.kt
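Formatting one output line can be kept separate from the loop. A minimal sketch using tab-separated fields for simplicity; a fixed-width format such as "%-10d%-18s" would align the columns as in the sample. The name format_token() and the literal category values are hypothetical.

```c
#include <stdio.h>

struct token {
    int category;
    char *text;
    int lineno;
    char *filename;
    int ival;
    double dval;
    char *sval;
};

/* Hypothetical literal categories (must match the real header). */
#define INTLIT    258
#define REALLIT   259
#define STRINGLIT 261

/* Write one output line (no trailing newline) for t into buf: category,
 * text, line number, filename, then the literal value if any, separated
 * by tabs. Returns the line's length when it fits; a result >= size means
 * buf was too small, and a negative result means an encoding error. */
int format_token(char *buf, size_t size, const struct token *t)
{
    int n = snprintf(buf, size, "%d\t%s\t%d\t%s",
                     t->category, t->text, t->lineno, t->filename);
    int m = 0;

    if (n < 0 || (size_t)n >= size)
        return n;                   /* error, or no room for a value */
    switch (t->category) {
    case INTLIT:
        m = snprintf(buf + n, size - n, "\t%d", t->ival);
        break;
    case REALLIT:
        m = snprintf(buf + n, size - n, "\t%g", t->dval);
        break;
    case STRINGLIT:
        if (t->sval != NULL)
            m = snprintf(buf + n, size - n, "\t%s", t->sval);
        break;
    default:
        break;                      /* no literal value to show */
    }
    return (m < 0) ? m : n + m;
}
```

main() would call this for each list element and print the buffer followed by a newline.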