jeffery@cs.nmt.edu
c113c (pronounced "See 113 See", short for CSE 113 Compiler) is a subset of the ANSI C Programming Language.
c113c is a tiny language intended to be implemented in a compiler construction class.
c113c is a subset of C.
c113c is intended to correspond roughly to
the subset of C that would be covered in a CS1 class such as NMT's CSE
113 course.
The facilities that c113c supports are just barely
interesting enough to write some non-trivial computations in it.
c113c programs are legal C programs with a .c file extension.
A program begins
with a main() procedure.
A "Hello world" program looks like:
#include <stdio.h>
int main() {
printf("Hello, world");
return 0;
}
The c113c include facility is restricted to only those built-in system
includes used in CSE 113, which are faked in c113c.
C features many basic types. c113c supports:
char int float
The types int and float both refer to 64-bit values.
The types short, long, and double are
allowed in c113c as aliases: short and long for int, and double for float.
c113c has while and for loops.
Curly braces around the loop body are required.
For loops require non-empty expressions for all three parts of the header.
For loop clauses do not allow variable declaration in the initializer,
nor use of the comma operator to initialize or update multiple variables
each iteration.
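As an illustration of these loop restrictions, here is a small sketch that stays within them. The function names sum_to and sum_to_while are made up for this example. The loop variable is declared before the loop rather than in the initializer, all three parts of the for header are present, and every loop body is wrapped in curly braces.

```c
/* Sum the integers 1..n using a c113c-style for loop. */
int sum_to(int n) {
    int i;        /* declared up front, not in the for initializer */
    int total;
    total = 0;
    for (i = 1; i <= n; i++) {   /* all three header parts are non-empty */
        total = total + i;
    }
    return total;
}

/* The same computation written with a while loop. */
int sum_to_while(int n) {
    int i;
    int total;
    i = 1;
    total = 0;
    while (i <= n) {
        total = total + i;
        i++;
    }
    return total;
}
```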
Conditionals in c113c consist of if and switch statements.
If statements use syntax similar to while loops. Curly braces are required.
An else branch is optional.
if (x < 0) {
...
}
else branches require curly braces, unless they are (chained) if statements.
if (x < 0) {
...
} else if (x < 10) {
...
} else {
...
}
Switch statements in c113c require constant, non-duplicate switch cases. A break is required at the end of each pre-final, non-empty chunk of code (no fall-through). A default clause is required at the end, with no break.
switch (c) {
case 1: case 2:
printf("1 or 2\n"); break;
case 3: case 4:
printf("3 or 4\n"); break;
default:
printf("some other value\n");
}
C supports creation of new types via a struct. c113c has structs.
C has pointers. c113c has pointers, but no pointer arithmetic. c113c should support just enough pointers to handle homework assignments in CSE 113: linked lists, and thus pointers to structs.
C has arrays. c113c has one-dimensional arrays only.
When in doubt about c113c features, refer to the C language specification. I will add notes below as needed. The easiest way to get out of having to implement something is to ask about it and negotiate.
c113c may simplify and reduce the lexical rules of C a bit.
Of the C whitespace characters, c113c must implement space, tab, carriage return, and newline.
c113c supports both styles of C comments. In C, comments are text placed between the delimiters /* and */. Comments can also use // to comment from that point to the end of a line. Examples:
x = 1; //single line comment
/* this is a
multiple line
comment */
C has many reserved words. Newer dialects have more than older versions. The reserved words (also called keywords) in c113c are in bold. Those not in c113c are underlined and should result in a fatal error ("this C feature is not in c113c"). Note that c113c also has semantic simplifications compared with ANSI C.
| auto | else | long | switch |
| break | enum | register | typedef |
| case | extern | return | union |
| char | float | short | unsigned |
| const | for | signed | void |
| continue | goto | sizeof | volatile |
| default | if | static | while |
| do | int | struct | double |
c113c supports the following operators:
| = | assignment |
| + - * / | binary arithmetic, int and float |
| % | binary arithmetic, int |
| ++ -- | unary increment and decrement, int only, suffix only |
| - | unary negation, prefix |
| == != > < >= <= | binary comparison |
| && || ! | logical AND, OR, and NOT |
| & * sizeof | unary prefix address-of, contents-of, size-of |
| [ ] . | binary subscript and dot |
| (type) | how much casting does c113c absolutely require? |
An error is reported for C operators that are not in c113c.
The comma operator (,) is not in c113c,
but the comma is legal punctuation in variable
declaration lists and function parameter lists.
| Escape sequence | Hex value in ASCII | Character represented |
|---|---|---|
| \a | 07 | Alert (Beep, Bell) |
| \b | 08 | Backspace |
| \e | 1B | Escape character |
| \f | 0C | Formfeed Page Break |
| \n | 0A | Newline (Line Feed); see notes below |
| \r | 0D | Carriage Return |
| \t | 09 | Horizontal Tab |
| \v | 0B | Vertical Tab |
| \\ | 5C | Backslash |
| \' | 27 | Apostrophe or single quotation mark |
| \" | 22 | Double quotation mark |
| \? | 3F | Question mark (used to avoid trigraphs) |
| \nnn | any | c113c allows \0 and reports other octal characters as a lexical error |
| \xhh | any | c113c reports hex characters as a lexical error |
| \uhhhh | none | c113c reports Unicode code points as a lexical error |
| \Uhhhhhhhh | none | c113c reports Unicode code points as a lexical error (h is a hexadecimal digit) |
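The escape table can be turned into a lexer helper fairly directly. The following is only a sketch, written in ordinary C for the compiler itself rather than in c113c. The name decode_escape, its calling convention (it receives the text immediately after the backslash), and the use of -1 to signal a lexical error are all assumptions made for illustration. It also assumes that \0 is accepted only when no further octal digit follows it.

```c
/*
 * Hypothetical lexer helper: s points at the text that follows a backslash.
 * Returns the character value the escape denotes, or -1 for a lexical error.
 */
int decode_escape(const char *s) {
    switch (s[0]) {
    case 'a':  return 0x07;
    case 'b':  return 0x08;
    case 'e':  return 0x1B;
    case 'f':  return 0x0C;
    case 'n':  return 0x0A;
    case 'r':  return 0x0D;
    case 't':  return 0x09;
    case 'v':  return 0x0B;
    case '\\': return 0x5C;
    case '\'': return 0x27;
    case '"':  return 0x22;
    case '?':  return 0x3F;
    case '0':
        /* Assumption: \0 is legal only if no other octal digit follows. */
        if (s[1] >= '0' && s[1] <= '7') {
            return -1;
        }
        return 0;
    default:
        /* Other octal digits, \x, \u, \U, and anything unlisted are errors. */
        return -1;
    }
}
```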
Punctuation characters are lexemes that are supported in c113c that are not part of other lexemes (not operators, not identifiers, not literals).
( ) , ; { } :
Other punctuation characters generally should be reported as lexical errors, including
# $ @ \ ^ `
One exception: a line beginning with # in the following format is a line directive and is to be interpreted as per GCC. It gives a line number N and (quoted) filename to be used for reporting on the line(s) that follow. The ... is (optional) other stuff you can treat as a comment and ignore up until the next newline character.
# N "f" ...
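One possible way to recognize such a line directive is with sscanf. This is a sketch in ordinary C for the compiler itself, not c113c. The helper name parse_line_directive, the 255-character filename limit, and the 0/1 return convention are assumptions made for this example. Everything after the quoted filename is ignored, as described above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse a line of the form  # N "f" ...
 * On success, store N in *lineno, copy f into fname (which must have
 * room for 256 bytes), and return 1. Otherwise return 0 and leave the
 * outputs untouched.
 */
int parse_line_directive(const char *line, int *lineno, char *fname) {
    int n;
    char buf[256];
    /* '#', a decimal number, then a quoted name of 1 to 255 characters */
    if (sscanf(line, "# %d \"%255[^\"]", &n, buf) != 2) {
        return 0;
    }
    *lineno = n;
    strcpy(fname, buf);
    return 1;
}
```

A lexer could call this whenever it sees # at the start of a line and report a lexical error when the call returns 0.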
Identifiers in c113c are as per the C language: a letter followed by zero or more additional letters or digits. They are case-sensitive. c113c will consider only the first six characters to be significant.
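The six-character rule means that two spellings which agree in their first six characters name the same identifier. A minimal sketch of that comparison, in ordinary C for the compiler itself, follows. The helper name same_identifier is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if c113c would treat a and b as the
 * same identifier (case-sensitive, first six characters significant).
 */
int same_identifier(const char *a, const char *b) {
    return strncmp(a, b, 6) == 0;
}
```

Under this sketch, counter1 and counter2 collide, while count and counter remain distinct because they differ within the first six characters.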
Function definitions in c113c have the following format:
return_type identifier ( parameter_list ) { function_body }
if (expression) {}
if (expression) {} else {}
if (expression) {} else if (expression) {} else {}
while (expression) {}
for (init; condition; increment) {}
switch(integer){
case literal:
statements;
...
default:
statements;
}
Structs in c113c are a subset of C structs:
struct tag {
data
};
Compared with C, c113c skips nested struct-in-struct
(although pointer to struct is allowed), bit-fields within structs,
and anonymous (non-labeled) structs.
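To make the struct subset concrete, here is a sketch of the kind of linked list code these rules are meant to support. The names node, push, and sum_list are made up for this example, and the availability of NULL through stdlib.h is an assumption. Fields are reached with * and . because -> does not appear in the operator table.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;   /* pointer to struct is allowed; a nested struct is not */
};

/* Push a value on the front of the list and return the new head. */
struct node *push(struct node *head, int value) {
    struct node *n;
    n = malloc(sizeof(struct node));
    if (n == NULL) {
        return head;     /* allocation failed: leave the list unchanged */
    }
    (*n).value = value;
    (*n).next = head;
    return n;
}

/* Add up every value stored in the list. */
int sum_list(struct node *head) {
    int total;
    struct node *p;
    total = 0;
    p = head;
    while (p != NULL) {
        total = total + (*p).value;
        p = (*p).next;
    }
    return total;
}
```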
Declarations are allowed only for global variables and at the top of function bodies, before the first executable statement. We allow only simple initializers: int, float, char, and char * literals.
type identifier;
type * identifier;
type identifier [K];
type identifier = literal;
A slightly simpler syntax is allowed for parameter lists, which do not allow initializers.
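The declaration forms above might be used as in the following sketch. The names counter, greeting, and scale are invented for illustration. Globals and locals use literal initializers only, and inside the function every declaration comes before the first executable statement.

```c
int counter = 0;            /* global with a simple literal initializer */
char *greeting = "hello";   /* char * initialized from a string literal */

float scale(float x) {
    float factor = 2.5;     /* declarations come first ... */
    float result;
    result = x * factor;    /* ... followed by executable statements */
    return result;
}
```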
The following data types are used by c113c and are described in the reserved words section: char, short, int, long, float, double.
Although the C language automatically converts between numeric types as needed (promotion and demotion), c113c does not.
Strings in c113c are the usual C null-terminated char arrays (char *).
Note that (void *) values automatically convert to any other pointer type, but no other pointer type conversions, nor casts, are supported in c113c. I wouldn't care about supporting void * at all; if you want to implement malloc as returning char *, that's fine.
As noted in the introduction, C has multidimensional arrays, whereas c113c has only one-dimensional arrays. For example:
int num[100];
An array is represented as a pointer to elements of a specific type, each of fixed width, stored in sequential memory allocated on either the stack or the heap. The pointer refers to the first element.
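A short sketch of one-dimensional array use under these rules follows. The function names sum_of_squares and max_of are hypothetical, and passing an array to a function as a pointer to its first element is an assumption based on the description above.

```c
/* Fill a local array with squares, then add them up. */
int sum_of_squares() {
    int num[100];
    int i;
    int total;
    total = 0;
    for (i = 0; i < 100; i++) {
        num[i] = i * i;
    }
    for (i = 0; i < 100; i++) {
        total = total + num[i];
    }
    return total;
}

/* Return the largest of the first n elements of a (n must be at least 1). */
int max_of(int *a, int n) {
    int i;
    int best;
    best = a[0];
    for (i = 1; i < n; i++) {
        if (a[i] > best) {
            best = a[i];
        }
    }
    return best;
}
```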
c113c will have as few built-in library functions as possible.
The supported functions come from stdio.h, stdlib.h, string.h, and math.h:
| Function | Library | Use |
|---|---|---|
| printf(s,x) | stdio.h | Prints to stdout, simplified |
| sprintf(s1,s,x) | stdio.h | Formats to a string, simplified |
| fopen(s,m) | stdio.h | open a file, simplified |
| fclose(f) | stdio.h | close a file |
| fprintf(f,s,x) | stdio.h | Prints to file, simplified |
| fscanf(f,s,p) | stdio.h | Reads/scans from file, simplified |
| malloc(n) | stdlib.h | allocate memory |
| realloc(p,n) | stdlib.h | reallocate/resize memory |
| free(p) | stdlib.h | free memory |
| rand() | stdlib.h | random number |
| strlen(s) | string.h | string length |
| strcpy(s1,s2) | string.h | string copy |
| strcmp(s1,s2) | string.h | string compare |
| strtok(s1,s2) | string.h | string splitter/tokenizer |
| sqrt(r) | math.h | square root |
| cos(r) | math.h | cosine |
| pow(r,i) | math.h | exponent |
| sin(r) | math.h | sine |
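As a sketch of how a few of these library functions combine, the following helper copies a string into freshly allocated memory using malloc, strlen, and strcpy from the table. The name copy_string is made up for this example, and the use of NULL as the failure value from malloc is an assumption carried over from standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s, or NULL if malloc fails. */
char *copy_string(char *s) {
    char *t;
    t = malloc(strlen(s) + 1);   /* one extra byte for the terminating \0 */
    if (t != NULL) {
        strcpy(t, s);
    }
    return t;
}
```

The caller is responsible for passing the result to free once the copy is no longer needed.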