c113c

a Programming Language

Clinton Jeffery jeffery@cs.nmt.edu
with input from CSE 423 students

Draft Version 0.3, February 3, 2021.



Language Reference Manual


Abstract

c113c (pronounced "See 113 See", short for CSE 113 Compiler) is a subset of the ANSI C Programming Language. It is a tiny language intended to be implemented in a compiler construction class.





New Mexico Institute of Mining and Technology
Department of Computer Science and Engineering
Socorro, NM 87801 USA












Contents

  1. Introduction
  2. Lexical Rules
  3. Syntax
  4. Data Types and Semantics
  5. Summary


1. Introduction

c113c is a subset of C. c113c is intended to correspond roughly to the subset of C that would be covered in a CS1 class such as NMT's CSE 113 course. The facilities that c113c supports are just barely interesting enough to write some non-trivial computations in it.

c113c programs are legal C programs with a .c file extension. A program begins with a main() procedure. A "Hello world" program looks like:

#include <stdio.h>

int main() {
   printf("Hello, world");
   return 0;
}

The c113c include facility is restricted to only those built-in system includes used in CSE 113, which are faked in c113c.

C features many basic types. c113c supports:

char
int
float

The types int and float both refer to 64-bit values. The types short and long are allowed in c113c as aliases for int, and double is allowed as an alias for float.

c113c has while and for loops. Curly braces around the loop body are required. For loops require non-empty expressions for all three parts of the header. For loop clauses do not allow variable declaration in the initializer, nor use of the comma operator to initialize or update multiple variables each iteration.
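
A minimal sketch of loops that stay within these restrictions. The function names sum_to and count_digits are illustrative only and are not part of c113c.

```c
/* Sum the integers 1..n with a for loop.
   All three header parts are non-empty, the loop variable is
   declared at the top of the function, and the body is braced. */
int sum_to(int n) {
   int i;
   int total;
   total = 0;
   for (i = 1; i <= n; i++) {
      total = total + i;
   }
   return total;
}

/* Count the decimal digits of a non-negative integer with a while loop. */
int count_digits(int n) {
   int digits;
   digits = 1;
   while (n >= 10) {
      n = n / 10;
      digits++;
   }
   return digits;
}
```

Neither loop declares a variable in its header or uses the comma operator, so both should be acceptable to a c113c compiler as well as to a full C compiler.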

Conditionals in c113c consist of if and switch statements. If statements use syntax similar to while loops. Curly braces are required. An else branch is optional.

if (x < 0) {
   ...
}

Else branches require curly braces, unless they are (chained) if statements.

if (x < 0) {
   ...
} else if (x < 10) {
   ...
} else {
   ...
}

Switch statements in c113c require constant, non-duplicate switch cases. A break is required at the end of each pre-final, non-empty chunk of code (no fall-through). A default clause is required at the end, with no break.

    switch (c) {
    case 1: case 2:
       printf("1 or 2\n"); break;
    case 3: case 4:
       printf("3 or 4\n"); break;
    default:
       printf("some other value\n");
    }
  

C supports creation of new types via a struct. c113c has structs.

C has pointers and pointer arithmetic. c113c has pointers, but no pointer arithmetic. c113c should support just enough pointers to support homework assignments in CSE 113, such as linked lists; thus, pointers to structs are supported.
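
The kind of linked-list code this is meant to allow might look like the sketch below. The names node, list_push, and list_sum are hypothetical. The sketch assumes NULL is available from the faked stdlib.h, and it writes (*p).field rather than p->field because -> does not appear in the operator list of section 2.3.

```c
#include <stdlib.h>

/* A singly linked list node: a struct that points to another struct. */
struct node {
   int value;
   struct node *next;
};

/* Allocate a node holding v and link it in front of head.
   Returns the new head, or head unchanged if malloc fails. */
struct node *list_push(struct node *head, int v) {
   struct node *n;
   n = malloc(sizeof(struct node));
   if (n == NULL) {
      return head;
   }
   (*n).value = v;
   (*n).next = head;
   return n;
}

/* Add up every value by following next pointers; no pointer arithmetic. */
int list_sum(struct node *head) {
   int total;
   total = 0;
   while (head != NULL) {
      total = total + (*head).value;
      head = (*head).next;
   }
   return total;
}
```

The result of malloc is assigned without a cast, which relies only on the void * conversion that c113c keeps.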

C has arrays. c113c has one-dimensional arrays only.

When in doubt about c113c features, refer to the C language specification. I will add notes below as needed. The easiest way to get out of having to implement something is to ask about it and negotiate.

2. Lexical Rules

The lexical rules of c113c start with: the lexical rules of C. c113c may simplify and reduce the lexical rules of C a bit.

2.1 Whitespace and Comments

Of the C whitespace characters, c113c must implement space, tab, carriage return, and newline.

c113c supports both styles of C comments. In C, comments are text placed between the delimiters /* and */. Comments can also use // to comment from that point to the end of a line. Examples:

    x = 1; // single line comment
    /* this is a
       multiple line
       comment */
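
For implementers, one way a hand-written scanner could skip whitespace and both comment styles is sketched below. It is written in full C (the language of the compiler, not c113c itself), and the function name skip_blank is hypothetical. A lexer generated by a tool such as flex would express the same rules as patterns instead.

```c
#include <stddef.h>

/* Return the index of the first character in src that is not
   whitespace and not inside a comment. An unterminated block
   comment consumes the rest of the input. */
size_t skip_blank(const char *src) {
   size_t i = 0;
   for (;;) {
      char c = src[i];
      if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
         i++;
      } else if (c == '/' && src[i + 1] == '/') {
         /* line comment: stop at the newline or at end of input */
         while (src[i] != '\0' && src[i] != '\n') {
            i++;
         }
      } else if (c == '/' && src[i + 1] == '*') {
         i += 2;
         /* block comment: scan for the closing delimiter */
         while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/')) {
            i++;
         }
         if (src[i] != '\0') {
            i += 2;
         }
      } else {
         return i;
      }
   }
}
```

A real lexer would probably also count newlines while skipping, so that error messages can report line numbers, and would report an unterminated block comment as a lexical error.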

2.2 Reserved Words

C has many reserved words. Newer dialects have more than older versions. The reserved words (also called keywords) in c113c are in bold. Those not in c113c are underlined and should result in a fatal error ("this C feature is not in c113c"). Note that c113c also has semantic simplifications compared with ANSI C.

auto       else     long       switch
break      enum     register   typedef
case       extern   return     union
char       float    short      unsigned
const      for      signed     void
continue   goto     sizeof     volatile
default    if       static     while
do         int      struct     double
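
A scanner needs to tell reserved words apart from identifiers before it can decide between a keyword token and a fatal error. The sketch below only answers "is this one of the 32 reserved words in the table?". The name is_c_keyword is hypothetical, and splitting the words into supported and unsupported sets is left to the table itself.

```c
#include <string.h>

/* The 32 ANSI C reserved words from the table above. */
static const char *c_keywords[] = {
   "auto", "break", "case", "char", "const", "continue", "default", "do",
   "double", "else", "enum", "extern", "float", "for", "goto", "if",
   "int", "long", "register", "return", "short", "signed", "sizeof", "static",
   "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"
};

/* Return 1 if word is an ANSI C reserved word, 0 otherwise.
   The comparison is case-sensitive, as C keywords are. */
int is_c_keyword(const char *word) {
   size_t i;
   size_t count = sizeof(c_keywords) / sizeof(c_keywords[0]);
   for (i = 0; i < count; i++) {
      if (strcmp(word, c_keywords[i]) == 0) {
         return 1;
      }
   }
   return 0;
}
```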

2.3 Operators

c113c supports the following operators:

=                  assignment
+ - * /            binary arithmetic, int and float
%                  binary arithmetic, int
++ --              unary increment and decrement, int only, suffix only
-                  unary negation, prefix
== != > < >= <=    binary comparison
&& || !            logical AND, OR, and NOT
& * sizeof         unary prefix address-of, contents-of, size-of
[ ] .              binary subscript and dot
(type)             how much casting does c113c absolutely require?

An error is reported for

C's comma operator , is not in c113c, but the comma is legal punctuation in variable declaration lists and function parameter lists.

2.4 Literals

Integers

Reals

Characters

Strings

Escape Sequences (Character and String Literals)

Adapted from Wikipedia.

Escape sequence   Hex value in ASCII   Character represented
\a                07                   Alert (Beep, Bell)
\b                08                   Backspace
\e                1B                   Escape character
\f                0C                   Formfeed Page Break
\n                0A                   Newline (Line Feed); see notes below
\r                0D                   Carriage Return
\t                09                   Horizontal Tab
\v                0B                   Vertical Tab
\\                5C                   Backslash
\'                27                   Apostrophe or single quotation mark
\"                22                   Double quotation mark
\?                3F                   Question mark (used to avoid trigraphs)
\nnn              any                  c113c allows \0; other octal characters are reported as a lexical error
\xhh              any                  c113c reports hex characters as a lexical error
\uhhhh            none                 c113c reports Unicode code points as a lexical error
\Uhhhhhhhh        none                 c113c reports Unicode code points (where h is a hexadecimal digit) as a lexical error
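
The table can be turned into a lookup fairly directly. The sketch below is implementation-side C with a hypothetical function name, decode_escape. It takes the single character that follows the backslash, so a caller would still need to check that \0 is not followed by further octal digits.

```c
/* Map the character that follows a backslash to the byte it denotes.
   Returns -1 for anything c113c should report as a lexical error
   (hex, Unicode, and octal escapes other than \0). */
int decode_escape(char c) {
   switch (c) {
   case 'a':  return 0x07;  /* alert */
   case 'b':  return 0x08;  /* backspace */
   case 'e':  return 0x1B;  /* escape character */
   case 'f':  return 0x0C;  /* formfeed */
   case 'n':  return 0x0A;  /* newline */
   case 'r':  return 0x0D;  /* carriage return */
   case 't':  return 0x09;  /* horizontal tab */
   case 'v':  return 0x0B;  /* vertical tab */
   case '\\': return 0x5C;  /* backslash */
   case '\'': return 0x27;  /* apostrophe */
   case '"':  return 0x22;  /* double quotation mark */
   case '?':  return 0x3F;  /* question mark */
   case '0':  return 0x00;  /* the only octal escape allowed */
   default:   return -1;    /* lexical error */
   }
}
```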

2.5 Punctuation

Punctuation characters are lexemes that are supported in c113c that are not part of other lexemes (not operators, not identifiers, not literals).

(  )  ,  ;  {  }  :
  
Other punctuation characters generally should be reported as lexical errors, including
#  $  @  \  ^  `
  
One exception: a line beginning with # in the following format is a line directive and is to be interpreted as per GCC. It gives a line number N and (quoted) filename to be used for reporting on the line(s) that follow. The ... is (optional) other stuff you can treat as a comment and ignore up until the next newline character.
# N "f" ...
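
One possible way to recognize such a line with the standard sscanf function is sketched below. The function name parse_line_directive and the 256-byte filename limit are assumptions made for the example, not requirements of c113c.

```c
#include <stdio.h>
#include <string.h>

/* Parse a line of the form:  # N "f" ...
   On success store N in *line_no, copy f into filename, and return 1.
   Return 0 if the line does not have that shape (including an empty
   filename). filename must have room for at least 256 bytes.
   Anything after the closing quote is ignored. */
int parse_line_directive(const char *line, int *line_no, char *filename) {
   int n;
   char name[256];
   if (sscanf(line, "# %d \"%255[^\"]\"", &n, name) != 2) {
      return 0;
   }
   *line_no = n;
   strcpy(filename, name);
   return 1;
}
```

Outputs are written only after both fields have been read, so a line that fails to match leaves the caller's current line number and filename untouched.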
  

2.6 Identifiers

Identifiers in c113c are as per the C language: a letter followed by zero or more additional letters or digits (as in C, the underscore counts as a letter). They are case-sensitive. c113c will consider only the first six characters to be significant.
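
The six-character rule mostly affects how a symbol table compares names. A minimal sketch follows, with hypothetical helper names is_identifier and same_identifier; treating the underscore as a letter is an assumption carried over from C.

```c
#include <ctype.h>
#include <string.h>

#define SIGNIFICANT_CHARS 6

/* Return 1 if s is a letter followed by letters or digits.
   Assumption: underscore is accepted as a letter, as in C. */
int is_identifier(const char *s) {
   size_t i;
   if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) {
      return 0;
   }
   for (i = 1; s[i] != '\0'; i++) {
      if (!(isalnum((unsigned char)s[i]) || s[i] == '_')) {
         return 0;
      }
   }
   return 1;
}

/* Return 1 if a and b name the same identifier when only the first
   six characters are significant. The comparison is case-sensitive. */
int same_identifier(const char *a, const char *b) {
   return strncmp(a, b, SIGNIFICANT_CHARS) == 0;
}
```

Under this rule counter1 and counter2 would refer to the same variable, while count and counter would remain distinct.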

3. Syntax

A good fraction of ANSI C syntax will denote constructs that are not supported in c113c. The easiest thing is probably to support the whole C language, less the parts that have been ruled out via lexical errors, and then define portions of it that will be unsupported and trigger an error in HW#3.

3.1 Function Syntax

Function definitions in c113c have the following format:

return_type identifier ( parameter_list ) { function_body }

3.2 Control Structures

if (expression) {}
if (expression) {} else {}
if (expression) {} else if (expression) {} else {}
while (expression) {}
for (init; condition; increment) {}
switch (integer_expression) {
case literal:
   statements;
   break;
...
default:
   statements;
}

3.3 Structures

Structs in c113c are a subset of C structs:

struct tag {
  data
};

Compared with C, c113c skips: nested struct-in-struct (although pointer to struct is allowed), bit-fields within structs, and anonymous (non-labeled) structs.

3.4 Declaration Syntax

Declarations are allowed only for global variables and at the top of function bodies, before the first executable statement. Only simple initializers are allowed: int, float, char, and char * literals.

type identifier;
type * identifier;
type identifier [K];
type identifier = literal;

A slightly simpler syntax is allowed for parameter lists, which do not allow initializers.
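
A short sketch of these declaration forms in use. All names (counter, scale, initial, greeting, history, record) are made up for the example.

```c
/* Global declarations, with and without simple literal initializers. */
int counter = 0;
float scale = 1.5;
char initial = 'c';
char *greeting = "hello";
int history[8];

/* Parameters use the same forms but carry no initializers. */
int record(int value, int slot) {
   /* local declarations come first ... */
   int previous;
   /* ... followed by the executable statements */
   previous = history[slot];
   history[slot] = value;
   counter++;
   return previous;
}
```

Full C would also accept a declaration after the first statement of record; c113c as described here would reject it.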

4. Data Types and Semantics

4.1 Numbers

c113c uses the following numeric types, all of which appear in the reserved words section: char, short, int, long, float, and double.

Although the C language automatically converts between numeric types as needed (promotion and demotion), c113c does not.

4.2 Strings

Strings are the usual C null-terminated char arrays, referred to through a char *.

Note that (void *) values automatically convert to any other pointer type, but no other pointer type conversions, nor casts, are supported in c113c. I wouldn't care about supporting void * at all; if you want to implement malloc as returning char *, that's fine.

4.3 Arrays

As noted in the introduction, C has multidimensional arrays, whereas c113c has only single-dimensional arrays. For example:

int num[100]; 

An array is represented as a pointer to its first element. Its elements all have the fixed width of the element type and occupy sequential memory, allocated either on the stack or on the heap.
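
The sketch below shows a one-dimensional stack array being filled and then passed to a function as a pointer to its first element. The function names array_max and max_square are hypothetical.

```c
/* Return the largest of the first n elements of values (n >= 1).
   The array argument arrives as a pointer to its first element. */
int array_max(int *values, int n) {
   int i;
   int best;
   best = values[0];
   for (i = 1; i < n; i++) {
      if (values[i] > best) {
         best = values[i];
      }
   }
   return best;
}

/* Fill a fixed-size stack array with squares and return their maximum. */
int max_square() {
   int squares[5];
   int i;
   for (i = 0; i < 5; i++) {
      squares[i] = i * i;
   }
   return array_max(squares, 5);
}
```

Elements are reached only through subscripts, so the example needs no pointer arithmetic.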

4.4 Library Functions

c113c supports a small subset of the functionality of a small subset of C's standard includes, including stdio.h, stdlib.h, string.h, time.h (currenttime), and math.h. While the full C versions of these libraries support many functions, and even types, c113c is minimalist. For example, instead of defining 25+ public symbols in stdio.h, c113c will have as few as possible.

Function          Library    Use
printf(s,x)       stdio.h    Prints to stdout, simplified
sprintf(s1,s,x)   stdio.h    Formats to a string, simplified
fopen(s,m)        stdio.h    open a file, simplified
fclose(f)         stdio.h    close a file
fprintf(f,s,x)    stdio.h    Prints to file, simplified
fscanf(f,s,p)     stdio.h    Reads/scans from file, simplified
malloc(n)         stdlib.h   allocate memory
realloc(p,n)      stdlib.h   reallocate/resize memory
free(p)           stdlib.h   free memory
rand()            stdlib.h   random number
strlen(s)         string.h   string length
strcpy(s1,s2)     string.h   string copy
strcmp(s1,s2)     string.h   string compare
strtok(s1,s2)     string.h   string splitter/tokenizer
sqrt(r)           math.h     square root
cos(r)            math.h     cosine
pow(r,i)          math.h     exponent
sin(r)            math.h     sine
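
A small sketch using three of the listed functions in their simplified shapes, with a single value argument to sprintf. The names make_greeting and same_text are hypothetical, and the sketch assumes the result of strlen can be stored in an int.

```c
#include <stdio.h>
#include <string.h>

/* Build "Hello, <name>" in out and return the length of the result.
   The caller must supply a buffer large enough for the text. */
int make_greeting(char *out, char *name) {
   int length;
   sprintf(out, "Hello, %s", name);
   length = strlen(out);
   return length;
}

/* Return 1 if the two strings are equal, 0 otherwise. */
int same_text(char *a, char *b) {
   if (strcmp(a, b) == 0) {
      return 1;
   }
   return 0;
}
```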

5. Summary

Sure, c113c may be a toy language designed for a compiler class. Even with only this much, it may provide a convenient notation for a lot of simple programming tasks such as those faced by students in CSE 113.