Jzero

a Programming Language

Clinton Jeffery jeffery@cs.nmt.edu
with input from CSE 423 students

Draft Version 0.1a, April 3, 2022.



Language Reference Manual


Abstract

j0 (pronounced "Jay Zero", short for Java 0 Compiler) is a subset of the Java Programming Language. j0 is a tiny language intended to be implemented in a compiler construction class.





New Mexico Institute of Mining and Technology
Department of Computer Science and Engineering
Socorro, NM 87801 USA












Contents

  1. Introduction
  2. Lexical Rules
  3. Syntax
  4. Data Types and Semantics
  5. Library Functions
  6. Summary


1. Introduction

j0 is a family of subsets of Java, referred to in this manual as j0.1, j0.2, and j0.3 (Jzero Level One, Two, and Three). Broadly, j0 corresponds roughly to the subset of Java that would be covered in a CS1 class, or in NMT's case, the CSE 213 course. The facilities that j0 supports are interesting enough to write some non-trivial computations. How much of j0 you must implement depends on your team size.

j0 programs are legal Java programs with a .java file extension. A program consists of a class that contains a main() procedure where execution starts. A "Hello world" program looks like:

public class HelloWorld {
   public static void main(String[] args) {
      System.out.println("Hello, World");
   }
}

The Java import facility is restricted to only those built-in system imports used in CSE 213, which are faked in j0.

Java features many basic types. j0 supports:

char
int
float
boolean
String

The types int and float both refer to 64-bit values. The types short, long, and double are allowed in j0 and denote aliases for int and float. Booleans and chars are stored in a single byte (like C, rather than Java's 16-bit characters).

j0 has while and for loops. Curly braces around the loop body are required. For loops require non-empty expressions for all three parts of the header. For loop clauses do not allow variable declaration in the initializer, nor use of the comma operator to initialize or update multiple variables each iteration. In j0.1 for loop headers may ONLY consist of an assignment to a variable, a boolean test of that variable, and an increment/decrement/assignment of that variable. In j0.2+ the second and third parts of a for-loop are expressions constrained to produce boolean and integer values.
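
As a sketch of the j0.1 restrictions above, the following loop declares its control variable outside the header and uses a header made of one assignment, one boolean test, and one increment of that same variable. The class and method names are illustrative only, and printing an int with System.out.println is plain Java that is assumed, not confirmed, to be accepted by j0.

```java
public class SumLoop {
   // Sums the integers 1..n using a j0.1-style for loop.
   public static int sum(int n) {
      int i;            // declared before the loop, not in the header
      int total = 0;
      for (i = 1; i <= n; i++) {
         total = total + i;   // += is only available in j0.2+
      }
      return total;
   }

   public static void main(String[] args) {
      System.out.println(sum(10));
   }
}
```

A header such as for (int i = 1, j = 0; ; i++, j++) would violate three of the rules at once: it declares a variable in the initializer, uses the comma operator, and leaves the condition empty.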

Conditionals in j0 consist of if statements. j0.2+ has switch statements. If statements use syntax similar to while loops. Curly braces are required. An else branch is optional.

if (x < 0) {
   ...
}

An else branch requires curly braces, unless it is a (chained) if statement.

if (x < 0) {
   ...
} else if (x < 10) {
   ...
} else {
   ...
}

j0.2+ supports switch statements. Switch statements in j0 require constant, non-duplicate switch cases. A break is required at the end of each pre-final, non-empty chunk of code (no fall-through). A default clause is required at the end, with no break.

    switch (c) {
    case 1: case 2:
       System.out.println("1 or 2"); break;
    case 3: case 4:
       System.out.println("3 or 4"); break;
    default:
       System.out.println("some other value");
    }
  

Java supports creation of new types via a class. j0.1 does not have (user-defined) classes; j0.2+ has user-defined (simple) classes.

Java has no pointers; it has references. j0 should support enough in terms of references to allow operations essential for Strings and arrays.

Java has arrays, which are a weird built-in thing that are not class instances and have special syntax support. j0 has one-dimensional arrays only.

When in doubt about j0 features, refer to the Java language specification. I will add notes below as needed. The easiest way to get out of having to implement something is to ask about it and negotiate.

2. Lexical Rules

The lexical rules of j0 start from the lexical rules of Java, which j0 may simplify and reduce a bit.

2.1 Whitespace and Comments

Of the Java whitespace characters, j0 must implement space, tab, carriage return, and newline.

j0 supports both styles of Java comments. Comments may be text placed between the delimiters /* and */. They may not be nested. Comments can also use // to comment from that point to the end of a line. Examples:

 x = 1; //single line comment
    /* this is a
    multiple line
    comment */
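
One way to express the two comment forms is as regular expressions. The sketch below uses java.util.regex for illustration; the class and method names are hypothetical, and a real scanner would more likely put equivalent patterns in its lexer specification.

```java
import java.util.regex.Pattern;

public class CommentPatterns {
   // A block comment: "/*", then any text containing no "*/", then "*/".
   static final Pattern BLOCK = Pattern.compile("/\\*([^*]|\\*+[^*/])*\\*+/");
   // A line comment: "//" up to, but not including, the end of the line.
   static final Pattern LINE = Pattern.compile("//[^\\r\\n]*");

   // True if the whole text is exactly one block comment.
   public static boolean isBlockComment(String text) {
      return BLOCK.matcher(text).matches();
   }

   // True if the whole text is exactly one line comment.
   public static boolean isLineComment(String text) {
      return LINE.matcher(text).matches();
   }
}
```

Because comments do not nest, the text /* a /* b */ is a single complete comment (the inner /* is just comment text), while /* a */ b */ is a comment followed by leftover tokens.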

2.2 Reserved Words

Java has many reserved words (also called keywords). The lists below give the reserved words supported in j0 and those not in any level of j0; the latter should result in a fatal error ("this Java feature is not in j0"). Some supported words belong only to Level Two or Level Three of j0; for example, switch is in j0.2+ and instanceof is in j0.3.

Supported reserved words: boolean, break, case, char, class, continue, default, double, else, float, for, if, instanceof, int, long, new, public, return, static, switch, void, while.

Java reserved words not in Jzero: abstract, assert, byte, catch, const, do, enum, exports, extends, final, finally, goto, implements, import, interface, module, native, package, protected, requires, short, strictfp, super, synchronized, this, throw, throws, transient, try, var, volatile, private.

2.3 Operators

j0 supports the following operators:

=                    assignment; j0.2+ also includes += and -=
+ - * /              binary arithmetic, int and float
%                    binary arithmetic, int
++ --                unary increment and decrement, int only, suffix only
-                    unary negation, prefix
== != > < >= <=      binary comparison
&& || !              logical AND, OR, and NOT
instanceof           j0.3 only
[ ] .                binary subscript and dot
(type)               type casts, j0.2+ only

An error is reported for

The comma operator , is not in j0, but the comma is legal punctuation in variable declaration lists and function parameter lists.

2.4 Literals

Integers

Reals

Characters

Strings

Escape Sequences (Character and String Literals)

j0.1 supports \n, \t, \', \"
j0.2 also supports \a, \f, \r, \0 octals
j0.3 also supports \b, \v, \?, \x hexadecimals
Unsupported escapes should be recognized and reported as lexical errors.

Escape sequence   Hex value in ASCII   Character represented
\a                07                   Alert (Beep, Bell)
\b                08                   Backspace
\f                0C                   Formfeed Page Break
\n                0A                   Newline (Line Feed); see notes below
\r                0D                   Carriage Return
\t                09                   Horizontal Tab
\v                0B                   Vertical Tab
\\                5C                   Backslash
\'                27                   Apostrophe or single quotation mark
\"                22                   Double quotation mark
\?                3F                   Question mark (used to avoid trigraphs)
\nnn              any                  octal escapes
\xhh              any                  hexadecimal escapes
\uhhhh            none                 Unicode code points
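
A minimal sketch of escape handling at the j0.1 level follows. It translates only the four escapes listed for j0.1 and treats everything else as a lexical error. The class and method names are hypothetical, and signalling the error with an exception is an assumption; a real scanner would report it through its own error routine. The backslash escape appears in the table but not in the j0.1 list, so it is left out here.

```java
public class EscapeDecoder {
   // Returns the character denoted by the letter that follows a backslash,
   // or throws to signal a lexical error for an unsupported escape.
   public static char decode(char c) {
      switch (c) {
         case 'n': return '\n';
         case 't': return '\t';
         case '\'': return '\'';
         case '"': return '"';
         default:
            throw new IllegalArgumentException(
               "lexical error: unsupported escape \\" + c);
      }
   }

   // Translates the body of a string literal (without its quotes).
   public static String unescape(String body) {
      StringBuilder out = new StringBuilder();
      int i = 0;
      while (i < body.length()) {
         char c = body.charAt(i);
         if (c == '\\') {
            if (i + 1 >= body.length()) {
               throw new IllegalArgumentException(
                  "lexical error: dangling backslash");
            }
            out.append(decode(body.charAt(i + 1)));
            i += 2;
         } else {
            out.append(c);
            i++;
         }
      }
      return out.toString();
   }
}
```

Higher levels would extend the switch in decode with the additional escapes listed for j0.2 and j0.3.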

2.5 Punctuation

Punctuation characters are lexemes that are supported in j0 that are not part of other lexemes (not operators, not identifiers, not literals).

(  )  ,  ;  {  }  :
  
Other punctuation characters generally should be reported as lexical errors, including
#  $  @  \  `
  
One exception: a line beginning with # in the following format is a line directive and is to be interpreted as per GCC. It gives a line number N and (quoted) filename to be used for reporting on the line(s) that follow. The ... is (optional) other stuff you can treat as a comment and ignore up until the next newline character.
# N "f" ...
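
The directive format above could be recognized along these lines. This is a sketch only: the class name, the field names, and the choice to require at least one space or tab between the parts are assumptions, not something the format line specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineDirective {
   // '#', a line number N, a quoted filename, then ignored text to end of line.
   private static final Pattern DIRECTIVE =
      Pattern.compile("#[ \\t]+(\\d+)[ \\t]+\"([^\"]*)\".*");

   public final int line;
   public final String filename;

   private LineDirective(int line, String filename) {
      this.line = line;
      this.filename = filename;
   }

   // Returns the parsed directive, or null if the text is not a directive.
   // The text is one source line without its trailing newline.
   public static LineDirective parse(String text) {
      Matcher m = DIRECTIVE.matcher(text);
      if (!m.matches()) {
         return null;
      }
      return new LineDirective(Integer.parseInt(m.group(1)), m.group(2));
   }
}
```

After a successful parse, the scanner would reset its current line number and filename so that later error messages refer to the original source file.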
  

2.6 Identifiers

Identifiers in j0 are as per the C language, not Java: a letter followed by zero or more additional letters or digits. They are case-sensitive.
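
A word that looks like an identifier must first be checked against the reserved word lists in section 2.2. The sketch below shows one way to do that; the class name and the three result strings are hypothetical. It follows the rule above literally, so underscores, which C also permits in identifiers, are not accepted here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordClassifier {
   private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
      "boolean", "break", "case", "char", "class", "continue", "default",
      "double", "else", "float", "for", "if", "instanceof", "int", "long",
      "new", "public", "return", "static", "switch", "void", "while"));

   private static final Set<String> UNSUPPORTED = new HashSet<>(Arrays.asList(
      "abstract", "assert", "byte", "catch", "const", "do", "enum", "exports",
      "extends", "final", "finally", "goto", "implements", "import",
      "interface", "module", "native", "package", "protected", "requires",
      "short", "strictfp", "super", "synchronized", "this", "throw", "throws",
      "transient", "try", "var", "volatile", "private"));

   // Classifies a word as "RESERVED", "IDENTIFIER", or "ERROR".
   public static String classify(String word) {
      if (SUPPORTED.contains(word)) {
         return "RESERVED";
      }
      if (UNSUPPORTED.contains(word)) {
         return "ERROR";   // a Java reserved word that is not in j0
      }
      if (word.matches("[A-Za-z][A-Za-z0-9]*")) {
         return "IDENTIFIER";
      }
      return "ERROR";      // not a legal j0 identifier
   }
}
```

Because identifiers are case-sensitive, While is an ordinary identifier even though while is reserved.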

3. Syntax

A good fraction of standard Java syntax will denote constructs that are not supported in j0. The easiest thing is probably to support the whole Java language grammar, less the parts that have been ruled out via lexical errors, and then define portions of it that will be unsupported and trigger an error in HW#3.

3.1 Function Syntax

Function definitions in j0 have the following format. In j0.1 they are all static, with the static keyword required before the return type.

return_type identifier ( parameter_list ) { function_body }
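
For instance, a j0.1-style function might look like the following. The class and function names are made up for illustration; the local variable is declared at the top of the body, before the first executable statement.

```java
public class Functions {
   // static keyword, return type, identifier, parameter list, body.
   public static int max3(int a, int b, int c) {
      int m;
      m = a;
      if (b > m) {
         m = b;
      }
      if (c > m) {
         m = c;
      }
      return m;
   }

   public static void main(String[] args) {
      if (max3(1, 5, 3) == 5) {
         System.out.println("max3 works");
      }
   }
}
```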

3.2 Control Structures

if (expression) {}
if (expression) {} else {}
if (expression) {} else if (expression) {} else {}
while (expression) {}
for (init; condition; increment) {}
switch (integer) {
   case literal:
      statements;
      break;
   ...
   default:
      statements;
}

3.3 Classes

j0.1 has only one class, within which the program is a set of static methods. j0.2+ classes are a subset of Java classes:

public class tag {
  data
}

j0 does not support nested classes inside other classes. References to classes as fields within a class are allowed. j0 does not have anonymous (non-labeled or lambda) classes.

3.4 Declaration Syntax

Declarations are allowed only for global variables and at the top of function bodies, before the first executable statement. j0 allows only simple initializers, including int, float, and char literals.

type identifier;
type [] identifier;
type identifier = literal;

A slightly simpler syntax is allowed for parameter lists, which do not allow initializers.

In Jzero Level One, only one identifier is declared with each such variable declaration. In Jzero Level Two, variable declarations may be comma-separated lists of identifiers, each of which may have array or initializer suffixes.
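
The difference between the two levels can be sketched as follows. The class, method, and variable names are illustrative. All declarations sit at the top of the body, and the array declaration puts its brackets to the left of the identifier, as section 4.3 requires.

```java
public class Declarations {
   public static int demo() {
      // Level One: one identifier per declaration.
      int count = 10;
      double ratio = 0.5;
      char letter = 'a';
      int [] data;
      // Level Two: a comma-separated list of identifiers.
      int x = 1, y = 2;

      // Executable statements begin here; no declarations may follow.
      data = new int[2];
      data[0] = count;
      data[1] = x + y;
      return data[0] + data[1];
   }
}
```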

4. Data Types and Semantics

4.1 Numbers

The numeric types used by j0 are char, short, int, long, float, and double. Their keywords are described in the reserved words section.

Although the C language automatically converts between numeric types as needed (promotion and demotion), j0 does not.
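
In practice this means a j0 program has to convert explicitly wherever it mixes numeric types. The sketch below assumes that a mixed expression such as total / ratio, with an int and a double operand, is rejected by j0, and it relies on the (type) cast operator, which is available only in j0.2+. The class and method names are illustrative.

```java
public class Average {
   // Computes total / count as a floating-point value.
   public static double average(int total, int count) {
      double t;
      double c;
      t = (double) total;   // explicit cast; j0 does not promote int to double
      c = (double) count;
      return t / c;
   }
}
```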

4.2 Strings

These are usual C null-terminated char* arrays.

Note that (void *) values automatically convert to any other pointer type, but no other pointer type conversions, nor casts, are supported in j0. I wouldn't care about supporting void * at all; if you want to implement malloc as returning char *, that's fine.

4.3 Arrays

As noted in the introduction, Java has multidimensional arrays, whereas j0 has only one-dimensional arrays. For example:

int [] num = new int[100]; 

or

int [] num;
num = new int[100]; 

Such array construction is the only use of the reserved word new in j0.

Java supports empty square brackets array syntax on either side of the identifier, but j0 only supports empty square brackets on the left of the identifier.

5. Library Functions

j0 supports a small subset of the functionality of a small subset of Java's standard libraries.

Function                 Import?   Use
System.out.println(s)              Prints to stdout, simplified
random number
string stuff
...

array .get() and .set()
System.out.print(s)
System.out.println(s)
String.charAt(n)
String.equals(s)
String.compareTo(s)  // ? do we need both this and equals()?
String.length()
String.toString(i) vs. String.valueOf()  ??
InputStream.read()   // ? is there a better input?
System.in.read() ?
  
At Jzero Level 2, add:
String.substring(x,y)
java.util.Random.nextInt()
java.lang.Math.abs()
java.lang.Math.max()
java.lang.Math.min()
java.lang.Math.pow()

At Jzero Level 3, add:
String.indexOf()
String.split()
java.lang.Math.cos()
java.lang.Math.sin()
java.lang.Math.tan()
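
As a small illustration of the Level One string functions, the following sketch counts occurrences of a character using String.length() and String.charAt(n), and compares strings with String.equals(s). The class and method names are made up for the example.

```java
public class LibraryDemo {
   // Counts how many times c occurs in s.
   public static int count(String s, char c) {
      int i;
      int n = 0;
      for (i = 0; i < s.length(); i++) {
         if (s.charAt(i) == c) {
            n++;
         }
      }
      return n;
   }

   public static void main(String[] args) {
      String word = "banana";
      if (word.equals("banana")) {
         System.out.println("strings are equal");
      }
      if (count(word, 'a') == 3) {
         System.out.println("three occurrences");
      }
   }
}
```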

6. Summary

Sure, j0 may be a toy language designed for a compiler class. Even with only this much, it may provide a convenient notation for a lot of simple programming tasks such as those faced by students in CSE 113.