Ucode Code Generation Guide

This document provides a guide to the generation of target code from intermediate three-address instructions. This is the Unicon ucode edition of this document.

The methodology for creating this guide is as important as its unfinished contents. This guide is produced by reverse-engineering, that is to say, by examining the output of "unicon -c". It potentially needs to be updated whenever the Unicon version changes.

Ucode file format basics

Ucode consists of a header section (the "u2" section), followed by a control-L (form feed), followed by a code section (the "u1" section).

Header section

Generally, the u2 section is a few fixed fields followed by a list of the public global identifiers that are declared in this module, for use by the linker. Consider this example:
ucode:
version	U12.1.00
uid	u1.u1-1681329218-0
impl	local
global	1
	0,000005,main,0
^L

comment (one per ucode line, in order):
version line: ucode version number, always (for now) U12.1.00
uid line: unique identifier: uid<tab>filename.u1-randomnumber-0
impl line: what to do with undeclared variables: impl<tab>local
global line: global region: global<tab>N
indented lines: globals, one per line: index,flags,name,0
^L line: single form-feed character on a line by itself
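
The header layout above can be read mechanically. The following is a minimal Python sketch, assuming the fields appear exactly as in the example (keyword, tab, value; tab-indented global entries; then a form feed). The function name parse_u2 and the dictionary keys are hypothetical and not part of Unicon, and real ucode may contain header lines that this sketch does not anticipate.

```python
def parse_u2(text):
    """Split ucode text at the first form feed and parse the u2 header.

    Assumes the layout shown in this guide's example; unknown keywords
    are simply stored under their own name.
    """
    header, _, code = text.partition("\f")
    fields = {}
    global_entries = []
    for line in header.splitlines():
        if not line.strip():
            continue
        if line.startswith("\t"):
            # Indented line: one global entry of the form index,flags,name,0
            index, flags, name, *rest = line.strip().split(",")
            global_entries.append(
                {"index": int(index), "flags": int(flags, 8), "name": name}
            )
        else:
            keyword, _, value = line.partition("\t")
            fields[keyword] = value
    return fields, global_entries, code


sample = (
    "version\tU12.1.00\n"
    "uid\tu1.u1-1681329218-0\n"
    "impl\tlocal\n"
    "global\t1\n"
    "\t0,000005,main,0\n"
    "\f\n"
    "proc main\n"
)
fields, global_entries, code = parse_u2(sample)
print(fields["version"], global_entries[0]["name"], oct(global_entries[0]["flags"]))
```

The flags field is converted with base 8 because it is written in octal; everything after the form feed is returned untouched as the u1 section.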

Code (u1) Section

Generally, this is zero or more function bodies. Each function body starts with a header line, then has a declarations section, then a declend line, followed by ucode instructions, until an end pseudoinstruction.

Declarations Section

The declarations section contains two subsections: zero or more variable declarations, followed by zero or more literal constant declarations. Each subsection is numbered sequentially with indices starting at zero. Variables are introduced using the pseudoinstruction local, with operands consisting of an index, flags, and the declared variable name. Literal constants are introduced similarly using the pseudoinstruction con, with operands consisting of an index, then flags, followed by the literal constant data.

The flags are a string of six octal digits (ASCII 0-7), three bits per digit, for a total of 18 bits, of which 13 are used. The same flags format is shared by variables and literals, so many of the flag bits are mutually exclusive. From unicon/src/icont/link.h, the definitions of the bits, which may be tested for independently, are:
Flag Name     Octal Value   Description
F_Global      01            variable declared global externally
F_Unref       02            procedure is unreferenced
F_Proc        04            procedure
F_Record      010           record
F_Dynamic     020           variable declared local dynamic
F_Static      040           variable declared local static
F_Builtin     0100          identifier refers to built-in procedure
F_ImpError    0400          procedure has default error
F_Argument    01000         variable is a formal parameter
F_IntLit      02000         literal is an integer
F_RealLit     04000         literal is a real
F_StrLit      010000        literal is a string
F_CsetLit     020000        literal is a cset
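
Since each flag is a single bit, a flags field can be decoded by testing the bits one at a time. Here is a minimal Python sketch; the bit values are copied from the table above, while the names FLAGS and decode_flags are hypothetical helpers for illustration.

```python
# Flag bits as listed in this guide (taken from unicon/src/icont/link.h).
FLAGS = {
    "F_Global": 0o1,
    "F_Unref": 0o2,
    "F_Proc": 0o4,
    "F_Record": 0o10,
    "F_Dynamic": 0o20,
    "F_Static": 0o40,
    "F_Builtin": 0o100,
    "F_ImpError": 0o400,
    "F_Argument": 0o1000,
    "F_IntLit": 0o2000,
    "F_RealLit": 0o4000,
    "F_StrLit": 0o10000,
    "F_CsetLit": 0o20000,
}


def decode_flags(octal_text):
    """Return the names of all flag bits set in a six-digit octal field."""
    value = int(octal_text, 8)
    return [name for name, bit in FLAGS.items() if value & bit]


print(decode_flags("000005"))  # ['F_Global', 'F_Proc']
print(decode_flags("010000"))  # ['F_StrLit']
```

The first call decodes the flags of the main entry in the header example: octal 5 is F_Global (01) plus F_Proc (04).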

Lines are indented by a tab character, except for the header line and labels. After the mnemonic, lines that contain operands have a tab character and then one or more operands separated by commas. I expect that tab characters (control-I) may not be replaced with spaces. An example of this format is shown below.
ucode:
proc name
	local	0,flags,varname
	...
	local	n,flags,varname
	con	0,flags,...
	...
	con	m,flags,...
	declend
	...code instructions
	end

comment:
local lines: see intermediate mapping section for flags examples
con lines: see intermediate mapping section for constant formats
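
The constant formats listed in the mapping section (integer: index,flags,#digits,digits; real: index,flags,real; string: index,flags,#chars,octal codes) can be produced by a small helper. This is a hedged Python sketch: the name con_line is hypothetical, it assumes non-negative integers and ASCII strings, and both the spelling of real numbers and the zero-padding of octal codes to three digits are assumptions, since the examples in this guide only show three-digit codes.

```python
def con_line(index, value):
    """Format one `con` declaration line following the formats in this guide."""
    if isinstance(value, bool):
        raise TypeError("booleans are not ucode literals")
    if isinstance(value, int):
        digits = str(value)  # assumes a non-negative integer literal
        parts = [str(index), "002000", str(len(digits)), digits]
    elif isinstance(value, float):
        # Assumption: Python's repr is close enough to what unicon emits.
        parts = [str(index), "004000", repr(value)]
    elif isinstance(value, str):
        # One octal character code per character, after the length.
        codes = [format(ord(ch), "03o") for ch in value]
        parts = [str(index), "010000", str(len(value))] + codes
    else:
        raise TypeError(f"unsupported literal: {value!r}")
    # Tab before the mnemonic and between mnemonic and operands.
    return "\tcon\t" + ",".join(parts)


for i, literal in enumerate([42, 3.14, "hello"]):
    print(con_line(i, literal))
```

For "hello" this yields the operands 2,010000,5,150,145,154,154,157, matching the string example in the mapping section apart from the index.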





Ucode Instruction Set and Semantics

The Unicon VM is the Icon VM, with most additions occurring in the runtime system. The instruction set is that of a strongly typed stack machine. It features goal-directed evaluation with implicit backtracking, which mainly means that you mark and unmark groups of instructions; within a marked group, expression failure causes an implicit goto that exits the group.

See Appendices B and C of The Implementation of Icon and Unicon for more complete information. Ask the instructor as needed.
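
To make the stack discipline concrete, here is a toy Python model of the instruction sequence this guide lists for x := y + z. It is only a sketch: it ignores mark, unmark, failure, and backtracking entirely, and the way pnull, var, plus, and asgn are modelled is an assumption based on the description "push slots for results, operands and then do instructions", not a transcription of the real interpreter. Variable names stand in for slot indices.

```python
def deref(item, variables):
    """Turn a variable reference into its value; pass plain values through."""
    if isinstance(item, tuple) and item[0] == "ref":
        return variables[item[1]]
    return item


def run(instructions, variables):
    """Execute a tiny subset of ucode-like instructions on a list used as a stack."""
    stack = []
    for op, *args in instructions:
        if op == "pnull":
            stack.append(None)  # placeholder slot that will receive a result
        elif op == "var":
            stack.append(("ref", args[0]))  # reference to a variable
        elif op == "plus":
            right = deref(stack.pop(), variables)
            left = deref(stack.pop(), variables)
            stack[-1] = left + right  # overwrite the slot pushed by pnull
        elif op == "asgn":
            value = deref(stack.pop(), variables)
            _, name = stack.pop()
            variables[name] = value
            stack[-1] = ("ref", name)  # result of the assignment
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


program = [
    ("pnull",), ("var", "x"),
    ("pnull",), ("var", "y"), ("var", "z"),
    ("plus",),
    ("asgn",),
]
variables = {"x": 0, "y": 2, "z": 3}
final_stack = run(program, variables)
print(variables["x"])  # 5
```

Each pnull reserves the slot that the following operator overwrites, which is why the sequence contains one pnull per operator (one for asgn, one for plus).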

Mapping TAC intermediate code to ucode

Each entry lists, in order: intermediate code instruction, ucode equivalent, comment.
global
global	N
N = # of globals
global x
	i,000001,x,0
index#,flagbits,varname,0
const 42
	i,002000,2,42
index#,flagbits,#digits,digits
const 3.14
	i,004000,3.14
index#,flagbits,real#
const "hello"
	i,010000,5,150,145,154,154,157
index#,flagbits,#chars,octal1,...,octaln
x := y + z
(global variables in slots i, j, k)
	mark	Ln
	pnull
	var	ix
	pnull
	var	jy
	var	kz
	plus
	asgn
	unmark
lab Ln
Stack machine: push slots for results, then operands, then execute the instruction. Locals and globals are always referenced by a local index #; the linker sorts out which ones are actually global.
local x
	local	i,000000,varname
Placed between the proc...declend pseudoinstructions, in sequential index order.
x := y + z
(local variables)
	mark	Ln
	pnull
	var	ix
	pnull
	var	jy
	var	kz
	plus
	asgn
	unmark
lab Ln
Same as for globals. Code generated does not depend on scope/region...
x := y / z
	mark	Ln
	pnull
	var	ix
	pnull
	var	jy
	var	kz
	div
	asgn
	unmark
lab Ln
Same as for other binary operators, except that the instruction is div.
x := - y
	mark	Ln
	pnull
	var	ix
	pnull
	var	jy
	neg
	asgn
	unmark
lab Ln
x := y
	mark	Ln
	pnull
	var	ix
	var	jy
	asgn
	unmark
lab Ln
x := &y (y global) n/a see instructions used e.g. for lists and tables
x := &y (y local) n/a see instructions used e.g. for lists and tables
x := *y n/a see instructions used e.g. for lists and tables
*x := y n/a see instructions used e.g. for lists and tables
goto L
	goto	L
Beware marks and unmarks. The code at the target label (e.g. L4) might start with an unmark. The goto instruction might be preceded by pnull.
if x < y then
    goto Lm
	mark	Ln
	mark0
	pnull
	var	1
	var	0
	numgt
	unmark
	unmark
	unmark
	pnull
	goto	Lm
	unmark
lab Ln
Full set of instructions for comparison operators. Beware numbers of marks/unmarks.
if x then
    goto Lm
	mark	Ln
	mark0
	pnull
	var	0
	int	1
	numeq
	unmark
	unmark
	unmark
	pnull
	goto	Lm
	unmark
lab Ln
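if !x then goto L
        cmpq $0, -8(%rbp)
        jne L'
        jmp L
L':
Why not:
        cmpq $0, -8(%rbp)
        je L
(The lines above are x86-64 assembly, not ucode; no ucode equivalent is given for this instruction.)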
param x Calculate which parameter number this is by counting the instructions in the linked list until you reach the CALL instruction.
call p,n,x If the call is to a member function, I hope you remembered to insert/push the "self" object as the first parameter for method invocation.
return x
	mark	Ln
	var	ix
	pret
lab Ln
return
	mark	Ln
	pnull
	pret
lab Ln
proc x,n1,n2
	i,000005,x,0		# in globals
	...
proc x
	...locals
	declend
	...code
	pfail
	end
label Ln
lab Ln
end
	pfail
	end
x := y field z
	mark	Ln
	pnull
	var	ix
	pnull
	var	jy
	field	z
	asgn
	unmark
lab Ln
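
The binary-operator entries above all share one shape, so a code generator can emit them from a single template. Below is a minimal Python sketch, assuming each operand has already been assigned a slot index. The function name emit_binary, the BINARY_OPS table, and the label numbering are hypothetical; only the plus and div mnemonics and the surrounding instruction sequence come from this guide.

```python
BINARY_OPS = {"+": "plus", "/": "div"}  # only the mnemonics shown in this guide


def emit_binary(dest, left, op, right, label):
    """Return the ucode lines for `dest := left op right`.

    dest, left and right are variable slot indices; label is the number
    used for the enclosing mark/lab pair.
    """
    return [
        f"\tmark\tL{label}",
        "\tpnull",
        f"\tvar\t{dest}",
        "\tpnull",
        f"\tvar\t{left}",
        f"\tvar\t{right}",
        f"\t{BINARY_OPS[op]}",
        "\tasgn",
        "\tunmark",
        f"lab L{label}",  # labels are not indented
    ]


print("\n".join(emit_binary(0, 1, "+", 2, 1)))
```

Instruction lines start with a tab and the label line does not, following the formatting rule stated in the declarations section.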