CSE 423 Lab #11: Reverse Engineer Some X86-64

Turn in two hours of work on Canvas, relevant to your Final Code homework. Even if you are working on TAC-C output, you should look at what x86_64 looks like.

In this week's "reverse engineering lab", we will study the output of gcc -S.

One goal is to update Dr. J's final code generation guide to be current and correct for login.cs.nmt.edu's version of GCC, which is GCC 11.4; its output may differ in many ways from what GCC produced when code.html was written. To get much out of the reverse engineering process, you should begin by skimming Chapter 10 of Professor Thain's book, and maybe looking at Chapter 11. We may also need to consult documents on x86_64 calling conventions.

Hello World

gcc -S hello.c produces a hello.s. On the login server in 2021 it looked like the following. The commentary after each # is not part of the file.
	.file	"hello.c"        # pseudo-op: names the source file
	.text                    # switch to the code section (subsection defaults to 0)
	.section	.rodata  # read-only data section
.LC0:                            # a label for the following string data
	.string	"hello, world"   # almost C format!
	.text
	.globl	main             # make this name visible to other modules in the linker
	.type	main, @function  # make this name denote a function
main:                            # a label for the following code
.LFB0:                           # another label, for a function beginning
	.cfi_startproc           # call frame information, needed for exception handling
	pushq	%rbp             # first actual instruction saves rbp
	.cfi_def_cfa_offset 16   # canonical frame address (CFA) is 16 bytes above rsp
	.cfi_offset 6, -16       # previous value of register 6 (%rbp) is at -16 from CFA
	movq	%rsp, %rbp       # set new frame pointer at top of stack
	.cfi_def_cfa_register 6  # CFA is now computed from register 6 (%rbp)
	leaq	.LC0(%rip), %rdi # load address of string into %rdi, the first argument register
	call	puts@PLT         # call puts() through the procedure linkage table
	movl	$0, %eax         # set eax to 0, the return value of main
	popq	%rbp             # restore old base pointer
	.cfi_def_cfa 7, 8        # CFA is now register 7 (%rsp) plus offset 8
	ret                      # return!
	.cfi_endproc             # end of function
.LFE0:                           # label for end of function
	.size	main, .-main     # size of function
	.ident	"GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0" # compiler identification string
	.section	.note.GNU-stack,"",@progbits # marks the stack as non-executable
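Assemble it with as -o hello.o hello.s to produce a hello.o. Link it with
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbegin.o hello.o -lc /usr/lib/gcc/x86_64-linux-gnu/7/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o
The 7 in the crtbegin.o and crtend.o paths is the GCC major version and must match the installed compiler. Running gcc -o hello hello.s instead lets the gcc driver supply the startup files and libraries itself.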

Updated Final Code Generation Guide

This document provides a guide to the generation of target code from intermediate three-address instructions. This is the x86_64 edition of this document. The methodology for creating it is as important as its unfinished contents. This guide is produced by reverse-engineering, that is to say, by examining the output of "gcc -S". It potentially needs to be updated whenever the gcc version changes.

Columns: intermediate code instruction; old x86_64 equivalent; 2024 x86_64 from GCC 11.4; comment.
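int x; (global) .comm x,8,8 name,size,alignment. Size 8 assumes a 64-bit integer; a 4-byte C int would be .comm x,4,4. GCC 10 and later default to -fno-common, so GCC 11.4 is expected to define x in .bss (.globl x, then x: .zero 4 for a C int) rather than emit .comm; check with gcc -S.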
x := y + z
(C global variables)
movl y(%rip), %edx
movl z(%rip), %eax
leal (%rdx,%rax), %eax
movl %eax, x(%rip)
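The register %rip, which is not mentioned in Bryant/O'Halloran Figure 2, is the instruction pointer, a.k.a. program counter, 64-bit edition. The operand y(%rip) addresses the global y relative to the current instruction, and leal is used here as a three-operand add: it computes %rdx + %rax without touching memory.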
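To regenerate this row on a given GCC version, it may help to keep a tiny C input file around. The following is a minimal sketch; the file name and the function name add_globals are made up for illustration, and the exact instructions gcc -S emits for it will depend on the compiler version and flags.

```c
/* add_globals.c: hypothetical input file for "gcc -S add_globals.c". */

/* C globals; in the generated assembly they are expected to appear as
   x(%rip), y(%rip), and z(%rip). */
int x, y, z;

/* The body is exactly the three-address instruction x := y + z. */
void add_globals(void)
{
    x = y + z;
}
```

Compiling with and without -O2 and comparing the two .s files shows which instructions are essential and which are artifacts of unoptimized code.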
x := y + z
(local variables)
movl -4(%rbp), %eax
movl -8(%rbp), %edx
leal (%rdx,%rax), %eax
movl %eax, -12(%rbp)
movq -16(%rbp), %rdx
movq -8(%rbp), %rax
addq %rdx, %rax
movq %rax, -24(%rbp)
x := y + z
(class foo variables)
movq %rdi, -8(%rbp) # t1 = self
movq -8(%rbp), %rax # rax = self
movq 8(%rax), %rdx # rdx = self->y
movq -8(%rbp), %rax # rax = self
movq 16(%rax), %rax # rax = self->z
addq %rax, %rdx # rdx = y+z
movq -8(%rbp), %rax # rax = self
movq %rdx, (%rax) # self->x = rdx

optimizes (-O2) to

movq 16(%rdi), %rax
addq 8(%rdi), %rax
movq %rax, (%rdi)

Note the main issue: the memory layout of fields x, y, z at offsets 0, 8, 16. These offsets are known at compile time for static/non-virtual OOP. A dynamic/virtual language would treat this as
self.x = self.y + self.z
and implement each field operation via a runtime call or table lookup.
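The static layout described above can be sketched in C, assuming each field is a 64-bit integer so that the offsets come out as 0, 8, and 16. The struct name foo comes from the row heading; the function name foo_add and the explicit self parameter are illustrative assumptions standing in for a method call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C stand-in for an object of class foo with three
   64-bit fields. */
struct foo {
    int64_t x;   /* offset 0  */
    int64_t y;   /* offset 8  */
    int64_t z;   /* offset 16 */
};

/* "self" arrives as the first parameter, which the calling convention
   places in %rdi. */
void foo_add(struct foo *self)
{
    self->x = self->y + self->z;
}
```

Because the offsets are compile-time constants, each field access becomes a single register-plus-displacement operand such as 8(%rdi); offsetof(struct foo, y) reports the same number the compiler uses.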
x := y / z
(C global variables)
movl y(%rip), %eax
movl %eax, %edx
sarl $31, %edx
idivl z(%rip)
movl %eax, x(%rip)
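sarl $31 is an arithmetic shift right, so it fills %edx with copies of the sign bit of %eax (the cltd instruction does the same job in one step). idivl divides the 64-bit numerator held in %edx:%eax by its 32-bit operand, leaving the quotient in %eax and the remainder in %edx.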
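The sign-fill and divide steps can be modeled in C to check one's understanding. This is only a sketch: the function names are invented, the model assumes a nonzero divisor and a quotient that fits in 32 bits (the real idivl traps otherwise), and the conversion to int64_t relies on the usual two's complement behavior that GCC provides.

```c
#include <stdint.h>

/* Model of "movl %eax, %edx; sarl $31, %edx": every bit of the result
   equals the sign bit of eax, giving -1 or 0. */
int32_t sign_fill(int32_t eax)
{
    return eax < 0 ? -1 : 0;
}

/* Model of idivl: the 64-bit dividend is edx:eax (edx is the high half).
   Returns the quotient (left in %eax) and stores the remainder (left in
   %edx) through rem. C division truncates toward zero, as idivl does. */
int32_t idivl_model(int32_t edx, int32_t eax, int32_t divisor, int32_t *rem)
{
    uint64_t bits = ((uint64_t)(uint32_t)edx << 32) | (uint32_t)eax;
    int64_t dividend = (int64_t)bits;
    *rem = (int32_t)(dividend % divisor);
    return (int32_t)(dividend / divisor);
}
```

For example, dividing -7 by 2 with edx = sign_fill(-7) yields quotient -3 and remainder -1. Passing 0 for edx instead would make the dividend a large positive number, which is why the sign fill is needed before idivl.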
x := - y
(local variables)
movl -4(%rbp), %eax
negl %eax
movl %eax, -8(%rbp)
x := y
(local variables)
movl -4(%rbp), %eax
movl %eax, -8(%rbp)
Note: x86 mov cannot copy directly from memory to memory, so the value passes through a register.
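x := &y (y global) movq $y, -8(%rbp) Note: $y is the absolute address of y used as an immediate operand; mov stores it to a memory location given in register-relative form. Position-independent code cannot embed an absolute address and would be expected to use leaq y(%rip), %rax followed by a movq to memory instead.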
x := &y (y local) leaq -12(%rbp), %rax
movq %rax, -8(%rbp)
leaq (load effective address) computes the address -12(%rbp) itself instead of fetching the contents stored at -12(%rbp).
x := *y
*x := y
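The pointer rows (x := &y, x := *y, *x := y) can be regenerated the same way as the arithmetic rows. A minimal sketch of a C input file follows; the function and variable names are invented for illustration, and the comments describe what one would look for in the gcc -S output rather than a captured listing.

```c
/* pointers.c: hypothetical input file for "gcc -S pointers.c". */

int y_global;

/* x := &y with y global: look for how the address of y_global is formed. */
int *address_of_global(void)
{
    return &y_global;
}

/* Exercises the three pointer forms on locals. */
int pointer_round_trip(int value)
{
    int y = 0;
    int *x = &y;    /* x := &y (y local): look for a leaq of a frame offset */
    *x = value;     /* *x := y: a store through the pointer */
    return *x;      /* x := *y: a load through the pointer */
}
```

Compile without optimization, since at -O2 the compiler is free to remove the pointer entirely and return value directly.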
goto L jmp L
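if x < y then goto L movq x, %rax
cmpq y, %rax
jl L
cmp sets the "condition code bits" in the flags register, and there is a full set of conditional jumps (je, jne, jl, jle, jg, jge) for the various comparison operators. In AT&T syntax, cmpq y, %rax computes %rax - y, so jl is taken exactly when x < y.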
if x then goto L cmpq $0, -8(%rbp)
jne L
if !x then goto L         cmpq $0, -8(%rbp)
        jne L'
        jmp L
L':
Why not:
cmpq $0, -8(%rbp)
je L
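C's goto makes it possible to write the conditional three-address forms almost literally and see which jumps GCC picks. The sketch below is an assumed test input, not part of the original guide; the function names are hypothetical, and GCC may invert a condition and jump around the goto rather than emit the jump shown in the table.

```c
/* branches.c: hypothetical input file for "gcc -S branches.c". */

/* if x < y then goto L */
long less_than(long x, long y)
{
    if (x < y) goto L;
    return 0;       /* fall-through path */
L:
    return 1;       /* branch-taken path */
}

/* if !x then goto L */
long is_false(long x)
{
    if (!x) goto L;
    return 0;
L:
    return 1;
}
```

Each function returns 1 when the branch is taken and 0 otherwise, so the same file can be linked into a small driver to confirm that hand-written jump sequences behave the same way.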
param x movq -8(%rbp), reg Calculate which parameter number you are by counting the instructions in the linked list until you reach the CALL instruction. Params 1-6 (integer and pointer arguments) are passed in registers. Any others go on the stack.
param #     1     2     3     4     5     6
register    %rdi  %rsi  %rdx  %rcx  %r8   %r9
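In a code generator, the table above usually becomes a small lookup. The following is a sketch under the assumption that parameters are numbered from 1 and are all integers or pointers; the function name param_register is hypothetical, and a NULL result is this sketch's way of saying "pass it on the stack".

```c
#include <stddef.h>

/* Returns the register that carries integer/pointer parameter n
   (1-based), or NULL when the parameter must go on the stack. */
const char *param_register(int n)
{
    static const char *const regs[] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };
    if (n >= 1 && n <= 6)
        return regs[n - 1];
    return NULL;
}
```

A param instruction would then emit movq with param_register(n) as the destination when the result is non-NULL, and a push (or a store relative to %rsp) otherwise.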
call p,n,x If call is to a member function, I hope you remembered to insert/push "self" object as first parameter for method invocation
return x movl -8(%rbp), %eax
jmp Lend
Load the return value into the ax register, then jump to the end to return.
global x,n1,n2 treat globals as class variables of some "global" singleton?
proc x,n1,n2      .text
     .p2align 4,,15
.globl f
     .type f, @function
f:
.LFBn:
     .cfi_startproc
...
local x,n
label Ln
end Lend:
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFEn:
        .size f, .-f
Counter for .LFEn incremented for each function in file
x := y field z may involve y's class
class x,n1,n2
field x,n