In this week's "reverse engineering lab", we will study the output of gcc -S.
One goal is to update Dr. J's final code generation guide so that it is current and correct for the GCC on login.cs.nmt.edu, which is GCC 11.4 and may differ in many ways from the version in use when code.html was written. To get much out of the reverse engineering process, begin by skimming Chapter 10 of Professor Thain's book, and maybe look at Chapter 11. We may also need to consult documents on x86_64 calling conventions.
Annotated gcc -S output for hello.c:

| Assembly | Comment |
|---|---|
| .file "hello.c" | pseudo-op: what file are we |
| .text | which section; subsection defaults to 0 (code) |
| .section .rodata | read-only data section |
| .LC0: | a label for the following string data |
| .string "hello, world" | almost C format! |
| .text | back to the code section |
| .globl main | make this name visible to other modules in the linker |
| .type main, @function | make this name denote a function |
| main: | a label for the following code |
| .LFB0: | another label, for a function beginning |
| .cfi_startproc | call frame information, needed for exception handling |
| pushq %rbp | first actual instruction saves rbp |
| .cfi_def_cfa_offset 16 | canonical frame address (CFA) is now 16 bytes above rsp |
| .cfi_offset 6, -16 | previous value of register 6 (rbp) is saved at -16 from CFA |
| movq %rsp, %rbp | set new frame pointer to the top of the stack |
| .cfi_def_cfa_register 6 | from here on, compute the CFA from register 6 (rbp) |
| leaq .LC0(%rip), %rdi | load address of string into %rdi |
| call puts@PLT | call puts() |
| movl $0, %eax | set eax to 0 |
| popq %rbp | restore old base pointer |
| .cfi_def_cfa 7, 8 | CFA is register 7 (rsp) plus offset 8 |
| ret | return! |
| .cfi_endproc | end of function |
| .LFE0: | label for end of function |
| .size main, .-main | size of function |
| .ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0" | compiler identification string (not debugging info) |
| .section .note.GNU-stack,"",@progbits | empty section that marks the stack as non-executable |

Assemble it with
as -o hello.o hello.s
to produce hello.o.
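Link it with
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbegin.o hello.o -lc /usr/lib/gcc/x86_64-linux-gnu/7/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o
The /7/ in the crtbegin.o and crtend.o paths is the GCC 7 directory that goes with the 7.5.0 listing; under GCC 11.4 those two paths will presumably name a different version directory.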
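For reference, the annotated listing is consistent with a C source along the following lines. This is a guess at hello.c, not the lab's actual file: the function is named hello (a hypothetical name) rather than main so that it can be called from a test harness, and it uses printf with a trailing newline on the assumption that GCC rewrote that call into the puts of "hello, world" seen in the assembly.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the body of main() in hello.c.
 * Assumption: GCC replaces printf of a constant string ending in '\n'
 * with puts of the same string minus the newline, which would explain
 * the .string "hello, world" and "call puts@PLT" lines in the listing. */
int hello(void)
{
    printf("hello, world\n");
    return 0;   /* corresponds to "movl $0, %eax" just before "ret" */
}
```

Renaming hello to main and running gcc -S on the file on login.cs.nmt.edu should give a GCC 11.4 listing to compare line by line against the GCC 7.5 one.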
| Intermediate code instruction | Old x86_64 equivalent | 2024 x86_64 from GCC 11.4 | Comment |
|---|---|---|---|
| int x; (global) | .comm x,8,8 | | name,size,alignment |
| x := y + z (C global variables) | movl y(%rip), %edx | | The register %rip, which is not mentioned in Bryant/O'Hallaron Figure 2, is the instruction pointer, a.k.a. program counter, 64-bit edition. |
| x := y + z (local variables) | movl -4(%rbp), %eax | movq -16(%rbp), %rdx | |
| x := y + z (class foo variables) | movq %rdi, -8(%rbp) ; t1 = self; optimizes (-O2) to | | Note the main issue: the memory layout for fields x,y,z at offsets 0,8,16; these are known at compile-time for static/non-virtual OOP. A dynamic/virtual implementation would treat this as self.x = self.y + self.z and implement each field op via a runtime call or table lookup. |
| x := y / z (C global variables) | movl y(%rip), %eax | | sarl, the arithmetic shift-right, fills %edx with the sign bit of %eax. idivl divides the 64-bit numerator in %edx:%eax by a 32-bit denominator, leaving the quotient in %eax and the remainder in %edx. |
| x := - y (local variables) | movl -4(%rbp), %eax | | |
| x := y (local variables) | movl -4(%rbp), %eax | | Note: mov does not do direct memory-to-memory copy |
| x := &y (y global) | movq $y, -8(%rbp) | | Note: $y is the absolute address of y, used as an immediate operand; movq stores it to a memory location given in register-relative form |
| x := &y (y local) | leaq -12(%rbp), %rax | | Load effective address: computes the address %rbp-12 instead of fetching the contents of -12(%rbp). |
| x := *y | | | |
| *x := y | | | |
| goto L | jmp L | | |
| if x < y then goto L | movq x, %rax | | The compare sets the "condition code bits" in the flags register; there is a full set of conditional jumps (jl, jle, jg, jge, je, jne) for the various comparison operators. |
| if x then goto L | cmpq $0, -8(%rbp) | | |
| if !x then goto L | cmpq $0, -8(%rbp) | | Why not: cmpq $0, -8(%rbp) |
| param x | movq -8(%rbp), reg | | Work out which parameter number this is by counting the param instructions in the linked list until you reach the CALL instruction. Integer and pointer params 1-6 are passed in registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9). Others go on the stack. |
| call p,n,x | | | If call is to a member function, I hope you remembered to insert/push "self" object as first parameter for method invocation |
| return x | movl -8(%rbp), %eax | | Load return value into the ax register, then jump to end to return |
| global x,n1,n2 | | | treat globals as class variables of some "global" singleton? |
| proc x,n1,n2 | .text | | |
| local x,n | | | |
| label Ln | | | |
| end | Lend: | | Counter for .LFEn incremented for each function in file |
| x := y field z | | | may involve y's class |
| class x,n1,n2 | | | |
| field x,n | | | |
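
One way to fill in the GCC 11.4 column is to feed gcc -S a small C file with one function per table row. The sketch below is one possible input, not part of Dr. J's guide: every function name is made up for illustration, and struct foo with three 8-byte fields is an assumption chosen to match the offsets 0,8,16 mentioned for the class foo row.

```c
#include <stddef.h>
#include <stdint.h>

/* Globals for the "C global variables" rows. */
int x, y, z;

/* Three 8-byte fields, so the field offsets are 0, 8 and 16. */
struct foo { int64_t x, y, z; };

/* x := y + z (C global variables) */
void add_globals(void) { x = y + z; }

/* x := y / z (C global variables); the caller must ensure z != 0 */
void div_globals(void) { x = y / z; }

/* x := y + z (class foo variables); self is the first argument (%rdi) */
void foo_add(struct foo *self) { self->x = self->y + self->z; }

/* x := - y (local variables) */
int neg_local(int a) { int yy = a; int xx = -yy; return xx; }

/* x := &y (y local), followed by a dereference of the pointer */
int addr_of_local(int a) { int yy = a; int *xx = &yy; return *xx; }

/* if x < y then goto L */
int less_than(long a, long b)
{
    if (a < b) goto L;
    return 0;
L:
    return 1;
}
```

Compiling this once with gcc -S and once with gcc -S -O2 gives, per function, the instruction sequences to compare against the Old x86_64 column.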