The document summarizes a workshop on practical reverse engineering for Linux x86-64. The 4-hour workshop will provide an introduction to Linux x86-64 reverse engineering, including vocabulary like instructions and addressing modes, grammar like compiler patterns and optimizations, and hands-on practice with a virtual machine loaded with tools like Ghidra. Attendees will learn reverse engineering techniques like analyzing prologue/epilogue code, stack cookies, register usage, and compiler optimizations by disassembling and debugging sample code.


A Practical Introduction to Reverse Engineering

Workshop
Singapore – December 3rd, 2022
Introduction
Workshop Presentation

• Introduction to x86-64 Linux reverse engineering.

• 4-hour workshop + 1 hour for one-on-one questions.

• Driven by hands-on exercises.

• A 30-minute evaluation at the end of the session.

VM

• Ubuntu 22.04

• Login: re / Password: re

• Ghidra 10.2

About

• Security Engineer

• Enjoys reverse engineering and development

• Mostly reverse engineering mobile applications (Android & iOS)

Open Obfuscator

Practical Reverse Engineering
Reverse Engineering

The purpose of reverse engineering is to uncover a functionality or an asset without having access to the original information (e.g. the source code).

Reverse Engineering

Functionalities:

• An algorithm.
• A check.
• A structure.

Reverse Engineering

Assets:

• A password.
• An API key.

Reverse Engineering

Original information:

• Without the source code.
• Without the symbols.
• With obfuscation.

Reverse Engineering

source.cpp → Compiler (clang / gcc / cl.exe) → object.o → Linker (ld / link.exe) → executable.bin / library.so → Post-link actions (strip)

• Static library reverse engineering: object files.
• "Lucky" reverse engineering: linked binary with symbols.
• Classical reverse engineering: stripped binary.

Linux x86-64 Reverse Engineering

Linux x86-64 Reverse Engineering

Learning reverse engineering is somewhat similar to learning a new language:

• Vocabulary → Instructions
• Grammar → Addressing Modes/ABI Conventions
• Idioms/Expressions → Compiler Patterns/Optimizations

Linux x86-64 Reverse Engineering

Elements highlighted in this listing: prologue, stack cookies, compiler optimization, registers and instructions.

push r12
push rbp
push rbx                          ; Prologue
mov rbx, rdi
mov rdi, rsi
sub rsp, 60h
mov r12, [rbx+28h]
mov rax, fs:28h                   ; Stack cookies
mov [rsp+78h+var_20], rax
xor eax, eax                      ; Compiler optimization
movzx esi, byte ptr [r12+10h]
movss xmm0, dword ptr [r12+8]
call sub_16C90
test rax, rax                     ; Registers
jz loc_B0CD
mov rbp, rax
cmp [rbx+10h], rax
jz loc_B114

x86-64: Registers

Sub-registers (rax as an example):
rax   64 bits
eax   low 32 bits
ax    low 16 bits
ah    bits 8-15 (high byte of ax)
al    bits 0-7 (low byte of ax)

The same layout applies to rbx (ebx, bx, bh, bl), rcx (ecx, cx, ch, cl) and rdx (edx, dx, dh, dl).

16 General-Purpose Registers:
• rax, rcx, rdx, rsi, rdi, r8, r9, r10, r11: calling convention & scratch registers
• rsp, rbp: stack registers
• rbx, r12, r13, r14, r15: scratch registers but callee-saved

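A minimal C sketch of the sub-register layout, modelling rax as a `uint64_t` (the helper names are hypothetical). It also encodes two x86-64 rules: writing a 32-bit register such as eax zero-extends into the full 64-bit register, while writing an 8-bit register such as al leaves the other bits untouched.

```c
#include <stdint.h>

/* Read views: what eax, ax, ah and al contain for a given rax value. */
static uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }       /* bits 0-31 */
static uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }       /* bits 0-15 */
static uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); } /* bits 8-15 */
static uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }        /* bits 0-7  */

/* Models `mov eax, value`: a 32-bit write clears bits 32-63 of rax. */
static uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax; /* the previous content is fully replaced */
    return (uint64_t)value;
}

/* Models `mov al, value`: an 8-bit write keeps bits 8-63 of rax. */
static uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~(uint64_t)0xff) | value;
}
```

The zero-extension rule is why compilers clear rax with `xor eax, eax` rather than `xor rax, rax`: the 32-bit form has the same effect with a shorter encoding.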
x86-64: Instructions

mov DST, SRC

mov rax, rdi
mov ecx, edx
mov rdi, qword ptr [rsi + 0x8]
mov byte ptr [rsp], cl
mov rax, 0x123
x86-64: Instructions

push rax
pop rbx
x86-64: Instructions

add rax, rbx
sub rdx, rcx
xor eax, eax
or rax, rax
(...)
x86-64: Instructions

jmp loc_2345
jmp rax
jnz loc_2345
call FUNC_123
call rax
Compiler Optimizations

x86-64: Instructions

rax = rbx + 2

mov rax, rbx
add rax, 0x2
x86-64: Compiler Optimizations

rax = rbx + 2

lea rax, [rbx + 0x2]
x86-64: Compiler Optimizations

rax = 0

mov rax, 0x0        48 c7 c0 00 00 00 00
x86-64: Compiler Optimizations

rax = 0

xor rax, rax        48 31 c0
x86-64: Compiler Optimizations

X % 8

mov rax, <X>
mov rcx, 0x8
idiv rcx
mov rax, rdx
x86-64: Compiler Optimizations

X % 8 (X unsigned)

mov rax, <X>
and rax, 7
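To see why the `and` form is only valid for unsigned values, here is a small C check, assuming 64-bit integers (the function names are hypothetical). C's `%` truncates toward zero, so for a negative X the remainder is negative, while `X & 7` is always in [0, 7].

```c
#include <stdint.h>

/* What `and rax, 7` computes: equal to X % 8 when X is unsigned. */
static uint64_t mod8_unsigned_and(uint64_t x) { return x & 7; }

/* C semantics for a signed X: the remainder takes the sign of X. */
static int64_t mod8_signed(int64_t x) { return x % 8; }

/* Applying the same AND to a negative value gives a different answer. */
static int64_t and7_signed(int64_t x) { return x & 7; }
```

For a signed X, compilers therefore emit a few extra instructions around the `and` to correct negative values, a pattern worth recognizing in listings.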
x86-64: Compiler Optimizations

X % 26

mov rax, <X>        48 8b 45 f8
mov rcx, 26         b9 1a 00 00 00
div rcx             48 f7 f1
mov rax, rdx        48 89 d0
x86-64: Compiler Optimizations

X % 26

mov rax, <X>        48 89 f8
push 26             6a 1a
pop rcx             59
div rcx             48 f7 f1
mov rax, rdx        48 89 d0
x86-64: Compiler Optimizations

X % 26

movabs rcx, 0x4ec4ec4ec4ec4ec5
mov rax, <X>
mul rcx
shr rdx, 0x3
lea rax, [rdx+rdx*4]
lea rax, [rax+rax*4]
add rax, rdx
sub <X>, rax
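The listing can be replayed in C to check that it really computes X % 26. This sketch assumes a GCC or Clang toolchain, since it relies on the `unsigned __int128` extension to obtain the high half of the 128-bit product that `mul rcx` leaves in rdx. The second helper (hypothetical, like the first) goes the other way: given the magic constant and the total shift (64 for taking rdx, plus 3 from `shr rdx, 0x3`), it recovers the divisor, which is how such sequences are usually read while reversing.

```c
#include <stdint.h>

/* Replays the listing: X % 26 without a div instruction. */
static uint64_t mod26_magic(uint64_t x) {
    const uint64_t magic = 0x4ec4ec4ec4ec4ec5ULL;                  /* movabs rcx, ...        */
    uint64_t q = (uint64_t)(((unsigned __int128)x * magic) >> 64); /* mul rcx: high half, rdx */
    q >>= 3;                  /* shr rdx, 0x3: q = x / 26        */
    uint64_t t = q + q * 4;   /* lea rax, [rdx+rdx*4]: t = 5q    */
    t = t + t * 4;            /* lea rax, [rax+rax*4]: t = 25q   */
    t += q;                   /* add rax, rdx: t = 26q           */
    return x - t;             /* sub <X>, rax: x - 26q = x % 26  */
}

/* Recovers the divisor as round(2^(64 + shift) / magic), assuming shift < 64. */
static uint64_t magic_divisor(uint64_t magic, unsigned shift) {
    unsigned __int128 num = (unsigned __int128)1 << (64 + shift);
    return (uint64_t)((num + magic / 2) / magic);
}
```

For this listing, `magic_divisor(0x4ec4ec4ec4ec4ec5, 3)` gives 26: the constant is ⌈2^67 / 26⌉.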
Calling Convention

x86-64: Calling Convention

A calling convention defines how registers (and the stack) are used to pass arguments and return values when calling a function.

This convention depends on:

1. The architecture
2. The operating system

It also defines which registers must be preserved across function calls.

x86-64: Calling Convention

int x = 1;
int y = 2;
int result = compute(x, y);

mov     dword ptr [rbp-0Ch], 1
mov     dword ptr [rbp-8],   2
mov     edx, [rbp-8]
mov     eax, [rbp-0Ch]
mov     esi, edx
mov     edi, eax
call    compute
mov     [rbp-4], eax

x86-64: Calling Convention

Integer and pointer arguments are passed, in order, in:
RDI, RSI, RDX, RCX, R8, R9
Other parameters are passed through the stack.

Return value: RAX

mov     esi, edx        ; y → 2nd argument (RSI)
mov     edi, eax        ; x → 1st argument (RDI)
call    compute
mov     [rbp-4], eax    ; return value (RAX) → result
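One way to observe the convention is to compile a function with more than six integer parameters (for instance with `clang -O0 -S`, or by opening the binary in Ghidra) and look at where each parameter is read. The function below is a hypothetical example; its comments show where the System V AMD64 ABI used on Linux places each argument.

```c
/* Seven integer arguments: six travel in registers, the seventh on the stack. */
long sum7(long a,  /* rdi */
          long b,  /* rsi */
          long c,  /* rdx */
          long d,  /* rcx */
          long e,  /* r8  */
          long f,  /* r9  */
          long g)  /* passed on the stack */
{
    return a + b + c + d + e + f + g; /* result returned in rax */
}
```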
x86-64: Prologue / Epilogue

When calling compute(), the System V calling convention used on Linux allows compute() to modify rax, rcx, rdx, rsi, rdi, r8, r9, r10, r11.

But compute() must preserve (or restore before returning) the values of rbx, rbp, rsp, r12, r13, r14, r15.

int x = 1;
int y = 2;
int result = compute(x, y);

x86-64: Prologue / Epilogue

"But compute() is not allowed to modify the values of rbx, rbp, rsp, r12, [...]"

push r15
push r14
push r13
push r12
push rbp
push rbx              ; Back up the callee-saved registers

sub rsp, 0x98         ; Allocate space for stack variables

mov rax, fs:28h
mov [rsp+0x88], rax   ; Initialize the stack cookie

[...]

add rsp, 0x98         ; Restore the stack

pop rbx
pop rbp
pop r12
pop r13
pop r14
pop r15               ; Restore the callee-saved registers

x86-64: Endianness

uintptr_t* memory = ...;
*memory = 0x11223344;

Memory: 44 33 22 11 00 00 00 00

x86-64 is a little-endian architecture, which means that the least significant byte is stored at the lowest memory address.

Basically, the memory representation is reversed compared to the in-register representation.

Memory: 12 45 32 AE E3 00 32 11
mov     rax, [rbp+var_8]
RAX = 0x113200E3AE324512
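A small C sketch of both directions (the helper names are hypothetical): `memory_bytes()` dumps the in-memory representation of a value, which matches the first example only on a little-endian host such as x86-64, and `load_le64()` rebuilds the value that `mov rax, [mem]` reads from the second byte sequence, independently of the host.

```c
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of a 64-bit value (host byte order). */
static void memory_bytes(uint64_t value, unsigned char out[8]) {
    memcpy(out, &value, sizeof value);
}

/* Decodes 8 bytes as a little-endian integer: in[0] (lowest address)
   is the least significant byte. */
static uint64_t load_le64(const unsigned char in[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | in[i];
    return v;
}
```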
x86-64: Optimization

Linux Execution Bootstrapping

Linux Execution Bootstrapping

A C/C++ program must define a main() function, which is often described as the first function executed when the executable starts.

#include <stdio.h>

int main(int argc, char** argv) {
  printf("Hello World\n");
  return 0;
}
Linux Execution Bootstrapping

$ readelf --file-header compiled.bin
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x5eb0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          136080 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         26
  Section header string table index: 25
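As a sketch of what readelf does for the first fields, the C code below reads the magic, class, data encoding, type, machine and entry point from a raw 64-byte ELF64 header. Offsets follow the `Elf64_Ehdr` layout described in `<elf.h>`; the `ElfInfo` struct and the function names are hypothetical, and only little-endian ELF64 headers are accepted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t type;    /* e_type at offset 16: 3 = ET_DYN (PIE or shared object) */
    uint16_t machine; /* e_machine at offset 18: 62 = EM_X86_64 */
    uint64_t entry;   /* e_entry at offset 24 */
} ElfInfo;

/* Reads an n-byte little-endian integer. */
static uint64_t read_le(const unsigned char *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = n; i-- > 0;)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if the buffer is not a little-endian ELF64 header. */
static int parse_elf_header(const unsigned char *buf, size_t size, ElfInfo *info) {
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    if (size < 64 || memcmp(buf, magic, 4) != 0)
        return -1;
    if (buf[4] != 2 || buf[5] != 1) /* EI_CLASS == ELF64, EI_DATA == little endian */
        return -1;
    info->type    = (uint16_t)read_le(buf + 16, 2);
    info->machine = (uint16_t)read_le(buf + 18, 2);
    info->entry   = read_le(buf + 24, 8);
    return 0;
}
```

Fed with the first 64 bytes of compiled.bin, it should report the same entry point as readelf (0x5eb0).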
Linux Execution Bootstrapping

The entry point (0x5eb0) does not point to main() but to the start-up code of the C runtime, which calls main() through:

__libc_start_main(&main, ...);
Linux Execution Bootstrapping

[Figure: Ghidra view of the entry point, with the cross-references leading to main()]

Linux Execution Bootstrapping

Demo

Hands-on #1: Simple Crackme

Level: Easy
Objectives: Getting started with reverse engineering and compiler optimizations.

$ ./crackme.elf foo
Missing login (...)
Try again!
$ ********** ./crackme.elf ****
Well done!

• ELF x86-64
• Involves basic mathematics
• Not stripped

Hints

• http://flaviojslab.blogspot.com/2008/02/integer-division.html
• The charset of the password is abcdefghijklmnopqrstuvwxyz (lower case)
• What is the precedence between xor and add?
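The last hint matters when re-implementing a check in C, for example to brute-force the password: in C, `+` binds tighter than `^`, so an unparenthesized `a ^ b + c` means `a ^ (b + c)`. A minimal illustration (the function names are hypothetical):

```c
/* Explicit orders of evaluation. */
static unsigned xor_then_add(unsigned a, unsigned b, unsigned c) { return (a ^ b) + c; }
static unsigned add_then_xor(unsigned a, unsigned b, unsigned c) { return a ^ (b + c); }

/* Without parentheses, + is evaluated first: same as add_then_xor(). */
static unsigned no_parentheses(unsigned a, unsigned b, unsigned c) { return a ^ b + c; }
```

GCC typically flags the unparenthesized form with `-Wparentheses` (enabled by `-Wall`), a good reason to always write the parentheses explicitly.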

Reverse Engineering Structures
Reverse Engineering Structures

Reverse engineering is not always about understanding a function or an algorithm. It might also involve understanding complex data types like structures or C++ classes.

#include <stdlib.h>

struct PointTy {
  int x;
  int y;
};

int compute(int x, int y) {
  PointTy* P = (PointTy*)malloc(sizeof(PointTy));
  P->x = x;
  P->y = y;
  return P->x + P->y;
}

Reverse Engineering Structures

sub     rsp, 18h                  ; Stack allocation
mov     [rsp+18h+var_4], edi
mov     [rsp+18h+var_8], esi
mov     edi, 8                    ; PointTy* P = malloc(sizeof(PointTy));
call    _malloc
mov     [rsp+18h+var_10], rax
mov     eax, [rsp+18h+var_4]      ; P->x = x;
mov     rcx, [rsp+18h+var_10]
mov     [rcx], eax
mov     eax, [rsp+18h+var_8]      ; P->y = y;
mov     rcx, [rsp+18h+var_10]
mov     [rcx+4], eax
mov     rax, [rsp+18h+var_10]     ; return P->x + P->y;
mov     eax, [rax]
mov     rcx, [rsp+18h+var_10]
add     eax, [rcx+4]
add     rsp, 18h                  ; Stack deallocation
retn

Reverse Engineering Structures

[Figure: memory layout of PointTy, with RAX pointing to the start of the structure]

struct PointTy {
  int x;
  int y;
};

struct PointTy {
  void* x;
  void* y;
};

struct PointTy {
  char  x;
  void* y;
};

[Figure: the same layout, highlighting the padding inserted between x and y]
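The layouts above can be checked with `offsetof` and `sizeof`. This sketch assumes x86-64 Linux (4-byte `int`, 8-byte pointers aligned on 8 bytes) and renames the three `PointTy` variants so they can coexist in one file; `mixed_padding` is a hypothetical helper. The first layout is the one behind `mov edi, 8` (the malloc size) and `[rcx+4]` (the y field) in the earlier listing.

```c
#include <stddef.h>

/* The three PointTy variants from the slides, renamed to coexist. */
struct PointInts  { int   x; int   y; }; /* x at 0, y at 4, size 8  */
struct PointPtrs  { void *x; void *y; }; /* x at 0, y at 8, size 16 */
struct PointMixed { char  x; void *y; }; /* x at 0, 7 padding bytes, y at 8, size 16 */

/* Padding inserted after the char so that y is aligned on 8 bytes. */
static size_t mixed_padding(void) {
    return offsetof(struct PointMixed, y) - sizeof(char);
}
```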
Reverse Engineering Structures

int compute(PointTy* P) {
  P->x = 1;
  P->y = 2;
  return P->x + P->y;
}

int main(int argc, char** argv) {
  PointTy P;
  int value = compute(&P);
  return value;
}

Reverse Engineering Structures

$ clang++ -O0 [...]

PUSH       RBP
MOV        RBP,RSP
SUB        RSP,0x20
MOV        dword ptr [RBP + local_c],0x0
MOV        dword ptr [RBP + local_10],EDI
MOV        qword ptr [RBP + local_18],RSI
LEA        RDI=>local_20,[RBP + -0x18]
CALL       FUN_00101120
MOV        dword ptr [RBP + local_24],EAX
MOV        EAX,dword ptr [RBP + local_24]
ADD        RSP,0x20
POP        RBP
RET

Reverse Engineering Structures

$ clang++ -O0 [...]

// main
undefined4 FUN_00101150(undefined4 param_1,...) {
  undefined4 uVar1;
  undefined  local_20 [8];
  undefined8 local_18;
  undefined4 local_10;
  undefined4 local_c;

  local_c = 0;
  local_18 = param_2;
  local_10 = param_1;
  uVar1 = FUN_00101120(local_20);
  return uVar1;
}

Reverse Engineering Structures

$ clang++ -O1 [...]

PUSH       RAX
MOV        RDI,RSP
CALL       FUN_00101120
POP        RCX
RET

Reverse Engineering Structures

$ clang++ -O1 [...]

// main
void FUN_00101150(void) {
  undefined auStack_8[8];

  FUN_00101120(auStack_8);
  return;
}

Reverse Engineering Structures

MOV        qword ptr [RSP + local_8],RDI
MOV        RAX,qword ptr [RSP + local_8]
MOV        dword ptr [RAX],0x1
MOV        RAX,qword ptr [RSP + local_8]
MOV        dword ptr [RAX + 0x4],0x2
MOV        RAX,qword ptr [RSP + local_8]
MOV        EAX,dword ptr [RAX]
MOV        RCX,qword ptr [RSP + local_8]
ADD        EAX,dword ptr [RCX + 0x4]
RET

Reverse Engineering Structures

// compute
int FUN_00101120(int *param_1) {
  *param_1 = 1;
  param_1[1] = 2;
  return *param_1 + param_1[1];
}

Demo

Hands-on #2: Structures

Level: Medium
Objectives: Identify and reverse structures.

$ ./crackme_medium.elf 01020304
Try again!
$ ./crackme_medium.elf ********
Well done!

• ELF x86-64
• Involves basic arithmetic operations
• Stripped

Reverse Engineering Large Binaries
Reverse Engineering Large Binaries

Most programs rely on third-party libraries that can be either dynamically or statically linked.

Reverse Engineering Large Binaries

MD5 Computation with OpenSSL

#include <openssl/md5.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void do_md5(const char* input) {
  uint8_t H[MD5_DIGEST_LENGTH];
  MD5_CTX ctx;

  MD5_Init(&ctx);
  MD5_Update(&ctx, input, strlen(input));
  MD5_Final(H, &ctx);

  char H_str[MD5_DIGEST_LENGTH * 2 + 1]; // + 1 for the NUL written by the last sprintf()
  for (size_t i = 0; i < MD5_DIGEST_LENGTH; ++i) {
    sprintf(&H_str[i * 2], "%02x", H[i]);
  }
  printf("md5('%s'): %s\n", input, H_str);
}

Reverse Engineering Large Binaries

$ clang -O0 main.cpp -o main -lcrypto

Dynamic link with OpenSSL

Reverse Engineering Large Binaries

void FUN_001011a0(char *param_1) {
  char *data;
  size_t len;
  ulong local_b0;
  char local_a8 [32];
  MD5_CTX local_88;
  byte local_28 [24];
  char *local_10;

  local_10 = param_1;
  MD5_Init(&local_88);
  data = local_10;
  len = strlen(local_10);
  MD5_Update(&local_88,data,len);
  MD5_Final(local_28,&local_88);
  for (local_b0 = 0; local_b0 < 0x10; local_b0 = local_b0 + 1) {
    sprintf(local_a8 + local_b0 * 2,"%02x",(ulong)local_28[local_b0]);
  }
  printf("md5(\'%s\'): %s\n",local_10,local_a8);
  return;
}

Dynamically imported functions

Reverse Engineering Large Binaries

The names of dynamically imported functions (e.g. MD5_Init, MD5_Update, MD5_Final) can't be stripped: the dynamic linker needs them to resolve the symbols at load time.

Reverse Engineering Large Binaries

$ clang -O0 main.cpp -o main libcrypto.a

Static link with OpenSSL

Reverse Engineering Large Binaries

void FUN_0013d0f0(undefined8 param_1) {
  undefined8 uVar1;
  undefined8 uVar2;
  ulong uStack_b0;
  undefined auStack_a8 [32];
  undefined auStack_88 [96];
  undefined auStack_28 [24];
  undefined8 uStack_10;

  uStack_10 = param_1;
  FUN_0028ffa0(auStack_88);
  uVar1 = uStack_10;
  uVar2 = strlen(uStack_10);
  FUN_0028f940(auStack_88,uVar1,uVar2);
  FUN_0028fb80(auStack_28,auStack_88);
  for (uStack_b0 = 0; uStack_b0 < 0x10; uStack_b0 = uStack_b0 + 1) {
    sprintf(auStack_a8 + uStack_b0 * 2,&DAT_003a1ed1,auStack_28[uStack_b0]);
  }
  printf(&DAT_00363004,uStack_10,auStack_a8);
  return;
}

Statically linked functions (names stripped)

Reverse Engineering Large Binaries

Lines that differ between the two decompiled functions:

Dynamic Link                        Static Link (and stripped)
MD5_Init(&local_88)                 FUN_0028ffa0(auStack_88)
MD5_Update(&local_88,data,len)      FUN_0028f940(auStack_88,uVar1,uVar2)
MD5_Final(local_28,&local_88)       FUN_0028fb80(auStack_28,auStack_88)
"%02x"                              &DAT_003a1ed1
"md5(\'%s\'): %s\n"                 &DAT_00363004

Reverse Engineering Large Binaries

When a library is statically linked and the program correctly stripped¹, reverse engineering the whole binary can be challenging.
⇒ We can quickly get lost while trying to analyse the binary.

¹ This step is error-prone.

Reverse Engineering Large Binaries

Dynamic Link:
• Smaller binary size
• Requires that the user has the library with the correct version
• Easier to reverse

Static Link (and stripped):
• Larger binary size
• The library is embedded in the binary
• Can be challenging to reverse

Reverse Engineering Large Binaries

To find our way in a large stripped binary, we can rely on:

• Strings, logs
• Constants
• Functions' relative positions

1. Strings

Strings

Strings

List of strings in one of the libraries used by the Android Pegasus spyware.

assets/injectso_arm
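Such a list can be produced with the `strings` utility. The C sketch below shows the underlying idea: report every run of at least `min_len` printable ASCII bytes (0x20 to 0x7e). It is a simplification (GNU strings, for instance, can also search other encodings), and `find_strings` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

static int is_printable(unsigned char c) {
    return c >= 0x20 && c < 0x7f;
}

/* Prints every run of at least min_len (>= 1) printable bytes found in
   buf and returns the number of runs. */
static size_t find_strings(const unsigned char *buf, size_t size, size_t min_len) {
    size_t count = 0;
    size_t start = 0;
    for (size_t i = 0; i <= size; ++i) {
        if (i < size && is_printable(buf[i]))
            continue;
        /* buf[start, i) is a maximal printable run (possibly empty). */
        if (i - start >= min_len) {
            printf("%.*s\n", (int)(i - start), (const char *)(buf + start));
            ++count;
        }
        start = i + 1;
    }
    return count;
}
```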
Strings

Strings

Don't spend time reversing open-source code!

2. Constants

Constants

void FUN_0013d0f0(undefined8 param_1) {
  undefined8 uVar1;
  undefined8 uVar2;
  ulong uStack_b0;
  undefined auStack_a8 [32];
  undefined auStack_88 [96];
  undefined auStack_28 [24];
  undefined8 uStack_10;

  uStack_10 = param_1;
  FUN_0028ffa0(auStack_88);   // MD5 constants
  uVar1 = uStack_10;
  uVar2 = strlen(uStack_10);
  FUN_0028f940(auStack_88,uVar1,uVar2);
  FUN_0028fb80(auStack_28,auStack_88);
  for (uStack_b0 = 0; uStack_b0 < 0x10; uStack_b0 = uStack_b0 + 1) {
    sprintf(auStack_a8 + uStack_b0 * 2,&DAT_003a1ed1,auStack_28[uStack_b0]);
  }
  printf(&DAT_00363004,uStack_10,auStack_a8);
  return;
}

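Constants such as the MD5 initial state (INIT_DATA_A to INIT_DATA_D in OpenSSL's md5_dgst.c) end up as immediate operands in the code of the init function, so searching the raw bytes for them is a quick way to locate it. The sketch below (hypothetical function names) looks for each 32-bit constant in little-endian byte order; it would also catch constants the compiler merged into a single 64-bit immediate, since their bytes stay contiguous.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* MD5 initial state, as defined by INIT_DATA_A..D. */
static const uint32_t MD5_INIT_CONSTANTS[4] = {
    0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u,
};

/* Returns the offset of the first little-endian occurrence of value, or -1. */
static long find_u32_le(const unsigned char *buf, size_t size, uint32_t value) {
    for (size_t i = 0; i + 4 <= size; ++i) {
        uint32_t v = (uint32_t)buf[i] | (uint32_t)buf[i + 1] << 8 |
                     (uint32_t)buf[i + 2] << 16 | (uint32_t)buf[i + 3] << 24;
        if (v == value)
            return (long)i;
    }
    return -1;
}

/* Reports which MD5 constants appear in buf and returns how many were found. */
static int scan_md5_constants(const unsigned char *buf, size_t size) {
    int found = 0;
    for (size_t k = 0; k < 4; ++k) {
        long off = find_u32_le(buf, size, MD5_INIT_CONSTANTS[k]);
        if (off >= 0) {
            printf("MD5 constant 0x%08x at offset 0x%lx\n",
                   (unsigned)MD5_INIT_CONSTANTS[k], (unsigned long)off);
            ++found;
        }
    }
    return found;
}
```

In practice, the buffer would hold the content of the binary (or of its .text section) read from disk.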
Constants

3. Relative Positioning

Constants

Relative Positioning

[Figure: function list of the stripped binary; two groups of neighboring functions are marked "Likely MD5-related functions"]

Relative Positioning

#include <stdio.h>
#include "md5_local.h"
#include <openssl/opensslv.h>

/*
 * Implemented from RFC1321 The MD5 Message-Digest Algorithm
 */

#define INIT_DATA_A (unsigned long)0x67452301L
#define INIT_DATA_B (unsigned long)0xefcdab89L
#define INIT_DATA_C (unsigned long)0x98badcfeL
#define INIT_DATA_D (unsigned long)0x10325476L

int MD5_Init(MD5_CTX *c)
{
    memset(c, 0, sizeof(*c));
    c->A = INIT_DATA_A;
    c->B = INIT_DATA_B;
    c->C = INIT_DATA_C;
    c->D = INIT_DATA_D;
    return 1;
}

OpenSSL_1_1_1s/crypto/md5/md5_dgst.c
Other functions of this compilation unit: MD5_Update(...), MD5_Transform(...), MD5_Final(...)

Relative Positioning

$ readelf -SW ./md5_dgst.o
Section Headers:
  [Nr] Name                Type      Off    Size   ES Flg Lk Inf Al
  [ 3] .text.MD5_Update    PROGBITS  000040 00020d 00  AX  0   0 16
  [ 5] .text.MD5_Transform PROGBITS  000250 000028 00  AX  0   0 16
  [ 7] .text.MD5_Final     PROGBITS  000280 000417 00  AX  0   0 16
  [ 9] .text.MD5_Init      PROGBITS  0006a0 000052 00  AX  0   0 16

Relative Positioning

Functions identified in the binary, in the same relative order as the sections of md5_dgst.o:

MD5_Update
MD5_Transform
MD5_Final

Relative Positioning

As a rule of thumb:
functions that are close to each other in the source file (or compilation unit) are likely to be close in the compiled binary.

In particular, the relative order of the functions in the source code is generally preserved in the final binary¹.

¹ This can be mitigated: https://github.com/open-obfuscator/o-mvll/blob/main/src/core/utils.cpp#L231-L257

Hands-on #3: Embedded Library

Level: Hard
Objectives: Reverse engineering a large binary with cryptography functions.

$ ./crackme_hard.elf "azertyw"
Try again!
$ ./crackme_hard.elf "***************"
Well done!

• ELF x86-64
• Involves known cryptography algorithms
• Stripped

Guidelines

• Statically linked against an open-source cryptography library
• The API for cryptographic functions follows this sequence:
  1. init()
  2. update()
  3. finalize()

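The init/update/finalize sequence can be illustrated with a toy streaming hash. This sketch is hypothetical and uses 64-bit FNV-1a only because it fits in a few lines: it is not a cryptographic hash. In a stripped binary, the same pattern typically shows up as three calls sharing the same context pointer, like FUN_0028ffa0, FUN_0028f940 and FUN_0028fb80 in the MD5 example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hashing context carried across the three calls. */
typedef struct {
    uint64_t state;
} toy_hash_ctx;

/* init(): set the initial state (FNV-1a 64-bit offset basis). */
static void toy_hash_init(toy_hash_ctx *ctx) {
    ctx->state = 0xcbf29ce484222325ULL;
}

/* update(): absorb len bytes; can be called any number of times. */
static void toy_hash_update(toy_hash_ctx *ctx, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; ++i) {
        ctx->state ^= p[i];
        ctx->state *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
}

/* finalize(): produce the digest from the accumulated state. */
static uint64_t toy_hash_final(const toy_hash_ctx *ctx) {
    return ctx->state;
}
```

Feeding the data through several update() calls yields the same digest as a single call, which is what makes this API shape convenient for streaming input.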
Closing Remarks
Closing Remarks

Highly recommended reading:

1. https://margin.re/2021/11/an-opinionated-guide-on-how-to-reverse-engineer-software-part-1/
2. https://margin.re/2021/11/an-opinionated-guide-on-how-to-reverse-engineer-software-part-2/

Closing Remarks

Closing Remarks

https://www.root-me.org

Thank You!

https://www.romainthomas.fr me@romainthomas.fr
