Garbage Collector

Uploaded by

varunprint1

Baby's First Garbage Collector

DECEMBER 08, 2013 C CODE LANGUAGE

When I get stressed out and have too much to do, I have this
paradoxical reaction where I escape from that by coming up
with another thing to do. Usually it’s a tiny self-contained program
that I can write and finish.
The other morning, I was freaking myself out about the book I’m
working on and the stuff I have to do at work and a talk I’m
preparing for Strange Loop, and all of a sudden, I thought, “I
should write a garbage collector.”
Yes, I realize how crazy that paragraph makes me seem. But my
faulty wiring is your free tutorial on a fundamental piece of
programming language implementation! In about a hundred lines
of vanilla C, I managed to whip up a basic mark-and-sweep
collector that actually, you know, collects.
Garbage collection is considered one of the more shark-infested
waters of programming, but in this post, I’ll give you a nice kiddie
pool to paddle around in. (There may still be sharks in it, but at
least it will be shallower.)
Reduce, reuse, recycle
The basic idea behind garbage collection is that the language (for
the most part) appears to have access to infinite memory. The
developer can just keep allocating and allocating and allocating
and, as if by magic, it never fails.
Of course, machines don’t have infinite memory. So the way the
implementation does this is that when it needs to allocate a bit of
memory and it realizes it’s running low, it collects garbage.
“Garbage” in this context means memory it previously allocated
that is no longer being used. For the illusion of infinite memory to
work, the language needs to be very safe about “no longer being
used”. It would be no fun if random objects just started getting
reclaimed while your program was trying to access them.
For an object to be collectible, the language has to ensure there’s
no way for the program to use that object again. If the program
can’t get a reference to the object, then it obviously can’t use it
again. So the definition of “in use” is actually pretty simple:
1. Any object that’s being referenced by a variable that’s still in scope is in
use.
2. Any object that’s referenced by another object that’s in use is in use.
The second rule is the recursive one. If object A is referenced by a
variable, and it has some field that references object B, then B is in
use since you can get to it through A.
The end result is a graph of reachable objects—all of the objects in
the world that you can get to by starting at a variable and
traversing through objects. Any object not in that graph of
reachable objects is dead to the program and its memory is ripe for
a reaping.
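The two rules can be sketched in a few lines of C. This is only an illustration, assuming a simplified object that references at most one other object; `Node`, `reach`, and `demoReachable` are hypothetical names and are not part of the collector built later in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: each node references at most one other node. */
typedef struct Node {
  struct Node* ref;   /* the object this one references, or NULL */
  int inUse;          /* set once the node is known to be in use */
} Node;

/* Rule 2: anything referenced by an in-use object is also in use.
   Stops at nodes already visited, so reference loops terminate. */
void reach(Node* node) {
  while (node != NULL && !node->inUse) {
    node->inUse = 1;
    node = node->ref;
  }
}

/* Builds A -> B plus an unreferenced C, with one variable holding A.
   Returns a bitmask of the nodes in use: bit 0 = A, bit 1 = B, bit 2 = C. */
int demoReachable(void) {
  Node c = { NULL, 0 };
  Node b = { NULL, 0 };
  Node a = { &b, 0 };
  Node* variable = &a;   /* Rule 1: a variable in scope references A. */

  reach(variable);
  assert(a.inUse && b.inUse);
  return a.inUse | (b.inUse << 1) | (c.inUse << 2);
}
```

Here A and B end up in use and C does not, so C is the only garbage.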
Marking and sweeping
There are a bunch of different ways you can implement the process of
finding and reclaiming all of the unused objects, but the simplest
and first algorithm ever invented for it is called “mark-sweep”. It
was invented by John McCarthy, the man who invented Lisp and
beards, so you implementing it now is like communing with one of
the Elder Gods, but hopefully not in some Lovecraftian way that
ends with you having your mind and retinas blasted clean.
It works almost exactly like our definition of reachability:
1. Starting at the roots, traverse the entire object graph. Every time you
reach an object, set a “mark” bit on it to true.
2. Once that’s done, find all of the objects whose mark bits are not set and
delete them.
That’s it. I know, you could have come up with that, right? If you
had, you’d be the author of a paper cited hundreds of times. The
lesson here is that to be famous in CS, you don’t have to come up
with really smart stuff, you just have to come up with dumb
stuff first.
A pair of objects
Before we can get to implementing those two steps, let’s get a
couple of preliminaries out of the way. We won’t be actually
implementing an interpreter for a language—no parser, bytecode,
or any of that foolishness—but we do need some minimal amount of
code to create some garbage to collect.
Let’s play pretend that we’re writing an interpreter for a little
language. It’s dynamically typed, and has two types of objects: ints
and pairs. Here’s an enum to identify an object’s type:
typedef enum {
  OBJ_INT,
  OBJ_PAIR
} ObjectType;
A pair can be a pair of anything, two ints, an int and another pair,
whatever. You can go surprisingly far with just that. Since an object
in the VM can be either of these, the typical way in C to implement
it is with a tagged union.
We’ll define it thusly:
typedef struct sObject {
  ObjectType type;

  union {
    /* OBJ_INT */
    int value;

    /* OBJ_PAIR */
    struct {
      struct sObject* head;
      struct sObject* tail;
    };
  };
} Object;
The main Object struct has a type field that identifies what kind of
value it is: either an int or a pair. Then it has a union to hold the
data for the int or pair. If your C is rusty, a union is a struct where
the fields overlap in memory. Since a given object can only be an
int or a pair, there’s no reason to have memory in a single object
for all three fields at the same time. A union does that. Groovy.
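To see why the tag matters, here is a minimal sketch that reads the union through the `type` field. `sumInts` and `demoSum` are hypothetical helpers, not part of the post's code, and the anonymous struct and union members assume a C11 compiler (or a compiler extension).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OBJ_INT, OBJ_PAIR } ObjectType;

typedef struct sObject {
  ObjectType type;
  union {
    int value;                  /* OBJ_INT */
    struct {                    /* OBJ_PAIR */
      struct sObject* head;
      struct sObject* tail;
    };
  };
} Object;

/* Hypothetical helper: adds up every int reachable from an object.
   The tag decides which union member is safe to read.
   Assumes the structure contains no cycles. */
int sumInts(const Object* object) {
  assert(object != NULL);
  switch (object->type) {
    case OBJ_INT:
      return object->value;
    case OBJ_PAIR:
      return sumInts(object->head) + sumInts(object->tail);
  }
  return 0;
}

/* Builds the pair (1, (2, 3)) on the C stack and sums it. */
int demoSum(void) {
  Object one, two, three, inner, outer;

  one.type = OBJ_INT;   one.value = 1;
  two.type = OBJ_INT;   two.value = 2;
  three.type = OBJ_INT; three.value = 3;

  inner.type = OBJ_PAIR; inner.head = &two; inner.tail = &three;
  outer.type = OBJ_PAIR; outer.head = &one; outer.tail = &inner;

  return sumInts(&outer);
}
```

Reading `value` from a pair, or `head` from an int, would reinterpret the same bytes as the wrong kind of data, which is why every access goes through the tag first.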
A minimal virtual machine
Now we can wrap that in a little virtual machine structure. Its role
in this story is to have a stack that stores the variables that are
currently in scope. Most language VMs are either stack-based (like
the JVM and CLR) or register-based (like Lua). In both cases, there
is actually still a stack. It’s used to store local variables and
temporary variables needed in the middle of an expression.
We’ll model that explicitly and simply like so:
#define STACK_MAX 256

typedef struct {
  Object* stack[STACK_MAX];
  int stackSize;
} VM;
Now that we’ve got our basic data structures in place, let’s slap
together a bit of code to create some stuff. First, let’s write a
function that creates and initializes a VM:
VM* newVM() {
  VM* vm = malloc(sizeof(VM));
  vm->stackSize = 0;
  return vm;
}
Once we’ve got a VM, we need to be able to manipulate its stack:
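/* assert() here takes a condition and a message, so it has to be a
   helper defined elsewhere in the program. The standard <assert.h>
   macro takes a single argument. */
void push(VM* vm, Object* value) {
  assert(vm->stackSize < STACK_MAX, "Stack overflow!");
  vm->stack[vm->stackSize++] = value;
}

Object* pop(VM* vm) {
  assert(vm->stackSize > 0, "Stack underflow!");
  return vm->stack[--vm->stackSize];
}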
OK, now that we can stick stuff in “variables”, we need to be able to
actually create objects. First a little helper function:
Object* newObject(VM* vm, ObjectType type) {
  Object* object = malloc(sizeof(Object));
  object->type = type;
  return object;
}
That does the actual memory allocation and sets the type tag. We’ll
be revisiting this in a bit. Using that, we can write functions to push
each kind of object onto the VM’s stack:
void pushInt(VM* vm, int intValue) {
  Object* object = newObject(vm, OBJ_INT);
  object->value = intValue;
  push(vm, object);
}

Object* pushPair(VM* vm) {
  Object* object = newObject(vm, OBJ_PAIR);
  object->tail = pop(vm);
  object->head = pop(vm);

  push(vm, object);
  return object;
}
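As a quick check of how these pieces fit together, here is a self-contained sketch that builds the pair (1, 2). It repeats the definitions above, but swaps the two-argument assert for the standard `<assert.h>` macro, adds an out-of-memory check, and frees everything by hand since there is no collector yet. `demoBuildPair` is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define STACK_MAX 256

typedef enum { OBJ_INT, OBJ_PAIR } ObjectType;

typedef struct sObject {
  ObjectType type;
  union {
    int value;                                              /* OBJ_INT */
    struct { struct sObject* head; struct sObject* tail; }; /* OBJ_PAIR */
  };
} Object;

typedef struct {
  Object* stack[STACK_MAX];
  int stackSize;
} VM;

VM* newVM(void) {
  VM* vm = malloc(sizeof(VM));
  assert(vm != NULL && "Out of memory");
  vm->stackSize = 0;
  return vm;
}

void push(VM* vm, Object* value) {
  assert(vm->stackSize < STACK_MAX && "Stack overflow!");
  vm->stack[vm->stackSize++] = value;
}

Object* pop(VM* vm) {
  assert(vm->stackSize > 0 && "Stack underflow!");
  return vm->stack[--vm->stackSize];
}

Object* newObject(VM* vm, ObjectType type) {
  (void)vm;   /* not used until the collector is added */
  Object* object = malloc(sizeof(Object));
  assert(object != NULL && "Out of memory");
  object->type = type;
  return object;
}

void pushInt(VM* vm, int intValue) {
  Object* object = newObject(vm, OBJ_INT);
  object->value = intValue;
  push(vm, object);
}

Object* pushPair(VM* vm) {
  Object* object = newObject(vm, OBJ_PAIR);
  object->tail = pop(vm);   /* the most recently pushed value */
  object->head = pop(vm);
  push(vm, object);
  return object;
}

/* Builds the pair (1, 2). Returns head * 10 + tail if exactly one
   pair is left on the stack, or -1 otherwise. */
int demoBuildPair(void) {
  VM* vm = newVM();
  pushInt(vm, 1);
  pushInt(vm, 2);
  Object* pair = pushPair(vm);

  int result = -1;
  if (vm->stackSize == 1 && vm->stack[0] == pair && pair->type == OBJ_PAIR) {
    result = pair->head->value * 10 + pair->tail->value;
  }

  /* No collector yet, so free everything by hand. */
  free(pair->head);
  free(pair->tail);
  free(pair);
  free(vm);
  return result;
}
```

Note that pushPair() pops the tail first: the two ints go on the stack in head, tail order, so they come back off in reverse.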
And that’s it for our little VM. If we had a parser and an interpreter
that called those functions, we’d have an honest to God language
on our hands. And, if we had infinite memory, it would even be able
to run real programs. Since we don’t, let’s start collecting some
garbage.
Marky mark
The first phase is marking. We need to walk all of the reachable
objects and set their mark bit. The first thing we need then is to
add a mark bit to Object:
typedef struct sObject {
  unsigned char marked;
  /* Previous stuff... */
} Object;
When we create a new object, we’ll modify newObject() to
initialize marked to zero. To mark all of the reachable objects, we
start with the variables that are in memory, so that means walking
the stack. That looks like this:
void markAll(VM* vm)
{
  for (int i = 0; i < vm->stackSize; i++) {
    mark(vm->stack[i]);
  }
}
That in turn calls mark. We’ll build that in phases. First:
void mark(Object* object) {
  object->marked = 1;
}
This is the most important bit, literally. We’ve marked the object
itself as reachable, but remember we also need to handle
references in objects: reachability is recursive. If the object is a
pair, its two fields are reachable too. Handling that is simple:
void mark(Object* object) {
  object->marked = 1;

  if (object->type == OBJ_PAIR) {
    mark(object->head);
    mark(object->tail);
  }
}
But there’s a bug here. Do you see it? We’re recursing now, but we
aren’t checking for cycles. If you have a bunch of pairs that point to
each other in a loop, this will overflow the stack and crash.
To handle that, we just need to bail out if we get to an object that
we’ve already processed. So the complete mark() function is:
void mark(Object* object) {
  /* If already marked, we're done. Check this first
     to avoid recursing on cycles in the object graph. */
  if (object->marked) return;

  object->marked = 1;

  if (object->type == OBJ_PAIR) {
    mark(object->head);
    mark(object->tail);
  }
}
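To convince yourself that the early return really does handle cycles, here is a small sketch that builds two pairs pointing at each other and marks from one of them. The objects live on the C stack purely for illustration; `demoCycle` is a hypothetical name, and the NULL assertion is an addition that the post's version does not have.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OBJ_INT, OBJ_PAIR } ObjectType;

typedef struct sObject {
  unsigned char marked;
  ObjectType type;
  union {
    int value;                  /* OBJ_INT */
    struct {                    /* OBJ_PAIR */
      struct sObject* head;
      struct sObject* tail;
    };
  };
} Object;

void mark(Object* object) {
  assert(object != NULL);       /* added for this sketch */

  /* Already visited: stop here so cycles terminate. */
  if (object->marked) return;

  object->marked = 1;

  if (object->type == OBJ_PAIR) {
    mark(object->head);
    mark(object->tail);
  }
}

/* Builds two pairs whose tails point at each other, marks from one of
   them, and returns how many of the three objects ended up marked. */
int demoCycle(void) {
  Object leaf, a, b;

  leaf.marked = 0; leaf.type = OBJ_INT; leaf.value = 7;
  a.marked = 0; a.type = OBJ_PAIR; a.head = &leaf; a.tail = &b;
  b.marked = 0; b.type = OBJ_PAIR; b.head = &leaf; b.tail = &a;

  mark(&a);
  return a.marked + b.marked + leaf.marked;
}
```

Without the `if (object->marked) return;` line, the call chain a, b, a, b, ... would never end.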
Now we can call markAll() and it will correctly mark every
reachable object in memory. We’re halfway done!
Sweepy sweep
The next phase is to sweep through all of the objects we’ve
allocated and free any of them that aren’t marked. But there’s a
problem here: all of the unmarked objects are, by definition,
unreachable! We can’t get to them!
The VM has implemented the language’s semantics for object
references, so we’re only storing pointers to objects in variables
and the pair elements. As soon as an object is no longer pointed to
by one of those, we’ve lost it entirely and actually leaked memory.
The trick to solve this is that the VM can have its own references to
objects that are distinct from the semantics that are visible to the
language user. In other words, we can keep track of them
ourselves.
The simplest way to do this is to just maintain a linked list of every
object we’ve ever allocated. We’ll extend Object itself to be a node
in that list:
typedef struct sObject {
  /* The next object in the list of all objects. */
  struct sObject* next;

  /* Previous stuff... */
} Object;
The VM will keep track of the head of that list:
typedef struct {
  /* The first object in the list of all objects. */
  Object* firstObject;

  /* Previous stuff... */
} VM;
In newVM() we’ll make sure to initialize firstObject to NULL. Whenever
we create an object, we add it to the list:
Object* newObject(VM* vm, ObjectType type) {
  Object* object = malloc(sizeof(Object));
  object->type = type;
  object->marked = 0;

  /* Insert it into the list of allocated objects. */
  object->next = vm->firstObject;
  vm->firstObject = object;

  return object;
}
This way, even if the language can’t find an object, the
language implementation still can. To sweep through and delete
the unmarked objects, we just need to traverse the list:
void sweep(VM* vm)
{
  Object** object = &vm->firstObject;
  while (*object) {
    if (!(*object)->marked) {
      /* This object wasn't reached, so remove it from the list
         and free it. */
      Object* unreached = *object;

      *object = unreached->next;
      free(unreached);
    } else {
      /* This object was reached, so unmark it (for the next GC)
         and move on to the next. */
      (*object)->marked = 0;
      object = &(*object)->next;
    }
  }
}
That code is a bit tricky to read because of that pointer to a pointer,
but if you work through it, you can see it’s pretty straightforward.
It just walks the entire linked list. Whenever it hits an object that
isn’t marked, it frees its memory and removes it from the list. When
this is done, we will have deleted every unreachable object.
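Here is a self-contained sketch that exercises both phases on a tiny heap. It is a trimmed-down variant of the code above rather than the post's exact program: sweep() is changed to return how many objects it freed, the VM lives on the C stack, pushInt() does its own stack push, and the standard `<assert.h>` macro stands in for the two-argument assert. `demoSweep` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

#define STACK_MAX 256

typedef enum { OBJ_INT, OBJ_PAIR } ObjectType;

typedef struct sObject {
  unsigned char marked;
  struct sObject* next;   /* next object in the list of all objects */
  ObjectType type;
  union {
    int value;                                              /* OBJ_INT */
    struct { struct sObject* head; struct sObject* tail; }; /* OBJ_PAIR */
  };
} Object;

typedef struct {
  Object* firstObject;
  Object* stack[STACK_MAX];
  int stackSize;
} VM;

Object* newObject(VM* vm, ObjectType type) {
  Object* object = malloc(sizeof(Object));
  assert(object != NULL && "Out of memory");
  object->type = type;
  object->marked = 0;
  object->next = vm->firstObject;
  vm->firstObject = object;
  return object;
}

void pushInt(VM* vm, int intValue) {
  assert(vm->stackSize < STACK_MAX && "Stack overflow!");
  Object* object = newObject(vm, OBJ_INT);
  object->value = intValue;
  vm->stack[vm->stackSize++] = object;
}

void mark(Object* object) {
  if (object->marked) return;
  object->marked = 1;
  if (object->type == OBJ_PAIR) {
    mark(object->head);
    mark(object->tail);
  }
}

void markAll(VM* vm) {
  for (int i = 0; i < vm->stackSize; i++) mark(vm->stack[i]);
}

/* Same walk as sweep() above, but returns how many objects it freed. */
int sweep(VM* vm) {
  int freed = 0;
  Object** object = &vm->firstObject;
  while (*object) {
    if (!(*object)->marked) {
      Object* unreached = *object;
      *object = unreached->next;
      free(unreached);
      freed++;
    } else {
      (*object)->marked = 0;
      object = &(*object)->next;
    }
  }
  return freed;
}

/* Pushes three ints, drops two of them from the stack, then collects.
   Returns freed * 10 + survivors. */
int demoSweep(void) {
  VM vm;
  vm.firstObject = NULL;
  vm.stackSize = 0;

  pushInt(&vm, 1);
  pushInt(&vm, 2);
  pushInt(&vm, 3);
  vm.stackSize -= 2;   /* "pop" 3 and 2: they are now unreachable */

  markAll(&vm);
  int freed = sweep(&vm);

  int survivors = 0;
  for (Object* o = vm.firstObject; o != NULL; o = o->next) survivors++;

  /* Empty the stack and collect again so the demo itself leaks nothing. */
  vm.stackSize = 0;
  markAll(&vm);
  sweep(&vm);
  return freed * 10 + survivors;
}
```

Two objects are freed and one survives. The second collection, with an empty stack, reclaims the survivor too, which shows that sweep() also resets the mark bits it leaves behind.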
Congratulations! We have a garbage collector! There’s just one
missing piece: actually calling it. First let’s wrap the two phases
together:
void gc(VM* vm) {
  markAll(vm);
  sweep(vm);
}
You couldn’t ask for a more obvious mark-sweep implementation.
The trickiest part is figuring out when to actually call this. What
does “low on memory” even mean, especially on modern computers
with near-infinite virtual memory?
It turns out there’s no precise right or wrong answer here. It really
depends on what you’re using your VM for and what kind of
hardware it runs on. To keep this example simple, we’ll just collect
after a certain number of allocations. That’s actually how some
language implementations work, and it’s easy to implement.
We’ll extend VM to track how many we’ve created:
typedef struct {
  /* The total number of currently allocated objects. */
  int numObjects;

  /* The number of objects required to trigger a GC. */
  int maxObjects;

  /* Previous stuff... */
} VM;
And then initialize them:
VM* newVM() {
  /* Previous stuff... */
  vm->numObjects = 0;
  vm->maxObjects = INITIAL_GC_THRESHOLD;
  return vm;
}
The INITIAL_GC_THRESHOLD will be the number of objects at which you
kick off the first GC. A smaller number is more conservative with
memory; a larger number spends less time on garbage collection.
Adjust to taste.
Whenever we create an object, we increment numObjects and run a
collection if it reaches the max:
Object* newObject(VM* vm, ObjectType type) {
  if (vm->numObjects == vm->maxObjects) gc(vm);

  /* Create object... */

  vm->numObjects++;
  return object;
}
I won’t bother showing it, but we’ll also
tweak sweep() to decrement numObjects every time it frees one.
Finally, we modify gc() to update the max:
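void gc(VM* vm) {
  markAll(vm);
  sweep(vm);

  /* If nothing survived, fall back to the initial threshold. Otherwise
     maxObjects would become 0 and the == check in newObject() would
     never fire again. */
  vm->maxObjects = vm->numObjects == 0 ? INITIAL_GC_THRESHOLD
                                       : vm->numObjects * 2;
}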
After every collection, we update maxObjects based on the number
of live objects left after the collection. The multiplier there lets our
heap grow as the number of living objects increases. Likewise, it
will shrink automatically if a bunch of objects end up being freed.
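To get a feel for how the threshold moves, here is a counters-only simulation of that policy. Nothing is actually allocated: `Heap`, `collect`, `allocate`, and `demoThreshold` are hypothetical names, the value 8 for INITIAL_GC_THRESHOLD is an assumption chosen for the example, and the caller simply tells the simulation how many objects would survive a collection. The sketch also falls back to the initial threshold when nothing survives, so the threshold never drops to zero.

```c
#include <assert.h>

#define INITIAL_GC_THRESHOLD 8   /* assumed value, adjust to taste */

/* Hypothetical stand-in for the VM's two counters. */
typedef struct {
  int numObjects;    /* objects currently allocated */
  int maxObjects;    /* allocation count that triggers the next GC */
  int collections;   /* how many collections have run so far */
} Heap;

/* Simulates a collection after which `live` objects survive. */
void collect(Heap* heap, int live) {
  assert(live >= 0 && live <= heap->numObjects);
  heap->numObjects = live;
  heap->maxObjects = live == 0 ? INITIAL_GC_THRESHOLD : live * 2;
  heap->collections++;
}

/* Simulates one allocation. `live` is how many existing objects the
   program can still reach at this point. */
void allocate(Heap* heap, int live) {
  if (heap->numObjects == heap->maxObjects) collect(heap, live);
  heap->numObjects++;
}

/* Allocates 100 objects while the program keeps at most 10 alive.
   Returns the number of collections that ran. */
int demoThreshold(void) {
  Heap heap = { 0, INITIAL_GC_THRESHOLD, 0 };
  for (int i = 0; i < 100; i++) {
    allocate(&heap, i < 10 ? i : 10);
  }
  return heap.collections;
}
```

With these numbers the threshold grows from 8 to 16 to 20 and then stays at 20, since the live set has settled at 10 objects. From then on a collection runs once every 10 allocations.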
Simple
You made it! If you followed all of this, you’ve now got a handle on
a simple garbage collection algorithm. If you want to see it all
together, here’s the full code. Let me stress here that while this
collector is simple, it isn’t a toy.
There are a ton of optimizations you can build on top of this (and in
things like GC and programming languages, optimization is 90% of
the effort), but the core code here is a legitimate real GC. It’s very
similar to the collectors that were in Ruby and Lua until recently.
You can ship production code that uses something exactly like this.
Now go build something awesome!
