CS553 Homework #3: Benchmarking Storage
Instructions:
● Assigned date: Friday March 13th, 2020
● Due date: 11:59PM on Sunday March 29th, 2020
● Maximum Points: 100%
● This homework can be done in groups of up to 3 students
● Please post your questions to the Piazza forum
● Only a softcopy submission is required; it will automatically be collected through GIT after the
deadline; email confirmation will be sent to your HAWK email address
● Late submission will be penalized at 10% per day; an email to the TA with the subject “CS553:
late homework submission” must be sent
1 Your Assignment
This project aims to teach you how to benchmark storage systems. You can be creative with this project.
You must use either the C or C++ programming language. Libraries such as PThreads will be necessary
to complete the assignment, and the STL may be used if needed. Other programming languages are not
allowed, both because difficulty varies between languages and because they would complicate grading.
Do not write code that relies on complex libraries (e.g. Boost), as they would simplify parts of the
assignment and make grading harder. If you are not sure whether a particular library is allowed, ask
the TAs.
You can use any Linux system for your development, but you must use the Chameleon testbed
[https://www.chameleoncloud.org]; more information about the hardware in this testbed can be found
at https://www.chameleoncloud.org/about/hardware-description/, under Standard Cloud Units. Even
more details can be found at https://www.chameleoncloud.org/user/discovery/, choose “Compute”,
then click the “View” button. You are to use “Compute Haswell” node types; if there are no Haswell
nodes available, please use “Compute Skylake” node types. You are to use the advanced reservation
system to reserve 1 bare-metal instance to conduct your experiments. You will need to assign your
instance a floating IP address so that you can connect to your instance remotely.
In this project, you need to design a benchmarking program that evaluates the storage system. You will
perform strong scaling studies, unless otherwise noted; this means you will fix the total amount of work
(e.g. the number of objects or the amount of data your benchmark evaluates) and reduce the work per
thread as you increase the number of threads. The TAs will compile (with the help of make)
and test your code on Chameleon bare-metal instances (Haswell or Skylake). If your code does not
compile and the TAs cannot run your project, you will get 0 for the assignment.
1. Disk:
a. Implement: the MyDiskBench benchmark; Hint: there are multiple ways to read and write to
disk; explore the different APIs and pick the fastest of all of them. Also make sure
you are measuring the speed of your disk and not your memory (you may need to flush
the disk cache managed by the OS)
b. Dataset: 10GB data split up in 7 different configurations (note these are similar to the
way IOZone deals with multi-threading and multiple concurrent file access):
Other requirements:
● You must write all benchmarks from scratch. Do not use code you find online, as you will get 0
credit for this assignment. If you have taken other courses where you wrote similar benchmarks,
you are welcome to start with your codebase as long as you wrote the code in your prior class.
● All of the benchmarks will have to evaluate concurrency performance; concurrency can be
achieved using threads. Use strong scaling in all experiments, unless it is not possible, in which
case you need to explain why a strong scaling experiment was not done. Be aware of thread
synchronization issues to avoid inconsistency or deadlock in your system.
● All benchmarks can be run on a single machine.
● Not all timing functions have the same accuracy; you must find one with 1 ms accuracy or
better, assuming you are running each benchmark for at least several seconds at a time.
● Since there are many experiments to run, find ways (e.g. scripts) to automate the performance
evaluation. Besides BASH scripts, it is possible to automate your experiments using the “parallel”
tool in Linux.
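As a sketch, a driver script can enumerate every (thread count, repetition) combination; here ./mydiskbench and its -t/-r flags are hypothetical placeholders for whatever interface your benchmark exposes, and the echo makes this a dry run you can pipe into parallel or replace with the real command:

```shell
#!/bin/bash
# Emit one command line per (thread count, repetition) pair.
# "./mydiskbench", "-t", and "-r" are hypothetical placeholders.
gen_runs() {
  for threads in 1 2 4 8; do     # thread counts to benchmark
    for rep in 1 2 3; do         # 3 repetitions each
      echo "./mydiskbench -t $threads -r $rep"
    done
  done
}
gen_runs                         # dry run: prints the 12 command lines
# To actually execute them:  gen_runs | bash   (or: gen_runs | parallel -j1)
```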
● For the best reliability in your results, repeat each experiment 3 times and report the average and
standard deviation. This will help you get more stable results that are easier to understand and
justify.
● Don’t forget to benchmark your disk, and not your memory. You may need to flush caches that
might be stored in memory.
● You may find it more efficient to deal with binary data when reading or writing in this evaluation.
● No GUIs are required; a simple command-line interface is required. Make your benchmark as
similar to IOZone as possible, both in its command-line arguments and in how the program behaves.
Submit code/report through GIT. If you cannot access your repository contact the TAs. You can find a
git cheat sheet here: https://www.git-tower.com/blog/git-cheat-sheet/
Grades for late programs will be lowered 10% per day late.