Lecture 7 - Optimizations - A 2025
= some other user’s time (time spent executing instructions in a different user’s process)
Lesson:
Find the 10% that really count!
Let the compiler worry about the rest
Important:
First make the program work correctly
Make sure it is easy to maintain
Then optimize
Optimizing Compilers
Provide efficient mapping of program to machine
■register allocation – making use of registers
■code selection and ordering – reordering instructions to exploit the pipeline
■eliminating minor inefficiencies, e.g.:
sum = val[i-1][j]
+ val[i+1][j]
+ val[i][j-1]
+ val[i][j+1];
We know it is really (row-major layout, n elements per row):
/* Sum neighbors of i,j */
up = val[(i-1)*n + j];
down = val[(i+1)*n + j];
left = val[i*n + j-1];
right = val[i*n + j+1];
sum = up + down + left + right;
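The expanded form computes three distinct products: (i-1)*n, (i+1)*n and i*n. Below is a minimal sketch of sharing the common subexpression i*n + j so that only one multiplication remains. It assumes `val` is a row-major int array with `n` columns and that (i, j) is an interior point. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Neighbour sum exactly as the indexing expands:
   three distinct multiplications, (i-1)*n, (i+1)*n and i*n. */
int sum_neighbors_naive(const int *val, size_t n, size_t i, size_t j)
{
    assert(i >= 1 && j >= 1);           /* (i, j) must be an interior point */
    int up    = val[(i - 1) * n + j];
    int down  = val[(i + 1) * n + j];
    int left  = val[i * n + j - 1];
    int right = val[i * n + j + 1];
    return up + down + left + right;
}

/* Same sum with the common subexpression i*n + j computed once:
   one multiplication, then only additions and subtractions. */
int sum_neighbors_shared(const int *val, size_t n, size_t i, size_t j)
{
    assert(i >= 1 && j >= 1);
    size_t inj = i * n + j;             /* index of (i, j) itself */
    int up    = val[inj - n];
    int down  = val[inj + n];
    int left  = val[inj - 1];
    int right = val[inj + 1];
    return up + down + left + right;
}
```

For a 3×3 array holding 1..9 and (i, j) = (1, 1), both versions sum 2 + 8 + 4 + 6. An optimizing compiler may perform this sharing on its own. The explicit version just makes the saving visible.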
#include <stdio.h>

int main() {
    char c = 125;
    while (c > 0) {
        printf("%d ", c);
        c++;
    }
    return 0;
}
What is the output of the program?
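Answer: 125 126 127
(After 127, c++ wraps c to -128 on a platform where char is signed, such as x86-64 with GCC, so c > 0 fails and the loop stops.)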
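Below is a minimal sketch that records the values the loop above would print instead of printing them. It uses `signed char` explicitly because the signedness of plain `char` is implementation-defined. The name `collect_printed` and the `max` cap are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Records the values printed by `while (c > 0) { printf(...); c++; }`
   into out[], stopping after at most max values. */
size_t collect_printed(signed char start, int out[], size_t max)
{
    assert(out != NULL);
    size_t count = 0;
    signed char c = start;
    while (c > 0 && count < max) {
        out[count++] = c;
        /* c + 1 is computed in int; storing 128 back into a signed char
           is implementation-defined, and GCC wraps it to -128. */
        c++;
    }
    return count;
}
```

Starting from 125, this records 125, 126 and 127, then stops once c has wrapped to a negative value.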
#include <stdio.h>

int main() {
    char c = 125;
    while (c < (c + 1)) {
        printf("%d ", c);
        c++;
    }
    return 0;
}
What is the output of the program?
Answer: the loop never ends. It prints 125 126 127 -128 -127 … and keeps cycling.
In c < (c + 1), c is promoted to int before the addition. As a result, c + 1 is computed in int and cannot overflow for any char value, so the condition is always true. GCC recognizes this and optimizes the loop condition while (c < (c + 1)) to while (true). The wrap-around happens only when c++ stores 128 back into the 8-bit c (GCC wraps it to -128). The next test, -128 < -127, is true again.
The surprise is therefore not a wrong optimization: the unoptimized code loops forever as well. What it contradicts is the 8-bit intuition, under which 127 + 1 would wrap to -128 and the test 127 < -128 would stop the loop after 127.
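The promotion rule can be checked directly. The sketch below compares the real condition (evaluated in int) with a hypothetical 8-bit version that squeezes c + 1 back into a `signed char` before comparing. All function names are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* The loop condition from the slide. c is promoted to int, so c + 1
   is an int and cannot overflow for any char value. */
int promoted_condition(char c)
{
    return c < (c + 1);
}

/* What the 8-bit intuition assumes: c + 1 is converted back to signed
   char before comparing. For 127 the conversion is implementation-
   defined; GCC wraps it to -128, which makes this comparison false. */
int eight_bit_condition(signed char c)
{
    signed char next = (signed char)(c + 1);
    return c < next;
}

/* Returns 1 if promoted_condition holds for every char value, which is
   why the compiler may replace the loop test with an infinite loop. */
int promoted_condition_always_true(void)
{
    for (int v = CHAR_MIN; v <= CHAR_MAX; v++) {
        if (!promoted_condition((char)v))
            return 0;
    }
    return 1;
}
```

`sizeof((char)0 + 1)` equals `sizeof(int)`, which confirms that the addition happens in int.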
Procedure
■Compute sum of all elements of vector
■Store result at destination location
Move vec_length Call Out of Loop
void combine2(vec_ptr v, int *dest)
{
    int i;
    int length = vec_length(v);
    *dest = 0;
    for (i = 0; i < length; i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest += val;
    }
}
Optimization
■Move call to vec_length out of inner loop
●Value does not change from one iteration to the next
●Code motion
■vec_length requires only constant time, but calling it on every iteration adds significant overhead
Reduction in Strength
void combine3(vec_ptr v, int *dest)
{
    int i;
    int length = vec_length(v);
    int *data = get_vec_start(v);
    *dest = 0;
    for (i = 0; i < length; i++) {
        *dest += data[i];
    }
}
Reduction in Strength
■Shift, add instead of multiply or divide
●compilers are (generally) good at this
●Exact trade-offs machine-dependent
■Keep data in registers rather than memory
●compilers are not good at this, since they must be conservative about aliasing (a pointer they write through might point into the data being read)
▪When the processor encounters a conditional branch, it cannot reliably determine where to continue fetching
[Figure: Branch Taken]
⬛ Performance Cost
▪ Costs multiple clock cycles on a modern processor
▪ Can be a major performance limiter
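One common way to reduce data-dependent branches is to turn a comparison into an arithmetic value. The sketch below counts elements at or above a threshold in two ways: once with an `if` in the loop body, and once by adding the 0/1 result of the comparison. The function names are hypothetical. Whether the compiler actually emits a branch for either version is up to the compiler, and no timing results are claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* Data-dependent branch in the loop body: on unpredictable data the
   processor may mispredict it often. */
size_t count_ge_branch(const int *a, size_t n, int threshold)
{
    assert(a != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] >= threshold)
            count++;
    }
    return count;
}

/* No branch on the data in the source: the comparison yields 0 or 1,
   which is added directly to the count. */
size_t count_ge_branchless(const int *a, size_t n, int threshold)
{
    assert(a != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(a[i] >= threshold);
    return count;
}
```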
Structure Representation
Alignment Principles
⬛ Aligned Data
▪ Primitive data type requires K bytes
▪ Address must be a multiple of K
▪ Required on some machines; advised on x86-64
This requirement varies between operating systems. Alignment is needed to make variable access efficient, i.e. to bring a variable from RAM to the CPU with the minimum number of RAM accesses. Alignment in the code is done by the compiler at compile time.
Specific Cases of Alignment (x86-64)
⬛ 1 byte: char, …
▪ no restrictions on address (a char variable can sit at any address in RAM)
⬛ 2 bytes: short, …
▪ a short variable sits at an address that is divisible by 2
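Below is a small sketch for checking these rules with C11 `alignof` and `offsetof`. `is_aligned` and `struct char_then_short` are hypothetical names. The expected values (alignment 1 for char, 2 for short, and one padding byte after the char) assume x86-64.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p is a multiple of k (k > 0). */
int is_aligned(const void *p, size_t k)
{
    assert(k > 0);
    return ((uintptr_t)p % k) == 0;
}

/* The compiler inserts one padding byte after c so that s starts
   at an offset that is a multiple of 2. */
struct char_then_short {
    char c;
    short s;
};
```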
Example of a different ordering of the fields inside a struct:

struct S1 {
    char c;
    int i[2];
    double v;
} *p;

struct S2 {
    double v;
    int i[2];
    char c;
} *p;

Total size: multiple of K=8
In the new ordering we still lose the same number of unused bytes. We gain, however, in that all the fields sit contiguously in memory with no holes between them. Thanks to this contiguity, our struct becomes cache friendly.
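The layouts can be confirmed with `sizeof` and `offsetof`. On x86-64, S1 places c at offset 0, i at 4 and v at 16. S2 places v at 0, i at 8 and c at 16. Both total 24 bytes, and in each case 7 of those bytes are padding. The helper names below are hypothetical. The `*p` declarations are dropped because only the types are needed.

```c
#include <assert.h>
#include <stddef.h>

struct S1 { char c; int i[2]; double v; };
struct S2 { double v; int i[2]; char c; };

/* Padding = total size minus the sum of the field sizes. */
size_t s1_padding(void)
{
    return sizeof(struct S1) - (sizeof(char) + 2 * sizeof(int) + sizeof(double));
}

size_t s2_padding(void)
{
    return sizeof(struct S2) - (sizeof(double) + 2 * sizeof(int) + sizeof(char));
}
```

In S1 the padding is internal (3 bytes after c, 4 after i). In S2 it is all at the end of the struct.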
Saving Space
⬛ Put large data types first
⬛ Effect (K=4)
Here is another example of different orderings of the same struct. This time, the new ordering saves space! Since a char has no placement constraint, grouping all the char fields together saves memory.

struct S4 {
    char c;
    int i;
    char d;
} *p;

struct S5 {
    int i;
    char c;
    char d;
} *p;
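The saving can be measured the same way. On x86-64, S4 needs 3 padding bytes after c and 3 after d, for a total of 12 bytes. S5 keeps both chars together after i and needs only 2 trailing padding bytes, for a total of 8. The helper `bytes_saved` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Same three fields in two orders; K = 4 (the alignment of int). */
struct S4 { char c; int i; char d; };
struct S5 { int i; char c; char d; };

/* How many bytes the S5 ordering saves per struct. */
size_t bytes_saved(void)
{
    return sizeof(struct S4) - sizeof(struct S5);
}
```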