C/C++ Code Optimization Techniques: Rules To Remember
C/C++ Code Optimization Techniques: Rules To Remember
C/C++ Code Optimization Techniques: Rules To Remember
Rules to Remember
1. Find the bottleneck of your Application / module. 80 - 20 rule also applies over here which means most of the execution goes in a smaller part of the code. You should use some profiling tool to fing the bottle nect of your application before going into the optimization part. 2. C/C++ compilers are queit efficient and they procuce optimized binary for the written code. So micro-optimization like using register variables will not be usesful. Let the compiler find the best way to utilize the variables. 3. Some guidelines for optimizing C/C++ code: a. Get same calculations/comparisons out of the loop.
for (int i=0; i<iMax; i++) { if (Val > CONST_VAL) { ... } else { .... } }
if (Val > CONST_VAL) { for (int i=0; i<iMax; i++) { ..... } } else { for (int i=0; i<iMax; i++)
{ ..... } } b. Avoid type conversions, use same type of variables including signed and unsigned. unsigned long x, sum; for (int i = 0; i<Max_I; i++) { sum = x*i; } Optimized: unsigned long x, sum; for (unsigned long i = 0; i<Max_I; i++) { sum = x*i; }
c. Interger calculations are faster than floating point. d. Keep in mind the processors word length. On 32 bit processor 32-bit calculations and copying will be faster or equal to 16-bit or 8-bit.
f. Multiplication and Division operations are generally costlier than Addition Substraction. Try to convert multiplication/division to addition/substraction.
An Example:
Before These are constants to the loop extentY, extentX, TopLeftX, TopLeftY and are of type unsigned long #define Target(cx,cy) pOffBits[serverResX * (cy) + (cx)] #define Source(cx,cy) pBitsI[extentX * (cy) + (cx)] for (int y = 0; y<extentY; y++) for (int x = 0; x<extentX; x++)
Optimized unsigned long intermC = serverResX * TopLeftY + TopLeftX; unsigned long intermZ = 0; unsigned long intermY = 0; for (unsigned long y = 0; y<extentY; y++) { unsigned long intermA = intermY + intermC; for (unsigned long x = 0; x<extentX; x++) { pOffBits[intermA + x] = pBitsI[intermZ + x]; } intermZ += extentX; intermY += serverResX; } Step By Step 1. Replacing the #define and making the loop variable data type same as that of the others for (unsigned long y = 0; y<extentY; y++) for (unsigned long x = 0; x<extentX; x++) { pOffBits[serverResX * (y + TopLeftY) + (x + TopLeftX)] = pBitsI[extentX * y + x]; } 2. Rearranging the calculation pOffBits[serverResX * y + x + serverResX * TopLeftY + TopLeftX] = pBitsI[extentX * y + x];
Here (serverResX * TopLeftY + TopLeftX) is constant so we can take it out of the loop unsigned long intermC = serverResX * TopLeftY + TopLeftX; for (unsigned long y = 0; y<extentY; y++) for (unsigned long x = 0; x<extentX; x++) { pOffBits[serverResX * y + x + intermC] = pBitsI[extentX * y + x];
3. Now here both serverResX * y and extentX * y are constant in the inner loop so we can take it out of the innder loop. unsigned long intermC = serverResX * TopLeftY + TopLeftX; for (unsigned long y = 0; y<extentY; y++) { unsigned long intermA = serverResX * y; unsigned long intermB = extentX * y; for (unsigned long x = 0; x<extentX; x++) { pOffBits[intermA + x + intermC] = pBitsI[intermB + x]; } }
4. Now you can see the multiplication serverResX * y going in the loop. Where y ranges from 0 to extentY. We can convert it to addition like this: unsigned long intermC = serverResX * TopLeftY + TopLeftX; unsigned long intermY = 0; for (unsigned long y = 0; y<extentY; y++) { unsigned long intermA = intermY; unsigned long intermB = extentX * y; for (unsigned long x = 0; x<extentX; x++) { pOffBits[intermA + x + intermC] = pBitsI[intermB + x]; } intermY += serverResX; }
5. Similerly for extentX, and we gan get rid of intermA and intermB unsigned long intermC = serverResX * TopLeftY + TopLeftX; unsigned long intermZ = 0; unsigned long intermY = 0; for (unsigned long y = 0; y<extentY; y++)
{ for (unsigned long x = 0; x<extentX; x++) { pOffBits[intermY + x + intermC] = pBitsI[intermZ + x]; } intermZ += extentX; intermY += serverResX; }
6. Also since intermY + intermC is constant in the inner loop we can take it out unsigned long intermC = serverResX * TopLeftY + TopLeftX; unsigned long intermZ = 0; unsigned long intermY = 0; for (unsigned long y = 0; y<extentY; y++) { unsigned long intermA = intermY + intermC; for (unsigned long x = 0; x<extentX; x++) { pOffBits[intermA + x] = pBitsI[intermZ + x]; } intermZ += extentX; intermY += serverResX; }
This is the optimized algorithm. Note: All datatypes are same over here
Comments, questions, please mail me. Akhilesh Singh Last Updated 8th January 2005 Crossroad | About Me | Science | Life | Code | Contact | Site Map | Search Copyright 2006 - 2009