Report
Ax = b (1)
In the above equation, x and b are the vectorized representations of X and B respectively.
From Code Listing 1 we can see that the size of A is 62500 × 62500. This follows from the size of B, which is 250 × 250: vectorizing B gives a vector of length 250 · 250 = 62500, so A must be 62500 × 62500.
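As a quick dimension check (a Python/NumPy sketch, separate from the MATLAB code in the listings): vectorizing a 250 × 250 image yields a vector with 62500 entries, so a linear operator acting on it must be 62500 × 62500.

```python
import numpy as np

# Hypothetical 250 x 250 image standing in for the blurred image B
B = np.zeros((250, 250))

# Column-major vectorization, matching MATLAB's B(:)
b = B.reshape(-1, order="F")

# The blur matrix A must map this vector to one of the same length,
# so A has size 62500 x 62500
print(b.shape[0])   # 62500
```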
AᵀAx = Aᵀb (2)
Ãx = b̃ (3)
In the above equations, pre-multiplying with Aᵀ ensures that the resulting matrix Ã = AᵀA is symmetric and positive-definite (provided A has full column rank). Hence even if A is not symmetric, Ã is, since Ãᵀ = (AᵀA)ᵀ = AᵀA = Ã.
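This property is easy to verify numerically; the following Python/NumPy sketch (with a random, hypothetical A in place of the blur operator) checks both symmetry and positive-definiteness of AᵀA:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))   # generic, non-symmetric A

At_A = A.T @ A                      # plays the role of A tilde

# Symmetry: (A^T A)^T == A^T A
print(np.allclose(At_A, At_A.T))    # True

# Positive-definiteness: all eigenvalues are positive
# (holds whenever A has full column rank, as a random A does here)
eigvals = np.linalg.eigvalsh(At_A)
print(eigvals.min() > 0)            # True
```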
2.2. Explain why solving Ax = b for x is equivalent to minimizing ½xᵀAx − bᵀx over x, assuming that A is symmetric positive-definite.
We want to show that minimizing ½xᵀAx − bᵀx over x is equivalent to solving Ax = b. We can do this by taking the derivative of ½xᵀAx − bᵀx with respect to x and setting it to zero. Hence we get the following chain of equations:
d/dx (½xᵀAx − bᵀx) = 0 (4)
½(xᵀA + xᵀAᵀ) − bᵀ = 0 (5)
xᵀA − bᵀ = 0 (6)
xᵀA = bᵀ (7)
xᵀ = bᵀA⁻¹ (8)
x = (bᵀA⁻¹)ᵀ (9)
x = (A⁻¹)ᵀb (10)
x = A⁻¹b (11)
Ax = b (12)
Therefore we can see that at the end we recover the equation Ax = b, which is what we wanted to show.
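The equivalence can also be checked numerically; the following Python/NumPy sketch (with a small, hypothetical symmetric positive-definite A) verifies that the gradient Ax − b vanishes exactly at the solution of Ax = b, and that every perturbed point has a larger function value:

```python
import numpy as np

# Small, hypothetical symmetric positive-definite system
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)      # the solution of Ax = b

# The gradient Ax - b vanishes at x_star ...
print(np.allclose(A @ x_star - b, 0))    # True

# ... and f is strictly larger at randomly perturbed points,
# so x_star is the minimizer
rng = np.random.default_rng(1)
perturbed = [f(x_star + rng.standard_normal(2)) for _ in range(100)]
print(all(v > f(x_star) for v in perturbed))   # True
```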
The function first initializes the solution x with the initial guess x0 and computes the initial residual r as b − A*x0. The direction d is also initialized as r. The function then enters a for loop which iterates up to the maximum number of iterations. In each iteration it performs the steps given in the algorithm.
for i = 1:maxIter
    s = A * d;
    alpha = rho_old / dot(d, s);
    x = x + alpha * d;
    r = r - alpha * s;
    rho_new = dot(r, r);

    beta = rho_new / rho_old;
    d = r + beta * d;
    rho_old = rho_new;

    rvec = [rvec, rho_new];

    if sqrt(rho_new) < tol
        disp('Converged');
        break;
    end
end
end
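For reference, the same loop can be sketched in Python/NumPy (a hypothetical analogue of myCG, not part of the submitted MATLAB code) and checked against a direct solve on a small symmetric positive-definite system:

```python
import numpy as np

def my_cg(A, b, x0, max_iter=200, tol=1e-10):
    """Plain conjugate gradient, mirroring the MATLAB loop above."""
    x = x0.copy()
    r = b - A @ x            # initial residual
    d = r.copy()             # initial search direction
    rho_old = r @ r
    for _ in range(max_iter):
        s = A @ d
        alpha = rho_old / (d @ s)
        x = x + alpha * d
        r = r - alpha * s
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:
            break
        d = r + (rho_new / rho_old) * d
        rho_old = rho_new
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # small SPD test matrix
b = np.array([1.0, 2.0])
x = my_cg(A, b, np.zeros(2))
print(np.allclose(x, np.linalg.solve(A, b)))   # True
```

On an n × n SPD system, exact CG reaches the solution in at most n iterations; here two iterations suffice.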
In this question we are asked to validate our implementation of the Conjugate Gradient algorithm. We are given the matrix A and the vector b, and we use the function implemented in the previous question to solve the system Ax = b. Once we obtain the solution x, we plot the convergence of the residuals against the iterations; the plot is shown in Figure 1. The maximum number of iterations is set to 200 and the tolerance to 10⁻⁴. From Figure 1 we can see that the residuals converge just before the 180th iteration. We can also see that the residual decreases overall, but has multiple spikes in between.
3.3. Plot the eigenvalues of A_test.mat and comment on the condition number and convergence rate.
In this question we are asked to plot the eigenvalues of A and comment on the condition number and the convergence rate. For a symmetric positive-definite matrix, the condition number is defined as the ratio of the largest eigenvalue to the smallest eigenvalue. The condition number κ(A) measures the sensitivity of the solution x to changes in the right-hand side b, and is computed as follows:

κ(A) = λmax / λmin (13)

If small changes in b cause large changes in x, the system is called ill-conditioned and its condition number is large. On the other hand, if small changes in b cause only small changes in x, the system is called well-conditioned and its condition number is small.
Hence, in order to compute the condition number of A we need to compute its eigenvalues. The eigenvalues of A are computed using the eig function in MATLAB, sorted in ascending order, and plotted; the plot is shown in Figure 2. As can be seen in the plot, the gap between the largest and the smallest eigenvalue is very large, so the condition number of A is very large. This means that the system is ill-conditioned and small changes in b will cause large changes in x. This can also be verified by computing the condition number directly with the built-in cond function in MATLAB: the condition number of A is approximately 1.67 × 10⁶, which is very large. This is consistent with our previous observation that the system is ill-conditioned.
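The same eigenvalue-based computation can be sketched in Python/NumPy (with a small, hypothetical SPD matrix standing in for A); for a symmetric positive-definite matrix the ratio λmax/λmin agrees with the 2-norm condition number returned by a general-purpose cond routine:

```python
import numpy as np

# Hypothetical SPD matrix standing in for the loaded A
rng = np.random.default_rng(0)
Q = rng.standard_normal((100, 100))
A = Q.T @ Q + 1e-3 * np.eye(100)    # symmetric positive-definite

eigvals = np.sort(np.linalg.eigvalsh(A))   # ascending, as in the report

kappa_eig = eigvals[-1] / eigvals[0]       # lambda_max / lambda_min
kappa_2 = np.linalg.cond(A)                # 2-norm condition number

# For an SPD matrix the two definitions coincide
print(np.isclose(kappa_eig, kappa_2, rtol=1e-5))   # True
```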
Figure 2: Eigenvalues of A
Since the system is ill-conditioned, small changes in b cause large changes in x. Hence the residual has multiple spikes in between, but decreases overall.
We first load the matrices A and B, where A is the blurring matrix and B is the blurred image. We then prepare for the Preconditioned Conjugate Gradient (PCG) method by forming the augmented (normal-equations) system augA = AᵀA and augB = Aᵀb. Before factorizing, we apply a diagonal shift of α = 0.01: using the MATLAB function speye we create a sparse identity matrix, multiply it by α, and add it to augA. We compute the preconditioner from the MATLAB function ichol with the nofill option (as asked). The PCG method is then applied to solve the system augA · x = augB, and the solution x is reshaped to the original image size and displayed.

As for the implementation with our myCG function, we call the function with the parameters A, B, x0, maxIter and tol, which solves the system Ax = b. For both implementations the result is reshaped to the original image size and displayed, and the parameters are set to maxIter = 200 and tol = 1e-6.
In Code Listing 3 we can see the implementation of the deblurring problem. The code follows the explanation given above.
img = B; % Blurred image
n = size(img, 1);
b = B(:); % Vectorized blurred image
guess = ones(size(A, 1), 1);
maxiter = 200; % Maximum number of iterations
tol = 1e-6; % Tolerance

% Display image
imagesc(reshape(img, [n, n]));
colormap('gray');
axis off;
saveas(gcf, '../Template/graphs/blurred.png');

% Solve the system using 'pcg'
augA = A' * A; % Augmented matrix of A
augA_shifted = augA + 0.01 * speye(size(augA)); % Diagonally shifted matrix

L = ichol(augA_shifted, struct('type', 'nofill')); % Incomplete Cholesky factorization
M1 = L; % Preconditioner factors: M = M1 * M2 = L * L'
M2 = L';
augB = A' * b; % Augmented vector of b

[x_pcg, flag, relres, iter, resvec_pcg] = pcg(augA_shifted, augB, tol, maxiter, M1, M2); % Solve the shifted system

% Draw deblurred image obtained with 'pcg'
imagesc(reshape(x_pcg, [n, n]));
colormap('gray');
axis off;
saveas(gcf, '../Template/graphs/deblurred_pcg.png');

% Draw residuals vs iterations
semilogy(resvec_pcg);
xlabel('Iterations');
ylabel('Residual value');
legend('Residuals');
title('Residuals vs Iterations for PCG');
saveas(gcf, '../Template/graphs/residuals_pcg.png');
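The same preconditioned normal-equations solve can be sketched in Python/SciPy (hypothetical small blur-like operator; spilu plays a role analogous to MATLAB's ichol):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
# Hypothetical sparse blur-like operator standing in for A
A = sp.diags([0.25, 0.5, 0.25], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Normal equations with a small diagonal shift, as in the listing
aug_A = (A.T @ A + 0.01 * sp.identity(n, format="csc")).tocsc()
aug_b = A.T @ b

# Incomplete LU factorization as the preconditioner
# (an analogue of the incomplete Cholesky factorization above)
ilu = spla.spilu(aug_A)
M = spla.LinearOperator((n, n), ilu.solve)

x, info = spla.cg(aug_A, aug_b, maxiter=200, M=M)
print(info == 0)   # 0 means the solver converged
```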
Figure 3: Blurred Image
Figure 4: (a) Deblurred Image using myCG (b) Deblurred Image using PCG
Figure 5: (a) Residuals vs Iterations for PCG (b) Residuals vs Iterations for myCG
As can be seen in Figure 4, the blurred image is deblurred by both methods. However, in our implementation the deblurring using pcg is better than the deblurring using myCG. This can be seen by comparing the residual plots for the two methods in Figure 5, where the residual for pcg is much lower than the residual for myCG. The main reason for this is the preconditioner: it reduces the condition number of the system, making it better conditioned, and hence the solution is more accurate. It should also be noted that the pcg method is more computationally expensive than the myCG method. Therefore, if we want a more accurate solution and are willing to pay the computational cost, we should use pcg; if a less accurate solution suffices and we are not willing to pay that cost, myCG is enough.
4.2. When would pcg be worth the added computational cost? What about if you are deblurring lots of images with the same blur operator?
There are several scenarios in which pcg is worth the added computational cost. One is when the system is ill-conditioned: in this case the solution obtained by pcg will be more accurate than the one obtained by myCG. Another is when the system is large: the preconditioner reduces the condition number of the system, so the preconditioned iteration converges in fewer steps and pcg ends up faster than myCG. In both of these cases pcg is worth the added computational cost.

If we are deblurring lots of images with the same blur operator, then pcg is certainly worth the added computational cost. The preconditioner depends only on the blur operator, so it is computed once and then reused for all the images, and its cost is amortized across the whole batch.
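This reuse pattern can be sketched in Python/SciPy (hypothetical operator and images): the factorization cost is paid once and amortized over every right-hand side.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
# Hypothetical sparse blur-like operator shared by all images
A = sp.diags([0.25, 0.5, 0.25], [-1, 0, 1], shape=(n, n), format="csc")
aug_A = (A.T @ A + 0.01 * sp.identity(n, format="csc")).tocsc()

# Pay the factorization cost once ...
ilu = spla.spilu(aug_A)
M = spla.LinearOperator((n, n), ilu.solve)

# ... then reuse it for every blurred image (random stand-ins here)
rng = np.random.default_rng(0)
residuals = []
for _ in range(10):
    b = rng.standard_normal(n)
    rhs = A.T @ b
    x, info = spla.cg(aug_A, rhs, maxiter=200, M=M)
    residuals.append(np.linalg.norm(aug_A @ x - rhs) / np.linalg.norm(rhs))

print(max(residuals) < 1e-3)   # every solve converged to a small residual
```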