CS 320/CSE 302/ECE 392

Introduction to Parallel Programming for Scientists and Engineers

MP6 – Due Tuesday, May 4

 

The objective of this MP is to parallelize, using MPI, the Gauss-Jordan Elimination method with implicit partial pivoting. This method is used to solve sets of linear equations.

Consider the matrix equation:

[A].[x1 || x2 || … || xm || Y] = [b1 || b2 || … || bm || U]

where

A, Y are square matrices of size n*n.

xi, bi are column vectors of size n*1.

U is the identity matrix of size n*n.

Operator || denotes column augmentation, i.e. it places the columns of the two matrices it joins side by side, resulting in one wider matrix with the same number of rows.
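As an illustration of column augmentation, the following is a minimal Python sketch that builds the R.H.S. matrix [b1 || b2 || U] for n = 2 and m = 2. The helper names augment and identity, and the list-of-rows storage, are assumptions made for this example and are not prescribed by the assignment.

```python
def augment(*blocks):
    """Join matrices (stored as lists of rows) side by side: the || operator."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all blocks must have the same number of rows")
    # Concatenate row r of every block into one wider row.
    return [sum((b[r] for b in blocks), []) for r in range(n)]


def identity(n):
    """The n x n identity matrix U."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


# Two hypothetical right-hand-side column vectors of size 2 x 1.
b1 = [[5.0], [6.0]]
b2 = [[7.0], [8.0]]

# R.H.S. of the matrix equation: 2 rows, m + n = 4 columns.
rhs = augment(b1, b2, identity(2))
```

Each row of rhs holds the entries of b1 and b2 for that row followed by the matching row of U, so every row operation applied to A can be applied to one row of rhs in a single pass.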

Given the values for matrix A and the column vectors bi, the matrix solution of the above equation by Gauss-Jordan elimination simultaneously solves the linear sets of equations:

A.x1 = b1, A.x2 = b2, …, A.xm = bm

and A.Y = U

The last equation implies that Y = A^(-1), i.e. Y is the inverse of A.

The Gauss-Jordan Elimination method works as follows:

  1. Scale each row so that the element of A with the largest absolute value in that row is normalized to unity. The same scaling factor must be applied to the corresponding row of the matrix on the R.H.S. of the original matrix equation. In the process, check that the matrix is non-singular (a row of A whose elements are all zero means A is singular).
  2. For j = 1 to n /* i.e. for each column */
     1. Find the element of largest absolute value among A(i, j), i = j to n. Suppose this is some A(k, j).
     2. If j and k are not the same, exchange rows j and k in both the matrix A and the augmented matrix on the R.H.S. of the original matrix equation.
     3. Scale row j by a factor 1/A(j, j) in both the matrix A and the augmented matrix on the R.H.S. of the original matrix equation.
     4. Apply the following row transformation to all rows other than row j, in both the matrix A and the augmented matrix on the R.H.S. of the original matrix equation:

        A(row, i) = A(row, i) – A(row, j)*A(j, i),  i = 1, …, n

        Use the value that A(row, j) held before the update for every i, because the i = j term sets A(row, j) to zero. The R.H.S. rows are updated with the same factor A(row, j) across all m + n of their columns.

At the end of the computation outlined above, the matrix A will have been reduced to the identity matrix. Each bi will now hold the value of the corresponding xi, and what was U will now hold the value of Y.
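The steps above can be sketched serially as follows. This is a minimal Python illustration for checking results on small inputs, not the parallel MPI program the assignment asks for. The function name gauss_jordan, the list-of-rows storage, 0-based indexing, and the tolerance eps used for the singularity check are all assumptions of this sketch.

```python
def gauss_jordan(a, rhs, eps=1e-12):
    """Reduce a to the identity, applying every row operation to rhs as well.

    a   : n x n matrix A as a list of rows (overwritten)
    rhs : n x (m + n) matrix [b1 || ... || bm || U] as a list of rows (overwritten)
    eps : assumed tolerance below which a value is treated as zero
    """
    n = len(a)
    # Step 1: scale each row so its largest |element| in A becomes 1.
    for r in range(n):
        big = max(abs(v) for v in a[r])
        if big < eps:
            raise ValueError("matrix is singular (zero row)")
        a[r] = [v / big for v in a[r]]
        rhs[r] = [v / big for v in rhs[r]]
    # Step 2: process one column at a time.
    for j in range(n):
        # 2.1: pivot search over rows j..n-1, by absolute value.
        k = max(range(j, n), key=lambda i: abs(a[i][j]))
        if abs(a[k][j]) < eps:
            raise ValueError("matrix is singular (no usable pivot)")
        # 2.2: row exchange in both matrices.
        if k != j:
            a[j], a[k] = a[k], a[j]
            rhs[j], rhs[k] = rhs[k], rhs[j]
        # 2.3: scale the pivot row so that a[j][j] becomes 1.
        pivot = a[j][j]
        a[j] = [v / pivot for v in a[j]]
        rhs[j] = [v / pivot for v in rhs[j]]
        # 2.4: eliminate column j from every other row.
        for r in range(n):
            if r == j:
                continue
            factor = a[r][j]  # saved before the update overwrites it
            a[r] = [v - factor * p for v, p in zip(a[r], a[j])]
            rhs[r] = [v - factor * p for v, p in zip(rhs[r], rhs[j])]
    return rhs


# Hypothetical 2 x 2 example with one right-hand side (m = 1).
a = [[2.0, 1.0], [1.0, 3.0]]
rhs = [[3.0, 1.0, 0.0], [5.0, 0.0, 1.0]]  # [b1 || U]
solution = gauss_jordan(a, rhs)
```

For this input the first column of solution is approximately (0.8, 1.4), which satisfies 2x + y = 3 and x + 3y = 5, and the remaining two columns approximate the inverse of A. A serial version of this kind can serve as a reference when validating the output of the parallel program on small matrices.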

You must figure out a good way to implement the above algorithm in parallel, and you MUST include an explanation of your approach with your submission (make this a separate README file). Please describe how your data is distributed across the processes, the flow of control in your implementation, and what communication and synchronization are used to ensure that the computation proceeds correctly. Let the values for m, the matrix A, and the vectors bi be read in by process 0 initially.
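To make the README questions concrete, here is a small Python simulation of one possible data distribution. It is only an illustration of the kind of decision to document, not the required or recommended design. It assumes a cyclic assignment of rows to processes and 0-based indices, and all function names are hypothetical. In an actual MPI program the final step could be carried out with a reduction such as MPI_Allreduce using MPI_MAXLOC, followed by a broadcast of the pivot row with MPI_Bcast.

```python
def owned_rows(n, nprocs, rank):
    """Rows assigned to `rank` under a cyclic (round-robin) distribution."""
    return list(range(rank, n, nprocs))


def local_pivot_candidate(column_j, rows, j):
    """Best (|value|, row) among this rank's rows with index >= j.

    Returns (-1.0, -1) when the rank owns no eligible row.
    """
    candidates = [(abs(column_j[r]), r) for r in rows if r >= j]
    return max(candidates, default=(-1.0, -1))


def global_pivot(column_j, n, nprocs, j):
    """Simulate the cross-process reduction that selects the pivot row."""
    per_rank = [
        local_pivot_candidate(column_j, owned_rows(n, nprocs, rank), j)
        for rank in range(nprocs)
    ]
    # The largest magnitude wins; the row index travels with it.
    return max(per_rank)[1]


# Hypothetical column j = 1 of a 4 x 4 matrix, shared by 2 processes.
column = [0.2, -0.9, 0.5, 0.1]
pivot_row = global_pivot(column, n=4, nprocs=2, j=1)
```

Here process 0 owns rows 0 and 2 and process 1 owns rows 1 and 3. Each process inspects only its own rows, and the reduction picks row 1, whose entry has the largest absolute value among rows 1 to 3.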

Also, please restrict the size of matrices while testing your program to avoid overloading the machine.