Introduction to Parallel Programming for Scientists and Engineers
The objective of this exercise is to transform one of the most important DO loops (in terms of its execution time) of a conventional Fortran program into parallel form. The program is to be transformed into OpenMP form and executed on a multiprocessor.
The target machine is raphson.cse.uiuc.edu. Passwords for this machine were e-mailed to you this morning. Please contact the instructor if you don't receive your password today. The OpenMP compiler is called guidef77 and it is located in /usr/local/bin. To invoke this compiler you only need to type its name, guidef77, rather than the full path, because your path variable already contains /usr/local/bin.
The compiler will produce an a.out file that you can execute by typing its name. Manuals for our version of OpenMP (in PostScript and PDF) can be found in /server/apps/guide35/docs.
We are going to use the program trfd.f, which you can find in my home directory (to get it, just type cp ~padua/trfd.f .). This is one of the codes in a collection called the Perfect Benchmarks. Each program in this collection is a simplified version of a code typically executed on the most powerful machines. Trfd is a kernel for quantum mechanics calculations, but you don't need to know any quantum mechanics to do this MP. Trfd and all the other Perfect Benchmarks codes are not in the public domain, so we should use trfd only for the purposes of this homework, and the code cannot be distributed outside this class.
We only need to work on loop DO 100 in subroutine OLDA. For this exercise (as is usually the case in real life) it is better to parallelize the outermost loop that can be parallelized. Although loop 100 itself can be parallelized, you don't have to do that to get full credit for this MP; you can instead parallelize (some of) the inner loops within loop 100. You will, however, receive only partial credit if you don't parallelize the outermost loops that can be parallelized within loop 100.
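As a rough illustration of what parallelizing the outermost loop of a nest looks like, here is a minimal sketch. It is not taken from trfd.f: the program name OUTER, the arrays A and B, and the size N are all hypothetical, and the loop nest was chosen so that its iterations are obviously independent. Note that in fixed-form source the directive sentinel (here !$OMP) must start in column 1.

```fortran
      PROGRAM OUTER
C     Hypothetical example, not taken from trfd.f.
      INTEGER N
      PARAMETER (N = 200)
      DOUBLE PRECISION A(N,N), B(N,N)
      INTEGER I, J
C     Serial initialization of the input array.
      DO 20 J = 1, N
         DO 10 I = 1, N
            B(I,J) = DBLE(I + J)
 10      CONTINUE
 20   CONTINUE
C     Each iteration of J writes only column J of A, so the J loop
C     carries no dependence and can be run in parallel. The inner
C     index I is listed as PRIVATE so each thread has its own copy.
!$OMP PARALLEL DO PRIVATE(I)
      DO 40 J = 1, N
         DO 30 I = 1, N
            A(I,J) = 2.0D0 * B(I,J)
 30      CONTINUE
 40   CONTINUE
!$OMP END PARALLEL DO
      PRINT *, A(1,1), A(N,N)
      END
```

Placing the directive on the J loop rather than the I loop means the threads are started and synchronized once for the whole nest instead of once per outer iteration, which is the usual reason for preferring the outermost parallelizable loop.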
I should, however, encourage you to go beyond the call of duty. If you parallelize loop 100 you will receive up to 50 extra credits for the homework. There will be other extra credit options in future MPs. If you accumulate 150 extra credits, you will not need to do the extra MP required of graduate students to receive 1 unit of credit for the course. Alternatively, if you are not a graduate student or did not register for one unit, you can skip an MP if you accumulate 150 extra credits.
There is no input data for trfd. Furthermore, the program validates itself. It prints in file TIV whether or not the result is valid. So, for your program to be correct it is necessary for it to print VALID. However, your program could still be wrong even if it prints VALID (remember the discussion on races). So, check each parallelized loop carefully.
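To make the warning about races concrete, here is a small sketch that is not taken from trfd.f; the program name RACE, the array X, and the accumulator S are hypothetical. If the REDUCTION clause were removed, all threads would update the shared variable S at the same time. Such a program may still happen to print the right sum on some runs, which is exactly why a correct-looking output does not prove that a parallel loop is correct.

```fortran
      PROGRAM RACE
C     Hypothetical example, not taken from trfd.f.
      INTEGER N
      PARAMETER (N = 1000)
      DOUBLE PRECISION X(N), S
      INTEGER I
      DO 10 I = 1, N
         X(I) = 1.0D0
 10   CONTINUE
      S = 0.0D0
C     S is read and written by every iteration. REDUCTION(+:S)
C     gives each thread its own partial sum and combines the
C     partial sums when the loop ends, which removes the race.
!$OMP PARALLEL DO REDUCTION(+:S)
      DO 20 I = 1, N
         S = S + X(I)
 20   CONTINUE
!$OMP END PARALLEL DO
      PRINT *, S
      END
```

When checking a loop you have parallelized, it helps to go through every variable written inside it and decide whether it should be shared, private, or a reduction variable.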
The program also prints timing information, but you should ignore it. Your parallel program will probably run slower than the serial version in any case, because I believe it is necessary to parallelize loop 100 to get performance improvements.
Leave the OpenMP version of your program in your home directory under the name trfd.f. We will pick it up from there for grading. Please make your file readable by both group and world, and DO NOT modify the file after the due date, because we will take the date of the file as the date when you handed in your MP.
Finally, a word about an important limitation of our version of OpenMP. In OLDA, all vector parameters are declared as having one dimension of size 1. This works well for a conventional compiler because it does not need to know the size of a vector to generate correct code. All the compiler needs to know is that the parameter is a one-dimensional array. However, the OpenMP compiler needs to know the size of an array to generate a private copy. Therefore, if you try to generate a private version of one of the parameters, you will have to change the size in the specification (declaration) statement from 1 to the real size of the vector.
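The following sketch shows the kind of declaration change described above. It is not the code of OLDA: the subroutine WORK, the scratch vector W, the result vector R, and the sizes N and M are hypothetical names chosen for illustration. A serial code could have declared the scratch vector as W(1); here it is declared with its real extent N so that a private copy of the right size can be made for each thread.

```fortran
      PROGRAM PRIV
C     Hypothetical example, not taken from trfd.f.
      INTEGER N, M
      PARAMETER (N = 8, M = 4)
      DOUBLE PRECISION W(N), R(M)
      CALL WORK(W, R, N, M)
      PRINT *, R
      END

      SUBROUTINE WORK(W, R, N, M)
      INTEGER N, M, I, K
C     W is declared with its real size N instead of W(1), so the
C     compiler knows how large each private copy must be.
      DOUBLE PRECISION W(N), R(M)
C     Every iteration of I fills W before reading it, so W can be
C     private to each thread. R(I) is written by one iteration
C     only and can stay shared.
!$OMP PARALLEL DO PRIVATE(W, K)
      DO 20 I = 1, M
         DO 10 K = 1, N
            W(K) = DBLE(I * K)
 10      CONTINUE
         R(I) = W(1) + W(N)
 20   CONTINUE
!$OMP END PARALLEL DO
      RETURN
      END
```

Keep in mind that the contents of a private copy are undefined on entry to the loop and are discarded at the end, so this only works for vectors that each iteration writes before it reads.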