Automatic Parallelization of Conventional Fortran Programs

Principal Investigators: David A. Padua, Josep Torrellas, and Rudolf Eigenmann

Polaris Objective

To advance the state of the art in automatic program parallelization. To create and maintain an optimizing compiler that represents this state of the art and that can be used as a solid infrastructure by compiler research and development groups. The Polaris compiler will include all optimization techniques necessary to transform a given sequential program into a form that runs efficiently on the target machine. This includes techniques such as automatic detection of parallelism and distribution of data. The intended target machines are high-performance parallel computers with a global address space as well as traditional shared-memory multiprocessors.

Polaris Bulletin Board

For your questions, bug reports, suggestions, etc.

The Polaris Group


  • George Almasi
  • Calin Cascaval
  • Young Min Kim
  • Jaejin Lee
  • Yuan Lin
  • Jaigak Song
  • Predrag Tosic
  • Peng Wu

  Graduates

  • Bill Blume Ph.D. '95
  • Luiz DeRose Ph.D. '96
  • Keith Faigin M.S. '94
  • Sanjoy Ghosh Ph.D. '92
  • John Grout M.S. '95
  • Luddy Harrison Ph.D. '89
  • Jay Hoeflinger Ph.D. '98
  • John Jozwiak M.S. '92
  • Jee Ku M.S. '95
  • Thomas Lawrence M.S. '96
  • Sam Midkiff Ph.D. '92
  • Yunheung Paek Ph.D. '97
  • Paul Petersen Ph.D. '93
  • Bill Pottenger Ph.D. '97
  • Lawrence Rauchwerger Ph.D. '95
  • David Sehr Ph.D. '92
  • Peng Tu Ph.D. '95
  • Stephen Weatherford M.S. '94

  • Description of the Polaris Compiler

    The Polaris compiler takes a Fortran77 program as input, transforms it so that it runs efficiently on a parallel computer, and outputs the transformed program in one of several parallel Fortran dialects.

    The input language includes several directives that allow the user of Polaris to specify parallelism explicitly in the source program. The output language of Polaris is typically Fortran77 plus parallel directives as well. For example, a generic parallel directive set includes the directives "CSRD$ PARALLEL" and "CSRD$ PRIVATE a,b", specifying that the iterations of the subsequent loop shall be executed concurrently and that the variables a and b shall be declared private to that loop, respectively. Another output form that Polaris can generate is Fortran plus the directive language available on the SGI Challenge machine.
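The generic directive set described above can be sketched as follows; the loop body and variable names are illustrative, not taken from any Polaris output:

```fortran
C Iterations of the following loop run concurrently; A and B
C get a separate (private) copy on each processor.
CSRD$ PARALLEL
CSRD$ PRIVATE A,B
      DO 10 I = 1, N
         A = X(I) * 2.0
         B = A + Y(I)
         Z(I) = B
 10   CONTINUE
```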

    Polaris performs its transformations in several "compilation passes". In addition to many commonly known passes, Polaris includes advanced capabilities for the following tasks: array privatization, data dependence testing, induction variable recognition, interprocedural analysis, and symbolic program analysis. An extensive set of options allows the users and developers of Polaris to experiment with the tool in a flexible way. An overview of the Polaris transformations is given in the publication "Automatic Detection of Parallelism: A Grand Challenge for High-Performance Computing".
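As a sketch of one of these passes, array (here, scalar) privatization removes a spurious dependence on a temporary that is written before it is read in every iteration; the example below is a textbook illustration, not Polaris output:

```fortran
C T is assigned before it is used in each iteration, so it
C carries no value across iterations. Giving each processor a
C private copy of T makes the iterations independent, and the
C loop can execute in parallel.
      DO 10 I = 1, N
         T = B(I) + C(I)
         A(I) = T * T
 10   CONTINUE
```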

    The implementation of Polaris consists of 170,000 lines of C++ code. A basic infrastructure provides a hierarchy of C++ classes that the developers of the individual compilation passes can use for manipulating and analyzing the input program. This infrastructure is described in the Publication "The Polaris Internal Representation". The Polaris Developer's Guide gives a more thorough introduction for compiler writers.

    How to get a copy of Polaris

    Download license

    Download a copy of Polaris

    Download a copy of the Perfect Benchmarks



    We executed a set of benchmark programs (in real-time mode for timing accuracy) on eight processors of an SGI Challenge with 150 MHz R4400 processors at the National Center for Supercomputing Applications (NCSA). We ran each code serially and recorded the execution times. Then we ran the codes in parallel after parallelizing them with the Silicon Graphics Power Fortran Analyzer. Finally, we ran them in parallel after transforming them with Polaris. The speedup of each code under each compiler, relative to the serial run, is plotted on the chart below. The ratio plotted for each compiler is computed as:

    Speedup = Time_serial / Time_parallel

    For example, a code that runs in 120 seconds serially and 20 seconds in parallel achieves a speedup of 6.

    Next, in an effort to determine the importance of six of the parallelization techniques used within Polaris, we compiled each code six more times, turning off one of the six techniques each time, and counted the percentage of loops in each code that were consequently serialized. The results are shown on the chart below; the vertical axis represents the percentage of loops serialized by turning off a given technique. The six techniques were: advanced induction variable analysis, automatic inlining of subroutines, interprocedural value propagation (IPVP), array privatization, a dependence test called the Range Test, and advanced reduction analysis.
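To illustrate the first of these techniques, induction variable recognition replaces a variable that is incremented every iteration with a closed-form function of the loop index; the code below is a standard illustration of the idea, not taken from a benchmark:

```fortran
C Before: K is an induction variable. The update K = K + 2
C creates a dependence between consecutive iterations.
      K = 0
      DO 10 I = 1, N
         K = K + 2
         A(K) = 0.0
 10   CONTINUE

C After substitution: K equals 2*I in iteration I, so the
C iterations no longer depend on one another and the loop
C can be parallelized.
      DO 20 I = 1, N
         A(2*I) = 0.0
 20   CONTINUE
```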

    Program Name  Brief Description                           Benchmark Suite  Lines of Code
    ARC2D         Fluid dynamics (2D inviscid flows)          Perfect           4000
    BDNA          Nucleic acid simulation                     Perfect           4000
    FLO52         Fluid dynamics (2D inviscid flow)           Perfect           2368
    MDG           Liquid water simulation                     Perfect           1200
    OCEAN         Fluid dynamics (2D Boussinesq fluid layer)  Perfect           3285
    TRFD          Quantum mechanics                           Perfect            634
    TURB3D        3D turbulent fluid flow                     Perfect           1400
    HYDRO2D       Galactic jets simulation                    SPEC              4289
    SU2COR        Elementary particle mass computation        SPEC              2333
    SWIM          Shallow water equations                     SPEC               426
    TFFT2         Fourier transforms                          SPEC               642
    TOMCATV       Mesh generator                              SPEC               190
    CLOUD3D       Weather simulation                          NCSA             14438
    CMHOG         Astrophysical simulation                    NCSA             11826

    Ongoing Projects

  • Retargeting Polaris at multiprocessor workstations
  • Compiling for Scalable Shared-memory Multiprocessors
  • Optimizing Parallel Programs

  • Polaris Publications

    CSRD Reports

    Reports by Polaris User Groups

    This page last updated on December 4, 1998 by JPH.