Automatic Parallelization of Conventional Fortran Programs

Principal Investigators: David A. Padua, Josep Torrellas, and Rudolf Eigenmann

Polaris Objective

To advance the state of the art in automatic program parallelization. To create and maintain an optimizing compiler that represents this state of the art and that can be used as a solid infrastructure by compiler research and development groups. The Polaris compiler will include all optimization techniques necessary to transform a given sequential program into a form that runs efficiently on the target machine. This includes techniques such as automatic detection of parallelism and distribution of data. The intended target machines are high-performance parallel computers with a global address space as well as traditional shared-memory multiprocessors.

Polaris Bulletin Board

For your questions, bug reports, suggestions, etc.

The Polaris Group


  • George Almasi
  • Calin Cascaval
  • Young Min Kim
  • Jaejin Lee
  • Yuan Lin
  • Jaigak Song
  • Predrag Tosic
  • Peng Wu

  Graduates

  • Bill Blume Ph.D. '95
  • Luiz DeRose Ph.D. '96
  • Keith Faigin M.S. '94
  • Sanjoy Ghosh Ph.D. '92
  • John Grout M.S. '95
  • Luddy Harrison Ph.D. '89
  • Jay Hoeflinger Ph.D. '98
  • John Jozwiak M.S. '92
  • Jee Ku M.S. '95
  • Thomas Lawrence M.S. '96
  • Sam Midkiff Ph.D. '92
  • Yunheung Paek Ph.D. '97
  • Paul Petersen Ph.D. '93
  • Bill Pottenger Ph.D. '97
  • Lawrence Rauchwerger Ph.D. '95
  • David Sehr Ph.D. '92
  • Peng Tu Ph.D. '95
  • Stephen Weatherford M.S. '94

  • Description of the Polaris Compiler

    The Polaris compiler takes a Fortran77 program as input, transforms it so that it runs efficiently on a parallel computer, and outputs the transformed program in one of several parallel Fortran dialects.

    The input language includes several directives that allow the user of Polaris to specify parallelism explicitly in the source program. The output language of Polaris is typically Fortran77 plus parallel directives as well. For example, a generic parallel directive set includes the directives "CSRD$ PARALLEL" and "CSRD$ PRIVATE a,b", specifying that the iterations of the subsequent loop shall be executed concurrently and that the variables a and b shall be declared private to that loop, respectively. Another output form that Polaris can generate is Fortran plus the directive language available on the SGI Challenge machine.
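The generic directive set described above can be sketched as follows; the loop body and variable names are illustrative, not taken from any Polaris output:

```fortran
C Iterations of the following loop run concurrently; A and B
C get a separate (private) copy on each processor.
CSRD$ PARALLEL
CSRD$ PRIVATE A,B
      DO 10 I = 1, N
         A = X(I) * 2.0
         B = A + Y(I)
         Z(I) = B
 10   CONTINUE
```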

    Polaris performs its transformations in several "compilation passes". In addition to many commonly known passes, Polaris includes advanced capabilities for the following tasks: array privatization, data dependence testing, induction variable recognition, interprocedural analysis, and symbolic program analysis. An extensive set of options allows the users and developers of Polaris to experiment with the tool in a flexible way. An overview of the Polaris transformations is given in the publication "Automatic Detection of Parallelism: A Grand Challenge for High-Performance Computing".
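As a sketch of one of these passes, array (here, scalar) privatization removes a spurious dependence on a temporary that is written before it is read in every iteration; the example below is a textbook illustration, not Polaris output:

```fortran
C T is assigned before it is used in each iteration, so it
C carries no value across iterations. Giving each processor a
C private copy of T makes the iterations independent, and the
C loop can execute in parallel.
      DO 10 I = 1, N
         T = B(I) + C(I)
         A(I) = T * T
 10   CONTINUE
```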

    The implementation of Polaris consists of 170,000 lines of C++ code. A basic infrastructure provides a hierarchy of C++ classes that the developers of the individual compilation passes can use for manipulating and analyzing the input program. This infrastructure is described in the Publication "The Polaris Internal Representation". The Polaris Developer's Guide gives a more thorough introduction for compiler writers.

    How to get a copy of Polaris

    Download license

    Download a copy of Polaris

    Download a copy of the Perfect Benchmarks



    We executed a set of benchmark programs (in real-time mode for timing accuracy) on eight processors of an SGI Challenge with 150 MHz R4400 processors at the National Center for Supercomputing Applications (NCSA). We ran each code serially and recorded the execution times. Then we ran the codes in parallel after parallelizing them with the Silicon Graphics Power Fortran Analyzer. Finally, we ran them in parallel after transforming them with Polaris. The speedup of each code under each compiler, relative to the serial run, is plotted on the chart below. The ratio plotted for each compiler is computed as:

    Speedup = Time_serial / Time_parallel

    For example, a code that runs in 120 seconds serially and 20 seconds in parallel achieves a speedup of 6.

    Next, in an effort to determine the importance of six of the parallelization techniques used within Polaris, we compiled each code six more times, turning off one of the six techniques each time, and counted the percentage of loops in each code that were consequently serialized. The results are shown on the chart below; the vertical axis represents the percentage of loops serialized by turning off a given technique. The six techniques were: advanced induction variable analysis, automatic inlining of subroutines, interprocedural value propagation (IPVP), array privatization, a dependence test called the Range Test, and advanced reduction analysis.
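To illustrate the first of these techniques, induction variable recognition replaces a variable that is incremented every iteration with a closed-form function of the loop index; the code below is a standard illustration of the idea, not taken from a benchmark:

```fortran
C Before: K is an induction variable. The update K = K + 2
C creates a dependence between consecutive iterations.
      K = 0
      DO 10 I = 1, N
         K = K + 2
         A(K) = 0.0
 10   CONTINUE

C After substitution: K equals 2*I in iteration I, so the
C iterations no longer depend on one another and the loop
C can be parallelized.
      DO 20 I = 1, N
         A(2*I) = 0.0
 20   CONTINUE
```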

    Program Name  Brief Description                           Benchmark Suite  Lines of Code
    ARC2D         Fluid dynamics (2D inviscid flows)          Perfect           4000
    BDNA          Nucleic acid simulation                     Perfect           4000
    FLO52         Fluid dynamics (2D inviscid flow)           Perfect           2368
    MDG           Liquid water simulation                     Perfect           1200
    OCEAN         Fluid dynamics (2D Boussinesq fluid layer)  Perfect           3285
    TRFD          Quantum mechanics                           Perfect            634
    TURB3D        3D turbulent fluid flow                     Perfect           1400
    HYDRO2D       Galactic jets simulation                    SPEC              4289
    SU2COR        Elementary particle mass computation        SPEC              2333
    SWIM          Shallow water equations                     SPEC               426
    TFFT2         Fourier transforms                          SPEC               642
    TOMCATV       Mesh generator                              SPEC               190
    CLOUD3D       Weather simulation                          NCSA             14438
    CMHOG         Astrophysical simulation                    NCSA             11826

    Ongoing Projects

  • Retargeting Polaris at multiprocessor workstations
  • Compiling for Scalable Shared-memory Multiprocessors
  • Optimizing Parallel Programs

  • Polaris Publications

    CSRD Reports

    Reports by Polaris User Groups

    This page last updated on December 4, 1998 by JPH.