The first decade of the development of parallelizing compilers was characterized by a largely theoretical approach. From the development of dependence analysis through unimodular transformations, the work was mostly mathematical. Researchers in the field spent a great deal of time devising program transformations that would be useful ``if the compiler found a certain pattern''. Very little work was done to determine which patterns compilers were actually likely to find.
In the late 1980s we decided to analyze how well commercial compilers could parallelize a set of real programs, the Perfect Benchmarks. We found, to our surprise, that two leading commercial parallelizers performed dismally on almost every program.
We then undertook a hand analysis of the Perfect Benchmarks to determine which compiler techniques would actually be useful for parallelizing those programs. It turned out that a fairly small number of sophisticated techniques were needed to obtain good speedups.
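To illustrate the kind of pattern such techniques target, consider a reduction loop; this example is ours, not taken from the benchmark codes. A plain dependence test sees the repeated update of the scalar sum as a loop-carried dependence and leaves the loop serial, while a technique of this kind, such as reduction recognition, proves that the updates may be reordered and parallelizes the loop. The C sketch below marks the loop with an OpenMP reduction clause as a stand-in for what an automatic parallelizer would generate.
\begin{verbatim}
/* Illustrative sketch only: a reduction loop of the kind that
 * defeats a plain dependence test (every iteration updates the
 * shared scalar `sum') but that reduction recognition can
 * parallelize.  The OpenMP clause stands in for the code an
 * automatic parallelizer would generate. */
#include <stdio.h>

#define N 1000000
static double a[N];

int main(void) {
    double sum = 0.0;

    for (int i = 0; i < N; i++)       /* initialize the input array */
        a[i] = (double) i;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)       /* parallel summation */
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}
\end{verbatim}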
We believed that we could speed up the codes automatically if we implemented those techniques in a compiler. To prove our point, we embarked upon the Polaris project. The results so far are encouraging, and they have carried over to programs beyond the Perfect Benchmarks: we have obtained good results with the SPEC '95 codes as well as with currently used scientific programs that we received from NCSA.