Research Activities

At the beginning of the current year, we had in-place the hardware and software infrastructure necessary to begin building our compiler infrastructure. We had also designed the overall framework for our compiler, assuming the Jikes RVM as the software base. We now discuss the individual research activities, and our results in each:

Consistency Model Support

In addition to the Java memory model as supplied by the Jikes RVM, we have implemented support for a sequentially consistent memory model. Currently, the memory targeted by a particular compilation is selected by a compiler flag. Research is ongoing to develop a language for specifying the consistency model to be targeted by a particular compilation.

A consistency model is supported using the following steps. Since the consistency model determines the order that memory accesses in a thread are observed by (or observe) memory accesses in other threads, the memory model imposes a default ordering on the program memory accesses. In reality, analyses can show that some of these default ordering are overly conservative, for three reasons:
  1. First, the code being compiled may not execute in parallel with any other threads, thus there is no other thread to observe theorder of the memory accesses, and when the access can occur is only constrained by intra-thread dependences. May-happen-in-parallel (MHP) analysis determines if code being compiled can be in a thread that executes in parallel with other threads.
  2. Second, even if an access in the code being compiled is executing in parallel with other threads, the storage being accessed may not be observable by those other threads, and again, the access is constrained only by intra-thread dependences. That a storage location is accessible by more than one thread is determined by escape analysis.
  3. Third, even if the storage being accessed can be accessed by another thread executing in parallel, only if the access is part of a critical cycle, which are detected by delay set analysis, should the order of the access be constrained by more than intra-thread dependences.
After these analyses, we are left with a set of of orders between accesses in a thread that must be enforced at the level of hardware memory operations for the memory model to be correctly enforced. This enforcement is done in two stages:
  1. prevent transformations from occurring that will cause an order that should be enforced to be violated;
  2. efficiently insert the necessary fence instructions to enforce these orders, including the optimization eliminating redundant fences[1, 2].
    Fence insertion can be thought of as creating a virtual machine that has the memory model specified for the program, with the compiler preserving the model by only performing transformations that are legal under the model.
Thus, to efficiently support the different consistency models, we need the following analyses and optimizations:
We will now discuss our activities for each of these, in turn.

MHP Analysis and Delay Set Analysis

As mentioned above, we need to perform more analyses to identify those memory accesses that really conflict: We currently have one student working on these two analyses to develop efficient (i.e.low order polynomial) compile-time solutions to these problems. For MHP analysis we are investigating a program thread structure graph, similar to a call graph, that allows parallelism between code regions to be easily determined. For delay set analysis we are looking at modified path detection algorithms. These algorithms have the property that they can be parameterized to be very conservative and fast running, or precise and execute in worst case exponential time.

Escape Analysis

Escape analysis is used to identify all memory locations that can be accessed by more than one thread (as opposed to all shared memory locations, i.e. those that reside in shared memory but are not accessible by more than one thread).

We have implemented simple and very conservative, but fast, analysis. It is an intra-procedural analysis that considers all memory locations that are accessible from static variables, method parameters, or return values as escaping memory locations. This version gives worst-case results, i.e. it gives an upper bound on the performance slowdowns we might expect to see from any reason- able analysis framework.

We have also performed a manual (and optimistic) escape analysis on a set of benchmark programs. We inspect source code, and reason what variables (memory locations) can possibly be accessed by other threads of execution. We mark these variables specially to identify them as escaping. This version gives the best-case results for the precision of escape analysis.

Experimental data for both of these analyses has been collected, and is presented in theExperimental Result Section.

We are currently developing an automatic escape analysis algorithm that is more precise than the simple and conservative approach described above. Because our analysis is concerned only with thread escaping accesses, and not the more general lifetime or method escaping properties of other works, we should be able to develop an analysis that is, in practice, faster than previously developed approaches but almost as accurate.

We have developed and implemented an alias analysis algorithm that detects aliases involving specific array elements, in addition to aliases detected by other known algorithms. This algorithm is required for accurately analyzing Java applications with multidimensional arrays. This is because all Java arrays are one-dimensional objects, and a multidimensional array is an array of arrays (i.e. Java arrays need not be "regular"). Our algorithm is store-based: it focuses on objects stored in memory, where each object is identified by a unique name. The name for an array element is the name for the array object suffixed by a parameterized expression. A specific value for the parameter identifies one particular array element, and the range of possible values for the parameter identifies all the elements of that array. This naming scheme allows a compact representation for names for all array elements in the program. Using the result of escape analysis, we can conservatively assume that field accesses of escaping objects are potentially conflicting. We then disable optimizations that may reorder these accesses.

Program Transformations

Using the analyses described above, we have modified the Jikes RVM to disable the application of program transformations when necessary to maintain the correctness of the program under the selected consistency model. As an example of this, consider redundant load elimination applied to the following code segment, where r1 , r2 , and r3 are registers.

Load r1, *a // "a" contains the address of a memory location 
            // from whichto load thevalue into r1 
Load r2, *b // "b" contains the same address as "a" // i.e. "a" and "b" are aliased)
Load r3, *a


can be transformed into:

Load r1, *a // "a" contains the address of a memory location 
            // from which to load thevalue into r1 
Load r2, *b // "b" contains the same address as "a" // i.e. "a" and "b" are aliased)
Move r3, r1


This transformation violates sequential consistency if another thread changes the value in memory of location "a" between the first and second Load . In this case, r3 might get an older value than r2, even if the assignment to r3 occurs after the assignment to r2 . We examined all of the optimizations implemented in the Jikes RVM to determine if they were valid for a sequentially consistent execution. We modified the transformations so that that they are not performed where they could potentially violate sequential consistency (for the case when sequential consistency is chosen as the memory model), but are performed, even in sequentially consistent programs, where they are shown to be safe by the analyses described above. Optimizations in the Jikes RVM that are affected by this are
  • redundant load elimination,
  • redundant store elimination,
  • loop invariant code motion, and
  • scalar replacement of loads.

Fence Instruction Insertion and Optimization

Processors force memory accesses issued before a fence instruction (fence) to complete before issuing further memory operations. Thus, fences can be used to enforce necessary orders for a memory model in the underlying hardware - even when that hardware implements a more relaxed consistency model. For sequential consistency, it is sufficient (but inefficient) to insert a fence before every access to a memory location that may be shared by more than one thread of execution. More accurate escape analysis (i.e. thread-based), MHP analysis, and delay set analysis reduce the number of orders that need to be enforced by inserting a fence, and consequently reduce the number of fences needed. Going beyond this, two other techniques can be used to enforce fences:
  1. register hazards (i.e. reads/read, write/read and read/write operations to the same register) in conjunction with the store order from the processor enforce an order of accesses within a thread on the order that a particular memory location is read. This fact is exploited to reduce the number fences that are inserted.
  2. More subtly, two orders to be enforced can overlap, i.e. in the sequence of operations:

            load A 
    load B
    load C

    it is possible that the orders to be enforced are A before C and B before C. Fences can be inserted in at least two ways:
    1. by placing a fence after A and B, or
    2. more efficiently, by placing a single fence after B that enforces both of the orders.
    Additional complexity is added when programs with loops and branches are considered.
We have implemented a fast, slow insensitive form of the algorithm, and a slower, but more precise flow sensitive form of the analysis. Results of the analysis follow.

Experimental Results

We obtained preliminary experimental results for our basic implementation of sequential consistency using the specJVM98 benchmark suite. Note that the goal of this project is not to provide higher performance, but to provide comparable performance with an easier-to-understand-and-use memory consistency model. We compared the performance slowdown for running the benchmark programs using sequential consistency as the memory model, versus using the default memory model implemented in the Jikes RVM.

Table 1 gives slowdowns for the specJVM98 benchmarks using the simple and conservative, but automatic, form of escape analysis described above. The "Original Time" column is the time for the program using the default Jikes RVM Java memory model, column "Naive" is with a fence insertion algorithm that inserts a fence after every shared variable, and column "Local" uses intra-basic-block information information to optimize the placement of fences. Note that the slowdown numbers are shown in the parenthesis.

Table 1: Performance numbers for naive automatic escape analysis
Program Performance in seconds with simple escape analysis
Number in parentheses is the slowdown
Benchmark Original Time Naive Local
_200_check 0.5909 0.6031 (1.02) 0.5904 (1.00)
_201_compress 2.4088 38.1364 (15.83) 30.2705 (12.57)
_202_jess 0.7975 4.6505 (5.83) 4.2613 (5.34)
_209_db 1.1162 1.6485 (1.48) 1.5042 (1.35)
_222_mpegaudio 1.7270 31.419 (18.19) 30.3742 (17.59)
_227_mtrt 1.6067 6.6871 (4.16) 5.9325 (3.69)
_228_jack 3.2935 7.1071 (2.16) 6.9189 (2.10)

Table 2 presents similar numbers for the programs using manual, precise escape analysis. This represents an upper bound on the performance that can be expected for sequentially consistent programs using an escape analysis that doesn't differentiate between the different fields (or elements, in the case of arrays) object. The "Global" column gives performance numbers for whole procedure, flow sensitive, fence insertion optimization.

Table 2: Performance numbers for optimistic, manual escape analysis
Program Performance in seconds with simple escape analysis
Number in parentheses is the slowdown
Benchmark Original Time Naive Local Global
_200_check 0.5909 0.6016 (1.02) 0.5883 (0.99) 0.5881 (0.99)
_201_compress 2.4088 5.9381 (1.79) 6.0025 (1.81) 5.7639 (1.74)
_202_jess 0.7975 0.823 (1.07) 0.882 (1.15) 0.815 (1.06)
_209_db 1.1162 1.1278 (1.03) 1.115 (1.02) 1.1083 (1.01)
_222_mpegaudio 1.7270 23.1003 (12.9) 22.7267 (12.7) 22.6943 (12.7)
_227_mtrt 1.6067 1.6259 (1.01) 1.6111 (1.00) 1.6123 (0.62)
_228_jack 3.2935 3.3056 (1.05) 3.3008 (1.04) 3.3576 (1.06)


We observe that with the manual analysis, there was a significant slowdown for only two benchmark programs ( _201_compress and _222_mpegaudio). We are currently investigating both of these in more detail We are hopeful that the advanced alias analysis should yield better results for these, and other benchmarks.

| Home | Contact Us|

This material is based upon work supported by the National Science
Foundation under Grant No. 0081265.
Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect the
views of the National Science Foundation
.