Collection hierarchy

Next: Key usage within Up: Program Representation Previous: Data structure incremental

Collection hierarchy

Polaris provides an extensive set of templatized data structures for holding information about the program. There are lists, trees, and sets. In addition, there is a consistent mechanism for traversing these structures. Following the Polaris philosophy of the ownership of data, there are versions of each type of structure which cause the structure to either own the data being refered to, or not.

The data structures are ``templatized'', meaning that they may be used to hold many different types of objects. In fact, any class object derived from the Listable base class may be refered to by a Collection. This includes most of the objects in Polaris.

Objects derived from the Listable class get Wrappers when they are inserted into one of the Collections. The Wrappers themselves are placed in the Collection. The Wrappers contain a reference count to show how many references there are to the given element, as well as a pointer to the element, and other miscellaneous information about the status of the element.

It is just as easy to make and use a List<Expression> as it is to make and use a List<ProgramUnit>. Since these structures are templatized, a consistent set of functions are used to insert objects, delete objects, and modify them.

To form a linked list of expressions, you can create an empty list via:

          List<Expression> *elist = new List<Expression>;

Then, to insert elements into it, you use the member function ins_last(), such as

          elist->ins_last(expr);

where expr is defined as Expression *expr. Then, to traverse the linked list, you use an Iterator object, such as:

          for (Iterator<Expression> e_iter = elist;
               e_iter.valid();
               ++e_iter) {

              Expression *list_element = e_iter.current();

                    . . .

              }

As shown in the above example, Iterators offer several easy-to-use member functions for striding through a data structure:

valid() - evaluates ``True'' until you are finished stepping through the structure
current() - returns a reference to the current item in the traversal
operator ++ - step to the next item in the traversal

It is important that the operator ++ be used in pre-increment mode rather than post-increment mode (that is ++ iter instead of iter ++). This is merely due to the semantics of C++. The pre-increment form is defined to increment the object, then return the (incremented) value. The post-increment form is defined to return the value, then increment. Both may return the same thing in a sloppy implementation of C++, but we should not count on sloppy implementations.

This is related to a humorous observation about C++: From its name, C++ appears to have greater value than C, but when you get down to it, the value is really no different. Now, if somebody comes up with a really good language based on C, they should call it ++C!

An Iterator was meant to provide a read-only traversal of a data structure. An outwardly-identical class, called Mutator is meant to be used if the list is to be modified (such as elements being deleted) while it is being traversed.

As shown in the example above, merely assigning a Collection object to an Iterator initializes the Iterator, allowing it to be used for traversing the Collection.

The List structure actually owns the elements in it. It would be responsible for deleting its elements. So, whenever the List is deleted, so are all the elements. There is a similar structure for holding only pointers to elements, without being responsible for the elements themselves. This is called a RefList. Code for creating and using a RefList looks outwardly identical to that for a List. The programmer must keep in mind that the ownership of objects put into a RefList does not change. Something else, not the RefList, is responsible for deleting the objects when they are no longer needed. When the RefList is deleted, its objects will have their reference counts decremented and will only be deleted if the reference count goes to zero.

The List and RefList structures are good for a linear search through a collection of objects without a key. A ``direct'' access method is provided through the square bracket operator ( list[index] ), but this is not really direct access, as the implementation crawls from the head of the list to the end, counting elements until it reaches the element being addressed. It is particularly inefficient to use the square bracket operator to move backward through the list, such as

    // EXTREMELY INEFFICIENT CODE:
	for (i = nelements; i>=0; --i)  {
		. . . list[i] . . .
	}

since this is actually accomplished in quadratic time instead of linear time (it has to travel from the head of the list to the element accessed in each iteration of the loop). Such an operation is better accomplished by using an Iterator, setting it to the end of the list, then decrementing it to get to the next element. For instance:

	Iterator<Expression> iter = list;

	for (iter.set_to_last();  iter.valid();  --iter) {
	          . . . iter.current() . . .
	}

There is a family of data structures for holding data which should be accessible via a key. The type of structure used is base on three factors: whether the structure should own its data, whether the structure should own its key, and whether the key should be based on the address of its key or the value of its key. This gives eight possible combinations of data structures, which can be seen in 4.1.2.

Each of the eight possibilities is given a separate class name:

Map : owns data, key by address, does not own key
KeyMap : owns data, key by address, owns key
Database : owns data, key by value, does not own key
KeyDatabase : owns data, key by value, owns key
RefMap : does not own data, key by address, does not own key
RefKeyMap : does not own data, key by address, owns key
RefDatabase : does not own data, key by value, does not own key
RefKeyDatabase : does not own data, key by value, owns key

These classes represent data in a binary tree format, according to the key. The current implementation is that of a red-black tree. This data structure maintains a balanced form, and thus is very efficient for looking up data.

Several safety factors are built into Collections which help the system to detect improper uses of storage. The rule about data structures within Polaris is that they may have only a single owner. This is enforced within Collections by disallowing insertion of an object into a Collection if the Wrapper for the object indicates that it is already in a different Collection. If an object is deleted before all references to it are removed, it is turned into a Zombie node, meaning its storage is deleted except for a small stub. Then, if another access is attempted, the system catches it and reports an error. When all accesses are removed from an object, its reference count will be decremented to zero, and it will be automatically deleted.

An object must be owned by a Collection before it can be put into a Ref Collection.

A very useful operation on objects within Collections is the assign/pull operation. It is used in a context in which we would like to modify an object ``in place'' in a Collection. We can't do that directly in most cases because that would allow programs to bypass the consistency checks provided. But at the same time, it is fairly inefficient to remove an object from a list, modify it, and then re-insert it in the list.

Assign/pull allows us to leave the object's Wrapper in the collection, temporarily pull out the object, modify it, and replace it in the same position. This must be done in a certain order to make the mechanism work. This order is described in Section 4.7.1.

Sets are implemented within the Collection hierarchy as lists. The key is the address of the element inside. Sets don't work very well for things without a unique address in the system. For instance, a Set<Expression> could easily have duplicates of a given expression in it, if they all have different addresses. A Set is good for things which have unique addresses, for instance, symbols in a symbol table (there is a one-to-one mapping between a given symbol and its symbol table address).

Next: Key usage within Up: Program Representation Previous: Data structure incremental

Jay Hoeflinger
Mon Apr 21 11:52:18 CDT 1997