The goal of this project is to propose a lower-cost alternative to the
hardware-intensive COMA machines. The approach we take is to use a page
as the allocation unit in main memory. This allows us to avoid designing
main memory as a hardware cache. The earliest known work that takes this
approach is the Simple COMA project.
The main problem with this approach is that bringing data into local memory requires allocating space for the entire page containing that data. If the application has poor spatial locality, only a small part of each allocated page is actually used, and memory becomes fragmented. Fewer useful pages then fit in local memory, so the page fault frequency increases and the performance of the application suffers. We have a solution that reduces memory fragmentation and, with it, the frequency of page faults. This work is in progress.
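The effect of page-granularity allocation can be illustrated with a small calculation. The sketch below is only an illustration, not part of the project: the function name `page_utilization`, the 4096-byte page size, and the 64-byte line size are all assumptions chosen for the example. It counts how many pages must be allocated for a set of touched addresses and what fraction of the allocated bytes is actually used.

```python
def page_utilization(touched_addresses, page_size=4096, line_size=64):
    """Return (pages_allocated, fraction_of_allocated_bytes_used).

    page_size and line_size are illustrative assumptions, not values
    taken from the project description.
    """
    pages = set()   # pages that must be allocated in local memory
    lines = set()   # memory lines the application actually touches
    for addr in touched_addresses:
        pages.add(addr // page_size)
        lines.add(addr // line_size)
    allocated = len(pages) * page_size
    used = len(lines) * line_size
    return len(pages), (used / allocated if allocated else 0.0)


# Good spatial locality: 64 consecutive lines fill exactly one page.
dense = [i * 64 for i in range(64)]
# Poor spatial locality: 64 lines, each falling on a different page.
sparse = [i * 4096 for i in range(64)]

print(page_utilization(dense))
print(page_utilization(sparse))
```

Under these assumptions both access patterns touch the same amount of data, but the sparse pattern forces 64 pages to be allocated where the dense pattern needs one, which is the fragmentation described above.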
Reliability is a major concern when designing any large-scale shared-memory
multiprocessor. We have made our design fault-tolerant by protecting each
page in memory with its own firewall. The firewall prevents wild writes
originating in a faulty node from corrupting the memory of another node.
Isolating faults in this manner ensures that only
applications making use of the faulty node are affected by it. This work is
in progress.
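The text does not say how the per-page firewall is realized. One possible way to picture it, purely as an assumption, is a per-page bitmask of nodes that are permitted to write, checked on every incoming remote write. In the sketch below, the class `FirewalledMemory` and its methods `grant`, `revoke`, and `remote_write` are hypothetical names invented for the illustration.

```python
class FirewalledMemory:
    """Toy model of per-page write firewalls (illustrative assumption only)."""

    def __init__(self, num_pages):
        # One bitmask per page: bit n set means node n may write the page.
        self.write_mask = [0] * num_pages
        # Page contents, modeled as offset -> value maps.
        self.pages = [dict() for _ in range(num_pages)]

    def grant(self, page, node):
        """Allow `node` to write to `page`."""
        self.write_mask[page] |= 1 << node

    def revoke(self, page, node):
        """Remove write permission of `node` on `page`."""
        self.write_mask[page] &= ~(1 << node)

    def remote_write(self, node, page, offset, value):
        """Apply a write from `node`; reject it if the firewall forbids it."""
        if not (self.write_mask[page] >> node) & 1:
            return False          # wild write blocked, page left untouched
        self.pages[page][offset] = value
        return True


mem = FirewalledMemory(num_pages=2)
mem.grant(page=0, node=1)
ok = mem.remote_write(node=1, page=0, offset=8, value=42)       # permitted
blocked = mem.remote_write(node=3, page=0, offset=8, value=99)  # rejected
```

In this model a write from a node without permission leaves the page unchanged, so a faulty node can only damage pages it was already allowed to write, matching the isolation property described above.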