We build and investigate software systems for multicore computers. We have analyzed and fixed scalability problems in existing software, such as the Linux kernel, and built scalable software from scratch, such as RadixVM and the Corey kernel. To facilitate this work and help identify scalability bottlenecks on multicore computers we have built analysis tools, like DProf.
The Scalable Commutativity Rule/Commuter
The scalable commutativity rule is a formal rule for deciding when an implementation can scale based on interface-level properties. Commuter is a tool for using this to check the scalability of real software implementation.
RadixVM is a scalable virtual memory system designed to parallelize VM operations on non-overlapping regions of the address space.
This work analyzes the scalability of seven system applications running on Linux on a 48-core computer. Most applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this work is that there is no scalability reason to give up on traditional operating system organizations just yet.
DProf is a statistical profiler that helps programmers understand cache miss costs by attributing misses to data types instead of code.
Metis is an in-memory MapReduce library optimized for multicore architectures. The high efficiency of Metis relies on the use of hash tables to store intermediate key/value pairs. To guarantee high performance, Metis determines the size of hash table by sampling the input, so that key/value insertions and queries are O(1). Metis also organizes key/value pairs within each hash table slot as a B+Tree and Parallel Sorting by Regular Sampling sorting algorithm for the Merge phase to prevent the inacurracy of sampling from degrading the performance.
Corey is an experimental operating system designed to give applications control of sharing kernel data structures. Multiprocessor application performance can be limited by the operating system when the application uses the operating system frequently and the operating system services use data structures shared and modified by multiple processing cores. If the application does not need the sharing, then the operating system will become an unnecessary bottleneck to the applications’ performance. Corey arranges each kernel data structure so that only a single processor need update it, unless directed otherwise by the application.