Balancing Shared and Distributed Heaps on NUMA Architectures
Malak Aljabri, Hans-Wolfgang Loidl, and Phil Trinder
Abstract:
Because latencies vary between memory banks, efficient shared-memory access is challenging on modern NUMA architectures. This strongly affects the shared-memory performance of parallel programs, particularly those written in languages with automatic memory management.
This paper presents a performance evaluation of distributed-heap and shared-heap implementations of parallel Haskell on a state-of-the-art physical shared-memory NUMA machine. The evaluation exposes bottlenecks in shared-memory management that limit scalability beyond 25 of the 48 cores.
We demonstrate that a hybrid system, GUMSMP, which combines distributed-heap and shared-heap abstractions, consistently outperforms the shared-memory GHC implementation on seven benchmarks, by a factor of 3.3 on average. Specifically, we show that the best results are obtained when memory is shared only within a single NUMA region and distributed-memory abstractions are used across regions.
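The placement policy in the last sentence can be pictured with a toy model: cores inside one NUMA region share a heap, while cores in different regions exchange data as if they were separate distributed-memory nodes. This is a minimal sketch, not GUMSMP's runtime code. It assumes 48 cores numbered contiguously and split into regions of 6 cores each; both the region size and the numbering are hypothetical, and the names `region_of`, `transfer_mode`, and `heaps` are illustrative only.

```python
# Toy model of the hybrid policy: one shared heap per NUMA region,
# message passing between regions. NOT the GUMSMP implementation.
# Assumptions (hypothetical): 48 cores, contiguous numbering,
# 6 cores per NUMA region.

TOTAL_CORES = 48
CORES_PER_REGION = 6  # hypothetical, machine-specific


def region_of(core: int, cores_per_region: int = CORES_PER_REGION) -> int:
    """Return the NUMA region of a core, assuming contiguous core numbering."""
    if not 0 <= core < TOTAL_CORES:
        raise ValueError(f"core {core} out of range")
    return core // cores_per_region


def transfer_mode(src: int, dst: int) -> str:
    """Shared heap within a region; distributed-memory messaging across regions."""
    return "shared-heap" if region_of(src) == region_of(dst) else "message-passing"


def heaps(total: int = TOTAL_CORES,
          cores_per_region: int = CORES_PER_REGION) -> dict[int, list[int]]:
    """Group cores so that each NUMA region owns exactly one shared heap."""
    groups: dict[int, list[int]] = {}
    for core in range(total):
        groups.setdefault(core // cores_per_region, []).append(core)
    return groups


if __name__ == "__main__":
    print(len(heaps()))         # one heap per region under these assumptions
    print(transfer_mode(0, 5))  # same region
    print(transfer_mode(5, 6))  # crosses a region boundary
```

Under these assumptions the model yields eight heaps of six cores each; only transfers that cross a region boundary pay the distributed-memory cost, which is the trade-off the abstract reports as giving the best results.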