HPC Runtime Support for Fast and Power Efficient Locking and Synchronization
- Akkan, Hakan
- Lang, Michael
- Ionkov, Latchesar
- In Proceedings of IEEE Cluster 2013, http://pti.iu.edu/ieeecluster-2013/
- Abstract—As compute nodes increase in parallelism, existing intra-node locking and synchronization primitives need to be scalable, fast, and power efficient. Most parallel runtime systems try to balance these properties during synchronization by combining fine-tuned spin-waiting with yielding the processor to the OS. Unfortunately, the code path the OS follows to put an idle processor into a lower power state almost always includes the interrupt processing path. This introduces unnecessary overhead for both the waiting tasks and the task that wakes them up. In this work we investigate a pair of x86-specific instructions, MONITOR and MWAIT, that can be used to build these primitives with the desired performance and power efficiency properties. This pair of instructions allows a processor to quickly pause execution until another processor wakes it with a single memory store, avoiding the overhead of switching to the OS idle thread for the waiting task and of sending IPIs (inter-processor interrupts) for the waking task. We implement a locking primitive using these instructions and evaluate its effectiveness in OpenMP at low to high core counts. In these tests we observed very good scaling, with performance improvements of up to 23x and power reductions of up to 6x at 64 cores. Motivated by these results, we propose that other high-core-count processors include this type of instruction and make it available to user-space applications.
Full text: CLUSTER2013-final-mwait.pdf
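The wake-up mechanism described in the abstract can be sketched in C as a test-and-test-and-set lock whose waiters arm a monitor on the lock word, so that the unlocking store itself serves as the wake-up. This is a minimal illustrative sketch, not the authors' implementation: the names mwait_lock, mwait_lock_acquire, mwait_lock_release, and run_demo are hypothetical. Because MONITOR and MWAIT are privileged on most x86 processors (a user-space program that executes them typically takes an invalid-opcode fault), the MWAIT path is compiled only when the assumed USE_MWAIT macro is defined; by default the waiter falls back to plain spin-waiting on the same word.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#if defined(USE_MWAIT)
#include <pmmintrin.h> /* _mm_monitor / _mm_mwait (SSE3); build with -msse3 */
#endif

/* Hypothetical lock type for illustration: word == 0 means free, 1 means held. */
typedef struct {
    atomic_int word;
} mwait_lock;

/* Wait until the lock word reads as free. */
static void wait_until_free(mwait_lock *l) {
    while (atomic_load_explicit(&l->word, memory_order_relaxed) != 0) {
#if defined(USE_MWAIT)
        /* Arm address monitoring on the cache line holding the lock word. */
        _mm_monitor((const void *)&l->word, 0, 0);
        /* Re-check after arming: a release that landed before MONITOR
           would otherwise never wake this core. */
        if (atomic_load_explicit(&l->word, memory_order_relaxed) == 0)
            break;
        /* Pause until another core writes the monitored line (or an
           interrupt arrives); the loop re-reads the word either way. */
        _mm_mwait(0, 0);
#endif
        /* Without USE_MWAIT this loop is a plain spin-wait. */
    }
}

static void mwait_lock_acquire(mwait_lock *l) {
    /* Test-and-set; on failure, wait with plain reads before retrying. */
    while (atomic_exchange_explicit(&l->word, 1, memory_order_acquire) != 0)
        wait_until_free(l);
}

static void mwait_lock_release(mwait_lock *l) {
    /* Releasing is a single store. For MWAIT waiters that store is also the
       wake-up: no system call and no IPI on the releasing side. */
    atomic_store_explicit(&l->word, 0, memory_order_release);
}

/* Demo state: a counter protected by the lock. */
static mwait_lock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg) {
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        mwait_lock_acquire(&demo_lock);
        demo_counter++;
        mwait_lock_release(&demo_lock);
    }
    return NULL;
}

/* Run nthreads workers that each increment the shared counter iters times. */
long run_demo(int nthreads, int iters) {
    pthread_t threads[64];
    assert(nthreads > 0 && nthreads <= 64);
    atomic_store(&demo_lock.word, 0);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, demo_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return demo_counter;
}
```

Calling run_demo(4, 10000) should return 40000, since every increment happens under the lock. Build with -pthread; add -DUSE_MWAIT -msse3 only on a system that actually permits MONITOR/MWAIT in user space. The re-check between MONITOR and MWAIT is the key design point: without it, a release that lands just before the monitor is armed would be missed.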