Load-balanced and locality-aware scheduling for data-intensive workloads at extreme scales

Year:
2015
Type of Publication:
Article
Authors:
  • Wang, Ke
  • Qiao, Kan
  • Sadooghi, Iman
  • Zhou, Xiaobing
  • Li, Tonglin
  • Lang, Michael
  • Raicu, Ioan
Journal:
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Number:
00
Pages:
1-29
Abstract:
Data-driven programming models such as many-task computing (MTC) have been prevalent for running data-intensive scientific applications. MTC applies over-decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as compute nodes to make scheduling decisions. Achieving distributed load balancing and best exploiting data locality are two important goals for the best performance of distributed scheduling of data-intensive applications. Our previous research proposed a data-aware work stealing technique to optimize both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. A distributed key-value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical sub-optimal upper bound of the proposed technique; compare MATRIX with other scheduling systems; and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable, but can achieve performance within 15% of the sub-optimal solution.
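
The queueing idea in the abstract can be illustrated with a minimal Python sketch. It is not the MATRIX implementation: the names `Task`, `Scheduler`, `steal`, and `SIZE_THRESHOLD` are hypothetical, and both the placement rule (large local inputs go to the dedicated queue) and the victim choice (the scheduler with the longest shared queue, losing half of it) are assumptions made for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical cutoff (bytes): inputs at least this large are pinned to the data's node.
SIZE_THRESHOLD = 1_000_000


@dataclass
class Task:
    task_id: str
    data_size: int   # size of the task's input data, in bytes
    data_node: str   # node that currently holds the input data


@dataclass
class Scheduler:
    node: str
    dedicated: deque = field(default_factory=deque)  # tasks that should run here
    shared: deque = field(default_factory=deque)     # tasks other schedulers may steal

    def enqueue(self, task: Task) -> None:
        # Assumed rule: a task with large, locally stored input stays local;
        # everything else is cheap enough to move and is open to stealing.
        if task.data_node == self.node and task.data_size >= SIZE_THRESHOLD:
            self.dedicated.append(task)
        else:
            self.shared.append(task)

    def next_task(self):
        # Prefer locality-bound work, then fall back to stealable work.
        if self.dedicated:
            return self.dedicated.popleft()
        if self.shared:
            return self.shared.popleft()
        return None


def steal(thief: Scheduler, victims: list) -> int:
    """Move half (rounded up) of the busiest victim's shared queue to the thief."""
    victim = max(victims, key=lambda s: len(s.shared), default=None)
    if victim is None or not victim.shared:
        return 0
    count = (len(victim.shared) + 1) // 2
    for _ in range(count):
        # Take from the tail so the victim keeps the tasks it would run next.
        thief.shared.append(victim.shared.pop())
    return count
```

An idle scheduler would call `steal(me, neighbors)` when `next_task()` returns `None`. Only the shared queue is ever touched by a thief, so tasks bound to large local data are never moved away from it.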
