Towards Scalable Lifetime Reliability Management for Dark Silicon Manycore Systems
V. Rathore, , A.K. Singh, T. Srikanthan, M. Shafique
Published in Institute of Electrical and Electronics Engineers Inc.
2019
Pages: 204 - 207
Abstract
Aggressive technology scaling has enabled very high integration density. Unfortunately, it has also led to process variation, increased power density, and consequently rising chip temperature, which accelerate device aging and degrade the lifetime reliability of components in a manycore system. Moreover, thermal and power limits allow only a fraction of the chip to operate at full speed; the rest is dark silicon. Most lifetime reliability enhancement solutions for multi-/manycore systems in the literature are heuristic-based, while some use standard compute-intensive methods to solve the optimization problem and therefore do not scale well with manycore size. The heuristic-based solutions search the design space at a fine granularity, which makes the space huge and limits their scalability. Also, these approaches neither account for the impact of different applications' execution behavior on the aging of the underlying cores nor exploit the distribution of performance requirements across the cores. In this paper, we present our resource management strategies towards building scalable lifetime reliability enhancement solutions for dark silicon manycore systems. The first technique, the Hierarchical Mapping approach (HiMap), maps a periodic workload using a block-based hierarchical method that leverages dark cores for thermal mitigation. The second approach, LifeGuard, uses reinforcement learning to learn the applications' aging behavior and is aware of the pattern of performance requirements mapped onto the core frequencies. It maps randomly arriving requests and scales with both the number of applications and the size of the manycore. © 2019 IEEE.