Experimental Framework for Injecting Logic Errors in a Virtual Machine to Profile Applications for Soft Error Resilience

Type of Publication:
Keywords: soft errors, resilience, fault tolerance, reliability, fault injection, virtual machines, high performance computing, supercomputing
  • DeBardeleben, Nathan
  • Blanchard, Sean P.
  • Fu, Song
  • Guan, Qiang
  • Zhang, Ziming
As the high performance computing (HPC) community continues to push for ever larger machines, reliability remains a serious obstacle. Further, as feature sizes and voltages decrease, the rate of transient soft errors is rising. Today's HPC programmers have to deal with these faults to a small degree, and this is expected to become a larger problem as systems continue to scale. In this paper we present SEFI, the Soft Error Fault Injection framework, a tool for profiling software for its susceptibility to soft errors. In particular, we focus on logic soft error injection. Using the open source virtual machine QEMU, we demonstrate modifying emulated machine instructions to introduce soft errors. We conduct experiments by modifying the virtual machine itself in a way that does not require intimate knowledge of the tested application. With this technique, we show that we are able to inject simulated soft errors into the logic operations of a target application without affecting other applications or the operating system sharing the VM. We present some initial results and discuss where we think this work will be useful in next generation co-design.
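
The idea of corrupting an emulated logic operation for one target application only can be illustrated with a toy model. The sketch below is not SEFI's implementation and does not use any QEMU interface; `ToyEmulator`, `flip_bit`, the `armed` flag, and the process ID values are all hypothetical names chosen for illustration. It assumes a 32-bit result and a single bit flip per injection.

```python
import random

MASK32 = 0xFFFFFFFF  # assumed 32-bit register width for this toy model


def flip_bit(value, bit):
    """Return value with one bit inverted, truncated to 32 bits."""
    return (value ^ (1 << bit)) & MASK32


class ToyEmulator:
    """Hypothetical stand-in for an emulator's instruction handlers."""

    def __init__(self, target_pid, seed=0):
        self.target_pid = target_pid  # only this process receives faults
        self.armed = False            # when True, corrupt the next matching op
        self.rng = random.Random(seed)
        self.injections = []          # log of (pid, opcode, bit) for later analysis

    def execute(self, pid, opcode, a, b):
        # Compute the correct result of the emulated instruction first.
        if opcode == "and":
            result = a & b
        elif opcode == "xor":
            result = a ^ b
        elif opcode == "add":
            result = (a + b) & MASK32
        else:
            raise ValueError(f"unsupported opcode: {opcode}")

        # Inject a single bit flip only when armed and only for the target
        # process, leaving every other process's results untouched.
        if self.armed and pid == self.target_pid:
            bit = self.rng.randrange(32)
            result = flip_bit(result, bit)
            self.injections.append((pid, opcode, bit))
            self.armed = False  # one fault per arming
        return result
```

In this sketch the application under test needs no changes: the fault is applied inside the emulated instruction, and the injection log records which operation and bit were affected so that the application's outcome can be compared against a fault-free run.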

© 2018 New Mexico Consortium