Stochastic Computing: Embracing Errors in Architecture and Design of Hardware and Software

Rakesh Kumar, Assistant Professor in the Electrical & Computer Engineering Department at the University of Illinois at Urbana-Champaign, Aug 20, 2012

All of computing today relies on an abstraction in which software expects the hardware to behave flawlessly for all inputs under all conditions. This abstraction has worked historically because variations in hardware and environment were relatively small. Increasingly, however, computing will be done with devices and circuits that are inherently stochastic or whose behavior is stochastic due to manufacturing and environmental uncertainties. Coupled with the unprecedented cost and power pressure on future computing devices, this means that the cost of maintaining the abstraction of flawless hardware for such emerging circuits and devices will be prohibitive, and we will need to fundamentally rethink the correctness contract between hardware and software.

In our group, we are exploring a vision of computing systems where a) hardware is allowed to produce errors that are exposed to the highest layers of software, and b) hardware and software are optimized to maximize the power savings afforded by relaxed correctness. We call these under-designed processors, which produce stochastically correct results even under nominal conditions, stochastic processors. In this talk, I will present two example methodologies for building processors that are optimized for non-zero error rates. In the first example, the processor is optimized for timing errors that are assumed to be detected and corrected by a hardware error resilience mechanism. In the second example, a GPU is allowed to produce certain control and data errors that error-resilient GPU applications can tolerate. I will also discuss two examples of building applications for stochastic processors. In the first example, applications are reformulated as stochastic optimization problems that can tolerate numerical errors. In the second example, algorithmic techniques are used to derive approximate detection and correction schemes for sparse linear algebra problems. The significant power and reliability benefits in these scenarios suggest that there may indeed be hope for software to save hardware when it comes to the power and reliability problems of the future.
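
As a rough illustration of the idea of reformulating an application as an optimization problem that tolerates numerical errors, the sketch below solves a small linear system A x = b by gradient descent on 0.5 * ||A x - b||^2 while randomly perturbing the computed gradient. Everything here is an assumption made for illustration and is not taken from the talk: the 2x2 system, the `noisy` error model (a bounded additive error with a fixed probability), the decaying step size, and the function names are all hypothetical.

```python
import random

# Hypothetical 2x2 system A x = b; its exact solution is (0.8, 1.4).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]


def noisy(value, rng, error_rate=0.1, magnitude=0.1):
    """Assumed error model: with probability error_rate, add a bounded
    random perturbation to a result, as an unreliable unit might."""
    if rng.random() < error_rate:
        return value + rng.uniform(-magnitude, magnitude)
    return value


def solve_stochastic(A, b, steps=2000, step0=0.05, seed=0):
    """Minimize 0.5 * ||A x - b||^2 by gradient descent with noisy gradients."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for k in range(steps):
        # Residual r = A x - b (computed exactly in this sketch).
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # Gradient g = A^T r; each component passes through the noisy unit.
        g = [noisy(sum(A[i][j] * r[i] for i in range(n)), rng) for j in range(n)]
        # A decaying step size damps the effect of individual errors.
        step = step0 / (1.0 + k / 100.0)
        x = [x[j] - step * g[j] for j in range(n)]
    return x


x = solve_stochastic(A, b)
error = max(abs(x[0] - 0.8), abs(x[1] - 1.4))
print("solution:", x, "max error:", error)
```

Because each iteration pulls the iterate back toward the minimizer, an occasional bounded error in the gradient shifts the result only slightly instead of corrupting it outright, which is the property an error-tolerant formulation relies on. A direct solver such as Gaussian elimination offers no comparable self-correction under this error model.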
