12B – Managing Lifetime in Multi-Core Systems

Room: Florentine II
Organizer: Cristiana Bolchini (Politecnico di Milano, Italy)
Moderator: Giorgio Di Natale (LIRMM, France)

  • Lifetime reliability modeling and estimation in multi-core systems
    Antonio Miele (Politecnico di Milano)

    Abstract: Lifetime reliability has become a primary driver in the design of modern multi-core systems, because aggressive CMOS technology downscaling has accelerated device aging and wear-out phenomena. The complexity of modern systems has pushed the focus of design activities to the system level. At that level, estimating the lifetime of a multi-core system is one of the most relevant challenges, due to the many design choices (such as mapping, scheduling, and use of spare units) as well as the high variability of the working conditions the system will face (e.g., variable workload or premature failures of a subset of the cores). Consequently, the definition of an accurate and flexible model for estimating the lifetime reliability of a multi-core system is mandatory.

  • Early System Failure Prediction by Using Aging In Situ Monitors: Methodology of Implementation and Application Results
    Lorena Anghel (TIMA Laboratory)

    Abstract: With CMOS technology scaling, it becomes more and more difficult to guarantee circuit functionality across all process, voltage, and temperature (PVT) corners. Moreover, circuit wear-out degradation leads to additional temporal variations, resulting in a significant increase in design margins for systems targeting specific high-reliability applications (automotive or health-care embedded applications). Adding pessimistic timing margins to guarantee all operating points under worst-case conditions is no longer acceptable due to the huge impact on design costs. Therefore, the use of in situ monitors for pre-error detection becomes a must, as it allows relaxing the constraints imposed on the design. However, deciding which paths to place these monitors on is not an easy task, as it can rapidly result in an explosion of the design area, in addition to an increase in engineering effort induced by iterative place-and-route operations. Therefore, a methodology is needed for placing the in situ monitors that takes into account all trade-offs and essential parameters, such as workload, temperature, area, and performance penalty. These in situ monitors are then used in a voltage regulation feedback loop to provide process, temperature, and aging compensation. Several application results for different circuits fabricated in 28 nm FDSOI will demonstrate the approach.

  • Runtime resource management for lifetime extension in multi-core systems
    Cristiana Bolchini (Politecnico di Milano)

    Abstract: The availability of numerous, possibly heterogeneous, processing resources in multi-core systems allows one to exploit them to optimize performance and/or power consumption. However, such solutions may affect the overall lifetime of the system because of aging and wear-out mechanisms. Runtime management strategies, generally adopted to handle performance and power consumption, should therefore be enhanced to take these effects into account. The overall goal is to improve lifetime reliability while optimizing for performance and power.