CSE Speaker Series – Dr. Patrick Bridges

On Friday, November 11, at 11:00 am in Cramer 221, Dr. Patrick Bridges from UNM will discuss "Integrating Performance Modeling into Scalable System Software Design."

Abstract: Developing new scalable system software techniques is essential to the success of emerging large-scale scientific computing systems due to the increasing scale and complexity of hardware, programming systems, and applications. In particular, HPC operating systems and middleware must address challenges in areas such as fault tolerance, scheduling, synchronization, power management, and high-speed communication. Interactions between these areas also complicate software design; recent research has shown, for example, that both power capping and asynchronous checkpointing can have widely varying and hard-to-predict impacts on system performance.

Because of these challenges, my research has increasingly relied on performance modeling to expose research challenges, quantify performance tradeoffs, and evaluate the resulting system. This aspect of the research is challenging and rewarding because it requires understanding the underlying system, the strengths and limitations of different modeling approaches developed by the modeling community, and how to best integrate these techniques into system software design. In some cases, my students and I have been able to use simple analytical models; recently, however, we have been relying on more sophisticated stochastic modeling techniques. We have also begun exploring the viability of using large-scale computational models to inform the design of HPC system software.

In this talk, I will discuss several systems research projects my students and I have conducted to meet HPC system software challenges in the areas of resilience, scheduling, and communication system design. In each of these areas, I will describe both the research itself and how modeling techniques have informed it. Finally, I will briefly discuss some new research directions we are currently exploring and offer some thoughts on the broader integration of modeling and evaluation in computer systems research and education.