SoCC'24 Queue Management for SLO-Oriented Large Language Model Serving
Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 15th Symposium on Cloud Computing (SoCC)
preprint | cite | Ongoing integration by IBM watsonx
DSN'24 iPrism: Characterize and Mitigate Risk by Quantifying Change in Escape Routes
Shengkun Cui, Saurabh Jha, Ziheng Chen, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf | slides | code | cite
DSN'24 Fault Localization Using Interventional Causal Learning for Cloud-Native Applications
Saurabh Jha, Jesus Rios, Frank Bagehorn, Larisa Shwartz, Naoki Abe
In Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S)
pdf | interview | product announcement | cite | Insights used by IBM Instana
DSN'24 When Green Computing Meets Performance and Resilience SLOs
Haoran Qiu, Weichao Mao, Chen Wang, Saurabh Jha, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer
In Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'24 Disrupt Track)
pdf | cite
ATC'24 Power-aware Deep Learning Model Serving with µ-Serve
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer
In Proceedings of the 2024 USENIX Annual Technical Conference (ATC '24)
preprint | cite | slides
CLOUD'24 SAM: Subseries Augmentation-based Meta-learning for Generalizing AIOps Model in Multi-Cloud Migration
Xi Yang, Paulito Palmes, Saurabh Jha, Bekir Turkkan, Gerard Vanloo, Frank Bagehorn, Chandra Narayanaswami, Larisa Shwartz, Naoki Abe, Yu Deng, Daby M. Sow
In Proceedings of the 17th International Conference on Cloud Computing (CLOUD)
pdf
AIOps'24 Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer
In Proceedings of the 5th International Workshop on Cloud Intelligence / AIOps at ASPLOS 2024 (AIOps 2024)
preprint | cite | code
AIOps'24 QLM: Queue Management for Large Language Model Serving
Archit Patke, Dhemath Reddy, Saurabh Jha, Christian Pinto, Haoran Qiu, Shengkun Cui, Chandra Narayanaswami, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 5th International Workshop on Cloud Intelligence / AIOps at ASPLOS 2024 (AIOps 2024)
paper | cite | Insights used by IBM watsonx
AAAI-24 Optimizing IT FinOps and Sustainability through Unsupervised Workload Characterization
Xi Yang, Rohan R. Arora, Saurabh Jha, Chandra Narayanaswami, Cheuk Lam, Jerrold Leichter, Yu Deng, Daby M. Sow
In The Thirty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)
pdf | slides | talk | doi | cite | Ongoing integration in IBM Turbonomic
IAAI'23 Fault Injection based Interventional Causal Learning for Distributed Applications
Qing Wang, Jesus Rios, Saurabh Jha, Karthikeyan Shanmugam, Frank Bagehorn, Xi Yang, Robert Filepp, Naoki Abe, Larisa Shwartz
In The Thirty-Fifth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)
cite
ASE'22 WOLFFI: A fault injection platform for learning AIOps models
Frank Bagehorn, Jesus Rios, Saurabh Jha, Robert Filepp, Larisa Shwartz, Naoki Abe, Xi Yang
In 2022 37th IEEE/ACM International Conference on Automated Software Engineering (ASE)
pdf | talk | cite
CLOUD'22 Localizing and Explaining Faults in Microservices using Distributed Tracing
Jesus Rios, Saurabh Jha, Larisa Shwartz
In Proceedings of the 15th International Conference on Cloud Computing (CLOUD)
pdf
DSN'22 Exploiting Temporal Data Diversity for Detecting Safety-critical Faults in AV Compute Systems
Saurabh Jha, Shengkun Cui, T. Tsai, S. K. S. Hari, M. B. Sullivan, Zbigniew T. Kalbarczyk, Steve Keckler, Ravishankar K. Iyer
In Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf | cite
COMPSYS'22 Evaluating hardware memory disaggregation under delay and contention
Archit Patke, Haoran Qiu, Saurabh Jha, Srikumar Venugopal, Michele Gazzetti, Christian Pinto, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPS-W)
Best Presentation
Patent Automated selection of performance monitors
Saurabh Jha, Amos A. Omokpo, Karthick Rajamani, Harigovind Venkatraj Ramasamy
In US Patent Office (USPTO), US16/818,656, Granted
pdf
Patent Hardware fault detection for feedback control systems in autonomous machine applications
Tim Tsai, Saurabh Jha, Siva Hari, Michael Sullivan
In US Patent Office (USPTO), US16/994,382, Granted
pdf
ML4AD'21 Watch out for the risky actors: Assessing risk in dynamic environments for safe driving
Saurabh Jha, Yan Miao, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In the Workshop on Machine Learning for Autonomous Driving colocated with NeurIPS
pdf | cite
WOSC'21 Is Function-as-a-Service a Good Fit for Latency-Critical Services?
Haoran Qiu, Saurabh Jha, Subho S. Banerjee, Archit Patke, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the Seventh International Workshop on Serverless Computing colocated with ACM/IFIP International Middleware Conference
pdf | talk | doi | cite | code
TOR'21 Data-Driven Application-Oriented Reliability Model of a High-Performance Computing System
Bentolhoda Jafary, Saurabh Jha, Lance Fiondella, Ravishankar K. Iyer
In IEEE Transactions on Reliability
pdf | doi | cite
ICS'21 Delay sensitivity-driven congestion mitigation for HPC systems
Archit Patke, Saurabh Jha, Haoran Qiu, Jim Brandt, Ann Gentile, Joe Greenseid, Zbigniew Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the ACM International Conference on Supercomputing
pdf | doi | cite
ASPLOS'21 BayesPerf: Minimizing Performance Monitoring Errors Using Bayesian Statistics
Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems
pdf | slides | talk | cite
SC'20 Live Forensics for HPC Systems: A Case Study on Distributed Storage Systems
Saurabh Jha, Shengkun Cui, Subho Banerjee, Tianyin Xu, Jeremy Enos, Mike Showerman, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
pdf | interview | product announcement | code | doi | cite | Integrated with IBM Instana | Best Student Paper Finalist | Best Paper Finalist
OSDI'20 FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-oriented Microservices
Haoran Qiu, Subho Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation
pdf | slides | talk | doi | cite
ISSRE'20 AV-FUZZER: Finding safety violations in autonomous driving systems
Guanpeng Li, Yiran Li, Saurabh Jha, T. Tsai, S. K. S. Hari, M. B. Sullivan, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the IEEE International Symposium on Software Reliability Engineering
pdf | slides | cite | code | Best Paper | CSL News
ICML'20 Inductive Bias-driven Reinforcement Learning For Efficient Schedules in Heterogeneous Clusters
Subho Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 37th International Conference on Machine Learning
pdf | cite | CSL News
DSN'20 ML-driven Malware that Targets AV Safety
Saurabh Jha, Shengkun Cui, Subho Banerjee, James Cyriac, T. Tsai, Zbigniew T. Kalbarczyk, Steve Keckler, Ravishankar K. Iyer
In Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf | cite
DSN'20 The Mystery of the Failing Jobs: Insights from Operational Data from Two University-Wide Computing Systems
Rakesh Kumar, Saurabh Jha, Ashraf Mahgoub, Zbigniew T. Kalbarczyk, William Kramer, Ravishankar K. Iyer, Saurabh Bagchi
In Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf | cite
NSDI'20 Measuring Congestion in High-Performance Datacenter Interconnects
Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Benjamin Lim, Mike Showerman, Greg Bauer, Larry Kaplan, Zbigniew Kalbarczyk, William Kramer, Ravi Iyer
In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)
pdf | slides | talk | doi | cite | code | data
HOTI'20 A Study of Network Congestion in Two Supercomputing High-Speed Interconnects
Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Mike Showerman, Eric Roman, Zbigniew Kalbarczyk, William Kramer, Ravi Iyer
In Proceedings of the IEEE 26th Annual Symposium on High-Performance Interconnects (HOTI 20)
pdf | slides | cite
DSN'19 ML-Based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection
Saurabh Jha, Subho Banerjee, T. Tsai, S. K. S. Hari, M. B. Sullivan, Zbigniew T. Kalbarczyk, Steve Keckler, Ravishankar K. Iyer
In Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf | doi | cite | Science Daily | Daily Illini | Guancha.cn | Space Daily | Eureka Alert | Sina
DSN'19 Towards a Bayesian Approach for Assessing Fault Tolerance of Deep Neural Networks
Subho Banerjee, James Cyriac, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks – Supplemental Volume (DSN-S)
pdf | doi | cite
DSN'18 Hands Off the Wheel in Autonomous Vehicles?: A Systems Perspective on over a Million Miles of Field Data
Subho Banerjee, Saurabh Jha, James Cyriac, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf | doi | cite | code
TDSC'18 Resiliency of HPC Interconnects: A Case Study of Interconnect Failures and Recovery in Blue Waters
Saurabh Jha, Valerio Formicola, Catello Di Martino, Mark Dalton, William Kramer, Zbigniew Kalbarczyk, Ravishankar K. Iyer
In IEEE Transactions on Dependable and Secure Computing
pdf | doi | cite
DSN'18 AVFI: Fault Injection for Autonomous Vehicles
Saurabh Jha, Subho Banerjee, James Cyriac, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
pdf | doi | cite | code
CLUSTER'17 Holistic Measurement-Driven System Assessment
Saurabh Jha, Jim Brandt, Ann Gentile, Zbigniew T. Kalbarczyk, Greg Bauer, Jeremy Enos, Mike Showerman, Larry Kaplan, Brett Bode, Annette Greiner, Amanda Bonnie, Mike Mason, Ravishankar K. Iyer, William Kramer
In the 2017 IEEE International Conference on Cluster Computing (CLUSTER)
pdf | cite
CUG'16 Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo
Valerio Formicola, Saurabh Jha, Daniel Chen, Fei Deng, Amanda Bonnie, Mike Mason, Jim Brandt, Ann Gentile, Larry Kaplan, Jason Repik, Jeremy Enos, Mike Showerman, Annette Greiner, Zbigniew Kalbarczyk, Ravishankar K. Iyer, William Kramer
In the 2016 Cray User Group (CUG)
pdf | cite
VLDB'15 Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach
Saurabh Jha, Bingsheng He, Mian Lu, Xuntao Cheng, Huynh Phung Huynh
In Proceedings of the VLDB Endowment (PVLDB), 2015
pdf | doi | cite | code
FTXS'15 LogDiver: A Tool for Measuring Resilience of Extreme-Scale Systems and Applications
Catello Di Martino, Saurabh Jha, William Kramer, Zbigniew Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale colocated with HPDC 2015
pdf | doi | cite
HPDC'13 P-HGRMS: A Parallel Hypergraph Based Root Mean Square Algorithm for Image Denoising
Tejaswi Agarwal, Saurabh Jha, Rajesh Kanna
In the 22nd ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC)
pdf | slides | poster | Best Poster