SoCC'24
|
Queue Management for SLO-Oriented Large Language Model Serving
Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 15th Symposium on Cloud Computing (SoCC)
preprint, cite; Ongoing integration by IBM watsonx |
DSN'24
|
iPrism: Characterize and Mitigate Risk by Quantifying Change in Escape Routes
Shengkun Cui, Saurabh Jha, Ziheng Chen, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf, slides, code, cite |
DSN'24
|
Fault Localization Using Interventional Causal Learning for Cloud-Native Applications
Saurabh Jha, Jesus Rios, Frank Bagehorn, Larisa Shwartz, Naoki Abe
In Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S)
pdf, interview, product announcement, cite; Insights used by IBM Instana |
DSN'24
|
When Green Computing Meets Performance and Resilience SLOs
Haoran Qiu, Weichao Mao, Chen Wang, Saurabh Jha, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer
In Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'24 Disrupt Track)
pdf, cite |
ATC'24
|
Power-aware Deep Learning Model Serving with µ-Serve
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer
In Proceedings of the 2024 USENIX Annual Technical Conference (ATC '24)
preprint, cite, slides |
CLOUD'24
|
SAM: Subseries Augmentation-based Meta-learning for Generalizing AIOps Model in Multi-Cloud Migration
Xi Yang, Paulito Palmes, Saurabh Jha, Bekir Turkkan, Gerard Vanloo, Frank Bagehorn, Chandra Narayanaswami, Larisa Shwartz, Naoki Abe, Yu Deng, Daby M. Sow
In Proceedings of the 17th International Conference on Cloud Computing (CLOUD)
pdf |
AIOps'24
|
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer
In Proceedings of the 5th International Workshop on Cloud Intelligence / AIOps at ASPLOS 2024 (AIOps 2024)
preprint, cite, code |
AIOps'24
|
QLM: Queue Management for Large Language Model Serving
Archit Patke, Dhemath Reddy, Saurabh Jha, Christian Pinto, Haoran Qiu, Shengkun Cui, Chandra Narayanaswami, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 5th International Workshop on Cloud Intelligence / AIOps at ASPLOS 2024 (AIOps 2024)
paper, cite; Insights used by IBM watsonx |
AAAI-24
|
Optimizing IT FinOps and Sustainability through Unsupervised Workload Characterization
Xi Yang, Rohan R. Arora, Saurabh Jha, Chandra Narayanaswami, Cheuk Lam, Jerrold Leichter, Yu Deng, Daby M. Sow
In The Thirty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)
pdf, slides, talk, doi, cite; Ongoing integration in IBM Turbonomic |
IAAI'23
|
Fault Injection based Interventional Causal Learning for Distributed Applications
Qing Wang, Jesus Rios, Saurabh Jha, Karthikeyan Shanmugam, Frank Bagehorn, Xi Yang, Robert Filepp, Naoki Abe, Larisa Shwartz
In The Thirty-Fifth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)
cite |
ASE'22
|
WOLFFI: A fault injection platform for learning AIOps models
Frank Bagehorn, Jesus Rios, Saurabh Jha, Robert Filepp, Larisa Shwartz, Naoki Abe, Xing Yang
In 2022 37th IEEE/ACM International Conference on Automated Software Engineering (ASE)
pdf, talk, cite |
CLOUD'22
|
Localizing and Explaining Faults in Microservices using Distributed Tracing
Jesus Rios, Saurabh Jha, Larisa Shwartz
In Proceedings of the 15th International Conference on Cloud Computing (CLOUD)
pdf |
DSN'22
|
Exploiting Temporal Data Diversity for Detecting Safety-critical Faults in AV Compute Systems
Saurabh Jha, Shengkun Cui, T. Tsai, S. K. S. Hari, M. B. Sullivan, Zbigniew T. Kalbarczyk, Steve Keckler, Ravishankar K. Iyer
In Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf, cite |
COMPSYS'22
|
Evaluating hardware memory disaggregation under delay and contention
Archit Patke, Haoran Qiu, Saurabh Jha, Srikumar Venugopal, Michele Gazzetti, Christian Pinto, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPS-W)
Best Presentation |
Patent
|
Automated selection of performance monitors
Saurabh Jha, Amos A. Omokpo, Karthick Rajamani, Harigovind Venkatraj Ramasamy
In US Patent Office (USPTO), US16/818,656, Granted
pdf |
Patent
|
Hardware fault detection for feedback control systems in autonomous machine applications
Tim Tsai, Saurabh Jha, Siva Hari, Michael Sullivan
In US Patent Office (USPTO), US16/994,382, Granted
pdf |
ML4AD'21
|
Watch out for the risky actors: Assessing risk in dynamic environments for safe driving
Saurabh Jha, Yan Miao, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In the Workshop on Machine Learning for Autonomous Driving colocated with NeurIPS
pdf, cite |
WOSC'21
|
Is Function-as-a-Service a Good Fit for Latency-Critical Services?
Haoran Qiu, Saurabh Jha, Subho S. Banerjee, Archit Patke, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the Seventh International Workshop on Serverless Computing colocated with ACM/IFIP International Middleware Conference
pdf, talk, doi, cite, code |
TOR'21
|
Data-Driven Application-Oriented Reliability Model of a High-Performance Computing System
Bentolhoda Jafary, Saurabh Jha, Lance Fiondella, Ravishankar K. Iyer
In IEEE Transactions on Reliability
pdf, doi, cite |
ICS'21
|
Delay sensitivity-driven congestion mitigation for HPC systems
Archit Patke, Saurabh Jha, Haoran Qiu, Jim Brandt, Ann Gentile, Joe Greenseid, Zbigniew Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the ACM International Conference on Supercomputing
pdf, doi, cite |
ASPLOS'21
|
BayesPerf: Minimizing Performance Monitoring Errors Using Bayesian Statistics
Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems
pdf, slides, talk, cite |
SC'20
|
Live Forensics for HPC Systems: A Case Study on Distributed Storage Systems
Saurabh Jha, Shengkun Cui, Subho Banerjee, Tianyin Xu, Jeremy Enos, Mike Showerman, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the SC20: International Conference for High Performance Computing, Networking, Storage and Analysis
pdf, interview, product announcement, code, doi, cite; Integrated with IBM Instana; Best Student Paper Finalist; Best Paper Finalist |
OSDI'20
|
FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-oriented Microservices
Haoran Qiu, Subho Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation
pdf, slides, talk, doi, cite |
ISSRE'20
|
AV-FUZZER: Finding safety violations in autonomous driving systems
Guanpeng Li, Yiran Li, Saurabh Jha, T. Tsai, S. K. S. Hari, M. B. Sullivan, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the IEEE International Symposium on Software Reliability Engineering (ISSRE)
pdf, slides, cite, code; Best Paper; CSL News |
ICML'20
|
Inductive Bias-driven Reinforcement Learning For Efficient Schedules in Heterogeneous Clusters
Subho Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 37th International Conference on Machine Learning
pdf, cite; CSL News |
DSN'20
|
ML-driven Malware that Targets AV Safety
Saurabh Jha, Shengkun Cui, Subho Banerjee, James Cyriac, T. Tsai, Zbigniew T. Kalbarczyk, Steve Keckler, Ravishankar K. Iyer
In Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf, cite |
DSN'20
|
The Mystery of the Failing Jobs: Insights from Operational Data from Two University-Wide Computing Systems
Rakesh Kumar, Saurabh Jha, Ashraf Mahgoub, Zbigniew T. Kalbarczyk, William Kramer, Ravishankar K. Iyer, Saurabh Bagchi
In Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf, cite |
NSDI'20
|
Measuring Congestion in High-Performance Datacenter Interconnects
Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Benjamin Lim, Mike Showerman, Greg Bauer, Larry Kaplan, Zbigniew Kalbarczyk, William Kramer, Ravi Iyer
In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)
pdf, slides, talk, doi, cite, code, data |
HOTI'20
|
A Study of Network Congestion in Two Supercomputing High-Speed Interconnects
Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Mike Showerman, Eric Roman, Zbigniew Kalbarczyk, William Kramer, Ravi Iyer
In Proceedings of the IEEE 26th Annual Symposium on High-Performance Interconnects (HOTI 20)
pdf, slides, cite |
DSN'19
|
ML-Based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection
Saurabh Jha, Subho Banerjee, T. Tsai, S. K. S. Hari, M. B. Sullivan, Zbigniew T. Kalbarczyk, Steve Keckler, Ravishankar K. Iyer
In Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf, doi, cite; Science Daily, Daily Illini, Guancha.cn, Space Daily, Eureka Alert, Sina |
DSN'19
|
Towards a Bayesian Approach for Assessing Fault Tolerance of Deep Neural Networks
Subho Banerjee, James Cyriac, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks – Supplemental Volume (DSN-S)
pdf, doi, cite |
DSN'18
|
Hands Off the Wheel in Autonomous Vehicles?: A Systems Perspective on over a Million Miles of Field Data
Subho Banerjee, Saurabh Jha, James Cyriac, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
pdf, doi, cite, code |
TDSC'18
|
Resiliency of HPC Interconnects: A Case Study of Interconnect Failures and Recovery in Blue Waters
Saurabh Jha, Valerio Formicola, Catello Di Martino, Mark Dalton, William Kramer, Zbigniew Kalbarczyk, Ravishankar K. Iyer
In IEEE Transactions on Dependable and Secure Computing
pdf, doi, cite |
DSN'18
|
AVFI: Fault Injection for Autonomous Vehicles
Saurabh Jha, Subho Banerjee, James Cyriac, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
pdf, doi, cite, code |
CLUSTER'17
|
Holistic Measurement-Driven System Assessment
Saurabh Jha, Jim Brandt, Ann Gentile, Zbigniew T. Kalbarczyk, Greg Bauer, Jeremy Enos, Mike Showerman, Larry Kaplan, Brett Bode, Annette Greiner, Amanda Bonnie, Mike Mason, Ravishankar K. Iyer, William Kramer
In the 2017 IEEE International Conference on Cluster Computing (CLUSTER)
pdf, cite |
CUG'16
|
Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo
Valerio Formicola, Saurabh Jha, Daniel Chen, Fei Deng, Amanda Bonnie, Mike Mason, Jim Brandt, Ann Gentile, Larry Kaplan, Jason Repik, Jeremy Enos, Mike Showerman, Annette Greiner, Zbigniew Kalbarczyk, Ravishankar K. Iyer, William Kramer
In the 2016 Cray User Group (CUG)
pdf, cite |
VLDB'15
|
Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach
Saurabh Jha, Bingsheng He, Mian Lu, Xuntao Cheng, Huynh Phung Huynh
In Proceedings of the VLDB Endowment (VLDB), 2015
pdf, doi, cite, code |
FTXS'15
|
LogDiver: A Tool for Measuring Resilience of Extreme-Scale Systems and Applications
Catello Di Martino, Saurabh Jha, William Kramer, Zbigniew Kalbarczyk, Ravishankar K. Iyer
In Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale colocated with HPDC 2015
pdf, doi, cite |
HPDC'13
|
P-HGRMS: A Parallel Hypergraph Based Root Mean Square Algorithm for Image Denoising
Tejaswi Agarwal, Saurabh Jha, Rajesh Kanna
In the 22nd ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC)
pdf, slides, poster; Best Poster |