SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks
Thu 20 Nov 2025 15:24 - 15:29 at Grand Hall 1 - Evaluation Frameworks, and Quantitative Assessment of LLMs (Part 1) Chair(s): Zhou Yang
AI coding agents have shown great progress on Python software engineering benchmarks like SWE-Bench, and on other languages such as Java and C in benchmarks like Multi-SWE-Bench. However, C#, a prominent enterprise language ranking #5 in the TIOBE index, remains absent from such benchmarks. We introduce SWE-Sharp-Bench, a reproducible software engineering benchmark for C# featuring 150 instances from 17 repositories. Evaluating identical model-agent configurations across languages reveals a significant performance gap: while 70% of Python tasks in SWE-Bench Verified are solved, only 40% of our C# tasks are resolved. We open-source SWE-Sharp-Bench and our entire curation pipeline.
Thu 20 Nov
Displayed time zone: Seoul
15:00 - 15:29 | Evaluation Frameworks, and Quantitative Assessment of LLMs (Part 1) | Benchmark & Dataset Track / Main Track | Grand Hall 1 | Chair(s): Zhou Yang (University of Alberta, Alberta Machine Intelligence Institute)
15:00 | 8m Talk | Automated Extract Method Refactoring with Open-Source LLMs: A Comparative Study | Main Track | Sivajeet Chand (Technical University of Munich), Melih Kilic (Technical University of Munich), Roland Würsching (Technical University of Munich), Sushant Kumar Pandey (University of Groningen, The Netherlands), Alexander Pretschner (TU Munich) | Pre-print
15:08 | 8m Talk | Benchmarking Web API Integration Code Generation | Benchmark & Dataset Track | Daniel Maninger (TU Darmstadt), Leon Chemnitz (TU Darmstadt), Amir Molzam Sharifloo, Mira Mezini (TU Darmstadt; hessian.AI; National Research Center for Applied Cybersecurity ATHENE) | Pre-print
15:16 | 8m Talk | From Search to Reasoning: A Five-Level RAG Capability Framework for Enterprise Data | Benchmark & Dataset Track | Gurbinder Gill, Ritvik Gupta (Carnegie Mellon University, USA), Denis Lusson, Anand Chandrashekar, Donald Nguyen (Corvic AI) | Pre-print
15:24 | 5m Talk | SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks | Benchmark & Dataset Track | Sanket Mhatre (Microsoft), Yasharth Bajpai (Microsoft), Sumit Gulwani (Microsoft), Emerson Murphy-Hill (Microsoft), Gustavo Soares (Microsoft) | Pre-print