A Tool for Benchmarking Large Language Models' Robustness in Assessing the Realism of Driving Scenarios
Thu 20 Nov 2025, 16:24 - 16:29 at Grand Hall 1, session "Evaluation Frameworks, and Quantitative Assessment of LLMs (Part 2)". Chair(s): Zhou Yang
Thu 20 Nov
Displayed time zone: Seoul
16:00 - 16:50 | Evaluation Frameworks, and Quantitative Assessment of LLMs (Part 2) | Main Track / Benchmark & Dataset Track | Grand Hall 1
Chair(s): Zhou Yang (University of Alberta, Alberta Machine Intelligence Institute)
16:00 | 8m Talk | PromptExp: Multi-granularity Prompt Explanation of Large Language Models | Main Track
Ximing Dong (Centre for Software Excellence at Huawei Canada), Shaowei Wang (University of Manitoba), Dayi Lin (Centre for Software Excellence, Huawei Canada), Gopi Krishnan Rajbahadur (Centre for Software Excellence, Huawei, Canada), Ahmed E. Hassan (Queen’s University)
16:08 | 8m Talk | Beyond Code Explanations: A Ray of Hope for Cross-Language Vulnerability Repair | Main Track
Kevin Lira (North Carolina State University), Baldoino Fonseca (Universidade Federal de Alagoas), Wesley K.G. Assunção (North Carolina State University), Davy Baía (Federal University of Alagoas), Márcio Ribeiro (Federal University of Alagoas, Brazil) | Pre-print
16:16 | 8m Talk | Secure Code Generation at Scale with Reflexion | Benchmark & Dataset Track
Arup Datta (University of North Texas), Ahmed Aljohani (University of North Texas), Hyunsook Do (University of North Texas) | Pre-print
16:24 | 5m Talk | A Tool for Benchmarking Large Language Models' Robustness in Assessing the Realism of Driving Scenarios | Benchmark & Dataset Track
Jiahui Wu (Simula Research Laboratory and University of Oslo), Chengjie Lu (Simula Research Laboratory and University of Oslo), Aitor Arrieta (Mondragon University), Shaukat Ali (Simula Research Laboratory and Oslo Metropolitan University) | Pre-print
16:29 | 21m Live Q&A | Joint Q&A and Discussion #LLMAssessment | Main Track