AIware 2025
Wed 19 - Thu 20 November 2025
co-located with ASE 2025

Software vulnerabilities pose a significant security concern, given the widespread reliance on software systems. In response, recent research has turned to Large Language Models (LLMs) as a means to automate vulnerability repair. However, most existing studies focus on specific backend programming languages, such as C/C++, Java, or Python, which limits our understanding of how LLMs perform on front-end programming languages, such as JavaScript, TypeScript, and PHP. This study investigates the effectiveness of three state-of-the-art language models, GPT-4.1, Claude Opus 4, and Gemini 2.5 Pro, in repairing vulnerabilities across these front-end programming languages, which are widely used in web development and frequently targeted in real-world exploits. To this end, we curated a dataset comprising 4,900 CVEs and 5,005 associated commits from 2,432 open-source projects spanning JavaScript, TypeScript, and PHP. The results indicate that GPT-4.1 is the most consistently effective model, while Claude Opus 4 often produces the most human-like patches. Our analysis highlights the strengths and limitations of each model, indicating that while LLMs hold promise for automated vulnerability repair, their effectiveness remains uneven across front-end languages.

Thu 20 Nov

Displayed time zone: Seoul

16:00 - 16:50
Evaluation Frameworks and Quantitative Assessment of LLMs (Part 2)
Main Track / Benchmark & Dataset Track at Grand Hall 1
Chair(s): Zhou Yang University of Alberta, Alberta Machine Intelligence Institute
16:00
8m
Talk
PromptExp: Multi-granularity Prompt Explanation of Large Language Models
Main Track
Ximing Dong Centre for Software Excellence, Huawei Canada, Shaowei Wang University of Manitoba, Dayi Lin Centre for Software Excellence, Huawei Canada, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei Canada, Ahmed E. Hassan Queen’s University
16:08
8m
Talk
Beyond Code Explanations: A Ray of Hope for Cross-Language Vulnerability Repair
Main Track
Kevin Lira North Carolina State University, Baldoino Fonseca Federal University of Alagoas, Wesley K.G. Assunção North Carolina State University, Davy Baía Federal University of Alagoas, Márcio Ribeiro Federal University of Alagoas, Brazil
Pre-print
16:16
8m
Talk
Secure Code Generation at Scale with Reflexion
Benchmark & Dataset Track
Arup Datta University of North Texas, Ahmed Aljohani University of North Texas, Hyunsook Do University of North Texas
Pre-print
16:24
5m
Talk
A Tool for Benchmarking Large Language Models' Robustness in Assessing the Realism of Driving Scenarios
Benchmark & Dataset Track
Jiahui Wu Simula Research Laboratory and University of Oslo, Chengjie Lu Simula Research Laboratory and University of Oslo, Aitor Arrieta Mondragon University, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University
Pre-print
16:29
21m
Live Q&A
Joint Q&A and Discussion #LLMAssessment
Main Track