Beyond Code Explanations: A Ray of Hope for Cross-Language Vulnerability Repair
Software vulnerabilities pose a significant security threat, given the widespread reliance on software systems. In response, recent research has turned to Large Language Models (LLMs) as a means of automating vulnerability repair. However, most existing studies focus on back-end programming languages such as C/C++, Java, or Python, which limits our understanding of how LLMs perform on front-end programming languages such as JavaScript, TypeScript, and PHP. This study investigates the effectiveness of three state-of-the-art LLMs, GPT-4.1, Claude Opus 4, and Gemini 2.5 Pro, in repairing vulnerabilities across these front-end languages, which are widely used in web development and frequently targeted in real-world exploits. To this end, we curated a dataset of 4,900 CVEs and 5,005 associated commits drawn from 2,432 open-source projects spanning JavaScript, TypeScript, and PHP. Our results indicate that GPT-4.1 is the most consistently effective model, while Claude Opus 4 often produces the most human-like patches. Our analysis highlights the strengths and limitations of each model, showing that while LLMs hold promise for automated vulnerability repair, their effectiveness remains uneven across front-end languages.