AIware 2025
Wed 19 - Thu 20 November 2025
co-located with ASE 2025

This program is tentative and subject to change.

Abstract—Unit tests often lack concise summaries that convey test intent, especially in auto-generated or poorly documented codebases. Large Language Models (LLMs) offer a promising solution, but their effectiveness depends heavily on how they are prompted. Unlike generic code summarization, test-code summarization poses distinct challenges because test methods validate expected behavior through assertions rather than implementing functionality. This paper presents a new benchmark of 91 real-world Java test cases paired with developer-written summaries and conducts a controlled ablation study to investigate how test-code-related components, such as the method under test (MUT), assertion messages, and assertion semantics, affect the performance of LLM-generated test summaries. We evaluate four code LLMs (Codex, Codestral, DeepSeek, and Qwen-Coder) across seven prompt configurations using n-gram metrics (BLEU, ROUGE-L, METEOR), semantic similarity (BERTScore), and LLM-based evaluation. Results show that prompting with assertion semantics improves summary quality by an average of 0.10 points (2.3%) over full MUT context (4.45 vs. 4.35) while requiring fewer input tokens. Codex and Qwen-Coder achieve the highest alignment with human-written summaries, while DeepSeek underperforms despite high lexical overlap. The replication package is publicly available at https://doi.org/10.5281/zenodo.17067550.
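To make the abstract's distinction concrete, below is a minimal, hypothetical JUnit 5 test of the kind the benchmark pairs with developer-written summaries. All identifiers (StackBehaviorTest, popReturnsLastPushedElement) are illustrative and not taken from the paper's dataset; the assertions, their messages, and the one-line summary comment correspond to the kinds of prompt components the ablation varies.

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

import org.junit.jupiter.api.Test;

class StackBehaviorTest {

    // Developer-written summary (the reference an LLM-generated summary
    // would be compared against):
    // "Verifies that pop returns the most recently pushed element and that
    //  popping an empty stack fails."
    @Test
    void popReturnsLastPushedElement() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);

        // The assertions and their messages encode the expected behavior;
        // the study asks whether surfacing this information to the LLM,
        // rather than the full method under test, yields better summaries.
        assertEquals(2, stack.pop(), "pop should return the last pushed element");
        assertThrows(NoSuchElementException.class,
                () -> new ArrayDeque<Integer>().pop(),
                "popping an empty stack should fail");
    }
}

Note that the test implements no functionality of its own: its intent lives almost entirely in the assertions and their messages, which is why assertion-aware prompting can match or beat full-MUT context with fewer input tokens.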

Assertion-Aware Test Code Summarization with Large Language Models (Assertion_Aware_Test_Code_Summarization_with_Large_Language_Models.pdf, 810 KiB)


Thu 20 Nov

Displayed time zone: Seoul

10:30 - 11:50
LLM-Based Software Testing and Quality Assurance
Main Track at Grand Hall 4
10:30
8m
Talk
Understanding the Characteristics of LLM-Generated Property-Based Tests in Exploring Edge Cases
Main Track
Hidetake Tanaka Nara Institute of Science and Technology, Haruto Tanaka Nara Institute of Science and Technology, Kazumasa Shimari Nara Institute of Science and Technology, Kenichi Matsumoto Nara Institute of Science and Technology
Pre-print
10:38
8m
Talk
Understanding LLM-Driven Test Oracle Generation
Main Track
Adam Bodicoat University of Auckland, Gunel Jahangirova King's College London, Valerio Terragni University of Auckland
10:46
8m
Talk
Turning Manual Tasks into Actions: Assessing the Effectiveness of Gemini-generated Selenium Tests
Main Track
Myron David Peixoto Federal University of Alagoas, Baldoino Fonseca Federal University of Alagoas, Davy Baía Federal University of Alagoas, Kevin Lira North Carolina State University, Márcio Ribeiro Federal University of Alagoas, Wesley K.G. Assunção North Carolina State University, Nathalia Nascimento Pennsylvania State University, Paulo Alencar University of Waterloo
File Attached
10:54
8m
Talk
Software Testing with Large Language Models: An Interview Study with Practitioners
Main Track
Maria Deolinda CESAR School, Cleyton Magalhaes Universidade Federal Rural de Pernambuco, Ronnie de Souza Santos University of Calgary
11:02
8m
Talk
HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation
Main Track
Rabimba Karanjai University of Houston, Lei Xu Kent State University, Weidong Shi University of Houston
11:10
8m
Research paper
Assertion-Aware Test Code Summarization with Large Language Models
Main Track
Anamul Haque Mollah University of North Texas, Ahmed Aljohani Rochester Institute of Technology, Hyunsook Do University of North Texas
File Attached
11:20
30m
Live Q&A
Joint Q&A and Discussion #LLMforTesting
Main Track