This program is tentative and subject to change.
Wed 19 Nov (displayed time zone: Seoul)
09:00 - 09:20
17:35 - 18:20
Thu 20 Nov (displayed time zone: Seoul)
10:30 - 11:50
10:30 | 8m Talk | Understanding the Characteristics of LLM-Generated Property-Based Tests in Exploring Edge Cases | Main Track | Hidetake Tanaka (Nara Institute of Science and Technology), Haruto Tanaka (Nara Institute of Science and Technology), Kazumasa Shimari (Nara Institute of Science and Technology), Kenichi Matsumoto (Nara Institute of Science and Technology) | Pre-print
10:38 | 8m Talk | Understanding LLM-Driven Test Oracle Generation | Main Track | Adam Bodicoat (University of Auckland), Gunel Jahangirova (King's College London), Valerio Terragni (University of Auckland)
10:46 | 8m Talk | Turning Manual Tasks into Actions: Assessing the Effectiveness of Gemini-generated Selenium Tests | Main Track | Myron David Peixoto (Federal University of Alagoas), Baldoino Fonseca (Universidade Federal de Alagoas), Davy Baía (Federal University of Alagoas), Kevin Lira (North Carolina State University), Márcio Ribeiro (Federal University of Alagoas, Brazil), Wesley K.G. Assunção (North Carolina State University), Nathalia Nascimento (Pennsylvania State University), Paulo Alencar (University of Waterloo) | File Attached
10:54 | 8m Talk | Software Testing with Large Language Models: An Interview Study with Practitioners | Main Track | Maria Deolinda (CESAR School), Cleyton Magalhaes (Universidade Federal Rural de Pernambuco), Ronnie de Souza Santos (University of Calgary)
11:02 | 8m Talk | HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation | Main Track | Rabimba Karanjai (University of Houston), Lei Xu (Kent State University), Weidong Shi (University of Houston)
11:10 | 8m Research paper | Assertion-Aware Test Code Summarization with Large Language Models | Main Track | Anamul Haque Mollah (University of North Texas), Ahmed Aljohani (Rochester Institute of Technology), Hyunsook Do (University of North Texas) | File Attached
11:20 | 30m Live Q&A | Joint Q&A and Discussion #LLMforTesting | Main Track
11:50 - 12:30
11:50 | 8m Talk | The Future of Generative AI in Software Engineering: A Vision from Industry and Academia in the European GENIUS Project | Main Track | Robin Gröpler (ifak - Institute for Automation and Communication, Magdeburg), Steffen Klepke (Siemens AG), Jack Johns (BT Group PLC), Andreas Dreschinski (Akkodis), Klaus Schmid, Benedikt Dornauer (University of Innsbruck; University of Cologne), Eray Tüzün (Bilkent University), Joost Noppen, Mohammad Reza Mousavi (King's College London), Yongjian Tang (Siemens AG, Germany), Johannes Viehmann (Fraunhofer FOKUS, Germany), Selin Şirin Aslangül, Beum Seuk Lee (BT Group PLC), Adam Ziolkowski (BT), Eric Zie | Pre-print
11:58 | 5m Talk | Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks (short paper-benchmark) | Main Track
12:03 | 5m Talk | Guidelines for Empirical Studies in Software Engineering involving Large Language Models | ArXiv Track | Sebastian Baltes (Heidelberg University), Florian Angermeir (fortiss GmbH), Chetan Arora (Monash University), Marvin Muñoz Barón (Technical University of Munich), Chunyang Chen (TU Munich), Lukas Böhme (Hasso Plattner Institute, University of Potsdam, Potsdam, Germany), Fabio Calefato (University of Bari), Neil Ernst (University of Victoria), Davide Falessi (University of Rome Tor Vergata, Italy), Brian Fitzgerald (Lero - The Irish Software Research Centre and University of Limerick), Davide Fucci (Blekinge Institute of Technology), Marcos Kalinowski (Pontifical Catholic University of Rio de Janeiro (PUC-Rio)), Stefano Lambiase (Department of Computer Science, Aalborg University, Denmark), Daniel Russo (Department of Computer Science, Aalborg University), Mircea Lungu (IT University, Copenhagen), Lutz Prechelt (Freie Universität Berlin), Paul Ralph (Dalhousie University), Christoph Treude (Singapore Management University), Stefan Wagner (Technical University of Munich) | Pre-print
12:08 | 22m Live Q&A | Joint Q&A and Discussion #FutureofAIware | Main Track
15:00 - 15:29
15:00 | 29m Talk | Automated Extract Method Refactoring with Open-Source LLMs: A Comparative Study | Main Track | Sivajeet Chand (Technical University of Munich), Melih Kilic (Technical University of Munich), Roland Würsching (Technical University of Munich), Sushant Kumar Pandey (University of Groningen, The Netherlands), Alexander Pretschner (TU Munich) | Pre-print
16:00 - 16:50
16:00 | 16m Talk | Beyond Code Explanations: A Ray of Hope for Cross-Language Vulnerability Repair | Main Track | Kevin Lira (North Carolina State University), Baldoino Fonseca (Universidade Federal de Alagoas), Wesley K.G. Assunção (North Carolina State University), Davy Baía (Federal University of Alagoas), Márcio Ribeiro (Federal University of Alagoas, Brazil)
16:16 | 16m Talk | PromptExp: Multi-granularity Prompt Explanation of Large Language Models | Main Track | Ximing Dong (Centre for Software Excellence at Huawei Canada), Shaowei Wang (University of Manitoba), Dayi Lin (Centre for Software Excellence, Huawei Canada), Gopi Krishnan Rajbahadur (Centre for Software Excellence, Huawei, Canada), Ahmed E. Hassan (Queen’s University)
16:33 | 16m Live Q&A | Joint Q&A and Discussion #LLMAssessment | Main Track
16:50 - 17:35
16:50 | 8m Talk | Neuro-Symbolic Compliance: Integrating LLMs and SMT for Automated Financial Legal Analysis | Main Track | Yung Shen Hsia (National Chengchi University), Fang Yu (National Chengchi University), Jie-Hong Roland Jiang (National Taiwan University) | File Attached
16:58 | 8m Talk | Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment | Main Track | Asma Yamani (King Fahd University of Petroleum and Minerals), Malak Baslyman (King Fahd University of Petroleum & Minerals), Moataz Ahmed (King Fahd University of Petroleum and Minerals) | File Attached
17:06 | 5m Talk | A Vision for Value-Aligned AI-Driven Systems | Main Track | Humphrey Obie (Monash University)
17:11 | 5m Talk | Generative AI and Empirical Software Engineering: A Paradigm Shift | Main Track | Pre-print
17:16 | 19m Live Q&A | Joint Discussion #ResponsibleAI | Main Track
17:35 - 18:20
18:20 - 18:40
Call for Papers
The AIWare Datasets and Benchmarks track invites high quality publications on highly valuable datasets and benchmarks crucial for the development and continuous improvement of AIware. Such datasets and benchmarks are essential for development and evaluation of AIware and their evolution. This track encourages high quality datasets and benchmarks for development and assessment of AIware in the following areas:
- Data papers that include:
- New datasets, or carefully designed (collections of) datasets based on previously available data, tailored for AIware.
- Data generators and reinforcement learning environments.
- Data-centric AI methods and tools, e.g., to measure and improve data quality or utility, or studies in data-centric AI that bring important new insights.
- Advanced practices in data collection and curation are of general interest even if the data itself cannot be shared.
- Frameworks for responsible dataset development, audits of existing datasets, and identifying significant problems with existing datasets and their use.
- Tools and best practices to enhance dataset creation, documentation, metadata standards, ethical data handling (e.g., licensing, privacy), and accessibility.
- Benchmarking papers are expected to include:
- Benchmarks on new or existing metrics, as well as benchmarking tools.
- Systematic analyses of existing systems on novel datasets that yield important new insights.
- Meaningful benchmarks that drive progress in the performance, robustness, fairness, reliability, and usability of AIware tools.
Topics of interest
Topics of interest fall under those of the AIware conference, with an emphasis on the scope for dataset and benchmark papers described above.
Submissions
The AIware 2025 Datasets and Benchmarks track welcomes submissions from both academia and industry. At least one author of each accepted submission will be required to attend the conference and present the paper.
NEW:
- Short papers: up to 4 pages, including references.
- Long papers: 6-8 pages, including references.
At the time of submission, papers should disclose their (anonymized and curated) data/benchmarks to support reproducibility and replicability.
All submissions must be in English and in PDF format. The page limit is strict, and it will not be possible to purchase additional pages at any point in the process (including after acceptance).
Submission guidelines follow those of the AIware conference main track. Papers must be submitted electronically via the OpenReview platform at the following submission site: https://openreview.net/group?id=ACM.org/AIWare/2025/Data_and_Benchmark_Track
Authors are required to have active OpenReview accounts for submission. (An institutional email address is recommended for registration; otherwise, it might take a couple of days for OpenReview to manually activate the account.) More information about OpenReview is provided on the AIware conference main track page.
Review and evaluation process
Authors are encouraged to prepare their submissions for double-anonymous review. However, single-anonymous review is also allowed, in which the authors’ identities are revealed but the reviewers’ identities are not.
Evaluation criteria:
For Data papers:
- Novelty: originality of the dataset or tool and clarity of its relation to related work
- Impact: value, usefulness, and reusability of the dataset or tool
- Relevance: the relevance of the proposed dataset or tool for the AIware audience
- Presentation: quality of the presentation
- Open Usage: accessibility of the datasets or tool, i.e., the data/tool can be found and obtained without a personal request, and any required code should be open source
For Benchmarking papers:
- Novelty: the originality of the underlying ideas and clarity of their relation to related work
- Impact: the potential reach of the proposed tool, metric, or dataset and the usefulness of the results
- Relevance: the relevance of the proposed benchmark for the AIware audience
- Presentation: the quality of the presentation
- Open Usage: accessibility of the datasets, metrics, or tools, i.e., the data/tool/metric can be found and obtained without a personal request, and any required code should be open source
Awards
AIware Distinguished Dataset (or Benchmark) Award: given to the best full-length paper accepted in the Datasets and Benchmarks track.