AIware 2025
Wed 19 - Thu 20 November 2025
co-located with ASE 2025

This program is tentative and subject to change.


Wed 19 Nov

Displayed time zone: Seoul

09:00 - 09:20
09:20 - 10:30
AIware & Security (Main Track) at Grand Hall 4
09:20
8m
Research paper
CHASE: LLM Agents for Dissecting Malicious PyPI Packages
Main Track
Takaaki Toda Waseda University, Tatsuya Mori Waseda University
09:28
8m
Talk
CFCEval: Evaluating Security Aspects in Code Generated by Large Language Models
Main Track
Cheng Cheng Concordia University, Jinqiu Yang Concordia University
Pre-print
09:36
8m
Talk
Security in the Wild: An Empirical Analysis of LLM-Powered Applications and Local Inference Frameworks
Main Track
Julia Gomez-Rangel Texas A&M University - Corpus Christi, Young Lee Texas A & M University - San Antonio, Bozhen Liu Texas A&M University - Corpus Christi
Pre-print
09:44
8m
Talk
How Quantization Impacts Privacy Risk on LLMs for Code?
Main Track
Md Nazmul Haque North Carolina State University, Hua Yang North Carolina State University, Zhou Yang University of Alberta, Alberta Machine Intelligence Institute, Bowen Xu North Carolina State University
Pre-print
09:52
8m
Talk
Securing the Multi-Chain Ecosystem: A Unified, Agent-Based Framework for Vulnerability Repair in Solidity and Move
Main Track
Rabimba Karanjai University of Houston, Lei Xu Kent State University, Weidong Shi University of Houston
10:00
8m
Talk
SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for AI-Powered Software
Main Track
Wenliang Shan Monash University, Michael Fu The University of Melbourne, Rui Yang Monash University and Transurban, Kla Tantithamthavorn Monash University and Atlassian
Pre-print File Attached
10:10
20m
Live Q&A
Joint Q&A and Discussion #AISecurity
Main Track

16:00 - 16:50
Human Factors and Organizational Perspectives in AIware (Main Track) at Grand Hall 4
16:00
8m
Talk
Examining the Usage of Generative AI Models in Student Learning Activities for Software Programming
Main Track
Rufeng Chen McGill University, Shuaishuai Jiang, Jiyun Shen, AJung Moon McGill University, Lili Wei McGill University
16:08
8m
Talk
Human to Document, AI to Code: Three Case Studies of Comparing GenAI for Notebook Competitions
Main Track
Tasha Settewong Nara Institute of Science and Technology, Youmei Fan Nara Institute of Science and Technology, Raula Gaikovina Kula The University of Osaka, Kenichi Matsumoto Nara Institute of Science and Technology
Pre-print
16:16
8m
Talk
Judge the Votes: A System to Classify Bug Reports and Give Suggestions
Main Track
Emre Dinc Bilkent University, Eray Tüzün Bilkent University
Pre-print
16:24
8m
Talk
Model-Assisted and Human-Guided: Perceptions and Practices of Software Professionals Using LLMs for Coding
Main Track
Italo Santos University of Hawai‘i at Mānoa, Cleyton Magalhaes Universidade Federal Rural de Pernambuco, Ronnie de Souza Santos University of Calgary
16:32
18m
Live Q&A
Joint Discussion #HumanInTheLoop
Main Track

16:50 - 17:35
Emerging Frontiers and Applications of AIware (Main Track / ArXiv Track) at Grand Hall 4
16:50
5m
Talk
Envisioning Future Interactive Web Development: Editing Webpage with Natural Language
Main Track
Dang Truong Singapore Management University, Jingyu Xiao The Chinese University of Hong Kong, Yintong Huo Singapore Management University, Singapore
16:55
5m
Talk
On the Promises and Challenges of AI-Powered XR Glasses as Embodied Software
Main Track
Ruizhen Gu University of Sheffield, Jingqiong Zhang University of Sheffield, José Miguel Rojas University of Sheffield, Donghwan Shin University of Sheffield
Pre-print
17:00
5m
Talk
Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous Vehicles
ArXiv Track
Jiahui Wu Simula Research Laboratory and University of Oslo, Chengjie Lu Simula Research Laboratory and University of Oslo, Aitor Arrieta Mondragon University, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University
Pre-print
17:05
5m
Talk
Combining Reasoning Optimized LLMs and SMT Solvers for Automated Loop Invariant Synthesis
Main Track
17:10
5m
Talk
Evaluating Large Language Models for Code Translation: Effects of Prompt Language and Prompt Design
ArXiv Track
17:15
20m
Live Q&A
Joint Q&A and Discussion #AIwareApplications
Main Track

17:35 - 18:20

Thu 20 Nov

Displayed time zone: Seoul

10:30 - 11:50
LLM-Based Software Testing and Quality Assurance (Main Track) at Grand Hall 4
10:30
8m
Talk
Understanding the Characteristics of LLM-Generated Property-Based Tests in Exploring Edge Cases
Main Track
Hidetake Tanaka Nara Institute of Science and Technology, Haruto Tanaka Nara Institute of Science and Technology, Kazumasa Shimari Nara Institute of Science and Technology, Kenichi Matsumoto Nara Institute of Science and Technology
Pre-print
10:38
8m
Talk
Understanding LLM-Driven Test Oracle Generation
Main Track
Adam Bodicoat University of Auckland, Gunel Jahangirova King's College London, Valerio Terragni University of Auckland
10:46
8m
Talk
Turning Manual Tasks into Actions: Assessing the Effectiveness of Gemini-generated Selenium Tests
Main Track
Myron David Peixoto Federal University of Alagoas, Baldoino Fonseca Universidade Federal de Alagoas, Davy Baía Federal University of Alagoas, Kevin Lira North Carolina State University, Márcio Ribeiro Federal University of Alagoas, Brazil, Wesley K.G. Assunção North Carolina State University, Nathalia Nascimento Pennsylvania State University, Paulo Alencar University of Waterloo
File Attached
10:54
8m
Talk
Software Testing with Large Language Models: An Interview Study with Practitioners
Main Track
Maria Deolinda CESAR School, Cleyton Magalhaes Universidade Federal Rural de Pernambuco, Ronnie de Souza Santos University of Calgary
11:02
8m
Talk
HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation
Main Track
Rabimba Karanjai University of Houston, Lei Xu Kent State University, Weidong Shi University of Houston
11:10
8m
Research paper
Assertion-Aware Test Code Summarization with Large Language Models
Main Track
Anamul Haque Mollah University of North Texas, Ahmed Aljohani Rochester Institute of Technology, Hyunsook Do University of North Texas
File Attached
11:20
30m
Live Q&A
Joint Q&A and Discussion #LLMforTesting
Main Track

11:50 - 12:30
Future of AIware (Main Track / ArXiv Track) at Grand Hall 4
11:50
8m
Talk
The Future of Generative AI in Software Engineering: A Vision from Industry and Academia in the European GENIUS Project
Main Track
Robin Gröpler ifak - Institute for Automation and Communication, Magdeburg, Steffen Klepke Siemens AG, Jack Johns BT Group PLC, Andreas Dreschinski Akkodis, Klaus Schmid, Benedikt Dornauer University of Innsbruck; University of Cologne, Eray Tüzün Bilkent University, Joost Noppen, Mohammad Reza Mousavi King's College London, Yongjian Tang Siemens AG, Germany, Johannes Viehmann Fraunhofer FOKUS, Germany, Selin Şirin Aslangül, Beum Seuk Lee BT Group PLC, Adam Ziolkowski BT, Eric Zie
Pre-print
11:58
5m
Talk
Where Do LLMs Still Struggle? An In-Depth Analysis of Code Generation Benchmarks (short paper-benchmark)
Main Track

12:03
5m
Talk
Guidelines for Empirical Studies in Software Engineering involving Large Language Models
ArXiv Track
Sebastian Baltes Heidelberg University, Florian Angermeir fortiss GmbH, Chetan Arora Monash University, Marvin Muñoz Barón Technical University of Munich, Chunyang Chen TU Munich, Lukas Böhme Hasso Plattner Institute, University of Potsdam, Potsdam, Germany, Fabio Calefato University of Bari, Neil Ernst University of Victoria, Davide Falessi University of Rome Tor Vergata, Italy, Brian Fitzgerald Lero - The Irish Software Research Centre and University of Limerick, Davide Fucci Blekinge Institute of Technology, Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Stefano Lambiase Department of Computer Science, Aalborg University, Denmark, Daniel Russo Department of Computer Science, Aalborg University, Mircea Lungu IT University, Copenhagen, Lutz Prechelt Freie Universität Berlin, Paul Ralph Dalhousie University, Christoph Treude Singapore Management University, Stefan Wagner Technical University of Munich
Pre-print
12:08
22m
Live Q&A
Joint Q&A and Discussion #FutureofAIware
Main Track

15:00 - 15:29
Evaluation Frameworks and Quantitative Assessment of LLMs (Part 1) (Main Track) at Grand Hall 4
15:00
29m
Talk
Automated Extract Method Refactoring with Open-Source LLMs: A Comparative Study
Main Track
Sivajeet Chand Technical University of Munich, Melih Kilic Technical University of Munich, Roland Würsching Technical University of Munich, Sushant Kumar Pandey University of Groningen, The Netherlands, Alexander Pretschner TU Munich
Pre-print
16:00 - 16:50
Evaluation Frameworks and Quantitative Assessment of LLMs (Part 2) (Main Track) at Grand Hall 4
16:00
16m
Talk
Beyond Code Explanations: A Ray of Hope for Cross-Language Vulnerability Repair
Main Track
Kevin Lira North Carolina State University, Baldoino Fonseca Universidade Federal de Alagoas, Wesley K.G. Assunção North Carolina State University, Davy Baía Federal University of Alagoas, Márcio Ribeiro Federal University of Alagoas, Brazil
16:16
16m
Talk
PromptExp: Multi-granularity Prompt Explanation of Large Language Models
Main Track
Ximing Dong Centre for Software Excellence at Huawei Canada, Shaowei Wang University of Manitoba, Dayi Lin Centre for Software Excellence, Huawei Canada, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Ahmed E. Hassan Queen’s University
16:33
16m
Live Q&A
Joint Q&A and Discussion #LLMAssessment
Main Track

16:50 - 17:35
Responsible, Ethical, and Legal Dimensions of AIware (Main Track) at Grand Hall 4
16:50
8m
Talk
Neuro-Symbolic Compliance: Integrating LLMs and SMT for Automated Financial Legal Analysis
Main Track
Yung Shen HSIA National Chengchi University, Fang Yu National Chengchi University, Jie-Hong Roland Jiang National Taiwan University
File Attached
16:58
8m
Talk
Are We Aligned? A Preliminary Investigation of the Alignment of Responsible AI Values between LLMs and Human Judgment
Main Track
Asma Yamani King Fahd University of Petroleum and Minerals, Malak Baslyman King Fahd University of Petroleum & Minerals, Moataz Ahmed King Fahd University of Petroleum and Minerals
File Attached
17:06
5m
Talk
A Vision for Value-Aligned AI-Driven Systems
Main Track
Humphrey Obie Monash University
17:11
5m
Talk
Generative AI and Empirical Software Engineering: A Paradigm Shift
Main Track
Christoph Treude Singapore Management University, Margaret-Anne Storey University of Victoria
Pre-print
17:16
19m
Live Q&A
Joint Discussion #ResponsibleAI
Main Track

17:35 - 18:20
18:20 - 18:40
Awards and Closing (Main Track) at Grand Hall 4

Call for Papers

The AIware Datasets and Benchmarks track invites high-quality publications on datasets and benchmarks that are crucial for the development, evaluation, and continuous improvement of AIware. The track encourages submissions in the following areas:

  1. Data papers that include:
  • New datasets, or carefully and thoughtfully designed (collections of) datasets based on previously available data tailored for AIware.
  • Data generators and reinforcement learning environments.
  • Data-centric AI methods and tools, e.g. to measure and improve data quality or utility, or studies in data-centric AI that bring important new insights.
  • Advanced practices in data collection and curation are of general interest even if the data itself cannot be shared.
  • Frameworks for responsible dataset development, audits of existing datasets, and identifying significant problems with existing datasets and their use.
  • Tools and best practices to enhance dataset creation, documentation, metadata standards, ethical data handling (e.g., licensing, privacy), and accessibility.
  2. Benchmarking papers that include:
  • Benchmarks on new or existing metrics, as well as benchmarking tools.
  • Systematic analyses of existing systems on novel datasets that yield important new insights.
  • Meaningful benchmarks that drive progress in the performance, robustness, fairness, reliability, and usability of AIware tools.

Topics of interest

Topics of interest follow those of the main AIware conference, with an emphasis on the dataset and benchmark scope described above.

Submissions

The AIware 2025 Benchmark and Dataset track welcomes submissions from both academia and industry. At least one author of each accepted submission will be required to attend the conference and present the paper.

NEW:

  • Short papers: 4 pages, including references.
  • Long papers: 6–8 pages, including references.

At the time of submission, papers should disclose their (anonymized and curated) data/benchmarks to support reproducibility and replicability.

All submissions must be in English and in PDF format. The page limit is strict, and it will not be possible to purchase additional pages at any point in the process (including after acceptance).

Submission guidelines follow those of the AIware main track. Papers must be submitted electronically via the OpenReview platform through the following submission site: https://openreview.net/group?id=ACM.org/AIWare/2025/Data_and_Benchmark_Track

Authors must have active OpenReview accounts to submit. (Registering with an institutional email is recommended; otherwise it may take a couple of days for OpenReview to manually activate the account.) More information about OpenReview is provided on the AIware main track page.

Review and evaluation process

Authors are encouraged to follow a double-anonymous review process. However, single-anonymous submissions are also allowed, in which the authors' identities are revealed but the reviewers' are not.

Evaluation criteria:

For Data papers:

  • Novelty: originality of the dataset or tool and clarity of its relation to related work
  • Impact: value, usefulness, and reusability of the dataset or tool
  • Relevance: relevance of the proposed dataset or tool for the AIware audience
  • Presentation: quality of the presentation
  • Open Usage: accessibility of the dataset or tool, i.e., the data/tool can be found and obtained without a personal request, and any required code should be open source

For Benchmarking papers:

  • Novelty: originality of the underlying ideas and clarity of their relation to related work
  • Impact: the reach of the proposed tool, metric, or dataset and the usefulness of the results
  • Relevance: relevance of the proposed benchmark for the AIware audience
  • Presentation: quality of the presentation
  • Open Usage: accessibility of the datasets, metrics, or tools, i.e., the data/tool/metric can be found and obtained without a personal request, and any required code should be open source

Awards

AIware Distinguished Dataset (or Benchmark) Award: given to the best full-length paper accepted in the Benchmark and Dataset track.