Dr. Salah Bouktif
Wed, 12 November 2025
Title of the research page: Enhancing LLM Code Generation: A Systematic Evaluation of Multi-Agent Collaboration and Runtime Debugging for Accuracy, Reliability, and Latency
The integration of Large Language Models (LLMs) and Generative AI (Gen AI) into complex
software engineering (SE) tasks is promising: automated code synthesis and program repair
are already transforming the field. Achieving reliable results remains challenging,
however. Current LLMs have limited reasoning ability, often fail to generate correct
code for non-trivial programming tasks, and may produce unfounded output, known as
"hallucinations."
Dr. Bouktif, from the Department of Computer Science and Software Engineering, shares the belief that applying LLMs to complex, repository-level SE activities requires an in-depth understanding of code dependencies across thousands of modules and lines of code, an area that has been relatively understudied. To address these reliability issues, Dr. Bouktif and his collaborators are evaluating methods that help LLMs generate high-quality code through multi-agent collaboration and runtime debugging.
Their recent project proposes a framework that integrates two complementary inference-time strategies: multi-agent collaboration (ACT) followed by runtime execution-based debugging (Debugger). The ACT phase leverages specialized LLM agents (Analyst, Coder, Tester) guided by specific roles to mirror the collaborative division of labor seen in human software development teams. Following this process-oriented collaboration, the Debugger phase is activated upon code failure, utilizing runtime execution feedback to refine the solution. A debugger agent is designed to use static and dynamic analyses to locate and fix defects.
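The two-phase flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the function names (`act_then_debug`, `run_tests`), the prompt wording, the `max_rounds` limit, and the idea that the Tester agent emits assert-style tests are all hypothetical. Each agent is modelled as a plain callable from a prompt string to a response string, so simple stubs stand in for real LLM calls.

```python
import traceback
from typing import Callable, Optional, Tuple

# Assumption: an "agent" is any callable mapping a prompt to a text response.
Agent = Callable[[str], str]


def run_tests(code: str, tests: str) -> Optional[str]:
    """Run the candidate code and then the tests.

    Returns None if everything passes, otherwise the traceback text, which
    serves as the runtime feedback. A real system should sandbox this step
    instead of calling exec() on model output directly.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
        return None
    except Exception:
        return traceback.format_exc()


def act_then_debug(
    task: str,
    analyst: Agent,
    coder: Agent,
    tester: Agent,
    debugger: Agent,
    max_rounds: int = 3,  # hypothetical cap on debugging attempts
) -> Tuple[str, bool]:
    """ACT phase (Analyst, Coder, Tester), then debugging only on failure."""
    # ACT phase: each role contributes one artifact.
    plan = analyst(task)
    code = coder(f"Task:\n{task}\n\nPlan:\n{plan}")
    tests = tester(task)

    # Debugger phase: entered only while execution of the tests fails.
    for _ in range(max_rounds):
        error = run_tests(code, tests)
        if error is None:
            return code, True
        code = debugger(
            f"Task:\n{task}\n\nCode:\n{code}\n\nRuntime feedback:\n{error}"
        )
    return code, run_tests(code, tests) is None


# Stub agents for demonstration; the coder is deliberately buggy.
def stub_analyst(prompt: str) -> str:
    return "Return the sum of a and b."


def stub_coder(prompt: str) -> str:
    return "def add(a, b):\n    return a - b\n"


def stub_tester(prompt: str) -> str:
    return "assert add(2, 3) == 5\n"


def stub_debugger(prompt: str) -> str:
    return "def add(a, b):\n    return a + b\n"


final_code, passed = act_then_debug(
    "Write add(a, b).", stub_analyst, stub_coder, stub_tester, stub_debugger
)
```

With these stubs, the first test run fails, the debugger stub receives the traceback and returns a corrected function, and the second run passes. Swapping each stub for a call to an actual model would leave the control flow unchanged.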
Empirical studies using 19 LLMs demonstrate the effectiveness of the proposed framework, which achieves an average accuracy of 64.82% across all models on the HumanEval benchmark and outperforms the basic prompting approach by more than 7.66%. The framework offers a practical and efficient strategy for integrating code generation into the software engineering life cycle.