Research article | Open Access
- Authors:
- Ionut Daniel Fagadau, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0009-0007-8464-8435)
- Leonardo Mariani, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0000-0001-9527-7042)
- Daniela Micucci, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0000-0003-1261-2234)
- Oliviero Riganelli, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0000-0003-2120-2894)
ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, April 2024, Pages 24–34. https://doi.org/10.1145/3643916.3644409
Metrics: 1 citation; 1 download (last 12 months: 1; last 6 weeks: 1)
Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot
ABSTRACT
Generative AI is changing the way developers interact with software systems by providing services that produce new content tailored to developers' actual needs. For instance, developers can request new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, respond immediately with ready-to-use code snippets. Formulating the prompt appropriately, incorporating useful information while avoiding information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering.
In this paper, we systematically investigate how eight prompt features, concerning the style and content of prompts, influence the correctness, complexity, and size of the generated code, as well as its similarity to the developers' code. Specifically, we use Copilot with 124,800 prompts, obtained by systematically combining the eight prompt features, to generate implementations of 200 Java methods. Results show that some prompt features, such as the presence of examples and a summary of the method's purpose, can significantly influence the quality of the result.
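For illustration, a prompt combining two of the studied kinds of features, a summary of the method's purpose and an input/output example, can be written as a comment above a method signature. The sketch below is hypothetical (the paper's exact prompt templates are not reproduced here); the method body shows a completion consistent with such a prompt:

```java
public class PromptExample {
    // Prompt given to the code assistant (hypothetical template):
    // it summarizes the method's purpose and supplies one example,
    // two of the prompt features whose influence the study measures.

    /* Returns the number of vowels in the given string.
       Example: countVowels("hello") -> 2 */
    public static int countVowels(String s) {
        int count = 0;
        for (char c : s.toLowerCase().toCharArray()) {
            // Count characters that appear in the vowel set.
            if ("aeiou".indexOf(c) >= 0) {
                count++;
            }
        }
        return count;
    }
}
```

Varying which of these features appear in the comment, while keeping the target method fixed, is the kind of systematic combination the study performs at scale.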
Index Terms
- Software and its engineering → Software notations and tools → Development frameworks and environments → Integrated and visual development environments
Published in
ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension
April 2024, 487 pages
ISBN: 9798400705861
DOI: 10.1145/3643916
- Chair: Igor Steinmacher
- Co-chair: Mario Linares-Vasquez
- Program Chair: Kevin Patrick Moran
- Program Co-chair: Olga Baysal
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2024
Author Tags
- prompt engineering
- code generation
- copilot
Qualifiers
- research-article