Can Anyone Code Now? Exploring AI Help for Non-Programmers

Abstract and 1 Introduction

2. Prior conceptualisations of intelligent assistance for programmers

3. A brief overview of large language models for code generation

4. Commercial programming tools that use large language models

5. Reliability, safety, and security implications of code-generating AI models

6. Usability and design studies of AI-assisted programming

7. Experience reports and 7.1. Writing effective prompts is hard

7.2. The activity of programming shifts towards checking and unfamiliar debugging

7.3. These tools are useful for boilerplate and code reuse

8. The inadequacy of existing metaphors for AI-assisted programming

8.1. AI assistance as search

8.2. AI assistance as compilation

8.3. AI assistance as pair programming

8.4. A distinct way of programming

9. Issues with application to end-user programming

9.1. Issue 1: Intent specification, problem decomposition and computational thinking

9.2. Issue 2: Code correctness, quality and (over)confidence

9.3. Issue 3: Code comprehension and maintenance

9.4. Issue 4: Consequences of automation in end-user programming

9.5. Issue 5: No code, and the dilemma of the direct answer

10. Conclusion

A. Experience report sources

References

Figure 1 – Code generation using the GitHub Copilot editor extension. The portion highlighted in blue has been generated by the model. Left: a function body, generated based on a textual description in a comment. Right: a set of generated test cases. Source: copilot.github.com

Abstract

Large language models, such as OpenAI’s Codex and DeepMind’s AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot.

In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience reports of LLM-assisted programming, as well as prior usability and design studies. We find that while LLM-assisted programming shares some properties of compilation, pair programming, and programming via search and reuse, there are fundamental differences both in the technical possibilities as well as the practical experience. Thus, LLM-assisted programming ought to be viewed as a new way of programming with its own distinct properties and challenges.

Finally, we draw upon observations from a user study in which non-expert end user programmers use LLM-assisted tools for solving data tasks in spreadsheets. We discuss the issues that might arise, and open research challenges, in applying large language models to end-user programming, particularly with users who have little or no programming expertise.

1. Introduction

Inferential assistance for programmers has manifested in various forms, such as programming by demonstration, declarative programming languages, and program synthesis (Section 2). Large language models such as GPT mark a quantitative and qualitative step-change in the automatic generation of code and natural language text. This can be attributed to a series of cumulative innovations: vector-space word embeddings, the transformer architecture, large text corpora, and pre-trained language models (Section 3).

These models have been commercialised in the form of APIs such as OpenAI Codex, or as programmer-facing tools such as GitHub Copilot and Tabnine. These tools function as a sort of advanced autocomplete, able to synthesize multiple lines of code based on a prompt within the code editor, which may be natural language (e.g., a comment), code (e.g., a function signature) or an ad-hoc mixture. The capabilities of such tools go well beyond traditional syntax-directed autocomplete, and include the ability to synthesize entire function bodies, write test cases, and complete repetitive patterns (Section 4). These tools have reliability, safety, and security implications (Section 5).
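To make the interaction style concrete, the following is a hypothetical sketch of a Copilot-style completion. The comment and function signature (the prompt) are written by the programmer; the body and the test cases below it illustrate the kind of multi-line completion such a tool might suggest. The function name and logic here are illustrative, not taken from the paper or from any actual model output.

```python
# Prompt written by the programmer: a natural-language comment plus a
# function signature. A Copilot-style tool uses this context to suggest
# an entire function body.

def is_palindrome(text: str) -> bool:
    """Return True if `text` reads the same forwards and backwards,
    ignoring case and non-alphanumeric characters."""
    # --- the kind of completion an LLM-based tool might propose ---
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

# Such tools can also suggest test cases from the same context
# (cf. the right panel of Figure 1):
assert is_palindrome("A man, a plan, a canal: Panama")
assert not is_palindrome("hello")
```

Note that nothing distinguishes the "prompt" from the "completion" in the final source file: the programmer remains responsible for checking that the suggested code is correct, which is precisely the shift in activity discussed in Sections 6 and 7.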

Prior lab-based and telemetric research on the usability of such tools finds that developers generally appreciate the capabilities of these tools and find them to be a positive asset to the development experience, despite no strong effects on task completion times or correctness. Core usability issues include the challenge of correctly framing prompts as well as the effort required to check and debug generated code (Section 6).

Longitudinal experience reports of developers support some of the lab-based findings, while contradicting others. The challenges of correctly framing prompts and the efforts of debugging also appear here. However, there are many reports that these tools do in fact strongly reduce task time (i.e., speed up the development process) (Section 7).

Programming with large language models invites comparison to related ways of programming, such as search, compilation, and pair programming. While there are indeed similarities with each of these, the empirical reports of the experience of such tools also show crucial differences. Search, compilation, and pair programming are thus found to be inadequate metaphors for the nature of LLM-assisted programming; it is a distinct way of programming with its own unique blend of properties (Section 8).

While LLM-assisted programming is currently geared towards expert programmers, the greatest beneficiaries of these tools will arguably be non-expert end-user programmers. Nonetheless, there are issues with their direct application in end-user programming scenarios. Through a study of LLM-assisted end-user programming in spreadsheets, we uncover issues in intent specification, code correctness, comprehension, LLM tuning, and end-user behaviour, and motivate the need for further study in this area (Section 9).
