In June 2021, Microsoft-owned GitHub and OpenAI released a technical preview of GitHub Copilot, an artificial intelligence (AI)-powered pair programming tool. The preview is available as an extension for Microsoft’s Visual Studio Code (VS Code) editor and aims to help programmers write code faster and with less effort.
In 2018, Microsoft spent $7.5bn to acquire GitHub, the collaborative platform and social networking site where software developers store, access, and share code repositories. In July 2021, Microsoft CEO Satya Nadella said that 72% of the Fortune 50 use GitHub to build, ship, and maintain software; other users include NASA and Shopify.
The technical preview of GitHub Copilot came shortly after investment group Prosus’s $1.8bn acquisition of Stack Overflow, the question-and-answer forum for programmers. If Copilot proves successful, future versions could reduce the number of Stack Overflow queries that programmers make every day.
GitHub Copilot is powered by OpenAI Codex, a version of GPT-3 that has been fine-tuned for programming tasks. Codex was trained on billions of lines of publicly available source code, including code in public GitHub repositories. The deep learning model translates natural language into code and can suggest individual lines or whole functions. Startups including TabNine and Kite have also trained neural networks to suggest code.
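To illustrate the natural-language-to-code workflow, a developer might type a descriptive comment and a function signature, and a tool such as Copilot could then propose the body. The function name `is_palindrome` and the body below are hypothetical and written by hand for illustration; they are not actual Copilot output.

```python
# Prompt a developer might write: a plain-English description plus a signature.
# Return True if the text reads the same forwards and backwards,
# ignoring case and any non-alphanumeric characters.
def is_palindrome(text: str) -> bool:
    # Keep only letters and digits, lower-cased
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    # Compare the cleaned sequence with its reverse
    return cleaned == cleaned[::-1]


# Quick sanity checks a developer might run before accepting a suggestion
print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("Copilot"))  # False
```

Because Copilot does not test the code it suggests, small checks like the two calls at the end remain the developer's responsibility.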
Learning on the job with GitHub Copilot
GitHub Copilot is not the same as low-code or no-code applications, or even the code autocompletion that IntelliSense provides in VS Code. Rather, it attempts to recreate pair programming, a software development practice in which two programmers work together: one writes the code while the other observes, reviews, and comments on progress.
Covid-19 has disrupted this agile practice, which relies on two people working in close proximity and switching roles frequently. Though nascent, and not a true substitute for a human partner (programmers still need to do things like name functions and variables sensibly), Copilot could offer something to remote programmers as working patterns continue to change.
Potential for problems
Using Copilot, developers can cycle through alternative suggestions and choose whether to accept, reject, or manually edit the suggested code. Copilot records whether each suggestion is accepted, and its suggestions are expected to improve with use and better match individual coders’ styles as the underlying reinforcement learning algorithm learns from this feedback.
Like any AI model, Copilot depends on good data, such as well-written and well-documented code that is publicly available on GitHub. Natural language processing (NLP) may not always pick up on the abstractions, abbreviations, slang, context, and nuance present in training datasets or repositories. Further issues arise from learning on public open-source code: contributors not only follow different conventions for naming functions and commenting code, but may also carry social biases into what they write. Finally, because Copilot does not test the code it suggests, it may propose outdated or deprecated libraries and language features.
Code licensing issues on the horizon?
GitHub describes Copilot as ‘a code synthesizer, not a search engine’, since most of the code it suggests has never been seen before. However, around 0.1% of suggestions may contain verbatim snippets from the training data, that is, other people’s code.
GitHub’s proposed solution is for Copilot to acknowledge the original author when it quotes code directly, which would, in theory, allow credit to be given to the people who wrote it. However, licensing issues may still arise if code copied directly from open-source repositories, for example code released under a copyleft licence such as the GPL, is later used in proprietary projects.
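Attribution of this kind first requires spotting when a suggestion repeats existing code. The sketch below shows one naive way to flag verbatim overlap, by comparing runs of whitespace-separated tokens (n-grams) between a suggestion and a set of source files. It is not GitHub’s method: the function names, the in-memory `corpus` dictionary, and the default run length of 10 tokens are all hypothetical assumptions, and a real system would need code normalisation and an index that scales to billions of lines.

```python
def token_ngrams(code: str, n: int) -> set[tuple[str, ...]]:
    """Split code on whitespace and return every run of n consecutive tokens."""
    tokens = code.split()
    # If there are fewer than n tokens, the range is empty and no n-grams are produced
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_matches(suggestion: str, corpus: dict[str, str], n: int = 10) -> list[str]:
    """Return names of corpus files sharing at least one n-token run with the suggestion.

    Hypothetical helper: `corpus` maps a file name to its source text.
    """
    suggestion_grams = token_ngrams(suggestion, n)
    if not suggestion_grams:
        return []
    # Any shared n-gram counts as a verbatim overlap worth attributing
    return [name for name, source in corpus.items()
            if suggestion_grams & token_ngrams(source, n)]


# Usage with a tiny, made-up corpus
corpus = {
    "repo_a/utils.py": "def add(a, b):\n    return a + b\n",
    "repo_b/maths.py": "def mul(a, b):\n    return a * b\n",
}
print(verbatim_matches("def add(a, b):\n    return a + b", corpus, n=5))
```

Short, common snippets such as the toy `add` function would trigger false positives in practice, which is why any real threshold would need to be much longer than the 5 tokens used here.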