A recent article in Fast Company makes the claim "Thanks to AI, the Coder is no longer King. All Hail the QA Engineer." It's worth reading, and its argument is probably correct. Generative AI will be used to create more and more software; AI makes mistakes, and it's difficult to foresee a future in which it doesn't; therefore, if we want software that works, Quality Assurance teams will rise in importance. "Hail the QA Engineer" may be clickbait, but it isn't controversial to say that testing and debugging will rise in importance. Even if generative AI becomes much more reliable, the problem of finding the "last bug" will never go away.
However, the rise of QA raises a number of questions. First, one of the cornerstones of QA is testing. Generative AI can generate tests, of course, or at least unit tests, which are fairly simple. Integration tests (tests of multiple modules) and acceptance tests (tests of entire systems) are more difficult. Even with unit tests, though, we run into the basic problem of AI: it can generate a test suite, but that test suite can have its own errors. What does "testing" mean when the test suite itself may have bugs? Testing is difficult because good testing goes beyond simply verifying specific behaviors.
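To see how easily this happens, here's a minimal sketch in Python. The days_in_month function and its tests are hypothetical, but the failure mode is the one that matters: the second test encodes a wrong expectation, so it fails against a correct implementation and would pass against a subtly broken one.

```python
import unittest

def days_in_month(month: int, year: int) -> int:
    """Correctly returns the number of days in a month (1-12)."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31

class TestDaysInMonth(unittest.TestCase):
    def test_leap_year(self):
        self.assertEqual(days_in_month(2, 2024), 29)  # correct expectation

    def test_century_year(self):
        # Buggy test: 1900 was not a leap year, so 28 is correct.
        # This test fails against the correct implementation above and
        # would pass against code that only checks year % 4 == 0.
        self.assertEqual(days_in_month(2, 1900), 29)

if __name__ == "__main__":
    unittest.main()
```

When this suite reports a failure, is the bug in the code or in the test? That question is exactly where the work begins.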
The problem grows with the complexity of the test. Finding bugs that arise when integrating multiple modules is more difficult and becomes even more difficult when you're testing the entire application. The AI might need to use Selenium or some other test framework to simulate clicking on the user interface. It would need to anticipate how users might become confused, as well as how users might abuse (unintentionally or intentionally) the application.
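For a sense of what that means in practice, here's a minimal sketch using the Selenium 4 Python bindings; the URL, element IDs, and error message are hypothetical. Generating the happy path is the easy part.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/login")  # hypothetical application
    driver.find_element(By.ID, "username").send_keys("testuser")
    driver.find_element(By.ID, "password").send_keys("wrong-password")
    driver.find_element(By.ID, "submit").click()
    # Hypothetical error message; the assertion itself is trivial.
    assert "Invalid password" in driver.page_source
    # What's hard is everything not written here: double submissions,
    # back-button navigation, emoji pasted into the username field,
    # a user abandoning the flow halfway through.
finally:
    driver.quit()
```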
Another difficulty with testing is that bugs aren't just minor slips and oversights. The most important bugs result from misunderstandings: misunderstanding a specification or correctly implementing a specification that doesn't reflect what the customer needs. Can an AI generate tests for these situations? An AI might be able to read and interpret a specification (particularly if the specification was written in a machine-readable format, though that would be another form of programming). But it isn't clear how an AI could ever evaluate the relationship between a specification and the original intention: what does the customer really want? What is the software really supposed to do?
Security is yet another issue: is an AI system able to red-team an application? I'll grant that AI should be able to do an excellent job of fuzzing, and we've seen game-playing AI discover "cheats." Still, the more complex the test, the more difficult it is to know whether you're debugging the test or the software under test. We quickly run into an extension of Kernighan's Law: debugging is twice as hard as writing code. So if you write code that's at the limits of your understanding, you're not smart enough to debug it. What does this mean for code that you haven't written? Humans have to test and debug code that they didn't write all the time; that's called "maintaining legacy code." But that doesn't make it easy or (for that matter) enjoyable.
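Fuzzing, at least, is mechanically simple, which is part of why AI should do well at it. Here's a bare-bones sketch; parse_record is a hypothetical function under test, and a real fuzzer (AFL, libFuzzer, or a coverage-guided tool) would be far more sophisticated.

```python
import random
import string

def parse_record(line: str) -> dict:
    """Hypothetical parser under test: 'name,age' -> {'name': ..., 'age': ...}."""
    name, age = line.split(",")
    return {"name": name, "age": int(age)}

random.seed(0)  # reproducible runs make failures easier to report
for _ in range(10_000):
    length = random.randint(0, 40)
    fuzz = "".join(random.choice(string.printable) for _ in range(length))
    try:
        parse_record(fuzz)
    except ValueError:
        pass  # expected rejection of malformed input
    except Exception as e:
        # Anything else is a finding: a failure mode the author
        # never anticipated.
        print(f"input {fuzz!r} raised {type(e).__name__}: {e}")
```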
Programming culture is another problem. At the first two companies I worked at, QA and testing were definitely not high-prestige jobs. Being assigned to QA was, if anything, a demotion, usually reserved for a good programmer who couldn't work well with the rest of the team. Has the culture changed since then? Cultures change very slowly; I doubt it. Unit testing has become a widespread practice. However, it's easy to write a test suite that gives good coverage on paper but actually tests very little. As software developers realize the value of unit testing, they begin to write better, more comprehensive test suites. But what about AI? Will AI yield to the "temptation" to write low-value tests?
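Here's what that temptation looks like, as a hypothetical Python example: a test that earns 100% line coverage for the function yet verifies almost nothing.

```python
def apply_discount(price: float, percent: float) -> float:
    if percent < 0 or percent > 100:
        raise ValueError("percent out of range")
    return price * (1 - percent / 100)

def test_apply_discount():
    # Exercises both branches, so coverage tools report 100%...
    apply_discount(100.0, 10.0)
    try:
        apply_discount(100.0, 150.0)
    except ValueError:
        pass
    # ...but the only assertion is trivially true. A bug in the
    # arithmetic (say, 1 + percent / 100) would sail through.
    assert apply_discount(0.0, 0.0) == 0.0
```

Coverage metrics reward executing lines, not checking them, and that's exactly the kind of metric a system optimizing for "tests written" can game.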
Perhaps the biggest problem, though, is that prioritizing QA doesn't solve the problem that has plagued computing from the beginning: programmers who never adequately understand the problem they're being asked to solve. Answering a Quora question that has nothing to do with AI, Alan Mellor wrote:
We all start programming thinking about mastering a language, maybe using a design pattern only clever people know.
I've programmed industrial controllers. I can now talk about factories, and PID control, and PLCs and acceleration of fragile goods.
I worked in PC games. I can talk about rigid body dynamics, matrix normalization, quaternions. A bit.
I worked in marketing automation. I can talk about sales funnels, double opt in, transactional emails, drip feeds.
I worked in mobile games. I can talk about level design. Of one way systems to force player flow. Of stepped reward systems.
Code is literally nothing. Language nothing. Tech stack nothing. Nobody gives a monkeys [sic], we can all do that.
To write a real app, you have to understand why it will succeed. What problem it solves. How it relates to the real world. Understand the domain, in other words.
Exactly. This is an excellent description of what programming is really about. Elsewhere, I've written that AI might make a programmer 50% more productive, though this figure is probably optimistic. But programmers only spend about 20% of their time coding. Getting 50% of 20% of your time back, roughly a 10% overall gain, is important, but it's not revolutionary. To make it revolutionary, we will have to do something better than spending more time writing test suites. That's where Mellor's insight into the nature of software is so crucial. Cranking out lines of code isn't what makes software good; that's the easy part. Nor is cranking out test suites, and if generative AI can help write tests without compromising the quality of the testing, that would be a huge step forward. (I'm skeptical, at least for the present.) The important part of software development is understanding the problem you're trying to solve. Grinding out test suites in a QA group doesn't help much if the software you're testing doesn't solve the right problem.
Software developers will need to devote more time to testing and QA. That's a given. But if all we get out of AI is the ability to do what we can already do, we're playing a losing game. The only way to win is to do a better job of understanding the problems we need to solve.