
What's missing in AI-driven software development

With the advent of LLMs, programming has been going through a continuous transformation, and spec-driven development has moved to the very centre of it. The main intent of AI-driven software development is to take the human out of the loop and achieve 'execution' autonomy.

💡

Spec-driven development is a software development process in which product requirements are detailed enough to precisely sketch both the end product and its engineering. These specs, or specifications, serve as the very roots of the software system: the ground truth.

But why do we even need these roots? They prevent execution drift on the way to the end goal, which is a scalable, reliable, shareable and functional software system. Even with the most advanced tools, building such a system takes multiple sittings/sessions with AI, and these sessions have little awareness of each other. LLMs are not deterministic by themselves and have a tendency to hallucinate. Every session is susceptible to making its own assumptions and inventing its own decisions, which might or might not align with decisions made in earlier sessions. Specs attempt to bridge this gap, making the whole engineering process deterministic so that the end goal is actually reached.

There are many methodologies for defining these specs. Most of them today are written in natural language, in a format called Markdown. Think of Markdown as a plain-text counterpart to a Word document, where formatting such as headings and lists is written with simple symbols. This is a natural choice, since LLMs are natively great at understanding and working with natural language.

Specs usually span multiple files, each capturing a different aspect of the software system. Working with and maintaining such natural language specs is prone to drift.

For example, adding a new feature requires updating the specs first. Updating the specs with AI can itself produce different formatting, structure or terminology across files, or leave the update incomplete. Leveraging AI to execute the implementation from the specs can drift in similar ways: because LLMs are non-deterministic, reading the same natural language spec in different sessions can lead to very different decisions.

Deriving relationships across different natural language spec files is error-prone. Furthermore, a lot of unnecessary information is loaded into each session, since targeted reads or extraction from natural language specs is difficult, especially as the specs grow.

The core issue with natural language is that it is not reliably verifiable. There is no program or deterministic procedure that can verify that a natural language spec has been worked through without gaps, misses, assumptions, ambiguities or hallucinations.
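
To make this gap concrete, here is a minimal Python sketch. It assumes a hypothetical convention in which every spec file must contain a fixed set of headings; the names `REQUIRED_SECTIONS` and `missing_sections` are illustrative and do not come from any existing tool. A structural property like this can be checked deterministically, whereas the meaning of the prose under each heading cannot.

```python
import re

# Hypothetical convention (an assumption for this sketch):
# every spec file must contain these headings.
REQUIRED_SECTIONS = ("Overview", "Requirements", "Data Model")

# Matches simple Markdown headings such as "## Requirements".
# It does not account for headings inside fenced code blocks.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$", re.MULTILINE)


def missing_sections(markdown_text, required=REQUIRED_SECTIONS):
    """Return the required headings that do not appear in the spec text."""
    found = {title.strip().lower() for title in HEADING.findall(markdown_text)}
    return [name for name in required if name.lower() not in found]


spec = """# Overview
Users can sign in with email.

## Requirements
Login should be fast and secure.
"""

# The structural check is deterministic: same input, same answer.
print(missing_sections(spec))  # → ['Data Model']

# Nothing here can tell whether "fast and secure" is precise enough,
# or whether another spec file uses a different term for "sign in".
```

The check catches a missing section every time, but it says nothing about whether the sentence under "Requirements" is complete, unambiguous, or consistent with the other spec files. That part is still left to whichever session reads it.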

There are ways to optimise such specs, but the approach remains mostly at the mercy of LLMs, and we can only hope for the best. Ideally, human input should be required only to define the specs correctly, and the specs themselves should be good enough to drive autonomous execution to the end goal as fast as possible.