My approach towards AI-driven Software Development

I don't intend to demean these technological marvels, but in my experience, being an AI-first citizen is painful and expensive.

I take this in stride and persevere through the whole experience. There is a definite sense of excitement about the endless possibilities.

Even though I have written code my whole life, I still don't feel I know much about software engineering. And the idea of coding being largely solved by AI is undeniably appealing. It provokes me to do things I have never dared to do by myself and to venture into unknown territory.

I have spent most of the past few months figuring out how to write software entirely with AI. This is the kind of software I have no understanding of or experience building, like this website. I recently penned down my opinions on the issues with AI-driven software development. Natural language specs are the major source of friction: they drift, they cannot be verified by tooling, and they make for weak context engineering.

Turns out, structured data formats have been around for ages and address exactly this problem. Incidentally, LLMs are inherently robust and efficient at working with such structured formats, specifically JSON. I think every aspect of a software system can be expressed as a JSON spec, and that this (not better prompts, not bigger models) is what's missing from AI-driven development. Here's why.

In this approach, every JSON artifact adheres to a template schema. The schema defines a ruleset for the artifact's properties and their types. A given JSON schema can reference properties and objects in other JSON schemas. We can write programs (referred to as tooling going forward) which verify whether a given JSON artifact correctly adheres to its schema ruleset. We can also build a tooling harness to verify the relationships across different JSON artifacts. This verification is fully deterministic and surfaces very specific details about the errors when an artifact doesn't comply with its schema. On top of all this, JSON enables very specific, surgical extraction from and updates to these artifacts. Basically, we get to work with just what we need.
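To make the verification idea concrete, here is a minimal sketch using only the Python standard library. The schema format (property name mapped to an expected type), the `CAPABILITY_SCHEMA` name, and the sample artifacts are all assumptions for illustration, not the toolkit's actual schemas. A real setup would more likely use full JSON Schema with an established validator library.

```python
import json

# Hypothetical, heavily simplified schema: property name -> expected Python type.
# Real JSON Schema is far richer; this only illustrates deterministic checking.
CAPABILITY_SCHEMA = {
    "id": str,
    "name": str,
    "priority": int,
}


def validate(artifact: dict, schema: dict) -> list[str]:
    """Return a list of specific error messages (an empty list means valid)."""
    errors = []
    # Every property named in the schema must be present with the right type.
    for prop, expected in schema.items():
        if prop not in artifact:
            errors.append(f"missing required property '{prop}'")
        elif not isinstance(artifact[prop], expected):
            actual = type(artifact[prop]).__name__
            errors.append(f"'{prop}' should be {expected.__name__}, got {actual}")
    # Properties the schema does not know about are reported too.
    for prop in artifact:
        if prop not in schema:
            errors.append(f"unexpected property '{prop}'")
    return errors


good = json.loads('{"id": "CAP-1", "name": "Export report", "priority": 2}')
bad = json.loads('{"id": "CAP-2", "priority": "high", "owner": "me"}')

print(validate(good, CAPABILITY_SCHEMA))
print(validate(bad, CAPABILITY_SCHEMA))
```

The same input always produces the same list of errors, and each error names the exact property at fault, which is the kind of feedback a coding agent can act on directly.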

All AI software and features (ChatGPT Codex, Claude Code, Claude Cowork, Cursor etc) are driven by something called 'tool calling'. Think of tool calling as a medium of interaction between LLMs and real-world systems. File operations (search, read, write), executing programs, and performing actions in applications are all examples of such interaction. Through tool calling, it's as if LLMs can double-click on a directory to enter it, look at all the files in it, and read the relevant one, just like we all do. And they are pretty accurate when it comes to this decision making. What this means is that LLMs, acting as coding agents, can effectively invoke the different commands of the JSON tooling I talked about earlier. We can leverage this tool calling capability to deterministically run verification, extraction, updates, relationship tracing, and so on against JSON spec artifacts.
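As a rough sketch of how a coding agent's tool call could reach such JSON tooling, the snippet below dispatches a JSON-encoded call to a small registry of functions. The tool names (`spec_get`, `spec_set`), the call shape (`name` plus `arguments`), and the in-memory `SPECS` store are assumptions made up for this example. Each real product defines its own tool-calling format.

```python
import json

# Hypothetical in-memory spec store; a real toolkit would read files from disk.
SPECS = {
    "capabilities": {"items": [{"id": "CAP-1", "name": "Export report"}]},
}


def get_value(spec: str, path: list):
    """Surgical extraction: walk a list of keys/indices into one spec."""
    node = SPECS[spec]
    for key in path:
        node = node[key]
    return node


def set_value(spec: str, path: list, value):
    """Surgical update: change exactly one value and leave the rest untouched."""
    parent = get_value(spec, path[:-1])
    parent[path[-1]] = value
    return {"updated": path}


# Tool registry; the tool names are assumptions for this sketch.
TOOLS = {"spec_get": get_value, "spec_set": set_value}


def dispatch(tool_call_json: str) -> str:
    """Run one tool call emitted by the model and return a JSON result string."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool '{name}'"})
    try:
        result = TOOLS[name](**call["arguments"])
    except (KeyError, IndexError, TypeError) as exc:
        # Report a precise failure the model can react to instead of crashing.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
    return json.dumps({"result": result})


print(dispatch(
    '{"name": "spec_get", "arguments": '
    '{"spec": "capabilities", "path": ["items", 0, "name"]}}'
))
```

The model only decides which tool to call and with what arguments. The extraction or update itself is ordinary deterministic code, and only the requested value enters the model's context.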

Any software system can be roughly broken down into the following aspects. Collectively, these aspects can define any software system and its end-to-end development cycle.

| Aspect | Examples |
| --- | --- |
| Problem & Context | Why it exists, scope, who it serves, success criteria |
| Users & Personas | Roles, goals, usage patterns |
| Capabilities & Features | What the system can do, system verbs |
| Architecture & Components | Internal structure, component boundaries, trust zones |
| Domain Model & Glossary | Business vocabulary, entities, relationships |
| Functional Requirements | Specific behaviors, acceptance criteria |
| Interface Contracts & APIs | Endpoints, protocols, request/response schemas |
| Invariants & Business Rules | Truths that must always hold |
| Non-Functional Requirements | Performance, reliability, scalability, security posture |
| Data Model | Entities, schemas, persistence, relationships |
| Test Data & Fixtures | Concrete examples, expected outcomes |
| Error States & Edge Cases | Failure modes, exception handling |
| Security & Threats | Attack vectors, mitigations, threat models |
| Tech Stack & Dependencies | Languages, frameworks, external services |
| Implementation Plan | Milestones, deliverables, sequencing |
| Roadmap & Scheduling | User stories, sprints, integration order |
| CI/CD Pipeline | Build, test, deploy, drift audit |
| Deployment & Environments | Dev, staging, prod, infrastructure |
| Governance | Change control, commit conventions, PR rules |
| Monitoring & Observability | Dashboards, metrics, alerts, drift detection |
| Project Scaffold | Directory layout, config files, boilerplate |
| Code Implementation | Source code |
| Code Review & Verification | Audit gates, fixture status, regression checks |
| Documentation | User docs, API docs, runbooks |
| Cost & Resources | Budget, infrastructure cost, capacity |
| Compliance & Regulatory | Legal requirements, data residency |
| Accessibility | i18n, a11y, localization |
| Team & Org | Roles, responsibilities, ownership |
| Migration & Evolution | Schema versioning, version upgrades |

Each of these aspects breaks down a software system to a different level of granularity. What if we could encode each of these aspects as a JSON schema and its artifacts? Does this mean we can represent any software system as a collection of JSON specs? Well, this is exactly what I try to explore and figure out with this speccing toolkit. Each of these aspects could be a spec artifact, though not every aspect becomes its own spec; some fold into others.

I know this is getting very intimidating. Let's just pause and remind ourselves that LLMs have a humbling amount of knowledge about each and every aspect of software systems. What this means is that LLMs probably have all the info about what should be captured in each of these aspects; we just need to guide them. But does this mean we have to remember all of this and manually guide them through each aspect? Doing this would feel like we are doing all the cooking and LLMs just plate the dish for us. I guess we all want this to be the other way around.

Like I said, these aspects/specs collectively define any software system. And all these aspects share a very serious relationship with each other. We can actually sequence them so that every aspect feeds into the next, moving from a very high-level representation to the very granular details. In this sequence, the preceding spec artifacts encode enough information to derive and define any given spec. Consider that the implementation plan cannot be defined without the roadmap and milestones, the roadmap cannot be defined without functional requirements, functional requirements cannot be defined without capabilities, and capabilities cannot be defined without the problem and context. I refer to this as a strict waterfall model, where any given spec depends on the preceding (upstream) specs but never on the succeeding (downstream) specs. Since each of these aspects can be generalised and standardised into template schemas, we can write prompt guidance for them. LLMs use this guidance along with all the info encoded in the upstream specs to spit out a given spec. As soon as we spec out all these aspects, we have a comprehensive and complete representation/definition of any software system.
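The upstream-only rule is easy to check mechanically. Below is a minimal sketch, assuming a hypothetical `SEQUENCE` list that follows the dependency chain described above and a hypothetical mapping from each spec to the specs it references. The spec names and the shape of that mapping are illustrative, not the toolkit's real format.

```python
# Assumed ordering for illustration, following the dependency chain in the text.
SEQUENCE = [
    "problem_context",
    "capabilities",
    "functional_requirements",
    "roadmap",
    "implementation_plan",
]


def check_waterfall(depends_on: dict[str, list[str]]) -> list[str]:
    """Report every dependency that does not point strictly upstream."""
    position = {name: index for index, name in enumerate(SEQUENCE)}
    errors = []
    for spec, deps in depends_on.items():
        if spec not in position:
            errors.append(f"unknown spec '{spec}'")
            continue
        for dep in deps:
            if dep not in position:
                errors.append(f"{spec}: unknown dependency '{dep}'")
            elif position[dep] >= position[spec]:
                # Same position (self-reference) or later in the sequence.
                errors.append(f"{spec}: depends on '{dep}', which is not upstream")
    return errors


valid = {
    "capabilities": ["problem_context"],
    "roadmap": ["functional_requirements", "capabilities"],
}
invalid = {"capabilities": ["roadmap"]}

print(check_waterfall(valid))
print(check_waterfall(invalid))
```

A check like this could run before each spec is generated, so a reference to a downstream spec is rejected at the point where it is introduced.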

If I can be honest, this strict waterfall model is the most painful aspect of the speccing process. Misalignments or gaps often surface late in the pipeline and force a full replay of every upstream spec. I started off with bidirectional relationships across specs, but that turned out to be even more problematic. Downstream changes kept triggering upstream changes, which led to drift between specs that were supposed to be settled. Strict waterfall is the only way I could work around this and keep the upstream specs locked once they were accepted. I lean on skills to orchestrate the full pipeline, where each spec is rigorously reviewed until it passes before moving on to the next.

To connect all the dots, every aspect of any software system and development cycle can be represented as a JSON spec. Collectively, these specs can define any software system. We can quantify the schema template for each spec and qualitatively define prompt guidance for each aspect. All these specs are sequenced, linked, and worked upon in a strict waterfall model. Since they are defined as JSON artifacts, we can build deterministic tooling to verify, trace, and rectify them. Any non-compliance with the spec sequencing, schema templates, or expected tracing results in self-explanatory errors. This also enables surgical context engineering, so we can precisely work with only what we need. This is the core idea I've been building to address the issues with natural language speccing and AI-driven software development. The rest of this toolkit flows from these fundamentals.
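As one illustration of what tracing across artifacts could look like, the sketch below checks that every functional requirement points at a capability that exists, and that every capability is covered by at least one requirement. The `traces_to` field, the ID formats, and the two artifact shapes are assumptions invented for this example.

```python
def check_traces(capabilities: list[dict], requirements: list[dict]) -> list[str]:
    """Verify requirement -> capability links in both directions."""
    known = {cap["id"] for cap in capabilities}
    traced = set()
    errors = []
    for req in requirements:
        # 'traces_to' is a hypothetical field listing upstream capability IDs.
        targets = req.get("traces_to", [])
        if not targets:
            errors.append(f"{req['id']}: traces to no capability")
        for cap_id in targets:
            if cap_id in known:
                traced.add(cap_id)
            else:
                errors.append(f"{req['id']}: traces to unknown capability '{cap_id}'")
    # Capabilities that no requirement covers are gaps in the downstream spec.
    for cap_id in sorted(known - traced):
        errors.append(f"{cap_id}: no requirement traces to this capability")
    return errors


capabilities = [{"id": "CAP-1"}, {"id": "CAP-2"}]
requirements = [
    {"id": "FR-1", "traces_to": ["CAP-1"]},
    {"id": "FR-2", "traces_to": ["CAP-9"]},
    {"id": "FR-3"},
]

for error in check_traces(capabilities, requirements):
    print(error)
```

Each message names the artifact and the exact link at fault, so the fix can be applied to a single entry rather than to a whole document.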