9. Protocol-Driven Development
Admitting AI-generated software through structural, behavioral, and operational invariants.
Vision 2030 & Sovereignty
Secures the national software supply chain as Saudi software factories adopt generative AI, ensuring that generated code is admitted only if it satisfies formal invariants.
Preceding chapters established the runtime governance lifecycle: reasoning translates into intent, intent compiles into an execution contract, and the contract derives bounded execution identity. Autonomous systems, however, generate software as well as actions, synthesizing scripts, workflow orchestrations, security policies, integration adapters, tests, and control-plane components. When generated code participates in execution, software construction itself enters the governance boundary.
Protocol-Driven Development (PDD) treats machine-enforceable protocols as the primary software artifacts, and implementation code as admissible realizations of those protocols. As machine intelligence reduces the cost of code generation, the primary engineering bottleneck shifts from code production to code admission. PDD formalizes this admission model in greater technical depth [1].
Protocol Authority
The generator carries no standing trust; the protocol defines the basis for admission.
From Code Generation to Code Admission
Traditional development focuses on writing, reviewing, testing, deploying, and monitoring code, disciplines that remain fundamental to software engineering. Generative artificial intelligence changes the economics of implementation by making candidate code easier to produce at scale. Natural-language specifications, however, remain ambiguous. Generated implementations vary across models, prompts, contexts, and tool histories; many compile and pass superficial tests while failing to align with true system requirements.
As implementations become easier to produce, code admission becomes the control point.
This shift forms the core of PDD. System architects must ask not whether a model can generate code, but whether the institution maintains a machine-enforceable standard to decide which generated artifacts may enter production environments. In autonomous infrastructure, this question is especially important. A generated adapter, workflow, policy module, or remediation script may later acquire execution identity via VAI or participate in an OpenKedge governance pipeline. Admitting a flawed artifact compromises the control plane itself.
PDD addresses this by defining what code must be, rather than how it is generated. A model may yield a single candidate or thousands, and the generator and implementation may vary, but the admissibility standard remains stable.
This brings software construction directly into the Autonomous State Control Plane. While runtime governance decides whether an action may execute, protocol admission determines whether a software artifact is eligible to enter the runtime environment.
The admission problem intensifies under continuous generation. A platform may generate patches during incidents, infrastructure modules during deployments, policy helpers during governance design, data transformations during analytics workflows, and adapters during integration tasks. While each candidate may appear plausible, the institution requires a durable admission rule specifying the active protocol, required evidence, and failure consequences. Without this rule, organizations evaluate artifacts ad hoc based on intuition, urgency, or model confidence.
Admission also requires verifiable provenance. The control plane must track whether an artifact was generated by a model, written by a human, modified by an agent, or imported from an external repository. Provenance does not dictate admissibility, but it calibrates the evidence threshold. A generated adapter serving a production execution path faces a higher admission bar than a low-risk internal utility, regardless of compilation success.
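One way to picture provenance calibrating the evidence threshold is a lookup that combines a risk tier with a provenance class. The following is a minimal sketch, not a prescribed scheme: the tier names, the evidence labels, and the functions `required_evidence` and `missing_evidence` are all hypothetical.

```python
from enum import Enum


class Provenance(Enum):
    HUMAN_WRITTEN = "human_written"
    MODEL_GENERATED = "model_generated"
    AGENT_MODIFIED = "agent_modified"
    EXTERNAL_IMPORT = "external_import"


class Risk(Enum):
    LOW = 1   # e.g. a low-risk internal utility
    HIGH = 2  # e.g. an adapter on a production execution path


# Hypothetical baseline evidence per risk tier.
BASE_EVIDENCE = {
    Risk.LOW: {"type_check", "lint"},
    Risk.HIGH: {"type_check", "lint", "conformance_tests", "sandbox_run"},
}

# Hypothetical extra evidence per provenance class. Provenance raises the
# bar; it never admits or rejects an artifact on its own.
PROVENANCE_EXTRA = {
    Provenance.HUMAN_WRITTEN: set(),
    Provenance.MODEL_GENERATED: {"property_tests"},
    Provenance.AGENT_MODIFIED: {"property_tests", "diff_review"},
    Provenance.EXTERNAL_IMPORT: {"dependency_scan"},
}


def required_evidence(provenance: Provenance, risk: Risk) -> set[str]:
    """Evidence the admission rule demands for this provenance and risk."""
    return BASE_EVIDENCE[risk] | PROVENANCE_EXTRA[provenance]


def missing_evidence(provenance: Provenance, risk: Risk, supplied: set[str]) -> set[str]:
    """Evidence still owed before the candidate can be evaluated."""
    return required_evidence(provenance, risk) - supplied
```

In this sketch a generated low-risk utility that compiles and lints cleanly still owes property tests, which mirrors the point that compilation success alone does not set the bar.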
Why Tests and Prompts Are Insufficient
Natural-language specifications are useful, but they cannot serve as governance boundaries. They are ambiguous, incomplete, difficult to enforce mechanically, and interpreted differently by humans and models. A requirement like "rotate credentials safely" carries different operational implications for a security engineer, a platform administrator, and a code-generation model. Without a machine-enforceable protocol, the system cannot verify whether the generated implementation satisfies the intended constraint.
Prompts are also necessary but insufficient. They guide generation by supplying constraints, examples, style guides, and policy reminders. However, prompts cannot guarantee behavior; they are not stable control boundaries. Generation processes can ignore, misinterpret, or optimize around prompt instructions. A prompt can request a compliant implementation, but it cannot prove compliance.
Unit tests and examples provide valuable evidence, but they only sample behavior. They cannot cover the complete state space and can miss edge cases, concurrency bugs, authorization bypasses, resource exhaustion, and forbidden side effects. A generated implementation can pass every test and still violate the required protocol.
Generated Code Admission
A generated implementation can compile, pass sampled tests, and still violate the protocol the system actually requires. PDD treats tests strictly as evidence, not as the complete governance boundary.
Human code review remains valuable for identifying semantic gaps, architectural flaws, maintainability risks, and domain-specific concerns. However, manual review is slow, highly variable, and impossible to scale to high-volume generated code. It depends heavily on reviewer interpretation and may miss subtle violations of the operational protocol.
PDD does not discard prompts, tests, and reviews; it positions them as evidence inputs rather than the complete admission system. A strong admission pipeline identifies the governing protocol and requires evidence that the candidate satisfies its invariants.
This distinction matters for engineering culture. Teams often treat passing tests as a green light, relying on post-hoc monitoring to catch anomalies. While acceptable for many conventional development workflows, this approach is too weak for components destined for autonomous execution. Monitoring observes behavior after admission; PDD evaluates the artifact before it can affect governed systems.
Tests are most effective when derived directly from the protocol. A test verifying an example is useful; a test enforcing a behavioral invariant is stronger; a property-based test exploring a broad class of inputs is stronger still. PDD does not replace testing; it gives testing a governance target.
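The difference between an example test and an invariant-enforcing property test can be sketched with the standard library alone. This is an illustrative sketch: `apply_tags` and `counting_candidate` are hypothetical candidates, and idempotency is chosen here only as a sample behavioral invariant. A dedicated property-testing library would normally replace the hand-rolled random generator.

```python
import random


def apply_tags(resource: dict, tags: dict) -> dict:
    """Hypothetical candidate: merge tags into a copy of the resource."""
    merged = dict(resource)
    merged.update(tags)
    return merged


def counting_candidate(resource: dict, tags: dict) -> dict:
    """Hypothetical flawed candidate: bumps a counter on every call."""
    merged = apply_tags(resource, tags)
    merged["revision"] = merged.get("revision", 0) + 1
    return merged


def example_test(candidate) -> bool:
    # Weakest evidence: a single hand-picked example.
    result = candidate({"env": "dev"}, {"owner": "ops"})
    return result["env"] == "dev" and result["owner"] == "ops"


def idempotency_property(candidate, trials: int = 200, seed: int = 0) -> bool:
    """Stronger evidence: check f(f(x, t), t) == f(x, t) on many random inputs."""
    rng = random.Random(seed)  # seeded so the evidence is repeatable
    keys = ["env", "owner", "tier", "region"]
    for _ in range(trials):
        resource = {k: rng.randint(0, 9) for k in rng.sample(keys, rng.randint(0, 4))}
        tags = {k: rng.randint(0, 9) for k in rng.sample(keys, rng.randint(0, 4))}
        once = candidate(resource, tags)
        if candidate(once, tags) != once:
            return False
    return True
```

Both candidates pass the example test, but only `apply_tags` satisfies the idempotency property, which is the sense in which the protocol gives testing a target.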
Protocol as the Primary Artifact
A protocol is a machine-enforceable specification of the admissible behavior of a software component. It defines allowed inputs, outputs, state transitions, safety constraints, behavioral invariants, operational limits, evidence requirements, and integration obligations.
Under PDD, implementations are replaceable; protocols are authoritative.
This reverses the traditional hierarchy of software generation. A generated implementation is never trusted based on its source, model, prompt, or toolchain. It is trusted only when empirical evidence demonstrates that it satisfies the protocol. Multiple implementations can satisfy the same protocol. A model may generate one candidate, a human may write another, and a future model may generate a third; the protocol defines the admissible space for all three.
The correctness of the generated artifact is secondary to the correctness of the admission system governing it. While artifact quality matters, it is evaluated relative to the protocol. Without a correct admission system, an organization cannot distinguish a plausible implementation from an admissible one.
Protocols matter most for control-plane components. An execution adapter must enforce contracts; a policy module must evaluate inputs consistently; a workflow definition must preserve approval boundaries; a remediation script must avoid forbidden side effects. These components require more than plausible code; they require rigorous admissibility evidence.
Protocol ownership represents an institutional responsibility. Every protocol must maintain a designated owner, version history, defined scope, review path, and change management process. Without protocol ownership, the admission boundary dissolves. In sovereign or regulated environments, this responsibility maps to platform teams, security organizations, domain authorities, or public-sector operating bodies. Protocols represent authority-bearing artifacts and must be governed accordingly.
Protocols also make implementation diversity governable. One team may deploy a generated TypeScript adapter, another a Python script, and a third a managed cloud workflow. As long as they satisfy the same protocol, multiple realizations can coexist under identical structural, behavioral, and operational constraints. This flexibility matters for multi-cloud and sovereign environments where underlying substrates vary.
The PDD Model
The PDD model represents a protocol as a composition of invariant classes:

$$\mathcal{P} = \mathcal{S} \wedge \mathcal{B} \wedge \mathcal{O}$$

where 𝓟 is the protocol, 𝓢 denotes structural invariants, 𝓑 denotes behavioral invariants, and 𝓞 denotes operational invariants.
An implementation is admissible if and only if it satisfies the protocol 𝓟:

$$\mathrm{Admissible}(\mathit{impl}) \iff \mathit{impl} \models \mathcal{P}$$

This formulation states that a candidate implementation is admitted only as a valid realization of the protocol-defined admissible space. The notation is intentionally minimal; it defines the core relationship without prescribing a specific verification technology: generated code is a candidate realization, and the protocol defines the boundary of admissibility.

Different domains utilize different enforcement mechanisms. A type-safe library relies on static type checks and property tests; a control-plane adapter requires schema validation, policy conformance, sandbox execution, and runtime simulation; a public-sector workflow requires institutional reviews and audit logs. PDD accommodates this variance while maintaining the admission principle.
The model supports a spectrum of assurance. Invariants can be checked statically, validated via dynamic tests, verified through model checking, simulated, or monitored at runtime. PDD does not require complete formal proof for all systems; it requires that the protocol define the admissibility target and that the admission pipeline produce evidence proportional to risk.
This approach keeps the model practical. A low-risk generated helper satisfies its protocol through type checks and conformance tests. A high-risk execution adapter requires sandbox execution, negative tests, threat modeling, and signed approvals. The same admission doctrine applies across both scenarios.
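The composition of invariant classes described in this section can be sketched as a set of predicates grouped by class, with admissibility defined as satisfying all of them. This is a minimal sketch under stated assumptions: `ProtocolSpec`, the dictionary representation of an artifact, and the example invariants are hypothetical, and a real pipeline would back each predicate with the evidence appropriate to the risk level.

```python
from dataclasses import dataclass
from typing import Callable

# An invariant is modeled as a predicate over a description of the artifact.
Check = Callable[[dict], bool]


@dataclass(frozen=True)
class ProtocolSpec:
    name: str
    version: str
    structural: tuple[Check, ...] = ()
    behavioral: tuple[Check, ...] = ()
    operational: tuple[Check, ...] = ()

    def satisfied_by(self, artifact: dict) -> bool:
        """Admissible if and only if every invariant in every class holds."""
        checks = self.structural + self.behavioral + self.operational
        return all(check(artifact) for check in checks)


# Hypothetical protocol for an execution adapter.
adapter_protocol = ProtocolSpec(
    name="execution-adapter",
    version="1.0.0",
    structural=(lambda a: "execute" in a["exports"],),
    behavioral=(lambda a: a["checks_authorization"],),
    operational=(lambda a: a["timeout_seconds"] <= 30,),
)

candidate = {"exports": ["execute"], "checks_authorization": True, "timeout_seconds": 10}
slow_candidate = {**candidate, "timeout_seconds": 120}
```

Here `adapter_protocol.satisfied_by(candidate)` holds, while `slow_candidate` fails on the operational class alone, regardless of which generator produced it.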
Type-Theoretic Interpretation of PDD
PDD can be interpreted as a type-theoretic admission layer: generated code becomes operational only when it inhabits the protocol-defined space of admissible implementations.
As machine intelligence makes code generation easier to scale, the central engineering challenge shifts from producing code to verifying admissibility. Type-theoretic techniques help move this admissibility boundary from post-hoc runtime monitoring to pre-runtime verification. In this view, the protocol defines the type of acceptable implementations, and a candidate artifact is admitted only when evidence shows it inhabits that type.
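$$\mathrm{Admissible}(\mathit{impl}) \iff \mathit{impl} : \mathcal{P}$$

where impl : 𝓟 denotes that the implementation inhabits the protocol-defined admissible space. This connects to our core invariant composition model:

$$\mathcal{P} = \mathcal{S} \wedge \mathcal{B} \wedge \mathcal{O}$$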
Structural invariants align with type and interface constraints: the artifact must expose the correct surface, consume and produce valid data shapes, respect module boundaries, and avoid forbidden dependencies. Behavioral invariants align with refinement predicates and contract systems: the artifact must preserve authorization checks, preconditions, postconditions, state transitions, and error-handling behaviors. Operational invariants align with effect and resource constraints: the artifact must respect side-effect bounds, timeouts, external call limits, evidence obligations, and memory allocations.
Refinement types encode constraints beyond simple shapes, such as bounded percentages, non-empty scopes, permitted resource classes, or ownership predicates. Dependent types express properties parameterized by runtime values or contexts; for example, specifying that a remediation script for a given service and region may only affect resources within that declared scope. Typed effect systems constrain what generated code is permitted to do: marking it as read-only, evidence-emitting, network-free, or cloud-mutating exclusively via designated adapters.
This type-theoretic framing does not replace testing, review, sandboxing, or runtime evidence. It moves as much of the safety boundary as practical prior to execution. While some properties can be checked statically, others require property-based tests, simulation, sandbox execution, or runtime monitoring. PDD is strongest when these techniques combine, and the admission decision logs the evidence that justified the result.
Advanced implementations of this admission pipeline can leverage foundational research in refinement types [2], dependent types [3], and typed effect systems [4].
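Python has no refinement or dependent types, so the following sketch only approximates the ideas at runtime with validated constructors. It is meant to illustrate the kind of constraint being described, not a type-theoretic implementation; `Percentage`, `Scope`, and `within_scope` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """Runtime stand-in for the refinement {v : float | 0 <= v <= 100}."""
    value: float

    def __post_init__(self):
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage out of bounds: {self.value}")


@dataclass(frozen=True)
class Scope:
    """A non-empty scope pinned to one service and region."""
    service: str
    region: str
    resources: frozenset

    def __post_init__(self):
        if not self.resources:
            raise ValueError("scope must not be empty")


def within_scope(scope: Scope, touched: set) -> bool:
    """Dependent-style predicate: the permitted set depends on the scope value."""
    return touched <= scope.resources
```

A language with refinement or dependent types could reject the out-of-range percentage or the out-of-scope mutation before execution; in this approximation the same violations surface as constructor errors or a failed predicate during admission checks.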
Structural, Behavioral, and Operational Invariants
Structural invariants constrain the shape of the artifact. They define required interfaces, type constraints, schema conformance, dependency boundaries, module limits, API surfaces, and configuration schemas. A generated component that imports forbidden libraries, exposes unsupported endpoints, bypasses module boundaries, or generates an invalid schema fails structurally before its behavior is ever evaluated.
Behavioral invariants constrain what the artifact does. They define preconditions, postconditions, state-transition rules, authorization checks, idempotency guarantees, determinism, error handling, and forbidden side effects. A generated workflow that skips an approval step, a script that mutates resources outside its declared scope, or an adapter that retries a non-idempotent operation without safeguards fails behaviorally.
Operational invariants constrain runtime behavior. They define latency budgets, resource allocations, timeout thresholds, logging and evidence requirements, failure recovery, rollback behaviors, rate limits, and observability obligations. A generated service that operates correctly in a unit test but lacks required evidence emission, ignores timeouts, or consumes unbounded resources fails operationally.
Structural invariants determine whether the artifact fits. Behavioral invariants determine whether it functions correctly. Operational invariants determine whether it can run safely.
| Invariant Class | Purpose | Examples |
|---|---|---|
| Structural | Constrain the shape and interfaces of the artifact | Types, schemas, dependencies, module boundaries, API surfaces |
| Behavioral | Constrain the allowed behavior and state transitions | Preconditions, postconditions, authorization checks, idempotency, forbidden effects |
| Operational | Constrain runtime and deployment properties | Timeouts, resource limits, evidence emission, rollback behavior, observability, rate limits |
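A structural invariant such as a dependency boundary can often be checked without running the artifact. The sketch below uses Python's standard `ast` module to find forbidden imports in candidate source; the deny-list contents are hypothetical, and the check only sees static `import` statements, so dynamic imports would need a separate check.

```python
import ast

FORBIDDEN_MODULES = {"subprocess", "socket"}  # hypothetical deny-list


def forbidden_imports(source: str, forbidden=FORBIDDEN_MODULES) -> set[str]:
    """Return the forbidden top-level modules that `source` imports statically."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in forbidden:
                found.add(root)
    return found
```

A candidate for which `forbidden_imports` returns a non-empty set fails structurally, before any behavioral evaluation is attempted.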
These invariant classes must be versioned. Protocols evolve as systems learn from operational failures. New evidence may reveal missing failure modes, inadequate observability, or unsafe dependency patterns. Versioning lets the institution know which protocol admitted a given artifact and replay admission decisions when protocols update.
Invariants must also remain composable. A generated control-plane plugin may need to satisfy a general plugin protocol, a security protocol, an evidence-emission protocol, and a domain-specific operational protocol. Composability prevents every artifact from requiring a bespoke admission model, enabling institutions to raise the admission bar in high-risk domains by layering additional invariants instead of rewriting the entire governance surface.
Composed invariants must be checked for conflicts. A latency budget may conflict with a heavy evidence-emission requirement; a dependency restriction may conflict with an observability library; a rollback invariant may conflict with a non-transactional downstream system. Surfacing these design tensions before runtime is a key advantage of the protocol-driven model.
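Conflict detection across layered protocols can be sketched as a pass over the combined constraints before any artifact is evaluated. This is an illustrative sketch only: the `Layer` fields and the two conflict rules (a dependency both required and forbidden, and added latency exceeding the tightest budget) are assumptions chosen to mirror the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    required_deps: frozenset = frozenset()
    forbidden_deps: frozenset = frozenset()
    latency_budget_ms: float = float("inf")  # upper bound this layer imposes
    added_latency_ms: float = 0.0            # cost of this layer's obligations


def find_conflicts(layers: list[Layer]) -> list[str]:
    """Report design tensions among composed protocol layers."""
    conflicts = []
    required = frozenset().union(*(layer.required_deps for layer in layers))
    forbidden = frozenset().union(*(layer.forbidden_deps for layer in layers))
    for dep in sorted(required & forbidden):
        conflicts.append(f"dependency '{dep}' is both required and forbidden")
    budget = min((layer.latency_budget_ms for layer in layers), default=float("inf"))
    cost = sum(layer.added_latency_ms for layer in layers)
    if cost > budget:
        conflicts.append(f"obligations add {cost} ms, exceeding the {budget} ms budget")
    return conflicts


# Hypothetical composition: plugin, evidence-emission, and security layers.
layers = [
    Layer("plugin", latency_budget_ms=25.0),
    Layer("evidence", required_deps=frozenset({"tracing-lib"}), added_latency_ms=40.0),
    Layer("security", forbidden_deps=frozenset({"tracing-lib"})),
]
```

Running `find_conflicts(layers)` on this composition reports both the dependency clash and the latency overrun at design time rather than at runtime.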
Evidence-Based Admission
PDD requires empirical evidence, not merely specification.
The admission system asks not whether the implementation looks plausible, but what evidence shows that the implementation satisfies the protocol.
Admission evidence includes static analysis reports, type checking outputs, schema validation logs, model-checking proofs, property-based test results, runtime simulation metrics, sandbox execution traces, policy checks, security scans, conformance tests, and human review attestations. Different protocols demand different evidence. A low-risk generated utility requires type checks and linting; a control-plane adapter requires sandbox execution, policy conformance, evidence-emission tests, and manual reviews; a generated workflow affecting public services requires formal institutional approval and replayable admission logs.
Admission outcomes must remain explicit. A candidate may be admitted, rejected, admitted with constraints, returned with a request for additional evidence, routed to human review, restricted to sandbox use, or required to pass simulation before becoming eligible for runtime. Admission is an active governance decision, not a passive build step.
The admission process itself must emit evidence. The system logs the submitted artifact, the governing protocol version, the checks executed, the evidence produced, the decision reached, the authorizers, and any active constraints. This admission record links directly to deployment logs, IEEC events, and later runtime telemetry.
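The fields listed above map naturally onto a structured record. The following is a minimal sketch, assuming a JSON serialization and a SHA-256 digest to identify the submitted artifact; the class name, field names, and `record_admission` helper are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AdmissionRecord:
    artifact_sha256: str      # identifies the exact submitted artifact
    protocol_id: str
    protocol_version: str
    checks_executed: tuple
    evidence_refs: tuple      # pointers to stored evidence, not the evidence itself
    decision: str
    authorizers: tuple
    constraints: tuple = ()

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for later comparison.
        return json.dumps(asdict(self), sort_keys=True)


def record_admission(artifact: bytes, **fields) -> AdmissionRecord:
    """Build a record, deriving the artifact digest from its bytes."""
    digest = hashlib.sha256(artifact).hexdigest()
    return AdmissionRecord(artifact_sha256=digest, **fields)


record = record_admission(
    b"def execute(contract): ...",
    protocol_id="execution-adapter",
    protocol_version="1.0.0",
    checks_executed=("type_check", "sandbox_run"),
    evidence_refs=("evidence/run-001.json",),  # hypothetical path
    decision="admitted_with_constraints",
    authorizers=("platform-team",),
    constraints=("sandbox_only",),
)
```

Because the record carries the protocol version and the artifact digest, it can be joined against deployment logs and runtime telemetry without relying on file names or branch state.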
Evidence-based admission supports system learning. If an admitted artifact is involved in an incident, reviewers inspect the admission evidence to determine whether the protocol was incomplete, the checks were insufficient, the implementation drifted after admission, or the runtime environment changed.
Rejected artifacts also produce evidence. A rejection log records which protocol requirement failed and why. Repeated rejections reveal whether the prompt is underspecified, the model is poorly suited to the task, the protocol is ambiguous, or the check is too brittle. Negative admission evidence is operationally useful, helping engineers refine generation tasks and protocols without lowering the admission bar.
Admission with constraints is another useful outcome. A generated artifact may be safe for sandbox use but not production; it may be admissible for read-only tasks but not mutations; it may be valid for one cloud provider but not another. Encoding these constraints avoids a false binary between full admission and rejection.
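Encoding outcomes and constraints as data lets downstream gates enforce them mechanically. The sketch below is one possible shape, not a defined schema: the `Outcome` members follow the outcomes named in this section, while the `Decision` fields and the `permits` rule are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ADMITTED = "admitted"
    REJECTED = "rejected"
    ADMITTED_WITH_CONSTRAINTS = "admitted_with_constraints"
    NEEDS_EVIDENCE = "needs_evidence"
    HUMAN_REVIEW = "human_review"
    SANDBOX_ONLY = "sandbox_only"
    SIMULATION_REQUIRED = "simulation_required"


@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    environments: frozenset = frozenset()  # where constrained use is permitted
    read_only: bool = False                # if True, mutating use is denied

    def permits(self, environment: str, mutating: bool) -> bool:
        if self.outcome is Outcome.ADMITTED:
            return True
        if self.outcome is Outcome.SANDBOX_ONLY:
            return environment == "sandbox"
        if self.outcome is Outcome.ADMITTED_WITH_CONSTRAINTS:
            return environment in self.environments and not (mutating and self.read_only)
        return False  # every other outcome blocks use until resolved


constrained = Decision(
    Outcome.ADMITTED_WITH_CONSTRAINTS,
    environments=frozenset({"staging", "production"}),
    read_only=True,
)
```

With this decision, read-only use in production is permitted while a mutating call in the same environment is denied, which avoids collapsing the result into full admission or rejection.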
PDD in the Autonomous State Control Plane
PDD is not a post-execution logging layer; it is an out-of-band, continuous admission substrate.
PDD operates before and around runtime governance, determining whether generated software, policies, adapters, workflows, and agents are eligible to enter the control plane.
Sovereign Software Admission
In sovereign AI systems, generated software must never become operational simply because a model generated it. It must pass through admission protocols owned by the institution responsible for execution.
The artifacts governed by PDD encompass generated remediation scripts, execution adapters, workflow definitions, policy modules, infrastructure-as-code modifications, agent tool wrappers, data transformations, simulation models, approval workflows, and control-plane plugins. These artifacts shape how autonomous systems behave, serving as the tools agents call, the adapters enforcing contracts, the policies governing intent, or the workflows routing approvals.
This explains why PDD belongs inside the Autonomous State Control Plane. If the control plane governs runtime actions but admits generated components casually, the governance boundary is compromised. A generated adapter with excessive permissions or missing evidence emission undermines VAI; a generated policy module with ambiguous logic undermines OpenKedge; a generated tool wrapper that bypasses intent isolation undermines SAL.
PDD also participates in the closed-loop cycle. Runtime evidence reveals protocol gaps, replay identifies unsafe behavior, and simulation refines invariants. Updated protocols constrain future generated artifacts. In this sense, PDD is a learning admission system connected to live operational telemetry.
This is the software counterpart to runtime governance. OpenKedge asks whether a proposed intent may execute now; VAI asks what authority is justified now; PDD asks whether the software artifact is eligible to participate in those decisions at all. If an agent generates an execution adapter, the adapter must satisfy the relevant protocol before it can receive contracts. If a model generates a policy helper, it must satisfy policy-module invariants before influencing decisions. If an agent generates a remediation script, the script must be admitted before it can execute with proof-derived execution identity.
Relationship to OpenKedge, SAL, and VAI
SAL governs how reasoning crosses into sovereign governance; OpenKedge governs whether a proposed intent may execute; VAI governs the runtime authority issued for approved execution; and PDD governs whether generated code and system components are admissible before participating in the loop.
SAL governs reasoning boundaries; OpenKedge governs intent; VAI governs authority; and PDD governs admissibility.
| Layer | Governance Question | Primary Artifact |
|---|---|---|
| SAL | May this reasoning output cross into sovereign governance? | Structured intent |
| OpenKedge | Should this intent be allowed under policy and context? | Execution contract |
| VAI | What runtime authority is justified by the approved contract? | Execution identity |
| PDD | Is this generated artifact admissible for use in the system? | Machine-enforceable protocol and admission evidence |
This relationship is composable. PDD governs the artifacts used by SAL, OpenKedge, and VAI, including model-context transformation code, policy adapters, identity brokers, execution adapters, evidence emitters, and simulation tools. In return, evidence from those layers refines PDD protocols. This supports architectural coherence: the systems that govern autonomous agents are themselves governed.
This recursive property matters. A control plane built from ungoverned generated components would contradict its own doctrine. It would govern agent actions while leaving the underlying software substrate to informal trust. PDD closes this gap, applying the same discipline to system construction that the control plane applies to execution: define the boundary, produce evidence, make decisions replayable, and refine the boundary from operational telemetry.
Implementation Patterns
A protocol registry represents the first implementation pattern. This registry houses versioned, owned, machine-enforceable protocols for components, adapters, workflows, policy modules, generated scripts, and plugins, specifying their invariants, evidence requirements, and applicability scope.
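A registry of this kind can be sketched as a keyed store in which published versions are immutable. This is a minimal in-memory sketch; the entry fields, the tuple-based version ordering, and the class names are hypothetical, and a real registry would add persistence, access control, and review workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolEntry:
    protocol_id: str
    version: tuple            # e.g. (1, 2, 0); tuples compare in version order
    owner: str
    scope: str
    required_evidence: frozenset


class ProtocolRegistry:
    def __init__(self):
        self._entries = {}

    def publish(self, entry: ProtocolEntry) -> None:
        key = (entry.protocol_id, entry.version)
        if key in self._entries:
            # Changing a protocol means publishing a new version.
            raise ValueError("published protocol versions are immutable")
        self._entries[key] = entry

    def get(self, protocol_id: str, version: tuple) -> ProtocolEntry:
        return self._entries[(protocol_id, version)]

    def latest(self, protocol_id: str) -> ProtocolEntry:
        versions = [v for (pid, v) in self._entries if pid == protocol_id]
        if not versions:
            raise KeyError(protocol_id)
        return self._entries[(protocol_id, max(versions))]
```

Keeping old versions retrievable is what allows a reviewer to ask which protocol version admitted a given artifact.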
An admission pipeline represents the second pattern. Generated candidate artifacts flow through validation, testing, static analysis, simulation, policy checks, sandbox execution, and evidence generation. The pipeline produces explicit admission outcomes rather than treating build success as admission.
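The pipeline pattern can be sketched as a sequence of named checks whose results are collected as evidence and folded into an explicit outcome. This is a simplified sketch with only two outcomes; the stage names and the two example checks are hypothetical stand-ins for real validation, analysis, and sandbox stages.

```python
from typing import Callable

# (stage name, check over the candidate's source text)
Stage = tuple[str, Callable[[str], bool]]


def run_pipeline(source: str, stages: list[Stage]) -> dict:
    """Run every stage, record evidence for each, and return an explicit outcome."""
    evidence = []
    for name, check in stages:
        try:
            passed = bool(check(source))
            detail = ""
        except Exception as exc:  # a crashing check counts as a failure, never a pass
            passed, detail = False, repr(exc)
        evidence.append({"stage": name, "passed": passed, "detail": detail})
    failed = [e["stage"] for e in evidence if not e["passed"]]
    return {
        "outcome": "rejected" if failed else "admitted",
        "failed_stages": failed,
        "evidence": evidence,
    }


# Hypothetical stages: the candidate must parse and must not call eval.
stages = [
    ("parses", lambda s: compile(s, "<candidate>", "exec")),
    ("no_eval", lambda s: "eval(" not in s),
]
```

The pipeline deliberately runs all stages rather than stopping at the first failure, so that a rejection carries complete evidence about which requirements failed.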
Sandbox execution is mandatory for untrusted generated artifacts. Candidate remediation scripts, adapters, or workflows run in constrained environments that restrict permissions, simulate target systems, observe side effects, and produce empirical evidence before runtime eligibility.
Protocol-constrained generation is useful but insufficient. Prompts and generation tasks include protocol specifications as constraints, examples, and acceptance criteria to help models produce better candidates. However, the admission pipeline must verify outputs independently. Prompts guide generation; protocols govern admission.
An evidence ledger records all admission decisions, connecting them to later runtime evidence. When an admitted artifact participates in an OpenKedge decision or a VAI execution path, the system can trace its origin back to the protocol that admitted it.
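One common way to make such a ledger tamper-evident is to chain entries by hash. The text does not prescribe this mechanism, so the following is an assumption-laden sketch: the `EvidenceLedger` class, the payload keys, and the SHA-256 chaining are all illustrative choices.

```python
import hashlib
import json


class EvidenceLedger:
    """Append-only log; each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev: str, payload: dict) -> str:
        body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def append(self, payload: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        digest = self._digest(prev, payload)
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or self._digest(prev, entry["payload"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def trace(self, artifact_id: str) -> list:
        """All recorded decisions for one artifact, oldest first."""
        return [e["payload"] for e in self._entries if e["payload"].get("artifact") == artifact_id]
```

With entries keyed by artifact identifier, `trace` gives the path from a runtime participant back to the admission decisions and protocol versions recorded for it.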
CI/CD integration makes PDD operationally practical. Generated artifacts are gated before merge, deployment, or runtime use, using build pipelines, code owners, policy checks, admission controllers, and deployment gates. Generated artifacts should not become operational through informal reviews or unchecked automation.
Runtime feedback closes the loop. Telemetry from OpenKedge decisions, VAI enforcement, incident reviews, and simulations updates protocols, test suites, and admission checks. This helps the admission system improve as the autonomous system encounters real-world workloads.
Implementation must also support artifact quarantine. Generated artifacts that fail admission are isolated from deployment paths and runtime registries. Previously admitted artifacts that later violate operational evidence expectations are suspended until review, preventing failed candidates from lingering in informal repositories where they could be accidentally reused.
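Quarantine can be sketched as an explicit artifact state that deployment gates consult. The state names and the `ArtifactRegistry` class below are hypothetical; the point of the sketch is that anything not positively admitted, including unknown artifacts, is treated as not deployable.

```python
from enum import Enum


class State(Enum):
    ADMITTED = "admitted"
    QUARANTINED = "quarantined"  # failed admission
    SUSPENDED = "suspended"      # admitted earlier, now pending review


class ArtifactRegistry:
    def __init__(self):
        self._state = {}

    def record_admission_result(self, artifact_id: str, passed: bool) -> None:
        self._state[artifact_id] = State.ADMITTED if passed else State.QUARANTINED

    def suspend(self, artifact_id: str) -> None:
        """Suspend a previously admitted artifact after adverse runtime evidence."""
        if self._state.get(artifact_id) is State.ADMITTED:
            self._state[artifact_id] = State.SUSPENDED

    def state(self, artifact_id: str):
        return self._state.get(artifact_id)

    def deployable(self, artifact_id: str) -> bool:
        # Default-deny: unknown, quarantined, and suspended artifacts are blocked.
        return self._state.get(artifact_id) is State.ADMITTED
```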
For high-impact environments, admission decisions must be reproducible. Given identical artifacts, protocol versions, toolchains, and evidence inputs, the system must reconstruct why it admitted or rejected the artifact. This requires deliberate capture of the admission environment.
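Capturing the admission environment can be approximated by fingerprinting every input to the decision in a canonical form. The sketch below assumes that the toolchain and evidence can be described as JSON-serializable dictionaries; `admission_fingerprint` and `replay_matches` are hypothetical helpers, not part of any named tool.

```python
import hashlib
import json


def admission_fingerprint(artifact: bytes, protocol_version: str,
                          toolchain: dict, evidence: dict) -> str:
    """Digest of every input to the admission decision, in canonical form."""
    canonical = json.dumps(
        {
            "artifact": hashlib.sha256(artifact).hexdigest(),
            "protocol_version": protocol_version,
            "toolchain": toolchain,
            "evidence": evidence,
        },
        sort_keys=True,             # key order must not affect the digest
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def replay_matches(recorded_fingerprint: str, **inputs) -> bool:
    """True when a replay was run on exactly the recorded inputs."""
    return admission_fingerprint(**inputs) == recorded_fingerprint
```

If a replay produces a different fingerprint, the difference indicates that the artifact, protocol version, toolchain, or evidence inputs changed, which narrows where an investigation should look before comparing outcomes.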
Design Requirements
PDD implementations must satisfy several concrete requirements.
- Machine-enforceable protocols. Protocols must be represented in formats that tools can validate, test, simulate, or enforce.
- Structural, behavioral, and operational invariants. Protocols must cover shape, behavior, and runtime safety rather than relying solely on tests.
- Protocol registry. The system must maintain versioned, owned protocols for generated artifacts and control-plane components.
- Candidate artifact submission. Generated code, workflows, policies, adapters, and configurations must pass through a submission path before admission.
- Automated admission checks. Static analysis, type checks, schema validation, policy checks, simulations, and conformance tests must run automatically.
- Evidence generation. Every admission decision must produce structured evidence detailing the artifact, protocol version, checks, results, and outcome.
- Sandbox and simulation support. Higher-risk artifacts must run in constrained environments before achieving runtime eligibility.
- Human review for ambiguous or high-risk artifacts. The pipeline must escalate to human review when protocol satisfaction is unclear or the institutional consequence is high.
- CI/CD integration. PDD must gate generated artifacts before merge, deployment, registry admission, or operational use.
- Runtime feedback for protocol refinement. Operational telemetry must actively refine protocols, test suites, simulations, and admission rules.
- Versioned protocols and artifacts. Reviewers must be able to verify which protocol version admitted which artifact and under what evidence.
- Replayable admission decisions. Admission outcomes must remain reconstructable for auditing, incident forensics, and protocol evolution.
With SAL, OpenKedge, VAI, and PDD in place, the whitepaper turns from architecture to application: how this control-plane doctrine applies to national-scale AI initiatives such as Vision 2030, sovereign clouds, smart cities, and AI-native public administration.
References
- [1] Jun He; Deying Yu. Protocol-Driven Development: Governing Generated Software Through Invariants and Evidence. arXiv preprint arXiv:2605.12981. 2026.
- [2] Tim Freeman; Frank Pfenning. Refinement types for ML. Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI), ACM. 1991.
- [3] Hongwei Xi; Frank Pfenning. Dependent types in practical programming. Conference Record of the 26th Symposium on Principles of Programming Languages (POPL'99), ACM. 1999.
- [4] John M. Lucassen; David K. Gifford. Polymorphic Effect Systems. Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM. 1988.