Pillar 5: Design and Prototype

v0.8 — Working Draft

This page is under active development. Content is directionally accurate but subject to revision.

"Every initiative is a prototype until evidence says otherwise."

The Argument

Organizations have a deep bias toward commitment. Once an AI initiative is approved, funded, and announced, it acquires institutional momentum that makes it extraordinarily difficult to modify, pivot, or kill — regardless of what the evidence says. This is not a failure of individual judgment. It is a structural consequence of how organizations make decisions: sunk cost psychology, reputational risk to sponsors, and planning processes that treat approval as a one-way gate.

Design thinking and prototyping offer a counter-logic. The core principle is epistemic humility: we do not know in advance which AI applications will create value, which workflows will change, or how users will actually behave. Therefore, every initiative should be treated as a prototype — a hypothesis to be tested — until empirical evidence supports scaling.

This is not the same as running pilots. Most organizational pilots are theater: the outcome is predetermined, the success criteria are vague or post-hoc, and the political cost of declaring a pilot a failure exceeds the political cost of scaling a mediocre solution. Genuine prototyping requires pre-registered hypotheses, measurable success criteria defined before launch, explicit kill criteria, time-boxed evaluation periods, and organizational norms that treat stopping a prototype as a success (we learned something) rather than a failure (we wasted money).
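
The requirements listed above can be written down as a small record that is fixed before launch. What follows is a minimal Python sketch under stated assumptions, not a prescribed implementation: the class name `PrototypePlan`, its fields, and the example metric and thresholds are all hypothetical. It also assumes that an inconclusive result at the end of the time box counts as a stop.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)  # frozen: criteria cannot be edited after registration
class PrototypePlan:
    hypothesis: str
    metric: str               # hypothetical metric name
    success_threshold: float  # at or above this at the deadline: scale
    kill_threshold: float     # at or below this at any time: stop
    start: date
    time_box_days: int

    def __post_init__(self):
        # Reject plans whose kill and success criteria overlap.
        if self.kill_threshold >= self.success_threshold:
            raise ValueError("kill threshold must be below success threshold")

    @property
    def deadline(self) -> date:
        return self.start + timedelta(days=self.time_box_days)

    def decide(self, observed: float, today: date) -> str:
        """Return 'scale', 'stop', or 'continue' using only pre-registered criteria."""
        if observed <= self.kill_threshold:
            return "stop"      # explicit kill criterion met
        if today < self.deadline:
            return "continue"  # still inside the time box
        # Assumption: at the deadline, anything short of success is a stop.
        return "scale" if observed >= self.success_threshold else "stop"


# Hypothetical example values, for illustration only.
plan = PrototypePlan(
    hypothesis="Drafting assistant reduces meeting-prep time",
    metric="minutes_saved_per_meeting",
    success_threshold=15.0,
    kill_threshold=2.0,
    start=date(2025, 1, 6),
    time_box_days=28,
)
print(plan.decide(observed=9.0, today=date(2025, 1, 20)))  # continue
```

Because the record is immutable and the decision function reads nothing but the registered thresholds and the deadline, criteria cannot be adjusted after results are known.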

The design dimension is equally critical. AI initiatives frequently fail not because the technology does not work, but because the human interaction with the technology was poorly designed. Prompt interfaces are confusing, outputs are presented without any indication of how much to trust them, workflows assume adoption patterns that do not match actual behavior, or the AI is inserted into a process at the wrong point. These are design failures, and they are preventable through the standard methods of human-centered design: user research, journey mapping, iterative testing, and rapid prototyping of the human-AI interaction, not just the AI model.

Prototyping also serves a trust function (linking to Pillar 3). When employees see that the organization is testing AI applications carefully, measuring results honestly, and willing to stop what does not work, they extend more trust to the process. The alternative — announcing a grand AI strategy and deploying at scale — communicates overconfidence and reduces employees' sense of agency.

The practical implication is that organizations should maintain a portfolio of prototypes at various stages of maturity, governed by evidence-based stage gates. Investment follows evidence, not executive enthusiasm.

In Practice

A pharmaceutical company adopted a "prototype-to-production" pipeline with four explicit gates: concept (1-week test with synthetic data), feasibility (4-week test with real data, small team), validation (12-week controlled comparison against current process), and scale (full deployment). Each gate required pre-defined quantitative criteria. Of 22 concepts that entered the pipeline, 7 reached production — and each of those 7 delivered measurable value. The 15 that were stopped represented learning, not waste.
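
The four-gate pipeline described above can be sketched as an ordered list of gates with a single advance-or-stop rule. This is an illustrative Python sketch only: the names `Gate`, `PIPELINE`, and `next_stage` are hypothetical, the quantitative criteria themselves are not given in the example and are reduced here to a boolean, and labelling the end state "production" after the final gate is an assumption.

```python
from typing import NamedTuple, Optional


class Gate(NamedTuple):
    name: str
    weeks: Optional[int]  # None: the example states no fixed duration
    setting: str


# Gate names, durations, and settings are taken from the example above.
PIPELINE = [
    Gate("concept", 1, "synthetic data"),
    Gate("feasibility", 4, "real data, small team"),
    Gate("validation", 12, "controlled comparison against current process"),
    Gate("scale", None, "full deployment"),
]


def next_stage(current: str, criteria_met: bool) -> str:
    """Advance one gate if the pre-defined criteria were met; otherwise stop."""
    names = [g.name for g in PIPELINE]
    i = names.index(current)  # raises ValueError for an unknown gate name
    if not criteria_met:
        return "stopped"
    # Assumption: passing the final gate is reported as "production".
    return names[i + 1] if i + 1 < len(names) else "production"


# Total time-boxed testing before the scale gate: 1 + 4 + 12 weeks.
weeks_before_scale = sum(g.weeks for g in PIPELINE if g.weeks is not None)
print(weeks_before_scale)  # 17
```

Under these durations, a concept spends at least 17 weeks in time-boxed testing before any full-deployment decision is made.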

A financial advisory firm applied human-centered design methods to its AI deployment. Before building any AI tool, the firm conducted contextual inquiry with advisors — observing how they actually worked, not how process maps said they worked. This revealed that advisors' primary pain point was not portfolio analysis (where the AI team had planned to start) but client meeting preparation. The redesigned prototype addressed the actual pain point and achieved immediate adoption.

A municipal government published its AI prototype results — including failures — on an internal dashboard visible to all employees. This radical transparency served two purposes: it built trust in the process (employees could see that bad ideas were actually being killed) and it created an organizational learning asset (teams could see what had been tried and what had been learned before proposing their own initiatives).

The 4×1 Matrix

Dimension	Example
Tools	Prototype canvases, pre-registered hypothesis templates, stage-gate evaluation scorecards
Processes	Time-boxed prototype cycles, evidence-based stage gates, human-centered design sprints
Behaviors	Defining kill criteria before launch, celebrating stopped prototypes, sharing prototype learnings openly
Change Skills	Designing rigorous experiments, facilitating go/no-go decisions without politics, conducting contextual user research

Diagnostic Questions

Kill rate: What percentage of your AI pilots have been stopped based on evidence? If the answer is zero, your pilots are not genuine experiments — they are pre-approved deployments with a pilot label.
Pre-registration: Are success and failure criteria defined before a prototype launches — or constructed after results are known? Post-hoc criteria are indistinguishable from rationalization.
Design investment: How much of your AI initiative budget is allocated to understanding user needs, designing interactions, and testing usability — versus building models and infrastructure?
Learning capture: When a prototype is stopped, is the learning formally captured and accessible to future teams? If not, you are paying the cost of experimentation without reaping the benefit.
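
The kill-rate question reduces to simple arithmetic. As a minimal Python sketch (the function name `kill_rate` is hypothetical), here it is applied to the pharmaceutical example from In Practice, where 22 concepts entered the pipeline and 15 were stopped:

```python
def kill_rate(started: int, stopped: int) -> float:
    """Percentage of started prototypes that were stopped."""
    if started <= 0:
        raise ValueError("started must be positive")
    if not 0 <= stopped <= started:
        raise ValueError("stopped must be between 0 and started")
    return 100.0 * stopped / started


print(f"{kill_rate(started=22, stopped=15):.1f}%")  # 68.2%
```

The figure is meaningful only if `stopped` counts prototypes ended on the basis of evidence, which is what the diagnostic question asks about.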
