Jun 1, 2026 · 8 min read

Governance or Containment: The Real Question I'm Bringing to Build

I’m on a plane to San Francisco as I write this. When I wrote down the four questions I was bringing to Build, I thought I was going to evaluate an assembled platform: whether Agent 365, Foundry, Purview, Entra, and A2A finally cohere into something you can build on.

In the days since, I realized those four questions are the same question wearing four hats. And I made a decision that changes what I’m actually testing.

The Axis Isn’t Low-Code Versus Pro-Code

Every framing of Microsoft’s agent story routes through Copilot Studio versus Azure AI Foundry: low-code versus pro-code. It’s the wrong axis. The pro-code path is the one that matters either way: it’s how you implement the kind of code factory a regulated shop needs. The low-code lane is a side quest.

The decision that actually matters is about where control lives. There are two postures, and they pull in opposite directions.

Control by governance is Microsoft’s posture. Control flows down from the platform: Entra issues the agent’s identity, Purview enforces the data policy, Agent 365 is the control plane that inventories and revokes, A2A carries the trust between agents. You get governance because you live inside the system that grants it.

Control by containment is the other posture, the one I argued for in The Sandbox Isn’t the Hard Part. You own a thin, auditable runtime, and enforcement lives on the environment, not inside the agent. NemoClaw and OpenShell got this right: a compromised agent can’t disable a sandbox it doesn’t control. You own the layer below the proxy yourself instead of trusting the platform to reach it.

The whole trip pivots on whether these two postures can coexist. Can Microsoft’s governance attach to a runtime I own and operate outside its walls, or do I only get governance by moving inside?

[Figure: Two postures for agent control. Control-by-governance, where identity, policy, and the control plane flow down from Microsoft's platform, versus control-by-containment, where a thin harness on pi enforces from below. The coding agent sits at the seam as the adversarial probe, where governance either holds or breaks first.]

The Harness Is the Thing Worth Keeping

The architecture worth building is a thin harness. Not a framework I adopt, but a harness I build on top of pi, the model-agnostic coding-agent toolkit Mario Zechner built. It is the same engine that runs under OpenClaw: a ~500-line agent core where agents generate their own tools rather than download them. Hermes sits in the same bucket for the messaging-driven work.

This is the same argument I made about the factory: the harness is the moat, not the model. You rent the engine: Opus this quarter, something better next quarter. What you own is the harness: the sandbox, the context, the feedback loop, the review gate, the identity model. The harness is the part that compounds.

So the question is no longer “should I adopt Microsoft’s framework?” It’s “can I keep the harness and still get Microsoft’s governance?” And answering it requires being precise about three things the industry collapses into one word.

Microsoft Agent Framework is an open-source SDK, the successor to Semantic Kernel and AutoGen. It’s code you adopt to define and orchestrate agents. Foundry Agent Service is the managed runtime. Microsoft’s own description says it handles “hosting, scaling, identity, observability, and enterprise security,” supporting any framework. Microsoft Foundry is the umbrella platform above both.

The conflation hides the real question. When a vendor says “governance comes with Foundry,” do they mean it comes with the SDK you can run anywhere, or only when your agent runs inside the managed Service? Because if governance is a property of the Service, of the runtime you don’t own, then control by containment and control by governance are mutually exclusive by construction. That’s the thing I need to know before I write a line of the harness.

Why I’m Testing It With a Coding Agent

A business agent built in Copilot Studio is the easy case for Microsoft’s governance. It calls Graph through connectors, lives inside the tenant, and never touches a shell. Of course the control plane can govern it: the platform built it.

A coding agent on pi is the hard case, which is exactly why it’s the right probe.

It runs with maximal privilege (shell, filesystem, arbitrary code execution), not the bounded surface of a connector. It runs outside the managed service, on a devbox and in CI. It self-extends at runtime, writing its own tools mid-task, so its capability set isn’t fixed at provisioning. And it’s embeddable as a downstream node in an agent graph, reachable by other agents over A2A.

That’s the factory’s Execution Environment and its Identity actor pushed to the limit. If Microsoft’s governance can hold a leash on an agent that privileged, running that far outside, mutating that fast, it can hold anything. If it breaks, it breaks here first, and I’d rather find the break this week than in a production change six months from now.

Control Means Enforcement. Visibility Means a Transcript.

The first question I asked in the prequel was whether Agent 365 is control or visibility. I have two break scenarios that decide it, and I’m going to put them in front of the session engineers directly.

The revocation test: the harness is mid-task, executing a git push. I revoke its Agent 365 permission while the push is in flight. Does the push die at the boundary, or does it complete and turn red on a dashboard thirty seconds later? Killing it is control. Coloring it red is visibility.

The inventory test: the harness runs npm install mid-session and now holds a capability it didn’t have when it was provisioned. Does the Agent 365 inventory reflect the new tool, or does it only know about what was registered at creation, blind to everything the agent grew into afterward? Tracking the drift is control. Listing the birth certificate is visibility.
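The drift I'm asking about is just a set difference between what was registered and what the agent can do now. A minimal sketch, assuming capabilities can be named as plain strings; the function names and the capability labels are my own illustration, not anything Agent 365 exposes.

```python
def capability_drift(registered: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare what was registered at provisioning with what the agent holds now."""
    return {
        "grown": observed - registered,    # acquired mid-session, absent from the birth certificate
        "dropped": registered - observed,  # registered at creation but no longer present
    }


def inventory_tracks_drift(inventory_view: set[str], observed: set[str]) -> bool:
    """True only if the inventory's current view covers everything the agent can do."""
    return observed <= inventory_view
```

With `registered = {"git", "read_file"}` and an observed set that adds a hypothetical `"npm:some-cli"`, the `grown` set contains exactly that one entry. An inventory that still returns only the registered set fails `inventory_tracks_drift`, which is the birth-certificate case.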

A transcript is useful. It is not a leash. I need to know which one I’m buying.

Does Purview Reach a Runtime I Brought Myself?

The prequel asked whether one Purview policy plane could govern both Studio-built and Foundry-built agents. The harness decision sharpens that into something harder.

When a pi-based agent makes a model call to a configured provider, that payload can carry proprietary code. Does Purview see it leaving (classify it, redact it, block it) or does Purview’s reach stop at the agent boundary, governing only the data flows that move across Microsoft’s own surfaces?

This is the gap I mapped in What witness.ai Doesn’t See: the most dangerous surface sits below the network layer, in the execution environment and the configuration, where conversation-level governance has no visibility. If Purview is a conversation-and-connector control, it stops exactly where the real risk starts.

A2A Is the Seam, and the Lock-In Tell

This is the highest-leverage thing I’ll learn all week.

A2A is the open protocol, the wire format for agents talking to agents, announced as a new era of interoperability. The question is what conformance actually buys. If the harness speaks A2A, does that alone make it a first-class governed peer (Agent 365 inventories it, Purview policy applies to it, Entra identity attaches to it) purely because it conforms?

Or does governance only travel with the SDK and the Service, so an A2A-conformant agent that didn’t adopt the framework and didn’t host in Foundry is just a stranger on the wire, visible but ungoverned?

If it’s the latter, A2A is open at the bottom and governance is the lock-in layer above it. That’s the exact pattern I traced in The Agent Control Plane War: the protocol commoditizes, the registry and the governance plane above it accumulate the switching costs. Governance-as-lock-in is the mechanism the whole control-versus-visibility question has been circling. A2A is where I find out if it’s real.

Identity: Ephemeral, or Sprawl at Job Granularity

The factory’s hardest primitive is identity: an agent that opens a PR needs a governed non-human identity and an audit trail that survives an examiner. I wrote about why your Okta federation can’t see it in Who Issued the Agent?

The test for Entra Agent ID is binding model. Does it bind to ephemeral execution, where a CI job spins up, gets a short-lived, scoped identity, and the identity expires when the job ends? Or does each agent instance require a durable service principal, so a thousand CI runs leave a thousand service principals behind: identity sprawl at job granularity, the machine-scale version of the exact problem federation already can’t govern?
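The two binding models differ in one field: whether the identity carries an expiry tied to the job. A minimal sketch under that assumption follows. `JobIdentity`, `issue_for_job`, and `live_identities` are hypothetical names, not Entra Agent ID constructs, and the clock is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobIdentity:
    """Hypothetical non-human identity bound to a single CI job."""
    subject: str
    scopes: frozenset[str]
    expires_at: float  # float("inf") models a durable service principal

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at

    def allows(self, scope: str, now: float) -> bool:
        return self.is_valid(now) and scope in self.scopes


def issue_for_job(job_id: int, scopes: set[str], ttl_seconds: float, now: float) -> JobIdentity:
    """Issue a scoped identity that expires ttl_seconds after issuance."""
    return JobIdentity(
        subject=f"ci-job/{job_id}",
        scopes=frozenset(scopes),
        expires_at=now + ttl_seconds,
    )


def live_identities(identities: list[JobIdentity], now: float) -> list[JobIdentity]:
    """Identities that still need to be governed at time `now`."""
    return [identity for identity in identities if identity.is_valid(now)]
```

Issue a thousand identities with a ten-minute TTL and, once the jobs are long finished, `live_identities` returns nothing. Issue the same thousand with an infinite TTL and all thousand are still there to inventory, review, and revoke. That residue is the sprawl.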

A coding factory runs agents by the thousands, briefly. If the identity model assumes durable actors, it’s modeling the wrong thing.

The No-Fluff Test Still Stands

Microsoft promised a no-fluff event. The signal won’t be in the keynote. It’ll be in the L-400 tracks and the in-person lightning talks, where a sophisticated room asks “but what about the scenario where…” and the answer is either a clean mechanism or a moment of honest uncertainty.

I’m calibrating one distinction the whole time: “generally available” versus “works in production across thousands of tenants with years of identity-configuration debt.” The boring infrastructure (clean revocation, audit completeness) sometimes ships ahead of the marketing. Other things take eighteen months past GA to mean what the keynote said. I’m there to learn which is which.

The Decision It Forces

Strip everything else away and Build resolves a binary.

Either a thin harness you control plus external governance is real: Microsoft lets its control plane, its policy engine, and its identity model attach to a runtime you own, through open conformance rather than runtime captivity. The strategy works, and the harness is worth building.

Or governance only travels with the SDK and the managed Service, and I’m forced to choose. Control by containment, and forfeit the platform’s governance. Or control by governance, and give up the harness that I already argued is the moat.

That’s the same fork the factory post closed on (own your chassis, or run someone else’s finished factory), with the one variable I couldn’t resolve from the outside: whether owning the chassis still lets you buy into the governance, or costs you it.

I don’t know the answer yet. That’s the honest state of it. I have a strong prior and two break scenarios I’m going to run against real engineers in a real room.

The verdict, which way Build actually pointed, lands in the next post, after June 3.

If you’re at Fort Mason working the same seam, whether platform governance can reach a runtime you own, I want to compare notes before the verdict gets written. The companion read is Six Primitives for a Code Factory: the harness this whole question is about.

Find me on X @orestesgarcia or LinkedIn /in/setsero.