When Your Team Starts Building
Nobody in the room could explain what it did.
One engineer was automating the MuleSoft API spec lifecycle — update the spec, publish it to the design portal, update the ticket. He’d loaded Claude Code with context about the project: the application inventory, the naming conventions, the existing skill structure. He described the goal and let it run. Claude asked him for his credentials. Then it navigated to Anypoint Design Center on its own, listed every spec in the registry, and cross-referenced them against the existing application catalog — a mapping no one had explicitly described. When the room absorbed what had just happened, one person said: “The only guy I know who understands how this works is Karpathy. And even he says he has no clue.”
That’s where we are.
What the Follow-On Prompt Produced
Three weeks ago I closed a team session with a single assignment: from everything you documented, pick one step you’d hand to an agent first. Not the whole workflow. One step.
The previous session had shown me five different mental models — who was already building, who had the domain knowledge to design, who knew exactly where they hurt. The exercise was diagnostic. The follow-on prompt was the first move toward execution.
What came back was not uniform, not polished, and not what I expected. Five engineers, five implementations, five different stages of completion — all in a real enterprise environment with real deadlines, real systems, and real compliance constraints. Not a proof of concept scheduled for a demo. Work in progress.
Five Builders
The Automator. The first engineer presented something already running in production for two repositories. A GitHub Actions workflow that triggers on every pull request, runs Claude Code against the diff, compares the changes to the team’s Confluence-documented standards — naming conventions, data masking requirements, commit message formats — and posts a structured review comment before a human reviewer opens the tab.
His offshore colleagues now get that feedback in seconds instead of waiting until 8am. The challenge he hit wasn’t Claude; it was the authentication setup in GitHub’s CI context. Getting the right token flow took longer than the automation itself. The feedback I gave him: encapsulate the review prompt as a skill before propagating to the other 27 repositories. Centralizing a reusable workflow YAML the same way you’d centralize any shared pipeline configuration is the difference between a working tool and 27 tools that drift apart. Claude Code also has a built-in /install-github-app command that sets up the GitHub Actions integration for a repository; that’s the right starting point once the skill is solid.
The hardest part of his demo wasn’t the automation. It was resisting the urge to ship before it was stable.
The Planner. The second engineer came back with a structured project skeleton generated by Claude Code from a description of the workflow he wanted to build: aggregate requirements from Jira and ServiceNow, consolidate them into a daily digest, route them to the right people. He had the folder structure, the module separation, the file naming — all scaffolded from a description and a conversation.
He ran into one meaningful problem: his first prompt produced a web application. He didn’t want a UI. He wanted an agentic workflow — something that runs, fetches, correlates, and delivers, without a browser in the loop. He refined the prompt until the shape of the output matched the shape of the solution he needed. The feedback I gave him: the Jira integration is already solved. We have two working options in the team marketplace — an Anthropic-official skill via MCP, and our own CLI-based implementation that connects to both Jira and Confluence. The new ground is ServiceNow. We have nothing there. That’s where his time adds something the team doesn’t already have.
Directing an agent toward the gaps, not the solved problems — that’s the skill.
The Spec Wrangler. This is the engineer who produced the moment nobody could explain.
He’s been building the MuleSoft API spec automation in stages: read the spec, apply changes based on requirements, publish the updated version to Design Center, notify the ticket. A workflow that currently costs a senior engineer an hour per change, sometimes more. He’d prepared context files — the runtime application inventory, the naming prefixes, the existing skill structure from a related project. He showed Claude what was already there and asked it to build something analogous for the spec layer.
What Claude actually did was go further. It identified the authentication pattern for Anypoint Platform on its own, asked for credentials, connected to Design Center via its API, pulled the full spec registry, and started cross-referencing. He got to the end of the demo uncertain exactly what decision Claude had made to take that path.
This is the implementation gap: the distance between what you described and what the agent executed. It’s different from the context accumulation gap I described in the previous post. Here, the agent didn’t do exactly what was planned — but what it did was useful. Learning to work with that gap, to read it rather than fight it, turns out to be its own engineering discipline.
The New Hire. One engineer on the team joined recently. Thirty years in the industry. First AI-assisted onboarding.
He was honest at the previous session: he doesn’t have a defined pipeline of incoming work yet. He’s in the absorption phase — learning the codebase, learning the team’s tools, building a picture of what we have before committing to what he wants to build. What he presented wasn’t a finished skill. It was a pattern.
He sends roughly 90% of his tasks to an agent first. Not primarily for coding — for orientation. One example: he asked Claude Code what pull requests he had pending and when the last pipeline had run for a specific resource. Claude pulled from three separate skills — GitHub, Jira, and Azure DevOps — without being told to coordinate them. An investigation that would have taken 15 to 20 minutes of portal-clicking came back in seconds. He’d already worked out the answer before he went to confirm it with a colleague.
He’s been building context in days instead of weeks. My suggestion to him: turn this into a ramp-up skill. Not a tutorial. A structured skill with workflows that answer the questions every new engineer asks in the first 30 days. We onboard consultants regularly. If we can compress orientation with a well-built skill, that’s time we stop paying twice.
The Investigator. The most production-ready implementation came from the engineer closest to the support function.
He built an investigation skill on top of the team’s Datadog MCP connection. The use case is one the team handles several times a week: a user calls in with a problem, and someone has to trace what happened in the system. Before this skill, that meant navigating Datadog manually — filtering by time window, correlating session events, cross-referencing error codes against system documentation. The skill does all of it from a single prompt. Give it an account number, a reference ID, or a date range. It reconstructs the full session timeline, maps events in sequence, and explains what happened.
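The core of that reconstruction can be sketched in a few lines. This is a minimal illustration, not the engineer’s skill: the LogEvent shape, its field names, and the helper functions are all hypothetical, and in practice the events would come from the Datadog MCP connection rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogEvent:
    # Hypothetical event shape; real log records will carry different fields.
    timestamp: datetime
    reference_id: str
    message: str
    error_code: Optional[str] = None


def build_timeline(events: list[LogEvent], reference_id: str) -> list[LogEvent]:
    """Return the events for one reference ID, ordered oldest first."""
    matching = [e for e in events if e.reference_id == reference_id]
    return sorted(matching, key=lambda e: e.timestamp)


def first_error(timeline: list[LogEvent]) -> Optional[LogEvent]:
    """Return the earliest event that carries an error code, if any."""
    return next((e for e in timeline if e.error_code is not None), None)
```

Filtering first and sorting second keeps the timeline scoped to a single session, which is what makes the sequence of events readable when the answer has to be explained to someone else.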
He demoed a real case from the previous day. A user had submitted a stop payment and received a generic error. The skill traced the session, found error C90 — a catch-all from the payment processor — and identified the root cause: the payee name was 44 characters. The field limit was 25. Resolution took seconds. The same investigation, done manually, would have pulled three engineers into a channel for the better part of a morning.
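The root cause in that case reduces to a length check. A minimal sketch, assuming the 25-character limit from the demo; the function name and the wording of the message are hypothetical, and nothing here reflects how the payment processor itself validates input.

```python
from typing import Optional

PAYEE_NAME_MAX_LENGTH = 25  # field limit cited in the demo case


def check_payee_name(name: str, max_length: int = PAYEE_NAME_MAX_LENGTH) -> Optional[str]:
    """Return a specific error message if the name is too long, else None."""
    if len(name) > max_length:
        return (
            f"Payee name is {len(name)} characters; "
            f"the field limit is {max_length}."
        )
    return None
```

A check like this, run before submission, would surface the specific cause instead of the processor’s catch-all code.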
He has 26 memory files, a skill for each transaction type, and documentation written to the team’s standard. It’s ready to share.

The Wall Everyone Hit
Here’s what I noticed across all five presentations: every implementation eventually ran into the same question.
How do I share this?
The automator needs to propagate to 27 repositories. The planner wants to build something the whole team can install. The spec wrangler wants his context files and skill to live in a shared project. The investigator wants his 26 memory files and skills out of his personal environment and into a format anyone on the team can run. Even the new hire ended up with an assignment: codify your onboarding pattern so the next hire benefits from it.
Every builder, independently, arrived at the distribution problem. That’s not a coincidence — it’s the structural problem that follows every successful implementation.
Personal tool → team tool → team marketplace → enterprise capability. The path is the same every time. What changes is how soon a team invests in the infrastructure that makes the transition possible. The organizational barriers to AI adoption aren’t usually the technology — they’re the systems and standards that decide what gets shared and what stays siloed. Without a marketplace, without a review process, without a standard for what a publishable skill looks like, the default is accretion: everybody’s best tools stay on everybody’s own machines until they’re too outdated to run.
The session ended with a direct action item: build a team standard for skill creation. What structure does a publishable skill follow? What review process does it go through? That’s the agenda for the next session — not more implementations, but the infrastructure that turns implementations into shared capability.
The Ramp-Up Question
I want to return to the new hire’s experience, because it tends to get overlooked when you’re surrounded by engineers who are deep in the tooling.
He’s been in the industry for thirty years. He knows how to ramp up. His previous experience: two engineers, a few hours of shadowing daily, weeks before he could move independently. This time: Claude Code with the right skills loaded. His words: “I’ve been able to absorb what would have taken weeks in a couple of days — not at the depth of someone who’s been here for years, but at the level I need to start contributing.” He can hold a meaningful conversation about the system. He can assign tasks without being an expert in every component.
That’s a structurally different kind of onboarding. And it points at something I want to document as a formal artifact — not a wiki page, not a slide deck, but a working ramp-up skill with executable workflows: where does code live, how does a change move from local to production, what naming conventions apply, who owns what. The kind of thing that’s automatically useful the day someone joins, and automatically decays in usefulness as answers change — which means it also works as a forcing function for keeping the documentation current.
The team that builds this has a structural advantage. Not in the interview room. In week three.
The Question That Had to Be Asked
During the investigator’s demo, another engineer raised his hand: if you give the skill an account number and it reconstructs the full user session, can you identify the individual?
It was the right question. It needed to be asked. And it surfaced the thing that requires the most care in a regulated environment.
The answer, in this case: our data masking happens at the application level, before logs reach Datadog. Personally identifiable information — names, addresses, identifiers — is stripped at the source. What reaches the observability platform is session data, error codes, reference IDs. The skill can identify what happened. It can’t identify who, by design.
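Stripping at the source can be as simple as dropping sensitive fields before a record is handed to the logger. A minimal sketch under stated assumptions: the field names in PII_FIELDS and the record shape are hypothetical, and this is not a description of how our applications actually implement masking.

```python
# Hypothetical field names; a real deny list would come from the team's
# data classification standard.
PII_FIELDS = frozenset({"name", "address", "national_id"})


def strip_pii(record: dict[str, object]) -> dict[str, object]:
    """Return a copy of the record without any PII fields."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}
```

Because the function returns a new dictionary, the original record stays intact for the application while only the stripped copy travels to the observability platform.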
But the question mattered more than the answer. An engineer who raises it during a live demo (not in a compliance review six months later, not after an incident) is a sign of exactly the culture that makes AI adoption in a bank sustainable. The instinct to ask “but what about PII?” doesn’t need to be trained into people who’ve been working in financial services long enough. What needs to exist is an environment where that instinct is acted on, in the moment, before the tool ships. This conversation happened in the working session. That’s where it belongs.
The Infrastructure of Intelligence
The pattern across all five implementations runs the same way: articulation enabled action, action produced a working tool, and the working tool immediately revealed a distribution problem.
That sequence isn’t unique to this team. It’s what a real AI transformation looks like at the session level. The first skill is always personal. The first useful skill always wants to become a team resource. The question is whether the team has the infrastructure to support that transition, or whether useful tools stay siloed until they’re no longer useful to the one person who built them.
The next session is about building that standard. That’s the work that turns five individual builders into a team that compounds.
If you’re running an AI transformation in a regulated environment and hitting the same distribution wall — how skills move from personal to shared — I’d be curious what you’re finding. Find me on X @orestesgarcia or LinkedIn /in/setsero.
Related: Same Question, Different Worlds — the session that set this in motion, and what the five different mental models revealed about where the team actually stands.