
Author: Aaron Rea, Principal Consultant, Cloud Security
I just got back from Google Cloud Next, and after three days of “agentic” being said roughly every eleven seconds, one thing is clear. We are back in a platform-building moment. Not a “Gemini is cool” moment. Not a “let me bolt an LLM onto Lambda” moment. A full-on, this-is-going-to-eat-the-next-five-years platform moment, the same flavor as when containers stopped being a Docker party trick and turned into Kubernetes eating the data center.
The headline was the Gemini Enterprise Agent Platform, Google’s evolution of Vertex AI plus a constellation of new primitives: Agent Studio, Agent Runtime, Agent Identity, Agent Gateway, Agent Observability, Agent Sandbox, Agent Memory Bank, Agent Threat Detection. If you read that list and your brain auto-completed “this is Kubernetes for agents,” you and I are reading the same tea leaves.
The DIY vs. COTS tightrope
Here’s the tension every cloud and platform team is walking into in 2026, whether they’ve named it or not.
On one side, DIY: roll your own agent runtime on raw GKE, wire up your own vector store, glue MCP servers to your service mesh, write your own gateway, build your own audit pipeline. You’ll learn a lot. You’ll also be eighteen months behind, and you’ll own a snowflake your security team has to defend and your auditors have to certify.
On the other side, fully-baked COTS: buy the platform end-to-end, click through the studio, ship to prod, accept whatever lock-in and opaque behavior comes with it. It works until it doesn’t, and now you’re explaining to a regulator why you can’t tell them which model approved a transaction.
The sustainable answer is the middle path: build a platform. Not a magical AI platform. A boring, opinionated platform that picks the abstractions you want to be portable across, lets you swap models and runtimes underneath, enforces your security and compliance posture by default, and gives your app teams a paved road that doesn’t require them to learn what gVisor is. Same playbook we ran with K8s a decade ago. Nobody actually wanted to manage etcd; everyone wanted the abstractions on top.
For regulated industries, this isn’t a preference. It’s the only path that produces an auditable, governable agent estate.
What actually shipped that matters
A lot of the 260+ announcements were product marketing, but several are genuinely platform-shaped and worth real attention.
GKE Agent Sandbox. The standout. gVisor-based kernel isolation, the same tech protecting Gemini, capable of 300 sandboxes per second per cluster at sub-second latency, with up to 30% better price-performance on Axion vs. other hyperscalers. If you’re going to let LLM-generated code execute against your infrastructure, traditional container isolation isn’t sufficient, and Agent Sandbox is the first credible primitive that treats “ephemeral, untrusted, single-replica workload” as a first-class K8s concept. It’s based on an open-source controller, so the abstraction is portable across clouds. That matters for any compliance regime that requires demonstrable workload isolation.
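Agent Sandbox ships with its own open-source controller and CRDs, whose surface I won’t guess at here; the portable primitive underneath is the one GKE Sandbox already exposes today: a Pod that requests the gVisor RuntimeClass. A minimal sketch with the Python Kubernetes client, assuming a cluster with GKE Sandbox enabled; the namespace and image are placeholders of mine:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run from a controller

# An ephemeral, single-shot pod for untrusted, agent-generated code.
# runtime_class_name="gvisor" is what GKE Sandbox uses to put the workload
# behind a user-space kernel instead of sharing the host kernel directly.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="agent-exec-demo",
        labels={"app": "agent-exec", "lifecycle": "ephemeral"},
    ),
    spec=client.V1PodSpec(
        runtime_class_name="gvisor",
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="untrusted-exec",
                image="python:3.12-slim",
                command=["python", "-c", "print('hello from inside the sandbox')"],
                resources=client.V1ResourceRequirements(
                    limits={"cpu": "500m", "memory": "256Mi"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="agents", body=pod)
```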
Agent Gateway and Agent Identity. These are the unsexy announcements that signal Google is taking enterprise seriously. Every agent gets a workload identity, every action runs through a unified policy enforcement point, every interaction emits audit telemetry. If you’ve ever tried to retrofit identity onto a fleet of microservices that grew up without it, you know why doing this on day one is the difference between a platform and a liability. For SOC 2, HIPAA, PCI, and FedRAMP environments, this is the foundation that makes agent workloads defensible.
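To be clear, the sketch below is not the Agent Gateway API (I haven’t seen its surface yet); it’s a minimal Python illustration of the control this paragraph describes, and every name in it (AgentIdentity, authorize_tool_call, the allowlist) is an assumption of mine. The shape is what matters: one enforcement point, a per-agent identity, and an audit record for every decision.

```python
import hashlib
import json
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str              # the agent's workload identity principal
    allowed_tools: frozenset   # tools this agent's policy permits


def authorize_tool_call(identity: AgentIdentity, tool: str, args: dict) -> bool:
    """One enforcement point: every tool call is checked against policy and audited."""
    allowed = tool in identity.allowed_tools
    audit_event = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "agent_id": identity.agent_id,
        "tool": tool,
        # Digest, not raw args, so the audit trail doesn't become a DLP problem itself.
        "args_sha256": hashlib.sha256(
            json.dumps(args, sort_keys=True).encode()
        ).hexdigest(),
        "decision": "allow" if allowed else "deny",
    }
    print(json.dumps(audit_event))  # stand-in for a real audit sink (SIEM, log bucket)
    return allowed


# Usage: an agent with a narrow allowlist tries something out of scope and gets denied.
billing_agent = AgentIdentity("billing-agent@prod.iam.example", frozenset({"lookup_invoice"}))
authorize_tool_call(billing_agent, "issue_refund", {"invoice": "INV-1042", "amount": 125.00})
```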
TPU 8t / 8i split and Axion N4A going GA. The split into training-optimized (8t) and inference-optimized (8i, with claimed 80% better performance per dollar) is honest about something the industry has been quiet about: the workloads are different, and one chip serving both leaves money and latency on the table. Pair that with Axion N4A delivering up to 2x better price-performance than current-gen x86 for cost-sensitive workloads, and the economic case for putting agent glue code on Arm becomes hard to argue with.
llm-d as a CNCF Sandbox project. A buried but important development. Google is a founding contributor alongside Red Hat, IBM Research, CoreWeave, and NVIDIA, with a stated vision of “any model, any accelerator, any cloud.” If that delivers, distributed inference becomes a portable Kubernetes-native primitive instead of a vendor-locked feature, and CNCF governance gives multi-cloud strategies a real anchor point. That’s the K8s parallel: when CNCF picked a winner in container orchestration, the industry consolidated and the platform layer above it became where differentiation happened.
Wiz + Google Threat Intelligence integration. A direction worth watching closely. The combined platform brings Wiz’s Cloud and AI Security Platform together with Google’s threat intel, and the new AI Application Protection Platform claims code-to-cloud-to-runtime coverage. The acquisition is starting to produce a coherent story for AI workload security, especially for organizations operating across multi-cloud and hybrid footprints.
The K8s parallel, in case it isn’t obvious
Around 2015 you had three camps. Camp A wrote their own Mesos clusters and bespoke schedulers because they were big enough and had the engineering capacity. Camp B bought a PaaS (Heroku, Cloud Foundry, OpenShift early days) and accepted the lock-in. Camp C bet on Kubernetes, which at the time was unfinished, underdocumented, and required deep operational knowledge to run safely.
Camp C ultimately won. Not because K8s was elegant (it wasn’t, and arguably still isn’t), but because it was open enough to invest in, opinionated enough to hide what teams didn’t want to learn, and supported by an ecosystem that compounded year over year. Every serious infrastructure team eventually built their own internal platform on top: paved roads, golden paths, opinionated CD pipelines, internal developer portals. That layer is where the differentiation lived. It’s also where the operational maturity, the security posture, and the compliance evidence lived.
We’re in the same spot with agents. The primitives Google announced (sandboxes, runtimes, identity, gateways, memory, observability) are the new kubelets, ingress controllers, and service meshes. They’re useful. They’re also not a platform. The platform is what gets built on top of them: the opinionated agent SDK your dev teams actually use, the policy bundles that wrap Agent Gateway with your auth and DLP rules, the CI pipeline that runs Agent Simulation against synthetic traffic before anything hits prod, the runbooks and SOC playbooks for when an agent misbehaves at 3am.
If your organization’s plan for the next 18 months is “let each team pick their own framework and ship,” you’re going to relive the fragmentation pain of raw Docker in 2016. Pick your abstractions, build the paved road, and make the easy thing also the secure, observable, and auditable thing.
A quick word on Vegas and Weezer
Credit where it’s due: Next on the south end of the Strip was the most manageable big cloud conference I’ve done in years. Everything within walking distance, no shuttle bus roulette, no hour-long marches through three casinos to get to a breakout. Other large cloud conferences could learn a thing or two. And Weezer at Allegiant on Tuesday was exactly what 40,000 conference badges needed to remember they’re human beings. Rivers Cuomo’s voice is somehow unchanged since 1996. I have theories.
What to do now that we’re a week out
A week back at your desk, the keynote afterglow has faded, and the question is what actually changes in your roadmap. Four things worth committing to:
- Pick your abstraction layer and commit. GKE plus Agent Sandbox is a defensible bet because the controller is open-source and the primitives map to K8s concepts your team already understands. If you’re on EKS or AKS, watch llm-d closely. That’s your portability story.
- Treat agent identity and governance as table stakes. Every agent gets a workload identity, every action gets audited, every tool call goes through a gateway. Do this before you have 50 agents in production, not after. Your auditors will ask, and “we’ll get to it” is not an answer.
- Build the simulation and eval pipeline before you build the agents. Agent Simulation and Agent Optimizer are interesting; what’s more interesting is having a CI gate that says “this agent regressed on the eval suite, no merge.” That’s a 2026 problem you can solve with 2018 tools; a sketch of that gate follows this list.
- Match your buy-vs-build choices to where your differentiation actually lives. Buy the primitives that are commodity (compute, isolation, identity infrastructure). Invest your engineering effort on the platform layer that encodes your business logic, your security posture, and your operational expertise. That’s the layer that compounds.
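On the eval-gate bullet above, here’s what that gate can look like with 2018 tools: a hypothetical CI step that compares a candidate agent’s eval scores against a stored baseline and fails the build on regression. The file format, metric names, and tolerance are all placeholders of mine; the point is that nothing about it requires new machinery.

```python
import json
import sys

REGRESSION_TOLERANCE = 0.02  # absolute score drop we tolerate as noise; tune per suite


def load_scores(path: str) -> dict:
    # Expected shape (an assumption): {"task_accuracy": 0.91, "tool_call_validity": 0.98, ...}
    with open(path) as f:
        return json.load(f)


def main(baseline_path: str, candidate_path: str) -> int:
    baseline = load_scores(baseline_path)
    candidate = load_scores(candidate_path)
    regressions = {
        metric: (old, candidate.get(metric, 0.0))
        for metric, old in baseline.items()
        if candidate.get(metric, 0.0) < old - REGRESSION_TOLERANCE
    }
    for metric, (old, new) in sorted(regressions.items()):
        print(f"REGRESSION {metric}: {old:.3f} -> {new:.3f}")
    return 1 if regressions else 0  # nonzero exit code blocks the merge in CI


if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```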
The next decade of infrastructure is going to look a lot like the last one. A Cambrian explosion of vendor offerings, a slow consolidation around open primitives, and platform teams quietly doing the unsexy work of turning chaos into a paved road. The teams that win won’t be the ones who picked the right LLM. They’ll be the ones who built the platform that made the LLM choice swappable, the security posture inheritable, and the audit story tellable.
If that platform layer is on your roadmap, and you’re trying to figure out where to start, that conversation is one many of us in the cloud and security community are having right now. Worth having sooner rather than later.

