Google Cloud - Community - Medium

Anthropic Just Gave Agents ‘Dreams.’ Here’s How to Build Your Own on Google Cloud

Qingyue(Annie) Wang — Sat, 20 Jun 2026 04:59:36 GMT

This is Part 4 of a five-part series on building an AI agent with reflective memory on Google Cloud, from the core concept to a verified implementation.

TL;DR: Anthropic’s Dreams lets a managed agent sleep on past sessions and wake up with cleaner memory. On Google Cloud, you can build the same pattern yourself: keep the live session as evidence, let Memory Bank remember the user, let a Cloud Run Job reflect on completed task trajectories, store the lessons in Firestore, and retrieve both before the next action.

Anthropic recently gave managed agents a way to sleep on it. With Dreams, Claude reviews past sessions and an agent’s existing memory while the agent is idle, then produces a cleaner memory store for future sessions: duplicates merged, stale facts updated, contradictions resolved, and new patterns added.

That is a catchy product feature, but the architecture lesson is bigger than the product:

The agent works now. The harness reflects later. The next action starts with better memory.

This post builds that pattern on Google Cloud with components you own: Memory Bank, Firestore, embeddings, and a scheduled Cloud Run Job. There is one important difference from Anthropic’s managed Dreams: their feature curates a memory store from past sessions, while this architecture splits memory into two jobs:

learn the user: derive durable user facts from conversations;
learn the job: derive reusable lessons from completed task trajectories.

Both are reflective memory. They just run over different evidence.

The architecture

Dana opens a support ticket:

The billing page will not load.

To answer well, the agent needs two different memories. It needs to remember Dana: she is an administrator at ACME, on the Enterprise plan, and prefers concise troubleshooting. It also needs to remember the job: similar “page will not load” tickets often turn out to be missing roles, so permissions should be checked before generic browser advice.

Those memories come from different places, on different timelines:

the user facts come from conversations;
the job lesson comes from a completed support trajectory;
the current ticket lives in the active session.

The architecture is:

Keep the live session as evidence. Write user facts and job lessons through separate reflection paths. Retrieve both into one scoped context before the agent acts.

The database choices matter less than the separation of responsibilities:

Cloud SQL or managed Sessions keeps the active conversation durable.
Memory Bank manages user-facing long-term memory.
Firestore stores trajectories and derived job lessons.
Cloud Run Job runs the offline reflection process.
A recall tool merges scoped user facts and job lessons at runtime.

That is the harness doing memory.

Start with evidence, not memory

The live session is not long-term memory. It is evidence: messages, tool calls, intermediate state, and outcomes. ADK represents this through a SessionService; in production, that should be durable rather than in-memory.

This distinction prevents the common mistake:

A persisted transcript is not the same thing as useful memory.

A transcript records what happened. Reflective memory records what the system learned from what happened, so the harness keeps the session and derives smaller memory artifacts from it.

Write path one: Memory Bank remembers the user

The first write path is the managed one. At a deliberate lifecycle point, the application sends relevant conversation events to Memory Bank, which can generate and consolidate durable user facts under identity scope.

For Dana, that might become:

Dana is an administrator at ACME.
ACME is on the Enterprise plan.
Dana prefers concise troubleshooting steps.

This path matters because the final recall step needs user context, but it is not the hard part of this post. The harder part is the second write path: teaching the agent from completed work.

Write path two: the agent learns the job

The second reflection path begins after a task finishes. Suppose the support agent first suggests clearing the browser cache. That fails. Later, an account check finds the real cause: Dana is missing the billing role.

The harness records a structured trajectory:

agent_id: support_agent
user_id: dana
task: billing page will not load
actions: [suggest cache clear, check account, check billing role]
outcome: resolved
root_cause: missing billing role

The most important field is the outcome.

Without the outcome, reflection tends to produce vague advice like “diagnose the issue carefully.” With the root cause and resolution, reflection can produce a reusable lesson:

For permission-gated pages, verify account status and resource roles
before generic browser troubleshooting.

This is why the trajectory store should not be a raw transcript dump. It should record enough structure for later reflection to know what actually worked. In this architecture, the trajectory is written to Firestore and marked processed: false, which turns Firestore into the dream job's work queue.
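As a minimal sketch of such a record, assuming the field names from the example above plus a hypothetical created_at timestamp, the trajectory can be modeled as a small dataclass that serializes to a plain document:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class Trajectory:
    """One completed task, structured so later reflection knows what worked."""
    agent_id: str
    user_id: str
    task: str
    actions: list[str]
    outcome: str
    root_cause: str
    processed: bool = False  # unprocessed documents form the dream job's queue
    # created_at is an illustrative extra field, not part of the original record.
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_document(self) -> dict:
        """Return a plain dict suitable for a document store such as Firestore."""
        return asdict(self)


doc = Trajectory(
    agent_id="support_agent",
    user_id="dana",
    task="billing page will not load",
    actions=["suggest cache clear", "check account", "check billing role"],
    outcome="resolved",
    root_cause="missing billing role",
).to_document()
```

With the google-cloud-firestore client, a dict like this could be passed to a collection's add() method; the collection name and any indexing are left to your harness.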

The dream is a scheduled reflection job

The dream is the background process that turns completed trajectories into reusable job memory. Technically, it is a Cloud Run Job triggered by Cloud Scheduler. It does not serve live HTTP traffic; it starts, processes a batch, writes derived memories, and exits.

The job:

reads unprocessed trajectories;
selects the ones worth reflecting on;
asks Gemini to derive a lesson or procedure;
validates the result and applies scope;
embeds the approved memory;
writes it back to Firestore for retrieval;
marks the source trajectory as processed.

Scheduled reflection is a design choice, not a law. If an agent needs to retry immediately, reflection may happen synchronously between attempts. For many systems, though, batch reflection is better: it keeps expensive reasoning off the live path, supports consolidation, and gives you a place for review, confidence checks, and expiry.

Every derived lesson should keep:

its source trajectory;
its scope;
its confidence or review status;
its expiry or freshness policy.

That traceability matters because a reflective system can preserve a wrong conclusion as faithfully as a correct one.
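The job steps and the metadata above can be sketched together. This is an in-memory illustration under stated assumptions, not the actual job: derive_lesson stands in for the Gemini call, embed stands in for the embedding call, and the field names, the 90-day expiry, and the "pending" review status are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def run_dream(
    trajectories: list[dict],
    derive_lesson: Callable[[dict], str],
    embed: Callable[[str], list[float]],
    ttl_days: int = 90,
) -> list[dict]:
    """Turn unprocessed, resolved trajectories into scoped lesson records."""
    lessons = []
    now = datetime.now(timezone.utc)
    for traj in trajectories:
        if traj["processed"]:
            continue
        # Only reflect on trajectories with a known outcome and root cause.
        if traj["outcome"] == "resolved" and traj.get("root_cause"):
            text = derive_lesson(traj).strip()
            if text:  # minimal validation: reject empty output
                lessons.append({
                    "text": text,
                    "embedding": embed(text),
                    "source_trajectory": traj["id"],
                    "scope": {"agent_id": traj["agent_id"]},
                    "review_status": "pending",
                    "expires_at": (now + timedelta(days=ttl_days)).isoformat(),
                })
        traj["processed"] = True  # mark as handled even when skipped
    return lessons


# Stand-ins for the model call and the embedding call.
def fake_derive(traj: dict) -> str:
    return f"Check for '{traj['root_cause']}' before generic troubleshooting."


def fake_embed(text: str) -> list[float]:
    return [float(len(text)), 1.0]


queue = [
    {"id": "t1", "agent_id": "support_agent", "outcome": "resolved",
     "root_cause": "missing billing role", "processed": False},
    {"id": "t2", "agent_id": "support_agent", "outcome": "abandoned",
     "root_cause": None, "processed": False},
]
lessons = run_dream(queue, fake_derive, fake_embed)
```

Injecting the model and embedding calls as plain functions keeps the loop testable without cloud credentials; a real job would read from and write back to Firestore instead of mutating a list.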

Embeddings make lessons usable later

Future tickets will not repeat the exact same words. “Billing page is blank,” “I cannot open invoices,” and “the page will not load” may all need the same permissions-first procedure, so the dream embeds the approved lesson and stores the vector in Firestore. At retrieval time, the harness embeds the current query and uses Firestore vector search to find nearby lessons.

The durable rule is simple:

Store and query with the same embedding model, task convention, and vector dimensions.

The specific model can change. The architectural responsibility does not.
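To make the rule concrete, here is a small pure-Python sketch of nearest-lesson lookup that refuses to compare vectors of different dimensions. It only illustrates the idea; Firestore vector search would do the ranking server-side, and the three-dimensional vectors below are made-up values.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; refuses vectors of different dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec: list[float], lessons: list[dict], k: int = 1) -> list[dict]:
    """Return the k lessons whose embeddings are closest to the query."""
    ranked = sorted(
        lessons,
        key=lambda lesson: cosine(query_vec, lesson["embedding"]),
        reverse=True,
    )
    return ranked[:k]


lessons = [
    {"text": "Check roles before browser troubleshooting.",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "Restart the sync worker after schema changes.",
     "embedding": [0.0, 0.2, 0.9]},
]
top = nearest([0.8, 0.2, 0.1], lessons)
```

The dimension check is the part worth keeping in any implementation: a query embedded with a different model or output size should fail loudly rather than return plausible-looking neighbors.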

Read path: merge facts and lessons before acting

When Dana opens the next ticket, the agent should not choose between user memory and job memory. It needs both, and a custom recall tool can query both stores:

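async def recall(query, tool_context):
    # User facts come from Memory Bank through the ADK memory service.
    user_facts = await tool_context.search_memory(query)
    # Job lessons come from your own Firestore lookup. search_job_memory and
    # current_scope are defined by your harness; they are not ADK built-ins.
    job_lessons = search_job_memory(query, scope=current_scope)
    return {
        "user_facts": user_facts,
        "job_lessons": job_lessons,
    }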

Memory Bank returns context about Dana. Firestore returns the permissions-first lesson. The tool formats both into one scoped payload:

Customer context:
Dana is an Enterprise administrator at ACME.

Relevant job lesson:
For permission-gated pages, verify account status and resource roles
before generic browser troubleshooting.

Now the agent can make a better first move:

Let me check your billing role before we try browser troubleshooting.

This does not guarantee success. It changes the starting point, which is what reflective memory is supposed to do.

One implementation note matters: this is a composition pattern, not a special built-in ADK multi-store abstraction. ADK can wire Memory Bank as the memory service, but the Firestore lesson lookup is owned by your custom tool. Filtering, ranking, scoping, and formatting are harness responsibilities.

One guardrail matters here: similar does not mean allowed. A lesson can be relevant and still belong to the wrong user, tenant, or agent, so the recall tool should enforce scope before returning anything. Private memories stay private, agent-domain lessons stay within their domain, and shared lessons require stronger review. Do that filtering in the query and tool layer, not after the model has already seen the memory.
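One way to express that guardrail is to filter by scope before any similarity ranking happens. The sketch below assumes a hypothetical scope schema with three levels (private, agent, shared) and a review_status field; none of these names come from ADK or Firestore.

```python
def in_scope(lesson: dict, caller: dict) -> bool:
    """Return True only if the caller is allowed to see this lesson."""
    scope = lesson["scope"]
    if scope["level"] == "private":
        return scope["user_id"] == caller["user_id"]
    if scope["level"] == "agent":
        return scope["agent_id"] == caller["agent_id"]
    if scope["level"] == "shared":
        # Shared lessons need stronger review before anyone sees them.
        return lesson.get("review_status") == "approved"
    return False  # unknown scope levels are denied by default


def scoped_candidates(lessons: list[dict], caller: dict) -> list[dict]:
    """Filter first; similarity ranking should only ever see this list."""
    return [lesson for lesson in lessons if in_scope(lesson, caller)]


store = [
    {"text": "Dana's workaround", "scope": {"level": "private", "user_id": "dana"}},
    {"text": "Roles first", "scope": {"level": "agent", "agent_id": "support_agent"}},
    {"text": "Unreviewed tip", "scope": {"level": "shared"}, "review_status": "pending"},
    {"text": "Sales playbook", "scope": {"level": "agent", "agent_id": "sales_agent"}},
]
caller = {"user_id": "lee", "agent_id": "support_agent"}
visible = [lesson["text"] for lesson in scoped_candidates(store, caller)]
```

In a real store the same conditions would ideally be part of the query itself, so out-of-scope documents are never fetched at all.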

The loop you are building

The whole architecture has one rhythm:

The live session records what happens.
Memory Bank derives approved user facts.
Firestore records completed task trajectories and outcomes.
The dream derives scoped job lessons.
Retrieval merges facts and lessons before the next action.
The next action becomes new evidence.

The stack can change, the model can change, and the schedule can change. The durable design is the separation of evidence, derived knowledge, scope, and retrieval.

That is the build-your-own version of agent Dreams: not a hidden product feature, but a harness pattern you can inspect, govern, and verify. The final post turns this architecture into a build and proves that memory actually changes the agent’s next decision.

Next: Part 5 — Vibe Code Your Own Agent ‘Dreams’ in Antigravity

Anthropic Just Gave Agents ‘Dreams.’ Here’s How to Build Your Own on Google Cloud was originally published in Google Cloud - Community on Medium.

The Cloud Tetris: A Technical Deep Dive into GCP Sole-Tenant Node Usage Optimization

Dwij Sheth — Sat, 20 Jun 2026 04:59:29 GMT

Authored by: Dwij Sheth & Vinay Damle

The cloud is built on the promise of elasticity, but certain workloads demand dedicated hardware for strict compliance, licensing restrictions, or absolute performance isolation. In Google Cloud Platform (GCP), this is achieved using Sole-Tenant Nodes.

However, moving to dedicated hardware brings back a physical constraint: you pay for the entire server node you provision, regardless of whether you are utilizing 1% or 100% of its capacity.

Imagine you operate a massive fleet of cargo ships (Sole-Tenant Nodes) carrying shipping containers (Virtual Machines). Managing this fleet effectively means playing a high-stakes game of Tetris on the open ocean:

You pay the fuel and crew costs for a ship whether it is fully loaded or carrying just two containers.
As workloads are spun up and destroyed over time, your cargo capacity becomes highly fragmented with scattered gaps.
To drastically reduce costs, you must perfectly pack the remaining cargo so you can send empty ships back to port.

Google Cloud provides powerful primitives like Sole-Tenant Nodes and seamless Live Migration that empower teams to build highly customized orchestrators. Here is a technical deep-dive into how we built an automated Sole Tenancy Node Optimizer, creating a digital “harbor master” to play and win this game of Cloud Tetris.

Orchestrating Live Migrations: Transferring Cargo at Sea

In traditional logistics, transferring heavy containers between ships requires halting operations and docking at a port. In the cloud, halting an enterprise system is simply not an option; applications must remain completely online.

Google Cloud’s Live Migration feature acts as our mid-ocean transfer mechanism, allowing us to move workloads without disrupting the fleet:

Pre-Copy Phase: The hypervisor iteratively copies the VM’s active memory pages from the source ship (node) to the destination ship, meticulously tracking any memory changed during the transfer.
Blackout Phase: Once the memory states are nearly synchronized, the VM is paused for just milliseconds to transfer the final state and device registers.
Resumption: The VM resumes execution on the new hardware seamlessly, completely unaware it has been moved.

While this process is robust, it carries a time cost: a single Live Migration typically takes minutes from start to finish, so moving VMs one at a time across a massive fleet would take days. To consolidate the fleet quickly, our harbor master needed an efficient, automated strategy for moving dozens of containers simultaneously.
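A quick back-of-the-envelope calculation shows why parallelism matters. The fleet size, per-migration duration, and parallelism below are illustrative assumptions, not figures from our environment:

```python
import math

vm_count = 600              # assumed number of VMs to move
minutes_per_migration = 5   # assumed duration of one live migration
parallel_migrations = 30    # assumed number of concurrent migrations

# One at a time: every migration waits for the previous one.
sequential_hours = vm_count * minutes_per_migration / 60

# In parallel: migrations run in waves of `parallel_migrations`.
waves = math.ceil(vm_count / parallel_migrations)
parallel_hours = waves * minutes_per_migration / 60
```

Under these assumptions the sequential plan needs 50 hours of wall-clock time, while 20 parallel waves finish in under two hours.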

The Algorithm: Greedy Drain and Best-Fit Packing

We structured the orchestration engine around a four-step continuous loop: Discover State, Analyze & Plan, Live Migrate, and Refine. But the core intelligence lies in the planning phase, where the harbor master executes a Greedy Drain algorithm.

To maximize the number of empty ships we can decommission, the engine follows two strict rules:

Candidate Selection (The Greedy Drain): The engine actively searches for the “emptiest ships” first, sorting source nodes by minimum instance count and minimum utilization. These are the easiest to clear out completely.
Destination Selection (Best-Fit): Instead of spreading containers evenly, the engine force-packs cargo into the destination ships that are already the fullest (sorting by maximum utilization).
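The two rules can be sketched as a small planner. This is a simplified illustration, not our production engine: it tracks vCPUs only, treats "fullest" as "least free vCPUs" (which assumes equal-size nodes), and the Node class and plan_drain function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int                                       # total vCPUs on the node
    vms: dict[str, int] = field(default_factory=dict)   # VM name -> vCPUs

    @property
    def used(self) -> int:
        return sum(self.vms.values())

    @property
    def free(self) -> int:
        return self.capacity - self.used


def plan_drain(nodes: list[Node]) -> list[tuple[str, str, str]]:
    """Plan (vm, source, destination) moves that fully empty the emptiest nodes."""
    moves = []
    free = {n.name: n.free for n in nodes}   # simulated free pool
    drained: set[str] = set()
    receivers: set[str] = set()
    # Greedy drain: try the emptiest sources first.
    for src in sorted(nodes, key=lambda n: (len(n.vms), n.used)):
        if not src.vms or src.name in receivers:
            continue
        trial, tentative = dict(free), []
        for vm, size in src.vms.items():
            # Best-fit: fullest destination (least free space) that still fits.
            fits = [n for n in nodes
                    if n.name != src.name and n.name not in drained
                    and trial[n.name] >= size]
            if not fits:
                break
            dst = min(fits, key=lambda n: trial[n.name])
            trial[dst.name] -= size
            tentative.append((vm, src.name, dst.name))
        else:
            # Commit only when every VM on the source found a home.
            free = trial
            moves.extend(tentative)
            drained.add(src.name)
            receivers.update(d for _, _, d in tentative)
    return moves


fleet = [
    Node("A", 16, {"a1": 2}),
    Node("B", 16, {"b1": 4, "b2": 4}),
    Node("C", 16, {"c1": 4, "c2": 4, "c3": 4}),
]
moves = plan_drain(fleet)
```

Here node A is the emptiest ship, so its single VM is packed onto C, the fullest ship with room. B cannot be fully drained afterwards, so the planner leaves it alone rather than moving VMs without freeing a node.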

This logic quickly maximizes ships sitting at 100% capacity. But as any Tetris player knows, if you drop the wrong pieces early on, you create unfillable gaps later.

Preventing Physical Fragmentation: Large-VM Prioritization

Even with the Greedy Drain strategy, a new physical constraint emerged. If our automated cranes grabbed the smallest containers (e.g., 2-vCPU VMs) first, they would scatter them across the destination ships, filling up random deck slots.

Later, when the crane attempted to load a massive 32-vCPU database VM, no single ship had enough free capacity left to hold it, even if the combined free capacity across the entire fleet added up to hundreds of vCPUs. To prevent this, we implemented strict Large-VM Prioritization:

Before execution, the entire migration plan is sorted by VM size (vcpu and memory_mb) in descending order.
The massive, oversized containers are assigned to destination ships first, ensuring the bulk capacity is reserved for them.
Only then are the smaller crates loaded, which safely consume whatever fractional capacity is left over on the ships.
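The effect of the ordering is easy to reproduce. In this hedged sketch the VM names, sizes, and free capacities are invented, and placement considers vCPUs only; the sort key mirrors the (vcpu, memory_mb) descending order described above.

```python
from typing import Optional


def place_in_order(vms: list[dict], free_vcpu: dict[str, int]) -> dict[str, Optional[str]]:
    """Best-fit each VM in the given order; None means no node could hold it."""
    free = dict(free_vcpu)
    placement: dict[str, Optional[str]] = {}
    for vm in vms:
        fits = [node for node, f in free.items() if f >= vm["vcpu"]]
        if not fits:
            placement[vm["name"]] = None
            continue
        node = min(fits, key=lambda n: free[n])   # fullest node that fits
        free[node] -= vm["vcpu"]
        placement[vm["name"]] = node
    return placement


vms = [
    {"name": "web-1", "vcpu": 8, "memory_mb": 32768},
    {"name": "db-1", "vcpu": 32, "memory_mb": 131072},
    {"name": "db-2", "vcpu": 32, "memory_mb": 262144},
]
free_vcpu = {"node-x": 32, "node-y": 40}

# Arrival order: the small VM lands first and blocks a 32-vCPU slot.
arrival_order = place_in_order(vms, free_vcpu)

# Large-VM prioritization: sort by size, descending, before placing.
large_first = place_in_order(
    sorted(vms, key=lambda v: (v["vcpu"], v["memory_mb"]), reverse=True),
    free_vcpu,
)
```

Both runs have exactly 72 free vCPUs for 72 vCPUs of demand, yet only the large-first order places every VM; in arrival order, db-2 is left stranded.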

With the sorting logic perfected to avoid fragmentation, we fired up the automated cranes. However, running this at scale introduced a dangerous race condition on the docks.

The Dynamic Port: Managing Capacity Collisions

While our automated cranes are busy moving containers, the port remains open for normal business. The cloud is a dynamic environment, which introduces a new problem: unexpected double-booking.

Our optimizer discovers 16 vCPUs of available space on “Ship A” and schedules a crane to move a container there.
Before the crane can finish the move, a new container (an external workload from an autoscaler or developer) arrives at the port. The main GCP office immediately books it into that exact space on Ship A.
When our crane finally attempts to drop its load, the GCP API rejects the move with a 412 PRECONDITION FAILED (Insufficient Capacity) error.

Additionally, our own parallel cranes might try to claim the same space simultaneously. To prevent our own system from causing these collisions, we introduced an in-memory Resource Reservation System. When a thread plans a migration, it instantly deducts the reserved_vcpu from the destination node's simulated free pool so other threads don't double-book it.
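A reservation system of this kind can be sketched with a lock-protected dictionary. The class and method names below are hypothetical, and only vCPUs are tracked; the point is that check-and-deduct happens atomically so two threads cannot claim the same space.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ReservationBook:
    """In-memory view of free vCPUs per destination node, safe across threads."""

    def __init__(self, free_vcpu: dict[str, int]):
        self._free = dict(free_vcpu)
        self._lock = threading.Lock()

    def try_reserve(self, node: str, reserved_vcpu: int) -> bool:
        """Atomically deduct capacity; return False if it no longer fits."""
        with self._lock:
            if self._free.get(node, 0) < reserved_vcpu:
                return False
            self._free[node] -= reserved_vcpu
            return True

    def release(self, node: str, reserved_vcpu: int) -> None:
        """Give capacity back, for example after a failed migration."""
        with self._lock:
            self._free[node] += reserved_vcpu

    def free(self, node: str) -> int:
        with self._lock:
            return self._free[node]


# Eight threads race for 16 vCPUs in 8-vCPU chunks: exactly two can win.
book = ReservationBook({"ship-a": 16})
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: book.try_reserve("ship-a", 8), range(8)))
```

This only guards against collisions between the optimizer's own threads; it cannot see workloads placed by anything outside the process.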

However, we still needed a robust way to handle collisions caused by those unpredictable, external workloads.

Self-Healing State: The Multi-Pass Architecture

When a capacity collision occurs, the crane simply drops the container back onto the source ship. Rather than building a brittle, globally locked harbor schedule to prevent every possible external conflict, we designed the engine to embrace the dynamic nature of the cloud using a resilient Multi-Pass Architecture:

Execute the Pass: The engine lets the entire first optimization pass execute completely. The vast majority of parallel migrations succeed, but unpredictable external workloads might cause a few collisions.
The Refine Loop: Rather than halting the port or throwing a fatal error, the engine finishes its first pass and then automatically begins a fresh “Refine Loop.”
Re-Evaluate and Retry: It fetches the newly updated, post-collision state of the entire fleet, discovers which destination ships actually have real capacity remaining, and launches a second sweeping pass to route any stranded cargo to new open slots.
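The three steps amount to a bounded retry loop around discover, plan, and migrate. The sketch below is sequential for clarity (the real passes run migrations in parallel), and CapacityCollision, optimize, and the max_passes limit are illustrative names and choices rather than our actual implementation.

```python
from typing import Callable


class CapacityCollision(Exception):
    """Stand-in for the API rejecting a move because the space is gone."""


def optimize(
    discover: Callable[[], dict],
    plan: Callable[[dict], list],
    migrate: Callable[[object], None],
    max_passes: int = 3,
) -> int:
    """Run plan/execute passes until nothing is stranded; return passes used."""
    for pass_no in range(1, max_passes + 1):
        state = discover()          # always re-read the real fleet state
        moves = plan(state)
        if not moves:
            return pass_no - 1
        stranded = 0
        for move in moves:
            try:
                migrate(move)
            except CapacityCollision:
                stranded += 1       # VM stays on its source; retry next pass
        if stranded == 0:
            return pass_no
    return max_passes


# Simulated fleet: vm-2 collides with an external workload on the first try.
pending = ["vm-1", "vm-2", "vm-3"]
collided_once: set[str] = set()


def discover() -> dict:
    return {"pending": list(pending)}


def plan(state: dict) -> list:
    return state["pending"]


def migrate(vm) -> None:
    if vm == "vm-2" and vm not in collided_once:
        collided_once.add(vm)
        raise CapacityCollision(vm)
    pending.remove(vm)


passes = optimize(discover, plan, migrate)
```

Capping the number of passes keeps a persistently full fleet from turning the refine loop into an endless retry.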

This self-healing loop elegantly resolves capacity conflicts on the fly, ensuring that the fleet is fully optimized without needing to pause external cloud traffic. However, while the destination ships were now protected and resilient, the sheer volume of concurrent moves accidentally threatened to capsize the source ships.

Hardware Strain: Enforcing Source-Node Parallelism

While optimizing the destination, we inadvertently created a massive bottleneck on the source ships.

If the engine identified a source ship with 50 small containers and commanded 50 cranes to unload it simultaneously, the physical hardware strained under the load. Overwhelming the underlying hypervisor’s management plane and I/O during Live Migration risks host stability. You simply cannot attach fifty cranes to a single small vessel without capsizing it.

To mitigate this, we throttled the system using Source-Node Parallelism Limits:

Instead of using a global free-for-all thread pool, the engine assigns dedicated ThreadPoolExecutors per physical source node.
We enforce a strict concurrency cap, allowing a maximum of 3 active Live Migrations originating from a single source host at any given time.
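A minimal sketch of that throttle, using Python's standard ThreadPoolExecutor with one pool per source node. The fake_migrate function and the node and VM names are stand-ins; only the cap of 3 comes from the text above.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait

MAX_PER_SOURCE = 3  # at most 3 live migrations per source host

active = defaultdict(int)   # source node -> migrations currently running
peak = defaultdict(int)     # source node -> highest concurrency observed
lock = threading.Lock()


def fake_migrate(source: str, vm: str) -> str:
    """Stand-in for a live migration; records concurrency per source node."""
    with lock:
        active[source] += 1
        peak[source] = max(peak[source], active[source])
    time.sleep(0.01)  # pretend the migration takes a while
    with lock:
        active[source] -= 1
    return vm


plan = {"node-a": [f"a-{i}" for i in range(10)],
        "node-b": [f"b-{i}" for i in range(4)]}

# One executor per physical source node, each capped independently.
executors = {src: ThreadPoolExecutor(max_workers=MAX_PER_SOURCE) for src in plan}
futures = [executors[src].submit(fake_migrate, src, vm)
           for src, vms in plan.items() for vm in vms]
wait(futures)
for ex in executors.values():
    ex.shutdown()
moved = sorted(f.result() for f in futures)
```

Because each pool has its own worker limit, a source with many small VMs cannot monopolize the fleet's migration capacity, and no single host ever sees more than three concurrent migrations.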

The throttles kept the physical hardware perfectly stable, but the open sea remains unpredictable. We still needed a way to handle sudden manifest changes mid-flight.

Conclusion: Winning the Cloud Tetris Game

Automating cloud efficiency requires treating infrastructure as a dynamic logistics puzzle that must account for mathematical bin-packing, concurrency safety, and the absolute physical limits of the underlying hardware.

By combining the power of GCP Live Migration with a Greedy Drain algorithm, in-memory resource reservations, and source-node parallelism constraints, we transformed a fragmented fleet of servers into a highly dense compute environment:

CPU and RAM packing efficiencies climbed dramatically across our active nodes.
Dozens of emptied ships were successfully returned to port.
By eliminating wasted capacity, our overall dedicated compute costs plummeted.

With Google Cloud’s powerful architectural primitives like Live Migration, even the strictest dedicated hardware environments can achieve true cloud elasticity, ensuring our enterprise customers always operate at peak performance and minimal cost.

The puzzle never really ends.

While our greedy-drain algorithm solved our immediate density challenges, the cloud is always evolving. How is your engineering team handling the physical constraints of dedicated compute? Have you built your own custom orchestrators, or do you rely entirely on out-of-the-box scheduling policies?

We’d love to hear your approach to solving these hardware puzzles at scale. Drop a comment below or connect with us (Dwij Sheth and Vinay Damle) on LinkedIn to keep the conversation going!

The Cloud Tetris: A Technical Deep Dive into GCP Sole-Tenant Node Usage Optimization was originally published in Google Cloud - Community on Medium.

OpenGravity: Turn Antigravity into your autonomous cloud agent, controlled from WhatsApp

Stéphane Giron — Sat, 20 Jun 2026 04:59:13 GMT

A hands-on guide to deploying your own AI assistant on a cloud VM, powered by Antigravity 2.0 and accessible from anywhere through WhatsApp.

From Hermes to Antigravity

A few days ago, I published a step-by-step guide on deploying the Hermes AI agent on GCP. The experience was genuinely exciting: having an autonomous agent running on a cloud VM, accessible through WhatsApp, felt like a glimpse into the future of how we’ll work with AI.

But as much as I enjoyed Hermes, something kept pulling me back. I’m a fan of Antigravity 2.0. With its deep integration with the development workflow and its ability to read, write, execute, and reason across an entire codebase, nothing else comes close for me. As a Google Developer Expert working for a Google reseller, I use Antigravity more and more, not only for coding but also for analysis and various other tasks.

So the question became obvious: why switch to an external agent platform when I could bring the power of Antigravity itself to the cloud?

That’s how OpenGravity was born.

The repository is public and available here: github.com/St3ph-fr/OpenGravity

Why this matters: The real benefits

Before we dive into the setup, let me share why this architecture is genuinely compelling, going far beyond just “it’s cool.”

🌍 Your Agent, Everywhere

With Antigravity running on a cloud VM and OpenGravity bridging it to WhatsApp, your agent becomes omnipresent. Waiting for a train? Ask it to check the status of a deployment. At lunch? Have it research a technical topic and summarize findings. In a meeting? Quickly trigger a script on your dev environment.

You don’t need a laptop; you don’t need SSH. You just text it.

💰 Cost-Effective: Use Your Existing Subscription

This is a detail that matters a lot in practice. When you use Antigravity through OpenGravity, you consume your AI Pro subscription quota, not per-token API charges. If you’re already paying for an AI Pro subscription, this is essentially “free” compute intelligence on top of a small VM cost. No surprise bills; no token counting.

🔒 A Safe Sandbox for Your Agent

Here’s something I learned from the Hermes deployment: having an agent run on a dedicated cloud VM is actually a significant safety advantage. The agent can freely execute commands, write files, install packages, compile code, and if anything goes wrong, it’s contained to that VM. Your personal machine stays untouched. The VM is disposable; that’s a real benefit when you’re letting an autonomous agent run tasks.

✋ You Stay in Control

Even though the agent runs autonomously, OpenGravity includes a human-in-the-loop approval system. When the agent wants to execute a sensitive command, it pauses and sends you a WhatsApp message asking for permission. You reply with a thumbs up 👍 or thumbs down 👎. Simple, fast, and secure.

📺 Demo

https://medium.com/media/63d4fa47e31a9a7578a8c7054ad0edc6/href

OpenGravity, setup guide

Step 1: Set up your cloud VM

If you followed my previous article on deploying Hermes on GCP, the foundation is exactly the same. You need a Linux VM running on a cloud provider.

What you need:

A GCP Compute Engine instance (or equivalent on AWS/Azure)
Recommended machine type: e2-medium (2 vCPUs, 4 GB RAM), which is sufficient for Antigravity and the WhatsApp bridge
OS: Ubuntu 22.04 LTS or Debian 12
Estimated cost: ~$25–35/month for an always-on instance

I won’t repeat the full VM creation steps here; the Hermes guide covers the GCP setup in detail (creating the instance, firewall rules, SSH access). Follow those steps to get your VM up and running, then come back here.

Step 2: Install Antigravity 2.0 on Linux

Once your VM is ready, head to the official download page: antigravity.google/download

Follow the Linux installation instructions on that page. The installation will set up:

The Antigravity CLI and agent runtime
The agentapi binary, which is used by OpenGravity to communicate with agent sessions
The configuration directory at ~/.gemini/config/

After installation, verify everything is working:

antigravity --version

Launch the Antigravity interface and authenticate with your Google account when prompted. This is where your AI Pro subscription comes in: the agent will use your subscription quota, not pay-per-token billing.

Antigravity on Linux VM hosted on Google Cloud Platform

Step 3: Install OpenGravity: Let the agent handle the setup

Here is where the magic of Antigravity 2.0 comes in. Instead of walking you through copy-pasting code block by code block, we are going to let Antigravity set up the entire integration itself.

GitHub repository OpenGravity

By using Antigravity to perform the setup, we also create a dedicated project. This centralizes all the files and configurations, making it extremely easy to manage, update, and extend the project in the future.

Launch and Prompt the Agent

Open the Antigravity 2.0 application on your machine (or start it via the terminal with the antigravity command). Create a new project for OpenGravity, and then prompt the agent with the following:

I want to set up the OpenGravity WhatsApp bridge sidecar and plugin. Here is the repository: https://github.com/St3ph-fr/OpenGravity.
Please clone it and follow the README to set up the sidecar files and the plugin configurations. I want to use pairing code authentication instead of QR code scanning for convenience, and my WhatsApp phone number is +33XXXXXXXXX.

The agent will read the repository code and automatically execute the required actions:

Clone the repository into your project space.
Copy the bridge sidecar files (bridge.js, sidecar.json, package.json) to the correct Antigravity sidecar directory at ~/.gemini/config/sidecars/whatsapp-bridge/.
Run npm install inside the sidecar folder to pull in all dependencies (like @whiskeysockets/baileys).
Copy the plugin/skill configuration files to your plugins folder at ~/.gemini/config/plugins/opengravity/.
Register the whatsapp-bridge sidecar in your global ~/.gemini/config/config.json file.
Configure the WHATSAPP_PHONE_NUMBER environment variable in the run configuration.

WhatsApp Pairing Code Authentication

Scanning QR codes in a remote headless terminal can be clunky. Thankfully, the WhatsApp bridge code supports generating a simple 8-character pairing code directly.

Since the agent has configured WHATSAPP_PHONE_NUMBER with your number, you can now launch the bridge manually once to request the pairing code:

cd ~/.gemini/config/sidecars/whatsapp-bridge
node bridge.js

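Instead of showing a QR code, the terminal logs will display your pairing code. The bridge prints the banner in French (“VOTRE CODE D'ASSOCIATION WHATSAPP” means “your WhatsApp pairing code”):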

=========================================
VOTRE CODE D'ASSOCIATION WHATSAPP : XXXX-XXXX
=========================================

On your phone:

Open WhatsApp → Settings → Linked Devices → Link with phone number instead (or Link a Device and select phone number pairing).
Enter the 8-character code displayed in your terminal.

Once WhatsApp shows the connection is established, press Ctrl+C to stop the manual process. The authentication state is saved in auth_session/, and the sidecar will run automatically in the background going forward.

Now restart Antigravity. The WhatsApp bridge sidecar will launch automatically as a background service.

Step 4: Start chatting with your agent

Open WhatsApp on your phone and navigate to your self-chat (the chat with your own number). Send a message:

Hello! What can you do?

Within seconds, your Antigravity agent will respond, right there in WhatsApp.

Conversation example with Antigravity from WhatsApp

The agent has the full power of Antigravity, including file system access, terminal execution, web search, code generation, and multi-step reasoning, all accessible from your phone.

Smart conversation management

OpenGravity handles conversation context automatically:

Within 1 hour: Your messages continue the same conversation thread. The agent remembers everything.
After 1 hour of inactivity: A fresh conversation is automatically started, accompanied by a system note so the agent knows the context has changed.
Project switching: If you change the projectId in the sidecar configuration, the next message automatically starts a conversation under the new project.
Manual control: Use /new to force a fresh start, or /reply to jump back into any previous conversation.
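The rules above boil down to a small decision function. The following Python sketch is only an illustration of that logic; it is not OpenGravity's code (the bridge itself is a Node.js sidecar), the function and parameter names are hypothetical, and the /reply command is left out.

```python
from datetime import datetime, timedelta
from typing import Optional

IDLE_LIMIT = timedelta(hours=1)


def needs_new_conversation(
    last_message_at: Optional[datetime],
    now: datetime,
    last_project: Optional[str],
    current_project: str,
    text: str,
) -> bool:
    """Decide whether the incoming message should start a fresh conversation."""
    if text.strip() == "/new":
        return True                       # manual reset
    if last_message_at is None:
        return True                       # first message ever
    if current_project != last_project:
        return True                       # projectId changed in the config
    # Otherwise, start fresh only after more than an hour of inactivity.
    return now - last_message_at > IDLE_LIMIT


t0 = datetime(2026, 6, 20, 9, 0)
same_thread = needs_new_conversation(t0, t0 + timedelta(minutes=30), "p1", "p1", "hi")
after_idle = needs_new_conversation(t0, t0 + timedelta(hours=2), "p1", "p1", "hi")
```

Keeping the decision in one pure function makes the timeout and project-switch behavior easy to test without a live WhatsApp session.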

Human-in-the-Loop Approvals

When the agent needs to execute a command, it sends you an approval request:

⚠️ [OpenGravity Authorization Request]
Tool: run_command
Command: npm run build
Reply with:
• 👍 to allow
• 👎 to deny

Just tap the emoji. The agent waits for your decision before proceeding. This keeps you in full control even when you’re not watching the terminal.
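Conceptually, the approval step maps a reply to one of three outcomes. This Python sketch is a hypothetical illustration of that mapping, not the bridge's actual implementation; the key design point is that anything other than an explicit thumbs up is never treated as permission.

```python
from typing import Optional


def parse_approval(reply: str) -> Optional[bool]:
    """Map a WhatsApp reply to allow (True), deny (False), or unknown (None)."""
    text = reply.strip()
    if text.startswith("👍"):
        return True
    if text.startswith("👎"):
        return False
    return None  # unrecognized reply: keep waiting, never assume approval


decisions = [parse_approval(r) for r in ["👍", " 👎 ", "ok go ahead"]]
```

Matching on the leading emoji also accepts skin-tone variants, since those are encoded as the base emoji followed by a modifier.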

WhatsApp conversation with image received from Antigravity

What’s next: Extending the agent’s capabilities

OpenGravity as it stands today is a functional, powerful bridge. But it’s just the beginning. Here’s what’s on the roadmap:

🔗 Google Workspace CLI integration

Connect the agent to your Google Workspace, including Gmail, Calendar, Drive, and Docs. Imagine telling your agent: “Schedule a meeting with the team for Thursday and draft the agenda in a Google Doc.” The plumbing exists; it just needs to be wired up.

⏱️ Background tasks & autonomous operations

Give the agent the ability to run scheduled tasks and long-running background jobs. Monitor a service, run nightly builds, generate daily reports, all without you sending a single message. The agent proactively does the work and notifies you when it’s done.

🧩 Extended skills & plugins

OpenGravity’s plugin system means you can teach your agent new capabilities. You can add custom skills for your specific workflow, such as deploying to staging, running test suites, parsing logs, and managing infrastructure. The skill files are simple markdown instructions that the agent follows.

The bigger picture: Agents as team members

I want to step back from the technical details for a moment and share what I truly believe is happening, and where projects like OpenGravity fit in the bigger picture.

Agents are the next fundamental shift

We’ve seen waves of transformation in tech: cloud computing, mobile, DevOps, and AI-assisted coding. Autonomous agents are the next one. Not chatbots, and not co-pilots that wait for you to press Tab. We are talking about real agents that can reason, plan, execute, and report back.

What I’m building toward

OpenGravity is a piece of a much larger puzzle. The vision is this:

An AI agent should be a real member of your team.

Think about what that means:

It needs to communicate. That’s what the WhatsApp integration solves: a natural, always-available communication channel. In the future, it could also be Slack, email, or any other channel your team uses.
It needs a “desk.” A cloud VM gives the agent its own workspace, consisting of its own file system, its own terminal, and its own environment, allowing it to work independently without stepping on anyone’s toes.
It needs an identity. This is the next frontier. Imagine the agent having its own Google Workspace account: a real email address, a calendar, and access to shared drives. It can receive tasks via email, update shared documents, and participate in the team’s workflow.
It needs a role. Just like any team member, the agent needs defined responsibilities, permissions, and skills. Antigravity’s skill system is the foundation for this; you define what the agent can do, how it should behave, and what guardrails to enforce.

The future of work with agents

I believe that within the next few years, we’ll see organizations where agents are integrated into teams alongside humans. We are not replacing people, but augmenting them. We will have an agent that handles the tedious operational tasks, monitors systems overnight, prepares research before a meeting, or drafts documentation after a sprint review.

For that to work, we need several things:

Communication channels: ✅ OpenGravity (WhatsApp today, more channels tomorrow)
An isolated workspace: ✅ Cloud VM with full system access
Tools and capabilities: ✅ Antigravity’s rich skill system
A persona and identity: 🔜 Google Workspace integration, team roles
Trust and safety: ✅ Human-in-the-loop approvals, sandboxed execution

This is a long-running effort. Building the right tools, defining the right skills, crafting the right persona; it all takes time. But I’m convinced this is the direction we’re heading, and OpenGravity is my first concrete step toward making it real.

Conclusion

OpenGravity gives me an always-on, always-available AI assistant that lives on a cloud VM and responds to my WhatsApp messages. It uses my existing Antigravity subscription, runs safely in an isolated environment, and keeps me in control through approval workflows.

If you’re an Antigravity user who wants to take your agent beyond the IDE and into the real world, give OpenGravity a try.

🔗 Repository: github.com/St3ph-fr/OpenGravity

The setup takes about 20 minutes; the possibilities are endless.

If you found this article useful, feel free to give it a clap 👏 and share it with others who might be interested in autonomous agent architectures. And if you build something cool with OpenGravity, I’d love to hear about it!

OpenGravity: Turn Antigravity into your autonomous cloud agent, controlled from WhatsApp was originally published in Google Cloud - Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

Building a Serverless, Multi-Source Data Ingestion Framework on GCP (Snowflake & Databricks to…

Prabha Arya — Sat, 20 Jun 2026 04:59:10 GMT

Building a Serverless, Multi-Source Data Ingestion Framework on GCP (Snowflake & Databricks to BigQuery)

Introduction

In the modern data era, organizations often find themselves managing data across multiple cloud platforms and data warehouses. A common challenge is consolidating this data into a single, centralized analytics hub like Google BigQuery.

Traditional ETL pipelines often require complex infrastructure management, leading to high operational overhead and spiraling costs. Managing persistent Spark clusters just to run periodic data copies is a prime example of resource waste.

To solve this, we built a Serverless, Multi-Source Data Ingestion Framework on Google Cloud Platform (GCP). This framework dynamically orchestrates parallel data copies from both Snowflake and Databricks (Unity Catalog) into BigQuery using GCP Workflows and Dataproc Serverless.

Here is how we designed it, the challenges we solved, and why this architecture is a game-changer for data engineering teams.

The Architecture: Orchestration Meets Serverless Compute

Our goal was simple: Zero cluster management, zero idle compute costs, and maximum parallelism.

To achieve this, we decoupled orchestration from compute:

Orchestrator (GCP Workflows): A serverless, lightweight orchestrator that parses a JSON configuration manifest, manages authentication secrets, and dynamically triggers parallel execution steps.
Compute (Dataproc Serverless): Instead of maintaining a running Spark cluster, we spin up ephemeral Spark jobs via Dataproc Serverless. We only pay for the exact seconds the data copy runs.

Here is the high-level architecture:

Why GCP Workflows?

Unlike Airflow (Cloud Composer), which requires a continuously running environment, GCP Workflows is completely serverless. It charges per step execution, making it incredibly cost-effective for orchestrating batch loads that run daily or hourly. It natively handles parallel loops, allowing us to ingest dozens of tables concurrently.

Key Engineering Challenges & Solutions

Building a generic framework that handles multiple sources is never just about “connecting A to B.” Here are the production-grade stabilization measures we built into the PySpark jobs:

1. Smart Incremental Loading (Deltas)

To minimize data transfer, the framework automatically detects if it should perform a full historical load or an incremental delta load.

It queries the target BigQuery table to find the maximum timestamp in the partition field.
If the table exists and contains data, it extracts only newer rows from the source (Snowflake/Databricks).
If the table is empty or doesn’t exist, it defaults to a full load.
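The decision above can be sketched in plain Python. This is a minimal illustration, not the framework's actual code: the function name `plan_load` and the shape of the returned filter are assumptions, and the maximum timestamp is assumed to have been fetched from BigQuery already.

```python
from datetime import datetime
from typing import Optional, Tuple


def plan_load(target_max_ts: Optional[datetime], partition_field: str) -> Tuple[str, Optional[str]]:
    """Return (load_type, source_filter) for one table.

    target_max_ts is the maximum partition timestamp found in the target
    table, or None when the table is missing or empty.
    """
    if target_max_ts is None:
        # No usable watermark: copy the full history.
        return "full", None
    # Watermark found: pull only rows strictly newer than it.
    predicate = f"{partition_field} > '{target_max_ts.isoformat(sep=' ')}'"
    return "incremental", predicate
```

The predicate string would then be pushed down into the Snowflake or Databricks read so that only the delta crosses the network.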

2. “Diversity Guards” Against Duplication

When dealing with incremental loads, source systems sometimes refresh a table completely (a snapshot) instead of appending. If you blindly run a delta load on a refreshed source, you risk duplicating rows.

Our Solution: The framework evaluates distinct timestamps in the source dataset. If it detects a single unified timestamp across the new data, it identifies this as a full snapshot refresh and automatically overrides the write mode to overwrite, preventing duplication.
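A rough sketch of that guard, assuming the timestamps of the newly extracted rows are available as an iterable. The name `resolve_write_mode` and the "append" default are illustrative, not taken from the framework.

```python
from typing import Hashable, Iterable


def resolve_write_mode(new_row_timestamps: Iterable[Hashable], requested_mode: str = "append") -> str:
    """Switch to overwrite when every new row carries the same timestamp."""
    distinct = set(new_row_timestamps)
    if len(distinct) == 1:
        # One unified timestamp across the batch looks like a snapshot refresh.
        return "overwrite"
    # Several timestamps (or no rows at all): keep the requested mode.
    return requested_mode
```

In a Spark job the same check would more likely be a distinct count on the partition column rather than a Python set.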

3. Handling Complex Data Types (Snowflake Variants)

Snowflake makes heavy use of the VARIANT type to store semi-structured JSON. When Spark reads this via JDBC, it often struggles to map it directly to Parquet, leading to writer class-cast exceptions.

Our Solution: We implemented programmatic schema inspection in PySpark. The job identifies MapType, ArrayType, or known Variant columns and automatically serializes them into clean JSON strings before writing to the staging area, ensuring smooth BigQuery loading.
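To make the idea concrete, here is a plain-Python stand-in that works on one row represented as a dictionary. The real job inspects the Spark schema instead; `stringify_nested` is a hypothetical helper used only for illustration.

```python
import json
from typing import Any, Dict


def stringify_nested(row: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize map-like and array-like values into JSON strings."""
    out: Dict[str, Any] = {}
    for name, value in row.items():
        if isinstance(value, (dict, list)):
            # Nested structure: store it as a JSON string column.
            out[name] = json.dumps(value, sort_keys=True)
        else:
            # Scalars pass through unchanged.
            out[name] = value
    return out
```

Once every nested column is a string, the staging files contain only flat types, which is what makes the subsequent BigQuery load predictable.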

4. Nullability & JVM Protection

Spark and BigQuery sometimes disagree on schema nullability. A source schema might mark a field as “non-nullable,” but the incoming data contains unexpected nulls. In Spark, this can trigger JVM NullPointer exceptions during view transformations.

Our Solution: We built a “Catalyst-Suppressed Nullability” utility that programmatically overrides nullability flags on strictly non-nullable source fields when preparing the Spark DataFrame schema, making the pipeline resilient to bad source data.
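The core of such a utility can be shown without Spark. The sketch below models a schema as a list of small dataclasses and flips every nullability flag to true; the `Field` type and `relax_nullability` name are assumptions for illustration. In PySpark the equivalent would be rebuilding the `StructType` with each `StructField` marked nullable.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Field:
    name: str
    dtype: str
    nullable: bool


def relax_nullability(schema: List[Field]) -> List[Field]:
    """Return a copy of the schema in which every field accepts nulls."""
    return [replace(f, nullable=True) for f in schema]
```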

5. Column Sanitization

BigQuery has strict rules about column names (e.g., no spaces, special characters, or system prefixes like _PARTITION).

Our Solution: The framework automatically sanitizes all column names, replacing invalid characters and renaming restricted prefixes to conform to BigQuery standards.
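A minimal sketch of what such sanitization might look like. The replacement character, the "col" prefix used for renaming, and the function name are assumptions; the reserved prefixes listed are the ones BigQuery documents for standard column names.

```python
import re

# Prefixes BigQuery does not allow at the start of a column name.
RESERVED_PREFIXES = (
    "_TABLE_", "_FILE_", "_PARTITION", "_ROW_TIMESTAMP", "__ROOT__", "_COLIDENTIFIER",
)


def sanitize_column(name: str) -> str:
    """Map an arbitrary source column name to a BigQuery-safe one."""
    # Keep letters, digits, and underscores; replace everything else.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    # Names must start with a letter or underscore.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    # Rename columns that collide with a reserved prefix (case-insensitive).
    if cleaned.upper().startswith(RESERVED_PREFIXES):
        cleaned = "col" + cleaned
    # Standard column names are limited to 300 characters.
    return cleaned[:300]
```

Two different source names can collapse to the same sanitized name, so a production version would also need a collision check.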

Declarative Configuration

Data engineers don’t need to write code to add new tables. They simply update a JSON manifest:

{
  "project": "my-gcp-project",
  "snowflake_tables": [
    {
      "source_schema": "SALES_DB",
      "source_table": "TRANSACTIONS",
      "target_bq": "my-gcp-project.sales.transactions",
      "bq_partition_field": "_extracted_at",
      "write_mode": "auto"
    }
  ]
}

By setting write_mode to auto, the framework automatically handles the logic of historical vs. incremental loads.
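As an illustration of how an orchestrator might turn this manifest into a flat list of copy jobs, here is a small Python sketch. The `databricks_tables` key and the `expand_manifest` helper are assumptions; only the `snowflake_tables` layout comes from the example above.

```python
import json
from typing import Dict, List

MANIFEST_TEXT = """
{
  "project": "my-gcp-project",
  "snowflake_tables": [
    {
      "source_schema": "SALES_DB",
      "source_table": "TRANSACTIONS",
      "target_bq": "my-gcp-project.sales.transactions",
      "bq_partition_field": "_extracted_at",
      "write_mode": "auto"
    }
  ]
}
"""


def expand_manifest(text: str) -> List[Dict[str, str]]:
    """Flatten the manifest into one job description per table."""
    manifest = json.loads(text)
    jobs: List[Dict[str, str]] = []
    for source in ("snowflake", "databricks"):
        # "databricks_tables" is an assumed key mirroring "snowflake_tables".
        for table in manifest.get(f"{source}_tables", []):
            jobs.append({
                "source": source,
                "source_ref": f'{table["source_schema"]}.{table["source_table"]}',
                "target_bq": table["target_bq"],
                "write_mode": table.get("write_mode", "auto"),
            })
    return jobs
```

Each entry in the returned list corresponds to one parallel branch that the orchestrator could hand to a serverless Spark batch.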

Business & Operational Benefits

Cost Reductions: Moving from persistent Dataproc clusters to Dataproc Serverless reduced our compute costs by up to 60% for batch workloads.
Operational Simplicity: No Kubernetes or Spark clusters to configure, patch, or scale. GCP handles the infrastructure.
Speed to Production: Adding a new table to the ingestion pipeline is now a configuration change, not a coding task.

Conclusion

By combining the serverless orchestration of GCP Workflows with the elastic scale of Dataproc Serverless, we built a highly resilient, cost-effective ingestion framework. It abstracts away the complexities of cross-cloud data movement, allowing data teams to focus on delivering insights rather than managing infrastructure. Please refer to the full solution here.

What are your biggest pain points when migrating data from Snowflake or Databricks to BigQuery? Let’s discuss in the comments!

Building a Serverless, Multi-Source Data Ingestion Framework on GCP (Snowflake & Databricks to… was originally published in Google Cloud - Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

You got informed consent. Can you prove the AI was right?

Boris Dali — Fri, 19 Jun 2026 02:16:53 GMT

A sequel to “You let AI operate on a production database without your consent?”

The first post ended with a gate.

A gate is what we call an optional but highly recommended pause between AI diagnosis and AI action. It’s a triage screen that shows the operator exactly what the agents found, what they propose to do about it, and the blast radius of doing it, and that gives the operator the right to say no. Enter the Informed Consent framework borrowed from medical ethics: you cannot operate without telling the patient the diagnosis, proposing the treatment and getting explicit agreement.

That post made the case for why the gate has to exist. This one asks the harder question that comes after: you said “yes” at the gate, but was the diagnosis you consented to actually correct? And how does your AI system go about providing a dead-easy proof that it delivered exactly as promised?

Because consent is where the conversation begins, not where it ends.

The gap nobody talks about

All respectable AI SRE tools claim transparency. They show you a diagnosis before they act. They present a remediation plan upfront. Some even let you approve individual, especially destructive, steps. That’s table stakes.

In my observation, what is largely missing in most AI SRE tools are the answers to the following two questions:

For your particular failure scenario…

How often the diagnosis was actually right.
Whether the remediation was actually appropriate.

These questions should be asked at decision time, and the historical answers should be presented to the operator so that they can make a more informed decision, because those answers reveal how a similar failure scenario was handled in the past. They belong right there on the same triage screen where the current on-caller needs to make a decision. NOW!

This matters more than it sounds. An operator at 2am staring at a screen that says “Root blocker PID 867 holds a transaction lock; terminate it” is not running their own analysis. They are evaluating an AI claim under time pressure. If that claim is correct 91% of the time, the AI’s track record on this particular problem is good and they can reasonably consider approving it. If it’s correct 60% of the time, they should probably investigate before acting. And if the AI reports 95% confidence in both cases, which number do they use?

Here’s the thing: without measurement, the gate is a ritual. The operator clicked approve because the screen looked authoritative, the clock was ticking, a VP was looking over their shoulder and there was no better option. That is not informed consent. That is informed-looking consent.

To turn the gate from a ritual into a measured property, aiHelpDesk features a feedback loop designed to facilitate the Informed Consent, which is an integral part of the Operational SRE/DBA Flywheel (see here and here):

aiHelpDesk Feedback Loop

Every gate interaction generates a feedback record. Two questions, captured in sequence:

At the gate, before remediation runs:
— Is the diagnosis correct?
— Is the proposed remediation approach appropriate?

After recovery completes (aka post-incident):
— Looking back: was the diagnosis correct?
— Looking back: was the remediation appropriate?

These are not survey fields bolted on after the fact. They are first-class records in the audit store, keyed to the same run ID as the playbook execution, the step log and the automated evaluation score. The gate is where the human verdict enters the system. Everything else is downstream from here.

Who is the primary consumer of this data? Meet the vault commands:

vault drift → Which faults are regressing over time?
vault versions → Did the last playbook update help or hurt?
vault accuracy → How often does the agent actually diagnose correctly?
vault calibration → Can I trust the automated scores?

These commands form a dependency chain. Drift tells you which fault to look at. Versions correlates the regression to a playbook change. Accuracy checks whether diagnosis quality actually improves (or drops). Calibration validates whether the scores you’ve been reading are reliable in the first place.

We recommend working through them in order. Drift is where you start because it surfaces the signal. Calibration is where you end because it validates that signal. The loop closes.

Why the gate is the best place to ask

In our experience, the gate is the best place to ask an operator for feedback on the AI diagnosis, because this is the moment when the operator has to make a decision. They are not just assessing whether the diagnosis is correct, but also whether it was clearly explained, well presented, and easy enough to grasp to form… well, an informed opinion.

We find that it’s critical to pose this feedback question before the operator knows whether remediation will succeed. We repeat the same question post-incident, but that’s already a look in the rear-view mirror. It is also important, but it gives a different signal.

This feedback ordering is deliberate. In our experience, the post-incident feedback is contaminated by outcome bias. An operator who just saw a clean recovery is more likely to confirm the diagnosis regardless of whether they actually understood and evaluated it. The relief of a resolved incident retroactively validates the path that got there.

That’s not what we are after. To improve and evolve triage playbooks, we need honesty. At-gate feedback has no outcome bias contamination. The operator is reading a hypothesis, possibly at 2am, possibly on their small cell phone screen. Yes, they are also presented with a proposed action plan to fix the problem, but they have no knowledge yet of the remediation outcome. This verdict is the honest one.

This is why vault calibration prefers at-gate feedback over post-incident when both exist for the same run. It is also why, if you are starting a feedback program, and we recommend that you do, the at-gate question matters more than the post-incident one. You get fewer clean shots at unbiased measurement than you think.
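The preference rule is simple enough to state in a few lines. The sketch below is an assumption about how such a selection could be written, not aiHelpDesk’s actual code; `None` stands for a skipped question.

```python
from typing import Optional


def verdict_for_calibration(at_gate: Optional[bool], post_incident: Optional[bool]) -> Optional[bool]:
    """Pick the operator verdict to use for one run.

    The at-gate answer wins whenever it exists, because it was given
    before the remediation outcome was known.
    """
    if at_gate is not None:
        return at_gate
    # Fall back to the post-incident answer (which may also be missing).
    return post_incident
```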

That’s our experience anyway. All feedback is optional, which means that an operator is free to skip any or all of our four questions. We advise against it, and we encourage SRE teams to make providing this feedback part of their SOP. But ultimately, whether you answer all four questions is up to you. This is the description of the feedback loop flow and our recommendations.

If you want to reproduce the findings presented in this blog post, please check out this doc page for details on the commands to run. It covers running aiHelpDesk Fault Injection Tests directly on a host/VM in particular; see the other samples for K8s and Docker/Podman. The excerpts below should be sufficient to make our point:

Calibration is the clincher

After enough feedback accumulates, vault calibration produces something that very few SRE tools do:

Diagnosis calibration — fleet-wide 
(13 runs with eval + operator feedback)

  CONF BAND  RUNS  CORRECT  ACCURACY       CALIBRATION
  ───────── ───── ──────── ───────── ─────────────────
    90-100%    12      11        91%    WELL_CALIBRATED
     70-89%     1       1       100%  INSUFFICIENT_DATA
       <70%     0       0          –  INSUFFICIENT_DATA

Read this table carefully. When the agent reported 90–100% confidence in its diagnosis, it was confirmed correct by operators 91% of the time. The expected accuracy for that band, which is roughly mid-point 95%, is within 10 percentage points of the actual 91%. That is what WELL_CALIBRATED means: the model’s internal confidence score is an honest signal, not a number printed to look authoritative.

Compare this to OVERCONFIDENT, which fires when the actual accuracy falls more than 10 points below the expected band mid-point. If your 90–100% confidence band is only correct 70% of the time, the agent is systematically overstating its certainty. We’ve seen this way too many times. And you certainly would want to know that. You would want to lower the autonomous action threshold for that failure scenario until the calibration improves.
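The band logic described here can be sketched as follows. The 10-point tolerance and the mid-point comparison come from the text; the `min_runs` cutoff of 5 and the UNDERCONFIDENT label are assumptions added to make the example complete, and `classify_band` is a hypothetical name.

```python
def classify_band(low: int, high: int, runs: int, correct: int,
                  min_runs: int = 5, tolerance: float = 10.0) -> str:
    """Label one confidence band by comparing expected and actual accuracy."""
    if runs < min_runs:
        # Assumed cutoff: too few runs to say anything about the band.
        return "INSUFFICIENT_DATA"
    expected = (low + high) / 2          # band mid-point, e.g. 95 for 90-100
    actual = 100.0 * correct / runs      # operator-confirmed accuracy in percent
    if actual < expected - tolerance:
        return "OVERCONFIDENT"
    if actual > expected + tolerance:
        # Assumed label for the opposite case; not named in the article.
        return "UNDERCONFIDENT"
    return "WELL_CALIBRATED"
```

With the numbers from the table, 11 correct out of 12 runs in the 90–100% band is about 91.7% against an expected 95%, which falls inside the tolerance.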

This is what verifiable informed consent looks like. Not “the AI told the operator before acting.” But: the operator’s verdict, collected systematically, run against the AI’s stated confidence, checked for consistency. When it checks out, you have evidence. When it doesn’t, you have an early warning.

That was triage. The remediation side produces the same calibration, but independently of triage, because a correct diagnosis can still result in a wrong remediation:

  Remediation calibration — fleet-wide 
  (5 runs with remediation score + operator feedback)

  SCORE BAND  RUNS  CORRECT  ACCURACY     CALIBRATION
  ────────── ───── ──────── ───────── ────────────────
     90-100%     5        5      100%  WELL_CALIBRATED

Five runs is not enough to draw any strong conclusions. But the structure is there and the data accumulates automatically with every faulttest run.

The autonomy gradient

The first post framed full autonomy as something to be suspicious of. That framing holds, but it needs a second half.

Because full autonomy in auto mode is available in aiHelpDesk too. It is not forbidden and it is not irresponsible … if used in the right circumstances. And those circumstances are very specific: you have run enough fault injection tests to accumulate calibration data, vault calibration shows WELL_CALIBRATED across the confidence bands that matter for your environment, and the approval mode you relax to still respects your playbook’s permitted_tools whitelist.

The journey is this:

Run in manual mode. Collect at-gate feedback on every gate interaction.
Watch vault accuracy accumulate. Look for correctness above your bar, whatever that means for your risk tolerance.
Run vault calibration. Confirm the automated scores predict real accuracy.
When both hold, relax the approval mode.

Skipping to step 4 is not faster. It is unverified. The difference between an operator who approved 50 gates with feedback and an operator who skipped the gate entirely is not attitude toward AI. It is evidence. The first operator knows whether and when the AI is right. The second is guessing.
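One way to express the “when both hold” condition as code is shown below. This is a sketch under stated assumptions: the function name, the percentage inputs, and the idea of passing in the calibration labels for the bands you care about are all illustrative, not part of the aiHelpDesk CLI.

```python
from typing import Iterable


def may_relax_approval(accuracy_pct: float, accuracy_bar_pct: float,
                       band_labels: Iterable[str]) -> bool:
    """Allow a more relaxed approval mode only when accuracy and calibration both hold."""
    labels = list(band_labels)
    # Every band that matters must be well calibrated; no data means no evidence.
    calibrated = bool(labels) and all(label == "WELL_CALIBRATED" for label in labels)
    return accuracy_pct >= accuracy_bar_pct and calibrated
```

An empty list of labels deliberately returns False: the absence of calibration data is treated the same as a failed check.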

This is also why the gate timeout matters. A gate not resolved within the configured window transitions to abandoned. Not silently approved. There’s a sharp difference. The system defaults to inaction, not action. Speed is not the goal. Verifiable correctness is.
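The default-to-inaction rule can be sketched as a tiny state function. The state names other than “abandoned”, and the function itself, are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def resolve_gate(opened_at: datetime, now: datetime, window: timedelta,
                 decision: Optional[str]) -> str:
    """Return the gate state; an unanswered gate never becomes an approval."""
    if decision in ("approved", "rejected"):
        # The operator answered: their verdict stands.
        return decision
    if now - opened_at >= window:
        # Timeout without a verdict: default to inaction.
        return "abandoned"
    return "pending"
```

Note that there is no code path from a timeout to “approved”; that asymmetry is the whole point.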

What we shipped

The feedback loop described here is part of the higher-level Operational SRE/DBA Flywheel and it’s live in the current release. Specifically:

At-gate and post-incident feedback for both diagnosis and remediation. All four combinations, captured interactively during faulttest runs or via API for CI/CD pipelines.
vault accuracy showing per-series correctness with a triage and remediation breakdown
vault calibration comparing automated confidence scores to operator-confirmed outcomes, fleet-wide or per fault
vault drift detecting pass-rate regressions before they become production incidents, optionally with a “gateway path” for multi-machine fleet data
vault versions correlating per-playbook-version metrics like step count, recovery time, diagnosis score, remediation score, etc. This is so a bad update is visible before it accumulates too much history

The failure injection test harness that produces all of this data runs against a real Postgres database (either a local, dynamically spun-up Docker container if the “-auto-db” option is requested, or any BYO instance), injects real failure conditions and captures operator feedback interactively at the exact gate where it matters most.

In a nutshell

The first post asked: did the AI operate with your consent?

This post asks a follow-up question: when you consented, were you right to?

Those are different questions. The second one is harder to answer. It requires measurement infrastructure, in particular, feedback capture, evaluation storage, calibration logic, etc. It also requires a philosophical commitment that is easy to skip: the belief that your approval at the gate is a data point that should be tracked, aggregated and used to evaluate the system that asked for your approval.

aiHelpDesk tracks and measures it. The flywheel is the proof.

Does your AI system do this?

aiHelpDesk is open source. The fault injection test harness, the feedback schema and the vault command suite are in the repository. If you run it and your calibration comes back OVERCONFIDENT, that is useful information. Don’t discard it. Work with us to make it WELL_CALIBRATED.

Why are aiHelpDesk Playbooks trustworthy?

Because we give you, the customer, the ability to vet, verify and improve them through the methodology that we refer to as the Operational SRE/DBA Flywheel; see here and here for details.

And yes, as a customer, you can not only confirm that the playbooks you get with aiHelpDesk out of the box work for your environment, but you can also easily bring your own. That means both BYO playbooks (by importing your existing runbooks, by cloning and customizing one of ours, or by creating one from scratch) and BYO faults.

Because nobody knows your specific databases, your environment and your workload with your upstream/downstream apps better than you do. You know how your database fails. Vendors don’t. At aiHelpDesk, we give you an option to create your own faults and add your own playbooks to triage and rectify them (in addition to the system playbooks we ship, of course).

And yes, we don’t depend on a particular model or a model provider. aiHelpDesk is model agnostic. From our standpoint, the LLMs are a disposable commodity. Flip from Gemini to Anthropic and aiHelpDesk should continue to give you exactly the same diagnosis and remediation. Anything short of that is a P0 bug.

AI Agent vs. AI Harness: Where Memory Actually Lives

Qingyue(Annie) Wang — Fri, 19 Jun 2026 02:16:30 GMT

The model decides what to do. The harness stores memory, runs tools, enforces rules, and keeps the agent working.

Part 2 named the building blocks: sessions, Memory Bank, runtime, identity. This part draws the boundary around them: which pieces belong to the agent’s behavior, and which pieces belong to the harness that operates it.

When an AI agent reads a file, who actually reads it?

Not the model.

The model decides that reading the file would help. The harness checks the path, runs the tool, limits the output, records the event, and sends the result back.

That is the agent/harness split in one example:

the agent is the behavior you experience;
the harness is the machinery that makes the behavior possible.
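To make the file-reading example tangible, here is a small sketch of the harness side of that exchange: check the path, run the tool, limit the output, record the event. The class name, the event format, and the error strings are assumptions for illustration and do not correspond to any specific framework.

```python
from pathlib import Path
from typing import Dict, List


class FileReadHarness:
    """Runs a read_file tool call on behalf of a model."""

    def __init__(self, root: Path, max_chars: int = 2000) -> None:
        self.root = root.resolve()
        self.max_chars = max_chars
        self.events: List[Dict[str, str]] = []

    def _record(self, requested: str, status: str) -> None:
        # Every call is logged, whether it succeeded or not.
        self.events.append({"tool": "read_file", "path": requested, "status": status})

    def read_file(self, requested: str) -> str:
        path = (self.root / requested).resolve()
        # Check the path: only files under the workspace root are allowed.
        if self.root not in path.parents:
            self._record(requested, "denied")
            return "ERROR: path is outside the workspace"
        if not path.is_file():
            self._record(requested, "missing")
            return "ERROR: file not found"
        # Run the tool and limit the output before it goes back to the model.
        text = path.read_text(encoding="utf-8")[: self.max_chars]
        self._record(requested, "ok")
        return text
```

The model never appears in this code. It only decides that the call should happen and later receives the returned string.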

In this post, I’ll use that split to answer one practical question: which parts of an AI agent should be agent-specific, and which parts should become reusable harness infrastructure?

The distinction sounds like vocabulary until you try to add persistent memory, enforce permissions, recover from failure, or operate a second agent. Then it becomes an architectural boundary.

The word “agent” hides three layers

In ordinary conversation, we call the entire running product an agent. That is useful shorthand.

When designing the system, it helps to separate three layers:

Model: generates responses and decisions from the context it receives.
Agent behavior: the goal, instructions, tools, and policies configured for a particular job.
Harness: the reusable machinery that runs the behavior and connects it to the outside world.

For Dana’s support agent, the layers might look like this:

MODEL
an LLM that can reason and request tools

AGENT BEHAVIOR
"resolve customer-support tickets"
+ support instructions
+ account and ticket tools

HARNESS
execution loop
+ context assembly
+ tool dispatch and approvals
+ sessions and memory
+ identity and authorization
+ retries, traces, and evaluation

This split is not about arguing over terminology. Frameworks use “agent” differently, and capabilities can cross layers.

The point is to make ownership visible: what can you swap, reuse, govern, and debug?

A Simple Test for Finding the Harness

A simple way to find the harness is to ask what would stay if the agent changed.

I think of this as the swap test:

If you replace the model or configure a different task, which machinery can remain?

Suppose Dana’s support agent moves to another model. Its behavior may change, but much of the surrounding system can remain:

ticket and account integrations;
permission checks;
session storage;
memory retrieval;
trajectory logging;
retries, traces, and evaluation.

Now keep the model but replace the support behavior with a billing-review agent. Domain tools and instructions change, but the execution loop, identity controls, logging format, and storage infrastructure may still be reusable.

The parts that survive are good candidates for the harness.

The swap test is not a perfect rule. Model changes can require prompt changes. New tasks can require different safety policies. Some memories should remain private to one user or agent. The point is simpler: the test reveals which concerns could become shared infrastructure instead of being rebuilt every time.

Why memory belongs in the harness

In Part 1, reflective memory was the lesson an agent carries forward from experience. Dana’s support agent should remember her account context next week. It should also learn from a failed access ticket after the ticket is closed.

Neither of those happens inside a single model call.

This is why memory cannot just be a field on the agent object.

The model can help decide what was learned. But something outside the call must:

capture the conversation or completed task trajectory;
store it under the correct user, tenant, and agent scope;
validate, consolidate, update, or expire memories;
retrieve relevant memories for a future interaction;
support permissions, audit, and deletion.

That lifecycle belongs naturally in harness infrastructure.
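A toy in-memory version of that lifecycle is sketched below, assuming a scope made of tenant, user, and agent. It is not Memory Bank’s API; the class, the keyword-match retrieval, and the exact-duplicate consolidation are simplifications chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Scope = Tuple[str, str, str]  # (tenant, user, agent)


@dataclass
class ScopedMemoryStore:
    _items: Dict[Scope, List[str]] = field(default_factory=dict)

    def store(self, scope: Scope, memory: str) -> None:
        """Store a memory under its scope, skipping exact duplicates."""
        bucket = self._items.setdefault(scope, [])
        if memory not in bucket:
            bucket.append(memory)

    def retrieve(self, scope: Scope, query: str) -> List[str]:
        """Return memories in this scope that share a word with the query."""
        words = query.lower().split()
        return [m for m in self._items.get(scope, [])
                if any(w in m.lower() for w in words)]

    def delete_scope(self, scope: Scope) -> int:
        """Delete everything under a scope and report how many items were removed."""
        return len(self._items.pop(scope, []))
```

Because every operation takes a scope, a lookup for one user can never return another user’s memories, even though the store itself is shared machinery.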

But there is an important distinction:

The harness operates the memory lifecycle. Scope decides who owns and may read each memory.

Dana’s preferences may belong to Dana. A support procedure may belong to the support domain. A temporary plan may belong only to one session. A broadly useful lesson may be eligible for carefully governed sharing.

Calling memory a harness responsibility should never mean “put every memory in one shared bucket.”

What Is the Agent, and What Runs It?

So far, the agent/harness split has been conceptual. Google’s Agent Development Kit (ADK) makes the boundary visible in code.

ADK gives you an Agent object for the configured behavior: model, instructions, tools, and callbacks.

It also gives you a Runner plus services for the machinery that operates that behavior: execution, sessions, memory, and artifacts.

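from google.adk.agents import Agent
from google.adk.runners import Runner

# MODEL, SUPPORT_INSTRUCTIONS, the two tool functions, and the two services
# are placeholders assumed to be defined elsewhere in the application.

# Agent: the configured behavior for the support job.
support_agent = Agent(
    name="support_agent",
    model=MODEL,
    instruction=SUPPORT_INSTRUCTIONS,
    tools=[search_account, recall],
)

# Runner: the machinery that executes the agent and wires in services.
runner = Runner(
    agent=support_agent,
    app_name="support",
    session_service=session_service,
    memory_service=memory_service,
)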

The first object describes the support job. The runner and services operate it.

ADK gives you the pieces to assemble a harness. But assembling the harness is not the same as operating it.

You still have to run it, persist state, enforce identity, govern tool calls, observe failures, and improve the agent over time.

That is where Agent Platform enters the picture.

When Google Operates More of the Harness

Gemini Enterprise Agent Platform moves more of that operating layer into managed Google Cloud services: runtime, sessions, memory, identity, gateway policy, evaluation, and observability.

The agent behavior can stay small: its instructions, tools, and task logic. The surrounding machinery becomes managed platform capability instead of custom infrastructure every team rebuilds.

Most of those are operating concerns rather than one agent’s business behavior. That makes the platform useful for understanding harness responsibilities at organizational scale.

This does not mean ADK and Agent Platform are the same thing. ADK is a development framework; Agent Platform is a managed platform. You may still operate custom databases, tools, credentials, and reflection jobs yourself.

The useful point is the boundary: your agent behavior can be smaller when more of the operating machinery is handled by the harness.

The boundary becomes real when agent number two arrives

For one small agent, “agent versus harness” can feel conceptual. The same repository and team may own everything.

Then a second agent arrives.

Without a deliberate boundary, the new agent often receives its own:

session implementation;
memory wiring;
authentication rules;
logging format;
retry behavior;
deletion workflow;
evaluation setup.

Now two agents behave differently for reasons unrelated to their actual jobs.

A shared harness does not make agents identical. It makes common responsibilities consistent.

The support and billing agents can have different instructions, tools, and memory scopes while inheriting the same identity controls, trace format, memory lifecycle, and evaluation infrastructure.

That is where the boundary stops being philosophy and becomes architecture.

Ask a better design question

Instead of asking:

Is this feature part of the agent or part of the harness?

ask:

Is this behavior specific to one agent’s job, or is it operating machinery that should remain consistent across agents and sessions?

Agent-specific concerns often include goals, domain instructions, task tools, and agent-specific evaluations.

Harness concerns often include execution, tool dispatch, persistence, memory lifecycle, identity, authorization, recovery, logging, tracing, and shared evaluation infrastructure.

There will always be overlap. The goal is not a perfectly pure boundary. The goal is to prevent every new agent from rebuilding the same operational foundation.

For reflective memory, the agent can participate in deciding what was learned. The harness ensures the resulting memory is stored, governed, maintained, and available when a future interaction needs it.

The agent decides what job it is trying to do. The harness makes that job durable, governed, observable, and reusable.

The next post turns that boundary into a concrete architecture.

This is Part 3 of a five-part series on building an AI agent with reflective memory on Google Cloud.

Next: Part 4 — Designing an AI Harness With Reflective Memory

AI Agent vs. AI Harness: Where Memory Actually Lives was originally published in Google Cloud - Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

Orchestrating with Antigravity: A Crescendo of Agents (Part 1)

Riccardo Carlesso — Fri, 19 Jun 2026 02:15:36 GMT

I’m a command-line guy. If it doesn’t run in a 🖥️ terminal or get driven by a bash script, I usually avoid it. For years, my daily workflow revolved around gemini-cli, and recently the newer antigravity-cli (agy). I avoided desktop apps and GUI tools like the plague (with the only exception of vscode); a friend calls people like me BashOps — the opposite of the perfectly reasonable ClickOps folks who prefer GUIs.

But recently, I hit a wall.

This article series

As I scaled up my AI agent workflows — managing multiple concurrent coding agents, multi-turn stateful loops, and file changes — babysitting 6 to 12 terminal windows across six virtual desktops became a cognitive nightmare. This is the story of that failure, and the learnings that followed. It is a story in two parts:

Part 1 (This Article): Trying to solve agent persistence programmatically via the Antigravity Managed Agents and agy CLI, and encountering the crescendo of complexity.
Part 2: Hitting the CLI limit and stepping into the Antigravity 2.0 UI / Desktop app to orchestrate parallel local subagents safely with git worktrees.
Part 3 currently exists only in my head: there I will tell the story of how I conceived Ennio Morricone (greatest Italian conductor, see the analogy?), my agy CLI helper.

In this first part, we will explore how stateful remote sandboxes work under the hood using the Google GenAI SDK (antigravity-preview-05-2026), how to re-attach to container environments, and how to programmatically retrieve your agent workspace.

This first part builds on the foundation laid out in Romin Irani’s excellent Antigravity Managed Agents Tutorial and Phil Schmid’s deep dive into how Managed Agents work under the hood. I spent a day playing around with Romin’s article code ideas, customizing them to my own setups, and this article is the output of those experiments. While their articles focus on the architectural and hosted aspects of managed agents, we will look at how a developer can drive stateful remote sandboxes programmatically to build automation loops directly from a local terminal shell.

Riccardo playing with Stateful Remote Sandboxes, after reading Romin’s article

Antigravity Agents: a Stateful delight

In a traditional agent API interaction, each call is stateless. You send a prompt, the agent responds, and the workspace disappears. If you want the agent to edit a file it created in a previous turn, you have to pass the entire file content back and forth in the prompt context. This consumes tokens, increases latency, and makes multi-turn code generation slow and expensive.

The Google Antigravity SDK solves this with Stateful Remote Sandboxes. When you run an agent with the environment parameter set to "remote", the SDK:

1. Provisions a private, secure Ubuntu container (sandbox) on Google Cloud.

2. Runs the agent inside that container.

3. Keeps the container alive and returns an environment_id.

4. Allows subsequent API calls to re-attach to the same container, inheriting the exact filesystem state.

You don’t believe me? Let me show you the code (heavily inspired by Romin’s article).

The SpaceX IPO Analyzer: Python Orchestration

To demonstrate this capability, let’s write a Python script that orchestrates a stateful, multi-turn agent session.

A few days back, SpaceX made the news as the biggest IPO in history, making Elon Musk the first trillionaire (Reuters, June 2026). Everyone is thinking: should I buy? Should I not? Let’s put Gemini to the test.

Our agent acts as a SpaceX IPO Analyzer. In the first turn, it researches the SpaceX IPO and generates a Markdown report. In the second turn, it re-attaches to the same container, reads the Markdown report, converts it into a styled HTML dashboard (with custom CSS), and generates a space-themed image asset. Finally, in the third turn, it programmatically downloads the entire workspace container snapshot.

> 💡 Note: The Antigravity stateful features are accessed using the standard Google GenAI Python SDK by calling the preview agent model (antigravity-preview-05-2026) with environment="remote".

Here is the complete Python script:

import os
import requests
import tarfile
from google import genai

client = genai.Client()
print("🚀 Turn 1: Launching SRE/Financial Agent in remote Ubuntu Sandbox...")
# Turn 1: Launch agent to research and write a report in a remote sandbox
interaction_1 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Research SpaceX IPO and save report as spacex-report.md.",
    environment="remote"  # Launches a remote Ubuntu sandbox
)
env_id = interaction_1.environment_id
print(f"✅ Turn 1 Complete. Container Environment ID: {env_id}")

print("\n🔄 Turn 2: Re-attaching to same container and converting to HTML...")
# Turn 2: Re-attach to the SAME sandbox and preserve conversation memory
interaction_2 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment=env_id,                              # ← Re-attaches to same sandbox
    previous_interaction_id=interaction_1.id,       # ← Preserves conversation memory
    input="Convert that spacex-report.md file into a clean index.html webpage" +
          " with styling and generate a custom nanobanana image."
)
print("✅ Turn 2 Complete.")

print("\n📦 Turn 3: Downloading the entire container snapshot (.tar) locally...")
# Turn 3: Download the entire sandbox environment state (.tar) locally
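api_key = os.environ["GEMINI_API_KEY"]  # fail fast with a KeyError if the key is not set
response = requests.get(
    f"https://generativelanguage.googleapis.com/v1beta/files/environment-{env_id}:download",
    params={"alt": "media"},
    headers={"x-goog-api-key": api_key},
    timeout=300,  # do not hang forever on a large snapshot
)
response.raise_for_status()  # stop on an HTTP error instead of saving the error body as a .tar
tar_path = "snapshot_env.tar"
with open(tar_path, "wb") as f:
    f.write(response.content)
print(f"✅ Snapshot downloaded to {tar_path}. Extracting...")
with tarfile.open(tar_path) as tar:
    # filter="data" (Python 3.12+) refuses entries that would escape the target directory
    tar.extractall(path="./workspace_extract", filter="data")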
# Wow! We've dumped the remote agent workspace locally!
print("🎉 Workspace extracted successfully! Check ./workspace_extract/")

Let’s unpack what just happened in this code:

1. Spin up: investigation. Google creates a Docker container on an Ubuntu machine somewhere, and this instance, with a REAL disk, starts researching the SpaceX IPO.

2. Continuation: format conversion. Convert MD to HTML. This is silly (we could have produced HTML in turn 1, of course), but it demonstrates that you can park your work today, come back tomorrow, continue the session, and iterate on it, maybe with a chat over Telegram ;)

3. Pull results locally. This is the magical part: think git pull remote-workspace. This is where we know it's REAL. There's a real workspace there, maybe with some remote GitHub repo pulled and some added unit tests. In our case, it contains spacex-report.md (created in turn 1), index.html and nanobanana.jpg (the space-themed HTML dashboard and its image, created in turn 2), and finally a CSS file to make it all very nice and space-themed.
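Before trusting the extracted workspace, it can help to check what the snapshot actually contains. The following is a minimal sketch, assuming the snapshot is a plain tar archive as in the script above; the helper names list_snapshot_files and missing_files are hypothetical and not part of any SDK.

```python
import tarfile


def list_snapshot_files(tar_path: str) -> list[str]:
    """Return the sorted names of regular files inside a tar archive."""
    with tarfile.open(tar_path) as tar:
        return sorted(member.name for member in tar.getmembers() if member.isfile())


def missing_files(tar_path: str, expected: list[str]) -> list[str]:
    """Return the expected file names that do not appear anywhere in the archive."""
    # Compare on the base name so the check does not depend on the archive's directory layout.
    present = {name.rsplit("/", 1)[-1] for name in list_snapshot_files(tar_path)}
    return [name for name in expected if name not in present]
```

For the run above, missing_files("snapshot_env.tar", ["spacex-report.md", "index.html"]) should come back empty if both turns wrote their files.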

This is what I see opening index.html with my browser:

Page 1 of the generated SpaceX IPO analysis dashboard.

Page 2 of the generated financial dashboard, displaying the bull/bear investment case.

Wow!

Experiment 2: watch me coding (pun intended!)

My son is great at Math. Nearly as good as papino. However, he struggles with watches.

After watching Heroes, I’ve always been a bit wary of watchmakers (especially Sylar). But recently, I spent a whole day helping my 8-year-old son struggle to map an analog clock pointing at 19:45 to its digital string representation. It was frustrating because he is actually fantastic at math — once he can work with digital time strings, he can calculate time differences like 08:45 +/- 20 minutes in seconds. The blocker was entirely visual.

So, I decided to build a game to help kids bridge this gap. The goal was to build orologia.io—a cross-platform mobile game built in Flutter.

This is because we travel a lot, and I need apps working on my Android on a plane.

A catchy name, that’s the easy part! orologia.io

Since the ‘.com’ era is so 2000s, and the Sardinian ‘.io’ era is now!

Here is how we used a single stateful agent script to build the initial prototype. We passed a detailed multi-step prompt to the agent — asking it to read docs/PRD.md, implement the game, run a local server, take a screenshot, and file a GitHub PR.

Here is the core logic we used to run this stateful agent loop. It uses the google-genai SDK to provision the remote sandbox, mounts the orologia.io git repository, and downloads/extracts the final workspace snapshot (complete run-agent-prototype.py script here):

from google import genai
import requests, os

client = genai.Client()
api_key    = os.environ["GEMINI_API_KEY"]
gh_token   = os.environ.get("GITHUB_TOKEN")  # optional: enables the agent to push a PR

# Mount the git repo into the sandbox
sources = [
    {"type": "repository",
     "source": "https://github.com/palladius/orologia.io",
     "target": "/workspace"},
]
# Inject the GitHub token as a file only when it is set (the token is optional)
if gh_token:
    sources.append(
        {"type": "inline",
         "target": "/workspace/.github_token",
         "content": gh_token})  # ← secret injection!
# The prompt tells the agent exactly what to build
prompt = """
You are an expert full-stack developer agent.

The repo orologia.io is mounted at /workspace.
1. Read docs/PRD.md and implement a beautiful clock-learning game...
2. Make it stunning: analog clock with rotating hands, digital display, ..
3. Optionally screenshot it, then commit and open a PR using the token 
   at /workspace/.github_token.
"""  # Full prompt: https://github.com/palladius/orologia.io/blob/main/solutions/20260615-antigravity-managed-agents/run-agent-prototype.py
# Launch remote stateful sandbox agent
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=prompt,
    environment={"type": "remote", "sources": sources}
)
# Download final snapshot locally
url = ...
response = requests.get(url, headers={"x-goog-api-key": api_key}, params={"alt": "media"})

Rather than jumping straight into a complex Flutter mobile app, we had the agent create a rapid “vibe-coded” prototype in plain HTML, CSS, and JavaScript. By describing the clock mechanics and visual feedback loop in a single turn, the agent generated a responsive clock app in one minute.
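To make "clock mechanics" concrete, here is a minimal sketch of the arithmetic such a game relies on: turning a time into hand angles, formatting the digital label, and shifting a time by some minutes. This is not the code the agent generated (the prototype is HTML, CSS, and JavaScript); the function names are hypothetical and Python is used only for illustration.

```python
def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Return (hour_hand, minute_hand) angles in degrees, clockwise from 12 o'clock."""
    minute_angle = minute * 6.0                     # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5  # 360 / 12 hours, plus drift within the hour
    return hour_angle, minute_angle


def digital_label(hour: int, minute: int) -> str:
    """Format a 24-hour time as a zero-padded digital string."""
    return f"{hour:02d}:{minute:02d}"


def shift(hour: int, minute: int, delta_minutes: int) -> tuple[int, int]:
    """Add or subtract minutes, wrapping around midnight."""
    total = (hour * 60 + minute + delta_minutes) % (24 * 60)
    return divmod(total, 60)
```

Note that the hour hand only encodes hour % 12, so 07:45 and 19:45 look identical on the dial; that is part of what makes the analog-to-digital mapping hard for a child.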

Vibe-Coded Prototype Code: solutions/20260615-antigravity-managed-agents/
Clock UI Draft: The agent designed interactive options, matching analog times to multiple digital buttons (you can view early drafts like orologia_ui_one_clock_four_bcds.png).
You can play the final game yourself here: palladius.github.io/orologia.io

You can play the game under https://palladius.github.io/orologia.io/ !

And here is Alessandro winning the game!

This proved the value of single-agent rapid prototyping. But what happens when you try to scale past a single prototype and orchestrate multiple subagents concurrently to implement the full-blown cross-platform Flutter mobile game?

Riccardo surrounded by a million agents and being desperate

This is where I hit the wall and resolved to invent a Worktree + Conductor Carlessian-customized workflow… but this is for another article: 🚀 Orchestrating with Antigravity: A Crescendo of Agents (Part 2)

What are these Remote Agents good for?

So the big question is: What are these Remote Agents good for?

You could argue that everything is more comfortable on localhost, but I see this as pioneering “agent Dockerfiles”: you start playing with something locally, then, a piece at a time, you start building your own “agent container” with bin/custom-script and .gemini/my-smart-config and maybe a gcloud ServiceAccount sa.json, and boom! You have a repeatable, secure, self-contained sandbox where you can interact with the Internet with the power of Gemini! Remember, git clone and API_KEYS are a powerful combination!

My personal key takeaways:

Do NOT use for a one-off investigation. Use Web or local CLI instead, it’s much faster.
Use for repeatable workflows where the config takes a lot more than the final prompt. In other words, anything where you have 5 prompts of which 1–2–3–4 are the same, and 5 always changes. This is IMHO the sweet spot where these excel.
Try the Gemini API CLI, excellently explained in Philip's article.

Example: I’m currently working on containerizing a travel planner for my family where I just say the trip I want to plan, when and the constraints. Yes, it feels like a fine-tuned model, but fully FS observable/tweakable, so I can download the index.html later on!

The agent built a full Sardinia travel plan, with a flight proof and an itinerary map, from three prompts (1 and 2 are constant), and with provable flight booking links.
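The "constant prompts plus one variable prompt" pattern can be sketched in a few lines. This is an illustration only: the prompt texts and the build_prompts helper are hypothetical, and sending each prompt as a turn against the same environment is left out.

```python
# Hypothetical fixed prompts: these stay the same on every run.
CONSTANT_PROMPTS = [
    "You are a family travel planner. Save every artifact under /workspace.",
    "Always produce an index.html with an itinerary map and flight links.",
]


def build_prompts(variable_prompt: str) -> list[str]:
    """Return the full prompt sequence: the constant setup, then the one prompt that changes."""
    cleaned = variable_prompt.strip()
    if not cleaned:
        raise ValueError("the variable prompt must not be empty")
    return [*CONSTANT_PROMPTS, cleaned]
```

Only the last element changes between runs, which is why the setup cost of a remote sandbox pays off for this kind of workflow and not for a one-off question.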

—

🚀 Ready for Part 2? Read Orchestrating with Antigravity: A Crescendo of Agents (Part 2) to see how we scale this local development flow using git worktrees, Conductor++, and parallel subagents under Agostina.


Building an agentic PR reviewer with Antigravity SDK

Remigiusz Samborski — Thu, 18 Jun 2026 19:32:40 GMT

As announced in this blog post on June 18, 2026, Gemini CLI and Gemini Code Assist IDE extensions will stop serving requests for Google AI Pro and Ultra subscribers, as well as for those using them free of charge through Gemini Code Assist for individuals. Google is unifying its AI terminal tools by transitioning the community-focused Gemini CLI into Antigravity CLI, a new agent-first platform built for complex, multi-agent workflows.

With this transition timeline in place, development teams relying on Gemini CLI for repository management and automated tasks must establish a migration path. In this post, I will show you how to transition seamlessly by building an automated “first-pass” pull request reviewer using the Google Antigravity SDK and the run-agy-sdk composite GitHub Action.

The orchestration tax

The approach I am proposing also solves another pressing issue for modern engineering teams: cognitive overload. As Addy Osmani recently pointed out, there is an orchestration tax to using AI for coding. The time developers save generating code is often pushed onto reviewers as large, complex PRs, causing context switching and cognitive fatigue.

By offloading the tedious “first pass” search to an Antigravity agent, human reviewers can mitigate this tax and focus on high-level architecture and safeguarding quality.

Why we need automated agentic code reviews

AI-generated code can be deceptively good. It is often clean, well-documented, and syntactically correct. This makes it harder for human reviewers to spot subtle logical bugs or security vulnerabilities that might not be immediately obvious.

In a large codebase, manually verifying every change is simply not feasible. This is why we need autonomous agents that can step into the codebase and analyze it from a fresh perspective.

But if a developer used an LLM to generate the code, how can we trust another AI to find the bugs? The answer lies in the agent architecture and context separation.

Developers might write code using any tool, whether it’s a CLI or an IDE extension, and any model, such as Gemini 3.5 Flash or Gemini 3.1 Pro. The reviewer, however, is a managed Antigravity Agent running via a separate SDK integration. This agent has a specialized, low-freedom persona and strict system instructions that force it to act as an adversarial code auditor rather than a developer. Furthermore, it operates in an isolated environment. Because it has a different system prompt, safety guardrails, and context boundaries, the agent reviews the changes with a completely fresh perspective, catching logical bugs and vulnerabilities that the original generator might miss.

To demonstrate it in practice I created an agentic review pipeline, which:

Leverages a managed Antigravity Agent configured via the SDK to review the code. The agent uses advanced reasoning to explore files and verify logic under strict guidelines.
Runs reviews inside isolated workspaces or sandboxes with custom policies to prevent shell or arbitrary code execution risks.
Enables the agent to use the GitHub MCP server to interact directly with the environment to write pull request comments and reviews.
Avoids using the synchronize trigger in pull request workflows to prevent redundant review runs and endless loops. Instead, runs reviews on opened and reopened events, and triggers subsequent passes manually by posting a @agy /review comment on the PR.

Agentic review pipeline

You can find the code at run-agy-sdk.

What is run-agy-sdk?

The run-agy-sdk is a composite GitHub Action that runs the Google Antigravity SDK (google-antigravity) directly on the GitHub Actions host runner.

Why run on the host instead of a container?

By running directly on the host, the Antigravity SDK has access to the host’s Docker daemon. This allows the SDK to spawn Docker-based MCP servers (like the GitHub MCP server) to read files, run tests, and post reviews.

Sub-containers should ideally run with restricted network access and read-only filesystems where possible, so that an LLM cannot be tricked into executing arbitrary destructive commands. The limited set of permissions is handled in the GitHub Action configuration (see here), while the Antigravity agent itself has a limited number of tools it can use from GitHub MCP (see here).

Moreover, the workflow is explicitly protected from running automatically on forks, preventing unauthorized code execution. The automated review job will only run if the pull request originates from the same repository (see here). On-demand reviews triggered by commenting @agy /review are restricted so that they can only be initiated by maintainers (see here).
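The trigger rules described so far (automatic runs on opened and reopened only, no forks, comment-triggered runs for maintainers) can be summarized as one decision function. This is a sketch in Python for illustration; the repository expresses these rules as workflow conditions, not as this code. The payload keys follow GitHub's webhook payloads, while the function name and the set of associations treated as "maintainer" are assumptions.

```python
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}  # assumption: who counts as a maintainer
TRIGGER_PHRASE = "@agy /review"


def should_run_review(event_name: str, payload: dict) -> bool:
    """Decide whether a GitHub webhook event should start a review run."""
    if event_name == "pull_request":
        # Automatic runs: only newly opened or reopened PRs, never 'synchronize'.
        if payload.get("action") not in {"opened", "reopened"}:
            return False
        # Skip forks: the PR head must live in the same repository as the base.
        head_repo = (payload["pull_request"]["head"].get("repo") or {}).get("full_name")
        return head_repo == payload["repository"]["full_name"]
    if event_name == "issue_comment":
        # On-demand runs: a maintainer comments the trigger phrase on a PR.
        is_pr = "pull_request" in payload.get("issue", {})
        comment = payload.get("comment", {})
        return (
            is_pr
            and comment.get("body", "").strip().startswith(TRIGGER_PHRASE)
            and comment.get("author_association") in TRUSTED_ASSOCIATIONS
        )
    return False
```

Leaving out synchronize means a push to an open PR does not start a new review by itself; a maintainer has to ask for one.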

Demonstration walkthrough

The demo below shows the action triggered by a new PR:

https://medium.com/media/430b9cd6cc3049a213ab3e0b49e23ed5/href

Implementation: How to install the action in your repo

Let’s walk through the setup process step-by-step.

Step 1: Add your API key to GitHub secrets

The action requires a Google Gemini or Antigravity API key to authenticate language model interactions.

Generate your API key.
Navigate to your target GitHub repository and go to Settings > Secrets and variables > Actions.
Create a new Repository Secret named ANTIGRAVITY_API_KEY and paste your API key as the value.

Step 2: Configure the GitHub Actions workflow

Add a new file in your repository at .github/workflows/antigravity-review.yml and add the following configuration:

name: '🔎 Antigravity PR Review'

on:
  pull_request:
    types: [opened, reopened]
  workflow_dispatch:

concurrency:
  group: '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}'
  cancel-in-progress: true

jobs:
  antigravity-review:
    runs-on: 'ubuntu-latest'
    timeout-minutes: 20
    
    permissions:
      contents: 'read'
      pull-requests: 'write'
      issues: 'write'

    steps:
      - name: 'Checkout Repository'
        uses: 'actions/checkout@v6'
        with:
          persist-credentials: false

      - name: 'Run Antigravity PR Review'
        uses: 'rsamborski/run-agy-sdk@main'
        id: 'agy_pr_review'
        with:
          api-key: '${{ secrets.ANTIGRAVITY_API_KEY }}'
          github-token: '${{ secrets.GITHUB_TOKEN }}'
          mode: 'review'
          prompt: '/antigravity-review'
          trust-workspace: 'true'
          sandbox-profile: 'true'

Pro Tip: Pin the action version to a specific commit SHA (e.g., rsamborski/run-agy-sdk@<commit-sha>) rather than using @main. This prevents unexpected breaks from upstream updates.
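A quick way to see which actions in a workflow are not yet pinned is to scan the uses: lines. The following is a rough sketch, not a full YAML parser: it assumes one uses: entry per line, as in the workflow above, and the unpinned_actions helper is hypothetical.

```python
import re

# A full commit SHA is 40 hexadecimal characters.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")
USES_LINE = re.compile(r"""^\s*(?:-\s*)?uses:\s*['"]?([^'"\s#]+)""")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference that is not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        reference = match.group(1)
        # Local actions (./path) have no version to pin.
        if reference.startswith("./"):
            continue
        _, _, ref = reference.partition("@")
        if not FULL_SHA.match(ref):
            unpinned.append(reference)
    return unpinned
```

Run against the workflow above, this would report both actions/checkout@v6 and rsamborski/run-agy-sdk@main, since tags and branches can move while a commit SHA cannot.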

While you can reference run-agy-sdk directly in your workflows, its real power lies in using it as a blueprint. I encourage you to fork the repository and use it as a template to build your own custom, agentic GitHub Actions. By modifying the safety policies, custom tools, or prompts in run_agent.py, you can tailor the agent’s review behavior to your team’s specific codebase, style guidelines, and compliance rules.

For a full workflow template supporting both automated PR reviews and comment-triggered reviews, refer to the workflows folder in the repository.

Conclusions

Automating code reviews is a necessity as AI-generated code volumes increase. By using run-agy-sdk, you can run the Antigravity SDK to review PRs automatically and shift more of the burden of code quality assurance away from human reviewers.

Access the full source code in the GitHub Repository.
Read the documentation to customize the prompts and mode.
Feel free to fork the repository and build your own automation.

Acknowledgments

This project was inspired by the run-gemini-cli action, while shifting to the recently released Antigravity SDK. It is a personal sample implementation of how to run the Antigravity SDK in a GitHub Action, and is not an officially supported Google product.

Let’s connect!

I’d love to hear how you’re using Antigravity for your agentic workflows. Are you building automated code review loops or keeping a tighter leash on your agents?

Connect with me on LinkedIn
Follow me on X
Catch me on Bluesky


The Next Era of Conversational AI: An Introduction to Gemini Enterprise CX and CX Agent Studio

Pramodkumar Gupta @ Google — Thu, 18 Jun 2026 03:09:10 GMT

Throughout my career as a conversational AI architect, I’ve designed countless intricate chatbot structures. If you build chatbots, you probably remember the old way of doing things: creating endless flowcharts, guessing what users might say, and fighting with system limits in tools like Dialogflow. That used to be the best we had. But the AI world has completely changed. We aren’t just programming basic, robotic responses anymore; we are building smart, independent AI assistants.

Welcome to Gemini Enterprise for Customer Experience (GECX) and its crown jewel: CX Agent Studio. Let’s dive into why this platform is completely changing how we build chatbots and voicebots in 2026.

A Brief History of Conversational AI

To appreciate the leap forward that GECX represents, we have to look back at how we got here:

Generation 1: Rule-Based Bots: Rigid, keyword-driven IVR trees. If the user deviated slightly from the script, the bot broke, resulting in a frustrating loop of “I didn’t quite get that.”

Generation 2: Intent-Based NLU (Dialogflow ES/CX, AWS Lex): A massive leap forward. Natural Language Understanding (NLU) allowed bots to extract intents and entities. However, scaling enterprise bots required massive flow charts that often hit hundreds of pages. They became incredibly difficult to maintain and visually resembled a tangled web.

Generation 3: Generative & Agentic AI (CX Agent Studio): The current frontier. Instead of predicting every possible turn of a conversation, we define an agent’s persona, tools, and goals. The LLM dynamically reasons and orchestrates the conversation to achieve the desired outcome.

The Market Landscape: Where Does Google Sit?

The enterprise conversational AI market is highly competitive. We have established Hyperscalers like AWS (Lex), Microsoft Azure (Bot Service), and IBM Watson, alongside specialized CCaaS vendors. However, Google Cloud has historically dominated developer mindshare with Dialogflow ES/CX.

With GECX, Google isn’t just releasing a version bump; they are redefining the category by deeply embedding the Gemini 3 model family into an enterprise-grade wrapper. It takes the generative capabilities of modern LLMs and wraps them in the deterministic guardrails that enterprises demand.

Why GECX and CX Agent Studio Are Leading the Pack

CX Agent Studio abandons the “flows and intents” paradigm and adopts an architecture based on agents, subagents, and tools. Here are the standout features that make it a game-changer for developers:

1. Visual Subagent Orchestration

Instead of managing a literal web of 300 pages, CX Agent Studio offers a clean visual UI to structure subagents. You can define one specialized agent for “Order Tracking,” another for “Returns,” and a “Routing” agent to seamlessly orchestrate between them. It’s highly modular and infinitely more scalable.

2. True Multimodal Support

We aren’t just dealing with text anymore. GECX supports text, audio, and image inputs natively. With direct audio-to-audio (A2A) translation in 10+ core languages and human-like voices in over 40 languages, you can deploy a highly responsive voicebot with near-zero latency.

3. Tool Connectors & Deterministic Control

LLMs hallucinate, but enterprise systems cannot. CX Agent Studio allows you to securely wrap external APIs using Python tools, OpenAPI, or the Model Context Protocol (MCP). This “Context Engineering” ensures the agent only sees relevant JSON data, which reduces reasoning time and keeps conversations strictly grounded in factual business data.
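The idea of showing the agent only the relevant JSON can be sketched as a thin wrapper that trims an API response before it reaches the model. This is an assumption-heavy illustration: the field names, the order payload, and the trim_order_payload function are hypothetical, and the actual tool signature in CX Agent Studio may look different.

```python
import json

# Hypothetical: the only fields the agent needs to answer an order-tracking question.
RELEVANT_FIELDS = ("order_id", "status", "estimated_delivery")


def trim_order_payload(raw_json: str) -> dict:
    """Keep only the relevant fields from a (hypothetical) order API response."""
    payload = json.loads(raw_json)
    return {field: payload[field] for field in RELEVANT_FIELDS if field in payload}
```

Internal details such as warehouse codes or audit metadata never enter the context, so the model has less to read and less to misquote.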

4. Global Variables & State Management

State management has historically been one of the messiest parts of building conversational interfaces. Now, there’s a dedicated page to manage global variables across your entire agent structure, effectively treating session state as a first-class feature rather than an afterthought.

5. Built-in Evaluations and Previews

Anyone who suffered through legacy simulators will appreciate this. The testing environment allows you to see the agent’s internal reasoning and tool calls step-by-step. Furthermore, automated evaluation pipelines allow you to catch regressions before deploying changes to production.

Why the Industry is Rapidly Adopting It

For businesses, the ROI is clear. Development time drops from weeks to mere days. By utilizing 35 ready-to-use templates and a low-code builder, teams can launch sophisticated AI with very little custom engineering. Furthermore, native widget support lets companies deploy rich business messaging (instant transactions and product discovery) without needing a massive custom frontend build.

Real-World Adoption: Who is Using GECX?

Major enterprises and global systems integrators are already making the shift in 2026:

Retail & E-commerce: Brands are deploying dedicated AI Shopping Agents that manage the entire journey from product discovery to checkout. These agents use complex reasoning to suggest products based on real-time inventory and process multimodal inputs (like a customer uploading a photo of a shoe they want).
Global Systems Integrators: Tech giants have launched dedicated “Gemini Enterprise CX Practices,” actively hiring Practice Leads and Solution Architects to migrate massive legacy CCaaS architectures over to Google’s generative platform.
High-Volume Contact Centers: By replacing legacy flow-based bots with CX Agent Studio, companies are providing zero-wait-time resolutions while simultaneously utilizing “Customer Experience Insights” to analyze conversations and upskill human staff in real time.

Conclusion

Building conversational AI is no longer about attempting to predict what a user might say; it’s about empowering an intelligent system to dynamically assist them. Gemini Enterprise CX and CX Agent Studio offer the perfect balance: the raw generative power and natural language fluency of the Gemini models, paired with the strict deterministic guardrails, tool orchestration, and observability required by the enterprise.

If you are still mapping out intents and dragging arrows between hundreds of pages, it’s time to rethink your architecture. The age of the autonomous, multi-modal enterprise agent is officially here.

Learn more about GECX : Click here


Building a 100% Serverless, Strictly Typed Data Platform on GCP

Ben Mizrahi — Thu, 18 Jun 2026 03:08:59 GMT

If you are building a mobile gaming studio, an ad-tech startup, or any heavily data-driven company, you face a critical bottleneck on day one: You need to get massive amounts of telemetry from your application into an analytical database fast, reliably, and cheaply.

Most startups fall into one of two traps:

Trap 1: The “Wild West” Data Lake. You dump unstructured JSON payloads directly into storage. It’s fast to build, but within a month, developers change a field name, downstream analytics break, and your data team spends 80% of their time fixing silent corruption.

Trap 2: The Over-Engineered Behemoth. You deploy Kafka clusters, Kubernetes nodes, and dedicated stream processors. Suddenly, your infrastructure bill is in the thousands, and you need a dedicated DevOps engineer just to keep the lights on.

Startups don’t have time for either. What you actually need is an analytics platform that is strictly typed from the application all the way to the database, infinitely scalable, totally “No-Ops”, and aggressively cost-optimized.

To solve this, I designed and open-sourced a complete, click-to-deploy, 100% serverless data platform on Google Cloud (GCP). In this article, I’ll break down how it works, the business benefits, and why this architecture is the ultimate cheat code for early to mid-stage startups.

(You can grab the complete Infrastructure-as-Code and start deploying right now from my GitHub repo: benmizrahi/gcp-data-platform)

The Solution: A Click-to-Deploy Serverless Architecture

The core philosophy of this platform is simple: Let the cloud provider do the heavy lifting. By wiring together GCP’s native serverless offerings, we create a pipeline that requires zero provisioning, scales instantly, and enforces data quality at the door.

Here is the anatomy of the solution:

Event Ingestion: Google Cloud Run (Serverless compute)
The Event Backbone: Google Cloud Pub/Sub
Schema Enforcement: Pub/Sub Native Schema Registry (Protocol Buffers / Apache Avro)
The Analytical Database: Google BigQuery

1. Strictly Typed: From App to Database

The most powerful feature of this platform is strict typing. In a data-driven business like ad-tech or gaming, a missing currency field or a malformed timestamp can ruin revenue reports.

In this architecture, events are defined using Protobuf. When your application sends an event, GCP Pub/Sub validates it natively against the schema before accepting it. If the payload is malformed, it is rejected with an error instantly. BigQuery then ingests this validated data perfectly. No broken dashboards, no silent failures, and complete confidence in your data.
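To illustrate what "rejected at the door" means, here is a local stand-in written with the Python standard library. It is not Pub/Sub's validator and not the repository's Protobuf schema: the PurchaseEvent shape and the parse_event function are hypothetical, chosen to mirror the missing-currency and malformed-timestamp cases mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PurchaseEvent:
    """Hypothetical event shape; in the platform this would be a Protobuf message."""
    user_id: str
    amount_cents: int
    currency: str
    occurred_at: datetime


def parse_event(raw: dict) -> PurchaseEvent:
    """Build a typed event or raise ValueError, mimicking reject-at-the-door validation."""
    try:
        amount = raw["amount_cents"]
        currency = raw["currency"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, int):
            raise ValueError("amount_cents must be an integer")
        if not isinstance(currency, str) or not currency:
            raise ValueError("currency must be a non-empty string")
        return PurchaseEvent(
            user_id=str(raw["user_id"]),
            amount_cents=amount,
            currency=currency,
            occurred_at=datetime.fromisoformat(raw["occurred_at"]),
        )
    except KeyError as missing:
        raise ValueError(f"missing required field: {missing}") from None
    except TypeError as wrong_type:
        raise ValueError(f"malformed field: {wrong_type}") from None
```

The point of the pattern is where the failure happens: the producer gets an error at publish time, so a bad record never reaches the warehouse.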

2. Ready-to-Use & Extensible

The entire platform is defined as Infrastructure-as-Code (Terraform). It is a true “click-to-deploy” solution. Furthermore, it relies on a Pub/Sub “fan-out” model. If you want to add a real-time fraud detection service down the line, you don’t touch the existing pipeline. You simply add a new serverless subscription to the existing topic. It is infinitely extensible without touching the core code.
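The fan-out model is easy to picture with an in-memory stand-in: every subscription attached to a topic receives its own copy of each event, and adding a subscriber does not change the publisher. The Topic class below is a toy illustration with hypothetical names, not the Pub/Sub client library.

```python
from typing import Callable


class Topic:
    """In-memory stand-in for a topic that fans out to every subscription."""

    def __init__(self) -> None:
        self._subscriptions: dict[str, Callable[[dict], None]] = {}

    def subscribe(self, name: str, handler: Callable[[dict], None]) -> None:
        """Attach a named subscription; existing subscriptions are untouched."""
        self._subscriptions[name] = handler

    def publish(self, event: dict) -> int:
        """Deliver a copy of the event to every subscription and return how many received it."""
        for handler in self._subscriptions.values():
            handler(dict(event))  # each subscriber gets its own copy
        return len(self._subscriptions)
```

Adding fraud detection later is then one more subscribe call; the ingestion path that calls publish stays exactly as it was.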

Analysis: Why This Wins for Startups

Why choose this specific architecture over managed Kafka or traditional ETL tools? Let’s break down the tangible business benefits.

Benefit 1: True “No-Ops” and Infinite Scale

If your mobile game gets featured on the App Store and traffic spikes from 100 events a minute to 100,000 events a second, this platform won’t break a sweat.

Cloud Run scales container instances up from zero within seconds, up to the maximum you configure.
Pub/Sub handles massive throughput globally without you ever having to think about “partitions” or “brokers.”
BigQuery allocates query resources dynamically. You don’t need to wake up at 3 AM to scale up a database disk. The infrastructure adapts to your traffic instantly.

Benefit 2: Aggressively Cost-Optimized (The Pricing Model)

When you are a startup, managing burn rate is everything. Traditional data infrastructure requires paying a baseline cost just to keep servers running, even if nobody is using your app at 4 AM.

This serverless architecture operates on a pure pay-for-what-you-use model:

Compute (Cloud Run): You pay in increments of 100 milliseconds only when code is actively executing. Scale to zero means $0 cost when idle.
Messaging (Pub/Sub): You pay per gigabyte of data transmitted. GCP offers a generous free tier every month, meaning early-stage startups might pay pennies.
Storage/Analytics (BigQuery): You pay for the data you store and the queries you run.

The Bottom Line: If you have zero traffic, your infrastructure bill is effectively $0. If your company goes viral, your costs scale linearly and predictably with your revenue-generating traffic.
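The "zero traffic, zero bill, then linear" claim can be checked with a small model. This sketch covers the compute part only and rounds each request up to the 100-millisecond increment mentioned above; the price is a parameter the caller supplies, a placeholder and not a real GCP rate, and real bills also depend on CPU, memory, and request charges.

```python
import math


def billable_ms(request_ms: float, increment_ms: int = 100) -> int:
    """Round one request's execution time up to the next billing increment."""
    if request_ms <= 0:
        return 0
    return math.ceil(request_ms / increment_ms) * increment_ms


def monthly_compute_cost(request_count: int, avg_request_ms: float,
                         price_per_billable_second: float) -> float:
    """Linear pay-per-use estimate; the price is a placeholder, not a real GCP rate."""
    return request_count * billable_ms(avg_request_ms) / 1000 * price_per_billable_second
```

With zero requests the estimate is zero, and doubling the request count doubles the estimate, which is the linearity the pricing argument relies on.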

Benefit 3: Faster Time-to-Market

Because this is a click-to-deploy solution, your engineering team isn’t spending weeks debating infrastructure. They deploy the Terraform scripts in minutes, generate their strongly-typed client code, and get back to building the actual product.

Conclusion

Building a data platform used to be a massive undertaking reserved for companies with dedicated data engineering teams and large budgets. Today, by leveraging serverless architecture and native cloud schema registries, you can achieve enterprise-grade reliability on a startup budget.

This architecture gives you the holy grail of data engineering: strict typing so your data is always accurate, infinite scale so you never drop an event, and zero-ops so your engineers can focus on product, not pipelines.

If you want to stop fighting your data infrastructure and start getting insights, check out the complete code and deployment guide here:
🔗 Deploy the Platform on GitHub

If this architecture helps your startup scale, feel free to drop a star on the repo or share how you’re using it in the comments!

Happy Hacking :)

Building a 100% Serverless, Strictly Typed Data Platform on GCP was originally published in Google Cloud - Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

Google Cloud - Community - Medium

Anthropic Just Gave Agents ‘Dreams.’ Here’s How to Build Your Own on Google Cloud

The architecture

Start with evidence, not memory

Write path one: Memory Bank remembers the user

Write path two: the agent learns the job

The dream is a scheduled reflection job

Embeddings make lessons usable later

Read path: merge facts and lessons before acting

The loop you are building

The Cloud Tetris: A Technical Deep Dive into GCP Sole-Tenant Node Usage Optimization

Orchestrating Live Migrations: Transferring Cargo at Sea

The Algorithm: Greedy Drain and Best-Fit Packing

Preventing Physical Fragmentation: Large-VM Prioritization

The Dynamic Port: Managing Capacity Collisions

Self-Healing State: The Multi-Pass Architecture

Hardware Strain: Enforcing Source-Node Parallelism

Conclusion: Winning the Cloud Tetris Game

OpenGravity: Turn Antigravity into your autonomous cloud agent, controlled from WhatsApp

From Hermes to Antigravity

Why this matters: The real benefits

🌍 Your Agent, Everywhere

💰 Cost-Effective: Use Your Existing Subscription

🔒 A Safe Sandbox for Your Agent

✋ You Stay in Control

📺 Demo

OpenGravity, setup guide

Step 1: Set up your cloud VM

Step 2: Install Antigravity 2.0 on Linux

Step 3: Install OpenGravity: Let the agent handle the setup

Launch and Prompt the Agent

WhatsApp Pairing Code Authentication

Step 4: Start chatting with your agent

Smart conversation management

Human-in-the-Loop Approvals

What’s next: Extending the agent’s capabilities

🔗 Google Workspace CLI integration

⏱️ Background tasks & autonomous operations

🧩 Extended skills & plugins

The bigger picture: Agents as team members

Agents are the next fundamental shift

What I’m building toward

The future of work with agents

Conclusion

Building a Serverless, Multi-Source Data Ingestion Framework on GCP (Snowflake & Databricks to BigQuery)

Introduction

The Architecture: Orchestration Meets Serverless Compute

Why GCP Workflows?

Key Engineering Challenges & Solutions

1. Smart Incremental Loading (Deltas)

2. “Diversity Guards” Against Duplication

3. Handling Complex Data Types (Snowflake Variants)

4. Nullability & JVM Protection

5. Column Sanitization

Declarative Configuration

Business & Operational Benefits

Conclusion

You got informed consent. Can you prove the AI was right?

A sequel to “You let AI operate on a production database without your consent?”

The gap nobody talks about

aiHelpDesk Feedback Loop

Why the gate is the best place to ask

Calibration is the clincher

The autonomy gradient

What we shipped

In a nutshell

Why aiHelpDesk Playbooks are trustworthy?

Related Reading

AI Agent vs. AI Harness: Where Memory Actually Lives

The model decides what to do. The harness stores memory, runs tools, enforces rules, and keeps the agent working.

The word “agent” hides three layers

A Simple Test for Finding the Harness

Why memory belongs in the harness

What Is the Agent, and What Runs It?

When Google Operates More of the Harness

The boundary becomes real when agent number two arrives

Ask a better design question

Orchestrating with Antigravity: A Crescendo of Agents (Part 1)

This article series