Head of Realtime AI @OpenAI. Created WebRTC. Past: CTO @ultravox_dot_ai, Distinguished Engineer @google (Stadia, Meet/Duo), AIM. Amateur mathematician/musician.
New post on the OpenAI eng blog from two engineers on our Realtime AI team here in Seattle, outlining how we designed our v2 realtime infra and how we've optimized it for easy scalability and low latency. Check it out!
During the development of WebRTC, we recognized the impact of voice and video on human communication, and I wondered if someday we'd talk to AIs the same way. Today, we can see this future taking shape, and I'm excited to announce I've joined @OpenAI to lead real-time AI efforts!
Some news: after almost 15 years at @Google, today is my first day at @Clubhouse! I'm impressed with what @pdavison and @rohanseth have built, and I'm super excited to be joining the team and figuring out where this can go!
Wow! Ultravox is an *open source* speech to speech model - understands non-textual speech elements - paralinguistic information. @juberti just showed how it can pick up on tone, pauses, and more! @AITinkerers Seattle @FixieAI
In honor of the 12 days of OpenAI, here's a realtime voice assistant in 12 lines of JavaScript! This code uses our new WebRTC support for the Realtime API - just pass in a local stream from getUserMedia(), an <audio> element for output, and an API token.
platform.openai.com/docs/guides/re…
Introducing TTSTSTT, a breakthrough new AI model architecture. Unlike prior LLM architectures trained entirely on text tokens, Text To Speech To Speech To Text (TTSTSTT) models are trained to perform reasoning entirely within the auditory domain, but also have conversions to text
Ever wondered how your mic doesn't pick up the sound from your speaker during calls, but only your voice even though they are so close physically
Go build this, Make this your first project of 2025. It's highly technical and will teach you a ton
Huge Realtime API release today! Details below, but TLDR:
- GA (out of beta)
- better instruction following, naturalness, audio
- MCP support
- new voices
- SIP (telephony) support
- new WebRTC APIs and video support
Demos: hello-realtime.val.run, or call 425-800-0042
The new OpenAI realtime transcription API now supports WebRTC connections, which allows you to easily connect a MediaStream or <audio> element to the API. Just made a quick demo to show this off - try it out at juberti.github.io/demos/realtime….
Porting the Stadia web app to iOS was a super fun project, and we're really happy with how it turned out. Here's a video I recorded a few months ago demonstrating we could deliver the complete Stadia experience on Safari, even for an extremely timing-sensitive game like Thumper.
Highly recommend learning by doing, but if you want an intro into audio DSP I recommend starting somewhere easier, e.g., a resampler, a noise suppressor, an AGC. Writing a software AEC (with delay estimation and NLP) as a first project is something out of Good Will Hunting.
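
To give a sense of what the "easier" end of that list looks like, here is a minimal sketch of a resampler in TypeScript. It is only an illustration, not code from any of the posts above: the function name `resampleLinear` and its parameters are hypothetical, and it uses plain linear interpolation with no anti-aliasing filter, so a real resampler would need a low-pass stage before downsampling.

```typescript
// Minimal linear-interpolation resampler (illustrative sketch only).
// Assumption: mono audio as Float32Array samples; no anti-aliasing filter,
// so downsampling content above the new Nyquist frequency will alias.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError("sample rates must be positive");
  }
  if (input.length === 0) {
    return new Float32Array(0);
  }

  // Number of output samples covering the same duration as the input.
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);

  // How far we advance in the input for each output sample.
  const step = fromRate / toRate;
  const last = input.length - 1;

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Neighbouring input samples, clamped so we never read past the end.
    const i0 = Math.min(Math.floor(pos), last);
    const i1 = Math.min(i0 + 1, last);
    const frac = pos - i0;
    // Weighted average of the two neighbours.
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}

// Usage: convert 10 ms of 48 kHz audio (480 samples) to 16 kHz (160 samples).
const frame48k = new Float32Array(480);
const frame16k = resampleLinear(frame48k, 48000, 16000);
console.log(frame16k.length);
```

Linear interpolation is the simplest possible choice here; swapping in a windowed-sinc or polyphase filter is the natural next step once the basic plumbing works.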