Stories by Undo Bytes on Medium

AI Agents Don’t Have an Intelligence Problem. They Have a Context Problem.

Undo Bytes — Thu, 28 May 2026 10:21:29 GMT

By Greg Law, CEO at Undo

Over the past year, we’ve seen an explosion in AI-generated code.

On the surface, this looks like a productivity breakthrough. More code, written faster, with less effort. But if you spend time inside large, real systems, it doesn’t feel that simple.

In fact, something concerning is happening: we’re producing systems that fewer and fewer people (or machines!) actually understand.

More code. Less understanding.

AI has dramatically increased the volume of code being written.

But a growing proportion of that code is:

Not especially well structured
Only partially understood
Generated faster than it can realistically be validated

And there’s a tendency to trust it a bit more than we should.

The net effect isn’t just more software, but more unknowns and uncertainty.

Systems become harder to reason about and failures become harder to diagnose. Indeed, when something goes wrong, it’s harder to answer fairly basic questions like ‘what happened?’ or ‘why did that happen?’.

Where AI agents struggle

There’s a lot of interest in using AI agents to debug systems. And in simple cases, they can be very helpful indeed. But once you get into real-world systems (multi-process, multithreaded, legacy monolithic codebases), they tend to struggle.

It’s not really an intelligence issue. It’s just that they can’t see what happened across processes, threads, or other kinds of complex interactions, especially where cause and effect are far apart. They can read static code, scan logs, and follow documentation (where it exists!).

But they cannot see:

The actual execution path taken through the code.
The values that flowed through it at runtime.
The state as it changed across modules and services.
The exact sequence of events that led to an unexpected behavior or failure.
What actually happened in production.

So they fill the gaps with something that looks plausible… in other words, they guess. And as engineers, we don’t like guessing…

AI effectiveness = model + context

No matter how good a model is, it is only as good as the context it’s given. If you don’t provide enough of the right information (high-quality relevant context), even a very good model will struggle.

And the hardest problems in software don’t live in static code. They live in runtime behavior.

Why runtime is where the truth lives

In complex systems, cause and effect are often quite far apart.

A failure might be triggered by:

A specific ordering of events
A subtle state change that only happens under load
A wrong value that propagates across multiple services
A customer-specific environment that only exists in production

These things don’t show up clearly in logs (that’s if you have the right logs in the first place). They aren’t obvious from reading code either. So AI agents try to reconstruct what happened from incomplete information.

These failures are also often non-deterministic, meaning you can’t just “re-run” the system and expect the same result. This is why both engineers and AI agents struggle.

What’s missing: complete runtime context

If we want AI agents to be genuinely useful for understanding complex issues in complex codebases, we need to give them access to what actually happened when the program was running.

That means:

Full control flow across the system
Complete data flow (the values, as they changed)
A faithful recording of the exact software execution

When you provide this level of deep runtime context, AI agents can stop guessing and start reasoning from ground truth.

What this unlocks

With deep runtime context that is specific to your application, AI can move beyond surface-level assistance into something far more powerful:

Automatic root cause analysis — Not “possible causes”, but precise explanations grounded in actual execution. It can even mean no human engineer needs to see potentially sensitive customer information.

Faster, cheaper debugging — Less back-and-forth, fewer tokens, and the ability to use smaller models effectively.

Elimination of blind trust — Engineers can verify exactly what happened, rather than relying on AI-generated narratives.

Reduced operational drag — Less firefighting. Faster resolution. Fewer escalations.

Maintainable AI-generated code — Because if you can’t debug it, you can’t safely scale it.

The bigger picture

AI is increasing the scale and complexity of the systems we build. That’s fine… as long as our ability to understand those systems keeps up.

If we don’t solve the context problem, we risk trading short-term productivity gains for long-term instability. If we do solve this problem, AI becomes much more useful. Not just for writing code, but for understanding it.

In the end, this isn’t really about smarter models, but about giving them access to a ground truth (a record of events).

See how this works in real life

This keynote presentation actually does a good job of illustrating what you can achieve with deep runtime context specific to the application you’re working on:

https://www.youtube.com/watch?v=v6OyVjQpjjc&t=3592s

👉🏻 Feel free to DM Greg if you want to see how this might work on your own system.

Boost Your C/C++ Debugging in VS Code with Copilot and Time Travel Debugging

Undo Bytes — Thu, 02 Oct 2025 16:24:39 GMT

Copilot is driving a VS Code standard

For years, developers have had freedom of choice when it comes to editors and IDEs. Vim, Emacs, Visual Studio, Eclipse, CLion. Every team had its mix. But in recent times, a shift has accelerated: GitHub Copilot has pulled developers into Visual Studio Code.

It’s not always because they want to switch. It’s because they need to. Copilot is becoming a must-have productivity tool, and the best Copilot experience is in VS Code. As a result, many teams are standardizing on VS Code, even when they’ve historically been fragmented across different tools.

This standardization is not a bad thing. It reduces toolchain friction, simplifies onboarding, and aligns workflows. But it creates a new dynamic: developers are writing code faster than ever before, thanks to Copilot, yet when it comes to debugging, they’re still stuck with the same old breakpoints and print statements.

The productivity gap Copilot leaves behind

Copilot has proven its value:

It suggests boilerplate and idiomatic code
It accelerates feature development
It reduces friction for newer developers learning a codebase

But Copilot stops helping once code compiles and runs. When something goes wrong (a crash, a regression, or that intermittent race condition), Copilot doesn’t have answers. Sure, it can help when you have a simple 100% reproducible bug, or when you have good logs, but how many difficult bugs are like this? Debugging remains manual, slow, and frustrating.

For C and C++ teams, the problem is even sharper. These languages come with unique debugging pain points:

Memory corruption that only manifests under certain inputs
Data races and concurrency issues that vanish when you add logs
Legacy codebases where nobody truly understands the logic flow

The result? Developers spend disproportionate time debugging compared to writing new code. In fact, industry studies suggest engineers can spend more than 50% of their time in debugging mode. If Copilot makes you write code 20% faster but you’re still spending weeks chasing after an elusive bug, the productivity gains vanish.

Enter Undo: Time travel debugging in VS Code

Traditional debugging in C/C++ relies on a forward-only model: you set a breakpoint, run until it’s hit, then step line by line. If you overshoot the bug, you restart the program and try again. This trial-and-error cycle wastes time and makes it nearly impossible to capture nondeterministic issues like race conditions.

Undo’s time travel debugging eliminates this guesswork. The Undo extension for VS Code records the complete execution of your program, including all memory changes, system calls, and thread interleavings. Once recorded, the execution becomes deterministic and fully replayable. This enables:

Reverse step & reverse continue — move backwards through your program execution just like stepping forwards, so you can trace the chain of events leading up to a failure
Memory watchpoints in reverse — instantly jump to the point in time when a variable or memory location was last modified. Instead of hunting through logs, you can pinpoint where corruption first occurred
Cross-thread causality — for multithreaded code, Undo shows you exactly which thread wrote to shared state and when, making data race reproduction straightforward
Deterministic replay — the same bug can be replayed as many times as needed, with identical behavior. This is critical for those “heisenbugs” that disappear when you add logging

This means debugging becomes less about guesswork and more like forensic analysis. You don’t ask “what might have happened?” You see what actually did happen.

Debugging your way: GUI or terminal

One of the challenges with introducing new tools is developer preference. Some engineers live happily in a graphical debugger inside VS Code. Others are die-hard terminal users who find GUIs distracting. Undo respects both.

Option 1: Graphical debugging in VS Code

Inside VS Code’s Debugger pane, you’ll see familiar controls (breakpoints, call stack, variables, watch expressions) but now with time travel capabilities. For example:

When you hit a breakpoint and notice an incorrect value, you can reverse-step until you see the exact instruction that changed it
You can navigate backwards in the call stack to watch function parameters evolve over time
You can even scrub back through the timeline of execution to quickly jump between failure states and earlier execution points

This makes time travel debugging accessible to developers who already use VS Code’s native debugger. No new interface to learn, just more powerful capabilities layered on top.

Option 2: Terminal debugging inside VS Code

For developers who prefer the raw power and speed of a CLI debugger, the extension also exposes UDB, Undo’s GDB-compatible interface, directly inside VS Code’s integrated terminal.

Commands like reverse-stepi, reverse-continue, and watch give you low-level control to move backwards through program flow.
Since UDB is GDB-compatible, most existing workflows and scripts translate directly, just with the added dimension of time travel.
This means you can stay in VS Code for editing, but still have the feel of a classic terminal debugger session without context-switching into a different tool.

By offering both GUI and CLI workflows, Undo ensures that teams don’t have to compromise on debugging style. Some developers can use the point-and-click interface inside VS Code; others can live inside UDB in the terminal. And you don’t have to pick just one: many developers mix and match, using both at the same time. For example, you might keep the GUI open to visualize program state while simultaneously issuing precise reverse-step commands in UDB within the terminal. Both are powered by the same recording and time travel engine under the hood, so the context stays perfectly in sync.

A modern workflow: Copilot + Undo together

When you combine Copilot with Undo, you get a full-cycle development workflow inside VS Code:

Write code with Copilot — Copilot accelerates boilerplate, suggests APIs, and makes implementing new logic smoother
Run and test — As always, you compile and execute your program. If everything works, great. If not, this is where Undo comes in
Debug with Undo’s time travel — Instead of sprinkling logs or setting endless breakpoints, you record execution and rewind to the exact failure point
Fix confidently — Once the root cause is identified, you return to writing, with Copilot ready to help propose fixes or new implementations

This loop minimizes context switching. You’re not juggling multiple tools or wasting time reconstructing failure scenarios. Everything happens in VS Code, and both Copilot and Undo are playing to their strengths.

Why it matters for C/C++ teams

Higher-level languages often come with safer abstractions, runtime protections, or frameworks that simplify debugging. C and C++ do not. That’s part of their power: direct control over memory, performance, and system resources, but it’s also the source of pain.

When debugging becomes the bottleneck, the productivity gains of Copilot are capped. By pairing Copilot’s faster coding with Undo’s faster debugging, C/C++ teams can achieve a step-change in velocity.

In other words, the modern C/C++ workflow is no longer just Copilot + VS Code. It’s Copilot + Undo inside VS Code.

Closing thoughts

The future of development isn’t just about writing more code. It’s about trusting, testing, and debugging code faster. Copilot is already the default for writing. Undo is the natural counterpart for debugging. Together, they create a complete loop of productivity inside VS Code.

If your team is standardizing on VS Code and Copilot, don’t leave debugging stuck in the past. Pair Copilot with Undo and make debugging as modern as coding.

👉 Try the Undo Time Travel Debug for C/C++ VS Code extension today

Is Your AI Budget Too High?

Undo Bytes — Mon, 21 Jul 2025 14:03:04 GMT

Let Us Help You Spend It 😉

AI is everywhere these days.

If you work in software, you’ve probably been asked what your AI strategy is.

Or worse — told what it should be… by someone who doesn’t know what a compiler does.

Nothing says technical innovation like advice from someone who’s never compiled a line of code.

The result?

AI budgets are blooming like Japanese knotweed.

Meanwhile, actual improvements to engineering productivity remain… elusive.

AI meets real engineering

Much of today’s AI tooling in software development is impressive — until you ask it to do something genuinely hard.

AI is brilliant at:

Suggesting things you already knew
Explaining the obvious in a more verbose way
Making confident guesses that are ever so slightly wrong

What it can’t do is debug a thorny, intermittent fault buried in a million-line C++ codebase maintained by four different teams over 12 years. At least, not without help.

That’s where we come in.

Undo makes recordings of your program’s actual execution — every line, every thread, every change in state — and lets you replay it like a black box flight recorder.

Now, we’re feeding that data to AI.

Time travel for debuggers and other intelligent lifeforms

With Undo, your engineers already get to “rewind” and “fast-forward” through the exact behavior of their programs to find the root cause of bugs.

Now, with our AI integration, the machine can help too.

How?

It reads real execution history — not just logs or guesses.
It explains what happened and why in natural language.
It highlights suspicious states, summarizes complex call paths, and yes, even suggests fixes.

Think of it as giving your AI an actual memory, not just a hunch.

Justifying Undo on the AI budget (without blushing)

Here’s the good news: we tick all the right boxes.

Developers love us.
Managers understand us.
Procurement hears “AI-enabled” and relaxes.
And, crucially, it actually works.

If you need to show you’re spending your AI budget wisely, few things say “innovation” like letting your software replay itself, and then explain itself.

Especially when one of the world’s leading AI labs literally gasped when we showed them Undo.

Final thoughts (before your CFO asks for a slide deck)

If you’ve got money set aside for AI, don’t waste it on novelty or noise.

Spend it on something your team will thank you for.

Undo: Time travel debugging, now with AI support.

For when your software breaks and you’d rather not guess why.

Can Undo AI help my business?

AddressSanitizer and Undo

Undo Bytes — Mon, 14 Jul 2025 14:17:10 GMT

AddressSanitizer and Undo are two tools that can be used to find bugs in your code. This article compares the two tools, and shows how they can be used together to debug issues more effectively than either tool alone.

What is AddressSanitizer?

AddressSanitizer (ASan) is a compiler extension and runtime library, originally developed by Google. It has since been integrated into many compilers, including Clang/LLVM and GCC. It can be enabled at compile time by compiling the application with the -fsanitize=address option.

AddressSanitizer works by conceptually dividing the application’s address space into two regions: one that the application uses and a “shadow” region. The runtime library replaces the malloc and free operations such that for every memory map created in the application region, a corresponding map is created in the shadow region. When the application map is freed, the shadow map is filled with “poison” values. AddressSanitizer modifies how the compiler generates machine code so that whenever it reads/writes memory, it also checks that the shadow map doesn’t contain poison values. See the AddressSanitizer algorithm for more details.

This allows AddressSanitizer to detect a wide variety of memory access bugs, including use-after-free bugs, buffer overflow bugs, and memory leaks. When AddressSanitizer detects an issue, its default behavior is to print some diagnostics and make the program exit immediately.

Key takeaways

AddressSanitizer can detect use-after-free, buffer overflows, memory leaks and other types of bugs.
AddressSanitizer can be enabled using the -fsanitize=address flag in GCC/Clang.
AddressSanitizer instruments the code to insert runtime checks before each memory access operation.

What is Undo?

Undo is a time travel debugging solution for large-scale enterprise applications. When there’s a bug in your software, you can use Undo to capture the bug in a recording file (capturing the full program execution in a single binary file) and to step back in the recording to examine the full state of the program at any point in time.

Undo comes with 2 components:

LiveRecorder: for recording the runtime behavior of an application and saving it as a portable recording.
UDB: for replaying recording files (or live debug sessions) back and forth in time to see what happened.

AddressSanitizer vs Undo: How do they compare?

ASan has to be enabled at compile time, and since the Clang documentation recommends against using ASan in production, this usually means the application needs to be recompiled as an internal debug build to use ASan.

Undo on the other hand works well on production applications (as long as symbols are available when replaying the recording), so a special build isn’t required.

ASan and Undo both require additional memory when running the application. The memory overhead of ASan largely depends on the memory allocations the application performs; fewer, larger maps have a lower overhead than many small maps. ASan also imposes a larger (up to 3x) memory overhead for stack memory. Additionally, ASan uses a very large (but mostly unused) memory map for the shadow region. This usually doesn’t matter, unless memory overcommit mode is disabled on the system. In that case, the system will run out of memory and kill the application when ASan tries to create the shadow map.

The memory overhead introduced by Undo is usually less than 2x. While Undo doesn’t require overcommit mode like ASan, we still recommend using it.

In terms of execution speed, Google’s documentation of ASan says “The average slowdown of the instrumented program is ~2x”.

The slowdown of Undo is highly workload dependent. Many real-world programs can be recorded running at better than half-speed. For those with more threads, expect 1.5–5x slowdown per thread. Undo’s dynamic just-in-time instrumentation captures only the minimum data required to replay the process — 99% of the program state can be reconstructed on demand, so only the non-deterministic inputs need to be recorded. (see performance benchmarks)

Detecting vs understanding errors

Perhaps the biggest difference between ASan and Undo is that, while they’re both tools for improving software quality and stability, they’re useful at different points in the process. ASan (and other sanitizers, like ThreadSanitizer) excel at detecting potential issues, but don’t offer much support for working out why the application tried to perform an invalid memory access.

In contrast, Undo does not detect that the program it’s recording is behaving incorrectly, but once you’ve recorded an occurrence of a bug, you can find out exactly the sequence of events that caused it to happen.

Therefore, the natural question to ask is:

Can Undo find the cause of issues that ASan has detected?

Using Undo and AddressSanitizer together

Let’s try an example of combining Undo and ASan to detect and root-cause a bug.

For this demonstration, we have created a simple program called malloc-var. The program is intended to allocate an integer on the heap, then increment it. However, the program has a bug; let’s run the program to see it:

$> cd examples/
$> make malloc-var
$> ./malloc-var
Set the value to 42
Incremented value from 42 to 0

Well, that’s not right, so we’ll try using ASan and Undo’s LiveRecorder tool to locate and root-cause the bug. (We’re also going to pretend we aren’t allowed to read the source code of malloc-var unless one of our tools tells us that a line is of interest. This is to simulate how we might debug a real application that we aren’t familiar with, rather than a 28-line example program.)

First, we’ll recompile malloc-var with -fsanitize=address, and rerun it under LiveRecorder:

$> make CFLAGS='-fsanitize=address -g -O0' malloc-var
$> live-record -o malloc-var.undo ./malloc-var
live-record: Maximum event log size is 1G.
Set the value to 42
=================================================================
==571059==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014 at pc 0x5602a42402d5 bp 0x7ffc3bac05a0 sp 0x7ffc3bac0590
READ of size 4 at 0x602000000014 thread T0
    #0 0x5602a42402d4 in increment_pointed_value /home/dstevenson/undo/release/examples/malloc-var.c:12
    #1 0x5602a4240373 in main /home/dstevenson/undo/release/examples/malloc-var.c:23
    #2 0x7f999a7d3082 in __libc_start_main ../csu/libc-start.c:308
    #3 0x5602a424018d in _start (/home/dstevenson/undo/release/examples/malloc-var+0x118d)

0x602000000014 is located 0 bytes to the right of 4-byte region [0x602000000010,0x602000000014)
allocated by thread T0 here:
    #0 0x7f999aaae808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x5602a4240309 in main /home/dstevenson/undo/release/examples/malloc-var.c:18
    #2 0x7f999a7d3082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/dstevenson/undo/release/examples/malloc-var.c:12 in increment_pointed_value
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[04]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==571059==ABORTING
live-record: Saving to /home/dstevenson/undo/release/examples/malloc-var.undo ...
live-record: Saving     99%

live-record: Termination recording written to /home/dstevenson/undo/release/examples/malloc-var.undo
live-record: Detaching...

This error message has a lot of information, but the main takeaway is that ASan detected a heap buffer overflow in malloc-var.c on line 12. If we peek at that line, we can infer that the memory referenced by ptr_to_value_to_increment must contain the value 0, but it’s not immediately obvious why:

printf("Incremented value from %d to %d\n", old, *ptr_to_value_to_increment);

However, we also saved a recording of the program, so let’s load that in Undo’s UDB debugger, jump to the end of the recording and look at the backtrace:

$> udb malloc-var.undo
[...]
start 1> ugo end
[...]
end 66,947,821> backtrace
#0  __sanitizer::internal__exit (exitcode=1) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux.cc:429
#1  0x00007f999aad6e97 in __sanitizer::Die () at ../../../../src/libsanitizer/sanitizer_common/sanitizer_flags.h:37
#2  0x00007f999aab852c in __asan::ScopedInErrorReport::~ScopedInErrorReport (this=0x7ffc3babf926, __in_chrg=<optimized out>) at ../../../../src/libsanitizer/asan/asan_report.cc:185
#3  0x00007f999aab7fa3 in __asan::ReportGenericError (pc=94569343746773, bp=bp@entry=140721309615520, sp=sp@entry=140721309615504, addr=105690555219988, is_write=is_write@entry=false,
	access_size=access_size@entry=4, exp=0, fatal=true) at ../../../../src/libsanitizer/asan/asan_report.cc:458
#4  0x00007f999aab8ccb in __asan::__asan_report_load4 (addr=<optimized out>) at ../../../../src/libsanitizer/asan/asan_rtl.cc:118
#5  0x00005602a42402d5 in increment_pointed_value (ptr_to_value_to_increment=0x602000000014) at malloc-var.c:12
#6  0x00005602a4240374 in main () at malloc-var.c:23
end 66,947,821>

Most of the backtrace is the code ASan calls to report the error, but in stack frame #5 we’re on line 12 in our code. Let’s place a watchpoint on ptr_to_value_to_increment, and reverse to see where that pointer came from:

end 66,947,821> frame 5
end 66,947,821> watch ptr_to_value_to_increment
end 66,947,821> reverse-continue
Continuing.

Hardware watchpoint 1: ptr_to_value_to_increment

Was = (int *) 0x602000000014
Now = (int *) 0x602000000010
increment_pointed_value (ptr_to_value_to_increment=0x602000000010) at malloc-var.c:11
11  ptr_to_value_to_increment += 1;
0% 3,007>

Reversing has taken us back to line 11, where we increment the pointer. But from the name of the function (increment_pointed_value) and the intended behavior of the program we can infer that we actually wanted to increment the value on the heap, not the value of the pointer.

And with that, we’ve root-caused this (very simple) bug; the corrected source code of malloc-var is:

1	/* This is free and unencumbered software released into the public domain.
2	 * Refer to LICENSE.txt in this directory. */
3    
4	#include <stdio.h>
5	#include <stdlib.h>
6    
7	static void
8	increment_pointed_value(int *ptr_to_value_to_increment)
9	{
10		int old = *ptr_to_value_to_increment;
11 -	ptr_to_value_to_increment += 1;
   +	*ptr_to_value_to_increment += 1;
12		printf("Incremented value from %d to %d\n", old, *ptr_to_value_to_increment);
13	}
14    
15	int
16	main(void)
17	{
18		int *ptr = malloc(sizeof(int));
19    
20		*ptr = 42;
21		printf("Set the value to %d\n", *ptr);
22    
23		increment_pointed_value(ptr);
24    
25		free(ptr);
26    
27		return EXIT_SUCCESS;
28	}

We can also rerun malloc-var to prove that this fixes the bug:

$> ./malloc-var
Set the value to 42
Incremented value from 42 to 43

Caveats

Undo is compatible with ASan, but for some features a little configuration is required. Undo’s tools have the ability to attach to running processes, but in order to attach to applications that use Asynchronous I/O, the application must have been started with a special preload library. However, ASan’s runtime library must be loaded before all other shared libraries, and this has to be manually configured by setting the LD_PRELOAD environment variable.

# Compile examples/aio.c with AddressSanitizer
$> make CFLAGS=-fsanitize=address aio

# Start my-program, with the Undo Async I/O preload library
$> LD_PRELOAD=libasan.so:tools/libundo_aio_preload_x64.so ./my-program &

# Attach LiveRecorder to the process.
$> live-record -p $! -o recording.undo

Conclusion

AddressSanitizer and Undo are complementary tools: ASan detects the invalid memory access, and an Undo recording lets you trace back through the sequence of events that caused it. Used together, they make it faster to both find and root-cause memory bugs.

Interested in seeing Undo in action? Book a slot with one of our Solutions Engineers for a quick demo and determine whether this will work in your environment.

https://calendly.com/undo-time-travel-debugging/30min

Using AI to Debug your Programs with Undo

Undo Bytes — Thu, 10 Jul 2025 16:46:25 GMT

Author: Marco Barisione, Principal Software Engineer at Undo

AI is transforming how we write software, but debugging remains in the dark ages. At Undo, we’re changing that by giving AI access to complete execution history. Our time travel debugging engine records every instruction, variable, and function call, allowing the AI not only to watch what happened but also to understand why.

What is Undo?

If you are unfamiliar with Undo, at the core of our technology is the Undo Engine, which implements record, replay, and time travel debugging. It can produce recordings of your program’s execution.

Recordings capture the whole execution history — every line of code in every thread, every variable, every I/O. All you have to do is rewind and fast-forward in time in the recording to observe how the state changes and why the code behaves the way it does.

This means you can:

Get a picture of the code flow by exploring how the code is executed dynamically
See exactly what happened and how it happened
Understand subtle interactions amongst complex components

Recordings are portable, allowing them to be replayed outside of the original environment and shared with your team, or even generated on a customer or production system and then shared with developers.

But can we use execution history, along with AI, either to fix existing bugs or to be part of the AI feedback loop when generating new code?

Debugging with AI

You’ve just been assigned a bug. It only happened once on a test machine, or it happens in a customer environment you don’t have access to. The code is unfamiliar, large, and complicated. Maybe you’ve got a few logs and a vague description, and now it’s your problem to solve. This is probably a situation all developers have had to deal with — what are your options?

Nowadays, many developers will probably try using an AI with some logs and relevant code. It might make a plausible guess, but most of the time, with complex bugs, it will fixate on something unrelated, like a line that wasn’t even executed, or give a general suggestion that doesn’t apply. Without knowing what actually happened during the run, the AI is operating in the dark.

Now, imagine you have an Undo recording and you can use time travel debugging on the failing program run. You can see exactly what happened: which paths the program took, what values changed, and when. You can step back from the failure, inspect state, and understand the code as it actually behaved. This is powerful — and it’s what our customers already rely on — but it still involves a lot of manual work, especially when the code is unfamiliar.

A possible next step is to integrate Undo with an AI: you give the AI access to the recording and let it help drive the investigation. It can walk through function calls, summarize what the program did, and highlight areas worth looking into. This speeds up exploration, but current models still struggle with complex debugging, probably because, while there is a lot of training data on how to write code, there isn’t much high-quality data on how to debug.

The core problem is that the number of possible program states is astronomically large, but most bugs only show up in one very specific scenario. What the AI needs is guidance. Instead of the AI guessing or manually probing the recording, we feed it structured, targeted insights from the program’s execution history. The model isn’t just driving a debugger; it’s informed by what actually happened and why. There are quite a few open questions about how to achieve this, but we’ve found a handful of new approaches that look promising for improving debugging capabilities.

The goal is for the LLM not only to identify when something went wrong in the recording, but also to explain why it happened — and even suggest a fix.

Watch this space!

A less annoying Clippy?

In the meantime, while we work on deeper integrations, is there anything we can do now to gain some practical value from AI?

If you remember Clippy, the animated paperclip from Office 97, you’ll understand why we were cautious about bolting AI onto a debugger. Until recently, the idea felt a bit gimmicky. But newer models like ChatGPT, Claude Opus and Google’s Gemini have changed that. They’re more capable, more context-aware, and genuinely helpful in the right situations. While this is not yet the “help the AI” future we envision in the long term, we believe that some integration can already be very beneficial to our users.

We’ve been experimenting with a new explain command in UDB, which allows an AI (we currently use Claude Code) to drive the debugger to answer your questions.

https://medium.com/media/a14e0a69258c67aed05ac7ca9290ca76/href

In the video, you can see the AI solve a stack smash bug. This is, of course, a very simple example, but the AI integration has already proven useful in practice, especially when dealing with unfamiliar code. It can trace what happened, summarize it, and point you in the right direction. That can save a lot of time, particularly when stepping through complex or legacy code.

We’ll be releasing a preview version of this feature as an optional add-on soon. Stay tuned!

Interested in trying Undo on your code? Get a free trial below.

https://undo.io/udb-free-trial/

Undo × MCP: Time Traveling With Your AI Code Assistant

Undo Bytes — Thu, 10 Jul 2025 16:30:29 GMT

Author: Mark Williamson, CTO at Undo

Why your AI code assistant needs a time machine

AI code assistants based on LLMs (Large Language Models), such as Claude Code, OpenAI Codex and GitHub Copilot, are popping up everywhere today and enabling us to write code faster than ever. They can also help out by analyzing source code, logs and output from other tools to help find bugs.

But, clever though they are, there is critical information they lack:

They can’t know the dynamic behavior of the software just by looking at the source code (they don’t have access to the inputs, variable values, etc).
Logs capture some dynamic behavior but only if you have them in the right place. If you don’t, you and the LLM need to go around tedious cycles of editing, rebuilding and reproducing the issue.

Looked at another way, LLMs can be brilliant when they have enough of the right context. But if they can’t see how your program actually behaves, they lack that context and have no way to obtain it.

We think time travel debugging is the right technology to close this gap.

Undo is our time travel debugger — a suite of tools that can record unmodified Linux software, then replay and debug it to understand how it went wrong (or just how it works). You get an immutable recording of everything the process did, at machine instruction granularity.

It works with C, C++, Rust, Go and Java (with more languages to come in future).

Quick start for impatient people

If you don’t have Undo yet, go get it, then return to this section!

This article is about an MCP (Model Context Protocol) server integration to use Undo’s UDB debugger with AI coding assistants. If you’re already a user of Undo + AI, you can try our first proof of concept right now, then return to the article when you’re ready.

Caveat: This is a tech preview, it’s bleeding edge and won’t work in all cases, but it should already be useful.

If you have a recent installation of Undo (v8.2.2 or higher should work; for older versions, YMMV) then you can activate our AI extension at the UDB prompt using:

extend explain

This will download and install the explain extension. Now you have two new commands:

uexperimental mcp serve – Starts an MCP server on localhost:8000 for you to access from an external coding agent of your choice. This turns over control of your debug session to the external coding agent. When the AI agent has disconnected, you can quit the server with Ctrl-C and return to using UDB as a normal debugger.
explain – Ask questions about the program you’re debugging (requires a working installation of Claude Code). Behind the scenes, this turns over control to Claude Code via MCP, then returns control to you once your question has been answered.

Good questions for the explain command ask for information about the program’s behavior or ask for help with reverse navigation. For instance:

What went wrong in this program?

https://medium.com/media/c841bb27ad83e85564941108056f2270/href

What does the current backtrace mean?

https://medium.com/media/321b92a5fc58e4297c75b28163a1915e/href

Get me back out of libc.

https://medium.com/media/96ad1ba9c01290097b5dc9e819d89a73/href

Successive invocations of explain reuse the underlying Claude session, so you can have an ongoing dialogue during a single debug session.

Deep dive

What is Undo?

Undo provides software time travel. It can record the history of an x86 or ARM64 process, running on Linux, then wind back and forth through it down to machine instruction precision.

To do this, the Undo Engine instruments the process being recorded. The program requires no modification or special compilation — Undo injects JIT instrumentation at runtime to capture all the non-deterministic inputs to the program. Once this is done, the program can run as normal, while the Undo Engine captures every piece of non-deterministic information that flows into the program.

This includes:

System call results.
Thread scheduling.
Special machine instructions (e.g. x86’s rdtsc to read the time stamp counter).
Shared memory with other processes or hardware devices.
And so on…

With this information Undo can store the program’s execution into a portable recording file, which can be replayed any time — even on another machine. You can replay forward to see how the original execution unfolded or use time travel to rewind and jump around; ask questions like “When was the last time this function ran?” or “When did this variable get assigned this value?”.
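The record-and-replay principle can be illustrated with a toy model. This is a minimal sketch, not how the Undo Engine is implemented: `event_log`, `nondeterministic_input` and `fake_clock` are hypothetical names invented here. It only shows that once every non-deterministic input has been logged, feeding the log back in reproduces the same values in the same order.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAPACITY 64

/* Any source of non-deterministic data (hypothetical stand-in for a
   system call result, a timestamp read, and so on). */
typedef long (*input_fn)(void);

typedef struct {
    long values[LOG_CAPACITY];
    size_t count;    /* number of values recorded so far */
    size_t cursor;   /* next value to hand out during replay */
    int replaying;   /* 0 = recording, 1 = replaying */
} event_log;

/* Stand-in input source: returns a different value on every call. */
long fake_clock(void)
{
    static long ticks = 0;
    ticks += 7;
    return ticks;
}

/* While recording, call the real source and log its result.
   While replaying, ignore the source and return the logged result. */
long nondeterministic_input(event_log *log, input_fn source)
{
    if (log->replaying) {
        assert(log->cursor < log->count);
        return log->values[log->cursor++];
    }
    assert(log->count < LOG_CAPACITY);
    long value = source();
    log->values[log->count++] = value;
    return value;
}
```

The rest of the program is deterministic given these inputs, which is why logging only the inputs is enough to reconstruct the whole execution.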

On top of this machine-level capability, Undo supplies debug interfaces for C, C++, Rust, Go, Java, Kotlin and Scala.

With this available, intermittent bugs can be recorded once and analyzed at will, memory corruptions become a matter of winding back to the bad write, and complex code can be understood quickly by stepping backwards to see how execution unfolded.

Why does this matter to an AI?

AI based on Large Language Models (LLMs) is transforming how development is done. Code Assistants now generate considerable amounts of code in the real world. Agentic coding allows developers to assign work to an LLM and have it complete development tasks autonomously.

They’re also really good at digesting a big codebase and answering questions. What they’re less good at is understanding the dynamic behavior of a program — the reasoning required to infer what’s actually happened in the subtle corner cases that make for some of the hardest bugs.

Without enough information to tether them to what the program really did, they’re also inclined to hallucinate a confident-but-wrong answer.

Undo recordings provide a complete trace of everything a program did when a bug occurred. There’s no need to guess — if the issue is captured in a recording then it can be understood. This provides a ground truth that can be used to guide and verify an LLM’s reasoning. With the information contained in a recording, AI coding agents can be smarter and more efficient.

What we’ve built so far

We’ve released an add-on extension called explain, which integrates with our UDB debugger. This is a tech preview but it can be used today to provide real assistance to debugging workflows.

This is shipped via our public Addons repository, so if you have a recent release of UDB you should be able to run:

extend explain

And get access to the command (put it in your .udbinit file to make it available in every session).

The extension provides a general-purpose MCP server (to integrate our UDB debugger with arbitrary AI agents) and a tightly-integrated explain command, which uses Claude Code behind the scenes to provide agentic reasoning.

To get this set up, read the “Quick start for impatient people” section. (Give yourself a pat on the back for being patient the first time around.)

The MCP server makes it possible to turn over control of the UDB debugger to an AI agent. This makes the context of a time travel debug session available to any agent that supports MCP, giving it the ability to reason about the runtime behavior of a program.

The explain command provides an AI assistant within the debugger itself – you can ask it questions about the code and its behavior, interleaved with your normal debugging commands.

This workflow makes use of an external coding agent (currently it supports Claude Code via its SDK but we expect to support others in the future). Behind the scenes, the command activates an MCP server and passes connection details, plus your question, to the coding agent — it can then make full use of its configured tools, including UDB itself, to assist you.

In either case, the extension supplies something resembling conventional debugger functionality to the LLM. To help the LLM get the most value from it, we’ve restricted the commands available and modified them slightly to better match the AI’s expectations.

This produces some impressive results — explain can solve some bugs and provide helpful assistance in understanding others, though like any AI it can make mistakes. However, because it’s a human-like interface we still consider this a shallow integration. Making the full power of time travel debugging available to an LLM needs a deep match between the strengths of the two technologies.

What we’re building next

The real value in combining time travel debugging with AI is going to be unlocked as we integrate more and more deeply with the LLM’s capabilities.

Interactive debuggers are a human-centric user interface — they provide a precise interface for moving through program execution and inspecting the detailed state of the program. We don’t think that’s ultimately going to be the most powerful interface for an LLM.

Modern AIs excel at digesting and sifting text (often using separate agents to offload some processing while preserving their context windows), so we’re working towards a query-oriented interface that’s specialized for LLM use. The tricky question is: given a recording of everything the program did, what are the most valuable queries we can provide?

That’s what we’re working on right now. The early signs here are very positive — we’re already seeing more successful investigation of real world bugs, with lower token costs. We’re looking forward to sharing more details in the near future.

Conclusions

The capabilities of time travel debugging mesh well with the abilities of code assistants: you can capture the complete history of a program and then use AI to help sift through for the interesting parts.

The explain extension makes it possible to get AI assistance within a debug session, automating routine parts of the work and suggesting theories. Please let us know what you’re able to do with it, or if you have suggestions!

We’re going to keep working to create an AI-native interface to time travel debugging, to make code assistants smarter and more efficient — watch this space.

In the meantime, why not try Undo for free?

https://undo.io/udb-free-trial/

Comparison of ThreadSanitizer and Thread Fuzzing

Undo Bytes — Mon, 12 May 2025 20:35:46 GMT

Author: Gareth Rees, Senior Software Engineer at Undo

ThreadSanitizer and Thread Fuzzing are two tools for detecting data races in multi-threaded code. This article compares the tools.

ThreadSanitizer

ThreadSanitizer is a compiler extension, originally developed by Google, that has been added to various compilers, including Clang/LLVM and GCC, where it can be enabled using the -fsanitize=thread option.

It works by maintaining, for each word of memory allocated by the program, a set of N “shadow words”, where N is 2, 4, or 8, depending on the configuration (the larger the set of shadow words, the more accurate the analysis but the greater the memory overhead). The shadow words represent recent accesses to the word, or parts of the word, and are managed using random eviction. Each shadow word contains the thread that accessed the memory, the “epoch” of the access, the bytes accessed, and whether the access was read or write. The epoch is a global counter which is incremented when there is a synchronization between threads.

When the -fsanitize=thread option is provided, the compiler turns every access to memory by the compiled program into a call to a runtime function that updates the shadow words, and identifies cases where two threads accessed the same bytes in the same epoch. See the ThreadSanitizer algorithm documentation for details.
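The shadow-word bookkeeping described above can be sketched as follows. This is a simplified model that follows this article’s description, not ThreadSanitizer’s actual data layout or algorithm: the `shadow_word` and `shadow_cell` types, the field widths, and the function names are all assumptions made for illustration. The rule that at least one of the two accesses must be a write is the standard definition of a data race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SHADOW_SLOTS 4   /* N shadow words per application word */

/* One remembered access to an 8-byte application word (hypothetical layout). */
typedef struct {
    uint32_t thread;     /* id of the accessing thread */
    uint32_t epoch;      /* epoch at the time of the access */
    uint8_t  byte_mask;  /* bit i set if byte i of the word was touched */
    bool     is_write;   /* true for a write, false for a read */
} shadow_word;

typedef struct {
    shadow_word slots[SHADOW_SLOTS];
    int used;            /* number of occupied slots */
} shadow_cell;

/* Two accesses race if they come from different threads, in the same
   epoch, touch overlapping bytes, and at least one of them is a write. */
bool accesses_race(const shadow_word *a, const shadow_word *b)
{
    return a->thread != b->thread
        && a->epoch == b->epoch
        && (a->byte_mask & b->byte_mask) != 0
        && (a->is_write || b->is_write);
}

/* Compare a new access with the remembered ones, then remember it,
   evicting a random slot when the cell is full. */
bool record_access(shadow_cell *cell, shadow_word access)
{
    assert(cell->used >= 0 && cell->used <= SHADOW_SLOTS);
    bool race = false;
    for (int i = 0; i < cell->used; i++) {
        if (accesses_race(&cell->slots[i], &access)) {
            race = true;
        }
    }
    int slot = cell->used < SHADOW_SLOTS ? cell->used++ : rand() % SHADOW_SLOTS;
    cell->slots[slot] = access;
    return race;
}
```

Random eviction is what makes a larger N more accurate: with more slots, an older conflicting access is less likely to have been overwritten before the racing access arrives.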

Thread Fuzzing

Thread Fuzzing is a feature of Undo’s LiveRecorder product. LiveRecorder records the runtime behavior of a program and saves it as an Undo recording so that it can later be replayed in a debugger. LiveRecorder allows only one thread to run at a time, by taking a lock in that thread, and letting all other threads block on the lock, but it regularly releases the lock to give other threads the opportunity to claim it and run.

When Thread Fuzzing is enabled, LiveRecorder varies the timing with which the lock is released and other threads may run. There are several fuzzing strategies which can be configured: see the fuzzing modes documentation for details.
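To see why varying the release timing matters, consider a toy simulation. It is purely illustrative and says nothing about how LiveRecorder is implemented: `sim_thread`, `run_serialized` and the notion of a fixed `slice` are invented here. Two simulated threads each perform non-atomic increments of a shared counter, split into a load step and a store step, and only one runs at a time. The `slice` parameter is the number of steps a thread runs before the lock passes to the other thread.

```c
#include <assert.h>

/* One simulated thread performing non-atomic increments. */
typedef struct {
    unsigned reg;    /* private copy of the counter */
    int pc;          /* 0 = about to load, 1 = about to store */
    unsigned done;   /* increments completed so far */
} sim_thread;

/* Run two threads, strictly one at a time, switching every `slice` steps.
   Returns the final value of the shared counter. */
unsigned run_serialized(unsigned iters, unsigned slice)
{
    assert(slice >= 1);
    unsigned shared = 0;
    sim_thread threads[2] = {{0, 0, 0}, {0, 0, 0}};
    int current = 0;

    while (threads[0].done < iters || threads[1].done < iters) {
        sim_thread *t = &threads[current];
        for (unsigned s = 0; s < slice && t->done < iters; s++) {
            if (t->pc == 0) {
                t->reg = shared;        /* load step */
                t->pc = 1;
            } else {
                shared = t->reg + 1;    /* store step */
                t->pc = 0;
                t->done++;
            }
        }
        current = 1 - current;          /* hand the lock to the other thread */
    }
    return shared;
}
```

With `iters = 3`, a slice of 2 always switches between complete increments and the counter ends at 6, while a slice of 1 switches between every load and store and the counter ends at 3. The race is present in both runs; only the second schedule exposes it.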

Comparison

Build configuration

ThreadSanitizer requires a special build configuration, since the program must be compiled with the -fsanitize=thread option.

Thread Fuzzing can be applied to any program, and does not require a special build configuration.

Runtime configuration

ThreadSanitizer allocates the shadow words at locations determined only by the address of the original words, so that no memory accesses are required to find the shadow words. This means that it may not run under Address-Space Layout Randomization (ASLR), resulting in runtime failures like this:

FATAL: ThreadSanitizer: unexpected memory mapping 0x6395c9526000-0x6395c9527000

If this affects your program, you need to run it with ASLR disabled, for example, by running the program under setarch --addr-no-randomize, or by calling personality() with ADDR_NO_RANDOMIZE.
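The personality() route might look like the following sketch, which is Linux-only. The helper names `aslr_disabled` and `rerun_without_aslr` are hypothetical, and re-executing the program through /proc/self/exe is an assumption of this sketch rather than something the article prescribes. The new persona only affects address-space layout after the next exec, which is why the program restarts itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/personality.h>
#include <unistd.h>

/* Does this persona value have address-space randomization turned off? */
int aslr_disabled(int persona)
{
    return (persona & ADDR_NO_RANDOMIZE) != 0;
}

/* Call at the top of main(). Returns 0 if ASLR is already disabled,
   -1 on failure; on success it does not return, because the process
   is replaced by a fresh copy of itself running without ASLR. */
int rerun_without_aslr(char **argv)
{
    assert(argv != NULL && argv[0] != NULL);

    int persona = personality(0xffffffff);   /* query without changing */
    if (persona == -1) {
        return -1;
    }
    if (aslr_disabled(persona)) {
        return 0;
    }
    if (personality((unsigned long)persona | ADDR_NO_RANDOMIZE) == -1) {
        return -1;
    }
    execv("/proc/self/exe", argv);           /* only returns on failure */
    return -1;
}
```

Some sandboxed environments restrict the personality() system call, so the -1 return path is worth handling, for example by falling back to setarch.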

Thread Fuzzing works both with ASLR and without ASLR.

Performance

The Clang documentation says that “typical slowdown introduced by ThreadSanitizer is about 5×–15×.”

The slowdown due to Thread Fuzzing depends on the program. There is a slowdown due to the LiveRecorder recording overhead: this varies according to the type of workload. If the program is mostly working in its own private memory the slowdown may be as low as 1.5×, but if the program makes extensive use of shared memory then the slowdown can be much larger, as much as 50× if all memory is shared. This overhead needs to be multiplied by the slowdown due to serialization of threads, which is roughly proportional to the parallel workload: if the program keeps n threads busy when run natively, then the slowdown under LiveRecorder should be multiplied by n. For example, a program that keeps 4 threads busy, using mainly private memory and a small amount of shared memory, might see a slowdown of 2 (execution overhead) × 4 (threading overhead) = 8 times.
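The estimate in this paragraph is a simple product, which can be written down as a small helper. The function name `estimated_slowdown` is hypothetical, and the result is only the rough rule of thumb given above, not a measurement.

```c
#include <assert.h>

/* Rough slowdown estimate: recording overhead multiplied by the number
   of threads the program keeps busy when run natively. */
double estimated_slowdown(double recording_overhead, int busy_threads)
{
    assert(recording_overhead >= 1.0);
    assert(busy_threads >= 1);
    return recording_overhead * busy_threads;
}
```

For the example in the text, `estimated_slowdown(2.0, 4)` gives 8.0.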

Memory Usage

The Clang documentation says that “typical memory overhead introduced by ThreadSanitizer is about 5×–10×.”

The memory overhead introduced by LiveRecorder is usually less than 2×. Thread Fuzzing adds no further memory overhead.

Classes of bugs detected

ThreadSanitizer reports data races regardless of whether they cause a program failure. This catches races as soon as they occur, instead of waiting for incorrect results to propagate through the program until it crashes or asserts (if it ever does).

Data races reported by ThreadSanitizer may in some cases be false positives. For example, consider a program in which increment_count() is called from multiple threads:

unsigned count = 0;

void
increment_count(void)
{
    ++count;
}

ThreadSanitizer correctly reports this as a data race since an increment is not an atomic operation, but a read followed by a write. However, if the program uses count only as a lower bound on the number of times that increment_count() was called, then the code is safe. ThreadSanitizer provides mechanisms for suppressing false positives, either by using a suppression file or by annotating the source.
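If the program does need an exact count, one option is to make the increment atomic instead of suppressing the report. The sketch below uses the same GCC/Clang `__atomic` builtins that appear later in this article; `read_count` and `check_increment` are hypothetical helpers added for illustration.

```c
#include <assert.h>

static unsigned count = 0;

/* Atomic read-modify-write: concurrent callers cannot lose an update. */
void increment_count(void)
{
    __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED);
}

/* Atomic read of the current value. */
unsigned read_count(void)
{
    return __atomic_load_n(&count, __ATOMIC_RELAXED);
}

/* Single-threaded sanity check: n calls must add exactly n. */
void check_increment(unsigned n)
{
    unsigned before = read_count();
    for (unsigned i = 0; i < n; i++) {
        increment_count();
    }
    assert(read_count() - before == n);
}
```

Relaxed ordering is enough here because the counter is not used to publish other data. A stronger ordering would be needed if readers relied on it for synchronization.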

Thread Fuzzing reproduces data races, locking errors, and deadlocks. It does not reproduce races related to weak memory model semantics (out-of-order updates) on the ARM64 architecture. It does not detect or report errors: it is up to the program to bring data races to the attention of developers, for example, by crashing or asserting. It only reproduces error conditions that can occur in practice: that is, there are no false positives.

Interpreting the results

ThreadSanitizer emits its findings as reports giving backtraces for the racing threads. For example, consider these functions implementing lockless push and pop operations on a linked list:

static void
s_push(list_t *item)
{
    list_t *tmp = __atomic_load_n(&list_head.next, __ATOMIC_ACQUIRE);
    /* (A) */ item->next = tmp;
    __atomic_store_n(&list_head.next, item, __ATOMIC_RELEASE);
}

static void *
s_pop(void)
{
    list_t *item = __atomic_load_n(&list_head.next, __ATOMIC_ACQUIRE);
    if (!item)
    {
        return NULL;
    }
    /* (B) */ __atomic_store_n(&list_head.next, item->next, __ATOMIC_RELEASE);
    return item;
}

static void *
s_consumer(void *arg)
{
    for (;;)
    {
        list_t *item = s_pop();
        if (item)
        {
            free(item);
        }
    }
    return NULL;
}

When this code is compiled with ThreadSanitizer and run, we get the following report:

==================
WARNING: ThreadSanitizer: data race (pid=945926)
   Read of size 8 at 0x72040008b1e0 by thread T2:
     #0 s_pop examples/linked-list.c:65 (linked-list+0x1401)
     #1 s_consumer examples/linked-list.c:117 (linked-list+0x1543)
  
  Previous write of size 8 at 0x72040008b1e0 by thread T1: 
    #0 s_push examples/linked-list.c:43 (linked-list+0x1388) 
    #1 s_producer examples/linked-list.c:100 (linked-list+0x1508)
  
  Location is heap block of size 16 at 0x72040008b1e0 allocated by thread T1: 
    #0 malloc src/libsanitizer/tsan/tsan_interceptors_posix.cpp:665 (libtsan.so.2+0x54b3f) 
    #1 s_producer examples/linked-list.c:99 (linked-list+0x14f8) 
  
  Thread T2 (tid=945929, running) created by main thread at: 
    #0 pthread_create src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1022 (libtsan.so.2+0x5ac1a) 
    #1 main examples/linked-list.c:146 (linked-list+0x1651) 
  
  Thread T1 (tid=945928, running) created by main thread at: 
    #0 pthread_create src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1022 (libtsan.so.2+0x5ac1a) 
    #1 main examples/linked-list.c:145 (linked-list+0x1634)

SUMMARY: ThreadSanitizer: data race examples/linked-list.c:65 in s_pop
==================

The developer reading the report must use pure deduction to figure out the cause of the race. In this case it is not too hard to see that the write of item->next at (A) is racing with the read of item->next at (B) and that a lock is needed to avoid interleaving of pushes and pops. However, in more complex cases it may not be so easy to deduce the cause.
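The lock-based fix mentioned above might look like this. It is a minimal sketch, not the article’s actual repair: the names `s_push_locked`, `s_pop_locked` and `list_lock` are hypothetical, and the `list_t` layout is inferred from the debugger output shown later in this article.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct list {
    struct list *next;
    void *data;
} list_t;

static list_t list_head;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push an item onto the front of the list while holding the lock. */
void s_push_locked(list_t *item)
{
    assert(item != NULL);
    pthread_mutex_lock(&list_lock);
    item->next = list_head.next;
    list_head.next = item;
    pthread_mutex_unlock(&list_lock);
}

/* Pop the front item, or return NULL if the list is empty. The read of
   item->next happens under the lock, so it cannot interleave with a push
   or with another pop. */
list_t *s_pop_locked(void)
{
    pthread_mutex_lock(&list_lock);
    list_t *item = list_head.next;
    if (item != NULL) {
        list_head.next = item->next;
    }
    pthread_mutex_unlock(&list_lock);
    return item;
}
```

Once an item has been popped under the lock, no other thread can reach it through the list, so the caller can free it safely.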

Thread Fuzzing does not detect or report data races, but LiveRecorder saves an Undo recording of the program’s behavior, which allows a failure to be reproduced by replaying an instruction-precise trace of the program’s behavior, allowing all of memory to be inspected at any point in time. In the example below, the program crashes with a segmentation fault and live-record saves an Undo recording:

$ live-record --thread-fuzzing ./linked-list
live-record: Termination recording will be written to linked-list-968476-2025-03-18T16-40-45.616.undo
live-record: Maximum event log size is 1G.
live-record: Saving to linked-list-968476-2025-03-18T16-40-45.616.undo ...
live-record: Saving.. 100%

live-record: Termination recording written to linked-list-968476-2025-03-18T16-40-45.616.undo
live-record: Detaching...
Segmentation fault (core dumped)

We can load the Undo recording into UDB:

$ udb -q linked-list-968476-2025-03-18T16-40-45.616.undo
0x00005bad0e0f6120 in _start ()

The debugged program is at the beginning of recorded history. Start debugging
from here or, to proceed towards the end, use: 
continue - to replay from the beginning 
ugo end - to jump straight to the end of history

The crash is at the end of history, so we jump there:

start 1> ugo end
[New Thread 968476.968506]
[New Thread 968476.968507]
[Switching to Thread 968476.968507]
0x00005bad0e0f625e in s_pop () at examples/linked-list.c:65
65 __atomic_store_n(&list_head.next, item->next, __ATOMIC_RELEASE);
end 15,289,385> bt
#0 0x00005bad0e0f625e in s_pop () at examples/linked-list.c:65
#1 0x00005bad0e0f632d in s_consumer (arg=0x0) at examples/linked-list.c:117
#2 0x0000759b4b49caa4 in start_thread (arg=) at ./nptl/pthread_create.c:447
#3 0x0000759b4b529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The reason for the crash is that item points at unmapped memory:

end 15,289,385> p item->next
Cannot access memory at address 0x759c1db41545
end 15,289,385> p item
$1 = (list_t *) 0x759c1db41545
end 15,289,385> p *item
Cannot access memory at address 0x759c1db41545

We can run backwards to see how item got this bad value:

end 15,289,385> last item
Searching backward for changes to 0x759b4a98ee88-0x759b4a98ee90 for the
expression: 
  item

Thread 3 "linked-list" hit Hardware watchpoint -25: *(list_t * *) 0x759b4a98ee88

Was = (list_t *) 0x759c1db41545
Now = (list_t *) 0xffffffffffffff88
0x00005bad0e0f6248 in s_pop () at examples/linked-list.c:58
58 list_t *item = __atomic_load_n(&list_head.next, __ATOMIC_ACQUIRE);
end 15,289,385> p list_head.next
$3 = (struct list *) 0x759c1db41545
end 15,289,385> p *list_head.next
Cannot access memory at address 0x759c1db41545

The bad value was loaded from list_head.next. Running backwards again to find out how this value became bad:

end 15,289,385> last list_head.next
Searching backward for changes to 0x5bad0e0f9030-0x5bad0e0f9038 for the
expression: 
  list_head.next
Enable debuginfod for this session? (y or [n]) n

Thread 3 "linked-list" hit Hardware watchpoint -28: *(struct list * *) 0x5bad0e0f9030

Was = (struct list *) 0x759c1db41545
Now = (struct list *) 0x759b44005190
s_pop () at examples/linked-list.c:65
65 __atomic_store_n(&list_head.next, item->next, __ATOMIC_RELEASE);
99% 15,289,375> bt
#0 s_pop () at examples/linked-list.c:65
#1 0x00005bad0e0f632d in s_consumer (arg=0x0) at examples/linked-list.c:117
#2 0x0000759b4b49caa4 in start_thread (arg=) at ./nptl/pthread_create.c:447
#3 0x0000759b4b529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
99% 15,289,375> p item
$4 = (list_t *) 0x759b44005190
99% 15,289,375> p *item
$5 = {next = 0x759c1db41545, data = 0x0}
99% 15,289,375> p *item->next
Cannot access memory at address 0x759c1db41545

Running backwards a third time:

99% 15,289,375> last item->next
Searching backward for changes to 0x759b44005190-0x759b44005198 for the
expression: 
  item->next

Thread 3 "linked-list" hit Hardware watchpoint -36: *(struct list * *) 0x759b44005190

Was = (struct list *) 0x759c1db41545
Now = (struct list *) 0x759b44005270
0x0000759b4b4ab2f6 in _int_free (av=0x759b44000030, p=, have_lock=0) at ./malloc/malloc.c:4619
4619 ./malloc/malloc.c: No such file or directory.
99% 15,283,100> bt
#0 0x0000759b4b4ab2f6 in _int_free (av=0x759b44000030, p=, have_lock=0) 
     at ./malloc/malloc.c:4619
#1 0x0000759b4b4addae in __GI___libc_free (mem=0x759b44005190) at ./malloc/malloc.c:3398
#2 0x00005bad0e0f634b in s_consumer (arg=0x0) at examples/linked-list.c:128
#3 0x0000759b4b49caa4 in start_thread (arg=) at ./nptl/pthread_create.c:447
#4 0x0000759b4b529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

So the crash was caused by a use-after-free and we are at the location of the free in the debugger ready for further investigation of the cause.

Conclusion

Each tool has its own strengths and weaknesses. ThreadSanitizer detects data races directly, which is effective if the program omits to check the consistency of its own data structures. Thread Fuzzing varies the scheduling of threads, which can be effective when a data race occurs only under unusual conditions, and produces an Undo recording that makes it possible to reproduce the failure in a debugger, which is effective when the cause of a race is complex.

It makes sense to use both tools and take advantage of each one in the cases where it is best suited. For example, you might run the product first under ThreadSanitizer and fix the easy-to-reproduce races, then run the product under Thread Fuzzing to vary the scheduling of threads and discover hard-to-reproduce races, or capture recordings of races which can’t be solved by pure deduction from the ThreadSanitizer report.

Interested in trying Thread Fuzzing? You can sign up for a free trial of Undo using the button below.

Try now

A Common Sense Guide to Symbols and Debug Info

Undo Bytes — Thu, 27 Mar 2025 18:03:59 GMT

In this blog, Isa Smith, Staff Software Engineer, explains the difference between symbols and debug info, and suggests some ways to track down your “debug symbols”.

This isn’t a comprehensive guide to anything, and doesn’t talk about the innards of DWARF, ELF, binutils, or GDB. It’s intended to be a friendlier guide than the many more rigorous guides.

Two essential resources for the absolute truth are:

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separate-Debug-Files.html

https://gcc.gnu.org/wiki/DebugFission

The information here isn’t specific to Undo’s tools, except for a couple of mentions and a section at the very end. The impact on Undo’s tools is nearly identical to live debugging or loading a core file, so although you see UDB in the screenshots, this is just as relevant for plain GDB.

Who is this for?

Anyone who has encountered this:

More specifically, you’ll get this when you’re trying to debug a stripped binary: one without debug info or symbols. You’re most likely to encounter this when you’ve built on one machine then run the binary on another. Four cases where this may happen are:

Normal interactive debugging on a remote host
Debugging a core file generated somewhere else
Using gdbserver on a remote host
Debugging an Undo recording

What are “symbols”? What is “debug info”? What are “debug symbols”?

Put simply, symbols are the names and addresses of functions and variables in your program. Debug info is all the extra information needed to tie your machine code to your source code. For example, line information (which machine code addresses correspond to this source line?) or local variable info (which register is this variable in when we are at this address?).

Symbols are included in executables by default. Debug info is not, and you need to pass -g to the compiler to get it.

If you try to debug a fully stripped program, your backtraces will look like this:

Most stack frames show only a disassembly address, with the occasional symbol name scattered in.

If you only have symbols, it’s better, but it’s still not going to give you a good experience. You can see what function you’re in and set some more useful breakpoints, but source code debugging is unavailable.

If you have symbols and debug info, you’re in the happy place.

I don’t have a strict definition for the phrase “debug symbols”, as it’s not a strictly defined concept. It seems to refer to “everything you need for debugging” — that being both symbols and debug info. You can’t actually have debug info without symbols, so it’s a slightly odd phrase. I suspect people use it because for a developer hoping to debug a program, the difference between symbols and debug info is pretty irrelevant. You really need both. Unless you can debug compiled machine code as easily as source code… in which case you have achieved oneness with the computer and you don’t need to be here. Well done.

Why is this happening to me?

To answer this, let’s first say what has happened. The debug information and/or symbols for your program have been removed (or stripped) from the binary on the system you’re trying to debug on. This can happen at compile time or as a post-processing step after compilation.

There are some very good reasons people choose to do this:

Size
Secrecy

On size, consider the gdb binary itself:

Unstripped : 288MB

Debuginfo stripped : 14MB

Symbols and debuginfo stripped : 12MB

If you add this size to the sizes of any shared objects, you can end up with a very large project, and you can almost guarantee that an end user somewhere will have a slow enough data transfer to be annoyed by all this useless information.

Note that I’m only talking about disk space here. Neither symbols nor debug info are loaded into memory when the program executes. They’re not needed at run time so would be a waste of RAM. An exception to this is dynamic symbols in the .dynsym and .dynstr sections. These are needed at runtime by the dynamic linker. This is why some symbol names still appear (and are breakpointable) in a fully stripped binary.

Symbol names take a lot less space, but they do give away information about how your program works. It’s much more difficult to decipher any meaning from a large amount of totally context-free machine code.

Some vendors (including Undo) choose to include symbols and debug information in their shipped binaries. In our case the debug info is relatively small and nothing valuable can be inferred from symbol names or filenames. More importantly, we find being able to debug our code on a remote site to be very useful. The cost-benefit analysis is in favor of shipping everything.

Where are my symbols?

Ideally this is a question to ask your build system team. They may have a really great answer (more in a moment). However, this isn’t always possible. Maybe they don’t respond quickly, or you don’t want to bother them, or they have other priorities at the moment. I’ve had to sniff out debug information from a fair number of customer environments, so I’ll share how I do it.

Note: It’s worth knowing that debug information is in a format called DWARF, which is composed largely of offsets. Everything is defined as an offset from everything else. For this reason you need the exact debug info for your binary. Anything “close” just won’t work, at all, and will waste your time.

It’s also possible that you don’t have any symbols or debug info anywhere in your build. They usually exist somewhere, but hunting for debug info that was never generated is another good way to waste time.

Caveats aside, if you’re sure you’ve got the same build that your debugged process came from, let’s have a look.

The first step is to see if there’s an unstripped version of the binary in the build tree. Use find . -name "mybinary" to locate candidates, and then file to check each one.

If you see this output:

$ file mybinary_unstripped

mybinary_unstripped: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1d904bc819c5b7030ec0e2b54ee6034acbea4194, for GNU/Linux 3.2.0, with debug_info, not stripped

This is ideal and can be used for debugging.

If your program doesn’t include any shared objects, you’re done. If it does, you need to find the debug symbols for them too if you want to debug them. I suggest finding the debug info for a couple of shared objects: that should give you enough information about the structure of the build. I’ll explain what to do with this in a moment.

If you see this output:

$ file mybinary_strippeddebug

mybinary_strippeddebug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1d904bc819c5b7030ec0e2b54ee6034acbea4194, for GNU/Linux 3.2.0, not stripped

$ file mybinary_fullystripped

mybinary_fullystripped: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1d904bc819c5b7030ec0e2b54ee6034acbea4194, for GNU/Linux 3.2.0, stripped

These mean you still have hunting to do.

Try searching the project for *.debug, *.dwo, and *.dwp files. These are different ways that separate debug files can be stored. In many cases, there will be a link embedded in the stripped binaries (either the .gnu_debuglink section or the .debug_str section for DWO) which GDB will be able to follow to load the corresponding symbol file.

If you haven’t found anything yet, it’s possible your project isn’t built with debug info, in which case you’re back to speaking to your build team or investigating your build infrastructure code. Look for “strip”, “objcopy” and “split-dwarf”.

If you’ve managed to find some debug info, great! The next step is to tell GDB about it.

Loading the symbols

GDB has many commands for loading symbol files once you’ve located them, for example set debug-file-directory, set solib-search-path, symbol-file, add-symbol-file, etc. Unfortunately, each command is designed for a different scenario, and it’s not obvious which command applies in which scenario.

Rather than going through the scenario each command goes with, I will introduce the multi-tool of symbol loading: add-symbol-file.

This isn’t always the simplest way, but it is the most reliable. It can cope with everything from a simple executable to more complex and surprising situations.

The command works like this:

add-symbol-file symbolfile -o offset

You get the offset from the memory map of the process. In GDB, use info proc mappings: you need the lowest memory address of the library or executable you’re debugging. For example, say you are loading the symbols for libcool.so and the mappings show 0x7ffff7f29000 as its lowest address.

If your debug symbols were at /home/work/ismith/project/build/dbg/libcool.so, you would use:

add-symbol-file /home/work/ismith/project/build/dbg/libcool.so -o 0x7ffff7f29000

Can’t I just debug the unstripped binary?

Yes. If this is an option for you, you can absolutely debug the unstripped binary. If you want to debug on-target you’ll need the source code as well. This may not be an option if:

There isn’t enough disk space on a small target device
You’re not allowed to put symbols/source there — e.g. don’t copy your secret project to AWS

But if it is an option, go for it. As previously mentioned, debug info is neither required nor used in any way by a running process. It will not affect its behavior — on the other hand, attaching a debugger to it will massively affect its timing behavior which may cause problems. This isn’t due to the debug info though.

I looked for the debug builds and I found dbg, rel, dbgrel, dbgopt, reldbg, relwithdbginfo, which one do I use?

It’s common to have a lot of different build targets in a project. Many build systems provide several targets by default and your project may have built on these. This is because there are some orthogonal choices when making a build:

Build with or without debug info (-g options)
What optimization level to use (-O options)
Project-specific “debug” modes enabled by defining DEBUG or similar (-D compiler options). These might enable extra checking or extended error messages for developers.
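As a minimal sketch of the third choice, here is how a project-specific debug mode might gate an extra check. The function name average and the macro name DEBUG are illustrative assumptions, not taken from any particular project; your build may use a different macro entirely.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the mean of the values.
//
// Debug target (illustrative):    g++ -O0 -g -DDEBUG demo.cpp
// Release target (illustrative):  g++ -O2 -g demo.cpp
//
// With -DDEBUG the precondition is checked and reported with a
// descriptive message. Without it the check is compiled out, and an
// empty input produces a meaningless result (typically NaN).
double average(const std::vector<double>& values) {
#ifdef DEBUG
    if (values.empty()) {
        throw std::invalid_argument("average(): called with an empty vector");
    }
#endif
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    return sum / static_cast<double>(values.size());
}
```

Note that -g, -O and -D are independent here: the same source can be built with any combination, which is how a project ends up with targets like dbg, rel and relwithdbginfo.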

If you have lots of build targets then you probably have a partial (or even full) cross product of these options. For best debugging, you want low optimization and debug info, and probably with as many -DDEBUG type options as you can. With high optimization levels, some source code lines have no corresponding machine code, and others will have machine code reordered to be most efficient. So when debugging, you’ll encounter optimized out variables (https://undo.io/resources/value-optimized-out-reverse-debugging-rescue/) and experience the current location jumping around when you step.

Often “debug builds”, with low optimization and project-specific debug modes enabled (the second and third choices above), are too slow to be used for anything large, so you’ll have to debug optimized code. This is painful to begin with, but there are certain tricks and reflexes you’ll learn that make it pretty easy. Then debugging low optimization code becomes a special treat.

If you’re using UDB, the ugo undo or uu command is very useful, as it takes you back to your previous place. So if a next took you too far, you can quickly go back and try a step instead, or even stepi through the assembly code.

Symbol files and Undo

The issue of locating and loading separate debug symbols is a subject close to our hearts at Undo. Our technology records processes in a minimally invasive way — this means we can, for example, record a critical process on a network switch without interfering with its operation. In such a scenario we almost always record a release build with separate debug symbols. Since the replay part of the technology is a debugger (UDB) we absolutely need to find those symbols and get them loaded.

On the other hand, if the symbols are available on the device, Undo (LiveRecorder or UDB) will find them and add them to the recording. At replay time, Undo constructs a sysroot inside /tmp and uses set sysroot to point to it.

All of our customers have complex environments, and many have to deal with separate symbols/debug info. We offer a “Debug Info Service” which can serve symbols via debuginfod, or we can work with you to use existing scripts or mechanisms with Undo for a one-step load and debug experience.

Interested in finding out more? Book a demo: https://calendly.com/undo-time-travel-debugging/30min

Making C++ Safer

Undo Bytes — Thu, 27 Feb 2025 10:51:05 GMT

Would you start a new project in C++ these days? Many people would say no, they’d use something else; usually they cite Rust. Or Go, or Python or anything “memory safe”, which these days effectively means anything that isn’t C or C++.

A few years ago this kind of discussion was for language nerds hanging out on Hacker News; now it’s about geopolitics! States are working ever harder to hack each other’s systems, and the majority of critical infrastructure code is written in C++ (or plain C), which can be a gift to hackers. The US government is now urging programmers to drop C and C++. There were some very misleading headlines recently, saying “Whitehouse mandates companies stop using C++ by 2026”. Of course, they didn’t say this, because it would be nuts. What they actually did was strongly encourage software companies to produce a memory safety roadmap by 2026. The argument about whether we should use C++ is moot: there are an estimated 10 billion lines of C++ code in production today, and rewriting it all will take decades.

I believe that over time C++ will become a lot safer, maybe even some kind of ‘safe’. Competition is good: Clang was the best thing to happen to GCC, and Rust might turn out to be the best thing to happen to C++. That journey has already begun, with proposals for the evolution of the language including Contracts and Profiles, and simply changing some of the defaults in C++26. While the language custodians work to make the language itself safer, what can you do today?

Follow the Core Guidelines

Even today, most C++ doesn’t need to be nearly as unsafe as it is. If everyone had followed the existing Core Guidelines, we’d already be in much better shape. The guidelines are a set of simple rules which, if followed, make C++ a lot safer, as well as easier to read and reason about, and more consistent. They are maintained by Bjarne Stroustrup and Herb Sutter, and they cover issues such as interfaces, resource and memory management, and concurrency. For example, RAII (Resource Acquisition Is Initialization) rules out many simple resource-safety mistakes, such as leaks or use-after-free. When writing or reviewing new code, or changing existing code, follow the guidelines. As discussed, however, we can’t just rewrite everything, so how do we live with all the existing code that doesn’t follow them?
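To make the RAII point concrete, here is a minimal sketch. The Connection class and use_connection function are hypothetical names invented for illustration; the live-instance counter exists only so the effect can be observed.

```cpp
#include <memory>

// Hypothetical resource that counts how many instances are alive.
class Connection {
public:
    Connection() { ++live_count; }
    ~Connection() { --live_count; }

    // Non-copyable: exactly one owner per resource.
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

    static int live_count;
};
int Connection::live_count = 0;

// The unique_ptr owns the Connection. Every path out of the function,
// including the early return, runs the destructor, so there is no
// cleanup code to forget and nothing to leak.
int use_connection(bool fail_early) {
    auto conn = std::make_unique<Connection>();
    if (fail_early) {
        return -1;  // released automatically here
    }
    return Connection::live_count;  // still alive while we are in scope
}
```

After either call returns, Connection::live_count is back to zero. The equivalent code with a raw new and a manual delete would need a delete on every exit path, which is exactly the kind of mistake the guidelines aim to design out.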

Bounds checking by default

There is simply no excuse for many (most?) of the safety violations we see today: stupid out-of-bounds errors accessing arrays, vectors, strings, etc. Google recently published a blog post reporting that enabling bounds checking and other hardening resulted in a performance impact of just 0.3%. Just enable such bounds checking by default. Both Clang/LLVM and GCC document how to do so. If you determine by profiling that some performance-critical piece of code is being materially slowed down by this, you can always disable the bounds checking for that one bit of problematic code. (My bet is that you won’t though.) Chances are that C++26 will enable such bounds checking by default anyway, but why wait?
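The sketch below illustrates what a bounds check buys you, using std::vector::at as a portable stand-in. This is an assumption made for illustration: library hardening is normally switched on with a build flag rather than by rewriting call sites (for example -D_GLIBCXX_ASSERTIONS for libstdc++, or -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST for libc++; check your toolchain’s documentation), and a hardened operator[] terminates the program on a bad index rather than throwing. The helper name try_read is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies v[index] into out using checked access.
// Returns false instead of reading out of bounds; out is left
// untouched on failure.
bool try_read(const std::vector<int>& v, std::size_t index, int& out) {
    try {
        out = v.at(index);  // at() throws std::out_of_range if index >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With an unchecked v[index], an index equal to v.size() is undefined behavior that may silently read adjacent memory. With a check in place, the same mistake becomes a deterministic, diagnosable failure.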

Stop ignoring those flaky tests

Those flaky test failures are telling you something, whether or not you’re writing in a memory safe language, but with memory unsafe languages a good proportion of them will be memory errors. This is actually part of something bigger — memory safety is just one kind of vulnerability, and the same concerns that are driving the memory safety debate are affected by undiagnosed, flaky tests.

Fixing a flaky-test problem takes commitment, but it’s well worth it. Engineers from Google have written a lot on it, and we at Undo have published a handy 7-step guide.

Use the tools: sanitizers

Use the sanitizers, particularly ThreadSanitizer, AddressSanitizer, and Undefined Behavior Sanitizer. Your CI should include a complete run of the tests with sanitizers enabled. While this won’t catch all memory safety problems, it will catch many of them. Hunt down all the failures, and be very wary of dismissing them as false positives. (See flaky tests above.)

They’re super easy to use these days — you just need to pass the option -fsanitize=address, -fsanitize=thread or -fsanitize=undefined when you compile (either gcc or clang), and that’s it. There are lots of options you can tweak, but to get started, the defaults work fine. See this article for more details.
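As a minimal sketch of the kind of bug AddressSanitizer is designed to report, consider the classic off-by-one below. The function names are hypothetical, and the file name in the build command is an assumption.

```cpp
#include <cstddef>

// Deliberately buggy: "<=" makes the loop read one element past the
// end of the buffer. In a normal build this may appear to work. Built
// with, for example,
//   g++ -g -fsanitize=address demo.cpp
// and called on a heap buffer of exactly `count` ints, AddressSanitizer
// should stop at the faulting read and report a heap-buffer-overflow.
int sum_buggy(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i <= count; ++i) {
        total += data[i];
    }
    return total;
}

// Corrected version: the loop stays within [0, count).
int sum_fixed(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}
```

The value of the sanitizer is that sum_buggy fails loudly at the point of the bad read, instead of returning a plausible-looking wrong answer that surfaces later as a flaky test.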

Use the tools: static analyzers

Many of the memory safety bugs that lead to security vulnerabilities can be identified by modern static analysis. Even the best static analyzers generate false positives, and if you haven’t run them before, the list of warnings can be daunting. The good ones will help you prioritize which warnings to look into first and/or can be configured to only show you issues in new/changed code. Static analysis is one of those tooling categories where if you want quality you need to pay for it: the cost of developing and maintaining these tools means it’s difficult for traditional open source models to work. Coverity, Klocwork and SonarQube are popular options. If you want to start with open source, you could try cppcheck.

Use the tools: fuzzing

Fuzzing typically uncovers all kinds of edge cases you didn’t think about. Many of them will be safety concerns. As with static analysis, you can get a bewildering number of failures, but if you combine fuzzing with time travel debugging, it is usually straightforward to root-cause and fix each issue, because you can step back through a recording of the failure.

There are good free and commercial offerings. For example, Google’s FuzzTest is free, flexible and fairly easy to use. It will find bugs in your code that you didn’t know about. It is built on top of AddressSanitizer (see above). On the commercial side, BlackDuck’s Defensics is very good, as is Code Intelligence’s CI-Fuzz.
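To show the general shape of a fuzz target, here is a minimal sketch. To keep it self-contained it uses the libFuzzer-style entry point LLVMFuzzerTestOneInput rather than FuzzTest’s own macros, which is a substitution made for illustration. The parse_record function and its one-byte length-prefix format are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical function under test. The first byte declares the
// payload length; the payload follows. Returns the payload, or an
// empty string if the declared length exceeds the bytes available.
std::string parse_record(const std::uint8_t* data, std::size_t size) {
    if (size == 0) {
        return {};
    }
    const std::size_t declared = data[0];
    if (declared > size - 1) {
        return {};  // without this check, the copy below reads out of bounds
    }
    return std::string(reinterpret_cast<const char*>(data + 1), declared);
}

// libFuzzer-style entry point: the fuzzer calls this repeatedly with
// generated inputs. Illustrative build command:
//   clang++ -g -fsanitize=fuzzer,address harness.cpp
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    const std::string payload = parse_record(data, size);
    // Invariant: the payload can never be longer than the input.
    if (payload.size() > size) {
        std::abort();
    }
    return 0;
}
```

If the length check in parse_record were removed, a fuzzer built with AddressSanitizer would be expected to find an input such as a single byte with a large declared length very quickly.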

Stop papering over the cracks

When you smell smoke, you act. If a test is failing or a bug is reported and a small change makes it work but you don’t understand why, spend the time to do the root-cause analysis. When your code is behaving in a way you don’t understand, it’s telling you something. If you don’t keep pulling on that thread to properly understand it, chances are that you are leaving a safety vulnerability that may later be discovered by malign actors — or maybe it already has been!

Again, a time travel debugger can make this much easier.

Greater than the sum of its parts

Finally, do all of the above together. Sanitizers and fuzzing are a powerful combination, and when combined with time travel debugging you’d be surprised at how quickly you can squash all of the problems you find, one by one. When fuzzing throws up an error sometimes the root cause is obvious (you forgot to sanitize that user input), but other times the reason for the bad behavior is anything but obvious — combining fuzzing with sanitizers can really help narrow it down, and/or feed the fuzz failure into a time travel recording.

Conclusion

Buggy code is unsafe. Code we don’t understand is unsafe. Memory unsafe languages like C++ make us especially vulnerable, so when using them we must compensate. However, it is important to remember that memory safe languages are no panacea. If you use good software engineering practices and make best use of the tooling available, you will produce safer, better code, whatever language you’re using. In fact, a C++ codebase managed by a team that does all this is likely to be safer than the equivalent Rust by a team that ignores it. And not just safer, but also more pleasant and productive to work on.

Oh, one last thing: AI-generated code

No software article these days is complete without a reference to AI-generated code. AI can be a boon for developer productivity. But without care, it can be a nightmare for safety. Lots of AI-generated code without good practices and tooling threatens to be a recipe for disaster. Particularly if you’re writing systems code, it is incumbent on you to fully understand the code written by you, or by the AI on your behalf. Just staring at it and then accepting it is not enough: you must use thorough testing, fuzzing, sanitizers, and most importantly make sure that you fully understand what the code is really doing.

I’d love to know your thoughts on the topic of C++ safety. Please connect and message me on LinkedIn to let me know.

“How Did This Variable Get That Value?”

Undo Bytes — Fri, 20 Jan 2023 16:30:43 GMT

It’s quite common when diagnosing an issue in an application to come across an unexpected value in a variable or data structure, and to wonder: “How did that value get there?”

With time travel debugging, it is now possible to find this out in a quick and efficient way: Undo’s LiveRecorder recently introduced the last command to help C++ developers answer precisely that question.

The last command jumps to the last time in execution history when the value of a variable or expression was modified.
You can keep repeating last to keep jumping back to previous times that the value changed.


Learn more about how to save time debugging complex C++ applications with LiveRecorder.