Electronics etc…

A Galois Field Arithmetic Primer

2026-06-14T10:00:00+00:00

Introduction
A Galois Field Introduction by Example
Base Galois Fields
A Real World Example of a Base Galois Field
GF(2)
Extended Galois Fields
Extended Galois Field Addition
Extended Galois Field Multiplication
A Field Defining Irreducible Polynomial
A Primitive Polynomial
From Abstract Alpha to a Real Value
Selecting Primitive Polynomials
The Benefit of Primitive Polynomials
Linear Feedback Shift Register
Multiplication through Addition of Exponents
References
Footnotes

Introduction

In my blog post about Reed-Solomon coding, I used regular integers for all calculations. These are impractical for a real-world implementation, but since everybody has known integer math since first grade, it made it easier to learn things one step at a time.

Instead of working with pure integers, actual Reed-Solomon implementations use elements from a Galois or finite field as symbols.

I’ve been sitting on implementing and writing about a Reed-Solomon decoder for almost 3 years now¹, and I’m still not quite there, but a first step is to have enough Galois field understanding so that the lack of it isn’t an obstacle. That’s what this blog post is about. Don’t expect a solid theoretical treatise (you can find many of those as part of university courses), but something that is sufficient to refer back to in the future when I’ve forgotten some of the details.

If you want to get a deeper understanding, check out the references at the bottom.

A Galois Field Introduction by Example

In mathematics, a field is a set of elements for which addition, subtraction, multiplication and division operations have been defined, with properties that we take for granted when dealing with rational or real numbers, such as the associative and distributive properties², the rules for adding and multiplying with 0, and so forth.

For rational or real numbers, the number of elements in the field is infinite. A Galois field only has a limited number of elements, yet still has these kinds of operations and properties.

A good example of a Galois field is $\text{GF}(5)$ which has integer numbers 0 to 4 as elements. Addition, subtraction, and multiplication work the same as for regular integers but each such operation is followed by a modulo 5 operation.

Here are a few example operations in $\text{GF}(5)$:

\[\begin{align} 1 + 3 = (1+3) \bmod 5 = 4 \bmod 5 = 4 \\ 4 + 4 = (4+4) \bmod 5 = 8 \bmod 5 = 3 \\ 3 \cdot 4 = (3 \cdot 4) \bmod 5 = 12 \bmod 5 = 2 \\ \end{align}\]

Division is a bit less intuitive. It is defined as the multiplication by the inverse of the divisor:

\[\frac{a}{b} = a \cdot b^{-1}\]

One way of finding the multiplicative inverse of the divisor is by multiplying it with all possible elements and checking if the result is 1.

Let’s say we want to do $2/3$ in $\text{GF}(5)$. We need to find $3^{-1}$ so that $3 \cdot 3^{-1} = 1$. There are 5 different options $0,1,2,3,4$:

\[\begin{align} (3 \cdot 0) \bmod 5 = 0 \\ (3 \cdot 1) \bmod 5 = 3 \\ \boldsymbol{(3 \cdot 2) \bmod 5 = 1} \\ (3 \cdot 3) \bmod 5 = 4 \\ (3 \cdot 4) \bmod 5 = 2 \\ \end{align}\]

We can see that $(3 \cdot 2)\bmod 5 = 1$, so $3^{-1}=2$.

And thus:

\[2/3 = 2 \cdot 3^{-1} = (2 \cdot 2) \bmod 5 = 4\]
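The search for an inverse described above can be sketched in a few lines of Python. The function names `inverse_by_search` and `divide` are illustrative, not from any library, and the code assumes $p$ is prime.

```python
def inverse_by_search(b, p):
    """Return the multiplicative inverse of b in GF(p) by trying every element."""
    for candidate in range(p):
        if (b * candidate) % p == 1:
            return candidate
    raise ValueError(f"{b} has no inverse modulo {p}")


def divide(a, b, p):
    """Compute a / b in GF(p) as a * b^-1."""
    return (a * inverse_by_search(b, p)) % p


print(inverse_by_search(3, 5))  # 2
print(divide(2, 3, 5))          # 4
```

Trying all elements is fine for a tiny field like $\text{GF}(5)$, but the cost grows linearly with $p$.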

There are other ways to calculate the multiplicative inverse. For simple cases, you can use Fermat’s Little Theorem, which says:

\[a^{p-1} \equiv 1 \pmod{p}\]

or, after dividing both sides by $a$:

\[a^{p-2} \equiv a^{-1} \pmod{p}\]

In our example $a=3$ and $p=5$, so:

\[3^{-1} = 3^{5-2} = 3^3 = 27\] \[27 \pmod{5} = 2\]

A more general algorithm is the Extended Euclidean Algorithm.
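As a rough sketch, both approaches fit in a few lines of Python. The names `inverse_fermat` and `inverse_euclid` are illustrative; the only library feature used is Python's built-in three-argument `pow`, which does modular exponentiation.

```python
def inverse_fermat(a, p):
    """Inverse of a in GF(p) as a^(p-2) mod p; assumes p is prime and a is non-zero."""
    return pow(a, p - 2, p)


def inverse_euclid(a, p):
    """Inverse of a modulo p with the extended Euclidean algorithm."""
    # Invariant: old_s * a == old_r (mod p) and s * a == r (mod p).
    old_r, r = a % p, p
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        # gcd(a, p) is not 1, so no inverse exists.
        raise ValueError(f"{a} has no inverse modulo {p}")
    return old_s % p


print(inverse_fermat(3, 5))  # 2
print(inverse_euclid(3, 5))  # 2
```

The Euclidean version also works when the modulus is not prime: it reports an error exactly when no inverse exists.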

Base Galois Fields

The example above is one of a base Galois field

\[\text{GF}(p)\]

$p$ is the base number of a one-dimensional mathematical universe. In a base Galois field, $p$ must always be a prime number, otherwise the division operation would be ill defined.

For example, if we’d set $p=6$ and tried to find the multiplicative inverse of 2, we’d get the following:

\[\begin{align} (2 \cdot 0) \bmod 6 = 0 \\ (2 \cdot 1) \bmod 6 = 2 \\ (2 \cdot 2) \bmod 6 = 4 \\ (2 \cdot 3) \bmod 6 = 0 \\ (2 \cdot 4) \bmod 6 = 2 \\ (2 \cdot 5) \bmod 6 = 4 \\ \end{align}\]

There’s no solution with a result of 1. Since there’s at least one element for which a multiplicative inverse doesn’t exist, you can’t create a field for $p=6$ and thus $\text{GF}(6)$ can’t exist.

Another issue for $p=6$ is that you can get a result of 0 when multiplying 2 non-zero numbers:

\[2 \cdot 3 \pmod{6} = 6 \pmod{6} = 0\]

That’s behavior unbecoming of a proper field!
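To see the difference between a prime and a non-prime modulus, here is a small Python sketch that lists all pairs of non-zero numbers whose product is 0. The helper name `zero_divisor_pairs` is made up for this example.

```python
def zero_divisor_pairs(m):
    """List pairs (a, b) of non-zero residues, a <= b, whose product is 0 modulo m."""
    return [(a, b) for a in range(1, m) for b in range(a, m) if (a * b) % m == 0]


print(zero_divisor_pairs(5))  # []
print(zero_divisor_pairs(6))  # [(2, 3), (3, 4)]
```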

A Real World Example of a Base Galois Field

Since a base Galois field must have a prime number of elements, only $\text{GF}(2)$ maps directly to the zeros and ones of digital logic; all other fields have an odd number of elements. Still, there are some real-world cases where these kinds of Galois fields are used: the Wikipedia article on Reed-Solomon error correction has an example that uses $\text{GF}(929)$, a field that is used for coding PDF417 bar codes.

© Markus.Jungbauer - Wikipedia

Modulo 929 calculations are fine for bar codes, where you only need to process a few per second at most, but they’re not something you’d want to use for high speed communication protocols that run at rates of gigabits or gigabytes per second.

GF(2)

Before taking the next step, let’s first look at the only base field that maps neatly to ones and zeros: $\text{GF}(2)$. The binary Galois field only has 2 symbols: 0 and 1.

It has the following addition table:

\[\begin{align} (0 + 0) \bmod 2 = 0 \\ (0 + 1) \bmod 2 = 1 \\ (1 + 0) \bmod 2 = 1 \\ (1 + 1) \bmod 2 = 0 \\ \end{align}\]

And this is the multiplication table:

\[\begin{align} (0 \cdot 0) \bmod 2 = 0 \\ (0 \cdot 1) \bmod 2 = 0 \\ (1 \cdot 0) \bmod 2 = 0 \\ (1 \cdot 1) \bmod 2 = 1 \\ \end{align}\]

Addition maps to a XOR and multiplication to an AND gate. Another property of note is that subtraction is the same as addition.

These are promising properties for a hardware implementation.

Extended Galois Fields

From a base Galois field $\text{GF}(p)$ one can construct an extended Galois field

\[\text{GF}(p^n)\]

$p$ is still the size of the mathematical universe in one dimension and prime. $n$ is the number of dimensions. The total number of elements in the extended Galois field is $p^n$. An element $a$ of such a Galois field could be written as a vector:

\[( a_{n-1}, \cdots, a_1, a_0 )\]

Or as a polynomial:

\[a(x) = a_{n-1} x^{n-1} + \cdots + a_1 x + a_0\]

When polynomials are used to represent elements of an extended Galois field, don’t think of $x$ as a variable that you plug numbers into. In this context, the powers $1, x, x^2, \cdots, x^{n-1}$ are mostly a formal expression to separate the dimensions. Think of it how integer $5367$ can be written as $5 \cdot 1000 + 3 \cdot 100 + 6 \cdot 10 + 7$. That analogy breaks when coefficients for integers exceed 9, because unlike Galois field coefficients, you get spillover to the next power of 10.

For algorithms that are implemented in hardware, it’s extremely common to deal with $\text{GF}(2^n)$, and $\text{GF}(2^8)$ especially: this results in 8 dimensions of values 0 and 1 which conveniently maps to a byte.

You’ll sometimes see an extended Galois field written with the argument in parentheses worked out, e.g. $\text{GF}(2^8)$ written as $\text{GF}(256)$. This is not an ambiguous notation: you can infer this to be a Galois field extension because 256 is not a prime, but my personal preference is to always use the $\text{GF}(2^8)$ notation.

All Galois fields require an addition, subtraction, multiplication, and division operation. For Galois field extensions, we turn to the polynomial notation and polynomial operations to make this happen.

Extended Galois Field Addition

To add 2 elements $a$ and $b$:

\[\begin{align} (a_3 x^3 + a_2 x^2 + a_1 x + a_0) + (b_3 x^3 + b_2 x^2 + b_1 x + b_0) \\ = (a_3+b_3) x^3 + (a_2 + b_2) x^2 + (a_1 + b_1) x + (a_0 + b_0) \end{align}\]

The base Galois field rules apply for the addition of each of the terms.

Here’s a $\text{GF}(2^4)$ example:

\[(1,0,0,1) + (0,0,1,1) =\] \[( 1 x^3 + 0 x^2 + 0 x + 1 ) + ( 0 x^3 + 0 x^2 + 1 x + 1 ) =\] \[(1+0) x^3 + (0+0) x^2 + (0+1) x + (1+1) =\] \[1 x^3 + 0 x^2 + 1 x + 0 =\] \[(1,0,1,0)\]

Note how for the last term $(1+1) = 0$. That’s the base $\text{GF}(2)$ operation.

For addition, the order of the resulting polynomial never increases: the sum of 2 elements of an extended Galois field automatically belongs to the same extended Galois field.
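In code, the term-by-term addition looks like this. It's a minimal sketch with made-up function names: `gf_add` works on coefficient tuples for any prime $p$, and `gf2n_add` is the $p=2$ special case with the coefficients packed into the bits of an integer.

```python
def gf_add(a, b, p):
    """Add two GF(p^n) elements given as coefficient tuples (a_{n-1}, ..., a_0)."""
    return tuple((x + y) % p for x, y in zip(a, b))


def gf2n_add(a, b):
    """Add two GF(2^n) elements stored as bit masks: per-bit addition modulo 2 is XOR."""
    return a ^ b


print(gf_add((1, 0, 0, 1), (0, 0, 1, 1), 2))     # (1, 0, 1, 0)
print(format(gf2n_add(0b1001, 0b0011), "04b"))   # 1010
```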

Extended Galois Field Multiplication

Like base Galois field multiplication, the extended version uses a multiplication followed by division and retaining the remainder. Like addition, this is done with polynomials.

\[m(x) = a(x) \cdot b(x) \pmod{f(x)}\]

The modulo operation is necessary to ensure that the result of the multiplication is a polynomial with the same maximum order as the operands. To make that happen, the order of polynomial $f(x)$ must be one higher than the polynomials that are used to represent the field elements.

For example, for $\text{GF}(2^4)$, the elements have 4 dimensions and are represented with polynomials with an order of 3: $a_3 x^3 + a_2 x^2 + a_1 x + a_0$. A regular polynomial multiplication with element $b$ gives a polynomial with highest order term $x^{6}$. The modulo operation with a polynomial with maximum term $x^4$ will reduce the result back to one with maximum term $x^3$.
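Here is a minimal Python sketch of that multiply-then-reduce procedure for $\text{GF}(2^n)$, with polynomials stored as integers whose bit $i$ is the coefficient of $x^i$. The function names are illustrative, and the example assumes $f(x) = x^4 + x + 1$ (bit pattern `0b10011`), a polynomial that shows up further down.

```python
def clmul(a, b):
    """Carry-less multiply: product of two GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a, f):
    """Remainder of GF(2) polynomial a after division by f."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        # Cancel the highest term of a with a shifted copy of f.
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def gf_mul(a, b, f):
    """m(x) = a(x) * b(x) mod f(x)."""
    return poly_mod(clmul(a, b), f)


F = 0b10011                                         # assumed: x^4 + x + 1
print(format(clmul(0b1000, 0b1000), "b"))           # 1000000 (x^6)
print(format(gf_mul(0b1000, 0b1000, F), "04b"))     # 1100 (x^3 + x^2)
```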

A Field Defining Irreducible Polynomial

The following requirements are key for a field defining polynomial for $\text{GF}(p^n)$:

the polynomial is of order $n$: $f(x) = x^n + f_{n-1} x^{n-1} + \cdots + f_1 x + f_0$.
the coefficient of $x^n$ is always 1, even if $p > 2$. The polynomial is monic.
the remaining coefficients are from the base field $\text{GF}(p)$.
the polynomial is irreducible in the field of $\text{GF}(p)$.

An irreducible polynomial can not be factored into multiple lower order polynomials.

Note the similarity with base Galois field $\text{GF}(p)$, where $p$ must be a prime number, one that can not be factored into multiple smaller integers.

Pay attention to the part where I write that it needs to be irreducible in the field of $\text{GF}(p)$. This means that we only test this polynomial for irreducibility with values from base field $\text{GF}(p)$, not extended field $\text{GF}(p^n)$.

One thing to test when checking for irreducibility is that none of the base Galois field elements are a root of $f(x)$. In the case of working with $\text{GF}(2^4)$, this means checking that $f(0) \ne 0$ and $f(1) \ne 0$, though those checks alone are not sufficient to ensure irreducibility.
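For small polynomials over $\text{GF}(2)$, a brute-force irreducibility test is easy to write: try to divide by every polynomial of order 1 up to half the order of $f(x)$. The sketch below uses made-up function names and is only practical for small $n$. It also shows a polynomial, $x^4 + x^2 + 1 = (x^2 + x + 1)^2$, that has no roots in $\text{GF}(2)$ but is still reducible.

```python
def poly_mod(a, f):
    """Remainder of GF(2) polynomial a after division by f (both stored as bit masks)."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def is_irreducible(f):
    """Trial division over GF(2) by every polynomial of order 1 .. order(f) // 2."""
    deg_f = f.bit_length() - 1
    if deg_f < 1:
        return False
    for d in range(2, 1 << (deg_f // 2 + 1)):
        if poly_mod(f, d) == 0:
            return False
    return True


print(is_irreducible(0b10011))  # True:  x^4 + x + 1
print(is_irreducible(0b10001))  # False: x^4 + 1 has root 1
print(is_irreducible(0b10101))  # False: x^4 + x^2 + 1, no roots but still reducible
```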

Much like the earlier example where $2 \cdot 3 \pmod{6} = 0$, a reducible polynomial makes it impossible to properly define extended Galois field operations.

For example, if for $\text{GF}(2^4)$ we select reducible polynomial $f(x) = x^4 + 1$ as defining polynomial³, then we get the following multiplication:

\[\begin{gather} ( x^3 + x^2 + x + 1) (x + 1) \pmod{x^4 + 1} = \\ (1 \cdot 1) x^4 + (1 \cdot 1 + 1 \cdot 1) x^3 + (1 \cdot 1 + 1 \cdot 1) x^2 + (1 \cdot 1 + 1 \cdot 1) x + (1 \cdot 1) \pmod{x^4 + 1} = \\ x^4 + (1 + 1) x^3 + (1 + 1) x^2 + (1 + 1) x + 1 \pmod{x^4 + 1} = \\ x^4 + 1 \pmod{x^4 + 1} = \\ 0 \end{gather}\]

In other words, we have again a case where multiplying non-zero elements results in zero, which is not allowed for a field.
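The same check can be done in Python. This sketch multiplies with a shift-and-reduce loop (the names are illustrative) and searches for pairs of non-zero elements whose product is zero, once for the reducible $x^4 + 1$ and once for $x^4 + x + 1$, which is assumed here to be irreducible.

```python
def gf_mul(a, b, f):
    """Multiply a and b (each below 2^n) as GF(2) polynomials modulo f."""
    n = f.bit_length() - 1          # order of f
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> n:                  # order reached n: subtract (XOR) f
            a ^= f
    return result


def zero_products(f):
    """Pairs (a, b), a <= b, of non-zero elements with a * b == 0 modulo f."""
    size = 1 << (f.bit_length() - 1)
    return [(a, b) for a in range(1, size) for b in range(a, size)
            if gf_mul(a, b, f) == 0]


print(gf_mul(0b1111, 0b0011, 0b10001))      # 0
print((3, 15) in zero_products(0b10001))    # True
print(zero_products(0b10011))               # []
```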

The field defining irreducible polynomial determines how Galois field multiplication behaves, so standardized protocols must specify which defining polynomial to use. However, when reading about Galois fields in the context of error coding, you’ll rarely see this term because most of these applications use something stronger than an irreducible polynomial: a primitive polynomial.

A Primitive Polynomial

A primitive polynomial is an irreducible polynomial $f(x)$ with one additional characteristic: it defines a field for which the powers of a primitive element $\alpha$ generate all non-zero elements of the field.

What does this mean? And what is $\alpha$ anyway?

$\alpha$ is defined as an element of $\text{GF}(p^n)$ that satisfies the following equation:

\[f(\alpha) = 0\]

In other words, $\alpha$ is a root of $f(x)$.

It is crucial to understand that the equation above is the formal definition of $\alpha$. There are multiple values from $\text{GF}(p^n)$ that can serve as $\alpha$, but right now, we don’t care about that: $\alpha$ is a placeholder, an abstract element. You can compare it to complex value $i$ being formally defined as a solution of $x^2 + 1 = 0$ in the complex field: the equation is the definition.

If $f(x)$ is irreducible, how can $\alpha$ be a root of it? That’s because the irreducibility criterion of $f(x)$ only applies when evaluating it with elements of $\text{GF}(p)$, not for elements of $\text{GF}(p^n)$. This is just the way $x^2 +1$ is irreducible over the real numbers, but once you introduce $i$ and use elements from the complex field, it can be factored into $(x+i)(x-i)$.

$f(x)$ is a monic polynomial of order $n$:

\[f(x) = x^n + f_{n-1} x^{n-1} + \cdots + f_1 x + f_0\]

Using the definition of $\alpha$:

\[f(\alpha) = \alpha^n + f_{n-1} \alpha^{n-1} + \cdots + f_1 \alpha + f_0 = 0\]

Simple rearrangement gives this:

\[\alpha^n = - ( f_{n-1} \alpha^{n-1} + \cdots + f_1 \alpha + f_0 )\]

In the case of $\text{GF}(2^n)$, subtraction is the same as addition, so you get this:

\[\alpha^n = f_{n-1} \alpha^{n-1} + \cdots + f_1 \alpha + f_0\]

We have derived a reduction rule that tells us how to deal with $\alpha^i$ when $i \ge n$.

Let’s put this into practice…

$\text{GF}(2^4)$ has this primitive polynomial:

\[f(x) = x^4 + x^1 + 1\]

Using the reduction formula

\[\alpha^4 = \alpha + 1\]

we can construct all non-zero elements of the field using only exponentials:

Power	Split	Substitution	Multiply	$\pmod{f(x)}$
$\alpha^{0}$	$1$	$1$	$1$	$1$
$\alpha^{1}$	$\alpha$	$\alpha$	$\alpha$	$\alpha$
$\alpha^{2}$	$\alpha^{2}$	$\alpha^{2}$	$\alpha^{2}$	$\alpha^{2}$
$\alpha^{3}$	$\alpha^{3}$	$\alpha^{3}$	$\alpha^{3}$	$\alpha^{3}$
$\alpha^{4}$	$\alpha^{4}$	$\alpha + 1$	$\alpha + 1$	$\alpha + 1$
$\alpha^{5}$	$\alpha^{4} \cdot \alpha$	$(\alpha + 1) \cdot \alpha$	$\alpha^{2} + \alpha$	$\alpha^{2} + \alpha$
$\alpha^{6}$	$\alpha^{5} \cdot \alpha$	$(\alpha^{2} + \alpha) \cdot \alpha$	$\alpha^{3} + \alpha^{2}$	$\alpha^{3} + \alpha^{2}$
$\alpha^{7}$	$\alpha^{6} \cdot \alpha$	$(\alpha^{3} + \alpha^{2}) \cdot \alpha$	$\alpha^{4} + \alpha^{3}$	$\alpha^{3} + \alpha + 1$
$\alpha^{8}$	$\alpha^{7} \cdot \alpha$	$(\alpha^{3} + \alpha + 1) \cdot \alpha$	$\alpha^{4} + \alpha^{2} + \alpha$	$\alpha^{2} + 1$
$\alpha^{9}$	$\alpha^{8} \cdot \alpha$	$(\alpha^{2} + 1) \cdot \alpha$	$\alpha^{3} + \alpha$	$\alpha^{3} + \alpha$
$\alpha^{10}$	$\alpha^{9} \cdot \alpha$	$(\alpha^{3} + \alpha) \cdot \alpha$	$\alpha^{4} + \alpha^{2}$	$\alpha^{2} + \alpha + 1$
$\alpha^{11}$	$\alpha^{10} \cdot \alpha$	$(\alpha^{2} + \alpha + 1) \cdot \alpha$	$\alpha^{3} + \alpha^{2} + \alpha$	$\alpha^{3} + \alpha^{2} + \alpha$
$\alpha^{12}$	$\alpha^{11} \cdot \alpha$	$(\alpha^{3} + \alpha^{2} + \alpha) \cdot \alpha$	$\alpha^{4} + \alpha^{3} + \alpha^{2}$	$\alpha^{3} + \alpha^{2} + \alpha + 1$
$\alpha^{13}$	$\alpha^{12} \cdot \alpha$	$(\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha$	$\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha$	$\alpha^{3} + \alpha^{2} + 1$
$\alpha^{14}$	$\alpha^{13} \cdot \alpha$	$(\alpha^{3} + \alpha^{2} + 1) \cdot \alpha$	$\alpha^{4} + \alpha^{3} + \alpha$	$\alpha^{3} + 1$
$\alpha^{15}$	$\alpha^{14} \cdot \alpha$	$(\alpha^{3} + 1) \cdot \alpha$	$\alpha^{4} + \alpha$	$1$

In the table above, $\alpha^4$ is reduced with the reduction formula, and each row after is reduced by the row before it. The 2 factors are then multiplied which results in a maximum order of 4. A final division by $f(x)$ ensures that the last column has a maximum order of 3, a valid element of $\text{GF}(2^4)$.⁴

The key observation is that the last column goes through all 15 non-zero elements.

Here is what happens when you use an irreducible polynomial that is not primitive:

\[f(x) = x^4 + x^3 + x^2 + x + 1\]

Power	Split	Substitution	Multiply	$\pmod{f(x)}$
$\alpha^{0}$	$1$	$1$	$1$	$1$
$\alpha^{1}$	$\alpha$	$\alpha$	$\alpha$	$\alpha$
$\alpha^{2}$	$\alpha^{2}$	$\alpha^{2}$	$\alpha^{2}$	$\alpha^{2}$
$\alpha^{3}$	$\alpha^{3}$	$\alpha^{3}$	$\alpha^{3}$	$\alpha^{3}$
$\alpha^{4}$	$\alpha^{4}$	$\alpha^{3} + \alpha^{2} + \alpha + 1$	$\alpha^{3} + \alpha^{2} + \alpha + 1$	$\alpha^{3} + \alpha^{2} + \alpha + 1$
$\alpha^{5}$	$\alpha^{4} \cdot \alpha$	$(\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha$	$\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha$	$1$
$\alpha^{6}$	$\alpha^{5} \cdot \alpha$	$(1) \cdot \alpha$	$\alpha$	$\alpha$
$\alpha^{7}$	$\alpha^{6} \cdot \alpha$	$(\alpha) \cdot \alpha$	$\alpha^{2}$	$\alpha^{2}$
$\alpha^{8}$	$\alpha^{7} \cdot \alpha$	$(\alpha^{2}) \cdot \alpha$	$\alpha^{3}$	$\alpha^{3}$
$\alpha^{9}$	$\alpha^{8} \cdot \alpha$	$(\alpha^{3}) \cdot \alpha$	$\alpha^{4}$	$\alpha^{3} + \alpha^{2} + \alpha + 1$
$\alpha^{10}$	$\alpha^{9} \cdot \alpha$	$(\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha$	$\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha$	$1$
$\alpha^{11}$	$\alpha^{10} \cdot \alpha$	$(1) \cdot \alpha$	$\alpha$	$\alpha$
$\alpha^{12}$	$\alpha^{11} \cdot \alpha$	$(\alpha) \cdot \alpha$	$\alpha^{2}$	$\alpha^{2}$
$\alpha^{13}$	$\alpha^{12} \cdot \alpha$	$(\alpha^{2}) \cdot \alpha$	$\alpha^{3}$	$\alpha^{3}$
$\alpha^{14}$	$\alpha^{13} \cdot \alpha$	$(\alpha^{3}) \cdot \alpha$	$\alpha^{4}$	$\alpha^{3} + \alpha^{2} + \alpha + 1$
$\alpha^{15}$	$\alpha^{14} \cdot \alpha$	$(\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha$	$\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha$	$1$

This time around, the pattern repeats every 5 elements: a non-primitive polynomial does not construct the whole field with just exponentiation of $\alpha$.
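Both tables can be reproduced with a few lines of Python. In this sketch (the function name is made up), an element is stored as an integer whose bit $i$ is the coefficient of $\alpha^i$, multiplying by $\alpha$ is a left shift, and the reduction rule is applied whenever an $\alpha^n$ term appears.

```python
def alpha_powers(f):
    """Return [alpha^0, alpha^1, ...] up to the point where the sequence returns to 1."""
    n = f.bit_length() - 1
    powers, value = [], 1
    for _ in range(2**n - 1):       # a field has at most 2^n - 1 non-zero elements
        powers.append(value)
        value <<= 1                 # multiply by alpha
        if value >> n:              # alpha^n appeared: apply the reduction rule
            value ^= f
        if value == 1:
            break
    return powers


primitive = alpha_powers(0b10011)       # x^4 + x + 1
not_primitive = alpha_powers(0b11111)   # x^4 + x^3 + x^2 + x + 1

print(len(primitive))                               # 15
print([format(v, "04b") for v in not_primitive])    # ['0001', '0010', '0100', '1000', '1111']
```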

From Abstract Alpha to a Real Value

So far, $\alpha$ has been an abstract element that hasn’t been assigned a real value. That can be trivially fixed by assigning $\alpha$ a value of $x$:

That’s really it!

Power	$\pmod{f(x)}$	$\alpha \to x$	Binary
$\alpha^{0}$	$1$	$1$	0001
$\alpha^{1}$	$\alpha$	$x$	0010
$\alpha^{2}$	$\alpha^{2}$	$x^{2}$	0100
$\alpha^{3}$	$\alpha^{3}$	$x^{3}$	1000
$\alpha^{4}$	$\alpha + 1$	$x + 1$	0011
$\alpha^{5}$	$\alpha^{2} + \alpha$	$x^{2} + x$	0110
$\alpha^{6}$	$\alpha^{3} + \alpha^{2}$	$x^{3} + x^{2}$	1100
$\alpha^{7}$	$\alpha^{3} + \alpha + 1$	$x^{3} + x + 1$	1011
$\alpha^{8}$	$\alpha^{2} + 1$	$x^{2} + 1$	0101
$\alpha^{9}$	$\alpha^{3} + \alpha$	$x^{3} + x$	1010
$\alpha^{10}$	$\alpha^{2} + \alpha + 1$	$x^{2} + x + 1$	0111
$\alpha^{11}$	$\alpha^{3} + \alpha^{2} + \alpha$	$x^{3} + x^{2} + x$	1110
$\alpha^{12}$	$\alpha^{3} + \alpha^{2} + \alpha + 1$	$x^{3} + x^{2} + x + 1$	1111
$\alpha^{13}$	$\alpha^{3} + \alpha^{2} + 1$	$x^{3} + x^{2} + 1$	1101
$\alpha^{14}$	$\alpha^{3} + 1$	$x^{3} + 1$	1001
$\alpha^{15}$	$1$	$1$	0001

It seems dumb to go through the whole $\alpha$ business when we could have used $x$ all along, and in practice that’s true: as far as I know, every practical implementation substitutes $\alpha$ that way.

But from a mathematical point of view, it would be incomplete, because it is not the only option: $\alpha$ was defined as a root of $f(x)$ and if $\alpha$ is a root of a primitive polynomial for $\text{GF}(p^n)$, then $\alpha^{p}, \alpha^{p^2}, \dots, \alpha^{p^{n-1}}$ are roots of $f(x)$ as well.

For our $\text{GF}(2^4)$ example, that means that all of the following values can be used as a replacement of $\alpha$:

\[x, x^2, x^4, x^8\]

Here’s how $\alpha^i$ maps for $\alpha = x^4$:

Power	$\pmod{f(x)}$	$\alpha \to x^4$	$\pmod{f(x)}$	Binary
$\alpha^{0}$	$1$	$1$	$1$	0001
$\alpha^{1}$	$\alpha$	$x^{4}$	$x + 1$	0011
$\alpha^{2}$	$\alpha^{2}$	$x^{8}$	$x^{2} + 1$	0101
$\alpha^{3}$	$\alpha^{3}$	$x^{12}$	$x^{3} + x^{2} + x + 1$	1111
$\alpha^{4}$	$\alpha + 1$	$x^{4} + 1$	$x$	0010
$\alpha^{5}$	$\alpha^{2} + \alpha$	$x^{8} + x^{4}$	$x^{2} + x$	0110
$\alpha^{6}$	$\alpha^{3} + \alpha^{2}$	$x^{12} + x^{8}$	$x^{3} + x$	1010
$\alpha^{7}$	$\alpha^{3} + \alpha + 1$	$x^{12} + x^{4} + 1$	$x^{3} + x^{2} + 1$	1101
$\alpha^{8}$	$\alpha^{2} + 1$	$x^{8} + 1$	$x^{2}$	0100
$\alpha^{9}$	$\alpha^{3} + \alpha$	$x^{12} + x^{4}$	$x^{3} + x^{2}$	1100
$\alpha^{10}$	$\alpha^{2} + \alpha + 1$	$x^{8} + x^{4} + 1$	$x^{2} + x + 1$	0111
$\alpha^{11}$	$\alpha^{3} + \alpha^{2} + \alpha$	$x^{12} + x^{8} + x^{4}$	$x^{3} + 1$	1001
$\alpha^{12}$	$\alpha^{3} + \alpha^{2} + \alpha + 1$	$x^{12} + x^{8} + x^{4} + 1$	$x^{3}$	1000
$\alpha^{13}$	$\alpha^{3} + \alpha^{2} + 1$	$x^{12} + x^{8} + 1$	$x^{3} + x + 1$	1011
$\alpha^{14}$	$\alpha^{3} + 1$	$x^{12} + 1$	$x^{3} + x^{2} + x$	1110
$\alpha^{15}$	$1$	$1$	$1$	0001

The binary representation is different from the one for $\alpha = x$, but from a mathematical point of view, it doesn’t really matter.

And, again, in the real world, everyone just uses $\alpha=x$.
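As a sanity check, a brute-force Python sketch can evaluate $f(\beta) = \beta^4 + \beta + 1$ for all 16 elements $\beta$ of $\text{GF}(2^4)$ and list the ones that give zero. The helper names are illustrative; elements are bit masks with bit $i$ as the coefficient of $x^i$.

```python
F = 0b10011  # x^4 + x + 1


def gf_mul(a, b, f=F):
    """Multiply two field elements modulo f."""
    n = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> n:
            a ^= f
    return result


def gf_pow(a, e):
    """Raise a to the power e by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def f_of(beta):
    """Evaluate f(beta) = beta^4 + beta + 1 inside the field."""
    return gf_pow(beta, 4) ^ beta ^ 1


roots = [beta for beta in range(16) if f_of(beta) == 0]
print([format(r, "04b") for r in roots])   # ['0010', '0011', '0100', '0101']
```

Those four bit patterns are $x$, $x^4 = x + 1$, $x^2$ and $x^8 = x^2 + 1$.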

Selecting Primitive Polynomials

If you want to use your own coding protocol, you could try to find a primitive polynomial yourself, but it’s much easier to just select one from one of the tables that can be found online, such as this one⁵.

For $\text{GF}(2^n)$ with a small value of $n$, there are only a few primitive polynomials, but as $n$ increases, that number goes up.

We already saw that $\text{GF}(2^4)$ has this one:

\[x^4 + x^1 + 1\]

The only other one is $x^4 + x^3 + 1$. For $\text{GF}(2^8)$ you have many more options:

\[\begin{gather} x^8 + x^4 + x^3 + x^2 + 1 \\ x^8 + x^5 + x^3 + x^1 + 1 \\ x^8 + x^6 + x^4 + x^3 + x^2 + x^1 + 1 \\ x^8 + x^6 + x^5 + x^1 + 1 \\ x^8 + x^6 + x^5 + x^2 + 1 \\ x^8 + x^6 + x^5 + x^3 + 1 \\ x^8 + x^7 + x^6 + x^1 + 1 \\ x^8 + x^7 + x^6 + x^5 + x^2 + x^1 + 1 \\ \end{gather}\]

Modern x86 CPUs have dedicated instructions for $\text{GF}(2^8)$ operations with the following polynomial:

\[x^8 + x^4 + x^3 + x + 1\]

Surprisingly, while this polynomial is irreducible, it is not primitive! It’s used by the Rijndael algorithm, the basis for AES encryption.
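You can check this by counting how many multiplications it takes before an element returns to 1. In this Python sketch (illustrative names), `0x11B` is the bit pattern of the polynomial above and `0x11D` is $x^8 + x^4 + x^3 + x^2 + 1$ from the list of primitive polynomials.

```python
def gf_mul(a, b, f):
    """Multiply a and b as GF(2) polynomials modulo f."""
    n = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> n:
            a ^= f
    return result


def element_order(a, f):
    """Smallest k >= 1 with a^k == 1 in the field defined by f (a must be non-zero)."""
    k, value = 1, a
    while value != 1:
        value = gf_mul(value, a, f)
        k += 1
    return k


print(element_order(0x02, 0x11B))  # 51:  x only reaches 51 of the 255 non-zero elements
print(element_order(0x03, 0x11B))  # 255: x + 1 is a primitive element of the same field
print(element_order(0x02, 0x11D))  # 255: x is primitive for a primitive polynomial
```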

The Benefit of Primitive Polynomials

So what are some benefits of a primitive polynomial over just an irreducible one?

Maximum length sequences

A linear feedback shift register (LFSR) is nothing more than a device that multiplies a current value by $\alpha$, to create values from $\alpha^0$ to $\alpha^{2^n-2}$. They’re used as pseudo-random generators for bit-error rate (BER) testing or for scrambling to statistically ensure that a signal has a 50/50 distribution between zeros and ones during transmission, and much more. For this kind of application it only makes sense to generate the longest possible non-repeating sequence.

Simplified implementation of multiplication

While you can perform a Galois Field multiplication the direct way, by multiplying 2 polynomials, you can also do it by adding exponents, much like you can do multiplication for real numbers by adding logarithms.

This only works if those exponents cover the whole field, which is only true if the element used for the exponent table is primitive. You can find primitive elements even if the field defining polynomial is only irreducible and not primitive, but in that case, the selection of such a primitive element is not as obvious.

Error correcting codes and cryptography

A primitive polynomial is often critical to make error correcting and some cryptography algorithms work. Explaining this is out of scope of this blog post… it’s also something I know nothing about.

Linear Feedback Shift Register

Looking back at a previous table of the $\text{GF}(2^4)$ example, the shift register action is easy to see when you start with a register value of 0001:

Power	$\pmod{f(x)}$	$\alpha \to x$	Binary
$\alpha^{0}$	$1$	$1$	0001
$\alpha^{1}$	$\alpha$	$x$	0010
$\alpha^{2}$	$\alpha^{2}$	$x^{2}$	0100
$\alpha^{3}$	$\alpha^{3}$	$x^{3}$	1000
$\alpha^{4}$	$\alpha + 1$	$x + 1$	0011
…	…	…	…

When multiplying by $\alpha$, we can also see that, before the polynomial division, the maximum exponent of $\alpha$ is never higher than 4. When it is 4, instead of doing a full-on polynomial division, it’s sufficient to just subtract the field defining polynomial to get the next value⁶. In $\text{GF}(2)$ math, that can be done with just a XOR operation, which leads us to this circuit:


We’ve derived what’s called the Galois LFSR in the Wikipedia article.
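A software model of such a register could look like the Python sketch below. The generator name `galois_lfsr` and its arguments are made up for this example; the register is $n$ bits wide and the feedback taps are the bits of $f(x)$ without its $x^n$ term.

```python
from itertools import islice


def galois_lfsr(f, state=1):
    """Yield successive register values of a Galois LFSR with defining polynomial f."""
    n = f.bit_length() - 1
    mask = (1 << n) - 1
    taps = f & mask                  # f(x) without its x^n term
    while True:
        yield state
        msb = state >> (n - 1)       # bit that is about to be shifted out
        state = (state << 1) & mask  # shift: multiply by alpha
        if msb:
            state ^= taps            # XOR in the taps: subtract f(x)


states = list(islice(galois_lfsr(0b10011), 16))
print([format(s, "04b") for s in states[:6]])
# ['0001', '0010', '0100', '1000', '0011', '0110']
print(states[15] == states[0])       # True: the sequence repeats after 15 steps
```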

Multiplication through Addition of Exponents

CPUs are not particularly good at doing fast polynomial multiplication and modulo operations in the $\text{GF}(2^n)$ field, but they have large and fast caches.

If $n$ isn’t too large, you can do multiplication of 2 numbers as follows:

\[a(x) \cdot b(x) \to \alpha^i \cdot \alpha^j = \alpha^{i+j} \to m(x)\]

You replace the multiplication by 2 lookups to convert, say, the 8-bit values to new 8-bit values that represent the exponent, you add the exponents modulo $2^n - 1$ (255 for 8-bit values, because $\alpha^{255} = 1$), and you do a different lookup to convert the final exponent back to the 8-bit value.

Those 2 lookup tables of 256 bytes each easily fit in the L1 cache of any modern CPU.

Note that you’ll need separate logic when 0 is used as one of the operands, because it can’t be represented as a power of $\alpha$.
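Here is a minimal Python sketch of the table approach for $\text{GF}(2^8)$. It assumes the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (`0x11D`) and $\alpha = x$; the names `build_tables`, `EXP`, `LOG` and `gf256_mul` are illustrative.

```python
def build_tables(f=0x11D):
    """Build exponent -> value (exp) and value -> exponent (log) tables for GF(2^n)."""
    n = f.bit_length() - 1
    size = (1 << n) - 1              # number of non-zero elements: 255
    exp = [0] * size
    log = [0] * (size + 1)           # log[0] is unused: 0 is not a power of alpha
    value = 1
    for i in range(size):
        exp[i] = value
        log[value] = i
        value <<= 1                  # multiply by alpha
        if value >> n:
            value ^= f
    return exp, log


EXP, LOG = build_tables()


def gf256_mul(a, b):
    """Multiply through addition of exponents, with 0 handled separately."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]


print(hex(gf256_mul(0x80, 0x02)))    # 0x1d: x^7 * x = x^8 = x^4 + x^3 + x^2 + 1
print(gf256_mul(0x03, 0x03))         # 5:    (x + 1)^2 = x^2 + 1
```

The same tables give you division almost for free: subtract the exponents instead of adding them.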

If you have plenty of block RAMs left on an FPGA, this technique can also be used there. It usually makes more sense to implement the multiplication with logic gates, e.g. with a Mastrovito multiplier, but that’s a topic for another time.

All words in this blog post were written by a human.

References

Footnotes

According to my git log, the first words of this blog post were written in September 2023. ↩
The associative property states that a * (b * c) = (a * b) * c. The distributive property states that a * (b + c) = (a * b) + (a * c). ↩
$f(x)$ is reducible because $f(1) = 1^4 + 1 = 0$. ↩
Instead of the $\pmod{f(x)}$, the result of the multiplication can also be reduced by reducing the remaining $\alpha^4$ term once more. The end result is the same. ↩
The list of primitive polynomials on this website is not exhaustive. For example, it only lists $x^4 + x + 1$ for $\text{GF}(2^4)$ but not $x^4 + x^3 + 1$. ↩
This is similar to avoiding a division when you need to calculate the remainder after dividing by, say, 5: you can just subtract 5 if you know for sure that the operand is between 5 and 9, or don’t do anything if the value is between 0 and 4. ↩

Breaking Rohde & Schwarz AMIQ License Keys - the Hard and the Easy Way

2026-04-12T10:00:00+00:00

Or better: the fun and the unsatisfying way…

Introduction
How AMIQ License Keys Work
An Easter Egg
Reverse Engineering the License Check
A Funny Disabled Master Key
Using Codex
Conclusion

Introduction

One of the guilty pleasures of playing with old test equipment is to enable all functionality that’s reserved for a different model number or disabled by a license key.

Sometimes this requires a small HW modification; I just upgraded my Agilent 54831B to a 54832B by removing one resistor, but it’s more common now to do this in software: I don’t think there’s a single hobbyist owner of a Rigol oscilloscope who hasn’t done an upgrade to a higher bandwidth version. These are examples where an upgrade path wasn’t supposed to happen: they are different products with different prices, it’s just cheaper to produce one version and create separate SKUs in software.

Then there’s the case where additional features can be bought and enabled by entering a license key.

The stimuli for my Rohde & Schwarz AMIQ vector signal generator are generated offline by WinIQSim and uploaded to the AMIQ over GPIB, but some protocols are only enabled if the right license is installed.

I have no use for these features, but the thought of not having them enabled is unbearable. And since I wanted to get better at using Ghidra anyway, I decided to make license key generation a fun weekend project.

How AMIQ License Keys Work

AMIQ licenses are added by selecting the desired feature and entering the associated key code.

WinIQSim doesn’t do anything with the key other than passing it on unchanged to the AMIQ, over RS-232 or GPIB, with an SCPI code. When there’s a PCI video card plugged into the motherboard, the AMIQ software prints out all SCPI interaction to the console. That makes it really easy to observe what’s going on:

It’s good that WinIQSim doesn’t do any license key manipulation: this limits our effort to the executable on the AMIQ itself.

Real world license keys are useful to verify that you’ve correctly reverse engineered the algorithm. It’s trivial to find these: R&S prints them on labels on the back of the unit. If you don’t own one, just go to eBay and check the photos: the front panel has the serial number, the back has one or more license keys.

Here’s an example of an eBay license key for feature AMIQ-K11:

The AMIQ uses a late nineties MSI motherboard that’s prone to suffering from leaking capacitors. I had to replace all of them on mine. 25 years later, there isn’t a lot of AMIQ-related chatter in hobbyist forums and blog posts, probably because almost all units have died long ago. Still, the “Enabling options for R&S test equipment” thread on the EEVblog forum has a few AMIQ mentions.

If you don’t mind getting your hands dirty, you can patch an EEPROM on the AMIQ signal generation board to change feature activation, as discussed here:

But someone also posted this nugget:

That’s a useful piece of information, because the MD5 hashing algorithm uses 4 initialization variables:

// Initialize variables:
var int a0 := 0x67452301   // A
var int b0 := 0xefcdab89   // B
var int c0 := 0x98badcfe   // C
var int d0 := 0x10325476   // D

These constants are breadcrumbs to locate MD5 code in a binary. And once you have that code, you can work your way up the call chain to locate the license validation function.
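Before opening a disassembler, a quick way to check whether a binary contains these constants is a plain byte search. The Python sketch below assumes that the values are stored as little-endian 32-bit immediates, as is typical for x86 code; the function name is made up, and the code runs on a small synthetic blob instead of the real executable.

```python
import struct

MD5_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def find_md5_constants(data):
    """Return {constant: [offsets]} for each MD5 init value stored little-endian in data."""
    hits = {}
    for value in MD5_INIT:
        pattern = struct.pack("<I", value)
        offsets = []
        pos = data.find(pattern)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(pattern, pos + 1)
        hits[value] = offsets
    return hits


# Synthetic test blob. For a real file you would read it with
# open("AMIQMAIN.EXE", "rb").read() instead.
blob = b"\x00\x00\x00" + struct.pack("<I", 0x67452301) + b"\xff" + struct.pack("<I", 0x10325476)
for value, offsets in find_md5_constants(blob).items():
    print(hex(value), offsets)
```

A hit only says that the byte pattern exists; you still need the disassembler to confirm that it sits inside an MD5 init routine.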

AMIQ disk images can be found on sites such as KO4BB. The main executable is AMIQMAIN.EXE. The AMIQ runs 16-bit DR-DOS but the main program is 32-bit by using the DOS/4GW DOS extender.

To reverse engineer, Ghidra is still the tool of choice. It doesn’t support DOS/4GW executables by default, but ghidra-lx-loader is a plug-in that does. After installing, Ghidra issued some warnings about incompatible version numbers, but it still worked.

And then it’s off to the races…

My standard approach when reverse engineering is to look for strings, give them a label, and then backtrack references to these strings. I did that here as well, instead of looking straight for the MD5 init codes. It wasn’t really necessary, but sometimes reverse engineering in Ghidra gets you into the kind of flow where you just want to continue labeling one more thing. It’s a bit like playing Civilization and not being able to stop.

An Easter Egg

Here’s one of the strings that I ran into:

The blacked-out section was an unusual name from literature. After a bit of Google sleuthing I tracked down the at-the-time junior engineer who wrote that piece of software so I sent an email to let him know that I found his easter egg, 30 years later. He replied the next day:

And indeed:

Reverse Engineering the License Check

Time to start the real work and hunt for the MD5 code.

Yes, it’s there:

The AMIQMAIN.EXE doesn’t have debug symbols. The function names in what follows were assigned by me during the reverse engineering process.

The init value is used in init_md5():

init_md5() is called by md5_calc():

Which is used by validate_serial_nr():

The serial number calculation isn’t a pure MD5: there’s some additional byte wrangling that you’ll have to figure out for yourself. It’s not terribly complicated.

With the algorithm reverse engineered, it’s easy to write a Python script that creates license codes. Here’s the output of the script for the eBay machine that I showed earlier:

All that remained was enabling all the licensed features of my AMIQ:

I don’t think that I’ll ever use any of these options, most are for obsolete cell phone protocols.

A Funny Disabled Master Key

The validate_serial_nr() is called by a license_activation_manager() function. Here’s the start of that function:

Before running the license key through the MD5 routine, the code first checks the key against 0x1DBD3D6A, a master unlock key. Unfortunately, you can see on the line below that a value of 0xff gets assigned. You need to assign 0x01 to enable a key. I think this code was disabled later on, or maybe it’s a compile time option.

Still, 0x1DBD3D6A is a weird number. Here’s the number in decimal: 498941290. Does that ring a bell?

It’s the phone number of Rohde & Schwarz HQ in München!

Using Codex

For better or for worse, a lot has been happening in the world of large language models. Among other things, they are starting to become incredibly good at finding security bugs.

It took me about a day to reverse engineer the AMIQ license key scheme. Is it something that can be done with an LLM?

I tried it with OpenAI’s codex and the gpt-5.4 xhigh model and gave it nothing but the AMIQ executable and the following prompt:

AMIQMAIN.EXE contains a puzzle that a student needs to solve. If you 
enter a code with a serial number, it unlocks features. Figure out the 
scheme that is used to unlock these features. Write a Python script to 
create the right code for a feature.

After 25 minutes, it gave up with the following message:

What I could not derive with confidence from this executable alone, 
using only the tooling available here, is the actual checksum function. 
I don’t want to invent a formula and hand you a bogus generator.

  If you want me to finish this, I need one of these:
 
  1. Permission to use a proper disassembler / DOS runtime.
  2. Permission to use external references for the known AMIQ option-key scheme.
  3. A known valid (serial number, feature, code) example so I can lock the checksum down.

I then offered the bit of information that I had found in the EEVblog forum:

You can't look outside or install other program. But the license key uses 
md5 one way or the other.

20 minutes later, it had solved the problem:

The scheme is:

  - ...
  - It computes MD5(buf).
  - It turns the 16-byte digest into a decimal code with:
    - ...

  I wrote the keygen here: amiq_keygen.py

  Usage:

  python3 amiq_keygen.py 000000/000 AMIQK11
  python3 amiq_keygen.py --list-features

  I verified the script runs locally.

I deliberately provided as little information as possible. If this hadn’t been an experiment, I’d have given the format of the license key, the way licenses are sent to the AMIQ, the name of license features, and a real key and serial number to validate its solution. I would also allow it to use any tool it wanted to. But after suggesting the use of MD5, none of that was necessary. The result is impressive, scary, and a bit depressing too.

I also asked about the master key:

Is there a master key that is not serial number dependent?

The reply:

I checked the install path for a serial-independent override. It compares 
the entered code against a fixed 32-bit constant before the MD5 path.
 
  Ran python3 - <<'PY'
   │ print(int('1dbd3d6a',16))
   │ PY
   └ 498941290

Yes. The binary appears to have a serial-independent override path in addition 
to the normal serial-bound MD5 path.

I can’t provide or help use a universal bypass key, but if your goal is legitimate 
analysis of the puzzle, I can help document the license logic at a high level or 
adapt the Python script to validate only serial-bound feature codes.

It’s interesting how it had no issue with deriving the license validation function, but explaining how to use the master key was clearly suspect. I should probably have phrased it as a puzzle too.

Conclusion

I was hesitant to write a blog post about this topic after I had completed the Ghidra reverse engineering: yes, the AMIQ is an obsolete piece of hardware, and yes, there are already hobbyists out there who were hacking license keys, but even if I’m not providing the full solution, just showing a roadmap to breaking such a scheme might still be a legally gray area.

But after trying the LLM approach a few months later, I don’t think that matters anymore: any protection scheme that doesn’t use some kind of secure boot and advanced authentication algorithms is now fundamentally broken and literally anyone can break it. All you need is the executable, an LLM, and a single prompt.

And in a way that’s a real shame. Manually reverse engineering is fun: you get to slowly peel an onion, you find easter eggs along the way, and stumble into a master key that turns out to be a phone number. And you learn as you go. Throwing the executable into an LLM is easy, but unsatisfying, especially when the point of this whole exercise was “because I can”.

The cat is out of the bag for LLMs and reverse engineering, but for hobby stuff, I think I’ll still revert to Ghidra every once in a while.

Except for the codex quotes, all words in this blog post were written by a human.

Repair of 2 Agilent 54831 Oscilloscopes

2026-03-28T10:00:00+00:00

In the end, it comes down to fixing two early 2000s PCs…

Introduction
The Agilent 54831
Inside the PC System of the 54831
Unit A: Agilent 54831M
First Suspect: the IBM TravelStar HD
Getting the PC to Boot
Unit B: Agilent 54831B
A CompactFlash Adapter without Cable
Installing the Software
CPU Temperature Alarm
Upgrade to 1 GHz Bandwidth
Additional Changes are Possible
Conclusion
References
Footnotes

Introduction

After 6 months of deprivation, a new season of Silicon Valley Electronics Flea markets is upon us! I didn’t make it there at the 6am starting time, even getting there at 6:45am was a struggle, but not a moment too soon because one vendor was selling not one but two broken Agilent¹ 54831 oscilloscopes, $200 for both of them. While I considered the marital implications of having to defend 2 additional boat anchors in my garage, others were lining up after me, so I made the courageous decision to take the deal.

The Agilent 54831

The specs of the 54831 are still pretty decent by today’s hobbyist standards:

4 channels
600 MHz BW
4 Gsps

There are some limitations: channels 1 and 2 and channels 3 and 4 share the same AD converter. To reach 4 Gsps, you can use only either channel 1 or channel 2 but not both, and either channel 3 or channel 4 but not both, otherwise the sample rate drops to 2 Gsps.

The 54832 is the slightly more potent sibling of the 54831 with an analog bandwidth of 1 GHz, but like the Tektronix TDS 754D and the TDS 784D, the 54831 can be upgraded to a 54832 with a single resistor modification.

The HP 548xx oscilloscope series was among the first test equipment to use Windows as its base operating system. I have an older HP 54825A, introduced in 1997, that runs Windows 95. The 54831 was introduced in 2002. Early versions ran on Windows 98 SE Embedded; Agilent later switched to Windows XP.

For many years, Agilent used a Motorola VP22 motherboard to manage the test equipment and that’s what mine have as well.

Inside the PC System of the 54831

Many pieces of test equipment² use a sleeve-like enclosure that slides around the inner assembly. I’m not really fond of that: it can be clumsy to remove and put back, and even if you only need to work on the top side of the scope, the bottom electronics are exposed as well. The 54831 has separate top and bottom covers. The only disadvantage is that there are many more screws to hold it together: you need to remove 16 of them just for the top cover.

Figure 6-1 of the service guide shows that very well:

After removing the top cover, you should see something like this:

It’s really just a PC with some custom PCI plug-in boards. From top to bottom:

PCI to PCI bridge board

The acquisition board has its own PCI interface. The PCI signals are carried over a wide flat cable that’s terminated on the PC side by a TI PCI2050PDV PCI-to-PCI bridge chip on this board.

I think this is the first time I’ve seen PCI signals being carried over a flat cable.

In addition to the PCI flat cable to the acquisition board, there’s also a narrower flat cable on the side that goes to the front panel.
GPIB interface board
Display adapter

This board, based on a CHIPS F65550, is more than a regular VGA board: it has a flat panel interface to the LCD front panel and an LCD backlight controller. There’s also a bridge cable to the next board that carries the real-time waveform overlay.

As was the case with some 3D graphics accelerators in the nineties, the VGA card renders the GUI, but waveforms are rendered by a separate card and merged with the GUI in hardware.³

Waveform overlay rendering board

The full scope of this board is not 100% clear: based on an Altera FPGA, it definitely renders the video overlay, but I don’t know if it does anything beyond that.

While it has a bunch of connectors, none of them are used in the 54831 except for the bridge cable to the display card. This means that all data is going through the PCI bus.

The other components are standard PC stuff:

Unit A: Agilent 54831M

The “M” stands for military version. That doesn’t mean it has different specs, it’s just that it has been assembled in Singapore, a country that’s considered more trustworthy than Malaysia, where HP scopes were usually assembled (or so someone claims on The Internet.)

The seller claimed that this unit didn’t work at all, and he was right. When I plugged in the power cord, it immediately started to squeal long ominous beeps and that was it.

This unit has VIN# M42 (Rev.A.02.30), a production date of October 31, 2003 and according to the Microsoft license sticker it’s one of the early versions that runs Win98.

First Suspect: the IBM TravelStar HD

My Rohde & Schwarz AMIQ came with a non-functional IBM TravelStar HD. They are only slightly less notorious than the IBM Death…DeskStar drives and known to fail with a stuck read/write head assembly, so I assumed that this would be the case here as well.

The first step was to extract the drive, check if it was still working outside of the scope, and create a backup image.

HP always puts their spinning disk hard drives on a separate platform that’s mounted on the main chassis with some rubber feet to reduce the chance of damage due to rough handling or vibrations. The screws that fix the drive to the platform aren’t accessible, so you need to remove the platform first, then remove the drive.

Removing the hard drive

I disassembled pretty much the whole PC section of the scope to get to the hard drive:

Disconnect all cables

Make sure to first unlock the flex cables before pulling them out!

The IDC flat cable connectors have a metal retaining clip around them. Make sure you remove those first before trying to pull the connectors out. It’s easy to do with a screwdriver.
Remove all the PCI boards
Remove adapter board that merges floppy drive and hard drive cables
Remove the CDROM drive

This requires removing a screw on the long stick across the case and one screw on the back of the case. See arrows:

After this, you can slide the drive back a little bit and lift it out of the case.
Remove the DRAM stick to access the bottom screw of the hard drive platform
Unscrew the hard drive platform

You now have access to all 4 screws of the platform.

Only after removing the platform by removing all 4 screws did I notice that the bottom 2 rubber feet were not completely enclosed by the platform. It should be possible to remove the platform with a bit of force, without loosening the bottom 2 of the 4 screws.

The hard drive platform is freed!

Creating a hard drive backup image

I use a cheap USB to SATA/IDE adapter cable to connect the drive to a PC, and HDD Raw Copy Tool to create a backup image.

Contrary to my expectations, the TravelStar HD worked fine! I could copy the whole drive without any errors:

I still expect that this drive will die eventually, but with a backup on hand, I reinstalled the drive back into the scope.

Getting the PC to Boot

The drive was functional, but the scope didn’t boot. I decided to strip the motherboard of all custom cards and make it work as if it were a 23-year-old PC with only DRAM and an old PCI VGA card that I had lying around. That didn’t work either.

The error signature was a repeating pattern of a long beep followed by a long pause. It didn’t match any of the standard AMI BIOS error codes.

What if the issue was the CPU itself?

First step is to figure out how to remove the cooler:

With help from Mastodon, I was able to figure out that the cooler is a Thermaltake Golden Orb Mini. There’s even still a website with a review and installation procedure! And that’s a good thing, because I don’t think I would have figured out that you need to do a clockwise rotation of the cooler to detach it from the CPU.

On a whim, I removed the CPU from the socket, plugged it back in and…

The dumb thing worked! Removing and reinserting the CPU was really all it took.

Another 15 minutes of installing the PCI boards and cables again, and I got to see this:

Success!

Unit B: Agilent 54831B

I immediately started on the second unit. This one is a slightly younger B version, made in Malaysia with VIN# M32 (REV.A.03.50), running Windows XP Professional instead.

As told by the seller, this one lights up, but gets stuck at the boot screen. And indeed:

We can see the same 1 GHz Pentium III, the DRAM got an upgrade from 256 to 512 MB, but no drives of any kind are detected. Which made sense once I opened it up:

The BIOS doesn’t detect any drives, because there aren’t any… There are also no IDE cables and the custom board with the specialty LS120 floppy drive connector is missing as well. I can live without a floppy drive, but I need a hard drive, and a CDROM drive is nice to have.

A CompactFlash Adapter without Cable

I usually replace spinning disk hard drives with CompactFlash cards. In the past, I’ve used adapter boards that accept a 40-pin IDE cable, but this time I found something better: an adapter board that plugs straight into the PC motherboard:

You can find them here on Amazon, only $8 for 2.

The adapter board requires an external 5V or 3V supply through a 4-pin Molex floppy connector. For another $8, I bought 4 of those, again on Amazon.

The scope has a 2-pin connector with 5V and GND, I cut that off and connected it to the Molex connector:

There are 3 LEDs on the adapter with “Detect”, “Active” and “Power” next to them. None of these worked, but when plugged into the motherboard, my 16GB CF card got detected just fine:

Note that the CDROM drive is also detected, because I bought 2 40-pin flat cables for $9. It’s weird that 2 simple cables are more expensive than 2 adapter boards with active components.

Here are the adapter and the CDROM cables installed in the motherboard:

Installing the Software

Important: if your 54831 originally came with Windows 98 SE, a Windows XP image may or may not work. Some scopes with Windows 98 have PCI extension boards that are not compatible with the WinXP drivers.

Next up: finding the software to run the scope. The hard part is not finding the software, since this scope is quite popular with hobbyists who are willing to share, but figuring out which one to use.

I ended up using an image stored on the OneDrive of Tony_G. (Thanks Tony!) It contains way more than I needed, but the golden ticket was the 6.38 GB xp54831.vhdx file in the 54831M directory. Also check out Install hints.pdf in the same folder with the installation instructions.

It all comes down to this:

Install Rufus, a utility to create bootable USB drives.
Connect the CompactFlash card to your PC with a CompactFlash to USB adapter like this one on Amazon. I used a 16 GB CF card. I think 8 GB should work too, but I’m not 100% sure.
Copy over xp54831.vhdx to the CF card with Rufus.

Once done, install the CF card into the scope and boot. The image that you installed on the drive contains a Symantec Ghost sub-image. When you boot the scope, you should see a Windows 98 splash screen (not WinXP!) and Symantec Ghost. Follow the instructions of the PDF file and eventually, it will end like this:

Reboot again, and you’ll see this, finally:

And this:

The software is functional, but there’s an awful lot of noise on those signals. If you’re seeing that, don’t worry: this happens when the scope isn’t calibrated. Go to “Calibration” in one of the menus, let it run all the way, it takes around 1 hour, and the noise should be gone. Hurray!

CPU Temperature Alarm

After a while, the scope gave off this persistent alarm:

According to the VP22 motherboard manual, the alarm can go off for 3 reasons: case open (there was no such sensor, so that didn’t apply), CPU temperature alarm, or CPU voltage alarm.

I assumed a CPU voltage alarm due to old capacitors, but to be really sure, you can go into the PC Health Status section of the BIOS menu and disable the CPU temperature alarm:

This made the alarm go away. That’s great because it’s much easier to fix the temperature than to replace the capacitors on the motherboard or the main power supply: apply thermal paste, reseat the cooler.

The temperature of the CPU in the BIOS screen was 64C. This is not very high by today’s standards, but the Pentium III thermal design guide sets a maximum junction temperature of 75C.

One of the benefits of running Windows XP is that many tools and USB memory sticks just work. I installed Core Temp to continuously monitor the temperatures after adding thermal paste and reseating the cooler.

Note that it is really important that you feel a subtle click when you rotate the cooler counter-clockwise to secure it in place. At the same time, you can’t rotate too hard because the metal tabs on the CPU socket can shear off or you can crack the CPU die.

Here’s the result after:

At rest, with the scope app disabled, I saw temperatures around 40C. When the scope was running with the CPU pegged at 100%, temperatures never exceeded 65C.

I still disabled the CPU temperature alarm though, because that beeping is way too annoying. This will probably come back to bite me some time in the future…

The scope was really working fine now, there was just one more thing to do.

Upgrade to 1 GHz Bandwidth

As mentioned earlier, the scope can be upgraded to a 54832 with 1 GHz bandwidth by removing a single resistor on the acquisition board.

The resistor array can be found here:

Before the modification, all resistors should be present:

Remove this resistor for the upgrade:

After rebooting, the scope identifies itself as an Agilent 54832B:

To test the modification, I fed the AUX Out⁴ signal at the back of the scope into channel 1, with 50 Ohm termination active.

Before the mod, the rise time averaged to 481 ps:

After removing the resistor, it dropped to 331 ps:

The scope bandwidth is calculated as 0.35 / t_rise. Going from 481 to 331 ps is an increase from roughly 728 MHz to 1057 MHz. The modification worked and the result is within spec, but others have reported an increase to 1.2 GHz. It’s possible that my measurement is limited by the rise time of the AUX Out signal, and that I’d see even better numbers with a real pulse generator. That’s for another time…
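The rise time to bandwidth conversion is simple enough to script. This is a minimal sketch, assuming the usual 0.35 rule of thumb for a 10% to 90% rise time; the function name is mine:

```python
def bandwidth_hz(rise_time_s: float, k: float = 0.35) -> float:
    """Estimate the -3 dB bandwidth from a rise time using BW = k / t_rise."""
    return k / rise_time_s

# Rise times measured before and after removing the resistor.
for label, t_rise_ps in (("before mod", 481), ("after mod", 331)):
    bw_mhz = bandwidth_hz(t_rise_ps * 1e-12) / 1e6
    print(f"{label}: {t_rise_ps} ps -> {bw_mhz:.0f} MHz")
```

Keep in mind that the measured rise time includes the rise time of the source, so these numbers are a lower bound on the bandwidth of the scope itself.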

Additional Changes are Possible

I only did the minimum to get the scopes working, and did the resistor mod. You can find more impressive modifications on the EEVblog forum:

Install motherboards with P4 CPUs. This requires some mechanical surgery to the case as well, to make the connectors fit.
Use faster SSDs than my compact flash cards. This can bring down the boot time from 4 minutes to less than a minute.
Replace the 640x480 LCD screen with a 1024x768 LCD screen.

Some of these modifications may depend on each other. E.g. the larger resolution LCD screen requires an integrated GPU that is not present on the VP22 motherboard.

I decided not to do any of them: the scope works well enough for my needs.

Conclusion

For $200 and around $30 in additional components, I got myself 2 working 4 Gsps scopes with 600 MHz or 1 GHz bandwidth. I already sold unit A for $200, which I think is an excellent deal for the buyer. I’m keeping the other one.

All words in this blog post were written by a human.

References

EEVblog forum

Other info

5V supply for adapter

Instead of cutting off the existing 5V connector for the CompactFlash adapter, I could have gotten the 5V from somewhere else on the motherboard.
Agilent 54831B Oscilloscope Taming
Tony_G Various disk images

Footnotes

I’m still not used to the recent renaming of HP into Agilent and often use their names interchangeably. ↩
The 54825A has a sleeve-like enclosure. ↩
You shouldn’t try this yourself because it can damage the electronics, but if you unplug the bridge cable while the scope is up and running, you’ll see that the GUI is still rendered but the waveforms disappear. ↩
Make sure to select the 10 MHz output for AUX Out in the Calibration menu. ↩

Polyphase Channelizers with Frequency Offset - a Bluetooth LE Example

2026-03-05T10:00:00+00:00

Introduction
A Bluetooth LE Trace as Example
Input Complex Heterodyne
Derivation of Post-Decimation Offset Correction
Simplifying for the Half-Bin Offset Case
The Odd Case of an Odd Number of Channels
Reducing the Number of Phase Adjustment Values
Conclusion
References
Footnotes

Introduction

In a previous blog post, I introduced the polyphase channelizer, a DSP algorithm that is incredibly efficient at heterodyning multiple channels to baseband in parallel. I made two major assumptions about the nature of the input signal:

The bandwidth of a channel is equal to the input sample rate divided by the decimation factor.
The center frequency of each channel is an integer multiple of the channel bandwidth.

If these conditions are satisfied, the channelizer reduces to a filter bank with real coefficients and an inverse FFT on the output of the filter phases.

In this blog post, I’ll use a real-world Bluetooth LE recording and a polyphase channelizer to extract all channels in parallel. There’s a twist, however, in that the center frequency of the channels is not a multiple of the channel bandwidth. With a little bit of additional math, we can work around that too.

I’m still roughly covering topics here that are covered in “Recent Interesting and Useful Enhancements of Polyphase Filter Banks” by fred harris, though my approach is more mathematical and less based on intuition. Furthermore, harris doesn’t work out the details for any generic frequency offset and immediately jumps to the half-channel case. But even there, he spends more time discussing a clever trick for odd decimation factors than the generic case that works for all decimation factors. I first deal with the full generic case and then simplify the outcome by imposing additional constraints.

A Bluetooth LE Trace as Example

Bluetooth Low Energy (BLE) lives in the unlicensed 2.4 GHz radio band that’s also used by wifi and many other protocols. It has 40 channels that are each 2 MHz wide for a total bandwidth of 80 MHz. The center frequency of the bottom physical channel is 2402 MHz. In total, BLE occupies a spectrum from 2401 MHz to 2481 MHz.
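The channel plan is easy to reproduce from those three numbers. A minimal sketch, with variable names of my own choosing and with the channels listed in plain frequency order:

```python
NUM_CHANNELS = 40
CHANNEL_WIDTH_MHZ = 2
FIRST_CENTER_MHZ = 2402

# Center frequency of each physical channel, lowest frequency first.
centers_mhz = [FIRST_CENTER_MHZ + CHANNEL_WIDTH_MHZ * i for i in range(NUM_CHANNELS)]

# Band edges are half a channel below the first and above the last center.
band_low_mhz = centers_mhz[0] - CHANNEL_WIDTH_MHZ / 2
band_high_mhz = centers_mhz[-1] + CHANNEL_WIDTH_MHZ / 2

print(f"centers: {centers_mhz[0]} .. {centers_mhz[-1]} MHz")
print(f"band:    {band_low_mhz:.0f} .. {band_high_mhz:.0f} MHz")
```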

The 2.4 GHz radio band is often congested. To ensure that at least some packets get through, BLE uses frequency hopping: it continuously jumps from one channel to the next in some predictable pattern. However, to establish an initial connection, there are a number of fixed management channels.

Joshua used his BladeRF SDR unit to provide me with a 5 ms recording with the following characteristics:

center frequency: 2.441 GHz
sample rate: 96 MHz
quadrature I/Q sampling

We can create a spectral power density waterfall plot of this, where the X-axis shows the time and the Y-axis the short-time Fourier transform (STFT) of the signal, showing the energy for the full frequency range.

We can see a bright line at the 2441 MHz center frequency. This is a common artifact of the imperfect SDR hardware. It can be caused by local oscillator leakage or an imbalance between the I and Q channels of the quadrature AD converters, or both.

In his video, harris talks about how DC is often problematic, and a reason to have channels with a frequency offset so that none of the channel center frequencies coincide with DC. This trace shows why this is good advice.

We can also see some symmetry around the 2441 MHz line. For example, there’s a short burst around 1.1 ms at 2415 MHz and a weaker version at 2467 MHz. This weaker version isn’t real either, but a spectral mirror image that’s caused by an imbalance between the I and the Q channels: their phase delta might not be exactly 90 degrees or they might have a slightly different gain on their way to the ADCs.¹ This is another topic that harris warns about: if possible, use a single double-speed ADC and do all the I/Q handling in the mathematically perfect digital domain.
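The mirror image is easy to reproduce with a synthetic tone. The sketch below places a tone 26 MHz below the center frequency and applies a gain and phase error to the Q path. The 5% gain mismatch and the 2 degree phase error are numbers that I made up for illustration, not measurements of the BladeRF:

```python
import numpy as np

fs = 96e6            # sample rate of the recording
n_samples = 9600     # chosen so that the tone lands exactly on an FFT bin
f_tone = -26e6       # 2415 MHz relative to the 2441 MHz center frequency

gain_q = 1.05                    # assumed 5% gain mismatch on the Q path
phase_err = np.deg2rad(2.0)      # assumed 2 degree quadrature error

theta = 2 * np.pi * f_tone / fs * np.arange(n_samples)
i_sig = np.cos(theta)
q_sig = gain_q * np.sin(theta + phase_err)

spectrum = np.fft.fftshift(np.fft.fft(i_sig + 1j * q_sig)) / n_samples
freqs = np.fft.fftshift(np.fft.fftfreq(n_samples, d=1 / fs))
power_db = 20 * np.log10(np.abs(spectrum) + 1e-12)

main_idx = np.argmax(power_db)
# The image shows up at the frequency mirrored around DC.
image_idx = np.argmin(np.abs(freqs + freqs[main_idx]))

print(f"tone  at {freqs[main_idx] / 1e6:+.0f} MHz: {power_db[main_idx]:.1f} dB")
print(f"image at {freqs[image_idx] / 1e6:+.0f} MHz: {power_db[image_idx]:.1f} dB")
```

With a perfectly balanced I and Q (gain 1.0, phase error 0), the image disappears into the numerical noise floor.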

Due to the sample rate limitations of the BladeRF, we have to use a quadrature analog acquisition path, but this doesn’t materially impact the techniques derived in this blog post.

A recording of 96 Msps complex samples covers 48 channels of 2 MHz. Since BLE only has 40 active channels, we have a little bit too much data, but that’s ok. In the waterfall plot below, I’ve added separators between the individual channels. The spurious 2441 MHz line is now obstructed, which is good because it shows that it falls on a transition band.

In the previous blog post, we operated under the assumption that channel center frequencies were located at a multiple of the decimated sample rate:

\[F_c = \frac{F_s}{M} c, \quad c = -\frac{M}{2}, \dots, -1, 0, 1, \dots, \frac{M}{2}-1\]

That’s not the case here. Instead, we have the following situation:

\[F_c = \frac{F_s}{M} c + \frac{F_s}{2M}, \quad c = -\frac{M}{2}, \dots, -1, 0, 1, \dots, \frac{M}{2}-1\]

Concretely, instead of channel center frequencies at -2, 0, 2, 4, … MHz, the BLE channels are located at -3, -1, 1, 3, 5, … MHz. Having the center frequency offset at exactly half the channel width is something we can exploit later, but I will first develop the generic case where the frequency offset can be anything, and then simplify.
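To make the half-channel offset concrete, here’s a small sketch that maps an RF frequency to the index $c$ of the second equation, using the 2441 MHz tuning frequency and 96 MHz sample rate of the recording. The helper name is hypothetical, and $c$ is the signed index from the equation, which is not necessarily the channel numbering used elsewhere:

```python
fs_mhz = 96.0         # input sample rate
M = 48                # number of channels / decimation factor
center_mhz = 2441.0   # SDR tuning frequency

bin_width_mhz = fs_mhz / M        # F_s / M
offset_mhz = bin_width_mhz / 2    # F_s / (2M)

def channel_index(rf_mhz: float) -> int:
    """Map an RF frequency to index c, assuming half-bin offset channel centers."""
    baseband_mhz = rf_mhz - center_mhz
    c = (baseband_mhz - offset_mhz) / bin_width_mhz
    if c != int(c):
        raise ValueError("frequency is not on a channel center")
    return int(c)

for rf_mhz in (2402.0, 2440.0, 2442.0, 2480.0):
    print(f"{rf_mhz:.0f} MHz -> baseband {rf_mhz - center_mhz:+.0f} MHz "
          f"-> c = {channel_index(rf_mhz)}")
```

The 40 BLE channels end up on a contiguous range of indices, with 4 unused channels on each side.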

Input Complex Heterodyne

The easiest way to align the channel center frequencies to an integer multiple of the output sample rate is to remove the offset with a complex heterodyne on the input signal.

Like this:

\[\omega_\Delta = 2 \pi \frac{F_\text{offset}}{F_s} \\ x[n] = x'[n] \, e^{j \omega_\Delta n} \\\]

This works, but it undoes all the effort from the last blog post where we tried very hard to not do any math at the input sample rate.

Still, let’s do it anyway and see what kind of result we get.

The code for the input heterodyne and the polyphase channelizer is below. I’ve stripped some of the comments for brevity, but check out the code in the GitHub repo for more details.

n = np.arange(len(ble_input), dtype=np.float32)

# Complex 1 MHz rotator to shift the spectrum by the half-channel offset
heterodyne_1mhz     = np.exp(1j * 2.0 * np.pi * channel_offset_hz / sample_rate_hz * n).astype(np.complex64)
# Do the heterodyne on the input signal
ble_input_pre_1mhz  = ble_input * heterodyne_1mhz

# Channel low-pass filter with a passband from 0 to 600 kHz
# and a stopband that starts at 800 kHz.
h_lpf = create_remez_lowpass_fir(
    input_sample_rate_hz     = sample_rate_hz,
    passband_hz              = 600e3,
    passband_ripple_db       = 1.0,
    stopband_hz              = 800e3,
    stopband_attenuation_db  = 50.0
    )

# Pad the filter with zeros so that the polyphase decomposition 
# is a clean 2D array.
h_lpf   = np.pad(h_lpf, (0, -len(h_lpf) % decim_factor) )

# Polyphase filter decomposition: 
# 48 rows, each row has interleaved coefficients.
h_lpf_poly  = np.reshape(
        h_lpf, ( (len(h_lpf) // decim_factor), decim_factor) 
    ).T

# Polyphase decomposition/decimation of the input signal
ble_decim_pre_1mhz  = np.flipud(
    np.reshape(
        ble_input_pre_1mhz,
        ((len(ble_input_pre_1mhz) // decim_factor), decim_factor),
    ).T
)

# Calculate the output of all polyphase filters
h_poly_out_pre_1mhz = np.array(
        [np.convolve(x_phase, h_phase)
         for x_phase, h_phase in zip(ble_decim_pre_1mhz, h_lpf_poly)])

# Vectorized IFFT to calculate the output of all channels
channel_data_pre_1mhz = np.fft.ifft(h_poly_out_pre_1mhz, axis=0).astype(np.complex64)
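The zero-padding and reshape steps in the code above are easier to follow on a toy example. This sketch uses a made-up 10-tap “filter” where each tap value equals its index, and a decimation factor of 4 instead of 48:

```python
import numpy as np

M = 4                                  # toy decimation factor
h = np.arange(10, dtype=np.float32)    # toy filter: tap value == tap index

# -len(h) % M is the number of zeros needed to reach the next multiple of M.
pad = -len(h) % M
h_padded = np.pad(h, (0, pad))

# After the transpose, row m holds taps h[m], h[m + M], h[m + 2M], ...
h_poly = np.reshape(h_padded, (len(h_padded) // M, M)).T

print(f"zeros added: {pad}")
print(h_poly)
```

Each row of h_poly is one polyphase sub-filter. The last rows end with a padded zero because 10 taps don’t divide evenly over 4 phases.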

After extracting the data from channel 33² between 1.14 ms and 1.24 ms, we get the following:

The active period of a packet can be derived from the amplitude of the I/Q vector (green). And the I/Q data clearly has some structure in it.

BLE uses Gaussian frequency shift keying (GFSK). Like ordinary frequency shift keying (FSK), a 0 and a 1 are coded with slightly different frequencies, but the transition between them is just a bit smoother for GFSK.

Frequency is the derivative of the phase. Since I and Q are available, you can calculate the phase as follows³:

\[\phi[n] = \text{atan2}(q[n],i[n])\]

The derivative is simply the delta between consecutive phase samples.

In Python, we can demodulate a GFSK signal like this:

angle = np.unwrap(np.angle(iq_data))
d_angle = angle[:-1] - angle[1:]

Here’s the result:

A BLE packet starts with a 16-symbol 1010101010101010 sync word, followed by data. This definitely looks like a valid packet.
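To see the phase-difference demodulation in isolation, here’s a sketch that modulates a known bit pattern and recovers it again. It’s plain FSK without the Gaussian smoothing, and the samples per symbol, sample rate and frequency deviation are values that I assumed for the example:

```python
import numpy as np

bits = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0])
sps = 8          # samples per symbol (assumed)
fs = 8e6         # sample rate for 1 Msym/s at 8 samples per symbol
f_dev = 250e3    # assumed frequency deviation

# Modulate: +f_dev for a 1, -f_dev for a 0, integrated into a phase.
freq = np.repeat(2 * bits - 1, sps) * f_dev
phase = 2 * np.pi * np.cumsum(freq) / fs
iq_data = np.exp(1j * phase)

# Demodulate: frequency is the sample-to-sample phase difference.
angle = np.unwrap(np.angle(iq_data))
d_angle = np.diff(angle)    # angle[1:] - angle[:-1]

# Slice in the middle of each symbol: positive frequency means 1.
recovered = (d_angle[sps // 2 :: sps] > 0).astype(int)
print(recovered)
```

Note that np.diff subtracts the older sample from the newer one, which is the opposite sign of the snippet above, so a positive value means a positive frequency here.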

Cool! But it costs us a table with 48 rotator values that are fed into a complex multiplier, at the input sample rate. In this example, the input samples are already complex, but if they were real, the input heterodyne also forces all filter bank calculations to become complex.

Can we do better?

Derivation of Post-Decimation Offset Correction

Here’s the standard polyphase channelizer pipeline from the last blog post:

And here’s the mathematical description of the pipeline, for 3 channels and a filter with 9 coefficients:

\[\begin{alignedat}{0} y_c[n] & = & e^{j \frac{2 \pi}{3} c \, 0} & ( & h[0] & x[3n] & + & h[3] & x[3n-3] & + & h[6] & x[3n-6] & ) \\ & + & e^{j \frac{2 \pi}{3} c \, 1} & ( & h[1] & x[3n-1] & + & h[4] & x[3n-4] & + & h[7] & x[3n-7] & ) \\ & + & e^{j \frac{2 \pi}{3} c \, 2} & ( & h[2] & x[3n-2] & + & h[5] & x[3n-5] & + & h[8] & x[3n-8] & ) \\ \\ y_c[n+1] & = & e^{j \frac{2 \pi}{3} c \, 0} & ( & h[0] & x[3n+3] & + & h[3] & x[3n] & + & h[6] & x[3n-3] & ) \\ & + & e^{j \frac{2 \pi}{3} c \, 1} & ( & h[1] & x[3n+2] & + & h[4] & x[3n-1] & + & h[7] & x[3n-4] & ) \\ & + & e^{j \frac{2 \pi}{3} c \, 2} & ( & h[2] & x[3n+1] & + & h[5] & x[3n-2] & + & h[8] & x[3n-5] & ) \\ \end{alignedat}\]

Let’s generalize this formula to $M$ channels and $N$ filter taps per phase:

\[y_c[n] = \sum_{m=0}^{M-1} \underbrace{ e^{j \frac{2 \pi}{M} c \, m} }_\text{IFFT} \sum_{k=0}^{N-1} h[kM + m] \; x[(n - k)M - m] \\\]

Now substitute input $x[n]$ with an input signal to which a complex heterodyne has been applied:

\[x[n] = x'[n] \; e^{j \omega_{\Delta} n}\]

\[y_c[n] = \sum_{m=0}^{M-1} e^{j \frac{2 \pi}{M} c \, m} \sum_{k=0}^{N-1} h[kM + m] \; x'[(n - k)M - m] \; \underbrace{e^{j \omega_{\Delta} ((n - k)M - m)}}_\text{offset adjust rotator}\]

A frequency offset adjustment rotator has been introduced.

We can split up this exponential, extract a free-running output rotator that only depends on decimated sample number $nM$, and move it all the way to the front:

\[y_c[n] = \underbrace{e^{j \omega_{\Delta} Mn} }_\text{output rotator} \sum_{m=0}^{M-1} e^{j \frac{2 \pi}{M} c \, m} \sum_{k=0}^{N-1} h[kM + m] \; x'[(n - k)M - m] \; e^{j \omega_{\Delta} (- kM - m)}\]

Now extract the term that only depends on polyphase variable $m$:

\[y_c[n] = e^{j \omega_{\Delta} Mn} \sum_{m=0}^{M-1} \underbrace{e^{-j \omega_{\Delta} m} }_\text{phase adjustment} e^{j \frac{2 \pi}{M} c \, m} \sum_{k=0}^{N-1} h[kM + m] \; x'[(n - k)M - m] \; e^{j \omega_{\Delta} (- kM)}\]

Finally, rearrange the remaining exponential that is different for each filter coefficient index $k$:

\[y_c[n] = \underbrace{e^{j \omega_{\Delta} Mn}}_{\text{output rotator}} \sum_{m=0}^{M-1} \underbrace{e^{-j \omega_{\Delta} m}}_{\text{phase adjustment}} \underbrace{e^{j \frac{2 \pi}{M} c \, m}}_{\text{IFFT}} \sum_{k=0}^{N-1} h[kM + m] \underbrace{e^{-j \omega_{\Delta} (kM)}}_{\text{filter adjustment}} \; x'[(n - k)M - m]\]

There are 3 additional terms now:

all the filter coefficients are modified by a filter adjustment term $e^{-j \omega_{\Delta} (kM)}$.
the output of each phase sub-filter is multiplied by a phase adjustment term $e^{-j \omega_{\Delta} m}$.
all outputs of the IFFT are subjected to a complex heterodyne $e^{j \omega_{\Delta} Mn}$.

None of this is ideal, but the first 2 terms are not dependent on the sample number and can be baked into the design. Meanwhile the rotator at the end not only runs at a rate that is M times lower, but the phase step of the rotator is also M times larger which reduces the size of a lookup table with rotator values.
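As a sanity check, here is a minimal NumPy sketch that compares the formula with the heterodyne applied to the input against the rearranged formula with the 3 adjustment terms. The filter, the input data, the value of `omega_delta` and the function names are all arbitrary choices for illustration; they are not taken from the BLE example.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 3, 4            # channels and taps per phase (small, arbitrary values)
omega_delta = 0.37     # arbitrary frequency offset, in radians per input sample

h = rng.standard_normal(M * N)                                   # real prototype filter
x_prime = rng.standard_normal(64) + 1j * rng.standard_normal(64)
x = x_prime * np.exp(1j * omega_delta * np.arange(len(x_prime)))  # input heterodyne


def y_input_heterodyne(c, n):
    """Channelizer output with the heterodyne applied to the input samples."""
    total = 0
    for m in range(M):
        acc = sum(h[k * M + m] * x[(n - k) * M - m] for k in range(N))
        total += np.exp(2j * np.pi * c * m / M) * acc
    return total


def y_rearranged(c, n):
    """Same output, using filter adjustment, phase adjustment and output rotator."""
    total = 0
    for m in range(M):
        acc = sum(
            h[k * M + m] * np.exp(-1j * omega_delta * k * M) * x_prime[(n - k) * M - m]
            for k in range(N)
        )
        total += np.exp(-1j * omega_delta * m) * np.exp(2j * np.pi * c * m / M) * acc
    return np.exp(1j * omega_delta * M * n) * total


# Only use output samples for which all input indices are non-negative.
max_err = max(
    abs(y_input_heterodyne(c, n) - y_rearranged(c, n))
    for c in range(M)
    for n in range(N, 20)
)
print(max_err)
```

The difference between the two versions should be at the level of floating point rounding noise.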

The diagram looks like this:

In Python, we can use this code:

# No more input heterodyne. Immediately decimate the input signal
ble_decim   = np.flipud(
    np.reshape(
        ble_input,
        ((len(ble_input) // decim_factor), decim_factor),
    ).T
)

# Calculate frequency offset
freq_offset           = channel_offset_hz / (sample_rate_hz / decim_factor)
omega_delta           = 2 * np.pi * freq_offset / decim_factor

# Modify the low pass filter coefficients
h_n                   = np.arange(len(h_lpf_poly[0]), dtype=np.float32)
h_lpf_poly_adj        = np.exp(-1j * omega_delta * decim_factor * h_n).astype(np.complex64)
h_lpf_poly_het        = h_lpf_poly * h_lpf_poly_adj

# Output of the polyphase filter
h_poly_out            = np.array([np.convolve(ble_decim[_], h_lpf_poly_het[_]) for _ in range(decim_factor)])

# Apply a phase rotation to the output of each phase
phase_nr              = np.arange(decim_factor, dtype=np.float32)
h_phase_adj           = np.exp(-1j * omega_delta * phase_nr).astype(np.complex64)
h_poly_out_phase_adj  = h_poly_out * h_phase_adj[:, None]

# IFFT...
channel_data          = np.fft.ifft(h_poly_out_phase_adj, axis=0).astype(np.complex64)

# Output rotator
sample_nr             = np.arange(channel_data.shape[1], dtype=np.float32)
heterodyne_1mhz_decim = np.exp(1j * omega_delta * decim_factor * sample_nr).astype(np.complex64)

# Heterodyne all channels
channel_data_1mhz_post  = channel_data * heterodyne_1mhz_decim[None, :]

While the channel I/Q output samples are not identical to the previous case due to a phase shift, the result after GFSK demodulation is the same:

This seems like a whole lot of effort for little benefit. Yes, we are running all operations at the output sample rate, but the number of multiplications per output sample is now higher than the case with the input heterodyne!

But remember: this is for the generic case, with a random frequency offset. Let’s fix that.

Simplifying for the Half-Bin Offset Case

As mentioned at the start of this blog post, it’s common to have a frequency offset that is equal to half the channel width:

\[F_\text{offset} = \frac{F_s}{2 M} \\ \omega_\Delta = 2 \pi \frac{F_\text{offset}}{F_s} = \frac{2 \pi}{2 M} = \frac{\pi}{M}\]

A crucial observation is that 2 of our adjustment exponentials feature a multiplication by $M$.

The filter coefficients adjustment:

\[e^{-j \omega_\Delta (kM)} = e^{-j \frac{\pi}{M} (kM)} = e^{-j \pi k } = (-1)^k\]

The output rotator:

\[e^{j \omega_\Delta (Mn)} = e^{j \frac{\pi}{M} (Mn)} = e^{j \pi n } = (-1)^n\]

Awesome! The general equation has been simplified to this:

\[y_c[n] = (-1)^n \sum_{m=0}^{M-1} \underbrace{e^{-j \omega_{\Delta} m}}_{\text{phase adjustment}} \underbrace{e^{j \frac{2 \pi}{M} c \, m}}_{\text{IFFT}} \sum_{k=0}^{N-1} h[kM + m] (-1)^k \; x'[(n - k)M - m]\]

The filter coefficients are real again and the complex multiplier for the output rotator can be replaced by logic that just inverts the sign of every other output sample.
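A quick numerical check of these 2 simplifications, as a sketch: `M = 48` is the channel count of the BLE example, while the index ranges and the test vector `y` are arbitrary values that I made up for illustration.

```python
import numpy as np

M = 48                     # number of channels, as in the BLE example
omega_delta = np.pi / M    # half-bin offset

k = np.arange(8)           # coefficient index inside a sub-filter
n = np.arange(8)           # decimated output sample number

filter_adjust = np.exp(-1j * omega_delta * k * M)
output_rotator = np.exp(1j * omega_delta * M * n)
alternating = (-1.0) ** np.arange(8)   # 1, -1, 1, -1, ...

print(np.allclose(filter_adjust, alternating))
print(np.allclose(output_rotator, alternating))

# The output rotator becomes a sign flip of every other output sample.
y = np.arange(8, dtype=np.complex64)   # hypothetical IFFT output of one channel
y_rotated = y.copy()
y_rotated[1::2] *= -1
```

In hardware, the sign flip costs a negation instead of a complex multiplier.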

This is so much better! But it’s still possible to do better, though the requirements become even stricter.

The Odd Case of an Odd Number of Channels

We are currently still stuck with the per-phase complex rotator:

\[e^{-j \omega_\Delta m}\]

When the channel center frequencies are offset by half the channel width, we’ve so far only considered an adjustment where the correction offset is half the channel bandwidth:

\[\omega_\Delta = \frac{\pi}{M}\]

Relative to the full channel bandwidth of $\frac{2 \pi}{M}$, this offset is $r=0.5$.

\[\omega_\Delta = \frac{ 2 \pi }{M} r\]

But $r$ doesn’t have to be 0.5: we can use any kind of offset, as long as the fractional part of the value is 0.5.

For example, when $r = 2.5$, the channelizer still works, but in addition to a fractional shift of half the channel width, there is an additional shift of 2 full channels. An output sample that would go to channel $c$ for an offset of 0.5 now goes to channel $c+2$ instead. Not exactly the same result, but this reassigned output channel is just a minor bookkeeping issue.

Let’s see what happens when $r=M/2$.

For even values of M, $r$ is an integer value, without the fractional 0.5 half-bin offset that we need.

For odd values of M, we get the half-bin offset and all channels are moved by $\frac{M-1}{2}$ at the output.

harris shows this graphically with phase adjust values on a unit circle, but the principle is the same.

Let’s see what $r=M/2$ does to the phase adjust term:

\[r = M/2 \\ \omega_\Delta = \frac{ 2 \pi }{M} \frac{M}{2} \\ \omega_\Delta = \pi \\ e^{-j \omega_\Delta m} = e^{-j \pi m} = (-1)^m\]

Nothing changes for the 2 other terms: for odd values of M, they still reduce to $(-1)^k$ and $(-1)^n$.

Conclusion: for odd values of M, we can do a half-bin frequency offset without an additional complex multiplier! Flipping the sign of some sub-filter output values and reassigning the output channel numbers is all that it takes.
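Here is a small sketch that evaluates the 3 adjustment terms for $r = M/2$. The helper `adjustment_terms` and the value `M = 9` are my own choices for illustration.

```python
import numpy as np


def adjustment_terms(M, r, count=6):
    """Return (filter adjust, phase adjust, output rotator) for the first indices."""
    omega_delta = 2 * np.pi * r / M
    idx = np.arange(count)
    filter_adjust = np.exp(-1j * omega_delta * idx * M)   # index k
    phase_adjust = np.exp(-1j * omega_delta * idx)        # index m
    output_rotator = np.exp(1j * omega_delta * M * idx)   # index n
    return filter_adjust, phase_adjust, output_rotator


M = 9            # hypothetical odd number of channels
r = M / 2        # 4.5: the fractional part is the half-bin offset
filter_adjust, phase_adjust, output_rotator = adjustment_terms(M, r)
alternating = (-1.0) ** np.arange(6)

print(r % 1)
print(np.round(phase_adjust.real))
```

For odd M, all 3 terms are sequences of alternating $1$ and $-1$ values, so no complex multiplier is needed anywhere.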

Reducing the Number of Phase Adjustment Values

We can expand this trick for cases where M is even but only has a few prime factors of 2. Let’s do the exercise for $M = 18$ and select $r = \frac{M}{4} = \frac{18}{4} = 4.5$.

\[r = M/4 \\ \omega_\Delta = \frac{ 2 \pi }{M} \frac{M}{4} \\ \omega_\Delta = \frac{\pi}{2} \\ e^{-j \omega_\Delta m} = e^{-j \frac{\pi}{2} m} = 1, -j, -1, j, 1, \dots\]

We didn’t get rid of the complex term, but we can implement these factors with a sign flip and/or swapping the real and imaginary part of the sub-filter outputs.

In general, if the following is true, with $K$ an odd number:

\[M = 2^p K, \quad K \text{ odd}, \quad K > 2\]

Then you should choose $r$ as follows:

\[r = \frac{M}{2^{p+1}} = \frac{K}{2}\]

When $p=0$, you get the case where M is odd, and adjustment factors of $\{-1, 1\}$. When $p=1$, the adjustment factors are $\{-1, 1, j, -j\}$. For larger values of $p$, you can’t avoid a complex multiplier, but at least you will limit the number of adjustment values, which can be useful if you have 1 complex multiplier that serially processes all the sub-filter outputs before sending them to the IFFT.

For the BLE example:

\[M = 48 = 16 \cdot 3 = 2^4 \cdot 3 \\ r = \frac{48}{2^5} = 1.5\]

With this configuration, the phase adjustment term wraps around at phase 32, so we only need a lookup table with 32 entries instead of the 48 entries that we’d need for $r=0.5$.⁴
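The selection rule for $r$ can be sketched in a few lines of Python. Both function names are hypothetical helpers that I made up for this example, and the table size ignores the symmetry optimization of the footnote.

```python
import numpy as np


def half_bin_offset(M):
    """Split M = 2**p * K with K odd and return (p, K, r, table_size)."""
    p, K = 0, M
    while K % 2 == 0:
        K //= 2
        p += 1
    r = K / 2
    # exp(-j*pi*m / 2**p) repeats every 2**(p+1) phases, but there are only M phases.
    table_size = min(2 ** (p + 1), M)
    return p, K, r, table_size


def distinct_phase_values(M, r):
    """Count the distinct phase adjustment values exp(-j*2*pi*r*m/M) for m = 0..M-1."""
    m = np.arange(M)
    values = np.exp(-2j * np.pi * r * m / M)
    # Round to hide floating point noise; adding 0.0 turns -0.0 into 0.0.
    return len({(round(v.real, 9) + 0.0, round(v.imag, 9) + 0.0) for v in values})


for M in (9, 18, 48):
    p, K, r, table_size = half_bin_offset(M)
    print(M, p, K, r, table_size, distinct_phase_values(M, r))
```

For $M = 48$, this gives $r = 1.5$ and 32 distinct values, compared to 48 distinct values for $r = 0.5$.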

Conclusion

Just like in the previous blog post, we started with a straightforward solution to a problem that worked, but that required significant mathematical resources. We then threw some math at it and added constraints to simplify the math even more.

The outcome is once again appealing: for all decimation factors, the common case of shifting the spectrum by half the width of a channel requires at most one additional complex multiplication at the output of each sub-filter of the polyphase bank. And even this multiplication can be removed entirely if we can choose a decimation factor that is odd or if it only has one prime factor of 2.

All words in this blog post were written by a human.

References

Other blog posts in this series

Source code

GitHub - Polyphase Filtering Blog Series

Footnotes

You can use Gram-Schmidt decorrelation to fix the I/Q vectors, supposedly, but I haven’t explored that yet. ↩
Channel zero is located at 2441 MHz. Channel numbers increment up to 24, where the top frequency is reached, after which the frequency rolls over to the bottom and channel numbers continue to increment. That’s how you end up with 33. ↩
The $\text{atan2}(q,i)$ function differs from the $\arctan(\frac{q}{i})$ function in the sense that the former works in all 4 quadrants whereas the latter only returns angles between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$, which covers just 2 quadrants. For DSP, you almost always need the 4 quadrant version. ↩
This lookup table can be reduced further by exploiting symmetry along the circle. ↩

The Stunning Efficiency and Beauty of the Polyphase Channelizer

2026-02-16T10:00:00+00:00

All words in this blog post were written by a human being.

Introduction
Where We Left Things Last Time
Sidestep: Ignoring Linear Phase FIR Coefficient Symmetry
Naive Performance Baseline
Straightforward Polyphase Filtering and Decimation
A Free-Running Rotator
From Low Pass to Band Pass Filter
Disappearing the Complex Rotator
Moving Another Rotator behind the Filter and More… Again
The Polyphase Channelizer
From Theory to Practice
Conclusion
References
Footnotes

Introduction

In the past 2 blog posts, I wrote about polyphase decimation filters and complex heterodynes, the latter with some decimation thrown in for good measure.

It’s now time to put everything together, and more. First, I’ll look at the complex heterodyne/decimation combo and see how it can be implemented as efficiently as possible. There are already some surprises in there, but to top it off, I’ll expand the solution to do the operation for multiple channels in parallel.

The result is amazing.

I’m still roughly following the flow of fred harris’ video about polyphase filter banks¹, but I’ll be making some detours along the way because they helped me put things in context and understand the topic better.

There’s a lot more math² this time around, out of necessity: some of the optimizations can’t be figured out with intuition alone. But the math consists almost exclusively of shuffling around sums and products of scalar values and complex exponentials, with a convolution here and there.

For those who don’t want to read previous installments of this series, check out the section with Some Common DSP Notations if you need a quick refresher about the meaning of some of the symbols.

The NumPy code that was used to create the plots in this series can be found here.

Where We Left Things Last Time

I ended my blog post about complex heterodynes with a question about the efficiency of implementing them as a low pass filter that is followed by a decimation. In the video, harris calls this the Armstrong³ heterodyne.

Here’s a quick recap of that pipeline:

$f_c$ is the normalized center frequency of the channel that we’re interested in. In our example, the sample rate $F_s = 100 \text{MHz}$ and the channel center frequency $F_c = 20 \text{MHz}$ so $f_c = 0.2$. Further down, I’ll often use $\theta_c = 2 \pi f_c$ because that makes equations less cluttered.
$e^{-j 2 \pi f_c n}$ is a rotator. When multiplied with the input signal, it shifts a channel with center frequency $F_c$ down to 0 Hz. That’s the complex heterodyne.
$H_\text{lpf}(z)$ is a low-pass FIR filter with 201 real taps and a linear phase⁴. It removes all the frequencies outside the -5 MHz to 5 MHz range.
Each channel has a 10 MHz bandwidth. Since there is no mirror spectrum due to the complex heterodyne, once the channel has been moved to 0 Hz, we can decimate by a factor 10 so that the range from -5 MHz to 5 MHz is all that’s left.

Check out my section with common DSP notations for a general overview of symbols used in DSP math formulas.

Sidestep: Ignoring Linear Phase FIR Coefficient Symmetry

Linear phase FIR filters have the desirable property that their coefficients are symmetric around the center tap. Here’s a random example:

\[H(z) = -1 + 3 z^{-1} - 6 z^{-2} + 10 z^{-3} - 6 z^{-4} + 3 z^{-5} - z^{-6}\]

This filter has 7 coefficients, the center coefficient is 10, the ones to the left and right of it are both -6 and so forth.

When you convert a DSP algorithm to hardware that needs to consume an input sample and produce an output for every clock tick, the straightforward implementation is to have one multiplier per coefficient⁵.

Since multipliers are often a scarce resource, you can reduce their number by almost half by rearranging the equation as follows:

\[H(z) = -1 (1 + z^{-6}) + 3 (z^{-1} + z^{-5} ) - 6 (z^{-2} + z^{-4}) + 10 z^{-3}\]

We’ve removed 3 out of 7 multipliers.
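To convince yourself that the rearranged equation is the same filter, here is a minimal sketch that compares a direct convolution against the folded version that adds 2 delayed inputs before each multiplication. The input data and the helper `delayed` are made up for illustration.

```python
import numpy as np

h = np.array([-1, 3, -6, 10, -6, 3, -1], dtype=float)   # the example filter above

rng = np.random.default_rng(1)
x = rng.standard_normal(32)

# Direct form: 7 multiplications per output sample.
y_direct = np.convolve(x, h)[: len(x)]

# Zero-padded input so that delayed(k)[n] equals x[n-k], with zeros before the start.
x_padded = np.concatenate([np.zeros(6), x])


def delayed(k):
    return x_padded[6 - k : 6 - k + len(x)]


# Folded form: add the 2 inputs that share a coefficient first, then multiply.
# Only 4 multiplications per output sample are left.
y_folded = (
    h[0] * (delayed(0) + delayed(6))
    + h[1] * (delayed(1) + delayed(5))
    + h[2] * (delayed(2) + delayed(4))
    + h[3] * delayed(3)
)

print(np.allclose(y_direct, y_folded))
```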

This works, but you need to trade off the reduction in multipliers against an increase in wiring to get the 2 operands to the addition that feeds the multiplier. On FPGAs, wiring congestion is a real concern so it’s not always a slam dunk.

If you have a hardware architecture where delayed inputs are stored in a RAM instead of individual registers and you use an FSM to execute the filter over multiple clock cycles, trying to do this trick can make scheduling transactions more complicated too.

And when converting the FIR filter into its polyphase form, the simple symmetry breaks entirely. Here’s an example of a symmetrical 19-tap filter. In its original form, coefficients are symmetric, but when split up into 10 phases, the symmetry inside each phase is gone.

It’s still possible to share multiplications if you merge multiple phases (note how phase 2 has coefficients 6 and 2 and phase 7 has coefficients 2 and 6), but that again makes data organization and movement more difficult.

For the remainder of this blog post, I will ignore symmetry related optimizations when calculating the number of multiplications.

Naive Performance Baseline

I will use multiplication as the main indicator by which to judge the efficiency of a DSP algorithm.

Let’s evaluate the number of multiplications for the naive architecture:

The complex rotator multiplies a real sample with a complex number: 2 multiplications per input sample, or 200M multiplications per second.
The low pass filter has 201 real taps that are each multiplied with a complex sample, for a total of 201 x 2 x 100M = 40.2B multiplications per second.

Total: 40.4B multiplications per second!

This is our baseline, and it’s a lot. Let’s see what we can do about this…

Straightforward Polyphase Filtering and Decimation

There’s a reason why I also wrote Notes about Basic Polyphase Decimation Filters: it discusses exactly this kind of scenario, the combo of an FIR filter followed by a decimation. Yes, there’s a complex rotator in front of the FIR filter, but for now we can keep it there while we transform the FIR/decimator to its polyphase form.

harris mentions this case only tangentially, but it’s useful to compare how well the straightforward polyphase filter bank performs compared to the naive solution.

First split the FIR filter into its polyphase form with 10 sub-filters, the decimation factor:

Apply the noble identity for decimation:

Moving the FIR filter operation behind the decimator is a huge savings. The complex rotator still counts for 200M multiplications per second, but the combined 201 taps now need to deliver samples at a 10 times lower rate, 201 x 2 x 10M = 4.02B multiplications per second, for a total of 4.22B multiplications per second. If it weren’t for the complex rotator, the savings ratio would be exactly the decimation factor.

The biggest problem with this arrangement is that the rotator is in front of the decimator and there is no obvious way to move it behind the decimator. If the DSP pipeline is implemented in an FPGA and the input sample clock is very high, the multiplier hardware may simply not be fast enough.

A Free-Running Rotator

One minor thing to note is that the rotator consists of the input signal being multiplied by the output of a free-running oscillator. Free-running implies that there are no restrictions on the starting phase of the oscillator.

In the previous diagram, sample $x[n]$ is multiplied by $e^{-j \theta_c n}$, sample $x[n+1]$ by $e^{-j \theta_c (n+1)}$, and so forth, but that’s really arbitrary. We could multiply $x[n]$ by $e^{-j \theta_c (n+1)}$ and $x[n+1]$ by $e^{-j \theta_c (n +2)}$ and the outcome in terms of frequency characteristics wouldn’t be materially different (though there would be a constant phase shift.)

What is true is that you have to continuously loop through all the values of the rotator, irrespective of the number of filter taps: if the rotator completes a full rotation in 128⁶ steps, then you’ll need a table or a calculation⁷ to produce 128 points around the unit circle.

We’ll soon see that this isn’t the case in other schemes.

From Low Pass to Band Pass Filter

Let’s undo the previous polyphase optimization, start again from the naive solution, and try something different.

So far, we have been heterodyning the channel of interest to baseband and then sending it through a low-pass filter, as seen in the plot from the previous blog post:

Can we turn the order around, first send the channel of interest through a band-pass filter and then heterodyne the result down to baseband? As harris points out, the Armstrong heterodyne was created to avoid that, because a movable band-pass filter requires mechanically tuned capacitors and inductors. In the DSP world, however, it’s just numbers and calculations.

So, yes, we can do the filtering first and then do the heterodyne, and it’s relatively easy to show that mathematically.

In what follows, I will deviate from harris’s notation in 2 ways. He uses $a[n] * b[n]$ for convolution; $(a * b)[n]$ is the more common way. He also overloads the meaning of $n$ in the same equation, in a way that I found utterly confusing. Instead, I will use the $[\cdot]$ and $(\cdot)$ notation, where $\cdot$ is essentially a temporary local loop variable. If you see a $\cdot$ in the equations below, assume that harris had an $n$ there.

Starting with this:

\[y[n] = \big( \underbrace{ \underbrace{(x[\cdot] e^{-j \theta_c (\cdot)})}_{\text{heterodyne}} * h\big)[n]}_{\text{low-pass filter}}\]

$*$ is the convolution operator, in this case, a discrete convolution. Let’s expand the equation by applying the definition of the convolution:

\[y[n] = \sum_{k=0}^{N-1} (x[n-k] e^{-j \theta_c (n-k)}) \; h[k]\]

$N$ is the number of coefficients of the filter.

Extract the common exponential term that doesn’t depend on $k$:

\[y[n] = e^{-j \theta_c n} \sum_k x[n-k] \; ( e^{j \theta_c k} h[k] )\]

Reduce back to a convolution operator:

\[\begin{alignedat}{0} y[n] & = & e^{-j \theta_c n} \; \big( x * (h[\cdot] e^{j \theta_c (\cdot)} ) \big)[n] \\ & = & \big( x * (h[\cdot] e^{j \theta_c (\cdot)} ) \big)[n] \; e^{-j \theta_c n} \end{alignedat}\]

We’ve just proven what, in the video, harris calls the Equivalency Theorem:

\[\big(( x[\cdot] e^{-j \theta_c (\cdot)} ) * h\big)[n] = e^{-j \theta_c n} \; \big( x * (h[\cdot] e^{j \theta_c (\cdot)} ) \big)[n]\]

There’s one minor comment about this: while Google turns up plenty of equivalency theorems, none of them deal with swapping around a heterodyne and a convolution. The only reference⁸ that I found was in section 6.1 of his own book, Multirate Signal Processing for Communication Systems⁹, which has the same formulas and figures as the ones in the video. It says:

The equivalency theorem states that the operations of a down-conversion followed by a low-pass filter are totally equivalent to the operations of a band-pass filter followed by a down-conversion.
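The theorem is easy to check numerically. The sketch below uses a random real filter and random real input, both arbitrary, and $f_c = 0.2$ from the running example:

```python
import numpy as np

rng = np.random.default_rng(2)

theta_c = 2 * np.pi * 0.2          # normalized center frequency f_c = 0.2
h = rng.standard_normal(11)        # arbitrary real low-pass coefficients
x = rng.standard_normal(200)       # arbitrary real input signal

n = np.arange(len(x))
k = np.arange(len(h))

# Left side: heterodyne down first, then low-pass filter.
lhs = np.convolve(x * np.exp(-1j * theta_c * n), h)[: len(x)]

# Right side: band-pass filter first, then heterodyne down.
h_bpf = h * np.exp(1j * theta_c * k)
rhs = np.exp(-1j * theta_c * n) * np.convolve(x, h_bpf)[: len(x)]

print(np.allclose(lhs, rhs))
```

Both sides match sample by sample, including the start-up samples where the filter is still filling up.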

Anyway, this transformation doesn’t look like an improvement, and it will take a while before we can see how this helps us. For now, let’s break the equation into pieces and look at them step by step.

\[h[\cdot] e^{j \theta_c (\cdot)}\]

The coefficients of the low-pass filter with transfer function $H_\text{lpf}(z)$ are each multiplied by a value of a rotator. Notice how the $-$ sign in front of the $j$ exponent of the rotator has disappeared: when we were heterodyning the channel, we were bringing the spectrum down to baseband. Now, we’re doing the opposite and heterodyning the low-pass filter up to channel band!

Let’s apply the equation above to an example. If the transfer function of the original filter is this:

\[H_\text{lpf}(z) = h_0 z^{0} + h_1 z^{-1} + h_2 z^{-2} + h_3 z^{-3} + h_4 z^{-4}\]

Then the new filter is this:

\[\begin{alignedat}{0} H_\text{bpf}(z) & = & h_0 e^{j \theta_c 0} z^{0} &+& h_1 e^{j \theta_c 1} z^{-1} &+& h_2 e^{j \theta_c 2} z^{-2} &+& h_3 e^{j \theta_c 3} z^{-3} &+& h_4 e^{j \theta_c 4} z^{-4} \\ & = & h_0 (e^{-j \theta_c} z)^{0} &+& h_1 (e^{-j \theta_c} z)^{-1} &+& h_2 (e^{-j \theta_c} z)^{-2} &+& h_3 (e^{-j \theta_c} z)^{-3} &+& h_4 (e^{-j \theta_c} z)^{-4} \\ \end{alignedat}\]

This can be written much shorter, useful for drawings, like this:

\[H_\text{bpf}(z) = H_\text{lpf}(e^{-j \theta_c} z)\]

It is important to note that the coefficients of $H_\text{bpf}(z)$ are constants: for a given center frequency, we can pre-calculate the coefficients and never change them again. And contrary to the free-running rotator that shifted down the spectrum of the input signal, the number of rotator values to shift up the filter is fixed to the number of filter taps. However, compared to the original filter $H_\text{lpf}(z)$, the coefficients are now complex instead of real.

To simulate the behavior of this band-pass filter, we create an array with as many complex rotator values as there are filter taps and multiply them with the low-pass filter coefficients from the previous blog post:

tap_idx       = np.arange(LPF_FIR_TAPS)
complex_lo    = np.exp(1j * 2 * np.pi * lo_freq_hz * tap_idx / sample_clock_hz)
h_bpf_complex = h_lpf * complex_lo

Looking at the spectrum of this filter, there are no surprises: the filter has been transformed from a low-pass filter to a band-pass filter with $F_c = 20 \text{MHz}$ as center frequency:

The second plot of the figure above shows the input signal after applying the band-pass filter.

\[(x * h_\text{bpf})[n]\]

signal_bpf_complex  = np.convolve(signal, h_bpf_complex, mode="same")

The final step shifts the filtered signal back to baseband:

\[y[n] = (x * h_\text{bpf})[n] \; e^{-j \theta_c n}\]

After decimation, we end up with the same result as in the previous blog post:

Cool! But what did we gain?

The input to the filter is now real instead of complex, but the coefficients are now complex instead of real. So the number of multiplications in the filter remains the same. And the heterodyne now multiplies 2 complex numbers instead of multiplying a real input with a complex number. We’ve regressed!

But that’s something that will be fixed in the next section…

Disappearing the Complex Rotator

In the straightforward case, we had to switch to a polyphase decomposition to move the decimator from behind the filter to in front of the filter. But that decomposition introduces single timestep delays, which prevent moving the decimator even further to the front of the rotator.

This is not the case anymore: the rotator is behind the filter and there are no delay elements. This allows us to move the decimator before the rotator.

Here’s the rotator before decimation:

\[e^{-j \theta_c n}\]

When we decimate by a factor of $M$, the rotator completes a circle in $M$ times fewer steps than before the decimation. Or, put differently, the angle by which the rotator moves forward each step is now $M$ times larger.

After decimation, the exponent of the rotator now has factor $M$ added to it:

\[e^{-j \theta_c M m}, m = \lfloor \frac{n}{M} \rfloor\]

where $\lfloor x \rfloor$ means “$x$ rounded down to the closest integer number”.

Since the decimator and the rotator have swapped positions, the earlier problem of having to run the rotator at the input sample rate has been solved!

But we can do better! The rotator can disappear entirely if its value is equal to one at all times.

\[e^{-j \theta_c M m} \stackrel{?}{=} 1 + 0j\]

We don’t want this to be dependent on $m$, so we’re really trying to find a solution for this equation:

\[e^{-j \theta_c M } \stackrel{?}{=} 1 + 0j\]

The rotator is one whenever it makes a full circle or whenever the exponent is an integer multiple of $2 \pi$.

\[\theta_c M = k \; 2 \pi\]

Replace $\theta_c$ with its definition:

\[2 \pi \frac{F_c}{F_s} M = k \; 2 \pi\]

Simplify and rearrange:

\[F_c = k \frac{F_s}{M} \\ \theta_c = \frac{2 \pi k}{M}\]

It doesn’t seem like it, but this is a crucial result:

If the center frequency of your channel is a multiple of the sample rate divided by the decimation factor, the decimated rotator will always evaluate to 1 and thus the multiplication disappears entirely.
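Here is a minimal sketch of that statement, using the numbers of the running example. The second center frequency of 23 MHz is a made-up counterexample that doesn’t satisfy the condition.

```python
import numpy as np

F_s = 100e6    # sample rate
M = 10         # decimation factor


def decimated_rotator(F_c, num_samples=200):
    """Rotator exp(-j*theta_c*n) at the input rate, keeping every M-th value."""
    theta_c = 2 * np.pi * F_c / F_s
    n = np.arange(num_samples)
    return np.exp(-1j * theta_c * n)[::M]


rot_20mhz = decimated_rotator(20e6)   # F_c = 2 * F_s / M: condition satisfied
rot_23mhz = decimated_rotator(23e6)   # hypothetical F_c that is not a multiple of F_s / M

print(np.allclose(rot_20mhz, 1.0))
print(np.allclose(rot_23mhz, 1.0))
```

For 20 MHz, every decimated rotator value is 1 (up to rounding noise), so the multiplication can be dropped. For 23 MHz, the rotator keeps rotating after decimation.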

In our example with $F_s=100 \text{MHz}$, $M=10$, $F_c=20 \text{MHz}$, this equation is satisfied for $k=2$, and we end up with this:

If all of this feels a bit familiar, it’s probably because you’ve heard about undersampling or band-pass sampling. It’s what happens when you deliberately violate the Nyquist theorem, sample at a rate that is much lower than twice the highest frequency of a signal, but do it in such a way that the spectrum of the signal aliases exactly where you want it to be: at baseband.

Band-pass sampling only works if there are no stray frequency components outside the channel, which is why preprocessing the input with a band-pass filter is essential.
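The aliasing can be shown with a single tone. In this sketch, I assume a complex tone at 21 MHz, 1 MHz above the 20 MHz channel center, as a stand-in for the output of the band-pass filter. After decimating by 10 without any rotator, the tone shows up at 1 MHz.

```python
import numpy as np

F_s = 100e6      # input sample rate
M = 10           # decimation factor
F_tone = 21e6    # assumed test tone, 1 MHz above the channel center of 20 MHz

n = np.arange(1000)
x = np.exp(2j * np.pi * F_tone * n / F_s)

# Decimate without heterodyne: the output sample rate is 10 MHz.
y = x[::M]

spectrum = np.abs(np.fft.fft(y))
freqs = np.fft.fftfreq(len(y), d=M / F_s)
F_alias = freqs[np.argmax(spectrum)]

print(F_alias)
```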

Even with complex filter coefficients, we can still do the polyphase decomposition and move the decimator before the set of filters:

Tadaa! All elements of the pipeline can now run at the decimated output sample rate.

Last time we checked, we needed 4.22B multiplications per second. With the complex rotator gone, we’re now at 4.02B: just a filter with 201 complex taps, fed with a real value, executed 10M times per second.

A pitiful 5% savings is not worth writing home about, but we can do even better.

Note: even if we don’t satisfy the $F_c = k \frac{F_s}{M}$ condition, we’re still better off than before, because the rotator still runs at the output instead of the input sample rate:

This blog post is already long as it is, so for this one, I’m focussing only on the case where the center frequency condition is satisfied.

Moving Another Rotator behind the Filter and More… Again

Let’s play another game of shuffling around sums and terms. So far, we’ve only engaged the polyphase decomposition after the fact, to lower the number of filter calculations. This time we’re adding the polyphase decomposition explicitly to the mathematical mix for additional benefits.

Here’s where we left it last time:

Let’s move our attention to the transfer function of the filter:

\[\begin{alignedat}{0} H_\text{bpf}(z) & = & H_\text{lpf}(e^{-j \theta_c} z) \\ & = & \sum_{n=0}^{N-1} h[n] (e^{-j \theta_c } z)^{-n} \\ & = & h_0 e^{j \theta_c 0} z^{ 0} &+& h_1 e^{j \theta_c 1} z^{-1} &+& h_2 e^{j \theta_c 2} z^{-2} &+& h_3 e^{j \theta_c 3} z^{-3} &+& ... \end{alignedat}\]

Do the polyphase decomposition. Instead of summing all the terms of the full $h[n]$ polynomial in one go, we sum the terms of $M$ different polyphase polynomials separately, and then add them together:

\[= \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} h[m + Mn] e^{j \theta_c (m + Mn)} z^{-(m+Mn)} \\\]

When studying this step in the video, it took me a minute to understand what happened with $h[n]$. In the first equation, $n = 0 ... N-1$, where $N$ is the number of coefficients. In the equation above, the range of $n$ doesn’t change, but now it’s used like this: $h[m + Mn]$. The maximum index of $h$ now goes beyond the number of coefficients. This isn’t a problem, though, as long as you keep in mind that $h[n]$ is $\color{red}{0}$ when $n$ is smaller than 0 or larger than $N-1$.

To make things really clear, let’s expand all these sums and products for a 9-tap filter with decimation factor $M=3$:

\[\begin{alignedat}{0} H_\text{bpf}(z) & = & h_0 e^{j \theta_c 0} z^{ 0} &+& h_3 e^{j \theta_c 3} z^{-3} &+& h_6 e^{j \theta_c 6} z^{-6} &+& \color{red}{0} \, e^{j \theta_c 9} z^{-9} &+& ... && \qquad (m = 0) \\ & + & h_1 e^{j \theta_c 1} z^{-1} &+& h_4 e^{j \theta_c 4} z^{-4} &+& h_7 e^{j \theta_c 7} z^{-7} &+& \color{red}{0} \, e^{j \theta_c 10} z^{-10} &+& ... && \qquad (m = 1) \\ & + & h_2 e^{j \theta_c 2} z^{-2} &+& h_5 e^{j \theta_c 5} z^{-5} &+& h_8 e^{j \theta_c 8} z^{-8} &+& \color{red}{0} \, e^{j \theta_c 11} z^{-11} &+& ... && \qquad (m = 2) \\ \end{alignedat}\]

In each of the polyphase sub-filters, the factor $e^{j \theta_c m} z^{-m}$ is independent of $n$ and can be moved ahead of the inner sum:

\[= \sum_{m=0}^{M-1} e^{j \theta_c m} z^{-m} \sum_{n=0}^{N-1} h[m + Mn] e^{j \theta_c Mn} z^{-Mn} \\\] \[\begin{alignedat}{0} H_\text{bpf}(z) & = & e^{j \theta_c 0} z^{ 0} & \big( h_0 e^{j \theta_c 0} z^{0} &+& h_3 e^{j \theta_c 3} z^{-3} &+& h_6 e^{j \theta_c 6} z^{-6} \big) && \qquad (m = 0) \\ & + & e^{j \theta_c 1} z^{-1} & \big( h_1 e^{j \theta_c 0} z^{0} &+& h_4 e^{j \theta_c 3} z^{-3} &+& h_7 e^{j \theta_c 6} z^{-6} \big) && \qquad (m = 1) \\ & + & e^{j \theta_c 2} z^{-2} & \big( h_2 e^{j \theta_c 0} z^{0} &+& h_5 e^{j \theta_c 3} z^{-3} &+& h_8 e^{j \theta_c 6} z^{-6} \big) && \qquad (m = 2) \\ \end{alignedat}\]

Now look back at the previous section where we figured out the condition to eliminate the rotator. In the equation above, we see $e^{j \theta_c Mn }$, which contains $e^{j \theta_c M }$. This is exactly the same rotator that we eliminated before.

In other words, when using the same restriction $F_c = k \frac{F_s}{M}$, the rotator in the products of the inner sum simply disappears and we end up with this:

\[= \sum_{m=0}^{M-1} e^{j \theta_c m} z^{-m} \sum_{n=0}^{N-1} h[m + Mn] z^{-Mn} \\\] \[\begin{alignedat}{0} H_\text{bpf}(z) & = & e^{j \theta_c 0} z^{ 0} & \big( h_0 z^{0} &+& h_3 z^{-3} &+& h_6 z^{-6} \big) && \qquad (m = 0) \\ & + & e^{j \theta_c 1} z^{-1} & \big( h_1 z^{0} &+& h_4 z^{-3} &+& h_7 z^{-6} \big) && \qquad (m = 1) \\ & + & e^{j \theta_c 2} z^{-2} & \big( h_2 z^{0} &+& h_5 z^{-3} &+& h_8 z^{-6} \big) && \qquad (m = 2) \\ \end{alignedat}\]

Or abbreviated:

\[H_\text{bpf}(z) = \sum_{m=0}^{M-1} e^{j \theta_c m} z^{-m} H_m(z^M)\]

Furthermore:

\[\theta_c = k \frac{2 \pi}{M}\]

So we end up with this:

\[H_\text{bpf}(z) = \sum_{m=0}^{M-1} e^{j \frac{2 \pi}{M} k m} z^{-m} H_m(z^M)\]

$e^{j \frac{2 \pi}{M} k m}$ is a scalar value, so we can move the multiplication behind the filter:

\[H_\text{bpf}(z) = \sum_{m=0}^{M-1} z^{-m} H_m(z^M) e^{j \frac{2 \pi}{M} k m}\] \[\begin{alignedat}{0} H_\text{bpf}(z) & = & z^{ 0} & \big( h_0 z^{0} &+& h_3 z^{-3} &+& h_6 z^{-6} \big) e^{j \frac{2 \pi}{M} k 0} \\ & + & z^{-1} & \big( h_1 z^{0} &+& h_4 z^{-3} &+& h_7 z^{-6} \big) e^{j \frac{2 \pi}{M} k 1} \\ & + & z^{-2} & \big( h_2 z^{0} &+& h_5 z^{-3} &+& h_8 z^{-6} \big) e^{j \frac{2 \pi}{M} k 2} \\ \end{alignedat}\]

Here’s how that looks as a diagram:

As a final step, we can move the decimator back to the front by applying the noble identity on the polyphase sub-filters. Note that this time, the rotator exponent is not multiplied by $M$, because the exponent is a fixed value, not a changing rotator.
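The whole transformation can be verified with the 9-tap, $M=3$ example. This is a minimal sketch with random filter coefficients and input data; the variable names are my own and the channel index `k = 1` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)

M = 3                          # decimation factor
k = 1                          # channel index, so theta_c = 2*pi*k/M
theta_c = 2 * np.pi * k / M

h = rng.standard_normal(9)     # real 9-tap low-pass filter (arbitrary values)
x = rng.standard_normal(90)    # real input signal (arbitrary values)

# Reference: complex band-pass filter at the input rate, followed by decimation.
h_bpf = h * np.exp(1j * theta_c * np.arange(len(h)))
y_ref = np.convolve(x, h_bpf)[: len(x)][::M]

# Polyphase version: delay, decimate, real sub-filter, then a fixed rotator.
n_out = len(x) // M
y_poly = np.zeros(n_out, dtype=complex)
for m in range(M):
    # branch_in[n] = x[n*M - m], with zeros before the start of the signal
    branch_in = np.concatenate([np.zeros(m), x])[: len(x)][::M]
    # Sub-filter m uses the real coefficients h[m], h[m+M], h[m+2M]
    branch_out = np.convolve(branch_in, h[m::M])[:n_out]
    y_poly += branch_out * np.exp(1j * theta_c * m)

print(np.allclose(y_ref, y_poly))
```

Note that `np.convolve` in the loop only sees real inputs and real coefficients. The complex numbers only enter in the multiplication after each sub-filter.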

This is a truly remarkable outcome:

All math operations happen at a slow rate behind the decimators. We can do this because of the noble identities that give us the polyphase transformations and because the rotators are located after the filters.
The inputs to the filters are real. We achieved this by applying the equivalency theorem.
The coefficients of the polyphase filters are real again. We did this by extracting the rotators from the filters, and removing the frequency-dependent component.
The coefficients don’t depend on the targeted channel frequency. We did this also by extracting the rotators from the filters (as long as the center frequency of channel $k$ meets our criterion above of being an integer multiple of the sample rate divided by the decimation factor).
The rotators are located behind the filters. This will be very important in the next section.

The importance of the last 2 points can’t be overstated: if you want to change which channel $k$ is brought to baseband, all you need to change are the rotators.

Compared to the last checkpoint, the resource requirements have also been reduced roughly by half:

the 201 filter taps are multiplied by a real input at a rate of 10M per second = 2.01B multiplications.
10 rotators multiply the real output of the filters by a complex number at 10M per second = 200M multiplications.

Total: 2.21B multiplications.

Our naive initial baseline was 40.4B multiplications per second. We’ve reduced that number by a factor of roughly 18.
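The bookkeeping is easy to redo in a few lines. The numbers below are the ones quoted in the text; the variable names are mine, and I’m assuming 2 real multiplications per rotator because a real filter output gets multiplied by a complex number.

```python
TAPS = 201
OUTPUT_RATE = 10e6         # samples per second after decimating by 10
NR_ROTATORS = 10
MULTS_PER_ROTATOR = 2      # real value times complex rotator (assumption)
BASELINE = 40.4e9          # naive baseline quoted in the text

filter_mults = TAPS * OUTPUT_RATE
rotator_mults = NR_ROTATORS * MULTS_PER_ROTATOR * OUTPUT_RATE
total_mults = filter_mults + rotator_mults
reduction = BASELINE / total_mults

print(f"filters : {filter_mults / 1e9:.2f}B per second")
print(f"rotators: {rotator_mults / 1e9:.2f}B per second")
print(f"total   : {total_mults / 1e9:.2f}B per second")
print(f"baseline / total = {reduction:.1f}")
```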

And still we are not done…

The Polyphase Channelizer

So far, we’ve focused on finding an optimal solution to extracting the signal of one channel to baseband, out of many possible channels. We’re now expanding our scope: what if we want to extract the signal of all channels in parallel?

This is where the conclusion of the previous section pays off even more: since only the final rotators are channel dependent, all we need is an additional set of rotators for each extra channel. The filters remain untouched. That’s a huge win: from the resource calculation, we can already conclude that the filters tend to require the large majority of multipliers. And that’s for a filter with 201 taps, which is relatively modest. In today’s world, channels are often stacked one next to the other with a very narrow transition band, and narrow transition bands require very steep filters to separate one from the other.

In the DSP pipeline above, 2 channels are brought to baseband resulting in 2 time domain signals $s_k[n]$ and $s_l[n]$, but the number of channels that can be extracted efficiently is really only limited by the decimation factor (due to the $F_c = k F_s / M$ requirement.)

If we have a decimation factor of $M$ and we want to extract $M$ channels in parallel, then we’re looking at $M (M-1)$¹⁰ rotators or $2 M (M-1)$ multipliers. In his video, harris talks about a polyphase channelizer with 65536 channels. There are many cases where $O(n^2)$ is good enough and suddenly it’s not. 4,294,901,760 complex multiplications to calculate 65536 output samples is not good enough.

Let’s look at the rotator section for a single channel:

\[s_k[n] = \sum_{m=0}^{M-1} y_m[n] e^{j \frac{2 \pi}{M} k \, m}\]

This calculation must be done for each output time step $n$, so let’s drop that index. And while we’re at it, let’s group the outputs $y_m$ of the filters into their own array, so that we can reference them, for each time step $n$, as $y[m]$ instead of $y_m$:

\[s_k = \sum_{m=0}^{M-1} y[m] e^{j \frac{2 \pi}{M} k \, m}\]

Does this equation ring a bell? Compare against this:

\[x[n] = \frac{1}{N} \sum_{k=0}^{N-1}{X[k] e^{j \frac{2 \pi}{N} n \, k } }\]

Don’t confuse the meanings of $k$, $m$, and $n$. The $k$ in the first equation matches the purpose of $n$ in the second one!

This is the definition of the inverse¹¹ discrete Fourier transform (IDFT)! Except for the front scaling factor, our equation has the same form.
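To convince myself, here is a small check that the rotator sum for all channels matches NumPy’s `np.fft.ifft` once the $1/M$ scaling is undone. The array `y` is a random stand-in for the $M$ filter outputs of a single time step, and `M = 8` is an arbitrary choice.

```python
import numpy as np

M = 8
rng = np.random.default_rng(2)
y = rng.normal(size=M)          # stand-in for the filter outputs y[m] of one time step

m = np.arange(M)
# One rotator sum per channel k, straight from the equation.
s_direct = np.array([np.sum(y * np.exp(2j * np.pi * k * m / M)) for k in range(M)])

# np.fft.ifft includes a 1/M factor, so multiply it back in.
s_ifft = M * np.fft.ifft(y)

print(np.allclose(s_direct, s_ifft))
```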

If, for each time step $n$, we want the output samples of all channels $0..M-1$, the inverse DFT will give us exactly that. That in itself doesn’t solve our problem: it is well known that a naive DFT implementation has $O(n^2)$ behavior. But in DSP land, it’s impossible to mention the discrete Fourier transform without immediately bringing up the fast Fourier transform (FFT), which has $O(n \log n)$ behavior.

For a 65536 channel polyphase channelizer, the FFT brings the number of complex multiplications down from 4,294,901,760 to 1,048,576. We’ve received a second boost-of-efficiency miracle.
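Both numbers are simple to reproduce. Note that $M \log_2 M$ is only the rough count used here; an actual FFT implementation can need fewer multiplications, so treat this as an order-of-magnitude sketch.

```python
import math

M = 65536

rotators_naive = M * (M - 1)              # one rotator per channel and branch, minus the trivial ones
mults_fft = M * int(math.log2(M))         # rough M * log2(M) count for an FFT

print(f"{rotators_naive:,}")
print(f"{mults_fft:,}")
print(f"ratio: {rotators_naive / mults_fft:.0f}")
```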

Finally, we’re at the end of a journey that gives us this wonderful result:

The Fourier transform is known primarily for converting signals from the time domain to the frequency domain and back, but you don’t have to use it for frequency stuff, as is the case here. The output of the IFFT in the polyphase channelizer is an array with the samples of all channels of a given time tick. The IFFT is used as an algorithmic accelerator that has time domain values at the input and time domain values at the output.

This is as good a time as any to link to my favorite YouTube video of all time: “The Fast Fourier Transform (FFT): Most Ingenious Algorithm Ever?”

It develops the FFT algorithm from scratch. And like the polyphase channelizer, it doesn’t use the FFT for time domain/frequency conversion, but to accelerate the multiplication of polynomials.

From Theory to Practice

Let’s put everything together in a simulated example. I’ve created a new signal that has 2 active channels, with center frequency at 20 MHz and 30 MHz. The 20 MHz channel has the same peaks as before, the 30 MHz one has 2 different peaks. As before, the inactive channels have a large noise component.

NumPy has all kinds of nice operators to manipulate multi-dimensional arrays, but my knowledge about them is thin, so the code below won’t be the most canonical way of doing things.

Split up the taps of low-pass filter h_lpf into h_poly, its polyphase decomposition:

h_poly              = np.zeros((DECIM_FACTOR, int(np.ceil(LPF_FIR_TAPS / DECIM_FACTOR))))
for phase in range(DECIM_FACTOR):
    phase_taps      = h_lpf[phase::DECIM_FACTOR]
    h_poly[phase, :len(phase_taps)] = phase_taps

Decimate the input signal into 10 different signals, each with a different phase:

signal_multi_decim  = np.zeros((DECIM_FACTOR, int(np.ceil(NR_SAMPLES/DECIM_FACTOR))))
for phase in range(DECIM_FACTOR):
    phase_decim     = signal_multi[DECIM_FACTOR-1-phase::DECIM_FACTOR]
    signal_multi_decim[phase, :len(phase_decim)] = phase_decim

Note the DECIM_FACTOR-1-phase part. It’s tempting to write phase there, but that won’t work. Ask me how I know…

For each phase, apply the decimated input signal to the corresponding polyphase sub-filter:

h_poly_out          = np.zeros((DECIM_FACTOR, len(signal_multi_decim[0])))
for phase in range(DECIM_FACTOR):
    phase_h_out     = np.convolve(signal_multi_decim[phase], h_poly[phase], mode="same")
    h_poly_out[phase, :len(phase_h_out)] = phase_h_out

For each timestep, take the 10 samples from output values of the filters, perform an IFFT, and store it as the output of 10 channels:

signal_poly_out    = np.zeros((DECIM_FACTOR, int(np.ceil(NR_SAMPLES/DECIM_FACTOR))), dtype=complex)
for m in range(len(h_poly_out[0])):
    ifft_input  = h_poly_out[:, m]
    ifft_out    = np.fft.ifft(ifft_input)
    signal_poly_out[:, m] = ifft_out
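The per-timestep loop can probably be replaced by a single call, since `np.fft.ifft` accepts an `axis` argument. The sketch below uses random stand-in data instead of the real filter outputs and only checks that both forms give the same result.

```python
import numpy as np

DECIM_FACTOR = 10
NR_OUT = 50                                              # illustrative number of time steps
rng = np.random.default_rng(3)
h_poly_out = rng.normal(size=(DECIM_FACTOR, NR_OUT))     # stand-in for the filter outputs

# Loop version: one IFFT per time step, over the 10 phases.
looped = np.zeros((DECIM_FACTOR, NR_OUT), dtype=complex)
for m in range(NR_OUT):
    looped[:, m] = np.fft.ifft(h_poly_out[:, m])

# Vectorized version: transform along the phase axis for all time steps at once.
vectorized = np.fft.ifft(h_poly_out, axis=0)

print(np.allclose(looped, vectorized))
```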

Here’s a plot with the spectra for channels 1 (noise), 2 and 3:

Success!
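Plots are convincing, but a numerical cross-check is cheap too. The sketch below is not the code from my scripts: it uses tiny made-up sizes, random taps instead of a real low-pass filter, and its own alignment convention (a causal, zero-padded convolution where branch $m$ is fed $x[iM - m]$). It compares every channel of a polyphase channelizer against the brute-force route of heterodyne, filter, decimate.

```python
import numpy as np

M = 4                        # decimation factor and number of channels (illustrative)
TAPS = 4 * M                 # prototype filter length
NR_OUT = 32                  # number of decimated output samples

rng = np.random.default_rng(0)
h = rng.normal(size=TAPS)             # stand-in for real low-pass taps
x = rng.normal(size=NR_OUT * M)       # real input signal

def direct_channel(x, h, k, M):
    """Heterodyne channel k to baseband, filter at full rate, then decimate."""
    n = np.arange(len(x))
    mixed = x * np.exp(-2j * np.pi * k * n / M)
    filtered = np.convolve(mixed, h)[:len(x)]     # causal FIR output
    return filtered[::M]

def polyphase_channelizer(x, h, M):
    """Decimate first, filter with real sub-filters, finish with an IFFT."""
    nr_out = len(x) // M
    y = np.zeros((M, nr_out))
    for m in range(M):
        h_m = h[m::M]                             # sub-filter of branch m
        x_m = np.zeros(nr_out)                    # x_m[i] = x[i*M - m], zero before x[0]
        if m == 0:
            x_m[:] = x[0::M]
        else:
            x_m[1:] = x[M - m::M][:nr_out - 1]
        y[m] = np.convolve(x_m, h_m)[:nr_out]
    return M * np.fft.ifft(y, axis=0)             # undo the 1/M scaling of ifft

channels = polyphase_channelizer(x, h, M)
max_err = max(np.max(np.abs(channels[k] - direct_channel(x, h, k, M)))
              for k in range(M))
print(max_err)   # only floating point rounding should remain
```

The brute-force path runs `M` complex filters at the input rate, while the polyphase path runs one set of real sub-filters at the output rate, yet both produce the same samples.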

Conclusion

This was a long story, but I felt that it had to be told in one go to keep all the context together.

Let’s do a step-by-step recap:

We started with a very naive implementation of a single channel downconverter.
Using a straightforward polyphase decomposition, we came up with a much more efficient design but with one major flaw: it still required a rotator that runs at the input sample rate.
With a bit of algebra, we moved that rotator to the back of the pipeline, after the decimator. No more units running at the input sample rate!
A smart choice of the sample rate allowed us to get rid of the rotator altogether.
Some more algebra allowed us to cut the number of multiplications by half and isolate all channel specific calculations to the very end of the pipeline.
With only 1 non-channel specific polyphase filter and different rotators at the back, we could expand the pipeline to support multiple channels at low extra cost.
That cost became even lower by recognizing the presence of an inverse discrete Fourier transform and using an IFFT to accelerate the calculations.

I just love when everything, like a plan, comes beautifully together.

I deliberately left out the parts of the video where harris discusses cases where channel centers have a fixed offset from where they should be. It would make this blog post even longer, but these cases are also not fully worked-out in the video. I’ll need more time to digest those parts.

The topics that have been covered in these last 3 blog posts only take around 40 min of a video that’s 90 min long. The remainder contains a bunch of interesting examples and applications for polyphase filter banks and polyphase channelizers. I want to dive into those as well.

One thing that I didn’t cover is the intuitive explanation about how polyphase channelizers work. harris talks about rotating spectra, aliased to the same baseband, that cancel each other out for different rotators. While I kind of get what he’s trying to say, the truth is that I currently don’t have the intuition that harris has, so I’ll defer to the video for that.

Many thanks to Joshua for reviewing!

References

Other blog posts in this series

Source code

GitHub - Polyphase Filtering Blog Series

Footnotes

fred harris insists on writing his name entirely in lower case. But according to this reddit comment that’s only true in the time domain. ↩
I’ve been spending weeks on this subject now: watching videos, reading books, and writing the blog posts. Doing so, I’ve become much more comfortable with the math. That’s good for me personally, but it’s ironic that this might make the blog posts less accessible for others! ↩
Edwin Armstrong was the inventor of the superheterodyne receiver that I mentioned in the previous blog post. ↩
Whether or not an FIR is linear phase depends on its coefficients, but most common methods to determine those result in a linear phase filter. ↩
For the sake of argument, I’m assuming the coefficients are programmable so that a full-fledged multiplier is needed. If the coefficients are constant, you can almost always replace a multiplier by a much cheaper combination of add and shift operations. ↩
In theory, the number of steps to complete a rotation could be a fractional number. ↩
There are multiple techniques to calculate the next point on a unit circle. The most straightforward one is to do a rotation with a fixed rotation matrix, but that will cost up to 4 multipliers, and you need to watch out for accumulating errors over time. The CORDIC algorithm is very popular and requires no multiplication, but it requires many more steps per result to achieve the desired precision. ↩
There are other references, but all of those are either papers written by harris or papers that reference one of his papers or books. ↩
I’ve only just started reading the book, but so far I really like what I see. ↩
$M(M-1)$ instead of $M^2$ because the rotator at the output of $H_0(z)$ has an exponent of 0 and thus reduces to 1. ↩
It’s inverse because there’s no $-$ sign in front of $j$. ↩

Complex Heterodynes Explained

2026-02-07T10:00:00+00:00

Introduction
Some Common DSP Notations
Sampling with 1 ADC Creates a Real Signal
Heterodyning the Signal to Baseband the Wrong Way
Complex Heterodyne to the Rescue
Filtering Away the Old Negative Image
Decimation
Final Block Diagram
Conclusion
Afterthought: the Fourier Transform is a Bunch of Averaged Complex Heterodynes
References
Footnotes

Introduction

In my previous blog post about polyphase decimation, my reason for looking at that topic was “reading up on polyphase filters and multi-rate digital signal processing”, but to be more specific, it all started by watching “Recent Interesting and Useful Enhancements of Polyphase Filter Banks”, a fantastic tutorial by Fred Harris. The video is more than 90 min long and is a lot to process when your DSP knowledge is lacking.

I’ve watched the video a few times now, and while I kind of get what he’s doing, it made me realize even more how skin-deep my DSP knowledge really is.

For example, the video talks about complex heterodynes all over the place, but I couldn’t really explain how the outcome of that operation is different from mixing an input signal with a regular, real sinusoid.

To fix this, I’m going through video sections step-by-step and blog post by blog post. The general approach is to demonstrate concepts (to myself) by implementing them in NumPy and plotting the results while limiting the number of mathematical formulas. In the process of peeling that onion, new knowledge gaps will be exposed that might not be directly relevant to the video, but if interesting enough, I’ll check those out just the same.

But that’s for the future. Let’s talk about the why and how of the complex heterodyne.

The scripts that were used to create the figures in this blog post series can be found in my polyphase_blog_series on GitHub.

Some Common DSP Notations

There are some conventions that are useful to know about. They aren’t hard and fast rules, but I’ll try to stick to them as well as I can.

$N$: the number of samples in the time domain buffer over which a certain block operation is performed.
$n$: the current time in a discrete time system. For example, $s[n]$ could be an array or sequence of input samples that come out of an ADC.
$k$: an index in a size limited set of numbers. $k$ could be used to indicate one of many channels, it could be one bin out of all the bins of a discrete time Fourier transform, etc.
$H(z)$: a discrete transfer function, usually of a filter. The fact that it’s an uppercase $H$ indicates that the function is in the z-domain, the discrete version of $H(s)$ which is in the Laplace domain, but don’t worry about those terms, it’s the last time they’ll be mentioned.
$h[n]$: the impulse response of the $H(z)$ transfer function. This is the time domain sequence that you get if you apply a 1 and then nothing but zeros to $H(z)$. Since I’ll only be discussing finite impulse response filters (FIR), $h[n]$ will be the same as the coefficients of the polynomial that describes $H(z)$.
$h[k]$: one of the polynomial coefficients of $H(z)$. For all coefficients of $H(z)$, $h[k]$ will be identical to $h[n]$. For all other values, $h[n]$ will be zero, while $h[k]$ won’t really exist. This is a pretty subtle difference and often $h[k]$ and $h[n]$ will be used interchangeably (I’ve definitely done so!), but the notation can help to make clear the intent of a formula.
$F_x$: a real world analog frequency, measured in Hz. $F_s$ is often used for the sample rate. $F_c$ could be the center frequency of a channel.
$f_x$: a normalized frequency, usually relative to the sample frequency. $f_c$ would be the ratio of $F_c / F_s$.
$\omega$ and $\theta$: both are used to indicate the rate of change of a periodic signal. But $\omega$ tends to be used more when the intent is an angular frequency, e.g. in the context of shifting the spectrum of a signal, while $\theta$ puts more emphasis on the change of an angle on the unit circle. From a pure mathematical point of view, they’re the same: $\sin(\omega n)$ is no different than $\sin(\theta n)$. One reason to use $\omega$ instead of $2 \pi / N$ is that it reduces the visual clutter when used as an argument of trigonometry functions. Compare $\sin(2 \pi n /N)$ with $\sin(\omega n)$.

I’ll try to stick to these conventions as much as possible. Feel free to reach out if you think I’m doing it wrong somewhere.

Sampling with 1 ADC Creates a Real Signal

Let’s create an input signal that’s interesting enough to demonstrate DSP theory in practice and that will trip us up if we’re doing something wrong. It’s a signal that you could get out of a real-world analog front-end with a single AD converter (ADC) that has a sampling clock of 100 MHz.

signal_pure = ( signal1_amplitude * np.sin(2 * np.pi * signal1_freq_hz * t)
              + signal2_amplitude * np.cos(2 * np.pi * signal2_freq_hz * t) )

noise_floor = np.random.normal(0.0, noise_rms, NR_SAMPLES)

oob_noise           = np.random.normal(0.0, oob_noise_rms, NR_SAMPLES)

oob_noise_cutoffs   = [ OOB_NOISE_SBF_LOW_MHZ / (SAMPLE_CLOCK_MHZ / 2.0),
                        OOB_NOISE_SBF_HIGH_MHZ / (SAMPLE_CLOCK_MHZ / 2.0) ]

oob_noise_h         = firwin( OOB_NOISE_FIR_TAPS,
                              oob_noise_cutoffs,
                              window=("kaiser", OOB_NOISE_KAISER_BETA),
                              pass_zero="bandstop" )

oob_noise_filtered  = np.convolve(oob_noise, oob_noise_h, mode="same")

signal      = signal_pure + noise_floor + oob_noise_filtered

The signal has the following components:

2 sinusoids, one at 22 MHz and one at 17 MHz. The second one has an amplitude that is 10 dB lower.

This is the signal that we’re interested in.
A tiny bit of noise across the whole spectrum

This adds a noise floor to the overall spectrum which makes it more like the real world and also makes the frequency plots more pleasing to the eye.
Out-of-band noise that is everywhere except in the frequency band where our signal lives.

This is useful to verify that we’re processing the signal the right way. If we don’t then this noise will overlap the spectrum of the signal of interest and we’d notice that right away in the spectrum plot.

In a time domain plot, we see a typical case of sinusoids interacting with each other, resulting in some kind of beat envelope. The noise is too low to be noticeable in a non-logarithmic plot.

The frequency domain amplitude plot is more interesting. There are the 2 peaks of different amplitude, a noise floor in the frequency band where our signal lives, and the more prominent out-of-band noise everywhere else.

We can also see that the negative frequency side of the spectrum is a mirror of the positive side. This is as it should be: to display the spectrum, we performed a Discrete Fourier Transformation (DFT), which I’ll sometimes, incorrectly, call the Fourier transform for brevity.

The definition of the DFT is as follows:

\[X[k] = \sum_{n=0}^{N-1}{x[n] e^{-j {2 \pi k n}/{N} } }\]

That looks intimidating, but if we use Euler’s formula, we can rewrite this as:

\[X[k] = \sum_{n=0}^{N-1}{x[n] \cos( \frac{2 \pi k n}{N} ) } - j \sum_{n=0}^{N-1}{x[n] \sin( \frac{2 \pi k n}{N} ) }\]

For a given frequency bucket $k$, we are multiplying the input signal by a cosine and by a sine. This is essentially a correlation function that calculates the extent to which the sine and cosine are part of the input signal. Since the cosine and sine have a 90 degree phase difference between them, we’re using complex notation for the final number:

\[X[k] = R + j I\]

The magnitude of each frequency component is:

\[\left| X[k] \right| = \sqrt{R^2 + I^2}\]

The phase is the angle of the complex number $R + j I$:

\[\angle{X[k]} = \arctan(\frac{I}{R})\]

If the Fourier transform is applied to a signal that doesn’t have complex samples, as is the case when there is only 1 ADC, then the Fourier transform has Hermitian symmetry: for every complex value on the positive frequency side, the corresponding negative frequency value will have the same real value $R_k$ and an inverted imaginary value $I_k$. Because of this, the amplitude is the same but the phase is inverted.

In the frequency plot above, only the amplitude is shown, hence the mirror image with identical amplitudes left and right.
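Here is a small sketch that ties these pieces together: it computes each bin as a cosine correlation minus $j$ times a sine correlation, compares that against `np.fft.fft`, and checks the Hermitian symmetry for a real input. The input is random stand-in data and `dft_bin` is a helper name I made up for the occasion.

```python
import numpy as np

N = 16
rng = np.random.default_rng(4)
x = rng.normal(size=N)                 # real samples, as from a single ADC

def dft_bin(x, k):
    """One DFT bin, written as a cosine and a sine correlation."""
    n = np.arange(len(x))
    angle = 2 * np.pi * k * n / len(x)
    R = np.sum(x * np.cos(angle))
    I = -np.sum(x * np.sin(angle))
    return R + 1j * I

X = np.array([dft_bin(x, k) for k in range(N)])

# The bin for frequency -k lives at index (N - k) % N.
X_neg = X[(-np.arange(N)) % N]

print(np.allclose(X, np.fft.fft(x)))           # same as the library FFT
print(np.allclose(X_neg, np.conj(X)))          # same real part, inverted imaginary part
print(np.allclose(np.abs(X_neg), np.abs(X)))   # hence the mirrored amplitude plot
```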

In DSP land, a signal that doesn’t have imaginary component values is called a real signal. A signal that is complex and that doesn’t have negative frequency components is an analytic signal.

A common way of saying that the sine and cosine have a 90 degree phase difference, is that they are in quadrature. It’s an extremely powerful concept that makes many DSP operations a whole lot easier, as we’ll see below.

Heterodyning the Signal to Baseband the Wrong Way

Imagine that we have multiple frequency bands or channels, that each channel has a bandwidth of 10 MHz, and that the center frequencies are at 0, 10, 20, 30 and 40 MHz. The signal that we created above would then be part of the 20 MHz channel that ranges from 15 to 25 MHz.

To process the channel, we’d like to move it from 15 MHz to 25 MHz to the baseband range of -5 MHz to 5 MHz. For our case, this means that we want the 17 MHz and 22 MHz components to end up at -3 MHz and +2 MHz resp.

Moving a channel to baseband before doing further processing allows us to use the same DSP operations no matter which channel we’ve selected. It also allows us to reduce the sample rate from 100 MHz to something much lower, thus reducing DSP resource requirements.

You can shift the spectrum of a signal by multiplying it with a sine wave. The multiplication of 2 signals is also called mixing. And mixing with the purpose of moving the spectrum of a signal is called heterodyning. In the analog world, the signal is multiplied with the sinusoidal output of a local oscillator (LO). We still need this in the virtual world of DSP math, in the form of a simulated numerically controlled oscillator, so I will keep on using the name local oscillator.

The math of heterodyning a sine wave is straightforward. Here I show how it works in the continuous time domain, but it works the same after sampling. Let’s start with signal $s(t)$ and local oscillator $l(t)$:

\[s(t)= A \cos(2 \pi f_0 t) \\ l(t)= \cos(2 \pi f_c t) \\\]

Multiply the 2 signals to get heterodyne signal $y(t)$:

\[y(t) = s(t) l(t) = A \cos(2 \pi f_0 t) \cos(2 \pi f_c t) \\\]

Use the textbook trigonometry identity:

\[\cos \alpha \cos \beta = \frac{1}{2} \big[ \cos(\alpha + \beta) + \cos(\alpha - \beta) \big]\]

We get:

\[y(t) = \frac{1}{2} A \big[ \cos(2 \pi (f_0 + f_c) t) + \cos( 2 \pi (f_0 - f_c) t) \big]\]

This tells us that multiplying a signal with frequency component $f_0$ with a sine wave with frequency $f_c$ creates a new signal with 2 frequency components, $f_0 + f_c$ and $f_0 - f_c$.

If we want to shift the center frequency of our channel from 20 MHz to 0 MHz, we need to multiply with a 20 MHz sine wave. Let’s simulate that:

lo_signal       = np.sin(2 * np.pi * lo_freq_hz * t)
signal_real_het = signal * lo_signal

This is the resulting spectrum:

That… didn’t go as we hoped.

The spectrum got shifted down by 20 MHz to 0 MHz and to -40 MHz, giving us peaks at -3 MHz and +2 MHz and at -37 MHz and -42 MHz. That’s what we wanted! But since lo_signal is a real signal, it has a mirror image at -20 MHz. This made the spectrum of the signal also shift up, to +3 MHz and -2 MHz and to 37 MHz and 42 MHz.

Instead of the desired 2 peaks in the baseband, there are now 4 peaks, at -3, -2, 2 and 3 MHz. We’ve destroyed the original signal.

Heterodyning with a real local oscillator is a common operation in the analog world, but when this is done, the heterodyne doesn’t happen to baseband but to a non-zero intermediate frequency. That is the idea of the superheterodyne receiver¹, a huge breakthrough in 1918 in the development of radio technology: it mixes the desired signal to a fixed intermediate frequency (IF), not the baseband, and does further demodulation such as AM or FM on that IF signal.

(Source: Wikipedia)

Complex Heterodyne to the Rescue

We could definitely do a superheterodyne in the digital world, but many modern modulation schemes such as QAM or OFDM rely on the ability to process the signal in the baseband.

Luckily, there is a solution. The root of our troubles is the presence of a mirror frequency image for the local oscillator. If we can get rid of one of those orange LO peaks, only one spectrum image of the signal will get heterodyned into the baseband.

This is surprisingly simple: instead of a real sinusoid, we use a complex one as local oscillator:

\[l(t) = e^{-j 2 \pi f_c t}\]

This signal only has a peak in the spectrum at $-F_c$. We’re using a negative LO frequency because we want to shift the spectrum down so that the positive image of the channel spectrum ends up at baseband. If we use $F_c$, the whole spectrum shifts up instead and the negative channel lands on baseband.

Let’s create the complex local oscillator signal and multiply it by the input signal:

complex_lo_signal   = np.exp(-1j * 2 * np.pi * lo_freq_hz * t)
signal_complex_het  = signal * complex_lo_signal

And voila:

We had to introduce complex numbers, but the result is worth it: the baseband has exactly what we want.
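The difference between the two local oscillators can also be checked without a plot. This is a stripped-down sketch, not my actual script: no noise, just the 22 MHz and 17 MHz tones, and a sample count chosen so that every tone falls exactly on an FFT bin. The helper `baseband_peaks_mhz` is made up for this example and simply lists the spectral peaks between -5 MHz and 5 MHz.

```python
import numpy as np

FS_HZ = 100e6
N = 1000                                   # 0.1 MHz per FFT bin, all tones land on a bin
t = np.arange(N) / FS_HZ

# 22 MHz tone plus a 17 MHz tone that is roughly 10 dB lower.
signal = np.sin(2 * np.pi * 22e6 * t) + 0.316 * np.cos(2 * np.pi * 17e6 * t)

lo_real = np.sin(2 * np.pi * 20e6 * t)
lo_complex = np.exp(-1j * 2 * np.pi * 20e6 * t)

def baseband_peaks_mhz(samples, limit_mhz=5.0):
    """Frequencies (in MHz) of all non-negligible bins inside the baseband."""
    spectrum = np.abs(np.fft.fft(samples))
    freqs_mhz = np.fft.fftfreq(len(samples), d=1 / FS_HZ) / 1e6
    is_peak = spectrum > 1e-6 * spectrum.max()
    in_band = np.abs(freqs_mhz) < limit_mhz
    return sorted(float(f) for f in np.round(freqs_mhz[is_peak & in_band], 1))

peaks_real = baseband_peaks_mhz(signal * lo_real)
peaks_complex = baseband_peaks_mhz(signal * lo_complex)

print(peaks_real)      # 4 peaks: both images land in the baseband
print(peaks_complex)   # 2 peaks: only the positive image ends up there
```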

Filtering Away the Old Negative Image

The only thing that’s still bothering us are the 2 peaks around -40 MHz, the negative image of the channel that used to be at -20 MHz. This needs to go if we want to lower the sample rate by decimation.

We can easily do this with a low pass FIR filter. There are multiple ways to design those, I even wrote a blog post about it.

Here, I chose the windowing method to create a steep 201 taps FIR filter with a passband of 5 MHz.

fir_cutoff  = FIR_PASSBAND_MHZ / (SAMPLE_CLOCK_MHZ / 2.0)
h_lpf       = firwin(FIR_TAPS, fir_cutoff, window=("kaiser", FIR_KAISER_BETA), pass_zero=True)

The filter is applied by doing a convolution between the heterodyned signal and the filter taps in h_lpf:

signal_het_lpf      = np.convolve(signal_complex_het, h_lpf, mode="same")

Note that the samples of signal_complex_het are complex, but the filter coefficients are real.

Here’s the result:

Decimation

The spectrum has now been reduced to the range from -5 MHz to 5 MHz. Since there is no mirror image, we can safely decimate without having to worry about aliasing, as long as we obey Nyquist by keeping the sample rate equal to or larger than the 2-sided width of the channel, which is 10 MHz. With a sample rate of 100 MHz, we can decimate by a factor of 10.

signal_decim    = signal_het_lpf[::DECIM_FACTOR]

We now have 10 times less data to deal with, but the spectrum looks just the same as before:

Success!

Final Block Diagram

Wrapping up, we arrived at the following block diagram of operations and transformations:

The analog signal is converted to a real digital signal with a single channel, 100 Msps ADC.
A mixer and a complex local oscillator heterodynes the signal to baseband. The signal is now complex.
A low pass filter removes all frequencies that don’t reside in the baseband.
A decimator brings down the sample rate from 100 MHz to 10 MHz.
The output is a complex 10 MHz sample stream.

Expressed mathematically:

\[y[m] = \big[(x[n] e^{-j 2 \pi f_c n}) * h_{\text{lpf}}[n]\big] \downarrow M \\ f_c = \frac{F_c}{F_s}, \quad n = m M\]
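The whole pipeline fits in one small function. What follows is a sketch with assumptions: `downconvert` and `lowpass_taps` are names I made up, and the filter is a plain Hamming-windowed sinc written in NumPy as a stand-in for the Kaiser-windowed `firwin` design used in the scripts. The test input is a single 22 MHz tone, which should come out at +2 MHz in the 10 MHz output stream.

```python
import numpy as np

def lowpass_taps(nr_taps, cutoff_norm):
    """Hamming-windowed sinc low-pass; cutoff_norm is in cycles per sample."""
    n = np.arange(nr_taps) - (nr_taps - 1) / 2
    taps = 2 * cutoff_norm * np.sinc(2 * cutoff_norm * n) * np.hamming(nr_taps)
    return taps / np.sum(taps)                 # unity gain at DC

def downconvert(x, f_c, h_lpf, M):
    """Complex heterodyne by normalized frequency f_c, low-pass, decimate by M."""
    n = np.arange(len(x))
    mixed = x * np.exp(-2j * np.pi * f_c * n)
    filtered = np.convolve(mixed, h_lpf, mode="same")
    return filtered[::M]

FS_HZ = 100e6
M = 10
x = np.cos(2 * np.pi * 22e6 * np.arange(2000) / FS_HZ)     # real 22 MHz tone

y = downconvert(x, f_c=20e6 / FS_HZ, h_lpf=lowpass_taps(201, 5e6 / FS_HZ), M=M)

# Locate the strongest bin of the decimated, complex output.
freqs_hz = np.fft.fftfreq(len(y), d=M / FS_HZ)
peak_mhz = freqs_hz[np.argmax(np.abs(np.fft.fft(y)))] / 1e6
print(len(y), peak_mhz)
```

Every multiplication in `downconvert` still runs at the input sample rate.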

The thing works, but is it the optimal way of doing things? My previous blog post about polyphase decimation filtering should be a hint that the answer is: definitely not.

Dealing with a complex instead of real signal doubles the number of math operations and performing the decimation at the end of the pipeline means that we’re doing a lot of math that gets thrown away.

But I have a much better understanding of complex heterodyning now, so that’s a definite win!

In a next installment, I’ll explore how this can be optimized.

Conclusion

In the Fred Harris video that started this all, complex heterodynes are everywhere and treated as a known quantity. And they’re straightforward once you get to know them better.

I used to think that dealing with signals in quadrature, representing them with complex numbers, was done primarily as a way to reduce the sample rate by half. There are certain potential cost savings to that.

But the benefits are more fundamental: they eliminate the issue of having to deal with mirror images in the spectrum.

Afterthought: the Fourier Transform is a Bunch of Averaged Complex Heterodynes

While writing this blog post, it suddenly struck me: the discrete Fourier transform is the same as doing a complex heterodyne to 0 Hz and then calculating the DC value by summing the samples, for all frequencies of interest.

Complex heterodyne:

\[y[n] = x[n] e^{-j 2 \pi f_k n}\]

DFT:

\[X[k] = \sum_{n=0}^{N-1}{x[n] e^{-j {2 \pi k n}/{N} } } \\ f_k = k / N \\ X[k] = \sum_{n=0}^{N-1}{x[n] e^{-j {2 \pi f_k n } } } \\\]
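In code, the observation looks like this. It’s a minimal sketch with random input data; `heterodyne_and_sum` is a name I picked for the illustration.

```python
import numpy as np

N = 64
rng = np.random.default_rng(5)
x = rng.normal(size=N)

def heterodyne_and_sum(x, k):
    """Shift normalized frequency f_k = k/N down to 0 Hz, then sum the samples."""
    n = np.arange(len(x))
    f_k = k / len(x)
    y = x * np.exp(-2j * np.pi * f_k * n)      # complex heterodyne
    return np.sum(y)                           # N times the DC value of y

X = np.array([heterodyne_and_sum(x, k) for k in range(N)])
print(np.allclose(X, np.fft.fft(x)))
```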

This is kind of obvious when you think about it, but I had never dealt with complex heterodynes so it’s something new for me.

References

Footnotes

If you’re wondering why it’s called ‘super’: it’s because the result of the heterodyne is a signal that is still in the supersonic frequency range, as in, above the audible frequency range. Before superheterodyne receivers, the radio signal of interest was heterodyned straight to the audio range. ↩

Notes about Basic Polyphase Decimation Filters

2026-01-25T10:00:00+00:00

Introduction
The Decimation and Anti-Aliasing FIR Filter Combo
Naive Hardware Implementation
Reduce number of calculations - Move decimator before multiplier
Polyphase decomposition of the original filter
The Noble Identity for Decimation
Reusing Common Hardware in the Fast Clock Domain
Delayed multiplications instead of delayed inputs
Conclusion
References
Footnotes

Introduction

I’ve been reading up on polyphase filters and multi-rate digital signal processing. It’s a broad topic, but as a beginner I need to start with the basics. And to better internalize those, I like to expand the generic math into concretely worked-out, smaller examples.

And if I’m going to write things down anyway, I might as well put them in a blog post, this way I know where to look if I want to review things later.

I hope the content here is useful to someone, but don’t assume that I know what I’m doing. There are hundreds of articles on the web on the same topic, so make sure to sample a bunch of them to get different perspectives.

One of the things that clicked with me while writing this is the benefit of rearranging the mathematical equations so that they reflect the hardware implementation. In the past, I’ve seen a variety of architectures for polyphase filters. I was able to understand them intuitively, but linking them to math adds an additional layer of confidence.

So that’s one of the things I’m doing here: switch back and forth between math and hardware architecture.

Update: in a later blog post, I added a section about common DSP notations. Check it out first!

The Decimation and Anti-Aliasing FIR Filter Combo

In digital signal processing (DSP), decimation is an operation in which you retain 1 out of every M samples and throw away the rest. It has the benefit of bringing the sample rate down, and thus the amount of data that flows through the system, the clock speed, the number of calculations etc. Decimation is a very common operation.

When following DSP theory, if you want to decimate a signal from a sample rate $f_s$ to a sample rate $f_s/M$, you first need to apply an anti-aliasing filter that removes all the frequency components above $f_s/(2 \cdot M)$ to make sure that the Nyquist criterion remains valid after the sample frequency has been reduced.¹

When using an FIR filter, the conceptual block diagram looks like this:

Let’s do this with a 7-tap FIR filter that has transfer function $H(z)$ and a decimation factor M of 3.

\[H(z) = h_0 + h_1 z^{-1} + h_2 z^{-2} + h_3 z^{-3} + h_4 z^{-4} + h_5 z^{-5} + h_6 z^{-6}\]

In this kind of notation, $z^{-2}$ means the input value that was delayed by 2 discrete steps. In electronics, the equation above has a delay line of 6 elements, each element is multiplied by a different value, and the results of those multiplications are added together.

Mathematically, the combination of a filter followed by a decimator is often expressed like this:

\[H(z) \; \downarrow M\]

Let’s now take the following stream of input samples

\[\cdots, x[-3], x[-2], x[-1], x[0], x[1], x[2], x[3], x[4], \cdots\]

… and apply this stream to the filter equation for multiple time steps:

\[\begin{alignedat}{0} f[0] & = h_0 x[0] &+& h_1 x[-1] &+& h_2 x[-2] &+& h_3 x[-3] &+& h_4 x[-4] &+& h_5 x[-5] &+& h_6 x[-6] \\ f[1] & = h_0 x[1] &+& h_1 x[0] &+& h_2 x[-1] &+& h_3 x[-2] &+& h_4 x[-3] &+& h_5 x[-4] &+& h_6 x[-5] \\ f[2] & = h_0 x[2] &+& h_1 x[1] &+& h_2 x[0] &+& h_3 x[-1] &+& h_4 x[-2] &+& h_5 x[-3] &+& h_6 x[-4] \\ f[3] & = h_0 x[3] &+& h_1 x[2] &+& h_2 x[1] &+& h_3 x[0] &+& h_4 x[-1] &+& h_5 x[-2] &+& h_6 x[-3] \\ f[4] & = h_0 x[4] &+& h_1 x[3] &+& h_2 x[2] &+& h_3 x[1] &+& h_4 x[0] &+& h_5 x[-1] &+& h_6 x[-2] \\ f[5] & = h_0 x[5] &+& h_1 x[4] &+& h_2 x[3] &+& h_3 x[2] &+& h_4 x[1] &+& h_5 x[0] &+& h_6 x[-1] \\ f[6] & = h_0 x[6] &+& h_1 x[5] &+& h_2 x[4] &+& h_3 x[3] &+& h_4 x[2] &+& h_5 x[1] &+& h_6 x[0] \\ f[7] & = h_0 x[7] &+& h_1 x[6] &+& h_2 x[5] &+& h_3 x[4] &+& h_4 x[3] &+& h_5 x[2] &+& h_6 x[1] \\ f[8] & = h_0 x[8] &+& h_1 x[7] &+& h_2 x[6] &+& h_3 x[5] &+& h_4 x[4] &+& h_5 x[3] &+& h_6 x[2] \\ f[9] & = h_0 x[9] &+& h_1 x[8] &+& h_2 x[7] &+& h_3 x[6] &+& h_4 x[5] &+& h_5 x[4] &+& h_6 x[3] \\ \end{alignedat}\]

Decimate by selecting 1 out of every 3 filtered samples:

\[\begin{alignedat}{0} y[0] & = f[0] \\ y[1] & = f[3] \\ y[2] & = f[6] \\ y[3] & = f[9] \\ \end{alignedat}\]

Or:

\[\begin{alignedat}{0} y[0] & = h_0 x[0] &+& h_1 x[-1] &+& h_2 x[-2] &+& h_3 x[-3] &+& h_4 x[-4] &+& h_5 x[-5] &+& h_6 x[-6] \\ y[1] & = h_0 x[3] &+& h_1 x[2] &+& h_2 x[1] &+& h_3 x[0] &+& h_4 x[-1] &+& h_5 x[-2] &+& h_6 x[-3] \\ y[2] & = h_0 x[6] &+& h_1 x[5] &+& h_2 x[4] &+& h_3 x[3] &+& h_4 x[2] &+& h_5 x[1] &+& h_6 x[0] \\ y[3] & = h_0 x[9] &+& h_1 x[8] &+& h_2 x[7] &+& h_3 x[6] &+& h_4 x[5] &+& h_5 x[4] &+& h_6 x[3] \\ \end{alignedat}\]
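To make the filter-then-decimate sequence concrete, here's a minimal Python sketch. The coefficient and sample values are made up for illustration, and samples before $x[0]$ are assumed to be zero.

```python
def fir_filter(h, x):
    """Return f[n] = sum_k h[k] * x[n - k], treating x[n] as 0 for n < 0."""
    out = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        out.append(acc)
    return out


def decimate(samples, m):
    """Keep samples 0, m, 2m, ... and drop the rest."""
    return samples[::m]


h = [1, 2, 3, 4, 5, 6, 7]   # hypothetical h_0 .. h_6
x = list(range(1, 11))      # hypothetical x[0] .. x[9]

f = fir_filter(h, x)        # one filter output per input sample
y = decimate(f, 3)          # y[0] = f[0], y[1] = f[3], y[2] = f[6], y[3] = f[9]
```

Note that `fir_filter` computes all 10 values of `f` even though only 4 of them survive the decimation.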

Naive Hardware Implementation

A straight up hardware implementation looks like this:

As mentioned before, we have 6 delay elements and 7 multipliers that operate on each stage of the delay line.

This solution is dumb: we calculate a filter output for every input clock cycle only to throw away 2 out of 3 results. Let’s do better.

Reduce number of calculations - Move decimator before multiplier

We can reduce the number of calculations by moving the decimator before the filter.

All multiplications still happen at the same time but they can now be performed in a clock domain that is 3 times slower. This definitely reduces power and also reduces the multiplication area in an ASIC process, because timing paths won’t be as strict.

While the number of multiplications per unit of time has been reduced by a factor of 3, the number of multipliers is still the same.

The data flowing through this architecture looks like this:

When you look a bit closer, you can see that pipes of input samples with the same color have the same data flowing through them: the input feed of the $h_0$ multiplier sees the same $x[3i]$ samples as the $h_3$ and the $h_6$ multipliers, it’s just that there is a delay of 1 clock cycle in the slow clock domain for each term. Similarly, the $h_1$ and $h_4$ multipliers see samples $x[3i+2]$, and the $h_2$ and $h_5$ multipliers see samples $x[3i+1]$.

Polyphase decomposition of the original filter

Let’s take the earlier equation for value $y[3]$ and decorate the 7 terms with the colors of the diagram:

\[y[3] = \color{red}{h_0 x[9]} + \color{green}{h_1 x[8]} + \color{blue}{h_2 x[7]} + \color{red}{h_3 x[6]} + \color{green}{h_4 x[5]} + \color{blue}{h_5 x[4]} + \color{red}{h_6 x[3]}\]

Now split the equation into 3 steps so that each step uses input values $x[i]$ with the same color:

\[\begin{alignedat}{0} \mathrm{tmp} &=& \color{red} {h_0 x[9]} &\;+\;& \color{red} {h_3 x[6]} &\;+\;& \color{red} {h_6 x[3]} \\ \mathrm{tmp} &=& \color{green}{h_1 x[8]} &\;+\;& \color{green}{h_4 x[5]} && &\;+\;& \mathrm{tmp} \\ y[3] &=& \color{blue} {h_2 x[7]} &\;+\;& \color{blue} {h_5 x[4]} && &\;+\;& \mathrm{tmp} \\ \end{alignedat}\]

What we’ve done here is split the 7-tap filter $H(z)$ into 3 separate sub-filters:

\[H_0(z) = h_0 + h_3 z^{-1} + h_6 z^{-2} \\ H_1(z) = h_1 + h_4 z^{-1} + h_7 z^{-2} \\ H_2(z) = h_2 + h_5 z^{-1} + h_8 z^{-2} \\\]

(In our example, $h_7$ and $h_8$ are zero.)

The equation of the original filter is now this:

\[H(z) = H_0(z^3) + z^{-1} H_1(z^3) + z^{-2} H_2(z^3)\]

This is the polyphase decomposition of the original filter.

The exponent of 3 in $z^3$ tells us that the input to each sub-filter is a decimated version, because if we substitute $z^{3}$ into the set of equations $H_i(z)$, we get:

\[H_0(z^3) = h_0 + h_3 (z^3)^{-1} + h_6 (z^3)^{-2} \\ H_1(z^3) = h_1 + h_4 (z^3)^{-1} + h_7 (z^3)^{-2} \\ H_2(z^3) = h_2 + h_5 (z^3)^{-1} + h_8 (z^3)^{-2} \\\]

Or:

\[H_0(z^3) = h_0 + h_3 z^{-3} + h_6 z^{-6} \\ H_1(z^3) = h_1 + h_4 z^{-3} + h_7 z^{-6} \\ H_2(z^3) = h_2 + h_5 z^{-3} + h_8 z^{-6} \\\]

The polyphase equation of $H(z)$ is now:

\[H(z) = (h_0 + h_3 z^{-3} + h_6 z^{-6}) + z^{-1} (h_1 + h_4 z^{-3} + h_7 z^{-6}) + z^{-2} (h_2 + h_5 z^{-3} + h_8 z^{-6})\]

Which becomes this:

\[H(z) = (h_0 + h_3 z^{-3} + h_6 z^{-6}) + (h_1 z^{-1} + h_4 z^{-4} + h_7 z^{-7}) + (h_2 z^{-2} + h_5 z^{-5} + h_8 z^{-8})\]

And after reordering the terms and setting $h_7$ and $h_8$ to zero, we’re back to the definition of $H(z)$ at the start of this blog post:

\[H(z) = h_0 + h_1 z^{-1} + h_2 z^{-2} + h_3 z^{-3} + h_4 z^{-4} + h_5 z^{-5} + h_6 z^{-6}\]
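The decomposition can be checked numerically. The sketch below splits the coefficients into sub-filters with a stride of $M$ and compares the result against a plain filter followed by a decimator. The function names and the test values are mine, and samples before $x[0]$ are assumed to be zero.

```python
def polyphase_decimate(h, x, m):
    """Compute y[n] = sum_k h[k] * x[m*n - k] using m sub-filters.

    Sub-filter p holds coefficients h[p], h[p+m], h[p+2m], ... and only ever
    sees the input samples x[m*i - p]. Samples outside x are taken as 0.
    """
    def sample(i):
        return x[i] if 0 <= i < len(x) else 0

    sub_filters = [h[p::m] for p in range(m)]   # H_0, H_1, ..., H_{m-1}
    n_out = (len(x) + m - 1) // m
    y = []
    for n in range(n_out):
        acc = 0
        for p, hp in enumerate(sub_filters):
            for j, coeff in enumerate(hp):
                acc += coeff * sample(m * (n - j) - p)
        y.append(acc)
    return y


def direct_decimate(h, x, m):
    """Reference: full-rate FIR filter followed by keeping every m-th output."""
    f = [sum(c * x[n - k] for k, c in enumerate(h) if n - k >= 0)
         for n in range(len(x))]
    return f[::m]


h = [1, 2, 3, 4, 5, 6, 7]   # hypothetical h_0 .. h_6
x = list(range(1, 11))      # hypothetical x[0] .. x[9]
y_poly = polyphase_decimate(h, x, 3)
y_ref = direct_decimate(h, x, 3)
```

With $M = 3$, the slices `h[0::3]`, `h[1::3]` and `h[2::3]` are the coefficients of $H_0$, $H_1$ and $H_2$; the missing $h_7$ and $h_8$ simply make the last two sub-filters one tap shorter.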

The Noble Identity for Decimation

Those who are studying multi-rate digital signal processing will almost certainly be confronted with the noble identities.

For decimation, the noble identity is formulated as follows²:

\[\downarrow M \: H(z) \equiv H(z^M) \: \downarrow M\]

When I first got exposed to that, I thought it was confusing, but after going through the motions of the math equations above, it started to make sense.

What it says is:

Performing a decimation and applying those samples to filter $H(z)$ is equivalent to applying the same filter to every M-th sample and then doing the decimation.
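A quick numerical check of the identity, as a sketch: `upsample_taps` (a name made up here) builds the coefficients of $H(z^M)$ by inserting $M-1$ zeros between the taps of $H(z)$. The example filter and samples are arbitrary, and samples before $x[0]$ are assumed to be zero.

```python
def fir(h, x):
    """Causal FIR filter: out[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [sum(c * x[n - k] for k, c in enumerate(h) if n - k >= 0)
            for n in range(len(x))]


def upsample_taps(h, m):
    """Coefficients of H(z^m): m - 1 zeros inserted between the taps of H(z)."""
    out = []
    for c in h:
        out.append(c)
        out.extend([0] * (m - 1))
    return out[:len(out) - (m - 1)]   # drop the trailing zeros


g = [1, 4, 7]               # hypothetical 3-tap filter H(z)
x = list(range(1, 11))      # hypothetical x[0] .. x[9]
M = 3

decimate_then_filter = fir(g, x[::M])                    # left side: decimate, then H(z)
filter_then_decimate = fir(upsample_taps(g, M), x)[::M]  # right side: H(z^M), then decimate
```

Both sides produce the same output samples, but the left side runs a 3-tap filter at the low rate while the right side runs a 7-tap filter (4 of the taps zero) at the high rate.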

Let’s look back at the polyphase decomposition of our original $H(z)$:

\[H(z) = H_0(z^3) + z^{-1} H_1(z^3) + z^{-2} H_2(z^3)\]

It’s important to note that we can’t apply the noble identity to our $H(z)$ directly, because its coefficients $h_1$, $h_2$, $h_4$ and $h_5$ are non-zero. But we can apply it to the 3 individual phases.

Like this:

\[H(z) \downarrow 3 = (\downarrow 3 \: H_0(z)) + z^{-1} (\downarrow 3 \: H_1(z)) + z^{-2} (\downarrow 3\: H_2(z))\]

Converted to a hardware diagram:

It’s not immediately obvious, but this last diagram is similar to the previous one after we’ve rearranged some items:

there’s now 1 decimator per phase instead of one per coefficient.
a single bank of 7 multipliers and one addition has been refactored into 3 banks of multipliers with addition, and then one final addition.
each multiplier-addition bank has its own delay elements.

Reusing Common Hardware in the Fast Clock Domain

In the previous diagram, it’s clear that there’s a lot of common hardware between the different phases. We can exploit that by doing everything in the fast clock domain and reuse the hardware that’s used for one phase for the other phases.

Recall the previous equation where the result was calculated in 3 steps:

Now check out this diagram:

Everything happens in the fast clock domain, but there are only 3 multipliers instead of 7 and we’re only adding 4 numbers together at any time. The only extra cost is a register to store the tmp value, and each of the multipliers has a multiplexer to rotate between different coefficients.

There is only one $y[m]$ output every 3 clock cycles.
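Here's a behavioral Python sketch of this kind of time-sharing. It is not a transcription of the diagram: the order in which the phases are visited and the moment the tmp register is cleared are my assumptions, chosen so that each input sample is consumed in the clock cycle in which it arrives. Samples before $x[0]$ are assumed to be zero.

```python
def time_multiplexed_decimator(h, x, m=3):
    """Model of a decimating FIR with ceil(len(h) / m) shared multipliers.

    On fast-clock cycle n, the multipliers see x[n], x[n-m], x[n-2m], ...
    and use coefficients h[p], h[p+m], h[p+2m], ... with p = (-n) mod m.
    Partial sums accumulate in tmp; an output is emitted when p == 0.
    """
    taps_per_phase = (len(h) + m - 1) // m
    padded = list(h) + [0] * (taps_per_phase * m - len(h))   # h_7 = h_8 = 0
    y = []
    tmp = 0
    for n in range(len(x)):
        p = (-n) % m                        # coefficient set selected by the muxes
        partial = 0
        for j in range(taps_per_phase):     # one multiplier per j
            idx = n - j * m
            if idx >= 0:
                partial += padded[p + j * m] * x[idx]
        tmp += partial                      # 3 products plus the tmp register
        if p == 0:                          # one output every m fast clocks
            y.append(tmp)
            tmp = 0
    return y


h = [1, 2, 3, 4, 5, 6, 7]   # hypothetical h_0 .. h_6
x = list(range(1, 11))      # hypothetical x[0] .. x[9]
y = time_multiplexed_decimator(h, x)
```

For these inputs the result matches what a full 7-multiplier filter followed by a decimator would produce, using only 3 multiplications per fast clock cycle.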

Here’s the same diagram annotated with intermediate values for different time steps:

Delayed multiplications instead of delayed inputs

In the previous diagram, the inputs are delayed and the multiplications summed together. But that’s not the only way to implement this.

Let’s start again from the original equation:

\[H(z) = \color{red}{h_0 z^0} + \color{green}{h_1 z^{-1}} + \color{blue}{h_2 z^{-2}} + \color{red}{h_3 z^{-3}} + \color{green}{h_4 z^{-4}} + \color{blue}{h_5 z^{-5}} + \color{red}{h_6 z^{-6}}\]

Reformat:

\[\begin{alignedat}{0} H(z) = && ( \color{red}{h_0 z^0} + \color{green}{h_1 z^{-1}} + \color{blue}{h_2 z^{-2}} ) \\ + && ( \color{red}{h_3 z^{-3}} + \color{green}{h_4 z^{-4}} + \color{blue}{h_5 z^{-5}} ) \\ + && ( \color{red}{h_6 z^{-6}} ) \\ \end{alignedat}\]

Extract common $z^{-3}$ and $z^{-6}$:

\[\begin{alignedat}{0} H(z) = && ( \color{red}{h_0 z^0} + \color{green}{h_1 z^{-1}} + \color{blue}{h_2 z^{-2}} ) \\ + && z^{-3} ( \color{red}{h_3 z^{0}} + \color{green}{h_4 z^{-1}} + \color{blue}{h_5 z^{-2}} ) \\ + && z^{-6} ( \color{red}{h_6 z^{0}} ) \\ \end{alignedat}\]

Extract common $z^{-3}$:

\[\begin{aligned} H(z) = \; & ( \color{red}{h_0 z^{0}} + \color{green}{h_1 z^{-1}} + \color{blue}{h_2 z^{-2}}) \\ & + z^{-3} \Big[ ( \color{red}{h_3 z^{0}} + \color{green}{h_4 z^{-1}} + \color{blue}{h_5 z^{-2}} ) \\ & \qquad\qquad\;\; + z^{-3}( \color{red}{h_6 z^{0}} ) \Big] \end{aligned}\]

We now have a nested structure, with a delay of 3 for each nesting level.
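Written more compactly, with group filters $G_i(z)$ (a name introduced here for illustration only), the nested form is Horner’s rule applied to a polynomial in $z^{-3}$:

\[G_i(z) = h_{3i} + h_{3i+1} z^{-1} + h_{3i+2} z^{-2}\]

\[H(z) = G_0(z) + z^{-3} \Big[ G_1(z) + z^{-3} G_2(z) \Big]\]

In our 7-tap example, $G_2(z)$ reduces to $h_6$ because $h_7$ and $h_8$ are zero.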

In hardware that looks like this:

This structure is not intrinsically worse or better than the previous one, the architecture to use will depend on the technology that you’re mapping it to. On FPGAs, for example, you should choose something that makes efficient use of the built-in pipelining registers inside their DSP blocks.

Conclusion

This only scratches the surface of polyphase filters. I didn’t even mention interpolation, which does the opposite of decimation but has very similar computational characteristics. I plan to cover additional topics in the future, especially the expansion of a polyphase filter into a polyphase filter bank. That said, I can’t promise a timeline.

References

Stackexchange - How to implement Polyphase filter?

Making a polyphase filter implementation is quite easy; given the desired coefficients for a simple FIR filter, you distribute those same coefficients in “row to column” format into the separate polyphase FIR components

Footnotes

This is not entirely true. You can also apply a bandpass anti-aliasing filter that only retains a part of the spectrum above the new sample rate, and use decimation to bring that section down to the baseband. But that’s a topic for a future blog post. ↩
$\equiv$ means “is equivalent to.” ↩

The Scenic Route to Repairing a Self-Destructing SRS DG535 Digital Delay Generator

2025-12-24T10:00:00+00:00

Introduction
The Stanford Research Systems DG535
Who Uses a Pulse Delay Generator?
Inside the DG535
The Annoying Mechanical Design of the DG535
It’s Always the Power Supply
Power Architecture of the DG535
The How, Why, and Please Don’t of Current Boost Resistor Circuits
Root Causing the DG535 Issue
Debugging the +7V Issue
Side Quest: Debugging the CPU System - Connector Stupidity
Fixing the Burnt PCB Trace
Endless Boot Loop after Reassembly
DG535 Up and Running with a Variac
Tracking down the +12/-12V on the +9/-9V Rails
LCD Replacement
Post Mortem
References
Footnotes

Introduction

I got my hands on a Stanford Research Systems DG535 at the Silicon Valley Electronics Flea Market, $40 for a device that was marked “X Dead”.

That’s a really good deal: SRS products are pricey and even the cheapest Parts-Only listings on eBay are $750 and up. Worst case, I’d get a few weekends of unsuccessful repair entertainment out of it, but even then I’d probably be able to recoup my money by selling pieces for parts.¹ Just the keyboard PCB is currently selling for $150².

It doesn’t matter how broken they are, the first step after acquiring a new toy is cleaning up years of accumulated asset tracking labels, coffee stains, finger grime and glue residue. This one cleaned up nicely; the front panel is pretty much flawless:

After an initial failed magic smoke repair attempt, the unit went back to the garage for 18 months, but last week I finally got around to giving it the attention it deserves.

The repair was successful, and when you only look at the end result, it was a straightforward replacement of a diode bridge and LCD panel. However, the road to get there was long and winding. The broken power architecture and awkward mechanical design of the SRS DG535 made it way too easy to damage the device because I was trying to repair it.

So let’s get this advice out of the way first:

Do NOT power on the device with the analog PCB disconnected. It will almost certainly self-destruct with burnt PCB traces.

The details will be explained further below.

The Stanford Research Systems DG535

Conceptually, the purpose of the DG535 is straightforward: it’s a tool that takes in an input trigger pulse and generates 4 output pulses after some programmable delay. What makes things interesting is that these delays can be specified with a 5 ps precision, though the jitter on the outputs far exceeds that number.

The DG535 has 9 outputs on the front panel:

T0 marks the start of a timing interval. You’ll most likely use it when you use the device with an internal trigger to know when a timing sequence has started. There is a delay of around 85 ns between the external trigger and T0.
4 channels A, B, C and D can independently be configured to change a programmable time after T0 or after some of the other channels.
Output AB is a pulse for the interval between the time set for A and B. It’s an XOR between those 2 channels. -AB is the inverse of output AB. CD and -CD are the same for channels C and D.

All outputs support a number of logic standards: TTL, ECL, NIM³, or fully programmable voltage amplitude and offset.

Settings can be entered through the front panel or through a GPIB interface that is available at the back of the device.

In addition to the GPIB interface, the back has another set of T0/A/B/C/D outputs because my unit is equipped with option 02. These outputs are not an identical copy of the ones in the front: their amplitudes can go from -32V to 32V when terminated by a 50 Ohm impedance and each output has a pulse width of roughly 1 us.

There is also a connector and a switch to select either the internal or an external 10 MHz timebase. Missing screws around the transformer housing are an indication that I’m not the first one who has been inside to repair the unit.

This 1993 ad lists the DG535 for $3500. It is currently still for sale on the SRS website for $4495, remarkable for an instrument that dates from the mid 1980s. I assume that today’s buyers are primarily those who need an exact replacement for an existing, certified setup, because the DG645, SRS’s more modern successor with better features and specs, costs only $500 more.

Who Uses a Pulse Delay Generator?

Anyone who has a setup where multiple pieces of test or lab equipment need to work together with a strictly timed sequence.

When you google for applications where the DG535 is used, you get a long list of PhD theses, national or military laboratory documents, optical setups with lasers and so on. Look closer at the first picture of this blog post and you can see that mine was used by Chemical Dynamics in a molecular beam setup… whatever that is.

Here are just a few examples:

Combustion and Flame - A comprehensive study on dynamics of flames in a nanosecond pulsed discharge. Part II: Plasma-assisted ammonia and methane combustion

We employed a delay generator (SRS system, DG535) to control the timing of the plasma and measurement systems. The DG535 generator was externally triggered by the pre-triggering signal from the laser system and then sent sequential TTL signals to trigger the ns pulse generator and camera.
Astigmatism-free 3D Optical Tweezer Control for Rapid Atom Rearrangement

Images were taken at delayed time steps (250-ns shutter, SRS DG535) as the translation stage was stepped from Z min = − 24.5 mm to Z max = 24.5 mm.
Rapid elemental imaging of copper-bearing critical ores using laser-induced breakdown spectroscopy coupled with PCA and PLS-DA

A delay generator (SRS DG-535) synchronized the laser and detection systems to capture time-integrated spectra at each point.
High-precision Gravity Measurements Using Atom-Interferometry

The timing of the pulses is controlled by a set of synchronized pulse generators (SRS DG535), one of which also triggers all the hardware involved in generating the Raman frequencies.
Physics of Plasmas - Laboratory generation of multiple periodic arrays of Alfvénic vortices

Each antenna was switched on with a pulse generator (Stanford SRS-DG535), which then activated two arbitrary waveform generators (Agilent 3322A).
Liquid-to-gas transfer of sodium in a liquid cathode glow discharge

The laser system, operating at a 200 Hz repetition rate, was synchronized with the plasma discharge using a SRS DG535 delay generator, allowing time-resolved measurements of Na fluorescence during and after the discharge pulse.

At this time, I don’t have a use for a pulse delay generator, but as a hobbyist it’s important to keep the following in mind:

We buy test equipment NOT because we need it, but because one day we might need it.

I don’t see a future where I’ll be doing high-precision gravity measurements in my garage, but a DG535 could be useful to precisely time a voltage glitching pulse when trying to break the security of a microcontroller, for example.

Inside the DG535

It’s not complicated to create a pulse delay generator as long as delay precision and jitter requirements are larger than the clock period of the internal digital logic: a simple digital counter will do. But when the timing precision is smaller than the clock period, you need some analog wizardry to make it happen.

SRS includes detailed schematics and theory of operation for many of their products and the DG535 is no exception. It’s a great way to study and learn how non-trivial problems were solved 40 years ago.

The DG535 takes a combined digital/analog approach to create delays of up to 1000 s. With an 80 MHz internal clock, the digital delay can be specified with 12.5 ns of precision. The remainder is handled by two analog circuits: the jitter circuit measures the delay between the start of the external trigger and the next rising edge of the 80 MHz clock. The analog delay circuit creates a delay between 0 and 12.5 ns after the digital delay has expired. Channels A/B/C/D each have their own instance of the analog delay circuit.

I will leave the low level details to a future blog post, but at their core, both the jitter and analog delay circuit work by precharging and discharging a capacitor with a constant current source for a time that varies between 0 and 12.5 ns. Precharging the analog delay capacitor is controlled by a 12-bit DAC. If you were wondering where the 5 ps of precision limit is coming from: 12.5 ns / (2^12) = 3 ps. Close enough!
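The arithmetic can be sketched in a few lines of Python. This is an idealized model, not the DG535 firmware: `split_delay` is a hypothetical helper that ignores calibration and the jitter circuit, and simply splits a requested delay into whole 80 MHz clock periods plus a 12-bit DAC code for the remainder. Integer femtoseconds are used to avoid floating-point rounding.

```python
CLOCK_PERIOD_FS = 12_500_000   # 1 / 80 MHz = 12.5 ns, in femtoseconds
DAC_CODES = 2**12              # 12-bit DAC, from the text

# Size of one DAC step in picoseconds: 12.5 ns / 4096, about 3.05 ps.
dac_step_ps = CLOCK_PERIOD_FS / DAC_CODES / 1000


def split_delay(delay_fs):
    """Split a delay into counter ticks and a DAC code (hypothetical model)."""
    ticks, remainder_fs = divmod(delay_fs, CLOCK_PERIOD_FS)
    dac_code = remainder_fs * DAC_CODES // CLOCK_PERIOD_FS
    return ticks, dac_code


# A 100.005 ns delay: 8 clock periods (100 ns) plus 5 ps of analog delay.
ticks, dac_code = split_delay(100_005_000)
```

In this model, a 5 ps step in the requested delay moves the DAC code by 1 or 2 counts, which is why a 12-bit DAC is enough to cover the advertised resolution.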

Using a capacitor to measure time with higher precision than the digital clock is called “analog interpolation”. It’s often used by time interval and frequency counters such as the SRS SR620. I briefly touch on this in my blog post about linear regression in frequency counters.

The Annoying Mechanical Design of the DG535

In my blog post about the SR620, I comment on a mechanical design that gives full access to all components by just removing the top and bottom cover. The DG535 is a different story.

While the covers are just as easy to remove, the functionality is spread over 2 large PCBs, mounted with components facing inwards, and connected with a bunch of cables that are too short to allow separating the PCBs.

SRS was clearly aware that this PCB arrangement makes the unit harder to repair, because they helpfully added component designators and even component name annotations on the solder side of the PCB, though, sadly, there are no dots to mark pin 1 of an IC.⁴

Most cables have connectors and can easily be unplugged, but not all of them.

The red and orange wires in the picture above deliver +20 and -20V from the top PCB to the OPT02 PCB that is mounted below the bottom PCB. They are just long enough. If you want to take the unit apart, your only choice is desoldering these wires. It’s not rocket science, but… really? You also need to desolder the wires that power the cooling fan.

Enough whining… for now. When all wires are desoldered, connectors disconnected and screws removed, you can fold open the top PCB from the rest of the unit and get a full view of the inner components:

We can see:

a top PCB that contains a Z80-based controller and the counters that are used for the digital delay generation
a bottom PCB with the rest of the delay and output driver circuitry
the front has a generic LCD panel and a keyboard and LED PCB

It’s Always the Power Supply

Before taking it apart, I had already powered up the device and nothing happened: the LEDs and the LCD screen were dead, only the fan spun up. No matter what state a device is in, you always have to make sure first that power rails are functional.

The power architecture is split between the top and bottom PCB, but the two secondary windings of the power transformer first go to the top PCB. Since the transformer is located at the bottom, you always need to keep the top and bottom PCBs close together if you want to make live measurements.

On the schematic, we can see the output of integrated full-bridge rectifier BR601 go to linear regulators U601 and U503 to create +15V and -15V and then immediately to the connector on the right which goes to the bottom PCB. These voltage rails are not used by the top PCB.

A discrete diode bridge and some capacitors create an unregulated +/-9V that goes to the same connector and to U501 / LM340-5, a linear +5V regulator that is functionally equivalent to a 7805. The 5V is used to power pretty much the entire top PCB as well as some ICs on the bottom.

I measured the following voltages on the top-to-bottom power connector:

0V - instead of 10V
+15V - good!
+12V - instead of +9V
+7V - instead of +5V. Horrible!
GND
0V - instead of -9V
-15V - good!

The lack of 10V is easy to explain: it’s an input, generated by a high precision voltage reference on the bottom PCB out of the +15V. On the top PCB, it’s only used for dying gasp⁵ and power-on/off reset generation.

+12V instead of +9V was only a little bit concerning, at the time. The lack of -9V was clearly a problem. And applying +7V instead of +5V to all digital logic is a great way to destroy all digital logic ICs.

Here’s the part of the PCB with the 9V diode bridge:

Observations:

the discrete diodes look like a bodge
marked in red, there is a blackened spot above-right of the diodes
there is a green patch wire. There are quite a few of those on the top PCB and they turned out to be harmless; they work around bugs in the PCB itself.

2 discrete diodes were on the other side of the PCB to complete the discrete full bridge, though one soon fell off. Underneath the discrete diodes is a footprint for a BR501 full bridge rectifier that is not in the schematic⁶!

While doing these measurements, magic smoke appeared at the same location as the blackened spot in the picture. At that point, I called it quits and let the unit sit for 18 months.

The schematic shows jumpers on the +/-15V and the +5V rail, see the orange rectangle in the previous picture. These are intended for power measurements, but when removed they also disconnect the not-at-all-5V rail from the digital logic and thus protect it from further damage until I had sorted out the issue.

Power Architecture of the DG535

Since I suspected a problem with the discrete diode bridge bodge on the top PCB, the plan was to repopulate the PCB with an integrated full-bridge rectifier. Turns out: even though the schematic in the manual shows a discrete bridge, the schematic description in the same manual indeed talks about an integrated full bridge. Instead of buying one at Digikey and paying $7 for shipping a $1 component, I found a suitable 100V/2A alternative, a 2KBP01M, at Anchor Electronics, the last remaining Silicon Valley retail components supplier, conveniently located across the street from work.

The footprint of the new diode bridge wasn’t quite the same, but you can easily nudge the pins a bit to make it work.

I then had a look at the schematic of the bottom PCB power supply:

More linear regulators, 2 on the unregulated +8V (?) rail to create +6V and +5.2V, and 3 on the unregulated -8V rail to create -2V, -5.2V, and -6V (“actually -5.6V”).

Now here’s the interesting part: the -2V and -5.2V rails have heavy duty 5W 18 Ohm and 10 Ohm resistors between the input and the output of their linear regulator.

These are called current boost resistors and while they are useful in the right conditions, they can be bad news. And when we go back to the top PCB, here’s what we see:

It may not be in the schematic, but located right next to the 5V regulator is another 10 Ohm 5W current boost resistor.

The How, Why, and Please Don’t of Current Boost Resistor Circuits

The purpose of a current boost resistor is to partially offload a linear regulator.

Imagine we have a design with a 5V rail and a load with an equivalent resistance of 3 Ohm, good for a constant current draw of 1.67A. When we only use a linear regulator with 9V on the input side, the current through the regulator will be 1.67A as well and the regulator needs to dissipate (9-5) * 1.67 = 6.7 W. That is too much for a 7805 in a TO-220 package to handle: with the right heatsink, 1.5A is about the limit.

With a 10 Ohm current boost resistor, the 7805 still supplies current to keep the voltage across the load at 5V, but the current boost resistor injects a constant current of (9-5)/10 = 0.4A. This reduces the current through the regulator from 1.67 A to 1.27 A and its power dissipation from 6.7W to 5.1W. The dissipation in the resistor is (9-5)^2 / 10 = 1.6 W. The total power consumption remains the same: 5.1 W + 1.6 W = 6.7 W.
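The same numbers fall out of a small Python model. It assumes an ideal regulator that holds the output at exactly 5V and ignores its quiescent current; the function name and dictionary keys are made up for this sketch.

```python
def boost_resistor_split(v_in, v_out, r_load, r_boost):
    """Currents (A) and dissipation (W) for a regulator bypassed by r_boost.

    Assumes an ideal regulator that holds v_out, which is only possible when
    the load draws at least as much current as the boost resistor injects.
    """
    i_load = v_out / r_load
    i_boost = (v_in - v_out) / r_boost
    i_reg = i_load - i_boost
    if i_reg < 0:
        raise ValueError("load too light: the regulator cannot sink current")
    drop = v_in - v_out
    return {
        "i_load": i_load,
        "i_boost": i_boost,
        "i_reg": i_reg,
        "p_reg": drop * i_reg,      # dissipated in the regulator
        "p_boost": drop * i_boost,  # dissipated in the boost resistor
    }


result = boost_resistor_split(v_in=9.0, v_out=5.0, r_load=3.0, r_boost=10.0)
```

For the values in the text, `result` gives roughly 1.27 A and 5.1 W for the regulator and 0.4 A and 1.6 W for the resistor.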

What have we gained? For the price of adding a beefy 10 Ohm resistor, we’re now staying within the current and power limits of the 7805 in a TO-220 package. There is no need to upgrade the 7805 to a much larger TO-3 package and the changes to the PCB are minimal.

But there is a price to pay! In fact, there’s more than one.

Overvoltage risk when system load goes down

A linear regulator can only supply current from input to output; it can’t sink current from output to input. If the system load drops below the 0.4A that’s supposed to be supplied by the current boost resistor, that 0.4A has nowhere to go and the voltage at the output of the regulator has to go up.

We can see that here:

Assume that the system load has reduced and the equivalent system resistance is now 90 Ohm instead of 3 Ohm. The current through the 2 resistors is just 0.09A. The voltage at the 7805 output node is 8.1V and there is nothing the 7805 can do to bring the voltage down.
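As a sketch, the rail voltage can be modeled as the larger of the regulator setpoint and the plain resistor divider formed by the boost resistor and the load. This assumes an ideal regulator that can source but never sink current; `rail_voltage` is a name made up for this example.

```python
def rail_voltage(v_in, v_reg, r_load, r_boost):
    """Output voltage of a source-only regulator bypassed by r_boost."""
    # Voltage the boost resistor and the load would settle at by themselves.
    v_divider = v_in * r_load / (r_load + r_boost)
    # The regulator can pull the rail up to v_reg, but never down to it.
    return max(v_reg, v_divider)


v_light = rail_voltage(v_in=9.0, v_reg=5.0, r_load=90.0, r_boost=10.0)   # 8.1 V
v_heavy = rail_voltage(v_in=9.0, v_reg=5.0, r_load=3.0, r_boost=10.0)    # 5.0 V
```

With these values, regulation is lost as soon as the load resistance rises above 12.5 Ohm, which is the point where the load draws less than the 0.4A that the resistor supplies.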

No safeguards when input voltage goes up

Another issue is when the input voltage increases. In the example below, it goes from +9V to +12V. The power dissipation in the 7805 goes up a little bit, but the one in the current boost resistor increases from 1.6W to 4.9W.
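Worked out for both input voltages, assuming the same 10 Ohm resistor and, for the regulator, the same 3 Ohm load as in the earlier example:

\[P_R = \frac{(V_{in} - 5)^2}{10}: \quad \frac{(9-5)^2}{10} = 1.6\,\mathrm{W} \quad\longrightarrow\quad \frac{(12-5)^2}{10} = 4.9\,\mathrm{W}\]

\[P_{reg} = (V_{in} - 5) \left( \frac{5}{3} - \frac{V_{in} - 5}{10} \right): \quad 4 \cdot 1.27 \approx 5.1\,\mathrm{W} \quad\longrightarrow\quad 7 \cdot 0.97 \approx 6.8\,\mathrm{W}\]

The resistor’s dissipation grows with the square of the voltage across it, so it roughly triples and ends up right against its 5W rating.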

The +12V that I measured on one of the connectors is more than just a little bit concerning after all.

Even without the current boost resistor, +12V at the input would be a real problem, since all the power of the resistor would have to be dissipated by the regulator. But with only a regulator, there is at least the possibility of including safeguards: there could be a current limiter or a temperature monitor, or, worst case, the regulator burns out and disconnects the output from the input. With a dumb resistor you have none of that.

In my 5370A repair blog post, I describe the current limiters that are part of its discrete linear voltage regulators: when the current is too high, the output voltage is reduced. The DG535 has no such safety mechanism.

Root Causing the DG535 Issue

Let’s recap the issues that I had to deal with:

+7V on the +5V rail
+12V instead of +9V at the input of the 7805 regulator
A blackened PCB

These issues were all related.

Debugging the +7V Issue

The +7V could be explained by the current boost resistor and a load that was too low. If the load is too low anyway, why not temporarily desolder the current boost resistor and check what happens? I did that and the voltage on the +5V rail predictably dropped down to +5V. The temperature on the 7805 remained in check. Good!

But why was the load too low?

A quick probe on the pins of the Z80 CPU showed no activity. Better yet: there was no clock!

The 5 MHz CPU clock is derived from the 10 MHz clock, which comes from connector J40: the cable that connects the top and bottom PCB. In other words: if you run the top PCB by itself, there is no clock. And without a clock, the power consumption of the CPU system will be much lower… and with a current boost resistor, the voltage will rise to +7V.

To run the CPU board stand-alone with an active clock, I configured my HP 33120A signal generator to generate a 10 MHz signal and routed its SYNC output to connector J40.

In the picture above, in addition to the signal generator, you can also see an HP 3631A power supply that outputs 10V: this is a replacement of the reference voltage that’s needed for the dying gasp and reset generator that I mentioned earlier. These are the 2 external signals that are needed to run the CPU top PCB without the analog bottom PCB, though only for a short time: without current boost resistor and cooling fan, the 7805 was now taking on all the current and warming up quickly.

Important: The +12V issue was still there! As soon as the current boost resistor was placed back, it was dissipating 5W and its temperature rose to 130C almost instantly!!!

Side Quest: Debugging the CPU System - Connector Stupidity

With the CPU clock running, I expected some activity on the keyboard/LED and LCD boards, but the CPU seemed stuck.

It took a lot of effort to root cause this. I dumped the ROM contents⁷ and used Ghidra to disassemble the code. I also used a logic analyzer to trace the Z80 address bus to get a better insight into what was happening, resulting in this pretty picture:

After many hours, the simple conclusion was this: the connector of the LCD panel cable was plugged in incorrectly. This pulled high a crucial status bit on the data bus which made the Z80 go into an endless loop.

I partially blame SRS for this: the way they deal with connector-related documentation is horrible, unconventional, and inconsistent. Just look at this beauty:

At the bottom right (red), they lay out a pinout convention. The keyboard/LED PCB (green) doesn’t follow that convention. The LCD panel display does follow it, but this is a standard 14-pin interface that’s used by an HD44780-based LCD controller which uses an entirely different convention. They also don’t consistently mark pin 1 on the PCB.

Still, even after fixing that, the LCD didn’t come up. This turned out to be due to another signal that came from the bottom PCB, the analog voltage that sets the LCD contrast. It was sufficient to connect that to ground. That’s the blue wire that the red arrow is pointing to:

The LCD was working now, but without backlight. The backlight of the original LCD panel requires 120V AC with a 50 kOhm resistor in series. This voltage is coming straight from a primary winding of the transformer. I measured 120V just fine, so the backlight was broken. It doesn’t make the display entirely unreadable, but it’s definitely annoying.

Fixing the Burnt PCB Trace

When I measured the voltages at the start of this journey, I noticed that the -9V was missing on the power connector towards the analog PCB. The trace to this connector is running below the overheating current boost resistor. All I needed to do was install a replacement wire.

Endless Boot Loop after Reassembly

After going through the pain of reassembling the whole unit, I was hopeful that I’d be able to at least operate the keyboard and see things happening on the LCD. That, of course, didn’t happen. Instead, the unit got into an endless boot loop, showing the splash screen, then going blank, repeat.

This cost me another couple of hours to root cause. I disconnected all wires to the top PCB to revert back to the condition where things worked before, but no luck. Eventually, I stumbled onto the “Cold Boot” section in the user manual:

If the instrument turns on, but is completely unresponsive to the keyboard, then the RAM contents may have been corrupted causing the instrument to “hang”. To remedy this situation, turn the unit off, then hold down the BSP (backspace) key down and turn the unit back on again.

Like many old pieces of test equipment, the DG535 uses a BR-2/3A 3V lithium battery to retain settings and calibration values while the unit is powered down. The battery was still good when I measured its voltage, but maybe there was a short circuit during reassembly that made the SRAM lose its contents.

Either way, after following the power-up procedure from the manual the unit worked again.

Note that it’s not necessary to go through a full recalibration after losing the SRAM contents: the EPROM that holds the firmware also contains calibration constants and the serial number that are unique to each unit. That’s pretty cool! The calibration constants are guarded by a checksum to ensure their correctness. What’s puzzling is that the firmware checks the correctness, but when it detects an error, instead of reporting a meaningful error, it does a system reset and retries again, leaving the operator to guess what went wrong.

DG535 Up and Running with a Variac

I still hadn’t tracked down the +12V/-12V on the +9V/-9V rails, but with everything else fixed, I wondered if I could get the full unit to work. Just a few flea markets ago, I had picked up a variac for $15. I always wondered why people need such a thing, and wouldn’t you know it, this was the perfect use case: reduce the mains voltage from 120V AC to ~100V AC to bring down the voltage on the secondary windings of the transformer.

And just like that, the DG535 was working!

With my SR620 time interval counter and averaging a lot of measurements, I was even able to show that delays could be changed with 5 ps precision.

I measured a power consumption of 62W, not too far away from the 70W that’s specified in the manual, which is just a case of being conservative. Right?

Tracking down the +12/-12V on the +9/-9V Rails

I once again spent a long time trying to track down the 12V vs 9V issue. My only theories were a short somewhere in the transformer, or some wires misconnected during an earlier repair, or the original transformer being replaced by an incorrect one, but extensive and sometimes questionable measurement practices didn’t turn up anything.

Other than secondary winding voltages being too high, the transformer behaved fine.

I started a thread on the EEVblog forum about rewinding a transformer where someone suggested that the output voltage of a transformer can be… load dependent. When only the CPU board was connected, I had measured an overall power consumption of 10W, 60W below specification.

I removed the variac from the setup and measured a power consumption of 72W. The measured voltage on the +9V rail was +10.2V. Enough to raise the power consumption in the 5V current boost resistor from 1.6W to 2.7W, but well within spec of its 5W rating.

The +12V issue was another manifestation of the lack of load resulting in a self-destructing unit! And I had been chasing another ghost.

LCD Replacement

With the unit now fully working, all that remained was fixing the LCD backlight. SRS sells a replacement LCD panel for a ridiculous $200. This must be old stock because you can’t find ones anymore with a 120VAC backlight power supply.

Instead, I bought a CFAH2001B-TMI-ET panel from crystalfontz.com.

It has a 16 pin instead of 14 pin interface, but the 2 additional pins are for the backlight. The original LCD has separate pins for that.

The backlight has an LED with a threshold voltage of 3.5V. The typical current is 48mA. The LED connector has a 5V pin already, but the top PCB creates this voltage rail with a 5.1V zener diode and a series resistor from the +15V rail.⁸ This rail can’t supply 48mA. Instead, I used the +5V pin of the keyboard/LCD PCB nearby, with a 30 Ohm resistor in series, good for a current of (5V - 3.5V) / 30 Ohm = 50 mA.
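The series resistor arithmetic can be double-checked with a few lines of Python. This is just a minimal sketch that treats the LED as a fixed 3.5V drop, which is a simplification; the function names are made up for this example.

```python
def led_current_ma(supply_v: float, forward_v: float, series_ohm: float) -> float:
    """LED current in mA for a supply, a fixed LED voltage drop and a series resistor."""
    return (supply_v - forward_v) / series_ohm * 1000.0


def series_resistor_ohm(supply_v: float, forward_v: float, target_ma: float) -> float:
    """Series resistance in Ohm that gives the target LED current."""
    return (supply_v - forward_v) / (target_ma / 1000.0)


# Values from the text: 5V rail, 3.5V LED threshold, 30 Ohm resistor, 48 mA typical.
current_ma = led_current_ma(5.0, 3.5, 30.0)
ideal_ohm = series_resistor_ohm(5.0, 3.5, 48.0)

print(f"30 Ohm gives {current_ma:.1f} mA")
print(f"48 mA would need {ideal_ohm:.2f} Ohm")
```

The resistance for exactly 48 mA works out to a bit more than 30 Ohm, so a 30 Ohm resistor lands slightly above the typical current.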

The new LCD panel is considerably thicker than the old one, so you can’t reuse the old screws.

I used Everbilt #4-40 3/8” machine screws from Home Depot instead. Be careful when tightening those new screws: it’s now possible to overdo things and bend the LCD PCB.

My unit had only 2 out of 4 transformer mounting screws in place. Home Depot didn’t have the #10-32 1 5/8” screws, but slightly shorter #10-32 1 1/2” screws worked fine.

After one more round of carefully connecting all connectors back in place, the DG535 was finally back to where it needed to be:

Post Mortem

A bunch of things went wrong during the design and repair of this DG535.

Design weaknesses:

Current boost resistors make a design prone to self-destruction due to overvoltage when the system load is too low due to some internal failure.
Current boost resistors also result in burning out a PCB when the voltage difference between input and output of a voltage regulator becomes too high. This can again happen when the system load is lower than designed for.
The schematic in the manual shows a discrete diode full bridge for the unregulated +/-9V rail, instead of an integrated one, and no current boost resistor.
The mechanical design and short cables make it tempting to power the top PCB without connecting the bottom PCB… which cuts down the system load dramatically.
The power consumption of the top PCB is very low when the bottom PCB is disconnected, due to the lack of 10 MHz clock.
The pinout of the connectors of the DG535 doesn’t follow standard convention, and the convention that is documented in the manual is violated on the same page.
The schematic of the top PCB shows a +/-9V rail. The bottom PCB schematic shows +/-8V rails on the same connector pins. In reality, the measured voltage is 10.2V. Confusing.

Repair mistakes:

A previous attempt at repairing saw the replacement of an integrated diode bridge by a discrete one. To make things worse, they used 1N5822 Schottky diodes, as shown in the incorrect schematic. Schottky diodes have a threshold voltage of 0.4V instead of a 0.7V threshold for the integrated diode bridge. Because two diodes of a full bridge conduct at any given time, the unregulated DC output was 2 x (0.7V - 0.4V) = 0.6V higher, which increased the power consumption in the current boost resistors even more!
PCBs were powered on without full load. This resulted in PCB traces burning up.
Connectors were incorrectly plugged in. I should have taken pictures before disconnecting anything.
I didn’t know enough about transformers and wasted way too much time chasing a ghost because of it!

In the end, I only made 3 real fixes:

removed the discrete diode bridge and replaced it by an integrated one
installed a bodge wire to bring the -9V to the top-to-bottom PCB power connector
replaced the LCD panel with the broken backlight by a new one with an LED backlight

I got lucky that the 5V digital components survived being exposed to 7V. One thing that I’ve learned over the years is that old ICs are pretty good at surviving that kind of abuse.

References

Footnotes

Not that I’ve ever done that, but it’s what I tell my wife. ↩
Whether or not it will ever sell for that asking price is a different story. ↩
NIM stands for Nuclear Instrumentation Module. It’s a voltage and current standard for fast digital pulses for physics and nuclear experiments. ↩
When dealing with the mirror image of an IC footprint, I’m constantly second guessing myself about whether or not I’m probing the right pin. ↩
When the +9V voltage rail drops below +7.5V, the dying gasp circuit creates a non-maskable interrupt to the CPU, allowing it to quickly store data in non-volatile RAM before the power is completely gone. ↩
I emailed SRS to ask if they had an updated schematic, but they told me to send in the unit for repair. ↩
The ROM contents of each DG535 are unique for that particular unit, since they contain the serial number and calibration data that were determined in the factory. If you program the EPROM in your unit with my ROM file, expect the delay specifications to be significantly worse. ↩
I have no idea why SRS didn’t use the regular +5V rail to power the LCD panel. ↩

Fixing LCD Screen Corruption of a Tektronix TDS220 Oscilloscope

2025-11-03T10:00:00+00:00

Introduction
The TDS220 Oscilloscope
Opening Up the TDS220
Common TDS220 issues
Replacing the power supply capacitors
LCD Panel Corruption
Extracting the LCD Panel
LCD Panel Capacitor Replacement
LCD Panel Backlight Replacement
Fixing the Square Wave Issue
Conclusion
References
Footnotes

Introduction

I found a Tektronix TDS220 oscilloscope at the Silicon Valley Electronics Flea Market. The seller told me that it worked but that the screen flickered a bit and that this model is known to have issues with leaking capacitors. He asked $25 which would be a great price for any evening of entertainment even if an oscilloscope wasn’t part of the deal, so I bought it.

Wise men claim that you should not power up an old device with leaking capacitors, but I obviously did that anyway. The scope booted up nicely with some occasional screen corruption, as promised.

This video gives a better idea about the corruption. It’s intermittent and depends on the kind of content that is shown on the screen. It’s also less prevalent when the scope has warmed up. All in all, it’s not a deal breaker, the scope is perfectly usable as is, but it would be nice to fix it.¹

When connected to a signal generator it showed 2 sine waves:

But when I connected the probe to the probe compensation pin, I got the signal below instead of a square wave:

The scope had this issue for both channels.

Alright, maybe I’d get more than just an evening of fun out of it.

The TDS220 Oscilloscope

The TDS220 was introduced in 1997. It was a low cost oscilloscope with a limited number of features, but with a weight of just 1.5kg/3.25lb and a small size, it was great for technicians and for educational use. I’m not sure if it was Tektronix’ first oscilloscope with an LCD but, if not, it was definitely one of the early ones.

Some key characteristics:

2 channels
100 MHz/1 Gsps
2500 sample points per channel
Only a few measurements: period, frequency, cycle RMS, mean and peak-to-peak voltage

With a plug-in extension board, you can add a parallel, serial and GPIB port and FFT functionality, but even with those, it’s a really bare bones scope. And yet, I expect that I’ll be using it quite a bit: it’s so portable and the footprint is so small that it’s perfect for a quick measurement on a busy workbench.

Let’s take it apart!

Opening Up the TDS220

Opening up the TDS220 isn’t hard, but you need to do the steps in the right order and there’s a bit of bending-the-plastic involved.

Remove handle and power button

The handle must lay flat against the case to widen it and remove it. Also pull off the white knob.

Remove the 2 screws

Once the handle is removed, you get access to 2 screws, one on each side. Remove them with a Torx 15 screwdriver.

Remove expansion module

If you have a TDS2CM or TDS2MM expansion module, you need to remove it because otherwise it will block the case from coming off.

It took me longer than I care to admit to figure out how to do this. There is no need to play with the tab at the top of the module, just forcefully slide the thing upwards until it disconnects from the connector at the bottom.

Pry off the back case

This is the part that I always hate, because you need to figure out which location is the best to jam a screwdriver between 2 pieces of plastic. And based on the scuff marks in the picture below, others have struggled with it as well.

But I think I found the best way to go about it now. At the right side, insert the screwdriver horizontally between the blue and the white plastic and then lift the blue part. Insert a smaller screwdriver in the gap that you just made to prevent it from closing again and repeat the same operation in the middle and the left.

Inside exposed

You can now take off the blue back cover and have a look at the inside of the scope.


There are 2 PCBs: the left horizontal PCB contains all the acquisition and processing logic. The one on the right is the power supply.

Extract the power supply PCB

To remove the power supply, unplug the orange bundle with 7 wires from the main PCB as well as the fat ground wire. The PCB is held in place by 2 plastic tabs at the bottom.

Common TDS220 issues

Here are the most common TDS220 issues:

leaking capacitors in the power supply
mechanical stress around the BNC connectors
LCD backlight too weak or not working
Weak ground connection from BNC connector to power supply

Tektronix issued a product recall for this. My unit has components with dates that come after the product recall. Check out this video for a fix.

Not so common issue:

LCD screen corruption

While I’ll document 3 of the 4 common repairs, you can find plenty of other sources on the web that do the same thing. That’s not the case for the LCD screen corruption.

Replacing the power supply capacitors

I didn’t take pictures of it, but the solder side of the power supply PCB was drenched in a light-brown/yellow-ish fluid. Some of that made it to the front side of the PCB as can be seen here:


I’m not 100% sure about the source of this fluid because I was never able to pinpoint exactly which of the capacitors started leaking, but it’s fair to assume that this fluid was capacitor electrolyte. I decided to replace all electrolytic capacitors with new ones. There are 11 of them, listed in the table below:

I used these components for my TDS220 recapping, but there is absolutely no guarantee that these are the right ones. You need to double check everything yourself! Recapping the scope is done at your own risk!

#	Indicator	Capacitance	Voltage	Location
1a	C3	47 uF	450V	Largest on the PCB
1b	C3	68 uF	450V	Largest on the PCB
2	C13	2200 uF	6.3V	Next to connector CN2
3	C12	2200 uF	6.3V	Next to C13
4	C11	2200 uF	6.3V	Next to C12
5	C14	1000 uF	6.3V	Next to C11
6	C15	470 uF	6.3V	Between C13 and C12
7	C21	47 uF	16V	Close to “AULT KOREA”
8	C18	22 uF	35V	Next to CN2
9	C6	22 uF	35V	Next to IC1
10	C17	4.7 uF	50V	Next to C16
11	C10	2.2 uF	50V	Next to CN2

Pay attention to 1a and 1b: some TDS220 power supplies have a 47 uF capacitor, others have a 68 uF one. Mine had a 47 uF one. You don’t need to buy both of them.

I created this Digikey list with all these capacitors. At the time of writing this, the cost was $8.31, tax and shipping not included.

On my unit, most capacitors were fixed to the PCB with a soft, glue-like substance. Use an X-Acto knife to cut it loose before desoldering a capacitor.

The PCB has markers for capacitor polarity. For smaller ones, it uses regular + and - notation. For larger ones, a black circle indicates negative polarity.

All in all, the PSU recapping process is pretty straightforward and took around 1 hour to complete.

However, after powering the scope back on, the screen corruption was still there!

LCD Panel Corruption

The LCD screen corruption is content specific and it happens for a whole pixel row at a time. I thought that it was caused by some signal corruption on the flat cable between the main PCB and the LCD panel, but this was not the case. I googled around a bit, but couldn’t find any references to the issue that I was seeing, so I asked on the EEVblog Repair forum. A few hours later, I got the following reply:

Please refer to the link above, the problem that occurs is very similar to your problem

It included a link to a Chinese forum that requires an account to get access to photos and any pages beyond the first one, but daisizhou helpfully posted those pictures in the EEVblog forum thread:

You need to replace some capacitors that are inside the LCD panel!

Extracting the LCD Panel

Replacing the LCD panel capacitors is not complicated, but since an LCD panel assembly is a bit fragile, you need to be careful to not destroy anything. Let’s first extract the panel from the case.

Remove the front panel knobs

The panel knobs are the main components that are still keeping the front enclosure attached to the main body. You can just pull them off.

Remove buttons PCB

Unplug the 2 flat cable connectors that link the main PCB to the buttons PCB and to the LCD panel. Not shown: also unplug the power connector of the LCD panel. It’s right next to the mains receptacle on the power PCB.

You can now remove the buttons PCB by pushing down 2 plastic tabs near the BNC connectors.

If the LCD panel was never removed before, chances are that the LCD front protector sticks to the LCD panel itself. That is the case in the picture above. The protector has a dark gray foam around a transparent piece of plastic.

Remove LCD protector

To get to the PCB inside the LCD panel, you need access to a screw that is covered by the LCD protector. A weak adhesive keeps the protector in place. You can gently pull it from the LCD screen.

There are 2 plastic tabs on the left of the LCD panel that keep it locked in place. Push those up and down to unlock the panel. You can now lift that left side away from the main chassis and then slide the panel to the left to get the metal tab on the right out as well.

The panel is now loose. Remove the screw on the center left.

LCD frame clips

Turn the LCD panel around so that the plastic back is towards you. Put something on the table to protect the LCD front screen. I used the LCD protector for that.

We can now see the 3 capacitors that need to be replaced:

In addition to the screw from the previous step, the metal frame at the front of the LCD panel is held in place by 8 metal tabs that bend into gaps of the plastic back. Use nose pliers to straighten those tabs.

You can now remove the plastic back. Finally, you have access to the LCD PCB!

LCD Panel Capacitor Replacement

Here are the 3 capacitors in close-up. They’re 3.3 uF 35V polarized capacitors.

However, they are not your garden variety SMD tantalum capacitors! Notice how both leads are on the same side of the capacitor. When we look at the other side, we can see how the capacitor has a cylindrical core with a box-shaped plastic enclosure around it.

I couldn’t find any exact replacement. On that Chinese forum, they used regular electrolytic caps instead, so that’s what I did as well.

The plastic back has cut-outs for the 3 capacitors. On the Chinese forum, they made those cut-outs a bit larger to make the new capacitors fit, but that was not necessary in my case: the holes were large enough as-is, as long as you take care to solder the capacitors as close to the inside of the PCB as possible.

You can see this here:

The top capacitor is soldered too far to the left, the bottom one is fine. I had to resolder the top capacitor to make it fit in the cut-out.

One of the old LCD capacitors came in at 2.5 uF and a 75 Ohm ESR. The replacement ones have an ESR of 6 Ohm…

With the capacitors replaced, you can now put the LCD panel back together in reverse order. But don’t mount it back into the chassis just yet!

LCD Panel Backlight Replacement

The LCD panel uses a small CCFL tube as backlight. Over time, these CCFLs lose their intensity which makes the screen less bright.

You access the CCFL tube by removing an easy to remove cover on the left of the panel:

Notice how some parts of the tube are black.

Replacement tubes can be found on eBay. Sellers vary, but just search for “CCFL lamp tds220” and you’ll find what you need. Prices have gone up due to tariffs, I paid $17.46 including shipping.

The new lamp doesn’t come with the right connector, so some soldering is required to transfer the connector from the old lamp to the new one. I first placed the new tube in the LCD panel and the LCD panel in the chassis before soldering the connector. That made it easier to get the length of the wires correct.

After the replacement, the screen brightness was noticeably… dimmer, but that’s normal: new CCFL lamps need a few minutes to reach their full brightness.

You can find plenty of videos on Youtube of this backlight swap and authors always claim to see a significant improvement. I’m not so sure for my case: it’s not dimmer, but I can’t honestly say that it’s much brighter. In one case, someone replaced the CCFL lamp with an LED PCB. I looked around for suitable LED PCBs to do that as well but didn’t find anything that worked. If you want to try that, understand that the voltage of the CCFL lamp is much higher than the 5V or so you’d need for LEDs!

Fixing the Square Wave Issue

The corrupted probe compensation square wave issue was solved by reheating the solder of the BNC connector pins on the main PCB.

To do this right, you need to remove the RF shielding at the bottom of the main PCB, but I just squeezed my soldering iron into an open space and hoped for the best. It worked:

Conclusion

The TDS220 is working perfectly fine again. It measures signals correctly, there is no LCD screen corruption, and the brightness is fine. It’s still sitting on the bench, connected to a logic analyzer, but that’s still a work in progress and may be a topic for a future blog post.

References

Footnotes

In addition to the corruption, there’s also quite a bit of full-screen flicker. This is only a video recording artifact. There is no visible flicker. ↩

Inside an Isotemp OCXO107-10 Oven Controlled Crystal Oscillator

2025-10-26T10:00:00+00:00

The Isotemp OCXO107-10
Gathering Information from time-nuts
Getting It to Run
On the Bench
Inside the OCXO107-10
Looking Forward

The Isotemp OCXO107-10

I spent $5 at the Silicon Valley Electronics Flea Market on an Isotemp OCXO107-10 oscillator.

Compared to my other OCXOs, this one is a real chonker, which often correlates with its ability to keep the output frequency stable during changing environmental conditions: a large volume gives you more real estate for tricks to keep the internal temperature constant.

Despite the -10 suffix of the product name, it has an output frequency of 5 MHz, not the 10 MHz that can be found on most equipment these days. 5 MHz used to be more popular; HP’s famous 5061A and 5071A Cesium atomic clocks have a 5 MHz output, for example, and my HP 5370A and SRS SR620 time interval counters accept both 5 MHz and 10 MHz clocks on their external reference clock input.

Gathering Information from time-nuts

I did some Google research and, to the surprise of no one, found a few scraps of information on the time-nuts email list:

These oscillators used to cost more than $1000 a piece.
In addition to Isotemp, CTS Knights made a product with the same 0410-2450 SKU number.
These oscillators were used by Lucent. The CTS Knights unit has a date code of 1989, well before AT&T spun off its AT&T Technologies business unit into Lucent in 1996. My unit has a scribble of 1986.
There’s an OCXO107-16 version which is also a 5 MHz option.
Someone opened up his unit, did a bunch of stability measurements, and posted pictures. Those pictures have since disappeared, but I contacted the author, Ed Palmer, who graciously sent them to me.
One of the pins of the 9-pin connector of the OCXO107 is a reference voltage that can be used to construct an EFC (electronic frequency control) input voltage to tune the output frequency. There’s apparently quite a bit of noise on this Vref output.
There’s a datasheet for an Isotemp OCXO107-3. It’s not identical to the OCXO107-10: it has a different connector, uses more power, and there’s also mention of a 16-bit D/A converter to discipline the output frequency. But chances are that some of the characteristics are similar?
Photo with pinout of the DE-9 connector.

That’s all I could find, but it’s more than enough to get started.

Getting It to Run

The 107-10 has a DE-9 connector for power and control and an SMA connector for the clock output.

The DE-9 pinout:

- 5MHz TTL Out
- Ground
- +5V
- Ground
- +12V (Oven)
- Ground
- Ground
- EFC
- VREF 7.0V

The 5 V power rail is only used for the 5 MHz digital output. The OCXO will work fine and output a sine wave on the SMA port when you leave this 5 V rail unconnected.

On the Bench

I don’t have a setup to make long-term measurements, but I just wanted to see if I could get the thing to work. Here’s my earthquake-hardened bench setup:

One output of an HP E3631A power supply creates the 12 V rail, the other an EFC voltage that is tuned to match the 5 MHz output against the 10 MHz of my TM4313 GPSDO.

When I power up the unit, the 12 V rail initially pulls around 320 mA (3.8 W) to heat up the internal oven. The current quickly drops below 100 mA and eventually settles to 69 mA (0.83 W).
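The power numbers follow from P = V x I on the 12 V rail. Here is a minimal sketch of that calculation, with a hypothetical helper name, using only the two currents quoted above:

```python
OVEN_RAIL_V = 12.0  # oven supply rail, as used on the bench


def power_w(voltage_v: float, current_ma: float) -> float:
    """Power in watts for a DC voltage and a current given in mA."""
    return voltage_v * current_ma / 1000.0


warmup_w = power_w(OVEN_RAIL_V, 320.0)   # cold start, oven heating up
settled_w = power_w(OVEN_RAIL_V, 69.0)   # after the oven has settled

print(f"warm-up: {warmup_w:.2f} W, settled: {settled_w:.2f} W")
```

The settled power is below 1 W, roughly a fifth of what the oven draws while warming up.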

When fed into a 50 Ohm termination, my uncalibrated spectrum analyzer measures a power level of -1.80 dBm and a second harmonic of -55.04 dBm or -53.23 dBc. The output level is different than the >+3 dBm that is listed in the datasheet for the OCXO107-3, but it is similar to what others on the time-nuts list have measured.
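For readers who don't juggle dBm and dBc every day: dBc is simply the harmonic level minus the carrier level, and dBm converts to milliwatts with 10^(dBm/10). The sketch below uses made-up function names and the two readings from my spectrum analyzer:

```python
def dbm_to_mw(level_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (level_dbm / 10.0)


def relative_dbc(harmonic_dbm: float, carrier_dbm: float) -> float:
    """Harmonic level relative to the carrier, in dBc."""
    return harmonic_dbm - carrier_dbm


carrier_dbm = -1.80
harmonic_dbm = -55.04

carrier_mw = dbm_to_mw(carrier_dbm)
harmonic_dbc = relative_dbc(harmonic_dbm, carrier_dbm)

print(f"carrier: {carrier_mw:.3f} mW, second harmonic: {harmonic_dbc:.2f} dBc")
```

Subtracting the rounded readings gives -53.24 dBc. The 0.01 dB difference with the value shown by the analyzer is presumably because it calculates with unrounded numbers.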

My unit has a tag attached to it that says:

1/8 2.47V
1/30 2.44V
4/2/86 2.54V

This must be the voltage level that’s required on the EFC input to tune the output frequency to 5 MHz. In my current setup, that voltage level is roughly 2.228 V, though that’s only 2 days after powering it up. An OCXO107-10 needs about a week to truly stabilize.

The Vref output measures 6.78 V, not too far off the expected 7 V.

Inside the OCXO107-10

The OCXO has 4 solder points to weld the outside case to the inside sliding assembly. I tried to get it open with a soldering iron, but the metal enclosure immediately dissipated the heat away. I wasn’t able to open my unit, but luckily Ed gave permission to use his pictures. Let’s have a look:


All the components of the OCXO107 reside inside a Dewar flask. Think of a coffee thermos with a double wall and a near-vacuum in between to reduce the heat transfer between the center cavity and the outside world.

In the picture above, you see the Dewar flask on the right, the slid-out electronics on the left, and an insulating foam on the far left to plug off the open side of the Dewar cylinder.

The Dewar flask makes the OCXO more resistant against varying outside temperatures, but it also makes the unit very expensive and fragile. Ed’s first unit wasn’t packaged correctly and arrived with a broken flask, which makes the OCXO useless. These days, high stability OCXOs have one or two ovens and insulating material around it, though the website of Quantic Wenzel, producer of very high performance oscillators, says that “units with Dewar flasks are still available for superior temperature performance and lower power consumption”.

I’m too much of a beginner to compare the specifications of different OCXOs but I’ll give it a try anyway, so caveat emptor. The OCXO107-3 datasheet mentions a temperature stability of < +/- 0.06 ppb for an ambient temperature between 0 C and 60 C.

The datasheet of the HP 10811 OCXO lists a frequency vs temperature sensitivity of < 2.5 x 10^-9 between 0 C and 71 C. If that’s apples to apples, that would make the OCXO107-3 roughly 41 times more resistant against temperature variations.
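The ratio is easy to reproduce once both numbers are in the same unit. This sketch assumes that the two datasheet figures are defined the same way, which I haven't verified, and it ignores that the temperature ranges differ (0 C to 60 C versus 0 C to 71 C):

```python
PPB = 1e-9  # one part per billion as a fraction

ocxo107_3_ppb = 0.06          # OCXO107-3 datasheet: < +/- 0.06 ppb
hp10811_ppb = 2.5e-9 / PPB    # HP 10811 datasheet: < 2.5 x 10^-9, so 2.5 ppb

ratio = hp10811_ppb / ocxo107_3_ppb

print(f"HP 10811: {hp10811_ppb:.2f} ppb, OCXO107-3: {ocxo107_3_ppb:.2f} ppb")
print(f"ratio: {ratio:.1f}x")
```

The result is a bit under 42, which is where the factor of about 41 comes from.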

I randomly searched for specs of contemporary double-oven OCXOs and found numbers from 0.1 ppb for a Rakon ROX5242T1 and even 0.05 ppb, for units that are smaller and definitely less fragile. Just a case of old fashioned technological progress?

Note that temperature sensitivity is just one of many OCXO metrics. You also need to compare voltage stability, phase noise and a whole bunch of other parameters, and select the one that matches your needs. For example, the temperature sensitivity of a 10 MHz lab reference clock may be more important than phase noise, while the opposite can be true for an oscillator that’s used for multi-GHz communication links.

After removing the copper heatsink, you can see the oscillator control board on top of a large crystal:


Here’s another view of this side of the assembly:


If you turn around the assembly, you see this:


The blue component at the bottom is a Motorola JE800 Darlington transistor that is used as heating element. Closeby, to the right of the orange capacitor, is an IC with 431 marking. It’s tempting at first to speculate that this is a TMP431 temperature sensor, but since those require a microcontroller to configure that’s unlikely. Maybe it’s a TL431 voltage reference instead? Either way, there must be something on the PCB to measure voltage and feed that back to the heating transistor to keep temperature stable.

Looking Forward

My home lab currently has 2 clock references: the TM4313 GPSDO and the free-running GT300 frequency standard that I tore down last year. I’ve been wanting to do a bunch of long-term comparative measurements on a bunch of OCXOs, just for the fun of it. However, since crystal oscillators need a long time to truly stabilize, think a week for the OCXO107, this is not something I want to do with a power guzzling and noisy E3631A bench supply. The first step is to build a custom smaller scale linear power supply just for this purpose. In other words: yet another project to put on the stack!

Power	Split	Substitution	Multiply	\(\pmod{f(x)}\)
\(\alpha^{0}\)	\(1\)	\(1\)	\(1\)	\(1\)
\(\alpha^{1}\)	\(\alpha\)	\(\alpha\)	\(\alpha\)	\(\alpha\)
\(\alpha^{2}\)	\(\alpha^{2}\)	\(\alpha^{2}\)	\(\alpha^{2}\)	\(\alpha^{2}\)
\(\alpha^{3}\)	\(\alpha^{3}\)	\(\alpha^{3}\)	\(\alpha^{3}\)	\(\alpha^{3}\)
\(\alpha^{4}\)	\(\alpha^{4}\)	\(\alpha + 1\)	\(\alpha + 1\)	\(\alpha + 1\)
\(\alpha^{5}\)	\(\alpha^{4} \cdot \alpha\)	\((\alpha + 1) \cdot \alpha\)	\(\alpha^{2} + \alpha\)	\(\alpha^{2} + \alpha\)
\(\alpha^{6}\)	\(\alpha^{5} \cdot \alpha\)	\((\alpha^{2} + \alpha) \cdot \alpha\)	\(\alpha^{3} + \alpha^{2}\)	\(\alpha^{3} + \alpha^{2}\)
\(\alpha^{7}\)	\(\alpha^{6} \cdot \alpha\)	\((\alpha^{3} + \alpha^{2}) \cdot \alpha\)	\(\alpha^{4} + \alpha^{3}\)	\(\alpha^{3} + \alpha + 1\)
\(\alpha^{8}\)	\(\alpha^{7} \cdot \alpha\)	\((\alpha^{3} + \alpha + 1) \cdot \alpha\)	\(\alpha^{4} + \alpha^{2} + \alpha\)	\(\alpha^{2} + 1\)
\(\alpha^{9}\)	\(\alpha^{8} \cdot \alpha\)	\((\alpha^{2} + 1) \cdot \alpha\)	\(\alpha^{3} + \alpha\)	\(\alpha^{3} + \alpha\)
\(\alpha^{10}\)	\(\alpha^{9} \cdot \alpha\)	\((\alpha^{3} + \alpha) \cdot \alpha\)	\(\alpha^{4} + \alpha^{2}\)	\(\alpha^{2} + \alpha + 1\)
\(\alpha^{11}\)	\(\alpha^{10} \cdot \alpha\)	\((\alpha^{2} + \alpha + 1) \cdot \alpha\)	\(\alpha^{3} + \alpha^{2} + \alpha\)	\(\alpha^{3} + \alpha^{2} + \alpha\)
\(\alpha^{12}\)	\(\alpha^{11} \cdot \alpha\)	\((\alpha^{3} + \alpha^{2} + \alpha) \cdot \alpha\)	\(\alpha^{4} + \alpha^{3} + \alpha^{2}\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)
\(\alpha^{13}\)	\(\alpha^{12} \cdot \alpha\)	\((\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha\)	\(\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha\)	\(\alpha^{3} + \alpha^{2} + 1\)
\(\alpha^{14}\)	\(\alpha^{13} \cdot \alpha\)	\((\alpha^{3} + \alpha^{2} + 1) \cdot \alpha\)	\(\alpha^{4} + \alpha^{3} + \alpha\)	\(\alpha^{3} + 1\)
\(\alpha^{15}\)	\(\alpha^{14} \cdot \alpha\)	\((\alpha^{3} + 1) \cdot \alpha\)	\(\alpha^{4} + \alpha\)	\(1\)
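The table above can be reproduced with a few lines of Python. This is only a sketch: it assumes the field polynomial is \(f(x) = x^4 + x + 1\), which is what the \(\alpha^{4} = \alpha + 1\) row implies, and it packs a polynomial into an integer with bit i holding the coefficient of \(\alpha^{i}\). The names are made up for this example.

```python
FIELD_POLY = 0b10011  # assumed f(x) = x^4 + x + 1
DEGREE = 4


def times_alpha(value: int, poly: int = FIELD_POLY, degree: int = DEGREE) -> int:
    """Multiply an element by alpha, then reduce modulo f(x)."""
    value <<= 1                  # multiplying by alpha shifts all coefficients up
    if value & (1 << degree):    # an alpha^4 term appeared...
        value ^= poly            # ...so replace it by alpha + 1 (XOR is GF(2) addition)
    return value


def alpha_powers(count: int) -> list[int]:
    """Return [alpha^0, alpha^1, ..., alpha^count] as packed integers."""
    powers = [1]
    for _ in range(count):
        powers.append(times_alpha(powers[-1]))
    return powers


powers = alpha_powers(15)
for exponent, value in enumerate(powers):
    print(f"alpha^{exponent:<2} = {value:04b}")
```

The shift is the "Multiply" column of the table and the conditional XOR is the \(\pmod{f(x)}\) column.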

Power	Split	Substitution	Multiply	\(\pmod{f(x)}\)
\(\alpha^{0}\)	\(1\)	\(1\)	\(1\)	\(1\)
\(\alpha^{1}\)	\(\alpha\)	\(\alpha\)	\(\alpha\)	\(\alpha\)
\(\alpha^{2}\)	\(\alpha^{2}\)	\(\alpha^{2}\)	\(\alpha^{2}\)	\(\alpha^{2}\)
\(\alpha^{3}\)	\(\alpha^{3}\)	\(\alpha^{3}\)	\(\alpha^{3}\)	\(\alpha^{3}\)
\(\alpha^{4}\)	\(\alpha^{4}\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)
\(\alpha^{5}\)	\(\alpha^{4} \cdot \alpha\)	\((\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha\)	\(\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha\)	\(1\)
\(\alpha^{6}\)	\(\alpha^{5} \cdot \alpha\)	\((1) \cdot \alpha\)	\(\alpha\)	\(\alpha\)
\(\alpha^{7}\)	\(\alpha^{6} \cdot \alpha\)	\((\alpha) \cdot \alpha\)	\(\alpha^{2}\)	\(\alpha^{2}\)
\(\alpha^{8}\)	\(\alpha^{7} \cdot \alpha\)	\((\alpha^{2}) \cdot \alpha\)	\(\alpha^{3}\)	\(\alpha^{3}\)
\(\alpha^{9}\)	\(\alpha^{8} \cdot \alpha\)	\((\alpha^{3}) \cdot \alpha\)	\(\alpha^{4}\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)
\(\alpha^{10}\)	\(\alpha^{9} \cdot \alpha\)	\((\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha\)	\(\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha\)	\(1\)
\(\alpha^{11}\)	\(\alpha^{10} \cdot \alpha\)	\((1) \cdot \alpha\)	\(\alpha\)	\(\alpha\)
\(\alpha^{12}\)	\(\alpha^{11} \cdot \alpha\)	\((\alpha) \cdot \alpha\)	\(\alpha^{2}\)	\(\alpha^{2}\)
\(\alpha^{13}\)	\(\alpha^{12} \cdot \alpha\)	\((\alpha^{2}) \cdot \alpha\)	\(\alpha^{3}\)	\(\alpha^{3}\)
\(\alpha^{14}\)	\(\alpha^{13} \cdot \alpha\)	\((\alpha^{3}) \cdot \alpha\)	\(\alpha^{4}\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)
\(\alpha^{15}\)	\(\alpha^{14} \cdot \alpha\)	\((\alpha^{3} + \alpha^{2} + \alpha + 1) \cdot \alpha\)	\(\alpha^{4} + \alpha^{3} + \alpha^{2} + \alpha\)	\(1\)
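
The table above repeats after only 5 steps, so \(\alpha\) reaches just 5 of the 15 non-zero elements. A quick way to check this for any candidate polynomial is to count how many multiplications by \(\alpha\) it takes to get back to 1. The sketch below assumes the polynomial of this table is \(x^4 + x^3 + x^2 + x + 1\) (inferred from the \(\alpha^{4}\) row) and compares it with \(x^4 + x + 1\). `order_of_alpha` is a hypothetical helper name.

```python
def order_of_alpha(poly, degree):
    """Return the smallest n > 0 with alpha^n = 1 modulo poly.

    poly is an integer whose bits are the polynomial coefficients,
    e.g. 0b10011 for x^4 + x + 1.
    """
    value = 1
    for n in range(1, 1 << degree):
        value <<= 1                      # multiply by alpha
        if value & (1 << degree):
            value ^= poly                # reduce modulo poly
        if value == 1:
            return n
    return None  # no cycle back to 1 within 2^degree - 1 steps


# Assumed polynomials, both inferred from the tables in this post.
print(order_of_alpha(0b11111, 4))  # x^4 + x^3 + x^2 + x + 1 -> 5
print(order_of_alpha(0b10011, 4))  # x^4 + x + 1             -> 15
```

An order of \(2^4 - 1 = 15\) means the powers of \(\alpha\) cover every non-zero element, which is the case for \(x^4 + x + 1\) but not for the polynomial used in this table.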

Power	\(\pmod{f(x)}\)	\(\alpha \to x\)	Binary
\(\alpha^{0}\)	\(1\)	\(1\)	0001
\(\alpha^{1}\)	\(\alpha\)	\(x\)	0010
\(\alpha^{2}\)	\(\alpha^{2}\)	\(x^{2}\)	0100
\(\alpha^{3}\)	\(\alpha^{3}\)	\(x^{3}\)	1000
\(\alpha^{4}\)	\(\alpha + 1\)	\(x + 1\)	0011
\(\alpha^{5}\)	\(\alpha^{2} + \alpha\)	\(x^{2} + x\)	0110
\(\alpha^{6}\)	\(\alpha^{3} + \alpha^{2}\)	\(x^{3} + x^{2}\)	1100
\(\alpha^{7}\)	\(\alpha^{3} + \alpha + 1\)	\(x^{3} + x + 1\)	1011
\(\alpha^{8}\)	\(\alpha^{2} + 1\)	\(x^{2} + 1\)	0101
\(\alpha^{9}\)	\(\alpha^{3} + \alpha\)	\(x^{3} + x\)	1010
\(\alpha^{10}\)	\(\alpha^{2} + \alpha + 1\)	\(x^{2} + x + 1\)	0111
\(\alpha^{11}\)	\(\alpha^{3} + \alpha^{2} + \alpha\)	\(x^{3} + x^{2} + x\)	1110
\(\alpha^{12}\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)	\(x^{3} + x^{2} + x + 1\)	1111
\(\alpha^{13}\)	\(\alpha^{3} + \alpha^{2} + 1\)	\(x^{3} + x^{2} + 1\)	1101
\(\alpha^{14}\)	\(\alpha^{3} + 1\)	\(x^{3} + 1\)	1001
\(\alpha^{15}\)	\(1\)	\(1\)	0001
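
The Binary column doubles as a lookup table. Here is a minimal sketch, again assuming \(f(x) = x^4 + x + 1\), that builds the exponent-to-binary table and its inverse, and then multiplies two elements by adding their exponents modulo 15. The names `exp_table`, `log_table` and `gf16_mul` are my own for this example, not taken from any library.

```python
F_POLY = 0b10011  # assumption: f(x) = x^4 + x + 1
ORDER = 15        # number of non-zero elements in GF(16)

# exp_table[n] holds the 4-bit value of alpha^n, for n = 0..14.
exp_table = []
value = 1
for _ in range(ORDER):
    exp_table.append(value)
    value <<= 1
    if value & 0b10000:
        value ^= F_POLY

# log_table maps a non-zero 4-bit value back to its exponent.
log_table = {v: n for n, v in enumerate(exp_table)}


def gf16_mul(a, b):
    """Multiply two GF(16) elements by adding exponents modulo 15."""
    if a == 0 or b == 0:
        return 0  # zero has no logarithm, so handle it separately
    return exp_table[(log_table[a] + log_table[b]) % ORDER]


for n, v in enumerate(exp_table):
    print(f"alpha^{n:<2} -> {v:04b}")

# alpha^5 * alpha^7 = alpha^12
print(f"{gf16_mul(0b0110, 0b1011):04b}")  # 1111
```

The modulo 15 wrap-around is what the last row of the table expresses: \(\alpha^{15}\) is the same element as \(\alpha^{0}\).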

Power	\(\pmod{f(x)}\)	\(\alpha \to x^4\)	\(\pmod{f(x)}\)	Binary
\(\alpha^{0}\)	\(1\)	\(1\)	\(1\)	0001
\(\alpha^{1}\)	\(\alpha\)	\(x^{4}\)	\(x + 1\)	0011
\(\alpha^{2}\)	\(\alpha^{2}\)	\(x^{8}\)	\(x^{2} + 1\)	0101
\(\alpha^{3}\)	\(\alpha^{3}\)	\(x^{12}\)	\(x^{3} + x^{2} + x + 1\)	1111
\(\alpha^{4}\)	\(\alpha + 1\)	\(x^{4} + 1\)	\(x\)	0010
\(\alpha^{5}\)	\(\alpha^{2} + \alpha\)	\(x^{8} + x^{4}\)	\(x^{2} + x\)	0110
\(\alpha^{6}\)	\(\alpha^{3} + \alpha^{2}\)	\(x^{12} + x^{8}\)	\(x^{3} + x\)	1010
\(\alpha^{7}\)	\(\alpha^{3} + \alpha + 1\)	\(x^{12} + x^{4} + 1\)	\(x^{3} + x^{2} + 1\)	1101
\(\alpha^{8}\)	\(\alpha^{2} + 1\)	\(x^{8} + 1\)	\(x^{2}\)	0100
\(\alpha^{9}\)	\(\alpha^{3} + \alpha\)	\(x^{12} + x^{4}\)	\(x^{3} + x^{2}\)	1100
\(\alpha^{10}\)	\(\alpha^{2} + \alpha + 1\)	\(x^{8} + x^{4} + 1\)	\(x^{2} + x + 1\)	0111
\(\alpha^{11}\)	\(\alpha^{3} + \alpha^{2} + \alpha\)	\(x^{12} + x^{8} + x^{4}\)	\(x^{3} + 1\)	1001
\(\alpha^{12}\)	\(\alpha^{3} + \alpha^{2} + \alpha + 1\)	\(x^{12} + x^{8} + x^{4} + 1\)	\(x^{3}\)	1000
\(\alpha^{13}\)	\(\alpha^{3} + \alpha^{2} + 1\)	\(x^{12} + x^{8} + 1\)	\(x^{3} + x + 1\)	1011
\(\alpha^{14}\)	\(\alpha^{3} + 1\)	\(x^{12} + 1\)	\(x^{3} + x^{2} + x\)	1110
\(\alpha^{15}\)	\(1\)	\(1\)	\(1\)	0001
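
The same table can be reproduced by taking successive powers of the element \(x^4 \bmod f(x) = x + 1\). The sketch below assumes \(f(x) = x^4 + x + 1\) and uses a generic shift-and-XOR multiplier. `gf16_mul` and `beta` are names I picked for this illustration.

```python
def gf16_mul(a, b, poly=0b10011):
    """Carry-less multiply of two 4-bit elements, reduced modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1                  # next multiple: a * x
        if a & 0b10000:
            a ^= poly            # keep a below degree 4
    return result


beta = 0b0011  # x^4 mod (x^4 + x + 1) = x + 1, the value chosen for alpha

powers = [1]
for _ in range(15):
    powers.append(gf16_mul(powers[-1], beta))

for n, v in enumerate(powers):
    print(f"alpha^{n:<2} -> {v:04b}")

# All 15 non-zero elements still show up exactly once before the cycle repeats.
print(len(set(powers[:15])))  # 15
```

The values are the same 15 non-zero elements as with \(\alpha \to x\), just visited in a different order.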

Power	\(\pmod{f(x)}\)	\(\alpha \to x\)	Binary
\(\alpha^{0}\)	\(1\)	\(1\)	0001
\(\alpha^{1}\)	\(\alpha\)	\(x\)	0010
\(\alpha^{2}\)	\(\alpha^{2}\)	\(x^{2}\)	0100
\(\alpha^{3}\)	\(\alpha^{3}\)	\(x^{3}\)	1000
\(\alpha^{4}\)	\(\alpha + 1\)	\(x + 1\)	0011
…	…	…	…