Anka He Chen

A Sparse Coarse Space for VBD from Element Eigenmodes

2026-05-20T00:00:00-07:00

VBD’s per-vertex 3×3 solve already gives the locally optimal descent direction — there is no leverage left inside a single block. So why does it still slow down on problems with high stiffness contrast? The answer turns out to be a story about what basis you descend in, and once that picture is in place the fix — a sparse coarse correction built from per-element eigenmodes — almost suggests itself.

Conditioning, geometrically

For a symmetric positive definite matrix in the 2-norm,

\[\kappa(A) = \lVert A\rVert_2 \cdot \lVert A^{-1}\rVert_2 = \frac{\lambda_\text{max}}{\lambda_\text{min}}\]

The quadratic energy $E(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A \mathbf{x}$ has level sets that are ellipsoids whose principal axes are the eigenvectors of $A$ and whose squared semi-axis lengths are $1/\lambda_i$. So $\kappa$ is literally the squared aspect ratio of the level-set ellipse. Big $\kappa$ means very elongated; $\kappa = 1$ means a sphere.

That ratio is exactly what governs CG’s convergence:

\[\lVert e_k\rVert_A \;\leq\; 2 \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k \lVert e_0\rVert_A\]

For a FEM stiffness matrix, $\kappa$ is typically at least as large as the stiffness ratio between the stiffest and softest elements. A contrast of $10^6$ gives $\sqrt{\kappa} = 10^3$ and pushes the CG factor to $999/1001 \approx 0.998$: thousands of iterations to make a dent. Stiffness contrast is the standard way to wreck a global iterative solve.
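To put numbers on that bound, here is a minimal sketch that evaluates the contraction factor and the iteration count the bound implies for a given $\kappa$. The helper names `cg_rate` and `cg_iterations` are illustrative and not taken from any library.

```python
import math

def cg_rate(kappa: float) -> float:
    """Per-iteration contraction factor in the CG error bound."""
    s = math.sqrt(kappa)
    return (s - 1.0) / (s + 1.0)

def cg_iterations(kappa: float, tol: float) -> int:
    """Smallest k for which the bound 2 * rate**k drops below tol."""
    rate = cg_rate(kappa)
    if rate == 0.0:  # kappa == 1: the bound is met after a single step
        return 1
    return max(1, math.ceil(math.log(tol / 2.0) / math.log(rate)))

for kappa in (1e2, 1e4, 1e6):
    print(f"kappa={kappa:.0e}  rate={cg_rate(kappa):.6f}  "
          f"iterations to 1e-6: {cg_iterations(kappa, 1e-6)}")
```

This is the worst-case bound, not a prediction: at $\kappa = 10^6$ it asks for roughly seven thousand iterations to gain six digits, and real CG runs often do better when the spectrum is clustered.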

But CG is not what VBD does. VBD is block coordinate descent, and for BCD, $\kappa$ is not the governing quantity.

Diagonal dominance is a different geometry

Partition $A$ into block-diagonal $D$ and off-diagonal $L + L^T$. Block Gauss-Seidel iterates with matrix $M = -(D+L)^{-1}L^T$, and its convergence rate is $\rho(M)$, controlled by block diagonal dominance — the size of each diagonal block $D_i$ relative to its incident off-diagonal entries.

This is a distinct quantity from $\kappa$. Geometrically:

$\kappa$ controls the eccentricity of the energy ellipse — how elongated the level sets are.
Diagonal dominance controls the alignment — how rotated those level sets are relative to your coordinate axes.

You can have a well-conditioned, nearly circular system whose principal axes are tilted 45° from the coordinate basis: BCD still zig-zags because each coordinate step cuts diagonally across the level sets. And you can have a horribly elongated system whose long axis is aligned with a coordinate axis: BCD walks straight down it in a sweep.

Same $\kappa = 25$ in both panels, completely different BCD experience. That distinction is why VBD does fine on uniformly stiff materials but stalls at interfaces: a soft vertex coupled to a stiff neighbor has small $D_i$ but a large off-diagonal entry on the stiff side. Local diagonal dominance breaks at the interface, regardless of the global condition number.
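The comparison can be reproduced numerically. The sketch below is an illustration under stated assumptions: a 2×2 quadratic with eigenvalues 1 and 25, exact coordinate minimization standing in for VBD's block solve, and helper names made up for this example. It counts sweeps to convergence when the ellipse is axis-aligned versus tilted by 45°.

```python
import math

def tilted_spd(lam_min, lam_max, theta):
    """Entries (a11, a12, a22) of R diag(lam_min, lam_max) R^T for tilt theta."""
    c, s = math.cos(theta), math.sin(theta)
    a11 = lam_min * c * c + lam_max * s * s
    a12 = (lam_min - lam_max) * c * s
    a22 = lam_min * s * s + lam_max * c * c
    return a11, a12, a22

def coordinate_descent_sweeps(A, x, tol=1e-10, max_sweeps=100000):
    """Count exact coordinate-minimization sweeps on E = 0.5 x^T A x until |x| < tol."""
    a11, a12, a22 = A
    x0, x1 = x
    for sweep in range(1, max_sweeps + 1):
        x0 = -a12 * x1 / a11   # minimize over x0 with x1 fixed
        x1 = -a12 * x0 / a22   # minimize over x1 with the new x0
        if math.hypot(x0, x1) < tol:
            return sweep
    return max_sweeps

aligned = coordinate_descent_sweeps(tilted_spd(1.0, 25.0, 0.0), (1.0, 1.0))
tilted = coordinate_descent_sweeps(tilted_spd(1.0, 25.0, math.pi / 4), (1.0, 1.0))
print("aligned:", aligned, "sweeps; tilted 45 deg:", tilted, "sweeps")
```

For a 2×2 system the per-sweep contraction is $a_{12}^2/(a_{11}a_{22})$: zero when the axes are aligned and $144/169 \approx 0.85$ at 45°, with the same $\kappa = 25$ in both cases.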

Every iterative method is a choice of descent basis

Once the picture is “an energy ellipse and a sequence of line searches,” every iterative method just becomes a strategy for picking the directions to search along. Three are well-established, plus a fourth that this post is building toward:

Eigenbasis. Diagonalize $A = Q \Lambda Q^T$ and change variables $\mathbf{z} = Q^T\mathbf{x}$. In the new coordinates, the energy decouples into independent parabolas $\sum_i \tfrac{1}{2}\lambda_i z_i^2 - \tilde b_i z_i$, and each one is solved in a single division. One sweep, done. The catch is finding $Q$ in the first place: an eigendecomposition costs $O(n^3)$, which is more expensive than the direct solve you were trying to avoid.

CG. CG does not try to find the eigenbasis directly. It builds its own basis adaptively, one vector per iteration, by taking matrix-vector products from the current residual:

\[\mathcal{K}_k(A, r_0) = \mathrm{span}\{r_0,\, A r_0,\, A^2 r_0,\, \ldots,\, A^{k-1} r_0\}\]

Each new direction is $A$-conjugate to the previous ones, so progress in one direction is never undone in the next. In 2D, two iterations span the whole space and CG is exact. The directions are globally optimal given what CG has seen, but each is dense — forming the next Krylov vector takes a full matvec, and computing $\alpha_k$ takes a global inner product. That is the parallel bottleneck.

VBD (colored Gauss-Seidel). VBD picks the coordinate axes, in disjoint groups. Within a color, every vertex’s 3 coordinate axes are independent and can be solved in parallel via direct 3×3 inverses. The directions are sparse by construction (one vertex’s 3 axes touch only that vertex and its incident elements) so the local solves are embarrassingly parallel and there is no global reduction — but the basis is fixed in the coordinate system, regardless of where the actual eigenvectors of $A$ are pointing. When the principal axes are tilted, every coordinate step is partly orthogonal to the direction it should be going, and the iteration zig-zags.

Sparse local-eigen basis. What I am after: directions that are sparse like VBD’s (so the local solve and parallelism survive) but spectrally informed like CG’s (so each direction actually points along a principal axis of $A$). The element stiffness matrices give them to us for free, as we will see in a moment.

The three established strategies, stacked against the same tilted, ill-conditioned ellipse: eigenbasis walks straight in, CG takes two clean $A$-conjugate steps, VBD’s coordinate basis zig-zags down the long axis. The fourth strategy — sparse and spectrally informed — has no off-the-shelf solver to plot here yet; the rest of the post is what it would look like for VBD.

What an eigenbasis rebase buys

Zoom into the eigenbasis option for a moment. The reason it converges in one sweep is just that the change of variables $\mathbf{z} = Q^T \mathbf{x}$ rotates the ellipse so its principal axes line up with the new coordinate axes:

Same quadratic, same starting point, two coordinate systems. On the left the principal axes (dashed) sit at 30° from the coordinate axes and BCD ricochets between them. On the right, after applying $Q^T$, the contours line up with the new axes and one update of each coordinate lands at the minimum. This is the “sum of independent parabolas” decoupling $E(\mathbf{z}) = \sum_i \big(\tfrac{1}{2}\lambda_i z_i^2 - \tilde b_i z_i\big)$ with $\tilde{\mathbf{b}} = Q^T\mathbf{b}$, each component solved by a single division $z_i^\star = \tilde b_i / \lambda_i$.

This is the upper bound on what any rebasing can buy. Every other strategy — CG’s adaptive Krylov directions, multigrid’s hierarchical bases, the usual preconditioners — is an approximation of the same trick at lower cost. For VBD we cannot afford the global $Q$, but the element stiffness matrices already hand us local $Q_e$’s for free; the rest of the post builds a cheap, sparse coarse correction on that observation.

VBD’s actual bottleneck

With graph coloring, every vertex of a given color is decoupled from every other vertex of that color. One color sweep is thousands of perfectly parallel 3×3 direct solves, each inverting its local Hessian exactly. There is no further leverage from rotating an individual block into its own eigenbasis — the direct $3\times 3$ inverse is already optimal in any basis.

So the bottleneck cannot be the local solve. It has to be inter-color communication. With $k$ colors and a mesh of diameter $d$, an update can travel at most about $k$ hops per full sweep (one hop per color pass), so information needs at least $O(d/k)$ sweeps to cross the mesh, and more when the color order runs against the path: vertex $i$ updates in color 1, neighbor $j$ in color 2 sees the new residual in the next color pass, vertex $l$ in color 1 sees $j$’s update only in the following sweep, and so on. The stiffer the inter-vertex coupling, the bigger the residual that bounces between colors: the BCD zig-zag, lifted from coordinate axes to entire colors.

That immediately tells me which modes I should care about: the ones that span colors and are stiff. Within-color modes are absorbed by the local solve; weak-coupling modes converge fine on their own.

Element eigenmodes as a free coarse basis

Every FEM element ships with its own stiffness matrix $K_e$ — $12\times 12$ for a linear tet, $9\times 9$ for a linear tri. Its eigendecomposition

\[K_e = Q_e\,\Lambda_e\,Q_e^T\]

is essentially free, and the columns of $Q_e$ are the element’s natural deformation modes (stretch along a fiber direction, shear, volumetric compression, etc.). Pick the stiffest few modes per element and stack them as columns of a tall, sparse matrix:

\[P = \big[\,\phi_1\;\big|\;\phi_2\;\big|\;\cdots\;\big|\;\phi_m\,\big]\]

Each $\phi_j \in \mathbb{R}^{3N}$ is nonzero only on the vertices of its source element. Now solve a small reduced system at every outer iteration:

\[(P^T A P)\,z = P^T r, \qquad \mathbf{x} \leftarrow \mathbf{x} + P z\]

This is a deflation / coarse-space correction — it projects the residual into the span of the selected modes and solves optimally inside that span. Slotted into the VBD loop:

for sweep in range(num_sweeps):
    for color in colors:
        parallel_vertex_block_solves(color)   # standard VBD
    r = compute_residual(x)
    z = solve(P.T @ A @ P, P.T @ r)           # sparse coarse correction
    x += P @ z

The mental model: this is a physics-informed two-level method. The VBD color sweeps are the smoother — they handle high-frequency error perfectly. The coarse solve handles exactly the cross-color modes that the smoother is structurally incapable of resolving in a single sweep. Algebraic multigrid does the same trick, but pays a heavy setup phase to discover its coarse basis from algebraic heuristics. Here the basis is handed to us by the constitutive model.

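Sparsity-wise: two columns of $P$ interact in $P^T A P$ only if $A$ couples their supports, that is, only if some vertex of one source element shares an element with some vertex of the other (which includes every pair of elements that share a vertex). The coarse system is therefore sparse along a slightly widened element-adjacency graph. No global dense inner products like CG would need.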
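A toy version of the coarse correction, as a sketch under loud assumptions: a 1D spring chain stands in for the FEM mesh, each "element" is one spring whose stiff eigenmode is the stretch vector $(1,-1)/\sqrt{2}$, $P$ has exactly two columns so the reduced system is solved by hand, and dense Python lists replace sparse storage. All helper names are illustrative.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def chain_stiffness(ks):
    """Stiffness matrix of a 1D spring chain; ks[0] ties node 0 to a fixed wall,
    ks[e] for e >= 1 joins nodes e-1 and e."""
    n = len(ks)
    A = [[0.0] * n for _ in range(n)]
    A[0][0] += ks[0]
    for e in range(1, n):
        A[e - 1][e - 1] += ks[e]
        A[e][e] += ks[e]
        A[e - 1][e] -= ks[e]
        A[e][e - 1] -= ks[e]
    return A

def residual(A, b, x):
    return [bi - yi for bi, yi in zip(b, matvec(A, x))]

def coarse_correction(A, b, x, P):
    """x + P z with (P^T A P) z = P^T r, written out for exactly two columns."""
    p0, p1 = P
    r = residual(A, b, x)
    Ap0, Ap1 = matvec(A, p0), matvec(A, p1)
    c00, c01, c11 = dot(p0, Ap0), dot(p0, Ap1), dot(p1, Ap1)
    g0, g1 = dot(p0, r), dot(p1, r)
    det = c00 * c11 - c01 * c01
    z0 = (c11 * g0 - c01 * g1) / det
    z1 = (c00 * g1 - c01 * g0) / det
    return [xi + z0 * u + z1 * v for xi, u, v in zip(x, p0, p1)]

s = 1.0 / math.sqrt(2.0)
A = chain_stiffness([1.0, 1.0, 1000.0, 1.0])   # stiff spring between nodes 1 and 2
b = [0.0, 0.0, 0.0, 1.0]                       # unit load on the last node
P = [
    [0.0, s, -s, 0.0],   # stretch mode of the stiff element (nodes 1, 2)
    [0.0, 0.0, s, -s],   # stretch mode of its soft neighbor (nodes 2, 3)
]
x = coarse_correction(A, b, [0.0] * 4, P)
r_new = residual(A, b, x)
print([dot(p, r_new) for p in P])   # both entries are zero up to round-off
```

After the correction the residual is orthogonal to every column of $P$ (Galerkin orthogonality), which is the precise sense in which the coarse solve is optimal inside the span of the selected modes.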

The cost of one direction

For a single sparse basis vector $\phi$, the optimal step along it is a 1D line search,

\[\alpha^* \;=\; \frac{\phi^T r}{\phi^T A\,\phi}\]

If $\phi$ has support on $k$ vertices, both products only touch a local neighborhood — $O(k\cdot\text{valence})$, not $O(n)$. For a nonlinear energy it is a couple of 1D Newton iterations on the same local support. Each individual direction is effectively free.

The cost lives in the count of columns of $P$, not in each column. Naïvely picking one mode per element gives $m \approx 5N$–$6N$ for a tet mesh, bigger than the original vertex system. That defeats the purpose.

What’s left to figure out

The construction is only useful if $P$ stays small. The pruning strategies that look most promising:

Stiffness-contrast filter. Include modes only from elements whose stiffness deviates sharply from their neighbors — the interfaces where diagonal dominance actually breaks.
Cross-color filter. Drop any mode whose support sits inside a single color. Those modes add no information the smoother is missing.
Patch aggregation. Group neighboring elements and replace per-element top-modes with one patch-level mode — trade some spectral fidelity for far fewer columns.

I do not yet know which filter is sharpest in practice, or how few modes survive before the convergence gain disappears. But the structural picture is clean: the bottleneck is cross-color propagation, the analytical fix lives in element eigendecompositions we already have, and each individual sparse line search is nearly free. The whole game is picking the right small handful of directions.

Quaternion Math for Rigid Body Simulation

2026-04-30T00:00:00-07:00

A practical primer covering exactly the quaternion operations used in rigid body simulation, with reference to the Newton AVBD implementation. No proofs, just what you need to read the code.

What is a quaternion

Four numbers: three “vector” components and one “scalar” component.

\[\mathbf{q} = (x,\, y,\, z,\, w) \quad\text{or equivalently}\quad \mathbf{q} = (\mathbf{v},\, w)\]

A unit quaternion ($\lVert\mathbf{q}\rVert = 1$) represents a 3D rotation. The identity rotation is $(0, 0, 0, 1)$.

Quaternion multiplication

Given $\mathbf{a} = (\mathbf{a}_v,\, a_w)$ and $\mathbf{b} = (\mathbf{b}_v,\, b_w)$:

\[\mathbf{a} \otimes \mathbf{b} = \big(a_w\,\mathbf{b}_v + b_w\,\mathbf{a}_v + \mathbf{a}_v \times \mathbf{b}_v,\;\; a_w\,b_w - \mathbf{a}_v \cdot \mathbf{b}_v\big)\]

This is not commutative: $\mathbf{a}\otimes\mathbf{b} \neq \mathbf{b}\otimes\mathbf{a}$ in general. Order matters, just like matrix multiplication. In fact, for unit quaternions, the product corresponds exactly to multiplying the equivalent $3\times 3$ rotation matrices.
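The product formula written out component by component, as a plain-Python sketch using the $(x, y, z, w)$ storage order from above. The function name `quat_mul` is a local stand-in, not a Warp or Newton API.

```python
def quat_mul(a, b):
    """Hamilton product of a and b, both stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + bw * ax + (ay * bz - az * by),   # vector part, x
        aw * by + bw * ay + (az * bx - ax * bz),   # vector part, y
        aw * bz + bw * az + (ax * by - ay * bx),   # vector part, z
        aw * bw - (ax * bx + ay * by + az * bz),   # scalar part
    )

i = (1.0, 0.0, 0.0, 0.0)
j = (0.0, 1.0, 0.0, 0.0)
print(quat_mul(i, j))   # the unit quaternion k
print(quat_mul(j, i))   # minus k: swapping the order flips the cross-product term
```

Only the cross-product term changes sign when the operands are swapped, which is exactly where the non-commutativity lives.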

Conjugate and inverse

\[\mathbf{q}^* = (-\mathbf{v},\, w) = (-x,\, -y,\, -z,\, w)\]

For a unit quaternion, $\mathbf{q}^{-1} = \mathbf{q}^*$. This represents the opposite rotation.

How a quaternion encodes a rotation

A rotation by angle $\theta$ about unit axis $\hat{\mathbf{n}}$ is:

\[\mathbf{q} = \big(\sin(\theta/2)\,\hat{\mathbf{n}},\;\; \cos(\theta/2)\big)\]

The half-angle appears because quaternions rotate vectors via the sandwich product (next section), which applies the rotation from both sides—left and right—each contributing half the angle.

This means $\mathbf{q}$ and $-\mathbf{q}$ represent the same rotation (double cover of $SO(3)$). This is why code often checks `if q.w < 0: q = -q` to pick the shorter path.

Rotating a vector

To rotate vector $\mathbf{v}$ by quaternion $\mathbf{q}$, embed $\mathbf{v}$ as a pure quaternion $(\mathbf{v}, 0)$ and sandwich:

\[\mathbf{v}' = \big(\mathbf{q}\otimes(\mathbf{v}, 0)\otimes\mathbf{q}^*\big)_\text{vec}\]

In code this is `quat_rotate(q, v)`. The efficient formula (no full quaternion multiply) is:

\[\mathbf{t} = 2\,(\mathbf{q}_v \times \mathbf{v}), \qquad \mathbf{v}' = \mathbf{v} + w\,\mathbf{t} + \mathbf{q}_v \times \mathbf{t}\]

The inverse rotation (world to body) is `quat_rotate(conjugate(q), v)`, which in Warp is `quat_rotate_inv(q, v)`.
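A small check that the efficient formula agrees with the full sandwich product, written as a plain-Python sketch. `rotate_fast` and `rotate_sandwich` are illustrative stand-ins for what `quat_rotate` computes, and quaternions are stored as $(x, y, z, w)$.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def quat_mul(a, b):
    """Hamilton product for quaternions stored as (x, y, z, w)."""
    av, bv = a[:3], b[:3]
    c = cross(av, bv)
    vec = tuple(a[3] * bv[i] + b[3] * av[i] + c[i] for i in range(3))
    w = a[3] * b[3] - sum(av[i] * bv[i] for i in range(3))
    return vec + (w,)

def quat_conj(q):
    return (-q[0], -q[1], -q[2], q[3])

def rotate_sandwich(q, v):
    """Vector part of q * (v, 0) * conj(q)."""
    return quat_mul(quat_mul(q, (v[0], v[1], v[2], 0.0)), quat_conj(q))[:3]

def rotate_fast(q, v):
    """t = 2 (q_v x v);  v' = v + w t + q_v x t."""
    qv, w = q[:3], q[3]
    t = tuple(2.0 * c for c in cross(qv, v))
    u = cross(qv, t)
    return tuple(v[i] + w * t[i] + u[i] for i in range(3))

# 90 degrees about z: the x axis should land on the y axis
q = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
v_world = rotate_fast(q, (1.0, 0.0, 0.0))
v_back = rotate_fast(quat_conj(q), v_world)   # inverse rotation via the conjugate
print(v_world, v_back)
```

The fast form needs two cross products and no scalar-part bookkeeping, which is why it is the usual choice inside kernels.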

Composing rotations

To apply rotation $\mathbf{a}$ then rotation $\mathbf{b}$:

\[\mathbf{q}_\text{combined} = \mathbf{b} \otimes \mathbf{a}\]

The rotation applied first goes on the right. Same convention as matrices: $(\mathbf{B}\mathbf{A})\mathbf{v} = \mathbf{B}(\mathbf{A}\mathbf{v})$.

Relative rotation

Given two orientations $\mathbf{q}_\text{cur}$ and $\mathbf{q}_\text{target}$, the rotation from current to target is:

\[\mathbf{q}_\delta = \mathbf{q}_\text{cur}^{-1} \otimes \mathbf{q}_\text{target}\]

This $\mathbf{q}_\delta$ is in $\mathbf{q}_\text{cur}$’s body frame. If you stand in the body frame of $\mathbf{q}_\text{cur}$, $\mathbf{q}_\delta$ tells you how much more to rotate to reach $\mathbf{q}_\text{target}$.

If you instead compute:

\[\mathbf{q}_{\delta,\text{world}} = \mathbf{q}_\text{target} \otimes \mathbf{q}_\text{cur}^{-1}\]

you get the same physical rotation, but expressed in world frame.

This is the key body-vs-world distinction in the Newton code:

# Body-frame delta (Newton uses this)
q_delta = quat_inverse(rot_current) * rot_star

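# World-frame delta (the AVBD demo uses this). Note the operand order: this is
# rot_current * rot_star^-1, the rotation taking rot_star to rot_current.
q_delta = rot_current * quat_inverse(rot_star)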
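To make the body-versus-world distinction concrete, here is a plain-Python sketch of the two formulas from this section. The helpers are illustrative re-implementations rather than Warp calls, and the two orientations are arbitrary test values.

```python
import math

def quat_mul(a, b):
    """Hamilton product for quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + bw * ax + (ay * bz - az * by),
            aw * by + bw * ay + (az * bx - ax * bz),
            aw * bz + bw * az + (ax * by - ay * bx),
            aw * bw - (ax * bx + ay * by + az * bz))

def quat_inverse(q):
    """Inverse of a unit quaternion (its conjugate)."""
    return (-q[0], -q[1], -q[2], q[3])

def quat_from_axis_angle(axis, angle):
    s = math.sin(0.5 * angle)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(0.5 * angle))

rot_current = quat_from_axis_angle((0.0, 0.0, 1.0), 0.3)
rot_target = quat_from_axis_angle((1.0, 0.0, 0.0), 1.1)

delta_body = quat_mul(quat_inverse(rot_current), rot_target)
delta_world = quat_mul(rot_target, quat_inverse(rot_current))

# Body-frame deltas compose on the right, world-frame deltas on the left.
from_body = quat_mul(rot_current, delta_body)
from_world = quat_mul(delta_world, rot_current)
print(from_body, from_world)   # both reproduce rot_target
```

The two deltas share the same scalar part, hence the same rotation angle; only the axis is expressed in a different frame.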

Quaternion to rotation vector

Extract the axis and angle from a quaternion:

\[\theta = 2\,\arccos(w), \qquad \hat{\mathbf{n}} = \frac{\mathbf{v}}{\sin(\theta/2)}\]

The rotation vector packs both into one $\mathbb{R}^3$ vector:

\[\boldsymbol{\theta} = \hat{\mathbf{n}}\,\theta\]

Its magnitude is the angle, its direction is the axis. This is what `quat_to_axis_angle` followed by `axis * angle` does in Newton, and it is the natural quantity for the inertial spring $\mathbf{f}_\text{ang} = \mathbf{I}_\text{world}\,\boldsymbol{\theta}/h^2$.

Rotation vector back to quaternion

Given rotation vector $\boldsymbol{\theta} \in \mathbb{R}^3$:

\[\theta = \lVert\boldsymbol{\theta}\rVert, \qquad \hat{\mathbf{n}} = \boldsymbol{\theta}/\theta, \qquad \mathbf{q} = \big(\sin(\theta/2)\,\hat{\mathbf{n}},\;\cos(\theta/2)\big)\]

For small angles, the small-angle approximation avoids the trig:

\[\mathbf{q} \approx \text{normalize}\!\big(\boldsymbol{\theta}/2,\; 1\big)\]

This is the `_USE_SMALL_ANGLE_APPROX` path in Newton’s `solve_rigid_body`.
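The two conversions and the small-angle shortcut can be compared side by side. This is a plain-Python sketch with illustrative function names; it uses `atan2` in place of $2\arccos(w)$ for robustness near $w = 1$, which is a substitution on my part and not necessarily what Newton does.

```python
import math

def rotvec_to_quat(r):
    """Exact map: q = (sin(theta/2) n, cos(theta/2)), stored as (x, y, z, w)."""
    theta = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
    if theta < 1e-12:                      # avoid 0/0; identity to round-off
        return (0.5 * r[0], 0.5 * r[1], 0.5 * r[2], 1.0)
    k = math.sin(0.5 * theta) / theta
    return (r[0] * k, r[1] * k, r[2] * k, math.cos(0.5 * theta))

def rotvec_to_quat_small(r):
    """Small-angle approximation: normalize((r/2, 1))."""
    x, y, z = 0.5 * r[0], 0.5 * r[1], 0.5 * r[2]
    n = math.sqrt(x * x + y * y + z * z + 1.0)
    return (x / n, y / n, z / n, 1.0 / n)

def quat_to_rotvec(q):
    """Inverse map from a unit quaternion to axis * angle."""
    x, y, z, w = q
    if w < 0.0:                            # q and -q are the same rotation
        x, y, z, w = -x, -y, -z, -w
    s = math.sqrt(x * x + y * y + z * z)   # equals sin(theta/2)
    if s < 1e-12:
        return (2.0 * x, 2.0 * y, 2.0 * z)
    k = 2.0 * math.atan2(s, w) / s
    return (x * k, y * k, z * k)

r = (0.2, -0.4, 0.1)
print(quat_to_rotvec(rotvec_to_quat(r)))            # round trip recovers r
print(rotvec_to_quat((1e-3, 0.0, 0.0)))
print(rotvec_to_quat_small((1e-3, 0.0, 0.0)))       # nearly identical at small angles
```

The approximate quaternion encodes a rotation by $2\arctan(\theta/2)$ instead of $\theta$, so its angle error grows like $\theta^3/12$: negligible for per-step increments, visible at angles near one radian.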

Angular velocity and dq/dt

If a body has world-frame angular velocity $\boldsymbol{\omega}$, its orientation changes as:

\[\dot{\mathbf{q}} = \tfrac{1}{2}\,\widetilde{\boldsymbol{\omega}} \otimes \mathbf{q}\]

where $\widetilde{\boldsymbol{\omega}} = (\boldsymbol{\omega}, 0)$ is $\boldsymbol{\omega}$ embedded as a pure quaternion (zero scalar part).

Why? A small rotation by angle $\lVert\boldsymbol{\omega}\rVert\,\Delta t$ about axis $\boldsymbol{\omega}/\lVert\boldsymbol{\omega}\rVert$ is the quaternion:

\[\delta\mathbf{q} = \Big(\sin\!\big(\tfrac{\lVert\boldsymbol{\omega}\rVert\Delta t}{2}\big)\,\frac{\boldsymbol{\omega}}{\lVert\boldsymbol{\omega}\rVert},\;\; \cos\!\big(\tfrac{\lVert\boldsymbol{\omega}\rVert\Delta t}{2}\big)\Big) \;\approx\; \big(\boldsymbol{\omega}\,\Delta t/2,\; 1\big)\]

The new orientation is $\delta\mathbf{q}\otimes\mathbf{q}$ (left-multiply = world frame), so:

\[\mathbf{q}(t+\Delta t) - \mathbf{q}(t) = (\delta\mathbf{q} - \mathbf{1})\otimes\mathbf{q} \approx \big(\boldsymbol{\omega}\,\Delta t/2,\; 0\big)\otimes\mathbf{q}\]

Dividing by $\Delta t$ and letting $\Delta t \to 0$:

\[\dot{\mathbf{q}} = \tfrac{1}{2}\,(\boldsymbol{\omega}, 0) \otimes \mathbf{q}\]

Euler integration of this gives:

\[\mathbf{q}^{n+1} = \text{normalize}\!\big(\mathbf{q}^n + \tfrac{h}{2}\,\widetilde{\boldsymbol{\omega}} \otimes \mathbf{q}^n\big)\]

which is exactly `wp.normalize(r0 + wp.quat(w1, 0.0) * r0 * 0.5 * dt)` in Newton.

Body-frame angular velocity would use right-multiplication instead:

\[\dot{\mathbf{q}} = \tfrac{1}{2}\,\mathbf{q} \otimes (\boldsymbol{\omega}_\text{body}, 0)\]
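Both integration rules fit in a few lines. The sketch below is plain Python with illustrative names (`step_world`, `step_body`), not the Newton kernel; it spins a body at 1 rad/s about $z$ for one second and compares against the exact quaternion.

```python
import math

def quat_mul(a, b):
    """Hamilton product for quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + bw * ax + (ay * bz - az * by),
            aw * by + bw * ay + (az * bx - ax * bz),
            aw * bz + bw * az + (ax * by - ay * bx),
            aw * bw - (ax * bx + ay * by + az * bz))

def normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def step_world(q, omega, h):
    """q + (h/2) (omega, 0) * q, renormalized; omega in world frame."""
    dq = quat_mul((omega[0], omega[1], omega[2], 0.0), q)
    return normalize(tuple(qi + 0.5 * h * di for qi, di in zip(q, dq)))

def step_body(q, omega_body, h):
    """q + (h/2) q * (omega_body, 0), renormalized; omega in body frame."""
    dq = quat_mul(q, (omega_body[0], omega_body[1], omega_body[2], 0.0))
    return normalize(tuple(qi + 0.5 * h * di for qi, di in zip(q, dq)))

# Spin at 1 rad/s about z for 1 s; the exact answer is a 1 rad rotation about z.
h, steps = 1e-3, 1000
q_world = q_body = (0.0, 0.0, 0.0, 1.0)
for _ in range(steps):
    q_world = step_world(q_world, (0.0, 0.0, 1.0), h)
    q_body = step_body(q_body, (0.0, 0.0, 1.0), h)
exact = (0.0, 0.0, math.sin(0.5), math.cos(0.5))
print(q_world, exact)
```

Each normalized Euler step rotates by $2\arctan(h\lVert\boldsymbol{\omega}\rVert/2)$ rather than $h\lVert\boldsymbol{\omega}\rVert$, so the scheme under-rotates by about $(h\lVert\boldsymbol{\omega}\rVert)^3/12$ per step. The world and body variants coincide here only because the spin axis is fixed under its own rotation.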

Rotation matrix from quaternion

Sometimes you need the $3\times 3$ rotation matrix, e.g. to compute $\mathbf{I}_\text{world} = \mathbf{R}\,\mathbf{I}_\text{body}\,\mathbf{R}^T$. Given $\mathbf{q} = (x, y, z, w)$:

\[\mathbf{R} = \begin{bmatrix} 1-2(y^2+z^2) & 2(xy-wz) & 2(xz+wy) \\\\ 2(xy+wz) & 1-2(x^2+z^2) & 2(yz-wx) \\\\ 2(xz-wy) & 2(yz+wx) & 1-2(x^2+y^2) \end{bmatrix}\]

In Warp this is `quat_to_matrix(q)`.

Summary: operations used in Newton’s rigid body solver

Code	Math	What it does
`quat_rotate(q, v)`	$\mathbf{R}\,\mathbf{v}$	Rotate vector to world frame
`quat_rotate_inv(q, v)`	$\mathbf{R}^T\mathbf{v}$	Rotate vector to body frame
`quat_inverse(q_cur) * q_star`	$\mathbf{R}_\text{cur}^T\,\mathbf{R}_\star$	Relative rotation in body frame
`quat_to_axis_angle(q)` $\to$ `axis*angle`	$\boldsymbol{\theta} = \log(\mathbf{q})$	Quaternion to rotation vector
`quat(half_w, 1.0)` normalized	$\exp(\Delta\boldsymbol{\omega}/2)$	Small rotation vector to quaternion
`dq * q_current`	$\delta\mathbf{R}\,\mathbf{R}_\text{cur}$	Apply world-frame rotation increment
`wp.normalize(r0 + wp.quat(w1, 0.0) * r0 * 0.5 * dt)`	$\text{normalize}(\mathbf{q} + \tfrac{h}{2}\widetilde{\boldsymbol{\omega}}\otimes\mathbf{q})$	Integrate angular velocity one step
`quat_to_matrix(q)`	$\mathbf{R}$	$3\times 3$ rotation matrix for $\mathbf{R}\,\mathbf{I}\,\mathbf{R}^T$

See also: Rigid Body Dynamics with VBD, Section I for the full AVBD derivation that uses these operations.

Stable Neo-Hookean for VBD: Deriving the Per-Vertex Hessian

2026-04-24T00:00:00-07:00

This post derives the per-vertex 3×3 Hessian block for the stable Neo-Hookean tet material under VBD-style block Gauss-Seidel, and shows how it lands as an unconditionally PSD expression with no clamp or eigenvalue projection required. The derivation is short but the algebraic cancellation it relies on is easy to miss, so it is worth writing out in full. The post is meant as a reference for anyone wiring stable Neo-Hookean into a VBD solver.

Stable Neo-Hookean Energy and Its Hessian

For a tet with deformation gradient $\mathbf{F} \in \mathbb{R}^{3\times3}$ and energy parameters $\mu, \lambda$, the stable Neo-Hookean energy density is

\[\psi(\mathbf{F}) \;=\; \tfrac{\mu}{2}(I_C - 3) \;+\; \tfrac{\lambda}{2}(J - \alpha)^2, \qquad I_C = \|\mathbf{F}\|_F^2,\quad J = \det \mathbf{F},\quad \alpha = 1 + \tfrac{\mu}{\lambda}.\]

The shift $\alpha$ ensures $\partial\psi/\partial \mathbf{F} = \mathbf{0}$ at the rest configuration $\mathbf{F} = \mathbf{I}$; it does not prevent inversion.

A subtlety worth flagging: the symbols $\mu, \lambda$ in this energy are not directly the Lamé parameters. Matching the small-strain limit of stable Neo-Hookean to linear elasticity (Smith et al. §3.4, eq. 13) gives the relation

\[\mu_\text{NH} \;=\; \mu_\text{Lam\'e}, \qquad \lambda_\text{NH} \;=\; \lambda_\text{Lam\'e} \;+\; \mu_\text{Lam\'e}.\]

So if you are exposing material constants to users in textbook Lamé convention, convert with $\lambda_\text{NH} = \lambda_\text{Lam\'e} + \mu_\text{Lam\'e}$ before plugging into the energy. Throughout the rest of this post, $\mu, \lambda$ refer to the Neo-Hookean parameters $\mu_\text{NH}, \lambda_\text{NH}$ as they appear in the energy expression above.
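A minimal conversion helper, sketched in plain Python. The function names are hypothetical, and the Young's modulus and Poisson ratio inputs are an assumption about how material constants reach the solver; the Lamé formulas themselves are the standard textbook ones.

```python
def lame_from_young_poisson(E, nu):
    """Textbook Lame parameters from Young's modulus E and Poisson ratio nu."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam

def to_neo_hookean(mu_lame, lam_lame):
    """mu_NH = mu_Lame, lambda_NH = lambda_Lame + mu_Lame."""
    return mu_lame, lam_lame + mu_lame

def rest_stress_coefficient(mu, lam):
    """P(I) = (mu + lam * (1 - alpha)) I with alpha = 1 + mu / lam."""
    alpha = 1.0 + mu / lam
    return mu + lam * (1.0 - alpha)

mu_lame, lam_lame = lame_from_young_poisson(1.0e6, 0.45)
mu, lam = to_neo_hookean(mu_lame, lam_lame)
print(mu, lam, rest_stress_coefficient(mu, lam))   # last value is zero up to round-off
```

The rest-stress check holds for any positive pair $(\mu, \lambda)$, so it validates the choice of $\alpha$ rather than the conversion; the conversion only affects which linear material the energy matches at small strain.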

The first Piola–Kirchhoff stress is

\[\mathbf{P}(\mathbf{F}) \;=\; \frac{\partial \psi}{\partial \mathbf{F}} \;=\; \mu \mathbf{F} \;+\; s\,\text{cof}(\mathbf{F}), \qquad s \;\equiv\; \lambda(J - \alpha).\]

Vectorising $\mathbf{F}$ column-major as $\text{vec}(\mathbf{F}) \in \mathbb{R}^9$, the Hessian splits into three pieces:

\[\mathbf{H}\_\text{elastic} \;=\; \underbrace{\mu \mathbf{I}\_9}\_{\mathbf{A}\_\mu} \;+\; \underbrace{\lambda\,\text{vec}(\text{cof}\,\mathbf{F})\;\text{vec}(\text{cof}\,\mathbf{F})^T}\_{\mathbf{A}\_\lambda} \;+\; \underbrace{s\,\frac{\partial^2 J}{\partial \mathbf{F}^2}}\_{\mathbf{A}\_\sigma}.\]

$\mathbf{A}_\mu$ is a positive multiple of identity. $\mathbf{A}_\lambda$ is rank-1 PSD. $\mathbf{A}_\sigma$ is the only piece that can be indefinite: $s = \lambda(J-\alpha)$ is negative whenever $J < \alpha$ (every compressed tet, and also the rest state, since $\alpha > 1$), and $\partial^2 J/\partial \mathbf{F}^2$ has both positive and negative eigenvalues. Standard Newton-style implementations therefore SPD-project $\mathbf{A}_\sigma$ (e.g. via Smith–Kim eigenanalysis, or by clamping $s$ into a precomputed safe interval).

For VBD this projection turns out to be unnecessary. Showing why is the rest of the post.

What VBD Needs

VBD updates one vertex at a time by solving the local Newton system

\[\mathbf{H}_{aa}\,\Delta\mathbf{x}_a \;=\; \mathbf{f}_a,\]

so the only piece of the elastic Hessian that ever enters a solve is the $3\times3$ diagonal block $\mathbf{H}_{aa}$ corresponding to a single vertex $a$. Off-diagonal blocks $\mathbf{H}_{ab}$ ($a \neq b$) influence convergence rate through Gauss-Seidel coupling but never appear inside any matrix inverse.

For a linear tet, $\mathbf{F}$ is affine in the vertex positions, so each vertex contributes via a fixed rest-frame weight $\mathbf{m}^a \in \mathbb{R}^3$ (a row of $\mathbf{D}_m^{-1}$):

\[\frac{\partial F_{ij}}{\partial x_a^\alpha} \;=\; \delta_{i\alpha}\, m_j^a.\]

Throughout, $i,j,k,l$ are deformation-gradient indices ($1\ldots 3$) and $\alpha,\beta$ are spatial-coordinate indices ($1\ldots 3$); as an index, $\alpha$ is unrelated to the energy shift $\alpha = 1 + \mu/\lambda$. The per-vertex 3×3 block is

\[\mathbf{H}\_{aa}^{\alpha\beta} \;=\; \sum_{ijkl}\,\frac{\partial F\_{ij}}{\partial x\_a^\alpha}\;\frac{\partial^2 \psi}{\partial F\_{ij}\,\partial F\_{kl}}\;\frac{\partial F\_{kl}}{\partial x\_a^\beta}.\]

We will plug each of the three pieces $\mathbf{A}_\mu$, $\mathbf{A}_\lambda$, $\mathbf{A}_\sigma$ into this and simplify.

Contracting $\mathbf{A}_\mu$ and $\mathbf{A}_\lambda$

For $\mathbf{A}_\mu = \mu\,\mathbf{I}_9$:

\[\mathbf{H}\_{aa}^{\alpha\beta}\big[\mathbf{A}\_\mu\big] \;=\; \mu\,\sum\_{ij}\,\delta\_{i\alpha}\,m\_j^a\,\delta\_{i\beta}\,m\_j^a \;=\; \mu\,\delta\_{\alpha\beta}\,\|\mathbf{m}^a\|^2.\]

So the $\mathbf{A}_\mu$ contribution is $\mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}_3$.

For $\mathbf{A}_\lambda = \lambda\,\text{vec}(\text{cof}\mathbf{F})\,\text{vec}(\text{cof}\mathbf{F})^T$, define $\mathbf{w}^a = \text{cof}(\mathbf{F})\,\mathbf{m}^a \in \mathbb{R}^3$. Then

\[\sum\_{ij}\,\delta\_{i\alpha}\,m\_j^a\,(\text{cof}\,\mathbf{F})\_{ij} \;=\; \sum\_j (\text{cof}\,\mathbf{F})\_{\alpha j}\, m\_j^a \;=\; w\_\alpha^a,\]

so $\mathbf{H}_{aa}^{\alpha\beta}[\mathbf{A}_\lambda] = \lambda\,w_\alpha^a\,w_\beta^a$, i.e. the rank-1 dyad $\lambda\,\mathbf{w}^a(\mathbf{w}^a)^T$.

Both contributions are PSD by inspection.

Contracting $\mathbf{A}_\sigma$

The Hessian of $J = \det \mathbf{F}$ is the Levi-Civita identity

\[\frac{\partial^2 J}{\partial F\_{ij}\,\partial F\_{kl}} \;=\; \varepsilon\_{ikp}\,\varepsilon\_{jlq}\,F\_{pq}.\]

This tensor is nonzero in general, but contract it with $\partial F/\partial x_a$ on both legs:

\[\begin{aligned} \mathbf{H}\_{aa}^{\alpha\beta}\big[\mathbf{A}\_\sigma\big] &= s\,\sum\_{ijkl}\,\delta\_{i\alpha}\,m\_j^a \,\cdot\, \varepsilon\_{ikp}\,\varepsilon\_{jlq}\,F\_{pq}\,\cdot\,\delta\_{k\beta}\,m\_l^a \\\\ &= s\,\varepsilon\_{\alpha\beta p}\,F\_{pq}\,\sum\_{j,l}\, m\_j^a\,\varepsilon\_{jlq}\,m\_l^a \\\\ &= s\,\varepsilon\_{\alpha\beta p}\,F\_{pq}\,(\mathbf{m}^a \times \mathbf{m}^a)\_q \\\\ &= 0. \end{aligned}\]

The inner sum is the cross product of $\mathbf{m}^a$ with itself, which vanishes for any vector. The cancellation goes through for any $\mathbf{F}$ (including $\det \mathbf{F} \leq 0$), any $\mathbf{m}^a$, and any scalar $s$.

The structural reason: $\partial F/\partial x_a^\alpha = \mathbf{e}_\alpha \otimes \mathbf{m}^a$ is a rank-1 dyad. The tensor $\partial^2 J/\partial F_{ij}\,\partial F_{kl}$ is antisymmetric in its column indices $(j,l)$, so sandwiching it between two copies of the same rank-1 dyad contracts both of those indices with the same vector $\mathbf{m}^a$, and antisymmetry collapses the contraction to zero. Off-diagonal blocks $\mathbf{H}_{ab}$ for $a \neq b$ replace $\mathbf{m}^a \times \mathbf{m}^a$ with $\mathbf{m}^a \times \mathbf{m}^b$, which is generically nonzero, so they do see $\mathbf{A}_\sigma$.
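A quick numerical sanity check of the cancellation, as a sketch under stated assumptions: the rest shape is taken to be the unit reference tet so that $\mathbf{F} = \mathbf{D}_s$, and the helper names are illustrative. Because the contracted $\mathbf{A}_\sigma$ block is $s$ times the second derivative of $J$ with respect to $\mathbf{x}_a$, it vanishes exactly when $J$ is affine in any single vertex position, while moving two different vertices at once exposes the bilinear coupling that the off-diagonal blocks see.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def J_of(x):
    """det F for a tet whose rest shape is the unit reference tet, so F = Ds."""
    rows = [[x[k][i] - x[0][i] for i in range(3)] for k in (1, 2, 3)]
    return det3(rows)   # rows hold the columns of Ds; det is transpose-invariant

def second_difference(x, moves, h=0.1):
    """Central second difference of J when each vertex a shifts by t * moves[a]."""
    zero = (0.0, 0.0, 0.0)

    def J_at(t):
        return J_of([[p[i] + t * moves.get(a, zero)[i] for i in range(3)]
                     for a, p in enumerate(x)])

    return (J_at(h) - 2.0 * J_at(0.0) + J_at(-h)) / (h * h)

# A deformed, inverted tet (J < 0): the cancellation does not need J > 0.
x = [(0.1, -0.2, 0.0), (1.3, 0.1, -0.2), (-0.1, 0.9, 0.3), (0.2, 0.1, -0.8)]
d = (0.3, -0.7, 0.5)
single = [second_difference(x, {a: d}) for a in range(4)]   # one vertex at a time

# Moving vertices 1 and 2 of the reference tet together picks up a t^2 term.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
pair = second_difference(ref, {1: (1.0, 0.0, 0.0), 2: (0.0, 1.0, 0.0)})
print(single, pair)
```

The single-vertex second differences are zero up to round-off for every vertex, including vertex 0, while the two-vertex move gives a clearly nonzero curvature.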

The Per-Vertex Block

Combining the three contractions,

\[\boxed{\;\mathbf{H}\_{aa} \;=\; \mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}\_3 \;+\; \lambda\,\mathbf{w}^a (\mathbf{w}^a)^T,\qquad \mathbf{w}^a = \text{cof}(\mathbf{F})\,\mathbf{m}^a.\;}\]

Both summands are PSD for any $\mathbf{F}$ and any $\mathbf{m}^a$:

$\mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}_3$ is a positive multiple of identity.
$\lambda\,\mathbf{w}^a(\mathbf{w}^a)^T$ is a rank-1 outer product with positive coefficient.

So the per-vertex block is unconditionally PSD with no projection step. The cofactor-derivative term that complicates Newton-style implementations does not contribute to it.
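As a worked consequence, here is the spectrum and inverse of the block in closed form. This is a sketch that applies only to a single tet's contribution, up to the rest-volume factor; a real VBD block also sums the other incident elements and the inertia term. The shorthand $c = \mu\,\|\mathbf{m}^a\|^2$ is introduced here for convenience.

\[\text{eig}(\mathbf{H}_{aa}) = \big\{\,c + \lambda\,\|\mathbf{w}^a\|^2,\; c,\; c\,\big\}, \qquad \mathbf{H}_{aa}^{-1} = \frac{1}{c}\Big(\mathbf{I}_3 - \frac{\lambda\,\mathbf{w}^a(\mathbf{w}^a)^T}{c + \lambda\,\|\mathbf{w}^a\|^2}\Big).\]

The first eigenvalue belongs to the direction $\mathbf{w}^a$ and the double eigenvalue $c$ to the plane orthogonal to it. The inverse is the Sherman-Morrison formula, so this contribution is strictly positive definite whenever $\mu > 0$ and $\mathbf{m}^a \neq \mathbf{0}$.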

The corresponding per-vertex elastic force contracts the true (unclamped) stress with the same weight $\mathbf{m}^a$:

\[\mathbf{f}\_a \;=\; -\mathbf{P}(\mathbf{F})\,\mathbf{m}^a \;=\; -\mu\,\mathbf{F}\,\mathbf{m}^a \;-\; s\,\mathbf{w}^a.\]

Forces use the real $s = \lambda(J-\alpha)$ even when it is negative; this is what carries the inversion-recovery signal in stable Neo-Hookean.

Implementation

The full evaluator multiplies the result by the rest volume and (optionally) adds a damping contribution. In Warp the elastic part is just:

@wp.func
def evaluate_volumetric_neo_hookean_force_and_hessian(
    tet_id: int, v_order: int,
    pos: wp.array[wp.vec3],
    tet_indices: wp.array2d[wp.int32],
    Dm_inv: wp.mat33,
    mu: float, lmbd: float,
):
    v0 = pos[tet_indices[tet_id, 0]]
    v1 = pos[tet_indices[tet_id, 1]]
    v2 = pos[tet_indices[tet_id, 2]]
    v3 = pos[tet_indices[tet_id, 3]]
    rest_volume = 1.0 / (wp.determinant(Dm_inv) * 6.0)

    # F = D_s D_m^{-1}
    Ds = wp.matrix_from_cols(v1 - v0, v2 - v0, v3 - v0)
    F = Ds * Dm_inv

    # Per-vertex weight m^a (a row of D_m^{-1}; vertex 0 is the negative sum)
    if v_order == 0:
        m = wp.vec3(-(Dm_inv[0,0] + Dm_inv[1,0] + Dm_inv[2,0]),
                    -(Dm_inv[0,1] + Dm_inv[1,1] + Dm_inv[2,1]),
                    -(Dm_inv[0,2] + Dm_inv[1,2] + Dm_inv[2,2]))
    elif v_order == 1:
        m = wp.vec3(Dm_inv[0,0], Dm_inv[0,1], Dm_inv[0,2])
    elif v_order == 2:
        m = wp.vec3(Dm_inv[1,0], Dm_inv[1,1], Dm_inv[1,2])
    else:
        m = wp.vec3(Dm_inv[2,0], Dm_inv[2,1], Dm_inv[2,2])

    # Stress (uses the TRUE s, no clamp)
    J     = wp.determinant(F)
    alpha = 1.0 + mu / lmbd
    s     = lmbd * (J - alpha)
    cof   = compute_cofactor(F)             # cofactor matrix dJ/dF (transpose of the adjugate), via cross products

    # Per-vertex auxiliary vectors
    Fm = F * m                              # mu term
    w  = cof * m                            # lambda term: w^a = cof(F) m^a

    # Force: f_a = -P m^a
    force = -rest_volume * (mu * Fm + s * w)

    # Hessian: H_aa = mu ||m||^2 I + lambda w w^T
    I3      = wp.identity(n=3, dtype=float)
    hessian = rest_volume * (mu * wp.dot(m, m) * I3 + lmbd * wp.outer(w, w))
    return force, hessian

The 9×9 elastic Hessian never gets assembled; nothing is clamped. Compared to a textbook implementation that builds $\mathbf{A}_\mu + \mathbf{A}_\lambda + \mathbf{A}_\sigma$ as a $9\times9$ matrix, projects it, then contracts with $\partial F/\partial x_a$, this is a small constant-factor saving per tet per VBD inner iteration.

Why This Doesn’t Extend to Triangle Membranes

It is tempting to apply the same logic to a stable Neo-Hookean triangle membrane and conclude that its per-vertex 3×3 block also drops the cofactor-derivative term. It does not. The cancellation hinges on a structural property of the volumetric case that the membrane does not share.

For a 3D triangle in 2D rest space, the deformation gradient is $\mathbf{F} \in \mathbb{R}^{3\times 2}$ with columns $\mathbf{f}_0, \mathbf{f}_1$. The natural area scalar that plays the role of $J$ is

\[J_s \;=\; \sqrt{\det(\mathbf{F}^T \mathbf{F})} \;=\; \|\mathbf{f}_0 \times \mathbf{f}_1\|.\]

Two things change relative to the volumetric case:

$J_s$ is not a polynomial in $\mathbf{F}$ (it is a square root). Its second derivative does not have the clean Levi-Civita form $\partial^2 J/\partial F_{ij}\partial F_{kl} = \varepsilon_{ikp}\varepsilon_{jlq}F_{pq}$. There is in fact an extra $-(1/J_s)\,\nabla J_s \otimes \nabla J_s$ piece coming from differentiating the $1/J_s$ factor in $\nabla J_s = (\mathbf{n}\cdot\nabla\mathbf{n})/J_s$.
Rows and columns of $\mathbf{F}$ live in different spaces. Row indices run over 3D world coordinates $(i \in \{1,2,3\})$, column indices run over 2D parameter coordinates $(j \in \{1,2\})$. The 3-index Levi-Civita $\varepsilon_{jlq}$ that produced $\mathbf{m}\times\mathbf{m}$ in the volumetric proof has nowhere to live on the column-index leg — there are only two column indices to antisymmetrise over, not three.

Concretely, the per-vertex contraction in the membrane case becomes

\[\mathbf{H}\_{aa}^{\alpha\beta}\big[\mathbf{A}\_\sigma^\text{2D}\big] \;=\; s\,\sum\_{j,l \in \\{0,1\\}}\,m\_j^a\,m\_l^a\,\frac{\partial^2 J\_s}{\partial F\_{\alpha j}\,\partial F\_{\beta l}},\]

with no antisymmetric-in-$(j,l)$ structure to exploit. Working through the algebra with $\mathbf{n} = \mathbf{f}_0\times\mathbf{f}_1$ gives a clean form for the contracted block:

\[\mathbf{H}\_{aa}\big[\mathbf{A}\_\sigma^\text{2D}\big] \;=\; \frac{s}{J_s}\,\Big(\|\mathbf{w}\|^2\,\mathbf{I}\_3 \;-\; \mathbf{w}\mathbf{w}^T \;-\; \boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T\Big),\]

with $\mathbf{w} = \mathbf{f}_1\,m_0^a - \mathbf{f}_0\,m_1^a$ and $\boldsymbol{\nabla}J_s = \mathbf{g}_0\,m_0^a + \mathbf{g}_1\,m_1^a$, $\mathbf{g}_j = \partial J_s/\partial \mathbf{f}_j$. None of these vanish in general; in fact the $\|\mathbf{w}\|^2 \mathbf{I}_3 - \mathbf{w}\mathbf{w}^T$ piece is $\|\mathbf{w}\|^2$ times the projector onto the plane orthogonal to $\mathbf{w}$, which contains the membrane normal, so it produces a genuine out-of-plane stiffness. The cofactor-derivative term carries real physics here.

Geometric reading. The volumetric tet has no “extra” direction — both legs of $\mathbf{F}$ span the same 3D space, and the Levi-Civita pattern absorbs all three coordinate axes uniformly. The membrane has a normal direction that is not in the column space of $\mathbf{F}$; the second-derivative term contributes precisely along that normal. Stripping it would weaken out-of-plane resistance and change the physics, not just save flops.

The Tight PSD Clamp for the Membrane

Although the cofactor-derivative term has to stay, the per-vertex 3×3 block still has a clean PSD characterisation. Combining the three contractions for the membrane case,

\[\mathbf{H}\_{aa} \;=\; \mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}\_3 \;+\; (\lambda - r)\,\boldsymbol{\nabla}J\_s\,\boldsymbol{\nabla}J\_s^T \;+\; r\,\big(\|\mathbf{w}\|^2\,\mathbf{I}\_3 - \mathbf{w}\mathbf{w}^T\big), \qquad r \;\equiv\; \frac{s}{J_s}.\]

(The $\lambda\,\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$ piece comes from $\mathbf{A}_\lambda$ in the membrane case — the rank-1 cofactor outer product specialises to $\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$ here. The $-r\,\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$ piece comes from the $\mathbf{A}_\sigma$ contraction, which is why the two combine.)

Two algebraic identities make this block diagonalisable.

Lemma 1. $\mathbf{w} \cdot \boldsymbol{\nabla}J_s = 0$.

Proof. Direct computation using $\mathbf{g}_\alpha = \partial J_s/\partial \mathbf{f}_\alpha$ and $J_s^2 = AB - C^2$ with $A = |\mathbf{f}_0|^2,\ B = |\mathbf{f}_1|^2,\ C = \mathbf{f}_0\cdot \mathbf{f}_1$ gives $\mathbf{f}_1\cdot\mathbf{g}_0 = \mathbf{f}_0\cdot\mathbf{g}_1 = 0$ and $\mathbf{f}_0\cdot\mathbf{g}_0 = \mathbf{f}_1\cdot\mathbf{g}_1 = J_s$. Expanding $\mathbf{w}\cdot\boldsymbol{\nabla}J_s$ in $(m_0^a, m_1^a)$ and substituting collapses the four terms to $J_s\,m_0^a m_1^a - J_s\,m_0^a m_1^a = 0$. $\square$

Lemma 2. $|\mathbf{w}| = |\boldsymbol{\nabla}J_s|$.

Proof. Compute $\mathbf{w}\times\boldsymbol{\nabla}J_s$ from the four cross products $\mathbf{f}_i\times\mathbf{g}_j$, each of which reduces to a scalar multiple of $\mathbf{n} = \mathbf{f}_0\times\mathbf{f}_1$. Summing them gives

\[\mathbf{w}\times\boldsymbol{\nabla}J\_s \;=\; -\frac{\mathbf{n}}{J\_s}\,\big(A(m\_1^a)^2 - 2C\,m\_0^a m\_1^a + B(m\_0^a)^2\big) \;=\; -\|\mathbf{w}\|^2\,\hat{\mathbf{n}},\]

where the last equality uses $|\mathbf{w}|^2 = A(m_1^a)^2 - 2C\,m_0^a m_1^a + B(m_0^a)^2$ and $\hat{\mathbf{n}} = \mathbf{n}/J_s$. By Lemma 1, $\mathbf{w}\perp\boldsymbol{\nabla}J_s$, so $|\mathbf{w}\times\boldsymbol{\nabla}J_s| = |\mathbf{w}|\,|\boldsymbol{\nabla}J_s|$. Equating with the norm of the right-hand side gives $|\mathbf{w}|\,|\boldsymbol{\nabla}J_s| = |\mathbf{w}|^2$, hence $|\boldsymbol{\nabla}J_s| = |\mathbf{w}|$ whenever $\mathbf{w} \neq \mathbf{0}$ (both vanish together when $\mathbf{m}^a = \mathbf{0}$). $\square$
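Both lemmas are easy to check numerically. The sketch below is an illustration, not part of any solver: it draws random columns $\mathbf{f}_0, \mathbf{f}_1$ and random components $(m_0^a, m_1^a)$, builds $\mathbf{g}_\alpha$ by differentiating $J_s^2 = AB - C^2$, and evaluates the quantities the lemmas say should vanish. The NumPy variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
f0, f1 = rng.normal(size=3), rng.normal(size=3)   # columns of the 3x2 F
m0, m1 = rng.normal(size=2)                        # components of m^a (random test values)

A, B, C = f0 @ f0, f1 @ f1, f0 @ f1
Js = np.sqrt(A * B - C * C)                        # J_s = |f0 x f1|

# g_alpha = dJ_s/df_alpha, obtained by differentiating J_s^2 = AB - C^2
g0 = (B * f0 - C * f1) / Js
g1 = (A * f1 - C * f0) / Js

w = f1 * m0 - f0 * m1
gradJ = g0 * m0 + g1 * m1
n = np.cross(f0, f1)

lemma1 = w @ gradJ                                   # Lemma 1: should be ~0
lemma2 = np.linalg.norm(w) - np.linalg.norm(gradJ)   # Lemma 2: should be ~0
# Cross-product identity: w x gradJ + |w|^2 n / J_s should be ~0
cross_residual = np.cross(w, gradJ) + (w @ w) * n / Js
```

All three residuals should sit at round-off level for any non-degenerate triangle.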

Diagonalisation. Choose the orthonormal basis $\{\hat{\mathbf{w}}, \widehat{\boldsymbol{\nabla}J_s}, \hat{\mathbf{n}}\}$ where $\hat{\mathbf{n}}$ is the unit triangle normal (orthogonal to both $\mathbf{w}$ and $\boldsymbol{\nabla}J_s$ by Lemma 1 and the cross-product computation). Off-diagonal entries of $\mathbf{H}_{aa}$ vanish in this basis (each of the three building blocks $\mathbf{I}_3$, $\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$, $|\mathbf{w}|^2\mathbf{I}_3 - \mathbf{w}\mathbf{w}^T$ is diagonal in it), and using $|\mathbf{w}| = |\boldsymbol{\nabla}J_s|$ to combine the $r$-terms in the $\widehat{\boldsymbol{\nabla}J_s}$ direction:

Direction	Eigenvalue
$\hat{\mathbf{w}}$	$\mu\,\|\mathbf{m}^a\|^2$
$\widehat{\boldsymbol{\nabla}J_s}$	$\mu\,\|\mathbf{m}^a\|^2 + \lambda\,\|\boldsymbol{\nabla}J_s\|^2$
$\hat{\mathbf{n}}$	$\mu\,\|\mathbf{m}^a\|^2 + r\,\|\mathbf{w}\|^2$

The first two eigenvalues are nonnegative for any $r$ (given $\mu, \lambda \geq 0$): the $r$-dependence in the $\widehat{\boldsymbol{\nabla}J_s}$ direction cancels exactly because $|\mathbf{w}| = |\boldsymbol{\nabla}J_s|$. Only the normal direction sees $r$, and the PSD condition there is

\[r \;\geq\; -\frac{\mu\,\|\mathbf{m}^a\|^2}{\|\mathbf{w}\|^2}.\]

The right-hand side is geometry-dependent. For a uniform clamp that works for every triangle and every vertex, the only safe choice is $r \geq 0$, i.e.

\[\boxed{\;s\_\text{clamp} \;=\; \max(0, s).\;}\]

This is tight in the uniform sense: any larger lower bound on $s$ would change the physics for at least some configurations where the unclamped block is already PSD; any smaller (more permissive) bound risks an indefinite block in some configuration.

A geometry-aware solver could instead use the per-element lower bound $s \geq -\mu\,|\mathbf{m}^a|^2 J_s/|\mathbf{w}|^2$ and recover a slightly looser projection, but the bookkeeping cost rarely justifies it. Force always uses the unclamped $s = \lambda(J_s - \alpha)$, exactly as in the volumetric case.
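As a sanity check on the eigenvalue table and the clamp, here is a minimal NumPy sketch. It assembles $\mathbf{H}_{aa}$ from the three-term formula, compares its spectrum with the closed-form eigenvalues, and confirms that the clamped block is positive definite. The function name `membrane_block`, the material values, and the reading of $\|\mathbf{m}^a\|^2$ as $(m_0^a)^2 + (m_1^a)^2$ are assumptions made for the illustration.

```python
import numpy as np

def membrane_block(f0, f1, m, mu, lam, s):
    """Assemble H_aa from the three-term formula, with r = s / J_s."""
    A, B, C = f0 @ f0, f1 @ f1, f0 @ f1
    Js = np.sqrt(A * B - C * C)
    g0 = (B * f0 - C * f1) / Js
    g1 = (A * f1 - C * f0) / Js
    w = f1 * m[0] - f0 * m[1]
    gradJ = g0 * m[0] + g1 * m[1]
    r = s / Js
    I3 = np.eye(3)
    H = (mu * (m @ m) * I3
         + (lam - r) * np.outer(gradJ, gradJ)
         + r * ((w @ w) * I3 - np.outer(w, w)))
    return H, w, gradJ, r

rng = np.random.default_rng(1)
f0, f1, m = rng.normal(size=3), rng.normal(size=3), rng.normal(size=2)
mu, lam = 1.0, 10.0                 # illustrative material parameters

s_raw = -50.0                       # a strongly negative stress scalar
H_raw, w, gradJ, r = membrane_block(f0, f1, m, mu, lam, s_raw)

# Closed-form eigenvalues from the table, sorted to match eigvalsh
predicted = np.sort([mu * (m @ m),
                     mu * (m @ m) + lam * (gradJ @ gradJ),
                     mu * (m @ m) + r * (w @ w)])
computed = np.linalg.eigvalsh(H_raw)

# Uniform clamp: s_clamp = max(0, s)
H_clamped, *_ = membrane_block(f0, f1, m, mu, lam, max(0.0, s_raw))
min_eig_clamped = np.linalg.eigvalsh(H_clamped).min()
```

With the clamp active, the smallest eigenvalue falls back to $\mu\,\|\mathbf{m}^a\|^2$, which is the floor the table predicts for $r = 0$.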

A stable Neo-Hookean triangle evaluator therefore keeps the second-derivative contribution and applies the simple uniform clamp $s_\text{clamp} = \max(0, s)$. The simple result for the volumetric tet is genuinely a special property of square deformation gradients.

Sanity Checks Before Shipping

A few things worth verifying when wiring this up:

The shift $\alpha = 1 + \mu/\lambda$ depends on $\lambda \neq 0$. Guard against $\lambda$ near zero (e.g. $\lambda \mapsto \text{sign}(\lambda)\,\max(|\lambda|, \epsilon)$).

Use the explicit cofactor / adjugate $\text{cof}(\mathbf{F})$ rather than $J\,\mathbf{F}^{-T}$. The adjugate is a polynomial in the entries of $\mathbf{F}$ and remains well-defined as $J \to 0$, while $\mathbf{F}^{-T}$ blows up.
The force expression carries the signed $s = \lambda(J - \alpha)$, including when $J < 0$ (inverted tet) or $J < \alpha$ (compressed). This is what pulls inverted tets back through $J = 0$.
The cancellation breaks for higher-order elements (quadratic tets, hexes, isogeometric basis), where $\partial F/\partial x_a$ is no longer a constant rank-1 dyad. If you adapt this evaluator to such an element, the $\mathbf{A}_\sigma$ term reappears and needs a PSD projection.
The cancellation also breaks for off-diagonal blocks, so a global Newton solver assembling $\mathbf{H}_{ab}$ for $a \neq b$ does need a clamp. VBD’s per-vertex block does not.
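The first two checks can be sketched in a few lines of NumPy. The helper names `guarded_lambda` and `cofactor3`, the default `eps`, and the choice to treat $\lambda = 0$ as positive are assumptions for this illustration, not taken from any particular codebase.

```python
import numpy as np

def guarded_lambda(lam, eps=1e-8):
    """sign(lam) * max(|lam|, eps); lam == 0 is treated as positive (an assumption)."""
    sign = -1.0 if lam < 0.0 else 1.0
    return sign * max(abs(lam), eps)

def cofactor3(F):
    """cof(F) from cross products of the columns of F: polynomial, no division by J."""
    c0, c1, c2 = F[:, 0], F[:, 1], F[:, 2]
    return np.column_stack([np.cross(c1, c2), np.cross(c2, c0), np.cross(c0, c1)])

F = np.array([[1.0, 0.2, 0.0],
              [0.0, 0.9, 0.1],
              [0.3, 0.0, 1.1]])
cof_poly = cofactor3(F)
cof_inv = np.linalg.det(F) * np.linalg.inv(F).T   # J F^{-T}, valid only for J != 0

F_flat = F.copy()
F_flat[:, 2] = F_flat[:, 0]          # collapsed element: two equal columns, J = 0
cof_flat = cofactor3(F_flat)         # still finite, whereas inv(F_flat) is undefined
```

For an invertible $\mathbf{F}$ the two routes agree; for the collapsed element only the polynomial form is usable.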

Summary

For a linear tet with a stable Neo-Hookean energy, the VBD per-vertex block reduces to

\[\mathbf{H}\_{aa} \;=\; \mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}\_3 \;+\; \lambda\,(\text{cof}\,\mathbf{F}\,\mathbf{m}^a)(\text{cof}\,\mathbf{F}\,\mathbf{m}^a)^T,\]

unconditionally PSD without any projection of the cofactor-derivative term. The cancellation comes from $\partial F_{ij}/\partial x_a^\alpha = \delta_{i\alpha}\,m_j^a$ being a rank-1 dyad and the Hessian of $\det\mathbf{F}$ being antisymmetric in matching index pairs, so the contraction collapses through $\mathbf{m}^a \times \mathbf{m}^a = 0$. Force uses the unclamped stress and inversion recovery is carried by the gradient, not the Hessian.
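A minimal NumPy sketch of that block, evaluated on a deliberately inverted element, shows the unconditional PSD property in action. The function name `tet_vertex_block` and the numeric values for $\mathbf{m}^a$, $\mu$, $\lambda$ are illustrative assumptions.

```python
import numpy as np

def tet_vertex_block(F, m, mu, lam):
    """mu |m|^2 I + lam (cof(F) m)(cof(F) m)^T for one vertex of a linear tet."""
    c0, c1, c2 = F[:, 0], F[:, 1], F[:, 2]
    cof = np.column_stack([np.cross(c1, c2), np.cross(c2, c0), np.cross(c0, c1)])
    v = cof @ m
    return mu * (m @ m) * np.eye(3) + lam * np.outer(v, v)

F_inverted = np.diag([1.0, 1.0, -0.5])     # det F < 0: an inverted element
m = np.array([0.3, -0.2, 0.5])             # hypothetical value of m^a
H = tet_vertex_block(F_inverted, m, mu=1.0, lam=10.0)
eigs = np.linalg.eigvalsh(H)               # ascending order
```

The spectrum is $\mu\|\mathbf{m}^a\|^2$ twice plus one eigenvalue raised by $\lambda\,\|\text{cof}\,\mathbf{F}\,\mathbf{m}^a\|^2$, so no eigenvalue can drop below $\mu\|\mathbf{m}^a\|^2$ regardless of the sign of $J$.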

Rigid Body Dynamics with VBD, Section I: Free Bodies

2026-03-19T00:00:00-07:00

In the VBD paper (SIGGRAPH 2024), we briefly discuss extending Vertex Block Descent to rigid body simulation. The idea is natural: instead of updating a single vertex with 3 DoF, you update an entire rigid body with 6 DoF. But the details matter. This post walks through the full derivation—from the continuous Newton-Euler equations, to discrete backward Euler as a nonlinear system, to the Schur complement solve you actually run each iteration—with reference code from Newton, which implements this approach under the name AVBD (Augmented VBD).

This is Section I: free (unconstrained) rigid bodies. Section II will cover articulated bodies with joints.

Prerequisite: This post assumes familiarity with quaternion math for rigid body rotation—in particular the rotation-vector exponential map, quaternion multiplication, and how angular velocity integrates orientation. If you’re rusty on any of this, I recommend reading Quaternion Math for Rigid Body Simulation first.

Continuous Rigid Body Dynamics

A rigid body has two coupled equations of motion. For the translational DoF:

\[m \ddot{\mathbf{x}}_\text{com} = \mathbf{f}\]

where $m$ is the total mass, $\mathbf{x}_\text{com}$ is the world-space center-of-mass position, and $\mathbf{f}$ includes gravity, contact forces, and applied forces.

For the rotational DoF, the Newton-Euler equation in the body frame (where inertia is constant) is:

\[\mathbf{I}_\text{body}\,\dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}) = \boldsymbol{\tau}\]

where $\boldsymbol{\omega}$ is the angular velocity in the body frame, $\boldsymbol{\tau}$ is the torque mapped to the body frame, and $\mathbf{I}_\text{body}$ is the constant body-frame inertia tensor. In the world frame this is equivalently $\mathbf{I}_\text{world}\,\dot{\boldsymbol{\omega}}_\text{world} = \boldsymbol{\tau}_\text{world} - \boldsymbol{\omega}_\text{world} \times \mathbf{I}_\text{world}\,\boldsymbol{\omega}_\text{world}$ with $\mathbf{I}_\text{world} = \mathbf{R}\,\mathbf{I}_\text{body}\,\mathbf{R}^T$.

The full state at step $n$ is: position $\mathbf{x}^n \in \mathbb{R}^3$, orientation $\mathbf{R}^n \in SO(3)$, linear velocity $\mathbf{v}^n$, body-frame angular velocity $\boldsymbol{\omega}^n$, mass $m$, and body inertia $\mathbf{I}_\text{body}$.

Discretizing with Backward Euler

Step 1: Pose Increments as DoFs

Rather than solving for $\mathbf{x}^{n+1}$ and $\mathbf{R}^{n+1}$ directly, we introduce pose increments as the unknowns:

\[\Delta\mathbf{x} \in \mathbb{R}^3, \qquad \Delta\boldsymbol{\theta} \in \mathbb{R}^3 \;\text{(rotation vector)}\]

The new pose is then:

\[\mathbf{x}^{n+1} = \mathbf{x}^n + \Delta\mathbf{x}\] \[\mathbf{R}^{n+1} = \exp(\widehat{\Delta\boldsymbol{\theta}})\,\mathbf{R}^n\]

where $\widehat{\Delta\boldsymbol{\theta}}$ is the skew-symmetric matrix of $\Delta\boldsymbol{\theta}$. This is the standard left-perturbation on $SO(3)$. We will find $\Delta\mathbf{x}$ and $\Delta\boldsymbol{\theta}$ by enforcing implicit Euler as a 6-equation residual system.
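For concreteness, here is a small NumPy sketch of this pose update using Rodrigues' formula for the exponential map. The helper names (`skew`, `exp_so3`, `apply_increment`) and the small-angle threshold are my own choices for illustration; Newton itself works with quaternions.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(theta):
    """Rodrigues' formula for exp(hat(theta))."""
    angle = np.linalg.norm(theta)
    K = skew(theta)
    if angle < 1e-8:
        return np.eye(3) + K + 0.5 * K @ K     # second-order Taylor fallback
    return (np.eye(3)
            + np.sin(angle) / angle * K
            + (1.0 - np.cos(angle)) / angle**2 * K @ K)

def apply_increment(x, R, dx, dtheta):
    """Left-perturbation update: x + dx and exp(hat(dtheta)) R."""
    return x + dx, exp_so3(dtheta) @ R

x_new, R_new = apply_increment(np.array([1.0, 2.0, 3.0]), np.eye(3),
                               np.array([0.0, 0.0, 0.5]),
                               np.array([0.0, 0.0, np.pi / 2]))
```

Because the increment multiplies from the left, $\Delta\boldsymbol{\theta}$ is applied after $\mathbf{R}^n$, and the result stays on $SO(3)$ up to round-off.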

Step 2: Translational Residual

Start from the standard backward Euler update:

\[m\,\frac{\mathbf{v}^{n+1} - \mathbf{v}^n}{h} = \mathbf{f}(\mathbf{x}^{n+1}, \mathbf{R}^{n+1})\]

Use the kinematic relation $\mathbf{x}^{n+1} = \mathbf{x}^n + h\,\mathbf{v}^{n+1}$, i.e. $\mathbf{v}^{n+1} = \Delta\mathbf{x}/h$, to eliminate $\mathbf{v}^{n+1}$:

\[m\,\frac{\Delta\mathbf{x}/h - \mathbf{v}^n}{h} = \mathbf{f}(\mathbf{x}^n + \Delta\mathbf{x},\; \mathbf{R}^{n+1}(\Delta\boldsymbol{\theta}))\]

Rearranging to residual form:

\[\mathbf{r}_\text{lin}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \;=\; \frac{m}{h^2}\!\left(\Delta\mathbf{x} - h\mathbf{v}^n\right) - \mathbf{f}(\mathbf{x}^n + \Delta\mathbf{x},\; \mathbf{R}^{n+1}) \;=\; \mathbf{0}\]

Step 3: Rotational Residual (Body Frame)

Recall the Newton-Euler equation:

\[\mathbf{I}_\text{body}\,\dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}) = \boldsymbol{\tau}\]

Converting to the standard ODE form $\dot{\boldsymbol{\omega}}=f(\boldsymbol{\omega}, t)$ gives $\dot{\boldsymbol{\omega}}= \mathbf{I}_\text{body}^{-1}\big(\boldsymbol{\tau} - \boldsymbol{\omega} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega})\big)$.

Work in the body frame where $\mathbf{I}_\text{body}$ is constant. Backward Euler on the angular velocity gives:

\[\mathbf{I}_\text{body}\,\frac{\boldsymbol{\omega}^{n+1} - \boldsymbol{\omega}^n}{h} + \boldsymbol{\omega}^{n+1} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}^{n+1}) = \boldsymbol{\tau}^{n+1}\]

The rotation increment $\Delta\boldsymbol{\theta}$ integrates to $\mathbf{R}^{n+1}$, so we identify:

\[\boldsymbol{\omega}^{n+1} \approx \frac{\Delta\boldsymbol{\theta}}{h}\]

(constant angular velocity over the step whose integrated angle equals $\Delta\boldsymbol{\theta}$). Substituting into the backward Euler equation:

\[\mathbf{I}_\text{body}\!\left(\frac{\Delta\boldsymbol{\theta}}{h^2} - \frac{\boldsymbol{\omega}^n}{h}\right) + \frac{1}{h^2}\,\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta}) = \boldsymbol{\tau}^{n+1}\]

Rearranging to residual form:

\[\mathbf{r}_\text{rot}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \;=\; \mathbf{I}_\text{body}\!\left(\frac{\Delta\boldsymbol{\theta}}{h^2} - \frac{\boldsymbol{\omega}^n}{h}\right) + \frac{\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta})}{h^2} - \boldsymbol{\tau}^{n+1}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \;=\; \mathbf{0}\]

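The term $\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta})/h^2$ is the gyroscopic term. It is quadratic in $\Delta\boldsymbol{\theta}$, which makes the rotational residual nonlinear even when the torque is constant.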

Step 4: Combined Nonlinear System

Stack both residuals into a single 6-equation system:

\[F(\Delta\mathbf{x},\,\Delta\boldsymbol{\theta}) = \begin{bmatrix} \mathbf{r}_\text{lin}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \\\\ \mathbf{r}_\text{rot}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \end{bmatrix} = \mathbf{0}\]

This is solved with Newton’s method. Initialize $\Delta\mathbf{x} = h\mathbf{v}^n$, $\Delta\boldsymbol{\theta} = h\boldsymbol{\omega}^n$ (explicit Euler guess), then iterate:

Evaluate residual $F$
Build Jacobian $\mathbf{J} = \partial F / \partial (\Delta\mathbf{x},\, \Delta\boldsymbol{\theta})$
Solve $\mathbf{J}\,\delta = -F$
Update $\Delta\mathbf{x} \mathrel{+}= \delta_x$, $\Delta\boldsymbol{\theta} \mathrel{+}= \delta_\theta$

Once converged, recover the new state and velocities:

\[\mathbf{x}^{n+1} = \mathbf{x}^n + \Delta\mathbf{x}, \qquad \mathbf{R}^{n+1} = \exp(\widehat{\Delta\boldsymbol{\theta}})\,\mathbf{R}^n\] \[\mathbf{v}^{n+1} = \frac{\Delta\mathbf{x}}{h}, \qquad \boldsymbol{\omega}^{n+1} = \frac{\Delta\boldsymbol{\theta}}{h}\]
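To make the loop concrete, here is a minimal NumPy sketch of the Newton iteration for the simplest case: a free body under gravity with zero torque. It takes the two residuals above at face value and assumes the force and torque do not depend on the pose, so the Jacobian is block diagonal and can be written analytically. All function names and numeric values are illustrative.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def residual(z, h, m, I_body, v_n, w_n, force, torque):
    """Stacked [r_lin; r_rot] for pose-independent force and torque."""
    dx, dth = z[:3], z[3:]
    r_lin = m / h**2 * (dx - h * v_n) - force
    r_rot = (I_body @ (dth / h**2 - w_n / h)
             + np.cross(dth, I_body @ dth) / h**2 - torque)
    return np.concatenate([r_lin, r_rot])

def jacobian(z, h, m, I_body):
    """Analytic Jacobian; block diagonal because force and torque are constant here."""
    dth = z[3:]
    J = np.zeros((6, 6))
    J[:3, :3] = m / h**2 * np.eye(3)
    # d/dtheta of theta x (I theta) is [theta]_x I - [I theta]_x
    J[3:, 3:] = (I_body + skew(dth) @ I_body - skew(I_body @ dth)) / h**2
    return J

def newton_solve(h, m, I_body, v_n, w_n, force, torque, iters=20, tol=1e-9):
    z = np.concatenate([h * v_n, h * w_n])          # explicit Euler guess
    for _ in range(iters):
        F = residual(z, h, m, I_body, v_n, w_n, force, torque)
        if np.linalg.norm(F) < tol:
            break
        z = z + np.linalg.solve(jacobian(z, h, m, I_body), -F)
    return z

h, m = 0.01, 2.0
I_body = np.diag([1.0, 2.0, 3.0])
v_n = np.array([1.0, 0.0, 0.0])
w_n = np.array([5.0, 1.0, 0.5])
force = m * np.array([0.0, -9.81, 0.0])             # gravity only
torque = np.zeros(3)

z = newton_solve(h, m, I_body, v_n, w_n, force, torque)
v_next, w_next = z[:3] / h, z[3:] / h
final_residual = np.linalg.norm(residual(z, h, m, I_body, v_n, w_n, force, torque))
```

Note that the rotational Jacobian block is not symmetric, because $[\Delta\boldsymbol{\theta}]_\times\mathbf{I}_\text{body} - [\mathbf{I}_\text{body}\Delta\boldsymbol{\theta}]_\times$ is not; the sketch therefore uses a general linear solve rather than Cholesky.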

From Residual to the 6×6 Newton System

Rather than solving the full implicit-Euler residual—which includes the nonlinear gyroscopic term $\boldsymbol{\omega}\times\mathbf{I}_\text{body}\boldsymbol{\omega}$ and requires the Newton-Euler equation to stay in the body frame—we split the problem into explicit and implicit parts:

Explicit: free-body dynamics (inertia, gravity, gyroscopic torque) are forward-integrated once into inertial targets $\mathbf{x}^{\ast}$ and $\mathbf{R}^{\ast}$, then frozen for the rest of the step.
Implicit: contact and constraint forces are resolved iteratively through VBD’s Gauss-Seidel sweeps.

This is a compromise relative to fully implicit backward Euler for the rigid-body dynamics. In exchange, it buys three things: the gyroscopic nonlinearity is absorbed into $\mathbf{R}^{\ast}$ rather than carried in the residual, the angular Hessian stays symmetric positive-definite, and the entire Newton system can be assembled and solved in world frame (the body-frame gyroscopic term, which is the reason Newton-Euler is traditionally formulated in the body frame, is no longer present in the iterative solve).

With this split, the rotational residual reduces from

\[\mathbf{r}_\text{rot} = \mathbf{I}_\text{body}\!\left(\frac{\Delta\boldsymbol{\theta}}{h^2} - \frac{\boldsymbol{\omega}^n}{h}\right) + \frac{\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta})}{h^2} - \boldsymbol{\tau}^{n+1}\]

to a simple spring pulling toward the explicit target:

\[\mathbf{r}_\text{rot} = \frac{1}{h^2}\,\mathbf{I}_\text{world}\,(\Delta\boldsymbol{\theta} - h\boldsymbol{\omega}^{\ast}) - \boldsymbol{\tau}_\text{constraint}\]

where $\boldsymbol{\omega}^{\ast}$ is the gyro-corrected angular velocity from the forward step and $\Delta\boldsymbol{\theta} - h\boldsymbol{\omega}^{\ast} = -\boldsymbol{\theta}$, with $\boldsymbol{\theta}$ the rotation vector from $\mathbf{R}_\text{cur}$ to $\mathbf{R}^{\ast}$. This has the same structure as the translational residual $\tfrac{m}{h^2}(\mathbf{x}_\text{com} - \mathbf{x}^{\ast}_\text{com}) - \mathbf{f}_\text{constraint}$: a quadratic spring to an explicit inertial target, plus implicit constraint forces. The 6×6 system then has a natural 2×2 block structure, all in world frame:

\[\begin{bmatrix} H_{ll} & H_{al}^T \\\\ H_{al} & H_{aa} \end{bmatrix} \begin{bmatrix} \Delta\mathbf{x} \\\\ \Delta\boldsymbol{\omega} \end{bmatrix} = \begin{bmatrix} \mathbf{f}_{lin} \\\\ \mathbf{f}_{ang} \end{bmatrix}\]

where $\Delta\mathbf{x}$ and $\Delta\boldsymbol{\omega}$ are the Newton step corrections and the right-hand side is $-\mathbf{r}$ at the current iterate. In VBD we run this as a single Newton step per body per VBD iteration, giving us a fast inner solve with guaranteed descent.

Inertial Blocks

The inertial blocks are simple springs to the forward-integrated targets:

\[H_{ll}^\text{inertia} = \frac{m}{h^2}\mathbf{I}_3, \qquad \mathbf{f}_{lin}^\text{inertia} = \frac{m}{h^2}(\mathbf{x}^{\ast}_\text{com} - \mathbf{x}_\text{com})\] \[H_{aa}^\text{inertia} = \frac{1}{h^2}\mathbf{I}_\text{world}, \qquad \mathbf{f}_{ang}^\text{inertia} = \frac{1}{h^2}\mathbf{I}_\text{world}\,\boldsymbol{\theta}\] \[H_{al}^\text{inertia} = \mathbf{0}\]

Here $\mathbf{x}^{\ast}_\text{com} = \mathbf{x}^n_\text{com} + h\mathbf{v}^n + h^2 m^{-1}\mathbf{f}_\text{ext}$ is the translational inertial target, $\boldsymbol{\theta}$ is the rotation vector from the current orientation to $\mathbf{R}^{\ast}$, and $\mathbf{I}_\text{world} = \mathbf{R}\,\mathbf{I}_\text{body}\,\mathbf{R}^T$. The linear and angular inertial blocks have identical structure: a mass-weighted pull toward an explicit prediction, with the constraint solve handling everything else.

At convergence the angular residual gives $\mathbf{I}_\text{world}(\boldsymbol{\omega}^{n+1} - \boldsymbol{\omega}^{\ast})/h = \boldsymbol{\tau}_\text{constraint}$, i.e. the only thing that changes $\boldsymbol{\omega}$ from the gyro-corrected prediction is the implicit constraint response.

Computing $\mathbf{R}^{\ast}$: baking the gyroscopic term into the target

The angular target $\mathbf{R}^{\ast}$ is produced by one semi-implicit Newton-Euler step that includes the gyroscopic torque. Inside integrate_rigid_body the body-frame torque used to step angular velocity is

# body-frame angular velocity and torque, with gyroscopic correction
wb = wp.quat_rotate_inv(r0, w0)
tb = wp.quat_rotate_inv(r0, t0) - wp.cross(wb, inertia * wb)   # subtract ω × Iω
w1 = wp.quat_rotate(r0, wb + inv_inertia * tb * dt)            # semi-implicit ω*
r1 = wp.normalize(r0 + wp.quat(w1, 0.0) * r0 * 0.5 * dt)        # → R*

Line by line, with $\mathbf{R}^n$ the current orientation, $\boldsymbol{\omega}^n$ the world-frame angular velocity, and $\boldsymbol{\tau}^n$ the world-frame torque:

Rotate $\boldsymbol{\omega}^n$ into the body frame:

\[\boldsymbol{\omega}_b \;=\; (\mathbf{R}^n)^{T}\,\boldsymbol{\omega}^n\]

Rotate the torque into the body frame and subtract the gyroscopic term (Newton-Euler RHS, $\mathbf{I}_\text{body}\dot{\boldsymbol{\omega}}_b = \boldsymbol{\tau}_b - \boldsymbol{\omega}_b\times\mathbf{I}_\text{body}\boldsymbol{\omega}_b$):

\[\boldsymbol{\tau}_b^\text{eff} \;=\; (\mathbf{R}^n)^{T}\,\boldsymbol{\tau}^n \;-\; \boldsymbol{\omega}_b \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}_b)\]

Semi-implicit Euler step on body-frame $\boldsymbol{\omega}$, then rotate back to world:

\[\boldsymbol{\omega}^{\ast} \;=\; \mathbf{R}^n\!\left(\boldsymbol{\omega}_b + h\,\mathbf{I}_\text{body}^{-1}\,\boldsymbol{\tau}_b^\text{eff}\right)\]

Compactly:

\[\boxed{\;\boldsymbol{\omega}^{\ast} = \mathbf{R}^n\!\left[\boldsymbol{\omega}_b + h\,\mathbf{I}_\text{body}^{-1}\!\big((\mathbf{R}^n)^{T}\boldsymbol{\tau}^n - \boldsymbol{\omega}_b\times\mathbf{I}_\text{body}\boldsymbol{\omega}_b\big)\right], \qquad \mathbf{R}^{\ast} = \exp\!\big(h\,[\boldsymbol{\omega}^{\ast}]_\times\big)\,\mathbf{R}^n\;}\]

Because $\boldsymbol{\omega}^{\ast}$ includes the gyroscopic correction $-h\,\mathbf{I}_\text{body}^{-1}(\boldsymbol{\omega}_b \times \mathbf{I}_\text{body}\boldsymbol{\omega}_b)$ (in body-frame components), the free-body residual vanishes to leading order at $\mathbf{R}^{\ast}$. The simple inertial spring $\mathbf{f}_\text{ang}^\text{inertia} = h^{-2}\mathbf{I}_\text{world}\,\boldsymbol{\theta}$ therefore agrees with the full nonlinear residual at the initial iterate $\mathbf{R}_\text{cur} = \mathbf{R}^{\ast}$, with the gyroscopic torque’s value rerouted through $\mathbf{R}^{\ast}$ rather than evaluated directly each iteration.
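The boxed formula for $\boldsymbol{\omega}^{\ast}$ translates almost literally into NumPy. The sketch below uses rotation matrices instead of Warp quaternions, and the function name `gyro_corrected_omega` is hypothetical; it is meant as a way to poke at the formula, not as a copy of `integrate_rigid_body`.

```python
import numpy as np

def gyro_corrected_omega(R_n, w_world, tau_world, I_body, h):
    """omega* from one body-frame step with the gyroscopic torque subtracted."""
    w_b = R_n.T @ w_world                                       # world -> body
    tau_eff = R_n.T @ tau_world - np.cross(w_b, I_body @ w_b)   # tau_b - w x (I w)
    return R_n @ (w_b + h * np.linalg.solve(I_body, tau_eff))   # step, body -> world

I_body = np.diag([1.0, 2.0, 3.0])
R_n = np.eye(3)
h = 0.01

# Spin about a principal axis: w x (I w) = 0, so omega* equals omega
w_principal = gyro_corrected_omega(R_n, np.array([0.0, 0.0, 4.0]),
                                   np.zeros(3), I_body, h)

# Off-axis spin with zero torque: the gyroscopic term alone changes omega*
w_tilted = gyro_corrected_omega(R_n, np.array([1.0, 1.0, 0.0]),
                                np.zeros(3), I_body, h)
```

For the off-axis case, $\boldsymbol{\omega}_b \times \mathbf{I}_\text{body}\boldsymbol{\omega}_b = (0, 0, 1)$, so the third component of $\boldsymbol{\omega}^{\ast}$ becomes $-h/3$ even though no torque is applied. $\mathbf{R}^{\ast}$ then follows by applying the exponential map to $h\,\boldsymbol{\omega}^{\ast}$.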

What this approximation drops

The full rotational residual re-centered at $\mathbf{R}^{\ast}$ (writing $\boldsymbol{\delta}$ for the rotation vector from $\mathbf{R}^{\ast}$ to $\mathbf{R}_\text{cur}$, i.e. how far constraints have pushed the iterate off the target) is

\[\mathbf{r}_\text{rot} = \underbrace{\frac{1}{h^2}\,\mathbf{I}_\text{body}\,\boldsymbol{\delta}}_{\text{kept: inertial spring}} \;+\; \underbrace{\frac{\boldsymbol{\omega}\times\mathbf{I}_\text{body}\boldsymbol{\delta} + \boldsymbol{\delta}\times\mathbf{I}_\text{body}\boldsymbol{\omega}}{h}}_{\text{dropped: gyro coupling}} \;+\; \underbrace{\frac{\boldsymbol{\delta}\times\mathbf{I}_\text{body}\boldsymbol{\delta}}{h^2}}_{\text{dropped: quadratic gyro}} \;-\; \boldsymbol{\tau}_\text{constraint}\]

The kept spring scales like $1/h^2$ in $\boldsymbol{\delta}$, the gyro coupling like $|\boldsymbol{\omega}|/h$, and the quadratic gyro like $|\boldsymbol{\delta}|/h^2$. The ratio of the gyro coupling to the inertial spring is $O(|\boldsymbol{\omega}|h)$—small for typical simulation timesteps. Three practical reasons AVBD drops these terms: the gyro coupling’s Jacobian $[\boldsymbol{\omega}]_\times\mathbf{I} - [\mathbf{I}\boldsymbol{\omega}]_\times$ is not symmetric, which would break the Cholesky factorization used in the Schur complement solve; keeping the gyroscopic term in the residual would require working in the body frame (where $\mathbf{I}_\text{body}$ is constant), giving up the world-frame formulation that contacts and joints naturally live in; and the gyroscopic term evaluated at $\boldsymbol{\omega}^n$ vs. $\boldsymbol{\omega}^{n+1}$ differs by $O(h)$ in torque units, the same order as backward Euler’s intrinsic discretization error, so refining it further would not improve the integrator’s accuracy.

In Newton, forward_step_rigid_bodies computes $\mathbf{x}^{\ast}$ and $\mathbf{R}^{\ast}$ by semi-implicit integration, storing them as body_inertia_q:

# forward_step_rigid_bodies (simplified)
q_new, qd_new = integrate_rigid_body(
    q_current, qd_current, f_ext,
    com_local, I_body, inv_m, inv_I, gravity, dt
)
body_inertia_q[tid] = q_new   # frozen inertial target q*
body_q[tid]         = q_new   # initial guess for iterations

Contact and Constraint Blocks

Any force element (contact, joint) acting at contact point $\mathbf{p}$ with moment arm $\mathbf{r} = \mathbf{p} - \mathbf{x}_\text{com}$ contributes:

\[H_{ll}^c = \mathbf{K}_c, \qquad H_{al}^c = -[\mathbf{r}]_\times^T \mathbf{K}_c, \qquad H_{aa}^c = [\mathbf{r}]_\times^T \mathbf{K}_c\,[\mathbf{r}]_\times\]

where $\mathbf{K}_c = -\partial \mathbf{f}_c / \partial \mathbf{x}$ is the contact stiffness (the Hessian of the contact energy with respect to the contact point) and $[\mathbf{r}]_\times$ is the skew-symmetric cross-product matrix. All blocks are summed over adjacent force elements before the solve.
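One way to see where these three blocks come from: the contact point moves as $\delta\mathbf{p} = \delta\mathbf{x} + \delta\boldsymbol{\theta}\times\mathbf{r} = \delta\mathbf{x} - [\mathbf{r}]_\times\delta\boldsymbol{\theta}$, so the 6×6 contribution is $\mathbf{J}^T\mathbf{K}_c\,\mathbf{J}$ with $\mathbf{J} = [\,\mathbf{I}_3 \;\; -[\mathbf{r}]_\times]$. The NumPy sketch below checks that identity for a penalty stiffness $k\,\hat{\mathbf{n}}\hat{\mathbf{n}}^T$; the helper names and numbers are illustrative assumptions.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def contact_blocks(K, r):
    """H_ll, H_al, H_aa for a symmetric point stiffness K at moment arm r."""
    rx = skew(r)
    return K, -rx.T @ K, rx.T @ K @ rx

n = np.array([0.0, 1.0, 0.0])              # contact normal
K = 1.0e4 * np.outer(n, n)                 # penalty stiffness k n n^T
r = np.array([0.5, -0.2, 0.1])             # moment arm p - x_com

H_ll, H_al, H_aa = contact_blocks(K, r)
H6 = np.block([[H_ll, H_al.T],
               [H_al, H_aa]])

# Same matrix via the contact-point Jacobian J = [I, -[r]_x]
J = np.hstack([np.eye(3), -skew(r)])
H6_from_jacobian = J.T @ K @ J
min_eig = np.linalg.eigvalsh(H6).min()
```

Since the assembled matrix has the form $\mathbf{J}^T\mathbf{K}_c\,\mathbf{J}$, it is PSD whenever $\mathbf{K}_c$ is, which is what lets the contact blocks be added to the inertial blocks without endangering the Cholesky factorizations.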

Assembling the 6×6 System in Code

Contact Force and Hessian (`evaluate_rigid_contact_from_collision`)

For each contact between body $A$ and body $B$, the contact model in evaluate_rigid_contact_from_collision computes the full wrench and Hessian blocks for both bodies. The normal force and stiffness come from the penalty model:

# Normal force and stiffness
n_outer  = wp.outer(contact_normal, contact_normal)
f_total  = contact_normal * (contact_ke * penetration_depth)
K_total  = contact_ke * n_outer

Damping is added when the contact is closing ($\mathbf{v}_\text{rel} \cdot \hat{\mathbf{n}} < 0$):

# Relative velocity via finite difference of contact points
dx_rel = (x_c_b_now - x_c_b_prev) - (x_c_a_now - x_c_a_prev)
v_rel  = dx_rel / dt
v_dot_n = wp.dot(contact_normal, v_rel)

if contact_kd > 0.0 and v_dot_n < 0.0:
    damping_coeff    = contact_kd * contact_ke
    f_total         += -damping_coeff * v_dot_n * contact_normal
    K_total         += (damping_coeff / dt) * n_outer

Then for each body the moment arm $\mathbf{r} = \mathbf{p}_\text{contact} - \mathbf{x}_\text{com}$ is used to build all three Hessian blocks:

# Body B side (body A is symmetric with opposite sign on force)
force_b  =  f_total
torque_b = wp.cross(r_b, force_b)

r_b_skew       = wp.skew(r_b)               # [r]_x
r_b_skew_T_K   = wp.transpose(r_b_skew) * K_total

h_ll_b = K_total                             # ∂f/∂x
h_al_b = -r_b_skew_T_K                      # ∂τ/∂x  =  -[r]_x^T K
h_aa_b =  r_b_skew_T_K * r_b_skew           # ∂τ/∂ω  =  [r]_x^T K [r]_x

Per-Body Accumulation (`accumulate_body_body_contacts_per_body`)

Rather than iterating over all contacts globally, the solver builds a per-body contact list once per step (a CSR-style buffer). During each Gauss-Seidel color sweep, each body iterates only over its own contacts using 16 strided threads, accumulating into local registers before a single atomic write:

# Each body_id iterates its own contact list (strided over 16 threads)
force_acc = vec3(0);  torque_acc = vec3(0)
h_ll_acc  = mat33(0); h_al_acc  = mat33(0); h_aa_acc = mat33(0)

i = thread_id_within_body               # 0..15
while i < num_contacts_for_body:
    contact_idx = body_contact_indices[body_id * buffer_size + i]

    # Compute contact world points and penetration depth
    cp0_world = transform_point(body_q[b0], cp0_local)
    cp1_world = transform_point(body_q[b1], cp1_local)
    penetration = thickness - dot(contact_normal, cp1_world - cp0_world)

    if penetration > eps:
        (force_0, torque_0, h_ll_0, h_al_0, h_aa_0,
         force_1, torque_1, h_ll_1, h_al_1, h_aa_1) = \
            evaluate_rigid_contact_from_collision(b0, b1, ...)

        # Pick the side that belongs to this body
        if body_id == b0:
            force_acc += force_0;  torque_acc += torque_0
            h_ll_acc  += h_ll_0;   h_al_acc  += h_al_0;  h_aa_acc += h_aa_0
        else:
            force_acc += force_1;  torque_acc += torque_1
            h_ll_acc  += h_ll_1;   h_al_acc  += h_al_1;  h_aa_acc += h_aa_1

    i += 16   # stride

# One atomic add per body at the end
atomic_add(body_forces,      body_id, force_acc)
atomic_add(body_torques,     body_id, torque_acc)
atomic_add(body_hessian_ll,  body_id, h_ll_acc)
atomic_add(body_hessian_al,  body_id, h_al_acc)
atomic_add(body_hessian_aa,  body_id, h_aa_acc)

Final Assembly and Solve (`solve_rigid_body`)

After all contacts have been accumulated into body_forces/torques/hessians, solve_rigid_body reads those contributions, adds the inertial blocks, and then accumulates the joint contributions inline (via evaluate_joint_force_hessian) to form the complete system:

# ── Inertial contributions ────────────────────────────────────────
inertial_coeff = m * dt_sqr_reciprocal          # m/h²

# Linear inertial force: pull COM toward inertial target
f_lin = (com_star - com_current) * inertial_coeff

# Angular inertial torque: pull orientation toward target
q_delta   = quat_inverse(rot_current) * rot_star
theta_body = axis_angle_to_vec(q_delta)         # rotation vector in body frame
tau_body   = I_body * (theta_body * dt_sqr_reciprocal)
tau_world  = quat_rotate(rot_current, tau_body)

# Angular Hessian in world frame
R_cur      = quat_to_matrix(rot_current)
I_world    = R_cur * I_body * R_cur.T
angular_hessian = dt_sqr_reciprocal * I_world

# ── Add external (contact + joint) contributions ──────────────────
f_force  = f_lin   + external_forces[body_id]
f_torque = tau_world + external_torques[body_id]

h_ll = diag(inertial_coeff) + external_hessian_ll[body_id]
h_al =                         external_hessian_al[body_id]
h_aa = angular_hessian       + external_hessian_aa[body_id]

# ── Joint contributions (CSR adjacency loop) ──────────────────────
for j in adjacent_joints(body_id):
    jf, jt, jH_ll, jH_al, jH_aa = evaluate_joint_force_hessian(body_id, j, ...)
    f_force  += jf;   f_torque += jt
    h_ll += jH_ll;    h_al += jH_al;   h_aa += jH_aa

# ── Schur complement solve (see next section) ─────────────────────
dw, dx = schur_solve(h_ll, h_al, h_aa, f_force, f_torque)

The key design point: contacts write into a shared body_forces/hessians buffer with atomic adds (one write per body per color), while joints are accumulated inline inside solve_rigid_body via a private loop. Both feed into the same 6×6 solve.

Solving via Schur Complement

We reduce the 6×6 system to two successive 3×3 solves. Eliminate $\Delta\mathbf{x}$ from the top block row:

\[\Delta\mathbf{x} = H_{ll}^{-1}(\mathbf{f}_{lin} - H_{al}^T \Delta\boldsymbol{\omega})\]

Substitute into the bottom row:

\[\underbrace{(H_{aa} - H_{al}\,H_{ll}^{-1}\,H_{al}^T)}_{\mathbf{S}}\,\Delta\boldsymbol{\omega} = \mathbf{f}_{ang} - H_{al}\,H_{ll}^{-1}\,\mathbf{f}_{lin}\]

Factorize and solve in order:

Step 1. $\mathbf{L}_M\mathbf{L}_M^T = H_{ll}$ (Cholesky)

Step 2. $\mathbf{S} = H_{aa} - H_{al}\,H_{ll}^{-1}\,H_{al}^T$ (Schur complement)

Step 3. $\mathbf{S}\,\Delta\boldsymbol{\omega} = \mathbf{f}_{ang} - H_{al}\,H_{ll}^{-1}\,\mathbf{f}_{lin}$ (solve for angular)

Step 4. $H_{ll}\,\Delta\mathbf{x} = \mathbf{f}_{lin} - H_{al}^T\,\Delta\boldsymbol{\omega}$ (back-substitute for linear)

Both 3×3 solves use a packed Cholesky (6-float lower-triangular factor). $H_{aa}$ is lightly regularized first:

\[H_{aa} \leftarrow H_{aa} + \varepsilon\mathbf{I}, \qquad \varepsilon = 10^{-9}\!\left(\tfrac{\mathrm{tr}(H_{aa})}{3} + 1\right)\]
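The four steps can be sketched with dense NumPy in place of the packed 3×3 Cholesky routines. The function name `schur_solve_6x6`, its return order, and the test system (one body with a single penalty contact) are assumptions made for the illustration; the result is compared against a direct solve of the full 6×6 system.

```python
import numpy as np

def schur_solve_6x6(H_ll, H_al, H_aa, f_lin, f_ang, reg=1e-9):
    """Solve the 2x2-block system with two 3x3 Cholesky factorizations."""
    H_aa = H_aa + reg * (np.trace(H_aa) / 3.0 + 1.0) * np.eye(3)   # light regularization
    L_M = np.linalg.cholesky(H_ll)                                  # Step 1

    def solve_ll(b):
        # H_ll^{-1} b via forward and backward substitution with L_M
        return np.linalg.solve(L_M.T, np.linalg.solve(L_M, b))

    S = H_aa - H_al @ solve_ll(H_al.T)                              # Step 2
    L_S = np.linalg.cholesky(S)
    rhs = f_ang - H_al @ solve_ll(f_lin)
    dw = np.linalg.solve(L_S.T, np.linalg.solve(L_S, rhs))          # Step 3
    dx = solve_ll(f_lin - H_al.T @ dw)                              # Step 4
    return dx, dw

h, m = 0.01, 2.0
I_world = np.diag([1.0, 2.0, 3.0])
n = np.array([0.0, 1.0, 0.0])
r = np.array([0.5, -0.2, 0.1])
rx = np.array([[0.0, -r[2], r[1]], [r[2], 0.0, -r[0]], [-r[1], r[0], 0.0]])
K = 1.0e4 * np.outer(n, n)

H_ll = m / h**2 * np.eye(3) + K
H_al = -rx.T @ K
H_aa = I_world / h**2 + rx.T @ K @ rx
f_lin = np.array([0.0, 30.0, 0.0])
f_ang = np.cross(r, f_lin)

dx, dw = schur_solve_6x6(H_ll, H_al, H_aa, f_lin, f_ang)
direct = np.linalg.solve(np.block([[H_ll, H_al.T], [H_al, H_aa]]),
                         np.concatenate([f_lin, f_ang]))
```

The two answers differ only by the effect of the regularization, which is many orders of magnitude below the solution itself for a system of this scale.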

Pose Update and Velocity Recovery

Apply the Newton increments to the current pose:

\[\mathbf{x}_\text{com}^\text{new} = \mathbf{x}_\text{com} + \Delta\mathbf{x}\] \[\mathbf{r}^\text{new} = \delta\mathbf{r} \otimes \mathbf{r}, \qquad \delta\mathbf{r} = \text{quat\_from\_axis\_angle}\!\left(\tfrac{\Delta\boldsymbol{\omega}}{|\Delta\boldsymbol{\omega}|},\; |\Delta\boldsymbol{\omega}|\right)\]

For small $\Delta\boldsymbol{\omega}$ the first-order approximation $\delta\mathbf{r} \approx \text{normalize}(\tfrac{1}{2}\Delta\boldsymbol{\omega},\, 1)$ is used for efficiency (controlled by _USE_SMALL_ANGLE_APPROX in Newton).
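To get a feel for how good the small-angle form is, the NumPy sketch below compares it with the exact axis-angle quaternion. Quaternions are stored as $(x, y, z, w)$, which is an assumption chosen to mirror the `wp.quat(x, y, z, w)` constructor used in the reference code; the helper names are hypothetical.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product a * b, (x, y, z, w) layout."""
    av, aw = a[:3], a[3]
    bv, bw = b[:3], b[3]
    return np.concatenate([aw * bv + bw * av + np.cross(av, bv),
                           [aw * bw - av @ bv]])

def delta_quat_exact(dw):
    """Quaternion for a rotation of |dw| radians about dw / |dw|."""
    angle = np.linalg.norm(dw)
    if angle < 1e-12:
        return np.array([0.0, 0.0, 0.0, 1.0])
    axis = dw / angle
    return np.concatenate([np.sin(0.5 * angle) * axis, [np.cos(0.5 * angle)]])

def delta_quat_small(dw):
    """First-order form: normalize(dw / 2, 1)."""
    q = np.concatenate([0.5 * dw, [1.0]])
    return q / np.linalg.norm(q)

dw_small = np.array([0.01, -0.02, 0.005])      # a typical per-iteration correction
dw_large = np.array([1.0, 0.0, 0.0])           # one radian: far from small
err_small = np.linalg.norm(delta_quat_exact(dw_small) - delta_quat_small(dw_small))
err_large = np.linalg.norm(delta_quat_exact(dw_large) - delta_quat_small(dw_large))

r = np.array([0.0, 0.0, 0.0, 1.0])             # identity orientation
r_new = quat_mul(delta_quat_exact(dw_small), r)
```

The error of the first-order form grows like $|\Delta\boldsymbol{\omega}|^3$, so it is negligible for the small corrections a converging iteration produces and clearly visible at one radian.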

After all VBD iterations finish, velocities are recovered by finite difference (BDF1):

\[\mathbf{v}^{n+1} = \frac{\mathbf{x}_\text{com}^{n+1} - \mathbf{x}_\text{com}^n}{h}, \qquad \boldsymbol{\omega}^{n+1} = \frac{\log(\mathbf{r}^n{}^{-1} \otimes \mathbf{r}^{n+1})}{h}\]

AVBD: Adaptive Penalty for Constraints and Contacts

For particles, force elements are elastic energies with analytic Hessians. For rigid bodies, the dominant force elements are contacts (non-penetration) and joints (relative pose targets). Both enter the same Newton system as soft penalty forces with adaptive stiffness—this is the “Augmented” in AVBD.

A contact with penetration depth $d > 0$ contributes $E_c = \tfrac{1}{2}k_c d^2$, giving $\mathbf{f}_c = k_c d\,\hat{\mathbf{n}}$ and stiffness $k_c$. Rather than a fixed $k_c$, the penalty grows each iteration to push the violation toward zero:

\[k\_c \leftarrow \min\!\left(k\_c + \beta\,|C|,\; k\_\text{max}\right)\]

where $C$ is the constraint violation, $\beta$ is a ramp rate, and $k_\text{max}$ is the material stiffness cap. At the start of each timestep, $k_c$ is warmstarted from the previous step with a small decay:

\[k\_c \leftarrow \gamma\,k\_c, \qquad k\_c \in [k\_\text{min},\; k\_\text{max}]\]

with $\gamma \approx 0.99$. This carries stiffness information across frames without indefinite growth.
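The two update rules are simple enough to write out directly. In the sketch below the function names and every numeric value ($\beta$, $k_\text{min}$, $k_\text{max}$, the fixed violation) are illustrative assumptions, chosen only so the ramp hits the cap within a few iterations.

```python
def ramp_penalty(k, violation, beta, k_max):
    """One dual update: k <- min(k + beta * |C|, k_max)."""
    return min(k + beta * abs(violation), k_max)

def warmstart_penalty(k, gamma, k_min, k_max):
    """Start-of-step decay, clamped to [k_min, k_max]."""
    return min(max(gamma * k, k_min), k_max)

k, k_min, k_max = 1.0e3, 1.0e2, 5.0e3   # illustrative values only
beta, gamma = 1.0e5, 0.99

history = []
for _ in range(6):                      # six iterations with a fixed violation of 0.01
    k = ramp_penalty(k, violation=0.01, beta=beta, k_max=k_max)
    history.append(k)

k_next_step = warmstart_penalty(k, gamma, k_min, k_max)
```

With these numbers the stiffness climbs by about 1000 per iteration, saturates at $k_\text{max}$, and the next step starts slightly below the cap.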

The Complete Per-Step Algorithm

# ── Initialization ────────────────────────────────────────────────
for each body b:
    q_star[b] = forward_integrate(q[b], qd[b], f_ext, dt)
    q[b]          = q_star[b]    # initial guess = inertial target
    body_inertia_q[b] = q_star[b]

warmstart_penalties(gamma)       # k <- clamp(gamma*k, k_min, k_max)
build_contact_lists(contacts)    # per-body CSR adjacency

# ── VBD Iterations ────────────────────────────────────────────────
for iter in range(N_iterations):
    for color in body_color_groups:      # Gauss-Seidel by coloring
        zero(body_forces, body_torques, body_hessians)

        for each contact adjacent to bodies in color:
            f, tau, H_ll, H_al, H_aa = contact_force_hessian(...)
            body_forces[b]  += f
            body_torques[b] += tau
            body_hessians[b] += (H_ll, H_al, H_aa)

        for each body b in color:
            # Inertial contributions (from r_lin, r_rot)
            f_lin = (m/h^2) * (x_com_star - x_com)
            theta  = log(r^-1 * r_star)     # rotation vector to target
            f_ang = (I_world/h^2) * theta
            H_ll  = (m/h^2)*I3 + H_ll_contacts
            H_al  =                H_al_contacts
            H_aa  = I_world/h^2  + H_aa_contacts

            for each joint adjacent to b:
                f_lin, f_ang, H_ll, H_al, H_aa += joint_force_hessian(b, j)

            # Schur complement solve
            # (L_M is the Cholesky factor of H_ll; solve(L_M, .) applies H_ll^-1)
            L_M = chol(H_ll)
            S   = H_aa - H_al @ solve(L_M, H_al.T)
            dw  = solve(chol(S), f_ang - H_al @ solve(L_M, f_lin))
            dx  = solve(L_M, f_lin - H_al.T @ dw)

            x_com += dx
            r = normalize(quat(dw) * r)

    # Dual update after each sweep
    for each contact c:
        k_c = min(k_c + beta * |penetration_c|, k_max_c)
    for each joint j:
        k_j = min(k_j + beta * |C_j|,           k_max_j)

# ── Finalization ──────────────────────────────────────────────────
for each body b:
    v[b]     = (x_com[b] - x_com_prev[b]) / dt
    omega[b] = quat_velocity(r[b], r_prev[b], dt)
    body_q_prev[b] = body_q[b]
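
The Schur complement step in the pseudocode above can be checked on a small example. The sketch below is plain Python and swaps the 3×3 Cholesky factorizations for a generic Gaussian-elimination `solve`, purely to stay self-contained. The helper names and the matrix values are assumptions made for illustration. It compares the block elimination against a direct solve of the assembled 6×6 system.

```python
def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]      # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def schur_solve(H_ll, H_al, H_aa, f_lin, f_ang):
    """Solve [[H_ll, H_al^T], [H_al, H_aa]] [dx, dw] = [f_lin, f_ang] by eliminating dx."""
    H_la = transpose(H_al)
    # Rows of H_al are the columns of H_al^T, so each solve yields one column of X.
    X = transpose([solve(H_ll, row) for row in H_al])          # X = H_ll^-1 H_al^T
    S = [[H_aa[i][j] - sum(H_al[i][k] * X[k][j] for k in range(3))
          for j in range(3)] for i in range(3)]                # S = H_aa - H_al X
    y = solve(H_ll, f_lin)
    rhs = [fa - t for fa, t in zip(f_ang, matvec(H_al, y))]
    dw = solve(S, rhs)
    dx = solve(H_ll, [fl - t for fl, t in zip(f_lin, matvec(H_la, dw))])
    return dx, dw


# Hypothetical, diagonally dominant blocks (illustrative values only).
H_ll = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
H_al = [[0.5, 0.0, 0.1], [0.0, 0.2, 0.0], [0.1, 0.0, 0.3]]
H_aa = [[2.0, 0.1, 0.0], [0.1, 2.0, 0.0], [0.0, 0.0, 1.0]]
f_lin, f_ang = [1.0, -2.0, 0.5], [0.3, 0.0, -0.1]

dx, dw = schur_solve(H_ll, H_al, H_aa, f_lin, f_ang)

# Reference: assemble the full 6x6 system and solve it directly.
H_la = transpose(H_al)
K = [H_ll[i] + H_la[i] for i in range(3)] + [H_al[i] + H_aa[i] for i in range(3)]
full = solve(K, f_lin + f_ang)
max_diff = max(abs(a - b) for a, b in zip(dx + dw, full))
```

Both routes should agree to rounding error. The block form only ever factorizes 3×3 matrices, which is what makes it attractive inside a per-body GPU kernel.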

Reference Code

The implementation lives in Newton’s VBD solver. Key files:

rigid_vbd_kernels.py — GPU kernels: forward_step_rigid_bodies, solve_rigid_body, update_duals_body_body_contacts, update_duals_joint, update_body_velocity
solver_vbd.py — orchestration in SolverVBD.step() and _solve_rigid_body_iteration()

The Schur complement solve from solve_rigid_body:

# Regularize H_aa
trA = wp.trace(h_aa) / 3.0
eps = 1e-9 * (trA + 1.0)
h_aa[0,0] += eps;  h_aa[1,1] += eps;  h_aa[2,2] += eps

# Factorize H_ll
Lm = chol33(h_ll)

# Compute H_ll^{-1} * H_al^T, column by column
X0 = chol33_solve(Lm, h_al[0])
X1 = chol33_solve(Lm, h_al[1])
X2 = chol33_solve(Lm, h_al[2])
MinvCt = mat33_from_columns(X0, X1, X2)

# Schur complement and solve
S     = h_aa - h_al @ MinvCt
Ls    = chol33(S)
rhs_w = f_ang - h_al @ chol33_solve(Lm, f_lin)
dw    = chol33_solve(Ls, rhs_w)           # angular increment
dx    = chol33_solve(Lm, f_lin - wp.transpose(h_al) @ dw)  # linear

# Apply (small-angle approximation)
half_w = dw * 0.5
dq     = wp.normalize(wp.quat(half_w[0], half_w[1], half_w[2], 1.0))
r_new  = wp.normalize(dq * r_current)
x_com_new = x_com + dx
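
To get a feel for how much the small-angle step gives up, the sketch below compares the normalized `(dw/2, 1)` quaternion with the exact exponential of the rotation vector. It is plain Python with hypothetical helper names. The `(x, y, z, w)` component order mirrors the `wp.quat` call in the snippet above.

```python
import math


def quat_small_angle(dw):
    """Normalized (x, y, z, w) quaternion built from the first-order increment (dw/2, 1)."""
    q = (0.5 * dw[0], 0.5 * dw[1], 0.5 * dw[2], 1.0)
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def quat_exact(dw):
    """Exact unit quaternion for the rotation vector dw (axis times angle)."""
    angle = math.sqrt(sum(c * c for c in dw))
    if angle < 1e-12:
        return (0.0, 0.0, 0.0, 1.0)
    s = math.sin(0.5 * angle) / angle
    return (dw[0] * s, dw[1] * s, dw[2] * s, math.cos(0.5 * angle))


def rotation_angle(q):
    """Rotation angle encoded by a unit quaternion (x, y, z, w)."""
    return 2.0 * math.atan2(math.sqrt(q[0] ** 2 + q[1] ** 2 + q[2] ** 2), q[3])


small_step = rotation_angle(quat_small_angle((0.0, 0.1, 0.0)))   # requested angle: 0.1 rad
large_step = rotation_angle(quat_small_angle((1.0, 0.0, 0.0)))   # requested angle: 1.0 rad
exact_step = rotation_angle(quat_exact((0.0, 0.1, 0.0)))
```

The approximate update keeps the rotation axis and rotates by `2 * atan(|dw| / 2)` instead of `|dw|`, so the two agree up to a cubic term in the angle. For the small per-iteration increments of an iterative solver that error is usually negligible, and later iterations correct whatever remains.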

AVBD dual updates after each color sweep:

# Contact penalty (update_duals_body_body_contacts)
penetration = max(0.0, thickness - dot(n, p1_world - p0_world))
k[contact]  = min(k[contact] + beta * penetration, k_max[contact])

# Joint penalty (update_duals_joint), e.g. BALL joint
C_lin    = length(x_child_frame - x_parent_frame)
k[joint] = min(k[joint] + beta * C_lin, k_max[joint])

What’s Next

This covers the full pipeline for free rigid bodies: continuous Newton-Euler dynamics, the pose-increment formulation of backward Euler, the resulting 6×6 Newton system and its Schur complement solve, and the AVBD adaptive penalty mechanism for contacts. Each body is updated as an independent local solve within its color group, matching the exact VBD pattern from the particle solver—just 6 DoF instead of 3.

Section II will cover articulated bodies: joint constraints, the rotation-vector curvature error for cable/fixed joints, and how the adjacency graph coloring extends to joint chains.

Newton: github.com/newton-physics/newton

VBD paper: Anka He Chen, Ziheng Liu, Yin Yang, Cem Yuksel. “Vertex Block Descent.” ACM Trans. Graph. 43, 4, Article 116 (2024). doi:10.1145/3658179

Implementing VBD Damping Properly

2026-03-17T00:00:00-07:00

Vertex Block Descent (VBD) is a physics solver we published at SIGGRAPH 2024 for elastic body dynamics. It offers unconditional stability, excellent GPU parallelism, and fast convergence to implicit Euler solutions. While the paper covers the formulation comprehensively, actually implementing VBD correctly—especially the damping—turns out to be subtler than it first appears. This post discusses the key pitfalls and how to get them right, based on lessons learned during development with NVIDIA Warp.

What Is VBD, in a Nutshell?

VBD solves the variational form of implicit Euler:

\[\mathbf{x}^{t+1} = \underset{\mathbf{x}}{\operatorname{argmin}} \; G(\mathbf{x}) = \frac{1}{2h^2} \| \mathbf{x} - \mathbf{y} \|_M^2 + E(\mathbf{x})\]

Instead of assembling and solving a massive global linear system (as Newton’s method would), VBD updates one vertex at a time, solving a tiny 3×3 local system:

\[\mathbf{H}_i \, \Delta\mathbf{x}_i = \mathbf{f}_i\]

where $\mathbf{H}_i$ is the local Hessian and $\mathbf{f}_i$ is the total force on vertex $i$, both assembled only from force elements that touch vertex $i$. This is essentially block Gauss-Seidel on the vertex positions. Each local solve is cheap (a 3×3 analytical inverse), and because we color vertices rather than elements, we typically need only 6–9 colors for parallelization—an order of magnitude fewer than element-based coloring.

The critical guarantee: every local solve that reduces $G_i$ also reduces the global energy $G$, giving us unconditional stability even with a single iteration per time step.
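
A toy version makes the descent property easy to observe. The sketch below is an assumption-laden simplification: a 1D chain of point masses joined by linear springs, so each "block" is a scalar rather than a 3×3 matrix, and the sweep is serial rather than colored. All names and values are hypothetical. Because the energy is quadratic, each local solve is an exact minimization over that vertex.

```python
def total_energy(x, y, m, h, k, rest):
    """G(x) = sum m/(2 h^2) (x_i - y_i)^2 + sum k/2 (x_{i+1} - x_i - rest)^2."""
    inertia = sum(0.5 * m / h**2 * (xi - yi) ** 2 for xi, yi in zip(x, y))
    elastic = sum(0.5 * k * (x[i + 1] - x[i] - rest) ** 2 for i in range(len(x) - 1))
    return inertia + elastic


def vbd_sweep(x, y, m, h, k, rest):
    """One Gauss-Seidel sweep: each vertex solves H_i dx_i = f_i with current neighbors."""
    n = len(x)
    for i in range(n):
        f = -m / h**2 * (x[i] - y[i])          # inertial force
        H = m / h**2                           # inertial Hessian
        if i > 0:                              # spring to the left neighbor
            f -= k * (x[i] - x[i - 1] - rest)
            H += k
        if i < n - 1:                          # spring to the right neighbor
            f += k * (x[i + 1] - x[i] - rest)
            H += k
        x[i] += f / H                          # local Newton step (in place)


m, h, k, rest = 1.0, 0.01, 1.0e4, 1.0
y = [0.0, 1.3, 1.9, 3.2]                       # inertial targets
x = list(y)                                    # initial guess = inertial target
energies = [total_energy(x, y, m, h, k, rest)]
for _ in range(10):
    vbd_sweep(x, y, m, h, k, rest)
    energies.append(total_energy(x, y, m, h, k, rest))
```

The recorded `energies` never increase from one sweep to the next, which is the property that lets the iteration be cut off at any point without blowing up.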

The Damping Trap: Why Naïve Rayleigh Damping Breaks Physics

The paper describes Rayleigh stiffness-proportional damping as modifying the force and Hessian:

\[\mathbf{f}_i = -\frac{m_i}{h^2}(\mathbf{x}_i - \mathbf{y}_i) - \sum_{j \in \mathcal{F}_i} \frac{\partial E_j}{\partial \mathbf{x}_i} - \left(\sum_{j \in \mathcal{F}_i} \frac{k_d}{h} \frac{\partial^2 E_j}{\partial \mathbf{x}_i^2}\right)(\mathbf{x}_i - \mathbf{x}_i^t)\]

This looks straightforward: take the stiffness Hessian, scale by $k_d/h$, multiply by the displacement (which approximates $h \cdot v$), and add to the force. However, there is a critical implementation subtlety that is easy to miss.

The Bug: Damping That Kills Free Fall

A naïve implementation might do something like:

displacement = x_prev - x_current
h_d = hessian * (damping / dt)
f_d = h_d * displacement

This applies damping proportional to the absolute velocity of the vertex. The problem is immediate: a freely falling object has nonzero absolute velocity, so this damping fights gravity. Objects fall more slowly than they should, and stacked objects behave as if embedded in molasses.

The mathematical reason: for full Rayleigh damping, the damping force on vertex $i$ is:

\[\mathbf{f}_{d,i} = -k_d \sum_j \mathbf{K}_{ij} \mathbf{v}_j\]

If the entire system translates rigidly ($\mathbf{v}_j = \mathbf{v}$ for all $j$), then $\mathbf{f}_{d,i} = -k_d (\sum_j \mathbf{K}_{ij}) \mathbf{v}$. For any translation-invariant energy, $\sum_j \mathbf{K}_{ij} = \mathbf{0}$, so the damping force vanishes. But in VBD, we only have the diagonal block $\mathbf{K}_{ii}$, and $\mathbf{K}_{ii} \mathbf{v} \neq \mathbf{0}$ in general.
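
The difference shows up already on a three-node 1D spring chain. In the sketch below (plain Python, hypothetical names and values), the full stiffness rows sum to zero, so a uniform velocity produces no damping force. Keeping only the diagonal entries produces a force that opposes the motion at every node.

```python
def chain_stiffness(n, k):
    """Stiffness matrix of n nodes joined in a line by identical springs (1D)."""
    K = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        K[i][i] += k
        K[i + 1][i + 1] += k
        K[i][i + 1] -= k
        K[i + 1][i] -= k
    return K


def full_damping_force(K, v, coeff):
    """Rayleigh damping with the full stiffness row: -coeff * sum_j K_ij v_j."""
    return [-coeff * sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]


def diagonal_damping_force(K, v, coeff):
    """Naive per-vertex version: only the diagonal block K_ii is used."""
    return [-coeff * K[i][i] * v[i] for i in range(len(v))]


K = chain_stiffness(3, k=100.0)
v_fall = [-2.0, -2.0, -2.0]                    # every node falls at the same speed
f_full = full_damping_force(K, v_fall, coeff=0.01)
f_diag = diagonal_damping_force(K, v_fall, coeff=0.01)
```

Here `f_full` is zero at every node, while `f_diag` points upward everywhere. That upward force is the spurious drag that slows free fall.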

The Fix: Damp the Internal Variable, Not the Position

The solution is to formulate damping in terms of internal variables—quantities that are inherently invariant to rigid motion.

For volumetric elasticity, the internal variable is the deformation gradient $\mathbf{F} = \mathbf{D}_s \mathbf{D}_m^{-1}$, where $\mathbf{D}_s = [\mathbf{x}_1 - \mathbf{x}_0, \mathbf{x}_2 - \mathbf{x}_0, \mathbf{x}_3 - \mathbf{x}_0]$. Its rate of change is:

\[\dot{\mathbf{F}} = \dot{\mathbf{D}}_s \mathbf{D}_m^{-1}\]

where $\dot{\mathbf{D}}_s = [\mathbf{v}_1 - \mathbf{v}_0, \mathbf{v}_2 - \mathbf{v}_0, \mathbf{v}_3 - \mathbf{v}_0]$. Notice: $\dot{\mathbf{F}}$ depends only on relative velocities. If all four vertices move with the same velocity, $\dot{\mathbf{D}}_s = \mathbf{0}$, so $\dot{\mathbf{F}} = \mathbf{0}$. No damping.
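
A quick numerical check of this claim, as a plain-Python sketch with hypothetical helper names. The reference tetrahedron is assumed to have edges of length 2 along the coordinate axes, so $\mathbf{D}_m^{-1} = \tfrac{1}{2}\mathbf{I}$.

```python
def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def edge_matrix(p):
    """3x3 matrix whose columns are p1 - p0, p2 - p0, p3 - p0."""
    return [[p[c + 1][r] - p[0][r] for c in range(3)] for r in range(3)]


def deformation_gradient_rate(v, Dm_inv):
    """F_dot = Ds_dot * Dm^-1, with Ds_dot built from relative vertex velocities."""
    return matmul3(edge_matrix(v), Dm_inv)


Dm_inv = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]

# Free fall: all four vertices share the same velocity.
v_rigid = [(0.0, -9.8, 0.0)] * 4
F_dot_rigid = deformation_gradient_rate(v_rigid, Dm_inv)

# Stretching: only vertex 1 moves, along the first edge.
v_stretch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
F_dot_stretch = deformation_gradient_rate(v_stretch, Dm_inv)
```

`F_dot_rigid` is the zero matrix, so no damping stress is generated in free fall. `F_dot_stretch` has a single nonzero entry, the stretch rate along the first axis.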

The damping stress is then:

\[\mathbf{P}_{\text{damp}} = k_d \cdot \frac{\partial^2 E}{\partial \mathbf{F}^2} : \dot{\mathbf{F}}\]

And the force on vertex $i$ is assembled as $\mathbf{f}_{d,i} = -V_0 \, \mathbf{G}_i^T \text{vec}(\mathbf{P}_{\text{damp}})$, where $\mathbf{G}_i = \partial \text{vec}(\mathbf{F}) / \partial \mathbf{x}_i$ is the 9×3 matrix mapping vertex displacements to flattened deformation gradient changes.

For dihedral-angle bending, the internal variable is the dihedral angle $\theta$ between two adjacent triangles. The angular velocity is:

\[\dot{\theta} = \sum_{j=0}^{3} \frac{\partial \theta}{\partial \mathbf{x}_j} \cdot \mathbf{v}_j\]

Since $\theta$ depends only on relative positions, we have $\sum_j \frac{\partial \theta}{\partial \mathbf{x}_j} = \mathbf{0}$. For rigid translation ($\mathbf{v}_j = \mathbf{v}$), $\dot{\theta} = \mathbf{v} \cdot \sum_j \frac{\partial \theta}{\partial \mathbf{x}_j} = 0$. With a bending damping coefficient $c$, the damping force is:

\[\mathbf{f}_{d,i} = -c \, \dot{\theta} \, \frac{\partial \theta}{\partial \mathbf{x}_i}\]

Think of it like a door hinge with a damper: the damper resists opening/closing, but if you translate the entire door frame, the damper does nothing.

For collision damping, the internal variable is the gap distance $d$ between contact points. Using signed barycentric weights $b_j$ (positive on one side of the contact, negative on the other) that sum to zero ($\sum_j b_j = 0$), the gap rate is:

\[\dot{d} = \sum_j b_j \, (\hat{\mathbf{n}} \cdot \mathbf{v}_j)\]

Again, rigid translation produces $\dot{d} = (\hat{\mathbf{n}} \cdot \mathbf{v}) \sum_j b_j = 0$.
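
For a vertex-triangle contact, one common choice (assumed here, not taken from the Newton source) is weight $+1$ on the colliding vertex and minus the barycentric coordinates of the closest point on the three triangle vertices. A minimal sketch with hypothetical names and values:

```python
def gap_rate(normal, weights, velocities):
    """d_dot = sum_j b_j * (n . v_j) for signed weights b_j that sum to zero."""
    return sum(
        b * sum(n * v for n, v in zip(normal, vel))
        for b, vel in zip(weights, velocities)
    )


normal = (0.0, 0.0, 1.0)
# +1 on the vertex, minus barycentric coordinates (0.25, 0.25, 0.5) on the triangle.
weights = [1.0, -0.25, -0.25, -0.5]

# Vertex and triangle fall together: the gap does not change.
falling = [(0.0, 0.0, -3.0)] * 4
d_dot_rigid = gap_rate(normal, weights, falling)

# Vertex approaches a triangle at rest: the gap closes at 1 unit per second.
approaching = [(0.0, 0.0, -1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
d_dot_closing = gap_rate(normal, weights, approaching)
```

Only the second case yields a nonzero gap rate, so only genuine approach or separation is damped.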

The General Principle

For any energy $E = f(q)$ where $q$ is a translation-invariant internal variable:

\[\sum_j \frac{\partial q}{\partial \mathbf{x}_j} = \mathbf{0}\]

This guarantees $\dot{q} = 0$ for rigid translation, and therefore zero damping force. If $q$ is also rotation-invariant (like edge lengths and dihedral angles), then rigid rotation is also undamped.

The pattern for VBD is always:

Force: compute using all vertices in the stencil (exact relative velocity information)
Hessian: use only the diagonal block (the standard VBD approximation)

This asymmetry is fundamental to VBD: forces are exact, Hessians are approximate. Off-diagonal coupling is recovered through iteration.
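
The simplest internal variable to test this on is an edge length $q = \lVert \mathbf{x}_1 - \mathbf{x}_0 \rVert$, for which $\partial q / \partial \mathbf{x}_1 = \hat{\mathbf{n}}$ and $\partial q / \partial \mathbf{x}_0 = -\hat{\mathbf{n}}$. The sketch below uses plain Python; the function names, the damping coefficient `c`, and the test velocities are assumptions made for illustration. It evaluates the damping force $-c\,\dot{q}\,\partial q/\partial \mathbf{x}_i$ under rigid translation, rigid rotation, and stretching.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def spring_damping(x0, x1, v0, v1, c):
    """Return (q_dot, f0, f1) for the edge-length variable q = |x1 - x0|."""
    d = [b - a for a, b in zip(x0, x1)]
    length = math.sqrt(sum(t * t for t in d))
    n = [t / length for t in d]                      # dq/dx1 = n, dq/dx0 = -n
    q_dot = sum(ni * (b - a) for ni, a, b in zip(n, v0, v1))
    f1 = [-c * q_dot * ni for ni in n]               # force uses both vertices' velocities
    f0 = [c * q_dot * ni for ni in n]
    return q_dot, f0, f1


x0, x1 = (1.0, 2.0, 0.0), (4.0, 6.0, 0.0)            # edge of length 5
omega = (0.0, 0.0, 2.0)

q_dot_translate, _, _ = spring_damping(x0, x1, (0.0, -9.8, 0.0), (0.0, -9.8, 0.0), c=3.0)
q_dot_rotate, _, _ = spring_damping(x0, x1, cross(omega, x0), cross(omega, x1), c=3.0)
q_dot_stretch, f0, f1 = spring_damping(x0, x1, (0.0, 0.0, 0.0), (3.0, 4.0, 0.0), c=3.0)
```

Both rigid motions give a vanishing $\dot{q}$. Only the stretching case produces forces, and they are equal and opposite along the edge.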

Welcome to My Blog

2024-01-01T00:00:00-08:00

Welcome to my blog! I’ll be sharing updates about my research and projects here.

Stay tuned for future posts!