<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ankachan.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ankachan.github.io/" rel="alternate" type="text/html" /><updated>2026-06-08T14:32:16-07:00</updated><id>https://ankachan.github.io/feed.xml</id><title type="html">Anka He Chen</title><subtitle>PhD Computing Student - Computer Graphics &amp; Physics-Based Animation</subtitle><author><name>Anka He Chen</name><email>ankachan92@gmail.com</email></author><entry><title type="html">A Sparse Coarse Space for VBD from Element Eigenmodes</title><link href="https://ankachan.github.io/posts/2026/05/vbd-sparse-coarse-space/" rel="alternate" type="text/html" title="A Sparse Coarse Space for VBD from Element Eigenmodes" /><published>2026-05-20T00:00:00-07:00</published><updated>2026-05-20T00:00:00-07:00</updated><id>https://ankachan.github.io/posts/2026/05/vbd-sparse-coarse-space</id><content type="html" xml:base="https://ankachan.github.io/posts/2026/05/vbd-sparse-coarse-space/"><![CDATA[<p>VBD’s per-vertex 3×3 solve already gives the locally optimal descent direction — there is no leverage left inside a single block. So why does it still slow down on problems with high stiffness contrast? The answer turns out to be a story about <strong>what basis you descend in</strong>, and once that picture is in place the fix — a sparse coarse correction built from per-element eigenmodes — almost suggests itself.</p>

<hr />

<h2 id="conditioning-geometrically">Conditioning, geometrically</h2>

<p>For a symmetric positive definite matrix in the 2-norm,</p>

\[\kappa(A) = \lVert A\rVert_2 \cdot \lVert A^{-1}\rVert_2 = \frac{\lambda_\text{max}}{\lambda_\text{min}}\]

<p>The quadratic energy $E(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A \mathbf{x}$ has level sets that are ellipsoids whose principal axes are the eigenvectors of $A$ and whose squared semi-axis lengths scale as $1/\lambda_i$. So $\kappa$ is literally the <strong>squared aspect ratio</strong> of the level-set ellipse. Big $\kappa$ means very elongated; $\kappa = 1$ means a sphere.</p>

<p><img src="/images/posts/vbd-sparse-coarse-space/condition-eccentricity.png" alt="Condition number controls eccentricity of the energy ellipse" /></p>

<p>That ratio is exactly what governs CG’s convergence:</p>

\[\lVert e_k\rVert_A \;\leq\; 2 \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k \lVert e_0\rVert_A\]

<p>For a FEM stiffness matrix, $\kappa$ is typically at least as large as the stiffness ratio between the stiffest and softest elements. A contrast of $10^6$ pushes the CG factor to $(10^3-1)/(10^3+1) \approx 0.998$, which means thousands of iterations to make a dent. Stiffness contrast is the standard way to wreck a global iterative solve.</p>
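<p>To put numbers on this, the sketch below only evaluates the CG bound above. It is the worst-case bound, not a measured iteration count, and the target error reduction of $10^{-6}$ is an arbitrary choice for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math


def cg_contraction(kappa):
    """Per-iteration factor (sqrt(kappa) - 1) / (sqrt(kappa) + 1) from the CG error bound."""
    s = math.sqrt(kappa)
    return (s - 1.0) / (s + 1.0)


def cg_iterations_for(kappa, reduction):
    """Smallest k with 2 * q**k at or below `reduction`, according to the bound."""
    q = cg_contraction(kappa)
    return math.ceil(math.log(reduction / 2.0) / math.log(q))


for kappa in (1e2, 1e4, 1e6):
    print(f"kappa={kappa:.0e}  factor={cg_contraction(kappa):.4f}  "
          f"bound iterations for 1e-6: {cg_iterations_for(kappa, 1e-6)}")
</code></pre></div></div>

<p>Each factor of 100 in $\kappa$ multiplies the bound's iteration count by roughly 10, because the bound depends on $\sqrt{\kappa}$.</p>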

<p>But CG is not what VBD does. VBD is block coordinate descent, and for BCD, $\kappa$ is <strong>not</strong> the governing quantity.</p>

<hr />

<h2 id="diagonal-dominance-is-a-different-geometry">Diagonal dominance is a different geometry</h2>

<p>Partition $A$ into block-diagonal $D$ and off-diagonal $L + L^T$. Block Gauss-Seidel iterates with matrix $M = -(D+L)^{-1}L^T$, and its convergence rate is $\rho(M)$, controlled by <strong>block diagonal dominance</strong> — the size of each diagonal block $D_i$ relative to its incident off-diagonal entries.</p>

<p>This is a distinct quantity from $\kappa$. Geometrically:</p>

<ul>
  <li>$\kappa$ controls the <strong>eccentricity</strong> of the energy ellipse — how elongated the level sets are.</li>
  <li>Diagonal dominance controls the <strong>alignment</strong> — how rotated those level sets are relative to your coordinate axes.</li>
</ul>

<p>You can have a moderately elongated system whose principal axes are tilted 45° from the coordinate basis: BCD zig-zags because each coordinate step cuts diagonally across the level sets. And you can have a horribly elongated system whose long axis is <em>aligned</em> with a coordinate axis: BCD walks straight down it in a single sweep.</p>

<p><img src="/images/posts/vbd-sparse-coarse-space/diagonal-dominance.png" alt="Same kappa, different rotation; BCD zig-zag depends on alignment, not eccentricity" /></p>

<p>Same $\kappa = 25$ in both panels, completely different BCD experience. That distinction is why VBD does fine on uniformly stiff materials but stalls at <em>interfaces</em>: a soft vertex coupled to a stiff neighbor has small $D_i$ but a large off-diagonal entry on the stiff side. Local diagonal dominance breaks at the interface, regardless of the global condition number.</p>
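<p>The claim can be checked directly on the iteration matrix $M$. The sketch below uses $1\times 1$ blocks, i.e. plain Gauss-Seidel on a $2\times 2$ system, as the 2D analogue of the figure; the helper names and the list of angles are illustrative choices.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def gs_iteration_matrix(A):
    """Gauss-Seidel iteration matrix M = -(D + L)^{-1} L^T, here with 1x1 blocks."""
    lower_with_diag = np.tril(A)        # D + L
    strict_upper = np.triu(A, k=1)      # L^T
    return -np.linalg.solve(lower_with_diag, strict_upper)


def rotated_quadratic(kappa, angle_deg):
    """2x2 SPD matrix with eigenvalues (kappa, 1), principal axes rotated by angle_deg."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t), np.cos(t)]])
    return R @ np.diag([kappa, 1.0]) @ R.T


for angle in (0.0, 15.0, 30.0, 45.0):
    A = rotated_quadratic(25.0, angle)
    rho = np.max(np.abs(np.linalg.eigvals(gs_iteration_matrix(A))))
    print(f"angle={angle:4.1f} deg  kappa={np.linalg.cond(A):5.1f}  rho(M)={rho:.3f}")
</code></pre></div></div>

<p>At $0°$ the matrix is diagonal, so $M = 0$ and one sweep is exact. At $45°$ its diagonal entries are 13 and its off-diagonal entries 12, and $\rho(M) = (12/13)^2 \approx 0.85$, while $\kappa = 25$ in every case.</p>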

<hr />

<h2 id="every-iterative-method-is-a-choice-of-descent-basis">Every iterative method is a choice of descent basis</h2>

<p>Once the picture is “an energy ellipse and a sequence of line searches,” every iterative method just becomes a strategy for <strong>picking the directions to search along</strong>. Three are well-established, plus a fourth that this post is building toward:</p>

<p><strong>Eigenbasis.</strong> Diagonalize $A = Q \Lambda Q^T$ and change variables $\mathbf{z} = Q^T\mathbf{x}$. In the new coordinates, the energy $\tfrac{1}{2}\mathbf{x}^T A\mathbf{x} - \mathbf{b}^T\mathbf{x}$ decouples into independent parabolas $\sum_i \tfrac{1}{2}\lambda_i z_i^2 - \tilde b_i z_i$ with $\tilde{\mathbf{b}} = Q^T\mathbf{b}$, and each one is solved in a single division. One sweep, done. The catch is finding $Q$ in the first place: a dense eigendecomposition costs $O(n^3)$ and produces a dense $Q$ even when $A$ is sparse, which is more expensive than the direct solve you were trying to avoid.</p>
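<p>A small numerical check of the decoupling, assuming the energy $\tfrac{1}{2}\mathbf{x}^T A\mathbf{x} - \mathbf{b}^T\mathbf{x}$ and a random SPD test matrix: after rotating with $Q^T$, each coordinate is solved by one division, and rotating back gives the same answer as a direct solve.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)        # a small random SPD matrix
b = rng.standard_normal(5)

lam, Q = np.linalg.eigh(A)           # A = Q diag(lam) Q^T: the expensive O(n^3) step
b_tilde = Q.T @ b                    # right-hand side in eigen-coordinates
z = b_tilde / lam                    # each decoupled parabola: one division
x = Q @ z                            # rotate back to the original coordinates

print(np.allclose(x, np.linalg.solve(A, b)))
</code></pre></div></div>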

<p><strong>CG.</strong> CG does not try to find the eigenbasis directly. It builds its own basis adaptively, one vector per iteration, by taking matrix-vector products from the current residual:</p>

\[\mathcal{K}_k(A, r_0) = \mathrm{span}\{r_0,\, A r_0,\, A^2 r_0,\, \ldots,\, A^{k-1} r_0\}\]

<p>Each new direction is $A$-conjugate to the previous ones, so progress in one direction is never undone in the next. In 2D, two iterations span the whole space and CG is exact. The directions are <em>globally optimal</em> given what CG has seen, but each is dense — forming the next Krylov vector takes a full matvec, and computing $\alpha_k$ takes a global inner product. That is the parallel bottleneck.</p>
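<p>A textbook CG loop (a sketch, not tuned for performance) makes the 2D claim concrete: on a tilted $\kappa = 25$ matrix, one iteration is still far from the solution, but two reach it up to round-off.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def conjugate_gradient(A, b, iters):
    """Plain CG starting from x0 = 0; returns the iterate after `iters` steps."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(iters):
        Ap = A @ p                      # the one matvec per iteration
        alpha = rr / (p @ Ap)           # global inner product
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r                  # second global inner product
        if rr_new == 0.0:
            break
        p = r + (rr_new / rr) * p       # next direction, A-conjugate to the previous ones
        rr = rr_new
    return x


A = np.array([[13.0, 12.0],
              [12.0, 13.0]])            # kappa = 25, axes tilted 45 degrees
b = np.array([1.0, 0.0])
x_exact = np.linalg.solve(A, b)
print(np.linalg.norm(conjugate_gradient(A, b, 1) - x_exact))   # clearly nonzero
print(np.linalg.norm(conjugate_gradient(A, b, 2) - x_exact))   # round-off level
</code></pre></div></div>

<p>Each iteration needs one matvec plus the two dot products <code class="language-plaintext highlighter-rouge">p @ Ap</code> and <code class="language-plaintext highlighter-rouge">r @ r</code>; on a parallel machine those dot products are the global reductions.</p>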

<p><strong>VBD (colored Gauss-Seidel).</strong> VBD picks the coordinate axes, in disjoint groups. Within a color no two vertices are coupled, so every vertex's 3 coordinates are updated jointly by a direct 3×3 solve, all in parallel. The directions are <em>sparse</em> by construction (one vertex’s 3 axes touch only that vertex and its incident elements), so the local solves are embarrassingly parallel and there is no global reduction. But the basis is fixed in the coordinate system, regardless of where the actual eigenvectors of $A$ are pointing. When the principal axes are tilted, every coordinate step is partly orthogonal to the direction it should be going, and the iteration zig-zags.</p>
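<p>A minimal sketch of color-ordered block Gauss-Seidel on a <em>linear</em> SPD system, which is the quadratic special case of a VBD sweep (VBD itself works on a nonlinear energy and re-evaluates each vertex's gradient and Hessian). The chain test problem, its random $3\times 3$ coupling blocks, and the red-black coloring are made up for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def colored_block_gauss_seidel(A, b, x, colors, sweeps):
    """Block Gauss-Seidel with 3x3 vertex blocks, visited color by color.

    colors: list of lists of vertex indices. Vertices in one color must not be
    coupled, so the loop over a single color could run in parallel.
    """
    for _ in range(sweeps):
        for color in colors:
            for v in color:                          # independent within a color
                s = slice(3 * v, 3 * v + 3)
                A_vv = A[s, s]
                # Residual of vertex v with its own contribution removed. The dense row
                # product is for brevity; only incident elements actually contribute.
                rhs = b[s] - A[s, :] @ x + A_vv @ x[s]
                x[s] = np.linalg.solve(A_vv, rhs)    # exact 3x3 local solve
    return x


def chain_system(n_vertices, rng, mass=1.0):
    """Hypothetical test problem: a chain of vertices joined by random SPD 3x3 'springs'."""
    n = 3 * n_vertices
    A = mass * np.eye(n)
    for v in range(n_vertices - 1):
        S = 0.5 * rng.standard_normal((3, 3))
        k = S @ S.T                                  # SPD coupling block
        i, j = slice(3 * v, 3 * v + 3), slice(3 * v + 3, 3 * v + 6)
        A[i, i] += k
        A[j, j] += k
        A[i, j] -= k
        A[j, i] -= k
    return A


rng = np.random.default_rng(1)
A = chain_system(6, rng)
b = rng.standard_normal(18)
colors = [[0, 2, 4], [1, 3, 5]]                      # red-black coloring of the chain
x = colored_block_gauss_seidel(A, b, np.zeros(18), colors, sweeps=300)
print(np.linalg.norm(A @ x - b))
</code></pre></div></div>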

<p><strong>Sparse local-eigen basis.</strong> What I am after: directions that are sparse like VBD’s (so the local solve and parallelism survive) but <em>spectrally informed</em> like CG’s (so each direction actually points along a principal axis of $A$). The element stiffness matrices give them to us for free, as we will see in a moment.</p>

<p><img src="/images/posts/vbd-sparse-coarse-space/descent-comparison.png" alt="Three descent strategies on the same energy ellipse: eigenbasis, CG, VBD coordinate basis" /></p>

<p>The three established strategies, stacked against the same tilted, ill-conditioned ellipse: eigenbasis walks straight in, CG takes two clean $A$-conjugate steps, VBD’s coordinate basis zig-zags down the long axis. The fourth strategy — sparse and spectrally informed — has no off-the-shelf solver to plot here yet; the rest of the post is what it would look like for VBD.</p>

<hr />

<h2 id="what-an-eigenbasis-rebase-buys">What an eigenbasis rebase buys</h2>

<p>Zoom into the eigenbasis option for a moment. The reason it converges in one sweep is just that the change of variables $\mathbf{z} = Q^T \mathbf{x}$ rotates the ellipse so its principal axes line up with the new coordinate axes:</p>

<p><img src="/images/posts/vbd-sparse-coarse-space/eigenbasis-rebase.png" alt="Same energy viewed in original basis vs eigenbasis; BCD zig-zags on the left, lands in one sweep on the right" /></p>

<p>Same quadratic, same starting point, two coordinate systems. On the left the principal axes (dashed) sit at 30° from the coordinates and BCD ricochets between them. On the right, after applying $Q^T$ the contours line up with the new axes and one update of each coordinate lands at the minimum: the “sum of independent parabolas” decoupling $E(\mathbf{z}) = \sum_i \big(\tfrac{1}{2}\lambda_i z_i^2 - \tilde b_i z_i\big)$, each component solved by a single division $z_i^\star = \tilde b_i / \lambda_i$.</p>

<p>This is the upper bound on what any rebasing can buy. Every other strategy — CG’s adaptive Krylov directions, multigrid’s hierarchical bases, the usual preconditioners — is an approximation of the same trick at lower cost. For VBD we cannot afford the global $Q$, but the element stiffness matrices already hand us <em>local</em> $Q_e$’s for free; the rest of the post builds a cheap, sparse coarse correction on that observation.</p>

<hr />

<h2 id="vbds-actual-bottleneck">VBD’s actual bottleneck</h2>

<p>With graph coloring, every vertex of a given color is decoupled from every other vertex of that color. One color sweep is thousands of perfectly parallel 3×3 direct solves, each inverting its local Hessian exactly. There is no further leverage from rotating an individual block into its own eigenbasis — the direct $3\times 3$ inverse is already optimal in any basis.</p>

<p>So the bottleneck cannot be the local solve. It has to be inter-color communication. With $k$ colors and a mesh of diameter $d$, information needs at least $O(d/k)$ sweeps to cross the mesh: vertex $i$ updates in color 1, its neighbor $j$ in color 2 sees the new value later in the same sweep, a neighbor $l$ of $j$ in color 1 only sees $j$’s update in the next sweep, and so on. Every hop from a higher color to a lower one costs a full sweep, so along an unfavorably colored path the count approaches $O(d)$. The stiffer the inter-vertex coupling, the bigger the residual that bounces between colors: the BCD zig-zag, lifted from coordinate axes to entire colors.</p>
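<p>The propagation count can be checked on the simplest possible mesh. The sketch runs scalar Gauss-Seidel on a 1D chain with a hypothetical cyclic coloring (vertex $i$ gets color $i \bmod k$, colors visited in increasing order), puts a load on one end, and counts the sweeps until the far end first moves.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def sweeps_until_far_end_moves(n, k, max_sweeps=100):
    """Point Gauss-Seidel on a chain with A = tridiag(-1, 2, -1); vertex i has color i % k.

    A unit load sits on vertex 0. Returns the first sweep after which the last
    vertex's value is nonzero.
    """
    b = np.zeros(n + 2)
    b[1] = 1.0                        # load on vertex 0 (array index shifted by one ghost node)
    x = np.zeros(n + 2)               # x[0] and x[n + 1] are fixed zero ghost nodes
    for sweep in range(1, max_sweeps + 1):
        for c in range(k):            # colors in increasing order
            for j in range(1 + c, n + 1, k):
                x[j] = (b[j] + x[j - 1] + x[j + 1]) / 2.0
        if x[n] != 0.0:
            return sweep
    return None


for k in (2, 3, 5):
    print(k, sweeps_until_far_end_moves(10, k))   # 2 5, then 3 4, then 5 2
</code></pre></div></div>

<p>With colors increasing along the path the answer is $\lceil n/k\rceil$ sweeps, i.e. $k$ hops per sweep, which matches the $O(d/k)$ estimate.</p>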

<p>That immediately tells me which modes I should care about: the ones that <strong>span colors</strong> and are <strong>stiff</strong>. Within-color modes are absorbed by the local solve; weak-coupling modes converge fine on their own.</p>

<hr />

<h2 id="element-eigenmodes-as-a-free-coarse-basis">Element eigenmodes as a free coarse basis</h2>

<p>Every FEM element ships with its own stiffness matrix $K_e$ — $12\times 12$ for a linear tet, $9\times 9$ for a linear tri. Its eigendecomposition</p>

\[K_e = Q_e\,\Lambda_e\,Q_e^T\]

<p>is essentially free, and the columns of $Q_e$ are the element’s natural deformation modes (stretch along a fiber direction, shear, volumetric compression, etc.). Pick the stiffest few modes per element and stack them as columns of a tall, sparse matrix:</p>

\[P = \big[\,\phi_1\;\big|\;\phi_2\;\big|\;\cdots\;\big|\;\phi_m\,\big]\]

<p>Each $\phi_j \in \mathbb{R}^{3N}$ is nonzero only on the vertices of its source element. Now solve a small reduced system at every outer iteration:</p>

\[(P^T A P)\,z = P^T r, \qquad \mathbf{x} \leftarrow \mathbf{x} + P z\]

<p>This is a deflation / coarse-space correction — it projects the residual into the span of the selected modes and solves optimally inside that span. Slotted into the VBD loop:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for each sweep:
    for each color:
        parallel_vertex_block_solves()   # standard VBD
    r = compute_residual()
    z = solve(P^T A P, P^T r)            # sparse coarse correction
    x += P @ z
</code></pre></div></div>
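<p>A runnable version of the loop above for a <em>linear</em> SPD system, as a sketch: the smoother is a colored $3\times 3$-block Gauss-Seidel sweep and the coarse step is the Galerkin solve $(P^TAP)z = P^Tr$. The spring chain, its one stiff element, and the choice of that element's three nonzero eigenmodes as the columns of $P$ are hypothetical test choices, not a tuned selection rule.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def block_gs_sweep(A, b, x, colors):
    """Smoother: one colored Gauss-Seidel sweep with exact 3x3 vertex solves."""
    for color in colors:
        for v in color:                              # independent within a color
            s = slice(3 * v, 3 * v + 3)
            rhs = b[s] - A[s, :] @ x + A[s, s] @ x[s]
            x[s] = np.linalg.solve(A[s, s], rhs)
    return x


def coarse_correction(A, b, x, P):
    """Galerkin coarse step: solve (P^T A P) z = P^T r, then x += P z."""
    r = b - A @ x
    z = np.linalg.solve(P.T @ A @ P, P.T @ r)
    return x + P @ z


def spring_chain(stiffness, mass=1.0):
    """Hypothetical chain: vertices v and v+1 joined by an isotropic spring k * I3."""
    n_vertices = len(stiffness) + 1
    A = mass * np.eye(3 * n_vertices)
    for v, k in enumerate(stiffness):
        i, j = slice(3 * v, 3 * v + 3), slice(3 * v + 3, 3 * v + 6)
        A[i, i] += k * np.eye(3)
        A[j, j] += k * np.eye(3)
        A[i, j] -= k * np.eye(3)
        A[j, i] -= k * np.eye(3)
    return A


stiffness = [1.0, 1.0, 1e3, 1.0, 1.0]           # element 2 (vertices 2 and 3) is very stiff
A = spring_chain(stiffness)
n = A.shape[0]
b = np.random.default_rng(2).standard_normal(n)

# Columns of P: the stiff element's three nonzero eigenmodes (relative motion of
# vertices 2 and 3), scattered into global vectors that are zero elsewhere.
K_e = 1e3 * np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.eye(3))
evals, Q_e = np.linalg.eigh(K_e)
P = np.zeros((n, 3))
P[6:12, :] = Q_e[:, -3:]

colors = [[0, 2, 4], [1, 3, 5]]                  # red-black coloring of the 6 vertices
x = np.zeros(n)
for sweep in range(10):
    x = block_gs_sweep(A, b, x, colors)          # standard VBD-style smoothing
    x = coarse_correction(A, b, x, P)            # sparse coarse correction
</code></pre></div></div>

<p>After every coarse step the residual is orthogonal to the columns of $P$, and because the correction is an $A$-orthogonal projection it never increases the energy-norm error of the iterate it is applied to.</p>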

<p>The mental model: this is a <strong>physics-informed two-level method</strong>. The VBD color sweeps are the smoother: they damp high-frequency error efficiently. The coarse solve handles exactly the cross-color modes that the smoother is structurally incapable of resolving in a single sweep. Algebraic multigrid does the same trick, but pays a heavy setup phase to <em>discover</em> its coarse basis from algebraic heuristics. Here the basis is handed to us by the constitutive model.</p>

<p>Sparsity-wise: two columns of $P$ interact in $P^T A P$ only if their source elements share a vertex, so the coarse system is sparse along the element-adjacency graph. No global dense inner products like CG would need.</p>
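<p>To make the ingredients concrete, here is a sketch for a single linear tet that assumes a small-strain isotropic material as a stand-in for $K_e$ (a nonlinear model would use its Hessian at the current configuration instead). It eigendecomposes the $12\times 12$ matrix, keeps the stiffest modes, and scatters them into global columns of $P$. The geometry, the material values, the mesh size, and the helper names are arbitrary.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def vertex_weights(X):
    """Gradient weights m^a of a linear tet (rows) and its rest edge matrix Dm.

    m^1..m^3 are the rows of Dm^{-1}; m^0 = -(m^1 + m^2 + m^3).
    """
    Dm = (X[1:] - X[0]).T
    Dm_inv = np.linalg.inv(Dm)
    return np.vstack([-Dm_inv.sum(axis=0), Dm_inv]), Dm


def gradient_operator(m):
    """9x12 G with vec(F) = G x (column-major vec, x stacked vertex by vertex)."""
    return np.hstack([np.kron(m[a].reshape(3, 1), np.eye(3)) for a in range(4)])


def linear_elastic_tet_stiffness(X, mu, lam):
    """K_e = V G^T C G with the small-strain isotropic tensor C acting on vec(F)."""
    m, Dm = vertex_weights(X)
    G = gradient_operator(m)
    T = np.zeros((9, 9))                        # maps vec(F) to vec(F^T)
    for i in range(3):
        for j in range(3):
            T[i + 3 * j, j + 3 * i] = 1.0
    vec_I = np.eye(3).reshape(-1, order="F")
    C = mu * (np.eye(9) + T) + lam * np.outer(vec_I, vec_I)
    volume = abs(np.linalg.det(Dm)) / 6.0
    return volume * G.T @ C @ G


def scatter_modes(modes, element, n_vertices):
    """Embed 12-entry element modes as 3N-entry global columns, zero off the element."""
    P_e = np.zeros((3 * n_vertices, modes.shape[1]))
    for a, v in enumerate(element):
        P_e[3 * v:3 * v + 3, :] = modes[3 * a:3 * a + 3, :]
    return P_e


X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
K_e = linear_elastic_tet_stiffness(X, mu=1.0, lam=2.0)
evals, Q_e = np.linalg.eigh(K_e)                # ascending eigenvalues, orthonormal modes
print(np.round(evals, 6))                       # six ~zero rigid modes come first

# Keep the two stiffest modes; the element is assumed to sit at vertices 4, 7, 8, 11
# of a hypothetical 20-vertex mesh.
P_e = scatter_modes(Q_e[:, -2:], element=[4, 7, 8, 11], n_vertices=20)
</code></pre></div></div>

<p>The six smallest eigenvalues are zero up to round-off: three translations and three infinitesimal rotations, which cost no elastic energy. Columns from different elements are simply stacked side by side, e.g. with <code class="language-plaintext highlighter-rouge">np.hstack</code>.</p>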

<hr />

<h2 id="the-cost-of-one-direction">The cost of one direction</h2>

<p>For a single sparse basis vector $\phi$, the optimal step along it is a 1D line search,</p>

\[\alpha^* \;=\; \frac{\phi^T r}{\phi^T A\,\phi}\]

<p>If $\phi$ has support on $k$ vertices, both products only touch a local neighborhood — $O(k\cdot\text{valence})$, not $O(n)$. For a nonlinear energy it is a couple of 1D Newton iterations on the same local support. Each individual direction is effectively free.</p>
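<p>A sketch of that local line search, assuming a linear system and a dense matrix for brevity: only the entries of $r$ and $A$ on the support of $\phi$ are read when forming $\alpha^*$, and after the step the new residual is orthogonal to $\phi$. The helper name and the random test problem are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def local_line_search(A, r, phi_values, support):
    """alpha* = (phi^T r) / (phi^T A phi) for a direction phi that is nonzero only on `support`.

    Only r[support] and the support-by-support block of A are read. Updating the
    residual afterwards touches only the rows coupled to the support.
    """
    numerator = phi_values @ r[support]
    denominator = phi_values @ A[np.ix_(support, support)] @ phi_values
    return numerator / denominator


rng = np.random.default_rng(3)
B = rng.standard_normal((12, 12))
A = B @ B.T + 12.0 * np.eye(12)          # random SPD stand-in for the system matrix
b = rng.standard_normal(12)
x = np.zeros(12)
r = b - A @ x

support = np.array([3, 4, 5, 6, 7, 8])   # e.g. the DOFs of two neighboring vertices
phi_values = rng.standard_normal(6)      # a made-up sparse direction
alpha = local_line_search(A, r, phi_values, support)
x[support] += alpha * phi_values

# Exact line search: the new residual is orthogonal to phi.
r_new = b - A @ x
print(phi_values @ r_new[support])       # ~0 up to round-off
</code></pre></div></div>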

<p>The cost lives in the <strong>count</strong> of columns of $P$, not in each column. Naïvely picking one mode per element gives $m \approx 5N$–$6N$ for a tet mesh, more columns than the $3N$ unknowns of the original vertex system. That defeats the purpose.</p>

<hr />

<h2 id="whats-left-to-figure-out">What’s left to figure out</h2>

<p>The construction is only useful if $P$ stays small. The pruning strategies that look most promising:</p>

<ul>
  <li><strong>Stiffness-contrast filter.</strong> Include modes only from elements whose stiffness deviates sharply from their neighbors — the interfaces where diagonal dominance actually breaks.</li>
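  <li><strong>Cross-color filter.</strong> Drop any mode whose support sits inside a single color. Those modes add no information the smoother is missing. Note that a proper vertex coloring gives the vertices of one element distinct colors, so a raw per-element mode only fails this test if its nonzero entries sit on a single vertex.</li>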
  <li><strong>Patch aggregation.</strong> Group neighboring elements and replace per-element top-modes with one patch-level mode — trade some spectral fidelity for far fewer columns.</li>
</ul>
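<p>The first filter is easy to prototype. The sketch below is a hypothetical version of it: the ratio threshold, the per-element scalar stiffness, and the neighbor lists are assumptions, and modes would then be taken only from the elements it returns.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def stiffness_contrast_filter(element_stiffness, element_neighbors, ratio=100.0):
    """Hypothetical stiffness-contrast filter.

    Keeps element e if max(k_e, k_f) / min(k_e, k_f) reaches `ratio` for some neighbor f.
    element_stiffness: positive per-element stiffness scalars (e.g. Young's modulus).
    element_neighbors: for each element, the indices of elements sharing a vertex with it.
    """
    k = np.asarray(element_stiffness, dtype=float)
    kept = []
    for e, neighbors in enumerate(element_neighbors):
        if len(neighbors) == 0:
            continue
        k_f = k[list(neighbors)]
        contrast = np.maximum(k_f, k[e]) / np.minimum(k_f, k[e])
        if np.any(np.greater_equal(contrast, ratio)):
            kept.append(e)
    return kept


# Toy strip of five elements, each touching its left and right neighbor;
# element 2 is 1e4 times stiffer than the rest.
stiffness = [1.0, 1.0, 1e4, 1.0, 1.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(stiffness_contrast_filter(stiffness, neighbors))   # [1, 2, 3]
</code></pre></div></div>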

<p>I do not yet know which filter is sharpest in practice, or how few modes survive before the convergence gain disappears. But the structural picture is clean: the bottleneck is cross-color propagation, the analytical fix lives in element eigendecompositions we already have, and each individual sparse line search is nearly free. The whole game is picking the right small handful of directions.</p>]]></content><author><name>Anka He Chen</name><email>ankachan92@gmail.com</email></author><category term="physics-simulation" /><category term="VBD" /><category term="computer-graphics" /><category term="numerical-methods" /><summary type="html"><![CDATA[VBD’s per-vertex 3×3 solve already gives the locally optimal descent direction — there is no leverage left inside a single block. So why does it still slow down on problems with high stiffness contrast? The answer turns out to be a story about what basis you descend in, and once that picture is in place the fix — a sparse coarse correction built from per-element eigenmodes — almost suggests itself.]]></summary></entry><entry><title type="html">Quaternion Math for Rigid Body Simulation</title><link href="https://ankachan.github.io/posts/2026/04/quaternion-primer/" rel="alternate" type="text/html" title="Quaternion Math for Rigid Body Simulation" /><published>2026-04-30T00:00:00-07:00</published><updated>2026-04-30T00:00:00-07:00</updated><id>https://ankachan.github.io/posts/2026/04/quaternion-primer</id><content type="html" xml:base="https://ankachan.github.io/posts/2026/04/quaternion-primer/"><![CDATA[<p>A practical primer covering exactly the quaternion operations used in rigid body simulation, with reference to the <a href="https://github.com/newton-physics/newton">Newton</a> AVBD implementation. No proofs, just what you need to read the code.</p>

<hr />

<h2 id="what-is-a-quaternion">What is a quaternion</h2>

<p>Four numbers: three “vector” components and one “scalar” component.</p>

\[\mathbf{q} = (x,\, y,\, z,\, w) \quad\text{or equivalently}\quad \mathbf{q} = (\mathbf{v},\, w)\]

<p>A <strong>unit quaternion</strong> ($\lVert\mathbf{q}\rVert = 1$) represents a 3D rotation. The identity rotation is $(0, 0, 0, 1)$.</p>

<hr />

<h2 id="quaternion-multiplication">Quaternion multiplication</h2>

<p>Given $\mathbf{a} = (\mathbf{a}_v,\, a_w)$ and $\mathbf{b} = (\mathbf{b}_v,\, b_w)$:</p>

\[\mathbf{a} \otimes \mathbf{b} = \big(a_w\,\mathbf{b}_v + b_w\,\mathbf{a}_v + \mathbf{a}_v \times \mathbf{b}_v,\;\; a_w\,b_w - \mathbf{a}_v \cdot \mathbf{b}_v\big)\]

<p>This is <strong>not commutative</strong>: $\mathbf{a}\otimes\mathbf{b} \neq \mathbf{b}\otimes\mathbf{a}$ in general. Order matters, just like matrix multiplication. In fact, for unit quaternions the product corresponds exactly to multiplying the equivalent $3\times 3$ rotation matrices: $\mathbf{R}(\mathbf{a}\otimes\mathbf{b}) = \mathbf{R}(\mathbf{a})\,\mathbf{R}(\mathbf{b})$.</p>
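<p>A minimal NumPy sketch of the product in $(x, y, z, w)$ storage, showing that swapping the factors changes the result. <code class="language-plaintext highlighter-rouge">quat_mul</code> is a local helper written for illustration, not Warp's API.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_mul(a, b):
    """Hamilton product a * b of quaternions stored as (x, y, z, w)."""
    av, aw = np.asarray(a[:3], dtype=float), float(a[3])
    bv, bw = np.asarray(b[:3], dtype=float), float(b[3])
    vec = aw * bv + bw * av + np.cross(av, bv)
    return np.array([*vec, aw * bw - av @ bv])


s = np.sqrt(0.5)
q_x90 = np.array([s, 0.0, 0.0, s])     # 90 degrees about x
q_z90 = np.array([0.0, 0.0, s, s])     # 90 degrees about z
identity = np.array([0.0, 0.0, 0.0, 1.0])

print(quat_mul(q_x90, q_z90))          # approximately (0.5, -0.5, 0.5, 0.5)
print(quat_mul(q_z90, q_x90))          # approximately (0.5, 0.5, 0.5, 0.5): order matters
print(quat_mul(identity, q_x90))       # the identity leaves q unchanged
</code></pre></div></div>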

<hr />

<h2 id="conjugate-and-inverse">Conjugate and inverse</h2>

\[\mathbf{q}^* = (-\mathbf{v},\, w) = (-x,\, -y,\, -z,\, w)\]

<p>For a unit quaternion, $\mathbf{q}^{-1} = \mathbf{q}^*$. This represents the <strong>opposite rotation</strong>.</p>

<hr />

<h2 id="how-a-quaternion-encodes-a-rotation">How a quaternion encodes a rotation</h2>

<p>A rotation by angle $\theta$ about unit axis $\hat{\mathbf{n}}$ is:</p>

\[\mathbf{q} = \big(\sin(\theta/2)\,\hat{\mathbf{n}},\;\; \cos(\theta/2)\big)\]

<p>The half-angle appears because quaternions rotate vectors via the <strong>sandwich product</strong> (next section), which applies the rotation from both sides—left and right—each contributing half the angle.</p>

<p>This means $\mathbf{q}$ and $-\mathbf{q}$ represent the <strong>same rotation</strong> (double cover of $SO(3)$). This is why code often checks <code class="language-plaintext highlighter-rouge">if q.w &lt; 0: q = -q</code> to pick the shorter path.</p>

<hr />

<h2 id="rotating-a-vector">Rotating a vector</h2>

<p>To rotate vector $\mathbf{v}$ by quaternion $\mathbf{q}$, embed $\mathbf{v}$ as a pure quaternion $(\mathbf{v}, 0)$ and sandwich:</p>

\[\mathbf{v}' = \big(\mathbf{q}\otimes(\mathbf{v}, 0)\otimes\mathbf{q}^*\big)_\text{vec}\]

<p>In code this is <code class="language-plaintext highlighter-rouge">quat_rotate(q, v)</code>. The efficient formula (no full quaternion multiply) is:</p>

\[\mathbf{t} = 2\,(\mathbf{q}_v \times \mathbf{v}), \qquad \mathbf{v}' = \mathbf{v} + w\,\mathbf{t} + \mathbf{q}_v \times \mathbf{t}\]

<p>The <strong>inverse rotation</strong> (world to body) is <code class="language-plaintext highlighter-rouge">quat_rotate(conjugate(q), v)</code>, which in Warp is <code class="language-plaintext highlighter-rouge">quat_rotate_inv(q, v)</code>.</p>
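<p>A sketch comparing the sandwich definition with the efficient formula, plus the inverse rotation via the conjugate. The helper names are local to this example and only mirror the roles of <code class="language-plaintext highlighter-rouge">quat_rotate</code> and <code class="language-plaintext highlighter-rouge">quat_rotate_inv</code>; they are not Warp's implementation.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    av, aw = np.asarray(a[:3], dtype=float), float(a[3])
    bv, bw = np.asarray(b[:3], dtype=float), float(b[3])
    return np.array([*(aw * bv + bw * av + np.cross(av, bv)), aw * bw - av @ bv])


def conjugate(q):
    """(-x, -y, -z, w): the inverse rotation for a unit quaternion."""
    return np.array([-q[0], -q[1], -q[2], q[3]])


def rotate_sandwich(q, v):
    """Definition: vector part of q * (v, 0) * q^*."""
    return quat_mul(quat_mul(q, [*v, 0.0]), conjugate(q))[:3]


def rotate_fast(q, v):
    """t = 2 (q_v x v), v' = v + w t + q_v x t, with no full quaternion products."""
    qv, w = np.asarray(q[:3], dtype=float), float(q[3])
    v = np.asarray(v, dtype=float)
    t = 2.0 * np.cross(qv, v)
    return v + w * t + np.cross(qv, t)


s = np.sqrt(0.5)
q_z90 = np.array([0.0, 0.0, s, s])                        # 90 degrees about +z
print(rotate_fast(q_z90, [1.0, 0.0, 0.0]))                # approximately (0, 1, 0)
print(rotate_fast(conjugate(q_z90), [0.0, 1.0, 0.0]))     # inverse rotation: approximately (1, 0, 0)
</code></pre></div></div>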

<hr />

<h2 id="composing-rotations">Composing rotations</h2>

<p>To apply rotation $\mathbf{a}$ <strong>then</strong> rotation $\mathbf{b}$:</p>

\[\mathbf{q}_\text{combined} = \mathbf{b} \otimes \mathbf{a}\]

<p>The rotation applied <strong>first</strong> goes on the <strong>right</strong>. Same convention as matrices: $(\mathbf{B}\mathbf{A})\mathbf{v} = \mathbf{B}(\mathbf{A}\mathbf{v})$.</p>

<hr />

<h2 id="relative-rotation">Relative rotation</h2>

<p>Given two orientations $\mathbf{q}_\text{cur}$ and $\mathbf{q}_\text{target}$, the rotation <strong>from current to target</strong> is:</p>

\[\mathbf{q}_\delta = \mathbf{q}_\text{cur}^{-1} \otimes \mathbf{q}_\text{target}\]

<p>This $\mathbf{q}_\delta$ is in $\mathbf{q}_\text{cur}$’s <strong>body frame</strong>. If you stand in the body frame of $\mathbf{q}_\text{cur}$, $\mathbf{q}_\delta$ tells you how much more to rotate to reach $\mathbf{q}_\text{target}$.</p>

<p>If you instead compute:</p>

\[\mathbf{q}_{\delta,\text{world}} = \mathbf{q}_\text{target} \otimes \mathbf{q}_\text{cur}^{-1}\]

<p>you get the same physical rotation, but expressed in <strong>world frame</strong>.</p>

<p>This is the key body-vs-world distinction in the Newton code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Body-frame delta (Newton uses this)
</span><span class="n">q_delta</span> <span class="o">=</span> <span class="n">quat_inverse</span><span class="p">(</span><span class="n">rot_current</span><span class="p">)</span> <span class="o">*</span> <span class="n">rot_star</span>

<span class="c1"># World-frame delta (the AVBD demo uses this)
</span><span class="n">q_delta</span> <span class="o">=</span> <span class="n">rot_current</span> <span class="o">*</span> <span class="n">quat_inverse</span><span class="p">(</span><span class="n">rot_star</span><span class="p">)</span>
</code></pre></div></div>
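<p>A quick NumPy check of the two conventions with random unit quaternions; the helpers are local stand-ins that mirror the names in the snippet above, not Newton's code. All four checks print True: the body-frame delta acts on the right of $\mathbf{q}_\text{cur}$, the world-frame delta acts on the left, conjugating by $\mathbf{q}_\text{cur}$ maps one to the other, and the snippet's second line is the inverse of $\mathbf{q}_{\delta,\text{world}}$.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    av, aw = np.asarray(a[:3], dtype=float), float(a[3])
    bv, bw = np.asarray(b[:3], dtype=float), float(b[3])
    return np.array([*(aw * bv + bw * av + np.cross(av, bv)), aw * bw - av @ bv])


def quat_inverse(q):
    """Inverse of a unit quaternion, i.e. its conjugate."""
    return np.array([-q[0], -q[1], -q[2], q[3]])


rng = np.random.default_rng(0)
q_cur = rng.standard_normal(4)
q_cur /= np.linalg.norm(q_cur)
q_target = rng.standard_normal(4)
q_target /= np.linalg.norm(q_target)

q_delta_body = quat_mul(quat_inverse(q_cur), q_target)    # acts on the right of q_cur
q_delta_world = quat_mul(q_target, quat_inverse(q_cur))   # acts on the left of q_cur

print(np.allclose(quat_mul(q_cur, q_delta_body), q_target))
print(np.allclose(quat_mul(q_delta_world, q_cur), q_target))
# Same physical rotation in two frames: conjugating by q_cur maps body to world.
print(np.allclose(quat_mul(quat_mul(q_cur, q_delta_body), quat_inverse(q_cur)), q_delta_world))
# rot_current * quat_inverse(rot_star), as written above, is the inverse of q_delta_world.
print(np.allclose(quat_mul(q_cur, quat_inverse(q_target)), quat_inverse(q_delta_world)))
</code></pre></div></div>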

<hr />

<h2 id="quaternion-to-rotation-vector">Quaternion to rotation vector</h2>

<p>Extract the axis and angle from a quaternion:</p>

\[\theta = 2\,\arccos(w), \qquad \hat{\mathbf{n}} = \frac{\mathbf{v}}{\sin(\theta/2)}\]

<p>The <strong>rotation vector</strong> packs both into one $\mathbb{R}^3$ vector:</p>

\[\boldsymbol{\theta} = \hat{\mathbf{n}}\,\theta\]

<p>Its magnitude is the angle, its direction is the axis. This is what <code class="language-plaintext highlighter-rouge">quat_to_axis_angle</code> followed by <code class="language-plaintext highlighter-rouge">axis * angle</code> does in Newton, and it is the natural quantity for the inertial spring $\mathbf{f}_\text{ang} = \mathbf{I}_\text{world}\,\boldsymbol{\theta}/h^2$.</p>
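<p>A standalone sketch of the quaternion-to-rotation-vector map (not Newton's <code class="language-plaintext highlighter-rouge">quat_to_axis_angle</code>). It flips $\mathbf{q}$ so that $w$ is non-negative, and it uses $\operatorname{atan2}(\lVert\mathbf{v}\rVert, w)$ instead of $\arccos(w)$ because $\arccos$ loses all precision when $w$ rounds to exactly 1 for tiny angles.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_to_rotation_vector(q):
    """Rotation vector theta * n_hat of a unit quaternion stored as (x, y, z, w).

    q is first flipped so that w is non-negative (q and -q are the same rotation),
    which puts the angle in [0, pi]. atan2 replaces 2 * arccos(w), which loses
    precision for small angles where w is close to 1.
    """
    q = np.asarray(q, dtype=float)
    q = q * np.copysign(1.0, q[3])
    v, w = q[:3], q[3]
    s = np.linalg.norm(v)                  # sin(theta / 2)
    if s == 0.0:
        return np.zeros(3)                 # identity rotation
    theta = 2.0 * np.arctan2(s, w)
    return theta * v / s                   # theta * n_hat


n_hat = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
angle = np.pi / 3
q = np.array([*(np.sin(angle / 2) * n_hat), np.cos(angle / 2)])
print(quat_to_rotation_vector(q), angle * n_hat)        # the same vector
print(quat_to_rotation_vector(-q))                      # -q: same rotation, same vector
print(quat_to_rotation_vector([1e-9, 0.0, 0.0, 1.0]))   # ~(2e-9, 0, 0); 2 * arccos(1.0) would give 0
</code></pre></div></div>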

<hr />

<h2 id="rotation-vector-back-to-quaternion">Rotation vector back to quaternion</h2>

<p>Given rotation vector $\boldsymbol{\theta} \in \mathbb{R}^3$:</p>

\[\theta = \lVert\boldsymbol{\theta}\rVert, \qquad \hat{\mathbf{n}} = \boldsymbol{\theta}/\theta, \qquad \mathbf{q} = \big(\sin(\theta/2)\,\hat{\mathbf{n}},\;\cos(\theta/2)\big)\]

<p>For small angles, the <strong>small-angle approximation</strong> avoids the trig:</p>

\[\mathbf{q} \approx \text{normalize}\!\big(\boldsymbol{\theta}/2,\; 1\big)\]

<p>This is the <code class="language-plaintext highlighter-rouge">_USE_SMALL_ANGLE_APPROX</code> path in Newton’s <code class="language-plaintext highlighter-rouge">solve_rigid_body</code>.</p>
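<p>A sketch comparing the exact map with the small-angle version; the function names are local to this example. The approximation is itself an exact rotation, just by angle $2\arctan(\lVert\boldsymbol{\theta}\rVert/2)$ instead of $\lVert\boldsymbol{\theta}\rVert$, so its error grows like $\lVert\boldsymbol{\theta}\rVert^3$: negligible for small per-iteration increments, noticeable for large steps.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def rotation_vector_to_quat(theta_vec):
    """Exact map: q = (sin(theta/2) n_hat, cos(theta/2)) with theta = |theta_vec|."""
    theta_vec = np.asarray(theta_vec, dtype=float)
    theta = np.linalg.norm(theta_vec)
    if theta == 0.0:
        return np.array([0.0, 0.0, 0.0, 1.0])
    n_hat = theta_vec / theta
    return np.array([*(np.sin(theta / 2.0) * n_hat), np.cos(theta / 2.0)])


def rotation_vector_to_quat_small(theta_vec):
    """Small-angle approximation: normalize((theta_vec / 2, 1)), no trig."""
    q = np.array([*(0.5 * np.asarray(theta_vec, dtype=float)), 1.0])
    return q / np.linalg.norm(q)


for angle in (1e-3, 1e-1, 1.0):
    tv = np.array([angle, 0.0, 0.0])
    err = np.linalg.norm(rotation_vector_to_quat(tv) - rotation_vector_to_quat_small(tv))
    print(f"angle={angle:g}  |q_exact - q_small|={err:.2e}")
</code></pre></div></div>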

<hr />

<h2 id="angular-velocity-and-dqdt">Angular velocity and dq/dt</h2>

<p>If a body has world-frame angular velocity $\boldsymbol{\omega}$, its orientation changes as:</p>

\[\dot{\mathbf{q}} = \tfrac{1}{2}\,\widetilde{\boldsymbol{\omega}} \otimes \mathbf{q}\]

<p>where $\widetilde{\boldsymbol{\omega}} = (\boldsymbol{\omega}, 0)$ is $\boldsymbol{\omega}$ embedded as a pure quaternion (zero scalar part).</p>

<p><strong>Why?</strong> A small rotation by angle $\lVert\boldsymbol{\omega}\rVert\,\Delta t$ about axis $\boldsymbol{\omega}/\lVert\boldsymbol{\omega}\rVert$ is the quaternion:</p>

\[\delta\mathbf{q} = \Big(\sin\!\big(\tfrac{\lVert\boldsymbol{\omega}\rVert\Delta t}{2}\big)\,\frac{\boldsymbol{\omega}}{\lVert\boldsymbol{\omega}\rVert},\;\; \cos\!\big(\tfrac{\lVert\boldsymbol{\omega}\rVert\Delta t}{2}\big)\Big) \;\approx\; \big(\boldsymbol{\omega}\,\Delta t/2,\; 1\big)\]

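<p>The new orientation is $\delta\mathbf{q}\otimes\mathbf{q}$ (left-multiply = world frame). Writing $\mathbf{1} = (\mathbf{0}, 1)$ for the identity quaternion and keeping terms to first order in $\Delta t$:</p>

\[\mathbf{q}(t+\Delta t) - \mathbf{q}(t) = (\delta\mathbf{q} - \mathbf{1})\otimes\mathbf{q} \approx \big(\boldsymbol{\omega}\,\Delta t/2,\; 0\big)\otimes\mathbf{q}\]

<p>Dividing by $\Delta t$ and letting $\Delta t \to 0$ gives</p>

\[\dot{\mathbf{q}} = \tfrac{1}{2}\,(\boldsymbol{\omega}, 0) \otimes \mathbf{q}\]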

<p><strong>Euler integration</strong> of this gives:</p>

\[\mathbf{q}^{n+1} = \text{normalize}\!\big(\mathbf{q}^n + \tfrac{h}{2}\,\widetilde{\boldsymbol{\omega}} \otimes \mathbf{q}^n\big)\]

<p>which is exactly <code class="language-plaintext highlighter-rouge">wp.normalize(r0 + wp.quat(w1, 0.0) * r0 * 0.5 * dt)</code> in Newton.</p>

<p><strong>Body-frame angular velocity</strong> would use right-multiplication instead:</p>

\[\dot{\mathbf{q}} = \tfrac{1}{2}\,\mathbf{q} \otimes (\boldsymbol{\omega}_\text{body}, 0)\]
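<p>A sketch of the normalized Euler step for a constant world-frame $\boldsymbol{\omega}$, compared with the exact rotation, followed by a check that the world-frame and body-frame forms give the same $\dot{\mathbf{q}}$ when $\boldsymbol{\omega}_\text{body} = \mathbf{R}^T\boldsymbol{\omega}$. The helpers are local stand-ins, and the step size and angular velocities are arbitrary.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    av, aw = np.asarray(a[:3], dtype=float), float(a[3])
    bv, bw = np.asarray(b[:3], dtype=float), float(b[3])
    return np.array([*(aw * bv + bw * av + np.cross(av, bv)), aw * bw - av @ bv])


def conjugate(q):
    return np.array([-q[0], -q[1], -q[2], q[3]])


def integrate_orientation(q, omega_world, h):
    """One normalized explicit Euler step of q_dot = 1/2 (omega, 0) * q."""
    q_new = q + 0.5 * h * quat_mul([*omega_world, 0.0], q)
    return q_new / np.linalg.norm(q_new)


q = np.array([0.0, 0.0, 0.0, 1.0])           # start at the identity
omega = np.array([0.0, 0.0, 1.0])            # 1 rad/s about world z
h = 1e-3
for _ in range(1000):                        # 1 s of simulated time
    q = integrate_orientation(q, omega, h)
exact = np.array([0.0, 0.0, np.sin(0.5), np.cos(0.5)])   # 1 rad about z
print(np.linalg.norm(q - exact))             # small: each step's angle error is O(h^3)

# World-frame and body-frame forms agree when omega_body = R^T omega.
q0 = np.array([0.1, -0.2, 0.3, 0.9])
q0 = q0 / np.linalg.norm(q0)
w_world = np.array([0.4, -1.0, 2.0])
w_body = quat_mul(quat_mul(conjugate(q0), [*w_world, 0.0]), q0)[:3]   # rotate into the body frame
print(np.allclose(quat_mul([*w_world, 0.0], q0), quat_mul(q0, [*w_body, 0.0])))
</code></pre></div></div>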

<hr />

<h2 id="rotation-matrix-from-quaternion">Rotation matrix from quaternion</h2>

<p>Sometimes you need the $3\times 3$ rotation matrix, e.g. to compute $\mathbf{I}_\text{world} = \mathbf{R}\,\mathbf{I}_\text{body}\,\mathbf{R}^T$. Given $\mathbf{q} = (x, y, z, w)$:</p>

\[\mathbf{R} = \begin{bmatrix} 1-2(y^2+z^2) &amp; 2(xy-wz) &amp; 2(xz+wy) \\ 2(xy+wz) &amp; 1-2(x^2+z^2) &amp; 2(yz-wx) \\ 2(xz-wy) &amp; 2(yz+wx) &amp; 1-2(x^2+y^2) \end{bmatrix}\]

<p>In Warp this is <code class="language-plaintext highlighter-rouge">quat_to_matrix(q)</code>.</p>
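<p>A NumPy transcription of the matrix above, used to check that it is a proper rotation, that quaternion products become matrix products, and to form $\mathbf{I}_\text{world}$. The helpers are local stand-ins rather than Warp's functions, and the body inertia is made up.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (x, y, z, w), entry by entry as above."""
    x, y, z, w = (float(c) for c in q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def quat_mul(a, b):
    """Hamilton product of quaternions stored as (x, y, z, w)."""
    av, aw = np.asarray(a[:3], dtype=float), float(a[3])
    bv, bw = np.asarray(b[:3], dtype=float), float(b[3])
    return np.array([*(aw * bv + bw * av + np.cross(av, bv)), aw * bw - av @ bv])


rng = np.random.default_rng(0)
a = rng.standard_normal(4)
a /= np.linalg.norm(a)
b = rng.standard_normal(4)
b /= np.linalg.norm(b)

R = quat_to_matrix(a)
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
# Quaternion products become matrix products: R(a * b) = R(a) R(b).
print(np.allclose(quat_to_matrix(quat_mul(a, b)), quat_to_matrix(a) @ quat_to_matrix(b)))

I_body = np.diag([1.0, 2.0, 3.0])          # made-up principal inertia
I_world = R @ I_body @ R.T                  # I_world = R I_body R^T
</code></pre></div></div>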

<hr />

<h2 id="summary-operations-used-in-newtons-rigid-body-solver">Summary: operations used in Newton’s rigid body solver</h2>

<table>
  <thead>
    <tr>
      <th>Code</th>
      <th>Math</th>
      <th>What it does</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">quat_rotate(q, v)</code></td>
      <td>$\mathbf{R}\,\mathbf{v}$</td>
      <td>Rotate vector to world frame</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">quat_rotate_inv(q, v)</code></td>
      <td>$\mathbf{R}^T\mathbf{v}$</td>
      <td>Rotate vector to body frame</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">quat_inverse(q_cur) * q_star</code></td>
      <td>$\mathbf{R}_\text{cur}^T\,\mathbf{R}_\star$</td>
      <td>Relative rotation in body frame</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">quat_to_axis_angle(q)</code> $\to$ <code class="language-plaintext highlighter-rouge">axis*angle</code></td>
      <td>$\boldsymbol{\theta} = \log(\mathbf{q})$</td>
      <td>Quaternion to rotation vector</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">quat(half_w, 1.0)</code> normalized</td>
      <td>$\exp(\Delta\boldsymbol{\omega}/2)$</td>
      <td>Small rotation vector to quaternion</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">dq * q_current</code></td>
      <td>$\delta\mathbf{R}\,\mathbf{R}_\text{cur}$</td>
      <td>Apply world-frame rotation increment</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">r0 + wp.quat(w1,0)*r0*0.5*dt</code></td>
      <td>$\text{normalize}(\mathbf{q} + \tfrac{h}{2}\widetilde{\boldsymbol{\omega}}\otimes\mathbf{q})$</td>
      <td>Integrate angular velocity one step</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">quat_to_matrix(q)</code></td>
      <td>$\mathbf{R}$</td>
      <td>$3\times 3$ rotation matrix for $\mathbf{R}\,\mathbf{I}\,\mathbf{R}^T$</td>
    </tr>
  </tbody>
</table>

<hr />

<p><em>See also: <a href="/posts/2026/03/vbd-rigid-body-section-1/">Rigid Body Dynamics with VBD, Section I</a> for the full AVBD derivation that uses these operations.</em></p>]]></content><author><name>Anka He Chen</name><email>ankachan92@gmail.com</email></author><category term="physics-simulation" /><category term="rigid-body" /><category term="computer-graphics" /><category term="math" /><summary type="html"><![CDATA[A practical primer covering exactly the quaternion operations used in rigid body simulation, with reference to the Newton AVBD implementation. No proofs, just what you need to read the code.]]></summary></entry><entry><title type="html">Stable Neo-Hookean for VBD: Deriving the Per-Vertex Hessian</title><link href="https://ankachan.github.io/posts/2026/04/neohookean-vbd-cofactor-cancellation/" rel="alternate" type="text/html" title="Stable Neo-Hookean for VBD: Deriving the Per-Vertex Hessian" /><published>2026-04-24T00:00:00-07:00</published><updated>2026-04-24T00:00:00-07:00</updated><id>https://ankachan.github.io/posts/2026/04/neohookean-vbd-cofactor-cancellation</id><content type="html" xml:base="https://ankachan.github.io/posts/2026/04/neohookean-vbd-cofactor-cancellation/"><![CDATA[<p>This post derives the per-vertex 3×3 Hessian block for the <a href="https://graphics.pixar.com/library/StableElasticity/paper.pdf">stable Neo-Hookean</a> tet material under <a href="https://anka-chen.me/publication/vbd-paper">VBD</a>-style block Gauss-Seidel, and shows how it lands as an unconditionally PSD expression with no clamp or eigenvalue projection required. The derivation is short but the algebraic cancellation it relies on is easy to miss, so it is worth writing out in full. The post is meant as a reference for anyone wiring stable Neo-Hookean into a VBD solver.</p>

<h2 id="stable-neo-hookean-energy-and-its-hessian">Stable Neo-Hookean Energy and Its Hessian</h2>

<p>For a tet with deformation gradient $\mathbf{F} \in \mathbb{R}^{3\times3}$ and energy parameters $\mu, \lambda$, the stable Neo-Hookean energy density is</p>

\[\psi(\mathbf{F}) \;=\; \tfrac{\mu}{2}(I_C - 3) \;+\; \tfrac{\lambda}{2}(J - \alpha)^2,
\qquad I_C = \|\mathbf{F}\|_F^2,\quad J = \det \mathbf{F},\quad \alpha = 1 + \tfrac{\mu}{\lambda}.\]

<p>The shift $\alpha$ ensures $\partial\psi/\partial \mathbf{F} = \mathbf{0}$ at the rest configuration $\mathbf{F} = \mathbf{I}$; it does <em>not</em> prevent inversion.</p>

<p>A subtlety worth flagging: the symbols $\mu, \lambda$ in this energy are <em>not</em> directly the Lamé parameters. Matching the small-strain limit of stable Neo-Hookean to linear elasticity (<a href="https://graphics.pixar.com/library/StableElasticity/paper.pdf">Smith et al. §3.4</a>, eq. 13) gives the relation</p>

\[\mu_\text{NH} \;=\; \mu_\text{Lamé}, \qquad \lambda_\text{NH} \;=\; \lambda_\text{Lamé} \;+\; \mu_\text{Lamé}.\]

<p>So if you are exposing material constants to users in textbook Lamé convention, convert with $\lambda_\text{NH} = \lambda_\text{Lamé} + \mu_\text{Lamé}$ before plugging into the energy. Throughout the rest of this post, $\mu, \lambda$ refer to the Neo-Hookean parameters $\mu_\text{NH}, \lambda_\text{NH}$ as they appear in the energy expression above.</p>
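<p>A quick numerical check of this conversion, as a sketch: at $\mathbf{F} = \mathbf{I}$ we have $\text{cof}\,\mathbf{I} = \mathbf{I}$ and $s = \lambda(1 - \alpha) = -\mu$, so the energy's Hessian (its three pieces are written out a few lines below) can be compared entry by entry with the small-strain tensor $\mu_\text{Lamé}(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}) + \lambda_\text{Lamé}\,\delta_{ij}\delta_{kl}$. The parameter values are arbitrary.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# Levi-Civita symbol eps[i, j, k]
EPS = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    EPS[i, j, k] = 1.0
    EPS[i, k, j] = -1.0


def to_vec9(T4):
    """Flatten T[i, j, k, l] into a 9x9 matrix acting on column-major vec(F)."""
    return T4.transpose(1, 0, 3, 2).reshape(9, 9)


def snh_rest_hessian(mu, lam):
    """Stable Neo-Hookean d2psi/dF2 at F = I, with the NH parameters mu, lam."""
    d = np.eye(3)
    alpha = 1.0 + mu / lam
    s = lam * (1.0 - alpha)                              # = -mu, since J = 1 at rest
    A_mu = mu * np.einsum("ik,jl->ijkl", d, d)           # mu * I_9
    A_lam = lam * np.einsum("ij,kl->ijkl", d, d)         # cof(I) = I
    d2J = np.einsum("ikp,jlq,pq->ijkl", EPS, EPS, d)     # Hessian of det at F = I
    return to_vec9(A_mu + A_lam + s * d2J)


def linear_elasticity_tensor(mu_lame, lam_lame):
    """Small-strain tensor mu (d_ik d_jl + d_il d_jk) + lam d_ij d_kl on vec(F)."""
    d = np.eye(3)
    C = (mu_lame * (np.einsum("ik,jl->ijkl", d, d) + np.einsum("il,jk->ijkl", d, d))
         + lam_lame * np.einsum("ij,kl->ijkl", d, d))
    return to_vec9(C)


mu_lame, lam_lame = 1.0, 4.0                  # arbitrary Lame parameters
mu_nh, lam_nh = mu_lame, lam_lame + mu_lame   # the conversion above
print(np.allclose(snh_rest_hessian(mu_nh, lam_nh), linear_elasticity_tensor(mu_lame, lam_lame)))
</code></pre></div></div>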

<p>The first Piola–Kirchhoff stress is</p>

\[\mathbf{P}(\mathbf{F}) \;=\; \frac{\partial \psi}{\partial \mathbf{F}} \;=\; \mu \mathbf{F} \;+\; s\,\text{cof}(\mathbf{F}),
\qquad s \;\equiv\; \lambda(J - \alpha).\]

<p>Vectorising $\mathbf{F}$ column-major as $\text{vec}(\mathbf{F}) \in \mathbb{R}^9$, the Hessian splits into three pieces:</p>

\[\mathbf{H}_\text{elastic} \;=\; \underbrace{\mu \mathbf{I}_9}_{\mathbf{A}_\mu} \;+\; \underbrace{\lambda\,\text{vec}(\text{cof}\,\mathbf{F})\;\text{vec}(\text{cof}\,\mathbf{F})^T}_{\mathbf{A}_\lambda} \;+\; \underbrace{s\,\frac{\partial^2 J}{\partial \mathbf{F}^2}}_{\mathbf{A}_\sigma}.\]

<p>$\mathbf{A}_\mu$ is a positive multiple of identity. $\mathbf{A}_\lambda$ is rank-1 PSD. $\mathbf{A}_\sigma$ is the only piece that can be indefinite: $s$ is negative for compressed tets, and $\partial^2 J/\partial \mathbf{F}^2$ has both positive and negative eigenvalues. Standard Newton-style implementations therefore SPD-project $\mathbf{A}_\sigma$ (e.g. via <a href="https://www.tkim.graphics/RESCALE/elastic_eigenanalysis.pdf">Smith–Kim eigenanalysis</a>, or by clamping $s$ into a precomputed safe interval).</p>

<p>For VBD this projection turns out to be unnecessary. Showing why is the rest of the post.</p>
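<p>The three-piece split can be sanity-checked numerically before relying on it. The sketch below, with arbitrary $\mu$, $\lambda$ and a compressed $\mathbf{F}$ chosen for illustration, builds $\mathbf{A}_\mu$, $\mathbf{A}_\lambda$, $\mathbf{A}_\sigma$ on the column-major $\text{vec}(\mathbf{F})$, compares their sum with central finite differences of $\mathbf{P}$, and prints the eigenvalues of $\mathbf{A}_\sigma$, which come out with both signs.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

EPS = np.zeros((3, 3, 3))                 # Levi-Civita symbol
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    EPS[i, j, k] = 1.0
    EPS[i, k, j] = -1.0


def cof(F):
    """Cofactor matrix dJ/dF: columns f1 x f2, f2 x f0, f0 x f1."""
    f0, f1, f2 = F[:, 0], F[:, 1], F[:, 2]
    return np.column_stack([np.cross(f1, f2), np.cross(f2, f0), np.cross(f0, f1)])


def vec(M):
    """Column-major vectorization, matching the text."""
    return M.reshape(-1, order="F")


def pk1(F, mu, lam):
    """P = mu F + s cof(F), with s = lam (J - alpha) and alpha = 1 + mu / lam."""
    s = lam * (np.linalg.det(F) - (1.0 + mu / lam))
    return mu * F + s * cof(F)


def snh_hessian_pieces(F, mu, lam):
    """(A_mu, A_lam, A_sigma) as 9x9 matrices on vec(F)."""
    s = lam * (np.linalg.det(F) - (1.0 + mu / lam))
    c = vec(cof(F))
    d2J = np.einsum("ikp,jlq,pq->ijkl", EPS, EPS, F).transpose(1, 0, 3, 2).reshape(9, 9)
    return mu * np.eye(9), lam * np.outer(c, c), s * d2J


mu, lam = 1.0, 10.0                                   # arbitrary NH parameters
F = np.diag([0.7, 0.8, 0.9]) + 0.05 * np.random.default_rng(0).standard_normal((3, 3))
A_mu, A_lam, A_sigma = snh_hessian_pieces(F, mu, lam)
H = A_mu + A_lam + A_sigma

# Central differences of P, one vec(F) entry at a time, rebuild H column by column.
h = 1e-6
H_fd = np.zeros((9, 9))
for col in range(9):
    dF = np.zeros(9)
    dF[col] = h
    dF = dF.reshape(3, 3, order="F")
    H_fd[:, col] = vec(pk1(F + dF, mu, lam) - pk1(F - dF, mu, lam)) / (2.0 * h)
print(np.abs(H - H_fd).max())

# This F is compressed (J well below alpha), so s is negative, and A_sigma has
# eigenvalues of both signs.
print(np.linalg.det(F), np.linalg.eigvalsh(A_sigma))
</code></pre></div></div>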

<h2 id="what-vbd-needs">What VBD Needs</h2>

<p>VBD updates one vertex at a time by solving the local Newton system</p>

\[\mathbf{H}_{aa}\,\Delta\mathbf{x}_a \;=\; \mathbf{f}_a,\]

<p>so the only piece of the elastic Hessian that ever enters a solve is the $3\times3$ diagonal block $\mathbf{H}_{aa}$ corresponding to a single vertex $a$. Off-diagonal blocks $\mathbf{H}_{ab}$ ($a \neq b$) influence convergence rate through Gauss-Seidel coupling but never appear inside any matrix inverse.</p>

<p>For a linear tet, $\mathbf{F}$ is affine in the vertex positions, so each vertex contributes via a fixed rest-frame weight $\mathbf{m}^a \in \mathbb{R}^3$ (a row of $\mathbf{D}_m^{-1}$):</p>

\[\frac{\partial F_{ij}}{\partial x_a^\alpha} \;=\; \delta_{i\alpha}\, m_j^a.\]

<p>Throughout, $i,j,k,l$ are deformation-gradient indices ($1\ldots 3$) and $\alpha,\beta$ are spatial-coordinate indices ($1\ldots 3$). The per-vertex 3×3 block is</p>

\[\mathbf{H}_{aa}^{\alpha\beta} \;=\; \sum_{ijkl}\,\frac{\partial F_{ij}}{\partial x_a^\alpha}\;\frac{\partial^2 \psi}{\partial F_{ij}\,\partial F_{kl}}\;\frac{\partial F_{kl}}{\partial x_a^\beta}.\]

<p>We will plug each of the three pieces $\mathbf{A}_\mu$, $\mathbf{A}_\lambda$, $\mathbf{A}_\sigma$ into this and simplify.</p>

<h2 id="contracting-mathbfamu-and-mathbfalambda">Contracting $\mathbf{A}_\mu$ and $\mathbf{A}_\lambda$</h2>

<p>For $\mathbf{A}_\mu = \mu\,\mathbf{I}_9$:</p>

\[\mathbf{H}_{aa}^{\alpha\beta}\big[\mathbf{A}_\mu\big] \;=\; \mu\,\sum_{ij}\,\delta_{i\alpha}\,m_j^a\,\delta_{i\beta}\,m_j^a \;=\; \mu\,\delta_{\alpha\beta}\,\|\mathbf{m}^a\|^2.\]

<p>So the $\mathbf{A}_\mu$ contribution is $\mu\,|\mathbf{m}^a|^2\,\mathbf{I}_3$.</p>

<p>For $\mathbf{A}_\lambda = \lambda\,\text{vec}(\text{cof}\mathbf{F})\,\text{vec}(\text{cof}\mathbf{F})^T$, define $\mathbf{w}^a = \text{cof}(\mathbf{F})\,\mathbf{m}^a \in \mathbb{R}^3$. Then</p>

\[\sum_{ij}\,\delta_{i\alpha}\,m_j^a\,(\text{cof}\,\mathbf{F})_{ij} \;=\; \sum_j (\text{cof}\,\mathbf{F})_{\alpha j}\, m_j^a \;=\; w_\alpha^a,\]

<p>so $\mathbf{H}_{aa}^{\alpha\beta}[\mathbf{A}_\lambda] = \lambda\,w_\alpha^a\,w_\beta^a$, i.e. the rank-1 dyad $\lambda\,\mathbf{w}^a(\mathbf{w}^a)^T$.</p>

<p>Both contributions are PSD by inspection.</p>
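<p>Both contractions can be reproduced numerically by writing $\partial\mathbf{F}/\partial\mathbf{x}_a$ as a $9\times3$ matrix $\mathbf{G}_a$ whose column $\alpha$ is $\text{vec}(\mathbf{e}_\alpha \otimes \mathbf{m}^a)$, so that $\mathbf{H}_{aa} = \mathbf{G}_a^T\,\mathbf{A}\,\mathbf{G}_a$ for any $9\times9$ piece $\mathbf{A}$. The NumPy sketch below uses a random $\mathbf{F}$ and weight vector; $\mathbf{G}_a$ and the helper names are introduced here for illustration only.</p>

```python
import numpy as np


def vec(M):
    """Column-major vectorisation, matching vec(F) in the text."""
    return M.flatten(order="F")


def cof(F):
    """Cofactor matrix J F^{-T}, built from cross products of the columns of F."""
    return np.column_stack([np.cross(F[:, 1], F[:, 2]),
                            np.cross(F[:, 2], F[:, 0]),
                            np.cross(F[:, 0], F[:, 1])])


def vertex_jacobian(m):
    """9x3 matrix G_a whose column alpha is vec(dF/dx_a^alpha) = vec(e_alpha m^T)."""
    return np.column_stack([vec(np.outer(e, m)) for e in np.eye(3)])


rng = np.random.default_rng(3)
F = rng.standard_normal((3, 3))
m = rng.standard_normal(3)
mu, lam = 1.0, 10.0

G = vertex_jacobian(m)
c = vec(cof(F))
H_mu = G.T @ (mu * np.eye(9)) @ G          # contraction of A_mu
H_lam = G.T @ (lam * np.outer(c, c)) @ G   # contraction of A_lambda
w = cof(F) @ m                             # w^a = cof(F) m^a
```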

<h2 id="contracting-mathbfa_sigma">Contracting $\mathbf{A}_\sigma$</h2>

<p>The Hessian of $J = \det \mathbf{F}$ is given by the Levi-Civita identity</p>

\[\frac{\partial^2 J}{\partial F_{ij}\,\partial F_{kl}} \;=\; \varepsilon_{ikp}\,\varepsilon_{jlq}\,F_{pq}.\]

<p>This tensor is nonzero in general, but contract it with $\partial F/\partial x_a$ on both legs:</p>

\[\begin{aligned}
\mathbf{H}_{aa}^{\alpha\beta}\big[\mathbf{A}_\sigma\big]
&amp;= s\,\sum_{ijkl}\,\delta_{i\alpha}\,m_j^a \,\cdot\, \varepsilon_{ikp}\,\varepsilon_{jlq}\,F_{pq}\,\cdot\,\delta_{k\beta}\,m_l^a \\
&amp;= s\,\varepsilon_{\alpha\beta p}\,F_{pq}\,\sum_{j,l}\, m_j^a\,\varepsilon_{jlq}\,m_l^a \\
&amp;= s\,\varepsilon_{\alpha\beta p}\,F_{pq}\,(\mathbf{m}^a \times \mathbf{m}^a)_q \\
&amp;= 0.
\end{aligned}\]

<p>The inner sum is the cross product of $\mathbf{m}^a$ with itself, which vanishes for any vector. The cancellation goes through for any $\mathbf{F}$ (including $\det \mathbf{F} \leq 0$), any $\mathbf{m}^a$, and any scalar $s$.</p>

<p>The structural reason: $\partial F/\partial x_a^\alpha = \mathbf{e}_\alpha \otimes \mathbf{m}^a$ is a rank-1 dyad. The tensor $\partial^2 J/\partial \mathbf{F}^2$ is antisymmetric in its column-index pair $(j,l)$, and sandwiching it between two copies of the <em>same</em> rank-1 dyad pins both $j$ and $l$ to the same vector $\mathbf{m}^a$, so antisymmetry collapses the contraction to zero. Off-diagonal blocks $\mathbf{H}_{ab}$ for $a \neq b$ replace $\mathbf{m}^a \times \mathbf{m}^a$ with $\mathbf{m}^a \times \mathbf{m}^b$, which is generically nonzero, so they do see $\mathbf{A}_\sigma$.</p>
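<p>Writing $\partial\mathbf{F}/\partial\mathbf{x}_a$ as a $9\times3$ matrix $\mathbf{G}_a$ (column $\alpha$ is $\text{vec}(\mathbf{e}_\alpha\otimes\mathbf{m}^a)$) makes the cancellation easy to observe. In the NumPy sketch below (helper names hypothetical), $\mathbf{F}$ is deliberately inverted and $s$ is an arbitrary negative scalar. The diagonal block of $\mathbf{A}_\sigma$ comes out as zero up to round-off, while the off-diagonal block matches $s\,\varepsilon_{\alpha\beta p}\,[\mathbf{F}(\mathbf{m}^a\times\mathbf{m}^b)]_p$, which follows from the same index manipulation as above.</p>

```python
import numpy as np

EPS = np.zeros((3, 3, 3))
EPS[0, 1, 2] = EPS[1, 2, 0] = EPS[2, 0, 1] = 1.0
EPS[0, 2, 1] = EPS[2, 1, 0] = EPS[1, 0, 2] = -1.0


def vec(M):
    return M.flatten(order="F")  # column-major, as in the text


def d2J_matrix(F):
    """9x9 Hessian of det(F), eps_ikp eps_jlq F_pq, in column-major vec ordering."""
    T = np.einsum("ikp,jlq,pq->ijkl", EPS, EPS, F)
    return T.transpose(1, 0, 3, 2).reshape(9, 9)


def vertex_jacobian(m):
    """9x3 matrix whose column alpha is vec(e_alpha m^T)."""
    return np.column_stack([vec(np.outer(e, m)) for e in np.eye(3)])


rng = np.random.default_rng(4)
F = rng.standard_normal((3, 3))
if np.linalg.det(F) > 0:
    F[:, 0] = -F[:, 0]  # flip one column so that det F is negative (inverted)
s = -3.7                # arbitrary scalar; the diagonal block vanishes for any s
A_sig = s * d2J_matrix(F)

m_a, m_b = rng.standard_normal(3), rng.standard_normal(3)
G_a, G_b = vertex_jacobian(m_a), vertex_jacobian(m_b)
H_aa_sig = G_a.T @ A_sig @ G_a  # contains m^a x m^a = 0
H_ab_sig = G_a.T @ A_sig @ G_b  # contains m^a x m^b, generically nonzero
# Closed form of the off-diagonal block: s eps_{alpha beta p} [F (m^a x m^b)]_p
H_ab_closed = s * np.einsum("abp,p->ab", EPS, F @ np.cross(m_a, m_b))
```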

<h2 id="the-per-vertex-block">The Per-Vertex Block</h2>

<p>Combining the three contractions,</p>

\[\boxed{\;\mathbf{H}_{aa} \;=\; \mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}_3 \;+\; \lambda\,\mathbf{w}^a (\mathbf{w}^a)^T,\qquad \mathbf{w}^a = \text{cof}(\mathbf{F})\,\mathbf{m}^a.\;}\]

<p>Both summands are PSD for any $\mathbf{F}$ and any $\mathbf{m}^a$, given $\mu, \lambda \geq 0$:</p>

<ul>
  <li>$\mu\,|\mathbf{m}^a|^2\,\mathbf{I}_3$ is a positive multiple of identity.</li>
  <li>$\lambda\,\mathbf{w}^a(\mathbf{w}^a)^T$ is a rank-1 outer product with positive coefficient.</li>
</ul>

<p>So the per-vertex block is unconditionally PSD with no projection step. The cofactor-derivative term that complicates Newton-style implementations does not contribute to it.</p>
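<p>The spectrum of the block is also available in closed form, which makes the positivity explicit: directions orthogonal to $\mathbf{w}^a$ see only the identity term, and $\mathbf{w}^a$ itself additionally picks up the rank-1 term.</p>

\[\mathbf{H}_{aa}\,\mathbf{u} = \mu\,\|\mathbf{m}^a\|^2\,\mathbf{u}\quad(\mathbf{u}\perp\mathbf{w}^a),\qquad \mathbf{H}_{aa}\,\mathbf{w}^a = \big(\mu\,\|\mathbf{m}^a\|^2 + \lambda\,\|\mathbf{w}^a\|^2\big)\,\mathbf{w}^a.\]

<p>For a non-degenerate tet $\mathbf{m}^a \neq \mathbf{0}$, so with $\mu > 0$ and $\lambda \geq 0$ every eigenvalue is at least $\mu\,\|\mathbf{m}^a\|^2 > 0$ and the block is in fact positive definite.</p>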

<p>The corresponding per-vertex elastic force (per unit rest volume) is the first Piola–Kirchhoff stress $\mathbf{P}(\mathbf{F})$ applied to $\mathbf{m}^a$, evaluated with the <em>true</em> (unclamped) stress:</p>

\[\mathbf{f}_a \;=\; -\mathbf{P}(\mathbf{F})\,\mathbf{m}^a \;=\; -\mu\,\mathbf{F}\,\mathbf{m}^a \;-\; s\,\mathbf{w}^a.\]

<p>Forces use the signed $s = \lambda(J-\alpha)$ even when it is negative; this is what carries the inversion-recovery signal in stable Neo-Hookean.</p>
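<p>A finite-difference check is a cheap way to confirm both formulas, including in an inverted configuration where the signed $s$ is negative. The NumPy sketch below works per unit rest volume and assumes $\psi = \tfrac{\mu}{2}\big(\text{tr}(\mathbf{F}^T\mathbf{F}) - 3\big) + \tfrac{\lambda}{2}(J-\alpha)^2$, the energy whose stress is $\mu\mathbf{F} + s\,\text{cof}\,\mathbf{F}$; the helper names and the mirrored test tet are hypothetical. It compares $\mathbf{f}_a$ with the numerical gradient of $\psi$, and $\mathbf{H}_{aa}$ with the numerical derivative of $-\mathbf{f}_a$, i.e. the full unprojected per-vertex Hessian.</p>

```python
import numpy as np


def cof(F):
    """Cofactor matrix J F^{-T}, built from cross products of the columns of F."""
    return np.column_stack([np.cross(F[:, 1], F[:, 2]),
                            np.cross(F[:, 2], F[:, 0]),
                            np.cross(F[:, 0], F[:, 1])])


def def_grad(x, m):
    return sum(np.outer(x[b], m[b]) for b in range(4))  # F = sum_b x_b (m^b)^T


def energy(x, m, mu, lam):
    """Assumed stable Neo-Hookean density per unit rest volume (no log term)."""
    F = def_grad(x, m)
    alpha = 1.0 + mu / lam
    return 0.5 * mu * (np.sum(F * F) - 3.0) + 0.5 * lam * (np.linalg.det(F) - alpha) ** 2


def force_and_block(x, m, a, mu, lam):
    """f_a = -mu F m^a - s w^a and H_aa = mu |m^a|^2 I + lam w^a (w^a)^T, with signed s."""
    F = def_grad(x, m)
    s = lam * (np.linalg.det(F) - (1.0 + mu / lam))
    w = cof(F) @ m[a]
    f = -mu * (F @ m[a]) - s * w
    H = mu * (m[a] @ m[a]) * np.eye(3) + lam * np.outer(w, w)
    return f, H


rng = np.random.default_rng(5)
Dm = np.eye(3) + 0.2 * rng.standard_normal((3, 3))
Dm_inv = np.linalg.inv(Dm)
m = np.vstack([-Dm_inv.sum(axis=0), Dm_inv])  # m^0, ..., m^3
X = np.vstack([np.zeros(3), Dm.T])            # rest positions, so F(X) = I
x = X * np.array([-1.0, 1.0, 1.0]) + 0.05 * rng.standard_normal((4, 3))  # mirrored: inverted
mu, lam, a, h = 1.0, 10.0, 2, 1e-5
f, H = force_and_block(x, m, a, mu, lam)


def shifted(k, d):
    xs = x.copy()
    xs[a, k] += d
    return xs


grad_fd = np.array([(energy(shifted(k, h), m, mu, lam) - energy(shifted(k, -h), m, mu, lam)) / (2.0 * h)
                    for k in range(3)])
H_fd = np.column_stack([-(force_and_block(shifted(k, h), m, a, mu, lam)[0]
                          - force_and_block(shifted(k, -h), m, a, mu, lam)[0]) / (2.0 * h)
                        for k in range(3)])
```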

<h2 id="implementation">Implementation</h2>

<p>The full evaluator multiplies the result by the rest volume and (optionally) adds a damping contribution. In Warp the elastic part is just:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">wp</span><span class="p">.</span><span class="n">func</span>
<span class="k">def</span> <span class="nf">evaluate_volumetric_neo_hookean_force_and_hessian</span><span class="p">(</span>
    <span class="n">tet_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">v_order</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
    <span class="n">pos</span><span class="p">:</span> <span class="n">wp</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">dtype</span><span class="o">=</span><span class="n">wp</span><span class="p">.</span><span class="n">vec3</span><span class="p">),</span>
    <span class="n">tet_indices</span><span class="p">:</span> <span class="n">wp</span><span class="p">.</span><span class="n">array2d</span><span class="p">(</span><span class="n">dtype</span><span class="o">=</span><span class="n">wp</span><span class="p">.</span><span class="n">int32</span><span class="p">),</span>
    <span class="n">Dm_inv</span><span class="p">:</span> <span class="n">wp</span><span class="p">.</span><span class="n">mat33</span><span class="p">,</span>
    <span class="n">mu</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">lmbd</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span>
<span class="p">):</span>
    <span class="n">v0</span> <span class="o">=</span> <span class="n">pos</span><span class="p">[</span><span class="n">tet_indices</span><span class="p">[</span><span class="n">tet_id</span><span class="p">,</span> <span class="mi">0</span><span class="p">]]</span>
    <span class="n">v1</span> <span class="o">=</span> <span class="n">pos</span><span class="p">[</span><span class="n">tet_indices</span><span class="p">[</span><span class="n">tet_id</span><span class="p">,</span> <span class="mi">1</span><span class="p">]]</span>
    <span class="n">v2</span> <span class="o">=</span> <span class="n">pos</span><span class="p">[</span><span class="n">tet_indices</span><span class="p">[</span><span class="n">tet_id</span><span class="p">,</span> <span class="mi">2</span><span class="p">]]</span>
    <span class="n">v3</span> <span class="o">=</span> <span class="n">pos</span><span class="p">[</span><span class="n">tet_indices</span><span class="p">[</span><span class="n">tet_id</span><span class="p">,</span> <span class="mi">3</span><span class="p">]]</span>
    <span class="n">rest_volume</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="p">(</span><span class="n">wp</span><span class="p">.</span><span class="n">determinant</span><span class="p">(</span><span class="n">Dm_inv</span><span class="p">)</span> <span class="o">*</span> <span class="mf">6.0</span><span class="p">)</span>

    <span class="c1"># F = D_s D_m^{-1}
</span>    <span class="n">Ds</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">matrix_from_cols</span><span class="p">(</span><span class="n">v1</span> <span class="o">-</span> <span class="n">v0</span><span class="p">,</span> <span class="n">v2</span> <span class="o">-</span> <span class="n">v0</span><span class="p">,</span> <span class="n">v3</span> <span class="o">-</span> <span class="n">v0</span><span class="p">)</span>
    <span class="n">F</span> <span class="o">=</span> <span class="n">Ds</span> <span class="o">*</span> <span class="n">Dm_inv</span>

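    <span class="c1"># Per-vertex weight m^a (a row of D_m^{-1}; vertex 0 is the negative sum)
</span>    <span class="c1"># Initialise m before branching so every branch updates the same variable
</span>    <span class="n">m</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">vec3</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">v_order</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>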
        <span class="n">m</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">vec3</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="n">Dm_inv</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">0</span><span class="p">]),</span>
                    <span class="o">-</span><span class="p">(</span><span class="n">Dm_inv</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">]),</span>
                    <span class="o">-</span><span class="p">(</span><span class="n">Dm_inv</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">]))</span>
    <span class="k">elif</span> <span class="n">v_order</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">m</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">vec3</span><span class="p">(</span><span class="n">Dm_inv</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span>
    <span class="k">elif</span> <span class="n">v_order</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
        <span class="n">m</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">vec3</span><span class="p">(</span><span class="n">Dm_inv</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">m</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">vec3</span><span class="p">(</span><span class="n">Dm_inv</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="n">Dm_inv</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span>

    <span class="c1"># Stress (uses the TRUE s, no clamp)
</span>    <span class="n">J</span>     <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">determinant</span><span class="p">(</span><span class="n">F</span><span class="p">)</span>
    <span class="n">alpha</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">+</span> <span class="n">mu</span> <span class="o">/</span> <span class="n">lmbd</span>
    <span class="n">s</span>     <span class="o">=</span> <span class="n">lmbd</span> <span class="o">*</span> <span class="p">(</span><span class="n">J</span> <span class="o">-</span> <span class="n">alpha</span><span class="p">)</span>
    <span class="n">cof</span>   <span class="o">=</span> <span class="n">compute_cofactor</span><span class="p">(</span><span class="n">F</span><span class="p">)</span>             <span class="c1"># cofactor matrix (transposed adjugate) via cross products; helper not shown
</span>
    <span class="c1"># Per-vertex auxiliary vectors
</span>    <span class="n">Fm</span> <span class="o">=</span> <span class="n">F</span> <span class="o">*</span> <span class="n">m</span>                              <span class="c1"># mu term
</span>    <span class="n">w</span>  <span class="o">=</span> <span class="n">cof</span> <span class="o">*</span> <span class="n">m</span>                            <span class="c1"># lambda term: w^a = cof(F) m^a
</span>
    <span class="c1"># Force: f_a = -P m^a
</span>    <span class="n">force</span> <span class="o">=</span> <span class="o">-</span><span class="n">rest_volume</span> <span class="o">*</span> <span class="p">(</span><span class="n">mu</span> <span class="o">*</span> <span class="n">Fm</span> <span class="o">+</span> <span class="n">s</span> <span class="o">*</span> <span class="n">w</span><span class="p">)</span>

    <span class="c1"># Hessian: H_aa = mu ||m||^2 I + lambda w w^T
</span>    <span class="n">I3</span>      <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">identity</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
    <span class="n">hessian</span> <span class="o">=</span> <span class="n">rest_volume</span> <span class="o">*</span> <span class="p">(</span><span class="n">mu</span> <span class="o">*</span> <span class="n">wp</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">I3</span> <span class="o">+</span> <span class="n">lmbd</span> <span class="o">*</span> <span class="n">wp</span><span class="p">.</span><span class="n">outer</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">w</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">force</span><span class="p">,</span> <span class="n">hessian</span>
</code></pre></div></div>

<p>The 9×9 elastic Hessian never gets assembled; nothing is clamped. Compared to a textbook implementation that builds $\mathbf{A}_\mu + \mathbf{A}_\lambda + \mathbf{A}_\sigma$ as a $9\times9$ matrix, projects it, then contracts with $\partial F/\partial x_a$, this is a small constant-factor saving per tet per VBD inner iteration.</p>
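<p>Because the evaluator is pure arithmetic, a NumPy mirror makes a convenient CPU reference when unit-testing the Warp function. The sketch below follows the structure of the Warp code; <code>compute_cofactor</code> is not shown in the post, so <code>compute_cofactor_np</code> is an assumed stand-in built from cross products of the columns of $\mathbf{F}$, and the test tet is arbitrary. The usage lines check that the four per-vertex forces of a tet sum to zero (since $\sum_a \mathbf{m}^a = \mathbf{0}$) and that each block is positive definite.</p>

```python
import numpy as np


def compute_cofactor_np(F):
    """cof(F) = det(F) F^{-T}, built from cross products of the columns of F."""
    f0, f1, f2 = F[:, 0], F[:, 1], F[:, 2]
    return np.column_stack([np.cross(f1, f2), np.cross(f2, f0), np.cross(f0, f1)])


def force_and_hessian_np(v_order, x, Dm_inv, mu, lmbd):
    """NumPy mirror of the Warp evaluator for one vertex of one tet (x: 4x3 positions)."""
    rest_volume = 1.0 / (np.linalg.det(Dm_inv) * 6.0)
    Ds = np.column_stack([x[1] - x[0], x[2] - x[0], x[3] - x[0]])
    F = Ds @ Dm_inv
    # m^a: row (v_order - 1) of Dm_inv, or minus the sum of the rows for vertex 0
    m = -Dm_inv.sum(axis=0) if v_order == 0 else Dm_inv[v_order - 1]
    s = lmbd * (np.linalg.det(F) - (1.0 + mu / lmbd))  # signed, never clamped
    w = compute_cofactor_np(F) @ m
    force = -rest_volume * (mu * F @ m + s * w)
    hessian = rest_volume * (mu * (m @ m) * np.eye(3) + lmbd * np.outer(w, w))
    return force, hessian


rng = np.random.default_rng(6)
X = np.vstack([np.zeros(3), np.eye(3)])  # right-angle rest tet (volume 1/6)
Dm_inv = np.linalg.inv(np.column_stack([X[1] - X[0], X[2] - X[0], X[3] - X[0]]))
x = X + 0.2 * rng.standard_normal((4, 3))
results = [force_and_hessian_np(a, x, Dm_inv, mu=1.0, lmbd=10.0) for a in range(4)]
total_force = sum(f for f, _ in results)  # translation invariance: should vanish
```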

<h2 id="why-this-doesnt-extend-to-triangle-membranes">Why This Doesn’t Extend to Triangle Membranes</h2>

<p>It is tempting to apply the same logic to a stable Neo-Hookean <em>triangle</em> membrane and conclude that its per-vertex 3×3 block also drops the cofactor-derivative term. It does not. The cancellation hinges on a structural property of the volumetric case that the membrane does not share.</p>

<p>For a triangle embedded in 3D with a 2D rest configuration, the deformation gradient is $\mathbf{F} \in \mathbb{R}^{3\times 2}$ with columns $\mathbf{f}_0, \mathbf{f}_1$. The natural area scalar that plays the role of $J$ is</p>

\[J_s \;=\; \sqrt{\det(\mathbf{F}^T \mathbf{F})} \;=\; \|\mathbf{f}_0 \times \mathbf{f}_1\|.\]

<p>Two things change relative to the volumetric case:</p>

<ol>
  <li><strong>$J_s$ is not a polynomial in $\mathbf{F}$</strong> (it is a square root). Its second derivative does not have the clean Levi-Civita form $\partial^2 J/\partial F_{ij}\partial F_{kl} = \varepsilon_{ikp}\varepsilon_{jlq}F_{pq}$. There is in fact an extra $-(1/J_s)\,\nabla J_s \otimes \nabla J_s$ piece coming from differentiating the $1/J_s$ factor in $\nabla J_s = (\mathbf{n}\cdot\nabla\mathbf{n})/J_s$.</li>
  <li><strong>Rows and columns of $\mathbf{F}$ live in different spaces.</strong> Row indices run over 3D world coordinates $(i \in \{1,2,3\})$, column indices run over 2D parameter coordinates $(j \in \{0,1\})$. The 3-index Levi-Civita $\varepsilon_{jlq}$ that produced $\mathbf{m}\times\mathbf{m}$ in the volumetric proof has nowhere to live on the column-index leg: there are only two column indices to antisymmetrise over, not three.</li>
</ol>

<p>Concretely, the per-vertex contraction in the membrane case becomes</p>

\[\mathbf{H}_{aa}^{\alpha\beta}\big[\mathbf{A}_\sigma^\text{2D}\big] \;=\; s\,\sum_{j,l \in \{0,1\}}\,m_j^a\,m_l^a\,\frac{\partial^2 J_s}{\partial F_{\alpha j}\,\partial F_{\beta l}},\]

<p>with no antisymmetric-in-$(j,l)$ structure to exploit. Working through the algebra with $\mathbf{n} = \mathbf{f}_0\times\mathbf{f}_1$ gives a clean form for the contracted block:</p>

\[\mathbf{H}_{aa}\big[\mathbf{A}_\sigma^\text{2D}\big] \;=\; \frac{s}{J_s}\,\Big(\|\mathbf{w}\|^2\,\mathbf{I}_3 \;-\; \mathbf{w}\mathbf{w}^T \;-\; \boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T\Big),\]

<p>with $\mathbf{w} = \mathbf{f}_1\,m_0^a - \mathbf{f}_0\,m_1^a$ and $\boldsymbol{\nabla}J_s = \mathbf{g}_0\,m_0^a + \mathbf{g}_1\,m_1^a$, $\mathbf{g}_\alpha = \partial J_s/\partial \mathbf{f}_\alpha$. None of these terms vanishes in general. The $\|\mathbf{w}\|^2 \mathbf{I}_3 - \mathbf{w}\mathbf{w}^T$ piece is $\|\mathbf{w}\|^2$ times the projector onto the plane orthogonal to $\mathbf{w}$, and that plane contains the direction <em>normal</em> to the membrane, so this term produces a genuine out-of-plane stiffness. The cofactor-derivative term carries real physics here.</p>
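<p>The closed form is easy to spot-check. The NumPy sketch below is illustrative: it takes a random triangle, uses $\mathbf{g}_0 = (B\mathbf{f}_0 - C\mathbf{f}_1)/J_s$ and $\mathbf{g}_1 = (A\mathbf{f}_1 - C\mathbf{f}_0)/J_s$ (obtained by differentiating $J_s^2 = AB - C^2$ with $A = \|\mathbf{f}_0\|^2$, $B = \|\mathbf{f}_1\|^2$, $C = \mathbf{f}_0\cdot\mathbf{f}_1$), and compares the bracketed matrix divided by $J_s$, i.e. $\partial^2 J_s/\partial\mathbf{x}_a^2$ without the factor $s$, against central differences. It also confirms a nonzero component along the triangle normal. Helper names are hypothetical.</p>

```python
import numpy as np


def area_scale(F):
    """J_s = |f0 x f1| for a 3x2 deformation gradient F = [f0 f1]."""
    return np.linalg.norm(np.cross(F[:, 0], F[:, 1]))


def membrane_terms(F, m):
    """Return w, grad J_s (w.r.t. one vertex) and J_s for the vertex weight m = (m_0, m_1)."""
    f0, f1 = F[:, 0], F[:, 1]
    A, B, C = f0 @ f0, f1 @ f1, f0 @ f1
    Js = np.sqrt(A * B - C * C)
    g0 = (B * f0 - C * f1) / Js  # dJ_s/df0, from J_s^2 = AB - C^2
    g1 = (A * f1 - C * f0) / Js  # dJ_s/df1
    w = f1 * m[0] - f0 * m[1]
    return w, g0 * m[0] + g1 * m[1], Js


rng = np.random.default_rng(7)
Dm_inv = np.linalg.inv(np.eye(2) + 0.2 * rng.standard_normal((2, 2)))
m_all = np.vstack([-Dm_inv.sum(axis=0), Dm_inv])  # 2D weights for vertices 0, 1, 2
x = rng.standard_normal((3, 3))                    # deformed vertex positions
a, h = 1, 1e-5


def F_of(xx):
    return sum(np.outer(xx[b], m_all[b]) for b in range(3))  # 3x2 deformation gradient


def shifted(k, d):
    xx = x.copy()
    xx[a, k] += d
    return xx


w, gradJ, Js = membrane_terms(F_of(x), m_all[a])
H_closed = ((w @ w) * np.eye(3) - np.outer(w, w) - np.outer(gradJ, gradJ)) / Js

# Central differences: J_s gives its gradient, the analytic gradient gives the Hessian
gradJ_fd = np.array([(area_scale(F_of(shifted(k, h))) - area_scale(F_of(shifted(k, -h)))) / (2.0 * h)
                     for k in range(3)])
H_fd = np.column_stack([(membrane_terms(F_of(shifted(k, h)), m_all[a])[1]
                         - membrane_terms(F_of(shifted(k, -h)), m_all[a])[1]) / (2.0 * h)
                        for k in range(3)])
n_hat = np.cross(F_of(x)[:, 0], F_of(x)[:, 1]) / Js
```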

<p><strong>Geometric reading.</strong> The volumetric tet has no “extra” direction — both legs of $\mathbf{F}$ span the same 3D space, and the Levi-Civita pattern absorbs all three coordinate axes uniformly. The membrane has a normal direction that is <em>not</em> in the column space of $\mathbf{F}$; the second-derivative term contributes precisely along that normal. Stripping it would weaken out-of-plane resistance and change the physics, not just save flops.</p>

<h2 id="the-tight-psd-clamp-for-the-membrane">The Tight PSD Clamp for the Membrane</h2>

<p>Although the cofactor-derivative term has to stay, the per-vertex 3×3 block still has a clean PSD characterisation. Combining the three contractions for the membrane case,</p>

\[\mathbf{H}_{aa} \;=\; \mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}_3 \;+\; (\lambda - r)\,\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T \;+\; r\,\big(\|\mathbf{w}\|^2\,\mathbf{I}_3 - \mathbf{w}\mathbf{w}^T\big), \qquad r \;\equiv\; \frac{s}{J_s}.\]

<p>(The $\lambda\,\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$ piece comes from $\mathbf{A}_\lambda$ in the membrane case — the rank-1 cofactor outer product specialises to $\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$ here. The $-r\,\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$ piece comes from the $\mathbf{A}_\sigma$ contraction, which is why the two combine.)</p>

<p>Two algebraic identities make this block diagonalisable in closed form.</p>

<p><strong>Lemma 1.</strong> $\mathbf{w} \cdot \boldsymbol{\nabla}J_s = 0$.</p>

<p><em>Proof.</em> Direct computation using $\mathbf{g}_\alpha = \partial J_s/\partial \mathbf{f}_\alpha$ and $J_s^2 = AB - C^2$ with $A = |\mathbf{f}_0|^2,\ B = |\mathbf{f}_1|^2,\ C = \mathbf{f}_0\cdot \mathbf{f}_1$ gives
$\mathbf{f}_1\cdot\mathbf{g}_0 = \mathbf{f}_0\cdot\mathbf{g}_1 = 0$ and $\mathbf{f}_0\cdot\mathbf{g}_0 = \mathbf{f}_1\cdot\mathbf{g}_1 = J_s$. Expanding $\mathbf{w}\cdot\boldsymbol{\nabla}J_s$ in $(m_0^a, m_1^a)$ and substituting collapses the four terms to $J_s\,m_0^a m_1^a - J_s\,m_0^a m_1^a = 0$. $\square$</p>

<p><strong>Lemma 2.</strong> $|\mathbf{w}| = |\boldsymbol{\nabla}J_s|$.</p>

<p><em>Proof.</em> Compute $\mathbf{w}\times\boldsymbol{\nabla}J_s$ using the four products $\mathbf{f}_i\times\mathbf{g}_j$, each of which reduces to a scalar multiple of $\mathbf{n} = \mathbf{f}_0\times\mathbf{f}_1$. The four cross products give</p>

\[\mathbf{w}\times\boldsymbol{\nabla}J_s \;=\; -\frac{\mathbf{n}}{J_s}\,\big(A(m_1^a)^2 - 2C\,m_0^a m_1^a + B(m_0^a)^2\big) \;=\; -\|\mathbf{w}\|^2\,\hat{\mathbf{n}},\]

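<p>where the last equality uses $|\mathbf{w}|^2 = A(m_1^a)^2 - 2C\,m_0^a m_1^a + B(m_0^a)^2$ and $\hat{\mathbf{n}} = \mathbf{n}/J_s$. By Lemma 1, $\mathbf{w}\perp\boldsymbol{\nabla}J_s$, so $|\mathbf{w}\times\boldsymbol{\nabla}J_s| = |\mathbf{w}|\,|\boldsymbol{\nabla}J_s|$. Equating with the right-hand side gives $|\mathbf{w}|\,|\boldsymbol{\nabla}J_s| = |\mathbf{w}|^2$, and dividing by $|\mathbf{w}|$ gives the claim. (If $\mathbf{w} = \mathbf{0}$, then $m_0^a = m_1^a = 0$ because $\mathbf{f}_0$ and $\mathbf{f}_1$ are linearly independent when $J_s > 0$, so $\boldsymbol{\nabla}J_s = \mathbf{0}$ as well.) $\square$</p>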

<p><strong>Diagonalisation.</strong> Choose the orthonormal basis $\{\hat{\mathbf{w}}, \widehat{\boldsymbol{\nabla}J_s}, \hat{\mathbf{n}}\}$ where $\hat{\mathbf{n}}$ is the unit triangle normal (orthogonal to both $\mathbf{w}$ and $\boldsymbol{\nabla}J_s$ by Lemma 1 and the cross-product computation). Off-diagonal entries of $\mathbf{H}_{aa}$ vanish in this basis (each of the three building blocks $\mathbf{I}_3$, $\boldsymbol{\nabla}J_s\,\boldsymbol{\nabla}J_s^T$, $|\mathbf{w}|^2\mathbf{I}_3 - \mathbf{w}\mathbf{w}^T$ is diagonal in it), and using $|\mathbf{w}| = |\boldsymbol{\nabla}J_s|$ to combine the $r$-terms in the $\widehat{\boldsymbol{\nabla}J_s}$ direction:</p>

<table>
  <thead>
    <tr>
      <th>Direction</th>
      <th>Eigenvalue</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>$\hat{\mathbf{w}}$</td>
      <td>$\mu\,|\mathbf{m}^a|^2$</td>
    </tr>
    <tr>
      <td>$\widehat{\boldsymbol{\nabla}J_s}$</td>
      <td>$\mu\,|\mathbf{m}^a|^2 + \lambda\,|\boldsymbol{\nabla}J_s|^2$</td>
    </tr>
    <tr>
      <td>$\hat{\mathbf{n}}$</td>
      <td>$\mu\,|\mathbf{m}^a|^2 + r\,|\mathbf{w}|^2$</td>
    </tr>
  </tbody>
</table>
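<p>Both lemmas and the table can be confirmed numerically for a random triangle and a negative $s$. The NumPy sketch below is illustrative; it assumes $\mathbf{g}_0 = (B\mathbf{f}_0 - C\mathbf{f}_1)/J_s$ and $\mathbf{g}_1 = (A\mathbf{f}_1 - C\mathbf{f}_0)/J_s$, obtained by differentiating $J_s^2 = AB - C^2$, and checks that $\mathbf{H}_{aa}$ is diagonal in the basis $\{\hat{\mathbf{w}}, \widehat{\boldsymbol{\nabla}J_s}, \hat{\mathbf{n}}\}$ with the tabulated eigenvalues.</p>

```python
import numpy as np

rng = np.random.default_rng(8)
F = rng.standard_normal((3, 2))  # deformed triangle edges f0, f1 (3x2 deformation gradient)
m = rng.standard_normal(2)       # per-vertex weight (m_0, m_1)
mu, lam, s = 1.0, 10.0, -2.5     # s may be negative (compressed triangle)

f0, f1 = F[:, 0], F[:, 1]
A, B, C = f0 @ f0, f1 @ f1, f0 @ f1
Js = np.sqrt(A * B - C * C)
g0, g1 = (B * f0 - C * f1) / Js, (A * f1 - C * f0) / Js
w = f1 * m[0] - f0 * m[1]
gradJ = g0 * m[0] + g1 * m[1]
n_hat = np.cross(f0, f1) / Js
r = s / Js

# Per-vertex membrane block before any clamping
H = (mu * (m @ m) * np.eye(3) + (lam - r) * np.outer(gradJ, gradJ)
     + r * ((w @ w) * np.eye(3) - np.outer(w, w)))

# Eigenvalues predicted by the table, in the basis (w_hat, gradJ_hat, n_hat)
basis = np.column_stack([w / np.linalg.norm(w), gradJ / np.linalg.norm(gradJ), n_hat])
predicted = np.array([mu * (m @ m),
                      mu * (m @ m) + lam * (gradJ @ gradJ),
                      mu * (m @ m) + r * (w @ w)])
print(np.round(basis.T @ H @ basis, 10))
```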

<p>The first two eigenvalues are nonnegative for any $r$: the $r$-dependence in the $\widehat{\boldsymbol{\nabla}J_s}$ direction cancels exactly because $|\mathbf{w}| = |\boldsymbol{\nabla}J_s|$. Only the normal direction sees $r$, and the PSD condition there is</p>

\[r \;\geq\; -\frac{\mu\,\|\mathbf{m}^a\|^2}{\|\mathbf{w}\|^2}.\]

<p>The right-hand side is geometry-dependent. For a <em>uniform</em> clamp that works for every triangle and every vertex, the only safe choice is $r \geq 0$, i.e.</p>

\[\boxed{\;s_\text{clamp} \;=\; \max(0, s).\;}\]

<p>This is tight in the uniform sense: any larger lower bound on $s$ would needlessly modify the block in some configurations where the unclamped block is already PSD; any smaller (more permissive) bound risks an indefinite block in some configuration.</p>

<p>A geometry-aware solver could instead use the per-vertex lower bound $s \geq -\mu\,|\mathbf{m}^a|^2 J_s/|\mathbf{w}|^2$ (it depends on $\mathbf{m}^a$, so it differs between the vertices of a triangle) and clamp less aggressively, but the bookkeeping cost rarely justifies it. Force always uses the unclamped $s = \lambda(J_s - \alpha)$, exactly as in the volumetric case.</p>
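<p>Putting the pieces together, a clamped per-vertex membrane block could be sketched in NumPy as follows. The function names, the random test triangles and the range of negative $s$ are illustrative assumptions, and the force (not shown) would keep the raw $s$. Over random compressed triangles the raw block is frequently indefinite, while both the uniform clamp $\max(0, s)$ and the per-vertex bound keep the smallest eigenvalue nonnegative up to round-off.</p>

```python
import numpy as np


def membrane_terms(F, m):
    """w, grad J_s (w.r.t. one vertex) and J_s for a 3x2 F and vertex weight m = (m_0, m_1)."""
    f0, f1 = F[:, 0], F[:, 1]
    A, B, C = f0 @ f0, f1 @ f1, f0 @ f1
    Js = np.sqrt(A * B - C * C)
    g0, g1 = (B * f0 - C * f1) / Js, (A * f1 - C * f0) / Js
    return f1 * m[0] - f0 * m[1], g0 * m[0] + g1 * m[1], Js


def vertex_block(F, m, mu, lam, s_hess):
    """Per-vertex 3x3 membrane block evaluated with the (possibly clamped) s_hess."""
    w, gradJ, Js = membrane_terms(F, m)
    r = s_hess / Js
    return (mu * (m @ m) * np.eye(3) + (lam - r) * np.outer(gradJ, gradJ)
            + r * ((w @ w) * np.eye(3) - np.outer(w, w)))


def clamp_uniform(s):
    return max(0.0, s)  # s_clamp = max(0, s)


def clamp_per_vertex(s, F, m, mu):
    w, _, Js = membrane_terms(F, m)
    return max(s, -mu * (m @ m) * Js / (w @ w))  # s >= -mu |m|^2 J_s / |w|^2


rng = np.random.default_rng(9)
mu, lam = 1.0, 10.0
ratios = {"raw": [], "uniform": [], "per_vertex": []}
for _ in range(200):
    F, m = rng.standard_normal((3, 2)), rng.standard_normal(2)
    s = -20.0 * rng.random()  # compressed: the raw s is negative
    for name, s_hess in (("raw", s), ("uniform", clamp_uniform(s)),
                         ("per_vertex", clamp_per_vertex(s, F, m, mu))):
        ev = np.linalg.eigvalsh(vertex_block(F, m, mu, lam, s_hess))
        ratios[name].append(ev[0] / ev[-1])  # smallest / largest eigenvalue
```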

<p>A stable Neo-Hookean triangle evaluator therefore keeps the second-derivative contribution and applies the simple uniform clamp $s_\text{clamp} = \max(0, s)$. The simple result for the volumetric tet is genuinely a special property of square deformation gradients.</p>

<h2 id="sanity-checks-before-shipping">Sanity Checks Before Shipping</h2>

<p>A few things worth verifying when wiring this up:</p>

<ol>
  <li>The shift $\alpha = 1 + \mu/\lambda$ depends on $\lambda \neq 0$. Guard against $\lambda$ near zero (e.g. $\lambda \mapsto \text{sign}(\lambda)\,\max(|\lambda|, \epsilon)$, with $\text{sign}(0)$ taken as $+1$ so that an exactly zero $\lambda$ is also lifted to $\epsilon$).</li>
  <li>Use the explicit cofactor matrix $\text{cof}(\mathbf{F})$ (the transpose of the adjugate) rather than $J\,\mathbf{F}^{-T}$. The cofactor is a polynomial in the entries of $\mathbf{F}$ and remains well-defined as $J \to 0$, while $\mathbf{F}^{-T}$ blows up.</li>
  <li>The force expression carries the <em>signed</em> $s = \lambda(J - \alpha)$, including when $J &lt; 0$ (inverted tet) or $J &lt; \alpha$ (which, since $\alpha > 1$, already includes the undeformed state $J = 1$). This is what pulls inverted tets back through $J = 0$.</li>
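  <li>The cancellation only uses the fact that $\partial F/\partial x_a^\alpha = \mathbf{e}_\alpha \otimes \mathbf{m}^a$ is a rank-1 dyad with the <em>same</em> vector on both legs of the contraction. For displacement-based higher-order elements (quadratic tets, hexes), $\partial F/\partial x_a^\alpha = \mathbf{e}_\alpha \otimes \nabla N_a(\mathbf{X})$, with $N_a$ the shape function of node $a$, is still rank-1 at every quadrature point, so the contraction still vanishes point by point; the difference is that the weight is no longer constant over the element and has to be evaluated per quadrature point. If you adapt this evaluator to an element or energy that breaks this per-point rank-1 structure, the $\mathbf{A}_\sigma$ term reappears and needs SPD projection.</li>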
  <li>The cancellation also breaks for off-diagonal blocks, so a global Newton solver assembling $\mathbf{H}_{ab}$ for $a \neq b$ does need a clamp. VBD’s per-vertex block does not.</li>
</ol>
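<p>The first two checks are small enough to sketch directly. In this NumPy version (names hypothetical), the guard treats $\text{sign}(0)$ as $+1$ so that an exactly zero $\lambda$ is also lifted to $\epsilon$, and the cross-product cofactor stays finite for a flattened tet where $J\,\mathbf{F}^{-T}$ cannot be formed.</p>

```python
import numpy as np


def guarded_lambda(lmbd, eps=1e-8):
    """sign(lambda) * max(|lambda|, eps), with sign(0) taken as +1."""
    sign = 1.0 if lmbd >= 0.0 else -1.0
    return sign * max(abs(lmbd), eps)


def cofactor(F):
    """cof(F) from cross products of the columns of F; polynomial, so finite even when J = 0."""
    return np.column_stack([np.cross(F[:, 1], F[:, 2]),
                            np.cross(F[:, 2], F[:, 0]),
                            np.cross(F[:, 0], F[:, 1])])


F_flat = np.diag([1.0, 2.0, 0.0])  # flattened tet: J = 0, so F^{-T} does not exist
print(cofactor(F_flat))             # finite; only the (2, 2) entry is nonzero
print(guarded_lambda(0.0), guarded_lambda(-1e-12), guarded_lambda(5.0))
```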

<h2 id="summary">Summary</h2>

<p>For a linear tet with a stable Neo-Hookean energy, the VBD per-vertex block reduces to</p>

\[\mathbf{H}_{aa} \;=\; \mu\,\|\mathbf{m}^a\|^2\,\mathbf{I}_3 \;+\; \lambda\,(\text{cof}\,\mathbf{F}\,\mathbf{m}^a)(\text{cof}\,\mathbf{F}\,\mathbf{m}^a)^T,\]

<p>unconditionally PSD without any projection of the cofactor-derivative term. The cancellation comes from $\partial F/\partial x_a^\alpha = \delta_{i\alpha}\,m_j^a$ being a rank-1 dyad and the Hessian of $\det\mathbf{F}$ being antisymmetric in matching index pairs, so the contraction collapses through $\mathbf{m}^a \times \mathbf{m}^a = 0$. Force uses the unclamped stress and inversion recovery is carried by the gradient, not the Hessian.</p>]]></content><author><name>Anka He Chen</name><email>ankachan92@gmail.com</email></author><category term="physics-simulation" /><category term="VBD" /><category term="neo-hookean" /><category term="computer-graphics" /><summary type="html"><![CDATA[This post derives the per-vertex 3×3 Hessian block for the stable Neo-Hookean tet material under VBD-style block Gauss-Seidel, and shows how it lands as an unconditionally PSD expression with no clamp or eigenvalue projection required. The derivation is short but the algebraic cancellation it relies on is easy to miss, so it is worth writing out in full. The post is meant as a reference for anyone wiring stable Neo-Hookean into a VBD solver.]]></summary></entry><entry><title type="html">Rigid Body Dynamics with VBD, Section I: Free Bodies</title><link href="https://ankachan.github.io/posts/2026/03/vbd-rigid-body-section-1/" rel="alternate" type="text/html" title="Rigid Body Dynamics with VBD, Section I: Free Bodies" /><published>2026-03-19T00:00:00-07:00</published><updated>2026-03-19T00:00:00-07:00</updated><id>https://ankachan.github.io/posts/2026/03/vbd-rigid-body-section-1</id><content type="html" xml:base="https://ankachan.github.io/posts/2026/03/vbd-rigid-body-section-1/"><![CDATA[<p>In the <a href="https://doi.org/10.1145/3658179">VBD paper</a> (SIGGRAPH 2024), we briefly discuss extending Vertex Block Descent to rigid body simulation. The idea is natural: instead of updating a single vertex with 3 DoF, you update an entire rigid body with 6 DoF. But the details matter. 
This post walks through the full derivation—from the continuous Newton-Euler equations, to discrete backward Euler as a nonlinear system, to the Schur complement solve you actually run each iteration—with reference code from <a href="https://github.com/newton-physics/newton">Newton</a>, which implements this approach under the name AVBD (Augmented VBD).</p>

<p>This is Section I: free (unconstrained) rigid bodies. Section II will cover articulated bodies with joints.</p>

<blockquote>
  <p><strong>Prerequisite:</strong> This post assumes familiarity with quaternion math for rigid body rotation—in particular the rotation-vector exponential map, quaternion multiplication, and how angular velocity integrates orientation. If you’re rusty on any of this, I recommend reading <a href="/posts/2026/04/quaternion-primer/">Quaternion Math for Rigid Body Simulation</a> first.</p>
</blockquote>

<hr />

<h2 id="continuous-rigid-body-dynamics">Continuous Rigid Body Dynamics</h2>

<p>A rigid body has two coupled equations of motion. For the <strong>translational</strong> DoF:</p>

\[m \ddot{\mathbf{x}}_\text{com} = \mathbf{f}\]

<p>where $m$ is the total mass, $\mathbf{x}_\text{com}$ is the world-space center-of-mass position, and $\mathbf{f}$ includes gravity, contact forces, and applied forces.</p>

<p>For the <strong>rotational</strong> DoF, the Newton-Euler equation in the <strong>body frame</strong> (where inertia is constant) is:</p>

\[\mathbf{I}_\text{body}\,\dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}) = \boldsymbol{\tau}\]

<p>where $\boldsymbol{\omega}$ is the angular velocity in the body frame, $\boldsymbol{\tau}$ is the torque mapped to the body frame, and $\mathbf{I}_\text{body}$ is the constant body-frame inertia tensor. In the world frame this is equivalently $\mathbf{I}_\text{world}\,\dot{\boldsymbol{\omega}}_\text{world} = \boldsymbol{\tau}_\text{world} - \boldsymbol{\omega}_\text{world} \times \mathbf{I}_\text{world}\,\boldsymbol{\omega}_\text{world}$ with $\mathbf{I}_\text{world} = \mathbf{R}\,\mathbf{I}_\text{body}\,\mathbf{R}^T$.</p>

<p>The full state at step $n$ is: position $\mathbf{x}^n \in \mathbb{R}^3$, orientation $\mathbf{R}^n \in SO(3)$, linear velocity $\mathbf{v}^n$, body-frame angular velocity $\boldsymbol{\omega}^n$, mass $m$, and body inertia $\mathbf{I}_\text{body}$.</p>

<hr />

<h2 id="discretizing-with-backward-euler">Discretizing with Backward Euler</h2>

<h3 id="step-1-pose-increments-as-dofs">Step 1: Pose Increments as DoFs</h3>

<p>Rather than solving for $\mathbf{x}^{n+1}$ and $\mathbf{R}^{n+1}$ directly, we introduce <strong>pose increments</strong> as the unknowns:</p>

\[\Delta\mathbf{x} \in \mathbb{R}^3, \qquad \Delta\boldsymbol{\theta} \in \mathbb{R}^3 \;\text{(rotation vector)}\]

<p>The new pose is then:</p>

\[\mathbf{x}^{n+1} = \mathbf{x}^n + \Delta\mathbf{x}\]

\[\mathbf{R}^{n+1} = \exp(\widehat{\Delta\boldsymbol{\theta}})\,\mathbf{R}^n\]

<p>where $\widehat{\Delta\boldsymbol{\theta}}$ is the skew-symmetric matrix of $\Delta\boldsymbol{\theta}$. This is the standard left-perturbation on $SO(3)$. We will find $\Delta\mathbf{x}$ and $\Delta\boldsymbol{\theta}$ by enforcing implicit Euler as a 6-equation residual system.</p>
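<p>A minimal NumPy sketch of this update (not the Newton implementation; names are hypothetical) uses Rodrigues' formula for $\exp(\widehat{\Delta\boldsymbol{\theta}})$. The last line illustrates a standard identity worth keeping in mind with a left perturbation: $\exp(\widehat{\Delta\boldsymbol{\theta}})\,\mathbf{R}^n = \mathbf{R}^n \exp\big(\widehat{(\mathbf{R}^n)^T\Delta\boldsymbol{\theta}}\big)$, so $\Delta\boldsymbol{\theta}$ is a world-frame rotation vector whose body-frame counterpart is $(\mathbf{R}^n)^T\Delta\boldsymbol{\theta}$.</p>

```python
import numpy as np


def skew(v):
    """Hat matrix: skew(v) @ u equals np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def exp_so3(theta):
    """Rotation matrix exp(hat(theta)) via Rodrigues' formula."""
    K = skew(theta)
    angle = np.linalg.norm(theta)
    if angle > 1e-8:
        return np.eye(3) + (np.sin(angle) / angle) * K + ((1.0 - np.cos(angle)) / angle**2) * (K @ K)
    return np.eye(3) + K + 0.5 * (K @ K)  # second-order series for tiny angles


def apply_pose_increment(x_n, R_n, dx, dtheta):
    """x^{n+1} = x^n + dx and R^{n+1} = exp(hat(dtheta)) R^n (left perturbation)."""
    return x_n + dx, exp_so3(dtheta) @ R_n


R_n = exp_so3(np.array([0.3, -0.2, 0.5]))
dtheta = np.array([0.01, 0.02, -0.015])
x_next, R_next = apply_pose_increment(np.zeros(3), R_n, np.array([0.0, 0.0, -0.01]), dtheta)
# The same rotation written as a right (body-frame) increment
R_body = R_n @ exp_so3(R_n.T @ dtheta)
```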

<h3 id="step-2-translational-residual">Step 2: Translational Residual</h3>

<p>Start from the standard backward Euler update:</p>

\[m\,\frac{\mathbf{v}^{n+1} - \mathbf{v}^n}{h} = \mathbf{f}(\mathbf{x}^{n+1}, \mathbf{R}^{n+1})\]

<p>Use the kinematic relation $\mathbf{x}^{n+1} = \mathbf{x}^n + h\,\mathbf{v}^{n+1}$ to eliminate $\mathbf{v}^{n+1} = \Delta\mathbf{x}/h$:</p>

\[m\,\frac{\Delta\mathbf{x}/h - \mathbf{v}^n}{h} = \mathbf{f}(\mathbf{x}^n + \Delta\mathbf{x},\; \mathbf{R}^{n+1}(\Delta\boldsymbol{\theta}))\]

<p>Rearranging to residual form:</p>

\[\mathbf{r}_\text{lin}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \;=\; \frac{m}{h^2}\!\left(\Delta\mathbf{x} - h\mathbf{v}^n\right) - \mathbf{f}(\mathbf{x}^n + \Delta\mathbf{x},\; \mathbf{R}^{n+1}) \;=\; \mathbf{0}\]
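<p>For the simplest case of a constant force, the residual is linear in $\Delta\mathbf{x}$ and has the closed-form root $\Delta\mathbf{x} = h\mathbf{v}^n + h^2\mathbf{f}/m$. The NumPy sketch below (hypothetical names, gravity as the only force) evaluates $\mathbf{r}_\text{lin}$ there and recovers the familiar backward Euler velocity update.</p>

```python
import numpy as np


def r_lin(dx, v_n, m, h, force):
    """Translational residual m/h^2 (dx - h v^n) - f(x^n + dx, R^{n+1})."""
    return (m / h**2) * (dx - h * v_n) - force(dx)


m, h = 2.0, 1.0 / 60.0
g = np.array([0.0, 0.0, -9.81])
v_n = np.array([1.0, 0.0, 3.0])


def gravity_only(dx):
    return m * g  # constant force: independent of the pose


dx = h * v_n + h**2 * g  # closed-form root for a constant force
v_next = dx / h          # v^{n+1} = v^n + h g, the usual backward Euler velocity
```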

<h3 id="step-3-rotational-residual-body-frame">Step 3: Rotational Residual (Body Frame)</h3>

<p>Recall the Newton-Euler equation:</p>

\[\mathbf{I}_\text{body}\,\dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}) = \boldsymbol{\tau}\]

<p>Rewriting it in the standard ODE form $\dot{\boldsymbol{\omega}} = f(\boldsymbol{\omega}, t)$ gives
\(\dot{\boldsymbol{\omega}} = \mathbf{I}_\text{body}^{-1}\big(\boldsymbol{\tau} - \boldsymbol{\omega} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega})\big).\)</p>

<p>Work in the body frame where $\mathbf{I}_\text{body}$ is constant. Backward Euler on the angular velocity gives:</p>

\[\mathbf{I}_\text{body}\,\frac{\boldsymbol{\omega}^{n+1} - \boldsymbol{\omega}^n}{h} + \boldsymbol{\omega}^{n+1} \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}^{n+1}) = \boldsymbol{\tau}^{n+1}\]

<p>The rotation increment $\Delta\boldsymbol{\theta}$ integrates to $\mathbf{R}^{n+1}$, so we identify:</p>

\[\boldsymbol{\omega}^{n+1} \approx \frac{\Delta\boldsymbol{\theta}}{h}\]

<p>(a constant angular velocity over the step, whose integrated angle equals $\Delta\boldsymbol{\theta}$). Substituting gives:</p>

\[\mathbf{I}_\text{body}\!\left(\frac{\Delta\boldsymbol{\theta}}{h^2} - \frac{\boldsymbol{\omega}^n}{h}\right) + \frac{1}{h^2}\,\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta}) = \boldsymbol{\tau}^{n+1}\]

<p>Rearranging to residual form:</p>

\[\mathbf{r}_\text{rot}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \;=\; \mathbf{I}_\text{body}\!\left(\frac{\Delta\boldsymbol{\theta}}{h^2} - \frac{\boldsymbol{\omega}^n}{h}\right) + \frac{\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta})}{h^2} - \boldsymbol{\tau}^{n+1}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \;=\; \mathbf{0}\]

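<p>The term $\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta})/h^2$ is the gyroscopic term. It is a torque that is quadratic in $\Delta\boldsymbol{\theta}$, so the rotational residual is nonlinear even when the applied torque is constant.</p>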

<h3 id="step-4-combined-nonlinear-system">Step 4: Combined Nonlinear System</h3>

<p>Stack both residuals into a single 6-equation system:</p>

\[F(\Delta\mathbf{x},\,\Delta\boldsymbol{\theta}) = \begin{bmatrix} \mathbf{r}_\text{lin}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \\ \mathbf{r}_\text{rot}(\Delta\mathbf{x}, \Delta\boldsymbol{\theta}) \end{bmatrix} = \mathbf{0}\]

<p>This is solved with Newton’s method. Initialize $\Delta\mathbf{x} = h\mathbf{v}^n$, $\Delta\boldsymbol{\theta} = h\boldsymbol{\omega}^n$ (explicit Euler guess), then iterate:</p>

<ol>
  <li>Evaluate residual $F$</li>
  <li>Build Jacobian $\mathbf{J} = \partial F / \partial (\Delta\mathbf{x},\, \Delta\boldsymbol{\theta})$</li>
  <li>Solve $\mathbf{J}\,\delta = -F$</li>
  <li>Update $\Delta\mathbf{x} \mathrel{+}= \delta_x$, $\Delta\boldsymbol{\theta} \mathrel{+}= \delta_\theta$</li>
</ol>
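<p>The four Newton steps above can be sketched in NumPy for a single free body. This is a minimal illustration under stated assumptions, not the solver's code: the external force $\mathbf{f}$ and torque $\boldsymbol{\tau}$ are held constant (so the off-diagonal Jacobian blocks vanish), $\Delta\boldsymbol{\theta}/h$ is treated as the angular velocity in the frame where $\mathbf{I}_\text{body}$ is constant, as in the derivation, and the function names are hypothetical.</p>

```python
import numpy as np


def skew(a):
    """Cross-product matrix [a]_x, so skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def residual(dx, dth, m, I_body, h, v_n, w_n, f, tau):
    """Stacked residual F = [r_lin; r_rot] for a constant force f and torque tau."""
    r_lin = m / h**2 * (dx - h * v_n) - f
    r_rot = (I_body @ (dth / h**2 - w_n / h)
             + np.cross(dth, I_body @ dth) / h**2     # gyroscopic term
             - tau)
    return np.concatenate([r_lin, r_rot])


def jacobian(dth, m, I_body, h):
    """dF/d(dx, dth); the cross blocks are zero because f and tau are constant here."""
    J = np.zeros((6, 6))
    J[:3, :3] = m / h**2 * np.eye(3)
    # d/d(dth) [dth x (I dth)] = [dth]_x I - [I dth]_x  (not symmetric in general)
    J[3:, 3:] = (I_body + skew(dth) @ I_body - skew(I_body @ dth)) / h**2
    return J


def newton_free_body(m, I_body, h, v_n, w_n, f, tau, max_iters=20, tol=1e-12):
    dx, dth = h * v_n, h * w_n                                   # explicit Euler guess
    for _ in range(max_iters):
        F = residual(dx, dth, m, I_body, h, v_n, w_n, f, tau)    # 1. evaluate residual
        J = jacobian(dth, m, I_body, h)                          # 2. build Jacobian
        delta = np.linalg.solve(J, -F)                           # 3. solve J delta = -F
        dx, dth = dx + delta[:3], dth + delta[3:]                # 4. update increments
        if np.linalg.norm(delta) < tol:
            break
    return dx, dth


# Usage: an anisotropic body under gravity with no applied torque.
m, h = 2.0, 0.01
I_body = np.diag([1.0, 2.0, 3.0])
v_n = np.array([1.0, 0.0, 0.0])
w_n = np.array([0.3, -0.2, 0.5])
f = np.array([0.0, 0.0, -9.81 * m])
tau = np.zeros(3)
dx, dth = newton_free_body(m, I_body, h, v_n, w_n, f, tau)
```

<p>With constant $\mathbf{f}$, the translational block converges in one iteration to $h\mathbf{v}^n + h^2\mathbf{f}/m$; the rotational block needs a few iterations because of the gyroscopic term, whose Jacobian $[\Delta\boldsymbol{\theta}]_\times\mathbf{I}_\text{body} - [\mathbf{I}_\text{body}\Delta\boldsymbol{\theta}]_\times$ is not symmetric.</p>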

<p>Once converged, recover the new state and velocities:</p>

\[\mathbf{x}^{n+1} = \mathbf{x}^n + \Delta\mathbf{x}, \qquad \mathbf{R}^{n+1} = \exp(\widehat{\Delta\boldsymbol{\theta}})\,\mathbf{R}^n\]

\[\mathbf{v}^{n+1} = \frac{\Delta\mathbf{x}}{h}, \qquad \boldsymbol{\omega}^{n+1} = \frac{\Delta\boldsymbol{\theta}}{h}\]
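<p>For the state recovery, $\exp(\widehat{\Delta\boldsymbol{\theta}})$ can be evaluated in closed form with Rodrigues' formula. A minimal NumPy sketch; <code class="language-plaintext highlighter-rouge">exp_so3</code> and <code class="language-plaintext highlighter-rouge">recover_state</code> are illustrative names, and the small-angle branch is a simple first-order fallback chosen here, not something the text prescribes.</p>

```python
import numpy as np


def skew(a):
    """Cross-product matrix [a]_x, so skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def exp_so3(theta):
    """Rodrigues' formula: exp([theta]_x) for a rotation vector theta."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3) + skew(theta)      # first-order fallback near the identity
    K = skew(theta / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def recover_state(x_n, R_n, dx, dth, h):
    """Turn converged increments into the new pose and velocities."""
    x_next = x_n + dx
    R_next = exp_so3(dth) @ R_n             # left perturbation, as in Step 1
    return x_next, R_next, dx / h, dth / h


# Usage: a quarter turn about z maps the x-axis onto the y-axis.
R_quarter = exp_so3(np.array([0.0, 0.0, np.pi / 2]))
x1, R1, v1, w1 = recover_state(np.zeros(3), np.eye(3),
                               np.array([0.01, 0.0, 0.0]), np.array([0.0, 0.02, 0.0]), 0.01)
```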

<hr />

<h2 id="from-residual-to-the-66-newton-system">From Residual to the 6×6 Newton System</h2>

<p>Rather than solving the full implicit-Euler residual, which includes the nonlinear gyroscopic term $\boldsymbol{\omega}\times\mathbf{I}_\text{body}\boldsymbol{\omega}$ and therefore ties the Newton-Euler equation to the body frame, we split the problem into explicit and implicit parts:</p>

<ul>
  <li><strong>Explicit:</strong> free-body dynamics (inertia, gravity, gyroscopic torque) are forward-integrated once into inertial targets $\mathbf{x}^{\ast}$ and $\mathbf{R}^{\ast}$, then frozen for the rest of the step.</li>
  <li><strong>Implicit:</strong> contact and constraint forces are resolved iteratively through VBD’s Gauss-Seidel sweeps.</li>
</ul>

<p>This departs from fully implicit backward Euler for the rigid-body dynamics. In exchange, it buys three things: the gyroscopic nonlinearity is absorbed into $\mathbf{R}^{\ast}$ instead of being carried in the residual; the angular Hessian stays symmetric positive definite; and the whole Newton system can be assembled and solved in the world frame, because the body-frame gyroscopic term (the reason Newton-Euler is traditionally formulated in the body frame) no longer appears in the iterative solve.</p>

<p>With this split, the rotational residual reduces from</p>

\[\mathbf{r}_\text{rot} = \mathbf{I}_\text{body}\!\left(\frac{\Delta\boldsymbol{\theta}}{h^2} - \frac{\boldsymbol{\omega}^n}{h}\right) + \frac{\Delta\boldsymbol{\theta} \times (\mathbf{I}_\text{body}\,\Delta\boldsymbol{\theta})}{h^2} - \boldsymbol{\tau}^{n+1}\]

<p>to a simple <strong>spring pulling toward the explicit target</strong>:</p>

\[\mathbf{r}_\text{rot} = \frac{1}{h^2}\,\mathbf{I}_\text{world}\,(\Delta\boldsymbol{\theta} - h\boldsymbol{\omega}^{\ast}) - \boldsymbol{\tau}_\text{constraint}\]

<p>where $\boldsymbol{\omega}^{\ast}$ is the gyro-corrected angular velocity from the forward step and, to first order, $\Delta\boldsymbol{\theta} - h\boldsymbol{\omega}^{\ast} = -\boldsymbol{\theta}$, with $\boldsymbol{\theta}$ the rotation vector from $\mathbf{R}_\text{cur}$ to $\mathbf{R}^{\ast}$. This has the same structure as the translational residual $\tfrac{m}{h^2}(\mathbf{x}_\text{com} - \mathbf{x}^{\ast}_\text{com}) - \mathbf{f}_\text{constraint}$: a quadratic-energy spring toward an explicit inertial target, plus implicit constraint forces. The 6×6 system then has a natural 2×2 block structure, all in world frame:</p>

\[\begin{bmatrix} H_{ll} &amp; H_{al}^T \\ H_{al} &amp; H_{aa} \end{bmatrix} \begin{bmatrix} \Delta\mathbf{x} \\ \Delta\boldsymbol{\omega} \end{bmatrix} = \begin{bmatrix} \mathbf{f}_{lin} \\ \mathbf{f}_{ang} \end{bmatrix}\]

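<p>where $\Delta\mathbf{x}$ and $\Delta\boldsymbol{\omega}$ are the Newton step corrections and the right-hand side is $-\mathbf{r}$ at the current iterate. In VBD we run this as a <strong>single Newton step per body per VBD iteration</strong>, which keeps the inner solve cheap. As long as the assembled matrix stays positive definite (the inertial blocks are SPD and penalty contact blocks are positive semidefinite), each step is a descent direction for the body's local energy.</p>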

<h3 id="inertial-blocks">Inertial Blocks</h3>

<p>The inertial blocks are simple springs to the forward-integrated targets:</p>

\[H_{ll}^\text{inertia} = \frac{m}{h^2}\mathbf{I}_3, \qquad \mathbf{f}_{lin}^\text{inertia} = \frac{m}{h^2}(\mathbf{x}^{\ast}_\text{com} - \mathbf{x}_\text{com})\]

\[H_{aa}^\text{inertia} = \frac{1}{h^2}\mathbf{I}_\text{world}, \qquad \mathbf{f}_{ang}^\text{inertia} = \frac{1}{h^2}\mathbf{I}_\text{world}\,\boldsymbol{\theta}\]

\[H_{al}^\text{inertia} = \mathbf{0}\]

<p>Here $\mathbf{x}^{\ast}_\text{com} = \mathbf{x}^n_\text{com} + h\mathbf{v}^n + h^2 m^{-1}\mathbf{f}_\text{ext}$ is the <strong>translational inertial target</strong>, $\boldsymbol{\theta}$ is the rotation vector from the current orientation to $\mathbf{R}^{\ast}$, and $\mathbf{I}_\text{world} = \mathbf{R}\,\mathbf{I}_\text{body}\,\mathbf{R}^T$. The linear and angular inertial blocks have identical structure: a mass-weighted pull toward an explicit prediction, with the constraint solve handling everything else.</p>

<p>At convergence the angular residual gives $\mathbf{I}_\text{world}(\boldsymbol{\omega}^{n+1} - \boldsymbol{\omega}^{\ast})/h = \boldsymbol{\tau}_\text{constraint}$, i.e. the only thing that changes $\boldsymbol{\omega}$ from the gyro-corrected prediction is the implicit constraint response.</p>
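<p>The inertial blocks can be assembled in a few lines. This NumPy sketch is an illustration under assumptions: <code class="language-plaintext highlighter-rouge">inertial_blocks</code> and <code class="language-plaintext highlighter-rouge">log_so3</code> are hypothetical helpers, and the log map assumes the rotation from $\mathbf{R}_\text{cur}$ to $\mathbf{R}^{\ast}$ is well below $\pi$. It computes $\boldsymbol{\theta}$ directly in the world frame as $\log(\mathbf{R}^{\ast}\mathbf{R}_\text{cur}^T)$; the usage lines compare it with taking the rotation vector in the body frame and rotating the resulting torque, as <code class="language-plaintext highlighter-rouge">solve_rigid_body</code> does.</p>

```python
import numpy as np


def log_so3(R):
    """Rotation vector of R (assumes the rotation angle is well below pi)."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    w = 0.5 * np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if angle < 1e-8:
        return w                               # sin(angle) ~ angle near the identity
    return angle / np.sin(angle) * w


def inertial_blocks(m, I_body, h, x_n, v_n, f_ext, x_cur, R_cur, R_star):
    """Inertial H_ll, H_aa, f_lin, f_ang (the inertial H_al is zero)."""
    x_star = x_n + h * v_n + h**2 * f_ext / m          # translational inertial target
    theta = log_so3(R_star @ R_cur.T)                 # world-frame rotation R_cur -> R*
    I_world = R_cur @ I_body @ R_cur.T
    H_ll = m / h**2 * np.eye(3)
    H_aa = I_world / h**2
    f_lin = m / h**2 * (x_star - x_cur)
    f_ang = I_world @ theta / h**2
    return H_ll, H_aa, f_lin, f_ang


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


# Usage: compare the world-frame torque with the body-frame route.
h, m = 0.01, 2.0
I_body = np.diag([1.0, 2.0, 3.0])
R_cur, R_star = rot_z(0.4), rot_x(0.2) @ rot_z(0.4)
H_ll, H_aa, f_lin, f_ang = inertial_blocks(
    m, I_body, h, np.zeros(3), np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 0.0, -9.81 * m]), np.zeros(3), R_cur, R_star)
theta_body = log_so3(R_cur.T @ R_star)             # rotation vector in the body frame
tau_world = R_cur @ (I_body @ theta_body) / h**2   # rotate the body-frame torque to world
```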

<h4 id="computing-mathbfrast-baking-the-gyroscopic-term-into-the-target">Computing $\mathbf{R}^{\ast}$: baking the gyroscopic term into the target</h4>

<p>The angular target $\mathbf{R}^{\ast}$ is produced by one semi-implicit Newton-Euler step that includes the gyroscopic torque. Inside <code class="language-plaintext highlighter-rouge">integrate_rigid_body</code> the body-frame torque used to step angular velocity is</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># body-frame angular velocity and torque, with gyroscopic correction
</span><span class="n">wb</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">quat_rotate_inv</span><span class="p">(</span><span class="n">r0</span><span class="p">,</span> <span class="n">w0</span><span class="p">)</span>
<span class="n">tb</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">quat_rotate_inv</span><span class="p">(</span><span class="n">r0</span><span class="p">,</span> <span class="n">t0</span><span class="p">)</span> <span class="o">-</span> <span class="n">wp</span><span class="p">.</span><span class="n">cross</span><span class="p">(</span><span class="n">wb</span><span class="p">,</span> <span class="n">inertia</span> <span class="o">*</span> <span class="n">wb</span><span class="p">)</span>   <span class="c1"># subtract ω × Iω
</span><span class="n">w1</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">quat_rotate</span><span class="p">(</span><span class="n">r0</span><span class="p">,</span> <span class="n">wb</span> <span class="o">+</span> <span class="n">inv_inertia</span> <span class="o">*</span> <span class="n">tb</span> <span class="o">*</span> <span class="n">dt</span><span class="p">)</span>            <span class="c1"># semi-implicit ω*
</span><span class="n">r1</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">normalize</span><span class="p">(</span><span class="n">r0</span> <span class="o">+</span> <span class="n">wp</span><span class="p">.</span><span class="n">quat</span><span class="p">(</span><span class="n">w1</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">)</span> <span class="o">*</span> <span class="n">r0</span> <span class="o">*</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">dt</span><span class="p">)</span>        <span class="c1"># → R*
</span></code></pre></div></div>

<p>Line by line, with $\mathbf{R}^n$ the current orientation, $\boldsymbol{\omega}^n$ the world-frame angular velocity, and $\boldsymbol{\tau}^n$ the world-frame torque:</p>

<p><strong>Rotate $\boldsymbol{\omega}^n$ into the body frame:</strong></p>

\[\boldsymbol{\omega}_b \;=\; (\mathbf{R}^n)^{T}\,\boldsymbol{\omega}^n\]

<p><strong>Rotate the torque into the body frame and subtract the gyroscopic term</strong> (Newton-Euler RHS, $\mathbf{I}_\text{body}\dot{\boldsymbol{\omega}}_b = \boldsymbol{\tau}_b - \boldsymbol{\omega}_b\times\mathbf{I}_\text{body}\boldsymbol{\omega}_b$):</p>

\[\boldsymbol{\tau}_b^\text{eff} \;=\; (\mathbf{R}^n)^{T}\,\boldsymbol{\tau}^n \;-\; \boldsymbol{\omega}_b \times (\mathbf{I}_\text{body}\,\boldsymbol{\omega}_b)\]

<p><strong>Semi-implicit Euler step on body-frame $\boldsymbol{\omega}$, then rotate back to world:</strong></p>

\[\boldsymbol{\omega}^{\ast} \;=\; \mathbf{R}^n\!\left(\boldsymbol{\omega}_b + h\,\mathbf{I}_\text{body}^{-1}\,\boldsymbol{\tau}_b^\text{eff}\right)\]

<p>Compactly:</p>

\[\boxed{\;\boldsymbol{\omega}^{\ast} = \mathbf{R}^n\!\left[\boldsymbol{\omega}_b + h\,\mathbf{I}_\text{body}^{-1}\!\big((\mathbf{R}^n)^{T}\boldsymbol{\tau}^n - \boldsymbol{\omega}_b\times\mathbf{I}_\text{body}\boldsymbol{\omega}_b\big)\right], \qquad \mathbf{R}^{\ast} = \exp\!\big(h\,[\boldsymbol{\omega}^{\ast}]_\times\big)\,\mathbf{R}^n\;}\]

<p>Because $\boldsymbol{\omega}^{\ast}$ includes the gyroscopic correction $-h\,\mathbf{R}^n\mathbf{I}_\text{body}^{-1}(\boldsymbol{\omega}_b \times \mathbf{I}_\text{body}\boldsymbol{\omega}_b)$, the free-body residual vanishes to leading order at $\mathbf{R}^{\ast}$. The simple inertial spring $\mathbf{f}_\text{ang}^\text{inertia} = h^{-2}\mathbf{I}_\text{world}\,\boldsymbol{\theta}$ therefore agrees with the full nonlinear residual at the initial iterate $\mathbf{R}_\text{cur} = \mathbf{R}^{\ast}$: the gyroscopic torque's effect is routed through $\mathbf{R}^{\ast}$ instead of being re-evaluated every iteration. (The code applies the exponential map approximately, through a first-order quaternion update followed by normalization; the two agree for small $h\,|\boldsymbol{\omega}^{\ast}|$.)</p>
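<p>A NumPy transcription of the boxed update is handy for checking the gyroscopic correction in isolation. This is a sketch, not the Warp kernel: it uses the exact exponential (Rodrigues' formula) instead of the normalized quaternion update in <code class="language-plaintext highlighter-rouge">integrate_rigid_body</code>, takes world-frame $\boldsymbol{\omega}^n$ and $\boldsymbol{\tau}^n$ as inputs, and the function names are hypothetical.</p>

```python
import numpy as np


def skew(a):
    """Cross-product matrix [a]_x, so skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def exp_so3(theta):
    """Rodrigues' formula: exp([theta]_x)."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3) + skew(theta)
    K = skew(theta / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def angular_target(R_n, w_n, tau_n, I_body, h):
    """omega* and R* from the boxed formula (world-frame inputs and outputs)."""
    w_b = R_n.T @ w_n                                        # omega into the body frame
    tau_eff = R_n.T @ tau_n - np.cross(w_b, I_body @ w_b)    # torque minus gyroscopic term
    w_star = R_n @ (w_b + h * np.linalg.solve(I_body, tau_eff))
    R_star = exp_so3(h * w_star) @ R_n                       # exact exponential map
    return w_star, R_star


# Usage: torque-free spin about a non-principal axis.
h = 0.01
w_n = np.array([1.0, 1.0, 1.0])
w_star, R_star = angular_target(np.eye(3), w_n, np.zeros(3), np.diag([1.0, 2.0, 3.0]), h)
w_iso, _ = angular_target(np.eye(3), w_n, np.zeros(3), 2.0 * np.eye(3), h)
```

<p>For isotropic inertia the gyroscopic term $\boldsymbol{\omega}_b\times\mathbf{I}_\text{body}\boldsymbol{\omega}_b$ vanishes and $\boldsymbol{\omega}^{\ast} = \boldsymbol{\omega}^n$; for the anisotropic body it changes the spin direction even with zero applied torque.</p>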

<h4 id="what-this-approximation-drops">What this approximation drops</h4>

<p>The full rotational residual re-centered at $\mathbf{R}^{\ast}$ (writing $\boldsymbol{\delta}$ for the rotation vector from $\mathbf{R}^{\ast}$ to $\mathbf{R}_\text{cur}$, i.e. how far constraints have pushed the iterate off the target) is</p>

\[\mathbf{r}_\text{rot} = \underbrace{\frac{1}{h^2}\,\mathbf{I}_\text{body}\,\boldsymbol{\delta}}_{\text{kept: inertial spring}} \;+\; \underbrace{\frac{\boldsymbol{\omega}\times\mathbf{I}_\text{body}\boldsymbol{\delta} + \boldsymbol{\delta}\times\mathbf{I}_\text{body}\boldsymbol{\omega}}{h}}_{\text{dropped: gyro coupling}} \;+\; \underbrace{\frac{\boldsymbol{\delta}\times\mathbf{I}_\text{body}\boldsymbol{\delta}}{h^2}}_{\text{dropped: quadratic gyro}} \;-\; \boldsymbol{\tau}_\text{constraint}\]

<p>Up to the common inertia scale, the kept spring scales like $|\boldsymbol{\delta}|/h^2$, the gyro coupling like $|\boldsymbol{\omega}|\,|\boldsymbol{\delta}|/h$, and the quadratic gyro like $|\boldsymbol{\delta}|^2/h^2$. Relative to the spring, the gyro coupling is therefore $O(|\boldsymbol{\omega}|h)$ and the quadratic gyro is $O(|\boldsymbol{\delta}|)$, both small for typical simulation timesteps and small constraint corrections. AVBD drops these terms for three practical reasons:</p>
<ul>
  <li>The gyro coupling's Jacobian $[\boldsymbol{\omega}]_\times\mathbf{I} - [\mathbf{I}\boldsymbol{\omega}]_\times$ is <strong>not symmetric</strong>, which would break the Cholesky factorization used in the Schur complement solve.</li>
  <li>Keeping the gyroscopic term in the residual would require working in the body frame (where $\mathbf{I}_\text{body}$ is constant), giving up the world-frame formulation that contacts and joints naturally live in.</li>
  <li>The gyroscopic term evaluated at $\boldsymbol{\omega}^n$ vs. $\boldsymbol{\omega}^{n+1}$ differs by $O(h)$ in torque units, the same order as backward Euler's intrinsic discretization error, so refining it further would not improve the integrator's accuracy.</li>
</ul>
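<p>Both the asymmetry argument and the $O(|\boldsymbol{\omega}|h)$ estimate are easy to check numerically. A small NumPy sketch; the inertia, angular velocity, offset $\boldsymbol{\delta}$, and function names are illustrative assumptions.</p>

```python
import numpy as np


def skew(a):
    """Cross-product matrix [a]_x, so skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def gyro_coupling_jacobian(w, I_body):
    """d/d(delta) of (w x I delta + delta x I w), i.e. [w]_x I - [I w]_x."""
    return skew(w) @ I_body - skew(I_body @ w)


def coupling_to_spring_ratio(w, I_body, h, delta):
    """Size of the dropped O(1/h) coupling term relative to the kept O(1/h^2) spring."""
    coupling = gyro_coupling_jacobian(w, I_body) @ delta / h
    spring = I_body @ delta / h**2
    return np.linalg.norm(coupling) / np.linalg.norm(spring)


# Usage
I_aniso = np.diag([1.0, 2.0, 3.0])
w = np.array([1.0, 0.5, 0.2])
J_aniso = gyro_coupling_jacobian(w, I_aniso)
J_iso = gyro_coupling_jacobian(w, 2.0 * np.eye(3))
ratio = coupling_to_spring_ratio(w, I_aniso, 0.01, np.array([0.01, 0.02, -0.01]))
```

<p>For isotropic inertia the coupling Jacobian is identically zero, so nothing is dropped; for anisotropic bodies $\mathbf{J} - \mathbf{J}^T$ is nonzero, which is what rules out a Cholesky factorization. With $h = 0.01$ the coupling-to-spring ratio in this example stays well below 0.1.</p>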

<p>In the Newton physics engine, <code class="language-plaintext highlighter-rouge">forward_step_rigid_bodies</code> computes $\mathbf{x}^{\ast}$ and $\mathbf{R}^{\ast}$ by semi-implicit integration and stores the resulting transform in <code class="language-plaintext highlighter-rouge">body_inertia_q</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># forward_step_rigid_bodies (simplified)
</span><span class="n">q_new</span><span class="p">,</span> <span class="n">qd_new</span> <span class="o">=</span> <span class="n">integrate_rigid_body</span><span class="p">(</span>
    <span class="n">q_current</span><span class="p">,</span> <span class="n">qd_current</span><span class="p">,</span> <span class="n">f_ext</span><span class="p">,</span>
    <span class="n">com_local</span><span class="p">,</span> <span class="n">I_body</span><span class="p">,</span> <span class="n">inv_m</span><span class="p">,</span> <span class="n">inv_I</span><span class="p">,</span> <span class="n">gravity</span><span class="p">,</span> <span class="n">dt</span>
<span class="p">)</span>
<span class="n">body_inertia_q</span><span class="p">[</span><span class="n">tid</span><span class="p">]</span> <span class="o">=</span> <span class="n">q_new</span>   <span class="c1"># frozen inertial target q*
</span><span class="n">body_q</span><span class="p">[</span><span class="n">tid</span><span class="p">]</span>         <span class="o">=</span> <span class="n">q_new</span>   <span class="c1"># initial guess for iterations
</span></code></pre></div></div>

<h3 id="contact-and-constraint-blocks">Contact and Constraint Blocks</h3>

<p>Any force element (contact, joint) acting at contact point $\mathbf{p}$ with moment arm $\mathbf{r} = \mathbf{p} - \mathbf{x}_\text{com}$ contributes:</p>

\[H_{ll}^c = \mathbf{K}_c, \qquad H_{al}^c = -[\mathbf{r}]_\times^T \mathbf{K}_c, \qquad H_{aa}^c = [\mathbf{r}]_\times^T \mathbf{K}_c\,[\mathbf{r}]_\times\]

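<p>where $\mathbf{K}_c = -\partial \mathbf{f}_c / \partial \mathbf{x}$ is the contact stiffness (positive semidefinite for a penalty contact, matching <code class="language-plaintext highlighter-rouge">K_total</code> below) and $[\mathbf{r}]_\times$ is the skew-symmetric cross-product matrix. Since a small rigid motion moves the contact point by $\delta\mathbf{x} - [\mathbf{r}]_\times\delta\boldsymbol{\omega}$, these blocks are exactly $\mathbf{J}^T\mathbf{K}_c\mathbf{J}$ with $\mathbf{J} = \begin{bmatrix}\mathbf{I}_3 &amp; -[\mathbf{r}]_\times\end{bmatrix}$; the derivative of the moment arm $\mathbf{r}$ itself is not included. All blocks are summed over adjacent force elements before the solve.</p>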
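<p>The block formulas can be checked against $\mathbf{J}^T\mathbf{K}_c\mathbf{J}$ directly. A minimal NumPy sketch, assuming a pure normal penalty stiffness $\mathbf{K}_c = k\,\hat{\mathbf{n}}\hat{\mathbf{n}}^T$ and an arbitrary moment arm; the helper names are illustrative.</p>

```python
import numpy as np


def skew(a):
    """Cross-product matrix [a]_x, so skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def contact_blocks(K_c, r):
    """H_ll, H_al, H_aa for one force element with stiffness K_c and moment arm r."""
    r_x = skew(r)
    return K_c, -r_x.T @ K_c, r_x.T @ K_c @ r_x


def assemble_6x6(H_ll, H_al, H_aa):
    """Stack as [[H_ll, H_al^T], [H_al, H_aa]] with the linear DoFs first."""
    return np.block([[H_ll, H_al.T], [H_al, H_aa]])


# Usage: a penalty contact along +z with the contact point offset from the COM.
n = np.array([0.0, 0.0, 1.0])
K_c = 1.0e4 * np.outer(n, n)
r = np.array([0.3, -0.2, 0.1])
H = assemble_6x6(*contact_blocks(K_c, r))
J = np.hstack([np.eye(3), -skew(r)])    # contact-point motion per (dx, domega)
```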

<hr />

<h2 id="assembling-the-66-system-in-code">Assembling the 6×6 System in Code</h2>

<h3 id="contact-force-and-hessian-evaluate_rigid_contact_from_collision">Contact Force and Hessian (<code class="language-plaintext highlighter-rouge">evaluate_rigid_contact_from_collision</code>)</h3>

<p>For each contact between body $A$ and body $B$, the contact model in <code class="language-plaintext highlighter-rouge">evaluate_rigid_contact_from_collision</code> computes the full wrench and Hessian blocks for both bodies. The normal force and stiffness come from the penalty model:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Normal force and stiffness
</span><span class="n">n_outer</span>  <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">outer</span><span class="p">(</span><span class="n">contact_normal</span><span class="p">,</span> <span class="n">contact_normal</span><span class="p">)</span>
<span class="n">f_total</span>  <span class="o">=</span> <span class="n">contact_normal</span> <span class="o">*</span> <span class="p">(</span><span class="n">contact_ke</span> <span class="o">*</span> <span class="n">penetration_depth</span><span class="p">)</span>
<span class="n">K_total</span>  <span class="o">=</span> <span class="n">contact_ke</span> <span class="o">*</span> <span class="n">n_outer</span>
</code></pre></div></div>

<p>Damping is added when the contact is closing ($\mathbf{v}_\text{rel} \cdot \hat{\mathbf{n}} &lt; 0$):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Relative velocity via finite difference of contact points
</span><span class="n">dx_rel</span> <span class="o">=</span> <span class="p">(</span><span class="n">x_c_b_now</span> <span class="o">-</span> <span class="n">x_c_b_prev</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="n">x_c_a_now</span> <span class="o">-</span> <span class="n">x_c_a_prev</span><span class="p">)</span>
<span class="n">v_rel</span>  <span class="o">=</span> <span class="n">dx_rel</span> <span class="o">/</span> <span class="n">dt</span>
<span class="n">v_dot_n</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">contact_normal</span><span class="p">,</span> <span class="n">v_rel</span><span class="p">)</span>

<span class="k">if</span> <span class="n">contact_kd</span> <span class="o">&gt;</span> <span class="mf">0.0</span> <span class="ow">and</span> <span class="n">v_dot_n</span> <span class="o">&lt;</span> <span class="mf">0.0</span><span class="p">:</span>
    <span class="n">damping_coeff</span>    <span class="o">=</span> <span class="n">contact_kd</span> <span class="o">*</span> <span class="n">contact_ke</span>
    <span class="n">f_total</span>         <span class="o">+=</span> <span class="o">-</span><span class="n">damping_coeff</span> <span class="o">*</span> <span class="n">v_dot_n</span> <span class="o">*</span> <span class="n">contact_normal</span>
    <span class="n">K_total</span>         <span class="o">+=</span> <span class="p">(</span><span class="n">damping_coeff</span> <span class="o">/</span> <span class="n">dt</span><span class="p">)</span> <span class="o">*</span> <span class="n">n_outer</span>
</code></pre></div></div>
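<p>Because the damping force depends on position through the finite-difference velocity $(\mathbf{x} - \mathbf{x}_\text{prev})/h$, its stiffness contribution is $(k_d k_e/h)\,\hat{\mathbf{n}}\hat{\mathbf{n}}^T$. This NumPy sketch mirrors the snippet for body $B$ and checks <code class="language-plaintext highlighter-rouge">K</code> against a central finite difference of the force. <code class="language-plaintext highlighter-rouge">normal_contact</code> is a hypothetical name, and the penetration model (<code class="language-plaintext highlighter-rouge">thickness</code> minus normal separation) is borrowed from the per-body accumulation code.</p>

```python
import numpy as np


def normal_contact(n, ke, kd, penetration, x_b_now, x_b_prev, x_a_now, x_a_prev, dt):
    """Penalty force on body B and its stiffness K = -df/dx_B, damped only while closing."""
    n_outer = np.outer(n, n)
    f = n * (ke * penetration)
    K = ke * n_outer
    v_rel = ((x_b_now - x_b_prev) - (x_a_now - x_a_prev)) / dt
    v_dot_n = n @ v_rel
    if kd > 0.0 and v_dot_n < 0.0:
        c = kd * ke                          # damping proportional to stiffness
        f = f - c * v_dot_n * n
        K = K + (c / dt) * n_outer           # d(v_rel)/dx_b = I / dt
    return f, K


# Usage: body B approaches A along n at 1 m/s.
n = np.array([0.0, 0.0, 1.0])
x_a = np.zeros(3)
x_b_prev = np.array([0.0, 0.0, 0.02])
x_b = np.array([0.0, 0.0, 0.01])
thickness, ke, kd, dt = 0.015, 1000.0, 0.01, 0.01


def force_at(xb):
    penetration = thickness - n @ (xb - x_a)
    return normal_contact(n, ke, kd, penetration, xb, x_b_prev, x_a, x_a, dt)[0]


f, K = normal_contact(n, ke, kd, thickness - n @ (x_b - x_a), x_b, x_b_prev, x_a, x_a, dt)
eps = 1e-6
K_fd = np.column_stack([-(force_at(x_b + eps * e) - force_at(x_b - eps * e)) / (2 * eps)
                        for e in np.eye(3)])
```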

<p>Then for each body the moment arm $\mathbf{r} = \mathbf{p}_\text{contact} - \mathbf{x}_\text{com}$ is used to build all three Hessian blocks:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Body B side (body A is symmetric with opposite sign on force)
</span><span class="n">force_b</span>  <span class="o">=</span>  <span class="n">f_total</span>
<span class="n">torque_b</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">cross</span><span class="p">(</span><span class="n">r_b</span><span class="p">,</span> <span class="n">force_b</span><span class="p">)</span>

<span class="n">r_b_skew</span>       <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">skew</span><span class="p">(</span><span class="n">r_b</span><span class="p">)</span>               <span class="c1"># [r]_x
</span><span class="n">r_b_skew_T_K</span>   <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">transpose</span><span class="p">(</span><span class="n">r_b_skew</span><span class="p">)</span> <span class="o">*</span> <span class="n">K_total</span>

<span class="n">h_ll_b</span> <span class="o">=</span> <span class="n">K_total</span>                             <span class="c1"># -∂f/∂x
</span><span class="n">h_al_b</span> <span class="o">=</span> <span class="o">-</span><span class="n">r_b_skew_T_K</span>                      <span class="c1"># -∂τ/∂x  =  -[r]_x^T K
</span><span class="n">h_aa_b</span> <span class="o">=</span>  <span class="n">r_b_skew_T_K</span> <span class="o">*</span> <span class="n">r_b_skew</span>           <span class="c1"># -∂τ/∂ω  ≈  [r]_x^T K [r]_x  (lever-arm derivative omitted)
</span></code></pre></div></div>

<h3 id="per-body-accumulation-accumulate_body_body_contacts_per_body">Per-Body Accumulation (<code class="language-plaintext highlighter-rouge">accumulate_body_body_contacts_per_body</code>)</h3>

<p>Rather than iterating over all contacts globally, the solver builds a <strong>per-body contact list</strong> once per step (a fixed-capacity buffer indexed by <code class="language-plaintext highlighter-rouge">body_id * buffer_size</code>, similar in spirit to CSR). During each Gauss-Seidel color sweep, each body iterates only over its own contacts using 16 strided threads; each thread accumulates into local registers and then issues one atomic add per output buffer:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Each body_id iterates its own contact list (strided over 16 threads)
</span><span class="n">force_acc</span> <span class="o">=</span> <span class="n">vec3</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>  <span class="n">torque_acc</span> <span class="o">=</span> <span class="n">vec3</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">h_ll_acc</span>  <span class="o">=</span> <span class="n">mat33</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="n">h_al_acc</span>  <span class="o">=</span> <span class="n">mat33</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="n">h_aa_acc</span> <span class="o">=</span> <span class="n">mat33</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>

<span class="n">i</span> <span class="o">=</span> <span class="n">thread_id_within_body</span>               <span class="c1"># 0..15
</span><span class="k">while</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">num_contacts_for_body</span><span class="p">:</span>
    <span class="n">contact_idx</span> <span class="o">=</span> <span class="n">body_contact_indices</span><span class="p">[</span><span class="n">body_id</span> <span class="o">*</span> <span class="n">buffer_size</span> <span class="o">+</span> <span class="n">i</span><span class="p">]</span>

    <span class="c1"># Compute contact world points and penetration depth
</span>    <span class="n">cp0_world</span> <span class="o">=</span> <span class="n">transform_point</span><span class="p">(</span><span class="n">body_q</span><span class="p">[</span><span class="n">b0</span><span class="p">],</span> <span class="n">cp0_local</span><span class="p">)</span>
    <span class="n">cp1_world</span> <span class="o">=</span> <span class="n">transform_point</span><span class="p">(</span><span class="n">body_q</span><span class="p">[</span><span class="n">b1</span><span class="p">],</span> <span class="n">cp1_local</span><span class="p">)</span>
    <span class="n">penetration</span> <span class="o">=</span> <span class="n">thickness</span> <span class="o">-</span> <span class="n">dot</span><span class="p">(</span><span class="n">contact_normal</span><span class="p">,</span> <span class="n">cp1_world</span> <span class="o">-</span> <span class="n">cp0_world</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">penetration</span> <span class="o">&gt;</span> <span class="n">eps</span><span class="p">:</span>
        <span class="p">(</span><span class="n">force_0</span><span class="p">,</span> <span class="n">torque_0</span><span class="p">,</span> <span class="n">h_ll_0</span><span class="p">,</span> <span class="n">h_al_0</span><span class="p">,</span> <span class="n">h_aa_0</span><span class="p">,</span>
         <span class="n">force_1</span><span class="p">,</span> <span class="n">torque_1</span><span class="p">,</span> <span class="n">h_ll_1</span><span class="p">,</span> <span class="n">h_al_1</span><span class="p">,</span> <span class="n">h_aa_1</span><span class="p">)</span> <span class="o">=</span> \
            <span class="n">evaluate_rigid_contact_from_collision</span><span class="p">(</span><span class="n">b0</span><span class="p">,</span> <span class="n">b1</span><span class="p">,</span> <span class="p">...)</span>

        <span class="c1"># Pick the side that belongs to this body
</span>        <span class="k">if</span> <span class="n">body_id</span> <span class="o">==</span> <span class="n">b0</span><span class="p">:</span>
            <span class="n">force_acc</span> <span class="o">+=</span> <span class="n">force_0</span><span class="p">;</span>  <span class="n">torque_acc</span> <span class="o">+=</span> <span class="n">torque_0</span>
            <span class="n">h_ll_acc</span>  <span class="o">+=</span> <span class="n">h_ll_0</span><span class="p">;</span>   <span class="n">h_al_acc</span>  <span class="o">+=</span> <span class="n">h_al_0</span><span class="p">;</span>  <span class="n">h_aa_acc</span> <span class="o">+=</span> <span class="n">h_aa_0</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">force_acc</span> <span class="o">+=</span> <span class="n">force_1</span><span class="p">;</span>  <span class="n">torque_acc</span> <span class="o">+=</span> <span class="n">torque_1</span>
            <span class="n">h_ll_acc</span>  <span class="o">+=</span> <span class="n">h_ll_1</span><span class="p">;</span>   <span class="n">h_al_acc</span>  <span class="o">+=</span> <span class="n">h_al_1</span><span class="p">;</span>  <span class="n">h_aa_acc</span> <span class="o">+=</span> <span class="n">h_aa_1</span>

    <span class="n">i</span> <span class="o">+=</span> <span class="mi">16</span>   <span class="c1"># stride
</span>
<span class="c1"># One atomic add per thread and buffer at the end (up to 16 per body)
</span><span class="n">atomic_add</span><span class="p">(</span><span class="n">body_forces</span><span class="p">,</span>      <span class="n">body_id</span><span class="p">,</span> <span class="n">force_acc</span><span class="p">)</span>
<span class="n">atomic_add</span><span class="p">(</span><span class="n">body_torques</span><span class="p">,</span>     <span class="n">body_id</span><span class="p">,</span> <span class="n">torque_acc</span><span class="p">)</span>
<span class="n">atomic_add</span><span class="p">(</span><span class="n">body_hessian_ll</span><span class="p">,</span>  <span class="n">body_id</span><span class="p">,</span> <span class="n">h_ll_acc</span><span class="p">)</span>
<span class="n">atomic_add</span><span class="p">(</span><span class="n">body_hessian_al</span><span class="p">,</span>  <span class="n">body_id</span><span class="p">,</span> <span class="n">h_al_acc</span><span class="p">)</span>
<span class="n">atomic_add</span><span class="p">(</span><span class="n">body_hessian_aa</span><span class="p">,</span>  <span class="n">body_id</span><span class="p">,</span> <span class="n">h_aa_acc</span><span class="p">)</span>
</code></pre></div></div>
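<p>The strided reduction can be emulated sequentially to see why it matches a direct scatter of contact forces. This is a plain-Python illustration, not the Warp kernel: the 16 lanes run one after another, <code class="language-plaintext highlighter-rouge">total += acc</code> stands in for <code class="language-plaintext highlighter-rouge">atomic_add</code>, and the fixed-capacity layout, the overflow policy (extra contacts are dropped), and the sign convention (body <code class="language-plaintext highlighter-rouge">b1</code> receives <code class="language-plaintext highlighter-rouge">+f</code>, body <code class="language-plaintext highlighter-rouge">b0</code> receives <code class="language-plaintext highlighter-rouge">-f</code>) are assumptions made for the example.</p>

```python
import numpy as np

THREADS_PER_BODY = 16


def build_body_contact_buffer(contact_pairs, num_bodies, buffer_size):
    """Fixed-capacity per-body contact lists: a flat index array plus per-body counts."""
    indices = np.full(num_bodies * buffer_size, -1, dtype=np.int64)
    counts = np.zeros(num_bodies, dtype=np.int64)
    for c, (b0, b1) in enumerate(contact_pairs):
        for b in (b0, b1):
            if counts[b] < buffer_size:          # assumption: overflowing contacts are dropped
                indices[b * buffer_size + counts[b]] = c
                counts[b] += 1
    return indices, counts


def accumulate_body(body_id, indices, counts, buffer_size, contact_pairs, contact_force):
    """Emulate the 16 strided lanes: each reduces locally, then adds once to the result."""
    total = np.zeros(3)
    for lane in range(THREADS_PER_BODY):
        acc = np.zeros(3)                        # per-lane register accumulator
        i = lane
        while i < counts[body_id]:
            c = indices[body_id * buffer_size + i]
            b0, _ = contact_pairs[c]
            acc += -contact_force[c] if body_id == b0 else contact_force[c]
            i += THREADS_PER_BODY                # stride
        total += acc                             # stands in for atomic_add
    return total


# Usage: random contacts between 5 bodies, compared with a direct scatter.
rng = np.random.default_rng(0)
num_bodies, buffer_size = 5, 64
pairs = [tuple(rng.choice(num_bodies, size=2, replace=False)) for _ in range(40)]
forces = rng.normal(size=(40, 3))
indices, counts = build_body_contact_buffer(pairs, num_bodies, buffer_size)
per_body = np.array([accumulate_body(b, indices, counts, buffer_size, pairs, forces)
                     for b in range(num_bodies)])

ref = np.zeros((num_bodies, 3))
for (b0, b1), f in zip(pairs, forces):
    ref[b1] += f
    ref[b0] -= f
```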

<h3 id="final-assembly-and-solve-solve_rigid_body">Final Assembly and Solve (<code class="language-plaintext highlighter-rouge">solve_rigid_body</code>)</h3>

<p>After all contacts (and joints, via <code class="language-plaintext highlighter-rouge">evaluate_joint_force_hessian</code>) have been accumulated into <code class="language-plaintext highlighter-rouge">body_forces/torques/hessians</code>, <code class="language-plaintext highlighter-rouge">solve_rigid_body</code> reads those external contributions and adds the <strong>inertial blocks</strong> to form the complete system:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># ── Inertial contributions ────────────────────────────────────────
</span><span class="n">inertial_coeff</span> <span class="o">=</span> <span class="n">m</span> <span class="o">*</span> <span class="n">dt_sqr_reciprocal</span>          <span class="c1"># m/h²
</span>
<span class="c1"># Linear inertial force: pull COM toward inertial target
</span><span class="n">f_lin</span> <span class="o">=</span> <span class="p">(</span><span class="n">com_star</span> <span class="o">-</span> <span class="n">com_current</span><span class="p">)</span> <span class="o">*</span> <span class="n">inertial_coeff</span>

<span class="c1"># Angular inertial torque: pull orientation toward target
</span><span class="n">q_delta</span>   <span class="o">=</span> <span class="n">quat_inverse</span><span class="p">(</span><span class="n">rot_current</span><span class="p">)</span> <span class="o">*</span> <span class="n">rot_star</span>
<span class="n">theta_body</span> <span class="o">=</span> <span class="n">axis_angle_to_vec</span><span class="p">(</span><span class="n">q_delta</span><span class="p">)</span>         <span class="c1"># rotation vector in body frame
</span><span class="n">tau_body</span>   <span class="o">=</span> <span class="n">I_body</span> <span class="o">*</span> <span class="p">(</span><span class="n">theta_body</span> <span class="o">*</span> <span class="n">dt_sqr_reciprocal</span><span class="p">)</span>
<span class="n">tau_world</span>  <span class="o">=</span> <span class="n">quat_rotate</span><span class="p">(</span><span class="n">rot_current</span><span class="p">,</span> <span class="n">tau_body</span><span class="p">)</span>

<span class="c1"># Angular Hessian in world frame
</span><span class="n">R_cur</span>      <span class="o">=</span> <span class="n">quat_to_matrix</span><span class="p">(</span><span class="n">rot_current</span><span class="p">)</span>
<span class="n">I_world</span>    <span class="o">=</span> <span class="n">R_cur</span> <span class="o">*</span> <span class="n">I_body</span> <span class="o">*</span> <span class="n">R_cur</span><span class="p">.</span><span class="n">T</span>
<span class="n">angular_hessian</span> <span class="o">=</span> <span class="n">dt_sqr_reciprocal</span> <span class="o">*</span> <span class="n">I_world</span>

<span class="c1"># ── Add external (contact + joint) contributions ──────────────────
</span><span class="n">f_force</span>  <span class="o">=</span> <span class="n">f_lin</span>   <span class="o">+</span> <span class="n">external_forces</span><span class="p">[</span><span class="n">body_id</span><span class="p">]</span>
<span class="n">f_torque</span> <span class="o">=</span> <span class="n">tau_world</span> <span class="o">+</span> <span class="n">external_torques</span><span class="p">[</span><span class="n">body_id</span><span class="p">]</span>

<span class="n">h_ll</span> <span class="o">=</span> <span class="n">diag</span><span class="p">(</span><span class="n">inertial_coeff</span><span class="p">)</span> <span class="o">+</span> <span class="n">external_hessian_ll</span><span class="p">[</span><span class="n">body_id</span><span class="p">]</span>
<span class="n">h_al</span> <span class="o">=</span>                         <span class="n">external_hessian_al</span><span class="p">[</span><span class="n">body_id</span><span class="p">]</span>
<span class="n">h_aa</span> <span class="o">=</span> <span class="n">angular_hessian</span>       <span class="o">+</span> <span class="n">external_hessian_aa</span><span class="p">[</span><span class="n">body_id</span><span class="p">]</span>

<span class="c1"># ── Joint contributions (CSR adjacency loop) ──────────────────────
</span><span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">adjacent_joints</span><span class="p">(</span><span class="n">body_id</span><span class="p">):</span>
    <span class="n">jf</span><span class="p">,</span> <span class="n">jt</span><span class="p">,</span> <span class="n">jH_ll</span><span class="p">,</span> <span class="n">jH_al</span><span class="p">,</span> <span class="n">jH_aa</span> <span class="o">=</span> <span class="n">evaluate_joint_force_hessian</span><span class="p">(</span><span class="n">body_id</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="p">...)</span>
    <span class="n">f_force</span>  <span class="o">+=</span> <span class="n">jf</span><span class="p">;</span>   <span class="n">f_torque</span> <span class="o">+=</span> <span class="n">jt</span>
    <span class="n">h_ll</span> <span class="o">+=</span> <span class="n">jH_ll</span><span class="p">;</span>    <span class="n">h_al</span> <span class="o">+=</span> <span class="n">jH_al</span><span class="p">;</span>   <span class="n">h_aa</span> <span class="o">+=</span> <span class="n">jH_aa</span>

<span class="c1"># ── Schur complement solve (see next section) ─────────────────────
</span><span class="n">dw</span><span class="p">,</span> <span class="n">dx</span> <span class="o">=</span> <span class="n">schur_solve</span><span class="p">(</span><span class="n">h_ll</span><span class="p">,</span> <span class="n">h_al</span><span class="p">,</span> <span class="n">h_aa</span><span class="p">,</span> <span class="n">f_force</span><span class="p">,</span> <span class="n">f_torque</span><span class="p">)</span>
</code></pre></div></div>

<p>The key design point: contacts write into a shared <code class="language-plaintext highlighter-rouge">body_forces/hessians</code> buffer with atomic adds (one write per body per color), while joints are accumulated inline inside <code class="language-plaintext highlighter-rouge">solve_rigid_body</code> via a private loop. Both feed into the same 6×6 solve.</p>

<hr />

<h2 id="solving-via-schur-complement">Solving via Schur Complement</h2>

<p>We reduce the 6×6 system to two 3×3 Cholesky factorizations. Solve the top block row, $H_{ll}\,\Delta\mathbf{x} + H_{al}^T\,\Delta\boldsymbol{\omega} = \mathbf{f}_{lin}$, for $\Delta\mathbf{x}$:</p>

\[\Delta\mathbf{x} = H_{ll}^{-1}(\mathbf{f}_{lin} - H_{al}^T \Delta\boldsymbol{\omega})\]

<p>Substitute into the bottom row:</p>

\[\underbrace{(H_{aa} - H_{al}\,H_{ll}^{-1}\,H_{al}^T)}_{\mathbf{S}}\,\Delta\boldsymbol{\omega} = \mathbf{f}_{ang} - H_{al}\,H_{ll}^{-1}\,\mathbf{f}_{lin}\]

<p>Factorize and solve in order:</p>

<p><strong>Step 1.</strong> $\mathbf{L}_M\mathbf{L}_M^T = H_{ll}$   (Cholesky)</p>

<p><strong>Step 2.</strong> $\mathbf{S} = H_{aa} - H_{al}\,H_{ll}^{-1}\,H_{al}^T$   (Schur complement)</p>

<p><strong>Step 3.</strong> $\mathbf{S}\,\Delta\boldsymbol{\omega} = \mathbf{f}_{ang} - H_{al}\,H_{ll}^{-1}\,\mathbf{f}_{lin}$   (solve for angular)</p>

<p><strong>Step 4.</strong> $H_{ll}\,\Delta\mathbf{x} = \mathbf{f}_{lin} - H_{al}^T\,\Delta\boldsymbol{\omega}$   (back-substitute for linear)</p>

<p>Both 3×3 solves use a packed Cholesky (6-float lower-triangular factor). $H_{aa}$ is lightly regularized first:</p>

\[H_{aa} \leftarrow H_{aa} + \varepsilon\mathbf{I}, \qquad \varepsilon = 10^{-9}\!\left(\tfrac{\mathrm{tr}(H_{aa})}{3} + 1\right)\]
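<p>The four steps can be checked numerically. The NumPy sketch below is not Newton’s kernel: <code class="language-plaintext highlighter-rouge">chol_solve</code> and <code class="language-plaintext highlighter-rouge">schur_solve_6x6</code> are illustrative names, and it assumes the 6×6 matrix is laid out with $H_{ll}$ top-left, $H_{al}^T$ top-right, $H_{al}$ bottom-left and $H_{aa}$ bottom-right. It applies the same $\varepsilon$ regularization and compares the result with a dense solve of the full system.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def chol_solve(L, b):
    """Solve (L L^T) x = b given the lower-triangular Cholesky factor L."""
    y = np.linalg.solve(L, b)       # solve L y = b
    return np.linalg.solve(L.T, y)  # solve L^T x = y


def schur_solve_6x6(h_ll, h_al, h_aa, f_lin, f_ang):
    """Solve [[h_ll, h_al^T], [h_al, h_aa]] [dx; dw] = [f_lin; f_ang]."""
    # Light regularization of H_aa, scaled by its mean diagonal entry
    eps = 1e-9 * (np.trace(h_aa) / 3.0 + 1.0)
    h_aa = h_aa + eps * np.eye(3)

    L_m = np.linalg.cholesky(h_ll)                 # Step 1: H_ll = L_m L_m^T
    S = h_aa - h_al @ chol_solve(L_m, h_al.T)      # Step 2: Schur complement
    rhs_w = f_ang - h_al @ chol_solve(L_m, f_lin)
    dw = chol_solve(np.linalg.cholesky(S), rhs_w)  # Step 3: angular increment
    dx = chol_solve(L_m, f_lin - h_al.T @ dw)      # Step 4: linear increment
    return dx, dw


# Random symmetric positive definite 6x6 system, split into 3x3 blocks
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H = A @ A.T + 6.0 * np.eye(6)
f = rng.standard_normal(6)
dx, dw = schur_solve_6x6(H[:3, :3], H[3:, :3], H[3:, 3:], f[:3], f[3:])
</code></pre></div></div>

<p>Every factorization here stays 3×3, matching the packed per-body factors described above; the dense solve is only a check on the block algebra.</p>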

<hr />

<h2 id="pose-update-and-velocity-recovery">Pose Update and Velocity Recovery</h2>

<p>Apply the Newton increments to the current pose:</p>

\[\mathbf{x}_\text{com}^\text{new} = \mathbf{x}_\text{com} + \Delta\mathbf{x}\]

\[\mathbf{r}^\text{new} = \delta\mathbf{r} \otimes \mathbf{r}, \qquad \delta\mathbf{r} = \left(\hat{\mathbf{u}}\sin\tfrac{\phi}{2},\; \cos\tfrac{\phi}{2}\right), \quad \phi = |\Delta\boldsymbol{\omega}|, \quad \hat{\mathbf{u}} = \tfrac{\Delta\boldsymbol{\omega}}{\phi}\]

<p>For small $\Delta\boldsymbol{\omega}$ the first-order approximation $\delta\mathbf{r} \approx \text{normalize}(\tfrac{1}{2}\Delta\boldsymbol{\omega},\, 1)$ is used for efficiency (controlled by <code class="language-plaintext highlighter-rouge">_USE_SMALL_ANGLE_APPROX</code> in Newton).</p>
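<p>A short NumPy sketch comparing the exact axis-angle increment with the first-order one, assuming Warp’s $(x, y, z, w)$ quaternion layout (consistent with <code class="language-plaintext highlighter-rouge">wp.quat(half_w[0], half_w[1], half_w[2], 1.0)</code> in the reference code below). The helper names are illustrative, not Newton functions.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_mul(a, b):
    """Hamilton product a * b for quaternions stored as (x, y, z, w)."""
    av, aw = a[:3], a[3]
    bv, bw = b[:3], b[3]
    v = aw * bv + bw * av + np.cross(av, bv)
    w = aw * bw - av @ bv
    return np.append(v, w)


def quat_exact(dw):
    """Exact increment: rotate by |dw| radians about the axis dw / |dw|."""
    angle = np.linalg.norm(dw)
    if angle > 1e-12:
        axis = dw / angle
        return np.append(axis * np.sin(0.5 * angle), np.cos(0.5 * angle))
    return np.array([0.0, 0.0, 0.0, 1.0])  # zero increment: identity


def quat_small_angle(dw):
    """First-order increment: normalize((dw / 2, 1))."""
    q = np.append(0.5 * dw, 1.0)
    return q / np.linalg.norm(q)


r = np.array([0.0, 0.0, np.sin(0.3), np.cos(0.3)])  # current pose: 0.6 rad about z
dw = np.array([1e-3, -2e-3, 5e-4])                  # small Newton increment
r_exact = quat_mul(quat_exact(dw), r)
r_approx = quat_mul(quat_small_angle(dw), r)
r_approx = r_approx / np.linalg.norm(r_approx)
</code></pre></div></div>

<p>For this increment of about $2\times10^{-3}$ rad the two results agree to better than $10^{-9}$: the vector parts differ only at third order in the angle, so the gap grows quickly once increments become large.</p>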

<p>After all VBD iterations finish, velocities are recovered by finite difference (BDF1):</p>

\[\mathbf{v}^{n+1} = \frac{\mathbf{x}_\text{com}^{n+1} - \mathbf{x}_\text{com}^n}{h}, \qquad \boldsymbol{\omega}^{n+1} = \frac{\log\!\left((\mathbf{r}^n)^{-1} \otimes \mathbf{r}^{n+1}\right)}{h}\]
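<p>A minimal NumPy sketch of this recovery, reading $\log$ as the map from a unit quaternion to its rotation vector (axis times full angle) and again assuming the $(x, y, z, w)$ layout. Assuming $\mathbf{r}$ maps body coordinates to world coordinates, the multiplication order in the formula expresses $\boldsymbol{\omega}$ in the body frame at $\mathbf{r}^n$; the opposite order, $\mathbf{r}^{n+1}\otimes(\mathbf{r}^n)^{-1}$, gives the world-frame vector. Function names are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def quat_mul(a, b):
    """Hamilton product a * b for quaternions stored as (x, y, z, w)."""
    av, aw = a[:3], a[3]
    bv, bw = b[:3], b[3]
    return np.append(aw * bv + bw * av + np.cross(av, bv), aw * bw - av @ bv)


def quat_conj(q):
    """Inverse of a unit quaternion."""
    return np.array([-q[0], -q[1], -q[2], q[3]])


def rotation_vector(q):
    """Log map of a unit quaternion: axis * angle, taking the shortest rotation."""
    if 0.0 > q[3]:
        q = -q                      # q and -q represent the same rotation
    s = np.linalg.norm(q[:3])
    if s > 1e-12:
        return q[:3] / s * (2.0 * np.arctan2(s, q[3]))
    return 2.0 * q[:3]              # small-angle limit


def recover_velocities(x_prev, x_new, r_prev, r_new, h):
    """BDF1 velocities from the start- and end-of-step poses."""
    v = (x_new - x_prev) / h
    omega = rotation_vector(quat_mul(quat_conj(r_prev), r_new)) / h
    return v, omega


# Example: 0.1 m along x and 0.05 rad about z in a 10 ms step
h = 0.01
r0 = np.array([0.0, 0.0, 0.0, 1.0])
r1 = np.array([0.0, 0.0, np.sin(0.025), np.cos(0.025)])
v, omega = recover_velocities(np.zeros(3), np.array([0.1, 0.0, 0.0]), r0, r1, h)
</code></pre></div></div>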

<hr />

<h2 id="avbd-adaptive-penalty-for-constraints-and-contacts">AVBD: Adaptive Penalty for Constraints and Contacts</h2>

<p>For particles, force elements are elastic energies with analytic Hessians. For rigid bodies, the dominant force elements are <strong>contacts</strong> (non-penetration) and <strong>joints</strong> (relative pose targets). Both enter the same Newton system as soft penalty forces with <strong>adaptive stiffness</strong>—this is the “Augmented” in AVBD.</p>

<p>A contact with penetration depth $d &gt; 0$ contributes $E_c = \tfrac{1}{2}k_c d^2$, giving $\mathbf{f}_c = k_c d\,\hat{\mathbf{n}}$ and Hessian $k_c\,\hat{\mathbf{n}}\hat{\mathbf{n}}^T$ (with the normal held fixed). Rather than using a fixed $k_c$, the penalty grows each iteration to push the violation toward zero:</p>

\[k_c \leftarrow \min\!\left(k_c + \beta\,|C|,\; k_\text{max}\right)\]

<p>where $C$ is the constraint violation, $\beta$ is a ramp rate, and $k_\text{max}$ is the material stiffness cap. At the start of each timestep, $k_c$ is warmstarted from the previous step with a small decay:</p>

\[k_c \leftarrow \gamma\,k_c, \qquad k_c \in [k_\text{min},\; k_\text{max}]\]

<p>with $\gamma \approx 0.99$. This carries stiffness information across frames without indefinite growth.</p>
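<p>The ramp and warmstart rules can be sketched on a one-dimensional toy problem: a single particle whose inertial target $y$ sits 1 cm below a ground plane at $x = 0$, with local energy $\tfrac{m}{2h^2}(x-y)^2 + \tfrac{1}{2}k_c\max(0,-x)^2$. This only illustrates the update rules (it is not Newton’s kernel), and the values of $m/h^2$, $\beta$, $\gamma$, $k_\text{min}$ and $k_\text{max}$ are arbitrary.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def ramp_penalty(k, violation, beta, k_max):
    """Per-iteration penalty growth: min(k + beta * |C|, k_max)."""
    return min(k + beta * abs(violation), k_max)


def warmstart_penalty(k, gamma, k_min, k_max):
    """Start-of-step decay, clamped to [k_min, k_max]."""
    return min(max(gamma * k, k_min), k_max)


def solve_local_1d(y, m_over_h2, k):
    """Exact minimizer of m_over_h2/2 (x - y)^2 + k/2 max(0, -x)^2."""
    if y >= 0.0:
        return y                              # target above ground: no contact
    return m_over_h2 * y / (m_over_h2 + k)   # contact active: inertia balances penalty


k = warmstart_penalty(2.0e3, gamma=0.99, k_min=1.0e3, k_max=1.0e7)
penetrations = []
for _ in range(5):
    x = solve_local_1d(-0.01, 3600.0, k)
    d = max(0.0, -x)                          # penetration depth = constraint violation
    penetrations.append(d)
    k = ramp_penalty(k, d, beta=1.0e6, k_max=1.0e7)
</code></pre></div></div>

<p>The penetration shrinks on every iteration but stays positive for any finite $k_c$ (here $d = \tfrac{m/h^2}{m/h^2 + k_c}\,|y|$), so the stiffness has to keep growing; carrying it into the next step through the warmstart avoids re-ramping from $k_\text{min}$.</p>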

<hr />

<h2 id="the-complete-per-step-algorithm">The Complete Per-Step Algorithm</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ── Initialization ────────────────────────────────────────────────
for each body b:
    q_star[b] = forward_integrate(q[b], qd[b], f_ext, dt)
    q[b]          = q_star[b]    # initial guess = inertial target
    body_inertia_q[b] = q_star[b]

warmstart_penalties(gamma)       # k &lt;- clamp(gamma*k, k_min, k_max)
build_contact_lists(contacts)    # per-body CSR adjacency

# ── VBD Iterations ────────────────────────────────────────────────
for iter in range(N_iterations):
    for color in body_color_groups:      # Gauss-Seidel by coloring
        zero(body_forces, body_torques, body_hessians)

        for each contact c, and each body b of c that is in color:
            f, tau, H_ll, H_al, H_aa = contact_force_hessian(...)
            body_forces[b]  += f
            body_torques[b] += tau
            body_hessians[b] += (H_ll, H_al, H_aa)

        for each body b in color:
            # Inertial contributions (from r_lin, r_rot) plus accumulated contacts
            f_lin = (m/h^2) * (x_com_star - x_com) + body_forces[b]
            theta = quat_rotate(r, log(r^-1 * r_star))   # rotation vector to target, in world frame
            f_ang = (I_world/h^2) * theta + body_torques[b]
            H_ll  = (m/h^2)*I3 + H_ll_contacts
            H_al  =                H_al_contacts
            H_aa  = I_world/h^2  + H_aa_contacts

            for each joint adjacent to b:
                f_lin, f_ang, H_ll, H_al, H_aa += joint_force_hessian(b, j)

            # Schur complement solve
            L_M = chol(H_ll)                                # H_ll = L_M L_M^T
            S   = H_aa - H_al @ chol_solve(L_M, H_al.T)     # chol_solve(L, B) solves L L^T X = B
            dw  = chol_solve(chol(S), f_ang - H_al @ chol_solve(L_M, f_lin))
            dx  = chol_solve(L_M, f_lin - H_al.T @ dw)

            x_com += dx
            r = normalize(quat(0.5*dw, 1) * r)    # small-angle increment (vector part, scalar part)

    # Dual update after each sweep
    for each contact c:
        k_c = min(k_c + beta * |penetration_c|, k_max_c)
    for each joint j:
        k_j = min(k_j + beta * |C_j|,           k_max_j)

# ── Finalization ──────────────────────────────────────────────────
for each body b:
    v[b]     = (x_com[b] - x_com_prev[b]) / dt
    omega[b] = quat_velocity(r[b], r_prev[b], dt)
    body_q_prev[b] = body_q[b]
</code></pre></div></div>

<hr />

<h2 id="reference-code">Reference Code</h2>

<p>The implementation lives in <a href="https://github.com/newton-physics/newton/blob/main/newton/_src/solvers/vbd/">Newton’s VBD solver</a>. Key files:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">rigid_vbd_kernels.py</code> — GPU kernels: <code class="language-plaintext highlighter-rouge">forward_step_rigid_bodies</code>, <code class="language-plaintext highlighter-rouge">solve_rigid_body</code>, <code class="language-plaintext highlighter-rouge">update_duals_body_body_contacts</code>, <code class="language-plaintext highlighter-rouge">update_duals_joint</code>, <code class="language-plaintext highlighter-rouge">update_body_velocity</code></li>
  <li><code class="language-plaintext highlighter-rouge">solver_vbd.py</code> — orchestration in <code class="language-plaintext highlighter-rouge">SolverVBD.step()</code> and <code class="language-plaintext highlighter-rouge">_solve_rigid_body_iteration()</code></li>
</ul>

<p>The Schur complement solve from <code class="language-plaintext highlighter-rouge">solve_rigid_body</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Regularize H_aa
</span><span class="n">trA</span> <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">trace</span><span class="p">(</span><span class="n">h_aa</span><span class="p">)</span> <span class="o">/</span> <span class="mf">3.0</span>
<span class="n">eps</span> <span class="o">=</span> <span class="mf">1e-9</span> <span class="o">*</span> <span class="p">(</span><span class="n">trA</span> <span class="o">+</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="n">h_aa</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span> <span class="o">+=</span> <span class="n">eps</span><span class="p">;</span>  <span class="n">h_aa</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">]</span> <span class="o">+=</span> <span class="n">eps</span><span class="p">;</span>  <span class="n">h_aa</span><span class="p">[</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">]</span> <span class="o">+=</span> <span class="n">eps</span>

<span class="c1"># Factorize H_ll
</span><span class="n">Lm</span> <span class="o">=</span> <span class="n">chol33</span><span class="p">(</span><span class="n">h_ll</span><span class="p">)</span>

<span class="c1"># Compute H_ll^{-1} * H_al^T, column by column
</span><span class="n">X0</span> <span class="o">=</span> <span class="n">chol33_solve</span><span class="p">(</span><span class="n">Lm</span><span class="p">,</span> <span class="n">h_al</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">X1</span> <span class="o">=</span> <span class="n">chol33_solve</span><span class="p">(</span><span class="n">Lm</span><span class="p">,</span> <span class="n">h_al</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="n">X2</span> <span class="o">=</span> <span class="n">chol33_solve</span><span class="p">(</span><span class="n">Lm</span><span class="p">,</span> <span class="n">h_al</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="n">MinvCt</span> <span class="o">=</span> <span class="n">mat33_from_columns</span><span class="p">(</span><span class="n">X0</span><span class="p">,</span> <span class="n">X1</span><span class="p">,</span> <span class="n">X2</span><span class="p">)</span>

<span class="c1"># Schur complement and solve
</span><span class="n">S</span>     <span class="o">=</span> <span class="n">h_aa</span> <span class="o">-</span> <span class="n">h_al</span> <span class="o">@</span> <span class="n">MinvCt</span>
<span class="n">Ls</span>    <span class="o">=</span> <span class="n">chol33</span><span class="p">(</span><span class="n">S</span><span class="p">)</span>
<span class="n">rhs_w</span> <span class="o">=</span> <span class="n">f_ang</span> <span class="o">-</span> <span class="n">h_al</span> <span class="o">@</span> <span class="n">chol33_solve</span><span class="p">(</span><span class="n">Lm</span><span class="p">,</span> <span class="n">f_lin</span><span class="p">)</span>
<span class="n">dw</span>    <span class="o">=</span> <span class="n">chol33_solve</span><span class="p">(</span><span class="n">Ls</span><span class="p">,</span> <span class="n">rhs_w</span><span class="p">)</span>           <span class="c1"># angular increment
</span><span class="n">dx</span>    <span class="o">=</span> <span class="n">chol33_solve</span><span class="p">(</span><span class="n">Lm</span><span class="p">,</span> <span class="n">f_lin</span> <span class="o">-</span> <span class="n">wp</span><span class="p">.</span><span class="n">transpose</span><span class="p">(</span><span class="n">h_al</span><span class="p">)</span> <span class="o">@</span> <span class="n">dw</span><span class="p">)</span>  <span class="c1"># linear
</span>
<span class="c1"># Apply (small-angle approximation)
</span><span class="n">half_w</span> <span class="o">=</span> <span class="n">dw</span> <span class="o">*</span> <span class="mf">0.5</span>
<span class="n">dq</span>     <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">normalize</span><span class="p">(</span><span class="n">wp</span><span class="p">.</span><span class="n">quat</span><span class="p">(</span><span class="n">half_w</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">half_w</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">half_w</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="n">r_new</span>  <span class="o">=</span> <span class="n">wp</span><span class="p">.</span><span class="n">normalize</span><span class="p">(</span><span class="n">dq</span> <span class="o">*</span> <span class="n">r_current</span><span class="p">)</span>
<span class="n">x_com_new</span> <span class="o">=</span> <span class="n">x_com</span> <span class="o">+</span> <span class="n">dx</span>
</code></pre></div></div>

<p>AVBD dual updates, run after each full Gauss-Seidel sweep over the color groups:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Contact penalty (update_duals_body_body_contacts)
</span><span class="n">penetration</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">thickness</span> <span class="o">-</span> <span class="n">dot</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">p1_world</span> <span class="o">-</span> <span class="n">p0_world</span><span class="p">))</span>
<span class="n">k</span><span class="p">[</span><span class="n">contact</span><span class="p">]</span>  <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">k</span><span class="p">[</span><span class="n">contact</span><span class="p">]</span> <span class="o">+</span> <span class="n">beta</span> <span class="o">*</span> <span class="n">penetration</span><span class="p">,</span> <span class="n">k_max</span><span class="p">[</span><span class="n">contact</span><span class="p">])</span>

<span class="c1"># Joint penalty (update_duals_joint), e.g. BALL joint
</span><span class="n">C_lin</span>    <span class="o">=</span> <span class="n">length</span><span class="p">(</span><span class="n">x_child_frame</span> <span class="o">-</span> <span class="n">x_parent_frame</span><span class="p">)</span>
<span class="n">k</span><span class="p">[</span><span class="n">joint</span><span class="p">]</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">k</span><span class="p">[</span><span class="n">joint</span><span class="p">]</span> <span class="o">+</span> <span class="n">beta</span> <span class="o">*</span> <span class="n">C_lin</span><span class="p">,</span> <span class="n">k_max</span><span class="p">[</span><span class="n">joint</span><span class="p">])</span>
</code></pre></div></div>

<hr />

<h2 id="whats-next">What’s Next</h2>

<p>This covers the full pipeline for free rigid bodies: continuous Newton-Euler dynamics, the pose-increment formulation of backward Euler, the resulting 6×6 Newton system and its Schur complement solve, and the AVBD adaptive penalty mechanism for contacts. Each body is updated as an independent local solve within its color group, matching the exact VBD pattern from the particle solver—just 6 DoF instead of 3.</p>

<p>Section II will cover <strong>articulated bodies</strong>: joint constraints, the rotation-vector curvature error for cable/fixed joints, and how the adjacency graph coloring extends to joint chains.</p>

<hr />

<p><em>Newton: <a href="https://github.com/newton-physics/newton">github.com/newton-physics/newton</a></em></p>

<p><em>VBD paper: Anka He Chen, Ziheng Liu, Yin Yang, Cem Yuksel. “Vertex Block Descent.” ACM Trans. Graph. 43, 4, Article 116 (2024). <a href="https://doi.org/10.1145/3658179">doi:10.1145/3658179</a></em></p>]]></content><author><name>Anka He Chen</name><email>ankachan92@gmail.com</email></author><category term="physics-simulation" /><category term="VBD" /><category term="rigid-body" /><category term="computer-graphics" /><category term="SIGGRAPH" /><summary type="html"><![CDATA[In the VBD paper (SIGGRAPH 2024), we briefly discuss extending Vertex Block Descent to rigid body simulation. The idea is natural: instead of updating a single vertex with 3 DoF, you update an entire rigid body with 6 DoF. But the details matter. This post walks through the full derivation—from the continuous Newton-Euler equations, to discrete backward Euler as a nonlinear system, to the Schur complement solve you actually run each iteration—with reference code from Newton, which implements this approach under the name AVBD (Augmented VBD).]]></summary></entry><entry><title type="html">Implementing VBD Damping Properly</title><link href="https://ankachan.github.io/posts/2026/03/implementing-vbd-properly/" rel="alternate" type="text/html" title="Implementing VBD Damping Properly" /><published>2026-03-17T00:00:00-07:00</published><updated>2026-03-17T00:00:00-07:00</updated><id>https://ankachan.github.io/posts/2026/03/implementing-vbd-properly</id><content type="html" xml:base="https://ankachan.github.io/posts/2026/03/implementing-vbd-properly/"><![CDATA[<p>Vertex Block Descent (VBD) is a physics solver we published at SIGGRAPH 2024 for elastic body dynamics. It offers unconditional stability, excellent GPU parallelism, and fast convergence to implicit Euler solutions. While the paper covers the formulation comprehensively, actually implementing VBD correctly—especially the damping—turns out to be subtler than it first appears. 
This post discusses the key pitfalls and how to get them right, based on lessons learned during development with NVIDIA Warp.</p>

<h2 id="what-is-vbd-in-a-nutshell">What Is VBD, in a Nutshell?</h2>

<p>VBD solves the variational form of implicit Euler:</p>

\[\mathbf{x}^{t+1} = \underset{\mathbf{x}}{\operatorname{argmin}} \; G(\mathbf{x}) = \frac{1}{2h^2} \| \mathbf{x} - \mathbf{y} \|_M^2 + E(\mathbf{x})\]

<p>Instead of assembling and solving a massive global linear system (as Newton’s method would), VBD updates <strong>one vertex at a time</strong>, solving a tiny 3×3 local system:</p>

\[\mathbf{H}_i \, \Delta\mathbf{x}_i = \mathbf{f}_i\]

<p>where $\mathbf{H}_i$ is the local Hessian and $\mathbf{f}_i$ is the total force on vertex $i$, both assembled only from force elements that touch vertex $i$. This is essentially <strong>block Gauss-Seidel</strong> on the vertex positions. Each local solve is cheap (a 3×3 analytical inverse), and because we color vertices rather than elements, we typically need only 6–9 colors for parallelization—an order of magnitude fewer than element-based coloring.</p>
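<p>To make the local solve concrete, here is a minimal NumPy sketch of one vertex update for a mass-spring system. The function name and the spring energy $\tfrac{k}{2}(\lVert\mathbf{x}_i-\mathbf{x}_j\rVert - L)^2$ are assumptions for illustration, and clamping the transverse Hessian term to keep the 3×3 block positive semi-definite is a common choice assumed here, not necessarily what the paper or Newton does.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def vbd_vertex_step(x_i, y_i, mass, h, neighbors, stiffness, rest_lengths):
    """One VBD local solve H_i dx_i = f_i for vertex i, holding its neighbors fixed."""
    f = -(mass / h**2) * (x_i - y_i)     # inertial force toward the target y_i
    H = (mass / h**2) * np.eye(3)        # inertial Hessian
    for x_j, k, L in zip(neighbors, stiffness, rest_lengths):
        d = x_i - x_j
        l = np.linalg.norm(d)
        u = d / l
        f -= k * (l - L) * u             # -dE/dx_i for E = k/2 (l - L)^2
        uuT = np.outer(u, u)
        # Exact spring Hessian: k * (uu^T + (1 - L/l)(I - uu^T)). The transverse
        # term is clamped at zero so the block stays positive semi-definite.
        H += k * (uuT + max(0.0, 1.0 - L / l) * (np.eye(3) - uuT))
    return x_i + np.linalg.solve(H, f)   # dx_i = H_i^{-1} f_i


# One vertex tied to a fixed neighbor at the origin by a unit-rest-length spring
x_new = vbd_vertex_step(
    x_i=np.array([1.5, 0.0, 0.0]), y_i=np.array([1.2, 0.0, 0.0]),
    mass=1.0, h=0.1,
    neighbors=[np.zeros(3)], stiffness=[100.0], rest_lengths=[1.0],
)
</code></pre></div></div>

<p>Along the spring axis this local energy is quadratic, so the single step lands exactly on the local minimizer $x = 1.1$, halfway between the inertial target and the spring’s rest position. In a full solver the same update runs in parallel for every vertex of one color before moving on to the next color.</p>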

<p>The critical guarantee: every local solve that reduces $G_i$ also reduces the global energy $G$, giving us <strong>unconditional stability</strong> even with a single iteration per time step.</p>

<h2 id="the-damping-trap-why-naïve-rayleigh-damping-breaks-physics">The Damping Trap: Why Naïve Rayleigh Damping Breaks Physics</h2>

<p>The paper describes Rayleigh stiffness-proportional damping as modifying the force and Hessian:</p>

\[\mathbf{f}_i = -\frac{m_i}{h^2}(\mathbf{x}_i - \mathbf{y}_i) - \sum_{j \in \mathcal{F}_i} \frac{\partial E_j}{\partial \mathbf{x}_i} - \left(\sum_{j \in \mathcal{F}_i} \frac{k_d}{h} \frac{\partial^2 E_j}{\partial \mathbf{x}_i^2}\right)(\mathbf{x}_i - \mathbf{x}_i^t)\]

<p>This looks straightforward: take the stiffness Hessian, scale by $k_d/h$, multiply by the displacement (which approximates $h \cdot v$), and add to the force. However, there is a critical implementation subtlety that is easy to miss.</p>

<h3 id="the-bug-damping-that-kills-free-fall">The Bug: Damping That Kills Free Fall</h3>

<p>A naïve implementation might do something like:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">displacement</span> <span class="o">=</span> <span class="n">x_prev</span> <span class="o">-</span> <span class="n">x_current</span>
<span class="n">h_d</span> <span class="o">=</span> <span class="n">hessian</span> <span class="o">*</span> <span class="p">(</span><span class="n">damping</span> <span class="o">/</span> <span class="n">dt</span><span class="p">)</span>
<span class="n">f_d</span> <span class="o">=</span> <span class="n">h_d</span> <span class="o">*</span> <span class="n">displacement</span>
</code></pre></div></div>

<p>This applies damping proportional to the <strong>absolute velocity</strong> of the vertex. The problem is immediate: a freely falling object has nonzero absolute velocity, so this damping fights gravity. Objects fall more slowly than they should, and stacked objects behave as if embedded in molasses.</p>

<p>The mathematical reason: for full Rayleigh damping, the damping force on vertex $i$ is:</p>

\[\mathbf{f}_{d,i} = -k_d \sum_j \mathbf{K}_{ij} \mathbf{v}_j\]

<p>where $\mathbf{K}_{ij} = \partial^2 E/\partial\mathbf{x}_i\partial\mathbf{x}_j$. If the entire system translates rigidly ($\mathbf{v}_j = \mathbf{v}$ for all $j$), then $\mathbf{f}_{d,i} = -k_d (\sum_j \mathbf{K}_{ij}) \mathbf{v}$. For any translation-invariant energy, $\sum_j \mathbf{K}_{ij} = \mathbf{0}$, so the damping force vanishes. But in VBD, we only have the <strong>diagonal block</strong> $\mathbf{K}_{ii}$, and $\mathbf{K}_{ii} \mathbf{v} \neq \mathbf{0}$ in general.</p>
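<p>A two-vertex spring makes the difference easy to see. In the NumPy sketch below (illustrative values; <code class="language-plaintext highlighter-rouge">k_d</code> plays the role of the damping coefficient), both vertices fall with the same velocity: the full stiffness rows give zero damping force, while the diagonal block alone produces an upward force on a downward-moving vertex.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def spring_hessian_block(x0, x1, k, rest):
    """d2E/dx0^2 (equal to d2E/dx1^2) for E = k/2 (|x1 - x0| - rest)^2."""
    d = x1 - x0
    l = np.linalg.norm(d)
    u = d / l
    uuT = np.outer(u, u)
    return k * (uuT + (1.0 - rest / l) * (np.eye(3) - uuT))


x0 = np.array([0.0, 0.0, 0.0])
x1 = np.array([1.2, 0.0, 0.0])                # spring stretched past its rest length
K_00 = spring_hessian_block(x0, x1, k=100.0, rest=1.0)
K = np.block([[K_00, -K_00], [-K_00, K_00]])  # full 6x6 stiffness; off-diagonal blocks are -K_00

k_d = 0.01
v = np.array([0.0, -3.0, 0.0])                # both vertices falling together
f_full = -k_d * K @ np.concatenate([v, v])   # full Rayleigh rows: sum_j K_ij v_j cancels
f_diag_only = -k_d * K_00 @ v                # diagonal block only, for vertex 0
</code></pre></div></div>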

<h3 id="the-fix-damp-the-internal-variable-not-the-position">The Fix: Damp the Internal Variable, Not the Position</h3>

<p>The solution is to formulate damping in terms of <strong>internal variables</strong>—quantities that are inherently invariant to rigid motion.</p>

<p><strong>For volumetric elasticity</strong>, the internal variable is the deformation gradient $\mathbf{F} = \mathbf{D}_s \mathbf{D}_m^{-1}$, where $\mathbf{D}_s = [\mathbf{x}_1 - \mathbf{x}_0, \mathbf{x}_2 - \mathbf{x}_0, \mathbf{x}_3 - \mathbf{x}_0]$. Its rate of change is:</p>

\[\dot{\mathbf{F}} = \dot{\mathbf{D}}_s \mathbf{D}_m^{-1}\]

<p>where $\dot{\mathbf{D}}_s = [\mathbf{v}_1 - \mathbf{v}_0, \mathbf{v}_2 - \mathbf{v}_0, \mathbf{v}_3 - \mathbf{v}_0]$. Notice: $\dot{\mathbf{F}}$ depends only on <strong>relative velocities</strong>. If all four vertices move with the same velocity, $\dot{\mathbf{D}}_s = \mathbf{0}$, so $\dot{\mathbf{F}} = \mathbf{0}$. No damping.</p>

<p>The damping stress is then:</p>

\[\mathbf{P}_{\text{damp}} = k_d \cdot \frac{\partial^2 E}{\partial \mathbf{F}^2} : \dot{\mathbf{F}}\]

<p>And the force on vertex $i$ is assembled as $\mathbf{f}_{d,i} = -V_0 \, \mathbf{G}_i^T \text{vec}(\mathbf{P}_{\text{damp}})$, where $\mathbf{G}_i = \partial \text{vec}(\mathbf{F}) / \partial \mathbf{x}_i$ is the 9×3 matrix mapping vertex displacements to flattened deformation gradient changes.</p>
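<p>A NumPy sketch of this assembly for a single tetrahedron, assuming the simple quadratic energy density $\tfrac{\mu}{2}\lVert\mathbf{F}-\mathbf{I}\rVert_F^2$ so that $\partial^2 E/\partial\mathbf{F}^2$ is just $\mu$ times the identity (a real material model would supply its own Hessian). Instead of forming the 9×3 matrices $\mathbf{G}_i$, it writes $\mathbf{F} = \sum_i \mathbf{x}_i\mathbf{b}_i^T$, where $\mathbf{b}_1,\mathbf{b}_2,\mathbf{b}_3$ are the rows of $\mathbf{D}_m^{-1}$ and $\mathbf{b}_0 = -(\mathbf{b}_1+\mathbf{b}_2+\mathbf{b}_3)$; then $\mathbf{G}_i^T\text{vec}(\mathbf{P}) = \mathbf{P}\mathbf{b}_i$. Function names and parameter values are illustrative.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def tet_shape_vectors(X):
    """Rest volume V0 and vectors b_i (rows of b) such that F = sum_i outer(x_i, b_i)."""
    Dm = np.column_stack([X[1] - X[0], X[2] - X[0], X[3] - X[0]])
    b = np.zeros((4, 3))
    b[1:] = np.linalg.inv(Dm)        # row k-1 of Dm^{-1} multiplies x_k
    b[0] = -b[1:].sum(axis=0)        # x_0 enters every column of D_s with a minus sign
    return abs(np.linalg.det(Dm)) / 6.0, b


def tet_damping_forces(V0, b, v, k_d, mu):
    """Damping forces for the energy density mu/2 ||F - I||^2 (its Hessian is mu * I)."""
    F_dot = v.T @ b                  # sum_i outer(v_i, b_i): only relative velocities matter
    P_damp = k_d * mu * F_dot        # k_d * (d2E/dF2 : F_dot) for this energy
    return -V0 * b @ P_damp.T        # row i: -V0 * P_damp @ b_i


X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
V0, b = tet_shape_vectors(X)
v_fall = np.tile([0.0, -9.8, 0.0], (4, 1))   # rigid translation
v_stretch = np.zeros((4, 3))
v_stretch[1] = [0.5, 0.0, 0.0]               # vertex 1 pulling away along x
f_fall = tet_damping_forces(V0, b, v_fall, k_d=0.1, mu=1.0e4)
f_stretch = tet_damping_forces(V0, b, v_stretch, k_d=0.1, mu=1.0e4)
</code></pre></div></div>

<p>The rigid fall produces exactly zero force, the stretching case gives vertex 1 a force that opposes its motion, and the four forces always sum to zero because the $\mathbf{b}_i$ do.</p>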

<p><strong>For dihedral-angle bending</strong>, the internal variable is the dihedral angle $\theta$ between two adjacent triangles. The angular velocity is:</p>

\[\dot{\theta} = \sum_{j=0}^{3} \frac{\partial \theta}{\partial \mathbf{x}_j} \cdot \mathbf{v}_j\]

<p>Since $\theta$ depends only on relative positions, we have $\sum_j \frac{\partial \theta}{\partial \mathbf{x}_j} = \mathbf{0}$. For rigid translation ($\mathbf{v}_j = \mathbf{v}$), $\dot{\theta} = \mathbf{v} \cdot \sum_j \frac{\partial \theta}{\partial \mathbf{x}_j} = 0$. The damping force is:</p>

\[\mathbf{f}_{d,i} = -c \, \dot{\theta} \, \frac{\partial \theta}{\partial \mathbf{x}_i}\]

<p>Think of it like a door hinge with a damper: the damper resists opening/closing, but if you translate the entire door frame, the damper does nothing.</p>
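<p>The same check for bending, as a NumPy sketch. To keep it short, the gradient $\partial\theta/\partial\mathbf{x}_j$ is taken by central differences rather than with the analytic formula a real implementation would use; the stencil convention (triangles $(\mathbf{x}_0,\mathbf{x}_1,\mathbf{x}_2)$ and $(\mathbf{x}_1,\mathbf{x}_0,\mathbf{x}_3)$ sharing edge $\mathbf{x}_0\mathbf{x}_1$), the function names and the value of $c$ are assumptions.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def dihedral_angle(x):
    """Signed angle between the normals of triangles (x0, x1, x2) and (x1, x0, x3)."""
    e = x[1] - x[0]
    n1 = np.cross(e, x[2] - x[0])
    n2 = np.cross(x[3] - x[0], e)
    sin_part = np.cross(n1, n2) @ e / np.linalg.norm(e)
    return np.arctan2(sin_part, n1 @ n2)


def dihedral_gradient(x, eps=1e-6):
    """d(theta)/d(x_j) for all four vertices by central differences, shape (4, 3)."""
    g = np.zeros_like(x)
    for j in range(4):
        for a in range(3):
            xp = x.copy()
            xm = x.copy()
            xp[j, a] += eps
            xm[j, a] -= eps
            g[j, a] = (dihedral_angle(xp) - dihedral_angle(xm)) / (2.0 * eps)
    return g


def bending_damping_forces(x, v, c):
    """f_i = -c * theta_dot * d(theta)/d(x_i), with theta_dot = sum_j d(theta)/d(x_j) . v_j."""
    g = dihedral_gradient(x)
    theta_dot = np.sum(g * v)
    return -c * theta_dot * g


x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.5]])
v_translate = np.tile([0.0, -9.8, 0.0], (4, 1))  # whole hinge moving rigidly
v_fold = np.zeros((4, 3))
v_fold[3] = [0.0, 0.0, 1.0]                       # one wing folding
f_translate = bending_damping_forces(x, v_translate, c=0.5)
f_fold = bending_damping_forces(x, v_fold, c=0.5)
</code></pre></div></div>

<p>The rigid translation gives forces at round-off level (the finite-difference gradients sum to zero only approximately), while folding one wing produces a force that resists the folding, spread across the stencil.</p>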

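<p><strong>For collision damping</strong>, the internal variable is the gap distance $d$ between contact points. Using signed barycentric weights $b_j$ that sum to zero ($\sum_j b_j = 0$; for a vertex-triangle contact, $+1$ on the vertex and minus the barycentric coordinates on the triangle corners), the gap rate is:</p>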

\[\dot{d} = \sum_j b_j \, (\hat{\mathbf{n}} \cdot \mathbf{v}_j)\]

<p>Again, rigid translation produces $\dot{d} = (\hat{\mathbf{n}} \cdot \mathbf{v}) \sum_j b_j = 0$.</p>
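<p>A sketch for a vertex-triangle contact, assuming the vertex gets weight $+1$ and the triangle corners get minus their barycentric coordinates (here $0.2$, $0.3$, $0.5$), so the weights sum to zero. Since $\partial d/\partial\mathbf{x}_j = b_j\hat{\mathbf{n}}$ for a fixed normal, the damping force on each vertex is $-c\,\dot{d}\,b_j\hat{\mathbf{n}}$; the names and values are illustrative.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def contact_damping_forces(n, weights, v, c):
    """Gap-rate damping: d_dot = sum_j b_j (n . v_j), f_j = -c * d_dot * b_j * n."""
    d_dot = np.sum(weights * (v @ n))
    return -c * d_dot * np.outer(weights, n)


n = np.array([0.0, 0.0, 1.0])                 # contact normal, pointing toward the vertex
weights = np.array([1.0, -0.2, -0.3, -0.5])  # vertex first, then triangle corners
v_rigid = np.tile([0.4, 0.0, -2.0], (4, 1))   # vertex and triangle moving together
v_approach = np.zeros((4, 3))
v_approach[0] = [0.0, 0.0, -1.0]              # vertex moving into the triangle
f_rigid = contact_damping_forces(n, weights, v_rigid, c=10.0)
f_approach = contact_damping_forces(n, weights, v_approach, c=10.0)
</code></pre></div></div>

<p>The approaching vertex is pushed back along $\hat{\mathbf{n}}$ while the triangle corners receive the opposite reaction in proportion to their barycentric coordinates, so the total force is zero.</p>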

<h3 id="the-general-principle">The General Principle</h3>

<p>For <strong>any</strong> energy $E = f(q)$ where $q$ is a translation-invariant internal variable:</p>

\[\sum_j \frac{\partial q}{\partial \mathbf{x}_j} = \mathbf{0}\]

<p>This guarantees $\dot{q} = 0$ for rigid translation, and therefore zero damping force. If $q$ is also rotation-invariant (like edge lengths and dihedral angles), then rigid rotation is also undamped.</p>

<p>The pattern for VBD is always:</p>
<ol>
  <li><strong>Force</strong>: compute using all vertices in the stencil (exact relative velocity information)</li>
  <li><strong>Hessian</strong>: use only the diagonal block (the standard VBD approximation)</li>
</ol>
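<p>The two-part pattern can be written once for any scalar internal variable $q$. The NumPy sketch below (function names, the damping coefficient $c$ and the time step are illustrative) computes $\dot{q}$ from every vertex in the stencil, while the Hessian returned for vertex $i$ is only the diagonal block $\tfrac{c}{h}\,\tfrac{\partial q}{\partial\mathbf{x}_i}\tfrac{\partial q}{\partial\mathbf{x}_i}^T$, obtained by differentiating the damping force with $\mathbf{v}_i = (\mathbf{x}_i - \mathbf{x}_i^t)/h$ and neglecting the change of $\partial q/\partial\mathbf{x}_i$. Edge length serves as the example $q$ because it is invariant to both translation and rotation.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np


def internal_variable_damping(i, grads, v, h, c):
    """Damping force and diagonal Hessian block for vertex i of a stencil."""
    q_dot = np.sum(grads * v)                      # force side: every vertex in the stencil
    f_i = -c * q_dot * grads[i]
    H_ii = (c / h) * np.outer(grads[i], grads[i])  # Hessian side: vertex i block only
    return f_i, H_ii


def edge_length_grads(x):
    """dq/dx_j for q = |x1 - x0|; the two rows sum to zero."""
    u = (x[1] - x[0]) / np.linalg.norm(x[1] - x[0])
    return np.array([-u, u])


x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
g = edge_length_grads(x)
h, c = 0.01, 2.0
v_translate = np.tile([0.0, -9.8, 0.0], (2, 1))
v_rotate = np.cross([0.3, -0.2, 1.0], x)       # rigid rotation: v_j = cross(omega, x_j)
v_stretch = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
f_t, _ = internal_variable_damping(1, g, v_translate, h, c)
f_r, _ = internal_variable_damping(1, g, v_rotate, h, c)
f_s, H_s = internal_variable_damping(1, g, v_stretch, h, c)
</code></pre></div></div>

<p>Translation and rotation both give zero force, while stretching the edge gives $\mathbf{f}_1 = (-0.4, 0, 0)$. Swapping in the gradients of another internal variable, such as the dihedral angle $\theta$ or the contact gap $d$, reuses the same function unchanged; only the gradient computation depends on the energy.</p>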

<p>This asymmetry is fundamental to VBD: forces are exact, Hessians are approximate. Off-diagonal coupling is recovered through iteration.</p>]]></content><author><name>Anka He Chen</name><email>ankachan92@gmail.com</email></author><category term="physics-simulation" /><category term="VBD" /><category term="computer-graphics" /><category term="SIGGRAPH" /><summary type="html"><![CDATA[Vertex Block Descent (VBD) is a physics solver we published at SIGGRAPH 2024 for elastic body dynamics. It offers unconditional stability, excellent GPU parallelism, and fast convergence to implicit Euler solutions. While the paper covers the formulation comprehensively, actually implementing VBD correctly—especially the damping—turns out to be subtler than it first appears. This post discusses the key pitfalls and how to get them right, based on lessons learned during development with NVIDIA Warp.]]></summary></entry><entry><title type="html">Welcome to My Blog</title><link href="https://ankachan.github.io/posts/2024/01/welcome/" rel="alternate" type="text/html" title="Welcome to My Blog" /><published>2024-01-01T00:00:00-08:00</published><updated>2024-01-01T00:00:00-08:00</updated><id>https://ankachan.github.io/posts/2024/01/welcome</id><content type="html" xml:base="https://ankachan.github.io/posts/2024/01/welcome/"><![CDATA[<p>Welcome to my blog! I’ll be sharing updates about my research and projects here.</p>

<p>Stay tuned for future posts!</p>]]></content><author><name>Anka He Chen</name><email>ankachan92@gmail.com</email></author><category term="welcome" /><summary type="html"><![CDATA[Welcome to my blog! I’ll be sharing updates about my research and projects here.]]></summary></entry></feed>