Algorithms · build notes

How a graph draws itself

I built a small viewer that versions a knowledge graph — 162 nodes, 269 edges — and shows what changed between any two revisions. Two problems needed real algorithms, not hand-waving: place nodes that arrive with no coordinates, and compare two versions instantly. Press Play on the figure and step at your own pace.

Q1
Lay out 162 nodes that have no x/y — legibly — without an O(n²) blow-up as it scales.
Q2
Show exactly what changed between two versions, in a single pass.
01
The problem

A graph with no coordinates

The data is pure structure — nodes and edges, no positions. To draw it as the organic web you actually want to read, you run a physics simulation and let it find the coordinates for you.

Two forces do the work. Every edge is a spring that pulls its two endpoints together; every pair of nodes carries a like charge that pushes them apart. A faint gravity keeps disconnected pieces from drifting off screen. Let it run and connected clusters tighten while the whole graph breathes out into legible space. The springs are cheap — one per edge, O(E). The repulsion is where the cost hides.

02
The trap

Every pair pushes

Naive repulsion compares all C(n, 2) pairs on every tick — that is O(n²).

At 162 nodes that is ~13,000 interactions per frame: fine. But this viewer is built to scale to a thousand-node graph, where O(n²) is ~500,000 interactions every frameuntil the layout settles. That quadratic term is the thing to kill — and the fix is a classic: you don’t need every distant node individually, because a faraway cluster pushes on you almost exactly like a single blob sitting at its center of mass. Approximate the far field; compute the near field exactly.

03
The centerpiece

Barnes–Hut: approximate the far field

Build a quadtree over the nodes, give every cell a center of mass, then for each node walk the tree: if a cell looks small from where you stand — its width s over its distance d falls below a threshold θ — treat the whole thing as one point; otherwise open it and recurse.

Barnes–Hut · approximate the far field
interactions for the target noden−1 = 12
target
A graph with no coordinates — just nodes and the edges between them. A force layout will arrange it: connected nodes attract, and every node repels every other.
01 / 06
repulsion per tick: O(n log n) with the quadtree  vs  O(n²) all-pairs
θ is the knob
θ → 0 opens every cell: exact, and slow. A larger θ approximates more aggressively: faster, rougher. The quadtree (Barnes–Hut) with its standard opening criterion is the right tool for the thousand-node target, not just today’s 162.
04
The algorithm itself

Walk the tree, one cell at a time

The overview showed the shape of the idea. Now watch the recursion actually execute. Step through force(target, cell) for one node and see where every interaction goes — the far cluster collapsing to a single pull is what turns 12 into 7.

force(target, cell) · the θ-walk, stepped
interactions on the target0vs naive n−1 = 12
target
execution · force(target, cell)
1force(target, cell):
 2 if cell.mass == 0: return
 3 if cell is a single node:
 4 if it is target: return
 5 add its exact repulsion
 6 return
 7 s = cell.size
 8 d = dist(target, cell.com)
 9 if s / d < θ:
 10 add repulsion from cell.com
 11 else:
 12 for child in cell.children: force(target, child)
watch · variables right now
targetnode 0
the node we're computing the net push on
θ0.9
opening threshold — smaller is more exact, slower
interactions0
force terms so far · naive would be n−1 = 12
Call force(target, cell) on each root quadrant, depth-first. Every cell is approximated, opened, or handled as a single node.
We have the quadtree and every cell’s center of mass. Now run the algorithm itself: force(target, cell), depth-first from the root’s four quadrants. Each cell, one question — far enough to fake?
01 / 13
Read the watch panel
Every cell asks one question — s / d < θ? — and takes one of three branches: approximate a far cell by its center of mass, open a near one and recurse, or handle a lone node directly. Multiply that ~log n cells per node across all n nodes and you get Barnes–Hut’s O(n log n).
05
From one cheap tick to a finished layout

Will it settle — and settle the same way every time?

Barnes–Hut made a single tick cheap. But a layout is not one tick — it is thousands, each nudging every node a little. Two questions decide whether that loop hands you a picture worth looking at: does it ever stop, and does it stop in the same place every time?

Does it stop? Each tick sums the forces and advances positions with velocity-Verlet — steadier than plain Euler, and it keeps a node from rocketing off when two repulsions briefly stack up. A cooling schedule then caps how far any node may move in a tick and decays that cap geometrically — the simulated-annealing idea: big rearrangements early, fine polish late. When the largest move in a tick slips below ε, the layout freezes and stops redrawing.

Cooling schedule · big moves early, freeze latefig
vertical: largest node move in one tick  ·  horizontal: ticks →
ε · freeze thresholdfreeze · CPU idle
Each tick’s largest move shrinks ×0.8; once it dips below ε the layout freezes and stops redrawing.
Does it settle the same way? — and why that is this page’s hinge
The starting scatter comes from a seeded xorshift32 generator, never Math.random — same seed, same final picture, run after run. That reproducibility is not a nicety. Because each version settles to one fixed layout (computed once and memoized per version, positions kept in a parallel Float32Array), you can step from one revision to the next and the graph holds still instead of reshuffling under you. A stable picture per version is exactly what makes “what changed between versions?” — the next section — a question you can even ask.
06
The second problem

What changed? Diff as a set difference

Comparing two versions sounds like graph-walking, but a diff is just a set difference — so hash it.

Diff · hash-join set differencefig
v001 · parent
seed
149 nodes · 243 edges
hash join
v002 · selected
information-loop
162 nodes · 269 edges
149
unchanged
+13
added
0
removed
0
changed
key(node) = id  ·  key(edge) = subject|edge|object
drop one version’s keys into a Map, sweep the other → every lookup O(1), whole diff O(V + E)

Give every node an identity (its id) and every edge an identity (the triple subject|edge|object). Drop one version’s keys into a Map, then sweep the other: present in both is unchanged — and if the payload differs, changed, carrying a field-level delta; present only in the new version is added; present only in the old is removed. Every lookup is O(1), so the whole comparison is a single O(V + E) pass with no replay.

The first thing it had to get right was the seed graph (149 / 243) against the information-loop graph (162 / 269): the diff reports +13 nodes and +26 edges, nothing removed — exactly the nodes and edges that one ingestion added. A hash join, and the numbers fall out.

07
Every choice, on one page

The decisions, condensed

Place the nodes
A force-directed layout: a spring per edge (Hooke, O(E)) pulling neighbours together, a like-charge repulsion pushing every pair apart, and a whisper of center gravity. Integrate with velocity-Verlet; freeze when the kinetic energy falls below ε.
Tame the repulsion
A Barnes–Hut quadtree with the θ opening criterion. A distant cluster pushes like a single mass at its center, so each node touches O(log n) cells instead of n−1 — O(n log n), not O(n²).
Make it converge
A geometric cooling schedule (simulated-annealing temperature decay) caps per-tick displacement: big rearrangements early, polish late, then a hard freeze — no perpetual animation, CPU idle at rest.
Make it reproducible
A seeded xorshift32 PRNG for the initial scatter (never Math.random), and each version’s settled positions memoized once. Stepping between versions never reshuffles a layout you have already seen.
Diff two versions
A hash-join set difference: node identity is its id, edge identity the triple subject|edge|object. Map membership is O(1), so the full diff — added / removed / changed with field-level deltas — is one O(V + E) pass.
Click a node
The same quadtree, reused: index the current positions and query the O(log n) nearest cell instead of scanning all n. The tree earns its keep twice — once for forces, once for the mouse.
Store a version
Full snapshots over delta-chains — each version is the whole graph (~135 KB), small enough that deltas would be premature. Snapshots buy O(1) random access and replay-free diffs; a parent pointer in the manifest makes “diff vs the previous version” unambiguous.
place  ·  approximate  ·  settle  ·  diff  — the whole viewer