Algorithms · build notes

How a graph draws itself

I built a small viewer that versions a knowledge graph — 162 nodes, 269 edges — and shows what changed between any two revisions. Two problems needed real algorithms, not hand-waving: place nodes that arrive with no coordinates, and compare two versions instantly. Press Play on the figure and step at your own pace.

Lay out 162 nodes that have no x/y — legibly — without an O(n²) blow-up as it scales.

Show exactly what changed between two versions, in a single pass.

The problem

A graph with no coordinates

The data is pure structure — nodes and edges, no positions. To draw it as the organic web you actually want to read, you run a physics simulation and let it find the coordinates for you.

Two forces do the work. Every edge is a spring that pulls its two endpoints together; every pair of nodes carries a like charge that pushes them apart. A faint gravity keeps disconnected pieces from drifting off screen. Let it run and connected clusters tighten while the whole graph breathes out into legible space. The springs are cheap — one per edge, O(E). The repulsion is where the cost hides.

The trap

Every pair pushes

Naive repulsion compares all C(n, 2) pairs on every tick — that is O(n²).

At 162 nodes that is ~13,000 interactions per frame: fine. But this viewer is built to scale to a thousand-node graph, where O(n²) is ~500,000 interactions every frameuntil the layout settles. That quadratic term is the thing to kill — and the fix is a classic: you don’t need every distant node individually, because a faraway cluster pushes on you almost exactly like a single blob sitting at its center of mass. Approximate the far field; compute the near field exactly.

The centerpiece

Barnes–Hut: approximate the far field

Build a quadtree over the nodes, give every cell a center of mass, then for each node walk the tree: if a cell looks small from where you stand — its width s over its distance d falls below a threshold θ — treat the whole thing as one point; otherwise open it and recurse.

Barnes–Hut · approximate the far field

interactions for the target noden−1 = 12

A graph with no coordinates — just nodes and the edges between them. A force layout will arrange it: connected nodes attract, and every node repels every other.

01 / 06

repulsion per tick: O(n log n) with the quadtree vs O(n²) all-pairs

θ is the knob

θ → 0 opens every cell: exact, and slow. A larger θ approximates more aggressively: faster, rougher. The quadtree (Barnes–Hut) with its standard opening criterion is the right tool for the thousand-node target, not just today’s 162.

The algorithm itself

Walk the tree, one cell at a time

The overview showed the shape of the idea. Now watch the recursion actually execute. Step through force(target, cell) for one node and see where every interaction goes — the far cluster collapsing to a single pull is what turns 12 into 7.

force(target, cell) · the θ-walk, stepped

interactions on the target0vs naive n−1 = 12

execution · force(target, cell)

▸1force(target, cell):

2 if cell.mass == 0: return

3 if cell is a single node:

4 if it is target: return

5 add its exact repulsion

6 return

7 s = cell.size

8 d = dist(target, cell.com)

9 if s / d < θ:

10 add repulsion from cell.com

11 else:

12 for child in cell.children: force(target, child)

watch · variables right now

targetnode 0

the node we're computing the net push on

θ0.9

opening threshold — smaller is more exact, slower

interactions0

force terms so far · naive would be n−1 = 12

Call force(target, cell) on each root quadrant, depth-first. Every cell is approximated, opened, or handled as a single node.

We have the quadtree and every cell’s center of mass. Now run the algorithm itself: force(target, cell), depth-first from the root’s four quadrants. Each cell, one question — far enough to fake?

01 / 13

Read the watch panel

Every cell asks one question — s / d < θ? — and takes one of three branches: approximate a far cell by its center of mass, open a near one and recurse, or handle a lone node directly. Multiply that ~log n cells per node across all n nodes and you get Barnes–Hut’s O(n log n).

From one cheap tick to a finished layout

Will it settle — and settle the same way every time?

Barnes–Hut made a single tick cheap. But a layout is not one tick — it is thousands, each nudging every node a little. Two questions decide whether that loop hands you a picture worth looking at: does it ever stop, and does it stop in the same place every time?

Does it stop? Each tick sums the forces and advances positions with velocity-Verlet — steadier than plain Euler, and it keeps a node from rocketing off when two repulsions briefly stack up. A cooling schedule then caps how far any node may move in a tick and decays that cap geometrically — the simulated-annealing idea: big rearrangements early, fine polish late. When the largest move in a tick slips below ε, the layout freezes and stops redrawing.

Cooling schedule · big moves early, freeze latefig

vertical: largest node move in one tick · horizontal: ticks →

Each tick’s largest move shrinks ×0.8; once it dips below ε the layout freezes and stops redrawing.

Does it settle the same way? — and why that is this page’s hinge

The starting scatter comes from a seeded xorshift32 generator, never Math.random — same seed, same final picture, run after run. That reproducibility is not a nicety. Because each version settles to one fixed layout (computed once and memoized per version, positions kept in a parallel Float32Array), you can step from one revision to the next and the graph holds still instead of reshuffling under you. A stable picture per version is exactly what makes “what changed between versions?” — the next section — a question you can even ask.

The second problem

What changed? Diff as a set difference

Comparing two versions sounds like graph-walking, but a diff is just a set difference — so hash it.

Diff · hash-join set differencefig

v001 · parent

seed

149 nodes · 243 edges

⋈hash join

v002 · selected

information-loop

162 nodes · 269 edges

149

unchanged

+13

added

removed

changed

key(node) = id · key(edge) = subject|edge|object
drop one version’s keys into a Map, sweep the other → every lookup O(1), whole diff O(V + E)

Give every node an identity (its id) and every edge an identity (the triple subject|edge|object). Drop one version’s keys into a Map, then sweep the other: present in both is unchanged — and if the payload differs, changed, carrying a field-level delta; present only in the new version is added; present only in the old is removed. Every lookup is O(1), so the whole comparison is a single O(V + E) pass with no replay.

The first thing it had to get right was the seed graph (149 / 243) against the information-loop graph (162 / 269): the diff reports +13 nodes and +26 edges, nothing removed — exactly the nodes and edges that one ingestion added. A hash join, and the numbers fall out.

Every choice, on one page

The decisions, condensed

Place the nodes

A force-directed layout: a spring per edge (Hooke, O(E)) pulling neighbours together, a like-charge repulsion pushing every pair apart, and a whisper of center gravity. Integrate with velocity-Verlet; freeze when the kinetic energy falls below ε.

Tame the repulsion

A Barnes–Hut quadtree with the θ opening criterion. A distant cluster pushes like a single mass at its center, so each node touches O(log n) cells instead of n−1 — O(n log n), not O(n²).

Make it converge

A geometric cooling schedule (simulated-annealing temperature decay) caps per-tick displacement: big rearrangements early, polish late, then a hard freeze — no perpetual animation, CPU idle at rest.

Make it reproducible

A seeded xorshift32 PRNG for the initial scatter (never Math.random), and each version’s settled positions memoized once. Stepping between versions never reshuffles a layout you have already seen.

Diff two versions

A hash-join set difference: node identity is its id, edge identity the triple subject|edge|object. Map membership is O(1), so the full diff — added / removed / changed with field-level deltas — is one O(V + E) pass.

Click a node

The same quadtree, reused: index the current positions and query the O(log n) nearest cell instead of scanning all n. The tree earns its keep twice — once for forces, once for the mouse.

Store a version

Full snapshots over delta-chains — each version is the whole graph (~135 KB), small enough that deltas would be premature. Snapshots buy O(1) random access and replay-free diffs; a parent pointer in the manifest makes “diff vs the previous version” unambiguous.