How a graph draws itself
I built a small viewer that versions a knowledge graph — 162 nodes, 269 edges — and shows what changed between any two revisions. Two problems needed real algorithms, not hand-waving: place nodes that arrive with no coordinates, and compare two versions instantly. Press Play on the figure and step at your own pace.
A graph with no coordinates
The data is pure structure — nodes and edges, no positions. To draw it as the organic web you actually want to read, you run a physics simulation and let it find the coordinates for you.
Two forces do the work. Every edge is a spring that pulls its two endpoints together; every pair of nodes carries a like charge that pushes them apart. A faint gravity keeps disconnected pieces from drifting off screen. Let it run and connected clusters tighten while the whole graph breathes out into legible space. The springs are cheap — one per edge, O(E). The repulsion is where the cost hides.
Every pair pushes
Naive repulsion compares all C(n, 2) pairs on every tick — that is O(n²).
At 162 nodes that is ~13,000 interactions per frame: fine. But this viewer is built to scale to a thousand-node graph, where O(n²) is ~500,000 interactions every frameuntil the layout settles. That quadratic term is the thing to kill — and the fix is a classic: you don’t need every distant node individually, because a faraway cluster pushes on you almost exactly like a single blob sitting at its center of mass. Approximate the far field; compute the near field exactly.
Barnes–Hut: approximate the far field
Build a quadtree over the nodes, give every cell a center of mass, then for each node walk the tree: if a cell looks small from where you stand — its width s over its distance d falls below a threshold θ — treat the whole thing as one point; otherwise open it and recurse.
Walk the tree, one cell at a time
The overview showed the shape of the idea. Now watch the recursion actually execute. Step through force(target, cell) for one node and see where every interaction goes — the far cluster collapsing to a single pull is what turns 12 into 7.
s / d < θ? — and takes one of three branches: approximate a far cell by its center of mass, open a near one and recurse, or handle a lone node directly. Multiply that ~log n cells per node across all n nodes and you get Barnes–Hut’s O(n log n).Will it settle — and settle the same way every time?
Barnes–Hut made a single tick cheap. But a layout is not one tick — it is thousands, each nudging every node a little. Two questions decide whether that loop hands you a picture worth looking at: does it ever stop, and does it stop in the same place every time?
Does it stop? Each tick sums the forces and advances positions with velocity-Verlet — steadier than plain Euler, and it keeps a node from rocketing off when two repulsions briefly stack up. A cooling schedule then caps how far any node may move in a tick and decays that cap geometrically — the simulated-annealing idea: big rearrangements early, fine polish late. When the largest move in a tick slips below ε, the layout freezes and stops redrawing.
Math.random — same seed, same final picture, run after run. That reproducibility is not a nicety. Because each version settles to one fixed layout (computed once and memoized per version, positions kept in a parallel Float32Array), you can step from one revision to the next and the graph holds still instead of reshuffling under you. A stable picture per version is exactly what makes “what changed between versions?” — the next section — a question you can even ask.What changed? Diff as a set difference
Comparing two versions sounds like graph-walking, but a diff is just a set difference — so hash it.
drop one version’s keys into a Map, sweep the other → every lookup O(1), whole diff O(V + E)
Give every node an identity (its id) and every edge an identity (the triple subject|edge|object). Drop one version’s keys into a Map, then sweep the other: present in both is unchanged — and if the payload differs, changed, carrying a field-level delta; present only in the new version is added; present only in the old is removed. Every lookup is O(1), so the whole comparison is a single O(V + E) pass with no replay.
The first thing it had to get right was the seed graph (149 / 243) against the information-loop graph (162 / 269): the diff reports +13 nodes and +26 edges, nothing removed — exactly the nodes and edges that one ingestion added. A hash join, and the numbers fall out.
The decisions, condensed
Math.random), and each version’s settled positions memoized once. Stepping between versions never reshuffles a layout you have already seen.id, edge identity the triple subject|edge|object. Map membership is O(1), so the full diff — added / removed / changed with field-level deltas — is one O(V + E) pass.