What this is about
Most things that work in the world are made of parts. Your phone is a stack of well-defined modules, each with a clean interface to the next. A cell is a collection of organelles. A proof is a chain of lemmas. A company is a set of teams with (mostly) known hand-offs. When we want to reason about a complicated system, the first thing we usually do is break it into pieces and think about the pieces.
AI, right now, is in a strange place with respect to this. On the one hand, modern systems are obviously built out of parts: transformer blocks, attention heads, mixture of experts, retrieval indices, tool calls, agent scaffolds. On the other hand, the pieces don’t always behave like pieces. Fine-tune one part of a model and something unrelated breaks. Add a new tool to an agent and the router picks it up in some situations but not others. Compose two prompts that each work fine on their own and get nonsense. The pieces exist in a trivial sense, but they are less part-like than the word “part” suggests, and I find that gap interesting.
1. When does modularity actually emerge?
If you train a big network on a rich enough mixture of tasks, do clean, reusable modules show up on their own? A growing body of interpretability work says “sometimes, maybe, it depends how you look,” and I think the hedging in that answer is where the actual science lives. What counts as a module? How do we tell the difference between a network that is structurally modular and one that just happens to look modular under one particular probe? What would the null hypothesis even be?
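To make the null-hypothesis worry concrete, here is a toy sketch (mine, not from any particular paper) of the "looks modular vs. is modular" distinction: probe a weight matrix for a two-way split, score how much of its mass stays within a group, and only take the result seriously if it beats the same probe run on shuffled weights.

```python
import numpy as np

def two_way_split(W):
    """Assign each row of W to one of two groups, seeded by the pair
    of rows with the least-similar connection profiles."""
    S = W @ W.T
    np.fill_diagonal(S, np.inf)
    i, j = np.unravel_index(np.argmin(S), S.shape)
    return ((W - W[i]) ** 2).sum(1) < ((W - W[j]) ** 2).sum(1)

def within_mass(W, labels):
    """Fraction of |W|'s total mass that stays inside a group."""
    inside = sum(np.abs(W[np.ix_(labels == g, labels == g)]).sum()
                 for g in (True, False))
    return inside / np.abs(W).sum()

rng = np.random.default_rng(0)
# A matrix with genuine two-block structure...
block = np.block([[rng.normal(1, 0.1, (12, 12)), np.zeros((12, 12))],
                  [np.zeros((12, 12)), rng.normal(1, 0.1, (12, 12))]])
real = within_mass(block, two_way_split(block))

# ...and the null: the same probe applied to shuffled weights.
null = [within_mass(s, two_way_split(s))
        for s in (rng.permutation(block.ravel()).reshape(block.shape)
                  for _ in range(20))]
print(real, max(null))  # the real score should clearly exceed the null
```

The probe here is deliberately crude; the point is the comparison, not the clustering. Any probe will assign *some* grouping, so "the probe found groups" is not evidence of modularity until the shuffled controls fail to produce the same score.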
2. When does composition generalize?
Once we have parts, when does snapping them together in new ways produce correct new behavior instead of noise? The classical compositional generalization benchmarks have mostly been solved by scale, but I don’t think that settled the question. It mostly showed that with enough data, memorization can do a convincing impression of composition. The question I care about is the harder one: in the regime where a model genuinely has to combine two things it has never seen together, what property of the model or its training predicts whether it will pull it off?
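To pin down what "never seen together" means, here is a minimal split in the style of those benchmarks (toy data, made-up vocabulary): every primitive appears in training, but the held-out pairs never co-occur there.

```python
from itertools import product

verbs = ["jump", "walk", "look", "run"]
modifiers = ["twice", "left", "right", "around"]

pairs = list(product(verbs, modifiers))
held_out = {("jump", "around"), ("run", "left")}  # novel combinations
train = [p for p in pairs if p not in held_out]
test = sorted(held_out)

# Sanity checks that make "compositional" precise:
assert all(p not in train for p in test)              # pairs are novel
assert {v for v, _ in test} <= {v for v, _ in train}  # every verb seen
assert {m for _, m in test} <= {m for _, m in train}  # every modifier seen
print(len(train), test)
```

The assertions are the substance: memorization can only help with `train`, so any success on `test` has to come from combining primitives the model already knows.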
3. What is the right interface between parts?
This one is, I think, the most underrated of the three. Whenever we connect two modules we have to pick a medium. It can be a dense vector, a discrete token, a gradient, a natural-language message, or an executable program. Each choice makes different things easy and different things hard. Dense vectors are information-rich but opaque. Tokens are legible but lossy. Natural language is maximally flexible but adds a whole second inference problem at the boundary. Programs are precise but narrow. A lot of what gets attributed to architecture or scale may come down to choice of interface, and working out which of those attributions actually do is near the top of what I want to spend time on.
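To make the trade-offs tangible, here is a toy rendering (mine, purely illustrative) of one hand-off expressed through three of those media. The "message" is a color one module wants another to know.

```python
import numpy as np

color = (0.8, 0.1, 0.1)  # what module A wants module B to receive

# 1. Dense vector: lossless and cheap, but opaque to inspection.
as_vector = np.array(color)

# 2. Discrete token: legible, but quantization throws information away.
palette = {"red": (1.0, 0.0, 0.0), "green": (0.0, 1.0, 0.0),
           "blue": (0.0, 0.0, 1.0)}
as_token = min(palette, key=lambda name:
               sum((a - b) ** 2 for a, b in zip(palette[name], color)))

# 3. Program: precise and executable, but only covers what the little
#    language can express.
as_program = f"set_color(r={color[0]}, g={color[1]}, b={color[2]})"

print(as_vector, as_token, as_program)
```

Downstream behavior now depends on which of the three the receiving module was given, even though the sender's intent was identical, and that is the sense in which the interface, not the modules, can carry the credit or the blame.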
How to read the posts
Everything in this section is written as working notes rather than finished papers. The style leans on the tradition set by distill.pub: margin notes, diagrams where they help, footnotes where they don’t, and prose that tries to be honest about what’s known and what isn’t. Posts are meant to be revised in public. If something is wrong, it will get fixed, not hidden.