The Dupoux-LeCun-Malik paper introduces "Evo/Devo"—an evolutionary-developmental framework where simple capabilities emerge first and complex ones build on top. This isn't a new idea. It's how children learn. It's how skills develop. It's how evolution works. You don't start with calculus; you start with counting. You don't start with symphonies; you start with scales. What's interesting is how rarely AI systems actually implement this principle. Most ship as monolithic capability sets where everything arrives at once, nothing composes from simpler parts, and when something breaks you have no idea which layer caused the problem. We went a different direction, mostly because the monolithic approach kept failing in ways that were expensive to debug.
Our patterns are organized by level, and the hierarchy is enforced in the schema itself, so you literally cannot violate it. Levels 0-2 are atomic: single actions with clear inputs and outputs that cannot compose other patterns. Level 3 is spans: each must compose at least two L0-L2 patterns, orchestrating sequences and handling failures at the component level. Levels 4-9 are workflows: increasingly complex orchestration that may compose only lower-level patterns, never same-level or higher ones. You cannot define a Level 3 pattern that doesn't reference lower patterns—the system rejects it during validation. This constraint felt annoying when we implemented it. It turned out to be essential for debugging and maintenance.
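The validation described above can be sketched in a few lines. This is an illustrative reconstruction, not our actual schema code; the `Pattern` class and `validate` function are hypothetical names:

```python
from dataclasses import dataclass, field

# Level bands from the text; names here are illustrative, not the real schema.
ATOMIC_LEVELS = range(0, 3)    # L0-L2: atomic, cannot compose
SPAN_LEVEL = 3                 # L3: must compose >= 2 lower patterns
WORKFLOW_LEVELS = range(4, 10) # L4-L9: compose lower levels only

@dataclass
class Pattern:
    name: str
    level: int
    composes: list["Pattern"] = field(default_factory=list)

def validate(pattern: Pattern) -> None:
    """Reject level-rule violations at definition time, before any execution."""
    if pattern.level in ATOMIC_LEVELS and pattern.composes:
        raise ValueError(f"{pattern.name}: atomic patterns (L0-L2) cannot compose others")
    if pattern.level == SPAN_LEVEL and len(pattern.composes) < 2:
        raise ValueError(f"{pattern.name}: a span (L3) must compose at least two lower patterns")
    for dep in pattern.composes:
        if dep.level >= pattern.level:
            raise ValueError(
                f"{pattern.name} (L{pattern.level}) cannot reference "
                f"same-level or higher pattern {dep.name} (L{dep.level})"
            )
```

The point is that these rules run at definition time: a malformed pattern never enters the library, so the downstream guarantees hold for everything that does.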
An example to make this concrete: Level 0 pattern retrieves patient demographics from EHR. Single action, single target system, clear success or failure. Level 0 pattern formats data for payer submission. Single transformation, input to output, deterministic. Level 0 pattern submits to payer portal. Single API call, wait for response, clear success or failure. Level 3 pattern handles prior authorization request by composing all three: retrieve demographics, format data, submit to payer. Orchestrates the sequence, handles failure at each step, provides unified success/failure for the composed operation. Level 5 pattern handles prior auth with appeal workflow by composing: the L3 prior auth pattern, plus denial detection, plus appeal generation. Conditional logic based on outcomes, error handling at multiple levels. Each level builds on the one below. Complex patterns don't reinvent basic capabilities—they compose them. The "retrieve patient demographics" capability is written once, tested once, debugged once, and used by dozens of higher-level patterns.
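The prior-auth composition above can be sketched as plain functions. The step names and the `ctx`-dict calling convention are assumptions for illustration; the data is stubbed:

```python
# Three L0 patterns: each does one thing with a clear success/failure.
def retrieve_demographics(ctx):
    # L0: single EHR read (stubbed with fake data for illustration).
    ctx["demographics"] = {"name": "Jane Doe", "dob": "1980-01-01"}
    return ctx

def format_for_payer(ctx):
    # L0: deterministic transformation, input to output.
    d = ctx["demographics"]
    ctx["payload"] = {"patient": d["name"], "dob": d["dob"].replace("-", "")}
    return ctx

def submit_to_payer(ctx):
    # L0: single API call, wait for response (stubbed here).
    ctx["result"] = {"status": "submitted"}
    return ctx

def prior_auth(ctx):
    # L3: composes the three L0 steps, handling failure at each one.
    for step in (retrieve_demographics, format_for_payer, submit_to_payer):
        try:
            ctx = step(ctx)
        except Exception as exc:
            # Failure is localized to the exact component that raised.
            raise RuntimeError(f"prior_auth failed at {step.__name__}: {exc}") from exc
    return ctx
```

A Level 5 appeal workflow would wrap `prior_auth` the same way, adding denial detection and appeal generation as further composed steps rather than reimplementing any of the three L0 capabilities.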
Why composition matters, and I hope you're paying attention because I'm giving you pearls here: reuse, debuggability, and incremental learning. Reuse: our pattern library has maybe 200 atomic patterns and 800+ composed patterns. That's a 4:1 composition ratio. The atomic patterns do enormous work because every composed pattern that needs demographics just references the existing atomic pattern. No duplication. No drift between implementations. One source of truth for how to get demographics. Debuggability: when a Level 5 pattern fails, you can trace which Level 3 pattern failed, which Level 0 pattern failed, and exactly what went wrong at that level. The composition creates a stack trace of capabilities. With a monolithic pattern, failure is opaque—"it didn't work" tells you nothing. With composed patterns, failure is localized—"the payer submission step returned error code 4021 because the date format was wrong" tells you exactly where to look. Incremental learning: simple patterns can achieve confidence independently of complex ones. A Level 0 pattern might hit 0.95 confidence after 100 executions. The Level 3 pattern that composes it benefits from that confidence even if the Level 3 pattern itself has only run 20 times. Confidence propagates upward through the composition tree.
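The upward confidence propagation can be sketched as follows. The specific blending rule (a smoothed own-execution estimate, floored by the weakest component, weighted by run count) is our assumption for illustration, not the system's actual formula:

```python
def own_confidence(successes: int, runs: int, prior: float = 0.5) -> float:
    """Smoothed success rate: with few runs it stays near the prior."""
    return (successes + prior * 10) / (runs + 10)

def composed_confidence(own_succ: int, own_runs: int,
                        component_confidences: list[float]) -> float:
    """Blend a composite's own stats with its components' confidence.

    A composed pattern is bounded by its weakest component, but mature
    components can lift a young composite above its raw statistics.
    """
    base = own_confidence(own_succ, own_runs)
    weakest = min(component_confidences)
    # Trust the composite's own data more as its run count accumulates.
    weight = own_runs / (own_runs + 10)
    return weight * base + (1 - weight) * weakest
```

With this rule, a Level 3 pattern that has run only 20 times but composes 0.95-confidence atomics scores higher than its own 20-run history alone would justify, which is the propagation effect described above.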
The paper describes "Evo/Devo" as bilevel optimization: inner loop (developmental) where individual patterns learn from their executions, outer loop (evolutionary) where the pattern library evolves through selection pressure. We implement something similar without the biology language. Developmental: each pattern has a confidence score that updates based on outcomes, patterns go through a probationary period (minimum 10 successful executions) before graduating to full autonomy. Evolutionary: patterns with low effectiveness decay over time, patterns with high effectiveness get prioritized for future matching. Over time the library trends toward what works. We don't run literal evolutionary algorithms—no crossover, no mutation—but the selection pressure is real. Effective patterns survive. Ineffective ones fade. Nobody manually decides which patterns are good.
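The two loops can be sketched together. The 10-success probation threshold comes from the text; the decay rate, floor, and update rule are assumptions for illustration:

```python
class PatternStats:
    """Inner (developmental) loop: per-pattern learning from executions."""

    def __init__(self):
        self.successes = 0
        self.runs = 0
        self.effectiveness = 1.0

    def record(self, success: bool) -> None:
        self.runs += 1
        if success:
            self.successes += 1

    @property
    def autonomous(self) -> bool:
        # Probation: at least 10 successful executions before full autonomy.
        return self.successes >= 10

def evolve(library: dict, decay: float = 0.99, floor: float = 0.2) -> dict:
    """Outer (evolutionary) loop: decay low-effectiveness patterns, drop the worst.

    No crossover, no mutation -- just selection pressure, as described above.
    """
    survivors = {}
    for name, stats in library.items():
        rate = stats.successes / stats.runs if stats.runs else 0.0
        # Ineffective patterns decay fast; effective ones decay slowly.
        stats.effectiveness *= decay * (0.5 + 0.5 * rate)
        if stats.effectiveness >= floor:
            survivors[name] = stats
    return survivors
```

Run `evolve` periodically over the whole library and the selection pressure does the curating: consistently failing patterns fall below the floor within a few cycles without anyone manually deciding.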
The constraint that makes this actually work, and most "composable" systems miss this: no upward references. A Level 0 pattern cannot reference a Level 3 pattern. A Level 3 pattern cannot reference a Level 5 pattern. Composition only flows downward. Why? Because upward references create cycles—Pattern A references Pattern B which references Pattern A, and now you've got infinite recursion and a hung system. More subtly, upward references create coupling. If a Level 0 pattern depends on a Level 5 pattern, you can't test the Level 0 pattern in isolation. You can't deploy it independently. You can't reason about it without understanding the entire dependency tree. The downward-only constraint keeps patterns independent. Any pattern can be tested with just its dependencies (which are all simpler). Any pattern can be reasoned about by looking down, never up.
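The acyclicity argument is worth making explicit: if every composition edge goes to a strictly lower level, any reference chain strictly decreases in level and must terminate, so no cycle is possible. A minimal check (the `edges` representation is hypothetical):

```python
def assert_downward_only(edges: dict) -> None:
    """edges maps pattern name -> (level, [referenced pattern names]).

    Raises on any same-level or upward reference. If this passes, the
    reference graph is a DAG: every edge drops the level by at least 1,
    so no path can ever return to its starting pattern.
    """
    for name, (level, refs) in edges.items():
        for ref in refs:
            ref_level = edges[ref][0]
            if ref_level >= level:
                raise ValueError(
                    f"{name} (L{level}) references {ref} (L{ref_level}): "
                    "composition must flow downward only"
                )
```

This is also why isolated testing works: a pattern's transitive dependencies are all strictly simpler, so the test fixture for any pattern is finite and looks only down, never up.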
The trade-off, and we're explicit about trade-offs because the paper is too: composition adds overhead. A Level 5 pattern might have 8-12 underlying pattern references. Each reference has coordination cost. Total latency is higher than a monolithic pattern that does everything in one shot. We accept this trade-off because the debuggability and reuse benefits outweigh the latency cost. For workflows measured in minutes (prior auth, denial appeals, eligibility checks), 500ms of composition overhead is invisible. For real-time systems you might make a different choice. We know who we're building for. Monolithic AI is easier to ship initially and harder to maintain forever. Composed AI is harder to ship initially and easier to maintain forever. We chose the long game. Alas.