The Dupoux-LeCun-Malik paper introduces "System M"—a meta-controller that orchestrates when to observe versus act, what data to prioritize, and how to balance learning modes. It's the conductor of the orchestra. Their description is elegant. But when you actually build one, you discover something: a single meta-controller is a single point of failure. And a single point of failure in a system that's supposed to learn autonomously is, to put it technically, a terrible idea. I learned this the hard way circa January 2025, when our first centralized orchestrator went down and took the entire learning pipeline with it. Twelve hours of accumulated patterns, gone. Not corrupted—just gone. The database was fine. The pattern storage was fine. But the thing that decided what to do with any of it was a smoking crater. That's when we went distributed.
System M, as described in the paper, operates like "a meta-policy: monitoring low-dimensional telemetry and outputting meta-actions." Which is academically correct and practically terrifying if you're running production systems. The paper lists System M's responsibilities: input selection (what data matters), loss/reward modulation (when to learn faster or slower), mode control (when to observe versus act), plus advanced modes like learning from communication and imagination. That's a shedload of responsibility for one component. In our experience, any component that does that many things becomes the component that fails in the most creative ways.
So we didn't build a single System M. We built a collection of specialized services that collectively perform meta-control functions, communicating through a message bus rather than shared state. Joan handles workflow orchestration—deciding which patterns to apply to which requests, managing execution sequences, handling the "what should we do" question. A separate intelligence layer handles confidence assessment—the "how sure are we" question—using what we call the CTO principle (Consult The Oracle, not the executive title). Another service handles strategy selection: given the current situation and confidence levels, which learning mode should dominate? Yet another handles performance monitoring, resource allocation, and error recovery. Each does its job. They coordinate through messages. No single point of failure.
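To make the shape of this concrete, here's a minimal sketch of services coordinating over a message bus. The service names, topics, and scoring logic are all illustrative, not our actual implementation; a production version would sit on something like Kafka or NATS rather than an in-process dictionary. The point it demonstrates is the failure property: if a service never subscribes, its messages simply go undelivered, and the rest keep running.

```python
from collections import defaultdict
from typing import Callable

class MessageBus:
    """Minimal in-process pub/sub bus (a real deployment would use a broker)."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # Deliver to whoever is listening. A dead service just means no
        # handler fires -- the bus itself never raises, so nothing cascades.
        for handler in self._subscribers.get(topic, []):
            handler(payload)

class ConfidenceService:
    """Hypothetical confidence assessor (the 'how sure are we' question)."""
    def __init__(self, bus: MessageBus) -> None:
        self.bus = bus
        bus.subscribe("request.received", self.assess)

    def assess(self, payload: dict) -> None:
        # Stand-in scoring; the real service would consult pattern history.
        score = 0.9 if payload["pattern_id"] == "prior-auth-standard" else 0.4
        self.bus.publish("confidence.scored", {**payload, "confidence": score})

class StrategyService:
    """Hypothetical strategy selector (which mode should dominate)."""
    def __init__(self, bus: MessageBus) -> None:
        self.decisions: list[dict] = []
        bus.subscribe("confidence.scored", self.choose)

    def choose(self, payload: dict) -> None:
        mode = "act" if payload["confidence"] >= 0.85 else "consult"
        self.decisions.append({**payload, "mode": mode})

bus = MessageBus()
ConfidenceService(bus)
strategy = StrategyService(bus)
bus.publish("request.received", {"pattern_id": "prior-auth-standard"})
print(strategy.decisions[0]["mode"])
```

Notice that no service holds a reference to another service, only to the bus; that's what lets each one fail independently.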
The decision logic ends up looking something like this: if the complexity score exceeds threshold, consult the external oracle (fer chrissakes, don't guess on complex cases). If pattern confidence exceeds 0.85, apply autonomously. If pattern confidence falls between 0.5 and 0.85, apply but verify. Below 0.5, pause and request detailed guidance. This isn't one system making all these decisions—it's multiple systems, each contributing their piece, with the final decision emerging from their coordination. The paper describes System M "monitoring telemetry and outputting meta-actions." We implement the same concept, just spread across services that can fail independently without cascading.
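The threshold logic above can be written down in a few lines. This is a sketch of the emergent decision, collapsed into one function for readability; in our system these branches live in different services. The complexity threshold value is an assumption here, since the text doesn't state one.

```python
def meta_decision(confidence: float, complexity: float,
                  complexity_threshold: float = 0.7) -> str:
    """Collapse the distributed meta-control logic into one readable function.

    The 0.85 and 0.5 confidence thresholds come from the text;
    complexity_threshold is an assumed illustrative value.
    """
    if complexity > complexity_threshold:
        return "consult_oracle"        # don't guess on complex cases
    if confidence > 0.85:
        return "apply_autonomously"    # high confidence: just act
    if confidence >= 0.5:
        return "apply_and_verify"      # act, but check the outcome
    return "request_guidance"          # too uncertain: pause and ask

print(meta_decision(confidence=0.92, complexity=0.3))
```

The ordering matters: complexity is checked first, so a high-confidence pattern on a complex case still goes to the oracle.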
The coordination overhead is real. Maybe 50-100ms per round-trip when multiple services need to agree on something. For systems that operate in microseconds, that's unacceptable. For healthcare workflows that operate in minutes (sometimes hours, if you've ever waited on hold with a payer), it's noise. We know who we're building for. The alternative—a monolithic meta-controller that processes everything in one place—would be faster in theory and catastrophically fragile in practice. We chose resilience over latency. The paper's authors, coming from a theoretical perspective, describe System M as unified. We, coming from a "things break at 3am and someone has to fix them" perspective, distribute it. Same concept. Different failure modes.
What the paper gets right, and I mean genuinely right, is the insight that meta-control should monitor "low-dimensional telemetry." Not raw data. Compressed signals. Prediction errors. Uncertainty estimates. Things you can act on quickly without drowning in noise. Our confidence scores are exactly this—a single number between 0.0 and 1.0 that compresses a lot of uncertainty analysis into something actionable. Below threshold? Consult. Above threshold? Act. The compression enables fast decisions even in a distributed architecture. If every meta-control decision required re-analyzing raw inputs, the coordination overhead would be unbearable. Instead, services pass compressed signals—confidence scores, pattern identifiers, outcome classifications—and the decisions happen fast.
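Here's roughly what one of those compressed signals looks like as a data structure. Field names and the outcome vocabulary are illustrative, but the shape is the point: a pattern identifier, a single confidence score, and an outcome classification, instead of raw inputs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaSignal:
    """Low-dimensional telemetry passed between services instead of raw data.

    Field names are illustrative, not our actual schema.
    """
    pattern_id: str
    confidence: float  # single number in [0.0, 1.0] compressing uncertainty
    outcome: str       # e.g. "approved", "denied", "pending"

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")

def should_consult(signal: MetaSignal, threshold: float = 0.85) -> bool:
    # Below threshold? Consult. At or above? Act.
    return signal.confidence < threshold

print(should_consult(MetaSignal("prior-auth-standard", 0.92, "approved")))
```

Because the signal is tiny and immutable, it's cheap to put on the bus and safe to fan out to every service that cares.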
The thing we're still working on, and the paper identifies this as a gap for the field generally, is learning from imagination. The paper discusses using memory replay and counterfactual simulation to learn without environmental interaction. We don't have this yet. Our learning is grounded entirely in actual execution—real requests, real outcomes, real feedback. There's no "what if we had tried X instead?" capability. Adding it would require a simulation layer, which for healthcare workflows means simulating payer systems, EHR responses, regulatory database lookups. Not trivial. The architecture supports it—the feedback loops exist, the pattern storage exists—but the implementation is a substantial engineering project in its own right. It's on the roadmap. The paper's framework gives us theoretical backing for why it matters.
The implication for anyone else building these systems: you can implement System M as a single component or as a distributed mesh. Both approaches satisfy the theoretical requirements. But if you're running production systems that need to stay up, the distributed approach fails more gracefully. One service goes down, the others keep running with degraded capability. The monolith goes down, everything stops. We learned this the expensive way.