
AYA and the Dupoux-LeCun-Malik Paper — Part 9

Where the Paper Says We Should Improve (And They're Right)

April 8, 2026 · 5 min read · Ayanami Hobbes

I've spent the last eight posts explaining how the Dupoux-LeCun-Malik paper validates our architecture. That's the fun part. Here's the less fun part: where the paper identifies capabilities we don't have yet. Self-awareness is a virtue, even when the awareness is that you're behind on important things. The paper describes a complete framework for autonomous learning. We've implemented maybe 60% of it. The remaining 40% represents real gaps that we're working on but haven't solved. Let me tell you what we're missing, because the alternative is pretending we're further along than we are, and that's a good way to lose credibility when people actually try the product.

Learning from imagination. The paper discusses using memory replay and counterfactual simulation to learn without environmental interaction. The idea: instead of only learning from things that actually happened, you learn from things that could have happened. Replay past experiences. Ask "what if we had chosen differently?" Generate synthetic scenarios and learn from those. This is powerful because it multiplies the learning signal: every real execution becomes dozens of simulated variations, and the system learns faster without requiring more real-world interaction.

What we have: nothing. Our learning is 100% grounded in actual execution. Real requests. Real outcomes. Real feedback. No simulation layer. No counterfactual reasoning.

What we need: a simulation layer that can replay execution histories with different choices, and the ability to ask "if we had applied Pattern B instead of Pattern A, what would have happened?"

Why we don't have it yet: simulation is genuinely hard. Healthcare workflows involve real external systems: payers, EHRs, regulatory databases. Simulating those accurately enough to learn from is a significant engineering challenge. If your simulation isn't accurate, you learn superstitious patterns from simulated outcomes that don't reflect reality. This is on the roadmap, and the architectural hooks exist, but the implementation is non-trivial, and getting it wrong is worse than not having it.
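To make the gap concrete, here's a minimal sketch of what a counterfactual replay hook could look like. Everything in it is illustrative: `ExecutionTrace`, `counterfactual_replay`, and the toy simulator are hypothetical names, not our actual code, and the `simulate` callable is a stand-in for exactly the hard part we haven't built.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExecutionTrace:
    """A recorded workflow execution (hypothetical schema)."""
    request: dict
    pattern_applied: str
    outcome_success: bool

def counterfactual_replay(
    trace: ExecutionTrace,
    alternative_pattern: str,
    simulate: Callable[[dict, str], bool],
) -> dict:
    """Replay a past execution with a different pattern choice.

    `simulate` is the missing piece: a model of the external systems
    (payers, EHRs) accurate enough that its outcomes are worth
    learning from.
    """
    simulated_success = simulate(trace.request, alternative_pattern)
    return {
        "actual": (trace.pattern_applied, trace.outcome_success),
        "counterfactual": (alternative_pattern, simulated_success),
        # Learning signal: would the alternative have done better?
        "regret": simulated_success and not trace.outcome_success,
    }

# Toy simulator for the example: Pattern B succeeds on urgent requests.
def toy_simulate(request: dict, pattern: str) -> bool:
    return pattern == "B" and request.get("urgent", False)

result = counterfactual_replay(
    ExecutionTrace({"urgent": True}, "A", False), "B", toy_simulate
)
print(result["regret"])  # True: Pattern B would have succeeded here
```

The whole design hinges on how much you trust `simulate`; with a bad simulator, "regret" becomes superstition, which is the failure mode described above.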

Richer developmental scaffolding. The paper describes "critical periods" in biological development: windows where learning is enhanced for specific capabilities, followed by consolidation phases where what's learned gets locked in. Think about how children have windows for language acquisition that adults don't have.

We have something like this: probationary periods for new patterns, which require a minimum of 10 executions before full autonomy. But our implementation is crude. A pattern either is or isn't in probation. The learning characteristics don't actually change during the period; we just gate autonomy on execution count.

What we need: graduated scaffolding where learning rates decrease as patterns mature. Consolidation phases where the system reviews accumulated patterns and strengthens the important ones. Sensitivity windows for specific capability types; maybe new domains should have faster learning rates than mature domains.

Why we don't have it yet: we built for production stability first. Dynamic learning rates add complexity and risk, and a bug in the scaffolding system could destabilize the entire pattern library, which would be a spectacular way to break production. The paper's cognitive-science grounding makes a compelling case for richer developmental dynamics. We should implement it. But carefully.
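For illustration, here's the contrast between our binary probation gate and a graduated alternative, as a sketch. The function names and constants (`base_rate=0.5`, exponential decay with `half_life=10`) are invented for the example; only the 10-execution probation window comes from the real system.

```python
import math

def learning_rate(executions: int, base_rate: float = 0.5,
                  half_life: int = 10) -> float:
    """Graduated scaffolding sketch: the learning rate decays smoothly
    as a pattern matures, instead of flipping a binary probation flag.
    The half-life here deliberately echoes the 10-execution window.
    """
    return base_rate * math.pow(0.5, executions / half_life)

def autonomy_level(executions: int, min_probation: int = 10) -> str:
    """Today's crude gate, for contrast: all-or-nothing."""
    return "full" if executions >= min_probation else "probation"

# A brand-new pattern updates aggressively; a mature one barely moves.
print(learning_rate(0))    # 0.5
print(learning_rate(10))   # 0.25
print(learning_rate(40))   # 0.03125
print(autonomy_level(3))   # probation
```

The point of the sketch is that maturity becomes a continuous dial rather than a switch, which is also what makes it riskier: a bug in the decay schedule touches every pattern at once.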

Active learning and curiosity. The paper discusses intrinsic motivation: systems that seek out informative experiences rather than just responding to the tasks they're given. Currently our system is reactive. It learns from the requests that arrive. It doesn't proactively seek situations that would maximize learning.

What we have: passive learning. Process what comes. Learn from outcomes. Wait for the next request.

What we need: active learning where the system identifies areas of uncertainty and seeks experiences that would reduce that uncertainty. "We haven't seen many Pattern X cases recently; flag the next one for detailed observation."

Why we don't have it yet: active learning requires agency beyond task completion. The system would need to influence which tasks it sees, and in a healthcare context you can't just manufacture prior auth requests for learning purposes: there has to be an actual patient, an actual procedure, an actual medical necessity. There is a version of this that works within the constraints: prioritizing detailed logging and human review for cases that fall in high-uncertainty regions. We do some of this. We could do more.
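The constrained version fits in a few lines. This is an illustrative sketch, not our production logic: the pattern names, the observation counts, and the `min_recent` threshold are all made up for the example.

```python
from collections import Counter

def flag_for_review(recent_counts: Counter, pattern: str,
                    min_recent: int = 5) -> bool:
    """Constrained active learning sketch: we can't manufacture
    requests, but we can route cases from under-observed
    (high-uncertainty) regions to detailed logging and human review.
    """
    return recent_counts[pattern] < min_recent

# Hypothetical recent observation counts per pattern.
recent = Counter({"prior_auth_standard": 120, "prior_auth_pattern_x": 2})
print(flag_for_review(recent, "prior_auth_pattern_x"))  # True
print(flag_for_review(recent, "prior_auth_standard"))   # False
```

A real version would score uncertainty rather than just count observations, but the shape is the same: the system prioritizes attention, not task creation.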

True evolutionary optimization. The paper proposes bilevel optimization: an outer loop that evolves the learning architecture itself across many simulated lifetimes. We have selection pressure at the pattern level, where effective patterns survive and ineffective ones decay. We don't have meta-evolution at the architecture level; the learning algorithms themselves don't evolve.

What we have: pattern-level evolution. Confidence-based selection.

What we need: meta-level evolution. Different learning strategies competing against each other. Hyperparameter optimization through simulated deployment.

Why we don't have it yet: this requires massive computational resources. Simulating "lifetimes" at scale, running evolutionary algorithms over learning architectures, evaluating fitness across diverse scenarios: that's not a side project. The paper acknowledges this directly: "Bilevel optimization requires millions of simulated life cycles; scaling remains problematic." They're right. It's problematic. We're not there yet. Nobody is.
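What we do have, pattern-level selection, is simple enough to sketch. This is a generic version of the technique (confidence pulled toward observed outcomes, with decay toward neutral so unused or ineffective patterns fade), not our actual update rule; the constants are arbitrary. Notice that the update rule itself is fixed, which is precisely the missing meta-level.

```python
def update_confidence(conf: float, success: bool,
                      lr: float = 0.1, decay: float = 0.99) -> float:
    """Pattern-level selection sketch: confidence moves toward each
    observed outcome, then decays slightly toward neutral (0.5).
    Effective patterns climb; ineffective ones sink. The rule itself
    never changes: there is no outer loop evolving `lr` or `decay`.
    """
    target = 1.0 if success else 0.0
    conf = conf + lr * (target - conf)   # move toward the outcome
    return 0.5 + decay * (conf - 0.5)    # decay toward neutral

c = 0.5
for outcome in [True, True, False, True]:
    c = update_confidence(c, outcome)
print(round(c, 3))  # ends slightly above neutral: mostly successful
```

Bilevel optimization, in these terms, would mean running many such learners with different rules and selecting among the rules themselves, which is where the "millions of simulated life cycles" cost comes from.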

Better evaluation paradigms. The paper argues for new evaluation approaches: unit tests for individual learning capabilities, and integration tests comparing human versus AI learning efficiency. How fast does the system achieve competence in a new domain? How efficiently does it generalize from few examples? Our evaluation is outcome-focused: did the workflow succeed? That's necessary but not sufficient for understanding learning capability.

What we have: outcome metrics, success rates, execution times.

What we need: learning metrics. Speed to competence. Generalization efficiency. Learning-curve comparisons against human baselines.

Why we don't have it yet: these metrics are harder to define and measure. "Learning speed" depends on what you're learning, how you define competence, and what baseline you're comparing against. The paper doesn't solve this either; it just identifies the need. We should invest in better learning metrics, but it's research-grade work that doesn't have obvious product implications in the near term.
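Here's roughly what a speed-to-competence metric could look like, as a sketch. The function name, threshold, and window are illustrative assumptions, and a real metric would still have to grapple with the definition problems above: what counts as competence, and over what window.

```python
from typing import Optional

def executions_to_competence(successes: list, threshold: float = 0.8,
                             window: int = 5) -> Optional[int]:
    """Learning-metric sketch: how many executions until the rolling
    success rate over the last `window` runs first reaches `threshold`?
    Returns None if competence is never reached in this history.
    """
    for i in range(window, len(successes) + 1):
        rate = sum(successes[i - window:i]) / window
        if rate >= threshold:
            return i
    return None

# A learning curve: early failures, then mostly successes.
history = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1]
print(executions_to_competence(history))  # 6
```

Comparing that number across domains, or against a human baseline doing the same task, is the kind of learning-curve evaluation the paper calls for.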

The honest assessment: we're ahead of most deployed systems. The architecture is aligned with where the serious thinkers believe AI needs to go. The core capabilities—observation/action integration, meta-control, pattern composition, outcome learning—are implemented and working. We're behind the theoretical frontier. The paper estimates "decades" before fully autonomous learning systems emerge. We're not there. Nobody is. But we're building in the right direction, and we're honest about the gaps. That has to count for something.

Referenced Paper

Dupoux, E., LeCun, Y., & Malik, J. (2026). “Why AI systems don’t learn and what to do about it: Lessons on autonomous learning from cognitive science.” arXiv:2603.15381

Licensed under CC BY 4.0. TRIZZ AI is not affiliated with the authors. All opinions are our own.

← Part 8: Observation + Action: The Integration Nobody Gets Right
Part 10: 18 Months Ahead of the Paper (And What That Means) →