
AI's Stealth Surges

Disruption can be deceptive

To many consumers, AI progress can look like it is in the doldrums, with little visible development in the space. That impression is deceptive. In the background, enormous advances have been made that are likely to drive rapid progress in 2025 and beyond, especially in practically focused AI systems that excel at solving complex problems.

The main advances are happening in specialized areas that aren't immediately apparent to average users. AI models have improved dramatically at answering PhD-level questions and are accelerating research in fields like materials science. Meanwhile, AI has made remarkable progress in fixing real-world software issues, with success rates increasing from ~5% to over 70% on various benchmarks. Google reports that AI now generates more than 25% of its new code.

AI systems are steadily growing more autonomous and more capable of complex tasks, sometimes outperforming human experts in specific scenarios while costing less. Meanwhile, new technical developments suggest this advance will continue. For example, at a matched compute budget, models fine-tuned on data generated by weaker, less expensive models consistently outperform those trained on data generated by stronger, more costly models across multiple benchmarks.

In many cases, you actually end up with better-trained models if you choose the simpler model and generate more solutions. By generating more samples with the less expensive model, you are more likely to solve a wider variety of problems (i.e., you "cover" more unique examples). Moreover, having multiple correct solutions for each problem helps the model learn richer ways to reason, since generating more samples increases the chance of seeing several valid ways to solve the same question.
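To make the coverage point concrete, here is a toy sketch in Python (the per-sample costs and solve rates are invented for illustration, not taken from any benchmark): at a fixed compute budget, the cheaper model can be sampled many more times per problem, so the chance that at least one sample is correct ends up higher.

```python
import random

random.seed(0)

BUDGET = 10                          # compute units spent per problem
MODELS = {                           # assumed per-sample cost and per-sample solve rate
    "weak":   {"cost": 1, "p_correct": 0.30},
    "strong": {"cost": 5, "p_correct": 0.70},
}

def coverage(model, n_problems=10_000):
    """Fraction of problems with at least one correct sample, compute-matched."""
    cfg = MODELS[model]
    samples = BUDGET // cfg["cost"]  # weak model: 10 samples, strong model: 2
    solved = 0
    for _ in range(n_problems):
        if any(random.random() < cfg["p_correct"] for _ in range(samples)):
            solved += 1
    return solved / n_problems

for name in MODELS:
    print(f"{name:6s} coverage ~ {coverage(name):.2f}")
# With these toy numbers: weak ~ 1 - 0.7**10 ~ 0.97, strong ~ 1 - 0.3**2 = 0.91.
```

In compute-matched comparisons like this one, that coverage advantage is what lets weak-model data compete despite its lower per-sample quality.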

While the simpler model can produce more "false positives" (correct final answers with flawed reasoning steps), the downstream models still learn to arrive at correct solutions. In fact, those final trained models end up just as logically consistent as the ones trained with data from a larger model.

This means that even modest models can produce powerful results when paired with scaffolding that aggregates many of their outputs effectively, bootstrapping upward from limited data and compute through synthetic data techniques.
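One simple form of such scaffolding is majority voting over many sampled answers (sometimes called self-consistency). The sketch below uses a made-up sample_answer callable to stand in for a model call; it is illustrative rather than any particular library's API.

```python
import random
from collections import Counter
from typing import Callable

def majority_vote(sample_answer: Callable[[str], str], prompt: str, n: int = 16) -> str:
    """Draw n independent answers and return the most common final answer."""
    votes = Counter(sample_answer(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

# Stand-in sampler: right 60% of the time, otherwise a random wrong answer.
random.seed(1)
def noisy_sampler(prompt: str) -> str:
    return "42" if random.random() < 0.6 else str(random.randint(0, 9))

print(majority_vote(noisy_sampler, "What is 6 * 7?", n=31))  # aggregation usually recovers "42"
```

Other scaffolds, such as filtering samples with a verifier before aggregating them, follow the same pattern: many cheap samples plus a selection rule.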

Google's new Titans architecture could be a further game-changer, enabling models with radically longer effective context windows and a much longer memory for what passes through them.

Transformers attend to every token within a "context window." Because the cost of that attention grows with the square of the window's length, models tend to get slower as the window fills up with requests and information, and they can lose track of important context that falls outside it.
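For a rough sense of why this happens, the generic sketch below (plain scaled dot-product attention, not any specific model's implementation) builds an n-by-n score matrix, so doubling the context length quadruples the work.

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention over one sequence (no batching or heads)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # shape (n, n): quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
for n in (512, 1024, 2048):                        # doubling the context length ...
    x = rng.standard_normal((n, 64))
    _ = attention(x, x, x)
    print(f"tokens={n:5d}  score-matrix entries={n * n:>10,}")  # ... quadruples the score matrix
```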

Titans introduce a hybrid approach that treats attention as "short-term memory" for immediate context while incorporating a novel "neural memory" module for historical context retention. This memory module continues learning during inference through gradient-based updates, allowing dynamic adaptation to new inputs.

The attention layer handles local, immediate context, acting as short-term processing. A neural memory module continuously learns and selectively forgets, storing longer-range information in its own weights. An adaptive update mechanism lets the memory module update when it encounters novel data, while a "forget gate" removes outdated or irrelevant information.

The design draws inspiration from human memory processes, particularly in how it handles new information. The system uses "momentary surprise" (deviation from current model understanding) and "past surprise" (decaying record of previous unexpected inputs) to guide memory updates, similar to how human memory prioritizes unexpected events.
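Putting those pieces together, here is a hedged toy sketch of such a test-time memory update. The actual Titans memory is a deep module updated token by token; here the memory is a single linear map M so the gradient of the associative loss can be written out by hand, and the learning rate, momentum, and forget rate are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # key/value dimensionality
M = np.zeros((d, d))                     # long-term memory weights
S = np.zeros_like(M)                     # "past surprise": decaying gradient momentum
lr, momentum, forget = 0.3, 0.6, 0.01    # illustrative update / forget rates

def memory_step(M, S, k, v):
    """One gradient-based memory update performed at inference time."""
    err = M @ k - v                      # how wrong the current memory is for this key
    grad = np.outer(err, k)              # "momentary surprise": grad of 0.5*||M k - v||^2 wrt M
    S = momentum * S - lr * grad         # blend momentary surprise with decaying past surprise
    M = (1.0 - forget) * M + S           # forget gate slowly erases stale content
    return M, S

# Write one key/value association into the memory purely through test-time updates.
k = rng.standard_normal(d)
k /= np.linalg.norm(k)                   # unit-norm key keeps this toy update stable
v = rng.standard_normal(d)

print("recall error before:", np.linalg.norm(M @ k - v))
for _ in range(20):
    M, S = memory_step(M, S, k, v)
print("recall error after: ", np.linalg.norm(M @ k - v))
```

The printed recall error shrinks as the memory adapts during inference, while the forget-gate term slowly decays any content that is never refreshed.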

This approach represents a significant advance in neural network design, particularly for processing extremely long sequences. By separating memory into immediate and long-term components, it makes scaling to contexts of millions of tokens and beyond feasible. Expect to see a new generation of models applying similar techniques before long.