We propose PonderTTT, a method that selectively applies computational updates
to language model inputs based on their difficulty. Using the reconstruction loss signal from test-time training (TTT) layers,
the system decides when to trigger additional processing, without any learned classifier.
A single scalar threshold, calibrated on unlabeled data and adapted during inference, governs update frequency.
Experiments on GPT-2 models (124M to 1.5B parameters) for code language modeling show that our approach achieves
82–89% Oracle Recovery while remaining fully training-free, and substantially improves
performance on out-of-distribution languages over random baselines.
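The gating rule described above reduces to a scalar comparison plus an online threshold adjustment. Below is a minimal, runnable sketch of that idea; the quantile-based calibration and the EMA-based adaptation rule are illustrative assumptions, not necessarily the paper's exact procedure, and the function names are hypothetical.

```python
def calibrate_threshold(calib_losses, update_fraction=0.5):
    """Choose tau on unlabeled data so that roughly `update_fraction`
    of calibration chunks exceed it (a simple quantile rule)."""
    ranked = sorted(calib_losses)
    idx = min(int(len(ranked) * (1.0 - update_fraction)), len(ranked) - 1)
    return ranked[idx]

def gated_ttt_step(recon_loss, tau, rate_ema, target_rate=0.5, step=0.01):
    """Return (do_update, new_tau, new_rate_ema) for one input chunk.

    recon_loss: the reconstruction loss the TTT layer already computes,
    so the gate adds no extra forward pass. The EMA adaptation below is
    an assumed mechanism for keeping the update rate near a target."""
    do_update = recon_loss > tau                # single scalar test, no classifier
    rate_ema = 0.9 * rate_ema + 0.1 * float(do_update)
    tau += step * (rate_ema - target_rate)      # raise tau if updating too often
    return do_update, tau, rate_ema

# Usage: calibrate once on unlabeled losses, then gate each chunk at inference.
tau = calibrate_threshold([0.8, 1.2, 0.5, 2.0, 1.1])
rate = 0.5
for loss in [0.4, 1.9, 1.3, 0.6]:
    update, tau, rate = gated_ttt_step(loss, tau, rate)
    # if update: run the extra TTT gradient step on this chunk
```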