Fractional Reasoning in LLMs: A New Way to Control Inference Depth

VedVision HeadLines July 14, 2025

What is included in this article:
  • The limitations of current test-time compute strategies in LLMs.
  • Introduction of Fractional Reasoning (FR) as a training-free, model-agnostic framework.
  • Techniques for latent state manipulation using reasoning prompts and adjustable scaling.
  • Breadth- and depth-based scaling benefits demonstrated across GSM8K, MATH500, and GPQA.
  • Evaluation results showing FR’s superiority over Best-of-N and Majority Vote.
  • Analysis of FR’s behavior across different models, including DeepSeek-R1.

Introduction: Challenges in Uniform Reasoning During Inference

LLMs have shown strong performance across a wide range of domains, with test-time compute playing a crucial role in that progress. This approach enhances reasoning during inference by allocating extra computational resources, such as generating multiple candidate responses and selecting the most suitable one, or refining answers iteratively through self-reflection. However, current test-time compute strategies treat all problems uniformly, applying the same depth of reasoning regardless of query difficulty or structure. In reality, reasoning needs vary widely, and underthinking, overthinking, or misapplied reflection can lead to degraded answers or unnecessary computational cost. LLMs therefore need to adjust their depth of reasoning and level of reflection dynamically.
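
As a rough illustration of the breadth-based strategy mentioned above, here is a minimal Best-of-N sketch: sample several candidate responses and keep the highest-scoring one. The generate_candidate and score functions are hypothetical stand-ins for an LLM sampling call and a reward or verifier model, not part of any particular library or of the paper's code.

```python
import random

def generate_candidate(question: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled LLM response."""
    random.seed(seed)
    return f"candidate answer {random.randint(0, 9)} for: {question}"

def score(question: str, answer: str) -> float:
    """Hypothetical stand-in for an outcome or process reward model."""
    return random.random()

def best_of_n(question: str, n: int = 8) -> str:
    # Breadth-based test-time compute: sample N candidates, return the best-scoring one.
    candidates = [generate_candidate(question, seed=i) for i in range(n)]
    return max(candidates, key=lambda ans: score(question, ans))

print(best_of_n("What is 17 * 24?"))
```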

Prior Work: Latent Steering and Representation Control

Existing research has explored various methods to enhance LLM reasoning through inference-time scaling and latent state control. Chain-of-Thought (CoT) prompting guides models to decompose complex problems into intermediate steps, improving reasoning performance. Outcome reward models (ORMs) and process reward models (PRMs) evaluate generated responses based on the correctness of the final answer or the quality of the intermediate reasoning, respectively. On the representation side, engineering methods apply steering vectors in LLM latent spaces for controlled generation: In-Context Vectors (ICV) extract latent vectors from demonstrations to steer internal states at inference time, and Representation Finetuning (ReFT) learns task-specific low-rank interventions over latent representations.
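
To make the latent-steering idea concrete, the toy sketch below mirrors the ICV-style recipe at a high level: take the mean difference between hidden states computed with and without demonstrations as a steering direction, then add a scaled copy of it to new hidden states at inference time. The tensors are random placeholders for real layer activations, and the recipe is a simplification rather than the exact ICV or ReFT procedure.

```python
import torch

hidden_dim = 16

# Toy placeholders for hidden states at one layer (rows = tokens or examples).
h_with_demos = torch.randn(8, hidden_dim)     # activations with demonstrations in context
h_without_demos = torch.randn(8, hidden_dim)  # activations on the same inputs without them

# ICV-style steering direction: mean activation difference.
steering_vec = (h_with_demos - h_without_demos).mean(dim=0)

# At inference, nudge the current query's hidden states along that direction.
alpha = 0.8                                   # steering strength
h_query = torch.randn(4, hidden_dim)
h_steered = h_query + alpha * steering_vec

print(h_steered.shape)  # torch.Size([4, 16])
```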

The Proposed Framework: Fractional Reasoning for Adaptive Inference

Researchers from Stanford University have proposed Fractional Reasoning (FR), a training-free and model-agnostic framework for improving test-time compute through adaptive reasoning control. FR adjusts reasoning behavior by directly modifying the model’s internal representations: it extracts the latent shift induced by reasoning-promoting inputs such as CoT or reflection prompts and reapplies this shift with a tunable scaling factor. This lets models adjust their depth of reasoning during inference without modifying the input text or requiring fine-tuning. FR supports and enhances two key forms of test-time scaling: (a) breadth-based scaling, such as Best-of-N and majority vote, and (b) depth-based scaling, such as self-reflection.
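
A minimal sketch of this mechanism is shown below, assuming the latent shift is measured at a single transformer layer and reapplied through a forward hook during generation. GPT-2, the layer index, the CoT prefix, and the value of alpha are all illustrative choices made for the sketch, not the paper's models or settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True).eval()

question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
cot_prompt = "Let's think step by step. " + question  # reasoning-promoting input

LAYER = 6  # illustrative layer whose latent shift is reused

@torch.no_grad()
def layer_hidden(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    out = model(**ids)
    # hidden_states[LAYER + 1] is the output of block LAYER; take the last token's state.
    return out.hidden_states[LAYER + 1][0, -1]

# Latent shift induced by the reasoning-promoting prompt.
delta = layer_hidden(cot_prompt) - layer_hidden(question)

alpha = 1.5  # fractional control: values below 1 dampen, above 1 amplify the reasoning behavior

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden-state tensor.
    return (output[0] + alpha * delta,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok(question, return_tensors="pt")
gen = model.generate(**ids, max_new_tokens=40, do_sample=False)
handle.remove()

print(tok.decode(gen[0], skip_special_tokens=True))
```

Because alpha scales a single latent direction rather than editing the prompt, the same question can be re-run at several strengths to trade off brevity against deeper step-by-step reasoning.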

Benchmarking: Performance Gains on Reasoning Tasks

FR is evaluated on three benchmarks that require multi-step reasoning: GSM8K, MATH500, and GPQA, using the standard test sets for GSM8K and MATH500 and the diamond split for GPQA. The main experiments use two competitive open-source instruction-tuned models, Qwen2.5-7B-Instruct and LLaMA-3.1-8B-Instruct, both of which demonstrate strong reasoning capabilities and expose the latent state representations the method requires. FR outperforms standard test-time compute methods on all benchmarks and both models. By adjusting the influence of prompts, it explores the solution space more broadly and makes traditional test-time compute methods more efficient.
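
The breadth-based setting can be sketched roughly as follows: instead of drawing every sample at one fixed prompt strength, candidates are generated under several scaling factors and then aggregated by a majority vote over extracted answers. Both helpers below are hypothetical dummies standing in for a steered sampling call and an answer parser, not an actual API.

```python
import random
from collections import Counter

def generate_with_scale(question: str, alpha: float, seed: int) -> str:
    """Hypothetical helper: one sampled response with the latent shift scaled by alpha."""
    random.seed(hash((question, alpha, seed)) % (2**32))  # dummy; a real helper would call the steered model
    return f"... so the final answer is {random.choice(['42', '42', '42', '40', '44'])}"

def extract_answer(response: str) -> str:
    """Hypothetical helper: pull the final answer token out of a response."""
    return response.rsplit(" ", 1)[-1]

def majority_vote_with_fr(question: str, alphas=(0.5, 1.0, 1.5, 2.0), per_alpha=4) -> str:
    answers = []
    for alpha in alphas:          # vary reasoning strength across the candidate pool
        for seed in range(per_alpha):
            answers.append(extract_answer(generate_with_scale(question, alpha, seed)))
    # Standard majority vote, applied to a more diverse pool of candidates.
    return Counter(answers).most_common(1)[0][0]

print(majority_vote_with_fr("What is 17 + 25?"))
```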

Behavior and Model-Agnostic Generality of Fractional Reasoning

Further analysis of FR examines its behavioral dynamics and its generality across models. Increasing the scaling parameter leads to longer outputs with more detailed multi-step reasoning, confirming that the framework steers model behavior predictably and continuously. FR remains effective even when applied to reasoning-specialized models such as DeepSeek-R1-Distill-Qwen-7B, improving accuracy over standard prompting baselines and demonstrating generality across both general-purpose and specialized LLMs. Performance scaling analysis shows consistent gains as the number of generations increases, with FR achieving higher accuracy than the majority vote baseline across most sampling budgets.

Conclusion: Towards More Dynamic and Efficient LLM Inference

In conclusion, researchers from Stanford University introduced Fractional Reasoning (FR), a training-free and model-agnostic framework that improves test-time compute through adaptive control of reasoning behavior in LLMs. It offers a general and interpretable approach for more precise and efficient allocation of computational effort during inference, overcoming the limitation of uniform reasoning application in current test-time compute strategies. However, the framework currently depends on predefined reasoning directions and lacks automatic selection of scaling factors, indicating future research directions toward adaptive policies for fully dynamic inference.

Check out the Paper. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI, with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.


