Scalable Chain of Thoughts via Elastic Reasoning
Salesforce AI Research developed 'Elastic Reasoning,' a framework that enables Large Reasoning Models (LRMs) to operate effectively under strict output length constraints.
Tag
Posts that include this tag.
Researchers at Carnegie Mellon University introduced Length Controlled Policy Optimization (LCPO), an RL-based method that trains large language models to precisely control the length of their reasoning steps.
Researchers at Stanford, UW, and AI2 developed `s1-32B`, an open-source model that achieves state-of-the-art reasoning performance and clear test-time scaling on challenging benchmarks.
DeepSeek-AI developed DeepSeek-R1, an LLM demonstrating that sophisticated reasoning capabilities can emerge through pure outcome-based reinforcement learning without reliance on human-annotated reasoning trajectories.