OpenAI o3 Achieves Breakthrough on ARC-AGI

Techmeme

OpenAI o3 Achieves Breakthrough on ARC-AGI - 18h

Read more: www.techmeme.com

OpenAI's new o3 model has achieved a significant breakthrough on the ARC-AGI benchmark, demonstrating advanced reasoning through a 'private chain of thought' mechanism. In this approach, the model searches over natural language programs to solve each task. A substantial increase in compute yields a much higher score: 75.7% on the Semi-Private Evaluation set within the $10k compute limit, and 87.5% in a high-compute configuration. The o3 model uses deep learning to guide program search, moving beyond basic next-token prediction. Its ability to recombine knowledge at test time through program execution marks a major step toward more general AI capabilities.
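To make the idea of guided program search concrete, here is a toy sketch in Python. It is not o3's method, which has not been published. The grid transformations, the `prior_score` heuristic (a stand-in for a learned model's guidance), and the `search` function are all hypothetical names invented for illustration. The sketch ranks candidate programs by a cheap prior, then executes each one against the demonstration pairs until one reproduces every output.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def identity(g: Grid) -> Grid:
    return [row[:] for row in g]


def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def flip_vertical(g: Grid) -> Grid:
    return [row[:] for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


# Hypothetical, hand-written program space; a real system would search a far larger one.
CANDIDATES: List[Tuple[str, Callable[[Grid], Grid]]] = [
    ("identity", identity),
    ("flip_horizontal", flip_horizontal),
    ("flip_vertical", flip_vertical),
    ("transpose", transpose),
]


def shape(g: Grid) -> Tuple[int, int]:
    return (len(g), len(g[0]) if g else 0)


def prior_score(fn: Callable[[Grid], Grid], examples: List[Example]) -> float:
    """Assumed stand-in for learned guidance: prefer programs whose
    output shape matches the first demonstration's output shape."""
    first_in, first_out = examples[0]
    return 1.0 if shape(fn(first_in)) == shape(first_out) else 0.0


def search(examples: List[Example]) -> Optional[str]:
    """Try candidates in prior order; return the first that fits every example."""
    ranked = sorted(CANDIDATES, key=lambda c: prior_score(c[1], examples), reverse=True)
    for name, fn in ranked:
        if all(fn(x) == y for x, y in examples):
            return name
    return None


if __name__ == "__main__":
    demo = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
    print(search(demo))
```

The prior only decides the order in which programs are tried; correctness still comes from executing each candidate on the examples, which is the sense in which the search is "guided" rather than replaced by the model.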

The o3 model's performance appears to rest on a form of deep learning-guided program search, in which it explores many paths through program space. This process, which can involve tens of millions of tokens and cost thousands of dollars for a single task, is guided by a base LLM. While o3 appears to be more than just next-token prediction, the core mechanisms of this process remain a matter of speculation. The result highlights how increases in compute can drastically improve performance, and it marks a substantial leap beyond the performance of previous GPT models. Testing figures also show that running o3 in "high efficiency" mode against the 400 public ARC-AGI puzzles cost around $6,677, for a score of 82.8%.
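The per-task cost implied by the high-efficiency figures can be worked out with simple division. The short Python sketch below uses only the numbers quoted above ($6,677 and 400 puzzles); the function name `cost_per_task` is an illustrative choice, not part of any published tooling.

```python
def cost_per_task(total_cost_usd: float, num_tasks: int) -> float:
    """Average spend per task, given a total cost and a task count."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return total_cost_usd / num_tasks


if __name__ == "__main__":
    # Figures quoted in the text: about $6,677 across 400 public puzzles.
    average = cost_per_task(6677, 400)
    print(f"Average cost per task: ${average:.2f}")
```

That works out to roughly $17 per puzzle in high-efficiency mode, far below the thousands of dollars per task mentioned for the most compute-intensive runs.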

Original img attribution: https://assets.bwbx.io/images/users/iqjWHBFdfxIU/ijetWtOiZdy8/v1/1200x825.jpg

References:

arcprize.org - OpenAI's new o3 system - trained on the ARC-AGI-1 Public Training set - has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. - 1d
Simon Willison's Weblog - OpenAI o3 breakthrough high score on ARC-AGI-PUB - 1d
Techmeme - Techmeme report about the o3 model. - 22h
TechCrunch - TechCrunch reporting on OpenAI's unveiling of o3 and o3-mini with advanced reasoning capabilities. - 22h
Ars Technica - OpenAI announces o3 and o3-mini, its next simulated reasoning models - 20h
THE DECODER - The Decoder article about OpenAI o3 models - 20h
www.heise.de - OpenAI's new o3 model aims to outperform humans in reasoning benchmarks - 20h
NextBigFuture.com - OpenAI Releases O3 Model With High Performance and High Cost - 20h
www.techmeme.com - Techmeme post about OpenAI o3 model - 14h

Classification:

HashTags: OpenAIO3 ARCAGI DeepLearning
Company: OpenAI
Target: AI research
Attacker:
Product: o3
Feature: private chain of thought
Type: AI
Severity: Major
