
Part 2: Open Sourcing an Autonomous ARC Solver
In Part 1 of this series, we discussed how our research into automated software development led us to tackle the ARC-AGI challenge, and we described the neurosymbolic approach that we adopted, combining LLMs with an object-centric framework for visual reasoning. Today, we're excited to make this research accessible to the wider community by releasing `arcsolver`, an open-source Python library that implements these ideas.
From Research to Open Source
Our internal solver, which solved 85% of the ARC public training set, was built on repurposed proprietary infrastructure from our core product, CodeWords, and so would not have been practical to release. Instead, we've developed a new implementation that captures the core ideas while being more accessible and extensible.
Built using `nbdev`, the entire codebase is available as a collection of annotated Jupyter notebooks that researchers can easily experiment with and modify.
We believe this will be a valuable resource for the ARC research community—the interactive notebook format makes it particularly easy to explore different solving strategies or experiment with new primitive operations, while the modular design allows researchers to focus on specific components without needing to modify the entire system.
Architecture Overview
The solver follows a three-stage pipeline that mirrors how a human might approach an ARC task:
- Task Description: First, Claude analyses the input-output examples for a given task, generating natural language descriptions of the transformation pattern.
- Solution Generation: Using these descriptions, Claude generates Python code that implements the transformation using our object-centric framework.
- Iterative Refinement: The solver executes the generated code against the training examples and provides detailed feedback to Claude about any errors or incorrect predictions. This feedback loop continues until either a correct solution is found or the attempt budget is exhausted.
Each stage of this pipeline is designed to be modular and extensible. For example, in the description stage, we implemented two different strategies: the solver can use either a "direct" approach (analysing all examples at once) or an "indirect" approach (analysing examples independently, then synthesising the patterns). Many other strategies and modifications could easily be incorporated into the solver.

Key Design Features
At the heart of the solver lies our object-centric framework, allowing solutions to be expressed in terms of high-level objects (`Rectangle`, `Line`, `Bitmap`, etc.) and transformations rather than raw grid manipulations. This abstraction layer makes solutions more interpretable and closer to human reasoning patterns.
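To make the contrast with raw grid manipulation concrete, here is a minimal sketch of the object-centric style. The class and method signatures are illustrative assumptions; `arcsolver`'s real `Rectangle`, `Line`, and `Bitmap` classes will differ in detail.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rectangle:
    """Illustrative object-centric primitive: a filled axis-aligned rectangle."""
    row: int
    col: int
    height: int
    width: int
    colour: int

    def translate(self, drow: int, dcol: int) -> "Rectangle":
        """Return a copy of this rectangle shifted by (drow, dcol)."""
        return Rectangle(self.row + drow, self.col + dcol,
                         self.height, self.width, self.colour)

    def draw(self, grid: list[list[int]]) -> list[list[int]]:
        """Render the rectangle onto a grid of colour indices."""
        for r in range(self.row, self.row + self.height):
            for c in range(self.col, self.col + self.width):
                grid[r][c] = self.colour
        return grid

# A solution reads as object manipulation, not cell-by-cell index arithmetic:
def transform(rect: Rectangle) -> Rectangle:
    return rect.translate(1, 1)
```

The payoff is that generated solutions describe *what* moves and changes, which is both easier for an LLM to get right and easier for a human to audit.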
Caution is always necessary when designing a system that executes LLM-generated code. Our solver includes highly prescriptive prompts defining the context within which Claude’s proposed solutions should operate. Before any code response is executed, it is parsed and validated to ensure it contains classes with the expected method names and types. Only then is it run in an isolated Python subprocess with a restricted set of imports and file access.
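The validate-then-isolate pattern can be sketched with the standard library. The required class and method names below are assumptions for illustration, and this sketch only shows parsing, interface validation, and subprocess isolation; it does not reproduce the solver's restrictions on imports and file access.

```python
import ast
import subprocess
import sys

REQUIRED_CLASS = "Solution"    # hypothetical interface the prompt asks for
REQUIRED_METHOD = "transform"

def validate(source: str) -> bool:
    """Parse the LLM response and confirm it defines the expected class and
    method before anything is executed."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == REQUIRED_CLASS:
            methods = {n.name for n in node.body
                       if isinstance(n, ast.FunctionDef)}
            if REQUIRED_METHOD in methods:
                return True
    return False

def run_isolated(source: str, timeout: float = 5.0) -> str:
    """Run validated code in a separate Python subprocess, so a hanging or
    crashing candidate can be killed without affecting the solver."""
    harness = source + f"\nprint({REQUIRED_CLASS}().{REQUIRED_METHOD}())"
    proc = subprocess.run([sys.executable, "-c", harness],
                          capture_output=True, text=True, timeout=timeout)
    return proc.stdout.strip()
```

Running candidates in a subprocess rather than via `exec` in-process also gives a clean place to apply OS-level limits (timeouts, memory caps) per attempt.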
Performance hasn't been overlooked either. The solver leverages `asyncio` for concurrent API calls and multiprocessing for parallel solution testing, making it efficient even when exploring multiple solution paths simultaneously. This becomes particularly important when scaling up to larger sets of tasks or when experimenting with different solving strategies.
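The concurrency pattern for the API side is straightforward `asyncio` fan-out. The function below is a toy stand-in for a model call, not the real `arcsolver` client code; it shows why issuing the calls together matters.

```python
import asyncio

async def generate_candidate(task_id: str, strategy: str) -> str:
    """Stand-in for one API call to the model (here, just simulated latency)."""
    await asyncio.sleep(0.01)
    return f"{task_id}:{strategy}"

async def generate_all(task_id: str, strategies: list[str]) -> list[str]:
    # All API calls are issued concurrently and awaited together, so total
    # wall time is roughly one call's latency, not len(strategies) calls.
    return await asyncio.gather(
        *(generate_candidate(task_id, s) for s in strategies)
    )
```

CPU-bound solution testing doesn't benefit from `asyncio` because of the GIL, which is why executing and checking candidates is farmed out to a multiprocessing pool instead.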
Research Applications
While our open-source solver uses Claude instead of fine-tuned models, its modular design makes it valuable for ARC research in several ways:
- Prompt Engineering: The solver's description and solution generation stages provide a framework for experimenting with different prompting strategies.
- Training Data Generation: Researchers can use the solver to generate training data for their own models, similar to our iterative fine-tuning approach.
- Framework/DSL Design: The object-centric framework can be modified or extended, or the solver can be used to baseline against alternative solution frameworks or DSLs.
Looking Forward
By open-sourcing this solver, we hope to encourage more research into neurosymbolic approaches to abstract reasoning. The ARC challenge remains far from solved, and we believe that exploring the intersection of language models, symbolic reasoning, and object-centric representations will be crucial for progress toward AGI.
Stay tuned for Part 3, where we'll dive deeper into how we trained models using solver-generated training data, including preference fine-tuning and inference optimisations.
In the meantime, play around with our solver and let us know what you think on X and LinkedIn.