The ARC Benchmark: Related Work You Should Know About

cover
11 Apr 2024

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Sebastien Ferr ´e, Univ Rennes, CNRS, Inria, IRISA Campus de Beaulieu, 35042 Rennes, France and [email protected].

Abstract & Introduction

Abstraction and Reasoning Corpus (ARC)

Related Work

Object-centric Models for ARC Grids

MDL-based Model Learning

Evaluation

Conclusion and Perspectives, References & Supplementary Materials

The ARC benchmark is recent and not many approaches have been published so far. All those we know define a DSL (Domain-Specific Language) of programs that transform an input grid into an output grid, and search for a program that is correct on the training examples [10,3,23,2]. The differences mostly lie in the primitive transformations (prior knowledge) and in the search strategy.

It is tempting to define more and more primitives like the Kaggle winner did, hence more prior knowledge, but this means a less intelligent system according to Chollet’s measure. To guide the search in the huge program space, those approaches use either grammatical evolution [10], neural networks [3], search tree pruning with hashing and Tabu list [23], or stochastic search trained on solved tasks [2].

A difficulty is that the output grids are generally only used to score a candidate program so that the search is kind of blind. Alford [3] improves this with a neural-guided bi-directional search that grows the program in both directions, from input and output. Xu [23] compares the in-progress generated grid to the expected grid but this limits the approach to the tasks whose output grids have the same size and same objects as the input grids.

DSL-based approaches have a scaling issue because the search space increases exponentially with the number of primitives. Ainooson [2] alleviates this difficulty by defining high-level primitives that embody specialized search strategies. We compare and discuss their performance in the evaluation section.

Johnson et al. [14] report on a psychological study of ARC. It reveals that humans use object-centric mental representations to solve ARC tasks. This is in contrast with existing solutions that are based on grid transformations. Interestingly, the tasks that are found the most difficult by humans are those based on logics (e.g., an exclusive-or between grids) and symmetries (e.g., rotation), precisely those most easily solved by DSL-based approaches.

The study exhibits two challenges: (1) the need for a large set of primitives, especially about geometry; (2) the difficulty to identify objects, which can be only visible in part due to overlap or occlusion. A valuable resource is LARC, for Language-annotated ARC [1], collected by crowd-sourcing.

It provides for most training tasks one or several natural programs that confirm the object-centric and declarative nature of human representations. A natural program is short textual descriptions produced by a participant that could be used by another participant to generate test output grids (without access to the training examples).

Beyond the ARC benchmark, a number of work has been done in the domain of program synthesis, which is also known as program induction or programming by examples (PbE) [17]. An early approach is Inductive Logic Programming (ILP) [19], where target predicates are learned from symbolic representations.

PbE is used in the FlashFill feature of Microsoft Excel 2013 to learn complex string processing formulas from a few examples [18]. Dreamcoder [8] alternates a wake phase that uses a neurally guided search to solve tasks, and a sleep phase that extends a library of abstractions to compress programs found during wake. Bayesian program learning was shown to outperform deep learning at parsing and generating handwritten world’s alphabets [16].


This paper is available on Arxiv under CC 4.0 license.