The Abstraction and Reasoning Corpus: A. Additional Details on the Mode

cover
11 Mar 2024

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Mattia Atzeni, EPFL, Switzerland and [email protected];

(2) Mrinmaya Sachan, ETH Zurich, Switzerland;

(3) Andreas Loukas, Prescient Design, Switzerland.

A. Additional Details on the Model

This section describes the LATFORMER architecture providing additional details that were not covered in Section 4.1. As mentioned in Section 4.1, it is possible to design convolutional neural networks that perform all considered transformations of the lattice. Figure 6 shows the architecture of the four expert models that generate translation, rotation, reflection and scaling masks.

Figure 6: Model architecture of all the mask experts that we considered.

All models are CNNs applied to the identity matrix. In the figure, we use the following notation:

• M(δ) T denotes an attention mask implementing a translation by δ along one dimension;

• M(90) R denotes an attention mask implementing a translation by 90◦ ;

• MF denotes an attention mask implementing a reflection along one dimension;

• M(h) S denotes an attention mask implementing an upscaling by h along one dimension.

Using Corollary 3.3, we can derive the kernels of the convolutional layers shown in Figure 6. These kernels are frozen at training time, the model only learns the gating function, denoted as σ in the figure. Notice that all the models follow the same overall structure. However, for scaling, we also learn an additional gate, denoted as σ(MS,M⊤ S ) in the Figure 6. This gate allows the model to transpose the mask and serves the purpose of implementing down-scaling operations (down-scaling is the transpose of up-scaling).

The composition of more actions can be obtained by combining different experts. This can be done either by chaining the experts or by matrix multiplication of the masks. In preliminary experiments, we did not notice any significant difference in performance between the two options and we rely on the latter in our implementation.