This paper is available on arXiv under a CC 4.0 license.

**Authors:**

(1) Jingjing Wang, School of Computing;

(2) Joshua Luo, The Westminster Schools;

(3) Grace Yang, South Windsor High School;

(4) Allen Hong, D.W. Daniel High School;

(5) Feng Luo, School of Computing.


## Abstract

This paper studies the design of cluster experiments to estimate the global treatment effect in the presence of spillovers on a single network. We provide an econometric framework to choose the clustering that minimizes the worst-case mean-squared error of the estimated global treatment effect. We show that the optimal clustering can be approximated as the solution of a novel penalized min-cut optimization problem computed via off-the-shelf semi-definite programming algorithms. Our analysis also characterizes easy-to-check conditions to choose between a cluster or individual-level randomization. We illustrate the method’s properties using unique network data from the universe of Facebook’s users and existing network data from a field experiment.

*Keywords*: Experimental Design, Spillover Effects, Causal Inference, Cluster Designs. JEL Codes: C10, C14, C31, C54

## 1 Introduction

Consider a (large) population of n individuals connected under a single observed network. Researchers are interested in conducting an experiment to estimate the global average treatment effect, i.e., the difference between the average effect of treating all versus none of the individuals in the population. Treating an individual may generate spillovers to her friends in the network. To capture such effects, researchers conduct a cluster experiment. Individuals are first partitioned into clusters. Within a cluster, either all units are assigned to the treatment or all units are assigned to the control group. Finally, researchers estimate treatment effects by taking a difference between the average outcomes of treated and control units (possibly adjusting for baseline covariates). The cluster design does not require modeling the dependence of individual outcomes on neighbors’ assignments, but it requires a choice of clusters and some assumptions on the extent of spillovers along the network. For example, cluster experiments on online platforms require choosing a partition of the social network, and field experiments require choosing the unit of randomization, such as villages or regions. This raises the question of how many and which clusters to use in experiments.

Typical approaches in economic research assume prior knowledge of many independent clusters. There are many settings where this information is not available, and instead units in the population have different degrees of connections.[1] This paper provides an econometric framework to choose when and how to design the clusters in cluster experiments. Different from existing clustering algorithms geared towards community detection, we motivate the choice of the clusters based on the task of estimating global treatment effects. The choice of clustering must balance two competing objectives: the larger the clusters (and the smaller the number of clusters), the smaller the bias of the estimated global effect, but the larger its variance. We introduce an algorithmic procedure – entitled Causal Clustering – to choose the clustering that minimizes a weighted combination of the worst-case bias and variance as a function of the network and clusters. The worst-case approach encodes uncertainty over the dependence of individual outcomes on neighbors’ assignments. We study (i) whether to run a cluster-level instead of individual-level randomization; (ii) how to cluster individuals (and how many clusters to use).

We focus on a class of models where spillover effects are small relative to the outcomes’ variance but possibly non-negligible for inference. This is formalized in a novel framework of local asymptotics in which individual outcomes depend arbitrarily on neighbors’ treatments and, as n grows, spillovers from neighbors (and possibly also direct effects) converge to zero, but at an arbitrarily slow rate (e.g., slower than n^(-1/2)). This framework encodes the researchers’ uncertainty about the presence (and magnitude) of spillover effects by modeling first-order neighbors’ effects as local to zero, with the convergence rate capturing the expected magnitude of spillovers.[2] The local asymptotic framework we study is consistent with settings with small (but non-negligible) treatment and spillover effects, typical, for instance, of online experiments [e.g., Karrer et al., 2021]. We characterize the optimal clustering as a function of the expected magnitude of the largest spillover effects that the experiment can generate. The largest size of the spillover effects is a key input to our algorithms; characterizing it is necessary for the design of the experiment but can be challenging in practice. This parameter can be informed by previous experiments, in the same spirit as minimum detectable effects used in power analysis [e.g., Baird et al., 2018], or by particular modeling assumptions. We provide guidance to practitioners in Section 6.

Our analysis proceeds as follows. First, we provide a formal characterization of the worst-case bias and variance. We show that the worst-case bias is closely related to a particular notion of between-clusters connectedness, defined as the per-individual average number of friends in other clusters. The worst-case variance can potentially be an arbitrary function of within-clusters and between-clusters covariances: individuals in the same cluster have identical assignments, and individuals in different clusters may share common neighbors. We show that the variance only depends on the average squared cluster size, up to an asymptotically negligible error. This result formalizes the intuition that a larger number of clusters, with a small variance in cluster size, decreases the variance of the estimator.
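Both quantities above are simple functions of the network and the clustering. The following minimal sketch (function names and the toy graph are our own illustrations, not the paper's notation) computes the bias proxy — the per-individual average number of friends in other clusters — and the variance proxy — the average squared cluster size — from an adjacency matrix and a cluster assignment:

```python
import numpy as np

def between_cluster_connectedness(adj, labels):
    """Per-individual average number of friends in other clusters:
    a proxy for the worst-case bias of the cluster design."""
    adj = np.asarray(adj)
    labels = np.asarray(labels)
    # indicator: endpoints lie in different clusters
    cross = labels[:, None] != labels[None, :]
    # adj is symmetric, so each cut edge is counted in both directions
    return (adj * cross).sum() / len(labels)

def avg_squared_cluster_size(labels):
    """Average squared cluster size: a proxy for the variance of the
    difference-in-means estimator under the cluster design."""
    labels = np.asarray(labels)
    _, sizes = np.unique(labels, return_counts=True)
    return (sizes.astype(float) ** 2).sum() / len(labels)

# toy graph: two triangles joined by a single edge
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1

labels = np.array([0, 0, 0, 1, 1, 1])  # cluster by triangle
print(between_cluster_connectedness(adj, labels))  # 0.333...: one cut edge, both directions
print(avg_squared_cluster_size(labels))            # (9 + 9) / 6 = 3.0
```

Clustering by triangle cuts only one edge, so the bias proxy is small; merging everyone into one cluster would drive the bias proxy to zero but maximize the variance proxy, which is exactly the trade-off the paper formalizes.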

We draw the implications of these results for choosing between a cluster experiment (for a given clustering) and assigning treatments independently across individuals (i.e., a Bernoulli design or completely randomized design). Suppose the magnitude of the spillover effects is smaller than the square root of the number of clusters. In that case, the variance component dominates the bias, and a Bernoulli design is preferred (where a Bernoulli design is a special case of a cluster design with clusters containing a single unit). Vice versa, a cluster design is preferred if the bias dominates the variance. Intuitively, because our objective trades off the bias and variance of the estimator, whenever the number of clusters is small, it is best to run a Bernoulli design for any value of spillover effects local to zero. On the other hand, if the number of clusters is sufficiently large, and the cluster design appropriately controls the bias of the estimator, a cluster design is preferred. We provide practitioners with a simple decision rule between cluster and Bernoulli designs that only depends on the number of clusters and the expected magnitude of the spillover effects.
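The decision rule can be sketched as follows. This is a stylized rendering of the comparison described above — the function name is hypothetical, and the constants and exact scalings derived in the paper are omitted:

```python
import math

def choose_design(spillover_magnitude, num_clusters):
    """Stylized decision rule (illustrative only): when the expected
    spillover magnitude is below the square root of the number of
    clusters, the variance term dominates and a Bernoulli design is
    preferred; otherwise the bias dominates and a cluster design is
    preferred. The paper's rule includes constants omitted here."""
    if spillover_magnitude < math.sqrt(num_clusters):
        return "Bernoulli"
    return "cluster"

print(choose_design(2.0, 100))  # modest spillovers, many clusters -> "Bernoulli"
print(choose_design(5.0, 9))    # strong spillovers, few clusters  -> "cluster"
```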

We then turn to the design of the optimal clustering, where the choice is not whether to run a cluster or Bernoulli design, but rather which clustering to use. The choice of optimal clustering reduces to a novel penalized minimum cut optimization problem, with a penalty that depends on the variation in clusters’ size. The program admits a convenient formulation as a sequence of trace-optimization problems, each solved via off-the-shelf semidefinite programming algorithms.
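On a tiny graph, the flavor of the penalized objective can be illustrated by exhaustive search. The brute force below is only a stand-in for the semidefinite-programming relaxation the paper uses at scale, and the penalty weight is a placeholder, not the paper's constant:

```python
import itertools
import numpy as np

def penalized_cut(adj, labels, penalty):
    """Illustrative penalized min-cut objective: the between-cluster
    cut plus a penalty on squared cluster sizes (weights here are
    placeholders, not the paper's derived constants)."""
    labels = np.asarray(labels)
    cross = labels[:, None] != labels[None, :]
    cut = (adj * cross).sum() / 2  # count each cut edge once
    _, sizes = np.unique(labels, return_counts=True)
    return cut + penalty * (sizes.astype(float) ** 2).sum()

def brute_force_two_clusters(adj, penalty):
    """Exhaustive search over all 2-cluster partitions of a tiny graph;
    feasible only for toy examples."""
    n = adj.shape[0]
    best = None
    for bits in itertools.product([0, 1], repeat=n):
        val = penalized_cut(adj, np.array(bits), penalty)
        if best is None or val < best[0]:
            best = (val, bits)
    return best

# toy graph: two triangles joined by a single edge
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1

val, best_labels = brute_force_two_clusters(adj, penalty=0.1)
print(val, best_labels)  # the optimum splits the graph along its two triangles
```

Here the all-in-one-cluster partition cuts no edges but pays the full size penalty (0.1 × 36 = 3.6), while splitting along the triangles cuts one edge and pays 0.1 × 18 = 1.8, for a total of 2.8, so the penalized objective prefers the triangle split.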

We provide an empirical application using unique network data from the universe of Facebook users. We show that our procedure provides an explicit ranking between different clustering algorithms implemented at scale at Facebook. We also illustrate trade-offs in the choice of the clustering algorithm and properties of the graph. A second application using network data from Cai et al. [2015] illustrates the method’s advantages for choosing clusters in field experiments. We show that the choice of the clusters based on village identity is sub-optimal in this application, because the number of village clusters is too small relative to the optimal clustering. We present an alternative choice of clusters.

This paper connects to the literature on experimental design, causal inference with spillover effects, and clustering. Existing methods for experimental designs with spillover effects include cluster and saturation designs, studied in Baird et al. [2018], Basse and Feller [2016], Karrer et al. [2021], Pouget-Abadie [2018], Taylor and Eckles [2018], Viviano [2020b], among others. These papers provide an analysis of the properties of particular designs for a given clustering or clustering algorithm. Different from the current paper, these references either do not study the question of the optimal clustering algorithm for experimental design, or only provide heuristic comparisons of different clustering methods.

A particular class of clustering algorithms is ε-net clustering algorithms, which sequentially assign individuals in the same neighborhood to the same cluster [Eckles et al., 2017, Ugander et al., 2013]. Variants of these algorithms have recently been studied in Leung [2022] for spatial spillovers and Faridani and Niehaus [2022] in more general non-Euclidean spaces. These papers provide an optimal rate for the clusters’ size as a function of how interference decays in space for approximately unbiased estimators. Here, instead, we provide an explicit trade-off and novel characterization of the bias and variance, leveraging local asymptotics. This allows us to characterize the optimal clustering as the solution of a trace-optimization program (different from ε-net clustering). The comparison of cluster and Bernoulli designs through local asymptotics is also a novel contribution.

Additional references on experiments with networks in the absence of cluster experiments include Basse and Airoldi [2018b] and Jagadeesan et al. [2020], who estimate direct instead of the global average treatment effects studied here; Kang and Imbens [2016], who study encouragement designs without focusing on the problem of mean-squared error optimal designs; Viviano [2020a], on experiments on networks using information from a pilot study; and Basse and Airoldi [2018a], who discuss limitations of design-based causal inference under interference. None of these papers study (optimal) cluster designs.

The literature on treatment effects under network interference includes Aronow and Samii [2017], Hudgens and Halloran [2008], Manski [2013], Leung [2020], Athey et al. [2018], Goldsmith-Pinkham and Imbens [2013], Sävje et al. [2021], Ogburn et al. [2017], Manresa [2013], Li and Wager [2022], Leung [2022], among others. The assumption that an individual depends on first-order degree connections is most closely related to Leung [2020].[3] None of the above references study experimental designs. Finally, we relate more broadly to the literature on clustering in economics – see Wooldridge [2003], Leung [2023], Abadie et al. [2017], and references therein – with the difference that here we focus on optimal clustering, rather than taking the clusters as given – and to the literature on graph clustering, including Von Luxburg [2007], Newman [2013a,b], Lei [2019], Lei et al. [2020], Li et al. [2022], and references therein. Different from this last set of papers on graph clustering, this paper focuses on treatment effect estimation instead of community detection.

The remainder of the paper is organized as follows: Section 2 presents the setup; Section 3 characterizes the bias and variance and contrasts cluster and Bernoulli designs; Section 4 formalizes the optimization problem for the optimal clustering; Section 5 presents two applications and numerical studies; and Section 6 contains recommendations for practice.

[1] For example, when using villages as clusters, individuals may interact in the same and nearby villages. See Egger et al. [2022] for an example in cash-transfer programs.

[2] The assumption of spillovers within first-order neighbors can be relaxed here by assuming that higher-order spillovers are an order of magnitude smaller than first-order spillovers.