This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Berrenur Saylam;
(2) Özlem Durmaz İncel.
Table of Links
Federated Learning: Collaborative and Privacy-Preserving Model Training
State of the Art Federated Learning Software Platforms and Testbeds
Sensors and Edge-Sensing Devices
Federated Learning Applications on Sensing Devices
Conclusions, Acknowledgment, and References
Abstract
The rapid proliferation of edge sensing devices, such as IoT, mobile, and wearable devices with integrated sensors, has made it possible to monitor ambient characteristics, interact with them, and derive information about the surroundings.
Even though these devices are small and have limited capacity for data storage and processing, they produce vast amounts of data. Application areas where sensor data is collected and processed include healthcare, environmental monitoring (such as air quality and pollution levels), automotive, industrial, aerospace, and agricultural applications.
These enormous volumes of sensing data collected from the edge devices are analyzed using a variety of Machine Learning (ML) and Deep Learning (DL) approaches.
However, analyzing this data on the cloud or a server presents challenges related to privacy, hardware, and connectivity limitations. Federated Learning (FL) is emerging as a solution to these problems: it preserves privacy by jointly training a model without sharing raw data. In this paper, we review FL strategies from the perspective of edge sensing devices as a way to overcome the limitations of conventional machine learning techniques.
We focus on the key FL principles, software frameworks, and testbeds. We also explore the current sensor technologies, properties of the sensing devices and sensing applications where FL is utilized. We conclude with a discussion on open issues and future research directions on FL for further studies.
Keywords: Federated Learning, Edge Computing, Sensing Devices, Mobile Sensing, Sensors
1. Introduction
Sensor-integrated devices, such as IoT devices, wearables, smartphones, drones, and robots, make it possible to collect vast amounts of sensing data to learn more about or extract knowledge from the environment and the surrounding users. Such edge devices can have varied resources regarding memory, battery capacity, and computing power.
Even though some devices may appear more powerful in terms of computing capabilities, they are still considered edge devices because extra resources might not be provided Dey et al. (2013).
Many use cases and application areas utilize the sensing data from these edge devices. For example, GPS data from a phone can be used to determine how congested the local traffic is, or motion-sensor data from a wearable can be used to determine a person’s daily activity patterns Atzori et al. (2010).
Machine learning techniques are commonly applied to sensing data. The total amount of collected data has increased as these devices are used by the masses and connected via wireless interfaces, making it easy to transfer data for further processing.
In traditional learning methods, data is usually gathered in one location on a server-type machine, and then learning algorithms are trained and applied to the aggregated data.
A typical machine learning pipeline involves gathering data from various sensors to extract a meaningful result for a given objective. However, several challenges arise when sensor data is collected and processed across a variety of devices. Since different devices have different hardware, even sensor data acquisition may differ.
There may also be differences in the volume of data gathered, the data sampling frequencies, the battery capacity of the devices, and the cost of communication used to transfer the data to the central data processing unit.
It is also challenging to get meaningful results without sufficient, high-quality data, and such data is often lacking in sensing applications. For example, each device may not capture the same data distribution due to its placement or sensing trajectory. In human activity recognition, for instance, the task is to classify the type of activity being performed.
Since people have different characteristics, the activity types observed may vary among individuals. As a result, the outcomes of learning algorithms, mainly supervised ones, may be skewed because the training data does not contain a diverse set of classes. This makes it necessary to build a model that covers numerous classes, preferably with a significant amount of data for each class.
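To make the point about skewed class coverage concrete, the following minimal sketch compares the label distributions held by a few devices. The device names, activity labels, and counts are invented purely for illustration and do not come from any dataset discussed in this paper.

```python
from collections import Counter

def label_distribution(labels):
    """Return the fraction of samples per class on one device."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def missing_classes(device_labels, all_classes):
    """List the classes that a device never observed."""
    return sorted(set(all_classes) - set(device_labels))

# Hypothetical activity labels recorded by three wearables (illustrative only).
devices = {
    "wearable_a": ["walking"] * 80 + ["sitting"] * 20,
    "wearable_b": ["sitting"] * 50 + ["running"] * 50,
    "wearable_c": ["walking"] * 10 + ["running"] * 10 + ["cycling"] * 80,
}

# The full class set is only visible when all devices are considered together.
all_classes = sorted({label for labels in devices.values() for label in labels})

for name, labels in devices.items():
    print(name, label_distribution(labels), "missing:", missing_classes(labels, all_classes))
```

In this toy setup, no single device observes every activity, so a supervised model trained on one device alone would never see some classes.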
Besides, analyzing the gathered data at one location raises a privacy concern. The General Data Protection Regulation (GDPR) Voigt and Von dem Bussche (2017) is an example of a recently introduced law that protects users' data. Additionally, collecting data at one location increases the communication cost.
Furthermore, relying on a single global model may not be effective from a security perspective, since the whole system can be compromised by one successful cyber attack, which is especially critical in medical domains. Therefore, gathering a large volume of data in one location is not practical.
An alternative to centralized sensor data processing is to run the machine learning models directly on edge devices. However, edge devices have hardware limitations that reduce their ability to execute complex learning algorithms, particularly deep architectures. Deep learning algorithms require higher processing power and energy in addition to memory.
Consequently, we require new mechanisms to adapt the centralized and on-device models to modest computation restrictions Capra et al. (2019). Considering that resource efficiency and privacy are the two critical factors in many sensing applications for edge devices, combining data from many sources while maintaining data privacy is necessary to produce resource-efficient learning systems with such devices.
A promising paradigm for cooperative and privacy-preserving machine learning in dispersed contexts is federated learning (FL) Yang et al. (2019). FL has attracted considerable attention in several disciplines since it allows training on local data while keeping it private and decentralized. Google presented FL in 2016 as a solution to these problems while protecting user privacy.
The goal is to jointly train a model on each device without sharing raw data and produce meaningful results by only sharing model parameters with a central orchestrating unit Yu et al. (2017).
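The training loop described above can be sketched as follows. This is a minimal illustration, assuming weighted parameter averaging in the style of federated averaging (FedAvg), a one-dimensional linear model, and synthetic data. The function names, learning rate, and number of rounds are hypothetical choices, not the procedure of any particular framework reviewed here.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one device's private data.

    weights: (w, b) of the linear model y = w * x + b
    data: list of (x, y) pairs that stay on the device
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Server step: average parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(rng, n, x_low, x_high):
    """Synthetic data (assumption): y = 2x + 1 plus small Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data

def run_federated_training(rounds=300, seed=0):
    rng = random.Random(seed)
    # Each client sees a different input range, so local data is not identically distributed.
    clients = [
        make_client_data(rng, 20, 0.0, 1.0),
        make_client_data(rng, 40, 1.0, 2.0),
        make_client_data(rng, 30, 2.0, 3.0),
    ]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only model parameters travel between the clients and the server.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights

if __name__ == "__main__":
    print(run_federated_training())
```

Note that the server function only ever receives parameter tuples and dataset sizes. The raw `(x, y)` pairs are used solely inside `local_update`, which mirrors the privacy argument made in the text.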
This paper explores Federated Learning, a novel approach to learning, in order to provide a state-of-the-art assessment from the viewpoint of sensor-based edge devices and an overview of FL applications on sensing devices. In the literature, some survey studies Nguyen et al. (2021); Aledhari et al. (2020) focus on federated learning with IoT devices.
In this article, we examine FL from a broader perspective including sensing edge devices, such as IoT sensors, wearables, and other mobile devices. Most current surveys Nguyen et al. (2021); Xu et al. (2021); Wu et al. (2020); Sannara et al. (2021) on FL offer overviews of one specific application case without providing the overall state in the literature.
Sensors are measuring devices used in various application areas, including medical, environmental, automotive, industrial, aerospace, and agricultural. They detect changes in their environment, enabling precise monitoring, control, and decision-making processes. Medical sensors like ECGs and pulse oximeters help diagnose heart conditions, while environmental sensors assess water quality, pH, and dissolved oxygen.
Automotive applications use light sensors, rain sensors, flow sensors, vibration sensors, and gyroscopes for stability and navigation accuracy. The diverse range of sensors used in various domains contributes to producing multimodal sensor data.
In this paper, we revisit this topic from the standpoint of sensing devices, considering the practice in many sensing domains. Our contribution has two parts: a comprehensive review of the state of the art on FL, including major sensing application domains, and a review of state-of-the-art frameworks and testbeds for applying FL, which are not generally included in other surveys. Furthermore, we provide extensive discussions that are relevant for further studies.
The structure of this paper is shown in Figure 1. In Section 2, we describe the FL concept and working procedure, along with its primary challenges, methods, and measurements. In Section 3, we introduce the current platforms for using FL algorithms and a collection of testbeds, to support applicability and encourage future studies. Section 4 describes the different kinds of sensors and devices and their properties.
In Section 5, we bring together the application areas and current real-life usage of FL techniques. We discuss the state-of-the-art approaches for sensing devices and suggest further directions in Section 6, and we conclude in Section 7.