Technology to Make Data Processing More Efficient and Safe
Scientists at ETU "LETI", together with Smartilizer, have studied a new approach to data analysis that does not require transferring data from its source to a centralized repository.
The researchers tested the effectiveness of existing open-source systems on different data sets: sensor readings from moving vehicles and X-ray images from pneumonia patients. To assess how applicable the systems are to IoT, the authors evaluated their ease of use and installation, analysis capabilities, accuracy, and performance. The paper was published in the journal Sensors.
The Internet of Things (IoT) is the network of physical objects ("things") that are embedded with sensors, software, and other technologies in order to connect and exchange data with other devices and systems over the Internet. In a smart home, for example, appliances are connected to each other and to an external controller, so they can be operated from a cell phone. The standard architecture of an IoT system consists of three layers. The first, the device layer, comprises the hardware devices that produce and collect the data. The middle layer transfers data from the devices to the third, the application layer, which provides the services or applications that integrate or analyze the data.
Traditional approaches to such systems collect data from IoT devices in one centralized repository for further analysis. However, this is not always feasible because of the large volume of collected data, the limited bandwidth of communication channels, and security and privacy requirements. The main drawbacks of centralization are longer total processing time, heavier network traffic, and a higher risk of unauthorized access to the data. New approaches to analyzing such data are therefore being developed. One of them is federated learning, which analyzes data directly at the sources and then combines (federates) the results of each local analysis to yield a result comparable to that of traditional centralized processing. Because the raw data is processed locally, both the network load and the risk of exposure are lower.
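The idea can be illustrated with a minimal sketch of federated averaging, one common way of combining locally trained models. It is not the method of any framework the researchers tested: the linear model, the function names (`local_update`, `federated_average`, `train_federated`), the learning rate, and the sample data are all assumptions made for illustration. Each source trains on its own data, and only the model parameters, never the raw samples, are sent for averaging.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair held by one data source
Model = Tuple[float, float]   # (weight, bias) of a linear model y = w*x + b


def local_update(model: Model, data: List[Sample],
                 lr: float = 0.05, epochs: int = 5) -> Model:
    """Run a few gradient-descent steps on one source's private data."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[Model, int]]) -> Model:
    """Average the local models, weighted by each source's sample count."""
    total = sum(n for _, n in updates)
    w = sum(m[0] * n for m, n in updates) / total
    b = sum(m[1] * n for m, n in updates) / total
    return w, b


def train_federated(clients: List[List[Sample]], rounds: int = 200) -> Model:
    """Repeat: send the model out, train locally, average the results."""
    model: Model = (0.0, 0.0)
    for _ in range(rounds):
        # Only (model, sample count) leaves each source, not the samples.
        updates = [(local_update(model, data), len(data)) for data in clients]
        model = federated_average(updates)
    return model


# Hypothetical data: three sources whose samples all follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(-1.0, -1.0), (0.5, 2.0)],
]
weight, bias = train_federated(clients)
print(round(weight, 2), round(bias, 2))
```

With this toy data the averaged model should approach a weight of 2 and a bias of 1, the same line a centralized fit would find, even though no source ever shares its samples. Real frameworks add secure communication, handling of unreliable clients, and support for much larger models.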
One of the main applications of this AI-based technology is protecting the security and privacy of personal data, which is collected around the world every second. The issue became especially important after the adoption of regulations such as the GDPR in the European Union, the CCPA in California (USA), and the PDPA in Singapore. They require that personal data be processed transparently, for an explicitly stated purpose, and with the consent of the data subject.
For example, in a smart home, the data sources are the devices in each apartment: the alarm clock, the bathroom faucet, the underfloor heating, and the lights. In the traditional approach, all data from each apartment is collected in a centralized repository and used to train a model (such as a neural network). When the alarm goes off, such a model "knows" that the heating should start warming up, the bath should be filled, and the lights in certain rooms should turn on. On the one hand, data collection is necessary to train such a model, because the more data it sees, the smarter it becomes. On the other hand, information about you (when you get up, when you go to the bathroom, when you eat, and so on) becomes available to someone else, and you do not know how it will be used. With federated learning, the data never leaves your apartment.
The ETU "LETI" scientists tested systems from different developers: Google, WeBank, Baidu, the OpenMined community, and others. The authors ran a series of experiments with them on three data sets. The first contained the parameters of a moving passenger car (average speed, engine load, etc.) and was used to assess the driving style, the road surface, and the traffic state. The second contained similar signal data for dumpers, and its analysis provided information about how the machines were operating. Finally, the third set was X-ray images from 5,232 patients (3,883 images with signs of pneumonia and 1,349 normal images). Its analysis made it possible to distinguish sick people from healthy ones.
"We compared all currently available open-source federated learning frameworks and evaluated their capabilities. It turned out that the approach gives fairly accurate results in all three cases. However, not all of the frameworks are suitable for industrial development yet. Some systems are still in their early stages and not ready for widespread use. Nevertheless, the federated learning technology itself is very relevant and rapidly developing," says Ivan Kholod, Dean of the Faculty of Computer Science and Technology at ETU "LETI".
"For example, right now, given the heavy load on servers that process data on coronavirus infection, its spread, and other aspects, this technology would make it possible to quickly analyze data from different hospitals and compile statistics. Patients' rights would not be violated, because patient information would not be transferred outside the hospital."
Currently, the Department of Computer Engineering of ETU "LETI" is developing its own federated learning framework. The development group includes Ivan Kholod, Dean of the Faculty of Computer Science and Technology; Evgeniya Novikova, Associate Professor of the Department of Information Systems; Dmitry Fomichev and Evgeny Yanaki, first-year master's students of the faculty; and Evgeny Shalugin, a fourth-year undergraduate student of the faculty.