Comparing CNN and ViT in both Centralised and Federated Learning Scenarios: a Pneumonia Diagnosis Case Study

Abstract

In the last few years, the healthcare industry has seen significant advances in medical image analysis, driven mainly by progress in Deep Learning (DL). Convolutional Neural Networks (CNNs) have long been the reference model for image-processing tasks; recently, however, the advent of Vision Transformers (ViTs) has challenged their dominance. In this work, we explore the potential of ViTs for pneumonia diagnosis, comparing their performance with that of CNNs under different learning approaches. Specifically, we assess the behaviour of From-Scratch Learning (FSL) models and Pre-Trained (PT) models, the latter leveraging Transfer Learning (TL), to highlight their performance differences. Experiments are performed in a Microsoft Azure Cloud laboratory in both centralised and distributed Federated Learning (FL) scenarios, showing that the latter helps to mitigate the potential biases contained in the dataset …