This study explores the application of data fusion (DF) in tele-rehabilitation (TR). The idea is to combine the patient's cognitive and motor information, captured during remote sessions with a low-cost webcam, in order to train a Machine Learning (ML) model that assesses the patient's outcomes. Data is captured by analysing video sequences of rehabilitation sessions with Facial Expression Recognition (FER) and Pose Estimation (PE) models, which provide cognitive and motor information, respectively. Furthermore, we apply a Long Short-Term Memory (LSTM) model to analyse and classify temporal sequences of both facial expression and movement data. Two DF techniques, i.e., early fusion and intermediate fusion, are compared by training an LSTM model on a dataset composed of skeletal movement data (from the UI-PRMD dataset) and facial mesh data. The early fusion approach combines raw features, whereas …
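To make the distinction between the two compared strategies concrete, the following is a minimal sketch of early versus intermediate fusion around an LSTM classifier. It assumes PyTorch as the framework; the layer sizes, class names, and input dimensions are illustrative and do not reflect the actual architecture or hyperparameters used in the study.

```python
# Illustrative sketch only: framework (PyTorch), dimensions, and module names are assumptions.
import torch
import torch.nn as nn


class EarlyFusionLSTM(nn.Module):
    """Early fusion: concatenate raw pose and facial features at each time step,
    then feed the combined sequence to a single LSTM."""
    def __init__(self, pose_dim, face_dim, hidden_dim, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim + face_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, pose_seq, face_seq):
        # pose_seq: (batch, time, pose_dim); face_seq: (batch, time, face_dim)
        fused = torch.cat([pose_seq, face_seq], dim=-1)   # feature-level (early) fusion
        _, (h_n, _) = self.lstm(fused)
        return self.classifier(h_n[-1])


class IntermediateFusionLSTM(nn.Module):
    """Intermediate fusion: one LSTM per modality; merge the learned hidden
    representations before the final classification layer."""
    def __init__(self, pose_dim, face_dim, hidden_dim, num_classes):
        super().__init__()
        self.pose_lstm = nn.LSTM(pose_dim, hidden_dim, batch_first=True)
        self.face_lstm = nn.LSTM(face_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, pose_seq, face_seq):
        _, (h_pose, _) = self.pose_lstm(pose_seq)
        _, (h_face, _) = self.face_lstm(face_seq)
        fused = torch.cat([h_pose[-1], h_face[-1]], dim=-1)  # representation-level (intermediate) fusion
        return self.classifier(fused)


if __name__ == "__main__":
    # Toy tensors standing in for skeletal (pose) and facial-mesh (face) sequences.
    pose = torch.randn(4, 50, 66)   # e.g. 22 joints x 3 coordinates per frame
    face = torch.randn(4, 50, 128)  # e.g. a reduced facial-mesh feature vector per frame
    print(EarlyFusionLSTM(66, 128, 64, 2)(pose, face).shape)         # (4, 2)
    print(IntermediateFusionLSTM(66, 128, 64, 2)(pose, face).shape)  # (4, 2)
```

In this sketch, the design difference is where the modalities meet: early fusion lets a single LSTM learn cross-modal temporal patterns from the concatenated raw features, while intermediate fusion keeps modality-specific temporal encoders and fuses only their summary representations.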