Doumanis, Ioannis ORCID: 0000-0002-4898-7209, Tsioutas, Konstantinos and Xylomenos, George (2026) An AI-Driven Multi-Feature Approach for Synchronisation and QoE Assessment in Network Music Performance. Applied Sciences, 16 (12). p. 5919.
PDF (VOR) - Published Version. Available under License Creative Commons Attribution. 1MB
Official URL: https://doi.org/10.3390/app16125919
Abstract
Network Music Performance (NMP) refers to remote musical collaboration over a network in applications such as music education, music production, and live performance. In NMP, synchronisation is a critical factor in musicians’ Quality of Experience (QoE). This interpersonal coordination of musical actions is highly sensitive to variable network conditions, particularly to end-to-end delay and signal degradation. Existing evaluations rely mainly on subjective questionnaires or isolated objective descriptors, creating a gap for a unified metric that quantifies synchrony directly from performance signals. To address this gap, we propose the Objective Synchrony Index (OSI), an AI-driven metric that quantifies ensemble synchrony from paired NMP recordings. We computed OSI using a two-tower multi-task convolutional recurrent neural network (CRNN) that estimates synchrony-relevant descriptors from paired Musician A and Musician B audio streams. We introduce two OSI variants: timing-OSI, which captures temporal coordination through offsets, onsets, beats, and tempo coherence; and ensemble-OSI, which extends this formulation by integrating chord agreement and signal fidelity to reflect structural and perceptual aspects of ensemble interaction. We evaluated OSI using recordings from two NMP studies in which eleven pairs of musicians performed under systematically varied delay and sampling-rate conditions. After each performance, musicians completed QoE questionnaires, allowing us to relate OSI and its components to subjective ratings using repeated-measures correlation. Results showed that, under delay, timing-OSI decreased as latency increased and demonstrated construct validity against subjective QoE measures. Higher timing-OSI was associated with greater perceived synchronisation and satisfaction, and with lower perceived delay, irritation, and effort to follow a partner. 
These relationships were most consistent for offset synchrony and most selective for onset synchrony, while beat and tempo remained relatively stable. Under audio-quality degradation, ensemble-OSI remained relatively stable across sampling rates and did not significantly track subjective QoE as a single predictor. Instead, modest component-level associations suggested that satisfaction was higher when temporal stability and fidelity were preserved, whereas irritation was more closely related to reduced chord agreement. Together, these findings support timing-OSI as a promising objective synchrony metric for delay-impaired NMP, while showing that the extended ensemble-OSI requires further perceptual calibration for audio-quality degradations.