Poster Abstract: Leveraging General-Purpose Audio Datasets for Vibration-Based Crowd Monitoring in Stadiums

May 1, 2025 · Yen-Cheng Chang, Jesse Codling, Yiwen Dong, Jiale Zhang, Jiasi Chen, Hae Young Noh, Pei Zhang
Abstract
Crowd monitoring in sports stadiums is important for enhancing public safety and improving the audience experience. Existing approaches rely mainly on cameras and microphones, which can cause significant disturbances and often raise privacy concerns. In this paper, we sense floor vibration, which provides a less disruptive and less intrusive means of crowd sensing, to predict crowd behavior. However, because vibration-based crowd monitoring is a newly developed approach, one main challenge is the lack of training data: sports stadiums are usually large public spaces with complex physical activities, which makes labeled data difficult to collect. To overcome this challenge, we present Vibration Leverage Audio (ViLA), a vibration-based method that reduces the dependency on labeled data by pre-training with unlabeled cross-modality data. ViLA first pre-trains a model in an unsupervised manner using commonly available audio datasets and then fine-tunes the model with a small amount of labeled vibration data. Our real-world experiments demonstrate that pre-training the vibration model on publicly available audio data (YouTube8M) achieves up to a 4.5× improvement in accuracy over the same model without audio pre-training.
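The two-stage idea described in the abstract (unsupervised pre-training on one modality, then supervised training on a small labeled set from another) can be illustrated with a minimal sketch. This is not ViLA's model or training procedure, which the abstract does not specify. It is a simplified stand-in that assumes a linear PCA encoder for the unsupervised stage and a logistic-regression head trained on the frozen encoder for the supervised stage. All names (`pretrain_encoder`, `fit_head`, and so on) are hypothetical, and the arrays are synthetic placeholders rather than audio or vibration recordings.

```python
import numpy as np

rng = np.random.default_rng(0)


def pretrain_encoder(unlabeled, n_components):
    """Unsupervised stage (assumed stand-in): fit a linear PCA encoder."""
    mean = unlabeled.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
    return mean, vt[:n_components].T  # projection: (n_features, n_components)


def encode(x, mean, proj):
    """Project inputs into the pre-trained feature space."""
    return (x - mean) @ proj


def fit_head(z, y, lr=0.1, steps=500):
    """Supervised stage: logistic-regression head trained by gradient descent."""
    w = np.zeros(z.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of the log loss
        w -= lr * (z.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(z, w, b):
    """Predict class 1 where the logit is positive."""
    return (z @ w + b > 0).astype(int)


# Synthetic placeholders (assumptions, not real data): a large unlabeled
# "source modality" set and a small labeled "target modality" set.
n_features = 32
source_unlabeled = rng.normal(size=(2000, n_features))
target_x = rng.normal(size=(40, n_features))
target_y = (target_x[:, 0] > 0).astype(int)

# Stage 1: learn the encoder without labels.
mean, proj = pretrain_encoder(source_unlabeled, n_components=8)

# Stage 2: train only the head on the small labeled set.
w, b = fit_head(encode(target_x, mean, proj), target_y)
train_acc = (predict(encode(target_x, mean, proj), w, b) == target_y).mean()
```

The sketch keeps the encoder frozen and trains only the head, which is simpler than fine-tuning a full model as the abstract describes. With random synthetic inputs, the resulting accuracy carries no meaning and says nothing about the reported results.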
Type
Publication
Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems