Frequency-Temporal Feature Integration for Compressed Video Action Recognition


Zhou Jiang wan (Beijing University of Posts and Telecommunications), Yue Ming (Beijing University of Posts and Telecommunications)
The 36th British Machine Vision Conference

Abstract

Compressed-domain action recognition aims to identify human actions by directly leveraging I-frames and P-frames extracted from partially decoded compressed videos. Existing works usually adopt Transformer-based architectures, such as ViT, to perform temporal motion modeling. However, these frameworks tend to overlook high-frequency components (e.g., edge textures and local motion boundaries), which compromises their ability to construct precise spatiotemporal semantic representations. To tackle this issue, we present FreqTNet, a Frequency-Temporal feature integration framework for compressed video action recognition that combines high-frequency edge information with low-frequency global context from both temporal and frequency perspectives. At the feature extraction stage, we design a Frequency-Aware and Temporal-Spatial Embedding (FTE) module to mitigate the performance degradation caused by the ViT framework’s insensitivity to high-frequency cues. During feature fusion and prediction, we introduce Frequency-Temporal Interaction Attention (FTIA), which facilitates hierarchical integration between temporal dynamics and high-frequency features, enhancing sensitivity to motion-related regions. Extensive experiments on Kinetics-400, UCF-101, and HMDB-51 demonstrate that FreqTNet achieves state-of-the-art performance in the compressed domain while maintaining computational efficiency.
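
To make the distinction between high-frequency and low-frequency content concrete, the following is a minimal sketch of splitting a single grayscale frame into a low-frequency part (global context) and a high-frequency part (edges and fine texture) with a 2-D FFT mask. It is only an illustration of the general idea and is not the FTE or FTIA module described in the abstract; the function name `split_frequency_bands` and the `cutoff` value are assumptions chosen for this example.

```python
import numpy as np


def split_frequency_bands(frame: np.ndarray, cutoff: float = 0.1):
    """Split a 2-D grayscale frame into low- and high-frequency parts.

    `cutoff` is a hypothetical radius in normalized frequency
    (cycles per pixel); it is not a value taken from the paper.
    """
    # Normalized frequency coordinates for each FFT bin.
    fy = np.fft.fftfreq(frame.shape[0])[:, None]
    fx = np.fft.fftfreq(frame.shape[1])[None, :]

    # Boolean mask that keeps only bins close to the zero frequency.
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff

    spectrum = np.fft.fft2(frame)
    # The mask is symmetric in frequency, so both inverses are real
    # up to floating-point error; `.real` drops the residual.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


# Example: a frame with a sharp vertical edge in the middle.
frame = np.zeros((32, 32))
frame[:, 16:] = 1.0
low, high = split_frequency_bands(frame)

# The two parts always sum back to the original frame.
print(np.allclose(low + high, frame))
# The high-frequency part has its largest magnitude next to the edge.
print(np.abs(high).max(axis=0).round(2))
```

In this sketch the low-frequency output is a blurred version of the frame, while the high-frequency output concentrates around the edge, which is the kind of cue the abstract says ViT-style models tend to under-use.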

Citation

@inproceedings{wan_2025_BMVC,
author    = {Zhou Jiang wan and Yue Ming},
title     = {Frequency-Temporal Feature Integration for Compressed Video Action Recognition},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_445/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).