TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification


Darya Taratynova (Mohamed bin Zayed University of Artificial Intelligence), Alya Almsouti (Mohamed bin Zayed University of Artificial Intelligence), Beknur Kalmakhanbet (Mohamed bin Zayed University of Artificial Intelligence), Numan Saeed (Mohamed bin Zayed University of Artificial Intelligence), Mohammad Yaqub (Mohamed bin Zayed University of Artificial Intelligence)
The 36th British Machine Vision Conference

Abstract

Congenital heart defect (CHD) detection in ultrasound videos is hindered by image noise and probe positioning variability. While automated methods can reduce operator dependence, current machine learning approaches often neglect temporal information, limit themselves to binary classification, and do not account for prediction calibration. We propose Temporal Prompt Alignment (TPA), a method that leverages a foundation image-text model and prompt-aware contrastive learning to classify fetal CHD in cardiac ultrasound videos. TPA extracts features from each frame of video subclips using an image encoder, aggregates them with a trainable temporal extractor to capture heart motion, and aligns the video representation with class-specific text prompts via a margin-hinge contrastive loss. To enhance calibration for clinical reliability, we introduce a Conditional Variational Autoencoder Style Modulation (CVAESM) module, which learns a latent style vector to modulate embeddings and quantifies classification uncertainty. Evaluated on a private dataset for CHD detection and on a large public dataset, EchoNet-Dynamic, for systolic dysfunction, TPA achieves a state-of-the-art macro F1 score of 85.40% for CHD diagnosis, while also reducing expected calibration error (ECE) by 5.38% and adaptive ECE by 6.8%. On EchoNet-Dynamic’s three-class task, it boosts macro F1 by 4.73% (from 53.89% to 58.62%). The code will be provided upon acceptance.
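
For readers unfamiliar with the calibration metric reported above, the following is a minimal sketch of the standard equal-width-bin expected calibration error. It is not the authors' evaluation code: the function name, the choice of 10 bins, and the half-open binning convention are assumptions made for illustration, and the abstract does not say which binning scheme or adaptive ECE variant the paper uses.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     truthy where the prediction matched the label.
    n_bins:      number of equal-width bins (10 is an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # summed confidence per bin
    hits = [0] * n_bins        # correct predictions per bin
    counts = [0] * n_bins      # samples per bin

    for c, hit in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 joins the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += 1 if hit else 0
        counts[idx] += 1

    ece = 0.0
    for s, h, k in zip(conf_sum, hits, counts):
        if k:  # empty bins contribute nothing
            ece += (k / n) * abs(h / k - s / k)
    return ece
```

A lower value means predicted confidences track observed accuracy more closely. Adaptive ECE typically replaces the equal-width bins with bins holding roughly equal numbers of samples.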

Citation

@inproceedings{Taratynova_2025_BMVC,
author    = {Darya Taratynova and Alya Almsouti and Beknur Kalmakhanbet and Numan Saeed and Mohammad Yaqub},
title     = {TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_751/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
