Schedule

BMVC conference papers, supplementary material and video presentations can be found at: BMVC Papers (To be added)

BMVC workshop papers can be found at: BMVC Workshop Papers (To be added)

Keynote 1 - Phil Torr
14:00 - 15:00
Title: To be announced

Abstract: Details to be confirmed.

Location: Main Hall, Cutlers' Hall
Oral Session 1 - Multimodal Learning
09:45 - 11:00
Chair: To be announced
09:45 717
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li, Zhengkang Xiang, Joseph West, Kourosh Khoshelham
10:00 768
Interpretable Text-Guided Image Clustering via Iterative Search
Bingchen Zhao, Oisin Mac Aodha
10:15 705
ALSA: Anchors in Logit Space for Out-of-Distribution Accuracy Estimation
Chenzhi Liu, Mahsa Baktashmotlagh, Yanran Tang, Zi Huang, Ruihong Qiu
10:30 509
Dynamic Convolution and Graph-Coupled Attention for Cross-Subject EEG-Vision Decoding
Tianyu Zhang, FAN WAN, Kaili Sun, Xingyu Miao, Yueming Sun, Minye Shao, Yang Long
10:45 69
TACTFL: Temporal Contrastive Training for Multi-modal Federated Learning with Similarity-guided Model Aggregation
Guanxiong Sun, Majid Mirmehdi, Zahraa S. Abdallah, Raúl Santos-Rodriguez, Ian James Craddock, Telmo M Silva Filho
Location: Main Hall, Cutlers' Hall
Poster Session 1 - Multimodal Learning
11:00 - 12:30
Papers Presented
1209
Language-Guided Reinforcement Learning for Hard Attention in Few-Shot Learning
Bahareh Nikpour, Narges Armanfard
493
Audio-Visual Separation with Hierarchical Fusion and Representation Alignment
Han Hu, Dongheng Lin, Qiming Huang, Yuqi Hou, Hyung Jin Chang, Jianbo Jiao
666
Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization
Alberto Compagnoni, Davide Caffagni, Nicholas Moratelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
722
Unsupervised Multimodal Deepfake Detection Through Explicit Intra-Modal and Cross-Modal Inconsistency Discovery
Mulin Tian, Mahyar Khayatkhoei, Joe Mathai, Wael AbdAlmageed
644
Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems
Qihao Yuan, Kaili Li, Jiaming Zhang
284
Multimodal Feature Collaboration and Fusion for Fine-Grained Action Recognition
Xinyu Bian, Dongliang Chang, Yuqi Yang, Lei Chen, Zhanyu Ma
1037
Generative Data Augmentation for Object Point Cloud Segmentation
Dekai Zhu, Stefan Gavranovic, Flavien Boussuge, Benjamin Busam, Slobodan Ilic
859
Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift
Björn Michele, Alexandre Boulch, Gilles Puy, Tuan-Hung VU, Renaud Marlet, Nicolas Courty
392
UMM: A Unified Multi-Modal Model for Low-Level Vision Tasks with Dual-Driven Prompting
Ziqi Luo, Jinxiang Lai, Ruitao Chen, Jinyu Yang, Bin-Bin Gao, Qiang Nie, Jun Liu, Jinfan Wang, Feng Zheng
113
Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning
Ruxiao Duan, Jieneng Chen, Adam Kortylewski, Alan Yuille, Yaoyao Liu
1137
ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates
Hamidreza Dastmalchi, Aijun An, Ali Cheraghian
588
CoT-SD: Chain-of-Thought Semantic Denoising
Yanlin Jiang, Yuchen Liu, Mingren Liu
1123
Evaluating Self-Supervised Learning in Medical Imaging: A Systematic Investigation of Robustness, Generalizability, and Multi-Domain Impact
Valay Bundele, Karahan Sarıtaş, Bora Kargi, Oğuz Ata Çal, Kıvanç Tezören, Zohreh Ghaderi, Hendrik Lensch
650
Prompt-Informed Reinforcement Learning for Visual Coverage Path Planning
Venkat Margapuri
294
Task Progressive Curriculum Learning for Robust Visual Question Answering
Ahmed Akl, Abdelwahed Khamis, Zhe Wang, Ali Cheraghian, Sara Khalifa, Kewen Wang
357
What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos
Qiyang Sun, Qiming Huang, Yang Yang, Hongjun Wang, Jianbo Jiao
828
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
Adriano Fragomeni, Dima Damen, Michael Wray
1030
Language-Guided Decision Override for Adaptive and Retraining-Free Video Anomaly Detection
Ryo Moriyama, Shin Suzuki, Naoshi Kaneko, Kazuhiko Sumi
949
Prompt Image to Watch and Hear: Multimodal Prompting for Parameter-Efficient Audio-Visual Learning
Kai Wang, Shentong Mo, Yapeng Tian, Dimitrios Hatzinakos
37
Cross-Modal Scene Semantic Alignment for Image Complexity Assessment
Yuqing Luo, YIXIAO LI, Jiang Liu, Jun Fu, Hadi Amirpour, Guanghui Yue, Baoquan Zhao, Padraig Corcoran, Hantao Liu, Wei Zhou
376
Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference
Ali Rasekh, Sepehr Kazemi Ranjbar, Simon Gottschalk
408
Multimodal Hate Detection Using Dual-Stream Graph Neural Networks
Jiangbei Yue, Shuonan Yang, Tailin Chen, Jianbo Jiao, ZEYU FU
903
Toward Robust Audio-Visual Synchronization Detection in Egocentric Video with Sparse Synchronization Events
Jordan Voas, Wei-Cheng Tseng, Benoit Vallade, Alex Mackin, David Higham, David Harwath
898
FaceCPT: Toward Cross-Modal Facial Representation Learning with Face-Caption Pre-Training
Md Mahedi Hasan, Shoaib Meraj Sami, Nasser Nasrabadi, Jeremy M. Dawson
84
Continual Vision-and-Language Navigation
SeongJun Jeong, Gi-Cheon Kang, Seongho Choi, Joochan Kim, Byoung-Tak Zhang
479
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Hariprasath Govindarajan, Maciej Wozniak, Marvin Klingner, Camille Maurice, B Ravi Kiran, Senthil Yogamani
232
Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering
Nattapong Kurpukdee, Adrian G. Bors
992
Dual Polarity Prompts with Stochastic Entropy Perturbation for Label Noise
Changhui Hu, Bhalaji Nagarajan, Ricardo Marques, Petia Radeva Ivanova
922
FSLC: Fast Scoring with Learnable Coreset for Zero-shot Industrial Anomaly Detection
Songtao Ni, Yuxin Li, Xu Zhao
875
Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation
Silvio Mazzucco, Carl Persson, Mattia Segu, Pier Luigi Dovesi, Federico Tombari, Luc Van Gool, Matteo Poggi
281
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
Mohamed Fazli Mohamed Imam, Rufael Fekadu Marew, Jameel Hassan Abdul Samadh, Mustansar Fiaz, Alham Fikri Aji, Hisham Cholakkal
43
Split Matching for Inductive Zero-shot Semantic Segmentation
Jialei Chen, Xu Zheng, Dongyue Li, Chong Yi, Seigo Ito, Danda Pani Paudel, Luc Van Gool, Hiroshi Murase, Daisuke Deguchi
458
Intra-Modal Divergence-Weighted Distillation for Vision-Language Models
Youva Addad, Alexis Lechervy, Frédéric Jurie
1071
Stabilizing Open-Set Test-Time Adaptation via Primary-Auxiliary Filtering and Knowledge-Integrated Prediction
Byung-Joon Lee, Jin-Seop Lee, Jee-Hyong Lee
517
Catching the Unknown with Limited Data: Bi-Directional Prompt Tuning in CLIP for Few-Shot Open-Set Adaptation
Moloud Abdar, Md Mehedi Hasan, Biplab Banerjee, Abbas Khosravi, Pietro Lio
373
Back To The Drawing Board: Rethinking Scene-Level Sketch-Based Image Retrieval
Emil Demić, Luka Čehovin Zajc
124
Mask2Act: Predictive Multi-Object Tracking as Video Pre-Training for Robot Manipulation
Junbo Zhang, Kaisheng Ma
1081
ImProvShow: Multimodal Fusion for Image Provenance Summarization
Alexander Black, Jing Shi, Yifei Fan, John Collomosse
857
Lost in Time: A New Temporal Benchmark for VideoLLMs
Daniel Cores, Michael Dorkenwald, Manuel Mucientes, Cees G. M. Snoek, Yuki M Asano
364
Learning from Silence and Noise for Visual Sound Source Localization
Xavier Juanola, Giovana Morais, Magdalena Fuentes, Gloria Haro
866
Image Recognition with Vision and Language Embeddings of VLMs
Illia Volkov, Nikita Kisel, Klara Janouskova, Jiri Matas
370
RGB-Event Fusion for Robust Lane Detection
Jingtao Dong, Hao Zhuang, Hao Yang, Liyuan Pan
Location: Cutlers' Hall
Oral Session 2 - Generative Models and Synthesis
16:30 - 17:30
Chair: To be announced
16:30 26
Guiding a diffusion model with itself using sliding windows
Nikolas Adaloglou, Tim Kaiser, Damir Iagudin, Markus Kollmann
16:45 931
Diffusion Transformer-to-Mamba Distillation for High-Resolution Image Generation
Yuan Yao, Yicong Hong, Difan Liu, Long Mai, Feng Liu, Jiebo Luo
17:00 700
Q-Align: Alleviating Attention Leakage in Zero-Shot Appearance Transfer via Query-Query Alignment
Namu Kim, Wonbin Kweon, Minsoo Kim, Hwanjo Yu
17:15 182
eXtended Multimodal Composite Association Score (xMCAS): A Gender Inclusive Approach to Measurement of Bias in Text-To-Image Diffusion Models
Abhishek Mandal, Susan Leavy, Suzanne Little
Location: Main Hall, Cutlers' Hall
Poster Session 2 - Generative Models and Synthesis
15:00 - 16:30
Papers Presented
864
Geometry-Aware Diffusion Models for Multiview Scene Inpainting
Ahmad Salimi, Tristan Ty Aumentado-Armstrong, Marcus A Brubaker, Konstantinos G. Derpanis
330
FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion
Kazuaki Mishima, Antoni Bigata Casademunt, Stavros Petridis, Maja Pantic, Kenji Suzuki
1006
Verifier Matters: Enhancing Inference-Time Scaling for Video Diffusion Models
Lorenzo Baraldi, Davide Bucciarelli, Zifan Zeng, Chongzhe Zhang, Qunli Zhang, Marcella Cornia, Lorenzo Baraldi, Feng Liu, zheng hu, Rita Cucchiara
627
TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation
Zechao Guan, Feng yan, Shuai Du, Lin Ma, Qingshan Liu
159
Identity-Motion Trade-offs in Text-to-Video Generation
Yuval Atzmon, Rinon Gal, Yoad Tewel, Yoni Kasten, Gal Chechik
138
SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet
Woosung Joung, Daewon Chae, Jinkyu Kim
261
ADIR: Adaptive Diffusion for Image Reconstruction
Shady Abu-Hussein, Tom Tirer, Raja Giryes
128
Uncertainty Diffusion: Parameter-Efficient Depth Refinement via Uncertainty-Guided Diffusion Models
Jeng-Huo Tzeng, Chuan-Yuan Huang, Kuan-Wen Chen
766
Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing
Ekaterina Iakovleva, Fabio Pizzati, Philip Torr, Stéphane Lathuilière
77
Training-Free Synthetic Data Generation with Dual IP-Adapter Guidance
Luc Boudier, Loris Manganelli, Eleftherios Tsonis, Nicolas Dufour, Vicky Kalogeiton
999
Lid-Lab-NeRF: Generating Temporally Consistent, Labelled LiDAR Point Clouds using Neural Radiance Fields
Shrestha Srivastava, Vaibhav Kumar
38
GC-Font: Few-Shot Font Generation via Global Contextual Feature Modelling
Weiran Chen, Guiqian Zhu, Ying Li, Yi Ji, Chunping Liu
593
PanoHair: Detailed Hair Strand Synthesis on Volumetric Heads
Shashikant Verma, Shanmuganathan Raman
558
EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance
Zicheng Duan, Yuxuan Ding, Chenhui Gou, Ziqin Zhou, Ethan Smith, Lingqiao Liu
88
CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models
Yuyang Xue, Edward Moroshko, Feng Chen, Jingyu Sun, Steven G. McDonagh, Sotos Tsaftaris
896
Llama Learns to Direct: DirectorLLM for Human-Centric Video Generation
Kunpeng Song, Tingbo Hou, Zecheng He, Haoyu Ma, Jialiang Wang, Animesh Sinha, Sam Tsai, Yaqiao Luo, Xiaoliang Dai, Li Chen, Xide Xia, Peizhao Zhang, Peter Vajda, Ahmed M. Elgammal, Felix Juefei-Xu
594
Video Dataset Condensation with Diffusion Models
Zhe Li, Hadrien Reynaud, Mischa Dombrowski, Sarah Cechnicka, Franciskus Xaverius Erick, Bernhard Kainz
718
PADS: Plug-and-Play 3D Human Pose Analysis via Diffusion Generative Modeling
Haorui Ji, Hongdong Li
348
PosBridge: Multi-View Positional Embedding Transplant for Identity-Aware Image Editing
PEILIN XIONG, Junwen Chen, HONGHUI YUAN, Keiji Yanai
732
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
Riza Velioglu, Petra Bevandić, Robin Chan, Barbara Hammer
83
RPD-Diff: Region-Adaptive Physics-Guided Diffusion Model for Visibility Enhancement under Dense and Non-Uniform Haze
Ruicheng Zhang, Puxin Yan, Zeyu Zhang, Yicheng Chang, Hongyi Chen, Zhi Jin
832
SIMULDITEX: Single Image Multiscale & Lightweight Diffusion for Texture Modelling
Pierrick Chatillon, Julien Rabin, David Tschumperlé
257
S2V2V: Training-Free Video-to-Video with Sparse Points and Motion Guidance
Xinyu Zhang, Zicheng Duan, Dong Gong, Lingqiao Liu
602
Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
Jun Zheng, Jing Wang, Fuwei Zhao, Xujie Zhang, Xiaodan Liang
28
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, David Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen
154
Seed-to-Seed: Unpaired Image Translation in Diffusion Seed Space
Or Greenberg, Eran Kishon, Dani Lischinski
211
Efficient Image Restoration via Latent Consistency Flow Matching
Elad Cohen, Idan Achituve, Idit Diamant, Arnon Netzer, Hai Victor Habi
628
MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang
962
LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba
Yunxiang Fu, Chaoqi Chen, Yizhou Yu
831
LOGen: Toward LiDAR Object Generation by Point Diffusion
Ellington Kirby, Mickael Chen, Renaud Marlet, Nermin Samet
1064
3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes
Tejaswini Medi, Arianna Rampini, Pradyumna Reddy, Pradeep Kumar Jayaraman, Margret Keuper
755
Audio-Guided Visual Editing with Complex Multi-Modal Prompts
Hyeonyu Kim, Seokhoon Jeong, Seonghee Han, Chanhyuk Choi, Taehwan Kim
704
JOG3R: Towards 3D-Consistent Video Generators
Chun-Hao Paul Huang, Niloy J. Mitra, Hyeonho Jeong, Jae Shin Yoon, Duygu Ceylan
366
HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting
Maksym Ivashechkin, Oscar Mendez, Richard Bowden
784
UDT: Unsupervised Discovery of Transformations between Fine-Grained Classes in Diffusion Models
Youngjae Choi, Hyunsuh Koh, Hojae Jeong, ByungKwan Chae, Sungyong Park, Heewon Kim
852
LoFT: LoRA-fused Training Dataset Generation with Few-shot Guidance
Jae Myung Kim, Stephan Alaniz, Cordelia Schmid, Zeynep Akata
782
Is Safety Checker Still Safe? A Study on the Covert NSFW Text
Xin Li, Kai Chen, XUE YANG, Weijun Shan, Jun Yu, Qing Li
753
Boosting Camera Motion Control for Video Diffusion Transformers
Soon Yau Cheong, Duygu Ceylan, Armin Mustafa, Andrew Gilbert, Chun-Hao Paul Huang
Location: Cutlers' Hall