Schedule

BMVC conference papers, supplementary material and video presentations can be found at: BMVC Papers

BMVC workshop papers can be found at: BMVC Workshop Papers

All the rooms listed are in Cutlers' Hall.

Keynote 3 - Angela Dai
09:00 - 10:00
Title: Can Transformers Speak Geometry?

Abstract: What if generating a 3D mesh were as natural as predicting the next word in a sentence? Autoregressive modeling has rapidly become a unifying learning paradigm across data modalities, from language to images, and now offers a compelling approach for 3D geometry. This talk explores how transformer-based autoregressive models enable mesh generation by representing meshes as sequences. Framing mesh generation as a next-token prediction problem enables new ways to handle the compact, irregular structure of human-designed 3D assets, directly compatible with downstream graphics and vision applications. We explore sequence formulation and data representation, and address practical challenges in scaling to high-resolution meshes and interactive synthesis. This will enable more accessible and democratized 3D content creation, paving the way for interactive design, rapid prototyping, and simulation-ready assets, and unlocking new possibilities for both creative and computational exploration of 3D geometry.

Location: Main Hall
Doctoral Consortium
10:00 - 13:00
Chairs: Dr Cass Zhixue Zhao and Dr Yang Long
Uncertainty Propagation and Robustness in Rotation Averaging
Yaroslava Lochman (Chalmers University of Technology)
Complex-valued Neural Networks in Computer Vision
Saurabh Yadav (IIIT Delhi)
Robust and Generalizable 3D Perception for Autonomous Systems
Maciej Wozniak (KTH Royal Institute of Technology)
Constructing and Maintaining Geometric Digital Twins of Road Conditions
Percy Pui Hei Lam (University of Cambridge)
Visual Perception and Reconstruction under Challenging Lighting Conditions
Ziteng Cui (The University of Tokyo)
Generalizable Multimodal Brain Decoding
Weihao Xia (University College London)
Towards Visually-Plausible and Controllable 3D Representations
Dongqing Wang (EPFL)
Location: Goodwin Room
Poster Session 5 - Advanced Architectures, Robustness and Efficiency
10:00 - 11:45
Drawing Room Posters
239 REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation (Oral Presentation)
Maëlic Neau, Paulo Eduardo Santos, Anne-Gwenn Bosser, Akihiro Sugimoto, Cedric Buche
511 LieMorph: Transformer-based Image Registration Using Flows on Lie Groups (Oral Presentation)
Johannes Bostelmann, Jan Lellmann
567 Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection (Oral Presentation)
Zihan Wang, Samira Ebrahimi Kahou, Narges Armanfard
721 Gromov Wasserstein Optimal Transport for Semantic Correspondences (Oral Presentation)
Francis Snelgar, Stephen Gould, Ming Xu, Liang Zheng, Akshay Asthana
1132 Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning (Oral Presentation)
Wenyi Lian, Patrick Micke, Joakim Lindblad, Nataša Sladoje
22 SynGround: Learning from Synthetic Data for Visual Grounding
Ruozhen He, Ziyan Yang, Paola Cascante-Bonilla, Alexander C. Berg, Vicente Ordonez
56 Pointly-Supervised Weak-Shot Semantic Segmentation via Dual Mapping Transfer
Wenhui Jiang, Ruikang Luo, Zeyu Luo, Xiaowei Zhao, Junjie Chen, Yuming Fang
126 CLAIR: CLIP-Aided Weakly Supervised Zero-Shot Cross-Domain Image Retrieval
Chor Boon Tan, Conghui Hu, Gim Hee Lee
147 Spatial-Frequency Domain Aggregation for Visual Place Recognition
Chaoqun Wang, Shaobo Min
150 WTNet: A Weather Transfer Network for Domain-Adaptive All-In-One Adverse Weather Image Restoration
Si-Yu Huang, Fu-Jen Tsai, Chia-Wen Lin, Yen-Yu Lin
165 Four eyes see more than two: Dataset Distillation with Mixture-of-Experts
Jia-Jiun Yao, Sheng-Feng Yu, Wei-Chen Chiu
184 Graph Similarity Learning of Floor Plans
Casper van Engelenburg, Jan van Gemert, Seyran Khademi
259 Revisiting Entropy Minimization for Long-Sequence Continual Test-Time Adaptation
Wei Qin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar
276 Learning Event-guided Exposure-agnostic Video Frame Interpolation via Adaptive Feature Blending
Junsik Jung, Yoonki Cho, Woo Jae Kim, Lin Wang, Sung-Eui Yoon
283 AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields
Woo Jae Kim, Kyu Beom Han, Yoonki Cho, Youngju Na, Junsik Jung, Sooel Son, Sung-Eui Yoon
291 Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations
Nils Hütten, Florian Hölken, Hasan Tercan, Tobias Meisen
292 SVAC: Scaling Is All You Need For Referring Video Object Segmentation
Li Zhang, Haoxiang Gao, Zhihao Zhang, Luoxiao Huang, Tao Zhang
320 Extreme Model Compression with Structured Sparsity at Low Precision
Dan Liu, Nikita Dvornik, Xue Liu
333 Fast Self-Supervised depth and mask aware Association for Multi-Object Tracking
Milad Khanchi, Maria Amer, Charalambos Poullis
334 DAOVI: Distortion-Aware Omnidirectional Video Inpainting
Ryosuke Seshimo, Mariko Isogawa
339 Events Meet Dynamic Mode Decomposition: Capturing the Spatiotemporal Dynamics of Moving Objects
Zhouning Du, Israr Ulhaq, Thanh Thi Huyen Phan, Yuichiro Yoshimura, Jigyasa Chand, Truong Vinh Truong Duy
342 Flatness-aware Curriculum Learning via Adversarial Difficulty
Hiroaki Aizawa, Yoshikazu Hayashi
346 Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID
Priyank Pathak, Yogesh S Rawat
354 Dual-Branch Network via Multiple Illumination-Aware Representation Learning for Steel Surface Defect Classification
Yong Seok Oh, Min Geol Kim, Bogyeong Kim, Jun Young Kim, Hyeongseob Jo, Jae Hyeon Park, Gyoomin Lee, Sung In Cho
381 Permutation-Invariant Polar Harmonic Pooling for Point-based Neural Networks
Jaspreet Singh Maan, Grzegorz Cielniak
398 Knowledge Distillation via Cross Supervising with Attention for Remote Sensing Object Detection
Kefan Zhan, An Luo, Yunpeng Zeng, Jiaxin Li, Yuan Zhang, Kai Hu
404 Exploring Histogram-based Color Constancy
David R. Treadwell IV, Yunxuan Rao, Daniel Y. Bi, Bruce A. Maxwell
416 Beyond Softmax: Dual-Branch Sigmoid Architecture for Accurate Class Activation Maps
Yoojin Oh, Junhyug Noh
444 Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering
Haopeng Li, Mohammed Bennamoun, Jun Liu, Hossein Rahmani, Qiuhong Ke
457 A Novel Local Focusing Mechanism for Deepfake Detection Generalization
Mingliang Li, Hanxi Li, Lin Yuanbo Wu, Changhong Liu
Location: Drawing Room
10:00 - 11:45
Hadfield Hall Posters
495 An Explorative Study on Abstract Images and Visual Representations Learned from Them
Haotian LI, Jianbo Jiao
500 UFD-KD: Unified Frequency Decoupled Knowledge Distillation
Sihan Lu, Yang Zheng, Jie Liu, Zhenghao Xi
546 Multi-Method Ensemble for Out-of-Distribution Detection
Lucas RAKOTOARIVONY
577 Zero-Shot CFC: Fast Real-World Image Denoising based on Cross-Frequency Consistency
Yanlin Jiang, Yuchen Liu, Mingren Liu
614 PerSense: Training-Free Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui, Muhammad Umer Sheikh, Hassan Abid, Muhammad Haris Khan
632 Segmentation Assisted Incremental Test Time Adaptation in an Open World
Manogna Sreenivas, Soma Biswas
661 Leveraging Sparsity for Efficient Inference of High-Resolution Vision Foundation Models
Xin Xu, Jason Kuen, Brian L Price, Kangning Liu, Zijun Wei, Yu-Xiong Wang
668 MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression
Ofir Gordon, Ariel Lapid, Elad Cohen, Yarden Yagil, Arnon Netzer, Hai Victor Habi
680 On the Role of Individual Differences in Current Approaches to Computational Image Aesthetics
Li-Wei Chen, Ombretta Strafforello, Anne-Sofie Maerten, Tinne Tuytelaars, Johan Wagemans
701 Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders
Faizan Farooq Khan, Vladan Stojnić, Zakaria Laskar, Mohamed Elhoseiny, Giorgos Tolias
707 Catch Your Concepts: A Flexible Concept Locator for Interpretable Visual Recognition
Qiyang Wan, Ruiping Wang, Chengzhi Gao, Xilin Chen
709 TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection
Hongyang Xie, Hongyang He, Victor Sanchez
762 Learning a Neural Association Network for Self-supervised Multi-Object Tracking
Shuai Li, Michael Burke, Subramanian Ramamoorthy, Juergen Gall
786 TAG: A Simple Yet Effective Temporal-Aware Approach for Zero-Shot Video Temporal Grounding
Jin-Seop Lee, SungJoon Lee, Jaehan Ahn, YunSeok Choi, Jee-Hyong Lee
788 Vision Backbone Efficient Selection for Image Classification in Low-Data Regimes
Joris Guerin, Shray Bansal, Amirreza Shaban, Paulo Mann, Harshvardhan Gazula
824 A Unified Framework for High-Frame-Rate HDR Video Synthesis
Hue Nguyen, Trevor Dalton Canham, Michael S. Brown
825 Modular Embedding Recomposition for Incremental Learning
Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara
850 C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression
Baptiste Bauvin, Loïc Baret, Ola Ahmad
873 Exploring Image Representation with Decoupled Classical Visual Descriptors
Chenyuan Qu, Hao Chen, Jianbo Jiao
955 Clean Sample Selection and Noisy Sample Rematching for Text-Based Pedestrian Retrieval
Daiqiang Li, Weicheng Zhang, yuanyuan wu, Honggang Chen
1007 Is Structural Awareness the Key to Event Camera Data Cleansing for Enhancing Veracity?
Haiyu Li, Charith Abhayaratne
1015 In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models
Hu Wang, Ibrahim Almakky, Congbo Ma, Numan Saeed, Mohammad Yaqub
1051 Conditional Prototype Learning for Few-Shot Object Detection
Zhenwei He, Xinye Liao, Xin Feng
1091 CLIMB-3D: Class-Incremental Imbalanced 3D Instance Segmentation
Vishal Thengane, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Lu Yin, Xiatian Zhu, Salman Khan
1118 Conformal Predictors for Efficient Video Text Spotting
Amor Ben Tanfous, Sankha Subhra Mukherjee, Neil M. Robertson
1161 Evaluating Perceptual Distance Models by Fitting Binomial Distributions to Two-Alternative Forced Choice Data
Alexander Hepburn, Raúl Santos-Rodriguez, Javier Portilla
1171 Conflict-Aware Adversarial Training
Zhiyu Xue, Haohan Wang, Yao Qin, Ramtin Pedarsani
1180 MO-SHW: Hierarchy-Aware Multi-Objective Optimization for Open-World Segmentation
Erico M. Pereira, Frederico Gadelha Guimarães, Jefersson A Dos Santos
1183 Contrastive Point Feature Matching for Open-world Object Counting
Ngo Xuan Cuong
1187 Out-of-Distribution Detection from Small Training Sets using Bayesian Neural Network Classifiers
Kevin Raina, Tanya Schmah
Location: Hadfield Hall
Oral Session 6 - Advanced Architectures, Robustness and Efficiency
11:45 - 13:00
Chair: Márjory Da Costa-Abreu (Sheffield Hallam University)
11:45 1132
Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning
Wenyi Lian, Patrick Micke, Joakim Lindblad, Nataša Sladoje
12:00 511
LieMorph: Transformer-based Image Registration Using Flows on Lie Groups
Johannes Bostelmann, Jan Lellmann
12:15 239
REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation
Maëlic Neau, Paulo Eduardo Santos, Anne-Gwenn Bosser, Akihiro Sugimoto, Cedric Buche
12:30 567
Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection
Zihan Wang, Samira Ebrahimi Kahou, Narges Armanfard
12:45 721
Gromov Wasserstein Optimal Transport for Semantic Correspondences
Francis Snelgar, Stephen Gould, Ming Xu, Liang Zheng, Akshay Asthana
Location: Main Hall
Poster Session 6 - Human & Motion Analysis (+ Doctoral Consortium Posters)
14:00 - 15:45
Drawing Room Posters
66 ST-GDance: Long-Term and Collision-Free Group Choreography from Music (Oral Presentation)
Jing Xu, Weiqiang Wang, Cunjian Chen, Jun Liu, Qiuhong Ke
338 GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition (Oral Presentation)
Tianyue Wang, Shuang Yang, Shiguang Shan, Xilin Chen
566 OpenHuman4D: Open-Vocabulary 4D Human Parsing (Oral Presentation)
Keito Suzuki, Bang Du, Runfa Li, Kunyao Chen, Lei Wang, Peng Liu, Ning Bi, Truong Nguyen
610 HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization (Oral Presentation)
Hyogun Lee, Joohyun Chang, Soyeon Hong, Seong Jong Ha, Dongho Lee, Seong Tae Kim, Jinwoo Choi
675 Jack of many Faces: A Step Towards Facial Expression and Physiological State Analysis with a Single Network (Oral Presentation)
Abdullah Tariq, Martin Masek, R Muhammad Atif Azad, Syed Zulqarnain Gilani
55 Piezoelectric Acoustic Sensing for Sitting Pose Classification
Yuuki Shibuya, Go Irie
117 RAPrivacy: a Readable Anonymizer for Privacy Preserving Action Recognition
Zi-Zhen Wang, Yen-Lung Chu, Pei-Chun Tsai, Kuan-Wen Chen
181 LuKAN: A Kolmogorov-Arnold Network Framework for 3D Human Motion Prediction
Md Zahidul Hasan, Abdessamad Ben Hamza, Nizar Bouguila
220 LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking
Martha Teiko Teye, Ori Maoz, Matthias Rottmann
265 Enhancing Visual Tracking by Leveraging High-frequency Information within Event Signals
Yuheng Jiang, Hebei Li, Dachun Kai, Yansong Peng, Jiahui Yuan, Peilin Xiao, Yueyi Zhang, Xiaoyan Sun
270 FaceGCD: Generalized Face Discovery via Dynamic Prefix Generation
Yunseok Oh, Dong-Wan Choi
272 B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing
Yoojin Jang, Junsu Kim, Hayeon Kim, Eunki Lee, Eunsol Kim, Seungryul Baek, Jaejun Yoo
277 CFFlow: An Optical Flow Estimation Hinging on Cross-Frequency Attention
Wang Mengfei, Zhu Dongchen, Wang Lei, Li Jiamao
288 Benchmarking Microsaccade Recognition with Event Cameras: A Novel Dataset and Evaluation
Waseem Shariff, Timothy Hanley, Maciej Stec, Hossein Javidnia, Peter Corcoran
358 Pose-Robust Calibration Strategy for Point-of-Gaze Estimation on Mobile Phones
Yujie Zhao, Jiabei Zeng, Shiguang Shan
445 Frequency-Temporal Feature Integration for Compressed Video Action Recognition
Jiangwan Zhou, Yue Ming
453 HalfMix Augmentation and Regularized Dual-Path Learning for Cross-Domain Gaze Estimation
Jiuk Hong, Heechul Jung
626 Beyond Gloss: A Hand-Centric Framework for Gloss-Free Sign Language Translation
Sobhan Asasi, Mohamed Ilyas Lakhal, Ozge Mercanoglu Sincan, Richard Bowden
Location: Drawing Room
14:00 - 15:45
Hadfield Hall Posters
664 Canonical Makeup Transfer
Xinyu Lin, Kun Zhou, Xiaoguang Han, Jiangbo Lu
737 Self-Intersection-Aware 3D Human Motion Generation Using an Efficient Human Sphere Proxy
Pascal Herrmann, Maarten Bieshaar, Dennis Mack, Paul Robert Herzog, Juergen Gall
770 DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hooshanfar, Alireza Hosseini, Mona Ahmadian, Ahmad Kalhor, Babak N Araabi
811 DEAD: Data-Efficient Audiovisual Dubbing using Neural Rendering Priors
Jack Saunders, Vinay Namboodiri
865 FootFormer: Estimating Stability from Visual Input
Keaton Yukio Kraiger, Jingjing Li, Skanda Bharadwaj, Jesse Scott, Robert T. Collins, Yanxi Liu
880 Distortion-Aware Multi-Object Tracking via Virtual Plane Projection in Overhead Fisheye Cameras
Panithi Vanasirikul, Piyanon Charoenpoonpanich, Ekapol Chuangsuwanich
900 Time-Scaling State-Space Models for Dense Video Captioning
AJ Piergiovanni, Ganesh Satish Mallya, Dahun Kim, Anelia Angelova
914 Improving Human Motion Plausibility with Body Momentum
Ha Linh Nguyen, Tze Ho Elden Tse, Angela Yao
987 Spatiotemporal Event Spotting via 3D Heatmaps with Dynamically Shifted Gaussian Kernels
Ankhzaya Jamsrandorj, VANYI CHAO, Hoang Quoc Nguyen, Yin May Oo, Muhammad Amrulloh Robbani, Yewon Hwang, Kyung-Ryoul Mun, Jinwook Kim
1024 Quantifying Risk in Pedestrian Crowds Using Divergence Estimated from Flows of Head-Tracking Data
Haruto Nakayama, Masaki Onishi
1060 Learning Correlation-aware Aleatoric Uncertainty for 3D Hand Pose Estimation
Lee Chae-Yeon, Nam Hyeon-Woo, Tae-Hyun Oh
DC1 Uncertainty Propagation and Robustness in Rotation Averaging (Doctoral Consortium)
Yaroslava Lochman (Chalmers University of Technology)
DC2 Complex-valued Neural Networks in Computer Vision (Doctoral Consortium)
Saurabh Yadav (IIIT Delhi)
DC3 Robust and Generalizable 3D Perception for Autonomous Systems (Doctoral Consortium)
Maciej Wozniak (KTH Royal Institute of Technology)
DC4 Constructing and Maintaining Geometric Digital Twins of Road Conditions (Doctoral Consortium)
Percy Pui Hei Lam (University of Cambridge)
DC5 Visual Perception and Reconstruction under Challenging Lighting Conditions (Doctoral Consortium)
Ziteng Cui (The University of Tokyo)
DC6 Generalizable Multimodal Brain Decoding (Doctoral Consortium)
Weihao Xia (University College London)
DC7 Towards Visually-Plausible and Controllable 3D Representations (Doctoral Consortium)
Dongqing Wang (EPFL)
Location: Hadfield Hall
Oral Session 7 - Human & Motion Analysis
15:45 - 17:00
Chair: Jianbo Jiao (University of Birmingham)
15:45 66
ST-GDance: Long-Term and Collision-Free Group Choreography from Music
Jing Xu, Weiqiang Wang, Cunjian Chen, Jun Liu, Qiuhong Ke
16:00 566
OpenHuman4D: Open-Vocabulary 4D Human Parsing
Keito Suzuki, Bang Du, Runfa Li, Kunyao Chen, Lei Wang, Peng Liu, Ning Bi, Truong Nguyen
16:15 338
GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition
Tianyue Wang, Shuang Yang, Shiguang Shan, Xilin Chen
16:30 675
Jack of many Faces: A Step Towards Facial Expression and Physiological State Analysis with a Single Network
Abdullah Tariq, Martin Masek, R Muhammad Atif Azad, Syed Zulqarnain Gilani
16:45 610
HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization
Hyogun Lee, Joohyun Chang, Soyeon Hong, Seong Jong Ha, Dongho Lee, Seong Tae Kim, Jinwoo Choi
Location: Main Hall