The 36th British Machine Vision Conference 2025: Wednesday, 26th November

Keynote 3 - Angela Dai

09:00 - 10:00

Title: Can Transformers Speak Geometry?

Abstract: What if generating a 3D mesh were as natural as predicting the next word in a sentence? Autoregressive modeling has rapidly become a unifying learning paradigm across data modalities, from language to images, and now offers a compelling approach for 3D geometry. This talk explores how transformer-based autoregressive models enable mesh generation by representing meshes as sequences. Framing mesh generation as a next-token prediction problem enables new ways to handle the compact, irregular structure of human-designed 3D assets, directly compatible with downstream graphics and vision applications. We explore sequence formulation and data representation, and address practical challenges in scaling to high-resolution meshes and interactive synthesis. This will enable more accessible and democratized 3D content creation, paving the way for interactive design, rapid prototyping, and simulation-ready assets, and unlocking new possibilities for both creative and computational exploration of 3D geometry.

Location: Main Hall

Doctoral Consortium

10:00 - 13:00

Chairs: Dr Cass Zhixue Zhao and Dr Yang Long
	Welcome & Overview (10:00–10:05, 5 mins)
	Oral Presentations – Session 1 (10:05–10:45, 40 mins) 3 candidate presentations (10 mins each + 3 mins Q&A)
	1. Uncertainty Propagation and Robustness in Rotation Averaging	Yaroslava Lochman (Chalmers University of Technology)
	2. Complex-valued Neural Networks in Computer Vision	Saurabh Yadav (IIIT Delhi)
	3. Robust and Generalizable 3D Perception for Autonomous Systems	Maciej Wozniak (KTH Royal Institute of Technology)
	Tea & Coffee Break (10:45–11:15, 30 mins)
	Oral Presentations – Session 2 (11:20–12:20, 50 mins) 4 candidate presentations (10 mins each + 3 mins Q&A)
	1. Constructing and Maintaining Geometric Digital Twins of Road Conditions	Percy Pui Hei Lam (University of Cambridge)
	2. Visual Perception and Reconstruction under Challenging Lighting Conditions	Ziteng Cui (The University of Tokyo)
	3. Generalizable Multimodal Brain Decoding	Weihao Xia (University College London)
	4. Towards Visually-Plausible and Controllable 3D Representations	Dongqing Wang (EPFL)
	Mentor Discussion & Wrap-Up (12:20–12:55, 35 mins)
	Closing Remarks (12:55–13:00, 5 mins)
	Location: Goodwin Room

Poster Session 5 - Advanced Architectures, Robustness and Efficiency

10:00 - 11:45

Drawing Room Posters

239	REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation (Oral Presentation)	Maëlic Neau, Paulo Eduardo Santos, Anne-Gwenn Bosser, Akihiro Sugimoto, Cedric Buche
511	LieMorph: Transformer-based Image Registration Using Flows on Lie Groups (Oral Presentation)	Johannes Bostelmann, Jan Lellmann
567	Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection (Oral Presentation)	Zihan Wang, Samira Ebrahimi Kahou, Narges Armanfard
721	Gromov Wasserstein Optimal Transport for Semantic Correspondences (Oral Presentation)	Francis Snelgar, Stephen Gould, Ming Xu, Liang Zheng, Akshay Asthana
1132	Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning (Oral Presentation)	Wenyi Lian, Patrick Micke, Joakim Lindblad, Nataša Sladoje
22	SynGround: Learning from Synthetic Data for Visual Grounding	Ruozhen He, Ziyan Yang, Paola Cascante-Bonilla, Alexander C. Berg, Vicente Ordonez
56	Pointly-Supervised Weak-Shot Semantic Segmentation via Dual Mapping Transfer	Wenhui Jiang, Ruikang Luo, Zeyu Luo, Xiaowei Zhao, Junjie Chen, Yuming Fang
126	CLAIR: CLIP-Aided Weakly Supervised Zero-Shot Cross-Domain Image Retrieval	Chor Boon Tan, Conghui Hu, Gim Hee Lee
147	Spatial-Frequency Domain Aggregation for Visual Place Recognition	Chaoqun Wang, Shaobo Min
150	WTNet: A Weather Transfer Network for Domain-Adaptive All-In-One Adverse Weather Image Restoration	Si-Yu Huang, Fu-Jen Tsai, Chia-Wen Lin, Yen-Yu Lin
165	Four eyes see more than two: Dataset Distillation with Mixture-of-Experts	Jia-Jiun Yao, Sheng-Feng Yu, Wei-Chen Chiu
184	Graph Similarity Learning of Floor Plans	Casper van Engelenburg, Jan van Gemert, Seyran Khademi
259	Revisiting Entropy Minimization for Long-Sequence Continual Test-Time Adaptation	Wei Qin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar
276	Learning Event-guided Exposure-agnostic Video Frame Interpolation via Adaptive Feature Blending	Junsik Jung, Yoonki Cho, Woo Jae Kim, Lin Wang, Sung-Eui Yoon
283	AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields	Woo Jae Kim, Kyu Beom Han, Yoonki Cho, Youngju Na, Junsik Jung, Sooel Son, Sung-Eui Yoon
291	Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations	Nils Hütten, Florian Hölken, Hasan Tercan, Tobias Meisen
292	SVAC: Scaling Is All You Need For Referring Video Object Segmentation	Li Zhang, Haoxiang Gao, Zhihao Zhang, Luoxiao Huang, Tao Zhang
320	Extreme Model Compression with Structured Sparsity at Low Precision	Dan Liu, Nikita Dvornik, Xue Liu
333	Fast Self-Supervised depth and mask aware Association for Multi-Object Tracking	Milad Khanchi, Maria Amer, Charalambos Poullis
334	DAOVI: Distortion-Aware Omnidirectional Video Inpainting	Ryosuke Seshimo, Mariko Isogawa
339	Events Meet Dynamic Mode Decomposition: Capturing the Spatiotemporal Dynamics of Moving Objects	Zhouning Du, Israr Ulhaq, Thanh Thi Huyen Phan, Yuichiro Yoshimura, Jigyasa Chand, Truong Vinh Truong Duy
342	Flatness-aware Curriculum Learning via Adversarial Difficulty	Hiroaki Aizawa, Yoshikazu Hayashi
346	Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID	Priyank Pathak, Yogesh S Rawat
354	Dual-Branch Network via Multiple Illumination-Aware Representation Learning for Steel Surface Defect Classification	Yong Seok Oh, Min Geol Kim, Bogyeong Kim, Jun Young Kim, Hyeongseob Jo, Jae Hyeon Park, Gyoomin Lee, Sung In Cho
381	Permutation-Invariant Polar Harmonic Pooling for Point-based Neural Networks	Jaspreet Singh Maan, Grzegorz Cielniak
398	Knowledge Distillation via Cross Supervising with Attention for Remote Sensing Object Detection	Kefan Zhan, An Luo, Yunpeng Zeng, Jiaxin Li, Yuan Zhang, Kai Hu
404	Exploring Histogram-based Color Constancy	David R. Treadwell IV, Yunxuan Rao, Daniel Y. Bi, Bruce A. Maxwell
416	Beyond Softmax: Dual-Branch Sigmoid Architecture for Accurate Class Activation Maps	Yoojin Oh, Junhyug Noh
444	Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering	Haopeng Li, Mohammed Bennamoun, Jun Liu, Hossein Rahmani, Qiuhong Ke
457	A Novel Local Focusing Mechanism for Deepfake Detection Generalization	Mingliang Li, Hanxi Li, Lin Yuanbo Wu, Changhong Liu

Location: Drawing Room

10:00 - 11:45

Hadfield Hall Posters

495	An Explorative Study on Abstract Images and Visual Representations Learned from Them	Haotian Li, Jianbo Jiao
500	UFD-KD: Unified Frequency Decoupled Knowledge Distillation	Sihan Lu, Yang Zheng, Jie Liu, Zhenghao Xi
546	Multi-Method Ensemble for Out-of-Distribution Detection	Lucas Rakotoarivony
577	Zero-Shot CFC: Fast Real-World Image Denoising based on Cross-Frequency Consistency	Yanlin Jiang, Yuchen Liu, Mingren Liu
614	PerSense: Training-Free Personalized Instance Segmentation in Dense Images	Muhammad Ibraheem Siddiqui, Muhammad Umer Sheikh, Hassan Abid, Muhammad Haris Khan
632	Segmentation Assisted Incremental Test Time Adaptation in an Open World	Manogna Sreenivas, Soma Biswas
661	Leveraging Sparsity for Efficient Inference of High-Resolution Vision Foundation Models	Xin Xu, Jason Kuen, Brian L Price, Kangning Liu, Zijun Wei, Yu-Xiong Wang
668	MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression	Ofir Gordon, Ariel Lapid, Elad Cohen, Yarden Yagil, Arnon Netzer, Hai Victor Habi
680	On the Role of Individual Differences in Current Approaches to Computational Image Aesthetics	Li-Wei Chen, Ombretta Strafforello, Anne-Sofie Maerten, Tinne Tuytelaars, Johan Wagemans
701	Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders	Faizan Farooq Khan, Vladan Stojnić, Zakaria Laskar, Mohamed Elhoseiny, Giorgos Tolias
707	Catch Your Concepts: A Flexible Concept Locator for Interpretable Visual Recognition	Qiyang Wan, Ruiping Wang, Chengzhi Gao, Xilin Chen
709	TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection	Hongyang Xie, Hongyang He, Victor Sanchez
762	Learning a Neural Association Network for Self-supervised Multi-Object Tracking	Shuai Li, Michael Burke, Subramanian Ramamoorthy, Juergen Gall
786	TAG: A Simple Yet Effective Temporal-Aware Approach for Zero-Shot Video Temporal Grounding	Jin-Seop Lee, SungJoon Lee, Jaehan Ahn, YunSeok Choi, Jee-Hyong Lee
788	Vision Backbone Efficient Selection for Image Classification in Low-Data Regimes	Joris Guerin, Shray Bansal, Amirreza Shaban, Paulo Mann, Harshvardhan Gazula
824	A Unified Framework for High-Frame-Rate HDR Video Synthesis	Hue Nguyen, Trevor Dalton Canham, Michael S. Brown
825	Modular Embedding Recomposition for Incremental Learning	Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara
850	C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression	Baptiste Bauvin, Loïc Baret, Ola Ahmad
873	Exploring Image Representation with Decoupled Classical Visual Descriptors	Chenyuan Qu, Hao Chen, Jianbo Jiao
955	Clean Sample Selection and Noisy Sample Rematching for Text-Based Pedestrian Retrieval	Daiqiang Li, Weicheng Zhang, Yuanyuan Wu, Honggang Chen
1007	Is Structural Awareness the Key to Event Camera Data Cleansing for Enhancing Veracity?	Haiyu Li, Charith Abhayaratne
1015	In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models	Hu Wang, Ibrahim Almakky, Congbo Ma, Numan Saeed, Mohammad Yaqub
1051	Conditional Prototype Learning for Few-Shot Object Detection	Zhenwei He, Xinye Liao, Xin Feng
1091	CLIMB-3D: Class-Incremental Imbalanced 3D Instance Segmentation	Vishal Thengane, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Lu Yin, Xiatian Zhu, Salman Khan
1118	Conformal Predictors for Efficient Video Text Spotting	Amor Ben Tanfous, Sankha Subhra Mukherjee, Neil M. Robertson
1161	Evaluating Perceptual Distance Models by Fitting Binomial Distributions to Two-Alternative Forced Choice Data	Alexander Hepburn, Raúl Santos-Rodriguez, Javier Portilla
1171	Conflict-Aware Adversarial Training	Zhiyu Xue, Haohan Wang, Yao Qin, Ramtin Pedarsani
1180	MO-SHW: Hierarchy-Aware Multi-Objective Optimization for Open-World Segmentation	Erico M. Pereira, Frederico Gadelha Guimarães, Jefersson A Dos Santos
1183	Contrastive Point Feature Matching for Open-world Object Counting	Ngo Xuan Cuong
1187	Out-of-Distribution Detection from Small Training Sets using Bayesian Neural Network Classifiers	Kevin Raina, Tanya Schmah

Location: Hadfield Hall

Oral Session 6 - Advanced Architectures, Robustness and Efficiency

11:45 - 13:00

Chair: Márjory Da Costa-Abreu (Sheffield Hallam University)
	11:45	1132	Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning	Wenyi Lian, Patrick Micke, Joakim Lindblad, Nataša Sladoje
	12:00	511	LieMorph: Transformer-based Image Registration Using Flows on Lie Groups	Johannes Bostelmann, Jan Lellmann
	12:15	239	REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation	Maëlic Neau, Paulo Eduardo Santos, Anne-Gwenn Bosser, Akihiro Sugimoto, Cedric Buche
	12:30	567	Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection	Zihan Wang, Samira Ebrahimi Kahou, Narges Armanfard
	12:45	721	Gromov Wasserstein Optimal Transport for Semantic Correspondences	Francis Snelgar, Stephen Gould, Ming Xu, Liang Zheng, Akshay Asthana
	Location: Main Hall

Poster Session 6 - Human & Motion Analysis (+ Doctoral Consortium Posters)

14:00 - 15:45

Drawing Room Posters

66	ST-GDance: Long-Term and Collision-Free Group Choreography from Music (Oral Presentation)	Jing Xu, Weiqiang Wang, Cunjian Chen, Jun Liu, Qiuhong Ke
338	GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition (Oral Presentation)	Tianyue Wang, Shuang Yang, Shiguang Shan, Xilin Chen
566	OpenHuman4D: Open-Vocabulary 4D Human Parsing (Oral Presentation)	Keito Suzuki, Bang Du, Runfa Li, Kunyao Chen, Lei Wang, Peng Liu, Ning Bi, Truong Nguyen
610	HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization (Oral Presentation)	Hyogun Lee, Joohyun Chang, Soyeon Hong, Seong Jong Ha, Dongho Lee, Seong Tae Kim, Jinwoo Choi
675	Jack of many Faces: A Step Towards Facial Expression and Physiological State Analysis with a Single Network (Oral Presentation)	Abdullah Tariq, Martin Masek, R Muhammad Atif Azad, Syed Zulqarnain Gilani
55	Piezoelectric Acoustic Sensing for Sitting Pose Classification	Yuuki Shibuya, Go Irie
117	RAPrivacy: a Readable Anonymizer for Privacy Preserving Action Recognition	Zi-Zhen Wang, Yen-Lung Chu, Pei-Chun Tsai, Kuan-Wen Chen
181	LuKAN: A Kolmogorov-Arnold Network Framework for 3D Human Motion Prediction	Md Zahidul Hasan, Abdessamad Ben Hamza, Nizar Bouguila
220	LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking	Martha Teiko Teye, Ori Maoz, Matthias Rottmann
265	Enhancing Visual Tracking by Leveraging High-frequency Information within Event Signals	Yuheng Jiang, Hebei Li, Dachun Kai, Yansong Peng, Jiahui Yuan, Peilin Xiao, Yueyi Zhang, Xiaoyan Sun
270	FaceGCD: Generalized Face Discovery via Dynamic Prefix Generation	Yunseok Oh, Dong-Wan Choi
272	B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing	Yoojin Jang, Junsu Kim, Hayeon Kim, Eunki Lee, Eunsol Kim, Seungryul Baek, Jaejun Yoo
277	CFFlow: An Optical Flow Estimation Hinging on Cross-Frequency Attention	Wang Mengfei, Zhu Dongchen, Wang Lei, Li Jiamao
288	Benchmarking Microsaccade Recognition with Event Cameras: A Novel Dataset and Evaluation	Waseem Shariff, Timothy Hanley, Maciej Stec, Hossein Javidnia, Peter Corcoran
358	Pose-Robust Calibration Strategy for Point-of-Gaze Estimation on Mobile Phones	Yujie Zhao, Jiabei Zeng, Shiguang Shan
445	Frequency-Temporal Feature Integration for Compressed Video Action Recognition	Jiangwan Zhou, Yue Ming
453	HalfMix Augmentation and Regularized Dual-Path Learning for Cross-Domain Gaze Estimation	Jiuk Hong, Heechul Jung
626	Beyond Gloss: A Hand-Centric Framework for Gloss-Free Sign Language Translation	Sobhan Asasi, Mohamed Ilyas Lakhal, Ozge Mercanoglu Sincan, Richard Bowden

Location: Drawing Room

14:00 - 15:45

Hadfield Hall Posters

664	Canonical Makeup Transfer	Xinyu Lin, Kun Zhou, Xiaoguang Han, Jiangbo Lu
737	Self-Intersection-Aware 3D Human Motion Generation Using an Efficient Human Sphere Proxy	Pascal Herrmann, Maarten Bieshaar, Dennis Mack, Paul Robert Herzog, Juergen Gall
770	DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction	Kiana Hooshanfar, Alireza Hosseini, Mona Ahmadian, Ahmad Kalhor, Babak N Araabi
811	DEAD: Data-Efficient Audiovisual Dubbing using Neural Rendering Priors	Jack Saunders, Vinay Namboodiri
865	FootFormer: Estimating Stability from Visual Input	Keaton Yukio Kraiger, Jingjing Li, Skanda Bharadwaj, Jesse Scott, Robert T. Collins, Yanxi Liu
880	Distortion-Aware Multi-Object Tracking via Virtual Plane Projection in Overhead Fisheye Cameras	Panithi Vanasirikul, Piyanon Charoenpoonpanich, Ekapol Chuangsuwanich
900	Time-Scaling State-Space Models for Dense Video Captioning	AJ Piergiovanni, Ganesh Satish Mallya, Dahun Kim, Anelia Angelova
914	Improving Human Motion Plausibility with Body Momentum	Ha Linh Nguyen, Tze Ho Elden Tse, Angela Yao
987	Spatiotemporal Event Spotting via 3D Heatmaps with Dynamically Shifted Gaussian Kernels	Ankhzaya Jamsrandorj, Vanyi Chao, Hoang Quoc Nguyen, Yin May Oo, Muhammad Amrulloh Robbani, Yewon Hwang, Kyung-Ryoul Mun, Jinwook Kim
1024	Quantifying Risk in Pedestrian Crowds Using Divergence Estimated from Flows of Head-Tracking Data	Haruto Nakayama, Masaki Onishi
1060	Learning Correlation-aware Aleatoric Uncertainty for 3D Hand Pose Estimation	Lee Chae-Yeon, Nam Hyeon-Woo, Tae-Hyun Oh
DC1	Uncertainty Propagation and Robustness in Rotation Averaging (Doctoral Consortium)	Yaroslava Lochman - Chalmers University of Technology
DC2	Complex-valued Neural Networks in Computer Vision (Doctoral Consortium)	Saurabh Yadav - IIIT Delhi
DC3	Robust and Generalizable 3D Perception for Autonomous Systems (Doctoral Consortium)	Maciej Wozniak - KTH Royal Institute of Technology
DC4	Constructing and Maintaining Geometric Digital Twins of Road Conditions (Doctoral Consortium)	Percy Pui Hei Lam - University of Cambridge
DC5	Visual Perception and Reconstruction under Challenging Lighting Conditions (Doctoral Consortium)	Ziteng Cui - The University of Tokyo
DC6	Generalizable Multimodal Brain Decoding (Doctoral Consortium)	Weihao Xia - University College London
DC7	Towards Visually-Plausible and Controllable 3D Representations (Doctoral Consortium)	Dongqing Wang - EPFL

Location: Hadfield Hall

Oral Session 7 - Human & Motion Analysis

15:45 - 17:00

Chair: Jianbo Jiao (University of Birmingham)
	15:45	66	ST-GDance: Long-Term and Collision-Free Group Choreography from Music	Jing Xu, Weiqiang Wang, Cunjian Chen, Jun Liu, Qiuhong Ke
	16:00	566	OpenHuman4D: Open-Vocabulary 4D Human Parsing	Keito Suzuki, Bang Du, Runfa Li, Kunyao Chen, Lei Wang, Peng Liu, Ning Bi, Truong Nguyen
	16:15	338	GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition	Tianyue Wang, Shuang Yang, Shiguang Shan, Xilin Chen
	16:30	675	Jack of many Faces: A Step Towards Facial Expression and Physiological State Analysis with a Single Network	Abdullah Tariq, Martin Masek, R Muhammad Atif Azad, Syed Zulqarnain Gilani
	16:45	610	HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization	Hyogun Lee, Joohyun Chang, Soyeon Hong, Seong Jong Ha, Dongho Lee, Seong Tae Kim, Jinwoo Choi
	Location: Main Hall