Below is the list of accepted papers for BMVC 2025. Congratulations to the authors! You will soon receive an email with further information and the next steps.
If your paper is not listed, it has been rejected. We understand how disappointing it can be to have a paper rejected; we’ve all been there. We hope the feedback from the reviews, which you will receive by email, will provide valuable insights for revising the work, and that you will consider resubmitting it in the future.
This year, BMVC received 865 submissions, of which 276 were accepted. Each paper had 3 reviews, including a meta-review. All papers were discussed among the reviewers and the assigned Area Chairs (ACs). Meta-reviews were verified by our Programme Chairs (PCs). All of this was done while preserving author anonymity and avoiding domain conflicts.
ID | Title |
---|---|
12 | Part Segmentation and Motion Estimation for Articulated Objects with Dynamic 3D Gaussians |
18 | Volumetric Temporal Texture for Smoke Stylization using Dynamic Radiance Fields |
22 | Learning from Synthetic Data for Visual Grounding |
24 | MSMVD: Exploiting Multi-scale Image Features via Multi-scale BEV Features for Multi-view Pedestrian Detection |
26 | Guiding a diffusion model with itself using sliding windows |
28 | Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition |
29 | SPARTAN: Spatiotemporal Pose-Aware Retrieval for Text-guided Autonomous Navigation |
37 | Cross-Modal Scene Semantic Alignment for Image Complexity Assessment |
38 | GC-Font: Few-Shot Font Generation via Global Contextual Feature Modelling |
39 | DocAttentionRect: Attention-Guided Document Image Rectification |
43 | Split Matching for Inductive Zero-shot Semantic Segmentation |
55 | Piezoelectric Acoustic Sensing for Sitting Pose Classification |
56 | Pointly-Supervised Weak-Shot Semantic Segmentation via Dual Mapping Transfer |
59 | CMAMRNet: A Contextual Mask-Aware Network Enhancing Mural Restoration Through Comprehensive Mask Guidance |
66 | ST-GDance: Long-Term and Collision-Free Group Choreography from Music |
69 | TACTFL: Temporal Contrastive Training for Multi-modal Federated Learning with Similarity-guided Model Aggregation |
77 | Training-Free Synthetic Data Generation with Dual IP-Adapter Guidance |
81 | SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation |
83 | RPD-Diff: Region-Adaptive Physics-Guided Diffusion Model for Visibility Enhancement under Dense and Non-Uniform Haze |
84 | Continual Vision-and-Language Navigation |
88 | CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models |
102 | STAIN: Smooth Tile-Aware Instance Normalisation for Virtual Staining |
113 | Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning |
117 | RAPrivacy: a Readable Anonymizer for Privacy Preserving Action Recognition |
121 | BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching |
122 | Distribution-guided Generative Replay with Semantic Prompts for Class-Incremental Chest X-ray Diagnosis |
124 | Mask2Act: Predictive Multi-Object Tracking as Video Pre-Training for Robot Manipulation |
126 | CLAIR: CLIP-Aided Weakly Supervised Zero-Shot Cross-Domain Image Retrieval |
128 | Uncertainty Diffusion: Parameter-Efficient Depth Refinement via Uncertainty-Guided Diffusion Models |
131 | Dual-Stream Attention with Multi-Modal Queries for Object Detection in Transportation Applications |
133 | M$^2$StyleGS: Multi-Modality 3D Style Transfer with Gaussian Splatting |
136 | Lang4D: Weakly Supervised Learning of 4D Language Splatting |
138 | SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet |
147 | Spatial-Frequency Domain Aggregation for Visual Place Recognition |
150 | WTNet: A Weather Transfer Network for Domain-Adaptive All-In-One Adverse Weather Image Restoration |
154 | Seed-to-Seed: Unpaired Image Translation in Diffusion Seed Space |
159 | Identity-Motion Trade-offs in Text-to-Video Generation |
165 | Four eyes see more than two: Dataset Distillation with Mixture-of-Experts |
168 | PSScreen: Partially Supervised Multiple Retinal Disease Screening |
173 | CellMamba: Adaptive Mamba for Accurate and Efficient Cell Detection |
181 | LuKAN: A Kolmogorov-Arnold Network Framework for 3D Human Motion Prediction |
182 | eXtended Multimodal Composite Association Score (xMCAS): A Gender Inclusive Approach to Measurement of Bias in Text-To-Image Diffusion Models |
184 | Graph Similarity Learning of Floor Plans |
195 | Proto-FG3D: Prototype-based Interpretable Fine-Grained 3D Shape Classification |
211 | Efficient Image Restoration via Latent Consistency Flow Matching |
218 | Bridging Visual-Textual Modalities: Weakly Supervised Histopathology Segmentation |
220 | LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking |
228 | Dual-Expert Collaborative Network for Fake News Detection with External Knowledge Integration |
232 | Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering |
238 | Capture and Reconstruct 3D Clothed Human from Images |
239 | REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation |
257 | S$^2$V2V: Training-Free Video-to-Video with Sparse Points and Motion Guidance |
259 | Revisiting Entropy Minimization for Long-Sequence Continual Test-Time Adaptation |
261 | ADIR: Adaptive Diffusion for Image Reconstruction |
265 | Enhancing Visual Tracking by Leveraging High-frequency Information within Event Signals |
270 | FaceGCD: Generalized Face Discovery via Dynamic Prefix Generation |
272 | B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing |
276 | Learning Event-guided Exposure-agnostic Video Frame Interpolation via Adaptive Feature Blending |
277 | CFFlow: An Optical Flow Estimation Hinging on Cross-Frequency Attention |
281 | CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections |
283 | AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields |
284 | Multimodal Feature Collaboration and Fusion for Fine-Grained Action Recognition |
288 | Benchmarking Microsaccade Recognition with Event Cameras: A Novel Dataset and Evaluation |
289 | CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings |
291 | Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations |
292 | SVAC: Scaling Is All You Need For Referring Video Object Segmentation |
294 | Task Progressive Curriculum Learning for Robust Visual Question Answering |
310 | PME3D: An Adaptive and Efficient Multi-modal Feature Extraction Plug-in for 3D Object Detection |
313 | OctreeNCA: Single-Pass 184 MP Segmentation on Consumer Hardware |
320 | Extreme Model Compression with Structured Sparsity at Low Precision |
328 | DepthHMR: Leveraging Depth Around Humans for Multi-Human Mesh Generation |
330 | FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion |
333 | Fast Self-Supervised depth and mask aware Association for Multi-Object Tracking |
334 | DAOVI: Distortion-Aware Omnidirectional Video Inpainting |
338 | GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition |
339 | Events Meet Dynamic Mode Decomposition: Capturing the Spatiotemporal Dynamics of Moving Objects |
342 | Flatness-aware Curriculum Learning via Adversarial Difficulty |
346 | Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID |
348 | PosBridge: Multi-View Positional Embedding Transplant for Identity-Aware Image Editing |
354 | Dual-Branch Network via Multiple Illumination-Aware Representation Learning for Steel Surface Defect Classification |
357 | What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos |
358 | Pose-Robust Calibration Strategy for Point-of-Gaze Estimation on Mobile Phones |
359 | SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation |
364 | Learning from Silence and Noise for Visual Sound Source Localization |
366 | HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting |
367 | RUSplatting: Robust 3D Gaussian Splatting for Sparse-View Underwater Scene Reconstruction |
370 | RGB-Event Fusion for Robust Lane Detection |
373 | Back To The Drawing Board: Rethinking Scene-Level Sketch-Based Image Retrieval |
376 | Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference |
381 | Permutation-Invariant Polar Harmonic Pooling for Point-based Neural Networks |
391 | Depth Inconsistency-based spatial-channel attention gate for Mirror Segmentation |
392 | UMM: A Unified Multi-Modal Model for Low-Level Vision Tasks with Dual-Driven Prompting |
398 | Knowledge Distillation via Cross Supervising with Attention for Remote Sensing Object Detection |
399 | 3D Shape Reconstruction from Autonomous Driving Radars |
402 | Log NeRF: Comparing Spaces for Learning Radiance Fields |
404 | Exploring Histogram-based Color Constancy |
405 | HVLO-YOLO: An Ultra-Lightweight Detection Model for High-voltage Line Obstacles |
408 | Multimodal Hate Detection Using Dual-Stream Graph Neural Networks |
416 | Beyond Softmax: Dual-Branch Sigmoid Architecture for Accurate Class Activation Maps |
423 | JDATT: A Joint Distillation Framework for Atmospheric Turbulence Mitigation and Target Detection |
427 | Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data |
444 | Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering |
445 | Frequency-Temporal Feature Integration for Compressed Video Action Recognition |
453 | HalfMix Augmentation and Regularized Dual-Path Learning for Cross-Domain Gaze Estimation |
457 | A Novel Local Focusing Mechanism for Deepfake Detection Generalization |
458 | Intra-Modal Divergence-Weighted Distillation for Vision-Language Models |
461 | FSF3A: Federated Spatial Feature Alignment and Adaptive Aggregation for Heterogeneous Brain Tumor Segmentation |
478 | Bézier Curve-Based Stroke Extraction for Handwritten Characters |
479 | CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation |
486 | FreqSelect: Frequency-Aware fMRI-to-Image Reconstruction |
493 | Audio-Visual Separation with Hierarchical Fusion and Representation Alignment |
495 | An Explorative Study on Abstract Images and Visual Representations Learned from Them |
499 | One target to align them all: LiDAR, RGB and event cameras extrinsic calibration for Autonomous Driving |
500 | UFD-KD: Unified Frequency Decoupled Knowledge Distillation |
502 | DefectGPT: Towards Multi-Class Defect Detection with Limited Electrical Samples |
507 | Robust and Label-efficient Deep Waste Detection |
509 | Dynamic Convolution and Graph-Coupled Attention for Cross-Subject EEG-Vision Decoding |
511 | LieMorph: Transformer-based Image Registration Using Flows on Lie Groups |
516 | Making Rotation Averaging Fast and Robust with Anisotropic Coordinate Descent |
517 | Catching the Unknown with Limited Data: Bi-Directional Prompt Tuning in CLIP for Few-Shot Open-Set Adaptation |
528 | OmniSegNet: Towards Scalable, Efficient & Universal Medical Image Segmentation |
540 | OptSplat: Recurrent Optimization for Generalizable Reconstruction and Novel View Renderings |
543 | Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes |
546 | Multi-Method Ensemble for Out-of-Distribution Detection |
548 | Pandora: Articulated 3D Scene Graphs from Egocentric Vision |
558 | EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance |
566 | OpenHuman4D: Open-Vocabulary 4D Human Parsing |
567 | Zero-Shot Anomaly Detection with Dual-Branch Prompt Selection |
577 | Zero-Shot CFC: Fast Real-World Image Denoising based on Cross-Frequency Consistency |
588 | CoT-SD: Chain-of-Thought Semantic Denoising |
593 | PanoHair: Detailed Hair Strand Synthesis on Volumetric Heads |
594 | Video Dataset Condensation with Diffusion Models |
596 | TopoMortar: A Dataset to Evaluate Topology Accuracy in Image Segmentation |
602 | Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism |
603 | Dual-Stream Adapters for Open-Set Segmentation in Driving Scenes |
610 | HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization |
614 | PerSense: Training-Free Personalized Instance Segmentation in Dense Images |
626 | Beyond Gloss: A Hand-Centric Framework for Gloss-Free Sign Language Translation |
627 | TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation |
628 | MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction |
630 | Incremental Multi-Scene Modeling via Continual Neural Graphics Primitives |
632 | Segmentation Assisted Incremental Test Time Adaptation in an Open World |
644 | Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems |
646 | Towards Data-Efficient Medical Imaging: A Generative and Semi-Supervised Framework |
649 | Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes |
650 | Prompt-Informed Reinforcement Learning for Visual Coverage Path Planning |
656 | 3D Curvix: From Multiview 2D Edges to 3D Curve Segments |
661 | Leveraging Sparsity for Efficient Inference of High-Resolution Vision Foundation Models |
664 | Canonical Makeup Transfer |
666 | Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization |
668 | MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression |
675 | Jack of many Faces: A Step Towards Facial Expression and Physiological State Analysis with a Single Network |
680 | On the Role of Individual Differences in Current Approaches to Computational Image Aesthetics |
690 | MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction |
694 | Occam’s LGS: An Efficient Approach for Language Gaussian Splatting |
698 | SAMWave: Adapting Segment Anything Model to difficult tasks |
700 | Q-Align: Alleviating Attention Leakage in Zero-Shot Appearance Transfer via Query-Query Alignment |
701 | Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders |
704 | JOG3R: Towards 3D-Consistent Video Generators |
705 | ALSA: Anchors in Logit Space for Out-of-Distribution Accuracy Estimation |
707 | Catch Your Concepts: A Flexible Concept Locator for Interpretable Visual Recognition |
709 | TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection |
716 | Size-aware Contrastive Imitation Learning for Language-conditioned Multi-task Robotic Manipulation |
717 | From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects |
718 | PADS: Plug-and-Play 3D Human Pose Analysis via Diffusion Generative Modeling |
721 | Gromov Wasserstein Optimal Transport for Semantic Correspondences |
722 | Unsupervised Multimodal Deepfake Detection Through Explicit Intra-Modal and Cross-Modal Inconsistency Discovery |
732 | TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models |
736 | EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models |
737 | Self-Intersection-Aware 3D Human Motion Generation Using an Efficient Human Sphere Proxy |
745 | Mapping like a Skeptic: Probabilistic BEV Projection for Online HD Mapping |
751 | TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification |
753 | Boosting Camera Motion Control for Video Diffusion Transformers |
755 | Audio-Guided Visual Editing with Complex Multi-Modal Prompts |
762 | Learning a Neural Association Network for Self-supervised Multi-Object Tracking |
766 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing |
768 | Interpretable Text-Guided Image Clustering via Iterative Search |
770 | DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction |
782 | Is Safety Checker Still Safe? A Study on the Covert NSFW Text |
784 | UDT: Unsupervised Discovery of Transformations between Fine-Grained Classes in Diffusion Models |
786 | TAG: A Simple Yet Effective Temporal-Aware Approach for Zero-Shot Video Temporal Grounding |
787 | ${C}^{3}$-GS: Learning Context-aware, Cross-dimension, Cross-scale Feature for Generalizable Gaussian Splatting |
788 | Vision Backbone Efficient Selection for Image Classification in Low-Data Regimes |
811 | DEAD: Data-Efficient Audiovisual Dubbing using Neural Rendering Priors |
824 | A Unified Framework for High-Frame-Rate HDR Video Synthesis |
825 | Modular Embedding Recomposition for Incremental Learning |
827 | MonoTracker: Monocular RGB-Only 6D Tracking of Unknown Object |
828 | Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval |
831 | LOGen: Toward LiDAR Object Generation by Point Diffusion |
832 | SIMULDITEX: Single Image Multiscale & Lightweight Diffusion for Texture Modelling |
835 | Cloud-Stereo: A Dataset and Benchmark for Reconstructing Atmospheric Clouds from Stereo Images |
850 | C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression |
851 | Asymmetric Event-Image Stereo with Temporal Feature Gating and Iterative Structure-Detail Refinement |
852 | LoFT: LoRA-fused Training Dataset Generation with Few-shot Guidance |
857 | Lost in Time: A New Temporal Benchmark for VideoLLMs |
859 | Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift |
864 | Geometry-Aware Diffusion Models for Multiview Scene Inpainting |
865 | Estimating Foot Pressure and Stability from Visual Input |
866 | Image Recognition with Vision and Language Embeddings of VLMs |
873 | Exploring Image Representation with Decoupled Classical Visual Descriptors |
875 | Lost in Translation? Vocabulary Alignment for Source-Free Adaptation in Open-Vocabulary Semantic Segmentation |
880 | Distortion-Aware Multi-Object Tracking via Virtual Plane Projection in Overhead Fisheye Cameras |
887 | QWD-GAN: Quality-aware Wavelet-driven GAN for Unsupervised Medical Microscopy Images Denoising |
896 | Llama Learns to Direct: DirectorLLM for Human-Centric Video Generation |
898 | FaceCPT: Toward Cross-Modal Facial Representation Learning with Face-Caption Pre-Training |
900 | Time-Scaling State-Space Models for Dense Video Captioning |
902 | ITC-RWKV: Interactive Tissue–Cell Modeling with Recurrent Key-Value Aggregation for Histopathological Subtyping |
903 | Toward Robust Audio-Visual Synchronization Detection in Egocentric Video with Sparse Synchronization Events |
914 | Improving Human Motion Plausibility with Body Momentum |
915 | DualDistill: A Unified Cross-Modal Knowledge Distillation Framework for Camera-Based BEV Representation |
922 | FSLC: Fast Scoring with Learnable Coreset for Zero-shot Industrial Anomaly Detection |
931 | Diffusion Transformer-to-Mamba Distillation for High-Resolution Image Generation |
938 | Grad-CL: Source Free Domain Adaptation with Gradient Guided Feature Disalignment |
940 | Semi-MoE: Mixture-of-Experts meets Semi-Supervised Histopathology Segmentation |
943 | HEAL: Learning-Free Source Free Unsupervised Domain Adaptation for Cross-Modality Medical Image Segmentation |
948 | Tracking Meets Large Multimodal Models for Driving Scenario Understanding |
949 | Prompt Image to Watch and Hear: Multimodal Prompting for Parameter-Efficient Audio-Visual Learning |
955 | Clean Sample Selection and Noisy Sample Rematching for Text-Based Pedestrian Retrieval |
960 | A Hybrid Framework Bridging CNN and ViT based on Theory of Evidence for Diabetic Retinopathy Grading |
962 | LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba |
976 | Advancing Utility Pole and Sign Detection Through Deep Learning |
981 | Ev4DGS: Novel-view Rendering of Non-Rigid Objects from Monocular Event Streams |
986 | Visible Structure Retrieval for Lightweight Image-Based Relocalisation |
987 | Spatiotemporal Event Spotting via 3D Heatmaps with Dynamically Shifted Gaussian Kernels |
988 | Ink Enhancement for Ancient Bamboo Manuscripts Using Iterative Restoration-Degradation Adversarial Learning |
989 | MedOpenSeg: Open-World Medical Segmentation with Memory-Augmented Transformers |
992 | Dual Polarity Prompts with Stochastic Entropy Perturbation for Label Noise |
999 | Lid-Lab-NeRF: Generating Temporally Consistent, Labelled LiDAR Point Clouds using Neural Radiance Fields |
1004 | RP-SAM2: Refining Point Prompts for Stable Surgical Instrument Segmentation |
1006 | Verifier Matters: Enhancing Inference-Time Scaling for Video Diffusion Models |
1007 | Is Structural Awareness the Key to Event Camera Data Cleansing for Enhancing Veracity? |
1013 | SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank Transformation |
1015 | In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models |
1024 | Quantifying Risk in Pedestrian Crowds Using Divergence Estimated from Flows of Head-Tracking Data |
1029 | PrIINeR: Towards Prior-Informed Implicit Neural Representations for Accelerated MRI |
1030 | Language-Guided Decision Override for Adaptive and Retraining-Free Video Anomaly Detection |
1035 | Superpixel Anything: A general object-based framework for accurate yet regular superpixel segmentation |
1036 | Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving |
1037 | Generative Data Augmentation for Object Point Cloud Segmentation |
1051 | Conditional Prototype Learning for Few-Shot Object Detection |
1058 | Atomizer: Generalizing to unseen modalities by breaking images down to a set of scalars |
1060 | Learning Correlation-aware Aleatoric Uncertainty for 3D Hand Pose Estimation |
1061 | IPGPhormer: Interpretable Pathology Graph-Transformer for Survival Analysis |
1062 | Calibration-Aware Prompt Learning for Medical Vision-Language Models |
1064 | 3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes |
1071 | Stabilizing Open-Set Test-Time Adaptation via Primary-Auxiliary Filtering and Knowledge-Integrated Prediction |
1072 | Controllable Garment Generation with Multi-Modal Diffusion Guidance |
1076 | CLFSeg: A Fuzzy-Logic based Solution for Boundary Clarity and Uncertainty Reduction in Medical Image Segmentation |
1077 | Towards Sharper Object Boundaries in Self-Supervised Depth Estimation |
1081 | ImProvShow: Multimodal Fusion for Image Provenance Summarization |
1091 | CLIMB-3D: Class-Incremental Imbalanced 3D Instance Segmentation |
1095 | Supervised Segmentation Model for Improved Detection of OSSN using Slit Lamp Images |
1096 | LED: Light Enhanced Depth Estimation at Night |
1109 | AKD-BNN: Adaptive Kernel Dynamic Bayesian Neural Networks for Enhanced Medical Image Segmentation with Uncertainty Estimation |
1110 | Robust Human Registration with Body Part Segmentation on Noisy Point Clouds |
1118 | Conformal Predictors for Efficient Video Text Spotting |
1121 | MIAS-SAM: Medical Image Anomaly Segmentation without thresholding |
1123 | Evaluating Self-Supervised Learning in Medical Imaging: A Systematic Investigation of Robustness, Generalizability, and Multi-Domain Impact |
1127 | ALFred: An Active Learning Framework for Real-world Semi-supervised Anomaly Detection with Adaptive Thresholds |
1132 | Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning |
1134 | TRUDI and TITUS: A Multi-Perspective Dataset and A Three-Stage Recognition System for Transportation Unit Identification |
1135 | The Trauma THOMPSON Dataset for Real-World Emergency AI |
1137 | ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates |
1139 | RETRO: REthinking Tactile Representation Learning with Material PriOrs |
1156 | Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion |
1161 | Evaluating Perceptual Distance Models by Fitting Binomial Distributions to Two-Alternative Forced Choice Data |
1171 | Conflict-Aware Adversarial Training |
1177 | Binarizing Severely Degraded Ancient Bamboo Slips: Dataset and Baseline |
1180 | MO-SHW: Hierarchy-Aware Multi-Objective Optimization for Open-World Segmentation |
1183 | Contrastive Point Feature Matching for Open-world Object Counting |
1187 | Out-of-Distribution Detection from Small Training Sets using Bayesian Neural Network Classifiers |
1196 | IoSR: End-to-End Intraoral Scans Repairing |
1199 | AULUNet: An Adaptive Ultra-Lightweight U-Net Framework for Efficient Skin Lesion Segmentation in Resource-Constrained Environments |
1209 | Language-Guided Reinforcement Learning for Hard Attention in Few-Shot Learning |
1220 | Hair Strand Reconstruction based on 3D Gaussian Splatting |
1224 | RASALoRE: Region Aware Spatial Attention with Location-based Random Embeddings for Weakly Supervised Anomaly Detection in Brain MRI Scans |
1257 | End-to-End LiDAR optimization for point cloud registration |