BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching

¹University of Leicester, ²AstraZeneca, ³CUHK, ⁴Imperial College London
British Machine Vision Conference (BMVC) 2025
*Corresponding author
BOTM teaser: echocardiography segmentation with anatomical consistency

BOTM performs echocardiography segmentation and optimal anatomy transportation simultaneously, achieving accurate and anatomically consistent segmentation across challenging echocardiographic sequences.

Abstract

Existing echocardiography segmentation methods often suffer from anatomical inconsistency caused by shape variation, partial observation, and ambiguous regions with similar intensity across 2D echocardiographic sequences. This results in false-positive segmentations with anatomically defective structures under challenging low signal-to-noise ratio conditions. To provide a strong anatomical guarantee across different echocardiographic frames, we propose a novel segmentation framework named BOTM (Bi-directional Optimal Token Matching) that performs echocardiography segmentation and optimal anatomy transportation simultaneously. Given paired echocardiographic images, BOTM learns to match two sets of discrete image tokens by finding optimal correspondences from a novel anatomical transportation perspective. We further extend the token matching into a bi-directional cross-transport attention proxy that preserves anatomical consistency throughout the cyclic cardiac deformation in the temporal domain. Extensive experimental results show that BOTM generates stable and accurate segmentation outcomes (e.g., −1.917 Hausdorff distance on CAMUS 2H LV and +1.9% Dice on TED) and provides a better matching interpretation with an anatomical consistency guarantee.
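
The abstract reports results in Hausdorff distance (HD, lower is better) and Dice (higher is better). For reference, a minimal NumPy sketch of both metrics on binary masks follows. It is an illustration under stated assumptions, not the paper's evaluation code: the helper names are hypothetical, and HD here is the symmetric maximum over all foreground pixels measured in pixels, whereas published protocols often use contour points and physical pixel spacing (in mm).

```python
import numpy as np


def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def hausdorff_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Hausdorff distance, in pixels, between the foreground pixel sets."""
    a = np.argwhere(pred.astype(bool)).astype(float)
    b = np.argwhere(gt.astype(bool)).astype(float)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Pairwise Euclidean distances, shape (len(a), len(b)); fine for small masks.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Worst-case distance from a point in one set to its nearest neighbour in the other.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True                  # 4x4 square
pred = np.zeros_like(gt)
pred[2:6, 3:7] = True                # same square shifted one column right
print(dice_score(pred, gt))          # 0.75
print(hausdorff_distance(pred, gt))  # 1.0
```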

Method

BOTM pipeline overview
Step 1: Optimal Token Matching via Sinkhorn

Given paired echocardiographic images, we compute a cost matrix from token cosine similarity and solve the optimal transport problem using entropy-regularized Sinkhorn iterations to obtain the matching score map.

Step 2: Bi-directional Cross-Transport Attention

We reformulate optimal token matching as a novel Bi-directional Cross-Transport Attention (BCTA) proxy guided by the optimal transport plan, which computes barycentric interpolations of the token embeddings in both the forward and backward directions.
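
The barycentric interpolation can be sketched as follows, assuming a non-negative transport plan such as the one produced by Sinkhorn iterations. Normalizing the plan row-wise turns it into attention-like weights, so each token is replaced by a weighted average of the tokens it is matched to; reading the plan column-wise gives the backward direction. The function name is hypothetical, and how BCTA projects and fuses these embeddings inside the network (e.g., learned projections or residual connections) is not specified here.

```python
import numpy as np


def barycentric_cross_transport(x: np.ndarray, y: np.ndarray, plan: np.ndarray):
    """Bi-directional barycentric projection of token embeddings through a transport plan.

    x: (N, D) tokens of image A, y: (M, D) tokens of image B, plan: (N, M) non-negative.
    Returns (x_from_y, y_from_x) with shapes (N, D) and (M, D).
    """
    # Forward A -> B: row-normalize so each row acts like a set of attention weights.
    fwd = plan / plan.sum(axis=1, keepdims=True)
    x_from_y = fwd @ y  # each A token as a weighted average of B tokens
    # Backward B -> A: the same plan read column-wise.
    bwd = plan.T / plan.T.sum(axis=1, keepdims=True)
    y_from_x = bwd @ x  # each B token as a weighted average of A tokens
    return x_from_y, y_from_x


n, d = 5, 8
rng = np.random.default_rng(1)
x = rng.normal(size=(n, d))
perm = rng.permutation(n)
y = x[perm]                         # image B holds the same tokens, reordered
plan = np.zeros((n, n))
plan[perm, np.arange(n)] = 1.0 / n  # hard plan: B token j matches A token perm[j]
x_from_y, y_from_x = barycentric_cross_transport(x, y, plan)
print(np.allclose(x_from_y, x), np.allclose(y_from_x, y))  # True True
```

With a hard (permutation) plan the projection simply recovers the matched tokens; with a soft plan each output blends several candidates, which is what makes the operation behave like attention.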

Step 3: Anatomical Importance Masking

A learnable anatomical importance mask combines local saliency and global distribution maps to suppress regions with high matching probability but low anatomical relevance, such as background areas.
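
A hypothetical sketch of the masking idea, assuming per-token local saliency and global distribution scores are already available: a sigmoid over a weighted sum yields an importance value in (0, 1), which then down-weights the matching probabilities of low-relevance tokens. The scalar weights and bias stand in for learnable parameters, and this particular fusion rule is an assumption for illustration rather than BOTM's published formulation.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def importance_mask(local_saliency, global_dist, w_local=4.0, w_global=4.0, bias=-4.0):
    """Hypothetical per-token importance in (0, 1).

    local_saliency, global_dist: (N,) scores in [0, 1]; the scalar weights and bias
    stand in for learnable parameters (the default values are arbitrary).
    """
    return sigmoid(w_local * local_saliency + w_global * global_dist + bias)


def mask_plan(plan, mask_a, mask_b):
    """Down-weight matches that involve low-importance tokens on either side."""
    return mask_a[:, None] * plan * mask_b[None, :]


plan = np.full((4, 4), 1.0 / 16)               # uniform matching probabilities
saliency = np.array([0.9, 0.8, 0.9, 0.0])      # token 3: e.g. a background patch
distribution = np.array([0.8, 0.9, 0.7, 0.1])
m = importance_mask(saliency, distribution)
masked = mask_plan(plan, m, np.ones(4))
print(masked.sum(axis=1))  # row 3 is strongly suppressed
```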

Performance

Segmentation visualization on the CAMUS dataset (two-chamber view)

Video Segmentation

BOTM produces accurate, stable, and temporally consistent segmentation across extended frame sequences.

Segmentation Uncertainty

By incorporating token matching to enforce anatomical consistency, BOTM effectively reduces segmentation uncertainty, leading to more coherent and reliable mask boundary delineation.

Segmentation uncertainty comparison showing BOTM reduces uncertainty

BibTeX

@inproceedings{liu2025botm,
  title={BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching},
  author={Liu, Zhihua and Tong, Lei and He, Xilin and Liu, Che and Arcucci, Rossella and Jin, Chen and Zhou, Huiyu},
  booktitle={Proceedings of the 36th British Machine Vision Conference},
  year={2025}
}