Bin Yan

I am a third-year PhD student at the IIAU-Lab of Dalian University of Technology, under the supervision of Prof. Huchuan Lu. My main research interests are computer vision and deep learning, especially visual object tracking.

I also worked as a research intern at Microsoft Research Asia (2020-2021) and the ByteDance AI Lab (2021-2022). Prior to that, I received my bachelor's degree from Dalian University of Technology, China, in 2019.

Email  /  GitHub  /  Google Scholar

Publications

[8] Universal Instance Perception as Object Discovery and Retrieval


Bin Yan, Yi Jiang*, Jiannan Wu, Dong Wang*, Ping Luo, Zehuan Yuan, Huchuan Lu
CVPR, 2023
arxiv / code

We present UNINEXT, a next-generation universal instance perception model. UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts. It shows superior performance on 20 challenging benchmarks from 10 instance-level tasks, including classical image-level tasks (object detection and instance segmentation), vision-and-language tasks (referring expression comprehension and segmentation), and six video-level object tracking tasks.
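
To make the discovery-and-retrieval formulation concrete, here is a minimal, purely illustrative sketch (not the released UNINEXT code): instance embeddings produced by an object-discovery stage are scored against a prompt embedding, and matching instances are retrieved. All names, shapes, and the threshold are assumptions.

```python
# Illustrative sketch of "object discovery and retrieval" scoring.
# Names and shapes are hypothetical, not the official UNINEXT API.
import torch
import torch.nn.functional as F

def retrieve_instances(instance_embs: torch.Tensor,
                       prompt_emb: torch.Tensor,
                       threshold: float = 0.5) -> torch.Tensor:
    """Return indices of discovered instances that match the prompt.

    instance_embs: (N, D) embeddings of N discovered object proposals.
    prompt_emb:    (D,)   embedding of the input prompt (a category name,
                          a referring expression, or a target template).
    """
    # Cosine similarity between every proposal and the prompt.
    scores = F.cosine_similarity(instance_embs, prompt_emb.unsqueeze(0), dim=-1)
    return (scores > threshold).nonzero(as_tuple=True)[0]

# Toy usage: 5 proposals with 256-d embeddings, one prompt.
matches = retrieve_instances(torch.randn(5, 256), torch.randn(256))
```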

[7] Towards Grand Unification of Object Tracking


Bin Yan, Yi Jiang*, Peize Sun, Dong Wang*, Zehuan Yuan, Ping Luo, Huchuan Lu
ECCV, 2022   (Oral Presentation)
arxiv / code

We present Unicorn, a unified method that can simultaneously solve four tracking problems (SOT, MOT, VOS, and MOTS) with a single network using the same model parameters. Unicorn adopts the same input, backbone, embedding, and head across all tracking tasks, accomplishing, for the first time, the unification of the tracking network architecture and learning paradigm. Unicorn performs on par with or better than its task-specific counterparts on 8 tracking datasets, including LaSOT, TrackingNet, MOT17, BDD100K, DAVIS16-17, MOTS20, and BDD100K MOTS.
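
As a rough illustration of the shared-embedding idea (my own sketch, not the released Unicorn code), the snippet below propagates target information from a reference frame to the current frame by matching pixel embeddings from a shared backbone, independent of which of the four tasks is being solved. Shapes and names are hypothetical.

```python
# Hypothetical sketch: target propagation through pixel-embedding matching.
import torch

def propagate_target(ref_emb, cur_emb, ref_mask):
    """ref_emb, cur_emb: (C, H, W) frame embeddings from a shared backbone.
    ref_mask: (H, W) target mask (a box can be rasterized to one) in the
    reference frame. Returns an (H, W) target-likelihood map for the
    current frame."""
    C, H, W = ref_emb.shape
    ref = ref_emb.reshape(C, H * W)                # (C, HW)
    cur = cur_emb.reshape(C, H * W)                # (C, HW)
    # Each current-frame pixel attends to all reference-frame pixels.
    affinity = torch.softmax(cur.T @ ref, dim=-1)  # (HW_cur, HW_ref)
    return (affinity @ ref_mask.reshape(-1)).reshape(H, W)

# Toy usage with 64-d embeddings on a 16x16 feature map.
heat = propagate_target(torch.randn(64, 16, 16), torch.randn(64, 16, 16),
                        (torch.rand(16, 16) > 0.5).float())
```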

[6] Learning Spatio-Temporal Transformer for Visual Tracking


Bin Yan, Houwen Peng*, Jianlong Fu, Dong Wang*, Huchuan Lu
ICCV, 2021
arxiv / code

We present a new tracking architecture with an encoder-decoder transformer as the key component. The encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder learns a query embedding to predict the spatial positions of the target objects. Our method casts object tracking as a direct bounding-box prediction problem. The whole pipeline is end-to-end and needs no post-processing steps, which largely simplifies existing tracking pipelines. The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks while running in real time, 6x faster than Siam R-CNN.
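
The snippet below is a condensed sketch of this encoder-decoder design (illustrative, not the released code): flattened template and search-region tokens pass through a transformer, and a single learned query is decoded directly into a normalized box, with no post-processing. The toy dimensions and the linear box head are assumptions.

```python
# Illustrative encoder-decoder tracker: one learned query -> one box.
import torch
import torch.nn as nn

class ToyTransformerTracker(nn.Module):
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=dim, nhead=heads,
            num_encoder_layers=layers, num_decoder_layers=layers,
            batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # target query
        self.box_head = nn.Linear(dim, 4)  # (cx, cy, w, h), normalized

    def forward(self, feats):
        # feats: (B, N, dim) flattened template + search-region tokens.
        B = feats.size(0)
        hs = self.transformer(feats, self.query.expand(B, -1, -1))
        return self.box_head(hs).sigmoid().squeeze(1)  # (B, 4)

tracker = ToyTransformerTracker()
box = tracker(torch.randn(2, 400, 256))  # direct box prediction
```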

[5] LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search


Bin Yan*, Houwen Peng*, Kan Wu*, Dong Wang, Jianlong Fu, Huchuan Lu
CVPR, 2021
arxiv / code

We present LightTrack, which uses neural architecture search (NAS) to design lightweight and efficient object trackers. It finds trackers that outperform handcrafted state-of-the-art trackers while using far fewer FLOPs and parameters. On a Snapdragon 845 Adreno GPU, LightTrack runs 12x faster than Ocean while using 13x fewer parameters and 38x fewer FLOPs.
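
Below is a hedged sketch of the one-shot NAS ingredient: a supernet whose layers each contain several candidate operations, with a random path sampled per forward pass during training and a fixed path used at evaluation. The candidate ops shown are placeholders, not the actual LightTrack search space.

```python
# Toy one-shot NAS supernet with single-path sampling.
import random
import torch
import torch.nn as nn

class ChoiceLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Candidate kernel sizes; placeholders for a real search space.
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in (3, 5, 7)])

    def forward(self, x, choice):
        return torch.relu(self.ops[choice](x))

class Supernet(nn.Module):
    def __init__(self, channels=16, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(ChoiceLayer(channels) for _ in range(depth))

    def forward(self, x, path=None):
        # path: one op index per layer; random path = one-shot training step.
        path = path or [random.randrange(3) for _ in self.layers]
        for layer, c in zip(self.layers, path):
            x = layer(x, c)
        return x

net = Supernet()
y = net(torch.randn(1, 16, 32, 32))                  # random path (training)
y = net(torch.randn(1, 16, 32, 32), [0, 2, 1, 0])    # fixed path (evaluation)
```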

[4] Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation


Bin Yan*, Xinyu Zhang*, Dong Wang, Huchuan Lu, Xiaoyun Yang
CVPR, 2021
arxiv / code

We present Alpha-Refine, a flexible and accurate refinement module that can significantly improve a base tracker's box estimation quality. Alpha-Refine adopts pixel-wise correlation, a corner prediction head, and an auxiliary mask head as its core components. Comprehensive experiments on the TrackingNet, LaSOT, GOT-10k, and VOT2020 benchmarks with multiple base trackers show that our approach significantly improves the base tracker's performance with little extra latency.
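
For illustration, the snippet below implements pixel-wise correlation in its generic form (not the released Alpha-Refine code): every spatial position of the template feature acts as a 1x1 kernel correlated with the search-region feature, preserving fine spatial detail. The shapes are example values.

```python
# Generic pixel-wise correlation between template and search features.
import torch

def pixelwise_correlation(template, search):
    """template: (B, C, Ht, Wt), search: (B, C, Hs, Ws).
    Returns (B, Ht*Wt, Hs, Ws): one response map per template pixel."""
    B, C, Ht, Wt = template.shape
    kernels = template.reshape(B, C, Ht * Wt).permute(0, 2, 1)  # (B, HtWt, C)
    feats = search.flatten(2)                                   # (B, C, HsWs)
    corr = torch.bmm(kernels, feats)                            # (B, HtWt, HsWs)
    return corr.reshape(B, Ht * Wt, *search.shape[-2:])

corr = pixelwise_correlation(torch.randn(2, 64, 8, 8),
                             torch.randn(2, 64, 16, 16))
```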

[3] Transformer Tracking


Xin Chen*, Bin Yan*, Jiawen Zhu, Dong Wang, Xiaoyun Yang, Huchuan Lu
CVPR, 2021
arxiv / code

We present TransT, a Transformer-based tracking method that fuses template and search-region features solely with attention. It comprises an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. TransT achieves very promising results on six challenging datasets, especially the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks.
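
The sketch below shows the two kinds of attention in their generic form (illustrative only, with assumed layer sizes): a self-attention step in the spirit of the ego-context augment module, followed by a cross-attention step in the spirit of the cross-feature augment module.

```python
# Illustrative attention-based template/search feature fusion.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, template, search):
        # template: (B, Nt, dim), search: (B, Ns, dim) token sequences.
        # Ego-context style: search tokens attend to themselves.
        search = search + self.self_attn(search, search, search)[0]
        # Cross-feature style: search tokens attend to the template.
        return search + self.cross_attn(search, template, template)[0]

fusion = AttentionFusion()
out = fusion(torch.randn(2, 64, 256), torch.randn(2, 256, 256))
```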

[2] Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises


Bin Yan, Dong Wang, Huchuan Lu, Xiaoyun Yang
CVPR, 2020
arxiv / code

We present the Cooling-Shrinking Attack (CSA) to deceive object trackers. An effective and efficient perturbation generator is trained with a carefully designed adversarial loss that simultaneously cools the hot regions of the heatmap where the target exists and forces the predicted bounding box to shrink. Our method can effectively fool trackers by adding small perturbations to the template or search regions, and it has good transferability.
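
As a hedged sketch of the two loss terms described above (not the released CSA training code), the function below combines a cooling term, which pushes heatmap scores down inside the target region, with a shrinking term, which penalizes large predicted box sizes. Tensor shapes and the weighting are assumptions.

```python
# Hypothetical cooling-shrinking adversarial loss sketch.
import torch

def cooling_shrinking_loss(heatmap, target_mask, pred_wh, lam=1.0):
    """heatmap:     (B, H, W) tracker response map on the perturbed input.
    target_mask:    (B, H, W) binary mask of where the target truly is.
    pred_wh:        (B, 2)   predicted box width/height (normalized)."""
    # Cooling: mean response inside the target region (to be minimized).
    cooling = (heatmap * target_mask).sum() / target_mask.sum().clamp(min=1)
    # Shrinking: mean predicted box size (to be minimized).
    shrinking = pred_wh.mean()
    return cooling + lam * shrinking

loss = cooling_shrinking_loss(torch.rand(2, 32, 32),
                              (torch.rand(2, 32, 32) > 0.8).float(),
                              torch.rand(2, 2))
```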

[1] ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking


Bin Yan*, Haojie Zhao*, Dong Wang, Huchuan Lu, Xiaoyun Yang
ICCV, 2019
arxiv / code

We present SPLT, a robust and real-time long-term tracking framework built on skimming and perusal modules. The perusal module consists of an effective bounding-box regressor and a robust target verifier, while a novel skimming module speeds up the image-wide global search. The proposed method runs in real time and achieves the best performance on the VOT-2018 long-term and OxUvA long-term benchmarks.
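
Schematically, the tracking loop can be pictured as below (all functions are hypothetical placeholders, not the released SPLT API): a cheap skimming pass ranks candidate regions before the heavier regress-and-verify perusal step, and failure triggers an image-wide global search.

```python
# Schematic skimming-then-perusal loop; every callable is a placeholder.
def track_frame(frame, regions, skim, regress, verify, conf_thresh=0.5):
    # Skimming: quickly score every candidate region, keep the best few.
    candidates = sorted(regions, key=skim, reverse=True)[:3]
    # Perusal: precise box regression plus target verification.
    for region in candidates:
        box = regress(frame, region)
        if verify(frame, box) > conf_thresh:
            return box   # target found locally
    return None          # signal: fall back to image-wide global search
```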

Design and source code from Goutam Bhat's website.