Transformer in Computer Vision

20 Citations · 2021
Jiarui Bi, Zengliang Zhu, Qinglong Meng
2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI)

An in-depth review of the vision-based transformer, covering transformers for image object detection, multiple object tracking, action classification, and visual segmentation, with a comprehensive experimental comparison to validate the strength of transformer-based methods.

Abstract

The transformer is widely used in Natural Language Processing (NLP), where numerous papers on it have been published. Recently, the transformer has been adopted for many computer vision tasks. However, few papers give a comprehensive survey of the vision-based transformer. To this end, we give an in-depth review of the vision-based transformer. We summarize 15 articles covering transformers for image object detection, multiple object tracking, action classification, and visual segmentation. Furthermore, we summarize 6 related datasets for the corresponding tasks, along with their metrics. We also provide a comprehensive experimental comparison to validate the strength of transformer-based methods. Finally, we provide a brief introduction to the transformer and its applications to computer vision tasks, which can help beginners become familiar with it.