Top Research Papers on Transformers
Delve into the transformative world of Transformer models with our curated list of top research papers. Whether you're a novice or an expert, these papers provide valuable insights and advancements in deep learning technology. Discover the latest trends and findings in Transformer research here.
Transformer in Transformer
1008 Citations · 2021 · Kai Han, An Xiao, Enhua Wu + 3 more
arXiv (Cornell University)
Transformer is a new kind of neural architecture which encodes the input data as powerful features via the attention mechanism. Basically, the visual transformers first divide the input images into several local patches and then calculate both representations and their relationship. Since natural images are of high complexity with abundant detail and color information, the granularity of the patch dividing is not fine enough for excavating features of objects in different scales and locations. In this paper, we point out that the attention inside these local patches is also essential for buil...
Perspectives on urban transformation research: transformations in, of, and by cities
154 Citations · 2021 · Katharina Hölscher, Niki Frantzeskaki
Urban Transformations
Abstract The narrative of ‘urban transformations’ epitomises the hope that cities provide rich opportunities for contributing to local and global sustainability and resilience. Urban transformation research is developing a rich yet consistent research agenda, offering opportunities for integrating multiple perspectives and disciplines concerned with radical change towards desirable urban systems. We outline three perspectives on urban transformations in, of, and by cities as a structuring approach for integrating knowledge about urban transformations. We illustrate how each perspective helps d...
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
123 Citations · 2021 · Zilong Huang, Youcheng Ben, Guozhong Luo + 3 more
arXiv (Cornell University)
Very recently, Window-based Transformers, which computed self-attention within non-overlapping local windows, demonstrated promising results on image classification, semantic segmentation, and object detection. However, less study has been devoted to the cross-window connection which is the key element to improve the representation ability. In this work, we revisit the spatial shuffle as an efficient way to build connections among windows. As a result, we propose a new vision transformer, named Shuffle Transformer, which is highly efficient and easy to implement by modifying two lines of code....
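The abstract above describes building cross-window connections by shuffling tokens between windows. As an illustration only (the paper's actual code is not reproduced here; the function name and the 1D simplification are my own), a minimal NumPy sketch of a spatial shuffle over non-overlapping windows:

```python
import numpy as np

def spatial_shuffle(x, window):
    """Exchange tokens across non-overlapping windows (1D sketch).

    x: (N, C) array of tokens, with N divisible by `window`.
    After the shuffle, tokens that occupied the same position in
    different windows are grouped together, analogous to the channel
    shuffle of ShuffleNet applied along the spatial axis.
    """
    n, c = x.shape
    g = n // window  # number of windows
    # (windows, positions, C) -> swap window/position axes -> flatten
    return x.reshape(g, window, c).transpose(1, 0, 2).reshape(n, c)
```

With four tokens and a window of two, tokens `[0, 1, 2, 3]` become `[0, 2, 1, 3]`: each new window now mixes tokens from both original windows, which is the cross-window connection the paper argues is essential.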
A survey of the vision transformers and their CNN-transformer based variants
263 Citations · 2023 · Asifullah Khan, Zunaira Rauf, Anabia Sohail + 4 more
Artificial Intelligence Review
This survey presents a taxonomy of the recent vision transformer architectures and more specifically that of the hybrid vision transformers, and sheds light on the future directions of this rapidly evolving architecture.
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
311 Citations · 2020 · Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas + 1 more
arXiv (Cornell University)
This work expresses the self-attention as a linear dot-product of kernel feature maps and makes use of the associativity property of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length.
This work proposes SortNet, as part of the Point Transformer, which induces input permutation invariance by selecting points based on a learned score, to extract local and global features and relate both representations by introducing the local-global attention mechanism.
Point Transformer
1990 Citations · 2021 · Hengshuang Zhao, Li Jiang, Jiaya Jia + 2 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
The Point Transformer design improves upon prior work across domains and tasks and crosses the 70% mIoU threshold for the first time on the challenging S3DIS dataset.
A protein language model which takes as input a set of sequences in the form of a multiple sequence alignment and is trained with a variant of the masked language modeling objective across many protein families surpasses current state-of-the-art unsupervised structure learning methods by a wide margin.
Where racism and sexism meet—an understanding of anti-Black misogyny When Moya Bailey first coined the term misogynoir, she defined it as the ways anti-Black and misogynistic representation shape broader ideas about Black women, particularly in visual culture and digital spaces. She had no idea that the term would go viral, touching a cultural nerve and quickly entering into the lexicon. Misogynoir now has its own Wikipedia page and hashtag, and has been featured on Comedy Central’s The Daily Show and CNN’s Cuomo Prime Time. In Misogynoir Transformed, Bailey delves into her groundbreaking c...
Advancing digitalisation is accompanied by fundamental processes of societal transformation that reach far beyond a mere reconfiguration of technical, media, or economic relations. They affect every area of social coordination, communication, and development in contemporary society. This introduction situates the much-discussed "digital transformation" within the long-term interplay of technology and society and traces the discontinuities, but also the continuities, in this process. It provides a systematising ov...
Improving your data transformations: Applying the Box-Cox transformation
878 Citations · 2020 · Jason W. Osborne
Scholarworks (University of Massachusetts Amherst)
Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robust to modest violations of these assumptions. Yet the reality is that almost all analyses (even nonparametric tests) benefit from improved normality of variables, particularly where substantial non-normality is present. While many are familiar with select traditional transformations (e.g., square root, log, inverse) for improving normality, the Box-Cox tra...
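For readers who want to try the transformation, a minimal NumPy implementation of the Box-Cox formula (the helper name is mine; in practice `scipy.stats.boxcox` also estimates the exponent λ by maximum likelihood rather than taking it as an argument):

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transformation for strictly positive data.

    y = (x**lam - 1) / lam   if lam != 0
    y = ln(x)                if lam == 0

    lam = 1 leaves the shape of the data unchanged (it only shifts
    values by -1), lam = 0 is the log transform, and lam = 0.5 is
    close to a square-root transform, so the familiar transformations
    are special cases of this one-parameter family.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam
```

For example, `box_cox([1.0, 4.0], 0.5)` yields `[0.0, 2.0]`, i.e. `2 * (sqrt(x) - 1)`.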
FLatten Transformer: Vision Transformer using Focused Linear Attention
236 Citations · 2023 · Dongchen Han, Xuran Pan, Yizeng Han + 2 more
journal unavailable
This paper proposes a novel Focused Linear Attention module, which introduces a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity.
The Interplay of Digital Transformational Leadership, Organizational Agility, and Digital Transformation
107 Citations · 2023 · Bora Ly
Journal of the Knowledge Economy
The model of how digital transformational leadership (DTL) influences DT through organizational agility (OA) is tested, demonstrating that alignment of organizational models and evolving OA are critical to DT.
Unpacking the Difference Between Digital Transformation and IT-Enabled Organizational Transformation
656 Citations · 2021 · Lauri Wessel, Abayomi Baiyere, Roxana Ologeanu‐Taddei + 2 more
Journal of the Association for Information Systems
An empirically grounded conceptualization is developed that sets these two phenomena apart, finding that there are two distinctive differences: digital transformation activities leverage digital technology in (re)defining an organization’s value proposition, while IT-enabled organizational transformation activities leverage digital technology in supporting the value proposition.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
27893 Citations · 2021 · Ze Liu, Yutong Lin, Yue Cao + 5 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
A hierarchical Transformer whose representation is computed with shifted windows, which has the flexibility to model at various scales, has linear computational complexity with respect to image size, and also proves beneficial for all-MLP architectures.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
355 Citations · 2021 · Ze Liu, Yutong Lin, Yue Cao + 5 more
arXiv (Cornell University)
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention ...
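As a rough illustration of the windowing idea (this is not the authors' code; the function names and the NumPy simplification are assumptions), the two key operations are partitioning a feature map into non-overlapping windows and cyclically shifting the map so the next block's windows straddle the previous boundaries:

```python
import numpy as np

def window_partition(x, w):
    """Split a (H, W, C) feature map into non-overlapping w x w windows.

    Returns (num_windows, w, w, C). Self-attention is then computed
    independently inside each window, so the cost grows linearly with
    image size rather than quadratically.
    """
    h, wid, c = x.shape
    x = x.reshape(h // w, w, wid // w, w, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w, w, c)

def cyclic_shift(x, w):
    """Shift the map by w//2 along both spatial axes so that the next
    block's windows cross the previous block's window boundaries."""
    return np.roll(x, shift=(-(w // 2), -(w // 2)), axis=(0, 1))
```

Alternating plain and shifted window partitions is what lets information propagate across window boundaries while keeping every attention computation local.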
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
338 Citations · 2020 · Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas + 1 more
arXiv (Cornell University)
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregre...
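The kernel trick in the abstract can be sketched in a few lines of NumPy (a toy single-head, non-causal version; the function names are mine, and the elu(x)+1 feature map follows the paper):

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, which is strictly positive everywhere
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N) attention via kernel feature maps.

    Softmax attention evaluates (Q K^T) V row-normalized, which costs
    O(N^2 d).  Replacing the softmax kernel with phi(q)·phi(k) and
    using associativity, phi(Q) @ (phi(K)^T V), lets us build the small
    d x d_v summary phi(K)^T V once, so the cost is O(N d d_v).
    """
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    kv = Kf.T @ V                                  # (d, d_v) summary
    z = Qf @ Kf.sum(axis=0, keepdims=True).T       # (N, 1) normalizer
    return (Qf @ kv) / z
```

Because only parenthesization changes, the result is numerically identical to computing the full N x N kernel attention matrix and then multiplying by V.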
The transformative leadership compass: six competencies for digital transformation entrepreneurship
253 Citations · 2021 · Giovanni Schiuma, Eva Schettini, Francesco Santarsiero + 1 more
International Journal of Entrepreneurial Behaviour & Research
The transformative leadership compass is proposed as a model to outline the critical competencies distinguishing a digital transformative leader capable of driving continuous company innovation and specifically digital transformation entrepreneurship.
Transformative outcomes: assessing and reorienting experimentation with transformative innovation policy
159 Citations · 2021 · Bipashyee Ghosh, Paula Kivimaa, Matías Ramírez + 2 more
Science and Public Policy
Abstract The impending climate emergency, the Paris agreement and Sustainable Development Goals demand significant transformations in economies and societies. Science funders, innovation agencies, and scholars have explored new rationales and processes for policymaking, such as transformative innovation policy (TIP). Here, we address the question of how to orient the efforts of science, technology, and innovation policy actors to enable transformations. We build on sustainability transitions research and a 4-year co-creation journey of the TIP Consortium to present twelve transformative outcom...
Transformation strategies for the supply chain: the impact of industry 4.0 and digital transformation
132 Citations · 2020 · Raphael Preindl, Κωνσταντίνος Νικολόπουλος, Konstantia Litsiou
Supply Chain Forum an International Journal
It is shown that an entire SC integration based on new technologies is still some distance away, and that the impact of Industry 4.0 and the Digital Transformation on decision making is closely connected to information sharing.
Cas-VSwin transformer: A variant swin transformer for surface-defect detection
143 Citations · 2022 · Linfeng Gao, Jianxun Zhang, Changhui Yang + 1 more
Computers in Industry
Surface defect detection using deep learning approaches has become a promising area of research, but the difficulty of accurately locating and segmenting various forms of defects presents a challenge for this method. Swin Transformer, as a Transformer-based model, has made significant progress in computer vision. It surpasses standard CNNs on most tasks, but has drawn scant attention from industrial applications. Thus far, CNNs remain the most common choice for surface defect detection. To explore the extensibility of the Transformer, we seek to exp...
Defect transformer: An efficient hybrid transformer architecture for surface defect detection
109 Citations · 2023 · Junpu Wang, Guili Xu, Fuju Yan + 2 more
Measurement
Surface defect detection is an extremely crucial step to ensure the quality of industrial products. Nowadays, convolutional neural networks (CNNs) based on encoder–decoder architecture have achieved tremendous success in various defect detection tasks. However, the intrinsic locality of convolution prevents them from modeling long-range interactions explicitly, making it difficult to distinguish pseudo-defects in cluttered backgrounds. Recent transformers are especially skilled at learning global image dependencies, but with limited local structural information for the refined defect location....
Transformers in Vision: A Survey
124 Citations · 2022 · Salman Khan, Muzammal Naseer, Munawar Hayat + 3 more
ACM Computing Surveys
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline with an introduction to fundamental concepts behind the success of Transformers, i.e., self-attention, large-scale pre-training, and bidirectional feature encoding.
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
1171 Citations · 2022 · Xiaoyi Dong, Jianmin Bao, Dongdong Chen + 5 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The Cross-Shaped Window self-attention mechanism for computing self-attention in the horizontal and vertical stripes in parallel that form a cross-shaped window is developed, with each stripe obtained by splitting the input feature into stripes of equal width.
Adapting transformation and transforming adaptation to climate change using a pathways approach
103 Citations · 2021 · Matthew J. Colloff, Russell Gorddard, Nick Abel + 11 more
Environmental Science & Policy
Human actions have driven earth systems close to irreversible and profound change. The need to shift towards intentional transformative adaptation (ITA) is clear. Using case studies from the Transformative Adaptation Research Alliance (TARA), we explore ITA as a way of thinking and acting that is transformative in concept and objectives, but achieved through a mix of incremental and transformative co-production processes that ultimately lead to the social-ecological system being transformed. Central to ITA are social and political issues of how individuals and collectives address environmental...
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
101 Citations · 2021 · Xiaoyi Dong, Jianmin Bao, Dongdong Chen + 5 more
arXiv (Cornell University)
We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute whereas local self-attention often limits the field of interactions of each token. To address this issue, we develop the Cross-Shaped Window self-attention mechanism for computing self-attention in the horizontal and vertical stripes in parallel that form a cross-shaped window, with each stripe obtained by splitting the input feature into stripes of equal width. We provide a mat...
Video Swin Transformer
1757 Citations · 2022 · Ze Liu, Ning Jia, Yue Cao + 4 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper advocates an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off compared to previous approaches which compute self-attention globally even with spatial-temporal factorization.
A Pop Music Transformer is built that composes Pop piano music with better rhythmic structure than existing Transformer models, when the way a musical score is converted into the data fed to a Transformer model is improved.
Efficient Transformers: A Survey
872 Citations · 2022 · Yi Tay, Mostafa Dehghani, Dara Bahri + 1 more
ACM Computing Surveys
This article characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains.
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
183 Citations · 2023 · Dasom Ahn, Sangwon Kim, Hyunsu Hong + 1 more
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
The proposed Spatio-TemporAl cRoss (STAR)-transformer, which can effectively represent two cross-modal features as a recognizable vector, achieves a promising improvement in performance in comparison to previous state-of-the-art methods.
Neighborhood Attention Transformer
352 Citations · 2023 · Ali Hassani, Steven Walton, Jiachen Li + 2 more
journal unavailable
Neighborhood Attention (NA) is a pixel-wise operation that localizes self-attention to each pixel's nearest neighbors, and therefore enjoys linear time and space complexity compared to the quadratic complexity of self-attention; it is the first efficient and scalable sliding-window attention mechanism for vision.
Video Transformers: A Survey
130 Citations · 2023 · Javier Selva, Anders Skaarup Johansen, Sérgio Escalera + 3 more
IEEE Transactions on Pattern Analysis and Machine Intelligence
This survey dives into how videos are handled at the input level first and studies the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics.
Global Tracking Transformers
164 Citations · 2022 · Xingyi Zhou, Tianwei Yin, Vladlen Koltun + 1 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A novel transformer-based architecture for global multi-object tracking that takes a short sequence of frames as input and produces global trajectories for all objects, and seamlessly integrates into state-of-the-art large-vocabulary detectors to track any objects.
Reformer: The Efficient Transformer
322 Citations · 2020 · Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
arXiv (Cornell University)
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L²) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, w...
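The locality-sensitive hashing step can be sketched as follows (an illustrative toy, not the Reformer implementation; the function name and single-round simplification are mine). Angular LSH projects each query/key onto random directions and assigns it the bucket of the strongest direction, so vectors with high cosine similarity tend to share a bucket, and attention can then be restricted to same-bucket pairs:

```python
import numpy as np

def lsh_buckets(x, n_buckets, rng):
    """Assign each row of x (shape (L, d)) to one of n_buckets buckets.

    A random projection onto n_buckets // 2 directions is computed;
    each direction and its negation define one bucket, and the bucket
    is the argmax over all 2 * (n_buckets // 2) scores.  Nearby
    vectors (small angle) get the same bucket with high probability,
    so full O(L^2) attention can be replaced by attention within
    sorted buckets, moving the cost toward O(L log L).
    """
    d = x.shape[-1]
    r = rng.standard_normal((d, n_buckets // 2))
    proj = x @ r
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)
```

In the real model, hashing is repeated over several rounds to reduce the chance that similar items are separated, and queries are sorted by bucket before chunked attention.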
Methane transformation by photocatalysis
347 Citations · 2022 · Xiyi Li, Chao Wang, Junwang Tang
Nature Reviews Materials
Methane hydrate and shale gas are predicted to have substantial reserves, far beyond the sum of other fossil fuels. Using methane instead of crude oil as a building block is, thus, a very attractive strategy for synthesizing valuable chemicals. Because methane is so inert, its direct conversion needs a high activation energy and typically requires harsh reaction conditions or strong oxidants. Photocatalysis, which employs photons operated under very mild conditions, is a promising technology to reduce the thermodynamic barrier in direct methane conversion and to avoid the common issues of over...
Dual Vision Transformer
134 Citations · 2023 · Ting Yao, Yehao Li, Yingwei Pan + 3 more
IEEE Transactions on Pattern Analysis and Machine Intelligence
This paper proposes a novel Transformer architecture that elegantly exploits the global semantics for self-attention learning, namely Dual Vision Transformer (Dual-ViT).
Text Spotting Transformers
116 Citations · 2022 · Xiang Zhang, Yongwen Su, Subarna Tripathi + 1 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper presents TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild, and designs a bounding-box guided polygon detection process.
Towards a Transformation of Philosophy
228 Citations · 2023 · K. O. Apel, Glyn Adey, David Frisby
journal unavailable
First Published in 1980 (English Translation) Towards a Transformation of Philosophy presents selected essays from Karl-Otto Apel's two-volume German collection that was published in 1973 under the title Transformation der Philosophie. Karl-Otto Apel's studies in philosophy and the social sciences can be said to have bridged the gap that had hitherto existed between the Anglo-Saxon traditions of analytical philosophy of language and pragmatism, and the philosophical traditions of the European continent of phenomenology, existentialism, and hermeneutics. Apel points to language as the crucia...
Image Fusion Transformer
182 Citations · 2022 · Vibashan VS, Jeya Maria Jose Valanarasu, Poojan Oza + 1 more
2022 IEEE International Conference on Image Processing (ICIP)
This work proposes a novel Image Fusion Transformer (IFT), in which a transformer-based multi-scale fusion strategy attends to both local and long-range information (or global context) during the image fusion process.
Transforming tuberculosis diagnosis
105 Citations · 2023 · Madhukar Pai, Puneet Dewan, Soumya Swaminathan
Nature Microbiology
This work describes seven critical transitions that can close the massive TB diagnostic gap and enable TB programmes worldwide to recover from the pandemic setbacks.