Top Research Papers on Transformers
Delve into the transformative world of Transformer models with our curated list of top research papers. Whether you're a novice or an expert, these papers provide valuable insights and advancements in deep learning technology. Discover the latest trends and findings in Transformer research here.
Transformer in Transformer
1008 Citations · 2021 · Kai Han, An Xiao, Enhua Wu + 3 more
arXiv (Cornell University)
Transformer is a new kind of neural architecture which encodes the input data as powerful features via the attention mechanism. Basically, the visual transformers first divide the input images into several local patches and then calculate both representations and their relationship. Since natural images are of high complexity with abundant detail and color information, the granularity of the patch dividing is not fine enough for excavating features of objects in different scales and locations. In this paper, we point out that the attention inside these local patches is also essential for buil...
Perspectives on urban transformation research: transformations in, of, and by cities
154 Citations · 2021 · Katharina Hölscher, Niki Frantzeskaki
Urban Transformations
Abstract The narrative of ‘urban transformations’ epitomises the hope that cities provide rich opportunities for contributing to local and global sustainability and resilience. Urban transformation research is developing a rich yet consistent research agenda, offering opportunities for integrating multiple perspectives and disciplines concerned with radical change towards desirable urban systems. We outline three perspectives on urban transformations in, of, and by cities as a structuring approach for integrating knowledge about urban transformations. We illustrate how each perspective helps d...
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
123 Citations · 2021 · Zilong Huang, Youcheng Ben, Guozhong Luo + 3 more
arXiv (Cornell University)
Very recently, Window-based Transformers, which computed self-attention within non-overlapping local windows, demonstrated promising results on image classification, semantic segmentation, and object detection. However, less study has been devoted to the cross-window connection which is the key element to improve the representation ability. In this work, we revisit the spatial shuffle as an efficient way to build connections among windows. As a result, we propose a new vision transformer, named Shuffle Transformer, which is highly efficient and easy to implement by modifying two lines of code....
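The abstract above describes building cross-window connections by shuffling tokens between windows. As an illustration only (the paper's actual code is not reproduced here; the function name and the 1D simplification are my own), a minimal NumPy sketch of a spatial shuffle over non-overlapping windows:

```python
import numpy as np

def spatial_shuffle(x, window):
    """Exchange tokens across non-overlapping windows (1D sketch).

    x: (N, C) array of tokens, with N divisible by `window`.
    After the shuffle, tokens that occupied the same position in
    different windows are grouped together, analogous to the channel
    shuffle of ShuffleNet applied along the spatial axis.
    """
    n, c = x.shape
    g = n // window  # number of windows
    # (windows, positions, C) -> swap window/position axes -> flatten
    return x.reshape(g, window, c).transpose(1, 0, 2).reshape(n, c)
```

With four tokens and a window of two, tokens `[0, 1, 2, 3]` become `[0, 2, 1, 3]`: each new window now mixes tokens from both original windows, which is the cross-window connection the paper argues is essential.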
A survey of the vision transformers and their CNN-transformer based variants
263 Citations · 2023 · Asifullah Khan, Zunaira Rauf, Anabia Sohail + 4 more
Artificial Intelligence Review
This survey presents a taxonomy of the recent vision transformer architectures and more specifically that of the hybrid vision transformers, and sheds light on the future directions of this rapidly evolving architecture.
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
311 Citations · 2020 · Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas + 1 more
arXiv (Cornell University)
This work expresses the self-attention as a linear dot-product of kernel feature maps and makes use of the associativity property of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length.
This work proposes SortNet, as part of the Point Transformer, which induces input permutation invariance by selecting points based on a learned score, to extract local and global features and relate both representations by introducing the local-global attention mechanism.
Point Transformer
1990 Citations · 2021 · Hengshuang Zhao, Li Jiang, Jiaya Jia + 2 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
The Point Transformer design improves upon prior work across domains and tasks and crosses the 70% mIoU threshold for the first time on the challenging S3DIS dataset.
A protein language model which takes as input a set of sequences in the form of a multiple sequence alignment and is trained with a variant of the masked language modeling objective across many protein families surpasses current state-of-the-art unsupervised structure learning methods by a wide margin.
Where racism and sexism meet—an understanding of anti-Black misogyny When Moya Bailey first coined the term misogynoir, she defined it as the ways anti-Black and misogynistic representation shape broader ideas about Black women, particularly in visual culture and digital spaces. She had no idea that the term would go viral, touching a cultural nerve and quickly entering into the lexicon. Misogynoir now has its own Wikipedia page and hashtag, and has been featured on Comedy Central’s The Daily Show and CNN’s Cuomo Prime Time. In Misogynoir Transformed, Bailey delves into her groundbreaking c...
Advancing digitalisation is accompanied by fundamental processes of societal transformation that reach far beyond a mere reconfiguration of technical, media, or economic relations. They affect every area of social coordination, communication, and development in contemporary society. This introduction situates the much-discussed "digital transformation" within the long-term interplay of technology and society and traces the discontinuities, but also the continuities, in this process. It provides a systematising ov...
Improving your data transformations: Applying the Box-Cox transformation
878 Citations · 2020 · Jason W. Osborne
Scholarworks (University of Massachusetts Amherst)
Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robust to modest violations of these assumptions. Yet the reality is that almost all analyses (even nonparametric tests) benefit from improved normality of variables, particularly where substantial non-normality is present. While many are familiar with select traditional transformations (e.g., square root, log, inverse) for improving normality, the Box-Cox tra...
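For readers who want to try the transformation, a minimal NumPy implementation of the Box-Cox formula (the helper name is mine; in practice `scipy.stats.boxcox` also estimates the exponent λ by maximum likelihood rather than taking it as an argument):

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transformation for strictly positive data.

    y = (x**lam - 1) / lam   if lam != 0
    y = ln(x)                if lam == 0

    lam = 1 leaves the shape of the data unchanged (it only shifts
    values by -1), lam = 0 is the log transform, and lam = 0.5 is
    close to a square-root transform, so the familiar transformations
    are special cases of this one-parameter family.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam
```

For example, `box_cox([1.0, 4.0], 0.5)` yields `[0.0, 2.0]`, i.e. `2 * (sqrt(x) - 1)`.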
FLatten Transformer: Vision Transformer using Focused Linear Attention
236 Citations · 2023 · Dongchen Han, Xuran Pan, Yizeng Han + 2 more
journal unavailable
This paper proposes a novel Focused Linear Attention module, which introduces a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity.
The Interplay of Digital Transformational Leadership, Organizational Agility, and Digital Transformation
107 Citations · 2023 · Bora Ly
Journal of the Knowledge Economy
The model of how digital transformational leadership (DTL) influences DT through organizational agility (OA) is tested, demonstrating that alignment of organizational models and evolving OA are critical to DT.
Unpacking the Difference Between Digital Transformation and IT-Enabled Organizational Transformation
656 Citations · 2021 · Lauri Wessel, Abayomi Baiyere, Roxana Ologeanu‐Taddei + 2 more
Journal of the Association for Information Systems
An empirically grounded conceptualization is developed that sets these two phenomena apart, finding that there are two distinctive differences: digital transformation activities leverage digital technology in (re)defining an organization’s value proposition, while IT-enabled organizational transformation activities leverage digital technology in supporting the value proposition.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
27893 Citations · 2021 · Ze Liu, Yutong Lin, Yue Cao + 5 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
A hierarchical Transformer whose representation is computed with shifted windows, which has the flexibility to model at various scales, has linear computational complexity with respect to image size, and also proves beneficial for all-MLP architectures.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
355 Citations · 2021 · Ze Liu, Yutong Lin, Yue Cao + 5 more
arXiv (Cornell University)
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention ...
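As a rough illustration of the windowing idea (this is not the authors' code; the function names and the NumPy simplification are assumptions), the two key operations are partitioning a feature map into non-overlapping windows and cyclically shifting the map so the next block's windows straddle the previous boundaries:

```python
import numpy as np

def window_partition(x, w):
    """Split a (H, W, C) feature map into non-overlapping w x w windows.

    Returns (num_windows, w, w, C). Self-attention is then computed
    independently inside each window, so the cost grows linearly with
    image size rather than quadratically.
    """
    h, wid, c = x.shape
    x = x.reshape(h // w, w, wid // w, w, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w, w, c)

def cyclic_shift(x, w):
    """Shift the map by w//2 along both spatial axes so that the next
    block's windows cross the previous block's window boundaries."""
    return np.roll(x, shift=(-(w // 2), -(w // 2)), axis=(0, 1))
```

Alternating plain and shifted window partitions is what lets information propagate across window boundaries while keeping every attention computation local.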
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
338 Citations · 2020 · Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas + 1 more
arXiv (Cornell University)
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregre...
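The kernel trick in the abstract can be sketched in a few lines of NumPy (a toy single-head, non-causal version; the function names are mine, and the elu(x)+1 feature map follows the paper):

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, which is strictly positive everywhere
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N) attention via kernel feature maps.

    Softmax attention evaluates (Q K^T) V row-normalized, which costs
    O(N^2 d).  Replacing the softmax kernel with phi(q)·phi(k) and
    using associativity, phi(Q) @ (phi(K)^T V), lets us build the small
    d x d_v summary phi(K)^T V once, so the cost is O(N d d_v).
    """
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    kv = Kf.T @ V                                  # (d, d_v) summary
    z = Qf @ Kf.sum(axis=0, keepdims=True).T       # (N, 1) normalizer
    return (Qf @ kv) / z
```

Because only parenthesization changes, the result is numerically identical to computing the full N x N kernel attention matrix and then multiplying by V.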
The transformative leadership compass: six competencies for digital transformation entrepreneurship
253 Citations · 2021 · Giovanni Schiuma, Eva Schettini, Francesco Santarsiero + 1 more
International Journal of Entrepreneurial Behaviour & Research
The transformative leadership compass is proposed as a model to outline the critical competencies distinguishing a digital transformative leader capable of driving continuous company innovation and specifically digital transformation entrepreneurship.
Transformative outcomes: assessing and reorienting experimentation with transformative innovation policy
159 Citations · 2021 · Bipashyee Ghosh, Paula Kivimaa, Matías Ramírez + 2 more
Science and Public Policy
Abstract The impending climate emergency, the Paris agreement and Sustainable Development Goals demand significant transformations in economies and societies. Science funders, innovation agencies, and scholars have explored new rationales and processes for policymaking, such as transformative innovation policy (TIP). Here, we address the question of how to orient the efforts of science, technology, and innovation policy actors to enable transformations. We build on sustainability transitions research and a 4-year co-creation journey of the TIP Consortium to present twelve transformative outcom...
Transformation strategies for the supply chain: the impact of industry 4.0 and digital transformation
132 Citations · 2020 · Raphael Preindl, Κωνσταντίνος Νικολόπουλος, Konstantia Litsiou
Supply Chain Forum an International Journal
It is shown that an entire SC integration based on new technologies is still some distance away, and that the impact of Industry 4.0 and the Digital Transformation on decision making is closely connected to information sharing.
Cas-VSwin transformer: A variant swin transformer for surface-defect detection
143 Citations · 2022 · Linfeng Gao, Jianxun Zhang, Changhui Yang + 1 more
Computers in Industry
Surface defect detection using deep learning approaches has become a promising area of research, but the difficulty of accurately locating and segmenting various forms of defects presents a challenge for this method. Swin Transformer, as a Transformer-based model, has made significant progress in computer vision. It surpasses standard CNNs on most tasks, but has drawn scant attention from industrial applications. Thus far, CNNs remain the most common choice for surface defect detection. To explore the extensibility of the Transformer, we seek to exp...
Defect transformer: An efficient hybrid transformer architecture for surface defect detection
109 Citations · 2023 · Junpu Wang, Guili Xu, Fuju Yan + 2 more
Measurement
Surface defect detection is an extremely crucial step to ensure the quality of industrial products. Nowadays, convolutional neural networks (CNNs) based on encoder–decoder architecture have achieved tremendous success in various defect detection tasks. However, the intrinsic locality of convolution prevents them from modeling long-range interactions explicitly, making it difficult to distinguish pseudo-defects in cluttered backgrounds. Recent transformers are especially skilled at learning global image dependencies, but with limited local structural information for the refined defect location....
Transformers in Vision: A Survey
124 Citations · 2022 · Salman Khan, Muzammal Naseer, Munawar Hayat + 3 more
ACM Computing Surveys
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline with an introduction to fundamental concepts behind the success of Transformers, i.e., self-attention, large-scale pre-training, and bidirectional feature encoding.
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
1171 Citations · 2022 · Xiaoyi Dong, Jianmin Bao, Dongdong Chen + 5 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The Cross-Shaped Window self-attention mechanism for computing self-attention in the horizontal and vertical stripes in parallel that form a cross-shaped window is developed, with each stripe obtained by splitting the input feature into stripes of equal width.
Adapting transformation and transforming adaptation to climate change using a pathways approach
103 Citations · 2021 · Matthew J. Colloff, Russell Gorddard, Nick Abel + 11 more
Environmental Science & Policy
Human actions have driven earth systems close to irreversible and profound change. The need to shift towards intentional transformative adaptation (ITA) is clear. Using case studies from the Transformative Adaptation Research Alliance (TARA), we explore ITA as a way of thinking and acting that is transformative in concept and objectives, but achieved through a mix of incremental and transformative co-production processes that ultimately lead to the social-ecological system being transformed. Central to ITA are social and political issues of how individuals and collectives address environmental...
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
101 Citations · 2021 · Xiaoyi Dong, Jianmin Bao, Dongdong Chen + 5 more
arXiv (Cornell University)
We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute whereas local self-attention often limits the field of interactions of each token. To address this issue, we develop the Cross-Shaped Window self-attention mechanism for computing self-attention in the horizontal and vertical stripes in parallel that form a cross-shaped window, with each stripe obtained by splitting the input feature into stripes of equal width. We provide a mat...
Video Swin Transformer
1757 Citations · 2022 · Ze Liu, Ning Jia, Yue Cao + 4 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper advocates an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off compared to previous approaches which compute self-attention globally even with spatial-temporal factorization.
A Pop Music Transformer is built that composes Pop piano music with better rhythmic structure than existing Transformer models, when the way a musical score is converted into the data fed to a Transformer model is improved.
Efficient Transformers: A Survey
872 Citations · 2022 · Yi Tay, Mostafa Dehghani, Dara Bahri + 1 more
ACM Computing Surveys
This article characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains.
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
183 Citations · 2023 · Dasom Ahn, Sangwon Kim, Hyunsu Hong + 1 more
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
The proposed Spatio-TemporAl cRoss (STAR)-transformer, which can effectively represent two cross-modal features as a recognizable vector, achieves a promising improvement in performance in comparison to previous state-of-the-art methods.
Neighborhood Attention Transformer
352 Citations · 2023 · Ali Hassani, Steven Walton, Jiachen Li + 2 more
journal unavailable
Neighborhood Attention (NA) is a pixel-wise operation that localizes self-attention to each pixel's nearest neighbors, and therefore enjoys linear time and space complexity compared to the quadratic complexity of self-attention; it is the first efficient and scalable sliding-window attention mechanism for vision.
Video Transformers: A Survey
130 Citations · 2023 · Javier Selva, Anders Skaarup Johansen, Sérgio Escalera + 3 more
IEEE Transactions on Pattern Analysis and Machine Intelligence
This survey dives into how videos are handled at the input level first and studies the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics.
Global Tracking Transformers
164 Citations · 2022 · Xingyi Zhou, Tianwei Yin, Vladlen Koltun + 1 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A novel transformer-based architecture for global multi-object tracking that takes a short sequence of frames as input and produces global trajectories for all objects, and seamlessly integrates into state-of-the-art large-vocabulary detectors to track any objects.
Reformer: The Efficient Transformer
322 Citations · 2020 · Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
arXiv (Cornell University)
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L²) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, w...
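The locality-sensitive hashing step can be sketched as follows (an illustrative toy, not the Reformer implementation; the function name and single-round simplification are mine). Angular LSH projects each query/key onto random directions and assigns it the bucket of the strongest direction, so vectors with high cosine similarity tend to share a bucket, and attention can then be restricted to same-bucket pairs:

```python
import numpy as np

def lsh_buckets(x, n_buckets, rng):
    """Assign each row of x (shape (L, d)) to one of n_buckets buckets.

    A random projection onto n_buckets // 2 directions is computed;
    each direction and its negation define one bucket, and the bucket
    is the argmax over all 2 * (n_buckets // 2) scores.  Nearby
    vectors (small angle) get the same bucket with high probability,
    so full O(L^2) attention can be replaced by attention within
    sorted buckets, moving the cost toward O(L log L).
    """
    d = x.shape[-1]
    r = rng.standard_normal((d, n_buckets // 2))
    proj = x @ r
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)
```

In the real model, hashing is repeated over several rounds to reduce the chance that similar items are separated, and queries are sorted by bucket before chunked attention.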
Methane transformation by photocatalysis
347 Citations · 2022 · Xiyi Li, Chao Wang, Junwang Tang
Nature Reviews Materials
Methane hydrate and shale gas are predicted to have substantial reserves, far beyond the sum of other fossil fuels. Using methane instead of crude oil as a building block is, thus, a very attractive strategy for synthesizing valuable chemicals. Because methane is so inert, its direct conversion needs a high activation energy and typically requires harsh reaction conditions or strong oxidants. Photocatalysis, which employs photons operated under very mild conditions, is a promising technology to reduce the thermodynamic barrier in direct methane conversion and to avoid the common issues of over...
Dual Vision Transformer
134 Citations · 2023 · Ting Yao, Yehao Li, Yingwei Pan + 3 more
IEEE Transactions on Pattern Analysis and Machine Intelligence
This paper proposes a novel Transformer architecture that elegantly exploits the global semantics for self-attention learning, namely Dual Vision Transformer (Dual-ViT).
Text Spotting Transformers
116 Citations · 2022 · Xiang Zhang, Yongwen Su, Subarna Tripathi + 1 more
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper presents TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild, and designs a bounding-box guided polygon detection process.
Towards a Transformation of Philosophy
228 Citations · 2023 · K. O. Apel, Glyn Adey, David Frisby
journal unavailable
First Published in 1980 (English Translation) Towards a Transformation of Philosophy presents selected essays from Karl-Otto Apel's two-volume German collection that was published in 1973 under the title Transformation der Philosophie. Karl-Otto Apel's studies in philosophy and the social sciences can be said to have bridged the gap that had hitherto existed between the Anglo-Saxon traditions of analytical philosophy of language and pragmatism, and the philosophical traditions of the European continent of phenomenology, existentialism, and hermeneutics. Apel points to language as the crucia...
Image Fusion Transformer
182 Citations · 2022 · Vibashan VS, Jeya Maria Jose Valanarasu, Poojan Oza + 1 more
2022 IEEE International Conference on Image Processing (ICIP)
This work proposes a novel Image Fusion Transformer (IFT), in which a transformer-based multi-scale fusion strategy attends to both local and long-range information (or global context) during the image fusion process.
Transforming tuberculosis diagnosis
105 Citations · 2023 · Madhukar Pai, Puneet Dewan, Soumya Swaminathan
Nature Microbiology
This work describes seven critical transitions that can close the massive TB diagnostic gap and enable TB programmes worldwide to recover from the pandemic setbacks.