Top Research Papers on Transformers
Delve into the transformative world of Transformer models with our curated list of top research papers. Whether you're a novice or an expert, these papers provide valuable insights and advancements in deep learning technology. Discover the latest trends and findings in Transformer research here.
Transformer in Transformer
1008 Citations · 2021 · Kai Han, An Xiao, Enhua Wu + 3 more
arXiv (Cornell University)
Transformer is a new kind of neural architecture which encodes the input data as powerful features via the attention mechanism. Basically, visual transformers first divide the input images into several local patches and then calculate both representations and their relationships. Since natural images are highly complex, with abundant detail and color information, this granularity of patch division is not fine enough for excavating features of objects at different scales and locations. In this paper, we point out that the attention inside these local patches is also essential for buil...
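The two granularities described above (patch-level "visual sentences" and, inside each patch, sub-patch "visual words" for the inner transformer) amount to plain array reshapes. A minimal NumPy sketch of that splitting, with arbitrarily chosen patch and sub-patch sizes; illustrative only, not the authors' implementation:

```python
import numpy as np

def to_patches_and_subpatches(img, patch=16, sub=4):
    # Split an image (H, W, C) into flat patch embeddings ("sentences") and,
    # inside each patch, flat sub-patch embeddings ("words") -- the two
    # granularities TNT's outer and inner transformers operate on.
    H, W, C = img.shape
    p, s = patch, sub
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, p, p, C)                  # (num_patches, p, p, C)
    words = patches.reshape(-1, p // s, s, p // s, s, C).transpose(0, 1, 3, 2, 4, 5)
    words = words.reshape(patches.shape[0], -1, s * s * C)  # (num_patches, words, dim)
    return patches.reshape(patches.shape[0], -1), words

img = np.zeros((32, 32, 3))
flat_patches, words = to_patches_and_subpatches(img)
print(flat_patches.shape, words.shape)  # (4, 768) (4, 16, 48)
```

In the actual model both levels are linearly projected and attended over, with the inner (word-level) features folded back into the patch embeddings.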
Perspectives on urban transformation research: transformations in, of, and by cities
154 Citations · 2021 · Katharina Hölscher, Niki Frantzeskaki
Urban Transformations
Abstract The narrative of ‘urban transformations’ epitomises the hope that cities provide rich opportunities for contributing to local and global sustainability and resilience. Urban transformation research is developing a rich yet consistent research agenda, offering opportunities for integrating multiple perspectives and disciplines concerned with radical change towards desirable urban systems. We outline three perspectives on urban transformations in, of, and by cities as a structuring approach for integrating knowledge about urban transformations. We illustrate how each perspective helps d...
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
123 Citations · 2021 · Zilong Huang, Youcheng Ben, Guozhong Luo + 3 more
arXiv (Cornell University)
Very recently, window-based Transformers, which compute self-attention within non-overlapping local windows, have demonstrated promising results on image classification, semantic segmentation, and object detection. However, less study has been devoted to the cross-window connection, which is the key element for improving representation ability. In this work, we revisit the spatial shuffle as an efficient way to build connections among windows. As a result, we propose a new vision transformer, named Shuffle Transformer, which is highly efficient and easy to implement by modifying two lines of code....
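The spatial shuffle the abstract refers to can be written as one reshape plus one transpose, much like ShuffleNet's channel shuffle applied to spatial windows. A minimal NumPy sketch of the idea, not the authors' exact code:

```python
import numpy as np

def spatial_shuffle(x, window_size):
    # Interleave pixels across non-overlapping windows by transposing the
    # (window index, within-window index) axes, so the next round of window
    # attention connects formerly disjoint windows. Illustrative sketch only.
    H, W, C = x.shape
    w = window_size
    x = x.reshape(H // w, w, W // w, w, C)
    x = x.transpose(1, 0, 3, 2, 4)   # the "two lines of code" trick
    return x.reshape(H, W, C)

x = np.arange(16).reshape(4, 4, 1)
print(spatial_shuffle(x, 2)[:, :, 0])
```

The operation is a pure permutation of pixels, so it adds no parameters and negligible compute.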
A survey of the vision transformers and their CNN-transformer based variants
263 Citations · 2023 · Asifullah Khan, Zunaira Rauf, Anabia Sohail + 4 more
Artificial Intelligence Review
This survey presents a taxonomy of the recent vision transformer architectures and more specifically that of the hybrid vision transformers, and sheds light on the future directions of this rapidly evolving architecture.
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
311 Citations · 2020 · Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas + 1 more
arXiv (Cornell University)
This work expresses the self-attention as a linear dot-product of kernel feature maps and makes use of the associativity property of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length.
This work proposes SortNet, as part of the Point Transformer, which induces input permutation invariance by selecting points based on a learned score, to extract local and global features and relate both representations by introducing the local-global attention mechanism.
Point Transformer
1990 Citations · 2021 · Hengshuang Zhao, Li Jiang, Jiaya Jia + 2 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
The Point Transformer design improves upon prior work across domains and tasks and crosses the 70% mIoU threshold for the first time on the challenging S3DIS dataset.
A protein language model which takes as input a set of sequences in the form of a multiple sequence alignment and is trained with a variant of the masked language modeling objective across many protein families surpasses current state-of-the-art unsupervised structure learning methods by a wide margin.
Where racism and sexism meet: an understanding of anti-Black misogyny. When Moya Bailey first coined the term misogynoir, she defined it as the ways anti-Black and misogynistic representation shape broader ideas about Black women, particularly in visual culture and digital spaces. She had no idea that the term would go viral, touching a cultural nerve and quickly entering the lexicon. Misogynoir now has its own Wikipedia page and hashtag, and has been featured on Comedy Central’s The Daily Show and CNN’s Cuomo Prime Time. In Misogynoir Transformed, Bailey delves into her groundbreaking c...
Advancing digitalization is accompanied by fundamental processes of societal transformation that reach far beyond a mere reconfiguration of technical, medial, or economic relations. They affect all areas of social coordination, communication, and development in contemporary society. This introduction situates the much-discussed "digital transformation" within the long-term interplay of technology and society and highlights the discontinuities, but also the continuities, in this process. It offers a systematizing overview...
Improving your data transformations: Applying the Box-Cox transformation
878 Citations · 2020 · Jason W. Osborne
Scholarworks (University of Massachusetts Amherst)
Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robust to modest violations of these assumptions. Yet the reality is that almost all analyses (even nonparametric tests) benefit from improving the normality of variables, particularly where substantial non-normality is present. While many are familiar with select traditional transformations (e.g., square root, log, inverse) for improving normality, the Box-Cox tra...
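For readers unfamiliar with the transform itself: the Box-Cox family is a one-parameter generalization that includes the square-root (λ = 0.5), log (λ = 0), and inverse (λ = -1) transforms as special cases. A minimal Python sketch; in practice λ is usually estimated by maximum likelihood, e.g. with `scipy.stats.boxcox`:

```python
import numpy as np

def box_cox(x, lam):
    # Box-Cox power transform for strictly positive data:
    #   y = (x**lam - 1) / lam   for lam != 0
    #   y = ln(x)                for lam == 0 (the limit as lam -> 0)
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

skewed = np.array([1.0, 2.0, 4.0, 8.0, 64.0])
print(box_cox(skewed, 0.0))  # equals np.log(skewed)
```

Varying λ continuously lets the analyst pick the member of the family that best normalizes a given variable, rather than choosing among a handful of fixed transforms.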
FLatten Transformer: Vision Transformer using Focused Linear Attention
236 Citations · 2023 · Dongchen Han, Xuran Pan, Yizeng Han + 2 more
journal unavailable
This paper proposes a novel Focused Linear Attention module, which introduces a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity.
The Interplay of Digital Transformational Leadership, Organizational Agility, and Digital Transformation
107 Citations · 2023 · Bora Ly
Journal of the Knowledge Economy
The model of how digital transformational leadership (DTL) influences digital transformation (DT) through organizational agility (OA) is tested, demonstrating that alignment of organizational models and evolving OA are critical to DT.
Unpacking the Difference Between Digital Transformation and IT-Enabled Organizational Transformation
656 Citations · 2021 · Lauri Wessel, Abayomi Baiyere, Roxana Ologeanu‐Taddei + 2 more
Journal of the Association for Information Systems
An empirically grounded conceptualization is developed that sets these two phenomena apart, finding two distinctive differences: digital transformation activities leverage digital technology in (re)defining an organization’s value proposition, while IT-enabled organizational transformation activities leverage digital technology in supporting the value proposition.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
27893 Citations · 2021 · Ze Liu, Yutong Lin, Yue Cao + 5 more
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
A hierarchical Transformer whose representation is computed with Shifted windows, which has the flexibility to model at various scales and has linear computational complexity with respect to image size and will prove beneficial for all-MLP architectures.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
355 Citations · 2021 · Ze Liu, Yutong Lin, Yue Cao + 5 more
arXiv (Cornell University)
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention ...
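The shifted windowing scheme can be illustrated as a cyclic shift of the feature map before window partitioning, so that successive blocks attend across the borders of the previous block's windows. A minimal NumPy sketch of the idea; the full model also masks attention across wrapped edges, which is omitted here:

```python
import numpy as np

def window_partition(x, w):
    # x: (H, W, C) -> (num_windows, w*w, C) non-overlapping windows.
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def shifted_windows(x, w):
    # Cyclically shift by half a window before partitioning, so tokens that
    # sat on window borders in the previous layer now share a window.
    return window_partition(np.roll(x, shift=(-w // 2, -w // 2), axis=(0, 1)), w)

x = np.arange(8 * 8).reshape(8, 8, 1).astype(float)
print(window_partition(x, 4).shape)  # (4, 16, 1)
print(shifted_windows(x, 4).shape)   # (4, 16, 1)
```

Because attention is always confined to fixed-size windows, the cost grows linearly with image size rather than quadratically.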
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
338 Citations · 2020 · Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas + 1 more
arXiv (Cornell University)
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregre...
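The associativity trick in the abstract, computing φ(Q)(φ(K)ᵀV) instead of materializing an N × N attention matrix, fits in a few lines. A minimal NumPy sketch using the paper's elu(x) + 1 feature map:

```python
import numpy as np

def elu_plus_one(x):
    # The kernel feature map from the paper: elu(x) + 1, which keeps all
    # similarity scores positive.
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(Q, K, V):
    # Standard attention materializes an N x N score matrix: O(N^2) in N.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

def linear_attention(Q, K, V):
    # Associativity: phi(Q) @ (phi(K).T @ V) never forms the N x N matrix,
    # so the cost is linear in the sequence length N.
    Qp, Kp = elu_plus_one(Q), elu_plus_one(K)
    KV = Kp.T @ V                 # (d, d) summary, independent of N
    Z = Qp @ Kp.sum(axis=0)       # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = rng.normal(size=(3, N, d))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```

For autoregressive decoding, the (d, d) summary `KV` and the normalizer can be updated one token at a time, which is the RNN view the title refers to.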
The transformative leadership compass: six competencies for digital transformation entrepreneurship
253 Citations · 2021 · Giovanni Schiuma, Eva Schettini, Francesco Santarsiero + 1 more
International Journal of Entrepreneurial Behaviour & Research
The transformative leadership compass is proposed as a model to outline the critical competencies distinguishing a digital transformative leader capable of driving continuous company innovation and specifically digital transformation entrepreneurship.
Transformative outcomes: assessing and reorienting experimentation with transformative innovation policy
159 Citations · 2021 · Bipashyee Ghosh, Paula Kivimaa, Matías Ramírez + 2 more
Science and Public Policy
Abstract The impending climate emergency, the Paris agreement and Sustainable Development Goals demand significant transformations in economies and societies. Science funders, innovation agencies, and scholars have explored new rationales and processes for policymaking, such as transformative innovation policy (TIP). Here, we address the question of how to orient the efforts of science, technology, and innovation policy actors to enable transformations. We build on sustainability transitions research and a 4-year co-creation journey of the TIP Consortium to present twelve transformative outcom...
Transformation strategies for the supply chain: the impact of industry 4.0 and digital transformation
132 Citations · 2020 · Raphael Preindl, Κωνσταντίνος Νικολόπουλος, Konstantia Litsiou
Supply Chain Forum an International Journal
It is shown that full supply-chain integration based on new technologies is still distant, and that the impact of Industry 4.0 and the digital transformation on decision making is closely connected to information sharing.