With increasing depth and complexity, deep learning networks have made significant progress in many directions, yet the theoretical understanding of deep learning remains incomplete. Early research successfully proved the universal approximation theorem for linear networks, but those proofs did not extend beyond that setting. Subsequent studies attempted to establish the approximation properties of convolutional and Transformer networks, but their proofs often relied on strong assumptions or were highly intricate. This paper proposes a unified approach to demonstrate that multi-layer networks composed of convolutions and Transformers are specific realizations of the universal approximation theorem. The approach requires no additional assumptions: it shows that most networks built from convolutions and Transformers can be written mathematically in the same form as the fully proven universal approximation theorem, thereby establishing them as specific implementations of that theorem and bridging the gap between deep learning practice and theoretical understanding. The unification is achieved by representing these network architectures (linear, convolutional, and Transformer) in matrix-vector form, hence this unified approach is called the matrix-vector method. This paper takes an important step towards unifying the field of deep learning: it deepens our theoretical understanding, reveals the fundamental principles behind the strong performance of these networks, and paves the way for exploring new research directions and optimizing the learning process in a wide range of deep learning applications.
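To illustrate the intuition behind the matrix-vector method, the sketch below (not taken from the paper; the helper conv1d_as_matrix is a hypothetical name introduced only for this example) shows how a 1D convolution, one of the architectures the paper rewrites in matrix-vector form, can be expressed as multiplication by a Toeplitz-structured matrix and verified numerically against a direct convolution.

import numpy as np

def conv1d_as_matrix(kernel, input_len):
    # Build the Toeplitz matrix W such that W @ x equals a 'valid'
    # 1D convolution (cross-correlation) of x with `kernel`.
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + k] = kernel  # each row holds the shifted kernel
    return W

rng = np.random.default_rng(0)
x = rng.standard_normal(8)        # input signal
kernel = rng.standard_normal(3)   # convolution kernel

W = conv1d_as_matrix(kernel, len(x))
y_matrix = W @ x                                   # matrix-vector form
y_direct = np.correlate(x, kernel, mode="valid")   # direct convolution

assert np.allclose(y_matrix, y_direct)

Under this view, a convolutional layer is simply a (structured) weight matrix applied to a vector, which is the same algebraic form used in statements of the universal approximation theorem for linear networks; the paper applies an analogous rewriting to Transformer components.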