The large parameter sizes of modern AI models, together with emerging trends and new usage scenarios, will shape future computing architecture design; this talk discusses these issues and, in particular, their impact on mobile processor design.
Large language models (LLMs) have achieved remarkable performance in many AI applications, but they require very large parameter counts, ranging from several billion to over a trillion, which results in huge computation requirements for both training and inference. Generally speaking, LLMs with more parameters aim to explore the "Emergent Abilities" of AI models, while LLMs with fewer parameters aim to reduce the computing burden and democratize generative AI applications. To meet these huge computation requirements, Domain-Specific Architecture is important: it co-optimizes AI models, hardware, and software, and makes trade-offs among different design parameters. There are also trade-offs between AI computation throughput and energy efficiency across different types of AI computing systems.

Large Multimodal Models (LMMs), also called Multimodal Large Language Models, integrate multiple data types as input. Multimodal information provides rich context and environment information that allows LMMs to deliver a better user experience. LMMs are also a trend for mobile devices, because mobile devices connect to many sensors, such as video, audio, touch, gyroscope, and navigation systems.

Recently, there has been a trend to run smaller LLMs/LMMs (near or below 10 billion parameters) on edge devices; examples include Llama 2, Gemini Nano, and Phi-2. This shines a light on applying LLMs/LMMs in mobile devices, and several companies have provided experimental solutions on edge devices such as smartphones and PCs. Even with reduced model sizes, these LLMs/LMMs still require more computing resources than previous mobile processor workloads, and they face challenges in memory size, bandwidth, and power efficiency. In addition, device-side LLMs/LMMs in mobile processors can collaborate with cloud-side LLMs/LMMs in the data center to deliver better performance.
They can off-load computing from cloud-side models to provide seamless responses, act as agents that prompt cloud-side LLMs/LMMs, or be fine-tuned locally on user data to preserve privacy. These LLM/LMM trends and new usage scenarios will shape future computing architecture design. In this talk we discuss these issues and, especially, their impact on mobile processor design.
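The memory-size and bandwidth challenge of device-side inference can be made concrete with a back-of-envelope sketch. The parameter counts, weight precisions, and token rate below are illustrative assumptions, not figures from the talk; the key point is that autoregressive decoding streams roughly the full set of weights once per generated token, so DRAM bandwidth scales with both model size and target token rate.

```python
# Rough estimate of weight storage and DRAM bandwidth for on-device
# LLM inference. All numbers are illustrative assumptions.

def model_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weight storage in GB for a given parameter count and precision."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def bandwidth_gbs(params_billion: float, bits_per_weight: int,
                  tokens_per_second: float) -> float:
    """Decoding reads (roughly) all weights once per generated token,
    so required bandwidth is footprint times the target token rate."""
    return model_footprint_gb(params_billion, bits_per_weight) * tokens_per_second

if __name__ == "__main__":
    for params in (7, 10):       # "near or below 10 billion parameters"
        for bits in (16, 4):     # FP16 weights vs. 4-bit quantized weights
            fp = model_footprint_gb(params, bits)
            bw = bandwidth_gbs(params, bits, tokens_per_second=20)
            print(f"{params}B @ {bits}-bit: {fp:.1f} GB weights, "
                  f"~{bw:.0f} GB/s for 20 tok/s")
    # e.g. a 7B model in FP16 needs 14.0 GB of weights and ~280 GB/s,
    # while 4-bit quantization cuts this to 3.5 GB and ~70 GB/s.
```

Even under these optimistic assumptions, the quantized figures sit near the limits of current mobile DRAM capacity and bandwidth, which is why memory, not arithmetic throughput, often dominates mobile processor design for LLMs/LMMs.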