Revisiting Pre-Trained Models for Chinese Natural Language Processing
Abstract
Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we revisit Chinese pre-trained language models to examine their effectiveness in a non-English language and release the Chinese pre-trained language model series to the community. We also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways, especially the masking strategy that adopts MLM as correction (Mac). We carried out extensive experiments on eight Chinese NLP tasks to revisit the existing pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also ablate details with several findings that may help future research. Resources are available at: https://github.com/ymcui/MacBERT
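The "MLM as correction" idea named in the abstract can be illustrated with a minimal sketch: instead of replacing selected tokens with a `[MASK]` placeholder, a similar word is substituted, so the model learns to correct plausible errors rather than fill in an artificial mask symbol. The `mac_mask` function, the toy synonym dictionary, and the masking rate below are hypothetical illustrations, not the authors' implementation.

```python
import random

def mac_mask(tokens, synonyms, mask_rate=0.15, seed=0):
    """Sketch of Mac-style corruption: substitute a similar word for each
    selected position instead of a [MASK] token. `synonyms` maps a token
    to candidate similar words (assumed to come from, e.g., word-embedding
    nearest neighbours)."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)        # None = not a prediction target
    n_to_mask = max(1, int(len(tokens) * mask_rate))
    for i in rng.sample(range(len(tokens)), n_to_mask):
        labels[i] = tokens[i]            # the model must recover the original
        cands = synonyms.get(tokens[i])
        if cands:
            corrupted[i] = rng.choice(cands)  # similar-word substitution
        # when no similar word exists, a fallback (e.g. a random token)
        # would be used; kept as the original here for brevity
    return corrupted, labels

toy_synonyms = {"good": ["nice", "fine"], "movie": ["film"]}
corrupted, labels = mac_mask(["a", "good", "movie"], toy_synonyms, mask_rate=0.34)
```

The returned `labels` list marks which positions the model is trained to correct; all other positions carry no loss, mirroring standard MLM training setups.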