Continual Learning of Language Models

7 Citations • 2023
Zixuan Ke, Yijia Shao, Haowei Lin
ArXiv

Different domains give similar importance values, which indirectly shows that the proxy can approximately identify the common general knowledge.

Abstract

For each domain i, we compare its importance vector with the importance vector of every other domain and average the resulting cosine similarities to obtain the value for domain i. We get 0.92 for Restaurant, 0.91 for each of ACL, AI, and Phone, 0.89 for PubMed, and 0.92 for Camera. Different domains thus give similar importance values, which indirectly shows that our proxy can approximately identify the common general knowledge.
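
The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `cosine` and `mean_cosine_to_others`, the dictionary layout, and the toy two-dimensional vectors are all assumptions made for the example, and the toy values are unrelated to the scores reported in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_cosine_to_others(importance):
    """For each domain, average its cosine similarity to every other domain.

    `importance` maps a domain name to its importance vector
    (hypothetical layout, assumed for this sketch).
    """
    scores = {}
    for name, vec in importance.items():
        sims = [
            cosine(vec, other_vec)
            for other_name, other_vec in importance.items()
            if other_name != name  # skip the domain itself
        ]
        scores[name] = sum(sims) / len(sims)
    return scores


# Toy importance vectors with placeholder domain names (not real data).
toy_importance = {
    "domain_a": [1.0, 0.0],
    "domain_b": [0.0, 1.0],
    "domain_c": [1.0, 1.0],
}

scores = mean_cosine_to_others(toy_importance)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

A score close to 1 for every domain would mean the importance vectors point in nearly the same direction, which is the pattern the passage interprets as shared general knowledge.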